Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Solr Wiki" for change 
notification.

The following page has been changed by ShalinMangar:
http://wiki.apache.org/solr/DataImportHandler

The comment on the change is:
Example for indexing Slashdot RSS feed

------------------------------------------------------------------------------
  If an API supports chunking (because the dataset is too large for a single 
response), multiple calls need to be made to complete the process. 
  X!PathEntityProcessor supports this with a transformer. If the transformer 
returns a row which contains a field '''`$hasMore`''' with the value `"true"`, 
the processor makes another request with the same url template (the actual 
value is recomputed before the request is made). A transformer can also pass a 
completely new url for the next call by returning a row which contains a field 
'''`$nextUrl`''', whose value must be the complete url for the next call.
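
  The paging behaviour described above could be driven by a transformer along the following lines. This is only a minimal sketch under stated assumptions: the class name `PagingTransformer`, the row fields `page` and `totalPages`, and the `BASE_URL` endpoint are all hypothetical, and the sketch assumes that the handler will call a plain `transformRow(Map)` method on the class. Whether `$hasMore` and `$nextUrl` must be set together is not stated here, so the sketch sets both.

```java
import java.util.HashMap;
import java.util.Map;

public class PagingTransformer {
    // Hypothetical endpoint, used for illustration only.
    static final String BASE_URL = "http://example.com/api/items";

    // Assumed entry point: receives one row and returns it, possibly modified.
    public Object transformRow(Map<String, Object> row) {
        Object page = row.get("page");        // hypothetical field
        Object total = row.get("totalPages"); // hypothetical field
        if (page == null || total == null) {
            return row; // no paging information in this row
        }
        int current = Integer.parseInt(page.toString().trim());
        int last = Integer.parseInt(total.toString().trim());
        if (current < last) {
            // Ask the processor for another request ...
            row.put("$hasMore", "true");
            // ... and give it the complete url to use for that request.
            row.put("$nextUrl", BASE_URL + "?page=" + (current + 1));
        }
        return row;
    }

    public static void main(String[] args) {
        Map<String, Object> row = new HashMap<>();
        row.put("page", "1");
        row.put("totalPages", "3");
        System.out.println(new PagingTransformer().transformRow(row));
    }
}
```

  On the last page neither field is added, so the row passes through unchanged and no further request is signalled.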
  
- The X!PathEntityProcessor implements a streaming parser which supports a 
subset of xpath syntax. Complete xpath syntax is not supported but most of the 
common use cases are covered
+ The X!PathEntityProcessor implements a streaming parser which supports a 
subset of xpath syntax. Complete xpath syntax is not supported but most of the 
common use cases are covered.
+ 
+ == HttpDataSource Example ==
+ 
+ Download the full import example given in the DB section to try this out. 
We'll try indexing the [http://rss.slashdot.org/Slashdot/slashdot Slashdot RSS 
feed] for this example.
+ 
+ The dataimport section in solrconfig.xml looks like this:
+ {{{
+    <requestHandler name="/dataimport"
+       class="org.apache.solr.handler.dataimport.DataImportHandler">
+       <lst name="defaults">
+               <str name="config">rss-data-config.xml</str>
+               <lst name="datasource">
+                       <str name="type">HttpDataSource</str>
+               </lst>
+       </lst>
+    </requestHandler>
+ }}}
+ 
+ The data-config for this example looks like this:
+ {{{
+ <dataConfig>
+ 
+       <document>
+               <entity name="slashdot"
+                               pk="link"
+                               url="http://rss.slashdot.org/Slashdot/slashdot"
+                               processor="XPathEntityProcessor"
+                               forEach="/RDF/channel | /RDF/item"
+                               transformer="DateFormatTransformer">
+                               
+                       <field column="source" xpath="/RDF/channel/title" 
commonField="true" />
+                       <field column="source-link" xpath="/RDF/channel/link" 
commonField="true" />
+                       <field column="subject" xpath="/RDF/channel/subject" 
commonField="true" />
+                       
+                       <field column="title" xpath="/RDF/item/title" />
+                       <field column="link" xpath="/RDF/item/link" />
+                       <field column="description" 
xpath="/RDF/item/description" />
+                       <field column="creator" xpath="/RDF/item/creator" />
+                       <field column="item-subject" xpath="/RDF/item/subject" 
/>
+                       <field column="date" xpath="/RDF/item/date" 
dateTimeFormat="yyyy-MM-dd'T'HH:mm:ss" />
+                       <field column="slash-department" 
xpath="/RDF/item/department" />
+                       <field column="slash-section" xpath="/RDF/item/section" 
/>
+                       <field column="slash-comments" 
xpath="/RDF/item/comments" />
+               </entity>
+       </document>
+ </dataConfig>
+ }}}
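
The `dateTimeFormat` attribute takes a `java.text.SimpleDateFormat` pattern, in which `HH` is the 24-hour hour-of-day and `hh` is the 12-hour clock hour. The sketch below shows how the two letters differ on a timestamp in the noon hour. The class name `DatePatternDemo` and the sample timestamp are made up for illustration, and the sketch calls `SimpleDateFormat` directly rather than going through the transformer.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Calendar;
import java.util.Locale;
import java.util.TimeZone;

public class DatePatternDemo {
    // Parses value with the given pattern (in UTC) and returns the hour of day.
    static int hourOfDay(String pattern, String value) {
        SimpleDateFormat fmt = new SimpleDateFormat(pattern, Locale.ROOT);
        fmt.setTimeZone(TimeZone.getTimeZone("UTC"));
        try {
            Calendar cal = Calendar.getInstance(TimeZone.getTimeZone("UTC"));
            cal.setTime(fmt.parse(value));
            return cal.get(Calendar.HOUR_OF_DAY);
        } catch (ParseException e) {
            throw new IllegalArgumentException("Cannot parse: " + value, e);
        }
    }

    public static void main(String[] args) {
        String sample = "2008-05-14T12:35:00"; // made-up sample timestamp
        System.out.println(hourOfDay("yyyy-MM-dd'T'HH:mm:ss", sample));
        System.out.println(hourOfDay("yyyy-MM-dd'T'hh:mm:ss", sample));
    }
}
```

With this sample, the `HH` pattern yields hour 12 while the `hh` pattern yields hour 0, because a 12-hour clock reads "12" without an AM/PM marker as midnight. For feeds that publish 24-hour timestamps, `HH` is therefore the safer choice.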
+ 
  = Extending the tool with APIs =
  The examples we explored so far are admittedly trivial. It is not possible to 
meet all user needs with an xml configuration alone, so we expose a few 
interfaces which users can implement to enhance the functionality.
  
