Dear Wiki user, You have subscribed to a wiki page or wiki category on "Solr Wiki" for change notification.
The following page has been changed by ShalinMangar:
http://wiki.apache.org/solr/DataImportHandler

The comment on the change is:
Example for indexing Slashdot RSS feed

------------------------------------------------------------------------------
  If an API supports chunking (when the dataset is too large), multiple calls need to be made to complete the process. X!PathEntityProcessor supports this with a transformer. If the transformer returns a row which contains a field '''`$hasMore`''' with the value `"true"`, the processor makes another request with the same URL template (the actual value is recomputed before the request is made). A transformer can also pass a totally new URL for the next call by returning a row which contains a field '''`$nextUrl`''' whose value must be the complete URL for the next call.

- The X!PathEntityProcessor implements a streaming parser which supports a subset of xpath syntax. Complete xpath syntax is not supported but most of the common use cases are covered
+ The X!PathEntityProcessor implements a streaming parser which supports a subset of xpath syntax. Complete xpath syntax is not supported but most of the common use cases are covered.
+
+ == HttpDataSource Example ==
+
+ Download the full import example given in the DB section to try this out. We'll try indexing the [http://rss.slashdot.org/Slashdot/slashdot Slashdot RSS feed] for this example.
+
+ The dataimport section in solrconfig.xml looks like this:
+ {{{
+ <requestHandler name="/dataimport"
+                 class="org.apache.solr.handler.dataimport.DataImportHandler">
+   <lst name="defaults">
+     <str name="config">rss-data-config.xml</str>
+     <lst name="datasource">
+       <str name="type">HttpDataSource</str>
+     </lst>
+   </lst>
+ </requestHandler>
+ }}}
+
+ The data-config for this example looks like this:
+ {{{
+ <dataConfig>
+
+   <document>
+     <entity name="slashdot"
+             pk="link"
+             url="http://rss.slashdot.org/Slashdot/slashdot"
+             processor="XPathEntityProcessor"
+             forEach="/RDF/channel | /RDF/item"
+             transformer="DateFormatTransformer">
+
+       <field column="source" xpath="/RDF/channel/title" commonField="true" />
+       <field column="source-link" xpath="/RDF/channel/link" commonField="true" />
+       <field column="subject" xpath="/RDF/channel/subject" commonField="true" />
+
+       <field column="title" xpath="/RDF/item/title" />
+       <field column="link" xpath="/RDF/item/link" />
+       <field column="description" xpath="/RDF/item/description" />
+       <field column="creator" xpath="/RDF/item/creator" />
+       <field column="item-subject" xpath="/RDF/item/subject" />
+       <field column="date" xpath="/RDF/item/date" dateTimeFormat="yyyy-MM-dd'T'HH:mm:ss" />
+       <field column="slash-department" xpath="/RDF/item/department" />
+       <field column="slash-section" xpath="/RDF/item/section" />
+       <field column="slash-comments" xpath="/RDF/item/comments" />
+     </entity>
+   </document>
+ </dataConfig>
+ }}}
+
  = Extending the tool with APIs =
  The examples we explored are, admittedly, trivial. It is not possible to meet all user needs with an XML configuration alone, so we expose a few interfaces which the user can implement to enhance the functionality.
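
As an illustration of the kind of user-supplied logic this refers to, the chunked-fetch behaviour described earlier for `$hasMore` and `$nextUrl` can be modelled roughly as follows. This is only a sketch in Python, not the DataImportHandler API: `import_rows`, `fake_fetch`, `paging_transformer`, the `next_page` field and the `example.invalid` URLs are all hypothetical names invented for the example, and the way the control fields are combined is an assumption.

```python
from typing import Callable, Dict, Iterator, List

Row = Dict[str, object]

# Hypothetical paged API: each URL maps to (titles on that page, next page URL or None).
PAGES = {
    "http://example.invalid/feed?page=1": (["a", "b"], "http://example.invalid/feed?page=2"),
    "http://example.invalid/feed?page=2": (["c"], None),
}


def fake_fetch(url: str) -> List[Row]:
    """Stand-in for an HTTP call; returns the raw rows parsed from one response."""
    titles, next_page = PAGES[url]
    return [{"title": t, "next_page": next_page} for t in titles]


def paging_transformer(row: Row) -> Row:
    """Hypothetical transformer: turns the API's paging hint into control fields."""
    out = dict(row)
    next_page = out.pop("next_page", None)
    if next_page:
        out["$hasMore"] = "true"
        out["$nextUrl"] = next_page
    return out


def import_rows(
    url_template: str,
    fetch: Callable[[str], List[Row]],
    transform: Callable[[Row], Row],
    max_requests: int = 100,
) -> Iterator[Row]:
    """Keep requesting while a transformed row asks for more data (assumed semantics)."""
    url = url_template
    for _ in range(max_requests):  # guard against an endless paging loop
        has_more = False
        next_url = None
        for raw in fetch(url):
            row = transform(raw)
            # Control fields steer the next request and are not emitted as data.
            if str(row.pop("$hasMore", "false")).lower() == "true":
                has_more = True
            if "$nextUrl" in row:
                next_url = str(row.pop("$nextUrl"))
            yield row
        if next_url is not None:
            url = next_url        # transformer supplied a complete new URL
        elif has_more:
            url = url_template    # same template; a real processor would re-resolve it
        else:
            return


if __name__ == "__main__":
    for r in import_rows("http://example.invalid/feed?page=1", fake_fetch, paging_transformer):
        print(r)
```

The loop stops as soon as a response yields no row carrying either control field, and the `max_requests` cap is an addition of this sketch so that a misbehaving transformer cannot page forever.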
