Dear Wiki user, You have subscribed to a wiki page or wiki category on "Solr Wiki" for change notification.
The following page has been changed by NoblePaul:
http://wiki.apache.org/solr/DataImportHandler

------------------------------------------------------------------------------
  = Usage with XML/HTTP Datasource =
  DataImportHandler can be used to index data from HTTP based data sources. This includes using indexing from REST/XML APIs as well as from RSS/ATOM Feeds.
  
- == Configuration in solrconfig.xml ==
- A sample DataImportHandler configuration in solrconfig.xml looks like this
+ == Configuration of HttpDataSource ==
+ 
+ A sample configuration in for !HttpdataSource in data config xml looks like this
  {{{
+ <dataSource type="HttpDataSource" baseUrl="http://host:port/" encoding="UTF-8" connectionTimeout="5000" readTimeout="10000"/>
- <requestHandler name="/dataimport" class="org.apache.solr.handler.DataImportHandler">
-   <lst name="defaults">
-     <str name="config">/home/username/data-config.xml</str>
-     <lst name="datasource">
-       <str name="type">HttpDataSource</str>
-       <str name="baseUrl">http://host:port/</str>
-       <str name="encoding">UTF-8</str>
-       <str name="connectionTimeout">5000</str>
-       <str name="readTimeout">10000</str>
-     </lst>
-   </lst>
- </requestHandler>
  }}}
  ''' The attributes are '''
   * '''`baseUrl`''' (optional): you should use it when the host/port changes between Dev/QA/Prod environments. Using this attribute isolates the changes to be made to the solrconfig.xml
-  * '''`encoding`'''(optional): by default the encoding in the response from the URL is used. You can use this property to override the default encoding.
+  * '''`encoding`'''(optional): By default the encoding in the response header is used. You can use this property to override the default encoding.
   * '''`connectionTimeout`''' (optional):The default value is 5000ms
   * '''`readTimeout`''' (optional): the default value is 10000ms
  
@@ -358, +348 @@
   * '''`processor`''' (required) : The value must be `"XPathEntityProcessor"`
   * '''`url`''' (required) : The url used to invoke the REST API. (Can be templatized)
   * '''`forEach`'''(required) : The xpath expression which demarcates a record. If there are mutiple types of record separate them with ''" | "'' (pipe)
+  * '''`xsl`'''(optional): This will be used as a preprocessor for applying the XSL transformation. Provide the full path in the filesystem or a url.
+  * '''`useSolrAddSchema`'''(optional): Set it's value to 'true' if the xml that is fed into this processor has the same schema as that of the solr add xml. No need to mention any fields if it is set to true.
  
  The fields can have the following attributes (over and above the default attributes):
  
@@ -374, +366 @@
  Download the full import example given in the DB section to try this out. We'll try indexing the [http://rss.slashdot.org/Slashdot/slashdot Slashdot RSS feed] for this example.
  
- The dataimport section in solrconfig.xml looks like this:
- {{{
- <requestHandler name="/dataimport"
-   class="org.apache.solr.handler.dataimport.DataImportHandler">
-   <lst name="defaults">
-     <str name="config">rss-data-config.xml</str>
-     <lst name="datasource">
-       <str name="type">HttpDataSource</str>
-     </lst>
-   </lst>
- </requestHandler>
- }}}
  
  The data-config for this example looks like this:
  {{{
  <dataConfig>
+   <dataSource type="HttpDataSource" />
    <document>
      <entity name="slashdot"
              pk="link"
