The following page has been changed by NoblePaul:
http://wiki.apache.org/solr/DataImportHandler

------------------------------------------------------------------------------
  Pay attention to the ''deltaQuery'' attribute, which has an SQL statement capable of detecting changes in the ''item'' table. Note the variable {{{${dataimporter.last_index_time}}}}. The DataImportHandler exposes a variable called ''last_index_time'', which is a timestamp value denoting the last time ''full-import'' ''''or'''' ''delta-import'' was run. You can use this variable anywhere in the SQL you write in data-config.xml and it will be replaced by the value during processing.

- '''Note'''
+ /!\ Note
   * The deltaQuery in the above example only detects changes in ''item'' but not in other tables. You can detect the changes to all child tables in one SQL query as specified below. Figuring out its details is an exercise for the user :)
  {{{
  deltaQuery="select id from item where id in
@@ -374, +374 @@

  You can use this feature for indexing from REST APIs such as rss/atom feeds, XML data feeds, other SOLR servers or even well-formed xhtml documents. Our XPath support has its limitations (no wildcards, only full paths, etc.) but we have tried to make sure that common use-cases are covered, and since it's based on a streaming parser, it is extremely fast and consumes a constant amount of memory even for large XMLs. It does not support namespaces, but it can handle xmls with namespaces. When you provide the xpath, just drop the namespace and give the rest (e.g. if the tag is `'<dc:subject>'` the mapping should just contain `'subject'`). Easy, isn't it? And you didn't need to write one line of code! Enjoy :)

- note: Unlike with database, it is not possible to omit the field declarations if you are using X!PathEntityProcessor. It relies on the xpaths declared in the fields to identify what to extract from the xml.
+ /!\ Note : Unlike with database, it is not possible to omit the field declarations if you are using X!PathEntityProcessor. It relies on the xpaths declared in the fields to identify what to extract from the xml.

  = Extending the tool with APIs =
  The examples we explored are admittedly trivial. It is not possible to have all user needs met by an xml configuration alone. So we expose a few interfaces which can be implemented by the user to enhance the functionality.
@@ -383, +383 @@

  {{{
  <entity name="foo" transformer="com.foo.Foo" ... />
  }}}
+ /!\ Note -- The transformer value has to be a fully qualified classname. If the class package is `'org.apache.solr.handler.dataimport'` the package name can be omitted. The solr.<classname> form also works if the class belongs to one of the 'solr' packages. This rule applies to all the pluggable classes like !DataSource, !EntityProcessor and Evaluator.

  The class 'Foo' must implement the interface `org.apache.solr.handler.dataimport.Transformer`. The interface has only one method.
@@ -522, +523 @@

  {{{
  <dataConfig>
  <dataSource type="FileDataSource" />
- <document>
+ <document>
- <entity name="f" processor="FileListEntityProcessor" fileName=".*xml" newerThan="'NOW-3DAYS'" recursive="true" rootEntity="false" dataSource="null">
+ <entity name="f" processor="FileListEntityProcessor" fileName=".*xml" newerThan="'NOW-3DAYS'" recursive="true" rootEntity="false" dataSource="null">
- <entity name="x" processor="XPathEntityProcessor" forEach="/the/record/xpath" url="${f.fileAbsolutePath}">
+ <entity name="x" processor="XPathEntityProcessor" forEach="/the/record/xpath" url="${f.fileAbsolutePath}">
- <field column="full_name" xpath="/field/xpath"/>
+ <field column="full_name" xpath="/field/xpath"/>
- </entity>
  </entity>
+ </entity>
- </document>
+ </document>
  </dataConfig>
  }}}
  Do not miss the `rootEntity` attribute. The implicit fields generated by the processor are `fileAbsolutePath,fileSize,fileLastModified,fileName`.
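
To make the role of the outer `f` entity more concrete, here is a rough, standalone Java sketch of the kind of rows a file-listing step could hand to the inner entity. It is not the DataImportHandler implementation and does not use any Solr classes. The class name `FileListSketch`, the method `listFiles`, the explicit `baseDir` parameter, the use of a partial regex match, the strict "newer than" comparison and the `Instant` type for `fileLastModified` are all assumptions made for illustration. Only the four field names and the `fileName`, `newerThan` and `recursive` attributes come from the page.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class FileListSketch {

    /**
     * Hypothetical helper: returns one row per regular file under baseDir whose
     * name matches fileNameRegex and whose modification time is after newerThan.
     * Each row carries the four implicit fields named on the wiki page.
     */
    public static List<Map<String, Object>> listFiles(
            Path baseDir, String fileNameRegex, Instant newerThan, boolean recursive)
            throws IOException {
        Pattern pattern = Pattern.compile(fileNameRegex);
        // Depth 1 visits only the direct children of baseDir.
        int maxDepth = recursive ? Integer.MAX_VALUE : 1;
        List<Path> candidates;
        try (Stream<Path> paths = Files.walk(baseDir, maxDepth)) {
            candidates = paths.filter(Files::isRegularFile).sorted().collect(Collectors.toList());
        }
        List<Map<String, Object>> rows = new ArrayList<>();
        for (Path p : candidates) {
            String name = p.getFileName().toString();
            Instant modified = Files.getLastModifiedTime(p).toInstant();
            // Assumption: a partial regex match on the file name is enough.
            if (!pattern.matcher(name).find() || !modified.isAfter(newerThan)) {
                continue;
            }
            Map<String, Object> row = new LinkedHashMap<>();
            row.put("fileAbsolutePath", p.toAbsolutePath().toString());
            row.put("fileSize", Files.size(p));
            row.put("fileLastModified", modified);
            row.put("fileName", name);
            rows.add(row);
        }
        return rows;
    }

    public static void main(String[] args) throws IOException {
        // Roughly mirrors fileName=".*xml" newerThan="'NOW-3DAYS'" recursive="true".
        Path baseDir = Paths.get(args.length > 0 ? args[0] : ".");
        Instant threeDaysAgo = Instant.now().minus(Duration.ofDays(3));
        for (Map<String, Object> row : listFiles(baseDir, ".*xml", threeDaysAgo, true)) {
            System.out.println(row);
        }
    }
}
```

In the configuration above, the inner `x` entity reads `${f.fileAbsolutePath}` from each such row. Because `rootEntity="false"` is set on `f`, the rows of `f` themselves are not treated as documents. The records extracted by `x` are.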
