The following page has been changed by NoblePaul:
http://wiki.apache.org/solr/DataImportHandler

------------------------------------------------------------------------------
<dataConfig>
<dataSource driver="org.hsqldb.jdbcDriver" url="jdbc:hsqldb:/temp/example/ex" user="sa" />
<document name="products">
- <entity name="item" pk="ID" query="select * from item"
+ <entity name="item" pk="ID"
+ query="select * from item"
+ deltaImportQuery="select * from item where ID='${dataimporter.delta.id}'"
deltaQuery="select id from item where last_modified > '${dataimporter.last_index_time}'">
-
<entity name="feature" pk="ITEM_ID" query="select description as features from feature where item_id='${item.ID}'">
</entity>
@@ -253, +254 @@
Pay attention to the ''deltaQuery'' attribute which has an SQL statement capable of detecting changes in the ''item'' table. Note the variable {{{${dataimporter.last_index_time}}}}. The DataImportHandler exposes a variable called ''last_index_time'' which is a timestamp value denoting the last time ''full-import'' ''''or'''' ''delta-import'' was run. You can use this variable anywhere in the SQL you write in data-config.xml and it will be replaced by the value during processing.
+ /!\ Note
* The deltaQuery in the above example only detects changes in ''item'' but not in other tables. You can detect the changes to all child tables in one SQL query as specified below. Figuring out its details is an exercise for the user :)
{{{
@@ -270, +272 @@
<dataSource driver="org.hsqldb.jdbcDriver" url="jdbc:hsqldb:/temp/example/ex" user="sa" />
<document>
<entity name="item" pk="ID" query="select * from item"
+ deltaImportQuery="select * from item where ID='${dataimporter.delta.id}'"
deltaQuery="select id from item where last_modified > '${dataimporter.last_index_time}'">
<entity name="feature" pk="ITEM_ID" query="select DESCRIPTION as features from FEATURE where ITEM_ID='${item.ID}'"
@@ -292, +295 @@
}}}
Here we have three queries specified for each entity except the root (which has only two).
- * The ''query'' gives us the data needed to populate fields of the Solr document
+ * The ''query'' gives the data needed to populate fields of the Solr document in full-import
+ * The ''deltaImportQuery'' gives the data needed to populate fields when running a delta-import
* The ''deltaQuery'' gives the primary keys of the current entity which have changes since the last index time
* The ''parentDeltaQuery'' uses the changed rows of the current table (fetched with deltaQuery) to give the changed rows in the parent table. This is necessary because whenever a row in the child table changes, we need to re-generate the document which has that field.
@@ -300, +304 @@
* For each row given by ''query'', the query of the child entity is executed once.
* For each row given by ''deltaQuery'', the parentDeltaQuery is executed.
* If any row in the root/child entity changes, we regenerate the complete Solr document which contained that row.
+
+ /!\ Note : The 'deltaImportQuery' is a Solr 1.4 feature. Originally it was generated automatically using the 'query' attribute, which is error-prone.
+ /!\ Note : It is possible to do delta-import using a full-import command. [http://wiki.apache.org/solr/DataImportHandlerFaq#fullimportdelta See here]
= Usage with XML/HTTP Datasource =
DataImportHandler can be used to index data from HTTP-based data sources.
This includes indexing from REST/XML APIs as well as from RSS/ATOM feeds.
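
As a rough illustration of the XML/HTTP usage mentioned above, a data-config.xml for an RSS 2.0 feed might look something like the sketch below. It is not part of this change. The feed URL, the entity name and the field column names are hypothetical, and the data source type name is an assumption that may differ between Solr versions (HttpDataSource in earlier releases, URLDataSource in later ones).

```xml
<dataConfig>
  <!-- Assumption: the type name may vary by Solr version (HttpDataSource / URLDataSource) -->
  <dataSource type="HttpDataSource" />
  <document>
    <!-- Hypothetical feed URL; forEach selects one Solr document per RSS item -->
    <entity name="feed"
            url="http://example.com/feed.rss"
            processor="XPathEntityProcessor"
            forEach="/rss/channel/item">
      <!-- Hypothetical column names; each would need a matching field in schema.xml -->
      <field column="title" xpath="/rss/channel/item/title" />
      <field column="link" xpath="/rss/channel/item/link" />
      <field column="description" xpath="/rss/channel/item/description" />
    </entity>
  </document>
</dataConfig>
```

In this sketch the entity has no SQL query at all: the rows come from the elements matched by forEach, and each field is filled from the XPath expression given for it.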
