Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Solr Wiki" for change notification.
The following page has been changed by ShalinMangar:
http://wiki.apache.org/solr/DataImportHandler

The comment on the change is:
Added example for delta-import

------------------------------------------------------------------------------

This is a relational model of the same schema that SOLR currently ships with. We will use this as an example to build a data-config.xml for DataImportHandler. We've created a sample database with this schema in HSQLDB. To run it, do the following steps:
- * Download [http://wiki.apache.org/solr-data/attachments/DataImportHandler/attachments/hsqldb-database.zip hsqldb-database.zip] to execute this example. This zip file contains all the hsqldb related files. Extract the downloaded zip file into c:\temp. Also save the following xml as c:\temp\example-data-config.xml
+ * Download [http://wiki.apache.org/solr-data/attachments/DataImportHandler/attachments/example-database.zip example-database.zip] to execute this example. This zip file contains all the HSQLDB-related files. Extract the downloaded zip file into c:\temp. Also save the following XML as c:\temp\example-data-config.xml
{{{
<dataConfig>
@@ -150, +150 @@
When the delta-import command is executed, it reads the start time stored in ''conf/dataimport.properties''. It uses that timestamp to run delta queries (TODO: Example) and, after completion, updates the timestamp in ''conf/dataimport.properties''.

+ === Delta-Import Example ===
+ We will use the same example database as in the full-import example. Note that the database schema has been updated: each table now contains an additional column ''last_modified'' of timestamp type. Because the database was updated recently, you may want to download it again. We use this timestamp column to determine which rows in each table have changed since the last index time.
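The read, substitute, query and update cycle described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not DataImportHandler's actual implementation. A plain dict stands in for ''conf/dataimport.properties''. An in-memory SQLite table stands in for the HSQLDB ''item'' table. The timestamp format "yyyy-MM-dd HH:mm:ss" is assumed. The helpers {{{resolve}}}, {{{delta_import}}} and {{{make_demo_db}}} are hypothetical names chosen for this example.

```python
import sqlite3
from datetime import datetime

# Assumed timestamp format for last_index_time. In this sketch a plain dict
# stands in for conf/dataimport.properties.
TS_FORMAT = "%Y-%m-%d %H:%M:%S"

# Same SQL as the deltaQuery attribute of the "item" entity in data-config.xml.
DELTA_QUERY = ("select id from item "
               "where last_modified > '${dataimporter.last_index_time}'")


def resolve(template, variables):
    """Hypothetical helper: textually replace each ${name} with its value."""
    for name, value in variables.items():
        template = template.replace("${" + name + "}", value)
    return template


def delta_import(conn, props, now):
    """Hypothetical helper: return ids of items changed since the stored
    last_index_time, then record `now` as the new last_index_time."""
    sql = resolve(DELTA_QUERY,
                  {"dataimporter.last_index_time": props["last_index_time"]})
    changed = [row[0] for row in conn.execute(sql)]
    # A real delta-import would now fetch and re-index each changed row.
    props["last_index_time"] = now.strftime(TS_FORMAT)
    return changed


def make_demo_db():
    """Build a tiny item table. ISO-style timestamps stored as text compare
    correctly as strings in SQLite."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table item (id text primary key, name text, "
                 "last_modified text)")
    conn.executemany("insert into item values (?, ?, ?)", [
        ("A1", "unchanged item", "2008-04-30 09:00:00"),
        ("B2", "edited item", "2008-05-02 14:30:00"),
    ])
    return conn


if __name__ == "__main__":
    conn = make_demo_db()
    props = {"last_index_time": "2008-05-01 10:00:00"}
    print(delta_import(conn, props, datetime(2008, 5, 3, 8, 0, 0)))  # ['B2']
    print(props["last_index_time"])  # 2008-05-03 08:00:00
    # Nothing changed after the new timestamp, so the next run finds no ids.
    print(delta_import(conn, props, datetime(2008, 5, 4, 8, 0, 0)))  # []
```

The substitution is purely textual. That is why the SQL template wraps {{{${dataimporter.last_index_time}}}} in single quotes: the quotes turn the inserted value into a string literal that the database then compares against ''last_modified''.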
+ 
+ Take a look at the following data-config.xml:
+ 
+ {{{
+ <dataConfig>
+     <document name="products">
+         <entity name="item" pk="ID" query="select * from item"
+                 deltaQuery="select id from item where last_modified > '${dataimporter.last_index_time}'">
+             <field column="ID" name="id" />
+             <field column="NAME" name="name" />
+             <field column="NAME" name="nameSort" />
+             <field column="NAME" name="alphaNameSort" />
+             <field column="MANU" name="manu" />
+             <field column="WEIGHT" name="weight" />
+             <field column="PRICE" name="price" />
+             <field column="POPULARITY" name="popularity" />
+             <field column="INSTOCK" name="inStock" />
+             <field column="INCLUDES" name="includes" />
+ 
+             <entity name="feature" pk="ITEM_ID"
+                     query="select DESCRIPTION from FEATURE where ITEM_ID='${item.ID}'">
+                 <field name="features" column="DESCRIPTION" />
+             </entity>
+             <entity name="item_category" pk="ITEM_ID, CATEGORY_ID"
+                     query="select CATEGORY_ID from item_category where ITEM_ID='${item.ID}'">
+                 <entity name="category" pk="ID"
+                         query="select DESCRIPTION from category where ID = '${item_category.CATEGORY_ID}'">
+                     <field column="description" name="cat" />
+                 </entity>
+             </entity>
+         </entity>
+     </document>
+ </dataConfig>
+ }}}
+ 
+ Pay attention to the ''deltaQuery'' attribute, which holds an SQL statement that detects changes in the ''item'' table. Note the variable {{{${dataimporter.last_index_time}}}}.
+ The DataImportHandler exposes a variable called ''last_index_time'', a timestamp denoting the last time ''full-import'' '''or''' ''delta-import'' was run. You can use this variable anywhere in the SQL you write in data-config.xml, and it will be replaced by its value during processing.
+ 
+ '''Note'''
+ - The deltaQuery in the above example only detects changes in ''item'', not in the other tables. You can detect changes to all child tables in one SQL query, as shown below.
+ Figuring out its details is left as an exercise for the user :)
+ {{{
+ deltaQuery="select id from item where id in
+                 (select item_id as id from feature where last_modified > '${dataimporter.last_index_time}')
+             or id in
+                 (select item_id as id from item_category where category_id in
+                     (select id as category_id from category where last_modified > '${dataimporter.last_index_time}')
+                  or last_modified > '${dataimporter.last_index_time}')
+             or last_modified > '${dataimporter.last_index_time}'"
+ }}}
+ - Writing a huge deltaQuery like the one above is tedious, so there is an easier alternative mechanism for achieving the same goal. TODO: Give an example here.
+ 
= Where to find it? =
DataImportHandler is not in SOLR right now. It exists as a patch in [http://issues.apache.org/jira/browse/SOLR-469 SOLR-469] in the SOLR JIRA. Please help us by giving your comments, suggestions and/or code contributions on this new feature.
