Dear Wiki user, You have subscribed to a wiki page or wiki category on "Solr Wiki" for change notification.
The following page has been changed by ShalinMangar:
http://wiki.apache.org/solr/DataImportHandler

The comment on the change is:
Added example for delta data-config with deltaQuery and parentDeltaQuery

------------------------------------------------------------------------------
  Pay attention to the ''deltaQuery'' attribute which has an SQL statement capable of detecting changes in the ''item'' table. Note the variable {{{${dataimporter.last_index_time}}}}. The DataImportHandler exposes a variable called ''last_index_time'' which is a timestamp value denoting the last time ''full-import'' ''''or'''' ''delta-import'' was run. You can use this variable anywhere in the SQL you write in data-config.xml and it will be replaced by the value during processing.

- ''''Note''''
+ '''Note'''
-  - The deltaQuery in the above example only detects changes in ''item'' but not in other tables. You can detect the changes to all child tables in one SQL query as specified below. Figuring out its details is an exercise for the user :)
+  * The deltaQuery in the above example only detects changes in ''item'' but not in other tables. You can detect the changes to all child tables in one SQL query as specified below. Figuring out its details is an exercise for the user :)
  {{{
  deltaQuery="select id from item where id in
                  (select item_id as id from feature where last_modified > '${dataimporter.last_index_time}')
@@ -201, +201 @@
                  or last_modified > '${dataimporter.last_index_time}')
                  or last_modified > '${dataimporter.last_index_time}'"
  }}}
-  - Writing a huge deltaQuery like the above one is not a very enjoyable task, so we have an alternate easy mechanism of achieving this goal. TODO: Give an example here.
+  * Writing a huge deltaQuery like the above one is not a very enjoyable task, so we have an alternate mechanism of achieving this goal.
+ {{{
+ <dataConfig>
+     <document name="products">
+         <entity name="item" pk="ID" query="select * from item"
+                 deltaQuery="select id from item where last_modified > '${dataimporter.last_index_time}'">
+             <field column="ID" name="id" />
+             <field column="NAME" name="name" />
+             <field column="NAME" name="nameSort" />
+             <field column="NAME" name="alphaNameSort" />
+             <field column="MANU" name="manu" />
+             <field column="WEIGHT" name="weight" />
+             <field column="PRICE" name="price" />
+             <field column="POPULARITY" name="popularity" />
+             <field column="INSTOCK" name="inStock" />
+             <field column="INCLUDES" name="includes" />
+
+             <entity name="feature" pk="ITEM_ID"
+                     query="select DESCRIPTION from FEATURE where ITEM_ID='${item.ID}'"
+                     deltaQuery="select ITEM_ID from FEATURE where last_modified > '${dataimporter.last_index_time}'"
+                     parentDeltaQuery="select ID from item where ID=${feature.ITEM_ID}">
+                 <field name="features" column="DESCRIPTION" />
+             </entity>
+
+             <entity name="item_category" pk="ITEM_ID, CATEGORY_ID"
+                     query="select CATEGORY_ID from item_category where ITEM_ID='${item.ID}'"
+                     deltaQuery="select ITEM_ID, CATEGORY_ID from item_category where last_modified > '${dataimporter.last_index_time}'"
+                     parentDeltaQuery="select ID from item where ID=${item_category.ITEM_ID}">
+                 <entity name="category" pk="ID"
+                         query="select DESCRIPTION from category where ID = '${item_category.CATEGORY_ID}'"
+                         deltaQuery="select ID from category where last_modified > '${dataimporter.last_index_time}'"
+                         parentDeltaQuery="select ITEM_ID, CATEGORY_ID from item_category where CATEGORY_ID=${category.ID}">
+                     <field column="description" name="cat" />
+                 </entity>
+             </entity>
+         </entity>
+     </document>
+ </dataConfig>
+ }}}
+
+ Here we have three queries specified for each entity except the root (which has only two).
+  * The ''query'' gives us the data needed to populate the fields of the SOLR document.
+  * The ''deltaQuery'' gives the primary keys of the rows in the current entity which have changed since the last index time.
+  * The ''parentDeltaQuery'' uses the changed rows of the current table (fetched with ''deltaQuery'') to give the changed rows in the parent table. This is necessary because whenever a row in the child table changes, we need to regenerate the document that contains that field.
+
+ To summarize:
+  * For each row given by ''query'', the ''query'' of each child entity is executed.
+  * For each row given by ''deltaQuery'', the ''parentDeltaQuery'' is executed.
+  * If any row in the root entity or in a child entity changes, we regenerate the complete SOLR document which contained that row.

  = Where to find it? =
  DataImportHandler is not in SOLR right now. It exists as a patch in [http://issues.apache.org/jira/browse/SOLR-469 SOLR-469] in the SOLR JIRA. Please help us by giving your comments, suggestions and/or code contributions on this new feature.
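As a footnote to the delta-import discussion above, the interplay of ''deltaQuery'' and ''parentDeltaQuery'' can be illustrated with a small simulation. This is only a sketch of the idea, not DataImportHandler code: it uses Python's `sqlite3` module with a hypothetical two-table schema (`item`, `feature`) and made-up rows, and the helper names `changed_item_ids` and `rebuild_document` are invented for illustration. The `last_index_time` variable stands in for `${dataimporter.last_index_time}`.

```python
import sqlite3

# Hypothetical miniature schema and data, loosely mirroring the item/feature
# tables of the example config. None of this comes from DataImportHandler.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table item (id integer primary key, name text, last_modified text);
    create table feature (item_id integer, description text, last_modified text);
    insert into item values (1, 'disk',  '2008-01-01 00:00:00');
    insert into item values (2, 'cable', '2008-03-01 00:00:00');
    insert into item values (3, 'mouse', '2008-01-01 00:00:00');
    insert into feature values (1, '7200 rpm', '2008-01-01 00:00:00');
    insert into feature values (3, 'wireless', '2008-03-05 00:00:00');
""")

# Stands in for ${dataimporter.last_index_time}.
last_index_time = "2008-02-01 00:00:00"


def changed_item_ids(conn, last_index_time):
    """Collect primary keys of root rows whose documents must be rebuilt."""
    changed = set()

    # deltaQuery of the root entity: items modified since the last index.
    rows = conn.execute(
        "select id from item where last_modified > ?", (last_index_time,)
    ).fetchall()
    changed.update(item_id for (item_id,) in rows)

    # deltaQuery of the child entity: features modified since the last index.
    child_rows = conn.execute(
        "select item_id from feature where last_modified > ?", (last_index_time,)
    ).fetchall()

    # parentDeltaQuery: run once per changed child row to find its parent.
    for (child_item_id,) in child_rows:
        parents = conn.execute(
            "select id from item where id = ?", (child_item_id,)
        ).fetchall()
        changed.update(item_id for (item_id,) in parents)

    return changed


def rebuild_document(conn, item_id):
    """Re-run the entity queries to regenerate the complete document."""
    (name,) = conn.execute(
        "select name from item where id = ?", (item_id,)
    ).fetchone()
    features = [
        description
        for (description,) in conn.execute(
            "select description from feature where item_id = ?", (item_id,)
        )
    ]
    return {"id": item_id, "name": name, "features": features}


docs = [
    rebuild_document(conn, item_id)
    for item_id in sorted(changed_item_ids(conn, last_index_time))
]
print(docs)
```

In this sketch item 2 is picked up by the root ''deltaQuery'', while item 3 is picked up only because one of its `feature` rows changed and the ''parentDeltaQuery'' step maps that row back to its parent. Item 1 is untouched and is not rebuilt.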
