Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Solr Wiki" for change 
notification.

The following page has been changed by ShalinMangar:
http://wiki.apache.org/solr/DataImportHandler

The comment on the change is:
Added example for delta data-config with deltaQuery and parentDeltaQuery

------------------------------------------------------------------------------
  Pay attention to the ''deltaQuery'' attribute, which has an SQL statement capable of detecting changes in the ''item'' table. Note the variable {{{${dataimporter.last_index_time}}}} in it.
  The DataImportHandler exposes a variable called ''last_index_time'', a timestamp denoting the last time a ''full-import'' '''or''' a ''delta-import'' was run. You can use this variable anywhere in the SQL you write in data-config.xml, and it will be replaced by its value during processing.
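The substitution can be pictured with a short Python sketch. It is only an approximation: the helper ''resolve_variables'' is hypothetical (it is not the DataImportHandler's own resolver), and the timestamp format used for ''last_index_time'' is an assumption.

{{{#!python
import re
from datetime import datetime

# Hypothetical helper: a simplified model of ${...} substitution,
# not DataImportHandler's actual variable resolver.
PLACEHOLDER = re.compile(r"\$\{([^}]+)\}")


def resolve_variables(sql, variables):
    """Replace every ${name} in sql with str(variables[name])."""
    return PLACEHOLDER.sub(lambda m: str(variables[m.group(1)]), sql)


# Assumed timestamp format for the last index time.
last_index_time = datetime(2008, 3, 1, 12, 30, 0).strftime("%Y-%m-%d %H:%M:%S")
delta_query = ("select id from item "
               "where last_modified > '${dataimporter.last_index_time}'")

print(resolve_variables(delta_query,
                        {"dataimporter.last_index_time": last_index_time}))
# prints: select id from item where last_modified > '2008-03-01 12:30:00'
}}}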
  
- ''''Note''''
+ '''Note'''
-  - The deltaQuery in the above example only detects changes in ''item'' but 
not in other tables. You can detect the changes to all child tables in one SQL 
query as specified below. Figuring out it's details is an exercise for the user 
:)
+  * The deltaQuery in the above example detects changes only in ''item'', not in the other tables. You can detect changes to all the child tables in one SQL query, as shown below. Figuring out its details is left as an exercise for the user :)
  {{{
        deltaQuery="select id from item where id in
                                (select item_id as id from feature where last_modified > '${dataimporter.last_index_time}')
@@ -201, +201 @@

                                or last_modified > '${dataimporter.last_index_time}')
                                or last_modified > '${dataimporter.last_index_time}'"
  }}}
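As a hedged illustration of the idea behind such a combined query, the Python sketch below runs a cut-down version against an in-memory SQLite database. The two-table schema (''item'' and ''feature'' only), the sample rows, and the use of a ''?'' bind parameter in place of ''${dataimporter.last_index_time}'' are all assumptions made for the example.

{{{#!python
import sqlite3

# Hypothetical two-table schema (item and feature only); the full
# deltaQuery above also covers other child tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table item (id integer primary key, last_modified text);
    create table feature (item_id integer, last_modified text);
    insert into item values (1, '2008-01-01 00:00:00');
    insert into item values (2, '2008-01-01 00:00:00');
    insert into item values (3, '2008-03-05 09:00:00');
    insert into feature values (1, '2008-01-01 00:00:00');
    insert into feature values (2, '2008-03-02 10:00:00');
""")

last_index_time = "2008-03-01 12:30:00"

# An item counts as changed if it was modified itself OR one of its
# feature rows was modified after the last index time.
delta_query = """
    select id from item
    where id in (select item_id as id from feature where last_modified > ?)
       or last_modified > ?
"""
changed = sorted(row[0] for row in
                 conn.execute(delta_query, (last_index_time, last_index_time)))
print(changed)  # [2, 3]
}}}

Because the sample timestamps are stored as text in one fixed format, SQLite's string comparison orders them chronologically; a real database would normally compare native timestamp columns.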
-  - Writing a huge deltaQuery like the above one is not a very enjoyable task, 
so we have an alternate easy mechanism of achieving this goal. TODO: Give an 
example here.
+  * Writing a huge deltaQuery like the one above is not an enjoyable task, so there is an alternative mechanism for achieving the same goal:
+ {{{
+ <dataConfig>
+     <document name="products">
+         <entity name="item" pk="ID" query="select * from item"
+                 deltaQuery="select id from item where last_modified > '${dataimporter.last_index_time}'">
+             <field column="ID" name="id" />
+             <field column="NAME" name="name" />
+             <field column="NAME" name="nameSort" />
+             <field column="NAME" name="alphaNameSort" />
+             <field column="MANU" name="manu" />
+             <field column="WEIGHT" name="weight" />
+             <field column="PRICE" name="price" />
+             <field column="POPULARITY" name="popularity" />
+             <field column="INSTOCK" name="inStock" />
+             <field column="INCLUDES" name="includes" />
+ 
+             <entity name="feature" pk="ITEM_ID"
+                     query="select DESCRIPTION from FEATURE where ITEM_ID='${item.ID}'"
+                     deltaQuery="select ITEM_ID from FEATURE where last_modified > '${dataimporter.last_index_time}'"
+                     parentDeltaQuery="select ID from item where ID=${feature.ITEM_ID}">
+                 <field name="features" column="DESCRIPTION" />
+             </entity>
+ 
+             <entity name="item_category" pk="ITEM_ID, CATEGORY_ID"
+                     query="select CATEGORY_ID from item_category where ITEM_ID='${item.ID}'"
+                     deltaQuery="select ITEM_ID, CATEGORY_ID from item_category where last_modified > '${dataimporter.last_index_time}'"
+                     parentDeltaQuery="select ID from item where ID=${item_category.ITEM_ID}">
+                 <entity name="category" pk="ID"
+                         query="select DESCRIPTION from category where ID = '${item_category.CATEGORY_ID}'"
+                         deltaQuery="select ID from category where last_modified > '${dataimporter.last_index_time}'"
+                         parentDeltaQuery="select ITEM_ID, CATEGORY_ID from item_category where CATEGORY_ID=${category.ID}">
+                     <field column="DESCRIPTION" name="cat" />
+                 </entity>
+             </entity>
+         </entity>
+     </document>
+ </dataConfig>
+ }}}
+ 
+ Here each entity except the root has three queries specified; the root has only two (''query'' and ''deltaQuery'').
+  * The ''query'' gives the data needed to populate the fields of the SOLR document.
+  * The ''deltaQuery'' gives the primary keys of the rows in the current entity that have changed since the last index time.
+  * The ''parentDeltaQuery'' uses the changed rows of the current table (fetched with ''deltaQuery'') to find the affected rows in the parent table. This is necessary because whenever a row in a child table changes, the document containing that row's fields must be regenerated.
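The way ''parentDeltaQuery'' climbs from a changed child row up to the root can be sketched in Python against an in-memory SQLite copy of the tables. This is a simplified model, not the DataImportHandler's implementation: the ''ENTITIES'' table, the helpers ''rows'' and ''changed_root_ids'', the sample data, and the named placeholders (such as '':ts'' and '':ITEM_ID'') standing in for the ''${...}'' variables are all hypothetical.

{{{#!python
import sqlite3

# Hypothetical model of the delta entities in the data-config.xml above.
# Named ":param" placeholders stand in for DIH's ${...} variables.
ENTITIES = {
    "item": {"parent": None,
             "deltaQuery": "select ID from item where last_modified > :ts"},
    "feature": {"parent": "item",
                "deltaQuery": "select ITEM_ID from feature where last_modified > :ts",
                "parentDeltaQuery": "select ID from item where ID = :ITEM_ID"},
    "item_category": {"parent": "item",
                      "deltaQuery": "select ITEM_ID, CATEGORY_ID from item_category"
                                    " where last_modified > :ts",
                      "parentDeltaQuery": "select ID from item where ID = :ITEM_ID"},
    "category": {"parent": "item_category",
                 "deltaQuery": "select ID from category where last_modified > :ts",
                 "parentDeltaQuery": "select ITEM_ID, CATEGORY_ID from item_category"
                                     " where CATEGORY_ID = :ID"},
}


def rows(conn, sql, params):
    """Run sql and return each result row as a {column: value} dict."""
    cur = conn.execute(sql, params)
    names = [d[0] for d in cur.description]
    return [dict(zip(names, r)) for r in cur]


def changed_root_ids(conn, ts):
    """IDs of root (item) rows whose documents must be regenerated."""
    roots = set()
    for entity in ENTITIES.values():
        changed = rows(conn, entity["deltaQuery"], {"ts": ts})
        # Walk up the tree: parentDeltaQuery maps each changed row of an
        # entity to the affected rows of its parent, until the root.
        while entity["parent"] is not None:
            changed = [p for row in changed
                       for p in rows(conn, entity["parentDeltaQuery"], row)]
            entity = ENTITIES[entity["parent"]]
        roots.update(row["ID"] for row in changed)
    return roots


conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table item (ID integer primary key, last_modified text);
    create table feature (ITEM_ID integer, last_modified text);
    create table item_category (ITEM_ID integer, CATEGORY_ID integer, last_modified text);
    create table category (ID integer primary key, last_modified text);
    insert into item values (1, '2008-01-01 00:00:00'), (2, '2008-01-01 00:00:00'),
                            (3, '2008-01-01 00:00:00'), (4, '2008-03-05 09:00:00');
    insert into feature values (2, '2008-03-02 10:00:00');
    insert into item_category values (3, 10, '2008-01-01 00:00:00');
    insert into category values (10, '2008-03-03 08:00:00');
""")
print(sorted(changed_root_ids(conn, "2008-03-01 12:30:00")))  # [2, 3, 4]
}}}

Here item 4 changed directly, item 2 is reached through ''feature'', and item 3 is reached in two hops (''category'' to ''item_category'' to ''item''), while item 1 is left alone.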
+ 
+ To reiterate:
+  * For each row given by ''query'', the ''query'' of each child entity is executed.
+  * For each row given by ''deltaQuery'', the ''parentDeltaQuery'' is executed.
+  * If any row in the root entity or in a child entity changes, the complete SOLR document containing that row is regenerated.
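The regeneration step can likewise be sketched in Python. Given the ID of a changed root row, the hypothetical ''build_document'' below runs the root ''query'' for that row and then, for each row returned, the ''query'' of every child entity, resolving ''${entity.COLUMN}'' references from the enclosing rows. The ''TREE'' structure, the invented ''${root.ID}'' placeholder, the in-memory SQLite schema and data, and the choice to collect every field as a list are assumptions for illustration only.

{{{#!python
import re
import sqlite3

# Hypothetical model of the entity tree in the data-config.xml example.
# "${root.ID}" is an invented placeholder for the changed root row's key.
TREE = {
    "name": "item",
    "query": "select ID, NAME from item where ID = ${root.ID}",
    "fields": {"ID": "id", "NAME": "name"},
    "children": [
        {"name": "feature",
         "query": "select DESCRIPTION from feature where ITEM_ID = ${item.ID}",
         "fields": {"DESCRIPTION": "features"},
         "children": []},
        {"name": "item_category",
         "query": "select CATEGORY_ID from item_category where ITEM_ID = ${item.ID}",
         "fields": {},
         "children": [
             {"name": "category",
              "query": "select DESCRIPTION from category"
                       " where ID = ${item_category.CATEGORY_ID}",
              "fields": {"DESCRIPTION": "cat"},
              "children": []}]},
    ],
}

PLACEHOLDER = re.compile(r"\$\{(\w+)\.(\w+)\}")


def run(conn, template, scope):
    """Turn ${entity.COLUMN} into ? bind parameters taken from scope, run it,
    and return each result row as a {column: value} dict."""
    params = []

    def bind(m):
        params.append(scope[m.group(1)][m.group(2)])
        return "?"

    cur = conn.execute(PLACEHOLDER.sub(bind, template), params)
    names = [d[0] for d in cur.description]
    return [dict(zip(names, r)) for r in cur]


def collect(conn, entity, scope, doc):
    """Run the entity's query; for each row, map its fields and recurse
    into every child entity with that row added to the scope."""
    for row in run(conn, entity["query"], scope):
        for column, field in entity["fields"].items():
            doc.setdefault(field, []).append(row[column])
        inner = {**scope, entity["name"]: row}
        for child in entity["children"]:
            collect(conn, child, inner, doc)
    return doc


def build_document(conn, item_id):
    """Regenerate the complete document for one changed root row."""
    return collect(conn, TREE, {"root": {"ID": item_id}}, {})


conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table item (ID integer primary key, NAME text);
    create table feature (ITEM_ID integer, DESCRIPTION text);
    create table item_category (ITEM_ID integer, CATEGORY_ID integer);
    create table category (ID integer primary key, DESCRIPTION text);
    insert into item values (1, 'Laptop');
    insert into feature values (1, 'Backlit keyboard'), (1, '14-inch screen');
    insert into item_category values (1, 10);
    insert into category values (10, 'electronics');
""")
print(build_document(conn, 1))
}}}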
  
  = Where to find it? =
  DataImportHandler is not in SOLR right now. It exists as a patch in 
[http://issues.apache.org/jira/browse/SOLR-469 SOLR-469] in the SOLR JIRA. 
Please help us by giving your comments, suggestions and/or code contributions 
on this new feature.
