Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Solr Wiki" for change 
notification.

The following page has been changed by ShalinMangar:
http://wiki.apache.org/solr/DataImportHandler

The comment on the change is:
Added example for delta-import

------------------------------------------------------------------------------
  
  This is a relational model of the same schema that SOLR currently ships with. 
We will use this as an example to build a data-config.xml for 
DataImportHandler. We've created a sample database with this schema in HSQLDB.  
To run it, do the following steps:
  
-  * Download [http://wiki.apache.org/solr-data/attachments/DataImportHandler/attachments/hsqldb-database.zip hsqldb-database.zip] to execute this example. This zip file contains all the hsqldb related files. Extract the downloaded zip file into c:\temp. Also save the following xml as c:\temp\example-data-config.xml
+  * Download [http://wiki.apache.org/solr-data/attachments/DataImportHandler/attachments/example-database.zip example-database.zip] to run this example. The zip file contains all the HSQLDB-related files. Extract it into c:\temp, and save the following XML as c:\temp\example-data-config.xml
  
  {{{
  <dataConfig>
@@ -150, +150 @@

  
  When the delta-import command is executed, it reads the start time stored in ''conf/dataimport.properties'', uses that timestamp to run the delta queries (TODO: Example) and, after completion, updates the timestamp in ''conf/dataimport.properties''.
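The bookkeeping cycle described above can be sketched in Python. This is a minimal, hypothetical illustration, not DataImportHandler's actual code. It makes several assumptions. The properties file is taken to hold a plain ''last_index_time=YYYY-MM-DD HH:MM:SS'' line, and Java properties escapes such as ''\:'' are not handled. An in-memory SQLite database stands in for HSQLDB. The function names ''read_properties'', ''write_properties'' and ''delta_import'' are made up for this example.

```python
import sqlite3
from datetime import datetime

# Hypothetical sketch of the delta-import bookkeeping cycle, not DIH's code.
TS_FORMAT = "%Y-%m-%d %H:%M:%S"  # assumed timestamp format


def read_properties(text):
    """Parse plain key=value lines; Java escapes such as \\: are not handled."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#") and "=" in line:
            key, value = line.split("=", 1)
            props[key.strip()] = value.strip()
    return props


def write_properties(props):
    """Serialize the properties back to key=value lines."""
    return "".join(f"{key}={value}\n" for key, value in sorted(props.items()))


def delta_import(conn, props_text, started_at):
    """Return (ids of changed items, updated properties text)."""
    props = read_properties(props_text)
    last_index_time = props["last_index_time"]
    # Equivalent of: select id from item
    #                where last_modified > '${dataimporter.last_index_time}'
    rows = conn.execute(
        "SELECT id FROM item WHERE last_modified > ? ORDER BY id",
        (last_index_time,),
    ).fetchall()
    changed = [row[0] for row in rows]
    # (Re-indexing each changed row would happen here.)
    # After completion, store this run's start time for the next delta-import.
    props["last_index_time"] = started_at.strftime(TS_FORMAT)
    return changed, write_properties(props)
```

In this sketch the stored value is the time the run ''started'', so a row modified while the import is running is picked up again on the next run. For example, with ''last_index_time=2008-01-02 00:00:00'' stored, only items whose ''last_modified'' is later than that are returned, and the properties text is rewritten with the new start time.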
  
+ === Delta-Import Example ===
+ We will use the same example database as in the full-import example. Note that the database schema has been updated: each table now contains an additional column, ''last_modified'', of timestamp type. If you downloaded the database earlier, you may want to download it again, since it has been updated recently. We use this timestamp column to determine which rows in each table have changed since the last index time.
+ 
+ Take a look at the following data-config.xml
+ 
+ {{{
+ <dataConfig>
+     <document name="products">
+         <entity name="item" pk="ID" query="select * from item"
+                 deltaQuery="select id from item where last_modified > '${dataimporter.last_index_time}'">
+             <field column="ID" name="id" />
+             <field column="NAME" name="name" />
+             <field column="NAME" name="nameSort" />
+             <field column="NAME" name="alphaNameSort" />
+             <field column="MANU" name="manu" />
+             <field column="WEIGHT" name="weight" />
+             <field column="PRICE" name="price" />
+             <field column="POPULARITY" name="popularity" />
+             <field column="INSTOCK" name="inStock" />
+             <field column="INCLUDES" name="includes" />
+ 
+             <entity name="feature" pk="ITEM_ID"
+                     query="select DESCRIPTION from FEATURE where ITEM_ID='${item.ID}'">
+                 <field name="features" column="DESCRIPTION" />
+             </entity>
+             <entity name="item_category" pk="ITEM_ID, CATEGORY_ID"
+                     query="select CATEGORY_ID from item_category where ITEM_ID='${item.ID}'">
+                 <entity name="category" pk="ID"
+                         query="select DESCRIPTION from category where ID = '${item_category.CATEGORY_ID}'">
+                     <field column="description" name="cat" />
+                 </entity>
+             </entity>
+         </entity>
+     </document>
+ </dataConfig>
+ }}}
+ 
+ Pay attention to the ''deltaQuery'' attribute, which holds an SQL statement that detects changes in the ''item'' table. Note the variable {{{${dataimporter.last_index_time}}}}.
+ The DataImportHandler exposes a variable called ''last_index_time'', a timestamp denoting the last time a ''full-import'' '''or''' a ''delta-import'' was run. You can use this variable anywhere in the SQL you write in data-config.xml; it is replaced by its value during processing.
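The substitution described above can be pictured as plain text templating. The following Python sketch is a hypothetical illustration, not DataImportHandler's variable resolver. The function ''resolve'' and its flat dictionary of variable names are assumptions made for this example. It replaces every {{{${name}}}} placeholder with the corresponding value, including dotted names such as ''item.ID'' used by the nested entities.

```python
import re

# Matches ${name}; the name may contain dots, e.g. ${item.ID}.
PLACEHOLDER = re.compile(r"\$\{([^}]+)\}")


def resolve(template, variables):
    """Replace each ${name} in template with str(variables[name])."""

    def lookup(match):
        name = match.group(1)
        if name not in variables:
            # This sketch fails loudly on unknown names.
            raise KeyError(f"unresolved variable: {name}")
        return str(variables[name])

    return PLACEHOLDER.sub(lookup, template)
```

Resolving the item's ''deltaQuery'' with ''dataimporter.last_index_time'' set to ''2008-01-02 00:00:00'' yields {{{... where last_modified > '2008-01-02 00:00:00'}}}. The sketch pastes the value in as raw text. For that reason, the single quotes written around the variable in data-config.xml are what turn it into an SQL string literal. The sketch also performs no escaping of the substituted value.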
+ 
+ '''Note'''
+  - The deltaQuery in the above example detects changes only in ''item'', not in the other tables. You can detect changes to all the child tables in a single SQL query, as shown below. Figuring out its details is left as an exercise for the reader :)
+ {{{
+       deltaQuery="select id from item where id in
+                               (select item_id as id from feature where last_modified > '${dataimporter.last_index_time}')
+                               or id in
+                               (select item_id as id from item_category where category_id in
+                                   (select id as category_id from category where last_modified > '${dataimporter.last_index_time}')
+                               or last_modified > '${dataimporter.last_index_time}')
+                               or last_modified > '${dataimporter.last_index_time}'"
+ }}}
+  - Writing a large deltaQuery like the one above is tedious, so there is also an easier mechanism for achieving the same goal. TODO: Give an example here.
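One way to check what the combined deltaQuery detects is to run it against a toy database. The Python sketch below makes the following simplifications:
 * An in-memory SQLite database stands in for HSQLDB.
 * A cut-down, hypothetical schema holds only the columns the query touches.
 * A bound parameter '':t'' replaces {{{${dataimporter.last_index_time}}}}.
The ''item_category'' subquery is filtered on ''category_id'', the column that references ''category'', so that an item whose category changed is picked up. The table contents and the helper names ''build_demo_db'' and ''changed_item_ids'' are made up for the example.

```python
import sqlite3

# Hypothetical, cut-down schema: only the columns the deltaQuery touches,
# with last_modified stored as 'YYYY-MM-DD HH:MM:SS' text.
SCHEMA = """
CREATE TABLE item          (id INTEGER PRIMARY KEY, last_modified TEXT);
CREATE TABLE feature       (item_id INTEGER, description TEXT, last_modified TEXT);
CREATE TABLE category      (id INTEGER PRIMARY KEY, description TEXT, last_modified TEXT);
CREATE TABLE item_category (item_id INTEGER, category_id INTEGER, last_modified TEXT);
"""

# The combined deltaQuery; every ${dataimporter.last_index_time} becomes :t.
DELTA_QUERY = """
SELECT id FROM item WHERE id IN
    (SELECT item_id FROM feature WHERE last_modified > :t)
  OR id IN
    (SELECT item_id FROM item_category WHERE category_id IN
        (SELECT id FROM category WHERE last_modified > :t)
     OR last_modified > :t)
  OR last_modified > :t
ORDER BY id
"""


def changed_item_ids(conn, last_index_time):
    """Return the ids of items whose own row or any child row changed."""
    return [row[0] for row in conn.execute(DELTA_QUERY, {"t": last_index_time})]


def build_demo_db():
    """Four items; three of them changed in different tables after 2008-01-15."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    old, new = "2008-01-01 00:00:00", "2008-02-01 00:00:00"
    # Item 4: the item row itself changed.
    conn.executemany("INSERT INTO item VALUES (?, ?)",
                     [(1, old), (2, old), (3, old), (4, new)])
    # Item 1: one of its features changed.
    conn.execute("INSERT INTO feature VALUES (1, 'fast', ?)", (new,))
    # Item 2: its category (id 10) changed; item 3's category (id 20) did not.
    conn.executemany("INSERT INTO category VALUES (?, ?, ?)",
                     [(10, "electronics", new), (20, "books", old)])
    conn.executemany("INSERT INTO item_category VALUES (?, ?, ?)",
                     [(2, 10, old), (3, 20, old)])
    return conn
```

With a last index time of ''2008-01-15 00:00:00'', the query returns items 1, 2 and 4. Filtering that subquery on ''item_id'' instead would compare item ids with category ids and miss item 2.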
+ 
  = Where to find it? =
  DataImportHandler is not in SOLR right now. It exists as a patch in 
[http://issues.apache.org/jira/browse/SOLR-469 SOLR-469] in the SOLR JIRA. 
Please help us by giving your comments, suggestions and/or code contributions 
on this new feature.
  
