Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Solr Wiki" for change 
notification.

The following page has been changed by NoblePaul:
http://wiki.apache.org/solr/DataImportHandler

------------------------------------------------------------------------------
  <dataConfig>
      <dataSource driver="org.hsqldb.jdbcDriver" url="jdbc:hsqldb:/temp/example/ex" user="sa" />
      <document name="products">
-           <entity name="item" pk="ID" query="select * from item"
+           <entity name="item" pk="ID" 
+                 query="select * from item"
+                 deltaImportQuery="select * from item where ID='${dataimporter.delta.id}'"
                deltaQuery="select id from item where last_modified > '${dataimporter.last_index_time}'">
- 
              <entity name="feature" pk="ITEM_ID"
                      query="select description as features from feature where item_id='${item.ID}'">
              </entity>
@@ -253, +254 @@

  Pay attention to the ''deltaQuery'' attribute, which holds an SQL statement capable of detecting changes in the ''item'' table. Note the variable {{{${dataimporter.last_index_time}}}}.
  The DataImportHandler exposes a variable called ''last_index_time'', a timestamp denoting the last time ''full-import'' '''or''' ''delta-import'' was run. You can use this variable anywhere in the SQL you write in data-config.xml, and it will be replaced by its value during processing.
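To make the substitution concrete, here is a minimal Python sketch of what happens to the ''deltaQuery'' text before it is sent to the database. The ''resolve'' helper, the regular expression and the timestamp value are hypothetical illustrations; the DataImportHandler performs this step internally in Java with its own variable resolver.

{{{#!python
import re

# Matches ${name} placeholders such as ${dataimporter.last_index_time}.
PLACEHOLDER = re.compile(r"\$\{([^}]+)\}")


def resolve(template: str, variables: dict) -> str:
    """Replace every ${name} in template with str(variables[name]).

    Hypothetical, simplified illustration: an unknown name raises KeyError.
    """
    return PLACEHOLDER.sub(lambda m: str(variables[m.group(1)]), template)


delta_query = ("select id from item where last_modified > "
               "'${dataimporter.last_index_time}'")

# Made-up timestamp; the real value is whatever DIH recorded on its last run.
print(resolve(delta_query, {"dataimporter.last_index_time": "2008-11-20 10:15:00"}))
}}}

Because the value is pasted into the SQL as text, the quotes around the placeholder in data-config.xml are what make it a string literal in the final query.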
  
+ 
  /!\ Note
   * The deltaQuery in the above example only detects changes in ''item'', not in other tables. You can detect changes to all child tables in one SQL query, as specified below. Figuring out its details is an exercise for the user :)
  {{{
@@ -270, +272 @@

      <dataSource driver="org.hsqldb.jdbcDriver" url="jdbc:hsqldb:/temp/example/ex" user="sa" />
      <document>
            <entity name="item" pk="ID" query="select * from item"
+                 deltaImportQuery="select * from item where ID='${dataimporter.delta.id}'"
                deltaQuery="select id from item where last_modified > '${dataimporter.last_index_time}'">
                  <entity name="feature" pk="ITEM_ID"
                    query="select DESCRIPTION as features from FEATURE where ITEM_ID='${item.ID}'"
@@ -292, +295 @@

  }}}
  
  Here we have three queries specified for each entity except the root (which has only two).
-  * The ''query'' gives us the data needed to populate fields of the Solr document
+  * The ''query'' gives the data needed to populate fields of the Solr document in a full-import
+  * The ''deltaImportQuery'' gives the data needed to populate fields when running a delta-import
   * The ''deltaQuery'' gives the primary keys of the current entity that have changed since the last index time
   * The ''parentDeltaQuery'' uses the changed rows of the current table (fetched with deltaQuery) to give the changed rows in the parent table. This is necessary because whenever a row in the child table changes, we need to re-generate the document that contains that field.
  
@@ -300, +304 @@

   * For each row given by ''query'', the query of the child entity is executed once.
   * For each row given by ''deltaQuery'', the parentDeltaQuery is executed.
   * If any row in the root/child entity changes, we regenerate the complete Solr document that contained that row.
+ 
+ /!\ Note: The ''deltaImportQuery'' is a Solr 1.4 feature. Previously the delta-import query was generated automatically from the ''query'' attribute, which was error-prone.
+ /!\ Note: It is possible to do a delta-import using the full-import command. [http://wiki.apache.org/solr/DataImportHandlerFaq#fullimportdelta  See here]
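The execution rules above can be mimicked outside Solr to see which documents a delta-import would rebuild. The following is a minimal Python sketch, not the DataImportHandler's implementation. It assumes an in-memory SQLite copy of the example tables with hypothetical NAME and last_modified columns, a made-up value standing in for ${dataimporter.last_index_time}, and a hypothetical parentDeltaQuery of the form "select ID from item where ID = <changed ITEM_ID>". Values are bound as SQL parameters here, whereas DIH substitutes ${...} variables into the SQL text.

{{{#!python
import sqlite3

# Hypothetical in-memory copy of the example tables; NAME and last_modified are assumed columns.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table item (ID text primary key, NAME text, last_modified text);
    create table feature (ITEM_ID text, DESCRIPTION text, last_modified text);
    insert into item values ('1', 'unchanged item', '2008-01-01 00:00:00');
    insert into item values ('2', 'edited item', '2008-06-01 00:00:00');
    insert into item values ('3', 'item with a new feature', '2008-01-01 00:00:00');
    insert into feature values ('3', 'new feature', '2008-06-01 00:00:00');
""")

# Assumed value of ${dataimporter.last_index_time}.
LAST_INDEX_TIME = "2008-03-01 00:00:00"


def delta_import(conn, last_index_time):
    """Return the documents a delta-import would rebuild (simplified)."""
    # Root deltaQuery: primary keys of items changed since the last run.
    changed = {pk for (pk,) in conn.execute(
        "select ID from item where last_modified > ?", (last_index_time,))}

    # Child deltaQuery, then parentDeltaQuery for each changed child row:
    # a changed feature means its parent item must be rebuilt.
    child_rows = conn.execute(
        "select ITEM_ID from feature where last_modified > ?",
        (last_index_time,)).fetchall()
    for (item_id,) in child_rows:
        for (parent_pk,) in conn.execute(
                "select ID from item where ID = ?", (item_id,)):
            changed.add(parent_pk)

    # deltaImportQuery once per changed key (the role of ${dataimporter.delta.id}),
    # followed by the child entity's query, so the whole document is rebuilt.
    docs = []
    for pk in sorted(changed):
        doc_id, name = conn.execute(
            "select ID, NAME from item where ID = ?", (pk,)).fetchone()
        features = [desc for (desc,) in conn.execute(
            "select DESCRIPTION from feature where ITEM_ID = ?", (pk,))]
        docs.append({"id": doc_id, "name": name, "features": features})
    return docs


for doc in delta_import(conn, LAST_INDEX_TIME):
    print(doc)
}}}

With this sample data, item 2 is picked up by the root ''deltaQuery'' and item 3 only through its changed ''feature'' row; both documents are rebuilt in full, while item 1 is left alone.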
  
  = Usage with XML/HTTP Datasource =
  DataImportHandler can be used to index data from HTTP-based data sources. This includes indexing from REST/XML APIs as well as from RSS/ATOM feeds.
