[Solr Wiki] Update of "DataImportHandler" by NoblePaul

Apache Wiki Tue, 24 Jun 2008 21:19:39 -0700

The following page has been changed by NoblePaul:
http://wiki.apache.org/solr/DataImportHandler

------------------------------------------------------------------------------
  Pay attention to the ''deltaQuery'' attribute which has an SQL statement 
capable of detecting changes in the ''item'' table. Note the variable 
{{{${dataimporter.last_index_time}}}}
  The DataImportHandler exposes a variable called ''last_index_time'' which is 
a timestamp value denoting the last time ''full-import'' ''''or'''' 
''delta-import'' was run. You can use this variable anywhere in the SQL you 
write in data-config.xml and it will be replaced by the value during processing.
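
The substitution described above can be pictured with a small sketch. This is not the DataImportHandler's own code; the class name, the `last_modified` column and the timestamp format are assumptions made only for illustration.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration of ${...} placeholder replacement in a SQL string.
class VariableSubstitutionSketch {
    // Matches ${some.variable.name}
    private static final Pattern VARIABLE = Pattern.compile("\\$\\{([^}]+)\\}");

    static String resolve(String template, Map<String, String> variables) {
        Matcher m = VARIABLE.matcher(template);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String value = variables.get(m.group(1));
            // Unknown placeholders are left untouched in this sketch.
            String replacement = (value != null) ? value : m.group(0);
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // Assumed query shape; 'last_modified' is a hypothetical column name.
        String deltaQuery =
            "select id from item where last_modified > '${dataimporter.last_index_time}'";
        // Assumed timestamp format, chosen only for the demo.
        Map<String, String> vars =
            Map.of("dataimporter.last_index_time", "2008-06-24 21:19:39");
        System.out.println(resolve(deltaQuery, vars));
    }
}
```

The point of the sketch is only that the placeholder is replaced textually before the SQL is sent to the database, so it can appear anywhere in the query.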
  
- '''Note'''
+ /!\ Note 
   * The deltaQuery in the above example only detects changes in ''item'' but 
not in other tables. You can detect the changes to all child tables in one SQL 
query as specified below. Figuring out its details is an exercise for the user 
:)
  {{{
        deltaQuery="select id from item where id in
@@ -374, +374 @@

  
  You can use this feature for indexing from REST APIs such as RSS/Atom feeds, 
XML data feeds, other Solr servers or even well-formed XHTML documents. Our 
XPath support has its limitations (no wildcards, full paths only, etc.), but we 
have tried to make sure that common use cases are covered. Because it is based 
on a streaming parser, it is extremely fast and consumes a constant amount of 
memory even for large XML files. It does not support namespaces, but it can 
handle XML that uses namespaces: when you provide the xpath, just drop the 
namespace prefix and give the rest (e.g. if the tag is `'<dc:subject>'` the 
mapping should just contain `'subject'`). Easy, isn't it? And you didn't need 
to write one line of code! Enjoy :)
  
- note: Unlike with database , it is not possible to omit the field 
declarations if you are using X!PathEntityProcessor. It relies on the xpaths 
declared in the fields to identify what to extract from the xml. 
+ /!\ Note: Unlike with a database, it is not possible to omit the field 
declarations if you are using X!PathEntityProcessor. It relies on the xpaths 
declared in the fields to identify what to extract from the XML. 
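
To make the "full path only, namespace prefix dropped, streaming" behaviour concrete, here is a minimal sketch built on the JDK's StAX parser. It is not the X!PathEntityProcessor implementation; the class name, the method and the sample feed are hypothetical, and it extracts a single row only.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical illustration: match full element paths while streaming.
class StreamingXPathSketch {
    static Map<String, String> extract(String xml, Map<String, String> xpathToColumn) {
        Map<String, String> row = new LinkedHashMap<>();
        Deque<String> path = new ArrayDeque<>();   // current element path
        StringBuilder text = new StringBuilder();  // text of the innermost element
        try {
            XMLStreamReader reader =
                XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
            while (reader.hasNext()) {
                switch (reader.next()) {
                    case XMLStreamConstants.START_ELEMENT:
                        // getLocalName() returns "subject" for <dc:subject>
                        path.addLast(reader.getLocalName());
                        text.setLength(0);
                        break;
                    case XMLStreamConstants.CHARACTERS:
                    case XMLStreamConstants.CDATA:
                        text.append(reader.getText());
                        break;
                    case XMLStreamConstants.END_ELEMENT:
                        String column = xpathToColumn.get("/" + String.join("/", path));
                        if (column != null) {
                            row.put(column, text.toString().trim());
                        }
                        path.removeLast();
                        text.setLength(0);
                        break;
                    default:
                        break;
                }
            }
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("Malformed XML", e);
        }
        return row;
    }

    public static void main(String[] args) {
        String xml = "<rss xmlns:dc=\"http://purl.org/dc/elements/1.1/\"><channel><item>"
            + "<title>Hello</title><dc:subject>Search</dc:subject>"
            + "</item></channel></rss>";
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("/rss/channel/item/title", "title");
        fields.put("/rss/channel/item/subject", "subject"); // prefix 'dc:' dropped
        System.out.println(extract(xml, fields));
    }
}
```

Only the current path and the text of the innermost element are held in memory, which is why a streaming approach of this kind keeps memory use flat for large inputs. It also shows why the field declarations cannot be omitted: without the declared paths there is nothing to match against.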
  = Extending the tool with APIs =
  The examples we explored are, admittedly, trivial. It is not possible to meet 
all user needs with an XML configuration alone, so we expose a few interfaces 
that users can implement to extend the functionality.
  
@@ -383, +383 @@

  {{{
  <entity name="foo" transformer="com.foo.Foo" ... />
  }}}
+ /!\ Note -- The transformer value has to be a fully qualified class name. If 
the class package is `'org.apache.solr.handler.dataimport'` the package name 
can be omitted. The solr.<classname> form also works if the class belongs to 
one of the 'solr' packages. This rule applies to all the pluggable classes like 
!DataSource, !EntityProcessor and Evaluator.
  
  The class 'Foo' must implement the interface 
`org.apache.solr.handler.dataimport.Transformer`. The interface has only one 
method.
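
The class-name rule from the note (fully qualified name, or a bare name that falls back to the default package) can be sketched roughly as follows. This is an illustration, not Solr's loader: the class and method names are hypothetical, the default package is passed in so the demo runs without Solr on the classpath, and the `solr.<classname>` alias is not covered.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of resolving a configured class name.
class ClassNameResolutionSketch {
    static final String DEFAULT_PACKAGE = "org.apache.solr.handler.dataimport";

    // Candidate names to try, in order.
    static List<String> candidates(String name, String defaultPackage) {
        List<String> result = new ArrayList<>();
        result.add(name); // as configured, e.g. com.foo.Foo
        if (!name.contains(".")) {
            // Bare name: also try the default package.
            result.add(defaultPackage + "." + name);
        }
        return result;
    }

    static Class<?> load(String name, String defaultPackage) {
        for (String candidate : candidates(name, defaultPackage)) {
            try {
                return Class.forName(candidate);
            } catch (ClassNotFoundException ignored) {
                // try the next candidate
            }
        }
        throw new IllegalArgumentException("Cannot resolve class: " + name);
    }

    public static void main(String[] args) {
        // java.util stands in for the default package so the demo is self-contained.
        System.out.println(load("ArrayList", "java.util"));
        System.out.println(candidates("Foo", DEFAULT_PACKAGE));
    }
}
```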
  
@@ -522, +523 @@

  {{{
  <dataConfig>
      <dataSource type="FileDataSource" />
-         <document>
+     <document>
-             <entity name="f" processor="FileListEntityProcessor" 
fileName=".*xml" newerThan="'NOW-3DAYS'" recursive="true" rootEntity="false" 
dataSource="null">
+         <entity name="f" processor="FileListEntityProcessor" fileName=".*xml" 
newerThan="'NOW-3DAYS'" recursive="true" rootEntity="false" dataSource="null">
-                 <entity name="x" processor="XPathEntityProcessor" 
forEach="/the/record/xpath" url="${f.fileAbsolutePath}">
+             <entity name="x" processor="XPathEntityProcessor" 
forEach="/the/record/xpath" url="${f.fileAbsolutePath}">
-                     <field column="full_name" xpath="/field/xpath"/> 
+                 <field column="full_name" xpath="/field/xpath"/> 
-                 </entity>
              </entity>
+         </entity>
-         </document>
+     </document>
  </dataConfig>
  }}}
  Do not miss the `rootEntity` attribute. The implicit fields generated by the 
processor are `fileAbsolutePath,fileSize,fileLastModified,fileName`.
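
As a rough picture of what the outer entity in this configuration selects, the sketch below lists files whose names match a regular expression and which were modified after a cutoff, optionally descending into subdirectories. It is not the !FileListEntityProcessor code: the class, the `baseDir` parameter and the use of `find()` for the name match are assumptions for illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.time.Duration;
import java.time.Instant;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical illustration of fileName / newerThan / recursive filtering.
class FileListSketch {
    static List<Path> listFiles(Path baseDir, String fileNameRegex,
                                Instant newerThan, boolean recursive) {
        Pattern pattern = Pattern.compile(fileNameRegex);
        int depth = recursive ? Integer.MAX_VALUE : 1;
        try (Stream<Path> paths = Files.walk(baseDir, depth)) {
            return paths
                .filter(Files::isRegularFile)
                // Assumption: the regex is applied to the file name only.
                .filter(p -> pattern.matcher(p.getFileName().toString()).find())
                .filter(p -> lastModified(p).isAfter(newerThan))
                .map(Path::toAbsolutePath) // comparable to fileAbsolutePath
                .sorted()
                .collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    private static Instant lastModified(Path p) {
        try {
            return Files.getLastModifiedTime(p).toInstant();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path baseDir = Paths.get(args.length > 0 ? args[0] : ".");
        // Rough equivalent of fileName=".*xml" newerThan="'NOW-3DAYS'" recursive="true"
        Instant cutoff = Instant.now().minus(Duration.ofDays(3));
        listFiles(baseDir, ".*xml", cutoff, true).forEach(System.out::println);
    }
}
```

Each path returned here plays the role of one row of entity `f`; in the configuration above, the inner entity `x` then reads that file through `${f.fileAbsolutePath}`.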
