Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Solr Wiki" for change 
notification.

The following page has been changed by NoblePaul:
http://wiki.apache.org/solr/DataImportHandler

------------------------------------------------------------------------------
  {{{
  <dataConfig>
      <dataSource driver="org.hsqldb.jdbcDriver" 
url="jdbc:hsqldb:/temp/example/ex" user="sa" />
-     <document name="products">
+     <document>
          <entity name="item" pk="ID" query="select * from item">               
     
              <entity name="feature" query="select description as features from 
feature where item_id='${item.ID}'"/>            
              <entity name="item_category" query="select category_id from 
item_category where item_id='${item.ID}'">
@@ -249, +249 @@

   * Writing a huge deltaQuery like the above one is not a very enjoyable task, 
so we have an alternate mechanism of achieving this goal.
  {{{
  <dataConfig>
-     <document name="products">
+     <dataSource driver="org.hsqldb.jdbcDriver" 
url="jdbc:hsqldb:/temp/example/ex" user="sa" />
+     <document>
            <entity name="item" pk="ID" query="select * from item"
                deltaQuery="select id from item where last_modified > 
'${dataimporter.last_index_time}'">
                  <entity name="feature" pk="ITEM_ID" 
@@ -266, +267 @@

                        query="select DESCRIPTION as cat from category where ID 
= '${item_category.CATEGORY_ID}'"
                        deltaQuery="select ID from category where last_modified 
> '${dataimporter.last_index_time}'"
                        parentDeltaQuery="select ITEM_ID, CATEGORY_ID from 
item_category where CATEGORY_ID=${category.ID}"/>
- 
            </entity>
          </entity>
      </document>
@@ -362, +362 @@

  
  What about this ''transformer=!DateFormatTransformer'' attribute in the 
entity? See the [#DateFormatTransformer DateFormatTransformer] section for 
details.
  
- You can use this feature for indexing from REST API's such as rss/atom feeds, 
XML data feeds , other SOLR servers or even well formed xhtml documents . Our 
XPath support has its limitations but we have tried to make sure that common 
use-cases are covered and since it's based on a streaming parser, it is 
extremely fast and consumes constant amount of memory even for large XMLs. It 
does not support namespaces , but it can handle xmls with namespaces . When you 
provide the xpath, just drop the namespace and give the rest (eg if the tag is 
`'<dc:subject>'` the mapping should just contain `'subject'`).Easy, isn't it? 
And you didn't need to write one line of code! Enjoy :)
+ You can use this feature for indexing from REST APIs such as RSS/Atom feeds, 
XML data feeds, other Solr servers or even well-formed XHTML documents. Our 
XPath support has its limitations (no wildcards, only full paths etc.) but we 
have tried to make sure that common use-cases are covered, and since it is based 
on a streaming parser, it is extremely fast and consumes a constant amount of 
memory even for large XML files. It does not support namespaces, but it can handle 
XML with namespaces. When you provide the xpath, just drop the namespace and 
give the rest (e.g. if the tag is `'<dc:subject>'` the mapping should just 
contain `'subject'`). Easy, isn't it? And you didn't need to write one line of 
code! Enjoy :)
  = Extending the tool with APIs =
  The examples we explored are, admittedly, trivial. It is not possible to have 
all user needs met by an XML configuration alone, so we expose a few interfaces 
which can be implemented by the user to enhance the functionality.
  
@@ -478, +478 @@

   * '''`formatStyle`''' : The format used for parsing this field. The value of 
the attribute must be one of (number|percent|integer|currency). This uses the 
semantics of Java 
[http://java.sun.com/j2se/1.4.2/docs/api/java/text/NumberFormat.html 
NumberFormat].
   * '''`sourceColName`''' : The column on which the !NumberFormat is to be 
applied. If this is absent, the source and target columns are the same.
  
+ === TemplateTransformer ===
+ Uses the powerful template engine of !DataImportHandler to construct/modify a 
field value.
+ eg:
+ {{{
+ <entity name="e" transformer="TemplateTransformer" ..>
+ <field column="price" template="hello${e.name},${eparent.surname}" />
+ ...
+ </entity>
+ }}}
+ The rules for the template are the same as for the templates in 'query', 'url' etc. 
It helps to concatenate multiple values or to add extra characters to a field for 
injection. It applies only to fields which have a 'template' attribute.
+ ==== Attributes ====
+  * '''`template`''' : The template string. In the above example there are two 
placeholders, '${e.name}' and '${eparent.surname}'. Both values must be 
present when the template is evaluated; otherwise it is not evaluated. 
+ 
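The placeholder rule for the `template` attribute can be illustrated with a small standalone sketch. This is not the !DataImportHandler implementation: the class `TemplateSketch`, its `resolve` method and the flat map keyed by names such as `e.name` are all assumptions made for illustration. It only shows the documented behaviour that a template is left unevaluated when any placeholder has no value.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-in for illustration; not the DataImportHandler code.
class TemplateSketch {
    // Matches ${...} and captures the name between the braces.
    private static final Pattern PLACEHOLDER = Pattern.compile("\\$\\{([^}]+)\\}");

    /**
     * Replaces every ${name} in the template with its value from 'values'.
     * Returns null when any placeholder has no value, mirroring the rule that
     * the template is not evaluated unless all values are present.
     */
    static String resolve(String template, Map<String, Object> values) {
        Matcher m = PLACEHOLDER.matcher(template);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (m.find()) {
            Object value = values.get(m.group(1));
            if (value == null) {
                return null; // a value is missing: do not evaluate
            }
            out.append(template, last, m.start()).append(value);
            last = m.end();
        }
        return out.append(template.substring(last)).toString();
    }

    public static void main(String[] args) {
        // Assumed sample values for the two placeholders in the example above.
        Map<String, Object> values = new HashMap<>();
        values.put("e.name", "Bob");
        values.put("eparent.surname", "Smith");
        System.out.println(resolve("hello${e.name},${eparent.surname}", values));
    }
}
```

With both values present the sketch concatenates them into the literal text of the template; removing either entry from the map makes `resolve` return null, which stands in for "the field is not produced".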
  == EntityProcessor ==
+ Each entity is handled by a default entity processor called 
!SqlEntityProcessor. This works well for systems which use an RDBMS as a 
datasource. For other kinds of datasources, such as REST or non-SQL datasources, you 
can choose to implement the interface 
`org.apache.solr.handler.dataimport.EntityProcessor`. It is designed to 
stream rows one by one from an entity. The simplest way to implement your own 
!EntityProcessor is to extend !EntityProcessorBase and override the 
`public Map<String,Object> nextRow()` method.
- Each entity is handled by a default Entity processor called 
!SqlEntityProcessor. This works well for systems which use RDBMS as a 
datasource. For other kind of datasources like  REST or Non Sql datasources you 
can choose to implement this interface 
`org.apache.solr.handler.dataimport.Entityprocessor`
- {{{
- /** 
-  * An instance of entity processor serves an entity. It is reused for the same
-  * entity another time. Dees not have to be thread safe.
-  */
- public interface EntityProcessor {
  
-     /**  This method is called when it starts processing an entity . When it 
comes back
-      *  to the entity it is called again. So reset anything at that point
-      * @param context       The current context
-      */
-     void init(Context context);
  
-     /**
-      * This method helps streaming the data for each row .
-      * The implementation would fetch as many rows as needed and gives one 
'row'
-      * at a time.
-      * Only this method is used during a full import
-      *
-      * @return A 'row' . The 'key' for the map is the column name and the 
'value' is the value
-      *         of that column. If there are no more rows to be returned, 
return 'null'
-      */
-     public Map<String, Object> nextRow();
- 
- 
-     /**This is used for delta-import.
-      * It gives the pks of the changed rows in this entity
-      * @return the pk vs value of all changed rows
-      */
-     public Map<String, Object> nextModifiedRowKey();
- 
-     /** This is used during delta-import.
-      * It gives the pks of the rows that are deleted from this entity.
-      * If this entity is the root entity ,solr document is deleted.
-      * If this is a sub-entity , the solr document is considered as 'changed' 
and
-      * will be recreated
-      * @return the pk vs value of all changed rows
-      */
-     public Map<String, Object> nextDeletedRowKey();
- 
-     /**This is used during delta-import.
-      * This gives the primary keys and their values of all the rows changed in
-      * a parent entity due to changes in this entity.
-      * @return the pk vs value of all changed rows   in the parent entity
-      */
-     public Map<String, Object> nextModifiedParentRowKey();
- }
- 
- }}}
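The row-streaming contract of `nextRow()` can be sketched without any Solr dependency. The class below is a hypothetical stand-in, not part of !DataImportHandler: a real processor would extend !EntityProcessorBase and override `nextRow()` there. The class name `LineRowProcessorSketch`, the comma-separated input and the column names `id` and `name` are assumptions for illustration. It returns one row per call as a map of column name to value, and null when no rows remain.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in: a real processor would extend EntityProcessorBase
// from org.apache.solr.handler.dataimport and override nextRow() there.
class LineRowProcessorSketch {
    private final Iterator<String> lines;

    LineRowProcessorSketch(List<String> lines) {
        this.lines = lines.iterator();
    }

    /** Returns one row per call (column name -> value), or null when no rows remain. */
    public Map<String, Object> nextRow() {
        if (!lines.hasNext()) {
            return null;
        }
        // Assumed input format: "id,name" on each line.
        String[] parts = lines.next().split(",", 2);
        Map<String, Object> row = new HashMap<>();
        row.put("id", parts[0]);
        row.put("name", parts.length > 1 ? parts[1] : "");
        return row;
    }

    public static void main(String[] args) {
        LineRowProcessorSketch p =
                new LineRowProcessorSketch(Arrays.asList("1,apple", "2,pear"));
        Map<String, Object> row;
        // The caller keeps asking for rows until null signals the end.
        while ((row = p.nextRow()) != null) {
            System.out.println(row.get("id") + " -> " + row.get("name"));
        }
    }
}
```

Because rows are pulled one at a time, the source never has to be held in memory as a whole, which is the point of the streaming design.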
  == DataSource ==
  A class can implement `org.apache.solr.handler.dataimport.DataSource` 
  {{{
@@ -538, +504 @@

  
      /** Gets records for the given query. This is designed to stream records 
using an iterator
       * @param query The query string. It can be SQL for an RDBMS.
-      * @return an iterator
+      * @return an Object which the EntityProcessor understands. For instance, 
JdbcDataSource returns an Iterator<Map<String,Object>>, and HttpDataSource and 
FileDataSource return a java.io.Reader
       */
      public T getData(String query);
  
@@ -567, +533 @@

      </lst>
    </requestHandler>
  }}}
- 
+ [[Anchor(arch)]]
  = Architecture =
  The following diagram describes the logical flow for a sample configuration.
  
