[Solr Wiki] Update of "DataImportHandler" by ShalinMangar

Apache Wiki Fri, 08 Aug 2008 10:53:38 -0700

Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Solr Wiki" for change 
notification.


The following page has been changed by ShalinMangar:
http://wiki.apache.org/solr/DataImportHandler

The comment on the change is:
Changed interfaces to abstract class to sync with code changes

------------------------------------------------------------------------------
  
  /!\ Note : Unlike with a database, it is not possible to omit the field 
declarations if you are using X!PathEntityProcessor. It relies on the xpaths 
declared in the fields to identify what to extract from the xml. 
  = Extending the tool with APIs =
- The examples we explored are admittedly trivial. It is not possible to have 
all user needs met by an xml configuration alone. So we expose a few interfaces 
which can be implemented by the user to enhance the functionality.
+ The examples we explored are admittedly trivial. It is not possible to have 
all user needs met by an xml configuration alone. So we expose a few abstract 
classes which can be extended by the user to enhance the functionality.
  
  [[Anchor(transformer)]]
  == Transformer ==
@@ -386, +386 @@

  }}}
  /!\ Note -- The transformer value has to be a fully qualified classname. If the 
class package is `'org.apache.solr.handler.dataimport'` the package name can be 
omitted. The solr.<classname> form also works if the class belongs to one of the 
'solr' packages. This rule applies to all the pluggable classes like 
!DataSource, !EntityProcessor and Evaluator.
  
- The class 'Foo' must implement the interface 
`org.apache.solr.handler.dataimport.Transformer`. The interface has only one 
method.
+ The class 'Foo' must extend the abstract class 
`org.apache.solr.handler.dataimport.Transformer`. The class has only one abstract 
method.
  
  {{{
- public interface Transformer {
+ public abstract class Transformer {
+   /**
-     /**The input is a row of data and the output has to be a new row.
+    * The input is a row of data and the output has to be a new row.
+    *
-      * @param context The current context
+    * @param context The current context
-      * @param aRow A row of data
+    * @param row     A row of data
-      * @return The changed data. It must be a Map<String, Object> if it returns only one row
+    * @return The changed data. It must be a Map<String, Object> if it returns
-      * or if there are multiple rows to be returned it must be a List<Map<String, Object>>
-      *
+    *         only one row or if there are multiple rows to be returned it must
+    *         be a List<Map<String, Object>>
-      */
+    */
-     public Object transformRow(Map<String, Object> row, Context context);
+   public abstract Object transformRow(Map<String, Object> row, Context context);
  }
  }}}
  
  
- The Context is the interface that provides the contextual information that 
may be necessary to process the data. 
+ The Context is the abstract class that provides the contextual information 
that may be necessary to process the data. 
  
- Alternately the class `Foo` may choose NOT TO implement this interface and 
just write a method with this signature
+ Alternatively, the class `Foo` may choose NOT to extend this abstract class 
and just write a method with this signature
  {{{
  public Object transformRow(Map<String, Object> row)
  }}}
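A rough sketch of how a plain class with this signature could be called without extending anything. This is only an illustration using `java.lang.reflect`; the class `ReflectiveTransformerSketch`, its nested `Foo`, and the `invoke` helper are hypothetical names and are not taken from the DataImportHandler source.

```java
import java.lang.reflect.Method;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-alone sketch; NOT the actual DataImportHandler code.
class ReflectiveTransformerSketch {

    // A plain class: it extends nothing and only declares transformRow(Map).
    public static class Foo {
        public Object transformRow(Map<String, Object> row) {
            row.put("seen", Boolean.TRUE); // mark the row so the effect is visible
            return row;
        }
    }

    // Instantiates the named class and calls its transformRow(Map) method reflectively.
    static Object invoke(String className, Map<String, Object> row) {
        try {
            Class<?> clazz = Class.forName(className);
            Object instance = clazz.getDeclaredConstructor().newInstance();
            Method method = clazz.getMethod("transformRow", Map.class);
            return method.invoke(instance, row);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot call transformRow on " + className, e);
        }
    }

    public static void main(String[] args) {
        Map<String, Object> row = new HashMap<>();
        row.put("id", 1);
        System.out.println(invoke(Foo.class.getName(), row));
    }
}
```

The class is looked up by name, which is why the transformer attribute has to resolve to a loadable classname.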
@@ -464, +466 @@

   * Write as many transformer functions as you want to use. Each such function 
must accept a ''row'' variable corresponding to ''Map<String, Object>'' and 
return a row (after applying transformations)
   * Make an entity use a function by specifying 
''transformer="script:<function-name>"'' in the ''entity'' node.
   * In the above data-config, the javascript function ''f1'' will be executed 
once for each row returned by entity e.
-  * The semantics of execution are the same as those of a java transformer. The 
method can have two arguments as in 'transformRow(Map<String,Object> , Context 
context)' in the interface 'Transformer'. As it is javascript the second 
argument may be omitted and it still works.
+  * The semantics of execution are the same as those of a java transformer. The 
method can have two arguments as in 'transformRow(Map<String,Object> , Context 
context)' in the abstract class 'Transformer'. As it is javascript the second 
argument may be omitted and it still works.
  [[Anchor(DateFormatTransformer)]]
  === DateFormatTransformer ===
  There is a built-in transformer called the !DateFormatTransformer which is 
useful for parsing date/time strings into java.util.Date instances.
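To make "parsing date/time strings into java.util.Date instances" concrete, here is a minimal sketch using `java.text.SimpleDateFormat`. It is only an illustration of the conversion: the class `DateParseSketch`, the `parse` helper and the pattern are assumptions, and this excerpt does not say how the transformer does it internally.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;

// Hypothetical helper showing the string-to-Date conversion; not the transformer itself.
class DateParseSketch {

    // Parses value with the given SimpleDateFormat pattern, rejecting malformed input.
    static Date parse(String value, String pattern) {
        SimpleDateFormat format = new SimpleDateFormat(pattern);
        format.setLenient(false); // "2008-13-40" should fail instead of rolling over
        try {
            return format.parse(value);
        } catch (ParseException e) {
            throw new IllegalArgumentException("Cannot parse '" + value + "' as " + pattern, e);
        }
    }

    public static void main(String[] args) {
        Date date = parse("2008-08-08 10:53:38", "yyyy-MM-dd HH:mm:ss");
        System.out.println(new SimpleDateFormat("yyyy-MM-dd").format(date));
    }
}
```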
@@ -522, +524 @@

        }
  }
  }}}
- No need to extend any interface. Just write any class which has a method 
named transformRow with the above signature and DataImportHandler will 
instantiate it and call the transformRow method using reflection. You will 
specify it in your data-config.xml as follows:
+ No need to extend any class. Just write any class which has a method named 
transformRow with the above signature and DataImportHandler will instantiate it 
and call the transformRow method using reflection. You will specify it in your 
data-config.xml as follows:
  {{{
  <entity name="artist" query="..." transformer="foo.TrimTransformer">
        <field column="artistName" />
@@ -536, +538 @@

        <field column="artistName" trim="true" />
  </entity>
  }}}
- Now you'll need to extend the [#transformer Transformer] interface and use 
the API methods in Context to get the list of fields in the entity and get 
attributes of the fields to detect if the flag is set.
+ Now you'll need to extend the [#transformer Transformer] abstract class and 
use the API methods in Context to get the list of fields in the entity and get 
attributes of the fields to detect if the flag is set.
  {{{
  package foo;
- public class TrimTransformer implements Transformer   {
+ public class TrimTransformer extends Transformer      {
  
        public Map<String, Object> transformRow(Map<String, Object> row, Context context) {
                List<Map<String, String>> fields = context.getAllEntityFields();
@@ -563, +565 @@

  
  }
  }}}
- If the field is multi-valued, then the value returned is a List instead of a 
single object and you would need to handle it appropriately. You'll need to add the jar 
for !DataImportHandler to your project as a dependency to use the Transformer 
and Context interfaces.
+ If the field is multi-valued, then the value returned is a List instead of a 
single object and you would need to handle it appropriately. You'll need to add the jar 
for !DataImportHandler to your project as a dependency to use the Transformer 
and Context abstract classes.
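One possible way to handle the multi-valued case is sketched below. The helper `trimValue` and the class `TrimValueSketch` are hypothetical; the sketch only shows the single-value versus List branching and does not depend on the !DataImportHandler jar.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper; a transformer could call it for each column value.
class TrimValueSketch {

    // Trims a String, or every String inside a List; other values pass through unchanged.
    static Object trimValue(Object value) {
        if (value instanceof String) {
            return ((String) value).trim();
        }
        if (value instanceof List) {
            List<Object> trimmed = new ArrayList<>();
            for (Object item : (List<?>) value) {
                trimmed.add(trimValue(item));
            }
            return trimmed;
        }
        return value;
    }

    public static void main(String[] args) {
        System.out.println(trimValue("  Miles Davis "));            // prints "Miles Davis"
        System.out.println(trimValue(Arrays.asList(" a ", "b  "))); // prints "[a, b]"
    }
}
```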
  
  [[Anchor(entityprocessor)]]
  == EntityProcessor ==
- Each entity is handled by a default entity processor called 
!SqlEntityProcessor. This works well for systems which use an RDBMS as a 
datasource. For other kinds of datasources, such as REST or non-SQL datasources, you 
can choose to implement this interface 
`org.apache.solr.handler.dataimport.EntityProcessor`. This is designed to 
stream rows one by one from an entity. The simplest way to implement your own 
!EntityProcessor is to just extend !EntityProcessorBase and override the 
`public Map<String,Object> nextRow()` method.
+ Each entity is handled by a default entity processor called 
!SqlEntityProcessor. This works well for systems which use an RDBMS as a 
datasource. For other kinds of datasources, such as REST or non-SQL datasources, you 
can choose to extend this abstract class 
`org.apache.solr.handler.dataimport.EntityProcessor`. This is designed to 
stream rows one by one from an entity. The simplest way to implement your own 
!EntityProcessor is to just extend !EntityProcessorBase and override the 
`public Map<String,Object> nextRow()` method.
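The row-by-row streaming idea can be sketched without the Solr classes as follows. `RowProcessor` is a stand-in written for this example, not the real !EntityProcessorBase, and returning `null` to signal the end of the rows is an assumption made for the sketch.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

class RowStreamSketch {

    // Stand-in for the idea behind EntityProcessorBase; NOT the real Solr class.
    abstract static class RowProcessor {
        // Returns the next row, or null when no rows are left (assumed convention).
        public abstract Map<String, Object> nextRow();
    }

    // Streams one row per name from an in-memory list.
    static class NameProcessor extends RowProcessor {
        private final Iterator<String> names;

        NameProcessor(List<String> names) {
            this.names = names.iterator();
        }

        @Override
        public Map<String, Object> nextRow() {
            if (!names.hasNext()) {
                return null; // end of data
            }
            Map<String, Object> row = new HashMap<>();
            row.put("name", names.next());
            return row;
        }
    }

    public static void main(String[] args) {
        RowProcessor processor = new NameProcessor(Arrays.asList("a", "b"));
        // The caller pulls rows one at a time until the processor runs dry.
        for (Map<String, Object> row = processor.nextRow(); row != null; row = processor.nextRow()) {
            System.out.println(row);
        }
    }
}
```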
  An !EntityProcessor relies on the !DataSource for fetching data. The return type 
of the !DataSource is important for an !EntityProcessor. The built-in ones are:
  === SqlEntityProcessor ===
  This is the default. The !DataSource must be of type 
`DataSource<Iterator<Map<String, Object>>>`. !JdbcDataSource can be used with 
this.
@@ -627, +629 @@

  
  == DataSource ==
  [[Anchor(datasource)]]
- A class can implement `org.apache.solr.handler.dataimport.DataSource` 
+ A class can extend `org.apache.solr.handler.dataimport.DataSource` 
  {{{
- public interface DataSource <T> {
+ public abstract class DataSource<T> {
  
+   /**
+    * Initializes the DataSource with the <code>Context</code> and
+    * initialization properties.
+    * <p/>
+    * This is invoked by the <code>DataImporter</code> after creating an
+    * instance of this class.
+    *
+    * @param context
+    * @param initProps
+    */
-     public void init(Context context, Properties initProps);
+   public abstract void init(Context context, Properties initProps);
  
-     /**Get records for the given query. This is designed to stream records 
-      * @param query The query string. It can be SQL for an RDBMS, a URL for http, etc.
-      * @return an Object which the EntityProcessor understands. For instance, JdbcDataSource returns an Iterator<Map<String,Object>> and HttpDataSource and FileDataSource return a java.io.Reader
+   /**
+    * Get records for the given query. The return type depends on the
+    * implementation.
+    *
+    * @param query The query string. It can be SQL for JdbcDataSource, a URL
+    *              for HttpDataSource, a file location for FileDataSource, or a custom
+    *              format for your own custom DataSource.
+    * @return Depends on the implementation. For instance, JdbcDataSource returns
+    *         an Iterator<Map<String, Object>>
-      */
+    */
-     public T getData(String query);
+   public abstract T getData(String query);
  
+   /**
+    * Cleans up resources of this DataSource after use.
+    */
+   public abstract void close();
  }
  }}}
- and can be used as a !DataSource.It must be configured in the dataSource 
definition
+ and can be used as a !DataSource. It must be configured in the dataSource 
definition
  {{{
  <dataSource type="com.foo.FooDataSource" prop1="hello"/>
  }}}
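A minimal sketch of what such a class might look like. `SimpleDataSource` is a stand-in that mirrors the shape of the abstract class above but leaves out the `Context` parameter so the example compiles on its own; `FooDataSource` and the assumption that the `prop1` attribute arrives in `initProps` are illustrative only.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.Properties;

class FooDataSourceSketch {

    // Stand-in for the abstract class shown above; NOT the real Solr DataSource.
    abstract static class SimpleDataSource<T> {
        public abstract void init(Properties initProps);
        public abstract T getData(String query);
        public abstract void close();
    }

    // Returns a single in-memory row that echoes the query and the configured property.
    static class FooDataSource extends SimpleDataSource<Iterator<Map<String, Object>>> {
        private String greeting = "";
        private boolean closed = false;

        @Override
        public void init(Properties initProps) {
            // Assumption: attributes of the dataSource element are passed in here.
            greeting = initProps.getProperty("prop1", "");
        }

        @Override
        public Iterator<Map<String, Object>> getData(String query) {
            List<Map<String, Object>> rows = new ArrayList<>();
            Map<String, Object> row = new HashMap<>();
            row.put("query", query);
            row.put("greeting", greeting);
            rows.add(row);
            return rows.iterator();
        }

        @Override
        public void close() {
            closed = true; // a real source would release connections or streams here
        }

        boolean isClosed() {
            return closed;
        }
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("prop1", "hello");
        FooDataSource source = new FooDataSource();
        source.init(props);
        System.out.println(source.getData("some query").next());
        source.close();
    }
}
```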
@@ -650, +672 @@

  === JdbcDataSource ===
  This is the default. See the [#jdbcdatasource example]. The signature is as 
follows
  {{{
- public class JdbcDataSource implements DataSource<Iterator<Map<String, Object>>> 
+ public class JdbcDataSource extends DataSource<Iterator<Map<String, Object>>> 
  }}}
  
  It is designed to iterate over rows in the DB one by one. A row is represented as a 
Map.
  === HttpDataSource ===
  This is used by X!PathEntityProcessor. See the documentation [#httpds here]. 
The signature is as follows
  {{{
- public class HttpDataSource implements DataSource<Reader>
+ public class HttpDataSource extends DataSource<Reader>
  }}}
  === FileDataSource ===
  This can be used like an !HttpDataSource. The signature is as follows
  {{{
- public class FileDataSource implements DataSource<Reader>  
+ public class FileDataSource extends DataSource<Reader>  
  }}}
  
  The attributes are:
