Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Solr Wiki" for change 
notification.

The following page has been changed by NoblePaul:
http://wiki.apache.org/solr/DataImportHandler

------------------------------------------------------------------------------
  The examples we explored are, admittedly, trivial. It is not possible to
meet all user needs with an XML configuration alone, so we expose a few
interfaces that the user can implement to extend the functionality.
  
  == Transformer ==
+ Every row fetched from the DB can either be consumed directly or be passed
through a transformer, which can create a totally new set of fields or even
return more than one row of data. The transformer is configured at the entity
level as follows.
+ {{{
+ <entity name="foo" transformer="com.foo.Foo" ... />
+ }}}
  
- ----
+ The class 'Foo' must implement the interface
`org.apache.solr.handler.dataimport.Transformer`. The interface has only one
method.
+ 
+ {{{
+ public interface Transformer {
+     /**
+      * The input is a row of data and the output has to be a new row.
+      *
+      * @param context The current context
+      * @param aRow A row of data
+      * @return The changed data. It must be a Map<String, Object> if only
+      *         one row is returned, or a List<Map<String, Object>> if
+      *         multiple rows are returned.
+      */
+     public Object transformRow(Context context, Map<String, Object> aRow);
+ }
+ }}}
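As an illustration of the contract above, here is a minimal sketch of two implementations: one returns a single changed row as a Map, the other returns several rows as a List. So that it compiles without the Solr jar, `Context` and `Transformer` are redeclared locally as stand-ins (the empty `Context` is an assumption, since its methods are not shown on this page). The class names `TrimTransformer` and `SplitTagsTransformer` and the `tags` column are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Local stand-ins so the sketch compiles on its own. In a real setup these
// come from the DataImportHandler; the empty Context is an assumption.
interface Context {}

interface Transformer {
    Object transformRow(Context context, Map<String, Object> aRow);
}

// Returns one row: a copy of the input with every String value trimmed.
class TrimTransformer implements Transformer {
    @Override
    public Object transformRow(Context context, Map<String, Object> aRow) {
        Map<String, Object> out = new HashMap<>();
        for (Map.Entry<String, Object> e : aRow.entrySet()) {
            Object v = e.getValue();
            out.put(e.getKey(), v instanceof String ? ((String) v).trim() : v);
        }
        return out;
    }
}

// Returns several rows: one per comma-separated value in the hypothetical
// 'tags' column. Rows without a String 'tags' value pass through unchanged.
class SplitTagsTransformer implements Transformer {
    @Override
    public Object transformRow(Context context, Map<String, Object> aRow) {
        Object tags = aRow.get("tags");
        if (!(tags instanceof String)) {
            return aRow;
        }
        List<Map<String, Object>> rows = new ArrayList<>();
        for (String tag : ((String) tags).split(",")) {
            Map<String, Object> row = new HashMap<>(aRow);
            row.put("tags", tag.trim());
            rows.add(row);
        }
        return rows;
    }
}

class TransformerSketchDemo {
    public static void main(String[] args) {
        Map<String, Object> row = new HashMap<>();
        row.put("id", 1);
        row.put("tags", " solr , lucene ");
        // The Context is unused by both sketches, so null is passed here.
        System.out.println(new TrimTransformer().transformRow(null, row));
        System.out.println(new SplitTagsTransformer().transformRow(null, row));
    }
}
```

Both classes copy the incoming row instead of mutating it. That is a choice made for this sketch, not something the interface requires.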
+ 
+ 
+ The Context is the interface that provides the contextual information 
necessary to process the data. 
+ 
+ The configuration has a 'flexible' schema: the user can put arbitrary
attributes on the 'entity' tag and on 'field' tags. The tool reads these
attributes and hands them over to the implementation class as they are. So if
a 'Transformer' needs extra information on a per-entity or per-field basis, it
can be supplied this way, and the values can be obtained from the Context.
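To make the idea of custom attributes concrete, the following is a rough sketch of a transformer that reads them through the Context. The Context methods are not listed on this page, so the accessors `getEntityAttribute` and `getAllEntityFields` declared below are assumptions for illustration only, as are the `defaultValue` and `skipDefaults` attributes and the `MapContext` helper.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in: the accessor names below are assumptions made for
// this sketch, not the documented Context API.
interface Context {
    String getEntityAttribute(String name);
    List<Map<String, String>> getAllEntityFields();
}

interface Transformer {
    Object transformRow(Context context, Map<String, Object> aRow);
}

// Fills in a column from a custom 'defaultValue' attribute on its 'field'
// tag whenever the fetched row has no value for that column. A custom
// 'skipDefaults="true"' attribute on the 'entity' tag switches this off.
class DefaultValueTransformer implements Transformer {
    @Override
    public Object transformRow(Context context, Map<String, Object> aRow) {
        if ("true".equals(context.getEntityAttribute("skipDefaults"))) {
            return aRow;
        }
        Map<String, Object> out = new HashMap<>(aRow);
        for (Map<String, String> field : context.getAllEntityFields()) {
            String column = field.get("column");
            String fallback = field.get("defaultValue");
            if (column != null && fallback != null && out.get(column) == null) {
                out.put(column, fallback);
            }
        }
        return out;
    }
}

// In-memory Context used only to drive the sketch.
class MapContext implements Context {
    private final Map<String, String> entityAttributes;
    private final List<Map<String, String>> fields;

    MapContext(Map<String, String> entityAttributes,
               List<Map<String, String>> fields) {
        this.entityAttributes = entityAttributes;
        this.fields = fields;
    }

    @Override
    public String getEntityAttribute(String name) {
        return entityAttributes.get(name);
    }

    @Override
    public List<Map<String, String>> getAllEntityFields() {
        return fields;
    }
}

class ContextSketchDemo {
    public static void main(String[] args) {
        // Mirrors a tag such as <field column="country" defaultValue="unknown"/>
        Map<String, String> field = new HashMap<>();
        field.put("column", "country");
        field.put("defaultValue", "unknown");
        List<Map<String, String>> fields = new ArrayList<>();
        fields.add(field);
        Context ctx = new MapContext(new HashMap<String, String>(), fields);

        Map<String, Object> row = new HashMap<>();
        row.put("id", 7);
        System.out.println(new DefaultValueTransformer().transformRow(ctx, row));
    }
}
```

The point of the sketch is only the flow: attributes written in the configuration reach the transformer untouched, and the transformer decides what they mean.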
+ 
+ There is a built-in transformer called 'RegExpTransformer' provided with the
tool itself. It helps in extracting values from fields (from the DB) using
regular expressions.
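The underlying idea, deriving new columns from an existing one with a capture group, can be sketched in plain Java with `java.util.regex`. This is not the code of the built-in transformer. The class `RegexFieldSketch`, its helper methods, the two sample patterns, and the sample value "Mr John Smith" are all assumptions chosen for the demo.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegexFieldSketch {
    // Returns capture group 1 of the first match, or null if nothing matches.
    static String extract(String regex, String value) {
        Matcher m = Pattern.compile(regex).matcher(value);
        return m.find() && m.groupCount() >= 1 ? m.group(1) : null;
    }

    // Adds derived columns to a copy of the row; the source column stays.
    static Map<String, Object> derive(Map<String, Object> row) {
        Map<String, Object> out = new HashMap<>(row);
        Object fullName = row.get("full_name");
        if (fullName instanceof String) {
            String s = (String) fullName;
            // Sample patterns (assumptions): first word after "Mr", last word.
            out.put("firstName", extract("^Mr\\.?\\s+(\\w+)", s));
            out.put("lastName", extract("(\\w+)$", s));
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, Object> row = new HashMap<>();
        row.put("full_name", "Mr John Smith");
        Map<String, Object> out = derive(row);
        System.out.println(out.get("firstName")); // prints "John"
        System.out.println(out.get("lastName"));  // prints "Smith"
    }
}
```

Whether a pattern yields the intended group depends on the shape of the stored values, so it is worth trying patterns against real sample data before putting them in the configuration.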
+ 
+ example:
+ {{{
+ <entity name="foo" transformer="org.apache.solr.handler.dataimport.RegExpTransformer"
+         query="select full_name from foo" ... >
+    <field column="full_name"/>
+    <field column="firstName" regExp="Mr(\w*)\b.*" sourceColName="full_name"/>
+    <field column="lastName" regExp="Mr.*?\b(\w*)" sourceColName="full_name"/>
+ </entity>
+ }}}
+ Here the attributes 'regExp' and 'sourceColName' are custom attributes used
by the transformer. It reads the field 'full_name' from the resultset and
transforms it into two target fields, 'firstName' and 'lastName'. So even
though the query returned only one column, 'full_name', in the resultset, the
Solr document gets two extra fields, 'firstName' and 'lastName', which are
'derived' fields.
+ ----
  CategorySolrRequestHandler
  
