Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Solr Wiki" for change notification.
The following page has been changed by NoblePaul:
http://wiki.apache.org/solr/DataImportHandler

------------------------------------------------------------------------------
  [[Anchor(custom-transformers)]]
  == Writing Custom Transformers ==
+ [DIHCustomTransformer | See here]
- If you need any kind of custom processing before sending the row to Solr, you can write a transformer of your own. Let us take an example use-case. Suppose you have a single-valued field named "artistName" in your schema, of type="string", which you want to facet on, so no index-time analysis should be done on it. The value can contain multiple words, like "Celine Dion", but there's a problem: your data contains extra leading and trailing whitespace that you want to remove. The !WhitespaceAnalyzer in Solr can't be applied since you don't want to tokenize the data into multiple tokens. A solution is to write a !TrimTransformer.
- 
- === A Simple TrimTransformer ===
- {{{
- package foo;
- 
- import java.util.Map;
- 
- public class TrimTransformer {
-     public Object transformRow(Map<String, Object> row) {
-         // The column name must match the one declared in data-config.xml
-         String artist = (String) row.get("artistName");
-         if (artist != null)
-             row.put("artistName", artist.trim());
- 
-         return row;
-     }
- }
- }}}
- No need to extend any class. Just write a class with a method named transformRow having the above signature; DataImportHandler will instantiate it and call transformRow using reflection. You specify it in your data-config.xml as follows:
- {{{
- <entity name="artist" query="..." transformer="foo.TrimTransformer">
-     <field column="artistName" />
- </entity>
- }}}
- 
- === A General TrimTransformer ===
- Suppose you want to write a general !TrimTransformer without hardcoding the column on which it needs to operate. We'd then need a flag on the field in data-config.xml to indicate that the !TrimTransformer should apply itself to this field.
- {{{
- <entity name="artist" query="..." transformer="foo.TrimTransformer">
-     <field column="artistName" trim="true" />
- </entity>
- }}}
- Now you'll need to extend the [#transformer Transformer] abstract class and use the API methods in Context to get the list of fields in the entity and read their attributes to detect whether the flag is set.
- {{{
- package foo;
- 
- import java.util.List;
- import java.util.Map;
- 
- import org.apache.solr.handler.dataimport.Context;
- import org.apache.solr.handler.dataimport.Transformer;
- 
- public class TrimTransformer extends Transformer {
- 
-     public Map<String, Object> transformRow(Map<String, Object> row, Context context) {
-         List<Map<String, String>> fields = context.getAllEntityFields();
- 
-         for (Map<String, String> field : fields) {
-             // Check if this field has trim="true" specified in the data-config.xml
-             String trim = field.get("trim");
-             if ("true".equals(trim)) {
-                 // Apply trim on this field
-                 String columnName = field.get("column");
-                 // Get this field's value from the current row
-                 String value = (String) row.get(columnName);
-                 // Trim and put the updated value back in the current row
-                 if (value != null)
-                     row.put(columnName, value.trim());
-             }
-         }
- 
-         return row;
-     }
- 
- }
- }}}
- If the field is multi-valued, the value returned is a List instead of a single object and would need to be handled appropriately (see the sketch at the end of this mail). You'll need to add the jar for !DataImportHandler to your project as a dependency to use the Transformer and Context abstract classes.
- 
  [[Anchor(entityprocessor)]]
  == EntityProcessor ==
  Each entity is handled by a default entity processor called !SqlEntityProcessor. This works well for systems that use an RDBMS as a datasource.
  For other kinds of datasources, such as REST or non-SQL datasources, you can extend the abstract class `org.apache.solr.handler.dataimport.EntityProcessor`. It is designed to stream rows one by one from an entity. The simplest way to implement your own !EntityProcessor is to extend !EntityProcessorBase and override the `public Map<String, Object> nextRow()` method.
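
For a rough sense of the shape this takes, here is a minimal sketch of such an !EntityProcessor. The class name, the row fields, and the fixed row count are made up for illustration; only the extension of !EntityProcessorBase and the nextRow() override come from the text above.

{{{
package foo;

import java.util.HashMap;
import java.util.Map;

import org.apache.solr.handler.dataimport.EntityProcessorBase;

public class SimpleEntityProcessor extends EntityProcessorBase {
    private int count = 0;

    public Map<String, Object> nextRow() {
        // Returning null signals that this entity has no more rows
        if (count >= 3)
            return null;

        Map<String, Object> row = new HashMap<String, Object>();
        row.put("id", count);
        row.put("name", "row-" + count);
        count++;
        return row;
    }
}
}}}

Such a class would be referenced from data-config.xml through the entity's processor attribute, e.g. processor="foo.SimpleEntityProcessor".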

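As a follow-up to the multi-valued note in the removed !TrimTransformer section above: here is a rough sketch of how a transformer might cope with a List value. The class name MultiValueTrimTransformer is made up for illustration, and it assumes the same trim="true" flag used in the general example.

{{{
package foo;

import java.util.ArrayList;
import java.util.List;
import java.util.Map;

import org.apache.solr.handler.dataimport.Context;
import org.apache.solr.handler.dataimport.Transformer;

public class MultiValueTrimTransformer extends Transformer {

    public Map<String, Object> transformRow(Map<String, Object> row, Context context) {
        for (Map<String, String> field : context.getAllEntityFields()) {
            if (!"true".equals(field.get("trim")))
                continue;

            String columnName = field.get("column");
            Object value = row.get(columnName);

            if (value instanceof List) {
                // Multi-valued column: the row holds a List, so trim each String element
                List<Object> trimmed = new ArrayList<Object>();
                for (Object v : (List<?>) value) {
                    trimmed.add(v instanceof String ? ((String) v).trim() : v);
                }
                row.put(columnName, trimmed);
            } else if (value instanceof String) {
                // Single-valued column: trim the String directly
                row.put(columnName, ((String) value).trim());
            }
        }
        return row;
    }
}
}}}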