[Solr Wiki] Update of "DIHCustomTransformer" by NoblePaul

Apache Wiki Fri, 05 Dec 2008 23:41:02 -0800

The following page has been changed by NoblePaul:
http://wiki.apache.org/solr/DIHCustomTransformer

The comment on the change is:
Moved from DataImportHandler Page

New page:
= Writing Custom Transformers =
If you need custom processing on a row before it is sent to Solr, you can write 
a transformer of your own. Take an example use case: your schema has a 
single-valued field named "artistName" of type="string". You want to facet on 
it, so no index-time analysis should be done on this field. The value can 
contain multiple words, like "Celine Dion", but your data has extra leading and 
trailing whitespace that you want to remove. The !WhitespaceAnalyzer in Solr 
can't be applied because you don't want to split the value into multiple 
tokens. A solution is to write a !TrimTransformer.

== A Simple TrimTransformer ==
{{{
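package foo;

import java.util.Map;

public class TrimTransformer {
        public Object transformRow(Map<String, Object> row) {
                // Row values are Objects, so check the type before trimming
                Object artist = row.get("artistName");
                if (artist instanceof String)
                        row.put("artistName", ((String) artist).trim());

                return row;
        }
}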
}}}
There is no need to extend any class. Any class that has a method named 
transformRow with the above signature will do: DataImportHandler instantiates 
it and calls the transformRow method using reflection. Specify it in your 
data-config.xml as follows:
{{{
<entity name="artist" query="..." transformer="foo.TrimTransformer">
        <field column="artistName" />
</entity>
}}}
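
To make the reflection step concrete, here is a minimal standalone sketch of how a class name from the config could be turned into a transformRow call. It is not DataImportHandler's actual code: the class !ReflectionSketch and the helper applyTransformer are hypothetical names used only for illustration, and the nested !TrimTransformer is a stand-in for foo.TrimTransformer above.

```java
import java.lang.reflect.Method;
import java.util.HashMap;
import java.util.Map;

public class ReflectionSketch {

    // Stand-in for foo.TrimTransformer: a plain class with no base class.
    public static class TrimTransformer {
        public Object transformRow(Map<String, Object> row) {
            Object artist = row.get("artistName");
            if (artist instanceof String) {
                row.put("artistName", ((String) artist).trim());
            }
            return row;
        }
    }

    // Hypothetical helper (an assumption, not part of DataImportHandler):
    // load the class by name, create an instance through its no-argument
    // constructor, then look up and invoke transformRow(Map) on it.
    public static Object applyTransformer(String className, Map<String, Object> row) {
        try {
            Class<?> clazz = Class.forName(className);
            Object transformer = clazz.getDeclaredConstructor().newInstance();
            Method method = clazz.getMethod("transformRow", Map.class);
            return method.invoke(transformer, row);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Could not run transformer " + className, e);
        }
    }

    public static void main(String[] args) {
        Map<String, Object> row = new HashMap<>();
        row.put("artistName", "  Celine Dion  ");
        Object result = applyTransformer(TrimTransformer.class.getName(), row);
        System.out.println(result); // prints {artistName=Celine Dion}
    }
}
```

The sketch also shows why the method name and parameter type matter: the lookup is by name and signature, so a method called anything other than transformRow, or one taking a different parameter type, would not be found.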

== A General TrimTransformer ==
Suppose you want to write a general !TrimTransformer without hardcoding the 
column on which it operates. This requires a flag on the field in 
data-config.xml to indicate that the !TrimTransformer should apply itself to 
that field.
{{{
<entity name="artist" query="..." transformer="foo.TrimTransformer">
        <field column="artistName" trim="true" />
</entity>
}}}
Now you'll need to extend the [#transformer Transformer] abstract class and use 
the API methods in Context to get the list of fields in the entity and get 
attributes of the fields to detect if the flag is set.
{{{
package foo;

import java.util.List;
import java.util.Map;

import org.apache.solr.handler.dataimport.Context;
import org.apache.solr.handler.dataimport.Transformer;

public class TrimTransformer extends Transformer {

        public Map<String, Object> transformRow(Map<String, Object> row, Context context) {
                List<Map<String, String>> fields = context.getAllEntityFields();

                for (Map<String, String> field : fields) {
                        // Check if this field has trim="true" specified in data-config.xml
                        String trim = field.get("trim");
                        if ("true".equals(trim)) {
                                // Apply trim on this field
                                String columnName = field.get("column");
                                // Get this field's value from the current row
                                Object value = row.get(columnName);
                                // Trim and put the updated value back in the current row
                                if (value instanceof String)
                                        row.put(columnName, ((String) value).trim());
                        }
                }

                return row;
        }

}
}}}
If the field is multi-valued, the value in the row is a List instead of a 
single object and needs to be handled accordingly. You'll need to add the jar 
for !DataImportHandler to your project as a dependency to use the Transformer 
and Context abstract classes.
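
One way to handle the multi-valued case is sketched below. It is a standalone illustration that does not depend on the !DataImportHandler jar; the class !MultiValuedTrim and the helper trimValue are hypothetical names, and the sketch assumes that a multi-valued column arrives as a java.util.List whose elements may or may not be Strings.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class MultiValuedTrim {

    // Hypothetical helper: trims a String, or every String inside a List.
    // Any other value (including null) is returned unchanged.
    public static Object trimValue(Object value) {
        if (value instanceof String) {
            return ((String) value).trim();
        }
        if (value instanceof List) {
            // Build a new list so an unmodifiable input list is not a problem
            List<Object> trimmed = new ArrayList<>();
            for (Object item : (List<?>) value) {
                trimmed.add(item instanceof String ? ((String) item).trim() : item);
            }
            return trimmed;
        }
        return value;
    }

    public static void main(String[] args) {
        System.out.println(trimValue("  Celine Dion  "));                       // prints Celine Dion
        System.out.println(trimValue(Arrays.asList(" Celine Dion ", " Enya "))); // prints [Celine Dion, Enya]
    }
}
```

Inside a transformer's loop, a helper like this could be applied as row.put(columnName, trimValue(row.get(columnName))), so that single-valued and multi-valued columns go through the same code path.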
