Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Solr Wiki" for change notification.
The following page has been changed by NoblePaul:
http://wiki.apache.org/solr/DataImportHandler

------------------------------------------------------------------------------
  [[Anchor(custom-transformers)]]
  == Writing Custom Transformers ==
+ [DIHCustomTransformer | See here]
- If you need any kind of custom processing before sending the row to Solr, you can write a transformer of your own. Let us take an example use-case. Suppose you have a single-valued field named "artistName" in your schema, of type="string", which you want to facet on, so no index-time analysis should be done on it. The value can contain multiple words, like "Celine Dion", but there's a problem: your data contains extra leading and trailing whitespace that you want to remove. The !WhitespaceAnalyzer in Solr can't be applied since you don't want to tokenize the data into multiple tokens. A solution is to write a !TrimTransformer.
- 
- === A Simple TrimTransformer ===
- {{{
- package foo;
- 
- import java.util.Map;
- 
- public class TrimTransformer {
-     public Object transformRow(Map<String, Object> row) {
-         // The column name must match the one declared in data-config.xml
-         String artist = (String) row.get("artistName");
-         if (artist != null)
-             row.put("artistName", artist.trim());
- 
-         return row;
-     }
- }
- }}}
- No need to extend any class. Just write a class with a method named transformRow having the above signature; DataImportHandler will instantiate it and call transformRow using reflection. You specify it in your data-config.xml as follows:
- {{{
- <entity name="artist" query="..." transformer="foo.TrimTransformer">
-     <field column="artistName" />
- </entity>
- }}}
- 
- === A General TrimTransformer ===
- Suppose you want to write a general !TrimTransformer without hardcoding the column on which it needs to operate. We'd then need a flag on the field in data-config.xml to indicate that the !TrimTransformer should apply itself to this field.
- {{{
- <entity name="artist" query="..." transformer="foo.TrimTransformer">
-     <field column="artistName" trim="true" />
- </entity>
- }}}
- Now you'll need to extend the [#transformer Transformer] abstract class and use the API methods in Context to get the list of fields in the entity and read their attributes to detect whether the flag is set.
- {{{
- package foo;
- 
- import java.util.List;
- import java.util.Map;
- 
- import org.apache.solr.handler.dataimport.Context;
- import org.apache.solr.handler.dataimport.Transformer;
- 
- public class TrimTransformer extends Transformer {
- 
-     public Map<String, Object> transformRow(Map<String, Object> row, Context context) {
-         List<Map<String, String>> fields = context.getAllEntityFields();
- 
-         for (Map<String, String> field : fields) {
-             // Check if this field has trim="true" specified in the data-config.xml
-             String trim = field.get("trim");
-             if ("true".equals(trim)) {
-                 // Apply trim on this field
-                 String columnName = field.get("column");
-                 // Get this field's value from the current row
-                 String value = (String) row.get(columnName);
-                 // Trim and put the updated value back in the current row
-                 if (value != null)
-                     row.put(columnName, value.trim());
-             }
-         }
- 
-         return row;
-     }
- 
- }
- }}}
- If the field is multi-valued, the value returned is a List instead of a single object and would need to be handled appropriately (see the sketch at the end of this mail). You'll need to add the jar for !DataImportHandler to your project as a dependency to use the Transformer and Context abstract classes.
- 
  [[Anchor(entityprocessor)]]
  == EntityProcessor ==
  Each entity is handled by a default entity processor called !SqlEntityProcessor. This works well for systems that use an RDBMS as a datasource.
  For other kinds of datasources, such as REST or non-SQL datasources, you can extend the abstract class `org.apache.solr.handler.dataimport.EntityProcessor`. It is designed to stream rows one by one from an entity. The simplest way to implement your own !EntityProcessor is to extend !EntityProcessorBase and override the `public Map<String, Object> nextRow()` method.
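
For a rough sense of the shape this takes, here is a minimal sketch of such an !EntityProcessor. The class name, the row fields, and the fixed row count are made up for illustration; only the extension of !EntityProcessorBase and the nextRow() override come from the text above.

{{{
package foo;

import java.util.HashMap;
import java.util.Map;

import org.apache.solr.handler.dataimport.EntityProcessorBase;

public class SimpleEntityProcessor extends EntityProcessorBase {
    private int count = 0;

    public Map<String, Object> nextRow() {
        // Returning null signals that this entity has no more rows
        if (count >= 3)
            return null;

        Map<String, Object> row = new HashMap<String, Object>();
        row.put("id", count);
        row.put("name", "row-" + count);
        count++;
        return row;
    }
}
}}}

Such a class would be referenced from data-config.xml through the entity's processor attribute, e.g. processor="foo.SimpleEntityProcessor".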

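As a follow-up to the multi-valued note in the removed !TrimTransformer section above: here is a rough sketch of how a transformer might cope with a List value. The class name MultiValueTrimTransformer is made up for illustration, and it assumes the same trim="true" flag used in the general example.

{{{
package foo;

import java.util.ArrayList;
import java.util.List;
import java.util.Map;

import org.apache.solr.handler.dataimport.Context;
import org.apache.solr.handler.dataimport.Transformer;

public class MultiValueTrimTransformer extends Transformer {

    public Map<String, Object> transformRow(Map<String, Object> row, Context context) {
        for (Map<String, String> field : context.getAllEntityFields()) {
            if (!"true".equals(field.get("trim")))
                continue;

            String columnName = field.get("column");
            Object value = row.get(columnName);

            if (value instanceof List) {
                // Multi-valued column: the row holds a List, so trim each String element
                List<Object> trimmed = new ArrayList<Object>();
                for (Object v : (List<?>) value) {
                    trimmed.add(v instanceof String ? ((String) v).trim() : v);
                }
                row.put(columnName, trimmed);
            } else if (value instanceof String) {
                // Single-valued column: trim the String directly
                row.put(columnName, ((String) value).trim());
            }
        }
        return row;
    }
}
}}}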