Dear Wiki user, You have subscribed to a wiki page or wiki category on "Solr Wiki" for change notification.
The following page has been changed by NoblePaul:
http://wiki.apache.org/solr/DataImportHandler

------------------------------------------------------------------------------
  The examples we explored are, admittedly, trivial. It is not possible to meet every user need with an XML configuration alone, so we expose a few interfaces that users can implement to extend the functionality.

  == Transformer ==
+ Every row fetched from the DB can be consumed directly, transformed into an entirely new set of fields, or even expanded into more than one row of data. The transformer is configured at the entity level as follows.
+ {{{
+ <entity name="foo" transformer="com.foo.Foo" ... />
+ }}}
- ----
+ The class 'Foo' must implement the interface `org.apache.solr.handler.dataimport.Transformer`. The interface has only one method.
+ 
+ {{{
+ public interface Transformer {
+   /**
+    * The input is a row of data and the output has to be a new row.
+    *
+    * @param context The current context
+    * @param aRow A row of data
+    * @return The changed data. It must be a Map<String, Object> if only one row is returned,
+    *         or a List<Map<String, Object>> if multiple rows are returned
+    */
+   public Object transformRow(Context context, Map<String, Object> aRow);
+ }
+ }}}
+ 
+ 
+ The Context is the interface that provides the contextual information necessary to process the data.
+ 
+ The configuration has a 'flexible' schema. It lets the user provide arbitrary attributes in an 'entity' tag and in 'field' tags. The tool reads these attributes and hands them over to the implementation class as they are. If the 'Transformer' needs extra information on a per-entity or per-field basis, it can be supplied this way. The values can be obtained from the Context.
+ 
+ There is a built-in transformer called 'RegExpTransformer' provided with the tool itself. It helps extract values from fields (from the DB) using regular expressions.
+ 
+ example:
+ {{{
+ <entity name="foo" transformer="org.apache.solr.handler.dataimport.RegExpTransformer"
+         query="select full_name from foo" ... >
+   <field column="full_name"/>
+   <field column="firstName" regExp="Mr(\w*)\b.*" sourceColName="full_name"/>
+   <field column="lastName" regExp="Mr.*?\b(\w*)" sourceColName="full_name"/>
+ </entity>
+ }}}
+ Here 'regExp' and 'sourceColName' are custom attributes used by the transformer. It reads the field 'full_name' from the resultset and transforms it into two target fields, 'firstName' and 'lastName'. So even though the query returned only one column, 'full_name', in the resultset, the Solr document gets two extra fields, 'firstName' and 'lastName', which are 'derived' fields.
+ ----
  CategorySolrRequestHandler
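
For readers who want to see what a hand-written Transformer might look like, here is a minimal sketch in Java. It is not taken from the wiki page or from Solr: `NameSplitTransformer`, `NameSplitDemo` and the regular expression are hypothetical, and `Context` is declared as an empty stand-in because the page does not list its methods. The `Transformer` interface is copied locally from the definition quoted in the change so that the sketch compiles without Solr on the classpath.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-in: the real Context interface is not spelled out on the page.
interface Context {
}

// Local copy of the single-method interface quoted in the change above.
interface Transformer {
    Object transformRow(Context context, Map<String, Object> aRow);
}

// Hypothetical transformer that derives firstName and lastName from full_name.
class NameSplitTransformer implements Transformer {
    // Illustrative pattern (not the one from the wiki example):
    // a "Mr" or "Mr." title, the first word, anything in between, the last word.
    private static final Pattern NAME =
            Pattern.compile("^Mr\\.?\\s+(\\w+)\\b.*?\\b(\\w+)$");

    @Override
    public Object transformRow(Context context, Map<String, Object> aRow) {
        // Work on a copy so the incoming row is left untouched.
        Map<String, Object> out = new HashMap<>(aRow);
        Object fullName = aRow.get("full_name");
        if (fullName == null) {
            return out; // nothing to derive from
        }
        Matcher m = NAME.matcher(fullName.toString().trim());
        if (m.matches()) {
            out.put("firstName", m.group(1));
            out.put("lastName", m.group(2));
        }
        // A single row is returned, so the result is a Map rather than a List of Maps.
        return out;
    }
}

class NameSplitDemo {
    public static void main(String[] args) {
        Map<String, Object> row = new HashMap<>();
        row.put("full_name", "Mr John Smith");
        // This sketch never reads the Context, so null is passed here.
        Object result = new NameSplitTransformer().transformRow(null, row);
        System.out.println(result);
    }
}
```

Under these assumptions, such a class would be referenced from the 'transformer' attribute of an entity in the same way as 'com.foo.Foo' above. Unlike the built-in transformer, it hard-codes the column names instead of reading 'regExp' and 'sourceColName' from the Context.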
