Dear Wiki user, You have subscribed to a wiki page or wiki category on "Solr Wiki" for change notification.
The following page has been changed by FergusMcMenemie:
http://wiki.apache.org/solr/DataImportHandler

The comment on the change is:
clarifying documentation of RegexTransformer

------------------------------------------------------------------------------
  === RegexTransformer ===

- There is an built-in transformer called '!RegexTransfromer' provided with the tool itself. It helps in extracting values from fields (from the source) using Regular Expressions. The actual class name is `org.apache.solr.handler.dataimport.RegexTransformer` . But as it belongs to the default package , package-name can be omitted
+ There is a built-in transformer called '!RegexTransformer' provided with DIH. It helps in extracting or manipulating values from fields (from the source) using Regular Expressions. The actual class name is `org.apache.solr.handler.dataimport.RegexTransformer`. But as it belongs to the default package, the package-name can be omitted.
+
+
+ ==== Attributes ====
+ !RegexTransformer is only activated for fields with an attribute of 'regex' or 'splitBy'. Other fields are ignored.
+  * '''`regex`''' : The regular expression that is used to match against the column or sourceColName's value(s). If `replaceWith` is absent, each regex ''group'' is taken as a value and a list of values is returned
+  * '''`sourceColName`''' : The column on which the regex is to be applied. If this is absent, source and target are the same
+  * '''`splitBy`''' : Used to split a String to obtain multiple values, returns a list of values
+  * '''`groupNames`''' : A comma separated list of field column names, used where the `regex` contains groups and each group is to be saved to a different field. If some groups are not to be named, leave a space between commas. <!> ["Solr1.4"]
+  * '''`replaceWith`''' : Used along with `regex`. It is equivalent to the method `new String(<sourceColVal>).replaceAll(<regex>, <replaceWith>)`
  example:
  {{{
@@ -486, +495 @@

  </entity>
  }}}
- ==== Attributes ====
- !RegexTransfromer applies only on the fields with an attribute 'regex' or 'splitBy'. All other fields are left as it is.
-  * '''`regex`''' : The regular expression that is used to match . This or `splitBy` must be present for each field. If not, that field is not touched by the transformer . If `replaceWith` is absent, each ''group'' is taken as a value and a list of values is returned
-  * '''`sourceColName`''' : The column on which the regex is to be applied. If this is absent source and target are same
-  * '''`splitBy`''' : If the `regex` is used to split a String to obtain multipple values use this
-  * '''`groupNames`''' : If the `regex` contains groups and each of them go to different fields , each group can be given a name (comma separated) . If some groups are not to be named leave a space between commas. <!> ["Solr1.4"]
-  * '''`replaceWith`''' : Used alongwith `regex` . It is equivalent to the method `new String(<sourceColVal>).replaceAll(<regex>, <replaceWith>)`
- Here the attributes 'regex' and 'sourceColName' are custom attributes used by the transformer. It reads the field 'full_name' from the resultset and transform it to two target fields 'firstName' and 'lastName' . So even though the query returned only one column 'full_name' in the resultset the solr document gets two extra fields 'firstName' and 'lastName' wich are 'derived' fields.
+ In this example the attributes 'regex' and 'sourceColName' are custom attributes used by the transformer. It reads the field 'full_name' from the resultset and transforms it to two target fields 'firstName' and 'lastName'. So even though the query returned only one column 'full_name' in the resultset, the solr document gets two extra fields 'firstName' and 'lastName' which are 'derived' fields.
The 'emailids' field in the table can be a comma separated value. So it ends up giving out one or more than one email ids and we expect the 'mailId' to be a multivalued field in Solr.
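
As a rough illustration of the attribute semantics described above, the following Java sketch mimics the three behaviours using only `java.util.regex` and `String` methods. It is not the DIH implementation: the class name `RegexTransformerSketch`, its method names, the patterns and the sample values are all hypothetical and chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of the attribute semantics; not the actual DIH code.
class RegexTransformerSketch {

    // 'splitBy': split one source value into a list of values.
    static List<String> splitBy(String sourceColVal, String splitBy) {
        return Arrays.asList(sourceColVal.split(splitBy));
    }

    // 'regex' without 'replaceWith': each capturing group of the first
    // match is taken as a value and the list of values is returned.
    static List<String> groups(String sourceColVal, String regex) {
        Matcher m = Pattern.compile(regex).matcher(sourceColVal);
        List<String> values = new ArrayList<>();
        if (m.find()) {
            for (int i = 1; i <= m.groupCount(); i++) {
                values.add(m.group(i));
            }
        }
        return values;
    }

    // 'regex' with 'replaceWith': the replaceAll call quoted in the text.
    static String replaceWith(String sourceColVal, String regex, String replaceWith) {
        return new String(sourceColVal).replaceAll(regex, replaceWith);
    }

    public static void main(String[] args) {
        // Assumed sample data; a real 'full_name' or 'emailids' column may differ.
        System.out.println(groups("John Smith", "(\\w+) (\\w+)"));
        System.out.println(splitBy("a@example.com,b@example.com", ","));
        System.out.println(replaceWith("John   Smith", "\\s+", " "));
    }
}
```

In this sketch the two groups returned by `groups` would correspond to the columns listed in `groupNames`, and the list returned by `splitBy` is why the target field ('mailId' in the example) is expected to be multivalued.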
