[Solr Wiki] Update of "DataImportHandler" by FergusMcMenemie

Apache Wiki Fri, 20 Mar 2009 02:55:37 -0700

Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Solr Wiki" for change 
notification.


The following page has been changed by FergusMcMenemie:
http://wiki.apache.org/solr/DataImportHandler

The comment on the change is:
clarifying documentation of RegexTransformer

------------------------------------------------------------------------------
  
  === RegexTransformer ===
  
- There is an built-in transformer called '!RegexTransfromer' provided with the 
tool itself. It helps in extracting values from fields (from the source) using 
Regular Expressions. The actual class name is 
`org.apache.solr.handler.dataimport.RegexTransformer` . But as it belongs to 
the default package , package-name can be omitted
+ There is a built-in transformer called '!RegexTransformer' provided with 
DIH. It extracts or manipulates values from source fields using regular 
expressions. The actual class name is 
`org.apache.solr.handler.dataimport.RegexTransformer`, but because it belongs 
to the default package, the package name can be omitted.
  
+ 
+ 
+ ==== Attributes ====
+ !RegexTransformer is only activated for fields with an attribute of 'regex' 
or 'splitBy'. Other fields are ignored.
+  * '''`regex`''' : The regular expression that is matched against the 
value(s) of the column, or of `sourceColName` if one is given. If `replaceWith` 
is absent, each regex ''group'' is taken as a value and a list of values is 
returned.
+  * '''`sourceColName`''' : The column on which the regex is to be applied. If 
this is absent, the source and target are the same.
+  * '''`splitBy`''' : Used to split a string into multiple values; a list of 
values is returned.
+  * '''`groupNames`''' : A comma-separated list of field column names, used 
where the `regex` contains groups and each group is to be saved to a different 
field. If some groups are not to be named, leave a space between commas.  <!> 
["Solr1.4"]
+  * '''`replaceWith`''' : Used along with `regex`. It is equivalent to the 
call `new String(<sourceColVal>).replaceAll(<regex>, <replaceWith>)`.
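
The behaviour of `regex` with and without `replaceWith`, as described in the 
attribute list above, can be sketched in plain Java. This is only an 
illustration built on `java.util.regex`, not the DIH source code: the class 
name `RegexAttrSketch`, its method names, the sample values, and the choice to 
take groups from the first match are all assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration class; not part of DIH.
public class RegexAttrSketch {

    // 'regex' together with 'replaceWith': the same as String.replaceAll.
    static String replace(String sourceColVal, String regex, String replaceWith) {
        return sourceColVal.replaceAll(regex, replaceWith);
    }

    // 'regex' without 'replaceWith': each capture group becomes one value.
    // Assumption: groups are read from the first match found in the value.
    static List<String> groups(String sourceColVal, String regex) {
        List<String> values = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(sourceColVal);
        if (m.find()) {
            for (int i = 1; i <= m.groupCount(); i++) {
                values.add(m.group(i));
            }
        }
        return values;
    }

    public static void main(String[] args) {
        // Collapse runs of whitespace into a single space.
        System.out.println(replace("John  Smith", "\\s+", " "));
        // Two groups give a list of two values.
        System.out.println(groups("John Smith", "(\\w+) (\\w+)"));
    }
}
```

With `groupNames`, each entry of the list returned by `groups` would be stored 
under the corresponding field name instead of being kept as one list.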
  
  example:
  {{{
@@ -486, +495 @@

  </entity>
  }}}
  
- ==== Attributes ====
- !RegexTransfromer applies only on the fields with an attribute 'regex' or 
'splitBy'. All other fields are left as it is.
-  * '''`regex`''' : The regular expression that is used to match . This or 
`splitBy` must be present for each field. If not, that field is not touched by 
the transformer . If `replaceWith` is absent, each ''group'' is taken as a 
value and a list of values is returned
-  * '''`sourceColName`''' : The column on which the regex is to be applied. If 
this is absent source and target are same
-  * '''`splitBy`''' : If the `regex` is used to split a String to obtain 
multipple values use this
-  * '''`groupNames`''' : If the `regex` contains groups and each of them go to 
different fields , each group can be given a name (comma separated) . If some 
groups are not to be named leave a space between commas.  <!> ["Solr1.4"]
-  * '''`replaceWith`''' : Used alongwith `regex` . It is equivalent to the 
method `new String(<sourceColVal>).replaceAll(<regex>, <replaceWith>)`
- Here the attributes 'regex' and 'sourceColName' are custom attributes used by 
the transformer. It reads the field 'full_name' from the resultset and 
transform it to two target fields 'firstName' and 'lastName' . So even though 
the query returned only one column 'full_name' in the resultset the solr 
document gets two extra fields 'firstName' and 'lastName' wich are 'derived' 
fields.
+ In this example the attributes 'regex' and 'sourceColName' are custom 
attributes used by the transformer. It reads the field 'full_name' from the 
resultset and transforms it into two target fields, 'firstName' and 'lastName'. 
So even though the query returned only one column, 'full_name', in the 
resultset, the Solr document gets two extra fields, 'firstName' and 'lastName', 
which are 'derived' fields.
  
  The 'emailids' field in the table can hold a comma-separated value, so it 
yields one or more email ids, and we expect 'mailId' to be a multivalued field 
in Solr.
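
As a rough sketch of what `splitBy` does to such a value, the following Java 
snippet splits a comma-separated string into a list. It is not the DIH 
implementation: the class name `SplitBySketch`, the sample addresses, and the 
assumption that the `splitBy` value is handled like the argument of 
`String.split` are illustrative only.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration class; not part of DIH.
public class SplitBySketch {

    // Assumption: the splitBy value is treated as a regular expression,
    // exactly as String.split treats its argument.
    static List<String> splitBy(String value, String splitBy) {
        return Arrays.asList(value.split(splitBy));
    }

    public static void main(String[] args) {
        // Made-up sample value standing in for the 'emailids' column.
        String emailids = "a@example.com,b@example.com";
        // Each element would become one value of the multivalued 'mailId' field.
        for (String mailId : splitBy(emailids, ",")) {
            System.out.println(mailId);
        }
    }
}
```

A value with no comma comes back as a single-element list, which is why the 
target field still needs to be declared multivalued only when several ids can 
occur.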
