[Solr Wiki] Update of "DataImportHandler" by NoblePaul

Apache Wiki Wed, 26 Mar 2008 07:09:35 -0700

The following page has been changed by NoblePaul:
http://wiki.apache.org/solr/DataImportHandler

------------------------------------------------------------------------------
  
  The configuration has a 'flexible' schema. It lets the user provide arbitrary 
attributes in an 'entity' tag and in 'field' tags. The tool reads the data and 
hands it over to the implementation class as it is. If a 'Transformer' needs 
extra information on a per-entity or per-field basis, that information can be 
supplied through these attributes. The values can be obtained from the Context. 
  
- There is an inbuilt transformer called '!RegExpTransfromer' provided with the 
tool itself. It helps in extracting values from fields (from db) using Regular 
Expressions.
+ There is an inbuilt transformer called '!RegexTransformer' provided with the 
tool itself. It helps in extracting values from fields (from db) using Regular 
Expressions. The actual class name is 
`org.apache.solr.handler.dataimport.RegexTransformer`. But as it belongs to 
the default package, the package name can be omitted.
  
  example:
  {{{
- <entity name="foo" 
transformer="org.apache.solr.handler.dataimport.RegExpTransformer"  
+ <entity name="foo" transformer="RegexTransformer"  
  query="select full_name , emailids from foo"
  ... >
     <field column="full_name"/>
-    <field column="firstName" regExp="Mr(\w*)\b.*" sourceColName="full_name"/>
+    <field column="firstName" regex="Mr(\w*)\b.*" sourceColName="full_name"/>
-    <field column="lastName" regExp="Mr.*?\b(\w*)" sourceColName="full_name"/>
+    <field column="lastName" regex="Mr.*?\b(\w*)" sourceColName="full_name"/>
     <field column="mailId" splitBy="," sourceColName="emailids"/>
  </entity>
  }}}
+ 
+ '''Attributes used by `RegexTransformer`'''
+  * '''`regex`''' : The regular expression to match against. Either this or 
`splitBy` must be present on a field; otherwise the transformer leaves that 
field untouched. If `replaceWith` is absent, each ''group'' is taken as a 
value and a list of values is returned.
+  * '''`sourceColName`''' : The column to which the regex is applied. If 
there is only one column this can be omitted.
+  * '''`splitBy`''' : Use this when the regular expression is meant to split 
a String into multiple values.
+  * '''`replaceWith`''' : Used along with `regex`. It is equivalent to the 
method `new String(<sourceColVal>).replaceAll(<regex>, <replaceWith>)`.
- Here the attributes 'regExp' and 'sourceColName' are custom attributes used 
by the transformer. It reads the field 'full_name' from the resultset and 
transform it to two target fields 'firstName' and 'lastName' . So even though 
the query returned only one column 'full_name' in the resultset the solr 
document gets two extra fields 'firstName' and 'lastName' wich are 'derived' 
fields.
+ Here the attributes 'regex' and 'sourceColName' are custom attributes used by 
the transformer. It reads the field 'full_name' from the resultset and 
transforms it into two target fields, 'firstName' and 'lastName'. So even 
though the query returned only one column, 'full_name', in the resultset, the 
Solr document gets two extra fields, 'firstName' and 'lastName', which are 
'derived' fields.
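
The group-extraction and `replaceWith` behaviour described in the attribute 
list can be illustrated with plain `java.util.regex`. This is only a sketch of 
the semantics, not the transformer's actual code: the class `RegexFieldSketch` 
and its methods are hypothetical names, the sample pattern and input are made 
up, and taking the groups of the first match found is an assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration class; not part of Solr.
class RegexFieldSketch {

    // No replaceWith: every capture group of the first match becomes one value.
    // (Using the first match via find() is an assumption of this sketch.)
    static List<String> extractGroups(String regex, String sourceValue) {
        List<String> values = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(sourceValue);
        if (m.find()) {
            for (int i = 1; i <= m.groupCount(); i++) {
                values.add(m.group(i));
            }
        }
        return values;
    }

    // With replaceWith: the same call the attribute description names.
    static String replace(String regex, String replaceWith, String sourceValue) {
        return sourceValue.replaceAll(regex, replaceWith);
    }

    public static void main(String[] args) {
        // Two groups in the pattern give two values in the list.
        System.out.println(extractGroups("(\\w+)\\s+(\\w+)", "John Smith"));
        // Runs of whitespace are replaced by a single underscore.
        System.out.println(replace("\\s+", "_", "John   Smith"));
    }
}
```

A pattern with one group yields a single-element list, and a value that does 
not match yields an empty list in this sketch.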
  
  The 'emailids' field in the table can hold a comma-separated value, so it 
yields one or more email ids, and 'mailId' is expected to be a multivalued 
field in Solr.
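
How a `splitBy=","` field could turn one 'emailids' value into several 
'mailId' values can be sketched with `String.split`. The class 
`SplitBySketch`, the method `splitValues` and the sample addresses are 
hypothetical, and trimming whitespace and dropping empty parts are assumptions 
of this sketch rather than documented behaviour.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration class; not part of Solr.
class SplitBySketch {

    // Splits the source column value on the splitBy expression.
    // Trimming and skipping empty parts are assumptions of this sketch.
    static List<String> splitValues(String splitBy, String sourceValue) {
        List<String> values = new ArrayList<>();
        for (String part : sourceValue.split(splitBy)) {
            String trimmed = part.trim();
            if (!trimmed.isEmpty()) {
                values.add(trimmed);
            }
        }
        return values;
    }

    public static void main(String[] args) {
        // One column value in, one value per email id out.
        System.out.println(splitValues(",", "a@example.com, b@example.com"));
    }
}
```

A value without any comma comes back as a single-element list, which matches 
the "one or more" wording above.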
