[Solr Wiki] Update of "DataImportHandler" by FergusMcMenemie

Apache Wiki Thu, 30 Apr 2009 02:59:57 -0700

The following page has been changed by FergusMcMenemie:
http://wiki.apache.org/solr/DataImportHandler

The comment on the change is:
typos plus improvements to regexp transformer explanation

------------------------------------------------------------------------------
  {{{
  <dataSource type="JdbcDataSource" driver="com.mysql.jdbc.Driver" 
url="jdbc:mysql://localhost/dbname" user="db_username" password="db_password"/>
  }}}
-  * The datasource configuration can also be done in solr config xml 
[#solrconfigdatasource also]
+  * The datasource configuration can also be done in solr config xml 
[#solrconfigdatasource]
   * The attribute 'type' specifies the implementation class. It is optional. 
The default value is `'JdbcDataSource'`
   * The attribute 'name' can be used if there are [#multipleds multiple 
datasources] used by multiple entities
   * All other attributes in the <dataSource> tag are specific to the 
particular dataSource implementation being configured. 
@@ -505, +505 @@

  </entity>
  }}}
  
- In this example the attributes 'regex' and 'sourceColName' are custom 
attributes used by the transformer. It reads the field 'full_name' from the 
resultset and transforms it to two target fields 'firstName' and 'lastName' . 
So even though the query returned only one column 'full_name' in the resultset 
the solr document gets two extra fields 'firstName' and 'lastName' wich are 
'derived' fields.
+ In this example the attributes 'regex' and 'sourceColName' are custom 
attributes used by the transformer. It reads the field 'full_name' from the 
resultset and transforms it into two new target fields, 'firstName' and 
'lastName'. So even though the query returned only one column, 'full_name', in 
the resultset, the Solr document gets two extra fields, 'firstName' and 
'lastName', which are 'derived' fields. These new fields are only created if the 
regexp matches.
  
  The 'emailids' field in the table can be a comma separated value. So it ends 
up giving out one or more than one email ids and we expect the 'mailId' to be a 
multivalued field in Solr.
+ 
+ <!> Note that this transformer can be used to split a string into 
tokens based on a '''`splitBy`''' pattern, to perform a string substitution 
as per '''`replaceWith`''', or to assign groups within a pattern to a list 
of '''`groupNames`'''. It decides what to do based on the attributes 
'''`splitBy`''', '''`replaceWith`''' and '''`groupNames`''', which 
are looked for in that order. The first one found is acted upon, and the other 
attributes are ignored.
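
As an illustration of the three modes described in the note above, a minimal 
sketch might look like the following. It is not part of the wiki change itself. 
The table name 'foo', the regular expressions and the target column 
'name_underscored' are assumptions made up for this example; 'full_name', 
'emailids', 'mailId', 'firstName' and 'lastName' are the names used in the text.
{{{
<entity name="foo" transformer="RegexTransformer"
        query="select full_name, emailids from foo">
  <!-- groupNames: groups 1 and 2 of the (assumed) regex go to firstName and lastName -->
  <field column="full_name" regex="(\w+) (\w+)" groupNames="firstName,lastName"/>
  <!-- splitBy: the comma separated 'emailids' value becomes the multivalued field mailId -->
  <field column="mailId" sourceColName="emailids" splitBy=","/>
  <!-- replaceWith: matches of the (assumed) regex in full_name are replaced with "_" -->
  <field column="name_underscored" sourceColName="full_name" regex="\s+" replaceWith="_"/>
</entity>
}}}
Each field uses only one of the three attributes, so the order in which the 
transformer looks for them does not matter in this sketch.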
+ 
  
  === ScriptTransformer ===
  It is possible to write transformers in Javascript or any other scripting 
language supported by Java. You must use '''Java 6''' to use this feature.
@@ -731, +734 @@

  
  === FileListEntityProcessor ===
  A simple entity processor which can be used to enumerate the list of files 
from a File System based on some criteria. It does not use a !DataSource. The 
entity attributes are:
-  *'''`fileName`''' :(required) A regex pattern to identify files
+  * '''`fileName`''' :(required) A regex pattern to identify files
-  *'''`baseDir`''' : (required) The Base directory (absolute path)
+  * '''`baseDir`''' : (required) The Base directory (absolute path)
-  *'''`recursive`''' : Recursive listing or not.default is 'false '
+  * '''`recursive`''' : Recursive listing or not. Default is 'false'
   * '''`excludes`''' : A Regex pattern of excluded file names
   * '''`newerThan`''' : A date param. Use the format (`yyyy-MM-dd HH:mm:ss`). 
It can also be a datemath string, e.g. ('NOW-3DAYS'); the single quotes are 
necessary. Or it can be a valid variableresolver format like (${var.name})
   * '''`olderThan`''' : A date param. Same rules as above
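
For illustration only (this is not part of the wiki change), the attributes 
listed above could be combined roughly as follows. The entity name 'files', 
the directory and both patterns are hypothetical; only the attribute names are 
taken from the list.
{{{
<entity name="files" processor="FileListEntityProcessor"
        baseDir="/path/to/docs"
        fileName=".*\.xml"
        excludes=".*\.bak"
        recursive="true"
        newerThan="'NOW-3DAYS'">
  <!-- nested entities that read each listed file would go here -->
</entity>
}}}
Note the single quotes inside the double quotes of the 'newerThan' value, as 
required for a datemath string.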
