Dear Wiki user, You have subscribed to a wiki page or wiki category on "Solr Wiki" for change notification.
The following page has been changed by FergusMcMenemie:
http://wiki.apache.org/solr/DataImportHandler

The comment on the change is:
typos plus improvements to regexp transformer explanation

------------------------------------------------------------------------------
  {{{
  <dataSource type="JdbcDataSource" driver="com.mysql.jdbc.Driver" url="jdbc:mysql://localhost/dbname" user="db_username" password="db_password"/>
  }}}
-  * The datasource configuration can also be done in solr config xml [#solrconfigdatasource also]
+  * The datasource configuration can also be done in solr config xml [#solrconfigdatasource]
   * The attribute 'type' specifies the implementation class. It is optional. The default value is `'JdbcDataSource'`
   * The attribute 'name' can be used if there are [#multipleds multiple datasources] used by multiple entities
   * All other attributes in the <dataSource> tag are specific to the particular dataSource implementation being configured.

@@ -505, +505 @@
  </entity>
  }}}
- In this example the attributes 'regex' and 'sourceColName' are custom attributes used by the transformer. It reads the field 'full_name' from the resultset and transforms it to two target fields 'firstName' and 'lastName' . So even though the query returned only one column 'full_name' in the resultset the solr document gets two extra fields 'firstName' and 'lastName' wich are 'derived' fields.
+ In this example the attributes 'regex' and 'sourceColName' are custom attributes used by the transformer. It reads the field 'full_name' from the resultset and transforms it to two new target fields 'firstName' and 'lastName'. So even though the query returned only one column 'full_name' in the resultset the solr document gets two extra fields 'firstName' and 'lastName' which are 'derived' fields. These new fields are only created if the regexp matches.
  The 'emailids' field in the table can be a comma separated value. So it ends up giving out one or more than one email ids and we expect the 'mailId' to be a multivalued field in Solr.
+
+ <!> Note that this transformer can either be used to split a string into tokens based on a '''`splitBy`''' pattern, or to perform a string substitution as per '''`replaceWith`''', or it can assign groups within a pattern to a list of '''`groupNames`'''. It decides what it is to do based upon the above attributes '''`splitBy`''', '''`replaceWith`''' and '''`groupNames`''' which are looked for in order. This first one found is acted upon and other unrelated attributes are ignored.
+
  === ScriptTransformer ===
  It is possible to write transformers in Javascript or any other scripting language supported by Java. You must use '''Java 6''' to use this feature.

@@ -731, +734 @@
  === FileListEntityProcessor ===
  A simple entity processor which can be used to enumerate the list of files from a File System based on some criteria. It does not use a !DataSource. The entity attributes are:
-  *'''`fileName`''' :(required) A regex pattern to identify files
+  * '''`fileName`''' :(required) A regex pattern to identify files
-  *'''`baseDir`''' : (required) The Base directory (absolute path)
+  * '''`baseDir`''' : (required) The Base directory (absolute path)
-  *'''`recursive`''' : Recursive listing or not.default is 'false '
+  * '''`recursive`''' : Recursive listing or not.default is 'false '
   * '''`excludes`''' : A Regex pattern of excluded file names
   * '''`newerThan`''' : A date param . Use the format (`yyyy-MM-dd HH:mm:ss`) . It can also be a datemath string eg: ('NOW-3DAYS'). The single quote is necessary . Or it can be a valid variableresolver format like (${var.name})
   * '''`olderThan`''' : A date param . Same rules as above
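
As an illustration only, the !FileListEntityProcessor attributes listed above might be combined as in the sketch below. It is not taken from the wiki page: the entity name `files`, the directory `/var/data/docs` and both regex patterns are hypothetical placeholders, and the `processor` attribute is assumed to be how the entity selects this processor.
{{{
<dataConfig>
  <document>
    <!-- Hypothetical entity: name, directory and patterns are placeholders -->
    <entity name="files"
            processor="FileListEntityProcessor"
            baseDir="/var/data/docs"
            fileName=".*\.xml"
            excludes=".*\.bak"
            recursive="true"
            newerThan="'NOW-3DAYS'"/>
    <!-- baseDir: absolute path (required); fileName: regex of files to list (required) -->
    <!-- newerThan: datemath string, the inner single quotes are part of the value -->
  </document>
</dataConfig>
}}}
Under these assumptions the entity would list the `.xml` files changed in the last three days under `/var/data/docs` and its subdirectories, skipping names that match the `excludes` pattern. No `<dataSource>` element appears because this processor does not use one.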
