Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Solr Wiki" for change notification.
The following page has been changed by NoblePaul:
http://wiki.apache.org/solr/DataImportHandler

------------------------------------------------------------------------------
  '!EntityProcessor' rely on the !DataSource for fetching data. The return type of the !DataSource is important for an !EntityProcessor. The in-built ones are,
  * '''!SqlEntityProcessor''' : This is the defaut. The !DataSource must be of type `DataSourec<Iterator<Map<String, Object>>` . !JdbcDataSource can be used with this.
  * '''X!PathEntityProcessor''' : Used for XML type datasource. The !DataSource must be of type `DataSourec<Reader>` . !HttpDataSource or !FileDataSource can be used with this
- * '''!FileListEntityProcessor''' : A simple one which can be used to enumerate the lost of files from a File System based on some criteria. It does not use a !DataSource
+ * '''!FileListEntityProcessor''' : A simple one which can be used to enumerate the list of files from a File System based on some criteria. It does not use a !DataSource . The entity attributes are..
+ *'''`fileName`''' :(required) A regex pattern to identify files
+ *'''`baseDir`''' : (required) The Base directory (absolute path)
+ *'''`recursive`''' : Recursive listing or not.default is 'false '
+ * '''`excludes`''' : A Regex pattern of excluded file names
+ * '''`newerThan`''' : A date param . Use the format (`yyyy-MM-dd HH:mm:ss`) . It can also be a datemath string eg: ('NOW-3DAYS'). The single quote is necessary . Or it can be a valid variableresolver format like (${var.name})
+ * '''`olderThan`''' : A date param . Same rules as above
+ example:
+ {{{
+ <entity name="f" processor="FileListEntityProcessor" fileName=".*xml" newerThan="'NOW-3DAYS'" recursive="true" rootEntity="false">
+     <entity processor="XPathEntityProcessor" forEach="/the/record/xpath" url="${f.fileAbsolutePath}">
+         <field column="full_name" xpath="/field/xpath"/>
+     </entity>
+ </entity>
+ }}}
+ 
+ Do not miss the `rootEntity` attribute.
The implicit fields generated by the processor are `fileAbsolutePath,fileSize,fileLastModified,fileName`
  [[Anchor(datasource)]]
  == DataSource ==
@@ -569, +585 @@
  * The end output of each entity is combined together to construct a document
  * Note that the intermediate rows from C i.e `C.1, C.2, f(C.1) , f(C1)` are ignored
  == Field declarations ==
- Fields declared in the <entity> tags help us provide extra information which cannot be derived automatically. The tool relies on the 'column' values to fetch values from the results. The fields you explicitly add in the configuration are equivalent to the fields which are present in the solr schema.xml (implicit fields). It automatically inherits all the attributes present in the schema.xml. Just that you cannot add extra configuratio. Add the field entries when,
+ Fields declared in the <entity> tags help us provide extra information which cannot be derived automatically. The tool relies on the 'column' values to fetch values from the results. The fields you explicitly add in the configuration are equivalent to the fields which are present in the solr schema.xml (implicit fields). It automatically inherits all the attributes present in the schema.xml. Just that you cannot add extra configuration. Add the field entries when,
  * The fields emitted from the !EntityProcessor has a different name than the field in schema.xml
  * With in-built transformers . They expect extra information to decide which fields to process and how to process
  * X!PathEntityprocessor or any other processors which explicitly demand extra information in each fields
