Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Solr Wiki" for change 
notification.

The following page has been changed by NoblePaul:
http://wiki.apache.org/solr/DataImportHandler

------------------------------------------------------------------------------
  '!EntityProcessor' rely on the !DataSource for fetching data. The return type 
of the !DataSource is important for an !EntityProcessor. The in-built ones are,
   * '''!SqlEntityProcessor''' : This is the default. The !DataSource must be of 
type `DataSource<Iterator<Map<String, Object>>>` . !JdbcDataSource can be used 
with this.
   * '''X!PathEntityProcessor''' : Used for XML type datasources. The 
!DataSource must be of type `DataSource<Reader>` . !HttpDataSource or 
!FileDataSource can be used with this.
-  * '''!FileListEntityProcessor'''  : A simple one which can be used to 
enumerate the lost of files from a File System based on some criteria. It does 
not use a !DataSource 
+  * '''!FileListEntityProcessor'''  : A simple one which can be used to 
enumerate the list of files from a File System based on some criteria. It does 
not use a !DataSource . The entity attributes are:
+    * '''`fileName`''' : (required) A regex pattern to identify files
+    * '''`baseDir`''' : (required) The base directory (absolute path)
+    * '''`recursive`''' : Whether to list files recursively. Default is 'false'
+    * '''`excludes`''' : A regex pattern of excluded file names
+    * '''`newerThan`''' : A date param. Use the format (`yyyy-MM-dd 
HH:mm:ss`). It can also be a datemath string, e.g. ('NOW-3DAYS'); the single 
quotes are necessary. Or it can be a valid variableresolver format like 
(${var.name})
+    * '''`olderThan`''' : A date param. Same rules as for `newerThan`
+  example:
+ {{{
+ <entity name="f" processor="FileListEntityProcessor" fileName=".*xml" 
newerThan="'NOW-3DAYS'" recursive="true" rootEntity="false">
+   <entity processor="XPathEntityProcessor" forEach="/the/record/xpath" 
url="${f.fileAbsolutePath}">
+      <field column="full_name" xpath="/field/xpath"/> 
+   </entity>
+ </entity>
+ }}}
+ 
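+ Do not omit the `rootEntity="false"` attribute: it makes the nested entity 
the root entity, so a document is created for each record rather than for each 
file. The implicit fields generated by the processor are `fileAbsolutePath`, 
`fileSize`, `fileLastModified` and `fileName`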
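
For context, here is a rough sketch of how the entity above might sit inside a complete data-config.xml. It is only an illustration: the `baseDir` path (`/data/xml`), the `FileDataSource` declaration and the xpath expressions are assumptions chosen for the example, not values taken from the wiki page.
{{{
<dataConfig>
  <!-- Assumed: FileDataSource supplies the Reader that XPathEntityProcessor needs -->
  <dataSource type="FileDataSource"/>
  <document>
    <!-- Outer entity only lists files; baseDir is required (hypothetical path) -->
    <entity name="f" processor="FileListEntityProcessor" baseDir="/data/xml"
            fileName=".*xml" newerThan="'NOW-3DAYS'" recursive="true"
            rootEntity="false">
      <!-- Inner entity reads each listed file via the implicit fileAbsolutePath field -->
      <entity name="rec" processor="XPathEntityProcessor"
              forEach="/the/record/xpath" url="${f.fileAbsolutePath}">
        <field column="full_name" xpath="/field/xpath"/>
      </entity>
    </entity>
  </document>
</dataConfig>
}}}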
  
  [[Anchor(datasource)]]
  == DataSource ==
@@ -569, +585 @@

   * The end output of each entity is combined together to construct a document
     * Note that the intermediate rows from C i.e `C.1, C.2, f(C.1) , f(C1)` 
are ignored
  == Field declarations ==
- Fields declared in the <entity> tags help us provide extra information which 
cannot be derived automatically. The tool relies on the 'column' values to 
fetch values from the results. The fields you explicitly add in the 
configuration are equivalent to the fields which are present in the solr 
schema.xml (implicit fields). It automatically inherits all the attributes 
present in the schema.xml. Just that you cannot add extra configuratio. Add the 
field entries when,
+ Fields declared in the <entity> tags provide extra information which 
cannot be derived automatically. The tool relies on the 'column' values to 
fetch values from the results. The fields you explicitly add in the 
configuration are equivalent to the fields which are present in the Solr 
schema.xml (implicit fields). They automatically inherit all the attributes 
present in the schema.xml, except that you cannot add extra configuration. Add 
the field entries when:
   * The fields emitted from the !EntityProcessor have a different name than the 
field in schema.xml
   * In-built transformers are used. They expect extra information to decide 
which fields to process and how to process them
   * X!PathEntityProcessor or any other processor explicitly demands 
extra information for each field
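
The three cases above could look roughly like this. It is a sketch only: the table, column and schema field names (`item`, `ITEM_ID`, `LAST_MOD`, `id`, `last_modified`, `title`), the file path and the xpath are hypothetical, and a default !DataSource matching each processor is assumed to be declared elsewhere in the configuration.
{{{
<!-- Cases 1 and 2: a SQL entity using an in-built transformer -->
<entity name="item" transformer="DateFormatTransformer"
        query="select ITEM_ID, LAST_MOD from item">
  <!-- Column name differs from the schema.xml field name -->
  <field column="ITEM_ID" name="id"/>
  <!-- The transformer needs extra information (the date format) on the field -->
  <field column="LAST_MOD" name="last_modified"
         dateTimeFormat="yyyy-MM-dd HH:mm:ss"/>
</entity>

<!-- Case 3: XPathEntityProcessor needs an xpath on every field -->
<entity name="rec" processor="XPathEntityProcessor"
        forEach="/records/record" url="/data/xml/records.xml">
  <field column="title" xpath="/records/record/title"/>
</entity>
}}}
Fields whose column name already matches a schema.xml field and which need no extra attributes can be left out of the first entity altogether.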
