[Solr Wiki] Update of "DataImportHandler" by NoblePaul

Apache Wiki Sun, 27 Apr 2008 21:53:49 -0700

Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Solr Wiki" for change 
notification.


The following page has been changed by NoblePaul:
http://wiki.apache.org/solr/DataImportHandler

The comment on the change is:
EntityProcessor documentation

------------------------------------------------------------------------------
  
  You can use this feature for indexing from REST APIs such as RSS/Atom feeds, 
XML data feeds, other Solr servers or even well-formed XHTML documents. Our 
XPath support has its limitations (no wildcards, only full paths, etc.) but we 
have tried to make sure that common use cases are covered. Since it is based 
on a streaming parser, it is extremely fast and consumes a constant amount of 
memory even for large XML files. It does not support namespaces, but it can 
handle XML with namespaces. When you provide the xpath, just drop the namespace 
prefix and give the rest (e.g. if the tag is `'<dc:subject>'` the mapping should 
just contain `'subject'`). Easy, isn't it? And you didn't need to write one 
line of code! Enjoy :)
  
- note: Unlike with database , it is note possible to omit the field 
declarations if you are using X!PathEntityProcessor. It relies on the xpaths 
declared in the fields to identify what to extract from the xml. 
+ Note: Unlike with a database, it is not possible to omit the field 
declarations if you are using X!PathEntityProcessor. It relies on the xpaths 
declared in the fields to identify what to extract from the XML. 
  = Extending the tool with APIs =
  The examples we explored are, admittedly, trivial. It is not possible to meet 
all user needs with an XML configuration alone, so we expose a few interfaces 
that users can implement to extend the functionality.
  
@@ -502, +502 @@

  The rules for the template are the same as for the templates in 'query', 'url' 
etc. It helps to concatenate multiple values or add extra characters to a field 
for injection. It only applies to fields which have a 'template' attribute.
  ==== Attributes ====
   * '''`template`''' : The template string. In the above example there are two 
placeholders, '${e.name}' and '${eparent.surname}'. Both values must be present 
when the template is evaluated; otherwise it is not evaluated. 
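As a rough illustration of the "both values must be present" rule, the sketch below fills `${...}` placeholders from a map and gives up as soon as one value is missing. The names `TemplateSketch` and `resolve`, the sample template string and the sample values are all hypothetical; this is not the DataImportHandler implementation, only a minimal model of the behaviour described above.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only; not the DataImportHandler implementation.
class TemplateSketch {
    // Matches ${some.name} and captures the name between the braces.
    private static final Pattern PLACEHOLDER = Pattern.compile("\\$\\{([^}]+)\\}");

    /** Returns the filled-in template, or null if any placeholder has no value. */
    static String resolve(String template, Map<String, Object> values) {
        Matcher m = PLACEHOLDER.matcher(template);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            Object value = values.get(m.group(1));
            if (value == null) {
                return null; // a value is missing, so the template is not evaluated
            }
            m.appendReplacement(out, Matcher.quoteReplacement(value.toString()));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, Object> values = new HashMap<>();
        values.put("e.name", "John");            // sample data, assumed
        values.put("eparent.surname", "Smith");  // sample data, assumed
        System.out.println(resolve("${e.name},${eparent.surname}", values));

        values.remove("eparent.surname");
        System.out.println(resolve("${e.name},${eparent.surname}", values));
    }
}
```

Returning `null` for the "not evaluated" case is a choice made for this sketch; the real transformer may signal it differently.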
- 
+ [[Anchor(entityprocessor)]]
  == EntityProcessor ==
  Each entity is handled by default by an entity processor called 
!SqlEntityProcessor. This works well for systems which use an RDBMS as a 
datasource. For other kinds of datasources, such as REST or non-SQL datasources, 
you can choose to implement the interface 
`org.apache.solr.handler.dataimport.EntityProcessor`. It is designed to 
stream rows one by one from an entity. The simplest way to implement your own 
!EntityProcessor is to extend !EntityProcessorBase and override the 
`public Map<String,Object> nextRow()` method.
+ An !EntityProcessor relies on the !DataSource for fetching data. The return type 
of the !DataSource is important for an !EntityProcessor. The built-in ones are:
+  * '''!SqlEntityProcessor''' : This is the default. The !DataSource must be of 
type `DataSource<Iterator<Map<String, Object>>>`. !JdbcDataSource can be used 
with this.
+  * '''X!PathEntityProcessor''' : Used for XML type datasources. The 
!DataSource must be of type `DataSource<Reader>`. !HttpDataSource or 
!FileDataSource can be used with this.
+  * '''!FileListEntityProcessor''' : A simple one which can be used to 
enumerate the list of files from a file system based on some criteria. It does 
not use a !DataSource.
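The relationship between a data source's return type and the processor that consumes it can be sketched without Solr on the classpath. The types below (`SketchDataSource`, `SketchEntityProcessor`, `EntityProcessorSketchDemo`) are hypothetical stand-ins that only mirror the shape described above: the source returns an `Iterator<Map<String, Object>>` and the processor hands out one row per `nextRow()` call. Returning `null` once the rows are exhausted is an assumption of this sketch, as are the sample query and data.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in mirroring the shape of DataSource<T>; not the Solr type.
interface SketchDataSource<T> {
    T getData(String query);
}

// Hypothetical processor that streams rows one by one from its data source.
class SketchEntityProcessor {
    private final Iterator<Map<String, Object>> rows;

    SketchEntityProcessor(SketchDataSource<Iterator<Map<String, Object>>> source,
                          String query) {
        this.rows = source.getData(query);
    }

    /** Returns the next row, or null when there are no more rows (assumed contract). */
    public Map<String, Object> nextRow() {
        return rows.hasNext() ? rows.next() : null;
    }
}

class EntityProcessorSketchDemo {
    /** Builds one sample row with "id" and "name" fields. */
    static Map<String, Object> row(String id, String name) {
        Map<String, Object> r = new HashMap<>();
        r.put("id", id);
        r.put("name", name);
        return r;
    }

    /** Drains the processor and collects the "name" field of each row. */
    static List<Object> names(SketchEntityProcessor processor) {
        List<Object> out = new ArrayList<>();
        for (Map<String, Object> r = processor.nextRow(); r != null; r = processor.nextRow()) {
            out.add(r.get("name"));
        }
        return out;
    }

    public static void main(String[] args) {
        // In-memory source standing in for something like a JDBC-backed one.
        SketchDataSource<Iterator<Map<String, Object>>> source =
            query -> Arrays.asList(row("1", "a"), row("2", "b")).iterator();
        System.out.println(names(new SketchEntityProcessor(source, "select * from item")));
    }
}
```

A processor for XML data would follow the same pattern with a source typed to return a `java.io.Reader` instead of an iterator of rows.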
  
  [[Anchor(datasource)]]
  == DataSource ==
@@ -516, +520 @@

  
      /**Get records for the given query. This is designed to stream records 
using an iterator
       * @param query The query string. It can be an SQL query for an RDBMS.
-      * @return an Object which the Entityprocessor understands. For instanc, 
JdbcDataSource returns an Iterator<Map<String,Object>> and HttpDataSource and 
FileDataSource returs a java.io.reader
+      * @return an Object which the EntityProcessor understands. For instance, 
JdbcDataSource returns an Iterator<Map<String,Object>> and HttpDataSource and 
FileDataSource return a java.io.Reader
       */
      public T getData(String query);
