Dear Wiki user, You have subscribed to a wiki page or wiki category on "Solr Wiki" for change notification.
The following page has been changed by NoblePaul:
http://wiki.apache.org/solr/DataImportHandler

The comment on the change is:
EntityProcessor documentation

------------------------------------------------------------------------------
  You can use this feature for indexing from REST APIs such as rss/atom feeds, XML data feeds, other SOLR servers or even well-formed xhtml documents. Our XPath support has its limitations (no wildcards, only full paths, etc.) but we have tried to make sure that common use-cases are covered, and since it's based on a streaming parser, it is extremely fast and consumes a constant amount of memory even for large XMLs. It does not support namespaces, but it can handle xmls with namespaces. When you provide the xpath, just drop the namespace and give the rest (e.g. if the tag is `'<dc:subject>'` the mapping should just contain `'subject'`). Easy, isn't it? And you didn't need to write one line of code! Enjoy :)
  
- note: Unlike with database , it is note possible to omit the field declarations if you are using X!PathEntityProcessor. It relies on the xpaths declared in the fields to identify what to extract from the xml.
+ note: Unlike with a database, it is not possible to omit the field declarations if you are using X!PathEntityProcessor. It relies on the xpaths declared in the fields to identify what to extract from the xml.
  
  = Extending the tool with APIs =
  The examples we explored are, admittedly, trivial. It is not possible to have all user needs met by an xml configuration alone. So we expose a few interfaces which can be implemented by the user to enhance the functionality.
  
@@ -502, +502 @@

  The rules for the template are the same as for the templates in 'query', 'url', etc. It helps to concatenate multiple values or add extra characters to a field for injection. It only applies to fields which have a 'template' attribute.
  ==== Attributes ====
   * '''`template`''' : The template string.
  In the above example there are two placeholders, '${e.name}' and '${eparent.surname}'. Both values must be present when the template is evaluated; otherwise it is not evaluated.
  
- 
+ [[Anchor(entityprocessor)]]
  == EntityProcessor ==
  Each entity is handled by a default entity processor called !SqlEntityProcessor. This works well for systems which use an RDBMS as a datasource. For other kinds of datasources, such as REST or non-SQL datasources, you can choose to implement the interface `org.apache.solr.handler.dataimport.EntityProcessor`. It is designed to stream rows one by one from an entity. The simplest way to implement your own !EntityProcessor is to extend !EntityProcessorBase and override the `public Map<String,Object> nextRow()` method.
+ An !EntityProcessor relies on the !DataSource for fetching data. The return type of the !DataSource is important for an !EntityProcessor. The built-in ones are:
+  * '''!SqlEntityProcessor''' : This is the default. The !DataSource must be of type `DataSource<Iterator<Map<String, Object>>>`. !JdbcDataSource can be used with this.
+  * '''X!PathEntityProcessor''' : Used for XML datasources. The !DataSource must be of type `DataSource<Reader>`. !HttpDataSource or !FileDataSource can be used with this.
+  * '''!FileListEntityProcessor''' : A simple one which can be used to enumerate the list of files from a file system based on some criteria. It does not use a !DataSource.
  
  [[Anchor(datasource)]]
  == DataSource ==
@@ -516, +520 @@

  /** Get records for the given query. This is designed to stream records using an iterator.
   * @param query The query string. It can be an SQL statement for an RDBMS.
-  * @return an Object which the Entityprocessor understands. For instanc, JdbcDataSource returns an Iterator<Map<String,Object>> and HttpDataSource and FileDataSource returs a java.io.reader
+  * @return an Object which the EntityProcessor understands. For instance, JdbcDataSource returns an Iterator<Map<String,Object>> and HttpDataSource and FileDataSource return a java.io.Reader
   */
  public T getData(String query);
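
The row-at-a-time contract of `nextRow()` described in the EntityProcessor section can be sketched without Solr on the classpath. This is only an illustration under stated assumptions: `RowProcessor` is a hypothetical stand-in for EntityProcessorBase (it is not the Solr class), `IteratorRowProcessor`, `drain` and `row` are made-up names, and returning null to signal the end of the rows is assumed here rather than taken from the page.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class NextRowSketch {

    /** Hypothetical stand-in for EntityProcessorBase; NOT the Solr class. */
    abstract static class RowProcessor {
        /** Returns the next row, or null when exhausted (assumed convention). */
        public abstract Map<String, Object> nextRow();
    }

    /** Streams rows from an Iterator<Map<String,Object>>, the shape a JDBC-style source hands over. */
    static class IteratorRowProcessor extends RowProcessor {
        private final Iterator<Map<String, Object>> rows;

        IteratorRowProcessor(Iterator<Map<String, Object>> rows) {
            this.rows = rows;
        }

        @Override
        public Map<String, Object> nextRow() {
            // Hand out exactly one row per call; never buffer the whole result.
            return rows.hasNext() ? rows.next() : null;
        }
    }

    /** Pulls rows one at a time until the processor signals the end. */
    static List<Map<String, Object>> drain(RowProcessor processor) {
        List<Map<String, Object>> out = new ArrayList<>();
        Map<String, Object> row;
        while ((row = processor.nextRow()) != null) {
            out.add(row);
        }
        return out;
    }

    /** Builds a one-column row for the demo. */
    static Map<String, Object> row(String name) {
        Map<String, Object> r = new LinkedHashMap<>();
        r.put("name", name);
        return r;
    }

    public static void main(String[] args) {
        List<Map<String, Object>> source = new ArrayList<>();
        source.add(row("a"));
        source.add(row("b"));
        List<Map<String, Object>> rows = drain(new IteratorRowProcessor(source.iterator()));
        System.out.println(rows.size() + " rows streamed");
    }
}
```

A real implementation would extend the actual base class and obtain its data from the configured DataSource; the loop in `drain` only mimics how a caller might consume rows one by one.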
