Dear Wiki user, You have subscribed to a wiki page or wiki category on "Solr Wiki" for change notification.
The following page has been changed by NoblePaul:
http://wiki.apache.org/solr/DataImportHandler

The comment on the change is:
Documentation on Interactive development mode

------------------------------------------------------------------------------
   * '''delta-import''' : For incremental imports and change detection, run the command `http://<host>:<port>/solr/dataimport?command=delta-import`
   * '''status''' : To know the status of the current command, hit the URL `http://<host>:<port>/solr/dataimport`. It gives detailed statistics on the number of docs created, deleted, queries run, rows fetched, status etc.
   * '''reload-config''' : If the data-config is changed and you wish to reload the file without restarting Solr, run the command `http://<host>:<port>/solr/dataimport?command=reload-config`
-  * '''abort''' : Abort an ongoing opertaion by hitting the url `http://<host>:<port>/solr/dataimport?command=abort`
+  * '''abort''' : Abort an ongoing operation by hitting the url `http://<host>:<port>/solr/dataimport?command=abort`
+  * '''status''' : See the current status by hitting the url `http://<host>:<port>/solr/dataimport`
  
  == Full Import Example ==
@@ -363, +364 @@
  What about this ''transformer=!DateFormatTransformer'' attribute in the entity? See [#DateFormatTransformer DateFormatTransformer] Section for details
  
  You can use this feature for indexing from REST APIs such as rss/atom feeds, XML data feeds, other SOLR servers or even well-formed xhtml documents. Our XPath support has its limitations (no wildcards, only full paths etc.) but we have tried to make sure that common use-cases are covered, and since it's based on a streaming parser, it is extremely fast and consumes a constant amount of memory even for large XML files. It does not support namespaces, but it can handle XML with namespaces. When you provide the xpath, just drop the namespace and give the rest (e.g. if the tag is `'<dc:subject>'` the mapping should just contain `'subject'`). Easy, isn't it?
  And you didn't need to write one line of code! Enjoy :)
+ 
+ note: Unlike with a database, it is not possible to omit the field declarations if you are using X!PathEntityProcessor. It relies on the xpaths declared in the fields to identify what to extract from the xml.
  
  = Extending the tool with APIs =
  The examples we explored are admittedly trivial. It is not possible to have all user needs met by an xml configuration alone. So we expose a few interfaces which can be implemented by the user to enhance the functionality.
@@ -483, +486 @@
  eg:
  {{{
  <entity name="e" transformer="TemplateTransformer" ..>
- <field column="price" template="hello${e.name},${eparent.surname}" />
+ <field column="namedesc" template="hello${e.name},${eparent.surname}" />
  ...
  </entity>
  }}}
@@ -555, +558 @@
   * Each row that comes out of C is fed into 'f' and 'g' sequentially (transformers are chained). Each transformer can change the input. Note that the transformer 'g' produces 2 output rows for an input row `f(C.1)`
   * The end output of each entity is combined together to construct a document
    * Note that the intermediate rows from C i.e `C.1, C.2, f(C.1) , f(C1)` are ignored
- 
+ == Field declarations ==
+ Fields declared in the <entity> tags help us provide extra information which cannot be derived automatically. The tool relies on the 'column' values to fetch values from the results. The fields you explicitly add in the configuration are equivalent to the fields which are present in the solr schema.xml (implicit fields). They automatically inherit all the attributes present in the schema.xml; you just cannot add extra configuration. Add the field entries when:
+  * The fields emitted from the !EntityProcessor have a different name than the field in schema.xml
+  * With in-built transformers.
They expect extra information to decide which fields to process and how to process them
+  * X!PathEntityProcessor or any other processors which explicitly demand extra information in each field
  
  == What is a row? ==
- A row in !DataImportHandler is a Map (Map<String, Object). In the map , the key is the name of the field and the value can be anything which is a valid Solr type. The value can also be a Collection of the valid Solr types (this may get mapped to a multi-valued field). If the DataSource is RDBMS a query cannot emit a multivalued field. But it is possible to create a multivalued field by joining an entity with another.i.e if the sub-entity returns multiple rows for one row from parent entity it can go into a multivalued field. If the datadource is xml it is possible to return a multivalued field.
+ A row in !DataImportHandler is a Map (Map<String, Object). In the map , the key is the name of the field and the value can be anything which is a valid Solr type. The value can also be a Collection of the valid Solr types (this may get mapped to a multi-valued field). If the DataSource is RDBMS a query cannot emit a multivalued field. But it is possible to create a multivalued field by joining an entity with another.i.e if the sub-entity returns multiple rows for one row from parent entity it can go into a multivalued field. If the datasource is xml, it is possible to return a multivalued field.
  
  == A VariableResolver ==
  A !VariableResolver is the component which replaces all those placeholders such as `${<name>}`. It is a multilevel Map. Each namespace is a Map and namespaces are separated by periods (.). E.g. if there is a placeholder ${item.ID}, 'item' is a namespace (which is a map) and 'ID' is a value in that namespace. It is possible to nest namespaces like ${item.x.ID} where x could be another Map. A reference to the current !VariableResolver can be obtained from the Context.
  Or the object can be directly consumed by using ${<name>} in 'query' for RDBMS queries or 'url' in Http.
@@ -567, +574 @@
   * ''escapeSql'' : Use this to escape special sql characters. eg : `'${dataimporter.functions.escapeSql(item.ID)}'`. Takes only one argument, which must be a valid value in the !VariableResolver.
   * ''encodeUrl'' : Use this to encode urls. eg : `'${dataimporter.functions.encodeUrl(item.ID)}'`. Takes only one argument, which must be a valid value in the !VariableResolver
- 
+ = Interactive Development Mode =
+ This is a new and powerful feature in the tool. It helps you build a data-config xml with the UI. It can be accessed from http://host:port/solr/admin/dataimport.jsp . The features are:
+  * A UI with two panels. RHS takes in the input and LHS shows the output
+  * When you hit the button 'debug now' it runs the configuration and shows the documents created
+  * You can configure the start and rows parameters to debug a range of documents, say 115 to 118.
+  * Choose the 'verbose' option to get detailed information about the intermediate steps: what was emitted by the query, what went into the Transformer and what the output was.
+  * If an exception occurred during the run, the stacktrace is shown right there
+  * The fields produced by the Entities, Transformers may not be visible in documents if the fields are neither present in the schema.xml nor covered by an explicit <field> declaration
  
  = Where to find it? =
  DataImportHandler is not in SOLR right now. You can either:
