Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Solr Wiki" for change notification.
The following page has been changed by NoblePaul:
http://wiki.apache.org/solr/DataImportHandler

The comment on the change is:
datasource documentation

------------------------------------------------------------------------------

  {{{
  <dataSource type="JdbcDataSource" driver="com.mysql.jdbc.Driver" url="jdbc:mysql://localhost/dbname" user="db_username" password="db_password"/>
  }}}
- '''note''' The 'type' attribute is optional . The default value is `'JdbcDataSource'`
-
- The datasource configuration can also be done in solr config xml [#solrconfigdatasource also] . The attributes other than 'type' and 'name' are not decided by the datasource implementation. Each one can decide what it needs. We will discuss them as we see them
-
+ * The datasource configuration can [#solrconfigdatasource also] be done in solrconfig.xml
+ * The attribute 'type' specifies the implementation class. It is optional. The default value is `'JdbcDataSource'`
+ * The attribute 'name' can be used if there are [#multipleds multiple datasources] used by multiple entities
+ * All other attributes in the <dataSource> tag are specific to the !DataSource implementation in use. [#jdbcdatasource See here] for attributes used by !JdbcDataSource and [#httpds see here] for !HttpDataSource
+ * [#datasource See here] for plugging in your own !DataSource implementation
+ [[Anchor(multipleds)]]
  === Multiple DataSources ===
  It is possible to have more than one datasource in a configuration. To configure an extra datasource, just add another 'dataSource' tag. There is an implicit attribute "name" for a datasource. If there is more than one, each extra datasource must be identified by a unique name, e.g. `'name="datasource-2"'`.
@@ -74, +76 @@
  </entity>
  ..
  }}}
+ [[Anchor(jdbcdatasource)]]
  == Configuring JdbcDataSource ==
  The attributes accepted by !JdbcDataSource are:
  * '''`driver`''' (required): The JDBC driver class name
@@ -95, +98 @@
  * '''`name`''' (required): A unique name used to identify an entity
  * '''`processor`''': Required only if the datasource is not RDBMS. (The default value is `SqlEntityProcessor`)
  * '''`transformer`''': Transformers to be applied on this entity. (See the transformer section)
- * '''`dataSource`''' : The name of a datasource as put in the solrconfig.xml .(USed if there are multiple datasources)
+ * '''`dataSource`''' : The name of a datasource as put in the solrconfig.xml .(Used if there are multiple datasources)
  * '''`pk`''': The primary key for the entity. Only needed for the root entity. This will be the id for the document
  * '''`rootEntity`''': By default the entities falling under the document are root entities. If it is set to false, the entity directly falling under that entity will be treated as the root entity (and so on). For every row returned by the root entity a document is created in Solr
@@ -286, +289 @@
  = Usage with XML/HTTP Datasource =
  DataImportHandler can be used to index data from HTTP based data sources. This includes indexing from REST/XML APIs as well as from RSS/ATOM Feeds.
-
+ [[Anchor(httpds)]]
  == Configuration of HttpDataSource ==
  A sample configuration for !HttpDataSource in data config xml looks like this
@@ -497, +500 @@
  == EntityProcessor ==
  Each entity is handled by a default entity processor called !SqlEntityProcessor. This works well for systems which use an RDBMS as a datasource. For other kinds of datasources, like REST or non-SQL datasources, you can choose to implement the interface `org.apache.solr.handler.dataimport.EntityProcessor`. This is designed to stream rows one by one from an entity.
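As a rough illustration of this row-at-a-time contract, here is a standalone Java sketch. It deliberately does not use the DataImportHandler classes: `RowSource`, `InMemoryRowSource`, `row` and `drain` are hypothetical names, and the convention that `nextRow()` returns `null` once the entity is exhausted is an assumption of the sketch. Each row is a `Map<String, Object>` keyed by field name.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

public class RowStreamSketch {

    /** Hypothetical stand-in for an entity processor: hands out one row per call. */
    interface RowSource {
        /** Returns the next row, or null once the entity has no more rows (assumed convention). */
        Map<String, Object> nextRow();
    }

    /** Streams rows from an in-memory list instead of a real datasource. */
    static class InMemoryRowSource implements RowSource {
        private final Iterator<Map<String, Object>> rows;

        InMemoryRowSource(List<Map<String, Object>> rows) {
            this.rows = rows.iterator();
        }

        @Override
        public Map<String, Object> nextRow() {
            return rows.hasNext() ? rows.next() : null;
        }
    }

    /** Builds one row: field name -> value; the Collection value stands for a multi-valued field. */
    static Map<String, Object> row(String id, String... tags) {
        Map<String, Object> r = new HashMap<>();
        r.put("id", id);
        r.put("tag", Arrays.asList(tags));
        return r;
    }

    /** Pulls rows until nextRow() returns null, keeping one map per future document. */
    static List<Map<String, Object>> drain(RowSource source) {
        List<Map<String, Object>> docs = new ArrayList<>();
        Map<String, Object> r;
        while ((r = source.nextRow()) != null) {
            docs.add(r);
        }
        return docs;
    }

    public static void main(String[] args) {
        RowSource source = new InMemoryRowSource(Arrays.asList(
                row("1", "solr", "dih"),
                row("2", "xml")));
        List<Map<String, Object>> docs = drain(source);
        System.out.println(docs.size());             // 2
        System.out.println(docs.get(0).get("tag"));  // [solr, dih]
    }
}
```

Because the consumer only ever asks for the next row, a real processor can fetch lazily (a result set, a feed) without holding the whole entity in memory.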
  The simplest way to implement your own !EntityProcessor is to extend !EntityProcessorBase and override the `public Map<String,Object> nextRow()` method.
-
+ [[Anchor(datasource)]]
  == DataSource ==
  A class can implement `org.apache.solr.handler.dataimport.DataSource`
  {{{
@@ -564, +567 @@
  * With in-built transformers. They expect extra information to decide which fields to process and how to process them
  * X!PathEntityProcessor or any other processors which explicitly demand extra information in each field
  == What is a row? ==
- A row in !DataImportHandler is a Map (Map<String, Object). In the map , the key is the name of the field and the value can be anything which is a valid Solr type. The value can also be a Collection of the valid Solr types (this may get mapped to a multi-valued field). If the DataSource is RDBMS a query cannot emit a multivalued field. But it is possible to create a multivalued field by joining an entity with another.i.e if the sub-entity returns multiple rows for one row from parent entity it can go into a multivalued field. If the datasource is xml, it is possible to return a multivalued field.
+ A row in !DataImportHandler is a Map (`Map<String, Object>`). In the map, the key is the name of the field and the value can be anything which is a valid Solr type. The value can also be a Collection of valid Solr types (this may get mapped to a multi-valued field). If the !DataSource is an RDBMS, a query cannot emit a multivalued field. But it is possible to create a multivalued field by joining an entity with another, i.e. if the sub-entity returns multiple rows for one row from the parent entity, they can go into a multivalued field. If the datasource is XML, it is possible to return a multivalued field directly.
  == A VariableResolver ==
  A !VariableResolver is the component which replaces all those placeholders such as `${<name>}`. It is a multilevel Map. Each namespace is a Map and namespaces are separated by periods (.).
  For example, if there is a placeholder ${item.ID}, 'item' is a namespace (which is a map) and 'ID' is a value in that namespace. It is possible to nest namespaces like ${item.x.ID}, where x could be another Map. A reference to the current !VariableResolver can be obtained from the Context. Alternatively, values can be consumed directly by using ${<name>} in 'query' for RDBMS queries or in 'url' for HTTP.
@@ -575, +578 @@
  * ''encodeUrl'': Use this to encode URLs, e.g. `'${dataimporter.functions.encodeUrl(item.ID)}'`. It takes only one argument, which must resolve to a value in the !VariableResolver
  = Interactive Development Mode =
- This is a new cool and powerful feature in the tool. It helps you build a dataconfigxml with rthe UI. It can be accessed from http://host:port/solr/admin/dataimport.jsp . The features are
+ This is a new cool and powerful feature in the tool. It helps you build a dataconfig.xml with the UI. It can be accessed from http://host:port/solr/admin/dataimport.jsp . The features are
  * A UI with two panels. RHS takes in the input and LHS shows the output
  * When you hit the button 'debug now' it runs the configuration and shows the documents created
  * You can configure the start and rows parameters to debug a range of documents, say 115 to 118.
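The namespace lookup described for the !VariableResolver (e.g. `${item.x.ID}`) and the `encodeUrl` function can be illustrated with a standalone Java sketch. This is not the DIH `VariableResolver` API: `PlaceholderSketch`, `lookup`, `encode` and `resolve` are hypothetical names, only `encodeUrl` is modelled, the `feature` table and example.com URL are made up, and turning an unresolvable name into an empty string is a choice of this sketch rather than documented DIH behaviour.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PlaceholderSketch {

    // Matches ${...} and captures what is inside the braces.
    private static final Pattern PLACEHOLDER = Pattern.compile("\\$\\{([^}]+)\\}");
    private static final String ENCODE_URL = "dataimporter.functions.encodeUrl(";

    /** Walks a dotted name such as "item.x.ID" through nested maps; null if any step is missing. */
    static Object lookup(Map<String, Object> root, String dottedName) {
        Object current = root;
        for (String part : dottedName.split("\\.")) {
            if (!(current instanceof Map)) {
                return null;
            }
            current = ((Map<?, ?>) current).get(part);
        }
        return current;
    }

    /** URL-encodes a value as UTF-8 (form encoding: spaces become '+'). */
    static String encode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // every JVM supports UTF-8
        }
    }

    /** Replaces each ${name} or ${dataimporter.functions.encodeUrl(name)} in the template. */
    static String resolve(String template, Map<String, Object> root) {
        Matcher m = PLACEHOLDER.matcher(template);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String expr = m.group(1).trim();
            boolean encodeUrl = expr.startsWith(ENCODE_URL) && expr.endsWith(")");
            String name = encodeUrl
                    ? expr.substring(ENCODE_URL.length(), expr.length() - 1).trim()
                    : expr;
            Object value = lookup(root, name);
            String text = value == null ? "" : value.toString(); // sketch choice: unknown names become ""
            if (encodeUrl) {
                text = encode(text);
            }
            m.appendReplacement(out, Matcher.quoteReplacement(text));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, Object> x = new HashMap<>();
        x.put("ID", "42");
        Map<String, Object> item = new HashMap<>();
        item.put("ID", "a b&c");
        item.put("x", x);                 // nested namespace: item.x
        Map<String, Object> root = new HashMap<>();
        root.put("item", item);           // top-level namespace: item

        System.out.println(resolve("select * from feature where item_id='${item.x.ID}'", root));
        // select * from feature where item_id='42'
        System.out.println(resolve("http://example.com/feed?id=${dataimporter.functions.encodeUrl(item.ID)}", root));
        // http://example.com/feed?id=a+b%26c
    }
}
```

`java.net.URLEncoder` stands in for `encodeUrl` here; it applies form encoding, so whether its output matches DIH's function character for character is not established by this page.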
