The following page has been changed by NoblePaul:
http://wiki.apache.org/solr/DataImportHandler

The comment on the change is:
datasource documentation

------------------------------------------------------------------------------
  {{{
  <dataSource type="JdbcDataSource" driver="com.mysql.jdbc.Driver" 
url="jdbc:mysql://localhost/dbname" user="db_username" password="db_password"/>
  }}}
- '''note''' The 'type' attribute is optional . The default value is 
`'JdbcDataSource'`
- 
- The datasource configuration can also be done in solr config xml 
[#solrconfigdatasource also] . The attributes other than 'type' and 'name' are 
not decided by the datasource implementation. Each one can decide what it 
needs. We will discuss them as we see them
- 
+  * The datasource configuration can be done in solr config xml 
[#solrconfigdatasource also]
+  * The attribute 'type' specifies the implementation class. It is optional. 
The default value is `'JdbcDataSource'`
+  * The attribute 'name' can be used if there are [#multipleds multiple 
datasources] used by multiple entities   
+  * All other attributes in the <dataSource> tag are arbitrary. They are decided 
by the !DataSource implementation. [#jdbcdatasource See here] for attributes 
used by !JdbcDataSource and [#httpds see here] for !HttpDataSource 
+  * [#datasource See here] for plugging in your own datasource
+ [[Anchor(multipleds)]]
  === Multiple DataSources ===
  It is possible to have more than one datasource in a configuration. To 
configure an extra datasource, just add another 'dataSource' tag. There 
is an implicit attribute "name" for a datasource. If there is more than one, 
each extra datasource must be identified by a unique name, for example 
`'name="datasource-2"'`.
  
@@ -74, +76 @@

  </entity>
  ..
  }}}
+ [[Anchor(jdbcdatasource)]]
  == Configuring JdbcDataSource ==
  The attributes accepted by !JdbcDataSource are:
   * '''`driver`''' (required): The JDBC driver class name
@@ -95, +98 @@

   * '''`name`''' (required) : A unique name used to identify an entity
   * '''`processor`''' : Required only if the datasource is not an RDBMS. (The 
default value is `SqlEntityProcessor`)
   * '''`transformer`'''  : Transformers to be applied on this entity. (See the 
transformer section)
-  * '''`dataSource`''' : The name of a datasource as put in the solrconfig.xml 
.(USed if there are multiple datasources) 
+  * '''`dataSource`''' : The name of a datasource as put in the solrconfig.xml. 
(Used if there are multiple datasources) 
   * '''`pk`''' : The primary key for the entity. Only needed for the root 
entity. This will be the id for the document
   * '''`rootEntity`''' : By default the entities falling under the document 
are root entities. If it is set to false, the entity directly falling under 
that entity will be treated as the root entity (and so on). For every 
row returned by the root entity a document is created in Solr
  
@@ -286, +289 @@

  
  = Usage with XML/HTTP Datasource =
  DataImportHandler can be used to index data from HTTP based data sources. 
This includes indexing from REST/XML APIs as well as from RSS/ATOM feeds.
- 
+ [[Anchor(httpds)]]
  == Configuration of HttpDataSource ==
  
  A sample configuration for !HttpDataSource in the data config xml looks like 
this
@@ -497, +500 @@

  == EntityProcessor ==
  Each entity is handled by a default entity processor called 
!SqlEntityProcessor. This works well for systems which use an RDBMS as a 
datasource. For other kinds of datasources like REST or non-SQL datasources you 
can choose to implement this interface 
`org.apache.solr.handler.dataimport.EntityProcessor`. This is designed to 
stream rows one by one from an entity. The simplest way to implement your own 
!EntityProcessor is to just extend !EntityProcessorBase and override the 
`public Map<String,Object> nextRow()` method.
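The row-by-row streaming idea described above can be sketched without any Solr dependency. The class below is a hypothetical stand-in, not the real !EntityProcessorBase: the class name, the comma-separated input and the field names `id` and `title` are all assumptions made for illustration, and returning `null` at the end of the input is a convention chosen for this sketch.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a processor that streams rows one by one.
// It does NOT extend the real EntityProcessorBase, so it compiles without Solr.
final class LineRowStreamer {
    private final Iterator<String> lines;

    LineRowStreamer(List<String> lines) {
        this.lines = lines.iterator();
    }

    // Same shape as the method named in the text: one row per call.
    // Returning null once the input is exhausted is an assumption of this sketch.
    public Map<String, Object> nextRow() {
        if (!lines.hasNext()) {
            return null;
        }
        // Each input line is assumed to look like "id, title".
        String[] parts = lines.next().split(",", 2);
        Map<String, Object> row = new LinkedHashMap<>();
        row.put("id", parts[0].trim());
        row.put("title", parts.length > 1 ? parts[1].trim() : "");
        return row;
    }

    public static void main(String[] args) {
        LineRowStreamer streamer =
            new LineRowStreamer(Arrays.asList("1, Solr in brief", "2, Importing data"));
        Map<String, Object> row;
        while ((row = streamer.nextRow()) != null) {
            System.out.println(row);
        }
    }
}
```

A real implementation would put the same kind of logic inside the overridden `nextRow()` of a class extending !EntityProcessorBase.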
  
- 
+ [[Anchor(datasource)]]
  == DataSource ==
  A class can implement `org.apache.solr.handler.dataimport.DataSource` 
  {{{
@@ -564, +567 @@

   * With in-built transformers. They expect extra information to decide which 
fields to process and how to process them
   * X!PathEntityProcessor or any other processors which explicitly demand 
extra information in each field
  == What is a row? ==
- A row in !DataImportHandler is a Map (Map<String, Object). In the map , the 
key is the name of the field and the value can be anything which is a valid 
Solr type. The value can also be a Collection of the valid Solr types (this may 
get mapped to a multi-valued field). If the DataSource is RDBMS a query cannot 
emit a multivalued field. But it is possible to create a multivalued field by 
joining an entity with another.i.e if the sub-entity returns multiple rows for 
one row from parent entity it can go into a multivalued field. If the 
datasource is xml, it is possible to return a multivalued field.
+ A row in !DataImportHandler is a Map (Map<String, Object>). In the map, the 
key is the name of the field and the value can be anything which is a valid 
Solr type. The value can also be a Collection of the valid Solr types (this may 
get mapped to a multi-valued field). If the !DataSource is an RDBMS, a query cannot 
emit a multivalued field. But it is possible to create a multivalued field by 
joining an entity with another, i.e. if the sub-entity returns multiple rows for 
one row from the parent entity it can go into a multivalued field. If the 
datasource is xml, it is possible to return a multivalued field.
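To make the row shape concrete, here is a minimal sketch of how several sub-entity rows could end up as one multi-valued entry in a parent row. It only illustrates the data structure. It is not how !DataImportHandler performs the join internally, and the class name, method name and field names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustration of a row as Map<String, Object> holding a multi-valued field.
final class RowSketch {
    // Copies the parent row and adds one entry whose value is a List built
    // from the given field of every child row (hypothetical helper).
    static Map<String, Object> withSubEntity(Map<String, Object> parent,
                                             List<Map<String, Object>> childRows,
                                             String field) {
        Map<String, Object> merged = new LinkedHashMap<>(parent);
        List<Object> values = new ArrayList<>();
        for (Map<String, Object> child : childRows) {
            Object value = child.get(field);
            if (value != null) {
                values.add(value);
            }
        }
        merged.put(field, values);
        return merged;
    }

    public static void main(String[] args) {
        Map<String, Object> parent = new LinkedHashMap<>();
        parent.put("id", 1);
        List<Map<String, Object>> children = Arrays.asList(
            Collections.<String, Object>singletonMap("tag", "search"),
            Collections.<String, Object>singletonMap("tag", "import"));
        // The "tag" entry of the merged row holds both child values.
        System.out.println(withSubEntity(parent, children, "tag"));
    }
}
```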
  
  == A VariableResolver ==
  A !VariableResolver is the component which replaces placeholders 
such as `${<name>}`. It is a multilevel Map. Each namespace is a Map and 
namespaces are separated by periods (.). E.g. if there is a placeholder 
${item.ID}, 'item' is a namespace (which is a map) and 'ID' is a value in 
that namespace. It is possible to nest namespaces like ${item.x.ID} where x 
could be another Map. A reference to the current !VariableResolver can be 
obtained from the Context. Or the object can be directly consumed by using 
${<name>} in 'query' for RDBMS queries or 'url' in Http.
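The multilevel Map lookup described above can be sketched as a walk over nested maps, one namespace per dotted segment. This is only an illustration of the idea. It is not the actual !VariableResolver class, and the class and method names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative lookup of a dotted name such as "item.x.ID" in nested maps.
final class NestedResolverSketch {
    // Walks one namespace per segment; returns null if any segment is missing
    // or if an intermediate value is not a Map.
    static Object resolve(Map<String, Object> root, String name) {
        Object current = root;
        for (String part : name.split("\\.")) {
            if (!(current instanceof Map)) {
                return null;
            }
            current = ((Map<?, ?>) current).get(part);
        }
        return current;
    }

    public static void main(String[] args) {
        Map<String, Object> x = new HashMap<>();
        x.put("ID", "nested-id");
        Map<String, Object> item = new HashMap<>();
        item.put("ID", 42);
        item.put("x", x);
        Map<String, Object> root = new HashMap<>();
        root.put("item", item);

        System.out.println(resolve(root, "item.ID"));
        System.out.println(resolve(root, "item.x.ID"));
    }
}
```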
@@ -575, +578 @@

   * ''encodeUrl'' : Use this to encode URLs, e.g. 
`'${dataimporter.functions.encodeUrl(item.ID)}'`. It takes only one argument, which 
must be a valid value in the !VariableResolver
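As a rough illustration of what URL encoding a resolved value means, the sketch below uses the standard `java.net.URLEncoder` with UTF-8. That ''encodeUrl'' behaves exactly like this is an assumption, and the class name is hypothetical.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Sketch of form-style URL encoding for a value before it is placed in a URL.
final class EncodeUrlSketch {
    // Assumes UTF-8, which every Java runtime is required to support.
    static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always available", e);
        }
    }

    public static void main(String[] args) {
        // Spaces become '+', reserved characters become %XX escapes.
        System.out.println(encode("solr wiki/DIH?x=1"));
    }
}
```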
  
  = Interactive Development Mode =
- This is a new cool and powerful feature in the tool. It helps you build a 
dataconfigxml with rthe UI. It can be accessed from 
http://host:port/solr/admin/dataimport.jsp . The features are
+ This is a new cool and powerful feature in the tool. It helps you build a 
dataconfig.xml with the UI. It can be accessed from 
http://host:port/solr/admin/dataimport.jsp . The features are
   * A UI with two panels . RHS takes in the input and LHS shows the output
   * When you hit the button 'debug now' it runs the configuration and shows 
the documents created
   * You can configure the start and rows parameters to debug specific documents, 
say 115 to 118.
