Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Solr Wiki" for change 
notification.

The following page has been changed by FergusMcMenemie:
http://wiki.apache.org/solr/DataImportHandler

The comment on the change is:
Clarifying use of URLDataSource vs HTTPDataSource plus some camel case escaping

------------------------------------------------------------------------------
  {{{
  <dataSource type="JdbcDataSource" driver="com.mysql.jdbc.Driver" 
url="jdbc:mysql://localhost/dbname" user="db_username" password="db_password"/>
  }}}
-  * The datasource configuration can be done in solr config xml 
[#solrconfigdatasource also]
+  * The datasource configuration can also be done in solr config xml 
[#solrconfigdatasource also]
   * The attribute 'type' specifies the implementation class. It is optional. 
The default value is `'JdbcDataSource'`
   * The attribute 'name' can be used if there are [#multipleds multiple 
datasources] used by multiple entities
-  * All other attributes in the <dataSource> tag are arbitrary. It is decided 
by the !DataSource implementation. [#jdbcdatasource See here] for attributes 
used by !JdbcDataSource and [#httpds see here] for URL!DataSource
+  * All other attributes in the <dataSource> tag are specific to the 
particular dataSource implementation being configured. 
   * [#datasource See here] for plugging in your own
  [[Anchor(multipleds)]]
  === Multiple DataSources ===
@@ -318, +318 @@

  DataImportHandler can be used to index data from HTTP based data sources. 
This includes using indexing from REST/XML APIs as well as from RSS/ATOM Feeds.
  
  [[Anchor(httpds)]]
+ 
- == Configuration of URLDataSource ==
+ == Configuration of URLDataSource or HTTPDataSource ==
  
+ <!> HTTP!DataSource is being deprecated in favour of URL!DataSource in 
["Solr1.4"]
+ 
- A sample configuration in for URL!DataSource in data config xml looks like 
this
+ Sample configurations for URL!DataSource <!> ["Solr1.4"] and HTTP!DataSource 
in data config xml look like this
  {{{
- <dataSource type="URLDataSource" baseUrl="http://host:port/" encoding="UTF-8" 
connectionTimeout="5000" readTimeout="10000"/>
+ <dataSource name="a" type="URLDataSource" baseUrl="http://host:port/" 
encoding="UTF-8" connectionTimeout="5000" readTimeout="10000"/>
+ <dataSource name="b" type="HTTPDataSource" baseUrl="http://host:port/" 
encoding="UTF-8" connectionTimeout="5000" readTimeout="10000"/>
  }}}
- ''' The attributes are '''
+ ''' The extra attributes specific to this datasource are '''
  
   * '''`baseUrl`''' (optional): you should use it when the host/port changes 
between Dev/QA/Prod environments. Using this attribute isolates the changes to 
be made to the solrconfig.xml
   * '''`encoding`'''(optional): By default the encoding in the response header 
is used. You can use this property to override the default encoding.
@@ -359, +363 @@

  }}}
  
  
- == URLDataSource Example ==
+ == HTTPDataSource Example ==
+ <!> HTTP!DataSource is being deprecated in favour of URL!DataSource in 
["Solr1.4"]
  
  Download the full import example given in the DB section to try this out. 
We'll try indexing the [http://rss.slashdot.org/Slashdot/slashdot Slashdot RSS 
feed] for this example.
  
@@ -367, +372 @@

  The data-config for this example looks like this:
  {{{
  <dataConfig>
-         <dataSource type="URLDataSource" />
+         <dataSource type="HTTPDataSource" />
        <document>
                <entity name="slashdot"
-                               pk="link"
+                       pk="link"
-                               url="http://rss.slashdot.org/Slashdot/slashdot"
+                       url="http://rss.slashdot.org/Slashdot/slashdot"
-                               processor="XPathEntityProcessor"
+                       processor="XPathEntityProcessor"
-                               forEach="/RDF/channel | /RDF/item"
+                       forEach="/RDF/channel | /RDF/item"
-                               transformer="DateFormatTransformer">
+                       transformer="DateFormatTransformer">
  
-                       <field column="source" xpath="/RDF/channel/title" 
commonField="true" />
+                       <field column="source"       xpath="/RDF/channel/title" 
  commonField="true" />
-                       <field column="source-link" xpath="/RDF/channel/link" 
commonField="true" />
+                       <field column="source-link"  xpath="/RDF/channel/link"  
  commonField="true" />
-                       <field column="subject" xpath="/RDF/channel/subject" 
commonField="true" />
+                       <field column="subject"      
xpath="/RDF/channel/subject" commonField="true" />
  
-                       <field column="title" xpath="/RDF/item/title" />
+                       <field column="title"        xpath="/RDF/item/title" />
-                       <field column="link" xpath="/RDF/item/link" />
+                       <field column="link"         xpath="/RDF/item/link" />
-                       <field column="description" 
xpath="/RDF/item/description" />
+                       <field column="description"  
xpath="/RDF/item/description" />
-                       <field column="creator" xpath="/RDF/item/creator" />
+                       <field column="creator"      xpath="/RDF/item/creator" 
/>
                        <field column="item-subject" xpath="/RDF/item/subject" 
/>
+ 
+                       <field column="slash-department" 
xpath="/RDF/item/department" />
+                       <field column="slash-section"    
xpath="/RDF/item/section" />
+                       <field column="slash-comments"   
xpath="/RDF/item/comments" />
                        <field column="date" xpath="/RDF/item/date" 
dateTimeFormat="yyyy-MM-dd'T'hh:mm:ss" />
-                       <field column="slash-department" 
xpath="/RDF/item/department" />
-                       <field column="slash-section" xpath="/RDF/item/section" 
/>
-                       <field column="slash-comments" 
xpath="/RDF/item/comments" />
                </entity>
        </document>
  </dataConfig>
@@ -444, +450 @@

  Time taken was around 2 hours 40 minutes to index 7278241 articles with peak 
memory usage at around 4GB.
  
  == Using delta-import command ==
- The only EntityProcessor which supports delta is !SqlEntityProcessor! The 
X!PathEntityProcessor has not implemented it yet. So, unfortunately, there is 
no delta support for XML at this time.
+ The only !EntityProcessor which supports delta is !SqlEntityProcessor! The 
X!PathEntityProcessor has not implemented it yet. So, unfortunately, there is 
no delta support for XML at this time.
  If you want to implement those methods in X!PathEntityProcessor: The methods 
are explained in !EntityProcessor.java.
  
  = Indexing Emails =
@@ -720, +726 @@

  This is the default. The !DataSource must be of type 
`DataSource<Iterator<Map<String, Object>>>` . !JdbcDataSource can be used with 
this.
  
  === XPathEntityProcessor ===
- Used when indexing XML type data. The !DataSource must be of type 
`DataSource<Reader>` . URL!DataSource or !FileDataSource is commonly used with 
X!PathEntityProcessor.
+ Used when indexing XML type data. The !DataSource must be of type 
`DataSource<Reader>` . URL!DataSource <!> ["Solr1.4"] or !FileDataSource is 
commonly used with X!PathEntityProcessor.
  
  === FileListEntityProcessor ===
- A simple one which can be used to enumerate the list of files from a File 
System based on some criteria. It does not use a !DataSource. The entity 
attributes are:
+ A simple entity processor which can be used to enumerate the list of files 
from a File System based on some criteria. It does not use a !DataSource. The 
entity attributes are:
   *'''`fileName`''' :(required) A regex pattern to identify files
   *'''`baseDir`''' : (required) The Base directory (absolute path)
   *'''`recursive`''' : Recursive listing or not. Default is 'false'
@@ -746, +752 @@

      </document>
  </dataConfig>
  }}}
- Do not miss the `rootEntity` attribute. The implicit fields generated by the 
FileListEntityProcessor are 
`fileAbsolutePath,fileSize,fileLastModified,fileName` and these are available 
for use within the entity X as shown above. It should be noted that 
FileListEntityProcessor returns a list of pathnames and that the subsequent 
entity must use the FileDataSource to fetch the files content.
+ Do not miss the `rootEntity` attribute. The implicit fields generated by the 
!FileListEntityProcessor are `fileAbsolutePath, fileSize, fileLastModified, 
fileName` and these are available for use within the entity X as shown above. 
It should be noted that !FileListEntityProcessor returns a list of pathnames 
and that the subsequent entity must use the !FileDataSource to fetch the files 
content.
  
  === CachedSqlEntityProcessor ===
  [[Anchor(cached)]]
@@ -808, +814 @@

  It is designed to iterate rows in DB one by one. A row is represented as a 
Map.
  
  === URLDataSource ===
+ <!> ["Solr1.4"]
  This datasource is often used with X!PathEntityProcessor to fetch content 
from an underlying file:// or http:// location. See the documentation [#httpds 
here] . The signature is as follows
  {{{
  public class URLDataSource extends DataSource<Reader>
  }}}
  
  === HTTPDataSource ===
- This datasource is now deprecated in favor of URL!DataSource. There is no 
change in functionality between URL!DataSource and !HTTP!DataSource, only a 
name change.
+ <!> HTTP!DataSource is being deprecated in favour of URL!DataSource in 
["Solr1.4"]. There is no change in functionality between URL!DataSource and 
!HTTP!DataSource, only a name change.
  
  === FileDataSource ===
  This can be used like an URL!DataSource but used to fetch content from files 
on disk. The only difference from URL!DataSource, when accessing disk files, is 
how a pathname is specified. The signature is as follows
@@ -896, +903 @@

  There are 3 datasources: two RDBMS (jdbc1, jdbc2) and one xml/http (B)
  
   * `jdbc1` and `jdbc2` are instances of  type `JdbcDataSource` which are 
configured in the solrconfig.xml.
-  * `http` is an instance of type `URL!DataSource`
+  * `http` is an instance of type `HTTP!DataSource`
   * The root entity starts with a table called 'A' and uses 'jdbc1' as the 
datasource . The entity is conveniently named as the table itself
   * Entity 'A' has 2 sub-entities 'B' and 'C' . 'B' uses the datasource 
instance  'http' and 'C' uses the datasource instance 'jdbc2'
   * On doing a `command=full-import` The root-entity (A) is executed first
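  
  A minimal sketch of how the arrangement described above might look in data 
config xml. The queries, column names, URL, and forEach/xpath values are 
hypothetical placeholders, and `jdbc1` and `jdbc2` are assumed to be defined 
in solrconfig.xml as stated above.
  {{{
  <dataConfig>
          <!-- jdbc1 and jdbc2 are assumed to be configured in solrconfig.xml -->
          <dataSource name="http" type="HTTPDataSource" />
          <document>
                  <!-- root entity, named after the table it reads (query is a placeholder) -->
                  <entity name="A" dataSource="jdbc1" query="select * from A">
                          <!-- sub-entity fetched over HTTP; url and xpaths are placeholders -->
                          <entity name="B" dataSource="http"
                                  processor="XPathEntityProcessor"
                                  url="http://host:port/b?id=${A.id}"
                                  forEach="/b/item">
                                  <field column="b_title" xpath="/b/item/title" />
                          </entity>
                          <!-- sub-entity read from the second database -->
                          <entity name="C" dataSource="jdbc2"
                                  query="select * from C where a_id='${A.id}'" />
                  </entity>
          </document>
  </dataConfig>
  }}}
  Each entity names its datasource instance through the `dataSource` 
attribute, so the root entity and its sub-entities can each read from a 
different source.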
