[ 
https://issues.apache.org/jira/browse/SOLR-469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12569572#action_12569572
 ] 

shalinmangar edited comment on SOLR-469 at 2/16/08 12:13 PM:
----------------------------------------------------------------------

*Changes*
* Support for detecting deleted rows (details will be added to the Wiki soon)
* Numerous bug fixes
* Merged DataImporter and DataImporterContext together
* Improved response format that shows status messages for the operation
* DataImportHandler is now SolrCoreAware
* Code refactorings
* A Verifier that checks data-config.xml against Solr's schema.xml to make 
sure that all fields defined in data-config.xml are defined in schema.xml and 
that all required fields defined in schema.xml are mentioned in 
data-config.xml
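
As a rough illustration of the two checks the Verifier performs, here is a minimal Python sketch. It is not the code in the patch: the function name verify_fields and the use of plain sets of field names are assumptions made for this example, and parsing of data-config.xml and schema.xml is left out.

```python
def verify_fields(config_fields, schema_fields, required_schema_fields):
    """Return a list of problems; the list is empty if the config is consistent.

    All three arguments are sets of field names. How they are extracted from
    data-config.xml and schema.xml is outside the scope of this sketch.
    """
    problems = []
    # Every field used in data-config.xml must exist in schema.xml.
    for name in sorted(config_fields - schema_fields):
        problems.append(f"field '{name}' is in data-config.xml but not in schema.xml")
    # Every required schema field must be mentioned in data-config.xml.
    for name in sorted(required_schema_fields - config_fields):
        problems.append(f"required field '{name}' is missing from data-config.xml")
    return problems


if __name__ == "__main__":
    # Hypothetical field names, for illustration only.
    schema = {"id", "name", "price"}
    required = {"id"}
    config = {"name", "cost"}
    for problem in verify_fields(config, schema, required):
        print(problem)
```

Returning a list of problems rather than failing on the first one lets all mismatches be reported in a single pass.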

We recently indexed around 1.7 million documents using this tool. The documents 
had mostly sint and sdouble fields in them (since we wanted to measure the 
performance of this patch and not Lucene's speed). The import completed in 166 
seconds on our production hardware.
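
For a sense of scale, the throughput implied by those figures can be worked out as below. The document count is quoted as "around 1.7 million", so the resulting rate is only approximate.

```python
# Figures quoted above; the document count is approximate, so the rate is too.
documents = 1_700_000
seconds = 166

docs_per_second = documents / seconds
print(f"roughly {docs_per_second:,.0f} documents per second")
```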

Note: Details of the API exposed in our work have now been added to our 
[Wiki|http://wiki.apache.org/solr/DataImportHandler]. Also, an example Solr 
home is provided on the Wiki page (under the "Full Import Example" section) to 
try this out.

> DB Import RequestHandler
> ------------------------
>
>                 Key: SOLR-469
>                 URL: https://issues.apache.org/jira/browse/SOLR-469
>             Project: Solr
>          Issue Type: New Feature
>          Components: update
>    Affects Versions: 1.3
>            Reporter: Noble Paul
>            Priority: Minor
>             Fix For: 1.3
>
>         Attachments: SOLR-469.patch, SOLR-469.patch, SOLR-469.patch
>
>
> We need a RequestHandler which can import data from a DB or other data sources 
> into the Solr index. Think of it as an advanced form of the SqlUpload plugin 
> (SOLR-103).
> The way it works is as follows.
>     * Provide a configuration file (xml) to the Handler which takes in the 
> necessary SQL queries and mappings to a solr schema
>           - It also takes in a properties file for the data source 
> configuration
>     * Given the configuration it can also generate the solr schema.xml
>     * It is registered as a RequestHandler which can take two commands: 
> do-full-import and do-delta-import
>           - do-full-import - dumps all the data from the database into the 
> index (based on the SQL query in the configuration)
>           - do-delta-import - dumps all the data that has changed since the 
> last import (we assume a modified-timestamp column in the tables)
>     * It provides an admin page
>           - where we can schedule it to be run automatically at regular 
> intervals
>           - which shows the status of the Handler (idle, full-import, 
> delta-import)
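
The difference between the two commands described above can be sketched in Python as follows. This is only an illustration of the idea, not the handler's implementation: the table name item, the column last_modified, the queries, and the use of an in-memory SQLite database are all assumptions made for this example.

```python
import sqlite3

# Hypothetical table and column names, used only for this illustration.
FULL_QUERY = "SELECT id, name FROM item ORDER BY id"
DELTA_QUERY = "SELECT id, name FROM item WHERE last_modified > ? ORDER BY id"


def full_import(conn):
    """Return every row, which is what do-full-import is described as doing."""
    return conn.execute(FULL_QUERY).fetchall()


def delta_import(conn, last_import_time):
    """Return only the rows modified after the previous import."""
    return conn.execute(DELTA_QUERY, (last_import_time,)).fetchall()


def make_demo_db():
    """Build a small in-memory database with a modified-timestamp column."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE item (id INTEGER PRIMARY KEY, name TEXT, last_modified TEXT)"
    )
    conn.executemany(
        "INSERT INTO item VALUES (?, ?, ?)",
        [
            (1, "old row", "2008-02-01 10:00:00"),
            (2, "changed row", "2008-02-16 09:30:00"),
        ],
    )
    return conn


if __name__ == "__main__":
    conn = make_demo_db()
    print(full_import(conn))
    print(delta_import(conn, "2008-02-10 00:00:00"))
```

The timestamps are stored as ISO-style strings so that a plain string comparison orders them correctly. A real setup would use the database's own timestamp type.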

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
