Dear Wiki user, You have subscribed to a wiki page or wiki category on "Solr Wiki" for change notification.
The following page has been changed by NoblePaul:
http://wiki.apache.org/solr/DataImportHandler

------------------------------------------------------------------------------
  = Overview =
  == Motivation ==
- Most applications store data in relational databases and searching over such data is a common use-case. However, there is no standard way to import this data into SOLR index requiring custom tools external to SOLR.
+ Most applications store data in relational databases and searching over such data is a common use-case. However, there is no standard way to import this data into SOLR index requiring custom tools external to SOLR. Another common use case is data available in REST datasources (eg: RSS) , xml files etc

  == Goals ==
-  * Read data residing in relational databases
+  * Read data residing in relational databases
   * Build SOLR documents by aggregating data from multiple columns and tables according to configuration
   * Update SOLR with such documents
   * Provide ability to do full imports according to configuration
   * Detect inserts/update deltas (changes) and do delta imports (we assume a last-modified timestamp column for this to work)
   * Schedule full imports and delta imports
+  * Read and Index data from xml/(http/file) based on configuration
+  * Make it possible to plugin any kind of datasource (ftp,scp etc) and any other format of user choice (JSON,csv etc)

  = Design Overview =
  As the name suggests, this is implemented as a SolrRequestHandler. The configuration is provided in two places:
-  * solrconfig.xml (data source information is read from here e.g. JDBC Driver, JDBC URL, Username, Password etc.)
+  * solrconfig.xml . data source information is read from here. (For a Jdbc datasource JDBC Driver, JDBC URL, User name, Password etc.)
-  * data-config.xml (DB Table/column to SOLR document mapping comes here)
-
-
+  * data-config.xml
+    * How to fetch data (queries,url etc)
+    * What to read ( resultset columns, xml fields etc)
+    * How to process (modify/add/remove fields)

  = Usage with databases =
  In order to use this handler, the following steps are required.
   * Define a data-config.xml and specify the location this file in solrconfig.xml under DataImportHandler section
@@ -52, +55 @@
    </lst>
    </requestHandler>
  }}}
- note: It is possible to have more than one datasources for a configuration. To configure another datasource , just keep an another `<lst name="datasource">` entry . There is an implicit attribute "name" for a datasource. If there are more than one, each extra datasource must be identified by a unique name like this `<str name="name">datasource-2/str>`
+ note: It is possible to have more than one datasources for a configuration. To configure another datasource , just keep an another `<lst name="datasource">` entry . There is an implicit attribute "name" for a datasource. If there are more than one, each extra datasource must be identified by a unique name . eg: `<str name="name">datasource-2/str>`

  == Configuration in data-config.xml ==
  A SOLR document can be considered as a de-normalized schema having fields whose values come from multiple tables.
@@ -62, +65 @@
  In order to get data from the database, our design philosophy revolves around 'templatized sql' entered by the user for each entity. This gives the user the entire power of SQL if he needs it. The root entity is the central table whose columns can be used to join this table with other child entities.

  === Schema for the data config ===
- The dataconfig does not have a rigid schema. The attributes in the entity/field are arbitrary and depends on the `processor` and `transformer`. For !JdbcdataSource the entity attributes are
+ The dataconfig does not have a rigid schema. The attributes in the entity/field are arbitrary and depends on the `processor` and `transformer`.

- The default attributes for an entity
+ The default attributes for an entity are:
   * '''`name`''' (required) : A unique name used to identify an entity
   * '''`processor`''' : Required only if the datasource is not RDBMS . (The default value is `SqlEntityProcessor`)
   * '''`transformer`''' : Transformers to be applied on this entity. (See the transformer section)
