Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Solr Wiki" for change 
notification.

The following page has been changed by ShalinMangar:
http://wiki.apache.org/solr/DataImportHandler

The comment on the change is:
Added section on data-config - more to come

------------------------------------------------------------------------------
    </requestHandler>
  }}}
  
- TODO - Section on data-config.xml
+ == Configuration in data-config.xml ==
+ A SOLR document can be considered as a de-normalized schema having fields 
whose values come from multiple tables.
+ 
+ The data-config.xml starts by defining a "document" element which contains 
'''one root entity'''. The root entity can contain multiple sub-entities. An 
entity corresponds to a table in a relational database. Each entity can contain 
multiple fields. Each field can correspond to a column in its parent entity's 
table. Alternatively, a field can be a copyField, which gets its data from 
multiple columns. For each field, write the same attributes as you would write 
in a SOLR schema.xml. When you use DataImportHandler to create the schema, the 
SOLR-specific attributes are copied directly into the generated schema.
+ 
+ To get data from the database, our design philosophy revolves around 
templatized SQL entered by the user for each entity. This gives the user the 
full power of SQL when they need it. The root entity is the central table whose 
primary key can be used to join it with its child entities.
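+ 
+ As a rough illustration of what "templatized" means here, the sketch below fills a `${entity.column}` placeholder with a value taken from the current row of the named entity. The function `resolve_query` and the sample id are hypothetical and exist only for this illustration. This is not DataImportHandler's actual implementation, and plain string substitution like this does no escaping of the substituted values.

```python
import re

# Matches placeholders of the form ${entity.column}.
PLACEHOLDER = re.compile(r"\$\{(\w+)\.(\w+)\}")


def resolve_query(template, rows):
    """Fill ${entity.column} placeholders in a query template.

    `rows` maps an entity name to that entity's current row,
    given as a dict of column name -> value.
    """
    def substitute(match):
        entity, column = match.group(1), match.group(2)
        return str(rows[entity][column])

    return PLACEHOLDER.sub(substitute, template)


# Hypothetical parent row for the root entity "item".
query = resolve_query(
    "select description from feature where item_id='${item.id}'",
    {"item": {"id": "SP2514N"}},
)
print(query)
```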
+ 
+ Let us consider an example. Suppose we have the following schema in our 
database:
+ 
+ attachment:example-schema.png
+ 
+ This is a relational model of the same schema that SOLR currently ships with. 
We will use this as an example to build a data-config.xml for 
DataImportHandler. 
+ 
+ {{{
+ <dataConfig>
+     <document name="products" defaultSearchField="text">
+         <entity name="item" pk="id" query="select * from item">
+             <field column="id" type="string" indexed="false" stored="true"/>
+             <field column="name" type="text" indexed="true" stored="true"/>
+             <field column="name" name="nameSort" type="string" indexed="true" 
stored="false"/>
+             <field column="name" name="alphaNameSort" type="alphaOnlySort" 
indexed="true" stored="false"/>
+             <field column="manu" type="text" indexed="true" stored="true" 
omitNorms="true"/>
+             <field column="weight" type="sfloat" indexed="true" 
stored="true"/>
+             <field column="price" type="sfloat" indexed="true" stored="true"/>
+             <field column="popularity" type="sint" indexed="true" 
stored="true"/>
+             <field column="inStock" type="boolean" indexed="true" 
stored="true"/>
+ 
+             <entity name="feature"
+                     query="select description from feature where 
item_id='${item.id}'">
+                 <field name="features" column="description" type="text" 
indexed="true" stored="true" multiValued="true"/>
+             </entity>
+             <entity name="item_category"
+                     query="select category_id from item_category where 
item_id='${item.id}'">
+                 <entity name="category"
+                         query="select description from category where id = 
'${item_category.category_id}'">
+                     <field column="description" name="cat" type="text_ws" 
indexed="true" stored="true" multiValued="true" omitNorms="true" 
termVectors="true" />
+                 </entity>
+             </entity>
+         </entity>
+         <field name="text">
+             <copyFrom>cat</copyFrom>
+             <copyFrom>name</copyFrom>
+             <copyFrom>manu</copyFrom>
+             <copyFrom>features</copyFrom>
+         </field>
+     </document>
+ </dataConfig>
+ 
+ }}}
+ 
+ Here, the root entity is a table called "item" whose primary key is the 
column "id". Data can be read from this table with the query "select * from item".
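+ 
+ One way to picture how the nested entities in this example could yield a single de-normalized document per item is sketched below. It assumes that each sub-entity query is run once per parent row with its placeholders filled in; that is a reading of the configuration, not a description of DataImportHandler's code. The helper names, the in-memory SQLite database, and the sample rows are all hypothetical.

```python
import re
import sqlite3

# Matches placeholders of the form ${entity.column}.
PLACEHOLDER = re.compile(r"\$\{(\w+)\.(\w+)\}")


def resolve(template, rows):
    """Fill each ${entity.column} from the current row of the named entity."""
    return PLACEHOLDER.sub(lambda m: str(rows[m.group(1)][m.group(2)]), template)


def build_documents(conn):
    """Walk item -> feature and item -> item_category -> category."""
    docs = []
    for item in conn.execute("select * from item").fetchall():
        rows = {"item": dict(item)}
        doc = {"id": item["id"], "name": item["name"], "features": [], "cat": []}

        # Sub-entity "feature": one query per item row.
        q = resolve("select description from feature where item_id='${item.id}'", rows)
        for feature in conn.execute(q).fetchall():
            doc["features"].append(feature["description"])

        # Sub-entity "item_category", which in turn has the child "category".
        q = resolve("select category_id from item_category where item_id='${item.id}'", rows)
        for link in conn.execute(q).fetchall():
            rows["item_category"] = dict(link)
            q2 = resolve(
                "select description from category where id = '${item_category.category_id}'",
                rows,
            )
            for category in conn.execute(q2).fetchall():
                doc["cat"].append(category["description"])

        docs.append(doc)
    return docs


def sample_database():
    """Build a tiny in-memory database with made-up rows for the demo."""
    conn = sqlite3.connect(":memory:")
    conn.row_factory = sqlite3.Row  # lets rows be read by column name
    conn.executescript("""
        create table item (id text primary key, name text);
        create table feature (item_id text, description text);
        create table category (id text primary key, description text);
        create table item_category (item_id text, category_id text);
        insert into item values ('SP2514N', 'Samsung SpinPoint P120');
        insert into feature values ('SP2514N', '7200RPM');
        insert into feature values ('SP2514N', '8MB cache');
        insert into category values ('c1', 'electronics');
        insert into category values ('c2', 'hard drive');
        insert into item_category values ('SP2514N', 'c1');
        insert into item_category values ('SP2514N', 'c2');
    """)
    return conn


docs = build_documents(sample_database())
print(docs)
```

+ Each item row ends up as one dictionary whose multi-valued keys collect the rows returned by the child queries, which mirrors the multiValued fields in the configuration.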
+ TODO: Further description
  
  ----
  CategorySolrRequestHandler
