Dear Wiki user, You have subscribed to a wiki page or wiki category on "Solr Wiki" for change notification.
The following page has been changed by ShalinMangar:
http://wiki.apache.org/solr/DataImportHandler

The comment on the change is:
Added section on data-config - more to come

------------------------------------------------------------------------------
  </requestHandler>
  }}}

- TODO - Section on data-config.xml
+ == Configuration in data-config.xml ==
+ A SOLR document can be thought of as a de-normalized schema whose fields take their values from multiple tables.
+ 
+ The data-config.xml starts by defining a "document" element which contains '''one root entity'''. The root entity can contain multiple sub-entities. An entity corresponds to a table in a relational database. Each entity can contain multiple fields. Each field can correspond to a column in its parent's table. Alternatively, a field can be a copyField which gets its data from multiple columns. For each field, write the same attributes as you would write in a SOLR schema.xml. When you use DataImportHandler to create the schema, the SOLR-specific attributes will be copied directly into the generated schema.
+ 
+ In order to get data from the database, our design philosophy revolves around templatized 'sql' entered by the user for each entity. This gives users the full power of SQL when they need it. The root entity is the central table whose primary key can be used to join this table with its child entities.
+ 
+ Let us consider an example. Suppose we have the following schema in our database:
+ 
+ attachment:example-schema.png
+ 
+ This is a relational model of the same schema that SOLR currently ships with. We will use this as an example to build a data-config.xml for DataImportHandler.
+ 
+ {{{
+ <dataConfig>
+   <document name="products" defaultSearchField="text">
+     <!-- Root entity: each row of the "item" table becomes one SOLR document -->
+     <entity name="item" pk="id" query="select * from item">
+       <field column="id" type="string" indexed="false" stored="true"/>
+       <field column="name" type="text" indexed="true" stored="true"/>
+       <field column="name" name="nameSort" type="string" indexed="true" stored="false"/>
+       <field column="name" name="alphaNameSort" type="alphaOnlySort" indexed="true" stored="false"/>
+       <field column="manu" type="text" indexed="true" stored="true" omitNorms="true"/>
+       <field column="weight" type="sfloat" indexed="true" stored="true"/>
+       <field column="price" type="sfloat" indexed="true" stored="true"/>
+       <field column="popularity" type="sint" indexed="true" stored="true"/>
+       <field column="inStock" type="boolean" indexed="true" stored="true"/>
+ 
+       <!-- Child entity: ${item.id} is replaced with the id of the current item row -->
+       <entity name="feature"
+               query="select description from feature where item_id='${item.id}'">
+         <field name="features" column="description" type="text" indexed="true" stored="true" multiValued="true"/>
+       </entity>
+       <entity name="item_category"
+               query="select category_id from item_category where item_id='${item.id}'">
+         <entity name="category"
+                 query="select description from category where id = '${item_category.category_id}'">
+           <field column="description" name="cat" type="text_ws" indexed="true" stored="true" multiValued="true" omitNorms="true" termVectors="true"/>
+         </entity>
+       </entity>
+     </entity>
+     <!-- Default search field, filled from the fields listed below -->
+     <field name="text">
+       <copyFrom>cat</copyFrom>
+       <copyFrom>name</copyFrom>
+       <copyFrom>manu</copyFrom>
+       <copyFrom>features</copyFrom>
+     </field>
+   </document>
+ </dataConfig>
+ 
+ }}}
+ 
+ Here, the root entity is a table called "item" whose primary key is the column "id". Data can be read from this table with the query "select * from item".
+ 
+ TODO: Further description
  ----
  CategorySolrRequestHandler
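
The way the templatized queries of nested entities fit together can be illustrated with a small sketch. It is not DataImportHandler's actual code, only one possible reading of the description above: the root query is run once, and for every row it returns, each `${entity.column}` placeholder in the child query is replaced with the value from that row before the child query is run. The use of Python with an in-memory `sqlite3` database, the helper names `resolve`, `build_documents` and `demo_connection`, and the sample rows are all assumptions made for illustration.

```python
import re
import sqlite3

# Matches placeholders such as ${item.id}: entity name, then column name.
PLACEHOLDER = re.compile(r"\$\{(\w+)\.(\w+)\}")


def resolve(query, context):
    """Replace each ${entity.column} with the value from the current parent row."""
    # Plain text substitution mirrors the templating idea only; production
    # code should prefer bound parameters to avoid SQL injection.
    return PLACEHOLDER.sub(lambda m: str(context[m.group(1)][m.group(2)]), query)


def build_documents(conn):
    """Run the root query, then one child query per root row."""
    conn.row_factory = sqlite3.Row  # allows row["column"] access
    root_query = "select * from item"
    child_query = "select description from feature where item_id='${item.id}'"
    docs = []
    for item in conn.execute(root_query):
        # One de-normalized document per row of the root entity.
        doc = {"id": item["id"], "name": item["name"], "features": []}
        context = {"item": item}
        for feature in conn.execute(resolve(child_query, context)):
            doc["features"].append(feature["description"])
        docs.append(doc)
    return docs


def demo_connection():
    """Create a tiny in-memory database with hypothetical sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        create table item (id text primary key, name text);
        create table feature (item_id text, description text);
        insert into item values ('A1', 'Hard drive'), ('B2', 'Memory card');
        insert into feature values
            ('A1', '7200 RPM'), ('A1', '8 MB cache'), ('B2', '2 GB');
        """
    )
    return conn


if __name__ == "__main__":
    for doc in build_documents(demo_connection()):
        print(doc)
```

Each resulting dictionary plays the role of one de-normalized document: the scalar values come from the root table, and the multi-valued "features" entry is collected from the child table. A deeper nesting such as item_category and category would repeat the same step one level further down, with the child row added to the context under its own entity name.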
