Storing, indexing and searching XML documents in Solr
Hi,

I'm new to Solr, so apologies if the solution is already documented. I have installed Solr and populated an index, using the examples as a template, with a version of the data below.

I have XML in the form of:

    <entity>
      <resource>
        <guid>123898-2092099098982</guid>
        <media_format>Blu-Ray</media_format>
        <updated>2011-05-05T11:25:35+0500</updated>
      </resource>
      <price currency="usd">3.99</price>
      <discounts>
        <discount type="percentage" rate="30" start="2011-05-03T00:00:00" end="2011-05-10T00:00:00" />
        <discount type="decimal" amount="1.99" coupon="1" />
        .
      </discounts>
      <aspect_ratio>16:9</aspect_ratio>
      <duration>1620</duration>
      <categories>
        <category id="drama" />
        <category id="horror" />
      </categories>
      <rating>
        <rate id="D1">contains some scenes which some viewers may find upsetting</rate>
      </rating>
      ...
      <media_type>Video</media_type>
    </entity>

Can I populate Solr directly with this document (as I believe MarkLogic does)?

If yes: can I search on any attribute (i.e. find all records where /entity/resource/media_format equals blu-ray)?

If no: what is the best practice for importing the attributes above into Solr (i.e. patterns for subdividing / flattening the document)? Does Solr support attached documents, and if so, is this advised (how does it affect performance)?

Any help is greatly appreciated. Pointers to documentation that addresses my issues are even more helpful.

Thanks again
OJ
Re: Storing, indexing and searching XML documents in Solr
On 5/18/2011 4:19 PM, Judioo wrote:
> Any help is greatly appreciated. Pointers to documentation that address my issues is even more helpful.

I think this would be a good start:
http://wiki.apache.org/solr/DataImportHandler#Usage_with_XML.2BAC8-HTTP_Datasource
Re: Storing, indexing and searching XML documents in Solr
The data is being imported directly from MySQL. The document is, however, indeed a good starting place.

Thanks

2011/5/18 Yury Kats yuryk...@yahoo.com
> On 5/18/2011 4:19 PM, Judioo wrote:
>> Any help is greatly appreciated. Pointers to documentation that address my issues is even more helpful.
>
> I think this would be a good start:
> http://wiki.apache.org/solr/DataImportHandler#Usage_with_XML.2BAC8-HTTP_Datasource
Re: Storing, indexing and searching XML documents in Solr
Great document. I can see how to import the data directly from the database. However, it seems that I need to write XPath expressions in the config to extract the fields that I wish to turn into a Solr document. So is there no way of storing the document structure in Solr as is?
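To make the XPath-per-field idea in this message concrete, here is a minimal Python sketch. It is not DataImportHandler configuration: it uses the standard library's ElementTree to show the same kind of mapping from path expressions to flat field names. The sample XML is an assumed reconstruction of part of the document in the original post, and the field names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumed reconstruction of part of the sample from the original post.
SAMPLE = """
<entity>
  <resource>
    <guid>123898-2092099098982</guid>
    <media_format>Blu-Ray</media_format>
    <updated>2011-05-05T11:25:35+0500</updated>
  </resource>
  <price currency="usd">3.99</price>
  <aspect_ratio>16:9</aspect_ratio>
  <duration>1620</duration>
  <media_type>Video</media_type>
</entity>
"""

# Hypothetical flat field names mapped to paths relative to <entity>.
FIELD_PATHS = {
    "guid": "resource/guid",
    "media_format": "resource/media_format",
    "updated": "resource/updated",
    "price": "price",
    "aspect_ratio": "aspect_ratio",
    "duration": "duration",
    "media_type": "media_type",
}


def extract_fields(xml_text, field_paths=FIELD_PATHS):
    """Return a flat dict of field name -> text for every path that matches."""
    root = ET.fromstring(xml_text.strip())
    doc = {}
    for name, path in field_paths.items():
        value = root.findtext(path)  # None when the element is absent
        if value is not None:
            doc[name] = value.strip()
    return doc


doc = extract_fields(SAMPLE)
```

Each entry in FIELD_PATHS plays the role of one XPath line in an import config: the nesting of the source document disappears and only named, searchable values remain.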
Re: Storing, indexing and searching XML documents in Solr
You're right, you can't store an XML document directly in Solr. You have to pull it apart and index it such that you can get whatever information back you need.

How you flatten data depends entirely upon your needs. The high-level idea is that you want to create fields such that text searches work. The moment you start thinking about "how can I express a relationship in the query", back up and try to flatten the data so you can just *search*.

This is vague, I know. But so much depends on how you want to use the data that specifics are hard to give. You've gotta take off your DB hat and not worry about duplicating data. De-normalize lots and lots and lots first...

Best
Erick

On Wed, May 18, 2011 at 5:27 PM, Judioo cont...@judioo.com wrote:
> Great document. I can see how to import the data direct from the database. However it seems as though I need to write xpath's in the config to extract the fields that I wish to transform into an solr document. So it seems that there is no way of storing the document structure in solr as is?
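As one possible reading of the flattening advice in the reply above, the following Python sketch collapses the repeating discount and category elements into multi-valued flat fields, then builds the query-string parameters for a simple field search. The XML is an assumed reconstruction of the sample in the original post; the field names (id, category, discount_type, has_coupon) and both helper functions are hypothetical, and a real schema would depend on which searches matter.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Assumed reconstruction of part of the sample from the original post.
SAMPLE = """<entity>
  <resource>
    <guid>123898-2092099098982</guid>
    <media_format>Blu-Ray</media_format>
  </resource>
  <discounts>
    <discount type="percentage" rate="30" start="2011-05-03T00:00:00" end="2011-05-10T00:00:00"/>
    <discount type="decimal" amount="1.99" coupon="1"/>
  </discounts>
  <categories>
    <category id="drama"/>
    <category id="horror"/>
  </categories>
</entity>"""


def flatten(xml_text):
    """Denormalize one <entity> into a flat dict; lists stand for multi-valued fields."""
    root = ET.fromstring(xml_text)
    discounts = list(root.iter("discount"))
    return {
        "id": root.findtext("resource/guid"),
        "media_format": root.findtext("resource/media_format"),
        "category": [c.get("id") for c in root.iter("category")],
        "discount_type": [d.get("type") for d in discounts],
        # Precomputed flag, so no relationship has to be expressed at query time.
        "has_coupon": any(d.get("coupon") == "1" for d in discounts),
    }


def select_params(field, value):
    """Build URL-encoded parameters for a phrase query of the form field:"value"."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return urlencode({"q": f'{field}:"{escaped}"'})


doc = flatten(SAMPLE)
query = select_params("media_format", "Blu-Ray")
```

With the data in this shape, the question from the original post ("find all records where /entity/resource/media_format equals blu-ray") becomes a plain field search on media_format. Whether "blu-ray" matches "Blu-Ray" depends on how that field is analyzed in the schema, which this sketch does not cover.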