Re: Converting nested data model to solr schema

Jack Krupansky Mon, 01 Jul 2013 07:11:30 -0700

Simply duplicate the subset of parent document fields that you want to query onto each child document; you can then query the child documents directly, without any join.
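
As a rough illustration of that denormalization, here is a minimal Python sketch that turns one nested document into flat child records before indexing. The function name flatten, the parent_ prefix, and the doc_type field are assumptions made up for this example; the other field names come from the sample document in the original question.

```python
def flatten(doc, parent_fields=("doc_id", "author", "content")):
    """Return one flat record per attachment, copying selected parent fields."""
    children = []
    for kind in ("file_attachments", "reply_attachments"):
        for attachment in doc.get(kind, []):
            # doc_type is a hypothetical marker field, e.g. "reply_attachment"
            child = {"doc_type": kind[:-1]}
            for name in parent_fields:
                # Duplicate the parent field onto the child under a prefixed name
                child["parent_" + name] = doc[name]
            child.update(attachment)
            children.append(child)
    return children


doc = {
    "doc_id": "1",
    "author": "john",
    "content": "some long long text...",
    "file_attachments": [
        {"attach_id": "458", "attach_text": "SomeText", "attach_date": "12/12/2012"},
        {"attach_id": "568", "attach_text": "SomeText2", "attach_date": "12/11/2012"},
    ],
    "reply_attachments": [
        {"reply_id": "345", "reply_text": "SomeText",
         "reply_author": "Jack", "reply_date": "22-12-2012"},
        {"reply_id": "897", "reply_text": "SomeText2",
         "reply_author": "Bob", "reply_date": "23-12-2012"},
    ],
}

children = flatten(doc)
```

Each record in children carries parent_content next to reply_author and reply_date, so a query that combines parent and attachment conditions can run against the child documents alone. Which parent fields to copy is a trade-off between index size and the queries you need to support.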

Yes, given the complexity of your data, a two-step query process may be necessary for some queries - do one query to get parent or child IDs and then do a second query filtered by those IDs.
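
The two-step process could look something like the sketch below, which only builds the request parameters and leaves the actual HTTP calls to whatever client you use. The helper names ids_filter and two_step_params are hypothetical, and the hits are stand-in data; q and fq are the standard Solr query and filter-query parameters, and doc_id and parent_doc_id are the field names from the original question.

```python
def ids_filter(field, ids):
    """Build a filter such as 'doc_id:(1 OR 7)' from a list of IDs."""
    if not ids:
        raise ValueError("no IDs to filter on")
    return "{}:({})".format(field, " OR ".join(str(i) for i in ids))


def two_step_params(child_hits, parent_query):
    """Turn the hits of a first (child) query into parameters for a second (parent) query."""
    # Step 1 result: collect the distinct parent IDs from the matching children
    parent_ids = sorted({hit["parent_doc_id"] for hit in child_hits})
    # Step 2: query the parents, restricted to those IDs by a filter query
    return {"q": parent_query, "fq": ids_filter("doc_id", parent_ids)}


# Stand-in for the documents returned by the first query
child_hits = [
    {"reply_id": "897", "parent_doc_id": "1"},
    {"reply_id": "901", "parent_doc_id": "7"},
    {"reply_id": "902", "parent_doc_id": "1"},
]
params = two_step_params(child_hits, "content:abc")
```

This keeps both queries simple, but the ID list grows with the number of first-step matches, so it is probably best suited to queries where the first step is fairly selective.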

And, yes, this only approximates the full power of an SQL join - but at a tiny fraction of the cost.


-- Jack Krupansky

-----Original Message-----
From: adfel70
Sent: Monday, July 01, 2013 9:56 AM
To: solr-user@lucene.apache.org
Subject: Converting nested data model to solr schema

Hi,
I have the following data model:
1. Document (fields: doc_id, author, content)
2. Each Document has multiple attachment types. Each attachment type has
multiple instances, and each attachment type may have different fields.
For example:
<doc>
  <doc_id>1</doc_id>
  <author>john</author>
  <content>some long long text...</content>
  <file_attachments>
     <file_attachment>
        <attach_id>458</attach_id>
        <attach_text>SomeText</attach_text>
        <attach_date>12/12/2012</attach_date>
     </file_attachment>
     <file_attachment>
        <attach_id>568</attach_id>
        <attach_text>SomeText2</attach_text>
        <attach_date>12/11/2012</attach_date>
     </file_attachment>
  </file_attachments>
  <reply_attachments>
     <reply_attachment>
        <reply_id>345</reply_id>
        <reply_text>SomeText</reply_text>
        <reply_author>Jack</reply_author>
        <reply_date>22-12-2012</reply_date>
     </reply_attachment>
     <reply_attachment>
        <reply_id>897</reply_id>
        <reply_text>SomeText2</reply_text>
        <reply_author>Bob</reply_author>
        <reply_date>23-12-2012</reply_date>
     </reply_attachment>
  </reply_attachments>
</doc>

I want to index all this data in solr cloud.
My current solution is to index the original document by itself and index
each attachment as a single solr document with its parent_doc_id, and then
use solr join capability.
The problem with this solution is that I must index all the attachments of
each document, and the document itself in the same shard (current solr
limitation).
This requires me to override the solr document distribution mechanism.
I fear that with this solution I may lose some of solr cloud's
capabilities.
My questions are:
1. Are my concerns regarding downside of overriding solr cloud's
out-of-the-box mechanism justified? Or should I proceed with this solution?
2. If I'm looking for another solution, can I somehow keep all attachments
on the same document and be able to query on a single attachment?
A query example:
Retrieve all documents where:
content: contains "abc"
AND
reply_attachment.author = 'Bob'
AND
reply_attachment.date = '12-12-2012'

Thanks.

--

View this message in context: http://lucene.472066.n3.nabble.com/Converting-nested-data-model-to-solr-schema-tp4074351.html
Sent from the Solr - User mailing list archive at Nabble.com.
