Re: Converting nested data model to solr schema
As you see it, does SOLR-3076 fix my problem? Is the SOLR-3076 fix getting into Solr 4.4?

Mikhail Khludnev wrote:
> On Mon, Jul 1, 2013 at 5:56 PM, adfel70 <adfel70@> wrote:
>> This requires me to override the solr document distribution mechanism. I fear that with this solution I may lose some of solr cloud's capabilities.
>
> It's not clear whether you are aware of http://searchhub.org/2013/06/13/solr-cloud-document-routing/ , but what you did doesn't sound scary to me. If it works, it should be fine. I'm not aware of any capabilities that you are going to lose. Obviously SOLR-3076 provides astonishing query-time performance by offloading the actual join work to index time. Check it out if your current approach turns out to be slow.

--
View this message in context: http://lucene.472066.n3.nabble.com/Converting-nested-data-model-to-solr-schema-tp4074351p4074668.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: Converting nested data model to solr schema
It sounds like 4.4 will have an RC next week, so the prospects for block join in 4.4 are kind of dim. I mean, such a significant feature should have more than a few days to bake before getting released. But... who knows what Yonik has planned!

-- Jack Krupansky

-----Original Message-----
From: adfel70
Sent: Tuesday, July 02, 2013 7:41 AM
To: solr-user@lucene.apache.org
Subject: Re: Converting nested data model to solr schema

As you see it, does SOLR-3076 fix my problem? Is the SOLR-3076 fix getting into Solr 4.4?
Re: Converting nested data model to solr schema
I'm not familiar with block join in Lucene. I've read a bit, and I just want to make sure: do you think that when this ticket is released, it will solve the current problem of SolrCloud joins? Also, can you elaborate a bit on your solution?

Jack Krupansky-2 wrote:
> It sounds like 4.4 will have an RC next week, so the prospects for block join in 4.4 are kind of dim. I mean, such a significant feature should have more than a few days to bake before getting released. But... who knows what Yonik has planned!

--
View this message in context: http://lucene.472066.n3.nabble.com/Converting-nested-data-model-to-solr-schema-tp4074351p4074696.html
Re: Converting nested data model to solr schema
During indexing, the whole block (the doc and its attachments) goes into one particular shard. The query can then be run on every shard, and the results are merged.

BTW, do you see any problem with your current approach of query-time joins and out-of-the-box shard routing?

On Tue, Jul 2, 2013 at 5:19 PM, adfel70 adfe...@gmail.com wrote:
> I'm not familiar with block join in lucene. I've read a bit, and I just want to make sure - do you think that when this ticket is released, it will solve the current problem of solr cloud joins? Also, can you elaborate a bit about your solution?

--
Sincerely yours
Mikhail Khludnev
Principal Engineer, Grid Dynamics
http://www.griddynamics.com
mkhlud...@griddynamics.com
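The behaviour described in this message (a whole block lands on one shard, each shard is queried on its own, and the hits are merged) can be sketched as a small in-memory simulation. This is not Solr code: `NUM_SHARDS`, the function names, the sample data, and the CRC32 hash are all assumptions made for illustration, and Solr's real router uses a different hash.

```python
import zlib
from collections import defaultdict

NUM_SHARDS = 4  # hypothetical cluster size


def shard_for(parent_id: str) -> int:
    """Pick a shard from the parent's id only (CRC32 is a stand-in hash)."""
    return zlib.crc32(parent_id.encode("utf-8")) % NUM_SHARDS


def index_blocks(blocks):
    """Place each block (parent plus all of its children) on a single shard."""
    shards = defaultdict(list)
    for block in blocks:
        shards[shard_for(block["doc_id"])].append(block)
    return shards


def query_shard(blocks, parent_pred, child_pred):
    """Return ids of parents on one shard that match and have a matching child."""
    return [
        b["doc_id"]
        for b in blocks
        if parent_pred(b) and any(child_pred(c) for c in b["children"])
    ]


def search(shards, parent_pred, child_pred):
    """Query every shard independently, then merge the per-shard hits."""
    hits = []
    for shard_blocks in shards.values():
        hits.extend(query_shard(shard_blocks, parent_pred, child_pred))
    return sorted(hits)


# Hypothetical sample data loosely modelled on the original post.
blocks = [
    {"doc_id": "1", "content": "abc some long text",
     "children": [{"reply_author": "Bob", "reply_date": "23-12-2012"}]},
    {"doc_id": "2", "content": "abc other text",
     "children": [{"reply_author": "Jack", "reply_date": "22-12-2012"}]},
    {"doc_id": "3", "content": "unrelated",
     "children": [{"reply_author": "Bob", "reply_date": "23-12-2012"}]},
]

shards = index_blocks(blocks)
result = search(
    shards,
    parent_pred=lambda b: "abc" in b["content"],
    child_pred=lambda c: c["reply_author"] == "Bob",
)
print(result)
```

Because a parent never has to look at children stored on another shard, each shard can answer on its own, which is why the per-shard results can simply be concatenated.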
Re: Converting nested data model to solr schema
My current solution is overriding the out-of-the-box shard routing and forcing each document and its attachments to go into a specific shard. I do this so I can support the query-time joins (because joins are only performed between documents in the same shard). I'm a bit concerned about this approach only because it forces me to override out-of-the-box Solr behavior. I haven't implemented the whole thing yet, so I can't say anything about performance.

You're saying that your block-join solution does the same thing at index time (putting a document and its attachments in the same shard), but at query time it doesn't require performing an explicit join?

If you could add an example of what you'll index and how you'll query, it would be very helpful. Also, if this ticket is going to get into one of the next releases, and it solves the join problem, it seems that it's worth waiting for.

Mikhail Khludnev wrote:
> during indexing whole block (doc and it's attachment) goes into particular shard, then it's can be queried per every shard and results are merged. btw, do you feel any problem with your current approach - query time joins and out-of-the-box shard routing?

--
View this message in context: http://lucene.472066.n3.nabble.com/Converting-nested-data-model-to-solr-schema-tp4074351p4074876.html
Re: Converting nested data model to solr schema
Simply duplicate on each child document the subset of the parent document's fields that you want to query, and then you can query the child documents directly without any join.

Yes, given the complexity of your data, a two-step query process may be necessary for some queries: do one query to get parent or child IDs and then do a second query filtered by those IDs.

And, yes, this only approximates the full power of an SQL join, but at a tiny fraction of the cost.

-- Jack Krupansky

-----Original Message-----
From: adfel70
Sent: Monday, July 01, 2013 9:56 AM
To: solr-user@lucene.apache.org
Subject: Converting nested data model to solr schema

Hi,

I have the following data model:
1. Document (fields: doc_id, author, content)
2. Each Document has multiple attachment types. Each attachment type has multiple instances, and each attachment type may have different fields.

For example:

    <doc>
      <doc_id>1</doc_id>
      <author>john</author>
      <content>some long long text...</content>
      <file_attachments>
        <file_attachment>
          <attach_id>458</attach_id>
          <attach_text>SomeText</attach_text>
          <attach_date>12/12/2012</attach_date>
        </file_attachment>
        <file_attachment>
          <attach_id>568</attach_id>
          <attach_text>SomeText2</attach_text>
          <attach_date>12/11/2012</attach_date>
        </file_attachment>
      </file_attachments>
      <reply_attachments>
        <reply_attachment>
          <reply_id>345</reply_id>
          <reply_text>SomeText</reply_text>
          <reply_author>Jack</reply_author>
          <reply_date>22-12-2012</reply_date>
        </reply_attachment>
        <reply_attachment>
          <reply_id>897</reply_id>
          <reply_text>SomeText2</reply_text>
          <reply_author>Bob</reply_author>
          <reply_date>23-12-2012</reply_date>
        </reply_attachment>
      </reply_attachments>
    </doc>

I want to index all this data in SolrCloud. My current solution is to index the original document by itself, index each attachment as a separate Solr document carrying its parent_doc_id, and then use Solr's join capability.

The problem with this solution is that I must index all the attachments of each document, and the document itself, in the same shard (a current Solr limitation). This requires me to override the Solr document distribution mechanism. I fear that with this solution I may lose some of SolrCloud's capabilities.

My questions are:
1. Are my concerns about the downside of overriding SolrCloud's out-of-the-box mechanism justified? Or should I proceed with this solution?
2. If I'm looking for another solution, can I somehow keep all attachments on the same document and still be able to query on a single attachment?

A query example: retrieve all documents where
    content contains abc
    AND reply_attachment.author = 'Bob'
    AND reply_attachment.date = '12-12-2012'

Thanks.

--
View this message in context: http://lucene.472066.n3.nabble.com/Converting-nested-data-model-to-solr-schema-tp4074351.html
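The denormalization suggested in this reply might look roughly like the sketch below. It runs entirely in memory and makes no Solr calls; the `flatten` helper, the `parent_` field prefix, the `type` field, and the sample content are assumptions for illustration, while the other field names come from the original post.

```python
def flatten(doc, parent_fields=("doc_id", "author", "content")):
    """Turn one nested document into flat child records that carry parent fields."""
    copied = {f"parent_{name}": doc[name] for name in parent_fields}
    flat = []
    for reply in doc.get("reply_attachments", []):
        flat.append({"type": "reply", **reply, **copied})
    for attach in doc.get("file_attachments", []):
        flat.append({"type": "file", **attach, **copied})
    return flat


# Sample document; the content text is made up so that it contains "abc".
doc = {
    "doc_id": "1",
    "author": "john",
    "content": "some long text with abc",
    "file_attachments": [
        {"attach_id": "458", "attach_text": "SomeText", "attach_date": "12/12/2012"},
    ],
    "reply_attachments": [
        {"reply_id": "345", "reply_text": "SomeText",
         "reply_author": "Jack", "reply_date": "22-12-2012"},
        {"reply_id": "897", "reply_text": "SomeText2",
         "reply_author": "Bob", "reply_date": "23-12-2012"},
    ],
}

children = flatten(doc)

# Step 1: query the child records directly; no join is needed because the
# parent's content was copied onto every child.
matches = [
    c for c in children
    if c["type"] == "reply"
    and c["reply_author"] == "Bob"
    and c["reply_date"] == "23-12-2012"
    and "abc" in c["parent_content"]
]
parent_ids = sorted({c["parent_doc_id"] for c in matches})

# Step 2 (optional): fetch the full parents, filtered by the ids from step 1.
parents_by_id = {doc["doc_id"]: doc}
full_parents = [parents_by_id[i] for i in parent_ids]
print(parent_ids)
```

The cost is a larger index and the need to re-index every child whenever a copied parent field changes, which is the trade-off against a true join.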
Re: Converting nested data model to solr schema
On Mon, Jul 1, 2013 at 5:56 PM, adfel70 adfe...@gmail.com wrote:
> This requires me to override the solr document distribution mechanism. I fear that with this solution I may lose some of solr cloud's capabilities.

It's not clear whether you are aware of http://searchhub.org/2013/06/13/solr-cloud-document-routing/ , but what you did doesn't sound scary to me. If it works, it should be fine. I'm not aware of any capabilities that you are going to lose.

Obviously SOLR-3076 provides astonishing query-time performance by offloading the actual join work to index time. Check it out if your current approach turns out to be slow.

--
Sincerely yours
Mikhail Khludnev
Principal Engineer, Grid Dynamics
http://www.griddynamics.com
mkhlud...@griddynamics.com
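As a rough illustration of the routing approach in the linked article combined with a query-time join: the sketch below assumes the collection uses Solr's compositeId router, where ids sharing the prefix before `!` are hashed to the same shard, and that the uniqueKey is a separate `id` field (an assumption; the original post only names `doc_id` and `parent_doc_id`). The `routed_id` helper and the id suffixes are hypothetical, and the code only builds documents and request parameters without contacting a server.

```python
from urllib.parse import urlencode


def routed_id(parent_doc_id: str, own_id: str) -> str:
    """Build a composite id: everything before '!' is the routing prefix."""
    return f"{parent_doc_id}!{own_id}"


# Hypothetical documents based on the data model in the original post.
parent = {
    "id": routed_id("1", "doc"),
    "doc_id": "1",
    "content": "some long long text...",
}
reply = {
    "id": routed_id("1", "reply-897"),
    "parent_doc_id": "1",
    "reply_author": "Bob",
    "reply_date": "23-12-2012",
}

# Query-time join: select the replies by Bob, then map them to their parents
# through parent_doc_id -> doc_id. This relies on both sides of the join
# living on the same shard, which the shared id prefix is meant to ensure.
params = {
    "q": "content:abc",
    "fq": '{!join from=parent_doc_id to=doc_id}'
          'reply_author:Bob AND reply_date:"23-12-2012"',
}
query_string = urlencode(params)
print(parent["id"], reply["id"])
print(query_string)
```

With ids built this way the co-location comes from the id format alone, so no custom distribution code is involved.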