Re: Converting nested data model to solr schema

2013-07-02 Thread adfel70
As you see it, does SOLR-3076 fix my problem?

Is the SOLR-3076 fix getting into Solr 4.4?


Mikhail Khludnev wrote
 On Mon, Jul 1, 2013 at 5:56 PM, adfel70 <adfel70@...> wrote:
 
 This requires me to override the solr document distribution mechanism.
 I fear that with this solution I may lose some of solr cloud's
 capabilities.

 
 It's not clear whether you are aware of
 http://searchhub.org/2013/06/13/solr-cloud-document-routing/ , but what
 you did doesn't sound scary to me. If it works, it should be fine. I'm not
 aware of any capabilities that you are going to lose.
 Obviously SOLR-3076 provides astonishing query-time performance by
 offloading the actual join work to index time. Check it out if your current
 approach turns out to be slow.
 
 
 -- 
 Sincerely yours
 Mikhail Khludnev
 Principal Engineer,
 Grid Dynamics
 
 <http://www.griddynamics.com>
 <mkhludnev@...>

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Converting-nested-data-model-to-solr-schema-tp4074351p4074668.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Converting nested data model to solr schema

2013-07-02 Thread Jack Krupansky
It sounds like 4.4 will have an RC next week, so the prospects for block 
join in 4.4 are kind of dim. I mean, such a significant feature should have 
more than a few days to bake before getting released. But... who knows what 
Yonik has planned!


-- Jack Krupansky

-Original Message- 
From: adfel70

Sent: Tuesday, July 02, 2013 7:41 AM
To: solr-user@lucene.apache.org
Subject: Re: Converting nested data model to solr schema

As you see it, does SOLR-3076 fix my problem?

Is the SOLR-3076 fix getting into Solr 4.4?





Re: Converting nested data model to solr schema

2013-07-02 Thread adfel70
I'm not familiar with block join in lucene. I've read a bit, and I just want
to make sure - do you think that when this ticket is released, it will solve
the current problem of solr cloud joins?

Also, can you elaborate a bit about your solution?


Jack Krupansky-2 wrote
 It sounds like 4.4 will have an RC next week, so the prospects for block 
 join in 4.4 are kind of dim. I mean, such a significant feature should have 
 more than a few days to bake before getting released. But... who knows what 
 Yonik has planned!
 
 -- Jack Krupansky
 





--
View this message in context: 
http://lucene.472066.n3.nabble.com/Converting-nested-data-model-to-solr-schema-tp4074351p4074696.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Converting nested data model to solr schema

2013-07-02 Thread Mikhail Khludnev
During indexing, the whole block (the doc and its attachments) goes into a
particular shard; it can then be queried on every shard, and the results are merged.
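
To make the block idea a bit more concrete, here is a rough sketch of what could be indexed and how it could be queried. It assumes the nested <doc> update syntax and the {!parent} query parser as documented for later Solr releases; the SOLR-3076 patch discussed here may differ in detail. The `id` and `doc_type` fields and both helper functions are hypothetical names chosen for illustration only.

```python
import xml.etree.ElementTree as ET


def build_block(parent, children):
    """Build an <add> payload in which the child docs are nested in the parent <doc>."""
    add = ET.Element("add")
    doc = ET.SubElement(add, "doc")
    for name, value in parent.items():
        ET.SubElement(doc, "field", name=name).text = str(value)
    for child in children:
        child_doc = ET.SubElement(doc, "doc")  # nested doc = member of the same block
        for name, value in child.items():
            ET.SubElement(child_doc, "field", name=name).text = str(value)
    return ET.tostring(add, encoding="unicode")


def block_join_params(parent_filter, child_query, parent_fq):
    """Request parameters: match children, return their parents, filter the parents."""
    return {
        # parent_filter must match all parent documents and nothing else
        "q": '{!parent which="%s" v=$childq}' % parent_filter,
        "childq": child_query,
        "fq": parent_fq,
    }


payload = build_block(
    {"id": "1", "doc_type": "document", "author": "john",
     "content": "some long long text"},
    [
        {"id": "reply-345", "doc_type": "reply",
         "reply_author": "Jack", "reply_date": "22-12-2012"},
        {"id": "reply-897", "doc_type": "reply",
         "reply_author": "Bob", "reply_date": "23-12-2012"},
    ],
)
params = block_join_params(
    "doc_type:document",
    '+reply_author:Bob +reply_date:"23-12-2012"',
    "content:abc",
)
```

Both child conditions are evaluated against a single child document, so a parent is returned only when one and the same reply satisfies them.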

btw, do you see any problem with your current approach - query time joins
and out-of-the-box shard routing?


On Tue, Jul 2, 2013 at 5:19 PM, adfel70 adfe...@gmail.com wrote:

 I'm not familiar with block join in lucene. I've read a bit, and I just want
 to make sure - do you think that when this ticket is released, it will solve
 the current problem of solr cloud joins?

 Also, can you elaborate a bit about your solution?






-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics

http://www.griddynamics.com
 mkhlud...@griddynamics.com


Re: Converting nested data model to solr schema

2013-07-02 Thread adfel70
My current solution is overriding the out-of-the-box shard routing and
forcing each document and its attachments to go into a specific shard. But
this is so I can support the query time joins (because joins are only
performed between documents in the same shard).

I'm a bit concerned by this approach only because it forces me to override
out-of-the-box solr behavior.
I haven't implemented the whole thing yet, so I can't say anything about
performance.

You're saying that your block-join solution does the same thing at index
time (putting a document and its attachments in the same shard), but at query
time it doesn't require performing an explicit join?
If you could add an example of what you'll index and how you'll query, it
would be very helpful.

Also, if this ticket is going to get into one of the next releases, and it
solves the join problem, it seems that it's worth waiting for.



Mikhail Khludnev wrote
 During indexing, the whole block (the doc and its attachments) goes into a
 particular shard; it can then be queried on every shard, and the results are merged.
 
 btw, do you see any problem with your current approach - query time joins
 and out-of-the-box shard routing?
 
 





--
View this message in context: 
http://lucene.472066.n3.nabble.com/Converting-nested-data-model-to-solr-schema-tp4074351p4074876.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Converting nested data model to solr schema

2013-07-01 Thread Jack Krupansky
Simply duplicate the subset of parent-document fields that you want to query 
onto each child document, and then you can query the child documents directly 
without any join.
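
A minimal sketch of that duplication step, in plain Python with no Solr client involved. The `denormalize` helper, the `parent_` prefix and the choice of copied fields are assumptions for illustration; the other field names are taken from the data model in the original post.

```python
def denormalize(parent, children, copied_fields):
    """Return one flat document per child, carrying copies of selected parent fields."""
    docs = []
    for child in children:
        doc = dict(child)                        # keep the child's own fields
        doc["parent_doc_id"] = parent["doc_id"]  # link back to the parent
        for field in copied_fields:
            doc["parent_" + field] = parent[field]
        docs.append(doc)
    return docs


parent = {"doc_id": "1", "author": "john", "content": "some long long text"}
replies = [
    {"reply_id": "345", "reply_author": "Jack", "reply_date": "22-12-2012"},
    {"reply_id": "897", "reply_author": "Bob", "reply_date": "23-12-2012"},
]
flat = denormalize(parent, replies, ["author", "content"])
```

With documents shaped like this, a single query such as parent_content:abc AND reply_author:Bob runs against the child documents alone. The price is that the copied parent fields are stored once per child and must be re-indexed whenever the parent changes.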


Yes, given the complexity of your data, a two-step query process may be 
necessary for some queries - do one query to get parent or child IDs and 
then do a second query filtered by those IDs.
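
The two-step process might look roughly like the sketch below. It only builds the request parameters; the first response is faked, and the helper names, the `rows` value and the `parent_doc_id` field are assumptions rather than anything prescribed by Solr.

```python
def child_id_query(child_query, rows=1000):
    """Step 1: match child documents and return only their parent IDs."""
    return {"q": child_query, "fl": "parent_doc_id", "rows": rows}


def parent_query(parent_query_text, parent_ids):
    """Step 2: query the parents, filtered by the IDs collected in step 1."""
    unique_ids = sorted(set(parent_ids))
    if not unique_ids:
        return None  # no child matched, so no parent can match either
    return {
        "q": parent_query_text,
        "fq": "doc_id:(%s)" % " OR ".join(unique_ids),
    }


step1 = child_id_query('reply_author:Bob AND reply_date:"23-12-2012"')
# Stand-in for the documents returned by the first request.
first_response = [
    {"parent_doc_id": "1"},
    {"parent_doc_id": "7"},
    {"parent_doc_id": "1"},
]
step2 = parent_query("content:abc", [d["parent_doc_id"] for d in first_response])
```

The ID list travels with the second request, so this works best when the first query is selective; a very long list of IDs makes the filter expensive.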


And, yes, this only approximates the full power of an SQL join - but at a 
tiny fraction of the cost.


-- Jack Krupansky

-Original Message- 
From: adfel70

Sent: Monday, July 01, 2013 9:56 AM
To: solr-user@lucene.apache.org
Subject: Converting nested data model to solr schema

Hi,
I have the following data model:
1. Document (fields: doc_id, author, content)
2. Each Document has multiple attachment types. Each attachment type has
multiple instances. And each attachment type may have different fields.
for example:
<doc>
  <doc_id>1</doc_id>
  <author>john</author>
  <content>some long long text...</content>
  <file_attachments>
    <file_attachment>
      <attach_id>458</attach_id>
      <attach_text>SomeText</attach_text>
      <attach_date>12/12/2012</attach_date>
    </file_attachment>
    <file_attachment>
      <attach_id>568</attach_id>
      <attach_text>SomeText2</attach_text>
      <attach_date>12/11/2012</attach_date>
    </file_attachment>
  </file_attachments>
  <reply_attachments>
    <reply_attachment>
      <reply_id>345</reply_id>
      <reply_text>SomeText</reply_text>
      <reply_author>Jack</reply_author>
      <reply_date>22-12-2012</reply_date>
    </reply_attachment>
    <reply_attachment>
      <reply_id>897</reply_id>
      <reply_text>SomeText2</reply_text>
      <reply_author>Bob</reply_author>
      <reply_date>23-12-2012</reply_date>
    </reply_attachment>
  </reply_attachments>
</doc>


I want to index all this data in solr cloud.
My current solution is to index the original document by itself and index
each attachment as a separate solr document with its parent_doc_id, and then
use solr's join capability.
The problem with this solution is that I must index all the attachments of
each document, and the document itself, in the same shard (a current solr
limitation).
This requires me to override the solr document distribution mechanism.
I fear that with this solution I may lose some of solr cloud's
capabilities.
My questions are:
1. Are my concerns regarding the downside of overriding solr cloud's
out-of-the-box mechanism justified? Or should I proceed with this solution?
2. If I'm looking for another solution, can I somehow keep all attachments
on the same document and still be able to query on a single attachment?
A query example:
Retrieve  all documents where:
content: contains abc
AND
reply_attachment.author = 'Bob'
AND
reply_attachment.date = '12-12-2012'
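
For reference, with the join-based layout described above (attachments indexed as separate documents carrying parent_doc_id), the example query could be expressed roughly as follows. This is only a sketch: it assumes Solr's {!join} query parser and the field names from the sample document, and `join_params` is a hypothetical helper.

```python
def join_params(parent_query, child_query,
                from_field="parent_doc_id", to_field="doc_id"):
    """Main query on the parent docs; the join filter is evaluated on the child docs."""
    return {
        "q": parent_query,
        "fq": "{!join from=%s to=%s}%s" % (from_field, to_field, child_query),
    }


params = join_params(
    "content:abc",
    'reply_author:Bob AND reply_date:"12-12-2012"',
)
```

The join matches reply documents first, collects their parent_doc_id values, and keeps the documents whose doc_id is among them. Both sides have to live in the same shard, which is the limitation mentioned above.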


Thanks.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Converting-nested-data-model-to-solr-schema-tp4074351.html
Sent from the Solr - User mailing list archive at Nabble.com. 



Re: Converting nested data model to solr schema

2013-07-01 Thread Mikhail Khludnev
On Mon, Jul 1, 2013 at 5:56 PM, adfel70 adfe...@gmail.com wrote:

 This requires me to override the solr document distribution mechanism.
 I fear that with this solution I may lose some of solr cloud's
 capabilities.


It's not clear whether you are aware of
http://searchhub.org/2013/06/13/solr-cloud-document-routing/ , but what you
did doesn't sound scary to me. If it works, it should be fine. I'm not
aware of any capabilities that you are going to lose.
Obviously SOLR-3076 provides astonishing query-time performance by
offloading the actual join work to index time. Check it out if your current
approach turns out to be slow.
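
As a rough illustration of the document routing described in the linked post: assuming the collection uses the compositeId router, everything before the "!" in a document ID acts as the shard key, so a parent and its attachments can share a shard without a custom distribution mechanism. The `routed_id` helper and the ID scheme below are hypothetical.

```python
def routed_id(shard_key, local_id):
    """Compose an ID of the form 'shardKey!localId'."""
    if "!" in shard_key:
        raise ValueError("shard key must not contain '!'")
    return "%s!%s" % (shard_key, local_id)


# Use the parent's doc_id as the shard key for the parent and for every attachment.
parent_id = routed_id("1", "doc")
reply_ids = [routed_id("1", "reply-%s" % rid) for rid in ("345", "897")]
```

All three IDs share the prefix "1!", so under that assumption they are routed to the same shard and query-time joins between them stay local.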


-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics

http://www.griddynamics.com
 mkhlud...@griddynamics.com