Re: Help with new Join Functionallity in Solr 4.0

2012-09-24 Thread Erick Erickson
NP, good luck!




Re: Help with new Join Functionallity in Solr 4.0

2012-09-23 Thread Milen.Tilev
Hello Erick,

Thanks a lot for your reply! Your suggestion is exactly the alternative
solution we have been considering, and with your clarification of Solr's
performance we are going to go for it. Many thanks again!

Milen


From: Erick Erickson [erickerick...@gmail.com]
Sent: Sunday, 23 September 2012 17:50
To: solr-user@lucene.apache.org
Subject: Re: Help with new Join Functionallity in Solr 4.0

The very first thing to try is to flatten your data so you don't have to use joins.
I know that goes against your database instincts, but Solr easily handles
millions and millions of documents. So if the cross-product of docs and modules
isn't prohibitive, that's what I'd do first. Then it's just a matter of
forming a search without joins.
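
As an illustration of the flattening described above, here is a minimal Python sketch that builds one flat record per docmodule link before indexing. The field names id, docid, doctitle, modid, text, docrefid and modrefid come from the original question; the function name `flatten`, the `relation` field and the sample values are assumptions made up for this example, not part of the poster's schema.

```python
def flatten(docs, modules, docmodules):
    """Return one flat record per docmodule link, with doc and module fields copied in."""
    docs_by_id = {d["id"]: d for d in docs}
    modules_by_id = {m["id"]: m for m in modules}
    flat = []
    for link in docmodules:
        doc = docs_by_id[link["docrefid"]]
        module = modules_by_id[link["modrefid"]]
        # Start from the link itself so its id, docrefid, modrefid and
        # relation metadata are kept, then denormalize the referenced fields.
        record = dict(link)
        record.update(
            docid=doc["docid"],
            doctitle=doc["doctitle"],
            modid=module["modid"],
            text=module["text"],
        )
        flat.append(record)
    return flat


# Hypothetical sample data, for illustration only.
docs = [{"id": "D1", "docid": "doc-001", "doctitle": "Pump manual"}]
modules = [
    {"id": "M1", "modid": "mod-001", "text": "engine maintenance"},
    {"id": "M2", "modid": "mod-002", "text": "safety notes"},
]
docmodules = [
    {"id": "L1", "docrefid": "D1", "modrefid": "M1", "relation": "chapter"},
    {"id": "L2", "docrefid": "D1", "modrefid": "M2", "relation": "appendix"},
]

flat = flatten(docs, modules, docmodules)
```

With records shaped like this, the search from the original question needs no join: something along the lines of q=text:engine with fq=docrefid:D1 would be enough, assuming both fields are indexed.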

Joins run into performance issues when the join field has many unique
values. Unfortunately, the field people often want to join on is something
like a uniqueKey (or PK in RDBMS terms), so be aware of that.
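
For comparison, the join-based form of the query from the original question might look roughly like the sketch below. It uses Solr's `{!join from=... to=...}` local-params syntax to go from the modrefid of matching docmodule entries to the id of module documents. The helper name `join_query_params` and the sample values are assumptions for illustration, and the sketch does no escaping of special characters in the search term.

```python
from urllib.parse import urlencode


def join_query_params(text_term, doc_id):
    """Build request parameters: search module text, restricted via a join on docmodule."""
    return {
        # Main query runs against the text field of module documents.
        "q": f"text:{text_term}",
        # Filter: take docmodule entries for the given doc, then join
        # from their modrefid to the id of module documents.
        "fq": f"{{!join from=modrefid to=id}}docrefid:{doc_id}",
    }


params = join_query_params("engine", "D1")
query_string = urlencode(params)  # ready to append to a select URL
```

Note that this joins on id, which is exactly the kind of high-cardinality field the performance caveat above is about.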

Best
Erick




Help with new Join Functionallity in Solr 4.0

2012-09-21 Thread Milen.Tilev
Dear Solr community,

I am rather new to Solr, but I already find it quite attractive. We are
developing a research application that contains a Solr index with three
different kinds of documents. Here is the basic idea:


- A document of type doc, consisting of the fields id, docid, doctitle
and some other metadata

- A document of type module, consisting of the fields id, modid and text

- A document of type docmodule, consisting of the fields id, docrefid,
modrefid and some metadata about the relation between a document and a module;
the field docrefid refers to the id of a doc document, while the field modrefid
contains the id of a module document

In other words, in our model there are documents (type doc) consisting of 
several modules and there is some characterization of each link between a 
document and a module.

Almost all fields of a doc document are searchable, as well as the text of a 
module and the metadata of the docmodule entries.

We are looking for a fast way to retrieve all modules containing a certain text 
and associated with a given document, preferably with a single query. This 
means we want to query the text from a module document while we set a 
restriction on the docrefid from a docmodule or the id from a doc document. 
Is this possible by means of the new pseudo joins? Any ideas are highly 
appreciated!

Thanks in advance!

Milen Tilev
Master of Science
Software Developer
Business Unit Information

MATERNA GmbH
Information & Communications

Voßkuhle 37
44141 Dortmund
Germany

Phone: +49 231 5599-8257
Fax: +49 231 5599-98257
E-Mail: milen.ti...@materna.de

www.materna.de (http://www.materna.de/) |
Newsletter (http://www.materna.de/newsletter) |
Twitter (http://twitter.com/MATERNA_GmbH) |
XING (http://www.xing.com/companies/MATERNAGMBH) |
Facebook (http://www.facebook.com/maternagmbh)

Registered office of MATERNA GmbH: Voßkuhle 37, 44141 Dortmund
Managing Directors: Dr. Winfried Materna, Helmut an de Meulen, Ralph Hartwig
Dortmund District Court, HRB 5839