Re: Help with new Join Functionality in Solr 4.0
NP, good luck!

On Sun, Sep 23, 2012 at 3:41 PM, milen.ti...@materna.de wrote:
Hello Erick, Thanks a lot for your reply! Your suggestion is actually exactly the alternative solution we are thinking about and with your clarification on Solr's performance we are going to go for it! Many thanks again! Milen
Re: Help with new Join Functionality in Solr 4.0
Hello Erick,

Thanks a lot for your reply! Your suggestion is exactly the alternative solution we have been considering, and with your clarification on Solr's performance we are going to go for it!

Many thanks again!
Milen

From: Erick Erickson [erickerick...@gmail.com]
Sent: Sunday, September 23, 2012 17:50
To: solr-user@lucene.apache.org
Subject: Re: Help with new Join Functionality in Solr 4.0

The very first thing to try is to flatten your data so you don't have to use joins. I know that goes against your database instincts, but Solr easily handles millions and millions of documents. So if the cross-product of docs and modules isn't prohibitive, that's what I'd do first. Then it's just a matter of forming a search without joins.

Joins run into performance issues when the join field has many unique values. Unfortunately, the field people often want to join on is something like a uniqueKey (or PK in RDBMS terms), so be aware of that.

Best
Erick

On Fri, Sep 21, 2012 at 5:46 AM, milen.ti...@materna.de wrote:
We are looking for a fast way to retrieve all modules containing a certain text and associated with a given document, preferably with a single query. This means we want to query the text from a module document while we set a restriction on the docrefid from a docmodule or the id from a doc document. Is this possible by means of the new pseudo joins?
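For anyone finding this thread later, here is a minimal sketch of the flattening Erick suggests, assuming one flat Solr document per docmodule link. The field names id, docid, doctitle, modid, text, docrefid and modrefid come from the original post; the `relation` field and the helper names `flatten` and `flat_query_params` are hypothetical, made up for illustration. It only builds the documents and the query parameters and does not talk to a Solr server.

```python
def flatten(docs, modules, links):
    """Build one flat document per (doc, module) link.

    docs, modules and links are lists of dicts shaped like the three
    document types in the original post. The "relation" key is a
    hypothetical stand-in for the link metadata mentioned there.
    """
    docs_by_id = {d["id"]: d for d in docs}
    modules_by_id = {m["id"]: m for m in modules}

    flat = []
    for link in links:
        doc = docs_by_id[link["docrefid"]]
        module = modules_by_id[link["modrefid"]]
        flat.append({
            "id": link["id"],            # the link id stays unique
            "docrefid": doc["id"],
            "docid": doc["docid"],
            "doctitle": doc["doctitle"],
            "modrefid": module["id"],
            "modid": module["modid"],
            "text": module["text"],
            "relation": link.get("relation"),  # hypothetical metadata field
        })
    return flat


def flat_query_params(search_text, doc_id):
    """Request parameters for 'modules with this text in this document'.

    Assumes search_text is a single term with no query-syntax characters.
    """
    return {
        "q": "text:" + search_text,
        "fq": "docrefid:" + doc_id,  # plain filter, no join needed
    }
```

With the index flattened like this, the search from the original question becomes an ordinary query plus a filter query. The cost is that module text is repeated once per document that uses it, which is the cross-product Erick mentions.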
Help with new Join Functionality in Solr 4.0
Dear Solr community,

I am rather new to Solr, but I already find it quite attractive. We are developing a research application that contains a Solr index with three different kinds of documents. Here is the basic idea:

- A document of type doc consisting of fields id, docid, doctitle and some other metadata
- A document of type module consisting of fields id, modid and text
- A document of type docmodule consisting of fields id, docrefid, modrefid and some metadata about the relation between a document and a module; field docrefid refers to the id of a doc document, while field modrefid contains the id of a module document

In other words, in our model there are documents (type doc) consisting of several modules, and there is some characterization of each link between a document and a module. Almost all fields of a doc document are searchable, as well as the text of a module and the metadata of the docmodule entries.

We are looking for a fast way to retrieve all modules containing a certain text and associated with a given document, preferably with a single query. This means we want to query the text from a module document while we set a restriction on the docrefid from a docmodule or the id from a doc document. Is this possible by means of the new pseudo joins?

Any ideas are highly appreciated! Thanks in advance!

Milen Tilev
Master of Science
Software Developer
Business Unit Information

MATERNA GmbH Information Communications
Voßkuhle 37, 44141 Dortmund, Germany
Phone: +49 231 5599-8257
Fax: +49 231 5599-98257
E-Mail: milen.ti...@materna.de
www.materna.de | Newsletter: http://www.materna.de/newsletter | Twitter: http://twitter.com/MATERNA_GmbH | XING: http://www.xing.com/companies/MATERNAGMBH | Facebook: http://www.facebook.com/maternagmbh

Registered office of MATERNA GmbH: Voßkuhle 37, 44141 Dortmund
Managing Directors: Dr. Winfried Materna, Helmut an de Meulen, Ralph Hartwig
Dortmund District Court, HRB 5839
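For reference, the single-query lookup asked about above could probably be written with the `{!join from=... to=...}` query parser that ships with Solr 4.0. The sketch below only builds the request parameters; it is untested against a real index. The field names come from the post, while the base URL, the helper names and the sample values are assumptions for illustration.

```python
from urllib.parse import urlencode


def join_query_params(search_text, doc_id):
    """Parameters for: modules whose text matches, linked to one doc.

    The filter first selects docmodule entries with the given docrefid,
    then joins from their modrefid to the id of module documents.
    Assumes search_text and doc_id contain no query-syntax characters.
    """
    return {
        "q": "text:" + search_text,
        "fq": "{!join from=modrefid to=id}docrefid:" + doc_id,
    }


def join_query_url(base_url, search_text, doc_id):
    """Append the URL-encoded parameters to a select handler URL."""
    return base_url + "?" + urlencode(join_query_params(search_text, doc_id))


# Assumed local Solr URL and sample values, for illustration only.
print(join_query_url("http://localhost:8983/solr/select", "solr", "d1"))
```

Note that this joins on the module id, which is exactly the kind of high-cardinality field the reply in this thread warns about, so it is worth measuring against the flattened alternative before committing to it.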