Re: Taxonomy in SOLR

2011-01-24 Thread Em
Hi Damien, can you provide a sample schema plus example data? Since the information you gave is very general, I don't think anyone can give you situation-specific advice. Regards -- View this message in context: http://lucene.472066.n3.nabble.com/Taxonomy-in-SOLR-tp2317955p2318200.html

Re: Taxonomy in SOLR

2011-01-24 Thread Damien Fontaine
My schema:
<field name="id" type="string" indexed="true" stored="true" required="true" />
<!-- Document -->
<field name="lead" type="string" indexed="true" stored="true" />
<field name="title" type="string" indexed="true" stored="true" required="true" />
<field name="text" type="string" indexed="true" stored="true" required="true" />
<!--

Re: Taxonomy in SOLR

2011-01-24 Thread Em
Hi Damien, why are you storing the taxonomies? Faceting depends only on indexed values. If there is a meaningful difference between the indexed and the stored value, I would prefer to use an RDBMS or something similar to reduce redundancy. Does this help? Regards

Re: Taxonomy in SOLR

2011-01-24 Thread Damien Fontaine
Yes, I am not obliged to store the taxonomies. My taxonomies look like this:
english_taxon_label = Berlin
english_taxon_type = location
english_taxon_hierarchy = 0/world 1/world/europe 2/world/europe/germany
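The depth-prefixed tokens in english_taxon_hierarchy can be generated mechanically from a list of levels. The following is a minimal sketch, not code from the thread: the function names are hypothetical, and the drill-down helper assumes the field is faceted with Solr's `facet.prefix` parameter, which the thread does not confirm.

```python
def hierarchy_tokens(levels):
    """Build depth-prefixed path tokens such as '1/world/europe'.

    `levels` is the taxonomy path from the root down, e.g.
    ["world", "europe", "germany"]. Hypothetical helper for illustration.
    """
    return [
        f"{depth}/" + "/".join(levels[: depth + 1])
        for depth in range(len(levels))
    ]


def child_facet_prefix(levels):
    """Prefix matching the direct children of the given path.

    Assumption: the value would be passed as facet.prefix when faceting
    on the hierarchy field. Children sit one level deeper than `levels`.
    """
    return f"{len(levels)}/" + "/".join(levels) + "/"


tokens = hierarchy_tokens(["world", "europe", "germany"])
print(" ".join(tokens))
print(child_facet_prefix(["world"]))
```

Indexing every ancestor path as its own token means a document tagged with Berlin's hierarchy is also counted under world and world/europe, which is what makes level-by-level faceting possible.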

Re: Taxonomy in SOLR

2011-01-24 Thread Em
100 entries per taxon? Well, with Solr you get 100 taxon entries * 4 million docs * 10 taxons. If your indexed taxon versions look okay, you could skip the DB overhead and do everything in Solr.
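Taken literally, the product above works out as follows. This is only a back-of-the-envelope sketch with hypothetical variable names; it multiplies the three figures as quoted and says nothing about actual index size.

```python
# Figures as quoted in the message; the variable names are illustrative.
entries_per_taxon = 100
documents = 4_000_000
taxons = 10

# Literal product of the three numbers, i.e. an upper bound on
# taxon entries across the whole corpus.
total_entries = entries_per_taxon * documents * taxons
print(f"{total_entries:,}")
```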

Re: Taxonomy in SOLR

2011-01-24 Thread Damien Fontaine
Thanks Em, how can I calculate the index time, update time, and disk space used by one taxonomy?
On 24/01/2011 10:58, Em wrote: 100 Entries per taxon? Well, with Solr you got 100 taxon-entries * 4mio docs * 10 taxons. If your indexed taxon-versions are looking okay, you could leave out the

Re: Taxonomy in SOLR

2011-01-24 Thread Em
Hi Damien, um, the formula I wrote was not a definitive guide, just some numbers I combined to visualize the amount of data; it may not even be a complete formula. Well, if you can keep your taxonomy indexed-only, you do not double the disk space used when you index two equal documents.

Re: Taxonomy in SOLR

2011-01-24 Thread Damien Fontaine
On 24/01/2011 13:10, Em wrote: Hi Daniem, ahm, the formula I wrote was no definitive guide, just some numbers I combined to visualize the amount of data - perhaps not even a complete formula. Well, when you can use your taxonomy as indexed-only you do not double the used disk space when

Re: Taxonomy in SOLR

2011-01-24 Thread Em
Just for illustration, this is your original data:
doc1: hello world
doc2: hello daniem
doc3: hello pal
Now, Lucene produces something like this from the input:
hello: id_doc1, id_doc2, id_doc3
world: id_doc1
daniem: id_doc2
pal: id_doc3
Well, it's more complex, but this is enough for illustration. As you can see,
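The term-to-documents mapping illustrated above can be reproduced with a toy inverted index. This is only a sketch of the idea, not how Lucene is implemented: it uses whitespace tokenization and lowercasing, and the function name is hypothetical.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each term to the list of document ids that contain it.

    `docs` maps a document id to its text. Tokenization is a plain
    lowercase whitespace split, far simpler than a real analyzer.
    """
    index = defaultdict(list)
    for doc_id, text in docs.items():
        # set() so a term repeated within one document is listed once
        for term in set(text.lower().split()):
            index[term].append(doc_id)
    return dict(index)


docs = {
    "id_doc1": "hello world",
    "id_doc2": "hello daniem",
    "id_doc3": "hello pal",
}
index = build_inverted_index(docs)
for term in sorted(index):
    print(f"{term}: {', '.join(index[term])}")
```

The term "hello" is stored once even though it occurs in all three documents; only its list of document ids grows.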

Re: Taxonomy in SOLR

2011-01-24 Thread Erick Erickson
First, the redundancy is certainly there, but that's what Solr does, handles large amounts of data. 4 million documents is actually a pretty small corpus by Solr standards, so you may well be able to do exactly what you propose with acceptable performance/size. I'd advise just trying it with, say,

Re: Taxonomy in SOLR

2011-01-24 Thread Damien Fontaine
Thanks, Em and Erick, for your answers. Now I better understand how Solr works. Damien
On 24/01/2011 16:23, Erick Erickson wrote: First, the redundancy is certainly there, but that's what Solr does, handles large amounts of data. 4 million documents is actually a pretty small corpus by

Re: Taxonomy in SOLR

2011-01-24 Thread Em
Hi Erick, in some use cases I really think your suggestion of using some unique documents for meta-information is a good approach to solving some issues. However, there is one hurdle for me, and maybe you can help me clear it: what is the best way to get such metadata? I see three possible

Re: Taxonomy in SOLR

2011-01-24 Thread Erick Erickson
I wasn't thinking about this for adding information to the *request*. Rather, in this case the autocomplete uses an Ajax call that just uses the TermsComponent to get the autocomplete data and display it. This is just textual, so adding it to the request is client-side magic. If you want your app
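An Ajax autocomplete call of the kind described above boils down to a single HTTP request against the TermsComponent. The sketch below only builds such a URL. It assumes a `/terms` request handler is configured for the TermsComponent; the base URL, the field name, and the function name are placeholders, not details from the thread.

```python
from urllib.parse import urlencode


def terms_autocomplete_url(base_url, field, prefix, limit=10):
    """Build a TermsComponent request URL for prefix autocomplete.

    Assumes a '/terms' handler is registered in solrconfig.xml.
    terms.fl, terms.prefix and terms.limit are TermsComponent parameters.
    """
    params = {
        "terms.fl": field,       # field whose indexed terms are listed
        "terms.prefix": prefix,  # only terms starting with the typed text
        "terms.limit": limit,    # maximum number of suggestions
        "wt": "json",            # JSON response for the Ajax client
    }
    return f"{base_url}/terms?{urlencode(params)}"


# Hypothetical host and field, for illustration only.
url = terms_autocomplete_url(
    "http://localhost:8983/solr", "english_taxon_label", "Ber"
)
print(url)
```

Because the TermsComponent reads indexed terms, the suggestions reflect whatever analysis was applied to the field at index time.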

Re: Taxonomy in SOLR

2011-01-24 Thread Em
Thank you for the advice, Erick! I will take a look at extending the StandardRequestHandler for such use cases.
Erick Erickson wrote: I wasn't thinking about this for adding information to the *request*. Rather, in this case the autocomplete uses an Ajax call that just uses the

Re: Taxonomy in SOLR

2011-01-24 Thread Jonathan Rochkind
There aren't any great general-purpose, out-of-the-box ways to handle hierarchical data in Solr. Solr isn't an RDBMS. There may be some particular advice on how to set up a particular Solr index to answer particular questions with regard to hierarchical data. I saw a great point made