Re: Taxonomy faceting
: Lucid Imagination did a webcast on this, as far as I remember?

that was me ... the webcast was a pre-run of my apachecon talk...

http://www.lucidimagination.com/why-lucid/webinars/mastering-power-faceted-search
http://people.apache.org/~hossman/apachecon2010/facets/

...taxonomy stuff comes up ~slide 30

: The '1/topics/computing'-solution works at a single level, so if you are
: interested in a multi-level result like

if you want to show the whole tree when faceting you can just leave the
"depth" number prefix out of the terms, that should work fine (but i
haven't thought about it hard)

: > Are there better ways to achieve this?
:
: Taxonomy faceting is a bit of a mess right now, but it is also an area
: where a lot is happening. For SOLR, there is

right, some of which i haven't been able to keep up on and can't comment
on -- but in my experience if you are serious about organizing your data
in a taxonomy then you probably already have some data structure in your
application layer that models the whole thing in memory, and maps nodeIds
to nodeLabels and what not.

What usually works fine is to just index the nodeIds for the entire
ancestry of the category each Document is in. That covers the filtering
(ie: fq=cat:1234), and to generate the facet presentation you do a simple
facet.field=ancestorCategories&facet.limit=-1 to get all the counts in a
big hashmap and then use that to annotate your own category tree data
structure that you use to generate the presentation.

-Hoss
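
A minimal sketch of the "annotate your own tree" step described above, in Python. It assumes the facet counts arrive in Solr's default JSON layout, a flat list alternating term and count; the Node class, the node ids, and the labels are hypothetical stand-ins for whatever the application layer already has.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Node:
    """Hypothetical application-side taxonomy node."""
    node_id: str
    label: str
    children: List["Node"] = field(default_factory=list)
    count: int = 0


def counts_from_flat(flat: list) -> Dict[str, int]:
    # Assumed input layout: [term, count, term, count, ...]
    return dict(zip(flat[0::2], flat[1::2]))


def annotate(node: Node, counts: Dict[str, int]) -> None:
    # Copy the facet count for each nodeId onto the tree; 0 if absent.
    node.count = counts.get(node.node_id, 0)
    for child in node.children:
        annotate(child, counts)


def render(node: Node, depth: int = 0) -> List[str]:
    # Indented text rendering; children with no matching docs are skipped.
    lines = ["  " * depth + f"{node.label} ({node.count})"]
    for child in node.children:
        if child.count > 0:
            lines.extend(render(child, depth + 1))
    return lines


tree = Node("1", "topics", [
    Node("12", "computing", [Node("1234", "systems")]),
    Node("13", "biology"),
])
# Made-up counts, as they might come back for facet.field=ancestorCategories
flat = ["1", 7, "12", 5, "1234", 2]
annotate(tree, counts_from_flat(flat))
print("\n".join(render(tree)))
```

Because every document carries the ids of all its ancestors, a parent's count already includes its descendants, so nothing has to be summed on the client.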
Re: Taxonomy faceting
On Thu, 2011-06-30 at 11:38 +0200, Russell B wrote:
> a multivalued field labelled category which for each document defines
> where in the tree it should appear. For example: doc1 has the
> category field set to "0/topics", "1/topics/computing",
> "2/topic/computing/systems".
>
> I then facet on the 'category' field, filter the results with fq={!raw
> f=category}1/topics/computing to get everything below that point on the
> tree, and use f.category.facet.prefix to restrict the facet fields to the
> current level.

Lucid Imagination did a webcast on this, as far as I remember?

> Playing around with the results, it seems to work ok but despite reading
> lots about faceting I can't help feel there might be a better solution.

The '1/topics/computing'-solution works at a single level, so if you are
interested in a multi-level result like

- topic
  - computing
    - hardware
    - software
  - biology
    - plants
    - animals

you have to do more requests.

> Are there better ways to achieve this?

Taxonomy faceting is a bit of a mess right now, but it is also an area
where a lot is happening. For SOLR, there is

https://issues.apache.org/jira/browse/SOLR-64
  (single path/document hierarchical faceting)
https://issues.apache.org/jira/browse/SOLR-792
  (pivot faceting, now part of trunk AFAIR)
https://issues.apache.org/jira/browse/SOLR-2412
  (multi path/document hierarchical faceting, very experimental)

Just yesterday, another multi path/document hierarchical faceting
solution was added to the Lucene 3.x branch and Lucene trunk. It has been
used by IBM for some time and appears to be mature and stable.

https://issues.apache.org/jira/browse/LUCENE-3079

However, this solution requires a sidecar index for the taxonomy and I am
a bit worried about how this fits into the Solr index workflow.
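
For reference, the depth-prefixed values in the quoted scheme could be generated at index time roughly like this. It is only a sketch: the function names are made up, and it assumes each document's positions in the tree are available as slash-separated paths.

```python
def category_tokens(path):
    """Expand one tree path into depth-prefixed tokens, one per ancestor level."""
    parts = [p for p in path.strip("/").split("/") if p]
    return ["%d/%s" % (depth, "/".join(parts[:depth + 1]))
            for depth in range(len(parts))]


def category_values(paths):
    """Values for the multivalued category field of a document that may
    appear at several points in the hierarchy (duplicates removed)."""
    values = set()
    for path in paths:
        values.update(category_tokens(path))
    return sorted(values)


print(category_tokens("topics/computing/systems"))
print(category_values(["topics/computing/systems", "topics/biology"]))
```

Each token encodes both the level and the full ancestry, which is what lets a prefix restrict the facet to a single level.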
Re: Taxonomy faceting
That's a good way. How does it perform?

Another way would be to store the "parent" topics in a field. Whenever a
parent node is drilled into, simply search for all documents with that
parent. Perhaps not as elegant as your approach though. I'd be interested
in the performance comparison between the two approaches.

> I have a hierarchical taxonomy of documents that I would like users to be
> able to search either through search or "drill-down" faceting. The
> documents may appear at multiple points in the hierarchy. I've got a
> solution working as follows: a multivalued field labelled category which
> for each document defines where in the tree it should appear. For example:
> doc1 has the category field set to "0/topics", "1/topics/computing",
> "2/topic/computing/systems".
>
> I then facet on the 'category' field, filter the results with fq={!raw
> f=category}1/topics/computing to get everything below that point on the
> tree, and use f.category.facet.prefix to restrict the facet fields to the
> current level.
>
> Full query something like:
>
> http://localhost:8080/solr/select/?q=something&facet=true&facet.field=category&fq={!raw f=category}1/topics/computing&f.category.facet.prefix=2/topic/computing
>
> Playing around with the results, it seems to work ok but despite reading
> lots about faceting I can't help feel there might be a better solution.
> Are there better ways to achieve this? Any comments/suggestions are
> welcome.
>
> (Any suggestions as to what interface I can put on top of this are also
> gratefully received!).
>
> Thanks,
>
> Russell
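
A rough sketch of how the drill-down request quoted above could be assembled for an arbitrary node. The helper name is hypothetical; the parameter names and base URL are the ones from the quoted query. One deliberate difference: the facet prefix here ends in a slash so that a sibling whose name merely starts with the same characters is not matched.

```python
from urllib.parse import urlencode

BASE_URL = "http://localhost:8080/solr/select/"  # taken from the quoted query


def drill_down_params(q, node_path):
    """Parameters for drilling into node_path, e.g. "topics/computing"."""
    parts = [p for p in node_path.strip("/").split("/") if p]
    depth = len(parts) - 1
    joined = "/".join(parts)
    return [
        ("q", q),
        ("facet", "true"),
        ("facet.field", "category"),
        # Restrict results to everything at or below the selected node.
        ("fq", "{!raw f=category}%d/%s" % (depth, joined)),
        # Only return facet values one level below the selected node.
        ("f.category.facet.prefix", "%d/%s/" % (depth + 1, joined)),
    ]


params = drill_down_params("something", "topics/computing")
print(BASE_URL + "?" + urlencode(params))
```

Letting urlencode build the query string also takes care of escaping the braces and the space inside the {!raw ...} local params.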
Re: taxonomy faceting
: I have many taxonomies and each document can apply to some of them. I don't
: know how many taxonomies there are, so i can't define a field in the schema
: for each taxonomy (one field per each taxonomy).
:
: I want to use this feature but i need to know if i can handle the context
: where each document applies to a few taxonomies and i can't define a field
: for each taxonomy in the schema because they are dynamic. Can Solr handle
: this situation?

Well, i'm not sure that i really understand your question...

you could easily use a dynamic field to declare a taxonomy_* naming
pattern for all of your taxonomy fields. so then as long as you know what
taxonomies each doc is in (and which branches it is in in each of those
taxonomies) when you index the doc you'd be fine. but if you don't
actually know the list of all taxonomies, what would you do with those
fields once you indexed them?

alternately you could model your data so that you only had one "taxonomy"
field, and the root level nodes of that taxonomy would be the names of
each of the multitudes of taxonomies you have -- then the same faceting
tricks i described in that webinar would work (but again: you'd have to
know what taxonomies each doc is in, and which branches it is in in each
of those taxonomies, when you index each doc).

-Hoss
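
The two modelling options above might look like this on the indexing side. This is an illustrative sketch only: the taxonomy names ("genre", "region"), the helper names, and the depth-prefixed token format for the single-field variant are assumptions, and the dynamic-field variant presumes a taxonomy_* dynamicField has been declared in the schema.

```python
def dynamic_field_doc(doc_id, memberships):
    """Option 1: one field per taxonomy, named to match a taxonomy_* pattern.

    memberships maps a taxonomy name to the branch paths the doc is in.
    """
    doc = {"id": doc_id}
    for taxonomy, branches in memberships.items():
        doc["taxonomy_" + taxonomy] = sorted(branches)
    return doc


def single_field_values(memberships):
    """Option 2: one multivalued field where each taxonomy name is a root node."""
    values = set()
    for taxonomy, branches in memberships.items():
        # Keep the root even if no branch is given for this taxonomy.
        values.add("0/" + taxonomy)
        for branch in branches:
            parts = [taxonomy] + [p for p in branch.split("/") if p]
            for depth in range(len(parts)):
                values.add("%d/%s" % (depth, "/".join(parts[:depth + 1])))
    return sorted(values)


memberships = {"genre": ["rock/indie"], "region": ["europe"]}
print(dynamic_field_doc("doc1", memberships))
print(single_field_values(memberships))
```

Either way the list of taxonomies and branches for a document has to be known at index time, as noted above.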