Re: Taxonomy faceting

2011-06-30 Thread darren
That's a good way. How does it perform?

Another way would be to store the parent topics in a field.
Whenever a parent node is drilled-into, simply search for all documents
with that parent. Perhaps not as elegant as your approach though.

I'd be interested in the performance comparison between the two approaches.

 I have a hierarchical taxonomy of documents that I would like users to be
 able to search either through search or drill-down faceting.  The
 documents may appear at multiple points in the hierarchy.  I've got a
 solution working as follows: a multivalued field labelled category which
 for each document defines where in the tree it should appear.  For example:
 doc1 has the category field set to 0/topics, 1/topics/computing,
 2/topics/computing/systems.

 I then facet on the 'category' field, filter the results with fq={!raw
 f=category}1/topics/computing to get everything below that point on the
 tree, and use f.category.facet.prefix to restrict the facet fields to the
 current level.

 Full query something like:

 http://localhost:8080/solr/select/?q=something&facet=true&facet.field=category&fq={!raw f=category}1/topics/computing&f.category.facet.prefix=2/topics/computing


 Playing around with the results, it seems to work ok but despite reading
 lots about faceting I can't help feel there might be a better solution.
 Are there better ways to achieve this?  Any comments/suggestions are welcome.

 (Any suggestions as to what interface I can put on top of this are also
 gratefully received!).


 Thanks,

 Russell
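
For readers following along, here is a minimal Python sketch of the scheme
Russell describes above. The helper names (category_tokens, drilldown_params)
are hypothetical and not part of Solr; only the request parameters come from
the message. One small deviation: the sketch appends a trailing "/" to the
facet prefix so that a sibling such as "computing-history" would not match,
which is an assumption on my part rather than something from the thread.

```python
from urllib.parse import urlencode


def category_tokens(path):
    """Return one depth-prefixed token per ancestor of a '/'-separated path."""
    parts = path.strip("/").split("/")
    return [f"{depth}/{'/'.join(parts[:depth + 1])}" for depth in range(len(parts))]


def drilldown_params(query, current_token, field="category"):
    """Build Solr parameters for drilling into the node named by current_token."""
    depth, _, path = current_token.partition("/")
    # Facet values one level below the current node start with this prefix.
    child_prefix = f"{int(depth) + 1}/{path}/"
    return [
        ("q", query),
        ("facet", "true"),
        ("facet.field", field),
        ("fq", f"{{!raw f={field}}}{current_token}"),
        (f"f.{field}.facet.prefix", child_prefix),
    ]


# Values to index in the multivalued 'category' field for doc1.
tokens = category_tokens("topics/computing/systems")

# Request for everything below topics/computing, faceted on its children.
params = drilldown_params("something", "1/topics/computing")
url = "http://localhost:8080/solr/select/?" + urlencode(params)
print(tokens)
print(url)
```

urlencode takes care of escaping the braces, space and slashes in the fq
value, which is easy to get wrong when the URL is assembled by hand.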




Re: Taxonomy faceting

2011-06-30 Thread Toke Eskildsen
On Thu, 2011-06-30 at 11:38 +0200, Russell B wrote:
 a multivalued field labelled category which for each document defines
 where in the tree it should appear.  For example: doc1 has the
 category field set to 0/topics, 1/topics/computing,
 2/topics/computing/systems.
 
 I then facet on the 'category' field, filter the results with fq={!raw
 f=category}1/topics/computing to get everything below that point on the
 tree, and use f.category.facet.prefix to restrict the facet fields to the
 current level.

Lucid Imagination did a webcast on this, as far as I remember?

 Playing around with the results, it seems to work ok but despite reading
 lots about faceting I can't help feel there might be a better solution.

The '1/topics/computing'-solution works at a single level, so if you are
interested in a multi-level result like
- topic
 - computing
  - hardware
  - software
 - biology
  - plants
  - animals
you have to do more requests.

 Are there better ways to achieve this?

Taxonomy faceting is a bit of a mess right now, but it is also an area
where a lot is happening. For SOLR, there is

https://issues.apache.org/jira/browse/SOLR-64
(single path/document hierarchical faceting)

https://issues.apache.org/jira/browse/SOLR-792
(pivot faceting, now part of trunk AFAIR)

https://issues.apache.org/jira/browse/SOLR-2412
(multi path/document hierarchical faceting, very experimental)

Just yesterday, another multi path/document hierarchical faceting
solution was added to the Lucene 3.x branch and Lucene trunk. It has
been used by IBM for some time and appears to be mature and stable.
https://issues.apache.org/jira/browse/LUCENE-3079
However, this solution requires a sidecar index for the taxonomy and I
am a bit worried about how this fits into the Solr index workflow.



Re: Taxonomy faceting

2011-06-30 Thread Chris Hostetter

: Lucid Imagination did a webcast on this, as far as I remember?

that was me ... the webcast was a pre-run of my apachecon talk...

http://www.lucidimagination.com/why-lucid/webinars/mastering-power-faceted-search
http://people.apache.org/~hossman/apachecon2010/facets/

...taxonomy stuff comes up ~slide 30

: The '1/topics/computing'-solution works at a single level, so if you are
: interested in a multi-level result like

if you want to show the whole tree when faceting you can just leave the
depth number prefix out of the terms, that should work fine (but i haven't
thought about it hard)
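
A rough sketch of what that could look like on the client side, assuming the
terms are indexed as plain full paths (no depth prefix) and the facet counts
have already been fetched into a dict. The counts below are made up for
illustration and build_tree/render are hypothetical helper names.

```python
def build_tree(facet_counts):
    """Nest {'topics/computing': 6, ...} into a tree of count/children nodes."""
    root = {}
    for path, count in facet_counts.items():
        children = root
        node = None
        for part in path.split("/"):
            # Create intermediate nodes on demand; they keep count 0 until
            # their own path shows up in the facet results.
            node = children.setdefault(part, {"count": 0, "children": {}})
            children = node["children"]
        node["count"] = count
    return root


def render(tree, indent=0):
    """Return the tree as indented '- label (count)' lines, sorted by label."""
    lines = []
    for label in sorted(tree):
        node = tree[label]
        lines.append(f"{' ' * indent}- {label} ({node['count']})")
        lines.extend(render(node["children"], indent + 1))
    return lines


# Made-up counts standing in for one facet.field response.
counts = {
    "topics": 10,
    "topics/computing": 6,
    "topics/computing/hardware": 2,
    "topics/computing/software": 4,
    "topics/biology": 4,
}
tree = build_tree(counts)
print("\n".join(render(tree)))
```

This gets the whole tree from a single request, at the cost of pulling every
facet value back to the client.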

:  Are there better ways to achieve this?
: 
: Taxonomy faceting is a bit of a mess right now, but it is also an area
: where a lot is happening. For SOLR, there is

right, some of which i haven't been able to keep up on and can't comment
on -- but in my experience if you are serious about organizing your data
in a taxonomy then you probably already have some data structure in your
application layer that models the whole thing in memory, and maps nodeIds
to nodeLabels and what not.  What usually works fine is to just index the
nodeIds for the entire ancestry of the category each Document is in for
the filtering (ie: fq=cat:1234), and to generate the facet presentation
you do a simple facet.field=ancestorCategories&facet.limit=-1 to get all
the counts in a big hashmap and then use that to annotate your own
category tree data structure that you use to generate the presentation.
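
A minimal sketch of that annotation step, assuming the facet counts arrive in
Solr's default JSON layout for facet fields (a flat list alternating term and
count). The category tree, its node ids and the counts are all made up, and
facet_list_to_counts/annotate are hypothetical helper names.

```python
def facet_list_to_counts(flat):
    """Convert a flat [term, count, term, count, ...] list into a dict."""
    return dict(zip(flat[0::2], flat[1::2]))


def annotate(node, counts):
    """Return a copy of a category node with document counts attached."""
    return {
        "id": node["id"],
        "label": node["label"],
        # Nodes with no matching documents are absent from the facet list.
        "count": counts.get(str(node["id"]), 0),
        "children": [annotate(child, counts) for child in node.get("children", [])],
    }


# Hypothetical application-side taxonomy, keyed by nodeId.
category_tree = {
    "id": 1, "label": "topics", "children": [
        {"id": 12, "label": "computing",
         "children": [{"id": 1234, "label": "systems"}]},
        {"id": 13, "label": "biology"},
    ],
}

# Made-up value of facet_counts.facet_fields.ancestorCategories.
flat = ["1", 7, "12", 5, "1234", 3]

annotated = annotate(category_tree, facet_list_to_counts(flat))
print(annotated["label"], annotated["count"])
```

Because every document carries the ids of all its ancestors, each node's
count already includes documents filed under its descendants, so no summing
is needed on the client.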



-Hoss


Re: taxonomy faceting

2011-02-23 Thread Chris Hostetter

: I have many taxonomies and each document can apply to some of them. I don't
: know how many taxonomies there are, so I can't define a field in the schema
: for each taxonomy (one field per taxonomy).
: 
: I want to use this feature but I need to know if I can handle the case
: where each document applies to a few taxonomies and I can't define a field
: for each taxonomy in the schema because they are dynamic. Can Solr handle
: this situation?

Well, i'm not sure that i really understand your question...

you could easily use a dynamic field to declare a taxonomy_* naming pattern
for all of your taxonomy fields.  so then as long as you know what
taxonomies each doc is in (and which branches it is in in each of those
taxonomies) when you index the doc you'd be fine.  but if you don't
actually know the list of all taxonomies, what would you do with those
fields once you indexed them?

alternately you could model your data so that you only had one taxonomy 
field, and the root level nodes of that taxonomy would be the names of 
each of the multitudes of taxonomies you have -- then the same faceting 
tricks i described in that webinar would work (but again: you'd have to
know what taxonomies each doc is in, and which branches it is in in
each of those taxonomies, when you index each doc).
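
A small sketch of the single-field variant, assuming the faceting trick in
question is the depth-prefixed path scheme and that each taxonomy's name is
simply used as the root node. The field name "category", the taxonomy names
and the helper functions are all hypothetical.

```python
def taxonomy_tokens(taxonomy, branch):
    """Depth-prefixed tokens for a branch, with the taxonomy name as the root."""
    parts = [taxonomy] + branch.strip("/").split("/")
    return [f"{depth}/{'/'.join(parts[:depth + 1])}" for depth in range(len(parts))]


def build_doc(doc_id, placements, field="category"):
    """Build an index document from a list of (taxonomy, branch) placements."""
    values = []
    for taxonomy, branch in placements:
        for token in taxonomy_tokens(taxonomy, branch):
            # Shared ancestors appear once even if several branches use them.
            if token not in values:
                values.append(token)
    return {"id": doc_id, field: values}


doc = build_doc("doc1", [
    ("subjects", "computing/systems"),
    ("regions", "europe"),
])
print(doc)
```

Faceting with a prefix of "0/" would then list the taxonomies themselves, so
the set of taxonomies does not need to be known when the schema is written,
only when each document is indexed.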





-Hoss