On May 11, 2006, at 11:47 AM, Yonik Seeley wrote:
>> Also, we allow for inverted facet selection as well, allowing a user
>> to select all documents that do not have a specified value.
>
> So for a certain facet like "platform:pc", you also allow for "- platform:pc"?

Yup! And it magically is lightning fast with the BitSet stuff I've implemented. It is a handy feature in our domain (19th century literature). "Show me all documents in 1870 that Dante Gabriel Rossetti did NOT create" - this is done completely with BitSets when no full-text queries are used. The key thing is that the facets return value/counts for each of the non-zero facets (only the values for documents that match the constraints).
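To make the idea concrete, here is a minimal sketch using plain java.util.BitSet. It is not the actual code from the search server; the class name FacetSketch, the methods exclude and facetCounts, and the assumption that there is one BitSet of document ids per facet value are all made up for illustration. An inverted constraint becomes an andNot against the excluded value's bits, and the facet counts are the cardinalities of each value's bits intersected with the constrained set, skipping zeros.

```java
import java.util.BitSet;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only; names and structure are assumptions.
public class FacetSketch {

    /** Returns the documents in base that are NOT in excluded. Inputs are left unmodified. */
    static BitSet exclude(BitSet base, BitSet excluded) {
        BitSet result = (BitSet) base.clone();
        result.andNot(excluded); // inverted constraint: clear every excluded doc id
        return result;
    }

    /**
     * Counts, for each facet value, how many documents in the constrained set
     * carry that value. Values with a zero count are omitted.
     */
    static Map<String, Integer> facetCounts(Map<String, BitSet> valueToDocs, BitSet constrained) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (Map.Entry<String, BitSet> entry : valueToDocs.entrySet()) {
            BitSet overlap = (BitSet) entry.getValue().clone();
            overlap.and(constrained); // keep only docs that satisfy the current constraints
            int n = overlap.cardinality();
            if (n > 0) {
                counts.put(entry.getKey(), n);
            }
        }
        return counts;
    }
}
```

With a BitSet for year:1870 and one for agent:Rossetti, exclude(year1870, rossetti) gives the "1870 but NOT Rossetti" set, and facetCounts over the agent field then reports only the agents that still have documents in that set.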

> If this is a common enough thing for faceted browsing, we should
> probably build in support for that in the Solr APIs somehow (w/o
> storing DocSets for both).

I'm not sure how common an inverted constraint is, but it certainly is key to my world :)

> Do you facet on all terms for a particular set of fields, or are the
> terms to be faceted on defined outside the system?  If the former,
> most of your system would fall into what I would think of as "simple"
> faceted browsing, that should be supported by default some day.  The
> latter isn't too big of a leap either... maybe with the terms defined
> in solrconfig.xml or something.

I'm afraid to let folks outside my group bang on it, but the non-Solr architecture (XML-RPC-based Lucene search server) is up and running here: http://www.nines.org/search/browse (be nice, and also note that it may very well go down as this is not a production-quality deployment). The UI is a bit sluggish because of the fairly large (by HTML standards, not Lucene) number of facet values being rendered. But you'll see that you can add any number of constraints. Things get faster to render as the set is constrained. The pie charts and numbers are all dynamic based on the current constraints. A constraint can be added in the negative sense by clicking the "-", or it can be toggled once added by clicking the "+" or "-" link.

The faceted fields are currently hard-coded - they require special indexing considerations (indexed, but not tokenized). The set of values in each field is fairly limited, though the agent field (author, creator, artist, etc.) is the most unconstrained one. I'm looking forward to refactoring to use DocSets and leverage the LRU cache goodness for this case as our data grows.

        Erik

