On May 11, 2006, at 11:47 AM, Yonik Seeley wrote:
Also, we allow for inverted facet selection as well, allowing a user
to select all documents that do not have a specified value.
So for a certain facet like "platform:pc", you also allow for
"-platform:pc"?
Yup! And it magically is lightning fast with the BitSet stuff I've
implemented. It is a handy feature in our domain (19th century
literature). "Show me all documents in 1870 that Dante Gabriel
Rossetti did NOT create" - this is done completely with BitSets when
no full-text queries are used. The key thing is that the facets
return value/counts for each of the non-zero facets (only the
values for documents that match the constraints).
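
A minimal sketch of that idea using plain java.util.BitSet, not the
actual implementation: it assumes one BitSet of document ids per facet
value, handles a negated constraint with andNot, and reports only the
non-zero counts. The class name, helper names, and sample data are all
hypothetical.

```java
import java.util.BitSet;
import java.util.LinkedHashMap;
import java.util.Map;

class FacetBitSets {

    // Hypothetical helper: build a BitSet from a list of document ids.
    static BitSet bits(int... docIds) {
        BitSet b = new BitSet();
        for (int id : docIds) {
            b.set(id);
        }
        return b;
    }

    // Documents that match every included constraint and none of the
    // excluded (inverted) ones. maxDoc is the number of documents.
    static BitSet constrain(int maxDoc, BitSet[] include, BitSet[] exclude) {
        BitSet result = new BitSet(maxDoc);
        result.set(0, maxDoc);          // start from every document
        for (BitSet b : include) {
            result.and(b);              // positive constraint
        }
        for (BitSet b : exclude) {
            result.andNot(b);           // inverted constraint, e.g. -agent:Rossetti
        }
        return result;
    }

    // Count per facet value within the matching set, dropping zero counts.
    static Map<String, Integer> facetCounts(Map<String, BitSet> valueToDocs,
                                            BitSet matching) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (Map.Entry<String, BitSet> e : valueToDocs.entrySet()) {
            BitSet overlap = (BitSet) e.getValue().clone(); // keep the original intact
            overlap.and(matching);
            int n = overlap.cardinality();
            if (n > 0) {
                counts.put(e.getKey(), n);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Made-up sample data: six documents, ids 0..5.
        BitSet year1870 = bits(0, 1, 2, 3);
        BitSet rossetti = bits(1, 3, 5);
        BitSet morris = bits(0, 2, 4);

        // "All documents in 1870 that Rossetti did NOT create"
        BitSet matching = constrain(6,
                new BitSet[] {year1870}, new BitSet[] {rossetti});

        Map<String, BitSet> agents = new LinkedHashMap<>();
        agents.put("Rossetti", rossetti);
        agents.put("Morris", morris);

        System.out.println(matching);                      // {0, 2}
        System.out.println(facetCounts(agents, matching)); // {Morris=2}
    }
}
```

Because the inverted constraint is just an andNot against the same
BitSet used for the positive case, nothing extra has to be stored for
the negated form.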
If this is a common enough thing for faceted browsing, we should
probably build in support for that in the Solr APIs somehow (w/o
storing DocSets for both).
I'm not sure how common an inverted constraint is, but it certainly
is key to my world :)
Do you facet on all terms for a particular set of fields, or are the
terms to be faceted on defined outside the system? If the former,
most of your system would fall into what I would think of as "simple"
faceted browsing, that should be supported by default some day. The
latter isn't too big of a leap either... maybe with the terms defined
in solrconfig.xml or something.
I'm afraid to let folks outside my group bang on it, but the non-Solr
architecture (XML-RPC-based Lucene search server) is up and running
here: http://www.nines.org/search/browse (be nice, and also note that
it may very well go down as this is not a production-quality
deployment). The UI is a bit sluggish because of the fairly large
(by HTML standards, not Lucene) number of facet values being
rendered. But you'll see that you can add any number of
constraints. Things get faster to render as the set is constrained.
The pie charts and numbers are all dynamic based on the current
constraints. A constraint can be added in the negative sense by
clicking the "-", or it can be toggled once added by clicking the "+"
or "-" link.
The faceted fields are currently hard-coded - they require special
indexing considerations (indexed, but not tokenized). The set of
values in each field is fairly limited; the agent field (author,
creator, artist, etc.) is the least constrained one. I'm looking
forward to refactoring for DocSets to leverage the LRU cache
goodness for this case as our data grows.
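
For illustration only, here is a generic LRU cache built on
java.util.LinkedHashMap. This is not Solr's cache, and the class name
LruCache is hypothetical; it only shows the eviction behaviour that
makes caching per-facet-value document sets attractive as the number
of values grows.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: a fixed-capacity cache that evicts the least
// recently used entry once the capacity is exceeded.
class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int capacity;

    LruCache(int capacity) {
        // accessOrder = true: iteration order runs from least to most
        // recently accessed, which is what LRU eviction needs.
        super(16, 0.75f, true);
        this.capacity = capacity;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Called by put(); returning true drops the eldest entry.
        return size() > capacity;
    }
}
```

Keyed by a constraint string such as "agent:Rossetti" (an assumed key
format) with a document set as the value, frequently used facet values
stay cached while rarely used ones are recomputed on demand.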
Erik