Thanks to Hoss and Yonik again(!) for their valuable assistance pointing me to better ways to do what I want with facets within Solr's infrastructure. Very helpful.

At this point I need to pragmatically put the DocSet refactoring on hold to accomplish some other things, but I did get the SolrCache and firstSearcher event listener working using my BitSets and will tackle the DocSet migration in the near future.

A couple of questions about DocSets, though, so that I'm confident I'll be able to get the same functionality...

Along with a BitSet for each term in selected fields, I also store a "catchall" BitSet that is an OR'd BitSet of all term BitSets and then flipped (using BitSet.or() and .flip()). How can I flip a DocSet or achieve the same sort of thing? This catchall BitSet is used to show "<unspecified>" on the user interface for that field, to allow someone to select all documents that do not have any terms in that field.
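For reference, the catchall computation described above looks roughly like this with plain java.util.BitSet. This is a minimal sketch; the class and method names are hypothetical, and maxDoc is assumed to come from something like IndexReader.maxDoc(). Note that java.util.BitSet has no zero-argument flip(), so the range form flip(0, maxDoc) is used.

```java
import java.util.Arrays;
import java.util.BitSet;
import java.util.List;

public class CatchallExample {
    // Documents in [0, maxDoc) that have NO term in the field:
    // OR all per-term BitSets together, then flip the whole document range.
    static BitSet catchall(List<BitSet> termBitSets, int maxDoc) {
        BitSet any = new BitSet(maxDoc);
        for (BitSet termDocs : termBitSets) {
            any.or(termDocs);
        }
        // BitSet does not know the index size, so the flip must be bounded.
        // Caveat: maxDoc includes deleted documents, which would also end up set here.
        any.flip(0, maxDoc);
        return any;
    }

    public static void main(String[] args) {
        BitSet red = new BitSet();
        red.set(0);
        red.set(2);
        BitSet blue = new BitSet();
        blue.set(2);
        blue.set(4);
        // Docs 1, 3 and 5 have no value in this field: the "<unspecified>" bucket.
        System.out.println(catchall(Arrays.asList(red, blue), 6)); // {1, 3, 5}
    }
}
```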

We also allow inverted facet selection, letting a user select all documents that do not have a specified value. I currently accomplish this in the loop that builds up an aggregate constraint BitSet, by using its .andNot() method. How can I accomplish this using DocSets?
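The BitSet version of that aggregation loop might look something like the sketch below. It is only an illustration: the Selection type is hypothetical, the starting point of "all documents" is an assumption, and it uses nothing but java.util.BitSet, so it says nothing about what DocSet itself offers.

```java
import java.util.Arrays;
import java.util.BitSet;
import java.util.List;

public class ConstraintExample {
    // Hypothetical: one user selection on a facet field.
    static final class Selection {
        final BitSet docs;      // documents having the selected value
        final boolean inverted; // true means "documents NOT having this value"

        Selection(BitSet docs, boolean inverted) {
            this.docs = docs;
            this.inverted = inverted;
        }
    }

    // Start with every document, then AND in normal selections and ANDNOT inverted ones.
    static BitSet aggregate(List<Selection> selections, int maxDoc) {
        BitSet result = new BitSet(maxDoc);
        result.set(0, maxDoc);
        for (Selection s : selections) {
            if (s.inverted) {
                result.andNot(s.docs); // drop documents that have the value
            } else {
                result.and(s.docs);    // keep only documents that have the value
            }
        }
        return result;
    }

    public static void main(String[] args) {
        BitSet red = new BitSet();
        red.set(0); red.set(2); red.set(4);
        BitSet large = new BitSet();
        large.set(2); large.set(3);
        // color = red AND size != large
        BitSet hits = aggregate(Arrays.asList(new Selection(red, false), new Selection(large, true)), 6);
        System.out.println(hits); // {0, 4}
    }
}
```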

If I can achieve these capabilities without too much effort, then my DocSet refactoring will happen sooner rather than later :)

Again, thanks for all the help and the rapid response. It's most helpful, and it also shows that Solr is alive, vibrant, and extremely capable.

        Erik




On May 10, 2006, at 5:23 PM, Yonik Seeley wrote:

On 5/10/06, Erik Hatcher <[EMAIL PROTECTED]> wrote:
For a fixed set of fields (currently 4 or so of them) I'm building a
HashMap keyed by field name, with the values of each key also a
HashMap, keyed by term value.  The value of the inner HashMap is a
BitSet representing all documents that have that value for that
field.  These BitSets are used for a faceted browser and ANDed
together based on user criteria, as well as combined with full-text
queries using QueryFilter's BitSet.  Nothing fancy, and perhaps
something Solr already helps provide?
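The structure described in that quoted paragraph can be sketched roughly as follows. The class and method names are hypothetical, and fullText stands in for whatever BitSet the full-text query produced (for example, via QueryFilter).

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;

public class FacetIndexExample {
    // field name -> (term value -> documents having that value in that field)
    private final Map<String, Map<String, BitSet>> facets = new HashMap<>();

    // Record that document docId has value `term` in `field`.
    void add(String field, String term, int docId) {
        facets.computeIfAbsent(field, f -> new HashMap<>())
              .computeIfAbsent(term, t -> new BitSet())
              .set(docId);
    }

    // AND the user's field=value criteria together with the full-text matches.
    // Works on a copy, so neither fullText nor the stored BitSets are modified.
    BitSet constrain(BitSet fullText, Map<String, String> criteria) {
        BitSet result = (BitSet) fullText.clone();
        for (Map.Entry<String, String> c : criteria.entrySet()) {
            Map<String, BitSet> byTerm = facets.get(c.getKey());
            BitSet docs = (byTerm == null) ? null : byTerm.get(c.getValue());
            if (docs == null) {
                result.clear(); // unknown field or value: nothing can match
                return result;
            }
            result.and(docs);
        }
        return result;
    }

    public static void main(String[] args) {
        FacetIndexExample idx = new FacetIndexExample();
        idx.add("color", "red", 0);
        idx.add("color", "red", 1);
        idx.add("color", "blue", 2);
        idx.add("size", "large", 1);
        idx.add("size", "large", 2);
        BitSet fullText = new BitSet();
        fullText.set(0, 3); // docs 0, 1, 2 matched the text query
        Map<String, String> criteria = new HashMap<>();
        criteria.put("color", "red");
        criteria.put("size", "large");
        System.out.println(idx.constrain(fullText, criteria)); // {1}
    }
}
```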

Using Solr's DocSet implementations will dramatically speed up your
faceted browsing and reduce your memory footprint.  You could store
these DocSets yourself (and turn off the filter cache so things aren't
doubly stored), but here is how I might go about it:

In your custom cache, just store the terms for the faceting fields
(everything but the bitsets).
field1 -> [term1, term2, term3, term4]
field2 -> [terma, termb, termc, termd]

Then when it comes time to get the count of items matching query x,
do
 count1 = searcher.numDocs(x, new TermQuery(new Term("field1", "term1")));
 count2 = searcher.numDocs(x, new TermQuery(new Term("field1", "term2")));
 ...

Solr will check the filter cache for "x" and for the TermQuery facets,
and generate them on the fly if they are not found.
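The counting approach can be simulated outside Solr to show the idea: keep only the term lists, and compute each count as the size of an intersection, building and caching a term's filter the first time it is needed. This is a plain-Java sketch under stated assumptions. The HashMap stands in for Solr's filter cache, runTermQuery is a hypothetical stand-in for executing a TermQuery, and the base query's documents are passed in directly rather than looked up in a cache.

```java
import java.util.Arrays;
import java.util.BitSet;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

public class FacetCountExample {
    // Custom cache holds only the terms per faceting field: field -> [term1, term2, ...]
    final Map<String, List<String>> facetTerms = new HashMap<>();

    // Stand-in for Solr's filter cache, keyed by "field:term".
    private final Map<String, BitSet> filterCache = new HashMap<>();

    // Hypothetical stand-in for executing a TermQuery against the index.
    private final Function<String, BitSet> runTermQuery;

    // Number of times a term filter had to be generated (cache misses).
    int generated = 0;

    FacetCountExample(Function<String, BitSet> runTermQuery) {
        this.runTermQuery = runTermQuery;
    }

    // Analogue of searcher.numDocs(x, termQuery): size of docs(x) AND docs(field:term).
    int numDocs(BitSet base, String field, String term) {
        BitSet docs = filterCache.computeIfAbsent(field + ":" + term, key -> {
            generated++; // cache miss: build the filter on the fly
            return runTermQuery.apply(key);
        });
        BitSet both = (BitSet) base.clone(); // copy, so base and the cached filter stay unchanged
        both.and(docs);
        return both.cardinality();
    }

    // Count every term of one faceting field against the base query's documents.
    Map<String, Integer> counts(BitSet base, String field) {
        Map<String, Integer> out = new LinkedHashMap<>();
        for (String term : facetTerms.getOrDefault(field, Collections.emptyList())) {
            out.put(term, numDocs(base, field, term));
        }
        return out;
    }

    static BitSet bits(int... ids) {
        BitSet b = new BitSet();
        for (int id : ids) {
            b.set(id);
        }
        return b;
    }

    public static void main(String[] args) {
        // Toy "index": which documents contain each field:term.
        Map<String, BitSet> index = new HashMap<>();
        index.put("color:red", bits(0, 1, 4));
        index.put("color:blue", bits(2, 3));
        FacetCountExample f = new FacetCountExample(key -> index.getOrDefault(key, new BitSet()));
        f.facetTerms.put("color", Arrays.asList("red", "blue"));
        BitSet x = bits(1, 2, 3); // documents matching the base query x
        System.out.println(f.counts(x, "color")); // {red=1, blue=2}
    }
}
```

A second query reuses the cached term filters, so only the base query's documents change between requests.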

What you lose:
 - teeny bit of performance because each facet gets looked up in a
HashMap (I've profiled... this has been negligible for us)

What you gain:
- re-use of the filter cache (including the filter for the base
query), much faster intersections with less average memory usage &
less garbage produced
- an ability to easily cap the number of filters used for the facets,
allowing performance to degrade gradually as the cache hit rate drops,
rather than an OOM.
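The general idea behind that last point is a bounded, least-recently-used cache: once it is full, the oldest filter is evicted and simply regenerated if it is needed again, trading some speed for a fixed memory ceiling. Solr's own filterCache is sized in solrconfig.xml; the sketch below is not Solr's implementation, just a minimal illustration using java.util.LinkedHashMap in access order (the class name is hypothetical).

```java
import java.util.BitSet;
import java.util.LinkedHashMap;
import java.util.Map;

public class BoundedFilterCache<K> extends LinkedHashMap<K, BitSet> {
    private final int maxEntries;

    public BoundedFilterCache(int maxEntries) {
        // accessOrder = true: iteration runs from least to most recently used
        super(16, 0.75f, true);
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, BitSet> eldest) {
        // Called after each put: evict the least recently used filter when over capacity.
        return size() > maxEntries;
    }

    public static void main(String[] args) {
        BoundedFilterCache<String> cache = new BoundedFilterCache<>(2);
        cache.put("color:red", new BitSet());
        cache.put("color:blue", new BitSet());
        cache.get("color:red");                // touch red, so blue becomes the eldest
        cache.put("size:large", new BitSet()); // over capacity: blue is evicted
        System.out.println(cache.keySet());    // [color:red, size:large]
    }
}
```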

