Re: request handler and caches

Erik Hatcher Thu, 11 May 2006 07:14:58 -0700

Thanks to Hoss and Yonik again(!) for their valuable assistance pointing me to better ways to do what I want with facets within Solr's infrastructure. Very helpful.

At this point I need to pragmatically put the DocSet refactoring on hold to accomplish some other things, but I did get the SolrCache and firstSearcher event listener working using my BitSets and will tackle the DocSet migration in the near future.

A couple of questions about DocSets though, so that I'm confident I'll be able to get the same functionality...

Along with a BitSet for each term in selected fields, I also store a "catchall" BitSet that is an OR'd BitSet of all term BitSets and then flipped (using BitSet.or() and .flip()). How can I flip a DocSet or achieve the same sort of thing? This catchall BitSet is used to show "<unspecified>" on the user interface for that field, to allow someone to select all documents that do not have any terms in that field.
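
For readers following along, the catchall construction described above might look roughly like this with plain java.util.BitSet. This is only a sketch under assumptions: the class name, method name, and the maxDoc parameter (the number of document ids in the index) are hypothetical, and it says nothing about how a DocSet would do the same. Note that java.util.BitSet has no zero-argument flip(), so the flip is done over the range [0, maxDoc).

```java
import java.util.BitSet;
import java.util.List;

// Hypothetical helper, not part of Solr or Lucene.
class CatchAllSketch {

    /**
     * Builds the "<unspecified>" BitSet for one field: every document id
     * in [0, maxDoc) that is not set in any of the field's term BitSets.
     */
    static BitSet unspecified(List<BitSet> termBitSets, int maxDoc) {
        BitSet catchAll = new BitSet(maxDoc);
        for (BitSet termBits : termBitSets) {
            catchAll.or(termBits);      // union of all documents that have some term
        }
        catchAll.flip(0, maxDoc);       // invert: documents with no term in this field
        return catchAll;
    }
}
```

The range form of flip matters here: flipping only up to maxDoc keeps bits beyond the last document id from being set.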

Also, we allow for inverted facet selection as well, allowing a user to select all documents that do not have a specified value. I currently accomplish this in my loop to build up an aggregate constraint BitSet by using its .andNot() method. How can I accomplish this using DocSets?
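
As an illustration of the loop described above, here is a minimal sketch using java.util.BitSet. The Selection type, the aggregate method, and the maxDoc parameter are hypothetical names made up for this example; the only library calls used are BitSet.set, BitSet.and, and BitSet.andNot.

```java
import java.util.BitSet;
import java.util.List;

// Hypothetical helper, not part of Solr or Lucene.
class FacetConstraintSketch {

    /** One facet choice: the documents having a value, and whether the user inverted it. */
    static final class Selection {
        final BitSet docs;
        final boolean inverted;

        Selection(BitSet docs, boolean inverted) {
            this.docs = docs;
            this.inverted = inverted;
        }
    }

    /** Combines all selections into one constraint over document ids [0, maxDoc). */
    static BitSet aggregate(List<Selection> selections, int maxDoc) {
        BitSet result = new BitSet(maxDoc);
        result.set(0, maxDoc);               // start with every document
        for (Selection s : selections) {
            if (s.inverted) {
                result.andNot(s.docs);       // drop documents that have the value
            } else {
                result.and(s.docs);          // keep only documents that have the value
            }
        }
        return result;
    }
}
```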

If I can achieve these capabilities without too much effort, then my DocSet refactoring will happen sooner rather than later :)

Again thanks for all the help and rapid response. Most helpful, and also shows that Solr is alive, vibrant, and extremely capable.


        Erik




On May 10, 2006, at 5:23 PM, Yonik Seeley wrote:

On 5/10/06, Erik Hatcher <[EMAIL PROTECTED]> wrote:

For a fixed set of fields (currently 4 or so of them) I'm building a
HashMap keyed by field name, with the values of each key also a
HashMap, keyed by term value.  The value of the inner HashMap is a
BitSet representing all documents that have that value for that
field.  These BitSets are used for a faceted browser and ANDed
together based on user criteria, as well as combined with full-text
queries using QueryFilter's BitSet.  Nothing fancy, and perhaps
something Solr already helps provide?
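
The structure described in this quoted paragraph could be sketched as follows. This is an assumption-laden illustration, not the poster's actual code: the class and method names are hypothetical, and the full-text query result is represented simply as another BitSet.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the field -> term -> BitSet structure.
class FacetIndexSketch {
    private final Map<String, Map<String, BitSet>> facets = new HashMap<>();

    /** Records that document docId has the given term in the given field. */
    void add(String field, String term, int docId) {
        facets.computeIfAbsent(field, f -> new HashMap<>())
              .computeIfAbsent(term, t -> new BitSet())
              .set(docId);
    }

    /** Returns a copy of current ANDed with the BitSet for field/term. */
    BitSet narrow(BitSet current, String field, String term) {
        BitSet result = (BitSet) current.clone();   // leave the caller's BitSet untouched
        Map<String, BitSet> terms = facets.get(field);
        BitSet termBits = (terms == null) ? null : terms.get(term);
        if (termBits == null) {
            result.clear();                         // unknown field or term matches nothing
        } else {
            result.and(termBits);
        }
        return result;
    }
}
```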


Using Solr's DocSet implementations will dramatically speed up your
faceted browsing and reduce your memory footprint.  You could store
these DocSets yourself (and turn off the filter cache so things aren't
doubly stored), but here is how I might go about it:

In your custom cache, just store the terms for the faceting fields
(everything but the bitsets).
field1 -> [term1, term2, term3, term4]
field2 -> [terma, termb, termc, termd]

Then when it comes time to get the count of items matching query x,
do
 count1 = searcher.numDocs(x,TermQuery(term1))
 count2 = searcher.numDocs(x,TermQuery(term2))
 ...

Solr will check the filter cache for "x" and for the TermQuery facets,
and generate them on the fly if they are not found.

What you lose:
 - teeny bit of performance because each facet gets looked up in a
HashMap (I've profiled... this has been negligible for us)

What you gain:
- re-use of the filtercache (including the filter for the base
query), much faster intersections with less average memory usage &
less garbage produced
- an ability to easily cap the number of filters used for the facets,
allowing a gradual reduction in performance as cache hits lower,
rather than an OOM.
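
To illustrate the last point, here is a toy sketch of a capped, least-recently-used filter cache that recomputes entries on a miss. It is not Solr's filter cache and does not use any Solr API: the class name, the loader function (a stand-in for running a query to produce its matching documents), and the string keys are all assumptions made for the example.

```java
import java.util.BitSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Toy stand-in for a size-capped filter cache; hypothetical, not Solr code.
class CappedFilterCacheSketch {
    private final Function<String, BitSet> loader;
    private final Map<String, BitSet> cache;
    int misses = 0;   // how many filters had to be generated on the fly

    CappedFilterCacheSketch(final int maxEntries, Function<String, BitSet> loader) {
        this.loader = loader;
        // accessOrder=true makes iteration order least-recently-used first
        this.cache = new LinkedHashMap<String, BitSet>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, BitSet> eldest) {
                return size() > maxEntries;   // evict instead of growing without bound
            }
        };
    }

    /** Returns the cached filter for key, generating and caching it on a miss. */
    BitSet get(String key) {
        BitSet bits = cache.get(key);
        if (bits == null) {
            misses++;
            bits = loader.apply(key);
            cache.put(key, bits);
        }
        return bits;
    }

    /** Number of documents matching both the base query and the facet. */
    int count(String queryKey, String facetKey) {
        BitSet both = (BitSet) get(queryKey).clone();
        both.and(get(facetKey));
        return both.cardinality();
    }
}
```

With a small cap, evicted filters are simply regenerated on the next request, so a low hit rate costs time rather than memory.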
