On May 16, 2006, at 10:47 PM, Chris Hostetter wrote:
: I've just improved the code to be a better DocSet citizen and it now
: does this:
:
:             BitDocSet constraintDocSet = new BitDocSet(constraintMask);
:                ...
:                map.put(term.text(), docSet.intersectionSize(constraintDocSet));

How are you building constraintMask? ... If it's a BitSet you build up by executing a bunch of queries, getting their DocSets, asking those DocSets for their bits, and then unioning/intersecting them,
then that's probably the place where Solr is most likely to offer a benefit
you aren't taking advantage of already (except that I seem to recall you wanting to do things that DocSets don't currently support,
like invert ... so maybe this is the best way).
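To make the union/intersect/invert step concrete, here is a minimal sketch that models each facet value's documents as a plain java.util.BitSet (one bit per document id), assuming that is roughly what the cached sets hold. The class, method names, and sample document ids are hypothetical, not Solr's DocSet API.

```java
import java.util.BitSet;

// Sketch only: facet constraints modelled as java.util.BitSet objects,
// one bit per document id. Names and data are hypothetical.
public class ConstraintMaskSketch {

    // AND together the doc sets of every selected constraint.
    // Builds a fresh BitSet so cached sets are never mutated.
    static BitSet intersectAll(int maxDoc, BitSet... constraints) {
        BitSet mask = new BitSet(maxDoc);
        mask.set(0, maxDoc); // start from "all documents"
        for (BitSet c : constraints) {
            mask.and(c);
        }
        return mask;
    }

    // OR together alternatives (e.g. genre:poem OR genre:painting).
    static BitSet unionAll(BitSet... alternatives) {
        BitSet mask = new BitSet();
        for (BitSet a : alternatives) {
            mask.or(a);
        }
        return mask;
    }

    // "Invert": every document in [0, maxDoc) that is NOT in the set.
    // Assumes all set bits are below maxDoc.
    static BitSet invert(BitSet set, int maxDoc) {
        BitSet inverted = (BitSet) set.clone();
        inverted.flip(0, maxDoc);
        return inverted;
    }

    // Convenience helper for building small example sets.
    static BitSet docs(int... ids) {
        BitSet b = new BitSet();
        for (int id : ids) {
            b.set(id);
        }
        return b;
    }

    public static void main(String[] args) {
        int maxDoc = 6;
        BitSet rossetti = docs(0, 1, 2);
        BitSet poems = docs(1, 2, 3);
        BitSet paintings = docs(0, 4);
        BitSet mask = intersectAll(maxDoc, rossetti, unionAll(poems, paintings));
        System.out.println(mask);                 // {0, 1, 2}
        System.out.println(invert(mask, maxDoc)); // {3, 4, 5}
    }
}
```

Working on copies keeps the warmed, cached sets safe to share between requests; invert needs maxDoc because a BitSet alone doesn't know how many documents exist.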

Yeah, I'm building up constraintMask with my code refactored to use Solr's caching, so it's still BitSets within my FacetCache, but these are all preloaded during warming. I'll eventually refactor it further to use DocSets, but for now the speed and memory usage are more than acceptable (we have hundreds, not thousands or millions, of facet values).

Off the cuff, the one thing I would do differently if it were me is...

  BitDocSet constraintDocSet = new BitDocSet(constraintMask);
  ...
  if (term != null && term.field().equals(facet) && term.text().startsWith(prefix)) {
    // count docs matching this term that also fall inside the constraints
    map.put(term.text(), searcher.numDocs(new TermQuery(term),
                                          constraintDocSet));
  } else {
  ...

...there's no performance gain, but it makes your code a little cleaner.

How does searcher.numDocs() compare to using docSet.intersectionSize(constraintDocSet)?
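Both calls are after the same number: how many documents match the term and also fall inside the constraint set. How each computes it internally isn't shown here. A minimal model of that count, using java.util.BitSet as a stand-in for DocSet (the class and sample data are hypothetical):

```java
import java.util.BitSet;

// Sketch: the "intersection size" of a term's documents and the
// constraint set, modelled with java.util.BitSet instead of Solr's DocSet.
public class IntersectionSizeSketch {

    // Number of documents present in both sets. Works on a copy so
    // neither (possibly cached) input is modified.
    static int intersectionSize(BitSet termDocs, BitSet constraint) {
        BitSet copy = (BitSet) termDocs.clone();
        copy.and(constraint);
        return copy.cardinality();
    }

    public static void main(String[] args) {
        BitSet term = new BitSet();
        term.set(1);
        term.set(3);
        term.set(5);
        BitSet constraint = new BitSet();
        constraint.set(3);
        constraint.set(4);
        constraint.set(5);
        System.out.println(intersectionSize(term, constraint)); // 2
    }
}
```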

As for the issue of how you get the values based on your prefix, I would keep
using a TermEnum, but build it on a field that isn't tokenized.

That is currently how I have it set up. The agent field is not tokenized. However, I need it to be. Here's a concrete example. "Dante Gabriel Rossetti" is one of the agents in our system. Users should be able to find him by typing either "d", "g", or "r" (case insensitive) and they'd see "Dante Gabriel Rossetti (42)" in the suggest popup where 42 is the number of documents he's involved in given the constraints.
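The matching rule described here (suggest the full stored name when any word in it starts with what was typed, ignoring case) can be sketched in plain Java. Splitting on whitespace and lower-casing stands in for what a tokenizing analyzer would do at index time; that stand-in and all names below are assumptions, not Lucene's API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Sketch: decide whether a full agent name should be suggested for the
// typed prefix. Whitespace splitting + lower-casing is an assumed
// stand-in for a tokenizing analyzer.
public class AgentPrefixSketch {

    // True if any whitespace-separated word of the name starts with
    // the typed text, case-insensitively.
    static boolean anyTokenStartsWith(String agent, String typed) {
        String prefix = typed.toLowerCase(Locale.ROOT);
        for (String token : agent.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (token.startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }

    // Returns the full names (not the matching words) to display.
    static List<String> suggest(List<String> agents, String typed) {
        List<String> hits = new ArrayList<>();
        for (String agent : agents) {
            if (anyTokenStartsWith(agent, typed)) {
                hits.add(agent);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> agents = Arrays.asList("Dante Gabriel Rossetti", "Otis Hatcher");
        System.out.println(suggest(agents, "G")); // [Dante Gabriel Rossetti]
        System.out.println(suggest(agents, "h")); // [Otis Hatcher]
    }
}
```

The point of the design is that matching happens per word while display uses the whole untokenized value, which is exactly the split a single non-tokenized field can't provide on its own.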

: Oh, one other wrinkle to getting the stored field value is that the
: agent field is multi-valued, so several people could collaborate and
: have their individual names associated with a work.  So there are

This won't be a problem with the multiValued="true" option ... it does
what you expect regardless of whether the field is text, string, or integer,
tokenized or non-tokenized.

(Well, it does what *I* expect ... if you expect something and it doesn't
do that, let us know.)

Here's a concrete multivalued example... a work has two agents, "Otis Hatcher" and "Erik Gospodnetic". The user types "g" and "Erik Gospodnetic (2)" pops up, or types "o" and "Otis Hatcher (1)" pops up. So I'm still not quite there. It looks like I'll have to walk TermDocs and get the stored agent fields, but even that isn't refined enough, since I wouldn't know which agent field in the array of stored values was the match.
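One way around the "which value matched" problem is to test each stored value of a document separately and credit only the values that match. A minimal in-memory sketch, assuming the stored agent values for each document are available as a list indexed by document id; that List-of-Lists model and every name here are hypothetical, not a Lucene or Solr API, and the sample documents are made up.

```java
import java.util.BitSet;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Sketch: per full agent name, count the constrained documents in which
// that agent matches the typed prefix. Stored fields are modelled as an
// in-memory List<List<String>> indexed by doc id (hypothetical).
public class MultiValuedAgentCounts {

    static boolean anyTokenStartsWith(String value, String prefix) {
        for (String token : value.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (token.startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }

    // Assumes storedAgents has an entry for every set bit in constraint.
    static Map<String, Integer> count(List<List<String>> storedAgents,
                                      BitSet constraint, String typed) {
        String prefix = typed.toLowerCase(Locale.ROOT);
        Map<String, Integer> counts = new TreeMap<>();
        for (int doc = constraint.nextSetBit(0); doc >= 0;
             doc = constraint.nextSetBit(doc + 1)) {
            // Check each stored value on its own, so a match is credited
            // only to the agent that matched; dedupe so a doc counts once.
            for (String agent : new HashSet<>(storedAgents.get(doc))) {
                if (anyTokenStartsWith(agent, prefix)) {
                    counts.merge(agent, 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<List<String>> agents = List.of(
                List.of("Otis Hatcher", "Erik Gospodnetic"),
                List.of("Erik Gospodnetic"),
                List.of("Dante Gabriel Rossetti"));
        BitSet all = new BitSet();
        all.set(0, 3);
        System.out.println(count(agents, all, "g")); // {Dante Gabriel Rossetti=1, Erik Gospodnetic=2}
        System.out.println(count(agents, all, "o")); // {Otis Hatcher=1}
    }
}
```

This visits every constrained document, so its cost grows with the size of the constraint set rather than with the number of distinct agents.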

        Erik
