On May 16, 2006, at 10:47 PM, Chris Hostetter wrote:
: I've just improved the code to be a better DocSet citizen and it now
: does this:
:
: BitDocSet constraintDocSet = new BitDocSet(constraintMask);
: ...
: map.put(term.text(), docSet.intersectionSize(constraintDocSet));
how are you building constraintMask? ... if it's a BitSet you are
building up by executing a bunch of queries, getting their DocSets,
asking those DocSets for their bits, and then unioning/intersecting
them, then that's probably the best place where there's likely to be
benefit from Solr that you aren't taking advantage of already (except
that i seem to recall you wanting to do things that DocSets don't
currently support, like invert .. so maybe this is the best way)
Yeah, I'm building up constraintMask in the code I refactored to use
Solr's caching - so it's still BitSets within my FacetCache, but
these are all pre-loaded during warming. I'll eventually refactor it
further to use DocSets, but for now the speed and memory usage are
all more than acceptable (we have hundreds, not thousands or
millions, of facet values).
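
For readers following along, counting against a BitSet mask can be sketched with plain java.util.BitSet. This is only an illustration, assuming one bit per document id; the class and method names are hypothetical and are not part of Solr or of the FacetCache code described above.

```java
import java.util.BitSet;

// Hypothetical helper for illustration only; not a Solr class.
final class ConstraintCount {
    // Returns how many document ids are set in both bit sets.
    // Works on a copy, so neither argument is modified.
    static int intersectionSize(BitSet termDocs, BitSet constraintMask) {
        BitSet copy = (BitSet) termDocs.clone();
        copy.and(constraintMask);
        return copy.cardinality();
    }
}
```

Cloning costs an allocation per call, which seems tolerable with hundreds of facet values but would be worth revisiting for much larger facet sets.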
Off the cuff: the one thing i would do differently if it were me, is...
BitDocSet constraintDocSet = new BitDocSet(constraintMask);
...
if (term != null && term.field().equals(facet)
    && term.text().startsWith(prefix)) {
  map.put(term.text(),
          searcher.numDocs(new TermQuery(term), constraintDocSet));
} else {
...
...there's no performance gain, but it makes your code a little
cleaner.
How does searcher.numDocs() compare to using
docSet.intersectionSize(constraintDocSet)?
As for the issue of how you get the values based on your prefix, i
would keep using a TermEnum, but build it on a field that isn't
tokenized.
That is currently how I have it set up. The agent field is not
tokenized. However, I need it to be. Here's a concrete example.
"Dante Gabriel Rossetti" is one of the agents in our system. Users
should be able to find him by typing either "d", "g", or "r" (case
insensitive) and they'd see "Dante Gabriel Rossetti (42)" in the
suggest popup where 42 is the number of documents he's involved in
given the constraints.
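
The matching rule in that example can be sketched in plain Java, outside of Lucene. This is a minimal illustration, assuming names are split on whitespace; the class and method names are hypothetical.

```java
import java.util.Locale;

// Hypothetical helper for illustration only; not Lucene or Solr code.
final class AgentSuggest {
    // True if any whitespace-separated token of the full name starts
    // with the typed prefix, ignoring case.
    static boolean matches(String agentName, String prefix) {
        String p = prefix.toLowerCase(Locale.ROOT);
        for (String token : agentName.trim().split("\\s+")) {
            if (token.toLowerCase(Locale.ROOT).startsWith(p)) {
                return true;
            }
        }
        return false;
    }
}
```

With this rule, "d", "g", and "r" all match "Dante Gabriel Rossetti", while the value shown to the user remains the full, untokenized name.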
: Oh, one other wrinkle to getting the stored field value is that the
: agent field is multi-valued, so several people could collaborate and
: have their individual names associated with a work. So there are
this won't be a problem with the multiValued="true" option ... it does
what you expect regardless of whether the field is text, string,
integer, tokenized/non-tokenized.
(well, it does what *I* expect ... if you expect something and it
doesn't do that -- let us know)
Here's a concrete multivalued example... a work has two agents "Otis
Hatcher" and "Erik Gospodnetic". The user types "g" and "Erik
Gospodnetic (2)" pops up, or types "o" and "Otis Hatcher (1)" pops
up. So still not quite there - looks like I'll have to walk TermDocs
and get the stored agent fields, but even then that's not refined
enough as I wouldn't know which agent field in the array of stored
values was the match.
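
One way to picture the desired behavior is to re-test each stored value against the prefix, so it is known which value matched. The sketch below uses in-memory lists in place of stored fields and a BitSet in place of the constraint mask; every name in it is hypothetical, and it says nothing about how this would perform against a real index.

```java
import java.util.BitSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch for illustration only; not Lucene or Solr code.
final class MultiValuedSuggest {
    // agentsByDoc.get(docId) holds the stored agent values of that document.
    // Returns each full agent name with a token starting with the prefix,
    // mapped to the number of constrained documents it appears in.
    static Map<String, Integer> suggest(List<List<String>> agentsByDoc,
                                        BitSet constraint, String prefix) {
        String p = prefix.toLowerCase(Locale.ROOT);
        Map<String, Integer> counts = new TreeMap<>();
        for (int doc = constraint.nextSetBit(0);
             doc >= 0 && doc < agentsByDoc.size();
             doc = constraint.nextSetBit(doc + 1)) {
            // De-duplicate so a name repeated within one document counts once.
            for (String agent : new LinkedHashSet<>(agentsByDoc.get(doc))) {
                if (anyTokenStartsWith(agent, p)) {
                    counts.merge(agent, 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    // Case-insensitive check against each whitespace-separated token.
    private static boolean anyTokenStartsWith(String name, String lowerPrefix) {
        for (String token : name.trim().split("\\s+")) {
            if (token.toLowerCase(Locale.ROOT).startsWith(lowerPrefix)) {
                return true;
            }
        }
        return false;
    }
}
```

Because each stored value is checked individually, typing "g" reports only "Erik Gospodnetic" even for a work that also lists "Otis Hatcher".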
Erik