Re: solr-suggestion - terms that "start with"...

Erik Hatcher Wed, 17 May 2006 04:03:01 -0700


On May 16, 2006, at 10:47 PM, Chris Hostetter wrote:

: I've just improved the code to be a better DocSet citizen and it now
: does this:
:
:             BitDocSet constraintDocSet = new BitDocSet(constraintMask);
:                ...
:                map.put(term.text(), docSet.intersectionSize(constraintDocSet));
how are you building constraintMask? ... if it's a BitSet you are
building up by executing a bunch of queries, getting their DocSets,
asking those DocSets for their bits, and then unioning/intersecting them,
then that's probably the place where there's most likely to be benefit
from Solr that you aren't taking advantage of already (except that i seem
to recall you wanting to do things that DocSets don't currently support,
like invert .. so maybe this is the best way)

Yeah, I'm building up constraintMask using the refactoring to use Solr's caching - so it's still BitSets within my FacetCache, but these are all pre-loaded during warming. I'll eventually refactor it further to use DocSets, but for now the speed and memory usage are all more than acceptable (we have hundreds, not thousands or millions, of facet values).
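
For readers following along, here is a minimal, self-contained sketch of the counting step described above, using only java.util.BitSet rather than Solr's DocSet classes. The class name FacetCountSketch, the facetCache map, and both method names are hypothetical and are not taken from the code under discussion; the sketch only illustrates counting how many of each facet value's documents fall inside a constraint mask.

```java
import java.util.BitSet;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only: plain BitSets stand in for Solr DocSets.
class FacetCountSketch {

    /** Number of bits set in both a and b; neither argument is modified. */
    static int intersectionSize(BitSet a, BitSet b) {
        BitSet copy = (BitSet) a.clone(); // and() mutates, so work on a copy
        copy.and(b);
        return copy.cardinality();
    }

    /**
     * For each cached facet value, count the documents that also match
     * the constraint mask. The cache maps facet value -> matching doc ids.
     */
    static Map<String, Integer> countFacets(Map<String, BitSet> facetCache,
                                            BitSet constraintMask) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (Map.Entry<String, BitSet> entry : facetCache.entrySet()) {
            counts.put(entry.getKey(),
                       intersectionSize(entry.getValue(), constraintMask));
        }
        return counts;
    }
}
```

Cloning before and() keeps the cached BitSets intact, which matters if they are pre-loaded once during warming and shared across requests.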

Off the cuff: the one thing i would do differently if it were me, is...

  BitDocSet constraintDocSet = new BitDocSet(constraintMask);
  ...
  if (term != null && term.field().equals(facet) && term.text().startsWith(prefix)) {
     map.put(term.text(), searcher.numDocs(new TermQuery(term),
                                           constraintDocSet));
  } else {
  ...
...there's no performance gain, but it makes your code a little cleaner.

how does searcher.numDocs() compare to using docSet.intersectionSize(constraintDocSet)?

As for the issue of how you get the values based on your prefix, i would keep
using a TermEnum, but build it on a field that isn't tokenized.

That is currently how I have it set up. The agent field is not tokenized. However, I need it to be. Here's a concrete example. "Dante Gabriel Rossetti" is one of the agents in our system. Users should be able to find him by typing either "d", "g", or "r" (case insensitive) and they'd see "Dante Gabriel Rossetti (42)" in the suggest popup, where 42 is the number of documents he's involved in given the constraints.
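
The matching rule in that example (a prefix should hit the start of any word in the name, ignoring case) can be sketched in plain Java as below. This is an illustration of the requirement only, not of how Lucene or Solr tokenizes a field; the class and method names are hypothetical, and treating an empty prefix as "no match" is an assumption.

```java
import java.util.Locale;

// Hypothetical illustration of the "any word starts with" rule.
class WordPrefixSketch {

    /**
     * True if any whitespace-separated word in name starts with prefix,
     * compared case-insensitively. An empty prefix matches nothing
     * (an assumption; the thread does not say what should happen).
     */
    static boolean anyWordStartsWith(String name, String prefix) {
        String p = prefix.toLowerCase(Locale.ROOT);
        if (p.isEmpty()) {
            return false;
        }
        for (String word : name.trim().split("\\s+")) {
            if (word.toLowerCase(Locale.ROOT).startsWith(p)) {
                return true;
            }
        }
        return false;
    }
}
```

With this rule, "d", "g", and "r" all match "Dante Gabriel Rossetti", while "a" does not, because no word in the name begins with it.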

: Oh, one other wrinkle to getting the stored field value is that the
: agent field is multi-valued, so several people could collaborate and
: have their individual names associated with a work.  So there are

this won't be a problem with the multiValued="true" option ... it does
what you expect regardless of whether the field is text, string, integer,
tokenized/non-tokenized.

(well, it does what *I* expect ... if you expect something and it doesn't
do that -- let us know)

Here's a concrete multivalued example... a work has two agents, "Otis Hatcher" and "Erik Gospodnetic". The user types "g" and "Erik Gospodnetic (2)" pops up, or types "o" and "Otis Hatcher (1)" pops up. So still not quite there - looks like I'll have to walk TermDocs and get the stored agent fields, but even then that's not refined enough, as I wouldn't know which agent field in the array of stored values was the match.
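
To make the desired behavior concrete, here is a small in-memory sketch in plain Java. It does not use TermDocs, stored fields, or any Solr API; agentsByDoc is a hypothetical stand-in for the stored multi-valued agent field, indexed by document id, and all names are made up for illustration. Each stored value is tested individually, so the value that matched is known by construction.

```java
import java.util.BitSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration: suggestions over a multi-valued field.
class MultiValuedSuggestSketch {

    /** True if any word of name starts with the already-lowercased prefix. */
    static boolean anyWordStartsWith(String name, String lowerPrefix) {
        for (String word : name.trim().split("\\s+")) {
            if (word.toLowerCase(Locale.ROOT).startsWith(lowerPrefix)) {
                return true;
            }
        }
        return false;
    }

    /**
     * Maps each full agent name with a word starting with prefix to the
     * number of documents inside the constraint that list that agent.
     */
    static Map<String, Integer> suggest(List<List<String>> agentsByDoc,
                                        BitSet constraint, String prefix) {
        Map<String, Integer> counts = new TreeMap<>();
        String p = prefix.toLowerCase(Locale.ROOT);
        if (p.isEmpty()) {
            return counts; // assumption: an empty prefix suggests nothing
        }
        for (int doc = constraint.nextSetBit(0);
             doc >= 0 && doc < agentsByDoc.size();
             doc = constraint.nextSetBit(doc + 1)) {
            // De-duplicate so an agent listed twice on one doc counts once.
            for (String agent : new LinkedHashSet<>(agentsByDoc.get(doc))) {
                if (anyWordStartsWith(agent, p)) {
                    counts.merge(agent, 1, Integer::sum);
                }
            }
        }
        return counts;
    }
}
```

Given one work listing both "Otis Hatcher" and "Erik Gospodnetic" and a second work listing only "Erik Gospodnetic", typing "g" yields Erik Gospodnetic with a count of 2 and typing "o" yields Otis Hatcher with a count of 1. Scanning every constrained document this way is linear in the size of the constraint, so it is a statement of the requirement rather than a suggestion for how to do it efficiently against an index.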


        Erik
