Re: solr-suggestion - terms that "start with"...

Erik Hatcher Tue, 16 May 2006 18:36:48 -0700


On May 16, 2006, at 9:19 PM, Erik Hatcher wrote:

User story: We have a lot of people's names in our data ("agents" that in some way contributed to a 19th century work). We're refactoring our user interface to have a better navigation of these names, such that someone can just start typing and immediately (google-suggest style) see terms and their document frequency within a set of filters. Someone types "yo", pauses, and "Yonik Seely (37)" appears. It would also appear if someone typed "see".
Falling back on my Lucene know-how, I've gotten Solr to respond with almost what I need using this code:
      TreeMap map = new TreeMap();
      String prefix = req.getParam("prefix");

      try {
        TermEnum enumerator = reader.terms(new Term(facet, prefix));

        do {
          Term term = enumerator.term();
          if (term != null && term.field().equals(facet) && term.text().startsWith(prefix)) {
            DocSet docSet = searcher.getDocSet(new TermQuery(term));
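            // Copy first: getBits() may expose the bits backing a shared/cached DocSet,
            // and and() modifies its receiver in place.
            BitSet bits = (BitSet) docSet.getBits().clone();
            bits.and(constraintMask);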
            map.put(term.text(), bits.cardinality());
          } else {
            break;
          }
        }
        while (enumerator.next());
      } catch (IOException e) {
        rsp.setException(e);
        numErrors++;
        return;
      }

      rsp.add(facet, map);
I'm going on gut feeling that Solr provides some handy benefits for me in this regard. For quick-and-dirty's sake I used DocSet.getBits() and did things the way I know how in order to AND it with an existing constraintMask BitSet (built earlier in my custom request handler based on constraint parameters passed in).
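
For anyone who wants to play with the idea outside of Solr, here is a minimal sketch of the same walk using only java.util. Everything in it is an assumption made for illustration: the class name PrefixFacetSketch, the helper docs(), and the use of a TreeMap as a stand-in for Lucene's sorted term dictionary (with one BitSet of document ids per term). It is not the request handler itself, just the shape of the loop: seek to the prefix, stop at the first term that no longer starts with it, and count each term's documents inside the constraint mask.

```java
import java.util.BitSet;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

public class PrefixFacetSketch {

    /** Hypothetical helper: builds a BitSet with the given document ids set. */
    public static BitSet docs(int... ids) {
        BitSet bits = new BitSet();
        for (int id : ids) {
            bits.set(id);
        }
        return bits;
    }

    /**
     * Counts, for every term starting with the prefix, how many of its
     * documents also fall inside the constraint mask.
     */
    public static SortedMap<String, Integer> countByPrefix(
            TreeMap<String, BitSet> termDocs, String prefix, BitSet constraintMask) {
        SortedMap<String, Integer> counts = new TreeMap<>();
        // tailMap(prefix, true) stands in for reader.terms(new Term(facet, prefix)):
        // it starts at the first term >= prefix in sorted order.
        for (Map.Entry<String, BitSet> entry : termDocs.tailMap(prefix, true).entrySet()) {
            if (!entry.getKey().startsWith(prefix)) {
                break; // terms are sorted, so no later term can match
            }
            // Work on a copy so the per-term BitSet is left untouched.
            BitSet bits = (BitSet) entry.getValue().clone();
            bits.and(constraintMask);
            counts.put(entry.getKey(), bits.cardinality());
        }
        return counts;
    }

    public static void main(String[] args) {
        TreeMap<String, BitSet> termDocs = new TreeMap<>();
        termDocs.put("adams john", docs(0));
        termDocs.put("yonik seely", docs(0, 1, 2, 5));
        termDocs.put("york anne", docs(3));
        termDocs.put("zola emile", docs(1));

        BitSet constraintMask = docs(1, 2, 3);
        // prints {yonik seely=2, york anne=1}
        System.out.println(countByPrefix(termDocs, "yo", constraintMask));
    }
}
```

Note that a prefix walk like this only finds terms that begin with what was typed, so "see" would match only if the field is indexed in a way that produces a term starting with "see".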

I've just improved the code to be a better DocSet citizen and it now does this:


              BitDocSet constraintDocSet = new BitDocSet(constraintMask);
              ...

              map.put(term.text(), docSet.intersectionSize(constraintDocSet));
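
As far as I understand it, the point of asking for the intersection size is that nothing gets modified: an in-place AND on a BitSet changes the set it is called on, which is risky if that set is shared. The sketch below shows the difference with plain java.util.BitSet. The class IntersectionSizeSketch and its method are hypothetical and are not Solr's implementation of intersectionSize; they only illustrate counting the overlap without touching either input.

```java
import java.util.BitSet;

public class IntersectionSizeSketch {

    /** Counts the bits set in both a and b without modifying either one. */
    public static int intersectionSize(BitSet a, BitSet b) {
        int count = 0;
        for (int i = a.nextSetBit(0); i >= 0; i = a.nextSetBit(i + 1)) {
            if (b.get(i)) {
                count++;
            }
            if (i == Integer.MAX_VALUE) {
                break; // avoid overflow of i + 1
            }
        }
        return count;
    }

    public static void main(String[] args) {
        BitSet termDocs = new BitSet();
        termDocs.set(0);
        termDocs.set(1);
        termDocs.set(2);

        BitSet constraintMask = new BitSet();
        constraintMask.set(1);
        constraintMask.set(2);
        constraintMask.set(3);

        System.out.println(intersectionSize(termDocs, constraintMask)); // prints 2
        System.out.println(termDocs.cardinality());                     // prints 3, unchanged
    }
}
```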

Oh, one other wrinkle to getting the stored field value is that the agent field is multi-valued, so several people could collaborate and have their individual names associated with a work. So there are multiple Lucene stored field values for the "agent" field. I'm guessing that the best way to do this sort of thing is to index just these fields into a separate set of documents and query only those. Thoughts?
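
To make the "separate set of documents" idea a bit more concrete, here is a rough in-memory sketch, assuming one entry per individual agent name rather than one per work. The class AgentSuggestSketch and its methods addWork() and suggest() are hypothetical names, and a TreeMap again stands in for a real index. Each name is split into lowercased words, so typing either "yo" or "see" finds "Yonik Seely", and a multi-valued agent list on a work simply contributes one entry per name.

```java
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;

public class AgentSuggestSketch {

    // lowercased word -> full display names that contain that word
    private final TreeMap<String, TreeSet<String>> tokenToNames = new TreeMap<>();

    /** Registers every agent of one work; the agent list may hold several names. */
    public void addWork(List<String> agents) {
        for (String name : agents) {
            // Split on anything that is not a letter or digit (keeps accented letters).
            for (String token : name.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
                if (token.isEmpty()) {
                    continue;
                }
                tokenToNames.computeIfAbsent(token, k -> new TreeSet<>()).add(name);
            }
        }
    }

    /** Returns the full names having at least one word that starts with the typed text. */
    public SortedSet<String> suggest(String typed) {
        String prefix = typed.toLowerCase(Locale.ROOT);
        SortedSet<String> names = new TreeSet<>();
        for (Map.Entry<String, TreeSet<String>> entry
                : tokenToNames.tailMap(prefix, true).entrySet()) {
            if (!entry.getKey().startsWith(prefix)) {
                break; // sorted keys: past the prefix range
            }
            names.addAll(entry.getValue());
        }
        return names;
    }
}
```

This sketch leaves out the per-name document counts and the filter constraints; it only shows how a per-name index sidesteps the multi-valued stored field problem.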


Thanks,
        Erik
