On May 16, 2006, at 9:19 PM, Erik Hatcher wrote:

User story: We have a lot of people's names in our data ("agents" that in some way contributed to a 19th-century work). We're refactoring our user interface to navigate these names better, so that someone can just start typing and immediately (Google Suggest style) see terms and their document frequency within a set of filters. Someone types "yo", pauses, and "Yonik Seely (37)" appears. It would also appear if someone typed "see".
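
The matching rule in that story (a name shows up when any of its words starts with what was typed) can be sketched in plain Java. This is only an illustration: NameSuggest, matches, and label are hypothetical names, and case-insensitive matching on whitespace-separated words is an assumption, not something Solr or Lucene does out of the box.

```java
import java.util.Locale;

// Hypothetical helper for illustration; not part of Solr or Lucene.
class NameSuggest {
    // True when any whitespace-separated word of `name` starts with `typed`.
    // Case-insensitive matching is an assumption about the desired behavior.
    static boolean matches(String name, String typed) {
        String needle = typed.toLowerCase(Locale.ROOT);
        for (String word : name.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (word.startsWith(needle)) {
                return true;
            }
        }
        return false;
    }

    // Formats a suggestion as "name (document frequency)".
    static String label(String name, int docFreq) {
        return name + " (" + docFreq + ")";
    }

    public static void main(String[] args) {
        String name = "Yonik Seely";
        for (String typed : new String[] {"yo", "see", "ik"}) {
            System.out.println(typed + " -> " + matches(name, typed));
        }
        System.out.println(label(name, 37));
    }
}
```

With this rule "yo" and "see" both match, while "ik" does not, because it only occurs in the middle of a word.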

Falling back on my Lucene know-how, I've gotten Solr to respond with almost what I need using this code:

      TreeMap map = new TreeMap();
      String prefix = req.getParam("prefix");

      try {
        TermEnum enumerator = reader.terms(new Term(facet, prefix));

        do {
          Term term = enumerator.term();
          if (term != null && term.field().equals(facet) && term.text().startsWith(prefix)) {
            DocSet docSet = searcher.getDocSet(new TermQuery(term));
            BitSet bits = docSet.getBits();
            bits.and(constraintMask);
            map.put(term.text(), bits.cardinality());
          } else {
            break;
          }
        }
        while (enumerator.next());
      } catch (IOException e) {
        rsp.setException(e);
        numErrors++;
        return;
      }

      rsp.add(facet, map);
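
The loop above relies on terms being enumerated in sorted order: seek to the first term at or after the prefix, then stop at the first term that no longer matches. A minimal stand-alone sketch of that seek-and-scan pattern, using a java.util.TreeMap in place of the Lucene term dictionary (PrefixScan and scan are hypothetical names, and the counts are made-up sample data):

```java
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical stand-in for the term enumeration; uses no Lucene classes.
class PrefixScan {
    // Collects term -> count for every term that starts with `prefix`.
    static SortedMap<String, Integer> scan(SortedMap<String, Integer> termCounts, String prefix) {
        SortedMap<String, Integer> result = new TreeMap<>();
        // tailMap starts at the first term >= prefix, much like seeking an enumerator.
        for (Map.Entry<String, Integer> e : termCounts.tailMap(prefix).entrySet()) {
            if (!e.getKey().startsWith(prefix)) {
                break; // terms are sorted, so no later term can match
            }
            result.put(e.getKey(), e.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        SortedMap<String, Integer> terms = new TreeMap<>();
        terms.put("xavier", 1);
        terms.put("yonik", 37);
        terms.put("york", 2);
        terms.put("zed", 5);
        System.out.println(scan(terms, "yo")); // prints {yonik=37, york=2}
    }
}
```

The early break is what keeps the scan proportional to the number of matching terms rather than to the size of the whole term list.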

I'm going on gut feeling that Solr provides some handy benefits for me in this regard. For quick-and-dirty's sake I used DocSet.getBits() and did things the way I know how in order to AND it with an existing constraintMask BitSet (built earlier in my custom request handler based on constraint parameters passed in).

I've just improved the code to be a better DocSet citizen and it now does this:

              BitDocSet constraintDocSet = new BitDocSet(constraintMask);
              ...
              map.put(term.text(), docSet.intersectionSize(constraintDocSet));
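
One reason counting the intersection is nicer than AND-ing can be shown with plain java.util.BitSet: and() modifies the set it is called on, while a counting loop leaves both sets untouched. The sketch below is an illustration only. IntersectionCount is a hypothetical stand-in, not Solr's implementation, and whether the BitSet returned by getBits() is shared with a cached DocSet is an assumption that should be checked against the Solr version in use.

```java
import java.util.BitSet;

// Hypothetical stand-in for illustration; not Solr's DocSet implementation.
class IntersectionCount {
    // Counts the bits set in both sets without modifying either one.
    static int intersectionSize(BitSet a, BitSet b) {
        int count = 0;
        for (int i = a.nextSetBit(0); i >= 0; i = a.nextSetBit(i + 1)) {
            if (b.get(i)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        BitSet termDocs = new BitSet();
        termDocs.set(1);
        termDocs.set(3);
        termDocs.set(5);

        BitSet constraintMask = new BitSet();
        constraintMask.set(3);
        constraintMask.set(5);
        constraintMask.set(7);

        System.out.println(intersectionSize(termDocs, constraintMask)); // 2
        System.out.println(termDocs.cardinality()); // still 3, nothing was modified

        // BitSet.and() changes the receiver in place. If that BitSet were
        // shared (assumption: for example via a cache), every later reader
        // would see the reduced set.
        termDocs.and(constraintMask);
        System.out.println(termDocs.cardinality()); // now 2
    }
}
```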

Oh, one other wrinkle to getting the stored field value is that the agent field is multi-valued, so several people could collaborate and have their individual names associated with a work. So there are multiple Lucene stored field values for the "agent" field. I'm guessing that the best way to do this sort of thing is to index just these fields into a separate set of documents and query only those. Thoughts?
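
To make the "separate set of documents" idea concrete, here is a rough sketch that flattens multi-valued agent lists into one entry per distinct name, with the number of works each name appears in. It uses plain Java collections rather than Lucene documents; AgentIndex and countByAgent are hypothetical names, and the sample works are made up.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration of one entry per agent; uses no Lucene classes.
class AgentIndex {
    // Each inner list holds the agent names attached to one work.
    // Returns agent name -> number of works that list that agent.
    static Map<String, Integer> countByAgent(List<List<String>> works) {
        Map<String, Integer> counts = new TreeMap<>();
        for (List<String> agents : works) {
            // De-duplicate within a work so a repeated name counts once.
            for (String agent : new HashSet<>(agents)) {
                counts.merge(agent, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<List<String>> works = Arrays.asList(
            Arrays.asList("Yonik Seely", "Erik Hatcher"), // a collaboration
            Arrays.asList("Yonik Seely"));
        System.out.println(countByAgent(works)); // prints {Erik Hatcher=1, Yonik Seely=2}
    }
}
```

Each key here would correspond to one small document in the separate index, carrying the full name as its stored value. Note that these counts are global: they do not reflect any per-request constraints.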

Thanks,
        Erik
