On May 16, 2006, at 9:19 PM, Erik Hatcher wrote:
User story: We have a lot of people's names in our data ("agents"
that in some way contributed to a 19th century work). We're
refactoring our user interface to have a better navigation of these
names, such that someone can just start typing and immediately
(Google Suggest style) see terms and their document frequency
within a set of filters. Someone types "yo", pauses, and "Yonik
Seely (37)" appears. Also it would appear if someone typed "see".
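
The matching described in this user story (typed text matching the start of any word in a name) can be sketched in plain Java, without Lucene. This is only an illustration under stated assumptions: the class `SuggestSketch`, the method `suggest`, and the count for "Erik Hatcher" are hypothetical, and the counts are assumed to have been computed already for the active filters.

```java
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

class SuggestSketch {
    /** Returns the names (with counts) in which some whitespace-separated word starts with the typed text. */
    static Map<String, Integer> suggest(Map<String, Integer> counts, String typed) {
        String needle = typed.toLowerCase(Locale.ROOT);
        Map<String, Integer> hits = new TreeMap<>();
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            // Compare case-insensitively against every word of the name, not just the first.
            for (String word : entry.getKey().toLowerCase(Locale.ROOT).split("\\s+")) {
                if (word.startsWith(needle)) {
                    hits.put(entry.getKey(), entry.getValue());
                    break;
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = new TreeMap<>();
        counts.put("Yonik Seely", 37);
        counts.put("Erik Hatcher", 12); // made-up count, for illustration only
        System.out.println(suggest(counts, "yo"));  // {Yonik Seely=37}
        System.out.println(suggest(counts, "see")); // {Yonik Seely=37}
    }
}
```

A linear scan like this is fine for a sketch; against a real index, the per-word matching would presumably come from how the field is tokenized rather than from scanning every name.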
Falling back on my Lucene know-how, I've gotten Solr to respond
with almost what I need using this code:

    TreeMap map = new TreeMap();
    String prefix = req.getParam("prefix");
    try {
      // Position the enumerator at the first term >= (facet, prefix).
      TermEnum enumerator = reader.terms(new Term(facet, prefix));
      do {
        Term term = enumerator.term();
        if (term != null && term.field().equals(facet) &&
            term.text().startsWith(prefix)) {
          // Count the documents containing this term that also
          // satisfy the current constraints.
          DocSet docSet = searcher.getDocSet(new TermQuery(term));
          BitSet bits = docSet.getBits();
          bits.and(constraintMask);
          map.put(term.text(), bits.cardinality());
        } else {
          // Past the last term with this prefix in this field.
          break;
        }
      } while (enumerator.next());
    } catch (IOException e) {
      rsp.setException(e);
      numErrors++;
      return;
    }
    rsp.add(facet, map);

I'm going on gut feeling that Solr provides some handy benefits for
me in this regard. For quick-and-dirty's sake I used DocSet.getBits()
and did things the way I know how in order to AND it with an
existing constraintMask BitSet (built earlier in my custom request
handler based on constraint parameters passed in).
I've just improved the code to be a better DocSet citizen and it now
does this:

    BitDocSet constraintDocSet = new BitDocSet(constraintMask);
    ...
    map.put(term.text(), docSet.intersectionSize(constraintDocSet));

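
The difference between the two versions can be illustrated with java.util.BitSet alone. BitSet.and() modifies the receiver in place, so ANDing directly into a BitSet obtained from elsewhere changes that object, while counting the intersection on a copy leaves both inputs untouched. The class `IntersectionSketch` and its `intersectionSize` helper are hypothetical stand-ins for illustration, not Solr's implementation, and whether the BitSet returned by getBits() is shared with a cached DocSet is an assumption not confirmed here.

```java
import java.util.BitSet;

class IntersectionSketch {
    /** Counts the bits set in both arguments without modifying either one. */
    static int intersectionSize(BitSet a, BitSet b) {
        BitSet copy = (BitSet) a.clone(); // work on a copy so 'a' stays intact
        copy.and(b);
        return copy.cardinality();
    }

    public static void main(String[] args) {
        BitSet termDocs = new BitSet();
        termDocs.set(1);
        termDocs.set(3);
        termDocs.set(5);

        BitSet constraintMask = new BitSet();
        constraintMask.set(3);
        constraintMask.set(5);
        constraintMask.set(8);

        System.out.println(intersectionSize(termDocs, constraintMask)); // 2
        System.out.println(termDocs.cardinality()); // 3, termDocs is unchanged

        termDocs.and(constraintMask); // in-place AND, as in the first version
        System.out.println(termDocs.cardinality()); // 2, termDocs has been altered
    }
}
```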
Oh, one other wrinkle to getting the stored field value is that the
agent field is multi-valued, so several people could collaborate and
have their individual names associated with a work. So there are
multiple Lucene stored field values for the "agent" field. I'm
guessing that the best way to do this sort of thing is to index just
these fields into a separate set of documents and query only those.
Thoughts?
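
One possible reading of the "separate set of documents" idea, sketched in plain Java rather than Lucene: flatten the multi-valued agent field so that each distinct name becomes its own record, holding the ids of the works it is attached to. The class `AgentRecordsSketch`, the method `recordsPerAgent`, and the sample work ids are hypothetical, and this is only one way the proposal could be interpreted.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

class AgentRecordsSketch {
    /** Turns work id -> agent names into one record per distinct name: name -> ids of its works. */
    static Map<String, Set<Integer>> recordsPerAgent(Map<Integer, List<String>> agentsByWork) {
        Map<String, Set<Integer>> records = new TreeMap<>();
        for (Map.Entry<Integer, List<String>> work : agentsByWork.entrySet()) {
            for (String agent : work.getValue()) {
                // A collaborative work contributes its id to every agent's record.
                records.computeIfAbsent(agent, k -> new TreeSet<>()).add(work.getKey());
            }
        }
        return records;
    }

    public static void main(String[] args) {
        Map<Integer, List<String>> agentsByWork = new TreeMap<>();
        agentsByWork.put(1, Arrays.asList("Yonik Seely", "Erik Hatcher")); // a collaboration
        agentsByWork.put(2, Arrays.asList("Yonik Seely"));
        System.out.println(recordsPerAgent(agentsByWork));
        // {Erik Hatcher=[1], Yonik Seely=[1, 2]}
    }
}
```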
Thanks,
Erik