User story: We have a lot of people's names in our data ("agents" that
in some way contributed to a 19th-century work). We're refactoring
our user interface to provide better navigation of these names, so
that someone can just start typing and immediately (Google Suggest
style) see terms and their document frequency within a set of
filters. Someone types "yo", pauses, and "Yonik Seely (37)"
appears. It would also appear if someone typed "see".
Falling back on my Lucene know-how, I've gotten Solr to respond with
almost what I need using this code:
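    // Collect term text -> document frequency, sorted by term text.
    TreeMap<String, Integer> map = new TreeMap<String, Integer>();
    String prefix = req.getParam("prefix");
    TermEnum enumerator = null;
    try {
      // Positions the enumeration at the first term >= (facet, prefix).
      enumerator = reader.terms(new Term(facet, prefix));
      do {
        Term term = enumerator.term();
        if (term != null && term.field().equals(facet)
            && term.text().startsWith(prefix)) {
          DocSet docSet = searcher.getDocSet(new TermQuery(term));
          // Clone before ANDing: the BitSet may be shared with a cached DocSet.
          BitSet bits = (BitSet) docSet.getBits().clone();
          bits.and(constraintMask);
          map.put(term.text(), bits.cardinality());
        } else {
          break;  // past the last term with this prefix
        }
      } while (enumerator.next());
    } catch (IOException e) {
      rsp.setException(e);
      numErrors++;
      return;
    } finally {
      if (enumerator != null) {
        try { enumerator.close(); } catch (IOException ignored) { }
      }
    }
    rsp.add(facet, map);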
My gut feeling is that Solr provides some handy benefits for me in
this regard. For quick-and-dirty's sake I used DocSet.getBits() and
did things the way I know how, ANDing the result with an existing
constraintMask BitSet (built earlier in my custom request handler
from the constraint parameters passed in).
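
To spell out the masking step in isolation, here is a minimal sketch
using only java.util.BitSet. The class name, the countWithin helper,
and the sample doc ids are made up for illustration. The clone is
there on the assumption that the BitSet handed back by getBits() may
be shared with a cached DocSet and should not be modified in place.

```java
import java.util.BitSet;

class MaskedCount {
    // Count the docs set in both termDocs and constraintMask,
    // leaving both arguments unmodified.
    static int countWithin(BitSet termDocs, BitSet constraintMask) {
        BitSet copy = (BitSet) termDocs.clone(); // and() mutates its receiver
        copy.and(constraintMask);
        return copy.cardinality();
    }

    public static void main(String[] args) {
        BitSet termDocs = new BitSet();        // docs containing the term
        termDocs.set(1);
        termDocs.set(4);
        termDocs.set(7);

        BitSet constraintMask = new BitSet();  // docs passing the filters
        constraintMask.set(4);
        constraintMask.set(7);
        constraintMask.set(9);

        System.out.println(countWithin(termDocs, constraintMask)); // 2
        System.out.println(termDocs.cardinality());                // still 3
    }
}
```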
The thing I'm missing is retrieving the stored field value and using
that instead of term.text() in the data sent back to the client. In
the example mentioned above, I currently get back "yonik (37)" if
"yo" was sent in as a prefix, because term.text() is the analyzed
token. But I want the full stored field value, not the analyzed
tokens.
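
To make the target output concrete, here is a brute-force, in-memory
sketch in plain Java. It is not Solr or Lucene code and would not
scale to a real index; it only shows the mapping I'm after: match the
prefix against the tokens, but count and report the full stored
value. Everything in it is hypothetical: the class and method names,
the lowercase/whitespace tokenizer standing in for the real analyzer,
and the sample names.

```java
import java.util.Arrays;
import java.util.BitSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

class StoredNameSuggest {
    // Hypothetical stand-in for the analyzer: lowercase, split on whitespace.
    static boolean anyTokenStartsWith(String storedName, String lowerPrefix) {
        for (String token : storedName.toLowerCase(Locale.ROOT).trim().split("\\s+")) {
            if (token.startsWith(lowerPrefix)) {
                return true;
            }
        }
        return false;
    }

    // storedNames.get(docId) holds the stored (un-analyzed) names of that doc.
    // Returns full stored name -> number of docs in constraintMask carrying it.
    static Map<String, Integer> suggest(List<List<String>> storedNames,
                                        BitSet constraintMask, String prefix) {
        String lowerPrefix = prefix.toLowerCase(Locale.ROOT);
        Map<String, Integer> counts = new TreeMap<String, Integer>();
        for (int doc = constraintMask.nextSetBit(0);
             doc >= 0 && doc < storedNames.size();
             doc = constraintMask.nextSetBit(doc + 1)) {
            // De-duplicate so each doc counts a given name at most once.
            for (String name : new LinkedHashSet<String>(storedNames.get(doc))) {
                if (anyTokenStartsWith(name, lowerPrefix)) {
                    counts.merge(name, 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<List<String>> docs = Arrays.asList(
            Arrays.asList("Yonik Seely"),
            Arrays.asList("Yonik Seely", "Jane Young"),
            Arrays.asList("Jane Young"));
        BitSet mask = new BitSet();
        mask.set(0);
        mask.set(1);                                      // doc 2 is filtered out
        System.out.println(suggest(docs, mask, "yo"));    // {Jane Young=1, Yonik Seely=2}
        System.out.println(suggest(docs, mask, "see"));   // {Yonik Seely=2}
    }
}
```

Note that "yo" matches "Jane Young" through its second token, which
is the same any-token behavior as typing "see" for "Yonik Seely".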
Advice on how to implement what I'm after using Solr's infrastructure
(or just Lucene's) is welcome.
Thanks,
Erik