On 8/31/06, Kevin Ollivier <[EMAIL PROTECTED]> wrote:

One thing I'd like to do with my indexes is provide a browsable list
of various metadata fields, such as Subject, so that users could
click on any subject in the index and get a list of documents which
have that subject.

I do something similar. I found that running a MatchAllDocsQuery
was indeed too slow. Following the Lucene in Action examples, I found
that walking the field with a term enumerator was faster: on my index
of over a million rows, it took just a few seconds. Adapting the LIA
example for distance sorting, try this:

fieldName = 'subject'
uniqueFieldValues = set()

enumerator = reader.terms(Term(fieldName, ""))
if reader.numDocs() > 0:
   termDocs = reader.termDocs()
   try:
       while True:
           term = enumerator.term()
           if term is None:
               raise RuntimeError("no terms in field %s" % fieldName)
           if term.field() != fieldName:
               break
           termDocs.seek(enumerator)
           # Keep the term only if at least one document still contains it.
           # uniqueFieldValues is a set, so use add() (sets have no append()).
           if termDocs.next():
               uniqueFieldValues.add(term.text())
           if not enumerator.next():
               break
   finally:
       termDocs.close()
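
In case it helps to see why this is fast: the enumerator walks terms in sorted order (field first, then text), so you seek to the first term of the field and stop as soon as the field name changes. Here is a minimal pure-Python sketch of that idea. It does not use PyLucene at all; the sorted list of (field, text) tuples and the unique_field_values() helper are made up for illustration.

```python
from bisect import bisect_left

def unique_field_values(sorted_terms, field_name):
    """Return the distinct texts of one field from a sorted (field, text) list.

    sorted_terms is a hypothetical stand-in for the index's term dictionary.
    """
    values = []
    # Seek to the first term >= (field_name, ""), which is roughly what
    # reader.terms(Term(fieldName, "")) does in the code above.
    i = bisect_left(sorted_terms, (field_name, ""))
    while i < len(sorted_terms):
        field, text = sorted_terms[i]
        # Terms are grouped by field, so a different field means we are done.
        if field != field_name:
            break
        values.append(text)
        i += 1
    return values

# Made-up sample data, sorted the way a term dictionary would be.
terms = sorted([
    ("author", "kevin"),
    ("subject", "indexing"),
    ("subject", "search"),
    ("title", "pylucene notes"),
])

print(unique_field_values(terms, "subject"))
```

The work is proportional to the number of distinct terms in the field, not to the number of documents, which is why it beats matching every document and reading the stored field.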