On Wed, 26 Mar 2008 16:13:12 +0100, Andi Vajda <[EMAIL PROTECTED]> wrote:


On Mar 26, 2008, at 2:16, "Dirk Rothe" <[EMAIL PROTECTED]> wrote:

I cannot find the HighFreqTerms class from [1] in the flattened lucene namespace. Any obvious reason why?

Probably because it's in a contrib jar file that is not currently on the list of jar files in the PyLucene build. Adding the jar file to the list in the Makefile and rebuilding PyLucene should be enough to resolve the issue.

Andi..

OK, but after inspecting the Java code, this was pretty trivial to implement in Python. Just out of curiosity: do you think the Java version would be (significantly) faster? I'm not sure I understand the performance implications of the JCC bridge.


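# assumes the flattened PyLucene (JCC) module and an already started VM
from lucene import IndexReader

def getHighFreqTerms(indexPath, fieldName, topN):
    ''' get top n terms from field given by fieldName '''
    reader = IndexReader.open(indexPath)
    terms = reader.terms()
    result = []
    # next() advances the enumeration and returns False after the last term
    while terms.next():
        if terms.term().field() == fieldName:
            result.append((terms.docFreq(), unicode(terms.term())))
    terms.close()
    reader.close()

    # highest document frequency first
    result.sort(reverse=True)
    return result[:topN]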
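
A possible refinement, only as a sketch: rather than collecting every matching term and sorting the whole list, one could keep just the topN largest entries with heapq.nlargest from the standard library. The helper name topTerms and the sample (docFreq, term) pairs below are made up for illustration, and no Lucene calls are involved. This would not reduce the number of calls across the JCC bridge (next(), term() and docFreq() are still made once per term); it only avoids holding and sorting the full term list on the Python side.

```python
import heapq

def topTerms(pairs, topN):
    """Return the topN (docFreq, term) pairs, highest docFreq first."""
    # nlargest accepts any iterable and gives the same result as
    # sorted(pairs, reverse=True)[:topN] without sorting everything
    return heapq.nlargest(topN, pairs)

# hypothetical sample data standing in for what the term loop yields
sample = [(3, "body:lucene"), (7, "body:index"), (1, "body:jcc"), (5, "body:term")]
print(topTerms(sample, 2))
```

In getHighFreqTerms this would replace the sort and slice at the end, or the loop could be turned into a generator and passed in directly.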



