On Wed, 26 Mar 2008 16:13:12 +0100, Andi Vajda <[EMAIL PROTECTED]> wrote:

> On Mar 26, 2008, at 2:16, "Dirk Rothe" <[EMAIL PROTECTED]> wrote:
>
>> I cannot find the HighFreqTerms Class from [1] in the flattened lucene
>> Namespace. Any obvious reasons why?
>
> Probably because it's in a contrib jar file not currently on the list of
> jar files in the PyLucene build. Adding the jar file to the list in
> Makefile and rebuilding PyLucene should be enough to resolve the issue.
>
> Andi..

Ok, but after inspecting the Java code, this turned out to be pretty
trivial to implement in Python. Just out of curiosity: do you think the
Java version would be (significantly) faster? I'm not sure I understand
the performance implications of the JCC bridge.

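from lucene import IndexReader  # assumes lucene.initVM() was already called

def getHighFreqTerms(indexPath, fieldName, topN):
    ''' get top n terms from field given by fieldName '''
    reader = IndexReader.open(indexPath)
    try:
        terms = reader.terms()
        result = []
        # terms.next() advances the enumeration and reports whether a
        # term is available, so call it exactly once per iteration
        while terms.next():
            term = terms.term()
            if term.field() == fieldName:
                result.append((terms.docFreq(), unicode(term)))
    finally:
        reader.close()
    # highest document frequency first
    result.sort(reverse=True)
    return result[:topN]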
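
As a side note on the final sort-and-slice step: for a large index it may be cheaper to keep only the best topN candidates instead of sorting every collected pair. Below is a minimal sketch in plain Python, independent of PyLucene. The helper name top_n_terms and the sample data are made up for illustration; the input is assumed to be the same (docFreq, term) tuples built in the loop above.

```python
import heapq

def top_n_terms(term_freqs, top_n):
    """Return the top_n (doc_freq, term) pairs, highest frequency first.

    term_freqs is any iterable of (doc_freq, term) tuples, for example
    the pairs collected while walking the term enumeration.
    """
    # heapq.nlargest tracks only top_n candidates at a time and returns
    # them in descending order, matching sorted(..., reverse=True)[:top_n]
    return heapq.nlargest(top_n, term_freqs)

# Hypothetical sample data standing in for real index terms
sample = [
    (3, "contents:apple"),
    (10, "contents:lucene"),
    (7, "contents:index"),
    (1, "contents:zebra"),
]
top_two = top_n_terms(sample, 2)
```

Whether this matters in practice depends on how many terms the field has; for small fields the plain sort is likely just as good.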