On Tue, 1 Apr 2008, Dirk Rothe wrote:
On Wed, 26 Mar 2008 16:13:12 +0100, Andi Vajda <[EMAIL PROTECTED]>
wrote:
On Mar 26, 2008, at 2:16, "Dirk Rothe" <[EMAIL PROTECTED]> wrote:
I cannot find the HighFreqTerms class from [1] in the flattened lucene
namespace. Is there an obvious reason why?
Probably because it's in a contrib jar file not currently on the list of
jar files in the PyLucene build. Adding the jar file to the list in
Makefile and rebuilding PyLucene should be enough to resolve the issue.
Andi..
OK, but after inspecting the java code, I found this pretty trivial to
implement in Python. Just out of curiosity: do you think the java version
would be (significantly) faster? I'm not sure I understand the performance
implications of the jcc bridge.
I don't know. How about measuring it?
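One way to act on that suggestion is to time both variants on the same index with Python's standard timeit module. The sketch below is a generic harness, not something from PyLucene; the helper names best_time and compare are hypothetical, and the Java-backed callable would only exist once the contrib jar holding HighFreqTerms is built into PyLucene.

```python
import timeit

def best_time(func, args=(), repeat=5, number=1):
    """Best-of-`repeat` wall-clock seconds per call of func(*args).

    Taking the minimum of several runs filters out noise from other
    processes and warm-up effects (timeit itself switches off garbage
    collection while timing)."""
    timer = timeit.Timer(lambda: func(*args))
    return min(timer.repeat(repeat=repeat, number=number)) / number

def compare(python_version, java_version, args=(), repeat=5):
    """Time two implementations on identical arguments.

    Returns (python_seconds, java_seconds, speedup); a speedup above 1
    means the Java-backed version was faster."""
    t_py = best_time(python_version, args, repeat)
    t_java = best_time(java_version, args, repeat)
    speedup = t_py / t_java if t_java > 0 else float("inf")
    return t_py, t_java, speedup
```

For example, compare(getHighFreqTerms, javaHighFreqTerms, ("/path/to/index", "contents", 50)) would report both timings, where javaHighFreqTerms and the index path are placeholders for whatever wrapper and index you actually have.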
The jcc bridge involves converting some literals from java to python (such
as strings), releasing the GIL (global interpreter lock) when leaving python,
and reacquiring it when returning.
The jcc bridge also keeps track of the java objects returned to python so
that they don't get garbage collected until python no longer uses them. This
is implemented via a C++ multimap.
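To see why a multimap fits this job, here is a toy Python model of the idea. It is not JCC's actual C++ code; keying entries by a Java-style identity hash and keeping a per-object use count are assumptions made for illustration. Because distinct objects can share a hash code, one key must map to several entries, which is exactly what a multimap provides.

```python
from collections import defaultdict

class ToyRefRegistry:
    """Toy model (not JCC's real implementation) of pinning Java objects
    while Python still references them."""

    def __init__(self):
        # hash code -> list of [obj, count]; several objects may share a key
        self._buckets = defaultdict(list)

    def acquire(self, obj, hash_code):
        """Record one more Python reference to obj."""
        bucket = self._buckets[hash_code]
        for entry in bucket:
            if entry[0] is obj:
                entry[1] += 1
                return obj
        bucket.append([obj, 1])  # first Python reference: pin the object
        return obj

    def release(self, obj, hash_code):
        """Drop one Python reference; unpin obj when none remain."""
        bucket = self._buckets.get(hash_code, [])
        for i, entry in enumerate(bucket):
            if entry[0] is obj:
                entry[1] -= 1
                if entry[1] == 0:
                    del bucket[i]  # no Python users left: Java may collect it
                    if not bucket:
                        del self._buckets[hash_code]
                return
        raise KeyError("object was not registered")

    def pinned(self):
        """Number of distinct objects currently kept alive."""
        return sum(len(b) for b in self._buckets.values())
```

Every object crossing the bridge pays for a lookup of this kind, which is part of why per-call overhead adds up in tight loops.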
It's been shown before that using a python HitCollector (used in a very
tight loop by the Lucene core) is significantly slower than using the java
equivalent [1].
Andi..
[1]
http://www.google.com/search?q=python+hitcollector&ie=utf-8&oe=utf-8&aq=t&rls=org.mozilla:en-US:official&client=firefox-a
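import lucene
from lucene import IndexReader

lucene.initVM(lucene.CLASSPATH)  # start the JVM once per process

def getHighFreqTerms(indexPath, fieldName, topN):
    ''' get top n terms from field given by fieldName '''
    reader = IndexReader.open(indexPath)
    terms = reader.terms()
    result = []
    try:
        # next() advances the enumeration; calling it a second time inside
        # the loop would silently skip every other term
        while terms.next():
            term = terms.term()
            if term.field() == fieldName:
                result.append((terms.docFreq(), unicode(term)))
    finally:
        terms.close()
        reader.close()
    # highest document frequency first
    result.sort(reverse=True)
    return result[:topN]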
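Lucene enumerates terms sorted by field and then by text, so the scan above only needs the contiguous run of terms for fieldName, and a heap avoids sorting every matching term. The sketch below shows that selection logic on plain (field, text, doc_freq) tuples so it can be tried without an index; the function name top_n_terms is hypothetical. Feeding it from a TermEnum (for instance one obtained with reader.terms(Term(fieldName, "")), which starts already positioned on the first matching term) is left as an assumption.

```python
import heapq
from itertools import dropwhile, takewhile

def top_n_terms(term_stats, field_name, top_n):
    """Return up to top_n (doc_freq, text) pairs for field_name, highest
    doc_freq first (ties broken by text, descending).

    Assumes term_stats yields (field, text, doc_freq) tuples sorted by
    (field, text), as a Lucene term dictionary is."""
    # skip the fields before field_name, then stop at the first other field
    in_field = takewhile(lambda t: t[0] == field_name,
                         dropwhile(lambda t: t[0] != field_name, term_stats))
    return heapq.nlargest(top_n, ((df, text) for _, text, df in in_field))
```

Ordering by (doc_freq, text) tuples matches what result.sort(reverse=True) followed by slicing produces in the original function.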
_______________________________________________
pylucene-dev mailing list
[email protected]
http://lists.osafoundation.org/mailman/listinfo/pylucene-dev