Thanks. I admit it was a typing error. I have since dropped fetching "contents"
in both the Python and the Java programs, and now fetch only the fields "name"
and "path". However, I got nearly the same results. Moreover, ongoing
experiments show that most of the time is spent fetching the fields "name" and
"path" in PyLucene, while the original Java program does the same job in a much
shorter time. I wonder whether this is caused by the switch from the Python
caller to its native function.
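
To check where the time goes, the search call and the field reads can be timed separately. The following is only a minimal sketch: `time_phases`, `fake_search` and `fake_fetch` are hypothetical names, and the two stand-in functions replace the real `searcher.search(...)` call and the per-hit `get("name")` / `get("path")` reads so that the sketch runs without PyLucene installed.

```python
import time


def time_phases(search, fetch_fields, words):
    """Return (search_seconds, fetch_seconds) summed over all words.

    `search` and `fetch_fields` are hypothetical callables standing in for
    the real search call and the loop that reads fields from each hit.
    """
    search_total = 0.0
    fetch_total = 0.0
    for word in words:
        t0 = time.perf_counter()
        hits = search(word)
        t1 = time.perf_counter()
        fetch_fields(hits)
        t2 = time.perf_counter()
        search_total += t1 - t0   # time spent producing the hits
        fetch_total += t2 - t1    # time spent reading fields from the hits
    return search_total, fetch_total


# Stand-ins (assumptions, not PyLucene calls) so the sketch is runnable.
def fake_search(word):
    return [{"name": word + ".txt", "path": "/data/" + word + ".txt"}]


def fake_fetch(hits):
    for doc in hits:
        doc.get("name"), doc.get("path")


search_s, fetch_s = time_phases(fake_search, fake_fetch, ["alpha", "beta"])
print("search: %.6fs, fetch: %.6fs" % (search_s, fetch_s))
```

If the fetch total dominates in Python but not in Java, that would support the suspicion about per-call overhead; the sketch itself does not show this.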

Regards,
[EMAIL PROTECTED] wrote:
Send pylucene-dev mailing list submissions to
    [email protected]

To subscribe or unsubscribe via the World Wide Web, visit
    http://lists.osafoundation.org/mailman/listinfo/pylucene-dev
or, via email, send a message with subject or body 'help' to
    [EMAIL PROTECTED]

You can reach the person managing the list at
    [EMAIL PROTECTED]

When replying, please edit your Subject line so it is more specific
than "Re: Contents of pylucene-dev digest..."

Today's Topics:

   1. performance experiment of PyLucene vs Lucene (Liang Xing)
   2. Re: performance experiment of PyLucene vs Lucene (Brett Parker)

----------------------------------------------------------------------

Message: 1
Date: Fri, 11 May 2007 12:03:42 +0800 (CST)
From: "Liang Xing" <[EMAIL PROTECTED]>
Subject: [pylucene-dev] performance experiment of PyLucene vs Lucene
To: [email protected]
Message-ID: <[EMAIL PROTECTED]>
Content-Type: text/plain; charset="gbk"

[Title] My awful performance experiment
of PyLucene vs Lucene

[Results] PyLucene ?= 0.5 Lucene (in terms of search capacity)

Using the sample program "SearchFiles.py" provided by PyLucene and a Java
program performing a similar task, I found that PyLucene shows an awful result:
the average search time for PyLucene is about twice that of Java Lucene.

  Best Java result:     365713 ms for 6400 searches (most results lie around 400000 ms)
  Best PyLucene result: 662815 ms for 6400 searches (most results lie around 680000 ms)

[Prerequisites]
  Intel Pentium D dual-core 2.8 GHz, 1 GB DDR
  CentOS (Linux), kernel 2.6.9
  Lucene 2.1.0 (ant/java) vs PyLucene 2.1.0 (lucene-java-2.1.0-509013,
  "_Pylucene.so" obtained from OSAF)
  (even worse results are obtained with lower PyLucene versions)
  Python 2.5.1 vs Java2 1.5.0_10

[Object: index files]
The data source is a directory of about 27000 files, each between 0.5 KB and
20 KB in size. The index is built by a PyLucene sample program, IndexFile.py
(under the path Pylucene-X.X/samples/, slightly revised by me to set the Store
attribute of Field:Content to NO, since otherwise the memory cost of the
original Python program would be huge).

[Object: test cases]
A file named "Zop3" containing 6400 English words (our search words), one per
line.

[Major steps of the two programs: Search.java vs xSearchIndex.py]
A simple comparison of searching and retrieving performance between the two
siblings.

[Peer actions that are summed up in our test]
1. Construct an index searcher object (SEARCH) in Java and in Python.
2. Use the searcher to obtain a search result (HITS) from the existing index.
3. Loop over the document objects in HITS, reading each field value of the
   result items.
4. Repeat steps 1-3 for the 6399 other, similar test cases.
5. Record the total time consumed, which is needed to compute the average
   time.

Here goes with my
program (xSearchFiles.py) (Search.java):

---- import part: xSearchFiles.py (one complete search procedure) ----

    def RunSearch(searcher, parser, word):
        global logger, time_costing
        local_parse = parser.parse
        local_search = searcher.search
        start = datetime.now()
        hits = local_search(local_parse(word))
        #map(Processor, hits)
        for i in xrange(0, hits.length()):
            getMethod = hits.doc(i).get
            getMethod("name"), getMethod("path"), getMethod("contents")
        end = datetime.now()
        during = end - start
        wss = ["[Result]", "[Time]"]
        wss.insert(1, '\t'+ str(hits.length()))
        wss.append('\t'+ str(during)+ '\n')
        logger.writelines(wss)
        time_costing += during.microseconds/1000

---- import part: Search.java (one complete search procedure) ----

    clock.start();
    for (int i = 0; m_words != null && i < m_words.length; i++)
    {
        int testonly = 0;
        Query q = qp.parse(m_words[i]);
        Hits h = is.search(q);
        clock.suspend();
        System.out.println("\r" + i);
        clock.resume();
        for(int j = 0; j < h.length(); j ++)
        {
            h.doc(j).get("name");
            h.doc(j).get("path");
            h.doc(j).get("contens");
            testonly = j;
        }
    }
    clock.stop();
    System.out.println("Total: " + clock.getTime() + "ms.");
    ..

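
A side note on the Python timing above: `during.microseconds` is only the microsecond component of a `timedelta` (0 to 999999), so `during.microseconds/1000` drops any whole seconds. Whether any single search in the experiment took longer than one second is not known from the post, so the sketch below only illustrates the arithmetic with a made-up duration. `elapsed_ms` is a hypothetical helper, and `total_seconds()` needs Python 2.7 or later; on Python 2.5 the days, seconds and microseconds components would have to be combined by hand.

```python
from datetime import timedelta


def elapsed_ms(delta):
    """Full duration of a timedelta in milliseconds."""
    return delta.total_seconds() * 1000.0


# Made-up example duration: a search that took 2.5 seconds.
during = timedelta(seconds=2, microseconds=500000)

component_only = during.microseconds / 1000   # 500.0, the 2 whole seconds are lost
full_duration = elapsed_ms(during)            # 2500.0

print(component_only, full_duration)
```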
------------------------------

Message: 2
Date: Fri, 11 May 2007 08:37:49 +0100
From: Brett Parker <[EMAIL PROTECTED]>
Subject: Re: [pylucene-dev] performance experiment of PyLucene vs Lucene
To: [email protected]
Message-ID: <[EMAIL PROTECTED]>
Content-Type: text/plain; charset=us-ascii

On Fri, May 11, 2007 at 12:03:42PM +0800, Liang Xing wrote:
<snip class="Description + Python Example" />
> ---- import part: Search.java( one complete search procedure) ----
> clock.start();
> for (int i = 0; m_words != null && i < m_words.length; i++)
> {
>     int testonly = 0;
>     Query q = qp.parse(m_words[i]);
>     Hits h = is.search(q);
>     clock.suspend();
>     System.out.println("\r" + i);
>     clock.resume();
>     for(int j = 0; j < h.length(); j ++)
>     {
>         h.doc(j).get("name");
>         h.doc(j).get("path");
>         h.doc(j).get("contens");
                        ^^^^^^^
Surely that should be contents - is this a typo in the mail or was this a
copy paste? Because if this is a copy paste, and you're really fetching
contens rather than contents, then that might well be why the java is
seeming to go twice as fast as the python.

>         testonly = j;
>     }
> }
> clock.stop();
> System.out.println("Total: " + clock.getTime() + "ms.");
> ..
>

Thanks,
-- 
Brett Parker

------------------------------

_______________________________________________
pylucene-dev mailing list
[email protected]
http://lists.osafoundation.org/mailman/listinfo/pylucene-dev

End of pylucene-dev Digest, Vol 36, Issue 6
*******************************************
