[Title]
My disappointing performance experiment: PyLucene vs Lucene
[Results]
PyLucene ≈ 0.5 × Lucene (in search capacity)
Using the sample program "SearchFiles.py" shipped with PyLucene, and a Java
program performing the same task, I found PyLucene gives a disappointing
result: its average search time is about twice that of Java Lucene.
Best Java result: 365713 ms for 6400 searches (most results lie around
400000 ms).
Best PyLucene result: 662815 ms for 6400 searches (most results lie around
680000 ms).
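To make the totals above easier to compare, here is the per-query arithmetic they imply (a pure-Python sketch; the variable names are mine, the numbers are the measured totals):

```python
# Per-query averages implied by the totals above
# (6400 searches per run; times in milliseconds).
java_total_ms = 365713
pylucene_total_ms = 662815
n_queries = 6400

java_avg = java_total_ms / float(n_queries)          # roughly 57 ms per search
pylucene_avg = pylucene_total_ms / float(n_queries)  # roughly 104 ms per search
ratio = pylucene_total_ms / float(java_total_ms)     # roughly 1.8x slower

print("Java: %.1f ms/search" % java_avg)
print("PyLucene: %.1f ms/search" % pylucene_avg)
print("Ratio: %.2fx" % ratio)
```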
[Prerequisites]
Intel Pentium D dual-core 2.8 GHz
1 GB DDR RAM
CentOS (Linux), kernel 2.6.9
Lucene 2.1.0 (ant/java) vs
PyLucene 2.1.0 (lucene-java-2.1.0-509013, "_Pylucene.so" obtained from OSAF)
(even worse results were obtained with older PyLucene versions)
Python 2.5.1 vs Java 2 1.5.0_10
[Object: index files]
The data source is a directory of roughly 27000 files, ranging in size from
0.5 KB to 20 KB.
The index was built by a PyLucene sample program, IndexFile.py (under the
path PyLucene-X.X/samples/), which I revised slightly to set the Store
attribute of the "contents" Field to NO, since otherwise the memory cost of
the original Python program would be huge.
[Object: test cases]
A file named "Zop3" containing 6400 English words (our search terms), one
per line.
[Major steps of the two programs: Search.java vs xSearchFiles.py]
A simple search-and-retrieve performance comparison between the two siblings.
[Peer actions whose time is summed in the test]
1. Construct an index-searcher object, in Java and in Python respectively.
2. Use the searcher to obtain a result set (Hits) from the existing index.
3. Loop over the document objects in the Hits, reading each field value of
the result items.
4. Repeat steps 1-3 for the 6399 other, similar test cases.
5. Record the total time consumed, from which the average time per search
is derived.
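The five steps above amount to the following timing harness (a minimal pure-Python sketch; run_one_search is a hypothetical stand-in for the real parse/search/read-fields work, not the actual Lucene call):

```python
import time

def run_one_search(word):
    # Hypothetical stand-in for steps 1-3: parse the query, search
    # the index, and read the "name"/"path"/"contents" fields of
    # every hit.
    return len(word)  # dummy work

def run_benchmark(words):
    # Steps 4-5: repeat for every test word and record the total
    # elapsed time, from which the average follows.
    start = time.time()
    for w in words:
        run_one_search(w)
    total_ms = (time.time() - start) * 1000.0
    return total_ms, total_ms / len(words)

total_ms, avg_ms = run_benchmark(["apple", "banana"] * 3200)  # 6400 queries
print("Total: %.0f ms, average: %.3f ms/search" % (total_ms, avg_ms))
```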
Here are my programs (xSearchFiles.py and Search.java).
---- important part: xSearchFiles.py (one complete search procedure) ----
from datetime import datetime

def RunSearch(searcher, parser, word):
    global logger, time_costing
    local_parse = parser.parse
    local_search = searcher.search
    start = datetime.now()
    hits = local_search(local_parse(word))
    #map(Processor, hits)
    for i in xrange(0, hits.length()):
        getMethod = hits.doc(i).get
        getMethod("name"), getMethod("path"), getMethod("contents")
    end = datetime.now()
    during = end - start
    wss = ["[Result]", "[Time]"]
    wss.insert(1, '\t' + str(hits.length()))
    wss.append('\t' + str(during) + '\n')
    logger.writelines(wss)
    # Count whole seconds too, not just the sub-second remainder.
    time_costing += during.seconds * 1000 + during.microseconds / 1000
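One pitfall worth flagging when timing with datetime: timedelta.microseconds holds only the sub-second remainder, so a search taking more than one second is undercounted if that field is read alone. A safe conversion to milliseconds (pure stdlib; the helper name to_millis is mine):

```python
from datetime import timedelta

def to_millis(td):
    # A timedelta stores days / seconds / microseconds separately;
    # combine all three instead of reading .microseconds alone.
    return (td.days * 86400 + td.seconds) * 1000 + td.microseconds // 1000

print(to_millis(timedelta(seconds=2, microseconds=345000)))  # 2345
```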
---- important part: Search.java (one complete search procedure) ----
clock.start();
for (int i = 0; m_words != null && i < m_words.length; i++)
{
    int testonly = 0;
    Query q = qp.parse(m_words[i]);
    Hits h = is.search(q);
    clock.suspend();
    System.out.println("\r" + i);
    clock.resume();
    for (int j = 0; j < h.length(); j++)
    {
        h.doc(j).get("name");
        h.doc(j).get("path");
        h.doc(j).get("contents"); // was "contens", a typo
        testonly = j;
    }
}
clock.stop();
System.out.println("Total: " + clock.getTime() + "ms.");
_______________________________________________
pylucene-dev mailing list
[email protected]
http://lists.osafoundation.org/mailman/listinfo/pylucene-dev