[Title]
My disappointing performance experiment: PyLucene vs Lucene
[Results]
PyLucene ≈ 0.5 × Lucene (in search capacity)
Using the sample program "SearchFiles.py" shipped with PyLucene, and a Java
program performing the same task, I found PyLucene gives a disappointing
result: its average search time is about twice that of Java Lucene.
Best Java result: 365713 ms for 6400 searches (most results lie around
400000 ms).
Best PyLucene result: 662815 ms for 6400 searches (most results lie around
680000 ms).
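To make the totals above easier to compare, here is the per-query arithmetic they imply (a pure-Python sketch; the variable names are mine, the numbers are the measured totals):

```python
# Per-query averages implied by the totals above
# (6400 searches per run; times in milliseconds).
java_total_ms = 365713
pylucene_total_ms = 662815
n_queries = 6400

java_avg = java_total_ms / float(n_queries)          # roughly 57 ms per search
pylucene_avg = pylucene_total_ms / float(n_queries)  # roughly 104 ms per search
ratio = pylucene_total_ms / float(java_total_ms)     # roughly 1.8x slower

print("Java: %.1f ms/search" % java_avg)
print("PyLucene: %.1f ms/search" % pylucene_avg)
print("Ratio: %.2fx" % ratio)
```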
[Prerequisites]
Intel Pentium D dual-core 2.8 GHz
1 GB DDR RAM
CentOS (Linux), kernel 2.6.9
Lucene 2.1.0 (ant/java) vs
PyLucene 2.1.0 (lucene-java-2.1.0-509013, "_Pylucene.so" obtained from OSAF)
(even worse results were obtained with older PyLucene versions)
Python 2.5.1 vs Java 2 1.5.0_10
[Object: index files]
The data source is a directory of roughly 27000 files, ranging in size from
0.5 KB to 20 KB.
The index was built by a PyLucene sample program, IndexFile.py (under the
path PyLucene-X.X/samples/), which I revised slightly to set the Store
attribute of the "contents" Field to NO, since otherwise the memory cost of
the original Python program would be huge.
[Object: test cases]
A file named "Zop3" containing 6400 English words (our search terms), one
per line.
[Major steps of the two programs: Search.java vs xSearchFiles.py]
A simple search-and-retrieve performance comparison between the two siblings.
[Peer actions whose time is summed in the test]
1. Construct an index-searcher object, in Java and in Python respectively.
2. Use the searcher to obtain a result set (Hits) from the existing index.
3. Loop over the document objects in the Hits, reading each field value of
the result items.
4. Repeat steps 1-3 for the 6399 other, similar test cases.
5. Record the total time consumed, from which the average time per search
is derived.
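The five steps above amount to the following timing harness (a minimal pure-Python sketch; run_one_search is a hypothetical stand-in for the real parse/search/read-fields work, not the actual Lucene call):

```python
import time

def run_one_search(word):
    # Hypothetical stand-in for steps 1-3: parse the query, search
    # the index, and read the "name"/"path"/"contents" fields of
    # every hit.
    return len(word)  # dummy work

def run_benchmark(words):
    # Steps 4-5: repeat for every test word and record the total
    # elapsed time, from which the average follows.
    start = time.time()
    for w in words:
        run_one_search(w)
    total_ms = (time.time() - start) * 1000.0
    return total_ms, total_ms / len(words)

total_ms, avg_ms = run_benchmark(["apple", "banana"] * 3200)  # 6400 queries
print("Total: %.0f ms, average: %.3f ms/search" % (total_ms, avg_ms))
```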
Here are my programs (xSearchFiles.py and Search.java).
---- important part: xSearchFiles.py (one complete search procedure) ----
from datetime import datetime

def RunSearch(searcher, parser, word):
    global logger, time_costing
    local_parse = parser.parse
    local_search = searcher.search
    start = datetime.now()
    hits = local_search(local_parse(word))
    #map(Processor, hits)
    for i in xrange(0, hits.length()):
        getMethod = hits.doc(i).get
        getMethod("name"), getMethod("path"), getMethod("contents")
    end = datetime.now()
    during = end - start
    wss = ["[Result]", "[Time]"]
    wss.insert(1, '\t' + str(hits.length()))
    wss.append('\t' + str(during) + '\n')
    logger.writelines(wss)
    # Count whole seconds too, not just the sub-second remainder.
    time_costing += during.seconds * 1000 + during.microseconds / 1000
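One pitfall worth flagging when timing with datetime: timedelta.microseconds holds only the sub-second remainder, so a search taking more than one second is undercounted if that field is read alone. A safe conversion to milliseconds (pure stdlib; the helper name to_millis is mine):

```python
from datetime import timedelta

def to_millis(td):
    # A timedelta stores days / seconds / microseconds separately;
    # combine all three instead of reading .microseconds alone.
    return (td.days * 86400 + td.seconds) * 1000 + td.microseconds // 1000

print(to_millis(timedelta(seconds=2, microseconds=345000)))  # 2345
```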
---- important part: Search.java (one complete search procedure) ----
clock.start();
for (int i = 0; m_words != null && i < m_words.length; i++)
{
    int testonly = 0;
    Query q = qp.parse(m_words[i]);
    Hits h = is.search(q);
    clock.suspend();
    System.out.println("\r" + i);
    clock.resume();
    for (int j = 0; j < h.length(); j++)
    {
        h.doc(j).get("name");
        h.doc(j).get("path");
        h.doc(j).get("contents"); // was "contens", a typo
        testonly = j;
    }
}
clock.stop();
System.out.println("Total: " + clock.getTime() + "ms.");
_______________________________________________
pylucene-dev mailing list
[email protected]
http://lists.osafoundation.org/mailman/listinfo/pylucene-dev