On 01/02/2017 01:44, Kasi Lakshman Karthi Anbumony wrote:
(1)  Is Lucy multithreaded or single threaded?

Single-threaded.

(2) Are "C" runtime and bindings stable?

Yes.

(2) Is there preexisting benchmark code written in "C" to measure Lucy 
performance?

No.

(3) I am seeing one under devel/benchmarks/indexers/LuceneIndexer.java. But 
this one is written in Java and looks like benchmarking Lucene not Lucy. Am I 
right in my observation?

The corresponding Perl benchmark script for Lucy is lucy_indexer.plx:


https://git1-us-west.apache.org/repos/asf?p=lucy.git;a=tree;f=devel/benchmarks/indexers;h=77626c37285602941376c5e5950a20e50683da40;hb=HEAD

(4) I was thinking of modifying the lucy/c/sample applications as benchmarking 
applications. Is this a good strategy?
Btw, is there a good way to build the sample files? I have to modify the Makefile in 
the lucy/c/ directory to build the sample files and I am not sure if this is the 
correct way.

You can find some guidance on how to compile Lucy applications in the comment at the top of getting_started.c:


https://git1-us-west.apache.org/repos/asf?p=lucy.git;a=blob;f=c/sample/getting_started.c;h=6d6193d772f2ceaac86c67cc49169878b4d4d2f6;hb=HEAD

Basically, you have to run the Clownfish compiler "cfc" to generate header files. Then you can compile your code and link it against libclownfish and liblucy.

Benchmark results for the indexer will largely depend on the particular Analyzer chain and the total size of your index. The default EasyAnalyzer consists of

- StandardTokenizer
- Unicode Normalizer
- SnowballStemmer

StandardTokenizer is pretty fast, but Normalizer and Stemmer are CPU-intensive. Last time I checked, they account for about two-thirds of the processing time for small indices.

A better benchmarking framework would be a much-needed contribution.
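
As a starting point for timing a modified sample application, here is a minimal sketch of a best-of-N wall-clock harness in plain C. Nothing in it is part of the Lucy or Clownfish API: bench_fn, now_seconds, bench_best_of and dummy_workload are hypothetical names chosen for illustration, and the placeholder workload would have to be replaced by your own indexing or search code. It assumes a POSIX system that provides clock_gettime with CLOCK_MONOTONIC.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L  /* for clock_gettime on strict compilers */
#endif
#include <time.h>

/* Hypothetical workload signature: any function taking an opaque context. */
typedef void (*bench_fn)(void *ctx);

/* Monotonic wall-clock time in seconds. */
static double now_seconds(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/* Run fn(ctx) `runs` times and return the fastest run in seconds.
 * Returns -1.0 if runs <= 0. Taking the minimum reduces noise from
 * other processes and from cold caches on the first run. */
static double bench_best_of(bench_fn fn, void *ctx, int runs) {
    double best = -1.0;
    for (int i = 0; i < runs; i++) {
        double start = now_seconds();
        fn(ctx);
        double elapsed = now_seconds() - start;
        if (best < 0.0 || elapsed < best) {
            best = elapsed;
        }
    }
    return best;
}

/* Placeholder workload (assumption): replace the body with the code
 * you want to measure, e.g. indexing a fixed set of documents. */
static void dummy_workload(void *ctx) {
    unsigned long *counter = (unsigned long *)ctx;
    for (unsigned long i = 0; i < 1000UL; i++) {
        *counter += i;
    }
}
```

A main function would declare an unsigned long counter initialized to zero, call bench_best_of(dummy_workload, &counter, 5) and print the result. Since the numbers depend on the Analyzer chain and the index size, it makes sense to keep both fixed between runs that you want to compare.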

Nick
