Re: Benchmarkers

2006-04-04 Thread Marvin Humphrey
On Apr 3, 2006, at 7:08 AM, karl wettin wrote: And if possible, it would be very interesting to see results using -d64 and -d32. And different platforms. I only have easy access to one machine running Java: my G4 1.67 GHz laptop, running Mac OS X 10.4.5. I agree that it would be very in

Re: Benchmarkers

2006-04-04 Thread karl wettin
On 3 Apr 2006, at 17:26, karl wettin wrote: Solaris: HP DL145, 1 x Dualcore Opteron 2.2 GHz, 4 GB of RAM Linux: HP DL140 with 2x 3.06GHz Xeon CPUs and 4GB of RAM If you want me to, and package the benchmark tests in a way simple for me to run them, I'll run them on these machines for you. The

Re: Benchmarkers

2006-04-04 Thread Marvin Humphrey
On Apr 3, 2006, at 6:26 PM, Marvin Humphrey wrote: On Apr 3, 2006, at 5:43 PM, Doug Cutting wrote: Marvin Humphrey wrote: Plucene is a Lucene 1.3 port, so it doesn't have max_buffered_docs -- but I can set merge_factor to 1000. I would not recommend that. With a merge factor that high

Re: Benchmarkers

2006-04-03 Thread Marvin Humphrey
On Apr 3, 2006, at 5:43 PM, Doug Cutting wrote: Marvin Humphrey wrote: Plucene is a Lucene 1.3 port, so it doesn't have max_buffered_docs -- but I can set merge_factor to 1000. I would not recommend that. With a merge factor that high you may run out of file handles, and, moreover, I do

Re: Benchmarkers

2006-04-03 Thread Doug Cutting
Marvin Humphrey wrote: Plucene is a Lucene 1.3 port, so it doesn't have max_buffered_docs -- but I can set merge_factor to 1000. I would not recommend that. With a merge factor that high you may run out of file handles, and, moreover, I doubt that disks are very efficient when reading from
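The file-handle concern above can be made concrete with some rough arithmetic. The sketch below is not taken from the thread and does not call Lucene. It assumes that a merge opens every segment it combines at the same time, and that a non-compound segment holds about seven core files plus one norms file per indexed field. Both figures, and all names in the code, are illustrative assumptions, not measurements.

```java
// Rough estimate of how many files one merge may need to hold open.
// Assumptions (not from the thread): a merge reads mergeFactor segments
// at once, and a non-compound segment has about 7 core files plus one
// norms file per indexed field. A compound segment counts as 1 file.
class MergeFileHandleSketch {
    // Hypothetical constant for illustration only.
    static final int CORE_FILES_PER_SEGMENT = 7;

    // Number of files that make up a single segment under the assumptions above.
    static long filesPerSegment(int indexedFields, boolean compound) {
        return compound ? 1 : CORE_FILES_PER_SEGMENT + indexedFields;
    }

    // Upper-bound estimate of input files open while merging mergeFactor segments.
    static long openFilesDuringMerge(int mergeFactor, int indexedFields, boolean compound) {
        return (long) mergeFactor * filesPerSegment(indexedFields, compound);
    }

    public static void main(String[] args) {
        int indexedFields = 2; // hypothetical document schema
        for (int mergeFactor : new int[] {10, 1000}) {
            System.out.println("mergeFactor=" + mergeFactor
                + " non-compound=" + openFilesDuringMerge(mergeFactor, indexedFields, false)
                + " compound=" + openFilesDuringMerge(mergeFactor, indexedFields, true));
        }
    }
}
```

Under these assumptions a merge factor of 1000 with two indexed fields implies roughly 9000 open files for a non-compound index, which is well above the per-process limit on many default system configurations.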

Re: Benchmarkers

2006-04-03 Thread Marvin Humphrey
On Apr 3, 2006, at 11:11 AM, Doug Cutting wrote: You might still, if you have time, try swapping in something like StopAnalyzer and/or turning off Field.Store.YES. The relative speeds of the various implementations may vary in interesting ways, since these parameters may emphasize differen

Re: Benchmarkers

2006-04-03 Thread Marvin Humphrey
On Apr 3, 2006, at 10:36 AM, Doug Cutting wrote: Marvin Humphrey wrote: IndexWriter writer = new IndexWriter(indexDir, new WhitespaceAnalyzer(), true); Please make sure that analyzers are comparable between the various engines you benchmark. WhitespaceAnalyzer is efficient, but

Re: Benchmarkers

2006-04-03 Thread Marvin Humphrey
On Apr 3, 2006, at 6:57 AM, Yonik Seeley wrote: A couple of points: - Are all the lucene variations using the same index parameters? max buffered docs, index format (compound or not), mergeFactor, etc I personally use non-compound index format, max buffered docs=1000, mergeFactor=10

Re: Benchmarkers

2006-04-03 Thread Doug Cutting
Doug Cutting wrote: Please make sure that analyzers are comparable between the various engines you benchmark. I just went back and re-read what you're benchmarking, and they're all versions of Lucene, so you're probably already using comparable analyzers! Sorry for not noticing that the firs

Re: Benchmarkers

2006-04-03 Thread Doug Cutting
Marvin Humphrey wrote: IndexWriter writer = new IndexWriter(indexDir, new WhitespaceAnalyzer(), true); Please make sure that analyzers are comparable between the various engines you benchmark. WhitespaceAnalyzer is efficient, but results in far more tokens and terms than, e.g., Sto
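Why the analyzer choice matters for a benchmark can be seen by counting tokens. The sketch below does not use Lucene's analyzers. It is a plain-Java stand-in that compares whitespace-only splitting with a letter-based, lowercasing tokenizer that drops stop words. The stop list and the sample sentence are hypothetical and chosen only for illustration.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Stand-in for comparing two analysis strategies; not Lucene code.
class AnalyzerTokenCountSketch {
    // Hypothetical stop list, not the list any real analyzer ships with.
    static final Set<String> STOP_WORDS =
        new HashSet<>(Arrays.asList("a", "an", "and", "of", "the", "to"));

    // Split on whitespace only; case and punctuation are kept.
    static int countWhitespaceTokens(String text) {
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            return 0;
        }
        return trimmed.split("\\s+").length;
    }

    // Lowercase, split on anything that is not an ASCII letter, drop stop words.
    static int countStopFilteredTokens(String text) {
        int count = 0;
        for (String token : text.toLowerCase(Locale.ROOT).split("[^a-z]+")) {
            if (!token.isEmpty() && !STOP_WORDS.contains(token)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        String text = "The speed of the indexer and the size of the index";
        System.out.println("whitespace tokens:    " + countWhitespaceTokens(text));   // 11
        System.out.println("stop-filtered tokens: " + countStopFilteredTokens(text)); // 4
    }
}
```

On this sample the whitespace split yields 11 tokens and the stop-filtered pass yields 4, so two engines configured with different analyzers would be indexing quite different workloads.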

Re: Benchmarkers

2006-04-03 Thread karl wettin
On 3 Apr 2006, at 16:50, Grant Ingersoll wrote: And if possible, it would be very interesting to see results using -d64 and -d32. And different platforms. So far I've got best results in descending order on Solaris, OS X and last(!) Linux. Solaris is straight out amazing under heavy load. Might

Re: Benchmarkers

2006-04-03 Thread Grant Ingersoll
karl wettin wrote: And if possible, it would be very interesting to see results using -d64 and -d32. And different platforms. So far I've got best results in descending order on Solaris, OS X and last(!) Linux. Solaris is straight out amazing under heavy load. Might even do the switch next

Re: Benchmarkers

2006-04-03 Thread karl wettin
On 3 Apr 2006, at 15:57, Yonik Seeley wrote: - use enough heap so too much time isn't taken in GC I recommend -XX:+AggressiveHeap. And if possible, it would be very interesting to see results using -d64 and -d32. And different platforms. So far I've got best results in descending order on

Re: Benchmarkers

2006-04-03 Thread Yonik Seeley
Hi Marvin, A couple of points: - Are all the lucene variations using the same index parameters? max buffered docs, index format (compound or not), mergeFactor, etc I personally use non-compound index format, max buffered docs=1000, mergeFactor=10 - reading in the file line by line probably