RE: Lucene Speed under diff JVMs

2002-12-06 Thread Armbrust, Daniel C.
One more bit of info that I should have included:

The randomly generated documents consisted of 2 fields, one Text with 3 words, and one 
UnStored with 500 words.  Average word length was 7 characters.
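
For anyone who wants to build similar input before the real source is posted, here is a rough sketch of how such document text could be generated. It is not Otis's class (which isn't shown in this thread); the class name, the method name, and the choice of word lengths spread uniformly over 4 to 10 characters (mean 7) are all assumptions made for illustration. The two strings would then go into a Text field and an UnStored field respectively.

```java
import java.util.Random;

// Sketch only: produces the raw text for one synthetic document.
// Names and the length distribution are assumptions, not the original code.
final class RandomDocText {
    private static final int AVG_WORD_LENGTH = 7;

    // Returns wordCount space-separated lowercase words. Each word length is
    // drawn uniformly from 4..10 characters, so the mean length is 7.
    static String randomWords(Random rnd, int wordCount) {
        StringBuilder sb = new StringBuilder();
        for (int w = 0; w < wordCount; w++) {
            if (w > 0) {
                sb.append(' ');
            }
            int len = AVG_WORD_LENGTH - 3 + rnd.nextInt(7); // 4..10 inclusive
            for (int c = 0; c < len; c++) {
                sb.append((char) ('a' + rnd.nextInt(26)));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Random rnd = new Random(42L);
        String shortField = randomWords(rnd, 3);   // text for the 3-word Text field
        String longField = randomWords(rnd, 500);  // text for the 500-word UnStored field
        System.out.println("short field: " + shortField);
        System.out.println("long field word count: " + longField.split(" ").length);
    }
}
```

A fixed seed keeps the generated documents identical from run to run, which matters when the same input is to be indexed under several JVMs.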

If Otis (he wrote it, I just made a tweak or two) doesn't mind, I'll post the source 
code.

Dan


--
To unsubscribe, e-mail:   mailto:[EMAIL PROTECTED]
For additional commands, e-mail: mailto:[EMAIL PROTECTED]




RE: Lucene Speed under diff JVMs

2002-12-06 Thread Otis Gospodnetic
Otis doesn't mind.







RE: Lucene Speed under diff JVMs

2002-12-06 Thread Jonathan Reichhold
It doesn't surprise me that the IBM JDK is faster at indexing.  In my
experience this JVM is better optimized for that case.

I did some serious load testing with various JVM implementations from Sun
and IBM and found the opposite when it came to searching, i.e. Lucene
searches were fastest under Sun 1.4.1.  That JVM was consequently able to
handle a higher load (faster responses mean more queries/second).  IBM was
drastically slower at handling queries.  I've never tried JRockit since I
don't like the cost.

The index for my tests had 7 million records and 6 major fields.  Queries
were randomly chosen from a list of 2 million real user queries.  The
query load was meant to simulate real loads from a production site.
This was all accomplished on a single 1U, Redhat Linux 7.2, 2-processor
box with 1 GB of RAM.  Query times were very good compared to previous
indexing methods.
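
A rough sketch of that kind of query-load measurement, for anyone who wants to try it: pick queries at random from a list, run each one, and report queries/second. This is not the harness used for the tests described above. The class and method names are made up, the search call is a stand-in `Consumer<String>` rather than a real Lucene searcher, and the loop is single-threaded, which a production-style load test probably would not be.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Random;
import java.util.function.Consumer;

// Sketch only: single-threaded throughput measurement over a pool of queries.
final class QueryLoad {
    // Picks `count` queries at random (with replacement) from `pool`, hands each
    // one to `search`, and returns the observed throughput in queries per second.
    static double queriesPerSecond(List<String> pool, int count, Random rnd,
                                   Consumer<String> search) {
        long start = System.nanoTime();
        for (int i = 0; i < count; i++) {
            search.accept(pool.get(rnd.nextInt(pool.size())));
        }
        // Guard against a zero elapsed time on very coarse clocks.
        long elapsedNanos = Math.max(1L, System.nanoTime() - start);
        return count / (elapsedNanos / 1e9);
    }

    public static void main(String[] args) {
        // Tiny placeholder pool; the real test drew from 2 million user queries.
        List<String> pool = Arrays.asList("lucene jvm speed", "ibm jdk", "index benchmark");
        // Stand-in for the real search call (assumption, does no searching).
        Consumer<String> fakeSearch = q -> q.toLowerCase();
        double qps = queriesPerSecond(pool, 100_000, new Random(7L), fakeSearch);
        System.out.printf("%.0f queries/second%n", qps);
    }
}
```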

Jonathan

-Original Message-
From: Armbrust, Daniel C. [mailto:[EMAIL PROTECTED]] 
Sent: Thursday, December 05, 2002 2:47 PM
To: 'Lucene Users List'
Subject: Lucene Speed under diff JVMs


[snip original message; it appears in full further down this thread]




Lucene Speed under diff JVMs

2002-12-05 Thread Armbrust, Daniel C.
This may be of use to people who want to make Lucene index faster.  Also, I'm curious 
as to which JVM most people run Lucene under, and whether anyone else has seen results 
like this:

I'm using the class that Otis wrote (see message from about 3 weeks ago) for testing 
the scalability of Lucene (more results on that later), and I first tried running it 
under different versions of Java to see where it runs fastest.  The class simply 
creates an index out of randomly generated documents. 

All of the following were run on a dual-CPU 1 GHz PIII Windows 2000 machine that 
wasn't doing much else during the benchmark.  The indexing program was single-threaded, 
so it used only one of the machine's processors.

java version 1.3.1_04
Java(TM) 2 Runtime Environment, Standard Edition (build 1.3.1_04-b02)
Java HotSpot(TM) Client VM (build 1.3.1_04-b02, mixed mode)

42 seconds/1000 documents

java version 1.4.1
Java(TM) 2 Runtime Environment, Standard Edition (build 1.4.1-b21)
Java HotSpot(TM) Client VM (build 1.4.1-b21, mixed mode)

42 seconds/1000 documents

Java(TM) 2 Runtime Environment, Standard Edition (build 1.4.1_01)
BEA WebLogic JRockit(R) Virtual Machine (build 
8.0_Beta-1.4.1_01-win32-CROSIS-20021105-1617, Native Threads, Generational Concurrent 
Garbage Collector)

35 seconds/1000 documents

java version 1.3.1
Java(TM) 2 Runtime Environment, Standard Edition (build 1.3.1)
Classic VM (build 1.3.1, J2RE 1.3.1 IBM Windows 32 build cn131-20020403 (JIT enabled: 
jitc))

27 seconds/1000 documents


As you can see, the IBM JVM pretty much smoked Sun's, and beat out JRockit as well.  
Just a hunch, but it wouldn't surprise me if search times were also faster under the 
IBM JDK.  Has anyone else come to this conclusion?
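
To put the timings above on a common scale, they can be converted to documents/second and to a ratio against the Sun result. The little class below is only an illustration of that arithmetic (its names are made up); the inputs are the four figures quoted above, which work out to roughly 24, 24, 29 and 37 documents per second.

```java
// Sketch only: converts "seconds per 1000 documents" into rates and ratios.
final class IndexingRates {
    // Documents indexed per second for a batch of `docs` taking `seconds`.
    static double docsPerSecond(int docs, double seconds) {
        return docs / seconds;
    }

    // How many times faster `otherSeconds` is than `baselineSeconds`.
    static double speedup(double baselineSeconds, double otherSeconds) {
        return baselineSeconds / otherSeconds;
    }

    public static void main(String[] args) {
        String[] jvms = {"Sun 1.3.1_04", "Sun 1.4.1", "JRockit 8.0 beta", "IBM 1.3.1"};
        double[] seconds = {42, 42, 35, 27}; // seconds per 1000 documents, as reported
        for (int i = 0; i < jvms.length; i++) {
            System.out.printf("%-18s %5.1f docs/s  %.2fx vs Sun%n",
                    jvms[i], docsPerSecond(1000, seconds[i]), speedup(42, seconds[i]));
        }
    }
}
```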


Dan





Re: Lucene Speed under diff JVMs

2002-12-05 Thread Leo Galambos
On Thu, 5 Dec 2002, Armbrust, Daniel C. wrote:

> I'm using the class that Otis wrote (see message from about 3 weeks ago)
> for testing the scalability of lucene (more results on that later) and I

May I ask where one can get the source code? I cannot find it in the 
archive. Thank you.

-g-







Re: Lucene Speed under diff JVMs

2002-12-05 Thread Joshua O'Madadhain
On Thu, 5 Dec 2002, Armbrust, Daniel C. wrote:

> I'm using the class that Otis wrote (see message from about 3 weeks ago)
> for testing the scalability of lucene (more results on that later) and I
> first tried running it under different versions of Java, to see where it
> runs the fastest.  The class simply creates an index out of randomly
> generated documents.
>
> All of the following were running on a dual CPU 1 GHz PIII Windows 2000
> machine that wasn't doing much else during the benchmark.  The indexing
> program was single threaded, so it only used one of the processors of
> the machine.

[snip specific measurements]

> As you can see, the IBM jvm pretty much smoked Suns.  And beat out
> JRockit as well.  Just a hunch, but it wouldn't surprise me if search
> times were also faster under the IBM jdk.  Has anyone else come to this
> conclusion?

Just a brief note on performance measurements and statistical sampling: no
offense, but if these are measurements of a single trial of 1000 documents
for each JVM, they're not so different that I'd be willing to conclude
that one JVM is notably faster for this task than another.  The problem is
compounded by the fact that it can be hard to tell just how much CPU is
being taken up by OS tasks (and this can fluctuate quite a lot).  If you
really want to quote statistics like this, using 5 or 10 trials would give
a more accurate notion of the real performance differences (if any).
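
As a rough illustration of what I mean (a sketch with made-up names, not anything from Otis's class): time the same batch several times, then look at the mean and the sample standard deviation before comparing JVMs. The workload in `main` is only a placeholder; a real run would index the 1000 documents there.

```java
// Sketch only: repeated timing of one task with mean and sample standard deviation.
final class TrialStats {
    // Arithmetic mean of the samples.
    static double mean(double[] xs) {
        double sum = 0.0;
        for (double x : xs) {
            sum += x;
        }
        return sum / xs.length;
    }

    // Sample standard deviation (n - 1 in the denominator); needs at least 2 samples.
    static double sampleStdDev(double[] xs) {
        double m = mean(xs);
        double sumSq = 0.0;
        for (double x : xs) {
            sumSq += (x - m) * (x - m);
        }
        return Math.sqrt(sumSq / (xs.length - 1));
    }

    // Runs `task` `trials` times and returns the wall-clock seconds of each run.
    static double[] timeTrials(Runnable task, int trials) {
        double[] seconds = new double[trials];
        for (int i = 0; i < trials; i++) {
            long start = System.nanoTime();
            task.run();
            seconds[i] = (System.nanoTime() - start) / 1e9;
        }
        return seconds;
    }

    public static void main(String[] args) {
        // Placeholder workload standing in for "index 1000 documents".
        Runnable indexBatch = () -> {
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < 200_000; i++) {
                sb.append(i % 10);
            }
        };
        double[] secs = timeTrials(indexBatch, 10);
        System.out.printf("mean %.4f s, std dev %.4f s over %d trials%n",
                mean(secs), sampleStdDev(secs), secs.length);
    }
}
```

If the gap between two JVMs' means is small compared with the standard deviations, the single-trial numbers don't tell you much.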

Casuistically :),

Joshua O'Madadhain

  [EMAIL PROTECTED]   Per Obscurius   www.ics.uci.edu/~jmadden
   Joshua O'Madadhain: Information Scientist, Musician, Philosopher-At-Tall
It's that moment of dawning comprehension that I live for.  -- Bill Watterson
 My opinions are too rational and insightful to be those of any organization.




