Re: Indexing Speed: 2.3 vs 2.2 (real world numbers)

2008-02-04 Thread Daniel Noll
On Monday 04 February 2008 21:51:39 Michael McCandless wrote:
> Even pre-2.3, you should have seen gains by adding threads, if indeed
> your hardware has good concurrency.
>
> And definitely with the changes in 2.3, you should see gains by
> adding threads.
With regards to this, I have been wonder

Re: Indexing Speed: 2.3 vs 2.2 (real world numbers)

2008-02-04 Thread Michael McCandless
Even pre-2.3, you should have seen gains by adding threads, if indeed your hardware has good concurrency. And definitely with the changes in 2.3, you should see gains by adding threads. Note that as you add threads, the "sweet spot" for RAM buffer size increases. I.e., make the RAM buffe
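
A minimal sketch of the pattern under discussion, where several threads feed documents to one shared writer. `StubWriter` and `indexAll` are hypothetical names invented for illustration; `StubWriter` only counts documents and is not a Lucene class. In a real setup the shared object would be a Lucene `IndexWriter`, and whether extra threads actually help depends on the hardware and the RAM buffer size, as noted above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

public class SharedWriterSketch {

    /** Hypothetical stand-in for a thread-safe index writer (not a Lucene class). */
    static final class StubWriter {
        private final AtomicInteger docCount = new AtomicInteger();

        // A real writer would analyze and buffer the document here;
        // this stub only counts it, using an atomic counter for thread safety.
        void addDocument(String doc) {
            docCount.incrementAndGet();
        }

        int docCount() {
            return docCount.get();
        }
    }

    /** Indexes all docs through ONE shared writer using numThreads worker threads. */
    static int indexAll(List<String> docs, int numThreads) throws Exception {
        StubWriter writer = new StubWriter();
        ExecutorService pool = Executors.newFixedThreadPool(numThreads);
        try {
            List<Future<?>> pending = new ArrayList<>();
            for (int t = 0; t < numThreads; t++) {
                final int offset = t;
                // Each worker takes every numThreads-th document, starting at its offset.
                pending.add(pool.submit(() -> {
                    for (int i = offset; i < docs.size(); i += numThreads) {
                        writer.addDocument(docs.get(i));
                    }
                }));
            }
            // Wait for every worker and surface any exception it threw.
            for (Future<?> f : pending) {
                f.get();
            }
        } finally {
            pool.shutdown();
        }
        return writer.docCount();
    }

    public static void main(String[] args) throws Exception {
        List<String> docs = new ArrayList<>();
        for (int i = 0; i < 10_000; i++) {
            docs.add("doc-" + i);
        }
        System.out.println(indexAll(docs, 4)); // prints 10000
    }
}
```

The sketch only demonstrates the sharing pattern and that no documents are lost; it says nothing about throughput, which has to be measured against a real index.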

Re: Indexing Speed: 2.3 vs 2.2 (real world numbers)

2008-02-03 Thread Jake Mannix
Note that in particular, we use the StandardTokenizer as part of our analyzer chain, which means it gets the switch from the JavaCC version to the JFlex-based code, which I'm betting is a substantial part of that speedup. -jake
On Feb 3, 2008 2:11 PM, Briggs <[EMAIL PROTECTED]> wrote:
> Damn, r

Re: Indexing Speed: 2.3 vs 2.2 (real world numbers)

2008-02-03 Thread Jake Mannix
The test in which we got the 11X speedup? That was single-threaded. I haven't yet found a way to make multithreaded (shared IndexWriter) indexing perform any faster than single-threaded indexing, so that code is not enabled in our tests. Do you think that 2.3 would better take advantage of mult

Re: Indexing Speed: 2.3 vs 2.2 (real world numbers)

2008-02-03 Thread ajay_garg
Hi Jake. Was the test conducted with a single indexing thread, or multiple ones?
Jake Mannix wrote:
> Hello all,
> I know you Lucene devs did a lot of work on indexing performance in 2.3,
> and I just tested it out last Thursday, so I thought I'd let you know how
> it fared:
>
> On

Re: Indexing Speed: 2.3 vs 2.2 (real world numbers)

2008-02-03 Thread Jake Mannix
Yeah, I should have mentioned: this was merely a jar replacement. We haven't gotten around to doing fun 2.3-related stuff like making sure our domain-specific tokenizers use next(Token), or making sure we set all of our buffer sizes by RAM used. We tried multithreading the process, a

Re: Indexing Speed: 2.3 vs 2.2 (real world numbers)

2008-02-03 Thread Briggs
Damn, really? I haven't had the opportunity to test this yet. Has anyone else seen this kind of improvement?
On Feb 3, 2008 2:57 PM, Jake Mannix <[EMAIL PROTECTED]> wrote:
> Hello all,
> I know you Lucene devs did a lot of work on indexing performance in 2.3,
> and I just tested it out last

Re: Indexing Speed: 2.3 vs 2.2 (real world numbers)

2008-02-03 Thread Michael McCandless
Awesome! We are glad to hear that :) You might be able to make it even faster with the steps here: http://wiki.apache.org/lucene-java/ImproveIndexingSpeed
Mike
Jake Mannix wrote: Hello all, I know you Lucene devs did a lot of work on indexing performance in 2.3, and I just tested i

Indexing Speed: 2.3 vs 2.2 (real world numbers)

2008-02-03 Thread Jake Mannix
Hello all, I know you Lucene devs did a lot of work on indexing performance in 2.3, and I just tested it out last Thursday, so I thought I'd let you know how it fared. On a 2.17 million document index, a recent test gave these indexing times:
* Lucene 2.2: 4.83 hours
* Lucene 2.3: 26 m