Re: Lucene Search has poor cpu utilization on a 4-CPU machine

2004-07-14 Thread Kevin A. Burton
Doug Cutting wrote:
> Aviran wrote:
>> I changed the Lucene 1.4 final source code and yes this is the source
>> version I changed.
>
> Note that this patch won't produce a speedup on earlier releases,
> since there was another multi-thread bottleneck higher up the stack
> that was only recently removed, revealing this lower-level bottleneck.
>
> The other patch was:
> http://www.mail-archive.com/[EMAIL PROTECTED]/msg07873.html
> Both are required to see the speedup.

Thanks...

> Also, is there any reason folks cannot use 1.4 final now?

No... just that I'm trying to be conservative... I'm probably going to
look at just migrating to 1.4 ASAP but we're close to a milestone...

Kevin
--
Please reply using PGP.
   http://peerfear.org/pubkey.asc
   
   NewsMonster - http://www.newsmonster.org/
   
Kevin A. Burton, Location - San Francisco, CA, Cell - 415.595.9965
  AIM/YIM - sfburtonator,  Web - http://peerfear.org/
GPG fingerprint: 5FB2 F3E2 760E 70A8 6174 D393 E84D 8D04 99F1 4412
 IRC - freenode.net #infoanarchy | #p2p-hackers | #newsmonster

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


Re: Lucene Search has poor cpu utilization on a 4-CPU machine

2004-07-13 Thread Doug Cutting
Aviran wrote:
> I changed the Lucene 1.4 final source code and yes this is the source
> version I changed.

Note that this patch won't produce a speedup on earlier releases,
since there was another multi-thread bottleneck higher up the stack that
was only recently removed, revealing this lower-level bottleneck.

The other patch was:
http://www.mail-archive.com/[EMAIL PROTECTED]/msg07873.html
Both are required to see the speedup.

Also, is there any reason folks cannot use 1.4 final now?
Doug


RE: Lucene Search has poor cpu utilization on a 4-CPU machine

2004-07-13 Thread Aviran
I changed the Lucene 1.4 final source code and yes this is the source
version I changed.

Aviran

-----Original Message-----
From: Kevin A. Burton [mailto:[EMAIL PROTECTED] 
Sent: Monday, July 12, 2004 9:42 PM
To: Lucene Users List
Subject: Re: Lucene Search has poor cpu utilization on a 4-CPU machine


Aviran wrote:

>Bug 30058 posted
>
>  
>
Which of course is here:

http://issues.apache.org/bugzilla/show_bug.cgi?id=30058

Is this the source of the revision you modified?

http://www.mail-archive.com/[EMAIL PROTECTED]/msg06116.html

Also what version of Lucene?

Kevin




Re: Lucene Search has poor cpu utilization on a 4-CPU machine

2004-07-12 Thread Kevin A. Burton
Aviran wrote:
> Bug 30058 posted

Which of course is here:
http://issues.apache.org/bugzilla/show_bug.cgi?id=30058

Is this the source of the revision you modified?
http://www.mail-archive.com/[EMAIL PROTECTED]/msg06116.html

Also what version of Lucene?
Kevin


Re: Lucene Search has poor cpu utilization on a 4-CPU machine

2004-07-12 Thread Kevin A. Burton
Doug Cutting wrote:
>> I noticed that the class org.apache.lucene.index.FieldInfos uses private
>> class members Vector byNumber and Hashtable byName, both of which are
>> synchronized objects. By changing the Vector byNumber to ArrayList byNumber
>> I was able to get 110% improvement in performance (number of searches per
>> second).

> That's impressive! Good job finding a bottleneck!

Wow... that's awesome.
All of our machines are dual Xeons with Hyperthreading and kernel 2.6, so I
imagine we'd see an improvement in this situation too.

I wonder if we could break this out into a patch for legacy Lucene
users. I'd like to see the stack trace too.

We're using a lot of synchronized code (Hashtable, Vector, etc.), so I'm
willing to bet this is happening in other places.
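Testing this hunch about other synchronized collections means measuring reads per second with several threads, much as Aviran measured searches per second. Below is a rough, hypothetical harness (`ListReadBench` is an illustrative name, not a Lucene class). It runs the same read workload against a `Vector` and an `ArrayList`. It is not a rigorous benchmark, because it does not control for JIT warm-up or other noise the way a tool like JMH does.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Vector;
import java.util.concurrent.CountDownLatch;

// Hypothetical micro-harness: N threads call get() on one shared list.
final class ListReadBench {
    // Starts `threads` readers, releases them together, waits for all of
    // them, and returns aggregate get() calls per second.
    static double readsPerSecond(List<Integer> list, int threads, int reads) {
        CountDownLatch go = new CountDownLatch(1);
        Thread[] workers = new Thread[threads];
        long[] sinks = new long[threads]; // results kept so reads are not optimized away
        for (int t = 0; t < threads; t++) {
            final int id = t;
            workers[t] = new Thread(() -> {
                try {
                    go.await();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                int n = list.size();
                long sum = 0;
                for (int i = 0; i < reads; i++) {
                    sum += list.get(i % n); // Vector.get is synchronized; ArrayList.get is not
                }
                sinks[id] = sum;
            });
            workers[t].start();
        }
        long begin = System.nanoTime();
        go.countDown();
        try {
            for (Thread w : workers) {
                w.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        long elapsed = Math.max(1L, System.nanoTime() - begin);
        return (double) threads * reads * 1_000_000_000L / elapsed;
    }

    static List<Integer> fill(List<Integer> list, int size) {
        for (int i = 0; i < size; i++) {
            list.add(i);
        }
        return list;
    }

    // Same workload against a synchronized Vector and an unsynchronized
    // ArrayList; returns {vectorReadsPerSec, arrayListReadsPerSec}.
    static double[] compare(int size, int threads, int reads) {
        List<Integer> vector = fill(new Vector<>(), size);
        List<Integer> arrayList = fill(new ArrayList<>(), size);
        return new double[] {
            readsPerSecond(vector, threads, reads),
            readsPerSecond(arrayList, threads, reads)
        };
    }
}
```

To load every logical CPU, pass `Runtime.getRuntime().availableProcessors()` as `threads`. Hyperthreaded siblings are counted in that figure. The ratio between the two numbers depends on the JVM and the hardware, so no particular result is implied here.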

>> My question is: do the fields byNumber and byName have to be synchronized
>> and what can happen if I change them to be ArrayList and HashMap, which
>> are not synchronized? Can this corrupt the index or the integrity of the
>> results?

> I think that is a safe change. FieldInfos is only modified by
> DocumentWriter and SegmentMerger, and there is no possibility of other
> threads accessing those instances. Please submit a patch to the
> developer mailing list.

That would be great!
Kevin


RE: Lucene Search has poor cpu utilization on a 4-CPU machine

2004-07-12 Thread Aviran
Bug 30058 posted

Aviran

-----Original Message-----
From: Doug Cutting [mailto:[EMAIL PROTECTED] 
Sent: Monday, July 12, 2004 1:38 PM
To: Lucene Users List
Subject: Re: Lucene Search has poor cpu utilization on a 4-CPU machine


Aviran wrote:
> I use Lucene 1.4 final
> 
> Here is the thread dump for one blocked thread (If you want a full 
> thread dump for all threads I can do that too)

Thanks.  I think I get the point.  I recently removed a synchronization 
point higher in the stack, so that now this one shows up!

Whether or not you submit a patch, please file a bug report in Bugzilla 
with your proposed change, so that we don't lose track of this issue.

Thanks,

Doug




Re: Lucene Search has poor cpu utilization on a 4-CPU machine

2004-07-12 Thread Doug Cutting
Aviran wrote:
> I use Lucene 1.4 final
> Here is the thread dump for one blocked thread (If you want a full thread
> dump for all threads I can do that too)

Thanks.  I think I get the point.  I recently removed a synchronization 
point higher in the stack, so that now this one shows up!

Whether or not you submit a patch, please file a bug report in Bugzilla 
with your proposed change, so that we don't lose track of this issue.

Thanks,
Doug


RE: Lucene Search has poor cpu utilization on a 4-CPU machine

2004-07-12 Thread Aviran
per mailing list? Just attach a
changed java file to an email?

Aviran

-----Original Message-----
From: Doug Cutting [mailto:[EMAIL PROTECTED] 
Sent: Monday, July 12, 2004 12:42 PM
To: Lucene Users List
Subject: Re: Lucene Search has poor cpu utilization on a 4-CPU machine


Aviran wrote:
> First let me explain what I found out. I'm running Lucene on a 4 CPU
> server. While doing some stress tests I've noticed (by doing a full
> thread dump) that searching threads are blocked on the method: public
> FieldInfo fieldInfo(int fieldNumber). This causes significant CPU
> idle time.

What version of Lucene are you running?  Also, can you please send the
stack traces of the blocked threads, or at least a description of them?
I'd be interested to see what context this happens in.  In particular,
which IndexReader and Searcher/Scorer/Weight methods does it happen under?

> I noticed that the class org.apache.lucene.index.FieldInfos uses 
> private class members Vector byNumber and Hashtable byName, both of 
> which are synchronized objects. By changing the Vector byNumber to 
> ArrayList byNumber I was able to get 110% improvement in performance 
> (number of searches per second).

That's impressive!  Good job finding a bottleneck!

> My question is: do the fields byNumber and byName have to be
> synchronized and what can happen if I change them to be ArrayList
> and HashMap, which are not synchronized? Can this corrupt the index or
> the integrity of the results?

I think that is a safe change.  FieldInfos is only modified by
DocumentWriter and SegmentMerger, and there is no possibility of other
threads accessing those instances.  Please submit a patch to the
developer mailing list.

Doug




Re: Lucene Search has poor cpu utilization on a 4-CPU machine

2004-07-12 Thread Doug Cutting
Aviran wrote:
> First let me explain what I found out. I'm running Lucene on a 4 CPU server.
> While doing some stress tests I've noticed (by doing a full thread dump) that
> searching threads are blocked on the method: public FieldInfo fieldInfo(int
> fieldNumber). This causes significant CPU idle time.

What version of Lucene are you running?  Also, can you please send the
stack traces of the blocked threads, or at least a description of them?
I'd be interested to see what context this happens in.  In particular,
which IndexReader and Searcher/Scorer/Weight methods does it happen under?
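Aviran located the contention with a full thread dump. On Unix JVMs, `kill -3 <pid>` prints one, and newer JDKs also ship `jstack`. The sketch below does the same check from inside the process, assuming Java 6 or later for `ThreadMXBean.dumpAllThreads`. The helper `BlockedThreadReport` is a hypothetical name. It lists every thread currently BLOCKED on a monitor, together with the lock it is waiting for, the thread holding that lock, and the top stack frame. That is roughly the context requested above.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: summarizes threads that are BLOCKED waiting to enter
// a synchronized method or block (for example Vector.get).
final class BlockedThreadReport {
    static List<String> blockedThreads() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        List<String> lines = new ArrayList<>();
        // Arguments are lockedMonitors and lockedSynchronizers; neither is
        // needed here, since the lock a thread waits for is reported anyway.
        for (ThreadInfo info : mx.dumpAllThreads(false, false)) {
            if (info == null || info.getThreadState() != Thread.State.BLOCKED) {
                continue;
            }
            StackTraceElement[] stack = info.getStackTrace();
            String top = stack.length > 0 ? stack[0].toString() : "<no frames>";
            lines.add(info.getThreadName() + " blocked on " + info.getLockName()
                    + " held by " + info.getLockOwnerName() + " at " + top);
        }
        return lines;
    }
}
```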

> I noticed that the class org.apache.lucene.index.FieldInfos uses private
> class members Vector byNumber and Hashtable byName, both of which are
> synchronized objects. By changing the Vector byNumber to ArrayList byNumber
> I was able to get 110% improvement in performance (number of searches per
> second).

That's impressive!  Good job finding a bottleneck!

> My question is: do the fields byNumber and byName have to be synchronized
> and what can happen if I change them to be ArrayList and HashMap, which
> are not synchronized? Can this corrupt the index or the integrity of the
> results?

I think that is a safe change.  FieldInfos is only modified by
DocumentWriter and SegmentMerger, and there is no possibility of other
threads accessing those instances.  Please submit a patch to the
developer mailing list.
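Why the change is safe comes down to a common pattern: the structure is built on one thread, then shared, and after that it is only read. Below is a minimal sketch of that pattern. It assumes, as stated above, that nothing writes to the object once searcher threads can see it. `ReadOnlyFieldTable` and `concurrentReads` are hypothetical names, not Lucene APIs.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical read-only field table: fully built in the constructor and
// never modified afterwards, so readers need no locking.
final class ReadOnlyFieldTable {
    // Final field assigned in the constructor: a thread that obtains this
    // object after construction also sees the fully populated list (Java
    // Memory Model final-field guarantee, provided `this` does not escape).
    private final List<String> byNumber;

    ReadOnlyFieldTable(List<String> names) {
        // Copy into an unsynchronized ArrayList, then wrap it so later code
        // cannot accidentally mutate the shared instance.
        this.byNumber = Collections.unmodifiableList(new ArrayList<>(names));
    }

    String name(int fieldNumber) {
        return byNumber.get(fieldNumber); // no monitor acquired
    }

    int size() {
        return byNumber.size();
    }

    // Runs `threads` concurrent readers against one shared table and returns
    // the total number of non-null lookups they performed.
    static long concurrentReads(ReadOnlyFieldTable table, int threads, int readsPerThread) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Long>> results = new ArrayList<>();
            for (int t = 0; t < threads; t++) {
                Callable<Long> reader = () -> {
                    long found = 0;
                    for (int i = 0; i < readsPerThread; i++) {
                        if (table.name(i % table.size()) != null) {
                            found++;
                        }
                    }
                    return found;
                };
                results.add(pool.submit(reader));
            }
            long total = 0;
            for (Future<Long> f : results) {
                total += f.get();
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

If a structure had to change while searches were running, this reasoning would no longer hold. A concurrent collection such as `java.util.concurrent.ConcurrentHashMap` (Java 5 and later) would then be the usual replacement for `Hashtable`.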

Doug


Lucene Search has poor cpu utilization on a 4-CPU machine

2004-07-12 Thread Aviran
Hi all,
First let me explain what I found out. I'm running Lucene on a 4 CPU server.
While doing some stress tests I've noticed (by doing a full thread dump) that
searching threads are blocked on the method: public FieldInfo fieldInfo(int
fieldNumber). This causes significant CPU idle time.
I noticed that the class org.apache.lucene.index.FieldInfos uses private
class members Vector byNumber and Hashtable byName, both of which are
synchronized objects. By changing the Vector byNumber to ArrayList byNumber
I was able to get 110% improvement in performance (number of searches per
second).
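Below is a minimal sketch of the change described above, written in modern Java (generics postdate Lucene 1.4). `FieldInfosSketch` and `FieldInfoSketch` are hypothetical, simplified stand-ins for `org.apache.lucene.index.FieldInfos` and `FieldInfo`, not the real Lucene source. Only the `byNumber`/`byName` lookups discussed here are modelled. The sketch swaps both collections, as the question that follows proposes, even though the measured speedup came from the `Vector` to `ArrayList` change alone.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for org.apache.lucene.index.FieldInfo.
final class FieldInfoSketch {
    final String name;
    final int number;

    FieldInfoSketch(String name, int number) {
        this.name = name;
        this.number = number;
    }
}

// Hypothetical stand-in for org.apache.lucene.index.FieldInfos.
final class FieldInfosSketch {
    // Originally a Vector and a Hashtable, whose get() acquires the
    // collection's monitor, so concurrent searcher threads queue on one lock.
    // ArrayList and HashMap take no lock on reads.
    private final List<FieldInfoSketch> byNumber = new ArrayList<>();
    private final Map<String, FieldInfoSketch> byName = new HashMap<>();

    // Write path: assumed to run on a single thread before the instance is
    // shared with searchers.
    void add(String name) {
        if (byName.containsKey(name)) {
            return; // field already known; keep its original number
        }
        FieldInfoSketch fi = new FieldInfoSketch(name, byNumber.size());
        byNumber.add(fi);
        byName.put(name, fi);
    }

    // Hot read path during search: a plain array index, no synchronization.
    FieldInfoSketch fieldInfo(int fieldNumber) {
        return byNumber.get(fieldNumber);
    }

    FieldInfoSketch fieldInfo(String fieldName) {
        return byName.get(fieldName);
    }

    int size() {
        return byNumber.size();
    }
}
```

Removing the lock is only safe if no thread writes while others read, which is exactly what the question below asks about.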
 
My question is: do the fields byNumber and byName have to be synchronized
and what can happen if I change them to be ArrayList and HashMap, which
are not synchronized? Can this corrupt the index or the integrity of the
results?

Thanks,
Aviran


