Re: Lucene Search has poor cpu utilization on a 4-CPU machine
Doug Cutting wrote:

> Aviran wrote:
>
>> I changed the Lucene 1.4 final source code and yes this is the source version I changed.
>
> Note that this patch won't produce a speedup on earlier releases, since there was another multi-thread bottleneck higher up the stack that was only recently removed, revealing this lower-level bottleneck. The other patch was:
>
> http://www.mail-archive.com/[EMAIL PROTECTED]/msg07873.html
>
> Both are required to see the speedup.

Thanks...

> Also, is there any reason folks cannot use 1.4 final now?

No... just that I'm trying to be conservative... I'm probably going to look at just migrating to 1.4 ASAP but we're close to a milestone...

Kevin

--
Please reply using PGP. http://peerfear.org/pubkey.asc
NewsMonster - http://www.newsmonster.org/
Kevin A. Burton, Location - San Francisco, CA, Cell - 415.595.9965
AIM/YIM - sfburtonator, Web - http://peerfear.org/
GPG fingerprint: 5FB2 F3E2 760E 70A8 6174 D393 E84D 8D04 99F1 4412
IRC - freenode.net #infoanarchy | #p2p-hackers | #newsmonster

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
Re: Lucene Search has poor cpu utilization on a 4-CPU machine
Aviran wrote:

> I changed the Lucene 1.4 final source code and yes this is the source version I changed.

Note that this patch won't produce a speedup on earlier releases, since there was another multi-thread bottleneck higher up the stack that was only recently removed, revealing this lower-level bottleneck. The other patch was:

http://www.mail-archive.com/[EMAIL PROTECTED]/msg07873.html

Both are required to see the speedup.

Also, is there any reason folks cannot use 1.4 final now?

Doug
RE: Lucene Search has poor cpu utilization on a 4-CPU machine
I changed the Lucene 1.4 final source code and yes this is the source version I changed.

Aviran

-----Original Message-----
From: Kevin A. Burton [mailto:[EMAIL PROTECTED]
Sent: Monday, July 12, 2004 9:42 PM
To: Lucene Users List
Subject: Re: Lucene Search has poor cpu utilization on a 4-CPU machine

Aviran wrote:

> Bug 30058 posted

Which of course is here:

http://issues.apache.org/bugzilla/show_bug.cgi?id=30058

Is this the source of the revision you modified?

http://www.mail-archive.com/[EMAIL PROTECTED]/msg06116.html

Also what version of Lucene?

Kevin
Re: Lucene Search has poor cpu utilization on a 4-CPU machine
Aviran wrote:

> Bug 30058 posted

Which of course is here:

http://issues.apache.org/bugzilla/show_bug.cgi?id=30058

Is this the source of the revision you modified?

http://www.mail-archive.com/[EMAIL PROTECTED]/msg06116.html

Also what version of Lucene?

Kevin
Re: Lucene Search has poor cpu utilization on a 4-CPU machine
Doug Cutting wrote:

>> I noticed that the class org.apache.lucene.index.FieldInfos uses private class members Vector byNumber and Hashtable byName, both of which are synchronized objects. By changing the Vector byNumber to ArrayList byNumber I was able to get 110% improvement in performance (number of searches per second).
>
> That's impressive! Good job finding a bottleneck!

Wow... that's awesome. We have all dual XEONs with Hyperthreading and kernel 2.6, so I imagine in this situation we'd see an improvement too.

I wonder if we could break this out into a patch for legacy Lucene users.

I'd like to see the stacktrace too. We're using a lot of synchronized code (Hashtable, Vector, etc.), so I'm willing to bet this is happening in other places.

>> My question is: do the fields byNumber and byName have to be synchronized, and what can happen if I change them to ArrayList and HashMap, which are not synchronized? Can this corrupt the index or the integrity of the results?
>
> I think that is a safe change. FieldInfos is only modified by DocumentWriter and SegmentMerger, and there is no possibility of other threads accessing those instances. Please submit a patch to the developer mailing list.

That would be great!

Kevin
RE: Lucene Search has poor cpu utilization on a 4-CPU machine
Bug 30058 posted

Aviran

-----Original Message-----
From: Doug Cutting [mailto:[EMAIL PROTECTED]
Sent: Monday, July 12, 2004 1:38 PM
To: Lucene Users List
Subject: Re: Lucene Search has poor cpu utilization on a 4-CPU machine

Aviran wrote:

> I use Lucene 1.4 final
>
> Here is the thread dump for one blocked thread (If you want a full thread dump for all threads I can do that too)

Thanks. I think I get the point. I recently removed a synchronization point higher in the stack, so that now this one shows up!

Whether or not you submit a patch, please file a bug report in Bugzilla with your proposed change, so that we don't lose track of this issue.

Thanks,

Doug
Re: Lucene Search has poor cpu utilization on a 4-CPU machine
Aviran wrote:

> I use Lucene 1.4 final
>
> Here is the thread dump for one blocked thread (If you want a full thread dump for all threads I can do that too)

Thanks. I think I get the point. I recently removed a synchronization point higher in the stack, so that now this one shows up!

Whether or not you submit a patch, please file a bug report in Bugzilla with your proposed change, so that we don't lose track of this issue.

Thanks,

Doug
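For readers who want to reproduce the kind of observation discussed here without reading a full thread dump by hand, the following is a minimal sketch that lists threads currently waiting to enter a monitor. The class name `BlockedThreadProbe` and its method are hypothetical and are not part of Lucene; the sketch only uses the standard `Thread.getAllStackTraces()` and `Thread.State.BLOCKED` APIs, and it uses modern Java syntax rather than what was available in 2004.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper, not part of Lucene: lists threads waiting to enter a monitor. */
final class BlockedThreadProbe {

    /** Returns the names of all live threads whose state is BLOCKED at the time of the call. */
    static List<String> blockedThreadNames() {
        List<String> names = new ArrayList<>();
        // getAllStackTraces() returns a snapshot of every live thread and its stack.
        for (Thread t : Thread.getAllStackTraces().keySet()) {
            // BLOCKED means the thread is waiting for a monitor lock,
            // e.g. to enter a synchronized method such as Vector.elementAt.
            if (t.getState() == Thread.State.BLOCKED) {
                names.add(t.getName());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println("Blocked threads: " + blockedThreadNames());
    }
}
```

Sampling this repeatedly under load gives a rough picture of monitor contention; a real thread dump also shows which lock each thread is waiting on, which this sketch does not.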
RE: Lucene Search has poor cpu utilization on a 4-CPU machine
per mailing list? Just attach a changed java file to an email?

Aviran

-----Original Message-----
From: Doug Cutting [mailto:[EMAIL PROTECTED]
Sent: Monday, July 12, 2004 12:42 PM
To: Lucene Users List
Subject: Re: Lucene Search has poor cpu utilization on a 4-CPU machine

Aviran wrote:

> First let me explain what I found out. I'm running Lucene on a 4 CPU server. While doing some stress tests I've noticed (by doing full thread dump) that searching threads are blocked on the method: public FieldInfo fieldInfo(int fieldNumber) This causes significant CPU idle time.

What version of Lucene are you running? Also, can you please send the stack traces of the blocked threads, or at least a description of them? I'd be interested to see what context this happens in. In particular, which IndexReader and Searcher/Scorer/Weight methods does it happen under?

> I noticed that the class org.apache.lucene.index.FieldInfos uses private class members Vector byNumber and Hashtable byName, both of which are synchronized objects. By changing the Vector byNumber to ArrayList byNumber I was able to get 110% improvement in performance (number of searches per second).

That's impressive! Good job finding a bottleneck!

> My question is: do the fields byNumber and byName have to be synchronized, and what can happen if I change them to ArrayList and HashMap, which are not synchronized? Can this corrupt the index or the integrity of the results?

I think that is a safe change. FieldInfos is only modified by DocumentWriter and SegmentMerger, and there is no possibility of other threads accessing those instances. Please submit a patch to the developer mailing list.

Doug
Re: Lucene Search has poor cpu utilization on a 4-CPU machine
Aviran wrote:

> First let me explain what I found out. I'm running Lucene on a 4 CPU server. While doing some stress tests I've noticed (by doing full thread dump) that searching threads are blocked on the method: public FieldInfo fieldInfo(int fieldNumber) This causes significant CPU idle time.

What version of Lucene are you running? Also, can you please send the stack traces of the blocked threads, or at least a description of them? I'd be interested to see what context this happens in. In particular, which IndexReader and Searcher/Scorer/Weight methods does it happen under?

> I noticed that the class org.apache.lucene.index.FieldInfos uses private class members Vector byNumber and Hashtable byName, both of which are synchronized objects. By changing the Vector byNumber to ArrayList byNumber I was able to get 110% improvement in performance (number of searches per second).

That's impressive! Good job finding a bottleneck!

> My question is: do the fields byNumber and byName have to be synchronized, and what can happen if I change them to ArrayList and HashMap, which are not synchronized? Can this corrupt the index or the integrity of the results?

I think that is a safe change. FieldInfos is only modified by DocumentWriter and SegmentMerger, and there is no possibility of other threads accessing those instances. Please submit a patch to the developer mailing list.

Doug
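The safety argument above (a structure is filled by one thread and only read afterwards) can be illustrated with a small sketch. This is not Lucene code: the class `ReadOnlySharing`, its method, and the field names in the list are made up for illustration. It assumes the list is fully built before any reader starts; handing the task to an `ExecutorService` establishes the happens-before edge that makes the earlier writes visible to the reader threads.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical demo, not Lucene code: an unsynchronized list shared read-only. */
final class ReadOnlySharing {

    /** Builds a list on the calling thread, then reads it from several threads without locks. */
    static long sumLengthsConcurrently(int threads, final int readsPerThread) {
        // Single-threaded build phase: no other thread can see the list yet.
        List<String> names = new ArrayList<>();
        names.add("title");
        names.add("body");
        names.add("date");
        final List<String> frozen = Collections.unmodifiableList(names);

        // Read-only phase: each task only calls get() and size().
        Callable<Long> reader = () -> {
            long sum = 0;
            for (int i = 0; i < readsPerThread; i++) {
                sum += frozen.get(i % frozen.size()).length();
            }
            return sum;
        };

        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Long>> results = new ArrayList<>();
            for (int t = 0; t < threads; t++) {
                results.add(pool.submit(reader)); // submit() publishes the built list safely
            }
            long total = 0;
            for (Future<Long> result : results) {
                total += result.get();
            }
            return total;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("reader task failed", e);
        } finally {
            pool.shutdown();
        }
    }
}
```

The sketch stops being safe as soon as any thread modifies the list while readers are running, which is exactly the condition the reply above rules out for FieldInfos.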
Lucene Search has poor cpu utilization on a 4-CPU machine
Hi all,

First let me explain what I found out. I'm running Lucene on a 4 CPU server. While doing some stress tests I've noticed (by doing full thread dump) that searching threads are blocked on the method:

public FieldInfo fieldInfo(int fieldNumber)

This causes significant CPU idle time.

I noticed that the class org.apache.lucene.index.FieldInfos uses private class members Vector byNumber and Hashtable byName, both of which are synchronized objects. By changing the Vector byNumber to ArrayList byNumber I was able to get 110% improvement in performance (number of searches per second).

My question is: do the fields byNumber and byName have to be synchronized, and what can happen if I change them to ArrayList and HashMap, which are not synchronized? Can this corrupt the index or the integrity of the results?

Thanks,

Aviran
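As a rough illustration of the change described in this message, here is a minimal sketch of a field registry that keeps a by-number list and a by-name map in unsynchronized collections. It is not the actual Lucene `FieldInfos` source: the class `FieldRegistry`, the nested `Field` type, and the `add` and `size` methods are hypothetical, and the code uses generics that did not exist in the Java versions current in 2004. The only point it shows is that `ArrayList.get` and `HashMap.get` take no lock, whereas every call on `Vector` and `Hashtable` acquires the collection's monitor.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for a field registry; not the actual Lucene FieldInfos source. */
final class FieldRegistry {

    /** Minimal field descriptor, for illustration only. */
    static final class Field {
        final String name;
        final int number;

        Field(String name, int number) {
            this.name = name;
            this.number = number;
        }
    }

    // Unsynchronized replacements for Vector and Hashtable: lookups take no lock.
    private final List<Field> byNumber = new ArrayList<>();
    private final Map<String, Field> byName = new HashMap<>();

    /** Adds a field if absent. Assumed to be called from a single writer thread only. */
    Field add(String name) {
        Field existing = byName.get(name);
        if (existing != null) {
            return existing;
        }
        Field field = new Field(name, byNumber.size());
        byNumber.add(field);
        byName.put(name, field);
        return field;
    }

    /** Lookup by number; lock-free, so concurrent searchers do not queue up here. */
    Field fieldInfo(int fieldNumber) {
        return byNumber.get(fieldNumber);
    }

    /** Lookup by name; returns null for an unknown field. */
    Field fieldInfo(String name) {
        return byName.get(name);
    }

    int size() {
        return byNumber.size();
    }
}
```

A short usage example: after `FieldRegistry r = new FieldRegistry(); r.add("title"); r.add("body");`, calling `r.fieldInfo(1).name` yields `"body"` and `r.fieldInfo("title").number` yields `0`.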