One other possibility is that the OS or BIOS is doing that, at least on a 
laptop. 
There is a newer feature where, if the load is low enough, a single-threaded 
application can be assigned to one core and that core has its clock 
boosted so that older software runs faster on the new processors. Otherwise 
it runs SLOWER!

My brother has a CAD program that runs slower on his new quad core because the 
base clock speed is lower than that of a single-core CPU. The software company 
is not taking the time to rewrite their code, except where they add features or 
fixes. 




----- Original Message ----

From: Brian Burke <bbu...@techtarget.com>
To: "solr-user@lucene.apache.org" <solr-user@lucene.apache.org>
Sent: Mon, January 10, 2011 10:56:27 AM
Subject: Re: Box occasionally pegs one cpu at 100%

This sounds like it could be garbage collection related, especially with a heap 
that large.  Depending on your JVM tuning, a full GC (FGC) could take quite a 
while, effectively 'pausing' the JVM.

Have you looked at something like jstat -gcutil or similar to monitor the 
garbage collection?
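
As a rough sketch of that kind of monitoring (not something from the original 
post): the Python below runs jstat -gcutil against a JVM pid and reports how 
many full GCs happened, and how long they took, between the first and last 
sample. The function names are made up for illustration. It reads column names 
from the header line jstat prints, and assumes the FGC and FGCT columns are 
present and that jstat is on the PATH.

```python
import subprocess


def parse_gcutil(text):
    """Parse `jstat -gcutil` output into a list of {column: value} dicts."""
    lines = [ln.split() for ln in text.strip().splitlines() if ln.strip()]
    header, rows = lines[0], lines[1:]
    samples = []
    for row in rows:
        sample = {}
        for name, value in zip(header, row):
            try:
                # Some locales print a decimal comma; normalise it first.
                sample[name] = float(value.replace(",", "."))
            except ValueError:
                sample[name] = None  # e.g. "-" for a column with no data
        samples.append(sample)
    return samples


def full_gc_delta(samples):
    """Return (extra full GCs, extra seconds spent in them) between the
    first and last sample."""
    first, last = samples[0], samples[-1]
    return int(last["FGC"] - first["FGC"]), last["FGCT"] - first["FGCT"]


def sample_jvm(pid, interval_ms=1000, count=10):
    """Run `jstat -gcutil <pid> <interval> <count>` and parse its output."""
    out = subprocess.run(
        ["jstat", "-gcutil", str(pid), str(interval_ms), str(count)],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_gcutil(out)
```

If full_gc_delta(sample_jvm(pid)) shows the full GC time climbing by close to 
the whole sampling window while the old generation stays near 100%, that would 
point towards back-to-back full collections rather than query load.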


On Jan 10, 2011, at 1:36 PM, Simon Wistow wrote:

> I have a fairly classic master/slave setup.
> 
> Response times on the slave are generally good with blips periodically, 
> apparently when replication is happening.
> 
> Occasionally however the process will have one incredibly slow query and 
> will peg the CPU at 100%.
> 
> The weird thing is that it will remain that way even if we stop querying 
> it and stop replication and then wait for over 20 minutes. The only way 
> to fix the problem at that point is to restart tomcat.
> 
> Looking at slow queries around the time of the incident they don't look 
> particularly bad - they're predominantly filter queries running under 
> dismax and there doesn't seem to be anything unusual about them.
> 
> The index file is about 266G and has 30G of disk free. The machine has 
> 50G of RAM and is running with -Xmx35G.
> 
> Looking at the processes running it appears to be the main Java thread 
> that's CPU bound, not the child threads. 
> 
> Stracing the process gives a lot of brk calls (presumably some 
> sort of wait loop) with occasional blips of: 
> 
> 
> mprotect(0x7fc5721d9000, 4096, PROT_READ) = 0
> futex(0x451c24a4, FUTEX_WAKE_OP_PRIVATE, 1, 1, 0x451c24a0, 
> {FUTEX_OP_SET, 0, FUTEX_OP_CMP_GT, 1}) = 1
> futex(0x4269dd14, FUTEX_WAKE_OP_PRIVATE, 1, 1, 0x4269dd10, 
> {FUTEX_OP_SET, 0, FUTEX_OP_CMP_GT, 1}) = 1
> futex(0x7fbc941603b4, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 
> 325, {1294683789, 614186000}, ffffffff) = 0
> futex(0x41d19b28, FUTEX_WAKE_PRIVATE, 1) = 0
> mprotect(0x7fc5721d8000, 4096, PROT_READ) = 0
> mprotect(0x7fc5721d8000, 4096, PROT_READ|PROT_WRITE) = 0
> futex(0x7fbc94eeb5b4, FUTEX_WAKE_OP_PRIVATE, 1, 1, 0x7fbc94eeb5b0, 
> {FUTEX_OP_SET, 0, FUTEX_OP_CMP_GT, 1}) = 1
> futex(0x426a6a28, FUTEX_WAKE_PRIVATE, 1) = 1
> mprotect(0x7fc5721d9000, 4096, PROT_NONE) = 0
> futex(0x41cae8f4, FUTEX_WAKE_OP_PRIVATE, 1, 1, 0x41cae8f0, 
> {FUTEX_OP_SET, 0, FUTEX_OP_CMP_GT, 1}) = 1
> futex(0x41cae328, FUTEX_WAKE_PRIVATE, 1) = 1
> futex(0x7fbc941603b4, FUTEX_WAIT_PRIVATE, 327, NULL) = 0
> futex(0x41d19b28, FUTEX_WAKE_PRIVATE, 1) = 0
> mmap(0x7fc2e0230000, 121962496, PROT_NONE, 
> MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0) = 
> 0x7fc2e0230000
> mmap(0x7fbca58e0000, 237568, PROT_NONE, 
> MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0) = 
> 0x7fbca58e0000
> 
> Any ideas about what's happening and if there's any way to mitigate it? 
> If the box at least recovered then I could run another slave and load 
> balance between them working on the principle that the second box 
> would pick up the slack whilst the first box restabilised but, as it is, 
> that's not reliable.
> 
> Thanks,
> 
> Simon
> 
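
On the point above about one thread being CPU bound: on Linux, top -H -p <pid> 
lists per-thread ids in decimal, and a jstack thread dump shows the same ids in 
hex as nid=0x.... A minimal sketch for matching the two, so you can see whether 
the busy thread is a GC or VM thread rather than a request thread, might look 
like this. The function name is hypothetical, and it assumes the usual jstack 
line layout of a quoted thread name followed by an nid= field.

```python
import re

# Matches jstack header lines such as:
#   "VM Thread" prio=10 tid=0x00007f... nid=0x1a2b runnable
_THREAD_LINE = re.compile(r'"(?P<name>[^"]*)".*\bnid=(?P<nid>0x[0-9a-fA-F]+)')


def find_thread_by_tid(thread_dump, tid):
    """Return the name of the thread in a jstack dump whose nid equals the
    decimal thread id `tid` (as shown by top -H), or None if not found."""
    for line in thread_dump.splitlines():
        match = _THREAD_LINE.match(line)
        if match and int(match.group("nid"), 16) == tid:
            return match.group("name")
    return None
```

Usage would be along the lines of find_thread_by_tid(open("dump.txt").read(), 
6699), where dump.txt is saved jstack output and 6699 is the thread id that top 
reports at 100%.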
