One other possiblity is that the OS or BIOS is doing that, at least on a laptop. There is a new feature where, if the load is low enough, non multi threaded applications can be assigned to one processor and that processor has it's clock boosted so the older software will run faster on the new processors - Otherwise they run SLOWER!.
My brother has a cad program that runs slower on his new quad core because the base clock speed is slower than a single processor CPU. The software company is not taking the time to rewrite their code, excpet where they add features or fixes. ----- Original Message ---- From: Brian Burke <bbu...@techtarget.com> To: "solr-user@lucene.apache.org" <solr-user@lucene.apache.org> Sent: Mon, January 10, 2011 10:56:27 AM Subject: Re: Box occasionally pegs one cpu at 100% This sounds like it could be garbage collection related, especially with a heap that large. Depending on your jvm tuning, a FGC could take quite a while, effectively 'pausing' the JVM. Have you looked at something like jstat -gcutil or similar to monitor the garbage collection? On Jan 10, 2011, at 1:36 PM, Simon Wistow wrote: > I have a fairly classic master/slave set up. > > Response times on the slave are generally good with blips periodically, > apparently when replication is happening. > > Occasionally however the process will have one incredibly slow query and > will peg the CPU at 100%. > > The weird thing is that it will remain that way even if we stop querying > it and stop replication and then wait for over 20 minutes. The only way > to fix the problem at that point is to restart tomcat. > > Looking at slow queries around the time of the incident they don't look > particularly bad - they're predominantly filter queries running under > dismax and there doesn't seem to be anything unusual about them. > > The index file is about 266G and has 30G of disk free. The machine has > 50G of RAM and is running with -Xmx35G. > > Looking at the processes running it appears to be the main Java thread > that's CPU bound, not the child threads. > > Stracing the process gives a lot of brk instructions (presumably some > sort of wait loop) with occasional blips of: > > > mprotect(0x7fc5721d9000, 4096, PROT_READ) = 0 > futex(0x451c24a4, FUTEX_WAKE_OP_PRIVATE, 1, 1, 0x451c24a0, > {FUTEX_OP_SET, 0, FUTEX_OP_CMP_GT, 1}) = 1 > futex(0x4269dd14, FUTEX_WAKE_OP_PRIVATE, 1, 1, 0x4269dd10, > {FUTEX_OP_SET, 0, FUTEX_OP_CMP_GT, 1}) = 1 > futex(0x7fbc941603b4, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, > 325, {1294683789, 614186000}, ffffffff) = 0 > futex(0x41d19b28, FUTEX_WAKE_PRIVATE, 1) = 0 > mprotect(0x7fc5721d8000, 4096, PROT_READ) = 0 > mprotect(0x7fc5721d8000, 4096, PROT_READ|PROT_WRITE) = 0 > futex(0x7fbc94eeb5b4, FUTEX_WAKE_OP_PRIVATE, 1, 1, 0x7fbc94eeb5b0, > {FUTEX_OP_SET, 0, FUTEX_OP_CMP_GT, 1}) = 1 > futex(0x426a6a28, FUTEX_WAKE_PRIVATE, 1) = 1 > mprotect(0x7fc5721d9000, 4096, PROT_NONE) = 0 > futex(0x41cae8f4, FUTEX_WAKE_OP_PRIVATE, 1, 1, 0x41cae8f0, > {FUTEX_OP_SET, 0, FUTEX_OP_CMP_GT, 1}) = 1 > futex(0x41cae328, FUTEX_WAKE_PRIVATE, 1) = 1 > futex(0x7fbc941603b4, FUTEX_WAIT_PRIVATE, 327, NULL) = 0 > futex(0x41d19b28, FUTEX_WAKE_PRIVATE, 1) = 0 > mmap(0x7fc2e0230000, 121962496, PROT_NONE, > MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0) = > 0x7fc2e0230000 > mmap(0x7fbca58e0000, 237568, PROT_NONE, > MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0) = > 0x7fbca58e0000 > > Any ideas about what's happening and if there's anyway to mitigate it? > If the box at least recovered then I could run another slave and load > balance between them working on the principle that the second box > would pick up the slack whilst the first box restabilised but, as it is, > that's not reliable. > > Thanks, > > Simon >