Maybe a thread dump would be useful, if you still have an instance running on 7.7.
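To act on the thread-dump suggestion, one approach is to capture a dump of the Solr JVM and match it against the busy threads from `top`. This is a sketch under assumptions, not an established Solr procedure. `jstack <pid>` ships with the JDK, not the JRE. `kill -3 <pid>` writes the dump to the JVM's standard output instead. The quoted `top` output appears to be in threads mode (its summary reads `Threads: 74 total`), so its PID column (10209, 10214) would hold native thread ids. HotSpot prints those ids in hexadecimal as `nid=0x…` in each thread header. The helper names and the sample dump text below are hypothetical. Only the decimal-to-hex `nid` convention comes from HotSpot's dump format.

```python
import re

# Matches the native thread id HotSpot prints in each thread header, e.g.
# '"some-thread" #12 prio=5 os_prio=0 tid=0x... nid=0x27e1 runnable'.
NID_RE = re.compile(r"\bnid=(0x[0-9a-f]+)\b")


def to_nid(tid: int) -> str:
    """Convert a decimal thread id (as shown by `top -H`) to jstack's hex nid form."""
    return hex(tid)


def hot_thread_headers(dump_text: str, tids):
    """Return the dump header lines whose nid matches one of the given thread ids."""
    wanted = {to_nid(t) for t in tids}
    hits = []
    for line in dump_text.splitlines():
        m = NID_RE.search(line)
        if m and m.group(1) in wanted:
            hits.append(line.strip())
    return hits


# Hypothetical excerpt of a `jstack <pid>` dump, for illustration only.
SAMPLE_DUMP = """\
"qtp-example-1" #20 prio=5 os_prio=0 tid=0x00007f0000001000 nid=0x27e1 runnable [0x00007f0000000000]
   java.lang.Thread.State: RUNNABLE
"idle-example-2" #21 prio=5 os_prio=0 tid=0x00007f0000002000 nid=0x1045 waiting on condition [0x00007f0000000000]
   java.lang.Thread.State: TIMED_WAITING (parking)
"""

if __name__ == "__main__":
    # Busy thread ids taken from the quoted `top` output.
    for header in hot_thread_headers(SAMPLE_DUMP, [10209, 10214]):
        print(header)
```

Run `jstack` as the same user that owns the Solr process (here `solr`). Taking two or three dumps a few seconds apart makes it easier to see whether the same stack frames stay hot.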
On Wed, Feb 27, 2019 at 7:28 AM Lukas Weiss <lukas.we...@raiffeisen.it> wrote:

> I can confirm this. Downgrading to 7.6.0 solved the issue.
> Thanks for the hint.
>
> From: "Joe Obernberger" <joseph.obernber...@gmail.com>
> To: solr-user@lucene.apache.org, "Lukas Weiss" <lukas.we...@raiffeisen.it>
> Date: 27.02.2019 15:59
> Subject: Re: High CPU usage with Solr 7.7.0
>
> Just to add to this. We upgraded to 7.7.0 and saw very high CPU usage
> on multi-core boxes, sustained in the 1200% range. We then switched to
> 7.6.0 (no other configuration changes) and the problem went away.
>
> We have a 40-node cluster, and all 40 nodes had high CPU usage with 3
> indexes stored on HDFS.
>
> -Joe
>
> On 2/27/2019 5:04 AM, Lukas Weiss wrote:
> > Hello,
> >
> > We recently updated our Solr server from 6.6.5 to 7.7.0. Since then, we
> > have had problems with the server's CPU usage.
> > We have two Solr cores configured, but even if we clear all indexes and do
> > not start the indexing process, we see 100% CPU usage for both cores.
> >
> > Here's what top shows:
> >
> > root@solr:~ # top
> > top - 09:25:24 up 17:40, 1 user, load average: 2,28, 2,56, 2,68
> > Threads: 74 total, 3 running, 71 sleeping, 0 stopped, 0 zombie
> > %Cpu0 :100,0 us, 0,0 sy, 0,0 ni, 0,0 id, 0,0 wa, 0,0 hi, 0,0 si, 0,0 st
> > %Cpu1 :100,0 us, 0,0 sy, 0,0 ni, 0,0 id, 0,0 wa, 0,0 hi, 0,0 si, 0,0 st
> > %Cpu2 : 11,3 us, 1,0 sy, 0,0 ni, 86,7 id, 0,7 wa, 0,0 hi, 0,3 si, 0,0 st
> > %Cpu3 : 3,0 us, 3,0 sy, 0,0 ni, 93,7 id, 0,3 wa, 0,0 hi, 0,0 si, 0,0 st
> > KiB Mem : 8388608 total, 7859168 free, 496744 used, 32696 buff/cache
> > KiB Swap: 2097152 total, 2097152 free, 0 used. 7859168 avail Mem
> >
> > PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND P
> > 10209 solr 20 0 6138468 452520 25740 R 99,9 5,4 29:43.45 java -server -Xms1024m -Xmx1024m -XX:NewRatio=3 -XX:SurvivorRatio=4 -XX:TargetSurvivorRatio=90 -XX:MaxTenuringThreshold=8 -XX:+UseConcMarkSweepGC -XX:ConcGCThreads=4 + 24
> > 10214 solr 20 0 6138468 452520 25740 R 99,9 5,4 28:42.91 java -server -Xms1024m -Xmx1024m -XX:NewRatio=3 -XX:SurvivorRatio=4 -XX:TargetSurvivorRatio=90 -XX:MaxTenuringThreshold=8 -XX:+UseConcMarkSweepGC -XX:ConcGCThreads=4 + 25
> >
> > The Solr server is installed on Debian Stretch 9.8 (64-bit) in a dedicated
> > Linux LXC container.
> >
> > Some more server info:
> >
> > root@solr:~ # java -version
> > openjdk version "1.8.0_181"
> > OpenJDK Runtime Environment (build 1.8.0_181-8u181-b13-2~deb9u1-b13)
> > OpenJDK 64-Bit Server VM (build 25.181-b13, mixed mode)
> >
> > root@solr:~ # free -m
> > total used free shared buff/cache available
> > Mem: 8192 484 7675 701 31 7675
> > Swap: 2048 0 2048
> >
> > We also found something strange: if we strace the main process, we
> > get lots of ongoing connection timeouts:
> >
> > root@solr:~ # strace -F -p 4136
> > strace: Process 4136 attached with 48 threads
> > strace: [ Process PID=11089 runs in x32 mode. ]
> > [pid 4937] epoll_wait(139, <unfinished ...>
> > [pid 4936] restart_syscall(<... resuming interrupted futex ...> <unfinished ...>
> > [pid 4909] restart_syscall(<... resuming interrupted futex ...> <unfinished ...>
> > [pid 4618] epoll_wait(136, <unfinished ...>
> > [pid 4576] futex(0x7ff61ce66474, FUTEX_WAIT_PRIVATE, 1, NULL <unfinished ...>
> > [pid 4279] futex(0x7ff61ce62b34, FUTEX_WAIT_PRIVATE, 2203, NULL <unfinished ...>
> > [pid 4244] restart_syscall(<... resuming interrupted futex ...> <unfinished ...>
> > [pid 4227] futex(0x7ff56c71ae14, FUTEX_WAIT_PRIVATE, 2237, NULL <unfinished ...>
> > [pid 4243] restart_syscall(<... resuming interrupted futex ...> <unfinished ...>
> > [pid 4228] futex(0x7ff5608331a4, FUTEX_WAIT_PRIVATE, 2237, NULL <unfinished ...>
> > [pid 4208] futex(0x7ff61ce63e54, FUTEX_WAIT_PRIVATE, 5, NULL <unfinished ...>
> > [pid 4205] restart_syscall(<... resuming interrupted futex ...> <unfinished ...>
> > [pid 4204] restart_syscall(<... resuming interrupted futex ...> <unfinished ...>
> > [pid 4196] restart_syscall(<... resuming interrupted futex ...> <unfinished ...>
> > [pid 4195] restart_syscall(<... resuming interrupted futex ...> <unfinished ...>
> > [pid 4194] restart_syscall(<... resuming interrupted futex ...> <unfinished ...>
> > [pid 4193] restart_syscall(<... resuming interrupted futex ...> <unfinished ...>
> > [pid 4187] restart_syscall(<... resuming interrupted restart_syscall ...> <unfinished ...>
> > [pid 4180] restart_syscall(<... resuming interrupted futex ...> <unfinished ...>
> > [pid 4179] restart_syscall(<... resuming interrupted futex ...> <unfinished ...>
> > [pid 4177] restart_syscall(<... resuming interrupted futex ...> <unfinished ...>
> > [pid 4174] accept(133, <unfinished ...>
> > [pid 4173] restart_syscall(<... resuming interrupted futex ...> <unfinished ...>
> > [pid 4172] restart_syscall(<... resuming interrupted futex ...> <unfinished ...>
> > [pid 4171] restart_syscall(<... resuming interrupted restart_syscall ...> <unfinished ...>
> > [pid 4165] restart_syscall(<... resuming interrupted futex ...> <unfinished ...>
> > [pid 4164] futex(0x7ff61c1f5054, FUTEX_WAIT_PRIVATE, 3, NULL <unfinished ...>
> > [pid 4163] restart_syscall(<... resuming interrupted futex ...> <unfinished ...>
> > [pid 4162] restart_syscall(<... resuming interrupted futex ...> <unfinished ...>
> > [pid 4161] restart_syscall(<... resuming interrupted futex ...> <unfinished ...>
> > [pid 4160] futex(0x7ff623d52c20, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 0, NULL, 0xffffffff <unfinished ...>
> > [pid 4159] futex(0x7ff61c1e9d54, FUTEX_WAIT_PRIVATE, 7, NULL <unfinished ...>
> > [pid 4158] futex(0x7ff61c1b7f54, FUTEX_WAIT_PRIVATE, 15, NULL <unfinished ...>
> > [pid 4157] futex(0x7ff61c1b5554, FUTEX_WAIT_PRIVATE, 19, NULL <unfinished ...>
> > [pid 4156] restart_syscall(<... resuming interrupted futex ...> <unfinished ...>
> > [pid 4155] restart_syscall(<... resuming interrupted futex ...> <unfinished ...>
> > [pid 4153] futex(0x7ff61c06c754, FUTEX_WAIT_PRIVATE, 7, NULL <unfinished ...>
> > [pid 4152] futex(0x7ff61c06ab54, FUTEX_WAIT_PRIVATE, 3, NULL <unfinished ...>
> > [pid 4151] futex(0x7ff61c068f54, FUTEX_WAIT_PRIVATE, 7, NULL <unfinished ...>
> > [pid 4150] futex(0x7ff61c067354, FUTEX_WAIT_PRIVATE, 7, NULL <unfinished ...>
> > [pid 4148] futex(0x7ff61c024a54, FUTEX_WAIT_PRIVATE, 403, NULL <unfinished ...>
> > [pid 4165] <... restart_syscall resumed> ) = -1 ETIMEDOUT (Connection timed out)
> > [pid 4165] futex(0x7ff61c1f7a28, FUTEX_WAKE_PRIVATE, 1) = 0
> > [pid 4165] futex(0x7ff61c1f7a54, FUTEX_WAIT_BITSET_PRIVATE, 1, {tv_sec=32564856, tv_nsec=849859736}, 0xffffffff <unfinished ...>
> > [pid 4147] futex(0x7ff61c022e54, FUTEX_WAIT_PRIVATE, 415, NULL <unfinished ...>
> > [pid 4146] futex(0x7ff61c021254, FUTEX_WAIT_PRIVATE, 397, NULL <unfinished ...>
> > [pid 4145] futex(0x7ff61c01f654, FUTEX_WAIT_PRIVATE, 405, NULL <unfinished ...>
> > [pid 4144] futex(0x7ff61c00e354, FUTEX_WAIT_PRIVATE, 1, NULL <unfinished ...>
> > [pid 4136] futex(0x7ff624b729d0, FUTEX_WAIT, 4144, NULL <unfinished ...>
> > [pid 4165] <... futex resumed> ) = -1 ETIMEDOUT (Connection timed out)
> > [pid 4165] futex(0x7ff61c1f7a28, FUTEX_WAKE_PRIVATE, 1) = 0
> > [pid 4165] futex(0x7ff61c1f7a54, FUTEX_WAIT_BITSET_PRIVATE, 1, {tv_sec=32564856, tv_nsec=900162344}, 0xffffffff) = -1 ETIMEDOUT (Connection timed out)
> > [pid 4165] futex(0x7ff61c1f7a28, FUTEX_WAKE_PRIVATE, 1) = 0
> > [pid 4165] futex(0x7ff61c1f7a54, FUTEX_WAIT_BITSET_PRIVATE, 1, {tv_sec=32564856, tv_nsec=950365105}, 0xffffffff) = -1 ETIMEDOUT (Connection timed out)
> > [pid 4165] futex(0x7ff61c1f7a28, FUTEX_WAKE_PRIVATE, 1) = 0
> > [pid 4165] futex(0x7ff61c1f7a54, FUTEX_WAIT_BITSET_PRIVATE, 1, {tv_sec=32564857, tv_nsec=586325}, 0xffffffff) = -1 ETIMEDOUT (Connection timed out)
> > [pid 4165] futex(0x7ff61c1f7a28, FUTEX_WAKE_PRIVATE, 1) = 0
> > [pid 4165] futex(0x7ff61c1f7a54, FUTEX_WAIT_BITSET_PRIVATE, 1, {tv_sec=32564857, tv_nsec=50791977}, 0xffffffff) = -1 ETIMEDOUT (Connection timed out)
> > [pid 4165] futex(0x7ff61c1f7a28, FUTEX_WAKE_PRIVATE, 1) = 0
> > [pid 4165] futex(0x7ff61c1f7a54, FUTEX_WAIT_BITSET_PRIVATE, 1, {tv_sec=32564857, tv_nsec=100997890}, 0xffffffff) = -1 ETIMEDOUT (Connection timed out)
> > [pid 4165] futex(0x7ff61c1f7a28, FUTEX_WAKE_PRIVATE, 1) = 0
> > [pid 4165] futex(0x7ff61c1f7a54, FUTEX_WAIT_BITSET_PRIVATE, 1, {tv_sec=32564857, tv_nsec=151206817}, 0xffffffff) = -1 ETIMEDOUT (Connection timed out)
> > [pid 4165] futex(0x7ff61c1f7a28, FUTEX_WAKE_PRIVATE, 1) = 0
> > [pid 4165] futex(0x7ff61c1f7a54, FUTEX_WAIT_BITSET_PRIVATE, 1, {tv_sec=32564857, tv_nsec=201402531}, 0xffffffff) = -1 ETIMEDOUT (Connection timed out)
> > [pid 4165] futex(0x7ff61c1f7a28, FUTEX_WAKE_PRIVATE, 1) = 0
> > [pid 4165] futex(0x7ff61c1f7a54, FUTEX_WAIT_BITSET_PRIVATE, 1, {tv_sec=32564857, tv_nsec=251616284}, 0xffffffff) = -1 ETIMEDOUT (Connection timed out)
> > [pid 4165] futex(0x7ff61c1f7a28, FUTEX_WAKE_PRIVATE, 1) = 0
> > [pid 4165] futex(0x7ff61c1f7a54, FUTEX_WAIT_BITSET_PRIVATE, 1, {tv_sec=32564857, tv_nsec=301813556}, 0xffffffff) = -1 ETIMEDOUT (Connection timed out)
> > [pid 4165] futex(0x7ff61c1f7a28, FUTEX_WAKE_PRIVATE, 1) = 0
> > [pid 4165] futex(0x7ff61c1f7a54, FUTEX_WAIT_BITSET_PRIVATE, 1, {tv_sec=32564857, tv_nsec=352036802}, 0xffffffff) = -1 ETIMEDOUT (Connection timed out)
> > [pid 4165] futex(0x7ff61c1f7a28, FUTEX_WAKE_PRIVATE, 1) = 0
> > [pid 4165] futex(0x7ff61c1f7a54, FUTEX_WAIT_BITSET_PRIVATE, 1, {tv_sec=32564857, tv_nsec=402239182}, 0xffffffff) = -1 ETIMEDOUT (Connection timed out)
> > [pid 4165] futex(0x7ff61c1f7a28, FUTEX_WAKE_PRIVATE, 1) = 0
> > [pid 4165] futex(0x7ff61c1f7a54, FUTEX_WAIT_BITSET_PRIVATE, 1, {tv_sec=32564857, tv_nsec=452439835}, 0xffffffff) = -1 ETIMEDOUT (Connection timed out)
> > [pid 4165] futex(0x7ff61c1f7a28, FUTEX_WAKE_PRIVATE, 1) = 0
> > [pid 4165] futex(0x7ff61c1f7a54, FUTEX_WAIT_BITSET_PRIVATE, 1, {tv_sec=32564857, tv_nsec=502635489}, 0xffffffff) = -1 ETIMEDOUT (Connection timed out)
> > [pid 4165] futex(0x7ff61c1f7a28, FUTEX_WAKE_PRIVATE, 1) = 0
> > [pid 4165] futex(0x7ff61c1f7a54, FUTEX_WAIT_BITSET_PRIVATE, 1, {tv_sec=32564857, tv_nsec=552844020}, 0xffffffff <unfinished ...>
> > [pid 4156] <... restart_syscall resumed> ) = -1 ETIMEDOUT (Connection timed out)
> > [pid 4156] futex(0x7ff61c1aba28, FUTEX_WAKE_PRIVATE, 1) = 0
> > [pid 4156] futex(0x7ff61c1aba54, FUTEX_WAIT_BITSET_PRIVATE, 1, {tv_sec=32564858, tv_nsec=506449064}, 0xffffffff <unfinished ...>
> > [pid 4165] <... futex resumed> ) = -1 ETIMEDOUT (Connection timed out)
> > [pid 4165] futex(0x7ff61c1f7a28, FUTEX_WAKE_PRIVATE, 1) = 0
> > [pid 4165] futex(0x7ff61c1f7a54, FUTEX_WAIT_BITSET_PRIVATE, 1, {tv_sec=32564857, tv_nsec=603013734}, 0xffffffff) = -1 ETIMEDOUT (Connection timed out)
> > [pid 4165] futex(0x7ff61c1f7a28, FUTEX_WAKE_PRIVATE, 1) = 0
> > [pid 4165] futex(0x7ff61c1f7a54, FUTEX_WAIT_BITSET_PRIVATE, 1, {tv_sec=32564857, tv_nsec=653149664}, 0xffffffff^Cstrace: Process 4136 detached
> > strace: Process 4144 detached
> > strace: Process 4145 detached
> > strace: Process 4146 detached
> > strace: Process 4147 detached
> > strace: Process 4148 detached
> > strace: Process 4150 detached
> > strace: Process 4151 detached
> > strace: Process 4152 detached
> > strace: Process 4153 detached
> > ....
> >
> > Could you help us determine what's wrong with our setup?
> >
> > Thank you very much,
> >
> > Kind regards
> > Lukas Weiss
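The quoted strace excerpt becomes more concrete once you work through the arithmetic on thread 4165's waits. For `FUTEX_WAIT_BITSET_PRIVATE`, the `{tv_sec, tv_nsec}` argument is an absolute deadline, so the gap between consecutive deadlines is the wait period. An `ETIMEDOUT` result means the timed wait simply expired. "Connection timed out" is only the generic libc message for that errno, so on its own it does not point to network timeouts. The sketch below reuses only numbers copied from the trace; the function names are made up for illustration. It suggests thread 4165 wakes roughly every 50 ms. A thread parked in `futex` like this is unlikely on its own to account for a core pinned at 100%, so the busy threads from `top` look like the better target for a thread dump.

```python
# Absolute FUTEX_WAIT_BITSET deadlines (tv_sec, tv_nsec) copied from the
# [pid 4165] lines of the quoted strace output, in the order they appear.
DEADLINES = [
    (32564856, 849859736), (32564856, 900162344), (32564856, 950365105),
    (32564857, 586325), (32564857, 50791977), (32564857, 100997890),
    (32564857, 151206817), (32564857, 201402531), (32564857, 251616284),
    (32564857, 301813556), (32564857, 352036802), (32564857, 402239182),
    (32564857, 452439835), (32564857, 502635489), (32564857, 552844020),
    (32564857, 603013734), (32564857, 653149664),
]


def to_ns(sec: int, nsec: int) -> int:
    """Convert a (tv_sec, tv_nsec) pair to integer nanoseconds."""
    return sec * 1_000_000_000 + nsec


def intervals_ms(deadlines):
    """Return the gaps between consecutive deadlines, in milliseconds."""
    ns = [to_ns(s, n) for s, n in deadlines]
    return [(b - a) / 1e6 for a, b in zip(ns, ns[1:])]


if __name__ == "__main__":
    gaps = intervals_ms(DEADLINES)
    print(f"{len(gaps)} waits, {min(gaps):.2f}-{max(gaps):.2f} ms apart")
    # Average wake-up rate of this single thread.
    print(f"~{1000 / (sum(gaps) / len(gaps)):.1f} wake-ups per second")
```

All 16 gaps fall between roughly 50.1 and 50.3 ms, which works out to about 20 timed wake-ups per second for that thread.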