Let’s see if I can help… I do see that the ability to customize the wait strategy via a config file was removed sometime after v0.10.2. So there is no way to do that anymore, but you can try to tweak topology.disruptor.wait.timeout.millis and see if that helps your situation.
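Something along these lines would set it (an untested sketch on my part, assuming the key can be overridden per topology like other topology.* settings; the helper name is made up and the 100 ms value is only illustrative):

import org.apache.storm.Config;

public class DisruptorTimeoutSketch {
    // Untested sketch: build a per-topology Config that overrides the Disruptor
    // wait timeout; the returned conf would then be passed to
    // StormSubmitter.submitTopology(...) as usual.
    public static Config confWithWaitTimeout(long millis) {
        Config conf = new Config();
        // Key name taken from the discussion above; 100 ms (for example) is
        // illustrative, not a recommendation.
        conf.put("topology.disruptor.wait.timeout.millis", millis);
        return conf;
    }
}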
The TimeoutBlockingWaitStrategy is what is currently being used for the Disruptor if the timeout > 0, or LiteBlockingWaitStrategy if the timeout = 0. Is your primary concern that too much kernel CPU is being used in idle mode (in the wait strategy), and that this is consequently affecting your ability to run more topos on the same box?

-Roshan

From: Roee Shenberg <shenb...@alooma.io>
Reply-To: "user@storm.apache.org" <user@storm.apache.org>
Date: Monday, June 12, 2017 at 10:06 AM
To: "user@storm.apache.org" <user@storm.apache.org>
Subject: Re: High kernel cpu usage when running 10 topologies on the same worker

Some better diagnostics: I got a perf recording (30 secs) of an idle topology on a system with 10 idle topologies, using perf-map-agent to create a .map file for the JITted code. The flame graph is attached, and it strengthens my desire to use a waiting strategy that isn't based on condition variables, seeing as the vast majority of the time was spent in futexes.

On Sun, Jun 11, 2017 at 6:39 PM, Roee Shenberg <shenb...@alooma.io> wrote:

A more practical question is: is there a way to replace the Disruptor waiting strategy? I suspect the issue is the ginormous number of threads vying to be woken up; running the Disruptor queue with SleepingWaitStrategy seems like it should alleviate this pain, but it seems the ability to set the wait strategy was removed in the transition from Clojure to Java in versions 0.10.x -> 1.0.x.

On Sun, Jun 11, 2017 at 3:32 PM, Roee Shenberg <shenb...@alooma.io> wrote:

Our Storm cluster (1.0.2) is running many Trident topologies, each one local to a single worker, with each supervisor having 10 worker slots. Every slot runs a copy of the same topology with a different configuration, the topology being a fairly fat Trident topology (~300 threads per topology, totalling >3000 threads on the machine).

A quick htop showed a grim picture of most CPU time being spent in the kernel:

[Inline image 1]

(note: running as root inside a docker)

Here's an example top summary line:

%Cpu(s): 39.4 us, 51.1 sy, 0.0 ni, 8.6 id, 0.0 wa, 0.0 hi, 0.1 si, 0.8 st

This suggests actual kernel time waste, not I/O, irqs, etc., so I ran sudo strace -cf -p 2466 to get a feel for what's going on:

 % time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 86.84 3489.872073       14442    241646     27003 futex
 10.69  429.437949      271453      1582           epoll_wait
  1.88   75.608000      361761       209       108 restart_syscall
  0.29   11.722287       46889       250           recvfrom
  0.12    4.736911          92     51379           gettimeofday
  0.08    3.173336          12    254162           clock_gettime
  0.06    2.234660        4373       511           poll
   ...

I don't understand whether threads that are simply blocking are counted (in which case this is a worthless measure) or not.

I ran jvmtop to get some runtime data out of one of the topologies (well, a few, but they were all roughly the same):

  TID NAME                            STATE          CPU     TOTALCPU BLOCKEDBY
  203 Thread-27-$mastercoord-bg0-exe  RUNNABLE       57.55%   8.54%
  414 RMI TCP Connection(3)-172.17.0  RUNNABLE        8.03%   0.01%
   22 disruptor-flush-trigger         TIMED_WAITING   3.79%   4.49%
   51 heartbeat-timer                 TIMED_WAITING   0.80%   1.66%
  328 disruptor-flush-task-pool       TIMED_WAITING   0.61%   0.84%
  ...

So just about all of the time is spent inside the Trident master coordinator. My theory is that the ridiculous thread count is causing the kernel to work extra hard on all those futex calls (e.g. when waking a thread blocking on a futex).
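As a rough way to put numbers on that theory, something like the untested sketch below (mine, not anything Storm provides; it only uses the standard JMX thread bean) could be run inside a worker JVM to count how many threads are parked at any given moment:

import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.EnumMap;
import java.util.Map;

public class ThreadStateCount {
    public static void main(String[] args) {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        Map<Thread.State, Integer> counts = new EnumMap<>(Thread.State.class);
        // WAITING / TIMED_WAITING threads are the ones parked on condition
        // variables, i.e. the population sitting in futex_wait in the kernel.
        for (ThreadInfo info : threads.getThreadInfo(threads.getAllThreadIds())) {
            if (info == null) continue; // thread exited between the two calls
            counts.merge(info.getThreadState(), 1, Integer::sum);
        }
        counts.forEach((state, n) -> System.out.println(state + ": " + n));
    }
}

If most of the >3000 threads show up as WAITING/TIMED_WAITING, that would at least confirm they are almost all parked on condition variables rather than doing work.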
I'm very uncertain regarding this; the best I can say is that the overhead is related more to the number of topologies than to what the topology is doing (when running 1 topology on the same worker, CPU use is less than 1/10th of what it is with 10 topologies), and that there are a lot of threads on the system (>3000).

Any advice, suggestions for additional diagnostics, or ideas as to the cause? Operationally, we're planning to move to smaller instances with fewer slots per worker to work around this issue, but I'd rather resolve it without changing our cluster's makeup entirely.

Thanks,
Roee