As I suspected, your topology.max.spout.pending is disabled.
Set it to something like 10k or 50k, assuming your message sizes are on the order
of a KB or less.
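
In case it helps, a minimal sketch of setting it from topology code (assuming a
Java topology; 10000 is just the example figure above):

import org.apache.storm.Config;

Config conf = new Config();
// Cap the number of un-acked tuples each spout task may keep in flight;
// tune the value to your message sizes (10k-50k for messages of ~1 KB or less).
conf.setMaxSpoutPending(10000);
// ... then pass conf to StormSubmitter.submitTopology(...)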

The worker stall/blocked issue may have been due to the backpressure subsystem.
I remember reporting that bug, and I am not sure it was ever fully addressed.
That's why we disabled it by default.

-roshan

From: Alexandre Vermeerbergen <[email protected]>
Reply-To: "[email protected]" <[email protected]>
Date: Tuesday, May 2, 2017 at 2:11 AM
To: "[email protected]" <[email protected]>
Subject: Re: Disruptor Queue Filling Memory

Hi Roshan,
Thank you very much for your answers.

For your information, with Storm 1.0.1 our topologies run with back-pressure
enabled by default. We sometimes hit the blocked-worker issue, which we have
mitigated by writing our own "fail-over" system that detects the situation and
automatically restarts the affected topologies.
With Storm 1.0.3 we no longer have blocked workers, but our lag sometimes goes
crazy, CPU load spikes, and we get a huge accumulation of memory in the disruptor
queue.
To answer your questions about our topologies' settings, here's what we 
currently have:
Required information               Property name (if not the same)           Property value
topology.acker.executors           -                                         1
topology.worker.max.heap.size.mb   -                                         768
worker heap size                   worker.heap.memory.mb                     768
max spout pending                  topology.max.spout.pending                null
back pressure settings             backpressure.disruptor.high.watermark     0.9
                                   backpressure.disruptor.low.watermark      0.4
                                   task.backpressure.poll.secs               30
                                   topology.backpressure.enable              false
topology.message.timeout.secs      -                                         30

We're going to study the metrics with your suggested approach.
Best regards,
Alexandre


2017-05-02 9:52 GMT+02:00 Roshan Naik <[email protected]>:
That ConcurrentLinkedQueue is the overflow list I was referring to earlier. It is
part of org.apache.storm.utils.DisruptorQueue, which is Storm's wrapper around
the LMAX disruptor queue.

When a spout/bolt instance cannot emit() to its downstream bolt (within the same
worker process) because the destination bolt's inbound DisruptorQueue is full,
the messages are stashed in the overflow linked list associated with that
DisruptorQueue. As the disruptor queue gradually drains, the messages from the
overflow are moved into the freed space in the disruptor.
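
To illustrate the pattern (this is only a simplified sketch, not Storm's actual
DisruptorQueue code), the behavior is roughly a bounded ring buffer plus an
unbounded overflow list:

import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ConcurrentLinkedQueue;

class BoundedQueueWithOverflow<T> {
    private final ArrayBlockingQueue<T> ring;        // stands in for the fixed-size disruptor ring
    private final ConcurrentLinkedQueue<T> overflow = // unbounded, grows when the ring is full
            new ConcurrentLinkedQueue<>();

    BoundedQueueWithOverflow(int capacity) {
        this.ring = new ArrayBlockingQueue<>(capacity);
    }

    // publish(): if the ring is full (or overflow already has entries), stash the message
    void publish(T msg) {
        if (!overflow.isEmpty() || !ring.offer(msg)) {
            overflow.add(msg);                        // this is where memory can grow without bound
        }
    }

    // consume(): take from the ring, then top it up from the overflow
    T consume() {
        T msg = ring.poll();
        T pending;
        while (ring.remainingCapacity() > 0 && (pending = overflow.poll()) != null) {
            ring.offer(pending);
        }
        return msg;
    }
}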

In cases like this, max spout pending, if enabled, should kick in to prevent
excessive accumulation of un-acked messages in the topology.
I assume you are using ackers in your topology? Otherwise this won't help.
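
As a reminder of what "using ackers" means in practice, here is a minimal sketch
of a bolt that anchors and acks (a hypothetical bolt, purely for illustration);
without anchored emits, acks, and at least one acker executor, max spout pending
has nothing to count:

import java.util.Map;
import org.apache.storm.task.OutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.base.BaseRichBolt;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Tuple;
import org.apache.storm.tuple.Values;

public class AckingBolt extends BaseRichBolt {
    private OutputCollector collector;

    @Override
    public void prepare(Map conf, TopologyContext context, OutputCollector collector) {
        this.collector = collector;
    }

    @Override
    public void execute(Tuple input) {
        // Anchor the emitted tuple to its input so the ackers can track it ...
        collector.emit(input, new Values(input.getString(0)));
        // ... and ack the input once processing is done.
        collector.ack(input);
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("value"));
    }
}

The spout side must also emit its tuples with a message ID, otherwise they are
never tracked and max spout pending has no effect.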

Can you share the values of the following settings, as shown by the topology
settings search box on the topology UI page?
- topology.acker.executors
- topology.worker.max.heap.size.mb
- worker heap size
- max spout pending
- back pressure settings
- topology.message.timeout.secs


Also, on the topology metrics table you may be able to identify which
spout->bolt or bolt->bolt connection is congested by looking at the
'transferred' and 'emitted' metrics of each spout and bolt. Also examine the ack counts.
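
If the UI numbers are hard to compare over time, one option is to log the
built-in metrics with a metrics consumer; a minimal sketch (LoggingMetricsConsumer
ships with storm-core; a parallelism of 1 is just an example):

import org.apache.storm.Config;
import org.apache.storm.metric.LoggingMetricsConsumer;

Config conf = new Config();
// Logs the per-component built-in metrics (emit/transfer/ack counts, queue stats)
// in the worker logs, so congested spout->bolt links can be tracked over time.
conf.registerMetricsConsumer(LoggingMetricsConsumer.class, 1);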

It looks like back pressure is still disabled by default:
https://github.com/apache/storm/blob/v1.0.3/conf/defaults.yaml
I am not sure how stable it is at the moment, so I can't recommend turning it on.
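
For completeness, if you do decide to experiment with it on a test cluster
anyway, the switch is the same topology.backpressure.enable setting listed in
your table; a minimal sketch, assuming the 1.0.x Config constant:

import org.apache.storm.Config;

Config conf = new Config();
// Re-enables the per-topology backpressure that 1.0.3 ships disabled;
// the watermark and poll settings from the table keep their cluster defaults unless overridden.
conf.put(Config.TOPOLOGY_BACKPRESSURE_ENABLE, true);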

-roshan


From: Alexandre Vermeerbergen <[email protected]>
Reply-To: "[email protected]" <[email protected]>
Date: Monday, May 1, 2017 at 2:50 PM
To: "[email protected]" <[email protected]>

Subject: Re: Disruptor Queue Filling Memory

Hello,
I think I am experiencing the same kind of issue as Tim with Storm 1.0.3: my
Storm cluster becomes very unstable whenever I add a certain topology, leading
to very high CPU load on the VM that hosts the worker process running this
topology.
I made a heap dump, opened it with Eclipse MAT, and bingo: it reports
"org.apache.storm.utils.DisruptorQueue" as leak suspect 1.
More detail from Eclipse MAT's output:

One instance of "org.apache.storm.utils.DisruptorQueue" loaded by 
"sun.misc.Launcher$AppClassLoader @ 0x80013d40" occupies 766 807 504 (46,64%) 
bytes. The memory is accumulated in one instance of 
"java.util.concurrent.ConcurrentLinkedQueue$Node" loaded by "<system class 
loader>".

Keywords
org.apache.storm.utils.DisruptorQueue
sun.misc.Launcher$AppClassLoader @ 0x80013d40
java.util.concurrent.ConcurrentLinkedQueue$Node
The same set of topologies never "eats" that much CPU and memory with Storm
1.0.1, so my guess is that, because of
https://issues.apache.org/jira/browse/STORM-1956, the main difference between
running our full set of topologies on Storm 1.0.1 versus 1.0.3 is that we no
longer have backpressure with Storm 1.0.3.
I have a few questions, which consolidate Tim's:
1. Is backpressure enabled again by default in Storm 1.1.0?
2. Are there guidelines to re-enable backpressure and tune it correctly?
Best regards,
Alexandre Vermeerbergen

2017-05-01 21:52 GMT+02:00 Tim Fendt <[email protected]>:
We have max spout pending enabled and set to 1000, and we have the back
pressure system turned off. We did see increased latency in the processor,
which contributed to the queueing. Given what you are saying, I assume that
1000 pending messages are simply too large to fit in the memory we have
assigned? Should we look at turning on back pressure and reducing max spout
pending?

Thanks,

--
Tim


From: Roshan Naik <[email protected]>
Reply-To: "[email protected]" <[email protected]>
Date: Monday, May 1, 2017 at 2:26 PM
To: "[email protected]" <[email protected]>, "[email protected]" <[email protected]>
Subject: Re: Disruptor Queue Filling Memory

You are most likely experiencing back pressure while your max spout pending is
not enabled. That is causing the unbounded overflow linked list inside Storm's
disruptor wrapper to swallow all the memory. You can try using max spout
pending to throttle the spouts in such scenarios.



On Mon, May 1, 2017 at 11:56 AM -0700, "Tim Fendt" <[email protected]> wrote:
We have been having an issue where, after about a week of running, the old gen
on the JVM has trouble freeing space. I generated a heap dump during the last
occurrence and found it filled with DisruptorQueue objects. Is there a memory
leak in the disruptor queue, or is there some configuration we are missing? We
are running Storm version 1.0.2.

The org.apache.storm.utils.DisruptorQueue$ThreadLocalBatcher and
org.apache.storm.utils.DisruptorQueue classes fill the memory:
https://puu.sh/vCkQE/cda1f319ad.png

This is our config for the supervisors:
storm.local.dir: "/var/storm-local"
storm.zookeeper.servers:
    - "10.0.0.5"
storm.zookeeper.port: 2181

nimbus.seeds: ["10.0.0.6"]

supervisor.slots.ports:
    - 6700

worker.childopts: "-Xms3072m -Xmx3072m"


Thanks,

--
Tim


