Re: Strange storm problem

Santosh Pingale Mon, 02 Nov 2015 00:00:35 -0800

Please share a Storm UI screenshot so that we can look at the topology's stats.
A screenshot of the topology visualization would also help us understand the flow.
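
In the meantime, here is a rough sanity check on the max spout pending of 20000 mentioned in the quoted thread below. It is only a back-of-the-envelope sketch: it assumes the pending window is full and applies Little's law (latency = tuples in flight / ack rate). The 30 s timeout is Storm's default for topology.message.timeout.secs; the thread does not say what value this topology uses, and the function names are made up for illustration.

```python
def expected_complete_latency(pending_tuples: int, ack_rate_per_sec: float) -> float:
    """Approximate seconds a tuple waits to be fully acked when
    `pending_tuples` are in flight and acks arrive at `ack_rate_per_sec`
    (Little's law; per spout task)."""
    if ack_rate_per_sec <= 0:
        return float("inf")  # nothing is being acked, so tuples never complete
    return pending_tuples / ack_rate_per_sec


def min_ack_rate(max_spout_pending: int, message_timeout_secs: float) -> float:
    """Lowest sustained ack rate (tuples/s, per spout task) at which a full
    pending window still completes before the message timeout."""
    if message_timeout_secs <= 0:
        raise ValueError("message_timeout_secs must be positive")
    return max_spout_pending / message_timeout_secs


# Assumed values: 20000 comes from the thread, 30 s is Storm's default timeout.
MAX_SPOUT_PENDING = 20000
MESSAGE_TIMEOUT_SECS = 30

required_rate = min_ack_rate(MAX_SPOUT_PENDING, MESSAGE_TIMEOUT_SECS)
```

With these assumed numbers each spout task has to sustain roughly 667 acks per second. If a stalled worker pulls the rate down to, say, 100 per second, expected_complete_latency(20000, 100) gives 200 s, far beyond a 30 s timeout, which is one way a large pending window can turn a slowdown into ack timeouts.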


On Mon, Nov 2, 2015 at 10:25 AM, Renjie Liu <[email protected]> wrote:

> The output speed is measured by the output of dstat, which shows the worker
> traffic speed.
>
> On Mon, Nov 2, 2015 at 10:52 AM, Nathan Leung <[email protected]> wrote:
>
>> How are you measuring output speed?  Is it possible that you are
>> experiencing problems with HBase?
>>
>> On Sun, Nov 1, 2015 at 9:22 PM, Renjie Liu <[email protected]>
>> wrote:
>>
>>> The jstat output shows that the JVM is not stuck in a full GC cycle, but
>>> each minor GC takes more than 1s. However, minor GCs are infrequent,
>>> occurring only once every few seconds.
>>>
>>> On Mon, Nov 2, 2015 at 12:29 AM, Nathan Leung <[email protected]> wrote:
>>>
>>>> The box with no throughput might be in a GC loop. Check your heap
>>>> utilization and maybe increase the worker heap if necessary. Also consider
>>>> decreasing the max spout pending; even without further details, 20k seems
>>>> high.
>>>> On Nov 1, 2015 10:50 AM, "Harsha" <[email protected]> wrote:
>>>>
>>>>> Do you have any calls to external data sources which might be
>>>>> increasing the latency and causing tuple timeout?
>>>>>
>>>>>
>>>>> On Sun, Nov 1, 2015, at 04:49 AM, Renjie Liu wrote:
>>>>>
>>>>> Yes, I've set it to 20000
>>>>>
>>>>> On Sun, Nov 1, 2015 at 6:40 PM, Santosh Pingale <
>>>>> [email protected]> wrote:
>>>>>
>>>>> Have you set 'topology.max.spout.pending'?
>>>>>
>>>>> On Sun, Nov 1, 2015 at 2:26 PM, Renjie Liu <[email protected]>
>>>>> wrote:
>>>>>
>>>>> Hi, storm community:
>>>>>
>>>>> We have a Storm cluster deployed with 15 workers, and recently we have
>>>>> often experienced failures due to ack timeouts. Our input source is Kafka
>>>>> and we use Ganglia to monitor the cluster. The failures now occur every 12
>>>>> hours, and the following are my observations from some monitoring tools
>>>>> when the problem happens:
>>>>>
>>>>>    1. The topology page shows that no worker went down, since the uptime
>>>>>    of each task is nearly equal to the topology uptime.
>>>>>    2. I've checked Ganglia; the CPU and memory reports do not give any
>>>>>    clue about the problem. But the network report shows something
>>>>>    unusual: the inbound speed decreases a little while the outbound speed
>>>>>    decreases to nearly zero on some workers.
>>>>>    3. I logged in to one of the machines mentioned above and found
>>>>>    that one of the survivor spaces always remains 100% full.
>>>>>    4. dstat shows that csw (context switches) jumps to 4k+ every few
>>>>>    seconds, while it remains around 400 under normal conditions.
>>>>>
>>>>> Can anyone give us a hint about this problem?
>>>>>
>>>>>
>>>>> --
>>>>> Renjie Liu
>>>>> Department of Computer Science & Engineering
>>>>> Shanghai JiaoTong University
>>>>>
>>>>>
>>>>>
>>>>
>>>
>>>
>>>
>>
>>
>
>
>
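
The jstat observation quoted above (minor GCs taking more than 1s each, one survivor space pinned at 100%) can be checked from two or more `jstat -gcutil` samples, since YGC and YGCT are cumulative counters for young-generation collections and the seconds spent in them. The sketch below is one way to do that. The sample numbers are invented for illustration and are not from the affected cluster, and the helper names are hypothetical.

```python
# Illustrative jstat -gcutil output; the values are made up, not measured.
SAMPLE = """
  S0     S1     E      O      P     YGC     YGCT    FGC    FGCT     GCT
  0.00 100.00  35.10  62.40  59.80    120  131.500     3    1.200  132.700
  0.00 100.00  80.25  62.90  59.80    122  133.900     3    1.200  135.100
"""


def parse_gcutil(text):
    """Parse jstat -gcutil text into a list of {column name: raw string} rows.
    Columns are looked up by header name, so extra or renamed columns in other
    JVM versions do not shift the fields used below."""
    lines = [ln.split() for ln in text.strip().splitlines() if ln.strip()]
    header, rows = lines[0], lines[1:]
    return [dict(zip(header, row)) for row in rows]


def avg_minor_gc_pause(samples):
    """Average seconds per young collection between the first and last sample,
    or None if no young collection happened in that window."""
    first, last = samples[0], samples[-1]
    collections = float(last["YGC"]) - float(first["YGC"])
    if collections <= 0:
        return None
    return (float(last["YGCT"]) - float(first["YGCT"])) / collections


def survivor_always_full(samples, threshold=99.0):
    """True if S0 or S1 is at or above `threshold` percent in every sample."""
    return any(
        all(float(row[col]) >= threshold for row in samples)
        for col in ("S0", "S1")
    )


samples = parse_gcutil(SAMPLE)
pause = avg_minor_gc_pause(samples)
```

For the invented sample, two young collections account for 2.4 s of GC time, so the average pause comes out at about 1.2 s and survivor_always_full(samples) is true. Pauses of that size on the young generation alone would be worth comparing against the tuple timeout and the heap sizing of the worker.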
