Hi Srinath,

Thanks for the update.  I'm not quite sure why your change would remedy
the problem (if anything, it seems like there are now more tuples in flight
in the system), but it's great that you have a working setup.

Michael


On Tue, Apr 15, 2014 at 11:33 PM, Srinath C <[email protected]> wrote:

> Hi Michael,
>     I experimented a bit by making changes to my topology, and now I'm
> seeing consistent acking with very few failures on the spout.
>
>     My topology had a spout S emitting tuples and two BaseRichBolts, B1
> (stores tuples) and B2 (aggregates tuples), both receiving tuples from the
> default stream of the spout. I changed S to emit each tuple twice on two
> different streams: one stream with a message Id for reliable delivery,
> consumed by B1, and another stream without a message Id, consumed by B2.
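>
>     For reference, this is roughly what the split looks like in the spout.
> It's only a sketch: the class, stream, and field names are made up, and the
> in-memory queue stands in for the real source of events.
>
> import java.util.Map;
> import java.util.UUID;
> import java.util.concurrent.ConcurrentLinkedQueue;
> import backtype.storm.spout.SpoutOutputCollector;
> import backtype.storm.task.TopologyContext;
> import backtype.storm.topology.OutputFieldsDeclarer;
> import backtype.storm.topology.base.BaseRichSpout;
> import backtype.storm.tuple.Fields;
> import backtype.storm.tuple.Values;
>
> public class SplitStreamSpout extends BaseRichSpout {
>     private SpoutOutputCollector collector;
>     // hypothetical in-memory source of events
>     private final ConcurrentLinkedQueue<String> queue = new ConcurrentLinkedQueue<String>();
>
>     @Override
>     public void open(Map conf, TopologyContext context, SpoutOutputCollector collector) {
>         this.collector = collector;
>     }
>
>     @Override
>     public void declareOutputFields(OutputFieldsDeclarer declarer) {
>         declarer.declareStream("reliable", new Fields("event"));    // consumed by B1 (store)
>         declarer.declareStream("unreliable", new Fields("event"));  // consumed by B2 (aggregate)
>     }
>
>     @Override
>     public void nextTuple() {
>         String event = queue.poll();
>         if (event == null) return;
>         // anchored emit: Storm tracks the tuple and calls ack()/fail() on the spout
>         collector.emit("reliable", new Values(event), UUID.randomUUID().toString());
>         // unanchored emit: no message Id, so this copy is never tracked or failed
>         collector.emit("unreliable", new Values(event));
>     }
> }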
>
>    With this change there is a significant improvement in the number of
> failed tuples. It's down to about 1-2% of the total now, and even those
> failures occurred only at peak tuple rates.
>
>    I'd like to run more experiments to figure out what was wrong with my
> earlier topology, but I'm time-constrained right now.
>    Hope this helps, and let me know if you figure out anything.
>
> Regards,
> Srinath.
>
>
>
> On Tue, Apr 15, 2014 at 9:33 PM, Michael Chang <[email protected]> wrote:
>
>> Hey Srinath,
>>
>> Yep, our ackers don't seem overloaded at all, and the behavior you are
>> seeing sounds exactly like what we are seeing here.
>>
>>
>> On Tue, Apr 15, 2014 at 6:47 AM, Srinath C <[email protected]> wrote:
>>
>>> I have been seeing this behaviour on 0.9.0.1 running on AWS (non-VPC).
>>> All tuples get a fail() on the spout and I'm not sure why. Even a simple
>>> topology of spoutA -> boltB shows this behaviour after a continuous
>>> flow of tuples.
>>>
>>> So far increasing the acker count hasn't helped. All I could figure out
>>> is that fail() is called from backtype.storm.utils.RotatingMap#rotate,
>>> which I believe means that topology.message.timeout.secs was exceeded
>>> before the tuple was marked as completed. I'm pretty sure there are no
>>> exceptions while handling the tuples.
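>>>
>>> If it really is the timeout kicking in, the knobs to experiment with are
>>> the message timeout, the spout-pending cap, and the acker count. A rough
>>> sketch of setting them at submit time (the numbers and the topology name
>>> are arbitrary, and the builder wiring is elided):
>>>
>>> import backtype.storm.Config;
>>> import backtype.storm.StormSubmitter;
>>> import backtype.storm.topology.TopologyBuilder;
>>>
>>> public class SubmitWithTimeouts {
>>>     public static void main(String[] args) throws Exception {
>>>         TopologyBuilder builder = new TopologyBuilder();
>>>         // builder.setSpout(...) / builder.setBolt(...) as in the real topology
>>>
>>>         Config conf = new Config();
>>>         conf.setMessageTimeoutSecs(120);  // topology.message.timeout.secs (default 30)
>>>         conf.setMaxSpoutPending(1000);    // cap on un-acked tuples per spout task
>>>         conf.setNumAckers(4);             // topology.acker.executors
>>>
>>>         StormSubmitter.submitTopology("timeout-test", conf, builder.createTopology());
>>>     }
>>> }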
>>>
>>> Will update if I find any insights.
>>>
>>>
>>>
>>> On Tue, Apr 15, 2014 at 3:07 PM, 朱春来 <[email protected]> wrote:
>>>
>>>> Hi Michael Chang,
>>>>
>>>> Did you ack or fail the tuples in the bolt in a timely manner? Please
>>>> also check how long the bolt takes to process each tuple.
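>>>>
>>>> For example, a BaseRichBolt has to ack or fail each tuple explicitly in
>>>> execute(); roughly like this (the class name and the process() call are
>>>> only placeholders):
>>>>
>>>> import java.util.Map;
>>>> import backtype.storm.task.OutputCollector;
>>>> import backtype.storm.task.TopologyContext;
>>>> import backtype.storm.topology.OutputFieldsDeclarer;
>>>> import backtype.storm.topology.base.BaseRichBolt;
>>>> import backtype.storm.tuple.Tuple;
>>>>
>>>> public class StoreBolt extends BaseRichBolt {
>>>>     private OutputCollector collector;
>>>>
>>>>     @Override
>>>>     public void prepare(Map conf, TopologyContext context, OutputCollector collector) {
>>>>         this.collector = collector;
>>>>     }
>>>>
>>>>     @Override
>>>>     public void execute(Tuple tuple) {
>>>>         try {
>>>>             process(tuple);          // placeholder for the real work; keep it fast
>>>>             collector.ack(tuple);    // without this the tuple times out and fails at the spout
>>>>         } catch (Exception e) {
>>>>             collector.fail(tuple);   // fail promptly instead of waiting for the timeout
>>>>         }
>>>>     }
>>>>
>>>>     private void process(Tuple tuple) { /* hypothetical processing */ }
>>>>
>>>>     @Override
>>>>     public void declareOutputFields(OutputFieldsDeclarer declarer) { }
>>>> }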
>>>>
>>>> *From:* Michael Chang [mailto:[email protected]]
>>>> *Sent:* April 15, 2014 16:41
>>>> *To:* [email protected]
>>>> *Subject:* Storm Topology Halts
>>>>
>>>> Hi all,
>>>>
>>>> Issue:
>>>>
>>>> We are having issues with stuck topologies.  When submitted and
>>>> started, our topology will process for a while, then completely halt for
>>>> roughly topology.message.timeout.secs, after which it seems that all the
>>>> in-flight tuples are failed.  This cycle repeats continuously.  Has
>>>> anybody seen this issue, or have suggestions on how to debug it?
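>>>>
>>>> One experiment we could run to isolate this (a sketch with placeholder
>>>> names, not something we have tried yet): resubmit the same topology with
>>>> acking disabled and see whether the halts disappear. With zero ackers,
>>>> tuples are treated as acked immediately, so the timeout/fail cycle can't
>>>> occur, and any remaining stall would point away from the reliability path.
>>>>
>>>> import backtype.storm.Config;
>>>> import backtype.storm.StormSubmitter;
>>>> import backtype.storm.topology.TopologyBuilder;
>>>>
>>>> public class HaltDebug {
>>>>     public static void main(String[] args) throws Exception {
>>>>         TopologyBuilder builder = new TopologyBuilder();
>>>>         // builder.setSpout(...) / builder.setBolt(...) as in the real topology
>>>>
>>>>         Config conf = new Config();
>>>>         conf.setNumAckers(0);   // no acker tasks: tuples count as acked right away
>>>>         conf.setDebug(true);    // log emits/acks from the workers while testing
>>>>
>>>>         StormSubmitter.submitTopology("halt-debug", conf, builder.createTopology());
>>>>     }
>>>> }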
>>>>
>>>> Environment:
>>>>
>>>> We are running a Storm cluster in AWS, non-VPC.  We're running Storm
>>>> 0.9.1 but using guava 16.0.1 and httpclient 4.3.1 in the lib path.  We
>>>> originally tried this with the regular netty transport; reverting back to
>>>> the zmq transport seemed to help at first, but now we're seeing the same
>>>> behavior there as well, so the problem seems rooted deeper than the
>>>> transport.
>>>>
>>>> Any help would be appreciated.
>>>>
>>>> Thanks,
>>>>
>>>> Michael
>>>>
>>>
>>>
>>
>
