Thanks Sean. We are using 0.9.2.
Have not tried the issue with 0.9.3. Will try with this and confirm back. Any idea when Storm 0.9.3 will be available to be used in production environments?

Thanks and Regards,
Devang

On 3 Nov 2014 11:38, "Sean Zhong" <[email protected]> wrote:

> Hi Devang,
>
> Which Storm version are you using?
> You may want to check STORM-404 and STORM-329.
>
> Sean
>
> On Mon, Nov 3, 2014 at 9:27 AM, Devang Shah <[email protected]> wrote:
>
>> Thanks much for notifying.
>>
>> Would you know the bug id? I did refer to the change log of 0.9.3 but
>> could not get hold of the bug id. Incidentally I too have raised a JIRA and
>> would like to close it giving reference to the previously raised one.
>> Thanks.
>>
>> On 31 Oct 2014 21:49, "M.Tarkeshwar Rao" <[email protected]> wrote:
>>
>>> Yes, it is the bug which was raised by Denigel. It is fixed in 0.9.3,
>>> please use it. Or use ZeroMQ in place of Netty and your problem will be
>>> resolved.
>>>
>>> On 27 Oct 2014 20:52, "Devang Shah" <[email protected]> wrote:
>>>
>>>> It seems to be a bug in Storm unless someone confirms otherwise.
>>>>
>>>> How can I file a bug for Storm?
>>>>
>>>> On 25 Oct 2014 07:51, "Devang Shah" <[email protected]> wrote:
>>>>
>>>>> You are correct Taylor. Sorry, I missed mentioning all the details.
>>>>>
>>>>> We have topology.spout.max.pending set to 1000 and we have not
>>>>> modified topology.message.timeout.secs (default 30 secs).
>>>>>
>>>>> Another observation:
>>>>> When I deliberately bring down the worker (kill -9) and the worker is
>>>>> brought back up on the same port it was previously running on, Storm
>>>>> starts failing all the messages despite them being successfully processed.
>>>>> If the worker is brought up on a different supervisor port then the issue
>>>>> doesn't seem to occur.
>>>>>
>>>>> Example steps:
>>>>> 1. Worker running on supervisor slot 6703 (this worker runs the single
>>>>> spout instance of our topology) and everything runs fine. Messages get
>>>>> processed and acked back to the message provider. If I let it run in this
>>>>> state it can process any number of messages.
>>>>> 2. I bring down the java process with kill -9.
>>>>> 3. The supervisor brings up the worker on the same slot 6703 and also the
>>>>> spout task instance on it.
>>>>> 4. All the messages get processed fine but the ackers fail all the
>>>>> messages the topology processed after the default 30 secs timeout. This
>>>>> even happens when the topology is idle and I push a single message into
>>>>> the topology, so my guess is that increasing the timeout will not help
>>>>> (though I have not tried it).
>>>>> 5. If the supervisor brings up the worker on a different slot, say 6700,
>>>>> then the issue doesn't seem to occur. Probably a bug in Storm.
>>>>>
>>>>> Steps to simulate the behaviour:
>>>>> 1. Run the topology (spout as a single instance and multiple instances of
>>>>> bolts) with multiple workers.
>>>>> 2. Identify the slot on which the single spout instance is running and
>>>>> kill it.
>>>>> 3. See if the supervisor started the worker on the same port. If not,
>>>>> repeat step 2 until you get the worker on the same slot as the previous
>>>>> one.
>>>>> 4. Pump a message into the topology.
>>>>> 5. You will see the message being processed successfully and also the
>>>>> ackers failing the message. This can be verified by logging statements in
>>>>> the ack and fail methods of the spout.
>>>>>
>>>>> Thanks and Regards,
>>>>> Devang
>>>>>
>>>>> On 25 Oct 2014 04:34, "P. Taylor Goetz" <[email protected]> wrote:
>>>>>
>>>>>> My guess is that you are getting timeouts.
>>>>>>
>>>>>> Do you have topology.spout.max.pending set? If so, what is the value?
>>>>>> Have you overridden topology.message.timeout.secs (the default is 30
>>>>>> seconds)?
>>>>>>
>>>>>> Look in Storm UI for the complete latency of the topology. Is it
>>>>>> close to or greater than topology.message.timeout.secs?
>>>>>>
>>>>>> -Taylor
>>>>>>
>>>>>> On Oct 23, 2014, at 12:44 PM, Devang Shah <[email protected]>
>>>>>> wrote:
>>>>>>
>>>>>> Hi Team,
>>>>>>
>>>>>> I am facing an issue with one of our failover tests. Storm fails all
>>>>>> the messages after worker restarts.
>>>>>>
>>>>>> Steps done:
>>>>>> 0. 1 spout, 3 bolts, 5 ackers
>>>>>> 1. Pre-load tibems with 50k messages
>>>>>> 2. Start the topology
>>>>>> 3. Let it run for a brief time and then kill the worker where the spout
>>>>>> is executing (the spout in our topology is a single instance)
>>>>>> 4. The worker is brought up by the supervisor automatically
>>>>>>
>>>>>> Observation/query:
>>>>>> When the spout starts pumping data into the topology again, Storm
>>>>>> starts failing the messages even though they are successfully processed
>>>>>> (I have verified this as our last bolt pushes data to Kafka and the
>>>>>> incoming/Kafka message count matches). I have checked the tuple anchoring
>>>>>> and that seems to be fine, as without the worker restarts the topology
>>>>>> acks and processes messages fine.
>>>>>>
>>>>>> Anything else I should check?
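For reference, the two settings discussed in the thread (topology.spout.max.pending and topology.message.timeout.secs) are normally applied through Storm's Config object when submitting the topology. The sketch below is illustrative only, assuming Storm 0.9.x package names (backtype.storm.*); the topology name "failover-test-topology" and the buildTopology() helper are placeholders, not part of the thread.

import backtype.storm.Config;
import backtype.storm.StormSubmitter;
import backtype.storm.generated.StormTopology;

public class TopologySettingsSketch {
    public static void main(String[] args) throws Exception {
        Config conf = new Config();

        // Equivalent to topology.spout.max.pending=1000: caps the number of
        // un-acked tuples a spout task will keep in flight at once.
        conf.setMaxSpoutPending(1000);

        // Equivalent to topology.message.timeout.secs (default 30): tuples not
        // fully acked within this window are reported to the spout's fail().
        conf.setMessageTimeoutSecs(30);

        // Matches the setup described in the thread: multiple workers, 5 ackers.
        conf.setNumWorkers(3);
        conf.setNumAckers(5);

        // Placeholder: wire up the real spout and bolts with TopologyBuilder.
        StormTopology topology = buildTopology();
        StormSubmitter.submitTopology("failover-test-topology", conf, topology);
    }

    private static StormTopology buildTopology() {
        // Hypothetical helper, not from the thread.
        throw new UnsupportedOperationException("build your 1-spout / 3-bolt topology here");
    }
}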

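Relatedly, step 5 of the reproduction above relies on logging inside the spout's ack and fail callbacks. A minimal sketch of what that verification spout might look like is below, again assuming Storm 0.9.x (backtype.storm.*); the LoggingSpout class name, the synthetic payload, and the message-ID scheme are made up for illustration and are not from the thread (a real spout would pull from the message provider, e.g. tibems).

import java.util.Map;
import backtype.storm.spout.SpoutOutputCollector;
import backtype.storm.task.TopologyContext;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.base.BaseRichSpout;
import backtype.storm.tuple.Fields;
import backtype.storm.tuple.Values;

// Hypothetical spout used only to illustrate the ack/fail logging from step 5.
public class LoggingSpout extends BaseRichSpout {
    private SpoutOutputCollector collector;
    private long seq = 0;

    @Override
    public void open(Map conf, TopologyContext context, SpoutOutputCollector collector) {
        this.collector = collector;
    }

    @Override
    public void nextTuple() {
        // Emit with an explicit message ID so ack/fail callbacks fire for it.
        String msgId = "msg-" + (seq++);
        collector.emit(new Values("payload-" + msgId), msgId);
    }

    @Override
    public void ack(Object msgId) {
        // Tuple tree completed within topology.message.timeout.secs.
        System.out.println("ACK  " + msgId);
    }

    @Override
    public void fail(Object msgId) {
        // Fired on timeout or explicit fail; compare these IDs against the
        // downstream (Kafka) output to see whether "failed" tuples were in
        // fact processed, as described in the thread.
        System.out.println("FAIL " + msgId);
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("payload"));
    }
}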