Thanks for your help. I will check it over the weekend.

On 5 Nov 2014 15:28, "M.Tarkeshwar Rao" <[email protected]> wrote:
Please confirm whether your issue is resolved or not.

On Wed, Nov 5, 2014 at 12:54 PM, M Tarkeshwar Rao <[email protected]> wrote:

https://issues.apache.org/jira/browse/STORM-406

From: Devang Shah [mailto:[email protected]]
Sent: 04 November 2014 10:23
To: [email protected]
Subject: Re: Storm failing all the tuples post worker restart

Thanks Sean.

We are using 0.9.2.

We have not tried it with 0.9.3. Will try with that and confirm back.

Any idea when Storm 0.9.3 will be available for use in production environments?

Thanks and Regards,
Devang

On 3 Nov 2014 11:38, "Sean Zhong" <[email protected]> wrote:

Hi Devang,

Which Storm version are you using?

You may want to check STORM-404 and STORM-329.

Sean

On Mon, Nov 3, 2014 at 9:27 AM, Devang Shah <[email protected]> wrote:

Thanks much for notifying.

Would you know the bug id? I did refer to the change log of 0.9.3 but could not get hold of the bug id. Incidentally, I too have raised a JIRA and would like to close it with a reference to the previously raised one. Thanks.

On 31 Oct 2014 21:49, "M.Tarkeshwar Rao" <[email protected]> wrote:

Yes, it is the bug raised by Denigel. It is fixed in 0.9.3, please use it. Or use ZeroMQ in place of Netty and your problem will be resolved.

On 27 Oct 2014 20:52, "Devang Shah" <[email protected]> wrote:

It seems to be a bug in Storm unless someone confirms otherwise.

How can I file a bug for Storm?

On 25 Oct 2014 07:51, "Devang Shah" <[email protected]> wrote:

You are correct, Taylor. Sorry, I missed mentioning all the details.

We have topology.max.spout.pending set to 1000 and we have not modified topology.message.timeout.secs (default 30 secs).

Another observation: when I deliberately bring down the worker (kill -9) and the worker is brought back up on the same port it was previously running on, Storm starts failing all the messages despite them being successfully processed. If the worker is brought up on a different supervisor port, the issue doesn't seem to occur.

Example steps:
1. Worker running on supervisor slot 6703 (this worker runs the single spout instance of our topology) and everything runs fine. Messages get processed and acked back to the message provider. If I let it run in this state it can process any number of messages.
2. I bring down the Java process with kill -9.
3. The supervisor brings the worker back up on the same slot, 6703, and the spout task instance along with it.
4. All the messages get processed fine, but the ackers fail every message the topology processed, after the default 30 sec timeout. This happens even when the topology is idle and I push a single message into it, so my guess is that increasing the timeout will not help (though I have not tried it).
5. If the supervisor brings the worker up on a different slot, say 6700, the issue doesn't seem to occur. Probably a bug in Storm.

Steps to simulate the behaviour:
1. Run the topology (spout as a single instance, multiple instances of the bolts) with multiple workers.
2. Identify the slot on which the single spout instance is running and kill it.
3. Check whether the supervisor restarted the worker on the same port. If not, repeat step 2 until you get a worker on the same slot as before.
4. Pump a message into the topology.
5. You will see the message being processed successfully and, at the same time, the ackers failing it. This can be verified with logging statements in the ack and fail methods of the spout, as in the sketch below.

Thanks and Regards,
Devang
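A minimal sketch of what that spout-side logging could look like, assuming a hypothetical LoggingEmsSpout standing in for the real EMS-reading spout (the pollEms() stub and the "msg" field are placeholders, not the actual implementation):

    import java.util.Map;
    import java.util.UUID;
    import java.util.concurrent.ConcurrentHashMap;

    import backtype.storm.spout.SpoutOutputCollector;
    import backtype.storm.task.TopologyContext;
    import backtype.storm.topology.OutputFieldsDeclarer;
    import backtype.storm.topology.base.BaseRichSpout;
    import backtype.storm.tuple.Fields;
    import backtype.storm.tuple.Values;

    // Hypothetical spout skeleton; the real spout reads from TIBCO EMS.
    public class LoggingEmsSpout extends BaseRichSpout {
        private SpoutOutputCollector collector;
        private final Map<Object, String> pending = new ConcurrentHashMap<Object, String>();

        @Override
        public void open(Map conf, TopologyContext context, SpoutOutputCollector collector) {
            this.collector = collector;
        }

        @Override
        public void nextTuple() {
            String msg = pollEms();                      // placeholder for the EMS read
            if (msg != null) {
                Object msgId = UUID.randomUUID().toString();
                pending.put(msgId, msg);
                collector.emit(new Values(msg), msgId);  // emit with a message id so ack/fail get called
            }
        }

        @Override
        public void ack(Object msgId) {
            System.out.println("ACK  " + msgId);         // expected path once the tuple tree completes
            pending.remove(msgId);
        }

        @Override
        public void fail(Object msgId) {
            System.out.println("FAIL " + msgId);         // fires ~30s after emit when the problem is reproduced
            // a real spout would redeliver or NACK the EMS message here
        }

        @Override
        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            declarer.declare(new Fields("msg"));
        }

        private String pollEms() { return null; }        // stub so the sketch compiles
    }

With this in place, a single message pushed into an idle topology should print exactly one ACK line in the healthy case; in the failure described above it prints the FAIL line roughly 30 seconds after the emit even though the downstream bolts completed.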
On 25 Oct 2014 04:34, "P. Taylor Goetz" <[email protected]> wrote:

My guess is that you are getting timeouts.

Do you have topology.max.spout.pending set? If so, what is the value?

Have you overridden topology.message.timeout.secs (the default is 30 seconds)?

Look in the Storm UI for the complete latency of the topology. Is it close to or greater than topology.message.timeout.secs?

-Taylor

On Oct 23, 2014, at 12:44 PM, Devang Shah <[email protected]> wrote:

Hi Team,

I am facing an issue with one of our failover tests: Storm fails all the messages after a worker restart.

Steps done:
0. 1 spout, 3 bolts, 5 ackers (a config sketch with these values appears at the end of this thread)
1. Pre-load tibems with 50k messages
2. Start the topology
3. Let it run for a brief time and then kill the worker where the spout is executing (the spout in our topology is a single instance)
4. The worker is brought back up by the supervisor automatically

Observation/query:
When the spout starts pumping data into the topology again, Storm starts failing the messages even though they are successfully processed (I have verified this, as our last bolt pushes data to Kafka and the incoming and Kafka message counts match). I have checked the tuple anchoring (see the bolt sketch below) and that seems to be fine, since without the worker restart the topology acks and processes messages fine.

Anything else I should check?
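For the anchoring mentioned above, a sketch of the usual anchored-emit-plus-ack pattern in a bolt; AnchoredForwardBolt is a hypothetical stand-in for the real bolts (the actual last bolt writes to Kafka instead of forwarding):

    import java.util.Map;

    import backtype.storm.task.OutputCollector;
    import backtype.storm.task.TopologyContext;
    import backtype.storm.topology.OutputFieldsDeclarer;
    import backtype.storm.topology.base.BaseRichBolt;
    import backtype.storm.tuple.Fields;
    import backtype.storm.tuple.Tuple;
    import backtype.storm.tuple.Values;

    // Hypothetical bolt showing an anchored emit plus an explicit ack,
    // which is what the ackers need to complete the tuple tree.
    public class AnchoredForwardBolt extends BaseRichBolt {
        private OutputCollector collector;

        @Override
        public void prepare(Map conf, TopologyContext context, OutputCollector collector) {
            this.collector = collector;
        }

        @Override
        public void execute(Tuple input) {
            String msg = input.getStringByField("msg");
            collector.emit(input, new Values(msg));  // anchored to the input tuple
            collector.ack(input);                    // without this ack the tuple times out and fails
        }

        @Override
        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            declarer.declare(new Fields("msg"));
        }
    }

If any bolt in the chain misses either the anchor or the ack, the tuple tree never completes and the spout sees a fail after topology.message.timeout.secs, so this is worth ruling out before blaming the worker restart.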

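Finally, a sketch of how a test topology with the settings discussed in this thread might be wired up and submitted, reusing the hypothetical LoggingEmsSpout and AnchoredForwardBolt from the sketches above; the worker count of 4 and the single-bolt-with-parallelism-3 layout are assumptions, the remaining values mirror the ones mentioned in the mails:

    import backtype.storm.Config;
    import backtype.storm.StormSubmitter;
    import backtype.storm.topology.TopologyBuilder;

    public class FailoverTestTopology {
        public static void main(String[] args) throws Exception {
            TopologyBuilder builder = new TopologyBuilder();
            // single spout instance, as in the failover test
            builder.setSpout("ems-spout", new LoggingEmsSpout(), 1);
            // one bolt with parallelism 3 stands in for the three bolts of the real topology
            builder.setBolt("forward-bolt", new AnchoredForwardBolt(), 3)
                   .shuffleGrouping("ems-spout");

            Config conf = new Config();
            conf.setNumWorkers(4);            // multiple workers, so the spout lands on its own slot
            conf.setNumAckers(5);             // 5 ackers, as in the original setup
            conf.setMaxSpoutPending(1000);    // topology.max.spout.pending = 1000
            conf.setMessageTimeoutSecs(30);   // topology.message.timeout.secs left at the default

            StormSubmitter.submitTopology("failover-test", conf, builder.createTopology());
        }
    }

Once submitted, the complete latency shown in the Storm UI can be compared against topology.message.timeout.secs, as Taylor suggests above.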