Yes, it is the bug raised by Denigel; it was fixed in 0.9.3, so please use that release. Alternatively, use ZeroMQ in place of Netty and your problem will be resolved.
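For reference, a minimal sketch of how the transport is selected in topology
configuration, assuming the Storm 0.9.x Java API; the ZeroMQ transport class
name below is an assumption based on the pre-Netty default, and it requires
the native zeromq/jzmq libraries to be installed on every node:

    import backtype.storm.Config;

    public class TransportConfig {
        // Sketch: build a Config that selects the ZeroMQ messaging
        // transport instead of the default Netty one (Storm 0.9.x).
        public static Config withZmqTransport() {
            Config conf = new Config();
            // Assumed class name of the ZeroMQ transport plugin.
            conf.put(Config.STORM_MESSAGING_TRANSPORT,
                     "backtype.storm.messaging.zmq");
            return conf;
        }
    }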
On 27 Oct 2014 20:52, "Devang Shah" <[email protected]> wrote:

> It seems to be a bug in Storm, unless someone confirms otherwise.
>
> How can I file a bug for Storm?
>
> On 25 Oct 2014 07:51, "Devang Shah" <[email protected]> wrote:
>
>> You are correct, Taylor. Sorry, I missed mentioning all the details.
>>
>> We have topology.max.spout.pending set to 1000 and we have not modified
>> topology.message.timeout.secs (default 30 secs).
>>
>> Another observation:
>> When I deliberately bring down the worker (kill -9) and the worker is
>> brought back up on the same port it was previously running on, Storm
>> starts failing all the messages despite them being successfully
>> processed. If the worker is brought up on a different supervisor port,
>> the issue doesn't seem to occur.
>>
>> Example steps:
>> 1. Worker running on supervisor slot 6703 (this worker runs the single
>> spout instance of our topology) and everything runs fine. Messages get
>> processed and acked back to the message provider. If I let it run in
>> this state it can process any number of messages.
>> 2. I bring down the Java process with kill -9.
>> 3. The supervisor brings the worker back up on the same slot, 6703, and
>> the spout task instance with it.
>> 4. All the messages get processed fine, but the ackers fail all the
>> messages the topology processed after the default 30-sec timeout. This
>> even happens when the topology is idle and I push a single message into
>> it, so my guess is that increasing the timeout will not help (though I
>> have not tried it).
>> 5. If the supervisor brings the worker up on a different slot, say 6700,
>> the issue doesn't seem to occur. Probably a bug in Storm.
>>
>> Steps to simulate the behaviour (see the spout sketch at the end of the
>> thread for the ack/fail logging used in step 5):
>> 1. Run the topology (spout as a single instance and multiple instances
>> of bolts) with multiple workers.
>> 2. Identify the slot on which the single spout instance is running and
>> kill it.
>> 3. Check whether the supervisor started the worker on the same port. If
>> not, repeat step 2 until the supervisor uses the same slot as before.
>> 4. Pump a message into the topology.
>> 5. You will see the message being processed successfully and also the
>> ackers failing it. This can be verified by logging statements in the
>> ack and fail methods of the spout.
>>
>> Thanks and Regards,
>> Devang
>>
>> On 25 Oct 2014 04:34, "P. Taylor Goetz" <[email protected]> wrote:
>>
>>> My guess is that you are getting timeouts.
>>>
>>> Do you have topology.max.spout.pending set? If so, what is the value?
>>> Have you overridden topology.message.timeout.secs (the default is 30
>>> seconds)?
>>>
>>> Look in Storm UI for the complete latency of the topology. Is it close
>>> to or greater than topology.message.timeout.secs?
>>>
>>> -Taylor
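For reference, a minimal sketch of where the two settings Taylor mentions
are applied when submitting a topology (Storm 0.9.x Java API; the topology
name, acker count, and builder wiring are placeholders):

    import backtype.storm.Config;
    import backtype.storm.StormSubmitter;
    import backtype.storm.topology.TopologyBuilder;

    public class SubmitWithTimeouts {
        public static void main(String[] args) throws Exception {
            TopologyBuilder builder = new TopologyBuilder();
            // ... wire up the spout and bolts here ...

            Config conf = new Config();
            conf.setMaxSpoutPending(1000);   // topology.max.spout.pending
            conf.setMessageTimeoutSecs(30);  // topology.message.timeout.secs
            conf.setNumAckers(5);            // topology.acker.executors

            StormSubmitter.submitTopology("failover-test", conf,
                                          builder.createTopology());
        }
    }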
>>> On Oct 23, 2014, at 12:44 PM, Devang Shah <[email protected]>
>>> wrote:
>>>
>>> Hi Team,
>>>
>>> I am facing an issue with one of our failover tests. Storm fails all
>>> the messages after the worker restarts.
>>>
>>> Steps done:
>>> 0. 1 spout, 3 bolts, 5 ackers
>>> 1. Pre-load tibems with 50k messages
>>> 2. Start the topology
>>> 3. Let it run for a brief time, then kill the worker where the spout
>>> is executing (the spout in our topology is a single instance)
>>> 4. The worker is brought up by the supervisor automatically
>>>
>>> Observation/query:
>>> When the spout starts pumping data into the topology again, Storm
>>> starts failing the messages even though they are successfully
>>> processed (I have verified this, as our last bolt pushes data to Kafka
>>> and the incoming/Kafka message numbers match). I have checked the
>>> tuple anchoring and that seems to be fine, as without the worker
>>> restart the topology acks and processes messages fine.
>>>
>>> Anything I should check again?
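A minimal sketch of the ack/fail logging described in step 5 of the
reproduction steps above, assuming a BaseRichSpout subclass (the class name
and payload are illustrative, not the poster's actual spout):

    import java.util.Map;

    import org.slf4j.Logger;
    import org.slf4j.LoggerFactory;

    import backtype.storm.spout.SpoutOutputCollector;
    import backtype.storm.task.TopologyContext;
    import backtype.storm.topology.OutputFieldsDeclarer;
    import backtype.storm.topology.base.BaseRichSpout;
    import backtype.storm.tuple.Fields;
    import backtype.storm.tuple.Values;

    // Emits tuples with explicit message ids so the ack/fail callbacks
    // fire, and logs both, making the "processed successfully but failed
    // by the ackers" symptom visible in the worker logs.
    public class LoggingSpout extends BaseRichSpout {
        private static final Logger LOG =
                LoggerFactory.getLogger(LoggingSpout.class);
        private SpoutOutputCollector collector;
        private long msgId = 0;

        @Override
        public void open(Map conf, TopologyContext context,
                         SpoutOutputCollector collector) {
            this.collector = collector;
        }

        @Override
        public void nextTuple() {
            // Anchoring with a message id enables reliable processing.
            collector.emit(new Values("payload-" + msgId), msgId);
            msgId++;
        }

        @Override
        public void ack(Object id) {
            LOG.info("ACKED message id {}", id);
        }

        @Override
        public void fail(Object id) {
            LOG.warn("FAILED message id {}", id);
        }

        @Override
        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            declarer.declare(new Fields("payload"));
        }
    }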
