Thanks for your help. I will check it over the weekend.

On 5 Nov 2014 15:28, "M.Tarkeshwar Rao" <[email protected]> wrote:
Please confirm whether your issue is resolved or not.

On Wed, Nov 5, 2014 at 12:54 PM, M Tarkeshwar Rao <[email protected]> wrote:

https://issues.apache.org/jira/browse/STORM-406

From: Devang Shah [mailto:[email protected]]
Sent: 04 November 2014 10:23
To: [email protected]
Subject: Re: Storm failing all the tuples post worker restart

Thanks Sean.

We are using 0.9.2.

We have not tried it with 0.9.3. Will try with that and confirm back.

Any idea when Storm 0.9.3 will be available for use in production environments?

Thanks and Regards,
Devang

On 3 Nov 2014 11:38, "Sean Zhong" <[email protected]> wrote:

Hi Devang,

Which Storm version are you using?

You may want to check STORM-404 and STORM-329.

Sean

On Mon, Nov 3, 2014 at 9:27 AM, Devang Shah <[email protected]> wrote:

Thanks much for notifying.

Would you know the bug id? I did refer to the change log of 0.9.3 but could not get hold of the bug id. Incidentally, I too have raised a JIRA and would like to close it with a reference to the previously raised one. Thanks.

On 31 Oct 2014 21:49, "M.Tarkeshwar Rao" <[email protected]> wrote:

Yes, it is the bug raised by Denigel. It is fixed in 0.9.3, please use it. Or use ZeroMQ in place of Netty and your problem will be resolved.

On 27 Oct 2014 20:52, "Devang Shah" <[email protected]> wrote:

It seems to be a bug in Storm unless someone confirms otherwise.

How can I file a bug for Storm?

On 25 Oct 2014 07:51, "Devang Shah" <[email protected]> wrote:

You are correct, Taylor. Sorry, I missed mentioning all the details.

We have topology.max.spout.pending set to 1000 and we have not modified topology.message.timeout.secs (default 30 secs).

Another observation: when I deliberately bring down the worker (kill -9) and the worker is brought back up on the same port it was previously running on, Storm starts failing all the messages despite them being successfully processed. If the worker is brought up on a different supervisor port, the issue doesn't seem to occur.

Example steps:
1. Worker running on supervisor slot 6703 (this worker runs the single spout instance of our topology) and everything runs fine. Messages get processed and acked back to the message provider. If I let it run in this state it can process any number of messages.
2. I bring down the Java process with kill -9.
3. The supervisor brings the worker back up on the same slot, 6703, and the spout task instance along with it.
4. All the messages get processed fine, but the ackers fail every message the topology processed, after the default 30 sec timeout. This happens even when the topology is idle and I push a single message into it, so my guess is that increasing the timeout will not help (though I have not tried it).
5. If the supervisor brings the worker up on a different slot, say 6700, the issue doesn't seem to occur. Probably a bug in Storm.

Steps to simulate the behaviour:
1. Run the topology (spout as a single instance, multiple instances of the bolts) with multiple workers.
2. Identify the slot on which the single spout instance is running and kill it.
3. Check whether the supervisor restarted the worker on the same port. If not, repeat step 2 until you get a worker on the same slot as before.
4. Pump a message into the topology.
5. You will see the message being processed successfully and, at the same time, the ackers failing it. This can be verified with logging statements in the ack and fail methods of the spout, as in the sketch below.

Thanks and Regards,
Devang
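A minimal sketch of what that spout-side logging could look like, assuming a hypothetical LoggingEmsSpout standing in for the real EMS-reading spout (the pollEms() stub and the "msg" field are placeholders, not the actual implementation):

    import java.util.Map;
    import java.util.UUID;
    import java.util.concurrent.ConcurrentHashMap;

    import backtype.storm.spout.SpoutOutputCollector;
    import backtype.storm.task.TopologyContext;
    import backtype.storm.topology.OutputFieldsDeclarer;
    import backtype.storm.topology.base.BaseRichSpout;
    import backtype.storm.tuple.Fields;
    import backtype.storm.tuple.Values;

    // Hypothetical spout skeleton; the real spout reads from TIBCO EMS.
    public class LoggingEmsSpout extends BaseRichSpout {
        private SpoutOutputCollector collector;
        private final Map<Object, String> pending = new ConcurrentHashMap<Object, String>();

        @Override
        public void open(Map conf, TopologyContext context, SpoutOutputCollector collector) {
            this.collector = collector;
        }

        @Override
        public void nextTuple() {
            String msg = pollEms();                      // placeholder for the EMS read
            if (msg != null) {
                Object msgId = UUID.randomUUID().toString();
                pending.put(msgId, msg);
                collector.emit(new Values(msg), msgId);  // emit with a message id so ack/fail get called
            }
        }

        @Override
        public void ack(Object msgId) {
            System.out.println("ACK  " + msgId);         // expected path once the tuple tree completes
            pending.remove(msgId);
        }

        @Override
        public void fail(Object msgId) {
            System.out.println("FAIL " + msgId);         // fires ~30s after emit when the problem is reproduced
            // a real spout would redeliver or NACK the EMS message here
        }

        @Override
        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            declarer.declare(new Fields("msg"));
        }

        private String pollEms() { return null; }        // stub so the sketch compiles
    }

With this in place, a single message pushed into an idle topology should print exactly one ACK line in the healthy case; in the failure described above it prints the FAIL line roughly 30 seconds after the emit even though the downstream bolts completed.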
On 25 Oct 2014 04:34, "P. Taylor Goetz" <[email protected]> wrote:

My guess is that you are getting timeouts.

Do you have topology.max.spout.pending set? If so, what is the value?

Have you overridden topology.message.timeout.secs (the default is 30 seconds)?

Look in the Storm UI for the complete latency of the topology. Is it close to or greater than topology.message.timeout.secs?

-Taylor

On Oct 23, 2014, at 12:44 PM, Devang Shah <[email protected]> wrote:

Hi Team,

I am facing an issue with one of our failover tests: Storm fails all the messages after a worker restart.

Steps done:
0. 1 spout, 3 bolts, 5 ackers (a config sketch with these values appears at the end of this thread)
1. Pre-load tibems with 50k messages
2. Start the topology
3. Let it run for a brief time and then kill the worker where the spout is executing (the spout in our topology is a single instance)
4. The worker is brought back up by the supervisor automatically

Observation/query:
When the spout starts pumping data into the topology again, Storm starts failing the messages even though they are successfully processed (I have verified this, as our last bolt pushes data to Kafka and the incoming and Kafka message counts match). I have checked the tuple anchoring (see the bolt sketch below) and that seems to be fine, since without the worker restart the topology acks and processes messages fine.

Anything else I should check?
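For the anchoring mentioned above, a sketch of the usual anchored-emit-plus-ack pattern in a bolt; AnchoredForwardBolt is a hypothetical stand-in for the real bolts (the actual last bolt writes to Kafka instead of forwarding):

    import java.util.Map;

    import backtype.storm.task.OutputCollector;
    import backtype.storm.task.TopologyContext;
    import backtype.storm.topology.OutputFieldsDeclarer;
    import backtype.storm.topology.base.BaseRichBolt;
    import backtype.storm.tuple.Fields;
    import backtype.storm.tuple.Tuple;
    import backtype.storm.tuple.Values;

    // Hypothetical bolt showing an anchored emit plus an explicit ack,
    // which is what the ackers need to complete the tuple tree.
    public class AnchoredForwardBolt extends BaseRichBolt {
        private OutputCollector collector;

        @Override
        public void prepare(Map conf, TopologyContext context, OutputCollector collector) {
            this.collector = collector;
        }

        @Override
        public void execute(Tuple input) {
            String msg = input.getStringByField("msg");
            collector.emit(input, new Values(msg));  // anchored to the input tuple
            collector.ack(input);                    // without this ack the tuple times out and fails
        }

        @Override
        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            declarer.declare(new Fields("msg"));
        }
    }

If any bolt in the chain misses either the anchor or the ack, the tuple tree never completes and the spout sees a fail after topology.message.timeout.secs, so this is worth ruling out before blaming the worker restart.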

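Finally, a sketch of how a test topology with the settings discussed in this thread might be wired up and submitted, reusing the hypothetical LoggingEmsSpout and AnchoredForwardBolt from the sketches above; the worker count of 4 and the single-bolt-with-parallelism-3 layout are assumptions, the remaining values mirror the ones mentioned in the mails:

    import backtype.storm.Config;
    import backtype.storm.StormSubmitter;
    import backtype.storm.topology.TopologyBuilder;

    public class FailoverTestTopology {
        public static void main(String[] args) throws Exception {
            TopologyBuilder builder = new TopologyBuilder();
            // single spout instance, as in the failover test
            builder.setSpout("ems-spout", new LoggingEmsSpout(), 1);
            // one bolt with parallelism 3 stands in for the three bolts of the real topology
            builder.setBolt("forward-bolt", new AnchoredForwardBolt(), 3)
                   .shuffleGrouping("ems-spout");

            Config conf = new Config();
            conf.setNumWorkers(4);            // multiple workers, so the spout lands on its own slot
            conf.setNumAckers(5);             // 5 ackers, as in the original setup
            conf.setMaxSpoutPending(1000);    // topology.max.spout.pending = 1000
            conf.setMessageTimeoutSecs(30);   // topology.message.timeout.secs left at the default

            StormSubmitter.submitTopology("failover-test", conf, builder.createTopology());
        }
    }

Once submitted, the complete latency shown in the Storm UI can be compared against topology.message.timeout.secs, as Taylor suggests above.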