The issue is https://issues.apache.org/jira/browse/STORM-406
On Tue, Nov 4, 2014 at 10:22 AM, Devang Shah <[email protected]> wrote:

> Thanks Sean.
>
> We are using 0.9.2.
>
> We have not tried reproducing the issue with 0.9.3. We will try it and confirm back.
>
> Any idea when Storm 0.9.3 will be available for use in production environments?
>
> Thanks and Regards,
> Devang
>
> On 3 Nov 2014 11:38, "Sean Zhong" <[email protected]> wrote:
>
>> Hi Devang,
>>
>> Which Storm version are you using?
>> You may want to check STORM-404 and STORM-329.
>>
>> Sean
>>
>> On Mon, Nov 3, 2014 at 9:27 AM, Devang Shah <[email protected]> wrote:
>>
>>> Thanks much for notifying.
>>>
>>> Would you know the bug id? I did refer to the change log of 0.9.3 but could not get hold of the bug id. Incidentally, I too have raised a JIRA and would like to close it with a reference to the previously raised one. Thanks.
>>>
>>> On 31 Oct 2014 21:49, "M.Tarkeshwar Rao" <[email protected]> wrote:
>>>
>>>> Yes, it is the bug raised by Denigel. It is fixed in 0.9.3; please use that. Alternatively, use ZeroMQ in place of Netty and your problem will be resolved.
>>>>
>>>> On 27 Oct 2014 20:52, "Devang Shah" <[email protected]> wrote:
>>>>
>>>>> It seems to be a bug in Storm unless someone confirms otherwise.
>>>>>
>>>>> How can I file a bug for Storm?
>>>>>
>>>>> On 25 Oct 2014 07:51, "Devang Shah" <[email protected]> wrote:
>>>>>
>>>>>> You are correct, Taylor. Sorry, I missed mentioning all the details.
>>>>>>
>>>>>> We have topology.max.spout.pending set to 1000 and we have not modified topology.message.timeout.secs (default 30 secs).
>>>>>>
>>>>>> Another observation:
>>>>>> When I deliberately bring down the worker (kill -9) and the worker is brought back up on the same port it was previously running on, Storm starts failing all the messages despite them being processed successfully. If the worker is brought up on a different supervisor port, the issue doesn't seem to occur.
>>>>>>
>>>>>> Example steps:
>>>>>> 1. A worker is running on supervisor slot 6703 (this worker runs the single spout instance of our topology) and everything runs fine. Messages get processed and acked back to the message provider. If I let it run in this state it can process any number of messages.
>>>>>> 2. I bring down the Java process with kill -9.
>>>>>> 3. The supervisor brings the worker up on the same slot, 6703, along with the spout task instance.
>>>>>> 4. All the messages get processed fine, but the ackers fail every message the topology processed after the default 30-second timeout. This happens even when the topology is idle and I push a single message into it, so my guess is that increasing the timeout will not help (though I have not tried it).
>>>>>> 5. If the supervisor brings the worker up on a different slot, say 6700, the issue doesn't seem to occur. Probably a bug in Storm.
>>>>>>
>>>>>> Steps to simulate the behaviour:
>>>>>> 1. Run the topology (a single spout instance and multiple instances of the bolts) with multiple workers.
>>>>>> 2. Identify the slot on which the single spout instance is running and kill it.
>>>>>> 3. Check whether the supervisor started the worker on the same port. If not, repeat step 2 until the worker comes up on the same slot as before.
>>>>>> 4. Pump a message into the topology.
>>>>>> 5. You will see the message being processed successfully and also the ackers failing it. This can be verified by logging statements in the ack and fail methods of the spout.
>>>>>>
>>>>>> Thanks and Regards,
>>>>>> Devang
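[Editorial note: a minimal sketch of what the ack/fail logging mentioned above might look like, assuming a Storm 0.9.x spout extending BaseRichSpout. The class name, payload, and message-id scheme are illustrative, not the poster's actual code.]

    import java.util.Map;
    import java.util.UUID;

    import backtype.storm.spout.SpoutOutputCollector;
    import backtype.storm.task.TopologyContext;
    import backtype.storm.topology.OutputFieldsDeclarer;
    import backtype.storm.topology.base.BaseRichSpout;
    import backtype.storm.tuple.Fields;
    import backtype.storm.tuple.Values;

    public class LoggingSpout extends BaseRichSpout {
        private SpoutOutputCollector collector;

        @Override
        public void open(Map conf, TopologyContext context, SpoutOutputCollector collector) {
            this.collector = collector;
        }

        @Override
        public void nextTuple() {
            // Emit with an explicit message id so the ack/fail callbacks fire.
            String msgId = UUID.randomUUID().toString();
            collector.emit(new Values("payload"), msgId);
        }

        @Override
        public void ack(Object msgId) {
            // In the healthy case these should line up with what the last bolt writes downstream.
            System.out.println("ACK  " + msgId + " at " + System.currentTimeMillis());
        }

        @Override
        public void fail(Object msgId) {
            // In the scenario described above, these appear roughly
            // topology.message.timeout.secs (default 30s) after the emit,
            // even though the tuple tree was fully processed.
            System.out.println("FAIL " + msgId + " at " + System.currentTimeMillis());
        }

        @Override
        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            declarer.declare(new Fields("payload"));
        }
    }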
>>>>>> On 25 Oct 2014 04:34, "P. Taylor Goetz" <[email protected]> wrote:
>>>>>>
>>>>>>> My guess is that you are getting timeouts.
>>>>>>>
>>>>>>> Do you have topology.max.spout.pending set? If so, what is the value? Have you overridden topology.message.timeout.secs (the default is 30 seconds)?
>>>>>>>
>>>>>>> Look in Storm UI for the complete latency of the topology. Is it close to or greater than topology.message.timeout.secs?
>>>>>>>
>>>>>>> -Taylor
>>>>>>>
>>>>>>> On Oct 23, 2014, at 12:44 PM, Devang Shah <[email protected]> wrote:
>>>>>>>
>>>>>>> Hi Team,
>>>>>>>
>>>>>>> I am facing an issue with one of our failover tests. Storm fails all the messages after worker restarts.
>>>>>>>
>>>>>>> Steps done:
>>>>>>> 0. 1 spout, 3 bolts, 5 ackers
>>>>>>> 1. Pre-load tibems with 50k messages
>>>>>>> 2. Start the topology
>>>>>>> 3. Let it run for a brief time and then kill the worker where the spout is executing (the spout in our topology is a single instance)
>>>>>>> 4. The worker is brought back up by the supervisor automatically
>>>>>>>
>>>>>>> Observation/query:
>>>>>>> When the spout starts pumping data into the topology again, Storm starts failing the messages even though they are processed successfully (I have verified this, as our last bolt pushes data to Kafka and the incoming/Kafka message numbers match). I have checked the tuple anchoring and that seems to be fine, since without the worker restart the topology acks and processes messages fine.
>>>>>>>
>>>>>>> Anything I should check again?
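[Editorial note: for reference, a minimal sketch of how the two settings Taylor asks about can be set with the Storm 0.9.x Config API. The values shown (1000 pending tuples, 30 seconds, 5 ackers) are the ones mentioned in the thread; the class and method names wrapping them are illustrative only.]

    import backtype.storm.Config;

    public class TimeoutConfigSketch {
        // Builds the conf that would be passed to StormSubmitter.submitTopology(...).
        public static Config buildConf() {
            Config conf = new Config();
            // topology.max.spout.pending: cap on tuples emitted but not yet acked/failed, per spout task.
            conf.setMaxSpoutPending(1000);
            // topology.message.timeout.secs: how long a tuple tree may stay incomplete
            // before the spout's fail() is called (default 30).
            conf.setMessageTimeoutSecs(30);
            // The thread mentions 5 acker tasks.
            conf.setNumAckers(5);
            return conf;
        }
    }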
