Hi Devang,

Which Storm version are you using? You may want to check STORM-404 and STORM-329.
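The two suggestions further down the thread (upgrade to 0.9.3, or swap Netty for ZeroMQ) both come down to the storm.messaging.transport setting. A minimal sketch of that override in Java follows; note that the setting normally lives in storm.yaml on every node, and the ZeroMQ transport class name shown is an assumption based on 0.9.x defaults, so verify it against your release.

import java.util.HashMap;
import java.util.Map;

import backtype.storm.Config;

public class TransportSketch {
    // Returns a conf override selecting the worker messaging transport.
    // Normally this is set cluster-wide in storm.yaml, e.g.:
    //   storm.messaging.transport: "backtype.storm.messaging.netty.Context"
    public static Map<String, Object> transportOverride(boolean useZeroMq) {
        Map<String, Object> conf = new HashMap<String, Object>();
        // "backtype.storm.messaging.zmq" is assumed from the 0.9.x defaults; verify for your release.
        conf.put(Config.STORM_MESSAGING_TRANSPORT,
                 useZeroMq ? "backtype.storm.messaging.zmq"
                           : "backtype.storm.messaging.netty.Context");
        return conf;
    }
}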
Sean

On Mon, Nov 3, 2014 at 9:27 AM, Devang Shah <[email protected]> wrote:

> Thanks much for notifying.
>
> Would you know the bug id? I did refer to the change log of 0.9.3 but
> could not get hold of the bug id. Incidentally, I too have raised a JIRA
> and would like to close it with a reference to the previously raised JIRA.
> Thanks.
>
> On 31 Oct 2014 21:49, "M.Tarkeshwar Rao" <[email protected]> wrote:
>
>> Yes, it is the bug which was raised by Denigel. It is fixed in 0.9.3,
>> please use it. Or use ZeroMQ in place of Netty and your problem will be
>> resolved.
>>
>> On 27 Oct 2014 20:52, "Devang Shah" <[email protected]> wrote:
>>
>>> It seems to be a bug in Storm unless someone confirms otherwise.
>>>
>>> How can I file a bug for Storm?
>>>
>>> On 25 Oct 2014 07:51, "Devang Shah" <[email protected]> wrote:
>>>
>>>> You are correct, Taylor. Sorry, I missed mentioning all the details.
>>>>
>>>> We have topology.max.spout.pending set to 1000 and we have not
>>>> modified topology.message.timeout.secs (default 30 secs).
>>>>
>>>> Another observation:
>>>> When I deliberately bring down the worker (kill -9) and the worker is
>>>> brought up on the same port it was running on previously, Storm starts
>>>> failing all the messages despite them being successfully processed. If
>>>> the worker is brought up on a different supervisor port, the issue
>>>> doesn't seem to occur.
>>>>
>>>> Example steps:
>>>> 1. Worker running on supervisor slot 6703 (this worker runs the single
>>>> spout instance of our topology) and everything runs fine. Messages get
>>>> processed and acked back to the message provider. If I let it run in
>>>> this state it can process any number of messages.
>>>> 2. I bring down the java process with kill -9.
>>>> 3. The supervisor brings up the worker on the same slot 6703, and also
>>>> the spout task instance on it.
>>>> 4. All the messages get processed fine, but the ackers fail all the
>>>> messages the topology processed after the default 30 secs timeout. This
>>>> even happens when the topology is idle and I push a single message into
>>>> the topology, so my guess is that increasing the timeout will not help
>>>> (though I have not tried it).
>>>> 5. If the supervisor brings up the worker on a different slot, say
>>>> 6700, then the issue doesn't seem to occur. Probably a bug in Storm.
>>>>
>>>> Steps to simulate the behaviour:
>>>> 1. Run the topology (spout as a single instance and multiple instances
>>>> of bolts) with multiple workers.
>>>> 2. Identify the slot on which the single spout instance is running and
>>>> kill it.
>>>> 3. See if the supervisor started the worker on the same port. If not,
>>>> repeat step 2 until you get a worker on the same slot as the previous
>>>> one.
>>>> 4. Pump a message into the topology.
>>>> 5. You will see the message being processed successfully and also the
>>>> ackers failing the message. This can be verified by logging statements
>>>> in the ack and fail methods of the spout (see the sketch just below
>>>> this mail).
>>>>
>>>> Thanks and Regards,
>>>> Devang
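As Devang notes in step 5 above, logging in the spout's ack and fail callbacks is the quickest way to see which side is misbehaving. A minimal sketch of such a spout, using the 0.9.x backtype.storm API; the class and field names are made up for illustration, and the important detail is that tuples are emitted with an explicit message id, since tuples emitted without one are never tracked by the ackers.

import java.util.Map;
import java.util.UUID;

import backtype.storm.spout.SpoutOutputCollector;
import backtype.storm.task.TopologyContext;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.base.BaseRichSpout;
import backtype.storm.tuple.Fields;
import backtype.storm.tuple.Values;

// Hypothetical spout used only to observe ack/fail behaviour after a worker restart.
public class TestSpout extends BaseRichSpout {
    private SpoutOutputCollector collector;

    @Override
    public void open(Map conf, TopologyContext context, SpoutOutputCollector collector) {
        this.collector = collector;
    }

    @Override
    public void nextTuple() {
        String msgId = UUID.randomUUID().toString();
        // Emitting with a message id registers the tuple tree with the ackers.
        collector.emit(new Values("payload-" + msgId), msgId);
    }

    @Override
    public void ack(Object msgId) {
        System.out.println("ACKED  " + msgId + " at " + System.currentTimeMillis());
    }

    @Override
    public void fail(Object msgId) {
        System.out.println("FAILED " + msgId + " at " + System.currentTimeMillis());
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("payload"));
    }
}

If FAILED lines show up here after the worker restart even though the downstream counts (e.g. in Kafka) still match, that suggests the acks are going missing somewhere between the ackers and the restarted spout worker rather than anything being wrong with the bolts.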
>>>> On 25 Oct 2014 04:34, "P. Taylor Goetz" <[email protected]> wrote:
>>>>
>>>>> My guess is that you are getting timeouts.
>>>>>
>>>>> Do you have topology.max.spout.pending set? If so, what is the value?
>>>>> Have you overridden topology.message.timeout.secs (default is 30
>>>>> seconds)?
>>>>>
>>>>> Look in Storm UI for the complete latency of the topology. Is it
>>>>> close to or greater than topology.message.timeout.secs?
>>>>>
>>>>> -Taylor
>>>>>
>>>>> On Oct 23, 2014, at 12:44 PM, Devang Shah <[email protected]>
>>>>> wrote:
>>>>>
>>>>> Hi Team,
>>>>>
>>>>> I am facing an issue with one of our failover tests. Storm fails all
>>>>> the messages after a worker restart.
>>>>>
>>>>> Steps done:
>>>>> 0. 1 spout, 3 bolts, 5 ackers
>>>>> 1. Pre-load tibems with 50k messages
>>>>> 2. Start the topology
>>>>> 3. Let it run for a brief time and then kill the worker where the
>>>>> spout is executing (the spout in our topology is a single instance)
>>>>> 4. The worker is brought up by the supervisor automatically
>>>>>
>>>>> Observation/query:
>>>>> When the spout starts pumping data into the topology again, Storm
>>>>> starts failing the messages even though they are successfully
>>>>> processed (I have verified this as our last bolt pushes data to Kafka
>>>>> and the incoming/Kafka data number matches). I have checked the tuple
>>>>> anchoring and that seems to be fine, as without the worker restarts
>>>>> the topology acks and processes messages fine.
>>>>>
>>>>> Anything I should check again?
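For reference, this is the anchoring pattern the original report says was checked. A minimal sketch of a hypothetical pass-through bolt: the outgoing tuple is anchored by passing the input tuple to emit, and the input is acked explicitly once handled.

import java.util.Map;

import backtype.storm.task.OutputCollector;
import backtype.storm.task.TopologyContext;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.base.BaseRichBolt;
import backtype.storm.tuple.Fields;
import backtype.storm.tuple.Tuple;
import backtype.storm.tuple.Values;

// Hypothetical pass-through bolt illustrating anchored emits and explicit acks.
public class PassThroughBolt extends BaseRichBolt {
    private OutputCollector collector;

    @Override
    public void prepare(Map conf, TopologyContext context, OutputCollector collector) {
        this.collector = collector;
    }

    @Override
    public void execute(Tuple input) {
        // Anchor the outgoing tuple to the input so the tuple tree stays intact...
        collector.emit(input, new Values(input.getValue(0)));
        // ...and ack the input once it has been handled.
        collector.ack(input);
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("payload"));
    }
}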

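The settings Taylor asks about are normally applied to the topology Config at submission time. A hedged sketch using the values mentioned in the thread (max spout pending of 1000, the default 30 second timeout, 5 ackers); the topology name and the TestSpout/PassThroughBolt classes are the hypothetical ones from the sketches above.

import backtype.storm.Config;
import backtype.storm.StormSubmitter;
import backtype.storm.topology.TopologyBuilder;

public class SubmitSketch {
    public static void main(String[] args) throws Exception {
        TopologyBuilder builder = new TopologyBuilder();
        builder.setSpout("spout", new TestSpout(), 1);      // single spout instance, as in the thread
        builder.setBolt("pass", new PassThroughBolt(), 3)   // hypothetical bolt from the sketch above
               .shuffleGrouping("spout");

        Config conf = new Config();
        conf.setMaxSpoutPending(1000);    // topology.max.spout.pending
        conf.setMessageTimeoutSecs(30);   // topology.message.timeout.secs
        conf.setNumAckers(5);             // topology.acker.executors
        conf.setNumWorkers(3);

        StormSubmitter.submitTopology("failover-test", conf, builder.createTopology());
    }
}

If Storm UI shows the complete latency creeping up towards topology.message.timeout.secs, raising the timeout or lowering the max spout pending value is the usual first adjustment.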