Hi,

Yes, you are probably failing because of timeouts.

Check that none of your API calls is blocking indefinitely, and make sure every one of them has a timeout.
Check your GC: if you see many full GCs, you should increase your Java heap.
It also seems to me that you shouldn't set max spout pending too high.

How many spout executors do you have?

Vladi
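To illustrate the timeout advice above: with `java.net.HttpURLConnection` (one common way to make the HTTP calls described later in the thread; the URL and timeout values here are only illustrative), a minimal sketch looks like this. Without `setConnectTimeout`/`setReadTimeout`, a stuck third-party service can block a bolt's executor thread indefinitely.

```java
import java.net.HttpURLConnection;
import java.net.URL;

public class TimeoutExample {
    // Opens a connection with bounded connect and read timeouts.
    // The URL and the timeout values are illustrative, not from the thread.
    static HttpURLConnection open(String url) throws Exception {
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        conn.setConnectTimeout(3000); // fail fast if the service is unreachable
        conn.setReadTimeout(5000);    // bound the wait for a slow response
        return conn;
    }
}
```

A `SocketTimeoutException` thrown on timeout then surfaces in the bolt, where it can be handled (or the tuple allowed to fail) instead of hanging the worker.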
On Mon, Nov 3, 2014 at 10:20 PM, Maxime Nay <[email protected]> wrote:

> Hi Vladi,
>
> I will put log statements in each bolt.
> The processing time per tuple is high due to a third-party API queried
> through HTTP requests in one of our bolts. It can take up to 3 seconds to
> get an answer from this service.
>
> I've tried multiple values for max spout pending: 400, 800, 2000... It
> doesn't really seem to change anything. I'm also setting messageTimeoutSecs
> to 25 seconds.
>
> I also noticed that at some point I'm getting failed tuples, even though
> I never throw a FailedException manually. So I guess the only way for a
> tuple to fail is to exceed messageTimeoutSecs?
>
> Anyway, I restarted the topology and I will take a look at the debug
> statements when it crashes again.
>
> Thanks for your help!
>
> Maxime
>
> On Sat, Nov 1, 2014 at 9:49 PM, Vladi Feigin <[email protected]> wrote:
>
>> Hi,
>> We have a similar problem with v0.8.2.
>> We suspect that the slowest bolt in the topology hangs, and this causes
>> the entire topology to hang. It can be a database bolt, for example.
>> Put logging at the entry and exit of each bolt, printing the bolt name,
>> thread id, and time. This will help you find out which bolt hangs.
>> A few seconds of processing per tuple sounds too long; maybe you should
>> profile your code as well.
>> What's your max spout pending value?
>> Vladi
>>
>> On 31 Oct 2014 20:09, "Maxime Nay" <[email protected]> wrote:
>>
>>> Hi,
>>>
>>> For some reason, after a few hours of processing, my topology starts
>>> hanging. In the UI's "Topology Stats" the emitted and transferred counts
>>> are equal to 0, and I can't see anything coming out of the topology
>>> (usually inserting into some database).
>>>
>>> I can't see anything unusual in the Storm worker logs, nor in Kafka's
>>> or ZooKeeper's logs.
>>> The ZkCoordinator keeps refreshing, but nothing happens:
>>>
>>> 2014-10-31 17:00:13 s.k.ZkCoordinator [INFO] Task [2/2] Deleted partition managers: []
>>> 2014-10-31 17:00:13 s.k.ZkCoordinator [INFO] Task [2/2] New partition managers: []
>>> 2014-10-31 17:00:13 s.k.ZkCoordinator [INFO] Task [2/2] Finished refreshing
>>> 2014-10-31 17:00:13 s.k.DynamicBrokersReader [INFO] Read partition info from zookeeper: GlobalPartitionInformation{...
>>>
>>> I don't really understand why this is hanging, and how I could fix it.
>>>
>>> I'm using Storm 0.9.2-incubating with Kafka 0.8.1.1 and storm-kafka
>>> 0.9.2-incubating.
>>>
>>> My topology pulls data from 4 different Kafka topics and has 9
>>> different bolts. Each bolt implements IBasicBolt, and I'm not doing any
>>> acking manually (Storm should take care of this for me, right?).
>>> It takes a few seconds for a tuple to go through the entire topology.
>>> I'm setting MaxSpoutPending to limit the number of tuples in flight.
>>> My tuples shouldn't exceed the max message size limit (left at the
>>> default on my Kafka brokers and in my SpoutConfig; the default is rather
>>> high and should easily handle a few lines of text).
>>> The tuples don't necessarily go through every bolt.
>>>
>>> I'm defining my spouts like this:
>>>
>>> ZkHosts zkHosts = new ZkHosts("zk1.example.com:2181,zk2.example.com:2181,...");
>>> zkHosts.refreshFreqSecs = 120;
>>>
>>> SpoutConfig kafkaConfig = new SpoutConfig(zkHosts,
>>>         "TOPIC_NAME",
>>>         "/consumers",
>>>         "CONSUMER_ID");
>>> kafkaConfig.scheme = new SchemeAsMultiScheme(new StringScheme());
>>> KafkaSpout kafkaSpout = new KafkaSpout(kafkaConfig);
>>>
>>> I'm running this topology on 2 different workers, located on two
>>> different supervisors. In total I'm using something like 160 executors.
>>>
>>> I would greatly appreciate any help or hints on how to fix/investigate
>>> this problem!
>>>
>>> Thanks,
>>> Maxime
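[Editor's note] The bolt entry/exit logging suggested in the thread (bolt name, thread id, time) can be sketched with a small helper; the class and method names here are hypothetical, and in a real bolt the two calls would bracket the body of `execute()`:

```java
public class BoltTrace {
    // Builds the log line suggested in the thread: bolt name, phase
    // ("enter"/"exit"), current thread id, and wall-clock time in millis.
    static String mark(String boltName, String phase) {
        return String.format("%s %s thread=%d time=%d",
                boltName, phase,
                Thread.currentThread().getId(),
                System.currentTimeMillis());
    }
}
```

Inside a bolt this would be used as `LOG.info(BoltTrace.mark("HttpEnrichBolt", "enter"))` at the top of `execute()` and the matching `"exit"` call at the bottom; a bolt whose "enter" line never gets an "exit" is the one that hangs.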
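[Editor's note] As the thread observes, a tuple that is not fully acked within `topology.message.timeout.secs` is failed and replayed even if no `FailedException` is ever thrown, so with ~3-second external calls a 25-second timeout leaves little headroom. A minimal config sketch using the Storm 0.9.x `backtype.storm.Config` setters (the values are illustrative, not a recommendation from the thread; this fragment needs Storm on the classpath and is not runnable standalone):

```java
import backtype.storm.Config;

Config conf = new Config();
// Cap in-flight tuples per spout task; with slow external calls, keep this modest.
conf.setMaxSpoutPending(400);
// Must comfortably exceed worst-case end-to-end tuple latency,
// or tuples are failed and replayed, adding load to an already slow topology.
conf.setMessageTimeoutSecs(60);
```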
