You can try increasing supervisor.worker.timeout.secs. At a basic
level, your parallelism depends on the number of CPUs, since you are
increasing the number of threads executing in the JVM. You probably
want to increase the JVM memory as well.
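
For reference, a minimal sketch of where those two knobs live:
supervisor.worker.timeout.secs is read by the supervisors from
storm.yaml (e.g. "supervisor.worker.timeout.secs: 60"; the default is
30), while the worker heap can be raised per topology at submit time.
The class name and heap value below are placeholders, not
recommendations:

    import backtype.storm.Config;

    public class WorkerSizingSketch {
        public static Config workerConf() {
            Config conf = new Config();
            // Raise the worker JVM heap above the 0.9.x default; the exact
            // value is a placeholder and should be tuned to the topology.
            conf.put(Config.TOPOLOGY_WORKER_CHILDOPTS, "-Xmx2048m");
            return conf;
        }
    }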


On Fri, Jul 24, 2015, at 05:58 AM, Eric Ruel wrote:
>


>
>
> Is there a limit on the number of bolts/threads we can have within a
> single worker?
>
> Our topology has almost 140 bolts, including those created by Trident,
> and if I increase the parallelism, the worker dies.
>
> Is it caused by the process that checks whether every task is alive
> taking too long to get through the whole loop, or something like that?
>
> Are there any values in storm.yaml I should modify to support that
> number of threads?
>
> *From:* Eric Ruel <[email protected]> *Sent:* 23 July
> 2015 13:57 *To:* [email protected] *Subject:* RE: worker dies after a
> few minutes
>
>


>


> Originally, we had multiple single-worker topologies with a
> maxSpoutPending of 3, a parallelismHint between 1 and 20 depending on
> the bolt, a batch size of 300, and a maxTask of 90.
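
For reference (not the actual topology code), a minimal sketch of how
those four knobs are usually expressed with the Trident API; the spout,
fields, and values are placeholders, and "maxTask" is assumed here to
mean topology.max.task.parallelism:

    import backtype.storm.Config;
    import backtype.storm.tuple.Fields;
    import backtype.storm.tuple.Values;
    import storm.trident.TridentTopology;
    import storm.trident.testing.FixedBatchSpout;

    public class TridentKnobsSketch {
        public static void main(String[] args) {
            // Batch size is a property of the spout; FixedBatchSpout takes it
            // directly (300 here, as in the setup described above).
            FixedBatchSpout spout = new FixedBatchSpout(new Fields("word"), 300,
                    new Values("a"), new Values("b"));

            TridentTopology topology = new TridentTopology();
            topology.newStream("words", spout)
                    .parallelismHint(20);        // per-operation parallelism hint

            Config conf = new Config();
            conf.setMaxSpoutPending(3);          // in-flight batches per spout task
            conf.setMaxTaskParallelism(90);      // caps any single component's parallelism
            // conf would then be passed to StormSubmitter.submitTopology(...)
        }
    }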


>


> By reducing the batch size to 2 records, I saw that my workers always
> died after about ~45 seconds... so I just did many tries, changing the
> values of all the parameters.


>


> I still don't totally understand the difference between maxTask and
> the parallelism hint.
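
For what it's worth, a minimal sketch of the distinction in the plain
(non-Trident) API, using Storm's bundled test spout and bolt as
placeholders: the parallelism hint sets the initial number of executors
(threads), while setNumTasks fixes the number of task instances that
those executors share.

    import backtype.storm.topology.TopologyBuilder;
    import backtype.storm.testing.TestWordCounter;
    import backtype.storm.testing.TestWordSpout;

    public class TasksVsHintSketch {
        public static void main(String[] args) {
            TopologyBuilder builder = new TopologyBuilder();
            builder.setSpout("words", new TestWordSpout(), 2);
            // 4 = parallelism hint: the initial number of executors (threads)
            // for this bolt. setNumTasks(8) = task instances, fixed for the
            // topology's lifetime and spread over the executors, so a later
            // "storm rebalance" can grow the executor count up to 8 without
            // redeploying.
            builder.setBolt("counter", new TestWordCounter(), 4)
                   .setNumTasks(8)
                   .shuffleGrouping("words");
            builder.createTopology();
        }
    }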


>


>


> I only remember that originally we preferred to reduce the
> parallelismHint to a minimum to avoid the problem caused by
> https://issues.apache.org/jira/browse/STORM-503


>


> Note that we use Trident, so if I count all the bolts I can see under
> the "Bolt (All time)" section in the Storm UI, I have 137 bolts
> (including all the merges, joins, project operations...).


>


>


> Eric


>
>
>
> *From:* Harsha <[email protected]> *Sent:* 23 July 2015 11:14 *To:*
> [email protected] *Subject:* Re: worker dies after a few minutes
>
> Thanks for the update, Eric. Could you describe how you found that
> maxTask being too high was causing this issue? We are trying to improve
> the debugging of Storm topologies, and this will be helpful for us.
> Thanks, Harsha
>
> On Thu, Jul 23, 2015, at 07:36 AM, Eric Ruel wrote:
>>


>> Finally, the problem was caused by maxTask being too high.
>>
>>
>>
>> *From:* Harsha <[email protected]> *Sent:* 22 July 2015 10:56 *To:*
>> [email protected] *Subject:* Re: worker dies after a few minutes
>>
>> What does your topology code look like? Are you throwing any errors
>> from the bolt's execute method? It does look like there is a
>> RuntimeException happening ("*Error when processing event*
>> *java.lang.RuntimeException:*"). It's up to the user to catch any
>> exception and log it or do something with it instead of throwing it
>> back to the worker JVM.
>>
>> -Harsha
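
A minimal sketch of the pattern Harsha describes above (a hypothetical
bolt, not the code from this topology): catch inside execute, then
report and fail the tuple rather than letting the exception escape to
the worker.

    import java.util.Map;

    import backtype.storm.task.OutputCollector;
    import backtype.storm.task.TopologyContext;
    import backtype.storm.topology.OutputFieldsDeclarer;
    import backtype.storm.topology.base.BaseRichBolt;
    import backtype.storm.tuple.Tuple;

    public class SafeBolt extends BaseRichBolt {
        private OutputCollector collector;

        @Override
        public void prepare(Map conf, TopologyContext context, OutputCollector collector) {
            this.collector = collector;
        }

        @Override
        public void execute(Tuple input) {
            try {
                // ... real processing of `input` would go here ...
                collector.ack(input);
            } catch (Exception e) {
                // Surface the error in the Storm UI/logs and fail the tuple for
                // replay, instead of letting a RuntimeException halt the worker.
                collector.reportError(e);
                collector.fail(input);
            }
        }

        @Override
        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            // no output streams in this sketch
        }
    }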
>>
>>
>> On Wed, Jul 22, 2015, at 07:43 AM, Eric Ruel wrote:
>>> Hello


>>>


>>> The workers in my topology die after 1-2 minutes.


>>>


>>> I tried changing the heartbeat config and running in both cluster and
>>> local mode, but the workers always die.


>>>


>>> Any idea?


>>>


>>> 10:38:38.019 ERROR backtype.storm.daemon.worker - Error when processing event
>>> java.lang.RuntimeException: org.apache.storm.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /workerbeats/testeric-1-1437575782/259f61ae-02a5-4a75-be50-68f27054a7b2-1024
>>>   at backtype.storm.util$wrap_in_runtime.invoke(util.clj:44) ~[storm-core-0.9.6.jar:0.9.6]
>>>   at backtype.storm.zookeeper$set_data.invoke(zookeeper.clj:173) ~[storm-core-0.9.6.jar:0.9.6]
>>>   at backtype.storm.cluster$mk_distributed_cluster_state$reify__1919.set_data(cluster.clj:92) ~[storm-core-0.9.6.jar:0.9.6]
>>>   at backtype.storm.cluster$mk_storm_cluster_state$reify__2376.worker_heartbeat_BANG_(cluster.clj:332) ~[storm-core-0.9.6.jar:0.9.6]
>>>   at sun.reflect.GeneratedMethodAccessor135.invoke(Unknown Source) ~[na:na]
>>>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[na:1.7.0_71]
>>>   at java.lang.reflect.Method.invoke(Method.java:606) ~[na:1.7.0_71]
>>>   at clojure.lang.Reflector.invokeMatchingMethod(Reflector.java:93) ~[clojure-1.5.1.jar:na]
>>>   at clojure.lang.Reflector.invokeInstanceMethod(Reflector.java:28) ~[clojure-1.5.1.jar:na]
>>>   at backtype.storm.daemon.worker$do_executor_heartbeats.doInvoke(worker.clj:56) ~[storm-core-0.9.6.jar:0.9.6]
>>>   at clojure.lang.RestFn.invoke(RestFn.java:439) ~[clojure-1.5.1.jar:na]
>>>   at backtype.storm.daemon.worker$fn__3757$exec_fn__1163__auto____3758$fn__3761.invoke(worker.clj:413) ~[storm-core-0.9.6.jar:0.9.6]
>>>   at backtype.storm.timer$schedule_recurring$this__1704.invoke(timer.clj:99) ~[storm-core-0.9.6.jar:0.9.6]
>>>   at backtype.storm.timer$mk_timer$fn__1687$fn__1688.invoke(timer.clj:50) ~[storm-core-0.9.6.jar:0.9.6]
>>>   at backtype.storm.timer$mk_timer$fn__1687.invoke(timer.clj:42) [storm-core-0.9.6.jar:0.9.6]
>>>   at clojure.lang.AFn.run(AFn.java:24) [clojure-1.5.1.jar:na]
>>>   at java.lang.Thread.run(Thread.java:745) [na:1.7.0_71]
>>> Caused by: org.apache.storm.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /workerbeats/testeric-1-1437575782/259f61ae-02a5-4a75-be50-68f27054a7b2-1024
>>>   at org.apache.storm.zookeeper.KeeperException.create(KeeperException.java:99) ~[storm-core-0.9.6.jar:0.9.6]
>>>   at org.apache.storm.zookeeper.KeeperException.create(KeeperException.java:51) ~[storm-core-0.9.6.jar:0.9.6]
>>>   at org.apache.storm.zookeeper.ZooKeeper.setData(ZooKeeper.java:1270) ~[storm-core-0.9.6.jar:0.9.6]
>>>   at org.apache.storm.curator.framework.imps.SetDataBuilderImpl$4.call(SetDataBuilderImpl.java:260) ~[storm-core-0.9.6.jar:0.9.6]
>>>   at org.apache.storm.curator.framework.imps.SetDataBuilderImpl$4.call(SetDataBuilderImpl.java:256) ~[storm-core-0.9.6.jar:0.9.6]
>>>   at org.apache.storm.curator.RetryLoop.callWithRetry(RetryLoop.java:107) ~[storm-core-0.9.6.jar:0.9.6]
>>>   at org.apache.storm.curator.framework.imps.SetDataBuilderImpl.pathInForeground(SetDataBuilderImpl.java:252) ~[storm-core-0.9.6.jar:0.9.6]
>>>   at org.apache.storm.curator.framework.imps.SetDataBuilderImpl.forPath(SetDataBuilderImpl.java:239) ~[storm-core-0.9.6.jar:0.9.6]
>>>   at org.apache.storm.curator.framework.imps.SetDataBuilderImpl.forPath(SetDataBuilderImpl.java:39) ~[storm-core-0.9.6.jar:0.9.6]
>>>   at backtype.storm.zookeeper$set_data.invoke(zookeeper.clj:172) ~[storm-core-0.9.6.jar:0.9.6]
>>>   ... 15 common frames omitted
>>>
>>> 10:38:38.023 ERROR backtype.storm.util - Halting process: ("Error when processing an event")
>>> java.lang.RuntimeException: ("Error when processing an event")
>>>   at backtype.storm.util$exit_process_BANG_.doInvoke(util.clj:325) [storm-core-0.9.6.jar:0.9.6]
>>>   at clojure.lang.RestFn.invoke(RestFn.java:423) [clojure-1.5.1.jar:na]
>>>   at backtype.storm.daemon.worker$mk_halting_timer$fn__3572.invoke(worker.clj:177) [storm-core-0.9.6.jar:0.9.6]
>>>   at backtype.storm.timer$mk_timer$fn__1687$fn__1688.invoke(timer.clj:68) [storm-core-0.9.6.jar:0.9.6]
>>>   at backtype.storm.timer$mk_timer$fn__1687.invoke(timer.clj:42) [storm-core-0.9.6.jar:0.9.6]
>>>   at clojure.lang.AFn.run(AFn.java:24) [clojure-1.5.1.jar:na]
>>>   at java.lang.Thread.run(Thread.java:745) [na:1.7.0_71]


>>>
>>
>
