Re: Urgent - Some workers stop processing after a few seconds

Martin Illecker Wed, 25 Feb 2015 09:02:32 -0800

How can I find out why workers stop receiving tuples after they have
successfully processed a few thousand?


I have also tested *allGrouping* to ensure that every bolt must receive
tuples. But two workers, along with their bolts, still stop receiving tuples
after a few seconds.
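
As a sanity check on what to expect, here is a minimal sketch in plain Java of
how the two groupings should spread tuples over three bolt tasks. It is a
simplified model, not Storm's actual implementation: the class and method
names are hypothetical, and cycling through a shuffled task list is only an
assumption about how shuffle grouping behaves.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Simplified model of two stream groupings; NOT Storm's actual code.
// All names here are hypothetical and exist only for illustration.
class GroupingSketch {

    // Shuffle grouping model (assumption): each tuple goes to exactly one
    // task, cycling through a randomly shuffled list of task indices.
    static int[] shuffleCounts(int numTasks, int numTuples, long seed) {
        List<Integer> order = new ArrayList<>();
        for (int i = 0; i < numTasks; i++) {
            order.add(i);
        }
        Collections.shuffle(order, new Random(seed));

        int[] counts = new int[numTasks];
        for (int t = 0; t < numTuples; t++) {
            counts[order.get(t % numTasks)]++;
        }
        return counts;
    }

    // All grouping model: every tuple is replicated to every task.
    static int[] allCounts(int numTasks, int numTuples) {
        int[] counts = new int[numTasks];
        for (int t = 0; t < numTuples; t++) {
            for (int i = 0; i < numTasks; i++) {
                counts[i]++;
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Three bolt tasks, 3000 tuples emitted by a single spout.
        System.out.println(Arrays.toString(shuffleCounts(3, 3000, 42L)));
        System.out.println(Arrays.toString(allCounts(3, 3000)));
    }
}
```

Under this model, 3000 tuples over three tasks give 1000 per task with shuffle
grouping and 3000 per task with all grouping, so a task whose count stays
flat in the Storm UI would not be explained by the grouping alone.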

I would appreciate any help!



2015-02-25 17:40 GMT+01:00 Harsha <[email protected]>:

> My bad, I was looking at another supervisor.log. There are no errors in the
> supervisor and worker logs.
>
> -Harsha
>
> On Wed, Feb 25, 2015, at 08:29 AM, Martin Illecker wrote:
>
> Hi Harsha,
>
> I'm using three c3.4xlarge EC2 instances:
>  1) Nimbus, WebUI, Zookeeper, Supervisor
>  2) Zookeeper, Supervisor
>  3) Zookeeper, Supervisor
>
> I cannot find this error message in my attached supervisor log.
> By the way, I'm running on Ubuntu EC2 nodes, and there is no path C:\.
>
> I have not changed any of these timeout values, so they should be the
> defaults:
> storm.zookeeper.session.timeout: 20000
> storm.zookeeper.connection.timeout: 15000
> supervisor.worker.timeout.secs: 30
>
> Thanks!
> Best regards
> Martin
>
>
> 2015-02-25 17:03 GMT+01:00 Harsha <[email protected]>:
>
>
> Hi Martin,
> Can you share your storm.zookeeper.session.timeout,
> storm.zookeeper.connection.timeout, and supervisor.worker.timeout.secs
> values? Looking at the supervisor logs, I see:
> Error when processing event
> java.io.FileNotFoundException: File
> 'c:\hdistorm\workers\f3e70029-c5c8-4f55-a4a1-396096b37509\heartbeats\1417082031858'
>
> You might be running into https://issues.apache.org/jira/browse/STORM-682
> Is your ZooKeeper cluster on a different set of nodes, and can you check
> that you are able to connect to it without any issues?
> -Harsha
>
>
>
> On Wed, Feb 25, 2015, at 03:49 AM, Martin Illecker wrote:
>
> Hi,
>
> I'm still observing this strange issue.
> Two of three workers stop processing after a few seconds. (each worker is
> running on one dedicated EC2 node)
>
> My guess is that the output stream of the single spout is not properly
> distributed over all three workers, or is somehow directed to one worker
> only. But *shuffleGrouping* should guarantee equal distribution among
> multiple bolts, right?
>
> I'm using the following topology:
>
> TopologyBuilder builder = new TopologyBuilder();
> builder.setSpout("dataset-spout", spout);
> builder.setBolt("tokenizer-bolt", tokenizerBolt, 3)
>     .shuffleGrouping("dataset-spout");
> builder.setBolt("preprocessor-bolt", preprocessorBolt, 3)
>     .shuffleGrouping("tokenizer-bolt");
>
> conf.setMaxSpoutPending(2000);
> conf.setNumWorkers(3);
>
> StormSubmitter.submitTopology(TOPOLOGY_NAME, conf, builder.createTopology());
>
> I have attached the screenshots of the topology and the truncated worker
> and supervisor log of one idle worker.
>
> The supervisor log includes a few interesting lines, but I think they are
> normal?
>
> supervisor [INFO] e76bc338-2ba5-444b-9854-bca94f9587b7 still hasn't started
>
> I hope someone can help me with this issue!
>
> Thanks
> Best regards
> Martin
>
>
> 2015-02-24 20:37 GMT+01:00 Martin Illecker <[email protected]>:
>
> Hi,
>
> I'm trying to run a topology on EC2, but I'm observing the following
> strange issue:
>
> Some workers stop processing after a few seconds, without any error in the
> worker log.
>
> For example, my topology consists of 3 workers and each worker is running
> on its own EC2 node.
> Two of them stop processing after a few seconds. But they have already
> processed several tuples successfully.
>
> I'm using only one spout and shuffleGrouping at all bolts.
> If I add more spouts then all workers keep processing, but the performance
> is very bad.
>
> Does anyone have a guess why this happens?
>
> The topology is currently running at:
> http://54.155.156.203:8080
>
> Thanks!
>
> Martin
>
> Email had 4 attachments:
>
>    - topology.jpeg
>      161k (image/jpeg)
>    - component.jpeg
>      183k (image/jpeg)
>    - supervisor.log
>      7k (application/octet-stream)
>    - worker.log
>      37k (application/octet-stream)
>