Hi Harsha,

I'm using three c3.4xlarge EC2 instances:

1) Nimbus, WebUI, Zookeeper, Supervisor
2) Zookeeper, Supervisor
3) Zookeeper, Supervisor

I cannot find this error message in my attached supervisor log. By the way, I'm running on Ubuntu EC2 nodes, so there is no path C:\. I have not made any changes to these timeout values, so they should be the defaults:

storm.zookeeper.session.timeout: 20000
storm.zookeeper.connection.timeout: 15000
supervisor.worker.timeout.secs: 30
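To double-check both of your points on each node, here is a rough sketch that prints the effective timeouts and then tries to connect to ZooKeeper (assuming the 0.9.x backtype.storm API and a storm.yaml on the classpath; the connect string is a placeholder for one of my nodes, and /storm is Storm's default storm.zookeeper.root):

import java.util.List;
import java.util.Map;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooKeeper;

import backtype.storm.Config;
import backtype.storm.utils.Utils;

public class ZkSanityCheck {
    public static void main(String[] args) throws Exception {
        // Effective config: defaults.yaml merged with the storm.yaml
        // found on the classpath of this node.
        Map conf = Utils.readStormConfig();
        System.out.println("session timeout:    "
                + conf.get(Config.STORM_ZOOKEEPER_SESSION_TIMEOUT));
        System.out.println("connection timeout: "
                + conf.get(Config.STORM_ZOOKEEPER_CONNECTION_TIMEOUT));
        System.out.println("worker timeout:     "
                + conf.get(Config.SUPERVISOR_WORKER_TIMEOUT_SECS));

        // Try to reach one ZooKeeper server; the default host below is a
        // placeholder, replace it with one of the three EC2 nodes.
        String connect = args.length > 0 ? args[0] : "localhost:2181";
        final CountDownLatch connected = new CountDownLatch(1);
        ZooKeeper zk = new ZooKeeper(connect, 20000, new Watcher() {
            public void process(WatchedEvent event) {
                if (event.getState() == Event.KeeperState.SyncConnected) {
                    connected.countDown();
                }
            }
        });
        if (connected.await(15, TimeUnit.SECONDS)) {
            // "/storm" is the default storm.zookeeper.root.
            List<String> children = zk.getChildren("/storm", false);
            System.out.println("connected, /storm children: " + children);
        } else {
            System.out.println("could NOT connect to " + connect);
        }
        zk.close();
    }
}

Running this once per node should show whether the defaults above are actually in effect and whether every node can reach the ensemble.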
Thanks!

Best regards
Martin


2015-02-25 17:03 GMT+01:00 Harsha <[email protected]>:

> Hi Martin,
> Can you share your storm.zookeeper.session.timeout,
> storm.zookeeper.connection.timeout, and supervisor.worker.timeout.secs?
> Looking at the supervisor logs, I see:
>
> Error when processing event
> java.io.FileNotFoundException: File
> 'c:\hdistorm\workers\f3e70029-c5c8-4f55-a4a1-396096b37509\heartbeats\1417082031858'
>
> You might be running into https://issues.apache.org/jira/browse/STORM-682
> Is your ZooKeeper cluster on a different set of nodes, and can you check
> that you are able to connect to it without any issues?
> -Harsha
>
>
> On Wed, Feb 25, 2015, at 03:49 AM, Martin Illecker wrote:
>
> Hi,
>
> I'm still observing this strange issue.
> Two of three workers stop processing after a few seconds. (Each worker
> runs on its own dedicated EC2 node.)
>
> My guess would be that the output stream of the one spout is not properly
> distributed over all three workers, or is somehow directed to one worker
> only. But *shuffleGrouping* should guarantee equal distribution among
> multiple bolt instances, right?
>
> I'm using the following topology:
>
> TopologyBuilder builder = new TopologyBuilder();
> builder.setSpout("dataset-spout", spout);
> builder.setBolt("tokenizer-bolt", tokenizerBolt, 3)
>     .shuffleGrouping("dataset-spout");
> builder.setBolt("preprocessor-bolt", preprocessorBolt, 3)
>     .shuffleGrouping("tokenizer-bolt");
>
> Config conf = new Config();
> conf.setMaxSpoutPending(2000);
> conf.setNumWorkers(3);
> StormSubmitter.submitTopology(TOPOLOGY_NAME, conf, builder.createTopology());
>
> I have attached screenshots of the topology and the truncated worker and
> supervisor logs of one idle worker.
>
> The supervisor log includes a few interesting lines, but I think they are
> normal:
>
> supervisor [INFO] e76bc338-2ba5-444b-9854-bca94f9587b7 still hasn't started
>
> I hope someone can help me with this issue!
>
> Thanks
> Best regards
> Martin
>
>
> 2015-02-24 20:37 GMT+01:00 Martin Illecker <[email protected]>:
>
> Hi,
>
> I'm trying to run a topology on EC2, but I'm observing the following
> strange issue: some workers stop processing after a few seconds, without
> any error in the worker log.
>
> For example, my topology consists of 3 workers and each worker is running
> on its own EC2 node. Two of them stop processing after a few seconds,
> even though they have already processed several tuples successfully.
>
> I'm using only one spout and shuffleGrouping on all bolts.
> If I add more spouts, then all workers keep processing, but the
> performance is very bad.
>
> Does anyone have a guess why this happens?
>
> The topology is currently running at:
> http://54.155.156.203:8080
>
> Thanks!
>
> Martin
>
>
> Email had 4 attachments:
> - topology.jpeg 161k (image/jpeg)
> - component.jpeg 183k (image/jpeg)
> - supervisor.log 7k (application/octet-stream)
> - worker.log 37k (application/octet-stream)
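One way to test the *shuffleGrouping* assumption from the thread above is to attach a small counting sink to the spout's stream and compare per-task tuple counts across the three workers. A minimal sketch against the 0.9.x backtype.storm API (CountingBolt is a hypothetical helper, not part of the original topology):

import java.util.Map;

import backtype.storm.task.OutputCollector;
import backtype.storm.task.TopologyContext;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.base.BaseRichBolt;
import backtype.storm.tuple.Tuple;

// A terminal bolt that counts tuples per task and periodically logs
// the count, to check whether shuffleGrouping really spreads the
// stream across all workers.
public class CountingBolt extends BaseRichBolt {
    private OutputCollector collector;
    private int taskId;
    private long count;

    @Override
    public void prepare(Map conf, TopologyContext context, OutputCollector collector) {
        this.collector = collector;
        this.taskId = context.getThisTaskId();
    }

    @Override
    public void execute(Tuple input) {
        count++;
        if (count % 1000 == 0) {
            System.out.println("task " + taskId + " processed " + count + " tuples");
        }
        collector.ack(input);
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        // measurement sink only, no output stream
    }
}

It could be wired in alongside the existing bolts, e.g. builder.setBolt("count-check", new CountingBolt(), 3).shuffleGrouping("dataset-spout"). If shuffleGrouping distributes the stream as expected, the per-task counts should grow at roughly the same rate; a count that stops growing identifies an idle worker.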
