My bad, I was looking at another supervisor.log. There are no errors in the supervisor or worker logs.
-Harsha

On Wed, Feb 25, 2015, at 08:29 AM, Martin Illecker wrote:
> Hi Harsha,
>
> I'm using three c3.4xlarge EC2 instances:
> 1) Nimbus, WebUI, Zookeeper, Supervisor
> 2) Zookeeper, Supervisor
> 3) Zookeeper, Supervisor
>
> I cannot find this error message in my attached supervisor log. By the
> way, I'm running on Ubuntu EC2 nodes, so there is no C:\ path.
>
> I have not made any changes to these timeout values. They should be the
> defaults:
>   storm.zookeeper.session.timeout: 20000
>   storm.zookeeper.connection.timeout: 15000
>   supervisor.worker.timeout.secs: 30
>
> Thanks!
>
> Best regards
> Martin
>
>
> 2015-02-25 17:03 GMT+01:00 Harsha <[email protected]>:
>> Hi Martin,
>>
>> Can you share your storm.zookeeper.session.timeout,
>> storm.zookeeper.connection.timeout, and supervisor.worker.timeout.secs?
>> Looking at the supervisor logs, I see:
>>
>>   Error when processing event java.io.FileNotFoundException: File
>>   'c:\hdistorm\workers\f3e70029-c5c8-4f55-a4a1-396096b37509\heartbeats\1417082031858'
>>
>> You might be running into
>> https://issues.apache.org/jira/browse/STORM-682. Is your Zookeeper
>> cluster on a different set of nodes, and can you check that you are
>> able to connect to it without any issues?
>>
>> -Harsha
>>
>>
>> On Wed, Feb 25, 2015, at 03:49 AM, Martin Illecker wrote:
>>> Hi,
>>>
>>> I'm still observing this strange issue. Two of three workers stop
>>> processing after a few seconds. (Each worker is running on its own
>>> dedicated EC2 node.)
>>>
>>> My guess would be that the output stream of the one spout is not
>>> being properly distributed over all three workers, or is somehow
>>> directed to one worker only. But shuffleGrouping should guarantee
>>> equal distribution among multiple bolts, right? (See the sketches
>>> after this thread.)
>>>
>>> I'm using the following topology:
>>>
>>>   TopologyBuilder builder = new TopologyBuilder();
>>>   builder.setSpout("dataset-spout", spout);
>>>   builder.setBolt("tokenizer-bolt", tokenizerBolt, 3)
>>>       .shuffleGrouping("dataset-spout");
>>>   builder.setBolt("preprocessor-bolt", preprocessorBolt, 3)
>>>       .shuffleGrouping("tokenizer-bolt");
>>>   conf.setMaxSpoutPending(2000);
>>>   conf.setNumWorkers(3);
>>>   StormSubmitter.submitTopology(TOPOLOGY_NAME, conf, builder.createTopology());
>>>
>>> I have attached screenshots of the topology and the truncated worker
>>> and supervisor logs of one idle worker.
>>>
>>> The supervisor log includes a few interesting lines, but I think they
>>> are normal:
>>>
>>>   supervisor [INFO] e76bc338-2ba5-444b-9854-bca94f9587b7 still hasn't started
>>>
>>> I hope someone can help me with this issue!
>>>
>>> Thanks
>>>
>>> Best regards
>>> Martin
>>>
>>>
>>> 2015-02-24 20:37 GMT+01:00 Martin Illecker <[email protected]>:
>>>> Hi,
>>>>
>>>> I'm trying to run a topology on EC2, but I'm observing the following
>>>> strange issue: some workers stop processing after a few seconds,
>>>> without any error in the worker log.
>>>>
>>>> For example, my topology consists of 3 workers, and each worker is
>>>> running on its own EC2 node. Two of them stop processing after a few
>>>> seconds, even though they have already processed several tuples
>>>> successfully.
>>>>
>>>> I'm using only one spout and shuffleGrouping at all bolts. If I add
>>>> more spouts, then all workers keep processing, but the performance
>>>> is very bad.
>>>>
>>>> Does anyone have a guess why this happens?
>>>>
>>>> The topology is currently running at: http://54.155.156.203:8080
>>>>
>>>> Thanks!
>>>> Martin
>>>
>>> Email had 4 attachments:
>>> * topology.jpeg 161k (image/jpeg)
>>> * component.jpeg 183k (image/jpeg)
>>> * supervisor.log 7k (application/octet-stream)
>>> * worker.log 37k (application/octet-stream)
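Regarding the shuffleGrouping question in the thread above: shuffleGrouping does distribute tuples randomly and roughly evenly across all tasks of the target bolt, regardless of which worker hosts them, so uneven distribution would itself be unexpected. One way to confirm whether every worker is actually receiving tuples is to log a per-task counter from inside the bolt and compare the three worker logs. A minimal sketch, assuming Storm 0.9.x (backtype.storm packages); the class name and log format are illustrative, not from the original thread:

    import java.util.Map;

    import org.slf4j.Logger;
    import org.slf4j.LoggerFactory;

    import backtype.storm.task.TopologyContext;
    import backtype.storm.topology.BasicOutputCollector;
    import backtype.storm.topology.OutputFieldsDeclarer;
    import backtype.storm.topology.base.BaseBasicBolt;
    import backtype.storm.tuple.Tuple;

    // Hypothetical diagnostic bolt: counts tuples per task so the worker
    // logs show whether shuffleGrouping is reaching all three nodes.
    public class TupleCountingBolt extends BaseBasicBolt {
      private static final Logger LOG =
          LoggerFactory.getLogger(TupleCountingBolt.class);

      private transient int taskId;
      private transient long count;

      @Override
      public void prepare(Map stormConf, TopologyContext context) {
        taskId = context.getThisTaskId();
      }

      @Override
      public void execute(Tuple tuple, BasicOutputCollector collector) {
        count++;
        if (count % 1000 == 0) {
          // Written to the log of whichever worker hosts this task.
          LOG.info("task {} has processed {} tuples", taskId, count);
        }
        // ... real tokenizing/emitting would go here ...
      }

      @Override
      public void declareOutputFields(OutputFieldsDeclarer declarer) {
        // declare the same output fields as the real bolt
      }
    }

If the counters in two of the three worker logs stop advancing while the third keeps climbing, tuples really are being routed to a single worker; if all three stall together, the spout itself has stopped emitting, which points at backpressure rather than grouping.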

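One more thing worth checking, given conf.setMaxSpoutPending(2000) in the topology above: the pending count only drains when tuples are acked, so a bolt that never acks lets pending tuples pile up until the spout stops emitting, which can look very much like idle workers. Bolts extending BaseBasicBolt are acked automatically, but with BaseRichBolt the ack is manual. A minimal sketch of the manual pattern, again assuming Storm 0.9.x; the class and field names are hypothetical, since the thread does not show the real bolt implementations:

    import java.util.Map;

    import backtype.storm.task.OutputCollector;
    import backtype.storm.task.TopologyContext;
    import backtype.storm.topology.OutputFieldsDeclarer;
    import backtype.storm.topology.base.BaseRichBolt;
    import backtype.storm.tuple.Fields;
    import backtype.storm.tuple.Tuple;
    import backtype.storm.tuple.Values;

    // Hypothetical bolt illustrating the anchor-and-ack pattern.
    public class AckingBolt extends BaseRichBolt {
      private OutputCollector collector;

      @Override
      public void prepare(Map stormConf, TopologyContext context,
          OutputCollector collector) {
        this.collector = collector;
      }

      @Override
      public void execute(Tuple tuple) {
        // Anchor the emitted tuple to its input for reliability tracking.
        collector.emit(tuple, new Values(tuple.getString(0).toLowerCase()));
        // Without this ack, the tuple counts against max.spout.pending
        // until the message timeout, and the spout eventually stalls.
        collector.ack(tuple);
      }

      @Override
      public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("word"));
      }
    }

If the real bolts already extend BaseBasicBolt or ack correctly, this cause is ruled out, and the timeout and STORM-682 angles raised in the thread remain the likelier explanations.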