By the way, I'm using an *unreliable* spout; might this be the problem? Here is the source of my spout:
https://github.com/millecker/storm-apps/blob/master/commons/src/at/illecker/storm/commons/spout/DatasetSpout.java
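For reference, this is roughly what I mean by "unreliable": the spout emits tuples without a message ID. The sketch below is simplified and is not the actual DatasetSpout code; the readNextTweet() helper is just a placeholder. As far as I understand, topology.max.spout.pending only throttles tuples that were emitted with a message ID, so with an unreliable spout setMaxSpoutPending has no effect and failed tuples are never replayed.

import java.util.Map;

import backtype.storm.spout.SpoutOutputCollector;
import backtype.storm.task.TopologyContext;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.base.BaseRichSpout;
import backtype.storm.tuple.Fields;
import backtype.storm.tuple.Values;

// Simplified sketch, not the actual DatasetSpout code.
public class SketchSpout extends BaseRichSpout {
  private SpoutOutputCollector collector;
  private long msgId = 0;

  @Override
  public void open(Map conf, TopologyContext context, SpoutOutputCollector collector) {
    this.collector = collector;
  }

  @Override
  public void nextTuple() {
    String tweet = readNextTweet(); // placeholder helper
    if (tweet == null) {
      return;
    }
    // Unreliable emit: no message ID, so the tuple tree is never tracked by
    // the ackers and topology.max.spout.pending has no effect.
    collector.emit(new Values(tweet));

    // Reliable emit: the message ID makes Storm track the tuple tree and call
    // ack()/fail() on this spout; only these emits count against maxSpoutPending.
    // collector.emit(new Values(tweet), msgId++);
  }

  @Override
  public void declareOutputFields(OutputFieldsDeclarer declarer) {
    declarer.declare(new Fields("tweet"));
  }

  private String readNextTweet() {
    return null; // placeholder
  }
}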
2015-02-26 18:30 GMT+01:00 Martin Illecker <[email protected]>:

> Hi,
>
> I believe this issue belongs to Storm or EC2, because on a single node
> (one worker) my topology operates fine.
>
> I have tried different combinations of the following parameters:
> - *shuffleGrouping* and *allGrouping* between the spout and the first bolt
> - spout parallelism from 1 to numberOfWorkers (each worker has its own
>   spout task)
> - maxSpoutPending from 5000 down to 50
> - a 1 ms sleep in the spout
>
> The issue occurs when one spout with parallelism 1 has to feed multiple
> workers. For example: 5 workers, one spout with parallelism 1, and a bolt
> with parallelism 5. After a few seconds, 4 of these 5 workers become idle
> and only one worker keeps processing; this is probably the worker that
> runs the spout task.
>
> If I increase the parallelism of the spout, all workers keep working, but
> the performance drops dramatically.
>
> There are no error messages in the worker or supervisor logs.
>
>> You've got maxSpoutPending set to 2k tuples. Do you see anywhere in your
>> bolt code where it could be hanging before acking the tuple?
>
> I thought I would receive an exception or a timeout if a bolt were hanging?
>
> Please have a look at the full source of my topology:
> https://github.com/millecker/storm-apps/blob/master/sentiment_analysis_svm/src/at/illecker/storm/sentimentanalysis/svm/SentimentAnalysisSVMTopology.java
>
> Thanks!
>
> 2015-02-26 17:31 GMT+01:00 Harsha <[email protected]>:
>
>> Martin,
>> I can't find anything wrong in the logs or in your TopologyBuilder code.
>> In your bolt code, how are you acking the tuples? You've got
>> maxSpoutPending set to 2k tuples; do you see anywhere in your bolt code
>> where it could be hanging before acking the tuple?
>>
>> -Harsha
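For reference, the usual explicit-ack pattern in a bolt looks roughly like the sketch below. This is a generic example, not the code of my actual tokenizer or preprocessor bolts; bolts that extend BaseBasicBolt instead ack automatically when execute() returns.

import java.util.Map;

import backtype.storm.task.OutputCollector;
import backtype.storm.task.TopologyContext;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.base.BaseRichBolt;
import backtype.storm.tuple.Fields;
import backtype.storm.tuple.Tuple;
import backtype.storm.tuple.Values;

// Generic sketch of explicit acking in a BaseRichBolt.
public class AckingBoltSketch extends BaseRichBolt {
  private OutputCollector collector;

  @Override
  public void prepare(Map conf, TopologyContext context, OutputCollector collector) {
    this.collector = collector;
  }

  @Override
  public void execute(Tuple input) {
    String text = input.getString(0);
    // Anchor the output to the input tuple so a downstream failure fails the
    // whole tuple tree, then ack. If ack() is never reached, the tuple stays
    // pending (and counts against maxSpoutPending when the spout emitted it
    // with a message ID) until it times out.
    collector.emit(input, new Values(text.toLowerCase()));
    collector.ack(input);
  }

  @Override
  public void declareOutputFields(OutputFieldsDeclarer declarer) {
    declarer.declare(new Fields("text"));
  }
}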
>> On Wed, Feb 25, 2015, at 09:02 AM, Martin Illecker wrote:
>>
>> How can I find out why workers stop getting tuples after they have
>> successfully processed a few thousand?
>>
>> I have also tested *allGrouping* to ensure that every bolt must receive
>> tuples, but two workers, including two bolts, stop receiving tuples after
>> a few seconds.
>>
>> I would appreciate any help!
>>
>> 2015-02-25 17:40 GMT+01:00 Harsha <[email protected]>:
>>
>> My bad, I was looking at another supervisor.log. There are no errors in
>> the supervisor and worker logs.
>>
>> -Harsha
>>
>> On Wed, Feb 25, 2015, at 08:29 AM, Martin Illecker wrote:
>>
>> Hi Harsha,
>>
>> I'm using three c3.4xlarge EC2 instances:
>> 1) Nimbus, WebUI, Zookeeper, Supervisor
>> 2) Zookeeper, Supervisor
>> 3) Zookeeper, Supervisor
>>
>> I cannot find this error message in my attached supervisor log.
>> By the way, I'm running on Ubuntu EC2 nodes and there is no path C:\.
>>
>> I have not made any changes to these timeout values; they should be the
>> defaults:
>> storm.zookeeper.session.timeout: 20000
>> storm.zookeeper.connection.timeout: 15000
>> supervisor.worker.timeout.secs: 30
>>
>> Thanks!
>> Best regards
>> Martin
>>
>> 2015-02-25 17:03 GMT+01:00 Harsha <[email protected]>:
>>
>> Hi Martin,
>> Can you share your storm.zookeeper.session.timeout,
>> storm.zookeeper.connection.timeout and supervisor.worker.timeout.secs?
>> By looking at the supervisor logs I see
>>
>> Error when processing event
>> java.io.FileNotFoundException: File
>> 'c:\hdistorm\workers\f3e70029-c5c8-4f55-a4a1-396096b37509\heartbeats\1417082031858'
>>
>> You might be running into
>> https://issues.apache.org/jira/browse/STORM-682
>> Is your Zookeeper cluster on a different set of nodes, and can you check
>> that you are able to connect to it without any issues?
>> -Harsha
>>
>> On Wed, Feb 25, 2015, at 03:49 AM, Martin Illecker wrote:
>>
>> Hi,
>>
>> I'm still observing this strange issue.
>> Two of three workers stop processing after a few seconds (each worker is
>> running on one dedicated EC2 node).
>>
>> My guess would be that the output stream of the one spout is not properly
>> distributed over all three workers, or is somehow directed to one worker
>> only. But *shuffleGrouping* should guarantee an equal distribution among
>> multiple bolts, right?
>>
>> I'm using the following topology:
>>
>> TopologyBuilder builder = new TopologyBuilder();
>> builder.setSpout("dataset-spout", spout);
>> builder.setBolt("tokenizer-bolt", tokenizerBolt, 3)
>>     .shuffleGrouping("dataset-spout");
>> builder.setBolt("preprocessor-bolt", preprocessorBolt, 3)
>>     .shuffleGrouping("tokenizer-bolt");
>>
>> conf.setMaxSpoutPending(2000);
>> conf.setNumWorkers(3);
>>
>> StormSubmitter.submitTopology(TOPOLOGY_NAME, conf, builder.createTopology());
>>
>> I have attached screenshots of the topology and the truncated worker and
>> supervisor logs of one idle worker.
>>
>> The supervisor log includes a few interesting lines, but I think they are
>> normal:
>>
>> supervisor [INFO] e76bc338-2ba5-444b-9854-bca94f9587b7 still hasn't started
>>
>> I hope someone can help me with this issue!
>>
>> Thanks
>> Best regards
>> Martin
>>
>> 2015-02-24 20:37 GMT+01:00 Martin Illecker <[email protected]>:
>>
>> Hi,
>>
>> I'm trying to run a topology on EC2, but I'm observing the following
>> strange issue: some workers stop processing after a few seconds, without
>> any error in the worker log.
>>
>> For example, my topology consists of 3 workers and each worker is running
>> on its own EC2 node. Two of them stop processing after a few seconds, but
>> they have already processed several tuples successfully.
>>
>> I'm using only one spout and shuffleGrouping at all bolts.
>> If I add more spouts, then all workers keep processing, but the
>> performance is very bad.
>>
>> Does anyone have a guess why this happens?
>>
>> The topology is currently running at:
>> http://54.155.156.203:8080
>>
>> Thanks!
>>
>> Martin
>>
>> Email had 4 attachments:
>> - topology.jpeg 161k (image/jpeg)
>> - component.jpeg 183k (image/jpeg)
>> - supervisor.log 7k (application/octet-stream)
>> - worker.log 37k (application/octet-stream)
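For completeness, the five-worker experiment I described in my mail from 18:30 looks roughly like the sketch below. The class and topology names are placeholders, and the spout and bolt instances are passed in rather than constructed here; this is not a copy of the real SentimentAnalysisSVMTopology code.

import backtype.storm.Config;
import backtype.storm.StormSubmitter;
import backtype.storm.topology.IRichBolt;
import backtype.storm.topology.IRichSpout;
import backtype.storm.topology.TopologyBuilder;

// Sketch of the five-worker experiment: one spout task feeding a bolt of
// parallelism 5, spread over five workers (one worker per EC2 node).
public class FiveWorkerSketch {
  public static void submit(IRichSpout spout, IRichBolt firstBolt) throws Exception {
    TopologyBuilder builder = new TopologyBuilder();
    builder.setSpout("dataset-spout", spout, 1);   // spout parallelism 1
    builder.setBolt("tokenizer-bolt", firstBolt, 5)
        .shuffleGrouping("dataset-spout");         // also tried allGrouping

    Config conf = new Config();
    conf.setNumWorkers(5);
    conf.setMaxSpoutPending(50);                   // tried values from 5000 down to 50

    StormSubmitter.submitTopology("five-worker-test", conf, builder.createTopology());
  }
}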
