Thank you for taking the time to respond.

In my bolt I am registering 3 custom metrics (each a ReducedMetric that tracks the latency of an individual operation in the bolt). The metric interval for each is the same as TOPOLOGY_BUILTIN_METRICS_BUCKET_SIZE_SECS, which we have set to 60s.

The topology did not hang completely, but it did degrade severely. Without metrics it was hard to tell, but it looked like some of the tasks for certain Kafka partitions either stopped emitting tuples or never got acknowledgements for the tuples they did emit. Some tuples were definitely making it through, though, because data was continuously being inserted into Cassandra. After I killed and resubmitted the topology, there were still messages left over in the topic, but only for certain partitions.

What queue configuration are you looking for?

I don't believe the problem was that the graphite metrics consumer wasn't "keeping up". In Storm UI, the processing latency was very low for that pseudo-bolt, as was the capacity. Storm UI just showed that no tuples were being delivered to the bolt.

Thanks!

On Thu, Apr 14, 2016 at 9:00 PM, Jungtaek Lim <[email protected]> wrote:

> Kevin,
>
> Do you register custom metrics? If then how long / vary is their intervals?
> Did your topology not working completely? (I mean did all tuples become
> failing after that time?)
> And could you share your queue configuration?
>
> And you can replace storm-graphite to LoggingMetricsConsumer and see it
> helps. If changing consumer resolves the issue, we can guess storm-graphite
> cannot keep up the metrics.
>
> Btw, I'm addressing metrics consumer issues (asynchronous, filter).
> You can track the progress here:
> https://issues.apache.org/jira/browse/STORM-1699
>
> I'm afraid they may be not ported to 0.10.x, but asynchronous metrics
> consumer bolt <https://issues.apache.org/jira/browse/STORM-1698> is a
> simple patch so you can apply and build custom 0.10.0, and give it a try.
>
> Hope this helps.
>
> Thanks,
> Jungtaek Lim (HeartSaVioR)
>
>
> On Thu, Apr 14, 2016 at 11:06 PM, Denis DEBARBIEUX <[email protected]> wrote:
>
>> Hi Kevin,
>>
>> I have a similar issue with storm 0.9.6 (see the following topic
>> https://mail-archives.apache.org/mod_mbox/storm-user/201603.mbox/browser
>> ).
>>
>> It is still open. So, please, keep me informed on your progress.
>>
>> Denis
>>
>>
>> On 14/04/2016 15:54, Kevin Conaway wrote:
>>
>> We are using Storm 0.10 with the following configuration:
>>
>>    - 1 Nimbus node
>>    - 6 Supervisor nodes, each with 2 worker slots. Each supervisor has
>>      8 cores.
>>
>>
>> Our topology has a KafkaSpout that forwards to a bolt where we transform
>> the message and insert it in to Cassandra. Our topic has 50 partitions so
>> we have configured the number of executors/tasks for the KafkaSpout to be
>> 50. Our bolt has 150 executors/tasks.
>>
>> We have also added the storm-graphite metrics consumer
>> (https://github.com/verisign/storm-graphite) to our topology so that
>> storms metrics are sent to our graphite cluster.
>>
>> Yesterday we were running a 2000 tuple/sec load test and everything was
>> fine for a few hours until we noticed that we were no longer receiving
>> metrics from Storm in graphite.
>>
>> I verified that its not a connectivity issue between the Storm and
>> Graphite. Looking in Storm UI,
>> the __metricscom.verisign.storm.metrics.GraphiteMetricsConsumer hadn't
>> received a single tuple in the prior 10 minute or 3 hour window.
>>
>> Since the metrics consumer bolt was assigned to one executor, I took
>> thread dumps of that JVM. I saw the following stack trace for the metrics
>> consumer thread:
>>
>>

--
Kevin Conaway
http://www.linkedin.com/pub/kevin-conaway/7/107/580/
https://github.com/kevinconaway
