What version of Pyleus are you using? AFAIK no released version of Pyleus supports the heartbeat mechanism introduced in Storm 0.9.3; it's only on the develop branch.
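Also, to help narrow down whether this is ShellBolt or Pyleus itself, it might be worth running a stripped-down version of the aggregate bolt that does nothing but forward tuples. A rough sketch, based on the Pyleus SimpleBolt API (the class name and field handling here are placeholders, not taken from your topology):

    # Pass-through stand-in for the aggregate bolt: no parsing or
    # aggregation, just forward the incoming values downstream.
    from pyleus.storm import SimpleBolt

    class PassThroughBolt(SimpleBolt):

        def process_tuple(self, tup):
            # Re-emit the incoming values, anchored to the input tuple;
            # SimpleBolt is expected to ack the tuple automatically after
            # this method returns.
            self.emit(tuple(tup.values), anchors=[tup])

    if __name__ == '__main__':
        PassThroughBolt().run()

If that minimal bolt also stalls once it starts emitting 0.8 MB tuples to the commit bolt, that would point at the multilang stdin/stdout pipe rather than your aggregation logic.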
Could you check your worker / ShellBolt subprocess logs, and paste them here if you don't mind?

On Wed, Mar 11, 2015 at 4:39 AM Victor Godoy Poluceno <[email protected]> wrote:

> Hi,
>
> I have a log-processing topology using Storm 0.9.3 with Kafka as the spout
> and bolts using ShellBolt with multilang through Pyleus (Python).
>
> Everything runs fine at a normal pace (if there is no big lag to be
> consumed). But with any offset lag higher than 100 messages per partition,
> this topology stops processing/acking tuples after processing only a few
> tuples, around 200 in total.
>
> The whole processing is very CPU bound, as I need to parse, roll up and
> aggregate metrics before sending time series data to Cassandra. By the way,
> we are processing big tuples, around 0.8 MB each.
>
> This topology has 1 spout and 2 bolts: one bolt that aggregates and another
> that just commits. The bolts are connected using a fields grouping.
>
> My storm.yml settings:
>
> worker.childopts: "-Xmx1G -Dcom.sun.management.jmxremote
> -Dcom.sun.management.jmxremote.port=1%ID%
> -Dcom.sun.management.jmxremote.authenticate=false
> -Dcom.sun.management.jmxremote.ssl=false -Djava.net.preferIPv4Stack=true"
> storm.messaging.netty.max_wait_ms: 3000
> storm.messaging.netty.min_wait_ms: 300
> storm.messaging.netty.max_retries: 90
>
> storm.messaging.netty.buffer_size: 1048576 #1MB buffer
> topology.receiver.buffer.size: 2
> topology.transfer.buffer.size: 4
> topology.executor.receive.buffer.size: 2
> topology.executor.send.buffer.size: 2
>
> I've reduced all the buffer settings because of the size of my tuples, but
> that didn't show any improvement. This topology runs on 2 dedicated servers
> with 8 cores and 16 GB of RAM each. I'm running 12 workers, with a
> parallelism hint of 12 for the Kafka spout, 12 for the aggregate bolt, and
> 12 for the commit bolt.
>
> Also, I've set max spout pending to 100 and max shell bolt pending to
> 1000, but changing these numbers didn't seem to have any real impact.
>
> After a lot of investigation I've narrowed the problem down to emitting
> tuples from the aggregate bolt to the commit bolt. If I just don't emit any
> tuples from the aggregate bolt, everything works fine. I don't know whether
> this has anything to do with ShellBolt or with Pyleus itself.
>
> This screenshot
> <https://s3-sa-east-1.amazonaws.com/uploads-br.hipchat.com/114883/858008/SfP6b6YsX4k8jHo/screen.jpg>
> shows this topology running during a 5-minute window with a lag of around
> 150 on each Kafka partition (Kafka with 12 partitions).
>
> Any thoughts on why this is happening?
>
> --
> hooray!
>
> --
> Victor Godoy Poluceno
