Which version of Pyleus are you using?
AFAIK no released version of Pyleus supports the heartbeat mechanism
introduced in Storm 0.9.3; that support is only on the develop branch.

Could you check your worker / shellbolt subprocess log, and paste it if you
don't mind?
On Wed, Mar 11, 2015 at 4:39 AM Victor Godoy Poluceno <
[email protected]> wrote:

> Hi,
>
> I have a log-processing topology on Storm 0.9.3, with a Kafka spout and
> bolts implemented via ShellBolt/multilang through Pyleus (Python).
>
> Everything runs fine at a normal pace (when there is no big lag to be
> consumed). But with any offset lag higher than 100 messages per partition,
> the topology stops processing/acking tuples after handling only a few,
> around 200 in total.
>
> The whole pipeline is very CPU-bound, as I need to parse, roll up, and
> aggregate metrics before sending time-series data to Cassandra. Also, we
> are processing big tuples, around 0.8 MB each.
>
> This topology has 1 spout and 2 bolts: one bolt that aggregates and another
> that just commits. The bolts are connected using a fields grouping.
>
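For anyone following along, in Pyleus that bolt wiring normally lives in
pyleus_topology.yaml; a rough fragment (component names and field names are
invented here, not taken from this topology) looks like:

    - bolt:
        name: aggregate-bolt
        module: my_topology.aggregate_bolt
        groupings:
            - fields_grouping:
                component: kafka-spout
                fields: ["key"]

    - bolt:
        name: commit-bolt
        module: my_topology.commit_bolt
        groupings:
            - fields_grouping:
                component: aggregate-bolt
                fields: ["key"]
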
> My storm.yml settings:
>
> worker.childopts: "-Xmx1G -Dcom.sun.management.jmxremote
> -Dcom.sun.management.jmxremote.port=1%ID%
> -Dcom.sun.management.jmxremote.authenticate=false
> -Dcom.sun.management.jmxremote.ssl=false -Djava.net.preferIPv4Stack=true"
> storm.messaging.netty.max_wait_ms: 3000
> storm.messaging.netty.min_wait_ms: 300
> storm.messaging.netty.max_retries: 90
>
> storm.messaging.netty.buffer_size: 1048576 #1MB buffer
> topology.receiver.buffer.size: 2
> topology.transfer.buffer.size: 4
> topology.executor.receive.buffer.size: 2
> topology.executor.send.buffer.size: 2
>
>
> I've reduced all the buffer configs because of the size of my tuples, but
> that didn't show any improvement. This topology runs on 2 dedicated servers
> with 8 cores and 16 GB of RAM each. I'm running 12 workers, with a
> parallelism hint of 12 for the Kafka spout, 12 for the aggregate bolt, and
> 12 for the commit bolt.
>
> Also, I've set max spout pending to 100 and max shell bolt pending to
> 1000, but changing these numbers didn't seem to have any real impact.
>
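In case it helps to compare exact keys: those two settings usually correspond
to the Storm configs below (worth double-checking the key names against your
0.9.3 setup):

    topology.max.spout.pending: 100
    topology.shellbolt.max.pending: 1000
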
> After a lot of investigation I've narrowed the problem down to emitting
> tuples from the aggregate bolt to the commit bolt. If I just don't emit any
> tuples from the aggregate bolt, everything works fine. I don't know whether
> this has anything to do with ShellBolt or with Pyleus itself.
>
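For comparison, a bare-bones Pyleus bolt that emits anchored tuples downstream
looks roughly like the sketch below (class, field, and helper names are made
up, not taken from this topology):

    from pyleus.storm import SimpleBolt


    class AggregateBolt(SimpleBolt):
        # Fields sent downstream to the commit bolt; names here are invented.
        OUTPUT_FIELDS = ["key", "aggregated_metrics"]

        def process_tuple(self, tup):
            # Assuming the upstream component emits (key, metrics) pairs.
            key, metrics = tup.values
            aggregated = sum(metrics)  # stand-in for the real aggregation work
            # Anchor explicitly to the input tuple to keep the ack tree intact
            # (if I remember right, SimpleBolt also handles anchoring for you).
            self.emit((key, aggregated), anchors=[tup])


    if __name__ == '__main__':
        AggregateBolt().run()

As far as I remember, SimpleBolt only acks the input after process_tuple
returns, so long per-tuple work on 0.8 MB payloads also delays the acks the
spout is waiting on; that's the kind of thing the subprocess log should show.
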
> This screenshot
> <https://s3-sa-east-1.amazonaws.com/uploads-br.hipchat.com/114883/858008/SfP6b6YsX4k8jHo/screen.jpg>
> shows the topology running over a 5-minute window with around 150 messages
> of lag on each Kafka partition (Kafka with 12 partitions).
>
> Any thoughts on why this is happening?
>
> --
> hooray!
>
> --
> Victor Godoy Poluceno
>
