I'm using Pyleus 0.2.4, but I've backported Pyleus's heartbeat-support changes [1] into my own Python code, so my bolts should be responding to heartbeat commands.
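For context, the heartbeat handling that backport adds works roughly like this in Storm 0.9.3's multilang protocol: the parent ShellBolt periodically sends a tuple on the `__heartbeat` stream, and the subprocess must answer with a `sync` command or the worker assumes it is hung. This is only a minimal sketch of that protocol exchange — the function names are mine, not Pyleus's actual code:

```python
import json
import sys

def read_message():
    """Read one multilang message: JSON lines terminated by a line 'end'."""
    lines = []
    while True:
        line = sys.stdin.readline()
        if not line:
            raise EOFError("stdin closed by the parent ShellBolt")
        line = line.rstrip("\n")
        if line == "end":
            break
        lines.append(line)
    return json.loads("\n".join(lines))

def send_message(msg):
    """Write one multilang message back to the parent ShellBolt."""
    sys.stdout.write(json.dumps(msg) + "\nend\n")
    sys.stdout.flush()

def bolt_loop(process_tuple):
    """Main loop: answer heartbeats, hand real tuples to process_tuple."""
    while True:
        tup = read_message()
        # Heartbeat tuples arrive on the "__heartbeat" stream (task -1);
        # the subprocess must reply with a "sync" command, otherwise the
        # parent ShellBolt considers it dead and restarts the worker.
        if tup.get("stream") == "__heartbeat":
            send_message({"command": "sync"})
            continue
        process_tuple(tup)
```

The key point is that heartbeats share the same stdin/stdout pipe as data tuples, so a subprocess stuck in a long CPU-bound `process_tuple` call cannot answer them in time.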
These are logs from 2 of my 6 workers, plus the supervisor logs:

https://gist.github.com/victorpoluceno/0c296eac4f5954f7377c
https://gist.github.com/victorpoluceno/50dc1f1f7782ca4a86ce
https://gist.github.com/victorpoluceno/0020623046cd3807e802

Thank you for your time.

[1] - https://github.com/Yelp/pyleus/commit/a929e425cd86d6f9cca87c6b5d37664184f19d99

On Wed, Mar 11, 2015 at 10:23 AM, 임정택 <[email protected]> wrote:

> What version of Pyleus do you use?
> AFAIK Pyleus doesn't have any released version that supports the heartbeat
> introduced in 0.9.3; it's on the develop branch.
>
> Could you check your worker / ShellBolt subprocess logs, and paste them if
> you don't mind?
>
> On Wed, Mar 11, 2015 at 4:39 AM Victor Godoy Poluceno <
> [email protected]> wrote:
>
>> Hi,
>>
>> I have a log-processing topology on Storm 0.9.3 with a Kafka spout and
>> bolts implemented as ShellBolts via multilang through Pyleus (Python).
>>
>> Everything runs fine at a normal pace (when there is no big lag to be
>> consumed). But with any offset lag higher than 100 messages per partition,
>> the topology stops processing/acking tuples after only a few have been
>> processed, around 200 in total.
>>
>> The processing is very CPU-bound, as I need to parse, roll up, and
>> aggregate metrics before sending time-series data to Cassandra. We are
>> also processing big tuples, around 0.8 MB each.
>>
>> The topology has 1 spout and 2 bolts: one that aggregates and one that
>> just commits. The bolts are connected with a fields grouping.
>>
>> My storm.yml settings:
>>
>> worker.childopts: "-Xmx1G -Dcom.sun.management.jmxremote
>>   -Dcom.sun.management.jmxremote.port=1%ID%
>>   -Dcom.sun.management.jmxremote.authenticate=false
>>   -Dcom.sun.management.jmxremote.ssl=false -Djava.net.preferIPv4Stack=true"
>> storm.messaging.netty.max_wait_ms: 3000
>> storm.messaging.netty.min_wait_ms: 300
>> storm.messaging.netty.max_retries: 90
>>
>> storm.messaging.netty.buffer_size: 1048576  # 1 MB buffer
>> topology.receiver.buffer.size: 2
>> topology.transfer.buffer.size: 4
>> topology.executor.receive.buffer.size: 2
>> topology.executor.send.buffer.size: 2
>>
>> I've reduced all the buffer settings because of the size of my tuples,
>> but that didn't show any improvement. The topology runs on 2 dedicated
>> servers, each with 8 cores and 16 GB of RAM. I'm running 12 workers, with
>> a parallelism hint of 12 for the Kafka spout, 12 for the aggregate bolt,
>> and 12 for the commit bolt.
>>
>> I've also set max spout pending to 100 and max shell bolt pending to
>> 1000, but changing these numbers didn't seem to have any real impact.
>>
>> After a lot of investigation I've narrowed the problem down to emitting
>> tuples from the aggregate bolt to the commit bolt: if I simply don't emit
>> any tuples from the aggregate bolt, everything works fine. I don't know
>> whether this has anything to do with ShellBolt or with Pyleus itself.
>>
>> This screenshot
>> <https://s3-sa-east-1.amazonaws.com/uploads-br.hipchat.com/114883/858008/SfP6b6YsX4k8jHo/screen.jpg>
>> shows the topology running during a 5-minute window with around 150
>> messages of lag on each Kafka partition (Kafka has 12 partitions).
>>
>> Any thoughts on why this is happening?
>>
>> --
>> hooray!
>>
>> --
>> Victor Godoy Poluceno

--
hooray!

--
Victor Godoy Poluceno
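A footnote for readers hitting the same wall: in multilang, every tuple a Python bolt emits is one more JSON message pushed through the subprocess's stdout pipe to the parent ShellBolt, which may help explain why emitting ~0.8 MB tuples from the aggregate bolt becomes the choke point. A minimal sketch of what a bolt-side emit looks like on the wire — `emit_tuple` is an illustrative helper of mine, not Pyleus's API:

```python
import json
import sys

def emit_tuple(values, stream=None, anchors=None):
    """Serialize a bolt 'emit' as one multilang message (illustrative).

    The whole tuple is JSON-encoded and written through the subprocess's
    stdout pipe, so with ~0.8 MB tuples each emit moves roughly a megabyte
    of JSON across that pipe before the worker can even route it.
    """
    msg = {"command": "emit", "tuple": values}
    if stream is not None:
        msg["stream"] = stream
    if anchors is not None:
        msg["anchors"] = anchors  # tuple ids this emit is anchored to
    sys.stdout.write(json.dumps(msg) + "\nend\n")
    sys.stdout.flush()
```

Since heartbeats travel over the same pipe, a writer blocked on a large emit also delays heartbeat replies.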
