I'm using Pyleus 0.2.4, but I've backported Pyleus's heartbeat-support changes [1] into my own Python code, so my bolts should be responding to heartbeat commands.
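For context, the heartbeat handling that backport adds works roughly like this in Storm 0.9.3's multilang protocol: the parent ShellBolt periodically sends a tuple on the `__heartbeat` stream, and the subprocess must answer with a `sync` command or the worker assumes it is hung. This is only a minimal sketch of that protocol exchange — the function names are mine, not Pyleus's actual code:

```python
import json
import sys

def read_message():
    """Read one multilang message: JSON lines terminated by a line 'end'."""
    lines = []
    while True:
        line = sys.stdin.readline()
        if not line:
            raise EOFError("stdin closed by the parent ShellBolt")
        line = line.rstrip("\n")
        if line == "end":
            break
        lines.append(line)
    return json.loads("\n".join(lines))

def send_message(msg):
    """Write one multilang message back to the parent ShellBolt."""
    sys.stdout.write(json.dumps(msg) + "\nend\n")
    sys.stdout.flush()

def bolt_loop(process_tuple):
    """Main loop: answer heartbeats, hand real tuples to process_tuple."""
    while True:
        tup = read_message()
        # Heartbeat tuples arrive on the "__heartbeat" stream (task -1);
        # the subprocess must reply with a "sync" command, otherwise the
        # parent ShellBolt considers it dead and restarts the worker.
        if tup.get("stream") == "__heartbeat":
            send_message({"command": "sync"})
            continue
        process_tuple(tup)
```

The key point is that heartbeats share the same stdin/stdout pipe as data tuples, so a subprocess stuck in a long CPU-bound `process_tuple` call cannot answer them in time.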
These are logs from 2 of my 6 workers, plus the supervisor logs:

https://gist.github.com/victorpoluceno/0c296eac4f5954f7377c
https://gist.github.com/victorpoluceno/50dc1f1f7782ca4a86ce
https://gist.github.com/victorpoluceno/0020623046cd3807e802

Thank you for your time.

[1] - https://github.com/Yelp/pyleus/commit/a929e425cd86d6f9cca87c6b5d37664184f19d99

On Wed, Mar 11, 2015 at 10:23 AM, 임정택 <[email protected]> wrote:

> What version of Pyleus do you use?
> AFAIK Pyleus doesn't have any released version that supports the heartbeat
> introduced in 0.9.3; it's on the develop branch.
>
> Could you check your worker / ShellBolt subprocess logs, and paste them if
> you don't mind?
>
> On Wed, Mar 11, 2015 at 4:39 AM Victor Godoy Poluceno <
> [email protected]> wrote:
>
>> Hi,
>>
>> I have a log-processing topology on Storm 0.9.3 with a Kafka spout and
>> bolts implemented as ShellBolts via multilang through Pyleus (Python).
>>
>> Everything runs fine at a normal pace (when there is no big lag to be
>> consumed). But with any offset lag higher than 100 messages per partition,
>> the topology stops processing/acking tuples after only a few have been
>> processed, around 200 in total.
>>
>> The processing is very CPU-bound, as I need to parse, roll up, and
>> aggregate metrics before sending time-series data to Cassandra. We are
>> also processing big tuples, around 0.8 MB each.
>>
>> The topology has 1 spout and 2 bolts: one that aggregates and one that
>> just commits. The bolts are connected with a fields grouping.
>>
>> My storm.yml settings:
>>
>> worker.childopts: "-Xmx1G -Dcom.sun.management.jmxremote
>>   -Dcom.sun.management.jmxremote.port=1%ID%
>>   -Dcom.sun.management.jmxremote.authenticate=false
>>   -Dcom.sun.management.jmxremote.ssl=false -Djava.net.preferIPv4Stack=true"
>> storm.messaging.netty.max_wait_ms: 3000
>> storm.messaging.netty.min_wait_ms: 300
>> storm.messaging.netty.max_retries: 90
>>
>> storm.messaging.netty.buffer_size: 1048576  # 1 MB buffer
>> topology.receiver.buffer.size: 2
>> topology.transfer.buffer.size: 4
>> topology.executor.receive.buffer.size: 2
>> topology.executor.send.buffer.size: 2
>>
>> I've reduced all the buffer settings because of the size of my tuples,
>> but that didn't show any improvement. The topology runs on 2 dedicated
>> servers, each with 8 cores and 16 GB of RAM. I'm running 12 workers, with
>> a parallelism hint of 12 for the Kafka spout, 12 for the aggregate bolt,
>> and 12 for the commit bolt.
>>
>> I've also set max spout pending to 100 and max shell bolt pending to
>> 1000, but changing these numbers didn't seem to have any real impact.
>>
>> After a lot of investigation I've narrowed the problem down to emitting
>> tuples from the aggregate bolt to the commit bolt: if I simply don't emit
>> any tuples from the aggregate bolt, everything works fine. I don't know
>> whether this has anything to do with ShellBolt or with Pyleus itself.
>>
>> This screenshot
>> <https://s3-sa-east-1.amazonaws.com/uploads-br.hipchat.com/114883/858008/SfP6b6YsX4k8jHo/screen.jpg>
>> shows the topology running during a 5-minute window with around 150
>> messages of lag on each Kafka partition (Kafka has 12 partitions).
>>
>> Any thoughts on why this is happening?
>>
>> --
>> hooray!
>>
>> --
>> Victor Godoy Poluceno

--
hooray!

--
Victor Godoy Poluceno
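A footnote for readers hitting the same wall: in multilang, every tuple a Python bolt emits is one more JSON message pushed through the subprocess's stdout pipe to the parent ShellBolt, which may help explain why emitting ~0.8 MB tuples from the aggregate bolt becomes the choke point. A minimal sketch of what a bolt-side emit looks like on the wire — `emit_tuple` is an illustrative helper of mine, not Pyleus's API:

```python
import json
import sys

def emit_tuple(values, stream=None, anchors=None):
    """Serialize a bolt 'emit' as one multilang message (illustrative).

    The whole tuple is JSON-encoded and written through the subprocess's
    stdout pipe, so with ~0.8 MB tuples each emit moves roughly a megabyte
    of JSON across that pipe before the worker can even route it.
    """
    msg = {"command": "emit", "tuple": values}
    if stream is not None:
        msg["stream"] = stream
    if anchors is not None:
        msg["anchors"] = anchors  # tuple ids this emit is anchored to
    sys.stdout.write(json.dumps(msg) + "\nend\n")
    sys.stdout.flush()
```

Since heartbeats travel over the same pipe, a writer blocked on a large emit also delays heartbeat replies.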
