Re: why complete latency and failure rate is so high of my spout.

Nathan Leung Wed, 30 Jul 2014 09:45:41 -0700

topology.max.spout.pending sets how many messages each spout task can have
un-acked (pending) before it stops emitting.  So in your example, each
spout task can have 10,000 messages waiting to be acked before it throttles
itself and stops emitting.  As those messages are acked, it can emit more.
This matters because you do not want too much data pending: complete
latency is counted from the moment the spout emits the tuple, even while
the tuple is just waiting in the spout's output queue, so a large backlog
increases the measured time to process each message.  If a message is not
acked at the spout before topology.message.timeout.secs elapses, it is
counted as a failure, which is why you were seeing so many failures.
Raising the timeout from 30s to 90s probably also played a big role in
reducing your failure count.
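
To make the relationship concrete, here is a rough back-of-the-envelope
sketch (not anything Storm provides).  It assumes the spouts are saturated,
so every spout task sits at its max-pending cap, and applies Little's Law:
time in system = items in flight / throughput.  The function names and the
400 acks/sec throughput are hypothetical, picked only for illustration.

```python
def estimate_complete_latency_secs(max_spout_pending, spout_tasks, acked_per_sec):
    """Estimate steady-state complete latency via Little's Law.

    Assumes every spout task is at its max-pending cap (a saturated
    topology); acked_per_sec is the topology-wide ack rate.
    """
    if acked_per_sec <= 0:
        raise ValueError("acked_per_sec must be positive")
    in_flight = max_spout_pending * spout_tasks
    return in_flight / acked_per_sec


def will_time_out(latency_secs, message_timeout_secs):
    """True if a typical message would exceed the message timeout."""
    return latency_secs > message_timeout_secs


# Hypothetical throughput: 400 tuples acked per second across the topology.
latency = estimate_complete_latency_secs(10000, 2, 400.0)
print(latency)                     # 50.0
print(will_time_out(latency, 30))  # True:  most messages fail at 30s
print(will_time_out(latency, 90))  # False: they fit within 90s
```

Under those assumed numbers the backlog alone accounts for 50 seconds of
complete latency, which would blow a 30s timeout but fit within 90s.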


The complete latency (about 43 seconds) still seems high; setting max spout
pending to a lower value may help reduce it, since fewer messages would sit
waiting in the queues.
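
One rough way to pick a lower value, sketched under the assumption that the
topology is saturated and Little's Law applies (in flight = throughput x
latency).  The function name and the example numbers (5 second target, 400
acks/sec) are hypothetical and not taken from your topology; you would need
to measure your own ack rate.

```python
import math


def max_pending_for_target(target_latency_secs, acked_per_sec, spout_tasks):
    """Suggest a per-task topology.max.spout.pending for a target latency.

    Little's Law: total in-flight = throughput * latency, then split the
    total evenly across spout tasks.  Never returns less than 1.
    """
    if spout_tasks <= 0:
        raise ValueError("spout_tasks must be positive")
    total_in_flight = target_latency_secs * acked_per_sec
    return max(1, math.floor(total_in_flight / spout_tasks))


# Hypothetical: 400 acks/sec overall, 2 spout tasks, 5 second target.
print(max_pending_for_target(5.0, 400.0, 2))  # 1000
```

Treat the result as a starting point to tune from, not a definitive
setting; too low a value can starve the bolts and reduce throughput.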


On Wed, Jul 30, 2014 at 12:38 PM, Wei, Xin <[email protected]> wrote:

>  Hi There,
>
>  Yesterday I changed some Storm configuration settings, and the spout
> failure rate has now dropped to 0, as shown below:
>
>   Topology stats
>
>   Window       Emitted  Transferred  Complete latency (ms)  Acked  Failed
>   10m 0s       8766     8766         43077.391              5290   0
>   3h 0m 0s     8766     8766         43077.391              5290   0
>   1d 0h 0m 0s  8766     8766         43077.391              5290   0
>   All time     8766     8766         43077.391              5290   0
>
>   Window links:
>   10m 0s: <http://pppdc9prd470.corp.intuit.net:8080/topology/nearline-3-1406737061?window=600>
>   3h 0m 0s: <http://pppdc9prd470.corp.intuit.net:8080/topology/nearline-3-1406737061?window=10800>
>   1d 0h 0m 0s: <http://pppdc9prd470.corp.intuit.net:8080/topology/nearline-3-1406737061?window=86400>
>   All time: <http://pppdc9prd470.corp.intuit.net:8080/topology/nearline-3-1406737061?window=%3Aall-time>
>
>   Spouts (All time)
>
>   Id               Executors  Tasks  Emitted  Transferred  Complete latency (ms)  Acked  Failed  Last error
>   JMS_QUEUE_SPOUT  2          2      5290     5290         43077.391              5290   0
>
>   Spout links:
>   JMS_QUEUE_SPOUT: <http://pppdc9prd470.corp.intuit.net:8080/topology/nearline-3-1406737061/component/JMS_QUEUE_SPOUT>
>
>   Bolts (All time)
>
>   Id                    Executors  Tasks  Emitted  Transferred  Capacity (last 10m)  Execute latency (ms)  Executed  Process latency (ms)  Acked  Failed  Last error
>   AGGREGATOR_BOLT       8          8      1738     1738         0.080                83.264                1738      81.243                1738   0
>   MESSAGEFILTER_BOLT    8          8      1738     1738         0.091                29.833                5290      24.918                5290   0
>   OFFER_GENERATOR_BOLT  8          8      0        0            0.031                25.993                1738      24.296                1738   0
>
>   Bolt links:
>   AGGREGATOR_BOLT: <http://pppdc9prd470.corp.intuit.net:8080/topology/nearline-3-1406737061/component/AGGREGATOR_BOLT>
>   MESSAGEFILTER_BOLT: <http://pppdc9prd470.corp.intuit.net:8080/topology/nearline-3-1406737061/component/MESSAGEFILTER_BOLT>
>   OFFER_GENERATOR_BOLT: <http://pppdc9prd470.corp.intuit.net:8080/topology/nearline-3-1406737061/component/OFFER_GENERATOR_BOLT>
>
>   The topology configuration is listed below:
>
>   Topology Configuration
>
>   Key                                           Value
>   dev.zookeeper.path                            /tmp/dev-storm-zookeeper
>   drpc.childopts                                -Xmx768m
>   drpc.invocations.port                         3773
>   drpc.port                                     3772
>   drpc.queue.size                               128
>   drpc.request.timeout.secs                     600
>   drpc.worker.threads                           64
>   java.library.path                             /usr/local/lib
>   logviewer.appender.name                       A1
>   logviewer.childopts                           -Xmx128m
>   logviewer.port                                8000
>   nimbus.childopts                              -Xmx1024m -Djava.net.preferIPv4Stack=true
>   nimbus.cleanup.inbox.freq.secs                600
>   nimbus.file.copy.expiration.secs              600
>   nimbus.host                                   zookeeper
>   nimbus.inbox.jar.expiration.secs              3600
>   nimbus.monitor.freq.secs                      10
>   nimbus.reassign                               true
>   nimbus.supervisor.timeout.secs                60
>   nimbus.task.launch.secs                       120
>   nimbus.task.timeout.secs                      30
>   nimbus.thrift.port                            6627
>   nimbus.topology.validator                     backtype.storm.nimbus.DefaultTopologyValidator
>   storm.cluster.mode                            distributed
>   storm.id                                      nearline-3-1406737061
>   storm.local.dir                               /app_local/storm
>   storm.local.mode.zmq                          false
>   storm.messaging.netty.buffer_size             5242880
>   storm.messaging.netty.client_worker_threads   1
>   storm.messaging.netty.max_retries             30
>   storm.messaging.netty.max_wait_ms             1000
>   storm.messaging.netty.min_wait_ms             100
>   storm.messaging.netty.server_worker_threads   1
>   storm.messaging.transport                     backtype.storm.messaging.zmq
>   storm.thrift.transport                        backtype.storm.security.auth.SimpleTransportPlugin
>   storm.zookeeper.connection.timeout            15000
>   storm.zookeeper.port                          2181
>   storm.zookeeper.retry.interval                1000
>   storm.zookeeper.retry.intervalceiling.millis  30000
>   storm.zookeeper.retry.times                   5
>   storm.zookeeper.root                          /storm
>   storm.zookeeper.servers                       ["zookeeper"]
>   storm.zookeeper.session.timeout               20000
>   supervisor.childopts                          -Xmx256m -Djava.net.preferIPv4Stack=true
>   supervisor.enable                             true
>   supervisor.heartbeat.frequency.secs           5
>   supervisor.monitor.frequency.secs             3
>   supervisor.slots.ports                        [6700 6701 6702 6703]
>   supervisor.worker.start.timeout.secs          120
>   supervisor.worker.timeout.secs                30
>   task.heartbeat.frequency.secs                 3
>   task.refresh.poll.secs                        10
>   topology.acker.executors                      4
>   topology.builtin.metrics.bucket.size.secs     60
>   topology.debug                                false
>   topology.disruptor.wait.strategy              com.lmax.disruptor.BlockingWaitStrategy
>   topology.enable.message.timeouts              true
>   topology.error.throttle.interval.secs         10
>   topology.executor.receive.buffer.size         16384
>   topology.executor.send.buffer.size            16384
>   topology.fall.back.on.java.serialization      true
>   topology.kryo.decorators                      []
>   topology.kryo.factory                         backtype.storm.serialization.DefaultKryoFactory
>   topology.kryo.register
>   topology.max.error.report.per.interval        5
>   topology.max.spout.pending                    10000
>   topology.max.task.parallelism
>   topology.message.timeout.secs                 90
>   topology.name                                 nearline
>   topology.optimize                             true
>   topology.receiver.buffer.size                 8
>   topology.skip.missing.kryo.registrations      false
>   topology.sleep.spout.wait.strategy.time.ms    1
>   topology.spout.wait.strategy                  backtype.storm.spout.SleepSpoutWaitStrategy
>   topology.state.synchronization.timeout.secs   60
>   topology.stats.sample.rate                    1
>   topology.tasks
>   topology.tick.tuple.freq.secs
>   topology.transfer.buffer.size                 32
>   topology.trident.batch.emit.interval.millis   500
>   topology.tuple.serializer                     backtype.storm.serialization.types.ListDelegateSerializer
>   topology.worker.childopts
>   topology.worker.shared.thread.pool.size       4
>   topology.workers                              4
>   transactional.zookeeper.port
>   transactional.zookeeper.root                  /transactional
>   transactional.zookeeper.servers
>   ui.childopts                                  -Xmx768m
>   ui.port                                       8080
>   worker.childopts                              -Xmx768m -Djava.net.preferIPv4Stack=false
>                                                 -DNEARLINE_DATA_ENV=dev -DNEARLINE_APP_ENV=dev -DNEARLINE_QUEUES_ENV=dev
>                                                 -Dauthfilter.appcred.default.encrypt.file=/home/xwei/FP_AppCred_Encrypt.txt
>                                                 -Dauthfilter.appcred.default.passphrase.file=/home/xwei/FP_AppCred_Passphrase.txt
>   worker.heartbeat.frequency.secs               1
>   zmq.hwm                                       0
>   zmq.linger.millis                             5000
>   zmq.threads                                   1
>
>  The settings I changed:
> 1. topology.acker.executors: set to 4
> 2. topology.max.spout.pending: set to 10000
> 3. topology.message.timeout.secs: changed from 30 to 90 secs
>
>  I think No. 2, topology.max.spout.pending, is the critical factor that
> made the big difference. Can anybody tell me what that setting does?
>
>
>  Thanks a lot for help.
>
>
>
>
