Hi there,

Yesterday I changed some Storm configuration settings, and the spout failure rate has now dropped to 0, as shown below:
Topology stats
Window       Emitted  Transferred  Complete latency (ms)  Acked  Failed
10m 0s       8766     8766         43077.391              5290   0
3h 0m 0s     8766     8766         43077.391              5290   0
1d 0h 0m 0s  8766     8766         43077.391              5290   0
All time     8766     8766         43077.391              5290   0

Spouts (All time)
Id               Executors  Tasks  Emitted  Transferred  Complete latency (ms)  Acked  Failed  Last error
JMS_QUEUE_SPOUT  2          2      5290     5290         43077.391              5290   0

Bolts (All time)
Id                    Executors  Tasks  Emitted  Transferred  Capacity (last 10m)  Execute latency (ms)  Executed  Process latency (ms)  Acked  Failed  Last error
AGGREGATOR_BOLT       8          8      1738     1738         0.080                83.264                1738      81.243                1738   0
MESSAGEFILTER_BOLT    8          8      1738     1738         0.091                29.833                5290      24.918                5290   0
OFFER_GENERATOR_BOLT  8          8      0        0            0.031                25.993                1738      24.296                1738   0

The topology configuration is listed below:

Topology Configuration
Key                                           Value
dev.zookeeper.path                            /tmp/dev-storm-zookeeper
drpc.childopts                                -Xmx768m
drpc.invocations.port                         3773
drpc.port                                     3772
drpc.queue.size                               128
drpc.request.timeout.secs                     600
drpc.worker.threads                           64
java.library.path                             /usr/local/lib
logviewer.appender.name                       A1
logviewer.childopts                           -Xmx128m
logviewer.port                                8000
nimbus.childopts                              -Xmx1024m -Djava.net.preferIPv4Stack=true
nimbus.cleanup.inbox.freq.secs                600
nimbus.file.copy.expiration.secs              600
nimbus.host                                   zookeeper
nimbus.inbox.jar.expiration.secs              3600
nimbus.monitor.freq.secs                      10
nimbus.reassign                               true
nimbus.supervisor.timeout.secs                60
nimbus.task.launch.secs                       120
nimbus.task.timeout.secs                      30
nimbus.thrift.port                            6627
nimbus.topology.validator                     backtype.storm.nimbus.DefaultTopologyValidator
storm.cluster.mode                            distributed
storm.id                                      nearline-3-1406737061
storm.local.dir                               /app_local/storm
storm.local.mode.zmq                          false
storm.messaging.netty.buffer_size             5242880
storm.messaging.netty.client_worker_threads   1
storm.messaging.netty.max_retries             30
storm.messaging.netty.max_wait_ms             1000
storm.messaging.netty.min_wait_ms             100
storm.messaging.netty.server_worker_threads   1
storm.messaging.transport                     backtype.storm.messaging.zmq
storm.thrift.transport                        backtype.storm.security.auth.SimpleTransportPlugin
storm.zookeeper.connection.timeout            15000
storm.zookeeper.port                          2181
storm.zookeeper.retry.interval                1000
storm.zookeeper.retry.intervalceiling.millis  30000
storm.zookeeper.retry.times                   5
storm.zookeeper.root                          /storm
storm.zookeeper.servers                       ["zookeeper"]
storm.zookeeper.session.timeout               20000
supervisor.childopts                          -Xmx256m -Djava.net.preferIPv4Stack=true
supervisor.enable                             true
supervisor.heartbeat.frequency.secs           5
supervisor.monitor.frequency.secs             3
supervisor.slots.ports                        [6700 6701 6702 6703]
supervisor.worker.start.timeout.secs          120
supervisor.worker.timeout.secs                30
task.heartbeat.frequency.secs                 3
task.refresh.poll.secs                        10
topology.acker.executors                      4
topology.builtin.metrics.bucket.size.secs     60
topology.debug                                false
topology.disruptor.wait.strategy              com.lmax.disruptor.BlockingWaitStrategy
topology.enable.message.timeouts              true
topology.error.throttle.interval.secs         10
topology.executor.receive.buffer.size         16384
topology.executor.send.buffer.size            16384
topology.fall.back.on.java.serialization      true
topology.kryo.decorators                      []
topology.kryo.factory                         backtype.storm.serialization.DefaultKryoFactory
topology.kryo.register
topology.max.error.report.per.interval        5
topology.max.spout.pending                    10000
topology.max.task.parallelism
topology.message.timeout.secs                 90
topology.name                                 nearline
topology.optimize                             true
topology.receiver.buffer.size                 8
topology.skip.missing.kryo.registrations      false
topology.sleep.spout.wait.strategy.time.ms    1
topology.spout.wait.strategy                  backtype.storm.spout.SleepSpoutWaitStrategy
topology.state.synchronization.timeout.secs   60
topology.stats.sample.rate                    1
topology.tasks
topology.tick.tuple.freq.secs
topology.transfer.buffer.size                 32
topology.trident.batch.emit.interval.millis   500
topology.tuple.serializer                     backtype.storm.serialization.types.ListDelegateSerializer
topology.worker.childopts
topology.worker.shared.thread.pool.size       4
topology.workers                              4
transactional.zookeeper.port
transactional.zookeeper.root                  /transactional
transactional.zookeeper.servers
ui.childopts                                  -Xmx768m
ui.port                                       8080
worker.childopts                              -Xmx768m -Djava.net.preferIPv4Stack=false -DNEARLINE_DATA_ENV=dev -DNEARLINE_APP_ENV=dev -DNEARLINE_QUEUES_ENV=dev -Dauthfilter.appcred.default.encrypt.file=/home/xwei/FP_AppCred_Encrypt.txt -Dauthfilter.appcred.default.passphrase.file=/home/xwei/FP_AppCred_Passphrase.txt
worker.heartbeat.frequency.secs               1
zmq.hwm                                       0
zmq.linger.millis                             5000
zmq.threads                                   1

The settings I changed:

1. topology.acker.executors: adjusted to 4.
2. topology.max.spout.pending: changed to 10000.
3. topology.message.timeout.secs: changed from 30 to 90 seconds.

I think No. 2, topology.max.spout.pending, is the critical factor that made the big difference. Can anybody tell me what that setting does?

Thanks a lot for your help.
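For reference, Storm's configuration docs describe topology.max.spout.pending as the maximum number of tuples that can be pending (emitted by a spout but not yet acked or failed) on a single spout task at any given time. It applies per spout task, not to the topology as a whole, and has no effect on unreliable spouts that emit tuples without a message ID. Once a task hits the limit, Storm stops calling its nextTuple() until some pending tuples complete.

The Java program below is a toy, single-threaded model of that behaviour, not Storm code. The class and method names are hypothetical, time is measured in abstract ticks, and there is a single bolt with a fixed processing rate. It illustrates why a cap can help when the spout can read from its queue faster than the bolts can process. Without a cap, tuples pile up in front of the bolt until they exceed the timeout and are failed, and the timed-out tuples still consume bolt time. With a cap, the number of tuples in flight, and therefore how long each one waits, stays bounded. The same model suggests the timeout change (No. 3) may also matter: a tuple is failed once it has been pending longer than topology.message.timeout.secs, and the complete latency reported above (about 43 s) is longer than the previous 30 s timeout.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/**
 * Toy model of a pending-tuple cap (the idea behind topology.max.spout.pending)
 * combined with a message timeout. All names are hypothetical; this is not Storm's API.
 */
public class MaxSpoutPendingSim {

    /** One emitted tuple; 'failed' is set if it times out before being acked. */
    static final class Msg {
        final int emittedAt;
        boolean failed;
        Msg(int emittedAt) { this.emittedAt = emittedAt; }
    }

    /** Counters collected during one run. */
    public static final class Result {
        public int acked, failed, wastedWork, maxPendingSeen;
        @Override public String toString() {
            return "acked=" + acked + " failed=" + failed
                    + " wastedWork=" + wastedWork + " maxPendingSeen=" + maxPendingSeen;
        }
    }

    /**
     * @param maxPending     cap on emitted-but-unacked tuples (Integer.MAX_VALUE means no cap)
     * @param timeoutTicks   age at which a pending tuple is failed back to the spout
     * @param emitPerTick    tuples the spout could emit per tick if it were allowed to
     * @param processPerTick tuples the single downstream bolt can finish per tick
     * @param ticks          number of ticks to simulate
     */
    public static Result run(int maxPending, int timeoutTicks, int emitPerTick,
                             int processPerTick, int ticks) {
        Deque<Msg> boltQueue = new ArrayDeque<>(); // tuples waiting for the bolt, FIFO
        Deque<Msg> pending = new ArrayDeque<>();   // emitted, not yet acked or failed, oldest first
        Result r = new Result();
        for (int t = 0; t < ticks; t++) {
            // Spout: keeps emitting only while fewer than maxPending tuples are pending.
            for (int i = 0; i < emitPerTick && pending.size() < maxPending; i++) {
                Msg m = new Msg(t);
                boltQueue.addLast(m);
                pending.addLast(m);
            }
            r.maxPendingSeen = Math.max(r.maxPendingSeen, pending.size());
            // Timeout: pending tuples that are too old are failed back to the spout,
            // but they stay in the bolt's queue and still cost processing time.
            while (!pending.isEmpty() && t - pending.peekFirst().emittedAt >= timeoutTicks) {
                pending.removeFirst().failed = true;
                r.failed++;
            }
            // Bolt: process the oldest queued tuples; already-failed ones are wasted work.
            for (int i = 0; i < processPerTick && !boltQueue.isEmpty(); i++) {
                Msg m = boltQueue.removeFirst();
                if (m.failed) {
                    r.wastedWork++;
                } else {
                    pending.removeFirst(); // m is necessarily the oldest still-pending tuple
                    r.acked++;
                }
            }
        }
        return r;
    }

    public static void main(String[] args) {
        // The spout could emit 10 tuples/tick, the bolt only handles 2/tick,
        // and tuples time out after 30 ticks.
        System.out.println("no cap:         " + run(Integer.MAX_VALUE, 30, 10, 2, 200));
        System.out.println("maxPending=40:  " + run(40, 30, 10, 2, 200));
    }
}
```

With these parameters the capped run acks all 400 tuples the bolt gets to and fails none, while the uncapped run acks only 74 and fails most of the rest. In a real topology the corresponding settings are typically set through backtype.storm.Config, for example conf.setMaxSpoutPending(10000), conf.setMessageTimeoutSecs(90) and conf.setNumAckers(4).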
