Hi There,

Yesterday I changed some Storm configuration settings, and the spout failure 
rate has now dropped to 0, as shown below:

Topology stats
Window       Emitted  Transferred  Complete latency (ms)  Acked  Failed
10m 0s       8766     8766         43077.391              5290   0
3h 0m 0s     8766     8766         43077.391              5290   0
1d 0h 0m 0s  8766     8766         43077.391              5290   0
All time     8766     8766         43077.391              5290   0

Spouts (All time)
Id               Executors  Tasks  Emitted  Transferred  Complete latency (ms)  Acked  Failed  Last error
JMS_QUEUE_SPOUT  2          2      5290     5290         43077.391              5290   0

Bolts (All time)
Id                    Executors  Tasks  Emitted  Transferred  Capacity (last 10m)  Execute latency (ms)  Executed  Process latency (ms)  Acked  Failed  Last error
AGGREGATOR_BOLT       8          8      1738     1738         0.080                83.264                1738      81.243                1738   0
MESSAGEFILTER_BOLT    8          8      1738     1738         0.091                29.833                5290      24.918                5290   0
OFFER_GENERATOR_BOLT  8          8      0        0            0.031                25.993                1738      24.296                1738   0

The topology configuration is listed below:

Topology Configuration
Key     Value
dev.zookeeper.path      /tmp/dev-storm-zookeeper
drpc.childopts  -Xmx768m
drpc.invocations.port   3773
drpc.port       3772
drpc.queue.size 128
drpc.request.timeout.secs       600
drpc.worker.threads     64
java.library.path       /usr/local/lib
logviewer.appender.name A1
logviewer.childopts     -Xmx128m
logviewer.port  8000
nimbus.childopts        -Xmx1024m -Djava.net.preferIPv4Stack=true
nimbus.cleanup.inbox.freq.secs  600
nimbus.file.copy.expiration.secs        600
nimbus.host     zookeeper
nimbus.inbox.jar.expiration.secs        3600
nimbus.monitor.freq.secs        10
nimbus.reassign true
nimbus.supervisor.timeout.secs  60
nimbus.task.launch.secs 120
nimbus.task.timeout.secs        30
nimbus.thrift.port      6627
nimbus.topology.validator       backtype.storm.nimbus.DefaultTopologyValidator
storm.cluster.mode      distributed
storm.id        nearline-3-1406737061
storm.local.dir /app_local/storm
storm.local.mode.zmq    false
storm.messaging.netty.buffer_size       5242880
storm.messaging.netty.client_worker_threads     1
storm.messaging.netty.max_retries       30
storm.messaging.netty.max_wait_ms       1000
storm.messaging.netty.min_wait_ms       100
storm.messaging.netty.server_worker_threads     1
storm.messaging.transport       backtype.storm.messaging.zmq
storm.thrift.transport  backtype.storm.security.auth.SimpleTransportPlugin
storm.zookeeper.connection.timeout      15000
storm.zookeeper.port    2181
storm.zookeeper.retry.interval  1000
storm.zookeeper.retry.intervalceiling.millis    30000
storm.zookeeper.retry.times     5
storm.zookeeper.root    /storm
storm.zookeeper.servers ["zookeeper"]
storm.zookeeper.session.timeout 20000
supervisor.childopts    -Xmx256m -Djava.net.preferIPv4Stack=true
supervisor.enable       true
supervisor.heartbeat.frequency.secs     5
supervisor.monitor.frequency.secs       3
supervisor.slots.ports  [6700 6701 6702 6703]
supervisor.worker.start.timeout.secs    120
supervisor.worker.timeout.secs  30
task.heartbeat.frequency.secs   3
task.refresh.poll.secs  10
topology.acker.executors        4
topology.builtin.metrics.bucket.size.secs       60
topology.debug  false
topology.disruptor.wait.strategy        com.lmax.disruptor.BlockingWaitStrategy
topology.enable.message.timeouts        true
topology.error.throttle.interval.secs   10
topology.executor.receive.buffer.size   16384
topology.executor.send.buffer.size      16384
topology.fall.back.on.java.serialization        true
topology.kryo.decorators        []
topology.kryo.factory   backtype.storm.serialization.DefaultKryoFactory
topology.kryo.register
topology.max.error.report.per.interval  5
topology.max.spout.pending      10000
topology.max.task.parallelism
topology.message.timeout.secs   90
topology.name   nearline
topology.optimize       true
topology.receiver.buffer.size   8
topology.skip.missing.kryo.registrations        false
topology.sleep.spout.wait.strategy.time.ms      1
topology.spout.wait.strategy    backtype.storm.spout.SleepSpoutWaitStrategy
topology.state.synchronization.timeout.secs     60
topology.stats.sample.rate      1
topology.tasks
topology.tick.tuple.freq.secs
topology.transfer.buffer.size   32
topology.trident.batch.emit.interval.millis     500
topology.tuple.serializer       backtype.storm.serialization.types.ListDelegateSerializer
topology.worker.childopts
topology.worker.shared.thread.pool.size 4
topology.workers        4
transactional.zookeeper.port
transactional.zookeeper.root    /transactional
transactional.zookeeper.servers
ui.childopts    -Xmx768m
ui.port 8080
worker.childopts        -Xmx768m -Djava.net.preferIPv4Stack=false -DNEARLINE_DATA_ENV=dev -DNEARLINE_APP_ENV=dev -DNEARLINE_QUEUES_ENV=dev -Dauthfilter.appcred.default.encrypt.file=/home/xwei/FP_AppCred_Encrypt.txt -Dauthfilter.appcred.default.passphrase.file=/home/xwei/FP_AppCred_Passphrase.txt
worker.heartbeat.frequency.secs 1
zmq.hwm 0
zmq.linger.millis       5000
zmq.threads     1

The settings I changed:
1. topology.acker.executors: set to 4.
2. topology.max.spout.pending: set to 10000.
3. topology.message.timeout.secs: changed from 30 to 90 secs.
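
For reference, here is a minimal sketch of the three overrides as plain key/value pairs. The keys and values are copied from the configuration dump above; the class and method names (NearlineTopologyConfig, overrides) are made up for illustration, and the sketch deliberately avoids the Storm dependency so it runs on its own.

```java
import java.util.HashMap;
import java.util.Map;

public class NearlineTopologyConfig {
    // Collects the three changed settings in a plain map.
    // Keys match the names shown in the Storm UI configuration table.
    static Map<String, Object> overrides() {
        Map<String, Object> conf = new HashMap<>();
        conf.put("topology.acker.executors", 4);        // number of acker executors
        conf.put("topology.max.spout.pending", 10000);  // cap on pending tuples
        conf.put("topology.message.timeout.secs", 90);  // raised from 30
        return conf;
    }

    public static void main(String[] args) {
        // Print each override as "key: value".
        overrides().forEach((k, v) -> System.out.println(k + ": " + v));
    }
}
```

With Storm on the classpath, backtype.storm.Config (itself a HashMap) offers setNumAckers, setMaxSpoutPending and setMessageTimeoutSecs for the same three keys.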

I think setting No. 2, topology.max.spout.pending, is the critical factor that 
made the big difference. Can anybody tell me what that setting does?
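
To make the question concrete, here is my working assumption, which I would like confirmed or corrected: topology.max.spout.pending caps how many tuples a single spout task may have emitted but not yet acked or failed, and the spout is not asked for new tuples while it sits at the cap. The toy simulation below is plain Java, not Storm code; every name in it (MaxSpoutPendingSketch, peakPending, the emit and ack rates) is hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class MaxSpoutPendingSketch {
    // Simulates one spout task. In each step the task emits up to
    // emitsPerStep tuples but stops once maxPending tuples are in flight,
    // then the oldest acksPerStep tuples are acked and leave the queue.
    // Returns the largest number of in-flight tuples seen.
    static int peakPending(int maxPending, int emitsPerStep, int acksPerStep, int steps) {
        Deque<Integer> pending = new ArrayDeque<>();
        int nextId = 0;
        int peak = 0;
        for (int s = 0; s < steps; s++) {
            for (int e = 0; e < emitsPerStep && pending.size() < maxPending; e++) {
                pending.addLast(nextId++);   // emitted, waiting for ack
            }
            peak = Math.max(peak, pending.size());
            for (int a = 0; a < acksPerStep && !pending.isEmpty(); a++) {
                pending.removeFirst();       // oldest tuple fully processed
            }
        }
        return peak;
    }

    public static void main(String[] args) {
        // Spout emits faster than the bolts ack: with a cap, in-flight
        // tuples level off; without one, the backlog keeps growing.
        System.out.println("capped:   " + peakPending(100, 50, 10, 20));
        System.out.println("uncapped: " + peakPending(Integer.MAX_VALUE, 50, 10, 20));
    }
}
```

If that assumption holds, an uncapped spout could build a backlog whose tuples wait longer than topology.message.timeout.secs and get counted as failed, which would fit what I saw before the change.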


Thanks a lot for help.


