This can happen if your spout output queue is not empty. The complete latency includes the time that the tuple spends in the output queue waiting to be sent to the first bolt. You can reduce the latency by reducing max spout pending but if the value is too low you will experience reduced throughput. On Nov 12, 2014 5:38 PM, "Vladi Feigin" <[email protected]> wrote:
> It can a network issue (high latency). > Check your network configuration( NIC ) > Vladi > > On Thu, Nov 13, 2014 at 12:25 AM, Hines, Sohail <[email protected]> > wrote: > >> We have a 2-node Storm cluster in production (v0.9.0.1). The condition >> we're noticing is that Storm Complete Latency varies throughout the day >> from *300ms to 1**2**00ms*. However during the same period, the sum of >> all Bolt Execute Latency times only add up to about ** *50ms ***. The >> same is true for the sum of all of the Bolt Process Latency times. See >> snapshot of this illustrated below (monospaced font works best), along with >> the full storm configuration. >> >> >> We've tried doubling the topology workers and ackers, as well as >> increasing tasks/parallelism for all bolts. Lastly we also tried to >> increase the topology.executor send/receive/transfer buffer sizes (as >> included below). >> >> >> However we're unable to account for most of the topology latency >> as reported under Complete Latency. Should the sum of the bolt >> latencies closely match the topology's Complete Latency? If so, what >> tuning parameters could we change to help achieve this? >> >> *Topology stats (Storm UI)* >> >> Window Emitted Transferred Complete latency >> (ms) Acked Failed >> 10m 0s 4502820 5912040 *1009.565* >> 260600 0 >> 3h 0m 0s 71909220 94401060 731.852 >> 4466280 0 >> 1d 0h 0m 0s 461365780 604827120 486.754 >> 31079420 0 >> All time 911575300 1194802200 483.995 >> 61884080 0 >> >> *Bolts (All time)* >> >> Id Executors Tasks Emitted Transferred >> Capacity Execute latency Executed Process latency (ms) >> writerBolt 12 12 0 0 >> 0.479 *1.246* 283204540 1.244 >> udsEnhancement 400 400 283242820 566485640 >> 0.872 *47.420* 283204720 47.397 >> transformBolt 8 8 283168440 283168440 >> 0.772 *1.085* 283205380 1.085 >> realtimePublisher 8 8 0 0 >> 0.533 *0.834* 283204360 0.831 >> parserBolt 20 20 283254740 283238820 >> 0.745 *8.392* 61884380 8.409 >> >> *Full Configuration Settings:* >> >> medio.storm.debug false >> medio.storm.events.max_future_allowed_hours 24 >> medio.storm.events.max_past_allowed_hours 720 >> medio.storm.max_spout_pending 100 >> medio.storm.max_task_parallelism 5 >> medio.storm.message_timeout_seconds 180 >> medio.storm.num_ackers 4 >> medio.storm.num_workers 4 >> medio.storm.topology_name avalanche-core >> medio.topology.connect_kafka_logging_bolt false >> medio.topology.emit_mock_kafka_requests false >> medio.topology.version 2.4.6 >> nimbus.childopts -Xmx1024m >> nimbus.cleanup.inbox.freq.secs 600 >> nimbus.file.copy.expiration.secs 600 >> nimbus.host m3web01-sef.prod.msrch >> nimbus.inbox.jar.expiration.secs 3600 >> nimbus.monitor.freq.secs 10 >> nimbus.reassign true >> nimbus.supervisor.timeout.secs 60 >> nimbus.task.launch.secs 120 >> nimbus.task.timeout.secs 30 >> nimbus.thrift.port 6627 >> nimbus.topology.validator backtype.storm.nimbus.DefaultTopologyValidator >> storm.cluster.mode distributed >> storm.id avalanche-core-3-1415647968 >> storm.local.dir /workplace/storm/local >> storm.local.mode.zmq false >> storm.messaging.netty.buffer_size 5242880 >> storm.messaging.netty.client_worker_threads 1 >> storm.messaging.netty.max_retries 30 >> storm.messaging.netty.max_wait_ms 1000 >> storm.messaging.netty.min_wait_ms 100 >> storm.messaging.netty.server_worker_threads 1 >> storm.messaging.transport backtype.storm.messaging.zmq >> storm.thrift.transport backtype.storm.security.auth.SimpleTransportPlugin >> storm.zookeeper.connection.timeout 15000 >> storm.zookeeper.port 2181 >> storm.zookeeper.retry.interval 1000 >> storm.zookeeper.retry.intervalceiling.millis 30000 >> storm.zookeeper.retry.times 5 >> storm.zookeeper.root /storm >> storm.zookeeper.servers ["zk01-sef" "zk02-sef" "zk03-sef"] >> storm.zookeeper.session.timeout 20000 >> supervisor.childopts -Xmx256m >> supervisor.enable true >> supervisor.heartbeat.frequency.secs 5 >> supervisor.monitor.frequency.secs 3 >> supervisor.slots.ports [6700 6701 6702 6703] >> supervisor.worker.start.timeout.secs 120 >> supervisor.worker.timeout.secs 30 >> task.heartbeat.frequency.secs 3 >> task.refresh.poll.secs 10 >> topology.acker.executors 4 >> topology.builtin.metrics.bucket.size.secs 60 >> topology.debug false >> topology.disruptor.wait.strategy com.lmax.disruptor.BlockingWaitStrategy >> topology.enable.message.timeouts true >> topology.error.throttle.interval.secs 10 >> topology.executor.receive.buffer.size 16384 >> topology.executor.send.buffer.size 16384 >> topology.fall.back.on.java.serialization false >> topology.kryo.decorators [] >> topology.kryo.factory backtype.storm.serialization.DefaultKryoFactory >> topology.kryo.register >> {"com.medio.services.avalanche.model.impl.RoutableEventImpl" >> "com.medio.services.avalanche.storm.common.serializers.RoutableEventSerializer"} >> topology.max.error.report.per.interval 5 >> topology.max.spout.pending 100 >> topology.max.task.parallelism >> topology.message.timeout.secs 180 >> topology.metrics.consumer.register [{"argument" nil, "parallelism.hint" >> 2, "class" >> "com.medio.services.tempest.client.storm.OpenTSDBMetricConsumer"}] >> topology.name avalanche-core >> topology.optimize true >> topology.receiver.buffer.size 8 >> topology.skip.missing.kryo.registrations false >> topology.sleep.spout.wait.strategy.time.ms 1 >> topology.spout.wait.strategy backtype.storm.spout.SleepSpoutWaitStrategy >> topology.state.synchronization.timeout.secs 60 >> topology.stats.sample.rate 0.05 >> topology.tasks >> topology.tick.tuple.freq.secs >> topology.transfer.buffer.size 32 >> topology.trident.batch.emit.interval.millis 500 >> topology.tuple.serializer >> backtype.storm.serialization.types.ListDelegateSerializer >> topology.worker.childopts >> topology.worker.shared.thread.pool.size 4 >> topology.workers 4 >> >> medio.bolt.event_parser.num_tasks 20 >> medio.bolt.event_parser.parallelism_hint 20 >> medio.bolt.event_transform.enabled true >> medio.bolt.event_transform.num_tasks 8 >> medio.bolt.event_transform.parallelism_hint 8 >> medio.bolt.event_transform.refresh_interval 60000 >> medio.bolt.event_transform.root_directory >> /usr/medio/avalanche-storm-transforms/ >> medio.bolt.event_writer.num_tasks 12 >> medio.bolt.event_writer.parallelism_hint 12 >> medio.bolt.event_writer.upload_notification.batch_expire_ms 5000 >> medio.bolt.event_writer.upload_notification.batch_size 200 >> medio.bolt.event_writer.upload_notification.compression_codec 0 >> medio.bolt.event_writer.upload_notification.max_queue_size 5000 >> medio.bolt.event_writer.upload_notification.topic avalancheFileUploads >> medio.bolt.event_writer.upload_notification.zk_connect >> medio.bolt.kafka_logging.num_tasks 1 >> medio.bolt.kafka_logging.parallelism_hint 1 >> medio.bolt.location.blacklisted_events >> medio.bolt.location.enabled false >> medio.bolt.location.magellan.host 10.10.32.249:8904 >> medio.bolt.location.magellan.timeout_ms 5000 >> medio.bolt.location.num_tasks 20 >> medio.bolt.location.parallelism_hint 20 >> medio.bolt.location.whitelisted_events >> medio.bolt.realtime_publisher.enabled true >> medio.bolt.realtime_publisher.kafka.batch_expire_ms 5000 >> medio.bolt.realtime_publisher.kafka.batch_size 2000 >> medio.bolt.realtime_publisher.kafka.compression_codec 1 >> medio.bolt.realtime_publisher.kafka.max_queue_size 100000 >> medio.bolt.realtime_publisher.kafka.topic realtimeEvents >> medio.bolt.realtime_publisher.kafka.zk_connect >> zk01-sef,zk02-sef,zk03-sef:2181/kafka/prod_cluster >> medio.bolt.realtime_publisher.num_tasks 8 >> medio.bolt.realtime_publisher.parallelism_hint 8 >> medio.bolt.realtime_publisher.rabbit.enabled false >> medio.bolt.realtime_publisher.rabbit.exchange_name realtimeEvents >> medio.bolt.realtime_publisher.rabbit.host dlc02-sea >> medio.bolt.realtime_publisher.whitelisted_events *:*:* >> >> medio.bolt.uds_enhancement.ccp.service.root >> http://ccp.medio.com/ccp-ws/v1/ >> medio.bolt.uds_enhancement.ccpclient.connectionTimeout 3000 >> medio.bolt.uds_enhancement.enabled true >> medio.bolt.uds_enhancement.insert_medio_id.default_uds_mapping >> anon_id->anon-id >> : user_id -> customer-id : device_id -> device-id : MSISDN -> msisdn >> medio.bolt.uds_enhancement.num_tasks 400 >> medio.bolt.uds_enhancement.parallelism_hint 400 >> medio.bolt.uds_enhancement.uss.service.root http://sefuss:8422/uss-ws/v1/ >> medio.bolt.uds_enhancement.whitelisted_events *:*: >> medio.botl.uds_enhancement.ccpclient.root http://ccp.medio.com/ccp-ws/v1/ >> >> >> >> >
