RE: Bolt slow down the emit speed of spout
Yes, the parallelism of reads from a Kafka topic is limited by the number of partitions. There are other things you can tweak with the KafkaSpout parameters, such as increasing the batch read sizes, that might help each individual KafkaSpout's performance.

-Chien

From: Chen Junfeng [mailto:k-2f...@hotmail.com]
Sent: Tuesday, October 18, 2016 8:25 PM
To: Jungtaek Lim; user@storm.apache.org
Subject: Re: Bolt slow down the emit speed of spout

So is the only way to improve KafkaSpout performance to increase the partition count of the topic, and thereby the parallelism of the KafkaSpout?

Sent from Mail for Windows 10
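As a rough illustration of the batch-read tuning Chien mentions, a minimal sketch using the storm-kafka SpoutConfig API (the ZooKeeper address, topic name, and sizes below are illustrative placeholders, not values from this thread):

    import org.apache.storm.kafka.BrokerHosts;
    import org.apache.storm.kafka.KafkaSpout;
    import org.apache.storm.kafka.SpoutConfig;
    import org.apache.storm.kafka.ZkHosts;

    public class TunedSpoutSketch {
        public static KafkaSpout buildSpout() {
            // Placeholder ZooKeeper address and topic name.
            BrokerHosts hosts = new ZkHosts("zk1:2181");
            SpoutConfig cfg = new SpoutConfig(hosts, "JSCTGW_145", "/kafka-offsets", "spout-145");

            // Batch-read knobs (defaults are 1 MB): larger fetches mean fewer,
            // bigger requests per partition, which can raise per-spout throughput.
            cfg.fetchSizeBytes  = 8 * 1024 * 1024;
            cfg.bufferSizeBytes = 8 * 1024 * 1024;

            return new KafkaSpout(cfg);
        }
    }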
Re: Bolt slow down the emit speed of spout
You need to make it clear whether you're using multiple 'topics' or multiple 'partitions'. There's no way to read a single partition in parallel; that's by design in Kafka. If you mean topics, you can increase the parallelism of the spout to match the partition count.

On Wed, 19 Oct 2016 at 10:42 AM, Chen Junfeng wrote:

> Thanks for your reply.
>
> Maybe I am getting close to the cause. I found that the CPU usage of the
> spout worker is really very high, about 200% to 400% according to the 'top'
> command. But KafkaSpout with Storm does not seem to support using multiple
> spout instances to read one topic.
>
> Sent from Mail for Windows 10
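To make Jungtaek's suggestion concrete, a sketch of matching spout parallelism to the partition count (the count of 8 is an assumption for illustration):

    import org.apache.storm.kafka.KafkaSpout;
    import org.apache.storm.topology.TopologyBuilder;

    public class SpoutParallelismSketch {
        public static TopologyBuilder build(KafkaSpout spout) {
            // If the topic has 8 partitions, 8 executors means each executor
            // owns exactly one partition; more than 8 would leave executors idle.
            int partitionCount = 8;  // assumed; use the topic's real partition count
            TopologyBuilder builder = new TopologyBuilder();
            builder.setSpout("Spout_JSCTGW_145", spout, partitionCount);
            return builder;
        }
    }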
Re: Bolt slow down the emit speed of spout
Could you change your bolt to a no-op that still acks, and run again? That would make your benchmark clearer.

And if you took your numbers from the first 10 minutes, could you take them from another 10-minute window instead? That removes worker/executor deploy and startup time, and JVM warm-up, from the measurement.

- Jungtaek Lim (HeartSaVioR)

On Wed, 19 Oct 2016 at 10:00 AM, Chen Junfeng wrote:

> I used 'dstat -a', which measures CPU, memory, disk and network. Disk
> should not be the bottleneck, as my topology does not do much disk I/O;
> log.info writes a tuple to disk only once every 1000 tuples.
>
> Sent from Mail for Windows 10
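A minimal version of the no-op-but-ack bolt Jungtaek is asking for might look like this (the class name is made up; the point is to isolate spout and ack overhead from any processing cost):

    import java.util.Map;
    import org.apache.storm.task.OutputCollector;
    import org.apache.storm.task.TopologyContext;
    import org.apache.storm.topology.OutputFieldsDeclarer;
    import org.apache.storm.topology.base.BaseRichBolt;
    import org.apache.storm.tuple.Tuple;

    public class NoOpAckBolt extends BaseRichBolt {
        private OutputCollector collector;

        @Override
        public void prepare(Map conf, TopologyContext context, OutputCollector collector) {
            this.collector = collector;
        }

        @Override
        public void execute(Tuple tuple) {
            collector.ack(tuple);  // no processing, no emit: just complete the tuple tree
        }

        @Override
        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            // emits nothing downstream
        }
    }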
RE: Bolt slow down the emit speed of spout
dstat -d is measuring your disk throughput, yes? Did you mean dstat -n?

Otherwise, dstat -d showing 300 MB/s doesn't eliminate disk as your bottleneck. I would run iostat -x 1 and look at the %util and service times to truly rule disk out.

-Chien

From: Chen Junfeng [mailto:k-2f...@hotmail.com]
Sent: Monday, October 17, 2016 11:30 PM
To: user@storm.apache.org
Subject: Bolt slow down the emit speed of spout

I found a tough but interesting issue with Storm 1.0.2 that I'd like to share with you.

First, my system structure: the topology has 471 workers, 6 Kafka topics, and some processing bolts. Backpressure has been disabled in the topology with

    conf.put(Config.TOPOLOGY_BACKPRESSURE_ENABLE, false);

If I submit the KafkaSpouts only, the statistics from the Storm UI are as below:

Topology stats

    Window       Emitted    Transferred  Complete latency (ms)  Acked      Failed
    10m 0s       875161429  0            0                      875141691
    3h 0m 0s     927117120  0            0                      927117320
    1d 0h 0m 0s  927117120  0            0                      927117320
    All time     927117120  0            0                      927117320

Spouts (10m 0s)

    Id                Executors  Tasks  Emitted    Transferred  Complete latency (ms)  Acked      Failed
    Spout_JSCTGW_144  2          2      128701261  0            0                      128702858  0
    Spout_JSCTGW_145  2          2      162347454  0            0                      162347639  0
    Spout_JSCTGW_146  2          2      135598494  0            0                      135598608  0
    Spout_JSCTGW_147  2          2      128307822  0            0                      128306102  0
    Spout_JSCTGW_148  2          2      160369513  0            0                      160360423  0
    Spout_JSCTGW_149  2          2      159836885  0            0                      159826061  0

In a 10-minute window, the 6 KafkaSpouts read more than 800 million tuples from the Kafka cluster.

Then I submit the topology with a simple PrintBolt added, and the 10-minute stats are as follows:

Spouts (10m 0s)

    Id                Executors  Tasks  Emitted    Transferred  Complete latency (ms)  Acked      Failed
    Spout_JSCTGW_144  2          2      113599789  113599789    0                      113513630  0
    Spout_JSCTGW_145  2          2      122501522  122501575    0                      122659659  0
    Spout_JSCTGW_146  2          2      91308915   91308915     0                      91308652   0
    Spout_JSCTGW_147  2          2      105029568  105029568    0                      104968275  0
    Spout_JSCTGW_148  2          2      115889172  115889178    0                      115890165  0
    Spout_JSCTGW_149  2          2      115185591  115185591    0                      115185638  0
    Total emitted: 663526019

Bolts (10m 0s)

    Id         Executors  Tasks  Emitted  Transferred  Capacity (last 10m)  Execute latency (ms)  Executed   Process latency (ms)  Acked
    PrintBolt  240        240    0        0            0.241                0.041                 665063824  0                     665036902

The PrintBolt contains nothing but a simple log.info call, but the total number of tuples emitted by the spouts drops to about 600 million. All of the machines have 32 cores, but the system load average never exceeds 15. I have also checked the network load: the network adapter has 10 Gb of bandwidth, and dstat shows the receive/send rate peaking at about 300 MB/s, well below the adapter's limit. The total number of executors is less than the total number of workers, so executor scheduling should not be the problem.

Then I did some more tests.

Only one KafkaSpout reading one topic:

Spouts (10m 0s)

    Id                Executors  Tasks  Emitted    Transferred  Complete latency (ms)  Acked      Failed
    Spout_JSCTGW_145  2          2      148918096  0            0                      148917805  0

One KafkaSpout and one data processing bolt:

Spouts (10m 0s)

    Id                Executors  Tasks  Emitted    Transferred  Complete latency (ms)  Acked      Failed
    Spout_JSCTGW_145  2          2      106481751  106481751    0                      106367801  0

Bolts (10m 0s)

    Id              Executors  Tasks  Emitted    Transferred  Capacity (last 10m)  Execute latency (ms)  Executed
    AtomPluginBolt  1000       1000   132752203  0            0.019                0.067                 106519630

Even when I read only one Kafka topic, the emit speed is slowed down. Has anyone met the same problem?
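For reference, a rough sketch of the submission described above (backpressure disabled, six topic spouts at parallelism 2, a 240-executor PrintBolt). The spoutFor() factory, the ZooKeeper address, and the PrintBolt body are stand-ins for code not shown in the thread; the PrintBolt here follows the post's description of logging once every 1000 tuples:

    import java.util.Map;
    import org.apache.storm.Config;
    import org.apache.storm.StormSubmitter;
    import org.apache.storm.kafka.BrokerHosts;
    import org.apache.storm.kafka.KafkaSpout;
    import org.apache.storm.kafka.SpoutConfig;
    import org.apache.storm.kafka.ZkHosts;
    import org.apache.storm.task.OutputCollector;
    import org.apache.storm.task.TopologyContext;
    import org.apache.storm.topology.BoltDeclarer;
    import org.apache.storm.topology.OutputFieldsDeclarer;
    import org.apache.storm.topology.TopologyBuilder;
    import org.apache.storm.topology.base.BaseRichBolt;
    import org.apache.storm.tuple.Tuple;
    import org.slf4j.Logger;
    import org.slf4j.LoggerFactory;

    public class KafkaBenchmarkSketch {

        // Stand-in for the PrintBolt described in the post: acks every tuple
        // and logs one out of every 1000.
        public static class PrintBolt extends BaseRichBolt {
            private static final Logger LOG = LoggerFactory.getLogger(PrintBolt.class);
            private OutputCollector collector;
            private long count = 0;

            @Override
            public void prepare(Map conf, TopologyContext context, OutputCollector collector) {
                this.collector = collector;
            }

            @Override
            public void execute(Tuple tuple) {
                if (++count % 1000 == 0) {
                    LOG.info("sample tuple: {}", tuple);
                }
                collector.ack(tuple);
            }

            @Override
            public void declareOutputFields(OutputFieldsDeclarer declarer) { }
        }

        // Hypothetical per-topic spout factory; ZooKeeper address is a placeholder.
        private static KafkaSpout spoutFor(int t) {
            BrokerHosts hosts = new ZkHosts("zk1:2181");
            SpoutConfig cfg = new SpoutConfig(hosts, "JSCTGW_" + t, "/kafka-offsets", "spout-" + t);
            return new KafkaSpout(cfg);
        }

        public static void main(String[] args) throws Exception {
            TopologyBuilder builder = new TopologyBuilder();
            for (int t = 144; t <= 149; t++) {
                builder.setSpout("Spout_JSCTGW_" + t, spoutFor(t), 2);
            }
            BoltDeclarer print = builder.setBolt("PrintBolt", new PrintBolt(), 240);
            for (int t = 144; t <= 149; t++) {
                print.shuffleGrouping("Spout_JSCTGW_" + t);  // fan all six topics into one bolt
            }

            Config conf = new Config();
            conf.setNumWorkers(471);
            conf.put(Config.TOPOLOGY_BACKPRESSURE_ENABLE, false);  // as in the post
            StormSubmitter.submitTopology("kafka-benchmark", conf, builder.createTopology());
        }
    }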
Re: Bolt slow down the emit speed of spout
Thanks for sharing your benchmark. Hortonworks did a similar benchmark with 1.0.1, so I guess you might also want to read this page:
http://hortonworks.com/blog/microbenchmarking-storm-1-0-performance/

The cost of routing and transferring tuples, and of handling ACKs, is not free (some of that work happens in the same thread that emits the tuple), so what you're seeing is natural. Normally this doesn't cause problems, since Storm is already fast enough to meet most requirements, but it heavily affects the result of a micro-benchmark. As you can see from that page, performance is greatly improved as of v1.0.1.

I think there is still more room to improve performance, but the area where Storm is really lacking is ease of use. Storm doesn't have a high-level API (it has Trident, but that forces users to live with micro-batching) or SQL support (recently a trend among streaming frameworks). Ideally we would address both together, but if we can't, IMHO improving usability is the top priority.

Thanks,
Jungtaek Lim (HeartSaVioR)
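One way to probe how much of the gap Jungtaek describes comes from ACK bookkeeping rather than tuple transfer is to rerun the bolt benchmark with ackers disabled. This is only a measurement trick, not a recommendation: with zero ackers there is no replay and no complete-latency tracking.

    import org.apache.storm.Config;

    public class AckerSweepSketch {
        public static Config withoutAckers() {
            Config conf = new Config();
            // Zero acker executors turns off tuple-tree tracking: spout emits
            // become fire-and-forget, so any remaining slowdown versus the
            // spout-only run is routing/serialization/transfer cost rather
            // than ACK handling.
            conf.setNumAckers(0);
            return conf;
        }
    }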