Hey everyone,

I have a topology running on 48 workers, with 192 executors for the first bolt and 12 executors for a second bolt that receives the first bolt's output.

When I emit tuples from the first bolt, performance drops dramatically: the number of tuples acked goes from 3.8 million/minute down to 300,000/minute. To isolate the cause, I turned off all processing in the receiving bolt, so it is a passthrough that simply acks the incoming tuples. I also tested removing the second bolt entirely, so that the first bolt processes the incoming tuples, emits, and acks them. With that change the rate went back up to 3.3 million tuples/minute.

Based on what I've seen, I suspected the performance issue was due to shuffling between workers, so I switched to localOrShuffleGrouping. That did not make a difference. I then figured that setting the number of receiving-bolt executors equal to the number of workers (48) would help, as this should (I think?) ensure that only local shuffling takes place. This appears to improve performance by 10% or so.

Any ideas as to why the emit/receive step appears to be decreasing throughput by over a factor of 10?

Thanks
--John
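P.S. To make the executor-count reasoning concrete, here is a rough back-of-the-envelope sketch in plain Java (no Storm dependency). It rests on two assumptions: that localOrShuffleGrouping only stays in-process when the sending worker also hosts at least one task of the receiving bolt, and that each receiving executor lives in exactly one worker. The class and method names (`LocalityEstimate`, `maxLocalFraction`, and so on) are made up for illustration and are not part of any Storm API.

```java
public class LocalityEstimate {

    // Upper bound on how many workers can host at least one receiving executor,
    // assuming every executor runs in exactly one worker.
    public static int maxWorkersWithLocalTask(int workers, int receivingExecutors) {
        return Math.min(workers, receivingExecutors);
    }

    // Best-case fraction of workers whose emits could stay in-process.
    // The scheduler may do worse than this bound if it packs executors unevenly.
    public static double maxLocalFraction(int workers, int receivingExecutors) {
        return (double) maxWorkersWithLocalTask(workers, receivingExecutors) / workers;
    }

    // Ratio between the acked-tuple rate before and after adding the emit.
    public static double slowdownFactor(double ratePerMinuteBefore, double ratePerMinuteAfter) {
        return ratePerMinuteBefore / ratePerMinuteAfter;
    }

    public static void main(String[] args) {
        int workers = 48;

        // Original setup: 12 receiving executors across 48 workers.
        System.out.printf("12 receiving executors: at most %.0f%% of workers can stay local%n",
                100 * maxLocalFraction(workers, 12));

        // Adjusted setup: one receiving executor per worker, if spread evenly.
        System.out.printf("48 receiving executors: at most %.0f%% of workers can stay local%n",
                100 * maxLocalFraction(workers, 48));

        // Observed rates from this post: 3.8 million/minute vs 300,000/minute.
        System.out.printf("Observed slowdown: about %.1fx%n",
                slowdownFactor(3_800_000, 300_000));
    }
}
```

Under those assumptions, with 12 receiving executors at most a quarter of the 48 workers could ever deliver locally, which would explain why localOrShuffleGrouping alone changed nothing. With 48 executors every worker could in principle stay local, yet I only saw roughly a 10% gain against a slowdown of about 12.7x, so locality by itself does not seem to account for the drop.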
