Hey Javier,

Sorry, just to clarify: when I have 50 workers, I configure the topology with 100 Bolt A executors and 5 Bolt B executors. When I have 120 workers (the most I've gotten on our cluster), I configure it with 200 Bolt A executors and 100 Bolt B executors. These are the best configurations for each worker count. Yes, we actually have 300 nodes in our Mesos cluster, but it is a multi-tenant environment, so I can get 120 workers somewhat reliably, but no more.
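To put rough numbers on those two layouts, here is a minimal sketch of the arithmetic. It assumes executors are spread evenly across workers and that Bolt A sends to Bolt B with shuffle grouping; neither is stated in the thread, and the scheduler may place executors differently. Under those assumptions, the share of A-to-B transfers that stay inside one worker works out to 1/workers. The class and method names are made up for illustration.

```java
// Back-of-the-envelope arithmetic only; no Storm dependency.
final class ExecutorSpread {
    /** Average executors per worker, ASSUMING an even spread across workers. */
    static double perWorker(int executors, int workers) {
        return (double) executors / workers;
    }

    /**
     * Estimated fraction of A-to-B transfers that stay inside one worker,
     * ASSUMING shuffle grouping (uniform choice of B executor) and Bolt A
     * executors spread evenly over all workers. Under those assumptions the
     * fraction is 1/workers regardless of how the B executors are placed.
     */
    static double localFraction(int workers) {
        return 1.0 / workers;
    }

    public static void main(String[] args) {
        // {workers, Bolt A executors, Bolt B executors} as given in the message above.
        int[][] layouts = { {50, 100, 5}, {120, 200, 100} };
        for (int[] l : layouts) {
            System.out.printf(
                "workers=%d  A/worker=%.2f  B/worker=%.2f  local A->B share=%.4f%n",
                l[0], perWorker(l[1], l[0]), perWorker(l[2], l[0]), localFraction(l[0]));
        }
    }
}
```

If the assumptions hold, nearly every tuple from Bolt A crosses a worker boundary in both layouts, so the estimate says nothing about which layout is faster; it only shows that inter-worker transfer cost applies to almost all A-to-B traffic.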
Definitely a great suggestion regarding getting one worker per node. We are using a Storm on Mesos setup and I configure each worker to have 7 GB of RAM and 7 CPU cores. As a consequence, I get one worker per machine since each machine has 2 quad core processors.

--John

On Fri, Aug 14, 2015 at 5:43 PM, Javier Gonzalez <[email protected]> wrote:

> Do you actually have 170 machines? Try sticking to one worker per machine
> (tweak memory parameters in storm.yaml), makes inter bolt traffic much
> faster.
> On Aug 14, 2015 5:28 PM, "John Yost" <[email protected]> wrote:
>
>> Hey Javier,
>>
>> Cool, thanks for your response! I have 50 workers for 200 Bolt A/5 Bolt
>> B and 120 workers for 400 Bolt A/100 Bolt B (this latter config is optimal,
>> but cluster resources make it tricky to actually launch this).
>>
>> I will up the number of Ackers and see if that helps. If not, then I will
>> try to vary the number of B bolts beyond 100.
>>
>> Thanks Again!
>>
>> --John
>>
>> On Fri, Aug 14, 2015 at 2:59 PM, Javier Gonzalez <[email protected]>
>> wrote:
>>
>>> You will have a detrimental effect to wiring in boltB, even if it does
>>> nothing but ack. Every tuple you have processed from A has to travel to a B
>>> bolt, and the ack has to travel back.
>>>
>>> You could try modifying the number of ackers, and playing with the
>>> number of A and B bolts. How many workers do you have for the topology?
>>>
>>> Regards,
>>> JG
>>> On Aug 14, 2015 12:31 PM, "John Yost" <[email protected]> wrote:
>>>
>>>> Hi Everyone,
>>>>
>>>> I have a topology where a highly CPU-intensive bolt (Bolt A) requires a
>>>> much higher degree of parallelism than the bolt it emits tuples to (Bolt B)
>>>> (200 Bolt A executors vs <= 100 Bolt B executors).
>>>>
>>>> I find that the throughput, as measured in number of tuples acked, goes
>>>> from 7 million/minute to ~ 1 million/minute when I wire in Bolt B--even if
>>>> all of the logic within the Bolt B execute method is disabled and the Bolt
>>>> B is therefore simply acking the input tuples from Bolt A. In addition, I
>>>> find that, going from 50 to 100 Bolt B executors causes the throughput to
>>>> go from 900K/minute to ~ 1.1 million/minute.
>>>>
>>>> Is the fact that I am going from 200 bolt instances to 100 or less the
>>>> problem? I've already experimented with executor.send.buffer.size and
>>>> executor.receive.buffer.size, which helped drive throughput from 800K to
>>>> 900K. I will try topology.transfer.buffer.size, perhaps set that higher to
>>>> 2048. Any other ideas?
>>>>
>>>> Thanks
>>>>
>>>> --John
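The knobs discussed in this thread (worker count, ackers, and the three buffer sizes) could be collected roughly as below. This is a sketch under stated assumptions, not the poster's actual configuration. The key strings are the full names as I understand them from Storm's defaults.yaml (the thread abbreviates two of them), and the power-of-two check reflects the ring-buffer requirement of the Disruptor queues Storm used at the time, as far as I know. Only 120 workers and a transfer buffer of 2048 come from the thread; the acker count and the executor buffer size are placeholders. A plain `Map` stands in for Storm's `Config` class so the snippet compiles without the Storm jar.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: builds a plain map of Storm topology settings.
final class BufferTuningSketch {
    /** True when n is a positive power of two. */
    static boolean isPowerOfTwo(int n) {
        return n > 0 && (n & (n - 1)) == 0;
    }

    static Map<String, Object> tuning(int workers, int ackers,
                                      int executorBuffer, int transferBuffer) {
        // ASSUMPTION: buffer sizes must be powers of two (Disruptor ring buffers).
        if (!isPowerOfTwo(executorBuffer) || !isPowerOfTwo(transferBuffer)) {
            throw new IllegalArgumentException("buffer sizes must be powers of two");
        }
        Map<String, Object> conf = new LinkedHashMap<>();
        conf.put("topology.workers", workers);
        conf.put("topology.acker.executors", ackers);
        conf.put("topology.executor.receive.buffer.size", executorBuffer);
        conf.put("topology.executor.send.buffer.size", executorBuffer);
        conf.put("topology.transfer.buffer.size", transferBuffer);
        return conf;
    }

    public static void main(String[] args) {
        // 120 workers and 2048 are from the thread; 120 ackers and 16384 are placeholders.
        Map<String, Object> conf = tuning(120, 120, 16384, 2048);
        conf.forEach((key, value) -> System.out.println(key + ": " + value));
    }
}
```

Keeping the settings in one place makes it easier to change one variable at a time (for example, only the acker count) and compare acked tuples per minute between runs.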
