Hey Javier,

Sorry, just to clarify: with 50 workers I configure the topology to have 100
Bolt A executors and 5 Bolt B executors, and with 120 workers (the max I've
gotten on our cluster) the topology is configured to have 200 Bolt A
executors and 100 Bolt B executors; that is the best configuration I've
found. Yes, we actually have 300 nodes in our Mesos cluster, but it is a
multi-tenant environment, so, again, I can get 120 workers somewhat reliably,
but no more.
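
For concreteness, here is a rough sketch of how that 120-worker configuration
maps onto the topology wiring. MySpout, BoltA, BoltB, the component names, and
the shuffle groupings are placeholders standing in for our actual code:

    // Sketch only: MySpout, BoltA, and BoltB are placeholder classes.
    import backtype.storm.Config;
    import backtype.storm.StormSubmitter;
    import backtype.storm.topology.TopologyBuilder;

    public class TopologySketch {
        public static void main(String[] args) throws Exception {
            TopologyBuilder builder = new TopologyBuilder();
            builder.setSpout("spout", new MySpout(), 20);   // placeholder spout
            builder.setBolt("boltA", new BoltA(), 200)      // 200 Bolt A executors
                   .shuffleGrouping("spout");
            builder.setBolt("boltB", new BoltB(), 100)      // 100 Bolt B executors
                   .shuffleGrouping("boltA");

            Config conf = new Config();
            conf.setNumWorkers(120);                        // all 120 workers
            StormSubmitter.submitTopology("cpu-heavy-sketch", conf,
                                          builder.createTopology());
        }
    }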

Definitely a great suggestion regarding getting one worker per node. We are
using a Storm on Mesos setup, and I configure each worker to have 7 GB of
RAM and 7 CPU cores. As a consequence, I get one worker per machine, since
each machine has two quad-core processors (8 cores total).
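
In topology code that sizing looks roughly like the snippet below. The heap
flag is my assumption (kept a bit under the 7 GB reservation to leave headroom
for JVM overhead), and storm-mesos also has its own CPU/memory resource
properties whose exact names vary by framework version:

    // Rough sketch of per-worker sizing; -Xmx6g is an assumed value chosen
    // to fit inside the 7 GB Mesos reservation with room for JVM overhead.
    import backtype.storm.Config;

    public class WorkerSizingSketch {
        public static Config workerConfig() {
            Config conf = new Config();
            conf.setNumWorkers(120);
            conf.put(Config.TOPOLOGY_WORKER_CHILDOPTS, "-Xmx6g");
            return conf;
        }
    }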

--John

On Fri, Aug 14, 2015 at 5:43 PM, Javier Gonzalez <[email protected]> wrote:

> Do you actually have 170 machines? Try sticking to one worker per machine
> (tweak memory parameters in storm.yaml); it makes inter-bolt traffic much
> faster.
> On Aug 14, 2015 5:28 PM, "John Yost" <[email protected]> wrote:
>
>> Hey Javier,
>>
>> Cool, thanks for your response! I have 50 workers for 200 Bolt A/5 Bolt
>> B and 120 workers for 400 Bolt A/100 Bolt B (this latter config is optimal,
>> but cluster resources make it tricky to actually launch).
>>
>> I will up the number of ackers and see if that helps. If not, then I will
>> try varying the number of Bolt B executors beyond 100.
>>
>> Thanks Again!
>>
>> --John
>>
>> On Fri, Aug 14, 2015 at 2:59 PM, Javier Gonzalez <[email protected]>
>> wrote:
>>
>>> Wiring in Bolt B will have a detrimental effect even if it does nothing
>>> but ack. Every tuple processed by a Bolt A has to travel to a Bolt B, and
>>> the ack has to travel back.
>>>
>>> You could try modifying the number of ackers, and playing with the
>>> number of A and B bolts. How many workers do you have for the topology?
>>>
>>> Regards,
>>> JG
>>> On Aug 14, 2015 12:31 PM, "John Yost" <[email protected]> wrote:
>>>
>>>> Hi Everyone,
>>>>
>>>> I have a topology in which a highly CPU-intensive bolt (Bolt A) requires
>>>> a much higher degree of parallelism than the bolt it emits tuples to
>>>> (Bolt B): 200 Bolt A executors vs. <= 100 Bolt B executors.
>>>>
>>>> I find that the throughput, as measured in tuples acked, drops from 7
>>>> million/minute to ~1 million/minute when I wire in Bolt B--even if all of
>>>> the logic within Bolt B's execute method is disabled and Bolt B is
>>>> therefore simply acking the input tuples from Bolt A. In addition, I find
>>>> that going from 50 to 100 Bolt B executors raises the throughput from
>>>> 900K/minute to ~1.1 million/minute.
>>>>
>>>> Is the fact that I am going from 200 bolt instances to 100 or fewer the
>>>> problem? I've already experimented with topology.executor.send.buffer.size
>>>> and topology.executor.receive.buffer.size, which helped drive throughput
>>>> from 800K to 900K. I will try topology.transfer.buffer.size next, perhaps
>>>> setting it to 2048. Any other ideas? (A sketch combining these settings
>>>> appears after the thread below.)
>>>>
>>>> Thanks
>>>>
>>>> --John
>>>>
>>>>
>>
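
Pulling together the knobs discussed in this thread, here is a rough sketch of
the acker and buffer settings in one place. The specific values are
illustrative assumptions, not benchmarked recommendations; the Disruptor queue
sizes must be powers of 2:

    // Sketch of the tuning knobs from this thread; values are assumptions.
    import backtype.storm.Config;

    public class TuningSketch {
        public static Config tuningConfig() {
            Config conf = new Config();
            conf.setNumWorkers(120);
            conf.setNumAckers(120);  // e.g. one acker executor per worker
            // Per-executor Disruptor queues (power-of-2 sizes).
            conf.put(Config.TOPOLOGY_EXECUTOR_RECEIVE_BUFFER_SIZE, 16384);
            conf.put(Config.TOPOLOGY_EXECUTOR_SEND_BUFFER_SIZE, 16384);
            // Per-worker outbound transfer queue.
            conf.put(Config.TOPOLOGY_TRANSFER_BUFFER_SIZE, 2048);
            return conf;
        }
    }

Adding ackers mainly helps when ack routing itself is the bottleneck: every
acked tuple generates ack messages that have to be routed to an acker task, so
a handful of ackers can easily saturate at millions of tuples per minute.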
