I agree the behavior is awkward.

The setNumWorkers config appears to behave as an upper limit of workers that 
will be utilized (e.g. setNumWorkers(200) will not fail to deploy when workers 
< 200), so this can also impact the ORDER of topology deployment required when 
workers < sum of all setNumWorkers configs for all topologies.

For example, I originally setNumWorkers(200) - artificially high - so that we 
can scale our worker pool up to 200 without deploying new code. However, when 
worker pool is 10, and this topology gets deployed first, then no other 
topologies can be deployed, since all the workers are assigned to this single 
topology.

Now, while this is awkward IMHO (I would rather have some additional knobs, 
like “assign a percentage of workers” or “assign a topology priority”), we can 
get around this by using rebalance and setting the number of workers to reflect 
the current state of the cluster. Since you MUST rebalance to increase/decrease 
the number of threads (which would typically go hand in hand with changing the 
number of workers, I think), then its not really extra work.

At some point it would be nice to have some pluggable logic that controls the 
number of executors dynamically, so that when a worker is added, the number of 
executors can be altered programmatically, instead of requiring manual 
intervention.

Thanks
Tyson

On Nov 5, 2014, at 9:05 AM, Dan DeCapria, CivicScience 
<[email protected]<mailto:[email protected]>> wrote:

Hi Nathan,

Sounds like I need to just bite-the-bullet and manually define the number of 
workers for each topology considering all topologies that will be running 
concurrently. The secondary process you mentioned is interesting wrt using 
Thrift to query the worker utilization and then auto-balance all topologies 
during runtime - I'll have to look into that process further.

Thanks for you help,

-Dan


On Wed, Nov 5, 2014 at 11:50 AM, Nathan Leung 
<[email protected]<mailto:[email protected]>> wrote:
It doesn't make sense to automatically balance worker load because you can get 
yourself into strange situations (e.g. 3 workers and each topology requests 4, 
or 4 workers and > 4 topologies, etc).  It would be nice if the UI or the logs 
gave a better indication that there were not enough workers to go around 
though.  You could write something that reads cluster information from the 
nimbus over thrift and rebalances all topologies as necessary, but I think it 
would be better to just make sure that you have enough workers available (or 
even better, more than enough workers) to satisfy the needs of your 
applications.

On Tue, Nov 4, 2014 at 4:45 PM, Dan DeCapria, CivicScience 
<[email protected]<mailto:[email protected]>> wrote:
Use Case:
I have a production storm cluster running with six workers.  Currently topology 
A is active and consuming all six workers via conf.setNumWorkers(6).  Now, 
launching Topology B with six workers (again via conf.setNumWorkers(6)) states 
the topology is active, but currently there are no available workers on the 
cluster for Topology B to use (as Topology A has claimed them all already) and 
hence Topology B is doing nothing. I believe this is due to storm requiring a 
priori that the sum of all topology's workers requested <= cluster worker 
capacity.

I am wondering why the sum of all topology workers is not normalizing the 
allocations for the worker pool when capacity is exceeded and auto-adjusting as 
new topologies come and go? Meaning, from the use case, since both topologies 
requested the same count of six workers, and given the finite capacity of the 
cluster at six actual workers, the implemented normalized proportion of the 
cluster resources for each topology would be 50% split - such that Topology A 
gets three actual workers and Topology B gets three actual workers as well.

How would I go about implementing a dynamic re-allocation of topology workers 
based on the proportion of expected workers given the cluster's finite worker 
capacity? An idea would be allocating workers as desired for all topologies 
until the cluster capacity is reached, at which point the actual number of 
workers desired becomes a normalized proportion allocation model over the 
cluster.

Many thanks,

-Dan




Reply via email to