Re: Running multiple topologies

2018-04-13 Thread Hannum, Daniel
Thank you both. This helps.

Here’s what I was worried about. Right now I have 40 workers on 10 nodes, all 
devoted to one topology. If I have to split 10 workers off for another 
topology, I was worried that I would end up with idle nodes if that new 
topology wasn’t busy enough. I now understand that while I would have idle 
workers in that example, I probably wouldn’t have idle nodes: the same number 
of threads would be squeezed into the smaller pool of 30 workers, and the 
scheduler would still spread those 30 workers over the same number of nodes 
(and the RAS may optimize that further). So my CPU/RAM resources would still 
be utilized.

It still would be really nice to make the worker distribution dynamic or more 
easily changeable. Right now, I’d have to change it in two topologies and then 
restart both of them. Future feature request, I guess.
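(The rebalance CLI can at least change a running topology’s worker count 
without killing it; a rough sketch, with topology names and numbers made up:

  storm rebalance topologyA -w 60 -n 30   # wait 60 seconds, then repack topologyA into 30 workers
  storm rebalance topologyB -w 60 -n 10   # give topologyB the other 10 workers

It still has to be run once per topology, though, so the coordination problem 
remains.)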

From: Alessio Pagliari <pagli...@i3s.unice.fr>
Reply-To: "user@storm.apache.org" <user@storm.apache.org>
Date: Thursday, April 12, 2018 at 3:15 PM
To: "user@storm.apache.org" <user@storm.apache.org>
Subject: Re: Running multiple topologies

Hi Daniel,

I’ll try to answer as best I can.

Basically, Storm has a default scheduler that performs round-robin placement. 
When you define the number of workers (i.e. JVMs) for a topology, it first 
places them across all the nodes in your cluster in a cyclic way (round 
robin), then uses the same algorithm to place the executors into the workers 
(at that level the workers effectively become the “nodes”). Storm has an 
algorithm to order the nodes, the slots in each node, and the executors, and 
it uses that order for every placement. It’s quite deterministic as long as 
the available nodes stay the same.

That means, for example, that with 3 nodes and 4 workers, the 1st worker goes 
on the 1st node, the 2nd on the 2nd, the 3rd on the 3rd, and the 4th worker 
wraps back around to the first node, leaving you with 2 workers on the first 
node and 1 on each of the others. If you deploy a second topology it follows 
the same pattern, always starting from the 1st node, then the 2nd, and so on. 
That holds until the slots on the 1st node are exhausted; from that point on, 
Storm starts the placement from the 2nd node.
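
To make the wrap-around concrete, here is a toy sketch of the cyclic 
assignment (just an illustration of the round-robin idea, not Storm’s actual 
scheduler code):

import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy illustration of round-robin worker placement across nodes.
public class RoundRobinSketch {
    public static void main(String[] args) {
        List<String> nodes = Arrays.asList("node1", "node2", "node3");  // 3 nodes, as in the example
        int numWorkers = 4;

        List<String> assignment = new ArrayList<>();
        for (int i = 0; i < numWorkers; i++) {
            // Worker i lands on node (i mod numNodes), so the 4th worker wraps back to node1.
            assignment.add("worker" + (i + 1) + " -> " + nodes.get(i % nodes.size()));
        }
        assignment.forEach(System.out::println);
        // Prints: worker1 -> node1, worker2 -> node2, worker3 -> node3, worker4 -> node1
    }
}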

One other thing to consider, which may be obvious but just to be clear: a 
worker cannot be shared between multiple topologies. Even if a worker holds 
only a single executor, you can’t place executors from other topologies in 
it. That means you cannot move a JVM from topology A to topology B unless you 
kill topology A (I’m not sure whether deactivating it is enough), “rebalance” 
topology B, and then redeploy topology A; that way the node slots are 
switched between the two topologies.
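
For context, the worker count is a per-topology setting fixed at submit time; 
a minimal sketch along these lines (topology name, spout, and numbers are 
placeholders, not from this thread):

import org.apache.storm.Config;
import org.apache.storm.StormSubmitter;
import org.apache.storm.testing.TestWordSpout;
import org.apache.storm.topology.TopologyBuilder;

public class SubmitSketch {
    public static void main(String[] args) throws Exception {
        TopologyBuilder builder = new TopologyBuilder();
        builder.setSpout("words", new TestWordSpout(), 4);  // placeholder spout

        Config conf = new Config();
        conf.setNumWorkers(30);  // this topology asks for 30 worker JVMs (slots)

        // The slots this topology claims are exclusive to it; executors from
        // another topology can never run inside these workers.
        StormSubmitter.submitTopology("topologyA", conf, builder.createTopology());
    }
}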

If you’re interested in improving resource management (in terms of CPU and 
RAM) and the raw throughput of your application, rather than in load 
balancing your topology cluster-wide, you can use the Resource Aware 
Scheduler (RAS), as Ethan suggested. Be aware, though, that the RAS focuses 
on locality: to avoid sending tuples across the network, it places workers 
and executors as close to each other as possible, and most of the time you 
will end up with the entire topology inside a single node. So a possible 
scenario is a 5-node cluster with 5 topologies, each deployed on a different 
machine; unlike the default scheduler, the RAS starts placing on the machine 
with the most available resources.
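
If you go the RAS route, the per-component resource hints look roughly like 
this (numbers are made up, and the cluster needs 
org.apache.storm.scheduler.resource.ResourceAwareScheduler set as 
storm.scheduler in storm.yaml):

import org.apache.storm.testing.TestWordCounter;
import org.apache.storm.testing.TestWordSpout;
import org.apache.storm.topology.TopologyBuilder;

public class RasHintsSketch {
    public static TopologyBuilder build() {
        TopologyBuilder builder = new TopologyBuilder();

        // Placeholder components; the CPU/memory numbers are illustrative only.
        builder.setSpout("words", new TestWordSpout(), 4)
               .setCPULoad(20.0)       // roughly 20% of one core per executor
               .setMemoryLoad(256.0);  // 256 MB on-heap per executor

        builder.setBolt("counter", new TestWordCounter(), 8)
               .shuffleGrouping("words")
               .setCPULoad(50.0)
               .setMemoryLoad(512.0);

        return builder;
    }
}

The scheduler then uses these hints to decide which node has enough free CPU 
and memory for each executor, which is why whole topologies often end up 
packed onto a single machine.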

I know they’re improving the RAS for Storm 2.0, but unfortunately I haven’t 
yet dug in to understand how.

Hope I answered your question,


--
Alessio Pagliari
Scale Team, PhD Student
Université Côte d’Azur, CNRS, I3S



On 12 Apr 2018, at 20:51, Ethan Li <ethanopensou...@gmail.com> wrote:

Hi Daniel,

I am not sure if I understand your questions correctly, but would the 
resource aware scheduler help?
https://github.com/apache/storm/blob/master/docs/Resource_Aware_Scheduler_overview.md


Thanks
Ethan



On Apr 12, 2018, at 1:45 PM, Hannum, Daniel <daniel_han...@premierinc.com> wrote:

I think I know the answer, but I can’t find docs, so check my thinking please.

We’re going to be going from 1 topology to maybe 5 on our (v1.1.x) cluster. I 
think the way I share the cluster is by setting NumWorkers() on all the 
topologies so we divide the available JVMs up. If that is true, then don’t we 
have the problem that I’m tying up resources if one is idle? Or that I can’t 
move a JVM from topology A to topology B if B is under load?

So my questions are:
Do I understand this correctly?
Is there any way to improve this situation besides just getting more hosts 
and being ok with some portion of them being idle?
Is this going to get better in Storm 2?

Thanks!


Re: running multiple topologies in same cluster

2016-02-04 Thread Spico Florin
Hello!
Thank you all for your answers! I guess I'll wait for the upcoming support for
resource-aware scheduling in a multi-tenant standalone Storm cluster.
Regards,
 Florin

On Tue, Feb 2, 2016 at 1:00 AM, Erik Weathers  wrote:

> hi Spico,
>
> As Bobby said, native Storm is going to have better support for this soon.
>
> FWIW, there is also the storm-on-mesos project, which we've been running
> on for almost 2 years at Groupon.
>
>- https://github.com/mesos/storm
>
> Caveats:
>
>- storm's logviewer is unsupported
>- scheduling can be suboptimal, causing topologies of different "size"
>  (resource requirements) to starve each other
>   - this is a side effect of needing to dynamically calculate storm "slots"
>     from mesos resource offers; my team has a framework change we are
>     testing that will improve that behavior.
>- it's relatively complex to operate since there are so many different
>moving parts / components
>   - mesos-master
>   - mesos-slave/agent (it's being renamed)
>   - storm nimbus (MesosNimbus -> the mesos scheduler)
>   - storm supervisor (MesosSupervisor -> the mesos executor)
>   - storm worker (the mesos task)
>   - ZooKeeper
>- it probably won't work nicely with all the fancy security stuff that
>has been added to Storm in 0.10.0+
>
> - Erik
>
> On Mon, Feb 1, 2016 at 12:14 PM, Bobby Evans  wrote:
>
>> We are currently adding support for resource-aware scheduling in a
>> multi-tenant standalone Storm cluster.  It is still alpha quality, but we
>> plan on getting it into production at Yahoo this quarter.  If you can wait,
>> that would be the preferred way I see to support your use case.
>>
>> - Bobby
>>
>>
>> On Monday, February 1, 2016 12:16 PM, Spico Florin 
>> wrote:
>>
>>
>> Hello!
>> I have a use case where we have to deploy many topologies in a Storm
>> cluster.
>> 1. We would like to know whether running these topologies in combination
>> with Apache Slider over YARN would bring us some benefits in terms of
>> resource consumption?
>> 2. In such cases (running many topologies, approx. 60), what are the best
>> practices for running them over a cluster with smart, load-balanced
>> hardware resource consumption (CPU, RAM)?
>> I look forward to your answers.
>> Regards,
>> Florin
>>
>>
>>
>>
>