Re: [openstack-dev] [nova] Proposal for an Experiment

2015-08-04 Thread Ed Leafe
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA512

On 08/03/2015 02:24 PM, Jesse Cook wrote:

 Performance tests against 1000 node clusters being setup by OSIC? 
 Sounds like you have a playground for your tests.

Unfortunately, the consensus of the nova cores during the mid-cycle
meetup was that while this is an interesting approach, and that
experimenting with novel approaches can be very worthwhile, it was not
considered a priority, as there is too much work already on everyone's
plate for Liberty. So the experiment isn't going to happen any time soon
.

- -- 

- -- Ed Leafe
-BEGIN PGP SIGNATURE-
Version: GnuPG v2
Comment: GPGTools - https://gpgtools.org
Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/

iQIbBAEBCgAGBQJVwM5gAAoJEKMgtcocwZqLG90P9RNKm4pRmcLPK+PqVaXCIu/E
c+i0SW9Af5fmy4cC7Efnuv2o7UqJNNU7GsGw3on54Lt1SoF3z1yJ/9WarzjiNLq+
25Uz+2HDytovvnREi+P/LoVB4mj49nowEkMh/3QgQWVsMOvPitPM7mBPAGkyzNvb
5ElC1Xr2HkyQ/9h34IGLWcC/X/meR79BcvRHfJwqzNyTP2foi7tboq4sugbFVQfy
72vbaTAtLI/mDzUjkafNGB5W2ge4VWAJRsjf1y+eIv+j2f3PKbKsx3XLrTIuzJ78
9qBmpGal/biqyUwFyrvrg/e//KuD0FhJdiDwiuc35hebkN7UBJbq71RAjdxtK4Jr
clImy04sGAvKI0r27LFZA3ycjv0J8OW4nJeH9vjdBg5N2D0FuhOIBNxXsKohdF42
0maWFe1Wj7Icv9YnJ26ZaaWjqwnGE/PjVl3lFd1X5W7KFJ5Ay/uYY28cfHbD0wKT
Ych2oSR0/Jrzzfm9jd+VP2kjORgtEdDaARbP11auT+o1xnIKLIA9qTiOGuhX807L
cZVitPoUdIlogQyKJ6hZtiIytrnnYocHKZp/MQjpTabVKkl4aKmKLAo2onNgDcD8
6B08hmsW28pm7aKZf0SBb7oe6OU0vCQjFaKWAVb1O+zGqspOrhqehQR3u6nF+O1Y
TaG3j4n4w/5D9gadk4k=
=l9oR
-END PGP SIGNATURE-

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] Proposal for an Experiment

2015-08-04 Thread Roman Vasilets
Rally? Something else? What can we do to measure this?
Of cause, if you looking for instrument for measure performance - Rally is
the best choice!

On Tue, Aug 4, 2015 at 5:38 PM, Ed Leafe e...@leafe.com wrote:

 -BEGIN PGP SIGNED MESSAGE-
 Hash: SHA512

 On 08/03/2015 02:24 PM, Jesse Cook wrote:

  Performance tests against 1000 node clusters being setup by OSIC?
  Sounds like you have a playground for your tests.

 Unfortunately, the consensus of the nova cores during the mid-cycle
 meetup was that while this is an interesting approach, and that
 experimenting with novel approaches can be very worthwhile, it was not
 considered a priority, as there is too much work already on everyone's
 plate for Liberty. So the experiment isn't going to happen any time soon
 .

 - --

 - -- Ed Leafe
 -BEGIN PGP SIGNATURE-
 Version: GnuPG v2
 Comment: GPGTools - https://gpgtools.org
 Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/

 iQIbBAEBCgAGBQJVwM5gAAoJEKMgtcocwZqLG90P9RNKm4pRmcLPK+PqVaXCIu/E
 c+i0SW9Af5fmy4cC7Efnuv2o7UqJNNU7GsGw3on54Lt1SoF3z1yJ/9WarzjiNLq+
 25Uz+2HDytovvnREi+P/LoVB4mj49nowEkMh/3QgQWVsMOvPitPM7mBPAGkyzNvb
 5ElC1Xr2HkyQ/9h34IGLWcC/X/meR79BcvRHfJwqzNyTP2foi7tboq4sugbFVQfy
 72vbaTAtLI/mDzUjkafNGB5W2ge4VWAJRsjf1y+eIv+j2f3PKbKsx3XLrTIuzJ78
 9qBmpGal/biqyUwFyrvrg/e//KuD0FhJdiDwiuc35hebkN7UBJbq71RAjdxtK4Jr
 clImy04sGAvKI0r27LFZA3ycjv0J8OW4nJeH9vjdBg5N2D0FuhOIBNxXsKohdF42
 0maWFe1Wj7Icv9YnJ26ZaaWjqwnGE/PjVl3lFd1X5W7KFJ5Ay/uYY28cfHbD0wKT
 Ych2oSR0/Jrzzfm9jd+VP2kjORgtEdDaARbP11auT+o1xnIKLIA9qTiOGuhX807L
 cZVitPoUdIlogQyKJ6hZtiIytrnnYocHKZp/MQjpTabVKkl4aKmKLAo2onNgDcD8
 6B08hmsW28pm7aKZf0SBb7oe6OU0vCQjFaKWAVb1O+zGqspOrhqehQR3u6nF+O1Y
 TaG3j4n4w/5D9gadk4k=
 =l9oR
 -END PGP SIGNATURE-

 __
 OpenStack Development Mailing List (not for usage questions)
 Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
 http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] Proposal for an Experiment

2015-08-03 Thread Jesse Cook


Jesse J. CookCompute Team Lead
jesse.c...@rackspace.com
irc: #compute-eng (gimchi)
mobile: 618-530-0659
 https://rackspacemarketing.com/signatyourEmail/
https://www.linkedin.com/pub/jesse-cook/8/292/620
https://plus.google.com/u/0/+JesseCooks/posts/p/pub




On 7/20/15, 12:40 PM, Clint Byrum cl...@fewbar.com wrote:

Excerpts from Jesse Cook's message of 2015-07-20 07:48:46 -0700:
 
 On 7/15/15, 9:18 AM, Ed Leafe e...@leafe.com wrote:
 
 -BEGIN PGP SIGNED MESSAGE-
 Hash: SHA512
 
 Changing the architecture of a complex system such as Nova is never
 easy, even when we know that the design isn't working as well as we
 need it to. And it's even more frustrating because when the change is
 complete, it's hard to know if the improvement, if any, was worth it.
 
 So I had an idea: what if we ran a test of that architecture change
 out-of-tree? In other words, create a separate deployment, and rip out
 the parts that don't work well, replacing them with an alternative
 design. There would be no Gerrit reviews or anything that would slow
 down the work or add load to the already overloaded reviewers. Then we
 could see if this modified system is a significant-enough improvement
 to justify investing the time in implementing it in-tree. And, of
 course, if the test doesn't show what was hoped for, it is scrapped
 and we start thinking anew.
 
 +1
 
 The important part in this process is defining up front what level of
 improvement would be needed to make considering actually making such a
 change worthwhile, and what sort of tests would demonstrate whether or
 not whether this level was met. I'd like to discuss such an experiment
 next week at the Nova mid-cycle.
 
 What I'd like to investigate is replacing the current design of having
 the compute nodes communicating with the scheduler via message queues.
 This design is overly complex and has several known scalability
 issues. My thought is to replace this with a Cassandra [1] backend.
 Compute nodes would update their state to Cassandra whenever they
 change, and that data would be read by the scheduler to make its host
 selection. When the scheduler chooses a host, it would post the claim
 to Cassandra wrapped in a lightweight transaction, which would ensure
 that no other scheduler has tried to claim those resources. When the
 host has built the requested VM, it will delete the claim and update
 Cassandra with its current state.
 
 One main motivation for using Cassandra over the current design is
 that it will enable us to run multiple schedulers without increasing
 the raciness of the system. Another is that it will greatly simplify a
 lot of the internal plumbing we've set up to implement in Nova what we
 would get out of the box with Cassandra. A third is that if this
 proves to be a success, it would also be able to be used further down
 the road to simplify inter-cell communication (but this is getting
 ahead of ourselves...). I've worked with Cassandra before and it has
 been rock-solid to run and simple to set up. I've also had preliminary
 technical reviews with the engineers at DataStax [2], the company
 behind Cassandra, and they agreed that this was a good fit.
 
 At this point I'm sure that most of you are filled with thoughts on
 how this won't work, or how much trouble it will be to switch, or how
 much more of a pain it will be, or how you hate non-relational DBs, or
 any of a zillion other negative thoughts. FWIW, I have them too. But
 instead of ranting, I would ask that we acknowledge for now that:
 
 Call me an optimist, I think this can work :)
 
 I would prefer a solution that avoids state management all together and
 instead depends on each individual making rule-based decisions using
their
 limited observations of their perceived environment. Of course, this has
 certain emergent behaviors you have to learn from, but on the upside, no
 more braiding state throughout the system. I don¹t like the assumption
 that it has to be a global state management problem when it doesn¹t have
 to be. That being said, I¹m not opposed to trying a solution like you
 described using Cassandra or something similar. I generally support
 improvements :)
 


 
 a) it will be disruptive and painful to switch something like this at
 this point in Nova's development
 b) it would have to provide *significant* improvement to make such a
 change worthwhile
 
 So what I'm asking from all of you is to help define the second part:
 what we would want improved, and how to measure those benefits. In
 other words, what results would you have to see in order to make you
 reconsider your initial nah, this'll never work reaction, and start
 to think that this is will be a worthwhile change to make to Nova.
 
 I¹d like to see n build requests within 1 second each be successfully
 scheduled to a host that has spare capacity with only say a total system
 capacity of n * 1.10 where n = 1, each cell having ~100 hosts, the
 number of hosts is = n * 0.10 and = n * 

Re: [openstack-dev] [nova] Proposal for an Experiment

2015-07-20 Thread Jesse Cook

On 7/15/15, 9:18 AM, Ed Leafe e...@leafe.com wrote:

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA512

Changing the architecture of a complex system such as Nova is never
easy, even when we know that the design isn't working as well as we
need it to. And it's even more frustrating because when the change is
complete, it's hard to know if the improvement, if any, was worth it.

So I had an idea: what if we ran a test of that architecture change
out-of-tree? In other words, create a separate deployment, and rip out
the parts that don't work well, replacing them with an alternative
design. There would be no Gerrit reviews or anything that would slow
down the work or add load to the already overloaded reviewers. Then we
could see if this modified system is a significant-enough improvement
to justify investing the time in implementing it in-tree. And, of
course, if the test doesn't show what was hoped for, it is scrapped
and we start thinking anew.

+1

The important part in this process is defining up front what level of
improvement would be needed to make considering actually making such a
change worthwhile, and what sort of tests would demonstrate whether or
not whether this level was met. I'd like to discuss such an experiment
next week at the Nova mid-cycle.

What I'd like to investigate is replacing the current design of having
the compute nodes communicating with the scheduler via message queues.
This design is overly complex and has several known scalability
issues. My thought is to replace this with a Cassandra [1] backend.
Compute nodes would update their state to Cassandra whenever they
change, and that data would be read by the scheduler to make its host
selection. When the scheduler chooses a host, it would post the claim
to Cassandra wrapped in a lightweight transaction, which would ensure
that no other scheduler has tried to claim those resources. When the
host has built the requested VM, it will delete the claim and update
Cassandra with its current state.

One main motivation for using Cassandra over the current design is
that it will enable us to run multiple schedulers without increasing
the raciness of the system. Another is that it will greatly simplify a
lot of the internal plumbing we've set up to implement in Nova what we
would get out of the box with Cassandra. A third is that if this
proves to be a success, it would also be able to be used further down
the road to simplify inter-cell communication (but this is getting
ahead of ourselves...). I've worked with Cassandra before and it has
been rock-solid to run and simple to set up. I've also had preliminary
technical reviews with the engineers at DataStax [2], the company
behind Cassandra, and they agreed that this was a good fit.

At this point I'm sure that most of you are filled with thoughts on
how this won't work, or how much trouble it will be to switch, or how
much more of a pain it will be, or how you hate non-relational DBs, or
any of a zillion other negative thoughts. FWIW, I have them too. But
instead of ranting, I would ask that we acknowledge for now that:

Call me an optimist, I think this can work :)

I would prefer a solution that avoids state management all together and
instead depends on each individual making rule-based decisions using their
limited observations of their perceived environment. Of course, this has
certain emergent behaviors you have to learn from, but on the upside, no
more braiding state throughout the system. I don¹t like the assumption
that it has to be a global state management problem when it doesn¹t have
to be. That being said, I¹m not opposed to trying a solution like you
described using Cassandra or something similar. I generally support
improvements :)


a) it will be disruptive and painful to switch something like this at
this point in Nova's development
b) it would have to provide *significant* improvement to make such a
change worthwhile

So what I'm asking from all of you is to help define the second part:
what we would want improved, and how to measure those benefits. In
other words, what results would you have to see in order to make you
reconsider your initial nah, this'll never work reaction, and start
to think that this is will be a worthwhile change to make to Nova.

I¹d like to see n build requests within 1 second each be successfully
scheduled to a host that has spare capacity with only say a total system
capacity of n * 1.10 where n = 1, each cell having ~100 hosts, the
number of hosts is = n * 0.10 and = n * 0.90, and the number of
schedulers is = 2.

For example:

Build requests: 1 in 1 second
Slots for flavor requested: 11000
Hosts that can build flavor: 7500
Number of schedulers: 3
Number of cells: 75 (each with 100 hosts)


I'm also asking that you refrain from talking about why this can't
work for now. I know it'll be difficult to do that, since nobody likes
ranting about stuff more than I do, but right now it won't be helpful.
There will be plenty of time for that 

Re: [openstack-dev] [nova] Proposal for an Experiment

2015-07-20 Thread Clint Byrum
Excerpts from Jesse Cook's message of 2015-07-20 07:48:46 -0700:
 
 On 7/15/15, 9:18 AM, Ed Leafe e...@leafe.com wrote:
 
 -BEGIN PGP SIGNED MESSAGE-
 Hash: SHA512
 
 Changing the architecture of a complex system such as Nova is never
 easy, even when we know that the design isn't working as well as we
 need it to. And it's even more frustrating because when the change is
 complete, it's hard to know if the improvement, if any, was worth it.
 
 So I had an idea: what if we ran a test of that architecture change
 out-of-tree? In other words, create a separate deployment, and rip out
 the parts that don't work well, replacing them with an alternative
 design. There would be no Gerrit reviews or anything that would slow
 down the work or add load to the already overloaded reviewers. Then we
 could see if this modified system is a significant-enough improvement
 to justify investing the time in implementing it in-tree. And, of
 course, if the test doesn't show what was hoped for, it is scrapped
 and we start thinking anew.
 
 +1
 
 The important part in this process is defining up front what level of
 improvement would be needed to make considering actually making such a
 change worthwhile, and what sort of tests would demonstrate whether or
 not whether this level was met. I'd like to discuss such an experiment
 next week at the Nova mid-cycle.
 
 What I'd like to investigate is replacing the current design of having
 the compute nodes communicating with the scheduler via message queues.
 This design is overly complex and has several known scalability
 issues. My thought is to replace this with a Cassandra [1] backend.
 Compute nodes would update their state to Cassandra whenever they
 change, and that data would be read by the scheduler to make its host
 selection. When the scheduler chooses a host, it would post the claim
 to Cassandra wrapped in a lightweight transaction, which would ensure
 that no other scheduler has tried to claim those resources. When the
 host has built the requested VM, it will delete the claim and update
 Cassandra with its current state.
 
 One main motivation for using Cassandra over the current design is
 that it will enable us to run multiple schedulers without increasing
 the raciness of the system. Another is that it will greatly simplify a
 lot of the internal plumbing we've set up to implement in Nova what we
 would get out of the box with Cassandra. A third is that if this
 proves to be a success, it would also be able to be used further down
 the road to simplify inter-cell communication (but this is getting
 ahead of ourselves...). I've worked with Cassandra before and it has
 been rock-solid to run and simple to set up. I've also had preliminary
 technical reviews with the engineers at DataStax [2], the company
 behind Cassandra, and they agreed that this was a good fit.
 
 At this point I'm sure that most of you are filled with thoughts on
 how this won't work, or how much trouble it will be to switch, or how
 much more of a pain it will be, or how you hate non-relational DBs, or
 any of a zillion other negative thoughts. FWIW, I have them too. But
 instead of ranting, I would ask that we acknowledge for now that:
 
 Call me an optimist, I think this can work :)
 
 I would prefer a solution that avoids state management all together and
 instead depends on each individual making rule-based decisions using their
 limited observations of their perceived environment. Of course, this has
 certain emergent behaviors you have to learn from, but on the upside, no
 more braiding state throughout the system. I don¹t like the assumption
 that it has to be a global state management problem when it doesn¹t have
 to be. That being said, I¹m not opposed to trying a solution like you
 described using Cassandra or something similar. I generally support
 improvements :)
 


 
 a) it will be disruptive and painful to switch something like this at
 this point in Nova's development
 b) it would have to provide *significant* improvement to make such a
 change worthwhile
 
 So what I'm asking from all of you is to help define the second part:
 what we would want improved, and how to measure those benefits. In
 other words, what results would you have to see in order to make you
 reconsider your initial nah, this'll never work reaction, and start
 to think that this is will be a worthwhile change to make to Nova.
 
 I¹d like to see n build requests within 1 second each be successfully
 scheduled to a host that has spare capacity with only say a total system
 capacity of n * 1.10 where n = 1, each cell having ~100 hosts, the
 number of hosts is = n * 0.10 and = n * 0.90, and the number of
 schedulers is = 2.
 
 For example:
 
 Build requests: 1 in 1 second
 Slots for flavor requested: 11000
 Hosts that can build flavor: 7500
 Number of schedulers: 3
 Number of cells: 75 (each with 100 hosts)
 

This is right on, though one thing missing is where the current code
fails this 

Re: [openstack-dev] [nova] Proposal for an Experiment

2015-07-20 Thread Chris Friesen

On 07/20/2015 02:04 PM, Clint Byrum wrote:

Excerpts from Chris Friesen's message of 2015-07-20 12:17:29 -0700:



Some questions:

1) Could you elaborate a bit on how this would work?  I don't quite understand
how you would handle a request for booting an instance with a certain set of
resources--would you queue up a message for each resource?



Please be concrete on what you mean by resource.

I'm suggesting if you only have flavors, which have cpu, ram, disk, and rx/tx 
ratios,
then each flavor is a queue. Thats the easiest problem to solve. Then if
you have a single special thing that can only have one VM per host (lets
say, a PCI pass through thing), then thats another iteration of each
flavor. So assuming 3
flavors:

1=tiny cpu=1,ram=1024m,disk=5gb,rxtx=1
2=medium cpu=2,ram=4096m,disk=100gb,rxtx=2
3=large cpu=8,ram=16384,disk=200gb,rxtx=2

This means you have these queues:

reserve
release
compute,cpu=1,ram=1024m,disk=5gb,rxtx=1,pci=1
compute,cpu=1,ram=1024m,disk=5gb,rxtx=1
compute,cpu=2,ram=4096m,disk=100gb,rxtx=2,pci=1
compute,cpu=2,ram=4096m,disk=100gb,rxtx=2
compute,cpu=8,ram=16384,disk=200gb,rxtx=2pci=1
compute,cpu=8,ram=16384,disk=200gb,rxtx=2


snip


Now, I've made this argument in the past, and people have pointed out
that the permutations can get into the tens of thousands very easily
if you start adding lots of dimensions and/or flavors. I suggest that
is no big deal, but maybe I'm biased because I have done something like
that in Gearman and it was, in fact, no big deal.


Yeah, that's what I was worried about.  We have things that can be specified per 
flavor, and things that can be specified per image, and things that can be 
specified per instance, and they all multiply together.



2) How would it handle stuff like weight functions where you could have multiple
compute nodes that *could* satisfy the requirement but some of them would be
better than others by some arbitrary criteria.



Can you provide a concrete example? Feels like I'm asking for a straw
man to be built. ;)


Well, as an example we have a cluster that is aimed at high-performance network 
processing and so all else being equal they will choose the compute node with 
the least network traffic.  You might also try to pack instances together for 
power efficiency (allowing you to turn off unused compute nodes), or choose the 
compute node that results in the tightest packing (to minimize unused resources).



3) The biggest improvement I'd like to see is in group scheduling.  Suppose I
want to schedule multiple instances, each with their own resource requirements,
but also with interdependency between them (these ones on the same node, these
ones not on the same node, these ones with this provider network, etc.)  The
scheduler could then look at the whole request all at once and optimize it
rather than looking at each piece separately.  That could also allow relocating
multiple instances that want to be co-located on the same compute node.



So, if the grouping is arbitrary, then there's no way to pre-calculate the
group size, I agree. I am wont to pursue something like this though, as I
don't really think this is the kind of optimization that cloud workloads
should be built on top of. If you need two processes to have low latency,
why not just boot a bigger machine and do it all in one VM? There are a
few reasons I can think of, but I wonder how many are in the general
case?


It's a fair question. :)  I honestly don't know...I was just thinking that we 
allow the expression of affinity/anti-affinity policies via server groups, but 
the scheduler doesn't really do a good job of actually scheduling those groups.


Chris

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] Proposal for an Experiment

2015-07-20 Thread Joshua Harlow
I have a feeling that we really need to make whatever this selection 
process has clearly defined API boundaries, so that various 
'implementation experiments' can be used (and researched on).


Those API boundaries will be what scheduling entities must provide but 
the implementations could be many things. I have feeling that this is 
really an on-going area of research and no solution will likely be 
optimal 'yet' (maybe someday...).


Without even defined API boundaries I start to wonder if this whole 
exploring will end up just burning out people (when said people find a 
possible solution but the code won't be accepted due to lack of API 
boundaries in the first place); I believe gantt was trying to fix this 
(but I'm not sure of the status of that)?


-Josh

Chris Friesen wrote:

On 07/20/2015 02:04 PM, Clint Byrum wrote:

Excerpts from Chris Friesen's message of 2015-07-20 12:17:29 -0700:



Some questions:

1) Could you elaborate a bit on how this would work? I don't quite
understand
how you would handle a request for booting an instance with a certain
set of
resources--would you queue up a message for each resource?



Please be concrete on what you mean by resource.

I'm suggesting if you only have flavors, which have cpu, ram, disk,
and rx/tx ratios,
then each flavor is a queue. Thats the easiest problem to solve. Then if
you have a single special thing that can only have one VM per host (lets
say, a PCI pass through thing), then thats another iteration of each
flavor. So assuming 3
flavors:

1=tiny cpu=1,ram=1024m,disk=5gb,rxtx=1
2=medium cpu=2,ram=4096m,disk=100gb,rxtx=2
3=large cpu=8,ram=16384,disk=200gb,rxtx=2

This means you have these queues:

reserve
release
compute,cpu=1,ram=1024m,disk=5gb,rxtx=1,pci=1
compute,cpu=1,ram=1024m,disk=5gb,rxtx=1
compute,cpu=2,ram=4096m,disk=100gb,rxtx=2,pci=1
compute,cpu=2,ram=4096m,disk=100gb,rxtx=2
compute,cpu=8,ram=16384,disk=200gb,rxtx=2pci=1
compute,cpu=8,ram=16384,disk=200gb,rxtx=2


snip


Now, I've made this argument in the past, and people have pointed out
that the permutations can get into the tens of thousands very easily
if you start adding lots of dimensions and/or flavors. I suggest that
is no big deal, but maybe I'm biased because I have done something like
that in Gearman and it was, in fact, no big deal.


Yeah, that's what I was worried about. We have things that can be
specified per flavor, and things that can be specified per image, and
things that can be specified per instance, and they all multiply
together.


2) How would it handle stuff like weight functions where you could
have multiple
compute nodes that *could* satisfy the requirement but some of them
would be
better than others by some arbitrary criteria.



Can you provide a concrete example? Feels like I'm asking for a straw
man to be built. ;)


Well, as an example we have a cluster that is aimed at high-performance
network processing and so all else being equal they will choose the
compute node with the least network traffic. You might also try to pack
instances together for power efficiency (allowing you to turn off unused
compute nodes), or choose the compute node that results in the tightest
packing (to minimize unused resources).


3) The biggest improvement I'd like to see is in group scheduling.
Suppose I
want to schedule multiple instances, each with their own resource
requirements,
but also with interdependency between them (these ones on the same
node, these
ones not on the same node, these ones with this provider network,
etc.) The
scheduler could then look at the whole request all at once and
optimize it
rather than looking at each piece separately. That could also allow
relocating
multiple instances that want to be co-located on the same compute node.



So, if the grouping is arbitrary, then there's no way to pre-calculate
the
group size, I agree. I am wont to pursue something like this though,
as I
don't really think this is the kind of optimization that cloud workloads
should be built on top of. If you need two processes to have low
latency,
why not just boot a bigger machine and do it all in one VM? There are a
few reasons I can think of, but I wonder how many are in the general
case?


It's a fair question. :) I honestly don't know...I was just thinking
that we allow the expression of affinity/anti-affinity policies via
server groups, but the scheduler doesn't really do a good job of
actually scheduling those groups.

Chris

__

OpenStack Development Mailing List (not for usage questions)
Unsubscribe:
openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe

Re: [openstack-dev] [nova] Proposal for an Experiment

2015-07-20 Thread Chris Friesen

On 07/20/2015 11:40 AM, Clint Byrum wrote:


To your earlier point about state being abused in the system, I
totally 100% agree. In the past I've wondered a lot if there can be a
worker model, where compute hosts all try to grab work off queues if
they have available resources. So API requests for boot/delete don't
change any state, they just enqueue a message. Queues would be matched
up to resources and the more filter choices, the more queues. Each
time a compute node completed a task (create vm, destroy vm) it would
re-evaluate all of the queues and subscribe to the ones it could satisfy
right now. Quotas would simply be the first stop for the enqueued create
messages, and a final stop for the enqueued delete messages (once its
done, release quota). If you haven't noticed, this would agree with Robert
Collins's suggestion that something like Kafka is a technology more suited
to this (or my favorite old-often-forgotten solution to this , Gearman. ;)

This would have no global dynamic state, and very little local dynamic
state. API, conductor, and compute nodes simply need to know all of the
choices users are offered, and there is no scheduler at runtime, just
a predictive queue-list-manager that only gets updated when choices are
added or removed. This would relieve a ton of the burden currently put
on the database by scheduling since the only accesses would be simple
read/writes (that includes 'server-list' type operations since that
would read a single index key).


Some questions:

1) Could you elaborate a bit on how this would work?  I don't quite understand 
how you would handle a request for booting an instance with a certain set of 
resources--would you queue up a message for each resource?


2) How would it handle stuff like weight functions where you could have multiple 
compute nodes that *could* satisfy the requirement but some of them would be 
better than others by some arbitrary criteria.


3) The biggest improvement I'd like to see is in group scheduling.  Suppose I 
want to schedule multiple instances, each with their own resource requirements, 
but also with interdependency between them (these ones on the same node, these 
ones not on the same node, these ones with this provider network, etc.)  The 
scheduler could then look at the whole request all at once and optimize it 
rather than looking at each piece separately.  That could also allow relocating 
multiple instances that want to be co-located on the same compute node.


Chris

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] Proposal for an Experiment

2015-07-20 Thread Clint Byrum
Excerpts from Chris Friesen's message of 2015-07-20 12:17:29 -0700:
 On 07/20/2015 11:40 AM, Clint Byrum wrote:
 
  To your earlier point about state being abused in the system, I
  totally 100% agree. In the past I've wondered a lot if there can be a
  worker model, where compute hosts all try to grab work off queues if
  they have available resources. So API requests for boot/delete don't
  change any state, they just enqueue a message. Queues would be matched
  up to resources and the more filter choices, the more queues. Each
  time a compute node completed a task (create vm, destroy vm) it would
  re-evaluate all of the queues and subscribe to the ones it could satisfy
  right now. Quotas would simply be the first stop for the enqueued create
  messages, and a final stop for the enqueued delete messages (once its
  done, release quota). If you haven't noticed, this would agree with Robert
  Collins's suggestion that something like Kafka is a technology more suited
  to this (or my favorite old-often-forgotten solution to this , Gearman. ;)
 
  This would have no global dynamic state, and very little local dynamic
  state. API, conductor, and compute nodes simply need to know all of the
  choices users are offered, and there is no scheduler at runtime, just
  a predictive queue-list-manager that only gets updated when choices are
  added or removed. This would relieve a ton of the burden currently put
  on the database by scheduling since the only accesses would be simple
  read/writes (that includes 'server-list' type operations since that
  would read a single index key).
 
 Some questions:
 
 1) Could you elaborate a bit on how this would work?  I don't quite 
 understand 
 how you would handle a request for booting an instance with a certain set of 
 resources--would you queue up a message for each resource?
 

Please be concrete on what you mean by resource.

I'm suggesting if you only have flavors, which have cpu, ram, disk, and rx/tx 
ratios,
then each flavor is a queue. Thats the easiest problem to solve. Then if
you have a single special thing that can only have one VM per host (lets
say, a PCI pass through thing), then thats another iteration of each
flavor. So assuming 3
flavors:

1=tiny cpu=1,ram=1024m,disk=5gb,rxtx=1
2=medium cpu=2,ram=4096m,disk=100gb,rxtx=2
3=large cpu=8,ram=16384,disk=200gb,rxtx=2

This means you have these queues:

reserve
release
compute,cpu=1,ram=1024m,disk=5gb,rxtx=1,pci=1
compute,cpu=1,ram=1024m,disk=5gb,rxtx=1
compute,cpu=2,ram=4096m,disk=100gb,rxtx=2,pci=1
compute,cpu=2,ram=4096m,disk=100gb,rxtx=2
compute,cpu=8,ram=16384,disk=200gb,rxtx=2pci=1
compute,cpu=8,ram=16384,disk=200gb,rxtx=2

Also you have a delete queue per compute node (and migrate and and and..
RPC still is pretty unchanged at the single-instance level)

So, compute nodes that have the pci device boot up, query the flavors
table, and subscribe to the compute queues that they can satisfy now
(which would be _all_ of them assuming they have 16G of ram available).

A user asks for a tiny + pci pass through. API node injects a message
to the reserve queue, a conductor receives it, checks the user's quota,
bumps usage by 1, and then sends it to the appropriate compute queue. A
compute node receives it. It starts the VM, ACK's the job (so it is
dropped from the queue so it won't be retried) and then looks at its
capabilities vs. the queues, and unsubscribes from all of the pci=1
queues, since its one pci device is in use.

When the user deletes the node, the compute node receives that on its
delete queue, removes the node, and then sends a message on the release
queue that the resources can be returned to the user's quota (or we can
talk about whether to just release them earlier.. when releasing happens
is a sub-topic).

Now, I've made this argument in the past, and people have pointed out
that the permutations can get into the tens of thousands very easily
if you start adding lots of dimensions and/or flavors. I suggest that
is no big deal, but maybe I'm biased because I have done something like
that in Gearman and it was, in fact, no big deal.

 2) How would it handle stuff like weight functions where you could have 
 multiple 
 compute nodes that *could* satisfy the requirement but some of them would be 
 better than others by some arbitrary criteria.


Can you provide a concrete example? Feels like I'm asking for a straw
man to be built. ;)

 3) The biggest improvement I'd like to see is in group scheduling.  Suppose I 
 want to schedule multiple instances, each with their own resource 
 requirements, 
 but also with interdependency between them (these ones on the same node, 
 these 
 ones not on the same node, these ones with this provider network, etc.)  The 
 scheduler could then look at the whole request all at once and optimize it 
 rather than looking at each piece separately.  That could also allow 
 relocating 
 multiple instances that want to be co-located on the same compute node.
 

So, if 

Re: [openstack-dev] [nova] Proposal for an Experiment

2015-07-20 Thread Clint Byrum
Excerpts from Chris Friesen's message of 2015-07-20 14:30:53 -0700:
 On 07/20/2015 02:04 PM, Clint Byrum wrote:
  Excerpts from Chris Friesen's message of 2015-07-20 12:17:29 -0700:
 
  Some questions:
 
  1) Could you elaborate a bit on how this would work?  I don't quite 
  understand
  how you would handle a request for booting an instance with a certain set 
  of
  resources--would you queue up a message for each resource?
 
 
  Please be concrete on what you mean by resource.
 
  I'm suggesting if you only have flavors, which have cpu, ram, disk, and 
  rx/tx ratios,
  then each flavor is a queue. Thats the easiest problem to solve. Then if
  you have a single special thing that can only have one VM per host (lets
  say, a PCI pass through thing), then thats another iteration of each
  flavor. So assuming 3
  flavors:
 
  1=tiny cpu=1,ram=1024m,disk=5gb,rxtx=1
  2=medium cpu=2,ram=4096m,disk=100gb,rxtx=2
  3=large cpu=8,ram=16384,disk=200gb,rxtx=2
 
  This means you have these queues:
 
  reserve
  release
  compute,cpu=1,ram=1024m,disk=5gb,rxtx=1,pci=1
  compute,cpu=1,ram=1024m,disk=5gb,rxtx=1
  compute,cpu=2,ram=4096m,disk=100gb,rxtx=2,pci=1
  compute,cpu=2,ram=4096m,disk=100gb,rxtx=2
  compute,cpu=8,ram=16384,disk=200gb,rxtx=2pci=1
  compute,cpu=8,ram=16384,disk=200gb,rxtx=2
 
 snip
 
  Now, I've made this argument in the past, and people have pointed out
  that the permutations can get into the tens of thousands very easily
  if you start adding lots of dimensions and/or flavors. I suggest that
  is no big deal, but maybe I'm biased because I have done something like
  that in Gearman and it was, in fact, no big deal.
 
 Yeah, that's what I was worried about.  We have things that can be specified 
 per 
 flavor, and things that can be specified per image, and things that can be 
 specified per instance, and they all multiply together.
 

So all that matters is the size of the set of permutations that people
are using _now_ to request nodes.  It's relatively low-cost to create
the queues in a distributed manner and just have compute nodes listen to
a broadcast for new ones that they should try to subscribe to. Even if
there are 1 million queues possible, it's unlikely there will be 1 million
legitimate unique boot arguments. This does complicate things quite a
bit though, so part of me just wants to suggest don't do that.  ;)

  2) How would it handle stuff like weight functions where you could have 
  multiple
  compute nodes that *could* satisfy the requirement but some of them would 
  be
  better than others by some arbitrary criteria.
 
 
  Can you provide a concrete example? Feels like I'm asking for a straw
  man to be built. ;)
 
 Well, as an example we have a cluster that is aimed at high-performance 
 network 
 processing and so all else being equal they will choose the compute node with 
 the least network traffic.  You might also try to pack instances together for 
 power efficiency (allowing you to turn off unused compute nodes), or choose 
 the 
 compute node that results in the tightest packing (to minimize unused 
 resources).
 

Least-utilized is hard since it requires knowledge of all of the nodes'
state. It also breaks down and gives 0 benefit when all the nodes are
fully bandwidth-utilized. However, Below 20% utilized is extremely
easy and achieves the actual goal that the user stated, since each node
can self-assess whether it is or is not in that group. In this way a user
gets given an error I don't have any fully available networking for you
instead of getting a node which is oversubscribed unknowingly.

Packing is kind of interesting. One can achieve it on an empty cluster
simply by only turning on one node at a time, and whenever the queue
has less than safety_margin workers, turn on more nodes. However,
once nodes are full and workloads are being deleted, you want to assess
which ones would be the least cost to migrate off of and turn off. I'm
inclined to say I would do this from something outside the scheduler,
as part of a power-reclaimer, but perhaps a centralized scheduler that
always knows would do a better job here. It would need to do that in
such a manner that is so efficient it would outweigh the benefit of not
needing global state awareness. An external reclaimer can work in an
eventually consistent manner and thus I would still lean toward that over
the realtime scheduler, but this needs some experimentation to confirm.

  3) The biggest improvement I'd like to see is in group scheduling.  
  Suppose I
  want to schedule multiple instances, each with their own resource 
  requirements,
  but also with interdependency between them (these ones on the same node, 
  these
  ones not on the same node, these ones with this provider network, etc.)  
  The
  scheduler could then look at the whole request all at once and optimize it
  rather than looking at each piece separately.  That could also allow 
  relocating
  multiple instances that want to be co-located on the 

Re: [openstack-dev] [nova] Proposal for an Experiment

2015-07-20 Thread Clint Byrum
Excerpts from Joshua Harlow's message of 2015-07-20 14:57:48 -0700:
 I have a feeling that we really need to make whatever this selection 
 process has clearly defined API boundaries, so that various 
 'implementation experiments' can be used (and researched on).
 
 Those API boundaries will be what scheduling entities must provide but 
 the implementations could be many things. I have feeling that this is 
 really an on-going area of research and no solution will likely be 
 optimal 'yet' (maybe someday...).
 
 Without even defined API boundaries I start to wonder if this whole 
 exploring will end up just burning out people (when said people find a 
 possible solution but the code won't be accepted due to lack of API 
 boundaries in the first place); I believe gantt was trying to fix this 
 (but I'm not sure of the status of that)?
 

Yes, right now it's just too tightly wound into Nova to experiment
without doing major surgery. If one can simply make the scheduler go
faster, without having to change everything else around it, we get
something that is easier to test, and easier for deployers to migrate
to.

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] Proposal for an Experiment

2015-07-20 Thread Joshua Harlow

Clint Byrum wrote:

Excerpts from Chris Friesen's message of 2015-07-20 14:30:53 -0700:

On 07/20/2015 02:04 PM, Clint Byrum wrote:

Excerpts from Chris Friesen's message of 2015-07-20 12:17:29 -0700:

Some questions:

1) Could you elaborate a bit on how this would work?  I don't quite understand
how you would handle a request for booting an instance with a certain set of
resources--would you queue up a message for each resource?


Please be concrete on what you mean by resource.

I'm suggesting if you only have flavors, which have cpu, ram, disk, and rx/tx 
ratios,
then each flavor is a queue. Thats the easiest problem to solve. Then if
you have a single special thing that can only have one VM per host (lets
say, a PCI pass through thing), then thats another iteration of each
flavor. So assuming 3
flavors:

1=tiny cpu=1,ram=1024m,disk=5gb,rxtx=1
2=medium cpu=2,ram=4096m,disk=100gb,rxtx=2
3=large cpu=8,ram=16384,disk=200gb,rxtx=2

This means you have these queues:

reserve
release
compute,cpu=1,ram=1024m,disk=5gb,rxtx=1,pci=1
compute,cpu=1,ram=1024m,disk=5gb,rxtx=1
compute,cpu=2,ram=4096m,disk=100gb,rxtx=2,pci=1
compute,cpu=2,ram=4096m,disk=100gb,rxtx=2
compute,cpu=8,ram=16384,disk=200gb,rxtx=2pci=1
compute,cpu=8,ram=16384,disk=200gb,rxtx=2

snip


Now, I've made this argument in the past, and people have pointed out
that the permutations can get into the tens of thousands very easily
if you start adding lots of dimensions and/or flavors. I suggest that
is no big deal, but maybe I'm biased because I have done something like
that in Gearman and it was, in fact, no big deal.

Yeah, that's what I was worried about.  We have things that can be specified per
flavor, and things that can be specified per image, and things that can be
specified per instance, and they all multiply together.



So all that matters is the size of the set of permutations that people
are using _now_ to request nodes.  It's relatively low-cost to create
the queues in a distributed manner and just have compute nodes listen to
a broadcast for new ones that they should try to subscribe to. Even if
there are 1 million queues possible, it's unlikely there will be 1 million
legitimate unique boot arguments. This does complicate things quite a
bit though, so part of me just wants to suggest don't do that.  ;)


2) How would it handle stuff like weight functions where you could have multiple
compute nodes that *could* satisfy the requirement but some of them would be
better than others by some arbitrary criteria.


Can you provide a concrete example? Feels like I'm asking for a straw
man to be built. ;)

Well, as an example we have a cluster that is aimed at high-performance network
processing and so all else being equal they will choose the compute node with
the least network traffic.  You might also try to pack instances together for
power efficiency (allowing you to turn off unused compute nodes), or choose the
compute node that results in the tightest packing (to minimize unused 
resources).



Least-utilized is hard since it requires knowledge of all of the nodes'
state. It also breaks down and gives 0 benefit when all the nodes are
fully bandwidth-utilized. However, Below 20% utilized is extremely
easy and achieves the actual goal that the user stated, since each node
can self-assess whether it is or is not in that group. In this way a user
gets given an error I don't have any fully available networking for you
instead of getting a node which is oversubscribed unknowingly.

Packing is kind of interesting. One can achieve it on an empty cluster
simply by only turning on one node at a time, and whenever the queue
has less than safety_margin workers, turn on more nodes. However,
once nodes are full and workloads are being deleted, you want to assess
which ones would be the least cost to migrate off of and turn off. I'm
inclined to say I would do this from something outside the scheduler,
as part of a power-reclaimer, but perhaps a centralized scheduler that
always knows would do a better job here. It would need to do that in
such a manner that is so efficient it would outweigh the benefit of not
needing global state awareness. An external reclaimer can work in an
eventually consistent manner and thus I would still lean toward that over
the realtime scheduler, but this needs some experimentation to confirm.


From what I've heard (idk how widely this is done in the industry); but 
actually turning off nodes I've heard causes more problems than it 
solves in terms of power-costs, cooling, hardware [disk, cpu, other] 
failures and so-on, so maybe turning nodes off may not be the best idea. 
This is all things I've heard second-hand though so may not be what 
others do.





3) The biggest improvement I'd like to see is in group scheduling.  Suppose I
want to schedule multiple instances, each with their own resource requirements,
but also with interdependency between them (these ones on the same node, these
ones not on the same node, these 

Re: [openstack-dev] [nova] Proposal for an Experiment

2015-07-16 Thread John Garbutt
On 15 July 2015 at 19:25, Robert Collins robe...@robertcollins.net wrote:
 On 16 July 2015 at 02:18, Ed Leafe e...@leafe.com wrote:
 -BEGIN PGP SIGNED MESSAGE-
 Hash: SHA512
 ...
 What I'd like to investigate is replacing the current design of having
 the compute nodes communicating with the scheduler via message queues.
 This design is overly complex and has several known scalability
 issues. My thought is to replace this with a Cassandra [1] backend.
 Compute nodes would update their state to Cassandra whenever they
 change, and that data would be read by the scheduler to make its host
 selection. When the scheduler chooses a host, it would post the claim
 to Cassandra wrapped in a lightweight transaction, which would ensure
 that no other scheduler has tried to claim those resources. When the
 host has built the requested VM, it will delete the claim and update
 Cassandra with its current state.

 +1 on doing an experiment.

 Some semi-random thoughts here. Well, not random at all, I've been
 mulling on this for a while.

 I think Kafka may fit our model significantly vis-a-vis updating state
 more closely than Cassandra does. It would be neat if we could do a
 few different sketchy implementations and head-to-head test them. I
 love Cassandra in a lot of ways, but lightweight-transaction are two
 words that I'd really not expect to see in Cassandra (Yes, I know it
 has them in the official docs and design :)) - its a full paxos
 interaction to do SERIAL consistency, which is more work than ether
 QUORUM or LOCAL_QUORUM. A sharded approach - there is only one compute
 node in question for the update needed - can be less work than either
 and still race free.

 I too also very much want to see us move to brokerless RPC,
 systematically, for all the reasons :). You might need a little of
 that mixed in to the experiments, depending on the scale reached.

 In terms of quantification; are you looking to test scalability (e.g.
 scheduling some N events per second without races), [there are huge
 improvements possible by rewriting the current schedulers innards to
 be less wasteful, but that doesn't address active-active setups],
 latency (e.g. 99th percentile time-to-schedule) or ... ?

+1 for trying Kafka

I have tried to write up my thoughts on the Kafka approach (and a few
related things) in here:
https://review.openstack.org/#/c/191914/5/specs/backlog/approved/parallel-scheduler.rst,cm

Its trying to describe what I want to prototype for the next
scheduler, its also possibly one of the worse specs I have ever seen.
There may be some ideas worth nicking in there (there may not be!)

John

PS
I also cover my want for multiple schedulers living in Nova, long term
(We already have 2.5 schedulers, depending on how you count them)
I can see some of these schedulers being the best for a sub set of
deployments.

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] Proposal for an Experiment

2015-07-15 Thread Joshua Harlow

Chris Friesen wrote:

On 07/15/2015 09:31 AM, Joshua Harlow wrote:

I do like experiments!

What about going even farther and trying to integrate somehow into mesos?

https://mesos.apache.org/documentation/latest/mesos-architecture/

Replace the hadooop executor, MPI executor with a 'VM executor' and
perhaps we
could eliminate a large part of the scheduler code (just a thought)...


Is the mesos scheduler sufficiently generic as to encompass all the
filters we currently have in nova?


IMHO some of these should probably have never existed in the first 
place: ie 
https://github.com/openstack/nova/blob/master/nova/scheduler/filters/json_filter.py 
since they are near impossible to ever migrate away from once created (a 
JSON-based grammar for selecting hosts, like woah). So if someone is 
going to do a comparison/experiment I'd hope that they can overlook some 
of the filters that should likely never have been created in the first 
place ;)




Chris


__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] Proposal for an Experiment

2015-07-15 Thread Robert Collins
On 16 July 2015 at 02:18, Ed Leafe e...@leafe.com wrote:
 -BEGIN PGP SIGNED MESSAGE-
 Hash: SHA512
...
 What I'd like to investigate is replacing the current design of having
 the compute nodes communicating with the scheduler via message queues.
 This design is overly complex and has several known scalability
 issues. My thought is to replace this with a Cassandra [1] backend.
 Compute nodes would update their state to Cassandra whenever they
 change, and that data would be read by the scheduler to make its host
 selection. When the scheduler chooses a host, it would post the claim
 to Cassandra wrapped in a lightweight transaction, which would ensure
 that no other scheduler has tried to claim those resources. When the
 host has built the requested VM, it will delete the claim and update
 Cassandra with its current state.

+1 on doing an experiment.

Some semi-random thoughts here. Well, not random at all, I've been
mulling on this for a while.

I think Kafka may fit our model significantly vis-a-vis updating state
more closely than Cassandra does. It would be neat if we could do a
few different sketchy implementations and head-to-head test them. I
love Cassandra in a lot of ways, but lightweight-transaction are two
words that I'd really not expect to see in Cassandra (Yes, I know it
has them in the official docs and design :)) - its a full paxos
interaction to do SERIAL consistency, which is more work than ether
QUORUM or LOCAL_QUORUM. A sharded approach - there is only one compute
node in question for the update needed - can be less work than either
and still race free.

I too also very much want to see us move to brokerless RPC,
systematically, for all the reasons :). You might need a little of
that mixed in to the experiments, depending on the scale reached.

In terms of quantification; are you looking to test scalability (e.g.
scheduling some N events per second without races), [there are huge
improvements possible by rewriting the current schedulers innards to
be less wasteful, but that doesn't address active-active setups],
latency (e.g. 99th percentile time-to-schedule) or ... ?

-Rob

-- 
Robert Collins rbtcoll...@hp.com
Distinguished Technologist
HP Converged Cloud

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] Proposal for an Experiment

2015-07-15 Thread Clint Byrum
What you describe is a spike. It's a grand plan, and you don't need
anyone's permission, so huzzah for the spike!

As far as what should be improved, I hear a lot that having multiple
schedulers does not scale well, so I'd suggest that as a primary target
(maybe measure the _current_ problem, and then set the target as a 10x
improvement over what we have now).

Things to consider while pushing on that goal:

* Do not backslide the resilience in the system. The code is just now
starting to be fault tolerant when talking to RabbitMQ, so make sure
to also consider how tolerant of failures this will be. Cassandra is
typically chosen for its resilience and performance, but Cassandra does
a neat trick in that clients can switch its CAP theorem profile from
Consistent and Available (but slow) to Available and Performant when
reading things. That might be useful in the context of trying to push
the performance _UP_ for schedulers, while not breaking anything else.

* Consider the cost of introducing a brand new technology into the
deployer space. If there _is_ a way to get the desired improvement with,
say, just MySQL and some clever sharding, then that might be a smaller
pill to swallow for deployers.

Anyway, I wish you well on this endeavor and hope to see your results
soon!

Excerpts from Ed Leafe's message of 2015-07-15 07:18:42 -0700:
 Hash: SHA512
 
 Changing the architecture of a complex system such as Nova is never
 easy, even when we know that the design isn't working as well as we
 need it to. And it's even more frustrating because when the change is
 complete, it's hard to know if the improvement, if any, was worth it.
 
 So I had an idea: what if we ran a test of that architecture change
 out-of-tree? In other words, create a separate deployment, and rip out
 the parts that don't work well, replacing them with an alternative
 design. There would be no Gerrit reviews or anything that would slow
 down the work or add load to the already overloaded reviewers. Then we
 could see if this modified system is a significant-enough improvement
 to justify investing the time in implementing it in-tree. And, of
 course, if the test doesn't show what was hoped for, it is scrapped
 and we start thinking anew.
 
 The important part in this process is defining up front what level of
 improvement would be needed to make considering actually making such a
 change worthwhile, and what sort of tests would demonstrate whether or
 not whether this level was met. I'd like to discuss such an experiment
 next week at the Nova mid-cycle.
 
 What I'd like to investigate is replacing the current design of having
 the compute nodes communicating with the scheduler via message queues.
 This design is overly complex and has several known scalability
 issues. My thought is to replace this with a Cassandra [1] backend.
 Compute nodes would update their state to Cassandra whenever they
 change, and that data would be read by the scheduler to make its host
 selection. When the scheduler chooses a host, it would post the claim
 to Cassandra wrapped in a lightweight transaction, which would ensure
 that no other scheduler has tried to claim those resources. When the
 host has built the requested VM, it will delete the claim and update
 Cassandra with its current state.
 
 One main motivation for using Cassandra over the current design is
 that it will enable us to run multiple schedulers without increasing
 the raciness of the system. Another is that it will greatly simplify a
 lot of the internal plumbing we've set up to implement in Nova what we
 would get out of the box with Cassandra. A third is that if this
 proves to be a success, it would also be able to be used further down
 the road to simplify inter-cell communication (but this is getting
 ahead of ourselves...). I've worked with Cassandra before and it has
 been rock-solid to run and simple to set up. I've also had preliminary
 technical reviews with the engineers at DataStax [2], the company
 behind Cassandra, and they agreed that this was a good fit.
 
 At this point I'm sure that most of you are filled with thoughts on
 how this won't work, or how much trouble it will be to switch, or how
 much more of a pain it will be, or how you hate non-relational DBs, or
 any of a zillion other negative thoughts. FWIW, I have them too. But
 instead of ranting, I would ask that we acknowledge for now that:
 
 a) it will be disruptive and painful to switch something like this at
 this point in Nova's development
 b) it would have to provide *significant* improvement to make such a
 change worthwhile
 
 So what I'm asking from all of you is to help define the second part:
 what we would want improved, and how to measure those benefits. In
 other words, what results would you have to see in order to make you
 reconsider your initial nah, this'll never work reaction, and start
 to think that this is will be a worthwhile change to make to Nova.
 
 I'm also asking that you refrain from 

Re: [openstack-dev] [nova] Proposal for an Experiment

2015-07-15 Thread Ed Leafe
On Jul 15, 2015, at 1:08 PM, Maish Saidel-Keesing mais...@maishsk.com wrote:

 * Consider the cost of introducing a brand new technology into the
 deployer space. If there _is_ a way to get the desired improvement with,
 say, just MySQL and some clever sharding, then that might be a smaller
 pill to swallow for deployers.
 +1000 to this part regarding introducing a new technology

Yes, of course it has been considered. If it were trivial, I would just propose 
a blueprint.

Again, I'd really like to hear ideas on what kind of results would be 
convincing enough to make it worthwhile to introduce a new technology.


-- Ed Leafe







signature.asc
Description: Message signed with OpenPGP using GPGMail
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] Proposal for an Experiment

2015-07-15 Thread Robert Collins
On 16 July 2015 at 07:27, Ed Leafe e...@leafe.com wrote:
 On Jul 15, 2015, at 1:08 PM, Maish Saidel-Keesing mais...@maishsk.com wrote:

 * Consider the cost of introducing a brand new technology into the
 deployer space. If there _is_ a way to get the desired improvement with,
 say, just MySQL and some clever sharding, then that might be a smaller
 pill to swallow for deployers.
 +1000 to this part regarding introducing a new technology

 Yes, of course it has been considered. If it were trivial, I would just 
 propose a blueprint.

 Again, I'd really like to hear ideas on what kind of results would be 
 convincing enough to make it worthwhile to introduce a new technology.

We spent some summit time discussing just this:
https://wiki.openstack.org/wiki/TechnologyChoices

The summary here is IMO:
 - ops will follow where we lead BUT
 - we need to take their needs into account
 - which includes robustness, operability, and so on
 - things where an alternative implementation exists can be
uptake-driven : e.g. we expand the choices, and observe what folk move
onto.

That said, I think the fundamental thing today is that we have a bug
and its not fixed. LOTS of them. Where fixing them needs better
plumbing, lets be bold - but not hasty.

-Rob

-- 
Robert Collins rbtcoll...@hp.com
Distinguished Technologist
HP Converged Cloud

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] Proposal for an Experiment

2015-07-15 Thread Maish Saidel-Keesing

On 07/15/15 20:40, Clint Byrum wrote:

What you describe is a spike. It's a grand plan, and you don't need
anyone's permission, so huzzah for the spike!

As far as what should be improved, I hear a lot that having multiple
schedulers does not scale well, so I'd suggest that as a primary target
(maybe measure the _current_ problem, and then set the target as a 10x
improvement over what we have now).

Things to consider while pushing on that goal:

* Do not backslide the resilience in the system. The code is just now
starting to be fault tolerant when talking to RabbitMQ, so make sure
to also consider how tolerant of failures this will be. Cassandra is
typically chosen for its resilience and performance, but Cassandra does
a neat trick in that clients can switch its CAP theorem profile from
Consistent and Available (but slow) to Available and Performant when
reading things. That might be useful in the context of trying to push
the performance _UP_ for schedulers, while not breaking anything else.

* Consider the cost of introducing a brand new technology into the
deployer space. If there _is_ a way to get the desired improvement with,
say, just MySQL and some clever sharding, then that might be a smaller
pill to swallow for deployers.

+1000 to this part regarding introducing a new technology


Anyway, I wish you well on this endeavor and hope to see your results
soon!

Excerpts from Ed Leafe's message of 2015-07-15 07:18:42 -0700:

Hash: SHA512

Changing the architecture of a complex system such as Nova is never
easy, even when we know that the design isn't working as well as we
need it to. And it's even more frustrating because when the change is
complete, it's hard to know if the improvement, if any, was worth it.

So I had an idea: what if we ran a test of that architecture change
out-of-tree? In other words, create a separate deployment, and rip out
the parts that don't work well, replacing them with an alternative
design. There would be no Gerrit reviews or anything that would slow
down the work or add load to the already overloaded reviewers. Then we
could see if this modified system is a significant-enough improvement
to justify investing the time in implementing it in-tree. And, of
course, if the test doesn't show what was hoped for, it is scrapped
and we start thinking anew.

The important part in this process is defining up front what level of
improvement would be needed to make considering actually making such a
change worthwhile, and what sort of tests would demonstrate whether or
not whether this level was met. I'd like to discuss such an experiment
next week at the Nova mid-cycle.

What I'd like to investigate is replacing the current design of having
the compute nodes communicating with the scheduler via message queues.
This design is overly complex and has several known scalability
issues. My thought is to replace this with a Cassandra [1] backend.
Compute nodes would update their state to Cassandra whenever they
change, and that data would be read by the scheduler to make its host
selection. When the scheduler chooses a host, it would post the claim
to Cassandra wrapped in a lightweight transaction, which would ensure
that no other scheduler has tried to claim those resources. When the
host has built the requested VM, it will delete the claim and update
Cassandra with its current state.

One main motivation for using Cassandra over the current design is
that it will enable us to run multiple schedulers without increasing
the raciness of the system. Another is that it will greatly simplify a
lot of the internal plumbing we've set up to implement in Nova what we
would get out of the box with Cassandra. A third is that if this
proves to be a success, it would also be able to be used further down
the road to simplify inter-cell communication (but this is getting
ahead of ourselves...). I've worked with Cassandra before and it has
been rock-solid to run and simple to set up. I've also had preliminary
technical reviews with the engineers at DataStax [2], the company
behind Cassandra, and they agreed that this was a good fit.

At this point I'm sure that most of you are filled with thoughts on
how this won't work, or how much trouble it will be to switch, or how
much more of a pain it will be, or how you hate non-relational DBs, or
any of a zillion other negative thoughts. FWIW, I have them too. But
instead of ranting, I would ask that we acknowledge for now that:

a) it will be disruptive and painful to switch something like this at
this point in Nova's development
b) it would have to provide *significant* improvement to make such a
change worthwhile

So what I'm asking from all of you is to help define the second part:
what we would want improved, and how to measure those benefits. In
other words, what results would you have to see in order to make you
reconsider your initial nah, this'll never work reaction, and start
to think that this is will be a worthwhile change to make to Nova.

I'm 

Re: [openstack-dev] [nova] Proposal for an Experiment

2015-07-15 Thread Chris Friesen

On 07/15/2015 09:31 AM, Joshua Harlow wrote:

I do like experiments!

What about going even farther and trying to integrate somehow into mesos?

https://mesos.apache.org/documentation/latest/mesos-architecture/

Replace the hadooop executor, MPI executor with a 'VM executor' and perhaps we
could eliminate a large part of the scheduler code (just a thought)...


Is the mesos scheduler sufficiently generic as to encompass all the filters we 
currently have in nova?


Chris


__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] Proposal for an Experiment

2015-07-15 Thread Joshua Harlow

Chris Friesen wrote:

On 07/15/2015 09:31 AM, Joshua Harlow wrote:

I do like experiments!

What about going even farther and trying to integrate somehow into mesos?

https://mesos.apache.org/documentation/latest/mesos-architecture/

Replace the hadooop executor, MPI executor with a 'VM executor' and
perhaps we
could eliminate a large part of the scheduler code (just a thought)...


Is the mesos scheduler sufficiently generic as to encompass all the
filters we currently have in nova?


Unsure, if not it's just another open-source project right? I'm sure 
they'd love to collaborate, and maybe they will even do most of the 
work? Who knows...




Chris


__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] Proposal for an Experiment

2015-07-15 Thread Chris Friesen

On 07/15/2015 08:18 AM, Ed Leafe wrote:


What I'd like to investigate is replacing the current design of having
the compute nodes communicating with the scheduler via message queues.
This design is overly complex and has several known scalability
issues. My thought is to replace this with a Cassandra [1] backend.
Compute nodes would update their state to Cassandra whenever they
change, and that data would be read by the scheduler to make its host
selection. When the scheduler chooses a host, it would post the claim
to Cassandra wrapped in a lightweight transaction, which would ensure
that no other scheduler has tried to claim those resources. When the
host has built the requested VM, it will delete the claim and update
Cassandra with its current state.

One main motivation for using Cassandra over the current design is
that it will enable us to run multiple schedulers without increasing
the raciness of the system.


It seems to me that the ability to run multiple schedulers comes from the fact 
that you're talking about claiming resources in the data store, and not from 
anything inherent in Cassandra itself.


Why couldn't we just update the existing nova scheduler to claim resources in 
the existing database in order to get the same reduction of raciness? (Thus 
allowing multiple schedulers running in parallel.)


Chris

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


[openstack-dev] [nova] Proposal for an Experiment

2015-07-15 Thread Ed Leafe
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA512

Changing the architecture of a complex system such as Nova is never
easy, even when we know that the design isn't working as well as we
need it to. And it's even more frustrating because when the change is
complete, it's hard to know if the improvement, if any, was worth it.

So I had an idea: what if we ran a test of that architecture change
out-of-tree? In other words, create a separate deployment, and rip out
the parts that don't work well, replacing them with an alternative
design. There would be no Gerrit reviews or anything that would slow
down the work or add load to the already overloaded reviewers. Then we
could see if this modified system is a significant-enough improvement
to justify investing the time in implementing it in-tree. And, of
course, if the test doesn't show what was hoped for, it is scrapped
and we start thinking anew.

The important part in this process is defining up front what level of
improvement would be needed to make considering actually making such a
change worthwhile, and what sort of tests would demonstrate whether or
not whether this level was met. I'd like to discuss such an experiment
next week at the Nova mid-cycle.

What I'd like to investigate is replacing the current design of having
the compute nodes communicating with the scheduler via message queues.
This design is overly complex and has several known scalability
issues. My thought is to replace this with a Cassandra [1] backend.
Compute nodes would update their state to Cassandra whenever they
change, and that data would be read by the scheduler to make its host
selection. When the scheduler chooses a host, it would post the claim
to Cassandra wrapped in a lightweight transaction, which would ensure
that no other scheduler has tried to claim those resources. When the
host has built the requested VM, it will delete the claim and update
Cassandra with its current state.

One main motivation for using Cassandra over the current design is
that it will enable us to run multiple schedulers without increasing
the raciness of the system. Another is that it will greatly simplify a
lot of the internal plumbing we've set up to implement in Nova what we
would get out of the box with Cassandra. A third is that if this
proves to be a success, it would also be able to be used further down
the road to simplify inter-cell communication (but this is getting
ahead of ourselves...). I've worked with Cassandra before and it has
been rock-solid to run and simple to set up. I've also had preliminary
technical reviews with the engineers at DataStax [2], the company
behind Cassandra, and they agreed that this was a good fit.

At this point I'm sure that most of you are filled with thoughts on
how this won't work, or how much trouble it will be to switch, or how
much more of a pain it will be, or how you hate non-relational DBs, or
any of a zillion other negative thoughts. FWIW, I have them too. But
instead of ranting, I would ask that we acknowledge for now that:

a) it will be disruptive and painful to switch something like this at
this point in Nova's development
b) it would have to provide *significant* improvement to make such a
change worthwhile

So what I'm asking from all of you is to help define the second part:
what we would want improved, and how to measure those benefits. In
other words, what results would you have to see in order to make you
reconsider your initial nah, this'll never work reaction, and start
to think that this is will be a worthwhile change to make to Nova.

I'm also asking that you refrain from talking about why this can't
work for now. I know it'll be difficult to do that, since nobody likes
ranting about stuff more than I do, but right now it won't be helpful.
There will be plenty of time for that later, assuming that this
experiment yields anything worthwhile. Instead, think of the current
pain points in the scheduler design, and what sort of improvement you
would have to see in order to seriously consider undertaking this
change to Nova.

I've gotten the OK from my management to pursue this, and several
people in the community have expressed support for both the approach
and the experiment, even though most don't have spare cycles to
contribute. I'd love to have anyone who is interested become involved.

I hope that this will be a positive discussion at the Nova mid-cycle
next week. I know it will be a lively one. :)

[1] http://cassandra.apache.org/
[2] http://www.datastax.com/
- -- 

- -- Ed Leafe
-BEGIN PGP SIGNATURE-
Version: GnuPG v2
Comment: GPGTools - https://gpgtools.org
Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/

iQIcBAEBCgAGBQJVpmvCAAoJEKMgtcocwZqLSNYP/0b8s7pZnXaF3tTYF+WtNppr
lHyQMHSXLQ3CESoS4961ZWOCMtV2hCxvcioXem+PJzOdZED143XMJ3LR6+dZ012q
RGSp43Co+vUfuTtaTg030sLyDlXZKEenkPXy0202WpPaK4RYSonrnrxs0kmv+ZpH
yamsZP2/gReZseBsKiww0FkqWGkIJxD7bi1r8DdLa/HLvwYUD+U2zrcUvT4cMXMR

Re: [openstack-dev] [nova] Proposal for an Experiment

2015-07-15 Thread Joshua Harlow

I do like experiments!

What about going even farther and trying to integrate somehow into mesos?

https://mesos.apache.org/documentation/latest/mesos-architecture/

Replace the hadooop executor, MPI executor with a 'VM executor' and 
perhaps we could eliminate a large part of the scheduler code (just a 
thought)...


I think a bunch of other ideas were also written down @ 
https://review.openstack.org/#/c/191914/ maybe u can try some of those to :)


Ed Leafe wrote:

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA512

Changing the architecture of a complex system such as Nova is never
easy, even when we know that the design isn't working as well as we
need it to. And it's even more frustrating because when the change is
complete, it's hard to know if the improvement, if any, was worth it.

So I had an idea: what if we ran a test of that architecture change
out-of-tree? In other words, create a separate deployment, and rip out
the parts that don't work well, replacing them with an alternative
design. There would be no Gerrit reviews or anything that would slow
down the work or add load to the already overloaded reviewers. Then we
could see if this modified system is a significant-enough improvement
to justify investing the time in implementing it in-tree. And, of
course, if the test doesn't show what was hoped for, it is scrapped
and we start thinking anew.

The important part in this process is defining up front what level of
improvement would be needed to make considering actually making such a
change worthwhile, and what sort of tests would demonstrate whether or
not whether this level was met. I'd like to discuss such an experiment
next week at the Nova mid-cycle.

What I'd like to investigate is replacing the current design of having
the compute nodes communicating with the scheduler via message queues.
This design is overly complex and has several known scalability
issues. My thought is to replace this with a Cassandra [1] backend.
Compute nodes would update their state to Cassandra whenever they
change, and that data would be read by the scheduler to make its host
selection. When the scheduler chooses a host, it would post the claim
to Cassandra wrapped in a lightweight transaction, which would ensure
that no other scheduler has tried to claim those resources. When the
host has built the requested VM, it will delete the claim and update
Cassandra with its current state.

One main motivation for using Cassandra over the current design is
that it will enable us to run multiple schedulers without increasing
the raciness of the system. Another is that it will greatly simplify a
lot of the internal plumbing we've set up to implement in Nova what we
would get out of the box with Cassandra. A third is that if this
proves to be a success, it would also be able to be used further down
the road to simplify inter-cell communication (but this is getting
ahead of ourselves...). I've worked with Cassandra before and it has
been rock-solid to run and simple to set up. I've also had preliminary
technical reviews with the engineers at DataStax [2], the company
behind Cassandra, and they agreed that this was a good fit.

At this point I'm sure that most of you are filled with thoughts on
how this won't work, or how much trouble it will be to switch, or how
much more of a pain it will be, or how you hate non-relational DBs, or
any of a zillion other negative thoughts. FWIW, I have them too. But
instead of ranting, I would ask that we acknowledge for now that:

a) it will be disruptive and painful to switch something like this at
this point in Nova's development
b) it would have to provide *significant* improvement to make such a
change worthwhile

So what I'm asking from all of you is to help define the second part:
what we would want improved, and how to measure those benefits. In
other words, what results would you have to see in order to make you
reconsider your initial nah, this'll never work reaction, and start
to think that this is will be a worthwhile change to make to Nova.

I'm also asking that you refrain from talking about why this can't
work for now. I know it'll be difficult to do that, since nobody likes
ranting about stuff more than I do, but right now it won't be helpful.
There will be plenty of time for that later, assuming that this
experiment yields anything worthwhile. Instead, think of the current
pain points in the scheduler design, and what sort of improvement you
would have to see in order to seriously consider undertaking this
change to Nova.

I've gotten the OK from my management to pursue this, and several
people in the community have expressed support for both the approach
and the experiment, even though most don't have spare cycles to
contribute. I'd love to have anyone who is interested become involved.

I hope that this will be a positive discussion at the Nova mid-cycle
next week. I know it will be a lively one. :)

[1] http://cassandra.apache.org/
[2] http://www.datastax.com/
- --

- -- 

Re: [openstack-dev] [nova] Proposal for an Experiment

2015-07-15 Thread Matt Riedemann



On 7/15/2015 9:18 AM, Ed Leafe wrote:

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA512

Changing the architecture of a complex system such as Nova is never
easy, even when we know that the design isn't working as well as we
need it to. And it's even more frustrating because when the change is
complete, it's hard to know if the improvement, if any, was worth it.

So I had an idea: what if we ran a test of that architecture change
out-of-tree? In other words, create a separate deployment, and rip out
the parts that don't work well, replacing them with an alternative
design. There would be no Gerrit reviews or anything that would slow
down the work or add load to the already overloaded reviewers. Then we
could see if this modified system is a significant-enough improvement
to justify investing the time in implementing it in-tree. And, of
course, if the test doesn't show what was hoped for, it is scrapped
and we start thinking anew.

The important part in this process is defining up front what level of
improvement would be needed to make considering actually making such a
change worthwhile, and what sort of tests would demonstrate whether or
not whether this level was met. I'd like to discuss such an experiment
next week at the Nova mid-cycle.

What I'd like to investigate is replacing the current design of having
the compute nodes communicating with the scheduler via message queues.
This design is overly complex and has several known scalability
issues. My thought is to replace this with a Cassandra [1] backend.
Compute nodes would update their state to Cassandra whenever they
change, and that data would be read by the scheduler to make its host
selection. When the scheduler chooses a host, it would post the claim
to Cassandra wrapped in a lightweight transaction, which would ensure
that no other scheduler has tried to claim those resources. When the
host has built the requested VM, it will delete the claim and update
Cassandra with its current state.

One main motivation for using Cassandra over the current design is
that it will enable us to run multiple schedulers without increasing
the raciness of the system. Another is that it will greatly simplify a
lot of the internal plumbing we've set up to implement in Nova what we
would get out of the box with Cassandra. A third is that if this
proves to be a success, it would also be able to be used further down
the road to simplify inter-cell communication (but this is getting
ahead of ourselves...). I've worked with Cassandra before and it has
been rock-solid to run and simple to set up. I've also had preliminary
technical reviews with the engineers at DataStax [2], the company
behind Cassandra, and they agreed that this was a good fit.

At this point I'm sure that most of you are filled with thoughts on
how this won't work, or how much trouble it will be to switch, or how
much more of a pain it will be, or how you hate non-relational DBs, or
any of a zillion other negative thoughts. FWIW, I have them too. But
instead of ranting, I would ask that we acknowledge for now that:

a) it will be disruptive and painful to switch something like this at
this point in Nova's development
b) it would have to provide *significant* improvement to make such a
change worthwhile

So what I'm asking from all of you is to help define the second part:
what we would want improved, and how to measure those benefits. In
other words, what results would you have to see in order to make you
reconsider your initial nah, this'll never work reaction, and start
to think that this is will be a worthwhile change to make to Nova.

I'm also asking that you refrain from talking about why this can't
work for now. I know it'll be difficult to do that, since nobody likes
ranting about stuff more than I do, but right now it won't be helpful.
There will be plenty of time for that later, assuming that this
experiment yields anything worthwhile. Instead, think of the current
pain points in the scheduler design, and what sort of improvement you
would have to see in order to seriously consider undertaking this
change to Nova.

I've gotten the OK from my management to pursue this, and several
people in the community have expressed support for both the approach
and the experiment, even though most don't have spare cycles to
contribute. I'd love to have anyone who is interested become involved.

I hope that this will be a positive discussion at the Nova mid-cycle
next week. I know it will be a lively one. :)

[1] http://cassandra.apache.org/
[2] http://www.datastax.com/
- --

- -- Ed Leafe
-BEGIN PGP SIGNATURE-
Version: GnuPG v2
Comment: GPGTools - https://gpgtools.org
Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/

iQIcBAEBCgAGBQJVpmvCAAoJEKMgtcocwZqLSNYP/0b8s7pZnXaF3tTYF+WtNppr
lHyQMHSXLQ3CESoS4961ZWOCMtV2hCxvcioXem+PJzOdZED143XMJ3LR6+dZ012q
RGSp43Co+vUfuTtaTg030sLyDlXZKEenkPXy0202WpPaK4RYSonrnrxs0kmv+ZpH
yamsZP2/gReZseBsKiww0FkqWGkIJxD7bi1r8DdLa/HLvwYUD+U2zrcUvT4cMXMR

Re: [openstack-dev] [nova] Proposal for an Experiment

2015-07-15 Thread Ed Leafe
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA512

On 07/15/2015 09:49 AM, Matt Riedemann wrote:

 Without reading the whole thread, couldn't you just do a feature
 branch (but that would require reviews in gerrit which we don't
 want), or fork the repo in github and just hack on it there without
 gerrit?
 
 I'm sure many will say it's not cool to fork the repo, but that's 
 essentially what you'd be doing anyway, so meh.

It will be a temporary fork, not anything designed to live on forever.

 I think you just have to have an understanding that whatever you
 work on in the fork won't necessarily be accepted back in the main
 repo.

Yes, that's sort of the whole point. First prove that it can work;
then and only then do we sit down and discuss the best way to
implement. The odds that any of the changes made would be able to be
pulled directly into master would be slim.

- -- 

- -- Ed Leafe
-BEGIN PGP SIGNATURE-
Version: GnuPG v2
Comment: GPGTools - https://gpgtools.org
Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/

iQIcBAEBCgAGBQJVpnlOAAoJEKMgtcocwZqL1loQAI/xGth0tSbXAB5gm3bjKYMQ
mdsWopf2sAfBUqgSXys5VmYRMuJPGsVXmIQhOYVZtjA8FFAAHcfeHba6c8uIw04n
iZgOv/Da8ABX+Saj7jFnjXrBpujD6v4b7T2WhIWg38RNT15z79wTCG0Olh2WPHP0
UNGu79iTqV2c7jaFQ1P91jswRRfLYoY/MaRnTCEhT0Rl/VYS46IeSK9GY1PXrC+z
ZBNKdqo2RHqNisPPsdvBVvdTsbcTU3Y8T00u+djp/OEHTPQGIP6SIUzFL61iOVye
RXcdSehWmGNG61Tiq1ng6qSzVoisWYaP9kATrXRGTVUhYVJXrhiCgCZPJ8WK3jSI
Du3meEW1mr18NcDClTsMbbPmuMeTlPTwWoVNqqqDBhFYQIHTYhbwk9cI2XwkKy0+
VQdORuO5h9Qt7JNdRGb62kDLrC4tKnXP7TWCmqmGXdj31kiCQc4vno+kozzJb90j
6I/I37acxIDKFBvF6GsdWxYNnJdIz03IfoQtMwfR6Jc3QTwl47h/aUuIrTpVpXPA
+CCgmcrimef5reQB8kaUEbPyPbwjBUOoYxaFJi3mtQ13nWoOsU23km9qt253+9eS
xWVcRL06L6418juvPbMPqDz3giNhUT5ZOL/qC/a9UirQw3p2mASeVwTKmDwfOl+i
zhnQQpeqIPkWR3N7+Mwu
=5/gA
-END PGP SIGNATURE-

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev