Re: [Openstack] Folsom nova-scheduler race condition?

2012-10-10 Thread Day, Phil

 Per my understanding, this shouldn't happen no matter how fast you create
 instances, since the requests are queued and the scheduler updates resource
 information after it processes each request.  The only possibility I can
 think of that may cause the problem you met is that there is more than one
 scheduler doing scheduling.

I think the new retry logic is meant to be safe even if there is more than
one scheduler, as the requests are effectively serialised when they get to
the compute manager, which can then reject any that would break its actual
resource limits?
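
Roughly the idea, as I understand it (an illustrative sketch in Python, not
the actual nova code): the compute node checks and claims resources
atomically against its own real limits, so two racing schedulers cannot both
overcommit it, and the loser gets sent back for rescheduling.

    import threading

    class HostClaims:
        def __init__(self, total_vcpus):
            self.total_vcpus = total_vcpus
            self.used_vcpus = 0
            self._lock = threading.Lock()

        def claim(self, vcpus):
            # Check and claim atomically against the host's real limit.
            with self._lock:
                if self.used_vcpus + vcpus > self.total_vcpus:
                    return False  # over the limit: reject and reschedule
                self.used_vcpus += vcpus
                return True

    host = HostClaims(total_vcpus=8)
    print(host.claim(6))  # True  -- the first request fits
    print(host.claim(6))  # False -- a racing second request is rejected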

-Original Message-
From: openstack-bounces+philip.day=hp@lists.launchpad.net 
[mailto:openstack-bounces+philip.day=hp@lists.launchpad.net] On Behalf Of 
Huang Zhiteng
Sent: 10 October 2012 04:28
To: Jonathan Proulx
Cc: openstack@lists.launchpad.net
Subject: Re: [Openstack] Folsom nova-scheduler race condition?

On Tue, Oct 9, 2012 at 10:52 PM, Jonathan Proulx j...@jonproulx.com wrote:
 Hi All,

 Looking for a sanity test before I file a bug.  I very recently
 upgraded my install to Folsom (on top of Ubuntu 12.04/kvm).  My
 scheduler settings in nova.conf are:

 scheduler_available_filters=nova.scheduler.filters.standard_filters
 scheduler_default_filters=AvailabilityZoneFilter,RamFilter,CoreFilter,ComputeFilter
 least_cost_functions=nova.scheduler.least_cost.compute_fill_first_cost_fn
 compute_fill_first_cost_fn_weight=1.0
 cpu_allocation_ratio=1.0

 This had been working to fill systems based on available RAM and not to
 exceed a 1:1 allocation ratio of CPU resources with Essex.  With
 Folsom, if I specify a moderately large number of instances to boot, or
 spin up single instances in a tight shell loop, they all get
 scheduled on the same compute node, well in excess of the number of
 available vCPUs.  If I start them one at a time (using --poll in a
 shell loop so each instance is started before the next launches) then
 I get the expected allocation behaviour.

Per my understanding, this shouldn't happen no matter how fast you create
instances, since the requests are queued and the scheduler updates resource
information after it processes each request.  The only possibility I can
think of that may cause the problem you met is that there is more than one
scheduler doing scheduling.
 I see https://bugs.launchpad.net/nova/+bug/1011852 which seems to
 attempt to address this issue, but as I read it that fix is based on
 retrying failures.  Since KVM is capable of overcommitting both CPU
 and memory, I don't seem to get a retryable failure, just really bad
 performance.

 Am I missing something with this fix, or perhaps there's a reported bug
 I didn't find in my search, or is this really a bug no one has
 reported?

 Thanks,
 -Jon




--
Regards
Huang Zhiteng




Re: [Openstack] Folsom nova-scheduler race condition?

2012-10-10 Thread Huang Zhiteng
On Wed, Oct 10, 2012 at 3:44 PM, Day, Phil philip@hp.com wrote:

 Per my understanding, this shouldn't happen no matter how fast you create
 instances, since the requests are queued and the scheduler updates resource
 information after it processes each request.  The only possibility I can
 think of that may cause the problem you met is that there is more than one
 scheduler doing scheduling.

 I think the new retry logic is meant to be safe even if there is more than
 one scheduler, as the requests are effectively serialised when they get to
 the compute manager, which can then reject any that would break its actual
 resource limits?

Yes, but it seems Jonathan's filter list doesn't include RetryFilter,
so it's possible that he ran into the race condition that RetryFilter
is meant to solve.
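
If so, the fix should just be a matter of adding it to the filter list in
nova.conf, something like:

scheduler_default_filters=RetryFilter,AvailabilityZoneFilter,RamFilter,CoreFilter,ComputeFilter

(If I remember correctly, the number of reschedule attempts is capped by the
scheduler_max_attempts option, which defaults to 3.)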

 -Original Message-
 From: openstack-bounces+philip.day=hp@lists.launchpad.net 
 [mailto:openstack-bounces+philip.day=hp@lists.launchpad.net] On Behalf Of 
 Huang Zhiteng
 Sent: 10 October 2012 04:28
 To: Jonathan Proulx
 Cc: openstack@lists.launchpad.net
 Subject: Re: [Openstack] Folsom nova-scheduler race condition?

 On Tue, Oct 9, 2012 at 10:52 PM, Jonathan Proulx j...@jonproulx.com wrote:
 Hi All,

 Looking for a sanity test before I file a bug.  I very recently
 upgraded my install to Folsom (on top of Ubuntu 12.04/kvm).  My
 scheduler settings in nova.conf are:

 scheduler_available_filters=nova.scheduler.filters.standard_filters
 scheduler_default_filters=AvailabilityZoneFilter,RamFilter,CoreFilter,ComputeFilter
 least_cost_functions=nova.scheduler.least_cost.compute_fill_first_cost_fn
 compute_fill_first_cost_fn_weight=1.0
 cpu_allocation_ratio=1.0

 This had been working to fill systems based on available RAM and not to
 exceed a 1:1 allocation ratio of CPU resources with Essex.  With
 Folsom, if I specify a moderately large number of instances to boot, or
 spin up single instances in a tight shell loop, they all get
 scheduled on the same compute node, well in excess of the number of
 available vCPUs.  If I start them one at a time (using --poll in a
 shell loop so each instance is started before the next launches) then
 I get the expected allocation behaviour.

 Per my understanding, this shouldn't happen no matter how fast you create
 instances, since the requests are queued and the scheduler updates resource
 information after it processes each request.  The only possibility I can
 think of that may cause the problem you met is that there is more than one
 scheduler doing scheduling.
 I see https://bugs.launchpad.net/nova/+bug/1011852 which seems to
 attempt to address this issue, but as I read it that fix is based on
 retrying failures.  Since KVM is capable of overcommitting both CPU
 and memory, I don't seem to get a retryable failure, just really bad
 performance.

 Am I missing something with this fix, or perhaps there's a reported bug
 I didn't find in my search, or is this really a bug no one has
 reported?

 Thanks,
 -Jon




 --
 Regards
 Huang Zhiteng




-- 
Regards
Huang Zhiteng



Re: [Openstack] Folsom nova-scheduler race condition?

2012-10-10 Thread Jonathan Proulx
On Wed, Oct 10, 2012 at 4:33 AM, Huang Zhiteng winsto...@gmail.com wrote:

 Yes, but it seems Jonathan's filter list doesn't include RetryFilter,
 so it's possible that he ran into the race condition that RetryFilter
 is meant to solve.

Yes, that was it exactly.  Thank you for seeing the obvious thing I
was missing...

-Jon



[Openstack] Folsom nova-scheduler race condition?

2012-10-09 Thread Jonathan Proulx
Hi All,

Looking for a sanity test before I file a bug.  I very recently
upgraded my install to Folsom (on top of Ubuntu 12.04/kvm).  My
scheduler settings in nova.conf are:

scheduler_available_filters=nova.scheduler.filters.standard_filters
scheduler_default_filters=AvailabilityZoneFilter,RamFilter,CoreFilter,ComputeFilter
least_cost_functions=nova.scheduler.least_cost.compute_fill_first_cost_fn
compute_fill_first_cost_fn_weight=1.0
cpu_allocation_ratio=1.0

This had been working to fill systems based on available RAM and not to
exceed a 1:1 allocation ratio of CPU resources with Essex.  With
Folsom, if I specify a moderately large number of instances to boot, or
spin up single instances in a tight shell loop, they all get
scheduled on the same compute node, well in excess of the number of
available vCPUs.  If I start them one at a time (using --poll in a
shell loop so each instance is started before the next launches) then
I get the expected allocation behaviour.

I see https://bugs.launchpad.net/nova/+bug/1011852 which seems to
attempt to address this issue, but as I read it that fix is based on
retrying failures.  Since KVM is capable of overcommitting both CPU
and memory, I don't seem to get a retryable failure, just really bad
performance.

Am I missing something with this fix, or perhaps there's a reported bug
I didn't find in my search, or is this really a bug no one has
reported?

Thanks,
-Jon



Re: [Openstack] Folsom nova-scheduler race condition?

2012-10-09 Thread Day, Phil
Hi Jon,

I believe the retry is meant to occur not just if the spawn fails, but also if
a host receives a request which it can't honour because it already has too many
VMs running or in the process of being launched.

Maybe try reducing your filters down a bit (standard_filters means all
filters, I think) in case there is some odd interaction within that full set?
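
For example, start with just something like:

scheduler_default_filters=RamFilter,ComputeFilter

and then add the others back one at a time to see which combination changes
the behaviour.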

Phil


-Original Message-
From: openstack-bounces+philip.day=hp@lists.launchpad.net 
[mailto:openstack-bounces+philip.day=hp@lists.launchpad.net] On Behalf Of 
Jonathan Proulx
Sent: 09 October 2012 15:53
To: openstack@lists.launchpad.net
Subject: [Openstack] Folsom nova-scheduler race condition?

Hi All,

Looking for a sanity test before I file a bug.  I very recently upgraded my 
install to Folsom (on top of Ubuntu 12.04/kvm).  My scheduler settings in 
nova.conf are:

scheduler_available_filters=nova.scheduler.filters.standard_filters
scheduler_default_filters=AvailabilityZoneFilter,RamFilter,CoreFilter,ComputeFilter
least_cost_functions=nova.scheduler.least_cost.compute_fill_first_cost_fn
compute_fill_first_cost_fn_weight=1.0
cpu_allocation_ratio=1.0

This had been working to fill systems based on available RAM and not to exceed
a 1:1 allocation ratio of CPU resources with Essex.  With Folsom, if I specify
a moderately large number of instances to boot, or spin up single instances in
a tight shell loop, they all get scheduled on the same compute node, well in
excess of the number of available vCPUs.  If I start them one at a time (using
--poll in a shell loop so each instance is started before the next launches)
then I get the expected allocation behaviour.

I see https://bugs.launchpad.net/nova/+bug/1011852 which seems to attempt to
address this issue, but as I read it that fix is based on retrying failures.
Since KVM is capable of overcommitting both CPU and memory, I don't seem to get
a retryable failure, just really bad performance.

Am I missing something with this fix, or perhaps there's a reported bug I
didn't find in my search, or is this really a bug no one has reported?

Thanks,
-Jon




Re: [Openstack] Folsom nova-scheduler race condition?

2012-10-09 Thread Huang Zhiteng
On Tue, Oct 9, 2012 at 10:52 PM, Jonathan Proulx j...@jonproulx.com wrote:
 Hi All,

 Looking for a sanity test before I file a bug.  I very recently
 upgraded my install to Folsom (on top of Ubuntu 12.04/kvm).  My
 scheduler settings in nova.conf are:

 scheduler_available_filters=nova.scheduler.filters.standard_filters
 scheduler_default_filters=AvailabilityZoneFilter,RamFilter,CoreFilter,ComputeFilter
 least_cost_functions=nova.scheduler.least_cost.compute_fill_first_cost_fn
 compute_fill_first_cost_fn_weight=1.0
 cpu_allocation_ratio=1.0

 This had been working to fill systems based on available RAM and not to
 exceed a 1:1 allocation ratio of CPU resources with Essex.  With
 Folsom, if I specify a moderately large number of instances to boot, or
 spin up single instances in a tight shell loop, they all get
 scheduled on the same compute node, well in excess of the number of
 available vCPUs.  If I start them one at a time (using --poll in a
 shell loop so each instance is started before the next launches) then
 I get the expected allocation behaviour.

Per my understanding, this shouldn't happen no matter how fast you create
instances, since the requests are queued and the scheduler updates resource
information after it processes each request.  The only possibility I can
think of that may cause the problem you met is that there is more than one
scheduler doing scheduling.
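
To illustrate (a toy Python sketch, not the nova code): if two schedulers each
take a snapshot of host state before either one's decision is recorded, both
will pick the same host.

    # Both schedulers snapshot free vCPUs before either commits a choice.
    free = {'node1': 8, 'node2': 8}
    view_a = dict(free)  # scheduler A's snapshot
    view_b = dict(free)  # scheduler B's snapshot, taken before A commits

    pick_a = max(view_a, key=view_a.get)  # 'node1'
    pick_b = max(view_b, key=view_b.get)  # 'node1' again: same stale view
    free[pick_a] -= 6
    free[pick_b] -= 6
    print(free)  # {'node1': -4, 'node2': 8} -> node1 is oversubscribed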
 I see https://bugs.launchpad.net/nova/+bug/1011852 which seems to
 attempt to address this issue, but as I read it that fix is based on
 retrying failures.  Since KVM is capable of overcommitting both CPU
 and memory, I don't seem to get a retryable failure, just really bad
 performance.

 Am I missing something with this fix, or perhaps there's a reported bug
 I didn't find in my search, or is this really a bug no one has
 reported?

 Thanks,
 -Jon




-- 
Regards
Huang Zhiteng
