Re: [Openstack] Folsom nova-scheduler race condition?
> Per my understanding, this shouldn't happen no matter how fast you create
> instances, since the requests are queued and the scheduler updates its
> resource information after it processes each request. The only possible
> cause of the problem you hit that I can think of is more than one
> scheduler doing the scheduling.

I think the new retry logic is meant to be safe even if there is more than one scheduler, as the requests are effectively serialised when they get to the compute manager, which can then reject any that would break its actual resource limits?

-----Original Message-----
From: Huang Zhiteng
Sent: 10 October 2012 04:28
To: Jonathan Proulx
Cc: openstack@lists.launchpad.net
Subject: Re: [Openstack] Folsom nova-scheduler race condition?

[earlier messages in the thread snipped]

___
Mailing list: https://launchpad.net/~openstack
Post to     : openstack@lists.launchpad.net
Unsubscribe : https://launchpad.net/~openstack
More help   : https://help.launchpad.net/ListHelp
Re: [Openstack] Folsom nova-scheduler race condition?
On Wed, Oct 10, 2012 at 3:44 PM, Day, Phil <philip@hp.com> wrote:
>> Per my understanding, this shouldn't happen no matter how fast you
>> create instances, since the requests are queued and the scheduler
>> updates its resource information after it processes each request. The
>> only possible cause of the problem you hit that I can think of is more
>> than one scheduler doing the scheduling.
>
> I think the new retry logic is meant to be safe even if there is more
> than one scheduler, as the requests are effectively serialised when they
> get to the compute manager, which can then reject any that would break
> its actual resource limits?

Yes, but it seems Jonathan's filter list doesn't include RetryFilter, so it's possible that he ran into the race condition RetryFilter was made to solve.

[rest of the quoted thread snipped]

--
Regards
Huang Zhiteng
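For reference, adding RetryFilter to the list quoted earlier in the thread would look something like the following nova.conf fragment. This is only a sketch based on the settings Jonathan posted; the option names match Folsom's FilterScheduler, but verify them against your own deployment:

```ini
# nova.conf -- scheduler settings (Folsom), per the thread plus RetryFilter
scheduler_available_filters=nova.scheduler.filters.standard_filters
# RetryFilter skips hosts that have already failed a given request, so a
# rescheduled request doesn't land back on the same overloaded node.
scheduler_default_filters=RetryFilter,AvailabilityZoneFilter,RamFilter,CoreFilter,ComputeFilter
least_cost_functions=nova.scheduler.least_cost.compute_fill_first_cost_fn
compute_fill_first_cost_fn_weight=1.0
cpu_allocation_ratio=1.0
```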
Re: [Openstack] Folsom nova-scheduler race condition?
On Wed, Oct 10, 2012 at 4:33 AM, Huang Zhiteng <winsto...@gmail.com> wrote:
> Yes, but it seems Jonathan's filter list doesn't include RetryFilter, so
> it's possible that he ran into a race condition that RetryFilter was made
> to solve.

Yes, that was it exactly. Thank you for seeing the obvious thing I was missing...

-Jon
[Openstack] Folsom nova-scheduler race condition?
Hi All,

Looking for a sanity test before I file a bug. I very recently upgraded my install to Folsom (on top of Ubuntu 12.04/kvm). My scheduler settings in nova.conf are:

scheduler_available_filters=nova.scheduler.filters.standard_filters
scheduler_default_filters=AvailabilityZoneFilter,RamFilter,CoreFilter,ComputeFilter
least_cost_functions=nova.scheduler.least_cost.compute_fill_first_cost_fn
compute_fill_first_cost_fn_weight=1.0
cpu_allocation_ratio=1.0

This had been working under Essex to fill systems based on available RAM and not to exceed a 1:1 allocation ratio of CPU resources. With Folsom, if I specify a moderately large number of instances to boot, or spin up single instances in a tight shell loop, they all get scheduled on the same compute node, well in excess of the number of available vCPUs. If I start them one at a time (using --poll in a shell loop so each instance is started before the next launches), then I get the expected allocation behaviour.

I see https://bugs.launchpad.net/nova/+bug/1011852 which seems to attempt to address this issue, but as I read it that fix is based on retrying failures. Since KVM is capable of over-committing both CPU and memory, I don't seem to get a retryable failure, just really bad performance. Am I missing something about this fix, or perhaps there's a reported bug I didn't find in my search, or is this really a bug no one has reported?

Thanks,
-Jon
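The symptom described above can be illustrated with a toy model (pure illustration, not nova code; all names here are invented stand-ins): if the scheduler's view of consumed resources is not refreshed between the requests in a batch, e.g. because a second scheduler is working from a stale snapshot, then every request passes a CoreFilter-style check against the same stale numbers and piles onto one host:

```python
# Toy illustration of the scheduling race discussed in this thread.
# Not nova code: the classes and functions are simplified stand-ins.

class Host:
    def __init__(self, name, vcpus):
        self.name = name
        self.vcpus = vcpus      # capacity (cpu_allocation_ratio=1.0 analogue)
        self.used = 0           # vCPUs actually consumed on the node

def schedule_batch(hosts, n_requests, update_view):
    """Place n_requests single-vCPU instances.

    update_view=True  -> the scheduler refreshes its resource view after
                         each placement (the expected behaviour).
    update_view=False -> placements are checked against a stale snapshot,
                         as when a second scheduler races on old data.
    """
    view = {h.name: h.used for h in hosts}  # scheduler's snapshot of usage
    placements = []
    for _ in range(n_requests):
        # CoreFilter analogue: host must still fit one more vCPU at 1:1.
        fits = [h for h in hosts if view[h.name] + 1 <= h.vcpus]
        if not fits:
            placements.append(None)
            continue
        # Fill-first cost-function analogue: prefer the most-loaded host.
        target = max(fits, key=lambda h: view[h.name])
        target.used += 1                     # what actually happens on the node
        placements.append(target.name)
        if update_view:
            view[target.name] = target.used  # keep the snapshot current
    return placements
```

With update_view=False, a batch of four requests against two 2-vCPU hosts all lands on the first host (used=4), matching the "all on one node, well past vCPUs" symptom; with update_view=True the batch spreads and stops at capacity. The actual fix discussed in the thread, RetryFilter plus the compute manager rejecting over-limit requests, is not modelled here.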
Re: [Openstack] Folsom nova-scheduler race condition?
Hi Jon,

I believe the retry is meant to occur not just if the spawn fails, but also if a host receives a request which it can't honour because it already has too many VMs running or in the process of being launched.

Maybe try reducing your filters down a bit (standard_filters means all filters, I think) in case there is some odd interaction within that full set?

Phil

-----Original Message-----
From: Jonathan Proulx
Sent: 09 October 2012 15:53
To: openstack@lists.launchpad.net
Subject: [Openstack] Folsom nova-scheduler race condition?

[original message snipped]
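Phil's "reduce the filters" suggestion would amount to something like the following nova.conf fragment. This is a sketch only, using the same option names quoted in the thread and trimmed down to the bare resource checks to rule out interactions between filters:

```ini
# nova.conf -- deliberately minimal filter set for debugging (sketch)
scheduler_available_filters=nova.scheduler.filters.standard_filters
# Just the RAM and core checks; add the others back one at a time.
scheduler_default_filters=RamFilter,CoreFilter
cpu_allocation_ratio=1.0
```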
Re: [Openstack] Folsom nova-scheduler race condition?
On Tue, Oct 9, 2012 at 10:52 PM, Jonathan Proulx <j...@jonproulx.com> wrote:
> With Folsom, if I specify a moderately large number of instances to boot,
> or spin up single instances in a tight shell loop, they all get scheduled
> on the same compute node, well in excess of the number of available
> vCPUs. If I start them one at a time (using --poll in a shell loop so
> each instance is started before the next launches) then I get the
> expected allocation behaviour.

Per my understanding, this shouldn't happen no matter how fast you create instances, since the requests are queued and the scheduler updates its resource information after it processes each request. The only possible cause of the problem you hit that I can think of is more than one scheduler doing the scheduling.

[rest of the original message snipped]

--
Regards
Huang Zhiteng