Re: [openstack-dev] [Magnum] Consistent functional test failures (seems infra not have enough resource)
On 2015-08-21 01:10:22 +0000 (+0000), Steven Dake (stdake) wrote:
[...] How large is /opt? [...]

It appears at the moment HP Cloud gives us a 30GiB root filesystem (vda1) and a 0.5TiB ephemeral disk (vdb). Rackspace on the other hand provides a 40GB root filesystem (xvda1) and an 80GB ephemeral disk (xvde). If your jobs are using devstack-gate, have a look at fix_disk_layout() in functions.sh for details on how we repartition, format and mount ephemeral disks. If your job is not based on devstack-gate, you should be able to implement similar routines to duplicate this.
--
Jeremy Stanley
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
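[Editor's note: the routine Jeremy points at can be approximated as below for jobs not based on devstack-gate. This is a hedged sketch, not the actual fix_disk_layout() code from devstack-gate's functions.sh; the device paths and single-partition layout are assumptions based only on the sizes quoted above (HP Cloud's vdb ephemeral disk, mounted at /opt).]

```shell
#!/bin/bash
# Sketch of repartitioning, formatting and mounting an ephemeral disk in
# the spirit of devstack-gate's fix_disk_layout() -- NOT the real code.
# All privileged work goes through "sudo <cmd>", so the function can be
# exercised without root by stubbing sudo.
ephemeral_to_opt() {
    local dev=$1           # e.g. /dev/vdb on HP Cloud, /dev/xvde on Rackspace
    local mnt=${2:-/opt}   # devstack-gate exposes the extra space at /opt
    # One big primary partition spanning the whole ephemeral disk.
    sudo parted -s "$dev" mklabel msdos mkpart primary 1MiB 100%
    # Fresh filesystem on the new partition.
    sudo mkfs.ext4 "${dev}1"
    # Mount it where jobs expect the extra space.
    sudo mount "${dev}1" "$mnt"
}
```

A job would call `ephemeral_to_opt /dev/vdb` early in its setup, before anything starts writing under /opt.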
Re: [openstack-dev] [Magnum] Consistent functional test failures
On 8/13/15, 4:58 PM, Clark Boylan cboy...@sapwetik.org wrote:
On Thu, Aug 13, 2015, at 03:13 AM, Tom Cammann wrote:
Hi Team, Wanted to let you know why we are having consistent functional test failures in the gate. This is being caused by Nova returning No valid host to heat: [...]
2015-08-13 08:26:16.303 31543 ERROR heat.engine.resource ResourceInError: Went to status ERROR due to Message: No valid host was found. There are not enough hosts available., Code: 500
And this in turn is being caused by the compute instance running out of disk space: [...]
2015-08-13 08:26:15.218 DEBUG nova.scheduler.filters.disk_filter [req-e5bb52cb-387e-4638-911e-8c72aa1b6400 admin admin] (devstack-trusty-rax-dfw-4299602, devstack-trusty-rax-dfw-4299602) ram:5172 disk:17408 io_ops:0 instances:1 does not have 20480 MB usable disk, it only has 17408.0 MB usable disk. host_passes /opt/stack/new/nova/nova/scheduler/filters/disk_filter.py:60
2015-08-13 08:26:15.218 INFO nova.filters [req-e5bb52cb-387e-4638-911e-8c72aa1b6400 admin admin] Filter DiskFilter returned 0 hosts
For now a recheck seems to work about 1 in 2, so we can still land patches. The fix for this could be to clean up our Magnum devstack install more aggressively, which might be as simple as cleaning up the images we use, or get infra to provide our tests with a larger disk size. I will probably test out a patch today which cleans up the images we use in devstack to see if that helps.
It is not trivial to provide your tests with more disk, as we are using the flavors appropriate for our RAM and CPU needs and are constrained by quotas in the clouds we use. Do you really need 20GB nested test instances? The VMs these jobs run on have ~13GB images, which is almost half the size of the instances you are trying to boot there. I would definitely look into trimming the disk requirements for the nested VMs before anything else. As for working ~50% of the time: hpcloud gives us more disk than rackspace, which is likely why you see about half fail and half pass. The runs that pass probably run on hpcloud VMs.

In the short term, is there a way to request HP VMs? 20GB won't do the job unfortunately.
Regards,
-steve

Clark
Re: [openstack-dev] [Magnum] Consistent functional test failures (seems infra not have enough resource)
On 8/13/15, 6:13 AM, Jeremy Stanley fu...@yuggoth.org wrote:
On 2015-08-13 19:38:07 +0800 (+0800), Kai Qiang Wu wrote:
I did talk to infra; I think it is a resource issue, but they thought it is a nova issue, [...]

No, I said the error was being raised by Nova, so it was not an error coming _from_ the infrastructure we manage. If your jobs are more resource-intensive than a typical devstack/tempest job, you'll want to look at ways to scale them back.

It is 20GB disk space, so it failed for that.

Correct, we run jobs on resources donated by public service providers. Some of them only provide a 20GB root disk. There's generally an ephemeral disk mounted at /opt with additional space if you can modify your job to leverage that for whatever is running out of space.

How large is /opt?

I think it is related to this: the jenkins-allocated VM disk space is not large. I am curious why it failed so often recently. Did os-infra change something?

Nothing has been intentionally changed with our disk space on job workers as far as I'm aware. Different workers have varying root disk sizes depending on the provider where they were booted, but they could be as small as 20GB, so your job will need to take that into account.

20GB isn't enough for Magnum's CI jobs. We could link /var/lib/docker to /opt if there is sufficient space there.
Regards,
-steve
--
Jeremy Stanley
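[Editor's note: Steve's idea of linking /var/lib/docker to /opt comes down to a move plus a symlink. A minimal sketch follows; the paths are parameterized so the relocation logic can be tested on scratch directories, and on a real worker you would stop the docker daemon before calling it with src=/var/lib/docker, dest=/opt/docker.]

```shell
#!/bin/bash
# Relocate a space-hungry directory onto the larger /opt filesystem and
# leave a symlink behind so the old path keeps working.
relocate_dir() {
    local src=$1 dest=$2
    mkdir -p "$(dirname "$dest")"   # make sure the target parent exists
    mv "$src" "$dest"               # move the data onto the big disk
    ln -s "$dest" "$src"            # old path now points at the new location
}
```

On a CI worker this would be bracketed by `sudo service docker stop` / `start` so no files are written mid-move; that part is omitted here.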
Re: [openstack-dev] [Magnum] Consistent functional test failures
Kai,

This sounds like a good solution. The actual VM doesn't need to be super large given our present tests.
Regards
-steve

From: Kai Qiang Wu wk...@cn.ibm.com
Reply-To: OpenStack Development Mailing List (not for usage questions) openstack-dev@lists.openstack.org
Date: Friday, August 14, 2015 at 3:46 AM
To: OpenStack Development Mailing List (not for usage questions) openstack-dev@lists.openstack.org
Subject: Re: [openstack-dev] [Magnum] Consistent functional test failures

I have checked with infra team members. For two instances, 10GB each should be OK. So I added some steps to create a Magnum-specific flavor (8 GB disk) instead of using the existing devstack flavors (m1.small needs 20GB; m1.tiny cannot be used). Magnum creates one for the jenkins job and deletes it when the tests finish.

Thanks
Best Wishes,
Kai Qiang Wu (吴开强 Kennan)
IBM China System and Technology Lab, Beijing
E-mail: wk...@cn.ibm.com
Tel: 86-10-82451647
Address: Building 28 (Ring Building), ZhongGuanCun Software Park, No.8 Dong Bei Wang West Road, Haidian District, Beijing, P.R. China 100193
Follow your heart. You are miracle!

From: Clark Boylan cboy...@sapwetik.org
To: openstack-dev@lists.openstack.org
Date: 08/14/2015 08:05 AM
Subject: Re: [openstack-dev] [Magnum] Consistent functional test failures

On Thu, Aug 13, 2015, at 03:13 AM, Tom Cammann wrote:
Hi Team, Wanted to let you know why we are having consistent functional test failures in the gate.
[...]
Re: [openstack-dev] [Magnum] Consistent functional test failures
I have checked with infra team members. For two instances, 10GB each should be OK. So I added some steps to create a Magnum-specific flavor (8 GB disk) instead of using the existing devstack flavors (m1.small needs 20GB; m1.tiny cannot be used). Magnum creates one for the jenkins job and deletes it when the tests finish.

Thanks
Best Wishes,
Kai Qiang Wu (吴开强 Kennan)
IBM China System and Technology Lab, Beijing
E-mail: wk...@cn.ibm.com
Tel: 86-10-82451647
Address: Building 28 (Ring Building), ZhongGuanCun Software Park, No.8 Dong Bei Wang West Road, Haidian District, Beijing, P.R. China 100193
Follow your heart. You are miracle!

From: Clark Boylan cboy...@sapwetik.org
To: openstack-dev@lists.openstack.org
Date: 08/14/2015 08:05 AM
Subject: Re: [openstack-dev] [Magnum] Consistent functional test failures

On Thu, Aug 13, 2015, at 03:13 AM, Tom Cammann wrote:
Hi Team, Wanted to let you know why we are having consistent functional test failures in the gate. This is being caused by Nova returning No valid host to heat: [...] For now a recheck seems to work about 1 in 2, so we can still land patches. The fix for this could be to clean up our Magnum devstack install more aggressively, which might be as simple as cleaning up the images we use, or get infra to provide our tests with a larger disk size. I will probably test out a patch today which cleans up the images we use in devstack to see if that helps
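[Editor's note: Kai's approach can be sketched as a pair of helpers wrapped around the job: create a small flavor before the tests, delete it afterwards. Only the 8 GB disk figure comes from the thread; the flavor name and the RAM/VCPU values below are illustrative assumptions.]

```shell
#!/bin/bash
# Sketch of the Magnum-specific test flavor Kai describes: an 8 GB root
# disk instead of m1.small's 20 GB. Name, RAM and VCPU count are made up.
FLAVOR=m1.magnum

create_test_flavor() {
    # nova flavor-create <name> <id> <ram-MB> <disk-GB> <vcpus>
    nova flavor-create "$FLAVOR" auto 1024 8 1
}

delete_test_flavor() {
    nova flavor-delete "$FLAVOR"
}
```

The job would call create_test_flavor before building the bay model with that flavor and delete_test_flavor in its cleanup phase, so no flavor leaks between runs.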
Re: [openstack-dev] [Magnum] Consistent functional test failures (seems infra not have enough resource)
Hi Tom,

I did talk to infra; I think it is a resource issue, but they thought it is a nova issue. When we boot a k8s bay, we use a baymodel with flavor m1.small. You can find the devstack flavors:

+-----+-----------+-----------+------+-----------+------+-------+-------------+-----------+
| ID  | Name      | Memory_MB | Disk | Ephemeral | Swap | VCPUs | RXTX_Factor | Is_Public |
+-----+-----------+-----------+------+-----------+------+-------+-------------+-----------+
| 1   | m1.tiny   | 512       | 1    | 0         |      | 1     | 1.0         | True      |
| 2   | m1.small  | 2048      | 20   | 0         |      | 1     | 1.0         | True      |
| 3   | m1.medium | 4096      | 40   | 0         |      | 2     | 1.0         | True      |
| 4   | m1.large  | 8192      | 80   | 0         |      | 4     | 1.0         | True      |
| 42  | m1.nano   | 64        | 0    | 0         |      | 1     | 1.0         | True      |
| 451 | m1.heat   | 512       | 0    | 0         |      | 1     | 1.0         | True      |
| 5   | m1.xlarge | 16384     | 160  | 0         |      | 8     | 1.0         | True      |
| 84  | m1.micro  | 128       | 0    | 0         |      | 1     | 1.0         | True      |
+-----+-----------+-----------+------+-----------+------+-------+-------------+-----------+

From the logs below:

[req-e5bb52cb-387e-4638-911e-8c72aa1b6400 admin admin] (devstack-trusty-rax-dfw-4299602, devstack-trusty-rax-dfw-4299602) ram:5172 disk:17408 io_ops:0 instances:1 does not have 20480 MB usable disk, it only has 17408.0 MB usable disk. host_passes /opt/stack/new/nova/nova/scheduler/filters/disk_filter.py:60
2015-08-13 08:26:15.218 INFO nova.filters [req-e

It needs 20GB disk space, so it failed for that. I think it is related to this: the jenkins-allocated VM disk space is not large. I am curious why it failed so often recently. Did os-infra change something?

Thanks
Best Wishes,
Kai Qiang Wu (吴开强 Kennan)
IBM China System and Technology Lab, Beijing
E-mail: wk...@cn.ibm.com
Tel: 86-10-82451647
Address: Building 28 (Ring Building), ZhongGuanCun Software Park, No.8 Dong Bei Wang West Road, Haidian District, Beijing, P.R. China 100193
Follow your heart. You are miracle!

From: Tom Cammann tom.camm...@hp.com
To: OpenStack Development Mailing List (not for usage questions) openstack-dev@lists.openstack.org
Date: 08/13/2015 06:24 PM
Subject: [openstack-dev] [Magnum] Consistent functional test failures

Hi Team, Wanted to let you know why we are having consistent functional test failures in the gate.
[...]
[openstack-dev] [Magnum] Consistent functional test failures
Hi Team,

Wanted to let you know why we are having consistent functional test failures in the gate. This is being caused by Nova returning No valid host to heat:

2015-08-13 08:26:16.303 31543 INFO heat.engine.resource [-] CREATE: Server kube_minion [12ab45ef-0177-4118-9ba0-3fffbc3c1d1a] Stack testbay-y366b2atg6mm-kube_minions-cdlfyvhaximr-0-dufsjliqfoet [b40f0c9f-cb54-4d75-86c3-8a9f347a27a6]
2015-08-13 08:26:16.303 31543 ERROR heat.engine.resource Traceback (most recent call last):
2015-08-13 08:26:16.303 31543 ERROR heat.engine.resource File /opt/stack/new/heat/heat/engine/resource.py, line 625, in _action_recorder
2015-08-13 08:26:16.303 31543 ERROR heat.engine.resource yield
2015-08-13 08:26:16.303 31543 ERROR heat.engine.resource File /opt/stack/new/heat/heat/engine/resource.py, line 696, in _do_action
2015-08-13 08:26:16.303 31543 ERROR heat.engine.resource yield self.action_handler_task(action, args=handler_args)
2015-08-13 08:26:16.303 31543 ERROR heat.engine.resource File /opt/stack/new/heat/heat/engine/scheduler.py, line 320, in wrapper
2015-08-13 08:26:16.303 31543 ERROR heat.engine.resource step = next(subtask)
2015-08-13 08:26:16.303 31543 ERROR heat.engine.resource File /opt/stack/new/heat/heat/engine/resource.py, line 670, in action_handler_task
2015-08-13 08:26:16.303 31543 ERROR heat.engine.resource while not check(handler_data):
2015-08-13 08:26:16.303 31543 ERROR heat.engine.resource File /opt/stack/new/heat/heat/engine/resources/openstack/nova/server.py, line 759, in check_create_complete
2015-08-13 08:26:16.303 31543 ERROR heat.engine.resource return self.client_plugin()._check_active(server_id)
2015-08-13 08:26:16.303 31543 ERROR heat.engine.resource File /opt/stack/new/heat/heat/engine/clients/os/nova.py, line 232, in _check_active
2015-08-13 08:26:16.303 31543 ERROR heat.engine.resource 'code': fault.get('code', _('Unknown'))
2015-08-13 08:26:16.303 31543 ERROR heat.engine.resource ResourceInError: Went to status ERROR due to Message: No valid host was found. There are not enough hosts available., Code: 500

And this in turn is being caused by the compute instance running out of disk space:

2015-08-13 08:26:15.216 DEBUG nova.filters [req-e5bb52cb-387e-4638-911e-8c72aa1b6400 admin admin] Starting with 1 host(s) get_filtered_objects /opt/stack/new/nova/nova/filters.py:70
2015-08-13 08:26:15.217 DEBUG nova.filters [req-e5bb52cb-387e-4638-911e-8c72aa1b6400 admin admin] Filter RetryFilter returned 1 host(s) get_filtered_objects /opt/stack/new/nova/nova/filters.py:84
2015-08-13 08:26:15.217 DEBUG nova.filters [req-e5bb52cb-387e-4638-911e-8c72aa1b6400 admin admin] Filter AvailabilityZoneFilter returned 1 host(s) get_filtered_objects /opt/stack/new/nova/nova/filters.py:84
2015-08-13 08:26:15.217 DEBUG nova.filters [req-e5bb52cb-387e-4638-911e-8c72aa1b6400 admin admin] Filter RamFilter returned 1 host(s) get_filtered_objects /opt/stack/new/nova/nova/filters.py:84
2015-08-13 08:26:15.218 DEBUG nova.scheduler.filters.disk_filter [req-e5bb52cb-387e-4638-911e-8c72aa1b6400 admin admin] (devstack-trusty-rax-dfw-4299602, devstack-trusty-rax-dfw-4299602) ram:5172 disk:17408 io_ops:0 instances:1 does not have 20480 MB usable disk, it only has 17408.0 MB usable disk. host_passes /opt/stack/new/nova/nova/scheduler/filters/disk_filter.py:60
2015-08-13 08:26:15.218 INFO nova.filters [req-e5bb52cb-387e-4638-911e-8c72aa1b6400 admin admin] Filter DiskFilter returned 0 hosts

For now a recheck seems to work about 1 in 2, so we can still land patches. The fix for this could be to clean up our Magnum devstack install more aggressively, which might be as simple as cleaning up the images we use, or get infra to provide our tests with a larger disk size. I will probably test out a patch today which cleans up the images we use in devstack to see if that helps. If anyone can help progress this let me know.
Cheers,
Tom
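[Editor's note: the DiskFilter rejection in the log above is plain arithmetic: the bay nodes ask for m1.small's 20 GB root disk (20480 MB) while the host reports only 17408 MB usable. A minimal sketch of the comparison the filter makes; the real nova filter also applies a disk_allocation_ratio, which is left out here.]

```shell
#!/bin/bash
# Does a host with usable_mb of disk fit a flavor asking for requested_gb?
disk_fits() {
    local usable_mb=$1 requested_gb=$2
    [ "$usable_mb" -ge $(( requested_gb * 1024 )) ]
}

# The failing case from the log: 17408 MB usable vs m1.small's 20 GB.
disk_fits 17408 20 || echo "DiskFilter returned 0 hosts"
# An 8 GB flavor, as proposed later in the thread, would have passed.
disk_fits 17408 8 && echo "8 GB flavor fits"
```

Running this prints "DiskFilter returned 0 hosts" followed by "8 GB flavor fits", matching the scheduler's behavior in the quoted log.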
Re: [openstack-dev] [Magnum] Consistent functional test failures (seems infra not have enough resource)
On 2015-08-13 19:38:07 +0800 (+0800), Kai Qiang Wu wrote:
I did talk to infra; I think it is a resource issue, but they thought it is a nova issue, [...]

No, I said the error was being raised by Nova, so it was not an error coming _from_ the infrastructure we manage. If your jobs are more resource-intensive than a typical devstack/tempest job, you'll want to look at ways to scale them back.

It is 20GB disk space, so it failed for that.

Correct, we run jobs on resources donated by public service providers. Some of them only provide a 20GB root disk. There's generally an ephemeral disk mounted at /opt with additional space if you can modify your job to leverage that for whatever is running out of space.

I think it is related to this: the jenkins-allocated VM disk space is not large. I am curious why it failed so often recently. Did os-infra change something?

Nothing has been intentionally changed with our disk space on job workers as far as I'm aware. Different workers have varying root disk sizes depending on the provider where they were booted, but they could be as small as 20GB, so your job will need to take that into account.
--
Jeremy Stanley
Re: [openstack-dev] [Magnum] Consistent functional test failures
On Thu, Aug 13, 2015, at 03:13 AM, Tom Cammann wrote:
Hi Team, Wanted to let you know why we are having consistent functional test failures in the gate. This is being caused by Nova returning No valid host to heat: [...]
2015-08-13 08:26:16.303 31543 ERROR heat.engine.resource ResourceInError: Went to status ERROR due to Message: No valid host was found. There are not enough hosts available., Code: 500
And this in turn is being caused by the compute instance running out of disk space: [...]
2015-08-13 08:26:15.218 DEBUG nova.scheduler.filters.disk_filter [req-e5bb52cb-387e-4638-911e-8c72aa1b6400 admin admin] (devstack-trusty-rax-dfw-4299602, devstack-trusty-rax-dfw-4299602) ram:5172 disk:17408 io_ops:0 instances:1 does not have 20480 MB usable disk, it only has 17408.0 MB usable disk. host_passes /opt/stack/new/nova/nova/scheduler/filters/disk_filter.py:60
2015-08-13 08:26:15.218 INFO nova.filters [req-e5bb52cb-387e-4638-911e-8c72aa1b6400 admin admin] Filter DiskFilter returned 0 hosts
For now a recheck seems to work about 1 in 2, so we can still land patches. The fix for this could be to clean up our Magnum devstack install more aggressively, which might be as simple as cleaning up the images we use, or get infra to provide our tests with a larger disk size. I will probably test out a patch today which cleans up the images we use in devstack to see if that helps.
It is not trivial to provide your tests with more disk, as we are using the flavors appropriate for our RAM and CPU needs and are constrained by quotas in the clouds we use. Do you really need 20GB nested test instances? The VMs these jobs run on have ~13GB images, which is almost half the size of the instances you are trying to boot there. I would definitely look into trimming the disk requirements for the nested VMs before anything else.

As for working ~50% of the time: hpcloud gives us more disk than rackspace, which is likely why you see about half fail and half pass. The runs that pass probably run on hpcloud VMs.

Clark