Public bug reported:
Description
===========
With high-speed disks and an NFS storage backend, the NFS mounts hang
indefinitely under high load.
Steps to reproduce
==================
A chronological list of steps to reproduce the issue:
* Spin up a VM with a mounted cinder volume from an NFS backend
* Generate some read/write load
* Occasionally the loopback NFS mounts will hang. The machine and everything
else using that mount will also hang.
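The read/write load in the steps above can be generated with something as simple as dd (a minimal sketch; the mount point is a placeholder, not taken from the report):

```shell
# Sketch of a load generator run inside the guest against the NFS-backed
# volume. MOUNTPOINT is a placeholder path; adjust it to the actual volume.
MOUNTPOINT=/mnt/volume
# Sequential write followed by a re-read; direct I/O keeps the page cache
# out of the way so the load hits the NFS mount itself.
dd if=/dev/zero of="$MOUNTPOINT/loadtest.img" bs=1M count=4096 oflag=direct
dd if="$MOUNTPOINT/loadtest.img" of=/dev/null bs=1M iflag=direct
rm -f "$MOUNTPOINT/loadtest.img"
```

Running a few of these loops in parallel raises the chance of hitting the hang.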
Expected result
===============
The system should run stably.
Actual result
=============
Occasionally, usually under higher load, the system hangs.
Environment
===========
1. Exact version of OpenStack you are running:
OpenStack Kilo
openstack-nova-compute-2015.1.1-1.el7.noarch
openstack-nova-cert-2015.1.1-1.el7.noarch
python-nova-2015.1.1-1.el7.noarch
openstack-nova-console-2015.1.1-1.el7.noarch
openstack-nova-novncproxy-2015.1.1-1.el7.noarch
openstack-nova-common-2015.1.1-1.el7.noarch
python-novaclient-2.23.0-1.el7.noarch
openstack-nova-scheduler-2015.1.1-1.el7.noarch
openstack-nova-api-2015.1.1-1.el7.noarch
openstack-nova-conductor-2015.1.1-1.el7.noarch
2. Which hypervisor did you use?
Libvirt + KVM
3. Which storage type did you use?
NFS
4. Which networking type did you use?
Neutron with Open vSwitch
Logs & Configs
==============
nova.conf:
[DEFAULT]
notification_driver=ceilometer.compute.nova_notifier
notification_driver=nova.openstack.common.notifier.rpc_notifier
notification_driver =
notification_topics=notifications
rpc_backend=rabbit
internal_service_availability_zone=internal
default_availability_zone=nova
notify_api_faults=False
state_path=/openstack/nova
report_interval=10
enabled_apis=ec2,osapi_compute,metadata
ec2_listen=0.0.0.0
ec2_workers=2
osapi_compute_listen=0.0.0.0
osapi_compute_workers=2
metadata_listen=0.0.0.0
metadata_workers=2
compute_manager=nova.compute.manager.ComputeManager
service_down_time=60
rootwrap_config=/etc/nova/rootwrap.conf
auth_strategy=keystone
use_forwarded_for=False
novncproxy_host=192.168.0.1
novncproxy_port=6080
allow_resize_to_same_host=true
block_device_allocate_retries=1560
heal_instance_info_cache_interval=60
reserved_host_memory_mb=512
network_api_class=nova.network.neutronv2.api.API
default_floating_pool=public
force_snat_range=0.0.0.0/0
metadata_host=192.168.0.1
dhcp_domain=novalocal
security_group_api=neutron
debug=True
verbose=True
log_dir=/var/log/nova
use_syslog=False
cpu_allocation_ratio=16.0
ram_allocation_ratio=1.5
scheduler_default_filters=RetryFilter,AvailabilityZoneFilter,RamFilter,ComputeFilter,ComputeCapabilitiesFilter,ImagePropertiesFilter,CoreFilter
scheduler_driver=nova.scheduler.filter_scheduler.FilterScheduler
compute_driver=libvirt.LibvirtDriver
vif_plugging_is_fatal=True
vif_plugging_timeout=300
firewall_driver=nova.virt.firewall.NoopFirewallDriver
remove_unused_base_images=true
force_raw_images=True
novncproxy_base_url=http://0.0.0.0:6080/vnc_auto.html
vncserver_listen=192.168.0.1
vncserver_proxyclient_address=127.0.0.1
vnc_enabled=True
vnc_keymap=en-us
volume_api_class=nova.volume.cinder.API
amqp_durable_queues=False
sql_connection=mysql:XXXXXXXXXXX
lock_path=/openstack/nova/tmp
osapi_volume_listen=0.0.0.0
[api_database]
[barbican]
[cells]
[cinder]
[conductor]
workers=2
[database]
[ephemeral_storage_encryption]
[glance]
api_servers=192.168.0.1:9292
[guestfs]
[hyperv]
[image_file_url]
[ironic]
[keymgr]
[keystone_authtoken]
auth_uri=http://192.168.0.1:5000/v2.0
identity_uri=http://192.168.0.1:35357
admin_user=nova
admin_password=XXXXXXx
[libvirt]
virt_type=kvm
inject_password=False
inject_key=False
inject_partition=-1
live_migration_uri=qemu+tcp://nova@%s/system
cpu_mode=host-model
disk_cachemodes=file=writethrough,block=writethrough
nfs_mount_options=rw,hard,intr,nolock,vers=4.1,timeo=10
vif_driver=nova.virt.libvirt.vif.LibvirtGenericVIFDriver
[metrics]
[neutron]
......
cinder.conf:
[nfs_ssd]
nfs_used_ratio=0.95
nfs_oversub_ratio=10.0
volume_driver=cinder.volume.drivers.nfs.NfsDriver
nfs_shares_config=/etc/cinder/nfs_shares_ssd.conf
volume_backend_name=nfs_ssd
quota_volumes = -1
nfs_mount_options=rw,hard,intr,nolock
- No notable output in the nova logs.
- System log (dmesg) after a hang:
Nov 24 04:10:41 openstack1.itgix.com kernel: INFO: task qemu-kvm:11726 blocked
for more than 120 seconds.
Nov 24 04:10:41 openstack1.itgix.com kernel: "echo 0 >
/proc/sys/kernel/hung_task_timeout_secs" disables this message.
Nov 24 04:10:41 openstack1.itgix.com kernel: qemu-kvm D ffff88118b1b1f60
0 11726 1 0x00000080
Nov 24 04:10:41 openstack1.itgix.com kernel: ffff880da4c77c40 0000000000000082
ffff881184b86780 ffff880da4c77fd8
Nov 24 04:10:41 openstack1.itgix.com kernel: ffff880da4c77fd8 ffff880da4c77fd8
ffff881184b86780 ffff88118b1b1f58
Nov 24 04:10:41 openstack1.itgix.com kernel: ffff88118b1b1f5c ffff881184b86780
00000000ffffffff ffff88118b1b1f60
Nov 24 04:10:41 openstack1.itgix.com kernel: Call Trace:
Nov 24 04:10:41 openstack1.itgix.com kernel: [<ffffffff8163bf29>]
schedule_preempt_disabled+0x29/0x70
Nov 24 04:10:41 openstack1.itgix.com kernel: [<ffffffff81639c25>]
__mutex_lock_slowpath+0xc5/0x1c0
Nov 24 04:10:41 openstack1.itgix.com kernel: [<ffffffff812fbff4>] ?
timerqueue_del+0x24/0x70
Nov 24 04:10:41 openstack1.itgix.com kernel: [<ffffffff8163908f>]
mutex_lock+0x1f/0x2f
Nov 24 04:10:41 openstack1.itgix.com kernel: [<ffffffff8116b60a>]
generic_file_aio_write+0x4a/0xc0
Nov 24 04:10:41 openstack1.itgix.com kernel: [<ffffffffa06dc03b>]
nfs_file_write+0xbb/0x1d0 [nfs]
Nov 24 04:10:41 openstack1.itgix.com kernel: Call Trace:
Nov 24 04:10:41 openstack1.itgix.com kernel: [<ffffffff8163bf29>]
schedule_preempt_disabled+0x29/0x70
Nov 24 04:10:41 openstack1.itgix.com kernel: [<ffffffff81639c25>]
__mutex_lock_slowpath+0xc5/0x1c0
Nov 24 04:10:41 openstack1.itgix.com kernel: [<ffffffff8163908f>]
mutex_lock+0x1f/0x2f
Nov 24 04:10:41 openstack1.itgix.com kernel: [<ffffffff8116b60a>]
generic_file_aio_write+0x4a/0xc0
Nov 24 04:10:41 openstack1.itgix.com kernel: [<ffffffffa06dc03b>]
nfs_file_write+0xbb/0x1d0 [nfs]
Nov 24 04:10:41 openstack1.itgix.com kernel: [<ffffffff811dde5d>]
do_sync_write+0x8d/0xd0
Nov 24 04:10:41 openstack1.itgix.com kernel: [<ffffffff811de67d>]
vfs_write+0xbd/0x1e0
Nov 24 04:10:41 openstack1.itgix.com kernel: [<ffffffff811df2d2>]
SyS_pwrite64+0x92/0xc0
Nov 24 04:10:41 openstack1.itgix.com kernel: [<ffffffff81645ec9>]
system_call_fastpath+0x16/0x1b
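When a hang like the one traced above occurs, the blocked tasks can be inspected from userspace (a generic diagnostic sketch, not specific to this setup):

```shell
# List tasks in uninterruptible sleep (state D) -- the state qemu-kvm is
# stuck in according to the trace above.
ps -eo pid,state,comm | awk '$2 == "D"'
# As root, dump the kernel stacks of all blocked tasks to the kernel log,
# which produces traces like the one shown above:
#   echo w > /proc/sysrq-trigger
```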
** Affects: nova
Importance: Undecided
Status: New
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1646896
Title:
System hangs when using NFS storage backend with loopback mounts
Status in OpenStack Compute (nova):
New