[ovirt-users] Re: Managed Block Storage: ceph detach_volume failing after migration
On Wed, Sep 25, 2019 at 8:02 PM Dan Poltawski wrote:
> Hi,
>
> On Wed, 2019-09-25 at 15:42 +0300, Amit Bawer wrote:
> > According to the resolution of [1] it's a multipathd/udev
> > configuration issue. It could be worth tracking this issue.
> >
> > [1] https://tracker.ceph.com/issues/12763
>
> Thanks, that certainly looks like a smoking gun to me, in the logs:
>
> Sep 25 12:27:45 mario multipathd: rbd29: add path (uevent)
> Sep 25 12:27:45 mario multipathd: rbd29: spurious uevent, path already in pathvec
> Sep 25 12:27:45 mario multipathd: rbd29: HDIO_GETGEO failed with 25
> Sep 25 12:27:45 mario multipathd: rbd29: failed to get path uid
> Sep 25 12:27:45 mario multipathd: uevent trigger error

Please file an oVirt bug. Vdsm manages the multipath configuration, and I
don't think we have a blacklist for rbd devices.

If this is the issue, you can fix it locally by installing a multipath
drop-in configuration:

# cat /etc/multipath.conf.d/rbd.conf
blacklist {
    devnode "^(rbd)[0-9]*"
}

Vdsm should include this configuration in the /etc/multipath.conf that it
manages.

Nir

> Dan
>
> > On Wed, Sep 25, 2019 at 3:18 PM Dan Poltawski <
> > dan.poltaw...@tnp.net.uk> wrote:
> > > On ovirt 4.3.5 we are seeing various problems related to the rbd
> > > device staying mapped after a guest has been live migrated. This
> > > causes problems migrating the guest back, as well as rebooting the
> > > guest when it starts back up on the original host. The error
> > > returned is ‘rbd: unmap failed: (16) Device or resource busy’.
> > > I’ve pasted the full vdsm log below.
> > >
> > > As far as I can tell this isn’t happening 100% of the time, and
> > > seems to be more prevalent on busy guests.
> > >
> > > (Not sure if I should create a bug for this, so thought I’d start
> > > here first)
> > >
> > > Thanks,
> > >
> > > Dan
> > >
> > > [full vdsm traceback trimmed; see Dan's original message below]
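After dropping that file in place, multipathd needs to re-read its
configuration, and it is worth confirming the blacklist actually took
effect. A minimal sketch (rbd29 is just the example device from the logs
above, and the exact reload mechanism may differ by multipath-tools
version):

# systemctl reload multipathd
# multipath -t | grep -A 2 rbd
# multipath -c /dev/rbd29

multipath -t dumps the merged configuration, so the rbd devnode rule
should show up in its blacklist section, and multipath -c should now
report that the device is not a valid multipath path.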
[ovirt-users] Re: Managed Block Storage: ceph detach_volume failing after migration
Hi,

On Wed, 2019-09-25 at 15:42 +0300, Amit Bawer wrote:
> According to the resolution of [1] it's a multipathd/udev configuration
> issue. It could be worth tracking this issue.
>
> [1] https://tracker.ceph.com/issues/12763

Thanks, that certainly looks like a smoking gun to me, in the logs:

Sep 25 12:27:45 mario multipathd: rbd29: add path (uevent)
Sep 25 12:27:45 mario multipathd: rbd29: spurious uevent, path already in pathvec
Sep 25 12:27:45 mario multipathd: rbd29: HDIO_GETGEO failed with 25
Sep 25 12:27:45 mario multipathd: rbd29: failed to get path uid
Sep 25 12:27:45 mario multipathd: uevent trigger error

Dan

> On Wed, Sep 25, 2019 at 3:18 PM Dan Poltawski <
> dan.poltaw...@tnp.net.uk> wrote:
> > On ovirt 4.3.5 we are seeing various problems related to the rbd
> > device staying mapped after a guest has been live migrated. This
> > causes problems migrating the guest back, as well as rebooting the
> > guest when it starts back up on the original host. The error
> > returned is ‘rbd: unmap failed: (16) Device or resource busy’.
> > I’ve pasted the full vdsm log below.
> >
> > As far as I can tell this isn’t happening 100% of the time, and
> > seems to be more prevalent on busy guests.
> >
> > (Not sure if I should create a bug for this, so thought I’d start
> > here first)
> >
> > Thanks,
> >
> > Dan
> >
> > [full vdsm traceback trimmed; see the original message below]
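For what it's worth, errno 25 in the HDIO_GETGEO line decodes to ENOTTY
("Inappropriate ioctl for device"), i.e. multipathd is issuing a
disk-geometry ioctl that the rbd device doesn't support, which fits the
theory that these devices were never meant to be multipath paths. Two
quick checks on an affected host (a sketch, nothing vdsm-specific):

# python -c 'import errno, os; print(errno.errorcode[25] + ": " + os.strerror(25))'
ENOTTY: Inappropriate ioctl for device
# udevadm monitor --kernel --udev --subsystem-match=block

The second command, left running during a migration, shows the
block-layer add/change uevents that multipathd is reacting to.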
[ovirt-users] Re: Managed Block Storage: ceph detach_volume failing after migration
According to the resolution of [1] it's a multipathd/udev configuration
issue. It could be worth tracking this issue.

[1] https://tracker.ceph.com/issues/12763

On Wed, Sep 25, 2019 at 3:18 PM Dan Poltawski wrote:
> On ovirt 4.3.5 we are seeing various problems related to the rbd device
> staying mapped after a guest has been live migrated. This causes problems
> migrating the guest back, as well as rebooting the guest when it starts
> back up on the original host. The error returned is ‘rbd: unmap failed:
> (16) Device or resource busy’. I’ve pasted the full vdsm log below.
>
> As far as I can tell this isn’t happening 100% of the time, and seems to
> be more prevalent on busy guests.
>
> (Not sure if I should create a bug for this, so thought I’d start here
> first)
>
> Thanks,
>
> Dan
>
> [full vdsm traceback trimmed; see the original message below]
[ovirt-users] Re: Managed Block Storage: ceph detach_volume failing after migration
This might be a bug, can you share the full vdsm and engine logs?

On Wed, Sep 25, 2019 at 3:18 PM Dan Poltawski wrote:
>
> On ovirt 4.3.5 we are seeing various problems related to the rbd device
> staying mapped after a guest has been live migrated. This causes problems
> migrating the guest back, as well as rebooting the guest when it starts
> back up on the original host. The error returned is ‘rbd: unmap failed:
> (16) Device or resource busy’. I’ve pasted the full vdsm log below.
>
> As far as I can tell this isn’t happening 100% of the time, and seems to
> be more prevalent on busy guests.
>
> (Not sure if I should create a bug for this, so thought I’d start here
> first)
>
> Thanks,
>
> Dan
>
> Sep 24 19:26:18 mario vdsm[5485]: ERROR FINISH detach_volume error=Managed
> Volume Helper failed.: ('Error executing helper: Command
> [\'/usr/libexec/vdsm/managedvolume-helper\', \'detach\'] failed with rc=1
> out=\'\' err=\'oslo.privsep.daemon: Running privsep helper: [\\\'sudo\\\',
> \\\'privsep-helper\\\', \\\'--privsep_context\\\',
> \\\'os_brick.privileged.default\\\', \\\'--privsep_sock_path\\\',
> \\\'/tmp/tmptQzb10/privsep.sock\\\']\\noslo.privsep.daemon: Spawned new
> privsep daemon via rootwrap\\noslo.privsep.daemon: privsep daemon
> starting\\noslo.privsep.daemon: privsep process running with uid/gid:
> 0/0\\noslo.privsep.daemon: privsep process running with capabilities
> (eff/prm/inh): CAP_SYS_ADMIN/CAP_SYS_ADMIN/none\\noslo.privsep.daemon:
> privsep daemon running as pid 76076\\nTraceback (most recent call last):\\n
>   File "/usr/libexec/vdsm/managedvolume-helper", line 154, in <module>\\n
>     sys.exit(main(sys.argv[1:]))\\n
>   File "/usr/libexec/vdsm/managedvolume-helper", line 77, in main\\n
>     args.command(args)\\n
>   File "/usr/libexec/vdsm/managedvolume-helper", line 149, in detach\\n
>     ignore_errors=False)\\n
>   File "/usr/lib/python2.7/site-packages/vdsm/storage/nos_brick.py", line
> 121, in disconnect_volume\\n    run_as_root=True)\\n
>   File "/usr/lib/python2.7/site-packages/os_brick/executor.py", line 52,
> in _execute\\n    result = self.__execute(*args, **kwargs)\\n
>   File "/usr/lib/python2.7/site-packages/os_brick/privileged/rootwrap.py",
> line 169, in execute\\n    return execute_root(*cmd, **kwargs)\\n
>   File "/usr/lib/python2.7/site-packages/oslo_privsep/priv_context.py",
> line 241, in _wrap\\n    return self.channel.remote_call(name, args,
> kwargs)\\n
>   File "/usr/lib/python2.7/site-packages/oslo_privsep/daemon.py", line
> 203, in remote_call\\n    raise exc_type(*result[2])\\n
> oslo_concurrency.processutils.ProcessExecutionError: Unexpected error
> while running command.\\nCommand: rbd unmap
> /dev/rbd/rbd/volume-0e8c1056-45d6-4740-934d-eb07a9f73160 --conf
> /tmp/brickrbd_LCKezP --id ovirt --mon_host 172.16.10.13:3300 --mon_host
> 172.16.10.14:3300 --mon_host 172.16.10.12:6789\\nExit code: 16\\nStdout:
> u\\\'\\\'\\nStderr: u\\\'rbd: sysfs write failed\\nrbd: unmap failed: (16)
> Device or resource busy\\n\\\'\\n\'',)#012Traceback (most recent call
> last):#012  File "/usr/lib/python2.7/site-packages/vdsm/common/api.py",
> line 124, in method#012    ret = func(*args, **kwargs)#012  File
> "/usr/lib/python2.7/site-packages/vdsm/API.py", line 1766, in
> detach_volume#012    return managedvolume.detach_volume(vol_id)#012  File
> "/usr/lib/python2.7/site-packages/vdsm/storage/managedvolume.py", line 67,
> in wrapper#012    return func(*args, **kwargs)#012  File
> "/usr/lib/python2.7/site-packages/vdsm/storage/managedvolume.py", line
> 135, in detach_volume#012    run_helper("detach", vol_info)#012  File
> "/usr/lib/python2.7/site-packages/vdsm/storage/managedvolume.py", line
> 179, in run_helper#012    sub_cmd, cmd_input=cmd_input)#012  File
> "/usr/lib/python2.7/site-packages/vdsm/common/supervdsm.py", line 56, in
> __call__#012    return callMethod()#012  File
> "/usr/lib/python2.7/site-packages/vdsm/common/supervdsm.py", line 54, in
> <lambda>#012    **kwargs)#012  File "<string>", line 2, in
> managedvolume_run_helper#012  File
> "/usr/lib64/python2.7/multiprocessing/managers.py", line 773, in
> _callmethod#012    raise convert_to_error(kind,
> result)#012ManagedVolumeHelperFailed: Managed Volume Helper failed.:
> ('Error executing helper: Command
> [\'/usr/libexec/vdsm/managedvolume-helper\', \'detach\'] failed with rc=1
> out=\'\' err=\'oslo.privsep.daemon: Running privsep helper: [\\\'sudo\\\',
> \\\'privsep-helper\\\', \\\'--privsep_context\\\',
> \\\'os_brick.privileged.default\\\', \\\'--privsep_sock_path\\\',
> \\\'/tmp/tmptQzb10/privsep.sock\\\']\\noslo.privsep.daemon: Spawned new
> privsep daemon via rootwrap\\noslo.privsep.daemon: privsep daemon
> starting\\noslo.privsep.daemon: privsep process running with uid/gid:
> 0/0\\noslo.privsep.daemon: privsep process running with capabilities
> (eff/prm/inh): CAP_SYS_ADMIN/CAP_SYS_ADMIN/none\\noslo.privsep.daemon:
> privsep daemon running as pi [...]
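Until the root cause is fixed, the stuck mapping can usually be inspected
and cleared by hand on the affected host. A rough sketch, with rbd29
standing in for whichever device is stuck (plain rbd and sysfs, not vdsm
tooling):

# rbd showmapped
# ls /sys/class/block/rbd29/holders
# rbd unmap /dev/rbd29

If the holders directory lists a dm-* device, a multipath map is still
holding the rbd device open; flushing it with multipath -f <map> should
then let the plain rbd unmap succeed. As a last resort, rbd unmap also
accepts -o force, which detaches even with open handles at the risk of
I/O errors for anything still using the volume.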