[ovirt-users] Re: Managed Block Storage: ceph detach_volume failing after migration

2019-09-25 Thread Nir Soffer
On Wed, Sep 25, 2019 at 8:02 PM Dan Poltawski 
wrote:

> Hi,
>
> On Wed, 2019-09-25 at 15:42 +0300, Amit Bawer wrote:
> > According to the resolution of [1], it's a multipathd/udev configuration
> > issue. It could be worth tracking this issue.
> >
> > [1] https://tracker.ceph.com/issues/12763
>
> Thanks, that certainly looks like a smoking gun to me, in the logs:
>
> Sep 25 12:27:45 mario multipathd: rbd29: add path (uevent)
> Sep 25 12:27:45 mario multipathd: rbd29: spurious uevent, path already
> in pathvec
> Sep 25 12:27:45 mario multipathd: rbd29: HDIO_GETGEO failed with 25
> Sep 25 12:27:45 mario multipathd: rbd29: failed to get path uid
> Sep 25 12:27:45 mario multipathd: uevent trigger error
>

Please file an oVirt bug. Vdsm manages the multipath configuration, and I
don't think we have a blacklist for rbd devices.
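
To confirm, you can check whether anything on the host already blacklists
rbd (the paths below assume a default multipath setup; the drop-in
directory may not exist yet):

# grep -i rbd /etc/multipath.conf /etc/multipath.conf.d/*.conf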

If this is the issue, you can fix this locally by installing a multipath
drop-in configuration:

# cat /etc/multipath.conf.d/rbd.conf
blacklist {
   devnode "^(rbd)[0-9]*"
}

Vdsm should include this configuration in the /etc/multipath.conf file that
vdsm manages.
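
After adding the drop-in, multipathd needs to re-read its configuration for
the blacklist to take effect; something like the following should do it
(exact command syntax may vary between multipath-tools versions - older
versions need multipathd -k"show config" instead):

# systemctl reload multipathd
# multipathd show config | grep -B2 -A2 rbd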

Nir



>
>
> Dan
>
> >
> > On Wed, Sep 25, 2019 at 3:18 PM Dan Poltawski <
> > dan.poltaw...@tnp.net.uk> wrote:
> > > On oVirt 4.3.5 we are seeing various problems related to the rbd
> > > device staying mapped after a guest has been live migrated. This
> > > causes problems migrating the guest back, as well as rebooting the
> > > guest when it starts back up on the original host. The error
> > > returned is ‘rbd: unmap failed: (16) Device or resource busy’.
> > > I’ve pasted the full vdsm log below.
> > >
> > > As far as I can tell this isn’t happening 100% of the time, and
> > > seems to be more prevalent on busy guests.
> > >
> > > (Not sure if I should create a bug for this, so thought I’d start
> > > here first)
> > >
> > > Thanks,
> > >
> > > Dan
> > >
> > >
> > > Sep 24 19:26:18 mario vdsm[5485]: ERROR FINISH detach_volume
> > > error=Managed Volume Helper failed.: ('Error executing helper:
> > > Command [\'/usr/libexec/vdsm/managedvolume-helper\', \'detach\']
> > > failed with rc=1 out=\'\' err=\'oslo.privsep.daemon: Running
> > > privsep helper: [\\\'sudo\\\', \\\'privsep-helper\\\', \\\'
> > > --privsep_context\\\', \\\'os_brick.privileged.default\\\', \\\'
> > > --privsep_sock_path\\\',
> > > \\\'/tmp/tmptQzb10/privsep.sock\\\']\\noslo.privsep.daemon: Spawned
> > > new privsep daemon via rootwrap\\noslo.privsep.daemon: privsep
> > > daemon starting\\noslo.privsep.daemon: privsep process running with
> > > uid/gid: 0/0\\noslo.privsep.daemon: privsep process running with
> > > capabilities (eff/prm/inh):
> > > CAP_SYS_ADMIN/CAP_SYS_ADMIN/none\\noslo.privsep.daemon: privsep
> > > daemon running as pid 76076\\nTraceback (most recent call
> > > last):\\n  File "/usr/libexec/vdsm/managedvolume-helper", line 154,
> > > in \\nsys.exit(main(sys.argv[1:]))\\n  File
> > > "/usr/libexec/vdsm/managedvolume-helper", line 77, in main\\n
> > > args.command(args)\\n  File "/usr/libexec/vdsm/managedvolume-
> > > helper", line 149, in detach\\nignore_errors=False)\\n  File
> > > "/usr/lib/python2.7/site-packages/vdsm/storage/nos_brick.py", line
> > > 121, in disconnect_volume\\nrun_as_root=True)\\n  File
> > > "/usr/lib/python2.7/site-packages/os_brick/executor.py", line 52,
> > > in _execute\\nresult = self.__execute(*args, **kwargs)\\n  File
> > > "/usr/lib/python2.7/site-packages/os_brick/privileged/rootwrap.py",
> > > line 169, in execute\\nreturn execute_root(*cmd, **kwargs)\\n
> > > File "/usr/lib/python2.7/site-
> > > packages/oslo_privsep/priv_context.py",  line 241, in _wrap\\n
> > > return self.channel.remote_call(name, args, kwargs)\\n  File
> > > "/usr/lib/python2.7/site-packages/oslo_privsep/daemon.py", line
> > > 203, in remote_call\\nraise
> > > exc_type(*result[2])\\noslo_concurrency.processutils.ProcessExecuti
> > > onError: Unexpected error while running command.\\nCommand: rbd
> > > unmap /dev/rbd/rbd/volume-0e8c1056-45d6-4740-934d-eb07a9f73160 --
> > > conf /tmp/brickrbd_LCKezP --id ovirt --mon_host 172.16.10.13:3300
> > > --mon_host 172.16.10.14:3300 --mon_host 172.16.10.12:6789\\nExit
> > > code: 16\\nStdout: u\\\'\\\'\\nStderr: u\\\'rbd: sysfs write
> > > failed\\nrbd: unmap failed: (16) Device or resource
> > > busy\\n\\\'\\n\'',)#012Traceback (most recent call last):#012
> > > File "/usr/lib/python2.7/site-packages/vdsm/common/api.py", line
> > > 124, in method#012ret = func(*args, **kwargs)#012  File
> > > "/usr/lib/python2.7/site-packages/vdsm/API.py", line 1766, in
> > > detach_volume#012return
> > > managedvolume.detach_volume(vol_id)#012  File
> > > "/usr/lib/python2.7/site-packages/vdsm/storage/managedvolume.py",
> > > line 67, in wrapper#012return func(*args, **kwargs)#012  File
> > > "/usr/lib/python2.7/site-packages/vdsm/storage/managedvolume.py",
> > > line 135, in detach_volume#012run_helper("detach",
> > > vol_info)#012  File "/usr/lib/python2.7/site-
> > > packages/vdsm/st

[ovirt-users] Re: Managed Block Storage: ceph detach_volume failing after migration

2019-09-25 Thread Dan Poltawski
Hi,

On Wed, 2019-09-25 at 15:42 +0300, Amit Bawer wrote:
> According to the resolution of [1], it's a multipathd/udev configuration
> issue. It could be worth tracking this issue.
>
> [1] https://tracker.ceph.com/issues/12763

Thanks, that certainly looks like a smoking gun to me, in the logs:

Sep 25 12:27:45 mario multipathd: rbd29: add path (uevent)
Sep 25 12:27:45 mario multipathd: rbd29: spurious uevent, path already
in pathvec
Sep 25 12:27:45 mario multipathd: rbd29: HDIO_GETGEO failed with 25
Sep 25 12:27:45 mario multipathd: rbd29: failed to get path uid
Sep 25 12:27:45 mario multipathd: uevent trigger error
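
A quick way to double-check whether multipath is actually claiming the rbd
devices on the host (guessing at the exact commands here) would be:

# multipath -ll | grep -i rbd
# dmsetup table | grep -i rbd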


Dan

>
> On Wed, Sep 25, 2019 at 3:18 PM Dan Poltawski <
> dan.poltaw...@tnp.net.uk> wrote:
> > On oVirt 4.3.5 we are seeing various problems related to the rbd
> > device staying mapped after a guest has been live migrated. This
> > causes problems migrating the guest back, as well as rebooting the
> > guest when it starts back up on the original host. The error
> > returned is ‘rbd: unmap failed: (16) Device or resource busy’.
> > I’ve pasted the full vdsm log below.
> >
> > As far as I can tell this isn’t happening 100% of the time, and
> > seems to be more prevalent on busy guests.
> >
> > (Not sure if I should create a bug for this, so thought I’d start
> > here first)
> >
> > Thanks,
> >
> > Dan
> >
> >
> > Sep 24 19:26:18 mario vdsm[5485]: ERROR FINISH detach_volume
> > error=Managed Volume Helper failed.: ('Error executing helper:
> > Command [\'/usr/libexec/vdsm/managedvolume-helper\', \'detach\']
> > failed with rc=1 out=\'\' err=\'oslo.privsep.daemon: Running
> > privsep helper: [\\\'sudo\\\', \\\'privsep-helper\\\', \\\'
> > --privsep_context\\\', \\\'os_brick.privileged.default\\\', \\\'
> > --privsep_sock_path\\\',
> > \\\'/tmp/tmptQzb10/privsep.sock\\\']\\noslo.privsep.daemon: Spawned
> > new privsep daemon via rootwrap\\noslo.privsep.daemon: privsep
> > daemon starting\\noslo.privsep.daemon: privsep process running with
> > uid/gid: 0/0\\noslo.privsep.daemon: privsep process running with
> > capabilities (eff/prm/inh):
> > CAP_SYS_ADMIN/CAP_SYS_ADMIN/none\\noslo.privsep.daemon: privsep
> > daemon running as pid 76076\\nTraceback (most recent call
> > last):\\n  File "/usr/libexec/vdsm/managedvolume-helper", line 154,
> > in \\nsys.exit(main(sys.argv[1:]))\\n  File
> > "/usr/libexec/vdsm/managedvolume-helper", line 77, in main\\n
> > args.command(args)\\n  File "/usr/libexec/vdsm/managedvolume-
> > helper", line 149, in detach\\nignore_errors=False)\\n  File
> > "/usr/lib/python2.7/site-packages/vdsm/storage/nos_brick.py", line
> > 121, in disconnect_volume\\nrun_as_root=True)\\n  File
> > "/usr/lib/python2.7/site-packages/os_brick/executor.py", line 52,
> > in _execute\\nresult = self.__execute(*args, **kwargs)\\n  File
> > "/usr/lib/python2.7/site-packages/os_brick/privileged/rootwrap.py",
> > line 169, in execute\\nreturn execute_root(*cmd, **kwargs)\\n
> > File "/usr/lib/python2.7/site-
> > packages/oslo_privsep/priv_context.py",  line 241, in _wrap\\n
> > return self.channel.remote_call(name, args, kwargs)\\n  File
> > "/usr/lib/python2.7/site-packages/oslo_privsep/daemon.py", line
> > 203, in remote_call\\nraise
> > exc_type(*result[2])\\noslo_concurrency.processutils.ProcessExecuti
> > onError: Unexpected error while running command.\\nCommand: rbd
> > unmap /dev/rbd/rbd/volume-0e8c1056-45d6-4740-934d-eb07a9f73160 --
> > conf /tmp/brickrbd_LCKezP --id ovirt --mon_host 172.16.10.13:3300
> > --mon_host 172.16.10.14:3300 --mon_host 172.16.10.12:6789\\nExit
> > code: 16\\nStdout: u\\\'\\\'\\nStderr: u\\\'rbd: sysfs write
> > failed\\nrbd: unmap failed: (16) Device or resource
> > busy\\n\\\'\\n\'',)#012Traceback (most recent call last):#012
> > File "/usr/lib/python2.7/site-packages/vdsm/common/api.py", line
> > 124, in method#012ret = func(*args, **kwargs)#012  File
> > "/usr/lib/python2.7/site-packages/vdsm/API.py", line 1766, in
> > detach_volume#012return
> > managedvolume.detach_volume(vol_id)#012  File
> > "/usr/lib/python2.7/site-packages/vdsm/storage/managedvolume.py",
> > line 67, in wrapper#012return func(*args, **kwargs)#012  File
> > "/usr/lib/python2.7/site-packages/vdsm/storage/managedvolume.py",
> > line 135, in detach_volume#012run_helper("detach",
> > vol_info)#012  File "/usr/lib/python2.7/site-
> > packages/vdsm/storage/managedvolume.py", line 179, in
> > run_helper#012sub_cmd, cmd_input=cmd_input)#012  File
> > "/usr/lib/python2.7/site-packages/vdsm/common/supervdsm.py", line
> > 56, in __call__#012return callMethod()#012  File
> > "/usr/lib/python2.7/site-packages/vdsm/common/supervdsm.py", line
> > 54, in #012**kwargs)#012  File "", line 2, in
> > managedvolume_run_helper#012  File
> > "/usr/lib64/python2.7/multiprocessing/managers.py", line 773, in
> > _callmethod#012raise convert_to_error(kind,
> > result)#012ManagedVolumeHelperFailed: Managed Volume Helper
> > failed.: ('Error executing helper: Co

[ovirt-users] Re: Managed Block Storage: ceph detach_volume failing after migration

2019-09-25 Thread Amit Bawer
According to the resolution of [1], it's a multipathd/udev configuration
issue. It could be worth tracking this issue.

[1] https://tracker.ceph.com/issues/12763

On Wed, Sep 25, 2019 at 3:18 PM Dan Poltawski 
wrote:

> On oVirt 4.3.5 we are seeing various problems related to the rbd device
> staying mapped after a guest has been live migrated. This causes problems
> migrating the guest back, as well as rebooting the guest when it starts
> back up on the original host. The error returned is ‘rbd: unmap failed:
> (16) Device or resource busy’. I’ve pasted the full vdsm log below.
>
>
>
> As far as I can tell this isn’t happening 100% of the time, and seems to
> be more prevalent on busy guests.
>
>
>
> (Not sure if I should create a bug for this, so thought I’d start here
> first)
>
>
>
> Thanks,
>
>
>
> Dan
>
>
>
>
>
> Sep 24 19:26:18 mario vdsm[5485]: ERROR FINISH detach_volume error=Managed
> Volume Helper failed.: ('Error executing helper: Command
> [\'/usr/libexec/vdsm/managedvolume-helper\', \'detach\'] failed with rc=1
> out=\'\' err=\'oslo.privsep.daemon: Running privsep helper: [\\\'sudo\\\',
> \\\'privsep-helper\\\', \\\'--privsep_context\\\',
> \\\'os_brick.privileged.default\\\', \\\'--privsep_sock_path\\\',
> \\\'/tmp/tmptQzb10/privsep.sock\\\']\\noslo.privsep.daemon: Spawned new
> privsep daemon via rootwrap\\noslo.privsep.daemon: privsep daemon
> starting\\noslo.privsep.daemon: privsep process running with uid/gid:
> 0/0\\noslo.privsep.daemon: privsep process running with capabilities
> (eff/prm/inh): CAP_SYS_ADMIN/CAP_SYS_ADMIN/none\\noslo.privsep.daemon:
> privsep daemon running as pid 76076\\nTraceback (most recent call
> last):\\n  File "/usr/libexec/vdsm/managedvolume-helper", line 154, in
> \\nsys.exit(main(sys.argv[1:]))\\n  File
> "/usr/libexec/vdsm/managedvolume-helper", line 77, in main\\n
> args.command(args)\\n  File "/usr/libexec/vdsm/managedvolume-helper", line
> 149, in detach\\nignore_errors=False)\\n  File
> "/usr/lib/python2.7/site-packages/vdsm/storage/nos_brick.py", line 121, in
> disconnect_volume\\nrun_as_root=True)\\n  File
> "/usr/lib/python2.7/site-packages/os_brick/executor.py", line 52, in
> _execute\\nresult = self.__execute(*args, **kwargs)\\n  File
> "/usr/lib/python2.7/site-packages/os_brick/privileged/rootwrap.py", line
> 169, in execute\\nreturn execute_root(*cmd, **kwargs)\\n  File
> "/usr/lib/python2.7/site-packages/oslo_privsep/priv_context.py", line 241,
> in _wrap\\nreturn self.channel.remote_call(name, args, kwargs)\\n  File
> "/usr/lib/python2.7/site-packages/oslo_privsep/daemon.py", line 203, in
> remote_call\\nraise
> exc_type(*result[2])\\noslo_concurrency.processutils.ProcessExecutionError:
> Unexpected error while running command.\\nCommand: rbd unmap
> /dev/rbd/rbd/volume-0e8c1056-45d6-4740-934d-eb07a9f73160 --conf
> /tmp/brickrbd_LCKezP --id ovirt --mon_host 172.16.10.13:3300 --mon_host
> 172.16.10.14:3300 --mon_host 172.16.10.12:6789\\nExit code: 16\\nStdout:
> u\\\'\\\'\\nStderr: u\\\'rbd: sysfs write failed\\nrbd: unmap failed:
> (16) Device or resource busy\\n\\\'\\n\'',)#012Traceback (most recent
> call last):#012  File
> "/usr/lib/python2.7/site-packages/vdsm/common/api.py", line 124, in
> method#012ret = func(*args, **kwargs)#012  File
> "/usr/lib/python2.7/site-packages/vdsm/API.py", line 1766, in
> detach_volume#012return managedvolume.detach_volume(vol_id)#012  File
> "/usr/lib/python2.7/site-packages/vdsm/storage/managedvolume.py", line 67,
> in wrapper#012return func(*args, **kwargs)#012  File
> "/usr/lib/python2.7/site-packages/vdsm/storage/managedvolume.py", line 135,
> in detach_volume#012run_helper("detach", vol_info)#012  File
> "/usr/lib/python2.7/site-packages/vdsm/storage/managedvolume.py", line 179,
> in run_helper#012sub_cmd, cmd_input=cmd_input)#012  File
> "/usr/lib/python2.7/site-packages/vdsm/common/supervdsm.py", line 56, in
> __call__#012return callMethod()#012  File
> "/usr/lib/python2.7/site-packages/vdsm/common/supervdsm.py", line 54, in
> #012**kwargs)#012  File "", line 2, in
> managedvolume_run_helper#012  File
> "/usr/lib64/python2.7/multiprocessing/managers.py", line 773, in
> _callmethod#012raise convert_to_error(kind,
> result)#012ManagedVolumeHelperFailed: Managed Volume Helper failed.:
> ('Error executing helper: Command
> [\'/usr/libexec/vdsm/managedvolume-helper\', \'detach\'] failed with rc=1
> out=\'\' err=\'oslo.privsep.daemon: Running privsep helper: [\\\'sudo\\\',
> \\\'privsep-helper\\\', \\\'--privsep_context\\\',
> \\\'os_brick.privileged.default\\\', \\\'--privsep_sock_path\\\',
> \\\'/tmp/tmptQzb10/privsep.sock\\\']\\noslo.privsep.daemon: Spawned new
> privsep daemon via rootwrap\\noslo.privsep.daemon: privsep daemon
> starting\\noslo.privsep.daemon: privsep process running with uid/gid:
> 0/0\\noslo.privsep.daemon: privsep process running with capabilities
> (eff/prm/inh): CAP_SYS_ADMIN/CAP_SYS_ADMIN/none\\noslo.privsep.daemon:
> privsep 

[ovirt-users] Re: Managed Block Storage: ceph detach_volume failing after migration

2019-09-25 Thread Benny Zlotnik
This might be a bug; can you share the full vdsm and engine logs?

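(Assuming a default install, those usually live at:

/var/log/vdsm/vdsm.log             on each host involved in the migration
/var/log/ovirt-engine/engine.log   on the engine machine)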

On Wed, Sep 25, 2019 at 3:18 PM Dan Poltawski  wrote:
>
> On oVirt 4.3.5 we are seeing various problems related to the rbd device
> staying mapped after a guest has been live migrated. This causes problems 
> migrating the guest back, as well as rebooting the guest when it starts back 
> up on the original host. The error returned is ‘rbd: unmap failed: (16)
> Device or resource busy’. I’ve pasted the full vdsm log below.
>
>
>
> As far as I can tell this isn’t happening 100% of the time, and seems to be 
> more prevalent on busy guests.
>
>
>
> (Not sure if I should create a bug for this, so thought I’d start here first)
>
>
>
> Thanks,
>
>
>
> Dan
>
>
>
>
>
> Sep 24 19:26:18 mario vdsm[5485]: ERROR FINISH detach_volume error=Managed 
> Volume Helper failed.: ('Error executing helper: Command 
> [\'/usr/libexec/vdsm/managedvolume-helper\', \'detach\'] failed with rc=1 
> out=\'\' err=\'oslo.privsep.daemon: Running privsep helper: [\\\'sudo\\\', 
> \\\'privsep-helper\\\', \\\'--privsep_context\\\', 
> \\\'os_brick.privileged.default\\\', \\\'--privsep_sock_path\\\', 
> \\\'/tmp/tmptQzb10/privsep.sock\\\']\\noslo.privsep.daemon: Spawned new 
> privsep daemon via rootwrap\\noslo.privsep.daemon: privsep daemon 
> starting\\noslo.privsep.daemon: privsep process running with uid/gid: 
> 0/0\\noslo.privsep.daemon: privsep process running with capabilities 
> (eff/prm/inh): CAP_SYS_ADMIN/CAP_SYS_ADMIN/none\\noslo.privsep.daemon: 
> privsep daemon running as pid 76076\\nTraceback (most recent call last):\\n  
> File "/usr/libexec/vdsm/managedvolume-helper", line 154, in \\n
> sys.exit(main(sys.argv[1:]))\\n  File 
> "/usr/libexec/vdsm/managedvolume-helper", line 77, in main\\n
> args.command(args)\\n  File "/usr/libexec/vdsm/managedvolume-helper", line 
> 149, in detach\\nignore_errors=False)\\n  File 
> "/usr/lib/python2.7/site-packages/vdsm/storage/nos_brick.py", line 121, in 
> disconnect_volume\\nrun_as_root=True)\\n  File 
> "/usr/lib/python2.7/site-packages/os_brick/executor.py", line 52, in 
> _execute\\nresult = self.__execute(*args, **kwargs)\\n  File 
> "/usr/lib/python2.7/site-packages/os_brick/privileged/rootwrap.py", line 169, 
> in execute\\nreturn execute_root(*cmd, **kwargs)\\n  File 
> "/usr/lib/python2.7/site-packages/oslo_privsep/priv_context.py", line 241, in 
> _wrap\\nreturn self.channel.remote_call(name, args, kwargs)\\n  File 
> "/usr/lib/python2.7/site-packages/oslo_privsep/daemon.py", line 203, in 
> remote_call\\nraise 
> exc_type(*result[2])\\noslo_concurrency.processutils.ProcessExecutionError: 
> Unexpected error while running command.\\nCommand: rbd unmap 
> /dev/rbd/rbd/volume-0e8c1056-45d6-4740-934d-eb07a9f73160 --conf 
> /tmp/brickrbd_LCKezP --id ovirt --mon_host 172.16.10.13:3300 --mon_host 
> 172.16.10.14:3300 --mon_host 172.16.10.12:6789\\nExit code: 16\\nStdout: 
> u\\\'\\\'\\nStderr: u\\\'rbd: sysfs write failed\\nrbd: unmap failed: (16)
> Device or resource busy\\n\\\'\\n\'',)#012Traceback (most recent call
> last):#012  File "/usr/lib/python2.7/site-packages/vdsm/common/api.py", line 
> 124, in method#012ret = func(*args, **kwargs)#012  File 
> "/usr/lib/python2.7/site-packages/vdsm/API.py", line 1766, in 
> detach_volume#012return managedvolume.detach_volume(vol_id)#012  File 
> "/usr/lib/python2.7/site-packages/vdsm/storage/managedvolume.py", line 67, in 
> wrapper#012return func(*args, **kwargs)#012  File 
> "/usr/lib/python2.7/site-packages/vdsm/storage/managedvolume.py", line 135, 
> in detach_volume#012run_helper("detach", vol_info)#012  File 
> "/usr/lib/python2.7/site-packages/vdsm/storage/managedvolume.py", line 179, 
> in run_helper#012sub_cmd, cmd_input=cmd_input)#012  File 
> "/usr/lib/python2.7/site-packages/vdsm/common/supervdsm.py", line 56, in 
> __call__#012return callMethod()#012  File 
> "/usr/lib/python2.7/site-packages/vdsm/common/supervdsm.py", line 54, in 
> #012**kwargs)#012  File "", line 2, in 
> managedvolume_run_helper#012  File 
> "/usr/lib64/python2.7/multiprocessing/managers.py", line 773, in 
> _callmethod#012raise convert_to_error(kind, 
> result)#012ManagedVolumeHelperFailed: Managed Volume Helper failed.: ('Error 
> executing helper: Command [\'/usr/libexec/vdsm/managedvolume-helper\', 
> \'detach\'] failed with rc=1 out=\'\' err=\'oslo.privsep.daemon: Running 
> privsep helper: [\\\'sudo\\\', \\\'privsep-helper\\\', 
> \\\'--privsep_context\\\', \\\'os_brick.privileged.default\\\', 
> \\\'--privsep_sock_path\\\', 
> \\\'/tmp/tmptQzb10/privsep.sock\\\']\\noslo.privsep.daemon: Spawned new 
> privsep daemon via rootwrap\\noslo.privsep.daemon: privsep daemon 
> starting\\noslo.privsep.daemon: privsep process running with uid/gid: 
> 0/0\\noslo.privsep.daemon: privsep process running with capabilities 
> (eff/prm/inh): CAP_SYS_ADMIN/CAP_SYS_ADMIN/none\\noslo.privsep.daemon: 
> privsep daemon running as pi