Re: [ovirt-users] moving disk failed.. remained locked
On Thu, Feb 23, 2017 at 08:11:50PM +0200, Nir Soffer wrote:
> > [g.cecchi@ovmsrv05 ~]$ sudo sanlock client renewal -s
> > 922b5269-ab56-4c4d-838f-49d33427e2ab
> > timestamp=1207533 read_ms=2 write_ms=0 next_timeouts=0 next_errors=0
> > timestamp=1207554 read_ms=2 write_ms=0 next_timeouts=0 next_errors=0
> > ...
> > timestamp=1211163 read_ms=2 write_ms=0 next_timeouts=0 next_errors=0
> > timestamp=1211183 read_ms=2 write_ms=0 next_timeouts=0 next_errors=0
> > timestamp=1211204 read_ms=2 write_ms=0 next_timeouts=0 next_errors=0
> >
> > How do I translate this output above? What would be the difference in
> > case of problems?
>
> David, can you explain this output?
>
> read_ms and write_ms look obvious, but next_timeouts and next_errors are
> a mystery to me.

Sorry for copying, but I think I explained it better then than I could now!
(I need to include this somewhere in the man page.)

commit 6313c709722b3ba63234a75d1651a160bf1728ee
Author: David Teigland
Date:   Wed Mar 9 11:58:21 2016 -0600

    sanlock: renewal history

    Keep a history of read and write latencies for a lockspace. The times
    are measured for io in delta lease renewal (each delta lease renewal
    includes one read and one write). For each successful renewal, a
    record is saved that includes:
    - the timestamp written in the delta lease by the renewal
    - the time in milliseconds taken by the delta lease read
    - the time in milliseconds taken by the delta lease write

    Also counted and recorded are the number of io timeouts and other io
    errors that occur between successful renewals.

    Two consecutive successful renewals would be recorded as:

    timestamp=5332 read_ms=482 write_ms=5525 next_timeouts=0 next_errors=0
    timestamp=5353 read_ms=99 write_ms=3161 next_timeouts=0 next_errors=0

    timestamp is the value written into the delta lease during that
    renewal.

    read_ms/write_ms are the milliseconds taken for the renewal
    read/write ios.

    next_timeouts are the number of io timeouts that occurred after the
    renewal recorded on that line and before the next successful renewal
    on the following line.

    next_errors are the number of io errors (not timeouts) that occurred
    after the renewal recorded on that line and before the next
    successful renewal on the following line.

The command 'sanlock client renewal -s lockspace_name' reports the full
history of renewals saved by sanlock, which by default is 180 records,
about 1 hour of history when using a 20 second renewal interval for a 10
second io timeout.

(A --summary option could be added to calculate and report averages over a
selected period of the history.)

___ Users mailing list Users@ovirt.org http://lists.ovirt.org/mailman/listinfo/users
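The record format above lends itself to the --summary idea mentioned at the end. A minimal sketch of what such a summary could compute, as a standalone helper (this is not part of sanlock; the function name is invented, only the field names come from the output shown above):

```python
# Hypothetical helper: summarize the renewal history printed by
# `sanlock client renewal -s <lockspace>` - roughly what a --summary
# option might report (average latencies, total timeouts/errors).

def summarize_renewals(lines):
    # Each line is "key=value" pairs separated by spaces.
    records = [
        {k: int(v) for k, v in (kv.split("=") for kv in line.split())}
        for line in lines if line.strip()
    ]
    if not records:
        return None
    n = len(records)
    return {
        "renewals": n,
        "avg_read_ms": sum(r["read_ms"] for r in records) / n,
        "avg_write_ms": sum(r["write_ms"] for r in records) / n,
        "io_timeouts": sum(r["next_timeouts"] for r in records),
        "io_errors": sum(r["next_errors"] for r in records),
    }

history = [
    "timestamp=5332 read_ms=482 write_ms=5525 next_timeouts=0 next_errors=0",
    "timestamp=5353 read_ms=99 write_ms=3161 next_timeouts=1 next_errors=0",
]
print(summarize_renewals(history))
```

Feeding it a healthy history (like the read_ms=2 lines above) gives near-zero averages and zero timeouts; a struggling domain shows up as large averages or nonzero io_timeouts/io_errors.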
Re: [ovirt-users] moving disk failed.. remained locked
On Wed, Feb 22, 2017 at 12:20 PM, Gianluca Cecchi wrote:
> On Wed, Feb 22, 2017 at 10:59 AM, Nir Soffer wrote:
>> Lesson, use only storage without problems ;-)
>
> hopefully... ;-)
>
>> >> Can you share the output of:
>> >>
>> >> sanlock client renewal -s 900b1853-e192-4661-a0f9-7c7c396f6f49
>> >
>> > No, the storage domain has been removed
>>
>> Next time when you have storage issues, please remember to grab
>> the output of this command.
>>
>> Nir
>
> For example, on a currently active storage domain I get:
>
> [g.cecchi@ovmsrv05 ~]$ sudo sanlock client renewal -s
> 922b5269-ab56-4c4d-838f-49d33427e2ab
> timestamp=1207533 read_ms=2 write_ms=0 next_timeouts=0 next_errors=0
> timestamp=1207554 read_ms=2 write_ms=0 next_timeouts=0 next_errors=0
> ...
> timestamp=1211163 read_ms=2 write_ms=0 next_timeouts=0 next_errors=0
> timestamp=1211183 read_ms=2 write_ms=0 next_timeouts=0 next_errors=0
> timestamp=1211204 read_ms=2 write_ms=0 next_timeouts=0 next_errors=0
>
> How do I translate this output above? What would be the difference in case
> of problems?

David, can you explain this output?

read_ms and write_ms look obvious, but next_timeouts and next_errors are
a mystery to me.

Nir
Re: [ovirt-users] moving disk failed.. remained locked
On Wed, Feb 22, 2017 at 10:59 AM, Nir Soffer wrote: > > > Lesson, use only storage without problems ;-) > hopefully... ;-) > >> Can you share the output of: > >> > >> sanlock client renewal -s 900b1853-e192-4661-a0f9-7c7c396f6f49 > > > > > > No, the storage domain has been removed > > Next time when you have storage issues, please remember to grab > the output of this command. > > Nir > For example, on a currently active storage domain I get: [g.cecchi@ovmsrv05 ~]$ sudo sanlock client renewal -s 922b5269-ab56-4c4d-838f-49d33427e2ab timestamp=1207533 read_ms=2 write_ms=0 next_timeouts=0 next_errors=0 timestamp=1207554 read_ms=2 write_ms=0 next_timeouts=0 next_errors=0 ... timestamp=1211163 read_ms=2 write_ms=0 next_timeouts=0 next_errors=0 timestamp=1211183 read_ms=2 write_ms=0 next_timeouts=0 next_errors=0 timestamp=1211204 read_ms=2 write_ms=0 next_timeouts=0 next_errors=0 How do I translate this output above? What would be the difference in case of problems? ___ Users mailing list Users@ovirt.org http://lists.ovirt.org/mailman/listinfo/users
Re: [ovirt-users] moving disk failed.. remained locked
On Wed, Feb 22, 2017 at 11:45 AM, Gianluca Cecchi wrote:
> On Wed, Feb 22, 2017 at 9:56 AM, Nir Soffer wrote:
>> Gianluca, what is domain 900b1853-e192-4661-a0f9-7c7c396f6f49?
>> is this the domain you are migrating to at the same time?
>
> That was the id of the storage domain created on the LUN with problems at
> storage array level.

This explains sanlock issues with this domain.

> It only contained one disk of a VM. I was able to previously move the
> other 2 disks I had on it to another storage domain
>
> The disk was a data disk of a VM; its system disk was on another storage
> domain without problems
>
> The order of my operations yesterday was:
> - try move disk to another storage domain -> failure in auto snapshot
> - try snapshot of VM selecting both disks --> failure

The first step in moving a disk to another domain while the VM is online is
creating a snapshot on the old storage. Then we start mirroring the active
(empty) snapshot to the destination storage domain. Then we copy the rest of
the chain (read-only) to the destination storage domain. Finally we switch
the active layer to the snapshot on the destination storage domain, and
delete the old chain on the source domain.

If the source storage is broken you have to stop the VM to move the disk.
This can also fail if we cannot read the disk from this storage.
Lesson, use only storage without problems ;-) > - try snapshot of VM selecting only the system disk (the good one) --> ok > and also snapshot deletion ok > - try snapshot of VM selecting only the data disk --> failure > - hot add disk (in a good storage domain) to the VM --> OK > - try pvmove at VM OS level from problematic disk to new disk --> failure: > VM paused at 47% of pvmove and not able to continue > - power off VM --> OK > - remove disk from VM and delete --> OK > > Only at this point, with storage domain empty, I started to work on storage > domain itself, putting it to maintenance and removing it without problems; > and then the related LUN removal at host level with the notes described in > other thread > >> >> >> Can you share the output of: >> >> sanlock client renewal -s 900b1853-e192-4661-a0f9-7c7c396f6f49 > > > No, the storage domain has been removed Next time when you have storage issues, please remember to grab the output of this command. Nir ___ Users mailing list Users@ovirt.org http://lists.ovirt.org/mailman/listinfo/users
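The live storage migration sequence described above (snapshot, mirror, copy the read-only chain, pivot) can be sketched as a toy model. This is illustrative only, not vdsm code; the volume names and the function are invented:

```python
# Toy model of live storage migration. Volume chains are plain lists,
# base volume first; this illustrates the ordering of the steps, not
# vdsm's actual implementation.

def live_migrate_disk(src_chain):
    # 1. Auto-generated snapshot: the old top becomes read-only and a
    #    new, empty active volume receives all new writes. (This is the
    #    step that failed here, on sanlock lease init for the new volume.)
    src_chain = src_chain + ["auto-snapshot"]
    # 2. Mirror the active (empty) snapshot to the destination domain
    #    while the VM keeps running.
    dst_chain = [src_chain[-1]]
    # 3. Copy the rest of the chain (read-only) to the destination.
    dst_chain = src_chain[:-1] + dst_chain
    # 4. Switch the VM's active layer to the destination and delete the
    #    old chain on the source domain.
    src_chain = []
    return src_chain, dst_chain

src, dst = live_migrate_disk(["base", "snap1"])
print("source:", src, "destination:", dst)
```

The model makes the failure mode visible: if step 1 cannot complete (snapshot creation on the source), the whole migration aborts before anything is written to the destination, which matches the events seen in the engine log.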
Re: [ovirt-users] moving disk failed.. remained locked
On Wed, Feb 22, 2017 at 9:56 AM, Nir Soffer wrote:
> Gianluca, what is domain 900b1853-e192-4661-a0f9-7c7c396f6f49?
> is this the domain you are migrating to at the same time?

That was the id of the storage domain created on the LUN with problems at
storage array level.
It only contained one disk of a VM. I was able to previously move the other
2 disks I had on it to another storage domain

The disk was a data disk of a VM; its system disk was on another storage
domain without problems

The order of my operations yesterday was:
- try move disk to another storage domain -> failure in auto snapshot
- try snapshot of VM selecting both disks --> failure
- try snapshot of VM selecting only the system disk (the good one) --> ok
and also snapshot deletion ok
- try snapshot of VM selecting only the data disk --> failure
- hot add disk (in a good storage domain) to the VM --> OK
- try pvmove at VM OS level from problematic disk to new disk --> failure:
VM paused at 47% of pvmove and not able to continue
- power off VM --> OK
- remove disk from VM and delete --> OK

Only at this point, with the storage domain empty, I started to work on the
storage domain itself, putting it into maintenance and removing it without
problems; and then the related LUN removal at host level with the notes
described in the other thread.

> Can you share the output of:
>
> sanlock client renewal -s 900b1853-e192-4661-a0f9-7c7c396f6f49

No, the storage domain has been removed

Gianluca
Re: [ovirt-users] moving disk failed.. remained locked
On Wed, Feb 22, 2017 at 10:32 AM, Nir Soffer wrote: > On Wed, Feb 22, 2017 at 10:31 AM, Nir Soffer wrote: >> On Mon, Feb 20, 2017 at 4:49 PM, Gianluca Cecchi >> wrote: >>> Hello, >>> I'm trying to move a disk from one storage domain A to another B in oVirt >>> 4.1 >>> The corresponding VM is powered on in the mean time >>> >>> When executing the action, there was already in place a disk move from >>> storage domain C to A (this move was for a disk of a powered off VM and then >>> completed ok) >>> I got this in events of webadmin gui for the failed move A -> B: >>> >>> Feb 20, 2017 2:42:00 PM Failed to complete snapshot 'Auto-generated for Live >>> Storage Migration' creation for VM 'dbatest6'. >>> Feb 20, 2017 2:40:51 PM VDSM ovmsrv06 command HSMGetAllTasksStatusesVDS >>> failed: Error creating a new volume >>> Feb 20, 2017 2:40:51 PM Snapshot 'Auto-generated for Live Storage Migration' >>> creation for VM 'dbatest6' was initiated by admin@internal-authz. >>> >>> >>> And in relevant vdsm.log of referred host ovmsrv06 >>> >>> 2017-02-20 14:41:44,899 ERROR (tasks/8) [storage.Volume] Unexpected error >>> (volume:1087) >>> Traceback (most recent call last): >>> File "/usr/share/vdsm/storage/volume.py", line 1081, in create >>> cls.newVolumeLease(metaId, sdUUID, volUUID) >>> File "/usr/share/vdsm/storage/volume.py", line 1361, in newVolumeLease >>> return cls.manifestClass.newVolumeLease(metaId, sdUUID, volUUID) >>> File "/usr/share/vdsm/storage/blockVolume.py", line 310, in newVolumeLease >>> sanlock.init_resource(sdUUID, volUUID, [(leasePath, leaseOffset)]) >>> SanlockException: (-202, 'Sanlock resource init failure', 'Sanlock >>> exception') >> >> This means that sanlock could not initialize a lease in the new volume >> created >> for the snapshot. David, looking in sanlock log - we don't see any error matching this failure, but the domain 900b1853-e192-4661-a0f9-7c7c396f6f49 has renewal errors. 
I guess because sanlock_init_resource is implemented in the library, not
going through the sanlock daemon?

2017-02-20 14:30:09+0100 1050804 [11738]: 900b1853 aio timeout RD 0x7f41d8c0:0x7f41d8d0:0x7f41e2afa000 ioto 10 to_count 1
2017-02-20 14:30:09+0100 1050804 [11738]: s3 delta_renew read timeout 10 sec offset 0 /dev/900b1853-e192-4661-a0f9-7c7c396f6f49/ids
2017-02-20 14:30:09+0100 1050804 [11738]: s3 renewal error -202 delta_length 10 last_success 1050773
2017-02-20 14:30:11+0100 1050806 [11738]: 900b1853 aio collect RD 0x7f41d8c0:0x7f41d8d0:0x7f41e2afa000 result 1048576:0 match reap
2017-02-20 14:35:58+0100 1051153 [11738]: 900b1853 aio timeout RD 0x7f41d8c0:0x7f41d8d0:0x7f41e2afa000 ioto 10 to_count 2
2017-02-20 14:35:58+0100 1051153 [11738]: s3 delta_renew read timeout 10 sec offset 0 /dev/900b1853-e192-4661-a0f9-7c7c396f6f49/ids
2017-02-20 14:35:58+0100 1051153 [11738]: s3 renewal error -202 delta_length 10 last_success 1051122
2017-02-20 14:36:01+0100 1051156 [11738]: 900b1853 aio collect RD 0x7f41d8c0:0x7f41d8d0:0x7f41e2afa000 result 1048576:0 match reap
2017-02-20 14:44:36+0100 1051671 [11738]: 900b1853 aio timeout RD 0x7f41d8c0:0x7f41d8d0:0x7f41e2afa000 ioto 10 to_count 3
2017-02-20 14:44:36+0100 1051671 [11738]: s3 delta_renew read timeout 10 sec offset 0 /dev/900b1853-e192-4661-a0f9-7c7c396f6f49/ids
2017-02-20 14:44:36+0100 1051671 [11738]: s3 renewal error -202 delta_length 10 last_success 1051641
2017-02-20 14:44:37+0100 1051672 [11738]: 900b1853 aio collect RD 0x7f41d8c0:0x7f41d8d0:0x7f41e2afa000 result 1048576:0 match reap
2017-02-20 14:48:02+0100 1051877 [11738]: 900b1853 aio timeout RD 0x7f41d8c0:0x7f41d8d0:0x7f41e2afa000 ioto 10 to_count 4
2017-02-20 14:48:02+0100 1051877 [11738]: s3 delta_renew read timeout 10 sec offset 0 /dev/900b1853-e192-4661-a0f9-7c7c396f6f49/ids
2017-02-20 14:48:02+0100 1051877 [11738]: s3 renewal error -202 delta_length 10 last_success 1051846
2017-02-20 14:48:02+0100 1051877 [11738]: 900b1853 aio collect RD 0x7f41d8c0:0x7f41d8d0:0x7f41e2afa000 result 1048576:0 match reap

Gianluca, what is domain 900b1853-e192-4661-a0f9-7c7c396f6f49?
is this the domain you are migrating to at the same time?

Can you share the output of:

sanlock client renewal -s 900b1853-e192-4661-a0f9-7c7c396f6f49

>>> 2017-02-20 14:41:44,900 ERROR (tasks/8) [storage.TaskManager.Task]
>>> (Task='d694b892-b078-4d86-a035-427ee4fb3b13') Unexpected error (task:870)
>>> Traceback (most recent call last):
>>> File "/usr/share/vdsm/storage/task.py", line 877, in _run
>>> return fn(*args, **kargs)
>>> File "/usr/share/vdsm/storage/task.py", line 333, in run
>>> return self.cmd(*self.argslist, **self.argsdict)
>>> File "/usr/lib/python2.7/site-packages/vdsm/storage/securable.py", line
>>> 79, in wrapper
>>> return method(self, *args, **kwargs)
>>> File "/usr/share/vdsm/storage/sp.py", line 1929, in createVolume
>>> initialSize=initialSize)
>>> File "/usr/share/vdsm/storage/sd.py", line 762, in creat
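Renewal failures like the ones quoted above can be pulled out of sanlock.log mechanically. A quick sketch; the regex is an assumption based only on the line format shown in this thread:

```python
import re

# Extract delta-lease renewal errors from sanlock.log lines such as:
#   2017-02-20 14:30:09+0100 1050804 [11738]: s3 renewal error -202
#   delta_length 10 last_success 1050773
RENEWAL_ERR = re.compile(
    r"(?P<ls>s\d+) renewal error (?P<err>-?\d+) "
    r"delta_length (?P<dl>\d+) last_success (?P<last>\d+)"
)

def renewal_errors(log_lines):
    """Return one dict per renewal-error line; other lines are ignored."""
    errors = []
    for line in log_lines:
        m = RENEWAL_ERR.search(line)
        if m:
            errors.append({
                "lockspace": m.group("ls"),
                "error": int(m.group("err")),
                "last_success": int(m.group("last")),
            })
    return errors

log = [
    "2017-02-20 14:30:09+0100 1050804 [11738]: s3 renewal error -202 delta_length 10 last_success 1050773",
    "2017-02-20 14:30:11+0100 1050806 [11738]: 900b1853 aio collect RD ... result 1048576:0 match reap",
]
print(renewal_errors(log))
```

Comparing the gaps between successive last_success values against the renewal interval gives a rough idea of how long the lockspace went without a successful renewal.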
Re: [ovirt-users] moving disk failed.. remained locked
On Wed, Feb 22, 2017 at 9:32 AM, Nir Soffer wrote:
> On Wed, Feb 22, 2017 at 10:31 AM, Nir Soffer wrote:
> >
> > This means that sanlock could not initialize a lease in the new volume
> > created for the snapshot.
> >
> > Can you attach sanlock.log?
>
> Found it in your next message

OK. Just to recap what happened from a physical point of view:

- apparently I had an array of disks with no more spare disks, and on this
array was the LUN composing the disk storage domain. So I was involved in
moving the disks of the impacted storage domain and then removing the
storage domain itself, so that we can remove the logical array on the
storage. This is a test storage system without support, so at the moment I
had no more spare disks on it
- actually there was another disk problem with the array, generating loss
of data because no more spares were available at that time
- No evidence of error at VM OS level or at storage domain level
- But probably the 2 operations:
1) move disk
2) create snapshot of the VM containing the disk
could not complete due to this low level problem. It would be nice to find
evidence of this. The storage domain didn't go offline, BTW
- I got confirmation of the loss of data this way:
The original disk of the VM, inside the VM, was a PV of a VG.
I added a disk (on another storage domain) to the VM, made it a PV and
added it to the original VG.
Tried pvmove from the source disk to the new disk, but it reached about 47%
and then stopped/failed, pausing the VM. I could start the VM again, but as
soon as the pvmove continued, the VM went back to the paused state.
So I powered off the VM and was able to detach/delete the corrupted disk
and then remove the storage domain (see the other thread opened yesterday).
I then managed to recover the now corrupted VG and restore from backup the
data contained in the original fs.

So the original problem was a low level storage error.
If it can be of help to narrow down oVirt behavior in this scenario, I can
provide further logs from the VM OS or from hosts/engine. Let me know.

Some questions:
- how is the reaction of putting the VM in paused mode due to I/O errors
managed in this case? Can I in some way keep the VM running and let it see
the errors, as on a real physical server, or not?
- Why didn't I get any message at storage domain level but only at VM disk
level?

Thanks for the given help
Gianluca
Re: [ovirt-users] moving disk failed.. remained locked
On Wed, Feb 22, 2017 at 10:31 AM, Nir Soffer wrote: > On Mon, Feb 20, 2017 at 4:49 PM, Gianluca Cecchi > wrote: >> Hello, >> I'm trying to move a disk from one storage domain A to another B in oVirt >> 4.1 >> The corresponding VM is powered on in the mean time >> >> When executing the action, there was already in place a disk move from >> storage domain C to A (this move was for a disk of a powered off VM and then >> completed ok) >> I got this in events of webadmin gui for the failed move A -> B: >> >> Feb 20, 2017 2:42:00 PM Failed to complete snapshot 'Auto-generated for Live >> Storage Migration' creation for VM 'dbatest6'. >> Feb 20, 2017 2:40:51 PM VDSM ovmsrv06 command HSMGetAllTasksStatusesVDS >> failed: Error creating a new volume >> Feb 20, 2017 2:40:51 PM Snapshot 'Auto-generated for Live Storage Migration' >> creation for VM 'dbatest6' was initiated by admin@internal-authz. >> >> >> And in relevant vdsm.log of referred host ovmsrv06 >> >> 2017-02-20 14:41:44,899 ERROR (tasks/8) [storage.Volume] Unexpected error >> (volume:1087) >> Traceback (most recent call last): >> File "/usr/share/vdsm/storage/volume.py", line 1081, in create >> cls.newVolumeLease(metaId, sdUUID, volUUID) >> File "/usr/share/vdsm/storage/volume.py", line 1361, in newVolumeLease >> return cls.manifestClass.newVolumeLease(metaId, sdUUID, volUUID) >> File "/usr/share/vdsm/storage/blockVolume.py", line 310, in newVolumeLease >> sanlock.init_resource(sdUUID, volUUID, [(leasePath, leaseOffset)]) >> SanlockException: (-202, 'Sanlock resource init failure', 'Sanlock >> exception') > > This means that sanlock could not initialize a lease in the new volume created > for the snapshot. > > Can you attach sanlock.log? 
Found it in your next message > >> 2017-02-20 14:41:44,900 ERROR (tasks/8) [storage.TaskManager.Task] >> (Task='d694b892-b078-4d86-a035-427ee4fb3b13') Unexpected error (task:870) >> Traceback (most recent call last): >> File "/usr/share/vdsm/storage/task.py", line 877, in _run >> return fn(*args, **kargs) >> File "/usr/share/vdsm/storage/task.py", line 333, in run >> return self.cmd(*self.argslist, **self.argsdict) >> File "/usr/lib/python2.7/site-packages/vdsm/storage/securable.py", line >> 79, in wrapper >> return method(self, *args, **kwargs) >> File "/usr/share/vdsm/storage/sp.py", line 1929, in createVolume >> initialSize=initialSize) >> File "/usr/share/vdsm/storage/sd.py", line 762, in createVolume >> initialSize=initialSize) >> File "/usr/share/vdsm/storage/volume.py", line 1089, in create >> (volUUID, e)) >> VolumeCreationError: Error creating a new volume: (u"Volume creation >> d0d938bd-1479-49cb-93fb-85b6a32d6cb4 failed: (-202, 'Sanlock resource init >> failure', 'Sanlock exception')",) >> 2017-02-20 14:41:44,941 INFO (tasks/8) [storage.Volume] Metadata rollback >> for sdUUID=900b1853-e192-4661-a0f9-7c7c396f6f49 offs=8 (blockVolume:448) >> >> >> Was the error generated due to the other migration still in progress? >> Is there a limit of concurrent migrations from/to a particular storage >> domain? > > No, maybe your network was overloaded by the concurrent migrations? > >> >> Now I would like to retry, but I see that the disk is in state locked with >> hourglass. >> The autogenerated snapshot of the failed action was apparently removed with >> success as I don't see it. >> >> How can I proceed to move the disk? >> >> Thanks in advance, >> Gianluca >> >> ___ >> Users mailing list >> Users@ovirt.org >> http://lists.ovirt.org/mailman/listinfo/users >> ___ Users mailing list Users@ovirt.org http://lists.ovirt.org/mailman/listinfo/users
Re: [ovirt-users] moving disk failed.. remained locked
On Mon, Feb 20, 2017 at 4:49 PM, Gianluca Cecchi wrote: > Hello, > I'm trying to move a disk from one storage domain A to another B in oVirt > 4.1 > The corresponding VM is powered on in the mean time > > When executing the action, there was already in place a disk move from > storage domain C to A (this move was for a disk of a powered off VM and then > completed ok) > I got this in events of webadmin gui for the failed move A -> B: > > Feb 20, 2017 2:42:00 PM Failed to complete snapshot 'Auto-generated for Live > Storage Migration' creation for VM 'dbatest6'. > Feb 20, 2017 2:40:51 PM VDSM ovmsrv06 command HSMGetAllTasksStatusesVDS > failed: Error creating a new volume > Feb 20, 2017 2:40:51 PM Snapshot 'Auto-generated for Live Storage Migration' > creation for VM 'dbatest6' was initiated by admin@internal-authz. > > > And in relevant vdsm.log of referred host ovmsrv06 > > 2017-02-20 14:41:44,899 ERROR (tasks/8) [storage.Volume] Unexpected error > (volume:1087) > Traceback (most recent call last): > File "/usr/share/vdsm/storage/volume.py", line 1081, in create > cls.newVolumeLease(metaId, sdUUID, volUUID) > File "/usr/share/vdsm/storage/volume.py", line 1361, in newVolumeLease > return cls.manifestClass.newVolumeLease(metaId, sdUUID, volUUID) > File "/usr/share/vdsm/storage/blockVolume.py", line 310, in newVolumeLease > sanlock.init_resource(sdUUID, volUUID, [(leasePath, leaseOffset)]) > SanlockException: (-202, 'Sanlock resource init failure', 'Sanlock > exception') This means that sanlock could not initialize a lease in the new volume created for the snapshot. Can you attach sanlock.log? 
> 2017-02-20 14:41:44,900 ERROR (tasks/8) [storage.TaskManager.Task] > (Task='d694b892-b078-4d86-a035-427ee4fb3b13') Unexpected error (task:870) > Traceback (most recent call last): > File "/usr/share/vdsm/storage/task.py", line 877, in _run > return fn(*args, **kargs) > File "/usr/share/vdsm/storage/task.py", line 333, in run > return self.cmd(*self.argslist, **self.argsdict) > File "/usr/lib/python2.7/site-packages/vdsm/storage/securable.py", line > 79, in wrapper > return method(self, *args, **kwargs) > File "/usr/share/vdsm/storage/sp.py", line 1929, in createVolume > initialSize=initialSize) > File "/usr/share/vdsm/storage/sd.py", line 762, in createVolume > initialSize=initialSize) > File "/usr/share/vdsm/storage/volume.py", line 1089, in create > (volUUID, e)) > VolumeCreationError: Error creating a new volume: (u"Volume creation > d0d938bd-1479-49cb-93fb-85b6a32d6cb4 failed: (-202, 'Sanlock resource init > failure', 'Sanlock exception')",) > 2017-02-20 14:41:44,941 INFO (tasks/8) [storage.Volume] Metadata rollback > for sdUUID=900b1853-e192-4661-a0f9-7c7c396f6f49 offs=8 (blockVolume:448) > > > Was the error generated due to the other migration still in progress? > Is there a limit of concurrent migrations from/to a particular storage > domain? No, maybe your network was overloaded by the concurrent migrations? > > Now I would like to retry, but I see that the disk is in state locked with > hourglass. > The autogenerated snapshot of the failed action was apparently removed with > success as I don't see it. > > How can I proceed to move the disk? > > Thanks in advance, > Gianluca > > ___ > Users mailing list > Users@ovirt.org > http://lists.ovirt.org/mailman/listinfo/users > ___ Users mailing list Users@ovirt.org http://lists.ovirt.org/mailman/listinfo/users
Re: [ovirt-users] moving disk failed.. remained locked
I opened a bug for the task cleaner issue:
https://bugzilla.redhat.com/show_bug.cgi?id=1425705

Did you manage to copy the disk?
For better tracking, can you open a bug with the details and logs?

Thanks,
Fred

On Tue, Feb 21, 2017 at 3:36 PM, Gianluca Cecchi wrote:
> The problem itself seems related to the snapshot and to the disk (430 GB
> in size).
>
> Failed to complete snapshot 'test3' creation for VM 'dbatest6'.
> VDSM ovmsrv07 command HSMGetAllTasksStatusesVDS failed: Could not acquire
> resource. Probably resource factory threw an exception.: ()
> Snapshot 'test3' creation for VM 'dbatest6' was initiated by
> admin@internal-authz.
>
> The VM is composed of 2 disks, which are on 2 different storage domains.
> I'm able to create and then delete a snapshot that includes only the
> first system disk (no memory saved), but I receive the same error as in
> the move disk if I try to do a snapshot including instead only the second
> disk (again no memory saved).
> In this case the disk doesn't remain locked as it happened when trying to
> move the disk...
> Can it help in any way to shut down the VM?
>
> I should free this storage domain and this is the only disk remaining
> before decommissioning...
> Thanks,
> Gianluca
Re: [ovirt-users] moving disk failed.. remained locked
The problem itself seems related to the snapshot and to the disk (430 GB in
size).

Failed to complete snapshot 'test3' creation for VM 'dbatest6'.
VDSM ovmsrv07 command HSMGetAllTasksStatusesVDS failed: Could not acquire
resource. Probably resource factory threw an exception.: ()
Snapshot 'test3' creation for VM 'dbatest6' was initiated by
admin@internal-authz.

The VM is composed of 2 disks, which are on 2 different storage domains.
I'm able to create and then delete a snapshot that includes only the first
system disk (no memory saved), but I receive the same error as in the move
disk if I try to do a snapshot including instead only the second disk
(again no memory saved).
In this case the disk doesn't remain locked as it happened when trying to
move the disk...
Can it help in any way to shut down the VM?

I should free this storage domain and this is the only disk remaining
before decommissioning...
Thanks,
Gianluca
Re: [ovirt-users] moving disk failed.. remained locked
On Tue, Feb 21, 2017 at 11:47 AM, Fred Rolland wrote:
> Add before the command (with your db password): PGPASSWORD=engine
>
> for example:
> PGPASSWORD=engine /usr/share/ovirt-engine/setup/dbutils/taskcleaner.sh -T
>
> PGPASSWORD=engine /usr/share/ovirt-engine/setup/dbutils/unlock_entity.sh
> -t disk -u engine -q

From taskcleaner, if I use the "-T" option I get an error:

[root@ovmgr1 ovirt-engine]# PGPASSWORD=my_pwd /usr/share/ovirt-engine/setup/dbutils/taskcleaner.sh -d engine -u engine -T t
ERROR: column "job_id" does not exist
LINE 1: ...created_at,status,return_value,return_value_class,job_id,ste...
^
FATAL: Cannot execute sql command: --command=SELECT command_id,command_type,root_command_id,command_parameters,command_params_class,created_at,status,return_value,return_value_class,job_id,step_id,executed FROM GetAllCommandsWithRunningTasks();

I see the function GetAllCommandsWithRunningTasks defined only in
/usr/share/ovirt-engine/setup/dbutils/taskcleaner_sp_3_5.sql, and it seems
it queries command_entities; but if I go directly into the db, the table
indeed doesn't contain a job_id column.
I'm on 4.1 upgraded from 4.0.6

engine=# \d command_entities
             Table "public.command_entities"
        Column         |           Type           |            Modifiers
-----------------------+--------------------------+---------------------------------
 command_id            | uuid                     | not null
 command_type          | integer                  | not null
 root_command_id       | uuid                     |
 command_parameters    | text                     |
 command_params_class  | character varying(256)   |
 created_at            | timestamp with time zone |
 status                | character varying(20)    | default NULL::character varying
 callback_enabled      | boolean                  | default false
 callback_notified     | boolean                  | default false
 return_value          | text                     |
 return_value_class    | character varying(256)   |
 executed              | boolean                  | default false
 user_id               | uuid                     |
 parent_command_id     | uuid                     |
 data                  | text                     |
 engine_session_seq_id | bigint                   |
 command_context       | text                     |
Indexes:
 "pk_command_entities" PRIMARY KEY, btree (command_id)
 "idx_root_command_id" btree (root_command_id) WHERE root_command_id IS NOT NULL
Referenced by: TABLE
"command_assoc_entities" CONSTRAINT "fk_coco_command_assoc_entity" FOREIGN KEY (command_id) REFERENCES comm and_entities(command_id) ON DELETE CASCADE engine=# Anyway after unlocking the disk and retrying the move, I get the same error while creating auto snapshot... the first problem on host (that is a different host from the chosen yesterday) seems MetaDataKeyNotFoundError: Meta Data key not found error: ("Missing metadata key: 'DOMAIN': found: {'NONE': 2017-02-21 11:38:58,985 INFO (jsonrpc/0) [dispatcher] Run and protect: createVolume(sdUUID=u'900b1853-e192-4661-a0f9-7c7c396f6f49', spUUID=u'588237b8-0031-02f6-035d-0136', imgUUID=u'f0b5a0e4-ee5d-44a7-ba07-08285791368a', size=u'461708984320', volFormat=4, preallocate=2, diskType=2, volUUID=u'c39c3d9f-dde8-45ab-b4a9-7c3b45c6391d', desc=u'', srcImgUUID=u'f0b5a0e4-ee5d-44a7-ba07-08285791368a', srcVolUUID=u'7ed43974-1039-4a68-a8b3-321e7594fe4c', initialSize=None) (logUtils:49) 2017-02-21 11:38:58,987 INFO (jsonrpc/0) [IOProcessClient] Starting client ioprocess-6269 (__init__:330) 2017-02-21 11:38:59,006 INFO (ioprocess/32170) [IOProcess] Starting ioprocess (__init__:452) 2017-02-21 11:38:59,040 INFO (jsonrpc/0) [dispatcher] Run and protect: createVolume, Return response: None (logUtils:52) 2017-02-21 11:38:59,053 INFO (jsonrpc/0) [jsonrpc.JsonRpcServer] RPC call Volume.create succeeded in 0.07 seconds (__init__:515) 2017-02-21 11:38:59,054 INFO (tasks/9) [storage.ThreadPool.WorkerThread] START task 08d7797a-af46-489f-ada0-c70bf4359366 (cmd=>, args=None) (threadPool:208) 2017-02-21 11:38:59,150 WARN (tasks/9) [storage.ResourceManager] Resource factory failed to create resource '01_img_900b1853-e192-4661-a0f9-7c7c396f6f49.f0b5a0e4-ee5d-44a7-ba07-08285791368a'. Canceling request. 
(resourceManager:542) Traceback (most recent call last): File "/usr/share/vdsm/storage/resourceManager.py", line 538, in registerResource obj = namespaceObj.factory.createResource(name, lockType) File "/usr/share/vdsm/storage/resourceFactories.py", line 190, in createResource lockType) File "/usr/share/vdsm/storage/resourceFactories.py", line 119, in __getResourceCandidatesList imgUUID=resourceName) File "/usr/share/vdsm/storage/image.py", line 220, in getChain if srcVol.isLeaf(): File "/usr/share/vdsm/storage/volume.py", line 1261, in isLeaf return self._manifest.is
Re: [ovirt-users] moving disk failed.. remained locked
Add before the command (with your db password): PGPASSWORD=engine

For example:

PGPASSWORD=engine /usr/share/ovirt-engine/setup/dbutils/taskcleaner.sh -T
PGPASSWORD=engine /usr/share/ovirt-engine/setup/dbutils/unlock_entity.sh -t disk -u engine -q

On Tue, Feb 21, 2017 at 12:23 PM, Gianluca Cecchi wrote:
>
> I see here utilities:
> https://www.ovirt.org/develop/developer-guide/db-issues/helperutilities/
>
> In particular unlock_entity.sh that should be of help in my case, as I see here:
> http://lists.ovirt.org/pipermail/users/2015-April/032576.html
>
> New path in 4.1 is now /usr/share/ovirt-engine/setup/dbutils/
> and not /usr/share/ovirt-engine/dbscripts
>
> Question:
> How can I verify that "no jobs are still running over it"?
> Is taskcleaner.sh the utility to crosscheck jobs?
> In this case how do I provide a password for it?
>
> [...]
>
> Thanks,
> Gianluca
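The suggestion above can be sketched end to end. This is a hedged, dry-run example, assuming the 4.1 dbutils paths and the `engine` DB password shown in this thread; it only prints the commands instead of touching a live engine database, so the order of operations (check for running tasks first, then unlock) can be reviewed safely:

```shell
# Sketch only, assuming the paths and 'engine' password from this thread.
# Dry run: commands are printed, not executed.
export PGPASSWORD=engine   # psql inside the dbutils scripts reads this variable

DBUTILS=/usr/share/ovirt-engine/setup/dbutils

# 1) first display commands that still have running tasks
echo "would run: $DBUTILS/taskcleaner.sh -u engine -d engine -T"
# 2) only if nothing is running, unlock the stuck disk
echo "would run: $DBUTILS/unlock_entity.sh -t disk -u engine -q"
```

Dropping the `echo`s runs the real thing; `taskcleaner.sh -T` here only displays commands with running tasks, it does not remove them unless combined with the removal flags from the usage text below.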
Re: [ovirt-users] moving disk failed.. remained locked
I see here utilities:
https://www.ovirt.org/develop/developer-guide/db-issues/helperutilities/

In particular unlock_entity.sh, which should be of help in my case, as I see here:
http://lists.ovirt.org/pipermail/users/2015-April/032576.html

The new path in 4.1 is now /usr/share/ovirt-engine/setup/dbutils/
and not /usr/share/ovirt-engine/dbscripts

Question:
How can I verify that "no jobs are still running over it"?
Is taskcleaner.sh the utility to crosscheck jobs?
In this case, how do I provide a password for it?

[root@ovmgr1 ~]# /usr/share/ovirt-engine/setup/dbutils/taskcleaner.sh -d engine -u engine
psql: fe_sendauth: no password supplied
FATAL: Cannot execute sql command: --command=select exists (select * from information_schema.tables where table_schema = 'public' and table_name = 'command_entities');
psql: fe_sendauth: no password supplied
FATAL: Cannot execute sql command: --file=/usr/share/ovirt-engine/setup/dbutils/taskcleaner_sp.sql

[root@ovmgr1 ~]# /usr/share/ovirt-engine/setup/dbutils/taskcleaner.sh -h
Usage: /usr/share/ovirt-engine/setup/dbutils/taskcleaner.sh [options]

 -h            - This help text.
 -v            - Turn on verbosity (WARNING: lots of output)
 -l LOGFILE    - The logfile for capturing output (def. )
 -s HOST       - The database servername for the database (def. localhost)
 -p PORT       - The database port for the database (def. 5432)
 -u USER       - The username for the database (def. )
 -d DATABASE   - The database name (def. )
 -t TASK_ID    - Removes a task by its Task ID.
 -c COMMAND_ID - Removes all tasks related to the given Command Id.
 -T            - Removes/Displays all commands that have running tasks
 -o            - Removes/Displays all commands.
 -z            - Removes/Displays a Zombie task.
 -R            - Removes all tasks (use with -z to clear only zombie tasks).
 -r            - Removes all commands (use with -T to clear only those with running tasks. Use with -Z to clear only commands with zombie tasks.)
 -Z            - Removes/Displays a command with zombie tasks.
 -C            - Clear related compensation entries.
 -J            - Clear related Job Steps.
 -A            - Clear all Job Steps and compensation entries.
 -q            - Quiet mode, do not prompt for confirmation.

Thanks,
Gianluca
Re: [ovirt-users] moving disk failed.. remained locked
On Tue, Feb 21, 2017 at 7:01 AM, Gianluca Cecchi wrote:
> [...]

Info on disk:

[g.cecchi@ovmsrv07 ~]$ sudo qemu-img info /rhev/data-center/588237b8-0031-02f6-035d-0136/900b1853-e192-4661-a0f9-7c7c396f6f49/images/f0b5a0e4-ee5d-44a7-ba07-08285791368a/7ed43974-1039-4a68-a8b3-321e7594fe4c
image: /rhev/data-center/588237b8-0031-02f6-035d-0136/900b1853-e192-4661-a0f9-7c7c396f6f49/images/f0b5a0e4-ee5d-44a7-ba07-08285791368a/7ed43974-1039-4a68-a8b3-321e7594fe4c
file format: qcow2
virtual size: 430G (461708984320 bytes)
disk size: 0
cluster_size: 65536
Format specific information:
    compat: 1.1
    lazy refcounts: false
    refcount bits: 16
    corrupt: false
[g.cecchi@ovmsrv07 ~]$

Based on another command I learned from another thread, this is what I get if I check the disk:

[g.cecchi@ovmsrv07 ~]$ sudo qemu-img check /rhev/data-center/588237b8-0031-02f6-035d-0136/900b1853-e192-4661-a0f9-7c7c396f6f49/images/f0b5a0e4-ee5d-44a7-ba07-08285791368a/7ed43974-1039-4a68-a8b3-321e7594fe4c
Leaked cluster 4013995 refcount=1 reference=0
Leaked cluster 4013996 refcount=1 reference=0
Leaked cluster 4013997 refcount=1 reference=0
... many lines of this type ...
Leaked cluster 6275183 refcount=1 reference=0
Leaked cluster 6275184 refcount=1 reference=0
Leaked cluster 6275185 refcount=1 reference=0

57506 leaked clusters were found on the image.
This means waste of disk space, but no harm to data.
6599964/7045120 = 93.68% allocated, 6.30% fragmented, 0.00% compressed clusters
Image end offset: 436986380288

Can it help in any way to shut down the VM to unlock the disk?

Thanks,
Gianluca
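A side note on the check output above: leaked clusters are allocated but unreferenced, so the wasted space is simply the leak count times the qcow2 cluster size. A quick sketch using the two numbers reported by the tools above (57506 leaks, 65536-byte clusters):

```shell
# Rough arithmetic on the 'qemu-img check' output above:
# waste = leaked clusters * cluster_size.
leaked=57506              # from the check summary above
cluster_size=65536        # from 'qemu-img info' above
waste_bytes=$((leaked * cluster_size))
echo "wasted: $waste_bytes bytes (~$((waste_bytes / 1024 / 1024)) MiB)"
# Leaks are harmless to data; with the VM shut down they could be reclaimed
# with 'qemu-img check -r leaks <image>' (repair mode, use with care and a backup).
```

That is roughly 3.5 GiB of the ~407 GiB image end offset, which matches the "waste of disk space, but no harm to data" message.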
Re: [ovirt-users] moving disk failed.. remained locked
On Mon, Feb 20, 2017 at 10:51 PM, Gianluca Cecchi wrote:
> [...]

I didn't mention that the disk size is 430 GB and the target storage domain is 1 TB, almost empty (950 GB free).
I received a message about problems from the storage where the disk is, so I'm trying to move it so that I can put the original domain into maintenance and investigate.
The errors seem to be about creation of the destination volume, not the source...

thanks,
Gianluca
Re: [ovirt-users] moving disk failed.. remained locked
On Mon, Feb 20, 2017 at 8:46 PM, Fred Rolland wrote:
> Can you please send the whole logs? (Engine, vdsm and sanlock)

vdsm.log.1.xz:
https://drive.google.com/file/d/0BwoPbcrMv8mvWTViWEUtNjRtLTg/view?usp=sharing

sanlock.log:
https://drive.google.com/file/d/0BwoPbcrMv8mvcVM4YzZ4aUZLYVU/view?usp=sharing

engine.log (gzip format):
https://drive.google.com/file/d/0BwoPbcrMv8mvdW80RlFIYkpzenc/view?usp=sharing

Thanks,
Gianluca
Re: [ovirt-users] moving disk failed.. remained locked
Can you please send the whole logs? (Engine, vdsm and sanlock)

On Mon, Feb 20, 2017 at 4:49 PM, Gianluca Cecchi wrote:
> Hello,
> I'm trying to move a disk from one storage domain A to another B in oVirt 4.1.
> The corresponding VM is powered on in the meantime.
>
> When executing the action, there was already a disk move in place from
> storage domain C to A (that move was for a disk of a powered-off VM and
> completed ok).
> I got this in the events of the webadmin GUI for the failed move A -> B:
>
> Feb 20, 2017 2:42:00 PM Failed to complete snapshot 'Auto-generated for
> Live Storage Migration' creation for VM 'dbatest6'.
> Feb 20, 2017 2:40:51 PM VDSM ovmsrv06 command HSMGetAllTasksStatusesVDS
> failed: Error creating a new volume
> Feb 20, 2017 2:40:51 PM Snapshot 'Auto-generated for Live Storage
> Migration' creation for VM 'dbatest6' was initiated by admin@internal-authz.
>
> And in the relevant vdsm.log of the referred host ovmsrv06:
>
> 2017-02-20 14:41:44,899 ERROR (tasks/8) [storage.Volume] Unexpected error (volume:1087)
> Traceback (most recent call last):
>   File "/usr/share/vdsm/storage/volume.py", line 1081, in create
>     cls.newVolumeLease(metaId, sdUUID, volUUID)
>   File "/usr/share/vdsm/storage/volume.py", line 1361, in newVolumeLease
>     return cls.manifestClass.newVolumeLease(metaId, sdUUID, volUUID)
>   File "/usr/share/vdsm/storage/blockVolume.py", line 310, in newVolumeLease
>     sanlock.init_resource(sdUUID, volUUID, [(leasePath, leaseOffset)])
> SanlockException: (-202, 'Sanlock resource init failure', 'Sanlock exception')
> 2017-02-20 14:41:44,900 ERROR (tasks/8) [storage.TaskManager.Task] (Task='d694b892-b078-4d86-a035-427ee4fb3b13') Unexpected error (task:870)
> Traceback (most recent call last):
>   File "/usr/share/vdsm/storage/task.py", line 877, in _run
>     return fn(*args, **kargs)
>   File "/usr/share/vdsm/storage/task.py", line 333, in run
>     return self.cmd(*self.argslist, **self.argsdict)
>   File "/usr/lib/python2.7/site-packages/vdsm/storage/securable.py", line 79, in wrapper
>     return method(self, *args, **kwargs)
>   File "/usr/share/vdsm/storage/sp.py", line 1929, in createVolume
>     initialSize=initialSize)
>   File "/usr/share/vdsm/storage/sd.py", line 762, in createVolume
>     initialSize=initialSize)
>   File "/usr/share/vdsm/storage/volume.py", line 1089, in create
>     (volUUID, e))
> VolumeCreationError: Error creating a new volume: (u"Volume creation d0d938bd-1479-49cb-93fb-85b6a32d6cb4 failed: (-202, 'Sanlock resource init failure', 'Sanlock exception')",)
> 2017-02-20 14:41:44,941 INFO (tasks/8) [storage.Volume] Metadata rollback for sdUUID=900b1853-e192-4661-a0f9-7c7c396f6f49 offs=8 (blockVolume:448)
>
> Was the error generated due to the other migration still in progress?
> Is there a limit on concurrent migrations from/to a particular storage domain?
>
> Now I would like to retry, but I see that the disk is in state locked, with an hourglass.
> The auto-generated snapshot of the failed action was apparently removed with success, as I don't see it.
>
> How can I proceed to move the disk?
>
> Thanks in advance,
> Gianluca