Please disregard my last e-mail. I re-ran the command and this time the exit code was 0, and the migration process is no longer stuck.

Thanks so much for all the help, Benny!

Regards.

On 2018-05-18 08:42, nico...@devels.es wrote:
Hi,

We're getting closer to solving it :-)

I'll answer below with my steps. One of them fails and I don't
know why (I probably missed something).

On 2018-05-17 15:47, Benny Zlotnik wrote:
Sorry, I forgot it's iSCSI; it's a bit different.

In my case it would look something like:

2018-05-17 17:30:12,740+0300 DEBUG (jsonrpc/7) [jsonrpc.JsonRpcServer]
Return 'Volume.getInfo' in bridge with {'status': 'OK', 'domain':
'3e541b2d-2a49-4eb8-ae4b-aa9acee228c6', 'voltype': 'INTERNAL',
'description': '{"DiskAlias":"vm_Disk1","DiskDescription":""}',
'parent': '00000000-0000-0000-0000-000000000000', 'format': 'RAW',
'generation': 0, 'image': 'dd6b5ae0-196e-4879-b076-a0a8d8a1dfde',
'ctime': '1526566607', 'disktype': 'DATA', 'legality': 'LEGAL',
'mtime': '0', 'apparentsize': '1073741824', 'children': [], 'pool': '',
'capacity': '1073741824', 'uuid':
u'221c45e1-7f65-42c8-afc3-0ccc1d6fc148', 'truesize': '1073741824',
'type': 'PREALLOCATED', 'lease': {'path':
'/dev/3e541b2d-2a49-4eb8-ae4b-aa9acee228c6/leases', 'owners': [],
'version': None, 'offset': 109051904}} (__init__:355)
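For reference, the manual step of reading the lease details out of that log line could be sketched in Python roughly as follows. This is only an illustration, not part of vdsm: the function names are made up, the sample is a trimmed copy of the line above, and the `domain:uuid:path:offset` layout of the resource string is inferred from the sanlock.log entries quoted in this thread.

```python
import ast
import re

# Trimmed copy of the iSCSI example above (only the fields used here).
SAMPLE = (
    "2018-05-17 17:30:12,740+0300 DEBUG (jsonrpc/7) [jsonrpc.JsonRpcServer] "
    "Return 'Volume.getInfo' in bridge with {'status': 'OK', "
    "'domain': '3e541b2d-2a49-4eb8-ae4b-aa9acee228c6', "
    "'uuid': u'221c45e1-7f65-42c8-afc3-0ccc1d6fc148', "
    "'lease': {'path': '/dev/3e541b2d-2a49-4eb8-ae4b-aa9acee228c6/leases', "
    "'owners': [], 'version': None, 'offset': 109051904}} (__init__:355)"
)


def parse_getinfo_return(line):
    """Return the dict logged after "Return 'Volume.getInfo' in bridge with"."""
    match = re.search(r"in bridge with (\{.*\}) \(__init__:\d+\)", line, re.S)
    if match is None:
        raise ValueError("not a Volume.getInfo return line")
    text = match.group(1)
    # The log is a Python 2 repr; long literals such as 8L are not valid
    # in Python 3, so the trailing L is dropped before evaluating.
    text = re.sub(r"\b(\d+)L\b", r"\1", text)
    return ast.literal_eval(text)


def lease_resource(info):
    """Build 'domain:uuid:path:offset', or None if the lease has no path/offset."""
    lease = info.get("lease", {})
    if "path" not in lease or "offset" not in lease:
        return None
    return "{}:{}:{}:{}".format(
        info["domain"], info["uuid"], lease["path"], lease["offset"]
    )


if __name__ == "__main__":
    print(lease_resource(parse_getinfo_return(SAMPLE)))
```

When the lease dict has no `path` (as in the vdsm 4.19 output further down the thread), `lease_resource` returns None, and the resource has to be taken from sanlock.log instead.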

I then look for 221c45e1-7f65-42c8-afc3-0ccc1d6fc148 in sanlock.log:

2018-05-17 17:30:12 20753 [3335]: s10:r14 resource
3e541b2d-2a49-4eb8-ae4b-aa9acee228c6:221c45e1-7f65-42c8-afc3-0ccc1d6fc148:/dev/3e541b2d-2a49-4eb8-ae4b-aa9acee228c6/leases:109051904
for 2,11,31496


I could only find the entry on one of the hosts. When I grepped for the
UUID there, I found:

2018-05-16 12:39:44 4761204 [1023]: s33:r103 resource
1876ab86-216f-4a37-a36b-2b5d99fcaad0:c2cfbb02-9981-4fb7-baea-7257a824145c:/dev/1876ab86-216f-4a37-a36b-2b5d99fcaad0/leases:128974848
for 23,47,9206

So the resource would be:
3e541b2d-2a49-4eb8-ae4b-aa9acee228c6:221c45e1-7f65-42c8-afc3-0ccc1d6fc148:/dev/3e541b2d-2a49-4eb8-ae4b-aa9acee228c6/leases:109051904
and the pid is 31496
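As a rough sketch, the resource string and pid can be pulled out of a sanlock.log "resource ... for a,b,pid" line like this. The helper names are hypothetical, and the only thing assumed about the three numbers after "for" is what this thread states: the last one is the pid.

```python
import re

# One-line copy of the sanlock.log entry quoted above.
SAMPLE = (
    "2018-05-17 17:30:12 20753 [3335]: s10:r14 resource "
    "3e541b2d-2a49-4eb8-ae4b-aa9acee228c6:221c45e1-7f65-42c8-afc3-0ccc1d6fc148:"
    "/dev/3e541b2d-2a49-4eb8-ae4b-aa9acee228c6/leases:109051904 "
    "for 2,11,31496"
)

LINE_RE = re.compile(r"resource (?P<resource>\S+)\s+for \d+,\d+,(?P<pid>\d+)")


def parse_resource_line(line):
    """Return (resource, pid) from a sanlock.log resource line, or None."""
    match = LINE_RE.search(line)
    if match is None:
        return None
    return match.group("resource"), int(match.group("pid"))


def split_resource(resource):
    """Split 'lockspace:name:path:offset'.

    The path may itself contain a colon (for example an NFS mount such as
    10.35.0.233:_root_...), so the first two fields are split from the left
    and the offset from the right.
    """
    lockspace, name, rest = resource.split(":", 2)
    path, offset = rest.rsplit(":", 1)
    return lockspace, name, path, int(offset)


if __name__ == "__main__":
    resource, pid = parse_resource_line(SAMPLE)
    print(split_resource(resource), pid)
```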


Ok, so my resource is
1876ab86-216f-4a37-a36b-2b5d99fcaad0:c2cfbb02-9981-4fb7-baea-7257a824145c:/dev/1876ab86-216f-4a37-a36b-2b5d99fcaad0/leases:128974848
and my PID is 9206.

running
$ sanlock direct dump /dev/3e541b2d-2a49-4eb8-ae4b-aa9acee228c6/leases:109051904
  offset                            lockspace                             resource  timestamp  own  gen lver
00000000 3e541b2d-2a49-4eb8-ae4b-aa9acee228c6 221c45e1-7f65-42c8-afc3-0ccc1d6fc148 0000020753 0001 0004 5
...

In my case the output would be:

[...]
00000000 1876ab86-216f-4a37-a36b-2b5d99fcaad0 c2cfbb02-9981-4fb7-baea-7257a824145c 0004918032 0008 0004 2
[...]


If the vdsm pid changed (and it probably did) it will be different,
so I acquire it for the new pid
$ sanlock client acquire -r 3e541b2d-2a49-4eb8-ae4b-aa9acee228c6:221c45e1-7f65-42c8-afc3-0ccc1d6fc148:/dev/3e541b2d-2a49-4eb8-ae4b-aa9acee228c6/leases:109051904 -p 32265
acquire pid 32265


I checked vdsmd's PID

# systemctl status vdsmd
● vdsmd.service - Virtual Desktop Server Manager
[...]
├─17758 /usr/bin/python2 /usr/share/vdsm/vdsm

So the new PID is 17758.

# sanlock client acquire -r 1876ab86-216f-4a37-a36b-2b5d99fcaad0:c2cfbb02-9981-4fb7-baea-7257a824145c:/dev/1876ab86-216f-4a37-a36b-2b5d99fcaad0/leases:128974848 -p 17758
acquire pid 17758
acquire done 0


Then I can see the timestamp changed:

$ sanlock direct dump /dev/3e541b2d-2a49-4eb8-ae4b-aa9acee228c6/leases:109051904
  offset                            lockspace                             resource  timestamp  own  gen lver
00000000 3e541b2d-2a49-4eb8-ae4b-aa9acee228c6 221c45e1-7f65-42c8-afc3-0ccc1d6fc148 0000021210 0001 0005 6

And then I release it:
$ sanlock client release -r 3e541b2d-2a49-4eb8-ae4b-aa9acee228c6:221c45e1-7f65-42c8-afc3-0ccc1d6fc148:/dev/3e541b2d-2a49-4eb8-ae4b-aa9acee228c6/leases:109051904 -p 32265

release pid 32265
release done 0


This is where it fails:

# sanlock direct release -r 1876ab86-216f-4a37-a36b-2b5d99fcaad0:c2cfbb02-9981-4fb7-baea-7257a824145c:/dev/1876ab86-216f-4a37-a36b-2b5d99fcaad0/leases:128974848 -p 17758
release done -251

And the resource is still stuck.

Is there something I missed there?
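A small Python sketch of the acquire-then-release pair used in Benny's example may help when comparing the two attempts. It only builds the argument lists for the `sanlock client acquire` and `sanlock client release` invocations quoted in this thread and does not run them; the function name is made up. Note that the failing attempt above invoked `sanlock direct release`, whereas the example uses `sanlock client release`; whether that difference explains the -251 result is not confirmed anywhere in the thread.

```python
import shlex


def sanlock_client_command(action, resource, pid):
    """Build the argv for 'sanlock client <action> -r RESOURCE -p PID'.

    Only the two actions that appear in this thread are accepted.
    """
    if action not in ("acquire", "release"):
        raise ValueError("unsupported action: {!r}".format(action))
    return ["sanlock", "client", action, "-r", resource, "-p", str(pid)]


if __name__ == "__main__":
    # Resource and vdsm pid taken from the messages above.
    resource = (
        "1876ab86-216f-4a37-a36b-2b5d99fcaad0:"
        "c2cfbb02-9981-4fb7-baea-7257a824145c:"
        "/dev/1876ab86-216f-4a37-a36b-2b5d99fcaad0/leases:128974848"
    )
    for action in ("acquire", "release"):
        argv = sanlock_client_command(action, resource, 17758)
        # Printed for review only; nothing is executed here.
        print(" ".join(shlex.quote(arg) for arg in argv))
```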

$ sanlock direct dump /dev/3e541b2d-2a49-4eb8-ae4b-aa9acee228c6/leases:109051904
  offset                            lockspace                             resource  timestamp  own  gen lver
00000000 3e541b2d-2a49-4eb8-ae4b-aa9acee228c6 221c45e1-7f65-42c8-afc3-0ccc1d6fc148 0000000000 0001 0005 6
The timestamp is zeroed and the lease is free
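The check described here (a zeroed timestamp in the dump row means the lease is free) can be sketched as follows. The field names follow the header printed by `sanlock direct dump` in this thread; the helper names are hypothetical, and treating a zero timestamp as "free" is based only on the observation above, not on sanlock's documentation.

```python
from collections import namedtuple

DumpRow = namedtuple("DumpRow", "offset lockspace resource timestamp own gen lver")


def parse_dump_row(row):
    """Split one data row of the dump output into its seven columns."""
    fields = row.split()
    if len(fields) != 7:
        raise ValueError("expected 7 columns, got {}".format(len(fields)))
    offset, lockspace, resource, timestamp, own, gen, lver = fields
    # The offset is kept as text; the remaining counters are parsed as integers.
    return DumpRow(offset, lockspace, resource,
                   int(timestamp), int(own), int(gen), int(lver))


def lease_is_free(row):
    """True when the row's timestamp column is zero."""
    return parse_dump_row(row).timestamp == 0


if __name__ == "__main__":
    held = ("00000000 3e541b2d-2a49-4eb8-ae4b-aa9acee228c6 "
            "221c45e1-7f65-42c8-afc3-0ccc1d6fc148 0000021210 0001 0005 6")
    freed = ("00000000 3e541b2d-2a49-4eb8-ae4b-aa9acee228c6 "
             "221c45e1-7f65-42c8-afc3-0ccc1d6fc148 0000000000 0001 0005 6")
    print(lease_is_free(held), lease_is_free(freed))
```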

On Thu, May 17, 2018 at 3:38 PM, <nico...@devels.es> wrote:

This is vdsm 4.19.45. I grepped the disk uuid in
/var/log/sanlock.log but unfortunately no entry there...

On 2018-05-17 13:11, Benny Zlotnik wrote:

Which vdsm version are you using?

You can try looking for the image uuid in /var/log/sanlock.log

On Thu, May 17, 2018 at 2:40 PM, <nico...@devels.es> wrote:

Thanks.

I've been able to see the line in the log, however, the format
differs slightly from yours.

  2018-05-17 12:24:44,132+0100 DEBUG (jsonrpc/6)
[jsonrpc.JsonRpcServer] Calling 'Volume.getInfo' in bridge with
{u'storagepoolID': u'75bf8f48-970f-42bc-8596-f8ab6efb2b63',
u'imageID': u'b4013aba-a936-4a54-bb14-670d3a8b7c38', u'volumeID':
u'c2cfbb02-9981-4fb7-baea-7257a824145c', u'storagedomainID':
u'1876ab86-216f-4a37-a36b-2b5d99fcaad0'} (__init__:556)
2018-05-17 12:24:44,689+0100 DEBUG (jsonrpc/6)
[jsonrpc.JsonRpcServer] Return 'Volume.getInfo' in bridge with
{'status': 'OK', 'domain': '1876ab86-216f-4a37-a36b-2b5d99fcaad0',
'voltype': 'INTERNAL', 'description': 'None', 'parent':
'ea9a0182-329f-4b8f-abe3-e894de95dac0', 'format': 'COW',
'generation': 1, 'image': 'b4013aba-a936-4a54-bb14-670d3a8b7c38',
'ctime': '1526470759', 'disktype': '2', 'legality': 'LEGAL',
'mtime': '0', 'apparentsize': '1073741824', 'children': [], 'pool':
'', 'capacity': '21474836480', 'uuid':
u'c2cfbb02-9981-4fb7-baea-7257a824145c', 'truesize': '1073741824',
'type': 'SPARSE', 'lease': {'owners': [8], 'version': 1L}}
(__init__:582)

As you can see, there's no path field there.

How should I proceed?

On 2018-05-17 12:01, Benny Zlotnik wrote:
vdsm-client replaces vdsClient; take a look
here: https://lists.ovirt.org/pipermail/devel/2016-July/013535.html [1]

On Thu, May 17, 2018 at 1:57 PM, <nico...@devels.es> wrote:

The issue is present in the logs:

  2018-05-17 11:50:44,822+01 INFO 
[org.ovirt.engine.core.bll.storage.disk.image.VdsmImagePoller]
(DefaultQuartzScheduler1) [39755bb7-9082-40d6-ae5e-64b5b2b5f98e]
Command CopyData id: '84a49b25-0e37-4338-834e-08bd67c42860': the
volume lease is not FREE - the job is running
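To confirm the symptom on another system, a short Python sketch like the one below could scan engine log lines for that message and report the affected command. The function name is hypothetical, and the sample is the entry above joined onto a single line, which is how it is assumed to appear in the actual log file.

```python
import re

MARKER = "the volume lease is not FREE - the job is running"
COMMAND_RE = re.compile(r"Command (?P<name>\w+) id: '(?P<id>[0-9a-f-]+)'")

# The engine log entry quoted above, joined onto one line.
SAMPLE = (
    "2018-05-17 11:50:44,822+01 INFO  "
    "[org.ovirt.engine.core.bll.storage.disk.image.VdsmImagePoller] "
    "(DefaultQuartzScheduler1) [39755bb7-9082-40d6-ae5e-64b5b2b5f98e] "
    "Command CopyData id: '84a49b25-0e37-4338-834e-08bd67c42860': "
    "the volume lease is not FREE - the job is running"
)


def stuck_jobs(lines):
    """Yield (command name, command id) for every line carrying the marker."""
    for line in lines:
        if MARKER not in line:
            continue
        match = COMMAND_RE.search(line)
        if match is not None:
            yield match.group("name"), match.group("id")


if __name__ == "__main__":
    for name, command_id in stuck_jobs([SAMPLE]):
        print(name, command_id)
```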

I tried setting the log level to debug, but it seems I don't have a
vdsm-client command. All I have is a vdsm-tool command. Is it
equivalent?

Thanks

On 2018-05-17 11:49, Benny Zlotnik wrote:
By the way, please verify it's the same issue: you should see "the
volume lease is not FREE - the job is running" in the engine log

On Thu, May 17, 2018 at 1:21 PM, Benny Zlotnik <bzlot...@redhat.com> wrote:

I see it because I am on debug level; you need to enable debug logging
in order to see it:

https://www.ovirt.org/develop/developer-guide/vdsm/log-files/ [2]

On Thu, 17 May 2018, 13:10 , <nico...@devels.es> wrote:

Hi,

Thanks. I've checked the vdsm logs on all my hosts, but the only entry I
can find when grepping for Volume.getInfo looks like this:

   2018-05-17 10:14:54,892+0100 INFO  (jsonrpc/0) [jsonrpc.JsonRpcServer]
RPC call Volume.getInfo succeeded in 0.30 seconds (__init__:539)

I cannot find a line like yours. Is there any other way to obtain those
parameters? This is iSCSI-based storage, FWIW (both the source and the
destination of the move).

Thanks.

On 2018-05-17 10:01, Benny Zlotnik wrote:
In the vdsm log you will find the volumeInfo log, which looks like this:

2018-05-17 11:55:03,257+0300 DEBUG (jsonrpc/6) [jsonrpc.JsonRpcServer]
Return 'Volume.getInfo' in bridge with {'status': 'OK', 'domain':
'5c4d2216-2eb3-4e24-b254-d5f83fde4dbe', 'voltype': 'INTERNAL',
'description': '{"DiskAlias":"vm_Disk1","DiskDescription":""}',
'parent': '00000000-0000-0000-0000-000000000000', 'format': 'RAW',
'generation': 3, 'image': 'b8eb8c82-fddd-4fbc-b80d-6ee04c1255bc',
'ctime': '1526543244', 'disktype': 'DATA', 'legality': 'LEGAL',
'mtime': '0', 'apparentsize': '1073741824', 'children': [], 'pool': '',
'capacity': '1073741824', 'uuid':
u'7190913d-320c-4fc9-a5b3-c55b26aa30f4', 'truesize': '0', 'type':
'SPARSE', 'lease': {'path':
u'/rhev/data-center/mnt/10.35.0.233:_root_storage__domains_sd1/5c4d2216-2eb3-4e24-b254-d5f83fde4dbe/images/b8eb8c82-fddd-4fbc-b80d-6ee04c1255bc/7190913d-320c-4fc9-a5b3-c55b26aa30f4.lease',
'owners': [1], 'version': 8L, 'offset': 0}} (__init__:355)

The lease path in my case is:
/rhev/data-center/mnt/10.35.0.233:_root_storage__domains_sd1/5c4d2216-2eb3-4e24-b254-d5f83fde4dbe/images/b8eb8c82-fddd-4fbc-b80d-6ee04c1255bc/7190913d-320c-4fc9-a5b3-c55b26aa30f4.lease

Then you can look in /var/log/sanlock.log:

2018-05-17 11:35:18 243132 [14847]: s2:r9 resource
5c4d2216-2eb3-4e24-b254-d5f83fde4dbe:7190913d-320c-4fc9-a5b3-c55b26aa30f4:/rhev/data-center/mnt/10.35.0.233:_root_storage__domains_sd1/5c4d2216-2eb3-4e24-b254-d5f83fde4dbe/images/b8eb8c82-fddd-4fbc-b80d-6ee04c1255bc/7190913d-320c-4fc9-a5b3-c55b26aa30f4.lease:0
for 2,9,5049

Then you can use this command to unlock; the pid in this case is 5049:

sanlock client release -r RESOURCE -p pid

On Thu, May 17, 2018 at 11:52 AM, Benny Zlotnik <bzlot...@redhat.com> wrote:

I believe you've hit this
bug: https://bugzilla.redhat.com/show_bug.cgi?id=1565040 [4]

You can try to release the lease manually using the sanlock client
command (there's an example in the comments on the bug). Once the lease
is free, the job will fail and the disk can be unlocked.

On Thu, May 17, 2018 at 11:05 AM, <nico...@devels.es> wrote:

Hi,

We're running oVirt 4.1.9 (I know it's not the recommended version, but
we can't upgrade yet) and recently we had an issue with a Storage Domain
while a VM was moving a disk. The Storage Domain went down for a few
minutes, then it came back.

However, the disk's state is stuck at 'Migrating: 10%' (see ss-2.png).

I ran the 'unlock_entity.sh' script to try to unlock the disk, with
these parameters:

 # PGPASSWORD=... /usr/share/ovirt-engine/setup/dbutils/unlock_entity.sh -t disk -u engine -v b4013aba-a936-4a54-bb14-670d3a8b7c38

The disk's state changed to 'OK', but the actual state still shows that
it's migrating (see ss-1.png).

Calling the script with -t all doesn't make a difference either.

Currently, the disk is unmanageable: it cannot be deactivated, moved or
copied, as it says there's a copying operation running already.

Could someone provide a way to unlock this disk? I don't mind modifying
a value directly in the database; I just need the copying process
cancelled.

Thanks.
_______________________________________________
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org

Links:
------
[1] https://lists.ovirt.org/pipermail/devel/2016-July/013535.html
[2] https://www.ovirt.org/develop/developer-guide/vdsm/log-files/
[3] http://10.35.0.
[4] https://bugzilla.redhat.com/show_bug.cgi?id=1565040
[5] http://10.35.0