Re: [ovirt-users] Sanlock add Lockspace Errors
On 6/3/2016 at 6:37 PM, Nir Soffer wrote:
> On Fri, Jun 3, 2016 at 11:27 AM, InterNetX - Juergen Gotteswinter wrote:
>> What if we move all VMs off the LUN which causes this error, drop the
>> LUN and recreate it? Will we "migrate" the error with the VMs to a
>> different LUN, or could this be a fix?
>
> This should fix the ids file, but since we don't know why this
> corruption happened, it may happen again.

I am pretty sure I know when and why this happened: these messages first
occurred after a major outage in which the engine went crazy fencing
hosts, plus a crash / hard reset of the SAN. But I can provide a log
package, no problem.

> Please open a bug with the log I requested so we can investigate this
> issue.
>
> To fix the ids file you don't have to recreate the LUN, just
> initialize the ids lv.
>
> 1. Put the domain into maintenance (via engine)
>
>    No host should access it while you reconstruct the ids file.
>
> 2. Activate the ids lv
>
>    You may need to connect to this iSCSI target first, unless you have
>    other vgs connected on the same target.
>
>    lvchange -ay sd_uuid/ids
>
> 3. Initialize the lockspace
>
>    sanlock direct init -s sd_uuid:0:/dev/sd_uuid/ids:0
>
> 4. Deactivate the ids lv
>
>    lvchange -an sd_uuid/ids
>
> 5. Activate the domain (via engine)
>
>    The domain should become active after a while.

Oh, this is great; I am going to announce a maintenance window. Thanks a
lot, this was already starting to drive me crazy. I will report back
after we have done this!

> Nir
>
>> On 6/3/2016 at 10:08 AM, InterNetX - Juergen Gotteswinter wrote:
>>> Hello David,
>>>
>>> thanks for your explanation of those messages. Is there any
>>> possibility to get rid of this? I already figured out that it might
>>> be a corruption of the ids file, but I didn't find anything about
>>> re-creating it or other solutions to fix this.
>>>
>>> IMHO this occurred after an outage in which several hosts and the
>>> iSCSI SAN were fenced and/or rebooted.
>>>
>>> Thanks,
>>>
>>> Juergen
>>>
>>> On 6/2/2016 at 6:03 PM, David Teigland wrote:
>>>> On Thu, Jun 02, 2016 at 06:47:37PM +0300, Nir Soffer wrote:
>>>>>> This is a mess that's been caused by improper use of storage, and
>>>>>> various sanity checks in sanlock have all reported errors for
>>>>>> "impossible" conditions indicating that something catastrophic
>>>>>> has been done to the storage it's using. Some fundamental rules
>>>>>> are not being followed.
>>>>>
>>>>> Thanks David.
>>>>>
>>>>> Do you need more output from sanlock to understand this issue?
>>>>
>>>> I can think of nothing more to learn from sanlock. I'd suggest
>>>> tighter, higher level checking or control of storage. Low level
>>>> sanity checks detecting lease corruption are not a convenient place
>>>> to work from.

___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users
Re: [ovirt-users] Sanlock add Lockspace Errors
On Fri, Jun 3, 2016 at 11:27 AM, InterNetX - Juergen Gotteswinter wrote:
> What if we move all VMs off the LUN which causes this error, drop the
> LUN and recreate it? Will we "migrate" the error with the VMs to a
> different LUN, or could this be a fix?

This should fix the ids file, but since we don't know why this corruption
happened, it may happen again.

Please open a bug with the log I requested so we can investigate this
issue.

To fix the ids file you don't have to recreate the LUN, just initialize
the ids lv.

1. Put the domain into maintenance (via engine)

   No host should access it while you reconstruct the ids file.

2. Activate the ids lv

   You may need to connect to this iSCSI target first, unless you have
   other vgs connected on the same target.

   lvchange -ay sd_uuid/ids

3. Initialize the lockspace

   sanlock direct init -s sd_uuid:0:/dev/sd_uuid/ids:0

4. Deactivate the ids lv

   lvchange -an sd_uuid/ids

5. Activate the domain (via engine)

   The domain should become active after a while.

Nir

> On 6/3/2016 at 10:08 AM, InterNetX - Juergen Gotteswinter wrote:
>> Hello David,
>>
>> thanks for your explanation of those messages. Is there any
>> possibility to get rid of this? I already figured out that it might be
>> a corruption of the ids file, but I didn't find anything about
>> re-creating it or other solutions to fix this.
>>
>> IMHO this occurred after an outage in which several hosts and the
>> iSCSI SAN were fenced and/or rebooted.
>>
>> Thanks,
>>
>> Juergen
>>
>> On 6/2/2016 at 6:03 PM, David Teigland wrote:
>>> On Thu, Jun 02, 2016 at 06:47:37PM +0300, Nir Soffer wrote:
>>>>> This is a mess that's been caused by improper use of storage, and
>>>>> various sanity checks in sanlock have all reported errors for
>>>>> "impossible" conditions indicating that something catastrophic has
>>>>> been done to the storage it's using. Some fundamental rules are
>>>>> not being followed.
>>>>
>>>> Thanks David.
>>>>
>>>> Do you need more output from sanlock to understand this issue?
>>>
>>> I can think of nothing more to learn from sanlock. I'd suggest
>>> tighter, higher level checking or control of storage. Low level
>>> sanity checks detecting lease corruption are not a convenient place
>>> to work from.
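[Editorial note] Nir's recovery steps above can be sketched as a small shell script. This is only an illustration, not an official oVirt procedure: SD_UUID is a placeholder you must replace with your real storage domain UUID (the example value is taken from this thread), and by default the script runs in dry-run mode and only prints the commands it would execute.

```shell
#!/bin/sh
# Sketch of the ids reinitialisation steps above.
# Assumption: the storage domain is already in maintenance via the
# engine (step 1), and the iSCSI target is connected.
set -eu

SD_UUID="${SD_UUID:-f757b127-a951-4fa9-bf90-81180c0702e6}"  # placeholder: your storage domain UUID
DRY_RUN="${DRY_RUN:-1}"                                     # 1 = only print the commands

# run: echo the command in dry-run mode, execute it otherwise
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

run lvchange -ay "$SD_UUID/ids"                              # step 2: activate the ids lv
run sanlock direct init -s "$SD_UUID:0:/dev/$SD_UUID/ids:0"  # step 3: initialize the lockspace
run lvchange -an "$SD_UUID/ids"                              # step 4: deactivate the ids lv
# step 5: re-activate the domain via the engine
```

Run it once with the default DRY_RUN=1, review the printed commands against the steps above, and only then rerun with DRY_RUN=0.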
Re: [ovirt-users] Sanlock add Lockspace Errors
What if we move all VMs off the LUN which causes this error, drop the LUN
and recreate it? Will we "migrate" the error with the VMs to a different
LUN, or could this be a fix?

On 6/3/2016 at 10:08 AM, InterNetX - Juergen Gotteswinter wrote:
> Hello David,
>
> thanks for your explanation of those messages. Is there any possibility
> to get rid of this? I already figured out that it might be a corruption
> of the ids file, but I didn't find anything about re-creating it or
> other solutions to fix this.
>
> IMHO this occurred after an outage in which several hosts and the iSCSI
> SAN were fenced and/or rebooted.
>
> Thanks,
>
> Juergen
>
> On 6/2/2016 at 6:03 PM, David Teigland wrote:
>> On Thu, Jun 02, 2016 at 06:47:37PM +0300, Nir Soffer wrote:
>>>> This is a mess that's been caused by improper use of storage, and
>>>> various sanity checks in sanlock have all reported errors for
>>>> "impossible" conditions indicating that something catastrophic has
>>>> been done to the storage it's using. Some fundamental rules are not
>>>> being followed.
>>>
>>> Thanks David.
>>>
>>> Do you need more output from sanlock to understand this issue?
>>
>> I can think of nothing more to learn from sanlock. I'd suggest
>> tighter, higher level checking or control of storage. Low level sanity
>> checks detecting lease corruption are not a convenient place to work
>> from.
Re: [ovirt-users] Sanlock add Lockspace Errors
Hello David,

thanks for your explanation of those messages. Is there any possibility
to get rid of this? I already figured out that it might be a corruption
of the ids file, but I didn't find anything about re-creating it or other
solutions to fix this.

IMHO this occurred after an outage in which several hosts and the iSCSI
SAN were fenced and/or rebooted.

Thanks,

Juergen

On 6/2/2016 at 6:03 PM, David Teigland wrote:
> On Thu, Jun 02, 2016 at 06:47:37PM +0300, Nir Soffer wrote:
>>> This is a mess that's been caused by improper use of storage, and
>>> various sanity checks in sanlock have all reported errors for
>>> "impossible" conditions indicating that something catastrophic has
>>> been done to the storage it's using. Some fundamental rules are not
>>> being followed.
>>
>> Thanks David.
>>
>> Do you need more output from sanlock to understand this issue?
>
> I can think of nothing more to learn from sanlock. I'd suggest tighter,
> higher level checking or control of storage. Low level sanity checks
> detecting lease corruption are not a convenient place to work from.
Re: [ovirt-users] Sanlock add Lockspace Errors
On Thu, Jun 02, 2016 at 06:47:37PM +0300, Nir Soffer wrote:
>> This is a mess that's been caused by improper use of storage, and
>> various sanity checks in sanlock have all reported errors for
>> "impossible" conditions indicating that something catastrophic has
>> been done to the storage it's using. Some fundamental rules are not
>> being followed.
>
> Thanks David.
>
> Do you need more output from sanlock to understand this issue?

I can think of nothing more to learn from sanlock. I'd suggest tighter,
higher level checking or control of storage. Low level sanity checks
detecting lease corruption are not a convenient place to work from.
Re: [ovirt-users] Sanlock add Lockspace Errors
On Thu, Jun 2, 2016 at 6:35 PM, David Teigland wrote:
>> verify_leader 2 wrong space name
>> 4643f652-8014-4951-8a1a-02af41e67d08
>> f757b127-a951-4fa9-bf90-81180c0702e6
>> /dev/f757b127-a951-4fa9-bf90-81180c0702e6/ids
>
>> leader1 delta_acquire_begin error -226 lockspace
>> f757b127-a951-4fa9-bf90-81180c0702e6 host_id 2
>
> VDSM has tried to join VG/lockspace/storage-domain "f757b127" on LV
> /dev/f757b127-a951-4fa9-bf90-81180c0702e6/ids. But sanlock finds that
> lockspace "4643f652" is initialized on that storage, i.e. an
> inconsistency between the leases formatted on disk and what the leases
> are being used for. That should never happen unless sanlock and/or
> storage are used/moved/copied wrongly. The error is a sanlock sanity
> check to catch misuse.
>
>> s1527 check_other_lease invalid for host 0 0 ts 7566376 name in
>> 4643f652-8014-4951-8a1a-02af41e67d08
>
>> s1527 check_other_lease leader 12212010 owner 1 11 ts 7566376
>> sn f757b127-a951-4fa9-bf90-81180c0702e6 rn
>> f888524b-27aa-4724-8bae-051f9e950a21.vm1.intern
>
> Apparently sanlock is already managing a lockspace called "4643f652"
> when it finds another lease in that lockspace has the
> inconsistent/corrupt name "f757b127". I can't say what steps might have
> been done to lead to this.
>
> This is a mess that's been caused by improper use of storage, and
> various sanity checks in sanlock have all reported errors for
> "impossible" conditions indicating that something catastrophic has been
> done to the storage it's using. Some fundamental rules are not being
> followed.

Thanks David.

Do you need more output from sanlock to understand this issue?

Juergen, can you open an oVirt bug, and include sanlock and vdsm logs
from the time this error started?

Thanks,
Nir
Re: [ovirt-users] Sanlock add Lockspace Errors
> verify_leader 2 wrong space name
> 4643f652-8014-4951-8a1a-02af41e67d08
> f757b127-a951-4fa9-bf90-81180c0702e6
> /dev/f757b127-a951-4fa9-bf90-81180c0702e6/ids

> leader1 delta_acquire_begin error -226 lockspace
> f757b127-a951-4fa9-bf90-81180c0702e6 host_id 2

VDSM has tried to join VG/lockspace/storage-domain "f757b127" on LV
/dev/f757b127-a951-4fa9-bf90-81180c0702e6/ids. But sanlock finds that
lockspace "4643f652" is initialized on that storage, i.e. an
inconsistency between the leases formatted on disk and what the leases
are being used for. That should never happen unless sanlock and/or
storage are used/moved/copied wrongly. The error is a sanlock sanity
check to catch misuse.

> s1527 check_other_lease invalid for host 0 0 ts 7566376 name in
> 4643f652-8014-4951-8a1a-02af41e67d08

> s1527 check_other_lease leader 12212010 owner 1 11 ts 7566376
> sn f757b127-a951-4fa9-bf90-81180c0702e6 rn
> f888524b-27aa-4724-8bae-051f9e950a21.vm1.intern

Apparently sanlock is already managing a lockspace called "4643f652" when
it finds another lease in that lockspace has the inconsistent/corrupt
name "f757b127". I can't say what steps might have been done to lead to
this.

This is a mess that's been caused by improper use of storage, and various
sanity checks in sanlock have all reported errors for "impossible"
conditions indicating that something catastrophic has been done to the
storage it's using. Some fundamental rules are not being followed.
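[Editorial note] To make the mismatch David describes easier to see, the two UUIDs in a verify_leader message can be pulled apart with awk. The field positions below are an assumption based on the message format quoted in this thread (the name found on disk comes first, the requested name second, the lease path third):

```shell
# Split a sanlock "verify_leader ... wrong space name" message into the
# lockspace name found on disk vs. the one VDSM asked to join.
# Sample line taken from the logs quoted in this thread.
line='verify_leader 2 wrong space name 4643f652-8014-4951-8a1a-02af41e67d08 f757b127-a951-4fa9-bf90-81180c0702e6 /dev/f757b127-a951-4fa9-bf90-81180c0702e6/ids'

echo "$line" | awk '/wrong space name/ {
    printf "on-disk lockspace:   %s\n", $6   # what sanlock found in the ids lease
    printf "requested lockspace: %s\n", $7   # what VDSM tried to join
    printf "lease path:          %s\n", $8
}'
```

If the two UUIDs differ, the ids LV carries leases for a different storage domain than the one VDSM expects, which is exactly the corruption discussed here.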
Re: [ovirt-users] Sanlock add Lockspace Errors
On Mon, May 30, 2016 at 11:06 AM, InterNetX - Juergen Gotteswinter wrote:
> Hi,
>
> for some time we have been getting error messages from sanlock, and so
> far I was not able to figure out what exactly they are trying to tell
> us and, more importantly, whether this is something that can be ignored
> or needs to be fixed (and how).

Sanlock error messages are somewhat cryptic; hopefully David can explain
them.

>
> Here are the versions we are currently using:
>
> Engine
>
> ovirt-engine-3.5.6.2-1.el6.noarch
>
> Nodes
>
> vdsm-4.16.34-0.el7.centos.x86_64
> sanlock-3.2.4-1.el7.x86_64
> libvirt-lock-sanlock-1.2.17-13.el7_2.3.x86_64
> libvirt-daemon-1.2.17-13.el7_2.3.x86_64
> libvirt-lock-sanlock-1.2.17-13.el7_2.3.x86_64
> libvirt-1.2.17-13.el7_2.3.x86_64
>
> -- snip --
> May 30 09:55:27 vm2 sanlock[1094]: 2016-05-30 09:55:27+0200 294109 [60137]: verify_leader 2 wrong space name 4643f652-8014-4951-8a1a-02af41e67d08 f757b127-a951-4fa9-bf90-81180c0702e6 /dev/f757b127-a951-4fa9-bf90-81180c0702e6/ids
> May 30 09:55:27 vm2 sanlock[1094]: 2016-05-30 09:55:27+0200 294109 [60137]: leader1 delta_acquire_begin error -226 lockspace f757b127-a951-4fa9-bf90-81180c0702e6 host_id 2
> May 30 09:55:27 vm2 sanlock[1094]: 2016-05-30 09:55:27+0200 294109 [60137]: leader2 path /dev/f757b127-a951-4fa9-bf90-81180c0702e6/ids offset 0
> May 30 09:55:27 vm2 sanlock[1094]: 2016-05-30 09:55:27+0200 294109 [60137]: leader3 m 12212010 v 30003 ss 512 nh 0 mh 1 oi 2 og 8 lv 0
> May 30 09:55:27 vm2 sanlock[1094]: 2016-05-30 09:55:27+0200 294109 [60137]: leader4 sn 4643f652-8014-4951-8a1a-02af41e67d08 rn 1eed8aa9-8fb5-4d27-8d1c-03ebce2c36d4.vm2.intern ts 3786679 cs 1474f033
> May 30 09:55:28 vm2 sanlock[1094]: 2016-05-30 09:55:28+0200 294110 [1099]: s9703 add_lockspace fail result -226
> May 30 09:55:58 vm2 sanlock[1094]: 2016-05-30 09:55:58+0200 294140 [60331]: verify_leader 2 wrong space name 4643f652-8014-4951-8a1a-02af41e67d08 f757b127-a951-4fa9-bf90-81180c0702e6 /dev/f757b127-a951-4fa9-bf90-81180c0702e6/ids
> May 30 09:55:58 vm2 sanlock[1094]: 2016-05-30 09:55:58+0200 294140 [60331]: leader1 delta_acquire_begin error -226 lockspace f757b127-a951-4fa9-bf90-81180c0702e6 host_id 2
> May 30 09:55:58 vm2 sanlock[1094]: 2016-05-30 09:55:58+0200 294140 [60331]: leader2 path /dev/f757b127-a951-4fa9-bf90-81180c0702e6/ids offset 0
> May 30 09:55:58 vm2 sanlock[1094]: 2016-05-30 09:55:58+0200 294140 [60331]: leader3 m 12212010 v 30003 ss 512 nh 0 mh 1 oi 2 og 8 lv 0
> May 30 09:55:58 vm2 sanlock[1094]: 2016-05-30 09:55:58+0200 294140 [60331]: leader4 sn 4643f652-8014-4951-8a1a-02af41e67d08 rn 1eed8aa9-8fb5-4d27-8d1c-03ebce2c36d4.vm2.intern ts 3786679 cs 1474f033
> May 30 09:55:59 vm2 sanlock[1094]: 2016-05-30 09:55:59+0200 294141 [1098]: s9704 add_lockspace fail result -226
> May 30 09:56:05 vm2 sanlock[1094]: 2016-05-30 09:56:05+0200 294148 [1094]: s1527 check_other_lease invalid for host 0 0 ts 7566376 name in 4643f652-8014-4951-8a1a-02af41e67d08
> May 30 09:56:05 vm2 sanlock[1094]: 2016-05-30 09:56:05+0200 294148 [1094]: s1527 check_other_lease leader 12212010 owner 1 11 ts 7566376 sn f757b127-a951-4fa9-bf90-81180c0702e6 rn f888524b-27aa-4724-8bae-051f9e950a21.vm1.intern
> May 30 09:56:28 vm2 sanlock[1094]: 2016-05-30 09:56:28+0200 294170 [60496]: verify_leader 2 wrong space name 4643f652-8014-4951-8a1a-02af41e67d08 f757b127-a951-4fa9-bf90-81180c0702e6 /dev/f757b127-a951-4fa9-bf90-81180c0702e6/ids
> May 30 09:56:28 vm2 sanlock[1094]: 2016-05-30 09:56:28+0200 294170 [60496]: leader1 delta_acquire_begin error -226 lockspace f757b127-a951-4fa9-bf90-81180c0702e6 host_id 2
> May 30 09:56:28 vm2 sanlock[1094]: 2016-05-30 09:56:28+0200 294170 [60496]: leader2 path /dev/f757b127-a951-4fa9-bf90-81180c0702e6/ids offset 0
> May 30 09:56:28 vm2 sanlock[1094]: 2016-05-30 09:56:28+0200 294170 [60496]: leader3 m 12212010 v 30003 ss 512 nh 0 mh 1 oi 2 og 8 lv 0
> May 30 09:56:28 vm2 sanlock[1094]: 2016-05-30 09:56:28+0200 294170 [60496]: leader4 sn 4643f652-8014-4951-8a1a-02af41e67d08 rn 1eed8aa9-8fb5-4d27-8d1c-03ebce2c36d4.vm2.intern ts 3786679 cs 1474f033
> May 30 09:56:29 vm2 sanlock[1094]: 2016-05-30 09:56:29+0200 294171 [6415]: s9705 add_lockspace fail result -226
> May 30 09:56:58 vm2 sanlock[1094]: 2016-05-30 09:56:58+0200 294200 [60645]: verify_leader 2 wrong space name 4643f652-8014-4951-8a1a-02af41e67d08 f757b127-a951-4fa9-bf90-81180c0702e6 /dev/f757b127-a951-4fa9-bf90-81180c0702e6/ids
> May 30 09:56:58 vm2 sanlock[1094]: 2016-05-30 09:56:58+0200 294200 [60645]: leader1 delta_acquire_begin error -226 lockspace f757b127-a951-4fa9-bf90-81180c0702e6 host_id 2
> May 30 09:56:58 vm2 sanlock[1094]: 2016-05-30 09:56:58+0200 294200 [60645]: leader2 path /dev/f757b127-a951-4fa9-bf90-81180c0702e6/ids offset 0
> May 30 09:56:58 vm2 sanlock[1094]: 2016-05-30 09:56:58+0200 294200 [60645]: leader3 m 12212010 v 30003 ss 512 nh 0 mh 1 oi 2 og 8 lv 0
> May 30 09:56:58 vm2 sanlock[1094]: 2016-05-30 09:56:58+0
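[Editorial note] Given a log like the excerpt above, a quick grep shows how often sanlock keeps retrying and failing to add the lockspace (the -226 failures repeat roughly every 30 seconds). A minimal sketch; the here-document stands in for real log input, and in production you would feed this from /var/log/messages or journalctl output instead (assuming sanlock logs to syslog/the journal on your hosts, as it does in the excerpt):

```shell
# Count repeated add_lockspace failures with result -226.
# The sample lines are shortened copies of entries from this thread.
cat <<'EOF' | grep -c 'add_lockspace fail result -226'
May 30 09:55:28 vm2 sanlock[1094]: 2016-05-30 09:55:28+0200 294110 [1099]: s9703 add_lockspace fail result -226
May 30 09:55:59 vm2 sanlock[1094]: 2016-05-30 09:55:59+0200 294141 [1098]: s9704 add_lockspace fail result -226
May 30 09:56:05 vm2 sanlock[1094]: 2016-05-30 09:56:05+0200 294148 [1094]: s1527 check_other_lease invalid for host 0 0 ts 7566376
EOF
```

With the three sample lines this prints 2: the two add_lockspace failures match and the check_other_lease line does not. A steadily growing count against real logs means the host is stuck in the retry loop until the ids lease is repaired.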