Re: [ClusterLabs] heads up: Possible VM data corruption upgrading to SLES15 SP3

2022-01-27 Thread Andrei Borzenkov
On Thu, Jan 27, 2022 at 5:10 PM Ulrich Windl wrote:
>
> Any better ideas anyone?
>

Perform an online upgrade. Is there any reason you need to do an offline
upgrade in the first place?
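
Roughly something like the sketch below (untested; it assumes crmsh and the
SUSE zypper migration plugin are available, the node name is only a
placeholder, and the reboot step is left out): drain the node so the Xen
guests live-migrate away, migrate the service pack online, then bring the
node back.

#!/usr/bin/env python3
"""Rough sketch of a rolling (online) service-pack migration of one node.

Assumes crmsh ("crm") is used for cluster control and that the SUSE
zypper migration plugin is installed; the node name is only an example.
Not a tested procedure.
"""
import subprocess

NODE = "node1"  # example name of the node being upgraded


def run(cmd):
    print("+", " ".join(cmd))
    subprocess.run(cmd, check=True)


# 1. Drain the node: with live migration configured, the Xen guests
#    move to the other cluster nodes and keep running.
run(["crm", "node", "standby", NODE])

# 2. Online service-pack migration (e.g. SP2 -> SP3) on the running system.
run(["zypper", "migration"])

# 3. After rebooting into the new kernel, bring the node back.
run(["crm", "node", "online", NODE])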


Re: [ClusterLabs] heads up: Possible VM data corruption upgrading to SLES15 SP3

2022-01-27 Thread Strahil Nikolov via Users
Are you using HA-LVM or CLVM?

Best Regards,
Strahil Nikolov
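
If it helps to answer that, here is a rough way to check on a node. It is
only a sketch that greps the standard lvm.conf keys (locking_type,
use_lvmlockd); the interpretation is a heuristic, not authoritative.

#!/usr/bin/env python3
"""Sketch: guess which LVM locking scheme a node is configured for.

Only looks at the standard lvm.conf keys; the interpretation is a
heuristic.
"""
import re

with open("/etc/lvm/lvm.conf") as f:
    conf = f.read()


def setting(name):
    """Return the last uncommented 'name = value' in lvm.conf, if any."""
    hits = re.findall(rf"^\s*{name}\s*=\s*(\S+)", conf, re.M)
    return hits[-1] if hits else None


locking_type = setting("locking_type")  # "3" meant clvmd on older LVM2
use_lvmlockd = setting("use_lvmlockd")  # "1" means lvmlockd / shared VGs

if use_lvmlockd == "1" or locking_type == "3":
    print("cluster-wide LVM locking (lvmlockd/clvmd) seems to be enabled")
else:
    print("no cluster locking configured; looks like plain or HA-LVM "
          "(exclusive activation)")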
 
On Thu, Jan 27, 2022 at 16:10, Ulrich Windl wrote:

> Hi!
>
> I know this is semi-offtopic, but I think it's important:
> I've upgraded one cluster node (a Xen host) from SLES15 SP2 to SLES15 SP3
> by booting a virtual DVD (i.e. the upgrade environment is loaded from that DVD).
> Watching the syslog while YaST was searching for systems to upgrade, I noticed
> that it tried to mount _every_ disk read-only.
> (We use multipathed FC SAN disks that are attached to the VMs as block devices,
> so they look like "normal" disks.)
>
> On my first attempt I did not enable multipath, as it is not needed to upgrade
> the OS (the system VG is single-pathed), but then LVM complained about multiple
> disks having the same ID.
> On the second attempt I did activate multipathing, but then YaST mounted every
> disk and tried to assemble every MDRAID it found, even when it was on shared
> storage and thus actively in use by the other cluster nodes.
>
> To make things worse, even when mounting read-only, XFS (for example) tries to
> "recover" a filesystem when it thinks it is dirty.
> I found no way to avoid that mounting (a support case at SUSE is in progress).
>
> Fortunately, if the VMs have been running for a significant time, most blocks
> are cached inside the VM and blocks are "mostly written" rather than read. So
> most likely the badly recovered blocks are overwritten with good data before
> the machine reboots and actually reads the bad blocks.
>
> The most obvious "solution", stopping every VM in the whole cluster before
> upgrading a single node, is not very HA-like, unfortunately.
>
> Any better ideas anyone?
>
> Regards,
> Ulrich





[ClusterLabs] heads up: Possible VM data corruption upgrading to SLES15 SP3

2022-01-27 Thread Ulrich Windl
Hi!

I know this is semi-offtopic, but I think it's important:
I've upgraded one cluster node (a Xen host) from SLES15 SP2 to SLES15 SP3
by booting a virtual DVD (i.e. the upgrade environment is loaded from that DVD).
Watching the syslog while YaST was searching for systems to upgrade, I noticed
that it tried to mount _every_ disk read-only.
(We use multipathed FC SAN disks that are attached to the VMs as block devices,
so they look like "normal" disks.)

On my first attempt I did not enable multipath, as it is not needed to upgrade
the OS (the system VG is single-pathed), but then LVM complained about multiple
disks having the same ID.
On the second attempt I did activate multipathing, but then YaST mounted every
disk and tried to assemble every MDRAID it found, even when it was on shared
storage and thus actively in use by the other cluster nodes.
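
For the duplicate-ID part, restricting LVM scanning to the multipath maps is
the usual workaround. Below is only a sketch that checks for a global_filter in
lvm.conf; the suggested filter pattern is just an example, and whether the
upgrade environment honours the installed lvm.conf at all is another question.

#!/usr/bin/env python3
"""Sketch: check whether lvm.conf limits PV scanning to multipath maps.

A global_filter that only accepts /dev/mapper/* (plus the local system
disk) keeps LVM from seeing each SAN LUN once per path, which is what
produces the duplicate-PV complaints.
"""
import re

with open("/etc/lvm/lvm.conf") as f:
    conf = f.read()

m = re.search(r"^\s*global_filter\s*=\s*(\[[^\]]*\])", conf, re.M)

if m:
    print("global_filter is set:", m.group(1))
else:
    print("no global_filter set; something like")
    print('  global_filter = [ "a|^/dev/mapper/|", "a|^/dev/sda|", "r|.*|" ]')
    print("would restrict scanning to multipath maps and the local disk")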

To make things worse, even when mounting read-only, XFS (for example) tries to
"recover" a filesystem when it thinks it is dirty.
I found no way to avoid that mounting (a support case at SUSE is in progress).
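
For manual mounts there is at least XFS's "norecovery" option, which skips log
replay on a read-only mount; whether the YaST probing code can be told to use
it is part of the open question. A sketch, with an example device path:

#!/usr/bin/env python3
"""Sketch: read-only XFS mount without log replay.

"ro,norecovery" is the documented XFS way to avoid touching a dirty log;
the device path below is only an example.
"""
import subprocess

DEVICE = "/dev/mapper/example-lun"  # example: a shared SAN LUN
MOUNTPOINT = "/mnt/inspect"

subprocess.run(["mkdir", "-p", MOUNTPOINT], check=True)
subprocess.run(
    ["mount", "-t", "xfs", "-o", "ro,norecovery", DEVICE, MOUNTPOINT],
    check=True,
)
print(f"{DEVICE} mounted read-only at {MOUNTPOINT} without log recovery")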

Fortunately, if the VMs have been running for a significant time, most blocks
are cached inside the VM and blocks are "mostly written" rather than read. So
most likely the badly recovered blocks are overwritten with good data before
the machine reboots and actually reads the bad blocks.

The most obvious "solution", stopping every VM in the whole cluster before
upgrading a single node, is not very HA-like, unfortunately.

Any better ideas anyone?

Regards,
Ulrich



___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/