[ovirt-users] Re: [Gluster-users] Re: VM disk corruption with LSM on Gluster
Hi Krutika, Leo,

Sounds promising. I will test this too, and report back tomorrow (or maybe
sooner, if corruption occurs again).

-- Sander

On 27-03-19 10:00, Krutika Dhananjay wrote:
> This is needed to prevent any inconsistencies stemming from buffered
> writes/caching file data during live VM migration.
> Besides, for Gluster to truly honor direct-io behavior in qemu's
> 'cache=none' mode (which is what oVirt uses),
> one needs to turn on performance.strict-o-direct and disable remote-dio.
>
> -Krutika
>
> On Wed, Mar 27, 2019 at 12:24 PM Leo David <leoa...@gmail.com> wrote:
>> Hi,
>> I can confirm that after setting these two options, I haven't
>> encountered disk corruptions anymore.
>> The downside is that, at least for me, it had a pretty big impact
>> on performance. The iops really went down - performing fio tests
>> inside the vm.
>>
>> On Wed, Mar 27, 2019, 07:03 Krutika Dhananjay <kdhan...@redhat.com> wrote:
>>> Could you enable strict-o-direct and disable remote-dio on the
>>> src volume as well, restart the vms on "old" and retry migration?
>>>
>>> # gluster volume set <VOLNAME> performance.strict-o-direct on
>>> # gluster volume set <VOLNAME> network.remote-dio off
>>>
>>> -Krutika
>>>
>>> On Tue, Mar 26, 2019 at 10:32 PM Sander Hoentjen <san...@hoentjen.eu> wrote:
>>>> On 26-03-19 14:23, Sahina Bose wrote:
>>>>> +Krutika Dhananjay and gluster ml
>>>>>
>>>>> On Tue, Mar 26, 2019 at 6:16 PM Sander Hoentjen <san...@hoentjen.eu> wrote:
>>>>>> Hello,
>>>>>>
>>>>>> tl;dr We have disk corruption when doing live storage migration
>>>>>> on oVirt 4.2 with gluster 3.12.15. Any idea why?
>>>>>> [...]
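For readers who want to apply Krutika's suggestion verbatim, a minimal
command sequence might look like the following (SRC_VOL is a placeholder
for the source volume name; the `gluster volume get` calls are only there
to verify the options took effect):

    gluster volume set SRC_VOL performance.strict-o-direct on
    gluster volume set SRC_VOL network.remote-dio off
    # verify the new values
    gluster volume get SRC_VOL performance.strict-o-direct
    gluster volume get SRC_VOL network.remote-dio

Note that, as Krutika says, running VMs have to be restarted before
retrying the migration, since qemu only picks up direct-io behavior when
it reopens the disk.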
[ovirt-users] Re: VM disk corruption with LSM on Gluster
On 26-03-19 14:23, Sahina Bose wrote:
> +Krutika Dhananjay and gluster ml
>
> On Tue, Mar 26, 2019 at 6:16 PM Sander Hoentjen wrote:
>> Hello,
>>
>> tl;dr We have disk corruption when doing live storage migration on oVirt
>> 4.2 with gluster 3.12.15. Any idea why?
>>
>> We have a 3-node oVirt cluster that is both compute and gluster-storage.
>> The manager runs on separate hardware. We are running out of space on
>> this volume, so we added another Gluster volume that is bigger, put a
>> storage domain on it and then we migrated VM's to it with LSM. After
>> some time, we noticed that (some of) the migrated VM's had corrupted
>> filesystems. After moving everything back with export-import to the old
>> domain where possible, and recovering from backups where needed, we set
>> off to investigate this issue.
>>
>> We are now at the point where we can reproduce this issue within a day.
>> What we have found so far:
>> 1) The corruption occurs at the very end of the replication step, most
>> probably between START and FINISH of diskReplicateFinish, before the
>> START merge step.
>> 2) In the corrupted VM, at some place where data should be, this data is
>> replaced by zeros. This can be file contents or a directory structure
>> or whatever.
>> 3) The source gluster volume has different settings than the destination
>> (mostly because the defaults were different at creation time):
>>
>> Setting                        old (src)   new (dst)
>> cluster.op-version             30800       30800 (the same)
>> cluster.max-op-version         31202       31202 (the same)
>> cluster.metadata-self-heal     off         on
>> cluster.data-self-heal         off         on
>> cluster.entry-self-heal        off         on
>> performance.low-prio-threads   16          32
>> performance.strict-o-direct    off         on
>> network.ping-timeout           42          30
>> network.remote-dio             enable      off
>> transport.address-family       -           inet
>> performance.stat-prefetch      off         on
>> features.shard-block-size      512MB       64MB
>> cluster.shd-max-threads        1           8
>> cluster.shd-wait-qlength       1024        1
>> cluster.locking-scheme         full        granular
>> cluster.granular-entry-heal    no          enable
>>
>> 4) To test, we migrate some VM's back and forth. The corruption does not
>> occur every time. To this point it only occurs from old to new, but we
>> don't have enough data points to be sure about that.
>>
>> Anybody an idea what is causing the corruption? Is this the best list to
>> ask, or should I ask on a Gluster list? I am not sure if this is oVirt
>> specific or Gluster specific though.
> Do you have logs from old and new gluster volumes? Any errors in the
> new volume's fuse mount logs?

Around the time of corruption I see the message:

The message "I [MSGID: 133017] [shard.c:4941:shard_seek]
0-ZoneA_Gluster1-shard: seek called on
7fabc273-3d8a-4a49-8906-b8ccbea4a49f. [Operation not supported]" repeated
231 times between [2019-03-26 13:14:22.297333] and
[2019-03-26 13:15:42.912170]

I also see this message at other times, when I don't see the corruption
occur, though.

-- Sander
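To check for the same shard_seek messages on another setup, something like
the following should work against the fuse mount logs Sander is quoting
from. The log location is the usual default for oVirt gluster storage
domains (the file name mirrors the mount point with slashes replaced by
dashes), so adjust it to your environment:

    # count shard_seek occurrences per fuse mount log
    grep -c "shard_seek" /var/log/glusterfs/rhev-data-center-mnt-glusterSD-*.log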
[ovirt-users] VM disk corruption with LSM on Gluster
Hello,

tl;dr We have disk corruption when doing live storage migration on oVirt
4.2 with gluster 3.12.15. Any idea why?

We have a 3-node oVirt cluster that is both compute and gluster-storage.
The manager runs on separate hardware. We are running out of space on
this volume, so we added another Gluster volume that is bigger, put a
storage domain on it and then we migrated VM's to it with LSM. After some
time, we noticed that (some of) the migrated VM's had corrupted
filesystems. After moving everything back with export-import to the old
domain where possible, and recovering from backups where needed, we set
off to investigate this issue.

We are now at the point where we can reproduce this issue within a day.
What we have found so far:
1) The corruption occurs at the very end of the replication step, most
probably between START and FINISH of diskReplicateFinish, before the
START merge step.
2) In the corrupted VM, at some place where data should be, this data is
replaced by zeros. This can be file contents or a directory structure or
whatever.
3) The source gluster volume has different settings than the destination
(mostly because the defaults were different at creation time):

Setting                        old (src)   new (dst)
cluster.op-version             30800       30800 (the same)
cluster.max-op-version         31202       31202 (the same)
cluster.metadata-self-heal     off         on
cluster.data-self-heal         off         on
cluster.entry-self-heal        off         on
performance.low-prio-threads   16          32
performance.strict-o-direct    off         on
network.ping-timeout           42          30
network.remote-dio             enable      off
transport.address-family       -           inet
performance.stat-prefetch      off         on
features.shard-block-size      512MB       64MB
cluster.shd-max-threads        1           8
cluster.shd-wait-qlength       1024        1
cluster.locking-scheme         full        granular
cluster.granular-entry-heal    no          enable

4) To test, we migrate some VM's back and forth. The corruption does not
occur every time. To this point it only occurs from old to new, but we
don't have enough data points to be sure about that.

Anybody an idea what is causing the corruption? Is this the best list to
ask, or should I ask on a Gluster list? I am not sure if this is oVirt
specific or Gluster specific though.

Kind regards,
Sander Hoentjen
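A quick way to produce a settings comparison like the table above is to
diff the full option dumps of the two volumes. This is a sketch, not from
the thread; SRC_VOL and DST_VOL are placeholders, and it assumes bash for
the process substitution:

    # show only options that differ between the two volumes
    diff <(gluster volume get SRC_VOL all) <(gluster volume get DST_VOL all)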
Re: [ovirt-users] Ovirt/Gluster
On 08/21/2015 06:12 PM, Ravishankar N wrote:
> On 08/21/2015 07:57 PM, Sander Hoentjen wrote:
>> Maybe I should formulate some clear questions:
>> 1) Am I correct in assuming that an issue on one of 3 gluster nodes
>> should not cause downtime for VM's on other nodes?
> From what I understand, yes. Maybe the ovirt folks can confirm. I can
> tell you this much for sure: If you create a replica 3 volume using 3
> nodes, mount the volume locally on each node, and bring down one node,
> the mounts from the other 2 nodes *must* have read+write access to the
> volume.
>> 2) What can I/we do to fix the issue I am seeing?
>> 3) Can anybody else reproduce my issue?
> I'll try and see if I can.

Hi Ravi,

Did you get around to this by any chance? This is a blocker issue for us.
Apart from that, has anybody else had any success with using gluster
reliably as an ovirt storage solution?

Regards,
Sander
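Not asked in the thread, but when deciding whether a paused VM is safe to
unpause after an incident like this, it helps to confirm that heals have
actually finished and nothing is in split-brain. Using the VMS volume
name that appears later in this thread:

    # files still pending heal (should be empty before unpausing)
    gluster volume heal VMS info
    # any entries in split-brain need manual resolution
    gluster volume heal VMS info split-brain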
Re: [ovirt-users] Ovirt/Gluster
On 08/21/2015 11:30 AM, Ravishankar N wrote:
> On 08/21/2015 01:21 PM, Sander Hoentjen wrote:
>> On 08/21/2015 09:28 AM, Ravishankar N wrote:
>>> [...]
>>> The mount logs (rhev-data-center-mnt-glusterSD*) are indicating
>>> frequent disconnects to the bricks with 'clnt_ping_timer_expired',
>>> 'Client-quorum is not met' and 'Read-only file system' messages.
>>> client-quorum is enabled by default for replica 3 volumes. So if the
>>> mount cannot connect to 2 bricks at least, quorum is lost and the
>>> gluster volume becomes read-only. That seems to be the reason why the
>>> VMs are pausing. I'm not sure if the frequent disconnects are due to a
>>> flaky network or the bricks not responding to the mount's ping timer
>>> because its epoll threads are busy with I/O (unlikely). Can you also
>>> share the output of `gluster volume info volname`?
>> The frequent disconnects are probably because I intentionally broke the
>> network on hyp03 (dropped 75% of outgoing packets). In my opinion this
>> should not affect the VM on hyp02. Am I wrong to think that?
> For client-quorum: if a client (mount) cannot connect to the number of
> bricks needed to achieve quorum, the client becomes read-only. So if the
> client on hyp02 can see itself and 01, it shouldn't be affected.

But it was, and I only broke hyp03.

>> [root@hyp01 ~]# gluster volume info VMS
>>
>> Volume Name: VMS
>> Type: Replicate
>> Volume ID: 9e6657e7-8520-4720-ba9d-78b14a86c8ca
>> Status: Started
>> Number of Bricks: 1 x 3 = 3
>> Transport-type: tcp
>> Bricks:
>> Brick1: 10.99.50.20:/brick/VMS
>> Brick2: 10.99.50.21:/brick/VMS
>> Brick3: 10.99.50.22:/brick/VMS
>> Options Reconfigured:
>> performance.readdir-ahead: on
>> nfs.disable: on
>> user.cifs: disable
>> auth.allow: *
>> performance.quick-read: off
>> performance.read-ahead: off
>> performance.io-cache: off
>> performance.stat-prefetch: off
>> cluster.eager-lock: enable
>> network.remote-dio: enable
>> cluster.quorum-type: auto
>> cluster.server-quorum-type: server
> I see that you have enabled server-quorum too. Since you blocked hyp03,
> if the glusterd on that node cannot see the other 2 nodes due to
> iptable rules, it would kill all brick processes. See the "7 How To
> Test" section in
> http://www.gluster.org/community/documentation/index.php/Features/Server-quorum
> to get a better idea of server-quorum.

Yes, but it should only kill the bricks on hyp03, right? So then why does
the VM on hyp02 die? I don't like the fact that a problem on any one of
the hosts can bring down any VM on any host.

-- Sander
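One way to isolate whether server-quorum is what is killing the bricks in
a test like this (not suggested in the thread, and only sensible on a
test cluster) is to temporarily disable it and re-run the same failure
injection:

    # disable server-side quorum enforcement for the test
    gluster volume set VMS cluster.server-quorum-type none
    # ... re-run the packet-loss experiment, then restore:
    gluster volume set VMS cluster.server-quorum-type server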
Re: [ovirt-users] Ovirt/Gluster
On 08/21/2015 02:21 PM, Ravishankar N wrote:
> On 08/21/2015 04:32 PM, Sander Hoentjen wrote:
>> [...]
>> But it was, and I only broke hyp03.
> Beats me then. I see
> [2015-08-18 15:15:27.922998] W [MSGID: 108001]
> [afr-common.c:4043:afr_notify] 0-VMS-replicate-0: Client-quorum is not met
> on hyp02's mount log, but the time stamp is earlier than when you say you
> observed the hang (2015-08-20, around 8:15 - 8:20 UTC?). (They do occur in
> that time on hyp03 though.)

Yeah, that event is from before. For your information: this setup is used
to test, so I try to break it and hope I don't succeed. Unfortunately I
succeeded.

> [...]
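When correlating quorum events with VM pauses the way Ravi does above, it
can be quicker to pull the timestamps from all gluster logs in one pass.
A rough sketch, assuming the default log locations (mount logs directly
under /var/log/glusterfs, brick logs in the bricks subdirectory):

    # print every quorum-related message with its source file
    grep -H "quorum" /var/log/glusterfs/*.log /var/log/glusterfs/bricks/*.log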
Re: [ovirt-users] Ovirt/Gluster
On 08/21/2015 09:28 AM, Ravishankar N wrote:
> On 08/20/2015 02:14 PM, Sander Hoentjen wrote:
>> On 08/19/2015 09:04 AM, Ravishankar N wrote:
>>> On 08/18/2015 04:22 PM, Ramesh Nachimuthu wrote:
>>>> + Ravi from gluster.
>>>>
>>>> Regards,
>>>> Ramesh
>>>>
>>>> ----- Original Message -----
>>>> From: Sander Hoentjen <san...@hoentjen.eu>
>>>> To: users@ovirt.org
>>>> Sent: Tuesday, August 18, 2015 3:30:35 PM
>>>> Subject: [ovirt-users] Ovirt/Gluster
>>>> [...]
>>> Hi,
>>> What version of gluster and ovirt are you using?
>> glusterfs-3.7.3-1.el7.x86_64
>> vdsm-4.16.20-0.el7.centos.x86_64
>> ovirt-engine-3.5.3.1-1.el7.centos.noarch
>>>> [...] After a while, 25% packet loss just isn't good enough for Ovirt
>>>> anymore, so the host will be fenced.
>>> I'm not sure what fencing means w.r.t ovirt and what it actually
>>> fences. As far as gluster is concerned, since only one node is blocked,
>>> the VM image should still be accessible by the VM running on host1.
>> Fencing means (at least in this case) that the IPMI of the server does a
>> power reset.
>>>> After a reboot *sometimes* the VM will be paused, and even after the
>>>> gluster self-heal is complete it can not be unpaused, has to be
>>>> restarted.
>>> Could you provide the gluster mount (fuse?) logs and the brick logs of
>>> all 3 nodes when the VM is paused? That should give us some clue.
>> Logs are attached. Problem was at around 8:15 - 8:20 UTC.
>> This time however the vm stopped even without a reboot of hyp03.
> The mount logs (rhev-data-center-mnt-glusterSD*) are indicating frequent
> disconnects to the bricks with 'clnt_ping_timer_expired', 'Client-quorum
> is not met' and 'Read-only file system' messages. client-quorum is
> enabled by default for replica 3 volumes. So if the mount cannot connect
> to 2 bricks at least, quorum is lost and the gluster volume becomes
> read-only. That seems to be the reason why the VMs are pausing. I'm not
> sure if the frequent disconnects are due to a flaky network or the bricks
> not responding to the mount's ping timer because its epoll threads are
> busy with I/O (unlikely). Can you also share the output of `gluster
> volume info volname`?

The frequent disconnects are probably because I intentionally broke the
network on hyp03 (dropped 75% of outgoing packets). In my opinion this
should not affect the VM on hyp02. Am I wrong to think that?

[root@hyp01 ~]# gluster volume info VMS

Volume Name: VMS
Type: Replicate
Volume ID: 9e6657e7-8520-4720-ba9d-78b14a86c8ca
Status: Started
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: 10.99.50.20:/brick/VMS
Brick2: 10.99.50.21:/brick/VMS
Brick3: 10.99.50.22:/brick/VMS
Options Reconfigured:
performance.readdir-ahead: on
nfs.disable: on
user.cifs: disable
auth.allow: *
performance.quick-read: off
performance.read-ahead: off
performance.io-cache: off
performance.stat-prefetch: off
cluster.eager-lock: enable
network.remote-dio: enable
cluster.quorum-type: auto
cluster.server-quorum-type: server
storage.owner-uid: 36
storage.owner-gid: 36

-- Sander
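As an aside that is not raised in the thread itself: gluster ships a
"virt" option group that applies the settings generally recommended for
VM-image workloads, including the quorum options discussed here, in a
single step. On this volume that would be:

    # applies the options from /var/lib/glusterd/groups/virt to the volume
    gluster volume set VMS group virt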
[ovirt-users] Ovirt/Gluster
Hi,

We are looking for some easy to manage, self-contained VM hosting. Ovirt
with GlusterFS seems to fit that bill perfectly. I installed it and then
started kicking the tires. First results looked promising, but now I can
get a VM to pause indefinitely fairly easily:

My setup is 3 hosts that are in a Virt and Gluster cluster. Gluster is
set up as replica-3. The gluster export is used as the storage domain for
the VM's.

Now when I start the VM all is good; performance is good enough, so we
are happy. I then start bonnie++ to generate some load. I have a VM
running on host 1, host 2 is SPM, and all 3 hosts are seeing some network
traffic courtesy of gluster. Now, for fun, suddenly the network on host3
goes bad (iptables -I OUTPUT -m statistic --mode random --probability
0.75 -j REJECT). Some time later I see the guest has a small hiccup; I'm
guessing that is when gluster decides host 3 is not allowed to play
anymore. No big deal anyway. After a while, 25% packet loss just isn't
good enough for Ovirt anymore, so the host will be fenced. After a reboot
*sometimes* the VM will be paused, and even after the gluster self-heal
is complete it can not be unpaused; it has to be restarted.

Is there anything I can do to prevent the VM from being paused?

Regards,
Sander
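For anyone wanting to reproduce Sander's failure injection, the rule he
describes plus its cleanup might look like this, run on the host whose
network you want to degrade (test environments only):

    # drop ~75% of outgoing packets at random
    iptables -I OUTPUT -m statistic --mode random --probability 0.75 -j REJECT
    # restore normal networking by deleting the same rule
    iptables -D OUTPUT -m statistic --mode random --probability 0.75 -j REJECT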