Re: [Gluster-users] GlusterFS 3.6.1 breaks VM images on cluster node restart
Hi Brian,

which version were you using before 3.7.1? Has the problem occurred with version 3.7.1 in the meantime?

Greetings,
Roger Lehmann

On 04.06.2015 at 16:55, Andrus, Brian Contractor wrote:
> I have similar issues with gluster and am starting to wonder if it really is stable for VM images.
>
> My setup is simple: 1X2=2. I am mirroring a disk, basically. The trouble has been that the VM images (qcow2 files) go split-brained when one of the VMs gets busier than usual. Once that happens, heal often doesn't work, and while the image is in such a state, the VM often becomes unresponsive.
>
> I've had to pick one of the qcow2 files from a brick, copy it somewhere, delete the file from gluster and then copy the file from the brick back into gluster. Usually that works, but sometimes I have to run fsck on the image at boot to clean things up.
>
> This is NOT stable, to be sure. Hopefully it is moot, as I recently upgraded to 3.7.1 and we will see how that goes. So far, so good.
>
> Brian Andrus
> ITACS/Research Computing
> Naval Postgraduate School
> Monterey, California
> voice: 831-656-6238
>
> [snip]
Re: [Gluster-users] GlusterFS 3.6.1 breaks VM images on cluster node restart
Roger,

I was using the latest 3.7.0, and before that 3.6.3. So far I have NOT had the issue with 3.7.1, so that makes me quite happy.

Brian Andrus

-----Original Message-----
From: Roger Lehmann [mailto:roger.lehm...@marktjagd.de]
Sent: Monday, June 08, 2015 11:46 PM
To: Andrus, Brian Contractor; Justin Clift
Cc: gluster-users@gluster.org
Subject: Re: [Gluster-users] GlusterFS 3.6.1 breaks VM images on cluster node restart

Hi Brian,

which version were you using before 3.7.1? Has the problem occurred with version 3.7.1 in the meantime?

Greetings,
Roger Lehmann

[snip]
Re: [Gluster-users] GlusterFS 3.6.1 breaks VM images on cluster node restart
> Unfortunately, when I restart every node in the cluster sequentially ... qemu image of the HA VM gets corrupted ...

Even client nodes? Make sure that your client can connect to all of the servers.

Make sure, after you restart a server, that the self-heal finishes before you restart the next one. What I suspect is happening is that you restart server A, so writes happen only on server B. You then restart server B before the heal has had a chance to copy those changes from server B back to server A, so the client now writes its changes to server A alone. When server B comes back, both server A and server B think they have changes for the other. This is a classic split-brain state.

On 06/04/2015 07:08 AM, Roger Lehmann wrote:
> Hello, I'm having a serious problem with my GlusterFS cluster. I'm using Proxmox 3.4 for highly available VM management, which works with GlusterFS as storage. Unfortunately, when I restart every node in the cluster sequentially, one by one (with online migration of the running HA VM first, of course), the qemu image of the HA VM gets corrupted and the VM itself has problems accessing it.
>
> [snip]
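For example, something along these lines between restarts would wait until the heal backlog is empty before the next node is taken down (an untested sketch; "pve-vol" is just the volume name taken from the heal output in this thread):

    # block until "gluster volume heal <vol> info" reports no pending entries on any brick
    while gluster volume heal pve-vol info | grep -q 'Number of entries: [1-9]'; do
        sleep 10
    done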
Re: [Gluster-users] GlusterFS 3.6.1 breaks VM images on cluster node restart
I saw similar behaviour when the file permissions of the VM image were set to root:root instead of the hypervisor user. Running

    chown -R libvirt-qemu:kvm /var/lib/libvirt/images

before starting the VM did the trick for me...

On 04.06.2015 at 16:08, Roger Lehmann wrote:
> Hello, I'm having a serious problem with my GlusterFS cluster. I'm using Proxmox 3.4 for highly available VM management, which works with GlusterFS as storage. Unfortunately, when I restart every node in the cluster sequentially, one by one (with online migration of the running HA VM first, of course), the qemu image of the HA VM gets corrupted and the VM itself has problems accessing it.
>
> [snip]

--
Best regards
André Bauer

MAGIX Software GmbH
André Bauer
Administrator
August-Bebel-Straße 48
01219 Dresden
GERMANY

tel.: 0351 41884875
e-mail: aba...@magix.net
www.magix.com

Geschäftsführer | Managing Directors: Dr. Arnd Schröder, Michael Keith
Amtsgericht | Commercial Register: Berlin Charlottenburg, HRB 127205
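If the ownership keeps getting reset, another option could be to pin the owner on the gluster volume itself via the storage.owner-uid and storage.owner-gid volume options (untested in this setup; 107/107 are example IDs for libvirt-qemu:kvm, and "pve-vol" is the volume name from this thread; check the real IDs with `id libvirt-qemu`):

    # make the bricks' root directory always owned by the hypervisor user
    gluster volume set pve-vol storage.owner-uid 107
    gluster volume set pve-vol storage.owner-gid 107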
Re: [Gluster-users] GlusterFS 3.6.1 breaks VM images on cluster node restart
I have similar issues with gluster and am starting to wonder if it really is stable for VM images.

My setup is simple: 1X2=2. I am mirroring a disk, basically. The trouble has been that the VM images (qcow2 files) go split-brained when one of the VMs gets busier than usual. Once that happens, heal often doesn't work, and while the image is in such a state, the VM often becomes unresponsive.

I've had to pick one of the qcow2 files from a brick, copy it somewhere, delete the file from gluster and then copy the file from the brick back into gluster. Usually that works, but sometimes I have to run fsck on the image at boot to clean things up.

This is NOT stable, to be sure. Hopefully it is moot, as I recently upgraded to 3.7.1 and we will see how that goes. So far, so good.

Brian Andrus
ITACS/Research Computing
Naval Postgraduate School
Monterey, California
voice: 831-656-6238

-----Original Message-----
From: gluster-users-boun...@gluster.org [mailto:gluster-users-boun...@gluster.org] On Behalf Of Justin Clift
Sent: Thursday, June 04, 2015 7:33 AM
To: Roger Lehmann
Cc: gluster-users@gluster.org
Subject: Re: [Gluster-users] GlusterFS 3.6.1 breaks VM images on cluster node restart

On 4 Jun 2015, at 15:08, Roger Lehmann <roger.lehm...@marktjagd.de> wrote:
> <snip>
> I couldn't really reproduce this in my test environment with GlusterFS 3.6.2 but I had other problems while testing (may also be because of a virtualized test environment), so I don't want to upgrade to 3.6.2 until I definitely know the problems I encountered are fixed in 3.6.2.
> <snip>

Just to point out, version 3.6.3 was released a while ago. It's effectively 3.6.2 + bug fixes. Have you looked at testing that? :)

+ Justin

[snip]
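The manual recovery Brian describes might look roughly like this (a sketch only; the brick path is taken from the heal output elsewhere in this thread, and the mount point /mnt/gluster is a made-up example):

    # 1. save a good copy of the image from a brick that has one
    cp /var/lib/glusterd/brick/images/200/vm-200-disk-1.qcow2 /root/vm-200-disk-1.qcow2.bak
    # 2. delete the split-brained file through the gluster mount, not on the brick
    rm /mnt/gluster/images/200/vm-200-disk-1.qcow2
    # 3. copy the saved image back in through the mount so it replicates cleanly
    cp /root/vm-200-disk-1.qcow2.bak /mnt/gluster/images/200/vm-200-disk-1.qcow2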
Re: [Gluster-users] GlusterFS 3.6.1 breaks VM images on cluster node restart
On 4 Jun 2015, at 15:08, Roger Lehmann <roger.lehm...@marktjagd.de> wrote:
> <snip>
> I couldn't really reproduce this in my test environment with GlusterFS 3.6.2 but I had other problems while testing (may also be because of a virtualized test environment), so I don't want to upgrade to 3.6.2 until I definitely know the problems I encountered are fixed in 3.6.2.
> <snip>

Just to point out, version 3.6.3 was released a while ago. It's effectively 3.6.2 + bug fixes. Have you looked at testing that? :)

+ Justin

--
GlusterFS - http://www.gluster.org
An open source, distributed file system scaling to several petabytes, and handling thousands of clients.
My personal twitter: twitter.com/realjustinclift
[Gluster-users] GlusterFS 3.6.1 breaks VM images on cluster node restart
Hello,

I'm having a serious problem with my GlusterFS cluster. I'm using Proxmox 3.4 for highly available VM management, which works with GlusterFS as storage. Unfortunately, when I restart every node in the cluster sequentially, one by one (with online migration of the running HA VM first, of course), the qemu image of the HA VM gets corrupted and the VM itself has problems accessing it:

May 15 10:35:09 blog kernel: [339003.942602] end_request: I/O error, dev vda, sector 2048
May 15 10:35:09 blog kernel: [339003.942829] Buffer I/O error on device vda1, logical block 0
May 15 10:35:09 blog kernel: [339003.942929] lost page write due to I/O error on vda1
May 15 10:35:09 blog kernel: [339003.942952] end_request: I/O error, dev vda, sector 2072
May 15 10:35:09 blog kernel: [339003.943049] Buffer I/O error on device vda1, logical block 3
May 15 10:35:09 blog kernel: [339003.943146] lost page write due to I/O error on vda1
May 15 10:35:09 blog kernel: [339003.943153] end_request: I/O error, dev vda, sector 4196712
May 15 10:35:09 blog kernel: [339003.943251] Buffer I/O error on device vda1, logical block 524333
May 15 10:35:09 blog kernel: [339003.943350] lost page write due to I/O error on vda1
May 15 10:35:09 blog kernel: [339003.943363] end_request: I/O error, dev vda, sector 4197184

After the image is broken, it's impossible to migrate the VM or start it when it's down.

root@pve2 ~ # gluster volume heal pve-vol info
Gathering list of entries to be healed on volume pve-vol has been successful

Brick pve1:/var/lib/glusterd/brick
Number of entries: 1
/images//200/vm-200-disk-1.qcow2

Brick pve2:/var/lib/glusterd/brick
Number of entries: 1
/images/200/vm-200-disk-1.qcow2

Brick pve3:/var/lib/glusterd/brick
Number of entries: 1
/images//200/vm-200-disk-1.qcow2

I couldn't really reproduce this in my test environment with GlusterFS 3.6.2, but I had other problems while testing (which may also be because of the virtualized test environment), so I don't want to upgrade to 3.6.2 until I definitely know the problems I encountered are fixed in 3.6.2.

Has anybody else experienced this problem? I'm not sure whether issue 1161885 (possible file corruption on dispersed volumes) is the one I'm experiencing, since I have a 3-node replicated cluster.

Thanks for your help!

Regards,
Roger Lehmann

___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users
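As a first check, it may be worth confirming whether those heal entries are actually in split-brain rather than just pending heal; the gluster CLI has a dedicated view for that (using the volume name from the output above):

    gluster volume heal pve-vol info split-brain

If the qcow2 file shows up there, the corruption matches the split-brain scenario discussed in the replies in this thread.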