Re: [Gluster-users] GlusterFS 3.6.1 breaks VM images on cluster node restart

2015-06-09 Thread Roger Lehmann

Hi Brian,

Which version were you using before 3.7.1? Has the problem occurred with 3.7.1 
in the meantime?


Greetings,
Roger Lehmann

On 04.06.2015 at 16:55, Andrus, Brian Contractor wrote:

I have similar issues with gluster and am starting to wonder if it really is 
stable for VM images.

My setup is simple: 1X2=2
I am mirroring a disk, basically.

Trouble has been that the VM images (qcow2 files) go split-brained when one of 
the VMs gets busier than usual. Once that happens, heal often doesn't work, and 
while the image is in that state the VM often becomes unresponsive. I've had to 
pick one of the qcow2 files from a brick, copy it somewhere safe, delete the file 
from gluster, and then copy the saved brick copy back into gluster.

Usually that works, but sometimes I have to run fsck on the image at boot to 
clean things up.
This is NOT stable, to be sure.

Hopefully it is moot as I recently upgraded to 3.7.1 and we will see how that 
goes. So far, so good.


Brian Andrus
ITACS/Research Computing
Naval Postgraduate School
Monterey, California
voice: 831-656-6238


-----Original Message-----
From: gluster-users-boun...@gluster.org 
[mailto:gluster-users-boun...@gluster.org] On Behalf Of Justin Clift
Sent: Thursday, June 04, 2015 7:33 AM
To: Roger Lehmann
Cc: gluster-users@gluster.org
Subject: Re: [Gluster-users] GlusterFS 3.6.1 breaks VM images on cluster node 
restart

On 4 Jun 2015, at 15:08, Roger Lehmann roger.lehm...@marktjagd.de wrote:
snip

I couldn't really reproduce this in my test environment with GlusterFS 3.6.2, 
but I had other problems while testing (which may also be due to the virtualized 
test environment), so I don't want to upgrade to 3.6.2 until I know for certain 
that the problems I encountered are fixed.

snip

Just to point out, version 3.6.3 was released a while ago.  It's effectively 
3.6.2 + bug fixes.  Have you looked at testing that? :)

+ Justin

--
GlusterFS - http://www.gluster.org

An open source, distributed file system scaling to several petabytes, and 
handling thousands of clients.

My personal twitter: twitter.com/realjustinclift

___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users

___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] GlusterFS 3.6.1 breaks VM images on cluster node restart

2015-06-09 Thread Andrus, Brian Contractor
Roger,
I was using the latest 3.7.0 release, and before that 3.6.3.
So far I have NOT had the issue with 3.7.1, so that makes me quite happy.

Brian Andrus


-----Original Message-----
From: Roger Lehmann [mailto:roger.lehm...@marktjagd.de] 
Sent: Monday, June 08, 2015 11:46 PM
To: Andrus, Brian Contractor; Justin Clift
Cc: gluster-users@gluster.org
Subject: Re: [Gluster-users] GlusterFS 3.6.1 breaks VM images on cluster node 
restart

Hi Brian,

Which version were you using before 3.7.1? Has the problem occurred with 3.7.1 
in the meantime?

Greetings,
Roger Lehmann

On 04.06.2015 at 16:55, Andrus, Brian Contractor wrote:
 I have similar issues with gluster and am starting to wonder if it really is 
 stable for VM images.

 My setup is simple: 1X2=2
 I am mirroring a disk, basically.

 Trouble has been that the VM images (qcow2 files) go split-brained when one 
 of the VMs gets busier than usual. Once that happens, heal often doesn't work, 
 and while the image is in that state the VM often becomes unresponsive. I've had 
 to pick one of the qcow2 files from a brick, copy it somewhere safe, delete the 
 file from gluster, and then copy the saved brick copy back into gluster.

 Usually that works, but sometimes I have to run fsck on the image at boot to 
 clean things up.
 This is NOT stable, to be sure.

 Hopefully it is moot as I recently upgraded to 3.7.1 and we will see how that 
 goes. So far, so good.


 Brian Andrus
 ITACS/Research Computing
 Naval Postgraduate School
 Monterey, California
 voice: 831-656-6238


 -----Original Message-----
 From: gluster-users-boun...@gluster.org 
 [mailto:gluster-users-boun...@gluster.org] On Behalf Of Justin Clift
 Sent: Thursday, June 04, 2015 7:33 AM
 To: Roger Lehmann
 Cc: gluster-users@gluster.org
 Subject: Re: [Gluster-users] GlusterFS 3.6.1 breaks VM images on 
 cluster node restart

 On 4 Jun 2015, at 15:08, Roger Lehmann roger.lehm...@marktjagd.de wrote:
 snip
 I couldn't really reproduce this in my test environment with GlusterFS 3.6.2, 
 but I had other problems while testing (which may also be due to the virtualized 
 test environment), so I don't want to upgrade to 3.6.2 until I know for certain 
 that the problems I encountered are fixed.
 snip

 Just to point out, version 3.6.3 was released a while ago.  It's 
 effectively 3.6.2 + bug fixes.  Have you looked at testing that? :)

 + Justin

 --
 GlusterFS - http://www.gluster.org

 An open source, distributed file system scaling to several petabytes, and 
 handling thousands of clients.

 My personal twitter: twitter.com/realjustinclift

 ___
 Gluster-users mailing list
 Gluster-users@gluster.org
 http://www.gluster.org/mailman/listinfo/gluster-users
___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] GlusterFS 3.6.1 breaks VM images on cluster node restart

2015-06-08 Thread Joe Julian
"Unfortunately, when I restart every node in the cluster 
sequentially...qemu image of the HA VM gets corrupted..."

Even client nodes?

Make sure that your client can connect to all of the servers.

Make sure, after you restart a server, that the self-heal finishes 
before you restart the next one. What I suspect is happening is that you 
restart server A, so writes happen only on server B. You then restart 
server B before the heal has copied those changes from server B back to 
server A, so the client now writes its changes to server A alone. When 
server B comes back, both server A and server B think they have changes 
for the other. This is a classic split-brain state.
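
(A rough sketch of that "wait for heal before the next restart" step, using the 
volume name pve-vol from the heal output quoted below; adjust names to your setup.)

# after a rebooted server is back, check the pending heal queue from any node
gluster volume heal pve-vol info
# only reboot the next node once every brick reports "Number of entries: 0";
# you can also kick off a heal explicitly instead of waiting for the
# self-heal daemon's next crawl:
gluster volume heal pve-vol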


On 06/04/2015 07:08 AM, Roger Lehmann wrote:

Hello, I'm having a serious problem with my GlusterFS cluster.
I'm using Proxmox 3.4 for highly available VM management, which uses 
GlusterFS as storage.
Unfortunately, when I restart every node in the cluster one by one 
(with online migration of the running HA VM first, of course) the 
qemu image of the HA VM gets corrupted and the VM itself has 
problems accessing it.


May 15 10:35:09 blog kernel: [339003.942602] end_request: I/O error, 
dev vda, sector 2048
May 15 10:35:09 blog kernel: [339003.942829] Buffer I/O error on 
device vda1, logical block 0
May 15 10:35:09 blog kernel: [339003.942929] lost page write due to 
I/O error on vda1
May 15 10:35:09 blog kernel: [339003.942952] end_request: I/O error, 
dev vda, sector 2072
May 15 10:35:09 blog kernel: [339003.943049] Buffer I/O error on 
device vda1, logical block 3
May 15 10:35:09 blog kernel: [339003.943146] lost page write due to 
I/O error on vda1
May 15 10:35:09 blog kernel: [339003.943153] end_request: I/O error, 
dev vda, sector 4196712
May 15 10:35:09 blog kernel: [339003.943251] Buffer I/O error on 
device vda1, logical block 524333
May 15 10:35:09 blog kernel: [339003.943350] lost page write due to 
I/O error on vda1
May 15 10:35:09 blog kernel: [339003.943363] end_request: I/O error, 
dev vda, sector 4197184



After the image is broken, it's impossible to migrate the VM or start 
it when it's down.


root@pve2 ~ # gluster volume heal pve-vol info
Gathering list of entries to be healed on volume pve-vol has been 
successful


Brick pve1:/var/lib/glusterd/brick
Number of entries: 1
/images//200/vm-200-disk-1.qcow2

Brick pve2:/var/lib/glusterd/brick
Number of entries: 1
/images/200/vm-200-disk-1.qcow2

Brick pve3:/var/lib/glusterd/brick
Number of entries: 1
/images//200/vm-200-disk-1.qcow2



I couldn't really reproduce this in my test environment with GlusterFS 
3.6.2, but I had other problems while testing (which may also be due to 
the virtualized test environment), so I don't want to upgrade to 3.6.2 
until I know for certain that the problems I encountered are fixed.
Has anybody else experienced this problem? I'm not sure if issue 1161885 
(Possible file corruption on dispersed volumes) is what I'm hitting, 
since I have a 3-node replicate cluster, not a dispersed volume.

Thanks for your help!

Regards,
Roger Lehmann
___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users


___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] GlusterFS 3.6.1 breaks VM images on cluster node restart

2015-06-08 Thread André Bauer
I saw similar behaviour when the ownership of the VM image was set to
root:root instead of the hypervisor user.

Running chown -R libvirt-qemu:kvm /var/lib/libvirt/images before starting
the VM did the trick for me...
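
(For reference, a minimal check along those lines, assuming libvirt's default
image directory; the owner/group names differ by distribution, e.g.
libvirt-qemu:kvm on Debian/Ubuntu versus qemu:qemu on RHEL/CentOS.)

# see who currently owns the image files on the hypervisor
ls -l /var/lib/libvirt/images
# hand them back to the qemu user; adjust user:group for your distribution
chown -R libvirt-qemu:kvm /var/lib/libvirt/images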


On 04.06.2015 at 16:08, Roger Lehmann wrote:
 Hello, I'm having a serious problem with my GlusterFS cluster.
 I'm using Proxmox 3.4 for highly available VM management, which uses
 GlusterFS as storage.
 Unfortunately, when I restart every node in the cluster one by one
 (with online migration of the running HA VM first, of course) the
 qemu image of the HA VM gets corrupted and the VM itself has problems
 accessing it.
 
 May 15 10:35:09 blog kernel: [339003.942602] end_request: I/O error, dev
 vda, sector 2048
 May 15 10:35:09 blog kernel: [339003.942829] Buffer I/O error on device
 vda1, logical block 0
 May 15 10:35:09 blog kernel: [339003.942929] lost page write due to I/O
 error on vda1
 May 15 10:35:09 blog kernel: [339003.942952] end_request: I/O error, dev
 vda, sector 2072
 May 15 10:35:09 blog kernel: [339003.943049] Buffer I/O error on device
 vda1, logical block 3
 May 15 10:35:09 blog kernel: [339003.943146] lost page write due to I/O
 error on vda1
 May 15 10:35:09 blog kernel: [339003.943153] end_request: I/O error, dev
 vda, sector 4196712
 May 15 10:35:09 blog kernel: [339003.943251] Buffer I/O error on device
 vda1, logical block 524333
 May 15 10:35:09 blog kernel: [339003.943350] lost page write due to I/O
 error on vda1
 May 15 10:35:09 blog kernel: [339003.943363] end_request: I/O error, dev
 vda, sector 4197184
 
 
 After the image is broken, it's impossible to migrate the VM or start it
 when it's down.
 
 root@pve2 ~ # gluster volume heal pve-vol info
 Gathering list of entries to be healed on volume pve-vol has been
 successful
 
 Brick pve1:/var/lib/glusterd/brick
 Number of entries: 1
 /images//200/vm-200-disk-1.qcow2
 
 Brick pve2:/var/lib/glusterd/brick
 Number of entries: 1
 /images/200/vm-200-disk-1.qcow2
 
 Brick pve3:/var/lib/glusterd/brick
 Number of entries: 1
 /images//200/vm-200-disk-1.qcow2
 
 
 
 I couldn't really reproduce this in my test environment with GlusterFS
 3.6.2, but I had other problems while testing (which may also be due to
 the virtualized test environment), so I don't want to upgrade to 3.6.2
 until I know for certain that the problems I encountered are fixed.
 Has anybody else experienced this problem? I'm not sure if issue 1161885
 (Possible file corruption on dispersed volumes) is what I'm hitting,
 since I have a 3-node replicate cluster, not a dispersed volume.
 Thanks for your help!
 
 Regards,
 Roger Lehmann
 ___
 Gluster-users mailing list
 Gluster-users@gluster.org
 http://www.gluster.org/mailman/listinfo/gluster-users
 


-- 
Kind regards
André Bauer

MAGIX Software GmbH
André Bauer
Administrator
August-Bebel-Straße 48
01219 Dresden
GERMANY

tel.: 0351 41884875
e-mail: aba...@magix.net
www.magix.com


Geschäftsführer | Managing Directors: Dr. Arnd Schröder, Michael Keith
Amtsgericht | Commercial Register: Berlin Charlottenburg, HRB 127205

___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] GlusterFS 3.6.1 breaks VM images on cluster node restart

2015-06-04 Thread Andrus, Brian Contractor
I have similar issues with gluster and am starting to wonder if it really is 
stable for VM images.

My setup is simple: 1X2=2
I am mirroring a disk, basically.

Trouble has been that the VM images (qcow2 files) go split-brained when one of 
the VMs gets busier than usual. Once that happens, heal often doesn't work, and 
while the image is in that state the VM often becomes unresponsive. I've had to 
pick one of the qcow2 files from a brick, copy it somewhere safe, delete the file 
from gluster, and then copy the saved brick copy back into gluster.

Usually that works, but sometimes I have to run fsck on the image at boot to 
clean things up.
This is NOT stable, to be sure.
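
(The manual recovery described above amounts to roughly this sketch; the brick
path, mount point and image name are placeholders, not paths from an actual setup.)

# save the copy you trust directly from one brick (placeholder paths)
cp /bricks/brick1/images/vm-100-disk-1.qcow2 /root/vm-100-disk-1.qcow2.good
# remove the split-brained file through the gluster mount point
rm /mnt/gluster/images/vm-100-disk-1.qcow2
# copy the saved file back in through the mount so both replicas get a clean copy
cp /root/vm-100-disk-1.qcow2.good /mnt/gluster/images/vm-100-disk-1.qcow2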

Hopefully it is moot as I recently upgraded to 3.7.1 and we will see how that 
goes. So far, so good.


Brian Andrus
ITACS/Research Computing
Naval Postgraduate School
Monterey, California
voice: 831-656-6238


-----Original Message-----
From: gluster-users-boun...@gluster.org 
[mailto:gluster-users-boun...@gluster.org] On Behalf Of Justin Clift
Sent: Thursday, June 04, 2015 7:33 AM
To: Roger Lehmann
Cc: gluster-users@gluster.org
Subject: Re: [Gluster-users] GlusterFS 3.6.1 breaks VM images on cluster node 
restart

On 4 Jun 2015, at 15:08, Roger Lehmann roger.lehm...@marktjagd.de wrote:
snip
 I couldn't really reproduce this in my test environment with GlusterFS 3.6.2, 
 but I had other problems while testing (which may also be due to the virtualized 
 test environment), so I don't want to upgrade to 3.6.2 until I know for certain 
 that the problems I encountered are fixed.
snip

Just to point out, version 3.6.3 was released a while ago.  It's effectively 
3.6.2 + bug fixes.  Have you looked at testing that? :)

+ Justin

--
GlusterFS - http://www.gluster.org

An open source, distributed file system scaling to several petabytes, and 
handling thousands of clients.

My personal twitter: twitter.com/realjustinclift

___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users
___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] GlusterFS 3.6.1 breaks VM images on cluster node restart

2015-06-04 Thread Justin Clift
On 4 Jun 2015, at 15:08, Roger Lehmann roger.lehm...@marktjagd.de wrote:
snip
 I couldn't really reproduce this in my test environment with GlusterFS 3.6.2, 
 but I had other problems while testing (which may also be due to the virtualized 
 test environment), so I don't want to upgrade to 3.6.2 until I know for certain 
 that the problems I encountered are fixed.
snip

Just to point out, version 3.6.3 was released a while ago.  It's
effectively 3.6.2 + bug fixes.  Have you looked at testing that? :)

+ Justin

--
GlusterFS - http://www.gluster.org

An open source, distributed file system scaling to several
petabytes, and handling thousands of clients.

My personal twitter: twitter.com/realjustinclift

___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users


[Gluster-users] GlusterFS 3.6.1 breaks VM images on cluster node restart

2015-06-04 Thread Roger Lehmann

Hello, I'm having a serious problem with my GlusterFS cluster.
I'm using Proxmox 3.4 for highly available VM management, which uses 
GlusterFS as storage.
Unfortunately, when I restart every node in the cluster one by one 
(with online migration of the running HA VM first, of course) the 
qemu image of the HA VM gets corrupted and the VM itself has problems 
accessing it.


May 15 10:35:09 blog kernel: [339003.942602] end_request: I/O error, dev vda, 
sector 2048
May 15 10:35:09 blog kernel: [339003.942829] Buffer I/O error on device vda1, 
logical block 0
May 15 10:35:09 blog kernel: [339003.942929] lost page write due to I/O error 
on vda1
May 15 10:35:09 blog kernel: [339003.942952] end_request: I/O error, dev vda, 
sector 2072
May 15 10:35:09 blog kernel: [339003.943049] Buffer I/O error on device vda1, 
logical block 3
May 15 10:35:09 blog kernel: [339003.943146] lost page write due to I/O error 
on vda1
May 15 10:35:09 blog kernel: [339003.943153] end_request: I/O error, dev vda, 
sector 4196712
May 15 10:35:09 blog kernel: [339003.943251] Buffer I/O error on device vda1, 
logical block 524333
May 15 10:35:09 blog kernel: [339003.943350] lost page write due to I/O error 
on vda1
May 15 10:35:09 blog kernel: [339003.943363] end_request: I/O error, dev vda, 
sector 4197184


After the image is broken, it's impossible to migrate the VM or start it 
when it's down.


root@pve2 ~ # gluster volume heal pve-vol info
Gathering list of entries to be healed on volume pve-vol has been successful

Brick pve1:/var/lib/glusterd/brick
Number of entries: 1
/images//200/vm-200-disk-1.qcow2

Brick pve2:/var/lib/glusterd/brick
Number of entries: 1
/images/200/vm-200-disk-1.qcow2

Brick pve3:/var/lib/glusterd/brick
Number of entries: 1
/images//200/vm-200-disk-1.qcow2



I couldn't really reproduce this in my test environment with GlusterFS 
3.6.2, but I had other problems while testing (which may also be due to 
the virtualized test environment), so I don't want to upgrade to 3.6.2 
until I know for certain that the problems I encountered are fixed.
Has anybody else experienced this problem? I'm not sure if issue 1161885 
(Possible file corruption on dispersed volumes) is what I'm hitting, 
since I have a 3-node replicate cluster, not a dispersed volume.
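
(In case it is split-brain rather than the dispersed-volume bug, a quick
diagnostic sketch; the volume name and brick path are taken from the heal
output above.)

# list the files GlusterFS itself flags as split-brained
gluster volume heal pve-vol info split-brain
# on each brick, inspect the AFR changelog xattrs of the affected image;
# non-zero trusted.afr.* counters on both bricks blaming each other
# point to split-brain
getfattr -d -m . -e hex /var/lib/glusterd/brick/images/200/vm-200-disk-1.qcow2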

Thanks for your help!

Regards,
Roger Lehmann
___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users