Re: [Gluster-users] VM disks corruption on 3.7.11

2016-06-13 Thread Kevin Lemonnier
> Kevin, did you solve this issue? Any updates? Oh yeah, we discussed it on IRC and it's apparently a known bug, it's fixed in the next version. I tested a patched version and it does seem to work, so I've been waiting for 3.7.12 since then to do some proper testing and confirm that it's been

Re: [Gluster-users] VM disks corruption on 3.7.11

2016-06-13 Thread Gandalf Corvotempesta
2016-05-27 13:56 GMT+02:00 Kevin Lemonnier: > Yes, I did configure it to do a daily scrub when I reinstalled last time, > when I was wondering if maybe it was hardware. Doesn't seem like it detected > anything. Kevin, did you solve this issue? Any updates?

Re: [Gluster-users] VM disks corruption on 3.7.11

2016-05-27 Thread Lindsay Mathieson
On 27/05/2016 9:56 PM, Kevin Lemonnier wrote: Yes, I did configure it to do a daily scrub when I reinstalled last time, when I was wondering if maybe it was hardware. Doesn't seem like it detected anything. I was wondering if the scrub was interfering with things

Re: [Gluster-users] VM disks corruption on 3.7.11

2016-05-27 Thread Kevin Lemonnier
>Just a thought - do you have bitrot detection enabled? (I don't) Yes, I did configure it to do a daily scrub when I reinstalled last time, when I was wondering if maybe it was hardware. Doesn't seem like it detected anything.

Re: [Gluster-users] VM disks corruption on 3.7.11

2016-05-27 Thread Lindsay Mathieson
On 26/05/2016 1:58 AM, Kevin Lemonnier wrote: There, re-created the VM from scratch, and still got the same errors. Just a thought - do you have bitrot detection enabled? (I don't)

Re: [Gluster-users] VM disks corruption on 3.7.11

2016-05-25 Thread Kevin Lemonnier
There, re-created the VM from scratch, and still got the same errors. Attached are the logs, I created the VM on node 50, worked fine. I tried to reboot it and start my import again, still worked fine. I powered off the VM, then started it again on node 2, rebooted it a bunch and just got the

Re: [Gluster-users] VM disks corruption on 3.7.11

2016-05-25 Thread Kevin Lemonnier
Just did that, below is the output. Didn't seem to move after the boot, and no new lines when the I/O errors appeared. Also, as mentioned I tried moving the disk on NFS and had the exact same errors, so it doesn't look like it's a libgfapi problem... I should probably re-create the VM, maybe

Re: [Gluster-users] VM disks corruption on 3.7.11

2016-05-25 Thread Lindsay Mathieson
On 25/05/2016 5:58 PM, Kevin Lemonnier wrote: I use XFS, I read that was recommended. What are you using ? Since yours seems to work, I'm not opposed to changing ! ZFS - RAID10 (4 * WD Red 3TB) - 8GB ram dedicated to ZFS - SSD for log and cache (10GB and 100GB partitions respectively)

Re: [Gluster-users] VM disks corruption on 3.7.11

2016-05-25 Thread Krutika Dhananjay
Also, it seems Lindsay knows a way to get the gluster client logs when using proxmox and libgfapi. Would it be possible for you to get that sorted with Lindsay's help before recreating this issue next time and share the glusterfs client logs from all the nodes when you do hit the issue? It is

Re: [Gluster-users] VM disks corruption on 3.7.11

2016-05-25 Thread Kevin Lemonnier
Hi, Not that I know of, no. Doesn't look like the bricks have trouble communicating, but is there a simple way to check that in glusterFS, some sort of brick uptime ? Who knows, maybe the bricks are flickering and I don't notice, that's entirely possible. As mentioned, the problem occurs on

Re: [Gluster-users] VM disks corruption on 3.7.11

2016-05-25 Thread Krutika Dhananjay
Hi Kevin, If you actually ran into a 'read-only filesystem' issue, then it could possibly be because of a bug in AFR that Pranith recently fixed. To confirm if that is indeed the case, could you tell me if you saw the pause after a brick (single brick) was down while IO was going on? -Krutika On

Re: [Gluster-users] VM disks corruption on 3.7.11

2016-05-25 Thread Kevin Lemonnier
>What's the underlying filesystem under the bricks? I use XFS, I read that was recommended. What are you using ? Since yours seems to work, I'm not opposed to changing !

Re: [Gluster-users] VM disks corruption on 3.7.11

2016-05-25 Thread Lindsay Mathieson
On 25/05/2016 5:36 PM, Kevin Lemonnier wrote: Nope, not solved ! Looks like directsync just delays the problem, this morning the VM had thrown a bunch of I/O errors again. Tried writethrough and it seems to behave exactly like cache=none, the errors appear in a few minutes. Trying again with

Re: [Gluster-users] VM disks corruption on 3.7.11

2016-05-25 Thread Kevin Lemonnier
Nope, not solved ! Looks like directsync just delays the problem, this morning the VM had thrown a bunch of I/O errors again. Tried writethrough and it seems to behave exactly like cache=none, the errors appear in a few minutes. Trying again with directsync and no errors for now, so it looks like

Re: [Gluster-users] VM disks corruption on 3.7.11

2016-05-24 Thread Nicolas Ecarnot
On 24/05/2016 12:54, Lindsay Mathieson wrote: On 24/05/2016 8:24 PM, Kevin Lemonnier wrote: So the VMs were configured with cache set to none, I just tried with cache=directsync and it seems to be fixing the issue. Still need to run more tests, but did a couple already with that option and no

Re: [Gluster-users] VM disks corruption on 3.7.11

2016-05-24 Thread Lindsay Mathieson
On 24/05/2016 8:24 PM, Kevin Lemonnier wrote: So the VMs were configured with cache set to none, I just tried with cache=directsync and it seems to be fixing the issue. Still need to run more tests, but did a couple already with that option and no I/O errors. Never had to do this before, is it

Re: [Gluster-users] VM disks corruption on 3.7.11

2016-05-24 Thread Kevin Lemonnier
So the VMs were configured with cache set to none, I just tried with cache=directsync and it seems to be fixing the issue. Still need to run more tests, but did a couple already with that option and no I/O errors. Never had to do this before, is it known ? Found the clue in some old mail from this
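
For readers comparing the cache settings mentioned in this thread (none, writethrough, directsync), here is a minimal sketch of how QEMU's documented cache modes map to host-side open flags for a file-backed disk. The names `CacheMode`, `CACHE_MODES` and `differing_flags` are made up for illustration and are not part of QEMU or Proxmox. How these semantics carry over to a libgfapi-backed disk is not covered here.

```python
from typing import NamedTuple


class CacheMode(NamedTuple):
    """Host-side semantics of a QEMU cache mode (hypothetical helper type)."""
    o_direct: bool  # True: bypass the host page cache (O_DIRECT)
    o_dsync: bool   # True: each write completes only once on stable storage (O_DSYNC)


# Mapping as described in the QEMU documentation for cache= modes.
CACHE_MODES = {
    "writeback":    CacheMode(o_direct=False, o_dsync=False),
    "writethrough": CacheMode(o_direct=False, o_dsync=True),
    "none":         CacheMode(o_direct=True,  o_dsync=False),
    "directsync":   CacheMode(o_direct=True,  o_dsync=True),
}


def differing_flags(mode_a: str, mode_b: str) -> list:
    """Return the names of the flags on which two cache modes differ."""
    a, b = CACHE_MODES[mode_a], CACHE_MODES[mode_b]
    return [field for field in CacheMode._fields if getattr(a, field) != getattr(b, field)]


if __name__ == "__main__":
    for name, mode in CACHE_MODES.items():
        print(f"{name:13s} O_DIRECT={mode.o_direct!s:5s} O_DSYNC={mode.o_dsync}")
```

Seen this way, directsync is the only mode that sets both flags: it differs from none only in forcing synchronous writes, and from writethrough only in bypassing the host page cache.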

Re: [Gluster-users] VM disks corruption on 3.7.11

2016-05-24 Thread Kevin Lemonnier
Hi, Some news on this. I actually don't need to trigger a heal to get corruption, so the problem is not the healing. Live migrating the VM seems to trigger corruption every time, and even without that just doing a database import, rebooting then doing another import seems to corrupt as well. To

Re: [Gluster-users] VM disks corruption on 3.7.11

2016-05-23 Thread Kevin Lemonnier
Hi, I didn't specify it but I use "localhost" to add the storage in proxmox. My thinking is that every proxmox node is also a glusterFS node, so that should work fine. I don't want to use the "normal" way of setting a regular address in there because you can't change it afterwards in proxmox,

Re: [Gluster-users] VM disks corruption on 3.7.11

2016-05-19 Thread David Gossage
On Thu, May 19, 2016 at 7:25 PM, Kevin Lemonnier wrote: > The I/O errors are happening after, not during the heal. > As described, I just rebooted a node, waited for the heal to finish, >

Re: [Gluster-users] VM disks corruption on 3.7.11

2016-05-19 Thread Kevin Lemonnier
The I/O errors are happening after, not during the heal. As described, I just rebooted a node, waited for the heal to finish, rebooted another, waited for the heal to finish then rebooted the third. From that point, the VM just has a lot of I/O errors showing whenever I use the disk a lot
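
The "reboot a node, wait for the heal to finish, reboot the next" procedure described above depends on knowing when a heal is actually done. Below is a rough sketch of one way to check that from a script, by summing the "Number of entries:" lines printed by `gluster volume heal <volume> info`. The helper names, the sample output and the brick paths are hypothetical, and the exact output format is an assumption that should be checked against the installed version.

```python
import re
import subprocess


def pending_heal_entries(heal_info_output: str) -> int:
    """Sum the per-brick 'Number of entries: N' lines of heal-info output.

    Raises ValueError on a non-numeric count, which is assumed here to
    mean that a brick could not be queried.
    """
    counts = re.findall(r"^Number of entries:[ \t]*(\S+)[ \t]*$",
                        heal_info_output, flags=re.MULTILINE)
    total = 0
    for count in counts:
        if not count.isdigit():
            raise ValueError(f"unexpected entry count: {count!r}")
        total += int(count)
    return total


def heal_finished(volume: str) -> bool:
    """Run the gluster CLI and report whether no entries are pending heal."""
    result = subprocess.run(["gluster", "volume", "heal", volume, "info"],
                            capture_output=True, text=True, check=True)
    return pending_heal_entries(result.stdout) == 0


# Hypothetical sample output; hostnames and paths are made up.
SAMPLE = """Brick node-a:/bricks/vmstore
Number of entries: 0

Brick node-b:/bricks/vmstore
/images/example-disk.qcow2
Number of entries: 1
"""

if __name__ == "__main__":
    print(pending_heal_entries(SAMPLE))
```

A rolling reboot script would poll `heal_finished()` until it returns True before moving to the next node. As the message above notes, the I/O errors here appeared even after the heals had completed.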

Re: [Gluster-users] VM disks corruption on 3.7.11

2016-05-19 Thread Alastair Neil
I am slightly confused: you say you have image file corruption but then you say the qemu-img check says there is no corruption. If what you mean is that you see I/O errors during a heal, this is likely to be due to I/O starvation, something that is a well-known issue. There is work happening to

Re: [Gluster-users] VM disks corruption on 3.7.11

2016-05-19 Thread Lindsay Mathieson
On 19/05/2016 12:17 AM, Lindsay Mathieson wrote: One thought - since the VMs are active while the brick is removed/re-added, could it be the shards that are written while the brick is added that are the reverse healing shards? I tested by: - removing brick 3 - erasing brick 3 - closing

Re: [Gluster-users] VM disks corruption on 3.7.11

2016-05-18 Thread Lindsay Mathieson
On 18/05/2016 11:41 PM, Krutika Dhananjay wrote: I will try to recreate this issue tomorrow on my machines with the steps that Lindsay provided in this thread. I will let you know the result soon after that. Thanks Krutika, I've been trying to get the shard stats you wanted, but by the time

Re: [Gluster-users] VM disks corruption on 3.7.11

2016-05-18 Thread Kevin Lemonnier
Some additional details if it helps, there is no cache on the disk, it's virtio and iothread=1. The file is in qcow and using qemu-img check it says it's not corrupted, but when the VM is running I have I/O errors. As you can see in the config, performance.stat-prefetch: off but being on a Debian

Re: [Gluster-users] VM disks corruption on 3.7.11

2016-05-18 Thread Krutika Dhananjay
Hi, I will try to recreate this issue tomorrow on my machines with the steps that Lindsay provided in this thread. I will let you know the result soon after that. -Krutika On Wednesday, May 18, 2016, Kevin Lemonnier wrote: > Hi, > > Some news on this. > Over the weekend

Re: [Gluster-users] VM disks corruption on 3.7.11

2016-05-18 Thread Kevin Lemonnier
Hi, Some news on this. Over the weekend the RAID card of the node ipvr2 died, and I thought that maybe that was the problem all along. The RAID card was changed and yesterday I reinstalled everything. Same problem just now. My test is simple, using the website hosted on the VMs all the time I

Re: [Gluster-users] VM disks corruption on 3.7.11

2016-05-12 Thread Kevin Lemonnier
As discussed, the missing ipvr50 log file. On Thu, May 12, 2016 at 04:24:14PM +0200, Kevin Lemonnier wrote: > As requested on IRC, here are the logs on the 3 nodes. > > On Thu, May 12, 2016 at 04:03:02PM +0200, Kevin Lemonnier wrote: > > Hi, > > > > I had a problem some time ago with 3.7.6 and

Re: [Gluster-users] VM disks corruption on 3.7.11

2016-05-12 Thread Lindsay Mathieson
On 13/05/2016 12:03 AM, Kevin Lemonnier wrote: I just tried to refresh the database by importing the production one on the two MySQL VMs, and both of them started doing I/O errors. Sorry, I don't quite understand what you did - you migrated 1 or 2 VMs onto the test gluster volume?

Re: [Gluster-users] VM disks corruption on 3.7.11

2016-05-12 Thread Kevin Lemonnier
As requested on IRC, here are the logs on the 3 nodes. On Thu, May 12, 2016 at 04:03:02PM +0200, Kevin Lemonnier wrote: > Hi, > > I had a problem some time ago with 3.7.6 and freezing during heals, > and multiple persons advised to use 3.7.11 instead. Indeed, with that > version the freeze

[Gluster-users] VM disks corruption on 3.7.11

2016-05-12 Thread Kevin Lemonnier
Hi, I had a problem some time ago with 3.7.6 and freezing during heals, and multiple persons advised to use 3.7.11 instead. Indeed, with that version the freeze problem is fixed, it works like a dream ! You can almost not tell that a node is down or healing, everything keeps working except for a