Re: [Gluster-users] [URGENT] Add-bricks to a volume corrupted the files

2016-10-22 Thread Krutika Dhananjay
Awesome. Thanks for the logs. Will take a look. -Krutika On Sun, Oct 23, 2016 at 5:47 AM, Lindsay Mathieson < lindsay.mathie...@gmail.com> wrote: > On 20/10/2016 9:13 PM, Krutika Dhananjay wrote: > >> It would be awesome if you could tell us whether you >> see the issue with FUSE as well, while

Re: [Gluster-users] [URGENT] Add-bricks to a volume corrupted the files

2016-10-22 Thread Lindsay Mathieson
On 20/10/2016 9:13 PM, Krutika Dhananjay wrote: It would be awesome if you could tell us whether you see the issue with FUSE as well, while we get around to setting up the environment and running the test ourselves. I just managed to replicate the exact same error using the fuse mount --

Re: [Gluster-users] [URGENT] Add-bricks to a volume corrupted the files

2016-10-22 Thread Lindsay Mathieson
On 20/10/2016 8:43 AM, Joe Julian wrote: Is there a bug open with the client and server logs attached? I would take a stab at reproducing as well, but I want to make sure I'm comparing apples to apples. https://bugzilla.redhat.com/show_bug.cgi?id=1387878 -- Lindsay Mathieson

Re: [Gluster-users] [URGENT] Add-bricks to a volume corrupted the files

2016-10-21 Thread Lindsay Mathieson
On 21/10/2016 12:39 PM, Lindsay Mathieson wrote: And now I have it all set up for logging etc I can't reproduce the error :( Ah, figured out what I was doing differently - started the heavy I/O before the add bricks and rebalance which reliably triggers a "volume rebalance: teststore1: failed:

Re: [Gluster-users] [URGENT] Add-bricks to a volume corrupted the files

2016-10-20 Thread Lindsay Mathieson
And now I have it all set up for logging etc I can't reproduce the error :( Though I did manage to score a "volume rebalance: teststore1: failed: Another transaction is in progress for teststore1. Please try again after sometime" problem. No gluster commands would work after that. I had to restart

Re: [Gluster-users] [URGENT] Add-bricks to a volume corrupted the files

2016-10-20 Thread Krutika Dhananjay
Thanks a lot, Lindsay! Appreciate the help. It would be awesome if you could tell us whether you see the issue with FUSE as well, while we get around to setting up the environment and running the test ourselves. -Krutika On Thu, Oct 20, 2016 at 2:57 AM, Lindsay Mathieson <

Re: [Gluster-users] [URGENT] Add-bricks to a volume corrupted the files

2016-10-20 Thread Hans Henrik Happe
Hi, This is scary stuff. While not as scary, you might confirm a bug that I reported a while back on your test systems: https://bugzilla.redhat.com/show_bug.cgi?id=1370832 Cheers, Hans Henrik On 19-10-2016 08:40, Krutika Dhananjay wrote: Agreed. I will run the same test on an actual vm

Re: [Gluster-users] [URGENT] Add-bricks to a volume corrupted the files

2016-10-19 Thread Lindsay Mathieson
On 20/10/2016 10:03 AM, Joe Julian wrote: Heh, well then a little less scary might be: find /var/log/glusterfs -type f Then if the list looks correct you can easily find /var/log/glusterfs -type f | xargs truncate --size=0 Thanks, I'll script that up. Will put together a test plan
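
The two-step approach Joe describes (list first, truncate only once the list looks right) could be scripted roughly as below. This is only a sketch: the function name `truncate_logs` and the `--yes` confirmation flag are made up for illustration, and `-exec ... {} +` is swapped in for the `| xargs` pipeline so that file names containing spaces are handled. It assumes GNU `truncate` is available.

```sh
# Truncate every regular file under a log directory in two steps:
# a dry run that only lists the files, then the real run with --yes.
# truncate_logs and --yes are hypothetical names used for this sketch.
truncate_logs() (
    log_dir=${1:?usage: truncate_logs DIR [--yes]}
    confirm=${2:-}

    # Refuse anything that is not an existing directory, so a mistyped
    # argument never makes find start somewhere unexpected.
    [ -d "$log_dir" ] || { echo "not a directory: $log_dir" >&2; exit 1; }

    if [ "$confirm" = "--yes" ]; then
        # Truncate in place; the files keep their names and inodes.
        find "$log_dir" -type f -exec truncate --size=0 {} +
    else
        # Dry run: print what would be truncated and change nothing.
        find "$log_dir" -type f
    fi
)
```

Typical use would be `truncate_logs /var/log/glusterfs` to review the list, then `truncate_logs /var/log/glusterfs --yes`. Requiring the directory argument avoids the "run from root at 3am" case where a bare `find -type f` starts from the current directory.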

Re: [Gluster-users] [URGENT] Add-bricks to a volume corrupted the files

2016-10-19 Thread Joe Julian
On 10/19/2016 04:59 PM, Lindsay Mathieson wrote: On 20/10/2016 9:48 AM, Joe Julian wrote: Personally, with logrotate I use copytruncate. If I want to truncate the whole shebang manually: find -type f -exec truncate --size=0 {} \; Eeep! that looks scary, I just see me running that from

Re: [Gluster-users] [URGENT] Add-bricks to a volume corrupted the files

2016-10-19 Thread Lindsay Mathieson
On 20/10/2016 9:48 AM, Joe Julian wrote: Personally, with logrotate I use copytruncate. If I want to truncate the whole shebang manually: find -type f -exec truncate --size=0 {} \; Eeep! that looks scary, I just see me running that from root at 3am. -- Lindsay Mathieson

Re: [Gluster-users] [URGENT] Add-bricks to a volume corrupted the files

2016-10-19 Thread Joe Julian
On 10/19/2016 04:46 PM, Lindsay Mathieson wrote: On 20/10/2016 9:30 AM, Joe Julian wrote: Joe, when you say truncated, just delete the logs before the test is started? That's one way of doing it, yes. :) nb: I've noticed that if you delete the logs while gluster is running they won't be

Re: [Gluster-users] [URGENT] Add-bricks to a volume corrupted the files

2016-10-19 Thread Lindsay Mathieson
On 20/10/2016 9:30 AM, Joe Julian wrote: Joe, when you say truncated, just delete the logs before the test is started? That's one way of doing it, yes. :) nb: I've noticed that if you delete the logs while gluster is running they won't be recreated, you have to restart the services. Is
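
What Lindsay observes is consistent with ordinary Unix behaviour: a process that holds a log file open keeps writing to the same inode after the name is deleted, whereas truncating in place leaves the open descriptor usable. The sketch below demonstrates that with plain shell file descriptors; it is an assumption that the gluster daemons hold their logs open in this way, and `log_fd_demo` is a made-up name. It assumes GNU `truncate`.

```sh
# Show the difference between deleting and truncating a file that a
# process still has open for appending. Uses only temporary files.
log_fd_demo() (
    dir=$(mktemp -d) || exit 1

    # Case 1: delete the file while descriptor 3 is still open on it.
    exec 3>>"$dir/deleted.log"
    echo "before" >&3
    rm "$dir/deleted.log"
    echo "after delete" >&3   # goes to the unlinked inode, not a new file
    if [ -e "$dir/deleted.log" ]; then
        echo "deleted: recreated"
    else
        echo "deleted: not recreated"
    fi

    # Case 2: truncate in place; descriptor 4 still points at the file.
    exec 4>>"$dir/truncated.log"
    echo "before" >&4
    truncate --size=0 "$dir/truncated.log"
    echo "after truncate" >&4
    echo "truncated: $(cat "$dir/truncated.log")"

    # Close both descriptors and clean up.
    exec 3>&- 4>&-
    rm -rf "$dir"
)
```

Running `log_fd_demo` prints `deleted: not recreated` followed by `truncated: after truncate`, which is why truncation (or logrotate's copytruncate) is usually preferred over deleting logs under a running service.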

Re: [Gluster-users] [URGENT] Add-bricks to a volume corrupted the files

2016-10-19 Thread Lindsay Mathieson
On 20/10/2016 9:31 AM, Joe Julian wrote: That makes me think, too, that it might be nice to see what kind of VM proxmox creates. I assume it uses libvirt so a dumpxml of an affected VM might be telling as well Unfortunately not, they use their own config file and build a cmd line from that,

Re: [Gluster-users] [URGENT] Add-bricks to a volume corrupted the files

2016-10-19 Thread Joe Julian
On 10/19/2016 04:21 PM, Kevin Lemonnier wrote: If you have clean logs (truncated before you reproduced) everything from /var/log/glusterfs from the client and all the servers would be great. I'm assuming that proxmox doesn't redirect the client logs to some other place. No client logs with

Re: [Gluster-users] [URGENT] Add-bricks to a volume corrupted the files

2016-10-19 Thread Joe Julian
On 10/19/2016 04:28 PM, Lindsay Mathieson wrote: On 20/10/2016 9:21 AM, Kevin Lemonnier wrote: No client logs with proxmox unfortunately, you need to start qemu by hand to get those as far as I know. Ah, those logs! Forgot about that. I'll replicate it again soon. Might take a couple of days

Re: [Gluster-users] [URGENT] Add-bricks to a volume corrupted the files

2016-10-19 Thread Lindsay Mathieson
On 20/10/2016 9:21 AM, Kevin Lemonnier wrote: No client logs with proxmox unfortunately, you need to start qemu by hand to get those as far as I know. Ah, those logs! Forgot about that. I'll replicate it again soon. Might take a couple of days and create a bug report. If you have clean

Re: [Gluster-users] [URGENT] Add-bricks to a volume corrupted the files

2016-10-19 Thread Kevin Lemonnier
> > If you have clean logs (truncated before you reproduced) everything from > /var/log/glusterfs from the client and all the servers would be great. > I'm assuming that proxmox doesn't redirect the client logs to some other > place. > No client logs with proxmox unfortunately, you need to

Re: [Gluster-users] [URGENT] Add-bricks to a volume corrupted the files

2016-10-19 Thread Joe Julian
On 10/19/2016 04:04 PM, Lindsay Mathieson wrote: On 20/10/2016 8:43 AM, Joe Julian wrote: Is there a bug open with the client and server logs attached? I would take a stab at reproducing as well, but I want to make sure I'm comparing apples to apples. Which log names are those (sorry). I

Re: [Gluster-users] [URGENT] Add-bricks to a volume corrupted the files

2016-10-19 Thread Lindsay Mathieson
On 20/10/2016 8:43 AM, Joe Julian wrote: Is there a bug open with the client and server logs attached? I would take a stab at reproducing as well, but I want to make sure I'm comparing apples to apples. Which log names are those (sorry). I have all the logs from this morning's trial

Re: [Gluster-users] [URGENT] Add-bricks to a volume corrupted the files

2016-10-19 Thread Joe Julian
Is there a bug open with the client and server logs attached? I would take a stab at reproducing as well, but I want to make sure I'm comparing apples to apples. On 10/19/2016 02:39 PM, Kevin Lemonnier wrote: Looked like it was going ok for a while, then blew up. The first windows vm which

Re: [Gluster-users] [URGENT] Add-bricks to a volume corrupted the files

2016-10-19 Thread Kevin Lemonnier
> > Looked like it was going ok for a while, then blew up. The first windows > vm which was running diskmark died and won't boot. qemu-img check shows > the image hopelessly corrupted. 2nd VM has also crashed and is > unbootable, though qemu-img shows the qcow2 file as ok. > Ha, glad you

Re: [Gluster-users] [URGENT] Add-bricks to a volume corrupted the files

2016-10-19 Thread Lindsay Mathieson
On 20/10/2016 7:01 AM, Kevin Lemonnier wrote: Yes, you need to add a full replica set at once. I don't remember, but according to my history, looks like I've used this : gluster volume add-brick VMs host1:/brick host2:/brick host3:/brick force (I have the same without force just before that,
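
Since new bricks have to be added a full replica set at a time, a small wrapper can check the brick count before anything is sent to gluster. The sketch below only prints the `gluster volume add-brick` command rather than running it; `add_brick_cmd` and its replica-count argument are hypothetical, and the volume and brick names are the placeholders quoted above. It leaves out `force`, which would have to be appended deliberately.

```sh
# Print (do not run) an add-brick command after checking that the number
# of new bricks is a multiple of the replica count.
# add_brick_cmd is a hypothetical helper written for this sketch.
add_brick_cmd() (
    volume=${1:?usage: add_brick_cmd VOLUME REPLICA BRICK...}
    replica=${2:?usage: add_brick_cmd VOLUME REPLICA BRICK...}
    shift 2

    # The replica count must be a positive integer.
    if ! [ "$replica" -ge 1 ] 2>/dev/null; then
        echo "replica count must be a positive integer" >&2
        exit 1
    fi

    # Bricks must come in complete replica sets (e.g. 3, 6, 9 for replica 3).
    if [ "$#" -eq 0 ] || [ $(( $# % replica )) -ne 0 ]; then
        echo "need a multiple of $replica bricks, got $#" >&2
        exit 1
    fi

    echo "gluster volume add-brick $volume $*"
)
```

For example, `add_brick_cmd VMs 3 host1:/brick host2:/brick host3:/brick` prints the command for review, while passing only two bricks is rejected.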

Re: [Gluster-users] [URGENT] Add-bricks to a volume corrupted the files

2016-10-19 Thread Kevin Lemonnier
> > Ok, I'll give that a go > - what command do you use? > - I think from memory the extra bricks have to be added in blocks of three? > Yes, you need to add a full replica set at once. I don't remember, but according to my history, looks like I've used this : gluster volume add-brick VMs

Re: [Gluster-users] [URGENT] Add-bricks to a volume corrupted the files

2016-10-19 Thread Lindsay Mathieson
On 20/10/2016 6:39 AM, Kevin Lemonnier wrote: Did you add a third brick in replication, if I understand it correctly ? Yup, took it from rep 2 to rep 3 That's not the problem I had, that does seem to be working fine. What broke was adding 3 new bricks to a replica 3, bumping it from 1 x 3 to

Re: [Gluster-users] [URGENT] Add-bricks to a volume corrupted the files

2016-10-19 Thread Kevin Lemonnier
> > I had a go at reproducing it last night Kevin with 3.8.4 > > - new volume > > - Initial 2 bricks on two hosts > > - copied a windows VM on to it > > - Started some load (Crystal DiskMark in the VM) > > - Added a 3rd brick and node > For the size it's already more than what I have, most

Re: [Gluster-users] [URGENT] Add-bricks to a volume corrupted the files

2016-10-19 Thread Lindsay Mathieson
On 19/10/2016 11:08 PM, Kevin Lemonnier wrote: Yes, to be honest I wasn't even planning on rebalancing just yet, I was planning on letting it run a few days before, see if everything's fine, and then maybe rebalance. I tried the rebalance when everything came crashing down, hoping that might fix

Re: [Gluster-users] [URGENT] Add-bricks to a volume corrupted the files

2016-10-19 Thread Kevin Lemonnier
> > As Kevin said, the problem appeared before rebalancing if I understood > correctly. > Yes, to be honest I wasn't even planning on rebalancing just yet, I was planning on letting it run a few days before, see if everything's fine, and then maybe rebalance. I tried the rebalance when

Re: [Gluster-users] [URGENT] Add-bricks to a volume corrupted the files

2016-10-19 Thread Xavier Hernandez
On 19/10/16 14:59, Gandalf Corvotempesta wrote: On 19 Oct 2016 14:32, "Xavier Hernandez" wrote: I had a similar issue while moving machines from an old gluster volume to a new volume with sharding enabled and I added new bricks to

Re: [Gluster-users] [URGENT] Add-bricks to a volume corrupted the files

2016-10-19 Thread Gandalf Corvotempesta
On 19 Oct 2016 14:32, "Xavier Hernandez" wrote: > I had a similar issue while moving machines from an old gluster volume to a new volume with sharding enabled and I added new bricks to it. Maybe related to rebalance after adding bricks on a sharded volume? maybe that

Re: [Gluster-users] [URGENT] Add-bricks to a volume corrupted the files

2016-10-19 Thread Kevin Lemonnier
> are you using Proxmox, right ? > Yes, indeed. > I think it's important because Proxmox uses gfapi to connect each VM to > the disk, not FUSE. Maybe this is important to find the cause. I believe so yes, and I should add (I believe I mentioned it) that I am using GlusterFS 3.7.12. It took a

Re: [Gluster-users] [URGENT] Add-bricks to a volume corrupted the files

2016-10-19 Thread Xavier Hernandez
Hi Kevin, are you using Proxmox, right ? I think it's important because Proxmox uses gfapi to connect each VM to the disk, not FUSE. Maybe this is important to find the cause. I had a similar issue while moving machines from an old gluster volume to a new volume with sharding enabled and I

Re: [Gluster-users] [URGENT] Add-bricks to a volume corrupted the files

2016-10-19 Thread Krutika Dhananjay
Agreed. I will run the same test on an actual vm setup one of these days and see if I manage to recreate the issue (after I have completed some of my long pending tasks). Meanwhile if any of you find a consistent simpler test case to hit the issue, feel free to reply on this thread. At least I had

Re: [Gluster-users] [URGENT] Add-bricks to a volume corrupted the files

2016-10-17 Thread Gandalf Corvotempesta
On 14 Oct 2016 17:37, "David Gossage" wrote: > > Sorry to resurrect an old email but did any resolution occur for this or a cause found? I just see this as a potential task I may need to also run through some day and if there are pitfalls to watch for would be

Re: [Gluster-users] [URGENT] Add-bricks to a volume corrupted the files

2016-10-17 Thread Kevin Lemonnier
> > I see that network.ping-timeout on your setup is 15 seconds and that's > too low. Could you reconfigure that to 30 seconds? > Yes, I can. I set it to 15 to be sure no browser would time out when trying to load a website on a frozen VM during the timeout, 15 seemed pretty good since

Re: [Gluster-users] [URGENT] Add-bricks to a volume corrupted the files

2016-10-17 Thread Kevin Lemonnier
On Fri, Oct 14, 2016 at 10:37:03AM -0500, David Gossage wrote: > Sorry to resurrect an old email but did any resolution occur for this or a > cause found? I just see this as a potential task I may need to also run > through some day and if there are pitfalls to watch for would be good

Re: [Gluster-users] [URGENT] Add-bricks to a volume corrupted the files

2016-10-16 Thread Krutika Dhananjay
Hi, No. I did run add-brick on a volume with the same configuration as that of Kevin, while IO was running, except that I wasn't running VM workload. I compared the file checksums wrt the original src files from which they were copied and they matched. @Kevin, I see that network.ping-timeout
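
The checksum comparison Krutika mentions could be reproduced along these lines. This is a minimal sketch, not the procedure actually used in the test: the choice of `sha256sum`, the function name `compare_checksums`, and the idea of comparing a source directory against a copy on the mounted volume are all assumptions. File names containing newlines are not handled.

```sh
# Verify that every file under SRC has an identical copy at the same
# relative path under DST. Returns 0 when all checksums match.
# compare_checksums is a hypothetical helper written for this sketch.
compare_checksums() (
    src=${1:?usage: compare_checksums SRC DST}
    dst=${2:?usage: compare_checksums SRC DST}
    if [ ! -d "$src" ] || [ ! -d "$dst" ]; then
        echo "both arguments must be directories" >&2
        exit 2
    fi

    # Build a checksum manifest of SRC with paths relative to SRC.
    manifest=$(cd "$src" && find . -type f -exec sha256sum {} +) || exit 2
    [ -n "$manifest" ] || exit 0   # nothing to compare

    # Check the same relative paths under DST; only failures are printed.
    printf '%s\n' "$manifest" | (cd "$dst" && sha256sum --check --quiet -)
)
```

A call such as `compare_checksums /path/to/originals /mnt/volume/copies` (both paths hypothetical) exits non-zero and names the affected files if any copy differs or is missing.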

Re: [Gluster-users] [URGENT] Add-bricks to a volume corrupted the files

2016-10-14 Thread David Gossage
Sorry to resurrect an old email but did any resolution occur for this or a cause found? I just see this as a potential task I may need to also run through some day and if there are pitfalls to watch for would be good to know. *David Gossage* *Carousel Checks Inc. | System Administrator* *Office*

Re: [Gluster-users] [URGENT] Add-bricks to a volume corrupted the files

2016-09-06 Thread Kevin Lemonnier
Hi, Here is the info :
Volume Name: VMs
Type: Replicate
Volume ID: c5272382-d0c8-4aa4-aced-dd25a064e45c
Status: Started
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: ips4adm.name:/mnt/storage/VMs
Brick2: ips5adm.name:/mnt/storage/VMs
Brick3: ips6adm.name:/mnt/storage/VMs

Re: [Gluster-users] [URGENT] Add-bricks to a volume corrupted the files

2016-09-05 Thread Krutika Dhananjay
Could you please attach the glusterfs client and brick logs? Also provide output of `gluster volume info`. -Krutika On Tue, Sep 6, 2016 at 4:29 AM, Kevin Lemonnier wrote: > >- What was the original (and current) geometry? (status and info) > > It was a 1x3 that I was

Re: [Gluster-users] [URGENT] Add-bricks to a volume corrupted the files

2016-09-05 Thread Kevin Lemonnier
>- What was the original (and current) geometry? (status and info) It was a 1x3 that I was trying to bump to 2x3. >- what parameters did you use when adding the bricks? > Just a simple add-brick node1:/path node2:/path node3:/path Then a fix-layout when everything started going wrong.

Re: [Gluster-users] [URGENT] Add-bricks to a volume corrupted the files

2016-09-05 Thread Lindsay Mathieson
Sorry, no answers :( but probably useful to post some more info - What was the original (and current) geometry? (status and info) - what parameters did you use when adding the bricks? On 6/09/2016 8:00 AM, Kevin Lemonnier wrote: I tried a fix-layout, and since that didn't work I removed the

Re: [Gluster-users] [URGENT] Add-bricks to a volume corrupted the files

2016-09-05 Thread Kevin Lemonnier
I tried a fix-layout, and since that didn't work I removed the brick (start then commit when it showed completed). Not better, the volume is now running on the 3 original bricks (replica 3) but the VMs are still corrupted. I have 880 Mb of shards on the bricks I removed for some reason, thos