Re: [Gluster-users] [URGENT] Add-bricks to a volume corrupted the files

2016-10-22 Thread Krutika Dhananjay
Awesome. Thanks for the logs. Will take a look.

-Krutika

On Sun, Oct 23, 2016 at 5:47 AM, Lindsay Mathieson <
lindsay.mathie...@gmail.com> wrote:

> On 20/10/2016 9:13 PM, Krutika Dhananjay wrote:
>
>> It would be awesome if you could tell us whether you
>> see the issue with FUSE as well, while we get around
>> to setting up the environment and running the test ourselves.
>>
>
> I just managed to replicate the exact same error using the FUSE mount.
>
> --
> Lindsay Mathieson
>
>
___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] [URGENT] Add-bricks to a volume corrupted the files

2016-10-22 Thread Lindsay Mathieson

On 20/10/2016 9:13 PM, Krutika Dhananjay wrote:

It would be awesome if you could tell us whether you
see the issue with FUSE as well, while we get around
to setting up the environment and running the test ourselves.


I just managed to replicate the exact same error using the FUSE mount.

--
Lindsay Mathieson



Re: [Gluster-users] [URGENT] Add-bricks to a volume corrupted the files

2016-10-22 Thread Lindsay Mathieson

On 20/10/2016 8:43 AM, Joe Julian wrote:
Is there a bug open with the client and server logs attached? I would 
take a stab at reproducing as well, but I want to make sure I'm 
comparing apples to apples.


https://bugzilla.redhat.com/show_bug.cgi?id=1387878

--
Lindsay Mathieson



Re: [Gluster-users] [URGENT] Add-bricks to a volume corrupted the files

2016-10-21 Thread Lindsay Mathieson

On 21/10/2016 12:39 PM, Lindsay Mathieson wrote:

And now that I have it all set up for logging etc. I can't reproduce the error :(


Ah, figured out what I was doing differently - I started the heavy I/O 
before the add-brick and rebalance, which reliably triggers a "volume 
rebalance: teststore1: failed: Another transaction is in progress for 
teststore1. Please try again after sometime" error.



If I start the I/O after the rebalance starts then things rapidly go 
pear-shaped. However, I'm busy this Saturday (spouse's birthday), so I'll 
probably get the test and logs sorted on Sunday.


--
Lindsay Mathieson



Re: [Gluster-users] [URGENT] Add-bricks to a volume corrupted the files

2016-10-20 Thread Lindsay Mathieson
And now that I have it all set up for logging etc. I can't reproduce the error :(

Though I did manage to score a "volume rebalance: teststore1: failed:
Another transaction is in progress for teststore1. Please try again
after sometime" problem. No gluster commands would work after that. I
had to restart the glusterfsd service.
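A note for readers hitting the same stuck-transaction state: the daemon that serializes volume commands is glusterd, the management daemon (the glusterfsd processes serve the bricks), so restarting just it is usually enough. A sketch, assuming a systemd-based distro:

```shell
# Restart only the management daemon to clear a stuck
# "Another transaction is in progress" lock; the brick processes
# (glusterfsd) keep running, so client IO is not interrupted.
systemctl restart glusterd
```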

On 20 October 2016 at 21:13, Krutika Dhananjay  wrote:
> Thanks a lot, Lindsay! Appreciate the help.
>
> It would be awesome if you could tell us whether you
> see the issue with FUSE as well, while we get around
> to setting up the environment and running the test ourselves.
>
> -Krutika
>
> On Thu, Oct 20, 2016 at 2:57 AM, Lindsay Mathieson
>  wrote:
>>
>> On 20/10/2016 7:01 AM, Kevin Lemonnier wrote:
>>>
>>> Yes, you need to add a full replica set at once.
>>> I don't remember, but according to my history, looks like I've used this
>>> :
>>>
>>> gluster volume add-brick VMs host1:/brick host2:/brick host3:/brick force
>>>
>>> (I have the same without force just before that, so I assume force is
>>> needed)
>>
>>
>> Ok, I did a:
>>
>> gluster volume add-brick datastore1
>> vna.proxmox.softlog:/tank/vmdata/datastore1-2
>> vnb.proxmox.softlog:/tank/vmdata/datastore1-2
>> vng.proxmox.softlog:/tank/vmdata/datastore1-2
>>
>> I had added a 2nd Windows VM as well.
>>
>> Looked like it was going ok for a while, then blew up. The first Windows
>> VM, which was running DiskMark, died and won't boot. qemu-img check shows the
>> image hopelessly corrupted. The 2nd VM has also crashed and is unbootable,
>> though qemu-img shows the qcow2 file as ok.
>>
>>
>> I have a sneaking suspicion it's related to active IO. VM1 was doing heavy
>> IO compared to VM2; perhaps that's why its image was corrupted worse.
>>
>>
>> rebalance status looks odd to me:
>>
>> root@vna:~# gluster volume rebalance datastore1 status
>> Node                 Rebalanced-files    size  scanned  failures  skipped       status  run time in h:m:s
>> localhost                           0  0Bytes        0         0        0    completed  0:0:1
>> vnb.proxmox.softlog                 0  0Bytes        0         0        0    completed  0:0:1
>> vng.proxmox.softlog               328  19.2GB     1440         0        0  in progress  0:11:55
>>
>>
>> Don't know why vng is taking so much longer; the nodes are identical. But
>> maybe this is normal?
>>
>>
>> When I get time, I'll try again with:
>>
>> - all vm's shutdown (no IO)
>>
>> - All VM's running off the gluster fuse mount (no gfapi).
>>
>>
>> cheers,
>>
>>
>> --
>> Lindsay Mathieson
>>
>
>



-- 
Lindsay


Re: [Gluster-users] [URGENT] Add-bricks to a volume corrupted the files

2016-10-20 Thread Krutika Dhananjay
Thanks a lot, Lindsay! Appreciate the help.

It would be awesome if you could tell us whether you
see the issue with FUSE as well, while we get around
to setting up the environment and running the test ourselves.

-Krutika

On Thu, Oct 20, 2016 at 2:57 AM, Lindsay Mathieson <
lindsay.mathie...@gmail.com> wrote:

> On 20/10/2016 7:01 AM, Kevin Lemonnier wrote:
>
>> Yes, you need to add a full replica set at once.
>> I don't remember, but according to my history, looks like I've used this :
>>
>> gluster volume add-brick VMs host1:/brick host2:/brick host3:/brick force
>>
>> (I have the same without force just before that, so I assume force is
>> needed)
>>
>
> Ok, I did a:
>
> gluster volume add-brick datastore1 
> vna.proxmox.softlog:/tank/vmdata/datastore1-2
> vnb.proxmox.softlog:/tank/vmdata/datastore1-2
> vng.proxmox.softlog:/tank/vmdata/datastore1-2
>
> I had added a 2nd Windows VM as well.
>
> Looked like it was going ok for a while, then blew up. The first Windows
> VM, which was running DiskMark, died and won't boot. qemu-img check shows the
> image hopelessly corrupted. The 2nd VM has also crashed and is unbootable,
> though qemu-img shows the qcow2 file as ok.
>
>
> I have a sneaking suspicion it's related to active IO. VM1 was doing heavy
> IO compared to VM2; perhaps that's why its image was corrupted worse.
>
>
> rebalance status looks odd to me:
>
> root@vna:~# gluster volume rebalance datastore1 status
> Node                 Rebalanced-files    size  scanned  failures  skipped       status  run time in h:m:s
> localhost                           0  0Bytes        0         0        0    completed  0:0:1
> vnb.proxmox.softlog                 0  0Bytes        0         0        0    completed  0:0:1
> vng.proxmox.softlog               328  19.2GB     1440         0        0  in progress  0:11:55
>
>
> Don't know why vng is taking so much longer; the nodes are identical. But
> maybe this is normal?
>
>
> When I get time, I'll try again with:
>
> - all vm's shutdown (no IO)
>
> - All VM's running off the gluster fuse mount (no gfapi).
>
>
> cheers,
>
>
> --
> Lindsay Mathieson
>

Re: [Gluster-users] [URGENT] Add-bricks to a volume corrupted the files

2016-10-20 Thread Hans Henrik Happe

Hi,

This is scary stuff. While not as scary, you might confirm a bug that I 
reported a while back on your test systems:


https://bugzilla.redhat.com/show_bug.cgi?id=1370832

Cheers,
Hans Henrik



On 19-10-2016 08:40, Krutika Dhananjay wrote:

Agreed.
I will run the same test on an actual vm setup one of these days and
see if I manage to recreate the issue (after I have completed some
of my long pending tasks). Meanwhile if any of you find a consistent simpler
test case to hit the issue, feel free to reply on this thread. At least
I had no success
in recreating the bug in a non-VM-store setup.

-Krutika

On Mon, Oct 17, 2016 at 12:50 PM, Gandalf Corvotempesta
> wrote:

On 14 Oct 2016 at 17:37, "David Gossage" > wrote:
>
> Sorry to resurrect an old email but did any resolution occur for this or 
a cause found?  I just see this as a potential task I may need to also run through 
some day, and if there are pitfalls to watch for it would be good to know.
>

I think that the issue described in these emails must be addressed in
some way.
It's really bad that adding bricks to a cluster leads to data
corruption, as adding bricks is a standard administration task.

I hope that the issue will be detected and fixed asap.









Re: [Gluster-users] [URGENT] Add-bricks to a volume corrupted the files

2016-10-19 Thread Lindsay Mathieson

On 20/10/2016 10:03 AM, Joe Julian wrote:

Heh, well then a little less scary might be:

find /var/log/glusterfs -type f

Then if the list looks correct you can easily

find /var/log/glusterfs -type f | xargs truncate --size=0 


Thanks, I'll script that up.
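A sketch of such a script (my own, assuming GNU findutils and coreutils; the log directory is the usual default):

```shell
#!/bin/sh
# Hypothetical helper (my own sketch, not from the thread): empty every
# log file under a directory in place. Truncating keeps the inodes that
# glusterd and the brick processes hold open, so logging continues
# without a service restart -- unlike deleting the files.
truncate_logs() {
    logdir="$1"
    # List first so you can eyeball what is about to be emptied.
    find "$logdir" -type f
    # -print0/-0 keeps this safe for unusual filenames; -r skips the
    # truncate call entirely when no files match.
    find "$logdir" -type f -print0 | xargs -0 -r truncate --size=0
}
```

e.g. `truncate_logs /var/log/glusterfs` as root before each reproduction run.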


Will put together a test plan and run it by y'all later.

--
Lindsay Mathieson



Re: [Gluster-users] [URGENT] Add-bricks to a volume corrupted the files

2016-10-19 Thread Joe Julian



On 10/19/2016 04:59 PM, Lindsay Mathieson wrote:

On 20/10/2016 9:48 AM, Joe Julian wrote:
Personally, with logrotate I use copytruncate. If I want to truncate 
the whole shebang manually:


find -type f -exec truncate --size=0 {} \;


Eeep! that looks scary, I just see me running that from root at 3am.



Heh, well then a little less scary might be:

find /var/log/glusterfs -type f

Then if the list looks correct you can easily

find /var/log/glusterfs -type f | xargs truncate --size=0


Re: [Gluster-users] [URGENT] Add-bricks to a volume corrupted the files

2016-10-19 Thread Lindsay Mathieson

On 20/10/2016 9:48 AM, Joe Julian wrote:
Personally, with logrotate I use copytruncate. If I want to truncate 
the whole shebang manually:


find -type f -exec truncate --size=0 {} \;


Eeep! that looks scary, I just see me running that from root at 3am.

--
Lindsay Mathieson



Re: [Gluster-users] [URGENT] Add-bricks to a volume corrupted the files

2016-10-19 Thread Joe Julian

On 10/19/2016 04:46 PM, Lindsay Mathieson wrote:

On 20/10/2016 9:30 AM, Joe Julian wrote:
Joe, when you say truncated, just delete the logs before the test is 
started?




That's one way of doing it, yes. :)


nb: I've noticed that if you delete the logs while gluster is running 
they won't be recreated, you have to restart the services. Is there a 
better way of doing it?




Personally, with logrotate I use copytruncate. If I want to truncate the 
whole shebang manually:


find -type f -exec truncate --size=0 {} \;
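For reference, a copytruncate stanza along the lines Joe describes might look like this (the paths and rotation settings are my own assumptions, not from the thread):

```
/var/log/glusterfs/*.log /var/log/glusterfs/bricks/*.log {
    weekly
    rotate 4
    compress
    missingok
    notifempty
    copytruncate
}
```

With copytruncate the daemons keep writing to the same open file descriptors, so no restart or HUP is needed after rotation.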


--
Lindsay Mathieson



Re: [Gluster-users] [URGENT] Add-bricks to a volume corrupted the files

2016-10-19 Thread Lindsay Mathieson

On 20/10/2016 9:30 AM, Joe Julian wrote:
Joe, when you say truncated, just delete the logs before the test is 
started?




That's one way of doing it, yes. :)


nb: I've noticed that if you delete the logs while gluster is running 
they won't be recreated, you have to restart the services. Is there a 
better way of doing it?


--
Lindsay Mathieson


Re: [Gluster-users] [URGENT] Add-bricks to a volume corrupted the files

2016-10-19 Thread Lindsay Mathieson

On 20/10/2016 9:31 AM, Joe Julian wrote:
That makes me think, too, that it might be nice to see what kind of VM 
Proxmox creates. I assume it uses libvirt, so a dumpxml of an affected 
VM might be telling as well.


Unfortunately not; they use their own config file and build a command line 
from that. But it is an easy-to-read text format, plus I can dump the 
args of a running VM.
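If it helps, on stock Proxmox both of those dumps come from the `qm` tool (VMID 100 is a placeholder; this is an assumption about the Proxmox CLI, not something stated in the thread):

```shell
# Show the VM's Proxmox configuration (disk definitions included):
qm config 100
# Print the full qemu/kvm command line Proxmox launches for the VM:
qm showcmd 100
```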


--
Lindsay Mathieson



Re: [Gluster-users] [URGENT] Add-bricks to a volume corrupted the files

2016-10-19 Thread Joe Julian

On 10/19/2016 04:21 PM, Kevin Lemonnier wrote:

If you have clean logs (truncated before you reproduced) everything from
/var/log/glusterfs from the client and all the servers would be great.
I'm assuming that proxmox doesn't redirect the client logs to some other
place.


No client logs with Proxmox unfortunately; you need to start qemu by hand
to get those as far as I know.




That makes me think, too, that it might be nice to see what kind of VM 
Proxmox creates. I assume it uses libvirt, so a dumpxml of an affected VM 
might be telling as well.



Re: [Gluster-users] [URGENT] Add-bricks to a volume corrupted the files

2016-10-19 Thread Joe Julian

On 10/19/2016 04:28 PM, Lindsay Mathieson wrote:

On 20/10/2016 9:21 AM, Kevin Lemonnier wrote:

No client logs with Proxmox unfortunately; you need to start qemu by hand
to get those as far as I know.


Ah, those logs! Forgot about that.


I'll replicate it again soon. Might take a couple of days; I'll also create a 
bug report.




If you have clean logs (truncated before you reproduced)


Joe, when you say truncated, just delete the logs before the test is 
started?




That's one way of doing it, yes. :)


Re: [Gluster-users] [URGENT] Add-bricks to a volume corrupted the files

2016-10-19 Thread Lindsay Mathieson

On 20/10/2016 9:21 AM, Kevin Lemonnier wrote:

No client logs with Proxmox unfortunately; you need to start qemu by hand
to get those as far as I know.


Ah, those logs! Forgot about that.


I'll replicate it again soon. Might take a couple of days; I'll also create a 
bug report.




If you have clean logs (truncated before you reproduced)


Joe, when you say truncated, just delete the logs before the test is 
started?


--
Lindsay Mathieson



Re: [Gluster-users] [URGENT] Add-bricks to a volume corrupted the files

2016-10-19 Thread Kevin Lemonnier
> 
> If you have clean logs (truncated before you reproduced) everything from 
> /var/log/glusterfs from the client and all the servers would be great. 
> I'm assuming that proxmox doesn't redirect the client logs to some other 
> place.
> 

No client logs with Proxmox unfortunately; you need to start qemu by hand
to get those as far as I know.

-- 
Kevin Lemonnier
PGP Fingerprint : 89A5 2283 04A0 E6E9 0111



Re: [Gluster-users] [URGENT] Add-bricks to a volume corrupted the files

2016-10-19 Thread Joe Julian

On 10/19/2016 04:04 PM, Lindsay Mathieson wrote:

On 20/10/2016 8:43 AM, Joe Julian wrote:
Is there a bug open with the client and server logs attached? I would 
take a stab at reproducing as well, but I want to make sure I'm 
comparing apples to apples.


Which log names are those (sorry). I have all the logs from this 
mornings trial available.




If you have clean logs (truncated before you reproduced) everything from 
/var/log/glusterfs from the client and all the servers would be great. 
I'm assuming that proxmox doesn't redirect the client logs to some other 
place.




Re: [Gluster-users] [URGENT] Add-bricks to a volume corrupted the files

2016-10-19 Thread Lindsay Mathieson

On 20/10/2016 8:43 AM, Joe Julian wrote:
Is there a bug open with the client and server logs attached? I would 
take a stab at reproducing as well, but I want to make sure I'm 
comparing apples to apples.


Which log names are those (sorry). I have all the logs from this 
mornings trial available.


--
Lindsay Mathieson



Re: [Gluster-users] [URGENT] Add-bricks to a volume corrupted the files

2016-10-19 Thread Joe Julian
Is there a bug open with the client and server logs attached? I would 
take a stab at reproducing as well, but I want to make sure I'm 
comparing apples to apples.



On 10/19/2016 02:39 PM, Kevin Lemonnier wrote:

Looked like it was going ok for a while, then blew up. The first windows
vm which was running diskmark died and won't boot. qemu-img check shows
the image hopelessly corrupted. 2nd VM has also crashed and is
unbootable, though qemuimg shows the qcow2 file as ok.


Ha, glad you could reproduce this! (Well, all things considered.)
Looks very much like what I had indeed. So it's still a problem
in recent versions, glad I didn't try again then.
Thanks for taking the time, let's hope that'll help them :)




Re: [Gluster-users] [URGENT] Add-bricks to a volume corrupted the files

2016-10-19 Thread Kevin Lemonnier
> 
> Looked like it was going ok for a while, then blew up. The first windows 
> vm which was running diskmark died and won't boot. qemu-img check shows 
> the image hopelessly corrupted. 2nd VM has also crashed and is 
> unbootable, though qemuimg shows the qcow2 file as ok.
> 

Ha, glad you could reproduce this! (Well, all things considered.)
Looks very much like what I had indeed. So it's still a problem
in recent versions, glad I didn't try again then.
Thanks for taking the time, let's hope that'll help them :)

-- 
Kevin Lemonnier
PGP Fingerprint : 89A5 2283 04A0 E6E9 0111



Re: [Gluster-users] [URGENT] Add-bricks to a volume corrupted the files

2016-10-19 Thread Lindsay Mathieson

On 20/10/2016 7:01 AM, Kevin Lemonnier wrote:

Yes, you need to add a full replica set at once.
I don't remember, but according to my history, looks like I've used this :

gluster volume add-brick VMs host1:/brick host2:/brick host3:/brick force

(I have the same without force just before that, so I assume force is needed)


Ok, I did a:

gluster volume add-brick datastore1 
vna.proxmox.softlog:/tank/vmdata/datastore1-2 
vnb.proxmox.softlog:/tank/vmdata/datastore1-2 
vng.proxmox.softlog:/tank/vmdata/datastore1-2


I had added a 2nd Windows VM as well.

Looked like it was going ok for a while, then blew up. The first Windows 
VM, which was running DiskMark, died and won't boot. qemu-img check shows 
the image hopelessly corrupted. The 2nd VM has also crashed and is 
unbootable, though qemu-img shows the qcow2 file as ok.
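For anyone reproducing this, the integrity check mentioned above can gate a restore step; a sketch with a placeholder image path (run while the VM is stopped so the image is quiescent):

```shell
# qemu-img check reports leaks and corruption in a qcow2 image and
# exits non-zero when errors are found, so it can drive automation:
qemu-img check /path/to/vm-disk.qcow2 || echo "image is damaged"
```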



I have a sneaking suspicion it's related to active IO. VM1 was doing 
heavy IO compared to VM2; perhaps that's why its image was corrupted worse.



rebalance status looks odd to me:

root@vna:~# gluster volume rebalance datastore1 status
Node                 Rebalanced-files    size  scanned  failures  skipped       status  run time in h:m:s
localhost                           0  0Bytes        0         0        0    completed  0:0:1
vnb.proxmox.softlog                 0  0Bytes        0         0        0    completed  0:0:1
vng.proxmox.softlog               328  19.2GB     1440         0        0  in progress  0:11:55



Don't know why vng is taking so much longer; the nodes are identical. 
But maybe this is normal?



When I get time, I'll try again with:

- all vm's shutdown (no IO)

- All VM's running off the gluster fuse mount (no gfapi).


cheers,

--
Lindsay Mathieson



Re: [Gluster-users] [URGENT] Add-bricks to a volume corrupted the files

2016-10-19 Thread Kevin Lemonnier
> 
> Ok, I'll give that a go
> - what command do you use?
> - I think from memory the extra bricks have to be added in blocks of three?
> 

Yes, you need to add a full replica set at once.
I don't remember, but according to my history, looks like I've used this :

gluster volume add-brick VMs host1:/brick host2:/brick host3:/brick force

(I have the same without force just before that, so I assume force is needed)

-- 
Kevin Lemonnier
PGP Fingerprint : 89A5 2283 04A0 E6E9 0111



Re: [Gluster-users] [URGENT] Add-bricks to a volume corrupted the files

2016-10-19 Thread Lindsay Mathieson

On 20/10/2016 6:39 AM, Kevin Lemonnier wrote:

Did you add a third brick in replication, if I understand it correctly?


Yup, took it from rep 2 to rep 3


That's not the problem I had, that does seem to be working fine.
What broke was adding 3 new bricks to a replica 3, bumping it from 1 x 3
to a 2 x 3. That doesn't start a heal, at least not until you start a
rebalance, but I didn't even get to that point:(


Ok, I'll give that a go
- what command do you use?
- I think from memory the extra bricks have to be added in blocks of three?

My "bricks" are just folders on zfs pools, but that shouldn't make a 
difference.
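For the two scenarios being compared in this thread, the add-brick syntax differs slightly. A sketch with placeholder host and brick paths (check the `gluster volume add-brick` help on your version before running anything):

```shell
# Raise the replica count of an existing volume from 2 to 3 by adding
# one brick (Lindsay's test) -- note the explicit "replica 3":
gluster volume add-brick datastore1 replica 3 vng:/tank/vmdata/datastore1

# Grow a replica-3 volume from 1x3 to 2x3 by adding a whole new replica
# set of three bricks (Kevin's case); the replica count is unchanged,
# so it is omitted:
gluster volume add-brick VMs host4:/brick host5:/brick host6:/brick force
```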



--
Lindsay Mathieson



Re: [Gluster-users] [URGENT] Add-bricks to a volume corrupted the files

2016-10-19 Thread Kevin Lemonnier
> 
> I had a go at reproducing it last night Kevin with 3.8.4
> 
> - new volume
> 
> - Initial 2 bricks on two hosts
> 
> - copied a windows VM on to it
> 
> - Started some load (Crystal DiskMark in the VM)
> 
> - Added a 3rd brick and node
> 

For the size it's already more than what I have; most of our VMs are
20 GB or less. They are almost all Linux webservers, and the load on the disk
comes from MySQL almost exclusively, but I guess load is load.
Still, it's not much; I think we currently have something like 15 Mb/s
average writing on the volume, so nothing crazy.

Did you add a third brick in replication, if I understand it correctly?
That's not the problem I had, that does seem to be working fine.
What broke was adding 3 new bricks to a replica 3, bumping it from 1 x 3
to a 2 x 3. That doesn't start a heal, at least not until you start a
rebalance, but I didn't even get to that point :(

That's the only bug I've experienced so far in 3.7.12, everything else
(including increasing the replica count) seems to be working perfectly fine.
That's why I'm still installing that version, even though it's outdated.

Thanks !
-- 
Kevin Lemonnier
PGP Fingerprint : 89A5 2283 04A0 E6E9 0111



Re: [Gluster-users] [URGENT] Add-bricks to a volume corrupted the files

2016-10-19 Thread Lindsay Mathieson

On 19/10/2016 11:08 PM, Kevin Lemonnier wrote:

Yes, to be honest I wasn't even planning on rebalancing just yet, I was planning
on letting it run a few days before, see if everything's fine, and then maybe
rebalance. I tried the rebalance when everything came crashing down, hoping that
might fix it, but it didn't.


I had a go at reproducing it last night Kevin with 3.8.4

- new volume

- Initial 2 bricks on two hosts

- copied a windows VM on to it

- Started some load (Crystal DiskMark in the VM)

- Added a 3rd brick and node



The 3rd brick started healing right away and there were no issues. 
Eventually it was all healed and a normal rep 3 volume. Tonight I'll 
compare the bricks with md5, but for now it seems ok.
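One way to do that comparison (a hypothetical helper of my own; the paths are placeholders) is to checksum every file under a brick with the brick root stripped, so listings from different nodes line up for diffing:

```shell
#!/bin/sh
# Checksum every file under a brick directory; strip the brick root
# from each path so two nodes' listings can be diffed directly.
brick_sums() {
    brick="$1"
    find "$brick" -type f -exec md5sum {} + | sed "s|$brick/||" | sort -k2
}
# On each node:  brick_sums /tank/vmdata/datastore1 > /tmp/$(hostname).md5
# Then copy the listings to one node and, e.g.:  diff vna.md5 vnb.md5
```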


- It was only one 40GB VM

- Not sure if the procedure is how it originally started for you.


If you'd like me to try a different process I can give it a go; I have 
the space and time, and the cluster has been quiet for weeks - time to 
break something :)



--
Lindsay Mathieson



Re: [Gluster-users] [URGENT] Add-bricks to a volume corrupted the files

2016-10-19 Thread Kevin Lemonnier
> 
> As Kevin said, the problem appeared before rebalancing if I understood 
> correctly.
> 

Yes, to be honest I wasn't even planning on rebalancing just yet, I was planning
on letting it run a few days before, see if everything's fine, and then maybe
rebalance. I tried the rebalance when everything came crashing down, hoping that
might fix it, but it didn't.

-- 
Kevin Lemonnier
PGP Fingerprint : 89A5 2283 04A0 E6E9 0111



Re: [Gluster-users] [URGENT] Add-bricks to a volume corrupted the files

2016-10-19 Thread Xavier Hernandez


On 19/10/16 14:59, Gandalf Corvotempesta wrote:

On 19 Oct 2016 at 14:32, "Xavier Hernandez" > wrote:

I had a similar issue while moving machines from an old gluster volume

to a new volume with sharding enabled and I added new bricks to it.

Maybe related to rebalance after adding bricks on a sharded volume?
Maybe some shards are moved around and not detected properly by
gluster? If a single shard is missing, the whole file is corrupted.



As Kevin said, the problem appeared before rebalancing if I understood 
correctly.


Xavi


Re: [Gluster-users] [URGENT] Add-bricks to a volume corrupted the files

2016-10-19 Thread Gandalf Corvotempesta
On 19 Oct 2016 at 14:32, "Xavier Hernandez"  wrote:
> I had a similar issue while moving machines from an old gluster volume to
a new volume with sharding enabled and I added new bricks to it.

Maybe related to rebalance after adding bricks on a sharded volume?
Maybe some shards are moved around and not detected properly by
gluster? If a single shard is missing, the whole file is corrupted.

Re: [Gluster-users] [URGENT] Add-bricks to a volume corrupted the files

2016-10-19 Thread Kevin Lemonnier
> are you using Proxmox, right ?
> 

Yes, indeed.

> I think it's important because Proxmox uses gfapi to connect each VM to 
> the disk, not FUSE. Maybe this is important to find the cause.

I believe so yes, and I should add (I believe I mentioned it) that I am using
GlusterFS 3.7.12. It took a while to finally get a version that worked for us,
so we stayed on it once we got it. Maybe that problem has already been fixed in
later versions.

-- 
Kevin Lemonnier
PGP Fingerprint : 89A5 2283 04A0 E6E9 0111



Re: [Gluster-users] [URGENT] Add-bricks to a volume corrupted the files

2016-10-19 Thread Xavier Hernandez

Hi Kevin,

are you using Proxmox, right ?

I think it's important because Proxmox uses gfapi to connect each VM to 
the disk, not FUSE. Maybe this is important to find the cause.


I had a similar issue while moving machines from an old gluster volume 
to a new volume with sharding enabled and I added new bricks to it.


Xavi

On 17/10/16 08:46, Kevin Lemonnier wrote:


   I see that network.ping-timeout on your setup is 15 seconds and that's
   too low. Could you reconfigure that to 30 seconds?



Yes, I can. I set it to 15 to be sure no browser would timeout when trying to
load a website on a frozen VM during the timeout; 15 seemed pretty good since
it just feels like the website was a bit slow, which happens. I guess 30 should
still work. Do you think 15 could cause problems? We've had that on our clusters
for a few months already without noticing anything. The heals are totally
transparent now, so I figured I don't really mind if it heals every time there
is a little lag.





Re: [Gluster-users] [URGENT] Add-bricks to a volume corrupted the files

2016-10-19 Thread Krutika Dhananjay
Agreed.
I will run the same test on an actual vm setup one of these days and
see if I manage to recreate the issue (after I have completed some
of my long pending tasks). Meanwhile if any of you find a consistent simpler
test case to hit the issue, feel free to reply on this thread. At least I
had no success
in recreating the bug in a non-VM-store setup.

-Krutika

On Mon, Oct 17, 2016 at 12:50 PM, Gandalf Corvotempesta <
gandalf.corvotempe...@gmail.com> wrote:

> Il 14 ott 2016 17:37, "David Gossage"  ha
> scritto:
> >
> > Sorry to resurrect an old email but did any resolution occur for this or
> a cause found?  I just see this as a potential task I may need to also run
> through some day, and if there are pitfalls to watch for it would be good to
> know.
> >
>
> I think that the issue described in these emails must be addressed in some way.
> It's really bad that adding bricks to a cluster leads to data corruption, as
> adding bricks is a standard administration task.
>
> I hope that the issue will be detected and fixed asap.
>

Re: [Gluster-users] [URGENT] Add-bricks to a volume corrupted the files

2016-10-17 Thread Gandalf Corvotempesta
Il 14 ott 2016 17:37, "David Gossage"  ha
scritto:
>
> Sorry to resurrect an old email but did any resolution occur for this or
a cause found?  I just see this as a potential task I may need to also run
through some day, and if there are pitfalls to watch for it would be good to
know.
>

I think that the issue described in these emails must be addressed in some way.
It's really bad that adding bricks to a cluster leads to data corruption, as
adding bricks is a standard administration task.

I hope that the issue will be detected and fixed asap.

Re: [Gluster-users] [URGENT] Add-bricks to a volume corrupted the files

2016-10-17 Thread Kevin Lemonnier
> 
>I see that network.ping-timeout on your setup is 15 seconds and that's
>too low. Could you reconfigure that to 30 seconds?
> 

Yes, I can. I set it to 15 to be sure no browser would time out when trying to
load a website on a frozen VM during the timeout; 15 seemed pretty good since
it just feels like the website is a bit slow, which happens. I guess 30 should
still work, but do you think 15 could cause problems? We've had that on our
clusters for a few months already without noticing anything. The heals are
totally transparent now, so I figured I don't really mind if it heals every
time there is a little lag.

-- 
Kevin Lemonnier
PGP Fingerprint : 89A5 2283 04A0 E6E9 0111



Re: [Gluster-users] [URGENT] Add-bricks to a volume corrupted the files

2016-10-17 Thread Kevin Lemonnier
On Fri, Oct 14, 2016 at 10:37:03AM -0500, David Gossage wrote:
>Sorry to resurrect an old email but did any resolution occur for this or was
>a cause found?  I just see this as a potential task I may need to also run
>through some day, and if there are pitfalls to watch for it would be good to
>know.

Unfortunately no. I ended up restoring almost all the VMs from backups, then
we created two small clusters instead of a big one, and I guess we'll keep
creating 3-brick clusters when needed for now.

Maybe just make sure you are running > 3.7.12, and if possible test it
on a non-production environment first. Still, it's hard to replicate the
same load in tests.

-- 
Kevin Lemonnier
PGP Fingerprint : 89A5 2283 04A0 E6E9 0111



Re: [Gluster-users] [URGENT] Add-bricks to a volume corrupted the files

2016-10-16 Thread Krutika Dhananjay
Hi,

No. I did run add-brick on a volume with the same configuration as Kevin's,
while IO was running, except that I wasn't running a VM workload. I compared
the file checksums against the original source files from which they were
copied, and they matched.


@Kevin,

I see that network.ping-timeout on your setup is 15 seconds and that's too
low. Could you reconfigure that to 30 seconds?
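For reference, that change is a single volume-set call (a sketch using the
volume name from the `gluster volume info` output quoted later in the thread;
it of course needs a running cluster):

```shell
# Bump the ping timeout back to the 30s default on the "VMs" volume
gluster volume set VMs network.ping-timeout 30

# Confirm the value now in effect
gluster volume info VMs | grep network.ping-timeout
```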

-Krutika

On Fri, Oct 14, 2016 at 9:07 PM, David Gossage 
wrote:

> Sorry to resurrect an old email but did any resolution occur for this or was
> a cause found?  I just see this as a potential task I may need to also run
> through some day, and if there are pitfalls to watch for it would be good to
> know.
>
> *David Gossage*
> *Carousel Checks Inc. | System Administrator*
> *Office* 708.613.2284
>
> On Tue, Sep 6, 2016 at 5:38 AM, Kevin Lemonnier 
> wrote:
>
>> Hi,
>>
>> Here is the info :
>>
>> Volume Name: VMs
>> Type: Replicate
>> Volume ID: c5272382-d0c8-4aa4-aced-dd25a064e45c
>> Status: Started
>> Number of Bricks: 1 x 3 = 3
>> Transport-type: tcp
>> Bricks:
>> Brick1: ips4adm.name:/mnt/storage/VMs
>> Brick2: ips5adm.name:/mnt/storage/VMs
>> Brick3: ips6adm.name:/mnt/storage/VMs
>> Options Reconfigured:
>> performance.readdir-ahead: on
>> cluster.quorum-type: auto
>> cluster.server-quorum-type: server
>> network.remote-dio: enable
>> cluster.eager-lock: enable
>> performance.quick-read: off
>> performance.read-ahead: off
>> performance.io-cache: off
>> performance.stat-prefetch: off
>> features.shard: on
>> features.shard-block-size: 64MB
>> cluster.data-self-heal-algorithm: full
>> network.ping-timeout: 15
>>
>>
>> For the logs I'm sending that over to you in private.
>>
>>
>> On Tue, Sep 06, 2016 at 09:48:07AM +0530, Krutika Dhananjay wrote:
>> >Could you please attach the glusterfs client and brick logs?
>> >Also provide output of `gluster volume info`.
>> >-Krutika
>> >On Tue, Sep 6, 2016 at 4:29 AM, Kevin Lemonnier <
>> lemonni...@ulrar.net>
>> >wrote:
>> >
>> >  >    - What was the original (and current) geometry? (status and
>> info)
>> >
>> >  It was a 1x3 that I was trying to bump to 2x3.
>> >  >    - what parameters did you use when adding the bricks?
>> >  >
>> >
>> >  Just a simple add-brick node1:/path node2:/path node3:/path
>> >  Then a fix-layout when everything started going wrong.
>> >
>> >  I was able to salvage some VMs by stopping them then starting them
>> >  again,
>> >  but most won't start for various reasons (disk corrupted, grub not
>> found
>> >  ...).
>> >  For those we are deleting the disks then importing them from
>> backups,
>> >  that's
>> >  a huge loss but everything has been down for so long, no choice ..
>> >  >    On 6/09/2016 8:00 AM, Kevin Lemonnier wrote:
>> >  >
>> >  >  I tried a fix-layout, and since that didn't work I removed the
>> brick
>> >  (start then commit when it showed
>> >  >  completed). Not better, the volume is now running on the 3
>> original
>> >  bricks (replica 3) but the VMs
>> >  >  are still corrupted. I have 880 Mb of shards on the bricks I
>> removed
>> >  for some reason, those shards do exist
>> >  >  (and are bigger) on the "live" volume. I don't understand why
>> now
>> >  that I have removed the new bricks
>> >  >  everything isn't working like before ..
>> >  >
>> >  >  On Mon, Sep 05, 2016 at 11:06:16PM +0200, Kevin Lemonnier
>> wrote:
>> >  >
>> >  >  Hi,
>> >  >
>> >  >  I just added 3 bricks to a volume and all the VMs are doing I/O
>> >  errors now.
>> >  >  I rebooted a VM to see and it can't start again, am I missing
>> >  something ? Is the rebalance required
>> >  >  to make everything run ?
>> >  >
>> >  >  That's urgent, thanks.
>> >  >
>> >  >  --
>> >  >  Kevin Lemonnier
>> >  >  PGP Fingerprint : 89A5 2283 04A0 E6E9 0111
>> >  >
>> >  >
>> >  >
>> >  >
>> >  >  ___
>> >  >  Gluster-users mailing list
>> >  >  Gluster-users@gluster.org
>> >  >  http://www.gluster.org/mailman/listinfo/gluster-users
>> >  >
>> >  >
>> >  >
>> >  >  ___
>> >  >  Gluster-users mailing list
>> >  >  Gluster-users@gluster.org
>> >  >  http://www.gluster.org/mailman/listinfo/gluster-users
>> >  >
>> >  >  --
>> >  >  Lindsay Mathieson
>> >
>> >  > ___
>> >  > Gluster-users mailing list
>> >  > Gluster-users@gluster.org
>> >  > http://www.gluster.org/mailman/listinfo/gluster-users
>> >
>> >  --
>> >  Kevin Lemonnier
>> >  PGP Fingerprint : 89A5 2283 04A0 E6E9 0111
>> >  ___
>> >  Gluster-users mailing list
>> >  

Re: [Gluster-users] [URGENT] Add-bricks to a volume corrupted the files

2016-10-14 Thread David Gossage
Sorry to resurrect an old email but did any resolution occur for this or was
a cause found?  I just see this as a potential task I may need to also run
through some day, and if there are pitfalls to watch for it would be good to
know.

*David Gossage*
*Carousel Checks Inc. | System Administrator*
*Office* 708.613.2284

On Tue, Sep 6, 2016 at 5:38 AM, Kevin Lemonnier 
wrote:

> Hi,
>
> Here is the info :
>
> Volume Name: VMs
> Type: Replicate
> Volume ID: c5272382-d0c8-4aa4-aced-dd25a064e45c
> Status: Started
> Number of Bricks: 1 x 3 = 3
> Transport-type: tcp
> Bricks:
> Brick1: ips4adm.name:/mnt/storage/VMs
> Brick2: ips5adm.name:/mnt/storage/VMs
> Brick3: ips6adm.name:/mnt/storage/VMs
> Options Reconfigured:
> performance.readdir-ahead: on
> cluster.quorum-type: auto
> cluster.server-quorum-type: server
> network.remote-dio: enable
> cluster.eager-lock: enable
> performance.quick-read: off
> performance.read-ahead: off
> performance.io-cache: off
> performance.stat-prefetch: off
> features.shard: on
> features.shard-block-size: 64MB
> cluster.data-self-heal-algorithm: full
> network.ping-timeout: 15
>
>
> For the logs I'm sending that over to you in private.
>
>
> On Tue, Sep 06, 2016 at 09:48:07AM +0530, Krutika Dhananjay wrote:
> >Could you please attach the glusterfs client and brick logs?
> >Also provide output of `gluster volume info`.
> >-Krutika
> >On Tue, Sep 6, 2016 at 4:29 AM, Kevin Lemonnier  >
> >wrote:
> >
> >  >    - What was the original (and current) geometry? (status and
> info)
> >
> >  It was a 1x3 that I was trying to bump to 2x3.
> >  >    - what parameters did you use when adding the bricks?
> >  >
> >
> >  Just a simple add-brick node1:/path node2:/path node3:/path
> >  Then a fix-layout when everything started going wrong.
> >
> >  I was able to salvage some VMs by stopping them then starting them
> >  again,
> >  but most won't start for various reasons (disk corrupted, grub not
> found
> >  ...).
> >  For those we are deleting the disks then importing them from
> backups,
> >  that's
> >  a huge loss but everything has been down for so long, no choice ..
> >  >    On 6/09/2016 8:00 AM, Kevin Lemonnier wrote:
> >  >
> >  >  I tried a fix-layout, and since that didn't work I removed the
> brick
> >  (start then commit when it showed
> >  >  completed). Not better, the volume is now running on the 3
> original
> >  bricks (replica 3) but the VMs
> >  >  are still corrupted. I have 880 Mb of shards on the bricks I
> removed
> >  for some reason, those shards do exist
> >  >  (and are bigger) on the "live" volume. I don't understand why
> now
> >  that I have removed the new bricks
> >  >  everything isn't working like before ..
> >  >
> >  >  On Mon, Sep 05, 2016 at 11:06:16PM +0200, Kevin Lemonnier wrote:
> >  >
> >  >  Hi,
> >  >
> >  >  I just added 3 bricks to a volume and all the VMs are doing I/O
> >  errors now.
> >  >  I rebooted a VM to see and it can't start again, am I missing
> >  something ? Is the rebalance required
> >  >  to make everything run ?
> >  >
> >  >  That's urgent, thanks.
> >  >
> >  >  --
> >  >  Kevin Lemonnier
> >  >  PGP Fingerprint : 89A5 2283 04A0 E6E9 0111
> >  >
> >  >
> >  >
> >  >
> >  >  ___
> >  >  Gluster-users mailing list
> >  >  Gluster-users@gluster.org
> >  >  http://www.gluster.org/mailman/listinfo/gluster-users
> >  >
> >  >
> >  >
> >  >  ___
> >  >  Gluster-users mailing list
> >  >  Gluster-users@gluster.org
> >  >  http://www.gluster.org/mailman/listinfo/gluster-users
> >  >
> >  >  --
> >  >  Lindsay Mathieson
> >
> >  > ___
> >  > Gluster-users mailing list
> >  > Gluster-users@gluster.org
> >  > http://www.gluster.org/mailman/listinfo/gluster-users
> >
> >  --
> >  Kevin Lemonnier
> >  PGP Fingerprint : 89A5 2283 04A0 E6E9 0111
> >  ___
> >  Gluster-users mailing list
> >  Gluster-users@gluster.org
> >  http://www.gluster.org/mailman/listinfo/gluster-users
>
> --
> Kevin Lemonnier
> PGP Fingerprint : 89A5 2283 04A0 E6E9 0111
>
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-users
>

Re: [Gluster-users] [URGENT] Add-bricks to a volume corrupted the files

2016-09-06 Thread Kevin Lemonnier
Hi,

Here is the info :

Volume Name: VMs
Type: Replicate
Volume ID: c5272382-d0c8-4aa4-aced-dd25a064e45c
Status: Started
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: ips4adm.name:/mnt/storage/VMs
Brick2: ips5adm.name:/mnt/storage/VMs
Brick3: ips6adm.name:/mnt/storage/VMs
Options Reconfigured:
performance.readdir-ahead: on
cluster.quorum-type: auto
cluster.server-quorum-type: server
network.remote-dio: enable
cluster.eager-lock: enable
performance.quick-read: off
performance.read-ahead: off
performance.io-cache: off
performance.stat-prefetch: off
features.shard: on
features.shard-block-size: 64MB
cluster.data-self-heal-algorithm: full
network.ping-timeout: 15
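A side note on the sharding options above: files larger than
features.shard-block-size are stored as a base file plus numbered pieces under
the hidden /.shard directory, which is why corruption here shows up as missing
or short shards. The offset-to-shard mapping is plain integer division; a small
illustrative sketch (the /.shard/<gfid>.<n> naming follows the shard
translator's documented layout, the helper itself is hypothetical):

```shell
# Illustrative only: with features.shard-block-size at 64MB, the shard a
# byte offset falls into is integer division by the block size.
# Shard 0 is the base file; shards >= 1 live as /.shard/<gfid>.<n> on bricks.
shard_block_size=$((64 * 1024 * 1024))

shard_index() {
    # $1 = byte offset into the (virtual) file
    echo $(( $1 / shard_block_size ))
}

shard_index 0             # -> 0 (the base file itself)
shard_index 67108864      # -> 1 (first piece under /.shard)
shard_index 1073741824    # -> 16 (a 1GB offset)
```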


For the logs I'm sending that over to you in private.


On Tue, Sep 06, 2016 at 09:48:07AM +0530, Krutika Dhananjay wrote:
>Could you please attach the glusterfs client and brick logs?
>Also provide output of `gluster volume info`.
>-Krutika
>On Tue, Sep 6, 2016 at 4:29 AM, Kevin Lemonnier 
>wrote:
> 
>  >    - What was the original (and current) geometry? (status and info)
> 
>  It was a 1x3 that I was trying to bump to 2x3.
>  >    - what parameters did you use when adding the bricks?
>  >
> 
>  Just a simple add-brick node1:/path node2:/path node3:/path
>  Then a fix-layout when everything started going wrong.
> 
>  I was able to salvage some VMs by stopping them then starting them
>  again,
>  but most won't start for various reasons (disk corrupted, grub not found
>  ...).
>  For those we are deleting the disks then importing them from backups,
>  that's
>  a huge loss but everything has been down for so long, no choice ..
>  >    On 6/09/2016 8:00 AM, Kevin Lemonnier wrote:
>  >
>  >  I tried a fix-layout, and since that didn't work I removed the brick
>  (start then commit when it showed
>  >  completed). Not better, the volume is now running on the 3 original
>  bricks (replica 3) but the VMs
>  >  are still corrupted. I have 880 Mb of shards on the bricks I removed
>  for some reason, those shards do exist
>  >  (and are bigger) on the "live" volume. I don't understand why now
>  that I have removed the new bricks
>  >  everything isn't working like before ..
>  >
>  >  On Mon, Sep 05, 2016 at 11:06:16PM +0200, Kevin Lemonnier wrote:
>  >
>  >  Hi,
>  >
>  >  I just added 3 bricks to a volume and all the VMs are doing I/O
>  errors now.
>  >  I rebooted a VM to see and it can't start again, am I missing
>  something ? Is the rebalance required
>  >  to make everything run ?
>  >
>  >  That's urgent, thanks.
>  >
>  >  --
>  >  Kevin Lemonnier
>  >  PGP Fingerprint : 89A5 2283 04A0 E6E9 0111
>  >
>  >
>  >
>  >
>  >  ___
>  >  Gluster-users mailing list
>  >  Gluster-users@gluster.org
>  >  http://www.gluster.org/mailman/listinfo/gluster-users
>  >
>  >
>  >
>  >  ___
>  >  Gluster-users mailing list
>  >  Gluster-users@gluster.org
>  >  http://www.gluster.org/mailman/listinfo/gluster-users
>  >
>  >  --
>  >  Lindsay Mathieson
> 
>  > ___
>  > Gluster-users mailing list
>  > Gluster-users@gluster.org
>  > http://www.gluster.org/mailman/listinfo/gluster-users
> 
>  --
>  Kevin Lemonnier
>  PGP Fingerprint : 89A5 2283 04A0 E6E9 0111
>  ___
>  Gluster-users mailing list
>  Gluster-users@gluster.org
>  http://www.gluster.org/mailman/listinfo/gluster-users

-- 
Kevin Lemonnier
PGP Fingerprint : 89A5 2283 04A0 E6E9 0111



Re: [Gluster-users] [URGENT] Add-bricks to a volume corrupted the files

2016-09-05 Thread Krutika Dhananjay
Could you please attach the glusterfs client and brick logs?
Also provide output of `gluster volume info`.

-Krutika

On Tue, Sep 6, 2016 at 4:29 AM, Kevin Lemonnier 
wrote:

> >- What was the original (and current) geometry? (status and info)
>
> It was a 1x3 that I was trying to bump to 2x3.
>
> >- what parameters did you use when adding the bricks?
> >
>
> Just a simple add-brick node1:/path node2:/path node3:/path
> Then a fix-layout when everything started going wrong.
>
>
> I was able to salvage some VMs by stopping them then starting them again,
> but most won't start for various reasons (disk corrupted, grub not found
> ...).
> For those we are deleting the disks then importing them from backups,
> that's
> a huge loss but everything has been down for so long, no choice ..
>
> >On 6/09/2016 8:00 AM, Kevin Lemonnier wrote:
> >
> >  I tried a fix-layout, and since that didn't work I removed the brick
> (start then commit when it showed
> >  completed). Not better, the volume is now running on the 3 original
> bricks (replica 3) but the VMs
> >  are still corrupted. I have 880 Mb of shards on the bricks I removed
> for some reason, those shards do exist
> >  (and are bigger) on the "live" volume. I don't understand why now that
> I have removed the new bricks
> >  everything isn't working like before ..
> >
> >  On Mon, Sep 05, 2016 at 11:06:16PM +0200, Kevin Lemonnier wrote:
> >
> >  Hi,
> >
> >  I just added 3 bricks to a volume and all the VMs are doing I/O errors
> now.
> >  I rebooted a VM to see and it can't start again, am I missing something
> ? Is the rebalance required
> >  to make everything run ?
> >
> >  That's urgent, thanks.
> >
> >  --
> >  Kevin Lemonnier
> >  PGP Fingerprint : 89A5 2283 04A0 E6E9 0111
> >
> >
> >
> >
> >  ___
> >  Gluster-users mailing list
> >  Gluster-users@gluster.org
> >  http://www.gluster.org/mailman/listinfo/gluster-users
> >
> >
> >
> >  ___
> >  Gluster-users mailing list
> >  Gluster-users@gluster.org
> >  http://www.gluster.org/mailman/listinfo/gluster-users
> >
> >  --
> >  Lindsay Mathieson
>
> > ___
> > Gluster-users mailing list
> > Gluster-users@gluster.org
> > http://www.gluster.org/mailman/listinfo/gluster-users
>
>
> --
> Kevin Lemonnier
> PGP Fingerprint : 89A5 2283 04A0 E6E9 0111
>
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-users
>

Re: [Gluster-users] [URGENT] Add-bricks to a volume corrupted the files

2016-09-05 Thread Kevin Lemonnier
>- What was the original (and current) geometry? (status and info)

It was a 1x3 that I was trying to bump to 2x3.

>- what parameters did you use when adding the bricks?
>

Just a simple add-brick node1:/path node2:/path node3:/path
Then a fix-layout when everything started going wrong.
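For readers unfamiliar with the commands involved, the sequence described above
corresponds roughly to the following (volume name taken from this thread;
hostnames and brick paths are placeholders):

```shell
# Grow a 1x3 replica volume to 2x3: the three new bricks become the
# second replica set (hostnames and paths below are placeholders).
gluster volume add-brick VMs \
    node1:/mnt/storage/VMs2 node2:/mnt/storage/VMs2 node3:/mnt/storage/VMs2

# Recompute the directory layout so new files can land on the new bricks ...
gluster volume rebalance VMs fix-layout start

# ... or run a full rebalance, which also migrates existing data
gluster volume rebalance VMs start
gluster volume rebalance VMs status
```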


I was able to salvage some VMs by stopping them then starting them again,
but most won't start for various reasons (disk corrupted, grub not found ...).
For those we are deleting the disks then importing them from backups, that's
a huge loss but everything has been down for so long, no choice ..

>On 6/09/2016 8:00 AM, Kevin Lemonnier wrote:
> 
>  I tried a fix-layout, and since that didn't work I removed the brick (start 
> then commit when it showed
>  completed). Not better, the volume is now running on the 3 original bricks 
> (replica 3) but the VMs
>  are still corrupted. I have 880 Mb of shards on the bricks I removed for 
> some reason, those shards do exist
>  (and are bigger) on the "live" volume. I don't understand why now that I 
> have removed the new bricks
>  everything isn't working like before ..
> 
>  On Mon, Sep 05, 2016 at 11:06:16PM +0200, Kevin Lemonnier wrote:
> 
>  Hi,
> 
>  I just added 3 bricks to a volume and all the VMs are doing I/O errors now.
>  I rebooted a VM to see and it can't start again, am I missing something ? Is 
> the rebalance required
>  to make everything run ?
> 
>  That's urgent, thanks.
> 
>  --
>  Kevin Lemonnier
>  PGP Fingerprint : 89A5 2283 04A0 E6E9 0111
> 
> 
> 
> 
>  ___
>  Gluster-users mailing list
>  Gluster-users@gluster.org
>  http://www.gluster.org/mailman/listinfo/gluster-users
> 
> 
> 
>  ___
>  Gluster-users mailing list
>  Gluster-users@gluster.org
>  http://www.gluster.org/mailman/listinfo/gluster-users
> 
>  --
>  Lindsay Mathieson

> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-users


-- 
Kevin Lemonnier
PGP Fingerprint : 89A5 2283 04A0 E6E9 0111



Re: [Gluster-users] [URGENT] Add-bricks to a volume corrupted the files

2016-09-05 Thread Lindsay Mathieson

Sorry, no answers :( but probably useful to post some more info

- What was the original (and current) geometry? (status and info)
- what parameters did you use when adding the bricks?

On 6/09/2016 8:00 AM, Kevin Lemonnier wrote:

I tried a fix-layout, and since that didn't work I removed the brick (start 
then commit when it showed
completed). Not better, the volume is now running on the 3 original bricks 
(replica 3) but the VMs
are still corrupted. I have 880 Mb of shards on the bricks I removed for some 
reason, those shards do exist
(and are bigger) on the "live" volume. I don't understand why now that I have 
removed the new bricks
everything isn't working like before ..

On Mon, Sep 05, 2016 at 11:06:16PM +0200, Kevin Lemonnier wrote:

Hi,

I just added 3 bricks to a volume and all the VMs are doing I/O errors now.
I rebooted a VM to see and it can't start again, am I missing something ? Is 
the rebalance required
to make everything run ?

That's urgent, thanks.

--
Kevin Lemonnier
PGP Fingerprint : 89A5 2283 04A0 E6E9 0111




___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users




___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users



--
Lindsay Mathieson


Re: [Gluster-users] [URGENT] Add-bricks to a volume corrupted the files

2016-09-05 Thread Kevin Lemonnier
I tried a fix-layout, and since that didn't work I removed the brick (start 
then commit when it showed
completed). Not better, the volume is now running on the 3 original bricks 
(replica 3) but the VMs
are still corrupted. I have 880 Mb of shards on the bricks I removed for some 
reason, those shards do exist
(and are bigger) on the "live" volume. I don't understand why now that I have 
removed the new bricks
everything isn't working like before ..

On Mon, Sep 05, 2016 at 11:06:16PM +0200, Kevin Lemonnier wrote:
> Hi,
> 
> I just added 3 bricks to a volume and all the VMs are doing I/O errors now.
> I rebooted a VM to see and it can't start again, am I missing something ? Is 
> the rebalance required
> to make everything run ?
> 
> That's urgent, thanks.
> 
> -- 
> Kevin Lemonnier
> PGP Fingerprint : 89A5 2283 04A0 E6E9 0111



> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-users


-- 
Kevin Lemonnier
PGP Fingerprint : 89A5 2283 04A0 E6E9 0111

