Awesome. Thanks for the logs. Will take a look.
-Krutika
On Sun, Oct 23, 2016 at 5:47 AM, Lindsay Mathieson <
lindsay.mathie...@gmail.com> wrote:
> On 20/10/2016 9:13 PM, Krutika Dhananjay wrote:
>
>> It would be awesome if you could tell us whether you
>> see the issue with FUSE as well, while
On 20/10/2016 9:13 PM, Krutika Dhananjay wrote:
It would be awesome if you could tell us whether you
see the issue with FUSE as well, while we get around
to setting up the environment and running the test ourselves.
I just managed to replicate the exact same error using the fuse mount
--
Lindsay Mathieson
On 20/10/2016 8:43 AM, Joe Julian wrote:
Is there a bug open with the client and server logs attached? I would
take a stab at reproducing as well, but I want to make sure I'm
comparing apples to apples.
https://bugzilla.redhat.com/show_bug.cgi?id=1387878
--
Lindsay Mathieson
On 21/10/2016 12:39 PM, Lindsay Mathieson wrote:
And now I have it all setup for logging etc I can't reproduce the error:(
Ah, figured out what I was doing different - started the heavy I/O
before the add bricks and rebalance, which reliably triggers a "volume
rebalance: teststore1: failed: Another transaction is in progress for
teststore1. Please try again after sometime" error.
And now I have it all setup for logging etc I can't reproduce the error :(
Though I did manage to score a "volume rebalance: teststore1: failed:
Another transaction is in progress for teststore1. Please try again
after sometime" problem. No gluster commands would work after that. I
had to restart
Thanks a lot, Lindsay! Appreciate the help.
It would be awesome if you could tell us whether you
see the issue with FUSE as well, while we get around
to setting up the environment and running the test ourselves.
-Krutika
On Thu, Oct 20, 2016 at 2:57 AM, Lindsay Mathieson <
lindsay.mathie...@gmai
Hi,
This is scary stuff. While not as scary, you might confirm a bug that I
reported a while back on your test systems:
https://bugzilla.redhat.com/show_bug.cgi?id=1370832
Cheers,
Hans Henrik
On 19-10-2016 08:40, Krutika Dhananjay wrote:
Agreed.
I will run the same test on an actual vm se
On 20/10/2016 10:03 AM, Joe Julian wrote:
Heh, well then a little less scary might be:
find /var/log/glusterfs -type f
Then if the list looks correct you can easily
find /var/log/glusterfs -type f | xargs truncate --size=0
Thanks, I'll script that up.
Will put together a test plan and
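For reference, "scripting that up" might look roughly like the sketch below; the
node names are placeholders and it assumes the stock /var/log/glusterfs location
on every host.

#!/bin/sh
# Rough sketch only: empty the gluster logs on each node before a test run,
# keeping the files (and the daemons' open handles) in place.
for h in node1 node2 node3; do
    ssh root@"$h" 'find /var/log/glusterfs -type f | xargs -r truncate --size=0'
done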
On 10/19/2016 04:59 PM, Lindsay Mathieson wrote:
On 20/10/2016 9:48 AM, Joe Julian wrote:
Personally, with logrotate I use copytruncate. If I want to truncate
the whole shebang manually:
find -type f -exec truncate --size=0 {} \;
Eeep! That looks scary, I can just see myself running that from
On 20/10/2016 9:48 AM, Joe Julian wrote:
Personally, with logrotate I use copytruncate. If I want to truncate
the whole shebang manually:
find -type f -exec truncate --size=0 {} \;
Eeep! That looks scary, I can just see myself running that from root at 3am.
--
Lindsay Mathieson
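For reference, a copytruncate rule for the gluster logs might look roughly like
the sketch below. The paths and rotation policy are assumptions, and distro
packages often ship their own /etc/logrotate.d/glusterfs files, so check for
overlap before dropping this in.

cat > /etc/logrotate.d/glusterfs-copytruncate <<'EOF'
/var/log/glusterfs/*.log /var/log/glusterfs/bricks/*.log {
    weekly
    rotate 4
    compress
    missingok
    copytruncate
}
EOF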
On 10/19/2016 04:46 PM, Lindsay Mathieson wrote:
On 20/10/2016 9:30 AM, Joe Julian wrote:
Joe, when you say truncated, just delete the logs before the test is
started?
That's one way of doing it, yes. :)
nb: I've noticed that if you delete the logs while gluster is running
they won't be r
On 20/10/2016 9:30 AM, Joe Julian wrote:
Joe, when you say truncated, just delete the logs before the test is
started?
That's one way of doing it, yes. :)
nb: I've noticed that if you delete the logs while gluster is running
they won't be recreated, you have to restart the services. Is the
On 20/10/2016 9:31 AM, Joe Julian wrote:
That makes me think, too, that it might be nice to see what kind of VM
proxmox creates. I assume it uses libvirt so a dumpxml of an affected
VM might be telling as well
Unfortunately not, they use their own config file and build a cmd line
from that, b
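If it helps, Proxmox can at least print the qemu command line it builds for a
guest, which shows the drive definition it passes (the VM id below is made up):

qm showcmd 100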
On 10/19/2016 04:21 PM, Kevin Lemonnier wrote:
If you have clean logs (truncated before you reproduced) everything from
/var/log/glusterfs from the client and all the servers would be great.
I'm assuming that proxmox doesn't redirect the client logs to some other
place.
No client logs with prox
On 10/19/2016 04:28 PM, Lindsay Mathieson wrote:
On 20/10/2016 9:21 AM, Kevin Lemonnier wrote:
No client logs with proxmox unfortunately, you need to start qemu by hand
to get those as far as I know.
Ah, those logs! Forgot about that.
I'll replicate it again soon. Might take a couple of days a
On 20/10/2016 9:21 AM, Kevin Lemonnier wrote:
No client logs with proxmox unfortunately, you need to start qemu by hand
to get those as far as I know.
Ah, those logs! Forgot about that.
I'll replicate it again soon. Might take a couple of days and create a
bug report.
If you have clean log
>
> If you have clean logs (truncated before you reproduced) everything from
> /var/log/glusterfs from the client and all the servers would be great.
> I'm assuming that proxmox doesn't redirect the client logs to some other
> place.
>
No client logs with proxmox unfortunately, you need to sta
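A very rough sketch of what starting qemu by hand could look like, just to
capture the libgfapi messages (which, as far as I understand, go to qemu's
stderr unless redirected). The memory size, image path and options below are
made up and are nothing like the real Proxmox-generated command line:

qemu-system-x86_64 -enable-kvm -m 2048 \
    -drive file=gluster://ips4adm.name/VMs/images/100/vm-100-disk-1.qcow2,if=virtio,cache=none \
    2> /tmp/qemu-gfapi.log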
On 10/19/2016 04:04 PM, Lindsay Mathieson wrote:
On 20/10/2016 8:43 AM, Joe Julian wrote:
Is there a bug open with the client and server logs attached? I would
take a stab at reproducing as well, but I want to make sure I'm
comparing apples to apples.
Which log names are those (sorry). I have
On 20/10/2016 8:43 AM, Joe Julian wrote:
Is there a bug open with the client and server logs attached? I would
take a stab at reproducing as well, but I want to make sure I'm
comparing apples to apples.
Which log names are those (sorry). I have all the logs from this
morning's trial available.
Is there a bug open with the client and server logs attached? I would
take a stab at reproducing as well, but I want to make sure I'm
comparing apples to apples.
On 10/19/2016 02:39 PM, Kevin Lemonnier wrote:
Looked like it was going ok for a while, then blew up. The first windows
vm which wa
>
> Looked like it was going ok for a while, then blew up. The first Windows
> VM, which was running DiskMark, died and won't boot. qemu-img check shows
> the image hopelessly corrupted. The 2nd VM has also crashed and is
> unbootable, though qemu-img shows the qcow2 file as ok.
>
Ha, glad you could
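For anyone following along, the check mentioned above is simply the following
(the image path is a placeholder):

qemu-img check /path/to/vm-disk.qcow2
qemu-img info /path/to/vm-disk.qcow2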
On 20/10/2016 7:01 AM, Kevin Lemonnier wrote:
Yes, you need to add a full replica set at once.
I don't remember, but according to my history it looks like I used this:
gluster volume add-brick VMs host1:/brick host2:/brick host3:/brick force
(I have the same without force just before that, so
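For what it's worth, a quick sanity check right after the add-brick might be
something like this (volume name as in Kevin's setup):

gluster volume info VMs      # Number of Bricks should now read 2 x 3 = 6
gluster volume status VMs    # all new brick processes should be online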
>
> Ok, I'll give that a go
> - what command do you use?
> - I think from memory the extra bricks have to be added in blocks of three?
>
Yes, you need to add a full replica set at once.
I don't remember, but according to my history it looks like I used this:
gluster volume add-brick VMs host1
On 20/10/2016 6:39 AM, Kevin Lemonnier wrote:
Did you add a third brick in replication, if I understand it correctly?
Yup, took it from rep 2 to rep 3
That's not the problem I had, that does seem to be working fine.
What broke was adding 3 new bricks to a replica 3, bumping it from 1 x 3
to 2 x 3.
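To make the difference concrete, the two operations look roughly like this
(volume names are borrowed from earlier in the thread, hostnames and brick
paths are placeholders):

# replica 2 -> replica 3 on the same distribute count (what Lindsay tested):
gluster volume add-brick teststore1 replica 3 host3:/bricks/teststore1
# 1 x 3 -> 2 x 3, adding a whole new replica set of three bricks (what Kevin did):
gluster volume add-brick VMs host4:/brick host5:/brick host6:/brick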
>
> I had a go at reproducing it last night Kevin with 3.8.4
>
> - new volume
>
> - Initial 2 bricks on two hosts
>
> - copied a windows VM on to it
>
> - Started some load (Crystal DiskMark in the VM)
>
> - Added a 3rd brick and node
>
For the size it's already more than what I have, most
On 19/10/2016 11:08 PM, Kevin Lemonnier wrote:
Yes, to be honest I wasn't even planning on rebalancing just yet, I was planning
on letting it run a few days before, see if everything's fine, and then maybe
rebalance. I tried the rebalance when everything came crashing down, hoping that
might fix
>
> As Kevin said, the problem appeared before rebalancing if I understood
> correctly.
>
Yes, to be honest I wasn't even planning on rebalancing just yet, I was planning
on letting it run a few days before, see if everything's fine, and then maybe
rebalance. I tried the rebalance when everythi
On 19/10/16 14:59, Gandalf Corvotempesta wrote:
On 19 Oct 2016 14:32, "Xavier Hernandez" <xhernan...@datalab.es> wrote:
I had a similar issue while moving machines from an old gluster volume
to a new volume with sharding enabled and I added new bricks to it.
Maybe related to rebal
On 19 Oct 2016 14:32, "Xavier Hernandez" wrote:
> I had a similar issue while moving machines from an old gluster volume to
a new volume with sharding enabled and I added new bricks to it.
Maybe related to rebalance after adding bricks on a sharded volume?
maybe some shards are moved a
> You are using Proxmox, right?
>
Yes, indeed.
> I think it's important because Proxmox uses gfapi to connect each VM to
> the disk, not FUSE. Maybe this is important to find the cause.
I believe so, yes, and I should add (I believe I mentioned it) that I am using
GlusterFS 3.7.12. It took a
Hi Kevin,
You are using Proxmox, right?
I think it's important because Proxmox uses gfapi to connect each VM to
the disk, not FUSE. Maybe this is important to find the cause.
I had a similar issue while moving machines from an old gluster volume
to a new volume with sharding enabled and I a
Agreed.
I will run the same test on an actual vm setup one of these days and
see if I manage to recreate the issue (after I have completed some
of my long pending tasks). Meanwhile if any of you find a consistent simpler
test case to hit the issue, feel free to reply on this thread. At least I
had
On 14 Oct 2016 17:37, "David Gossage" wrote:
>
> Sorry to resurrect an old email but did any resolution occur for this or
a cause found? I just see this as a potential task I may need to also run
through some day and if there are pitfalls to watch for would be good to
know.
>
I think that t
>
>I see that network.ping-timeout on your setup is 15 seconds and that's
>too low. Could you reconfigure that to 30 seconds?
>
Yes, I can. I set it to 15 to be sure no browser would time out when trying to
load a website on a frozen VM during the timeout; 15 seemed pretty good since i
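For reference, bumping it back up would just be the following (volume name taken
from Kevin's volume info output):

gluster volume set VMs network.ping-timeout 30
gluster volume info VMs    # the new value shows up under Options Reconfigured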
On Fri, Oct 14, 2016 at 10:37:03AM -0500, David Gossage wrote:
>Sorry to resurrect an old email but did any resolution occur for this or a
>cause found? I just see this as a potential task I may need to also run
>through some day and if there are pitfalls to watch for would be good to
Hi,
No. I did run add-brick on a volume with the same configuration as that of
Kevin, while IO was running, except
that I wasn't running a VM workload. I compared the file checksums with those of
the original source files from which they were copied,
and they matched.
@Kevin,
I see that network.ping-timeout on
Sorry to resurrect an old email but did any resolution occur for this or a
cause found? I just see this as a potential task I may need to also run
through some day and if there are pitfalls to watch for would be good to
know.
David Gossage
Carousel Checks Inc. | System Administrator
Office
Hi,
Here is the info :
Volume Name: VMs
Type: Replicate
Volume ID: c5272382-d0c8-4aa4-aced-dd25a064e45c
Status: Started
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: ips4adm.name:/mnt/storage/VMs
Brick2: ips5adm.name:/mnt/storage/VMs
Brick3: ips6adm.name:/mnt/storage/VMs
Options Reconfigured:
Could you please attach the glusterfs client and brick logs?
Also provide output of `gluster volume info`.
-Krutika
On Tue, Sep 6, 2016 at 4:29 AM, Kevin Lemonnier
wrote:
> >- What was the original (and current) geometry? (status and info)
>
> It was a 1x3 that I was trying to bump to 2x3.
>- What was the original (and current) geometry? (status and info)
It was a 1x3 that I was trying to bump to 2x3.
>- what parameters did you use when adding the bricks?
>
Just a simple add-brick node1:/path node2:/path node3:/path
Then a fix-layout when everything started going wrong.
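For reference, the fix-layout mentioned here is the rebalance variant below
(volume name from Kevin's setup); it only spreads the new directory layout over
the added bricks without migrating existing data:

gluster volume rebalance VMs fix-layout start
gluster volume rebalance VMs status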
Sorry, no answers :( but it's probably useful to post some more info:
- What was the original (and current) geometry? (status and info)
- what parameters did you use when adding the bricks?
On 6/09/2016 8:00 AM, Kevin Lemonnier wrote:
I tried a fix-layout, and since that didn't work I removed the bri
I tried a fix-layout, and since that didn't work I removed the bricks (start,
then commit when it showed completed). Not better: the volume is now running
on the 3 original bricks (replica 3) but the VMs are still corrupted. I have
880 MB of shards on the bricks I removed for some reason, those sha
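For completeness, the remove-brick sequence described here would have looked
roughly like this (hostnames and brick paths are placeholders):

gluster volume remove-brick VMs host4:/brick host5:/brick host6:/brick start
gluster volume remove-brick VMs host4:/brick host5:/brick host6:/brick status
gluster volume remove-brick VMs host4:/brick host5:/brick host6:/brick commit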
Hi,
I just added 3 bricks to a volume and all the VMs are doing I/O errors now.
I rebooted a VM to see and it can't start again, am I missing something? Is
the rebalance required to make everything run?
That's urgent, thanks.
--
Kevin Lemonnier
PGP Fingerprint : 89A5 2283 04A0 E6E9 0111