On 08/21/2015 11:30 AM, Ravishankar N wrote:


On 08/21/2015 01:21 PM, Sander Hoentjen wrote:


On 08/21/2015 09:28 AM, Ravishankar N wrote:


On 08/20/2015 02:14 PM, Sander Hoentjen wrote:


On 08/19/2015 09:04 AM, Ravishankar N wrote:


On 08/18/2015 04:22 PM, Ramesh Nachimuthu wrote:
+ Ravi from gluster.

Regards,
Ramesh

----- Original Message -----
From: "Sander Hoentjen" <san...@hoentjen.eu>
To: users@ovirt.org
Sent: Tuesday, August 18, 2015 3:30:35 PM
Subject: [ovirt-users] Ovirt/Gluster

Hi,

We are looking for some easy to manage, self-contained VM hosting. Ovirt with GlusterFS seems to fit that bill perfectly. I installed it and then started kicking the tires. First results looked promising, but now I
can get a VM to pause indefinitely fairly easily:

My setup is 3 hosts that are in a Virt and Gluster cluster. Gluster is set up as replica-3. The gluster export is used as the storage domain for
the VMs.
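For reference, a replica-3 volume like this is typically created along these lines (just a sketch using the brick paths that appear later in this thread; the exact commands and options used may have differed):

  gluster volume create VMS replica 3 \
    10.99.50.20:/brick/VMS 10.99.50.21:/brick/VMS 10.99.50.22:/brick/VMS
  gluster volume set VMS group virt   # applies the recommended virt-store options, if the group file is present
  gluster volume start VMS

The volume is then added in the engine as a GlusterFS storage domain pointing at one of the servers, e.g. 10.99.50.20:/VMS.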

Hi,

What version of gluster and ovirt are you using?
glusterfs-3.7.3-1.el7.x86_64
vdsm-4.16.20-0.el7.centos.x86_64
ovirt-engine-3.5.3.1-1.el7.centos.noarch


Now when I start the VM all is good; performance is good enough, so we
are happy. I then start bonnie++ to generate some load. I have a VM
running on host 1, host 2 is SPM, and all 3 hosts are seeing some network
traffic courtesy of gluster.

Now, for fun, suddenly the network on host3 goes bad (iptables -I OUTPUT
-m statistic --mode random --probability 0.75 -j REJECT).
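(For anyone reproducing this: the rule can be removed again afterwards with the matching delete, i.e.

  iptables -D OUTPUT -m statistic --mode random --probability 0.75 -j REJECT

assuming no other conflicting OUTPUT rules were added in between.)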
Some time later I see the guest has a small "hiccup"; I'm guessing that is when gluster decides host 3 is not allowed to play anymore. No big
deal anyway.
After a while, getting only 25% of packets through just isn't good enough for Ovirt anymore,
so the host will be fenced.

I'm not sure what fencing means w.r.t. ovirt and what it actually fences. As far as gluster is concerned, since only one node is blocked, the VM image should still be accessible by the VM running on host1.
Fencing means (at least in this case) that the IPMI of the server does a power reset.
After a reboot, *sometimes* the VM will be
paused, and even after the gluster self-heal is complete it cannot be
unpaused; it has to be restarted.

Could you provide the gluster mount (fuse?) logs and the brick logs of all 3 nodes when the VM is paused? That should give us some clue.
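(On a typical oVirt host these live under /var/log/glusterfs/: the fuse mount log is named after the mount point, something like

  /var/log/glusterfs/rhev-data-center-mnt-glusterSD-<server>:_<volume>.log

and the brick logs are under /var/log/glusterfs/bricks/ on each of the 3 nodes. Exact names can vary a bit between versions; <server> and <volume> are placeholders.)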

Logs are attached. The problem was at around 8:15 - 8:20 UTC.
This time, however, the VM stopped even without a reboot of hyp03.


The mount logs (rhev-data-center-mnt-glusterSD*) are indicating frequent disconnects to the bricks, with 'clnt_ping_timer_expired', 'Client-quorum is not met' and 'Read-only file system' messages. Client-quorum is enabled by default for replica 3 volumes, so if the mount cannot connect to at least 2 bricks, quorum is lost and the gluster volume becomes read-only. That seems to be the reason why the VMs are pausing. I'm not sure if the frequent disconnects are due to a flaky network or to the bricks not responding to the mount's ping timer because their epoll threads are busy with I/O (unlikely). Can you also share the output of `gluster volume info <volname>`?
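(The ping timer mentioned above is network.ping-timeout, which defaults to 42 seconds. On builds that support `gluster volume get` it can be checked with

  gluster volume get VMS network.ping-timeout

This is just for reference, not a suggestion to change it.)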
The frequent disconnects are probably because I intentionally broke the network on hyp03 (dropped 75% of outgoing packets). In my opinion this should not affect the VM on hyp02. Am I wrong to think that?


For client-quorum: if a client (mount) cannot connect to enough bricks to achieve quorum, the client becomes read-only. So if the client on hyp02 can see the bricks on hyp02 and hyp01, it shouldn't be affected.
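(One way to verify this is to look at which clients each brick currently sees, for example

  gluster volume status VMS clients

or to grep the mount log on hyp02 for 'disconnected from' / 'Connected to' messages around the time of the pause. If the hyp02 mount only lost its connection to the hyp03 brick, client-quorum should still have been met there.)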
But it was, and I only "broke" hyp03.


[root@hyp01 ~]# gluster volume info VMS

Volume Name: VMS
Type: Replicate
Volume ID: 9e6657e7-8520-4720-ba9d-78b14a86c8ca
Status: Started
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: 10.99.50.20:/brick/VMS
Brick2: 10.99.50.21:/brick/VMS
Brick3: 10.99.50.22:/brick/VMS
Options Reconfigured:
performance.readdir-ahead: on
nfs.disable: on
user.cifs: disable
auth.allow: *
performance.quick-read: off
performance.read-ahead: off
performance.io-cache: off
performance.stat-prefetch: off
cluster.eager-lock: enable
network.remote-dio: enable
cluster.quorum-type: auto
cluster.server-quorum-type: server

I see that you have enabled server-quorum too. Since you blocked hyp03, if the glusterd on that node cannot see the other 2 nodes due to the iptables rules, it will kill all brick processes on that node. See the "7 How To Test" section in http://www.gluster.org/community/documentation/index.php/Features/Server-quorum to get a better idea of server-quorum.
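(Server-quorum is evaluated by glusterd against its peers, and the required ratio is a cluster-wide setting, e.g.

  gluster volume set all cluster.server-quorum-ratio 51%

The glusterd log on hyp03, usually /var/log/glusterfs/etc-glusterfs-glusterd.vol.log, should show the bricks there being stopped when quorum is lost.)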

Yes but it should only kill the bricks on hyp03, right? So then why does the VM on hyp02 die? I don't like the fact that a problem on any one of the hosts can bring down any VM on any host.

--
Sander
_______________________________________________
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users
