[ovirt-users] Re: Ovirt cluster unstable; gluster to blame (again)

Edward Clay Mon, 09 Jul 2018 08:12:02 -0700

Just to add my .02 here.  I've opened a bug on this issue where HV/hosts 
connected to glusterfs volumes are running out of ram.  This seemed to be a bug 
fixed in gluster 3.13 but that patch doesn't seem to be available any longer and 
3.12 is what ovirt is using.  For example I have a host that was showing 72% 
memory consumption with 3 VMs running on it.  If I migrate those VMs to another 
host, memory consumption drops to 52%.  If I put this host into maintenance and 
then activate it, it drops down to 2% or so.  Since I ran into this issue I've 
been manually watching memory consumption on each host and migrating VMs from 
it to others to keep things from dying.  I'm hoping with the announcement of 
gluster 3.12 end of life and the move to gluster 4.1 that this will get fixed 
or that the patch from 3.13 can get backported so this problem will go away.
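
The manual watching described above could be scripted. Here is a minimal sketch, assuming memory use is measured as MemTotal minus MemAvailable from /proc/meminfo (the percentage oVirt shows may be derived differently) and using a hypothetical 70% alert threshold. The function names are illustrative only and are not part of oVirt or Gluster.

```python
def mem_used_percent(meminfo_text):
    """Return used memory as a percentage, from /proc/meminfo-style text.

    Assumption: "used" means MemTotal - MemAvailable.
    """
    fields = {}
    for line in meminfo_text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts:
            fields[name.strip()] = int(parts[0])  # first token is the number
    total = fields["MemTotal"]
    available = fields["MemAvailable"]
    return 100.0 * (total - available) / total


def needs_attention(meminfo_text, threshold=70.0):
    """True when usage reaches the (hypothetical) alert threshold."""
    return mem_used_percent(meminfo_text) >= threshold


# Made-up sample values chosen so that usage works out to 72%.
SAMPLE = "MemTotal:        1000 kB\nMemFree:          100 kB\nMemAvailable:     280 kB\n"
```

On a host this would be fed the live file, for example `mem_used_percent(open("/proc/meminfo").read())`, from cron or a monitoring agent.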


https://bugzilla.redhat.com/show_bug.cgi?id=1593826

On 07/07/2018 11:49 AM, Jim Kusznir wrote:

This host has NO VMs running on it, only 3 running cluster-wide (including the 
engine, which is on its own storage):

top - 10:44:41 up 1 day, 17:10,  1 user,  load average: 15.86, 14.33, 13.39
Tasks: 381 total,   1 running, 379 sleeping,   1 stopped,   0 zombie
%Cpu(s):  2.7 us,  2.1 sy,  0.0 ni, 89.0 id,  6.1 wa,  0.0 hi,  0.2 si,  0.0 st
KiB Mem : 32764284 total,   338232 free,   842324 used, 31583728 buff/cache
KiB Swap: 12582908 total, 12258660 free,   324248 used. 31076748 avail Mem

 PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
13279 root      20   0 2380708  37628   4396 S  51.7  0.1   3768:03 glusterfsd
13273 root      20   0 2233212  20460   4380 S  17.2  0.1 105:50.44 glusterfsd
13287 root      20   0 2233212  20608   4340 S   4.3  0.1  34:27.20 glusterfsd
16205 vdsm       0 -20 5048672  88940  13364 S   1.3  0.3   0:32.69 vdsmd
16300 vdsm      20   0  608488  25096   5404 S   1.3  0.1   0:05.78 python
 1109 vdsm      20   0 3127696  44228   8552 S   0.7  0.1  18:49.76 ovirt-ha-broker
25555 root      20   0       0      0      0 S   0.7  0.0   0:00.13 kworker/u64:3
  10 root      20   0       0      0      0 S   0.3  0.0   4:22.36 rcu_sched
 572 root       0 -20       0      0      0 S   0.3  0.0   0:12.02 kworker/1:1H
 797 root      20   0       0      0      0 S   0.3  0.0   1:59.59 kdmwork-253:2
 877 root       0 -20       0      0      0 S   0.3  0.0   0:11.34 kworker/3:1H
1028 root      20   0       0      0      0 S   0.3  0.0   0:35.35 xfsaild/dm-10
1869 root      20   0 1496472  10540   6564 S   0.3  0.0   2:15.46 python
3747 root      20   0       0      0      0 D   0.3  0.0   0:01.21 kworker/u64:1
10979 root      15  -5  723504  15644   3920 S   0.3  0.0  22:46.27 glusterfs
15085 root      20   0  680884  10792   4328 S   0.3  0.0   0:01.13 glusterd
16102 root      15  -5 1204216  44948  11160 S   0.3  0.1   0:18.61 supervdsmd

At the moment, the engine is barely usable, my other VMs appear to be 
unresponsive.  Two on one host, one on another, and none on the third.



On Sat, Jul 7, 2018 at 10:38 AM, Jim Kusznir <[email protected]> wrote:
I run 4-7 VMs, and most of them are 2GB ram.  I have 2 VMs with 4GB.

Ram hasn't been an issue until recent ovirt/gluster upgrades.  Storage has 
always been slow, especially with these drives.  However, even watching network 
utilization on my switch, the gig-e links never max out.

The loadavg issues and unresponsive behavior started with yesterday's ovirt 
updates.  I now have one VM with low I/O that lives on a separate storage volume 
(data, fully SSD backed instead of data-hdd, which was having the issues).  I 
moved it to an ovirt host with no other VMs on it, and that had freshly been 
rebooted.  Before it had this one VM on it, loadavg was <0.5.  Now it's up in 
the 20's, with only one low disk I/O, 4GB ram VM on the host.

This to me says there's now a new problem separate from Gluster.  I don't have 
any non-gluster storage available to test with.  I did notice that the last 
update included a new kernel, and it appears it's the qemu-kvm processes that 
are now consuming way more CPU than they used to.

Are there any known issues?  I'm going to reboot into my previous kernel to see 
if it's kernel-caused.

--Jim



On Fri, Jul 6, 2018 at 11:07 PM, Johan Bernhardsson <[email protected]> wrote:
That is a single SATA drive that is slow on random I/O and that has to be 
synced with 2 other servers. Gluster works synchronously, so one write has to be 
written and acknowledged on all three nodes.
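
To make the cost of that concrete, here is a rough model rather than Gluster's actual code path. It assumes the write is sent to all replicas in parallel and completes only when the last acknowledgement arrives, so the slowest brick sets the latency. The latency numbers are hypothetical.

```python
def replicated_write_latency_ms(brick_latencies_ms):
    """Latency of one synchronous replicated write.

    Assumption: the write goes to every replica in parallel and is
    complete only once all have acknowledged, so the slowest one wins.
    """
    if not brick_latencies_ms:
        raise ValueError("need at least one replica")
    return max(brick_latencies_ms)


# Hypothetical per-brick latencies in milliseconds: two quick bricks
# and one slow drive busy with random I/O.
EXAMPLE_LATENCIES_MS = [2.0, 3.0, 15.0]
EXAMPLE_RESULT_MS = replicated_write_latency_ms(EXAMPLE_LATENCIES_MS)
```

In this model one slow drive out of three is enough to slow every write on the volume.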

So you have a bottleneck in I/O on the drives and one on the network, and depending on 
how many virtual servers you have and how much ram they take you might have 
one on memory too.

Load spikes when you have a wait somewhere and are overusing capacity. But it's 
not only CPU that load is counted on. It is waiting for resources, so it can be 
memory or network or drives.
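
On Linux the load average counts tasks in uninterruptible sleep (state D, usually waiting on I/O) as well as runnable ones (state R). A small sketch for tallying both from text in the shape produced by `ps -eo state,comm --no-headers`; the function name and the sample text are made up for illustration.

```python
def load_contributors(ps_text):
    """Count tasks in the two states that feed the Linux load average.

    Expects one task per line, with the state letter as the first column.
    """
    counts = {"R": 0, "D": 0}
    for line in ps_text.splitlines():
        parts = line.split(None, 1)
        if not parts:
            continue  # skip blank lines
        state = parts[0][0]  # first character is the main state code
        if state in counts:
            counts[state] += 1
    return counts


# Made-up sample: two tasks stuck in D state, one running, one sleeping.
SAMPLE_PS = "S glusterfsd\nD kworker/u64:1\nR top\nD glusterfsd\n"
```

Many D-state tasks next to a mostly idle CPU suggests the load is coming from waiting on storage or network rather than from computation.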

How many virtual servers do you run and how much ram do they consume?


On July 7, 2018 09:51:42 Jim Kusznir <[email protected]> wrote:

In case it matters, the data-hdd gluster volume uses these hard drives:

https://www.amazon.com/gp/product/B01M1NHCZT/ref=oh_aui_detailpage_o05_s00?ie=UTF8&psc=1

This is in a Dell R610 with PERC6/i (one drive per server, configured as a 
single drive volume to pass it through as its own /dev/sd* device).  Inside the 
OS, it's partitioned with lvm_thin, then an lvm volume formatted with XFS and 
mounted as /gluster/brick3, with the data-hdd volume created inside that.

--Jim

On Fri, Jul 6, 2018 at 10:45 PM, Jim Kusznir <[email protected]> wrote:
So, I'm still at a loss...It sounds like it's either insufficient ram/swap, or 
insufficient network.  It seems to be neither now.  At this point, it appears that 
gluster is just "broke" and killing my systems for no discernible reason.  
Here are details, all from the same system (currently running 3 VMs):

[root@ovirt3 ~]# w
22:26:53 up 36 days,  4:34,  1 user,  load average: 42.78, 55.98, 53.31
USER     TTY      FROM             LOGIN@   IDLE   JCPU   PCPU WHAT
root     pts/0    192.168.8.90     22:26    2.00s  0.12s  0.11s w

bwm-ng reports the highest data usage was about 6MB/s during this test (and 
that was combined; I have two different gig networks.  One gluster network 
(primary VM storage) runs on one, the other network handles everything else).
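
For scale, a gigabit link carries at most about 125 MB/s before protocol overhead, so 6MB/s is a small fraction of it. A quick sketch of the arithmetic, assuming decimal megabytes and ignoring overhead; the function name is illustrative.

```python
def link_utilization_percent(observed_mb_per_s, link_gbit_per_s=1.0):
    """Share of raw link capacity in use, as a percentage.

    Assumptions: decimal units (1 Gbit/s = 1000 Mbit/s = 125 MB/s)
    and no allowance for protocol overhead.
    """
    capacity_mb_per_s = link_gbit_per_s * 1000.0 / 8.0
    return 100.0 * observed_mb_per_s / capacity_mb_per_s


# The 6MB/s peak from bwm-ng against a single 1 Gbit/s link.
PEAK_UTILIZATION = link_utilization_percent(6.0)
```

That works out to roughly 5% of one link, which is consistent with the network not being the limit during this test.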

[root@ovirt3 ~]# free -m
             total        used        free      shared  buff/cache   available
Mem:          31996       13236         232          18       18526       18195
Swap:         16383        1475       14908

top - 22:32:56 up 36 days,  4:41,  1 user,  load average: 17.99, 39.69, 47.66
Tasks: 407 total,   1 running, 405 sleeping,   1 stopped,   0 zombie
%Cpu(s):  8.6 us,  2.1 sy,  0.0 ni, 87.6 id,  1.6 wa,  0.0 hi,  0.1 si,  0.0 st
KiB Mem : 32764284 total,   228296 free, 13541952 used, 18994036 buff/cache
KiB Swap: 16777212 total, 15246200 free,  1531012 used. 18643960 avail Mem

 PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
30036 qemu      20   0 6872324   5.2g  13532 S 144.6 16.5 216:14.55 /usr/libexec/qemu-kvm -name guest=BillingWin,debug-threads=on -S -object secret,id=masterKey0,format=raw,file=/v+
28501 qemu      20   0 5034968   3.6g  12880 S  16.2 11.7  73:44.99 /usr/libexec/qemu-kvm -name guest=FusionPBX,debug-threads=on -S -object secret,id=masterKey0,format=raw,file=/va+
 2694 root      20   0 2169224  12164   3108 S   5.0  0.0   3290:42 /usr/sbin/glusterfsd -s ovirt3.nwfiber.com --volfile-id data.ovirt3.nwfiber.com.gluster-brick2-data -p /var/run/+
14293 root      15  -5  944700  13356   4436 S   4.0  0.0  16:32.15 /usr/sbin/glusterfs --volfile-server=192.168.8.11 --volfile-server=192.168.8.12 --volfile-server=192.168.8.13 --+
25100 vdsm       0 -20 6747440 107868  12836 S   2.3  0.3  21:35.20 /usr/bin/python2 /usr/share/vdsm/vdsmd
28971 qemu      20   0 2842592   1.5g  13548 S   1.7  4.7 241:46.49 /usr/libexec/qemu-kvm -name guest=unifi.palousetech.com,debug-threads=on -S -object secret,id=masterKey0,format=+
12095 root      20   0  162276   2836   1868 R   1.3  0.0   0:00.25 top
 2708 root      20   0 1906040  12404   3080 S   1.0  0.0   1083:33 /usr/sbin/glusterfsd -s ovirt3.nwfiber.com --volfile-id engine.ovirt3.nwfiber.com.gluster-brick1-engine -p /var/+
28623 qemu      20   0 4749536   1.7g  12896 S   0.7  5.5   4:30.64 /usr/libexec/qemu-kvm -name guest=billing.nwfiber.com,debug-threads=on -S -object secret,id=masterKey0,format=ra+
   10 root      20   0       0      0      0 S   0.3  0.0 215:54.72 [rcu_sched]
 1030 sanlock   rt   0  773804  27908   2744 S   0.3  0.1  35:55.61 /usr/sbin/sanlock daemon
 1890 zabbix    20   0   83904   1696   1612 S   0.3  0.0  24:30.63 /usr/sbin/zabbix_agentd: collector [idle 1 sec]
 2722 root      20   0 1298004   6148   2580 S   0.3  0.0  38:10.82 /usr/sbin/glusterfsd -s ovirt3.nwfiber.com --volfile-id iso.ovirt3.nwfiber.com.gluster-brick4-iso -p /var/run/gl+
 6340 root      20   0       0      0      0 S   0.3  0.0   0:04.30 [kworker/7:0]
10652 root      20   0       0      0      0 S   0.3  0.0   0:00.23 [kworker/u64:2]
14724 root      20   0 1076344  17400   3200 S   0.3  0.1  10:04.13 /usr/sbin/glusterfs -s localhost --volfile-id gluster/glustershd -p /var/run/gluster/glustershd/glustershd.pid -+
22011 root      20   0       0      0      0 S   0.3  0.0   0:05.04 [kworker/10:1]

Not sure why the system load dropped other than I was trying to take a picture 
of it :)

In any case, it appears that at this time, I have plenty of swap, ram, and 
network capacity, and yet things are still running very sluggish; I'm still 
getting e-mails from servers complaining about loss of communication with 
something or another; I still get e-mails from the engine about bad engine 
status, then recovery, etc.

I've shut down 2/3 of my VMs, too....just trying to keep the critical ones 
operating.

At this point, I don't believe the problem is the memory leak, but it seems to 
be triggered by the memory leak, as in all my problems started when I got low 
ram warnings from one of my 3 nodes and began recovery efforts from that.

I do really like the idea / concept behind glusterfs, but I really have to 
figure out why it's been so poor performing from day one, and it's caused 95% of 
my outages (including several large ones lately).  If I can get it stable, 
reliable, and well performing, then I'd love to keep it.  If I can't, then 
perhaps NFS is the way to go?  I don't like the single point of failure aspect 
of it, but my other NAS boxes I run for clients (central storage for windows 
boxes) have been very solid; If I could get that kind of reliability for my 
ovirt stack, it would be a substantial improvement.  Currently, it seems about 
every other month I have a gluster-induced outage.

Sometimes I wonder if hyperconverged itself is the issue, but my 
infrastructure doesn't justify three servers at the same location...I might be 
able to do two, but even that seems like it's pushing it.

Looks like I can upgrade to 10G for about $900.  I can order a dual-Xeon 
supermicro 12-disk server, loaded with 2TB WD Enterprise disks and a pair of 
SSDs for the os, 32GB ram, 2.67Ghz CPUs for about $720 delivered.  I've got to 
do something to improve my reliability; I can't keep going the way I have 
been....

--Jim


On Fri, Jul 6, 2018 at 9:13 PM, Johan Bernhardsson <[email protected]> wrote:
Load like that is mostly I/O based: either the machine is swapping or the network is 
too slow. Check I/O wait in top.
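
If you want to log that number instead of eyeballing it, the "wa" field can be pulled out of top's "%Cpu(s):" summary line (a single non-interactive snapshot comes from `top -b -n 1`). A minimal sketch; the function name is made up, and the sample line is copied from the top output posted elsewhere in this thread.

```python
import re


def iowait_percent(cpu_line):
    """Extract the 'wa' (I/O wait) percentage from top's '%Cpu(s):' line."""
    match = re.search(r"([\d.]+)\s*wa\b", cpu_line)
    if match is None:
        raise ValueError("no 'wa' field found in line")
    return float(match.group(1))


# Summary line copied from top output posted in this thread.
SAMPLE_CPU_LINE = (
    "%Cpu(s):  2.7 us,  2.1 sy,  0.0 ni, 89.0 id,  6.1 wa,  "
    "0.0 hi,  0.2 si,  0.0 st"
)
```

This assumes the procps layout shown in the thread; other top versions may format the line differently.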

And the problem where you get the OOM killer to kill off gluster: does that mean 
you don't monitor ram usage on the servers? Either it's eating all your ram, 
swap gets really I/O intensive and then it is killed off, or you have the wrong 
swap settings in sysctl.conf (there are tons of broken guides that recommend 
setting swappiness to 0, but that disables swap on newer kernels. The proper 
swappiness for only swapping when necessary is 1 or a sufficiently low number 
like 10; the default is 60).
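
To check what a host is configured with, here is a sketch that reads the vm.swappiness setting out of sysctl.conf-style text and falls back to the default of 60 when nothing is set. The function name is illustrative. It only parses text handed to it; the value the running kernel is using is in /proc/sys/vm/swappiness.

```python
def configured_swappiness(sysctl_text, default=60):
    """Return the last vm.swappiness value set in sysctl.conf-style text.

    Lines starting with '#' or ';' are treated as comments. If the key
    never appears, the default is returned.
    """
    value = default
    for raw in sysctl_text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";")):
            continue
        key, sep, val = line.partition("=")
        if sep and key.strip() == "vm.swappiness":
            value = int(val.strip())  # a later entry overrides an earlier one
    return value


# Made-up config text: a value of 0 from a guide, later changed to 10.
SAMPLE_SYSCTL = "# tuning\nvm.swappiness = 0\nnet.ipv4.ip_forward = 1\nvm.swappiness=10\n"
```

A result of 0 would be the case the broken guides lead to.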


Moving to nfs will not improve things. You will get more memory since gluster 
isn't running and that is good. But you will have a single node that can fail 
with all your storage and it would still be on 1 gigabit only and your three 
node cluster would easily saturate that link.


On July 7, 2018 04:13:13 Jim Kusznir <[email protected]> wrote:

So far it does not appear to be helping much. I'm still getting VMs locking up 
and all kinds of notices from the ovirt engine about non-responsive hosts.  I'm 
still seeing load averages in the 20-30 range.

Jim

On Fri, Jul 6, 2018, 3:13 PM Jim Kusznir <[email protected]> wrote:
Thank you for the advice and help

I do plan on going 10Gbps networking; haven't quite jumped off that cliff yet, 
though.

I did put my data-hdd (main VM storage volume) onto a dedicated 1Gbps network, 
and I've watched throughput on that and never seen more than 60MB/s achieved 
(as reported by bwm-ng).  I have a separate 1Gbps network for communication and 
ovirt migration, but I wanted to break that up further (separate out VM 
traffic from migration/mgmt traffic).  My three SSD-backed gluster volumes run 
on the main network too, as I haven't been able to get them to move to the new 
network (which I was trying to use as all gluster).  I tried bonding, but that 
seemed to reduce performance rather than improve it.

--Jim

On Fri, Jul 6, 2018 at 2:52 PM, Jamie Lawrence <[email protected]> wrote:
Hi Jim,

I don't have any targeted suggestions, because there isn't much to latch on to. 
I can say Gluster replica three (no arbiters) on dedicated servers serving a 
couple of Ovirt VM clusters here has not had these sorts of issues.

I suspect your long heal times (and the resultant long periods of high load) 
are at least partly related to 1G networking. That is just a matter of IO - 
heals of VMs involve moving a lot of bits. My cluster uses 10G bonded NICs on 
the gluster and ovirt boxes for storage traffic and separate bonded 1G for 
ovirtmgmt and communication with other machines/people, and we're occasionally 
hitting the bandwidth ceiling on the storage network. I'm starting to think 
about 40/100G, different ways of splitting up intensive systems, and 
considering iSCSI for specific volumes, although I really don't want to go 
there.

I don't run FreeNAS[1], but I do run FreeBSD as storage servers for their 
excellent ZFS implementation, mostly for backups. ZFS will make your `heal` 
problem go away, but not your bandwidth problems, which become worse (because 
of fewer NICS pushing traffic). 10G hardware is not exactly in the impulse-buy 
territory, but if you can, I'd recommend doing some testing using it. I think 
at least some of your problems are related.

If that's not possible, my next stops would be optimizing everything I could 
about sharding, healing and optimizing for serving the shard size to squeeze as 
much performance out of 1G as I could, but that will only go so far.

-j

[1] FreeNAS is just a storage-tuned FreeBSD with a GUI.

On Jul 6, 2018, at 1:19 PM, Jim Kusznir <[email protected]> wrote:

hi all:

Once again my production ovirt cluster is collapsing in on itself.  My servers 
are intermittently unavailable or degrading, customers are noticing and calling 
in.  This seems to be yet another gluster failure that I haven't been able to 
pin down.

I posted about this a while ago, but didn't get anywhere (no replies that I 
found).  The problem started out as a glusterfsd process consuming large 
amounts of ram (up to the point where ram and swap were exhausted and the 
kernel OOM killer killed off the glusterfsd process).  For reasons not clear to 
me at this time, that resulted in any VMs running on that host and that gluster 
volume to be paused with I/O error (the glusterfs process is usually unharmed; 
why it didn't continue I/O with other servers is confusing to me).

I have 3 servers and a total of 4 gluster volumes (engine, iso, data, and 
data-hdd).  The first 3 are replica 2+arb; the 4th (data-hdd) is replica 3.  
The first 3 are backed by an LVM partition (some thin provisioned) on an SSD; 
the 4th is on a seagate hybrid disk (hdd + some internal flash for 
acceleration).  data-hdd is the only thing on the disk.  Servers are Dell R610 
with the PERC/6i raid card, with the disks individually passed through to the 
OS (no raid enabled).

The above RAM usage issue came from the data-hdd volume.  Yesterday, I caught 
one of the glusterfsd processes at high ram usage before the OOM-Killer had to run.  I was 
able to migrate the VMs off the machine and for good measure, reboot the entire 
machine (after taking this opportunity to run the software updates that ovirt 
said were pending).  Upon booting back up, the necessary volume healing began.  
However, this time, the healing caused all three servers to go to very, very 
high load averages (I saw just under 200 on one server; typically they've been 
40-70) with top reporting IO Wait at 7-20%.  Network for this volume is a 
dedicated gig network.  According to bwm-ng, initially the network bandwidth 
would hit 50MB/s (yes, bytes), but tailed off to mostly in the kB/s for a 
while.  All machines' load averages were still 40+ and gluster volume heal 
data-hdd info reported 5 items needing healing.  Servers were intermittently 
experiencing IO issues, even on the 3 gluster volumes that appeared largely 
unaffected.  Even the OS activities on the hosts themselves (logging in, running 
commands) would often be very delayed.  The ovirt engine was seemingly randomly 
throwing engine down / engine up / engine failed notifications.  Responsiveness 
on ANY VM was horrific most of the time, with random VMs being inaccessible.

I let the gluster heal run overnight.  By morning, there were still 5 items 
needing healing, all three servers were still experiencing high load, and 
servers were still largely unstable.

I've noticed that all of my ovirt outages (and I've had a lot, way more than is 
acceptable for a production cluster) have come from gluster.  I still have 3 
VMs whose hard disk images have become corrupted by my last gluster crash that 
I haven't had time to repair / rebuild yet (I believe this crash was caused by 
the OOM issue previously mentioned, but I didn't know it at the time).

Is gluster really ready for production yet?  It seems so unstable to me....  I'm looking 
at replacing gluster with a dedicated NFS server, likely FreeNAS.  Any suggestions?  What 
is the "right" way to do production storage on this (3 node cluster)?  Can I 
get this gluster volume stable enough to get my VMs to run reliably again until I can 
deploy another storage solution?

--Jim
_______________________________________________
Users mailing list -- [email protected]
To unsubscribe send an email to [email protected]
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/[email protected]/message/YQX3LQFQQPW4JTCB7B6FY2LLR6NA2CB3/



Edward Clay
Systems Administrator
The Hut Group<http://www.thehutgroup.com/>


