[ovirt-users] Re: oVirt Performance (Horrific)

Karli Sjöberg Thu, 14 Mar 2019 06:13:38 -0700

On 2019-03-13 05:20, Drew Rash wrote:
> Pictures and speeds are the latest. Which seems to be the best
> performance we've ever gotten so far. Still seems like the hardware is
> sitting idling by not doing much after an initial burst.
>
> Took a picture of a file copy using the latest setup. You can see it
> transfer like 25% of a 7gig file at somewhere around 1GBps or 600MBps
> ish (it disappears quickly) down to 40MBps
> The left vm "MikeWin10:1" is freeNAS'd and achieves much higher highs.
> Still crawls down to the lows and has pause and weird stuff.  
> The right vm "MikeWin10_Drew:1" is a gluster fs mount. We tried nfs
> and decided to try gluster again but with a "negative-timeout=1"
> option set...appears to have made it faster by 4x.
> *https://imgur.com/a/R2w6IcO*
>
> *4 Boxes:*
> (2)Two are c9x299-PG300F super micro boards with 14c (28thread) i9's 
> 128GB 3200MHz Ram
> (1)FreeNAS is our weakest of all 4 boxes - 6 core, 64GB ram i7 extreme
> version.

Heyo!

Not that the thread is about ZFS, but I find this "stop and go" behavior
interesting.

FreeNAS is an excellent NAS platform, I mean, it's in the name, right? ;)
However, the ZFS filesystem and how you configure the system does impact
performance. First of all, how have you configured the drives in the
zpool? RAIDZ is not recommended for virtualization, because its random
IOPS performance is limited to that of 1 HDD per vdev. If we assume a
SATA drive has 150 random IOPS and you create an 8 x 6 TB RAIDZ2 vdev,
that entire pool only has 150 random IOPS total. Can you do a "zpool status" and
post the output?
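
To put numbers on that rule of thumb, here is a rough sketch in Python.
It assumes the 150 random IOPS per SATA disk from above and treats each
vdev as delivering about one disk's worth of random IOPS. The striped
mirror layout is only my assumption for comparison, not something I know
about your pool, and real numbers will vary with workload and caching.

```python
# Rule of thumb: each vdev delivers roughly the random IOPS of ONE
# member disk, and the vdevs in a pool add up.
def pool_random_iops(vdev_count: int, iops_per_disk: int = 150) -> int:
    """Estimate random IOPS for a pool built from `vdev_count` vdevs."""
    return vdev_count * iops_per_disk


# The same 8 disks arranged two ways (150 IOPS per disk is an assumed figure).
raidz2_single_vdev = pool_random_iops(vdev_count=1)  # one 8-disk RAIDZ2 vdev
striped_mirrors = pool_random_iops(vdev_count=4)     # four 2-disk mirror vdevs

print(f"1 x 8-disk RAIDZ2 vdev: ~{raidz2_single_vdev} random IOPS")
print(f"4 x 2-disk mirror vdevs: ~{striped_mirrors} random IOPS")
```

The point is just that random IOPS scale with the number of vdevs, not
the number of disks.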

Second, it's worth mentioning that block sizes still matter. Most drives
still lie to the OS that they have 512 byte sectors while really being
4k, just so that older OSes don't freak out because they don't know
drives can have anything other than 512. I don't know if FreeNAS solves
this issue for you but it's something I always take care of, either by
"sysctl vfs.zfs.min_auto_ashift=12" or by tricking ZFS into thinking the
drives are true 4k disks with "gnop". A way to check is "zdb | grep
ashift"; it should be 12. If 9, you may have worse performance than you
should have, but not way worse. Still... Then there's alignment, which I
think FreeNAS also takes care of, probably... Most systems place
the partition start at 1 MiB which makes it OK for any disk regardless.
Your disks should be called "adaX"; run "camcontrol devlist" to get a
list of all of them, then pick one disk to check the partitioning on
with "gpart show adaX". The "freebsd-zfs" partition should start at
a byte offset evenly divisible by 4096 (4k). Most of the time the start
sector is 2048, because 512*2048=1048576 (1 MiB), and that divided by 4k
is 1048576/4096=256, which is a beautifully even number.
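
If you'd rather not do that arithmetic by hand, a small Python sketch
like this does the same check. It assumes "gpart show" reports the start
in 512 byte sectors; the function names are made up for illustration and
63 is just a hypothetical misaligned start, not something from your
system.

```python
def sector_size_from_ashift(ashift: int) -> int:
    """ZFS ashift is the base-2 logarithm of the sector size ZFS assumes."""
    return 2 ** ashift


def is_4k_aligned(start_sector: int, logical_sector_bytes: int = 512) -> bool:
    """True if a partition starting at `start_sector` begins on a 4 KiB boundary."""
    return (start_sector * logical_sector_bytes) % 4096 == 0


print(sector_size_from_ashift(12))  # 4096
print(sector_size_from_ashift(9))   # 512
print(is_4k_aligned(2048))          # True: 2048 * 512 = 1 MiB
print(is_4k_aligned(63))            # False: 63 * 512 = 32256 bytes
```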

Third and maybe most important, ZFS _does_ listen to "sync" calls, which
is about everything over iSCSI (with ctld) or NFS. That means, since
your hosts are connecting to it over one of the two, for _every_ write,
the NAS stops and waits for it to be actually written safely to disk
before doing another write. It's sooo slow (but super awesome, because
it saves you from data corruption). What you do with ZFS to mitigate
that is to add a so-called SLOG (separate log) disk, typically a
hella-fast SSD or NVMe that only does that and nothing else, so that the
fast disk takes all the random, small writes and turns them into big
streaming writes that the HDDs can take. You can partition just a bit
of an SSD and use that as a SLOG; it typically needs no more than the
maximum bandwidth you could take in, times the interval between write
flushes in ZFS, which is 5 secs. So 10 Gb/s is about 1.25 GB/s, tops, and
you have two of those: 2.5 GB/s * 5 s = 12.5 GB. Which means 14 GB should
definitely cover it.
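
The same sizing estimate as a Python sketch, in case you want to plug in
other numbers. It assumes both 10 Gb/s ports are saturated with writes
and the 5 second flush interval mentioned above; the function name is
just for illustration.

```python
def slog_size_gb(link_gbit_per_s: float, links: int, flush_interval_s: float) -> float:
    """Upper bound on data that can arrive between two ZFS write flushes, in GB."""
    gb_per_s_per_link = link_gbit_per_s / 8  # Gbit/s -> GB/s, ignoring protocol overhead
    return gb_per_s_per_link * links * flush_interval_s


# Two 10 Gb/s ports and a 5 second flush interval.
needed = slog_size_gb(link_gbit_per_s=10, links=2, flush_interval_s=5)
print(f"SLOG needs at most ~{needed} GB")
```

Anything a bit above that figure leaves some headroom.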

Lastly, the network: are you sure you activated jumbo frames, all the way
from the storage to the hosts? That makes a huge difference on 10 Gb
ethernet. A way to test this is to start tcpdump on the iSCSI/NFS
storage interface, looking for just ping, like "tcpdump -vnni Jumbo_NFS
icmp" on both the storage and a host system. Then from another terminal
(as root) of the storage or host, send just _one_ big ping packet, and
see what you get, like so: "ping -c 1 -s 8192 XXX.XXX.XXX.XXX". The
tcpdump output should just have received _one_ ICMP echo request and
sent back just _one_ ICMP echo reply; one to get there, and one back.
That's how you're sure you've got no fragmentation happening between the
storage and host.
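
For reference, this Python sketch estimates how many IPv4 packets that
one ping turns into at a given MTU. It assumes plain IPv4 with a 20 byte
header (no options) and the 8 byte ICMP echo header; the function name is
mine, just for illustration.

```python
import math

IP_HEADER = 20   # bytes, IPv4 header without options
ICMP_HEADER = 8  # bytes, ICMP echo header


def ipv4_fragments(icmp_payload: int, mtu: int) -> int:
    """Number of IPv4 packets needed for one ICMP echo with `icmp_payload` data bytes."""
    ip_payload = icmp_payload + ICMP_HEADER
    if ip_payload + IP_HEADER <= mtu:
        return 1  # fits in a single packet, no fragmentation
    # Every fragment except the last must carry a multiple of 8 bytes.
    per_fragment = (mtu - IP_HEADER) // 8 * 8
    return math.ceil(ip_payload / per_fragment)


print(ipv4_fragments(8192, 9000))  # 1: jumbo frames working end to end
print(ipv4_fragments(8192, 1500))  # 6: a standard MTU somewhere in the path
```

So with "-s 8192", a single request and a single reply in tcpdump means
MTU 9000 holds along the whole path, while a burst of fragments points
at something still running with MTU 1500.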

I swear, I didn't mean to write a book about it, it just happened :) You
put a quarter in the ZFS box and this is what you get...

/K

> (1) The last is an 8c (16thrd)  i7, 128GB  3000MHz Ram
> *Network:*
> All tied together with a 10Gbps managed switch, each machine having 2
> x 10Gbps nic ports.
>
> *Drives:*
> 4 8TB WD Gold Enterprise drives
> 4 6TB WD Gold Enterprise drives
> 4 m.2 500 GB samsung pro's
> and like 10 ssd's for random things with 4 being 1TB samsung's running
> a gluster for a production box. Which still also runs at around 13MBps
> inside the VM.
>
> Also I believe we tried using 9000 MTU on all networks and the setting
> is still set to that.
>
> We're testing using 2 8TB drives in a mirror 2 (no arb..testing) gluster.
> And we took the 6TB drives and made a raid on freenas for testing.
> m.2's are boot devices for the boxes.
>
> It's pretty apparent there's some kind of cache happening and then if
> the file copy is big enough, it'll just crawl down to nothing after it
> hits the end of whatever it is.
> Added a picture of the StoragePool page in freenas. And a picture of
> the oVirt gluster box VM page.
>
> I'm not sure where to find the dirty ratio and background ratio...?
>
>
>
>
> On Tue, Mar 12, 2019 at 1:19 AM Strahil <hunter86...@yahoo.com
> <mailto:hunter86...@yahoo.com>> wrote:
>
>     Hi Drew,
>
>     What is the host RAM size and what is the setting for
>     VM.dirty_ratio and background ratio on those hosts?
>
>     What about your iSCSI target?
>
>     Best Regards,
>     Strahil Nikolov
>
>     On Mar 11, 2019 23:51, Drew Rash <drew.r...@gmail.com
>     <mailto:drew.r...@gmail.com>> wrote:
>
>         Added the disable:false, removed the gluster, re-added using
>         nfs.  Performance still in the low 10's MBps + or - 5
>         Ran the showmount -e "" and it displayed the mount.
>
>         Trying right now to re-mount using gluster with a
>         negative-timeout=1 option.
>
>         We converted one of our 4 boxes to FreeNAS, took 4 6TB drives
>         and made a raid iSCSI and connected it to oVirt.  Boot
>         windows. ( times 2, did 2 boxes with a 7GB file on each) 
>         copied from one to the other and it copied at 600MBps average.
>         But then has weird pauses...  I think it's doing some kind of
>         cache..it'll go like 2GB and choke to zero Bps. Then speed up
>         and choke, speed up choke averaging or getting up to 10MBps.
>         Then at 99% it waits 15 seconds with 0 bytes left...
>         Small files, are instant basically. No complaint there.
>         So...WAY faster.  But suffers from the same thing....just
>         requires writing some more to get to it.  a few gigs and then
>         it crawls.
>
>         Seems to be related to if I JUST finished running a test.  If
>         I wait a while, I get it to copy almost 4GB or so before
>         choking.
>         I made a 3rd windows 10 VM and copied the same file from the
>         1st to the 2nd (via a windows share and from the 3rd box)  And
>         it didn't choke or do any funny business...oddly.  Maybe a
>         fluke. Only did that once.
>
>         So....switching to freenas appears to have increased the
>         window size before it runs horribly.  But it will still run
>         horrifically if the disk is busy.
>
>         And since we're planning on doing actual work on this... idle
>         disks caching up on some hidden cache feature of oVirt isn't
>         gonna work.  We won't be writing gigs of data all over the
>         place...but knowing that this chokes a VM to near death...is
>         scary.  
>
>         It looks like for a windows 10 install to operate correctly,
>         it expects at least 15MB/s with less than 1s latency.
>         Otherwise services don't start and weird stuff happens and it
>         runs slower than my dog while pooping out that extra little
>         stringy bit near the end. So we gotta avoid that.
>
>
>
>
>         On Sat, Mar 9, 2019 at 12:44 AM Strahil <hunter86...@yahoo.com
>         <mailto:hunter86...@yahoo.com>> wrote:
>
>             Hi Drew,
>
>             For the test change the gluster parameter nfs.disable to
>             false.
>             Something like gluster volume set volname nfs.disable false
>
>             Then use showmount -e gluster-node-fqdn
>             Note: NFS might not be allowed in the firewall.
>
>             Then add this NFS domain (don't forget to remove the
>             gluster storage domain before that) and do your tests.
>
>             If it works well, you will have to switch off nfs.disable
>             and deploy NFS Ganesha:
>
>             gluster volume reset volname nfs.disable
>
>             Best Regards,
>             Strahil Nikolov
>
>
> _______________________________________________
> Users mailing list -- users@ovirt.org
> To unsubscribe send an email to users-le...@ovirt.org
> Privacy Statement: https://www.ovirt.org/site/privacy-policy/
> oVirt Code of Conduct: 
> https://www.ovirt.org/community/about/community-guidelines/
> List Archives: 
> https://lists.ovirt.org/archives/list/users@ovirt.org/message/C2CEUZTOFKJ5BI72JXZTVAJKFHDEF5RN/

List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/3F3LXWTN7E7RXHAVQZYQHLWFRHEFJBE4/
