On 28 May 2012 22:10, Richard Elling <richard.ell...@gmail.com> wrote:
>> The only recommendation which will lead to results is to use a
>> different OS or filesystem. Your choices are
>> - FreeBSD with ZFS
>> - Linux with BTRFS
>> - Solaris with QFS
>> - Solaris with UFS
>> - Solaris with NFSv4, use ZFS on independent fileserver machines
>> There's a rather mythical rewrite of the Solaris virtual memory
>> subsystem called VM2 in progress but it will still take a long time
>> until this will become available for customers and there are no real
>> data yet whether this will help with mmap performance. It won't be
>> available for OpenSolaris successors like Illumos either
>> (likely never, at least the Illumos leadership doesn't see the need
>> for this and instead recommends to rewrite the applications to not use
>> mmap).
> This is a mischaracterization of the statements given. The illumos team
> says they will not implement Oracle's VM2 for valid, legal reasons.
> That does not mean that mmap performance improvements for ZFS
> cannot be implemented via other methods.

I'd like to hear what the other methods would be. The lack of mmap
performance is only a symptom of a more severe disease. Doing
piecework and altering the VFS API to integrate ZFS, the ARC and the
VM with each other doesn't fix the underlying problems.

I've assigned two of my staff, one familiar with the FreeBSD VM and
one familiar with the Linux VM, to look at the current VM subsystem,
and their preliminary reports point to disaster. If Illumos does not
initiate a VM rewrite project of its own which will make the VM aware
of NUMA, power management and other issues then I predict nothing less
than the downfall of Illumos within a couple of years because the
performance impact is dramatic and makes the Illumos kernel no longer
Despite these findings, of which Sun was aware for a long time, and
the number of ex-Sun employees working on Illumos, I see no
commitment to launch such a project. That's why I said "likely never",
unless of course someone slams Garrett's head with sufficient force on
a wooden table to make him see the reality.

The reality is:
- The modern x86 server platforms are now all NUMA or NUMA-like. Lack
of NUMA support leads to bad performance
- They all use some kind of serialized link between CPU nodes, be it
HyperTransport or QuickPath, with power management. If power
management is active and has reduced the number of active links
between nodes and the OS doesn't manage this correctly, you'll get bad
performance. The Illumos VM isn't even remotely aware of this fact
- Based on simulator testing we see that in a simulated environment
with 8 sockets almost 40% of kernel memory accesses are _REMOTE_
accesses, i.e. the memory is not local to the node accessing it
These are all preliminary results; I expect that the remainder of the
analysis will take another 4-5 weeks until we present the findings to
the Illumos community. But I can say already it will be a slap in the
face for those who think that Illumos doesn't need a better VM system.
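
To put the remote access share into perspective, here is a rough
back-of-the-envelope model, not a measurement. It blends a local and a
remote access latency according to the fraction of remote accesses.
The function names and the latency values in the usage part are
assumptions made up for illustration; only the 40% share comes from
the simulator figure above.

```python
def effective_latency_ns(remote_fraction, local_ns, remote_ns):
    """Average memory access latency for a given share of remote accesses."""
    if not 0.0 <= remote_fraction <= 1.0:
        raise ValueError("remote_fraction must be between 0 and 1")
    # Weighted mean: local accesses pay local_ns, remote ones pay remote_ns.
    return (1.0 - remote_fraction) * local_ns + remote_fraction * remote_ns


def slowdown_vs_local(remote_fraction, local_ns, remote_ns):
    """Ratio of the blended latency to the latency of an all-local workload."""
    return effective_latency_ns(remote_fraction, local_ns, remote_ns) / local_ns


if __name__ == "__main__":
    # ASSUMPTION: 100 ns local and 200 ns remote are placeholder latencies,
    # not values taken from the analysis. Only the 0.40 share is from the post.
    local_ns, remote_ns = 100.0, 200.0
    print(effective_latency_ns(0.40, local_ns, remote_ns))
    print(slowdown_vs_local(0.40, local_ns, remote_ns))
```

The model ignores link contention and reduced link counts under power
management, so it should understate the real penalty rather than
overstate it.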

> The primary concern for mmap files is that the RAM footprint is doubled.

It's not only that the RAM footprint is doubled; the data are also
copied between the ARC and the page cache multiple times. You can say
memory and the in-memory copy operations are cheap, but this combined
with the lack of NUMA awareness is a real performance killer.
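
For readers who haven't looked at the two access paths side by side,
here is a minimal sketch of what an application does in each case. It
only shows the user-level calls; the helper names are made up for
illustration, and the comments about where the data end up on ZFS
restate the point made above rather than anything the code can
observe by itself.

```python
import mmap


def read_via_syscall(path):
    """Read the whole file with read(2); on ZFS the data are served from the ARC."""
    with open(path, "rb") as f:
        return f.read()


def read_via_mmap(path):
    """Read the whole file through a read-only memory mapping.

    On ZFS, mapped pages live in the page cache in addition to the ARC,
    which is where the doubled footprint and the extra copies come from.
    The file must not be empty, since a zero-length mapping is an error.
    """
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
            return m[:]  # copy the mapped bytes out before the mapping closes
```

Both functions return the same bytes; the difference is entirely in
which kernel caches hold the data while the application works on them.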

zfs-discuss mailing list
