On 28 May 2012 22:10, Richard Elling <richard.ell...@gmail.com> wrote:
> The only recommendation which will lead to results is to use a
> different OS or filesystem. Your choices are
> - FreeBSD with ZFS
> - Linux with BTRFS
> - Solaris with QFS
> - Solaris with UFS
> - Solaris with NFSv4, use ZFS on independent fileserver machines
>
> There's a rather mythical rewrite of the Solaris virtual memory
> subsystem called VM2 in progress but it will still take a long time
> until this will become available for customers and there are no real
> data yet whether this will help with mmap performance. It won't be
> available for Opensolaris successors like Illumos available either
> (likely never, at least the Illumos leadership doesn't see the need
> for this and instead recommends to rewrite the applications to not use
> mmap).
>
>
> This is a mischaracterization of the statements given. The illumos team
> says they will not implement Oracle's VM2 for valid, legal reasons.
> That does not mean that mmap performance improvements for ZFS
> cannot be implemented via other methods.

I'd like to hear what the other methods should be. The lack of mmap performance is only a symptom of a more severe disease. Just doing piecework and altering the VFS API to integrate ZFS/ARC/VM with each other doesn't fix the underlying problems.

I've assigned two of my staff, one familiar with the FreeBSD VM and one familiar with the Linux VM, to look at the current VM subsystem, and their preliminary reports point to disaster. If Illumos does not initiate a VM rewrite project of its own, one that makes the VM aware of NUMA, power management and other issues, then I predict nothing less than the downfall of Illumos within a couple of years, because the performance impact is dramatic and makes the Illumos kernel no longer competitive.

Despite these findings, of which Sun was aware for a long time, and the number of ex-Sun employees working on Illumos, I see no commitment to launch such a project. That's why I said "likely never", unless of course someone slams Garrett's head with sufficient force on a wooden table to make him see the reality.

The reality is:
- Modern x86 server platforms are now all NUMA or NUMA-like. Lack of NUMA support leads to bad performance.
- They all use some kind of serialized link between CPU nodes, be it HyperTransport or QuickPath, with power management. If power management is active and has reduced the number of active links between nodes, and the OS doesn't manage this correctly, you'll get bad performance. The Illumos VM isn't even remotely aware of this fact.
- Based on simulator testing, we see that in a simulated environment with 8 sockets almost 40% of kernel memory accesses are _REMOTE_ accesses, i.e. the memory is not local to the node accessing it.

These are all preliminary results. I expect that the remainder of the analysis will take another 4-5 weeks until we present the findings to the Illumos community. But I can say already that it will be a slap in the face for those who think that Illumos doesn't need a better VM system.
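
To give a rough feel for why the share of remote accesses matters, here is a minimal back-of-the-envelope sketch. It only computes a weighted average of local and remote access latency. The function name and the latency figures (100 ns local, 150 ns remote) are made-up placeholders for illustration, not measurements; only the 40% remote fraction comes from the preliminary results above.

```python
def effective_latency(local_ns, remote_ns, remote_fraction):
    """Average memory access latency on a NUMA machine.

    local_ns / remote_ns are hypothetical per-access latencies;
    remote_fraction is the share of accesses that hit another node.
    """
    if not 0.0 <= remote_fraction <= 1.0:
        raise ValueError("remote_fraction must be between 0 and 1")
    return (1.0 - remote_fraction) * local_ns + remote_fraction * remote_ns


if __name__ == "__main__":
    # Placeholder latencies (assumptions), 40% remote accesses as quoted above.
    local_only = effective_latency(100.0, 150.0, 0.0)
    with_remote = effective_latency(100.0, 150.0, 0.4)
    print("all local:  %.1f ns" % local_only)
    print("40%% remote: %.1f ns" % with_remote)
    print("slowdown:   %.2fx" % (with_remote / local_only))
```

With these assumed numbers, 40% remote accesses raise the average latency by about 20%. The real penalty depends on the actual local/remote latency ratio of the platform and gets worse when power management has shut down links between nodes.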

> The primary concern for mmap files is that the RAM footprint is doubled.

It's not only that the RAM footprint is doubled; the data are also copied between the ARC and the page cache multiple times. You can say that memory and in-memory copy operations are cheap, but this, combined with the lack of NUMA awareness, is a real performance killer.

Lionel