I first want to apologize for not recognizing the cause of the KVM performance
problems (which was DROPPED PACKETS) much sooner.  Until recently, our in-house
KVM deployments have all been on r151006.  I've added an OI KVM box to our
r151014 build machine, so that I have a platform on which to attempt
replications.

What happened was that upstream illumos KVM (from Joyent) had a platform flag
day during r151012's development: the VND code.  Joyent's illumos child has
Virtual Networking Devices (VND) that allow KVM instances to not depend on an 
actual NIC's Promiscuous Mode to receive packets.  They updated their illumos, 
and subsequently their KVM.  Remember that "KVM" has two parts:  The kernel KVM 
driver (from Joyent's illumos-kvm repo), and the "KVM-cmd", which is QEMU (from 
Joyent's illumos-kvm-cmd repo).

Other distros do not have VND currently (the illumos community is attempting to 
fix this, and Joyent is leading here, modulo their own day jobs).  The 
compilation of illumos-kvm-cmd's latest revisions (the QEMU bits) fails without 
having VND around. We reset illumos-kvm-cmd to the pre-VND revision, but did 
NOT reset illumos-kvm bits to pre-VND.  Since the world compiled and ran in 
this split state, I moved forward.  The PROBLEM is that the amount of internal
buffering for promiscuous devices is low.  VND fixes the problem by reducing
the use of promiscuous mode, but non-VND illumos (like OmniOS) still needs to
raise the buffering limits.  The up-to-date kernel side removed the mechanism
for raising these limits, causing MUCH higher packet drop rates.

Quoting Joyent's Robert Mustacchi:

> By default the stream high watermark for the promisc mode is quite low.
> And for some reason, that I don't recall, there was no great way to do
> that ourselves from user land (could be wrong entirely). As a result, if
> you don't set it, we're basically going to start dropping mblk_t's
> queued on the stream.
> 
> Basically without vnd, you need both of those. With vnd, then you can
> get rid of it in both QEMU and KVM.
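
To make the effect Robert describes concrete, here is a toy model of a bounded
queue that drops packets once its high watermark is reached.  This is only a
sketch, not the STREAMS or KVM code: the function name, the watermark values,
the burst pattern, and the drain rate are all made up for illustration and are
not the real illumos defaults.

```python
def simulate(hiwat_bytes, ticks=100, burst=40, burst_every=4,
             drain_per_tick=10, pkt_size=1500):
    """Toy model of a queue with a high watermark (all numbers hypothetical).

    Every `burst_every` ticks, `burst` packets arrive at once.  A packet is
    dropped if queueing it would push the queue past `hiwat_bytes`.  After
    arrivals, the consumer drains up to `drain_per_tick` packets per tick.
    Returns a (delivered, dropped) tuple of packet counts.
    """
    queued_pkts = 0
    delivered = 0
    dropped = 0
    for tick in range(ticks):
        if tick % burst_every == 0:
            for _ in range(burst):
                # Drop the packet if it would exceed the high watermark.
                if (queued_pkts + 1) * pkt_size > hiwat_bytes:
                    dropped += 1
                else:
                    queued_pkts += 1
        # The consumer (think: the guest's NIC emulation) drains the queue.
        drained = min(queued_pkts, drain_per_tick)
        queued_pkts -= drained
        delivered += drained
    return delivered, dropped


if __name__ == "__main__":
    for hiwat in (8 * 1024, 64 * 1024):
        delivered, dropped = simulate(hiwat)
        print(f"hiwat={hiwat:6d} delivered={delivered} dropped={dropped}")
```

In this model the average arrival rate matches the drain rate, so the only
difference between the two runs is how much of each burst the queue can hold.
With these made-up numbers, the 8 KiB watermark drops 875 of 1000 packets,
while the 64 KiB watermark drops none.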

Tobi Oetiker (who deserves a ton of credit for calling this problem out, AND 
determining it was packet drops) helped me test two solutions to the problem:

1.) Revert illumos-kvm to the pre-VND level as well.

2.) Keep up to date with illumos-kvm and illumos-kvm-cmd, but explicitly revert 
the VND changes in BOTH.

I'm strongly leaning toward committing solution #2.  Either way, I will be
issuing an update for r151014 later this week that will push KVM performance
back to its pre-VND-bump levels.

GOING FORWARD, once VND is upstreamed into illumos-gate, I can eliminate the 
VND backouts (or just catch up the built repos if I use option #1 above).

Thank you all for your patience, and again, sorry for not addressing this 
sooner.

Dan McDonald -- OmniOS Engineering


_______________________________________________
OmniOS-discuss mailing list
[email protected]
http://lists.omniti.com/mailman/listinfo/omnios-discuss
