Re: PCIe passthrough really that expensive?
Regarding Anish's message from 13.06.2017 06:13 (localtime):
> Hi Harry,
>> Any hints highly appreciated!
…
> Now use cpuset to route IRQ #265 to, say, core 0:
>
> $ cpuset -l 0 -x 265
>
> Again use cpuset to force the VM [PID 1222] to run on all cores except #0:
>
> root@svmhost:~ # ps
>  PID TT  STAT    TIME COMMAND
> 1222  1  I+   5:59.45 bhyve: vm1 (bhyve)
>
> The VM can run on all cores except #0:
>
> $ cpuset -l 1-3 -p 1222
>
> You can monitor guest exits due to interrupts using:
>
> root@svmhost:~ # bhyvectl --get-stats --vm= --cpu= | grep external
> vm exits due to external interrupt    27273
> root@svmhost:~ #

Thank you very much for that detailed explanation. I didn't know that
cpuset(1) could also pin IRQ handlers to specific CPUs.

In my case, I couldn't measure a noticeable difference. Since I have
hyperthreading enabled on a single-socket quad core, I pinned the bhyve
PID (2 vCPUs) to CPU2+3 and, for testing, copied the same 8 GB file over
NFS twice, with the ppt device's IRQ handler pinned to CPU1, CPU4, CPU3
and CPU2 in turn. So twice on different host CPUs than the guest uses,
and twice on the same ones. I couldn't see any load or performance
difference, and similar 'vm exits due to external interrupt' count
growth. To name numbers: the 1st vCPU had about 40k "vm exits due to
external interrupt" per 8 GB transfer, the other vCPU ~160k. As
mentioned, different host CPU pinning didn't influence that noticeably.

Thanks for this lesson!

-harry

___
freebsd-virtualization@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-virtualization
To unsubscribe, send any mail to "freebsd-virtualization-unsubscr...@freebsd.org"
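The per-transfer exit counts quoted above come from diffing the counter
between two bhyvectl samples. A minimal sketch of that bookkeeping
(assumptions: the two sample strings are made up in the line format
quoted in this thread; on a real host you would capture
`bhyvectl --get-stats --vm=<name>` before and after the copy):

```shell
# Extract the last field (the running counter) from a stats line.
get_exits() {
    printf '%s\n' "$1" | awk '/external interrupt/ { print $NF }'
}

# Hypothetical before/after samples in the format quoted in this thread.
before=$(get_exits 'vm exits due to external interrupt    27273')
after=$(get_exits  'vm exits due to external interrupt    67273')

# Growth of the counter over one transfer.
echo "exits during transfer: $((after - before))"
```

With real samples taken around a `time cp`, the printed difference is
the "per 8 GB transfer" figure reported above.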
Re: PCIe passthrough really that expensive?
Hi Harry,

> Any hints highly appreciated!

In my setup, I have a dual-port Intel GigE NIC; one port is assigned to
the host and the other one is used by the guest.

root@svmhost:~ # pciconf -l | grep ppt
...
ppt2@pci0:2:0:0: class=0x02 card=0x125e8086 chip=0x105e8086 rev=0x06 hdr=0x00
root@svmhost:~ #

This shows up as 'em0' in the FreeBSD guest with 2 vCPUs:

root@bsdguest:~ # pciconf -l | grep em0
em0@pci0:0:21:0: class=0x02 card=0x125e8086 chip=0x105e8086 rev=0x06 hdr=0x00
root@bsdguest:~ #

Once the guest is booted, ppt2 will claim interrupt resources, in this
case just one interrupt line, #265:

root@svmhost:~ # vmstat -i
interrupt             total     rate
...
irq263: em0:irq0 1028705634       ..
irq265: ppt2     1041187641
Total               2835121     1747

Now use cpuset to route IRQ #265 to, say, core 0:

$ cpuset -l 0 -x 265

Again use cpuset to force the VM [PID 1222] to run on all cores except #0:

root@svmhost:~ # ps
 PID TT  STAT    TIME COMMAND
1222  1  I+   5:59.45 bhyve: vm1 (bhyve)

The VM can run on all cores except #0:

$ cpuset -l 1-3 -p 1222

You can monitor guest exits due to external interrupts using:

root@svmhost:~ # bhyvectl --get-stats --vm= --cpu= | grep external
vm exits due to external interrupt    27273
root@svmhost:~ #

Regards,
Anish

On Sun, Jun 11, 2017 at 2:51 AM, Harry Schmalzbauer wrote:
> Regarding Harry Schmalzbauer's message from 09.06.2017 10:22 (localtime):
> > Regarding Anish's message from 08.06.2017 14:35 (localtime):
> >> Hi Harry,
> >>> I thought I'd save these expensive VM_Exits by using the passthru path.
> >>> Completely wrong, is it?
> >>
> >> It depends on which processor you are using. For example, APICv was
> >> introduced in IvyBridge, which enabled h/w-assisted localAPIC rather
> >> than using s/w emulation; bhyve supports it on Intel processors.
> …
> > I'm still using IvyBridge (E3 v2) with this "new" machine, but haven't
> > ever heard/thought about APICv!
>
> It seems APICv is available on IvyBridge-EP (Xeon E5/E7 v2) only, not
> for E3 v2 :-(
> Furthermore, if I didn't miss anything in the datasheets, no currently
> available E3 Xeon offers local APIC virtualization. Can somebody of the
> experts confirm that?
>
> …
> >> Can you run a simple experiment, assign pptdev interrupts to a core
> >> that's not running the guest/vcpu? This will reduce #VMEXIT on the
> >> vcpu, which we know is expensive.
> > Interesting approach. But I have no idea how I should assign a
> > specific core to a PCIe dev. Is it pptdev specific? The tunables in
> > device.hints(5) can't be used for that, can they?
>
> I wasn't able to find out how to do that.
> Any hints highly appreciated!
>
> -harry
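Condensed, the pinning recipe above fits in a short script. The IRQ
number, PID and CPU lists are the examples from this message; the `run`
wrapper is hypothetical and only echoes each command, so the sketch can
be inspected without root on a real bhyve host:

```shell
# Hypothetical dry-run wrapper: echo instead of executing, since
# cpuset(1) needs root and an actual bhyve host with a ppt device.
run() { echo "would run: $*"; }

IRQ=265      # the ppt device's interrupt line, from `vmstat -i`
VMPID=1222   # the bhyve process, from `ps`

run cpuset -l 0 -x "$IRQ"      # pin the IRQ handler to core 0 only
run cpuset -l 1-3 -p "$VMPID"  # keep the guest's threads on cores 1-3
```

Replace `run` with direct execution (as root) once the echoed commands
look right for your IRQ and PID.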
Re: PCIe passthrough really that expensive?
Regarding Harry Schmalzbauer's message from 09.06.2017 10:22 (localtime):
> Regarding Anish's message from 08.06.2017 14:35 (localtime):
>> Hi Harry,
>>> I thought I'd save these expensive VM_Exits by using the passthru path.
>>> Completely wrong, is it?
>>
>> It depends on which processor you are using. For example, APICv was
>> introduced in IvyBridge, which enabled h/w-assisted localAPIC rather
>> than using s/w emulation; bhyve supports it on Intel processors.
…
> I'm still using IvyBridge (E3 v2) with this "new" machine, but haven't
> ever heard/thought about APICv!

It seems APICv is available on IvyBridge-EP (Xeon E5/E7 v2) only, not for
E3 v2 :-(
Furthermore, if I didn't miss anything in the datasheets, no currently
available E3 Xeon offers local APIC virtualization. Can somebody of the
experts confirm that?

…
>> Can you run a simple experiment, assign pptdev interrupts to a core
>> that's not running the guest/vcpu? This will reduce #VMEXIT on the
>> vcpu, which we know is expensive.
> Interesting approach. But I have no idea how I should assign a specific
> core to a PCIe dev. Is it pptdev specific? The tunables in
> device.hints(5) can't be used for that, can they?

I wasn't able to find out how to do that.
Any hints highly appreciated!

-harry
Re: PCIe passthrough really that expensive?
Regarding Anish's message from 08.06.2017 14:35 (localtime):
> Hi Harry,
>> I thought I'd save these expensive VM_Exits by using the passthru path.
>> Completely wrong, is it?
>
> It depends on which processor you are using. For example, APICv was
> introduced in IvyBridge, which enabled h/w-assisted localAPIC rather
> than using s/w emulation; bhyve supports it on Intel processors.
>
> Intel Broadwell introduced posted interrupts, which enable interrupts
> to be delivered to the guest directly, bypassing the hypervisor, for
> passthrough devices. Emulated devices' interrupts will still go through
> the hypervisor.

That's very interesting, thanks so much! I wasn't aware that there were
post-VT-c improvements; I guess I'll have to refresh my very basic
knowledge urgently.
I'm still using IvyBridge (E3 v2) with this "new" machine, but haven't
ever heard/thought about APICv!

> You can verify the capability using sysctl hw.vmm.vmx. What processor
> are you using for this performance benchmarking?

hw.vmm.vmx.vpid_alloc_failed: 0
hw.vmm.vmx.posted_interrupt_vector: -1
hw.vmm.vmx.cap.posted_interrupts: 0
hw.vmm.vmx.cap.virtual_interrupt_delivery: 0
hw.vmm.vmx.cap.invpcid: 0
hw.vmm.vmx.cap.monitor_trap: 1
hw.vmm.vmx.cap.unrestricted_guest: 1
hw.vmm.vmx.cap.pause_exit: 1
hw.vmm.vmx.cap.halt_exit: 1
hw.vmm.vmx.initialized: 1
hw.vmm.vmx.cr4_zeros_mask: 18446744073708017664
hw.vmm.vmx.cr4_ones_mask: 8192
hw.vmm.vmx.cr0_zeros_mask: 18446744071025197056
hw.vmm.vmx.cr0_ones_mask: 3

I simply did 'time cp' with 8 GB files over NFSv4, which come from the
ZFS cache on the remote side, while locally watching host+guest vmstat.

> Can you run a simple experiment, assign pptdev interrupts to a core
> that's not running the guest/vcpu? This will reduce #VMEXIT on the
> vcpu, which we know is expensive.

Interesting approach. But I have no idea how I should assign a specific
core to a PCIe dev. Is it pptdev specific? The tunables in
device.hints(5) can't be used for that, can they?
It seems pptdev hasn't got a man page yet, but I'll have a look at the
sources; maybe I can find hints until the earth has done its job and
presents you the same nice sunshine I'm enjoying today :-)

-harry
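The capability check Anish suggested can be scripted. A minimal sketch
that interprets the two relevant `hw.vmm.vmx.cap` lines; the sample text
is the sysctl output quoted above, so the script is self-contained (on a
bhyve host you would pipe `sysctl hw.vmm.vmx` in instead):

```shell
# Sample values from the sysctl output in this message.
caps='hw.vmm.vmx.cap.posted_interrupts: 0
hw.vmm.vmx.cap.virtual_interrupt_delivery: 0'

# APICv corresponds to virtual_interrupt_delivery; posted interrupts
# additionally let passthrough IRQs bypass the hypervisor.
echo "$caps" | awk -F': ' '
    /virtual_interrupt_delivery/ { vid = $2 }
    /posted_interrupts/         { pi  = $2 }
    END {
        print (vid == 1 ? "APICv: yes" : "APICv: no")
        print (pi  == 1 ? "posted interrupts: yes" : "posted interrupts: no")
    }'
```

On this E3 v2 host both print "no", matching the observation that every
passthrough interrupt still costs a VM exit.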
Re: PCIe passthrough really that expensive?
Hi Harry,

> I thought I'd save these expensive VM_Exits by using the passthru path.
> Completely wrong, is it?

It depends on which processor you are using. For example, APICv was
introduced in IvyBridge, which enabled h/w-assisted localAPIC rather
than using s/w emulation; bhyve supports it on Intel processors.

Intel Broadwell introduced posted interrupts, which enable interrupts to
be delivered to the guest directly, bypassing the hypervisor[2], for
passthrough devices. Emulated devices' interrupts will still go through
the hypervisor.

You can verify the capability using sysctl hw.vmm.vmx. What processor
are you using for this performance benchmarking?

Can you run a simple experiment, assign pptdev interrupts to a core
that's not running the guest/vcpu? This will reduce #VMEXIT on the vcpu,
which we know is expensive.

Regards,
Anish

On Wed, Jun 7, 2017 at 11:01 AM, Harry Schmalzbauer wrote:
> Hello,
>
> some might have noticed my numerous posts recently, mainly in
> freebsd-net@, but all around the same story – replacing ESXi. So I hope
> nobody minds if I ask for help again to alleviate some of my knowledge
> deficiencies about PCIe passthrough.
> As a last resort for special VMs, I always used to have dedicated NICs
> via PCIe passthrough.
> But with bhyve (besides other undiscovered strange side effects) I
> don't understand the results utilizing bhyve-passthru.
>
> Simple test: copy an ISO image from an NFSv4 mount via 1 GbE (to null).
>
> Host, using if_em (Hartwell): 4-8k irqs/s (8k @ MTU 1500), system idle
> ~99-100%.
> Passing the same Hartwell device to the guest, running the identical
> FreeBSD version as the host, I see 2x 8k irqs/s, MTU-independent, and
> only 80% idle, while almost all cycles are spent in Sys (vmm).
> Running the same guest with if_bridge(4)-vtnet(4) or vale(4)-vtnet(4)
> delivers identical results: about 80% attainable throughput, only 80%
> idle cycles.
>
> So interrupts triggered by PCI devices, which are controlled via
> bhyve-passthru, are as expensive as interrupts triggered by emulated
> devices?
> I thought I'd save these expensive VM_Exits by using the passthru path.
> Completely wrong, is it?
>
> I haven't ever done authoritative ESXi measurements, but I remember
> that there was a significant saving using VMDirectPath. Big enough that
> I never felt the need for measuring. Is there any implementation
> difference? Some kind of intermediate interrupt moderation maybe?
>
> Thanks for any hints/links,
>
> -harry
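As a back-of-the-envelope check on the "80% attainable throughput"
figure quoted in this thread: assuming roughly 117 MB/s of usable TCP
payload on 1 GbE (an approximation of mine, not a number from the
thread), 80% of line rate pins down what each timed 8 GB copy should
take:

```shell
# Rough arithmetic only; 117 MB/s is an assumed approximation of the
# usable TCP payload rate on 1 GbE.
awk 'BEGIN {
    line = 117                  # MB/s, approx. usable 1 GbE payload
    rate = 0.8 * line           # the ~80% observed in the guest
    secs = 8192 / rate          # one 8 GB (8192 MB) ISO copy
    printf "%.1f MB/s -> %.0f s per 8 GB file\n", rate, secs
}'
```

So each `time cp` run in the guest should land near a minute and a half,
versus roughly 70 s at full line rate on the host.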