Re: PCIe passthrough really that expensive?

2017-06-13 Thread Harry Schmalzbauer
Regarding Anish's message from 13.06.2017 06:13 (localtime):
> Hi Harry,
>>Any hints highly appreciated!
…
> 
> Now use cpuset to route IRQ #265 to, say, core 0:
> 
> $cpuset -l 0 -x 265
> 
> Again, use cpuset to force the VM [PID 1222] to run on all cores except #0.
> 
> root@svmhost:~ # ps
> 
>  PID TT  STATTIME COMMAND
> 
> 
> 
> 1222  1  I+   5:59.45 bhyve: vm1 (bhyve)
> 
> 
> VM can run on all cores except #0.
> 
> $ cpuset -l 1-3 -p 1222
> 
> 
> You can monitor guest exits due to interrupts using
> 
> root@svmhost:~ # bhyvectl --get-stats --vm= --cpu= | grep external
> 
> vm exits due to external interrupt  27273
> 
> root@svmhost:~ # 

Thank you very much for that detailed explanation.  I didn't know that
cpuset(1) could also pin IRQs (handlers?) to specific CPUs.

In my case, I couldn't see a noticeable difference.
Since I have hyperthreading enabled on a single-socket quad core, I
pinned the bhyve PID (2 vCPUs) to CPU2+3 and, for testing, repeatedly
copied the same 8GB file over NFS with the ppt device's IRQ handler
pinned in turn to CPU1, CPU4, CPU3 and CPU2.  So twice with host CPUs
different from the ones the guest uses, and twice with the same ones.
I couldn't see any difference in load or performance, and the 'vm exits
due to external interrupt' counters grew similarly.

To put numbers on it: the first vCPU had about 40k "vm exits due to
external interrupt" per 8GB transfer, the other vCPU ~160k.

As mentioned, different host CPU pinning didn't influence that noticeably.
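
For the record, the measuring went roughly like this (a sketch only; the
IRQ number, bhyve PID and VM name below are illustrative placeholders,
the real values come from vmstat -i and ps):

# pin the ppt device's IRQ handler to one host CPU (here CPU1)
cpuset -l 1 -x 270
# pin the bhyve process (2 vCPUs) to CPU2+3
cpuset -l 2-3 -p 1234
# run the NFS copy in the guest, then compare the per-vCPU exit counters
bhyvectl --get-stats --vm=guestvm --cpu=0 | grep external
bhyvectl --get-stats --vm=guestvm --cpu=1 | grep external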

Thanks for this lesson!

-harry

Re: PCIe passthrough really that expensive?

2017-06-12 Thread Anish
Hi Harry,
>Any hints highly appreciated!
In my setup, I have a dual-port Intel GigE NIC; one port is assigned to
the host and the other one is used by the guest.

root@svmhost:~ # pciconf -l |grep ppt

...

ppt2@pci0:2:0:0: class=0x02 card=0x125e8086 chip=0x105e8086 rev=0x06
hdr=0x00

root@svmhost:~ #


This shows up as 'em0' in a BSD guest with 2 vCPUs.

root@bsdguest:~ # pciconf -l |grep em0

em0@pci0:0:21:0:class=0x02 card=0x125e8086 chip=0x105e8086
rev=0x06 hdr=0x00

root@bsdguest:~ #
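
For completeness, the usual way a port ends up as a ppt device and gets
handed to the guest looks roughly like this (the 2/0/0 selector matches
the pciconf output above; the slot number, memory size and VM name are
only illustrative):

# /boot/loader.conf: detach the port from the host driver at boot
pptdevs="2/0/0"

# bhyve invocation; -S wires guest memory, which passthru requires
bhyve -c 2 -m 2G -S -A -H -P \
  -s 0,hostbridge -s 7,passthru,2/0/0 -s 31,lpc \
  -l com1,stdio vm1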

Once the guest is booted, ppt2 will claim interrupt resources, in this
case just one interrupt line, #265:

root@svmhost:~ # vmstat -i

interrupt  total   rate

...

irq263: em0:irq0 1028705634

..

irq265: ppt2 1041187641

Total  2835121   1747

Now use cpuset to route IRQ #265 to, say, core 0:

$cpuset -l 0 -x 265

Again, use cpuset to force the VM [PID 1222] to run on all cores except #0.

root@svmhost:~ # ps

 PID TT  STATTIME COMMAND



1222  1  I+   5:59.45 bhyve: vm1 (bhyve)


VM can run on all cores except #0.

$ cpuset -l 1-3 -p 1222


You can monitor guest exits due to interrupts using

root@svmhost:~ # bhyvectl --get-stats --vm= --cpu= | grep external

vm exits due to external interrupt  27273

root@svmhost:~ #
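
Putting the steps together, a small sketch that picks up the IRQ number
and PID automatically (assuming a single ppt device and a VM named 'vm1'
as in the example above; adjust the names for your own setup):

# find the ppt device's IRQ number from vmstat -i
IRQ=$(vmstat -i | awk '/ppt/ {sub("irq", "", $1); sub(":", "", $1); print $1; exit}')
# route that interrupt to core 0 only
cpuset -l 0 -x "$IRQ"
# keep the guest's vCPU threads off core 0 (match the bhyve process title)
PID=$(pgrep -f 'bhyve: vm1')
cpuset -l 1-3 -p "$PID"
# after a test run, compare exit counts caused by external interrupts
bhyvectl --get-stats --vm=vm1 --cpu=0 | grep external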


Regards,

Anish

On Sun, Jun 11, 2017 at 2:51 AM, Harry Schmalzbauer wrote:

>  Regarding Harry Schmalzbauer's message from 09.06.2017 10:22 (localtime):
> > Regarding Anish's message from 08.06.2017 14:35 (localtime):
> >> Hi Harry,
> >>> I thought I'd save these expensive VM_Exits by using the passthru path.
> >> Completely wrong, is it?
> >>
> >> It depends on which processor you are using. For example, APICv was
> >> introduced in IvyBridge, which enables h/w-assisted localAPIC rather
> >> than s/w emulation; bhyve supports it on Intel processors.
> …
> > I'm still using IvyBridge (E3v2) with this "new" machine, but haven't
> > ever heard/thought about APICv!
>
> It seems APICv is available on IvyBridge-EP (Xeon E5/E7v2) only, not for
> E3v2 :-(
> Furthermore, if I didn't miss anything in the datasheets, no currently
> available E3 Xeon offers local APIC virtualization. Can one of the
> experts confirm that?
>
>
> …
> >> Can you run a simple experiment: assign pptdev interrupts to a core
> >> that's not running the guest/vcpu? This will reduce #VMEXITs on the
> >> vcpu, which we know are expensive.
> > Interesting approach.  But I have no idea how I would assign a
> > specific core to a PCIe device.  Is it pptdev-specific? The tunables
> > in device.hints(5) can't be used for that, can they?
>
> I wasn't able to find out how to do that.
> Any hints highly appreciated!
>
> -harry
>
>

Re: PCIe passthrough really that expensive?

2017-06-11 Thread Harry Schmalzbauer
Regarding Harry Schmalzbauer's message from 09.06.2017 10:22 (localtime):
> Regarding Anish's message from 08.06.2017 14:35 (localtime):
>> Hi Harry,
>>> I thought I'd save these expensive VM_Exits by using the passthru path.
>> Completely wrong, is it?
>>
>> It depends on which processor you are using. For example, APICv was
>> introduced in IvyBridge, which enables h/w-assisted localAPIC rather
>> than s/w emulation; bhyve supports it on Intel processors.
…
> I'm still using IvyBridge (E3v2) with this "new" machine, but haven't
> ever heard/thought about APICv!

It seems APICv is available on IvyBridge-EP (Xeon E5/E7v2) only, not for
E3v2 :-(
Furthermore, if I didn't miss anything in the datasheets, no currently
available E3 Xeon offers local APIC virtualization. Can one of the
experts confirm that?


…
>> Can you run a simple experiment: assign pptdev interrupts to a core
>> that's not running the guest/vcpu? This will reduce #VMEXITs on the
>> vcpu, which we know are expensive.
> Interesting approach.  But I have no idea how I would assign a
> specific core to a PCIe device.  Is it pptdev-specific? The tunables
> in device.hints(5) can't be used for that, can they?

I wasn't able to find out how to do that.
Any hints highly appreciated!

-harry


Re: PCIe passthrough really that expensive?

2017-06-09 Thread Harry Schmalzbauer
Regarding Anish's message from 08.06.2017 14:35 (localtime):
> Hi Harry,
>>I thought I'd save these expensive VM_Exits by using the passthru path.
> Completely wrong, is it?
> 
> It depends on which processor you are using. For example, APICv was
> introduced in IvyBridge, which enables h/w-assisted localAPIC rather
> than s/w emulation; bhyve supports it on Intel processors.
> 
> Intel Broadwell introduced posted interrupts, which enable interrupts
> from passthrough devices to be delivered to the guest directly,
> bypassing the hypervisor[2]. Emulated devices' interrupts will still
> go through the hypervisor.

That's very interesting, thanks so much!
I wasn't aware that there were post-VT-c improvements; I guess I'll have
to refresh my very basic knowledge urgently.
I'm still using IvyBridge (E3v2) with this "new" machine, but have never
heard of/thought about APICv!


> You can verify the capability using sysctl hw.vmm.vmx. What processor
> are you using for this performance benchmarking?

hw.vmm.vmx.vpid_alloc_failed: 0
hw.vmm.vmx.posted_interrupt_vector: -1
hw.vmm.vmx.cap.posted_interrupts: 0
hw.vmm.vmx.cap.virtual_interrupt_delivery: 0
hw.vmm.vmx.cap.invpcid: 0
hw.vmm.vmx.cap.monitor_trap: 1
hw.vmm.vmx.cap.unrestricted_guest: 1
hw.vmm.vmx.cap.pause_exit: 1
hw.vmm.vmx.cap.halt_exit: 1
hw.vmm.vmx.initialized: 1
hw.vmm.vmx.cr4_zeros_mask: 18446744073708017664
hw.vmm.vmx.cr4_ones_mask: 8192
hw.vmm.vmx.cr0_zeros_mask: 18446744071025197056
hw.vmm.vmx.cr0_ones_mask: 3

I simply did 'time cp' with 8GB files over NFSv4, which come from the
ZFS cache on the remote side, while locally watching host and guest
vmstat.

> Can you run a simple experiment: assign pptdev interrupts to a core
> that's not running the guest/vcpu? This will reduce #VMEXITs on the
> vcpu, which we know are expensive.

Interesting approach.  But I have no idea how I would assign a specific
core to a PCIe device.  Is it pptdev-specific? The tunables in
device.hints(5) can't be used for that, can they?
It seems pptdev hasn't got a man page yet, but I'll have a look at the
sources; maybe I can find hints before the earth has done its job and
presents you the same nice sunshine I'm enjoying today :-)

-harry

Re: PCIe passthrough really that expensive?

2017-06-08 Thread Anish
Hi Harry,
>I thought I'd save these expensive VM_Exits by using the passthru path.
>Completely wrong, is it?

It depends on which processor you are using. For example, APICv was
introduced in IvyBridge, which enables h/w-assisted localAPIC rather
than s/w emulation; bhyve supports it on Intel processors.

Intel Broadwell introduced posted interrupts, which enable interrupts
from passthrough devices to be delivered to the guest directly,
bypassing the hypervisor[2]. Emulated devices' interrupts will still go
through the hypervisor.

You can verify the capability using sysctl hw.vmm.vmx. What processor
are you using for this performance benchmarking?
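
Regarding the sysctl check, a quick sketch (assumes the vmm(4) module is
already loaded; the full OID list appears in Harry's reply above):

# 1 means the capability is present/used, 0 means it isn't on this CPU
sysctl hw.vmm.vmx.cap.virtual_interrupt_delivery hw.vmm.vmx.cap.posted_interrupts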

Can you run a simple experiment: assign pptdev interrupts to a core
that's not running the guest/vcpu? This will reduce #VMEXITs on the
vcpu, which we know are expensive.

Regards,
Anish




On Wed, Jun 7, 2017 at 11:01 AM, Harry Schmalzbauer wrote:

>  Hello,
>
> some might have noticed my numerous posts recently, mainly in
> freebsd-net@, but all around the same story – replacing ESXi. So I hope
> nobody minds if I ask for help again to alleviate some of my knowledge
> deficiencies about PCIePassThrough.
> As last resort for special VMs, I always used to have dedicated NICs via
> PCIePassThrough.
> But with bhyve (besides other undiscovered strange side effects) I don't
> understand the results utilizing bhyve-passthru.
>
> Simple test: Copy iso image from NFSv4 mount via 1GbE (to null).
>
> Host, using if_em (Hartwell): 4-8k irqs/s (8k @ MTU 1500), system idle
> ~99-100%.
> Passing this same Hartwell device to the guest, running the identical
> FreeBSD version as the host, I see 2x8k irqs/s, MTU-independent, and
> only 80% idle, while almost all cycles are spent in Sys (vmm).
> Running the same guest with if_bridge(4)-vtnet(4) or vale(4)-vtnet(4)
> delivers identical results: about 80% of attainable throughput, only
> 80% idle cycles.
>
> So interrupts triggered by PCI devices, which are controlled via
> bhyve-passthru, are as expensive as interrupts triggered by emulated
> devices?
> I thought I'd save these expensive VM_Exits by using the passthru path.
> Completely wrong, is it?
>
> I haven't ever done authoritative ESXi measurements, but I remember
> that there was a significant saving using VMDirectPath; big enough
> that I never felt the need to measure. Is there any implementation
> difference? Some kind of intermediate interrupt moderation, maybe?
>
> Thanks for any hints/links,
>
> -harry