Re: [PATCH 2/3] alpha: use common noop dma ops

2015-11-02 Thread Joerg Roedel
On Fri, Oct 30, 2015 at 02:20:36PM +0100, Christian Borntraeger wrote:
> Some of the alpha pci noop dma ops are identical to the common ones.
> Use them.
> 
> Signed-off-by: Christian Borntraeger 
> ---
>  arch/alpha/kernel/pci-noop.c | 46 
>  1 file changed, 4 insertions(+), 42 deletions(-)

Reviewed-by: Joerg Roedel 



Re: [PATCHv2 0/3] dma ops and virtio

2015-11-02 Thread Sebastian Ott
Hi,

On Fri, 30 Oct 2015, Christian Borntraeger wrote:

> here is the 2nd version of providing a DMA API for s390.
> 
> There are some attempts to unify the dma ops (Christoph) as well
> as some attempts to make virtio use the DMA API (Andy).
> 
> At kernel summit we concluded that we want to use the same code on all
> platforms, wherever possible, so having a dummy dma_op might be the
> easiest solution to keep virtio-ccw as similar as possible to
> virtio-pci. Together with a fixed-up patch set from Andy Lutomirski
> this seems to work.
> 
> We will also need a fixup for powerpc and QEMU changes to make virtio
> work with iommu on power and x86.
> 
> TODO:
> - future add-on patches to also fold in x86 no iommu
>   - dma_mask
>   - checking?
> - make compilation of dma-noop dependent on something
> 
> v1->v2:
> - initial testing
> - always use dma_noop_ops if device has no private dma_ops
> - get rid of setup in virtio_ccw,kvm_virtio
> - set CONFIG_HAS_DMA(ATTRS) for virtio (fixes compile for !PCI)
> - rename s390_dma_ops to s390_pci_dma_ops
> 
> Christian Borntraeger (3):
>   Provide simple noop dma ops
>   alpha: use common noop dma ops
>   s390/dma: Allow per device dma ops
> 
>  arch/alpha/kernel/pci-noop.c| 46 ++
>  arch/s390/Kconfig   |  3 +-
>  arch/s390/include/asm/device.h  |  6 ++-
>  arch/s390/include/asm/dma-mapping.h |  6 ++-
>  arch/s390/pci/pci.c |  1 +
>  arch/s390/pci/pci_dma.c |  4 +-
>  include/linux/dma-mapping.h |  2 +
>  lib/Makefile|  2 +-
>  lib/dma-noop.c  | 77 +
>  9 files changed, 98 insertions(+), 49 deletions(-)
>  create mode 100644 lib/dma-noop.c
> 
> -- 

I agree with these changes in principle. As long as we don't do MMIO
(writel and friends) on a dummy mapping we're fine.

Regards,
Sebastian
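
For context, the "dummy mapping" discussed here is a plain identity
mapping. A minimal sketch of such a map_page implementation, assuming
DMA addresses are simply physical addresses (illustrative, not
necessarily the exact lib/dma-noop.c code):

	/* Sketch: identity mapping for a platform without an IOMMU.
	 * Valid for RAM only, which is exactly the caveat above: an MMIO
	 * region obtained via ioremap() has no struct page and must not
	 * be fed through this path.
	 */
	static dma_addr_t dma_noop_map_page(struct device *dev, struct page *page,
					    unsigned long offset, size_t size,
					    enum dma_data_direction dir,
					    struct dma_attrs *attrs)
	{
		return page_to_phys(page) + offset;
	}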



Re: [PATCH 3/3] s390/dma: Allow per device dma ops

2015-11-02 Thread Joerg Roedel
On Fri, Oct 30, 2015 at 02:20:37PM +0100, Christian Borntraeger wrote:
> As virtio-ccw now has dma ops, we can no longer default to the PCI ones.
> Make use of dev_archdata to keep the dma_ops per device. The pci devices
> now use that to override the default, and the default is changed to use
> the noop ops for everything that is not PCI. To compile without PCI
> support we also have to enable the DMA API for virtio.
> 
> Signed-off-by: Christian Borntraeger 
> ---
>  arch/s390/Kconfig   | 3 ++-
>  arch/s390/include/asm/device.h  | 6 +-
>  arch/s390/include/asm/dma-mapping.h | 6 --
>  arch/s390/pci/pci.c | 1 +
>  arch/s390/pci/pci_dma.c | 4 ++--
>  5 files changed, 14 insertions(+), 6 deletions(-)

Reviewed-by: Joerg Roedel 
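
For context, the per-device lookup described in the commit message
amounts to an override with a noop fallback. A sketch of the idea
(field and helper shapes inferred from the diffstat above, not quoted
from the patch):

	/* arch/s390/include/asm/device.h (sketch) */
	struct dev_archdata {
		struct dma_map_ops *dma_ops;	/* set by the PCI code, NULL otherwise */
	};

	/* arch/s390/include/asm/dma-mapping.h (sketch) */
	static inline struct dma_map_ops *get_dma_ops(struct device *dev)
	{
		if (dev && dev->archdata.dma_ops)
			return dev->archdata.dma_ops;	/* PCI: s390_pci_dma_ops */
		return &dma_noop_ops;	/* everything else, e.g. virtio-ccw */
	}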



Re: [PATCH 1/3] Provide simple noop dma ops

2015-11-02 Thread Christian Borntraeger
On 02.11.2015 at 16:16, Joerg Roedel wrote:
> On Fri, Oct 30, 2015 at 02:20:35PM +0100, Christian Borntraeger wrote:
>> +static void *dma_noop_alloc(struct device *dev, size_t size,
>> +			    dma_addr_t *dma_handle, gfp_t gfp,
>> +			    struct dma_attrs *attrs)
>> +{
>> +	void *ret;
>> +
>> +	ret = (void *)__get_free_pages(gfp, get_order(size));
>> +	if (ret) {
>> +		memset(ret, 0, size);
> 
> There is no need to zero out the memory here. If the user wants
> initialized memory it can call dma_zalloc_coherent. Having the memset
> here means the memory is cleared twice in the dma_zalloc_coherent path.
> 
> Otherwise it looks good.

Thanks. Will fix.
In addition I will also make the compilation of dma-noop.o dependent 
on HAS_DMA.
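
In sketch form, the intended follow-up drops the memset and leaves
zeroing to dma_zalloc_coherent(), with the object guarded in
lib/Makefile (an outline of the stated plan, not the final patch):

	static void *dma_noop_alloc(struct device *dev, size_t size,
				    dma_addr_t *dma_handle, gfp_t gfp,
				    struct dma_attrs *attrs)
	{
		void *ret;

		ret = (void *)__get_free_pages(gfp, get_order(size));
		if (ret)
			*dma_handle = virt_to_phys(ret);	/* assumes a 1:1 dma/phys mapping */
		return ret;
	}

	# lib/Makefile
	obj-$(CONFIG_HAS_DMA) += dma-noop.o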




Re: [PATCH net-next rfc V2 0/2] basic busy polling support for vhost_net

2015-11-02 Thread Jason Wang


On 10/30/2015 07:58 PM, Jason Wang wrote:
>
> On 10/29/2015 04:45 PM, Jason Wang wrote:
>> Hi all:
>>
>> This series tries to add basic busy polling for vhost net. The idea is
>> simple: at the end of tx processing, busy poll for newly added tx
>> descriptors and for the rx receive socket for a while. The maximum
>> amount of time (in us) that can be spent on busy polling is specified
>> through a module parameter.
>>
>> Tests were done with:
>>
>> - 50 us as the busy loop timeout
>> - Netperf 2.6
>> - Two machines connected back-to-back via mlx4 NICs
>> - Guest with 8 vcpus and 1 queue
>>
>> Results show a huge improvement on both tx (up to 158%) and rr
>> (up to 53%), while rx stays the same as before. In most cases the cpu
>> utilization is also improved:
>>
> Just noticed there's something wrong in the setup, so the numbers are
> incorrect here. I will re-run and post the correct numbers here.
>
> Sorry.

Here's the updated testing result:

1) 1 vcpu 1 queue:

TCP_RR
size/session/+thu%/+normalize%
1/ 1/0%/  -25%
1/50/  +12%/0%
1/   100/  +12%/   +1%
1/   200/   +9%/   -1%
   64/ 1/   +3%/  -21%
   64/50/   +8%/0%
   64/   100/   +7%/0%
   64/   200/   +9%/0%
  256/ 1/   +1%/  -25%
  256/50/   +7%/   -2%
  256/   100/   +6%/   -2%
  256/   200/   +4%/   -2%
  512/ 1/   +2%/  -19%
  512/50/   +5%/   -2%
  512/   100/   +3%/   -3%
  512/   200/   +6%/   -2%
 1024/ 1/   +2%/  -20%
 1024/50/   +3%/   -3%
 1024/   100/   +5%/   -3%
 1024/   200/   +4%/   -2%
Guest RX
size/session/+thu%/+normalize%
   64/ 1/   -4%/   -5%
   64/ 4/   -3%/  -10%
   64/ 8/   -3%/   -5%
  512/ 1/  +15%/   +1%
  512/ 4/   -5%/   -5%
  512/ 8/   -2%/   -4%
 1024/ 1/   -5%/  -16%
 1024/ 4/   -2%/   -5%
 1024/ 8/   -6%/   -6%
 2048/ 1/  +10%/   +5%
 2048/ 4/   -8%/   -4%
 2048/ 8/   -1%/   -4%
 4096/ 1/   -9%/  -11%
 4096/ 4/   +1%/   -1%
 4096/ 8/   +1%/0%
16384/ 1/  +20%/  +11%
16384/ 4/0%/   -3%
16384/ 8/   +1%/0%
65535/ 1/  +36%/  +13%
65535/ 4/  -10%/   -9%
65535/ 8/   -3%/   -2%
Guest TX
size/session/+thu%/+normalize%
   64/ 1/   -7%/  -16%
   64/ 4/  -14%/  -23%
   64/ 8/   -9%/  -20%
  512/ 1/  -62%/  -56%
  512/ 4/  -62%/  -56%
  512/ 8/  -61%/  -53%
 1024/ 1/  -66%/  -61%
 1024/ 4/  -77%/  -73%
 1024/ 8/  -73%/  -67%
 2048/ 1/  -74%/  -75%
 2048/ 4/  -77%/  -74%
 2048/ 8/  -72%/  -68%
 4096/ 1/  -65%/  -68%
 4096/ 4/  -66%/  -63%
 4096/ 8/  -62%/  -57%
16384/ 1/  -25%/  -28%
16384/ 4/  -28%/  -17%
16384/ 8/  -24%/  -10%
65535/ 1/  -17%/  -14%
65535/ 4/  -22%/   -5%
65535/ 8/  -25%/   -9%

- obvious improvement on TCP_RR (up to 12%)
- improvement on guest RX
- huge decrease on guest TX (up to -75%); this is probably because the
virtio-net driver suffers from buffer bloat by orphaning the skb before
transmission. The faster vhost is, the smaller the packets the guest can
produce. To reduce this impact, turning off gso in the guest gives the
following results:

size/session/+thu%/+normalize%
   64/ 1/   +3%/  -11%
   64/ 4/   +4%/  -10%
   64/ 8/   +4%/  -10%
  512/ 1/   +2%/   +5%
  512/ 4/0%/   -1%
  512/ 8/0%/0%
 1024/ 1/  +11%/0%
 1024/ 4/0%/   -1%
 1024/ 8/   +3%/   +1%
 2048/ 1/   +4%/   -1%
 2048/ 4/   +8%/   +3%
 2048/ 8/0%/   -1%
 4096/ 1/   +4%/   -1%
 4096/ 4/   +1%/0%
 4096/ 8/   +2%/0%
16384/ 1/   +2%/   -2%
16384/ 4/   +3%/   +1%
16384/ 8/0%/   -1%
65535/ 1/   +9%/   +7%
65535/ 4/0%/   -3%
65535/ 8/   -1%/   -1%

2) 8 vcpus 1 queue:

TCP_RR
size/session/+thu%/+normalize%
1/ 1/   +5%/  -14%
1/50/   +2%/   +1%
1/   100/0%/   -1%
1/   200/0%/0%
   64/ 1/0%/  -25%
   64/50/   +5%/   +5%
   64/   100/0%/0%
   64/   200/0%/   -1%
  256/ 1/0%/  -30%
  256/50/0%/0%
  256/   100/   -2%/   -2%
  256/   200/0%/0%
  512/ 1/   +1%/  -23%
  512/50/   +1%/   +1%
  512/   100/   +1%/0%
  512/   200/   +1%/   +1%
 1024/ 1/   +1%/  -23%
 1024/50/   +5%/   +5%
 1024/   100/0%/   -1%
 1024/   200/0%/0%
Guest RX
size/session/+thu%/+normalize%
   64/ 1/   +1%/   +1%
   64/ 4/   -2%/   +1%
   64/ 8/   +6%/  +19%
  512/ 1/   +5%/   -7%
  512/ 4/   -4%/   -4%
  512/ 8/0%/0%
 1024/ 1/   +1%/   +2%
 1024/ 4/   -2%/   -2%
 1024/ 8/   -1%/   +7%
 2048/ 1/   +8%/   -2%
 2048/ 4/0%/   +5%
 2048/ 8/   -1%/  +13%
 4096/ 1/   -1%/   +2%
 4096/ 4/0%/   +6%
 4096/ 8/   -2%/  +15%
16384/ 1/   -1%/0%
16384/ 4/   -2%/   -1%
16384/ 8/   -2%/   +2%
65535/ 1/   -2%/0%
65535/ 4/   -3%/   -3%
65535/ 8/   -2%/   +2%
Guest TX
size/session/+thu%/+normalize%
   64/ 1/   +6%/   
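
For readers following along, the mechanism under test is a bounded spin
after tx processing. An illustrative sketch in C; the polling helpers
(tx_has_work, rx_has_data) are placeholders, not actual vhost API:

	/* Sketch: poll the tx ring and rx socket until work arrives or
	 * the budget (the module parameter, in us) expires, avoiding a
	 * sleep/wakeup round trip for closely spaced work.
	 */
	static void busy_poll(struct vhost_virtqueue *tx_vq,
			      struct sock *rx_sock,
			      unsigned long budget_us)
	{
		u64 deadline = local_clock() + budget_us * NSEC_PER_USEC;

		while (local_clock() < deadline) {
			if (tx_has_work(tx_vq) || rx_has_data(rx_sock))
				break;	/* handle the work immediately */
			cpu_relax();
		}
	}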

Re: [PATCH] vhost: move is_le setup to the backend

2015-11-02 Thread David Miller
From: Greg Kurz 
Date: Fri, 30 Oct 2015 12:42:35 +0100

> The vq->is_le field is used to fix endianness when accessing the vring via
> the cpu_to_vhost16() and vhost16_to_cpu() helpers in the following cases:
> 
> 1) host is big endian and device is modern virtio
> 
> 2) host has cross-endian support and device is legacy virtio with a different
>endianness than the host
> 
> Both cases rely on the VHOST_SET_FEATURES ioctl, but 2) also needs the
> VHOST_SET_VRING_ENDIAN ioctl to be called by userspace. Since vq->is_le
> is only needed when the backend is active, it was decided to set it at
> backend start.
> 
> This is currently done in vhost_init_used()->vhost_init_is_le() but it
> obfuscates the core vhost code. This patch moves the is_le setup to a
> dedicated function that is called from the backend code.
> 
> Note vhost_net is the only backend that can pass vq->private_data == NULL to
> vhost_init_used(), hence the "if (sock)" branch.
> 
> No behaviour change.
> 
> Signed-off-by: Greg Kurz 

Michael, I'm assuming that you will be the one taking this.
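
For reference, the dedicated function described above boils down to
deciding endianness from the negotiated features when the backend
starts; a sketch covering the commit message's two cases (the function
name and the user_be field are assumptions, not the exact diff):

	/* Sketch: called from the backend (e.g. vhost_net) at start
	 * time, guarded by "if (sock)" as noted above.
	 */
	static void vhost_init_is_le(struct vhost_virtqueue *vq)
	{
		/* case 1: modern virtio is always little endian;
		 * case 2: legacy virtio uses the endianness set via
		 * VHOST_SET_VRING_ENDIAN (tracked here as vq->user_be)
		 */
		vq->is_le = vhost_has_feature(vq, VIRTIO_F_VERSION_1) ||
			    !vq->user_be;
	}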


Re: [PATCH 3/3] s390/dma: Allow per device dma ops

2015-11-02 Thread Sebastian Ott
On Fri, 30 Oct 2015, Christian Borntraeger wrote:

> As virtio-ccw now has dma ops, we can no longer default to the PCI ones.
> Make use of dev_archdata to keep the dma_ops per device. The pci devices
> now use that to override the default, and the default is changed to use
> the noop ops for everything that is not PCI. To compile without PCI
> support we also have to enable the DMA API for virtio.
> 
> Signed-off-by: Christian Borntraeger 

Acked-by: Sebastian Ott 



Re: [PATCH 0/7] Hyper-V Synthetic interrupt controller

2015-11-02 Thread Paolo Bonzini


On 26/10/2015 10:50, Andrey Smetanin wrote:
> Hyper-V SynIC (synthetic interrupt controller) device
> implementation.
> 
> The implementation contains:
> * MSR support
> * irq routing setup
> * irq injection
> * irq ack callback registration
> * tracking of event/message page changes at Hyper-V exit
> * a Hyper-V test device to test SynIC with kvm-unit-tests
> 
> Andrey Smetanin (7):
>   standard-headers/x86: add Hyper-V SynIC constants
>   target-i386/kvm: Hyper-V SynIC MSR's support
>   linux-headers/kvm: add Hyper-V SynIC irq routing type and struct
>   kvm: Hyper-V SynIC irq routing support
>   linux-headers/kvm: KVM_EXIT_HYPERV type and struct
>   target-i386/hyperv: Hyper-V SynIC SINT routing and vCPU exit
>   hw/misc: Hyper-V test device 'hyperv-testdev'
> 
>  default-configs/i386-softmmu.mak  |   1 +
>  default-configs/x86_64-softmmu.mak|   1 +
>  hw/misc/Makefile.objs |   1 +
>  hw/misc/hyperv_testdev.c  | 164 ++
>  include/standard-headers/asm-x86/hyperv.h |  12 +++
>  include/sysemu/kvm.h  |   1 +
>  kvm-all.c |  33 ++
>  linux-headers/linux/kvm.h |  25 +
>  target-i386/Makefile.objs |   2 +-
>  target-i386/cpu-qom.h |   1 +
>  target-i386/cpu.c |   1 +
>  target-i386/cpu.h |   5 +
>  target-i386/hyperv.c  | 127 +++
>  target-i386/hyperv.h  |  42 
>  target-i386/kvm.c |  66 +++-
>  target-i386/machine.c |  39 +++
>  16 files changed, 519 insertions(+), 2 deletions(-)
>  create mode 100644 hw/misc/hyperv_testdev.c
>  create mode 100644 target-i386/hyperv.c
>  create mode 100644 target-i386/hyperv.h
> 
> Signed-off-by: Andrey Smetanin 
> Reviewed-by: Roman Kagan 
> Signed-off-by: Denis V. Lunev 
> CC: Vitaly Kuznetsov 
> CC: "K. Y. Srinivasan" 
> CC: Gleb Natapov 
> CC: Paolo Bonzini 
> CC: Roman Kagan 
> CC: Denis V. Lunev 
> CC: k...@vger.kernel.org
> CC: virtualization@lists.linux-foundation.org
> 

Reviewed-by: Paolo Bonzini 
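
For reference, the SynIC irq routing entry added on the kernel side
ends up looking roughly like this in the uapi headers (a sketch based
on the merged interface, which may differ in detail from this posting):

	/* Sketch: routing a Hyper-V synthetic interrupt (SINT) to a vCPU */
	struct kvm_irq_routing_hv_sint {
		__u32 vcpu;	/* target vCPU index */
		__u32 sint;	/* synthetic interrupt source number */
	};

	/* used as a new union member, with a matching routing type
	 * (KVM_IRQ_ROUTING_HV_SINT), in struct kvm_irq_routing_entry
	 */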


Re: [PATCH 1/3] Provide simple noop dma ops

2015-11-02 Thread Joerg Roedel
On Fri, Oct 30, 2015 at 02:20:35PM +0100, Christian Borntraeger wrote:
> +static void *dma_noop_alloc(struct device *dev, size_t size,
> +			    dma_addr_t *dma_handle, gfp_t gfp,
> +			    struct dma_attrs *attrs)
> +{
> +	void *ret;
> +
> +	ret = (void *)__get_free_pages(gfp, get_order(size));
> +	if (ret) {
> +		memset(ret, 0, size);

There is no need to zero out the memory here. If the user wants
initialized memory it can call dma_zalloc_coherent. Having the memset
here means the memory is cleared twice in the dma_zalloc_coherent path.

Otherwise it looks good.


Joerg
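
For context, dma_zalloc_coherent() is just a thin zeroing wrapper,
roughly along these lines in include/linux/dma-mapping.h of this era:

	static inline void *dma_zalloc_coherent(struct device *dev, size_t size,
						dma_addr_t *dma_handle, gfp_t flag)
	{
		return dma_alloc_coherent(dev, size, dma_handle,
					  flag | __GFP_ZERO);
	}

Since __get_free_pages() already honors __GFP_ZERO, a memset in the
allocator would zero the buffer a second time.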



Re: [kvm-unit-tests PATCH] x86: hyperv_synic: Hyper-V SynIC test

2015-11-02 Thread Paolo Bonzini


On 02/11/2015 13:18, Denis V. Lunev wrote:
>> I'm keeping the kernel patches queued for my own testing, but this of
>> course has to be fixed before including them---which will delay this
>> feature to 4.5, unfortunately.
> 
> well, the problem is that it actually uses auto EOI

Ok, no big deal.  We can just disable APICv when SynIC is enabled.

Paolo


Re: [kvm-unit-tests PATCH] x86: hyperv_synic: Hyper-V SynIC test

2015-11-02 Thread Denis V. Lunev

On 11/02/2015 03:16 PM, Paolo Bonzini wrote:

> On 26/10/2015 10:56, Andrey Smetanin wrote:
>> Hyper-V SynIC is a Hyper-V synthetic interrupt controller.
>>
>> The test runs on every vCPU and performs the following steps:
>> * read from all Hyper-V SynIC MSR's
>> * setup Hyper-V SynIC evt/msg pages
>> * setup SINT's routing
>> * inject SINT's into destination vCPU by 'hyperv-synic-test-device'
>> * wait for SINT's isr's completion
>> * clear Hyper-V SynIC evt/msg pages and destroy SINT's routing
>>
>> Signed-off-by: Andrey Smetanin 
>> Reviewed-by: Roman Kagan 
>> Signed-off-by: Denis V. Lunev 
>> CC: Vitaly Kuznetsov 
>> CC: "K. Y. Srinivasan" 
>> CC: Gleb Natapov 
>> CC: Paolo Bonzini 
>> CC: Roman Kagan 
>> CC: Denis V. Lunev 
>> CC: qemu-de...@nongnu.org
>> CC: virtualization@lists.linux-foundation.org
> 
> Bad news.
> 
> The test breaks with APICv, because of the following sequence of events:
> 
> 1) non-auto-EOI interrupt 176 is injected into IRR and ISR
> 
> 2) The PPR register is now 176
> 
> 3) auto-EOI interrupt 179 is injected into IRR only, because
> (179 & 0xf0) <= (PPR & 0xf0)
> 
> 4) interrupt 176's ISR performs an EOI
> 
> 5) at this point, because virtual interrupt delivery is enabled, the
> processor does not perform TPR virtualization (SDM 29.1.2).
> 
> In addition (and even worse), because virtual interrupt delivery is
> enabled, an auto-EOI interrupt that was stashed in IRR can be injected
> by the processor, and the auto-EOI behavior will be skipped.
> 
> The solution is to have userspace enable KVM_CAP_HYPERV_SYNIC through
> KVM_ENABLE_CAP, and modify vmx.c to not use apicv on VMs that have it
> enabled.  This requires some changes to the callbacks that only work if
> enable_apicv or !enable_apicv:
> 
> 	if (enable_apicv)
> 		kvm_x86_ops->update_cr8_intercept = NULL;
> 	else {
> 		kvm_x86_ops->hwapic_irr_update = NULL;
> 		kvm_x86_ops->hwapic_isr_update = NULL;
> 		kvm_x86_ops->deliver_posted_interrupt = NULL;
> 		kvm_x86_ops->sync_pir_to_irr = vmx_sync_pir_to_irr_dummy;
> 	}
> 
> The question then is... does Hyper-V actually use auto-EOI interrupts?
> If it doesn't, we might as well not implement them... :/
> 
> I'm keeping the kernel patches queued for my own testing, but this of
> course has to be fixed before including them---which will delay this
> feature to 4.5, unfortunately.
> 
> Paolo


well, the problem is that it actually uses auto EOI

Den
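
To make step 3 of the sequence above concrete: APIC delivery compares
priority classes, i.e. the upper four bits of the vector, so vector 179
cannot interrupt while PPR is 176. A tiny illustrative check:

	/* Sketch: an interrupt is deliverable only if its priority class
	 * is strictly higher than the processor priority's class.
	 */
	static bool apic_can_deliver(u8 vector, u8 ppr)
	{
		return (vector & 0xf0) > (ppr & 0xf0);
	}

	/* apic_can_deliver(179, 176) is false: both vectors are in class
	 * 0xb0, so 179 stays stashed in IRR and its auto-EOI behavior is
	 * later skipped.
	 */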


Re: [kvm-unit-tests PATCH] x86: hyperv_synic: Hyper-V SynIC test

2015-11-02 Thread Roman Kagan
On Mon, Nov 02, 2015 at 01:16:02PM +0100, Paolo Bonzini wrote:
> On 26/10/2015 10:56, Andrey Smetanin wrote:
> > Hyper-V SynIC is a Hyper-V synthetic interrupt controller.
> > 
> > The test runs on every vCPU and performs the following steps:
> > * read from all Hyper-V SynIC MSR's
> > * setup Hyper-V SynIC evt/msg pages
> > * setup SINT's routing
> > * inject SINT's into destination vCPU by 'hyperv-synic-test-device'
> > * wait for SINT's isr's completion
> > * clear Hyper-V SynIC evt/msg pages and destroy SINT's routing
> > 
> > Signed-off-by: Andrey Smetanin 
> > Reviewed-by: Roman Kagan 
> > Signed-off-by: Denis V. Lunev 
> > CC: Vitaly Kuznetsov 
> > CC: "K. Y. Srinivasan" 
> > CC: Gleb Natapov 
> > CC: Paolo Bonzini 
> > CC: Roman Kagan 
> > CC: Denis V. Lunev 
> > CC: qemu-de...@nongnu.org
> > CC: virtualization@lists.linux-foundation.org
> 
> Bad news.
> 
> The test breaks with APICv, because of the following sequence of events:

Thanks for testing and analyzing this!

(... running around looking for an APICv-capable machine to be able to
catch this ourselves before we resubmit ...)

> The question then is... does Hyper-V actually use auto-EOI interrupts?
> If it doesn't, we might as well not implement them... :/

As Den wrote, we've yet to see a hyperv device which doesn't :(

Roman.