Re: [PATCH 0/3] Infinite loops in microcode while running guests

2015-11-10 Thread Paolo Bonzini


On 10/11/2015 16:38, Jan Kiszka wrote:
> On 2015-11-10 13:22, Paolo Bonzini wrote:
>> Yes, these can happen.  The issue is that benign exceptions are
>> delivered serially, but two of them (#DB and #AC) can also happen
>> during exception delivery itself.  The subsequent infinite stream
>> of exceptions causes the processor to never exit guest mode.
>>
>> Paolo
>>
>> Eric Northup (1):
>>   KVM: x86: work around infinite loop in microcode when #AC is delivered
>>
>> Paolo Bonzini (2):
>>   KVM: svm: unconditionally intercept #DB
>>   KVM: x86: rename update_db_bp_intercept to update_bp_intercept
>>
>>  arch/x86/include/asm/kvm_host.h |  2 +-
>>  arch/x86/include/uapi/asm/svm.h |  1 +
>>  arch/x86/kvm/svm.c  | 22 +++---
>>  arch/x86/kvm/vmx.c  |  7 +--
>>  arch/x86/kvm/x86.c  |  2 +-
>>  5 files changed, 19 insertions(+), 15 deletions(-)
>>
> 
> So this affects both Intel and AMD CPUs equally? Nice cross-vendor
> "compatibility".

Yes, it's for both.

> And it can only be triggered via #AC and #DB, or also other exceptions
> (that KVM already happens to intercept)?

Yes, these are the sole benign exceptions that can occur during
exception delivery.  All other benign exceptions only occur as the
result of executing instructions.

> You may guess why I'm asking...
> Is any of the issues already documented in a vendor errata?

No idea.  As far as I understood, processor engineers consider this to
be intended behavior (!) though they admit that the outcome for
virtualization is bad.

I don't have a reproducer for this yet (I have only tested the patches
by ensuring that they do not regress on the legal cases), but from what
I heard at least on Intel the #AC injection actually causes a failed
VM-entry...  Yet, nothing in the manual suggests _why_.

Paolo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [kvm-unit-tests PATCH 06/18] README: add some CONTRIBUTING notes

2015-11-10 Thread Paolo Bonzini


On 06/11/2015 01:24, Andrew Jones wrote:
> From: Alex Bennée 
> 
> Signed-off-by: Alex Bennée 
> Reviewed-by: Andrew Jones 
> ---
>  README | 26 ++
>  1 file changed, 26 insertions(+)
> 
> diff --git a/README b/README
> index 45587f2a97ec6..457bd797362cf 100644
> --- a/README
> +++ b/README
> @@ -41,3 +41,29 @@ Directory structure:
>  ./:the sources of the tests and the created objects/images
>  
>  See /README for architecture specific documentation.
> +
> +CONTRIBUTING:
> +=
> +
> +Style
> +-
> +
> +Currently there is a mix of indentation styles so any changes to
> +existing files should be consistent with the existing style. For new
> +files:
> +
> +  - C: please use standard linux-with-tabs
> +  - Shell: use TABs for indentation

Which should be changed :)

> +Patches
> +---
> +
> +Patches are welcome at the KVM mailing list .
> +
> +Please prefix messages with: [kvm-unit-tests PATCH]
> +
> +You can add the following to .git/config to do this automatically for you:
> +
> +[format]
> + subjectprefix = kvm-unit-tests PATCH
> +
> 

Applied, thanks.

Paolo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [kvm-unit-tests PATCH 08/18] run_tests: pass test name to run script

2015-11-10 Thread Paolo Bonzini


On 06/11/2015 01:24, Andrew Jones wrote:
> With this $TEST_DIR/run can output test specific error messages.
> 
> Signed-off-by: Andrew Jones 
> ---
>  run_tests.sh | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/run_tests.sh b/run_tests.sh
> index ebb7e9fe6fdfc..80b87823c3358 100755
> --- a/run_tests.sh
> +++ b/run_tests.sh
> @@ -46,7 +46,7 @@ function run()
>  fi
>  done
>  
> -cmdline="./$TEST_DIR-run $kernel -smp $smp $opts"
> +cmdline="TESTNAME=$testname ./$TEST_DIR-run $kernel -smp $smp $opts"
>  if [ $verbose != 0 ]; then
>  echo $cmdline
>  fi
> 

Applied, thanks.

Paolo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3 2/3] target-i386: calculate vcpu's TSC rate to be migrated

2015-11-10 Thread Haozhong Zhang
On 11/09/15 14:01, Eduardo Habkost wrote:
> On Mon, Nov 09, 2015 at 08:33:55AM +0800, haozhong.zh...@intel.com wrote:
> > On 11/06/15 13:12, Eduardo Habkost wrote:
> > > On Fri, Nov 06, 2015 at 10:32:24AM +0800, haozhong.zh...@intel.com wrote:
> > > > On 11/05/15 14:05, Eduardo Habkost wrote:
> > > > > On Thu, Nov 05, 2015 at 09:30:51AM +0800, Haozhong Zhang wrote:
> > > > > > On 11/04/15 19:42, Eduardo Habkost wrote:
> > > [...]
> > > > > > > > +env->tsc_khz_saved = r;
> > > > > > > > +}
> > > > > > > 
> > > > > > > Why do you need a separate tsc_khz_saved field, and don't simply 
> > > > > > > use
> > > > > > > tsc_khz? It would have the additional feature of letting QMP 
> > > > > > > clients
> > > > > > > query the current TSC rate by asking for the tsc-freq property on 
> > > > > > > CPU
> > > > > > > objects.
> > > > > > >
> > > > > > 
> > > > > > It's to avoid overriding env->tsc_khz on the destination in the
> > > > > > migration. I can change this line to
> > > > > >  env->tsc_khz = env->tsc_khz_saved = r;
> > > > > 
> > > > > You are already avoiding overriding env->tsc_khz, because you use
> > > > > KVM_GET_TSC_KHZ only if tsc_khz is not set yet. I still don't see why
> > > > > you need a tsc_khz_saved field that requires duplicating the 
> > > > > SET_TSC_KHZ
> > > > > code, if you could just do this:
> > > > > 
> > > > > if (!env->tsc_khz) {
> > > > > env->tsc_khz = kvm_vcpu_ioctl(cs, KVM_GET_TSC_KHZ);
> > > > > }
> > > > >
> > > > 
> > > > Consider an example that we migrate a VM from machine A to machine B
> > > > and then to machine C, and QEMU on machine B is launched with the cpu
> > > > option 'tsc-freq' (i.e. env->tsc_khz on B is non-zero at the
> > > > beginning):
> > > >  1) In the migration from B to C, the user-specified TSC frequency by
> > > > 'tsc-freq' on B is expected to be migrated to C. That is, the
> > > > value of env->tsc_khz on B is migrated.
> > > >  2) If TSC frequency is migrated through env->tsc_khz, then
> > > > env->tsc_khz on B will be overrode in the migration from A to B
> > > > before kvm_arch_setup_tsc_khz(). If the guest TSC frequency is
> > > > different than the user-specified TSC frequency on B, the
> > > > expectation in 1) will not be satisfied anymore.
> > > 
> > > Setting tsc-freq on B when tsc-freq was not used on A is invalid usage.
> > > This is not different from changing the CPU model and adding or removing
> > > CPU flags when migrating, which is also incorrect. The command-line
> > > parameters defining the VM must be the same when you migrate.
> > >
> > 
> > Good to know it's an invalid usage. Then the question is what QEMU is
> > expected to do for this invalid usage?
> > 
> >  1) Abort the migration? But I find that the current QEMU does not
> > abort the migration between different CPU models (e.g. Nehalem and
> > Haswell).
> > 
> >  2) Or do not abort the migration and ignore tsc-freq option? If so,
> > tsc_khz_saved will be not needed.
> 
> My first choice is to abort migration. If we decide to abort today and
> find it to cause problems, we can easily fix it. If we decide to
> continue without aborting, it is difficult to change that behavior
> without breaking existing setups.
>

Two additional questions:

 1) Existing QEMU allows 'tsc-freq' on the destination in the
migration. If we decided to abort when both 'tsc-freq' and
migrated TSC were present on the destination, it would break some
existing usages. Considering backward compatibility, would above
choice 2) be better?

 2) If we do decide to abort, could I use abort()? Or are there other
clean approaches to abort?

Haozhong

> -- 
> Eduardo
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] MIPS: KVM: Fix CP0_EBASE redefined build warning

2015-11-10 Thread James Hogan
The patch "MIPS: CPS: Early debug using an ns16550-compatible UART" in
linux-next causes a build warning in locore.S by adding an identical
definition of CP0_EBASE in asm/mipsregs.h.

arch/mips/kvm/locore.S:41:0: warning: "CP0_EBASE" redefined
 #define CP0_EBASE   $15,1
 ^
In file included from ./arch/mips/include/asm/msa.h:13:0,
 from ./arch/mips/include/asm/asmmacro.h:13,
 from arch/mips/kvm/locore.S:13:
./arch/mips/include/asm/mipsregs.h:62:0: note: this is the location of the 
previous definition
 #define CP0_EBASE $15, 1
 ^

Remove the definition in locore.S and move a few of the other similar
definitions in asm/mipsregs.h too. CP0_INTCTL, CP0_SRSCTL, & CP0_SRSMAP
are unused so they're just dropped instead. CP0_DDATA_LO is left where
it is as I have patches to eliminate its use in locore.S and it
otherwise is unlikely to need to be used from assembly code.

Fixes: "MIPS: CPS: Early debug using an ns16550-compatible UART"
Signed-off-by: James Hogan 
Cc: Ralf Baechle 
Cc: Paul Burton 
Cc: Paolo Bonzini 
Cc: Gleb Natapov 
Cc: linux-m...@linux-mips.org
Cc: kvm@vger.kernel.org
---
Ralf: Please can you take this patch, as the thing it fixes is in the
mips-for-linux-next branch.
---
 arch/mips/include/asm/mipsregs.h | 3 +++
 arch/mips/kvm/locore.S   | 8 
 2 files changed, 3 insertions(+), 8 deletions(-)

diff --git a/arch/mips/include/asm/mipsregs.h b/arch/mips/include/asm/mipsregs.h
index e7c1e28438e0..e43aca183c99 100644
--- a/arch/mips/include/asm/mipsregs.h
+++ b/arch/mips/include/asm/mipsregs.h
@@ -50,6 +50,7 @@
 #define CP0_PAGEMASK $5
 #define CP0_WIRED $6
 #define CP0_INFO $7
+#define CP0_HWRENA $7, 0
 #define CP0_BADVADDR $8
 #define CP0_BADINSTR $8, 1
 #define CP0_COUNT $9
@@ -62,6 +63,8 @@
 #define CP0_EBASE $15, 1
 #define CP0_CMGCRBASE $15, 3
 #define CP0_CONFIG $16
+#define CP0_CONFIG3 $16, 3
+#define CP0_CONFIG5 $16, 5
 #define CP0_LLADDR $17
 #define CP0_WATCHLO $18
 #define CP0_WATCHHI $19
diff --git a/arch/mips/kvm/locore.S b/arch/mips/kvm/locore.S
index c567240386a0..7bab3a4e8f7d 100644
--- a/arch/mips/kvm/locore.S
+++ b/arch/mips/kvm/locore.S
@@ -36,14 +36,6 @@
 #define PT_HOST_USERLOCAL   PT_EPC
 
 #define CP0_DDATA_LO$28,3
-#define CP0_CONFIG3 $16,3
-#define CP0_CONFIG5 $16,5
-#define CP0_EBASE   $15,1
-
-#define CP0_INTCTL  $12,1
-#define CP0_SRSCTL  $12,2
-#define CP0_SRSMAP  $12,3
-#define CP0_HWRENA  $7,0
 
 /* Resume Flags */
 #define RESUME_FLAG_HOST(1<<1)  /* Resume host? */
-- 
2.4.10

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [kvm-unit-tests PATCH v2 02/19] trivial: lib: fail hard on failed mallocs

2015-11-10 Thread Paolo Bonzini


On 09/11/2015 21:53, Andrew Jones wrote:
> It's pretty safe to not even bother checking for NULL when
> using malloc and friends, but if we do check, then fail
> hard.
> 
> Signed-off-by: Andrew Jones 
> ---
> v2: no code in asserts [Thomas Huth]
> 
>  lib/virtio-mmio.c | 6 ++
>  1 file changed, 2 insertions(+), 4 deletions(-)
> 
> diff --git a/lib/virtio-mmio.c b/lib/virtio-mmio.c
> index 043832299174e..5ccbd193a264a 100644
> --- a/lib/virtio-mmio.c
> +++ b/lib/virtio-mmio.c
> @@ -54,8 +54,7 @@ static struct virtqueue *vm_setup_vq(struct virtio_device 
> *vdev,
>  
>   vq = calloc(1, sizeof(*vq));
>   queue = memalign(PAGE_SIZE, VIRTIO_MMIO_QUEUE_SIZE_MIN);
> - if (!vq || !queue)
> - return NULL;
> + assert(vq && queue);
>  
>   writel(index, vm_dev->base + VIRTIO_MMIO_QUEUE_SEL);
>  
> @@ -162,8 +161,7 @@ static struct virtio_device *virtio_mmio_dt_bind(u32 
> devid)
>   return NULL;
>  
>   vm_dev = calloc(1, sizeof(*vm_dev));
> - if (!vm_dev)
> - return NULL;
> + assert(vm_dev != NULL);
>  
>   vm_dev->base = info.base;
>   vm_device_init(vm_dev);
> 

Applied, thanks.

Paolo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [kvm-unit-tests PATCH 01/18] makefiles: use bash

2015-11-10 Thread Paolo Bonzini


On 10/11/2015 17:37, Andrew Jones wrote:
> On Tue, Nov 10, 2015 at 05:22:41PM +0100, Paolo Bonzini wrote:
>>
>>
>> On 06/11/2015 01:24, Andrew Jones wrote:
>>> Use bash in the makefiles, like we do in the scripts. Without
>>> this some platforms using dash fail to execute make targets
>>> that use bash-isms.
>>>
>>> Signed-off-by: Andrew Jones 
>>> ---
>>>  Makefile | 2 ++
>>>  1 file changed, 2 insertions(+)
>>>
>>> diff --git a/Makefile b/Makefile
>>> index 0d5933474cd8c..3e60b4f8e4a57 100644
>>> --- a/Makefile
>>> +++ b/Makefile
>>> @@ -1,4 +1,6 @@
>>>  
>>> +SHELL := /bin/bash
>>> +
>>>  ifeq ($(wildcard config.mak),)
>>>  $(error run ./configure first. See ./configure -h)
>>>  endif
>>>
>>
>> Which bash-isms are actually present?
> 
> config/config-arm-common.mak has $(RM) $(TEST_DIR)/*.{o,flat,elf,map} ...
> 
> I could certainly change that one, and that may be the only one... But,
> we require bash for other scripts anyway, so I think requiring make to
> use it is reasonable.

One is enough to apply the patch. :)

Thanks,

Paolo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] [PATCH] kvm-all: PAGE_SIZE should be real host page size

2015-11-10 Thread Peter Maydell
On 10 November 2015 at 16:59, Andrew Jones  wrote:
> On Tue, Nov 10, 2015 at 04:29:31PM +, Peter Maydell wrote:
>> On 10 November 2015 at 00:23, Andrew Jones  wrote:
>> > -/* KVM uses PAGE_SIZE in its definition of COALESCED_MMIO_MAX */
>> > -#define PAGE_SIZE TARGET_PAGE_SIZE
>> > +/* KVM uses PAGE_SIZE in its definition of KVM_COALESCED_MMIO_MAX. We
>> > + * need to use the real host PAGE_SIZE, as that's what KVM will use.
>> > + */
>> > +#define PAGE_SIZE getpagesize()
>>
>> Rather than defining PAGE_SIZE here (a confusing macro given
>> we have several page sizes to deal with), why not just use
>> getpagesize() in the one and only location where we currently
>> use this macro?
>
> The macro is used by kernel headers that we import and include in
> kvm-all.c. It's ugly, I agree, but that's how the this cookie crumbled.

Oh, I see. That's pretty horrible.

thanks
-- PMM
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] kvm-all: PAGE_SIZE should be real host page size

2015-11-10 Thread Paolo Bonzini


On 10/11/2015 01:23, Andrew Jones wrote:
> Just noticed this while grepping TARGET_PAGE_SIZE for an unrelated
> reason. I didn't use qemu_real_host_page_size as kvm_set_phys_mem()
> does, because we'd need to make sure page_size_init() has run first.
> 
> Signed-off-by: Andrew Jones 
> ---
>  kvm-all.c | 6 --
>  1 file changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/kvm-all.c b/kvm-all.c
> index 1bc12737723c3..de9ff5971fb3b 100644
> --- a/kvm-all.c
> +++ b/kvm-all.c
> @@ -45,8 +45,10 @@
>  #include 
>  #endif
>  
> -/* KVM uses PAGE_SIZE in its definition of COALESCED_MMIO_MAX */
> -#define PAGE_SIZE TARGET_PAGE_SIZE
> +/* KVM uses PAGE_SIZE in its definition of KVM_COALESCED_MMIO_MAX. We
> + * need to use the real host PAGE_SIZE, as that's what KVM will use.
> + */
> +#define PAGE_SIZE getpagesize()
>  
>  //#define DEBUG_KVM
>  
> 

Is this a bugfix or just a cleanup?  If the former, on which targets?

Paolo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3 0/3] virtio DMA API core stuff

2015-11-10 Thread Michael S. Tsirkin
On Sun, Nov 08, 2015 at 12:49:46PM +0100, Joerg Roedel wrote:
> On Sun, Nov 08, 2015 at 12:37:47PM +0200, Michael S. Tsirkin wrote:
> > I have no problem with that. For example, can we teach
> > the DMA API on intel x86 to use PT for virtio by default?
> > That would allow merging Andy's patches with
> > full compatibility with old guests and hosts.
> 
> Well, the only incompatibility comes from an experimental qemu feature,
> more explicitly from a bug in that features implementation. So why
> should we work around that in the kernel? I think it is not too hard to
> fix qemu to generate a correct DMAR table which excludes the virtio
> devices from iommu translation.
> 
> 
>   Joerg

It's not that easy - you'd have to dedicate some buses
for iommu bypass, and teach management tools to only put
virtio there - but it's possible.

This will absolutely address guests that don't need to set up IOMMU for
virtio devices, and virtio that bypasses the IOMMU.

But the problem is that we do want to *allow* guests
to set up IOMMU for virtio devices.
In that case, these are two other usecases:

A- monolitic virtio within QEMU:
iommu only needed for VFIO ->
guest should always use iommu=pt
iommu=on works but is just useless overhead.

B- modular out of process virtio outside QEMU:
iommu needed for VFIO or kernel driver ->
guest should use iommu=pt or iommu=on
depending on security/performance requirements

Note that there could easily be a mix of these in the same system.

So for these cases we do need QEMU to specify to guest that IOMMU covers
the virtio devices.  Also, once one does this, the default on linux is
iommu=on and not pt, which works but ATM is very slow.

This poses three problems:

1. How do we address the different needs of A and B?
   One way would be for virtio to pass the information to guest
   using some virtio specific way, and have drivers
   specify what kind of DMA access they want.

2. (Kind of a subset of 1) once we do allow IOMMU, how do we make sure most 
guests
   use the more sensible iommu=pt.

3. Once we do allow IOMMU, how can we keep existing guests work in this 
configuration?
   Creating different hypervisor configurations depending on guest is very 
nasty.
   Again, one way would be some virtio specific interface.

I'd rather we figured the answers to this before merging Andy's patches
because I'm concerned that instead of 1 broken configuration
(virtio always bypasses IOMMU) we'll get two bad configurations
(in the second one, virtio uses the slow default with no
gain in security).

Suggestions wellcome.

-- 
MST
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/3] Infinite loops in microcode while running guests

2015-11-10 Thread Jan Kiszka
On 2015-11-10 13:22, Paolo Bonzini wrote:
> Yes, these can happen.  The issue is that benign exceptions are
> delivered serially, but two of them (#DB and #AC) can also happen
> during exception delivery itself.  The subsequent infinite stream
> of exceptions causes the processor to never exit guest mode.
> 
> Paolo
> 
> Eric Northup (1):
>   KVM: x86: work around infinite loop in microcode when #AC is delivered
> 
> Paolo Bonzini (2):
>   KVM: svm: unconditionally intercept #DB
>   KVM: x86: rename update_db_bp_intercept to update_bp_intercept
> 
>  arch/x86/include/asm/kvm_host.h |  2 +-
>  arch/x86/include/uapi/asm/svm.h |  1 +
>  arch/x86/kvm/svm.c  | 22 +++---
>  arch/x86/kvm/vmx.c  |  7 +--
>  arch/x86/kvm/x86.c  |  2 +-
>  5 files changed, 19 insertions(+), 15 deletions(-)
> 

So this affects both Intel and AMD CPUs equally? Nice cross-vendor
"compatibility".

And it can only be triggered via #AC and #DB, or also other exceptions
(that KVM already happens to intercept)? You may guess why I'm asking...

Is any of the issues already documented in a vendor errata?

Thanks,
Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SES-DE
Corporate Competence Center Embedded Linux
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [kvm-unit-tests PATCH 07/18] configure: emit HOST=$host to config.mak

2015-11-10 Thread Paolo Bonzini


On 06/11/2015 01:24, Andrew Jones wrote:
> From: Alex Bennée 
> 
> This is useful information for the run scripts to know, especially if
> they want to drop to using TCG.
> 
> Signed-off-by: Alex Bennée 
> Reviewed-by: Andrew Jones 
> ---
>  configure | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/configure b/configure
> index b2ad32a3e3a52..078b70ce096a6 100755
> --- a/configure
> +++ b/configure
> @@ -7,6 +7,7 @@ ld=ld
>  objcopy=objcopy
>  ar=ar
>  arch=`uname -m | sed -e s/i.86/i386/ | sed -e 's/arm.*/arm/'`
> +host=$arch
>  cross_prefix=
>  
>  usage() {
> @@ -122,6 +123,7 @@ ln -s $asm lib/asm
>  cat < config.mak
>  PREFIX=$prefix
>  KERNELDIR=$(readlink -f $kerneldir)
> +HOST=$host
>  ARCH=$arch
>  ARCH_NAME=$arch_name
>  PROCESSOR=$processor
> 

Applied, thanks.

Paolo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [kvm-unit-tests PATCH 13/18] arm: Fail on unknown subtest

2015-11-10 Thread Paolo Bonzini


On 06/11/2015 01:24, Andrew Jones wrote:
> From: Christopher Covington 
> 
> Signed-off-by: Christopher Covington 
> Reviewed-by: Andrew Jones 
> ---
>  arm/selftest.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/arm/selftest.c b/arm/selftest.c
> index fc9ec609d875e..f4a503079e464 100644
> --- a/arm/selftest.c
> +++ b/arm/selftest.c
> @@ -376,6 +376,9 @@ int main(int argc, char **argv)
>   cpumask_set_cpu(0, _reported);
>   while (!cpumask_full(_reported))
>   cpu_relax();
> + } else {
> + printf("Unknown subtest\n");
> + abort();
>   }
>  
>   return report_summary();
> 

Applied, thanks.

Paolo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [kvm-unit-tests PATCH 12/18] lib/arm: add flush_tlb_page mmu function

2015-11-10 Thread Paolo Bonzini


On 06/11/2015 01:24, Andrew Jones wrote:
> From: Alex Bennée 
> 
> This introduces a new flush_tlb_page function which does exactly what
> you expect. It's going to be useful for the future TLB torture test.
> 
> Signed-off-by: Alex Bennée 
> Reviewed-by: Andrew Jones 
> ---
>  lib/arm/asm/mmu.h   | 11 +++
>  lib/arm64/asm/mmu.h |  8 
>  2 files changed, 19 insertions(+)
> 
> diff --git a/lib/arm/asm/mmu.h b/lib/arm/asm/mmu.h
> index c1bd01c9ee1b9..2bb0cde820f8a 100644
> --- a/lib/arm/asm/mmu.h
> +++ b/lib/arm/asm/mmu.h
> @@ -14,8 +14,11 @@
>  #define PTE_AF   PTE_EXT_AF
>  #define PTE_WBWA L_PTE_MT_WRITEALLOC
>  
> +/* See B3.18.7 TLB maintenance operations */
> +
>  static inline void local_flush_tlb_all(void)
>  {
> + /* TLBIALL */
>   asm volatile("mcr p15, 0, %0, c8, c7, 0" :: "r" (0));
>   dsb();
>   isb();
> @@ -27,6 +30,14 @@ static inline void flush_tlb_all(void)
>   local_flush_tlb_all();
>  }
>  
> +static inline void flush_tlb_page(unsigned long vaddr)
> +{
> + /* TLBIMVAA */
> + asm volatile("mcr p15, 0, %0, c8, c7, 3" :: "r" (vaddr));
> + dsb();
> + isb();
> +}
> +
>  #include 
>  
>  #endif /* __ASMARM_MMU_H_ */
> diff --git a/lib/arm64/asm/mmu.h b/lib/arm64/asm/mmu.h
> index 18b4d6be18fae..3bc31c91c36f8 100644
> --- a/lib/arm64/asm/mmu.h
> +++ b/lib/arm64/asm/mmu.h
> @@ -19,6 +19,14 @@ static inline void flush_tlb_all(void)
>   isb();
>  }
>  
> +static inline void flush_tlb_page(unsigned long vaddr)
> +{
> + unsigned long page = vaddr >> 12;
> + dsb(ishst);
> + asm("tlbi   vaae1is, %0" :: "r" (page));
> + dsb(ish);
> +}
> +
>  #include 
>  
>  #endif /* __ASMARM64_MMU_H_ */
> 

Applied, thanks.

Paolo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [kvm-unit-tests PATCH 11/18] lib/printf: support the %u unsigned fmt field

2015-11-10 Thread Paolo Bonzini


On 06/11/2015 01:24, Andrew Jones wrote:
> From: Alex Bennée 
> 
> Signed-off-by: Alex Bennée 
> Reviewed-by: Andrew Jones 
> ---
>  lib/printf.c | 13 +
>  1 file changed, 13 insertions(+)
> 
> diff --git a/lib/printf.c b/lib/printf.c
> index 89308fb26b7d2..5d83605afe829 100644
> --- a/lib/printf.c
> +++ b/lib/printf.c
> @@ -180,6 +180,19 @@ int vsnprintf(char *buf, int size, const char *fmt, 
> va_list va)
>   break;
>   }
>   break;
> + case 'u':
> + switch (nlong) {
> + case 0:
> + print_unsigned(, va_arg(va, unsigned), 10, props);
> + break;
> + case 1:
> + print_unsigned(, va_arg(va, unsigned long), 10, props);
> + break;
> + default:
> + print_unsigned(, va_arg(va, unsigned long long), 10, props);
> + break;
> + }
> + break;
>   case 'x':
>   switch (nlong) {
>   case 0:
> 

Applied, thanks.

Paolo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] kvm-all: PAGE_SIZE should be real host page size

2015-11-10 Thread Paolo Bonzini


On 10/11/2015 16:57, Andrew Jones wrote:
> On Tue, Nov 10, 2015 at 04:41:16PM +0100, Paolo Bonzini wrote:
>>
>>
>> On 10/11/2015 01:23, Andrew Jones wrote:
>>> Just noticed this while grepping TARGET_PAGE_SIZE for an unrelated
>>> reason. I didn't use qemu_real_host_page_size as kvm_set_phys_mem()
>>> does, because we'd need to make sure page_size_init() has run first.
>>>
>>> Signed-off-by: Andrew Jones 
>>> ---
>>>  kvm-all.c | 6 --
>>>  1 file changed, 4 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/kvm-all.c b/kvm-all.c
>>> index 1bc12737723c3..de9ff5971fb3b 100644
>>> --- a/kvm-all.c
>>> +++ b/kvm-all.c
>>> @@ -45,8 +45,10 @@
>>>  #include 
>>>  #endif
>>>  
>>> -/* KVM uses PAGE_SIZE in its definition of COALESCED_MMIO_MAX */
>>> -#define PAGE_SIZE TARGET_PAGE_SIZE
>>> +/* KVM uses PAGE_SIZE in its definition of KVM_COALESCED_MMIO_MAX. We
>>> + * need to use the real host PAGE_SIZE, as that's what KVM will use.
>>> + */
>>> +#define PAGE_SIZE getpagesize()
>>>  
>>>  //#define DEBUG_KVM
>>>  
>>>
>>
>> Is this a bugfix or just a cleanup?  If the former, on which targets?
> 
> It's a bugfix for any targets that have a TARGET_PAGE_SIZE !=
> real-host-page-size. For example ARM has TARGET_PAGE_SIZE set to 1024,
> even when the host is using 4k or 64k pages. However, I didn't find this
> due to a bug, because on ARM I'm not using emulated devices that make
> use of the coalesced-mmio feature at this time.
> 
> Thanks,
> drew
> 
>>
>> Paolo
>> --
>> To unsubscribe from this list: send the line "unsubscribe kvm" in
>> the body of a message to majord...@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html

Ok, applied (2.6 only unless I gather more urgent patches).

Paolo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [kvm-unit-tests PATCH 00/18] bunch of mostly trivial patches

2015-11-10 Thread Paolo Bonzini


On 06/11/2015 01:24, Andrew Jones wrote:
> Many of these patches were posted once. Some weren't, but anyway
> almost everything is pretty trivial. I'd like to get these in, or
> at least get definitive nacks on them (and then drop them) in order
> to clean my queue before more patches (coming from Alex Bennée and
> Chistopher are reposted).
> 
> All patches also available here
> https://github.com/rhdrjones/kvm-unit-tests/commits/queue

I applied all of these except 1 (question asked) and 14/15/16/17 (not
sure I like the idea).

Paolo

> Thanks,
> drew
> 
> 
> Alex Bennée (4):
>   README: add some CONTRIBUTING notes
>   configure: emit HOST=$host to config.mak
>   lib/printf: support the %u unsigned fmt field
>   lib/arm: add flush_tlb_page mmu function
> 
> Andrew Jones (13):
>   makefiles: use bash
>   trivial: lib: fail hard on failed mallocs
>   trivial: alloc: don't use 'top' outside spinlock
>   trivial: lib: missing extern in string.h
>   README: add pointer to new wiki page
>   run_tests: pass test name to run script
>   arm/run: use ACCEL to choose between kvm and tcg
>   run_tests: probe for max-smp
>   arm/arm64: allow building a single test
>   arm/arm64: generate map files
>   lib: link in linux kernel headers (uapi)
>   Revert "arm/arm64: import include/uapi/linux/psci.h"
>   arm/arm64: uart0_init: check /chosen/stdout-path
> 
> Christopher Covington (1):
>   arm: Fail on unknown subtest
> 
>  .gitignore   |  2 ++
>  Makefile |  6 ++--
>  README   | 32 +++
>  arm/run  | 43 ++
>  arm/selftest.c   |  3 ++
>  arm/unittests.cfg|  7 +++--
>  config/config-arm-common.mak |  9 +-
>  configure| 11 +++
>  lib/alloc.c  |  8 +++--
>  lib/arm/asm/mmu.h| 11 +++
>  lib/arm/asm/page.h   |  2 +-
>  lib/arm/asm/psci.h   |  2 +-
>  lib/arm/asm/uapi-psci.h  | 73 
> 
>  lib/arm/io.c | 36 --
>  lib/arm64/asm/mmu.h  |  8 +
>  lib/arm64/asm/page.h |  2 +-
>  lib/arm64/asm/psci.h |  2 +-
>  lib/arm64/asm/uapi-psci.h|  1 -
>  lib/asm-generic/page.h   |  2 +-
>  lib/const.h  | 11 ---
>  lib/printf.c | 13 
>  lib/string.h |  2 +-
>  lib/virtio-mmio.c|  7 ++---
>  run_tests.sh | 12 +++-
>  scripts/functions.bash   |  8 +++--
>  scripts/mkstandalone.sh  | 22 ++---
>  x86/unittests.cfg|  1 +
>  27 files changed, 210 insertions(+), 126 deletions(-)
>  delete mode 100644 lib/arm/asm/uapi-psci.h
>  delete mode 100644 lib/arm64/asm/uapi-psci.h
>  delete mode 100644 lib/const.h
> 
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [kvm-unit-tests PATCH 05/18] README: add pointer to new wiki page

2015-11-10 Thread Paolo Bonzini


On 06/11/2015 01:24, Andrew Jones wrote:
> Signed-off-by: Andrew Jones 
> ---
>  README | 6 ++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/README b/README
> index eab5ea28f7fab..45587f2a97ec6 100644
> --- a/README
> +++ b/README
> @@ -1,3 +1,9 @@
> +Welcome to kvm-unit-tests
> +
> +See http://www.linux-kvm.org/page/KVM-unit-tests for a high-level
> +description of this project, as well as running tests and adding
> +tests HOWTOs.
> +
>  This directory contains sources for a kvm test suite.
>  
>  To create the test images do
> 

Applied, thanks.

Paolo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [kvm-unit-tests PATCH 03/18] trivial: alloc: don't use 'top' outside spinlock

2015-11-10 Thread Paolo Bonzini


On 06/11/2015 01:24, Andrew Jones wrote:
> This is a fix just due to being too much of a type-A person.
> I noticed the issue while reading over the function, and
> decided to fix it, even though it's unlikely to be a problem
> ever because top is read-mostly (like written once, then only
> read, type of mostly).
> 
> Signed-off-by: Andrew Jones 
> ---
>  lib/alloc.c | 8 +---
>  1 file changed, 5 insertions(+), 3 deletions(-)
> 
> diff --git a/lib/alloc.c b/lib/alloc.c
> index ad6761430c965..34f71a337d868 100644
> --- a/lib/alloc.c
> +++ b/lib/alloc.c
> @@ -61,15 +61,17 @@ static phys_addr_t phys_alloc_aligned_safe(phys_addr_t 
> size,
>  {
>   static bool warned = false;
>   phys_addr_t addr, size_orig = size;
> - u64 top_safe = top;
> + u64 top_safe;
> +
> + spin_lock();
> +
> + top_safe = top;
>  
>   if (safe && sizeof(long) == 4)
>   top_safe = MIN(top, 1ULL << 32);
>  
>   align = MAX(align, align_min);
>  
> - spin_lock();
> -
>   addr = ALIGN(base, align);
>   size += addr - base;
>  
> 

Applied, thanks.

Paolo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [kvm-unit-tests PATCH 04/18] trivial: lib: missing extern in string.h

2015-11-10 Thread Paolo Bonzini


On 06/11/2015 01:24, Andrew Jones wrote:
> Signed-off-by: Andrew Jones 
> ---
>  lib/string.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/lib/string.h b/lib/string.h
> index 7820db86ee4e0..4e24f54d9e231 100644
> --- a/lib/string.h
> +++ b/lib/string.h
> @@ -6,7 +6,7 @@ extern char *strcat(char *dest, const char *src);
>  extern char *strcpy(char *dest, const char *src);
>  extern int strcmp(const char *a, const char *b);
>  extern char *strchr(const char *s, int c);
> -char *strstr(const char *haystack, const char *needle);
> +extern char *strstr(const char *haystack, const char *needle);
>  extern void *memset(void *s, int c, size_t n);
>  extern void *memcpy(void *dest, const void *src, size_t n);
>  extern int memcmp(const void *s1, const void *s2, size_t n);
> 

Applied, thanks.

Paolo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [kvm-unit-tests PATCH 18/18] arm/arm64: uart0_init: check /chosen/stdout-path

2015-11-10 Thread Paolo Bonzini


On 06/11/2015 01:24, Andrew Jones wrote:
> Arguably all of uart0_init() is unnecessary, as we're pretty sure
> that the address we initialize uart0_base to is correct. We go
> through the motions of finding the uart anyway though, because it's
> easy. It's also easy to check chosen/stdout-path first, so let's do
> that too. But, just to make all this stuff is a little less unnecessary,
> let's add a warning when we do actually find an address that doesn't
> match our initializer.
> 
> Signed-off-by: Andrew Jones 
> ---
>  lib/arm/io.c | 36 +++-
>  1 file changed, 27 insertions(+), 9 deletions(-)
> 
> diff --git a/lib/arm/io.c b/lib/arm/io.c
> index 8b1501886736a..a08d394e4aa1c 100644
> --- a/lib/arm/io.c
> +++ b/lib/arm/io.c
> @@ -19,12 +19,14 @@ extern void halt(int code);
>  /*
>   * Use this guess for the pl011 base in order to make an attempt at
>   * having earlier printf support. We'll overwrite it with the real
> - * base address that we read from the device tree later.
> + * base address that we read from the device tree later. This is
> + * the address we expect QEMU's mach-virt machine type to put in
> + * its generated device tree.
>   */
> -#define QEMU_MACH_VIRT_PL011_BASE 0x0900UL
> +#define UART_EARLY_BASE 0x0900UL
>  
>  static struct spinlock uart_lock;
> -static volatile u8 *uart0_base = (u8 *)QEMU_MACH_VIRT_PL011_BASE;
> +static volatile u8 *uart0_base = (u8 *)UART_EARLY_BASE;
>  
>  static void uart0_init(void)
>  {
> @@ -32,16 +34,32 @@ static void uart0_init(void)
>   struct dt_pbus_reg base;
>   int ret;
>  
> - ret = dt_pbus_get_base_compatible(compatible, );
> - assert(ret == 0 || ret == -FDT_ERR_NOTFOUND);
> + ret = dt_get_default_console_node();
> + assert(ret >= 0 || ret == -FDT_ERR_NOTFOUND);
>  
> - if (ret) {
> - printf("%s: %s not found in the device tree, aborting...\n",
> - __func__, compatible);
> - abort();
> + if (ret == -FDT_ERR_NOTFOUND) {
> +
> + ret = dt_pbus_get_base_compatible(compatible, );
> + assert(ret == 0 || ret == -FDT_ERR_NOTFOUND);
> +
> + if (ret) {
> + printf("%s: %s not found in the device tree, "
> + "aborting...\n",
> + __func__, compatible);
> + abort();
> + }
> +
> + } else {
> + assert(dt_pbus_translate_node(ret, 0, ) == 0);
>   }
>  
>   uart0_base = ioremap(base.addr, base.size);
> +
> + if (uart0_base != (u8 *)UART_EARLY_BASE) {
> + printf("WARNING: early print support may not work. "
> +"Found uart at %p, but early base is %p.\n",
> + uart0_base, (u8 *)UART_EARLY_BASE);
> + }
>  }
>  
>  void io_init(void)
> 

Applied, thanks.

Paolo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [kvm-unit-tests PATCH 10/18] run_tests: probe for max-smp

2015-11-10 Thread Paolo Bonzini


On 06/11/2015 01:24, Andrew Jones wrote:
> KVM can be configured to only support a few vcpus. ARM and AArch64
> currently have a default config of only 4. While it's nice to be
> able to write tests that use the maximum recommended, nr-host-cpus,
> we can't assume that nr-host-cpus == kvm-max-vcpus. This patch allows
> one to put $MAX_SMP in the smp =  line of a unittests.cfg file.
> That variable will then expand to the number of host cpus, or to the
> maximum vcpus allowed by KVM.
> 
> [Inspired by a patch from Alex Bennée solving the same issue.]
> 
> Signed-off-by: Andrew Jones 
> ---
>  arm/unittests.cfg   | 3 ++-
>  run_tests.sh| 9 +
>  scripts/mkstandalone.sh | 9 -
>  x86/unittests.cfg   | 1 +
>  4 files changed, 20 insertions(+), 2 deletions(-)
> 
> diff --git a/arm/unittests.cfg b/arm/unittests.cfg
> index 243c13301811b..5e26da1a8c1bc 100644
> --- a/arm/unittests.cfg
> +++ b/arm/unittests.cfg
> @@ -2,6 +2,7 @@
>  # [unittest_name]
>  # file = foo.flat # Name of the flat file to be used
>  # smp  = 2# Number of processors the VM will use during this test
> +# # Use $MAX_SMP to use the maximum the host supports.
>  # extra_params = -append  # Additional parameters used
>  # arch = arm|arm64   # Only if test case is specific to one
>  # groups = group1 group2 # Used to identify test cases with run_tests -g ...
> @@ -34,6 +35,6 @@ groups = selftest
>  # Test SMP support
>  [selftest-smp]
>  file = selftest.flat
> -smp = `getconf _NPROCESSORS_CONF`
> +smp = $MAX_SMP
>  extra_params = -append 'smp'
>  groups = selftest
> diff --git a/run_tests.sh b/run_tests.sh
> index b1b4c541ecaea..fad22a935b007 100755
> --- a/run_tests.sh
> +++ b/run_tests.sh
> @@ -98,4 +98,13 @@ while getopts "g:hv" opt; do
>  esac
>  done
>  
> +#
> +# Probe for MAX_SMP
> +#
> +MAX_SMP=$(getconf _NPROCESSORS_CONF)
> +while ./$TEST_DIR-run _NO_FILE_4Uhere_ -smp $MAX_SMP \
> + |& grep -q 'exceeds max cpus'; do
> + ((--MAX_SMP))
> +done
> +
>  for_each_unittest $config run
> diff --git a/scripts/mkstandalone.sh b/scripts/mkstandalone.sh
> index 0c39451e538c9..3ce244aff67b9 100755
> --- a/scripts/mkstandalone.sh
> +++ b/scripts/mkstandalone.sh
> @@ -95,12 +95,19 @@ qemu="$qemu"
>  if [ "\$QEMU" ]; then
>   qemu="\$QEMU"
>  fi
> +
> +MAX_SMP="MAX_SMP"
>  echo \$qemu $cmdline -smp $smp $opts
>  
>  cmdline="\`echo '$cmdline' | sed s%$kernel%_NO_FILE_4Uhere_%\`"
>  if \$qemu \$cmdline 2>&1 | grep 'No accelerator found'; then
> -ret=2
> + ret=2
>  else
> + MAX_SMP=\`getconf _NPROCESSORS_CONF\`
> + while \$qemu \$cmdline -smp \$MAX_SMP 2>&1 | grep 'exceeds max cpus' > 
> /dev/null; do
> + MAX_SMP=\`expr \$MAX_SMP - 1\`
> + done
> +
>   cmdline="\`echo '$cmdline' | sed s%$kernel%\$bin%\`"
>   \$qemu \$cmdline -smp $smp $opts
>   ret=\$?
> diff --git a/x86/unittests.cfg b/x86/unittests.cfg
> index a38544f77c056..337cc19d3d19d 100644
> --- a/x86/unittests.cfg
> +++ b/x86/unittests.cfg
> @@ -2,6 +2,7 @@
>  # [unittest_name]
>  # file = foo.flat # Name of the flat file to be used
>  # smp = 2 # Number of processors the VM will use during this test
> +# # Use $MAX_SMP to use the maximum the host supports.
>  # extra_params = -cpu qemu64,+x2apic # Additional parameters used
>  # arch = i386/x86_64 # Only if the test case works only on one of them
>  # groups = group1 group2 # Used to identify test cases with run_tests -g ...
> 

Applied, thanks.

Paolo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] [PATCH] kvm-all: PAGE_SIZE should be real host page size

2015-11-10 Thread Peter Maydell
On 10 November 2015 at 00:23, Andrew Jones  wrote:
> Just noticed this while grepping TARGET_PAGE_SIZE for an unrelated
> reason. I didn't use qemu_real_host_page_size as kvm_set_phys_mem()
> does, because we'd need to make sure page_size_init() has run first.
>
> Signed-off-by: Andrew Jones 
> ---
>  kvm-all.c | 6 --
>  1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/kvm-all.c b/kvm-all.c
> index 1bc12737723c3..de9ff5971fb3b 100644
> --- a/kvm-all.c
> +++ b/kvm-all.c
> @@ -45,8 +45,10 @@
>  #include 
>  #endif
>
> -/* KVM uses PAGE_SIZE in its definition of COALESCED_MMIO_MAX */
> -#define PAGE_SIZE TARGET_PAGE_SIZE
> +/* KVM uses PAGE_SIZE in its definition of KVM_COALESCED_MMIO_MAX. We
> + * need to use the real host PAGE_SIZE, as that's what KVM will use.
> + */
> +#define PAGE_SIZE getpagesize()

Rather than defining PAGE_SIZE here (a confusing macro given
we have several page sizes to deal with), why not just use
getpagesize() in the one and only location where we currently
use this macro?

Also, you're guaranteed that page_size_init() has been run, because
we call that from kvm_init(), and you can't call kvm_vcpu_init()
before kvm_init().

thanks
-- PMM
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [kvm-unit-tests PATCH 09/18] arm/run: use ACCEL to choose between kvm and tcg

2015-11-10 Thread Paolo Bonzini


On 06/11/2015 01:24, Andrew Jones wrote:
> Inspired by a patch by Alex Bennée. This version uses a new
> unittests.cfg variable and includes support for DRYRUN.
> 
> Signed-off-by: Andrew Jones 
> ---
>  arm/run | 43 +--
>  arm/unittests.cfg   |  4 +++-
>  run_tests.sh|  3 ++-
>  scripts/functions.bash  |  8 ++--
>  scripts/mkstandalone.sh | 15 +++
>  5 files changed, 59 insertions(+), 14 deletions(-)
> 
> diff --git a/arm/run b/arm/run
> index 8cc2fa2571967..4a648697d7fb5 100755
> --- a/arm/run
> +++ b/arm/run
> @@ -7,6 +7,42 @@ fi
>  source config.mak
>  processor="$PROCESSOR"
>  
> +if [ -c /dev/kvm ]; then
> + if [ "$HOST" = "arm" ] && [ "$ARCH" = "arm" ]; then
> + kvm_available=yes
> + elif [ "$HOST" = "aarch64" ]; then
> + kvm_available=yes
> + fi
> +fi
> +
> +if [ "$ACCEL" = "kvm" ] && [ "$kvm_available" != "yes" ] &&
> + [ "$DRYRUN" != "yes" ]; then
> + printf "skip $TESTNAME (kvm only)\n\n"
> + exit 2
> +fi
> +
> +if [ -z "$ACCEL" ]; then
> + if [ "$DRYRUN" = "yes" ]; then
> + # Output kvm with tcg fallback for dryrun (when both are
> + # allowed), since the command line we output may get used
> + # elsewhere.
> + ACCEL="kvm:tcg"
> + elif [ "$kvm_available" = "yes" ]; then
> + ACCEL="kvm"
> + else
> + ACCEL="tcg"
> + fi
> +fi
> +
> +if [ "$ARCH" = "arm64" ]; then
> + if [[ $ACCEL =~ kvm ]]; then
> + # arm64 must use '-cpu host' with kvm, and we can't use
> + # '-cpu host' with tcg, so we force kvm-only (no fallback)
> + ACCEL="kvm"
> + processor="host"
> + fi
> +fi
> +
>  qemu="${QEMU:-qemu-system-$ARCH_NAME}"
>  qpath=$(which $qemu 2>/dev/null)
>  
> @@ -33,15 +69,10 @@ if $qemu $M -chardev testdev,id=id -initrd . 2>&1 \
>   exit 2
>  fi
>  
> -M='-machine virt,accel=kvm:tcg'
>  chr_testdev='-device virtio-serial-device'
>  chr_testdev+=' -device virtconsole,chardev=ctd -chardev testdev,id=ctd'
>  
> -# arm64 must use '-cpu host' with kvm
> -if [ "$(arch)" = "aarch64" ] && [ "$ARCH" = "arm64" ] && [ -c /dev/kvm ]; 
> then
> - processor="host"
> -fi
> -
> +M+=",accel=$ACCEL"
>  command="$qemu $M -cpu $processor $chr_testdev"
>  command+=" -display none -serial stdio -kernel"
>  echo $command "$@"
> diff --git a/arm/unittests.cfg b/arm/unittests.cfg
> index e068a0cdd9c1f..243c13301811b 100644
> --- a/arm/unittests.cfg
> +++ b/arm/unittests.cfg
> @@ -3,8 +3,10 @@
>  # file = foo.flat # Name of the flat file to be used
>  # smp  = 2# Number of processors the VM will use during this test
>  # extra_params = -append  # Additional parameters used
> -# arch = arm/arm64   # Only if test case is specific to one
> +# arch = arm|arm64   # Only if test case is specific to one
>  # groups = group1 group2 # Used to identify test cases with run_tests -g ...
> +# accel = kvm|tcg # Optionally specify if test must run with kvm or tcg.
> +# # If not specified, then kvm will be used when available.
>  
>  #
>  # Test that the configured number of processors (smp = ), and
> diff --git a/run_tests.sh b/run_tests.sh
> index 80b87823c3358..b1b4c541ecaea 100755
> --- a/run_tests.sh
> +++ b/run_tests.sh
> @@ -20,6 +20,7 @@ function run()
>  local opts="$5"
>  local arch="$6"
>  local check="$7"
> +local accel="$8"
>  
>  if [ -z "$testname" ]; then
>  return
> @@ -46,7 +47,7 @@ function run()
>  fi
>  done
>  
> -cmdline="TESTNAME=$testname ./$TEST_DIR-run $kernel -smp $smp $opts"
> +cmdline="TESTNAME=$testname ACCEL=$accel ./$TEST_DIR-run $kernel -smp 
> $smp $opts"
>  if [ $verbose != 0 ]; then
>  echo $cmdline
>  fi
> diff --git a/scripts/functions.bash b/scripts/functions.bash
> index 7ed5a517250bc..f13fe6f88f23d 100644
> --- a/scripts/functions.bash
> +++ b/scripts/functions.bash
> @@ -10,12 +10,13 @@ function for_each_unittest()
>   local groups
>   local arch
>   local check
> + local accel
>  
>   exec {fd}<"$unittests"
>  
>   while read -u $fd line; do
>   if [[ "$line" =~ ^\[(.*)\]$ ]]; then
> - "$cmd" "$testname" "$groups" "$smp" "$kernel" "$opts" 
> "$arch" "$check"
> + "$cmd" "$testname" "$groups" "$smp" "$kernel" "$opts" 
> "$arch" "$check" "$accel"
>   testname=${BASH_REMATCH[1]}
>   smp=1
>   kernel=""
> @@ -23,6 +24,7 @@ function for_each_unittest()
>   groups=""
>   arch=""
>   check=""
> + accel=""
>   elif [[ $line =~ ^file\ *=\ *(.*)$ ]]; then
>   kernel=$TEST_DIR/${BASH_REMATCH[1]}
>   elif [[ $line =~ ^smp\ *=\ 

Re: [kvm-unit-tests PATCH 19/18] don't embed code inside asserts

2015-11-10 Thread Paolo Bonzini


On 09/11/2015 21:57, Andrew Jones wrote:
> assert() is classically a macro which could also be disabled, so if
> somebody introduces a switch to "#define assert(...) /*nothing*/" in
> the future, we'd lose code.
> 
> Suggested-by: Thomas Huth 
> Signed-off-by: Andrew Jones 
> ---
>  lib/arm/setup.c   | 19 ++-
>  lib/arm/smp.c |  4 +++-
>  lib/virtio-mmio.c |  4 +++-
>  3 files changed, 20 insertions(+), 7 deletions(-)
> 
> diff --git a/lib/arm/setup.c b/lib/arm/setup.c
> index 02e81a689a8a6..da6edc1f9d8ff 100644
> --- a/lib/arm/setup.c
> +++ b/lib/arm/setup.c
> @@ -39,8 +39,11 @@ static void cpu_set(int fdtnode __unused, u32 regval, void 
> *info __unused)
>  
>  static void cpu_init(void)
>  {
> + int ret;
> +
>   nr_cpus = 0;
> - assert(dt_for_each_cpu_node(cpu_set, NULL) == 0);
> + ret = dt_for_each_cpu_node(cpu_set, NULL);
> + assert(ret == 0);
>   set_cpu_online(0, true);
>  }
>  
> @@ -49,8 +52,10 @@ static void mem_init(phys_addr_t freemem_start)
>   /* we only expect one membank to be defined in the DT */
>   struct dt_pbus_reg regs[1];
>   phys_addr_t mem_start, mem_end;
> + int ret;
>  
> - assert(dt_get_memory_params(regs, 1));
> + ret = dt_get_memory_params(regs, 1);
> + assert(ret != 0);
>  
>   mem_start = regs[0].addr;
>   mem_end = mem_start + regs[0].size;
> @@ -71,14 +76,17 @@ void setup(const void *fdt)
>  {
>   const char *bootargs;
>   u32 fdt_size;
> + int ret;
>  
>   /*
>* Move the fdt to just above the stack. The free memory
>* then starts just after the fdt.
>*/
>   fdt_size = fdt_totalsize(fdt);
> - assert(fdt_move(fdt, , fdt_size) == 0);
> - assert(dt_init() == 0);
> + ret = fdt_move(fdt, , fdt_size);
> + assert(ret == 0);
> + ret = dt_init();
> + assert(ret == 0);
>  
>   mem_init(PAGE_ALIGN((unsigned long) + fdt_size));
>   io_init();
> @@ -86,6 +94,7 @@ void setup(const void *fdt)
>  
>   thread_info_init(current_thread_info(), 0);
>  
> - assert(dt_get_bootargs() == 0);
> + ret = dt_get_bootargs();
> + assert(ret == 0);
>   setup_args(bootargs);
>  }
> diff --git a/lib/arm/smp.c b/lib/arm/smp.c
> index 3cfc6d5ddedd0..390c53b5d84c3 100644
> --- a/lib/arm/smp.c
> +++ b/lib/arm/smp.c
> @@ -44,11 +44,13 @@ secondary_entry_fn secondary_cinit(void)
>  void smp_boot_secondary(int cpu, secondary_entry_fn entry)
>  {
>   void *stack_base = memalign(THREAD_SIZE, THREAD_SIZE);
> + int ret;
>  
>   secondary_data.stack = stack_base + THREAD_START_SP;
>   secondary_data.entry = entry;
>   mmu_mark_disabled(cpu);
> - assert(cpu_psci_cpu_boot(cpu) == 0);
> + ret = cpu_psci_cpu_boot(cpu);
> + assert(ret == 0);
>  
>   while (!cpu_online(cpu))
>   wfe();
> diff --git a/lib/virtio-mmio.c b/lib/virtio-mmio.c
> index 5ccbd193a264a..fa8dd5b8d484d 100644
> --- a/lib/virtio-mmio.c
> +++ b/lib/virtio-mmio.c
> @@ -123,10 +123,12 @@ static int vm_dt_match(const struct dt_device *dev, int 
> fdtnode)
>   struct vm_dt_info *info = (struct vm_dt_info *)dev->info;
>   struct dt_pbus_reg base;
>   u32 magic;
> + int ret;
>  
>   dt_device_bind_node((struct dt_device *)dev, fdtnode);
>  
> - assert(dt_pbus_get_base(dev, ) == 0);
> + ret = dt_pbus_get_base(dev, );
> + assert(ret == 0);
>   info->base = ioremap(base.addr, base.size);
>  
>   magic = readl(info->base + VIRTIO_MMIO_MAGIC_VALUE);
> 

Applied, thanks.

Paolo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [kvm-unit-tests PATCH 01/18] makefiles: use bash

2015-11-10 Thread Andrew Jones
On Tue, Nov 10, 2015 at 05:22:41PM +0100, Paolo Bonzini wrote:
> 
> 
> On 06/11/2015 01:24, Andrew Jones wrote:
> > Use bash in the makefiles, like we do in the scripts. Without
> > this some platforms using dash fail to execute make targets
> > that use bash-isms.
> > 
> > Signed-off-by: Andrew Jones 
> > ---
> >  Makefile | 2 ++
> >  1 file changed, 2 insertions(+)
> > 
> > diff --git a/Makefile b/Makefile
> > index 0d5933474cd8c..3e60b4f8e4a57 100644
> > --- a/Makefile
> > +++ b/Makefile
> > @@ -1,4 +1,6 @@
> >  
> > +SHELL := /bin/bash
> > +
> >  ifeq ($(wildcard config.mak),)
> >  $(error run ./configure first. See ./configure -h)
> >  endif
> > 
> 
> Which bash-isms are actually present?

config/config-arm-common.mak has $(RM) $(TEST_DIR)/*.{o,flat,elf,map} ...

I could certainly change that one, and that may be the only one... But,
we require bash for other scripts anyway, so I think requiring make to
use it is reasonable.

Thanks,
drew

> 
> Paolo
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [kvm-unit-tests PATCH 00/18] bunch of mostly trivial patches

2015-11-10 Thread Andrew Jones
On Tue, Nov 10, 2015 at 05:38:38PM +0100, Paolo Bonzini wrote:
> 
> 
> On 06/11/2015 01:24, Andrew Jones wrote:
> > Many of these patches were posted once. Some weren't, but anyway
> > almost everything is pretty trivial. I'd like to get these in, or
> > at least get definitive nacks on them (and then drop them) in order
> > to clean my queue before more patches (coming from Alex Bennée and
> > Chistopher are reposted).
> > 
> > All patches also available here
> > https://github.com/rhdrjones/kvm-unit-tests/commits/queue
> 
> I applied all of these 

Thanks!

> except 1 (question asked) and 14/15/16/17 (not sure I like the idea).

At one point I recall that you liked the uapi patches, although I'm
not 100% married to it myself, as it does add a new dependency. I'm
open to suggestions.

I'm not sure what you're opposed to wrt to map files (patch 15). They
aren't 100% necessary, but don't really hurt either to generate either.
I won't fight for them though.

The TEST= patch is quite useful. I find it annoying to always have
to modify a makefile whenever I throw together a few line test. It
may not be for everyone, but then it doesn't do anything when it's
not used, so it shouldn't hurt that it exists. I would agree that
maybe the patch should also document it though, if you argued that.
Or, that fact that it's undocumented, and does nothing when not used,
could be an argument to just commit it :-)

Thanks,
drew


> 
> Paolo
> 
> > Thanks,
> > drew
> > 
> > 
> > Alex Bennée (4):
> >   README: add some CONTRIBUTING notes
> >   configure: emit HOST=$host to config.mak
> >   lib/printf: support the %u unsigned fmt field
> >   lib/arm: add flush_tlb_page mmu function
> > 
> > Andrew Jones (13):
> >   makefiles: use bash
> >   trivial: lib: fail hard on failed mallocs
> >   trivial: alloc: don't use 'top' outside spinlock
> >   trivial: lib: missing extern in string.h
> >   README: add pointer to new wiki page
> >   run_tests: pass test name to run script
> >   arm/run: use ACCEL to choose between kvm and tcg
> >   run_tests: probe for max-smp
> >   arm/arm64: allow building a single test
> >   arm/arm64: generate map files
> >   lib: link in linux kernel headers (uapi)
> >   Revert "arm/arm64: import include/uapi/linux/psci.h"
> >   arm/arm64: uart0_init: check /chosen/stdout-path
> > 
> > Christopher Covington (1):
> >   arm: Fail on unknown subtest
> > 
> >  .gitignore   |  2 ++
> >  Makefile |  6 ++--
> >  README   | 32 +++
> >  arm/run  | 43 ++
> >  arm/selftest.c   |  3 ++
> >  arm/unittests.cfg|  7 +++--
> >  config/config-arm-common.mak |  9 +-
> >  configure| 11 +++
> >  lib/alloc.c  |  8 +++--
> >  lib/arm/asm/mmu.h| 11 +++
> >  lib/arm/asm/page.h   |  2 +-
> >  lib/arm/asm/psci.h   |  2 +-
> >  lib/arm/asm/uapi-psci.h  | 73 
> > 
> >  lib/arm/io.c | 36 --
> >  lib/arm64/asm/mmu.h  |  8 +
> >  lib/arm64/asm/page.h |  2 +-
> >  lib/arm64/asm/psci.h |  2 +-
> >  lib/arm64/asm/uapi-psci.h|  1 -
> >  lib/asm-generic/page.h   |  2 +-
> >  lib/const.h  | 11 ---
> >  lib/printf.c | 13 
> >  lib/string.h |  2 +-
> >  lib/virtio-mmio.c|  7 ++---
> >  run_tests.sh | 12 +++-
> >  scripts/functions.bash   |  8 +++--
> >  scripts/mkstandalone.sh  | 22 ++---
> >  x86/unittests.cfg|  1 +
> >  27 files changed, 210 insertions(+), 126 deletions(-)
> >  delete mode 100644 lib/arm/asm/uapi-psci.h
> >  delete mode 100644 lib/arm64/asm/uapi-psci.h
> >  delete mode 100644 lib/const.h
> > 
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] kvm-all: PAGE_SIZE should be real host page size

2015-11-10 Thread Andrew Jones
On Tue, Nov 10, 2015 at 04:41:16PM +0100, Paolo Bonzini wrote:
> 
> 
> On 10/11/2015 01:23, Andrew Jones wrote:
> > Just noticed this while grepping TARGET_PAGE_SIZE for an unrelated
> > reason. I didn't use qemu_real_host_page_size as kvm_set_phys_mem()
> > does, because we'd need to make sure page_size_init() has run first.
> > 
> > Signed-off-by: Andrew Jones 
> > ---
> >  kvm-all.c | 6 --
> >  1 file changed, 4 insertions(+), 2 deletions(-)
> > 
> > diff --git a/kvm-all.c b/kvm-all.c
> > index 1bc12737723c3..de9ff5971fb3b 100644
> > --- a/kvm-all.c
> > +++ b/kvm-all.c
> > @@ -45,8 +45,10 @@
> >  #include 
> >  #endif
> >  
> > -/* KVM uses PAGE_SIZE in its definition of COALESCED_MMIO_MAX */
> > -#define PAGE_SIZE TARGET_PAGE_SIZE
> > +/* KVM uses PAGE_SIZE in its definition of KVM_COALESCED_MMIO_MAX. We
> > + * need to use the real host PAGE_SIZE, as that's what KVM will use.
> > + */
> > +#define PAGE_SIZE getpagesize()
> >  
> >  //#define DEBUG_KVM
> >  
> > 
> 
> Is this a bugfix or just a cleanup?  If the former, on which targets?

It's a bugfix for any targets that have a TARGET_PAGE_SIZE !=
real-host-page-size. For example ARM has TARGET_PAGE_SIZE set to 1024,
even when the host is using 4k or 64k pages. However, I didn't find this
due to a bug, because on ARM I'm not using emulated devices that make
use of the coalesced-mmio feature at this time.

Thanks,
drew

> 
> Paolo
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [kvm-unit-tests PATCH 01/18] makefiles: use bash

2015-11-10 Thread Paolo Bonzini


On 06/11/2015 01:24, Andrew Jones wrote:
> Use bash in the makefiles, like we do in the scripts. Without
> this some platforms using dash fail to execute make targets
> that use bash-isms.
> 
> Signed-off-by: Andrew Jones 
> ---
>  Makefile | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/Makefile b/Makefile
> index 0d5933474cd8c..3e60b4f8e4a57 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -1,4 +1,6 @@
>  
> +SHELL := /bin/bash
> +
>  ifeq ($(wildcard config.mak),)
>  $(error run ./configure first. See ./configure -h)
>  endif
> 

Which bash-isms are actually present?

Paolo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] [PATCH] kvm-all: PAGE_SIZE should be real host page size

2015-11-10 Thread Andrew Jones
On Tue, Nov 10, 2015 at 04:29:31PM +, Peter Maydell wrote:
> On 10 November 2015 at 00:23, Andrew Jones  wrote:
> > Just noticed this while grepping TARGET_PAGE_SIZE for an unrelated
> > reason. I didn't use qemu_real_host_page_size as kvm_set_phys_mem()
> > does, because we'd need to make sure page_size_init() has run first.
> >
> > Signed-off-by: Andrew Jones 
> > ---
> >  kvm-all.c | 6 --
> >  1 file changed, 4 insertions(+), 2 deletions(-)
> >
> > diff --git a/kvm-all.c b/kvm-all.c
> > index 1bc12737723c3..de9ff5971fb3b 100644
> > --- a/kvm-all.c
> > +++ b/kvm-all.c
> > @@ -45,8 +45,10 @@
> >  #include 
> >  #endif
> >
> > -/* KVM uses PAGE_SIZE in its definition of COALESCED_MMIO_MAX */
> > -#define PAGE_SIZE TARGET_PAGE_SIZE
> > +/* KVM uses PAGE_SIZE in its definition of KVM_COALESCED_MMIO_MAX. We
> > + * need to use the real host PAGE_SIZE, as that's what KVM will use.
> > + */
> > +#define PAGE_SIZE getpagesize()
> 
> Rather than defining PAGE_SIZE here (a confusing macro given
> we have several page sizes to deal with), why not just use
> getpagesize() in the one and only location where we currently
> use this macro?

The macro is used by kernel headers that we import and include in
kvm-all.c. It's ugly, I agree, but that's how the this cookie crumbled.

> 
> Also, you're guaranteed that page_size_init() has been run, because
> we call that from kvm_init(), and you can't call kvm_vcpu_init()
> before kvm_init().

True, but having that dependency seemed error prone to me. If we
we some day changed when/if page_size_init is called then there
could be an issue, or if somebody did something like

kvm_init()
{
  my_page_size = PAGE_SIZE;
  ...
  page_size_init();
  ...
  use(my_page_size)
}

things would break.

Thanks,
drew
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3 0/3] virtio DMA API core stuff

2015-11-10 Thread Andy Lutomirski
On Nov 10, 2015 7:02 AM, "Michael S. Tsirkin"  wrote:
>
> On Sun, Nov 08, 2015 at 12:49:46PM +0100, Joerg Roedel wrote:
> > On Sun, Nov 08, 2015 at 12:37:47PM +0200, Michael S. Tsirkin wrote:
> > > I have no problem with that. For example, can we teach
> > > the DMA API on intel x86 to use PT for virtio by default?
> > > That would allow merging Andy's patches with
> > > full compatibility with old guests and hosts.
> >
> > Well, the only incompatibility comes from an experimental qemu feature,
> > more explicitly from a bug in that features implementation. So why
> > should we work around that in the kernel? I think it is not too hard to
> > fix qemu to generate a correct DMAR table which excludes the virtio
> > devices from iommu translation.
> >
> >
> >   Joerg
>
> It's not that easy - you'd have to dedicate some buses
> for iommu bypass, and teach management tools to only put
> virtio there - but it's possible.
>
> This will absolutely address guests that don't need to set up IOMMU for
> virtio devices, and virtio that bypasses the IOMMU.
>
> But the problem is that we do want to *allow* guests
> to set up IOMMU for virtio devices.
> In that case, these are two other usecases:
>
> A- monolitic virtio within QEMU:
> iommu only needed for VFIO ->
> guest should always use iommu=pt
> iommu=on works but is just useless overhead.
>
> B- modular out of process virtio outside QEMU:
> iommu needed for VFIO or kernel driver ->
> guest should use iommu=pt or iommu=on
> depending on security/performance requirements
>
> Note that there could easily be a mix of these in the same system.
>
> So for these cases we do need QEMU to specify to guest that IOMMU covers
> the virtio devices.  Also, once one does this, the default on linux is
> iommu=on and not pt, which works but ATM is very slow.
>
> This poses three problems:
>
> 1. How do we address the different needs of A and B?
>One way would be for virtio to pass the information to guest
>using some virtio specific way, and have drivers
>specify what kind of DMA access they want.
>
> 2. (Kind of a subset of 1) once we do allow IOMMU, how do we make sure most 
> guests
>use the more sensible iommu=pt.
>
> 3. Once we do allow IOMMU, how can we keep existing guests work in this 
> configuration?
>Creating different hypervisor configurations depending on guest is very 
> nasty.
>Again, one way would be some virtio specific interface.
>
> I'd rather we figured the answers to this before merging Andy's patches
> because I'm concerned that instead of 1 broken configuration
> (virtio always bypasses IOMMU) we'll get two bad configurations
> (in the second one, virtio uses the slow default with no
> gain in security).
>
> Suggestions wellcome.

I think there's still no downside of using my patches, even on x86.

Old kernels on new QEMU work unless IOMMU is enabled on the host.  I
think that's the best we can possibly do.

New kernels work at full speed on old QEMU.

New kernels with new QEMU and iommu enabled work slower.  Even newer
kernels with default passthrough work at full speed, and there's no
obvious downside to the existence of kernels with just my patches.

--Andy

>
> --
> MST
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v4 0/6] virtio core DMA API conversion

2015-11-10 Thread Andy Lutomirski
On Nov 10, 2015 2:38 AM, "Benjamin Herrenschmidt"
 wrote:
>
> On Mon, 2015-11-09 at 21:35 -0800, Andy Lutomirski wrote:
> >
> > We could do it the other way around: on powerpc, if a PCI device is in
> > that range and doesn't have the "bypass" property at all, then it's
> > assumed to bypass the IOMMU.  This means that everything that
> > currently works continues working.  If someone builds a physical
> > virtio device or uses another system in PCIe target mode speaking
> > virtio, then it won't work until they upgrade their firmware to set
> > bypass=0.  Meanwhile everyone using hypothetical new QEMU also gets
> > bypass=0 and no ambiguity.
> >
> > vfio will presumably notice the bypass and correctly refuse to map any
> > current virtio devices.
> >
> > Would that work?
>
> That would be extremely strange from a platform perspective. Any device
> in that vendor/device range would bypass the iommu unless some new
> property "actually-works-like-a-real-pci-device" happens to exist in
> the device-tree, which we would then need to define somewhere and
> handle accross at least 3 different platforms who get their device-tree
> from widly different places.
>
> Also if tomorrow I create a PCI device that implements virtio-net and
> put it in a machine running IBM proprietary firmware (or Apple's or
> Sun's), it won't have that property...
>
> This is not hypothetical. People are using virtio to do point-to-point
> communication between machines via PCIe today.

Does that work on powerpc on existing kernels?

Anyway, here's another crazy idea: make the quirk assume that the
IOMMU is bypasses if and only if the weak barriers bit is set on
systems that are missing the new DT binding.

--Andy

>
> Cheers,
> Ben.
>
>
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [patch v2] vfio/pci: make an array larger

2015-11-10 Thread Alex Williamson
On Mon, 2015-11-09 at 15:24 +0300, Dan Carpenter wrote:
> Smatch complains about a possible out of bounds error:
> 
>   drivers/vfio/pci/vfio_pci_config.c:1241 vfio_cap_init()
>   error: buffer overflow 'pci_cap_length' 20 <= 20
> 
> The problem is that pci_cap_length[] was defined as large enough to
> hold "PCI_CAP_ID_AF + 1" elements.  The code in vfio_cap_init() assumes
> it has PCI_CAP_ID_MAX + 1 elements.  Originally, PCI_CAP_ID_AF and
> PCI_CAP_ID_MAX were the same but then we introduced PCI_CAP_ID_EA in
> f80b0ba95964 ('PCI: Add Enhanced Allocation register entries') so now
> the array is too small.
> 
> Let's fix this by making the array size PCI_CAP_ID_MAX + 1.  And let's
> make a similar change to pci_ext_cap_length[] for consistency.  Also
> both these arrays can be made const.
> 
> Signed-off-by: Dan Carpenter 
> ---

Applied to next for v4.4.  Thanks!

Alex

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[GIT PULL] Second batch of KVM changes for 4.4

2015-11-10 Thread Paolo Bonzini
Linus,

The following changes since commit a3eaa8649e4c6a6afdafaa04b9114fb230617bb1:

  KVM: VMX: Fix commit which broke PML (2015-11-05 11:34:11 +0100)

are available in the git repository at:

  git://git.kernel.org/pub/scm/virt/kvm/kvm.git tags/for-linus

for you to fetch changes up to a96036b8ef7df9f10cd575c0d78359bd33188e8e:

  KVM: x86: rename update_db_bp_intercept to update_bp_intercept (2015-11-10 
12:06:25 +0100)


Three changes:

- x86: work around two nasty cases where a benign exception occurs while
another is being delivered.  The endless stream of exceptions causes an
infinite loop in the processor, which not even NMIs or SMIs can interrupt;
in the virt case, there is no possibility to exit to the host either.

- x86: support for Skylake per-guest TSC rate.  Long supported by AMD,
the patches mostly move things from there to common arch/x86/kvm/ code.

- generic: remove local_irq_save/restore from the guest entry and exit
paths when context tracking is enabled.  The patches are a few months
old, but we discussed them again at kernel summit.  Andy will pick up
from here and, in 4.5, try to remove it from the user entry/exit paths.


Eric Northup (1):
  KVM: x86: work around infinite loop in microcode when #AC is delivered

Haozhong Zhang (12):
  KVM: x86: Collect information for setting TSC scaling ratio
  KVM: x86: Add a common TSC scaling ratio field in kvm_vcpu_arch
  KVM: x86: Add a common TSC scaling function
  KVM: x86: Replace call-back set_tsc_khz() with a common function
  KVM: x86: Replace call-back compute_tsc_offset() with a common function
  KVM: x86: Move TSC scaling logic out of call-back adjust_tsc_offset()
  KVM: x86: Move TSC scaling logic out of call-back read_l1_tsc()
  KVM: x86: Use the correct vcpu's TSC rate to compute time scale
  KVM: VMX: Enable and initialize VMX TSC scaling
  KVM: VMX: Setup TSC scaling ratio when a vcpu is loaded
  KVM: VMX: Use a scaled host TSC for guest readings of MSR_IA32_TSC
  KVM: VMX: Dump TSC multiplier in dump_vmcs()

Paolo Bonzini (6):
  KVM: x86: merge handle_mmio_page_fault and handle_mmio_page_fault_common
  KVM: x86: declare a few variables as __read_mostly
  context_tracking: remove duplicate enabled check
  context_tracking: avoid irq_save/irq_restore on guest entry and exit
  KVM: svm: unconditionally intercept #DB
  KVM: x86: rename update_db_bp_intercept to update_bp_intercept

 arch/x86/include/asm/kvm_host.h  |  27 +++
 arch/x86/include/asm/vmx.h   |   3 +
 arch/x86/include/uapi/asm/svm.h  |   1 +
 arch/x86/kvm/lapic.c |   4 +-
 arch/x86/kvm/mmu.c   |  20 ++---
 arch/x86/kvm/mmu.h   |   6 +-
 arch/x86/kvm/paging_tmpl.h   |   3 +-
 arch/x86/kvm/svm.c   | 140 ++
 arch/x86/kvm/vmx.c   |  71 +
 arch/x86/kvm/x86.c   | 159 +--
 include/linux/context_tracking.h |  12 ++-
 include/linux/kvm_host.h |   1 +
 include/linux/math64.h   |  80 
 kernel/context_tracking.c|  80 ++--
 14 files changed, 352 insertions(+), 255 deletions(-)
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v4 0/6] virtio core DMA API conversion

2015-11-10 Thread Benjamin Herrenschmidt
On Tue, 2015-11-10 at 11:27 +0100, Joerg Roedel wrote:
> 
> You have the same problem when real PCIe devices appear that speak
> virtio. I think the only real (still not very nice) solution is to add a
> quirk to powerpc platform code that sets noop dma-ops for the existing
> virtio vendor/device-ids and add a DT property to opt-out of that quirk.
>
> New vendor/device-ids (as for real devices) would just not be covered by
> the quirk and existing emulated devices continue to work.

Why woud real devices use new vendor/device IDs ? Also there are other
cases such as using virtio between 2 partitions, which we could do
under PowerVM ... that would require proper iommu usage with existing
IDs.

> The absence of the property just means that the quirk is in place and
> the system assumes no translation for virtio devices.

The only way that works forward for me (and possibly sparc & others,
what about ARM ?) is if we *change* something in virtio qemu at the
same time as we add some kind of property. For example the ProgIf field
or revision ID field.

That way I can key on that change.

It's still tricky because I would have to somewhat tell my various firmwares
(SLOF, OpenBIOS, OPAL, ...) so they can create the appropriate property, it's
still hacky, but it would be workable.

Ben.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v4 0/6] virtio core DMA API conversion

2015-11-10 Thread Benjamin Herrenschmidt
On Tue, 2015-11-10 at 14:43 +0200, Michael S. Tsirkin wrote:
> But not virtio-pci I think - that's broken for that usecase since we use
> weaker barriers than required for real IO, as these have measureable
> overhead.  We could have a feature "is a real PCI device",
> that's completely reasonable.

Do we use weaker barriers on the Linux driver side ? I didn't think so
... 

Cheers,
Ben.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v4 0/6] virtio core DMA API conversion

2015-11-10 Thread Benjamin Herrenschmidt
On Tue, 2015-11-10 at 10:54 -0800, Andy Lutomirski wrote:
> 
> Does that work on powerpc on existing kernels?
> 
> Anyway, here's another crazy idea: make the quirk assume that the
> IOMMU is bypasses if and only if the weak barriers bit is set on
> systems that are missing the new DT binding.

"New DT bindings" doesn't mean much ... how do we change DT bindings on
existing machines with a FW in flash ?

What about partition <-> partition virtio such as what we could do on
PAPR systems. That would have the weak barrier bit.

Ben.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v4 0/6] virtio core DMA API conversion

2015-11-10 Thread Andy Lutomirski
On Tue, Nov 10, 2015 at 2:27 PM, Benjamin Herrenschmidt
 wrote:
> On Tue, 2015-11-10 at 10:54 -0800, Andy Lutomirski wrote:
>>
>> Does that work on powerpc on existing kernels?
>>
>> Anyway, here's another crazy idea: make the quirk assume that the
>> IOMMU is bypasses if and only if the weak barriers bit is set on
>> systems that are missing the new DT binding.
>
> "New DT bindings" doesn't mean much ... how do we change DT bindings on
> existing machines with a FW in flash ?
>
> What about partition <-> partition virtio such as what we could do on
> PAPR systems. That would have the weak barrier bit.
>

Is it partition <-> partition, bypassing IOMMU?

I think I'd settle for just something that doesn't regress
non-experimental setups that actually work today and that allow new
setups (x86 with fixed QEMU and maybe something more complicated on
powerpc and/or sparc) to work in all cases.

We could certainly just make powerpc and sparc continue bypassing the
IOMMU until someone comes up with a way to fix it.  I'll send out some
patches that do that, and maybe that'll help this make progress.

--Andy
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] virtio_ring: Shadow available ring flags & index

2015-11-10 Thread Venkatesh Srinivas
Improves cacheline transfer flow of available ring header.

Virtqueues are implemented as a pair of rings, one producer->consumer
avail ring and one consumer->producer used ring; preceding the
avail ring in memory are two contiguous u16 fields -- avail->flags
and avail->idx. A producer posts work by writing to avail->idx and
a consumer reads avail->idx.

The flags and idx fields only need to be written by a producer CPU
and only read by a consumer CPU; when the producer and consumer are
running on different CPUs and the virtio_ring code is structured to
only have source writes/sink reads, we can continuously transfer the
avail header cacheline between 'M' states between cores. This flow
optimizes core -> core bandwidth on certain CPUs.

(see: "Software Optimization Guide for AMD Family 15h Processors",
Section 11.6; similar language appears in the 10h guide and should
apply to CPUs w/ exclusive caches, using LLC as a transfer cache)

Unfortunately the existing virtio_ring code issued reads to the
avail->idx and read-modify-writes to avail->flags on the producer.

This change shadows the flags and index fields in producer memory;
the vring code now reads from the shadows and only ever writes to
avail->flags and avail->idx, allowing the cacheline to transfer
core -> core optimally.

In a concurrent version of vring_bench, the time required for
10,000,000 buffer checkout/returns was reduced by ~2% (average
across many runs) on an AMD Piledriver (15h) CPU:

(w/o shadowing):
 Performance counter stats for './vring_bench':
 5,451,082,016  L1-dcache-loads
 ...
   2.221477739 seconds time elapsed

(w/ shadowing):
 Performance counter stats for './vring_bench':
 5,405,701,361  L1-dcache-loads
 ...
   2.168405376 seconds time elapsed

The further away (in a NUMA sense) virtio producers and consumers are
from each other, the more we expect to benefit. Physical implementations
of virtio devices and implementations of virtio where the consumer polls
vring avail indexes (vhost) should also benefit.

Signed-off-by: Venkatesh Srinivas 
---
 drivers/virtio/virtio_ring.c | 46 
 1 file changed, 34 insertions(+), 12 deletions(-)

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index 096b857..6262015 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -80,6 +80,12 @@ struct vring_virtqueue {
/* Last used index we've seen. */
u16 last_used_idx;
 
+   /* Last written value to avail->flags */
+   u16 avail_flags_shadow;
+
+   /* Last written value to avail->idx in guest byte order */
+   u16 avail_idx_shadow;
+
/* How to notify other side. FIXME: commonalize hcalls! */
bool (*notify)(struct virtqueue *vq);
 
@@ -235,13 +241,14 @@ static inline int virtqueue_add(struct virtqueue *_vq,
 
/* Put entry in available array (but don't update avail->idx until they
 * do sync). */
-   avail = virtio16_to_cpu(_vq->vdev, vq->vring.avail->idx) & 
(vq->vring.num - 1);
+   avail = vq->avail_idx_shadow & (vq->vring.num - 1);
vq->vring.avail->ring[avail] = cpu_to_virtio16(_vq->vdev, head);
 
/* Descriptors and available array need to be set before we expose the
 * new available array entries. */
virtio_wmb(vq->weak_barriers);
-   vq->vring.avail->idx = cpu_to_virtio16(_vq->vdev, 
virtio16_to_cpu(_vq->vdev, vq->vring.avail->idx) + 1);
+   vq->avail_idx_shadow++;
+   vq->vring.avail->idx = cpu_to_virtio16(_vq->vdev, vq->avail_idx_shadow);
vq->num_added++;
 
pr_debug("Added buffer head %i to %p\n", head, vq);
@@ -354,8 +361,8 @@ bool virtqueue_kick_prepare(struct virtqueue *_vq)
 * event. */
virtio_mb(vq->weak_barriers);
 
-   old = virtio16_to_cpu(_vq->vdev, vq->vring.avail->idx) - vq->num_added;
-   new = virtio16_to_cpu(_vq->vdev, vq->vring.avail->idx);
+   old = vq->avail_idx_shadow - vq->num_added;
+   new = vq->avail_idx_shadow;
vq->num_added = 0;
 
 #ifdef DEBUG
@@ -510,7 +517,7 @@ void *virtqueue_get_buf(struct virtqueue *_vq, unsigned int 
*len)
/* If we expect an interrupt for the next entry, tell host
 * by writing event index and flush out the write before
 * the read in the next get_buf call. */
-   if (!(vq->vring.avail->flags & cpu_to_virtio16(_vq->vdev, 
VRING_AVAIL_F_NO_INTERRUPT))) {
+   if (!(vq->avail_flags_shadow & VRING_AVAIL_F_NO_INTERRUPT)) {
vring_used_event(>vring) = cpu_to_virtio16(_vq->vdev, 
vq->last_used_idx);
virtio_mb(vq->weak_barriers);
}
@@ -537,7 +544,11 @@ void virtqueue_disable_cb(struct virtqueue *_vq)
 {
struct vring_virtqueue *vq = to_vvq(_vq);
 
-   vq->vring.avail->flags |= cpu_to_virtio16(_vq->vdev, 
VRING_AVAIL_F_NO_INTERRUPT);
+   if (!(vq->avail_flags_shadow & VRING_AVAIL_F_NO_INTERRUPT)) {
+   

Re: [PATCH 5/5] KVM: x86: MMU: Consolidate WARN_ON/BUG_ON checks for reverse-mapped sptes

2015-11-10 Thread Takuya Yoshikawa

On 2015/11/09 19:14, Paolo Bonzini wrote:

Can you also change kvm_mmu_mark_parents_unsync to use
for_each_rmap_spte instead of pte_list_walk?  It is the last use of
pte_list_walk, and it's nice if we have two uses of for_each_rmap_spte
with parent_ptes as the argument.


No problem, I will do.

Since parent_ptes is also explained as the "reverse mapping" list of
parent sptes (in mmu.txt and kvm_host.h), using rmap helpers will not
look so strange.


BTW, on my todo list is to change the rmap items to a struct (with a
single u64 inside) for type safety.  Since you are touching this code,
perhaps you can give it a shot?


Yes, almost done here (assuming that you mean 'unsigned long').
But I have some candidates for its name in mind:

1. struct kvm_rmap { unsigned long val; };
2. struct kvm_rmap_head { unsigned long val; };
3. struct kvm_rmap_list_head { unsigned long val; };
4. struct kvm_spte_list_head { unsigned long val; };

Since this is the head of the reverse mapping list of sptes, I thought
name 3 might be the best and first made a patch with it, but it was
a bit longer than I had hoped it to be.

I have changed it to name 2, and it looks a bit nicer now, but even
shorter, e.g. name 1, may be good as well.

Do you have any preference?

  Takuya

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 5/5] KVM: x86: MMU: Consolidate WARN_ON/BUG_ON checks for reverse-mapped sptes

2015-11-10 Thread Paolo Bonzini


On 10/11/2015 10:05, Takuya Yoshikawa wrote:
> 
> 
>> BTW, on my todo list is to change the rmap items to a struct (with a
>> single u64 inside) for type safety.  Since you are touching this code,
>> perhaps you can give it a shot?
> 
> Yes, almost done here (assuming that you mean 'unsigned long').

Yes, thanks!

> But I have some candidates for its name in mind:
> 
> 1. struct kvm_rmap { unsigned long val; };
> 2. struct kvm_rmap_head { unsigned long val; };
> 3. struct kvm_rmap_list_head { unsigned long val; };
> 4. struct kvm_spte_list_head { unsigned long val; };
> 
> Since this is the head of the reverse mapping list of sptes, I thought
> name 3 might be the best and first made a patch with it, but it was
> a bit longer than I had hoped it to be.
> 
> I have changed it to name 2, and it looks a bit nicer now, but even
> shorter, e.g. name 1, may be good as well.
> 
> Do you have any preference?

I like kvm_rmap_head.

Paolo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 6/9] KVM, pkeys: add pkeys support for permission_fault logic

2015-11-10 Thread Han, Huaitong
On Mon, 2015-11-09 at 14:17 +0100, Paolo Bonzini wrote:

> > >  static inline bool permission_fault(struct kvm_vcpu *vcpu,
> > > struct kvm_mmu *mmu,
> > > - unsigned pte_access,
> > > unsigned pfec)
> > > + unsigned pte_access, unsigned pte_pkeys,
> > > unsigned pfec)
> > >  {
> > > - int cpl = kvm_x86_ops->get_cpl(vcpu);
> > > - unsigned long rflags = kvm_x86_ops->get_rflags(vcpu);
> > > + unsigned long smap, rflags;
> > > + u32 pkru;
> > > + int cpl, index;
> > > + bool wf, uf, pk, pkru_ad, pkru_wd;
> > > +
> > > + cpl = kvm_x86_ops->get_cpl(vcpu);
> > > + rflags = kvm_x86_ops->get_rflags(vcpu);
> > > +
> > > + pkru = read_pkru();
> > > +
> > > + /*
> > > + * PKRU defines 32 bits, there are 16 domains and 2
> > > attribute bits per
> > > + * domain in pkru, pkey is index to a defined domain, so
> > > the value of
> > > + * pkey * PKRU_ATTRS + R/W is offset of a defined domain
> > > attribute.
> > > + */
> > > + pkru_ad = (pkru >> (pte_pkeys * PKRU_ATTRS + PKRU_READ))
> > > & 1;
> > > + pkru_wd = (pkru >> (pte_pkeys * PKRU_ATTRS +
> > > PKRU_WRITE)) & 1;
> > > +
> > > + wf = pfec & PFERR_WRITE_MASK;
> > > + uf = pfec & PFERR_USER_MASK;
> > > +
> > > + /*
> > > + * PKeys 2nd and 6th conditions:
> > > + * 2.EFER_LMA=1
> > > + * 6.PKRU.AD=1
> > > + *   or The access is a data write and PKRU.WD=1 and
> > > + *   either CR0.WP=1 or it is a user mode
> > > access
> > > + */
> > > + pk = is_long_mode(vcpu) && (pkru_ad ||
> > > + (pkru_wd && wf &&
> > > (is_write_protection(vcpu) || uf)));
> 
> A little more optimized:
> 
>   pkru_bits = (pkru >> (pte_pkeys * PKRU_ATTRS)) & 3;
> 
>   /*
>* Ignore PKRU.WD if not relevant to this access (a read,
>* or a supervisor mode access if CR0.WP=0).
>*/
>   if (!wf || (!uf && !is_write_protection(vcpu)))
>   pkru_bits &= ~(1 << PKRU_WRITE);
> 
> ... and then just check pkru_bits != 0.
> 
> > > + /*
> > > + * PK bit right value in pfec equal to
> > > + * PK bit current value in pfec and pk value.
> > > + */
> > > + pfec &= (pk << PFERR_PK_BIT) + ~PFERR_PK_MASK;
> > 
> > PK is only applicable to guest page tables, but if you do not
> > support
> > PKRU without EPT (patch 9), none of this is necessary, is it?
> 
> Doh. :(  Sorry, this is of course needed for the emulation case.
> 
> I think you should optimize this for the common case where pkru is
> zero,
> hence pk will always be zero.  So something like
> 
>   pkru = is_long_mode(vcpu) ? read_pkru() : 0;
>   if (unlikely(pkru) && (pfec & PFERR_PK_MASK)) {
>   ... from above ... */
> 
>   /* Flip PFERR_PK_MASK if pkru_bits is non-zero */
>   pfec ^= -pkru_bits & PFERR_PK_MASK;

If pkru_bits is zero, it means dynamically conditions is not met for
protection-key violations, so pfec on PK bit should be flipped. So I
guess it should be:
pfec ^= pkru_bits ? 0 : PFERR_PK_MASK;

>   }
> 
> Paolo
> 


Re: [PATCH 6/9] KVM, pkeys: add pkeys support for permission_fault logic

2015-11-10 Thread Paolo Bonzini


On 10/11/2015 10:28, Han, Huaitong wrote:
> > pkru = is_long_mode(vcpu) ? read_pkru() : 0;
> > if (unlikely(pkru) && (pfec & PFERR_PK_MASK)) {
> > ... from above ... */
> > 
> > /* Flip PFERR_PK_MASK if pkru_bits is non-zero */
> > pfec ^= -pkru_bits & PFERR_PK_MASK;
> 
> If pkru_bits is zero, it means dynamically conditions is not met for
> protection-key violations, so pfec on PK bit should be flipped. So I
> guess it should be:
>   pfec ^= pkru_bits ? 0 : PFERR_PK_MASK;

Right.

Paolo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v4 0/6] virtio core DMA API conversion

2015-11-10 Thread Knut Omang
On Tue, 2015-11-10 at 13:04 +1100, Benjamin Herrenschmidt wrote:
> On Mon, 2015-11-09 at 16:46 -0800, Andy Lutomirski wrote:
> > The problem here is that in some of the problematic cases the
> > virtio
> > driver may not even be loaded.  If someone runs an L1 guest with an
> > IOMMU-bypassing virtio device and assigns it to L2 using vfio, then
> > *boom* L1 crashes.  (Same if, say, DPDK gets used, I think.)
> > 
> > > 
> > > The only way out of this while keeping the "platform" stuff would
> > > be to
> > > also bump some kind of version in the virtio config (or PCI
> > > header). I
> > > have no other way to differenciate between "this is an old qemu
> > > that
> > > doesn't do the 'bypass property' yet" from "this is a virtio
> > > device
> > > that doesn't bypass".
> > > 
> > > Any better idea ?
> > 
> > I'd suggest that, in the absence of the new DT binding, we assume
> > that
> > any PCI device with the virtio vendor ID is passthrough on powerpc.
> >   I
> > can do this in the virtio driver, but if it's in the platform code
> > then vfio gets it right too (i.e. fails to load).
> 
> The problem is there isn't *a* virtio vendor ID. It's the RedHat
> vendor
> ID which will be used by more than just virtio, so we need to
> specifically list the devices.
> 
> Additionally, that still means that once we have a virtio device that
> actually uses the iommu, powerpc will not work since the "workaround"
> above will kick in.
> 
> The "in absence of the new DT binding" doesn't make that much sense.
> 
> Those platforms use device-trees defined since the dawn of ages by
> actual open firmware implementations, they either have no iommu
> representation in there (Macs, the platform code hooks it all up) or
> have various properties related to the iommu but no concept of
> "bypass"
> in there.
> 
> We can *add* a new property under some circumstances that indicates a
> bypass on a per-device basis, however that doesn't completely solve
> it:
> 
>   - As I said above, what does the absence of that property mean ? An
> old qemu that does bypass on all virtio or a new qemu trying to tell
> you that the virtio device actually does use the iommu (or some other
> environment that isn't qemu) ?
> 
>   - On things like macs, the device-tree is generated by openbios, it
> would have to have some added logic to try to figure that out, which
> means it needs to know *via different means* that some or all virtio
> devices bypass the iommu.
> 
> I thus go back to my original statement, it's a LOT easier to handle
> if
> the device itself is self describing, indicating whether it is set to
> bypass a host iommu or not. For L1->L2, well, that wouldn't be the
> first time qemu/VFIO plays tricks with the passed through device
> configuration space...
> 
> Note that the above can be solved via some kind of compromise: The
> device self describes the ability to honor the iommu, along with the
> property (or ACPI table entry) that indicates whether or not it does.
> 
> IE. We could use the revision or ProgIf field of the config space for
> example. Or something in virtio config. If it's an "old" device, we
> know it always bypass. If it's a new device, we know it only bypasses
> if the corresponding property is in. I still would have to sort out
> the
> openbios case for mac among others but it's at least a workable
> direction.
> 
> BTW. Don't you have a similar problem on x86 that today qemu claims
> that everything honors the iommu in ACPI ?
> 
> Unless somebody can come up with a better idea...

Can something be done by means of PCIe capabilities?
ATS (Address Translation Support) seems like a natural choice?

Knut

> Cheers,
> Ben.
> 
> --
> To unsubscribe from this list: send the line "unsubscribe sparclinux"
> in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2] ARM/arm64: KVM: test properly for a PTE's uncachedness

2015-11-10 Thread Ard Biesheuvel
(adding lists)

On 10 November 2015 at 10:45, Ard Biesheuvel  wrote:
> Hi all,
>
> I wonder if this is a better way to address the problem. It looks at
> the nature of the memory rather than the nature of the mapping, which
> is probably a more reliable indicator of whether cache maintenance is
> required when performing the unmap.
>
>
> ---8<
> The open coded tests for checking whether a PTE maps a page as
> uncached use a flawed 'pte_val(xxx) & CONST != CONST' pattern,
> which is not guaranteed to work since the type of a mapping is
> not a set of mutually exclusive bits
>
> For HYP mappings, the type is an index into the MAIR table (i.e, the
> index itself does not contain any information whatsoever about the
> type of the mapping), and for stage-2 mappings it is a bit field where
> normal memory and device types are defined as follows:
>
> #define MT_S2_NORMAL0xf
> #define MT_S2_DEVICE_nGnRE  0x1
>
> I.e., masking *and* comparing with the latter matches on the former,
> and we have been getting lucky merely because the S2 device mappings
> also have the PTE_UXN bit set, or we would misidentify memory mappings
> as device mappings.
>
> Since the unmap_range() code path (which contains one instance of the
> flawed test) is used both for HYP mappings and stage-2 mappings, and
> considering the difference between the two, it is non-trivial to fix
> this by rewriting the tests in place, as it would involve passing
> down the type of mapping through all the functions.
>
> However, since HYP mappings and stage-2 mappings both deal with host
> physical addresses, we can simply check whether the mapping is backed
> by memory that is managed by the host kernel, and only perform the
> D-cache maintenance if this is the case.
>
> Signed-off-by: Ard Biesheuvel 
> ---
>  arch/arm/kvm/mmu.c | 15 +++
>  1 file changed, 7 insertions(+), 8 deletions(-)
>
> diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
> index 6984342da13d..7dace909d5cf 100644
> --- a/arch/arm/kvm/mmu.c
> +++ b/arch/arm/kvm/mmu.c
> @@ -98,6 +98,11 @@ static void kvm_flush_dcache_pud(pud_t pud)
> __kvm_flush_dcache_pud(pud);
>  }
>
> +static bool kvm_is_device_pfn(unsigned long pfn)
> +{
> +   return !pfn_valid(pfn);
> +}
> +
>  /**
>   * stage2_dissolve_pmd() - clear and flush huge PMD entry
>   * @kvm:   pointer to kvm structure.
> @@ -213,7 +218,7 @@ static void unmap_ptes(struct kvm *kvm, pmd_t *pmd,
> kvm_tlb_flush_vmid_ipa(kvm, addr);
>
> /* No need to invalidate the cache for device 
> mappings */
> -   if ((pte_val(old_pte) & PAGE_S2_DEVICE) != 
> PAGE_S2_DEVICE)
> +   if (!kvm_is_device_pfn(__phys_to_pfn(addr)))
> kvm_flush_dcache_pte(old_pte);
>
> put_page(virt_to_page(pte));
> @@ -305,8 +310,7 @@ static void stage2_flush_ptes(struct kvm *kvm, pmd_t *pmd,
>
> pte = pte_offset_kernel(pmd, addr);
> do {
> -   if (!pte_none(*pte) &&
> -   (pte_val(*pte) & PAGE_S2_DEVICE) != PAGE_S2_DEVICE)
> +   if (!pte_none(*pte) && 
> !kvm_is_device_pfn(__phys_to_pfn(addr)))
> kvm_flush_dcache_pte(*pte);
> } while (pte++, addr += PAGE_SIZE, addr != end);
>  }
> @@ -1037,11 +1041,6 @@ static bool kvm_is_write_fault(struct kvm_vcpu *vcpu)
> return kvm_vcpu_dabt_iswrite(vcpu);
>  }
>
> -static bool kvm_is_device_pfn(unsigned long pfn)
> -{
> -   return !pfn_valid(pfn);
> -}
> -
>  /**
>   * stage2_wp_ptes - write protect PMD range
>   * @pmd:   pointer to pmd entry
> --
> 1.9.1
>
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v4 0/6] virtio core DMA API conversion

2015-11-10 Thread Benjamin Herrenschmidt
On Mon, 2015-11-09 at 21:35 -0800, Andy Lutomirski wrote:
> 
> We could do it the other way around: on powerpc, if a PCI device is in
> that range and doesn't have the "bypass" property at all, then it's
> assumed to bypass the IOMMU.  This means that everything that
> currently works continues working.  If someone builds a physical
> virtio device or uses another system in PCIe target mode speaking
> virtio, then it won't work until they upgrade their firmware to set
> bypass=0.  Meanwhile everyone using hypothetical new QEMU also gets
> bypass=0 and no ambiguity.
>
> vfio will presumably notice the bypass and correctly refuse to map any
> current virtio devices.
> 
> Would that work?

That would be extremely strange from a platform perspective. Any device
in that vendor/device range would bypass the iommu unless some new
property "actually-works-like-a-real-pci-device" happens to exist in
the device-tree, which we would then need to define somewhere and
handle accross at least 3 different platforms who get their device-tree 
from widly different places.

Also if tomorrow I create a PCI device that implements virtio-net and
put it in a machine running IBM proprietary firmware (or Apple's or
Sun's), it won't have that property...

This is not hypothetical. People are using virtio to do point-to-point
communication between machines via PCIe today.

Cheers,
Ben.


--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v4 0/6] virtio core DMA API conversion

2015-11-10 Thread Joerg Roedel
On Tue, Nov 10, 2015 at 01:04:36PM +1100, Benjamin Herrenschmidt wrote:
> The "in absence of the new DT binding" doesn't make that much sense.
> 
> Those platforms use device-trees defined since the dawn of ages by
> actual open firmware implementations, they either have no iommu
> representation in there (Macs, the platform code hooks it all up) or
> have various properties related to the iommu but no concept of "bypass"
> in there.
> 
> We can *add* a new property under some circumstances that indicates a
> bypass on a per-device basis, however that doesn't completely solve it:
> 
>   - As I said above, what does the absence of that property mean ? An
> old qemu that does bypass on all virtio or a new qemu trying to tell
> you that the virtio device actually does use the iommu (or some other
> environment that isn't qemu) ?

You have the same problem when real PCIe devices appear that speak
virtio. I think the only real (still not very nice) solution is to add a
quirk to powerpc platform code that sets noop dma-ops for the existing
virtio vendor/device-ids and add a DT property to opt-out of that quirk.

New vendor/device-ids (as for real devices) would just not be covered by
the quirk and existing emulated devices continue to work.

The absence of the property just means that the quirk is in place and
the system assumes no translation for virtio devices.


Joerg

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH v2] ARM/arm64: KVM: test properly for a PTE's uncachedness

2015-11-10 Thread Pavel Fedin
 Hello!

 Tested-by: Pavel Fedin 

 Personally i have a small concern about this way of testing. I know many ports 
of the kernel to proprietary systems, and they tend to have drivers which just 
deal with hardcoded physical memory regions on their own, without even 
registering them in the kernel.
 OTOH:
 1. KVM is not meant to be hacked this way as far as i can understand.
 2. Maintainers, i believe, would say: "Then all problems are problems of 
authors of those ports".
 3. Actually, this does not invent anything new, but reuses the approach being 
already used in other parts of the code. And this part is what personally i 
like.

Kind regards,
Pavel Fedin
Expert Engineer
Samsung Electronics Research center Russia

> -Original Message-
> From: Ard Biesheuvel [mailto:ard.biesheu...@linaro.org]
> Sent: Tuesday, November 10, 2015 12:48 PM
> To: Christoffer Dall; Marc Zyngier; KVM devel mailing list; 
> kvm...@lists.cs.columbia.edu
> Cc: p.fe...@samsung.com; Ard Biesheuvel
> Subject: Re: [PATCH v2] ARM/arm64: KVM: test properly for a PTE's uncachedness
> 
> (adding lists)
> 
> On 10 November 2015 at 10:45, Ard Biesheuvel  
> wrote:
> > Hi all,
> >
> > I wonder if this is a better way to address the problem. It looks at
> > the nature of the memory rather than the nature of the mapping, which
> > is probably a more reliable indicator of whether cache maintenance is
> > required when performing the unmap.
> >
> >
> > ---8<
> > The open coded tests for checking whether a PTE maps a page as
> > uncached use a flawed 'pte_val(xxx) & CONST != CONST' pattern,
> > which is not guaranteed to work since the type of a mapping is
> > not a set of mutually exclusive bits
> >
> > For HYP mappings, the type is an index into the MAIR table (i.e, the
> > index itself does not contain any information whatsoever about the
> > type of the mapping), and for stage-2 mappings it is a bit field where
> > normal memory and device types are defined as follows:
> >
> > #define MT_S2_NORMAL0xf
> > #define MT_S2_DEVICE_nGnRE  0x1
> >
> > I.e., masking *and* comparing with the latter matches on the former,
> > and we have been getting lucky merely because the S2 device mappings
> > also have the PTE_UXN bit set, or we would misidentify memory mappings
> > as device mappings.
> >
> > Since the unmap_range() code path (which contains one instance of the
> > flawed test) is used both for HYP mappings and stage-2 mappings, and
> > considering the difference between the two, it is non-trivial to fix
> > this by rewriting the tests in place, as it would involve passing
> > down the type of mapping through all the functions.
> >
> > However, since HYP mappings and stage-2 mappings both deal with host
> > physical addresses, we can simply check whether the mapping is backed
> > by memory that is managed by the host kernel, and only perform the
> > D-cache maintenance if this is the case.
> >
> > Signed-off-by: Ard Biesheuvel 
> > ---
> >  arch/arm/kvm/mmu.c | 15 +++
> >  1 file changed, 7 insertions(+), 8 deletions(-)
> >
> > diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
> > index 6984342da13d..7dace909d5cf 100644
> > --- a/arch/arm/kvm/mmu.c
> > +++ b/arch/arm/kvm/mmu.c
> > @@ -98,6 +98,11 @@ static void kvm_flush_dcache_pud(pud_t pud)
> > __kvm_flush_dcache_pud(pud);
> >  }
> >
> > +static bool kvm_is_device_pfn(unsigned long pfn)
> > +{
> > +   return !pfn_valid(pfn);
> > +}
> > +
> >  /**
> >   * stage2_dissolve_pmd() - clear and flush huge PMD entry
> >   * @kvm:   pointer to kvm structure.
> > @@ -213,7 +218,7 @@ static void unmap_ptes(struct kvm *kvm, pmd_t *pmd,
> > kvm_tlb_flush_vmid_ipa(kvm, addr);
> >
> > /* No need to invalidate the cache for device 
> > mappings */
> > -   if ((pte_val(old_pte) & PAGE_S2_DEVICE) != 
> > PAGE_S2_DEVICE)
> > +   if (!kvm_is_device_pfn(__phys_to_pfn(addr)))
> > kvm_flush_dcache_pte(old_pte);
> >
> > put_page(virt_to_page(pte));
> > @@ -305,8 +310,7 @@ static void stage2_flush_ptes(struct kvm *kvm, pmd_t 
> > *pmd,
> >
> > pte = pte_offset_kernel(pmd, addr);
> > do {
> > -   if (!pte_none(*pte) &&
> > -   (pte_val(*pte) & PAGE_S2_DEVICE) != PAGE_S2_DEVICE)
> > +   if (!pte_none(*pte) && 
> > !kvm_is_device_pfn(__phys_to_pfn(addr)))
> > kvm_flush_dcache_pte(*pte);
> > } while (pte++, addr += PAGE_SIZE, addr != end);
> >  }
> > @@ -1037,11 +1041,6 @@ static bool kvm_is_write_fault(struct kvm_vcpu *vcpu)
> > return kvm_vcpu_dabt_iswrite(vcpu);
> >  }
> >
> > -static bool kvm_is_device_pfn(unsigned long pfn)
> > -{
> > -   return !pfn_valid(pfn);
> > -}
> > -
> >  /**
> >   * stage2_wp_ptes - write 

Re: [PATCH v4 0/6] virtio core DMA API conversion

2015-11-10 Thread Jan Kiszka
On 2015-11-10 03:18, Andy Lutomirski wrote:
> On Mon, Nov 9, 2015 at 6:04 PM, Benjamin Herrenschmidt
>> I thus go back to my original statement, it's a LOT easier to handle if
>> the device itself is self describing, indicating whether it is set to
>> bypass a host iommu or not. For L1->L2, well, that wouldn't be the
>> first time qemu/VFIO plays tricks with the passed through device
>> configuration space...
> 
> Which leaves the special case of Xen, where even preexisting devices
> don't bypass the IOMMU.  Can we keep this specific to powerpc and
> sparc?  On x86, this problem is basically nonexistent, since the IOMMU
> is properly self-describing.
> 
> IOW, I think that on x86 we should assume that all virtio devices
> honor the IOMMU.

>From the guest driver POV, that is OK because either there is no IOMMU
to program (the current situation with qemu), there can be one that
doesn't need it (the current situation with qemu and iommu=on) or there
is (Xen) or will be (future qemu) one that requires it.

> 
>>
>> Note that the above can be solved via some kind of compromise: The
>> device self describes the ability to honor the iommu, along with the
>> property (or ACPI table entry) that indicates whether or not it does.
>>
>> IE. We could use the revision or ProgIf field of the config space for
>> example. Or something in virtio config. If it's an "old" device, we
>> know it always bypass. If it's a new device, we know it only bypasses
>> if the corresponding property is in. I still would have to sort out the
>> openbios case for mac among others but it's at least a workable
>> direction.
>>
>> BTW. Don't you have a similar problem on x86 that today qemu claims
>> that everything honors the iommu in ACPI ?
> 
> Only on a single experimental configuration, and that can apparently
> just be fixed going forward without any real problems being caused.

BTW, I once tried to describe the current situation on QEMU x86 with
IOMMU enabled via ACPI. While you can easily add IOMMU device exceptions
to the static tables, the fun starts when considering device hotplug for
virtio. Unless I missed some trick, ACPI doesn't seem like being
designed for that level of flexibility.

You would have to reserve a complete PCI bus, declare that one as not
being IOMMU-governed, and then only add new virtio devices to that bus.
Possible, but a lot of restrictions that existing management software
would have to be aware of as well.

Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SES-DE
Corporate Competence Center Embedded Linux
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v4 3/5] kvm/x86: per-vcpu apicv deactivation support

2015-11-10 Thread Andrey Smetanin
The decision on whether to use hardware APIC virtualization used to be
taken globally, based on the availability of the feature in the CPU
and the value of a module parameter.

However, under certain circumstances we want to control it on per-vcpu
basis.  In particular, when the userspace activates HyperV synthetic
interrupt controller (SynIC), APICv has to be disabled as it's
incompatible with SynIC auto-EOI behavior.

To achieve that, introduce 'apicv_active' flag on struct
kvm_vcpu_arch, and kvm_vcpu_deactivate_apicv() function to turn APICv
off.  The flag is initialized based on the module parameter and CPU
capability, and consulted whenever an APICv-specific action is
performed.

Signed-off-by: Andrey Smetanin 
Reviewed-by: Roman Kagan 
Signed-off-by: Denis V. Lunev 
CC: Gleb Natapov 
CC: Paolo Bonzini 
CC: Roman Kagan 
CC: Denis V. Lunev 
CC: qemu-de...@nongnu.org

---
 arch/x86/include/asm/kvm_host.h |  6 +-
 arch/x86/kvm/irq.c  |  2 +-
 arch/x86/kvm/lapic.c| 23 +++--
 arch/x86/kvm/lapic.h|  4 ++--
 arch/x86/kvm/svm.c  | 11 +++---
 arch/x86/kvm/vmx.c  | 45 +
 arch/x86/kvm/x86.c  | 19 ++---
 7 files changed, 63 insertions(+), 47 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index d51a7e1d..a60a461 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -400,6 +400,7 @@ struct kvm_vcpu_arch {
u64 efer;
u64 apic_base;
struct kvm_lapic *apic;/* kernel irqchip context */
+   bool apicv_active;
DECLARE_BITMAP(ioapic_handled_vectors, 256);
unsigned long apic_attention;
int32_t apic_arb_prio;
@@ -830,7 +831,8 @@ struct kvm_x86_ops {
void (*enable_nmi_window)(struct kvm_vcpu *vcpu);
void (*enable_irq_window)(struct kvm_vcpu *vcpu);
void (*update_cr8_intercept)(struct kvm_vcpu *vcpu, int tpr, int irr);
-   int (*cpu_uses_apicv)(struct kvm_vcpu *vcpu);
+   bool (*get_enable_apicv)(void);
+   void (*refresh_apicv_exec_ctrl)(struct kvm_vcpu *vcpu);
void (*hwapic_irr_update)(struct kvm_vcpu *vcpu, int max_irr);
void (*hwapic_isr_update)(struct kvm *kvm, int isr);
void (*load_eoi_exitmap)(struct kvm_vcpu *vcpu, u64 *eoi_exit_bitmap);
@@ -1096,6 +1098,8 @@ gpa_t kvm_mmu_gva_to_gpa_write(struct kvm_vcpu *vcpu, 
gva_t gva,
 gpa_t kvm_mmu_gva_to_gpa_system(struct kvm_vcpu *vcpu, gva_t gva,
struct x86_exception *exception);
 
+void kvm_vcpu_deactivate_apicv(struct kvm_vcpu *vcpu);
+
 int kvm_emulate_hypercall(struct kvm_vcpu *vcpu);
 
 int kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gva_t gva, u32 error_code,
diff --git a/arch/x86/kvm/irq.c b/arch/x86/kvm/irq.c
index 097060e..3982b47 100644
--- a/arch/x86/kvm/irq.c
+++ b/arch/x86/kvm/irq.c
@@ -76,7 +76,7 @@ int kvm_cpu_has_injectable_intr(struct kvm_vcpu *v)
if (kvm_cpu_has_extint(v))
return 1;
 
-   if (kvm_vcpu_apic_vid_enabled(v))
+   if (kvm_vcpu_apicv_active(v))
return 0;
 
return kvm_apic_has_interrupt(v) != -1; /* LAPIC */
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index b14436d..14d6fcc 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -379,7 +379,8 @@ static inline int apic_find_highest_irr(struct kvm_lapic 
*apic)
if (!apic->irr_pending)
return -1;
 
-   kvm_x86_ops->sync_pir_to_irr(apic->vcpu);
+   if (apic->vcpu->arch.apicv_active)
+   kvm_x86_ops->sync_pir_to_irr(apic->vcpu);
result = apic_search_irr(apic);
ASSERT(result == -1 || result >= 16);
 
@@ -392,7 +393,7 @@ static inline void apic_clear_irr(int vec, struct kvm_lapic 
*apic)
 
vcpu = apic->vcpu;
 
-   if (unlikely(kvm_vcpu_apic_vid_enabled(vcpu))) {
+   if (unlikely(vcpu->arch.apicv_active)) {
/* try to update RVI */
apic_clear_vector(vec, apic->regs + APIC_IRR);
kvm_make_request(KVM_REQ_EVENT, vcpu);
@@ -418,7 +419,7 @@ static inline void apic_set_isr(int vec, struct kvm_lapic 
*apic)
 * because the processor can modify ISR under the hood.  Instead
 * just set SVI.
 */
-   if (unlikely(kvm_x86_ops->hwapic_isr_update))
+   if (unlikely(vcpu->arch.apicv_active))
kvm_x86_ops->hwapic_isr_update(vcpu->kvm, vec);
else {
++apic->isr_count;
@@ -466,7 +467,7 @@ static inline void apic_clear_isr(int vec, struct kvm_lapic 
*apic)
 * on the other hand isr_count and highest_isr_cache are unused
 * and must be left alone.
 */
-   if (unlikely(kvm_x86_ops->hwapic_isr_update))
+   if 

[PATCH v4 2/5] kvm/x86: split ioapic-handled and EOI exit bitmaps

2015-11-10 Thread Andrey Smetanin
The function to determine if the vector is handled by ioapic used to
rely on the fact that only ioapic-handled vectors were set up to
cause vmexits when virtual apic was in use.

We're going to break this assumption when introducing Hyper-V
synthetic interrupts: they may need to cause vmexits too.

To achieve that, introduce a new bitmap dedicated specifically for
ioapic-handled vectors, and populate EOI exit bitmap from it for now.

Signed-off-by: Andrey Smetanin 
Reviewed-by: Roman Kagan 
Signed-off-by: Denis V. Lunev 
CC: Gleb Natapov 
CC: Paolo Bonzini 
CC: Roman Kagan 
CC: Denis V. Lunev 
CC: qemu-de...@nongnu.org

---
 arch/x86/include/asm/kvm_host.h |  4 ++--
 arch/x86/kvm/ioapic.c   |  4 ++--
 arch/x86/kvm/ioapic.h   |  7 ---
 arch/x86/kvm/irq_comm.c |  5 +++--
 arch/x86/kvm/lapic.c|  2 +-
 arch/x86/kvm/svm.c  |  2 +-
 arch/x86/kvm/vmx.c  |  3 +--
 arch/x86/kvm/x86.c  | 11 ++-
 8 files changed, 20 insertions(+), 18 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 9265196..d51a7e1d 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -400,7 +400,7 @@ struct kvm_vcpu_arch {
u64 efer;
u64 apic_base;
struct kvm_lapic *apic;/* kernel irqchip context */
-   u64 eoi_exit_bitmap[4];
+   DECLARE_BITMAP(ioapic_handled_vectors, 256);
unsigned long apic_attention;
int32_t apic_arb_prio;
int mp_state;
@@ -833,7 +833,7 @@ struct kvm_x86_ops {
int (*cpu_uses_apicv)(struct kvm_vcpu *vcpu);
void (*hwapic_irr_update)(struct kvm_vcpu *vcpu, int max_irr);
void (*hwapic_isr_update)(struct kvm *kvm, int isr);
-   void (*load_eoi_exitmap)(struct kvm_vcpu *vcpu);
+   void (*load_eoi_exitmap)(struct kvm_vcpu *vcpu, u64 *eoi_exit_bitmap);
void (*set_virtual_x2apic_mode)(struct kvm_vcpu *vcpu, bool set);
void (*set_apic_access_page_addr)(struct kvm_vcpu *vcpu, hpa_t hpa);
void (*deliver_posted_interrupt)(struct kvm_vcpu *vcpu, int vector);
diff --git a/arch/x86/kvm/ioapic.c b/arch/x86/kvm/ioapic.c
index 88d0a92..1facfd6 100644
--- a/arch/x86/kvm/ioapic.c
+++ b/arch/x86/kvm/ioapic.c
@@ -233,7 +233,7 @@ static void kvm_ioapic_inject_all(struct kvm_ioapic 
*ioapic, unsigned long irr)
 }
 
 
-void kvm_ioapic_scan_entry(struct kvm_vcpu *vcpu, u64 *eoi_exit_bitmap)
+void kvm_ioapic_scan_entry(struct kvm_vcpu *vcpu, ulong 
*ioapic_handled_vectors)
 {
struct kvm_ioapic *ioapic = vcpu->kvm->arch.vioapic;
union kvm_ioapic_redirect_entry *e;
@@ -250,7 +250,7 @@ void kvm_ioapic_scan_entry(struct kvm_vcpu *vcpu, u64 
*eoi_exit_bitmap)
(e->fields.trig_mode == IOAPIC_EDGE_TRIG &&
 kvm_apic_pending_eoi(vcpu, e->fields.vector)))
__set_bit(e->fields.vector,
-   (unsigned long *)eoi_exit_bitmap);
+ ioapic_handled_vectors);
}
}
spin_unlock(>lock);
diff --git a/arch/x86/kvm/ioapic.h b/arch/x86/kvm/ioapic.h
index 084617d..2d16dc2 100644
--- a/arch/x86/kvm/ioapic.h
+++ b/arch/x86/kvm/ioapic.h
@@ -121,7 +121,8 @@ int kvm_irq_delivery_to_apic(struct kvm *kvm, struct 
kvm_lapic *src,
struct kvm_lapic_irq *irq, unsigned long *dest_map);
 int kvm_get_ioapic(struct kvm *kvm, struct kvm_ioapic_state *state);
 int kvm_set_ioapic(struct kvm *kvm, struct kvm_ioapic_state *state);
-void kvm_ioapic_scan_entry(struct kvm_vcpu *vcpu, u64 *eoi_exit_bitmap);
-void kvm_scan_ioapic_routes(struct kvm_vcpu *vcpu, u64 *eoi_exit_bitmap);
-
+void kvm_ioapic_scan_entry(struct kvm_vcpu *vcpu,
+  ulong *ioapic_handled_vectors);
+void kvm_scan_ioapic_routes(struct kvm_vcpu *vcpu,
+   ulong *ioapic_handled_vectors);
 #endif
diff --git a/arch/x86/kvm/irq_comm.c b/arch/x86/kvm/irq_comm.c
index e39768c..ece901c 100644
--- a/arch/x86/kvm/irq_comm.c
+++ b/arch/x86/kvm/irq_comm.c
@@ -339,7 +339,8 @@ void kvm_arch_post_irq_routing_update(struct kvm *kvm)
kvm_make_scan_ioapic_request(kvm);
 }
 
-void kvm_scan_ioapic_routes(struct kvm_vcpu *vcpu, u64 *eoi_exit_bitmap)
+void kvm_scan_ioapic_routes(struct kvm_vcpu *vcpu,
+   ulong *ioapic_handled_vectors)
 {
struct kvm *kvm = vcpu->kvm;
struct kvm_kernel_irq_routing_entry *entry;
@@ -369,7 +370,7 @@ void kvm_scan_ioapic_routes(struct kvm_vcpu *vcpu, u64 
*eoi_exit_bitmap)
u32 vector = entry->msi.data & 0xff;
 
__set_bit(vector,
- (unsigned long *) eoi_exit_bitmap);
+  

[PATCH v4 1/5] kvm/irqchip: kvm_arch_irq_routing_update renaming split

2015-11-10 Thread Andrey Smetanin
Actually kvm_arch_irq_routing_update() should be
kvm_arch_post_irq_routing_update() as it's called at the end
of irq routing update.

This renaming frees kvm_arch_irq_routing_update function name.
kvm_arch_irq_routing_update() weak function which will be used
to update mappings for arch-specific irq routing entries
(in particular, the upcoming Hyper-V synthetic interrupts).

Signed-off-by: Andrey Smetanin 
Reviewed-by: Roman Kagan 
Signed-off-by: Denis V. Lunev 
CC: Gleb Natapov 
CC: Paolo Bonzini 
CC: Roman Kagan 
CC: Denis V. Lunev 
CC: qemu-de...@nongnu.org

---
 arch/x86/kvm/irq_comm.c  | 2 +-
 include/linux/kvm_host.h | 5 +++--
 virt/kvm/irqchip.c   | 7 ++-
 3 files changed, 10 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/irq_comm.c b/arch/x86/kvm/irq_comm.c
index 84b96d3..e39768c 100644
--- a/arch/x86/kvm/irq_comm.c
+++ b/arch/x86/kvm/irq_comm.c
@@ -332,7 +332,7 @@ int kvm_setup_empty_irq_routing(struct kvm *kvm)
return kvm_set_irq_routing(kvm, empty_routing, 0, 0);
 }
 
-void kvm_arch_irq_routing_update(struct kvm *kvm)
+void kvm_arch_post_irq_routing_update(struct kvm *kvm)
 {
if (ioapic_in_kernel(kvm) || !irqchip_in_kernel(kvm))
return;
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 242a6d2..dbe2a2f 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -473,12 +473,12 @@ void vcpu_put(struct kvm_vcpu *vcpu);
 
 #ifdef __KVM_HAVE_IOAPIC
 void kvm_vcpu_request_scan_ioapic(struct kvm *kvm);
-void kvm_arch_irq_routing_update(struct kvm *kvm);
+void kvm_arch_post_irq_routing_update(struct kvm *kvm);
 #else
 static inline void kvm_vcpu_request_scan_ioapic(struct kvm *kvm)
 {
 }
-static inline void kvm_arch_irq_routing_update(struct kvm *kvm)
+static inline void kvm_arch_post_irq_routing_update(struct kvm *kvm)
 {
 }
 #endif
@@ -1080,6 +1080,7 @@ static inline void kvm_irq_routing_update(struct kvm *kvm)
 {
 }
 #endif
+void kvm_arch_irq_routing_update(struct kvm *kvm);
 
 static inline int kvm_ioeventfd(struct kvm *kvm, struct kvm_ioeventfd *args)
 {
diff --git a/virt/kvm/irqchip.c b/virt/kvm/irqchip.c
index f0b08a2..fe84e1a 100644
--- a/virt/kvm/irqchip.c
+++ b/virt/kvm/irqchip.c
@@ -166,6 +166,10 @@ out:
return r;
 }
 
+void __attribute__((weak)) kvm_arch_irq_routing_update(struct kvm *kvm)
+{
+}
+
 int kvm_set_irq_routing(struct kvm *kvm,
const struct kvm_irq_routing_entry *ue,
unsigned nr,
@@ -219,9 +223,10 @@ int kvm_set_irq_routing(struct kvm *kvm,
old = kvm->irq_routing;
rcu_assign_pointer(kvm->irq_routing, new);
kvm_irq_routing_update(kvm);
+   kvm_arch_irq_routing_update(kvm);
mutex_unlock(>irq_lock);
 
-   kvm_arch_irq_routing_update(kvm);
+   kvm_arch_post_irq_routing_update(kvm);
 
synchronize_srcu_expedited(>irq_srcu);
 
-- 
2.4.3

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v4 4/5] kvm/x86: Hyper-V synthetic interrupt controller

2015-11-10 Thread Andrey Smetanin
SynIC (synthetic interrupt controller) is a lapic extension,
which is controlled via MSRs and maintains for each vCPU
 - 16 synthetic interrupt "lines" (SINT's); each can be configured to
   trigger a specific interrupt vector optionally with auto-EOI
   semantics
 - a message page in the guest memory with 16 256-byte per-SINT message
   slots
 - an event flag page in the guest memory with 16 2048-bit per-SINT
   event flag areas

The host triggers a SINT whenever it delivers a new message to the
corresponding slot or flips an event flag bit in the corresponding area.
The guest informs the host that it can try delivering a message by
explicitly asserting EOI in lapic or writing to End-Of-Message (EOM)
MSR.

The userspace (qemu) triggers interrupts and receives EOM notifications
via irqfd with resampler; for that, a GSI is allocated for each
configured SINT, and irq_routing api is extended to support GSI-SINT
mapping.

Changes v4:
* added activation of SynIC by vcpu KVM_ENABLE_CAP
* added per SynIC active flag
* added deactivation of APICv upon SynIC activation

Changes v3:
* added KVM_CAP_HYPERV_SYNIC and KVM_IRQ_ROUTING_HV_SINT notes into
docs

Changes v2:
* do not use posted interrupts for Hyper-V SynIC AutoEOI vectors
* add Hyper-V SynIC vectors into EOI exit bitmap
* Hyper-V SyniIC SINT msr write logic simplified

Signed-off-by: Andrey Smetanin 
Reviewed-by: Roman Kagan 
Signed-off-by: Denis V. Lunev 
CC: Gleb Natapov 
CC: Paolo Bonzini 
CC: Roman Kagan 
CC: Denis V. Lunev 
CC: qemu-de...@nongnu.org

---
 Documentation/virtual/kvm/api.txt |  19 +++
 arch/x86/include/asm/kvm_host.h   |  15 ++
 arch/x86/kvm/hyperv.c | 315 ++
 arch/x86/kvm/hyperv.h |  23 +++
 arch/x86/kvm/irq_comm.c   |  34 
 arch/x86/kvm/lapic.c  |  15 +-
 arch/x86/kvm/lapic.h  |   5 +
 arch/x86/kvm/x86.c|  34 +++-
 include/linux/kvm_host.h  |   6 +
 include/uapi/linux/kvm.h  |   8 +
 10 files changed, 467 insertions(+), 7 deletions(-)

diff --git a/Documentation/virtual/kvm/api.txt 
b/Documentation/virtual/kvm/api.txt
index 34cc068..16096a2 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -1451,6 +1451,7 @@ struct kvm_irq_routing_entry {
struct kvm_irq_routing_irqchip irqchip;
struct kvm_irq_routing_msi msi;
struct kvm_irq_routing_s390_adapter adapter;
+   struct kvm_irq_routing_hv_sint hv_sint;
__u32 pad[8];
} u;
 };
@@ -1459,6 +1460,7 @@ struct kvm_irq_routing_entry {
 #define KVM_IRQ_ROUTING_IRQCHIP 1
 #define KVM_IRQ_ROUTING_MSI 2
 #define KVM_IRQ_ROUTING_S390_ADAPTER 3
+#define KVM_IRQ_ROUTING_HV_SINT 4
 
 No flags are specified so far, the corresponding field must be set to zero.
 
@@ -1482,6 +1484,10 @@ struct kvm_irq_routing_s390_adapter {
__u32 adapter_id;
 };
 
+struct kvm_irq_routing_hv_sint {
+   __u32 vcpu;
+   __u32 sint;
+};
 
 4.53 KVM_ASSIGN_SET_MSIX_NR (deprecated)
 
@@ -3685,3 +3691,16 @@ available, means that that the kernel has an 
implementation of the
 H_RANDOM hypercall backed by a hardware random-number generator.
 If present, the kernel H_RANDOM handler can be enabled for guest use
 with the KVM_CAP_PPC_ENABLE_HCALL capability.
+
+8.2 KVM_CAP_HYPERV_SYNIC
+
+Architectures: x86
+This capability, if KVM_CHECK_EXTENSION indicates that it is
+available, means that that the kernel has an implementation of the
+Hyper-V Synthetic interrupt controller(SynIC). Hyper-V SynIC is
+used to support Windows Hyper-V based guest paravirt drivers(VMBus).
+
+In order to use SynIC, it has to be activated by setting this
+capability via KVM_ENABLE_CAP ioctl on the vcpu fd. Note that this
+will disable the use of APIC hardware virtualization even if supported
+by the CPU, as it's incompatible with SynIC auto-EOI behavior.
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index a60a461..ad29e89 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -25,6 +25,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -374,10 +375,24 @@ struct kvm_mtrr {
struct list_head head;
 };
 
+/* Hyper-V synthetic interrupt controller (SynIC)*/
+struct kvm_vcpu_hv_synic {
+   u64 version;
+   u64 control;
+   u64 msg_page;
+   u64 evt_page;
+   atomic64_t sint[HV_SYNIC_SINT_COUNT];
+   atomic_t sint_to_gsi[HV_SYNIC_SINT_COUNT];
+   DECLARE_BITMAP(auto_eoi_bitmap, 256);
+   DECLARE_BITMAP(vec_bitmap, 256);
+   bool active;
+};
+
 /* Hyper-V per vcpu emulation context */
 struct kvm_vcpu_hv {
u64 hv_vapic;
s64 runtime_offset;
+   struct kvm_vcpu_hv_synic synic;
 };
 
 struct kvm_vcpu_arch {
diff --git 

[PATCH v4 5/5] kvm/x86: Hyper-V kvm exit

2015-11-10 Thread Andrey Smetanin
A new vcpu exit is introduced to notify the userspace of the
changes in Hyper-V SynIC configuration triggered by guest writing to the
corresponding MSRs.

Changes v4:
* exit into userspace only if guest writes into SynIC MSR's

Changes v3:
* added KVM_EXIT_HYPERV types and structs notes into docs

Signed-off-by: Andrey Smetanin 
Reviewed-by: Roman Kagan 
Signed-off-by: Denis V. Lunev 
CC: Gleb Natapov 
CC: Paolo Bonzini 
CC: Roman Kagan 
CC: Denis V. Lunev 
CC: qemu-de...@nongnu.org

---
 Documentation/virtual/kvm/api.txt | 22 ++
 arch/x86/include/asm/kvm_host.h   |  1 +
 arch/x86/kvm/hyperv.c | 20 
 arch/x86/kvm/x86.c|  6 ++
 include/linux/kvm_host.h  |  1 +
 include/uapi/linux/kvm.h  | 17 +
 6 files changed, 67 insertions(+)

diff --git a/Documentation/virtual/kvm/api.txt 
b/Documentation/virtual/kvm/api.txt
index 16096a2..abc4f48 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -3337,6 +3337,28 @@ the userspace IOAPIC should process the EOI and 
retrigger the interrupt if
 it is still asserted.  Vector is the LAPIC interrupt vector for which the
 EOI was received.
 
+   struct kvm_hyperv_exit {
+#define KVM_EXIT_HYPERV_SYNIC  1
+   __u32 type;
+   union {
+   struct {
+   __u32 msr;
+   __u64 control;
+   __u64 evt_page;
+   __u64 msg_page;
+   } synic;
+   } u;
+   };
+   /* KVM_EXIT_HYPERV */
+struct kvm_hyperv_exit hyperv;
+Indicates that the VCPU exits into userspace to process some tasks
+related to Hyper-V emulation.
+Valid values for 'type' are:
+   KVM_EXIT_HYPERV_SYNIC -- synchronously notify user-space about
+Hyper-V SynIC state change. Notification is used to remap SynIC
+event/message pages and to enable/disable SynIC messages/events processing
+in userspace.
+
/* Fix the size of the union. */
char padding[256];
};
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index ad29e89..1cefa1e 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -393,6 +393,7 @@ struct kvm_vcpu_hv {
u64 hv_vapic;
s64 runtime_offset;
struct kvm_vcpu_hv_synic synic;
+   struct kvm_hyperv_exit exit;
 };
 
 struct kvm_vcpu_arch {
diff --git a/arch/x86/kvm/hyperv.c b/arch/x86/kvm/hyperv.c
index 83a3c0c..41869a9 100644
--- a/arch/x86/kvm/hyperv.c
+++ b/arch/x86/kvm/hyperv.c
@@ -130,6 +130,20 @@ static void kvm_hv_notify_acked_sint(struct kvm_vcpu 
*vcpu, u32 sint)
srcu_read_unlock(>irq_srcu, idx);
 }
 
+static void synic_exit(struct kvm_vcpu_hv_synic *synic, u32 msr)
+{
+   struct kvm_vcpu *vcpu = synic_to_vcpu(synic);
+   struct kvm_vcpu_hv *hv_vcpu = >arch.hyperv;
+
+   hv_vcpu->exit.type = KVM_EXIT_HYPERV_SYNIC;
+   hv_vcpu->exit.u.synic.msr = msr;
+   hv_vcpu->exit.u.synic.control = synic->control;
+   hv_vcpu->exit.u.synic.evt_page = synic->evt_page;
+   hv_vcpu->exit.u.synic.msg_page = synic->msg_page;
+
+   kvm_make_request(KVM_REQ_HV_EXIT, vcpu);
+}
+
 static int synic_set_msr(struct kvm_vcpu_hv_synic *synic,
 u32 msr, u64 data, bool host)
 {
@@ -145,6 +159,8 @@ static int synic_set_msr(struct kvm_vcpu_hv_synic *synic,
switch (msr) {
case HV_X64_MSR_SCONTROL:
synic->control = data;
+   if (!host)
+   synic_exit(synic, msr);
break;
case HV_X64_MSR_SVERSION:
if (!host) {
@@ -161,6 +177,8 @@ static int synic_set_msr(struct kvm_vcpu_hv_synic *synic,
break;
}
synic->evt_page = data;
+   if (!host)
+   synic_exit(synic, msr);
break;
case HV_X64_MSR_SIMP:
if (data & HV_SYNIC_SIMP_ENABLE)
@@ -170,6 +188,8 @@ static int synic_set_msr(struct kvm_vcpu_hv_synic *synic,
break;
}
synic->msg_page = data;
+   if (!host)
+   synic_exit(synic, msr);
break;
case HV_X64_MSR_EOM: {
int i;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 41f3030..04daf32 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -6377,6 +6377,12 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
r = 0;
goto out;
 

Re: [PATCH v4 0/6] virtio core DMA API conversion

2015-11-10 Thread Michael S. Tsirkin
On Tue, Nov 10, 2015 at 09:37:54PM +1100, Benjamin Herrenschmidt wrote:
> On Mon, 2015-11-09 at 21:35 -0800, Andy Lutomirski wrote:
> > 
> > We could do it the other way around: on powerpc, if a PCI device is in
> > that range and doesn't have the "bypass" property at all, then it's
> > assumed to bypass the IOMMU.  This means that everything that
> > currently works continues working.  If someone builds a physical
> > virtio device or uses another system in PCIe target mode speaking
> > virtio, then it won't work until they upgrade their firmware to set
> > bypass=0.  Meanwhile everyone using hypothetical new QEMU also gets
> > bypass=0 and no ambiguity.
> >
> > vfio will presumably notice the bypass and correctly refuse to map any
> > current virtio devices.
> > 
> > Would that work?
> 
> That would be extremely strange from a platform perspective. Any device
> in that vendor/device range would bypass the iommu unless some new
> property "actually-works-like-a-real-pci-device" happens to exist in
> the device-tree, which we would then need to define somewhere and
> handle accross at least 3 different platforms who get their device-tree 
> from widly different places.

Then we are back to virtio driver telling DMA core
whether it wants a 1:1 mapping in the iommu?
If that's acceptable to others, I don't think that's too bad.


> Also if tomorrow I create a PCI device that implements virtio-net and
> put it in a machine running IBM proprietary firmware (or Apple's or
> Sun's), it won't have that property...
> 
> This is not hypothetical. People are using virtio to do point-to-point
> communication between machines via PCIe today.
> 
> Cheers,
> Ben.

But not virtio-pci I think - that's broken for that usecase since we use
weaker barriers than required for real IO, as these have measureable
overhead.  We could have a feature "is a real PCI device",
that's completely reasonable.

-- 
MST
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2 3/5] kvm: Hyper-V SynIC irq routing support

2015-11-10 Thread Andrey Smetanin
Signed-off-by: Andrey Smetanin 
Reviewed-by: Roman Kagan 
Signed-off-by: Denis V. Lunev 
CC: Paolo Bonzini 
CC: Richard Henderson 
CC: Eduardo Habkost 
CC: "Andreas Färber" 
CC: Marcelo Tosatti 
CC: Roman Kagan 
CC: Denis V. Lunev 
CC: kvm@vger.kernel.org

---
 include/sysemu/kvm.h |  1 +
 kvm-all.c| 33 +
 2 files changed, 34 insertions(+)

diff --git a/include/sysemu/kvm.h b/include/sysemu/kvm.h
index 4ac6176..92ccb35 100644
--- a/include/sysemu/kvm.h
+++ b/include/sysemu/kvm.h
@@ -447,6 +447,7 @@ int kvm_irqchip_update_msi_route(KVMState *s, int virq, 
MSIMessage msg,
 void kvm_irqchip_release_virq(KVMState *s, int virq);
 
 int kvm_irqchip_add_adapter_route(KVMState *s, AdapterInfo *adapter);
+int kvm_irqchip_add_hv_sint_route(KVMState *s, uint32_t vcpu, uint32_t sint);
 
 int kvm_irqchip_add_irqfd_notifier_gsi(KVMState *s, EventNotifier *n,
EventNotifier *rn, int virq);
diff --git a/kvm-all.c b/kvm-all.c
index 1bc1273..d36b494 100644
--- a/kvm-all.c
+++ b/kvm-all.c
@@ -1297,6 +1297,34 @@ int kvm_irqchip_add_adapter_route(KVMState *s, 
AdapterInfo *adapter)
 return virq;
 }
 
+int kvm_irqchip_add_hv_sint_route(KVMState *s, uint32_t vcpu, uint32_t sint)
+{
+struct kvm_irq_routing_entry kroute = {};
+int virq;
+
+if (!kvm_gsi_routing_enabled()) {
+return -ENOSYS;
+}
+if (!kvm_check_extension(s, KVM_CAP_HYPERV_SYNIC)) {
+return -ENOSYS;
+}
+virq = kvm_irqchip_get_virq(s);
+if (virq < 0) {
+return virq;
+}
+
+kroute.gsi = virq;
+kroute.type = KVM_IRQ_ROUTING_HV_SINT;
+kroute.flags = 0;
+kroute.u.hv_sint.vcpu = vcpu;
+kroute.u.hv_sint.sint = sint;
+
+kvm_add_routing_entry(s, );
+kvm_irqchip_commit_routes(s);
+
+return virq;
+}
+
 #else /* !KVM_CAP_IRQ_ROUTING */
 
 void kvm_init_irq_routing(KVMState *s)
@@ -1322,6 +1350,11 @@ int kvm_irqchip_add_adapter_route(KVMState *s, 
AdapterInfo *adapter)
 return -ENOSYS;
 }
 
+int kvm_irqchip_add_hv_sint_route(KVMState *s, uint32_t vcpu, uint32_t sint)
+{
+return -ENOSYS;
+}
+
 static int kvm_irqchip_assign_irqfd(KVMState *s, int fd, int virq, bool assign)
 {
 abort();
-- 
2.4.3

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2 1/5] headers: Linux kernel Hyper-V SynIC defines

2015-11-10 Thread Andrey Smetanin
This patch brings in the necessary changes from the corresponding kernel
patchset.  It's included only for completeness; ideally these changes
should arrive via the standard kernel header pull.

Signed-off-by: Andrey Smetanin 
Reviewed-by: Roman Kagan 
Signed-off-by: Denis V. Lunev 
CC: Paolo Bonzini 
CC: Richard Henderson 
CC: Eduardo Habkost 
CC: "Andreas Färber" 
CC: Marcelo Tosatti 
CC: Roman Kagan 
CC: Denis V. Lunev 
CC: kvm@vger.kernel.org

---
 include/standard-headers/asm-x86/hyperv.h | 12 
 linux-headers/linux/kvm.h | 25 +
 2 files changed, 37 insertions(+)

diff --git a/include/standard-headers/asm-x86/hyperv.h 
b/include/standard-headers/asm-x86/hyperv.h
index c37c14e..f9780f1 100644
--- a/include/standard-headers/asm-x86/hyperv.h
+++ b/include/standard-headers/asm-x86/hyperv.h
@@ -257,4 +257,16 @@ typedef struct _HV_REFERENCE_TSC_PAGE {
int64_t tsc_offset;
 } HV_REFERENCE_TSC_PAGE, *PHV_REFERENCE_TSC_PAGE;
 
+/* Define the number of synthetic interrupt sources. */
+#define HV_SYNIC_SINT_COUNT(16)
+/* Define the expected SynIC version. */
+#define HV_SYNIC_VERSION_1 (0x1)
+
+#define HV_SYNIC_CONTROL_ENABLE(1ULL << 0)
+#define HV_SYNIC_SIMP_ENABLE   (1ULL << 0)
+#define HV_SYNIC_SIEFP_ENABLE  (1ULL << 0)
+#define HV_SYNIC_SINT_MASKED   (1ULL << 16)
+#define HV_SYNIC_SINT_AUTO_EOI (1ULL << 17)
+#define HV_SYNIC_SINT_VECTOR_MASK  (0xFF)
+
 #endif
diff --git a/linux-headers/linux/kvm.h b/linux-headers/linux/kvm.h
index dcc410e..4e20262 100644
--- a/linux-headers/linux/kvm.h
+++ b/linux-headers/linux/kvm.h
@@ -154,6 +154,20 @@ struct kvm_s390_skeys {
__u32 flags;
__u32 reserved[9];
 };
+
+struct kvm_hyperv_exit {
+#define KVM_EXIT_HYPERV_SYNIC  1
+   __u32 type;
+   union {
+   struct {
+   __u32 msr;
+   __u64 control;
+   __u64 evt_page;
+   __u64 msg_page;
+   } synic;
+   } u;
+};
+
 #define KVM_S390_GET_SKEYS_NONE   1
 #define KVM_S390_SKEYS_MAX1048576
 
@@ -184,6 +198,7 @@ struct kvm_s390_skeys {
 #define KVM_EXIT_SYSTEM_EVENT 24
 #define KVM_EXIT_S390_STSI25
 #define KVM_EXIT_IOAPIC_EOI   26
+#define KVM_EXIT_HYPERV   27
 
 /* For KVM_EXIT_INTERNAL_ERROR */
 /* Emulate instruction failed. */
@@ -338,6 +353,8 @@ struct kvm_run {
struct {
__u8 vector;
} eoi;
+   /* KVM_EXIT_HYPERV */
+   struct kvm_hyperv_exit hyperv;
/* Fix the size of the union. */
char padding[256];
};
@@ -831,6 +848,7 @@ struct kvm_ppc_smmu_info {
 #define KVM_CAP_GUEST_DEBUG_HW_WPS 120
 #define KVM_CAP_SPLIT_IRQCHIP 121
 #define KVM_CAP_IOEVENTFD_ANY_LENGTH 122
+#define KVM_CAP_HYPERV_SYNIC 123
 
 #ifdef KVM_CAP_IRQ_ROUTING
 
@@ -854,10 +872,16 @@ struct kvm_irq_routing_s390_adapter {
__u32 adapter_id;
 };
 
+struct kvm_irq_routing_hv_sint {
+   __u32 vcpu;
+   __u32 sint;
+};
+
 /* gsi routing entry types */
 #define KVM_IRQ_ROUTING_IRQCHIP 1
 #define KVM_IRQ_ROUTING_MSI 2
 #define KVM_IRQ_ROUTING_S390_ADAPTER 3
+#define KVM_IRQ_ROUTING_HV_SINT 4
 
 struct kvm_irq_routing_entry {
__u32 gsi;
@@ -868,6 +892,7 @@ struct kvm_irq_routing_entry {
struct kvm_irq_routing_irqchip irqchip;
struct kvm_irq_routing_msi msi;
struct kvm_irq_routing_s390_adapter adapter;
+   struct kvm_irq_routing_hv_sint hv_sint;
__u32 pad[8];
} u;
 };
-- 
2.4.3

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2 2/5] target-i386/kvm: Hyper-V SynIC MSR's support

2015-11-10 Thread Andrey Smetanin
This patch does Hyper-V Synthetic interrupt
controller(Hyper-V SynIC) MSR's support and
migration. Hyper-V SynIC is enabled by cpu's
'hv-synic' option.

This patch does not allow cpu creation if
'hv-synic' option specified but kernel
doesn't support Hyper-V SynIC.

Changes v2:
* activate Hyper-V SynIC by enabling corresponding vcpu cap
* reject cpu initialization if user requested Hyper-V SynIC
  but kernel does not support Hyper-V SynIC

Signed-off-by: Andrey Smetanin 
Reviewed-by: Roman Kagan 
Signed-off-by: Denis V. Lunev 
CC: Paolo Bonzini 
CC: Richard Henderson 
CC: Eduardo Habkost 
CC: "Andreas Färber" 
CC: Marcelo Tosatti 
CC: Roman Kagan 
CC: Denis V. Lunev 
CC: kvm@vger.kernel.org

---
 target-i386/cpu-qom.h |  1 +
 target-i386/cpu.c |  1 +
 target-i386/cpu.h |  5 
 target-i386/kvm.c | 67 ++-
 target-i386/machine.c | 39 ++
 5 files changed, 112 insertions(+), 1 deletion(-)

diff --git a/target-i386/cpu-qom.h b/target-i386/cpu-qom.h
index e3bfe9d..7ea5b34 100644
--- a/target-i386/cpu-qom.h
+++ b/target-i386/cpu-qom.h
@@ -94,6 +94,7 @@ typedef struct X86CPU {
 bool hyperv_reset;
 bool hyperv_vpindex;
 bool hyperv_runtime;
+bool hyperv_synic;
 bool check_cpuid;
 bool enforce_cpuid;
 bool expose_kvm;
diff --git a/target-i386/cpu.c b/target-i386/cpu.c
index e5f1c5b..1462e19 100644
--- a/target-i386/cpu.c
+++ b/target-i386/cpu.c
@@ -3142,6 +3142,7 @@ static Property x86_cpu_properties[] = {
 DEFINE_PROP_BOOL("hv-reset", X86CPU, hyperv_reset, false),
 DEFINE_PROP_BOOL("hv-vpindex", X86CPU, hyperv_vpindex, false),
 DEFINE_PROP_BOOL("hv-runtime", X86CPU, hyperv_runtime, false),
+DEFINE_PROP_BOOL("hv-synic", X86CPU, hyperv_synic, false),
 DEFINE_PROP_BOOL("check", X86CPU, check_cpuid, true),
 DEFINE_PROP_BOOL("enforce", X86CPU, enforce_cpuid, false),
 DEFINE_PROP_BOOL("kvm", X86CPU, expose_kvm, true),
diff --git a/target-i386/cpu.h b/target-i386/cpu.h
index fc4a605..8cf33df 100644
--- a/target-i386/cpu.h
+++ b/target-i386/cpu.h
@@ -918,6 +918,11 @@ typedef struct CPUX86State {
 uint64_t msr_hv_tsc;
 uint64_t msr_hv_crash_params[HV_X64_MSR_CRASH_PARAMS];
 uint64_t msr_hv_runtime;
+uint64_t msr_hv_synic_control;
+uint64_t msr_hv_synic_version;
+uint64_t msr_hv_synic_evt_page;
+uint64_t msr_hv_synic_msg_page;
+uint64_t msr_hv_synic_sint[HV_SYNIC_SINT_COUNT];
 
 /* exception/interrupt handling */
 int error_code;
diff --git a/target-i386/kvm.c b/target-i386/kvm.c
index 2a9953b..cfcd01d 100644
--- a/target-i386/kvm.c
+++ b/target-i386/kvm.c
@@ -86,6 +86,7 @@ static bool has_msr_hv_crash;
 static bool has_msr_hv_reset;
 static bool has_msr_hv_vpindex;
 static bool has_msr_hv_runtime;
+static bool has_msr_hv_synic;
 static bool has_msr_mtrr;
 static bool has_msr_xss;
 
@@ -521,7 +522,8 @@ static bool hyperv_enabled(X86CPU *cpu)
 cpu->hyperv_crash ||
 cpu->hyperv_reset ||
 cpu->hyperv_vpindex ||
-cpu->hyperv_runtime);
+cpu->hyperv_runtime ||
+cpu->hyperv_synic);
 }
 
 static Error *invtsc_mig_blocker;
@@ -610,6 +612,14 @@ int kvm_arch_init_vcpu(CPUState *cs)
 if (cpu->hyperv_runtime && has_msr_hv_runtime) {
 c->eax |= HV_X64_MSR_VP_RUNTIME_AVAILABLE;
 }
+if (cpu->hyperv_synic) {
+if (!has_msr_hv_synic ||
+kvm_vcpu_enable_cap(cs, KVM_CAP_HYPERV_SYNIC, 0)) {
+fprintf(stderr, "Hyper-V SynIC is not supported by kernel\n");
+return -ENOSYS;
+}
+c->eax |= HV_X64_MSR_SYNIC_AVAILABLE;
+}
 c = _data.entries[cpuid_i++];
 c->function = HYPERV_CPUID_ENLIGHTMENT_INFO;
 if (cpu->hyperv_relaxed_timing) {
@@ -950,6 +960,10 @@ static int kvm_get_supported_msrs(KVMState *s)
 has_msr_hv_runtime = true;
 continue;
 }
+if (kvm_msr_list->indices[i] == HV_X64_MSR_SCONTROL) {
+has_msr_hv_synic = true;
+continue;
+}
 }
 }
 
@@ -1511,6 +1525,31 @@ static int kvm_put_msrs(X86CPU *cpu, int level)
 kvm_msr_entry_set([n++], HV_X64_MSR_VP_RUNTIME,
   env->msr_hv_runtime);
 }
+if (cpu->hyperv_synic) {
+int j;
+
+if (!env->msr_hv_synic_version) {
+/* First time initialization */
+env->msr_hv_synic_version = HV_SYNIC_VERSION_1;
+for (j = 0; j < ARRAY_SIZE(env->msr_hv_synic_sint); j++) {
+env->msr_hv_synic_sint[j] = HV_SYNIC_SINT_MASKED;
+}

[PATCH v2 0/5] QEMU: Hyper-V SynIC support

2015-11-10 Thread Andrey Smetanin
Hyper-V SynIC (synthetic interrupt controller) support:
* msr's support
* irq routing setup
* irq injection
* irq ack's callbacks
* event/message pages changes tracking at Hyper-V exit
* Hyper-V test device to test SynIC by kvm-unit-tests

Signed-off-by: Andrey Smetanin 
Reviewed-by: Roman Kagan 
Signed-off-by: Denis V. Lunev 
CC: Paolo Bonzini 
CC: Richard Henderson 
CC: Eduardo Habkost 
CC: "Andreas Färber" 
CC: Marcelo Tosatti 
CC: Roman Kagan 
CC: Denis V. Lunev 
CC: kvm@vger.kernel.org

Changes v2:
* linux headers update by scripts moved into separate patche
* activate Hyper-V SynIC by enabling corresponding vcpu cap
* reject cpu initialization if user requested Hyper-V SynIC
  but kernel does not support Hyper-V SynIC

Andrey Smetanin (5):
  headers: Linux kernel Hyper-V SynIC defines
  target-i386/kvm: Hyper-V SynIC MSR's support
  kvm: Hyper-V SynIC irq routing support
  target-i386/hyperv: Hyper-V SynIC SINT routing and vcpu exit
  hw/misc: Hyper-V test device 'hyperv-testdev'

 default-configs/i386-softmmu.mak  |   1 +
 default-configs/x86_64-softmmu.mak|   1 +
 hw/misc/Makefile.objs |   1 +
 hw/misc/hyperv_testdev.c  | 164 ++
 include/standard-headers/asm-x86/hyperv.h |  12 +++
 include/sysemu/kvm.h  |   1 +
 kvm-all.c |  33 ++
 linux-headers/linux/kvm.h |  25 +
 target-i386/Makefile.objs |   2 +-
 target-i386/cpu-qom.h |   1 +
 target-i386/cpu.c |   1 +
 target-i386/cpu.h |   5 +
 target-i386/hyperv.c  | 127 +++
 target-i386/hyperv.h  |  42 
 target-i386/kvm.c |  73 -
 target-i386/machine.c |  39 +++
 16 files changed, 526 insertions(+), 2 deletions(-)
 create mode 100644 hw/misc/hyperv_testdev.c
 create mode 100644 target-i386/hyperv.c
 create mode 100644 target-i386/hyperv.h

-- 
2.4.3

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2 5/5] hw/misc: Hyper-V test device 'hyperv-testdev'

2015-11-10 Thread Andrey Smetanin
'hyperv-testdev' will be used by kvm-unit-tests
to setup Hyper-V SynIC SINT's routing and to inject
Hyper-V SynIC SINT's.

Hyper-V test device is ISA type device that creates 0x3000
IO memory region and catches write access into it. Every
write operation data decoded into ctl code and parameters
for Hyper-V test device.

Signed-off-by: Andrey Smetanin 
Reviewed-by: Roman Kagan 
Signed-off-by: Denis V. Lunev 
CC: Paolo Bonzini 
CC: Richard Henderson 
CC: Eduardo Habkost 
CC: "Andreas Färber" 
CC: Marcelo Tosatti 
CC: Roman Kagan 
CC: Denis V. Lunev 
CC: kvm@vger.kernel.org

---
 default-configs/i386-softmmu.mak   |   1 +
 default-configs/x86_64-softmmu.mak |   1 +
 hw/misc/Makefile.objs  |   1 +
 hw/misc/hyperv_testdev.c   | 164 +
 4 files changed, 167 insertions(+)
 create mode 100644 hw/misc/hyperv_testdev.c

diff --git a/default-configs/i386-softmmu.mak b/default-configs/i386-softmmu.mak
index 43c96d1..7f3c850 100644
--- a/default-configs/i386-softmmu.mak
+++ b/default-configs/i386-softmmu.mak
@@ -50,3 +50,4 @@ CONFIG_XIO3130=y
 CONFIG_IOH3420=y
 CONFIG_I82801B11=y
 CONFIG_SMBIOS=y
+CONFIG_HYPERV_TESTDEV=y
diff --git a/default-configs/x86_64-softmmu.mak 
b/default-configs/x86_64-softmmu.mak
index dfb8095..e494d79 100644
--- a/default-configs/x86_64-softmmu.mak
+++ b/default-configs/x86_64-softmmu.mak
@@ -50,3 +50,4 @@ CONFIG_XIO3130=y
 CONFIG_IOH3420=y
 CONFIG_I82801B11=y
 CONFIG_SMBIOS=y
+CONFIG_HYPERV_TESTDEV=y
diff --git a/hw/misc/Makefile.objs b/hw/misc/Makefile.objs
index 4aa76ff..fafc80a 100644
--- a/hw/misc/Makefile.objs
+++ b/hw/misc/Makefile.objs
@@ -40,3 +40,4 @@ obj-$(CONFIG_STM32F2XX_SYSCFG) += stm32f2xx_syscfg.o
 
 obj-$(CONFIG_PVPANIC) += pvpanic.o
 obj-$(CONFIG_EDU) += edu.o
+obj-$(CONFIG_HYPERV_TESTDEV) += hyperv_testdev.o
diff --git a/hw/misc/hyperv_testdev.c b/hw/misc/hyperv_testdev.c
new file mode 100644
index 000..f0e4e35
--- /dev/null
+++ b/hw/misc/hyperv_testdev.c
@@ -0,0 +1,164 @@
+/*
+ * QEMU KVM Hyper-V test device to support Hyper-V kvm-unit-tests
+ *
+ * Copyright (C) 2015 Andrey Smetanin 
+ *
+ * Authors:
+ *  Andrey Smetanin 
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ *
+ */
+
+#include "hw/hw.h"
+#include "hw/qdev.h"
+#include "hw/isa/isa.h"
+#include "target-i386/hyperv.h"
+
+#define HV_TEST_DEV_MAX_SINT_ROUTES 64
+
+struct HypervTestDev {
+ISADevice parent_obj;
+MemoryRegion sint_control;
+HvSintRoute *sint_route[HV_TEST_DEV_MAX_SINT_ROUTES];
+};
+typedef struct HypervTestDev HypervTestDev;
+
+#define TYPE_HYPERV_TEST_DEV "hyperv-testdev"
+#define HYPERV_TEST_DEV(obj) \
+OBJECT_CHECK(HypervTestDev, (obj), TYPE_HYPERV_TEST_DEV)
+
+enum {
+HV_TEST_DEV_SINT_ROUTE_CREATE = 1,
+HV_TEST_DEV_SINT_ROUTE_DESTROY,
+HV_TEST_DEV_SINT_ROUTE_SET_SINT
+};
+
+static int alloc_sint_route_index(HypervTestDev *dev)
+{
+int i;
+
+for (i = 0; i < ARRAY_SIZE(dev->sint_route); i++) {
+if (dev->sint_route[i] == NULL) {
+return i;
+}
+}
+return -1;
+}
+
+static void free_sint_route_index(HypervTestDev *dev, int i)
+{
+assert(i >= 0 && i < ARRAY_SIZE(dev->sint_route));
+dev->sint_route[i] = NULL;
+}
+
+static int find_sint_route_index(HypervTestDev *dev, uint32_t vcpu_id,
+ uint32_t sint)
+{
+HvSintRoute *sint_route;
+int i;
+
+for (i = 0; i < ARRAY_SIZE(dev->sint_route); i++) {
+sint_route = dev->sint_route[i];
+if (sint_route && sint_route->vcpu_id == vcpu_id &&
+sint_route->sint == sint) {
+return i;
+}
+}
+return -1;
+}
+
+static void hv_synic_test_dev_control(HypervTestDev *dev, uint32_t ctl,
+  uint32_t vcpu_id, uint32_t sint)
+{
+int i;
+HvSintRoute *sint_route;
+
+switch (ctl) {
+case HV_TEST_DEV_SINT_ROUTE_CREATE:
+i = alloc_sint_route_index(dev);
+assert(i >= 0);
+sint_route = kvm_hv_sint_route_create(vcpu_id, sint, NULL);
+assert(sint_route);
+dev->sint_route[i] = sint_route;
+break;
+case HV_TEST_DEV_SINT_ROUTE_DESTROY:
+i = find_sint_route_index(dev, vcpu_id, sint);
+assert(i >= 0);
+sint_route = dev->sint_route[i];
+kvm_hv_sint_route_destroy(sint_route);
+free_sint_route_index(dev, i);
+break;
+case HV_TEST_DEV_SINT_ROUTE_SET_SINT:
+i = find_sint_route_index(dev, vcpu_id, sint);
+assert(i >= 0);
+sint_route = dev->sint_route[i];
+kvm_hv_sint_route_set_sint(sint_route);
+break;
+default:
+break;
+}

[PATCH v2 4/5] target-i386/hyperv: Hyper-V SynIC SINT routing and vcpu exit

2015-11-10 Thread Andrey Smetanin
Hyper-V SynIC(synthetic interrupt controller) helpers for
Hyper-V SynIC irq routing setup, irq injection, irq ack
notifications event/message pages changes tracking for future use.

Signed-off-by: Andrey Smetanin 
Reviewed-by: Roman Kagan 
Signed-off-by: Denis V. Lunev 
CC: Paolo Bonzini 
CC: Richard Henderson 
CC: Eduardo Habkost 
CC: "Andreas Färber" 
CC: Marcelo Tosatti 
CC: Roman Kagan 
CC: Denis V. Lunev 
CC: kvm@vger.kernel.org

---
 target-i386/Makefile.objs |   2 +-
 target-i386/hyperv.c  | 127 ++
 target-i386/hyperv.h  |  42 +++
 target-i386/kvm.c |   6 +++
 4 files changed, 176 insertions(+), 1 deletion(-)
 create mode 100644 target-i386/hyperv.c
 create mode 100644 target-i386/hyperv.h

diff --git a/target-i386/Makefile.objs b/target-i386/Makefile.objs
index 437d997..2255f46 100644
--- a/target-i386/Makefile.objs
+++ b/target-i386/Makefile.objs
@@ -3,5 +3,5 @@ obj-y += excp_helper.o fpu_helper.o cc_helper.o int_helper.o 
svm_helper.o
 obj-y += smm_helper.o misc_helper.o mem_helper.o seg_helper.o
 obj-y += gdbstub.o
 obj-$(CONFIG_SOFTMMU) += machine.o arch_memory_mapping.o arch_dump.o monitor.o
-obj-$(CONFIG_KVM) += kvm.o
+obj-$(CONFIG_KVM) += kvm.o hyperv.o
 obj-$(call lnot,$(CONFIG_KVM)) += kvm-stub.o
diff --git a/target-i386/hyperv.c b/target-i386/hyperv.c
new file mode 100644
index 000..e79b173
--- /dev/null
+++ b/target-i386/hyperv.c
@@ -0,0 +1,127 @@
+/*
+ * QEMU KVM Hyper-V support
+ *
+ * Copyright (C) 2015 Andrey Smetanin 
+ *
+ * Authors:
+ *  Andrey Smetanin 
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ *
+ */
+
+#include "hyperv.h"
+#include "standard-headers/asm-x86/hyperv.h"
+
+int kvm_hv_handle_exit(X86CPU *cpu, struct kvm_hyperv_exit *exit)
+{
+CPUX86State *env = >env;
+
+switch (exit->type) {
+case KVM_EXIT_HYPERV_SYNIC:
+if (!cpu->hyperv_synic) {
+return -1;
+}
+
+/*
+ * For now just track changes in SynIC control and msg/evt pages msr's.
+ * When SynIC messaging/events processing will be added in future
+ * here we will do messages queues flushing and pages remapping.
+ */
+switch (exit->u.synic.msr) {
+case HV_X64_MSR_SCONTROL:
+env->msr_hv_synic_control = exit->u.synic.control;
+break;
+case HV_X64_MSR_SIMP:
+env->msr_hv_synic_msg_page = exit->u.synic.msg_page;
+break;
+case HV_X64_MSR_SIEFP:
+env->msr_hv_synic_evt_page = exit->u.synic.evt_page;
+break;
+default:
+return -1;
+}
+return 0;
+default:
+return -1;
+}
+}
+
+static void kvm_hv_sint_ack_handler(EventNotifier *notifier)
+{
+HvSintRoute *sint_route = container_of(notifier, HvSintRoute,
+   sint_ack_notifier);
+event_notifier_test_and_clear(notifier);
+if (sint_route->sint_ack_clb) {
+sint_route->sint_ack_clb(sint_route);
+}
+}
+
+HvSintRoute *kvm_hv_sint_route_create(uint32_t vcpu_id, uint32_t sint,
+  HvSintAckClb sint_ack_clb)
+{
+HvSintRoute *sint_route;
+int r, gsi;
+
+sint_route = g_malloc0(sizeof(*sint_route));
+r = event_notifier_init(_route->sint_set_notifier, false);
+if (r) {
+goto err;
+}
+
+r = event_notifier_init(_route->sint_ack_notifier, false);
+if (r) {
+goto err_sint_set_notifier;
+}
+
+event_notifier_set_handler(_route->sint_ack_notifier,
+   kvm_hv_sint_ack_handler);
+
+gsi = kvm_irqchip_add_hv_sint_route(kvm_state, vcpu_id, sint);
+if (gsi < 0) {
+goto err_gsi;
+}
+
+r = kvm_irqchip_add_irqfd_notifier_gsi(kvm_state,
+   _route->sint_set_notifier,
+   _route->sint_ack_notifier, 
gsi);
+if (r) {
+goto err_irqfd;
+}
+sint_route->gsi = gsi;
+sint_route->sint_ack_clb = sint_ack_clb;
+sint_route->vcpu_id = vcpu_id;
+sint_route->sint = sint;
+
+return sint_route;
+
+err_irqfd:
+kvm_irqchip_release_virq(kvm_state, gsi);
+err_gsi:
+event_notifier_set_handler(_route->sint_ack_notifier, NULL);
+event_notifier_cleanup(_route->sint_ack_notifier);
+err_sint_set_notifier:
+event_notifier_cleanup(_route->sint_set_notifier);
+err:
+g_free(sint_route);
+
+return NULL;
+}
+
+void kvm_hv_sint_route_destroy(HvSintRoute *sint_route)
+{
+kvm_irqchip_remove_irqfd_notifier_gsi(kvm_state,
+  

[PATCH 2/2] arm: kvm: Fix STRICT_MM_TYPECHECK errors

2015-11-10 Thread Laura Abbott

PAGE_S2_DEVICE is a pgprot val and needs to be accessed using the proper
accessors. Switch to these accessors to avoid errors with
STRICT_MM_TYPECHECK.

Signed-off-by: Laura Abbott 
---
Found in the course of other work
---
 arch/arm/kvm/mmu.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
index 6984342..43f8162 100644
--- a/arch/arm/kvm/mmu.c
+++ b/arch/arm/kvm/mmu.c
@@ -213,7 +213,8 @@ static void unmap_ptes(struct kvm *kvm, pmd_t *pmd,
kvm_tlb_flush_vmid_ipa(kvm, addr);
 
/* No need to invalidate the cache for device mappings 
*/
-   if ((pte_val(old_pte) & PAGE_S2_DEVICE) != 
PAGE_S2_DEVICE)
+   if ((pte_val(old_pte) & pgprot_val(PAGE_S2_DEVICE)) !=
+pgprot_val(PAGE_S2_DEVICE))
kvm_flush_dcache_pte(old_pte);
 
put_page(virt_to_page(pte));
@@ -306,7 +307,8 @@ static void stage2_flush_ptes(struct kvm *kvm, pmd_t *pmd,
pte = pte_offset_kernel(pmd, addr);
do {
if (!pte_none(*pte) &&
-   (pte_val(*pte) & PAGE_S2_DEVICE) != PAGE_S2_DEVICE)
+   (pte_val(*pte) & pgprot_val(PAGE_S2_DEVICE)) !=
+pgprot_val(PAGE_S2_DEVICE))
kvm_flush_dcache_pte(*pte);
} while (pte++, addr += PAGE_SIZE, addr != end);
 }
-- 
2.5.0

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v4 0/6] virtio core DMA API conversion

2015-11-10 Thread Benjamin Herrenschmidt
On Tue, 2015-11-10 at 15:44 -0800, Andy Lutomirski wrote:
> 
> > What about partition <-> partition virtio such as what we could do on
> > PAPR systems. That would have the weak barrier bit.
> >
> 
> Is it partition <-> partition, bypassing IOMMU?

No.

> I think I'd settle for just something that doesn't regress
> non-experimental setups that actually work today and that allow new
> setups (x86 with fixed QEMU and maybe something more complicated on
> powerpc and/or sparc) to work in all cases.
> 
> We could certainly just make powerpc and sparc continue bypassing the
> IOMMU until someone comes up with a way to fix it.  I'll send out some
> patches that do that, and maybe that'll help this make progress.

But we haven't found a solution that works. All we have come up with is
a quirk that will force bypass on virtio always and will not allow us
to operate non-bypassing devices on either of those architectures in
the future.

I'm not too happy about this.

Ben.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/3] KVM: x86: work around infinite loop in microcode when #AC is delivered

2015-11-10 Thread Venkatesh Srinivas
On Tue, Nov 10, 2015 at 01:22:52PM +0100, Paolo Bonzini wrote:
> From: Eric Northup 
> 
> It was found that a guest can DoS a host by triggering an infinite
> stream of "alignment check" (#AC) exceptions.  This causes the
> microcode to enter an infinite loop where the core never receives
> another interrupt.  The host kernel panics pretty quickly due to the
> effects (CVE-2015-5307).
> 
> Signed-off-by: Eric Northup 
> Cc: sta...@vger.kernel.org
> Signed-off-by: Paolo Bonzini 

Tested-by: Venkatesh Srinivas 

-- vs;
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] [kvm-unit-tests PATCHv5 3/3] arm: pmu: Add CPI checking

2015-11-10 Thread Andrew Jones
On Mon, Nov 02, 2015 at 09:58:14AM -0600, Andrew Jones wrote:
> On Fri, Oct 30, 2015 at 03:32:43PM -0400, Christopher Covington wrote:
> > Hi Drew,
> > 
> > On 10/30/2015 09:00 AM, Andrew Jones wrote:
> > > On Wed, Oct 28, 2015 at 03:12:55PM -0400, Christopher Covington wrote:
> > >> Calculate the numbers of cycles per instruction (CPI) implied by ARM
> > >> PMU cycle counter values. The code includes a strict checking facility
> > >> intended for the -icount option in TCG mode but it is not yet enabled
> > >> in the configuration file. Enabling it must wait on infrastructure
> > >> improvements which allow for different tests to be run on TCG versus
> > >> KVM.
> > >>
> > >> Signed-off-by: Christopher Covington 
> > >> ---
> > >>  arm/pmu.c | 103 
> > >> +-
> > >>  1 file changed, 102 insertions(+), 1 deletion(-)
> > >>
> > >> diff --git a/arm/pmu.c b/arm/pmu.c
> > >> index 4334de4..76a 100644
> > >> --- a/arm/pmu.c
> > >> +++ b/arm/pmu.c
> > >> @@ -43,6 +43,23 @@ static inline unsigned long get_pmccntr(void)
> > >>  asm volatile("mrc p15, 0, %0, c9, c13, 0" : "=r" (cycles));
> > >>  return cycles;
> > >>  }
> > >> +
> > >> +/*
> > >> + * Extra instructions inserted by the compiler would be difficult to 
> > >> compensate
> > >> + * for, so hand assemble everything between, and including, the PMCR 
> > >> accesses
> > >> + * to start and stop counting.
> > >> + */
> > >> +static inline void loop(int i, uint32_t pmcr)
> > >> +{
> > >> +asm volatile(
> > >> +"   mcr p15, 0, %[pmcr], c9, c12, 0\n"
> > >> +"1: subs%[i], %[i], #1\n"
> > >> +"   bgt 1b\n"
> > >> +"   mcr p15, 0, %[z], c9, c12, 0\n"
> > >> +: [i] "+r" (i)
> > >> +: [pmcr] "r" (pmcr), [z] "r" (0)
> > >> +: "cc");
> > >> +}
> > >>  #elif defined(__aarch64__)
> > >>  static inline uint32_t get_pmcr(void)
> > >>  {
> > >> @@ -64,6 +81,23 @@ static inline unsigned long get_pmccntr(void)
> > >>  asm volatile("mrs %0, pmccntr_el0" : "=r" (cycles));
> > >>  return cycles;
> > >>  }
> > >> +
> > >> +/*
> > >> + * Extra instructions inserted by the compiler would be difficult to 
> > >> compensate
> > >> + * for, so hand assemble everything between, and including, the PMCR 
> > >> accesses
> > >> + * to start and stop counting.
> > >> + */
> > >> +static inline void loop(int i, uint32_t pmcr)
> > >> +{
> > >> +asm volatile(
> > >> +"   msr pmcr_el0, %[pmcr]\n"
> > >> +"1: subs%[i], %[i], #1\n"
> > >> +"   b.gt1b\n"
> > >> +"   msr pmcr_el0, xzr\n"
> > >> +: [i] "+r" (i)
> > >> +: [pmcr] "r" (pmcr)
> > >> +: "cc");
> > >> +}
> > >>  #endif
> > >>  
> > >>  struct pmu_data {
> > >> @@ -131,12 +165,79 @@ static bool check_cycles_increase(void)
> > >>  return true;
> > >>  }
> > >>  
> > >> -int main(void)
> > >> +/*
> > >> + * Execute a known number of guest instructions. Only odd instruction 
> > >> counts
> > >> + * greater than or equal to 3 are supported by the in-line assembly 
> > >> code. The
> > >> + * control register (PMCR_EL0) is initialized with the provided value 
> > >> (allowing
> > >> + * for example for the cycle counter or event counters to be reset). At 
> > >> the end
> > >> + * of the exact instruction loop, zero is written to PMCR_EL0 to disable
> > >> + * counting, allowing the cycle counter or event counters to be read at 
> > >> the
> > >> + * leisure of the calling code.
> > >> + */
> > >> +static void measure_instrs(int num, uint32_t pmcr)
> > >> +{
> > >> +int i = (num - 1) / 2;
> > >> +
> > >> +assert(num >= 3 && ((num - 1) % 2 == 0));
> > >> +loop(i, pmcr);
> > >> +}
> > >> +
> > >> +/*
> > >> + * Measure cycle counts for various known instruction counts. Ensure 
> > >> that the
> > >> + * cycle counter progresses (similar to check_cycles_increase() but 
> > >> with more
> > >> + * instructions and using reset and stop controls). If supplied a 
> > >> positive,
> > >> + * nonzero CPI parameter, also strictly check that every measurement 
> > >> matches
> > >> + * it. Strict CPI checking is used to test -icount mode.
> > >> + */
> > >> +static bool check_cpi(int cpi)
> > >> +{
> > >> +struct pmu_data pmu = {0};
> > >> +
> > >> +pmu.cycle_counter_reset = 1;
> > >> +pmu.enable = 1;
> > >> +
> > >> +if (cpi > 0)
> > >> +printf("Checking for CPI=%d.\n", cpi);
> > >> +printf("instrs : cycles0 cycles1 ...\n");
> > >> +
> > >> +for (int i = 3; i < 300; i += 32) {
> > >> +int avg, sum = 0;
> > >> +
> > >> +printf("%d :", i);
> > >> +for (int j = 0; j < NR_SAMPLES; j++) {
> > >> +int cycles;
> > >> +
> > >> +measure_instrs(i, pmu.pmcr_el0);
> > 

Re: [PATCH 3/3] KVM/arm64: enable enhanced armv8 fp/simd lazy switch

2015-11-10 Thread Christoffer Dall
On Mon, Nov 09, 2015 at 03:13:15PM -0800, Mario Smarduch wrote:
> 
> 
> On 11/5/2015 7:02 AM, Christoffer Dall wrote:
> > On Fri, Oct 30, 2015 at 02:56:33PM -0700, Mario Smarduch wrote:
> >> This patch enables arm64 lazy fp/simd switch, similar to arm described in
> >> second patch. Change from previous version - restore function is moved to
> >> host. 
> >>
> >> Signed-off-by: Mario Smarduch 
> >> ---
> >>  arch/arm64/include/asm/kvm_host.h |  2 +-
> >>  arch/arm64/kernel/asm-offsets.c   |  1 +
> >>  arch/arm64/kvm/hyp.S  | 37 
> >> +++--
> >>  3 files changed, 33 insertions(+), 7 deletions(-)
> >>
> >> diff --git a/arch/arm64/include/asm/kvm_host.h 
> >> b/arch/arm64/include/asm/kvm_host.h
> >> index 26a2347..dcecf92 100644
> >> --- a/arch/arm64/include/asm/kvm_host.h
> >> +++ b/arch/arm64/include/asm/kvm_host.h
> >> @@ -251,11 +251,11 @@ static inline void kvm_arch_hardware_unsetup(void) {}
> >>  static inline void kvm_arch_sync_events(struct kvm *kvm) {}
> >>  static inline void kvm_arch_vcpu_uninit(struct kvm_vcpu *vcpu) {}
> >>  static inline void kvm_arch_sched_in(struct kvm_vcpu *vcpu, int cpu) {}
> >> -static inline void kvm_restore_host_vfp_state(struct kvm_vcpu *vcpu) {}
> >>  
> >>  void kvm_arm_init_debug(void);
> >>  void kvm_arm_setup_debug(struct kvm_vcpu *vcpu);
> >>  void kvm_arm_clear_debug(struct kvm_vcpu *vcpu);
> >>  void kvm_arm_reset_debug_ptr(struct kvm_vcpu *vcpu);
> >> +void kvm_restore_host_vfp_state(struct kvm_vcpu *vcpu);
> >>  
> >>  #endif /* __ARM64_KVM_HOST_H__ */
> >> diff --git a/arch/arm64/kernel/asm-offsets.c 
> >> b/arch/arm64/kernel/asm-offsets.c
> >> index 8d89cf8..c9c5242 100644
> >> --- a/arch/arm64/kernel/asm-offsets.c
> >> +++ b/arch/arm64/kernel/asm-offsets.c
> >> @@ -124,6 +124,7 @@ int main(void)
> >>DEFINE(VCPU_HCR_EL2,offsetof(struct kvm_vcpu, 
> >> arch.hcr_el2));
> >>DEFINE(VCPU_MDCR_EL2,   offsetof(struct kvm_vcpu, arch.mdcr_el2));
> >>DEFINE(VCPU_IRQ_LINES,  offsetof(struct kvm_vcpu, arch.irq_lines));
> >> +  DEFINE(VCPU_VFP_DIRTY,  offsetof(struct kvm_vcpu, arch.vfp_dirty));
> >>DEFINE(VCPU_HOST_CONTEXT,   offsetof(struct kvm_vcpu, 
> >> arch.host_cpu_context));
> >>DEFINE(VCPU_HOST_DEBUG_STATE, offsetof(struct kvm_vcpu, 
> >> arch.host_debug_state));
> >>DEFINE(VCPU_TIMER_CNTV_CTL, offsetof(struct kvm_vcpu, 
> >> arch.timer_cpu.cntv_ctl));
> >> diff --git a/arch/arm64/kvm/hyp.S b/arch/arm64/kvm/hyp.S
> >> index e583613..ed2c4cf 100644
> >> --- a/arch/arm64/kvm/hyp.S
> >> +++ b/arch/arm64/kvm/hyp.S
> >> @@ -36,6 +36,28 @@
> >>  #define CPU_SYSREG_OFFSET(x)  (CPU_SYSREGS + 8*x)
> >>  
> >>.text
> >> +
> >> +/**
> >> + * void kvm_restore_host_vfp_state(struct vcpu *vcpu) - Executes lazy
> >> + *fp/simd switch, saves the guest, restores host. Called from host
> >> + *mode, placed outside of hyp section.
> > 
> > same comments on style as previous patch
> > 
> >> + */
> >> +ENTRY(kvm_restore_host_vfp_state)
> >> +  pushxzr, lr
> >> +
> >> +  add x2, x0, #VCPU_CONTEXT
> >> +  mov w3, #0
> >> +  strbw3, [x0, #VCPU_VFP_DIRTY]
> > 
> > I've been discussing with myself if it would make more sense to clear
> > the dirty flag in the C-code...
> > 
> >> +
> >> +  bl __save_fpsimd
> >> +
> >> +  ldr x2, [x0, #VCPU_HOST_CONTEXT]
> >> +  bl __restore_fpsimd
> >> +
> >> +  pop xzr, lr
> >> +  ret
> >> +ENDPROC(kvm_restore_host_vfp_state)
> >> +
> >>.pushsection.hyp.text, "ax"
> >>.align  PAGE_SHIFT
> >>  
> >> @@ -482,7 +504,11 @@
> >>  99:
> >>msr hcr_el2, x2
> >>mov x2, #CPTR_EL2_TTA
> >> +
> >> +  ldrbw3, [x0, #VCPU_VFP_DIRTY]
> >> +  tbnzw3, #0, 98f
> >>orr x2, x2, #CPTR_EL2_TFP
> >> +98:
> > 
> > mmm, don't you need to only set the fpexc32 when you're actually going
> > to trap the guest accesses?
> > 
> > also, you can consider only setting this in vcpu_load (jumping quickly
> > to EL2 to do so) if we're running a 32-bit guest.  Probably worth
> > measuring the difference between the extra EL2 jump on vcpu_load
> > compared to hitting this register on every entry/exit.
> > 
> > Code-wise, it will be nicer to do it on vcpu_load.
> Hi Christoffer, Marc -
>   just want to run this by you, I ran a test with typical number of
> fp threads and couple lmbench benchmarks  the stride and bandwidth ones. The
> ratio of exits to vcpu puts is high 50:1 or so. But of course that's subject
> to the loads you run.
> 
> I substituted:
> tbnz x2, #HCR_RW_SHIFT, 99f
> mov x3, #(1 << 30)
> msr fpexc32_el2, x3
> isb
> 
> with vcpu_load hyp call and check for 32 bit guest in C
> mov x1, #(1 << 30)
> msr fpexc32_el2, x3
> ret
> 
> And then
> skip_fpsimd_state x8, 2f
> mrs   x6, fpexec_el2
> str   x6, [x3, #16]
> 
> with vcpu_put hyp call with check for 32 bit guest in C this is called
> substantially less often then vcpu_load since fp/simd registers are not
> always dirty on 

Re: [PATCH v4 0/6] virtio core DMA API conversion

2015-11-10 Thread Benjamin Herrenschmidt
On Tue, 2015-11-10 at 10:45 +0100, Knut Omang wrote:
> Can something be done by means of PCIe capabilities?
> ATS (Address Translation Support) seems like a natural choice?

Euh no... ATS is something else completely

Cheers,
Ben.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] KVM: x86: fix eflags state following processor init/reset

2015-11-10 Thread Wanpeng Li
2015-11-03 18:47 GMT+08:00 Paolo Bonzini :
>
>
> On 28/10/2015 09:10, Nadav Amit wrote:
>> Here are my 5 cents. Note that vmx_vcpu_reset calls:
>>
>>   vmcs_writel(GUEST_RFLAGS, 0x02);
>>
>> (And the RFLAGS value is not cached by KVM, so no consistency problem should
>> occur.)
>>
>> You may want to change the value into constant or call a wrapper function
>> for setting RFLAGS, but I don’t see something broken in the functionality.
>
> I agree.  Wanpeng, if this is just a cleanup, can you send v2 that
> removes or modifies the existing call to vmcs_writel?  If there is a
> bug, can you write a unit test for it?  It should be possible to test
> for the problem using INIT+SIPI on an AP.

You are right, I write a INIT+SIPI kvm-unit-test and didn't trigger
any bug. Please ignore the patch.

Regards,
Wanpeng Li
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] kvmtool: Makefile: remove static dependency files when make clean

2015-11-10 Thread James Hunt
After make lkvm-static & make clean, the dependency files for static
objects (.xxx.static.o.d) are not removed.

Signed-off-by: Xiaochen Shen 
Signed-off-by: Dimitri John Ledkov 
Signed-off-by: James Hunt 
---
 Makefile | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/Makefile b/Makefile
index 9138942..f8fdc66 100644
--- a/Makefile
+++ b/Makefile
@@ -369,6 +369,9 @@ c_flags = -Wp,-MD,$(depfile) $(CFLAGS)
 #
 STATIC_OBJS = $(patsubst %.o,%.static.o,$(OBJS) $(OBJS_STATOPT))
 
+STATIC_DEPS:= $(foreach obj,$(STATIC_OBJS),\
+   $(subst $(comma),_,$(dir $(obj)).$(notdir $(obj)).d))
+
 $(PROGRAM)-static:  $(STATIC_OBJS) $(OTHEROBJS) $(GUEST_INIT) $(GUEST_PRE_INIT)
$(E) "  LINK" $@
$(Q) $(CC) -static $(CFLAGS) $(STATIC_OBJS) $(OTHEROBJS) $(GUEST_OBJS) 
$(LDFLAGS) $(LIBS) $(LIBS_STATOPT) -o $@
@@ -489,7 +492,7 @@ clean:
$(Q) rm -f x86/bios/bios-rom.h
$(Q) rm -f tests/boot/boot_test.iso
$(Q) rm -rf tests/boot/rootfs/
-   $(Q) rm -f $(DEPS) $(OBJS) $(OTHEROBJS) $(OBJS_DYNOPT) $(STATIC_OBJS) 
$(PROGRAM) $(PROGRAM_ALIAS) $(PROGRAM)-static $(GUEST_INIT) $(GUEST_PRE_INIT) 
$(GUEST_OBJS)
+   $(Q) rm -f $(DEPS) $(STATIC_DEPS) $(OBJS) $(OTHEROBJS) $(OBJS_DYNOPT) 
$(STATIC_OBJS) $(PROGRAM) $(PROGRAM_ALIAS) $(PROGRAM)-static $(GUEST_INIT) 
$(GUEST_PRE_INIT) $(GUEST_OBJS)
$(Q) rm -f cscope.*
$(Q) rm -f tags
$(Q) rm -f TAGS
-- 
2.5.0

-
Intel Corporation (UK) Limited
Registered No. 1134945 (England)
Registered Office: Pipers Way, Swindon SN3 1RJ
VAT No: 860 2173 47

This e-mail and any attachments may contain confidential material for
the sole use of the intended recipient(s). Any review or distribution
by others is strictly prohibited. If you are not the intended
recipient, please contact the sender and delete all copies.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 3/3] KVM: x86: rename update_db_bp_intercept to update_bp_intercept

2015-11-10 Thread Paolo Bonzini
Because #DB is now intercepted unconditionally, this callback
only operates on #BP for both VMX and SVM.

Signed-off-by: Paolo Bonzini 
---
 arch/x86/include/asm/kvm_host.h | 2 +-
 arch/x86/kvm/svm.c  | 2 +-
 arch/x86/kvm/vmx.c  | 2 +-
 arch/x86/kvm/x86.c  | 2 +-
 4 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 456a3869a57e..30cfd64295a0 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -778,7 +778,7 @@ struct kvm_x86_ops {
void (*vcpu_load)(struct kvm_vcpu *vcpu, int cpu);
void (*vcpu_put)(struct kvm_vcpu *vcpu);
 
-   void (*update_db_bp_intercept)(struct kvm_vcpu *vcpu);
+   void (*update_bp_intercept)(struct kvm_vcpu *vcpu);
int (*get_msr)(struct kvm_vcpu *vcpu, struct msr_data *msr);
int (*set_msr)(struct kvm_vcpu *vcpu, struct msr_data *msr);
u64 (*get_segment_base)(struct kvm_vcpu *vcpu, int seg);
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 1cc1ffca0d8c..83a1c643f9a5 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -4279,7 +4279,7 @@ static struct kvm_x86_ops svm_x86_ops = {
.vcpu_load = svm_vcpu_load,
.vcpu_put = svm_vcpu_put,
 
-   .update_db_bp_intercept = update_bp_intercept,
+   .update_bp_intercept = update_bp_intercept,
.get_msr = svm_get_msr,
.set_msr = svm_set_msr,
.get_segment_base = svm_get_segment_base,
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 89aaedd2a91d..87acc5221740 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -10759,7 +10759,7 @@ static struct kvm_x86_ops vmx_x86_ops = {
.vcpu_load = vmx_vcpu_load,
.vcpu_put = vmx_vcpu_put,
 
-   .update_db_bp_intercept = update_exception_bitmap,
+   .update_bp_intercept = update_exception_bitmap,
.get_msr = vmx_get_msr,
.set_msr = vmx_set_msr,
.get_segment_base = vmx_get_segment_base,
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 2cb074f5aaed..aba7f95d7a64 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -7115,7 +7115,7 @@ int kvm_arch_vcpu_ioctl_set_guest_debug(struct kvm_vcpu 
*vcpu,
 */
kvm_set_rflags(vcpu, rflags);
 
-   kvm_x86_ops->update_db_bp_intercept(vcpu);
+   kvm_x86_ops->update_bp_intercept(vcpu);
 
r = 0;
 
-- 
1.8.3.1

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 0/3] Infinite loops in microcode while running guests

2015-11-10 Thread Paolo Bonzini
Yes, these can happen.  The issue is that benign exceptions are
delivered serially, but two of them (#DB and #AC) can also happen
during exception delivery itself.  The subsequent infinite stream
of exceptions causes the processor to never exit guest mode.

Paolo

Eric Northup (1):
  KVM: x86: work around infinite loop in microcode when #AC is delivered

Paolo Bonzini (2):
  KVM: svm: unconditionally intercept #DB
  KVM: x86: rename update_db_bp_intercept to update_bp_intercept

 arch/x86/include/asm/kvm_host.h |  2 +-
 arch/x86/include/uapi/asm/svm.h |  1 +
 arch/x86/kvm/svm.c  | 22 +++---
 arch/x86/kvm/vmx.c  |  7 +--
 arch/x86/kvm/x86.c  |  2 +-
 5 files changed, 19 insertions(+), 15 deletions(-)

-- 
1.8.3.1

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 1/3] KVM: x86: work around infinite loop in microcode when #AC is delivered

2015-11-10 Thread Paolo Bonzini
From: Eric Northup 

It was found that a guest can DoS a host by triggering an infinite
stream of "alignment check" (#AC) exceptions.  This causes the
microcode to enter an infinite loop where the core never receives
another interrupt.  The host kernel panics pretty quickly due to the
effects (CVE-2015-5307).

Signed-off-by: Eric Northup 
Cc: sta...@vger.kernel.org
Signed-off-by: Paolo Bonzini 
---
 arch/x86/include/uapi/asm/svm.h | 1 +
 arch/x86/kvm/svm.c  | 8 
 arch/x86/kvm/vmx.c  | 5 -
 3 files changed, 13 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/uapi/asm/svm.h b/arch/x86/include/uapi/asm/svm.h
index b5d7640abc5d..8a4add8e4639 100644
--- a/arch/x86/include/uapi/asm/svm.h
+++ b/arch/x86/include/uapi/asm/svm.h
@@ -100,6 +100,7 @@
{ SVM_EXIT_EXCP_BASE + UD_VECTOR,   "UD excp" }, \
{ SVM_EXIT_EXCP_BASE + PF_VECTOR,   "PF excp" }, \
{ SVM_EXIT_EXCP_BASE + NM_VECTOR,   "NM excp" }, \
+   { SVM_EXIT_EXCP_BASE + AC_VECTOR,   "AC excp" }, \
{ SVM_EXIT_EXCP_BASE + MC_VECTOR,   "MC excp" }, \
{ SVM_EXIT_INTR,"interrupt" }, \
{ SVM_EXIT_NMI, "nmi" }, \
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index f2ba91990b4e..183926483c3a 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -1019,6 +1019,7 @@ static void init_vmcb(struct vcpu_svm *svm)
set_exception_intercept(svm, PF_VECTOR);
set_exception_intercept(svm, UD_VECTOR);
set_exception_intercept(svm, MC_VECTOR);
+   set_exception_intercept(svm, AC_VECTOR);
 
set_intercept(svm, INTERCEPT_INTR);
set_intercept(svm, INTERCEPT_NMI);
@@ -1707,6 +1708,12 @@ static int ud_interception(struct vcpu_svm *svm)
return 1;
 }
 
+static int ac_interception(struct vcpu_svm *svm)
+{
+   kvm_queue_exception_e(>vcpu, AC_VECTOR, 0);
+   return 1;
+}
+
 static void svm_fpu_activate(struct kvm_vcpu *vcpu)
 {
struct vcpu_svm *svm = to_svm(vcpu);
@@ -3270,6 +3277,7 @@ static int (*const svm_exit_handlers[])(struct vcpu_svm 
*svm) = {
[SVM_EXIT_EXCP_BASE + PF_VECTOR]= pf_interception,
[SVM_EXIT_EXCP_BASE + NM_VECTOR]= nm_interception,
[SVM_EXIT_EXCP_BASE + MC_VECTOR]= mc_interception,
+   [SVM_EXIT_EXCP_BASE + AC_VECTOR]= ac_interception,
[SVM_EXIT_INTR] = intr_interception,
[SVM_EXIT_NMI]  = nmi_interception,
[SVM_EXIT_SMI]  = nop_on_interception,
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index b765b036a048..89aaedd2a91d 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -1639,7 +1639,7 @@ static void update_exception_bitmap(struct kvm_vcpu *vcpu)
u32 eb;
 
eb = (1u << PF_VECTOR) | (1u << UD_VECTOR) | (1u << MC_VECTOR) |
-(1u << NM_VECTOR) | (1u << DB_VECTOR);
+(1u << NM_VECTOR) | (1u << DB_VECTOR) | (1u << AC_VECTOR);
if ((vcpu->guest_debug &
 (KVM_GUESTDBG_ENABLE | KVM_GUESTDBG_USE_SW_BP)) ==
(KVM_GUESTDBG_ENABLE | KVM_GUESTDBG_USE_SW_BP))
@@ -5261,6 +5261,9 @@ static int handle_exception(struct kvm_vcpu *vcpu)
return handle_rmode_exception(vcpu, ex_no, error_code);
 
switch (ex_no) {
+   case AC_VECTOR:
+   kvm_queue_exception_e(vcpu, AC_VECTOR, error_code);
+   return 1;
case DB_VECTOR:
dr6 = vmcs_readl(EXIT_QUALIFICATION);
if (!(vcpu->guest_debug &
-- 
1.8.3.1


--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 2/3] KVM: svm: unconditionally intercept #DB

2015-11-10 Thread Paolo Bonzini
This is needed to avoid the possibility that the guest triggers
an infinite stream of #DB exceptions (CVE-2015-8104).

VMX is not affected: because it does not save DR6 in the VMCS,
it already intercepts #DB unconditionally.

Reported-by: Jan Beulich 
Cc: sta...@vger.kernel.org
Signed-off-by: Paolo Bonzini 
---
 arch/x86/kvm/svm.c | 14 +++---
 1 file changed, 3 insertions(+), 11 deletions(-)

diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 183926483c3a..1cc1ffca0d8c 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -1020,6 +1020,7 @@ static void init_vmcb(struct vcpu_svm *svm)
set_exception_intercept(svm, UD_VECTOR);
set_exception_intercept(svm, MC_VECTOR);
set_exception_intercept(svm, AC_VECTOR);
+   set_exception_intercept(svm, DB_VECTOR);
 
set_intercept(svm, INTERCEPT_INTR);
set_intercept(svm, INTERCEPT_NMI);
@@ -1554,20 +1555,13 @@ static void svm_set_segment(struct kvm_vcpu *vcpu,
mark_dirty(svm->vmcb, VMCB_SEG);
 }
 
-static void update_db_bp_intercept(struct kvm_vcpu *vcpu)
+static void update_bp_intercept(struct kvm_vcpu *vcpu)
 {
struct vcpu_svm *svm = to_svm(vcpu);
 
-   clr_exception_intercept(svm, DB_VECTOR);
clr_exception_intercept(svm, BP_VECTOR);
 
-   if (svm->nmi_singlestep)
-   set_exception_intercept(svm, DB_VECTOR);
-
if (vcpu->guest_debug & KVM_GUESTDBG_ENABLE) {
-   if (vcpu->guest_debug &
-   (KVM_GUESTDBG_SINGLESTEP | KVM_GUESTDBG_USE_HW_BP))
-   set_exception_intercept(svm, DB_VECTOR);
if (vcpu->guest_debug & KVM_GUESTDBG_USE_SW_BP)
set_exception_intercept(svm, BP_VECTOR);
} else
@@ -1673,7 +1667,6 @@ static int db_interception(struct vcpu_svm *svm)
if (!(svm->vcpu.guest_debug & KVM_GUESTDBG_SINGLESTEP))
svm->vmcb->save.rflags &=
~(X86_EFLAGS_TF | X86_EFLAGS_RF);
-   update_db_bp_intercept(>vcpu);
}
 
if (svm->vcpu.guest_debug &
@@ -3661,7 +3654,6 @@ static void enable_nmi_window(struct kvm_vcpu *vcpu)
 */
svm->nmi_singlestep = true;
svm->vmcb->save.rflags |= (X86_EFLAGS_TF | X86_EFLAGS_RF);
-   update_db_bp_intercept(vcpu);
 }
 
 static int svm_set_tss_addr(struct kvm *kvm, unsigned int addr)
@@ -4287,7 +4279,7 @@ static struct kvm_x86_ops svm_x86_ops = {
.vcpu_load = svm_vcpu_load,
.vcpu_put = svm_vcpu_put,
 
-   .update_db_bp_intercept = update_db_bp_intercept,
+   .update_db_bp_intercept = update_bp_intercept,
.get_msr = svm_get_msr,
.set_msr = svm_set_msr,
.get_segment_base = svm_get_segment_base,
-- 
1.8.3.1


--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v4 0/5] KVM: Hyper-V synthetic interrupt controller

2015-11-10 Thread Andrey Smetanin
This patchset implements the KVM part of the synthetic interrupt
controller (SynIC) which is a building block of the Hyper-V
paravirtualized device bus (vmbus).

SynIC is a lapic extension, which is controlled via MSRs and maintains
for each vCPU
 - 16 synthetic interrupt "lines" (SINT's); each can be configured to
   trigger a specific interrupt vector optionally with auto-EOI
   semantics
 - a message page in the guest memory with 16 256-byte per-SINT message
   slots
 - an event flag page in the guest memory with 16 2048-bit per-SINT
   event flag areas

The host triggers a SINT whenever it delivers a new message to the
corresponding slot or flips an event flag bit in the corresponding area.
The guest informs the host that it can try delivering a message by
explicitly asserting EOI in lapic or writing to End-Of-Message (EOM)
MSR.

The userspace (qemu) triggers interrupts and receives EOM notifications
via irqfd with resampler; for that, a GSI is allocated for each
configured SINT, and irq_routing api is extended to support GSI-SINT
mapping.

Besides, a new vcpu exit is introduced to notify the userspace of the
changes in SynIC configuraion triggered by guest writing to the
corresponding MSRs.

Since auto-EOI behavior of SynIC cannot be made compatible with APIC
hardware virtualization, the latter is disabled using a newly
introduced flag, when SynIC is activated.

This patches seria has been tested by running of kvm-unit-tests
(which also includes previosly sent 'hyperv_synic' test) with
host CPU which supports APICv (Intel(R) Xeon(R) CPU E5-2407 v2 @ 2.40GHz)

Signed-off-by: Andrey Smetanin 
Reviewed-by: Roman Kagan 
Signed-off-by: Denis V. Lunev 
CC: Gleb Natapov 
CC: Paolo Bonzini 
CC: Roman Kagan 
CC: Denis V. Lunev 
CC: qemu-de...@nongnu.org

Changes v4:
* disable APICv in case Hyper-V SynIC enabled
* patchset rebase into latest kvm/queue (10 Nov 2015)
* do Hyper-V SynIC exit only at !host(guest) msr's writes

Changes v3:
* Hyper-V SynIC KVM API documentation fixes

Changes v2:
* irqchip/eventfd preparation improvements to support
arch specific routing entries like Hyper-V SynIC.
* add Hyper-V SynIC vectors into EOI exit bitmap.
* do not use posted interrupts in case of Hyper-V SynIC
AutoEOI vectors

Andrey Smetanin (5):
  kvm/irqchip: kvm_arch_irq_routing_update renaming split
  kvm/x86: split ioapic-handled and EOI exit bitmaps
  kvm/x86: per-vcpu apicv deactivation support
  kvm/x86: Hyper-V synthetic interrupt controller
  kvm/x86: Hyper-V kvm exit

 Documentation/virtual/kvm/api.txt |  41 +
 arch/x86/include/asm/kvm_host.h   |  26 ++-
 arch/x86/kvm/hyperv.c | 335 ++
 arch/x86/kvm/hyperv.h |  23 +++
 arch/x86/kvm/ioapic.c |   4 +-
 arch/x86/kvm/ioapic.h |   7 +-
 arch/x86/kvm/irq.c|   2 +-
 arch/x86/kvm/irq_comm.c   |  41 -
 arch/x86/kvm/lapic.c  |  40 +++--
 arch/x86/kvm/lapic.h  |   9 +-
 arch/x86/kvm/svm.c|  13 +-
 arch/x86/kvm/vmx.c|  48 +++---
 arch/x86/kvm/x86.c|  66 +++-
 include/linux/kvm_host.h  |  12 +-
 include/uapi/linux/kvm.h  |  25 +++
 virt/kvm/irqchip.c|   7 +-
 16 files changed, 625 insertions(+), 74 deletions(-)

-- 
2.4.3

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2 resend] ARM/arm64: KVM: test properly for a PTE's uncachedness

2015-11-10 Thread Ard Biesheuvel
The open coded tests for checking whether a PTE maps a page as
uncached use a flawed '(pte_val(xxx) & CONST) != CONST' pattern,
which is not guaranteed to work since the type of a mapping is
not a set of mutually exclusive bits

For HYP mappings, the type is an index into the MAIR table (i.e, the
index itself does not contain any information whatsoever about the
type of the mapping), and for stage-2 mappings it is a bit field where
normal memory and device types are defined as follows:

#define MT_S2_NORMAL0xf
#define MT_S2_DEVICE_nGnRE  0x1

I.e., masking *and* comparing with the latter matches on the former,
and we have been getting lucky merely because the S2 device mappings
also have the PTE_UXN bit set, or we would misidentify memory mappings
as device mappings.

Since the unmap_range() code path (which contains one instance of the
flawed test) is used both for HYP mappings and stage-2 mappings, and
considering the difference between the two, it is non-trivial to fix
this by rewriting the tests in place, as it would involve passing
down the type of mapping through all the functions.

However, since HYP mappings and stage-2 mappings both deal with host
physical addresses, we can simply check whether the mapping is backed
by memory that is managed by the host kernel, and only perform the
D-cache maintenance if this is the case.

Signed-off-by: Ard Biesheuvel 
Tested-by: Pavel Fedin 
Reviewed-by: Christoffer Dall 
---

Resending since I failed to cc the lists the first time around, only
difference is the added tags.

 arch/arm/kvm/mmu.c | 15 +++
 1 file changed, 7 insertions(+), 8 deletions(-)

diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
index 6984342da13d..7dace909d5cf 100644
--- a/arch/arm/kvm/mmu.c
+++ b/arch/arm/kvm/mmu.c
@@ -98,6 +98,11 @@ static void kvm_flush_dcache_pud(pud_t pud)
__kvm_flush_dcache_pud(pud);
 }
 
+static bool kvm_is_device_pfn(unsigned long pfn)
+{
+   return !pfn_valid(pfn);
+}
+
 /**
  * stage2_dissolve_pmd() - clear and flush huge PMD entry
  * @kvm:   pointer to kvm structure.
@@ -213,7 +218,7 @@ static void unmap_ptes(struct kvm *kvm, pmd_t *pmd,
kvm_tlb_flush_vmid_ipa(kvm, addr);
 
/* No need to invalidate the cache for device mappings 
*/
-   if ((pte_val(old_pte) & PAGE_S2_DEVICE) != 
PAGE_S2_DEVICE)
+   if (!kvm_is_device_pfn(__phys_to_pfn(addr)))
kvm_flush_dcache_pte(old_pte);
 
put_page(virt_to_page(pte));
@@ -305,8 +310,7 @@ static void stage2_flush_ptes(struct kvm *kvm, pmd_t *pmd,
 
pte = pte_offset_kernel(pmd, addr);
do {
-   if (!pte_none(*pte) &&
-   (pte_val(*pte) & PAGE_S2_DEVICE) != PAGE_S2_DEVICE)
+   if (!pte_none(*pte) && !kvm_is_device_pfn(__phys_to_pfn(addr)))
kvm_flush_dcache_pte(*pte);
} while (pte++, addr += PAGE_SIZE, addr != end);
 }
@@ -1037,11 +1041,6 @@ static bool kvm_is_write_fault(struct kvm_vcpu *vcpu)
return kvm_vcpu_dabt_iswrite(vcpu);
 }
 
-static bool kvm_is_device_pfn(unsigned long pfn)
-{
-   return !pfn_valid(pfn);
-}
-
 /**
  * stage2_wp_ptes - write protect PMD range
  * @pmd:   pointer to pmd entry
-- 
1.9.1

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 resend] ARM/arm64: KVM: test properly for a PTE's uncachedness

2015-11-10 Thread Christoffer Dall
On Tue, Nov 10, 2015 at 03:11:20PM +0100, Ard Biesheuvel wrote:
> The open coded tests for checking whether a PTE maps a page as
> uncached use a flawed '(pte_val(xxx) & CONST) != CONST' pattern,
> which is not guaranteed to work since the type of a mapping is
> not a set of mutually exclusive bits
> 
> For HYP mappings, the type is an index into the MAIR table (i.e, the
> index itself does not contain any information whatsoever about the
> type of the mapping), and for stage-2 mappings it is a bit field where
> normal memory and device types are defined as follows:
> 
> #define MT_S2_NORMAL0xf
> #define MT_S2_DEVICE_nGnRE  0x1
> 
> I.e., masking *and* comparing with the latter matches on the former,
> and we have been getting lucky merely because the S2 device mappings
> also have the PTE_UXN bit set, or we would misidentify memory mappings
> as device mappings.
> 
> Since the unmap_range() code path (which contains one instance of the
> flawed test) is used both for HYP mappings and stage-2 mappings, and
> considering the difference between the two, it is non-trivial to fix
> this by rewriting the tests in place, as it would involve passing
> down the type of mapping through all the functions.
> 
> However, since HYP mappings and stage-2 mappings both deal with host
> physical addresses, we can simply check whether the mapping is backed
> by memory that is managed by the host kernel, and only perform the
> D-cache maintenance if this is the case.
> 
> Signed-off-by: Ard Biesheuvel 
> Tested-by: Pavel Fedin 
> Reviewed-by: Christoffer Dall 

Thanks, applied.

-Christoffer
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v4 0/6] virtio core DMA API conversion

2015-11-10 Thread Andy Lutomirski
On Nov 10, 2015 4:44 PM, "Benjamin Herrenschmidt"
 wrote:
>
> On Tue, 2015-11-10 at 15:44 -0800, Andy Lutomirski wrote:
> >
> > > What about partition <-> partition virtio such as what we could do on
> > > PAPR systems. That would have the weak barrier bit.
> > >
> >
> > Is it partition <-> partition, bypassing IOMMU?
>
> No.
>
> > I think I'd settle for just something that doesn't regress
> > non-experimental setups that actually work today and that allow new
> > setups (x86 with fixed QEMU and maybe something more complicated on
> > powerpc and/or sparc) to work in all cases.
> >
> > We could certainly just make powerpc and sparc continue bypassing the
> > IOMMU until someone comes up with a way to fix it.  I'll send out some
> > patches that do that, and maybe that'll help this make progress.
>
> But we haven't found a solution that works. All we have come up with is
> a quirk that will force bypass on virtio always and will not allow us
> to operate non-bypassing devices on either of those architectures in
> the future.
>
> I'm not too happy about this.

Me neither.  At least it wouldn't be a regression, but it's still crappy.

I think that arm is fine, at least.  I was unable to find an arm QEMU
config that has any problems with my patches.

--Andy
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v4 0/6] virtio core DMA API conversion

2015-11-10 Thread Benjamin Herrenschmidt
On Tue, 2015-11-10 at 20:46 -0800, Andy Lutomirski wrote:
> Me neither.  At least it wouldn't be a regression, but it's still
> crappy.
> 
> I think that arm is fine, at least.  I was unable to find an arm QEMU
> config that has any problems with my patches.

Ok, give me a few days for my headache & fever to subside see if I can
find something better. David, no idea from your side ? :-)

Cheers,
Ben.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 2/5] target-i386/kvm: Hyper-V SynIC MSR's support

2015-11-10 Thread Paolo Bonzini


On 10/11/2015 13:52, Andrey Smetanin wrote:
> This patch does Hyper-V Synthetic interrupt
> controller(Hyper-V SynIC) MSR's support and
> migration. Hyper-V SynIC is enabled by cpu's
> 'hv-synic' option.
> 
> This patch does not allow cpu creation if
> 'hv-synic' option specified but kernel
> doesn't support Hyper-V SynIC.
> 
> Changes v2:
> * activate Hyper-V SynIC by enabling corresponding vcpu cap
> * reject cpu initialization if user requested Hyper-V SynIC
>   but kernel does not support Hyper-V SynIC
> 
> Signed-off-by: Andrey Smetanin 
> Reviewed-by: Roman Kagan 
> Signed-off-by: Denis V. Lunev 
> CC: Paolo Bonzini 
> CC: Richard Henderson 
> CC: Eduardo Habkost 
> CC: "Andreas Färber" 
> CC: Marcelo Tosatti 
> CC: Roman Kagan 
> CC: Denis V. Lunev 
> CC: kvm@vger.kernel.org
> 
> ---
>  target-i386/cpu-qom.h |  1 +
>  target-i386/cpu.c |  1 +
>  target-i386/cpu.h |  5 
>  target-i386/kvm.c | 67 
> ++-
>  target-i386/machine.c | 39 ++
>  5 files changed, 112 insertions(+), 1 deletion(-)
> 
> diff --git a/target-i386/cpu-qom.h b/target-i386/cpu-qom.h
> index e3bfe9d..7ea5b34 100644
> --- a/target-i386/cpu-qom.h
> +++ b/target-i386/cpu-qom.h
> @@ -94,6 +94,7 @@ typedef struct X86CPU {
>  bool hyperv_reset;
>  bool hyperv_vpindex;
>  bool hyperv_runtime;
> +bool hyperv_synic;
>  bool check_cpuid;
>  bool enforce_cpuid;
>  bool expose_kvm;
> diff --git a/target-i386/cpu.c b/target-i386/cpu.c
> index e5f1c5b..1462e19 100644
> --- a/target-i386/cpu.c
> +++ b/target-i386/cpu.c
> @@ -3142,6 +3142,7 @@ static Property x86_cpu_properties[] = {
>  DEFINE_PROP_BOOL("hv-reset", X86CPU, hyperv_reset, false),
>  DEFINE_PROP_BOOL("hv-vpindex", X86CPU, hyperv_vpindex, false),
>  DEFINE_PROP_BOOL("hv-runtime", X86CPU, hyperv_runtime, false),
> +DEFINE_PROP_BOOL("hv-synic", X86CPU, hyperv_synic, false),
>  DEFINE_PROP_BOOL("check", X86CPU, check_cpuid, true),
>  DEFINE_PROP_BOOL("enforce", X86CPU, enforce_cpuid, false),
>  DEFINE_PROP_BOOL("kvm", X86CPU, expose_kvm, true),
> diff --git a/target-i386/cpu.h b/target-i386/cpu.h
> index fc4a605..8cf33df 100644
> --- a/target-i386/cpu.h
> +++ b/target-i386/cpu.h
> @@ -918,6 +918,11 @@ typedef struct CPUX86State {
>  uint64_t msr_hv_tsc;
>  uint64_t msr_hv_crash_params[HV_X64_MSR_CRASH_PARAMS];
>  uint64_t msr_hv_runtime;
> +uint64_t msr_hv_synic_control;
> +uint64_t msr_hv_synic_version;
> +uint64_t msr_hv_synic_evt_page;
> +uint64_t msr_hv_synic_msg_page;
> +uint64_t msr_hv_synic_sint[HV_SYNIC_SINT_COUNT];
>  
>  /* exception/interrupt handling */
>  int error_code;
> diff --git a/target-i386/kvm.c b/target-i386/kvm.c
> index 2a9953b..cfcd01d 100644
> --- a/target-i386/kvm.c
> +++ b/target-i386/kvm.c
> @@ -86,6 +86,7 @@ static bool has_msr_hv_crash;
>  static bool has_msr_hv_reset;
>  static bool has_msr_hv_vpindex;
>  static bool has_msr_hv_runtime;
> +static bool has_msr_hv_synic;
>  static bool has_msr_mtrr;
>  static bool has_msr_xss;
>  
> @@ -521,7 +522,8 @@ static bool hyperv_enabled(X86CPU *cpu)
>  cpu->hyperv_crash ||
>  cpu->hyperv_reset ||
>  cpu->hyperv_vpindex ||
> -cpu->hyperv_runtime);
> +cpu->hyperv_runtime ||
> +cpu->hyperv_synic);
>  }
>  
>  static Error *invtsc_mig_blocker;
> @@ -610,6 +612,14 @@ int kvm_arch_init_vcpu(CPUState *cs)
>  if (cpu->hyperv_runtime && has_msr_hv_runtime) {
>  c->eax |= HV_X64_MSR_VP_RUNTIME_AVAILABLE;
>  }
> +if (cpu->hyperv_synic) {
> +if (!has_msr_hv_synic ||
> +kvm_vcpu_enable_cap(cs, KVM_CAP_HYPERV_SYNIC, 0)) {
> +fprintf(stderr, "Hyper-V SynIC is not supported by 
> kernel\n");
> +return -ENOSYS;
> +}
> +c->eax |= HV_X64_MSR_SYNIC_AVAILABLE;
> +}
>  c = _data.entries[cpuid_i++];
>  c->function = HYPERV_CPUID_ENLIGHTMENT_INFO;
>  if (cpu->hyperv_relaxed_timing) {
> @@ -950,6 +960,10 @@ static int kvm_get_supported_msrs(KVMState *s)
>  has_msr_hv_runtime = true;
>  continue;
>  }
> +if (kvm_msr_list->indices[i] == HV_X64_MSR_SCONTROL) {
> +has_msr_hv_synic = true;
> +continue;
> +}
>  }
>  }
>  
> @@ -1511,6 +1525,31 @@ static int kvm_put_msrs(X86CPU *cpu, int level)
>  kvm_msr_entry_set([n++], HV_X64_MSR_VP_RUNTIME,
>env->msr_hv_runtime);
>  }
> +if (cpu->hyperv_synic) {
> +int j;
> +
> +

Re: [PATCH v2] ARM/arm64: KVM: test properly for a PTE's uncachedness

2015-11-10 Thread Ard Biesheuvel
On 10 November 2015 at 13:22, Christoffer Dall
 wrote:
> On Tue, Nov 10, 2015 at 10:45:37AM +0100, Ard Biesheuvel wrote:
>> Hi all,
>>
>> I wonder if this is a better way to address the problem. It looks at
>> the nature of the memory rather than the nature of the mapping, which
>> is probably a more reliable indicator of whether cache maintenance is
>> required when performing the unmap.
>>
>>
>> ---8<
>> The open coded tests for checking whether a PTE maps a page as
>> uncached use a flawed 'pte_val(xxx) & CONST != CONST' pattern,
>> which is not guaranteed to work since the type of a mapping is
>> not a set of mutually exclusive bits
>>
>> For HYP mappings, the type is an index into the MAIR table (i.e, the
>> index itself does not contain any information whatsoever about the
>> type of the mapping), and for stage-2 mappings it is a bit field where
>> normal memory and device types are defined as follows:
>>
>> #define MT_S2_NORMAL0xf
>> #define MT_S2_DEVICE_nGnRE  0x1
>>
>> I.e., masking *and* comparing with the latter matches on the former,
>> and we have been getting lucky merely because the S2 device mappings
>> also have the PTE_UXN bit set, or we would misidentify memory mappings
>> as device mappings.
>>
>> Since the unmap_range() code path (which contains one instance of the
>> flawed test) is used both for HYP mappings and stage-2 mappings, and
>> considering the difference between the two, it is non-trivial to fix
>> this by rewriting the tests in place, as it would involve passing
>> down the type of mapping through all the functions.
>>
>> However, since HYP mappings and stage-2 mappings both deal with host
>> physical addresses, we can simply check whether the mapping is backed
>> by memory that is managed by the host kernel, and only perform the
>> D-cache maintenance if this is the case.
>>
>> Signed-off-by: Ard Biesheuvel 
>> ---
>>  arch/arm/kvm/mmu.c | 15 +++
>>  1 file changed, 7 insertions(+), 8 deletions(-)
>>
>> diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
>> index 6984342da13d..7dace909d5cf 100644
>> --- a/arch/arm/kvm/mmu.c
>> +++ b/arch/arm/kvm/mmu.c
>> @@ -98,6 +98,11 @@ static void kvm_flush_dcache_pud(pud_t pud)
>>   __kvm_flush_dcache_pud(pud);
>>  }
>>
>> +static bool kvm_is_device_pfn(unsigned long pfn)
>> +{
>> + return !pfn_valid(pfn);
>> +}
>> +
>>  /**
>>   * stage2_dissolve_pmd() - clear and flush huge PMD entry
>>   * @kvm: pointer to kvm structure.
>> @@ -213,7 +218,7 @@ static void unmap_ptes(struct kvm *kvm, pmd_t *pmd,
>>   kvm_tlb_flush_vmid_ipa(kvm, addr);
>>
>>   /* No need to invalidate the cache for device mappings 
>> */
>> - if ((pte_val(old_pte) & PAGE_S2_DEVICE) != 
>> PAGE_S2_DEVICE)
>> + if (!kvm_is_device_pfn(__phys_to_pfn(addr)))
>>   kvm_flush_dcache_pte(old_pte);
>>
>>   put_page(virt_to_page(pte));
>> @@ -305,8 +310,7 @@ static void stage2_flush_ptes(struct kvm *kvm, pmd_t 
>> *pmd,
>>
>>   pte = pte_offset_kernel(pmd, addr);
>>   do {
>> - if (!pte_none(*pte) &&
>> - (pte_val(*pte) & PAGE_S2_DEVICE) != PAGE_S2_DEVICE)
>> + if (!pte_none(*pte) && !kvm_is_device_pfn(__phys_to_pfn(addr)))
>>   kvm_flush_dcache_pte(*pte);
>>   } while (pte++, addr += PAGE_SIZE, addr != end);
>>  }
>> @@ -1037,11 +1041,6 @@ static bool kvm_is_write_fault(struct kvm_vcpu *vcpu)
>>   return kvm_vcpu_dabt_iswrite(vcpu);
>>  }
>>
>> -static bool kvm_is_device_pfn(unsigned long pfn)
>> -{
>> - return !pfn_valid(pfn);
>> -}
>> -
>>  /**
>>   * stage2_wp_ptes - write protect PMD range
>>   * @pmd: pointer to pmd entry
>> --
>> 1.9.1
>>
>
> So PAGE_HYP_DEVICE is used only to map the vgic-v2 regions and
> PAGE_S2_DEVICE is used to map the vgic VCPU interface and for all memory
> regions where the vma has (vm_flags & VM_PFNMAP).
>
> Will these, and only these, cases be covered by the pfn_valid check?
>

The pfn_valid() check will ensure that cache maintenance is only
performed on regions that are known to the host as memory, are managed
by the host (i.e., there is a struct page associated with them) and
will be accessed by the host via cacheable mappings (they are covered
by the linear mapping, or [on ARM] will be kmap'ed cacheable if they
are highmem). If you look at the commit that introduced these tests
(363ef89f8e9b arm/arm64: KVM: Invalidate data cache on unmap), the
concern it addresses is that the guest may perform uncached accesses
to regions that the host has mapped cacheable, meaning guest writes
may be shadowed by clean cachelines, making them invisble to cache
coherent I/O. So afaict, pfn_valid() is an appropriate check here.

pfn_valid() will not misidentify device regions as memory (unless the
host is really broken) so 

Re: [PATCH v2] ARM/arm64: KVM: test properly for a PTE's uncachedness

2015-11-10 Thread Christoffer Dall
On Tue, Nov 10, 2015 at 02:15:45PM +0100, Ard Biesheuvel wrote:
> On 10 November 2015 at 13:22, Christoffer Dall
>  wrote:
> > On Tue, Nov 10, 2015 at 10:45:37AM +0100, Ard Biesheuvel wrote:
> >> Hi all,
> >>
> >> I wonder if this is a better way to address the problem. It looks at
> >> the nature of the memory rather than the nature of the mapping, which
> >> is probably a more reliable indicator of whether cache maintenance is
> >> required when performing the unmap.
> >>
> >>
> >> ---8<
> >> The open coded tests for checking whether a PTE maps a page as
> >> uncached use a flawed 'pte_val(xxx) & CONST != CONST' pattern,
> >> which is not guaranteed to work since the type of a mapping is
> >> not a set of mutually exclusive bits
> >>
> >> For HYP mappings, the type is an index into the MAIR table (i.e, the
> >> index itself does not contain any information whatsoever about the
> >> type of the mapping), and for stage-2 mappings it is a bit field where
> >> normal memory and device types are defined as follows:
> >>
> >> #define MT_S2_NORMAL0xf
> >> #define MT_S2_DEVICE_nGnRE  0x1
> >>
> >> I.e., masking *and* comparing with the latter matches on the former,
> >> and we have been getting lucky merely because the S2 device mappings
> >> also have the PTE_UXN bit set, or we would misidentify memory mappings
> >> as device mappings.
> >>
> >> Since the unmap_range() code path (which contains one instance of the
> >> flawed test) is used both for HYP mappings and stage-2 mappings, and
> >> considering the difference between the two, it is non-trivial to fix
> >> this by rewriting the tests in place, as it would involve passing
> >> down the type of mapping through all the functions.
> >>
> >> However, since HYP mappings and stage-2 mappings both deal with host
> >> physical addresses, we can simply check whether the mapping is backed
> >> by memory that is managed by the host kernel, and only perform the
> >> D-cache maintenance if this is the case.
> >>
> >> Signed-off-by: Ard Biesheuvel 
> >> ---
> >>  arch/arm/kvm/mmu.c | 15 +++
> >>  1 file changed, 7 insertions(+), 8 deletions(-)
> >>
> >> diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
> >> index 6984342da13d..7dace909d5cf 100644
> >> --- a/arch/arm/kvm/mmu.c
> >> +++ b/arch/arm/kvm/mmu.c
> >> @@ -98,6 +98,11 @@ static void kvm_flush_dcache_pud(pud_t pud)
> >>   __kvm_flush_dcache_pud(pud);
> >>  }
> >>
> >> +static bool kvm_is_device_pfn(unsigned long pfn)
> >> +{
> >> + return !pfn_valid(pfn);
> >> +}
> >> +
> >>  /**
> >>   * stage2_dissolve_pmd() - clear and flush huge PMD entry
> >>   * @kvm: pointer to kvm structure.
> >> @@ -213,7 +218,7 @@ static void unmap_ptes(struct kvm *kvm, pmd_t *pmd,
> >>   kvm_tlb_flush_vmid_ipa(kvm, addr);
> >>
> >>   /* No need to invalidate the cache for device 
> >> mappings */
> >> - if ((pte_val(old_pte) & PAGE_S2_DEVICE) != 
> >> PAGE_S2_DEVICE)
> >> + if (!kvm_is_device_pfn(__phys_to_pfn(addr)))
> >>   kvm_flush_dcache_pte(old_pte);
> >>
> >>   put_page(virt_to_page(pte));
> >> @@ -305,8 +310,7 @@ static void stage2_flush_ptes(struct kvm *kvm, pmd_t 
> >> *pmd,
> >>
> >>   pte = pte_offset_kernel(pmd, addr);
> >>   do {
> >> - if (!pte_none(*pte) &&
> >> - (pte_val(*pte) & PAGE_S2_DEVICE) != PAGE_S2_DEVICE)
> >> + if (!pte_none(*pte) && 
> >> !kvm_is_device_pfn(__phys_to_pfn(addr)))
> >>   kvm_flush_dcache_pte(*pte);
> >>   } while (pte++, addr += PAGE_SIZE, addr != end);
> >>  }
> >> @@ -1037,11 +1041,6 @@ static bool kvm_is_write_fault(struct kvm_vcpu 
> >> *vcpu)
> >>   return kvm_vcpu_dabt_iswrite(vcpu);
> >>  }
> >>
> >> -static bool kvm_is_device_pfn(unsigned long pfn)
> >> -{
> >> - return !pfn_valid(pfn);
> >> -}
> >> -
> >>  /**
> >>   * stage2_wp_ptes - write protect PMD range
> >>   * @pmd: pointer to pmd entry
> >> --
> >> 1.9.1
> >>
> >
> > So PAGE_HYP_DEVICE is used only to map the vgic-v2 regions and
> > PAGE_S2_DEVICE is used to map the vgic VCPU interface and for all memory
> > regions where the vma has (vm_flags & VM_PFNMAP).
> >
> > Will these, and only these, cases be covered by the pfn_valid check?
> >
> 
> The pfn_valid() check will ensure that cache maintenance is only
> performed on regions that are known to the host as memory, are managed
> by the host (i.e., there is a struct page associated with them) and
> will be accessed by the host via cacheable mappings (they are covered
> by the linear mapping, or [on ARM] will be kmap'ed cacheable if they
> are highmem). If you look at the commit that introduced these tests
> (363ef89f8e9b arm/arm64: KVM: Invalidate data cache on unmap), the
> concern it addresses is that the guest may perform uncached 

Re: [PATCH v2] ARM/arm64: KVM: test properly for a PTE's uncachedness

2015-11-10 Thread Ard Biesheuvel
On 10 November 2015 at 14:40, Christoffer Dall
 wrote:
> On Tue, Nov 10, 2015 at 02:15:45PM +0100, Ard Biesheuvel wrote:
>> On 10 November 2015 at 13:22, Christoffer Dall
>>  wrote:
>> > On Tue, Nov 10, 2015 at 10:45:37AM +0100, Ard Biesheuvel wrote:
>> >> Hi all,
>> >>
>> >> I wonder if this is a better way to address the problem. It looks at
>> >> the nature of the memory rather than the nature of the mapping, which
>> >> is probably a more reliable indicator of whether cache maintenance is
>> >> required when performing the unmap.
>> >>
>> >>
>> >> ---8<
>> >> The open coded tests for checking whether a PTE maps a page as
>> >> uncached use a flawed 'pte_val(xxx) & CONST != CONST' pattern,
>> >> which is not guaranteed to work since the type of a mapping is
>> >> not a set of mutually exclusive bits
>> >>
>> >> For HYP mappings, the type is an index into the MAIR table (i.e, the
>> >> index itself does not contain any information whatsoever about the
>> >> type of the mapping), and for stage-2 mappings it is a bit field where
>> >> normal memory and device types are defined as follows:
>> >>
>> >> #define MT_S2_NORMAL0xf
>> >> #define MT_S2_DEVICE_nGnRE  0x1
>> >>
>> >> I.e., masking *and* comparing with the latter matches on the former,
>> >> and we have been getting lucky merely because the S2 device mappings
>> >> also have the PTE_UXN bit set, or we would misidentify memory mappings
>> >> as device mappings.
>> >>
>> >> Since the unmap_range() code path (which contains one instance of the
>> >> flawed test) is used both for HYP mappings and stage-2 mappings, and
>> >> considering the difference between the two, it is non-trivial to fix
>> >> this by rewriting the tests in place, as it would involve passing
>> >> down the type of mapping through all the functions.
>> >>
>> >> However, since HYP mappings and stage-2 mappings both deal with host
>> >> physical addresses, we can simply check whether the mapping is backed
>> >> by memory that is managed by the host kernel, and only perform the
>> >> D-cache maintenance if this is the case.
>> >>
>> >> Signed-off-by: Ard Biesheuvel 
>> >> ---
>> >>  arch/arm/kvm/mmu.c | 15 +++
>> >>  1 file changed, 7 insertions(+), 8 deletions(-)
>> >>
>> >> diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
>> >> index 6984342da13d..7dace909d5cf 100644
>> >> --- a/arch/arm/kvm/mmu.c
>> >> +++ b/arch/arm/kvm/mmu.c
>> >> @@ -98,6 +98,11 @@ static void kvm_flush_dcache_pud(pud_t pud)
>> >>   __kvm_flush_dcache_pud(pud);
>> >>  }
>> >>
>> >> +static bool kvm_is_device_pfn(unsigned long pfn)
>> >> +{
>> >> + return !pfn_valid(pfn);
>> >> +}
>> >> +
>> >>  /**
>> >>   * stage2_dissolve_pmd() - clear and flush huge PMD entry
>> >>   * @kvm: pointer to kvm structure.
>> >> @@ -213,7 +218,7 @@ static void unmap_ptes(struct kvm *kvm, pmd_t *pmd,
>> >>   kvm_tlb_flush_vmid_ipa(kvm, addr);
>> >>
>> >>   /* No need to invalidate the cache for device 
>> >> mappings */
>> >> - if ((pte_val(old_pte) & PAGE_S2_DEVICE) != 
>> >> PAGE_S2_DEVICE)
>> >> + if (!kvm_is_device_pfn(__phys_to_pfn(addr)))
>> >>   kvm_flush_dcache_pte(old_pte);
>> >>
>> >>   put_page(virt_to_page(pte));
>> >> @@ -305,8 +310,7 @@ static void stage2_flush_ptes(struct kvm *kvm, pmd_t 
>> >> *pmd,
>> >>
>> >>   pte = pte_offset_kernel(pmd, addr);
>> >>   do {
>> >> - if (!pte_none(*pte) &&
>> >> - (pte_val(*pte) & PAGE_S2_DEVICE) != PAGE_S2_DEVICE)
>> >> + if (!pte_none(*pte) && 
>> >> !kvm_is_device_pfn(__phys_to_pfn(addr)))
>> >>   kvm_flush_dcache_pte(*pte);
>> >>   } while (pte++, addr += PAGE_SIZE, addr != end);
>> >>  }
>> >> @@ -1037,11 +1041,6 @@ static bool kvm_is_write_fault(struct kvm_vcpu 
>> >> *vcpu)
>> >>   return kvm_vcpu_dabt_iswrite(vcpu);
>> >>  }
>> >>
>> >> -static bool kvm_is_device_pfn(unsigned long pfn)
>> >> -{
>> >> - return !pfn_valid(pfn);
>> >> -}
>> >> -
>> >>  /**
>> >>   * stage2_wp_ptes - write protect PMD range
>> >>   * @pmd: pointer to pmd entry
>> >> --
>> >> 1.9.1
>> >>
>> >
>> > So PAGE_HYP_DEVICE is used only to map the vgic-v2 regions and
>> > PAGE_S2_DEVICE is used to map the vgic VCPU interface and for all memory
>> > regions where the vma has (vm_flags & VM_PFNMAP).
>> >
>> > Will these, and only these, cases be covered by the pfn_valid check?
>> >
>>
>> The pfn_valid() check will ensure that cache maintenance is only
>> performed on regions that are known to the host as memory, are managed
>> by the host (i.e., there is a struct page associated with them) and
>> will be accessed by the host via cacheable mappings (they are covered
>> by the linear mapping, or [on ARM] will be kmap'ed cacheable if they
>> are 

Ask for ACK (was Re: [Qemu-devel] [PATCH v7 25/35] nvdimm acpi: init the resource used by NVDIMM ACPI)

2015-11-10 Thread Xiao Guangrong



On 11/09/2015 07:13 PM, Igor Mammedov wrote:

On Fri, 6 Nov 2015 16:31:43 +0800
Xiao Guangrong  wrote:




On 11/05/2015 10:49 PM, Igor Mammedov wrote:

On Thu, 5 Nov 2015 21:33:39 +0800
Xiao Guangrong  wrote:




On 11/05/2015 09:03 PM, Igor Mammedov wrote:

On Thu, 5 Nov 2015 18:15:31 +0800
Xiao Guangrong  wrote:




On 11/05/2015 05:58 PM, Igor Mammedov wrote:

On Mon,  2 Nov 2015 17:13:27 +0800
Xiao Guangrong  wrote:


A page staring from 0xFF0 and IO port 0x0a18 - 0xa1b in guest are

   ^^ missing one 0???


reserved for NVDIMM ACPI emulation, refer to docs/specs/acpi_nvdimm.txt
for detailed design

A parameter, 'nvdimm-support', is introduced for PIIX4_PM and ICH9-LPC
that controls if nvdimm support is enabled, it is true on default and
it is false on 2.4 and its earlier version to keep compatibility

Signed-off-by: Xiao Guangrong 

[...]


@@ -33,6 +33,15 @@
  */
 #define MIN_NAMESPACE_LABEL_SIZE  (128UL << 10)

+/*
+ * A page staring from 0xFF0 and IO port 0x0a18 - 0xa1b in guest are

 ^^^ missing 0 or value in define below has 
an extra 0


+ * reserved for NVDIMM ACPI emulation, refer to docs/specs/acpi_nvdimm.txt
+ * for detailed design.
+ */
+#define NVDIMM_ACPI_MEM_BASE  0xFF00ULL

it still maps RAM at arbitrary place,
that's the reason why VMGenID patches hasn't been merged for
more than several months.
I'm not against of using (more exactly I'm for it) direct mapping
but we should reach consensus when and how to use it first.

I'd wouldn't use addresses below 4G as it may be used firmware or
legacy hardware and I won't bet that 0xFF00ULL won't conflict
with anything.
An alternative place to allocate reserve from could be high memory.
For pc we have "reserved-memory-end" which currently makes sure
that hotpluggable memory range isn't used by firmware.

How about making API that allows to map additional memory
ranges before reserved-memory-end and pushes it up as mappings are
added.

[...]



Really a good study case to me, i tried your patch and moved the 64 bit
staffs to the private method, it worked. :)

Igor, is this the API you want?


Lets get ack from Michael on the idea of RAM mapping before
"reserved-memory-end" first.
If he rejects it then there isn't any other way except of switching
to MMIO instead.


Michael, what's your idea?

BTW, this is the reason why we prefer to reserve memory space just in case
if you missed the thread:
  http://marc.info/?l=qemu-devel=144530844718146=2
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html