Re: [RFC 0/2] VFIO SRIOV support

2015-12-22 Thread Alex Williamson
On Tue, 2015-12-22 at 15:42 +0200, Ilya Lesokhin wrote:
> Today the QEMU hypervisor allows assigning a physical device to a VM,
> facilitating driver development. However, it does not support enabling
> SR-IOV by the VM kernel driver. Our goal is to implement such support,
> allowing developers working on SR-IOV physical function drivers to work
> inside VMs as well.
> 
> This patch series implements the kernel side of our solution. It extends
> the VFIO driver to support the PCIe SR-IOV extended capability with the
> following features:
> 1. The ability to probe SR-IOV BAR sizes.
> 2. The ability to enable and disable SR-IOV.
> 
> This patch series is going to be used by QEMU to expose SR-IOV
> capabilities to the VM. We already have an early prototype based on
> Knut Omang's patches for SR-IOV [1].
> 
> Open issues:
> 1. Binding the new VFs to the VFIO driver.
> Once the VM enables SR-IOV it expects the new VFs to appear inside the
> VM. To this end we need to bind the new VFs to the VFIO driver and have
> QEMU grab them. We currently achieve this using:
> echo $vendor $device > /sys/bus/pci/drivers/vfio-pci/new_id
> but we are not happy about this solution, as a system might have another
> device with the same ID that is unrelated to our VM.
> Other solutions we have considered are:
>  a. Having user space unbind and then bind the VFs to VFIO,
>  typically resulting in an unnecessary probing of the device.
>  b. Adding a driver argument to pci_enable_sriov(...) and having
>  vfio call pci_enable_sriov() with the vfio driver as the argument.
>  This solution avoids the unnecessary probing but is more intrusive.

You could use driver_override for this, but the open issue you haven't
listed is the ownership problem: VFs will be in separate iommu groups
and therefore create separate vfio groups.  How do those get associated
with the user so that we don't have one user controlling the VFs for
another user, or worse, for the host kernel?  Whatever solution you come
up with needs to protect the host kernel, first and foremost.  It's not
sufficient to rely on userspace to grab the VFs and sequester them for
use only by that user; the host kernel needs to provide that security
automatically.  Thanks,

Alex
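
(For illustration, a minimal sketch of the driver_override mechanism
mentioned above; the 0000:01:10.0 VF address is a placeholder and the VF is
assumed to be newly created and not yet bound to any driver:)

    # bind one specific VF to vfio-pci without a global new_id match
    echo vfio-pci > /sys/bus/pci/devices/0000:01:10.0/driver_override
    echo 0000:01:10.0 > /sys/bus/pci/drivers_probe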

> 2. How to tell if it is safe to disable SR-IOV?
> In the current implementation, userspace can enable SR-IOV, grab one of
> the VFs and then call disable SR-IOV without releasing the device. This
> will result in a deadlock where the user process is stuck inside disable
> SR-IOV waiting for itself to release the device. Killing the process
> leaves it in a zombie state.
> We also get a strange warning saying:
> [  181.668492] WARNING: CPU: 22 PID: 3684 at kernel/sched/core.c:7497
> __might_sleep+0x77/0x80()
> [  181.668502] do not call blocking ops when !TASK_RUNNING; state=1
> set at [] prepare_to_wait_event+0x63/0xf0
> 
> 3. How to expose the Supported Page Sizes and System Page Size registers
> in the SR-IOV capability?
> Presently the hypervisor initializes Supported Page Sizes once and
> assumes it doesn't change, so we cannot allow user space to change this
> register at will. The first solution that comes to mind is to expose a
> device that only supports the page size selected by the hypervisor.
> Unfortunately, per SR-IOV spec section 3.3.12, PFs are required to
> support 4-KB, 8-KB, 64-KB, 256-KB, 1-MB, and 4-MB page sizes. We
> currently map both registers as virtualized and read-only and leave user
> space to worry about this problem.
> 
> 4. Other SR-IOV capabilities.
> Do we want to hide capabilities we do not support in the SR-IOV
> Capabilities register, or leave it to the userspace application?
> 
> [1] https://github.com/knuto/qemu/tree/sriov_patches_v6
> 
> Ilya Lesokhin (2):
>   PCI: Expose iov_set_numvfs and iov_resource_size for modules.
>   VFIO: Add support for SRIOV extended capability
> 
>  drivers/pci/iov.c  |   4 +-
>  drivers/vfio/pci/vfio_pci_config.c | 169
> +
>  include/linux/pci.h|   4 +
>  3 files changed, 159 insertions(+), 18 deletions(-)
> 



Re: RE: [Qemu-devel] [PATCH] SeaBios: Fix reset procedure reentrancy problem on qemu-kvm platform

2015-12-22 Thread Kevin O'Connor
On Tue, Dec 22, 2015 at 03:15:26AM +, Xulei (Stone) wrote:
> Hi, Kevin,
> Can you tell me how to reset/reboot this VM if it goes into handle_hwpic1()
> during its boot procedure? I mean, usually SeaBIOS would not go into the
> handle_hwpic routine. But in my test case, SeaBIOS calls handle_hwpic when
> KVM injects a #UD exception (not an IRQ), and SeaBIOS will loop handling
> this if KVM persistently injects the exception.
> 
> Now, I just wish to reset/reboot this VM if it falls into handle_hwpic. I
> tried the following patch and it does not seem to work. What can I do to
> force a reset/reboot?

Call the reset() function.

-Kevin


Re: [PATCH 1/2] Add a rudimentary manpage

2015-12-22 Thread Will Deacon
Hi Andre, of Phoronix fame,

On Tue, Dec 22, 2015 at 02:00:46PM +, Andre Przywara wrote:
> The kvmtool documentation is somewhat lacking; it is also not easily
> accessible while it lives only in the source tree.
> Add a good ol' manpage to document at least the basic commands and
> their options.
> This level of documentation matches what is already there in the
> Documentation directory and should be subject to extension.
> 
> Signed-off-by: Andre Przywara 
> ---
>  Documentation/kvmtool.1 | 222 
> 
>  1 file changed, 222 insertions(+)
>  create mode 100644 Documentation/kvmtool.1
> 
> diff --git a/Documentation/kvmtool.1 b/Documentation/kvmtool.1
> new file mode 100644
> index 000..aecb2dc
> --- /dev/null
> +++ b/Documentation/kvmtool.1
> @@ -0,0 +1,222 @@
> +.\" Manpage for kvmtool
> +.\" Copyright (C) 2015 by Andre Przywara 
> +.TH kvmtool 1 "11 Nov 2015" "0.1" "kvmtool man page"
> +.SH NAME
> +kvmtool \- running KVM guests
> +.SH SYNOPSIS
> +lkvm COMMAND [ARGS]
> +.SH DESCRIPTION
> +kvmtool is a userland tool for creating and controlling KVM guests.
> +.SH "KVMTOOL COMMANDS"
> +.sp
> +.PP
> +.B run -k  ...

You seem to be inconsistent with your synopses for each command. That is,
here you just have -k  ... , but later you have things like

> +.B debug --all|--name  [--dump] [--nmi ] [--sysrq ]

which describes all of the possible options to debug. I think I prefer
this latter way, so could you make all of the commands look like that,
please?

Will


Re: [patch] VFIO: platform: reset: fix a warning message condition

2015-12-22 Thread Alex Williamson
On Thu, 2015-12-17 at 15:27 +0300, Dan Carpenter wrote:
> This loop ends with count set to -1 and not zero, so the warning
> message isn't printed when it should be.  I've fixed this by changing
> the post-op to a pre-op.
> 
> Fixes: 0990822c9866 ('VFIO: platform: reset: AMD xgbe reset module')
> Signed-off-by: Dan Carpenter 

Applied to next for v4.5 with Eric's Reviewed-by.  Thanks!

Alex

> diff --git a/drivers/vfio/platform/reset/vfio_platform_amdxgbe.c b/drivers/vfio/platform/reset/vfio_platform_amdxgbe.c
> index da5356f..d4030d0 100644
> --- a/drivers/vfio/platform/reset/vfio_platform_amdxgbe.c
> +++ b/drivers/vfio/platform/reset/vfio_platform_amdxgbe.c
> @@ -110,7 +110,7 @@ int vfio_platform_amdxgbe_reset(struct vfio_platform_device *vdev)
>   usleep_range(10, 15);
>  
>   count = 2000;
> - while (count-- && (ioread32(xgmac_regs->ioaddr + DMA_MR) & 1))
> + while (--count && (ioread32(xgmac_regs->ioaddr + DMA_MR) & 1))
>   usleep_range(500, 600);
>  
>   if (!count)
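
(As a standalone illustration of the off-by-one fixed above, not driver code:
a small C program showing why the post-decrement form leaves count at -1 on
timeout, so the later "if (!count)" warning never fires:)

    #include <stdio.h>

    int main(void)
    {
        int count;

        /* The device never becomes ready: the busy condition stays true. */
        count = 3;
        while (count-- && 1 /* still busy */)
            ;                       /* usleep_range() in the real code */
        printf("post-decrement: count = %d\n", count);  /* -1, !count is false */

        count = 3;
        while (--count && 1 /* still busy */)
            ;
        printf("pre-decrement:  count = %d\n", count);  /* 0, !count fires */

        return 0;
    }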



Re: [kvm-unit-tests PATCH 3/3] add timeout support

2015-12-22 Thread Andrew Jones
On Tue, Dec 22, 2015 at 07:02:21PM +0100, Radim Krčmář wrote:
> 2015-12-21 13:45-0600, Andrew Jones:
> > On Mon, Dec 21, 2015 at 06:04:20PM +0100, Radim Krčmář wrote:
> >> 2015-12-17 14:10-0600, Andrew Jones:
> >> > diff --git a/run_tests.sh b/run_tests.sh
> >> > @@ -21,6 +21,7 @@ function run()
> >> > +local timeout="${9:-$TIMEOUT}"
> >> > diff --git a/scripts/mkstandalone.sh b/scripts/mkstandalone.sh
> >> > @@ -97,8 +98,12 @@ if [ "\$QEMU" ]; then
> >> > +if [ "$timeout" ]; then
> >> > +timeout_cmd='timeout --foreground $timeout'
> >> 
> >> Both would be nicer if they took the TIMEOUT variable as an override.
> > 
> > Everything already takes TIMEOUT as an override, i.e.
> > 
> > TIMEOUT=3 ./run_tests.sh
> > 
> > and
> > 
> > TIMEOUT=3 arm/run arm/test.flat
> > 
> > will both already set a timeout for any test that didn't have a timeout
> > set in unittests.cfg, or wasn't run with run()/unittests.cfg.
> 
> Tests made with mkstandalone.sh ignore the TIMEOUT variable ...
> 
> >   Or, did
> > you mean that you'd prefer TIMEOUT to override the timeout in
> > unittests.cfg?
> 
> ... and yes, I think that we could have a
> - global timeout for all tests.  Something far longer than any tests
>   should take (2 minutes?).  To automatically handle random hangs.
> 
> - per-test timeout in unittests.cfg.  When the test is known to timeout
>   often and the usual time to fail is much shorter than the global
>   default.  (Shouldn't be used much.)
> 
> - TIMEOUT variable.  It has to override the global timeout and I think
>   that if we are ever going to use it, it's because we want something
>   weird.  Like using `TIMEOUT=0 ./run_tests.sh` to disable all
>   timeouts, prolonging/shortening timeouts because of a special
>   configuration, ...
>   Because we should improve our defaults otherwise.

OK, I'll do something allowing us to easily enable a long default
timeout.

> 
>   (I'd probably allow something as evil as `eval`ing the TIMEOUT, for
>unlikely stuff like TIMEOUT='$(( ${timeout:-10} / 2 ))')

I'd prefer to avoid the evil^Weval stuff... And the timeout duration can
already be a floating point.

> 
> >That does make some sense, in the case the one in the
> > config is longer than desired, however it also makes sense the way I
> > have it now when the one in the config is shorter than TIMEOUT (the
> > fallback default). I think I like it this way better.
> 
> Ok, the difference was negligible to begin with.
> 
> >> We already don't do that for accel and the patch seems ok in other
> >> regards,
> > 
> > Hmm, for accel I see a need for a patch allowing us to do
> > 
> > ACCEL=?? ./run_tests.sh
> 
> Btw. why do we have ACCEL when the project is *kvm*_unit_tests?

arm tests are sometimes tcg only. Hey, we'll take what we get for
arm, as we're sadly missing everything...

> 
> > as I already have for TIMEOUT. Also, for both I should add a
> > mkstandalone patch allowing
> > 
> > TIMEOUT=? ACCEL=? make standalone
> 
> I'd also handle TIMEOUT/ACCEL in resulting standalone tests.

OK

Thanks,
drew



Re: [PATCH kernel] vfio: Add explicit alignments in vfio_iommu_spapr_tce_create

2015-12-22 Thread Alex Williamson
On Fri, 2015-12-18 at 12:35 +1100, Alexey Kardashevskiy wrote:
> The vfio_iommu_spapr_tce_create struct has 4 x 32-bit and 2 x 64-bit
> fields, which should have resulted in sizeof(vfio_iommu_spapr_tce_create)
> being equal to 32 bytes. However, due to gcc's default alignment, the
> actual size of this struct is 40 bytes.
> 
> This fills the gaps with __resv1/2 fields.
> 
> This should not cause any change in behavior.
> 
> Signed-off-by: Alexey Kardashevskiy 
> ---

Applied to next for v4.5 with David's ack.  Thanks!

Alex

>  include/uapi/linux/vfio.h | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
> index 9fd7b5d..d117233 100644
> --- a/include/uapi/linux/vfio.h
> +++ b/include/uapi/linux/vfio.h
> @@ -568,8 +568,10 @@ struct vfio_iommu_spapr_tce_create {
>   __u32 flags;
>   /* in */
>   __u32 page_shift;
> + __u32 __resv1;
>   __u64 window_size;
>   __u32 levels;
> + __u32 __resv2;
>   /* out */
>   __u64 start_addr;
>  };
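
(A small standalone sketch of the padding issue described above; field names
mirror vfio_iommu_spapr_tce_create with uint32_t/uint64_t standing in for
__u32/__u64, and the leading argsz member is assumed from the usual VFIO
ioctl layout rather than shown in the hunk:)

    #include <stdio.h>
    #include <stdint.h>

    /*
     * The members sum to 32 bytes, but on a 64-bit target the compiler
     * inserts 4 bytes of padding before each 64-bit field that follows an
     * odd number of 32-bit ones, so sizeof is 40.  The patch simply gives
     * that padding a name (__resv1/__resv2) so the layout is explicit.
     */
    struct tce_create {
            uint32_t argsz;
            uint32_t flags;
            uint32_t page_shift;
            /* 4 bytes of implicit padding here (now __resv1) */
            uint64_t window_size;
            uint32_t levels;
            /* 4 bytes of implicit padding here (now __resv2) */
            uint64_t start_addr;
    };

    int main(void)
    {
            printf("sum of members = %zu\n",
                   4 * sizeof(uint32_t) + 2 * sizeof(uint64_t));         /* 32 */
            printf("sizeof(struct) = %zu\n", sizeof(struct tce_create)); /* 40 */
            return 0;
    }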



Re: [PATCH v2 1/2] KVM: x86: Use vector-hashing to deliver lowest-priority interrupts

2015-12-22 Thread Yang Zhang

On 2015/12/23 3:52, rkrc...@redhat.com wrote:

2015-12-22 07:19+, Wu, Feng:

From: Yang Zhang [mailto:yang.zhang...@gmail.com]
On 2015/12/22 14:59, Wu, Feng wrote:

From: Yang Zhang [mailto:yang.zhang...@gmail.com]

On 2015/12/16 9:37, Feng Wu wrote:

+   for_each_set_bit(i, , 16) {
+   if (!dst[i]

&& !kvm_lapic_enabled(dst[i]->vcpu)) {

It should be or(||) not and (&&).


Oh, you are right! My negligence! Thanks for pointing this out, Yang!


btw, i think the kvm_lapic_enabled check is wrong here? Why need it here?


If the lapic is not enabled, I think we cannot recognize it as a candidate, can

we?

Maybe Radim can confirm this, Radim, what is your option?


SDM 10.6.2.2 Logical Destination Mode:
   For both configurations of logical destination mode, when combined
   with lowest priority delivery mode, software is responsible for
   ensuring that all of the local APICs included in or addressed by the
   IPI or I/O subsystem interrupt are present and enabled to receive the
   interrupt.

The case is undefined if some targeted LAPICs weren't hardware enabled
as no interrupts can be delivered to hardware disabled LAPIC, so we can
check for hardware enabled.

It's not obvious if "enabled to receive the interrupt" means hardware or
software enabled, but lowest priority cannot deliver NMI/INIT/..., so
checking for software enabled doesn't restrict any valid uses either.


Agree. My understanding is that it is software's responsibility to 
ensure this case does not happen. But as a hypervisor, we should not 
do that check on software's behalf. What we can do is just follow the SDM.




so ... KVM only musn't blow up when encountering this situation :)

The current code seems correct, but redundant.  Just for reference, KVM
now does:
- check for software enabled LAPIC since patch aefd18f01ee8 ("KVM: x86:
   In DM_LOWEST, only deliver interrupts to vcpus with enabled LAPIC's")
- check only for hardware enabled LAPIC in the fast path, since
   1e08ec4a130e ("KVM: optimize apic interrupt delivery"))

(v1 was arguable better, I pointed the need for enabled LAPIC in v1 only
  from looking at one KVM function, sorry.)


Lapic can be disable by hw or sw. Here we only need to check the hw is
enough which is already covered while injecting the interrupt into
guest. I remember we(Glab, Macelo and me) have discussed it several ago,
but i cannot find the mail thread.




But if the lapic is disabled by software, we cannot still inject interrupts to
it, can we?


Yes, We cannot inject the normal interrupt. But this already covered by
current logic and add a check here seems meaningless. Conversely, it may
do bad thing..



Let's wait for Radim/Paolo's opinions about this.


I'd pick whatever results in less code: this time it seems like checking
for hardware enabled LAPIC in both paths (implicitly in the fast path).
Maybe it can be done better, I haven't given it much thought.

We should revert aefd18f01ee8 at the same time, so our PI/non-PI slow
paths won't diverge -- I hope it wasn't fixing a bug :)

I'll review the series tomorrow, thanks for your patience.


--
best regards
yang


RE: [PATCH v2 1/2] KVM: x86: Use vector-hashing to deliver lowest-priority interrupts

2015-12-22 Thread Wu, Feng


> -Original Message-
> From: rkrc...@redhat.com [mailto:rkrc...@redhat.com]
> Sent: Wednesday, December 23, 2015 3:53 AM
> To: Wu, Feng 
> Cc: Yang Zhang ; pbonz...@redhat.com;
> kvm@vger.kernel.org; linux-ker...@vger.kernel.org; Jiang Liu
> (jiang@linux.intel.com) 
> Subject: Re: [PATCH v2 1/2] KVM: x86: Use vector-hashing to deliver lowest-
> priority interrupts
> 
> 2015-12-22 07:19+, Wu, Feng:
> >> From: Yang Zhang [mailto:yang.zhang...@gmail.com]
> >> On 2015/12/22 14:59, Wu, Feng wrote:
> >> >> From: Yang Zhang [mailto:yang.zhang...@gmail.com]
> >> >> On 2015/12/16 9:37, Feng Wu wrote:
> >> >>> +  for_each_set_bit(i, , 16) {
> >> >>> +  if (!dst[i]
> >> >> && !kvm_lapic_enabled(dst[i]->vcpu)) {
> >> >>
> >> >> It should be or(||) not and (&&).
> >> >
> >> > Oh, you are right! My negligence! Thanks for pointing this out, Yang!
> >> 
> >>  btw, i think the kvm_lapic_enabled check is wrong here? Why need it
> here?
> >> >>>
> >> >>> If the lapic is not enabled, I think we cannot recognize it as a 
> >> >>> candidate,
> can
> >> >> we?
> >> >>> Maybe Radim can confirm this, Radim, what is your option?
> 
> SDM 10.6.2.2 Logical Destination Mode:
>   For both configurations of logical destination mode, when combined
>   with lowest priority delivery mode, software is responsible for
>   ensuring that all of the local APICs included in or addressed by the
>   IPI or I/O subsystem interrupt are present and enabled to receive the
>   interrupt.
> 

Radim, thanks a lot for your feedback!

> The case is undefined if some targeted LAPICs weren't hardware enabled
> as no interrupts can be delivered to hardware disabled LAPIC, so we can
> check for hardware enabled.
> 
> It's not obvious if "enabled to receive the interrupt" means hardware or
> software enabled, but lowest priority cannot deliver NMI/INIT/..., so
> checking for software enabled doesn't restrict any valid uses either.
> 
> so ... KVM only musn't blow up when encountering this situation :)
> 
> The current code seems correct, but redundant.  Just for reference, KVM
> now does:
> - check for software enabled LAPIC since patch aefd18f01ee8 ("KVM: x86:
>   In DM_LOWEST, only deliver interrupts to vcpus with enabled LAPIC's")
> - check only for hardware enabled LAPIC in the fast path, since
>   1e08ec4a130e ("KVM: optimize apic interrupt delivery"))

Software enabled LAPIC is also checked in patch 1e08ec4a130e
("KVM: optimize apic interrupt delivery"); however, it was removed
in patch 3b5a5ffa928a3f875b0d5dd284eeb7c322e1688a. Now I am
a little confused about the policy: when and where should we do
the software/hardware enabled check?

> 
> (v1 was arguable better, I pointed the need for enabled LAPIC in v1 only
>  from looking at one KVM function, sorry.)
> 
> >> >> Lapic can be disable by hw or sw. Here we only need to check the hw is
> >> >> enough which is already covered while injecting the interrupt into
> >> >> guest. I remember we(Glab, Macelo and me) have discussed it several ago,
> >> >> but i cannot find the mail thread.
> >>
> >> >
> >> > But if the lapic is disabled by software, we cannot still inject 
> >> > interrupts to
> >> > it, can we?
> >>
> >> Yes, We cannot inject the normal interrupt. But this already covered by
> >> current logic and add a check here seems meaningless. Conversely, it may
> >> do bad thing..
> >>
> >
> > Let's wait for Radim/Paolo's opinions about this.
> 
> I'd pick whatever results in less code: this time it seems like checking
> for hardware enabled LAPIC in both paths (implicitly in the fast path).
> Maybe it can be done better, I haven't given it much thought.
> 
> We should revert aefd18f01ee8 at the same time, so our PI/non-PI slow
> paths won't diverge -- I hope it wasn't fixing a bug :)

From the change log, it seems to me this patch was fixing a bug.

Thanks,
Feng


Re: [PATCH v2 1/2] KVM: x86: Use vector-hashing to deliver lowest-priority interrupts

2015-12-22 Thread rkrc...@redhat.com
2015-12-22 07:19+, Wu, Feng:
>> From: Yang Zhang [mailto:yang.zhang...@gmail.com]
>> On 2015/12/22 14:59, Wu, Feng wrote:
>> >> From: Yang Zhang [mailto:yang.zhang...@gmail.com]
>> >> On 2015/12/16 9:37, Feng Wu wrote:
>> >>> +for_each_set_bit(i, , 16) {
>> >>> +if (!dst[i]
>> >> && !kvm_lapic_enabled(dst[i]->vcpu)) {
>> >>
>> >> It should be or(||) not and (&&).
>> >
>> > Oh, you are right! My negligence! Thanks for pointing this out, Yang!
>> 
>>  btw, i think the kvm_lapic_enabled check is wrong here? Why need it 
>>  here?
>> >>>
>> >>> If the lapic is not enabled, I think we cannot recognize it as a 
>> >>> candidate, can
>> >> we?
>> >>> Maybe Radim can confirm this, Radim, what is your option?

SDM 10.6.2.2 Logical Destination Mode:
  For both configurations of logical destination mode, when combined
  with lowest priority delivery mode, software is responsible for
  ensuring that all of the local APICs included in or addressed by the
  IPI or I/O subsystem interrupt are present and enabled to receive the
  interrupt.

The case is undefined if some targeted LAPICs weren't hardware enabled
as no interrupts can be delivered to hardware disabled LAPIC, so we can
check for hardware enabled.

It's not obvious if "enabled to receive the interrupt" means hardware or
software enabled, but lowest priority cannot deliver NMI/INIT/..., so
checking for software enabled doesn't restrict any valid uses either.

so ... KVM only mustn't blow up when encountering this situation :)

The current code seems correct, but redundant.  Just for reference, KVM
now does:
- check for software enabled LAPIC since patch aefd18f01ee8 ("KVM: x86:
  In DM_LOWEST, only deliver interrupts to vcpus with enabled LAPIC's")
- check only for hardware enabled LAPIC in the fast path, since
  1e08ec4a130e ("KVM: optimize apic interrupt delivery"))

(v1 was arguably better; I pointed out the need for an enabled LAPIC in v1
 only from looking at one KVM function, sorry.)

>> >> Lapic can be disable by hw or sw. Here we only need to check the hw is
>> >> enough which is already covered while injecting the interrupt into
>> >> guest. I remember we(Glab, Macelo and me) have discussed it several ago,
>> >> but i cannot find the mail thread.
>>
>> >
>> > But if the lapic is disabled by software, we cannot still inject 
>> > interrupts to
>> > it, can we?
>> 
>> Yes, We cannot inject the normal interrupt. But this already covered by
>> current logic and add a check here seems meaningless. Conversely, it may
>> do bad thing..
>> 
> 
> Let's wait for Radim/Paolo's opinions about this.

I'd pick whatever results in less code: this time it seems like checking
for hardware enabled LAPIC in both paths (implicitly in the fast path).
Maybe it can be done better, I haven't given it much thought.

We should revert aefd18f01ee8 at the same time, so our PI/non-PI slow
paths won't diverge -- I hope it wasn't fixing a bug :)

I'll review the series tomorrow, thanks for your patience.


Re: [kvm-unit-tests PATCH 3/3] add timeout support

2015-12-22 Thread Radim Krčmář
2015-12-21 13:45-0600, Andrew Jones:
> On Mon, Dec 21, 2015 at 06:04:20PM +0100, Radim Krčmář wrote:
>> 2015-12-17 14:10-0600, Andrew Jones:
>> > diff --git a/run_tests.sh b/run_tests.sh
>> > @@ -21,6 +21,7 @@ function run()
>> > +local timeout="${9:-$TIMEOUT}"
>> > diff --git a/scripts/mkstandalone.sh b/scripts/mkstandalone.sh
>> > @@ -97,8 +98,12 @@ if [ "\$QEMU" ]; then
>> > +if [ "$timeout" ]; then
>> > +  timeout_cmd='timeout --foreground $timeout'
>> 
>> Both would be nicer if they took the TIMEOUT variable as an override.
> 
> Everything already takes TIMEOUT as an override, i.e.
> 
> TIMEOUT=3 ./run_tests.sh
> 
> and
> 
> TIMEOUT=3 arm/run arm/test.flat
> 
> will both already set a timeout for any test that didn't have a timeout
> set in unittests.cfg, or wasn't run with run()/unittests.cfg.

Tests made with mkstandalone.sh ignore the TIMEOUT variable ...

>   Or, did
> you mean that you'd prefer TIMEOUT to override the timeout in
> unittests.cfg?

... and yes, I think that we could have a
- global timeout for all tests.  Something far longer than any tests
  should take (2 minutes?).  To automatically handle random hangs.

- per-test timeout in unittests.cfg.  When the test is known to timeout
  often and the usual time to fail is much shorter than the global
  default.  (Shouldn't be used much.)

- TIMEOUT variable.  It has to override the global timeout and I think
  that if we are ever going to use it, it's because we want something
  weird.  Like using `TIMEOUT=0 ./run_tests.sh` to disable all
  timeouts, prolonging/shortening timeouts because of a special
  configuration, ...
  Because we should improve our defaults otherwise.

  (I'd probably allow something as evil as `eval`ing the TIMEOUT, for
   unlikely stuff like TIMEOUT='$(( ${timeout:-10} / 2 ))')

>That does make some sense, in the case the one in the
> config is longer than desired, however it also makes sense the way I
> have it now when the one in the config is shorter than TIMEOUT (the
> fallback default). I think I like it this way better.

Ok, the difference was negligible to begin with.

>> We already don't do that for accel and the patch seems ok in other
>> regards,
> 
> Hmm, for accel I see a need for a patch allowing us to do
> 
> ACCEL=?? ./run_tests.sh

Btw. why do we have ACCEL when the project is *kvm*_unit_tests?

> as I already have for TIMEOUT. Also, for both I should add a
> mkstandalone patch allowing
> 
> TIMEOUT=? ACCEL=? make standalone

I'd also handle TIMEOUT/ACCEL in resulting standalone tests.
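
(A rough sketch of how the generated standalone script could honour a
run-time TIMEOUT; TIMEOUT is the environment variable, $timeout the value
baked in from unittests.cfg, and $cmdline the test invocation. The names are
illustrative, not the actual mkstandalone.sh change:)

    # let an exported TIMEOUT override the baked-in per-test timeout
    timeout="${TIMEOUT:-$timeout}"
    if [ "$timeout" ]; then
        timeout_cmd="timeout --foreground $timeout"
    fi
    $timeout_cmd $cmdline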


Re: [PATCH v5 3/3] KVM/arm/arm64: enable enhanced armv8 fp/simd lazy switch

2015-12-22 Thread Mario Smarduch


On 12/22/2015 12:06 AM, Christoffer Dall wrote:
> On Mon, Dec 21, 2015 at 11:34:25AM -0800, Mario Smarduch wrote:
>>
>>
>> On 12/18/2015 11:45 PM, Christoffer Dall wrote:
>>> On Fri, Dec 18, 2015 at 05:17:00PM -0800, Mario Smarduch wrote:
 On 12/18/2015 5:54 AM, Christoffer Dall wrote:
> On Sun, Dec 06, 2015 at 05:07:14PM -0800, Mario Smarduch wrote:
[...]

>> +  * we set FPEXC.EN to prevent traps to EL1, when setting the TFP 
>> bit.
>> +  */
>> +ENTRY(__kvm_vcpu_enable_fpexc32)
>> +mov x3, #(1 << 30)
>> +msr fpexc32_el2, x3
>> +isb
>
> this is only called via a hypercall so do you really need the ISB?

 Same comment as in 2nd patch for the isb.

>>>
>>> Unless you can argue that something needs to take effect before
>>> something else, where there's no other implicit barrier, you don't need
>>> the ISB.
>>
Makes sense, an exception level change should be a barrier. It was not there
before; I put it in due to a lack of info on the meaning of 'implicit'. The
manual has more info on implicit barriers for operations like DMB.
> 
> if the effect from the register write just has to be visible after
> taking an exception, then you don't need the ISB.

Good definition, should be in the manual :)

Thanks.
> 
>>
>> Speaking of ISBs, it doesn't appear like this one is needed; it's between a
>> couple of register reads in the 'save_time_state' macro.
>>
>> mrc p15, 0, r2, c14, c3, 1  @ CNTV_CTL
>> str r2, [vcpu, #VCPU_TIMER_CNTV_CTL]
>>
>> isb
>>
>> mrrc p15, 3, rr_lo_hi(r2, r3), c14   @ CNTV_CVAL
>>
> 
> I think there was a reason for that one, so let's not worry about that
> for now.
> 
> -Christoffer
> 


RE: [Qemu-devel] [PATCH] SeaBios: Fix reset procedure reentrancy problem on qemu-kvm platform

2015-12-22 Thread Gonglei (Arei)
> -Original Message-
> From: Kevin O'Connor [mailto:ke...@koconnor.net]
> Sent: Tuesday, December 22, 2015 11:51 PM
> To: Gonglei (Arei)
> Cc: Xulei (Stone); Paolo Bonzini; qemu-devel; seab...@seabios.org;
> Huangweidong (C); kvm@vger.kernel.org; Radim Krcmar
> Subject: Re: [Qemu-devel] [PATCH] SeaBios: Fix reset procedure reentrancy
> problem on qemu-kvm platform
> 
> On Tue, Dec 22, 2015 at 02:14:12AM +, Gonglei (Arei) wrote:
> > > From: Kevin O'Connor [mailto:ke...@koconnor.net]
> > > Sent: Tuesday, December 22, 2015 2:47 AM
> > > To: Gonglei (Arei)
> > > Cc: Xulei (Stone); Paolo Bonzini; qemu-devel; seab...@seabios.org;
> > > Huangweidong (C); kvm@vger.kernel.org; Radim Krcmar
> > > Subject: Re: [Qemu-devel] [PATCH] SeaBios: Fix reset procedure reentrancy
> > > problem on qemu-kvm platform
> > >
> > > On Mon, Dec 21, 2015 at 09:41:32AM +, Gonglei (Arei) wrote:
> > > > When the grub of the OS is booting, the softirq and the C function
> > > > send_disk_op() may use the extra stack of SeaBIOS. If we inject an
> > > > NMI, romlayout.S: irqentry_extrastack is invoked, and the extra stack
> > > > will be used again. The stack of the first call will then be broken,
> > > > so that SeaBIOS gets stuck.
> > > >
> > > > You can easily reproduce the problem:
> > > >
> > > > 1. start the guest
> > > > 2. reset the guest
> > > > 3. inject an NMI when the guest shows the grub screen
> > > > 4. the guest then gets stuck
> > >
> > > Does the SeaBIOS patch below help?
> >
> > Sorry, it doesn't work. What's worse is that we cannot stop SeaBIOS from
> > getting stuck by setting "CONFIG_ENTRY_EXTRASTACK=n" after applying this
> > patch.
> 
> Oops, can you try with the patch below instead?
> 

It works now. Thanks!

But do we need to check for other possible situations
that might cause the *extra stack* to be broken or overridden?


> > > I'm not familiar with how to "inject a
> > > NMI" - can you describe the process in more detail?
> > >
> >
> > 1. Qemu Command line:
> >
> > #: /home/qemu/x86_64-softmmu/qemu-system-x86_64 -enable-kvm -m 4096
> -smp 8 -name suse -vnc 0.0.0.0:10 \
> > -device virtio-scsi-pci,id=scsi0 -drive
> file=/home/suse11_sp3_32_2,if=none,id=drive-scsi0-0-0-0,format=raw,cache=
> none,aio=native \
> > -device scsi-hd,bus=scsi0.0,drive=drive-scsi0-0-0-0,id=scsi0-0-0-0 \
> > -chardev file,id=seabios,path=/home/seabios.log -device
> isa-debugcon,iobase=0x402,chardev=seabios \
> > -monitor stdio -qmp unix:/tmp/qmp,server,nowait
> >
> > 2. Inject a NMI by QMP:
> >
> > #: /home/qemu/scripts/qmp # ./qmp-shell /tmp/qmp
> > Welcome to the QMP low-level shell!
> > Connected to QEMU 2.5.0
> >
> > (QEMU) system_reset
> > {"return": {}}
> > (QEMU) inject-nmi
> > {"return": {}}
> > (QEMU) inject-nmi
> > {"return": {}}
> >
> 
> I tried a few simple tests but was not able to reproduce.
> 
After resetting the guest, inject an NMI as soon as you see the grub
screen.

Kevin, I sent you a picture in private. :)


Regards,
-Gonglei

> -Kevin
> 
> 
> --- a/src/romlayout.S
> +++ b/src/romlayout.S
> @@ -548,7 +548,10 @@ entry_post:
>  ENTRY_INTO32 _cfunc32flat_handle_post   // Normal entry point
> 
>  ORG 0xe2c3
> -IRQ_ENTRY 02
> +.global entry_02
> +entry_02:
> +ENTRY handle_02  // NMI handler does not switch onto extra stack
> +iretw
> 
>  ORG 0xe3fe
>  .global entry_13_official


[PATCH 2/2] arm64: KVM: Do not update PC if the trap handler has updated it

2015-12-22 Thread Marc Zyngier
Assuming we trap a system register, and decide that the access is
illegal, we will inject an exception in the guest. In this
case, we shouldn't increment the PC, or the vcpu will miss the
first instruction of the handler, leading to a mildly confused
guest.

Solve this by snapshotting the PC before the access is performed,
and checking whether it has moved before incrementing it.

Reported-by: Shannon Zhao 
Signed-off-by: Marc Zyngier 
---
 arch/arm64/kvm/sys_regs.c | 73 +++
 1 file changed, 36 insertions(+), 37 deletions(-)

diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index d2650e8..9c87e0c 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -966,6 +966,39 @@ static const struct sys_reg_desc *find_reg(const struct sys_reg_params *params,
return NULL;
 }
 
+/* Perform the sysreg access, returns 0 on success */
+static int access_sys_reg(struct kvm_vcpu *vcpu,
+ struct sys_reg_params *params,
+ const struct sys_reg_desc *r)
+{
+   u64 pc = *vcpu_pc(vcpu);
+
+   if (unlikely(!r))
+   return -1;
+
+   /*
+* Not having an accessor means that we have configured a trap
+* that we don't know how to handle. This certainly qualifies
+* as a gross bug that should be fixed right away.
+*/
+   BUG_ON(!r->access);
+
+   if (likely(r->access(vcpu, params, r))) {
+   /*
+* Skip the instruction if it was emulated without PC
+* having changed. This allows us to detect a fault
+* being injected (incrementing the PC here would
+* cause the vcpu to skip the first instruction of its
+* fault handler).
+*/
+   if (pc == *vcpu_pc(vcpu))
+   kvm_skip_instr(vcpu, kvm_vcpu_trap_il_is32bit(vcpu));
+   return 0;
+   }
+
+   return -1;
+}
+
 int kvm_handle_cp14_load_store(struct kvm_vcpu *vcpu, struct kvm_run *run)
 {
kvm_inject_undefined(vcpu);
@@ -994,26 +1027,7 @@ static int emulate_cp(struct kvm_vcpu *vcpu,
 
r = find_reg(params, table, num);
 
-   if (r) {
-   /*
-* Not having an accessor means that we have
-* configured a trap that we don't know how to
-* handle. This certainly qualifies as a gross bug
-* that should be fixed right away.
-*/
-   BUG_ON(!r->access);
-
-   if (likely(r->access(vcpu, params, r))) {
-   /* Skip instruction, since it was emulated */
-   kvm_skip_instr(vcpu, kvm_vcpu_trap_il_is32bit(vcpu));
-   }
-
-   /* Handled */
-   return 0;
-   }
-
-   /* Not handled */
-   return -1;
+   return access_sys_reg(vcpu, params, r);
 }
 
 static void unhandled_cp_access(struct kvm_vcpu *vcpu,
@@ -1178,27 +1192,12 @@ static int emulate_sys_reg(struct kvm_vcpu *vcpu,
if (!r)
r = find_reg(params, sys_reg_descs, ARRAY_SIZE(sys_reg_descs));
 
-   if (likely(r)) {
-   /*
-* Not having an accessor means that we have
-* configured a trap that we don't know how to
-* handle. This certainly qualifies as a gross bug
-* that should be fixed right away.
-*/
-   BUG_ON(!r->access);
-
-   if (likely(r->access(vcpu, params, r))) {
-   /* Skip instruction, since it was emulated */
-   kvm_skip_instr(vcpu, kvm_vcpu_trap_il_is32bit(vcpu));
-   return 1;
-   }
-   /* If access function fails, it should complain. */
-   } else {
+   if (access_sys_reg(vcpu, params, r)) {
kvm_err("Unsupported guest sys_reg access at: %lx\n",
*vcpu_pc(vcpu));
print_sys_reg_instr(params);
+   kvm_inject_undefined(vcpu);
}
-   kvm_inject_undefined(vcpu);
return 1;
 }
 
-- 
2.1.4



[PATCH 0/2] Fix PC corruption when injecting a fault

2015-12-22 Thread Marc Zyngier
When injecting a fault as the result of a system register trap, we
change the PC to point to the fault handler. This clashes with the
code that increments the PC to skip over the emulated system register
access, leading to a situation where we skip the first instruction of
the fault handler.

The good news is that so far we never do this, so I believe the
current code is safe. But the PMU code is soon going to exercise that
path, and I'd rather plug it sooner rather than later.

Thanks,

M.

Marc Zyngier (2):
  arm: KVM: Do not update PC if the trap handler has updated it
  arm64: KVM: Do not update PC if the trap handler has updated it

 arch/arm/kvm/coproc.c | 14 +++--
 arch/arm64/kvm/sys_regs.c | 73 +++
 2 files changed, 48 insertions(+), 39 deletions(-)

-- 
2.1.4



[PATCH 1/2] arm: KVM: Do not update PC if the trap handler has updated it

2015-12-22 Thread Marc Zyngier
Assuming we trap a coprocessor access, and decide that the access
is illegal, we will inject an exception in the guest. In this
case, we shouldn't increment the PC, or the vcpu will miss the
first instruction of the handler, leading to a mildly confused
guest.

Solve this by snapshotting the PC before the access is performed,
and checking whether it has moved before incrementing it.

Reported-by: Shannon Zhao 
Signed-off-by: Marc Zyngier 
---
 arch/arm/kvm/coproc.c | 14 --
 1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/arch/arm/kvm/coproc.c b/arch/arm/kvm/coproc.c
index f3d88dc..f4ad2f2 100644
--- a/arch/arm/kvm/coproc.c
+++ b/arch/arm/kvm/coproc.c
@@ -447,12 +447,22 @@ static int emulate_cp15(struct kvm_vcpu *vcpu,
r = find_reg(params, cp15_regs, ARRAY_SIZE(cp15_regs));
 
if (likely(r)) {
+   unsigned long pc = *vcpu_pc(vcpu);
+
/* If we don't have an accessor, we should never get here! */
BUG_ON(!r->access);
 
if (likely(r->access(vcpu, params, r))) {
-   /* Skip instruction, since it was emulated */
-   kvm_skip_instr(vcpu, kvm_vcpu_trap_il_is32bit(vcpu));
+   /*
+* Skip the instruction if it was emulated
+* without PC having changed. This allows us
+* to detect a fault being injected
+* (incrementing the PC here would cause the
+* vcpu to skip the first instruction of its
+* fault handler).
+*/
+   if (pc == *vcpu_pc(vcpu))
+           kvm_skip_instr(vcpu, kvm_vcpu_trap_il_is32bit(vcpu));
return 1;
}
/* If access function fails, it should complain. */
-- 
2.1.4



Re: [PATCH 1/2] arm: KVM: Do not update PC if the trap handler has updated it

2015-12-22 Thread Peter Maydell
On 22 December 2015 at 09:55, Marc Zyngier  wrote:
> Assuming we trap a coprocessor access, and decide that the access
> is illegal, we will inject an exception in the guest. In this
> case, we shouldn't increment the PC, or the vcpu will miss the
> first instruction of the handler, leading to a mildly confused
> guest.
>
> Solve this by snapshoting PC before the access is performed,
> and checking if it has moved or not before incrementing it.
>
> Reported-by: Shannon Zhao 
> Signed-off-by: Marc Zyngier 
> ---
>  arch/arm/kvm/coproc.c | 14 --
>  1 file changed, 12 insertions(+), 2 deletions(-)
>
> diff --git a/arch/arm/kvm/coproc.c b/arch/arm/kvm/coproc.c
> index f3d88dc..f4ad2f2 100644
> --- a/arch/arm/kvm/coproc.c
> +++ b/arch/arm/kvm/coproc.c
> @@ -447,12 +447,22 @@ static int emulate_cp15(struct kvm_vcpu *vcpu,
> r = find_reg(params, cp15_regs, ARRAY_SIZE(cp15_regs));
>
> if (likely(r)) {
> +   unsigned long pc = *vcpu_pc(vcpu);
> +
> /* If we don't have an accessor, we should never get here! */
> BUG_ON(!r->access);
>
> if (likely(r->access(vcpu, params, r))) {
> -   /* Skip instruction, since it was emulated */
> -   kvm_skip_instr(vcpu, kvm_vcpu_trap_il_is32bit(vcpu));
> +   /*
> +* Skip the instruction if it was emulated
> +* without PC having changed. This allows us
> +* to detect a fault being injected
> +* (incrementing the PC here would cause the
> +* vcpu to skip the first instruction of its
> +* fault handler).
> +*/
> +   if (pc == *vcpu_pc(vcpu))
> +   kvm_skip_instr(vcpu, 
> kvm_vcpu_trap_il_is32bit(vcpu));

Won't this result in our incorrectly skipping the first insn
in the fault handler if the original offending instruction
was itself the first insn in the fault handler?

thanks
-- PMM


[RFC v4 4/5] VSOCK: Introduce vhost_vsock.ko

2015-12-22 Thread Stefan Hajnoczi
From: Asias He 

VM sockets vhost transport implementation.  This driver runs on the
host.

Signed-off-by: Asias He 
Signed-off-by: Stefan Hajnoczi 
---
v4:
 * Add MAINTAINERS file entry
 * virtqueue used len is now sizeof(pkt->hdr) + pkt->len instead of just
   pkt->len
 * checkpatch.pl cleanups
 * Clarify struct vhost_vsock locking
 * Add comments about optimization that disables virtqueue notify
 * Drop unused vhost_vsock_handle_ctl_kick()
 * Call wake_up() after decrementing total_tx_buf to prevent deadlock
v3:
 * Remove unneeded variable used to store return value
   (Fengguang Wu  and Julia Lawall
   )
v2:
 * Add missing total_tx_buf decrement
 * Support flexible rx/tx descriptor layout
 * Refuse to assign reserved CIDs
 * Refuse guest CID if already in use
 * Only accept correctly addressed packets
vhost: checkpatch.pl cleanups

Signed-off-by: Stefan Hajnoczi 
---
 MAINTAINERS   |   2 +
 drivers/vhost/vsock.c | 607 ++
 drivers/vhost/vsock.h |   4 +
 3 files changed, 613 insertions(+)
 create mode 100644 drivers/vhost/vsock.c
 create mode 100644 drivers/vhost/vsock.h

diff --git a/MAINTAINERS b/MAINTAINERS
index 67d8504..0181dc2 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -11370,6 +11370,8 @@ F:  include/linux/virtio_vsock.h
 F: include/uapi/linux/virtio_vsock.h
 F: net/vmw_vsock/virtio_transport_common.c
 F: net/vmw_vsock/virtio_transport.c
+F: drivers/vhost/vsock.c
+F: drivers/vhost/vsock.h
 
 VIRTUAL SERIO DEVICE DRIVER
 M: Stephen Chandler Paul 
diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
new file mode 100644
index 000..2c5963c
--- /dev/null
+++ b/drivers/vhost/vsock.c
@@ -0,0 +1,607 @@
+/*
+ * vhost transport for vsock
+ *
+ * Copyright (C) 2013-2015 Red Hat, Inc.
+ * Author: Asias He 
+ * Stefan Hajnoczi 
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2.
+ */
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include 
+#include "vhost.h"
+#include "vsock.h"
+
+#define VHOST_VSOCK_DEFAULT_HOST_CID   2
+
+enum {
+   VHOST_VSOCK_FEATURES = VHOST_FEATURES,
+};
+
+/* Used to track all the vhost_vsock instances on the system. */
+static LIST_HEAD(vhost_vsock_list);
+static DEFINE_MUTEX(vhost_vsock_mutex);
+
+struct vhost_vsock {
+   struct vhost_dev dev;
+   struct vhost_virtqueue vqs[VSOCK_VQ_MAX];
+
+   /* Link to global vhost_vsock_list, protected by vhost_vsock_mutex */
+   struct list_head list;
+
+   struct vhost_work send_pkt_work;
+   wait_queue_head_t send_wait;
+
+   /* Fields protected by vqs[VSOCK_VQ_RX].mutex */
+   struct list_head send_pkt_list; /* host->guest pending packets */
+   u32 total_tx_buf;
+
+   u32 guest_cid;
+};
+
+static u32 vhost_transport_get_local_cid(void)
+{
+   return VHOST_VSOCK_DEFAULT_HOST_CID;
+}
+
+static struct vhost_vsock *vhost_vsock_get(u32 guest_cid)
+{
+   struct vhost_vsock *vsock;
+
+   mutex_lock(_vsock_mutex);
+   list_for_each_entry(vsock, _vsock_list, list) {
+   if (vsock->guest_cid == guest_cid) {
+   mutex_unlock(_vsock_mutex);
+   return vsock;
+   }
+   }
+   mutex_unlock(_vsock_mutex);
+
+   return NULL;
+}
+
+static void
+vhost_transport_do_send_pkt(struct vhost_vsock *vsock,
+   struct vhost_virtqueue *vq)
+{
+   bool added = false;
+
+   mutex_lock(>mutex);
+
+   /* Avoid further vmexits, we're already processing the virtqueue */
+   vhost_disable_notify(>dev, vq);
+
+   for (;;) {
+   struct virtio_vsock_pkt *pkt;
+   struct iov_iter iov_iter;
+   unsigned out, in;
+   size_t nbytes;
+   size_t len;
+   int head;
+
+   if (list_empty(>send_pkt_list)) {
+   vhost_enable_notify(>dev, vq);
+   break;
+   }
+
+   head = vhost_get_vq_desc(vq, vq->iov, ARRAY_SIZE(vq->iov),
+, , NULL, NULL);
+   if (head < 0)
+   break;
+
+   if (head == vq->num) {
+   /* We cannot finish yet if more buffers snuck in while
+* re-enabling notify.
+*/
+   if (unlikely(vhost_enable_notify(>dev, vq))) {
+   vhost_disable_notify(>dev, vq);
+   continue;
+   }
+   break;
+   }
+
+   pkt = list_first_entry(>send_pkt_list,
+  struct virtio_vsock_pkt, list);
+   

[RFC v4 0/5] Add virtio transport for AF_VSOCK

2015-12-22 Thread Stefan Hajnoczi
This series is based on v4.4-rc2 and the "virtio: make find_vqs()
checkpatch.pl-friendly" patch I recently submitted.

v4:
 * Addressed code review comments from Alex Bennee
 * MAINTAINERS file entries for new files
 * Trace events instead of pr_debug()
 * RST packet is sent when there is no listen socket
 * Allow guest->host connections again (began discussing netfilter support with
   Matt Benjamin instead of hard-coding security policy in virtio-vsock code)
 * Many checkpatch.pl cleanups (will be 100% clean in v5)

v3:
 * Remove unnecessary 3-way handshake, just do REQUEST/RESPONSE instead
   of REQUEST/RESPONSE/ACK
 * Remove SOCK_DGRAM support and focus on SOCK_STREAM first
   (also drop v2 Patch 1, it's only needed for SOCK_DGRAM)
 * Only allow host->guest connections (same security model as latest
   VMware)
 * Don't put vhost vsock driver into staging
 * Add missing Kconfig dependencies (Arnd Bergmann )
 * Remove unneeded variable used to store return value
   (Fengguang Wu  and Julia Lawall
   )

v2:
 * Rebased onto Linux v4.4-rc2
 * vhost: Refuse to assign reserved CIDs
 * vhost: Refuse guest CID if already in use
 * vhost: Only accept correctly addressed packets (no spoofing!)
 * vhost: Support flexible rx/tx descriptor layout
 * vhost: Add missing total_tx_buf decrement
 * virtio_transport: Fix total_tx_buf accounting
 * virtio_transport: Add virtio_transport global mutex to prevent races
 * common: Notify other side of SOCK_STREAM disconnect (fixes shutdown
   semantics)
 * common: Avoid recursive mutex_lock(tx_lock) for write_space (fixes deadlock)
 * common: Define VIRTIO_VSOCK_TYPE_STREAM/DGRAM hardware interface constants
 * common: Define VIRTIO_VSOCK_SHUTDOWN_RCV/SEND hardware interface constants
 * common: Fix peer_buf_alloc inheritance on child socket

This patch series adds a virtio transport for AF_VSOCK (net/vmw_vsock/).
AF_VSOCK is designed for communication between virtual machines and
hypervisors.  It is currently only implemented for VMware's VMCI transport.

This series implements the proposed virtio-vsock device specification from
here:
http://permalink.gmane.org/gmane.comp.emulators.virtio.devel/980

Most of the work was done by Asias He and Gerd Hoffmann a while back.  I have
picked up the series again.

The QEMU userspace changes are here:
https://github.com/stefanha/qemu/commits/vsock

Why virtio-vsock?
-
Guest<->host communication is currently done over the virtio-serial device.
This makes it hard to port sockets API-based applications and is limited to
static ports.

virtio-vsock uses the sockets API so that applications can rely on familiar
SOCK_STREAM semantics.  Applications on the host can easily connect to guest
agents because the sockets API allows multiple connections to a listen socket
(unlike virtio-serial).  This simplifies the guest<->host communication and
eliminates the need for extra processes on the host to arbitrate virtio-serial
ports.

Overview

This series adds 3 pieces:

1. virtio_transport_common.ko - core virtio vsock code that uses vsock.ko

2. virtio_transport.ko - guest driver

3. drivers/vhost/vsock.ko - host driver

Howto
-
The following kernel options are needed:
  CONFIG_VSOCKETS=y
  CONFIG_VIRTIO_VSOCKETS=y
  CONFIG_VIRTIO_VSOCKETS_COMMON=y
  CONFIG_VHOST_VSOCK=m

Launch QEMU as follows:
  # qemu ... -device vhost-vsock-pci,id=vhost-vsock-pci0,guest-cid=3

Guest and host can communicate via AF_VSOCK sockets.  The host's CID (address)
is 2 and the guest must be assigned a CID (3 in the example above).
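
(For illustration, a minimal host-side AF_VSOCK client along the lines of the
sockets-API usage described above; guest CID 3 matches the example QEMU
command line, while port 1234 and the payload are arbitrary choices and the
guest is assumed to already have a listener on that port:)

    #include <stdio.h>
    #include <unistd.h>
    #include <sys/socket.h>
    #include <linux/vm_sockets.h>

    #ifndef AF_VSOCK
    #define AF_VSOCK 40     /* value from <linux/socket.h>, in case libc lacks it */
    #endif

    int main(void)
    {
            struct sockaddr_vm addr = {
                    .svm_family = AF_VSOCK,
                    .svm_cid    = 3,        /* guest CID from the QEMU example */
                    .svm_port   = 1234,     /* arbitrary port for this sketch */
            };
            int fd = socket(AF_VSOCK, SOCK_STREAM, 0);

            if (fd < 0 || connect(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
                    perror("vsock");
                    return 1;
            }
            write(fd, "hello\n", 6);
            close(fd);
            return 0;
    }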

Status
--
This patch series implements the latest draft specification.  Please review.

Asias He (4):
  VSOCK: Introduce virtio_vsock_common.ko
  VSOCK: Introduce virtio_transport.ko
  VSOCK: Introduce vhost_vsock.ko
  VSOCK: Add Makefile and Kconfig

Stefan Hajnoczi (1):
  VSOCK: transport-specific vsock_transport functions

 MAINTAINERS|  13 +
 drivers/vhost/Kconfig  |  15 +
 drivers/vhost/Makefile |   4 +
 drivers/vhost/vsock.c  | 607 +++
 drivers/vhost/vsock.h  |   4 +
 include/linux/virtio_vsock.h   | 167 +
 include/net/af_vsock.h |   3 +
 .../trace/events/vsock_virtio_transport_common.h   | 144 
 include/uapi/linux/virtio_ids.h|   1 +
 include/uapi/linux/virtio_vsock.h  |  87 +++
 net/vmw_vsock/Kconfig  |  19 +
 net/vmw_vsock/Makefile |   2 +
 net/vmw_vsock/af_vsock.c   |   9 +
 net/vmw_vsock/virtio_transport.c   | 481 
 net/vmw_vsock/virtio_transport_common.c| 834 +
 15 files changed, 2390 insertions(+)
 create mode 100644 drivers/vhost/vsock.c
 create mode 

[RFC v4 3/5] VSOCK: Introduce virtio_transport.ko

2015-12-22 Thread Stefan Hajnoczi
From: Asias He 

VM sockets virtio transport implementation.  This driver runs in the
guest.

Signed-off-by: Asias He 
Signed-off-by: Stefan Hajnoczi 
---
v4:
 * Add MAINTAINERS file entry
 * Drop short/long rx packets
 * checkpatch.pl cleanups
 * Clarify locking in struct virtio_vsock
 * Narrow local variable scopes as suggested by Alex Bennee
 * Call wake_up() after decrementing total_tx_buf to avoid deadlock
v2:
 * Fix total_tx_buf accounting
 * Add virtio_transport global mutex to prevent races

Signed-off-by: Stefan Hajnoczi 
---
 MAINTAINERS  |   1 +
 net/vmw_vsock/virtio_transport.c | 481 +++
 2 files changed, 482 insertions(+)
 create mode 100644 net/vmw_vsock/virtio_transport.c

diff --git a/MAINTAINERS b/MAINTAINERS
index d42db78..67d8504 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -11369,6 +11369,7 @@ S:  Maintained
 F: include/linux/virtio_vsock.h
 F: include/uapi/linux/virtio_vsock.h
 F: net/vmw_vsock/virtio_transport_common.c
+F: net/vmw_vsock/virtio_transport.c
 
 VIRTUAL SERIO DEVICE DRIVER
 M: Stephen Chandler Paul 
diff --git a/net/vmw_vsock/virtio_transport.c b/net/vmw_vsock/virtio_transport.c
new file mode 100644
index 000..e4787bf
--- /dev/null
+++ b/net/vmw_vsock/virtio_transport.c
@@ -0,0 +1,481 @@
+/*
+ * virtio transport for vsock
+ *
+ * Copyright (C) 2013-2015 Red Hat, Inc.
+ * Author: Asias He 
+ * Stefan Hajnoczi 
+ *
+ * Some of the code is take from Gerd Hoffmann 's
+ * early virtio-vsock proof-of-concept bits.
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2.
+ */
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+static struct workqueue_struct *virtio_vsock_workqueue;
+static struct virtio_vsock *the_virtio_vsock;
+static DEFINE_MUTEX(the_virtio_vsock_mutex); /* protects the_virtio_vsock */
+static void virtio_vsock_rx_fill(struct virtio_vsock *vsock);
+
+struct virtio_vsock {
+   struct virtio_device *vdev;
+   struct virtqueue *vqs[VSOCK_VQ_MAX];
+
+   /* Virtqueue processing is deferred to a workqueue */
+   struct work_struct tx_work;
+   struct work_struct rx_work;
+
+   wait_queue_head_t tx_wait;  /* for waiting for tx resources */
+
+   /* The following fields are protected by tx_lock.  vqs[VSOCK_VQ_TX]
+* must be accessed with tx_lock held.
+*/
+   struct mutex tx_lock;
+   u32 total_tx_buf;
+
+   /* The following fields are protected by rx_lock.  vqs[VSOCK_VQ_RX]
+* must be accessed with rx_lock held.
+*/
+   struct mutex rx_lock;
+   int rx_buf_nr;
+   int rx_buf_max_nr;
+
+   u32 guest_cid;
+};
+
+static struct virtio_vsock *virtio_vsock_get(void)
+{
+   return the_virtio_vsock;
+}
+
+static u32 virtio_transport_get_local_cid(void)
+{
+   struct virtio_vsock *vsock = virtio_vsock_get();
+
+   return vsock->guest_cid;
+}
+
+static int
+virtio_transport_send_one_pkt(struct virtio_vsock *vsock,
+ struct virtio_vsock_pkt *pkt)
+{
+   struct scatterlist hdr, buf, *sgs[2];
+   int ret, in_sg = 0, out_sg = 0;
+   struct virtqueue *vq;
+   DEFINE_WAIT(wait);
+
+   vq = vsock->vqs[VSOCK_VQ_TX];
+
+   /* Put pkt in the virtqueue */
+   sg_init_one(, >hdr, sizeof(pkt->hdr));
+   sgs[out_sg++] = 
+   if (pkt->buf) {
+   sg_init_one(, pkt->buf, pkt->len);
+   sgs[out_sg++] = 
+   }
+
+   mutex_lock(>tx_lock);
+   while ((ret = virtqueue_add_sgs(vq, sgs, out_sg, in_sg, pkt,
+   GFP_KERNEL)) < 0) {
+   prepare_to_wait_exclusive(>tx_wait, ,
+ TASK_UNINTERRUPTIBLE);
+   mutex_unlock(>tx_lock);
+   schedule();
+   mutex_lock(>tx_lock);
+   finish_wait(>tx_wait, );
+   }
+   virtqueue_kick(vq);
+   mutex_unlock(>tx_lock);
+
+   return pkt->len;
+}
+
+static int
+virtio_transport_send_pkt_no_sock(struct virtio_vsock_pkt *pkt)
+{
+   struct virtio_vsock *vsock;
+
+   vsock = virtio_vsock_get();
+   if (!vsock) {
+   virtio_transport_free_pkt(pkt);
+   return -ENODEV;
+   }
+
+   return virtio_transport_send_one_pkt(vsock, pkt);
+}
+
+static int
+virtio_transport_send_pkt(struct vsock_sock *vsk,
+ struct virtio_vsock_pkt_info *info)
+{
+   u32 src_cid, src_port, dst_cid, dst_port;
+   struct virtio_vsock_sock *vvs;
+   struct virtio_vsock_pkt *pkt;
+   struct virtio_vsock *vsock;
+   u32 pkt_len = info->pkt_len;
+   DEFINE_WAIT(wait);
+
+   vsock = virtio_vsock_get();
+   if (!vsock)
+   return 

Re: [PATCH 1/2] arm: KVM: Do not update PC if the trap handler has updated it

2015-12-22 Thread Shannon Zhao


On 2015/12/22 17:55, Marc Zyngier wrote:
> Assuming we trap a coprocessor access, and decide that the access
> is illegal, we will inject an exception in the guest. In this
> case, we shouldn't increment the PC, or the vcpu will miss the
> first instruction of the handler, leading to a mildly confused
> guest.
> 
> Solve this by snapshoting PC before the access is performed,
> and checking if it has moved or not before incrementing it.
> 
> Reported-by: Shannon Zhao 
> Signed-off-by: Marc Zyngier 

Reviewed-by: Shannon Zhao 
> ---
>  arch/arm/kvm/coproc.c | 14 --
>  1 file changed, 12 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/arm/kvm/coproc.c b/arch/arm/kvm/coproc.c
> index f3d88dc..f4ad2f2 100644
> --- a/arch/arm/kvm/coproc.c
> +++ b/arch/arm/kvm/coproc.c
> @@ -447,12 +447,22 @@ static int emulate_cp15(struct kvm_vcpu *vcpu,
>   r = find_reg(params, cp15_regs, ARRAY_SIZE(cp15_regs));
>  
>   if (likely(r)) {
> + unsigned long pc = *vcpu_pc(vcpu);
> +
>   /* If we don't have an accessor, we should never get here! */
>   BUG_ON(!r->access);
>  
>   if (likely(r->access(vcpu, params, r))) {
> - /* Skip instruction, since it was emulated */
> - kvm_skip_instr(vcpu, kvm_vcpu_trap_il_is32bit(vcpu));
> + /*
> +  * Skip the instruction if it was emulated
> +  * without PC having changed. This allows us
> +  * to detect a fault being injected
> +  * (incrementing the PC here would cause the
> +  * vcpu to skip the first instruction of its
> +  * fault handler).
> +  */
> + if (pc == *vcpu_pc(vcpu))
> + kvm_skip_instr(vcpu, 
> kvm_vcpu_trap_il_is32bit(vcpu));
>   return 1;
>   }
>   /* If access function fails, it should complain. */
> 

-- 
Shannon


Re: [PATCH 2/2] arm64: KVM: Do not update PC if the trap handler has updated it

2015-12-22 Thread Shannon Zhao


On 2015/12/22 17:55, Marc Zyngier wrote:
> Assuming we trap a system register, and decide that the access is
> illegal, we will inject an exception in the guest. In this
> case, we shouldn't increment the PC, or the vcpu will miss the
> first instruction of the handler, leading to a mildly confused
> guest.
> 
> Solve this by snapshoting PC before the access is performed,
> and checking if it has moved or not before incrementing it.
> 
Thanks a lot. This solves the problem of the guest PMU failing to inject an
EL1 fault into the guest.

Tested-by: Shannon Zhao 
Reviewed-by: Shannon Zhao 

> Reported-by: Shannon Zhao 
> Signed-off-by: Marc Zyngier 
> ---
>  arch/arm64/kvm/sys_regs.c | 73 
> +++
>  1 file changed, 36 insertions(+), 37 deletions(-)
> 
> diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
> index d2650e8..9c87e0c 100644
> --- a/arch/arm64/kvm/sys_regs.c
> +++ b/arch/arm64/kvm/sys_regs.c
> @@ -966,6 +966,39 @@ static const struct sys_reg_desc *find_reg(const struct 
> sys_reg_params *params,
>   return NULL;
>  }
>  
> +/* Perform the sysreg access, returns 0 on success */
> +static int access_sys_reg(struct kvm_vcpu *vcpu,
> +   struct sys_reg_params *params,
> +   const struct sys_reg_desc *r)
> +{
> + u64 pc = *vcpu_pc(vcpu);
> +
> + if (unlikely(!r))
> + return -1;
> +
> + /*
> +  * Not having an accessor means that we have configured a trap
> +  * that we don't know how to handle. This certainly qualifies
> +  * as a gross bug that should be fixed right away.
> +  */
> + BUG_ON(!r->access);
> +
> + if (likely(r->access(vcpu, params, r))) {
> + /*
> +  * Skip the instruction if it was emulated without PC
> +  * having changed. This allows us to detect a fault
> +  * being injected (incrementing the PC here would
> +  * cause the vcpu to skip the first instruction of its
> +  * fault handler).
> +  */
> + if (pc == *vcpu_pc(vcpu))
> + kvm_skip_instr(vcpu, kvm_vcpu_trap_il_is32bit(vcpu));
> + return 0;
> + }
> +
> + return -1;
> +}
> +
>  int kvm_handle_cp14_load_store(struct kvm_vcpu *vcpu, struct kvm_run *run)
>  {
>   kvm_inject_undefined(vcpu);
> @@ -994,26 +1027,7 @@ static int emulate_cp(struct kvm_vcpu *vcpu,
>  
>   r = find_reg(params, table, num);
>  
> - if (r) {
> - /*
> -  * Not having an accessor means that we have
> -  * configured a trap that we don't know how to
> -  * handle. This certainly qualifies as a gross bug
> -  * that should be fixed right away.
> -  */
> - BUG_ON(!r->access);
> -
> - if (likely(r->access(vcpu, params, r))) {
> - /* Skip instruction, since it was emulated */
> - kvm_skip_instr(vcpu, kvm_vcpu_trap_il_is32bit(vcpu));
> - }
> -
> - /* Handled */
> - return 0;
> - }
> -
> - /* Not handled */
> - return -1;
> + return access_sys_reg(vcpu, params, r);
>  }
>  
>  static void unhandled_cp_access(struct kvm_vcpu *vcpu,
> @@ -1178,27 +1192,12 @@ static int emulate_sys_reg(struct kvm_vcpu *vcpu,
>   if (!r)
>   r = find_reg(params, sys_reg_descs, ARRAY_SIZE(sys_reg_descs));
>  
> - if (likely(r)) {
> - /*
> -  * Not having an accessor means that we have
> -  * configured a trap that we don't know how to
> -  * handle. This certainly qualifies as a gross bug
> -  * that should be fixed right away.
> -  */
> - BUG_ON(!r->access);
> -
> - if (likely(r->access(vcpu, params, r))) {
> - /* Skip instruction, since it was emulated */
> - kvm_skip_instr(vcpu, kvm_vcpu_trap_il_is32bit(vcpu));
> - return 1;
> - }
> - /* If access function fails, it should complain. */
> - } else {
> + if (access_sys_reg(vcpu, params, r)) {
>   kvm_err("Unsupported guest sys_reg access at: %lx\n",
>   *vcpu_pc(vcpu));
>   print_sys_reg_instr(params);
> + kvm_inject_undefined(vcpu);
>   }
> - kvm_inject_undefined(vcpu);
>   return 1;
>  }
>  
> 

-- 
Shannon


Re: [PATCH v5 3/3] KVM/arm/arm64: enable enhanced armv8 fp/simd lazy switch

2015-12-22 Thread Christoffer Dall
On Mon, Dec 21, 2015 at 11:34:25AM -0800, Mario Smarduch wrote:
> 
> 
> On 12/18/2015 11:45 PM, Christoffer Dall wrote:
> > On Fri, Dec 18, 2015 at 05:17:00PM -0800, Mario Smarduch wrote:
> >> On 12/18/2015 5:54 AM, Christoffer Dall wrote:
> >>> On Sun, Dec 06, 2015 at 05:07:14PM -0800, Mario Smarduch wrote:
>  This patch tracks armv7 and armv8 fp/simd hardware state with cptr_el2 
>  register.
>  On vcpu_load for 32 bit guests enable FP access, and enable fp/simd
>  trapping for 32 and 64 bit guests. On first fp/simd access trap to 
>  handler 
>  to save host and restore guest context, and clear trapping bits to 
>  enable vcpu 
>  lazy mode. On vcpu_put if trap bits are clear save guest and restore 
>  host 
>  context and also save 32 bit guest fpexc register.
> 
>  Signed-off-by: Mario Smarduch 
>  ---
>   arch/arm/include/asm/kvm_emulate.h   |   5 ++
>   arch/arm/include/asm/kvm_host.h  |   2 +
>   arch/arm/kvm/arm.c   |  20 +--
>   arch/arm64/include/asm/kvm_asm.h |   2 +
>   arch/arm64/include/asm/kvm_emulate.h |  15 +++--
>   arch/arm64/include/asm/kvm_host.h|  16 +-
>   arch/arm64/kernel/asm-offsets.c  |   1 +
>   arch/arm64/kvm/Makefile  |   3 +-
>   arch/arm64/kvm/fpsimd_switch.S   |  38 
>   arch/arm64/kvm/hyp.S | 108 
>  +--
>   arch/arm64/kvm/hyp_head.S|  48 
>   11 files changed, 181 insertions(+), 77 deletions(-)
>   create mode 100644 arch/arm64/kvm/fpsimd_switch.S
>   create mode 100644 arch/arm64/kvm/hyp_head.S
> 
>  diff --git a/arch/arm/include/asm/kvm_emulate.h 
>  b/arch/arm/include/asm/kvm_emulate.h
>  index 3de11a2..13feed5 100644
>  --- a/arch/arm/include/asm/kvm_emulate.h
>  +++ b/arch/arm/include/asm/kvm_emulate.h
>  @@ -243,6 +243,11 @@ static inline unsigned long 
>  vcpu_data_host_to_guest(struct kvm_vcpu *vcpu,
>   }
>   }
>   
>  +static inline bool kvm_guest_vcpu_is_32bit(struct kvm_vcpu *vcpu)
>  +{
>  +return true;
>  +}
>  +
>   #ifdef CONFIG_VFPv3
>   /* Called from vcpu_load - save fpexc and enable guest access to 
>  fp/simd unit */
>   static inline void kvm_enable_vcpu_fpexc(struct kvm_vcpu *vcpu)
>  diff --git a/arch/arm/include/asm/kvm_host.h 
>  b/arch/arm/include/asm/kvm_host.h
>  index ecc883a..720ae51 100644
>  --- a/arch/arm/include/asm/kvm_host.h
>  +++ b/arch/arm/include/asm/kvm_host.h
>  @@ -227,6 +227,8 @@ int kvm_perf_teardown(void);
>   void kvm_mmu_wp_memory_region(struct kvm *kvm, int slot);
>   
>   struct kvm_vcpu *kvm_mpidr_to_vcpu(struct kvm *kvm, unsigned long 
>  mpidr);
>  +
>  +static inline void kvm_save_guest_vcpu_fpexc(struct kvm_vcpu *vcpu) {}
>   void kvm_restore_host_vfp_state(struct kvm_vcpu *);
>   
>   static inline void kvm_arch_hardware_disable(void) {}
>  diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
>  index 1de07ab..dd59f8a 100644
>  --- a/arch/arm/kvm/arm.c
>  +++ b/arch/arm/kvm/arm.c
>  @@ -292,8 +292,12 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int 
>  cpu)
>   
>   kvm_arm_set_running_vcpu(vcpu);
>   
>  -/*  Save and enable FPEXC before we load guest context */
>  -kvm_enable_vcpu_fpexc(vcpu);
>  +/*
>  + * For 32bit guest executing on arm64, enable fp/simd access in
>  + * EL2. On arm32 save host fpexc and then enable fp/simd access.
>  + */
>  +if (kvm_guest_vcpu_is_32bit(vcpu))
>  +kvm_enable_vcpu_fpexc(vcpu);
>   
>   /* reset hyp cptr register to trap on tracing and vfp/simd 
>  access*/
>   vcpu_reset_cptr(vcpu);
>  @@ -302,10 +306,18 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int 
>  cpu)
>   void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
>   {
>   /* If the fp/simd registers are dirty save guest, restore host. 
>  */
>  -if (kvm_vcpu_vfp_isdirty(vcpu))
>  +if (kvm_vcpu_vfp_isdirty(vcpu)) {
>   kvm_restore_host_vfp_state(vcpu);
>   
>  -/* Restore host FPEXC trashed in vcpu_load */
>  +/*
>  + * For 32bit guest on arm64 save the guest fpexc 
>  register
>  + * in EL2 mode.
>  + */
>  +if (kvm_guest_vcpu_is_32bit(vcpu))
>  +kvm_save_guest_vcpu_fpexc(vcpu);
>  +}
>  +
>  +/* For arm32 restore host FPEXC trashed in vcpu_load. */
>   kvm_restore_host_fpexc(vcpu);
>   
>   /*
>  diff --git 
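
As a rough illustration of the lazy switch flow described in that commit
message, here is a stand-alone toy model; it is not the patch code, and
the real save/restore work is reduced to prints and a single "trapped"
flag:

#include <stdbool.h>
#include <stdio.h>

struct vcpu {
	bool fpsimd_trapped;	/* trap bit set: guest fp/simd state not loaded */
	bool guest_is_32bit;
};

static void vcpu_load(struct vcpu *v)
{
	if (v->guest_is_32bit)
		puts("save host FPEXC, enable guest fp/simd access");
	v->fpsimd_trapped = true;	/* arm the trap; nothing switched yet */
}

/* Runs on the guest's first fp/simd access while the trap bit is set. */
static void first_fpsimd_access(struct vcpu *v)
{
	if (!v->fpsimd_trapped)
		return;			/* already switched earlier this slice */
	puts("save host fp/simd state, restore guest fp/simd state");
	v->fpsimd_trapped = false;	/* guest registers are now "dirty" */
}

static void vcpu_put(struct vcpu *v)
{
	if (!v->fpsimd_trapped) {	/* dirty: the guest really used fp/simd */
		puts("save guest fp/simd state, restore host fp/simd state");
		if (v->guest_is_32bit)
			puts("save guest FPEXC");
	}
	puts("restore host FPEXC");	/* arm32 host only */
}

int main(void)
{
	struct vcpu v = { .guest_is_32bit = false };

	vcpu_load(&v);
	first_fpsimd_access(&v);	/* guest touched fp/simd this time slice */
	vcpu_put(&v);
	return 0;
}
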

[PATCH v8 04/20] KVM: ARM64: Add access handler for PMCR register

2015-12-22 Thread Shannon Zhao
From: Shannon Zhao 

Add a reset handler which gets the host value of PMCR_EL0 and makes the
writable bits architecturally UNKNOWN, except PMCR.E which resets to zero.
Add an access handler for PMCR.

Signed-off-by: Shannon Zhao 
---
 arch/arm64/kvm/sys_regs.c | 39 +--
 1 file changed, 37 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index e8bf374..c60047e 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -34,6 +34,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 
@@ -439,6 +440,40 @@ static void reset_mpidr(struct kvm_vcpu *vcpu, const 
struct sys_reg_desc *r)
vcpu_sys_reg(vcpu, MPIDR_EL1) = (1ULL << 31) | mpidr;
 }
 
+static void reset_pmcr(struct kvm_vcpu *vcpu, const struct sys_reg_desc *r)
+{
+   u64 pmcr, val;
+
+   asm volatile("mrs %0, pmcr_el0\n" : "=r" (pmcr));
+   /* Writable bits of PMCR_EL0 (ARMV8_PMCR_MASK) is reset to UNKNOWN
+* except PMCR.E resetting to zero.
+*/
+   val = ((pmcr & ~ARMV8_PMCR_MASK) | (ARMV8_PMCR_MASK & 0xdecafbad))
+ & (~ARMV8_PMCR_E);
+   vcpu_sys_reg(vcpu, r->reg) = val;
+}
+
+static bool access_pmcr(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
+   const struct sys_reg_desc *r)
+{
+   u64 val;
+
+   if (p->is_write) {
+   /* Only update writeable bits of PMCR */
+   val = vcpu_sys_reg(vcpu, r->reg);
+   val &= ~ARMV8_PMCR_MASK;
+   val |= p->regval & ARMV8_PMCR_MASK;
+   vcpu_sys_reg(vcpu, r->reg) = val;
+   } else {
+   /* PMCR.P & PMCR.C are RAZ */
+   val = vcpu_sys_reg(vcpu, r->reg)
+ & ~(ARMV8_PMCR_P | ARMV8_PMCR_C);
+   p->regval = val;
+   }
+
+   return true;
+}
+
 /* Silly macro to expand the DBG{BCR,BVR,WVR,WCR}n_EL1 registers in one go */
 #define DBG_BCR_BVR_WCR_WVR_EL1(n) \
/* DBGBVRn_EL1 */   \
@@ -623,7 +658,7 @@ static const struct sys_reg_desc sys_reg_descs[] = {
 
/* PMCR_EL0 */
{ Op0(0b11), Op1(0b011), CRn(0b1001), CRm(0b1100), Op2(0b000),
- trap_raz_wi },
+ access_pmcr, reset_pmcr, PMCR_EL0, },
/* PMCNTENSET_EL0 */
{ Op0(0b11), Op1(0b011), CRn(0b1001), CRm(0b1100), Op2(0b001),
  trap_raz_wi },
@@ -885,7 +920,7 @@ static const struct sys_reg_desc cp15_regs[] = {
{ Op1( 0), CRn( 7), CRm(14), Op2( 2), access_dcsw },
 
/* PMU */
-   { Op1( 0), CRn( 9), CRm(12), Op2( 0), trap_raz_wi },
+   { Op1( 0), CRn( 9), CRm(12), Op2( 0), access_pmcr },
{ Op1( 0), CRn( 9), CRm(12), Op2( 1), trap_raz_wi },
{ Op1( 0), CRn( 9), CRm(12), Op2( 2), trap_raz_wi },
{ Op1( 0), CRn( 9), CRm(12), Op2( 3), trap_raz_wi },
-- 
2.0.4


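
For reference, a small stand-alone program mirroring the bit manipulation
performed by reset_pmcr() and access_pmcr() above; the host PMCR value
here is made up, and the constants are copied from asm/pmu.h:

#include <stdint.h>
#include <stdio.h>

#define ARMV8_PMCR_E	(1u << 0)	/* Enable all counters */
#define ARMV8_PMCR_P	(1u << 1)	/* Reset all event counters */
#define ARMV8_PMCR_C	(1u << 2)	/* Cycle counter reset */
#define ARMV8_PMCR_MASK	0x3f		/* Mask for writable bits */

int main(void)
{
	uint64_t host_pmcr = 0x4101302d;	/* pretend "mrs pmcr_el0" result */
	uint64_t val;

	/* Keep the host's read-only bits, set the writable bits to an
	 * arbitrary (UNKNOWN) pattern, and force PMCR.E to zero. */
	val = ((host_pmcr & ~(uint64_t)ARMV8_PMCR_MASK) |
	       (ARMV8_PMCR_MASK & 0xdecafbad)) & ~(uint64_t)ARMV8_PMCR_E;
	printf("reset PMCR_EL0 = %#llx\n", (unsigned long long)val);

	/* On a guest read, PMCR.P and PMCR.C are returned as zero (RAZ). */
	printf("guest read     = %#llx\n",
	       (unsigned long long)(val & ~(uint64_t)(ARMV8_PMCR_P | ARMV8_PMCR_C)));
	return 0;
}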


[PATCH v8 15/20] KVM: ARM64: Add a helper to forward trap to guest EL1

2015-12-22 Thread Shannon Zhao
From: Shannon Zhao 

This helper forwards the trap caused by MRS/MSR for AArch64 and MCR/MRC,
MCRR/MRRC for AArch32 CP15 to the guest EL1.

Signed-off-by: Shannon Zhao 
---
 arch/arm64/include/asm/kvm_emulate.h |  1 +
 arch/arm64/kvm/inject_fault.c| 52 +++-
 2 files changed, 52 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/include/asm/kvm_emulate.h 
b/arch/arm64/include/asm/kvm_emulate.h
index 3066328..88b2958 100644
--- a/arch/arm64/include/asm/kvm_emulate.h
+++ b/arch/arm64/include/asm/kvm_emulate.h
@@ -36,6 +36,7 @@ unsigned long *vcpu_spsr32(const struct kvm_vcpu *vcpu);
 bool kvm_condition_valid32(const struct kvm_vcpu *vcpu);
 void kvm_skip_instr32(struct kvm_vcpu *vcpu, bool is_wide_instr);
 
+void kvm_forward_trap_to_el1(struct kvm_vcpu *vcpu);
 void kvm_inject_undefined(struct kvm_vcpu *vcpu);
 void kvm_inject_dabt(struct kvm_vcpu *vcpu, unsigned long addr);
 void kvm_inject_pabt(struct kvm_vcpu *vcpu, unsigned long addr);
diff --git a/arch/arm64/kvm/inject_fault.c b/arch/arm64/kvm/inject_fault.c
index 648112e..052ef25 100644
--- a/arch/arm64/kvm/inject_fault.c
+++ b/arch/arm64/kvm/inject_fault.c
@@ -27,7 +27,10 @@
 
 #define PSTATE_FAULT_BITS_64   (PSR_MODE_EL1h | PSR_A_BIT | PSR_F_BIT | \
 PSR_I_BIT | PSR_D_BIT)
-#define EL1_EXCEPT_SYNC_OFFSET 0x200
+#define EL1_EXCEPT_BAD_SYNC_OFFSET 0x0
+#define EL1_EXCEPT_SYNC_OFFSET 0x200
+#define EL0_EXCEPT_SYNC_OFFSET_64  0x400
+#define EL0_EXCEPT_SYNC_OFFSET_32  0x600
 
 static void prepare_fault32(struct kvm_vcpu *vcpu, u32 mode, u32 vect_offset)
 {
@@ -201,3 +204,50 @@ void kvm_inject_undefined(struct kvm_vcpu *vcpu)
else
inject_undef64(vcpu);
 }
+
+/**
+ * kvm_forward_trap_to_el1 - forward access trap to the guest EL1
+ *
+ * It is assumed that this code is called from the VCPU thread and that the
+ * VCPU therefore is not currently executing guest code.
+ */
+void kvm_forward_trap_to_el1(struct kvm_vcpu *vcpu)
+{
+   unsigned long cpsr;
+   u32 esr = vcpu->arch.fault.esr_el2;
+   u32 esr_ec = (esr & ESR_ELx_EC_MASK) >> ESR_ELx_EC_SHIFT;
+
+   if (esr_ec == ESR_ELx_EC_SYS64) {
+   u64 exc_offset;
+
+   cpsr = *vcpu_cpsr(vcpu);
+   *vcpu_spsr(vcpu) = cpsr;
+   *vcpu_elr_el1(vcpu) = *vcpu_pc(vcpu);
+
+   *vcpu_cpsr(vcpu) = PSTATE_FAULT_BITS_64;
+
+   switch (cpsr & (PSR_MODE_MASK | PSR_MODE32_BIT)) {
+   case PSR_MODE_EL0t:
+   exc_offset = EL0_EXCEPT_SYNC_OFFSET_64;
+   break;
+   case PSR_MODE_EL1t:
+   exc_offset = EL1_EXCEPT_BAD_SYNC_OFFSET;
+   break;
+   case PSR_MODE_EL1h:
+   exc_offset = EL1_EXCEPT_SYNC_OFFSET;
+   break;
+   default:
+   exc_offset = EL0_EXCEPT_SYNC_OFFSET_32;
+   }
+
+   *vcpu_pc(vcpu) = vcpu_sys_reg(vcpu, VBAR_EL1) + exc_offset;
+
+   if (kvm_vcpu_trap_il_is32bit(vcpu))
+   esr |= ESR_ELx_IL;
+
+   vcpu_sys_reg(vcpu, ESR_EL1) = esr;
+   } else if (esr_ec == ESR_ELx_EC_CP15_32 ||
+  esr_ec == ESR_ELx_EC_CP15_64) {
+   prepare_fault32(vcpu, COMPAT_PSR_MODE_UND, 4);
+   }
+}
-- 
2.0.4




[PATCH v8 16/20] KVM: ARM64: Add access handler for PMUSERENR register

2015-12-22 Thread Shannon Zhao
From: Shannon Zhao 

This register resets as unknown in 64bit mode while it resets as zero
in 32bit mode. Here we choose to reset it as zero for consistency.

PMUSERENR_EL0 holds some bits which decide whether PMU registers can be
accessed from EL0. Add some check helpers to handle the access from EL0.

When these bits are zero, only reading PMUSERENR will trap to EL2;
writing PMUSERENR or reading/writing other PMU registers will trap to
EL1 rather than EL2 when HCR.TGE==0. With the current KVM configuration
(HCR.TGE==0) there is no way to get these traps. Here we write 0xf to the
physical PMUSERENR register on VM entry, so that PMU accesses from EL0
will trap to EL2. Within the register access handler we check the real
value of the guest PMUSERENR register to decide whether this access is
allowed. If not allowed, forward this trap to EL1.

Signed-off-by: Shannon Zhao 
---
 arch/arm64/include/asm/pmu.h |   9 
 arch/arm64/kvm/hyp/switch.c  |   3 ++
 arch/arm64/kvm/sys_regs.c| 122 +--
 3 files changed, 129 insertions(+), 5 deletions(-)

diff --git a/arch/arm64/include/asm/pmu.h b/arch/arm64/include/asm/pmu.h
index 2588f9c..1238ade 100644
--- a/arch/arm64/include/asm/pmu.h
+++ b/arch/arm64/include/asm/pmu.h
@@ -67,4 +67,13 @@
 #defineARMV8_EXCLUDE_EL0   (1 << 30)
 #defineARMV8_INCLUDE_EL2   (1 << 27)
 
+/*
+ * PMUSERENR: user enable reg
+ */
+#define ARMV8_USERENR_MASK 0xf /* Mask for writable bits */
+#define ARMV8_USERENR_EN   (1 << 0) /* PMU regs can be accessed at EL0 */
+#define ARMV8_USERENR_SW   (1 << 1) /* PMSWINC can be written at EL0 */
+#define ARMV8_USERENR_CR   (1 << 2) /* Cycle counter can be read at EL0 */
+#define ARMV8_USERENR_ER   (1 << 3) /* Event counter can be read at EL0 */
+
 #endif /* __ASM_PMU_H */
diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
index ca8f5a5..a85375f 100644
--- a/arch/arm64/kvm/hyp/switch.c
+++ b/arch/arm64/kvm/hyp/switch.c
@@ -37,6 +37,8 @@ static void __hyp_text __activate_traps(struct kvm_vcpu *vcpu)
/* Trap on AArch32 cp15 c15 accesses (EL1 or EL0) */
write_sysreg(1 << 15, hstr_el2);
write_sysreg(CPTR_EL2_TTA | CPTR_EL2_TFP, cptr_el2);
+   /* Make sure we trap PMU access from EL0 to EL2 */
+   write_sysreg(15, pmuserenr_el0);
write_sysreg(vcpu->arch.mdcr_el2, mdcr_el2);
 }
 
@@ -45,6 +47,7 @@ static void __hyp_text __deactivate_traps(struct kvm_vcpu 
*vcpu)
write_sysreg(HCR_RW, hcr_el2);
write_sysreg(0, hstr_el2);
write_sysreg(read_sysreg(mdcr_el2) & MDCR_EL2_HPMN_MASK, mdcr_el2);
+   write_sysreg(0, pmuserenr_el0);
write_sysreg(0, cptr_el2);
 }
 
diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index 04281f1..ac0cbf8 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -453,11 +453,47 @@ static void reset_pmcr(struct kvm_vcpu *vcpu, const 
struct sys_reg_desc *r)
vcpu_sys_reg(vcpu, r->reg) = val;
 }
 
+static inline bool pmu_access_el0_disabled(struct kvm_vcpu *vcpu)
+{
+   u64 reg = vcpu_sys_reg(vcpu, PMUSERENR_EL0);
+
+   return !((reg & ARMV8_USERENR_EN) || vcpu_mode_priv(vcpu));
+}
+
+static inline bool pmu_write_swinc_el0_disabled(struct kvm_vcpu *vcpu)
+{
+   u64 reg = vcpu_sys_reg(vcpu, PMUSERENR_EL0);
+
+   return !((reg & (ARMV8_USERENR_SW | ARMV8_USERENR_EN))
+|| vcpu_mode_priv(vcpu));
+}
+
+static inline bool pmu_access_cycle_counter_el0_disabled(struct kvm_vcpu *vcpu)
+{
+   u64 reg = vcpu_sys_reg(vcpu, PMUSERENR_EL0);
+
+   return !((reg & (ARMV8_USERENR_CR | ARMV8_USERENR_EN))
+|| vcpu_mode_priv(vcpu));
+}
+
+static inline bool pmu_access_event_counter_el0_disabled(struct kvm_vcpu *vcpu)
+{
+   u64 reg = vcpu_sys_reg(vcpu, PMUSERENR_EL0);
+
+   return !((reg & (ARMV8_USERENR_ER | ARMV8_USERENR_EN))
+|| vcpu_mode_priv(vcpu));
+}
+
 static bool access_pmcr(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
const struct sys_reg_desc *r)
 {
u64 val;
 
+   if (pmu_access_el0_disabled(vcpu)) {
+   kvm_forward_trap_to_el1(vcpu);
+   return true;
+   }
+
if (p->is_write) {
/* Only update writeable bits of PMCR */
val = vcpu_sys_reg(vcpu, r->reg);
@@ -478,6 +514,11 @@ static bool access_pmcr(struct kvm_vcpu *vcpu, struct 
sys_reg_params *p,
 static bool access_pmselr(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
  const struct sys_reg_desc *r)
 {
+   if (pmu_access_event_counter_el0_disabled(vcpu)) {
+   kvm_forward_trap_to_el1(vcpu);
+   return true;
+   }
+
if (p->is_write)
vcpu_sys_reg(vcpu, r->reg) = p->regval;
else
@@ -492,6 +533,11 @@ static bool access_pmceid(struct kvm_vcpu *vcpu, struct 

[RFC v4 1/5] VSOCK: transport-specific vsock_transport functions

2015-12-22 Thread Stefan Hajnoczi
struct vsock_transport contains function pointers called by AF_VSOCK
core code.  The transport may want its own transport-specific function
pointers and they can be added after struct vsock_transport.

Allow the transport to fetch vsock_transport.  It can downcast it to
access transport-specific function pointers.

The virtio transport will use this.

Signed-off-by: Stefan Hajnoczi 
---
 include/net/af_vsock.h   | 3 +++
 net/vmw_vsock/af_vsock.c | 9 +
 2 files changed, 12 insertions(+)

diff --git a/include/net/af_vsock.h b/include/net/af_vsock.h
index e9eb2d6..23f5525 100644
--- a/include/net/af_vsock.h
+++ b/include/net/af_vsock.h
@@ -165,6 +165,9 @@ static inline int vsock_core_init(const struct 
vsock_transport *t)
 }
 void vsock_core_exit(void);
 
+/* The transport may downcast this to access transport-specific functions */
+const struct vsock_transport *vsock_core_get_transport(void);
+
 / UTILS /
 
 void vsock_release_pending(struct sock *pending);
diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
index 7fd1220..9783a38 100644
--- a/net/vmw_vsock/af_vsock.c
+++ b/net/vmw_vsock/af_vsock.c
@@ -1994,6 +1994,15 @@ void vsock_core_exit(void)
 }
 EXPORT_SYMBOL_GPL(vsock_core_exit);
 
+const struct vsock_transport *vsock_core_get_transport(void)
+{
+   /* vsock_register_mutex not taken since only the transport uses this
+* function and only while registered.
+*/
+   return transport;
+}
+EXPORT_SYMBOL_GPL(vsock_core_get_transport);
+
 MODULE_AUTHOR("VMware, Inc.");
 MODULE_DESCRIPTION("VMware Virtual Socket Family");
 MODULE_VERSION("1.0.1.0-k");
-- 
2.5.0

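
A minimal stand-alone sketch of the downcast pattern this enables; the
struct layout and names below are illustrative only (the real virtio
transport structure arrives with the later patches of this series):

#include <stddef.h>
#include <stdio.h>

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Core-visible ops table, standing in for struct vsock_transport. */
struct vsock_transport {
	int (*shutdown)(void *vsk, int mode);
};

/* Transport-private wrapper: core struct first, extra ops after it. */
struct virtio_transport {
	struct vsock_transport transport;
	int (*send_pkt)(void *vsk, size_t len);
};

static int virtio_send_pkt(void *vsk, size_t len)
{
	printf("send_pkt: %zu bytes\n", len);
	return (int)len;
}

static struct virtio_transport the_transport = {
	.send_pkt = virtio_send_pkt,
};

/* Stands in for vsock_core_get_transport(): the core only returns the base. */
static const struct vsock_transport *get_transport(void)
{
	return &the_transport.transport;
}

int main(void)
{
	const struct vsock_transport *t = get_transport();
	/* Downcast back to the transport-specific ops. */
	const struct virtio_transport *vt =
		container_of(t, struct virtio_transport, transport);

	return vt->send_pkt(NULL, 64) == 64 ? 0 : 1;
}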


[PATCH v8 17/20] KVM: ARM64: Add PMU overflow interrupt routing

2015-12-22 Thread Shannon Zhao
From: Shannon Zhao 

When calling perf_event_create_kernel_counter to create the perf_event,
assign an overflow handler. Then when the perf event overflows, set the
corresponding bit of the guest PMOVSSET register. If this counter is enabled
and its interrupt is enabled as well, kick the vcpu to sync the
interrupt.

On VM entry, if a counter has overflowed, inject the interrupt with
the level set to 1. Otherwise, inject the interrupt with the level set
to 0.

Signed-off-by: Shannon Zhao 
---
 arch/arm/kvm/arm.c|  2 ++
 include/kvm/arm_pmu.h |  2 ++
 virt/kvm/arm/pmu.c| 50 +-
 3 files changed, 53 insertions(+), 1 deletion(-)

diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
index dda1959..f54264c 100644
--- a/arch/arm/kvm/arm.c
+++ b/arch/arm/kvm/arm.c
@@ -28,6 +28,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #define CREATE_TRACE_POINTS
 #include "trace.h"
@@ -577,6 +578,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct 
kvm_run *run)
 * non-preemptible context.
 */
preempt_disable();
+   kvm_pmu_flush_hwstate(vcpu);
kvm_timer_flush_hwstate(vcpu);
kvm_vgic_flush_hwstate(vcpu);
 
diff --git a/include/kvm/arm_pmu.h b/include/kvm/arm_pmu.h
index 7ec7706..136f4e3 100644
--- a/include/kvm/arm_pmu.h
+++ b/include/kvm/arm_pmu.h
@@ -40,6 +40,7 @@ u64 kvm_pmu_valid_counter_mask(struct kvm_vcpu *vcpu);
 void kvm_pmu_disable_counter(struct kvm_vcpu *vcpu, u64 val);
 void kvm_pmu_enable_counter(struct kvm_vcpu *vcpu, u64 val);
 void kvm_pmu_overflow_set(struct kvm_vcpu *vcpu, u64 val);
+void kvm_pmu_flush_hwstate(struct kvm_vcpu *vcpu);
 void kvm_pmu_software_increment(struct kvm_vcpu *vcpu, u64 val);
 void kvm_pmu_handle_pmcr(struct kvm_vcpu *vcpu, u64 val);
 void kvm_pmu_set_counter_event_type(struct kvm_vcpu *vcpu, u64 data,
@@ -59,6 +60,7 @@ u64 kvm_pmu_valid_counter_mask(struct kvm_vcpu *vcpu)
 void kvm_pmu_disable_counter(struct kvm_vcpu *vcpu, u64 val) {}
 void kvm_pmu_enable_counter(struct kvm_vcpu *vcpu, u64 val) {}
 void kvm_pmu_overflow_set(struct kvm_vcpu *vcpu, u64 val) {}
+void kvm_pmu_flush_hwstate(struct kvm_vcpu *vcpu) {}
 void kvm_pmu_software_increment(struct kvm_vcpu *vcpu, u64 val) {}
 void kvm_pmu_handle_pmcr(struct kvm_vcpu *vcpu, u64 val) {}
 void kvm_pmu_set_counter_event_type(struct kvm_vcpu *vcpu, u64 data,
diff --git a/virt/kvm/arm/pmu.c b/virt/kvm/arm/pmu.c
index fda32cb..e28df0f 100644
--- a/virt/kvm/arm/pmu.c
+++ b/virt/kvm/arm/pmu.c
@@ -21,6 +21,7 @@
 #include 
 #include 
 #include 
+#include 
 
 /**
  * kvm_pmu_get_counter_value - get PMU counter value
@@ -161,6 +162,52 @@ void kvm_pmu_overflow_set(struct kvm_vcpu *vcpu, u64 val)
 }
 
 /**
+ * kvm_pmu_flush_hwstate - flush pmu state to cpu
+ * @vcpu: The vcpu pointer
+ *
+ * Inject virtual PMU IRQ if IRQ is pending for this cpu.
+ */
+void kvm_pmu_flush_hwstate(struct kvm_vcpu *vcpu)
+{
+   struct kvm_pmu *pmu = &vcpu->arch.pmu;
+   u64 overflow;
+
+   if (pmu->irq_num == -1)
+   return;
+
+   if (!(vcpu_sys_reg(vcpu, PMCR_EL0) & ARMV8_PMCR_E))
+   return;
+
+   overflow = kvm_pmu_overflow_status(vcpu);
+   kvm_vgic_inject_irq(vcpu->kvm, vcpu->vcpu_id, pmu->irq_num, !!overflow);
+}
+
+static inline struct kvm_vcpu *kvm_pmc_to_vcpu(struct kvm_pmc *pmc)
+{
+   struct kvm_pmu *pmu;
+   struct kvm_vcpu_arch *vcpu_arch;
+
+   pmc -= pmc->idx;
+   pmu = container_of(pmc, struct kvm_pmu, pmc[0]);
+   vcpu_arch = container_of(pmu, struct kvm_vcpu_arch, pmu);
+   return container_of(vcpu_arch, struct kvm_vcpu, arch);
+}
+
+/**
+ * When perf event overflows, call kvm_pmu_overflow_set to set overflow status.
+ */
+static void kvm_pmu_perf_overflow(struct perf_event *perf_event,
+ struct perf_sample_data *data,
+ struct pt_regs *regs)
+{
+   struct kvm_pmc *pmc = perf_event->overflow_handler_context;
+   struct kvm_vcpu *vcpu = kvm_pmc_to_vcpu(pmc);
+   int idx = pmc->idx;
+
+   kvm_pmu_overflow_set(vcpu, BIT(idx));
+}
+
+/**
  * kvm_pmu_software_increment - do software increment
  * @vcpu: The vcpu pointer
  * @val: the value guest writes to PMSWINC register
@@ -279,7 +326,8 @@ void kvm_pmu_set_counter_event_type(struct kvm_vcpu *vcpu, 
u64 data,
/* The initial sample period (overflow count) of an event. */
attr.sample_period = (-counter) & pmc->bitmask;
 
-   event = perf_event_create_kernel_counter(&attr, -1, current, NULL, pmc);
+   event = perf_event_create_kernel_counter(&attr, -1, current,
+kvm_pmu_perf_overflow, pmc);
if (IS_ERR(event)) {
printk_once("kvm: pmu event creation failed %ld\n",
PTR_ERR(event));
-- 
2.0.4


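
The kvm_pmc_to_vcpu() helper above recovers the vcpu from a bare counter
pointer by rewinding to the first array element and then applying
container_of() twice. A stand-alone illustration of that walk, with toy
struct definitions rather than the kernel ones:

#include <stddef.h>
#include <stdio.h>

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct pmc  { int idx; };
struct pmu  { struct pmc pmc[4]; };
struct arch { struct pmu pmu; };
struct vcpu { int id; struct arch arch; };

static struct vcpu *pmc_to_vcpu(struct pmc *pmc)
{
	struct pmu *pmu;
	struct arch *arch;

	pmc -= pmc->idx;	/* &pmu->pmc[idx] back to &pmu->pmc[0] */
	pmu  = container_of(pmc, struct pmu, pmc[0]);
	arch = container_of(pmu, struct arch, pmu);
	return container_of(arch, struct vcpu, arch);
}

int main(void)
{
	struct vcpu v = { .id = 7 };
	int i;

	for (i = 0; i < 4; i++)
		v.arch.pmu.pmc[i].idx = i;

	/* The overflow callback only gets the pmc context pointer back from
	 * perf, so this walk is how it finds the vcpu to kick. */
	printf("vcpu id = %d\n", pmc_to_vcpu(&v.arch.pmu.pmc[3])->id);
	return 0;
}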

[PATCH v8 00/20] KVM: ARM64: Add guest PMU support

2015-12-22 Thread Shannon Zhao
From: Shannon Zhao 

This patchset adds guest PMU support for KVM on ARM64. It takes a
trap-and-emulate approach. When the guest wants to monitor an event, the
access is trapped by KVM, which calls the perf_event API to create a perf
event and the relevant perf_event APIs to get the event's count value.
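
As a sketch of what the kernel side of that looks like (this is not the
patch code; the attributes below are just one plausible configuration for
a task-bound cycle counter):

#include <linux/perf_event.h>
#include <linux/sched.h>

static struct perf_event *create_cycle_counter(void)
{
	struct perf_event_attr attr = {
		.type		= PERF_TYPE_HARDWARE,
		.size		= sizeof(attr),
		.config		= PERF_COUNT_HW_CPU_CYCLES,
		.pinned		= 1,
		.exclude_hv	= 1,	/* don't count cycles spent at EL2 */
	};

	/* cpu == -1, task == current: count for this task on any cpu.
	 * NULL overflow handler here; a real one is added later in the series. */
	return perf_event_create_kernel_counter(&attr, -1, current, NULL, NULL);
}

static u64 read_counter(struct perf_event *event)
{
	u64 enabled, running;

	return perf_event_read_value(event, &enabled, &running);
}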

Use perf to test this patchset in guest. When using "perf list", it
shows the list of the hardware events and hardware cache events perf
supports. Then use "perf stat -e EVENT" to monitor some event. For
example, use "perf stat -e cycles" to count cpu cycles and
"perf stat -e cache-misses" to count cache misses.

Below are the outputs of "perf stat -r 5 sleep 5" when running in host
and guest.

Host:
 Performance counter stats for 'sleep 5' (5 runs):

          0.549456  task-clock (msec)        #    0.000 CPUs utilized            ( +-  5.68% )
                 1  context-switches         #    0.002 M/sec
                 0  cpu-migrations           #    0.000 K/sec
                48  page-faults              #    0.088 M/sec                    ( +-  1.40% )
           1146243  cycles                   #    2.086 GHz                      ( +-  5.71% )
                    stalled-cycles-frontend
                    stalled-cycles-backend
            627195  instructions             #    0.55  insns per cycle          ( +- 15.65% )
                    branches
              9826  branch-misses            #   17.883 M/sec                    ( +-  1.10% )

       5.000875516 seconds time elapsed                                          ( +-  0.00% )


Guest:
 Performance counter stats for 'sleep 5' (5 runs):

          0.640712  task-clock (msec)        #    0.000 CPUs utilized            ( +-  0.41% )
                 1  context-switches         #    0.002 M/sec
                 0  cpu-migrations           #    0.000 K/sec
                50  page-faults              #    0.077 M/sec                    ( +-  1.37% )
           1320428  cycles                   #    2.061 GHz                      ( +-  0.29% )
                    stalled-cycles-frontend
                    stalled-cycles-backend
            642373  instructions             #    0.49  insns per cycle          ( +-  0.46% )
                    branches
             10399  branch-misses            #   16.230 M/sec                    ( +-  1.57% )

       5.001181020 seconds time elapsed                                          ( +-  0.00% )


Have a cycle counter read test like below in guest and host:

#include <stdio.h>

static void test(void)
{
	unsigned long count = 0, count1, count2;

	count1 = read_cycles();
	count++;	/* trivial work between the two reads */
	count2 = read_cycles();
	printf("count1: %lu\ncount2: %lu\ndelta: %lu\n",
	       count1, count2, count2 - count1);
}

Host:
count1: 3049567104
count2: 3049567247
delta: 143

Guest:
count1: 5281420890
count2: 5281421068
delta: 178

The gap between guest and host is very small. One reason for this, I
think, is that it doesn't count the cycles in EL2 and in the host since we
set exclude_hv = 1. So the cycles spent storing/restoring registers, which
happens at EL2, are not included.

This patchset can be fetched from [1] and the relevant QEMU version for
test can be fetched from [2].

The results of 'perf test' can be found from [3][4].
The results of perf_event_tests test suite can be found from [5][6].

Also, I have tested "perf top" in two VMs and host at the same time. It
works well.

Thanks,
Shannon

[1] https://git.linaro.org/people/shannon.zhao/linux-mainline.git  
KVM_ARM64_PMU_v8
[2] https://git.linaro.org/people/shannon.zhao/qemu.git  virtual_PMU
[3] http://people.linaro.org/~shannon.zhao/PMU/perf-test-host.txt
[4] http://people.linaro.org/~shannon.zhao/PMU/perf-test-guest.txt
[5] http://people.linaro.org/~shannon.zhao/PMU/perf_event_tests-host.txt
[6] http://people.linaro.org/~shannon.zhao/PMU/perf_event_tests-guest.txt

Changes since v7:
* Rebase on kvm-arm next
* Fix the handler of PMUSERENR and add a helper to forward trap to guest
  EL1
* Fix some small bugs found by Marc

Changes since v6:
* Rebase on v4.4-rc5
* Drop access_pmu_cp15_regs() so that the same handler can be used for both
  AArch64 and AArch32. This also drops the definitions of the CP15 register
  offsets and avoids adding the same code twice
* Use vcpu_sys_reg() when accessing PMU registers to avoid endian things
* Add handler for PMUSERENR and some checkers for other registers
* Add kvm_arm_pmu_get_attr()

Changes since v5:
* Rebase on new linux kernel mainline
* Remove state duplications and drop PMOVSCLR, PMCNTENCLR, PMINTENCLR,
  PMXEVCNTR, PMXEVTYPER
* Add a helper to check if vPMU is already initialized
* remove kvm_vcpu from kvm_pmc

Changes since v4:
* Rebase on new linux kernel mainline 
* Drop the reset handler of CP15 registers
* Fix a compile failure on arch ARM due to lack of asm/pmu.h
* Refactor the interrupt injecting flow according to Marc's suggestion
* Check the value of PMSELR register
* Calculate the attr.disabled according to PMCR.E and PMCNTENSET/CLR
* Fix some coding style
* Document the vPMU irq 

[PATCH v8 01/20] ARM64: Move PMU register related defines to asm/pmu.h

2015-12-22 Thread Shannon Zhao
From: Shannon Zhao 

To use the ARMv8 PMU related register defines from the KVM code,
we move the relevant definitions to asm/pmu.h header file.

Signed-off-by: Anup Patel 
Signed-off-by: Shannon Zhao 
---
 arch/arm64/include/asm/pmu.h   | 67 ++
 arch/arm64/kernel/perf_event.c | 36 +--
 2 files changed, 68 insertions(+), 35 deletions(-)
 create mode 100644 arch/arm64/include/asm/pmu.h

diff --git a/arch/arm64/include/asm/pmu.h b/arch/arm64/include/asm/pmu.h
new file mode 100644
index 000..4406184
--- /dev/null
+++ b/arch/arm64/include/asm/pmu.h
@@ -0,0 +1,67 @@
+/*
+ * PMU support
+ *
+ * Copyright (C) 2012 ARM Limited
+ * Author: Will Deacon 
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program.  If not, see .
+ */
+#ifndef __ASM_PMU_H
+#define __ASM_PMU_H
+
+#define ARMV8_MAX_COUNTERS  32
+#define ARMV8_COUNTER_MASK  (ARMV8_MAX_COUNTERS - 1)
+
+/*
+ * Per-CPU PMCR: config reg
+ */
+#define ARMV8_PMCR_E   (1 << 0) /* Enable all counters */
+#define ARMV8_PMCR_P   (1 << 1) /* Reset all counters */
+#define ARMV8_PMCR_C   (1 << 2) /* Cycle counter reset */
+#define ARMV8_PMCR_D   (1 << 3) /* CCNT counts every 64th cpu cycle */
+#define ARMV8_PMCR_X   (1 << 4) /* Export to ETM */
+#define ARMV8_PMCR_DP  (1 << 5) /* Disable CCNT if non-invasive debug*/
+#defineARMV8_PMCR_N_SHIFT  11   /* Number of counters 
supported */
+#defineARMV8_PMCR_N_MASK   0x1f
+#defineARMV8_PMCR_MASK 0x3f /* Mask for writable bits */
+
+/*
+ * PMCNTEN: counters enable reg
+ */
+#defineARMV8_CNTEN_MASK0x  /* Mask for writable 
bits */
+
+/*
+ * PMINTEN: counters interrupt enable reg
+ */
+#defineARMV8_INTEN_MASK0x  /* Mask for writable 
bits */
+
+/*
+ * PMOVSR: counters overflow flag status reg
+ */
+#defineARMV8_OVSR_MASK 0x  /* Mask for writable 
bits */
+#defineARMV8_OVERFLOWED_MASK   ARMV8_OVSR_MASK
+
+/*
+ * PMXEVTYPER: Event selection reg
+ */
+#defineARMV8_EVTYPE_MASK   0xc80003ff  /* Mask for writable 
bits */
+#defineARMV8_EVTYPE_EVENT  0x3ff   /* Mask for EVENT bits 
*/
+
+/*
+ * Event filters for PMUv3
+ */
+#defineARMV8_EXCLUDE_EL1   (1 << 31)
+#defineARMV8_EXCLUDE_EL0   (1 << 30)
+#defineARMV8_INCLUDE_EL2   (1 << 27)
+
+#endif /* __ASM_PMU_H */
diff --git a/arch/arm64/kernel/perf_event.c b/arch/arm64/kernel/perf_event.c
index 5b1897e..7eca5dc 100644
--- a/arch/arm64/kernel/perf_event.c
+++ b/arch/arm64/kernel/perf_event.c
@@ -24,6 +24,7 @@
 #include 
 #include 
 #include 
+#include 
 
 /*
  * ARMv8 PMUv3 Performance Events handling code.
@@ -187,9 +188,6 @@ static const unsigned 
armv8_a57_perf_cache_map[PERF_COUNT_HW_CACHE_MAX]
 #defineARMV8_IDX_COUNTER_LAST(cpu_pmu) \
(ARMV8_IDX_CYCLE_COUNTER + cpu_pmu->num_events - 1)
 
-#defineARMV8_MAX_COUNTERS  32
-#defineARMV8_COUNTER_MASK  (ARMV8_MAX_COUNTERS - 1)
-
 /*
  * ARMv8 low level PMU access
  */
@@ -200,38 +198,6 @@ static const unsigned 
armv8_a57_perf_cache_map[PERF_COUNT_HW_CACHE_MAX]
 #defineARMV8_IDX_TO_COUNTER(x) \
(((x) - ARMV8_IDX_COUNTER0) & ARMV8_COUNTER_MASK)
 
-/*
- * Per-CPU PMCR: config reg
- */
-#define ARMV8_PMCR_E   (1 << 0) /* Enable all counters */
-#define ARMV8_PMCR_P   (1 << 1) /* Reset all counters */
-#define ARMV8_PMCR_C   (1 << 2) /* Cycle counter reset */
-#define ARMV8_PMCR_D   (1 << 3) /* CCNT counts every 64th cpu cycle */
-#define ARMV8_PMCR_X   (1 << 4) /* Export to ETM */
-#define ARMV8_PMCR_DP  (1 << 5) /* Disable CCNT if non-invasive debug*/
-#defineARMV8_PMCR_N_SHIFT  11   /* Number of counters 
supported */
-#defineARMV8_PMCR_N_MASK   0x1f
-#defineARMV8_PMCR_MASK 0x3f /* Mask for writable bits */
-
-/*
- * PMOVSR: counters overflow flag status reg
- */
-#defineARMV8_OVSR_MASK 0x  /* Mask for writable 
bits */
-#defineARMV8_OVERFLOWED_MASK   ARMV8_OVSR_MASK
-
-/*
- * PMXEVTYPER: Event selection reg
- */
-#defineARMV8_EVTYPE_MASK   0xc80003ff  /* Mask for writable 
bits */
-#define

[PATCH v8 18/20] KVM: ARM64: Reset PMU state when resetting vcpu

2015-12-22 Thread Shannon Zhao
From: Shannon Zhao 

When resetting the vcpu, reset the PMU state to its initial status.

Signed-off-by: Shannon Zhao 
---
 arch/arm64/kvm/reset.c |  3 +++
 include/kvm/arm_pmu.h  |  2 ++
 virt/kvm/arm/pmu.c | 17 +
 3 files changed, 22 insertions(+)

diff --git a/arch/arm64/kvm/reset.c b/arch/arm64/kvm/reset.c
index f34745c..dfbce78 100644
--- a/arch/arm64/kvm/reset.c
+++ b/arch/arm64/kvm/reset.c
@@ -120,6 +120,9 @@ int kvm_reset_vcpu(struct kvm_vcpu *vcpu)
/* Reset system registers */
kvm_reset_sys_regs(vcpu);
 
+   /* Reset PMU */
+   kvm_pmu_vcpu_reset(vcpu);
+
/* Reset timer */
return kvm_timer_vcpu_reset(vcpu, cpu_vtimer_irq);
 }
diff --git a/include/kvm/arm_pmu.h b/include/kvm/arm_pmu.h
index 136f4e3..51dd2d1 100644
--- a/include/kvm/arm_pmu.h
+++ b/include/kvm/arm_pmu.h
@@ -37,6 +37,7 @@ struct kvm_pmu {
 
 u64 kvm_pmu_get_counter_value(struct kvm_vcpu *vcpu, u64 select_idx);
 u64 kvm_pmu_valid_counter_mask(struct kvm_vcpu *vcpu);
+void kvm_pmu_vcpu_reset(struct kvm_vcpu *vcpu);
 void kvm_pmu_disable_counter(struct kvm_vcpu *vcpu, u64 val);
 void kvm_pmu_enable_counter(struct kvm_vcpu *vcpu, u64 val);
 void kvm_pmu_overflow_set(struct kvm_vcpu *vcpu, u64 val);
@@ -57,6 +58,7 @@ u64 kvm_pmu_valid_counter_mask(struct kvm_vcpu *vcpu)
 {
return 0;
 }
+void kvm_pmu_vcpu_reset(struct kvm_vcpu *vcpu) {}
 void kvm_pmu_disable_counter(struct kvm_vcpu *vcpu, u64 val) {}
 void kvm_pmu_enable_counter(struct kvm_vcpu *vcpu, u64 val) {}
 void kvm_pmu_overflow_set(struct kvm_vcpu *vcpu, u64 val) {}
diff --git a/virt/kvm/arm/pmu.c b/virt/kvm/arm/pmu.c
index e28df0f..e6aac73 100644
--- a/virt/kvm/arm/pmu.c
+++ b/virt/kvm/arm/pmu.c
@@ -68,6 +68,23 @@ static void kvm_pmu_stop_counter(struct kvm_vcpu *vcpu, 
struct kvm_pmc *pmc)
}
 }
 
+/**
+ * kvm_pmu_vcpu_reset - reset pmu state for cpu
+ * @vcpu: The vcpu pointer
+ *
+ */
+void kvm_pmu_vcpu_reset(struct kvm_vcpu *vcpu)
+{
+   int i;
+   struct kvm_pmu *pmu = &vcpu->arch.pmu;
+
+   for (i = 0; i < ARMV8_MAX_COUNTERS; i++) {
+   kvm_pmu_stop_counter(vcpu, &pmu->pmc[i]);
+   pmu->pmc[i].idx = i;
+   pmu->pmc[i].bitmask = 0xUL;
+   }
+}
+
 u64 kvm_pmu_valid_counter_mask(struct kvm_vcpu *vcpu)
 {
u64 val = vcpu_sys_reg(vcpu, PMCR_EL0) >> ARMV8_PMCR_N_SHIFT;
-- 
2.0.4




[PATCH v8 03/20] KVM: ARM64: Add offset defines for PMU registers

2015-12-22 Thread Shannon Zhao
From: Shannon Zhao 

We are about to trap and emulate accesses to each PMU register
individually. This adds the context offsets for the AArch64 PMU
registers.

Signed-off-by: Shannon Zhao 
---
 arch/arm64/include/asm/kvm_host.h | 15 +++
 1 file changed, 15 insertions(+)

diff --git a/arch/arm64/include/asm/kvm_host.h 
b/arch/arm64/include/asm/kvm_host.h
index 6f0241f..6bab7fb 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -115,6 +115,21 @@ enum vcpu_sysreg {
MDSCR_EL1,  /* Monitor Debug System Control Register */
MDCCINT_EL1,/* Monitor Debug Comms Channel Interrupt Enable Reg */
 
+   /* Performance Monitors Registers */
+   PMCR_EL0,   /* Control Register */
+   PMOVSSET_EL0,   /* Overflow Flag Status Set Register */
+   PMSELR_EL0, /* Event Counter Selection Register */
+   PMEVCNTR0_EL0,  /* Event Counter Register (0-30) */
+   PMEVCNTR30_EL0 = PMEVCNTR0_EL0 + 30,
+   PMCCNTR_EL0,/* Cycle Counter Register */
+   PMEVTYPER0_EL0, /* Event Type Register (0-30) */
+   PMEVTYPER30_EL0 = PMEVTYPER0_EL0 + 30,
+   PMCCFILTR_EL0,  /* Cycle Count Filter Register */
+   PMCNTENSET_EL0, /* Count Enable Set Register */
+   PMINTENSET_EL1, /* Interrupt Enable Set Register */
+   PMUSERENR_EL0,  /* User Enable Register */
+   PMSWINC_EL0,/* Software Increment Register */
+
/* 32bit specific registers. Keep them at the end of the range */
DACR32_EL2, /* Domain Access Control Register */
IFSR32_EL2, /* Instruction Fault Status Register */
-- 
2.0.4




[PATCH v8 13/20] KVM: ARM64: Add access handler for PMSWINC register

2015-12-22 Thread Shannon Zhao
From: Shannon Zhao 

Add an access handler which emulates writing and reading the PMSWINC
register and add support for creating the software increment event.

Signed-off-by: Shannon Zhao 
---
 arch/arm64/kvm/sys_regs.c | 18 +-
 include/kvm/arm_pmu.h |  2 ++
 virt/kvm/arm/pmu.c| 33 +
 3 files changed, 52 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index d61f271dd..92021dc 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -682,6 +682,21 @@ static bool access_pmovs(struct kvm_vcpu *vcpu, struct 
sys_reg_params *p,
return true;
 }
 
+static bool access_pmswinc(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
+  const struct sys_reg_desc *r)
+{
+   u64 mask;
+
+   if (p->is_write) {
+   mask = kvm_pmu_valid_counter_mask(vcpu);
+   kvm_pmu_software_increment(vcpu, p->regval & mask);
+   } else {
+   kvm_inject_undefined(vcpu);
+   }
+
+   return true;
+}
+
 /* Silly macro to expand the DBG{BCR,BVR,WVR,WCR}n_EL1 registers in one go */
 #define DBG_BCR_BVR_WCR_WVR_EL1(n) \
/* DBGBVRn_EL1 */   \
@@ -892,7 +907,7 @@ static const struct sys_reg_desc sys_reg_descs[] = {
  access_pmovs, NULL, PMOVSSET_EL0 },
/* PMSWINC_EL0 */
{ Op0(0b11), Op1(0b011), CRn(0b1001), CRm(0b1100), Op2(0b100),
- trap_raz_wi },
+ access_pmswinc, reset_unknown, PMSWINC_EL0 },
/* PMSELR_EL0 */
{ Op0(0b11), Op1(0b011), CRn(0b1001), CRm(0b1100), Op2(0b101),
  access_pmselr, reset_unknown, PMSELR_EL0 },
@@ -1231,6 +1246,7 @@ static const struct sys_reg_desc cp15_regs[] = {
{ Op1( 0), CRn( 9), CRm(12), Op2( 1), access_pmcnten },
{ Op1( 0), CRn( 9), CRm(12), Op2( 2), access_pmcnten },
{ Op1( 0), CRn( 9), CRm(12), Op2( 3), access_pmovs },
+   { Op1( 0), CRn( 9), CRm(12), Op2( 4), access_pmswinc },
{ Op1( 0), CRn( 9), CRm(12), Op2( 5), access_pmselr },
{ Op1( 0), CRn( 9), CRm(12), Op2( 6), access_pmceid },
{ Op1( 0), CRn( 9), CRm(12), Op2( 7), access_pmceid },
diff --git a/include/kvm/arm_pmu.h b/include/kvm/arm_pmu.h
index 244970b..67d168c 100644
--- a/include/kvm/arm_pmu.h
+++ b/include/kvm/arm_pmu.h
@@ -40,6 +40,7 @@ u64 kvm_pmu_valid_counter_mask(struct kvm_vcpu *vcpu);
 void kvm_pmu_disable_counter(struct kvm_vcpu *vcpu, u64 val);
 void kvm_pmu_enable_counter(struct kvm_vcpu *vcpu, u64 val);
 void kvm_pmu_overflow_set(struct kvm_vcpu *vcpu, u64 val);
+void kvm_pmu_software_increment(struct kvm_vcpu *vcpu, u64 val);
 void kvm_pmu_set_counter_event_type(struct kvm_vcpu *vcpu, u64 data,
u64 select_idx);
 #else
@@ -57,6 +58,7 @@ u64 kvm_pmu_valid_counter_mask(struct kvm_vcpu *vcpu)
 void kvm_pmu_disable_counter(struct kvm_vcpu *vcpu, u64 val) {}
 void kvm_pmu_enable_counter(struct kvm_vcpu *vcpu, u64 val) {}
 void kvm_pmu_overflow_set(struct kvm_vcpu *vcpu, u64 val) {}
+void kvm_pmu_software_increment(struct kvm_vcpu *vcpu, u64 val) {}
 void kvm_pmu_set_counter_event_type(struct kvm_vcpu *vcpu, u64 data,
u64 select_idx) {}
 #endif
diff --git a/virt/kvm/arm/pmu.c b/virt/kvm/arm/pmu.c
index c23d57e..409f3c4 100644
--- a/virt/kvm/arm/pmu.c
+++ b/virt/kvm/arm/pmu.c
@@ -160,6 +160,35 @@ void kvm_pmu_overflow_set(struct kvm_vcpu *vcpu, u64 val)
kvm_vcpu_kick(vcpu);
 }
 
+/**
+ * kvm_pmu_software_increment - do software increment
+ * @vcpu: The vcpu pointer
+ * @val: the value guest writes to PMSWINC register
+ */
+void kvm_pmu_software_increment(struct kvm_vcpu *vcpu, u64 val)
+{
+   int i;
+   u64 type, enable, reg;
+
+   if (val == 0)
+   return;
+
+   for (i = 0; i < ARMV8_CYCLE_IDX; i++) {
+   if (!(val & BIT(i)))
+   continue;
+   type = vcpu_sys_reg(vcpu, PMEVTYPER0_EL0 + i)
+  & ARMV8_EVTYPE_EVENT;
+   enable = vcpu_sys_reg(vcpu, PMCNTENSET_EL0);
+   if ((type == 0) && (enable & BIT(i))) {
+   reg = vcpu_sys_reg(vcpu, PMEVCNTR0_EL0 + i) + 1;
+   reg = lower_32_bits(reg);
+   vcpu_sys_reg(vcpu, PMEVCNTR0_EL0 + i) = reg;
+   if (!reg)
+   kvm_pmu_overflow_set(vcpu, BIT(i));
+   }
+   }
+}
+
 static inline bool kvm_pmu_counter_is_enabled(struct kvm_vcpu *vcpu,
  u64 select_idx)
 {
@@ -189,6 +218,10 @@ void kvm_pmu_set_counter_event_type(struct kvm_vcpu *vcpu, 
u64 data,
kvm_pmu_stop_counter(vcpu, pmc);
eventsel = data & ARMV8_EVTYPE_EVENT;
 
+   /* For software increment event it does't need to create perf 
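
A stand-alone toy model of the software-increment behaviour implemented
above (plain arrays instead of vcpu_sys_reg(), and only four counters):

#include <stdint.h>
#include <stdio.h>

#define NR_COUNTERS 4

static uint32_t evcntr[NR_COUNTERS];	/* PMEVCNTRn */
static uint32_t evtyper[NR_COUNTERS];	/* PMEVTYPERn event field; 0 == SW_INCR */
static uint32_t cntenset = 0xf;		/* PMCNTENSET: all counters enabled */
static uint32_t pmovsset;		/* PMOVSSET: overflow flags */

static void pmswinc_write(uint32_t val)
{
	int i;

	for (i = 0; i < NR_COUNTERS; i++) {
		if (!(val & (1u << i)))
			continue;
		if (evtyper[i] == 0 && (cntenset & (1u << i))) {
			if (++evcntr[i] == 0)		/* 32-bit wrap-around */
				pmovsset |= 1u << i;
		}
	}
}

int main(void)
{
	evcntr[1] = 0xffffffff;
	pmswinc_write(0x3);		/* software-increment counters 0 and 1 */
	printf("cntr0=%u cntr1=%u overflow=%#x\n", evcntr[0], evcntr[1], pmovsset);
	return 0;
}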

[PATCH v8 14/20] KVM: ARM64: Add helper to handle PMCR register bits

2015-12-22 Thread Shannon Zhao
From: Shannon Zhao 

According to ARMv8 spec, when writing 1 to PMCR.E, all counters are
enabled by PMCNTENSET, while writing 0 to PMCR.E, all counters are
disabled. When writing 1 to PMCR.P, reset all event counters, not
including PMCCNTR, to zero. When writing 1 to PMCR.C, reset PMCCNTR to
zero.

Signed-off-by: Shannon Zhao 
---
 arch/arm64/kvm/sys_regs.c |  1 +
 include/kvm/arm_pmu.h |  2 ++
 virt/kvm/arm/pmu.c| 42 ++
 3 files changed, 45 insertions(+)

diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index 92021dc..04281f1 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -464,6 +464,7 @@ static bool access_pmcr(struct kvm_vcpu *vcpu, struct 
sys_reg_params *p,
val &= ~ARMV8_PMCR_MASK;
val |= p->regval & ARMV8_PMCR_MASK;
vcpu_sys_reg(vcpu, r->reg) = val;
+   kvm_pmu_handle_pmcr(vcpu, val);
} else {
/* PMCR.P & PMCR.C are RAZ */
val = vcpu_sys_reg(vcpu, r->reg)
diff --git a/include/kvm/arm_pmu.h b/include/kvm/arm_pmu.h
index 67d168c..7ec7706 100644
--- a/include/kvm/arm_pmu.h
+++ b/include/kvm/arm_pmu.h
@@ -41,6 +41,7 @@ void kvm_pmu_disable_counter(struct kvm_vcpu *vcpu, u64 val);
 void kvm_pmu_enable_counter(struct kvm_vcpu *vcpu, u64 val);
 void kvm_pmu_overflow_set(struct kvm_vcpu *vcpu, u64 val);
 void kvm_pmu_software_increment(struct kvm_vcpu *vcpu, u64 val);
+void kvm_pmu_handle_pmcr(struct kvm_vcpu *vcpu, u64 val);
 void kvm_pmu_set_counter_event_type(struct kvm_vcpu *vcpu, u64 data,
u64 select_idx);
 #else
@@ -59,6 +60,7 @@ void kvm_pmu_disable_counter(struct kvm_vcpu *vcpu, u64 val) 
{}
 void kvm_pmu_enable_counter(struct kvm_vcpu *vcpu, u64 val) {}
 void kvm_pmu_overflow_set(struct kvm_vcpu *vcpu, u64 val) {}
 void kvm_pmu_software_increment(struct kvm_vcpu *vcpu, u64 val) {}
+void kvm_pmu_handle_pmcr(struct kvm_vcpu *vcpu, u64 val) {}
 void kvm_pmu_set_counter_event_type(struct kvm_vcpu *vcpu, u64 data,
u64 select_idx) {}
 #endif
diff --git a/virt/kvm/arm/pmu.c b/virt/kvm/arm/pmu.c
index 409f3c4..fda32cb 100644
--- a/virt/kvm/arm/pmu.c
+++ b/virt/kvm/arm/pmu.c
@@ -189,6 +189,48 @@ void kvm_pmu_software_increment(struct kvm_vcpu *vcpu, u64 
val)
}
 }
 
+/**
+ * kvm_pmu_handle_pmcr - handle PMCR register
+ * @vcpu: The vcpu pointer
+ * @val: the value guest writes to PMCR register
+ */
+void kvm_pmu_handle_pmcr(struct kvm_vcpu *vcpu, u64 val)
+{
+   struct kvm_pmu *pmu = &vcpu->arch.pmu;
+   struct kvm_pmc *pmc;
+   u64 mask;
+   int i;
+
+   mask = kvm_pmu_valid_counter_mask(vcpu);
+   if (val & ARMV8_PMCR_E) {
+   kvm_pmu_enable_counter(vcpu,
+vcpu_sys_reg(vcpu, PMCNTENSET_EL0) & mask);
+   } else {
+   kvm_pmu_disable_counter(vcpu, mask);
+   }
+
+   if (val & ARMV8_PMCR_C) {
+   pmc = &pmu->pmc[ARMV8_CYCLE_IDX];
+   if (pmc->perf_event)
+   local64_set(&pmc->perf_event->count, 0);
+   vcpu_sys_reg(vcpu, PMCCNTR_EL0) = 0;
+   }
+
+   if (val & ARMV8_PMCR_P) {
+   for (i = 0; i < ARMV8_CYCLE_IDX; i++) {
+   pmc = &pmu->pmc[i];
+   if (pmc->perf_event)
+   local64_set(&pmc->perf_event->count, 0);
+   vcpu_sys_reg(vcpu, PMEVCNTR0_EL0 + i) = 0;
+   }
+   }
+
+   if (val & ARMV8_PMCR_LC) {
+   pmc = &pmu->pmc[ARMV8_CYCLE_IDX];
+   pmc->bitmask = 0xUL;
+   }
+}
+
 static inline bool kvm_pmu_counter_is_enabled(struct kvm_vcpu *vcpu,
  u64 select_idx)
 {
-- 
2.0.4




[PATCH v8 02/20] KVM: ARM64: Define PMU data structure for each vcpu

2015-12-22 Thread Shannon Zhao
From: Shannon Zhao 

Here we plan to support a virtual PMU for the guest by full software
emulation, so define some basic structs and functions in preparation for
further steps. Define struct kvm_pmc for a performance monitor counter and
struct kvm_pmu for the performance monitor unit of each vcpu. According to
the ARMv8 spec, the PMU contains at most 32 (ARMV8_MAX_COUNTERS) counters.

Since this only supports ARM64 (or PMUv3), add a separate config symbol
for it.

Signed-off-by: Shannon Zhao 
---
 arch/arm64/include/asm/kvm_host.h |  2 ++
 arch/arm64/kvm/Kconfig|  7 +++
 include/kvm/arm_pmu.h | 42 +++
 3 files changed, 51 insertions(+)
 create mode 100644 include/kvm/arm_pmu.h

diff --git a/arch/arm64/include/asm/kvm_host.h 
b/arch/arm64/include/asm/kvm_host.h
index 689d4c9..6f0241f 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -36,6 +36,7 @@
 
 #include 
 #include 
+#include 
 
 #define KVM_MAX_VCPUS VGIC_V3_MAX_CPUS
 
@@ -211,6 +212,7 @@ struct kvm_vcpu_arch {
/* VGIC state */
struct vgic_cpu vgic_cpu;
struct arch_timer_cpu timer_cpu;
+   struct kvm_pmu pmu;
 
/*
 * Anything that is not used directly from assembly code goes
diff --git a/arch/arm64/kvm/Kconfig b/arch/arm64/kvm/Kconfig
index a5272c0..de7450d 100644
--- a/arch/arm64/kvm/Kconfig
+++ b/arch/arm64/kvm/Kconfig
@@ -36,6 +36,7 @@ config KVM
select HAVE_KVM_EVENTFD
select HAVE_KVM_IRQFD
select KVM_ARM_VGIC_V3
+   select KVM_ARM_PMU if HW_PERF_EVENTS
---help---
  Support hosting virtualized guest machines.
  We don't support KVM with 16K page tables yet, due to the multiple
@@ -48,6 +49,12 @@ config KVM_ARM_HOST
---help---
  Provides host support for ARM processors.
 
+config KVM_ARM_PMU
+   bool
+   ---help---
+ Adds support for a virtual Performance Monitoring Unit (PMU) in
+ virtual machines.
+
 source drivers/vhost/Kconfig
 
 endif # VIRTUALIZATION
diff --git a/include/kvm/arm_pmu.h b/include/kvm/arm_pmu.h
new file mode 100644
index 000..ddcb5b2
--- /dev/null
+++ b/include/kvm/arm_pmu.h
@@ -0,0 +1,42 @@
+/*
+ * Copyright (C) 2015 Linaro Ltd.
+ * Author: Shannon Zhao 
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program.  If not, see .
+ */
+
+#ifndef __ASM_ARM_KVM_PMU_H
+#define __ASM_ARM_KVM_PMU_H
+
+#ifdef CONFIG_KVM_ARM_PMU
+
+#include 
+#include 
+
+struct kvm_pmc {
+   u8 idx;/* index into the pmu->pmc array */
+   struct perf_event *perf_event;
+   u64 bitmask;
+};
+
+struct kvm_pmu {
+   /* PMU IRQ Number per VCPU */
+   int irq_num;
+   struct kvm_pmc pmc[ARMV8_MAX_COUNTERS];
+};
+#else
+struct kvm_pmu {
+};
+#endif
+
+#endif
-- 
2.0.4




[PATCH v8 20/20] KVM: ARM64: Add a new kvm ARM PMU device

2015-12-22 Thread Shannon Zhao
From: Shannon Zhao 

Add a new kvm device type KVM_DEV_TYPE_ARM_PMU_V3 for ARM PMU. Implement
the kvm_device_ops for it.

Signed-off-by: Shannon Zhao 
---
 Documentation/virtual/kvm/devices/arm-pmu.txt |  24 +
 arch/arm64/include/uapi/asm/kvm.h |   4 +
 include/linux/kvm_host.h  |   1 +
 include/uapi/linux/kvm.h  |   2 +
 virt/kvm/arm/pmu.c| 128 ++
 virt/kvm/kvm_main.c   |   4 +
 6 files changed, 163 insertions(+)
 create mode 100644 Documentation/virtual/kvm/devices/arm-pmu.txt

diff --git a/Documentation/virtual/kvm/devices/arm-pmu.txt 
b/Documentation/virtual/kvm/devices/arm-pmu.txt
new file mode 100644
index 000..dda864e
--- /dev/null
+++ b/Documentation/virtual/kvm/devices/arm-pmu.txt
@@ -0,0 +1,24 @@
+ARM Virtual Performance Monitor Unit (vPMU)
+===
+
+Device types supported:
+  KVM_DEV_TYPE_ARM_PMU_V3 ARM Performance Monitor Unit v3
+
+Instantiate one PMU instance per VCPU through this API.
+
+Groups:
+  KVM_DEV_ARM_PMU_GRP_IRQ
+  Attributes:
+The attr field of kvm_device_attr encodes one value:
+bits: | 63  32 | 31   0 |
+values:   |  reserved  | vcpu_index |
+A value describing the PMU overflow interrupt number for the specified
+vcpu_index vcpu. This interrupt could be a PPI or SPI, but for one VM the
+interrupt type must be the same for each vcpu. As a PPI, the interrupt number is
+the same for all vcpus, while as an SPI it must be different for each vcpu.
+
+  Errors:
+-ENXIO: Unsupported attribute group
+-EBUSY: The PMU overflow interrupt is already set
+-ENODEV: Getting the PMU overflow interrupt number while it's not set
+-EINVAL: Invalid vcpu_index or PMU overflow interrupt number supplied
diff --git a/arch/arm64/include/uapi/asm/kvm.h 
b/arch/arm64/include/uapi/asm/kvm.h
index 2d4ca4b..cbb9022 100644
--- a/arch/arm64/include/uapi/asm/kvm.h
+++ b/arch/arm64/include/uapi/asm/kvm.h
@@ -204,6 +204,10 @@ struct kvm_arch_memory_slot {
 #define KVM_DEV_ARM_VGIC_GRP_CTRL  4
 #define   KVM_DEV_ARM_VGIC_CTRL_INIT   0
 
+/* Device Control API: ARM PMU */
+#define KVM_DEV_ARM_PMU_GRP_IRQ0
+#define   KVM_DEV_ARM_PMU_CPUID_MASK   0xULL
+
 /* KVM_IRQ_LINE irq field index values */
 #define KVM_ARM_IRQ_TYPE_SHIFT 24
 #define KVM_ARM_IRQ_TYPE_MASK  0xff
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index c923350..608dea6 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -1161,6 +1161,7 @@ extern struct kvm_device_ops kvm_mpic_ops;
 extern struct kvm_device_ops kvm_xics_ops;
 extern struct kvm_device_ops kvm_arm_vgic_v2_ops;
 extern struct kvm_device_ops kvm_arm_vgic_v3_ops;
+extern struct kvm_device_ops kvm_arm_pmu_ops;
 
 #ifdef CONFIG_HAVE_KVM_CPU_RELAX_INTERCEPT
 
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 03f3618..4ba6fdd 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -1032,6 +1032,8 @@ enum kvm_device_type {
 #define KVM_DEV_TYPE_FLIC  KVM_DEV_TYPE_FLIC
KVM_DEV_TYPE_ARM_VGIC_V3,
 #define KVM_DEV_TYPE_ARM_VGIC_V3   KVM_DEV_TYPE_ARM_VGIC_V3
+   KVM_DEV_TYPE_ARM_PMU_V3,
+#defineKVM_DEV_TYPE_ARM_PMU_V3 KVM_DEV_TYPE_ARM_PMU_V3
KVM_DEV_TYPE_MAX,
 };
 
diff --git a/virt/kvm/arm/pmu.c b/virt/kvm/arm/pmu.c
index 3ec3cdd..5518308 100644
--- a/virt/kvm/arm/pmu.c
+++ b/virt/kvm/arm/pmu.c
@@ -19,6 +19,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -374,3 +375,130 @@ void kvm_pmu_set_counter_event_type(struct kvm_vcpu 
*vcpu, u64 data,
 
pmc->perf_event = event;
 }
+
+static inline bool kvm_arm_pmu_initialized(struct kvm_vcpu *vcpu)
+{
+   return vcpu->arch.pmu.irq_num != -1;
+}
+
+static int kvm_arm_pmu_irq_access(struct kvm *kvm, struct kvm_device_attr 
*attr,
+ int *irq, bool is_set)
+{
+   int cpuid;
+   struct kvm_vcpu *vcpu;
+   struct kvm_pmu *pmu;
+
+   cpuid = attr->attr & KVM_DEV_ARM_PMU_CPUID_MASK;
+   if (cpuid >= atomic_read(&kvm->online_vcpus))
+   return -EINVAL;
+
+   vcpu = kvm_get_vcpu(kvm, cpuid);
+   if (!vcpu)
+   return -EINVAL;
+
+   pmu = &vcpu->arch.pmu;
+   if (!is_set) {
+   if (!kvm_arm_pmu_initialized(vcpu))
+   return -ENODEV;
+
+   *irq = pmu->irq_num;
+   } else {
+   if (kvm_arm_pmu_initialized(vcpu))
+   return -EBUSY;
+
+   kvm_debug("Set kvm ARM PMU irq: %d\n", *irq);
+   pmu->irq_num = *irq;
+   }
+
+   return 0;
+}
+
+static int kvm_arm_pmu_create(struct kvm_device *dev, u32 type)
+{
+   int i;
+   struct kvm_vcpu *vcpu;
+   struct kvm *kvm = dev->kvm;
+
+ 
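
A hypothetical userspace sketch of driving the new device. It assumes
headers from a kernel with this series applied, and it assumes (by analogy
with other KVM devices, since the set_attr path is truncated above) that
attr.addr points at an int holding the IRQ number:

#include <linux/kvm.h>
#include <stdint.h>
#include <sys/ioctl.h>

static int set_pmu_overflow_irq(int vm_fd, uint32_t vcpu_index, int irq)
{
	struct kvm_create_device cd = { .type = KVM_DEV_TYPE_ARM_PMU_V3 };
	struct kvm_device_attr attr = { 0 };

	if (ioctl(vm_fd, KVM_CREATE_DEVICE, &cd) < 0)
		return -1;

	attr.group = KVM_DEV_ARM_PMU_GRP_IRQ;
	attr.attr  = vcpu_index;			/* bits 31:0 = vcpu_index */
	attr.addr  = (uint64_t)(unsigned long)&irq;	/* assumed: pointer to irq */

	return ioctl(cd.fd, KVM_SET_DEVICE_ATTR, &attr);
}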

[PATCH v8 05/20] KVM: ARM64: Add access handler for PMSELR register

2015-12-22 Thread Shannon Zhao
From: Shannon Zhao 

Since the reset value of PMSELR_EL0 is UNKNOWN, use reset_unknown for
its reset handler. When reading PMSELR, return the PMSELR.SEL field to
guest.

Signed-off-by: Shannon Zhao 
---
 arch/arm64/kvm/sys_regs.c | 16 ++--
 1 file changed, 14 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index c60047e..f9985fc 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -474,6 +474,18 @@ static bool access_pmcr(struct kvm_vcpu *vcpu, struct 
sys_reg_params *p,
return true;
 }
 
+static bool access_pmselr(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
+ const struct sys_reg_desc *r)
+{
+   if (p->is_write)
+   vcpu_sys_reg(vcpu, r->reg) = p->regval;
+   else
+   /* return PMSELR.SEL field */
+   p->regval = vcpu_sys_reg(vcpu, r->reg) & ARMV8_COUNTER_MASK;
+
+   return true;
+}
+
 /* Silly macro to expand the DBG{BCR,BVR,WVR,WCR}n_EL1 registers in one go */
 #define DBG_BCR_BVR_WCR_WVR_EL1(n) \
/* DBGBVRn_EL1 */   \
@@ -673,7 +685,7 @@ static const struct sys_reg_desc sys_reg_descs[] = {
  trap_raz_wi },
/* PMSELR_EL0 */
{ Op0(0b11), Op1(0b011), CRn(0b1001), CRm(0b1100), Op2(0b101),
- trap_raz_wi },
+ access_pmselr, reset_unknown, PMSELR_EL0 },
/* PMCEID0_EL0 */
{ Op0(0b11), Op1(0b011), CRn(0b1001), CRm(0b1100), Op2(0b110),
  trap_raz_wi },
@@ -924,7 +936,7 @@ static const struct sys_reg_desc cp15_regs[] = {
{ Op1( 0), CRn( 9), CRm(12), Op2( 1), trap_raz_wi },
{ Op1( 0), CRn( 9), CRm(12), Op2( 2), trap_raz_wi },
{ Op1( 0), CRn( 9), CRm(12), Op2( 3), trap_raz_wi },
-   { Op1( 0), CRn( 9), CRm(12), Op2( 5), trap_raz_wi },
+   { Op1( 0), CRn( 9), CRm(12), Op2( 5), access_pmselr },
{ Op1( 0), CRn( 9), CRm(12), Op2( 6), trap_raz_wi },
{ Op1( 0), CRn( 9), CRm(12), Op2( 7), trap_raz_wi },
{ Op1( 0), CRn( 9), CRm(13), Op2( 0), trap_raz_wi },
-- 
2.0.4




[PATCH v8 09/20] KVM: ARM64: Add access handler for event counter register

2015-12-22 Thread Shannon Zhao
From: Shannon Zhao 

These kinds of registers include PMEVCNTRn, PMCCNTR and PMXEVCNTR, which
is mapped to PMEVCNTRn.

The access handler translates all AArch32 register offsets to AArch64
ones and uses vcpu_sys_reg() to access their values, to avoid having to
deal with big-endian issues.

When reading these registers, return the sum of the register value and the
value the perf event has counted.
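
The write path therefore stores only a delta in the emulated register, so
that reads can return that delta plus whatever the backing perf event has
counted since. A stand-alone toy model of that bookkeeping (not kernel
code):

#include <stdint.h>
#include <stdio.h>

static uint64_t perf_count;	/* what the backing perf event has counted */
static uint64_t reg;		/* the vcpu_sys_reg() storage */

static uint64_t guest_read(void)
{
	return reg + perf_count;
}

static void guest_write(uint64_t regval)
{
	/* Store the difference so that an immediate read returns regval. */
	reg += (int64_t)regval - (int64_t)guest_read();
}

int main(void)
{
	perf_count = 1000;
	guest_write(42);	/* guest sets the counter to 42 */
	perf_count += 500;	/* the event keeps counting afterwards */
	printf("guest reads %llu\n",
	       (unsigned long long)guest_read());	/* 542 */
	return 0;
}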

Signed-off-by: Shannon Zhao 
---
 arch/arm64/kvm/sys_regs.c | 138 --
 1 file changed, 134 insertions(+), 4 deletions(-)

diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index ed2939b..1818947 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -569,6 +569,57 @@ static bool access_pmu_evtyper(struct kvm_vcpu *vcpu, 
struct sys_reg_params *p,
return true;
 }
 
+static bool access_pmu_evcntr(struct kvm_vcpu *vcpu,
+ struct sys_reg_params *p,
+ const struct sys_reg_desc *r)
+{
+   u64 idx, reg, val;
+
+   if (!p->is_aarch32) {
+   if (r->CRn == 9 && r->CRm == 13 && r->Op2 == 2)
+   /* PMXEVCNTR_EL0 */
+   reg = 0;
+   else
+   /* PMEVCNTRn_EL0 or PMCCNTR_EL0 */
+   reg = r->reg;
+   } else {
+   if (r->CRn == 9 && r->CRm == 13) {
+   reg = (r->Op2 & 2) ? 0 : PMCCNTR_EL0;
+   } else {
+   reg = ((r->CRm & 3) << 3) | (r->Op2 & 7);
+   reg += PMEVCNTR0_EL0;
+   }
+   }
+
+   switch (reg) {
+   case PMEVCNTR0_EL0 ... PMEVCNTR30_EL0:
+   idx = reg - PMEVCNTR0_EL0;
+   if (!pmu_counter_idx_valid(vcpu, idx))
+   return true;
+   break;
+   case PMCCNTR_EL0:
+   idx = ARMV8_CYCLE_IDX;
+   break;
+   default:
+   /* PMXEVCNTR_EL0 */
+   idx = vcpu_sys_reg(vcpu, PMSELR_EL0) & ARMV8_COUNTER_MASK;
+   if (!pmu_counter_idx_valid(vcpu, idx))
+   return true;
+
+   reg = (idx == ARMV8_CYCLE_IDX) ? PMCCNTR_EL0
+: PMEVCNTR0_EL0 + idx;
+   break;
+   }
+
+   val = kvm_pmu_get_counter_value(vcpu, idx);
+   if (p->is_write)
+   vcpu_sys_reg(vcpu, reg) += (s64)p->regval - val;
+   else
+   p->regval = val;
+
+   return true;
+}
+
 /* Silly macro to expand the DBG{BCR,BVR,WVR,WCR}n_EL1 registers in one go */
 #define DBG_BCR_BVR_WCR_WVR_EL1(n) \
/* DBGBVRn_EL1 */   \
@@ -584,6 +635,13 @@ static bool access_pmu_evtyper(struct kvm_vcpu *vcpu, 
struct sys_reg_params *p,
{ Op0(0b10), Op1(0b000), CRn(0b), CRm((n)), Op2(0b111), \
  trap_wcr, reset_wcr, n, 0,  get_wcr, set_wcr }
 
+/* Macro to expand the PMEVCNTRn_EL0 register */
+#define PMU_PMEVCNTR_EL0(n)\
+   /* PMEVCNTRn_EL0 */ \
+   { Op0(0b11), Op1(0b011), CRn(0b1110),   \
+ CRm((0b1000 | (((n) >> 3) & 0x3))), Op2(((n) & 0x7)), \
+ access_pmu_evcntr, reset_unknown, (PMEVCNTR0_EL0 + n), }
+
 /* Macro to expand the PMEVTYPERn_EL0 register */
 #define PMU_PMEVTYPER_EL0(n)   \
/* PMEVTYPERn_EL0 */\
@@ -784,13 +842,13 @@ static const struct sys_reg_desc sys_reg_descs[] = {
  access_pmceid },
/* PMCCNTR_EL0 */
{ Op0(0b11), Op1(0b011), CRn(0b1001), CRm(0b1101), Op2(0b000),
- trap_raz_wi },
+ access_pmu_evcntr, reset_unknown, PMCCNTR_EL0 },
/* PMXEVTYPER_EL0 */
{ Op0(0b11), Op1(0b011), CRn(0b1001), CRm(0b1101), Op2(0b001),
  access_pmu_evtyper },
/* PMXEVCNTR_EL0 */
{ Op0(0b11), Op1(0b011), CRn(0b1001), CRm(0b1101), Op2(0b010),
- trap_raz_wi },
+ access_pmu_evcntr },
/* PMUSERENR_EL0 */
{ Op0(0b11), Op1(0b011), CRn(0b1001), CRm(0b1110), Op2(0b000),
  trap_raz_wi },
@@ -805,6 +863,38 @@ static const struct sys_reg_desc sys_reg_descs[] = {
{ Op0(0b11), Op1(0b011), CRn(0b1101), CRm(0b), Op2(0b011),
  NULL, reset_unknown, TPIDRRO_EL0 },
 
+   /* PMEVCNTRn_EL0 */
+   PMU_PMEVCNTR_EL0(0),
+   PMU_PMEVCNTR_EL0(1),
+   PMU_PMEVCNTR_EL0(2),
+   PMU_PMEVCNTR_EL0(3),
+   PMU_PMEVCNTR_EL0(4),
+   PMU_PMEVCNTR_EL0(5),
+   PMU_PMEVCNTR_EL0(6),
+   PMU_PMEVCNTR_EL0(7),
+   PMU_PMEVCNTR_EL0(8),
+   PMU_PMEVCNTR_EL0(9),
+   PMU_PMEVCNTR_EL0(10),
+   PMU_PMEVCNTR_EL0(11),
+   PMU_PMEVCNTR_EL0(12),
+   PMU_PMEVCNTR_EL0(13),
+   

[PATCH v8 19/20] KVM: ARM64: Free perf event of PMU when destroying vcpu

2015-12-22 Thread Shannon Zhao
From: Shannon Zhao 

When KVM frees a VCPU, it also needs to free the perf_events of its PMU.

Signed-off-by: Shannon Zhao 
---
 arch/arm/kvm/arm.c|  1 +
 include/kvm/arm_pmu.h |  2 ++
 virt/kvm/arm/pmu.c| 21 +
 3 files changed, 24 insertions(+)

diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
index f54264c..d2c2cc3 100644
--- a/arch/arm/kvm/arm.c
+++ b/arch/arm/kvm/arm.c
@@ -266,6 +266,7 @@ void kvm_arch_vcpu_free(struct kvm_vcpu *vcpu)
kvm_mmu_free_memory_caches(vcpu);
kvm_timer_vcpu_terminate(vcpu);
kvm_vgic_vcpu_destroy(vcpu);
+   kvm_pmu_vcpu_destroy(vcpu);
kmem_cache_free(kvm_vcpu_cache, vcpu);
 }
 
diff --git a/include/kvm/arm_pmu.h b/include/kvm/arm_pmu.h
index 51dd2d1..bd49cde 100644
--- a/include/kvm/arm_pmu.h
+++ b/include/kvm/arm_pmu.h
@@ -38,6 +38,7 @@ struct kvm_pmu {
 u64 kvm_pmu_get_counter_value(struct kvm_vcpu *vcpu, u64 select_idx);
 u64 kvm_pmu_valid_counter_mask(struct kvm_vcpu *vcpu);
 void kvm_pmu_vcpu_reset(struct kvm_vcpu *vcpu);
+void kvm_pmu_vcpu_destroy(struct kvm_vcpu *vcpu);
 void kvm_pmu_disable_counter(struct kvm_vcpu *vcpu, u64 val);
 void kvm_pmu_enable_counter(struct kvm_vcpu *vcpu, u64 val);
 void kvm_pmu_overflow_set(struct kvm_vcpu *vcpu, u64 val);
@@ -59,6 +60,7 @@ u64 kvm_pmu_valid_counter_mask(struct kvm_vcpu *vcpu)
return 0;
 }
 void kvm_pmu_vcpu_reset(struct kvm_vcpu *vcpu) {}
+void kvm_pmu_vcpu_destroy(struct kvm_vcpu *vcpu) {}
 void kvm_pmu_disable_counter(struct kvm_vcpu *vcpu, u64 val) {}
 void kvm_pmu_enable_counter(struct kvm_vcpu *vcpu, u64 val) {}
 void kvm_pmu_overflow_set(struct kvm_vcpu *vcpu, u64 val) {}
diff --git a/virt/kvm/arm/pmu.c b/virt/kvm/arm/pmu.c
index e6aac73..3ec3cdd 100644
--- a/virt/kvm/arm/pmu.c
+++ b/virt/kvm/arm/pmu.c
@@ -85,6 +85,27 @@ void kvm_pmu_vcpu_reset(struct kvm_vcpu *vcpu)
}
 }
 
+/**
+ * kvm_pmu_vcpu_destroy - free perf events of PMU for this vcpu
+ * @vcpu: The vcpu pointer
+ *
+ */
+void kvm_pmu_vcpu_destroy(struct kvm_vcpu *vcpu)
+{
+   int i;
+   struct kvm_pmu *pmu = &vcpu->arch.pmu;
+
+   for (i = 0; i < ARMV8_MAX_COUNTERS; i++) {
+   struct kvm_pmc *pmc = &pmu->pmc[i];
+
+   if (pmc->perf_event) {
+   perf_event_disable(pmc->perf_event);
+   perf_event_release_kernel(pmc->perf_event);
+   pmc->perf_event = NULL;
+   }
+   }
+}
+
 u64 kvm_pmu_valid_counter_mask(struct kvm_vcpu *vcpu)
 {
u64 val = vcpu_sys_reg(vcpu, PMCR_EL0) >> ARMV8_PMCR_N_SHIFT;
-- 
2.0.4




[PATCH v8 12/20] KVM: ARM64: Add access handler for PMOVSSET and PMOVSCLR register

2015-12-22 Thread Shannon Zhao
From: Shannon Zhao 

Since the reset value of PMOVSSET and PMOVSCLR is UNKNOWN, use
reset_unknown for their reset handlers. Add a handler to emulate writing
the PMOVSSET or PMOVSCLR register.

When a non-zero value is written to PMOVSSET and the corresponding
counter and its interrupt are enabled, kick this vcpu so that it
synchronizes its PMU interrupt state.
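
The pmu.c side of this is cut off further down in this digest, but the
idea behind kvm_pmu_overflow_set() can be sketched roughly as follows
(an illustration of the expected shape, not the actual patch body):

	void kvm_pmu_overflow_set(struct kvm_vcpu *vcpu, u64 val)
	{
		if (!val)
			return;

		/* record the overflow bits the guest asked to set */
		vcpu_sys_reg(vcpu, PMOVSSET_EL0) |= val;

		/* if an enabled counter now has a pending overflow, make
		 * the vcpu re-evaluate its PMU interrupt */
		if (kvm_pmu_overflow_status(vcpu))
			kvm_vcpu_kick(vcpu);
	}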

Signed-off-by: Shannon Zhao 
---
 arch/arm64/kvm/sys_regs.c | 26 +++---
 include/kvm/arm_pmu.h |  2 ++
 virt/kvm/arm/pmu.c| 30 ++
 3 files changed, 55 insertions(+), 3 deletions(-)

diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index 24ce4fe..d61f271dd 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -663,6 +663,25 @@ static bool access_pminten(struct kvm_vcpu *vcpu, struct 
sys_reg_params *p,
return true;
 }
 
+static bool access_pmovs(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
+const struct sys_reg_desc *r)
+{
+   u64 mask = kvm_pmu_valid_counter_mask(vcpu);
+
+   if (p->is_write) {
+   if (r->CRm & 0x2)
+   /* accessing PMOVSSET_EL0 */
+   kvm_pmu_overflow_set(vcpu, p->regval & mask);
+   else
+   /* accessing PMOVSCLR_EL0 */
+   vcpu_sys_reg(vcpu, r->reg) &= ~(p->regval & mask);
+   } else {
+   p->regval = vcpu_sys_reg(vcpu, r->reg) & mask;
+   }
+
+   return true;
+}
+
 /* Silly macro to expand the DBG{BCR,BVR,WVR,WCR}n_EL1 registers in one go */
 #define DBG_BCR_BVR_WCR_WVR_EL1(n) \
/* DBGBVRn_EL1 */   \
@@ -870,7 +889,7 @@ static const struct sys_reg_desc sys_reg_descs[] = {
  access_pmcnten, NULL, PMCNTENSET_EL0 },
/* PMOVSCLR_EL0 */
{ Op0(0b11), Op1(0b011), CRn(0b1001), CRm(0b1100), Op2(0b011),
- trap_raz_wi },
+ access_pmovs, NULL, PMOVSSET_EL0 },
/* PMSWINC_EL0 */
{ Op0(0b11), Op1(0b011), CRn(0b1001), CRm(0b1100), Op2(0b100),
  trap_raz_wi },
@@ -897,7 +916,7 @@ static const struct sys_reg_desc sys_reg_descs[] = {
  trap_raz_wi },
/* PMOVSSET_EL0 */
{ Op0(0b11), Op1(0b011), CRn(0b1001), CRm(0b1110), Op2(0b011),
- trap_raz_wi },
+ access_pmovs, reset_unknown, PMOVSSET_EL0 },
 
/* TPIDR_EL0 */
{ Op0(0b11), Op1(0b011), CRn(0b1101), CRm(0b), Op2(0b010),
@@ -1211,7 +1230,7 @@ static const struct sys_reg_desc cp15_regs[] = {
{ Op1( 0), CRn( 9), CRm(12), Op2( 0), access_pmcr },
{ Op1( 0), CRn( 9), CRm(12), Op2( 1), access_pmcnten },
{ Op1( 0), CRn( 9), CRm(12), Op2( 2), access_pmcnten },
-   { Op1( 0), CRn( 9), CRm(12), Op2( 3), trap_raz_wi },
+   { Op1( 0), CRn( 9), CRm(12), Op2( 3), access_pmovs },
{ Op1( 0), CRn( 9), CRm(12), Op2( 5), access_pmselr },
{ Op1( 0), CRn( 9), CRm(12), Op2( 6), access_pmceid },
{ Op1( 0), CRn( 9), CRm(12), Op2( 7), access_pmceid },
@@ -1221,6 +1240,7 @@ static const struct sys_reg_desc cp15_regs[] = {
{ Op1( 0), CRn( 9), CRm(14), Op2( 0), trap_raz_wi },
{ Op1( 0), CRn( 9), CRm(14), Op2( 1), access_pminten },
{ Op1( 0), CRn( 9), CRm(14), Op2( 2), access_pminten },
+   { Op1( 0), CRn( 9), CRm(14), Op2( 3), access_pmovs },
 
{ Op1( 0), CRn(10), CRm( 2), Op2( 0), access_vm_reg, NULL, c10_PRRR },
{ Op1( 0), CRn(10), CRm( 2), Op2( 1), access_vm_reg, NULL, c10_NMRR },
diff --git a/include/kvm/arm_pmu.h b/include/kvm/arm_pmu.h
index 9d2d0c0..244970b 100644
--- a/include/kvm/arm_pmu.h
+++ b/include/kvm/arm_pmu.h
@@ -39,6 +39,7 @@ u64 kvm_pmu_get_counter_value(struct kvm_vcpu *vcpu, u64 
select_idx);
 u64 kvm_pmu_valid_counter_mask(struct kvm_vcpu *vcpu);
 void kvm_pmu_disable_counter(struct kvm_vcpu *vcpu, u64 val);
 void kvm_pmu_enable_counter(struct kvm_vcpu *vcpu, u64 val);
+void kvm_pmu_overflow_set(struct kvm_vcpu *vcpu, u64 val);
 void kvm_pmu_set_counter_event_type(struct kvm_vcpu *vcpu, u64 data,
u64 select_idx);
 #else
@@ -55,6 +56,7 @@ u64 kvm_pmu_valid_counter_mask(struct kvm_vcpu *vcpu)
 }
 void kvm_pmu_disable_counter(struct kvm_vcpu *vcpu, u64 val) {}
 void kvm_pmu_enable_counter(struct kvm_vcpu *vcpu, u64 val) {}
+void kvm_pmu_overflow_set(struct kvm_vcpu *vcpu, u64 val) {}
 void kvm_pmu_set_counter_event_type(struct kvm_vcpu *vcpu, u64 data,
u64 select_idx) {}
 #endif
diff --git a/virt/kvm/arm/pmu.c b/virt/kvm/arm/pmu.c
index bc64043..c23d57e 100644
--- a/virt/kvm/arm/pmu.c
+++ b/virt/kvm/arm/pmu.c
@@ -130,6 +130,36 @@ void kvm_pmu_disable_counter(struct kvm_vcpu *vcpu, u64 
val)
}
 }
 
+static inline u64 kvm_pmu_overflow_status(struct kvm_vcpu *vcpu)
+{
+   u64 reg;
+
+   reg = vcpu_sys_reg(vcpu, PMOVSSET_EL0);
+   reg &= vcpu_sys_reg(vcpu, 

[RFC v4 5/5] VSOCK: Add Makefile and Kconfig

2015-12-22 Thread Stefan Hajnoczi
From: Asias He 

Enable virtio-vsock and vhost-vsock.
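
With both pieces enabled, host user space talks to guests through
ordinary AF_VSOCK sockets. A minimal host-side sketch; the guest CID and
port below are made-up example values, not anything defined by this
series:

	#include <sys/socket.h>
	#include <linux/vm_sockets.h>
	#include <unistd.h>

	#ifndef AF_VSOCK
	#define AF_VSOCK 40	/* fallback for older libc headers */
	#endif

	int vsock_hello(void)
	{
		struct sockaddr_vm addr = {
			.svm_family = AF_VSOCK,
			.svm_cid = 3,		/* example guest CID */
			.svm_port = 1234,	/* example port */
		};
		int fd = socket(AF_VSOCK, SOCK_STREAM, 0);

		if (fd < 0)
			return -1;
		if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
			close(fd);
			return -1;
		}
		write(fd, "hello\n", 6);
		close(fd);
		return 0;
	}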

Signed-off-by: Asias He 
Signed-off-by: Stefan Hajnoczi 
---
v4:
 * Make checkpatch.pl happy with longer option description
 * Clarify dependency on virtio rather than QEMU as suggested by Alex
   Bennee
v3:
 * Don't put vhost vsock driver into staging
 * Add missing Kconfig dependencies (Arnd Bergmann )
---
 drivers/vhost/Kconfig  | 15 +++
 drivers/vhost/Makefile |  4 
 net/vmw_vsock/Kconfig  | 19 +++
 net/vmw_vsock/Makefile |  2 ++
 4 files changed, 40 insertions(+)

diff --git a/drivers/vhost/Kconfig b/drivers/vhost/Kconfig
index 533eaf0..d7aae9e 100644
--- a/drivers/vhost/Kconfig
+++ b/drivers/vhost/Kconfig
@@ -21,6 +21,21 @@ config VHOST_SCSI
Say M here to enable the vhost_scsi TCM fabric module
for use with virtio-scsi guests
 
+config VHOST_VSOCK
+   tristate "vhost virtio-vsock driver"
+   depends on VSOCKETS && EVENTFD
+   select VIRTIO_VSOCKETS_COMMON
+   select VHOST
+   select VHOST_RING
+   default n
+   ---help---
+   This kernel module can be loaded in the host kernel to provide AF_VSOCK
+   sockets for communicating with guests.  The guests must have the
+   virtio_transport.ko driver loaded to use the virtio-vsock device.
+
+   To compile this driver as a module, choose M here: the module will be called
+   vhost_vsock.
+
 config VHOST_RING
tristate
---help---
diff --git a/drivers/vhost/Makefile b/drivers/vhost/Makefile
index e0441c3..6b012b9 100644
--- a/drivers/vhost/Makefile
+++ b/drivers/vhost/Makefile
@@ -4,5 +4,9 @@ vhost_net-y := net.o
 obj-$(CONFIG_VHOST_SCSI) += vhost_scsi.o
 vhost_scsi-y := scsi.o
 
+obj-$(CONFIG_VHOST_VSOCK) += vhost_vsock.o
+vhost_vsock-y := vsock.o
+
 obj-$(CONFIG_VHOST_RING) += vringh.o
+
 obj-$(CONFIG_VHOST)+= vhost.o
diff --git a/net/vmw_vsock/Kconfig b/net/vmw_vsock/Kconfig
index 14810ab..f27e74b 100644
--- a/net/vmw_vsock/Kconfig
+++ b/net/vmw_vsock/Kconfig
@@ -26,3 +26,22 @@ config VMWARE_VMCI_VSOCKETS
 
  To compile this driver as a module, choose M here: the module
  will be called vmw_vsock_vmci_transport. If unsure, say N.
+
+config VIRTIO_VSOCKETS
+   tristate "virtio transport for Virtual Sockets"
+   depends on VSOCKETS && VIRTIO
+   select VIRTIO_VSOCKETS_COMMON
+   help
+ This module implements a virtio transport for Virtual Sockets.
+
+ Enable this transport if your Virtual Machine host supports Virtual
+ Sockets over virtio.
+
+ To compile this driver as a module, choose M here: the module
+ will be called virtio_vsock_transport. If unsure, say N.
+
+config VIRTIO_VSOCKETS_COMMON
+   tristate
+   ---help---
+ This option is selected by any driver which needs to access
+ the virtio_vsock.
diff --git a/net/vmw_vsock/Makefile b/net/vmw_vsock/Makefile
index 2ce52d7..cf4c294 100644
--- a/net/vmw_vsock/Makefile
+++ b/net/vmw_vsock/Makefile
@@ -1,5 +1,7 @@
 obj-$(CONFIG_VSOCKETS) += vsock.o
 obj-$(CONFIG_VMWARE_VMCI_VSOCKETS) += vmw_vsock_vmci_transport.o
+obj-$(CONFIG_VIRTIO_VSOCKETS) += virtio_transport.o
+obj-$(CONFIG_VIRTIO_VSOCKETS_COMMON) += virtio_transport_common.o
 
 vsock-y += af_vsock.o vsock_addr.o
 
-- 
2.5.0



[PATCH v8 07/20] KVM: ARM64: PMU: Add perf event map and introduce perf event creating function

2015-12-22 Thread Shannon Zhao
From: Shannon Zhao 

When we use tools like perf on the host, perf passes the event type and
the id within that event type category to the kernel, and the kernel
maps them to a hardware event number which it writes to the PMU
PMEVTYPER_EL0 register. When KVM receives such an event number from the
guest, it can therefore use the raw event type directly to create a
perf_event for it.
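
The function that actually creates the event is truncated further down
in this digest; as an illustration of the idea (a sketch, not the patch
itself), mapping a guest event number onto a perf event can look roughly
like this:

	struct perf_event_attr attr = {
		.type		= PERF_TYPE_RAW,
		.size		= sizeof(attr),
		.pinned		= 1,
		.disabled	= 1,		/* enabled later via PMCNTENSET */
		.exclude_kernel	= 1,		/* privilege filtering from PMEVTYPER bits */
		.config		= eventsel,	/* raw event number from PMEVTYPER */
	};
	struct perf_event *event;

	/* counting event bound to this vcpu thread, no sampling */
	event = perf_event_create_kernel_counter(&attr, -1, current, NULL, pmc);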

Signed-off-by: Shannon Zhao 
---
 arch/arm64/include/asm/pmu.h |   3 ++
 arch/arm64/kvm/Makefile  |   1 +
 include/kvm/arm_pmu.h|  11 
 virt/kvm/arm/pmu.c   | 122 +++
 4 files changed, 137 insertions(+)
 create mode 100644 virt/kvm/arm/pmu.c

diff --git a/arch/arm64/include/asm/pmu.h b/arch/arm64/include/asm/pmu.h
index 4406184..2588f9c 100644
--- a/arch/arm64/include/asm/pmu.h
+++ b/arch/arm64/include/asm/pmu.h
@@ -21,6 +21,7 @@
 
 #define ARMV8_MAX_COUNTERS  32
 #define ARMV8_COUNTER_MASK  (ARMV8_MAX_COUNTERS - 1)
+#define ARMV8_CYCLE_IDX (ARMV8_MAX_COUNTERS - 1)
 
 /*
  * Per-CPU PMCR: config reg
@@ -31,6 +32,8 @@
 #define ARMV8_PMCR_D   (1 << 3) /* CCNT counts every 64th cpu cycle */
 #define ARMV8_PMCR_X   (1 << 4) /* Export to ETM */
 #define ARMV8_PMCR_DP  (1 << 5) /* Disable CCNT if non-invasive debug*/
+/* Determines which PMCCNTR_EL0 bit generates an overflow */
+#define ARMV8_PMCR_LC  (1 << 6)
#define ARMV8_PMCR_N_SHIFT  11   /* Number of counters supported */
#define ARMV8_PMCR_N_MASK   0x1f
#define ARMV8_PMCR_MASK 0x3f /* Mask for writable bits */
diff --git a/arch/arm64/kvm/Makefile b/arch/arm64/kvm/Makefile
index caee9ee..122cff4 100644
--- a/arch/arm64/kvm/Makefile
+++ b/arch/arm64/kvm/Makefile
@@ -26,3 +26,4 @@ kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/vgic-v2-emul.o
 kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/vgic-v3.o
 kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/vgic-v3-emul.o
 kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/arch_timer.o
+kvm-$(CONFIG_KVM_ARM_PMU) += $(KVM)/arm/pmu.o
diff --git a/include/kvm/arm_pmu.h b/include/kvm/arm_pmu.h
index ddcb5b2..14bedb0 100644
--- a/include/kvm/arm_pmu.h
+++ b/include/kvm/arm_pmu.h
@@ -34,9 +34,20 @@ struct kvm_pmu {
int irq_num;
struct kvm_pmc pmc[ARMV8_MAX_COUNTERS];
 };
+
+u64 kvm_pmu_get_counter_value(struct kvm_vcpu *vcpu, u64 select_idx);
+void kvm_pmu_set_counter_event_type(struct kvm_vcpu *vcpu, u64 data,
+   u64 select_idx);
 #else
 struct kvm_pmu {
 };
+
+u64 kvm_pmu_get_counter_value(struct kvm_vcpu *vcpu, u64 select_idx)
+{
+   return 0;
+}
+void kvm_pmu_set_counter_event_type(struct kvm_vcpu *vcpu, u64 data,
+   u64 select_idx) {}
 #endif
 
 #endif
diff --git a/virt/kvm/arm/pmu.c b/virt/kvm/arm/pmu.c
new file mode 100644
index 000..9d27999
--- /dev/null
+++ b/virt/kvm/arm/pmu.c
@@ -0,0 +1,122 @@
+/*
+ * Copyright (C) 2015 Linaro Ltd.
+ * Author: Shannon Zhao 
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program.  If not, see .
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+/**
+ * kvm_pmu_get_counter_value - get PMU counter value
+ * @vcpu: The vcpu pointer
+ * @select_idx: The counter index
+ */
+u64 kvm_pmu_get_counter_value(struct kvm_vcpu *vcpu, u64 select_idx)
+{
+   u64 counter, reg, enabled, running;
+   struct kvm_pmu *pmu = &vcpu->arch.pmu;
+   struct kvm_pmc *pmc = &pmu->pmc[select_idx];
+
+   reg = (select_idx == ARMV8_CYCLE_IDX)
+ ? PMCCNTR_EL0 : PMEVCNTR0_EL0 + select_idx;
+   counter = vcpu_sys_reg(vcpu, reg);
+
+   /* The real counter value is equal to the value of counter register plus
+* the value perf event counts.
+*/
+   if (pmc->perf_event)
+   counter += perf_event_read_value(pmc->perf_event, &enabled,
+&running);
+
+   return counter & pmc->bitmask;
+}
+
+/**
+ * kvm_pmu_stop_counter - stop PMU counter
+ * @pmc: The PMU counter pointer
+ *
+ * If this counter has been configured to monitor some event, release it here.
+ */
+static void kvm_pmu_stop_counter(struct kvm_vcpu *vcpu, struct kvm_pmc *pmc)
+{
+   u64 counter, reg;
+
+   if (pmc->perf_event) {
+   counter = kvm_pmu_get_counter_value(vcpu, pmc->idx);
+   reg = (pmc->idx == ARMV8_CYCLE_IDX)
+  ? PMCCNTR_EL0 : PMEVCNTR0_EL0 + 

[PATCH v8 10/20] KVM: ARM64: Add access handler for PMCNTENSET and PMCNTENCLR register

2015-12-22 Thread Shannon Zhao
From: Shannon Zhao 

Since the reset value of PMCNTENSET and PMCNTENCLR is UNKNOWN, use
reset_unknown for their reset handlers. Add a handler to emulate writing
the PMCNTENSET or PMCNTENCLR register.

When writing to PMCNTENSET, call perf_event_enable to enable the
corresponding perf event; when writing to PMCNTENCLR, call
perf_event_disable to disable it.
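
The bodies of kvm_pmu_enable_counter()/kvm_pmu_disable_counter() are
truncated further down in this digest; the enable side can be sketched
roughly as below (illustration only; it assumes an ARMV8_PMCR_E bit
definition for the global enable bit in PMCR):

	void kvm_pmu_enable_counter(struct kvm_vcpu *vcpu, u64 val)
	{
		int i;
		struct kvm_pmu *pmu = &vcpu->arch.pmu;

		/* nothing to do if the PMU is globally disabled */
		if (!(vcpu_sys_reg(vcpu, PMCR_EL0) & ARMV8_PMCR_E) || !val)
			return;

		for (i = 0; i < ARMV8_MAX_COUNTERS; i++) {
			struct kvm_pmc *pmc = &pmu->pmc[i];

			if (!(val & BIT(i)) || !pmc->perf_event)
				continue;

			perf_event_enable(pmc->perf_event);
		}
	}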

Signed-off-by: Shannon Zhao 
---
 arch/arm64/kvm/sys_regs.c | 32 +---
 include/kvm/arm_pmu.h |  9 +++
 virt/kvm/arm/pmu.c| 63 +++
 3 files changed, 100 insertions(+), 4 deletions(-)

diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index 1818947..3416881 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -620,6 +620,30 @@ static bool access_pmu_evcntr(struct kvm_vcpu *vcpu,
return true;
 }
 
+static bool access_pmcnten(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
+  const struct sys_reg_desc *r)
+{
+   u64 val, mask;
+
+   mask = kvm_pmu_valid_counter_mask(vcpu);
+   if (p->is_write) {
+   val = p->regval & mask;
+   if (r->Op2 & 0x1) {
+   /* accessing PMCNTENSET_EL0 */
+   vcpu_sys_reg(vcpu, r->reg) |= val;
+   kvm_pmu_enable_counter(vcpu, val);
+   } else {
+   /* accessing PMCNTENCLR_EL0 */
+   vcpu_sys_reg(vcpu, r->reg) &= ~val;
+   kvm_pmu_disable_counter(vcpu, val);
+   }
+   } else {
+   p->regval = vcpu_sys_reg(vcpu, r->reg) & mask;
+   }
+
+   return true;
+}
+
 /* Silly macro to expand the DBG{BCR,BVR,WVR,WCR}n_EL1 registers in one go */
 #define DBG_BCR_BVR_WCR_WVR_EL1(n) \
/* DBGBVRn_EL1 */   \
@@ -821,10 +845,10 @@ static const struct sys_reg_desc sys_reg_descs[] = {
  access_pmcr, reset_pmcr, PMCR_EL0, },
/* PMCNTENSET_EL0 */
{ Op0(0b11), Op1(0b011), CRn(0b1001), CRm(0b1100), Op2(0b001),
- trap_raz_wi },
+ access_pmcnten, reset_unknown, PMCNTENSET_EL0 },
/* PMCNTENCLR_EL0 */
{ Op0(0b11), Op1(0b011), CRn(0b1001), CRm(0b1100), Op2(0b010),
- trap_raz_wi },
+ access_pmcnten, NULL, PMCNTENSET_EL0 },
/* PMOVSCLR_EL0 */
{ Op0(0b11), Op1(0b011), CRn(0b1001), CRm(0b1100), Op2(0b011),
  trap_raz_wi },
@@ -1166,8 +1190,8 @@ static const struct sys_reg_desc cp15_regs[] = {
 
/* PMU */
{ Op1( 0), CRn( 9), CRm(12), Op2( 0), access_pmcr },
-   { Op1( 0), CRn( 9), CRm(12), Op2( 1), trap_raz_wi },
-   { Op1( 0), CRn( 9), CRm(12), Op2( 2), trap_raz_wi },
+   { Op1( 0), CRn( 9), CRm(12), Op2( 1), access_pmcnten },
+   { Op1( 0), CRn( 9), CRm(12), Op2( 2), access_pmcnten },
{ Op1( 0), CRn( 9), CRm(12), Op2( 3), trap_raz_wi },
{ Op1( 0), CRn( 9), CRm(12), Op2( 5), access_pmselr },
{ Op1( 0), CRn( 9), CRm(12), Op2( 6), access_pmceid },
diff --git a/include/kvm/arm_pmu.h b/include/kvm/arm_pmu.h
index 14bedb0..9d2d0c0 100644
--- a/include/kvm/arm_pmu.h
+++ b/include/kvm/arm_pmu.h
@@ -36,6 +36,9 @@ struct kvm_pmu {
 };
 
 u64 kvm_pmu_get_counter_value(struct kvm_vcpu *vcpu, u64 select_idx);
+u64 kvm_pmu_valid_counter_mask(struct kvm_vcpu *vcpu);
+void kvm_pmu_disable_counter(struct kvm_vcpu *vcpu, u64 val);
+void kvm_pmu_enable_counter(struct kvm_vcpu *vcpu, u64 val);
 void kvm_pmu_set_counter_event_type(struct kvm_vcpu *vcpu, u64 data,
u64 select_idx);
 #else
@@ -46,6 +49,12 @@ u64 kvm_pmu_get_counter_value(struct kvm_vcpu *vcpu, u64 
select_idx)
 {
return 0;
 }
+u64 kvm_pmu_valid_counter_mask(struct kvm_vcpu *vcpu)
+{
+   return 0;
+}
+void kvm_pmu_disable_counter(struct kvm_vcpu *vcpu, u64 val) {}
+void kvm_pmu_enable_counter(struct kvm_vcpu *vcpu, u64 val) {}
 void kvm_pmu_set_counter_event_type(struct kvm_vcpu *vcpu, u64 data,
u64 select_idx) {}
 #endif
diff --git a/virt/kvm/arm/pmu.c b/virt/kvm/arm/pmu.c
index 9d27999..bc64043 100644
--- a/virt/kvm/arm/pmu.c
+++ b/virt/kvm/arm/pmu.c
@@ -67,6 +67,69 @@ static void kvm_pmu_stop_counter(struct kvm_vcpu *vcpu, 
struct kvm_pmc *pmc)
}
 }
 
+u64 kvm_pmu_valid_counter_mask(struct kvm_vcpu *vcpu)
+{
+   u64 val = vcpu_sys_reg(vcpu, PMCR_EL0) >> ARMV8_PMCR_N_SHIFT;
+
+   val &= ARMV8_PMCR_N_MASK;
+   return GENMASK(val - 1, 0) | BIT(ARMV8_CYCLE_IDX);
+}
+
+/**
+ * kvm_pmu_enable_counter - enable selected PMU counter
+ * @vcpu: The vcpu pointer
+ * @val: the value guest writes to PMCNTENSET register
+ *
+ * Call perf_event_enable to start counting the perf event
+ */
+void kvm_pmu_enable_counter(struct kvm_vcpu *vcpu, u64 val)
+{
+   int i;
+   struct kvm_pmu *pmu = &vcpu->arch.pmu;

[PATCH v8 11/20] KVM: ARM64: Add access handler for PMINTENSET and PMINTENCLR register

2015-12-22 Thread Shannon Zhao
From: Shannon Zhao 

Since the reset value of PMINTENSET and PMINTENCLR is UNKNOWN, use
reset_unknown for their reset handlers. Add a handler to emulate writing
the PMINTENSET or PMINTENCLR register.

Signed-off-by: Shannon Zhao 
---
 arch/arm64/kvm/sys_regs.c | 27 +++
 1 file changed, 23 insertions(+), 4 deletions(-)

diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index 3416881..24ce4fe 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -644,6 +644,25 @@ static bool access_pmcnten(struct kvm_vcpu *vcpu, struct 
sys_reg_params *p,
return true;
 }
 
+static bool access_pminten(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
+  const struct sys_reg_desc *r)
+{
+   u64 mask = kvm_pmu_valid_counter_mask(vcpu);
+
+   if (p->is_write) {
+   if (r->Op2 & 0x1)
+   /* accessing PMINTENSET_EL1 */
+   vcpu_sys_reg(vcpu, r->reg) |= (p->regval & mask);
+   else
+   /* accessing PMINTENCLR_EL1 */
+   vcpu_sys_reg(vcpu, r->reg) &= ~(p->regval & mask);
+   } else {
+   p->regval = vcpu_sys_reg(vcpu, r->reg) & mask;
+   }
+
+   return true;
+}
+
 /* Silly macro to expand the DBG{BCR,BVR,WVR,WCR}n_EL1 registers in one go */
 #define DBG_BCR_BVR_WCR_WVR_EL1(n) \
/* DBGBVRn_EL1 */   \
@@ -802,10 +821,10 @@ static const struct sys_reg_desc sys_reg_descs[] = {
 
/* PMINTENSET_EL1 */
{ Op0(0b11), Op1(0b000), CRn(0b1001), CRm(0b1110), Op2(0b001),
- trap_raz_wi },
+ access_pminten, reset_unknown, PMINTENSET_EL1 },
/* PMINTENCLR_EL1 */
{ Op0(0b11), Op1(0b000), CRn(0b1001), CRm(0b1110), Op2(0b010),
- trap_raz_wi },
+ access_pminten, NULL, PMINTENSET_EL1 },
 
/* MAIR_EL1 */
{ Op0(0b11), Op1(0b000), CRn(0b1010), CRm(0b0010), Op2(0b000),
@@ -1200,8 +1219,8 @@ static const struct sys_reg_desc cp15_regs[] = {
{ Op1( 0), CRn( 9), CRm(13), Op2( 1), access_pmu_evtyper },
{ Op1( 0), CRn( 9), CRm(13), Op2( 2), access_pmu_evcntr },
{ Op1( 0), CRn( 9), CRm(14), Op2( 0), trap_raz_wi },
-   { Op1( 0), CRn( 9), CRm(14), Op2( 1), trap_raz_wi },
-   { Op1( 0), CRn( 9), CRm(14), Op2( 2), trap_raz_wi },
+   { Op1( 0), CRn( 9), CRm(14), Op2( 1), access_pminten },
+   { Op1( 0), CRn( 9), CRm(14), Op2( 2), access_pminten },
 
{ Op1( 0), CRn(10), CRm( 2), Op2( 0), access_vm_reg, NULL, c10_PRRR },
{ Op1( 0), CRn(10), CRm( 2), Op2( 1), access_vm_reg, NULL, c10_NMRR },
-- 
2.0.4




[PATCH v8 06/20] KVM: ARM64: Add access handler for PMCEID0 and PMCEID1 register

2015-12-22 Thread Shannon Zhao
From: Shannon Zhao 

Add an access handler which returns the host value of PMCEID0 or PMCEID1
when the guest accesses these registers. Writing to PMCEID0 or PMCEID1
is UNDEFINED.

Signed-off-by: Shannon Zhao 
---
 arch/arm64/kvm/sys_regs.c | 27 +++
 1 file changed, 23 insertions(+), 4 deletions(-)

diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index f9985fc..2552db1 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -486,6 +486,25 @@ static bool access_pmselr(struct kvm_vcpu *vcpu, struct 
sys_reg_params *p,
return true;
 }
 
+static bool access_pmceid(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
+ const struct sys_reg_desc *r)
+{
+   u64 pmceid;
+
+   if (p->is_write) {
+   kvm_inject_undefined(vcpu);
+   } else {
+   if (!(p->Op2 & 1))
+   asm volatile("mrs %0, pmceid0_el0\n" : "=r" (pmceid));
+   else
+   asm volatile("mrs %0, pmceid1_el0\n" : "=r" (pmceid));
+
+   p->regval = pmceid;
+   }
+
+   return true;
+}
+
 /* Silly macro to expand the DBG{BCR,BVR,WVR,WCR}n_EL1 registers in one go */
 #define DBG_BCR_BVR_WCR_WVR_EL1(n) \
/* DBGBVRn_EL1 */   \
@@ -688,10 +707,10 @@ static const struct sys_reg_desc sys_reg_descs[] = {
  access_pmselr, reset_unknown, PMSELR_EL0 },
/* PMCEID0_EL0 */
{ Op0(0b11), Op1(0b011), CRn(0b1001), CRm(0b1100), Op2(0b110),
- trap_raz_wi },
+ access_pmceid },
/* PMCEID1_EL0 */
{ Op0(0b11), Op1(0b011), CRn(0b1001), CRm(0b1100), Op2(0b111),
- trap_raz_wi },
+ access_pmceid },
/* PMCCNTR_EL0 */
{ Op0(0b11), Op1(0b011), CRn(0b1001), CRm(0b1101), Op2(0b000),
  trap_raz_wi },
@@ -937,8 +956,8 @@ static const struct sys_reg_desc cp15_regs[] = {
{ Op1( 0), CRn( 9), CRm(12), Op2( 2), trap_raz_wi },
{ Op1( 0), CRn( 9), CRm(12), Op2( 3), trap_raz_wi },
{ Op1( 0), CRn( 9), CRm(12), Op2( 5), access_pmselr },
-   { Op1( 0), CRn( 9), CRm(12), Op2( 6), trap_raz_wi },
-   { Op1( 0), CRn( 9), CRm(12), Op2( 7), trap_raz_wi },
+   { Op1( 0), CRn( 9), CRm(12), Op2( 6), access_pmceid },
+   { Op1( 0), CRn( 9), CRm(12), Op2( 7), access_pmceid },
{ Op1( 0), CRn( 9), CRm(13), Op2( 0), trap_raz_wi },
{ Op1( 0), CRn( 9), CRm(13), Op2( 1), trap_raz_wi },
{ Op1( 0), CRn( 9), CRm(13), Op2( 2), trap_raz_wi },
-- 
2.0.4




[PATCH v8 08/20] KVM: ARM64: Add access handler for event typer register

2015-12-22 Thread Shannon Zhao
From: Shannon Zhao 

These kinds of registers include PMEVTYPERn, PMCCFILTR and PMXEVTYPER,
the last of which is mapped to PMEVTYPERn or PMCCFILTR.

The access handler translates all aarch32 register offsets to the
corresponding aarch64 ones and uses vcpu_sys_reg() to access their
values, so it does not have to care about big-endian layouts.

When writing to these registers, create a perf_event for the selected
event type.
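
As a worked example of that translation: a 32-bit guest access to
PMEVTYPER10 traps with CRn==14, CRm==13, Op2==2, and the handler below
recovers the register as

	reg  = ((CRm & 3) << 3) | (Op2 & 7)	/* (1 << 3) | 2 = 10 */
	reg += PMEVTYPER0_EL0			/* -> PMEVTYPER10_EL0 */

so the 32-bit and 64-bit views of the register end up in the same
vcpu_sys_reg() slot, independent of host endianness.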

Signed-off-by: Shannon Zhao 
---
 arch/arm64/kvm/sys_regs.c | 156 +-
 1 file changed, 154 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index 2552db1..ed2939b 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -505,6 +505,70 @@ static bool access_pmceid(struct kvm_vcpu *vcpu, struct 
sys_reg_params *p,
return true;
 }
 
+static inline bool pmu_counter_idx_valid(struct kvm_vcpu *vcpu, u64 idx)
+{
+   u64 pmcr, val;
+
+   pmcr = vcpu_sys_reg(vcpu, PMCR_EL0);
+   val = (pmcr >> ARMV8_PMCR_N_SHIFT) & ARMV8_PMCR_N_MASK;
+   if (idx >= val && idx != ARMV8_CYCLE_IDX)
+   return false;
+
+   return true;
+}
+
+static bool access_pmu_evtyper(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
+  const struct sys_reg_desc *r)
+{
+   u64 idx, reg;
+
+   if (r->CRn == 9) {
+   /* PMXEVTYPER_EL0 */
+   reg = 0;
+   } else {
+   if (!p->is_aarch32) {
+   /* PMEVTYPERn_EL0 or PMCCFILTR_EL0 */
+   reg = r->reg;
+   } else {
+   if (r->CRn == 14 && r->CRm == 15 && r->Op2 == 7) {
+   reg = PMCCFILTR_EL0;
+   } else {
+   reg = ((r->CRm & 3) << 3) | (r->Op2 & 7);
+   reg += PMEVTYPER0_EL0;
+   }
+   }
+   }
+
+   switch (reg) {
+   case PMEVTYPER0_EL0 ... PMEVTYPER30_EL0:
+   idx = reg - PMEVTYPER0_EL0;
+   if (!pmu_counter_idx_valid(vcpu, idx))
+   return true;
+   break;
+   case PMCCFILTR_EL0:
+   idx = ARMV8_CYCLE_IDX;
+   break;
+   default:
+   /* PMXEVTYPER_EL0 */
+   idx = vcpu_sys_reg(vcpu, PMSELR_EL0) & ARMV8_COUNTER_MASK;
+   if (!pmu_counter_idx_valid(vcpu, idx))
+   return true;
+
+   reg = (idx == ARMV8_CYCLE_IDX) ? PMCCFILTR_EL0
+: PMEVTYPER0_EL0 + idx;
+   break;
+   }
+
+   if (p->is_write) {
+   kvm_pmu_set_counter_event_type(vcpu, p->regval, idx);
+   vcpu_sys_reg(vcpu, reg) = p->regval & ARMV8_EVTYPE_MASK;
+   } else {
+   p->regval = vcpu_sys_reg(vcpu, reg) & ARMV8_EVTYPE_MASK;
+   }
+
+   return true;
+}
+
 /* Silly macro to expand the DBG{BCR,BVR,WVR,WCR}n_EL1 registers in one go */
 #define DBG_BCR_BVR_WCR_WVR_EL1(n) \
/* DBGBVRn_EL1 */   \
@@ -520,6 +584,13 @@ static bool access_pmceid(struct kvm_vcpu *vcpu, struct 
sys_reg_params *p,
{ Op0(0b10), Op1(0b000), CRn(0b), CRm((n)), Op2(0b111), \
  trap_wcr, reset_wcr, n, 0,  get_wcr, set_wcr }
 
+/* Macro to expand the PMEVTYPERn_EL0 register */
+#define PMU_PMEVTYPER_EL0(n)   \
+   /* PMEVTYPERn_EL0 */\
+   { Op0(0b11), Op1(0b011), CRn(0b1110),   \
+ CRm((0b1100 | (((n) >> 3) & 0x3))), Op2(((n) & 0x7)), \
+ access_pmu_evtyper, reset_unknown, (PMEVTYPER0_EL0 + n), }
+
 /*
  * Architected system registers.
  * Important: Must be sorted ascending by Op0, Op1, CRn, CRm, Op2
@@ -716,7 +787,7 @@ static const struct sys_reg_desc sys_reg_descs[] = {
  trap_raz_wi },
/* PMXEVTYPER_EL0 */
{ Op0(0b11), Op1(0b011), CRn(0b1001), CRm(0b1101), Op2(0b001),
- trap_raz_wi },
+ access_pmu_evtyper },
/* PMXEVCNTR_EL0 */
{ Op0(0b11), Op1(0b011), CRn(0b1001), CRm(0b1101), Op2(0b010),
  trap_raz_wi },
@@ -734,6 +805,45 @@ static const struct sys_reg_desc sys_reg_descs[] = {
{ Op0(0b11), Op1(0b011), CRn(0b1101), CRm(0b), Op2(0b011),
  NULL, reset_unknown, TPIDRRO_EL0 },
 
+   /* PMEVTYPERn_EL0 */
+   PMU_PMEVTYPER_EL0(0),
+   PMU_PMEVTYPER_EL0(1),
+   PMU_PMEVTYPER_EL0(2),
+   PMU_PMEVTYPER_EL0(3),
+   PMU_PMEVTYPER_EL0(4),
+   PMU_PMEVTYPER_EL0(5),
+   PMU_PMEVTYPER_EL0(6),
+   PMU_PMEVTYPER_EL0(7),
+   PMU_PMEVTYPER_EL0(8),
+   PMU_PMEVTYPER_EL0(9),
+   PMU_PMEVTYPER_EL0(10),
+   PMU_PMEVTYPER_EL0(11),
+   PMU_PMEVTYPER_EL0(12),
+   PMU_PMEVTYPER_EL0(13),
+ 

[RFC v4 2/5] VSOCK: Introduce virtio_vsock_common.ko

2015-12-22 Thread Stefan Hajnoczi
From: Asias He 

This module contains the common code and header files for the
virtio_transport and vhost_vsock kernel modules that follow.

Signed-off-by: Asias He 
Signed-off-by: Stefan Hajnoczi 
---
v4:
 * Add MAINTAINERS file entry
 * checkpatch.pl cleanups
 * linux_vsock.h: drop wrong copy-pasted license header
 * Move tx sock refcounting to virtio_transport_alloc/free_pkt() to fix
   leaks in error paths
 * Add send_pkt_no_sock() to send RST packets with no listen socket
 * Rename per-socket state from virtio_transport to virtio_vsock_sock
 * Move send_pkt_ops to new virtio_transport struct
 * Drop dumppkt function, packet capture will be added in the future
 * Drop empty virtio_transport_dec_tx_pkt()
 * Allow guest->host connections again
 * Use trace events instead of pr_debug()
v3:
 * Remove unnecessary 3-way handshake, just do REQUEST/RESPONSE instead
   of REQUEST/RESPONSE/ACK
 * Remove SOCK_DGRAM support and focus on SOCK_STREAM first
 * Only allow host->guest connections (same security model as latest
   VMware)
v2:
 * Fix peer_buf_alloc inheritance on child socket
 * Notify other side of SOCK_STREAM disconnect (fixes shutdown
   semantics)
 * Avoid recursive mutex_lock(tx_lock) for write_space (fixes deadlock)
 * Define VIRTIO_VSOCK_TYPE_STREAM/DGRAM hardware interface constants
 * Define VIRTIO_VSOCK_SHUTDOWN_RCV/SEND hardware interface constants
---
 MAINTAINERS|  10 +
 include/linux/virtio_vsock.h   | 167 +
 .../trace/events/vsock_virtio_transport_common.h   | 144 
 include/uapi/linux/virtio_ids.h|   1 +
 include/uapi/linux/virtio_vsock.h  |  87 +++
 net/vmw_vsock/virtio_transport_common.c| 834 +
 6 files changed, 1243 insertions(+)
 create mode 100644 include/linux/virtio_vsock.h
 create mode 100644 include/trace/events/vsock_virtio_transport_common.h
 create mode 100644 include/uapi/linux/virtio_vsock.h
 create mode 100644 net/vmw_vsock/virtio_transport_common.c

diff --git a/MAINTAINERS b/MAINTAINERS
index 050d0e7..d42db78 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -11360,6 +11360,16 @@ S: Maintained
 F: drivers/media/v4l2-core/videobuf2-*
 F: include/media/videobuf2-*
 
+VIRTIO AND VHOST VSOCK DRIVER
+M: Stefan Hajnoczi 
+L: kvm@vger.kernel.org
+L: virtualizat...@lists.linux-foundation.org
+L: net...@vger.kernel.org
+S: Maintained
+F: include/linux/virtio_vsock.h
+F: include/uapi/linux/virtio_vsock.h
+F: net/vmw_vsock/virtio_transport_common.c
+
 VIRTUAL SERIO DEVICE DRIVER
 M: Stephen Chandler Paul 
 S: Maintained
diff --git a/include/linux/virtio_vsock.h b/include/linux/virtio_vsock.h
new file mode 100644
index 000..4acf1ad
--- /dev/null
+++ b/include/linux/virtio_vsock.h
@@ -0,0 +1,167 @@
+#ifndef _LINUX_VIRTIO_VSOCK_H
+#define _LINUX_VIRTIO_VSOCK_H
+
+#include 
+#include 
+#include 
+#include 
+
+#define VIRTIO_VSOCK_DEFAULT_MIN_BUF_SIZE  128
+#define VIRTIO_VSOCK_DEFAULT_BUF_SIZE  (1024 * 256)
+#define VIRTIO_VSOCK_DEFAULT_MAX_BUF_SIZE  (1024 * 256)
+#define VIRTIO_VSOCK_DEFAULT_RX_BUF_SIZE   (1024 * 4)
+#define VIRTIO_VSOCK_MAX_BUF_SIZE  0xUL
+#define VIRTIO_VSOCK_MAX_PKT_BUF_SIZE  (1024 * 64)
+#define VIRTIO_VSOCK_MAX_TX_BUF_SIZE   (1024 * 1024 * 16)
+#define VIRTIO_VSOCK_MAX_DGRAM_SIZE(1024 * 64)
+
+enum {
+   VSOCK_VQ_CTRL   = 0,
+   VSOCK_VQ_RX = 1, /* for host to guest data */
+   VSOCK_VQ_TX = 2, /* for guest to host data */
+   VSOCK_VQ_MAX= 3,
+};
+
+/* Per-socket state (accessed via vsk->trans) */
+struct virtio_vsock_sock {
+   struct vsock_sock *vsk;
+
+   /* Protected by lock_sock(sk_vsock(trans->vsk)) */
+   u32 buf_size;
+   u32 buf_size_min;
+   u32 buf_size_max;
+
+   struct mutex tx_lock;
+   struct mutex rx_lock;
+
+   /* Protected by tx_lock */
+   u32 tx_cnt;
+   u32 buf_alloc;
+   u32 peer_fwd_cnt;
+   u32 peer_buf_alloc;
+
+   /* Protected by rx_lock */
+   u32 fwd_cnt;
+   u32 rx_bytes;
+   struct list_head rx_queue;
+};
+
+struct virtio_vsock_pkt {
+   struct virtio_vsock_hdr hdr;
+   struct work_struct work;
+   struct list_head list;
+   void *buf;
+   u32 len;
+   u32 off;
+};
+
+struct virtio_vsock_pkt_info {
+   u32 remote_cid, remote_port;
+   struct msghdr *msg;
+   u32 pkt_len;
+   u16 type;
+   u16 op;
+   u32 flags;
+};
+
+struct virtio_transport {
+   /* This must be the first field */
+   struct vsock_transport transport;
+
+   /* Send packet for a specific socket */
+   int (*send_pkt)(struct vsock_sock *vsk,
+   struct virtio_vsock_pkt_info *info);
+
+   /* Send packet without a socket (e.g. RST).  Prefer 

RE: [RFC PATCH 2/5] KVM: add KVM_EXIT_MSR exit reason and capability.

2015-12-22 Thread Pavel Fedin
 Hello!

> > 1. Is there any real need to distinguish between KVM_EXIT_MSR_WRITE and
> KVM_EXIT_MSR_AFTER_WRITE ? IMHO from userland's point of view these are the 
> same.
> 
> Indeed.  Perhaps the kernel can set .handled to true to let userspace
> know it already took care of it, instead of introducing yet another
> exit_reason.  The field would need to be marked in/out, then.

 I'm not sure that you even need this. Anyway, particular MSRs are
function-specific, and if you're emulating an MSR in userspace,
then, I believe, you know the function behind it. And it's IMHO safe to just
know that SynIC MSRs have some extra handling in the
kernel. And I believe this has no direct impact on userland's behavior.
 But you know the details better than I do.
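
 For illustration, a userspace dispatch for such an exit might look
roughly like this (the constant names come from the RFC text quoted in
the next message; emulate_msr() is a made-up helper and the exact
kvm_run field layout here is a guess):

	/* inside the vcpu run loop, after ioctl(KVM_RUN) returns: */
	switch (run->exit_reason) {
	case KVM_EXIT_MSR:
		if (emulate_msr(run->msr.index, &run->msr.data))
			run->msr.handled = KVM_EXIT_MSR_HANDLED;
		else
			run->msr.handled = KVM_EXIT_MSR_UNHANDLED; /* kernel injects #GP */
		break;
	}

 i.e. from userspace's point of view the handling looks the same
whichever exit reason the kernel picked.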

Kind regards,
Pavel Fedin
Expert Engineer
Samsung Electronics Research center Russia




Re: [RFC PATCH 2/5] KVM: add KVM_EXIT_MSR exit reason and capability.

2015-12-22 Thread 'Roman Kagan'
On Tue, Dec 22, 2015 at 10:24:13AM +0300, Pavel Fedin wrote:
> > +On the return path into kvm, user space should set handled to
> > +KVM_EXIT_MSR_HANDLED if it successfully handled the MSR access. Otherwise,
> > +handled should be set to KVM_EXIT_MSR_UNHANDLED, which will cause a general
> > +protection fault to be injected into the vcpu. If an error occurs during 
> > the
> > +return into kvm, the vcpu will not be run and another exit will be 
> > generated
> > +with type set to KVM_EXIT_MSR_COMPLETION_FAILED.
> > +
> > +If exit_reason is KVM_EXIT_MSR_AFTER_WRITE, then the vcpu has executed a 
> > wrmsr
> > +instruction which is handled by kvm but which user space may need to be
> > +notified about. index and data are set as described above; the value of 
> > type
> > +depends on the MSR that was written. handled is ignored on reentry into 
> > kvm.
> 
> 1. Is there any real need to distinguish between KVM_EXIT_MSR_WRITE and 
> KVM_EXIT_MSR_AFTER_WRITE ? IMHO from userland's point of view these are the 
> same.

Indeed.  Perhaps the kernel can set .handled to true to let userspace
know it already took care of it, instead of introducing yet another
exit_reason.  The field would need to be marked in/out, then.

Roman.


Re: [Qemu-devel] [PATCH] SeaBios: Fix reset procedure reentrancy problem on qemu-kvm platform

2015-12-22 Thread Kevin O'Connor
On Tue, Dec 22, 2015 at 02:14:12AM +, Gonglei (Arei) wrote:
> > From: Kevin O'Connor [mailto:ke...@koconnor.net]
> > Sent: Tuesday, December 22, 2015 2:47 AM
> > To: Gonglei (Arei)
> > Cc: Xulei (Stone); Paolo Bonzini; qemu-devel; seab...@seabios.org;
> > Huangweidong (C); kvm@vger.kernel.org; Radim Krcmar
> > Subject: Re: [Qemu-devel] [PATCH] SeaBios: Fix reset procedure reentrancy
> > problem on qemu-kvm platform
> > 
> > On Mon, Dec 21, 2015 at 09:41:32AM +, Gonglei (Arei) wrote:
> > > When the grub of the OS is booting, the softirq and the C function
> > > send_disk_op() may use the extra stack of SeaBIOS. If we inject an NMI,
> > > romlayout.S:irqentry_extrastack is invoked, and the extra stack will
> > > be used again. The stack of the first caller will then be corrupted, so
> > > that SeaBIOS gets stuck.
> > >
> > > You can easily reproduce the problem.
> > >
> > > 1. start the guest
> > > 2. reset the guest
> > > 3. inject an NMI when the guest shows the grub screen
> > > 4. then the guest is stuck
> > 
> > Does the SeaBIOS patch below help?  
> 
> Sorry, it doesn't work. What's worse is that we cannot stop SeaBIOS from
> getting stuck by setting "CONFIG_ENTRY_EXTRASTACK=n" after applying this patch.

Oops, can you try with the patch below instead?

> > I'm not familiar with how to "inject a
> > NMI" - can you describe the process in more detail?
> > 
> 
> 1. Qemu Command line:
> 
> #: /home/qemu/x86_64-softmmu/qemu-system-x86_64 -enable-kvm -m 4096 -smp 8 
> -name suse -vnc 0.0.0.0:10 \
> -device virtio-scsi-pci,id=scsi0 -drive 
> file=/home/suse11_sp3_32_2,if=none,id=drive-scsi0-0-0-0,format=raw,cache=none,aio=native
>  \
> -device scsi-hd,bus=scsi0.0,drive=drive-scsi0-0-0-0,id=scsi0-0-0-0 \
> -chardev file,id=seabios,path=/home/seabios.log -device 
> isa-debugcon,iobase=0x402,chardev=seabios \
> -monitor stdio -qmp unix:/tmp/qmp,server,nowait 
> 
> 2. Inject a NMI by QMP:
> 
> #: /home/qemu/scripts/qmp # ./qmp-shell /tmp/qmp
> Welcome to the QMP low-level shell!
> Connected to QEMU 2.5.0
> 
> (QEMU) system_reset
> {"return": {}}
> (QEMU) inject-nmi  
> {"return": {}}
> (QEMU) inject-nmi
> {"return": {}}
> 

I tried a few simple tests but was not able to reproduce.

-Kevin


--- a/src/romlayout.S
+++ b/src/romlayout.S
@@ -548,7 +548,10 @@ entry_post:
 ENTRY_INTO32 _cfunc32flat_handle_post   // Normal entry point
 
 ORG 0xe2c3
-IRQ_ENTRY 02
+.global entry_02
+entry_02:
+ENTRY handle_02  // NMI handler does not switch onto extra stack
+iretw
 
 ORG 0xe3fe
 .global entry_13_official


[RFC 0/2] VFIO SRIOV support

2015-12-22 Thread Ilya Lesokhin
Today the QEMU hypervisor allows assigning a physical device to a VM,
facilitating driver development. However, it does not support enabling
SR-IOV by the VM kernel driver. Our goal is to implement such support,
allowing developers working on SR-IOV physical function drivers to work
inside VMs as well.

This patch series implements the kernel side of our solution.  It extends
the VFIO driver to support the PCIE SRIOV extended capability with
following features:
1. The ability to probe SRIOV BAR sizes.
2. The ability to enable and disable sriov.

This patch series is going to be used by QEMU to expose sriov capabilities
to VM. We already have an early prototype based on Knut Omang's patches for
SRIOV[1]. 

Open issues:
1. Binding the new VFs to VFIO driver.
Once the VM enables sriov it expects the new VFs to appear inside the VM.
To this end we need to bind the new vfs to the VFIO driver and have QEMU
grab them. We currently achieve this goal using:
echo $vendor $device > /sys/bus/pci/drivers/vfio-pci/new_id
but we are not happy about this solution as a system might have another
device with the same id that is unrelated to our VM.
Other solutions we've considered are:
 a. Having user space unbind and then bind the VFs to VFIO.
 Typically resulting in an unnecessary probing of the device.
 b. Adding a driver argument to pci_enable_sriov(...) and have
vfio call pci_enable_sriov with the vfio driver as argument.
This solution avoids the unnecessary probing but is more intrusive.

2. How to tell if it is safe to disable SRIOV?
In the current implementation, a userspace can enable sriov, grab one of
the VFs and then call disable sriov without releasing the device.  This
will result in a deadlock where the user process is stuck inside disable
sriov waiting for itself to release the device. Killing the process leaves
it in a zombie state.
We also get a strange warning saying:
[  181.668492] WARNING: CPU: 22 PID: 3684 at kernel/sched/core.c:7497 
__might_sleep+0x77/0x80() 
[  181.668502] do not call blocking ops when !TASK_RUNNING; state=1 set at 
[] prepare_to_wait_event+0x63/0xf0

3. How to expose the Supported Page Sizes and System Page Size registers in
the SRIOV capability? 
Presently the hypervisor initializes Supported Page Sizes once and assumes
it doesn't change therefore we cannot allow user space to change this
register at will. The first solution that comes to mind is to expose a
device that only supports the page size selected by the hypervisor.
Unfortunately, per SR-IOV spec section 3.3.12, PFs are required to support
4-KB, 8-KB, 64-KB, 256-KB, 1-MB, and 4-MB page sizes. We currently map both
registers as virtualized and read only and leave user space to worry about
this problem.

4. Other SRIOV capabilities.
Do we want to hide capabilities we do not support in the SR-IOV
Capabilities register? or leave it to the userspace application?

[1] https://github.com/knuto/qemu/tree/sriov_patches_v6

Ilya Lesokhin (2):
  PCI: Expose iov_set_numvfs and iov_resource_size for modules.
  VFIO: Add support for SRIOV extended capability

 drivers/pci/iov.c  |   4 +-
 drivers/vfio/pci/vfio_pci_config.c | 169 +
 include/linux/pci.h|   4 +
 3 files changed, 159 insertions(+), 18 deletions(-)

-- 
1.8.3.1



[RFC 2/2] VFIO: Add support for SRIOV extended capability

2015-12-22 Thread Ilya Lesokhin
Add support for the PCIE SRIOV extended capability with the following features:
1. The ability to probe SRIOV BAR sizes.
2. The ability to enable and disable sriov.
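
For illustration, the VF enable/disable handling that the config write
handler needs (its body is truncated further down in this digest) can be
sketched as follows; this is an assumption about the shape of the code,
not the patch itself:

	/* called when the guest toggles PCI_SRIOV_CTRL_VFE */
	static int sketch_handle_vfe(struct pci_dev *pdev, bool was_enabled,
				     bool now_enabled, u16 num_vfs)
	{
		if (!was_enabled && now_enabled)
			/* guest set VF Enable: create the VFs on the host */
			return pci_enable_sriov(pdev, num_vfs);

		if (was_enabled && !now_enabled)
			/* guest cleared VF Enable: tear the VFs down */
			pci_disable_sriov(pdev);

		return 0;
	}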

Signed-off-by: Ilya Lesokhin 
Signed-off-by: Noa Osherovich 
Signed-off-by: Haggai Eran 
---
 drivers/vfio/pci/vfio_pci_config.c | 169 +
 1 file changed, 152 insertions(+), 17 deletions(-)

diff --git a/drivers/vfio/pci/vfio_pci_config.c 
b/drivers/vfio/pci/vfio_pci_config.c
index ff75ca3..04e364f 100644
--- a/drivers/vfio/pci/vfio_pci_config.c
+++ b/drivers/vfio/pci/vfio_pci_config.c
@@ -420,6 +420,35 @@ static __le32 vfio_generate_bar_flags(struct pci_dev 
*pdev, int bar)
return cpu_to_le32(val);
 }
 
+static void vfio_sriov_bar_fixup(struct vfio_pci_device *vdev,
+int sriov_cap_start)
+{
+   struct pci_dev *pdev = vdev->pdev;
+   int i;
+   __le32 *bar;
+   u64 mask;
+
+   bar = (__le32 *)&vdev->vconfig[sriov_cap_start + PCI_SRIOV_BAR];
+
+   for (i = PCI_IOV_RESOURCES; i <= PCI_IOV_RESOURCE_END; i++, bar++) {
+   if (!pci_resource_start(pdev, i)) {
+   *bar = 0; /* Unmapped by host = unimplemented to user */
+   continue;
+   }
+
+   mask = ~(pci_iov_resource_size(pdev, i) - 1);
+
+   *bar &= cpu_to_le32((u32)mask);
+   *bar |= vfio_generate_bar_flags(pdev, i);
+
+   if (*bar & cpu_to_le32(PCI_BASE_ADDRESS_MEM_TYPE_64)) {
+   bar++;
+   *bar &= cpu_to_le32((u32)(mask >> 32));
+   i++;
+   }
+   }
+}
+
 /*
  * Pretend we're hardware and tweak the values of the *virtual* PCI BARs
  * to reflect the hardware capabilities.  This implements BAR sizing.
@@ -782,6 +811,124 @@ static int __init init_pci_ext_cap_pwr_perm(struct 
perm_bits *perm)
return 0;
 }
 
+static int __init init_pci_ext_cap_sriov_perm(struct perm_bits *perm)
+{
+   int i;
+
+   if (alloc_perm_bits(perm, pci_ext_cap_length[PCI_EXT_CAP_ID_SRIOV]))
+   return -ENOMEM;
+
+   /*
+* Virtualize the first dword of all express capabilities
+* because it includes the next pointer.  This lets us later
+* remove capabilities from the chain if we need to.
+*/
+   p_setd(perm, 0, ALL_VIRT, NO_WRITE);
+
+   /* VF Enable - Virtualized and writable
+* Memory Space Enable - Non-virtualized and writable
+*/
+   p_setw(perm, PCI_SRIOV_CTRL, PCI_SRIOV_CTRL_VFE,
+  PCI_SRIOV_CTRL_VFE | PCI_SRIOV_CTRL_MSE);
+
+   p_setw(perm, PCI_SRIOV_NUM_VF, (u16)ALL_VIRT, (u16)ALL_WRITE);
+   p_setw(perm, PCI_SRIOV_SUP_PGSIZE, (u16)ALL_VIRT, 0);
+
+   /* We cannot let user space application change the page size
+* so we mark it as read only and trust the user application
+* (e.g. qemu) to virtualize this correctly for the guest
+*/
+   p_setw(perm, PCI_SRIOV_SYS_PGSIZE, (u16)ALL_VIRT, 0);
+
+   for (i = 0; i < PCI_SRIOV_NUM_BARS; i++)
+   p_setd(perm, PCI_SRIOV_BAR + 4 * i, ALL_VIRT, ALL_WRITE);
+
+   return 0;
+}
+
+static int vfio_find_cap_start(struct vfio_pci_device *vdev, int pos)
+{
+   u8 cap;
+   int base = (pos >= PCI_CFG_SPACE_SIZE) ? PCI_CFG_SPACE_SIZE :
+PCI_STD_HEADER_SIZEOF;
+   cap = vdev->pci_config_map[pos];
+
+   if (cap == PCI_CAP_ID_BASIC)
+   return 0;
+
+   /* XXX Can we have two abutting capabilities of the same type? */
+   while (pos - 1 >= base && vdev->pci_config_map[pos - 1] == cap)
+   pos--;
+
+   return pos;
+}
+
+static int vfio_sriov_cap_config_read(struct vfio_pci_device *vdev, int pos,
+ int count, struct perm_bits *perm,
+  int offset, __le32 *val)
+{
+   int cap_start = vfio_find_cap_start(vdev, pos);
+
+   vfio_sriov_bar_fixup(vdev, cap_start);
+   return vfio_default_config_read(vdev, pos, count, perm, offset, val);
+}
+
+static int vfio_sriov_cap_config_write(struct vfio_pci_device *vdev, int pos,
+  int count, struct perm_bits *perm,
+  int offset, __le32 val)
+{
+   int ret;
+   int cap_start = vfio_find_cap_start(vdev, pos);
+   u16 sriov_ctrl = *(u16 *)(vdev->vconfig + cap_start + PCI_SRIOV_CTRL);
+   bool cur_vf_enabled = sriov_ctrl & PCI_SRIOV_CTRL_VFE;
+   bool vf_enabled;
+
+   switch (offset) {
+   case  PCI_SRIOV_NUM_VF:
+   /* Per SR-IOV spec sec 3.3.10 and 3.3.11, First VF Offset
+* and VF Stride may change when NumVFs changes.
+*
+* Therefore we should pass valid writes to the hardware.
+*
+* Per SR-IOV spec sec 3.3.7
+* The results are 

[RFC 1/2] PCI: Expose iov_set_numvfs and iov_resource_size for modules.

2015-12-22 Thread Ilya Lesokhin
Expose iov_set_numvfs and iov_resource_size to make them available
for VFIO-PCI sriov support.

Signed-off-by: Ilya Lesokhin 
Signed-off-by: Noa Osherovich 
Signed-off-by: Haggai Eran 
---
 drivers/pci/iov.c   | 4 +++-
 include/linux/pci.h | 4 
 2 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c
index ee0ebff..f296bd3 100644
--- a/drivers/pci/iov.c
+++ b/drivers/pci/iov.c
@@ -41,7 +41,7 @@ int pci_iov_virtfn_devfn(struct pci_dev *dev, int vf_id)
  *
  * Update iov->offset and iov->stride when NumVFs is written.
  */
-static inline void pci_iov_set_numvfs(struct pci_dev *dev, int nr_virtfn)
+inline void pci_iov_set_numvfs(struct pci_dev *dev, int nr_virtfn)
 {
struct pci_sriov *iov = dev->sriov;
 
@@ -49,6 +49,7 @@ static inline void pci_iov_set_numvfs(struct pci_dev *dev, 
int nr_virtfn)
pci_read_config_word(dev, iov->pos + PCI_SRIOV_VF_OFFSET, &iov->offset);
pci_read_config_word(dev, iov->pos + PCI_SRIOV_VF_STRIDE, &iov->stride);
 }
+EXPORT_SYMBOL(pci_iov_set_numvfs);
 
 /*
  * The PF consumes one bus number.  NumVFs, First VF Offset, and VF Stride
@@ -107,6 +108,7 @@ resource_size_t pci_iov_resource_size(struct pci_dev *dev, 
int resno)
 
return dev->sriov->barsz[resno - PCI_IOV_RESOURCES];
 }
+EXPORT_SYMBOL(pci_iov_resource_size);
 
 static int virtfn_add(struct pci_dev *dev, int id, int reset)
 {
diff --git a/include/linux/pci.h b/include/linux/pci.h
index e90eb22..1039e18 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -1724,6 +1724,8 @@ int pci_vfs_assigned(struct pci_dev *dev);
 int pci_sriov_set_totalvfs(struct pci_dev *dev, u16 numvfs);
 int pci_sriov_get_totalvfs(struct pci_dev *dev);
 resource_size_t pci_iov_resource_size(struct pci_dev *dev, int resno);
+
+void pci_iov_set_numvfs(struct pci_dev *dev, int nr_virtfn);
 #else
 static inline int pci_iov_virtfn_bus(struct pci_dev *dev, int id)
 {
@@ -1745,6 +1747,8 @@ static inline int pci_sriov_get_totalvfs(struct pci_dev 
*dev)
 { return 0; }
 static inline resource_size_t pci_iov_resource_size(struct pci_dev *dev, int 
resno)
 { return 0; }
+
+static inline void pci_iov_set_numvfs(struct pci_dev *dev, int nr_virtfn) { }
 #endif
 
 #if defined(CONFIG_HOTPLUG_PCI) || defined(CONFIG_HOTPLUG_PCI_MODULE)
-- 
1.8.3.1



Re: [PATCH 1/2] arm: KVM: Do not update PC if the trap handler has updated it

2015-12-22 Thread Christoffer Dall
On Tue, Dec 22, 2015 at 11:08:10AM +, Peter Maydell wrote:
> On 22 December 2015 at 09:55, Marc Zyngier  wrote:
> > Assuming we trap a coprocessor access, and decide that the access
> > is illegal, we will inject an exception in the guest. In this
> > case, we shouldn't increment the PC, or the vcpu will miss the
> > first instruction of the handler, leading to a mildly confused
> > guest.
> >
> > Solve this by snapshoting PC before the access is performed,
> > and checking if it has moved or not before incrementing it.
> >
> > Reported-by: Shannon Zhao 
> > Signed-off-by: Marc Zyngier 
> > ---
> >  arch/arm/kvm/coproc.c | 14 --
> >  1 file changed, 12 insertions(+), 2 deletions(-)
> >
> > diff --git a/arch/arm/kvm/coproc.c b/arch/arm/kvm/coproc.c
> > index f3d88dc..f4ad2f2 100644
> > --- a/arch/arm/kvm/coproc.c
> > +++ b/arch/arm/kvm/coproc.c
> > @@ -447,12 +447,22 @@ static int emulate_cp15(struct kvm_vcpu *vcpu,
> > r = find_reg(params, cp15_regs, ARRAY_SIZE(cp15_regs));
> >
> > if (likely(r)) {
> > +   unsigned long pc = *vcpu_pc(vcpu);
> > +
> > /* If we don't have an accessor, we should never get here! 
> > */
> > BUG_ON(!r->access);
> >
> > if (likely(r->access(vcpu, params, r))) {
> > -   /* Skip instruction, since it was emulated */
> > -   kvm_skip_instr(vcpu, 
> > kvm_vcpu_trap_il_is32bit(vcpu));
> > +   /*
> > +* Skip the instruction if it was emulated
> > +* without PC having changed. This allows us
> > +* to detect a fault being injected
> > +* (incrementing the PC here would cause the
> > +* vcpu to skip the first instruction of its
> > +* fault handler).
> > +*/
> > +   if (pc == *vcpu_pc(vcpu))
> > +   kvm_skip_instr(vcpu, 
> > kvm_vcpu_trap_il_is32bit(vcpu));
> 
> Won't this result in our incorrectly skipping the first insn
> in the fault handler if the original offending instruction
> was itself the first insn in the fault handler?
> 
Wouldn't that then loop with the exception forever?

-Christoffer


[PATCH 0/2] kvmtool: replace documentations stubs with manpage

2015-12-22 Thread Andre Przywara
Hi,

as I got annoyed with the availability and quality of the
documentation and always wanted to write a manpage, I just took this
first step by replacing the stub text files in the Documentation
directory with a manpage.
This is clearly only the beginning; there is more functionality which is
currently not documented at all (networking comes to mind).

Cheers,
Andre.

Andre Przywara (2):
  Add a rudimentary manpage
  Documentation: remove documentation stubs and common-cmds.h generation

 .gitignore|   1 -
 Documentation/kvm-balloon.txt |  24 -
 Documentation/kvm-debug.txt   |  16 ---
 Documentation/kvm-list.txt|  16 ---
 Documentation/kvm-pause.txt   |  16 ---
 Documentation/kvm-resume.txt  |  16 ---
 Documentation/kvm-run.txt |  62 
 Documentation/kvm-sandbox.txt |  16 ---
 Documentation/kvm-setup.txt   |  15 ---
 Documentation/kvm-stat.txt|  19 
 Documentation/kvm-stop.txt|  16 ---
 Documentation/kvm-version.txt |  21 
 Documentation/kvmtool.1   | 222 ++
 Makefile  |  10 --
 command-list.txt  |  15 ---
 include/common-cmds.h |  19 
 16 files changed, 241 insertions(+), 263 deletions(-)
 delete mode 100644 Documentation/kvm-balloon.txt
 delete mode 100644 Documentation/kvm-debug.txt
 delete mode 100644 Documentation/kvm-list.txt
 delete mode 100644 Documentation/kvm-pause.txt
 delete mode 100644 Documentation/kvm-resume.txt
 delete mode 100644 Documentation/kvm-run.txt
 delete mode 100644 Documentation/kvm-sandbox.txt
 delete mode 100644 Documentation/kvm-setup.txt
 delete mode 100644 Documentation/kvm-stat.txt
 delete mode 100644 Documentation/kvm-stop.txt
 delete mode 100644 Documentation/kvm-version.txt
 create mode 100644 Documentation/kvmtool.1
 delete mode 100644 command-list.txt
 create mode 100644 include/common-cmds.h

-- 
2.5.1



[PATCH 2/2] Documentation: remove documentation stubs and common-cmds.h generation

2015-12-22 Thread Andre Przywara
Now that we have a manpage in place, we can get rid of the
manpage-style text files in the Documentation directory.
This also allows us to get rid of the crude common-cmds.h generation,
which relied on these files and on a command-list.txt file.
Instead, include in the source tree the version of that header file
generated from the current HEAD.

Signed-off-by: Andre Przywara 
---
 .gitignore|  1 -
 Documentation/kvm-balloon.txt | 24 -
 Documentation/kvm-debug.txt   | 16 ---
 Documentation/kvm-list.txt| 16 ---
 Documentation/kvm-pause.txt   | 16 ---
 Documentation/kvm-resume.txt  | 16 ---
 Documentation/kvm-run.txt | 62 ---
 Documentation/kvm-sandbox.txt | 16 ---
 Documentation/kvm-setup.txt   | 15 ---
 Documentation/kvm-stat.txt| 19 -
 Documentation/kvm-stop.txt| 16 ---
 Documentation/kvm-version.txt | 21 ---
 Makefile  | 10 ---
 command-list.txt  | 15 ---
 include/common-cmds.h | 19 +
 15 files changed, 19 insertions(+), 263 deletions(-)
 delete mode 100644 Documentation/kvm-balloon.txt
 delete mode 100644 Documentation/kvm-debug.txt
 delete mode 100644 Documentation/kvm-list.txt
 delete mode 100644 Documentation/kvm-pause.txt
 delete mode 100644 Documentation/kvm-resume.txt
 delete mode 100644 Documentation/kvm-run.txt
 delete mode 100644 Documentation/kvm-sandbox.txt
 delete mode 100644 Documentation/kvm-setup.txt
 delete mode 100644 Documentation/kvm-stat.txt
 delete mode 100644 Documentation/kvm-stop.txt
 delete mode 100644 Documentation/kvm-version.txt
 delete mode 100644 command-list.txt
 create mode 100644 include/common-cmds.h

diff --git a/.gitignore b/.gitignore
index a16a97f..f21a0bd 100644
--- a/.gitignore
+++ b/.gitignore
@@ -6,7 +6,6 @@
 *.swp
 .cscope
 tags
-include/common-cmds.h
 tests/boot/boot_test.iso
 tests/boot/rootfs/
 guest/init
diff --git a/Documentation/kvm-balloon.txt b/Documentation/kvm-balloon.txt
deleted file mode 100644
index efc0a87..000
--- a/Documentation/kvm-balloon.txt
+++ /dev/null
@@ -1,24 +0,0 @@
-lkvm-balloon(1)
-
-
-NAME
-
-lkvm-balloon - Inflate or deflate the virtio balloon
-
-SYNOPSIS
-
-[verse]
-'lkvm balloon [command] [size] [instance]'
-
-DESCRIPTION

-The command inflates or deflates the virtio balloon located in the
-specified instance.
-For a list of running instances see 'lkvm list'.
-
-Command can be either 'inflate' or 'deflate'. Inflate increases the
-size of the balloon, thus decreasing the amount of virtual RAM available
-for the guest. Deflation returns previously inflated memory back to the
-guest.
-
-size is specified in Mb.
diff --git a/Documentation/kvm-debug.txt b/Documentation/kvm-debug.txt
deleted file mode 100644
index a8eb2c0..000
--- a/Documentation/kvm-debug.txt
+++ /dev/null
@@ -1,16 +0,0 @@
-lkvm-debug(1)
-
-
-NAME
-
-lkvm-debug - Print debug information from a running instance
-
-SYNOPSIS
-
-[verse]
-'lkvm debug [instance]'
-
-DESCRIPTION

-The command prints debug information from a running instance.
-For a list of running instances see 'lkvm list'.
diff --git a/Documentation/kvm-list.txt b/Documentation/kvm-list.txt
deleted file mode 100644
index a245607..000
--- a/Documentation/kvm-list.txt
+++ /dev/null
@@ -1,16 +0,0 @@
-lkvm-list(1)
-
-
-NAME
-
-lkvm-list - Print a list of running instances on the host.
-
-SYNOPSIS
-
-[verse]
-'lkvm list'
-
-DESCRIPTION

-This command prints a list of running instances on the host which
-belong to the user who currently ran 'lkvm list'.
diff --git a/Documentation/kvm-pause.txt b/Documentation/kvm-pause.txt
deleted file mode 100644
index 1ea2a23..000
--- a/Documentation/kvm-pause.txt
+++ /dev/null
@@ -1,16 +0,0 @@
-lkvm-pause(1)
-
-
-NAME
-
-lkvm-pause - Pause the virtual machine
-
-SYNOPSIS
-
-[verse]
-'lkvm pause [instance]'
-
-DESCRIPTION

-The command pauses a virtual machine.
-For a list of running instances see 'lkvm list'.
diff --git a/Documentation/kvm-resume.txt b/Documentation/kvm-resume.txt
deleted file mode 100644
index a36c4df..000
--- a/Documentation/kvm-resume.txt
+++ /dev/null
@@ -1,16 +0,0 @@
-lkvm-resume(1)
-
-
-NAME
-
-lkvm-resume - Resume the virtual machine
-
-SYNOPSIS
-
-[verse]
-'lkvm resume [instance]'
-
-DESCRIPTION

-The command resumes a virtual machine.
-For a list of running instances see 'lkvm list'.
diff --git a/Documentation/kvm-run.txt b/Documentation/kvm-run.txt
deleted file mode 100644
index 8ddf470..000
--- a/Documentation/kvm-run.txt
+++ /dev/null
@@ -1,62 +0,0 @@
-lkvm-run(1)
-
-
-NAME
-
-lkvm-run - Start the virtual machine
-
-SYNOPSIS
-
-[verse]
-'lkvm 

[PATCH 1/2] Add a rudimentary manpage

2015-12-22 Thread Andre Przywara
The kvmtool documentation is somewhat lacking; it is also not easily
accessible when it lives only in the source tree.
Add a good ol' manpage to document at least the basic commands and
their options.
This level of documentation matches what is already there in the
Documentation directory and is meant to be extended.

Signed-off-by: Andre Przywara 
---
 Documentation/kvmtool.1 | 222 
 1 file changed, 222 insertions(+)
 create mode 100644 Documentation/kvmtool.1

diff --git a/Documentation/kvmtool.1 b/Documentation/kvmtool.1
new file mode 100644
index 000..aecb2dc
--- /dev/null
+++ b/Documentation/kvmtool.1
@@ -0,0 +1,222 @@
+.\" Manpage for kvmtool
+.\" Copyright (C) 2015 by Andre Przywara 
+.TH kvmtool 1 "11 Nov 2015" "0.1" "kvmtool man page"
+.SH NAME
+kvmtool \- running KVM guests
+.SH SYNOPSIS
+lkvm COMMAND [ARGS]
+.SH DESCRIPTION
+kvmtool is a userland tool for creating and controlling KVM guests.
+.SH "KVMTOOL COMMANDS"
+.sp
+.PP
.B run -k <kernel image> ...
+.RS 4
+Run a guest.
+.sp
.B \-k, \-\-kernel <kernel image>
+.RS 4
+The virtual machine kernel.
+.RE
+.sp
.B \-c, \-\-cpus <number of cpus>
+.RS 4
+The number of virtual CPUs to run.
+.RE
+.sp
.B \-m, \-\-mem <size in MiB>
+.RS 4
+Virtual machine memory size in MiB.
+.RE
+.sp
.B \-p, \-\-params <kernel command line>
+.RS 4
+Additional kernel command line arguments.
+.RE
+.sp
.B \-i, \-\-initrd <initrd image>
+.RS 4
+Initial RAM disk image.
+.RE
+.sp
.B \-d, \-\-disk <disk image or rootfs directory>
+.RS 4
+A disk image file or a rootfs directory.
+.RE
+.sp
+.B \-\-console serial|virtio|hv
+.RS 4
+Console to use.
+.RE
+.sp
.B \-\-dev <kvm device file>
+.RS 4
+KVM device file (instead of the default /dev/kvm).
+.RE
+.sp
+.B \-\-debug
+.RS 4
+Enable debug messages.
+.RE
+.sp
+.B \-\-debug-single-step
+.RS 4
+Enable single stepping.
+.RE
+.sp
+.B \-\-debug-ioport
+.RS 4
+Enable ioport debugging.
+.RE
+.RE
+.PP
.B setup <name>
+.RS 4
Set up a new virtual machine. This creates a new rootfs in the .lkvm
folder of your home directory.
+.RE
+.PP
.B pause \-\-all|\-\-name <instance name>
+.RS 4
+Pause a virtual machine.
+.sp
+.B \-a, \-\-all
+.RS 4
+Pause all running instances.
+.RE
+.sp
.B \-n, \-\-name <instance name>
+.RS 4
Pause the specified instance. For a list of running instances, see \fIlkvm list\fR.
+.RE
+.RE
+.PP
.B resume --all|--name <instance name>
+.RS 4
+Resume a previously paused virtual machine.
+.sp
+.B \-a, \-\-all
+.RS 4
+Resume all running instances.
+.RE
+.sp
.B \-n, \-\-name <instance name>
+.RS 4
Resume the specified instance. For a list of running instances, see \fIlkvm list\fR.
+.RE
+.RE
+.PP
+.B list [\-i] [\-r]
+.RS 4
+Print a list of running instances on the host. This is restricted to instances
+started by the current user, as it looks in the .lkvm folder in your home
+directory to find the socket files.
+.sp
+.B \-i, \-\-run
+.RS 4
+List all running instances.
+.RE
+.sp
+.B \-r, \-\-rootfs
+.RS 4
+List rootfs instances.
+.RE
+.RE
+.PP
.B debug --all|--name <instance name> [--dump] [--nmi <vcpu>] [--sysrq <key>]
+.RS 4
+Print debug information from a running VM instance.
+.sp
+.B \-a, \-\-all
+.RS 4
+Debug all running instances.
+.RE
+.PP
.B \-n, \-\-name <instance name>
+.RS 4
+Debug the specified instance.
+.RE
+.sp
+.B \-d, \-\-dump
+.RS 4
Generate a debug dump from the guest.
+.RE
+.PP
.B \-m, \-\-nmi <vcpu number>
+.RS 4
+Generate an NMI on the specified virtual CPU.
+.RE
+.PP
.B \-s, \-\-sysrq <sysrq key>
+.RS 4
+Inject a Linux sysrq into the guest.
+.RE
+.RE
+.PP
.B balloon \-\-name <instance name> \-\-inflate|\-\-deflate <size in MB>
+.RS 4
+This command inflates or deflates the virtio balloon located in the
+specified instance.
+\-\-inflate increases the size of the balloon, thus \fIdecreasing\fR the
+amount of virtual RAM available for the guest. \-\-deflate returns previously
+inflated memory back to the guest.
+.sp
.B \-n, \-\-name <instance name>
+.RS 4
Balloon the specified instance. For a list of all instances, see \fI"lkvm list"\fR.
+.RE
+.PP
.B \-i, \-\-inflate <size in MB>
+.RS 4
Inflates the balloon by the specified number of Megabytes. This decreases the
amount of usable memory in the guest.
+.RE
+.PP
.B \-d, \-\-deflate <size in MB>
+.RS 4
Deflates the balloon by the specified number of Megabytes. This increases the
amount of usable memory in the guest.
+.RE
+.RE
+.PP
.B stop --all|--name <instance name>
+.RS 4
+Stop a running instance.
+.sp
+.B \-a, \-\-all
+.RS 4
+Stop all running instances.
+.RE
+.sp
.B \-n, \-\-name <instance name>
+.RS 4
+Stop the specified instance. For a list of running instances, see \fI lkvm 
list\fR.
+.RE
+.RE
+.PP
.B stat \-\-all|\-\-name <instance name> [\-m]
+.RS 4
+Print statistics about a running instance.
+.sp
+.B \-m, \-\-memory
+.RS 4
+Display memory statistics.
+.RE
+.RE
+.PP
+.B sandbox (\fIlkvm run arguments\fR) \-\- [sandboxed command]
+.RS 4
Run a command in a sandboxed guest. Kvmtool will inject a special init
binary which will do an initial setup of the guest Linux and then
launch a shell script with the specified command. When this command ends,
the guest will be shut down.
+.RE
+.SH EXAMPLES
+.RS 4
+\fB$\fR lkvm run -k bzImage
+.RE
+.SH SEE ALSO
+qemu(1), kvm(4)
+.SH BUGS
+.SH AUTHOR

Re: [RFC PATCH 2/5] KVM: add KVM_EXIT_MSR exit reason and capability.

2015-12-22 Thread 'Roman Kagan'
On Tue, Dec 22, 2015 at 03:51:52PM +0300, Pavel Fedin wrote:
>  Hello!
> 
> > > 1. Is there any real need to distinguish between KVM_EXIT_MSR_WRITE and
> > KVM_EXIT_MSR_AFTER_WRITE ? IMHO from userland's point of view these are the 
> > same.
> > 
> > Indeed.  Perhaps the kernel can set .handled to true to let userspace
> > know it already took care of it, instead of introducing yet another
> > exit_reason.  The field would need to be marked in/out, then.
> 
>  I'm not sure that you need even this. Anyway, particular MSRs are 
> function-specific, and if you're emulating an MSR in userspace,
> then, i believe, you know the function behind it. And it's IMHO safe to just 
> know that SynIC MSRs have some extra handling in
> kernel. And i believe this has no direct impact on userland's behavior.

It has: unlike the scenario that was the original motivation for Peter's
patches, where the the userspace wanted to handle register accesses
which the kernel *didn't*, in case of SynIC the userspace wants do
something about MSR accesses *only* if the kernel *also* handles them.

I guess that was the reason why Paolo suggested an extra exit_reason,
and I think .handled field can be used to pass that information instead.

You're probably right that, at least in the SynIC case, it should be safe to
assume that, if all the SynIC setup succeeded, the corresponding MSR
accesses would only trigger exits when the kernel processed them
appropriately.

But the proposed use of .handled costs basically nothing, and it may
prove useful in general (as a consistency proof, if anything).
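
To make that concrete, here is a minimal userspace sketch.  Everything in
it is hypothetical: KVM_EXIT_MSR and the payload layout (including the
.handled flag) are only proposed by this RFC and are not in <linux/kvm.h>,
so the struct below is a local stand-in, not the real ABI.

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Stand-in for the RFC's proposed kvm_run exit payload -- not a real uapi. */
struct msr_exit {
	uint32_t index;		/* MSR number */
	uint64_t data;		/* value being written */
	bool write;		/* true for wrmsr, false for rdmsr */
	bool handled;		/* set by the kernel if it already handled the access */
};

/*
 * SynIC-style handling: userspace only mirrors state for writes the kernel
 * has already accepted, and emulates the remaining accesses entirely itself.
 */
static void handle_msr_exit(const struct msr_exit *msr)
{
	if (msr->write && msr->handled)
		printf("kernel handled wrmsr 0x%x = 0x%llx, syncing userspace state\n",
		       (unsigned)msr->index, (unsigned long long)msr->data);
	else if (msr->write)
		printf("emulating wrmsr 0x%x entirely in userspace\n",
		       (unsigned)msr->index);
	else
		printf("emulating rdmsr 0x%x in userspace\n", (unsigned)msr->index);
}

int main(void)
{
	struct msr_exit example = {
		.index = 0x40000090,	/* example index only */
		.data = 0x1,
		.write = true,
		.handled = true,
	};

	handle_msr_exit(&example);
	return 0;
}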

Roman.


[GIT PULL] KVM fixes for v4.4-rc7

2015-12-22 Thread Paolo Bonzini
Linus,

The following changes since commit 6764e5ebd5c62236d082f9ae030674467d0b2779:

  Merge tag 'vfio-v4.4-rc5' of git://github.com/awilliam/linux-vfio (2015-12-09 
16:52:12 -0800)

are available in the git repository at:


  git://git.kernel.org/pub/scm/virt/kvm/kvm.git tags/for-linus

for you to fetch changes up to 0185604c2d82c560dab2f2933a18f797e74ab5a8:

  KVM: x86: Reload pit counters for all channels when restoring state 
(2015-12-22 15:36:26 +0100)


KVM/ARM fixes for v4.4-rc7

- A series of fixes to the MTRR emulation, tested in the BZ by several users
  so they should be safe this late

- A fix for a division by zero

- Two very simple ARM and PPC fixes


Alexis Dambricourt (1):
  KVM: MTRR: fix fixed MTRR segment look up

Andrew Honig (1):
  KVM: x86: Reload pit counters for all channels when restoring state

Christoffer Dall (1):
  KVM: arm/arm64: vgic: Fix kvm_vgic_map_is_active's dist check

Haozhong Zhang (1):
  KVM: VMX: Fix host initiated access to guest MSR_TSC_AUX

Paolo Bonzini (5):
  kvm: x86: move tracepoints outside extended quiescent state
  Merge branch 'kvm-ppc-fixes' of git://git.kernel.org/.../paulus/powerpc 
into kvm-master
  Merge tag 'kvm-arm-for-v4.4-rc6' of 
git://git.kernel.org/.../kvmarm/kvmarm into HEAD
  KVM: MTRR: observe maxphyaddr from guest CPUID, not host
  KVM: MTRR: treat memory as writeback if MTRR is disabled in guest CPUID

Paul Mackerras (1):
  KVM: PPC: Book3S HV: Prohibit setting illegal transaction state in MSR

 arch/powerpc/kvm/book3s_hv.c |  6 ++
 arch/x86/kvm/cpuid.h |  8 
 arch/x86/kvm/mtrr.c  | 25 +++--
 arch/x86/kvm/svm.c   |  4 ++--
 arch/x86/kvm/vmx.c   |  7 ---
 arch/x86/kvm/x86.c   | 12 
 virt/kvm/arm/vgic.c  |  2 +-
 7 files changed, 48 insertions(+), 16 deletions(-)


Re: [PATCH 1/2] arm: KVM: Do not update PC if the trap handler has updated it

2015-12-22 Thread Peter Maydell
On 22 December 2015 at 14:39, Christoffer Dall
 wrote:
> On Tue, Dec 22, 2015 at 11:08:10AM +, Peter Maydell wrote:
>> Won't this result in our incorrectly skipping the first insn
>> in the fault handler if the original offending instruction
>> was itself the first insn in the fault handler?
>>
> Wouldn't that then loop with the exception forever?

Yes, but so would real hardware...

thanks
-- PMM


[PATCH v3] vfio: Include No-IOMMU mode

2015-12-22 Thread Alex Williamson
There is really no way to safely give a user full access to a DMA
capable device without an IOMMU to protect the host system.  There is
also no way to provide DMA translation, for use cases such as device
assignment to virtual machines.  However, there are still those users
that want userspace drivers even under those conditions.  The UIO
driver exists for this use case, but does not provide the degree of
device access and programming that VFIO has.  In an effort to avoid
code duplication, this introduces a No-IOMMU mode for VFIO.

This mode requires building VFIO with CONFIG_VFIO_NOIOMMU and enabling
the "enable_unsafe_noiommu_mode" option on the vfio driver.  This
should make it very clear that this mode is not safe.  Additionally,
CAP_SYS_RAWIO privileges are necessary to work with groups and
containers using this mode.  Groups making use of this support are
named /dev/vfio/noiommu-$GROUP and can only make use of the special
VFIO_NOIOMMU_IOMMU for the container.  Use of this mode, specifically
binding a device without a native IOMMU group to a VFIO bus driver
will taint the kernel and should therefore not be considered
supported.  This patch includes no-iommu support for the vfio-pci bus
driver only.
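
For illustration, below is a minimal sketch of the intended userspace
flow.  The group number and PCI address are placeholders, the host is
assumed to have vfio loaded with enable_unsafe_noiommu_mode=1 and the
caller to have CAP_SYS_RAWIO; VFIO_NOIOMMU_IOMMU comes from the uapi
addition in this patch, everything else is the existing VFIO ioctl
interface.

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/vfio.h>

int main(void)
{
	struct vfio_group_status status = { .argsz = sizeof(status) };
	int container, group, device;

	container = open("/dev/vfio/vfio", O_RDWR);
	if (container < 0)
		return 1;

	/* The extension is advertised whenever the module option is enabled. */
	if (!ioctl(container, VFIO_CHECK_EXTENSION, VFIO_NOIOMMU_IOMMU)) {
		fprintf(stderr, "no-iommu mode not available\n");
		return 1;
	}

	/* No-IOMMU groups get a distinct device node name on purpose. */
	group = open("/dev/vfio/noiommu-42", O_RDWR);	/* 42 is a placeholder */
	if (group < 0)
		return 1;

	ioctl(group, VFIO_GROUP_GET_STATUS, &status);
	if (!(status.flags & VFIO_GROUP_FLAGS_VIABLE))
		return 1;

	ioctl(group, VFIO_GROUP_SET_CONTAINER, &container);
	ioctl(container, VFIO_SET_IOMMU, VFIO_NOIOMMU_IOMMU);

	/* Device access works as usual; DMA mapping is simply not available. */
	device = ioctl(group, VFIO_GROUP_GET_DEVICE_FD, "0000:01:00.0");
	printf("got device fd %d\n", device);

	close(group);
	close(container);
	return 0;
}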

Signed-off-by: Alex Williamson 
Acked-by: Michael S. Tsirkin 
---

v3: Version 2 was dropped from kernel v4.4 due to lack of a user.  We
now have a working DPDK port to this interface, so I'm proposing
it again for v4.5.  The changes since v2 can be found split out
in the dpdk archive here:

http://dpdk.org/ml/archives/dev/2015-December/030561.html

The problem was that the NOIOMMU extension was only advertised
once a group was attached to a container.  While we want the
no-iommu backend to be used exclusively for no-iommu groups, we
should still advertise it when the module option is enabled.
Treating the no-iommu iommu driver as less of a special case
accomplishes this.  Also fixed a naming mismatch between the module
parameter and its description and tagged a struct as const.


 drivers/vfio/Kconfig|   15 
 drivers/vfio/pci/vfio_pci.c |8 +-
 drivers/vfio/vfio.c |  181 ++-
 include/linux/vfio.h|3 +
 include/uapi/linux/vfio.h   |7 ++
 5 files changed, 207 insertions(+), 7 deletions(-)

diff --git a/drivers/vfio/Kconfig b/drivers/vfio/Kconfig
index 850d86c..da6e2ce 100644
--- a/drivers/vfio/Kconfig
+++ b/drivers/vfio/Kconfig
@@ -31,6 +31,21 @@ menuconfig VFIO
 
  If you don't know what to do here, say N.
 
+menuconfig VFIO_NOIOMMU
+   bool "VFIO No-IOMMU support"
+   depends on VFIO
+   help
+ VFIO is built on the ability to isolate devices using the IOMMU.
+ Only with an IOMMU can userspace access to DMA capable devices be
+ considered secure.  VFIO No-IOMMU mode enables IOMMU groups for
+ devices without IOMMU backing for the purpose of re-using the VFIO
+ infrastructure in a non-secure mode.  Use of this mode will result
+ in an unsupportable kernel and will therefore taint the kernel.
+ Device assignment to virtual machines is also not possible with
+ this mode since there is no IOMMU to provide DMA translation.
+
+ If you don't know what to do here, say N.
+
 source "drivers/vfio/pci/Kconfig"
 source "drivers/vfio/platform/Kconfig"
 source "virt/lib/Kconfig"
diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
index 56bf6db..2760a7b 100644
--- a/drivers/vfio/pci/vfio_pci.c
+++ b/drivers/vfio/pci/vfio_pci.c
@@ -940,13 +940,13 @@ static int vfio_pci_probe(struct pci_dev *pdev, const 
struct pci_device_id *id)
if (pdev->hdr_type != PCI_HEADER_TYPE_NORMAL)
return -EINVAL;
 
-	group = iommu_group_get(&pdev->dev);
+	group = vfio_iommu_group_get(&pdev->dev);
if (!group)
return -EINVAL;
 
vdev = kzalloc(sizeof(*vdev), GFP_KERNEL);
if (!vdev) {
-   iommu_group_put(group);
+	vfio_iommu_group_put(group, &pdev->dev);
return -ENOMEM;
}
 
@@ -957,7 +957,7 @@ static int vfio_pci_probe(struct pci_dev *pdev, const 
struct pci_device_id *id)
 
	ret = vfio_add_group_dev(&pdev->dev, &vfio_pci_ops, vdev);
	if (ret) {
-		iommu_group_put(group);
+		vfio_iommu_group_put(group, &pdev->dev);
kfree(vdev);
return ret;
}
@@ -993,7 +993,7 @@ static void vfio_pci_remove(struct pci_dev *pdev)
if (!vdev)
return;
 
-   iommu_group_put(pdev->dev.iommu_group);
+	vfio_iommu_group_put(pdev->dev.iommu_group, &pdev->dev);
kfree(vdev);
 
if (vfio_pci_is_vga(pdev)) {
diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
index 6070b79..5c7ebf2 100644
--- a/drivers/vfio/vfio.c
+++ b/drivers/vfio/vfio.c
@@ -62,6 +62,7 @@ struct vfio_container {
struct 

Re: [kvm-unit-tests PATCH 1/3] run_tests.sh: reduce return code ambiguity

2015-12-22 Thread Radim Krčmář
2015-12-21 13:35-0600, Andrew Jones:
> On Mon, Dec 21, 2015 at 05:31:24PM +0100, Radim Krčmář wrote:
> > 2015-12-17 14:10-0600, Andrew Jones:
>>  > 128 = exited because of signal $? - 128
>>  * = unit-test failed
>> 
>> (Signal 0 is not used, so we could map 128 to mean "debug-exit probably
>>  wasn't called", but others might not understand our signal convention.
> 
> I think we want 128 to be the beginning of signal space, which goes all
> the way up to 255, in order to allow exit code masking to work.
> 
>>  Anyway, it'd be best for us to start at 200, for `case $? in 2??)` ...)
> 
> Start what at 200?

Signals, signal = $? - 200.  Shells default to decimal representation of
numbers, so using binary steps doesn't give an advantage and 55 is still
plenty of space.  (I deplore an elif cascade on the same variable, but we
can always convert the $? to binary/octal/hex, for `case` decoding. :])

>I think we have everything covered above. The mapping
> looks like this
> 
> 0 = success
> 1-63  = unit test failure code
> 64-127= test suite failure code

77 is not that easy to categorize -- we want to return it from both.

> 128-255   = signal
> 
> which sounds good to me.

To me as well.
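
For illustration, a caller of the runner could decode that mapping roughly
like below (the script path and the use of system() are assumptions for
the example, not part of the proposal):

#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>

int main(void)
{
	int status = system("./run_tests.sh");	/* path is an assumption */
	int code;

	if (status == -1 || !WIFEXITED(status))
		return 1;

	code = WEXITSTATUS(status);
	if (code == 0)
		printf("success\n");
	else if (code <= 63)
		printf("unit test failure, code %d\n", code);
	else if (code <= 127)
		printf("test suite failure, code %d\n", code);
	else
		printf("killed by signal %d\n", code - 128);

	return code;
}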

>> > Signed-off-by: Andrew Jones 
>> > ---
>> > diff --git a/run_tests.sh b/run_tests.sh
>> > @@ -54,10 +55,32 @@ function run()
>> >  
>> >  # extra_params in the config file may contain backticks that need to 
>> > be
>> >  # expanded, so use eval to start qemu
>> > -eval $cmdline >> test.log
>> > +errlog=$(mktemp)
>> > +eval $cmdline >> test.log 2> $errlog
>> | [...]
>> |  cat $errlog >> test.log
>> 
>> This assumes that stderr is always after stdout,
> 
> True. I'm not sure that matters when the unit test, which only uses stdout
> will always output stuff serially with qemu, which could output a mix. But
> your version below is fine by me if we want to pick up the need for the
> pipe and tee.

Yeah, I assume that QEMU can warn during the test, or interact with its
own stdout in an ordered manner.  I don't think it matters much, but
there isn't a significant drawback.

>>   eval $cmdline 2>&1 >> test.log | tee $errlog >> test.log
>> 
>> has a chance to print lines in wrong order too, but I think it's going
>> to be closer to the original.
> 
> I'll play with it and send a v2 soon.

Thanks, though I am quite distracted during the end of the year, so
"soon" won't be truly appreciated. :)


[PATCH v4] vfio: Include No-IOMMU mode

2015-12-22 Thread Alex Williamson
There is really no way to safely give a user full access to a DMA
capable device without an IOMMU to protect the host system.  There is
also no way to provide DMA translation, for use cases such as device
assignment to virtual machines.  However, there are still those users
that want userspace drivers even under those conditions.  The UIO
driver exists for this use case, but does not provide the degree of
device access and programming that VFIO has.  In an effort to avoid
code duplication, this introduces a No-IOMMU mode for VFIO.

This mode requires building VFIO with CONFIG_VFIO_NOIOMMU and enabling
the "enable_unsafe_noiommu_mode" option on the vfio driver.  This
should make it very clear that this mode is not safe.  Additionally,
CAP_SYS_RAWIO privileges are necessary to work with groups and
containers using this mode.  Groups making use of this support are
named /dev/vfio/noiommu-$GROUP and can only make use of the special
VFIO_NOIOMMU_IOMMU for the container.  Use of this mode, specifically
binding a device without a native IOMMU group to a VFIO bus driver
will taint the kernel and should therefore not be considered
supported.  This patch includes no-iommu support for the vfio-pci bus
driver only.

Signed-off-by: Alex Williamson 
Acked-by: Michael S. Tsirkin 
---

v4: Fix build without CONFIG_VFIO_NOIOMMU (oops).  Also avoid a local
noiommu variable in vfio_create_group() to avoid scope confusion
with the global of the same name.

 drivers/vfio/Kconfig|   15 
 drivers/vfio/pci/vfio_pci.c |8 +-
 drivers/vfio/vfio.c |  184 ++-
 include/linux/vfio.h|3 +
 include/uapi/linux/vfio.h   |7 ++
 5 files changed, 210 insertions(+), 7 deletions(-)

diff --git a/drivers/vfio/Kconfig b/drivers/vfio/Kconfig
index 850d86c..da6e2ce 100644
--- a/drivers/vfio/Kconfig
+++ b/drivers/vfio/Kconfig
@@ -31,6 +31,21 @@ menuconfig VFIO
 
  If you don't know what to do here, say N.
 
+menuconfig VFIO_NOIOMMU
+   bool "VFIO No-IOMMU support"
+   depends on VFIO
+   help
+ VFIO is built on the ability to isolate devices using the IOMMU.
+ Only with an IOMMU can userspace access to DMA capable devices be
+ considered secure.  VFIO No-IOMMU mode enables IOMMU groups for
+ devices without IOMMU backing for the purpose of re-using the VFIO
+ infrastructure in a non-secure mode.  Use of this mode will result
+ in an unsupportable kernel and will therefore taint the kernel.
+ Device assignment to virtual machines is also not possible with
+ this mode since there is no IOMMU to provide DMA translation.
+
+ If you don't know what to do here, say N.
+
 source "drivers/vfio/pci/Kconfig"
 source "drivers/vfio/platform/Kconfig"
 source "virt/lib/Kconfig"
diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
index 56bf6db..2760a7b 100644
--- a/drivers/vfio/pci/vfio_pci.c
+++ b/drivers/vfio/pci/vfio_pci.c
@@ -940,13 +940,13 @@ static int vfio_pci_probe(struct pci_dev *pdev, const 
struct pci_device_id *id)
if (pdev->hdr_type != PCI_HEADER_TYPE_NORMAL)
return -EINVAL;
 
-	group = iommu_group_get(&pdev->dev);
+	group = vfio_iommu_group_get(&pdev->dev);
if (!group)
return -EINVAL;
 
vdev = kzalloc(sizeof(*vdev), GFP_KERNEL);
if (!vdev) {
-   iommu_group_put(group);
+	vfio_iommu_group_put(group, &pdev->dev);
return -ENOMEM;
}
 
@@ -957,7 +957,7 @@ static int vfio_pci_probe(struct pci_dev *pdev, const 
struct pci_device_id *id)
 
	ret = vfio_add_group_dev(&pdev->dev, &vfio_pci_ops, vdev);
	if (ret) {
-		iommu_group_put(group);
+		vfio_iommu_group_put(group, &pdev->dev);
kfree(vdev);
return ret;
}
@@ -993,7 +993,7 @@ static void vfio_pci_remove(struct pci_dev *pdev)
if (!vdev)
return;
 
-   iommu_group_put(pdev->dev.iommu_group);
+	vfio_iommu_group_put(pdev->dev.iommu_group, &pdev->dev);
kfree(vdev);
 
if (vfio_pci_is_vga(pdev)) {
diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
index 6070b79..82f25cc 100644
--- a/drivers/vfio/vfio.c
+++ b/drivers/vfio/vfio.c
@@ -62,6 +62,7 @@ struct vfio_container {
struct rw_semaphore group_lock;
	struct vfio_iommu_driver	*iommu_driver;
	void				*iommu_data;
+	bool				noiommu;
 };
 
 struct vfio_unbound_dev {
@@ -84,6 +85,7 @@ struct vfio_group {
	struct list_head		unbound_list;
	struct mutex			unbound_lock;
	atomic_t			opened;
+	bool				noiommu;
 };
 
 struct vfio_device {
@@ -95,6 +97,128 @@ struct vfio_device {
	void				*device_data;
 };

RE: [RFC PATCH 2/5] KVM: add KVM_EXIT_MSR exit reason and capability.

2015-12-22 Thread Pavel Fedin
 Hello!

> It has: unlike the scenario that was the original motivation for Peter's
> patches, where the the userspace wanted to handle register accesses
> which the kernel *didn't*, in case of SynIC the userspace wants do
> something about MSR accesses *only* if the kernel *also* handles them.

 Well... I believe that qemu knows if we are instantiating SynIC, and if we
are, it knows that the kernel will do something about it. Otherwise these
registers don't exist, and, by the way, the guest is not expected to touch
them, is it?

> I guess that was the reason why Paolo suggested an extra exit_reason,
> and I think .handled field can be used to pass that information instead.

[skip]

> But the proposed use of .handled costs basically nothing, and it may
> prove useful in general (as a conisistency proof, if anything).

 Well... Maybe... So, I'm OK with it.

Kind regards,
Pavel Fedin
Expert Engineer
Samsung Electronics Research center Russia

