date:20150810



On 10/08/2015 16:14, Jintack Lim wrote:
  Yes, you just use the TSC. :)  However, you first have to check that the
  TSC is consistent across CPUs.  On older machines it's not, but the
  kernel can detect it.
 Thanks, Paolo.
 
 What would be the best way to check if TSC is consistent across CPUs?

You need to have boot_cpu_has(X86_FEATURE_CONSTANT_TSC) and
boot_cpu_has(X86_FEATURE_NONSTOP_TSC).
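
For experiments outside the kernel, a rough userspace approximation of this check is to test the "Invariant TSC" bit, CPUID leaf 0x80000007, EDX bit 8. The sketch below assumes an x86 build with the <cpuid.h> header shipped by GCC and Clang; the helper names are illustrative only. The kernel may also set X86_FEATURE_CONSTANT_TSC from vendor-specific model checks, so this is a narrower test than boot_cpu_has().

```c
#include <stdbool.h>
#include <stdint.h>

/* CPUID leaf 0x80000007, EDX bit 8: "Invariant TSC". */
#define CPUID_APM_LEAF      0x80000007u
#define CPUID_INVARIANT_TSC (1u << 8)

/* Pure helper (illustrative name) so the bit test works on any EDX value. */
static bool edx_has_invariant_tsc(uint32_t edx)
{
	return (edx & CPUID_INVARIANT_TSC) != 0;
}

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>

/* Returns true if the CPU advertises an invariant TSC. */
static bool cpu_has_invariant_tsc(void)
{
	unsigned int eax, ebx, ecx, edx;

	/* __get_cpuid() returns 0 when the requested leaf is unsupported. */
	if (!__get_cpuid(CPUID_APM_LEAF, &eax, &ebx, &ecx, &edx))
		return false;
	return edx_has_invariant_tsc(edx);
}
#else
/* Not an x86 build: there is no TSC to check. */
static bool cpu_has_invariant_tsc(void)
{
	return false;
}
#endif
```

In kernel code the two boot_cpu_has() calls above remain the right test; this is only meant for a quick check from a test program.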

However, I would just use TAI (ktime_get_clocktai).  x86 KVM provides a
paravirtual interface that synchronizes CLOCK_TAI with the host, and
using it is the simplest way to get synchronized times between the host
and the guest.
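
As a point of comparison, the same clock can be read from userspace with clock_gettime(CLOCK_TAI, ...). The sketch below is a userspace analogue, not the kernel call itself; the helper names are illustrative, and the fallback definition of CLOCK_TAI assumes the Linux value 11.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <time.h>

#ifndef CLOCK_TAI
#define CLOCK_TAI 11	/* value used by the Linux UAPI headers */
#endif

/* Convert a timespec to a single nanosecond count. */
static int64_t timespec_to_ns(const struct timespec *ts)
{
	return (int64_t)ts->tv_sec * 1000000000LL + ts->tv_nsec;
}

/* Read CLOCK_TAI; returns 0 on success, -1 if the clock is unavailable. */
static int read_tai_ns(int64_t *out_ns)
{
	struct timespec ts;

	if (clock_gettime(CLOCK_TAI, &ts) != 0)
		return -1;
	*out_ns = timespec_to_ns(&ts);
	return 0;
}
```

Calling read_tai_ns() on the host and in the guest gives two values on the same timescale, which is the property being relied on here.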

Paolo

 Is it synchronized at nanosecond (or even CPU clock cycle) resolution?
 
 To get a synchronized TSC across the host and the guest,
 would just calling rdtscll() in the host and the guest be enough?
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] KVM: x86: zero IDT limit on entry to SMM

2015-08-10 Thread Radim Krčmář

2015-08-07 12:54+0200, Paolo Bonzini:
 The recent BlackHat 2015 presentation The Memory Sinkhole
 mentions that the IDT limit is zeroed on entry to SMM.

Slide 64 of
https://www.blackhat.com/docs/us-15/materials/us-15-Domas-The-Memory-Sinkhole-Unleashing-An-x86-Design-Flaw-Allowing-Universal-Privilege-Escalation.pdf

 This is not documented, and must have changed some time after 2010
 (see http://www.ssi.gouv.fr/uploads/IMG/pdf/IT_Defense_2010_final.pdf).
 KVM was not doing it, but the fix is easy.

This patch also clears the IDT base.  Fetching the original IDT is better
done from SMM saved state (and an anti-exploit based on comparing those
two seems unlikely), so it should be fine.

Reviewed-by: Radim Krčmář <rkrc...@redhat.com>

 Signed-off-by: Paolo Bonzini <pbonz...@redhat.com>
 ---

That takes care of Attack 1.
KVM is likely not vulnerable to Attacks 2 and 3 because of an emergent
security feature.  (A simple modification of kvm-unit-tests shows that
mapping the APIC base on top of real code/data makes the APIC page hidden,
and I expect the SMM memslot to behave similarly.)

Re: [PATCH] kvm:s390:Fix assumption that kvm_set_irq_routing is always run successfully



On 10/08/2015 17:21, nick wrote:
  Seems good.
  
  Paolo

 If it makes it easier for you to trust my patches I can show at least 10 bug
 fixes for other subsystems to prove that I am trying to do this correctly.

That's up to those maintainers...

I definitely see some improvement in your patches, though I still
haven't answered the submissions that in my opinion are a bit more
problematic.

Paolo

Re: [RFC PATCH 1/4] vhost: Introduce a universal thread to serve all users

2015-08-10 Thread Bandan Das

Michael S. Tsirkin <m...@redhat.com> writes:

 On Mon, Jul 13, 2015 at 12:07:32AM -0400, Bandan Das wrote:
 vhost threads are per-device, but in most cases a single thread
 is enough. This change creates a single thread that is used to
 serve all guests.
 
 However, this complicates cgroups associations. The current policy
 is to attach the per-device thread to all cgroups of the parent process
 that the device is associated with. This is no longer possible if we
 have a single thread. So, we end up moving the thread around to
 cgroups of whichever device needs servicing. This is a very
 inefficient protocol but seems to be the only way to integrate
 cgroups support.
 
 Signed-off-by: Razya Ladelsky <ra...@il.ibm.com>
 Signed-off-by: Bandan Das <b...@redhat.com>

 BTW, how does this interact with virtio net MQ?
 It would seem that MQ gains from more parallelism and
 CPU locality.

Hm.. Good point. As of this version, this design will always have
one worker thread servicing a guest. Now suppose we have 10 virtio
queues for a guest, surely, we could benefit from spawning off another
worker just like we are doing in case of a new guest/device with
the devs_per_worker parameter.


Re: [RFC PATCH 1/4] vhost: Introduce a universal thread to serve all users

2015-08-10 Thread Bandan Das

Bandan Das <b...@redhat.com> writes:

 Michael S. Tsirkin <m...@redhat.com> writes:

 On Mon, Jul 13, 2015 at 12:07:32AM -0400, Bandan Das wrote:
 vhost threads are per-device, but in most cases a single thread
 is enough. This change creates a single thread that is used to
 serve all guests.
 
 However, this complicates cgroups associations. The current policy
 is to attach the per-device thread to all cgroups of the parent process
 that the device is associated with. This is no longer possible if we
 have a single thread. So, we end up moving the thread around to
 cgroups of whichever device needs servicing. This is a very
 inefficient protocol but seems to be the only way to integrate
 cgroups support.
 
 Signed-off-by: Razya Ladelsky <ra...@il.ibm.com>
 Signed-off-by: Bandan Das <b...@redhat.com>

 BTW, how does this interact with virtio net MQ?
 It would seem that MQ gains from more parallelism and
 CPU locality.

 Hm.. Good point. As of this version, this design will always have
 one worker thread servicing a guest. Now suppose we have 10 virtio
 queues for a guest, surely, we could benefit from spawning off another
 worker just like we are doing in case of a new guest/device with
 the devs_per_worker parameter.

So, I did a quick smoke test with virtio-net and the Elvis patches.
It seems virtio-net MQ already spawns a new worker thread for every queue,
so the above setup already works! :) I will run some tests and
post back the results.


[Bug 102651] New: vcpuX unhandled rdmsr: 0x570

2015-08-10 Thread bugzilla-daemon

https://bugzilla.kernel.org/show_bug.cgi?id=102651

            Bug ID: 102651
           Summary: vcpuX unhandled rdmsr: 0x570
           Product: Virtualization
           Version: unspecified
    Kernel Version: 4.1.4
          Hardware: All
                OS: Linux
              Tree: Mainline
            Status: NEW
          Severity: normal
          Priority: P1
         Component: kvm
          Assignee: virtualization_...@kernel-bugs.osdl.org
          Reporter: jamespharve...@gmail.com
        Regression: No

Host: Intel Xeon CPU E5335 @ 2.00GHz, stepping 7, microcode 0x6b

Up to date arch, using kernel 4.1.4 (-1 Arch) x86_64

Guest: Up to date arch x86_64

Every time I start the vm, tty0/dmesg gets:

==
[11681.871378] kvm [2469]: vcpu0 unhandled rdmsr: 0x570
[11681.871915] kvm [2469]: vcpu1 unhandled rdmsr: 0x570
[11681.872559] kvm [2469]: vcpu2 unhandled rdmsr: 0x570
[11681.873057] kvm [2469]: vcpu3 unhandled rdmsr: 0x570
[11681.873536] kvm [2469]: vcpu4 unhandled rdmsr: 0x570
[11681.874033] kvm [2469]: vcpu5 unhandled rdmsr: 0x570
[11681.874525] kvm [2469]: vcpu6 unhandled rdmsr: 0x570
==

(It is a 7 virtual cpu system.)

Looks like this was supposed to have been changed a while ago so it shows in
dmesg, but not tty0.  But, it's still showing in tty0.

-- 
You are receiving this mail because:
You are watching the assignee of the bug.

[Bug 102651] vcpuX unhandled rdmsr: 0x570

2015-08-10 Thread bugzilla-daemon

https://bugzilla.kernel.org/show_bug.cgi?id=102651

--- Comment #1 from jamespharve...@gmail.com ---
The libvirt/qemu log shows:

2015-08-11 03:29:38.508+: starting up libvirt version: 1.2.18, qemu
version: 2.3.94
LC_ALL=C PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin
QEMU_AUDIO_DRV=spice /usr/sbin/qemu-system-x86_64 -name servo -S -machine
pc-q35-2.3,accel=kvm,usb=off,vmport=off -cpu
core2duo,+lahf_lm,+dca,+pdcm,+xtpr,+cx16,+tm2,+vmx,+ds_cpl,+dtes64,+pbe,+tm,+ht,+ss,+acpi,+ds
-m 3300 -realtime mlock=off -smp 7,sockets=7,cores=1,threads=1 -uuid
f6defdf1-369a-47e2-9611-92c04a0ff933 -no-user-config -nodefaults -chardev
socket,id=charmonitor,path=/var/lib/libvirt/qemu/servo.monitor,server,nowait
-mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc,driftfix=slew
-global kvm-pit.lost_tick_policy=discard -no-hpet -no-shutdown -global
PIIX4_PM.disable_s3=0 -global PIIX4_PM.disable_s4=0 -boot strict=on -device
i82801b11-bridge,id=pci.1,bus=pcie.0,addr=0x1e -device
pci-bridge,chassis_nr=2,id=pci.2,bus=pci.1,addr=0x1 -device
virtio-serial-pci,id=virtio-serial0,bus=pci.2,addr=0x3 -drive
file=/dev/disk1/servo1,if=none,id=drive-virtio-disk0,format=raw,cache=none,aio=native
-device
virtio-blk-pci,scsi=off,bus=pci.2,addr=0x5,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1
-drive
file=/dev/disk2/servo2,if=none,id=drive-virtio-disk1,format=raw,cache=none,aio=native
-device
virtio-blk-pci,scsi=off,bus=pci.2,addr=0x6,drive=drive-virtio-disk1,id=virtio-disk1,bootindex=2
-drive
file=/dev/disk3/servo3,if=none,id=drive-virtio-disk2,format=raw,cache=none,aio=native
-device
virtio-blk-pci,scsi=off,bus=pci.2,addr=0x7,drive=drive-virtio-disk2,id=virtio-disk2,bootindex=3
-netdev tap,fd=25,id=hostnet0,vhost=on,vhostfd=26 -device
virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:61:7a:4e,bus=pci.2,addr=0x1
-chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0
-chardev spicevmc,id=charchannel0,name=vdagent -device
virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=com.redhat.spice.0
-chardev
socket,id=charchannel1,path=/var/lib/libvirt/qemu/channel/target/servo.org.qemu.guest_agent.0,server,nowait
-device
virtserialport,bus=virtio-serial0.0,nr=2,chardev=charchannel1,id=channel1,name=org.qemu.guest_agent.0
-spice port=5900,addr=127.0.0.1,disable-ticketing,seamless-migration=on -device
qxl-vga,id=video0,ram_size=67108864,vram_size=67108864,vgamem_mb=16,bus=pcie.0,addr=0x1
-device virtio-balloon-pci,id=balloon0,bus=pci.2,addr=0x4 -msg timestamp=on
char device redirected to /dev/pts/0 (label charserial0)


Re: [RFC PATCH 0/4] Shared vhost design

2015-08-10 Thread Bandan Das

Michael S. Tsirkin <m...@redhat.com> writes:

 On Sat, Aug 08, 2015 at 07:06:38PM -0400, Bandan Das wrote:
 Hi Michael,
...

  - does the design address the issue of VM 1 being blocked
(e.g. because it hits swap) and blocking VM 2?
 Good question. I haven't thought of this yet. But IIUC,
 the worker thread will complete VM1's job and then move on to
 executing VM2's scheduled work. It doesn't matter if VM1 is
 blocked currently. I think it would be a problem though if/when
 polling is introduced.

 Sorry, I wasn't clear. If VM1's memory is in swap, attempts to
 access it might block the service thread, so it won't
 complete VM2's job.

Ah ok, I understand now. I am pretty sure the current RFC doesn't
take care of this :) I will add this to my todo list for v2.

Bandan



 
  
  #* Last run with the vCPU and I/O thread(s) pinned, no CPU/memory limit
  #  imposed.
  #  I/O thread runs on CPU 14 or 15 depending on which guest it's serving
  
  There's a simple graph at
  http://people.redhat.com/~bdas/elvis/data/results.png
  that shows how task affinity results in a jump and, even without it,
  as the number of guests increases, the shared vhost design performs
  slightly better.
  
  Observations:
  1. In terms of stock performance, the results are comparable.
  2. However, with a tuned setup, even without polling, we see an
     improvement with the new design.
  3. Making the new design simulate the old behavior would be a matter of
     setting the number of guests per vhost thread to 1.
  4. Maybe setting a per-guest limit on the work being done by a specific
     vhost thread is needed for it to be fair.
  5. cgroup associations need to be figured out. I just slightly hacked the
     current cgroup association mechanism to work with the new model. Ccing
     cgroups for input/comments.
  
  Many thanks to Razya Ladelsky and Eyal Moscovici, IBM for the initial
  patches, the helpful testing suggestions and discussions.
  
  Bandan Das (4):
vhost: Introduce a universal thread to serve all users
vhost: Limit the number of devices served by a single worker thread
cgroup: Introduce a function to compare cgroups
vhost: Add cgroup-aware creation of worker threads
  
   drivers/vhost/net.c    |   6 +-
   drivers/vhost/scsi.c   |  18 ++--
   drivers/vhost/vhost.c  | 272 +++--
   drivers/vhost/vhost.h  |  32 +-
   include/linux/cgroup.h |   1 +
   kernel/cgroup.c        |  40 
   6 files changed, 275 insertions(+), 94 deletions(-)
  
  -- 
  2.4.3

[Bug 102301] Shutting down a Windows 10 virtual machine (with VGA passthrough) causes a hard crash, every time

2015-08-10 Thread bugzilla-daemon

https://bugzilla.kernel.org/show_bug.cgi?id=102301

Will Marler <will.mar...@gmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |UNREPRODUCIBLE

--- Comment #1 from Will Marler <will.mar...@gmail.com> ---
No longer reproduces. Not sure what changed.


Re: [PATCH v4 4/5] KVM: introduce kvm_arch functions for IRQ bypass

On 08/07/2015 10:09 PM, Alex Williamson wrote:
 On Mon, 2015-08-03 at 19:20 +0200, Eric Auger wrote:
 This patch introduces
 - kvm_arch_irq_bypass_add_producer
 - kvm_arch_irq_bypass_del_producer
 - kvm_arch_irq_bypass_stop
 - kvm_arch_irq_bypass_start

 They make it possible to specialize the KVM IRQ bypass consumer in
 case CONFIG_HAVE_KVM_IRQ_BYPASS is set.

 Signed-off-by: Eric Auger <eric.au...@linaro.org>
 Signed-off-by: Feng Wu <feng...@intel.com>

 ---

 v2 -> v3 (Feng Wu):
 - use 'kvm_arch_irq_bypass_start' instead of 'kvm_arch_irq_bypass_resume'
 - Remove 'kvm_arch_irq_bypass_update', which does not need to be
   an irqbypass callback per Alex's comments.
 - Make kvm_arch_irq_bypass_add_producer return 'int'

 v1 -> v2:
 - use CONFIG_KVM_HAVE_IRQ_BYPASS instead of CONFIG_IRQ_BYPASS_MANAGER
 - rename all functions according to Paolo's proposal
 - add kvm_arch_irq_bypass_update according to Feng's need
 ---
  include/linux/kvm_host.h | 33 +
  virt/kvm/Kconfig         |  3 +++
  2 files changed, 36 insertions(+)

 diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
 index 05e99b8..84b5feb 100644
 --- a/include/linux/kvm_host.h
 +++ b/include/linux/kvm_host.h
 @@ -24,6 +24,7 @@
  #include <linux/err.h>
  #include <linux/irqflags.h>
  #include <linux/context_tracking.h>
 +#include <linux/irqbypass.h>
  #include <asm/signal.h>
  
  #include <linux/kvm.h>
 @@ -1151,5 +1152,37 @@ static inline void kvm_vcpu_set_dy_eligible(struct kvm_vcpu *vcpu, bool val)
  {
  }
  #endif /* CONFIG_HAVE_KVM_CPU_RELAX_INTERCEPT */
 +
 +#ifdef CONFIG_HAVE_KVM_IRQ_BYPASS
 +
 +int kvm_arch_irq_bypass_add_producer(struct irq_bypass_consumer *,
 +   struct irq_bypass_producer *);
 +void kvm_arch_irq_bypass_del_producer(struct irq_bypass_consumer *,
 +   struct irq_bypass_producer *);
 +void kvm_arch_irq_bypass_stop(struct irq_bypass_consumer *);
 +void kvm_arch_irq_bypass_start(struct irq_bypass_consumer *);
 +
 +#else
 
 Do we really need static inline stubs?  When would they get used?
This addresses the case where another arch would use an irq bypass
producer (such as the vfio platform driver) and irqfd without
implementing or needing the forwarding/posting support.

I see 2 solutions: either we have those stubs, or in eventfd.c we guard
irq_bypass_unregister_consumer(&irqfd->consumer);
and
ret = irq_bypass_register_consumer(&irqfd->consumer);
with

#ifdef CONFIG_HAVE_KVM_IRQ_BYPASS
#endif

The latter solution may be better.

 How would they work since we call them via function pointers?
I just tested this without the ARM forwarding implementation of the
kvm_arch functions, and it runs fine.
 
 +
 +static inline int kvm_arch_irq_bypass_add_producer(
 +struct irq_bypass_consumer *cons,
 +struct irq_bypass_producer *prod)
 +{
 +return -1;
 
 No reason not to stick with standard errno values, is there?  -EINVAL?
 Thanks,
Sure.

Thanks

Eric
 
 Alex
 
 +}
 +static inline void kvm_arch_irq_bypass_del_producer(
 +struct irq_bypass_consumer *cons,
 +struct irq_bypass_producer *prod)
 +{
 +}
 +static inline void kvm_arch_irq_bypass_stop(
 +struct irq_bypass_consumer *cons)
 +{
 +}
 +static inline void kvm_arch_irq_bypass_start(
 +struct irq_bypass_consumer *cons)
 +{
 +}
 +#endif /* CONFIG_HAVE_KVM_IRQ_BYPASS */
  #endif
  
 diff --git a/virt/kvm/Kconfig b/virt/kvm/Kconfig
 index e2c876d..9f8014d 100644
 --- a/virt/kvm/Kconfig
 +++ b/virt/kvm/Kconfig
 @@ -47,3 +47,6 @@ config KVM_GENERIC_DIRTYLOG_READ_PROTECT
  config KVM_COMPAT
 def_bool y
    depends on COMPAT && !S390
 +
 +config HAVE_KVM_IRQ_BYPASS
 +   bool
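
For reference, the errno suggestion made in the review above could look roughly like the following. This is a standalone sketch, not the patch as merged: the two struct types are left opaque so the fragment compiles outside the kernel tree.

```c
#include <errno.h>

/* Opaque stand-ins for the kernel types; only pointers are passed here. */
struct irq_bypass_consumer;
struct irq_bypass_producer;

/* Stub used when the architecture provides no IRQ bypass support. */
static inline int kvm_arch_irq_bypass_add_producer(
		struct irq_bypass_consumer *cons,
		struct irq_bypass_producer *prod)
{
	(void)cons;
	(void)prod;
	return -EINVAL;	/* a standard errno value instead of a bare -1 */
}
```

A bare -1 is numerically the same as -EPERM, so a caller that treats the return value as an errno would report a misleading "operation not permitted"; -EINVAL avoids that.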
 
 
 


Re: rdtsc() in kvm-unit-tests on x86



On 07/08/2015 21:19, Jintack Lim wrote:
 Hi all,
 
 While I was looking at the rdtsc() code in kvm-unit-tests (e.g. x86/vmexit.c),
 I started wondering whether out-of-order execution on the processor
 might cause rdtsc() to execute somewhere other than where we expect.
 
 This document from Intel,
 http://www.intel.com/content/www/us/en/embedded/training/ia-32-ia-64-benchmark-code-execution-paper.html
 suggests using the rdtscp instruction and other techniques to
 serialize reads of the TSC register.
 
 I wonder how the serialization is achieved when using rdtsc() in
 kvm-unit-tests code.
 Or, maybe the serialization is not necessary for some reason?

kvm-unit-tests executes the instruction thousands of times, so any error
due to lack of serialization is lost in the noise.

Paolo
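
The amortization argument can be illustrated with a small, self-contained sketch. It is not the kvm-unit-tests code: the counter and the measured operation are passed in as function pointers, and the fake_* helpers are deterministic stand-ins invented for the example (each operation "costs" 100 cycles, and the closing counter read comes back 30 cycles late, as an unserialized read might).

```c
#include <stdint.h>

typedef uint64_t (*read_cycles_fn)(void);
typedef void (*op_fn)(void);

/* Average cost of op() over iters runs, bracketed by two counter reads. */
static uint64_t avg_cycles(read_cycles_fn read_cycles, op_fn op, uint64_t iters)
{
	uint64_t start, end, i;

	if (iters == 0)
		return 0;

	start = read_cycles();
	for (i = 0; i < iters; i++)
		op();
	end = read_cycles();

	return (end - start) / iters;
}

/* Hypothetical stand-ins so the effect is reproducible without a real TSC. */
static uint64_t fake_now;
static uint64_t fake_reads;

static uint64_t fake_read_cycles(void)
{
	/* Every second read (the closing read of a measurement) is 30 late. */
	uint64_t late = (fake_reads++ & 1) ? 30 : 0;

	return fake_now + late;
}

static void fake_op(void)
{
	fake_now += 100;	/* each operation advances the clock by 100 */
}
```

With these stand-ins, avg_cycles(fake_read_cycles, fake_op, 1) reports 130 cycles, while 1000 iterations report 100: the fixed 30-cycle error is divided by the iteration count. On real hardware the read function could wrap __rdtsc() from <x86intrin.h>.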
Re: [PATCH v4 5/5] KVM: eventfd: add irq bypass consumer management

Hi Alex,
On 08/07/2015 10:09 PM, Alex Williamson wrote:
 On Mon, 2015-08-03 at 19:20 +0200, Eric Auger wrote:
 This patch adds the registration/unregistration of an
 irq_bypass_consumer on irqfd assignment/deassignment.

 Signed-off-by: Eric Auger <eric.au...@linaro.org>
 Signed-off-by: Feng Wu <feng...@intel.com>

 ---

 v2 -> v3 (Feng Wu):
 - Use kvm_arch_irq_bypass_start
 - Remove kvm_arch_irq_bypass_update
 - Add member 'struct irq_bypass_producer *producer' in
   'struct kvm_kernel_irqfd', it is needed by posted interrupt.
 - Remove 'irq_bypass_unregister_consumer' in kvm_irqfd_deassign()

 v1 -> v2:
 - populate of kvm and gsi removed
 - unregister the consumer on irqfd_shutdown
 ---
  include/linux/kvm_irqfd.h |  2 ++
  virt/kvm/eventfd.c        | 10 ++
  2 files changed, 12 insertions(+)

 diff --git a/include/linux/kvm_irqfd.h b/include/linux/kvm_irqfd.h
 index f926b39..0c1de05 100644
 --- a/include/linux/kvm_irqfd.h
 +++ b/include/linux/kvm_irqfd.h
 @@ -64,6 +64,8 @@ struct kvm_kernel_irqfd {
  struct list_head list;
  poll_table pt;
  struct work_struct shutdown;
 +struct irq_bypass_consumer consumer;
 +struct irq_bypass_producer *producer;
  };
  
  #endif /* __LINUX_KVM_IRQFD_H */
 diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c
 index 647ffb8..08855de 100644
 --- a/virt/kvm/eventfd.c
 +++ b/virt/kvm/eventfd.c
 @@ -35,6 +35,7 @@
  #include <linux/srcu.h>
  #include <linux/slab.h>
  #include <linux/seqlock.h>
 +#include <linux/irqbypass.h>
  #include <trace/events/kvm.h>
  
  #include <kvm/iodev.h>
 @@ -140,6 +141,7 @@ irqfd_shutdown(struct work_struct *work)
  /*
   * It is now safe to release the object's resources
   */
 +irq_bypass_unregister_consumer(&irqfd->consumer);
  eventfd_ctx_put(irqfd->eventfd);
  kfree(irqfd);
  }
 @@ -380,6 +382,14 @@ kvm_irqfd_assign(struct kvm *kvm, struct kvm_irqfd *args)
   */
  fdput(f);
  
 +irqfd->consumer.token = (void *)irqfd->eventfd;
 +irqfd->consumer.add_producer = kvm_arch_irq_bypass_add_producer;
 +irqfd->consumer.del_producer = kvm_arch_irq_bypass_del_producer;
 +irqfd->consumer.stop = kvm_arch_irq_bypass_stop;
 +irqfd->consumer.start = kvm_arch_irq_bypass_start;
 +ret = irq_bypass_register_consumer(&irqfd->consumer);
 +WARN_ON(ret);
 
 This seems like a lazy way to handle this error.  What is the stack
 trace from this WARN_ON going to tell us that we didn't already know?
 If we get the WARN_ON, it probably means that an incompatible producer
 registered the token first.  It means we can't do bypass, but it doesn't
 tell us anything about whether bypass is or is not enabled.  Wouldn't a
 pr_info/debug() suffice for that?  Thanks,

Sure, I will output some more relevant traces.

Thanks

Eric
 
 Alex
 
 +
  return 0;
  
  fail:
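
The alternative raised in the review, logging the failure and carrying on without bypass, can be sketched in plain userspace C. Everything below is a hypothetical stand-in: struct consumer and register_consumer() mimic only the shape of the kernel API, and fprintf() takes the place of pr_info().

```c
#include <errno.h>
#include <stdio.h>

/* Hypothetical, reduced stand-in for struct irq_bypass_consumer. */
struct consumer {
	const void *token;
};

/* Hypothetical stand-in for irq_bypass_register_consumer(). */
static int register_consumer(struct consumer *c)
{
	return c->token ? 0 : -EINVAL;
}

/*
 * Returns 1 if bypass is active, 0 if the irqfd keeps using the normal
 * signalling path. Registration failure is reported but is not fatal.
 */
static int try_enable_bypass(struct consumer *c)
{
	int ret = register_consumer(c);

	if (ret) {
		fprintf(stderr,
			"irq bypass consumer registration failed: %d\n", ret);
		return 0;
	}
	return 1;
}
```

The point of the design is that a one-line message states which registration failed and why, whereas the stack trace from WARN_ON() only repeats a call path that is already known.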
 
 
 


Re: [RFC PATCH 1/4] vhost: Introduce a universal thread to serve all users

2015-08-10 Thread Michael S. Tsirkin

On Mon, Jul 13, 2015 at 12:07:32AM -0400, Bandan Das wrote:
 vhost threads are per-device, but in most cases a single thread
 is enough. This change creates a single thread that is used to
 serve all guests.
 
 However, this complicates cgroups associations. The current policy
 is to attach the per-device thread to all cgroups of the parent process
 that the device is associated with. This is no longer possible if we
 have a single thread. So, we end up moving the thread around to
 cgroups of whichever device needs servicing. This is a very
 inefficient protocol but seems to be the only way to integrate
 cgroups support.
 
 Signed-off-by: Razya Ladelsky <ra...@il.ibm.com>
 Signed-off-by: Bandan Das <b...@redhat.com>

BTW, how does this interact with virtio net MQ?
It would seem that MQ gains from more parallelism and
CPU locality.

 ---
  drivers/vhost/scsi.c  |  15 +++--
  drivers/vhost/vhost.c | 150 
 --
  drivers/vhost/vhost.h |  19 +--
  3 files changed, 97 insertions(+), 87 deletions(-)
 
 diff --git a/drivers/vhost/scsi.c b/drivers/vhost/scsi.c
 index ea32b38..6c42936 100644
 --- a/drivers/vhost/scsi.c
 +++ b/drivers/vhost/scsi.c
 @@ -535,7 +535,7 @@ static void vhost_scsi_complete_cmd(struct vhost_scsi_cmd *cmd)
  
   llist_add(&cmd->tvc_completion_list, &vs->vs_completion_list);
  
 - vhost_work_queue(&vs->dev, &vs->vs_completion_work);
 + vhost_work_queue(vs->dev.worker, &vs->vs_completion_work);
  }
  
  static int vhost_scsi_queue_data_in(struct se_cmd *se_cmd)
 @@ -1282,7 +1282,7 @@ vhost_scsi_send_evt(struct vhost_scsi *vs,
   }
  
   llist_add(&evt->list, &vs->vs_event_list);
 - vhost_work_queue(&vs->dev, &vs->vs_event_work);
 + vhost_work_queue(vs->dev.worker, &vs->vs_event_work);
  }
  
  static void vhost_scsi_evt_handle_kick(struct vhost_work *work)
 @@ -1335,8 +1335,8 @@ static void vhost_scsi_flush(struct vhost_scsi *vs)
   /* Flush both the vhost poll and vhost work */
   for (i = 0; i < VHOST_SCSI_MAX_VQ; i++)
   vhost_scsi_flush_vq(vs, i);
 - vhost_work_flush(&vs->dev, &vs->vs_completion_work);
 - vhost_work_flush(&vs->dev, &vs->vs_event_work);
 + vhost_work_flush(vs->dev.worker, &vs->vs_completion_work);
 + vhost_work_flush(vs->dev.worker, &vs->vs_event_work);
  
   /* Wait for all reqs issued before the flush to be finished */
   for (i = 0; i < VHOST_SCSI_MAX_VQ; i++)
 @@ -1584,8 +1584,11 @@ static int vhost_scsi_open(struct inode *inode, struct file *f)
   if (!vqs)
   goto err_vqs;
  
 - vhost_work_init(&vs->vs_completion_work, vhost_scsi_complete_cmd_work);
 - vhost_work_init(&vs->vs_event_work, vhost_scsi_evt_work);
 + vhost_work_init(&vs->dev, &vs->vs_completion_work,
 + vhost_scsi_complete_cmd_work);
 +
 + vhost_work_init(&vs->dev, &vs->vs_event_work,
 + vhost_scsi_evt_work);
  
   vs->vs_events_nr = 0;
   vs->vs_events_missed = false;
 diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
 index 2ee2826..951c96b 100644
 --- a/drivers/vhost/vhost.c
 +++ b/drivers/vhost/vhost.c
 @@ -11,6 +11,8 @@
   * Generic code for virtio server in host kernel.
   */
  
 +#define pr_fmt(fmt) KBUILD_MODNAME :  fmt
 +
  #include linux/eventfd.h
  #include linux/vhost.h
  #include linux/uio.h
 @@ -28,6 +30,9 @@
  
  #include vhost.h
  
 +/* Just one worker thread to service all devices */
 +static struct vhost_worker *worker;
 +
  enum {
   VHOST_MEMORY_MAX_NREGIONS = 64,
   VHOST_MEMORY_F_LOG = 0x1,
 @@ -58,13 +63,15 @@ static int vhost_poll_wakeup(wait_queue_t *wait, unsigned 
 mode, int sync,
   return 0;
  }
  
 -void vhost_work_init(struct vhost_work *work, vhost_work_fn_t fn)
 +void vhost_work_init(struct vhost_dev *dev,
 +  struct vhost_work *work, vhost_work_fn_t fn)
  {
   INIT_LIST_HEAD(&work->node);
   work->fn = fn;
   init_waitqueue_head(&work->done);
   work->flushing = 0;
   work->queue_seq = work->done_seq = 0;
 + work->dev = dev;
  }
  EXPORT_SYMBOL_GPL(vhost_work_init);
  
 @@ -78,7 +85,7 @@ void vhost_poll_init(struct vhost_poll *poll, 
 vhost_work_fn_t fn,
   poll->dev = dev;
   poll->wqh = NULL;
  
 - vhost_work_init(&poll->work, fn);
 + vhost_work_init(dev, &poll->work, fn);
  }
  EXPORT_SYMBOL_GPL(vhost_poll_init);
  
 @@ -116,30 +123,30 @@ void vhost_poll_stop(struct vhost_poll *poll)
  }
  EXPORT_SYMBOL_GPL(vhost_poll_stop);
  
 -static bool vhost_work_seq_done(struct vhost_dev *dev, struct vhost_work 
 *work,
 - unsigned seq)
 +static bool vhost_work_seq_done(struct vhost_worker *worker,
 + struct vhost_work *work, unsigned seq)
  {
   int left;
  
 - spin_lock_irq(&dev->work_lock);
 + spin_lock_irq(&worker->work_lock);
   left = seq - work->done_seq;
 - spin_unlock_irq(&dev->work_lock);
 + spin_unlock_irq(&worker->work_lock);
   return left <= 0;
  }
  
 -void vhost_work_flush(struct

Re: Fwd: KVM : Virtio ring size

2015-08-10 Thread Stefan Hajnoczi

On Fri, Aug 07, 2015 at 10:48:50AM +0530, sai kiran wrote:
 I am experimenting with the virtio-net frontend driver, and I observe that
 the virtio ring size is communicated to the guest as 256.
 I tried changing the backend QEMU code manually to propagate a ring size of 512.
 
 But other than changing the code and hardcoding it, is there any way to
 configure the virtio ring size?

The ring size is hardcoded in the host.

If ring size is a problem, first check that the indirect vring feature
is enabled.  It allows each packet to take just 1 (indirect) descriptor
in the virtqueue.
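
As a rough illustration of why indirect descriptors relieve ring pressure,
the sketch below counts ring slots per packet. It is a user-space model and
not QEMU or kernel code; slots_for_packet(), packets_in_flight(), and the
segment counts are hypothetical names and values chosen for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical model: without indirect descriptors, every buffer segment of
 * a packet takes one ring slot. With them, the whole chain lives in a
 * separate table and the ring holds a single descriptor pointing at it.
 */
static unsigned int slots_for_packet(unsigned int segments, bool indirect)
{
	assert(segments > 0);
	return indirect ? 1 : segments;
}

/* How many packets of a given shape fit in a ring of ring_size slots. */
static unsigned int packets_in_flight(unsigned int ring_size,
				      unsigned int segments, bool indirect)
{
	return ring_size / slots_for_packet(segments, indirect);
}
```

Under this model, a 256-entry ring carrying 4-segment packets holds 64
packets without indirect descriptors and 256 with them.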

Stefan



[PATCH v3 06/10] VFIO: platform: add irq bypass producer management

This patch populates the IRQ bypass callbacks:
- stop/start producer: simply disable/enable the host IRQ
- add/del consumer: set the automasked flag to false/true

Signed-off-by: Eric Auger eric.au...@linaro.org

---
v2 - v3:
- vfio_platform_irq_bypass_add_consumer now returns an error in case
  the IRQ is recognized as active
- active boolean not set anymore
- do not VFIO mask the IRQ anymore on unforward

v1 - v2:
- device handle caching in vfio_platform_device is introduced in a
  separate patch
- use container_of
---
 drivers/vfio/platform/vfio_platform_irq.c | 23 ++-
 1 file changed, 22 insertions(+), 1 deletion(-)

diff --git a/drivers/vfio/platform/vfio_platform_irq.c 
b/drivers/vfio/platform/vfio_platform_irq.c
index efaee58..400a188 100644
--- a/drivers/vfio/platform/vfio_platform_irq.c
+++ b/drivers/vfio/platform/vfio_platform_irq.c
@@ -224,23 +224,44 @@ static int vfio_platform_is_active(struct 
vfio_platform_irq *irq)
 
 static void vfio_platform_irq_bypass_stop(struct irq_bypass_producer *prod)
 {
+   disable_irq(prod->irq);
 }
 
 static void vfio_platform_irq_bypass_start(struct irq_bypass_producer *prod)
 {
+   enable_irq(prod->irq);
 }
 
 static int vfio_platform_irq_bypass_add_consumer(
struct irq_bypass_producer *prod,
struct irq_bypass_consumer *cons)
 {
-   return 0;
+   struct vfio_platform_irq *irq =
+   container_of(prod, struct vfio_platform_irq, producer);
+
+   /*
+* If the IRQ is active at irqchip level or VFIO (auto)masked,
+* this means the host IRQ is already under injection in the
+* guest and it is not safe to change the forwarding state at
+* that stage.
+* It is not possible to differentiate user-space masking
+* from auto-masking, leading to possible false detection of
+* active state.
+*/
+   if (vfio_platform_is_active(irq))
+   return -EAGAIN;
+
+   return vfio_platform_set_automasked(irq, false);
 }
 
 static void vfio_platform_irq_bypass_del_consumer(
struct irq_bypass_producer *prod,
struct irq_bypass_consumer *cons)
 {
+   struct vfio_platform_irq *irq =
+   container_of(prod, struct vfio_platform_irq, producer);
+
+   vfio_platform_set_automasked(irq, true);
 }
 
 static int vfio_set_trigger(struct vfio_platform_device *vdev, int index,
-- 
1.9.1


[PATCH v4 08/15] KVM: arm: add a trace event for cp14 traps

There are too many cp15 traps, so we don't reuse the cp15 trace event
but add a new trace event to trace the access of debug registers.

Signed-off-by: Zhichao Huang zhichao.hu...@linaro.org
Acked-by: Christoffer Dall christoffer.d...@linaro.org
---
 arch/arm/kvm/coproc.c | 14 ++
 arch/arm/kvm/trace.h  | 30 ++
 2 files changed, 44 insertions(+)

diff --git a/arch/arm/kvm/coproc.c b/arch/arm/kvm/coproc.c
index 2164f4e..d15b250 100644
--- a/arch/arm/kvm/coproc.c
+++ b/arch/arm/kvm/coproc.c
@@ -643,6 +643,13 @@ int kvm_handle_cp_64(struct kvm_vcpu *vcpu,
params.Rt2 = (kvm_vcpu_get_hsr(vcpu) >> 10) & 0xf;
params.CRm = 0;
 
+   if (global == cp14_regs)
+   trace_kvm_emulate_cp14_imp(params.Op1, params.Rt1, params.CRn,
+   params.CRm, params.Op2, params.is_write);
+   else
+   trace_kvm_emulate_cp15_imp(params.Op1, params.Rt1, params.CRn,
+   params.CRm, params.Op2, params.is_write);
+
if (!emulate_cp(vcpu, &params, target_specific, nr_specific))
return 1;
if (!emulate_cp(vcpu, &params, global, nr_global))
@@ -680,6 +687,13 @@ int kvm_handle_cp_32(struct kvm_vcpu *vcpu,
params.Op2 = (kvm_vcpu_get_hsr(vcpu) >> 17) & 0x7;
params.Rt2 = 0;
 
+   if (global == cp14_regs)
+   trace_kvm_emulate_cp14_imp(params.Op1, params.Rt1, params.CRn,
+   params.CRm, params.Op2, params.is_write);
+   else
+   trace_kvm_emulate_cp15_imp(params.Op1, params.Rt1, params.CRn,
+   params.CRm, params.Op2, params.is_write);
+
if (!emulate_cp(vcpu, &params, target_specific, nr_specific))
return 1;
if (!emulate_cp(vcpu, &params, global, nr_global))
diff --git a/arch/arm/kvm/trace.h b/arch/arm/kvm/trace.h
index 0ec3539..988da03 100644
--- a/arch/arm/kvm/trace.h
+++ b/arch/arm/kvm/trace.h
@@ -159,6 +159,36 @@ TRACE_EVENT(kvm_emulate_cp15_imp,
__entry->CRm, __entry->Op2)
 );
 
+/* Architecturally implementation defined CP14 register access */
+TRACE_EVENT(kvm_emulate_cp14_imp,
+   TP_PROTO(unsigned long Op1, unsigned long Rt1, unsigned long CRn,
+unsigned long CRm, unsigned long Op2, bool is_write),
+   TP_ARGS(Op1, Rt1, CRn, CRm, Op2, is_write),
+
+   TP_STRUCT__entry(
+   __field(unsigned int,   Op1 )
+   __field(unsigned int,   Rt1 )
+   __field(unsigned int,   CRn )
+   __field(unsigned int,   CRm )
+   __field(unsigned int,   Op2 )
+   __field(bool,   is_write)
+   ),
+
+   TP_fast_assign(
+   __entry->is_write   = is_write;
+   __entry->Op1 = Op1;
+   __entry->Rt1 = Rt1;
+   __entry->CRn = CRn;
+   __entry->CRm = CRm;
+   __entry->Op2 = Op2;
+   ),
+
+   TP_printk("Implementation defined CP14: %s\tp14, %u, r%u, c%u, c%u, %u",
+   (__entry->is_write) ? "mcr" : "mrc",
+   __entry->Op1, __entry->Rt1, __entry->CRn,
+   __entry->CRm, __entry->Op2)
+);
+
 TRACE_EVENT(kvm_wfx,
TP_PROTO(unsigned long vcpu_pc, bool is_wfe),
TP_ARGS(vcpu_pc, is_wfe),
-- 
1.7.12.4


[PATCH v4 07/15] KVM: arm: add trap handlers for 64-bit debug registers

Add handlers for all the 64-bit debug registers.

There is an overlap between 32-bit and 64-bit registers. Make sure that
64-bit registers precede 32-bit ones.
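
The ordering rule can be sketched as a comparator. This is a user-space
illustration, not code from the patch: struct reg_desc and reg_desc_cmp()
are hypothetical names, and the sketch assumes that "64-bit first" acts as
a tie-breaker between entries whose CRn, CRm, Op1 and Op2 are all equal.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a trap table entry; not the kernel's coproc_reg. */
struct reg_desc {
	unsigned int crn, crm, op1, op2;
	bool is_64;
};

/* Returns <0 if a must come before b, >0 if after, 0 if equivalent. */
static int reg_desc_cmp(const struct reg_desc *a, const struct reg_desc *b)
{
	assert(a && b);
	if (a->crn != b->crn)
		return a->crn < b->crn ? -1 : 1;
	if (a->crm != b->crm)
		return a->crm < b->crm ? -1 : 1;
	if (a->op1 != b->op1)
		return a->op1 < b->op1 ? -1 : 1;
	if (a->op2 != b->op2)
		return a->op2 < b->op2 ? -1 : 1;
	/* Same encoding: the 64-bit entry sorts first (assumed tie-break). */
	if (a->is_64 != b->is_64)
		return a->is_64 ? -1 : 1;
	return 0;
}
```

With this comparator, an entry such as DBGDRAR (CRm 1, 64-bit) sorts ahead
of DBGDSCRint (CRm 1, 32-bit), which matches their placement in the table.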

Signed-off-by: Zhichao Huang zhichao.hu...@linaro.org
Reviewed-by: Christoffer Dall christoffer.d...@linaro.org
---
 arch/arm/kvm/coproc.c | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/arch/arm/kvm/coproc.c b/arch/arm/kvm/coproc.c
index b3627f0..2164f4e 100644
--- a/arch/arm/kvm/coproc.c
+++ b/arch/arm/kvm/coproc.c
@@ -440,6 +440,12 @@ static const struct coproc_reg cp15_regs[] = {
  * Trapped cp14 registers. We generally ignore most of the external
  * debug, on the principle that they don't really make sense to a
  * guest. Revisit this one day, should this principle change.
+ *
+ * CRn denotes the primary register number, but is copied to the CRm in the
+ * user space API for 64-bit register access in line with the terminology used
+ * in the ARM ARM.
+ * Important: Must be sorted ascending by CRn, CRM, Op1, Op2 and with 64-bit
+ *registers preceding 32-bit ones.
  */
 static const struct coproc_reg cp14_regs[] = {
/* DBGIDR */
@@ -447,10 +453,14 @@ static const struct coproc_reg cp14_regs[] = {
/* DBGDTRRXext */
{ CRn( 0), CRm( 0), Op1( 0), Op2( 2), is32, trap_raz_wi },
DBG_BCR_BVR_WCR_WVR(0),
+   /* DBGDRAR (64bit) */
+   { CRn( 0), CRm( 1), Op1( 0), Op2( 0), is64, trap_raz_wi },
/* DBGDSCRint */
{ CRn( 0), CRm( 1), Op1( 0), Op2( 0), is32, trap_dbgdscr,
NULL, cp14_DBGDSCRext },
DBG_BCR_BVR_WCR_WVR(1),
+   /* DBGDSAR (64bit) */
+   { CRn( 0), CRm( 2), Op1( 0), Op2( 0), is64, trap_raz_wi },
/* DBGDSCRext */
{ CRn( 0), CRm( 2), Op1( 0), Op2( 2), is32, trap_debug32,
reset_val, cp14_DBGDSCRext, 0 },
-- 
1.7.12.4


[PATCH v4 10/15] KVM: arm: implement world switch for debug registers

Implement switching of the debug registers. While the number
of registers is massive, CPUs usually don't implement them all
(A15 has 6 breakpoints and 4 watchpoints, which gives us a total
of 22 registers only).
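
For readers who prefer C to assembly, here is a minimal user-space sketch of
the field extraction that the read_hw_dbg_num macro in this patch performs
on the debug ID register. The function names are hypothetical, and
MAX_DBG_PAIRS simply reflects the 16-entry sequences laid out by the
save/restore macros.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_DBG_PAIRS 16u /* entries in each unrolled save/restore sequence */

/* Bits [27:24] of the ID register hold (number of breakpoints - 1). */
static unsigned int dbg_num_brps(uint32_t dbgdidr)
{
	return ((dbgdidr >> 24) & 0xfu) + 1;
}

/* Bits [31:28] of the ID register hold (number of watchpoints - 1). */
static unsigned int dbg_num_wrps(uint32_t dbgdidr)
{
	return ((dbgdidr >> 28) & 0xfu) + 1;
}

/* Unimplemented entries to jump over at the start of a 16-entry sequence. */
static unsigned int dbg_skip_count(unsigned int implemented)
{
	assert(implemented >= 1 && implemented <= MAX_DBG_PAIRS);
	return MAX_DBG_PAIRS - implemented;
}
```

For a CPU with 6 breakpoints and 4 watchpoints the skip counts are 10 and
12. The macros turn a skip count into a branch offset with "lsl #3", since
each entry is two 4-byte instructions.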

Signed-off-by: Zhichao Huang zhichao.hu...@linaro.org
---
 arch/arm/kvm/interrupts_head.S | 170 ++---
 1 file changed, 159 insertions(+), 11 deletions(-)

diff --git a/arch/arm/kvm/interrupts_head.S b/arch/arm/kvm/interrupts_head.S
index 7ac5e51..b9e7410 100644
--- a/arch/arm/kvm/interrupts_head.S
+++ b/arch/arm/kvm/interrupts_head.S
@@ -5,6 +5,7 @@
 #define VCPU_USR_SP (VCPU_USR_REG(13))
 #define VCPU_USR_LR (VCPU_USR_REG(14))
 #define CP15_OFFSET(_cp15_reg_idx) (VCPU_CP15 + (_cp15_reg_idx * 4))
+#define CP14_OFFSET(_cp14_reg_idx) ((_cp14_reg_idx) * 4)
 
 /*
  * Many of these macros need to access the VCPU structure, which is always
@@ -239,6 +240,136 @@ vcpu  .reqr0  @ vcpu pointer always 
in r0
save_guest_regs_mode irq, #VCPU_IRQ_REGS
 .endm
 
+/* Assume r10/r11/r12 are in use, clobbers r2-r3 */
+.macro cp14_read_and_str base Op2 cp14_reg0 skip_num
+   adr r3, 1f
+   add r3, r3, \skip_num, lsl #3
+   bx  r3
+1:
+   mrc p14, 0, r2, c0, c15, \Op2
+   str r2, [\base, #CP14_OFFSET(\cp14_reg0+15)]
+   mrc p14, 0, r2, c0, c14, \Op2
+   str r2, [\base, #CP14_OFFSET(\cp14_reg0+14)]
+   mrc p14, 0, r2, c0, c13, \Op2
+   str r2, [\base, #CP14_OFFSET(\cp14_reg0+13)]
+   mrc p14, 0, r2, c0, c12, \Op2
+   str r2, [\base, #CP14_OFFSET(\cp14_reg0+12)]
+   mrc p14, 0, r2, c0, c11, \Op2
+   str r2, [\base, #CP14_OFFSET(\cp14_reg0+11)]
+   mrc p14, 0, r2, c0, c10, \Op2
+   str r2, [\base, #CP14_OFFSET(\cp14_reg0+10)]
+   mrc p14, 0, r2, c0, c9, \Op2
+   str r2, [\base, #CP14_OFFSET(\cp14_reg0+9)]
+   mrc p14, 0, r2, c0, c8, \Op2
+   str r2, [\base, #CP14_OFFSET(\cp14_reg0+8)]
+   mrc p14, 0, r2, c0, c7, \Op2
+   str r2, [\base, #CP14_OFFSET(\cp14_reg0+7)]
+   mrc p14, 0, r2, c0, c6, \Op2
+   str r2, [\base, #CP14_OFFSET(\cp14_reg0+6)]
+   mrc p14, 0, r2, c0, c5, \Op2
+   str r2, [\base, #CP14_OFFSET(\cp14_reg0+5)]
+   mrc p14, 0, r2, c0, c4, \Op2
+   str r2, [\base, #CP14_OFFSET(\cp14_reg0+4)]
+   mrc p14, 0, r2, c0, c3, \Op2
+   str r2, [\base, #CP14_OFFSET(\cp14_reg0+3)]
+   mrc p14, 0, r2, c0, c2, \Op2
+   str r2, [\base, #CP14_OFFSET(\cp14_reg0+2)]
+   mrc p14, 0, r2, c0, c1, \Op2
+   str r2, [\base, #CP14_OFFSET(\cp14_reg0+1)]
+   mrc p14, 0, r2, c0, c0, \Op2
+   str r2, [\base, #CP14_OFFSET(\cp14_reg0)]
+.endm
+
+/* Assume r11/r12 are in use, clobbers r2-r3 */
+.macro cp14_ldr_and_write base Op2 cp14_reg0 skip_num
+   adr r3, 1f
+   add r3, r3, \skip_num, lsl #3
+   bx  r3
+1:
+   ldr r2, [\base, #CP14_OFFSET(\cp14_reg0+15)]
+   mcr p14, 0, r2, c0, c15, \Op2
+   ldr r2, [\base, #CP14_OFFSET(\cp14_reg0+14)]
+   mcr p14, 0, r2, c0, c14, \Op2
+   ldr r2, [\base, #CP14_OFFSET(\cp14_reg0+13)]
+   mcr p14, 0, r2, c0, c13, \Op2
+   ldr r2, [\base, #CP14_OFFSET(\cp14_reg0+12)]
+   mcr p14, 0, r2, c0, c12, \Op2
+   ldr r2, [\base, #CP14_OFFSET(\cp14_reg0+11)]
+   mcr p14, 0, r2, c0, c11, \Op2
+   ldr r2, [\base, #CP14_OFFSET(\cp14_reg0+10)]
+   mcr p14, 0, r2, c0, c10, \Op2
+   ldr r2, [\base, #CP14_OFFSET(\cp14_reg0+9)]
+   mcr p14, 0, r2, c0, c9, \Op2
+   ldr r2, [\base, #CP14_OFFSET(\cp14_reg0+8)]
+   mcr p14, 0, r2, c0, c8, \Op2
+   ldr r2, [\base, #CP14_OFFSET(\cp14_reg0+7)]
+   mcr p14, 0, r2, c0, c7, \Op2
+   ldr r2, [\base, #CP14_OFFSET(\cp14_reg0+6)]
+   mcr p14, 0, r2, c0, c6, \Op2
+   ldr r2, [\base, #CP14_OFFSET(\cp14_reg0+5)]
+   mcr p14, 0, r2, c0, c5, \Op2
+   ldr r2, [\base, #CP14_OFFSET(\cp14_reg0+4)]
+   mcr p14, 0, r2, c0, c4, \Op2
+   ldr r2, [\base, #CP14_OFFSET(\cp14_reg0+3)]
+   mcr p14, 0, r2, c0, c3, \Op2
+   ldr r2, [\base, #CP14_OFFSET(\cp14_reg0+2)]
+   mcr p14, 0, r2, c0, c2, \Op2
+   ldr r2, [\base, #CP14_OFFSET(\cp14_reg0+1)]
+   mcr p14, 0, r2, c0, c1, \Op2
+   ldr r2, [\base, #CP14_OFFSET(\cp14_reg0)]
+   mcr p14, 0, r2, c0, c0, \Op2
+.endm
+
+/* Get extract number of BRPs and WRPs. Saved in r11/r12 */
+.macro read_hw_dbg_num
+   mrc p14, 0, r2, c0, c0, 0
+   ubfx r11, r2, #24, #4
+   add r11, r11, #1 @ Extract BRPs
+   ubfx r12, r2, #28, #4
+   add r12, r12, #1 @ Extract WRPs
+   mov r2, #16
+   sub r11, r2, r11 @ How many BPs to skip
+   sub r12, r2,

[PATCH v4 15/15] KVM: arm: enable trapping of all debug registers

Enable trapping of the debug registers unconditionally, allowing guests to
use the debug infrastructure.

Signed-off-by: Zhichao Huang zhichao.hu...@linaro.org
Reviewed-by: Christoffer Dall christoffer.d...@linaro.org
---
 arch/arm/kvm/interrupts_head.S | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/arm/kvm/interrupts_head.S b/arch/arm/kvm/interrupts_head.S
index 7ad0adf..494991d 100644
--- a/arch/arm/kvm/interrupts_head.S
+++ b/arch/arm/kvm/interrupts_head.S
@@ -792,7 +792,7 @@ ARM_BE8(rev r6, r6  )
  * (hardware reset value is 0) */
 .macro set_hdcr operation
mrc p15, 4, r2, c1, c1, 1
-   ldr r3, =(HDCR_TPM|HDCR_TPMCR)
+   ldr r3, =(HDCR_TPM|HDCR_TPMCR|HDCR_TDRA|HDCR_TDOSA|HDCR_TDA)
.if \operation == vmentry
orr r2, r2, r3  @ Trap some perfmon accesses
.else
-- 
1.7.12.4


[PATCH v5 0/5] KVM: irqfd consumer based on IRQ bypass manager

This series transforms irqfd into an IRQ bypass consumer and
introduces the infrastructure shared by the Intel posted-interrupts
and ARM forwarded IRQ series.

The bypass manager gets compiled for x86 and arm/arm64 when
KVM is used. A new kvm_irqfd.h header is created to externalize
some irqfd declarations that will be used by architecture-specific
code. A new CONFIG_HAVE_KVM_IRQ_BYPASS option is introduced
to enable architecture-specific IRQ manager hooks.

The series depends on the IRQ bypass manager:
- [PATCH v4] virt: IRQ bypass manager (https://lkml.org/lkml/2015/8/6/526)

It can be found at:
https://git.linaro.org/people/eric.auger/linux.git/shortlog/refs/heads/v4.2-rc6-irq-forward-v3

Best Regards

Eric

History:

v4 - v5:
- remove kvm_arch static inline stubs
- add pr_info on registration failure

v3 - v4:
- fix compilation for arm/arm64
- rearrange signoffs/history on some patch files

v2 - v3 (Feng Wu):
- rebase on IRQ bypass manager [PATCH v2] virt: IRQ bypass manager:
  https://lkml.org/lkml/2015/7/16/810
- Correct some minor errors and typos
- Add something needed for posted-interrupts

v1 - v2:
- isolate the bypass manager and irqfd consumer in this series
- take into account Paolo's comments and use container_of strategy and
  remove additional fields introduced in v1.
- create kvm_irqfd.h
- add unregistration in irqfd_shutdown

v1: originally part of [RFC 00/17] ARM IRQ forward control based on IRQ
bypass manager (https://lkml.org/lkml/2015/7/2/268)

Eric Auger (4):
  KVM: arm/arm64: select IRQ_BYPASS_MANAGER
  KVM: create kvm_irqfd.h
  KVM: introduce kvm_arch functions for IRQ bypass
  KVM: eventfd: add irq bypass consumer management

Feng Wu (1):
  KVM: x86: select IRQ_BYPASS_MANAGER

 arch/arm/kvm/Kconfig  |   2 +
 arch/arm/kvm/Makefile |   1 +
 arch/arm64/kvm/Kconfig|   2 +
 arch/arm64/kvm/Makefile   |   1 +
 arch/x86/kvm/Kconfig  |   2 +
 arch/x86/kvm/Makefile |   3 ++
 include/linux/kvm_host.h  |  10 +
 include/linux/kvm_irqfd.h |  71 ++
 virt/kvm/Kconfig  |   3 ++
 virt/kvm/eventfd.c| 110 --
 10 files changed, 133 insertions(+), 72 deletions(-)
 create mode 100644 include/linux/kvm_irqfd.h

-- 
1.9.1


[PATCH v5 1/5] KVM: x86: select IRQ_BYPASS_MANAGER

From: Feng Wu feng...@intel.com

Select IRQ_BYPASS_MANAGER for x86 when CONFIG_KVM is set

Signed-off-by: Feng Wu feng...@intel.com
---
 arch/x86/kvm/Kconfig  | 2 ++
 arch/x86/kvm/Makefile | 3 +++
 2 files changed, 5 insertions(+)

diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig
index d8a1d56..c951d44 100644
--- a/arch/x86/kvm/Kconfig
+++ b/arch/x86/kvm/Kconfig
@@ -3,6 +3,7 @@
 #
 
 source "virt/kvm/Kconfig"
+source "virt/lib/Kconfig"
 
 menuconfig VIRTUALIZATION
bool "Virtualization"
@@ -28,6 +29,7 @@ config KVM
select ANON_INODES
select HAVE_KVM_IRQCHIP
select HAVE_KVM_IRQFD
+   select IRQ_BYPASS_MANAGER
select HAVE_KVM_IRQ_ROUTING
select HAVE_KVM_EVENTFD
select KVM_APIC_ARCHITECTURE
diff --git a/arch/x86/kvm/Makefile b/arch/x86/kvm/Makefile
index 67d215c..05cc2d7 100644
--- a/arch/x86/kvm/Makefile
+++ b/arch/x86/kvm/Makefile
@@ -6,6 +6,9 @@ CFLAGS_svm.o := -I.
 CFLAGS_vmx.o := -I.
 
 KVM := ../../../virt/kvm
+LIB := ../../../virt/lib
+
+obj-$(CONFIG_IRQ_BYPASS_MANAGER)   += $(LIB)/
 
 kvm-y  += $(KVM)/kvm_main.o $(KVM)/coalesced_mmio.o \
$(KVM)/eventfd.o $(KVM)/irqchip.o $(KVM)/vfio.o
-- 
1.9.1


[PATCH v5 2/5] KVM: arm/arm64: select IRQ_BYPASS_MANAGER

Select IRQ_BYPASS_MANAGER when CONFIG_KVM is set.
Also add compilation of virt/lib.

Signed-off-by: Eric Auger eric.au...@linaro.org
Signed-off-by: Feng Wu feng...@intel.com

---

v3 - v4:
- add compilation of virt/lib in arm/arm64 KVM

v2 - v3:
- [Feng Wu] Correct a typo in 'arch/arm64/kvm/Kconfig'

v1 - v2:
- also set IRQ_BYPASS_MANAGER for arm64
---
 arch/arm/kvm/Kconfig| 2 ++
 arch/arm/kvm/Makefile   | 1 +
 arch/arm64/kvm/Kconfig  | 2 ++
 arch/arm64/kvm/Makefile | 1 +
 4 files changed, 6 insertions(+)

diff --git a/arch/arm/kvm/Kconfig b/arch/arm/kvm/Kconfig
index bfb915d..3c565b9 100644
--- a/arch/arm/kvm/Kconfig
+++ b/arch/arm/kvm/Kconfig
@@ -3,6 +3,7 @@
 #
 
 source "virt/kvm/Kconfig"
+source "virt/lib/Kconfig"
 
 menuconfig VIRTUALIZATION
bool "Virtualization"
@@ -31,6 +32,7 @@ config KVM
select KVM_VFIO
select HAVE_KVM_EVENTFD
select HAVE_KVM_IRQFD
+   select IRQ_BYPASS_MANAGER
depends on ARM_VIRT_EXT && ARM_LPAE && ARM_ARCH_TIMER
---help---
  Support hosting virtualized guest machines.
diff --git a/arch/arm/kvm/Makefile b/arch/arm/kvm/Makefile
index c5eef02c..a6a41dd 100644
--- a/arch/arm/kvm/Makefile
+++ b/arch/arm/kvm/Makefile
@@ -24,3 +24,4 @@ obj-y += $(KVM)/arm/vgic.o
 obj-y += $(KVM)/arm/vgic-v2.o
 obj-y += $(KVM)/arm/vgic-v2-emul.o
 obj-y += $(KVM)/arm/arch_timer.o
+obj-y += ../../../virt/lib/
diff --git a/arch/arm64/kvm/Kconfig b/arch/arm64/kvm/Kconfig
index bfffe8f..2509539 100644
--- a/arch/arm64/kvm/Kconfig
+++ b/arch/arm64/kvm/Kconfig
@@ -3,6 +3,7 @@
 #
 
 source "virt/kvm/Kconfig"
+source "virt/lib/Kconfig"
 
 menuconfig VIRTUALIZATION
bool "Virtualization"
@@ -31,6 +32,7 @@ config KVM
select KVM_VFIO
select HAVE_KVM_EVENTFD
select HAVE_KVM_IRQFD
+   select IRQ_BYPASS_MANAGER
---help---
  Support hosting virtualized guest machines.
 
diff --git a/arch/arm64/kvm/Makefile b/arch/arm64/kvm/Makefile
index f90f4aa..55eec69 100644
--- a/arch/arm64/kvm/Makefile
+++ b/arch/arm64/kvm/Makefile
@@ -27,3 +27,4 @@ kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/vgic-v3.o
 kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/vgic-v3-emul.o
 kvm-$(CONFIG_KVM_ARM_HOST) += vgic-v3-switch.o
 kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/arch_timer.o
+kvm-$(CONFIG_KVM_ARM_HOST) += ../../../virt/lib/
-- 
1.9.1


Re: [PATCH v3 0/7] KVM: arm/arm64: gsi routing support

Hi Pavel,
On 08/06/2015 02:06 PM, Pavel Fedin wrote:
 Tested-by: Pavel Fedin p.fe...@samsung.com

Many thanks for testing!

Best Regards

Eric
 
 Kind regards,
 Pavel Fedin
 Expert Engineer
 Samsung Electronics Research center Russia
 
 -Original Message-
 From: Eric Auger [mailto:eric.au...@linaro.org]
 Sent: Monday, August 03, 2015 6:31 PM
 To: eric.au...@st.com; eric.au...@linaro.org; 
 linux-arm-ker...@lists.infradead.org;
 kvm...@lists.cs.columbia.edu; kvm@vger.kernel.org; 
 christoffer.d...@linaro.org;
 marc.zyng...@arm.com
 Cc: linux-ker...@vger.kernel.org; patc...@linaro.org; pbonz...@redhat.com;
 andre.przyw...@arm.com; p.fe...@samsung.com
 Subject: [PATCH v3 0/7] KVM: arm/arm64: gsi routing support

 With the advent of GICv3 ITS in-kernel emulation, KVM GSI routing
 appears to be requested. More specifically MSI routing is needed.
 irqchip routing does not sound to be really useful on arm but usage of
 MSI routing also mandates to integrate irqchip routing. The initial
 implementation of irqfd on arm must be upgraded with the integration
 of kvm irqchip.c code and the implementation of its standard hooks
 in the architecture specific part.

 In case KVM_SET_GSI_ROUTING ioctl is not called, a default routing
 table with flat irqchip routing entries is built enabling to inject gsi
 corresponding to the SPI indexes seen by the guest.

 As soon as KVM_SET_GSI_ROUTING is called, user-space overwrites this
 default routing table and is responsible for building the whole routing
 table.

 for arm/arm64 KVM_SET_GSI_ROUTING has a limited support:
 - only applies to KVM_IRQFD and not to KVM_IRQ_LINE

 - irqchip routing was tested on Calxeda midway (VFIO with irqfd)
   with and without explicit routing
 - MSI routing without GICv3 ITS was tested using APM Xgene-I
   (qemu VIRTIO-PCI vhost-net without gsi_direct_mapping).
 - MSI routing with GICv3 ITS is *NOT* tested.

 Code can be found at
 https://git.linaro.org/people/eric.auger/linux.git/shortlog/refs/heads/v4.2-rc4-
 gsi-routing-v3

 It applies on Andre's [PATCH v2 00/15] KVM: arm64: GICv3 ITS emulation
 (http://lists.infradead.org/pipermail/linux-arm-kernel/2015-July/355727.html)

 History:
 v2 - v3:
 - eventually got rid of KVM_IRQ_ROUTING_EXTENDED_MSI new type at user
   api level and use KVM_MSI_VALID_DEVID flag instead
 - remove usage of KVM_IRQ_ROUTING_EXTENDED_MSI type at kernel level too.
 - propagate user flags down to the kernel to make sure userspace
   correctly sets devid in the GICv3 ITS case (still under discussion)

 v1 - v2:
 - user API changed:
   x devid id passed in kvm_irq_routing_msi
   x kept the new routing entry type: KVM_IRQ_ROUTING_EXTENDED_MSI
 - kvm_host.h: adopt Andre's proposal to replace the msi_msg by a struct
   composed of the msi_msg and devid in kvm_kernel_irq_routing_entry
 - Fix bug reported by Pavel: Added KVM_IRQ_ROUTING_EXTENDED_MSI handling
   in eventfd.c
 - added vgic_v2m_inject_msi in vgic-v2-emul.c as suggested by Andre
 - fix bug reported by Andre: bad setting of msi.flags and msi.devid
   in kvm_send_userspace_msi
 - avoid injecting reserved IRQ numbers in vgic_irqfd_set_irq

 RFC - PATCH:
 - clearly state limited support on arm/arm64:
   KVM_IRQ_LINE not impacted by GSI routing
 - add default routing table feature (new patch file)
 - changed uapi to use padding field area
 - reword api.txt

 Eric Auger (7):
   KVM: api: pass the devid in the msi routing entry
   KVM: kvm_host: add devid in kvm_kernel_irq_routing_entry
   KVM: irqchip: convey devid to kvm_set_msi
   KVM: arm/arm64: enable irqchip routing
   KVM: arm/arm64: build a default routing table
   KVM: arm/arm64: enable MSI routing
   KVM: arm: enable KVM_SIGNAL_MSI and MSI routing

  Documentation/virtual/kvm/api.txt |  35 ++---
  arch/arm/include/asm/kvm_host.h   |   2 +
  arch/arm/kvm/Kconfig  |   3 ++
  arch/arm/kvm/Makefile |   2 +-
  arch/arm64/include/asm/kvm_host.h |   1 +
  arch/arm64/kvm/Kconfig|   2 +
  arch/arm64/kvm/Makefile   |   2 +-
  include/kvm/arm_vgic.h|   2 -
  include/linux/kvm_host.h  |   8 ++-
  include/uapi/linux/kvm.h  |   5 +-
  virt/kvm/arm/vgic-v2-emul.c   |  16 ++
  virt/kvm/arm/vgic.c   | 107 
 ++
  virt/kvm/irqchip.c|   8 ++-
  13 files changed, 158 insertions(+), 35 deletions(-)

 --
 1.9.1
 


[PATCH v3 10/10] KVM: arm/arm64: implement IRQ bypass consumer functions

Implement IRQ bypass callbacks for arm/arm64 IRQ forwarding:
- kvm_arch_irq_bypass_add_producer: perform VGIC/irqchip
  settings for forwarding
- kvm_arch_irq_bypass_del_producer: same for inverse operation
- kvm_arch_irq_bypass_stop: halt guest execution
- kvm_arch_irq_bypass_start: resume guest execution

and set CONFIG_HAVE_KVM_IRQ_BYPASS for arm/arm64.
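
Each callback in the patch below recovers the enclosing irqfd from the
embedded consumer with container_of. A minimal user-space sketch of that
pattern follows; the macro is re-created here with offsetof, and struct
bypass_consumer, struct irqfd_like, and consumer_to_gsi() are hypothetical
stand-ins for the kernel types, not the real definitions.

```c
#include <assert.h>
#include <stddef.h>

/* User-space equivalent of the kernel's container_of() macro. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical stand-in for struct irq_bypass_consumer. */
struct bypass_consumer {
	int token;
};

/* Hypothetical stand-in for struct kvm_kernel_irqfd. */
struct irqfd_like {
	int gsi;
	struct bypass_consumer consumer; /* embedded, not a pointer */
};

/* Walk back from the embedded member to the structure that contains it. */
static int consumer_to_gsi(struct bypass_consumer *cons)
{
	struct irqfd_like *irqfd;

	assert(cons);
	irqfd = container_of(cons, struct irqfd_like, consumer);
	return irqfd->gsi;
}
```

This only works because the consumer is embedded in the outer structure;
with a pointer member, the subtraction would not land on the container.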

Signed-off-by: Eric Auger eric.au...@linaro.org

---

v2 - v3:
- kvm_arch_irq_bypass_resume renamed into kvm_arch_irq_bypass_start
- kvm_vgic_unset_forward does not take the active bool param anymore
- kvm_arch_irq_bypass_add_producer now returns an error value
- remove kvm_arch_irq_bypass_update

v1 - v2:
- struct kvm_kernel_irqfd is retrieved with container_of
- function names changed
---
 arch/arm/kvm/Kconfig   |  1 +
 arch/arm/kvm/arm.c | 35 +++
 arch/arm64/kvm/Kconfig |  1 +
 3 files changed, 37 insertions(+)

diff --git a/arch/arm/kvm/Kconfig b/arch/arm/kvm/Kconfig
index 3c565b9..655d277 100644
--- a/arch/arm/kvm/Kconfig
+++ b/arch/arm/kvm/Kconfig
@@ -33,6 +33,7 @@ config KVM
select HAVE_KVM_EVENTFD
select HAVE_KVM_IRQFD
select IRQ_BYPASS_MANAGER
+   select HAVE_KVM_IRQ_BYPASS
depends on ARM_VIRT_EXT && ARM_LPAE && ARM_ARCH_TIMER
---help---
  Support hosting virtualized guest machines.
diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
index 0529b38..7cfc5dc 100644
--- a/arch/arm/kvm/arm.c
+++ b/arch/arm/kvm/arm.c
@@ -27,6 +27,8 @@
 #include <linux/mman.h>
 #include <linux/sched.h>
 #include <linux/kvm.h>
+#include <linux/kvm_irqfd.h>
+#include <linux/irqbypass.h>
 #include <trace/events/kvm.h>
 
 #define CREATE_TRACE_POINTS
@@ -1149,6 +1151,39 @@ struct kvm_vcpu *kvm_mpidr_to_vcpu(struct kvm *kvm, 
unsigned long mpidr)
return NULL;
 }
 
+int kvm_arch_irq_bypass_add_producer(struct irq_bypass_consumer *cons,
+ struct irq_bypass_producer *prod)
+{
+   struct kvm_kernel_irqfd *irqfd =
+   container_of(cons, struct kvm_kernel_irqfd, consumer);
+
+   return kvm_vgic_set_forward(irqfd->kvm, prod->irq, irqfd->gsi);
+}
+void kvm_arch_irq_bypass_del_producer(struct irq_bypass_consumer *cons,
+ struct irq_bypass_producer *prod)
+{
+   struct kvm_kernel_irqfd *irqfd =
+   container_of(cons, struct kvm_kernel_irqfd, consumer);
+
+   kvm_vgic_unset_forward(irqfd->kvm, prod->irq, irqfd->gsi);
+}
+
+void kvm_arch_irq_bypass_stop(struct irq_bypass_consumer *cons)
+{
+   struct kvm_kernel_irqfd *irqfd =
+   container_of(cons, struct kvm_kernel_irqfd, consumer);
+
+   kvm_arm_halt_guest(irqfd->kvm);
+}
+
+void kvm_arch_irq_bypass_start(struct irq_bypass_consumer *cons)
+{
+   struct kvm_kernel_irqfd *irqfd =
+   container_of(cons, struct kvm_kernel_irqfd, consumer);
+
+   kvm_arm_resume_guest(irqfd->kvm);
+}
+
 /**
  * Initialize Hyp-mode and memory mappings on all CPUs.
  */
diff --git a/arch/arm64/kvm/Kconfig b/arch/arm64/kvm/Kconfig
index 2509539..6f6e7a5 100644
--- a/arch/arm64/kvm/Kconfig
+++ b/arch/arm64/kvm/Kconfig
@@ -33,6 +33,7 @@ config KVM
select HAVE_KVM_EVENTFD
select HAVE_KVM_IRQFD
select IRQ_BYPASS_MANAGER
+   select HAVE_KVM_IRQ_BYPASS
---help---
  Support hosting virtualized guest machines.
 
-- 
1.9.1


[PATCH v3 07/10] KVM: arm/arm64: vgic: Allow HW interrupts for non-shared devices

From: Marc Zyngier marc.zyng...@arm.com

So far, the only use of the HW interrupt facility was the timer,
implying that the active state is context-switched for each vcpu,
as the device is shared across all vcpus.

This does not work for a device that has been assigned to a VM,
as the guest is entirely in control of that device (the HW is
not shared). In that case, it makes sense to bypass the whole
active state switching.

Also the VGIC state machine is adapted to support those assigned
(non-shared) HW IRQs:
- the IRQ can only be sampled when it is pending
- when queueing the IRQ (programming the LR), the pending state is
  removed as for edge-sensitive IRQs
- queued state is not modelled. Level state is not modelled
- its injection is always valid since it stems from the HW.

Signed-off-by: Marc Zyngier marc.zyng...@arm.com
Signed-off-by: Eric Auger eric.au...@linaro.org

---

- a mix of
  [PATCH v4 11/11] KVM: arm/arm64: vgic: Allow HW interrupts for
   non-shared devices
  [RFC v2 2/4] KVM: arm: vgic: fix state machine for forwarded IRQ
---
 include/kvm/arm_vgic.h|  6 +++--
 virt/kvm/arm/arch_timer.c |  3 ++-
 virt/kvm/arm/vgic.c   | 58 +++
 3 files changed, 49 insertions(+), 18 deletions(-)

diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h
index d901f1a..7ef9ce0 100644
--- a/include/kvm/arm_vgic.h
+++ b/include/kvm/arm_vgic.h
@@ -163,7 +163,8 @@ struct irq_phys_map {
u32 virt_irq;
u32 phys_irq;
u32 irq;
-   bool active;
+   bool shared;
+   bool active; /* Only valid if shared */
 };
 
 struct irq_phys_map_entry {
@@ -356,7 +357,8 @@ void vgic_v3_dispatch_sgi(struct kvm_vcpu *vcpu, u64 reg);
 int kvm_vgic_vcpu_pending_irq(struct kvm_vcpu *vcpu);
 int kvm_vgic_vcpu_active_irq(struct kvm_vcpu *vcpu);
 struct irq_phys_map *kvm_vgic_map_phys_irq(struct kvm_vcpu *vcpu,
-  int virt_irq, int irq);
+  int virt_irq, int irq,
+  bool shared);
 int kvm_vgic_unmap_phys_irq(struct kvm_vcpu *vcpu, struct irq_phys_map *map);
 bool kvm_vgic_get_phys_irq_active(struct irq_phys_map *map);
 void kvm_vgic_set_phys_irq_active(struct irq_phys_map *map, bool active);
diff --git a/virt/kvm/arm/arch_timer.c b/virt/kvm/arm/arch_timer.c
index 76e38d2..db21d8f 100644
--- a/virt/kvm/arm/arch_timer.c
+++ b/virt/kvm/arm/arch_timer.c
@@ -203,7 +203,8 @@ int kvm_timer_vcpu_reset(struct kvm_vcpu *vcpu,
 * Tell the VGIC that the virtual interrupt is tied to a
 * physical interrupt. We do that once per VCPU.
 */
-   map = kvm_vgic_map_phys_irq(vcpu, irq->irq, host_vtimer_irq);
+   map = kvm_vgic_map_phys_irq(vcpu, irq->irq,
+   host_vtimer_irq, true);
if (WARN_ON(IS_ERR(map)))
return PTR_ERR(map);
 
diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c
index 9eb489a..fbd5ba5 100644
--- a/virt/kvm/arm/vgic.c
+++ b/virt/kvm/arm/vgic.c
@@ -400,7 +400,11 @@ void vgic_cpu_irq_clear(struct kvm_vcpu *vcpu, int irq)
 
 static bool vgic_can_sample_irq(struct kvm_vcpu *vcpu, int irq)
 {
-   return !vgic_irq_is_queued(vcpu, irq);
+   struct irq_phys_map *map = vgic_irq_map_search(vcpu, irq);
+   bool shared_hw = map && !map->shared;
+
+   return !vgic_irq_is_queued(vcpu, irq) ||
+   (shared_hw && vgic_dist_irq_is_pending(vcpu, irq));
 }
 
 /**
@@ -1150,19 +1154,26 @@ static void vgic_queue_irq_to_lr(struct kvm_vcpu *vcpu, 
int irq,
 * active in the physical world. Otherwise the
 * physical interrupt will fire and the guest will
 * exit before processing the virtual interrupt.
+*
+* This is of course only valid for a shared
+* interrupt. A non shared interrupt should already be
+* active.
 */
if (map) {
-   int ret;
-
-   BUG_ON(!map->active);
vlr.hwirq = map->phys_irq;
vlr.state |= LR_HW;
vlr.state &= ~LR_EOI_INT;
 
-   ret = irq_set_irqchip_state(map->irq,
-   IRQCHIP_STATE_ACTIVE,
-   true);
-   WARN_ON(ret);
+   if (map->shared) {
+   int ret;
+
+   BUG_ON(!map->active);
+   ret = irq_set_irqchip_state(
+   map->irq,
+   IRQCHIP_STATE_ACTIVE,
+   true);
+

[PATCH v3 09/10] KVM: arm/arm64: vgic: forwarding control

Implements kvm_vgic_[set|unset]_forward.

Handle low-level VGIC programming: physical IRQ/guest IRQ mapping,
list register cleanup, VGIC state machine. Also interacts with
the irqchip.

Signed-off-by: Eric Auger eric.au...@linaro.org

---

v2 -> v3:
- on unforward, we do not compute and output the active state anymore.
  This means that if the unforward happens while the physical IRQ is
  active, we will not VFIO-mask the IRQ while deactivating it. If a
  new physical IRQ hits, the corresponding virtual IRQ might not
  be injected (hence lost) due to the VGIC state machine.

bypass rfc v2:
- use irq_set_vcpu_affinity API
- use irq_set_irqchip_state instead of chip->irq_eoi

bypass rfc:
- rename kvm_arch_{set|unset}_forward into
  kvm_vgic_{set|unset}_forward. Remove __KVM_HAVE_ARCH_HALT_GUEST.
  The function is bound to be called by ARM code only.

v4 -> v5:
- fix arm64 compilation issues, ie. also defines
  __KVM_HAVE_ARCH_HALT_GUEST for arm64

v3 -> v4:
- code originally located in kvm_vfio_arm.c
- kvm_arch_vfio_{set|unset}_forward renamed into
  kvm_arch_{set|unset}_forward
- split into 2 functions (set/unset) since unset does not fail anymore
- unset can be invoked at whatever time. Extra care is taken to handle
  transition in VGIC state machine, LR cleanup, ...

v2 -> v3:
- renaming of kvm_arch_set_fwd_state into kvm_arch_vfio_set_forward
- takes a bool arg instead of kvm_fwd_irq_action enum
- removal of KVM_VFIO_IRQ_CLEANUP
- platform device check now happens here
- more precise errors returned
- irq_eoi handled externally to this patch (VGIC)
- correct enable_irq bug done twice
- reword the commit message
- correct check of platform_bus_type
- use raw_spin_lock_irqsave and check the validity of the handler
---
 include/kvm/arm_vgic.h |   6 ++
 virt/kvm/arm/vgic.c| 149 +
 2 files changed, 155 insertions(+)

diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h
index 7ef9ce0..409ac0f 100644
--- a/include/kvm/arm_vgic.h
+++ b/include/kvm/arm_vgic.h
@@ -363,6 +363,12 @@ int kvm_vgic_unmap_phys_irq(struct kvm_vcpu *vcpu, struct 
irq_phys_map *map);
 bool kvm_vgic_get_phys_irq_active(struct irq_phys_map *map);
 void kvm_vgic_set_phys_irq_active(struct irq_phys_map *map, bool active);
 
+int kvm_vgic_set_forward(struct kvm *kvm,
+unsigned int host_irq, unsigned int guest_irq);
+
+void kvm_vgic_unset_forward(struct kvm *kvm,
+   unsigned int host_irq, unsigned int guest_irq);
+
 #define irqchip_in_kernel(k)   (!!((k)->arch.vgic.in_kernel))
 #define vgic_initialized(k)    (!!((k)->arch.vgic.nr_cpus))
 #define vgic_ready(k)  ((k)->arch.vgic.ready)
diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c
index 03a85b3..b15999a 100644
--- a/virt/kvm/arm/vgic.c
+++ b/virt/kvm/arm/vgic.c
@@ -2551,3 +2551,152 @@ int kvm_set_msi(struct kvm_kernel_irq_routing_entry *e,
 {
return 0;
 }
+
+/**
+ * kvm_vgic_set_forward - Set IRQ forwarding
+ *
+ * @kvm: handle to the VM
+ * @host_irq: physical IRQ number
+ * @guest_irq: virtual IRQ number
+ *
+ * This function is supposed to be called only if the IRQ
+ * is not in progress: ie. not active at GIC level and not
+ * currently under injection in the guest. The physical IRQ must
+ * also be disabled and the guest must have been exited and
+ * prevented from being re-entered.
+ */
+int kvm_vgic_set_forward(struct kvm *kvm,
+unsigned int host_irq,
+unsigned int guest_irq)
+{
+   struct irq_phys_map *map = NULL;
+   struct kvm_vcpu *vcpu = kvm_get_vcpu(kvm, 0);
+   int spi_id = guest_irq + VGIC_NR_PRIVATE_IRQS;
+
+   kvm_debug("%s host_irq=%d guest_irq=%d\n",
+   __func__, host_irq, guest_irq);
+
+   if (!vcpu)
+   return 0;
+
+   irq_set_vcpu_affinity(host_irq, vcpu);
+   /*
+* next physical IRQ will be handled as forwarded
+* by the host (priority drop only)
+*/
+
+   map = kvm_vgic_map_phys_irq(vcpu, spi_id, host_irq, false);
+   /*
+* next guest_irq injection will be considered as
+* forwarded and next flush will program LR
+* without maintenance IRQ but with HW bit set
+*/
+   return !map;
+}
+
+/**
+ * kvm_vgic_unset_forward - Unset IRQ forwarding
+ *
+ * @kvm: handle to the VM
+ * @host_irq: physical IRQ number
+ * @guest_irq: virtual IRQ number
+ *
+ * This function must be called when the host_irq is disabled
+ * and guest has been exited and prevented from being re-entered.
+ *
+ */
+void kvm_vgic_unset_forward(struct kvm *kvm,
+   unsigned int host_irq,
+   unsigned int guest_irq)
+{
+   struct kvm_vcpu *vcpu = kvm_get_vcpu(kvm, 0);
+   struct vgic_cpu *vgic_cpu = &vcpu->arch.vgic_cpu;
+   struct vgic_dist *dist = &kvm->arch.vgic;
+   int ret, lr;
+   struct vgic_lr vlr;
+   int spi_id = guest_irq + VGIC_NR_PRIVATE_IRQS;
+   bool

[PATCH v3 08/10] KVM: arm/arm64: vgic: support irqfd injection of a forwarded IRQ

Currently irqfd injection relies on the kvm_vgic_inject_irq function.
However, this function can no longer be used for mapped IRQs. So
let's change the implementation to use kvm_vgic_inject_mapped_irq
when the IRQ is forwarded.
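
The dispatch described above can be sketched as a small user-space model.
This is only an illustration, not the kernel code: `struct phys_map`,
`map_search`, `choose_inject_path` and `NR_PRIVATE_IRQS` are hypothetical
stand-ins for the VGIC map lookup and for VGIC_NR_PRIVATE_IRQS, and the
value 32 is an assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed offset between an irqfd GSI and the SPI number (hypothetical). */
#define NR_PRIVATE_IRQS 32

/* Hypothetical, simplified virtual-to-physical IRQ mapping entry. */
struct phys_map {
	unsigned int virt_irq;
	unsigned int phys_irq;
};

enum inject_path { INJECT_PLAIN, INJECT_MAPPED };

/* Return the mapping for virt_irq, or NULL when the IRQ is not mapped. */
static const struct phys_map *map_search(const struct phys_map *maps,
					 size_t n, unsigned int virt_irq)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (maps[i].virt_irq == virt_irq)
			return &maps[i];
	return NULL;
}

/* Mapped (forwarded) IRQs take the mapped path, all others the plain one. */
static enum inject_path choose_inject_path(const struct phys_map *maps,
					   size_t n, unsigned int irq)
{
	unsigned int spi = irq + NR_PRIVATE_IRQS;

	return map_search(maps, n, spi) ? INJECT_MAPPED : INJECT_PLAIN;
}
```

With a single mapping for SPI 40, irqfd line 8 would take the mapped path
and every other line the plain one.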

Signed-off-by: Eric Auger eric.au...@linaro.org
---
 virt/kvm/arm/vgic.c | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c
index fbd5ba5..03a85b3 100644
--- a/virt/kvm/arm/vgic.c
+++ b/virt/kvm/arm/vgic.c
@@ -2529,13 +2529,19 @@ int kvm_irq_map_chip_pin(struct kvm *kvm, unsigned 
irqchip, unsigned pin)
 int kvm_set_irq(struct kvm *kvm, int irq_source_id,
u32 irq, int level, bool line_status)
 {
+   struct irq_phys_map *map;
unsigned int spi = irq + VGIC_NR_PRIVATE_IRQS;
 
trace_kvm_set_irq(irq, level, irq_source_id);
 
BUG_ON(!vgic_initialized(kvm));
 
-   return kvm_vgic_inject_irq(kvm, 0, spi, level);
+   map = vgic_irq_map_search(kvm_get_vcpu(kvm, 0), spi);
+
+   if (!map)
+   return kvm_vgic_inject_irq(kvm, 0, spi, level);
+   else
+   return kvm_vgic_inject_mapped_irq(kvm, 0, map, level);
 }
 
 /* MSI not implemented yet */
-- 
1.9.1


[PATCH v3 05/10] VFIO: platform: add vfio_platform_is_active

This function returns whether the IRQ is active at irqchip level or
VFIO masked. If either is true, the IRQ is considered active.
Currently there is no way to differentiate a userspace-masked IRQ from
an automasked IRQ, so activity may be falsely detected. However,
such false detection is currently acceptable.
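
As a rough user-space illustration of the rule above (not the driver code
itself), the check reduces to an OR of two flags. `struct irq_model` and
`irq_model_is_active` are hypothetical names; the real function reads the
irqchip state under the IRQ spinlock.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the per-IRQ state tracked by the driver. */
struct irq_model {
	bool chip_active; /* what the irqchip would report as "active" */
	bool masked;      /* set by userspace masking and by automasking */
};

/*
 * The IRQ counts as active if the irqchip says so or if it is masked.
 * A userspace-masked IRQ is therefore reported as active too, which is
 * the false detection mentioned above.
 */
static bool irq_model_is_active(const struct irq_model *irq)
{
	return irq->chip_active || irq->masked;
}
```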

Signed-off-by: Eric Auger eric.au...@linaro.org

---
---
 drivers/vfio/platform/vfio_platform_irq.c | 17 +
 1 file changed, 17 insertions(+)

diff --git a/drivers/vfio/platform/vfio_platform_irq.c 
b/drivers/vfio/platform/vfio_platform_irq.c
index a285384..efaee58 100644
--- a/drivers/vfio/platform/vfio_platform_irq.c
+++ b/drivers/vfio/platform/vfio_platform_irq.c
@@ -205,6 +205,23 @@ static int vfio_platform_set_automasked(struct 
vfio_platform_irq *irq,
return 0;
 }
 
+static int vfio_platform_is_active(struct vfio_platform_irq *irq)
+{
+   unsigned long flags;
+   bool active, masked, outstanding;
+   int ret;
+
+   spin_lock_irqsave(&irq->lock, flags);
+
+   ret = irq_get_irqchip_state(irq->hwirq, IRQCHIP_STATE_ACTIVE, &active);
+   BUG_ON(ret);
+   masked = irq->masked;
+   outstanding = active || masked;
+
+   spin_unlock_irqrestore(&irq->lock, flags);
+   return outstanding;
+}
+
 static void vfio_platform_irq_bypass_stop(struct irq_bypass_producer *prod)
 {
 }
-- 
1.9.1


[PATCH v3 00/10] ARM IRQ forward control based on IRQ bypass manager

This series makes it possible to set up ARM IRQ forwarding between a
VFIO platform device physical IRQ and a guest virtual IRQ. The link is
coordinated by the IRQ bypass manager.

The principle is that the VFIO platform driver registers an IRQ bypass
producer struct on VFIO_IRQ_SET_ACTION_TRIGGER, while KVM irqfd registers
a consumer struct on irqfd assignment. This leads to a handshake based on
matching the eventfd context (used as the token). When either the producer
or the consumer disappears, it is unregistered and the link is disconnected.
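
The token-based handshake can be modelled in a few lines of user-space C.
This is a sketch under assumptions, not the IRQ bypass manager API:
`struct endpoint`, `add_endpoint`, `del_endpoint` and the fixed-size
arrays are hypothetical, and locking and the start/stop callbacks are
left out.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_ENDPOINTS 8

/* Hypothetical stand-in for a producer or a consumer. */
struct endpoint {
	const void *token;     /* e.g. the eventfd context pointer */
	struct endpoint *peer; /* NULL while not linked */
};

static struct endpoint *producers[MAX_ENDPOINTS];
static struct endpoint *consumers[MAX_ENDPOINTS];

/* Register ep in "own" and link it to an unlinked entry of "other"
 * carrying the same token. Returns 0 on success, -1 if "own" is full. */
static int add_endpoint(struct endpoint **own, struct endpoint **other,
			struct endpoint *ep)
{
	size_t i, slot = MAX_ENDPOINTS;

	for (i = 0; i < MAX_ENDPOINTS; i++) {
		if (!own[i]) {
			slot = i;
			break;
		}
	}
	if (slot == MAX_ENDPOINTS)
		return -1;

	own[slot] = ep;
	ep->peer = NULL;
	for (i = 0; i < MAX_ENDPOINTS; i++) {
		if (other[i] && !other[i]->peer &&
		    other[i]->token == ep->token) {
			ep->peer = other[i];
			other[i]->peer = ep;
			break;
		}
	}
	return 0;
}

/* Unregister ep from "own" and disconnect the link, if there is one. */
static void del_endpoint(struct endpoint **own, struct endpoint *ep)
{
	size_t i;

	for (i = 0; i < MAX_ENDPOINTS; i++)
		if (own[i] == ep)
			own[i] = NULL;
	if (ep->peer) {
		ep->peer->peer = NULL;
		ep->peer = NULL;
	}
}
```

Registration order does not matter in this model: whichever side
registers second finds the other by token, and removing either side
clears the link on both.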

This kernel integration deprecates the former kvm-vfio approach:
https://lkml.org/lkml/2015/4/13/353. Some rationale about that change can
be found in IRQ bypass manager thread: https://lkml.org/lkml/2015/6/29/268

Dependencies:
1- [PATCH v4 00/11] arm/arm64: KVM: Active interrupt state switching
for shared devices (http://www.spinics.net/lists/arm-kernel/msg437884.html)
except PATCH 11
2- [PATCH 0/6] irqchip: GICv2/v3: Add support for irq_vcpu_affinity
3- [PATCH v4] virt: IRQ bypass manager (https://lkml.org/lkml/2015/8/6/526)
4- [PATCH 0/2] KVM: arm/arm64: Guest synchronous halt/resume
(https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg950942.html)

All those pieces can be found at:
https://git.linaro.org/people/eric.auger/linux.git/shortlog/refs/heads/v4.2-rc6-irq-forward-v3

More background on ARM IRQ forwarding in the text below and at
http://www.linux-kvm.org/images/a/a8/01x04-ARMdevice.pdf.

A forwarded IRQ is deactivated by the guest and not by the host.
When the guest deactivates the associated virtual IRQ, the interrupt
controller automatically completes the physical IRQ. Obviously
this requires some HW support in the interrupt controller. This is
the case for ARM GICv2.

The direct benefit is that, for a level sensitive IRQ, a VM exit
can be avoided on forwarded IRQ completion.

When the IRQ is forwarded, the VFIO platform driver does not need to
mask the physical IRQ anymore before signaling the eventfd. Indeed,
genirq lowers the running priority, allowing other physical IRQs to hit,
but not that one.

Besides, injection is still based on irqfd triggering. The only
impact on the irqfd process is that resamplefd is not called anymore on
virtual IRQ completion, since deactivation is not trapped by KVM.

This was tested on Calxeda Midway, assigning the xgmac main IRQ.

History:

v2 (RFC) -> v3 (PATCH):
- all dependencies now have a patch status
- we dropped the producer active boolean exchanged between the
VFIO producer and irqfd arm consumer. As a consequence, on
unforward, if the IRQ is active, this latter is deactivated
without VFIO-masking it. So we do not exactly come back to the
exact state where we would be in unforwarded state. A new
physical IRQ can hit while the previous virtual IRQ is under
treatment. Its injection in the guest may be rejected thanks
to the VGIC state machine. This IRQ will be lost but I don't
think this is a severe issue. In case no new IRQ hits, the
guest deactivation of the virtual IRQ will trigger the resamplefd
which will VFIO unmask the non-masked IRQ. This also has no
consequence.
- VFIO platform driver consumer_add now can fail. It rejects the
transition for forwarding state in case the IRQ is active
- the series is rebased on new irq_vcpu_affinity series
- no dependency anymore on chip/vgic adaptations for forwarded irq
which was partially integrated into Marc's series. A fix is still
needed though.
- Guest synchronous halt/resume patch re-integrated into this series
- integrate a new patch combining
[PATCH v4 11/11] KVM: arm/arm64: vgic: Allow HW interrupts for
non-shared devices
[RFC v2 2/4] KVM: arm: vgic: fix state machine for forwarded IRQ

v1 -> v2:
- irq bypass manager and irqfd consumer moved in a separate patch
- kvm_arm_[halt,resume]_guest moved in a separate patch
- remove VFIO external functions since we do not need them anymore
- apply container_of strategy advised by Paolo. Only active field
remains and discussions will tell whether we get rid of it.
- renamed kvm_arch functions

- kvm-vfio v6 -> RFC v1 based on IRQ bypass manager
(see previous history in https://lkml.org/lkml/2015/4/13/353).

Best Regards

Eric

Eric Auger (9):
VFIO: platform: registration of a dummy IRQ bypass producer
VFIO: platform: test forwarded state when selecting the IRQ handler
VFIO: platform: single handler using function pointer
VFIO: platform: add vfio_platform_set_automasked
VFIO: platform: add vfio_platform_is_active
VFIO: platform: add irq bypass producer management
KVM: arm/arm64: vgic: support irqfd injection of a forwarded IRQ
KVM: arm/arm64: vgic: forwarding control
KVM: arm/arm64: implement IRQ bypass consumer functions

Marc Zyngier (1):
KVM: arm/arm64: vgic: Allow HW interrupts for non-shared devices

arch/arm/kvm/Kconfig | 1 +
arch/arm/kvm/arm.c| 35 +
arch/arm64/kvm/Kconfig

[PATCH v3 01/10] VFIO: platform: registration of a dummy IRQ bypass producer

Register a dummy producer with void callbacks

Signed-off-by: Eric Auger eric.au...@linaro.org

---

v2 -> v3:
- rename vfio_platform_irq_bypass_resume into *_start
---
 drivers/vfio/platform/vfio_platform_irq.c | 32 +++
 drivers/vfio/platform/vfio_platform_private.h |  2 ++
 2 files changed, 34 insertions(+)

diff --git a/drivers/vfio/platform/vfio_platform_irq.c 
b/drivers/vfio/platform/vfio_platform_irq.c
index 88bba57..b5cb8c7 100644
--- a/drivers/vfio/platform/vfio_platform_irq.c
+++ b/drivers/vfio/platform/vfio_platform_irq.c
@@ -20,6 +20,7 @@
 #include <linux/types.h>
 #include <linux/vfio.h>
 #include <linux/irq.h>
+#include <linux/irqbypass.h>
 
 #include "vfio_platform_private.h"
 
@@ -177,6 +178,27 @@ static irqreturn_t vfio_irq_handler(int irq, void *dev_id)
return IRQ_HANDLED;
 }
 
+static void vfio_platform_irq_bypass_stop(struct irq_bypass_producer *prod)
+{
+}
+
+static void vfio_platform_irq_bypass_start(struct irq_bypass_producer *prod)
+{
+}
+
+static int vfio_platform_irq_bypass_add_consumer(
+   struct irq_bypass_producer *prod,
+   struct irq_bypass_consumer *cons)
+{
+   return 0;
+}
+
+static void vfio_platform_irq_bypass_del_consumer(
+   struct irq_bypass_producer *prod,
+   struct irq_bypass_consumer *cons)
+{
+}
+
 static int vfio_set_trigger(struct vfio_platform_device *vdev, int index,
int fd, irq_handler_t handler)
 {
@@ -186,6 +208,7 @@ static int vfio_set_trigger(struct vfio_platform_device 
*vdev, int index,
 
if (irq->trigger) {
free_irq(irq->hwirq, irq);
+   irq_bypass_unregister_producer(&irq->producer);
kfree(irq->name);
eventfd_ctx_put(irq->trigger);
irq->trigger = NULL;
@@ -216,6 +239,15 @@ static int vfio_set_trigger(struct vfio_platform_device 
*vdev, int index,
return ret;
}
 
+   irq->producer.token = (void *)trigger;
+   irq->producer.irq = irq->hwirq;
+   irq->producer.add_consumer = vfio_platform_irq_bypass_add_consumer;
+   irq->producer.del_consumer = vfio_platform_irq_bypass_del_consumer;
+   irq->producer.stop = vfio_platform_irq_bypass_stop;
+   irq->producer.start = vfio_platform_irq_bypass_start;
+   ret = irq_bypass_register_producer(&irq->producer);
+   WARN_ON(ret);
+
if (!irq->masked)
enable_irq(irq->hwirq);
 
diff --git a/drivers/vfio/platform/vfio_platform_private.h 
b/drivers/vfio/platform/vfio_platform_private.h
index 1c9b3d5..1d2d4d6 100644
--- a/drivers/vfio/platform/vfio_platform_private.h
+++ b/drivers/vfio/platform/vfio_platform_private.h
@@ -17,6 +17,7 @@
 
 #include <linux/types.h>
 #include <linux/interrupt.h>
+#include <linux/irqbypass.h>
 
 #define VFIO_PLATFORM_OFFSET_SHIFT   40
 #define VFIO_PLATFORM_OFFSET_MASK (((u64)(1) << VFIO_PLATFORM_OFFSET_SHIFT) - 1)
@@ -37,6 +38,7 @@ struct vfio_platform_irq {
spinlock_t  lock;
struct virqfd   *unmask;
struct virqfd   *mask;
+   struct irq_bypass_producer producer;
 };
 
 struct vfio_platform_region {
-- 
1.9.1


[PATCH v4 04/15] KVM: arm: common infrastructure for handling AArch32 CP14/CP15

As we're about to trap a bunch of CP14 registers, let's rework
the CP15 handling so it can be generalized and work with multiple
tables.

We stop trapping accesses here, because we haven't finished our trap
handlers. We will enable trapping again once everything is in place.
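
The multi-table lookup this rework introduces can be sketched in
user-space C. The model below is only an illustration under assumptions:
`struct cp_access`, `struct cp_reg`, `emulate_cp_model` and
`handle_cp_model` are hypothetical simplifications, and unlike the real
handlers `handle_cp_model` returns 0 on a miss instead of injecting an
undefined exception into the guest.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified description of a trapped access. */
struct cp_access {
	unsigned int crn, crm, op1, op2;
	unsigned int value;
};

/* Hypothetical, simplified trap descriptor. */
struct cp_reg {
	unsigned int crn, crm, op1, op2;
	void (*access)(struct cp_access *a);
};

/* Example accessor: the register reads as zero. */
static void read_as_zero(struct cp_access *a)
{
	a->value = 0;
}

static const struct cp_reg *find_cp_reg(const struct cp_access *a,
					const struct cp_reg *table,
					size_t num)
{
	size_t i;

	for (i = 0; i < num; i++) {
		const struct cp_reg *r = &table[i];

		if (r->crn == a->crn && r->crm == a->crm &&
		    r->op1 == a->op1 && r->op2 == a->op2)
			return r;
	}
	return NULL;
}

/* Return 0 if the access was handled by this table, -1 if not. */
static int emulate_cp_model(struct cp_access *a,
			    const struct cp_reg *table, size_t num)
{
	const struct cp_reg *r;

	if (!table)
		return -1;
	r = find_cp_reg(a, table, num);
	if (!r)
		return -1;
	r->access(a);
	return 0;
}

/* Try the target-specific table first, then the global one. */
static int handle_cp_model(struct cp_access *a,
			   const struct cp_reg *global, size_t nr_global,
			   const struct cp_reg *specific, size_t nr_specific)
{
	if (!emulate_cp_model(a, specific, nr_specific))
		return 1;
	if (!emulate_cp_model(a, global, nr_global))
		return 1;
	return 0; /* the real code would report the unhandled access */
}
```

Passing the tables in as arguments is what lets one routine serve both
CP14 and CP15, including a coprocessor whose table is still empty.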

Signed-off-by: Zhichao Huang zhichao.hu...@linaro.org
---
 arch/arm/kvm/coproc.c  | 168 ++---
 arch/arm/kvm/interrupts_head.S |   2 +-
 2 files changed, 110 insertions(+), 60 deletions(-)

diff --git a/arch/arm/kvm/coproc.c b/arch/arm/kvm/coproc.c
index 4db571d..d23395b 100644
--- a/arch/arm/kvm/coproc.c
+++ b/arch/arm/kvm/coproc.c
@@ -375,6 +375,9 @@ static const struct coproc_reg cp15_regs[] = {
{ CRn(15), CRm( 0), Op1( 4), Op2( 0), is32, access_cbar},
 };
 
+static const struct coproc_reg cp14_regs[] = {
+};
+
 /* Target specific emulation tables */
 static struct kvm_coproc_target_table *target_tables[KVM_ARM_NUM_TARGETS];
 
@@ -424,43 +427,75 @@ static const struct coproc_reg *find_reg(const struct 
coproc_params *params,
return NULL;
 }
 
-static int emulate_cp15(struct kvm_vcpu *vcpu,
-   const struct coproc_params *params)
+/*
+ * emulate_cp --  tries to match a cp14/cp15 access in a handling table,
+ *and call the corresponding trap handler.
+ *
+ * @params: pointer to the descriptor of the access
+ * @table: array of trap descriptors
+ * @num: size of the trap descriptor array
+ *
+ * Return 0 if the access has been handled, and -1 if not.
+ */
+static int emulate_cp(struct kvm_vcpu *vcpu,
+   const struct coproc_params *params,
+   const struct coproc_reg *table,
+   size_t num)
 {
-   size_t num;
-   const struct coproc_reg *table, *r;
-
-   trace_kvm_emulate_cp15_imp(params->Op1, params->Rt1, params->CRn,
-  params->CRm, params->Op2, params->is_write);
+   const struct coproc_reg *r;
 
-   table = get_target_table(vcpu->arch.target, &num);
+   if (!table)
+   return -1;  /* Not handled */
 
-   /* Search target-specific then generic table. */
r = find_reg(params, table, num);
-   if (!r)
-   r = find_reg(params, cp15_regs, ARRAY_SIZE(cp15_regs));
 
-   if (likely(r)) {
+   if (r) {
/* If we don't have an accessor, we should never get here! */
BUG_ON(!r->access);
 
if (likely(r->access(vcpu, params, r))) {
/* Skip instruction, since it was emulated */
kvm_skip_instr(vcpu, kvm_vcpu_trap_il_is32bit(vcpu));
-   return 1;
}
-   /* If access function fails, it should complain. */
-   } else {
-   kvm_err("Unsupported guest CP15 access at: %08lx\n",
-   *vcpu_pc(vcpu));
-   print_cp_instr(params);
+
+   /* Handled */
+   return 0;
}
+
+   /* Not handled */
+   return -1;
+}
+
+static void unhandled_cp_access(struct kvm_vcpu *vcpu,
+   const struct coproc_params *params)
+{
+   u8 hsr_ec = kvm_vcpu_trap_get_class(vcpu);
+   int cp;
+
+   switch (hsr_ec) {
+   case HSR_EC_CP15_32:
+   case HSR_EC_CP15_64:
+   cp = 15;
+   break;
+   case HSR_EC_CP14_MR:
+   case HSR_EC_CP14_64:
+   cp = 14;
+   break;
+   default:
+   WARN_ON((cp = -1));
+   }
+
+   kvm_err("Unsupported guest CP%d access at: %08lx\n",
+   cp, *vcpu_pc(vcpu));
+   print_cp_instr(params);
kvm_inject_undefined(vcpu);
-   return 1;
 }
 
-static int kvm_handle_cp_64(struct kvm_vcpu *vcpu, struct kvm_run *run,
-   bool cp15)
+int kvm_handle_cp_64(struct kvm_vcpu *vcpu,
+   const struct coproc_reg *global,
+   size_t nr_global,
+   const struct coproc_reg *target_specific,
+   size_t nr_specific)
 {
struct coproc_params params;
 
@@ -474,37 +509,15 @@ static int kvm_handle_cp_64(struct kvm_vcpu *vcpu, struct 
kvm_run *run,
params.Rt2 = (kvm_vcpu_get_hsr(vcpu) >> 10) & 0xf;
params.CRm = 0;
 
-   if (cp15)
-   return emulate_cp15(vcpu, &params);
+   if (!emulate_cp(vcpu, &params, target_specific, nr_specific))
+   return 1;
+   if (!emulate_cp(vcpu, &params, global, nr_global))
+   return 1;
 
-   /* raz_wi cp14 */
-   (void)trap_raz_wi(vcpu, &params, NULL);
-
-   /* handled */
-   kvm_skip_instr(vcpu, kvm_vcpu_trap_il_is32bit(vcpu));
+   unhandled_cp_access(vcpu, &params);
return 1;
 }
 
-/**
- * kvm_handle_cp15_64 -- handles a mrrc/mcrr trap on a guest CP15 access
- * @vcpu: The VCPU pointer
- * @run:  The kvm_run struct
- */
-int kvm_handle_cp15_64(struct

[PATCH v4 03/15] KVM: arm: enable to use the ARM_DSCR_MDBGEN macro from KVM assembly code

Add #ifndef __ASSEMBLY__ in hw_breakpoint.h, in order to use
the ARM_DSCR_MDBGEN macro from KVM assembly code.

Signed-off-by: Zhichao Huang zhichao.hu...@linaro.org
Reviewed-by: Alex Bennee alex.ben...@linaro.org
Acked-by: Christoffer Dall christoffer.d...@linaro.org
---
 arch/arm/include/asm/hw_breakpoint.h | 54 +++-
 1 file changed, 29 insertions(+), 25 deletions(-)

diff --git a/arch/arm/include/asm/hw_breakpoint.h 
b/arch/arm/include/asm/hw_breakpoint.h
index 8e427c7..f2f4c61 100644
--- a/arch/arm/include/asm/hw_breakpoint.h
+++ b/arch/arm/include/asm/hw_breakpoint.h
@@ -3,6 +3,8 @@
 
 #ifdef __KERNEL__
 
+#ifndef __ASSEMBLY__
+
 struct task_struct;
 
 #ifdef CONFIG_HAVE_HW_BREAKPOINT
@@ -44,6 +46,33 @@ static inline void decode_ctrl_reg(u32 reg,
ctrl->mismatch  = reg & 0x1;
 }
 
+struct notifier_block;
+struct perf_event;
+struct pmu;
+
+extern struct pmu perf_ops_bp;
+extern int arch_bp_generic_fields(struct arch_hw_breakpoint_ctrl ctrl,
+ int *gen_len, int *gen_type);
+extern int arch_check_bp_in_kernelspace(struct perf_event *bp);
+extern int arch_validate_hwbkpt_settings(struct perf_event *bp);
+extern int hw_breakpoint_exceptions_notify(struct notifier_block *unused,
+  unsigned long val, void *data);
+
+extern u8 arch_get_debug_arch(void);
+extern u8 arch_get_max_wp_len(void);
+extern void clear_ptrace_hw_breakpoint(struct task_struct *tsk);
+
+int arch_install_hw_breakpoint(struct perf_event *bp);
+void arch_uninstall_hw_breakpoint(struct perf_event *bp);
+void hw_breakpoint_pmu_read(struct perf_event *bp);
+int hw_breakpoint_slots(int type);
+
+#else
+static inline void clear_ptrace_hw_breakpoint(struct task_struct *tsk) {}
+
+#endif /* CONFIG_HAVE_HW_BREAKPOINT */
+#endif  /* __ASSEMBLY */
+
 /* Debug architecture numbers. */
 #define ARM_DEBUG_ARCH_RESERVED 0   /* In case of ptrace ABI updates. */
 #define ARM_DEBUG_ARCH_V6  1
@@ -110,30 +139,5 @@ static inline void decode_ctrl_reg(u32 reg,
asm volatile("mcr p14, 0, %0, " #N "," #M ", " #OP2 : : "r" (VAL));\
 } while (0)
 
-struct notifier_block;
-struct perf_event;
-struct pmu;
-
-extern struct pmu perf_ops_bp;
-extern int arch_bp_generic_fields(struct arch_hw_breakpoint_ctrl ctrl,
- int *gen_len, int *gen_type);
-extern int arch_check_bp_in_kernelspace(struct perf_event *bp);
-extern int arch_validate_hwbkpt_settings(struct perf_event *bp);
-extern int hw_breakpoint_exceptions_notify(struct notifier_block *unused,
-  unsigned long val, void *data);
-
-extern u8 arch_get_debug_arch(void);
-extern u8 arch_get_max_wp_len(void);
-extern void clear_ptrace_hw_breakpoint(struct task_struct *tsk);
-
-int arch_install_hw_breakpoint(struct perf_event *bp);
-void arch_uninstall_hw_breakpoint(struct perf_event *bp);
-void hw_breakpoint_pmu_read(struct perf_event *bp);
-int hw_breakpoint_slots(int type);
-
-#else
-static inline void clear_ptrace_hw_breakpoint(struct task_struct *tsk) {}
-
-#endif /* CONFIG_HAVE_HW_BREAKPOINT */
 #endif /* __KERNEL__ */
 #endif /* _ARM_HW_BREAKPOINT_H */
-- 
1.7.12.4


[PATCH v4 01/15] KVM: arm: plug guest debug exploit

Hardware debugging in guests is currently not intercepted, which means
that a malicious guest can bring down the entire machine by writing
to the debug registers.

This patch enables trapping of all debug registers, preventing guests
from accessing the debug registers.

This patch also disables debug mode (DBGDSCR) in the guest world at all
times, preventing guests from messing with the host state.

However, it is a precursor for later patches, which will need to do
more to world-switch debug state when necessary.

Cc: sta...@vger.kernel.org
Signed-off-by: Zhichao Huang zhichao.hu...@linaro.org
---
 arch/arm/include/asm/kvm_coproc.h |  3 +-
 arch/arm/kvm/coproc.c | 82 +--
 arch/arm/kvm/handle_exit.c|  4 +-
 arch/arm/kvm/interrupts.S | 12 +++---
 arch/arm/kvm/interrupts_head.S| 21 --
 5 files changed, 89 insertions(+), 33 deletions(-)

diff --git a/arch/arm/include/asm/kvm_coproc.h 
b/arch/arm/include/asm/kvm_coproc.h
index 4917c2f..e74ab0f 100644
--- a/arch/arm/include/asm/kvm_coproc.h
+++ b/arch/arm/include/asm/kvm_coproc.h
@@ -31,7 +31,8 @@ void kvm_register_target_coproc_table(struct 
kvm_coproc_target_table *table);
 int kvm_handle_cp10_id(struct kvm_vcpu *vcpu, struct kvm_run *run);
 int kvm_handle_cp_0_13_access(struct kvm_vcpu *vcpu, struct kvm_run *run);
 int kvm_handle_cp14_load_store(struct kvm_vcpu *vcpu, struct kvm_run *run);
-int kvm_handle_cp14_access(struct kvm_vcpu *vcpu, struct kvm_run *run);
+int kvm_handle_cp14_32(struct kvm_vcpu *vcpu, struct kvm_run *run);
+int kvm_handle_cp14_64(struct kvm_vcpu *vcpu, struct kvm_run *run);
 int kvm_handle_cp15_32(struct kvm_vcpu *vcpu, struct kvm_run *run);
 int kvm_handle_cp15_64(struct kvm_vcpu *vcpu, struct kvm_run *run);
 
diff --git a/arch/arm/kvm/coproc.c b/arch/arm/kvm/coproc.c
index f3d88dc..a885cfe 100644
--- a/arch/arm/kvm/coproc.c
+++ b/arch/arm/kvm/coproc.c
@@ -91,12 +91,6 @@ int kvm_handle_cp14_load_store(struct kvm_vcpu *vcpu, struct 
kvm_run *run)
return 1;
 }
 
-int kvm_handle_cp14_access(struct kvm_vcpu *vcpu, struct kvm_run *run)
-{
-   kvm_inject_undefined(vcpu);
-   return 1;
-}
-
 static void reset_mpidr(struct kvm_vcpu *vcpu, const struct coproc_reg *r)
 {
/*
@@ -465,12 +459,8 @@ static int emulate_cp15(struct kvm_vcpu *vcpu,
return 1;
 }
 
-/**
- * kvm_handle_cp15_64 -- handles a mrrc/mcrr trap on a guest CP15 access
- * @vcpu: The VCPU pointer
- * @run:  The kvm_run struct
- */
-int kvm_handle_cp15_64(struct kvm_vcpu *vcpu, struct kvm_run *run)
+static int kvm_handle_cp_64(struct kvm_vcpu *vcpu, struct kvm_run *run,
+   bool cp15)
 {
struct coproc_params params;
 
@@ -484,7 +474,35 @@ int kvm_handle_cp15_64(struct kvm_vcpu *vcpu, struct 
kvm_run *run)
params.Rt2 = (kvm_vcpu_get_hsr(vcpu) >> 10) & 0xf;
params.CRm = 0;
 
-   return emulate_cp15(vcpu, &params);
+   if (cp15)
+   return emulate_cp15(vcpu, &params);
+
+   /* raz_wi cp14 */
+   (void)pm_fake(vcpu, &params, NULL);
+
+   /* handled */
+   kvm_skip_instr(vcpu, kvm_vcpu_trap_il_is32bit(vcpu));
+   return 1;
+}
+
+/**
+ * kvm_handle_cp15_64 -- handles a mrrc/mcrr trap on a guest CP15 access
+ * @vcpu: The VCPU pointer
+ * @run:  The kvm_run struct
+ */
+int kvm_handle_cp15_64(struct kvm_vcpu *vcpu, struct kvm_run *run)
+{
+   return kvm_handle_cp_64(vcpu, run, 1);
+}
+
+/**
+ * kvm_handle_cp14_64 -- handles a mrrc/mcrr trap on a guest CP14 access
+ * @vcpu: The VCPU pointer
+ * @run:  The kvm_run struct
+ */
+int kvm_handle_cp14_64(struct kvm_vcpu *vcpu, struct kvm_run *run)
+{
+   return kvm_handle_cp_64(vcpu, run, 0);
 }
 
 static void reset_coproc_regs(struct kvm_vcpu *vcpu,
@@ -497,12 +515,8 @@ static void reset_coproc_regs(struct kvm_vcpu *vcpu,
table[i].reset(vcpu, &table[i]);
 }
 
-/**
- * kvm_handle_cp15_32 -- handles a mrc/mcr trap on a guest CP15 access
- * @vcpu: The VCPU pointer
- * @run:  The kvm_run struct
- */
-int kvm_handle_cp15_32(struct kvm_vcpu *vcpu, struct kvm_run *run)
+static int kvm_handle_cp_32(struct kvm_vcpu *vcpu, struct kvm_run *run,
+ bool cp15)
 {
struct coproc_params params;
 
@@ -516,7 +530,35 @@ int kvm_handle_cp15_32(struct kvm_vcpu *vcpu, struct 
kvm_run *run)
params.Op2 = (kvm_vcpu_get_hsr(vcpu) >> 17) & 0x7;
params.Rt2 = 0;
 
-   return emulate_cp15(vcpu, &params);
+   if (cp15)
+   return emulate_cp15(vcpu, &params);
+
+   /* raz_wi cp14 */
+   (void)pm_fake(vcpu, &params, NULL);
+
+   /* handled */
+   kvm_skip_instr(vcpu, kvm_vcpu_trap_il_is32bit(vcpu));
+   return 1;
+}
+
+/**
+ * kvm_handle_cp15_32 -- handles a mrc/mcr trap on a guest CP15 access
+ * @vcpu: The VCPU pointer
+ * @run:  The kvm_run struct
+ */
+int kvm_handle_cp15_32(struct kvm_vcpu *vcpu, struct kvm_run *run)
+{
+   return kvm_handle_cp_32(vcpu, run, 1);
+}
+

[PATCH v3 03/10] VFIO: platform: single handler using function pointer

A single handler is now registered whatever the use case: automasked
or not. A function pointer is set according to the desired behavior
and the handler calls this function.

The irq lock is taken/released in the root handler. eventfd_signal can
be called in regions not allowed to sleep.
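
The pattern can be illustrated with a small user-space model. It is a
sketch under assumptions rather than the driver code: `struct irq_ctx_model`,
the `MODEL_IRQ_*` values and the `locked` flag (standing in for the
spinlock) are hypothetical, and a counter replaces eventfd_signal.

```c
#include <assert.h>
#include <stdbool.h>

enum { MODEL_IRQ_NONE = 0, MODEL_IRQ_HANDLED = 1 };

/* Hypothetical model of the per-IRQ context. */
struct irq_ctx_model {
	bool locked;  /* stands in for the IRQ spinlock */
	bool masked;
	int signals;  /* counts what would be eventfd_signal() calls */
	int (*handler)(struct irq_ctx_model *ctx);
};

/* Automasked behavior: signal once, then stay masked until unmasked. */
static int automasked_handler(struct irq_ctx_model *ctx)
{
	assert(ctx->locked); /* only ever called from the root handler */
	if (ctx->masked)
		return MODEL_IRQ_NONE;
	ctx->masked = true;
	ctx->signals++;
	return MODEL_IRQ_HANDLED;
}

/* Plain behavior: always signal. */
static int plain_handler(struct irq_ctx_model *ctx)
{
	assert(ctx->locked);
	ctx->signals++;
	return MODEL_IRQ_HANDLED;
}

/* Root handler: take the lock, call the selected behavior, release. */
static int root_handler(struct irq_ctx_model *ctx)
{
	int ret;

	ctx->locked = true;
	ret = ctx->handler(ctx);
	ctx->locked = false;
	return ret;
}
```

Changing behavior then only means updating the `handler` pointer; the
handler registered for the IRQ line stays the same.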

Signed-off-by: Eric Auger eric.au...@linaro.org

---

v4: creation
---
 drivers/vfio/platform/vfio_platform_irq.c | 21 +++--
 drivers/vfio/platform/vfio_platform_private.h |  1 +
 2 files changed, 16 insertions(+), 6 deletions(-)

diff --git a/drivers/vfio/platform/vfio_platform_irq.c 
b/drivers/vfio/platform/vfio_platform_irq.c
index 40f057a..b31b1f0 100644
--- a/drivers/vfio/platform/vfio_platform_irq.c
+++ b/drivers/vfio/platform/vfio_platform_irq.c
@@ -148,11 +148,8 @@ static int vfio_platform_set_irq_unmask(struct 
vfio_platform_device *vdev,
 static irqreturn_t vfio_automasked_irq_handler(int irq, void *dev_id)
 {
struct vfio_platform_irq *irq_ctx = dev_id;
-   unsigned long flags;
int ret = IRQ_NONE;
 
-   spin_lock_irqsave(&irq_ctx->lock, flags);
-
if (!irq_ctx->masked) {
ret = IRQ_HANDLED;
 
@@ -161,8 +158,6 @@ static irqreturn_t vfio_automasked_irq_handler(int irq, void *dev_id)
irq_ctx->masked = true;
}
 
-   spin_unlock_irqrestore(&irq_ctx->lock, flags);
-
if (ret == IRQ_HANDLED)
eventfd_signal(irq_ctx->trigger, 1);
 
@@ -178,6 +173,19 @@ static irqreturn_t vfio_irq_handler(int irq, void *dev_id)
return IRQ_HANDLED;
 }
 
+static irqreturn_t vfio_handler(int irq, void *dev_id)
+{
+   struct vfio_platform_irq *irq_ctx = dev_id;
+   unsigned long flags;
+   irqreturn_t ret;
+
+   spin_lock_irqsave(&irq_ctx->lock, flags);
+   ret = irq_ctx->handler(irq, dev_id);
+   spin_unlock_irqrestore(&irq_ctx->lock, flags);
+
+   return ret;
+}
+
 static void vfio_platform_irq_bypass_stop(struct irq_bypass_producer *prod)
 {
 }
@@ -229,9 +237,10 @@ static int vfio_set_trigger(struct vfio_platform_device 
*vdev, int index,
}
 
irq->trigger = trigger;
+   irq->handler = handler;
 
irq_set_status_flags(irq->hwirq, IRQ_NOAUTOEN);
-   ret = request_irq(irq->hwirq, handler, 0, irq->name, irq);
+   ret = request_irq(irq->hwirq, vfio_handler, 0, irq->name, irq);
if (ret) {
kfree(irq->name);
eventfd_ctx_put(trigger);
diff --git a/drivers/vfio/platform/vfio_platform_private.h 
b/drivers/vfio/platform/vfio_platform_private.h
index 8b4f814..f848a6b 100644
--- a/drivers/vfio/platform/vfio_platform_private.h
+++ b/drivers/vfio/platform/vfio_platform_private.h
@@ -40,6 +40,7 @@ struct vfio_platform_irq {
struct virqfd   *mask;
struct irq_bypass_producer producer;
bool forwarded;
+   irqreturn_t (*handler)(int irq, void *dev_id);
 };
 
 struct vfio_platform_region {
-- 
1.9.1


[PATCH v4 05/15] KVM: arm: check ordering of all system register tables

We now have multiple tables for the various system registers
we trap. Make sure we check the order of all of them, as it is
critical that we get the order right (been there, done that...).
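
A minimal user-space sketch of such an ordering check follows. It is an
illustration only: `struct reg_desc`, `cmp_desc` and `check_table` are
hypothetical, and the sort key (CRn, then CRm, then Op1, then Op2) is an
assumption chosen to be consistent with the entries moved in the diff
below.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified register descriptor. */
struct reg_desc {
	unsigned int crn, crm, op1, op2;
};

/* Three-way compare using the assumed key order CRn, CRm, Op1, Op2. */
static int cmp_desc(const struct reg_desc *a, const struct reg_desc *b)
{
	if (a->crn != b->crn)
		return a->crn < b->crn ? -1 : 1;
	if (a->crm != b->crm)
		return a->crm < b->crm ? -1 : 1;
	if (a->op1 != b->op1)
		return a->op1 < b->op1 ? -1 : 1;
	if (a->op2 != b->op2)
		return a->op2 < b->op2 ? -1 : 1;
	return 0;
}

/*
 * Return 0 if the table is strictly ascending (sorted, no duplicates),
 * 1 otherwise. An empty or single-entry table is trivially in order.
 */
static int check_table(const struct reg_desc *table, size_t n)
{
	size_t i;

	for (i = 1; i < n; i++)
		if (cmp_desc(&table[i - 1], &table[i]) >= 0)
			return 1;
	return 0;
}
```

Because the comparison is strict, the same pass rejects duplicate entries
as well as misordered ones.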

Signed-off-by: Zhichao Huang zhichao.hu...@linaro.org
Reviewed-by: Christoffer Dall christoffer.d...@linaro.org
---
 arch/arm/kvm/coproc.c | 26 +-
 1 file changed, 21 insertions(+), 5 deletions(-)

diff --git a/arch/arm/kvm/coproc.c b/arch/arm/kvm/coproc.c
index d23395b..16d5f69 100644
--- a/arch/arm/kvm/coproc.c
+++ b/arch/arm/kvm/coproc.c
@@ -737,6 +737,9 @@ static struct coproc_reg invariant_cp15[] = {
{ CRn( 0), CRm( 0), Op1( 0), Op2( 3), is32, NULL, get_TLBTR },
{ CRn( 0), CRm( 0), Op1( 0), Op2( 6), is32, NULL, get_REVIDR },
 
+   { CRn( 0), CRm( 0), Op1( 1), Op2( 1), is32, NULL, get_CLIDR },
+   { CRn( 0), CRm( 0), Op1( 1), Op2( 7), is32, NULL, get_AIDR },
+
{ CRn( 0), CRm( 1), Op1( 0), Op2( 0), is32, NULL, get_ID_PFR0 },
{ CRn( 0), CRm( 1), Op1( 0), Op2( 1), is32, NULL, get_ID_PFR1 },
{ CRn( 0), CRm( 1), Op1( 0), Op2( 2), is32, NULL, get_ID_DFR0 },
@@ -752,9 +755,6 @@ static struct coproc_reg invariant_cp15[] = {
{ CRn( 0), CRm( 2), Op1( 0), Op2( 3), is32, NULL, get_ID_ISAR3 },
{ CRn( 0), CRm( 2), Op1( 0), Op2( 4), is32, NULL, get_ID_ISAR4 },
{ CRn( 0), CRm( 2), Op1( 0), Op2( 5), is32, NULL, get_ID_ISAR5 },
-
-   { CRn( 0), CRm( 0), Op1( 1), Op2( 1), is32, NULL, get_CLIDR },
-   { CRn( 0), CRm( 0), Op1( 1), Op2( 7), is32, NULL, get_AIDR },
 };
 
 /*
@@ -1297,13 +1297,29 @@ int kvm_arm_copy_coproc_indices(struct kvm_vcpu *vcpu, u64 __user *uindices)
return write_demux_regids(uindices);
 }
 
+static int check_sysreg_table(const struct coproc_reg *table, unsigned int n)
+{
+   unsigned int i;
+
+   for (i = 1; i < n; i++) {
+   if (cmp_reg(&table[i-1], &table[i]) >= 0) {
+   kvm_err("sys_reg table %p out of order (%d)\n",
+   table, i - 1);
+   return 1;
+   }
+   }
+
+   return 0;
+}
+
 void kvm_coproc_table_init(void)
 {
unsigned int i;
 
/* Make sure tables are unique and in order. */
-   for (i = 1; i < ARRAY_SIZE(cp15_regs); i++)
-   BUG_ON(cmp_reg(&cp15_regs[i-1], &cp15_regs[i]) >= 0);
+   BUG_ON(check_sysreg_table(cp14_regs, ARRAY_SIZE(cp14_regs)));
+   BUG_ON(check_sysreg_table(cp15_regs, ARRAY_SIZE(cp15_regs)));
+   BUG_ON(check_sysreg_table(invariant_cp15, ARRAY_SIZE(invariant_cp15)));
 
/* We abuse the reset function to overwrite the table itself. */
for (i = 0; i < ARRAY_SIZE(invariant_cp15); i++)
-- 
1.7.12.4
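
The ordering check in this patch can be tried outside the kernel. The
following is only a user-space sketch: struct reg_desc, cmp_desc() and
first_misordered() are hypothetical stand-ins for coproc_reg, cmp_reg()
and check_sysreg_table(), and the key order (CRn, then CRm, then Op1,
then Op2) is assumed from the way the invariant_cp15 entries are
rearranged in the diff above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified register descriptor (not the kernel's coproc_reg). */
struct reg_desc {
    unsigned int crn, crm, op1, op2;
};

/* Compare by CRn, then CRm, then Op1, then Op2 (assumed key order). */
static int cmp_desc(const struct reg_desc *a, const struct reg_desc *b)
{
    if (a->crn != b->crn)
        return a->crn < b->crn ? -1 : 1;
    if (a->crm != b->crm)
        return a->crm < b->crm ? -1 : 1;
    if (a->op1 != b->op1)
        return a->op1 < b->op1 ? -1 : 1;
    if (a->op2 != b->op2)
        return a->op2 < b->op2 ? -1 : 1;
    return 0;
}

/*
 * Return the index of the first entry that is not strictly smaller than
 * its successor, or -1 if the table is unique and in ascending order.
 */
static int first_misordered(const struct reg_desc *table, size_t n)
{
    size_t i;

    assert(table != NULL || n == 0);
    for (i = 1; i < n; i++) {
        if (cmp_desc(&table[i - 1], &table[i]) >= 0)
            return (int)(i - 1);
    }
    return -1;
}
```

Returning the offending index rather than a plain 0/1 is a choice made
for this sketch only; the patch itself reports the index through
kvm_err() and returns 1.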

[PATCH v4 00/15] KVM: arm: debug infrastructure support

This patch series adds debug support, a key feature missing from the
KVM/armv7 port.

The main idea is to keep track of whether the host and the guest have any
breakpoints or watchpoints enabled. We only world-switch the debug
registers when the host or the guest is actually using them.

We add a function that reads the break/watch control variables directly
to determine whether the host has enabled any breakpoints or watchpoints.
We call this function only on guest entry, after preempt_disable() and
local_irq_disable(), so there is no race.
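
As a rough illustration of the bookkeeping described above, here is a
user-space sketch. It is not the code from this series: the names
DBG_HOST_INUSE, DBG_GUEST_INUSE, dbg_update() and
dbg_need_world_switch() are hypothetical, and the real series keeps the
flags in the vcpu structure and tests them from assembly.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag bits: one per possible user of the debug hardware. */
#define DBG_HOST_INUSE   (1u << 0)
#define DBG_GUEST_INUSE  (1u << 1)
#define DBG_ALL_FLAGS    (DBG_HOST_INUSE | DBG_GUEST_INUSE)

/* Set or clear one flag bit depending on whether that side is active. */
static uint32_t dbg_update(uint32_t flags, uint32_t bit, bool in_use)
{
    assert((bit & ~DBG_ALL_FLAGS) == 0);
    return in_use ? (flags | bit) : (flags & ~bit);
}

/* The full save/restore is only needed when somebody uses the hardware. */
static bool dbg_need_world_switch(uint32_t flags)
{
    return (flags & DBG_ALL_FLAGS) != 0;
}
```

With both bits clear the save/restore of the break/watch registers can
be skipped entirely, which is the common case this series optimizes for.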

We also tried implementing this series by trapping host use of the debug
registers to hyp mode and keeping track of host/guest use of the debug
hardware that way. This, however, proved to be very difficult. First, we
would have to add a new mechanism to support trapping the host: jump from
EL2 to EL1 to run our host_trap_handlers, and then jump back to the
original code that triggered the trap. Second, we would have to take
special care to disable the traps when tearing down KVM, and we would
impose an ordering requirement on whether KVM or the breakpoint
functionality gets initialized first. In the end we decided that this was
too difficult and convoluted compared to simply reading the values from
variables, so we reverted to that approach.

The number of registers is rather frightening, but CPUs actually
implement only a subset of them. Also, there are a number of registers
we don't bother emulating (those related to external debug and the
OSlock).

External debug is when you actually plug a physical JTAG into the CPU.
OSlock is a way to prevent other software from playing with the debug
registers. My understanding is that it is only useful in combination
with external debug. In both cases, implementing support for this
is probably not worth the effort, at least for the time being.

This has been tested on a Cortex-A15 platform, running 32bit guests.

The patches for this series are based off v4.2-rc6 and can be found
at:

https://git.linaro.org/people/zhichao.huang/linux.git
branch: guest-debug/4.2-rc6-v4

From v3 [3]:
- Redefine kvm_cpu_context_t as a new struct including the cp14 states
- Save host cp14 states in the vcpu struct instead of memory
- Add a function to keep track of the host use of the debug registers
- Add new lazy world switch mechanism

From v2 [2]:
- Delete the debug mode enabling/disabling strategy
- Add missing cp14/cp15 trace events

From v1 [1]:
- Added missing cp14 reset functions
- Disable debug mode if we don't need it to reduce unnecessary switch

[1]: https://lists.cs.columbia.edu/pipermail/kvmarm/2015-May/014729.html
[2]: https://lists.cs.columbia.edu/pipermail/kvmarm/2015-May/014847.html
[3]: https://lists.cs.columbia.edu/pipermail/kvmarm/2015-June/015167.html

Zhichao Huang (15):
  KVM: arm: plug guest debug exploit
  KVM: arm: rename pm_fake handler to trap_raz_wi
  KVM: arm: enable to use the ARM_DSCR_MDBGEN macro from KVM assembly
code
  KVM: arm: common infrastructure for handling AArch32 CP14/CP15
  KVM: arm: check ordering of all system register tables
  KVM: arm: add trap handlers for 32-bit debug registers
  KVM: arm: add trap handlers for 64-bit debug registers
  KVM: arm: add a trace event for cp14 traps
  KVM: arm: redefine kvm_cpu_context_t to save the host cp14 states
  KVM: arm: implement world switch for debug registers
  KVM: arm: add a function to keep track of host use of the debug
registers
  KVM: arm: keep track of host use of the debug registers
  KVM: arm: keep track of guest use of the debug registers
  KVM: arm: implement lazy world switch for debug registers
  KVM: arm: enable trapping of all debug registers

 arch/arm/include/asm/hw_breakpoint.h |  59 +++--
 arch/arm/include/asm/kvm_asm.h   |  17 ++
 arch/arm/include/asm/kvm_coproc.h|   3 +-
 arch/arm/include/asm/kvm_host.h  |  13 +-
 arch/arm/kernel/asm-offsets.c|   6 +-
 arch/arm/kernel/hw_breakpoint.c  |  21 ++
 arch/arm/kvm/arm.c   |   2 +
 arch/arm/kvm/coproc.c| 445 ++-
 arch/arm/kvm/handle_exit.c   |   4 +-
 arch/arm/kvm/interrupts.S|  18 +-
 arch/arm/kvm/interrupts_head.S   | 188 ++-
 arch/arm/kvm/trace.h |  30 +++
 arch/arm64/include/asm/kvm_host.h|   1 +
 13 files changed, 707 insertions(+), 100 deletions(-)

-- 
1.7.12.4

[PATCH v3 02/10] VFIO: platform: test forwarded state when selecting the IRQ handler

Add a new forwarded flag in vfio_platform_irq.  In case the IRQ
is forwarded, the VFIO platform IRQ handler does not need to
disable the IRQ anymore.

When setting the IRQ handler we now also test the forwarded state. In
case the IRQ is forwarded we select the vfio_irq_handler.

Signed-off-by: Eric Auger eric.au...@linaro.org

---

v2:
- add a new forwarded flag and do not use irqd_irq_forwarded anymore
---
 drivers/vfio/platform/vfio_platform_irq.c | 3 ++-
 drivers/vfio/platform/vfio_platform_private.h | 1 +
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/vfio/platform/vfio_platform_irq.c b/drivers/vfio/platform/vfio_platform_irq.c
index b5cb8c7..40f057a 100644
--- a/drivers/vfio/platform/vfio_platform_irq.c
+++ b/drivers/vfio/platform/vfio_platform_irq.c
@@ -262,7 +262,8 @@ static int vfio_platform_set_irq_trigger(struct vfio_platform_device *vdev,
struct vfio_platform_irq *irq = &vdev->irqs[index];
irq_handler_t handler;
 
-   if (vdev->irqs[index].flags & VFIO_IRQ_INFO_AUTOMASKED)
+   if (vdev->irqs[index].flags & VFIO_IRQ_INFO_AUTOMASKED &&
+   !irq->forwarded)
handler = vfio_automasked_irq_handler;
else
handler = vfio_irq_handler;
diff --git a/drivers/vfio/platform/vfio_platform_private.h b/drivers/vfio/platform/vfio_platform_private.h
index 1d2d4d6..8b4f814 100644
--- a/drivers/vfio/platform/vfio_platform_private.h
+++ b/drivers/vfio/platform/vfio_platform_private.h
@@ -39,6 +39,7 @@ struct vfio_platform_irq {
struct virqfd   *unmask;
struct virqfd   *mask;
struct irq_bypass_producer producer;
+   bool forwarded;
 };
 
 struct vfio_platform_region {
-- 
1.9.1
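
The handler selection rule from this patch, reduced to a user-space
sketch. The names struct irq_ctx, IRQ_FLAG_AUTOMASKED and
select_handler() are hypothetical; the patch itself picks between
vfio_automasked_irq_handler and vfio_irq_handler using
VFIO_IRQ_INFO_AUTOMASKED and the new forwarded flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define IRQ_FLAG_AUTOMASKED (1u << 0)   /* hypothetical flag value */

enum handler_kind { HANDLER_PLAIN, HANDLER_AUTOMASKED };

/* Hypothetical, reduced view of the per-IRQ state used by the selection. */
struct irq_ctx {
    unsigned int flags;
    bool forwarded;
};

/*
 * An automasked IRQ normally gets the handler that masks it, but a
 * forwarded IRQ does not need to be disabled, so it gets the plain one.
 */
static enum handler_kind select_handler(const struct irq_ctx *irq)
{
    assert(irq != NULL);
    if ((irq->flags & IRQ_FLAG_AUTOMASKED) && !irq->forwarded)
        return HANDLER_AUTOMASKED;
    return HANDLER_PLAIN;
}
```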

[PATCH v4 02/15] KVM: arm: rename pm_fake handler to trap_raz_wi

pm_fake doesn't quite describe what the handler does (ignoring writes
and returning 0 for reads).

As we're about to use it (a lot) in a different context, rename it
to an (admittedly cryptic) name that makes sense for all users.

Signed-off-by: Zhichao Huang zhichao.hu...@linaro.org
Reviewed-by: Alex Bennee alex.ben...@linaro.org
Acked-by: Christoffer Dall christoffer.d...@linaro.org
---
 arch/arm/kvm/coproc.c | 32 
 1 file changed, 16 insertions(+), 16 deletions(-)

diff --git a/arch/arm/kvm/coproc.c b/arch/arm/kvm/coproc.c
index a885cfe..4db571d 100644
--- a/arch/arm/kvm/coproc.c
+++ b/arch/arm/kvm/coproc.c
@@ -229,7 +229,7 @@ bool access_vm_reg(struct kvm_vcpu *vcpu,
  * must always support PMCCNTR (the cycle counter): we just RAZ/WI for
  * all PM registers, which doesn't crash the guest kernel at least.
  */
-static bool pm_fake(struct kvm_vcpu *vcpu,
+static bool trap_raz_wi(struct kvm_vcpu *vcpu,
const struct coproc_params *p,
const struct coproc_reg *r)
 {
@@ -239,19 +239,19 @@ static bool pm_fake(struct kvm_vcpu *vcpu,
return read_zero(vcpu, p);
 }
 
-#define access_pmcr pm_fake
-#define access_pmcntenset pm_fake
-#define access_pmcntenclr pm_fake
-#define access_pmovsr pm_fake
-#define access_pmselr pm_fake
-#define access_pmceid0 pm_fake
-#define access_pmceid1 pm_fake
-#define access_pmccntr pm_fake
-#define access_pmxevtyper pm_fake
-#define access_pmxevcntr pm_fake
-#define access_pmuserenr pm_fake
-#define access_pmintenset pm_fake
-#define access_pmintenclr pm_fake
+#define access_pmcr trap_raz_wi
+#define access_pmcntenset trap_raz_wi
+#define access_pmcntenclr trap_raz_wi
+#define access_pmovsr trap_raz_wi
+#define access_pmselr trap_raz_wi
+#define access_pmceid0 trap_raz_wi
+#define access_pmceid1 trap_raz_wi
+#define access_pmccntr trap_raz_wi
+#define access_pmxevtyper trap_raz_wi
+#define access_pmxevcntr trap_raz_wi
+#define access_pmuserenr trap_raz_wi
+#define access_pmintenset trap_raz_wi
+#define access_pmintenclr trap_raz_wi
 
 /* Architected CP15 registers.
  * CRn denotes the primary register number, but is copied to the CRm in the
@@ -478,7 +478,7 @@ static int kvm_handle_cp_64(struct kvm_vcpu *vcpu, struct kvm_run *run,
return emulate_cp15(vcpu, params);
 
/* raz_wi cp14 */
-   (void)pm_fake(vcpu, params, NULL);
+   (void)trap_raz_wi(vcpu, params, NULL);
 
/* handled */
kvm_skip_instr(vcpu, kvm_vcpu_trap_il_is32bit(vcpu));
@@ -534,7 +534,7 @@ static int kvm_handle_cp_32(struct kvm_vcpu *vcpu, struct kvm_run *run,
return emulate_cp15(vcpu, params);
 
/* raz_wi cp14 */
-   (void)pm_fake(vcpu, params, NULL);
+   (void)trap_raz_wi(vcpu, params, NULL);
 
/* handled */
kvm_skip_instr(vcpu, kvm_vcpu_trap_il_is32bit(vcpu));
-- 
1.7.12.4
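
For readers unfamiliar with the term, RAZ/WI means "read as zero, write
ignored", which is what the renamed handler implements. A minimal
user-space sketch of that behaviour follows; struct access and raz_wi()
are hypothetical and only model the semantics, not the kernel's
coproc_params or the handler signature.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one trapped register access. */
struct access {
    bool is_write;   /* true for a guest write, false for a guest read */
    uint32_t *rt;    /* guest general-purpose register involved */
};

/* Read-as-zero / write-ignored: writes are dropped, reads return 0. */
static bool raz_wi(const struct access *a)
{
    assert(a != NULL && a->rt != NULL);
    if (!a->is_write)
        *a->rt = 0;
    return true;     /* access handled; the caller can skip the instruction */
}
```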

[PATCH v3 04/10] VFIO: platform: add vfio_platform_set_automasked

This function makes it possible to change the automasked mode.

Signed-off-by: Eric Auger eric.au...@linaro.org

---

v1 -> v2:
- set forwarded flag
---
 drivers/vfio/platform/vfio_platform_irq.c | 19 +++
 1 file changed, 19 insertions(+)

diff --git a/drivers/vfio/platform/vfio_platform_irq.c b/drivers/vfio/platform/vfio_platform_irq.c
index b31b1f0..a285384 100644
--- a/drivers/vfio/platform/vfio_platform_irq.c
+++ b/drivers/vfio/platform/vfio_platform_irq.c
@@ -186,6 +186,25 @@ static irqreturn_t vfio_handler(int irq, void *dev_id)
return ret;
 }
 
+static int vfio_platform_set_automasked(struct vfio_platform_irq *irq,
+  bool automasked)
+{
+   unsigned long flags;
+
+   spin_lock_irqsave(&irq->lock, flags);
+   if (automasked) {
+   irq->forwarded = true;
+   irq->flags |= VFIO_IRQ_INFO_AUTOMASKED;
+   irq->handler = vfio_automasked_irq_handler;
+   } else {
+   irq->forwarded = false;
+   irq->flags &= ~VFIO_IRQ_INFO_AUTOMASKED;
+   irq->handler = vfio_irq_handler;
+   }
+   spin_unlock_irqrestore(&irq->lock, flags);
+   return 0;
+}
+
 static void vfio_platform_irq_bypass_stop(struct irq_bypass_producer *prod)
 {
 }
-- 
1.9.1

[PATCH v4 06/15] KVM: arm: add trap handlers for 32-bit debug registers

Add handlers for all the 32-bit debug registers.

Signed-off-by: Zhichao Huang zhichao.hu...@linaro.org
---
 arch/arm/include/asm/kvm_asm.h  |  12 
 arch/arm/include/asm/kvm_host.h |   3 +
 arch/arm/kernel/asm-offsets.c   |   1 +
 arch/arm/kvm/coproc.c   | 124 
 4 files changed, 140 insertions(+)

diff --git a/arch/arm/include/asm/kvm_asm.h b/arch/arm/include/asm/kvm_asm.h
index 194c91b..7443a3a 100644
--- a/arch/arm/include/asm/kvm_asm.h
+++ b/arch/arm/include/asm/kvm_asm.h
@@ -52,6 +52,18 @@
 #define c10_AMAIR1 30  /* Auxilary Memory Attribute Indirection Reg1 */
 #define NR_CP15_REGS   31  /* Number of regs (incl. invalid) */
 
+/* 0 is reserved as an invalid value. */
+#define cp14_DBGBVR0   1   /* Debug Breakpoint Value Registers (0-15) */
+#define cp14_DBGBVR15  16
+#define cp14_DBGBCR0   17  /* Debug Breakpoint Control Registers (0-15) */
+#define cp14_DBGBCR15  32
+#define cp14_DBGWVR0   33  /* Debug Watchpoint Value Registers (0-15) */
+#define cp14_DBGWVR15  48
+#define cp14_DBGWCR0   49  /* Debug Watchpoint Control Registers (0-15) */
+#define cp14_DBGWCR15  64
+#define cp14_DBGDSCRext 65 /* Debug Status and Control external */
+#define NR_CP14_REGS   66  /* Number of regs (incl. invalid) */
+
 #define ARM_EXCEPTION_RESET  0
 #define ARM_EXCEPTION_UNDEFINED   1
 #define ARM_EXCEPTION_SOFTWARE    2
diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
index e896d2c..63ac005 100644
--- a/arch/arm/include/asm/kvm_host.h
+++ b/arch/arm/include/asm/kvm_host.h
@@ -124,6 +124,9 @@ struct kvm_vcpu_arch {
struct vgic_cpu vgic_cpu;
struct arch_timer_cpu timer_cpu;
 
+   /* System control coprocessor (cp14) */
+   u32 cp14[NR_CP14_REGS];
+
/*
 * Anything that is not used directly from assembly code goes
 * here.
diff --git a/arch/arm/kernel/asm-offsets.c b/arch/arm/kernel/asm-offsets.c
index 871b826..9158de0 100644
--- a/arch/arm/kernel/asm-offsets.c
+++ b/arch/arm/kernel/asm-offsets.c
@@ -172,6 +172,7 @@ int main(void)
 #ifdef CONFIG_KVM_ARM_HOST
   DEFINE(VCPU_KVM, offsetof(struct kvm_vcpu, kvm));
   DEFINE(VCPU_MIDR,offsetof(struct kvm_vcpu, arch.midr));
+  DEFINE(VCPU_CP14,offsetof(struct kvm_vcpu, arch.cp14));
   DEFINE(VCPU_CP15,offsetof(struct kvm_vcpu, arch.cp15));
   DEFINE(VCPU_VFP_GUEST,   offsetof(struct kvm_vcpu, arch.vfp_guest));
   DEFINE(VCPU_VFP_HOST,offsetof(struct kvm_vcpu, arch.host_cpu_context));
diff --git a/arch/arm/kvm/coproc.c b/arch/arm/kvm/coproc.c
index 16d5f69..b3627f0 100644
--- a/arch/arm/kvm/coproc.c
+++ b/arch/arm/kvm/coproc.c
@@ -220,6 +220,47 @@ bool access_vm_reg(struct kvm_vcpu *vcpu,
return true;
 }
 
+static bool trap_debug32(struct kvm_vcpu *vcpu,
+   const struct coproc_params *p,
+   const struct coproc_reg *r)
+{
+   if (p->is_write)
+   vcpu->arch.cp14[r->reg] = *vcpu_reg(vcpu, p->Rt1);
+   else
+   *vcpu_reg(vcpu, p->Rt1) = vcpu->arch.cp14[r->reg];
+
+   return true;
+}
+
+/* DBGIDR (RO) Debug ID */
+static bool trap_dbgidr(struct kvm_vcpu *vcpu,
+   const struct coproc_params *p,
+   const struct coproc_reg *r)
+{
+   u32 val;
+
+   if (p->is_write)
+   return ignore_write(vcpu, p);
+
+   ARM_DBG_READ(c0, c0, 0, val);
+   *vcpu_reg(vcpu, p->Rt1) = val;
+
+   return true;
+}
+
+/* DBGDSCRint (RO) Debug Status and Control Register */
+static bool trap_dbgdscr(struct kvm_vcpu *vcpu,
+   const struct coproc_params *p,
+   const struct coproc_reg *r)
+{
+   if (p->is_write)
+   return ignore_write(vcpu, p);
+
+   *vcpu_reg(vcpu, p->Rt1) = vcpu->arch.cp14[r->reg];
+
+   return true;
+}
+
 /*
  * We could trap ID_DFR0 and tell the guest we don't support performance
  * monitoring.  Unfortunately the patch to make the kernel check ID_DFR0 was
@@ -375,7 +416,90 @@ static const struct coproc_reg cp15_regs[] = {
{ CRn(15), CRm( 0), Op1( 4), Op2( 0), is32, access_cbar},
 };
 
+#define DBG_BCR_BVR_WCR_WVR(n) \
+   /* DBGBVRn */   \
+   { CRn( 0), CRm((n)), Op1( 0), Op2( 4), is32,\
+ trap_debug32, reset_val, (cp14_DBGBVR0 + (n)), 0 },   \
+   /* DBGBCRn */   \
+   { CRn( 0), CRm((n)), Op1( 0), Op2( 5), is32,\
+ trap_debug32, reset_val, (cp14_DBGBCR0 + (n)), 0 },   \
+   /* DBGWVRn */   \
+   { CRn( 0), CRm((n)), Op1( 0), Op2( 6), is32,\
+ trap_debug32, reset_val, (cp14_DBGWVR0 + (n)), 0 },   \
+   /* DBGWCRn */   \
+   { CRn( 0), CRm((n)), Op1(

[PATCH v4 09/15] KVM: arm: redefine kvm_cpu_context_t to save the host cp14 states

Redefine kvm_cpu_context_t as a new struct that includes the cp14 state,
which we use to save the host cp14 state.

Signed-off-by: Zhichao Huang zhichao.hu...@linaro.org
---
 arch/arm/include/asm/kvm_host.h | 6 +-
 arch/arm/kernel/asm-offsets.c   | 4 +++-
 arch/arm/kvm/interrupts.S   | 6 --
 3 files changed, 12 insertions(+), 4 deletions(-)

diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
index 63ac005..fc461ac 100644
--- a/arch/arm/include/asm/kvm_host.h
+++ b/arch/arm/include/asm/kvm_host.h
@@ -91,7 +91,11 @@ struct kvm_vcpu_fault_info {
u32 hyp_pc; /* PC when exception was taken from Hyp mode */
 };
 
-typedef struct vfp_hard_struct kvm_cpu_context_t;
+struct kvm_cpu_context {
+   struct vfp_hard_struct vfp;
+   u32 cp14[NR_CP14_REGS];
+};
+typedef struct kvm_cpu_context kvm_cpu_context_t;
 
 struct kvm_vcpu_arch {
struct kvm_regs regs;
diff --git a/arch/arm/kernel/asm-offsets.c b/arch/arm/kernel/asm-offsets.c
index 9158de0..cee4254 100644
--- a/arch/arm/kernel/asm-offsets.c
+++ b/arch/arm/kernel/asm-offsets.c
@@ -175,7 +175,9 @@ int main(void)
   DEFINE(VCPU_CP14,offsetof(struct kvm_vcpu, arch.cp14));
   DEFINE(VCPU_CP15,offsetof(struct kvm_vcpu, arch.cp15));
   DEFINE(VCPU_VFP_GUEST,   offsetof(struct kvm_vcpu, arch.vfp_guest));
-  DEFINE(VCPU_VFP_HOST,offsetof(struct kvm_vcpu, arch.host_cpu_context));
+  DEFINE(VCPU_HOST_CONTEXT,offsetof(struct kvm_vcpu, arch.host_cpu_context));
+  DEFINE(VCPU_VFP_HOST,offsetof(struct kvm_cpu_context, vfp));
+  DEFINE(VCPU_CP14_HOST,   offsetof(struct kvm_cpu_context, cp14));
   DEFINE(VCPU_REGS,offsetof(struct kvm_vcpu, arch.regs));
   DEFINE(VCPU_USR_REGS,offsetof(struct kvm_vcpu, arch.regs.usr_regs));
   DEFINE(VCPU_SVC_REGS,offsetof(struct kvm_vcpu, arch.regs.svc_regs));
diff --git a/arch/arm/kvm/interrupts.S b/arch/arm/kvm/interrupts.S
index 48333ff..d4ff22e 100644
--- a/arch/arm/kvm/interrupts.S
+++ b/arch/arm/kvm/interrupts.S
@@ -176,8 +176,9 @@ __kvm_vcpu_return:
@ Switch VFP/NEON hardware state to the host's
add r7, vcpu, #VCPU_VFP_GUEST
store_vfp_state r7
-   add r7, vcpu, #VCPU_VFP_HOST
+   add r7, vcpu, #VCPU_HOST_CONTEXT
ldr r7, [r7]
+   add r7, r7, #VCPU_VFP_HOST
restore_vfp_state r7
 
 after_vfp_restore:
@@ -484,8 +485,9 @@ switch_to_guest_vfp:
set_hcptr vmtrap, (HCPTR_TCP(10) | HCPTR_TCP(11))
 
@ Switch VFP/NEON hardware state to the guest's
-   add r7, r0, #VCPU_VFP_HOST
+   add r7, r0, #VCPU_HOST_CONTEXT
ldr r7, [r7]
+   add r7, r7, #VCPU_VFP_HOST
store_vfp_state r7
add r7, r0, #VCPU_VFP_GUEST
restore_vfp_state r7
-- 
1.7.12.4
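
The assembly change in this patch replaces a one-step address
computation with three steps: add the offset of the host context
pointer, load the pointer, then add the offset of the VFP member inside
the new struct. The following user-space sketch mirrors those steps with
offsetof(). The structure layouts are simplified stand-ins (struct
vfp_state in particular is invented for illustration); only the pointer
member and the cp14 array size of 66 are taken from the patches in this
series.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Invented stand-in for vfp_hard_struct. */
struct vfp_state {
    uint64_t fpregs[32];
    uint32_t fpscr;
};

/* Mirrors the new kvm_cpu_context: VFP state plus the cp14 registers. */
struct cpu_context {
    struct vfp_state vfp;
    uint32_t cp14[66];
};

/* Reduced vcpu: the host context is reached through a pointer. */
struct vcpu {
    struct vfp_state vfp_guest;
    struct cpu_context *host_cpu_context;
};

/* Same three steps as the assembly: add, ldr, add. */
static struct vfp_state *host_vfp(struct vcpu *v)
{
    char *p;
    struct cpu_context *ctx;

    assert(v != NULL && v->host_cpu_context != NULL);
    p = (char *)v + offsetof(struct vcpu, host_cpu_context); /* add r7, vcpu, #VCPU_HOST_CONTEXT */
    ctx = *(struct cpu_context **)p;                         /* ldr r7, [r7] */
    return (struct vfp_state *)((char *)ctx +
            offsetof(struct cpu_context, vfp));              /* add r7, r7, #VCPU_VFP_HOST */
}
```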

[PATCH v4 14/15] KVM: arm: implement lazy world switch for debug registers

Avoid world-switching all the debug registers when neither the host
nor the guest has configured any [WB]points.

Signed-off-by: Zhichao Huang zhichao.hu...@linaro.org
---
 arch/arm/kvm/interrupts_head.S | 23 +--
 1 file changed, 21 insertions(+), 2 deletions(-)

diff --git a/arch/arm/kvm/interrupts_head.S b/arch/arm/kvm/interrupts_head.S
index b9e7410..7ad0adf 100644
--- a/arch/arm/kvm/interrupts_head.S
+++ b/arch/arm/kvm/interrupts_head.S
@@ -332,6 +332,21 @@ vcpu   .req    r0  @ vcpu pointer always in r0
sub r12, r2, r12@ How many WPs to skip
 .endm
 
+/* If VCPU_DEBUG_FLAGS is not set, it means that neither the host
+ * nor the guest has configured any [WB]points.
+ *
+ * We avoid world-switching all the debug registers in that case.
+ *
+ * Assume vcpu pointer in vcpu reg
+ *
+ * clobbers r2
+ */
+.macro skip_debug_state target
+   ldr r2, [vcpu, #VCPU_DEBUG_FLAGS]
+   cmp r2, #0
+   beq \target
+.endm
+
 /* Reads cp14 registers from hardware.
  * Writes cp14 registers in-order to the CP14 struct pointed to by r10
  *
@@ -340,12 +355,14 @@ vcpu  .req    r0  @ vcpu pointer always in r0
  * Clobbers r2-r12
  */
 .macro save_debug_state
+   skip_debug_state 1f
+
read_hw_dbg_num
cp14_read_and_str r10, 4, cp14_DBGBVR0, r11
cp14_read_and_str r10, 5, cp14_DBGBCR0, r11
cp14_read_and_str r10, 6, cp14_DBGWVR0, r12
cp14_read_and_str r10, 7, cp14_DBGWCR0, r12
-
+1:
/* DBGDSCR reg */
mrc p14, 0, r2, c0, c1, 0
str r2, [r10, #CP14_OFFSET(cp14_DBGDSCRext)]
@@ -359,12 +376,14 @@ vcpu  .req    r0  @ vcpu pointer always in r0
  * Clobbers r2-r12
  */
 .macro restore_debug_state
+   skip_debug_state 1f
+
read_hw_dbg_num
cp14_ldr_and_write r10, 4, cp14_DBGBVR0, r11
cp14_ldr_and_write r10, 5, cp14_DBGBCR0, r11
cp14_ldr_and_write r10, 6, cp14_DBGWVR0, r12
cp14_ldr_and_write r10, 7, cp14_DBGWCR0, r12
-
+1:
/* DBGDSCR reg */
ldr r2, [r10, #CP14_OFFSET(cp14_DBGDSCRext)]
mcr p14, 0, r2, c0, c2, 2
-- 
1.7.12.4

[PATCH v4 12/15] KVM: arm: keep track of host use of the debug registers

On every guest entry, we need to record whether the host is using the
debug registers.

We call the function only on guest entry, after preempt_disable() and
local_irq_disable(), so there is no race.

Signed-off-by: Zhichao Huang zhichao.hu...@linaro.org
---
 arch/arm/include/asm/kvm_asm.h| 3 +++
 arch/arm/include/asm/kvm_host.h   | 4 
 arch/arm/kernel/asm-offsets.c | 1 +
 arch/arm/kvm/arm.c| 2 ++
 arch/arm/kvm/coproc.c | 8 
 arch/arm64/include/asm/kvm_host.h | 1 +
 6 files changed, 19 insertions(+)

diff --git a/arch/arm/include/asm/kvm_asm.h b/arch/arm/include/asm/kvm_asm.h
index 7443a3a..5b1c3eb 100644
--- a/arch/arm/include/asm/kvm_asm.h
+++ b/arch/arm/include/asm/kvm_asm.h
@@ -64,6 +64,9 @@
 #define cp14_DBGDSCRext 65 /* Debug Status and Control external */
 #define NR_CP14_REGS   66  /* Number of regs (incl. invalid) */
 
+#define KVM_ARM_DEBUG_HOST_INUSE_SHIFT 0
+#define KVM_ARM_DEBUG_HOST_INUSE   (1 << KVM_ARM_DEBUG_HOST_INUSE_SHIFT)
+
 #define ARM_EXCEPTION_RESET  0
 #define ARM_EXCEPTION_UNDEFINED   1
 #define ARM_EXCEPTION_SOFTWARE    2
diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
index fc461ac..7338f34 100644
--- a/arch/arm/include/asm/kvm_host.h
+++ b/arch/arm/include/asm/kvm_host.h
@@ -131,6 +131,9 @@ struct kvm_vcpu_arch {
/* System control coprocessor (cp14) */
u32 cp14[NR_CP14_REGS];
 
+   /* Debug State */
+   u32 debug_flags;
+
/*
 * Anything that is not used directly from assembly code goes
 * here.
@@ -237,5 +240,6 @@ static inline void kvm_arch_hardware_unsetup(void) {}
 static inline void kvm_arch_sync_events(struct kvm *kvm) {}
 static inline void kvm_arch_vcpu_uninit(struct kvm_vcpu *vcpu) {}
 static inline void kvm_arch_sched_in(struct kvm_vcpu *vcpu, int cpu) {}
+void kvm_arm_setup_debug(struct kvm_vcpu *vcpu);
 
 #endif /* __ARM_KVM_HOST_H__ */
diff --git a/arch/arm/kernel/asm-offsets.c b/arch/arm/kernel/asm-offsets.c
index cee4254..7750597 100644
--- a/arch/arm/kernel/asm-offsets.c
+++ b/arch/arm/kernel/asm-offsets.c
@@ -178,6 +178,7 @@ int main(void)
   DEFINE(VCPU_HOST_CONTEXT,offsetof(struct kvm_vcpu, arch.host_cpu_context));
   DEFINE(VCPU_VFP_HOST,offsetof(struct kvm_cpu_context, vfp));
   DEFINE(VCPU_CP14_HOST,   offsetof(struct kvm_cpu_context, cp14));
+  DEFINE(VCPU_DEBUG_FLAGS, offsetof(struct kvm_vcpu, arch.debug_flags));
   DEFINE(VCPU_REGS,offsetof(struct kvm_vcpu, arch.regs));
   DEFINE(VCPU_USR_REGS,offsetof(struct kvm_vcpu, arch.regs.usr_regs));
   DEFINE(VCPU_SVC_REGS,offsetof(struct kvm_vcpu, arch.regs.svc_regs));
diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
index bc738d2..0129346 100644
--- a/arch/arm/kvm/arm.c
+++ b/arch/arm/kvm/arm.c
@@ -550,6 +550,8 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
continue;
}
 
+   kvm_arm_setup_debug(vcpu);
+
/**
 * Enter the guest
 */
diff --git a/arch/arm/kvm/coproc.c b/arch/arm/kvm/coproc.c
index d15b250..b37afd6 100644
--- a/arch/arm/kvm/coproc.c
+++ b/arch/arm/kvm/coproc.c
@@ -1516,3 +1516,11 @@ void kvm_reset_coprocs(struct kvm_vcpu *vcpu)
if (vcpu->arch.cp15[num] == 0x42424242)
panic("Didn't reset vcpu->arch.cp15[%zi]", num);
 }
+
+void kvm_arm_setup_debug(struct kvm_vcpu *vcpu)
+{
+   if (hw_breakpoint_enabled())
+   vcpu->arch.debug_flags |= KVM_ARM_DEBUG_HOST_INUSE;
+   else
+   vcpu->arch.debug_flags &= ~KVM_ARM_DEBUG_HOST_INUSE;
+}
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 2709db2..84fa4f0 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -226,5 +226,6 @@ static inline void kvm_arch_hardware_unsetup(void) {}
 static inline void kvm_arch_sync_events(struct kvm *kvm) {}
 static inline void kvm_arch_vcpu_uninit(struct kvm_vcpu *vcpu) {}
 static inline void kvm_arch_sched_in(struct kvm_vcpu *vcpu, int cpu) {}
+static inline void kvm_arm_setup_debug(struct kvm_vcpu *vcpu) {}
 
 #endif /* __ARM64_KVM_HOST_H__ */
-- 
1.7.12.4

[PATCH v4 11/15] KVM: arm: add a function to keep track of host use of the debug registers

As we're about to implement a lazy world switch for the debug registers,
add a function that reads the break/watch control variables directly to
determine whether the host has enabled any breakpoints or watchpoints.

Signed-off-by: Zhichao Huang zhichao.hu...@linaro.org
---
 arch/arm/include/asm/hw_breakpoint.h |  5 +
 arch/arm/kernel/hw_breakpoint.c  | 21 +
 2 files changed, 26 insertions(+)

diff --git a/arch/arm/include/asm/hw_breakpoint.h b/arch/arm/include/asm/hw_breakpoint.h
index f2f4c61..6f375c5 100644
--- a/arch/arm/include/asm/hw_breakpoint.h
+++ b/arch/arm/include/asm/hw_breakpoint.h
@@ -66,9 +66,14 @@ int arch_install_hw_breakpoint(struct perf_event *bp);
 void arch_uninstall_hw_breakpoint(struct perf_event *bp);
 void hw_breakpoint_pmu_read(struct perf_event *bp);
 int hw_breakpoint_slots(int type);
+bool hw_breakpoint_enabled(void);
 
 #else
 static inline void clear_ptrace_hw_breakpoint(struct task_struct *tsk) {}
+static inline bool hw_breakpoint_enabled(void)
+{
+   return false;
+}
 
 #endif /* CONFIG_HAVE_HW_BREAKPOINT */
 #endif  /* __ASSEMBLY */
diff --git a/arch/arm/kernel/hw_breakpoint.c b/arch/arm/kernel/hw_breakpoint.c
index dc7d0a9..f56788f 100644
--- a/arch/arm/kernel/hw_breakpoint.c
+++ b/arch/arm/kernel/hw_breakpoint.c
@@ -227,6 +227,27 @@ static int get_num_brps(void)
return core_has_mismatch_brps() ? brps - 1 : brps;
 }
 
+/* Indicate whether the host has enabled any break/watch points or not. */
+bool hw_breakpoint_enabled(void)
+{
+   struct perf_event **slots;
+   int i;
+
+   slots = this_cpu_ptr(bp_on_reg);
+   for (i = 0; i < core_num_brps; i++) {
+   if (slots[i])
+   return true;
+   }
+
+   slots = this_cpu_ptr(wp_on_reg);
+   for (i = 0; i < core_num_wrps; i++) {
+   if (slots[i])
+   return true;
+   }
+
+   return false;
+}
+
 /*
  * In order to access the breakpoint/watchpoint control registers,
  * we must be running in debug monitor mode. Unfortunately, we can
-- 
1.7.12.4
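
The test added by this patch boils down to "is any breakpoint or
watchpoint slot occupied on this CPU". A user-space sketch of that scan
follows. The function names and the plain arrays of pointers are
hypothetical; the kernel version walks the per-CPU bp_on_reg and
wp_on_reg arrays of perf_event pointers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Return true if any of the first n slots holds a non-NULL pointer. */
static bool any_slot_used(void *const *slots, size_t n)
{
    size_t i;

    assert(slots != NULL || n == 0);
    for (i = 0; i < n; i++) {
        if (slots[i])
            return true;
    }
    return false;
}

/* Debug hardware is in use if either slot array has an occupant. */
static bool debug_hw_in_use(void *const *bp, size_t nbp,
                            void *const *wp, size_t nwp)
{
    return any_slot_used(bp, nbp) || any_slot_used(wp, nwp);
}
```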

[PATCH v4 13/15] KVM: arm: keep track of guest use of the debug registers

We trap all guest accesses to the debug registers, and read the BCR/WCR
values to determine whether the guest has enabled any breakpoints or
watchpoints.

Signed-off-by: Zhichao Huang zhichao.hu...@linaro.org
---
 arch/arm/include/asm/kvm_asm.h |  2 ++
 arch/arm/kvm/coproc.c  | 75 +++---
 2 files changed, 72 insertions(+), 5 deletions(-)

diff --git a/arch/arm/include/asm/kvm_asm.h b/arch/arm/include/asm/kvm_asm.h
index 5b1c3eb..e9e1f0a 100644
--- a/arch/arm/include/asm/kvm_asm.h
+++ b/arch/arm/include/asm/kvm_asm.h
@@ -65,7 +65,9 @@
 #define NR_CP14_REGS   66  /* Number of regs (incl. invalid) */
 
 #define KVM_ARM_DEBUG_HOST_INUSE_SHIFT 0
+#define KVM_ARM_DEBUG_GUEST_INUSE_SHIFT    1
 #define KVM_ARM_DEBUG_HOST_INUSE   (1 << KVM_ARM_DEBUG_HOST_INUSE_SHIFT)
+#define KVM_ARM_DEBUG_GUEST_INUSE  (1 << KVM_ARM_DEBUG_GUEST_INUSE_SHIFT)
 
 #define ARM_EXCEPTION_RESET  0
 #define ARM_EXCEPTION_UNDEFINED   1
diff --git a/arch/arm/kvm/coproc.c b/arch/arm/kvm/coproc.c
index b37afd6..d9dcd28b 100644
--- a/arch/arm/kvm/coproc.c
+++ b/arch/arm/kvm/coproc.c
@@ -220,7 +220,22 @@ bool access_vm_reg(struct kvm_vcpu *vcpu,
return true;
 }
 
-static bool trap_debug32(struct kvm_vcpu *vcpu,
+/* Indicate whether the guest has enabled any break/watch points or not. */
+static bool guest_debug_in_use(struct kvm_vcpu *vcpu)
+{
+   unsigned int i;
+
+   for (i = 0; i < ARM_MAX_BRP; i++)
+   if (vcpu->arch.cp14[cp14_DBGBCR0 + i] & 0x1)
+   return true;
+   for (i = 0; i < ARM_MAX_WRP; i++)
+   if (vcpu->arch.cp14[cp14_DBGWCR0 + i] & 0x1)
+   return true;
+
+   return false;
+}
+
+static bool __trap_debug32(struct kvm_vcpu *vcpu,
const struct coproc_params *p,
const struct coproc_reg *r)
 {
@@ -232,6 +247,56 @@ static bool trap_debug32(struct kvm_vcpu *vcpu,
return true;
 }
 
+/*
+ * We want to avoid world-switching all the DBG registers all the
+ * time:
+ *
+ * When we are about to run a guest, we have the following cases:
+ *
+ * 1) Neither the host nor the guest has configured any [WB]points
+ * 2) Only the host has configured any [WB]points
+ * 3) Only the guest has configured any [WB]points
+ * 4) Both the host and the guest have configured any [WB]points
+ *
+ * - In case (1), KVM should enable trapping and switch the register
+ *   state on guest accesses.
+ * - In cases (2), (3), and (4) we must switch the register state on each
+ *   entry/exit.
+ *
+ * For ARMv7, if the CONFIG_HAVE_HW_BREAKPOINT is set, ARM_DSCR_MDBGEN
+ * is always set(ARM64 use it to indicate that debug registers are actively
+ * in use).
+ *
+ * - We add a function reading the break/watch control variables directly to
+ *   indicate whether the host has enabled any break/watch points or not.
+ *   We only call the function upon guest entry, after preempt_disable() and
+ *   local_irq_disable(), so there is no race for it.
+ *
+ * - We trap debug register accesses from guest all the time, and read the
+ *   BCR/WCR to indicate whether the guest has enabled any break/watch points
+ *   or not.
+ *
+ * For this, we can keep track of the host/guest use of debug registers,
+ * and skip the save/restore dance when neither the host nor the guest has
+ * configured any [WB]points.
+ */
+static bool trap_debug32(struct kvm_vcpu *vcpu,
+   const struct coproc_params *p,
+   const struct coproc_reg *r)
+{
+   __trap_debug32(vcpu, p, r);
+
+   if (p->is_write) {
+       if ((vcpu->arch.cp14[r->reg] & 0x1) ||
+           guest_debug_in_use(vcpu))
+           vcpu->arch.debug_flags |= KVM_ARM_DEBUG_GUEST_INUSE;
+       else
+           vcpu->arch.debug_flags &= ~KVM_ARM_DEBUG_GUEST_INUSE;
+   }
+
+   return true;
+}
+
 /* DBGIDR (RO) Debug ID */
 static bool trap_dbgidr(struct kvm_vcpu *vcpu,
const struct coproc_params *p,
@@ -419,13 +484,13 @@ static const struct coproc_reg cp15_regs[] = {
 #define DBG_BCR_BVR_WCR_WVR(n) \
/* DBGBVRn */   \
{ CRn( 0), CRm((n)), Op1( 0), Op2( 4), is32,\
- trap_debug32, reset_val, (cp14_DBGBVR0 + (n)), 0 },   \
+ __trap_debug32, reset_val, (cp14_DBGBVR0 + (n)), 0 }, \
/* DBGBCRn */   \
{ CRn( 0), CRm((n)), Op1( 0), Op2( 5), is32,\
  trap_debug32, reset_val, (cp14_DBGBCR0 + (n)), 0 },   \
/* DBGWVRn */   \
{ CRn( 0), CRm((n)), Op1( 0), Op2( 6), is32,\
- trap_debug32, reset_val, (cp14_DBGWVR0 + (n)), 0 },   \
+ __trap_debug32, reset_val, (cp14_DBGWVR0 + (n)), 0 }, \
/* DBGWCRn */   \
{ CRn( 0),
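
The four cases listed in the patch comment reduce to a single question at guest entry: has either side enabled a breakpoint or watchpoint? The following is a minimal user-space sketch of that decision, not the kernel code. All `sketch_*` names and the array sizes are hypothetical stand-ins; the only detail taken from the patch is that bit 0 of each control register is tested.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for illustration; not the kernel's definitions. */
#define SKETCH_MAX_BRP     16
#define SKETCH_MAX_WRP     16
#define SKETCH_CTRL_ENABLE 0x1u	/* bit 0, the bit the patch tests with "& 0x1" */

struct sketch_dbg_state {
	uint32_t bcr[SKETCH_MAX_BRP];	/* breakpoint control registers */
	uint32_t wcr[SKETCH_MAX_WRP];	/* watchpoint control registers */
};

/* True if any breakpoint or watchpoint control register has bit 0 set. */
static bool sketch_debug_in_use(const struct sketch_dbg_state *s)
{
	assert(s != NULL);

	for (size_t i = 0; i < SKETCH_MAX_BRP; i++)
		if (s->bcr[i] & SKETCH_CTRL_ENABLE)
			return true;
	for (size_t i = 0; i < SKETCH_MAX_WRP; i++)
		if (s->wcr[i] & SKETCH_CTRL_ENABLE)
			return true;

	return false;
}

/*
 * Cases (2), (3) and (4): at least one side uses the registers, so the
 * full state must be switched on every entry/exit.  Case (1): neither
 * side does, so the switch can be skipped and guest accesses trapped.
 */
static bool sketch_must_switch(bool host_in_use,
			       const struct sketch_dbg_state *guest)
{
	return host_in_use || sketch_debug_in_use(guest);
}
```

In this sketch the host side is passed in as a plain boolean; in the patch it would come from reading the host's break/watch control variables at guest entry.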

[PATCH v5 5/5] KVM: eventfd: add irq bypass consumer management

This patch adds the registration/unregistration of an
irq_bypass_consumer on irqfd assignment/deassignment.

Signed-off-by: Eric Auger eric.au...@linaro.org
Signed-off-by: Feng Wu feng...@intel.com

---

v4 -> v5:
- due to removal of static inline stubs, add
  #ifdef CONFIG_HAVE_KVM_IRQ_BYPASS
  around consumer registration/unregistration
- add pr_info when registration fails

v2 -> v3 (Feng Wu):
- Use kvm_arch_irq_bypass_start
- Remove kvm_arch_irq_bypass_update
- Add member 'struct irq_bypass_producer *producer' in
  'struct kvm_kernel_irqfd', it is needed by posted interrupt.
- Remove 'irq_bypass_unregister_consumer' in kvm_irqfd_deassign()

v1 -> v2:
- populate of kvm and gsi removed
- unregister the consumer on irqfd_shutdown
---
 include/linux/kvm_irqfd.h |  2 ++
 virt/kvm/eventfd.c| 15 +++
 2 files changed, 17 insertions(+)

diff --git a/include/linux/kvm_irqfd.h b/include/linux/kvm_irqfd.h
index f926b39..0c1de05 100644
--- a/include/linux/kvm_irqfd.h
+++ b/include/linux/kvm_irqfd.h
@@ -64,6 +64,8 @@ struct kvm_kernel_irqfd {
struct list_head list;
poll_table pt;
struct work_struct shutdown;
+   struct irq_bypass_consumer consumer;
+   struct irq_bypass_producer *producer;
 };
 
 #endif /* __LINUX_KVM_IRQFD_H */
diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c
index 647ffb8..d7a230f 100644
--- a/virt/kvm/eventfd.c
+++ b/virt/kvm/eventfd.c
@@ -35,6 +35,7 @@
 #include <linux/srcu.h>
 #include <linux/slab.h>
 #include <linux/seqlock.h>
+#include <linux/irqbypass.h>
 #include <trace/events/kvm.h>
 
 #include <kvm/iodev.h>
@@ -140,6 +141,9 @@ irqfd_shutdown(struct work_struct *work)
/*
 * It is now safe to release the object's resources
 */
+#ifdef CONFIG_HAVE_KVM_IRQ_BYPASS
+   irq_bypass_unregister_consumer(&irqfd->consumer);
+#endif
eventfd_ctx_put(irqfd->eventfd);
kfree(irqfd);
 }
@@ -379,6 +383,17 @@ kvm_irqfd_assign(struct kvm *kvm, struct kvm_irqfd *args)
 * we might race against the POLLHUP
 */
fdput(f);
+#ifdef CONFIG_HAVE_KVM_IRQ_BYPASS
+   irqfd->consumer.token = (void *)irqfd->eventfd;
+   irqfd->consumer.add_producer = kvm_arch_irq_bypass_add_producer;
+   irqfd->consumer.del_producer = kvm_arch_irq_bypass_del_producer;
+   irqfd->consumer.stop = kvm_arch_irq_bypass_stop;
+   irqfd->consumer.start = kvm_arch_irq_bypass_start;
+   ret = irq_bypass_register_consumer(&irqfd->consumer);
+   if (ret)
+       pr_info("irq bypass consumer (token %p) registration fails: %d\n",
+           irqfd->consumer.token, ret);
+#endif
 
return 0;
 
-- 
1.9.1
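
The hunk above registers a consumer whose token is the irqfd's eventfd context. The sketch below models, in user space, how a bypass manager could pair a producer with a consumer by comparing tokens. It is an illustration under stated assumptions, not the kernel's irqbypass code: every `mock_*` name, the fixed-size table, and the return values are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical user-space model; none of these names exist in the kernel. */
struct mock_producer {
	void *token;			/* e.g. the eventfd context pointer */
};

struct mock_consumer {
	void *token;
	struct mock_producer *producer;	/* set once a producer is paired */
	int (*add_producer)(struct mock_consumer *, struct mock_producer *);
};

#define MOCK_MAX_CONSUMERS 8
static struct mock_consumer *mock_consumers[MOCK_MAX_CONSUMERS];

/* Add a consumer; a second consumer with the same token is rejected. */
static int mock_register_consumer(struct mock_consumer *c)
{
	int free_slot = -1;

	assert(c != NULL);
	for (int i = 0; i < MOCK_MAX_CONSUMERS; i++) {
		if (!mock_consumers[i]) {
			if (free_slot < 0)
				free_slot = i;
		} else if (mock_consumers[i]->token == c->token) {
			return -EBUSY;
		}
	}
	if (free_slot < 0)
		return -ENOMEM;

	mock_consumers[free_slot] = c;
	return 0;
}

/* Remove a consumer and drop its pairing, as done on irqfd shutdown. */
static void mock_unregister_consumer(struct mock_consumer *c)
{
	assert(c != NULL);
	for (int i = 0; i < MOCK_MAX_CONSUMERS; i++)
		if (mock_consumers[i] == c)
			mock_consumers[i] = NULL;
	c->producer = NULL;
}

/* Returns 1 if paired, 0 if no token matches, <0 if the callback fails. */
static int mock_register_producer(struct mock_producer *p)
{
	assert(p != NULL);
	for (int i = 0; i < MOCK_MAX_CONSUMERS; i++) {
		struct mock_consumer *c = mock_consumers[i];

		if (c && c->token == p->token) {
			int ret = c->add_producer ? c->add_producer(c, p) : 0;

			if (ret)
				return ret;
			c->producer = p;
			return 1;
		}
	}
	return 0;
}
```

As in the patch, a failed registration in this model leaves the consumer usable without bypass; the caller only has to decide whether to log the error.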


[PATCH v5 4/5] KVM: introduce kvm_arch functions for IRQ bypass

This patch introduces
- kvm_arch_irq_bypass_add_producer
- kvm_arch_irq_bypass_del_producer
- kvm_arch_irq_bypass_stop
- kvm_arch_irq_bypass_start

They make it possible to specialize the KVM IRQ bypass consumer when
CONFIG_HAVE_KVM_IRQ_BYPASS is set.

Signed-off-by: Eric Auger eric.au...@linaro.org
Signed-off-by: Feng Wu feng...@intel.com

---
v4 -> v5:
- remove static inline stub functions

v2 -> v3 (Feng Wu):
- use 'kvm_arch_irq_bypass_start' instead of 'kvm_arch_irq_bypass_resume'
- Remove 'kvm_arch_irq_bypass_update', which does not need to be
  an irqbypass callback per Alex's comments.
- Make kvm_arch_irq_bypass_add_producer return 'int'

v1 -> v2:
- use CONFIG_KVM_HAVE_IRQ_BYPASS instead of CONFIG_IRQ_BYPASS_MANAGER
- rename all functions according to Paolo's proposal
- add kvm_arch_irq_bypass_update according to Feng's need
---
 include/linux/kvm_host.h | 10 ++
 virt/kvm/Kconfig |  3 +++
 2 files changed, 13 insertions(+)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 05e99b8..5ac8d21 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -24,6 +24,7 @@
 #include <linux/err.h>
 #include <linux/irqflags.h>
 #include <linux/context_tracking.h>
+#include <linux/irqbypass.h>
 #include <asm/signal.h>
 
 #include <linux/kvm.h>
@@ -1151,5 +1152,14 @@ static inline void kvm_vcpu_set_dy_eligible(struct kvm_vcpu *vcpu, bool val)
 {
 }
 #endif /* CONFIG_HAVE_KVM_CPU_RELAX_INTERCEPT */
+
+#ifdef CONFIG_HAVE_KVM_IRQ_BYPASS
+int kvm_arch_irq_bypass_add_producer(struct irq_bypass_consumer *,
+  struct irq_bypass_producer *);
+void kvm_arch_irq_bypass_del_producer(struct irq_bypass_consumer *,
+  struct irq_bypass_producer *);
+void kvm_arch_irq_bypass_stop(struct irq_bypass_consumer *);
+void kvm_arch_irq_bypass_start(struct irq_bypass_consumer *);
+#endif /* CONFIG_HAVE_KVM_IRQ_BYPASS */
 #endif
 
diff --git a/virt/kvm/Kconfig b/virt/kvm/Kconfig
index e2c876d..9f8014d 100644
--- a/virt/kvm/Kconfig
+++ b/virt/kvm/Kconfig
@@ -47,3 +47,6 @@ config KVM_GENERIC_DIRTYLOG_READ_PROTECT
 config KVM_COMPAT
def_bool y
depends on COMPAT && !S390
+
+config HAVE_KVM_IRQ_BYPASS
+   bool
-- 
1.9.1
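
The four callbacks declared above are meant to be driven by the bypass manager when a producer is attached to or detached from a consumer. The sketch below shows one plausible calling sequence: quiesce with stop, change the pairing, then resume with start. That ordering is an assumption for illustration and is not stated in this patch; all `sketch_*` names are hypothetical, and the callbacks only record the order in which they run.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical user-space model; names do not exist in the kernel. */
struct sketch_producer {
	int host_irq;
};

struct sketch_consumer {
	char log[16];	/* one letter per callback, in call order */
	int (*add_producer)(struct sketch_consumer *, struct sketch_producer *);
	void (*del_producer)(struct sketch_consumer *, struct sketch_producer *);
	void (*stop)(struct sketch_consumer *);
	void (*start)(struct sketch_consumer *);
};

/* Append one event letter, leaving room for the terminating NUL. */
static void sketch_log(struct sketch_consumer *c, char ev)
{
	size_t n = strlen(c->log);

	if (n + 1 < sizeof(c->log)) {
		c->log[n] = ev;
		c->log[n + 1] = '\0';
	}
}

/* Stand-ins for the arch hooks; add_producer returns int as in the patch. */
static int sketch_arch_add_producer(struct sketch_consumer *c,
				    struct sketch_producer *p)
{
	(void)p;
	sketch_log(c, 'A');
	return 0;
}

static void sketch_arch_del_producer(struct sketch_consumer *c,
				     struct sketch_producer *p)
{
	(void)p;
	sketch_log(c, 'D');
}

static void sketch_arch_stop(struct sketch_consumer *c)
{
	sketch_log(c, 'S');
}

static void sketch_arch_start(struct sketch_consumer *c)
{
	sketch_log(c, 'R');
}

static void sketch_consumer_init(struct sketch_consumer *c)
{
	assert(c != NULL);
	memset(c, 0, sizeof(*c));
	c->add_producer = sketch_arch_add_producer;
	c->del_producer = sketch_arch_del_producer;
	c->stop = sketch_arch_stop;
	c->start = sketch_arch_start;
}

/* Assumed ordering: stop, attach the producer, start. */
static int sketch_connect(struct sketch_consumer *c, struct sketch_producer *p)
{
	int ret;

	c->stop(c);
	ret = c->add_producer(c, p);
	c->start(c);
	return ret;
}

/* Assumed ordering: stop, detach the producer, start. */
static void sketch_disconnect(struct sketch_consumer *c,
			      struct sketch_producer *p)
{
	c->stop(c);
	c->del_producer(c, p);
	c->start(c);
}
```

After `sketch_connect()` the log reads "SAR", and a following `sketch_disconnect()` extends it to "SARSDR".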


[PATCH v5 3/5] KVM: create kvm_irqfd.h

Move the _irqfd_resampler and _irqfd struct declarations into a new
public header, kvm_irqfd.h. They are renamed to
kvm_kernel_irqfd_resampler and kvm_kernel_irqfd respectively. Those datatypes
will be used by architecture specific code, in the context of
IRQ bypass manager integration.

Signed-off-by: Eric Auger eric.au...@linaro.org
---
 include/linux/kvm_irqfd.h | 69 ++
 virt/kvm/eventfd.c| 95 ---
 2 files changed, 92 insertions(+), 72 deletions(-)
 create mode 100644 include/linux/kvm_irqfd.h

diff --git a/include/linux/kvm_irqfd.h b/include/linux/kvm_irqfd.h
new file mode 100644
index 0000000..f926b39
--- /dev/null
+++ b/include/linux/kvm_irqfd.h
@@ -0,0 +1,69 @@
+/*
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * irqfd: Allows an fd to be used to inject an interrupt to the guest
+ * Credit goes to Avi Kivity for the original idea.
+ */
+
+#ifndef __LINUX_KVM_IRQFD_H
+#define __LINUX_KVM_IRQFD_H
+
+#include <linux/kvm_host.h>
+#include <linux/poll.h>
+
+/*
+ * Resampling irqfds are a special variety of irqfds used to emulate
+ * level triggered interrupts.  The interrupt is asserted on eventfd
+ * trigger.  On acknowledgment through the irq ack notifier, the
+ * interrupt is de-asserted and userspace is notified through the
+ * resamplefd.  All resamplers on the same gsi are de-asserted
+ * together, so we don't need to track the state of each individual
+ * user.  We can also therefore share the same irq source ID.
+ */
+struct kvm_kernel_irqfd_resampler {
+   struct kvm *kvm;
+   /*
+* List of resampling struct _irqfd objects sharing this gsi.
+* RCU list modified under kvm->irqfds.resampler_lock
+*/
+   struct list_head list;
+   struct kvm_irq_ack_notifier notifier;
+   /*
+* Entry in list of kvm->irqfd.resampler_list.  Use for sharing
+* resamplers among irqfds on the same gsi.
+* Accessed and modified under kvm->irqfds.resampler_lock
+*/
+   struct list_head link;
+};
+
+struct kvm_kernel_irqfd {
+   /* Used for MSI fast-path */
+   struct kvm *kvm;
+   wait_queue_t wait;
+   /* Update side is protected by irqfds.lock */
+   struct kvm_kernel_irq_routing_entry irq_entry;
+   seqcount_t irq_entry_sc;
+   /* Used for level IRQ fast-path */
+   int gsi;
+   struct work_struct inject;
+   /* The resampler used by this irqfd (resampler-only) */
+   struct kvm_kernel_irqfd_resampler *resampler;
+   /* Eventfd notified on resample (resampler-only) */
+   struct eventfd_ctx *resamplefd;
+   /* Entry in list of irqfds for a resampler (resampler-only) */
+   struct list_head resampler_link;
+   /* Used for setup/shutdown */
+   struct eventfd_ctx *eventfd;
+   struct list_head list;
+   poll_table pt;
+   struct work_struct shutdown;
+};
+
+#endif /* __LINUX_KVM_IRQFD_H */
diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c
index 9ff4193..647ffb8 100644
--- a/virt/kvm/eventfd.c
+++ b/virt/kvm/eventfd.c
@@ -23,6 +23,7 @@
 
 #include <linux/kvm_host.h>
 #include <linux/kvm.h>
+#include <linux/kvm_irqfd.h>
 #include <linux/workqueue.h>
 #include <linux/syscalls.h>
 #include <linux/wait.h>
@@ -39,68 +40,14 @@
 #include <kvm/iodev.h>
 
 #ifdef CONFIG_HAVE_KVM_IRQFD
-/*
- * 
- * irqfd: Allows an fd to be used to inject an interrupt to the guest
- *
- * Credit goes to Avi Kivity for the original idea.
- * 
- */
-
-/*
- * Resampling irqfds are a special variety of irqfds used to emulate
- * level triggered interrupts.  The interrupt is asserted on eventfd
- * trigger.  On acknowledgement through the irq ack notifier, the
- * interrupt is de-asserted and userspace is notified through the
- * resamplefd.  All resamplers on the same gsi are de-asserted
- * together, so we don't need to track the state of each individual
- * user.  We can also therefore share the same irq source ID.
- */
-struct _irqfd_resampler {
-   struct kvm *kvm;
-   /*
-* List of resampling struct _irqfd objects sharing this gsi.
-* RCU list modified under kvm->irqfds.resampler_lock
-*/
-   struct list_head list;
-   struct kvm_irq_ack_notifier notifier;
-   /*
-* Entry in list of kvm->irqfd.resampler_list.  Use for sharing
-* resamplers among irqfds on the same gsi.
-* Accessed and modified under

Re: rdtsc() in kvm-unit-tests on x86