date:20150514

On Thu, May 14, 2015 at 11:37:46AM +0100, Peter Maydell wrote:
 On 14 May 2015 at 11:31, Andrew Jones drjo...@redhat.com wrote:
  Forgot to (4): switch from setting userspace's mapping to
  device memory to normal, non-cacheable. Using device memory
  caused a problem that Alex Graf found, and Peter Maydell suggested
  using normal, non-cacheable instead.
 
 Did you check that non-cacheable is definitely the correct
 kind of Normal memory attribute we want? (ie not write-through).

I was concerned that write-through wouldn't be sufficient. If the
guest writes to its non-cached memory, and QEMU needs to see what
it wrote, then won't write-through fail to work? Unless we some
how invalidate the cache first?

drew
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

Re: [RFC/RFT PATCH v2 0/3] KVM: Introduce KVM_MEM_UNCACHED

On Thu, May 14, 2015 at 01:38:38PM +0200, Paolo Bonzini wrote:
 
 
 On 14/05/2015 13:36, Christoffer Dall wrote:
 (It's probably worth looking at the documentation in the first hunk 
 too,
 under the commit message.)

Why is this a hack/unintuitive?  Is the semantics of the QEMU PCI bus
not simply that MMIO regions are coherent?
   
   Only until device assignment gets into the picture.
  
  Will UEFI have to deal with device assignment in any respect?
 
 Why not?  For example you could do network boot from an assigned network
 card.
 
 In fact, anything that UEFI has to deal with, the OS has to deal with
 too.  If you need a UEFI hack, chances are you need or will need a Linux
 hack too.
 
Fair enough.  I was thinking that UEFI needs to be built with knowledge
of all the hardware present including any passthrough devices, but I
guess this is plainly not true with PCI (and might not even be true with
the level of DT parsing we do for the virtual platform).

So, getting back to my original question.  Is the point then that UEFI
must assume (from ACPI/DT) the cache-coherency properties of the PCI
controller which exists in hardware on the system you're running on,
even for the virtual PCI bus because that will be the semantics for
assigned devices?

And in that case, we have no way to distinguish between passthrough
devices and virtual devices plugged into the virtual PCI bus?

What about the idea of having two virtual PCI buses on your system where
one is always cache-coherent and uses for virtual devices, and the other
is whatever the hardware is and used for passthrough devices?

-Christoffer
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

Re: [Qemu-devel] [RFC/RFT PATCH v2 0/3] KVM: Introduce KVM_MEM_UNCACHED

2015-05-14 Thread Laszlo Ersek

On 05/14/15 15:00, Andrew Jones wrote:
 On Thu, May 14, 2015 at 01:38:11PM +0100, Peter Maydell wrote:
 On 14 May 2015 at 13:28, Paolo Bonzini pbonz...@redhat.com wrote:
 Well, PCI BARs are generally MMIO resources, and hence should not be cached.

 As an optimization, OS drivers can mark them as cacheable or
 write-combining or something like that, but in general it's a safe
 default to leave them uncached---one would think.

 Isn't this handled by the OS mapping them in the 'prefetchable'
 MMIO window rather than the 'non-prefetchable' one? (QEMU's
 generic-PCIe device doesn't yet support the prefetchable window.)
 
 I was thinking (with my limited PCI knowledge) the same thing, and
 was planning on experimenting with that.

This could be supported in UEFI as well, with the following steps:
- the DTB that QEMU provides UEFI with should advertise such a
  prefetchable window.
- The driver in UEFI that parses the DTB should understand that DTB
  node (well, record type), and store the appropriate base  size into
  some new dynamic PCDs (= basically, firmware wide global variables;
  PCD = platform configuration database)
- The entry point of the host bridge driver would call
  gDS-AddMemorySpace() twice, separately for the two different windows,
  with their appropriate caching attributes.
- The host bridge driver needs to be extended so that TypePMem32
  requests are not rejected (like now); they should be handled
  similarly to TypeMem32. Except, the gDS-AllocateMemorySpace() call
  should allocate from the prefetchable range (determined by the new
  PCDs above).
- QEMU's emulated devices should then expose their BARs as prefetchable
  (so that the above branch would be taken in the host bridge driver).

(Of course, if QEMU intends to emulate PCI devices somewhat
realistically, then QEMU should claim non-prefetchable for BARs that
would not be prefetchable on physical hardware either, and then the
hypervisor should accommodate the firmware's UC mapping and say hey I
know better, we're virtual in fact, and override the attribute (- use
WB instead of UC). With which we'd be back to square one...)

Thanks
Laszlo
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

Re: [PATCH v3 5/5] arm64: KVM: Switch vgic save/restore to alternative_insn

On Fri, Mar 27, 2015 at 01:09:25PM +, Marc Zyngier wrote:
 So far, we configured the world-switch by having a small array
 of pointers to the save and restore functions, depending on the
 GIC used on the platform.
 
 Loading these values each time is a bit silly (they never change),
 and it makes sense to rely on the instruction patching instead.
 
 This leads to a nice cleanup of the code.
 
 Acked-by: Will Deacon will.dea...@arm.com
 Signed-off-by: Marc Zyngier marc.zyng...@arm.com

I gave this a quick spin on Juno as well and works as expected:

Reviewed-by: Christoffer Dall christoffer.d...@linaro.org
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

Re: [RFC/RFT PATCH v2 0/3] KVM: Introduce KVM_MEM_UNCACHED

2015-05-14 Thread Peter Maydell

On 14 May 2015 at 13:28, Paolo Bonzini pbonz...@redhat.com wrote:
 Well, PCI BARs are generally MMIO resources, and hence should not be cached.

 As an optimization, OS drivers can mark them as cacheable or
 write-combining or something like that, but in general it's a safe
 default to leave them uncached---one would think.

Isn't this handled by the OS mapping them in the 'prefetchable'
MMIO window rather than the 'non-prefetchable' one? (QEMU's
generic-PCIe device doesn't yet support the prefetchable window.)

-- PMM
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

Re: [RFC/RFT PATCH v2 0/3] KVM: Introduce KVM_MEM_UNCACHED

On Thu, May 14, 2015 at 02:08:49PM +0200, Paolo Bonzini wrote:
 
 
 On 14/05/2015 14:00, Christoffer Dall wrote:
  So, getting back to my original question.  Is the point then that UEFI
  must assume (from ACPI/DT) the cache-coherency properties of the PCI
  controller which exists in hardware on the system you're running on,
  even for the virtual PCI bus because that will be the semantics for
  assigned devices?
  
  And in that case, we have no way to distinguish between passthrough
  devices and virtual devices plugged into the virtual PCI bus?
 
 Well, we could use the subsystem id.  But it's a hack, and may cause
 incompatibilities with some drivers.  Michael, any ideas?
 
  What about the idea of having two virtual PCI buses on your system where
  one is always cache-coherent and uses for virtual devices, and the other
  is whatever the hardware is and used for passthrough devices?
 
 I think that was rejected before.
 

Do you remember where?  I just remember Catalin mentioning the idea to
me verbally.

Besides the slightly heavy added use of resources etc. it seems like it
would address some of our issues in a good way.

But I'm still not sure why UEFI/Linux currently sees our PCI bus as
being non-coherent when in fact it is and we have no passthrough issues
currently.  Are all PCI controllers always non-coherent for some reason
and therefore we model it as such too?

-Christoffer
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

Re: [RFC/RFT PATCH v2 0/3] KVM: Introduce KVM_MEM_UNCACHED

On Thu, May 14, 2015 at 02:28:49PM +0200, Paolo Bonzini wrote:
 
 
 On 14/05/2015 14:24, Christoffer Dall wrote:
  On Thu, May 14, 2015 at 02:08:49PM +0200, Paolo Bonzini wrote:
 
 
  On 14/05/2015 14:00, Christoffer Dall wrote:
  So, getting back to my original question.  Is the point then that UEFI
  must assume (from ACPI/DT) the cache-coherency properties of the PCI
  controller which exists in hardware on the system you're running on,
  even for the virtual PCI bus because that will be the semantics for
  assigned devices?
 
  And in that case, we have no way to distinguish between passthrough
  devices and virtual devices plugged into the virtual PCI bus?
 
  Well, we could use the subsystem id.  But it's a hack, and may cause
  incompatibilities with some drivers.  Michael, any ideas?
 
  What about the idea of having two virtual PCI buses on your system where
  one is always cache-coherent and uses for virtual devices, and the other
  is whatever the hardware is and used for passthrough devices?
 
  I think that was rejected before.
  
  Do you remember where?  I just remember Catalin mentioning the idea to
  me verbally.
 
 In the last centithread on the subject. :)
 
 At least I and Peter disagreed.  It's not about the heavy added use of
 resources, it's more about it being really easy to misconfigure.
 
  But I'm still not sure why UEFI/Linux currently sees our PCI bus as
  being non-coherent when in fact it is and we have no passthrough issues
  currently.  Are all PCI controllers always non-coherent for some reason
  and therefore we model it as such too?
 
 Well, PCI BARs are generally MMIO resources, and hence should not be cached.
 
 As an optimization, OS drivers can mark them as cacheable or
 write-combining or something like that, but in general it's a safe
 default to leave them uncached---one would think.
 
ok, I guess this series makes sense then, assuming it works, and
assuming we don't kill performance by going to RAM all the time when we
don't have to...

Thanks,
-Christoffer
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

Re: [Qemu-devel] [RFC/RFT PATCH v2 0/3] KVM: Introduce KVM_MEM_UNCACHED

On Thu, May 14, 2015 at 01:38:11PM +0100, Peter Maydell wrote:
 On 14 May 2015 at 13:28, Paolo Bonzini pbonz...@redhat.com wrote:
  Well, PCI BARs are generally MMIO resources, and hence should not be cached.
 
  As an optimization, OS drivers can mark them as cacheable or
  write-combining or something like that, but in general it's a safe
  default to leave them uncached---one would think.
 
 Isn't this handled by the OS mapping them in the 'prefetchable'
 MMIO window rather than the 'non-prefetchable' one? (QEMU's
 generic-PCIe device doesn't yet support the prefetchable window.)

I was thinking (with my limited PCI knowledge) the same thing, and
was planning on experimenting with that.

drew
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

Re: [RFC/RFT PATCH v2 0/3] KVM: Introduce KVM_MEM_UNCACHED

2015-05-14 Thread Peter Maydell

On 14 May 2015 at 14:03, Andrew Jones drjo...@redhat.com wrote:
 On Thu, May 14, 2015 at 11:37:46AM +0100, Peter Maydell wrote:
 On 14 May 2015 at 11:31, Andrew Jones drjo...@redhat.com wrote:
  Forgot to (4): switch from setting userspace's mapping to
  device memory to normal, non-cacheable. Using device memory
  caused a problem that Alex Graf found, and Peter Maydell suggested
  using normal, non-cacheable instead.

 Did you check that non-cacheable is definitely the correct
 kind of Normal memory attribute we want? (ie not write-through).

 I was concerned that write-through wouldn't be sufficient. If the
 guest writes to its non-cached memory, and QEMU needs to see what
 it wrote, then won't write-through fail to work? Unless we some
 how invalidate the cache first?

Well, I meant more that the correct mapping for userspace is
the same as the guest, whatever that is, and so somebody needs
to look at what the guest actually does rather than merely
hoping NormalNC is OK. (For instance, do we need to provide
support for QEMU to map both NC and writethrough?)

-- PMM
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

Re: [RFC/RFT PATCH v2 1/3] arm/arm64: pageattr: add set_memory_nc

On Thu, May 14, 2015 at 01:05:09PM +0200, Christoffer Dall wrote:
 On Wed, May 13, 2015 at 01:31:52PM +0200, Andrew Jones wrote:
  Provide a method to change normal, cacheable memory to non-cacheable.
  KVM will make use of this to keep emulated device memory regions
  coherent with the guest.
  
  Signed-off-by: Andrew Jones drjo...@redhat.com
 
 Reviewed-by: Christoffer Dall christoffer.d...@linaro.org
 
 But you obviously need Russell and Will/Catalin to ack/merge this.

I guess this patch is going to go away in the next round. You've
pointed out that I screwed stuff up royally with my over eagerness
to reuse code. I need to reimplement change_memory_common, but a
version that takes an mm, which is more or less what I did in the
last version of this series, back when I was pinning pages.

drew
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

Re: [RFC/RFT PATCH v2 0/3] KVM: Introduce KVM_MEM_UNCACHED

2015-05-14 Thread Laszlo Ersek

On 05/14/15 14:34, Christoffer Dall wrote:
On Thu, May 14, 2015 at 02:28:49PM +0200, Paolo Bonzini wrote:

On 14/05/2015 14:24, Christoffer Dall wrote:
On Thu, May 14, 2015 at 02:08:49PM +0200, Paolo Bonzini wrote:

On 14/05/2015 14:00, Christoffer Dall wrote:
So, getting back to my original question. Is the point then that UEFI
must assume (from ACPI/DT) the cache-coherency properties of the PCI
controller which exists in hardware on the system you're running on,
even for the virtual PCI bus because that will be the semantics for
assigned devices?

And in that case, we have no way to distinguish between passthrough
devices and virtual devices plugged into the virtual PCI bus?

Well, we could use the subsystem id. But it's a hack, and may cause
incompatibilities with some drivers. Michael, any ideas?

What about the idea of having two virtual PCI buses on your system where
one is always cache-coherent and uses for virtual devices, and the other
is whatever the hardware is and used for passthrough devices?

I think that was rejected before.

Do you remember where? I just remember Catalin mentioning the idea to
me verbally.

In the last centithread on the subject. :)

At least I and Peter disagreed. It's not about the heavy added use of
resources, it's more about it being really easy to misconfigure.

But I'm still not sure why UEFI/Linux currently sees our PCI bus as
being non-coherent when in fact it is and we have no passthrough issues
currently. Are all PCI controllers always non-coherent for some reason
and therefore we model it as such too?

Well, PCI BARs are generally MMIO resources, and hence should not be cached.

As an optimization, OS drivers can mark them as cacheable or
write-combining or something like that, but in general it's a safe
default to leave them uncached---one would think.

ok, I guess this series makes sense then, assuming it works, and
assuming we don't kill performance by going to RAM all the time when we
don't have to...

The idea Paolo and I had discussed in the past is:
- Remove the kludge from UEFI, and map all MMIO BARs as uncached by
default. This should be a theoretically correct approach, and for
assigned devices, correct in practice too.

- At an appropriate place in the firmware (specifically, right before
this line: [1]), when PCI devices have been enumerated, but their
particular drivers (especially VGA) have not been connected yet,
check the subsystem id / vendor id / etc for each, and if we can tell
it's virtual, then set the attributes for all of its MMIO bars to
writeback.

It doesn't seem hard to implement, I've just been shying away from
actually coding it up because I'd like to see it make a difference for
an actual assigned device. That is, reproducing the current (statically
kludged) behavior wouldn't be hard, but I prefer not to write a new
patch until I can test it both ways. UC is broken and WB works for
virtual devices, fine; now let's see if the exact reverse holds for
assigned devices in reality.

... Testing of which will require someone to send a PCI card (NIC or
GPU) -- with an AARCH64 UEFI driver oprom on it -- to my place, so that
I can plug into my Mustang. ;)

Thanks
Laszlo

[1]
https://github.com/tianocore/edk2/blob/99d9ade85aad554a0fa08fff8586b0fd40570ac3/ArmPlatformPkg/ArmVirtualizationPkg/Library/PlatformIntelBdsLib/IntelBdsPlatform.c#L366

___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

Re: [RFC/RFT PATCH v2 0/3] KVM: Introduce KVM_MEM_UNCACHED

2015-05-14 Thread Paolo Bonzini



On 14/05/2015 14:00, Christoffer Dall wrote:
 So, getting back to my original question.  Is the point then that UEFI
 must assume (from ACPI/DT) the cache-coherency properties of the PCI
 controller which exists in hardware on the system you're running on,
 even for the virtual PCI bus because that will be the semantics for
 assigned devices?
 
 And in that case, we have no way to distinguish between passthrough
 devices and virtual devices plugged into the virtual PCI bus?

Well, we could use the subsystem id.  But it's a hack, and may cause
incompatibilities with some drivers.  Michael, any ideas?

 What about the idea of having two virtual PCI buses on your system where
 one is always cache-coherent and uses for virtual devices, and the other
 is whatever the hardware is and used for passthrough devices?

I think that was rejected before.

Paolo
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

Re: [Qemu-devel] [RFC/RFT PATCH v2 0/3] KVM: Introduce KVM_MEM_UNCACHED

2015-05-14 Thread Michael S. Tsirkin

On Thu, May 14, 2015 at 03:32:10PM +0200, Laszlo Ersek wrote:
 On 05/14/15 15:00, Andrew Jones wrote:
  On Thu, May 14, 2015 at 01:38:11PM +0100, Peter Maydell wrote:
  On 14 May 2015 at 13:28, Paolo Bonzini pbonz...@redhat.com wrote:
  Well, PCI BARs are generally MMIO resources, and hence should not be 
  cached.
 
  As an optimization, OS drivers can mark them as cacheable or
  write-combining or something like that, but in general it's a safe
  default to leave them uncached---one would think.
 
  Isn't this handled by the OS mapping them in the 'prefetchable'
  MMIO window rather than the 'non-prefetchable' one? (QEMU's
  generic-PCIe device doesn't yet support the prefetchable window.)
  
  I was thinking (with my limited PCI knowledge) the same thing, and
  was planning on experimenting with that.
 
 This could be supported in UEFI as well, with the following steps:
 - the DTB that QEMU provides UEFI with should advertise such a
   prefetchable window.
 - The driver in UEFI that parses the DTB should understand that DTB
   node (well, record type), and store the appropriate base  size into
   some new dynamic PCDs (= basically, firmware wide global variables;
   PCD = platform configuration database)
 - The entry point of the host bridge driver would call
   gDS-AddMemorySpace() twice, separately for the two different windows,
   with their appropriate caching attributes.
 - The host bridge driver needs to be extended so that TypePMem32
   requests are not rejected (like now); they should be handled
   similarly to TypeMem32. Except, the gDS-AllocateMemorySpace() call
   should allocate from the prefetchable range (determined by the new
   PCDs above).
 - QEMU's emulated devices should then expose their BARs as prefetchable
   (so that the above branch would be taken in the host bridge driver).
 
 (Of course, if QEMU intends to emulate PCI devices somewhat
 realistically, then QEMU should claim non-prefetchable for BARs that
 would not be prefetchable on physical hardware either, and then the
 hypervisor should accommodate the firmware's UC mapping and say hey I
 know better, we're virtual in fact, and override the attribute (- use
 WB instead of UC). With which we'd be back to square one...)
 
 Thanks
 Laszlo

Prefetcheable is unrelated to BAR caching or drivers, it's a way to tell
host bridges they can do limited tweaks to downstream transactions in a
specific range.

Really non-prefetcheable BARs are mostly those where read has
side-effects, which is best avoided. this does not mean it's ok to
reorder transactions or cache them.

-- 
MST
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

Re: [PATCH 1/5] VFIO: platform: add reset_list and register/unregister functions

2015-05-14 Thread Alex Williamson

On Thu, 2015-05-14 at 10:25 +0200, Eric Auger wrote:
 Hi Alex,
 On 05/13/2015 08:32 PM, Alex Williamson wrote:
  On Thu, 2015-05-07 at 16:27 +0200, Eric Auger wrote:
  vfio_platform_common now stores a lists of available reset functions.
  Two functions are exposed to register/unregister a reset function. A
  reset function is paired with a compat string.
 
  Signed-off-by: Eric Auger eric.au...@linaro.org
  ---
   drivers/vfio/platform/vfio_platform_common.c  | 63 
  +++
   drivers/vfio/platform/vfio_platform_private.h | 13 ++
   2 files changed, 76 insertions(+)
 
  diff --git a/drivers/vfio/platform/vfio_platform_common.c 
  b/drivers/vfio/platform/vfio_platform_common.c
  index abcff7a..edbf24c 100644
  --- a/drivers/vfio/platform/vfio_platform_common.c
  +++ b/drivers/vfio/platform/vfio_platform_common.c
  @@ -23,6 +23,9 @@
   
   #include vfio_platform_private.h
   
  +struct list_head reset_list;
  +LIST_HEAD(reset_list);
  +
  
  Redundant?  Static?
 static, yes
  
   static DEFINE_MUTEX(driver_lock);
   
   static int vfio_platform_regions_init(struct vfio_platform_device *vdev)
  @@ -511,6 +514,13 @@ EXPORT_SYMBOL_GPL(vfio_platform_probe_common);
   struct vfio_platform_device *vfio_platform_remove_common(struct device 
  *dev)
   {
 struct vfio_platform_device *vdev;
  +  struct vfio_platform_reset_node *iter, *tmp;
  +
  +  list_for_each_entry_safe(iter, tmp, reset_list, link) {
  +  list_del(iter-link);
  +  kfree(iter-compat);
  +  kfree(iter);
  +  }
  
  
  This doesn't make sense.  We allow reset functions to be registered and
  unregistered, but we forget them all when any device is released?!
 I acknowledge indeed. Can I rely on the reset module exit and associated
 unregister_reset or shall I take this action in the vfio driver itself,
 core?

If we were to load the reset modules via request_module() from
vfio-platform, I think they would stay loaded as long as vfio-platform
is loaded and automatically unload if vfio-platform is unloaded.  Our
reference via try_module_get() for an in-use reset function should
prevent the module from being unloaded while we're dependent on it.  I
think that covers everything we need.  A user is free to rmmod the reset
module, but if it's in use it shouldn't work, and we can request it
again for the next device.  

   
 vdev = vfio_del_group_dev(dev);
 if (vdev)
  @@ -519,3 +529,56 @@ struct vfio_platform_device 
  *vfio_platform_remove_common(struct device *dev)
 return vdev;
   }
   EXPORT_SYMBOL_GPL(vfio_platform_remove_common);
  +
  +int vfio_platform_register_reset(char *compat, struct module *reset_owner,
  +   vfio_platform_reset_fn_t reset)
  +{
  +  struct vfio_platform_reset_node *node, *iter;
  +  bool found = false;
  +
  +  list_for_each_entry(iter, reset_list, link) {
  +  if (!strcmp(iter-compat, compat)) {
  +  found = true;
  
  Just return errno here
 ok
  
  +  break;
  +  }
  +  }
  +  if (found)
  +  return -EINVAL;
  +
  +  node = kmalloc(sizeof(*node), GFP_KERNEL);
  +  if (!node)
  +  return -ENOMEM;
  +
  +  node-compat = kstrdup(compat, GFP_KERNEL);
  +  if (!node-compat)
  
  Leaking node
 ok
  
  +  return -ENOMEM;
  +
  +  node-owner = reset_owner;
  +  node-reset = reset;
  +
  +  list_add(node-link, reset_list);
  
  Isn't this racy?  Don't we need some locks around the list?
 I will add a lock to protect access to the list.
  
  +  return 0;
  +}
  +EXPORT_SYMBOL_GPL(vfio_platform_register_reset);
  +
  +int vfio_platform_unregister_reset(char *compat)
  +{
  +  struct vfio_platform_reset_node *iter;
  +  bool found = false;
  +
  +  list_for_each_entry(iter, reset_list, link) {
  +  if (!strcmp(iter-compat, compat)) {
  
  Return errno here
 ok
  
  +  found = true;
  +  break;
  +  }
  +  }
  +  if (!found)
  +  return -EINVAL;
  +
  +  list_del(iter-link);
  
  Racy
  
  +  kfree(iter-compat);
  +  kfree(iter);
  +  return 0;
  +}
  +EXPORT_SYMBOL_GPL(vfio_platform_unregister_reset);
  +
  diff --git a/drivers/vfio/platform/vfio_platform_private.h 
  b/drivers/vfio/platform/vfio_platform_private.h
  index 5d31e04..da2d60b 100644
  --- a/drivers/vfio/platform/vfio_platform_private.h
  +++ b/drivers/vfio/platform/vfio_platform_private.h
  @@ -69,6 +69,15 @@ struct vfio_platform_device {
 int (*get_irq)(struct vfio_platform_device *vdev, int i);
   };
   
  +typedef int (*vfio_platform_reset_fn_t)(struct vfio_platform_device 
  *vdev);
  
  Seems like this ought to be in a non-private header if we're exporting
  the [un]register functions.
 I considered the vfio reset modules were internal to the vfio subsystem
 but if you prefer I can expose that in vfio.h. I guess
 register/unregister should become an external API then?

The patch descriptions implied that it was intended for external reset

Re: [RFC/RFT PATCH v2 0/3] KVM: Introduce KVM_MEM_UNCACHED

2015-05-14 Thread Laszlo Ersek

On 05/14/15 12:30, Christoffer Dall wrote:
 On Wed, May 13, 2015 at 01:31:51PM +0200, Andrew Jones wrote:
 Introduce a new memory region flag, KVM_MEM_UNCACHED, which is
 needed by ARM. This flag informs KVM that the given memory region
 is typically mapped by the guest as non-cacheable. KVM for ARM
 then ensures that that memory is indeed mapped non-cacheable by
 the guest, and also remaps that region as non-cacheable for
 userspace, allowing them both to maintain a coherent view.

 Changes since v1:
  1) don't pin pages [Paolo]
  2) ensure the guest maps the memory non-cacheable [me]
  3) clean up memslot flag documentation [Christoffer]
 changes 1 and 2 effectively redesigned/rewrote v1. Find v1 here
 http://www.spinics.net/lists/kvm-arm/msg14022.html

 The QEMU series for v1 hasn't really changed. Only the linux
 header hack needed to bump KVM_CAP_UNCACHED_MEM from 107 to
 116.  Find the series here
 http://www.spinics.net/lists/kvm-arm/msg14026.html

 Testing:
 This series still needs lots of testing, but I thought I'd
 kick it to the list early, as there's been recent interest
 in solving this problem, and I'd like to get test results
 and opinions on this approach from others sooner than later.
 I've tested with AAVMF (UEFI for AArch64 mach-virt guests).
 AAVMF has a kludge in it to avoid the coherency problem.
 
 How does the 'kludge' work?

https://github.com/tianocore/edk2/commit/f9a8be42

(It's probably worth looking at the documentation in the first hunk too,
under the commit message.)

Thanks
Laszlo


 
 I've tested both with and without that kludge active. Both
 worked for me (almost). Sometimes with the non-kludged
 version I was still able to see a bit of corruption in
 grub's output after edk2 loaded it - not much, and not always,
 but something.
 
 Remind me, this is a VGA framebuffer corruption with a PCI-plugged VGA
 card?
 
 Thanks,
 -Christoffer
 
 Anyway, it's quite frustrating, as I'm not sure
 what I'm missing...

 This series applies to Linus' 110bc76729d4, but I tested with
 a version backported to the current RHELSA kernel.

 Thanks for reviews and testing!

 drew


 Andrew Jones (3):
   arm/arm64: pageattr: add set_memory_nc
   KVM: promote KVM_MEMSLOT_INCOHERENT to uapi
   arm/arm64: KVM: implement 'uncached' mem coherency

  Documentation/virtual/kvm/api.txt | 20 --
  arch/arm/include/asm/cacheflush.h |  1 +
  arch/arm/include/asm/kvm_mmu.h|  5 -
  arch/arm/include/asm/pgtable-3level.h |  1 +
  arch/arm/include/asm/pgtable.h|  1 +
  arch/arm/include/uapi/asm/kvm.h   |  1 +
  arch/arm/kvm/arm.c|  1 +
  arch/arm/kvm/mmu.c| 39 
 ++-
  arch/arm/mm/pageattr.c|  7 +++
  arch/arm64/include/asm/cacheflush.h   |  1 +
  arch/arm64/include/asm/kvm_mmu.h  |  5 -
  arch/arm64/include/asm/memory.h   |  1 +
  arch/arm64/include/asm/pgtable.h  |  1 +
  arch/arm64/include/uapi/asm/kvm.h |  1 +
  arch/arm64/mm/pageattr.c  |  8 +++
  include/linux/kvm_host.h  |  1 -
  include/uapi/linux/kvm.h  |  2 ++
  virt/kvm/kvm_main.c   |  7 ++-
  18 files changed, 79 insertions(+), 24 deletions(-)

 -- 
 2.1.0


___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

Re: [RFC/RFT PATCH v2 0/3] KVM: Introduce KVM_MEM_UNCACHED

On Thu, May 14, 2015 at 01:09:34PM +0200, Laszlo Ersek wrote:
 On 05/14/15 12:30, Christoffer Dall wrote:
  On Wed, May 13, 2015 at 01:31:51PM +0200, Andrew Jones wrote:
  Introduce a new memory region flag, KVM_MEM_UNCACHED, which is
  needed by ARM. This flag informs KVM that the given memory region
  is typically mapped by the guest as non-cacheable. KVM for ARM
  then ensures that that memory is indeed mapped non-cacheable by
  the guest, and also remaps that region as non-cacheable for
  userspace, allowing them both to maintain a coherent view.
 
  Changes since v1:
   1) don't pin pages [Paolo]
   2) ensure the guest maps the memory non-cacheable [me]
   3) clean up memslot flag documentation [Christoffer]
  changes 1 and 2 effectively redesigned/rewrote v1. Find v1 here
  http://www.spinics.net/lists/kvm-arm/msg14022.html
 
  The QEMU series for v1 hasn't really changed. Only the linux
  header hack needed to bump KVM_CAP_UNCACHED_MEM from 107 to
  116.  Find the series here
  http://www.spinics.net/lists/kvm-arm/msg14026.html
 
  Testing:
  This series still needs lots of testing, but I thought I'd
  kick it to the list early, as there's been recent interest
  in solving this problem, and I'd like to get test results
  and opinions on this approach from others sooner than later.
  I've tested with AAVMF (UEFI for AArch64 mach-virt guests).
  AAVMF has a kludge in it to avoid the coherency problem.
  
  How does the 'kludge' work?
 
 https://github.com/tianocore/edk2/commit/f9a8be42
 
 (It's probably worth looking at the documentation in the first hunk too,
 under the commit message.)
 
Why is this a hack/unintuitive?  Is the semantics of the QEMU PCI bus
not simply that MMIO regions are coherent?

-Christoffer
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

Re: [RFC/RFT PATCH v2 0/3] KVM: Introduce KVM_MEM_UNCACHED

2015-05-14 Thread Paolo Bonzini



On 14/05/2015 13:29, Christoffer Dall wrote:
  (It's probably worth looking at the documentation in the first hunk too,
  under the commit message.)
 
 Why is this a hack/unintuitive?  Is the semantics of the QEMU PCI bus
 not simply that MMIO regions are coherent?

Only until device assignment gets into the picture.

Paolo
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

Re: [RFC/RFT PATCH v2 0/3] KVM: Introduce KVM_MEM_UNCACHED

On Thu, May 14, 2015 at 01:31:03PM +0200, Paolo Bonzini wrote:
 
 
 On 14/05/2015 13:29, Christoffer Dall wrote:
   (It's probably worth looking at the documentation in the first hunk too,
   under the commit message.)
  
  Why is this a hack/unintuitive?  Is the semantics of the QEMU PCI bus
  not simply that MMIO regions are coherent?
 
 Only until device assignment gets into the picture.
 
Will UEFI have to deal with device assignment in any respect?

-Christoffer
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

Re: [RFC/RFT PATCH v2 0/3] KVM: Introduce KVM_MEM_UNCACHED

2015-05-14 Thread Paolo Bonzini



On 14/05/2015 13:36, Christoffer Dall wrote:
(It's probably worth looking at the documentation in the first hunk too,
under the commit message.)
   
   Why is this a hack/unintuitive?  Is the semantics of the QEMU PCI bus
   not simply that MMIO regions are coherent?
  
  Only until device assignment gets into the picture.
 
 Will UEFI have to deal with device assignment in any respect?

Why not?  For example you could do network boot from an assigned network
card.

In fact, anything that UEFI has to deal with, the OS has to deal with
too.  If you need a UEFI hack, chances are you need or will need a Linux
hack too.

Paolo
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

Re: [PATCH 2/5] VFIO: platform: add get_device callback

On 05/13/2015 08:32 PM, Alex Williamson wrote:
 On Thu, 2015-05-07 at 16:27 +0200, Eric Auger wrote:
 It is needed to introduce a new callback enabling to retrieve the
 struct device* from the vfio_platform_device. Implementation depends
 on the underlying device, platform or amba. This will be used to retrieve
 the compatibility string of the device.

 Signed-off-by: Eric Auger eric.au...@linaro.org
 ---
  drivers/vfio/platform/vfio_amba.c |  9 +
  drivers/vfio/platform/vfio_platform.c | 10 ++
  drivers/vfio/platform/vfio_platform_private.h |  1 +
  3 files changed, 20 insertions(+)

 diff --git a/drivers/vfio/platform/vfio_amba.c 
 b/drivers/vfio/platform/vfio_amba.c
 index ff0331f..fd68115 100644
 --- a/drivers/vfio/platform/vfio_amba.c
 +++ b/drivers/vfio/platform/vfio_amba.c
 @@ -48,6 +48,14 @@ static int get_amba_irq(struct vfio_platform_device 
 *vdev, int i)
  return ret ? ret : -ENXIO;
  }
  
 +static struct device *vfio_amba_get_device(struct vfio_platform_device 
 *vdev)
 +{
 +struct amba_device *pdev = (struct amba_device *) vdev-opaque;
 +struct device *dev = pdev-dev;
 +
 +return dev;
 +}
 
 *get* device implies a increment in the reference count.  There's
 nothing like that happening here.  Do you just want a to_device wrapper?
Yes you already told that in the past and I should have learnt ...
 Why doesn't struct vfio_platform_device just cache a pointer to the
 struct device?
simpler indeed if agreed.
 
 +
  static int vfio_amba_probe(struct amba_device *adev, const struct amba_id 
 *id)
  {
  struct vfio_platform_device *vdev;
 @@ -67,6 +75,7 @@ static int vfio_amba_probe(struct amba_device *adev, const 
 struct amba_id *id)
  vdev-flags = VFIO_DEVICE_FLAGS_AMBA;
  vdev-get_resource = get_amba_resource;
  vdev-get_irq = get_amba_irq;
 +vdev-get_device = vfio_amba_get_device;
  
  ret = vfio_platform_probe_common(vdev, adev-dev);
  if (ret) {
 diff --git a/drivers/vfio/platform/vfio_platform.c 
 b/drivers/vfio/platform/vfio_platform.c
 index cef645c..c025d76 100644
 --- a/drivers/vfio/platform/vfio_platform.c
 +++ b/drivers/vfio/platform/vfio_platform.c
 @@ -51,6 +51,15 @@ static int get_platform_irq(struct vfio_platform_device 
 *vdev, int i)
  return platform_get_irq(pdev, i);
  }
  
 +static struct device *vfio_platform_get_device(
 +struct vfio_platform_device *vdev)
 +{
 +struct platform_device *pdev = (struct platform_device *) vdev-opaque;
 +struct device *dev = pdev-dev;
 +
 +return dev;
 +}
 +
  static int vfio_platform_probe(struct platform_device *pdev)
  {
  struct vfio_platform_device *vdev;
 @@ -65,6 +74,7 @@ static int vfio_platform_probe(struct platform_device 
 *pdev)
  vdev-flags = VFIO_DEVICE_FLAGS_PLATFORM;
  vdev-get_resource = get_platform_resource;
  vdev-get_irq = get_platform_irq;
 +vdev-get_device = vfio_platform_get_device;
  
  ret = vfio_platform_probe_common(vdev, pdev-dev);
  if (ret)
 diff --git a/drivers/vfio/platform/vfio_platform_private.h 
 b/drivers/vfio/platform/vfio_platform_private.h
 index da2d60b..68909a4 100644
 --- a/drivers/vfio/platform/vfio_platform_private.h
 +++ b/drivers/vfio/platform/vfio_platform_private.h
 @@ -67,6 +67,7 @@ struct vfio_platform_device {
  struct resource*
  (*get_resource)(struct vfio_platform_device *vdev, int i);
  int (*get_irq)(struct vfio_platform_device *vdev, int i);
 +struct device *(*get_device)(struct vfio_platform_device *vdev);
  };
  
  typedef int (*vfio_platform_reset_fn_t)(struct vfio_platform_device *vdev);
 
 
 

___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

Re: [PATCH 5/5] VFIO: platform: VFIO platform Calxeda xgmac reset module

Alex,
On 05/13/2015 08:33 PM, Alex Williamson wrote:
 On Thu, 2015-05-07 at 16:27 +0200, Eric Auger wrote:
 This patch introduces a module that registers and implements a basic
 reset function for the Calxeda xgmac device. This latter basically disables
 interrupts and stops DMA transfers.

 The reset function code is inherited from the native calxeda xgmac driver.

 Signed-off-by: Eric Auger eric.au...@linaro.org
 ---
  drivers/vfio/platform/Kconfig  |   2 +
  drivers/vfio/platform/Makefile |   2 +
  drivers/vfio/platform/reset/Kconfig|   7 ++
  drivers/vfio/platform/reset/Makefile   |   5 +
  .../platform/reset/vfio_platform_calxedaxgmac.c| 121 
 +
  5 files changed, 137 insertions(+)
  create mode 100644 drivers/vfio/platform/reset/Kconfig
  create mode 100644 drivers/vfio/platform/reset/Makefile
  create mode 100644 drivers/vfio/platform/reset/vfio_platform_calxedaxgmac.c

 diff --git a/drivers/vfio/platform/Kconfig b/drivers/vfio/platform/Kconfig
 index 9a4403e..1df7477 100644
 --- a/drivers/vfio/platform/Kconfig
 +++ b/drivers/vfio/platform/Kconfig
 @@ -18,3 +18,5 @@ config VFIO_AMBA
framework.
  
If you don't know what to do here, say N.
 +
 +source drivers/vfio/platform/reset/Kconfig
 diff --git a/drivers/vfio/platform/Makefile b/drivers/vfio/platform/Makefile
 index 81de144..9ce8afe 100644
 --- a/drivers/vfio/platform/Makefile
 +++ b/drivers/vfio/platform/Makefile
 @@ -2,7 +2,9 @@
  vfio-platform-y := vfio_platform.o vfio_platform_common.o 
 vfio_platform_irq.o
  
  obj-$(CONFIG_VFIO_PLATFORM) += vfio-platform.o
 +obj-$(CONFIG_VFIO_PLATFORM) += reset/
  
  vfio-amba-y := vfio_amba.o
  
  obj-$(CONFIG_VFIO_AMBA) += vfio-amba.o
 +obj-$(CONFIG_VFIO_AMBA) += reset/
 diff --git a/drivers/vfio/platform/reset/Kconfig 
 b/drivers/vfio/platform/reset/Kconfig
 new file mode 100644
 index 000..746b96b
 --- /dev/null
 +++ b/drivers/vfio/platform/reset/Kconfig
 @@ -0,0 +1,7 @@
 +config VFIO_PLATFORM_CALXEDAXGMAC_RESET
 +tristate VFIO support for calxeda xgmac reset
 +depends on VFIO_PLATFORM
 +help
 +  Enables the VFIO platform driver to handle reset for Calxeda xgmac
 +
 +  If you don't know what to do here, say N.
 diff --git a/drivers/vfio/platform/reset/Makefile 
 b/drivers/vfio/platform/reset/Makefile
 new file mode 100644
 index 000..2a486af
 --- /dev/null
 +++ b/drivers/vfio/platform/reset/Makefile
 @@ -0,0 +1,5 @@
 +vfio-platform-calxedaxgmac-y := vfio_platform_calxedaxgmac.o
 +
 +ccflags-y += -Idrivers/vfio/platform
 +
 +obj-$(CONFIG_VFIO_PLATFORM_CALXEDAXGMAC_RESET) += 
 vfio-platform-calxedaxgmac.o
 diff --git a/drivers/vfio/platform/reset/vfio_platform_calxedaxgmac.c 
 b/drivers/vfio/platform/reset/vfio_platform_calxedaxgmac.c
 new file mode 100644
 index 000..3c6e428
 --- /dev/null
 +++ b/drivers/vfio/platform/reset/vfio_platform_calxedaxgmac.c
 @@ -0,0 +1,121 @@
 +/*
 + * VFIO platform driver specialized for Calxeda xgmac reset
 + * reset code is inherited from calxeda xgmac native driver
 + *
 + * Copyright 2010-2011 Calxeda, Inc.
 + * Copyright (c) 2015 Linaro Ltd.
 + *  www.linaro.org
 + *
 + * This program is free software; you can redistribute it and/or modify it
 + * under the terms and conditions of the GNU General Public License,
 + * version 2, as published by the Free Software Foundation.
 + *
 + * This program is distributed in the hope it will be useful, but WITHOUT
 + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
 + * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
 + * more details.
 + *
 + * You should have received a copy of the GNU General Public License along 
 with
 + * this program.  If not, see http://www.gnu.org/licenses/.
 + */
 +
 +#include linux/module.h
 +#include linux/kernel.h
 +#include linux/init.h
 +#include linux/io.h
 +
 +#include vfio_platform_private.h
 +
 +#define DRIVER_VERSION  0.1
 +#define DRIVER_AUTHOR   Eric Auger eric.au...@linaro.org
 +#define DRIVER_DESC Reset support for Calxeda xgmac vfio platform 
 device
 +
 +#define CALXEDAXGMAC_COMPAT calxeda,hb-xgmac
 +
 +/* XGMAC Register definitions */
 +#define XGMAC_CONTROL   0x  /* MAC Configuration */
 +
 +/* DMA Control and Status Registers */
 +#define XGMAC_DMA_CONTROL   0x0f18  /* Ctrl (Operational Mode) 
 */
 +#define XGMAC_DMA_INTR_ENA  0x0f1c  /* Interrupt Enable */
 +
 +/* DMA Control registe defines */
 +#define DMA_CONTROL_ST  0x2000  /* Start/Stop Transmission 
 */
 +#define DMA_CONTROL_SR  0x0002  /* Start/Stop Receive */
 +
 +/* Common MAC defines */
 +#define MAC_ENABLE_TX   0x0008  /* Transmitter Enable */
 +#define MAC_ENABLE_RX   0x0004  /* Receiver Enable */
 +
 +static inline void xgmac_mac_disable(void __iomem *ioaddr)
 +{
 +u32 value = readl(ioaddr + XGMAC_DMA_CONTROL);

Re: [PATCH 1/5] VFIO: platform: add reset_list and register/unregister functions

Hi Alex,
On 05/13/2015 08:32 PM, Alex Williamson wrote:
 On Thu, 2015-05-07 at 16:27 +0200, Eric Auger wrote:
 vfio_platform_common now stores a lists of available reset functions.
 Two functions are exposed to register/unregister a reset function. A
 reset function is paired with a compat string.

 Signed-off-by: Eric Auger eric.au...@linaro.org
 ---
  drivers/vfio/platform/vfio_platform_common.c  | 63 
 +++
  drivers/vfio/platform/vfio_platform_private.h | 13 ++
  2 files changed, 76 insertions(+)

 diff --git a/drivers/vfio/platform/vfio_platform_common.c 
 b/drivers/vfio/platform/vfio_platform_common.c
 index abcff7a..edbf24c 100644
 --- a/drivers/vfio/platform/vfio_platform_common.c
 +++ b/drivers/vfio/platform/vfio_platform_common.c
 @@ -23,6 +23,9 @@
  
  #include vfio_platform_private.h
  
 +struct list_head reset_list;
 +LIST_HEAD(reset_list);
 +
 
 Redundant?  Static?
static, yes
 
  static DEFINE_MUTEX(driver_lock);
  
  static int vfio_platform_regions_init(struct vfio_platform_device *vdev)
 @@ -511,6 +514,13 @@ EXPORT_SYMBOL_GPL(vfio_platform_probe_common);
  struct vfio_platform_device *vfio_platform_remove_common(struct device *dev)
  {
  struct vfio_platform_device *vdev;
 +struct vfio_platform_reset_node *iter, *tmp;
 +
 +list_for_each_entry_safe(iter, tmp, reset_list, link) {
 +list_del(iter-link);
 +kfree(iter-compat);
 +kfree(iter);
 +}
 
 
 This doesn't make sense.  We allow reset functions to be registered and
 unregistered, but we forget them all when any device is released?!
I acknowledge indeed. Can I rely on the reset module exit and associated
unregister_reset or shall I take this action in the vfio driver itself,
core?
 
  
  vdev = vfio_del_group_dev(dev);
  if (vdev)
 @@ -519,3 +529,56 @@ struct vfio_platform_device 
 *vfio_platform_remove_common(struct device *dev)
  return vdev;
  }
  EXPORT_SYMBOL_GPL(vfio_platform_remove_common);
 +
 +int vfio_platform_register_reset(char *compat, struct module *reset_owner,
 + vfio_platform_reset_fn_t reset)
 +{
 +struct vfio_platform_reset_node *node, *iter;
 +bool found = false;
 +
 +list_for_each_entry(iter, reset_list, link) {
 +if (!strcmp(iter-compat, compat)) {
 +found = true;
 
 Just return errno here
ok
 
 +break;
 +}
 +}
 +if (found)
 +return -EINVAL;
 +
 +node = kmalloc(sizeof(*node), GFP_KERNEL);
 +if (!node)
 +return -ENOMEM;
 +
 +node-compat = kstrdup(compat, GFP_KERNEL);
 +if (!node-compat)
 
 Leaking node
ok
 
 +return -ENOMEM;
 +
 +node-owner = reset_owner;
 +node-reset = reset;
 +
 +list_add(node-link, reset_list);
 
 Isn't this racy?  Don't we need some locks around the list?
I will add a lock to protect access to the list.
 
 +return 0;
 +}
 +EXPORT_SYMBOL_GPL(vfio_platform_register_reset);
 +
 +int vfio_platform_unregister_reset(char *compat)
 +{
 +struct vfio_platform_reset_node *iter;
 +bool found = false;
 +
 +list_for_each_entry(iter, reset_list, link) {
 +if (!strcmp(iter-compat, compat)) {
 
 Return errno here
ok
 
 +found = true;
 +break;
 +}
 +}
 +if (!found)
 +return -EINVAL;
 +
 +list_del(iter-link);
 
 Racy
 
 +kfree(iter-compat);
 +kfree(iter);
 +return 0;
 +}
 +EXPORT_SYMBOL_GPL(vfio_platform_unregister_reset);
 +
 diff --git a/drivers/vfio/platform/vfio_platform_private.h 
 b/drivers/vfio/platform/vfio_platform_private.h
 index 5d31e04..da2d60b 100644
 --- a/drivers/vfio/platform/vfio_platform_private.h
 +++ b/drivers/vfio/platform/vfio_platform_private.h
 @@ -69,6 +69,15 @@ struct vfio_platform_device {
  int (*get_irq)(struct vfio_platform_device *vdev, int i);
  };
  
 +typedef int (*vfio_platform_reset_fn_t)(struct vfio_platform_device *vdev);
 
 Seems like this ought to be in a non-private header if we're exporting
 the [un]register functions.
I considered the vfio reset modules were internal to the vfio subsystem
but if you prefer I can expose that in vfio.h. I guess
register/unregister should become an external API then?

Thanks

Eric
 +
 +struct vfio_platform_reset_node {
 +struct list_head link;
 +char *compat;
 +struct module *owner;
 +vfio_platform_reset_fn_t reset;
 +};
 +
  extern int vfio_platform_probe_common(struct vfio_platform_device *vdev,
struct device *dev);
  extern struct vfio_platform_device *vfio_platform_remove_common
 @@ -82,4 +91,8 @@ extern int vfio_platform_set_irqs_ioctl(struct 
 vfio_platform_device *vdev,
  unsigned start, unsigned count,
  void *data);
  
 +extern int vfio_platform_register_reset(char *compat, struct module *owner,
 +

Re: [PATCH 3/5] VFIO: platform: add reset callback

On 05/13/2015 08:32 PM, Alex Williamson wrote:
 On Thu, 2015-05-07 at 16:27 +0200, Eric Auger wrote:
 A new reset callback is introduced. If this callback is populated,
 the reset is invoked on device release or upon userspace ioctl. The
 modality is exposed on VFIO_DEVICE_GET_INFO.

 Signed-off-by: Eric Auger eric.au...@linaro.org

 ---

 v2 - v3:
 - change patch title and commit message. Use IS_ERR_OR_NULL to
   anticipate for getter returned value.

 v1 - v2:
 - On VFIO_DEVICE_RESET returns -EINVAL in case the callback is not
   populated (instead of -ENOTTY)
 ---
  drivers/vfio/platform/vfio_platform_common.c  | 12 ++--
  drivers/vfio/platform/vfio_platform_private.h |  1 +
  2 files changed, 11 insertions(+), 2 deletions(-)

 diff --git a/drivers/vfio/platform/vfio_platform_common.c 
 b/drivers/vfio/platform/vfio_platform_common.c
 index edbf24c..0d10018 100644
 --- a/drivers/vfio/platform/vfio_platform_common.c
 +++ b/drivers/vfio/platform/vfio_platform_common.c
 @@ -103,6 +103,8 @@ static void vfio_platform_release(void *device_data)
  mutex_lock(driver_lock);
  
  if (!(--vdev-refcnt)) {
 +if (!IS_ERR_OR_NULL(vdev-reset))
 +vdev-reset(vdev);
 
 
 It seems sloppy to have a function pointer that could be ERR_PTR.
The reset lookup function currently can return errors. I will change this.
 
 Should we also be trying to reset on device open?
may be yes. I was influenced by qemu use case where reset is also
explicitly invoked through iotcl by the reset notifier. If we do so we
might have duplicated resets.

- Eric
 
  vfio_platform_regions_cleanup(vdev);
  vfio_platform_irq_cleanup(vdev);
  }
 @@ -162,6 +164,8 @@ static long vfio_platform_ioctl(void *device_data,
  if (info.argsz  minsz)
  return -EINVAL;
  
 +if (!IS_ERR_OR_NULL(vdev-reset))
 +vdev-flags |= VFIO_DEVICE_FLAGS_RESET;
  info.flags = vdev-flags;
  info.num_regions = vdev-num_regions;
  info.num_irqs = vdev-num_irqs;
 @@ -255,8 +259,12 @@ static long vfio_platform_ioctl(void *device_data,
  
  return ret;
  
 -} else if (cmd == VFIO_DEVICE_RESET)
 -return -EINVAL;
 +} else if (cmd == VFIO_DEVICE_RESET) {
 +if (!IS_ERR_OR_NULL(vdev-reset))
 +return vdev-reset(vdev);
 +else
 +return -EINVAL;
 +}
  
  return -ENOTTY;
  }
 diff --git a/drivers/vfio/platform/vfio_platform_private.h 
 b/drivers/vfio/platform/vfio_platform_private.h
 index 68909a4..84ccf6d 100644
 --- a/drivers/vfio/platform/vfio_platform_private.h
 +++ b/drivers/vfio/platform/vfio_platform_private.h
 @@ -68,6 +68,7 @@ struct vfio_platform_device {
  (*get_resource)(struct vfio_platform_device *vdev, int i);
  int (*get_irq)(struct vfio_platform_device *vdev, int i);
  struct device *(*get_device)(struct vfio_platform_device *vdev);
 +int (*reset)(struct vfio_platform_device *vdev);
  };
  
  typedef int (*vfio_platform_reset_fn_t)(struct vfio_platform_device *vdev);
 
 
 

___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

Re: [PATCH 4/5] VFIO: platform: populate reset function according to compat

On 05/13/2015 08:33 PM, Alex Williamson wrote:
 On Thu, 2015-05-07 at 16:27 +0200, Eric Auger wrote:
 Add the reset function lookup according to the device compat
 string. This lookup is added at different places:
 - on VFIO_DEVICE_GET_INFO
 - on VFIO_DEVICE_RESET
 - on device release

 A reference to the module implementing the reset function is taken
 on first reset function lookup and released on vfio platform device
 release.

 Signed-off-by: Eric Auger eric.au...@linaro.org
 ---
  drivers/vfio/platform/vfio_platform_common.c | 50 
 
  1 file changed, 50 insertions(+)

 diff --git a/drivers/vfio/platform/vfio_platform_common.c 
 b/drivers/vfio/platform/vfio_platform_common.c
 index 0d10018..bd7e44c 100644
 --- a/drivers/vfio/platform/vfio_platform_common.c
 +++ b/drivers/vfio/platform/vfio_platform_common.c
 @@ -28,6 +28,52 @@ LIST_HEAD(reset_list);
  
  static DEFINE_MUTEX(driver_lock);
  
 +static vfio_platform_reset_fn_t vfio_platform_lookup_reset(const char 
 *compat)
 +{
 +struct vfio_platform_reset_node *iter;
 +
 +list_for_each_entry(iter, reset_list, link) {
 
 Racy
ok
 
 +if (!strcmp(iter-compat, compat) 
 +try_module_get(iter-owner))
 +return iter-reset;
 +}
 +
 +return ERR_PTR(-ENODEV);
 
 return NULL imo
ok
 
 +}
 +
 +static vfio_platform_reset_fn_t vfio_platform_get_reset(
 +struct vfio_platform_device *vdev)
 +{
 +struct device *dev = vdev-get_device(vdev);
 +const char *compat_str_array[2];
 +vfio_platform_reset_fn_t reset;
 +int ret;
 +
 +if (!IS_ERR_OR_NULL(vdev-reset))
 +return vdev-reset;
 +
 +ret = device_property_read_string_array(dev, compatible,
 +compat_str_array, 2);
 +if (!ret)
 +return ERR_PTR(ret);
 +
 +reset = vfio_platform_lookup_reset(compat_str_array[0]);
 +return reset;
 
 Something got allocated into compat_str_array and gets leaked here.
is there any allocation? device_property_read_string_array does not
return -ENOMEM.

 
 +}
 +
 +static void vfio_platform_put_reset(struct vfio_platform_device *vdev)
 +{
 +struct vfio_platform_reset_node *iter;
 +
 +list_for_each_entry(iter, reset_list, link) {
 
 Racy
ok
 
 +if (iter-reset == vdev-reset) {
 +module_put(iter-owner);
 +vdev-reset = NULL;
 +return;
 +}
 +}
 +}
 +
  static int vfio_platform_regions_init(struct vfio_platform_device *vdev)
  {
  int cnt = 0, i;
 @@ -103,10 +149,12 @@ static void vfio_platform_release(void *device_data)
  mutex_lock(driver_lock);
  
  if (!(--vdev-refcnt)) {
 +vdev-reset = vfio_platform_get_reset(vdev);
  if (!IS_ERR_OR_NULL(vdev-reset))
  vdev-reset(vdev);
  vfio_platform_regions_cleanup(vdev);
  vfio_platform_irq_cleanup(vdev);
 +vfio_platform_put_reset(vdev);
  }
  
  mutex_unlock(driver_lock);
 @@ -164,6 +212,7 @@ static long vfio_platform_ioctl(void *device_data,
  if (info.argsz  minsz)
  return -EINVAL;
  
 +vdev-reset = vfio_platform_get_reset(vdev);
  if (!IS_ERR_OR_NULL(vdev-reset))
  vdev-flags |= VFIO_DEVICE_FLAGS_RESET;
  info.flags = vdev-flags;
 @@ -260,6 +309,7 @@ static long vfio_platform_ioctl(void *device_data,
  return ret;
  
  } else if (cmd == VFIO_DEVICE_RESET) {
 +vdev-reset = vfio_platform_get_reset(vdev);
  if (!IS_ERR_OR_NULL(vdev-reset))
  return vdev-reset(vdev);
  else
 
 I count 3 gets and 1 put, isn't the module reference count increase
 showing that?

vfio_platform_get_reset simply returns if the function pointer already
is populated so there is no systematic ref increment.

  This looks like it hasn't been tested.

It did testing with external and in-kernel modules through
  Why would we do a
 get every time we want to do a reset?

My doubt were about the order of probing between the
vfio-platform_driver and the vfio reset module? This question was the
rationale of this implementation choice. But again the actual ref count
increment is devised to be done once on the first entry point (iotcl or
internal release)

  Do one get when the device is
 registered or opened and one put when the device is unregistered or
 closed.  We don't want erratic userspace behavior that the reset
 property of a device can come and go.

 

___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

Re: [RFC/RFT PATCH v2 0/3] KVM: Introduce KVM_MEM_UNCACHED

On Wed, May 13, 2015 at 01:31:51PM +0200, Andrew Jones wrote:
 Introduce a new memory region flag, KVM_MEM_UNCACHED, which is
 needed by ARM. This flag informs KVM that the given memory region
 is typically mapped by the guest as non-cacheable. KVM for ARM
 then ensures that that memory is indeed mapped non-cacheable by
 the guest, and also remaps that region as non-cacheable for
 userspace, allowing them both to maintain a coherent view.
 
 Changes since v1:
  1) don't pin pages [Paolo]
  2) ensure the guest maps the memory non-cacheable [me]
  3) clean up memslot flag documentation [Christoffer]

Forgot to (4): switch from setting userspace's mapping to
device memory to normal, non-cacheable. Using device memory
caused a problem that Alex Graf found, and Peter Maydell suggested
using normal, non-cacheable instead.


 changes 1 and 2 effectively redesigned/rewrote v1. Find v1 here
 http://www.spinics.net/lists/kvm-arm/msg14022.html
 
 The QEMU series for v1 hasn't really changed. Only the linux
 header hack needed to bump KVM_CAP_UNCACHED_MEM from 107 to
 116.  Find the series here
 http://www.spinics.net/lists/kvm-arm/msg14026.html
 
 Testing:
 This series still needs lots of testing, but I thought I'd
 kick it to the list early, as there's been recent interest
 in solving this problem, and I'd like to get test results
 and opinions on this approach from others sooner than later.
 I've tested with AAVMF (UEFI for AArch64 mach-virt guests).
 AAVMF has a kludge in it to avoid the coherency problem.
 I've tested both with and without that kludge active. Both
 worked for me (almost). Sometimes with the non-kludged
 version I was still able to see a bit of corruption in
 grub's output after edk2 loaded it - not much, and not always,
 but something. Anyway, it's quite frustrating, as I'm not sure
 what I'm missing...
 
 This series applies to Linus' 110bc76729d4, but I tested with
 a version backported to the current RHELSA kernel.
 
 Thanks for reviews and testing!
 
 drew
 
 
 Andrew Jones (3):
   arm/arm64: pageattr: add set_memory_nc
   KVM: promote KVM_MEMSLOT_INCOHERENT to uapi
   arm/arm64: KVM: implement 'uncached' mem coherency
 
  Documentation/virtual/kvm/api.txt | 20 --
  arch/arm/include/asm/cacheflush.h |  1 +
  arch/arm/include/asm/kvm_mmu.h|  5 -
  arch/arm/include/asm/pgtable-3level.h |  1 +
  arch/arm/include/asm/pgtable.h|  1 +
  arch/arm/include/uapi/asm/kvm.h   |  1 +
  arch/arm/kvm/arm.c|  1 +
  arch/arm/kvm/mmu.c| 39 
 ++-
  arch/arm/mm/pageattr.c|  7 +++
  arch/arm64/include/asm/cacheflush.h   |  1 +
  arch/arm64/include/asm/kvm_mmu.h  |  5 -
  arch/arm64/include/asm/memory.h   |  1 +
  arch/arm64/include/asm/pgtable.h  |  1 +
  arch/arm64/include/uapi/asm/kvm.h |  1 +
  arch/arm64/mm/pageattr.c  |  8 +++
  include/linux/kvm_host.h  |  1 -
  include/uapi/linux/kvm.h  |  2 ++
  virt/kvm/kvm_main.c   |  7 ++-
  18 files changed, 79 insertions(+), 24 deletions(-)
 
 -- 
 2.1.0
 
 ___
 kvmarm mailing list
 kvmarm@lists.cs.columbia.edu
 https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

Re: [RFC/RFT PATCH v2 0/3] KVM: Introduce KVM_MEM_UNCACHED

2015-05-14 Thread Peter Maydell

On 14 May 2015 at 11:31, Andrew Jones drjo...@redhat.com wrote:
 Forgot to (4): switch from setting userspace's mapping to
 device memory to normal, non-cacheable. Using device memory
 caused a problem that Alex Graf found, and Peter Maydell suggested
 using normal, non-cacheable instead.

Did you check that non-cacheable is definitely the correct
kind of Normal memory attribute we want? (ie not write-through).

-- PMM
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

Re: [RFC/RFT PATCH v2 3/3] arm/arm64: KVM: implement 'uncached' mem coherency

On Wed, May 13, 2015 at 01:31:54PM +0200, Andrew Jones wrote:
 When S1 and S2 memory attributes combine wrt to caching policy,
 non-cacheable types take precedence. If a guest maps a region as
 device memory, which KVM userspace is using to emulate the device
 using normal, cacheable memory, then we lose coherency. With
 KVM_MEM_UNCACHED, KVM userspace can now hint to KVM which memory
 regions are likely to be problematic. With this patch, as pages
 of these types of regions are faulted into the guest, not only do
 we flush the page's dcache, but we also change userspace's
 mapping to NC in order to maintain coherency.
 
 What if the guest doesn't do what we expect? While we can't
 force a guest to use cacheable memory, we can take advantage of
 the non-cacheable precedence, and force it to use non-cacheable.
 So, this patch also introduces PAGE_S2_NORMAL_NC, and uses it on
 KVM_MEM_UNCACHED regions to force them to NC.
 
 We now have both guest and userspace on the same page (pun intended)

I'd like to revisit the overall approach here.  Is doing non-cached
accesses in both the guest and host really the right thing to do here?

The semantics of the device becomes that it is cache coherent (because
QEMU is), and I think Marc argued that Linux/UEFI should simply be
adapted to handle whatever emulated devices we have as coherent.  I also
remember someone arguing that would be wrong (Peter?).

Finally, does this address all cache coherency issues with emulated
devices?  Some VOS guys had seen things still not working with this
approach, unsure why...  I'd like to avoid us merging this only to merge
a more complete solution in a few weeks which reverts this solution...

More comments/questions below:

 
 Signed-off-by: Andrew Jones drjo...@redhat.com
 ---
  arch/arm/include/asm/kvm_mmu.h|  5 -
  arch/arm/include/asm/pgtable-3level.h |  1 +
  arch/arm/include/asm/pgtable.h|  1 +
  arch/arm/kvm/mmu.c| 37 
 +++
  arch/arm64/include/asm/kvm_mmu.h  |  5 -
  arch/arm64/include/asm/memory.h   |  1 +
  arch/arm64/include/asm/pgtable.h  |  1 +
  7 files changed, 36 insertions(+), 15 deletions(-)
 
 diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h
 index 405aa18833073..e8034a80b12e5 100644
 --- a/arch/arm/include/asm/kvm_mmu.h
 +++ b/arch/arm/include/asm/kvm_mmu.h
 @@ -214,8 +214,11 @@ static inline void __coherent_cache_guest_page(struct 
 kvm_vcpu *vcpu, pfn_t pfn,
   while (size) {
   void *va = kmap_atomic_pfn(pfn);
  
 - if (need_flush)
 + if (need_flush) {
   kvm_flush_dcache_to_poc(va, PAGE_SIZE);
 + if (ipa_uncached)
 + set_memory_nc((unsigned long)va, 1);

nit: consider moving this outside the need_flush

 + }
  
   if (icache_is_pipt())
   __cpuc_coherent_user_range((unsigned long)va,
 diff --git a/arch/arm/include/asm/pgtable-3level.h 
 b/arch/arm/include/asm/pgtable-3level.h
 index a745a2a53853c..39b3f7a40e663 100644
 --- a/arch/arm/include/asm/pgtable-3level.h
 +++ b/arch/arm/include/asm/pgtable-3level.h
 @@ -121,6 +121,7 @@
   * 2nd stage PTE definitions for LPAE.
   */
  #define L_PTE_S2_MT_UNCACHED (_AT(pteval_t, 0x0)  2) /* strongly 
 ordered */
 +#define L_PTE_S2_MT_NORMAL_NC(_AT(pteval_t, 0x5)  2) /* 
 normal non-cacheable */
  #define L_PTE_S2_MT_WRITETHROUGH (_AT(pteval_t, 0xa)  2) /* normal 
 inner write-through */
  #define L_PTE_S2_MT_WRITEBACK(_AT(pteval_t, 0xf)  2) /* 
 normal inner write-back */
  #define L_PTE_S2_MT_DEV_SHARED   (_AT(pteval_t, 0x1)  2) /* 
 device */
 diff --git a/arch/arm/include/asm/pgtable.h b/arch/arm/include/asm/pgtable.h
 index f40354198bad4..ae13ca8b0a23d 100644
 --- a/arch/arm/include/asm/pgtable.h
 +++ b/arch/arm/include/asm/pgtable.h
 @@ -100,6 +100,7 @@ extern pgprot_t   pgprot_s2_device;
  #define PAGE_HYP _MOD_PROT(pgprot_kernel, L_PTE_HYP)
  #define PAGE_HYP_DEVICE  _MOD_PROT(pgprot_hyp_device, L_PTE_HYP)
  #define PAGE_S2  _MOD_PROT(pgprot_s2, L_PTE_S2_RDONLY)
 +#define PAGE_S2_NORMAL_NC__pgprot((pgprot_val(PAGE_S2)  
 ~L_PTE_S2_MT_MASK) | L_PTE_S2_MT_NORMAL_NC)
  #define PAGE_S2_DEVICE   _MOD_PROT(pgprot_s2_device, 
 L_PTE_S2_RDONLY)
  
  #define __PAGE_NONE  __pgprot(_L_PTE_DEFAULT | L_PTE_RDONLY | 
 L_PTE_XN | L_PTE_NONE)
 diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
 index bc1665acd73e7..6b3bd8061bd2a 100644
 --- a/arch/arm/kvm/mmu.c
 +++ b/arch/arm/kvm/mmu.c
 @@ -1220,7 +1220,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, 
 phys_addr_t fault_ipa,
   struct vm_area_struct *vma;
   pfn_t pfn;
   pgprot_t mem_type = PAGE_S2;
 - bool fault_ipa_uncached;
 + bool fault_ipa_uncached = false;
   bool logging_active =

Re: [RFC/RFT PATCH v2 1/3] arm/arm64: pageattr: add set_memory_nc