Re: [PATCH v9 3/7] acpi: apei: Add SEI notification type support for ARMv8

2018-04-11 Thread gengdongjiu
Dear James,
Thanks for this mail and sorry for my late response.


2018-02-16 1:55 GMT+08:00 James Morse :
> Hi gengdongjiu, liu jun
>
> On 05/02/18 11:24, gengdongjiu wrote:
[]
>>
>>> Is the emulated SError routed following the routing rules for HCR_EL2.{AMO,
>>> TGE}?
>>
>> Yes, it is.
>
> ... and yet ...
>
>
>>> What does your firmware do when it wants to emulate SError but its masked?
>>> (e.g.1: The physical-SError interrupted EL2 and the SPSR shows EL2 had
>>> PSTATE.A  set.
>>>  e.g.2: The physical-SError interrupted EL2 but HCR_EL2 indicates the
>>> emulated  SError should go to EL1. This effectively masks SError.)
>>
>> Currently we does not consider much about the mask status(SPSR).
>
> .. this is a problem.
>
> If you ignore SPSR_EL3 you may deliver an SError to EL1 when the exception
> interrupted EL2. Even if you setup the EL1 register correctly, EL1 can't eret 
> to
> EL2. This should never happen, SError is effectively masked if you are running
> at an EL higher than the one its routed to.
>
> More obviously: if the exception came from the EL that SError should be routed
> to, but PSTATE.A was set, you can't deliver SError. Masking SError is the only

James, I  summarized the masking and routing rules for SError to
confirm with you for the firmware first solution,

1. If the HCR_EL2.{AMO,TGE} is set, which means the SError should route to EL2,
When system happens SError and trap to EL3,   If EL3 find
HCR_EL2.{AMO,TGE} and SPSR_EL3.A are both set,
and find this SError come from EL2, it will not deliver an SError:
store the RAS error in the BERT and 'reboot'; but if
it find that this SError come from EL1 or EL0, it also need to deliver
an SError, right?


2. If the HCR_EL2.{AMO,TGE} is not set, which means the SError should
route to EL1,
When system happens SError and trap to EL3, If EL3 find
HCR_EL2.{AMO,TGE} and SPSR_EL3.A are both not set,
and find this SError come from EL1, it will not deliver an SError:
store the RAS error in the BERT and 'reboot'; but if
it find that this SError come from EL0, it also need to deliver an
SError, right?


> way the OS has to indicate it can't take an exception right now. VBAR_EL1 may 
> be
> 'wrong' if we're doing some power-management, the FAR/ELR/ESR/SPSR registers 
> may
> contain live values that the OS would lose if you deliver another exception 
> over
> the top.
>
> If you deliver an emulated-SError as the OS eret's, your new ELR will point at
> the eret instruction and the CPU will spin on this instruction forever.
>
> You have to honour the masking and routing rules for SError, otherwise no OS 
> can
> run safely with this firmware.
>
>
>> I remember that you ever suggested firmware should reboot if the mask status
>> is set(SPSR), right?
>
> Yes, this is my suggestion of what to do if you can't deliver an SError: store
> the RAS error in the BERT and 'reboot'.
>
>
>> I CC "liu jun"  who is our UEFI firmware Architect,
>> if you have firmware requirements, you can raise again.
>
> (UEFI? I didn't think there was any of that at EL3, but I'm not familiar with
> all the 'PI' bits).
>
> The requirement is your emulated-SError from EL3 looks exactly like a
> physical-SError as if EL3 wasn't implemented.
> Your CPU has to handle cases where it can't deliver an SError, your emulation
> has to do the same.
>
> This is not something any OS can work around.
>
>
>>> Answers to these let us determine whether a bug is in the firmware or the
>>> kernel. If firmware is expecting the OS to do something special, I'd like 
>>> to know
>>> about it from the beginning!
>>
>> I know your meaning, thanks for raising it again.
>
>
> Happy new year,
>
> James
> ___
> kvmarm mailing list
> kvmarm@lists.cs.columbia.edu
> https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH 1/4] iommu: Add virtio-iommu driver

2018-04-11 Thread Jean-Philippe Brucker
On 23/03/18 08:27, Tian, Kevin wrote:
>>> The host kernel needs to have *some* MSI region in place before the
>>> guest can start configuring interrupts, otherwise it won't know what
>>> address to give to the underlying hardware. However, as soon as the host
>>> kernel has picked a region, host userspace needs to know that it can no
>>> longer use addresses in that region for DMA-able guest memory. It's a
>>> lot easier when the address is fixed in hardware and the host userspace
>>> will never be stupid enough to try and VFIO_IOMMU_DMA_MAP it, but in
>>> the
>>> more general case where MSI writes undergo IOMMU address translation
>>> so
>>> it's an arbitrary IOVA, this has the potential to conflict with stuff
>>> like guest memory hotplug.
>>>
>>> What we currently have is just the simplest option, with the host kernel
>>> just picking something up-front and pretending to host userspace that
>>> it's a fixed hardware address. There's certainly scope for it to be a
>>> bit more dynamic in the sense of adding an interface to let userspace
>>> move it around (before attaching any devices, at least), but I don't
>>> think it's feasible for the host kernel to second-guess userspace enough
>>> to make it entirely transparent like it is in the DMA API domain case.
>>>
>>> Of course, that's all assuming the host itself is using a virtio-iommu
>>> (e.g. in a nested virt or emulation scenario). When it's purely within a
>>> guest then an MSI reservation shouldn't matter so much, since the guest
>>> won't be anywhere near the real hardware configuration anyway.
>>>
>>> Robin.
>>
>> Curious since anyway we are defining a new iommu architecture
>> is it possible to avoid those ARM-specific burden completely?
>>
> 
> OK, after some study around those tricks below is my learning:
> 
> - MSI_IOVA window is used only on request (iommu_dma_get
> _msi_page), not meant to take effect on all architectures once 
> initialized. e.g. ARM GIC does it but not x86. So it is reasonable 
> for virtio-iommu driver to implement such capability;
> 
> - I thought whether hardware MSI doorbell can be always reported
> on virtio-iommu since it's newly defined. Looks there is a problem
> if underlying IOMMU is sw-managed MSI style - valid mapping is
> expected in all level of translations, meaning guest has to manage
> stage-1 mapping in nested configuration since stage-1 is owned
> by guest. 
> 
> Then virtio-iommu is naturally expected to report the same MSI 
> model as supported by underlying hardware. Below are some
> further thoughts along this route (use 'IOMMU' to represent the
> physical one and 'virtio-iommu' for virtual one):
> 
> 
> 
> In the scope of current virtio-iommu spec v.6, there is no nested
> consideration yet. Guest driver is expected to use MAP/UNMAP
> interface on assigned endpoints. In this case the MAP requests
> (IOVA->GPA) is caught and maintained within Qemu which then 
> further talks to VFIO to map IOVA->HPA in IOMMU.
> 
> Qemu can learn the MSI model of IOMMU from sysfs.
> 
> For hardware MSI doorbell (x86 and some ARM):
> * Host kernel reports to Qemu as IOMMU_RESV_MSI
> * Qemu report to guest as VIRTIO_IOMMU_RESV_MEM_T_MSI
> * Guest takes the range as IOMMU_RESV_MSI. reserved
> * Qemu MAP database has no mapping for the doorbell
> * Physical IOMMU page table has no mapping for the doorbell
> * MSI from passthrough device bypass IOMMU
> * MSI from emulated device bypass virtio-iommu
> 
> For software MSI doorbell (most ARM):
> * Host kernel reports to Qemu as IOMMU_RESV_SW_MSI
> * Qemu report to guest as VIRTIO_IOMMU_RESV_MEM_T_RESERVED
> * Guest takes the range as IOMMU_RESV_RESERVED
> * vGIC requests to map 'GPA of the virtual doorbell'
> * a map request (IOVA->GPA) sent on endpoint
> * Qemu maintains the mapping in MAP database
>   * but no VFIO_MAP request since it's purely virtual
> * GIC requests to map 'HPA of the physical doorbell'
>   * e.g. triggered by VFIO enable msi
> * IOMMU now includes a valid mapping (IOVA->HPA)
> * MSI from emulated device go through Qemu MAP
> database (IOVA->'GPA of virtual doorbell') and then hit vGIC
> * MSI from passthrough device go through IOMMU
> (IOVA->'HPA of physical doorbell') and then hit GIC
> 
> In this case, host doorbell is treated as reserved resource in
> guest side. Guest has its own sw-management for virtual
> doorbell which is only used for emulated device. two paths 
> are completely separated.
> 
> If above captures the right flow, current v0.6 spec is complete
> regarding to required function definition.

Yes I think this summarizes well the current state or SW/HW MSI

> Then comes nested case, with two level page tables (stage-1
> and stage-2) in IOMMU. stage-1 is for IOVA->GPA and stage-2
> is for GPA->HPA. VFIO map/unmap happens on stage-2, 
> while stage-1 is directly managed by guest (and bound to
> IOMMU which enables nested translation from IOVA->GPA
> ->HPA).
> 
> For hardware MSI, there is nothing special compared to
> 

Re: [PATCH 2/4] iommu/virtio: Add probe request

2018-04-11 Thread Jean-Philippe Brucker
On 23/03/18 15:00, Robin Murphy wrote:
[...]
>> +/*
>> + * Treat unknown subtype as RESERVED, but urge users to update their
>> + * driver.
>> + */
>> +if (mem->subtype != VIRTIO_IOMMU_RESV_MEM_T_RESERVED &&
>> +mem->subtype != VIRTIO_IOMMU_RESV_MEM_T_MSI)
>> +pr_warn("unknown resv mem subtype 0x%x\n", mem->subtype);
> 
> Might as well avoid the extra comparisons by incorporating this into the 
> switch statement, i.e.:
> 
>   default:
>   dev_warn(vdev->viommu_dev->dev, ...);
>   /* Fallthrough */
>   case VIRTIO_IOMMU_RESV_MEM_T_RESERVED:
>   ...
> 
> (dev_warn is generally preferable to pr_warn when feasible)

Alright, that's nicer

[...]
>>  /*
>>   * Last step creates a default domain and attaches to it. Everything
>>   * must be ready.
>> @@ -735,7 +849,19 @@ static int viommu_add_device(struct device *dev)
>>   
>>   static void viommu_remove_device(struct device *dev)
>>   {
>> -kfree(dev->iommu_fwspec->iommu_priv);
>> +struct viommu_endpoint *vdev;
>> +struct iommu_resv_region *entry, *next;
>> +struct iommu_fwspec *fwspec = dev->iommu_fwspec;
>> +
>> +if (!fwspec || fwspec->ops != _ops)
>> +return;
> 
> Oh good :) I guess that was just a patch-splitting issue. The group 
> thing still applies, though.

Ok

Thanks,
Jean
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH 1/4] iommu: Add virtio-iommu driver

2018-04-11 Thread Jean-Philippe Brucker
On 23/03/18 14:48, Robin Murphy wrote:
[..]
>> + * Virtio driver for the paravirtualized IOMMU
>> + *
>> + * Copyright (C) 2018 ARM Limited
>> + * Author: Jean-Philippe Brucker 
>> + *
>> + * SPDX-License-Identifier: GPL-2.0
> 
> This wants to be a // comment at the very top of the file (thankfully 
> the policy is now properly documented in-tree since 
> Documentation/process/license-rules.rst got merged)

Ok

[...]
>> +
>> +/*
>> + * viommu_del_mappings - remove mappings from the internal tree
>> + *
>> + * @vdomain: the domain
>> + * @iova: start of the range
>> + * @size: size of the range. A size of 0 corresponds to the entire address
>> + *  space.
>> + * @out_mapping: if not NULL, the first removed mapping is returned in 
>> there.
>> + *  This allows the caller to reuse the buffer for the unmap request. When
>> + *  the returned size is greater than zero, if a mapping is returned, the
>> + *  caller must free it.
> 
> This "free multiple mappings except maybe hand one of them off to the 
> caller" interface is really unintuitive. AFAICS it's only used by 
> viommu_unmap() to grab mapping->req, but that doesn't seem to care about 
> mapping itself, so I wonder whether it wouldn't make more sense to just 
> have a global kmem_cache of struct virtio_iommu_req_unmap for that and 
> avoid a lot of complexity...

Well it's a small complication for what I hoped would be a meanignful
performance difference, but more below.

>> +
>> +/* IOMMU API */
>> +
>> +static bool viommu_capable(enum iommu_cap cap)
>> +{
>> +return false;
>> +}
> 
> The .capable callback is optional, so it's only worth implementing once 
> you want it to do something beyond the default behaviour.
> 

Ah, right

[...]
>> +static size_t viommu_unmap(struct iommu_domain *domain, unsigned long iova,
>> +   size_t size)
>> +{
>> +int ret = 0;
>> +size_t unmapped;
>> +struct viommu_mapping *mapping = NULL;
>> +struct viommu_domain *vdomain = to_viommu_domain(domain);
>> +
>> +unmapped = viommu_del_mappings(vdomain, iova, size, );
>> +if (unmapped < size) {
>> +ret = -EINVAL;
>> +goto out_free;
>> +}
>> +
>> +/* Device already removed all mappings after detach. */
>> +if (!vdomain->endpoints)
>> +goto out_free;
>> +
>> +if (WARN_ON(!mapping))
>> +return 0;
>> +
>> +mapping->req.unmap = (struct virtio_iommu_req_unmap) {
>> +.head.type  = VIRTIO_IOMMU_T_UNMAP,
>> +.domain = cpu_to_le32(vdomain->id),
>> +.virt_start = cpu_to_le64(iova),
>> +.virt_end   = cpu_to_le64(iova + unmapped - 1),
>> +};
> 
> ...In fact, the kmem_cache idea might be moot since it looks like with a 
> bit of restructuring you could get away with just a single per-viommu 
> virtio_iommu_req_unmap structure; this lot could be passed around on the 
> stack until request_lock is taken, at which point it would be copied 
> into the 'real' DMA-able structure. The equivalent might apply to 
> viommu_map() too - now that I'm looking at it, it seems like it would 
> take pretty minimal effort to encapsulate the whole business cleanly in 
> viommu_send_req_sync(), which could do something like this instead of 
> going through viommu_send_reqs_sync():
> 
>   ...
>   spin_lock_irqsave(>request_lock, flags);
>   viommu_copy_req(viommu->dma_req, req);
>   ret = _viommu_send_reqs_sync(viommu, viommu->dma_req, 1, );
>   spin_unlock_irqrestore(>request_lock, flags);
>   ...

I'll have to come back to that sorry, still conducting some experiments
with map/unmap.

I'd rather avoid introducing a single dma_req per viommu, because I'd
like to move to the iotlb_range_add/iotlb_sync interface as soon as
possible, and the request logic changes a lot when multiple threads are
susceptible to interleave map/unmap requests.

I ran some tests, and adding a kmem_cache (or simply using kmemdup, it
doesn't make a noticeable difference at our scale) reduces netperf
stream/maerts throughput by 1.1%/1.4% (+/- 0.5% over 30 tests). That's
for a virtio-net device (1 tx/rx vq), and with a vfio device the
difference isn't measurable. At this point I'm not fussy about such
small difference, so don't mind simplifying viommu_del_mapping.

[...]
>> +/*
>> + * Last step creates a default domain and attaches to it. Everything
>> + * must be ready.
>> + */
>> +group = iommu_group_get_for_dev(dev);
>> +if (!IS_ERR(group))
>> +iommu_group_put(group);
> 
> Since you create the sysfs IOMMU device, maybe also create the links for 
> the masters?

Ok

>> +
>> +return PTR_ERR_OR_ZERO(group);
>> +}
>> +
>> +static void viommu_remove_device(struct device *dev)
>> +{
> 
> You need to remove dev from its group, too (basically, .remove_device 
> should always undo everything .add_device did)
> 
> It would also be good practice to verify that 

Re: [PATCH] vhost: Fix vhost_copy_to_user()

2018-04-11 Thread Jason Wang



On 2018年04月11日 21:30, Eric Auger wrote:

vhost_copy_to_user is used to copy vring used elements to userspace.
We should use VHOST_ADDR_USED instead of VHOST_ADDR_DESC.

Fixes: f88949138058 ("vhost: introduce O(1) vq metadata cache")
Signed-off-by: Eric Auger 

---

This fixes a stall observed when running an aarch64 guest with
virtual smmu
---
  drivers/vhost/vhost.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index bec722e..f44aead 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -744,7 +744,7 @@ static int vhost_copy_to_user(struct vhost_virtqueue *vq, 
void __user *to,
struct iov_iter t;
void __user *uaddr = vhost_vq_meta_fetch(vq,
 (u64)(uintptr_t)to, size,
-VHOST_ADDR_DESC);
+VHOST_ADDR_USED);
  
  		if (uaddr)

return __copy_to_user(uaddr, from, size);


Acked-by: Jason Wang 

Thanks!

Stable material I think.
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH] vhost: Fix vhost_copy_to_user()

2018-04-11 Thread Michael S. Tsirkin
On Wed, Apr 11, 2018 at 03:30:38PM +0200, Eric Auger wrote:
> vhost_copy_to_user is used to copy vring used elements to userspace.
> We should use VHOST_ADDR_USED instead of VHOST_ADDR_DESC.
> 
> Fixes: f88949138058 ("vhost: introduce O(1) vq metadata cache")
> Signed-off-by: Eric Auger 

Acked-by: Michael S. Tsirkin 

> ---
> 
> This fixes a stall observed when running an aarch64 guest with
> virtual smmu
> ---
>  drivers/vhost/vhost.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> index bec722e..f44aead 100644
> --- a/drivers/vhost/vhost.c
> +++ b/drivers/vhost/vhost.c
> @@ -744,7 +744,7 @@ static int vhost_copy_to_user(struct vhost_virtqueue *vq, 
> void __user *to,
>   struct iov_iter t;
>   void __user *uaddr = vhost_vq_meta_fetch(vq,
>(u64)(uintptr_t)to, size,
> -  VHOST_ADDR_DESC);
> +  VHOST_ADDR_USED);
>  
>   if (uaddr)
>   return __copy_to_user(uaddr, from, size);
> -- 
> 2.5.5
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH] vhost: Fix vhost_copy_to_user()

2018-04-11 Thread Auger Eric
Hi Jason,

On 11/04/18 15:44, Jason Wang wrote:
> 
> 
> On 2018年04月11日 21:30, Eric Auger wrote:
>> vhost_copy_to_user is used to copy vring used elements to userspace.
>> We should use VHOST_ADDR_USED instead of VHOST_ADDR_DESC.
>>
>> Fixes: f88949138058 ("vhost: introduce O(1) vq metadata cache")
>> Signed-off-by: Eric Auger 
>>
>> ---
>>
>> This fixes a stall observed when running an aarch64 guest with
>> virtual smmu
>> ---
>>   drivers/vhost/vhost.c | 2 +-
>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
>> index bec722e..f44aead 100644
>> --- a/drivers/vhost/vhost.c
>> +++ b/drivers/vhost/vhost.c
>> @@ -744,7 +744,7 @@ static int vhost_copy_to_user(struct
>> vhost_virtqueue *vq, void __user *to,
>>   struct iov_iter t;
>>   void __user *uaddr = vhost_vq_meta_fetch(vq,
>>(u64)(uintptr_t)to, size,
>> - VHOST_ADDR_DESC);
>> + VHOST_ADDR_USED);
>> if (uaddr)
>>   return __copy_to_user(uaddr, from, size);
> 
> Acked-by: Jason Wang 
> 
> Thanks!
> 
> Stable material I think.

yes I think so.

Thanks

Eric
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


[PATCH] vhost: Fix vhost_copy_to_user()

2018-04-11 Thread Eric Auger
vhost_copy_to_user is used to copy vring used elements to userspace.
We should use VHOST_ADDR_USED instead of VHOST_ADDR_DESC.

Fixes: f88949138058 ("vhost: introduce O(1) vq metadata cache")
Signed-off-by: Eric Auger 

---

This fixes a stall observed when running an aarch64 guest with
virtual smmu
---
 drivers/vhost/vhost.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index bec722e..f44aead 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -744,7 +744,7 @@ static int vhost_copy_to_user(struct vhost_virtqueue *vq, 
void __user *to,
struct iov_iter t;
void __user *uaddr = vhost_vq_meta_fetch(vq,
 (u64)(uintptr_t)to, size,
-VHOST_ADDR_DESC);
+VHOST_ADDR_USED);
 
if (uaddr)
return __copy_to_user(uaddr, from, size);
-- 
2.5.5

___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm