Re: [PATCH v9 3/7] acpi: apei: Add SEI notification type support for ARMv8
Dear James,

Thanks for this mail, and sorry for my late response.

2018-02-16 1:55 GMT+08:00 James Morse:
> Hi gengdongjiu, liu jun
>
> On 05/02/18 11:24, gengdongjiu wrote:
[...]
>>> Is the emulated SError routed following the routing rules for
>>> HCR_EL2.{AMO, TGE}?
>>
>> Yes, it is.
>
> ... and yet ...
>
>>> What does your firmware do when it wants to emulate SError but its masked?
>>> (e.g.1: The physical-SError interrupted EL2 and the SPSR shows EL2 had
>>> PSTATE.A set.
>>> e.g.2: The physical-SError interrupted EL2 but HCR_EL2 indicates the
>>> emulated SError should go to EL1. This effectively masks SError.)
>>
>> Currently we do not consider much about the mask status (SPSR).
>
> .. this is a problem.
>
> If you ignore SPSR_EL3 you may deliver an SError to EL1 when the exception
> interrupted EL2. Even if you set up the EL1 registers correctly, EL1 can't
> eret to EL2. This should never happen; SError is effectively masked if you
> are running at an EL higher than the one it's routed to.
>
> More obviously: if the exception came from the EL that SError should be
> routed to, but PSTATE.A was set, you can't deliver SError. Masking SError
> is the only

James, I summarized the masking and routing rules for SError, to confirm them
with you for the firmware-first solution:

1. If HCR_EL2.{AMO,TGE} is set, the SError should route to EL2. When an
   SError occurs and traps to EL3: if EL3 finds HCR_EL2.{AMO,TGE} and
   SPSR_EL3.A both set, and the SError came from EL2, it will not deliver
   an SError; it stores the RAS error in the BERT and 'reboots'. But if the
   SError came from EL1 or EL0, it still needs to deliver an SError, right?

2. If HCR_EL2.{AMO,TGE} is not set, the SError should route to EL1. When an
   SError occurs and traps to EL3: if EL3 finds HCR_EL2.{AMO,TGE} not set
   and SPSR_EL3.A set, and the SError came from EL1, it will not deliver an
   SError; it stores the RAS error in the BERT and 'reboots'. But if the
   SError came from EL0, it still needs to deliver an SError, right?

> way the OS has to indicate it can't take an exception right now. VBAR_EL1
> may be 'wrong' if we're doing some power-management, the FAR/ELR/ESR/SPSR
> registers may contain live values that the OS would lose if you deliver
> another exception over the top.
>
> If you deliver an emulated-SError as the OS erets, your new ELR will point
> at the eret instruction and the CPU will spin on this instruction forever.
>
> You have to honour the masking and routing rules for SError, otherwise no
> OS can run safely with this firmware.
>
>> I remember that you suggested firmware should reboot if the mask status
>> is set (SPSR), right?
>
> Yes, this is my suggestion of what to do if you can't deliver an SError:
> store the RAS error in the BERT and 'reboot'.
>
>> I CC "liu jun", who is our UEFI firmware architect; if you have firmware
>> requirements, you can raise them again.
>
> (UEFI? I didn't think there was any of that at EL3, but I'm not familiar
> with all the 'PI' bits).
>
> The requirement is that your emulated-SError from EL3 looks exactly like a
> physical-SError, as if EL3 wasn't implemented.
> Your CPU has to handle cases where it can't deliver an SError; your
> emulation has to do the same.
>
> This is not something any OS can work around.
>
>>> Answers to these let us determine whether a bug is in the firmware or
>>> the kernel. If firmware is expecting the OS to do something special,
>>> I'd like to know about it from the beginning!
>>
>> I know your meaning, thanks for raising it again.
>
> Happy new year,
>
> James

___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
Re: [PATCH 1/4] iommu: Add virtio-iommu driver
On 23/03/18 08:27, Tian, Kevin wrote:
>>> The host kernel needs to have *some* MSI region in place before the
>>> guest can start configuring interrupts, otherwise it won't know what
>>> address to give to the underlying hardware. However, as soon as the
>>> host kernel has picked a region, host userspace needs to know that it
>>> can no longer use addresses in that region for DMA-able guest memory.
>>> It's a lot easier when the address is fixed in hardware and the host
>>> userspace will never be stupid enough to try and VFIO_IOMMU_DMA_MAP
>>> it, but in the more general case where MSI writes undergo IOMMU
>>> address translation so it's an arbitrary IOVA, this has the potential
>>> to conflict with stuff like guest memory hotplug.
>>>
>>> What we currently have is just the simplest option, with the host
>>> kernel just picking something up-front and pretending to host
>>> userspace that it's a fixed hardware address. There's certainly scope
>>> for it to be a bit more dynamic in the sense of adding an interface to
>>> let userspace move it around (before attaching any devices, at least),
>>> but I don't think it's feasible for the host kernel to second-guess
>>> userspace enough to make it entirely transparent like it is in the
>>> DMA API domain case.
>>>
>>> Of course, that's all assuming the host itself is using a virtio-iommu
>>> (e.g. in a nested virt or emulation scenario). When it's purely within
>>> a guest then an MSI reservation shouldn't matter so much, since the
>>> guest won't be anywhere near the real hardware configuration anyway.
>>>
>>> Robin.
>>
>> Curious: since we are defining a new iommu architecture anyway, is it
>> possible to avoid those ARM-specific burdens completely?
>
> OK, after some study around those tricks, below is my learning:
>
> - The MSI_IOVA window is used only on request (iommu_dma_get_msi_page),
>   and is not meant to take effect on all architectures once initialized;
>   e.g. ARM GIC does it but not x86. So it is reasonable for the
>   virtio-iommu driver to implement such a capability;
>
> - I thought about whether a hardware MSI doorbell can always be reported
>   on virtio-iommu, since it's newly defined. It looks like there is a
>   problem if the underlying IOMMU is sw-managed MSI style: a valid
>   mapping is expected at all levels of translation, meaning the guest
>   has to manage the stage-1 mapping in a nested configuration, since
>   stage-1 is owned by the guest.
>
> Then virtio-iommu is naturally expected to report the same MSI model as
> supported by the underlying hardware. Below are some further thoughts
> along this route (using 'IOMMU' for the physical one and 'virtio-iommu'
> for the virtual one):
>
> In the scope of the current virtio-iommu spec v0.6, there is no nested
> consideration yet. The guest driver is expected to use the MAP/UNMAP
> interface on assigned endpoints. In this case the MAP requests
> (IOVA->GPA) are caught and maintained within Qemu, which then further
> talks to VFIO to map IOVA->HPA in the IOMMU.
>
> Qemu can learn the MSI model of the IOMMU from sysfs.
>
> For a hardware MSI doorbell (x86 and some ARM):
> * Host kernel reports to Qemu as IOMMU_RESV_MSI
> * Qemu reports to guest as VIRTIO_IOMMU_RESV_MEM_T_MSI
> * Guest takes the range as IOMMU_RESV_MSI, reserved
> * Qemu MAP database has no mapping for the doorbell
> * Physical IOMMU page table has no mapping for the doorbell
> * MSIs from passthrough devices bypass the IOMMU
> * MSIs from emulated devices bypass the virtio-iommu
>
> For a software MSI doorbell (most ARM):
> * Host kernel reports to Qemu as IOMMU_RESV_SW_MSI
> * Qemu reports to guest as VIRTIO_IOMMU_RESV_MEM_T_RESERVED
> * Guest takes the range as IOMMU_RESV_RESERVED
> * vGIC requests to map 'GPA of the virtual doorbell'
>   * a map request (IOVA->GPA) is sent on the endpoint
>   * Qemu maintains the mapping in the MAP database
>   * but no VFIO_MAP request, since it's purely virtual
> * GIC requests to map 'HPA of the physical doorbell'
>   * e.g. triggered by VFIO enable msi
>   * IOMMU now includes a valid mapping (IOVA->HPA)
> * MSIs from emulated devices go through the Qemu MAP database
>   (IOVA->'GPA of virtual doorbell') and then hit the vGIC
> * MSIs from passthrough devices go through the IOMMU
>   (IOVA->'HPA of physical doorbell') and then hit the GIC
>
> In this case, the host doorbell is treated as a reserved resource on the
> guest side. The guest has its own sw-management for the virtual
> doorbell, which is only used for emulated devices. The two paths are
> completely separated.
>
> If the above captures the right flow, the current v0.6 spec is complete
> regarding the required function definition.

Yes, I think this summarizes well the current state of SW/HW MSI.

> Then comes the nested case, with two levels of page tables (stage-1 and
> stage-2) in the IOMMU. Stage-1 is for IOVA->GPA and stage-2 is for
> GPA->HPA. VFIO map/unmap happens on stage-2, while stage-1 is directly
> managed by the guest (and bound to the IOMMU, which enables nested
> translation from IOVA->GPA->HPA).
>
> For hardware MSI, there is nothing special compared to
Re: [PATCH 2/4] iommu/virtio: Add probe request
On 23/03/18 15:00, Robin Murphy wrote:
[...]
>> +	/*
>> +	 * Treat unknown subtype as RESERVED, but urge users to update their
>> +	 * driver.
>> +	 */
>> +	if (mem->subtype != VIRTIO_IOMMU_RESV_MEM_T_RESERVED &&
>> +	    mem->subtype != VIRTIO_IOMMU_RESV_MEM_T_MSI)
>> +		pr_warn("unknown resv mem subtype 0x%x\n", mem->subtype);
>
> Might as well avoid the extra comparisons by incorporating this into the
> switch statement, i.e.:
>
> 	default:
> 		dev_warn(vdev->viommu_dev->dev, ...);
> 		/* Fallthrough */
> 	case VIRTIO_IOMMU_RESV_MEM_T_RESERVED:
> 		...
>
> (dev_warn is generally preferable to pr_warn when feasible)

Alright, that's nicer.

[...]
>>  	/*
>>  	 * Last step creates a default domain and attaches to it. Everything
>>  	 * must be ready.
>> @@ -735,7 +849,19 @@ static int viommu_add_device(struct device *dev)
>>
>>  static void viommu_remove_device(struct device *dev)
>>  {
>> -	kfree(dev->iommu_fwspec->iommu_priv);
>> +	struct viommu_endpoint *vdev;
>> +	struct iommu_resv_region *entry, *next;
>> +	struct iommu_fwspec *fwspec = dev->iommu_fwspec;
>> +
>> +	if (!fwspec || fwspec->ops != &viommu_ops)
>> +		return;
>
> Oh good :) I guess that was just a patch-splitting issue. The group
> thing still applies, though.

Ok.

Thanks,
Jean
Re: [PATCH 1/4] iommu: Add virtio-iommu driver
On 23/03/18 14:48, Robin Murphy wrote:
[..]
>> + * Virtio driver for the paravirtualized IOMMU
>> + *
>> + * Copyright (C) 2018 ARM Limited
>> + * Author: Jean-Philippe Brucker
>> + *
>> + * SPDX-License-Identifier: GPL-2.0
>
> This wants to be a // comment at the very top of the file (thankfully
> the policy is now properly documented in-tree since
> Documentation/process/license-rules.rst got merged)

Ok.

[...]
>> +/*
>> + * viommu_del_mappings - remove mappings from the internal tree
>> + *
>> + * @vdomain: the domain
>> + * @iova: start of the range
>> + * @size: size of the range. A size of 0 corresponds to the entire
>> + *	address space.
>> + * @out_mapping: if not NULL, the first removed mapping is returned in
>> + *	there. This allows the caller to reuse the buffer for the unmap
>> + *	request. When the returned size is greater than zero, if a mapping
>> + *	is returned, the caller must free it.
>
> This "free multiple mappings except maybe hand one of them off to the
> caller" interface is really unintuitive. AFAICS it's only used by
> viommu_unmap() to grab mapping->req, but that doesn't seem to care about
> mapping itself, so I wonder whether it wouldn't make more sense to just
> have a global kmem_cache of struct virtio_iommu_req_unmap for that and
> avoid a lot of complexity...

Well, it's a small complication for what I hoped would be a meaningful
performance difference, but more below.

>> +/* IOMMU API */
>> +
>> +static bool viommu_capable(enum iommu_cap cap)
>> +{
>> +	return false;
>> +}
>
> The .capable callback is optional, so it's only worth implementing once
> you want it to do something beyond the default behaviour.

Ah, right.

[...]
>> +static size_t viommu_unmap(struct iommu_domain *domain, unsigned long iova,
>> +			   size_t size)
>> +{
>> +	int ret = 0;
>> +	size_t unmapped;
>> +	struct viommu_mapping *mapping = NULL;
>> +	struct viommu_domain *vdomain = to_viommu_domain(domain);
>> +
>> +	unmapped = viommu_del_mappings(vdomain, iova, size, &mapping);
>> +	if (unmapped < size) {
>> +		ret = -EINVAL;
>> +		goto out_free;
>> +	}
>> +
>> +	/* Device already removed all mappings after detach. */
>> +	if (!vdomain->endpoints)
>> +		goto out_free;
>> +
>> +	if (WARN_ON(!mapping))
>> +		return 0;
>> +
>> +	mapping->req.unmap = (struct virtio_iommu_req_unmap) {
>> +		.head.type	= VIRTIO_IOMMU_T_UNMAP,
>> +		.domain		= cpu_to_le32(vdomain->id),
>> +		.virt_start	= cpu_to_le64(iova),
>> +		.virt_end	= cpu_to_le64(iova + unmapped - 1),
>> +	};
>
> ...In fact, the kmem_cache idea might be moot, since it looks like with
> a bit of restructuring you could get away with just a single per-viommu
> virtio_iommu_req_unmap structure; this lot could be passed around on the
> stack until request_lock is taken, at which point it would be copied
> into the 'real' DMA-able structure. The equivalent might apply to
> viommu_map() too - now that I'm looking at it, it seems like it would
> take pretty minimal effort to encapsulate the whole business cleanly in
> viommu_send_req_sync(), which could do something like this instead of
> going through viommu_send_reqs_sync():
>
> 	...
> 	spin_lock_irqsave(&viommu->request_lock, flags);
> 	viommu_copy_req(viommu->dma_req, req);
> 	ret = _viommu_send_reqs_sync(viommu, viommu->dma_req, 1, &sent);
> 	spin_unlock_irqrestore(&viommu->request_lock, flags);
> 	...

I'll have to come back to that, sorry; I'm still conducting some experiments
with map/unmap. I'd rather avoid introducing a single dma_req per viommu,
because I'd like to move to the iotlb_range_add/iotlb_sync interface as soon
as possible, and the request logic changes a lot when multiple threads are
susceptible to interleave map/unmap requests.

I ran some tests, and adding a kmem_cache (or simply using kmemdup, it
doesn't make a noticeable difference at our scale) reduces netperf
stream/maerts throughput by 1.1%/1.4% (+/- 0.5% over 30 tests). That's for a
virtio-net device (1 tx/rx vq), and with a vfio device the difference isn't
measurable. At this point I'm not fussy about such a small difference, so I
don't mind simplifying viommu_del_mappings.

[...]
>> +	/*
>> +	 * Last step creates a default domain and attaches to it. Everything
>> +	 * must be ready.
>> +	 */
>> +	group = iommu_group_get_for_dev(dev);
>> +	if (!IS_ERR(group))
>> +		iommu_group_put(group);
>
> Since you create the sysfs IOMMU device, maybe also create the links for
> the masters?

Ok.

>> +
>> +	return PTR_ERR_OR_ZERO(group);
>> +}
>> +
>> +static void viommu_remove_device(struct device *dev)
>> +{
>
> You need to remove dev from its group, too (basically, .remove_device
> should always undo everything .add_device did)
>
> It would also be good practice to verify that
Re: [PATCH] vhost: Fix vhost_copy_to_user()
On 2018年04月11日 21:30, Eric Auger wrote:
> vhost_copy_to_user is used to copy vring used elements to userspace.
> We should use VHOST_ADDR_USED instead of VHOST_ADDR_DESC.
>
> Fixes: f88949138058 ("vhost: introduce O(1) vq metadata cache")
> Signed-off-by: Eric Auger
>
> ---
>
> This fixes a stall observed when running an aarch64 guest with
> virtual smmu
> ---
>  drivers/vhost/vhost.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> index bec722e..f44aead 100644
> --- a/drivers/vhost/vhost.c
> +++ b/drivers/vhost/vhost.c
> @@ -744,7 +744,7 @@ static int vhost_copy_to_user(struct vhost_virtqueue *vq, void __user *to,
>  	struct iov_iter t;
>  	void __user *uaddr = vhost_vq_meta_fetch(vq,
>  						 (u64)(uintptr_t)to, size,
> -						 VHOST_ADDR_DESC);
> +						 VHOST_ADDR_USED);
>
>  	if (uaddr)
>  		return __copy_to_user(uaddr, from, size);

Acked-by: Jason Wang

Thanks!

Stable material I think.
Re: [PATCH] vhost: Fix vhost_copy_to_user()
On Wed, Apr 11, 2018 at 03:30:38PM +0200, Eric Auger wrote:
> vhost_copy_to_user is used to copy vring used elements to userspace.
> We should use VHOST_ADDR_USED instead of VHOST_ADDR_DESC.
>
> Fixes: f88949138058 ("vhost: introduce O(1) vq metadata cache")
> Signed-off-by: Eric Auger

Acked-by: Michael S. Tsirkin

> ---
>
> This fixes a stall observed when running an aarch64 guest with
> virtual smmu
> ---
>  drivers/vhost/vhost.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> index bec722e..f44aead 100644
> --- a/drivers/vhost/vhost.c
> +++ b/drivers/vhost/vhost.c
> @@ -744,7 +744,7 @@ static int vhost_copy_to_user(struct vhost_virtqueue *vq, void __user *to,
>  	struct iov_iter t;
>  	void __user *uaddr = vhost_vq_meta_fetch(vq,
>  						 (u64)(uintptr_t)to, size,
> -						 VHOST_ADDR_DESC);
> +						 VHOST_ADDR_USED);
>
>  	if (uaddr)
>  		return __copy_to_user(uaddr, from, size);
> --
> 2.5.5
Re: [PATCH] vhost: Fix vhost_copy_to_user()
Hi Jason,

On 11/04/18 15:44, Jason Wang wrote:
> On 2018年04月11日 21:30, Eric Auger wrote:
>> vhost_copy_to_user is used to copy vring used elements to userspace.
>> We should use VHOST_ADDR_USED instead of VHOST_ADDR_DESC.
>>
>> Fixes: f88949138058 ("vhost: introduce O(1) vq metadata cache")
>> Signed-off-by: Eric Auger
>>
>> ---
>>
>> This fixes a stall observed when running an aarch64 guest with
>> virtual smmu
>> ---
>>  drivers/vhost/vhost.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
>> index bec722e..f44aead 100644
>> --- a/drivers/vhost/vhost.c
>> +++ b/drivers/vhost/vhost.c
>> @@ -744,7 +744,7 @@ static int vhost_copy_to_user(struct vhost_virtqueue *vq, void __user *to,
>>  	struct iov_iter t;
>>  	void __user *uaddr = vhost_vq_meta_fetch(vq,
>>  						 (u64)(uintptr_t)to, size,
>> -						 VHOST_ADDR_DESC);
>> +						 VHOST_ADDR_USED);
>>  	if (uaddr)
>>  		return __copy_to_user(uaddr, from, size);
>
> Acked-by: Jason Wang
>
> Thanks!
>
> Stable material I think.

yes I think so.

Thanks

Eric
[PATCH] vhost: Fix vhost_copy_to_user()
vhost_copy_to_user is used to copy vring used elements to userspace.
We should use VHOST_ADDR_USED instead of VHOST_ADDR_DESC.

Fixes: f88949138058 ("vhost: introduce O(1) vq metadata cache")
Signed-off-by: Eric Auger

---

This fixes a stall observed when running an aarch64 guest with
virtual smmu
---
 drivers/vhost/vhost.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index bec722e..f44aead 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -744,7 +744,7 @@ static int vhost_copy_to_user(struct vhost_virtqueue *vq, void __user *to,
 	struct iov_iter t;
 	void __user *uaddr = vhost_vq_meta_fetch(vq,
 						 (u64)(uintptr_t)to, size,
-						 VHOST_ADDR_DESC);
+						 VHOST_ADDR_USED);
 
 	if (uaddr)
 		return __copy_to_user(uaddr, from, size);
--
2.5.5