Re: [PATCH v2 2/5] iommu/mediatek: Always check runtime PM status in tlb flush range callback
On Wed, 2021-12-08 at 14:07 +0200, Dafna Hirschfeld wrote:
> From: Sebastian Reichel
>
> In case of v4l2_reqbufs() it is possible, that a TLB flush is done
> without runtime PM being enabled. In that case the "Partial TLB flush
> timed out, falling back to full flush" warning is printed.
>
> Commit c0b57581b73b ("iommu/mediatek: Add power-domain operation")
> introduced has_pm as optimization to avoid checking runtime PM
> when there is no power domain attached. But without the PM domain
> there is still the device driver's runtime PM suspend handler, which
> disables the clock. Thus flushing should also be avoided when there
> is no PM domain involved.
>
> Signed-off-by: Sebastian Reichel
> Reviewed-by: Dafna Hirschfeld

Reviewed-by: Yong Wu

> ---
>  drivers/iommu/mtk_iommu.c | 10 +++---
>  1 file changed, 3 insertions(+), 7 deletions(-)
>
> diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c
> index 342aa562ab6a..dd2c08c54df4 100644
> --- a/drivers/iommu/mtk_iommu.c
> +++ b/drivers/iommu/mtk_iommu.c
> @@ -225,16 +225,13 @@ static void mtk_iommu_tlb_flush_range_sync(unsigned long iova, size_t size,
>  					   size_t granule,
>  					   struct mtk_iommu_data *data)
>  {
> -	bool has_pm = !!data->dev->pm_domain;
>  	unsigned long flags;
>  	int ret;
>  	u32 tmp;
>
>  	for_each_m4u(data) {
> -		if (has_pm) {
> -			if (pm_runtime_get_if_in_use(data->dev) <= 0)
> -				continue;
> -		}
> +		if (pm_runtime_get_if_in_use(data->dev) <= 0)
> +			continue;
>
>  		spin_lock_irqsave(&data->tlb_lock, flags);
>  		writel_relaxed(F_INVLD_EN1 | F_INVLD_EN0,
> @@ -259,8 +256,7 @@ static void mtk_iommu_tlb_flush_range_sync(unsigned long iova, size_t size,
>  		writel_relaxed(0, data->base + REG_MMU_CPE_DONE);
>  		spin_unlock_irqrestore(&data->tlb_lock, flags);
>
> -		if (has_pm)
> -			pm_runtime_put(data->dev);
> +		pm_runtime_put(data->dev);
>  	}
>  }
>
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu
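
The gating described in the patch above can be modelled in user space. This is only a sketch: struct fake_dev and the fake_* helpers are hypothetical stand-ins, not kernel API, and the return values are modelled on the documented behaviour of pm_runtime_get_if_in_use() (negative when runtime PM is disabled, 0 when the device is not in use, 1 when a reference was taken).

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the runtime PM state of one device. */
struct fake_dev {
	bool pm_enabled;	/* runtime PM enabled for the device */
	bool active;		/* device is powered and clocked */
	int usage_count;	/* outstanding runtime PM references */
	int flushes;		/* flushes that actually touched the hardware */
};

/* Models the return values of pm_runtime_get_if_in_use(). */
static int fake_pm_get_if_in_use(struct fake_dev *dev)
{
	if (!dev->pm_enabled)
		return -EINVAL;
	if (!dev->active || dev->usage_count == 0)
		return 0;
	dev->usage_count++;
	return 1;
}

static void fake_pm_put(struct fake_dev *dev)
{
	assert(dev->usage_count > 0);
	dev->usage_count--;
}

/* Flush only when a reference could be taken; returns true if it ran. */
static bool fake_tlb_flush_range(struct fake_dev *dev)
{
	if (fake_pm_get_if_in_use(dev) <= 0)
		return false;	/* hardware is off: skip the register writes */
	dev->flushes++;		/* the invalidate sequence would go here */
	fake_pm_put(dev);
	return true;
}
```

With the has_pm shortcut gone, the check is unconditional, so a suspended device is skipped whether or not a PM domain is attached.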
RE: [patch 21/32] NTB/msi: Convert to msi_on_each_desc()
> From: Jason Gunthorpe
> Sent: Monday, December 13, 2021 7:37 AM
>
> On Sun, Dec 12, 2021 at 09:55:32PM +0100, Thomas Gleixner wrote:
> > Kevin,
> >
> > On Sun, Dec 12 2021 at 01:56, Kevin Tian wrote:
> > >> From: Thomas Gleixner
> > >> All I can find is drivers/iommu/virtio-iommu.c but I can't find anything
> > >> vIR related there.
> > >
> > > Well, virtio-iommu is a para-virtualized vIOMMU implementations.
> > >
> > > In reality there are also fully emulated vIOMMU implementations (e.g.
> > > Qemu fully emulates Intel/AMD/ARM IOMMUs). In those configurations
> > > the IR logic in existing iommu drivers just apply:
> > >
> > >	drivers/iommu/intel/irq_remapping.c
> > >	drivers/iommu/amd/iommu.c
> >
> > thanks for the explanation. So that's a full IOMMU emulation. I was more
> > expecting a paravirtualized lightweight one.
>
> Kevin can you explain what on earth vIR is for and how does it work??
>
> Obviously we don't expose the IR machinery to userspace, so at best
> this is somehow changing what the MSI trap does?
>

Initially it was introduced for supporting more than 255 vCPUs. Due to
full emulation this capability can certainly support other vIR usages
as observed on bare metal.

vIR doesn't rely on the physical IR presence.

First if the guest doesn't have vfio device then the physical capability
doesn't matter.

Even with vfio device, IR by definition is just about remapping instead
of injection (talk about this part later). The interrupts are always
routed to the host handler first (vfio_msihandler() in this case), which
then triggers irqfd to call virtual interrupt injection handler
(irqfd_wakeup()) in kvm.

This suggests a clear role split between vfio and kvm:

  - vfio is responsible for irq allocation/startup as it is the device
    driver;
  - kvm takes care of virtual interrupt injection, being a VMM;

The two are connected via irqfd.

Following this split vIR information is completely hidden in userspace.
Qemu calculates the routing information between vGSI and vCPU (with or
without vIR, and for whatever trapped interrupt storages) and then
registers it to kvm.

When kvm receives a notification via irqfd, it follows
irqfd->vGSI->vCPU and injects a virtual interrupt into the target vCPU.

Then comes an interesting scenario about IOMMU posted interrupt (PI).
This capability allows the IR engine directly converting a physical
interrupt into virtual and then inject it into the guest. Kinda
offloading the virtual routing information into the hardware.

This is currently achieved via IRQ bypass manager, which helps connect
vfio (IRQ producer) to kvm (IRQ consumer) around a specific Linux irq
number. Once the connection is established, kvm calls
irq_set_vcpu_affinity() to update IRTE with virtual routing information
for that irq number.

With that design Qemu doesn't know whether IR or PI is enabled
physically. It always talks to vfio for having IRQ resource allocated
and to kvm for registering virtual routing information.

Then adding the new hypercall machinery into this picture:

  1) The hypercall needs to carry all necessary virtual routing
     information due to no-trap;

  2) Before querying IRTE data/pair, Qemu needs to complete necessary
     operations as of today to have IRTE ready:

	a) Register irqfd and related GSI routing info to kvm
	b) Allocates/startup IRQs via vfio;

     When PI is enabled, IRTE is ready only after both are completed.

  3) Qemu gets IRTE data/pair from kernel and return to the guest.

Thanks
Kevin
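
The irqfd->vGSI->vCPU lookup described in the mail can be pictured with a toy routing table. This is a sketch under stated assumptions only: struct toy_vm and the toy_* functions are hypothetical and do not correspond to KVM data structures or ioctls; they just show userspace registering routes and bindings first, and the signal path following them afterwards.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX 8

/* One routing entry, as userspace (Qemu in the mail) would register it. */
struct toy_route {
	int vgsi;
	int vcpu;
};

/* Hypothetical per-VM state: GSI routes plus irqfd -> vGSI bindings. */
struct toy_vm {
	struct toy_route routes[TOY_MAX];
	size_t nr_routes;
	int irqfd_vgsi[TOY_MAX];	/* indexed by toy irqfd id, -1 = unbound */
	int injected[TOY_MAX];		/* per-vCPU count of injected interrupts */
};

static void toy_vm_init(struct toy_vm *vm)
{
	assert(vm != NULL);
	vm->nr_routes = 0;
	for (int i = 0; i < TOY_MAX; i++) {
		vm->irqfd_vgsi[i] = -1;
		vm->injected[i] = 0;
	}
}

/* Register "this vGSI is delivered to this vCPU"; returns 0 on success. */
static int toy_set_route(struct toy_vm *vm, int vgsi, int vcpu)
{
	if (vm->nr_routes == TOY_MAX || vcpu < 0 || vcpu >= TOY_MAX)
		return -1;
	vm->routes[vm->nr_routes++] = (struct toy_route){ vgsi, vcpu };
	return 0;
}

/* Bind a toy irqfd id to a vGSI; returns 0 on success. */
static int toy_bind_irqfd(struct toy_vm *vm, int irqfd, int vgsi)
{
	if (irqfd < 0 || irqfd >= TOY_MAX)
		return -1;
	vm->irqfd_vgsi[irqfd] = vgsi;
	return 0;
}

/* Host handler signalled the irqfd: follow irqfd -> vGSI -> vCPU. */
static int toy_irqfd_signal(struct toy_vm *vm, int irqfd)
{
	if (irqfd < 0 || irqfd >= TOY_MAX || vm->irqfd_vgsi[irqfd] < 0)
		return -1;
	for (size_t i = 0; i < vm->nr_routes; i++) {
		if (vm->routes[i].vgsi == vm->irqfd_vgsi[irqfd]) {
			vm->injected[vm->routes[i].vcpu]++;
			return vm->routes[i].vcpu;
		}
	}
	return -1;	/* no route registered for this vGSI */
}
```

The point of the model is that the signalling side never sees the routing decision; it only pokes an irqfd, which matches the role split described above.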
[PATCH V7 5/5] net: netvsc: Add Isolation VM support for netvsc driver
From: Tianyu Lan

In Isolation VM, all shared memory with host needs to be marked visible
to host via hvcall. vmbus_establish_gpadl() has already done it for
netvsc rx/tx ring buffer. The page buffer used by
vmbus_sendpacket_pagebuffer() still needs to be handled. Use DMA API to
map/unmap this memory during sending/receiving packet and Hyper-V swiotlb
bounce buffer dma address will be returned. The swiotlb bounce buffer
has been marked visible to host during boot up.

rx/tx ring buffer is allocated via vzalloc() and they need to be
mapped into unencrypted address space (above vTOM) before sharing
with host and accessing. Add hv_map/unmap_memory() to map/unmap rx/tx
ring buffer.

Signed-off-by: Tianyu Lan
---
Change since v3:
	* Replace HV_HYP_PAGE_SIZE with PAGE_SIZE and virt_to_hvpfn()
	  with vmalloc_to_pfn() in the hv_map_memory()

Change since v2:
	* Add hv_map/unmap_memory() to map/unmap rx/tx ring buffer.
---
 arch/x86/hyperv/ivm.c             |  28 ++
 drivers/hv/hv_common.c            |  11 +++
 drivers/net/hyperv/hyperv_net.h   |   5 ++
 drivers/net/hyperv/netvsc.c       | 136 +-
 drivers/net/hyperv/netvsc_drv.c   |   1 +
 drivers/net/hyperv/rndis_filter.c |   2 +
 include/asm-generic/mshyperv.h    |   2 +
 include/linux/hyperv.h            |   5 ++
 8 files changed, 187 insertions(+), 3 deletions(-)

diff --git a/arch/x86/hyperv/ivm.c b/arch/x86/hyperv/ivm.c
index 69c7a57f3307..2b994117581e 100644
--- a/arch/x86/hyperv/ivm.c
+++ b/arch/x86/hyperv/ivm.c
@@ -287,3 +287,31 @@ int hv_set_mem_host_visibility(unsigned long kbuffer, int pagecount, bool visibl
 	kfree(pfn_array);
 	return ret;
 }
+
+/*
+ * hv_map_memory - map memory to extra space in the AMD SEV-SNP Isolation VM.
+ */
+void *hv_map_memory(void *addr, unsigned long size)
+{
+	unsigned long *pfns = kcalloc(size / PAGE_SIZE,
+				      sizeof(unsigned long), GFP_KERNEL);
+	void *vaddr;
+	int i;
+
+	if (!pfns)
+		return NULL;
+
+	for (i = 0; i < size / PAGE_SIZE; i++)
+		pfns[i] = vmalloc_to_pfn(addr + i * PAGE_SIZE) +
+			(ms_hyperv.shared_gpa_boundary >> PAGE_SHIFT);
+
+	vaddr = vmap_pfn(pfns, size / PAGE_SIZE, PAGE_KERNEL_IO);
+	kfree(pfns);
+
+	return vaddr;
+}
+
+void hv_unmap_memory(void *addr)
+{
+	vunmap(addr);
+}
diff --git a/drivers/hv/hv_common.c b/drivers/hv/hv_common.c
index 7be173a99f27..3c5cb1f70319 100644
--- a/drivers/hv/hv_common.c
+++ b/drivers/hv/hv_common.c
@@ -295,3 +295,14 @@ u64 __weak hv_ghcb_hypercall(u64 control, void *input, void *output, u32 input_s
 	return HV_STATUS_INVALID_PARAMETER;
 }
 EXPORT_SYMBOL_GPL(hv_ghcb_hypercall);
+
+void __weak *hv_map_memory(void *addr, unsigned long size)
+{
+	return NULL;
+}
+EXPORT_SYMBOL_GPL(hv_map_memory);
+
+void __weak hv_unmap_memory(void *addr)
+{
+}
+EXPORT_SYMBOL_GPL(hv_unmap_memory);
diff --git a/drivers/net/hyperv/hyperv_net.h b/drivers/net/hyperv/hyperv_net.h
index 315278a7cf88..cf69da0e296c 100644
--- a/drivers/net/hyperv/hyperv_net.h
+++ b/drivers/net/hyperv/hyperv_net.h
@@ -164,6 +164,7 @@ struct hv_netvsc_packet {
 	u32 total_bytes;
 	u32 send_buf_index;
 	u32 total_data_buflen;
+	struct hv_dma_range *dma_range;
 };

 #define NETVSC_HASH_KEYLEN 40
@@ -1074,6 +1075,7 @@ struct netvsc_device {

 	/* Receive buffer allocated by us but manages by NetVSP */
 	void *recv_buf;
+	void *recv_original_buf;
 	u32 recv_buf_size; /* allocated bytes */
 	struct vmbus_gpadl recv_buf_gpadl_handle;
 	u32 recv_section_cnt;
@@ -1082,6 +1084,7 @@ struct netvsc_device {

 	/* Send buffer allocated by us */
 	void *send_buf;
+	void *send_original_buf;
 	u32 send_buf_size;
 	struct vmbus_gpadl send_buf_gpadl_handle;
 	u32 send_section_cnt;
@@ -1731,4 +1734,6 @@ struct rndis_message {
 #define RETRY_US_HI1
 #define RETRY_MAX 2000/* >10 sec */

+void netvsc_dma_unmap(struct hv_device *hv_dev,
+		      struct hv_netvsc_packet *packet);
 #endif /* _HYPERV_NET_H */
diff --git a/drivers/net/hyperv/netvsc.c b/drivers/net/hyperv/netvsc.c
index 396bc1c204e6..b7ade735a806 100644
--- a/drivers/net/hyperv/netvsc.c
+++ b/drivers/net/hyperv/netvsc.c
@@ -153,8 +153,21 @@ static void free_netvsc_device(struct rcu_head *head)
 	int i;

 	kfree(nvdev->extension);
-	vfree(nvdev->recv_buf);
-	vfree(nvdev->send_buf);
+
+	if (nvdev->recv_original_buf) {
+		hv_unmap_memory(nvdev->recv_buf);
+		vfree(nvdev->recv_original_buf);
+	} else {
+		vfree(nvdev->recv_buf);
+	}
+
+	if (nvdev->send_original_buf) {
+		hv_unmap_memory(nvdev->send_buf);
+		vfree(nvdev->send_original_buf);
+	} else {
+		vfree(nvdev->send_buf);
+	}
+
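
The page frame arithmetic inside hv_map_memory() above can be tried out in user space. This is a minimal sketch, assuming 4 KiB pages and a page-aligned boundary; toy_shared_pfns() and the TOY_* macros are hypothetical names, and the input array stands in for what vmalloc_to_pfn() would return for each page.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_PAGE_SHIFT 12
#define TOY_PAGE_SIZE  (1UL << TOY_PAGE_SHIFT)

/*
 * Fill out[] with the page frame numbers of the shared (above-vTOM) alias
 * of each page. pfns[] stands in for the per-page vmalloc_to_pfn() results.
 * Returns the number of pages translated.
 */
static size_t toy_shared_pfns(const uint64_t *pfns, size_t size,
			      uint64_t shared_gpa_boundary, uint64_t *out)
{
	size_t i, nr = size / TOY_PAGE_SIZE;

	/* Assumed page aligned, so the boundary shifts cleanly to a PFN. */
	assert((shared_gpa_boundary & (TOY_PAGE_SIZE - 1)) == 0);

	for (i = 0; i < nr; i++)
		out[i] = pfns[i] + (shared_gpa_boundary >> TOY_PAGE_SHIFT);

	return nr;
}
```

For a boundary at bit 39, every frame number moves up by 1 << 27, which is the same offset the patch adds before handing the array to vmap_pfn().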
[PATCH V7 4/5] scsi: storvsc: Add Isolation VM support for storvsc driver
From: Tianyu Lan

In Isolation VM, all shared memory with host needs to mark visible
to host via hvcall. vmbus_establish_gpadl() has already done it for
storvsc rx/tx ring buffer. The page buffer used by
vmbus_sendpacket_mpb_desc() still needs to be handled. Use DMA API
(scsi_dma_map/unmap) to map these memory during sending/receiving
packet and return swiotlb bounce buffer dma address. In Isolation VM,
swiotlb bounce buffer is marked to be visible to host and the swiotlb
force mode is enabled.

Set device's dma min align mask to HV_HYP_PAGE_SIZE - 1 in order to
keep the original data offset in the bounce buffer.

Signed-off-by: Tianyu Lan
---
 drivers/hv/vmbus_drv.c     |  4
 drivers/scsi/storvsc_drv.c | 37 +
 include/linux/hyperv.h     |  1 +
 3 files changed, 26 insertions(+), 16 deletions(-)

diff --git a/drivers/hv/vmbus_drv.c b/drivers/hv/vmbus_drv.c
index 392c1ac4f819..ae6ec503399a 100644
--- a/drivers/hv/vmbus_drv.c
+++ b/drivers/hv/vmbus_drv.c
@@ -33,6 +33,7 @@
 #include
 #include
 #include
+#include
 #include
 #include "hyperv_vmbus.h"

@@ -2078,6 +2079,7 @@ struct hv_device *vmbus_device_create(const guid_t *type,
 	return child_device_obj;
 }

+static u64 vmbus_dma_mask = DMA_BIT_MASK(64);
 /*
  * vmbus_device_register - Register the child device
  */
@@ -2118,6 +2120,8 @@ int vmbus_device_register(struct hv_device *child_device_obj)
 	}
 	hv_debug_add_dev_dir(child_device_obj);

+	child_device_obj->device.dma_mask = &vmbus_dma_mask;
+	child_device_obj->device.dma_parms = &child_device_obj->dma_parms;
 	return 0;

 err_kset_unregister:
diff --git a/drivers/scsi/storvsc_drv.c b/drivers/scsi/storvsc_drv.c
index 20595c0ba0ae..ae293600d799 100644
--- a/drivers/scsi/storvsc_drv.c
+++ b/drivers/scsi/storvsc_drv.c
@@ -21,6 +21,8 @@
 #include
 #include
 #include
+#include
+
 #include
 #include
 #include
@@ -1336,6 +1338,7 @@ static void storvsc_on_channel_callback(void *context)
 				continue;
 			}
 			request = (struct storvsc_cmd_request *)scsi_cmd_priv(scmnd);
+			scsi_dma_unmap(scmnd);
 		}

 		storvsc_on_receive(stor_device, packet, request);
@@ -1749,7 +1752,6 @@ static int storvsc_queuecommand(struct Scsi_Host *host, struct scsi_cmnd *scmnd)
 	struct hv_host_device *host_dev = shost_priv(host);
 	struct hv_device *dev = host_dev->dev;
 	struct storvsc_cmd_request *cmd_request = scsi_cmd_priv(scmnd);
-	int i;
 	struct scatterlist *sgl;
 	unsigned int sg_count;
 	struct vmscsi_request *vm_srb;
@@ -1831,10 +1833,11 @@ static int storvsc_queuecommand(struct Scsi_Host *host, struct scsi_cmnd *scmnd)
 	payload_sz = sizeof(cmd_request->mpb);

 	if (sg_count) {
-		unsigned int hvpgoff, hvpfns_to_add;
 		unsigned long offset_in_hvpg = offset_in_hvpage(sgl->offset);
 		unsigned int hvpg_count = HVPFN_UP(offset_in_hvpg + length);
-		u64 hvpfn;
+		struct scatterlist *sg;
+		unsigned long hvpfn, hvpfns_to_add;
+		int j, i = 0;

 		if (hvpg_count > MAX_PAGE_BUFFER_COUNT) {

@@ -1848,21 +1851,22 @@ static int storvsc_queuecommand(struct Scsi_Host *host, struct scsi_cmnd *scmnd)
 		payload->range.len = length;
 		payload->range.offset = offset_in_hvpg;

+		sg_count = scsi_dma_map(scmnd);
+		if (sg_count < 0)
+			return SCSI_MLQUEUE_DEVICE_BUSY;

-		for (i = 0; sgl != NULL; sgl = sg_next(sgl)) {
+		for_each_sg(sgl, sg, sg_count, j) {
 			/*
-			 * Init values for the current sgl entry. hvpgoff
-			 * and hvpfns_to_add are in units of Hyper-V size
-			 * pages. Handling the PAGE_SIZE != HV_HYP_PAGE_SIZE
-			 * case also handles values of sgl->offset that are
-			 * larger than PAGE_SIZE. Such offsets are handled
-			 * even on other than the first sgl entry, provided
-			 * they are a multiple of PAGE_SIZE.
+			 * Init values for the current sgl entry. hvpfns_to_add
+			 * is in units of Hyper-V size pages. Handling the
+			 * PAGE_SIZE != HV_HYP_PAGE_SIZE case also handles
+			 * values of sgl->offset that are larger than PAGE_SIZE.
+			 * Such offsets are handled even on other than the first
+			 * sgl entry, provided they are a multiple of PAGE_SIZE.
 			 */
-			hvpgoff = HVPFN_DOWN(sgl->offset);
-			hvpfn = page_to_hvpfn(sg_page(sgl)) + hvpgoff;
-
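
Two pieces of arithmetic in the storvsc change lend themselves to a small worked example: counting how many Hyper-V pages a buffer spans, and why a min align mask of HV_HYP_PAGE_SIZE - 1 keeps the in-page offset after bouncing. The sketch below assumes 4 KiB Hyper-V pages; the toy_* names are hypothetical, and toy_bounce_addr() is a deliberately simplified picture of what swiotlb does with the mask, not its actual slot allocation.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_HV_PAGE_SHIFT 12
#define TOY_HV_PAGE_SIZE  (1UL << TOY_HV_PAGE_SHIFT)

/* Offset of a byte position within its 4 KiB Hyper-V page. */
static unsigned long toy_offset_in_hvpage(unsigned long offset)
{
	return offset & (TOY_HV_PAGE_SIZE - 1);
}

/* Number of Hyper-V pages covered by length bytes starting at offset. */
static unsigned long toy_hvpg_count(unsigned long offset, unsigned long length)
{
	unsigned long in_page = toy_offset_in_hvpage(offset);

	return (in_page + length + TOY_HV_PAGE_SIZE - 1) >> TOY_HV_PAGE_SHIFT;
}

/*
 * Simplified view of a min align mask: the bounce address keeps the low
 * bits of the original address that the mask selects.
 */
static uint64_t toy_bounce_addr(uint64_t slot_base, uint64_t orig_addr,
				uint64_t min_align_mask)
{
	assert((slot_base & min_align_mask) == 0);
	return slot_base + (orig_addr & min_align_mask);
}
```

Because the bounce address and the original address share their low 12 bits, the range.offset computed before the mapping still describes the data after it has been bounced.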
[PATCH V7 3/5] hyper-v: Enable swiotlb bounce buffer for Isolation VM
From: Tianyu Lan

hyperv Isolation VM requires bounce buffer support to copy
data from/to encrypted memory and so enable swiotlb force
mode to use swiotlb bounce buffer for DMA transaction.

In Isolation VM with AMD SEV, the bounce buffer needs to be
accessed via extra address space which is above shared_gpa_boundary
(E.G 39 bit address line) reported by Hyper-V CPUID ISOLATION_CONFIG.
The access physical address will be original physical address +
shared_gpa_boundary. The shared_gpa_boundary in the AMD SEV SNP
spec is called virtual top of memory(vTOM). Memory addresses below
vTOM are automatically treated as private while memory above
vTOM is treated as shared.

Swiotlb bounce buffer code calls set_memory_decrypted()
to mark bounce buffer visible to host and map it in extra
address space via memremap. Populate the shared_gpa_boundary
(vTOM) via swiotlb_unencrypted_base variable.

The map function memremap() can't work in the early place
(e.g ms_hyperv_init_platform()) and so call
swiotlb_update_mem_attributes() in the hyperv_init().

Signed-off-by: Tianyu Lan
---
Change since v6:
	* Fix compile error when swiotlb is not enabled.

Change since v4:
	* Remove Hyper-V IOMMU IOMMU_INIT_FINISH related functions
	  and set SWIOTLB_FORCE and swiotlb_unencrypted_base in the
	  ms_hyperv_init_platform(). Call swiotlb_update_mem_attributes()
	  in the hyperv_init().

Change since v3:
	* Add comment in pci-swiotlb-xen.c to explain why add
	  dependency between hyperv_swiotlb_detect() and
	  pci_xen_swiotlb_detect().
	* Return directly when fails to allocate Hyper-V swiotlb
	  buffer in the hyperv_iommu_swiotlb_init().
---
 arch/x86/hyperv/hv_init.c      | 12
 arch/x86/kernel/cpu/mshyperv.c | 15 ++-
 2 files changed, 26 insertions(+), 1 deletion(-)

diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
index 24f4a06ac46a..749906a8e068 100644
--- a/arch/x86/hyperv/hv_init.c
+++ b/arch/x86/hyperv/hv_init.c
@@ -28,6 +28,7 @@
 #include
 #include
 #include
+#include

 int hyperv_init_cpuhp;
 u64 hv_current_partition_id = ~0ull;
@@ -502,6 +503,17 @@ void __init hyperv_init(void)

 	/* Query the VMs extended capability once, so that it can be cached. */
 	hv_query_ext_cap(0);
+
+#ifdef CONFIG_SWIOTLB
+	/*
+	 * Swiotlb bounce buffer needs to be mapped in extra address
+	 * space. Map function doesn't work in the early place and so
+	 * call swiotlb_update_mem_attributes() here.
+	 */
+	if (hv_is_isolation_supported())
+		swiotlb_update_mem_attributes();
+#endif
+
 	return;

 clean_guest_os_id:
diff --git a/arch/x86/kernel/cpu/mshyperv.c b/arch/x86/kernel/cpu/mshyperv.c
index 4794b716ec79..e3a240c5e4f5 100644
--- a/arch/x86/kernel/cpu/mshyperv.c
+++ b/arch/x86/kernel/cpu/mshyperv.c
@@ -18,6 +18,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include
@@ -319,8 +320,20 @@ static void __init ms_hyperv_init_platform(void)
 		pr_info("Hyper-V: Isolation Config: Group A 0x%x, Group B 0x%x\n",
 			ms_hyperv.isolation_config_a, ms_hyperv.isolation_config_b);

-		if (hv_get_isolation_type() == HV_ISOLATION_TYPE_SNP)
+		if (hv_get_isolation_type() == HV_ISOLATION_TYPE_SNP) {
 			static_branch_enable(&isolation_type_snp);
+#ifdef CONFIG_SWIOTLB
+			swiotlb_unencrypted_base = ms_hyperv.shared_gpa_boundary;
+#endif
+		}
+
+#ifdef CONFIG_SWIOTLB
+		/*
+		 * Enable swiotlb force mode in Isolation VM to
+		 * use swiotlb bounce buffer for dma transaction.
+		 */
+		swiotlb_force = SWIOTLB_FORCE;
+#endif
 	}

 	if (hv_max_functions_eax >= HYPERV_CPUID_NESTED_FEATURES) {
--
2.25.1
[PATCH V7 2/5] x86/hyper-v: Add hyperv Isolation VM check in the cc_platform_has()
From: Tianyu Lan

Hyper-V provides Isolation VM for confidential computing support and
guest memory is encrypted in it. Places checking cc_platform_has()
with GUEST_MEM_ENCRYPT attr should return "True" in Isolation vm. e.g,
swiotlb bounce buffer size needs to adjust according to memory size
in the sev_setup_arch(). Add GUEST_MEM_ENCRYPT check for Hyper-V
Isolation VM.

Signed-off-by: Tianyu Lan
---
Change since v6:
	* Change the order in the cc_platform_has() and check sev first.

Change since v3:
	* Change code style of checking GUEST_MEM attribute in the
	  hyperv_cc_platform_has().
---
 arch/x86/kernel/cc_platform.c | 8
 1 file changed, 8 insertions(+)

diff --git a/arch/x86/kernel/cc_platform.c b/arch/x86/kernel/cc_platform.c
index 03bb2f343ddb..6cb3a675e686 100644
--- a/arch/x86/kernel/cc_platform.c
+++ b/arch/x86/kernel/cc_platform.c
@@ -11,6 +11,7 @@
 #include
 #include

+#include
 #include

 static bool __maybe_unused intel_cc_platform_has(enum cc_attr attr)
@@ -58,12 +59,19 @@ static bool amd_cc_platform_has(enum cc_attr attr)
 #endif
 }

+static bool hyperv_cc_platform_has(enum cc_attr attr)
+{
+	return attr == CC_ATTR_GUEST_MEM_ENCRYPT;
+}

 bool cc_platform_has(enum cc_attr attr)
 {
 	if (sme_me_mask)
 		return amd_cc_platform_has(attr);

+	if (hv_is_isolation_supported())
+		return hyperv_cc_platform_has(attr);
+
 	return false;
 }
 EXPORT_SYMBOL_GPL(cc_platform_has);
--
2.25.1
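
The check order that the v6 changelog mentions (SEV first, then Hyper-V isolation) can be exercised with a small user-space model. This is a sketch only: struct toy_platform, the TOY_ATTR_* values and the amd_attrs bitmask are hypothetical simplifications, and the real amd_cc_platform_has() derives its answer from SEV status rather than from a stored mask.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum toy_cc_attr {
	TOY_ATTR_MEM_ENCRYPT,
	TOY_ATTR_HOST_MEM_ENCRYPT,
	TOY_ATTR_GUEST_MEM_ENCRYPT,
};

/* Hypothetical snapshot of the platform state the real checks consult. */
struct toy_platform {
	uint64_t sme_me_mask;	/* non-zero when AMD memory encryption is active */
	unsigned int amd_attrs;	/* bitmask of attributes the AMD path reports */
	bool hv_isolation;	/* running as a Hyper-V Isolation VM */
};

static bool toy_amd_has(const struct toy_platform *p, enum toy_cc_attr attr)
{
	return p->amd_attrs & (1u << attr);
}

static bool toy_hyperv_has(enum toy_cc_attr attr)
{
	return attr == TOY_ATTR_GUEST_MEM_ENCRYPT;
}

/* Same ordering as the patch: the AMD path wins when sme_me_mask is set. */
static bool toy_cc_platform_has(const struct toy_platform *p,
				enum toy_cc_attr attr)
{
	assert(p);
	if (p->sme_me_mask)
		return toy_amd_has(p, attr);
	if (p->hv_isolation)
		return toy_hyperv_has(attr);
	return false;
}
```

In this model a guest with both sme_me_mask and Hyper-V isolation set is answered entirely by the AMD path, which is the behaviour the ordering in the patch implies.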
[PATCH V7 0/5] x86/Hyper-V: Add Hyper-V Isolation VM support(Second part)
From: Tianyu Lan

Hyper-V provides two kinds of Isolation VMs. VBS(Virtualization-based
security) and AMD SEV-SNP unenlightened Isolation VMs. This patchset
is to add support for these Isolation VMs in Linux.

The memory of these vms are encrypted and host can't access guest
memory directly. Hyper-V provides new host visibility hvcall and
the guest needs to call new hvcall to mark memory visible to host
before sharing memory with host. For security, all network/storage
stack memory should not be shared with host and so there is bounce
buffer requests.

Vmbus channel ring buffer already plays bounce buffer role because
all data from/to host needs to copy from/to between the ring buffer
and IO stack memory. So mark vmbus channel ring buffer visible.

For SNP isolation VM, guest needs to access the shared memory via
extra address space which is specified by Hyper-V CPUID
HYPERV_CPUID_ISOLATION_CONFIG. The access physical address of the
shared memory should be bounce buffer memory GPA plus with
shared_gpa_boundary reported by CPUID.

This patchset is to enable swiotlb bounce buffer for netvsc/storvsc
drivers in Isolation VM.

Change since v6:
	* Fix compile error in hv_init.c and mshyperv.c when swiotlb
	  is not enabled.
	* Change the order in the cc_platform_has() and check sev first.

Change since v5:
	* Modify "Swiotlb" to "swiotlb" in commit log.
	* Remove CONFIG_HYPERV check in the hyperv_cc_platform_has()

Change since v4:
	* Remove Hyper-V IOMMU IOMMU_INIT_FINISH related functions
	  and set SWIOTLB_FORCE and swiotlb_unencrypted_base in the
	  ms_hyperv_init_platform(). Call swiotlb_update_mem_attributes()
	  in the hyperv_init().

Change since v3:
	* Fix boot up failure on the host with mem_encrypt=on.
	  Move calling of set_memory_decrypted() back from
	  swiotlb_init_io_tlb_mem to swiotlb_late_init_with_tbl()
	  and rmem_swiotlb_device_init().
	* Change code style of checking GUEST_MEM attribute in the
	  hyperv_cc_platform_has().
	* Add comment in pci-swiotlb-xen.c to explain why add
	  dependency between hyperv_swiotlb_detect() and
	  pci_xen_swiotlb_detect().
	* Return directly when fails to allocate Hyper-V swiotlb
	  buffer in the hyperv_iommu_swiotlb_init().

Change since v2:
	* Remove Hyper-V dma ops and dma_alloc/free_noncontiguous. Add
	  hv_map/unmap_memory() to map/unmap netvsc rx/tx ring into extra
	  address space.
	* Leave mem->vaddr in swiotlb code with phys_to_virt(mem->start)
	  when fail to remap swiotlb memory.

Change since v1:
	* Add Hyper-V Isolation support check in the cc_platform_has()
	  and return true for guest memory encrypt attr.
	* Remove hv isolation check in the sev_setup_arch()

Tianyu Lan (5):
  swiotlb: Add swiotlb bounce buffer remap function for HV IVM
  x86/hyper-v: Add hyperv Isolation VM check in the cc_platform_has()
  hyper-v: Enable swiotlb bounce buffer for Isolation VM
  scsi: storvsc: Add Isolation VM support for storvsc driver
  net: netvsc: Add Isolation VM support for netvsc driver

 arch/x86/hyperv/hv_init.c         |  12 +++
 arch/x86/hyperv/ivm.c             |  28 ++
 arch/x86/kernel/cc_platform.c     |   8 ++
 arch/x86/kernel/cpu/mshyperv.c    |  15 +++-
 drivers/hv/hv_common.c            |  11 +++
 drivers/hv/vmbus_drv.c            |   4 +
 drivers/net/hyperv/hyperv_net.h   |   5 ++
 drivers/net/hyperv/netvsc.c       | 136 +-
 drivers/net/hyperv/netvsc_drv.c   |   1 +
 drivers/net/hyperv/rndis_filter.c |   2 +
 drivers/scsi/storvsc_drv.c        |  37
 include/asm-generic/mshyperv.h    |   2 +
 include/linux/hyperv.h            |   6 ++
 include/linux/swiotlb.h           |   6 ++
 kernel/dma/swiotlb.c              |  43 +-
 15 files changed, 294 insertions(+), 22 deletions(-)

--
2.25.1
[PATCH V7 1/5] swiotlb: Add swiotlb bounce buffer remap function for HV IVM
From: Tianyu Lan

In Isolation VM with AMD SEV, bounce buffer needs to be accessed via
extra address space which is above shared_gpa_boundary (E.G 39 bit
address line) reported by Hyper-V CPUID ISOLATION_CONFIG. The access
physical address will be original physical address + shared_gpa_boundary.
The shared_gpa_boundary in the AMD SEV SNP spec is called virtual top of
memory(vTOM). Memory addresses below vTOM are automatically treated as
private while memory above vTOM is treated as shared.

Expose swiotlb_unencrypted_base for platforms to set unencrypted
memory base offset and platform calls swiotlb_update_mem_attributes()
to remap swiotlb mem to unencrypted address space. memremap() can
not be called in the early stage and so put remapping code into
swiotlb_update_mem_attributes(). Store remap address and use it to copy
data from/to swiotlb bounce buffer.

Acked-by: Christoph Hellwig
Signed-off-by: Tianyu Lan
---
Change since v3:
	* Fix boot up failure on the host with mem_encrypt=on.
	  Move calling of set_memory_decrypted() back from
	  swiotlb_init_io_tlb_mem to swiotlb_late_init_with_tbl()
	  and rmem_swiotlb_device_init().

Change since v2:
	* Leave mem->vaddr with phys_to_virt(mem->start) when fail
	  to remap swiotlb memory.

Change since v1:
	* Rework comment in the swiotlb_init_io_tlb_mem()
	* Make swiotlb_init_io_tlb_mem() back to return void.
---
 include/linux/swiotlb.h |  6 ++
 kernel/dma/swiotlb.c    | 43 +++--
 2 files changed, 47 insertions(+), 2 deletions(-)

diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
index 569272871375..f6c3638255d5 100644
--- a/include/linux/swiotlb.h
+++ b/include/linux/swiotlb.h
@@ -73,6 +73,9 @@ extern enum swiotlb_force swiotlb_force;
  * @end:	The end address of the swiotlb memory pool. Used to do a quick
  *		range check to see if the memory was in fact allocated by this
  *		API.
+ * @vaddr:	The vaddr of the swiotlb memory pool. The swiotlb memory pool
+ *		may be remapped in the memory encrypted case and store virtual
+ *		address for bounce buffer operation.
  * @nslabs:	The number of IO TLB blocks (in groups of 64) between @start and
  *		@end. For default swiotlb, this is command line adjustable via
  *		setup_io_tlb_npages.
@@ -92,6 +95,7 @@ extern enum swiotlb_force swiotlb_force;
 struct io_tlb_mem {
 	phys_addr_t start;
 	phys_addr_t end;
+	void *vaddr;
 	unsigned long nslabs;
 	unsigned long used;
 	unsigned int index;
@@ -186,4 +190,6 @@ static inline bool is_swiotlb_for_alloc(struct device *dev)
 }
 #endif /* CONFIG_DMA_RESTRICTED_POOL */

+extern phys_addr_t swiotlb_unencrypted_base;
+
 #endif /* __LINUX_SWIOTLB_H */
diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index 8e840fbbed7c..34e6ade4f73c 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -50,6 +50,7 @@
 #include
 #include

+#include
 #include
 #include
 #include
@@ -72,6 +73,8 @@ enum swiotlb_force swiotlb_force;

 struct io_tlb_mem io_tlb_default_mem;

+phys_addr_t swiotlb_unencrypted_base;
+
 /*
  * Max segment that we can provide which (if pages are contingous) will
  * not be bounced (unless SWIOTLB_FORCE is set).
@@ -155,6 +158,27 @@ static inline unsigned long nr_slots(u64 val)
 	return DIV_ROUND_UP(val, IO_TLB_SIZE);
 }

+/*
+ * Remap swioltb memory in the unencrypted physical address space
+ * when swiotlb_unencrypted_base is set. (e.g. for Hyper-V AMD SEV-SNP
+ * Isolation VMs).
+ */
+void *swiotlb_mem_remap(struct io_tlb_mem *mem, unsigned long bytes)
+{
+	void *vaddr = NULL;
+
+	if (swiotlb_unencrypted_base) {
+		phys_addr_t paddr = mem->start + swiotlb_unencrypted_base;
+
+		vaddr = memremap(paddr, bytes, MEMREMAP_WB);
+		if (!vaddr)
+			pr_err("Failed to map the unencrypted memory %llx size %lx.\n",
+			       paddr, bytes);
+	}
+
+	return vaddr;
+}
+
 /*
  * Early SWIOTLB allocation may be too early to allow an architecture to
  * perform the desired operations. This function allows the architecture to
@@ -172,7 +196,12 @@ void __init swiotlb_update_mem_attributes(void)
 	vaddr = phys_to_virt(mem->start);
 	bytes = PAGE_ALIGN(mem->nslabs << IO_TLB_SHIFT);
 	set_memory_decrypted((unsigned long)vaddr, bytes >> PAGE_SHIFT);
-	memset(vaddr, 0, bytes);
+
+	mem->vaddr = swiotlb_mem_remap(mem, bytes);
+	if (!mem->vaddr)
+		mem->vaddr = vaddr;
+
+	memset(mem->vaddr, 0, bytes);
 }

 static void swiotlb_init_io_tlb_mem(struct io_tlb_mem *mem, phys_addr_t start,
@@ -196,7 +225,17 @@ static void swiotlb_init_io_tlb_mem(struct io_tlb_mem *mem, phys_addr_t start,
 		mem->slots[i].orig_addr = INVALID_PHYS_ADDR;
 		mem->slots[i].alloc_size
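
The remap-with-fallback flow in swiotlb_update_mem_attributes() above can be sketched in user space. The names below (toy_io_tlb_mem, toy_remap_fn, toy_remap_ok, toy_remap_fail) are hypothetical; the remap hook merely stands in for memremap(), and the "direct" pointer stands in for the phys_to_virt() address.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical remap hook standing in for memremap(); may return NULL. */
typedef void *(*toy_remap_fn)(uint64_t paddr, size_t bytes);

struct toy_io_tlb_mem {
	uint64_t start;	/* physical start of the pool */
	void *vaddr;	/* address used for bounce copies */
};

static unsigned char toy_alias[64];	/* pretend mapping above the boundary */
static uint64_t toy_last_paddr;		/* last physical address asked for */

static void *toy_remap_ok(uint64_t paddr, size_t bytes)
{
	toy_last_paddr = paddr;
	return bytes <= sizeof(toy_alias) ? toy_alias : NULL;
}

static void *toy_remap_fail(uint64_t paddr, size_t bytes)
{
	(void)paddr;
	(void)bytes;
	return NULL;
}

/* Prefer the remapped alias; keep the direct mapping when there is none. */
static void toy_update_mem_attributes(struct toy_io_tlb_mem *mem, void *direct,
				      size_t bytes, uint64_t unencrypted_base,
				      toy_remap_fn remap)
{
	void *vaddr = NULL;

	assert(mem && direct && remap);

	if (unencrypted_base)
		vaddr = remap(mem->start + unencrypted_base, bytes);

	mem->vaddr = vaddr ? vaddr : direct;
	memset(mem->vaddr, 0, bytes);
}
```

This mirrors the v2 changelog entry: when no base is set or the remap fails, the pool keeps using the direct address, so callers never see a NULL vaddr.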
Re: [PATCH 1/4] dt-bindings: memory: mediatek: Correct the minItems of clk for larbs
On Fri, 2021-12-03 at 17:34 -0600, Rob Herring wrote:
> On Fri, 03 Dec 2021 14:40:24 +0800, Yong Wu wrote:
> > If a platform's larb support gals, there will be some larbs have a one
> > more "gals" clock while the others still only need "apb"/"smi" clocks.
> > then the minItems is 2 and the maxItems is 3.
> >
> > Fixes: 27bb0e42855a ("dt-bindings: memory: mediatek: Convert SMI to DT schema")
> > Signed-off-by: Yong Wu
> > ---
> >  .../bindings/memory-controllers/mediatek,smi-larb.yaml | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
>
> Running 'make dtbs_check' with the schema in this patch gives the
> following warnings. Consider if they are expected or the schema is
> incorrect. These may not be new warnings.
>
> Note that it is not yet a requirement to have 0 warnings for dtbs_check.
> This will change in the future.
>
> Full log is available here: https://patchwork.ozlabs.org/patch/1563127
>
>
> larb@14016000: 'mediatek,larb-id' is a required property
>	arch/arm64/boot/dts/mediatek/mt8167-pumpkin.dt.yaml

I will fix this in next version. This property is not needed in mt8167.

>
> larb@14017000: clock-names: ['apb', 'smi'] is too short
>	arch/arm64/boot/dts/mediatek/mt8183-evb.dt.yaml
>	arch/arm64/boot/dts/mediatek/mt8183-kukui-jacuzzi-burnet.dt.yaml
>	arch/arm64/boot/dts/mediatek/mt8183-kukui-jacuzzi-damu.dt.yaml
>	arch/arm64/boot/dts/mediatek/mt8183-kukui-jacuzzi-fennel14.dt.yaml
>	arch/arm64/boot/dts/mediatek/mt8183-kukui-jacuzzi-fennel-sku1.dt.yaml
>	arch/arm64/boot/dts/mediatek/mt8183-kukui-jacuzzi-fennel-sku6.dt.yaml
>	arch/arm64/boot/dts/mediatek/mt8183-kukui-jacuzzi-juniper-sku16.dt.yaml
>	arch/arm64/boot/dts/mediatek/mt8183-kukui-jacuzzi-kappa.dt.yaml
>	arch/arm64/boot/dts/mediatek/mt8183-kukui-jacuzzi-kenzo.dt.yaml
>	arch/arm64/boot/dts/mediatek/mt8183-kukui-jacuzzi-willow-sku0.dt.yaml
>	arch/arm64/boot/dts/mediatek/mt8183-kukui-jacuzzi-willow-sku1.dt.yaml
>	arch/arm64/boot/dts/mediatek/mt8183-kukui-kakadu.dt.yaml
>	arch/arm64/boot/dts/mediatek/mt8183-kukui-kodama-sku16.dt.yaml
>	arch/arm64/boot/dts/mediatek/mt8183-kukui-kodama-sku272.dt.yaml
>	arch/arm64/boot/dts/mediatek/mt8183-kukui-kodama-sku288.dt.yaml
>	arch/arm64/boot/dts/mediatek/mt8183-kukui-kodama-sku32.dt.yaml
>	arch/arm64/boot/dts/mediatek/mt8183-kukui-krane-sku0.dt.yaml
>	arch/arm64/boot/dts/mediatek/mt8183-kukui-krane-sku176.dt.yaml
>	arch/arm64/boot/dts/mediatek/mt8183-pumpkin.dt.yaml

Some larbs only have two clocks(apb/smi) in mt8183. thus it is
reasonable for me. I won't fix this in next version.

Please tell me if I miss something.

Thanks.

>
> larb@15001000: 'mediatek,larb-id' is a required property
>	arch/arm64/boot/dts/mediatek/mt8167-pumpkin.dt.yaml
>
> larb@1601: clock-names: ['apb', 'smi'] is too short
>	arch/arm64/boot/dts/mediatek/mt8183-evb.dt.yaml
>	arch/arm64/boot/dts/mediatek/mt8183-kukui-jacuzzi-burnet.dt.yaml
>	arch/arm64/boot/dts/mediatek/mt8183-kukui-jacuzzi-damu.dt.yaml
>	arch/arm64/boot/dts/mediatek/mt8183-kukui-jacuzzi-fennel14.dt.yaml
>	arch/arm64/boot/dts/mediatek/mt8183-kukui-jacuzzi-fennel-sku1.dt.yaml
>	arch/arm64/boot/dts/mediatek/mt8183-kukui-jacuzzi-fennel-sku6.dt.yaml
>	arch/arm64/boot/dts/mediatek/mt8183-kukui-jacuzzi-juniper-sku16.dt.yaml
>	arch/arm64/boot/dts/mediatek/mt8183-kukui-jacuzzi-kappa.dt.yaml
>	arch/arm64/boot/dts/mediatek/mt8183-kukui-jacuzzi-kenzo.dt.yaml
>	arch/arm64/boot/dts/mediatek/mt8183-kukui-jacuzzi-willow-sku0.dt.yaml
>	arch/arm64/boot/dts/mediatek/mt8183-kukui-jacuzzi-willow-sku1.dt.yaml
>	arch/arm64/boot/dts/mediatek/mt8183-kukui-kakadu.dt.yaml
>	arch/arm64/boot/dts/mediatek/mt8183-kukui-kodama-sku16.dt.yaml
>	arch/arm64/boot/dts/mediatek/mt8183-kukui-kodama-sku272.dt.yaml
>	arch/arm64/boot/dts/mediatek/mt8183-kukui-kodama-sku288.dt.yaml
>	arch/arm64/boot/dts/mediatek/mt8183-kukui-kodama-sku32.dt.yaml
>	arch/arm64/boot/dts/mediatek/mt8183-kukui-krane-sku0.dt.yaml
>	arch/arm64/boot/dts/mediatek/mt8183-kukui-krane-sku176.dt.yaml
>	arch/arm64/boot/dts/mediatek/mt8183-pumpkin.dt.yaml
>
> larb@1601: 'mediatek,larb-id' is a required property
>	arch/arm64/boot/dts/mediatek/mt8167-pumpkin.dt.yaml
>
> larb@1701: clock-names: ['apb', 'smi'] is too short
>	arch/arm64/boot/dts/mediatek/mt8183-evb.dt.yaml
>	arch/arm64/boot/dts/mediatek/mt8183-kukui-jacuzzi-burnet.dt.yaml
>	arch/arm64/boot/dts/mediatek/mt8183-kukui-jacuzzi-damu.dt.yaml
>	arch/arm64/boot/dts/mediatek/mt8183-kukui-jacuzzi-fennel14.dt.yaml
>	arch/arm64/boot/dts/mediatek/mt8183-kukui-jacuzzi-fennel-sku1.dt.yaml
>	arch/arm64/boot/dts/mediatek/mt8183-kukui-jacuzzi-fennel-sku6.dt.yaml
>
Re: [patch V3 34/35] soc: ti: ti_sci_inta_msi: Get rid of ti_sci_inta_msi_get_virq()
On 10-12-21, 23:19, Thomas Gleixner wrote:
> From: Thomas Gleixner
>
> Just use the core function msi_get_virq().

Acked-By: Vinod Koul

-- 
~Vinod
Re: [patch V3 35/35] dmaengine: qcom_hidma: Cleanup MSI handling
On 10-12-21, 23:19, Thomas Gleixner wrote:
> From: Thomas Gleixner
>
> There is no reason to walk the MSI descriptors to retrieve the interrupt
> number for a device. Use msi_get_virq() instead.

Acked-By: Vinod Koul

-- 
~Vinod
Re: [patch V3 29/35] dmaengine: mv_xor_v2: Get rid of msi_desc abuse
On 10-12-21, 23:19, Thomas Gleixner wrote:
> From: Thomas Gleixner
>
> Storing a pointer to the MSI descriptor just to keep track of the Linux
> interrupt number is daft. Use msi_get_virq() instead.

Acked-By: Vinod Koul

-- 
~Vinod
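The three acked cleanups above share one idea: a driver asks the core for the Linux interrupt number by MSI index instead of walking descriptors or caching a descriptor pointer. The following is only a userspace sketch of that lookup, assuming the kernel helper takes a device and an index and returns 0 when nothing matches. The `sketch_` types and function are hypothetical stand-ins, not kernel definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative stand-in for an MSI descriptor: index -> Linux irq number. */
struct sketch_msi_desc {
	unsigned int index;
	unsigned int virq;
};

/* Illustrative stand-in for a device that owns a set of descriptors. */
struct sketch_dev {
	const struct sketch_msi_desc *descs;
	size_t ndescs;
};

/*
 * Hypothetical model of the core lookup: the driver passes only the MSI
 * index and never touches a descriptor itself. Returns 0 if the index has
 * no interrupt associated with it.
 */
unsigned int sketch_msi_get_virq(const struct sketch_dev *dev,
				 unsigned int index)
{
	assert(dev != NULL);

	for (size_t i = 0; i < dev->ndescs; i++) {
		if (dev->descs[i].index == index)
			return dev->descs[i].virq;
	}
	return 0;
}
```

With a lookup of this shape, a driver only needs to remember the index it allocated, which is why keeping a descriptor pointer around becomes unnecessary.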
Re: [PATCH v3 04/18] driver core: platform: Add driver dma ownership management
On 12/10/21 9:23 AM, Lu Baolu wrote:
Hi Greg, Jason and Christoph,

On 12/9/21 9:20 AM, Lu Baolu wrote:
On 12/7/21 9:16 PM, Jason Gunthorpe wrote:
On Tue, Dec 07, 2021 at 10:57:25AM +0800, Lu Baolu wrote:
On 12/6/21 11:06 PM, Jason Gunthorpe wrote:
On Mon, Dec 06, 2021 at 06:36:27AM -0800, Christoph Hellwig wrote:

I really hate the amount of boilerplate code that having this in each
bus type causes.

+1

I liked the first version of this series better with the code near
really_probe().

Can we go back to that with some device_configure_dma() wrapper
conditionally called by really_probe as we discussed?

[...]

diff --git a/drivers/base/dd.c b/drivers/base/dd.c
index 68ea1f949daa..68ca5a579eb1 100644
--- a/drivers/base/dd.c
+++ b/drivers/base/dd.c
@@ -538,6 +538,32 @@ static int call_driver_probe(struct device *dev, struct device_driver *drv)
 	return ret;
 }
 
+static int device_dma_configure(struct device *dev, struct device_driver *drv)
+{
+	int ret;
+
+	if (!dev->bus->dma_configure)
+		return 0;
+
+	ret = dev->bus->dma_configure(dev);
+	if (ret)
+		return ret;
+
+	if (!drv->suppress_auto_claim_dma_owner)
+		ret = iommu_device_set_dma_owner(dev, DMA_OWNER_DMA_API, NULL);
+
+	return ret;
+}
+
+static void device_dma_cleanup(struct device *dev, struct device_driver *drv)
+{
+	if (!dev->bus->dma_configure)
+		return;
+
+	if (!drv->suppress_auto_claim_dma_owner)
+		iommu_device_release_dma_owner(dev, DMA_OWNER_DMA_API);
+}
+
 static int really_probe(struct device *dev, struct device_driver *drv)
 {
 	bool test_remove = IS_ENABLED(CONFIG_DEBUG_TEST_DRIVER_REMOVE) &&
@@ -574,11 +600,8 @@ static int really_probe(struct device *dev, struct device_driver *drv)
 	if (ret)
 		goto pinctrl_bind_failed;
 
-	if (dev->bus->dma_configure) {
-		ret = dev->bus->dma_configure(dev);
-		if (ret)
-			goto probe_failed;
-	}
+	if (device_dma_configure(dev, drv))
+		goto pinctrl_bind_failed;
 
 	ret = driver_sysfs_add(dev);
 	if (ret) {
@@ -660,6 +683,8 @@ static int really_probe(struct device *dev, struct device_driver *drv)
 	if (dev->bus)
 		blocking_notifier_call_chain(&dev->bus->p->bus_notifier,
 					     BUS_NOTIFY_DRIVER_NOT_BOUND, dev);
+
+	device_dma_cleanup(dev, drv);
 pinctrl_bind_failed:
 	device_links_no_driver(dev);
 	devres_release_all(dev);
@@ -1204,6 +1229,7 @@ static void __device_release_driver(struct device *dev, struct device *parent)
 		else if (drv->remove)
 			drv->remove(dev);
 
+		device_dma_cleanup(dev, drv);
 		device_links_driver_cleanup(dev);
 
 		devres_release_all(dev);
diff --git a/include/linux/device/driver.h b/include/linux/device/driver.h
index a498ebcf4993..374a3c2cc10d 100644
--- a/include/linux/device/driver.h
+++ b/include/linux/device/driver.h
@@ -100,6 +100,7 @@ struct device_driver {
 	const char *mod_name; /* used for built-in modules */
 
 	bool suppress_bind_attrs; /* disables bind/unbind via sysfs */
+	bool suppress_auto_claim_dma_owner;
 	enum probe_type probe_type;
 
 	const struct of_device_id *of_match_table;

Does this work for you? Can I work towards this in the next version?

A kindly ping ... Is this heading the right direction? I need your
advice to move ahead. :-)

Best regards,
baolu
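The pairing proposed in the patch above (configure DMA and auto-claim DMA-API ownership at probe time, release it at cleanup, skip both when the driver opts out) can be modelled in userspace. This is a minimal sketch, not the kernel code: the `model_` types are hypothetical, and a counter stands in for iommu_device_set_dma_owner() and iommu_device_release_dma_owner().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative bus: the callback may be NULL, as in the patch. */
struct model_bus {
	int (*dma_configure)(void *dev);
};

/* Illustrative driver carrying the opt-out flag from the patch. */
struct model_driver {
	bool suppress_auto_claim_dma_owner;
};

/* Illustrative device; the counter stands in for the ownership claim. */
struct model_device {
	const struct model_bus *bus;
	int dma_owner_claims;
};

/* Trivial bus callback that always succeeds. */
int model_bus_dma_configure(void *dev)
{
	(void)dev;
	return 0;
}

/* Models device_dma_configure(): bus setup first, then the auto-claim. */
int model_dma_configure(struct model_device *dev,
			const struct model_driver *drv)
{
	int ret;

	assert(dev && dev->bus && drv);

	if (!dev->bus->dma_configure)
		return 0;

	ret = dev->bus->dma_configure(dev);
	if (ret)
		return ret;

	if (!drv->suppress_auto_claim_dma_owner)
		dev->dma_owner_claims++;

	return 0;
}

/* Models device_dma_cleanup(): undoes exactly what configure claimed. */
void model_dma_cleanup(struct model_device *dev,
		       const struct model_driver *drv)
{
	assert(dev && dev->bus && drv);

	if (!dev->bus->dma_configure)
		return;

	if (!drv->suppress_auto_claim_dma_owner)
		dev->dma_owner_claims--;
}
```

The point of the model is the symmetry: cleanup tests the same two conditions as configure, so a driver that sets the suppress flag never has a claim taken or released on its behalf.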
Re: [patch 21/32] NTB/msi: Convert to msi_on_each_desc()
On Sun, Dec 12, 2021 at 01:12:05AM +0100, Thomas Gleixner wrote:

> PCI/MSI and PCI/MSI-X are just implementations of IMS
>
> Not more, not less. The fact that they have very strict rules about the
> storage space and the fact that they are mutually exclusive does not
> change that at all.

And the mess we have is that virtualization broke this design.
Virtualization made MSI/MSI-X special!

I am wondering if we just need to bite the bullet and force the
introduction of a new ACPI flag for the APIC that says one of:
 - message addr/data pairs work correctly (baremetal)
 - creating message addr/data pairs need to use a hypercall protocol
 - property not defined so assume only MSI/MSI-X/etc work.

Intel was originally trying to do this with the 'IMS enabled' PCI
Capability block, but a per PCI device capability is in the wrong
layer.

Jason
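One way to read the three-way platform property suggested above is as a small decision table. The sketch below is an interpretation only: every name in it is hypothetical, no such ACPI flag is defined anywhere in this thread, and the choice to route MSI/MSI-X through the hypercall as well in the second state is an assumption.

```c
#include <assert.h>

/* Hypothetical encoding of the proposed property; not an ACPI definition. */
enum msg_irq_platform_cap {
	MSG_IRQ_CAP_UNDEFINED = 0,	/* property absent */
	MSG_IRQ_CAP_NATIVE,		/* addr/data pairs work as on bare metal */
	MSG_IRQ_CAP_HYPERCALL,		/* addr/data pairs come from the hypervisor */
};

/* Where the message is stored: PCI-defined tables or device-specific (IMS). */
enum msg_irq_source {
	MSG_IRQ_MSI,
	MSG_IRQ_MSIX,
	MSG_IRQ_IMS,
};

/* What the guest kernel would do to obtain an addr/data pair. */
enum msg_irq_method {
	MSG_IRQ_REFUSE,
	MSG_IRQ_COMPOSE_LOCALLY,
	MSG_IRQ_ASK_HYPERVISOR,
};

enum msg_irq_method msg_irq_pick_method(enum msg_irq_platform_cap cap,
					enum msg_irq_source src)
{
	assert(src == MSG_IRQ_MSI || src == MSG_IRQ_MSIX || src == MSG_IRQ_IMS);

	switch (cap) {
	case MSG_IRQ_CAP_NATIVE:
		return MSG_IRQ_COMPOSE_LOCALLY;
	case MSG_IRQ_CAP_HYPERCALL:
		/* Assumption: every source goes through the hypercall. */
		return MSG_IRQ_ASK_HYPERVISOR;
	case MSG_IRQ_CAP_UNDEFINED:
	default:
		/* Only the trapped PCI tables are assumed to work. */
		return src == MSG_IRQ_IMS ? MSG_IRQ_REFUSE
					  : MSG_IRQ_COMPOSE_LOCALLY;
	}
}
```

Keeping the property at platform level, as the message argues, means the decision depends only on the flag and the storage type, never on a per-device capability.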
Re: [patch 21/32] NTB/msi: Convert to msi_on_each_desc()
On Sun, Dec 12, 2021 at 09:55:32PM +0100, Thomas Gleixner wrote:
> Kevin,
>
> On Sun, Dec 12 2021 at 01:56, Kevin Tian wrote:
> >> From: Thomas Gleixner
> >> All I can find is drivers/iommu/virtio-iommu.c but I can't find anything
> >> vIR related there.
> >
> > Well, virtio-iommu is a para-virtualized vIOMMU implementation.
> >
> > In reality there are also fully emulated vIOMMU implementations (e.g.
> > Qemu fully emulates Intel/AMD/ARM IOMMUs). In those configurations
> > the IR logic in existing iommu drivers just applies:
> >
> >	drivers/iommu/intel/irq_remapping.c
> >	drivers/iommu/amd/iommu.c
>
> thanks for the explanation. So that's a full IOMMU emulation. I was more
> expecting a paravirtualized lightweight one.

Kevin can you explain what on earth vIR is for and how does it work??

Obviously we don't expose the IR machinery to userspace, so at best
this is somehow changing what the MSI trap does?

Jason
Re: [PATCH 1/4] ioasid: Reserve a global PASID for in-kernel DMA
On Sat, Dec 11, 2021 at 08:39:12AM +, Tian, Kevin wrote:

> Uniqueness is not the main argument of using global PASIDs for
> SWQ, since it can be defined either in per-RID or in global PASID
> space. No SVA architecture can allow two processes to use the
> same PASID to submit work unless they share mm!
>
> IMO the real reason is that SWQ for user SVA must be accessed
> via ENQCMD instruction which fetches the PASID from a CPU MSR

This really should have been inside a comment in the struct mm

 "pasid is the value used by x86 ENQCMD"

(and if we phrase it that way I wonder why it is in a struct mm not
some process or task related struct, since it has nothing to do with
page tables)

And, IMHO, the IOMMU part of the code should avoid using this
field. IOMMU should be able to create arbitrarily many "SVA"
iommu_domains for use by PASID even if they can't be used with
ENQCMD. Such is proper layering.

Jason
Re: [patch 21/32] NTB/msi: Convert to msi_on_each_desc()
On Sun, Dec 12, 2021 at 08:44:46AM +0200, Mika Penttilä wrote:

> > /*
> >  * The MSIX mappable capability informs that MSIX data of a BAR can be mmapped
> >  * which allows direct access to non-MSIX registers which happened to be within
> >  * the same system page.
> >  *
> >  * Even though the userspace gets direct access to the MSIX data, the existing
> >  * VFIO_DEVICE_SET_IRQS interface must still be used for MSIX configuration.
> >  */
> > #define VFIO_REGION_INFO_CAP_MSIX_MAPPABLE 3
> >
> > IIRC this was introduced for PPC when a device has MSI-X in the same BAR as
> > other MMIO registers. Trapping MSI-X leads to performance downgrade on
> > accesses to adjacent registers. MSI-X can be mapped by userspace because
> > PPC already uses a hypercall mechanism for interrupt. Though unclear about
> > the detail it sounds a similar usage as proposed here.
> >
> > Thanks
> > Kevin
>
> I see VFIO_REGION_INFO_CAP_MSIX_MAPPABLE is always set so if msix table is
> in its own bar, qemu never traps/emulates the access.

It is some backwards compat, the kernel always sets it to indicate a
new kernel, that doesn't mean qemu doesn't trap.

As the comment says, "VFIO_DEVICE_SET_IRQS interface must still be
used for MSIX configuration" so there is no way qemu can meet that
without either trapping the MSI page or using a special hypercall
(ppc)

Jason
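For readers unfamiliar with how userspace would notice the capability discussed above, here is a rough sketch of walking a region-info capability chain in a buffer such as one filled by VFIO_DEVICE_GET_REGION_INFO. The `sketch_` structs and constants are local mirrors of the uapi layout as I understand it and should be treated as assumptions; real code would include <linux/vfio.h> instead. Only the value 3 for the MSIX-mappable capability comes from the quoted header.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Assumed mirror of the region-info "capabilities present" flag. */
#define SKETCH_REGION_INFO_FLAG_CAPS	(1u << 3)
/* Value taken from the #define quoted in the message above. */
#define SKETCH_CAP_MSIX_MAPPABLE	3

/* Assumed mirror of the region info header layout. */
struct sketch_region_info {
	uint32_t argsz;
	uint32_t flags;
	uint32_t index;
	uint32_t cap_offset;	/* offset of the first capability, 0 if none */
	uint64_t size;
	uint64_t offset;
};

/* Assumed mirror of the capability header; next == 0 ends the chain. */
struct sketch_cap_header {
	uint16_t id;
	uint16_t version;
	uint32_t next;		/* offset from the start of the info buffer */
};

/* Returns true if the chain inside buf contains a capability with cap_id. */
bool sketch_region_has_cap(const uint8_t *buf, size_t len, uint16_t cap_id)
{
	struct sketch_region_info info;
	struct sketch_cap_header hdr;
	size_t hops = 0;
	uint32_t off;

	assert(buf != NULL);

	if (len < sizeof(info))
		return false;
	memcpy(&info, buf, sizeof(info));
	if (!(info.flags & SKETCH_REGION_INFO_FLAG_CAPS))
		return false;

	for (off = info.cap_offset; off != 0; off = hdr.next) {
		/* Reject offsets that point outside the buffer. */
		if (off < sizeof(info) || off > len - sizeof(hdr))
			return false;
		/* Guard against a chain that loops back on itself. */
		if (hops++ > len / sizeof(hdr))
			return false;
		memcpy(&hdr, buf + off, sizeof(hdr));
		if (hdr.id == cap_id)
			return true;
	}
	return false;
}
```

As the reply above stresses, finding the capability only says the kernel allows mmap of that range. It says nothing about whether the VMM still traps the MSI-X table, since VFIO_DEVICE_SET_IRQS remains the configuration path.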
RE: [patch 21/32] NTB/msi: Convert to msi_on_each_desc()
Kevin,

On Sun, Dec 12 2021 at 01:56, Kevin Tian wrote:
>> From: Thomas Gleixner
>> All I can find is drivers/iommu/virtio-iommu.c but I can't find anything
>> vIR related there.
>
> Well, virtio-iommu is a para-virtualized vIOMMU implementation.
>
> In reality there are also fully emulated vIOMMU implementations (e.g.
> Qemu fully emulates Intel/AMD/ARM IOMMUs). In those configurations
> the IR logic in existing iommu drivers just applies:
>
>	drivers/iommu/intel/irq_remapping.c
>	drivers/iommu/amd/iommu.c

thanks for the explanation. So that's a full IOMMU emulation. I was more
expecting a paravirtualized lightweight one.

Thanks,

        tglx
RE: [patch 21/32] NTB/msi: Convert to msi_on_each_desc()
Kevin,

On Sun, Dec 12 2021 at 02:14, Kevin Tian wrote:
>> From: Thomas Gleixner
> I just continue the thought practice along that direction to see what
> the host flow will be like (step by step). Looking at the current
> implementation is just one necessary step in my thought practice to
> help refine the picture. When I found something which may be
> worth being aligned then I shared to avoid following a wrong direction
> too far.
>
> If both of you think it simply adds noise to this discussion, I can
> surely hold back and focus on 'concept' only.

All good. We _want_ your participation for sure. Comparing and
contrasting it to the existing flow is fine.

Thanks,

        tglx
Re: [patch 21/32] NTB/msi: Convert to msi_on_each_desc()
On 10.12.2021 9.36, Tian, Kevin wrote:
From: Jason Gunthorpe
Sent: Friday, December 10, 2021 4:59 AM

On Thu, Dec 09, 2021 at 09:32:42PM +0100, Thomas Gleixner wrote:
On Thu, Dec 09 2021 at 12:21, Jason Gunthorpe wrote:
On Thu, Dec 09, 2021 at 09:37:06AM +0100, Thomas Gleixner wrote:

If we keep the MSI emulation in the hypervisor then MSI != IMS. The
MSI code needs to write an addr/data pair compatible with the emulation
and the IMS code needs to write an addr/data pair from the
hypercall. Seems like this scenario is best avoided!

From this perspective I haven't connected how virtual interrupt
remapping helps in the guest? Is this a way to provide the hypercall
I'm imagining above?

That was my thought to avoid having different mechanisms.

The address/data pair is computed in two places:

  1) Activation of an interrupt
  2) Affinity setting on an interrupt

Both configure the IRTE when interrupt remapping is in place.

In both cases a vector is allocated in the vector domain and based on
the resulting target APIC / vector number pair the IRTE is
(re)configured.

So putting the hypercall into the vIRTE update is the obvious
place. Both activation and affinity setting can fail and propagate an
error code down to the originating caller.

Hmm?

Okay, I think I get it. Would be nice to have someone from intel
familiar with the vIOMMU protocols and qemu code remark what the
hypervisor side can look like.

There is a bit more work here, we'd have to change VFIO to somehow
entirely disconnect the kernel IRQ logic from the MSI table and
directly pass control of it to the guest after the hypervisor IOMMU IR
secures it. ie directly mmap the msi-x table into the guest

It's supported already:

/*
 * The MSIX mappable capability informs that MSIX data of a BAR can be mmapped
 * which allows direct access to non-MSIX registers which happened to be within
 * the same system page.
 *
 * Even though the userspace gets direct access to the MSIX data, the existing
 * VFIO_DEVICE_SET_IRQS interface must still be used for MSIX configuration.
 */
#define VFIO_REGION_INFO_CAP_MSIX_MAPPABLE 3

IIRC this was introduced for PPC when a device has MSI-X in the same BAR as
other MMIO registers. Trapping MSI-X leads to performance downgrade on
accesses to adjacent registers. MSI-X can be mapped by userspace because
PPC already uses a hypercall mechanism for interrupt. Though unclear about
the detail it sounds a similar usage as proposed here.

Thanks
Kevin

I see VFIO_REGION_INFO_CAP_MSIX_MAPPABLE is always set so if msix table
is in its own bar, qemu never traps/emulates the access. On the other
hand, qemu is said to depend on emulating masking. So how is this
supposed to work, in case the table is not in the config bar?

Thanks,
Mika
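The flow quoted in this message (activation and affinity setting both end in an IRTE update, and the hypercall would sit in that update so failures propagate to the caller) can be illustrated with a small userspace model. Everything here is hypothetical: the `sketch_` names are invented for illustration, and the function pointer merely stands in for whatever hypercall a hypervisor might offer.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative guest-side interrupt state; not a kernel structure. */
struct sketch_irq {
	unsigned int apic;	/* current target APIC */
	unsigned int vector;	/* current vector on that APIC */
	bool active;
};

/* Stand-in for "ask the hypervisor to program the IRTE"; 0 on success. */
typedef int (*sketch_irte_commit_fn)(unsigned int apic, unsigned int vector);

/* Mock hypervisor that always accepts the update. */
int sketch_commit_ok(unsigned int apic, unsigned int vector)
{
	(void)apic;
	(void)vector;
	return 0;
}

/* Mock hypervisor that always rejects the update. */
int sketch_commit_busy(unsigned int apic, unsigned int vector)
{
	(void)apic;
	(void)vector;
	return -EBUSY;
}

/* Single choke point: both activation and affinity changes end up here. */
int sketch_irte_update(struct sketch_irq *irq, unsigned int apic,
		       unsigned int vector, sketch_irte_commit_fn commit)
{
	int ret;

	assert(irq != NULL && commit != NULL);

	ret = commit(apic, vector);
	if (ret)
		return ret;	/* the previous target stays in effect */

	irq->apic = apic;
	irq->vector = vector;
	return 0;
}

/* Activation: first IRTE programming; the error goes back to the caller. */
int sketch_irq_activate(struct sketch_irq *irq, unsigned int apic,
			unsigned int vector, sketch_irte_commit_fn commit)
{
	int ret = sketch_irte_update(irq, apic, vector, commit);

	if (!ret)
		irq->active = true;
	return ret;
}

/* Affinity change: same path, only valid once the interrupt is active. */
int sketch_irq_set_affinity(struct sketch_irq *irq, unsigned int apic,
			    unsigned int vector, sketch_irte_commit_fn commit)
{
	if (!irq->active)
		return -EINVAL;
	return sketch_irte_update(irq, apic, vector, commit);
}
```

Because there is a single update function, a rejected hypercall leaves the old APIC/vector pair in place and the error reaches whoever requested the activation or affinity change, which is the property the quoted explanation relies on.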