[PATCH v3 00/10] Support using MSI interrupts in ntb_transport
This is version 3 of the MSI interrupts for ntb_transport patchset. I've addressed the feedback so far, rebased on the latest kernel, and would like this to be considered for merging this cycle. The only outstanding issue I know of is that it still will not work with IDT hardware, but ntb_transport doesn't work with IDT hardware and there is still no sensible common infrastructure to support ntb_peer_mw_set_trans(). Thus, I decline to consider that complication in this patchset. However, I'll be happy to review work that adds this feature in the future.

Also, as the port number and resource index stuff is a bit complicated, I made a quick out-of-tree test fixture to ensure it's correct[1]. As an exercise, I also wrote some test code[2] using the upcoming KUnit feature.

Logan

[1] https://repl.it/repls/ExcitingPresentFile
[2] https://github.com/sbates130272/linux-p2pmem/commits/ntb_kunit

--

Changes in v3:

* Rebased onto v5.1-rc1 (dropped the first two patches as they have
  been merged, and cleaned up some minor conflicts in the PCI tree)
* Added a new patch (#3) to calculate logical port numbers that are
  port numbers from 0 to (number of ports - 1). This is then used in
  ntb_peer_resource_idx() to fix the issues brought up by Serge.
* Fixed missing __iomem and iowrite calls (as noticed by Serge)
* Added patch 10 which describes ntb_msi_test in the documentation
  file (as requested by Serge)
* A couple other minor nits and documentation fixes

--

Changes in v2:

* Cleaned up the changes in intel_irq_remapping.c to make them less
  confusing and added a comment. (Per discussion with Jacob and Joerg)
* Fixed a nit from Bjorn and collected his Ack
* Added a Kconfig dependency on CONFIG_PCI_MSI for CONFIG_NTB_MSI,
  as the kbuild robot hit a random config that didn't build without it.
* Worked in a callback for when the MSI descriptor changes so that the
  clients can resend the new address and data values to the peer.
On my test system this was never necessary, but there may be other platforms where it can occur. I tested this by hacking in a path to rewrite the MSI descriptor when I change the CPU affinity of an IRQ. There's a bit of uncertainty over the latency of the change, but without hardware on which this can actually occur, we can't test it. This was the result of a discussion with Dave.

--

This patch series adds optional support for using MSI interrupts instead of NTB doorbells in ntb_transport. This is desirable because doorbells on current hardware are quite slow, so switching to MSI interrupts provides a significant performance gain. On switchtec hardware, a simple apples-to-apples comparison shows ntb_netdev/iperf numbers going from 3.88Gb/s to 14.1Gb/s when switching to MSI interrupts.

To do this, a couple of changes are required outside of the NTB tree:

1) The IOMMU must know to accept MSI requests from aliased bus numbers,
   since NTB hardware typically sends proxied request IDs through
   additional requester IDs. The first patch in this series adds support
   for the Intel IOMMU. A quirk to add these aliases for switchtec
   hardware was already accepted. See commit ad281ecf1c7d ("PCI: Add DMA
   alias quirk for Microsemi Switchtec NTB") for a description of NTB
   proxy IDs and why this is necessary.

2) NTB transport (and other clients) may often need more MSI interrupts
   than the NTB hardware actually advertises support for. However, since
   these interrupts will not be triggered by the hardware but through an
   NTB memory window, the hardware does not actually need to support
   them or know about them. Therefore we add the concept of "virtual"
   MSI interrupts, which are allocated just like any other MSI interrupt
   but are not programmed into the hardware's MSI table. This is done in
   Patch 2 and then made use of in Patch 3.

The remaining patches in this series add a library for dealing with MSI interrupts, a test client and finally support in ntb_transport.
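The "virtual MSI" trick in (2) is easier to see with a toy model: an MSI is ultimately just a posted 32-bit write of a magic data value to a per-vector address, so a peer that has the local APIC/MSI region mapped through a memory window can raise the interrupt with an ordinary memory write, no doorbell required. A minimal Python sketch of the idea (all names and values here are made up for illustration; this is not the kernel API):

```python
# Toy model only: the "interrupt" is delivered the moment the peer
# writes the descriptor's data value at the descriptor's address offset.
msi_region = {}          # stands in for the locally mapped APIC/MSI region


def peer_trigger(desc):
    """What the remote peer does through the NTB memory window."""
    msi_region[desc["addr_offset"]] = desc["data"]


# Hypothetical address/data pair the local side would hand to the peer
# via scratchpad registers.
desc = {"addr_offset": 0x40, "data": 0x4242}
peer_trigger(desc)
assert msi_region[0x40] == 0x4242      # "interrupt" delivered
```

Since the write targets a memory window rather than the real MSI-X table, the vector behind it can be one the hardware never advertised, which is exactly why the virtual vectors of patch 1 are safe to use here.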
The series is based off of v5.1-rc1 plus the patches in ntb-next. A git repo is available here:

https://github.com/sbates130272/linux-p2pmem/ ntb_transport_msi_v3

Thanks,

Logan

--

Logan Gunthorpe (10):
  PCI/MSI: Support allocating virtual MSI interrupts
  PCI/switchtec: Add module parameter to request more interrupts
  NTB: Introduce helper functions to calculate logical port number
  NTB: Introduce functions to calculate multi-port resource index
  NTB: Rename ntb.c to support multiple source files in the module
  NTB: Introduce MSI library
  NTB: Introduce NTB MSI Test Client
  NTB: Add ntb_msi_test support to ntb_test
  NTB: Add MSI interrupt support to ntb_transport
  NTB: Describe the ntb_msi_test client in the documentation.

 Documentation/ntb.txt         |  27 ++
 drivers/ntb/Kconfig           |  11 +
 drivers/ntb/Makefile          |   3 +
 drivers/ntb/{ntb.c => core.c} |   0
 drivers/ntb/msi.c             | 415 +++
[PATCH v3 10/10] NTB: Describe the ntb_msi_test client in the documentation.
Add a blurb in Documentation/ntb.txt to describe the ntb_msi_test tool's debugfs interface. Similar to the (out-of-date) ntb_tool description.

Signed-off-by: Logan Gunthorpe
---
 Documentation/ntb.txt | 27 +++
 1 file changed, 27 insertions(+)

diff --git a/Documentation/ntb.txt b/Documentation/ntb.txt
index a043854d28df..802a539f1592 100644
--- a/Documentation/ntb.txt
+++ b/Documentation/ntb.txt
@@ -194,6 +194,33 @@ Debugfs Files:
        This file is used to read and write peer scratchpads. See
        *spad* for details.

+NTB MSI Test Client (ntb\_msi\_test)
+-----------------------------------
+
+The MSI test client serves to test and debug the MSI library which
+allows for passing MSI interrupts across NTB memory windows. The
+test client is interacted with through the debugfs filesystem:
+
+* *debugfs*/ntb\_tool/*hw*/
+       A directory in debugfs will be created for each
+       NTB device probed by the tool. This directory is shortened to
+       *hw* below.
+* *hw*/port
+       This file describes the local port number
+* *hw*/irq*_occurrences
+       One occurrences file exists for each interrupt and, when read,
+       returns the number of times the interrupt has been triggered.
+* *hw*/peer*/port
+       This file describes the port number for each peer
+* *hw*/peer*/count
+       This file describes the number of interrupts that can be
+       triggered on each peer
+* *hw*/peer*/trigger
+       Writing an interrupt number (any number less than the value
+       specified in count) will trigger the interrupt on the
+       specified peer. That peer's interrupt's occurrences file
+       should be incremented.
+
 NTB Hardware Drivers
--
2.20.1
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH v3 09/10] NTB: Add MSI interrupt support to ntb_transport
Introduce the module parameter 'use_msi' which, when set, uses MSI interrupts instead of doorbells for each queue pair (QP). The parameter is only available if NTB MSI support is configured into the kernel. We also require there to be more than one memory window (MW) so that an extra one is available to forward the APIC region.

To use MSIs, we request one interrupt per QP and forward the MSI address and data to the peer using scratchpad registers (SPADs) above the MW SPADs. (If there are not enough SPADs the MSI interrupt will not be used.)

Once registered, we simply use ntb_msi_peer_trigger and the receiving ISR simply queues up the rxc_db_work for the queue.

This addition can significantly improve performance of ntb_transport. In a simple, untuned, apples-to-apples comparison using ntb_netdev and iperf with switchtec hardware, I see 3.88Gb/s without MSI interrupts and 14.1Gb/s with, which is a more than 3x improvement.

Signed-off-by: Logan Gunthorpe
Cc: Jon Mason
Cc: Dave Jiang
Cc: Allen Hubbe
---
 drivers/ntb/ntb_transport.c | 169 +++-
 1 file changed, 168 insertions(+), 1 deletion(-)

diff --git a/drivers/ntb/ntb_transport.c b/drivers/ntb/ntb_transport.c
index d4f39ba1d976..f1cf0942cb99 100644
--- a/drivers/ntb/ntb_transport.c
+++ b/drivers/ntb/ntb_transport.c
@@ -93,6 +93,12 @@ static bool use_dma;
 module_param(use_dma, bool, 0644);
 MODULE_PARM_DESC(use_dma, "Use DMA engine to perform large data copy");

+static bool use_msi;
+#ifdef CONFIG_NTB_MSI
+module_param(use_msi, bool, 0644);
+MODULE_PARM_DESC(use_msi, "Use MSI interrupts instead of doorbells");
+#endif
+
 static struct dentry *nt_debugfs_dir;

 /* Only two-ports NTB devices are supported */
@@ -188,6 +194,11 @@ struct ntb_transport_qp {
        u64 tx_err_no_buf;
        u64 tx_memcpy;
        u64 tx_async;
+
+       bool use_msi;
+       int msi_irq;
+       struct ntb_msi_desc msi_desc;
+       struct ntb_msi_desc peer_msi_desc;
 };

 struct ntb_transport_mw {
@@ -221,6 +232,10 @@ struct ntb_transport_ctx {
        u64 qp_bitmap;
        u64 qp_bitmap_free;

+       bool use_msi;
+       unsigned int msi_spad_offset;
+       u64 msi_db_mask;
+
        bool link_is_up;
        struct delayed_work link_work;
        struct work_struct link_cleanup;
@@ -667,6 +682,114 @@ static int ntb_transport_setup_qp_mw(struct ntb_transport_ctx *nt,
        return 0;
 }

+static irqreturn_t ntb_transport_isr(int irq, void *dev)
+{
+       struct ntb_transport_qp *qp = dev;
+
+       tasklet_schedule(&qp->rxc_db_work);
+
+       return IRQ_HANDLED;
+}
+
+static void ntb_transport_setup_qp_peer_msi(struct ntb_transport_ctx *nt,
+                                           unsigned int qp_num)
+{
+       struct ntb_transport_qp *qp = &nt->qp_vec[qp_num];
+       int spad = qp_num * 2 + nt->msi_spad_offset;
+
+       if (!nt->use_msi)
+               return;
+
+       if (spad >= ntb_spad_count(nt->ndev))
+               return;
+
+       qp->peer_msi_desc.addr_offset =
+               ntb_peer_spad_read(qp->ndev, PIDX, spad);
+       qp->peer_msi_desc.data =
+               ntb_peer_spad_read(qp->ndev, PIDX, spad + 1);
+
+       dev_dbg(&qp->ndev->pdev->dev, "QP%d Peer MSI addr=%x data=%x\n",
+               qp_num, qp->peer_msi_desc.addr_offset, qp->peer_msi_desc.data);
+
+       if (qp->peer_msi_desc.addr_offset) {
+               qp->use_msi = true;
+               dev_info(&qp->ndev->pdev->dev,
+                        "Using MSI interrupts for QP%d\n", qp_num);
+       }
+}
+
+static void ntb_transport_setup_qp_msi(struct ntb_transport_ctx *nt,
+                                      unsigned int qp_num)
+{
+       struct ntb_transport_qp *qp = &nt->qp_vec[qp_num];
+       int spad = qp_num * 2 + nt->msi_spad_offset;
+       int rc;
+
+       if (!nt->use_msi)
+               return;
+
+       if (spad >= ntb_spad_count(nt->ndev)) {
+               dev_warn_once(&qp->ndev->pdev->dev,
+                             "Not enough SPADS to use MSI interrupts\n");
+               return;
+       }
+
+       ntb_spad_write(qp->ndev, spad, 0);
+       ntb_spad_write(qp->ndev, spad + 1, 0);
+
+       if (!qp->msi_irq) {
+               qp->msi_irq = ntbm_msi_request_irq(qp->ndev, ntb_transport_isr,
+                                                  KBUILD_MODNAME, qp,
+                                                  &qp->msi_desc);
+               if (qp->msi_irq < 0) {
+                       dev_warn(&qp->ndev->pdev->dev,
+                                "Unable to allocate MSI interrupt for qp%d\n",
+                                qp_num);
+                       return;
+               }
+       }
+
+       rc = ntb_spad_write(qp->ndev, spad, qp->msi_desc.addr_offset);
+       if (rc)
+               goto err_free_interrupt;
+
+       rc = ntb_spad_write(qp->ndev, spad + 1, qp->msi_desc.data);
+       if (rc)
+               goto err_free_interrupt;
+
+       dev_dbg(&qp->ndev->pdev->dev, "QP%d MSI %d addr=%x data=%x\n",
+               qp_num,
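For reference, the scratchpad layout used by the two setup functions above can be sketched in a few lines of Python (a model of the diff's logic, not kernel code: each QP consumes two SPADs past msi_spad_offset, one for the MSI address offset and one for the data value; note it mirrors the diff in checking only the first of the two SPADs against the count):

```python
def qp_msi_spads(qp_num, msi_spad_offset, spad_count):
    """Return the (addr_offset, data) SPAD indices for a QP, or None."""
    spad = qp_num * 2 + msi_spad_offset
    if spad >= spad_count:
        return None              # not enough SPADs; fall back to doorbells
    return (spad, spad + 1)


# Hypothetical device with 16 SPADs and the MSI SPADs starting at 6:
assert qp_msi_spads(0, 6, 16) == (6, 7)
assert qp_msi_spads(5, 6, 16) is None    # spad 16 is out of range
```

This also makes the commit message's constraint concrete: with two SPADs per QP, running out of SPADs simply disables MSI for the higher-numbered queue pairs rather than failing the link.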
[PATCH v3 07/10] NTB: Introduce NTB MSI Test Client
Introduce a tool to test NTB MSI interrupts similar to the other NTB test tools. This tool creates a debugfs directory for each NTB device with the following files:

	port
	irqX_occurrences
	peerX/port
	peerX/count
	peerX/trigger

The 'port' file tells the user the local port number and the 'occurrences' files tell the number of local interrupts that have been received for each interrupt.

For each peer, the 'port' file and the 'count' file tell you the peer's port number and number of interrupts respectively. Writing the interrupt number to the 'trigger' file triggers the interrupt handler for the peer, which should increment its corresponding 'occurrences' file. The 'ready' file indicates if a peer is ready; writing to this file blocks until it is ready.

The module parameter num_irqs can be used to set the number of local interrupts. By default this is 4. This is only limited by the number of unused MSI interrupts registered by the hardware (this will require support of the hardware driver) and there must be at least 2*num_irqs + 1 SPAD registers available.

Signed-off-by: Logan Gunthorpe
Cc: Jon Mason
Cc: Dave Jiang
Cc: Allen Hubbe
---
 drivers/ntb/test/Kconfig        |   9 +
 drivers/ntb/test/Makefile       |   1 +
 drivers/ntb/test/ntb_msi_test.c | 433 
 3 files changed, 443 insertions(+)
 create mode 100644 drivers/ntb/test/ntb_msi_test.c

diff --git a/drivers/ntb/test/Kconfig b/drivers/ntb/test/Kconfig
index a5d0eda44438..a3f3e2638935 100644
--- a/drivers/ntb/test/Kconfig
+++ b/drivers/ntb/test/Kconfig
@@ -25,3 +25,12 @@ config NTB_PERF
          to and from the window without additional software interaction.

          If unsure, say N.
+
+config NTB_MSI_TEST
+       tristate "NTB MSI Test Client"
+       depends on NTB_MSI
+       help
+         This tool demonstrates the use of the NTB MSI library to
+         send MSI interrupts between peers.
+
+         If unsure, say N.
diff --git a/drivers/ntb/test/Makefile b/drivers/ntb/test/Makefile
index 9e77e0b761c2..d2895ca995e4 100644
--- a/drivers/ntb/test/Makefile
+++ b/drivers/ntb/test/Makefile
@@ -1,3 +1,4 @@
 obj-$(CONFIG_NTB_PINGPONG) += ntb_pingpong.o
 obj-$(CONFIG_NTB_TOOL) += ntb_tool.o
 obj-$(CONFIG_NTB_PERF) += ntb_perf.o
+obj-$(CONFIG_NTB_MSI_TEST) += ntb_msi_test.o
diff --git a/drivers/ntb/test/ntb_msi_test.c b/drivers/ntb/test/ntb_msi_test.c
new file mode 100644
index ..99d826ed9c34
--- /dev/null
+++ b/drivers/ntb/test/ntb_msi_test.c
@@ -0,0 +1,433 @@
+// SPDX-License-Identifier: (GPL-2.0 OR BSD-3-Clause)
+
+#include
+#include
+#include
+#include
+#include
+#include
+
+MODULE_LICENSE("Dual BSD/GPL");
+MODULE_VERSION("0.1");
+MODULE_AUTHOR("Logan Gunthorpe ");
+MODULE_DESCRIPTION("Test for sending MSI interrupts over an NTB memory window");
+
+static int num_irqs = 4;
+module_param(num_irqs, int, 0644);
+MODULE_PARM_DESC(num_irqs, "number of irqs to use");
+
+struct ntb_msit_ctx {
+       struct ntb_dev *ntb;
+       struct dentry *dbgfs_dir;
+       struct work_struct setup_work;
+
+       struct ntb_msit_isr_ctx {
+               int irq_idx;
+               int irq_num;
+               int occurrences;
+               struct ntb_msit_ctx *nm;
+               struct ntb_msi_desc desc;
+       } *isr_ctx;
+
+       struct ntb_msit_peer {
+               struct ntb_msit_ctx *nm;
+               int pidx;
+               int num_irqs;
+               struct completion init_comp;
+               struct ntb_msi_desc *msi_desc;
+       } peers[];
+};
+
+static struct dentry *ntb_msit_dbgfs_topdir;
+
+static irqreturn_t ntb_msit_isr(int irq, void *dev)
+{
+       struct ntb_msit_isr_ctx *isr_ctx = dev;
+       struct ntb_msit_ctx *nm = isr_ctx->nm;
+
+       dev_dbg(&nm->ntb->dev, "Interrupt Occurred: %d",
+               isr_ctx->irq_idx);
+
+       isr_ctx->occurrences++;
+
+       return IRQ_HANDLED;
+}
+
+static void ntb_msit_setup_work(struct work_struct *work)
+{
+       struct ntb_msit_ctx *nm = container_of(work, struct ntb_msit_ctx,
+                                              setup_work);
+       int irq_count = 0;
+       int irq;
+       int ret;
+       uintptr_t i;
+
+       ret = ntb_msi_setup_mws(nm->ntb);
+       if (ret) {
+               dev_err(&nm->ntb->dev, "Unable to setup MSI windows: %d\n",
+                       ret);
+               return;
+       }
+
+       for (i = 0; i < num_irqs; i++) {
+               nm->isr_ctx[i].irq_idx = i;
+               nm->isr_ctx[i].nm = nm;
+
+               if (!nm->isr_ctx[i].irq_num) {
+                       irq = ntbm_msi_request_irq(nm->ntb, ntb_msit_isr,
+                                                  KBUILD_MODNAME,
+                                                  &nm->isr_ctx[i],
+                                                  &nm->isr_ctx[i].desc);
+                       if (irq < 0)
+                               break;
+
+                       nm->isr_ctx[i].irq_num = irq;
+               }
+
+               ret =
[PATCH v3 08/10] NTB: Add ntb_msi_test support to ntb_test
When the ntb_msi_test module is available, the test code will trigger each of the interrupts and ensure the corresponding occurrences file gets incremented.

Signed-off-by: Logan Gunthorpe
Cc: Jon Mason
Cc: Dave Jiang
Cc: Allen Hubbe
---
 tools/testing/selftests/ntb/ntb_test.sh | 54 -
 1 file changed, 52 insertions(+), 2 deletions(-)

diff --git a/tools/testing/selftests/ntb/ntb_test.sh b/tools/testing/selftests/ntb/ntb_test.sh
index 17ca36403d04..1a10b8f67727 100755
--- a/tools/testing/selftests/ntb/ntb_test.sh
+++ b/tools/testing/selftests/ntb/ntb_test.sh
@@ -87,10 +87,10 @@ set -e

 function _modprobe()
 {
-       modprobe "$@"
+       modprobe "$@" || return 1

        if [[ "$REMOTE_HOST" != "" ]]; then
-               ssh "$REMOTE_HOST" modprobe "$@"
+               ssh "$REMOTE_HOST" modprobe "$@" || return 1
        fi
 }

@@ -451,6 +451,30 @@ function pingpong_test()
        echo "  Passed"
 }

+function msi_test()
+{
+       LOC=$1
+       REM=$2
+
+       write_file 1 $LOC/ready
+
+       echo "Running MSI interrupt tests on: $(subdirname $LOC) / $(subdirname $REM)"
+
+       CNT=$(read_file "$LOC/count")
+       for ((i = 0; i < $CNT; i++)); do
+               START=$(read_file $REM/../irq${i}_occurrences)
+               write_file $i $LOC/trigger
+               END=$(read_file $REM/../irq${i}_occurrences)
+
+               if [[ $(($END - $START)) != 1 ]]; then
+                       echo "MSI did not trigger the interrupt on the remote side!" >&2
+                       exit 1
+               fi
+       done
+
+       echo "  Passed"
+}
+
 function perf_test()
 {
        USE_DMA=$1
@@ -529,6 +553,29 @@ function ntb_pingpong_tests()
        _modprobe -r ntb_pingpong
 }

+function ntb_msi_tests()
+{
+       LOCAL_MSI="$DEBUGFS/ntb_msi_test/$LOCAL_DEV"
+       REMOTE_MSI="$REMOTE_HOST:$DEBUGFS/ntb_msi_test/$REMOTE_DEV"
+
+       echo "Starting ntb_msi_test tests..."
+
+       if ! _modprobe ntb_msi_test 2> /dev/null; then
+               echo "  Not doing MSI tests seeing the module is not available."
+               return
+       fi
+
+       port_test $LOCAL_MSI $REMOTE_MSI
+
+       LOCAL_PEER="$LOCAL_MSI/peer$LOCAL_PIDX"
+       REMOTE_PEER="$REMOTE_MSI/peer$REMOTE_PIDX"
+
+       msi_test $LOCAL_PEER $REMOTE_PEER
+       msi_test $REMOTE_PEER $LOCAL_PEER
+
+       _modprobe -r ntb_msi_test
+}
+
 function ntb_perf_tests()
 {
        LOCAL_PERF="$DEBUGFS/ntb_perf/$LOCAL_DEV"
@@ -550,6 +597,7 @@ function cleanup()
        _modprobe -r ntb_perf 2> /dev/null
        _modprobe -r ntb_pingpong 2> /dev/null
        _modprobe -r ntb_transport 2> /dev/null
+       _modprobe -r ntb_msi_test 2> /dev/null
        set -e
 }

@@ -586,5 +634,7 @@ ntb_tool_tests
 echo
 ntb_pingpong_tests
 echo
+ntb_msi_tests
+echo
 ntb_perf_tests
 echo
--
2.20.1
[PATCH v3 05/10] NTB: Rename ntb.c to support multiple source files in the module
The kbuild system does not support having multiple source files in a module if one of those source files has the same name as the module. Therefore, we must rename ntb.c to core.c, while the module remains ntb.ko. This is similar to the way the nvme modules are structured.

Signed-off-by: Logan Gunthorpe
Cc: Jon Mason
Cc: Dave Jiang
Cc: Allen Hubbe
---
 drivers/ntb/Makefile          | 2 ++
 drivers/ntb/{ntb.c => core.c} | 0
 2 files changed, 2 insertions(+)
 rename drivers/ntb/{ntb.c => core.c} (100%)

diff --git a/drivers/ntb/Makefile b/drivers/ntb/Makefile
index 1921dec1949d..537226f8e78d 100644
--- a/drivers/ntb/Makefile
+++ b/drivers/ntb/Makefile
@@ -1,2 +1,4 @@
 obj-$(CONFIG_NTB) += ntb.o hw/ test/
 obj-$(CONFIG_NTB_TRANSPORT) += ntb_transport.o
+
+ntb-y := core.o
diff --git a/drivers/ntb/ntb.c b/drivers/ntb/core.c
similarity index 100%
rename from drivers/ntb/ntb.c
rename to drivers/ntb/core.c
--
2.20.1
[PATCH v3 02/10] PCI/switchtec: Add module parameter to request more interrupts
Seeing that we want to use more interrupts in the NTB MSI code, we need to be able to allocate more (sometimes virtual) interrupts in the switchtec driver. Therefore, add a module parameter to request the allocation of additional interrupts.

This puts virtually no limit on the number of MSI interrupts available to NTB clients.

Signed-off-by: Logan Gunthorpe
Cc: Bjorn Helgaas
---
 drivers/pci/switch/switchtec.c | 12 ++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/drivers/pci/switch/switchtec.c b/drivers/pci/switch/switchtec.c
index e22766c79fe9..8b1db78197d9 100644
--- a/drivers/pci/switch/switchtec.c
+++ b/drivers/pci/switch/switchtec.c
@@ -30,6 +30,10 @@ module_param(use_dma_mrpc, bool, 0644);
 MODULE_PARM_DESC(use_dma_mrpc,
                 "Enable the use of the DMA MRPC feature");

+static int nirqs = 32;
+module_param(nirqs, int, 0644);
+MODULE_PARM_DESC(nirqs, "number of interrupts to allocate (more may be useful for NTB applications)");
+
 static dev_t switchtec_devt;
 static DEFINE_IDA(switchtec_minor_ida);

@@ -1247,8 +1251,12 @@ static int switchtec_init_isr(struct switchtec_dev *stdev)
        int dma_mrpc_irq;
        int rc;

-       nvecs = pci_alloc_irq_vectors(stdev->pdev, 1, 4,
-                                     PCI_IRQ_MSIX | PCI_IRQ_MSI);
+       if (nirqs < 4)
+               nirqs = 4;
+
+       nvecs = pci_alloc_irq_vectors(stdev->pdev, 1, nirqs,
+                                     PCI_IRQ_MSIX | PCI_IRQ_MSI |
+                                     PCI_IRQ_VIRTUAL);
        if (nvecs < 0)
                return nvecs;
--
2.20.1
[PATCH v3 01/10] PCI/MSI: Support allocating virtual MSI interrupts
For NTB devices, we want to be able to trigger MSI interrupts through a memory window. In these cases we may want to use more interrupts than the NTB PCI device has available in its MSI-X table.

We allow for this by creating a new 'virtual' interrupt. These interrupts are allocated as usual but are not programmed into the MSI-X table (as there may not be space for them).

The MSI address and data will then be handled through an NTB MSI library introduced later in this series.

Signed-off-by: Logan Gunthorpe
Acked-by: Bjorn Helgaas
---
 drivers/pci/msi.c   | 54 +
 include/linux/msi.h |  8 +++
 include/linux/pci.h |  9 
 3 files changed, 62 insertions(+), 9 deletions(-)

diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c
index 73986825d221..668bc16ef4d1 100644
--- a/drivers/pci/msi.c
+++ b/drivers/pci/msi.c
@@ -192,6 +192,9 @@ static void msi_mask_irq(struct msi_desc *desc, u32 mask, u32 flag)

 static void __iomem *pci_msix_desc_addr(struct msi_desc *desc)
 {
+       if (desc->msi_attrib.is_virtual)
+               return NULL;
+
        return desc->mask_base +
                desc->msi_attrib.entry_nr * PCI_MSIX_ENTRY_SIZE;
 }

@@ -206,14 +209,19 @@ static void __iomem *pci_msix_desc_addr(struct msi_desc *desc)
 u32 __pci_msix_desc_mask_irq(struct msi_desc *desc, u32 flag)
 {
        u32 mask_bits = desc->masked;
+       void __iomem *desc_addr;

        if (pci_msi_ignore_mask)
                return 0;

+       desc_addr = pci_msix_desc_addr(desc);
+       if (!desc_addr)
+               return 0;
+
        mask_bits &= ~PCI_MSIX_ENTRY_CTRL_MASKBIT;
        if (flag)
                mask_bits |= PCI_MSIX_ENTRY_CTRL_MASKBIT;
-       writel(mask_bits, pci_msix_desc_addr(desc) + PCI_MSIX_ENTRY_VECTOR_CTRL);
+
+       writel(mask_bits, desc_addr + PCI_MSIX_ENTRY_VECTOR_CTRL);

        return mask_bits;
 }
@@ -273,6 +281,11 @@ void __pci_read_msi_msg(struct msi_desc *entry, struct msi_msg *msg)
        if (entry->msi_attrib.is_msix) {
                void __iomem *base = pci_msix_desc_addr(entry);

+               if (!base) {
+                       WARN_ON(1);
+                       return;
+               }
+
                msg->address_lo = readl(base + PCI_MSIX_ENTRY_LOWER_ADDR);
                msg->address_hi = readl(base + PCI_MSIX_ENTRY_UPPER_ADDR);
                msg->data = readl(base + PCI_MSIX_ENTRY_DATA);
@@ -303,6 +316,9 @@ void __pci_write_msi_msg(struct msi_desc *entry, struct msi_msg *msg)
        } else if (entry->msi_attrib.is_msix) {
                void __iomem *base = pci_msix_desc_addr(entry);

+               if (!base)
+                       goto skip;
+
                writel(msg->address_lo, base + PCI_MSIX_ENTRY_LOWER_ADDR);
                writel(msg->address_hi, base + PCI_MSIX_ENTRY_UPPER_ADDR);
                writel(msg->data, base + PCI_MSIX_ENTRY_DATA);
@@ -327,7 +343,13 @@ void __pci_write_msi_msg(struct msi_desc *entry, struct msi_msg *msg)
                                              msg->data);
                }
        }
+
+skip:
        entry->msg = *msg;
+
+       if (entry->write_msi_msg)
+               entry->write_msi_msg(entry, entry->write_msi_msg_data);
+
 }

 void pci_write_msi_msg(unsigned int irq, struct msi_msg *msg)
@@ -550,6 +572,7 @@ msi_setup_entry(struct pci_dev *dev, int nvec, struct irq_affinity *affd)

        entry->msi_attrib.is_msix       = 0;
        entry->msi_attrib.is_64         = !!(control & PCI_MSI_FLAGS_64BIT);
+       entry->msi_attrib.is_virtual    = 0;
        entry->msi_attrib.entry_nr      = 0;
        entry->msi_attrib.maskbit       = !!(control & PCI_MSI_FLAGS_MASKBIT);
        entry->msi_attrib.default_irq   = dev->irq;     /* Save IOAPIC IRQ */
@@ -674,6 +697,7 @@ static int msix_setup_entries(struct pci_dev *dev, void __iomem *base,
        struct irq_affinity_desc *curmsk, *masks = NULL;
        struct msi_desc *entry;
        int ret, i;
+       int vec_count = pci_msix_vec_count(dev);

        if (affd)
                masks = irq_create_affinity_masks(nvec, affd);
@@ -696,6 +720,10 @@ static int msix_setup_entries(struct pci_dev *dev, void __iomem *base,
                        entry->msi_attrib.entry_nr = entries[i].entry;
                else
                        entry->msi_attrib.entry_nr = i;
+
+               entry->msi_attrib.is_virtual =
+                       entry->msi_attrib.entry_nr >= vec_count;
+
                entry->msi_attrib.default_irq   = dev->irq;
                entry->mask_base                = base;

@@ -714,12 +742,19 @@ static void msix_program_entries(struct pci_dev *dev,
 {
        struct msi_desc *entry;
        int i = 0;
+       void __iomem *desc_addr;

        for_each_pci_msi_entry(entry, dev) {
                if (entries)
                        entries[i++].vector = entry->irq;
-               entry->masked = readl(pci_msix_desc_addr(entry) +
-                                     PCI_MSIX_ENTRY_VECTOR_CTRL);
+
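The core rule of the msix_setup_entries() hunk above is compact enough to state as a one-line predicate: entries numbered at or beyond the hardware's advertised vector count get no MSI-X table slot and are flagged virtual, which is why pci_msix_desc_addr() returns NULL for them. A hypothetical Python rendering, for illustration only:

```python
def is_virtual(entry_nr, hw_vec_count):
    """Mirror of: entry->msi_attrib.is_virtual = entry_nr >= vec_count."""
    return entry_nr >= hw_vec_count


# A device whose MSI-X capability advertises 4 vectors:
assert not is_virtual(3, 4)   # fits in the hardware MSI-X table
assert is_virtual(4, 4)       # first virtual vector, no table slot
```

Every table access in the patch then becomes a NULL check on the descriptor address, so the virtual vectors flow through the normal allocation paths without ever touching device registers.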
[PATCH v3 03/10] NTB: Introduce helper functions to calculate logical port number
This patch introduces the "Logical Port Number" which is similar to the "Port Number" in that it enumerates the ports in the system.

The original (or physical) "Port Number" can be any number used by the hardware to uniquely identify a port in the system. The "Logical Port Number" enumerates all ports in the system from 0 to the number of ports minus one.

For example, a system with 5 ports might have the following port numbers, which would be enumerated thus:

	Port Number:         1  2  5  7  116
	Logical Port Number: 0  1  2  3  4

The logical port number is useful when calculating which resources to use for which peers. So we thus define two helper functions: ntb_logical_port_number() and ntb_peer_logical_port_number(), which provide the "Logical Port Number" for the local port and any peer respectively.

Signed-off-by: Logan Gunthorpe
Cc: Jon Mason
Cc: Dave Jiang
Cc: Allen Hubbe
Cc: Serge Semin
---
 include/linux/ntb.h | 53 -
 1 file changed, 52 insertions(+), 1 deletion(-)

diff --git a/include/linux/ntb.h b/include/linux/ntb.h
index 56a92e3ae3ae..52060b2130b6 100644
--- a/include/linux/ntb.h
+++ b/include/linux/ntb.h
@@ -616,7 +616,6 @@ static inline int ntb_port_number(struct ntb_dev *ntb)
        return ntb->ops->port_number(ntb);
 }
-
 /**
  * ntb_peer_port_count() - get the number of peer device ports
  * @ntb:       NTB device context.
@@ -653,6 +652,58 @@ static inline int ntb_peer_port_number(struct ntb_dev *ntb, int pidx)
        return ntb->ops->peer_port_number(ntb, pidx);
 }

+/**
+ * ntb_logical_port_number() - get the logical port number of the local port
+ * @ntb:       NTB device context.
+ *
+ * The Logical Port Number is defined to be a unique number for each
+ * port starting from zero through to the number of ports minus one.
+ * This is in contrast to the Port Number where each port can be assigned
+ * any unique physical number by the hardware.
+ *
+ * The logical port number is useful for calculating the resource indexes
+ * used by peers.
+ *
+ * Return: the logical port number or negative value indicating an error
+ */
+static inline int ntb_logical_port_number(struct ntb_dev *ntb)
+{
+       int lport = ntb_port_number(ntb);
+       int pidx;
+
+       if (lport < 0)
+               return lport;
+
+       for (pidx = 0; pidx < ntb_peer_port_count(ntb); pidx++)
+               if (lport <= ntb_peer_port_number(ntb, pidx))
+                       return pidx;
+
+       return pidx;
+}
+
+/**
+ * ntb_peer_logical_port_number() - get the logical peer port by given index
+ * @ntb:       NTB device context.
+ * @pidx:      Peer port index.
+ *
+ * The Logical Port Number is defined to be a unique number for each
+ * port starting from zero through to the number of ports minus one.
+ * This is in contrast to the Port Number where each port can be assigned
+ * any unique physical number by the hardware.
+ *
+ * The logical port number is useful for calculating the resource indexes
+ * used by peers.
+ *
+ * Return: the peer's logical port number or negative value indicating an error
+ */
+static inline int ntb_peer_logical_port_number(struct ntb_dev *ntb, int pidx)
+{
+       if (ntb_peer_port_number(ntb, pidx) < ntb_port_number(ntb))
+               return pidx;
+       else
+               return pidx + 1;
+}
+
 /**
  * ntb_peer_port_idx() - get the peer device port index by given port number
  * @ntb:       NTB device context.
--
2.20.1
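The two helpers above can be modelled in userspace and checked against the example in the commit message. A Python sketch (assumes, as the kernel code effectively does, that peer ports are enumerated in increasing order with no duplicate port numbers):

```python
def logical_port_number(local_port, peer_ports):
    """Logical number of the local port: how many ports rank below it."""
    return sum(1 for p in peer_ports if p < local_port)


def peer_logical_port_number(local_port, peer_ports, pidx):
    """Peers below the local port keep their index; the rest shift by one."""
    return pidx if peer_ports[pidx] < local_port else pidx + 1


# Commit-message example: physical ports 1 2 5 7 116 -> logical 0 1 2 3 4.
# Seen from the port numbered 5, whose peers are 1, 2, 7 and 116:
peers = [1, 2, 7, 116]
assert logical_port_number(5, peers) == 2
assert [peer_logical_port_number(5, peers, i) for i in range(4)] == [0, 1, 3, 4]
```

Together the local and peer logical numbers tile 0..(number of ports - 1) exactly once, which is the property ntb_peer_resource_idx() relies on.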
Re: [RFC PATCH] virtio_ring: Use DMA API if guest memory is encrypted
Michael S. Tsirkin writes:

> On Wed, Mar 20, 2019 at 01:13:41PM -0300, Thiago Jung Bauermann wrote:
>> >> Another way of looking at this issue which also explains our reluctance
>> >> is that the only difference between a secure guest and a regular guest
>> >> (at least regarding virtio) is that the former uses swiotlb while the
>> >> latter doesn't.
>> >
>> > But swiotlb is just one implementation. It's a guest internal thing. The
>> > issue is that memory isn't host accessible.
>>
>> From what I understand of the ACCESS_PLATFORM definition, the host will
>> only ever try to access memory addresses that are supplied to it by the
>> guest, so all of the secure guest memory that the host cares about is
>> accessible:
>>
>>     If this feature bit is set to 0, then the device has same access to
>>     memory addresses supplied to it as the driver has. In particular,
>>     the device will always use physical addresses matching addresses
>>     used by the driver (typically meaning physical addresses used by the
>>     CPU) and not translated further, and can access any address supplied
>>     to it by the driver. When clear, this overrides any
>>     platform-specific description of whether device access is limited or
>>     translated in any way, e.g. whether an IOMMU may be present.
>>
>> All of the above is true for POWER guests, whether they are secure
>> guests or not.
>>
>> Or are you saying that a virtio device may want to access memory
>> addresses that weren't supplied to it by the driver?
>
> Your logic would apply to IOMMUs as well. For your mode, there are
> specific encrypted memory regions that driver has access to but device
> does not. that seems to violate the constraint.

Right, if there's a pre-configured 1:1 mapping in the IOMMU such that the device can ignore the IOMMU for all practical purposes, I would indeed say that the logic would apply to IOMMUs as well. :-)

I guess I'm still struggling with the purpose of signalling to the driver that the host may not have access to memory addresses that it will never try to access.

>> >> And from the device's point of view they're
>> >> indistinguishable. It can't tell one guest that is using swiotlb from
>> >> one that isn't. And that implies that secure guest vs regular guest
>> >> isn't a virtio interface issue, it's "guest internal affairs". So
>> >> there's no reason to reflect that in the feature flags.
>> >
>> > So don't. The way not to reflect that in the feature flags is
>> > to set ACCESS_PLATFORM. Then you say *I don't care let platform device*.
>> >
>> > Without ACCESS_PLATFORM
>> > virtio has a very specific opinion about the security of the
>> > device, and that opinion is that device is part of the guest
>> > supervisor security domain.
>>
>> Sorry for being a bit dense, but not sure what "the device is part of
>> the guest supervisor security domain" means. In powerpc-speak,
>> "supervisor" is the operating system, so perhaps that explains my
>> confusion. Are you saying that without ACCESS_PLATFORM, the guest
>> considers the host to be part of the guest operating system's security
>> domain?
>
> I think so. The spec says "device has same access as driver".

Ok, makes sense.

>> If so, does that have any other implication besides "the host
>> can access any address supplied to it by the driver"? If that is the
>> case, perhaps the definition of ACCESS_PLATFORM needs to be amended to
>> include that information, because it's not part of the current
>> definition.
>>
>> >> > But the name "sev_active" makes me scared because at least AMD guys who
>> >> > were doing the sensible thing and setting ACCESS_PLATFORM
>> >>
>> >> My understanding is, AMD guest-platform knows in advance that their
>> >> guest will run in secure mode and hence sets the flag at the time of VM
>> >> instantiation. Unfortunately we don't have that luxury on our platforms.
>> >
>> > Well you do have that luxury. It looks like that there are existing
>> > guests that already acknowledge ACCESS_PLATFORM and you are not happy
>> > with how that path is slow. So you are trying to optimize for
>> > them by clearing ACCESS_PLATFORM and then you have lost ability
>> > to invoke DMA API.
>> >
>> > For example if there was another flag just like ACCESS_PLATFORM
>> > just not yet used by anyone, you would be all fine using that right?
>>
>> Yes, a new flag sounds like a great idea. What about the definition
>> below?
>>
>> VIRTIO_F_ACCESS_PLATFORM_NO_IOMMU This feature has the same meaning as
>>     VIRTIO_F_ACCESS_PLATFORM both when set and when not set, with the
>>     exception that the IOMMU is explicitly defined to be off or bypassed
>>     when accessing memory addresses supplied to the device by the
>>     driver. This flag should be set by the guest if offered, but to
>>     allow for backward-compatibility device implementations allow for it
>>     to be left unset by the guest. It is an error to set both this flag
>>     and VIRTIO_F_ACCESS_PLATFORM.
Re: [PATCH v2 RFC/RFT] dma-contiguous: Get normal pages for single-page allocations
Hi Catalin, Thank you for the review. And I realized that the free() path is missing too. On Tue, Mar 19, 2019 at 02:43:01PM +, Catalin Marinas wrote: > On Tue, Mar 05, 2019 at 10:32:02AM -0800, Nicolin Chen wrote: > > The addresses within a single page are always contiguous, so it's > > not so necessary to always allocate one single page from CMA area. > > Since the CMA area has a limited predefined size of space, it may > > run out of space in heavy use cases, where there might be quite a > > lot CMA pages being allocated for single pages. > > > > However, there is also a concern that a device might care where a > > page comes from -- it might expect the page from CMA area and act > > differently if the page doesn't. > > > > This patch tries to get normal pages for single-page allocations > > unless the device has its own CMA area. This would save resources > > from the CMA area for more CMA allocations. And it'd also reduce > > CMA fragmentations resulted from trivial allocations. > > This is not sufficient. Some architectures/platforms declare limits on > the CMA range so that DMA is possible with all expected devices. For > example, on arm64 we keep the CMA in the lower 4GB of the address range, > though with this patch you only covered the iommu ops allocation. I will follow the way of v1 by adding alloc_page()/free_page() function to those callers who don't have fallback allocations. In this way, archs may use different callbacks to alloc pages. > Do you have any numbers to back this up? You don't seem to address > dma_direct_alloc() either but, as I said above, it's not trivial since > some platforms expect certain physical range for DMA allocations. What's the dma_direct_alloc() here about? Mind elaborating? ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
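The allocation policy under discussion can be summarized in a few lines. This is a minimal stand-alone sketch of the decision only, with a fake device type standing in for the kernel's per-device CMA area check (dev->cma_area); it is not the actual patch code.

```c
#include <stdbool.h>
#include <stddef.h>

struct fake_dev {
	bool has_own_cma;	/* stand-in for dev->cma_area != NULL */
};

static bool dev_has_own_cma(const struct fake_dev *dev)
{
	return dev && dev->has_own_cma;
}

/* Returns true when the allocation should come from the CMA area:
 * multi-page requests need physically contiguous memory, so they stay
 * in CMA; single pages fall back to the normal allocator unless the
 * device declares its own CMA area. */
static bool want_cma_alloc(const struct fake_dev *dev, size_t count)
{
	if (count > 1)
		return true;
	return dev_has_own_cma(dev);
}
```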
Re: [PATCH v6 09/22] vfio: VFIO_IOMMU_BIND/UNBIND_MSI
On Sun, 17 Mar 2019 18:22:19 +0100 Eric Auger wrote: > This patch adds the VFIO_IOMMU_BIND/UNBIND_MSI ioctl which aim > to pass/withdraw the guest MSI binding to/from the host. > > Signed-off-by: Eric Auger > > --- > v3 -> v4: > - add UNBIND > - unwind on BIND error > > v2 -> v3: > - adapt to new proto of bind_guest_msi > - directly use vfio_iommu_for_each_dev > > v1 -> v2: > - s/vfio_iommu_type1_guest_msi_binding/vfio_iommu_type1_bind_guest_msi > --- > drivers/vfio/vfio_iommu_type1.c | 58 + > include/uapi/linux/vfio.h | 29 + > 2 files changed, 87 insertions(+) > > diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c > index 12a40b9db6aa..66513679081b 100644 > --- a/drivers/vfio/vfio_iommu_type1.c > +++ b/drivers/vfio/vfio_iommu_type1.c > @@ -1710,6 +1710,25 @@ static int vfio_cache_inv_fn(struct device *dev, void > *data) > return iommu_cache_invalidate(d, dev, >info); > } > > +static int vfio_bind_msi_fn(struct device *dev, void *data) > +{ > + struct vfio_iommu_type1_bind_msi *ustruct = > + (struct vfio_iommu_type1_bind_msi *)data; > + struct iommu_domain *d = iommu_get_domain_for_dev(dev); > + > + return iommu_bind_guest_msi(d, dev, ustruct->iova, > + ustruct->gpa, ustruct->size); > +} > + > +static int vfio_unbind_msi_fn(struct device *dev, void *data) > +{ > + dma_addr_t *iova = (dma_addr_t *)data; > + struct iommu_domain *d = iommu_get_domain_for_dev(dev); Same as previous, we can encapsulate domain in our own struct to avoid a lookup. > + > + iommu_unbind_guest_msi(d, dev, *iova); Is it strange that iommu-core is exposing these interfaces at a device level if every one of them requires us to walk all the devices? 
Thanks, Alex > + return 0; > +} > + > static long vfio_iommu_type1_ioctl(void *iommu_data, > unsigned int cmd, unsigned long arg) > { > @@ -1814,6 +1833,45 @@ static long vfio_iommu_type1_ioctl(void *iommu_data, > vfio_cache_inv_fn); > mutex_unlock(>lock); > return ret; > + } else if (cmd == VFIO_IOMMU_BIND_MSI) { > + struct vfio_iommu_type1_bind_msi ustruct; > + int ret; > + > + minsz = offsetofend(struct vfio_iommu_type1_bind_msi, > + size); > + > + if (copy_from_user(, (void __user *)arg, minsz)) > + return -EFAULT; > + > + if (ustruct.argsz < minsz || ustruct.flags) > + return -EINVAL; > + > + mutex_lock(>lock); > + ret = vfio_iommu_for_each_dev(iommu, , > + vfio_bind_msi_fn); > + if (ret) > + vfio_iommu_for_each_dev(iommu, , > + vfio_unbind_msi_fn); > + mutex_unlock(>lock); > + return ret; > + } else if (cmd == VFIO_IOMMU_UNBIND_MSI) { > + struct vfio_iommu_type1_unbind_msi ustruct; > + int ret; > + > + minsz = offsetofend(struct vfio_iommu_type1_unbind_msi, > + iova); > + > + if (copy_from_user(, (void __user *)arg, minsz)) > + return -EFAULT; > + > + if (ustruct.argsz < minsz || ustruct.flags) > + return -EINVAL; > + > + mutex_lock(>lock); > + ret = vfio_iommu_for_each_dev(iommu, , > + vfio_unbind_msi_fn); > + mutex_unlock(>lock); > + return ret; > } > > return -ENOTTY; > diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h > index 29f0ef2d805d..6763389b6adc 100644 > --- a/include/uapi/linux/vfio.h > +++ b/include/uapi/linux/vfio.h > @@ -789,6 +789,35 @@ struct vfio_iommu_type1_cache_invalidate { > }; > #define VFIO_IOMMU_CACHE_INVALIDATE _IO(VFIO_TYPE, VFIO_BASE + 24) > > +/** > + * VFIO_IOMMU_BIND_MSI - _IOWR(VFIO_TYPE, VFIO_BASE + 25, > + * struct vfio_iommu_type1_bind_msi) > + * > + * Pass a stage 1 MSI doorbell mapping to the host so that this > + * latter can build a nested stage2 mapping > + */ > +struct vfio_iommu_type1_bind_msi { > + __u32 argsz; > + __u32 flags; > + __u64 iova; > + __u64 gpa; > + __u64 size; > +}; > +#define 
VFIO_IOMMU_BIND_MSI _IO(VFIO_TYPE, VFIO_BASE + 25) > + > +/** > + * VFIO_IOMMU_UNBIND_MSI - _IOWR(VFIO_TYPE, VFIO_BASE + 26, > + * struct vfio_iommu_type1_unbind_msi) > + * > + * Unregister an MSI mapping > + */ > +struct vfio_iommu_type1_unbind_msi { > + __u32 argsz; > + __u32 flags; > + __u64 iova; > +}; > +#define VFIO_IOMMU_UNBIND_MSI _IO(VFIO_TYPE, VFIO_BASE + 26) > + > /* Additional API for SPAPR TCE (Server
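The argsz handshake in the handlers above depends on the minimum struct size computed with offsetofend(). The sketch below re-declares the proposed (never-merged) uapi structs locally with the field layout as posted, so the minsz values userspace must satisfy can be checked; it is an illustration of the convention, not usable kernel headers.

```c
#include <stddef.h>
#include <stdint.h>

/* Local copies of the *proposed* structs from this patch; not mainline. */
struct vfio_iommu_type1_bind_msi {
	uint32_t argsz;
	uint32_t flags;
	uint64_t iova;
	uint64_t gpa;
	uint64_t size;
};

struct vfio_iommu_type1_unbind_msi {
	uint32_t argsz;
	uint32_t flags;
	uint64_t iova;
};

/* Same definition the kernel uses. */
#define offsetofend(TYPE, MEMBER) \
	(offsetof(TYPE, MEMBER) + sizeof(((TYPE *)0)->MEMBER))

/* The handler does copy_from_user(..., minsz) and rejects
 * argsz < minsz, so userspace must set argsz to at least this. */
static size_t bind_msi_minsz(void)
{
	return offsetofend(struct vfio_iommu_type1_bind_msi, size);
}

static size_t unbind_msi_minsz(void)
{
	return offsetofend(struct vfio_iommu_type1_unbind_msi, iova);
}
```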
[PATCH 5/7] arm: use a dummy struct device for ISA DMA use of the DMA API
This gets rid of the last NULL dev argument passed to the DMA API. Signed-off-by: Christoph Hellwig --- arch/arm/kernel/dma-isa.c | 8 +++- arch/arm/mach-rpc/dma.c | 8 +++- 2 files changed, 14 insertions(+), 2 deletions(-) diff --git a/arch/arm/kernel/dma-isa.c b/arch/arm/kernel/dma-isa.c index 84363fe7bad2..10c45cc6b957 100644 --- a/arch/arm/kernel/dma-isa.c +++ b/arch/arm/kernel/dma-isa.c @@ -55,6 +55,12 @@ static int isa_get_dma_residue(unsigned int chan, dma_t *dma) return chan < 4 ? count : (count << 1); } +static struct device isa_dma_dev = { + .init_name = "fallback device", + .coherent_dma_mask = ~(dma_addr_t)0, + .dma_mask = _dma_dev.coherent_dma_mask, +}; + static void isa_enable_dma(unsigned int chan, dma_t *dma) { if (dma->invalid) { @@ -89,7 +95,7 @@ static void isa_enable_dma(unsigned int chan, dma_t *dma) dma->sg = >buf; dma->sgcount = 1; dma->buf.length = dma->count; - dma->buf.dma_address = dma_map_single(NULL, + dma->buf.dma_address = dma_map_single(_dma_dev, dma->addr, dma->count, direction); } diff --git a/arch/arm/mach-rpc/dma.c b/arch/arm/mach-rpc/dma.c index fb48f3141fb4..f2703ca17954 100644 --- a/arch/arm/mach-rpc/dma.c +++ b/arch/arm/mach-rpc/dma.c @@ -151,6 +151,12 @@ static void iomd_free_dma(unsigned int chan, dma_t *dma) free_irq(idma->irq, idma); } +static struct device isa_dma_dev = { + .init_name = "fallback device", + .coherent_dma_mask = ~(dma_addr_t)0, + .dma_mask = _dma_dev.coherent_dma_mask, +}; + static void iomd_enable_dma(unsigned int chan, dma_t *dma) { struct iomd_dma *idma = container_of(dma, struct iomd_dma, dma); @@ -168,7 +174,7 @@ static void iomd_enable_dma(unsigned int chan, dma_t *dma) idma->dma.sg = >dma.buf; idma->dma.sgcount = 1; idma->dma.buf.length = idma->dma.count; - idma->dma.buf.dma_address = dma_map_single(NULL, + idma->dma.buf.dma_address = dma_map_single(_dma_dev, idma->dma.addr, idma->dma.count, idma->dma.dma_mode == DMA_MODE_READ ? 
DMA_FROM_DEVICE : DMA_TO_DEVICE); -- 2.20.1
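The dummy-device pattern the patch introduces boils down to: give the fallback device a name and a "no restriction" DMA mask, and point dma_mask at the coherent mask so both agree. A userspace sketch of that invariant, with stand-in types replacing struct device:

```c
#include <stdint.h>

/* Stand-in for the two mask fields of struct device. */
struct dev_masks {
	uint64_t coherent_dma_mask;
	uint64_t *dma_mask;	/* points at coherent_dma_mask, as in the patch */
};

/* ~(dma_addr_t)0: all address bits allowed for the ISA fallback device. */
static struct dev_masks isa_dma_dev = {
	.coherent_dma_mask = ~(uint64_t)0,
	.dma_mask = &isa_dma_dev.coherent_dma_mask,
};
```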
[PATCH 7/7] x86: remove the x86_dma_fallback_dev hack
Now that we removed support for the NULL device argument in the DMA API, there is no need to cater for that in the x86 code. Signed-off-by: Christoph Hellwig --- arch/x86/include/asm/dma-mapping.h | 10 -- arch/x86/kernel/amd_gart_64.c | 6 -- arch/x86/kernel/pci-dma.c | 20 kernel/dma/mapping.c | 7 --- 4 files changed, 43 deletions(-) diff --git a/arch/x86/include/asm/dma-mapping.h b/arch/x86/include/asm/dma-mapping.h index ce4d176b3d13..6b15a24930e0 100644 --- a/arch/x86/include/asm/dma-mapping.h +++ b/arch/x86/include/asm/dma-mapping.h @@ -13,14 +13,7 @@ #include #include -#ifdef CONFIG_ISA -# define ISA_DMA_BIT_MASK DMA_BIT_MASK(24) -#else -# define ISA_DMA_BIT_MASK DMA_BIT_MASK(32) -#endif - extern int iommu_merge; -extern struct device x86_dma_fallback_dev; extern int panic_on_overflow; extern const struct dma_map_ops *dma_ops; @@ -30,7 +23,4 @@ static inline const struct dma_map_ops *get_arch_dma_ops(struct bus_type *bus) return dma_ops; } -bool arch_dma_alloc_attrs(struct device **dev); -#define arch_dma_alloc_attrs arch_dma_alloc_attrs - #endif diff --git a/arch/x86/kernel/amd_gart_64.c b/arch/x86/kernel/amd_gart_64.c index 2c0aa34af69c..bf7f13ea3c64 100644 --- a/arch/x86/kernel/amd_gart_64.c +++ b/arch/x86/kernel/amd_gart_64.c @@ -233,9 +233,6 @@ static dma_addr_t gart_map_page(struct device *dev, struct page *page, unsigned long bus; phys_addr_t paddr = page_to_phys(page) + offset; - if (!dev) - dev = _dma_fallback_dev; - if (!need_iommu(dev, paddr, size)) return paddr; @@ -392,9 +389,6 @@ static int gart_map_sg(struct device *dev, struct scatterlist *sg, int nents, if (nents == 0) return 0; - if (!dev) - dev = _dma_fallback_dev; - out = 0; start = 0; start_sg= sg; diff --git a/arch/x86/kernel/pci-dma.c b/arch/x86/kernel/pci-dma.c index d460998ae828..dcd272dbd0a9 100644 --- a/arch/x86/kernel/pci-dma.c +++ b/arch/x86/kernel/pci-dma.c @@ -51,14 +51,6 @@ int iommu_pass_through __read_mostly; extern struct iommu_table_entry __iommu_table[], __iommu_table_end[]; 
-/* Dummy device used for NULL arguments (normally ISA). */ -struct device x86_dma_fallback_dev = { - .init_name = "fallback device", - .coherent_dma_mask = ISA_DMA_BIT_MASK, - .dma_mask = _dma_fallback_dev.coherent_dma_mask, -}; -EXPORT_SYMBOL(x86_dma_fallback_dev); - void __init pci_iommu_alloc(void) { struct iommu_table_entry *p; @@ -77,18 +69,6 @@ void __init pci_iommu_alloc(void) } } -bool arch_dma_alloc_attrs(struct device **dev) -{ - if (!*dev) - *dev = _dma_fallback_dev; - - if (!is_device_dma_capable(*dev)) - return false; - return true; - -} -EXPORT_SYMBOL(arch_dma_alloc_attrs); - /* * See for the iommu kernel * parameter documentation. diff --git a/kernel/dma/mapping.c b/kernel/dma/mapping.c index c000906348c9..685a53f2a793 100644 --- a/kernel/dma/mapping.c +++ b/kernel/dma/mapping.c @@ -238,10 +238,6 @@ u64 dma_get_required_mask(struct device *dev) } EXPORT_SYMBOL_GPL(dma_get_required_mask); -#ifndef arch_dma_alloc_attrs -#define arch_dma_alloc_attrs(dev) (true) -#endif - void *dma_alloc_attrs(struct device *dev, size_t size, dma_addr_t *dma_handle, gfp_t flag, unsigned long attrs) { @@ -256,9 +252,6 @@ void *dma_alloc_attrs(struct device *dev, size_t size, dma_addr_t *dma_handle, /* let the implementation decide on the zone to allocate from: */ flag &= ~(__GFP_DMA | __GFP_DMA32 | __GFP_HIGHMEM); - if (!arch_dma_alloc_attrs()) - return NULL; - if (dma_is_direct(ops)) cpu_addr = dma_direct_alloc(dev, size, dma_handle, flag, attrs); else if (ops->alloc) -- 2.20.1 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH 6/7] dma-mapping: remove leftover NULL device support
Most dma_map_ops implementations already had some issues with a NULL device, or did simply crash if one was fed to them. Now that we have cleaned up all the obvious offenders we can stop to pretend we support this mode. Signed-off-by: Christoph Hellwig --- Documentation/DMA-API-HOWTO.txt | 13 ++--- include/linux/dma-mapping.h | 6 +++--- kernel/dma/direct.c | 2 +- 3 files changed, 10 insertions(+), 11 deletions(-) diff --git a/Documentation/DMA-API-HOWTO.txt b/Documentation/DMA-API-HOWTO.txt index 1a721d0f35c8..0e98cf7466dd 100644 --- a/Documentation/DMA-API-HOWTO.txt +++ b/Documentation/DMA-API-HOWTO.txt @@ -365,13 +365,12 @@ __get_free_pages() (but takes size instead of a page order). If your driver needs regions sized smaller than a page, you may prefer using the dma_pool interface, described below. -The consistent DMA mapping interfaces, for non-NULL dev, will by -default return a DMA address which is 32-bit addressable. Even if the -device indicates (via DMA mask) that it may address the upper 32-bits, -consistent allocation will only return > 32-bit addresses for DMA if -the consistent DMA mask has been explicitly changed via -dma_set_coherent_mask(). This is true of the dma_pool interface as -well. +The consistent DMA mapping interfaces, will by default return a DMA address +which is 32-bit addressable. Even if the device indicates (via the DMA mask) +that it may address the upper 32-bits, consistent allocation will only +return > 32-bit addresses for DMA if the consistent DMA mask has been +explicitly changed via dma_set_coherent_mask(). This is true of the +dma_pool interface as well. 
dma_alloc_coherent() returns two values: the virtual address which you can use to access it from the CPU and dma_handle which you pass to the diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h index 75e60be91e5f..6309a721394b 100644 --- a/include/linux/dma-mapping.h +++ b/include/linux/dma-mapping.h @@ -267,9 +267,9 @@ size_t dma_direct_max_mapping_size(struct device *dev); static inline const struct dma_map_ops *get_dma_ops(struct device *dev) { - if (dev && dev->dma_ops) + if (dev->dma_ops) return dev->dma_ops; - return get_arch_dma_ops(dev ? dev->bus : NULL); + return get_arch_dma_ops(dev->bus); } static inline void set_dma_ops(struct device *dev, @@ -650,7 +650,7 @@ static inline void dma_free_coherent(struct device *dev, size_t size, static inline u64 dma_get_mask(struct device *dev) { - if (dev && dev->dma_mask && *dev->dma_mask) + if (dev->dma_mask && *dev->dma_mask) return *dev->dma_mask; return DMA_BIT_MASK(32); } diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c index fcdb23e8d2fc..2c2772e9702a 100644 --- a/kernel/dma/direct.c +++ b/kernel/dma/direct.c @@ -311,7 +311,7 @@ static inline bool dma_direct_possible(struct device *dev, dma_addr_t dma_addr, size_t size) { return swiotlb_force != SWIOTLB_FORCE && - (!dev || dma_capable(dev, dma_addr, size)); + dma_capable(dev, dma_addr, size); } dma_addr_t dma_direct_map_page(struct device *dev, struct page *page, -- 2.20.1 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
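The reason a NULL dev cannot be papered over is visible in the two helpers this patch touches: both ops selection and the DMA mask hang off the device. A stand-alone sketch of the post-series behavior, using stub types in place of struct device and struct dma_map_ops (the real functions dereference dev unconditionally after this change):

```c
#include <stdint.h>

struct dma_map_ops_stub { int id; };

struct dev_stub {
	const struct dma_map_ops_stub *dma_ops;
	uint64_t dma_mask;
};

static const struct dma_map_ops_stub arch_ops = { 1 };

/* After this series: dev must be non-NULL, no "dev ? ... : NULL" path. */
static const struct dma_map_ops_stub *get_ops(const struct dev_stub *dev)
{
	if (dev->dma_ops)
		return dev->dma_ops;
	return &arch_ops;	/* get_arch_dma_ops(dev->bus) in the kernel */
}

static uint64_t get_mask(const struct dev_stub *dev)
{
	if (dev->dma_mask)
		return dev->dma_mask;
	return 0xffffffffULL;	/* DMA_BIT_MASK(32) default */
}
```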
[PATCH 1/7] parport_ip32: pass struct device to DMA API functions
The DMA API generally relies on a struct device to work properly, and only barely works without one for legacy reasons. Pass the easily available struct device from the platform_device to remedy this. Signed-off-by: Christoph Hellwig --- drivers/parport/parport_ip32.c | 18 ++ 1 file changed, 10 insertions(+), 8 deletions(-) diff --git a/drivers/parport/parport_ip32.c b/drivers/parport/parport_ip32.c index 62873070f988..b7a892791c3e 100644 --- a/drivers/parport/parport_ip32.c +++ b/drivers/parport/parport_ip32.c @@ -568,6 +568,7 @@ static irqreturn_t parport_ip32_merr_interrupt(int irq, void *dev_id) /** * parport_ip32_dma_start - begins a DMA transfer + * @p: partport to work on * @dir: DMA direction: DMA_TO_DEVICE or DMA_FROM_DEVICE * @addr: pointer to data buffer * @count: buffer size @@ -575,8 +576,8 @@ static irqreturn_t parport_ip32_merr_interrupt(int irq, void *dev_id) * Calls to parport_ip32_dma_start() and parport_ip32_dma_stop() must be * correctly balanced. */ -static int parport_ip32_dma_start(enum dma_data_direction dir, - void *addr, size_t count) +static int parport_ip32_dma_start(struct parport *p, + enum dma_data_direction dir, void *addr, size_t count) { unsigned int limit; u64 ctrl; @@ -601,7 +602,7 @@ static int parport_ip32_dma_start(enum dma_data_direction dir, /* Prepare DMA pointers */ parport_ip32_dma.dir = dir; - parport_ip32_dma.buf = dma_map_single(NULL, addr, count, dir); + parport_ip32_dma.buf = dma_map_single(>bus_dev, addr, count, dir); parport_ip32_dma.len = count; parport_ip32_dma.next = parport_ip32_dma.buf; parport_ip32_dma.left = parport_ip32_dma.len; @@ -625,11 +626,12 @@ static int parport_ip32_dma_start(enum dma_data_direction dir, /** * parport_ip32_dma_stop - ends a running DMA transfer + * @p: partport to work on * * Calls to parport_ip32_dma_start() and parport_ip32_dma_stop() must be * correctly balanced. 
*/ -static void parport_ip32_dma_stop(void) +static void parport_ip32_dma_stop(struct parport *p) { u64 ctx_a; u64 ctx_b; @@ -685,8 +687,8 @@ static void parport_ip32_dma_stop(void) enable_irq(MACEISA_PAR_CTXB_IRQ); parport_ip32_dma.irq_on = 1; - dma_unmap_single(NULL, parport_ip32_dma.buf, parport_ip32_dma.len, -parport_ip32_dma.dir); + dma_unmap_single(>bus_dev, parport_ip32_dma.buf, +parport_ip32_dma.len, parport_ip32_dma.dir); } /** @@ -1445,7 +1447,7 @@ static size_t parport_ip32_fifo_write_block_dma(struct parport *p, priv->irq_mode = PARPORT_IP32_IRQ_HERE; - parport_ip32_dma_start(DMA_TO_DEVICE, (void *)buf, len); + parport_ip32_dma_start(p, DMA_TO_DEVICE, (void *)buf, len); reinit_completion(>irq_complete); parport_ip32_frob_econtrol(p, ECR_DMAEN | ECR_SERVINTR, ECR_DMAEN); @@ -1461,7 +1463,7 @@ static size_t parport_ip32_fifo_write_block_dma(struct parport *p, if (ecr & ECR_SERVINTR) break; /* DMA transfer just finished */ } - parport_ip32_dma_stop(); + parport_ip32_dma_stop(p); written = len - parport_ip32_dma_get_residue(); priv->irq_mode = PARPORT_IP32_IRQ_FWD; -- 2.20.1 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
remove NULL struct device support in the DMA API
We still have a few drivers which pass a NULL struct device pointer to DMA API functions, which generally is a bad idea as the API implementations rely on the device not only for ops selection, but also the dma mask and various other attributes, and many implementations have been broken for NULL device support for a while. This series removes the few remaining users that weren't picked up in the last merge window and then removes core support for this "feature". A git tree is also available at: git://git.infradead.org/users/hch/misc.git dma-remove-NULL-dev-support Gitweb: http://git.infradead.org/users/hch/misc.git/shortlog/refs/heads/dma-remove-NULL-dev-support
[PATCH 3/7] gbefb: switch to managed version of the DMA allocator
gbefb uses managed resources, so it should do the same for DMA allocations. Signed-off-by: Christoph Hellwig --- drivers/video/fbdev/gbefb.c | 24 1 file changed, 8 insertions(+), 16 deletions(-) diff --git a/drivers/video/fbdev/gbefb.c b/drivers/video/fbdev/gbefb.c index 1a242b1338e9..3fcb33232ba3 100644 --- a/drivers/video/fbdev/gbefb.c +++ b/drivers/video/fbdev/gbefb.c @@ -1162,9 +1162,9 @@ static int gbefb_probe(struct platform_device *p_dev) } gbe_revision = gbe->ctrlstat & 15; - gbe_tiles.cpu = - dma_alloc_coherent(NULL, GBE_TLB_SIZE * sizeof(uint16_t), - _tiles.dma, GFP_KERNEL); + gbe_tiles.cpu = dmam_alloc_coherent(_dev->dev, + GBE_TLB_SIZE * sizeof(uint16_t), + _tiles.dma, GFP_KERNEL); if (!gbe_tiles.cpu) { printk(KERN_ERR "gbefb: couldn't allocate tiles table\n"); ret = -ENOMEM; @@ -1178,19 +1178,20 @@ static int gbefb_probe(struct platform_device *p_dev) if (!gbe_mem) { printk(KERN_ERR "gbefb: couldn't map framebuffer\n"); ret = -ENOMEM; - goto out_tiles_free; + goto out_release_mem_region; } gbe_dma_addr = 0; } else { /* try to allocate memory with the classical allocator * this has high chance to fail on low memory machines */ - gbe_mem = dma_alloc_wc(NULL, gbe_mem_size, _dma_addr, - GFP_KERNEL); + gbe_mem = dmam_alloc_attrs(_dev->dev, gbe_mem_size, + _dma_addr, GFP_KERNEL, + DMA_ATTR_WRITE_COMBINE); if (!gbe_mem) { printk(KERN_ERR "gbefb: couldn't allocate framebuffer memory\n"); ret = -ENOMEM; - goto out_tiles_free; + goto out_release_mem_region; } gbe_mem_phys = (unsigned long) gbe_dma_addr; @@ -1237,11 +1238,6 @@ static int gbefb_probe(struct platform_device *p_dev) out_gbe_unmap: arch_phys_wc_del(par->wc_cookie); - if (gbe_dma_addr) - dma_free_wc(NULL, gbe_mem_size, gbe_mem, gbe_mem_phys); -out_tiles_free: - dma_free_coherent(NULL, GBE_TLB_SIZE * sizeof(uint16_t), - (void *)gbe_tiles.cpu, gbe_tiles.dma); out_release_mem_region: release_mem_region(GBE_BASE, sizeof(struct sgi_gbe)); out_release_framebuffer: @@ -1258,10 +1254,6 @@ static int 
gbefb_remove(struct platform_device* p_dev) unregister_framebuffer(info); gbe_turn_off(); arch_phys_wc_del(par->wc_cookie); - if (gbe_dma_addr) - dma_free_wc(NULL, gbe_mem_size, gbe_mem, gbe_mem_phys); - dma_free_coherent(NULL, GBE_TLB_SIZE * sizeof(uint16_t), - (void *)gbe_tiles.cpu, gbe_tiles.dma); release_mem_region(GBE_BASE, sizeof(struct sgi_gbe)); gbefb_remove_sysfs(_dev->dev); framebuffer_release(info); -- 2.20.1 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
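The payoff of switching to dmam_alloc_coherent() is that the error and remove paths above can simply drop their dma_free_* calls. A toy model of the devres idea (allocations recorded against the device, released automatically in reverse order at teardown) — stand-in types only, nothing here is the real devres implementation:

```c
#include <stdlib.h>

#define MAX_RES 8

struct devres_dev {
	void *res[MAX_RES];
	int nres;
};

/* dmam_alloc_coherent() analogue: allocate and track for auto-release. */
static void *devm_alloc(struct devres_dev *dev, size_t size)
{
	void *p;

	if (dev->nres >= MAX_RES)
		return NULL;
	p = calloc(1, size);
	if (p)
		dev->res[dev->nres++] = p;
	return p;
}

/* Runs at device teardown; frees in reverse allocation order, as
 * devres does, so no explicit free in probe-error/remove paths. */
static void devres_release_all(struct devres_dev *dev)
{
	while (dev->nres > 0)
		free(dev->res[--dev->nres]);
}
```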
[PATCH 2/7] da8xx-fb: pass struct device to DMA API functions
The DMA API generally relies on a struct device to work properly, and only barely works without one for legacy reasons. Pass the easily available struct device from the platform_device to remedy this. Signed-off-by: Christoph Hellwig --- drivers/video/fbdev/da8xx-fb.c | 13 +++-- 1 file changed, 7 insertions(+), 6 deletions(-) diff --git a/drivers/video/fbdev/da8xx-fb.c b/drivers/video/fbdev/da8xx-fb.c index 43f2a4816860..ec62274b914b 100644 --- a/drivers/video/fbdev/da8xx-fb.c +++ b/drivers/video/fbdev/da8xx-fb.c @@ -1097,9 +1097,9 @@ static int fb_remove(struct platform_device *dev) unregister_framebuffer(info); fb_dealloc_cmap(>cmap); - dma_free_coherent(NULL, PALETTE_SIZE, par->v_palette_base, + dma_free_coherent(par->dev, PALETTE_SIZE, par->v_palette_base, par->p_palette_base); - dma_free_coherent(NULL, par->vram_size, par->vram_virt, + dma_free_coherent(par->dev, par->vram_size, par->vram_virt, par->vram_phys); pm_runtime_put_sync(>dev); pm_runtime_disable(>dev); @@ -1425,7 +1425,7 @@ static int fb_probe(struct platform_device *device) par->vram_size = roundup(par->vram_size/8, ulcm); par->vram_size = par->vram_size * LCD_NUM_BUFFERS; - par->vram_virt = dma_alloc_coherent(NULL, + par->vram_virt = dma_alloc_coherent(par->dev, par->vram_size, >vram_phys, GFP_KERNEL | GFP_DMA); @@ -1446,7 +1446,7 @@ static int fb_probe(struct platform_device *device) da8xx_fb_fix.line_length - 1; /* allocate palette buffer */ - par->v_palette_base = dma_alloc_coherent(NULL, PALETTE_SIZE, + par->v_palette_base = dma_alloc_coherent(par->dev, PALETTE_SIZE, >p_palette_base, GFP_KERNEL | GFP_DMA); if (!par->v_palette_base) { @@ -1532,11 +1532,12 @@ static int fb_probe(struct platform_device *device) fb_dealloc_cmap(_fb_info->cmap); err_release_pl_mem: - dma_free_coherent(NULL, PALETTE_SIZE, par->v_palette_base, + dma_free_coherent(par->dev, PALETTE_SIZE, par->v_palette_base, par->p_palette_base); err_release_fb_mem: - dma_free_coherent(NULL, par->vram_size, par->vram_virt, 
par->vram_phys); + dma_free_coherent(par->dev, par->vram_size, par->vram_virt, + par->vram_phys); err_release_fb: framebuffer_release(da8xx_fb_info); -- 2.20.1
[PATCH 4/7] pxa3xx-gcu: pass struct device to dma_mmap_coherent
Just like we do for all other DMA operations. Signed-off-by: Christoph Hellwig --- drivers/video/fbdev/pxa3xx-gcu.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/drivers/video/fbdev/pxa3xx-gcu.c b/drivers/video/fbdev/pxa3xx-gcu.c index 69cfb337c857..047a2fa4b87e 100644 --- a/drivers/video/fbdev/pxa3xx-gcu.c +++ b/drivers/video/fbdev/pxa3xx-gcu.c @@ -96,6 +96,7 @@ struct pxa3xx_gcu_batch { }; struct pxa3xx_gcu_priv { + struct device*dev; void __iomem *mmio_base; struct clk *clk; struct pxa3xx_gcu_shared *shared; @@ -493,7 +494,7 @@ pxa3xx_gcu_mmap(struct file *file, struct vm_area_struct *vma) if (size != SHARED_SIZE) return -EINVAL; - return dma_mmap_coherent(NULL, vma, + return dma_mmap_coherent(priv->dev, vma, priv->shared, priv->shared_phys, size); case SHARED_SIZE >> PAGE_SHIFT: @@ -670,6 +671,7 @@ static int pxa3xx_gcu_probe(struct platform_device *pdev) platform_set_drvdata(pdev, priv); priv->resource_mem = r; + priv->dev = dev; pxa3xx_gcu_reset(priv); pxa3xx_gcu_init_debug_timer(priv); -- 2.20.1 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v6 08/22] vfio: VFIO_IOMMU_CACHE_INVALIDATE
On Sun, 17 Mar 2019 18:22:18 +0100 Eric Auger wrote: > From: "Liu, Yi L" > > When the guest "owns" the stage 1 translation structures, the host > IOMMU driver has no knowledge of caching structure updates unless > the guest invalidation requests are trapped and passed down to the > host. > > This patch adds the VFIO_IOMMU_CACHE_INVALIDATE ioctl with aims > at propagating guest stage1 IOMMU cache invalidations to the host. > > Signed-off-by: Liu, Yi L > Signed-off-by: Eric Auger > > --- > > v2 -> v3: > - introduce vfio_iommu_for_each_dev back in this patch > > v1 -> v2: > - s/TLB/CACHE > - remove vfio_iommu_task usage > - commit message rewording > --- > drivers/vfio/vfio_iommu_type1.c | 47 + > include/uapi/linux/vfio.h | 13 + > 2 files changed, 60 insertions(+) > > diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c > index 222e9199edbf..12a40b9db6aa 100644 > --- a/drivers/vfio/vfio_iommu_type1.c > +++ b/drivers/vfio/vfio_iommu_type1.c > @@ -113,6 +113,26 @@ struct vfio_regions { > #define IS_IOMMU_CAP_DOMAIN_IN_CONTAINER(iommu) \ > (!list_empty(>domain_list)) > static struct foo { struct iommu_domain *domain; void *data; }; > +/* iommu->lock must be held */ > +static int > +vfio_iommu_for_each_dev(struct vfio_iommu *iommu, void *data, > + int (*fn)(struct device *, void *)) > +{ > + struct vfio_domain *d; > + struct vfio_group *g; > + int ret = 0; struct foo bar = { .data = data }; > + > + list_for_each_entry(d, >domain_list, next) { bar.domain = d->domain; > + list_for_each_entry(g, >group_list, next) { > + ret = iommu_group_for_each_dev(g->iommu_group, > +data, fn); s/data// > + if (ret) > + break; > + } > + } > + return ret; > +} > + > static int put_pfn(unsigned long pfn, int prot); > > /* > @@ -1681,6 +1701,15 @@ vfio_attach_pasid_table(struct vfio_iommu *iommu, > return ret; > } > > +static int vfio_cache_inv_fn(struct device *dev, void *data) > +{ struct foo *bar = data; > + struct vfio_iommu_type1_cache_invalidate *ustruct = > + 
(struct vfio_iommu_type1_cache_invalidate *)data; ... = bar->data; > + struct iommu_domain *d = iommu_get_domain_for_dev(dev); ... = bar->domain; ¯\_(ツ)_/¯ seems more efficient that doing a lookup. > + > + return iommu_cache_invalidate(d, dev, >info); > +} > + > static long vfio_iommu_type1_ioctl(void *iommu_data, > unsigned int cmd, unsigned long arg) > { > @@ -1767,6 +1796,24 @@ static long vfio_iommu_type1_ioctl(void *iommu_data, > } else if (cmd == VFIO_IOMMU_DETACH_PASID_TABLE) { > vfio_detach_pasid_table(iommu); > return 0; > + } else if (cmd == VFIO_IOMMU_CACHE_INVALIDATE) { > + struct vfio_iommu_type1_cache_invalidate ustruct; > + int ret; > + > + minsz = offsetofend(struct vfio_iommu_type1_cache_invalidate, > + info); > + > + if (copy_from_user(, (void __user *)arg, minsz)) > + return -EFAULT; > + > + if (ustruct.argsz < minsz || ustruct.flags) > + return -EINVAL; > + > + mutex_lock(>lock); > + ret = vfio_iommu_for_each_dev(iommu, , > + vfio_cache_inv_fn); Guess what has a version field that never gets checked ;) Thanks, Alex > + mutex_unlock(>lock); > + return ret; > } > > return -ENOTTY; > diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h > index 329d378565d9..29f0ef2d805d 100644 > --- a/include/uapi/linux/vfio.h > +++ b/include/uapi/linux/vfio.h > @@ -776,6 +776,19 @@ struct vfio_iommu_type1_attach_pasid_table { > #define VFIO_IOMMU_ATTACH_PASID_TABLE_IO(VFIO_TYPE, VFIO_BASE + 22) > #define VFIO_IOMMU_DETACH_PASID_TABLE_IO(VFIO_TYPE, VFIO_BASE + 23) > > +/** > + * VFIO_IOMMU_CACHE_INVALIDATE - _IOWR(VFIO_TYPE, VFIO_BASE + 24, > + * struct vfio_iommu_type1_cache_invalidate) > + * > + * Propagate guest IOMMU cache invalidation to the host. 
> + */ > +struct vfio_iommu_type1_cache_invalidate { > + __u32 argsz; > + __u32 flags; > + struct iommu_cache_invalidate_info info; > +}; > +#define VFIO_IOMMU_CACHE_INVALIDATE _IO(VFIO_TYPE, VFIO_BASE + 24) > + > /* Additional API for SPAPR TCE (Server POWERPC) IOMMU */ > > /*
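Alex's review point is that the handler copies in the user struct but never validates info.version. A defensive handler would reject unknown versions up front, along the lines of the sketch below (the struct here is a simplified local stand-in for the proposed, non-mainline uapi layout):

```c
#include <stdint.h>

#define IOMMU_CACHE_INVALIDATE_INFO_VERSION_1 1

struct cache_inv_info_stub {
	uint32_t version;
	uint32_t cache;		/* bitfield of IOMMU_CACHE_INV_TYPE_* */
};

/* The check the posted code is missing: fail early on a version the
 * kernel does not understand instead of misparsing the payload. */
static int check_inv_info(const struct cache_inv_info_stub *info)
{
	if (info->version != IOMMU_CACHE_INVALIDATE_INFO_VERSION_1)
		return -22;	/* -EINVAL */
	return 0;
}
```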
Re: [PATCH v6 07/22] vfio: VFIO_IOMMU_ATTACH/DETACH_PASID_TABLE
On Sun, 17 Mar 2019 18:22:17 +0100 Eric Auger wrote:
> From: "Liu, Yi L"
>
> This patch adds the VFIO_IOMMU_ATTACH/DETACH_PASID_TABLE ioctls,
> which aim to pass/withdraw the virtual IOMMU guest configuration
> to/from the VFIO driver, down to the IOMMU subsystem.
>
> Signed-off-by: Jacob Pan
> Signed-off-by: Liu, Yi L
> Signed-off-by: Eric Auger
>
> ---
> v3 -> v4:
> - restore ATTACH/DETACH
> - add unwind on failure
>
> v2 -> v3:
> - s/BIND_PASID_TABLE/SET_PASID_TABLE
>
> v1 -> v2:
> - s/BIND_GUEST_STAGE/BIND_PASID_TABLE
> - remove the struct device arg
> ---
>  drivers/vfio/vfio_iommu_type1.c | 53 +++++++++++++++++++++++++++++++++
>  include/uapi/linux/vfio.h       | 17 +++++++++
>  2 files changed, 70 insertions(+)
>
> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
> index 73652e21efec..222e9199edbf 100644
> --- a/drivers/vfio/vfio_iommu_type1.c
> +++ b/drivers/vfio/vfio_iommu_type1.c
> @@ -1644,6 +1644,43 @@ static int vfio_domains_have_iommu_cache(struct vfio_iommu *iommu)
>  	return ret;
>  }
>
> +static void
> +vfio_detach_pasid_table(struct vfio_iommu *iommu)
> +{
> +	struct vfio_domain *d;
> +
> +	mutex_lock(&iommu->lock);
> +
> +	list_for_each_entry(d, &iommu->domain_list, next) {
> +		iommu_detach_pasid_table(d->domain);
> +	}
> +	mutex_unlock(&iommu->lock);
> +}
> +
> +static int
> +vfio_attach_pasid_table(struct vfio_iommu *iommu,
> +			struct vfio_iommu_type1_attach_pasid_table *ustruct)
> +{
> +	struct vfio_domain *d;
> +	int ret = 0;
> +
> +	mutex_lock(&iommu->lock);
> +
> +	list_for_each_entry(d, &iommu->domain_list, next) {
> +		ret = iommu_attach_pasid_table(d->domain, &ustruct->config);
> +		if (ret)
> +			goto unwind;
> +	}
> +	goto unlock;
> +unwind:
> +	list_for_each_entry_continue_reverse(d, &iommu->domain_list, next) {
> +		iommu_detach_pasid_table(d->domain);
> +	}
> +unlock:
> +	mutex_unlock(&iommu->lock);
> +	return ret;
> +}
> +
>  static long vfio_iommu_type1_ioctl(void *iommu_data,
>  				   unsigned int cmd, unsigned long arg)
>  {
> @@ -1714,6 +1751,22 @@ static long vfio_iommu_type1_ioctl(void *iommu_data,
>
>  		return copy_to_user((void __user *)arg, &unmap, minsz) ?
>  			-EFAULT : 0;
> +	} else if (cmd == VFIO_IOMMU_ATTACH_PASID_TABLE) {
> +		struct vfio_iommu_type1_attach_pasid_table ustruct;
> +
> +		minsz = offsetofend(struct vfio_iommu_type1_attach_pasid_table,
> +				    config);
> +
> +		if (copy_from_user(&ustruct, (void __user *)arg, minsz))
> +			return -EFAULT;
> +
> +		if (ustruct.argsz < minsz || ustruct.flags)
> +			return -EINVAL;

Who is responsible for validating ustruct.config?
arm_smmu_attach_pasid_table() only looks at the format, ignoring the
version field :-\  In fact, where is struct iommu_pasid_smmuv3 ever used
from the config?  Should the padding fields be forced to zero?  We don't
have flags to incorporate use of them with future extensions, so maybe
we don't care?

> +
> +		return vfio_attach_pasid_table(iommu, &ustruct);
> +	} else if (cmd == VFIO_IOMMU_DETACH_PASID_TABLE) {
> +		vfio_detach_pasid_table(iommu);
> +		return 0;
>  	}
>
>  	return -ENOTTY;
> diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
> index 02bb7ad6e986..329d378565d9 100644
> --- a/include/uapi/linux/vfio.h
> +++ b/include/uapi/linux/vfio.h
> @@ -14,6 +14,7 @@
>
>  #include <linux/types.h>
>  #include <linux/ioctl.h>
> +#include <linux/iommu.h>
>
>  #define VFIO_API_VERSION	0
>
> @@ -759,6 +760,22 @@ struct vfio_iommu_type1_dma_unmap {
>  #define VFIO_IOMMU_ENABLE	_IO(VFIO_TYPE, VFIO_BASE + 15)
>  #define VFIO_IOMMU_DISABLE	_IO(VFIO_TYPE, VFIO_BASE + 16)
>
> +/**
> + * VFIO_IOMMU_ATTACH_PASID_TABLE - _IOWR(VFIO_TYPE, VFIO_BASE + 22,
> + *			struct vfio_iommu_type1_attach_pasid_table)
> + *
> + * Passes the PASID table to the host. Calling ATTACH_PASID_TABLE
> + * while a table is already installed is allowed: it replaces the old
> + * table. DETACH does a comprehensive tear down of the nested mode.
> + */
> +struct vfio_iommu_type1_attach_pasid_table {
> +	__u32	argsz;
> +	__u32	flags;
> +	struct iommu_pasid_table_config config;
> +};
> +#define VFIO_IOMMU_ATTACH_PASID_TABLE	_IO(VFIO_TYPE, VFIO_BASE + 22)
> +#define VFIO_IOMMU_DETACH_PASID_TABLE	_IO(VFIO_TYPE, VFIO_BASE + 23)
> +

DETACH should also be documented, it's not clear from the uapi that it
requires no parameters.  Thanks,

Alex
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v6 05/22] iommu: Introduce cache_invalidate API
On Thu, 21 Mar 2019 15:32:45 +0100 Auger Eric wrote:
> Hi Jean, Jacob,
>
> On 3/21/19 3:13 PM, Jean-Philippe Brucker wrote:
> > On 21/03/2019 13:54, Auger Eric wrote:
> >> Hi Jacob, Jean-Philippe,
> >>
> >> On 3/20/19 5:50 PM, Jean-Philippe Brucker wrote:
> >>> On 20/03/2019 16:37, Jacob Pan wrote:
> >>> [...]
> > +struct iommu_inv_addr_info {
> > +#define IOMMU_INV_ADDR_FLAGS_PASID	(1 << 0)
> > +#define IOMMU_INV_ADDR_FLAGS_ARCHID	(1 << 1)
> > +#define IOMMU_INV_ADDR_FLAGS_LEAF	(1 << 2)
> > +	__u32	flags;
> > +	__u32	archid;
> > +	__u64	pasid;
> > +	__u64	addr;
> > +	__u64	granule_size;
> > +	__u64	nb_granules;
> > +};
> > +
> > +/**
> > + * First level/stage invalidation information
> > + * @cache: bitfield that allows to select which caches to invalidate
> > + * @granularity: defines the lowest granularity used for the invalidation:
> > + *		domain > pasid > addr
> > + *
> > + * Not all the combinations of cache/granularity make sense:
> > + *
> > + *	   type	|   DEV_IOTLB	|     IOTLB	|  PASID cache	|
> > + * granularity	|		|		|		|
> > + * -------------+---------------+---------------+---------------+
> > + * DOMAIN	|	N/A	|	Y	|	Y	|
> > + * PASID	|	Y	|	Y	|	Y	|
> > + * ADDR		|	Y	|	Y	|	N/A	|
> > + */
> > +struct iommu_cache_invalidate_info {
> > +#define IOMMU_CACHE_INVALIDATE_INFO_VERSION_1 1
> > +	__u32	version;
> > +/* IOMMU paging structure cache */
> > +#define IOMMU_CACHE_INV_TYPE_IOTLB	(1 << 0) /* IOMMU IOTLB */
> > +#define IOMMU_CACHE_INV_TYPE_DEV_IOTLB	(1 << 1) /* Device IOTLB */
> > +#define IOMMU_CACHE_INV_TYPE_PASID	(1 << 2) /* PASID cache */
> Just a clarification, this used to be an enum. You do intend to
> issue a single invalidation request on multiple cache types?
> Perhaps for virtio-IOMMU? I only see a single cache type in your
> patch #14. For VT-d we plan to issue one cache type at a time
> for now. So this format works for us.
> >>>
> >>> Yes for virtio-iommu I'd like as little overhead as possible,
> >>> which means a single invalidation message to hit both IOTLB and
> >>> ATC at once, and the ability to specify multiple pages with
> >>> @nb_granules.
> >> The original request/explanation from Jean-Philippe can be found
> >> here: https://lkml.org/lkml/2019/1/28/1497
> >>
> However, if multiple cache types are issued in a single invalidation,
> they must share a single granularity; not all combinations are valid,
> e.g. dev IOTLB does not support domain granularity. Just a reminder,
> not an issue. The driver could filter out invalid combinations.
> >> Sure I will add a comment about this restriction.
> >>>
> >>> Agreed. Even the core could filter out invalid combinations based
> >>> on the table above: IOTLB and domain granularity are N/A.
> >> I don't get this sentence. What about vtd IOTLB domain-selective
> >> invalidation:
> >
> > My mistake: I meant dev-IOTLB and domain granularity are N/A
> Ah OK, no worries.
>
> How do we proceed further with those user APIs? Besides the comment to
> be added above and previous suggestion from Jean ("Invalidations by
> @granularity use field ..."), have we reached a consensus now on:
>
> - attach/detach_pasid_table
> - cache_invalidate
> - fault data and fault report API?
>
These APIs are sufficient for today's VT-d use and leave enough space
for extension, e.g. new fault reasons. I have cherry-picked the above
APIs from your patchset to enable VT-d nested translation. Going
forward, I will reuse these until they get merged.

> If not, please let me know.
>
> Thanks
>
> Eric
> >
> > Thanks,
> > Jean
> >
> >> "
> >> • IOTLB entries caching mappings associated with the specified
> >> domain-id are invalidated.
> >> • Paging-structure-cache entries caching mappings associated with
> >> the specified domain-id are invalidated.
> >> "
> >>
> >> Thanks
> >>
> >> Eric
> >>
> >>>
> >>> Thanks,
> >>> Jean
> >>>
> > +	__u8	cache;
> > +	__u8	granularity;
> > +	__u8	padding[2];
> > +	union {
> > +		__u64	pasid;
> > +		struct iommu_inv_addr_info addr_info;
> > +	};
> > +};
> > +
> > +
> >  #endif /* _UAPI_IOMMU_H */

[Jacob Pan]
Re: [PATCH v6 02/22] iommu: introduce device fault data
On Sun, 17 Mar 2019 18:22:12 +0100 Eric Auger wrote:
> From: Jacob Pan
>
> Device faults detected by IOMMU can be reported outside the IOMMU
> subsystem for further processing. This patch introduces
> a generic device fault data structure.
>
> The fault can be either an unrecoverable fault or a page request,
> also referred to as a recoverable fault.
>
> We only care about non internal faults that are likely to be reported
> to an external subsystem.
>
> Signed-off-by: Jacob Pan
> Signed-off-by: Jean-Philippe Brucker
> Signed-off-by: Liu, Yi L
> Signed-off-by: Ashok Raj
> Signed-off-by: Eric Auger
>
> ---
> v4 -> v5:
> - simplified struct iommu_fault_event comment
> - Moved IOMMU_FAULT_PERM outside of the struct
> - Removed IOMMU_FAULT_PERM_INST
> - s/IOMMU_FAULT_PAGE_REQUEST_PASID_PRESENT/
>   IOMMU_FAULT_PAGE_REQUEST_PASID_VALID
>
> v3 -> v4:
> - use a union containing either an unrecoverable fault or a page
>   request message. Move the device private data in the page request
>   structure. Reshuffle the fields and use flags.
> - move fault perm attributes to the uapi
> - remove a bunch of iommu_fault_reason enum values that were related
>   to internal errors
> ---
>  include/linux/iommu.h      |  44 ++++++++++++
>  include/uapi/linux/iommu.h | 115 +++++++++++++++++++++++++++++++++
>  2 files changed, 159 insertions(+)
>  create mode 100644 include/uapi/linux/iommu.h
>
> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> index ffbbc7e39cee..c6f398f7e6e0 100644
> --- a/include/linux/iommu.h
> +++ b/include/linux/iommu.h
> @@ -25,6 +25,7 @@
>  #include
>  #include
>  #include
> +#include
>
>  #define IOMMU_READ	(1 << 0)
>  #define IOMMU_WRITE	(1 << 1)
> @@ -48,6 +49,7 @@ struct bus_type;
>  struct device;
>  struct iommu_domain;
>  struct notifier_block;
> +struct iommu_fault_event;
>
>  /* iommu fault flags */
>  #define IOMMU_FAULT_READ	0x0
> @@ -55,6 +57,7 @@ struct notifier_block;
>
>  typedef int (*iommu_fault_handler_t)(struct iommu_domain *,
>  			struct device *, unsigned long, int, void *);
> +typedef int (*iommu_dev_fault_handler_t)(struct iommu_fault_event *, void *);
>
>  struct iommu_domain_geometry {
>  	dma_addr_t aperture_start; /* First address that can be mapped */
> @@ -247,6 +250,46 @@ struct iommu_device {
>  	struct device *dev;
>  };
>
> +/**
> + * struct iommu_fault_event - Generic fault event
> + *
> + * Can represent recoverable faults such as a page requests or
> + * unrecoverable faults such as DMA or IRQ remapping faults.
> + *
> + * @fault: fault descriptor
> + * @iommu_private: used by the IOMMU driver for storing fault-specific
> + *		data. Users should not modify this field before
> + *		sending the fault response.
> + */
> +struct iommu_fault_event {
> +	struct iommu_fault fault;
> +	u64 iommu_private;
> +};
> +
> +/**
> + * struct iommu_fault_param - per-device IOMMU fault data
> + * @dev_fault_handler: Callback function to handle IOMMU faults at device level
> + * @data: handler private data
> + *
> + */
> +struct iommu_fault_param {
> +	iommu_dev_fault_handler_t handler;
> +	void *data;
> +};
> +
> +/**
> + * struct iommu_param - collection of per-device IOMMU data
> + *
> + * @fault_param: IOMMU detected device fault reporting data
> + *
> + * TODO: migrate other per device data pointers under iommu_dev_data, e.g.
> + *	struct iommu_group	*iommu_group;
> + *	struct iommu_fwspec	*iommu_fwspec;
> + */
> +struct iommu_param {
> +	struct iommu_fault_param *fault_param;
> +};
> +
>  int  iommu_device_register(struct iommu_device *iommu);
>  void iommu_device_unregister(struct iommu_device *iommu);
>  int  iommu_device_sysfs_add(struct iommu_device *iommu,
> @@ -422,6 +465,7 @@ struct iommu_ops {};
>  struct iommu_group {};
>  struct iommu_fwspec {};
>  struct iommu_device {};
> +struct iommu_fault_param {};
>
>  static inline bool iommu_present(struct bus_type *bus)
>  {
> diff --git a/include/uapi/linux/iommu.h b/include/uapi/linux/iommu.h
> new file mode 100644
> index 000000000000..edcc0dda7993
> --- /dev/null
> +++ b/include/uapi/linux/iommu.h
> @@ -0,0 +1,115 @@
> +/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
> +/*
> + * IOMMU user API definitions
> + */
> +
> +#ifndef _UAPI_IOMMU_H
> +#define _UAPI_IOMMU_H
> +
> +#include <linux/types.h>
> +
> +#define IOMMU_FAULT_PERM_WRITE	(1 << 0) /* write */
> +#define IOMMU_FAULT_PERM_EXEC	(1 << 1) /* exec */
> +#define IOMMU_FAULT_PERM_PRIV	(1 << 2) /* privileged */
> +
> +/* Generic fault types, can be expanded IRQ remapping fault */
> +enum iommu_fault_type {
> +	IOMMU_FAULT_DMA_UNRECOV = 1,	/* unrecoverable fault */
> +	IOMMU_FAULT_PAGE_REQ,		/* page request fault */
> +};
> +
> +enum iommu_fault_reason {
> +	IOMMU_FAULT_REASON_UNKNOWN = 0,
> +
> +	/* Could not access the PASID table (fetch caused external abort) */
> +
Re: [PATCH v6 03/22] iommu: introduce device fault report API
On Sun, 17 Mar 2019 18:22:13 +0100 Eric Auger wrote:
> From: Jacob Pan
>
> Traditionally, device specific faults are detected and handled within
> their own device drivers. When IOMMU is enabled, faults such as DMA
> related transactions are detected by IOMMU. There is no generic
> reporting mechanism to report faults back to the in-kernel device
> driver or the guest OS in case of assigned devices.
>
> This patch introduces a registration API for device specific fault
> handlers. This differs from the existing iommu_set_fault_handler/
> report_iommu_fault infrastructures in several ways:
> - it allows to report more sophisticated fault events (both
>   unrecoverable faults and page request faults) due to the nature
>   of the iommu_fault struct
> - it is device specific and not domain specific.
>
> The current iommu_report_device_fault() implementation only handles
> the "shoot and forget" unrecoverable fault case. Handling of page
> request faults or stalled faults will come later.
>
> Signed-off-by: Jacob Pan
> Signed-off-by: Ashok Raj
> Signed-off-by: Jean-Philippe Brucker
> Signed-off-by: Eric Auger
>
> ---
>
> v4 -> v5:
> - remove stuff related to recoverable faults
> ---
>  drivers/iommu/iommu.c | 134 +++++++++++++++++++++++++++++++++++++++++-
>  include/linux/iommu.h |  36 +++++++++++-
>  2 files changed, 168 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> index 33a982e33716..56d5bf68de53 100644
> --- a/drivers/iommu/iommu.c
> +++ b/drivers/iommu/iommu.c
> @@ -648,6 +648,13 @@ int iommu_group_add_device(struct iommu_group *group, struct device *dev)
>  		goto err_free_name;
>  	}
>
> +	dev->iommu_param = kzalloc(sizeof(*dev->iommu_param), GFP_KERNEL);
> +	if (!dev->iommu_param) {
> +		ret = -ENOMEM;
> +		goto err_free_name;
> +	}
> +	mutex_init(&dev->iommu_param->lock);
> +
>  	kobject_get(group->devices_kobj);
>
>  	dev->iommu_group = group;
> @@ -678,6 +685,7 @@ int iommu_group_add_device(struct iommu_group *group, struct device *dev)
>  	mutex_unlock(&group->mutex);
>  	dev->iommu_group = NULL;
>  	kobject_put(group->devices_kobj);
> +	kfree(dev->iommu_param);
>  err_free_name:
>  	kfree(device->name);
>  err_remove_link:
> @@ -724,7 +732,7 @@ void iommu_group_remove_device(struct device *dev)
>  	sysfs_remove_link(&dev->kobj, "iommu_group");
>
>  	trace_remove_device_from_group(group->id, dev);
> -
> +	kfree(dev->iommu_param);
>  	kfree(device->name);
>  	kfree(device);
>  	dev->iommu_group = NULL;
> @@ -858,6 +866,130 @@ int iommu_group_unregister_notifier(struct iommu_group *group,
>  }
>  EXPORT_SYMBOL_GPL(iommu_group_unregister_notifier);
>
> +/**
> + * iommu_register_device_fault_handler() - Register a device fault handler
> + * @dev: the device
> + * @handler: the fault handler
> + * @data: private data passed as argument to the handler
> + *
> + * When an IOMMU fault event is received, this handler gets called with the
> + * fault event and data as argument.
> + *
> + * Return 0 if the fault handler was installed successfully, or an error.
> + */
> +int iommu_register_device_fault_handler(struct device *dev,
> +					iommu_dev_fault_handler_t handler,
> +					void *data)
> +{
> +	struct iommu_param *param = dev->iommu_param;
> +	int ret = 0;
> +
> +	/*
> +	 * Device iommu_param should have been allocated when device is
> +	 * added to its iommu_group.
> +	 */
> +	if (!param)
> +		return -EINVAL;
> +
> +	mutex_lock(&param->lock);
> +	/* Only allow one fault handler registered for each device */
> +	if (param->fault_param) {
> +		ret = -EBUSY;
> +		goto done_unlock;
> +	}
> +
> +	get_device(dev);
> +	param->fault_param =
> +		kzalloc(sizeof(struct iommu_fault_param), GFP_KERNEL);
> +	if (!param->fault_param) {
> +		put_device(dev);
> +		ret = -ENOMEM;
> +		goto done_unlock;
> +	}
> +	mutex_init(&param->fault_param->lock);
> +	param->fault_param->handler = handler;
> +	param->fault_param->data = data;
> +	INIT_LIST_HEAD(&param->fault_param->faults);
> +
> +done_unlock:
> +	mutex_unlock(&param->lock);
> +
> +	return ret;
> +}
> +EXPORT_SYMBOL_GPL(iommu_register_device_fault_handler);
> +
> +/**
> + * iommu_unregister_device_fault_handler() - Unregister the device fault handler
> + * @dev: the device
> + *
> + * Remove the device fault handler installed with
> + * iommu_register_device_fault_handler().
> + *
> + * Return 0 on success, or an error.
> + */
> +int iommu_unregister_device_fault_handler(struct device *dev)
> +{
> +	struct iommu_param *param = dev->iommu_param;
> +	int ret = 0;
> +
> +	if (!param)
> +		return -EINVAL;
> +
> +	mutex_lock(&param->lock);
> +
> +	if (!param->fault_param)
> +
Re: [PATCH 1/4] ACPI/IORT: Check ATS capability in root complex nodes
On 3/20/2019 1:36 PM, Jean-Philippe Brucker wrote:
>  	err = pci_for_each_dma_alias(to_pci_dev(dev), iort_pci_iommu_init,
>  				     &info);
> +
> +	if (!err && !iort_pci_rc_supports_ats(node))
> +		dev->iommu_fwspec->flags |= IOMMU_FWSPEC_PCI_NO_ATS;

Is it possible to remove the !err check here? ATS support is just a flag
in the IORT table. Does this decision have to rely on having a correct
DMA alias?
Re: [PATCH 3/4] iommu/arm-smmu-v3: Add support for PCI ATS
On 3/20/2019 1:36 PM, Jean-Philippe Brucker wrote:
>  	pos = pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_ATS);
> +	if (!pos)
> +		return -ENOSYS;
> +

You don't need this. pci_enable_ats() already validates it via:

	if (!dev->ats_cap)
		return -EINVAL;