[PATCH v3 00/10] Support using MSI interrupts in ntb_transport

2019-03-21 Thread Logan Gunthorpe
This is version 3 of the MSI interrupts for ntb_transport patchset.

I've addressed the feedback so far and rebased on the latest kernel
and would like this to be considered for merging this cycle.

The only outstanding issue I know of is that it still will not work
with IDT hardware, but ntb_transport doesn't work with IDT hardware
and there is still no sensible common infrastructure to support
ntb_peer_mw_set_trans(). Thus, I decline to consider that complication
in this patchset. However, I'll be happy to review work that adds this
feature in the future.

Also, as the port number and resource index stuff is a bit complicated,
I made a quick out-of-tree test fixture to ensure it's correct[1]. As
an exercise, I also wrote some test code[2] using the upcoming KUnit
feature.

Logan

[1] https://repl.it/repls/ExcitingPresentFile
[2] https://github.com/sbates130272/linux-p2pmem/commits/ntb_kunit

--

Changes in v3:

* Rebased onto v5.1-rc1 (Dropped the first two patches as they have
  been merged, and cleaned up some minor conflicts in the PCI tree)

* Added a new patch (#3) to calculate logical port numbers that
  are port numbers from 0 to (number of ports - 1). This is
  then used in ntb_peer_resource_idx() to fix the issues brought
  up by Serge.

* Fixed missing __iomem and iowrite calls (as noticed by Serge)

* Added patch 10 which describes ntb_msi_test in the documentation
  file (as requested by Serge)

* A couple other minor nits and documentation fixes

--

Changes in v2:

* Cleaned up the changes in intel_irq_remapping.c to make them
  less confusing and add a comment. (Per discussion with Jacob and
  Joerg)

* Fixed a nit from Bjorn and collected his Ack

* Added a Kconfig dependency on CONFIG_PCI_MSI for CONFIG_NTB_MSI
  as the Kbuild robot hit a random config that didn't build
  without it.

* Worked in a callback for when the MSI descriptor changes so that
  the clients can resend the new address and data values to the peer.
  On my test system this was never necessary, but there may be
  other platforms where it can occur. I tested this by hacking
  in a path to rewrite the MSI descriptor when I change the CPU
  affinity of an IRQ. There's a bit of uncertainty over the latency
  of the change, but without hardware on which this can actually
  occur, we can't test it. This was the result of a discussion with Dave.

--

This patch series adds optional support for using MSI interrupts instead
of NTB doorbells in ntb_transport. This is desirable because doorbells on
current hardware are quite slow, so switching to MSI interrupts
provides a significant performance gain. On switchtec hardware, a simple
apples-to-apples comparison shows ntb_netdev/iperf numbers going from
3.88Gb/s to 14.1Gb/s when switching to MSI interrupts.

To do this, a couple changes are required outside of the NTB tree:

1) The IOMMU must know to accept MSI requests from aliased bus numbers,
because NTB hardware typically sends proxied requests using additional
requester IDs. Support for this in the Intel IOMMU was added in a patch
that has since been merged (see the v3 changes above). A quirk to add
these aliases for switchtec hardware was also already accepted. See
commit ad281ecf1c7d ("PCI: Add DMA alias quirk for Microsemi Switchtec
NTB") for a description of NTB proxy IDs and why this is necessary.
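
For reference, a minimal hypothetical sketch of what such an alias
quirk looks like (the vendor/device IDs and devfn range below are
placeholders, not the actual Switchtec values; see the quirk referenced
above for the real thing):

	/* Tell the PCI/IOMMU layers that this NTB function may emit DMA
	 * (and thus MSI) requests with requester IDs other than its own.
	 * The IDs and the devfn range here are made up. */
	static void quirk_example_ntb_dma_alias(struct pci_dev *pdev)
	{
		u8 devfn;

		for (devfn = 0; devfn < 8; devfn++)
			pci_add_dma_alias(pdev, devfn);
	}
	DECLARE_PCI_FIXUP_CLASS_FINAL(0x1234, 0x5678, PCI_CLASS_BRIDGE_OTHER,
				      8, quirk_example_ntb_dma_alias);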

2) NTB transport (and other clients) may often need more MSI interrupts
than the NTB hardware actually advertises support for. However, because
these interrupts will not be triggered by the hardware but through an
NTB memory window, the hardware does not actually need to support them
or know about them. Therefore we add the concept of virtual MSI
interrupts, which are allocated just like any other MSI interrupt but
are not programmed into the hardware's MSI table. This is done in
Patch 1 and then made use of in Patch 2.
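
As a rough sketch (assuming the PCI_IRQ_VIRTUAL flag from patch 1), a
driver can over-allocate like this; any vector beyond what
pci_msix_vec_count() reports comes back flagged as virtual and is
never programmed into the MSI-X table:

	/* Sketch only: request up to 32 vectors even if the device's
	 * MSI-X table is smaller.  The surplus vectors are allocated
	 * as virtual interrupts. */
	static int example_request_virtual_vectors(struct pci_dev *pdev)
	{
		return pci_alloc_irq_vectors(pdev, 1, 32,
					     PCI_IRQ_MSIX | PCI_IRQ_MSI |
					     PCI_IRQ_VIRTUAL);
	}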

The remaining patches in this series add a library for dealing with MSI
interrupts, a test client and finally support in ntb_transport.

The series is based off of v5.1-rc1 plus the patches in ntb-next.
A git repo is available here:

https://github.com/sbates130272/linux-p2pmem/ ntb_transport_msi_v3

Thanks,

Logan

--

Logan Gunthorpe (10):
  PCI/MSI: Support allocating virtual MSI interrupts
  PCI/switchtec: Add module parameter to request more interrupts
  NTB: Introduce helper functions to calculate logical port number
  NTB: Introduce functions to calculate multi-port resource index
  NTB: Rename ntb.c to support multiple source files in the module
  NTB: Introduce MSI library
  NTB: Introduce NTB MSI Test Client
  NTB: Add ntb_msi_test support to ntb_test
  NTB: Add MSI interrupt support to ntb_transport
  NTB: Describe the ntb_msi_test client in the documentation.

 Documentation/ntb.txt   |  27 ++
 drivers/ntb/Kconfig |  11 +
 drivers/ntb/Makefile|   3 +
 drivers/ntb/{ntb.c => core.c}   |   0
 drivers/ntb/msi.c   | 415 +++
 

[PATCH v3 10/10] NTB: Describe the ntb_msi_test client in the documentation.

2019-03-21 Thread Logan Gunthorpe
Add a blurb in Documentation/ntb.txt to describe the ntb_msi_test tool's
debugfs interface, similar to the (out-of-date) ntb_tool description.
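
For illustration, a hypothetical userspace snippet that exercises this
interface (the debugfs path is a placeholder for whatever directory is
created for your device):

	#include <fcntl.h>
	#include <stdio.h>
	#include <unistd.h>

	int main(void)
	{
		/* Placeholder path; substitute your device's directory */
		const char *dir = "/sys/kernel/debug/ntb_msi_test/0000:03:00.1";
		char path[256], buf[32];
		ssize_t n;
		int fd;

		/* Trigger interrupt number 0 on peer 0 */
		snprintf(path, sizeof(path), "%s/peer0/trigger", dir);
		fd = open(path, O_WRONLY);
		if (fd < 0)
			return 1;
		write(fd, "0", 1);
		close(fd);

		/* Read back the local port number */
		snprintf(path, sizeof(path), "%s/port", dir);
		fd = open(path, O_RDONLY);
		if (fd < 0)
			return 1;
		n = read(fd, buf, sizeof(buf) - 1);
		if (n > 0) {
			buf[n] = '\0';
			printf("local port: %s", buf);
		}
		close(fd);
		return 0;
	}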

Signed-off-by: Logan Gunthorpe 
---
 Documentation/ntb.txt | 27 +++
 1 file changed, 27 insertions(+)

diff --git a/Documentation/ntb.txt b/Documentation/ntb.txt
index a043854d28df..802a539f1592 100644
--- a/Documentation/ntb.txt
+++ b/Documentation/ntb.txt
@@ -194,6 +194,33 @@ Debugfs Files:
This file is used to read and write peer scratchpads.  See
*spad* for details.
 
+NTB MSI Test Client (ntb\_msi\_test)
+-----------------------------------
+
+The MSI test client serves to test and debug the MSI library which
+allows for passing MSI interrupts across NTB memory windows. The
+test client is interacted with through the debugfs filesystem:
+
+* *debugfs*/ntb\_tool/*hw*/
+   A directory in debugfs will be created for each
+   NTB device probed by the tool.  This directory is shortened to *hw*
+   below.
+* *hw*/port
+   This file describes the local port number
+* *hw*/irq*_occurrences
+   One occurrences file exists for each interrupt and, when read,
+   returns the number of times the interrupt has been triggered.
+* *hw*/peer*/port
+   This file describes the port number for each peer
+* *hw*/peer*/count
+   This file describes the number of interrupts that can be
+   triggered on each peer
+* *hw*/peer*/trigger
+   Writing an interrupt number (any number less than the value
+   specified in count) will trigger the interrupt on the
+   specified peer. That peer's interrupt's occurrence file
+   should be incremented.
+
 NTB Hardware Drivers
 
 
-- 
2.20.1



[PATCH v3 09/10] NTB: Add MSI interrupt support to ntb_transport

2019-03-21 Thread Logan Gunthorpe
Introduce the module parameter 'use_msi' which, when set, uses
MSI interrupts instead of doorbells for each queue pair (QP). The
parameter is only available if NTB MSI support is configured into
the kernel. We also require there to be more than one memory window
(MW) so that an extra one is available to forward the APIC region.

To use MSIs, we request one interrupt per QP and forward the MSI address
and data to the peer using scratchpad registers (SPADs) above the MW
SPADs. (If there are not enough SPADs, the MSI interrupt will not be used.)

Once registered, we simply use ntb_msi_peer_trigger() and the receiving
ISR queues up the rxc_db_work for the queue.

This addition can significantly improve the performance of ntb_transport.
In a simple, untuned, apples-to-apples comparison using ntb_netdev
and iperf with switchtec hardware, I see 3.88Gb/s without MSI
interrupts and 14.1Gb/s with them, a more than 3x improvement.
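
The gist of the transmit-side change (a simplified sketch, not the
exact hunk from the diff below):

	/* When the peer gave us a valid MSI descriptor, trigger its MSI
	 * through the memory window; otherwise fall back to ringing the
	 * doorbell as before. */
	static void example_notify_peer(struct ntb_transport_qp *qp)
	{
		if (qp->use_msi)
			ntb_msi_peer_trigger(qp->ndev, PIDX, &qp->peer_msi_desc);
		else
			ntb_peer_db_set(qp->ndev, BIT_ULL(qp->qp_num));
	}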

Signed-off-by: Logan Gunthorpe 
Cc: Jon Mason 
Cc: Dave Jiang 
Cc: Allen Hubbe 
---
 drivers/ntb/ntb_transport.c | 169 +++-
 1 file changed, 168 insertions(+), 1 deletion(-)

diff --git a/drivers/ntb/ntb_transport.c b/drivers/ntb/ntb_transport.c
index d4f39ba1d976..f1cf0942cb99 100644
--- a/drivers/ntb/ntb_transport.c
+++ b/drivers/ntb/ntb_transport.c
@@ -93,6 +93,12 @@ static bool use_dma;
 module_param(use_dma, bool, 0644);
 MODULE_PARM_DESC(use_dma, "Use DMA engine to perform large data copy");
 
+static bool use_msi;
+#ifdef CONFIG_NTB_MSI
+module_param(use_msi, bool, 0644);
+MODULE_PARM_DESC(use_msi, "Use MSI interrupts instead of doorbells");
+#endif
+
 static struct dentry *nt_debugfs_dir;
 
 /* Only two-ports NTB devices are supported */
@@ -188,6 +194,11 @@ struct ntb_transport_qp {
u64 tx_err_no_buf;
u64 tx_memcpy;
u64 tx_async;
+
+   bool use_msi;
+   int msi_irq;
+   struct ntb_msi_desc msi_desc;
+   struct ntb_msi_desc peer_msi_desc;
 };
 
 struct ntb_transport_mw {
@@ -221,6 +232,10 @@ struct ntb_transport_ctx {
u64 qp_bitmap;
u64 qp_bitmap_free;
 
+   bool use_msi;
+   unsigned int msi_spad_offset;
+   u64 msi_db_mask;
+
bool link_is_up;
struct delayed_work link_work;
struct work_struct link_cleanup;
@@ -667,6 +682,114 @@ static int ntb_transport_setup_qp_mw(struct ntb_transport_ctx *nt,
return 0;
 }
 
+static irqreturn_t ntb_transport_isr(int irq, void *dev)
+{
+   struct ntb_transport_qp *qp = dev;
+
+   tasklet_schedule(&qp->rxc_db_work);
+
+   return IRQ_HANDLED;
+}
+
+static void ntb_transport_setup_qp_peer_msi(struct ntb_transport_ctx *nt,
+   unsigned int qp_num)
+{
+   struct ntb_transport_qp *qp = &nt->qp_vec[qp_num];
+   int spad = qp_num * 2 + nt->msi_spad_offset;
+
+   if (!nt->use_msi)
+   return;
+
+   if (spad >= ntb_spad_count(nt->ndev))
+   return;
+
+   qp->peer_msi_desc.addr_offset =
+   ntb_peer_spad_read(qp->ndev, PIDX, spad);
+   qp->peer_msi_desc.data =
+   ntb_peer_spad_read(qp->ndev, PIDX, spad + 1);
+
+   dev_dbg(&qp->ndev->pdev->dev, "QP%d Peer MSI addr=%x data=%x\n",
+   qp_num, qp->peer_msi_desc.addr_offset, qp->peer_msi_desc.data);
+
+   if (qp->peer_msi_desc.addr_offset) {
+   qp->use_msi = true;
+   dev_info(&qp->ndev->pdev->dev,
+"Using MSI interrupts for QP%d\n", qp_num);
+   }
+}
+
+static void ntb_transport_setup_qp_msi(struct ntb_transport_ctx *nt,
+  unsigned int qp_num)
+{
+   struct ntb_transport_qp *qp = &nt->qp_vec[qp_num];
+   int spad = qp_num * 2 + nt->msi_spad_offset;
+   int rc;
+
+   if (!nt->use_msi)
+   return;
+
+   if (spad >= ntb_spad_count(nt->ndev)) {
+   dev_warn_once(&qp->ndev->pdev->dev,
+ "Not enough SPADS to use MSI interrupts\n");
+   return;
+   }
+
+   ntb_spad_write(qp->ndev, spad, 0);
+   ntb_spad_write(qp->ndev, spad + 1, 0);
+
+   if (!qp->msi_irq) {
+   qp->msi_irq = ntbm_msi_request_irq(qp->ndev, ntb_transport_isr,
+  KBUILD_MODNAME, qp,
+  &qp->msi_desc);
+   if (qp->msi_irq < 0) {
+   dev_warn(&qp->ndev->pdev->dev,
+"Unable to allocate MSI interrupt for qp%d\n",
+qp_num);
+   return;
+   }
+   }
+
+   rc = ntb_spad_write(qp->ndev, spad, qp->msi_desc.addr_offset);
+   if (rc)
+   goto err_free_interrupt;
+
+   rc = ntb_spad_write(qp->ndev, spad + 1, qp->msi_desc.data);
+   if (rc)
+   goto err_free_interrupt;
+
+   dev_dbg(&qp->ndev->pdev->dev, "QP%d MSI %d addr=%x data=%x\n",
+   qp_num, qp->msi_irq, qp->msi_desc.addr_offset,
+   qp->msi_desc.data);

[PATCH v3 07/10] NTB: Introduce NTB MSI Test Client

2019-03-21 Thread Logan Gunthorpe
Introduce a tool to test NTB MSI interrupts similar to the other
NTB test tools. This tool creates a debugfs directory for each
NTB device with the following files:

port
irqX_occurrences
peerX/port
peerX/count
peerX/trigger

The 'port' file tells the user the local port number and the
'occurrences' files report the number of times each local
interrupt has been received.

For each peer, the 'port' file and the 'count' file tell you the
peer's port number and number of interrupts respectively. Writing
the interrupt number to the 'trigger' file triggers the interrupt
handler for the peer which should increment their corresponding
'occurrences' file. The 'ready' file indicates whether a peer is
ready; writing to this file blocks until it is.

The module parameter num_irqs can be used to set the number of
local interrupts. By default this is 4. It is limited only by
the number of unused MSI interrupts registered by the hardware
(this will require support from the hardware driver), and there
must be at least 2*num_irqs + 1 scratchpad (SPAD) registers
available (e.g. the default of 4 IRQs needs 9 SPADs).

Signed-off-by: Logan Gunthorpe 
Cc: Jon Mason 
Cc: Dave Jiang 
Cc: Allen Hubbe 
---
 drivers/ntb/test/Kconfig|   9 +
 drivers/ntb/test/Makefile   |   1 +
 drivers/ntb/test/ntb_msi_test.c | 433 
 3 files changed, 443 insertions(+)
 create mode 100644 drivers/ntb/test/ntb_msi_test.c

diff --git a/drivers/ntb/test/Kconfig b/drivers/ntb/test/Kconfig
index a5d0eda44438..a3f3e2638935 100644
--- a/drivers/ntb/test/Kconfig
+++ b/drivers/ntb/test/Kconfig
@@ -25,3 +25,12 @@ config NTB_PERF
 to and from the window without additional software interaction.
 
 If unsure, say N.
+
+config NTB_MSI_TEST
+   tristate "NTB MSI Test Client"
+   depends on NTB_MSI
+   help
+ This tool demonstrates the use of the NTB MSI library to
+ send MSI interrupts between peers.
+
+ If unsure, say N.
diff --git a/drivers/ntb/test/Makefile b/drivers/ntb/test/Makefile
index 9e77e0b761c2..d2895ca995e4 100644
--- a/drivers/ntb/test/Makefile
+++ b/drivers/ntb/test/Makefile
@@ -1,3 +1,4 @@
 obj-$(CONFIG_NTB_PINGPONG) += ntb_pingpong.o
 obj-$(CONFIG_NTB_TOOL) += ntb_tool.o
 obj-$(CONFIG_NTB_PERF) += ntb_perf.o
+obj-$(CONFIG_NTB_MSI_TEST) += ntb_msi_test.o
diff --git a/drivers/ntb/test/ntb_msi_test.c b/drivers/ntb/test/ntb_msi_test.c
new file mode 100644
index ..99d826ed9c34
--- /dev/null
+++ b/drivers/ntb/test/ntb_msi_test.c
@@ -0,0 +1,433 @@
+// SPDX-License-Identifier: (GPL-2.0 OR BSD-3-Clause)
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+MODULE_LICENSE("Dual BSD/GPL");
+MODULE_VERSION("0.1");
+MODULE_AUTHOR("Logan Gunthorpe ");
+MODULE_DESCRIPTION("Test for sending MSI interrupts over an NTB memory window");
+
+static int num_irqs = 4;
+module_param(num_irqs, int, 0644);
+MODULE_PARM_DESC(num_irqs, "number of irqs to use");
+
+struct ntb_msit_ctx {
+   struct ntb_dev *ntb;
+   struct dentry *dbgfs_dir;
+   struct work_struct setup_work;
+
+   struct ntb_msit_isr_ctx {
+   int irq_idx;
+   int irq_num;
+   int occurrences;
+   struct ntb_msit_ctx *nm;
+   struct ntb_msi_desc desc;
+   } *isr_ctx;
+
+   struct ntb_msit_peer {
+   struct ntb_msit_ctx *nm;
+   int pidx;
+   int num_irqs;
+   struct completion init_comp;
+   struct ntb_msi_desc *msi_desc;
+   } peers[];
+};
+
+static struct dentry *ntb_msit_dbgfs_topdir;
+
+static irqreturn_t ntb_msit_isr(int irq, void *dev)
+{
+   struct ntb_msit_isr_ctx *isr_ctx = dev;
+   struct ntb_msit_ctx *nm = isr_ctx->nm;
+
+   dev_dbg(&nm->ntb->dev, "Interrupt Occurred: %d",
+   isr_ctx->irq_idx);
+
+   isr_ctx->occurrences++;
+
+   return IRQ_HANDLED;
+}
+
+static void ntb_msit_setup_work(struct work_struct *work)
+{
+   struct ntb_msit_ctx *nm = container_of(work, struct ntb_msit_ctx,
+  setup_work);
+   int irq_count = 0;
+   int irq;
+   int ret;
+   uintptr_t i;
+
+   ret = ntb_msi_setup_mws(nm->ntb);
+   if (ret) {
+   dev_err(&nm->ntb->dev, "Unable to setup MSI windows: %d\n",
+   ret);
+   return;
+   }
+
+   for (i = 0; i < num_irqs; i++) {
+   nm->isr_ctx[i].irq_idx = i;
+   nm->isr_ctx[i].nm = nm;
+
+   if (!nm->isr_ctx[i].irq_num) {
+   irq = ntbm_msi_request_irq(nm->ntb, ntb_msit_isr,
+  KBUILD_MODNAME,
+  &nm->isr_ctx[i],
+  &nm->isr_ctx[i].desc);
+   if (irq < 0)
+   break;
+
+   nm->isr_ctx[i].irq_num = irq;
+   }
+
+   ret = 

[PATCH v3 08/10] NTB: Add ntb_msi_test support to ntb_test

2019-03-21 Thread Logan Gunthorpe
When the ntb_msi_test module is available, the test code will trigger
each of the interrupts and ensure the corresponding occurrences files
get incremented.

Signed-off-by: Logan Gunthorpe 
Cc: Jon Mason 
Cc: Dave Jiang 
Cc: Allen Hubbe 
---
 tools/testing/selftests/ntb/ntb_test.sh | 54 -
 1 file changed, 52 insertions(+), 2 deletions(-)

diff --git a/tools/testing/selftests/ntb/ntb_test.sh b/tools/testing/selftests/ntb/ntb_test.sh
index 17ca36403d04..1a10b8f67727 100755
--- a/tools/testing/selftests/ntb/ntb_test.sh
+++ b/tools/testing/selftests/ntb/ntb_test.sh
@@ -87,10 +87,10 @@ set -e
 
 function _modprobe()
 {
-   modprobe "$@"
+   modprobe "$@" || return 1
 
if [[ "$REMOTE_HOST" != "" ]]; then
-   ssh "$REMOTE_HOST" modprobe "$@"
+   ssh "$REMOTE_HOST" modprobe "$@" || return 1
fi
 }
 
@@ -451,6 +451,30 @@ function pingpong_test()
echo "  Passed"
 }
 
+function msi_test()
+{
+   LOC=$1
+   REM=$2
+
+   write_file 1 $LOC/ready
+
+   echo "Running MSI interrupt tests on: $(subdirname $LOC) / $(subdirname 
$REM)"
+
+   CNT=$(read_file "$LOC/count")
+   for ((i = 0; i < $CNT; i++)); do
+   START=$(read_file $REM/../irq${i}_occurrences)
+   write_file $i $LOC/trigger
+   END=$(read_file $REM/../irq${i}_occurrences)
+
+   if [[ $(($END - $START)) != 1 ]]; then
+   echo "MSI did not trigger the interrupt on the remote 
side!" >&2
+   exit 1
+   fi
+   done
+
+   echo "  Passed"
+}
+
 function perf_test()
 {
USE_DMA=$1
@@ -529,6 +553,29 @@ function ntb_pingpong_tests()
_modprobe -r ntb_pingpong
 }
 
+function ntb_msi_tests()
+{
+   LOCAL_MSI="$DEBUGFS/ntb_msi_test/$LOCAL_DEV"
+   REMOTE_MSI="$REMOTE_HOST:$DEBUGFS/ntb_msi_test/$REMOTE_DEV"
+
+   echo "Starting ntb_msi_test tests..."
+
+   if ! _modprobe ntb_msi_test 2> /dev/null; then
+   echo "  Not doing MSI tests seeing the module is not available."
+   return
+   fi
+
+   port_test $LOCAL_MSI $REMOTE_MSI
+
+   LOCAL_PEER="$LOCAL_MSI/peer$LOCAL_PIDX"
+   REMOTE_PEER="$REMOTE_MSI/peer$REMOTE_PIDX"
+
+   msi_test $LOCAL_PEER $REMOTE_PEER
+   msi_test $REMOTE_PEER $LOCAL_PEER
+
+   _modprobe -r ntb_msi_test
+}
+
 function ntb_perf_tests()
 {
LOCAL_PERF="$DEBUGFS/ntb_perf/$LOCAL_DEV"
@@ -550,6 +597,7 @@ function cleanup()
_modprobe -r ntb_perf 2> /dev/null
_modprobe -r ntb_pingpong 2> /dev/null
_modprobe -r ntb_transport 2> /dev/null
+   _modprobe -r ntb_msi_test 2> /dev/null
set -e
 }
 
@@ -586,5 +634,7 @@ ntb_tool_tests
 echo
 ntb_pingpong_tests
 echo
+ntb_msi_tests
+echo
 ntb_perf_tests
 echo
-- 
2.20.1



[PATCH v3 05/10] NTB: Rename ntb.c to support multiple source files in the module

2019-03-21 Thread Logan Gunthorpe
The kbuild system does not support having multiple source files in
a module if one of those source files has the same name as the module.

Therefore, we must rename ntb.c to core.c, while the module remains
ntb.ko.

This is similar to the way the nvme modules are structured.

Signed-off-by: Logan Gunthorpe 
Cc: Jon Mason 
Cc: Dave Jiang 
Cc: Allen Hubbe 
---
 drivers/ntb/Makefile  | 2 ++
 drivers/ntb/{ntb.c => core.c} | 0
 2 files changed, 2 insertions(+)
 rename drivers/ntb/{ntb.c => core.c} (100%)

diff --git a/drivers/ntb/Makefile b/drivers/ntb/Makefile
index 1921dec1949d..537226f8e78d 100644
--- a/drivers/ntb/Makefile
+++ b/drivers/ntb/Makefile
@@ -1,2 +1,4 @@
 obj-$(CONFIG_NTB) += ntb.o hw/ test/
 obj-$(CONFIG_NTB_TRANSPORT) += ntb_transport.o
+
+ntb-y := core.o
diff --git a/drivers/ntb/ntb.c b/drivers/ntb/core.c
similarity index 100%
rename from drivers/ntb/ntb.c
rename to drivers/ntb/core.c
-- 
2.20.1



[PATCH v3 02/10] PCI/switchtec: Add module parameter to request more interrupts

2019-03-21 Thread Logan Gunthorpe
Because we want to use more interrupts in the NTB MSI code, we need
to be able to allocate more (sometimes virtual) interrupts in the
switchtec driver. Therefore, add a module parameter to request
additional interrupts.

This puts virtually no limit on the number of MSI interrupts available
to NTB clients.

Signed-off-by: Logan Gunthorpe 
Cc: Bjorn Helgaas 
---
 drivers/pci/switch/switchtec.c | 12 ++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/drivers/pci/switch/switchtec.c b/drivers/pci/switch/switchtec.c
index e22766c79fe9..8b1db78197d9 100644
--- a/drivers/pci/switch/switchtec.c
+++ b/drivers/pci/switch/switchtec.c
@@ -30,6 +30,10 @@ module_param(use_dma_mrpc, bool, 0644);
 MODULE_PARM_DESC(use_dma_mrpc,
 "Enable the use of the DMA MRPC feature");
 
+static int nirqs = 32;
+module_param(nirqs, int, 0644);
+MODULE_PARM_DESC(nirqs, "number of interrupts to allocate (more may be useful for NTB applications)");
+
 static dev_t switchtec_devt;
 static DEFINE_IDA(switchtec_minor_ida);
 
@@ -1247,8 +1251,12 @@ static int switchtec_init_isr(struct switchtec_dev *stdev)
int dma_mrpc_irq;
int rc;
 
-   nvecs = pci_alloc_irq_vectors(stdev->pdev, 1, 4,
- PCI_IRQ_MSIX | PCI_IRQ_MSI);
+   if (nirqs < 4)
+   nirqs = 4;
+
+   nvecs = pci_alloc_irq_vectors(stdev->pdev, 1, nirqs,
+ PCI_IRQ_MSIX | PCI_IRQ_MSI |
+ PCI_IRQ_VIRTUAL);
if (nvecs < 0)
return nvecs;
 
-- 
2.20.1



[PATCH v3 01/10] PCI/MSI: Support allocating virtual MSI interrupts

2019-03-21 Thread Logan Gunthorpe
For NTB devices, we want to be able to trigger MSI interrupts
through a memory window. In these cases we may want to use
more interrupts than the NTB PCI device has available in its MSI-X
table.

We allow for this by creating a new 'virtual' interrupt. These
interrupts are allocated as usual but are not programmed into the
MSI-X table (as there may not be space for them).

The MSI address and data will then be handled through an NTB MSI library
introduced later in this series.

Signed-off-by: Logan Gunthorpe 
Acked-by: Bjorn Helgaas 
---
 drivers/pci/msi.c   | 54 +
 include/linux/msi.h |  8 +++
 include/linux/pci.h |  9 
 3 files changed, 62 insertions(+), 9 deletions(-)

diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c
index 73986825d221..668bc16ef4d1 100644
--- a/drivers/pci/msi.c
+++ b/drivers/pci/msi.c
@@ -192,6 +192,9 @@ static void msi_mask_irq(struct msi_desc *desc, u32 mask, u32 flag)
 
 static void __iomem *pci_msix_desc_addr(struct msi_desc *desc)
 {
+   if (desc->msi_attrib.is_virtual)
+   return NULL;
+
return desc->mask_base +
desc->msi_attrib.entry_nr * PCI_MSIX_ENTRY_SIZE;
 }
@@ -206,14 +209,19 @@ static void __iomem *pci_msix_desc_addr(struct msi_desc *desc)
 u32 __pci_msix_desc_mask_irq(struct msi_desc *desc, u32 flag)
 {
u32 mask_bits = desc->masked;
+   void __iomem *desc_addr;
 
if (pci_msi_ignore_mask)
return 0;
+   desc_addr = pci_msix_desc_addr(desc);
+   if (!desc_addr)
+   return 0;
 
mask_bits &= ~PCI_MSIX_ENTRY_CTRL_MASKBIT;
if (flag)
mask_bits |= PCI_MSIX_ENTRY_CTRL_MASKBIT;
-   writel(mask_bits, pci_msix_desc_addr(desc) + PCI_MSIX_ENTRY_VECTOR_CTRL);
+
+   writel(mask_bits, desc_addr + PCI_MSIX_ENTRY_VECTOR_CTRL);
 
return mask_bits;
 }
@@ -273,6 +281,11 @@ void __pci_read_msi_msg(struct msi_desc *entry, struct msi_msg *msg)
if (entry->msi_attrib.is_msix) {
void __iomem *base = pci_msix_desc_addr(entry);
 
+   if (!base) {
+   WARN_ON(1);
+   return;
+   }
+
msg->address_lo = readl(base + PCI_MSIX_ENTRY_LOWER_ADDR);
msg->address_hi = readl(base + PCI_MSIX_ENTRY_UPPER_ADDR);
msg->data = readl(base + PCI_MSIX_ENTRY_DATA);
@@ -303,6 +316,9 @@ void __pci_write_msi_msg(struct msi_desc *entry, struct msi_msg *msg)
} else if (entry->msi_attrib.is_msix) {
void __iomem *base = pci_msix_desc_addr(entry);
 
+   if (!base)
+   goto skip;
+
writel(msg->address_lo, base + PCI_MSIX_ENTRY_LOWER_ADDR);
writel(msg->address_hi, base + PCI_MSIX_ENTRY_UPPER_ADDR);
writel(msg->data, base + PCI_MSIX_ENTRY_DATA);
@@ -327,7 +343,13 @@ void __pci_write_msi_msg(struct msi_desc *entry, struct msi_msg *msg)
  msg->data);
}
}
+
+skip:
entry->msg = *msg;
+
+   if (entry->write_msi_msg)
+   entry->write_msi_msg(entry, entry->write_msi_msg_data);
+
 }
 
 void pci_write_msi_msg(unsigned int irq, struct msi_msg *msg)
@@ -550,6 +572,7 @@ msi_setup_entry(struct pci_dev *dev, int nvec, struct irq_affinity *affd)
 
entry->msi_attrib.is_msix   = 0;
entry->msi_attrib.is_64 = !!(control & PCI_MSI_FLAGS_64BIT);
+   entry->msi_attrib.is_virtual= 0;
entry->msi_attrib.entry_nr  = 0;
entry->msi_attrib.maskbit   = !!(control & PCI_MSI_FLAGS_MASKBIT);
entry->msi_attrib.default_irq   = dev->irq; /* Save IOAPIC IRQ */
@@ -674,6 +697,7 @@ static int msix_setup_entries(struct pci_dev *dev, void __iomem *base,
struct irq_affinity_desc *curmsk, *masks = NULL;
struct msi_desc *entry;
int ret, i;
+   int vec_count = pci_msix_vec_count(dev);
 
if (affd)
masks = irq_create_affinity_masks(nvec, affd);
@@ -696,6 +720,10 @@ static int msix_setup_entries(struct pci_dev *dev, void __iomem *base,
entry->msi_attrib.entry_nr = entries[i].entry;
else
entry->msi_attrib.entry_nr = i;
+
+   entry->msi_attrib.is_virtual =
+   entry->msi_attrib.entry_nr >= vec_count;
+
entry->msi_attrib.default_irq   = dev->irq;
entry->mask_base= base;
 
@@ -714,12 +742,19 @@ static void msix_program_entries(struct pci_dev *dev,
 {
struct msi_desc *entry;
int i = 0;
+   void __iomem *desc_addr;
 
for_each_pci_msi_entry(entry, dev) {
if (entries)
entries[i++].vector = entry->irq;
-   entry->masked = readl(pci_msix_desc_addr(entry) +
-   PCI_MSIX_ENTRY_VECTOR_CTRL);
+

[PATCH v3 03/10] NTB: Introduce helper functions to calculate logical port number

2019-03-21 Thread Logan Gunthorpe
This patch introduces the "Logical Port Number" which is similar to the
"Port Number" in that it enumerates the ports in the system.

The original (or Physical) "Port Number" can be any number used by the
hardware to uniquely identify a port in the system. The "Logical Port
Number" enumerates all ports in the system from 0 to the number of
ports minus one.

For example, a system with 5 ports might have the following port numbers,
which would be enumerated as follows:

Port Number:   1  2  5  7  116
Logical Port Number:   0  1  2  3  4

The logical port number is useful when calculating which resources
to use for which peers. So we thus define two helper functions:
ntb_logical_port_number() and ntb_peer_logical_port_number() which
provide the "Logical Port Number" for the local port and any peer
respectively.
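
Patch 4 builds on these helpers to pick per-peer resource indexes; a
sketch of the idea (simplified here, not necessarily the exact final
code):

	/* Each port skips its own slot, so for N ports every peer pair
	 * agrees on a unique resource index in [0, N - 2]. */
	static inline int example_peer_resource_idx(struct ntb_dev *ntb,
						    int pidx)
	{
		int local_port, peer_port;

		if (pidx >= ntb_peer_port_count(ntb))
			return -EINVAL;

		local_port = ntb_logical_port_number(ntb);
		peer_port = ntb_peer_logical_port_number(ntb, pidx);

		if (peer_port < local_port)
			return local_port - 1;
		else
			return local_port;
	}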

Signed-off-by: Logan Gunthorpe 
Cc: Jon Mason 
Cc: Dave Jiang 
Cc: Allen Hubbe 
Cc: Serge Semin 
---
 include/linux/ntb.h | 53 -
 1 file changed, 52 insertions(+), 1 deletion(-)

diff --git a/include/linux/ntb.h b/include/linux/ntb.h
index 56a92e3ae3ae..52060b2130b6 100644
--- a/include/linux/ntb.h
+++ b/include/linux/ntb.h
@@ -616,7 +616,6 @@ static inline int ntb_port_number(struct ntb_dev *ntb)
 
return ntb->ops->port_number(ntb);
 }
-
 /**
  * ntb_peer_port_count() - get the number of peer device ports
  * @ntb:   NTB device context.
@@ -653,6 +652,58 @@ static inline int ntb_peer_port_number(struct ntb_dev *ntb, int pidx)
return ntb->ops->peer_port_number(ntb, pidx);
 }
 
+/**
+ * ntb_logical_port_number() - get the logical port number of the local port
+ * @ntb:   NTB device context.
+ *
+ * The Logical Port Number is defined to be a unique number for each
+ * port starting from zero through to the number of ports minus one.
+ * This is in contrast to the Port Number where each port can be assigned
+ * any unique physical number by the hardware.
+ *
+ * The logical port number is useful for calculating the resource indexes
+ * used by peers.
+ *
+ * Return: the logical port number or negative value indicating an error
+ */
+static inline int ntb_logical_port_number(struct ntb_dev *ntb)
+{
+   int lport = ntb_port_number(ntb);
+   int pidx;
+
+   if (lport < 0)
+   return lport;
+
+   for (pidx = 0; pidx < ntb_peer_port_count(ntb); pidx++)
+   if (lport <= ntb_peer_port_number(ntb, pidx))
+   return pidx;
+
+   return pidx;
+}
+
+/**
+ * ntb_peer_logical_port_number() - get the logical peer port by given index
+ * @ntb:   NTB device context.
+ * @pidx:  Peer port index.
+ *
+ * The Logical Port Number is defined to be a unique number for each
+ * port starting from zero through to the number of ports minus one.
+ * This is in contrast to the Port Number where each port can be assigned
+ * any unique physical number by the hardware.
+ *
+ * The logical port number is useful for calculating the resource indexes
+ * used by peers.
+ *
+ * Return: the peer's logical port number or negative value indicating an error
+ */
+static inline int ntb_peer_logical_port_number(struct ntb_dev *ntb, int pidx)
+{
+   if (ntb_peer_port_number(ntb, pidx) < ntb_port_number(ntb))
+   return pidx;
+   else
+   return pidx + 1;
+}
+
 /**
  * ntb_peer_port_idx() - get the peer device port index by given port number
  * @ntb:   NTB device context.
-- 
2.20.1



Re: [RFC PATCH] virtio_ring: Use DMA API if guest memory is encrypted

2019-03-21 Thread Thiago Jung Bauermann


Michael S. Tsirkin  writes:

> On Wed, Mar 20, 2019 at 01:13:41PM -0300, Thiago Jung Bauermann wrote:
>> >> Another way of looking at this issue which also explains our reluctance
>> >> is that the only difference between a secure guest and a regular guest
>> >> (at least regarding virtio) is that the former uses swiotlb while the
>> >> latter doens't.
>> >
>> > But swiotlb is just one implementation. It's a guest internal thing. The
>> > issue is that memory isn't host accessible.
>>
>> From what I understand of the ACCESS_PLATFORM definition, the host will
>> only ever try to access memory addresses that are supplied to it by the
>> guest, so all of the secure guest memory that the host cares about is
>> accessible:
>>
>> If this feature bit is set to 0, then the device has same access to
>> memory addresses supplied to it as the driver has. In particular,
>> the device will always use physical addresses matching addresses
>> used by the driver (typically meaning physical addresses used by the
>> CPU) and not translated further, and can access any address supplied
>> to it by the driver. When clear, this overrides any
>> platform-specific description of whether device access is limited or
>> translated in any way, e.g. whether an IOMMU may be present.
>>
>> All of the above is true for POWER guests, whether they are secure
>> guests or not.
>>
>> Or are you saying that a virtio device may want to access memory
>> addresses that weren't supplied to it by the driver?
>
> Your logic would apply to IOMMUs as well.  For your mode, there are
> specific encrypted memory regions that driver has access to but device
> does not. that seems to violate the constraint.

Right, if there's a pre-configured 1:1 mapping in the IOMMU such that
the device can ignore the IOMMU for all practical purposes I would
indeed say that the logic would apply to IOMMUs as well. :-)

I guess I'm still struggling with the purpose of signalling to the
driver that the host may not have access to memory addresses that it
will never try to access.

>> >> And from the device's point of view they're
>> >> indistinguishable. It can't tell one guest that is using swiotlb from
>> >> one that isn't. And that implies that secure guest vs regular guest
>> >> isn't a virtio interface issue, it's "guest internal affairs". So
>> >> there's no reason to reflect that in the feature flags.
>> >
>> > So don't. The way not to reflect that in the feature flags is
>> > to set ACCESS_PLATFORM.  Then you say *I don't care let platform device*.
>> >
>> >
>> > Without ACCESS_PLATFORM
>> > virtio has a very specific opinion about the security of the
>> > device, and that opinion is that device is part of the guest
>> > supervisor security domain.
>>
>> Sorry for being a bit dense, but not sure what "the device is part of
>> the guest supervisor security domain" means. In powerpc-speak,
>> "supervisor" is the operating system so perhaps that explains my
>> confusion. Are you saying that without ACCESS_PLATFORM, the guest
>> considers the host to be part of the guest operating system's security
>> domain?
>
> I think so. The spec says "device has same access as driver".

Ok, makes sense.

>> If so, does that have any other implication besides "the host
>> can access any address supplied to it by the driver"? If that is the
>> case, perhaps the definition of ACCESS_PLATFORM needs to be amended to
>> include that information because it's not part of the current
>> definition.
>>
>> >> > But the name "sev_active" makes me scared because at least AMD guys who
>> >> > were doing the sensible thing and setting ACCESS_PLATFORM
>> >>
>> >> My understanding is, AMD guest-platform knows in advance that their
>> >> guest will run in secure mode and hence sets the flag at the time of VM
>> >> instantiation. Unfortunately we dont have that luxury on our platforms.
>> >
>> > Well you do have that luxury. It looks like that there are existing
>> > guests that already acknowledge ACCESS_PLATFORM and you are not happy
>> > with how that path is slow. So you are trying to optimize for
>> > them by clearing ACCESS_PLATFORM and then you have lost ability
>> > to invoke DMA API.
>> >
>> > For example if there was another flag just like ACCESS_PLATFORM
>> > just not yet used by anyone, you would be all fine using that right?
>>
>> Yes, a new flag sounds like a great idea. What about the definition
>> below?
>>
>> VIRTIO_F_ACCESS_PLATFORM_NO_IOMMU This feature has the same meaning as
>> VIRTIO_F_ACCESS_PLATFORM both when set and when not set, with the
>> exception that the IOMMU is explicitly defined to be off or bypassed
>> when accessing memory addresses supplied to the device by the
>> driver. This flag should be set by the guest if offered, but to
>> allow for backward-compatibility device implementations allow for it
>> to be left unset by the guest. It is an error to set both this flag
>> and VIRTIO_F_ACCESS_PLATFORM.

Re: [PATCH v2 RFC/RFT] dma-contiguous: Get normal pages for single-page allocations

2019-03-21 Thread Nicolin Chen
Hi Catalin,

Thank you for the review. And I realized that the free() path
is missing too.

On Tue, Mar 19, 2019 at 02:43:01PM +, Catalin Marinas wrote:
> On Tue, Mar 05, 2019 at 10:32:02AM -0800, Nicolin Chen wrote:
> > The addresses within a single page are always contiguous, so it's
> > not so necessary to always allocate one single page from CMA area.
> > Since the CMA area has a limited predefined size of space, it may
> > run out of space in heavy use cases, where there might be quite a
> > lot CMA pages being allocated for single pages.
> > 
> > However, there is also a concern that a device might care where a
> > page comes from -- it might expect the page from CMA area and act
> > differently if the page doesn't.
> > 
> > This patch tries to get normal pages for single-page allocations
> > unless the device has its own CMA area. This would save resources
> > from the CMA area for more CMA allocations. And it'd also reduce
> > CMA fragmentations resulted from trivial allocations.
> 
> This is not sufficient. Some architectures/platforms declare limits on
> the CMA range so that DMA is possible with all expected devices. For
> example, on arm64 we keep the CMA in the lower 4GB of the address range,
> though with this patch you only covered the iommu ops allocation.

I will follow the approach of v1 by adding alloc_page()/free_page()
functions for those callers that don't have fallback allocations.
In this way, archs may use different callbacks to allocate pages.

> Do you have any numbers to back this up? You don't seem to address
> dma_direct_alloc() either but, as I said above, it's not trivial since
> some platforms expect certain physical range for DMA allocations.

What's the dma_direct_alloc() here about? Mind elaborating?


Re: [PATCH v6 09/22] vfio: VFIO_IOMMU_BIND/UNBIND_MSI

2019-03-21 Thread Alex Williamson
On Sun, 17 Mar 2019 18:22:19 +0100
Eric Auger  wrote:

> This patch adds the VFIO_IOMMU_BIND/UNBIND_MSI ioctl which aim
> to pass/withdraw the guest MSI binding to/from the host.
> 
> Signed-off-by: Eric Auger 
> 
> ---
> v3 -> v4:
> - add UNBIND
> - unwind on BIND error
> 
> v2 -> v3:
> - adapt to new proto of bind_guest_msi
> - directly use vfio_iommu_for_each_dev
> 
> v1 -> v2:
> - s/vfio_iommu_type1_guest_msi_binding/vfio_iommu_type1_bind_guest_msi
> ---
>  drivers/vfio/vfio_iommu_type1.c | 58 +
>  include/uapi/linux/vfio.h   | 29 +
>  2 files changed, 87 insertions(+)
> 
> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
> index 12a40b9db6aa..66513679081b 100644
> --- a/drivers/vfio/vfio_iommu_type1.c
> +++ b/drivers/vfio/vfio_iommu_type1.c
> @@ -1710,6 +1710,25 @@ static int vfio_cache_inv_fn(struct device *dev, void *data)
>   return iommu_cache_invalidate(d, dev, &ustruct->info);
>  }
>  
> +static int vfio_bind_msi_fn(struct device *dev, void *data)
> +{
> + struct vfio_iommu_type1_bind_msi *ustruct =
> + (struct vfio_iommu_type1_bind_msi *)data;
> + struct iommu_domain *d = iommu_get_domain_for_dev(dev);
> +
> + return iommu_bind_guest_msi(d, dev, ustruct->iova,
> + ustruct->gpa, ustruct->size);
> +}
> +
> +static int vfio_unbind_msi_fn(struct device *dev, void *data)
> +{
> + dma_addr_t *iova = (dma_addr_t *)data;
> + struct iommu_domain *d = iommu_get_domain_for_dev(dev);

Same as previous, we can encapsulate domain in our own struct to avoid
a lookup.

> +
> + iommu_unbind_guest_msi(d, dev, *iova);

Is it strange that iommu-core is exposing these interfaces at a device
level if every one of them requires us to walk all the devices?  Thanks,

Alex

> + return 0;
> +}
> +
>  static long vfio_iommu_type1_ioctl(void *iommu_data,
>  unsigned int cmd, unsigned long arg)
>  {
> @@ -1814,6 +1833,45 @@ static long vfio_iommu_type1_ioctl(void *iommu_data,
> vfio_cache_inv_fn);
>   mutex_unlock(>lock);
>   return ret;
> + } else if (cmd == VFIO_IOMMU_BIND_MSI) {
> + struct vfio_iommu_type1_bind_msi ustruct;
> + int ret;
> +
> + minsz = offsetofend(struct vfio_iommu_type1_bind_msi,
> + size);
> +
> + if (copy_from_user(&ustruct, (void __user *)arg, minsz))
> + return -EFAULT;
> +
> + if (ustruct.argsz < minsz || ustruct.flags)
> + return -EINVAL;
> +
> + mutex_lock(&iommu->lock);
> + ret = vfio_iommu_for_each_dev(iommu, &ustruct,
> +   vfio_bind_msi_fn);
> + if (ret)
> + vfio_iommu_for_each_dev(iommu, &ustruct,
> + vfio_unbind_msi_fn);
> + mutex_unlock(&iommu->lock);
> + return ret;
> + } else if (cmd == VFIO_IOMMU_UNBIND_MSI) {
> + struct vfio_iommu_type1_unbind_msi ustruct;
> + int ret;
> +
> + minsz = offsetofend(struct vfio_iommu_type1_unbind_msi,
> + iova);
> +
> + if (copy_from_user(&ustruct, (void __user *)arg, minsz))
> + return -EFAULT;
> +
> + if (ustruct.argsz < minsz || ustruct.flags)
> + return -EINVAL;
> +
> + mutex_lock(&iommu->lock);
> + ret = vfio_iommu_for_each_dev(iommu, &ustruct,
> +   vfio_unbind_msi_fn);
> + mutex_unlock(&iommu->lock);
> + return ret;
>   }
>  
>   return -ENOTTY;
> diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
> index 29f0ef2d805d..6763389b6adc 100644
> --- a/include/uapi/linux/vfio.h
> +++ b/include/uapi/linux/vfio.h
> @@ -789,6 +789,35 @@ struct vfio_iommu_type1_cache_invalidate {
>  };
>  #define VFIO_IOMMU_CACHE_INVALIDATE  _IO(VFIO_TYPE, VFIO_BASE + 24)
>  
> +/**
> + * VFIO_IOMMU_BIND_MSI - _IOWR(VFIO_TYPE, VFIO_BASE + 25,
> + *   struct vfio_iommu_type1_bind_msi)
> + *
> + * Pass a stage 1 MSI doorbell mapping to the host so that this
> + * latter can build a nested stage2 mapping
> + */
> +struct vfio_iommu_type1_bind_msi {
> + __u32   argsz;
> + __u32   flags;
> + __u64   iova;
> + __u64   gpa;
> + __u64   size;
> +};
> +#define VFIO_IOMMU_BIND_MSI  _IO(VFIO_TYPE, VFIO_BASE + 25)
> +
> +/**
> + * VFIO_IOMMU_UNBIND_MSI - _IOWR(VFIO_TYPE, VFIO_BASE + 26,
> + *   struct vfio_iommu_type1_unbind_msi)
> + *
> + * Unregister an MSI mapping
> + */
> +struct vfio_iommu_type1_unbind_msi {
> + __u32   argsz;
> + __u32   flags;
> + __u64   iova;
> +};
> +#define VFIO_IOMMU_UNBIND_MSI  _IO(VFIO_TYPE, VFIO_BASE + 26)
> +
>  /*  Additional API for SPAPR TCE (Server 

[PATCH 5/7] arm: use a dummy struct device for ISA DMA use of the DMA API

2019-03-21 Thread Christoph Hellwig
This gets rid of the last NULL dev argument passed to the DMA API.

Signed-off-by: Christoph Hellwig 
---
 arch/arm/kernel/dma-isa.c | 8 +++-
 arch/arm/mach-rpc/dma.c   | 8 +++-
 2 files changed, 14 insertions(+), 2 deletions(-)

diff --git a/arch/arm/kernel/dma-isa.c b/arch/arm/kernel/dma-isa.c
index 84363fe7bad2..10c45cc6b957 100644
--- a/arch/arm/kernel/dma-isa.c
+++ b/arch/arm/kernel/dma-isa.c
@@ -55,6 +55,12 @@ static int isa_get_dma_residue(unsigned int chan, dma_t *dma)
return chan < 4 ? count : (count << 1);
 }
 
+static struct device isa_dma_dev = {
+   .init_name  = "fallback device",
+   .coherent_dma_mask  = ~(dma_addr_t)0,
+   .dma_mask   = &isa_dma_dev.coherent_dma_mask,
+};
+
 static void isa_enable_dma(unsigned int chan, dma_t *dma)
 {
if (dma->invalid) {
@@ -89,7 +95,7 @@ static void isa_enable_dma(unsigned int chan, dma_t *dma)
dma->sg = &dma->buf;
dma->sgcount = 1;
dma->buf.length = dma->count;
-   dma->buf.dma_address = dma_map_single(NULL,
+   dma->buf.dma_address = dma_map_single(&isa_dma_dev,
dma->addr, dma->count,
direction);
}
diff --git a/arch/arm/mach-rpc/dma.c b/arch/arm/mach-rpc/dma.c
index fb48f3141fb4..f2703ca17954 100644
--- a/arch/arm/mach-rpc/dma.c
+++ b/arch/arm/mach-rpc/dma.c
@@ -151,6 +151,12 @@ static void iomd_free_dma(unsigned int chan, dma_t *dma)
free_irq(idma->irq, idma);
 }
 
+static struct device isa_dma_dev = {
+   .init_name  = "fallback device",
+   .coherent_dma_mask  = ~(dma_addr_t)0,
+   .dma_mask   = &isa_dma_dev.coherent_dma_mask,
+};
+
 static void iomd_enable_dma(unsigned int chan, dma_t *dma)
 {
struct iomd_dma *idma = container_of(dma, struct iomd_dma, dma);
@@ -168,7 +174,7 @@ static void iomd_enable_dma(unsigned int chan, dma_t *dma)
idma->dma.sg = &idma->dma.buf;
idma->dma.sgcount = 1;
idma->dma.buf.length = idma->dma.count;
-   idma->dma.buf.dma_address = dma_map_single(NULL,
+   idma->dma.buf.dma_address = dma_map_single(&isa_dma_dev,
idma->dma.addr, idma->dma.count,
idma->dma.dma_mode == DMA_MODE_READ ?
DMA_FROM_DEVICE : DMA_TO_DEVICE);
-- 
2.20.1



[PATCH 7/7] x86: remove the x86_dma_fallback_dev hack

2019-03-21 Thread Christoph Hellwig
Now that we removed support for the NULL device argument in the DMA API,
there is no need to cater for that in the x86 code.

Signed-off-by: Christoph Hellwig 
---
 arch/x86/include/asm/dma-mapping.h | 10 --
 arch/x86/kernel/amd_gart_64.c  |  6 --
 arch/x86/kernel/pci-dma.c  | 20 
 kernel/dma/mapping.c   |  7 ---
 4 files changed, 43 deletions(-)

diff --git a/arch/x86/include/asm/dma-mapping.h b/arch/x86/include/asm/dma-mapping.h
index ce4d176b3d13..6b15a24930e0 100644
--- a/arch/x86/include/asm/dma-mapping.h
+++ b/arch/x86/include/asm/dma-mapping.h
@@ -13,14 +13,7 @@
 #include 
 #include 
 
-#ifdef CONFIG_ISA
-# define ISA_DMA_BIT_MASK DMA_BIT_MASK(24)
-#else
-# define ISA_DMA_BIT_MASK DMA_BIT_MASK(32)
-#endif
-
 extern int iommu_merge;
-extern struct device x86_dma_fallback_dev;
 extern int panic_on_overflow;
 
 extern const struct dma_map_ops *dma_ops;
@@ -30,7 +23,4 @@ static inline const struct dma_map_ops *get_arch_dma_ops(struct bus_type *bus)
return dma_ops;
 }
 
-bool arch_dma_alloc_attrs(struct device **dev);
-#define arch_dma_alloc_attrs arch_dma_alloc_attrs
-
 #endif
diff --git a/arch/x86/kernel/amd_gart_64.c b/arch/x86/kernel/amd_gart_64.c
index 2c0aa34af69c..bf7f13ea3c64 100644
--- a/arch/x86/kernel/amd_gart_64.c
+++ b/arch/x86/kernel/amd_gart_64.c
@@ -233,9 +233,6 @@ static dma_addr_t gart_map_page(struct device *dev, struct page *page,
unsigned long bus;
phys_addr_t paddr = page_to_phys(page) + offset;
 
-   if (!dev)
-   dev = &x86_dma_fallback_dev;
-
if (!need_iommu(dev, paddr, size))
return paddr;
 
@@ -392,9 +389,6 @@ static int gart_map_sg(struct device *dev, struct scatterlist *sg, int nents,
if (nents == 0)
return 0;
 
-   if (!dev)
-   dev = &x86_dma_fallback_dev;
-
out = 0;
start   = 0;
start_sg= sg;
diff --git a/arch/x86/kernel/pci-dma.c b/arch/x86/kernel/pci-dma.c
index d460998ae828..dcd272dbd0a9 100644
--- a/arch/x86/kernel/pci-dma.c
+++ b/arch/x86/kernel/pci-dma.c
@@ -51,14 +51,6 @@ int iommu_pass_through __read_mostly;
 
 extern struct iommu_table_entry __iommu_table[], __iommu_table_end[];
 
-/* Dummy device used for NULL arguments (normally ISA). */
-struct device x86_dma_fallback_dev = {
-   .init_name = "fallback device",
-   .coherent_dma_mask = ISA_DMA_BIT_MASK,
-   .dma_mask = &x86_dma_fallback_dev.coherent_dma_mask,
-};
-EXPORT_SYMBOL(x86_dma_fallback_dev);
-
 void __init pci_iommu_alloc(void)
 {
struct iommu_table_entry *p;
@@ -77,18 +69,6 @@ void __init pci_iommu_alloc(void)
}
 }
 
-bool arch_dma_alloc_attrs(struct device **dev)
-{
-   if (!*dev)
-   *dev = &x86_dma_fallback_dev;
-
-   if (!is_device_dma_capable(*dev))
-   return false;
-   return true;
-
-}
-EXPORT_SYMBOL(arch_dma_alloc_attrs);
-
 /*
  * See  for the iommu kernel
  * parameter documentation.
diff --git a/kernel/dma/mapping.c b/kernel/dma/mapping.c
index c000906348c9..685a53f2a793 100644
--- a/kernel/dma/mapping.c
+++ b/kernel/dma/mapping.c
@@ -238,10 +238,6 @@ u64 dma_get_required_mask(struct device *dev)
 }
 EXPORT_SYMBOL_GPL(dma_get_required_mask);
 
-#ifndef arch_dma_alloc_attrs
-#define arch_dma_alloc_attrs(dev)  (true)
-#endif
-
 void *dma_alloc_attrs(struct device *dev, size_t size, dma_addr_t *dma_handle,
gfp_t flag, unsigned long attrs)
 {
@@ -256,9 +252,6 @@ void *dma_alloc_attrs(struct device *dev, size_t size, dma_addr_t *dma_handle,
/* let the implementation decide on the zone to allocate from: */
flag &= ~(__GFP_DMA | __GFP_DMA32 | __GFP_HIGHMEM);
 
-   if (!arch_dma_alloc_attrs(&dev))
-   return NULL;
-
if (dma_is_direct(ops))
cpu_addr = dma_direct_alloc(dev, size, dma_handle, flag, attrs);
else if (ops->alloc)
-- 
2.20.1



[PATCH 6/7] dma-mapping: remove leftover NULL device support

2019-03-21 Thread Christoph Hellwig
Most dma_map_ops implementations already had some issues with a NULL
device, or simply crashed if one was fed to them.  Now that we have
cleaned up all the obvious offenders, we can stop pretending that we
support this mode.

Signed-off-by: Christoph Hellwig 
---
 Documentation/DMA-API-HOWTO.txt | 13 ++---
 include/linux/dma-mapping.h |  6 +++---
 kernel/dma/direct.c |  2 +-
 3 files changed, 10 insertions(+), 11 deletions(-)

diff --git a/Documentation/DMA-API-HOWTO.txt b/Documentation/DMA-API-HOWTO.txt
index 1a721d0f35c8..0e98cf7466dd 100644
--- a/Documentation/DMA-API-HOWTO.txt
+++ b/Documentation/DMA-API-HOWTO.txt
@@ -365,13 +365,12 @@ __get_free_pages() (but takes size instead of a page order).  If your
 driver needs regions sized smaller than a page, you may prefer using
 the dma_pool interface, described below.
 
-The consistent DMA mapping interfaces, for non-NULL dev, will by
-default return a DMA address which is 32-bit addressable.  Even if the
-device indicates (via DMA mask) that it may address the upper 32-bits,
-consistent allocation will only return > 32-bit addresses for DMA if
-the consistent DMA mask has been explicitly changed via
-dma_set_coherent_mask().  This is true of the dma_pool interface as
-well.
+The consistent DMA mapping interfaces, will by default return a DMA address
+which is 32-bit addressable.  Even if the device indicates (via the DMA mask)
+that it may address the upper 32-bits, consistent allocation will only
+return > 32-bit addresses for DMA if the consistent DMA mask has been
+explicitly changed via dma_set_coherent_mask().  This is true of the
+dma_pool interface as well.
 
 dma_alloc_coherent() returns two values: the virtual address which you
 can use to access it from the CPU and dma_handle which you pass to the
diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
index 75e60be91e5f..6309a721394b 100644
--- a/include/linux/dma-mapping.h
+++ b/include/linux/dma-mapping.h
@@ -267,9 +267,9 @@ size_t dma_direct_max_mapping_size(struct device *dev);
 
 static inline const struct dma_map_ops *get_dma_ops(struct device *dev)
 {
-   if (dev && dev->dma_ops)
+   if (dev->dma_ops)
return dev->dma_ops;
-   return get_arch_dma_ops(dev ? dev->bus : NULL);
+   return get_arch_dma_ops(dev->bus);
 }
 
 static inline void set_dma_ops(struct device *dev,
@@ -650,7 +650,7 @@ static inline void dma_free_coherent(struct device *dev, size_t size,
 
 static inline u64 dma_get_mask(struct device *dev)
 {
-   if (dev && dev->dma_mask && *dev->dma_mask)
+   if (dev->dma_mask && *dev->dma_mask)
return *dev->dma_mask;
return DMA_BIT_MASK(32);
 }
diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
index fcdb23e8d2fc..2c2772e9702a 100644
--- a/kernel/dma/direct.c
+++ b/kernel/dma/direct.c
@@ -311,7 +311,7 @@ static inline bool dma_direct_possible(struct device *dev, dma_addr_t dma_addr,
size_t size)
 {
return swiotlb_force != SWIOTLB_FORCE &&
-   (!dev || dma_capable(dev, dma_addr, size));
+   dma_capable(dev, dma_addr, size);
 }
 
 dma_addr_t dma_direct_map_page(struct device *dev, struct page *page,
-- 
2.20.1



[PATCH 1/7] parport_ip32: pass struct device to DMA API functions

2019-03-21 Thread Christoph Hellwig
The DMA API generally relies on a struct device to work properly, and
only barely works without one for legacy reasons.  Pass the easily
available struct device from the platform_device to remedy this.

Signed-off-by: Christoph Hellwig 
---
 drivers/parport/parport_ip32.c | 18 ++
 1 file changed, 10 insertions(+), 8 deletions(-)

diff --git a/drivers/parport/parport_ip32.c b/drivers/parport/parport_ip32.c
index 62873070f988..b7a892791c3e 100644
--- a/drivers/parport/parport_ip32.c
+++ b/drivers/parport/parport_ip32.c
@@ -568,6 +568,7 @@ static irqreturn_t parport_ip32_merr_interrupt(int irq, void *dev_id)
 
 /**
  * parport_ip32_dma_start - begins a DMA transfer
+ * @p: parport to work on
  * @dir:   DMA direction: DMA_TO_DEVICE or DMA_FROM_DEVICE
  * @addr:  pointer to data buffer
  * @count: buffer size
@@ -575,8 +576,8 @@ static irqreturn_t parport_ip32_merr_interrupt(int irq, void *dev_id)
  * Calls to parport_ip32_dma_start() and parport_ip32_dma_stop() must be
  * correctly balanced.
  */
-static int parport_ip32_dma_start(enum dma_data_direction dir,
- void *addr, size_t count)
+static int parport_ip32_dma_start(struct parport *p,
+   enum dma_data_direction dir, void *addr, size_t count)
 {
unsigned int limit;
u64 ctrl;
@@ -601,7 +602,7 @@ static int parport_ip32_dma_start(enum dma_data_direction dir,
 
/* Prepare DMA pointers */
parport_ip32_dma.dir = dir;
-   parport_ip32_dma.buf = dma_map_single(NULL, addr, count, dir);
+   parport_ip32_dma.buf = dma_map_single(&p->bus_dev, addr, count, dir);
parport_ip32_dma.len = count;
parport_ip32_dma.next = parport_ip32_dma.buf;
parport_ip32_dma.left = parport_ip32_dma.len;
@@ -625,11 +626,12 @@ static int parport_ip32_dma_start(enum dma_data_direction dir,
 
 /**
  * parport_ip32_dma_stop - ends a running DMA transfer
+ * @p: parport to work on
  *
  * Calls to parport_ip32_dma_start() and parport_ip32_dma_stop() must be
  * correctly balanced.
  */
-static void parport_ip32_dma_stop(void)
+static void parport_ip32_dma_stop(struct parport *p)
 {
u64 ctx_a;
u64 ctx_b;
@@ -685,8 +687,8 @@ static void parport_ip32_dma_stop(void)
enable_irq(MACEISA_PAR_CTXB_IRQ);
parport_ip32_dma.irq_on = 1;
 
-   dma_unmap_single(NULL, parport_ip32_dma.buf, parport_ip32_dma.len,
-                    parport_ip32_dma.dir);
+   dma_unmap_single(&p->bus_dev, parport_ip32_dma.buf,
+                    parport_ip32_dma.len, parport_ip32_dma.dir);
 }
 
 /**
@@ -1445,7 +1447,7 @@ static size_t parport_ip32_fifo_write_block_dma(struct parport *p,
 
priv->irq_mode = PARPORT_IP32_IRQ_HERE;
 
-   parport_ip32_dma_start(DMA_TO_DEVICE, (void *)buf, len);
+   parport_ip32_dma_start(p, DMA_TO_DEVICE, (void *)buf, len);
reinit_completion(&priv->irq_complete);
parport_ip32_frob_econtrol(p, ECR_DMAEN | ECR_SERVINTR, ECR_DMAEN);
 
@@ -1461,7 +1463,7 @@ static size_t parport_ip32_fifo_write_block_dma(struct parport *p,
if (ecr & ECR_SERVINTR)
break;  /* DMA transfer just finished */
}
-   parport_ip32_dma_stop();
+   parport_ip32_dma_stop(p);
written = len - parport_ip32_dma_get_residue();
 
priv->irq_mode = PARPORT_IP32_IRQ_FWD;
-- 
2.20.1



remove NULL struct device support in the DMA API

2019-03-21 Thread Christoph Hellwig
We still have a few drivers which pass a NULL struct device pointer
to DMA API functions, which generally is a bad idea as the API
implementations rely on the device not only for ops selection, but
also the dma mask and various other attributes, and many implementations
have been broken for NULL device support for a while.

This series removes the few remaining users that weren't picked up in
the last merge window and then removes core support for this "feature".

A git tree is also available at:

git://git.infradead.org/users/hch/misc.git dma-remove-NULL-dev-support

Gitweb:


http://git.infradead.org/users/hch/misc.git/shortlog/refs/heads/dma-remove-NULL-dev-support


[PATCH 3/7] gbefb: switch to managed version of the DMA allocator

2019-03-21 Thread Christoph Hellwig
gbefb uses managed resources, so it should do the same for DMA
allocations.

Signed-off-by: Christoph Hellwig 
---
 drivers/video/fbdev/gbefb.c | 24 
 1 file changed, 8 insertions(+), 16 deletions(-)

diff --git a/drivers/video/fbdev/gbefb.c b/drivers/video/fbdev/gbefb.c
index 1a242b1338e9..3fcb33232ba3 100644
--- a/drivers/video/fbdev/gbefb.c
+++ b/drivers/video/fbdev/gbefb.c
@@ -1162,9 +1162,9 @@ static int gbefb_probe(struct platform_device *p_dev)
}
gbe_revision = gbe->ctrlstat & 15;
 
-   gbe_tiles.cpu =
-   dma_alloc_coherent(NULL, GBE_TLB_SIZE * sizeof(uint16_t),
-  &gbe_tiles.dma, GFP_KERNEL);
+   gbe_tiles.cpu = dmam_alloc_coherent(&p_dev->dev,
+   GBE_TLB_SIZE * sizeof(uint16_t),
+   &gbe_tiles.dma, GFP_KERNEL);
if (!gbe_tiles.cpu) {
printk(KERN_ERR "gbefb: couldn't allocate tiles table\n");
ret = -ENOMEM;
@@ -1178,19 +1178,20 @@ static int gbefb_probe(struct platform_device *p_dev)
if (!gbe_mem) {
printk(KERN_ERR "gbefb: couldn't map framebuffer\n");
ret = -ENOMEM;
-   goto out_tiles_free;
+   goto out_release_mem_region;
}
 
gbe_dma_addr = 0;
} else {
/* try to allocate memory with the classical allocator
 * this has high chance to fail on low memory machines */
-   gbe_mem = dma_alloc_wc(NULL, gbe_mem_size, &gbe_dma_addr,
-  GFP_KERNEL);
+   gbe_mem = dmam_alloc_attrs(&p_dev->dev, gbe_mem_size,
+   &gbe_dma_addr, GFP_KERNEL,
+   DMA_ATTR_WRITE_COMBINE);
if (!gbe_mem) {
printk(KERN_ERR "gbefb: couldn't allocate framebuffer 
memory\n");
ret = -ENOMEM;
-   goto out_tiles_free;
+   goto out_release_mem_region;
}
 
gbe_mem_phys = (unsigned long) gbe_dma_addr;
@@ -1237,11 +1238,6 @@ static int gbefb_probe(struct platform_device *p_dev)
 
 out_gbe_unmap:
arch_phys_wc_del(par->wc_cookie);
-   if (gbe_dma_addr)
-   dma_free_wc(NULL, gbe_mem_size, gbe_mem, gbe_mem_phys);
-out_tiles_free:
-   dma_free_coherent(NULL, GBE_TLB_SIZE * sizeof(uint16_t),
- (void *)gbe_tiles.cpu, gbe_tiles.dma);
 out_release_mem_region:
release_mem_region(GBE_BASE, sizeof(struct sgi_gbe));
 out_release_framebuffer:
@@ -1258,10 +1254,6 @@ static int gbefb_remove(struct platform_device* p_dev)
unregister_framebuffer(info);
gbe_turn_off();
arch_phys_wc_del(par->wc_cookie);
-   if (gbe_dma_addr)
-   dma_free_wc(NULL, gbe_mem_size, gbe_mem, gbe_mem_phys);
-   dma_free_coherent(NULL, GBE_TLB_SIZE * sizeof(uint16_t),
- (void *)gbe_tiles.cpu, gbe_tiles.dma);
release_mem_region(GBE_BASE, sizeof(struct sgi_gbe));
gbefb_remove_sysfs(&p_dev->dev);
framebuffer_release(info);
-- 
2.20.1



[PATCH 2/7] da8xx-fb: pass struct device to DMA API functions

2019-03-21 Thread Christoph Hellwig
The DMA API generally relies on a struct device to work properly, and
only barely works without one for legacy reasons.  Pass the easily
available struct device from the platform_device to remedy this.

Signed-off-by: Christoph Hellwig 
---
 drivers/video/fbdev/da8xx-fb.c | 13 +++--
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/drivers/video/fbdev/da8xx-fb.c b/drivers/video/fbdev/da8xx-fb.c
index 43f2a4816860..ec62274b914b 100644
--- a/drivers/video/fbdev/da8xx-fb.c
+++ b/drivers/video/fbdev/da8xx-fb.c
@@ -1097,9 +1097,9 @@ static int fb_remove(struct platform_device *dev)
 
unregister_framebuffer(info);
fb_dealloc_cmap(>cmap);
-   dma_free_coherent(NULL, PALETTE_SIZE, par->v_palette_base,
+   dma_free_coherent(par->dev, PALETTE_SIZE, par->v_palette_base,
  par->p_palette_base);
-   dma_free_coherent(NULL, par->vram_size, par->vram_virt,
+   dma_free_coherent(par->dev, par->vram_size, par->vram_virt,
  par->vram_phys);
pm_runtime_put_sync(>dev);
pm_runtime_disable(>dev);
@@ -1425,7 +1425,7 @@ static int fb_probe(struct platform_device *device)
par->vram_size = roundup(par->vram_size/8, ulcm);
par->vram_size = par->vram_size * LCD_NUM_BUFFERS;
 
-   par->vram_virt = dma_alloc_coherent(NULL,
+   par->vram_virt = dma_alloc_coherent(par->dev,
par->vram_size,
>vram_phys,
GFP_KERNEL | GFP_DMA);
@@ -1446,7 +1446,7 @@ static int fb_probe(struct platform_device *device)
da8xx_fb_fix.line_length - 1;
 
/* allocate palette buffer */
-   par->v_palette_base = dma_alloc_coherent(NULL, PALETTE_SIZE,
+   par->v_palette_base = dma_alloc_coherent(par->dev, PALETTE_SIZE,
 >p_palette_base,
 GFP_KERNEL | GFP_DMA);
if (!par->v_palette_base) {
@@ -1532,11 +1532,12 @@ static int fb_probe(struct platform_device *device)
fb_dealloc_cmap(&da8xx_fb_info->cmap);
 
 err_release_pl_mem:
-   dma_free_coherent(NULL, PALETTE_SIZE, par->v_palette_base,
+   dma_free_coherent(par->dev, PALETTE_SIZE, par->v_palette_base,
  par->p_palette_base);
 
 err_release_fb_mem:
-   dma_free_coherent(NULL, par->vram_size, par->vram_virt, par->vram_phys);
+   dma_free_coherent(par->dev, par->vram_size, par->vram_virt,
+ par->vram_phys);
 
 err_release_fb:
framebuffer_release(da8xx_fb_info);
-- 
2.20.1



[PATCH 4/7] pxa3xx-gcu: pass struct device to dma_mmap_coherent

2019-03-21 Thread Christoph Hellwig
Just like we do for all other DMA operations.

Signed-off-by: Christoph Hellwig 
---
 drivers/video/fbdev/pxa3xx-gcu.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/video/fbdev/pxa3xx-gcu.c b/drivers/video/fbdev/pxa3xx-gcu.c
index 69cfb337c857..047a2fa4b87e 100644
--- a/drivers/video/fbdev/pxa3xx-gcu.c
+++ b/drivers/video/fbdev/pxa3xx-gcu.c
@@ -96,6 +96,7 @@ struct pxa3xx_gcu_batch {
 };
 
 struct pxa3xx_gcu_priv {
+   struct device*dev;
void __iomem *mmio_base;
struct clk   *clk;
struct pxa3xx_gcu_shared *shared;
@@ -493,7 +494,7 @@ pxa3xx_gcu_mmap(struct file *file, struct vm_area_struct *vma)
if (size != SHARED_SIZE)
return -EINVAL;
 
-   return dma_mmap_coherent(NULL, vma,
+   return dma_mmap_coherent(priv->dev, vma,
priv->shared, priv->shared_phys, size);
 
case SHARED_SIZE >> PAGE_SHIFT:
@@ -670,6 +671,7 @@ static int pxa3xx_gcu_probe(struct platform_device *pdev)
 
platform_set_drvdata(pdev, priv);
priv->resource_mem = r;
+   priv->dev = dev;
pxa3xx_gcu_reset(priv);
pxa3xx_gcu_init_debug_timer(priv);
 
-- 
2.20.1



Re: [PATCH v6 08/22] vfio: VFIO_IOMMU_CACHE_INVALIDATE

2019-03-21 Thread Alex Williamson
On Sun, 17 Mar 2019 18:22:18 +0100
Eric Auger  wrote:

> From: "Liu, Yi L" 
> 
> When the guest "owns" the stage 1 translation structures,  the host
> IOMMU driver has no knowledge of caching structure updates unless
> the guest invalidation requests are trapped and passed down to the
> host.
> 
> This patch adds the VFIO_IOMMU_CACHE_INVALIDATE ioctl, which aims
> at propagating guest stage 1 IOMMU cache invalidations to the host.
> 
> Signed-off-by: Liu, Yi L 
> Signed-off-by: Eric Auger 
> 
> ---
> 
> v2 -> v3:
> - introduce vfio_iommu_for_each_dev back in this patch
> 
> v1 -> v2:
> - s/TLB/CACHE
> - remove vfio_iommu_task usage
> - commit message rewording
> ---
>  drivers/vfio/vfio_iommu_type1.c | 47 +
>  include/uapi/linux/vfio.h   | 13 +
>  2 files changed, 60 insertions(+)
> 
> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
> index 222e9199edbf..12a40b9db6aa 100644
> --- a/drivers/vfio/vfio_iommu_type1.c
> +++ b/drivers/vfio/vfio_iommu_type1.c
> @@ -113,6 +113,26 @@ struct vfio_regions {
>  #define IS_IOMMU_CAP_DOMAIN_IN_CONTAINER(iommu)  \
>   (!list_empty(>domain_list))
>  

struct foo {
struct iommu_domain *domain;
void *data;
};

> +/* iommu->lock must be held */
> +static int
> +vfio_iommu_for_each_dev(struct vfio_iommu *iommu, void *data,
> + int (*fn)(struct device *, void *))
> +{
> + struct vfio_domain *d;
> + struct vfio_group *g;
> + int ret = 0;

struct foo bar = { .data = data };

> +
> + list_for_each_entry(d, >domain_list, next) {

bar.domain = d->domain;

> + list_for_each_entry(g, >group_list, next) {
> + ret = iommu_group_for_each_dev(g->iommu_group,
> +data, fn);

s/data//

> + if (ret)
> + break;
> + }
> + }
> + return ret;
> +}
> +
>  static int put_pfn(unsigned long pfn, int prot);
>  
>  /*
> @@ -1681,6 +1701,15 @@ vfio_attach_pasid_table(struct vfio_iommu *iommu,
>   return ret;
>  }
>  
> +static int vfio_cache_inv_fn(struct device *dev, void *data)
> +{

struct foo *bar = data;

> + struct vfio_iommu_type1_cache_invalidate *ustruct =
> + (struct vfio_iommu_type1_cache_invalidate *)data;

... = bar->data;

> + struct iommu_domain *d = iommu_get_domain_for_dev(dev);

... = bar->domain;

¯\_(ツ)_/¯ seems more efficient than doing a lookup.
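
Putting those fragments together, the suggested shape would be roughly
the following (a sketch only; "foo" and "bar" are the placeholder names
from the comments above, not code from the patch):

struct foo {
	struct iommu_domain *domain;
	void *data;
};

static int vfio_cache_inv_fn(struct device *dev, void *data)
{
	struct foo *bar = data;
	struct vfio_iommu_type1_cache_invalidate *ustruct = bar->data;

	/* reuse the domain the caller is already iterating over */
	return iommu_cache_invalidate(bar->domain, dev, &ustruct->info);
}

/* iommu->lock must be held */
static int
vfio_iommu_for_each_dev(struct vfio_iommu *iommu, void *data,
			int (*fn)(struct device *, void *))
{
	struct vfio_domain *d;
	struct vfio_group *g;
	struct foo bar = { .data = data };
	int ret = 0;

	list_for_each_entry(d, &iommu->domain_list, next) {
		bar.domain = d->domain;
		list_for_each_entry(g, &d->group_list, next) {
			ret = iommu_group_for_each_dev(g->iommu_group,
						       &bar, fn);
			if (ret)
				break;
		}
	}
	return ret;
}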

> +
> + return iommu_cache_invalidate(d, dev, >info);
> +}
> +
>  static long vfio_iommu_type1_ioctl(void *iommu_data,
>  unsigned int cmd, unsigned long arg)
>  {
> @@ -1767,6 +1796,24 @@ static long vfio_iommu_type1_ioctl(void *iommu_data,
>   } else if (cmd == VFIO_IOMMU_DETACH_PASID_TABLE) {
>   vfio_detach_pasid_table(iommu);
>   return 0;
> + } else if (cmd == VFIO_IOMMU_CACHE_INVALIDATE) {
> + struct vfio_iommu_type1_cache_invalidate ustruct;
> + int ret;
> +
> + minsz = offsetofend(struct vfio_iommu_type1_cache_invalidate,
> + info);
> +
> + if (copy_from_user(&ustruct, (void __user *)arg, minsz))
> + return -EFAULT;
> +
> + if (ustruct.argsz < minsz || ustruct.flags)
> + return -EINVAL;
> +
> + mutex_lock(>lock);
> + ret = vfio_iommu_for_each_dev(iommu, ,
> +   vfio_cache_inv_fn);

Guess what has a version field that never gets checked ;)

Thanks,
Alex
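
(For reference, the missing check could be as simple as the following
sketch, using the version macro from the iommu uapi patch earlier in
this series, placed after the copy_from_user() above:

	if (ustruct.info.version != IOMMU_CACHE_INVALIDATE_INFO_VERSION_1)
		return -EINVAL;
)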

> + mutex_unlock(>lock);
> + return ret;
>   }
>  
>   return -ENOTTY;
> diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
> index 329d378565d9..29f0ef2d805d 100644
> --- a/include/uapi/linux/vfio.h
> +++ b/include/uapi/linux/vfio.h
> @@ -776,6 +776,19 @@ struct vfio_iommu_type1_attach_pasid_table {
>  #define VFIO_IOMMU_ATTACH_PASID_TABLE  _IO(VFIO_TYPE, VFIO_BASE + 22)
>  #define VFIO_IOMMU_DETACH_PASID_TABLE  _IO(VFIO_TYPE, VFIO_BASE + 23)
>  
> +/**
> + * VFIO_IOMMU_CACHE_INVALIDATE - _IOWR(VFIO_TYPE, VFIO_BASE + 24,
> + *   struct vfio_iommu_type1_cache_invalidate)
> + *
> + * Propagate guest IOMMU cache invalidation to the host.
> + */
> +struct vfio_iommu_type1_cache_invalidate {
> + __u32   argsz;
> + __u32   flags;
> + struct iommu_cache_invalidate_info info;
> +};
> +#define VFIO_IOMMU_CACHE_INVALIDATE  _IO(VFIO_TYPE, VFIO_BASE + 24)
> +
>  /*  Additional API for SPAPR TCE (Server POWERPC) IOMMU  */
>  
>  /*


Re: [PATCH v6 07/22] vfio: VFIO_IOMMU_ATTACH/DETACH_PASID_TABLE

2019-03-21 Thread Alex Williamson
On Sun, 17 Mar 2019 18:22:17 +0100
Eric Auger  wrote:

> From: "Liu, Yi L" 
> 
> This patch adds VFIO_IOMMU_ATTACH/DETACH_PASID_TABLE ioctl
> which aims to pass/withdraw the virtual iommu guest configuration
> to/from the VFIO driver down to the iommu subsystem.
> 
> Signed-off-by: Jacob Pan 
> Signed-off-by: Liu, Yi L 
> Signed-off-by: Eric Auger 
> 
> ---
> v3 -> v4:
> - restore ATTACH/DETACH
> - add unwind on failure
> 
> v2 -> v3:
> - s/BIND_PASID_TABLE/SET_PASID_TABLE
> 
> v1 -> v2:
> - s/BIND_GUEST_STAGE/BIND_PASID_TABLE
> - remove the struct device arg
> ---
>  drivers/vfio/vfio_iommu_type1.c | 53 +
>  include/uapi/linux/vfio.h   | 17 +++
>  2 files changed, 70 insertions(+)
> 
> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
> index 73652e21efec..222e9199edbf 100644
> --- a/drivers/vfio/vfio_iommu_type1.c
> +++ b/drivers/vfio/vfio_iommu_type1.c
> @@ -1644,6 +1644,43 @@ static int vfio_domains_have_iommu_cache(struct vfio_iommu *iommu)
>   return ret;
>  }
>  
> +static void
> +vfio_detach_pasid_table(struct vfio_iommu *iommu)
> +{
> + struct vfio_domain *d;
> +
> + mutex_lock(>lock);
> +
> + list_for_each_entry(d, >domain_list, next) {
> + iommu_detach_pasid_table(d->domain);
> + }
> + mutex_unlock(>lock);
> +}
> +
> +static int
> +vfio_attach_pasid_table(struct vfio_iommu *iommu,
> + struct vfio_iommu_type1_attach_pasid_table *ustruct)
> +{
> + struct vfio_domain *d;
> + int ret = 0;
> +
> + mutex_lock(>lock);
> +
> + list_for_each_entry(d, >domain_list, next) {
> + ret = iommu_attach_pasid_table(d->domain, >config);
> + if (ret)
> + goto unwind;
> + }
> + goto unlock;
> +unwind:
> + list_for_each_entry_continue_reverse(d, >domain_list, next) {
> + iommu_detach_pasid_table(d->domain);
> + }
> +unlock:
> + mutex_unlock(>lock);
> + return ret;
> +}
> +
>  static long vfio_iommu_type1_ioctl(void *iommu_data,
>  unsigned int cmd, unsigned long arg)
>  {
> @@ -1714,6 +1751,22 @@ static long vfio_iommu_type1_ioctl(void *iommu_data,
>  
> + return copy_to_user((void __user *)arg, &unmap, minsz) ?
>   -EFAULT : 0;
> + } else if (cmd == VFIO_IOMMU_ATTACH_PASID_TABLE) {
> + struct vfio_iommu_type1_attach_pasid_table ustruct;
> +
> + minsz = offsetofend(struct vfio_iommu_type1_attach_pasid_table,
> + config);
> +
> + if (copy_from_user(&ustruct, (void __user *)arg, minsz))
> + return -EFAULT;
> +
> + if (ustruct.argsz < minsz || ustruct.flags)
> + return -EINVAL;

Who is responsible for validating the ustruct.config?
arm_smmu_attach_pasid_table() only looks at the format, ignoring the
version field :-\  In fact, where is struct iommu_pasid_smmuv3 ever used
from the config?  Should the padding fields be forced to zero?  We
don't have flags to incorporate use of them with future extensions, so
maybe we don't care?

> +
> + return vfio_attach_pasid_table(iommu, &ustruct);
> + } else if (cmd == VFIO_IOMMU_DETACH_PASID_TABLE) {
> + vfio_detach_pasid_table(iommu);
> + return 0;
>   }
>  
>   return -ENOTTY;
> diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
> index 02bb7ad6e986..329d378565d9 100644
> --- a/include/uapi/linux/vfio.h
> +++ b/include/uapi/linux/vfio.h
> @@ -14,6 +14,7 @@
>  
>  #include <linux/types.h>
>  #include <linux/ioctl.h>
> +#include <linux/iommu.h>
>  
>  #define VFIO_API_VERSION 0
>  
> @@ -759,6 +760,22 @@ struct vfio_iommu_type1_dma_unmap {
>  #define VFIO_IOMMU_ENABLE  _IO(VFIO_TYPE, VFIO_BASE + 15)
>  #define VFIO_IOMMU_DISABLE   _IO(VFIO_TYPE, VFIO_BASE + 16)
>  
> +/**
> + * VFIO_IOMMU_ATTACH_PASID_TABLE - _IOWR(VFIO_TYPE, VFIO_BASE + 22,
> + *   struct vfio_iommu_type1_attach_pasid_table)
> + *
> + * Passes the PASID table to the host. Calling ATTACH_PASID_TABLE
> + * while a table is already installed is allowed: it replaces the old
> + * table. DETACH does a comprehensive tear down of the nested mode.
> + */
> +struct vfio_iommu_type1_attach_pasid_table {
> + __u32   argsz;
> + __u32   flags;
> + struct iommu_pasid_table_config config;
> +};
> +#define VFIO_IOMMU_ATTACH_PASID_TABLE  _IO(VFIO_TYPE, VFIO_BASE + 22)
> +#define VFIO_IOMMU_DETACH_PASID_TABLE  _IO(VFIO_TYPE, VFIO_BASE + 23)
> +

DETACH should also be documented, it's not clear from the uapi that it
requires no parameters.  Thanks,

Alex
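
A sketch of the sort of validation being asked about, done before
handing the config to the IOMMU driver (PASID_TABLE_CFG_VERSION_1 and
IOMMU_PASID_FORMAT_SMMUV3 are assumed to match the uapi patch earlier
in this series; treat the helper itself as hypothetical):

static int
vfio_pasid_table_config_valid(struct iommu_pasid_table_config *cfg)
{
	/* the version field Alex notes is currently ignored */
	if (cfg->version != PASID_TABLE_CFG_VERSION_1)
		return -EINVAL;

	/* reject formats this kernel does not know about */
	if (cfg->format != IOMMU_PASID_FORMAT_SMMUV3)
		return -EINVAL;

	return 0;
}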


Re: [PATCH v6 05/22] iommu: Introduce cache_invalidate API

2019-03-21 Thread Jacob Pan
On Thu, 21 Mar 2019 15:32:45 +0100
Auger Eric  wrote:

> Hi jean, Jacob,
> 
> On 3/21/19 3:13 PM, Jean-Philippe Brucker wrote:
> > On 21/03/2019 13:54, Auger Eric wrote:  
> >> Hi Jacob, Jean-Philippe,
> >>
> >> On 3/20/19 5:50 PM, Jean-Philippe Brucker wrote:  
> >>> On 20/03/2019 16:37, Jacob Pan wrote:
> >>> [...]  
> > +struct iommu_inv_addr_info {
> > +#define IOMMU_INV_ADDR_FLAGS_PASID (1 << 0)
> > +#define IOMMU_INV_ADDR_FLAGS_ARCHID (1 << 1)
> > +#define IOMMU_INV_ADDR_FLAGS_LEAF  (1 << 2)
> > +   __u32   flags;
> > +   __u32   archid;
> > +   __u64   pasid;
> > +   __u64   addr;
> > +   __u64   granule_size;
> > +   __u64   nb_granules;
> > +};
> > +
> > +/**
> > + * First level/stage invalidation information
> > + * @cache: bitfield that allows to select which caches to
> > invalidate
> > + * @granularity: defines the lowest granularity used for the
> > invalidation:
> > + * domain > pasid > addr
> > + *
> > + * Not all the combinations of cache/granularity make sense:
> > + *
> > + * type        |   DEV_IOTLB   |     IOTLB     |     PASID     |
> > + * granularity |               |               |     cache     |
> > + * ------------+---------------+---------------+---------------+
> > + * DOMAIN      |      N/A      |       Y       |       Y       |
> > + * PASID       |       Y       |       Y       |       Y       |
> > + * ADDR        |       Y       |       Y       |      N/A      |
> > + */
> > +struct iommu_cache_invalidate_info {
> > +#define IOMMU_CACHE_INVALIDATE_INFO_VERSION_1 1
> > +   __u32   version;
> > +/* IOMMU paging structure cache */
> > +#define IOMMU_CACHE_INV_TYPE_IOTLB     (1 << 0) /* IOMMU IOTLB */
> > +#define IOMMU_CACHE_INV_TYPE_DEV_IOTLB (1 << 1) /* Device IOTLB */
> > +#define IOMMU_CACHE_INV_TYPE_PASID     (1 << 2) /* PASID cache */
>  Just a clarification, this used to be an enum. You do intend to
>  issue a single invalidation request on multiple cache types?
>  Perhaps for virtio-IOMMU? I only see a single cache type in your
>  patch #14. For VT-d we plan to issue one cache type at a time
>  for now. So this format works for us.  
> >>>
> >>> Yes for virtio-iommu I'd like as little overhead as possible,
> >>> which means a single invalidation message to hit both IOTLB and
> >>> ATC at once, and the ability to specify multiple pages with
> >>> @nb_granules.  
> >> The original request/explanation from Jean-Philippe can be found
> >> here: https://lkml.org/lkml/2019/1/28/1497
> >>  
> >>>  
>  However, if multiple cache types are issued in a single
>  invalidation. They must share a single granularity, not all
>  combinations are valid. e.g. dev IOTLB does not support domain
>  granularity. Just a reminder, not an issue. Driver could filter
>  out invalid combinations.  
> >> Sure I will add a comment about this restriction.  
> >>>
> >>> Agreed. Even the core could filter out invalid combinations based
> >>> on the table above: IOTLB and domain granularity are N/A.  
> >> I don't get this sentence. What about vtd IOTLB domain-selective
> >> invalidation:  
> > 
> > My mistake: I meant dev-IOTLB and domain granularity are N/A  
> 
> Ah OK, no worries.
> 
> How do we proceed further with those user APIs? Besides the comment to
> be added above and previous suggestion from Jean ("Invalidations by
> @granularity use field ...), have we reached a consensus now on:
> 
> - attach/detach_pasid_table
> - cache_invalidate
> - fault data and fault report API?
> 
These APIs are sufficient for today's VT-d use and leave enough space
for extension. E.g. new fault reasons.

I have cherry picked the above APIs from your patchset to enable VT-d
nested translation. Going forward, I will reuse these until they get
merged.
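
To make the filtering discussed above concrete, a core-level check
along the lines of the cache/granularity table could look like this
(a sketch; the IOMMU_INV_GRANU_* names are assumed to match the
granularity enum in the uapi patch):

static bool iommu_cache_inv_valid(u8 cache, u8 granularity)
{
	switch (granularity) {
	case IOMMU_INV_GRANU_DOMAIN:
		/* dev-IOTLB cannot be invalidated per domain (N/A above) */
		return !(cache & IOMMU_CACHE_INV_TYPE_DEV_IOTLB);
	case IOMMU_INV_GRANU_PASID:
		/* all three cache types are valid */
		return true;
	case IOMMU_INV_GRANU_ADDR:
		/* the PASID cache cannot be invalidated by address (N/A) */
		return !(cache & IOMMU_CACHE_INV_TYPE_PASID);
	default:
		return false;
	}
}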

> If not, please let me know.
> 
> Thanks
> 
> Eric
> 
> 
> > 
> > Thanks,
> > Jean
> >   
> >> "
> >> • IOTLB entries caching mappings associated with the specified
> >> domain-id are invalidated.
> >> • Paging-structure-cache entries caching mappings associated with
> >> the specified domain-id are invalidated.
> >> "
> >>
> >> Thanks
> >>
> >> Eric
> >>  
> >>>
> >>> Thanks,
> >>> Jean
> >>>  
>   
> > +   __u8cache;
> > +   __u8granularity;
> > +   __u8padding[2];
> > +   union {
> > +   __u64   pasid;
> > +   struct iommu_inv_addr_info addr_info;
> > +   };
> > +};
> > +
> > +
> >  #endif /* _UAPI_IOMMU_H */  
> 
>  [Jacob Pan]

Re: [PATCH v6 02/22] iommu: introduce device fault data

2019-03-21 Thread Jacob Pan
On Sun, 17 Mar 2019 18:22:12 +0100
Eric Auger  wrote:

> From: Jacob Pan 
> 
> Device faults detected by IOMMU can be reported outside the IOMMU
> subsystem for further processing. This patch introduces
> a generic device fault data structure.
> 
> The fault can be either an unrecoverable fault or a page request,
> also referred to as a recoverable fault.
> 
> We only care about non internal faults that are likely to be reported
> to an external subsystem.
> 
> Signed-off-by: Jacob Pan 
> Signed-off-by: Jean-Philippe Brucker 
> Signed-off-by: Liu, Yi L 
> Signed-off-by: Ashok Raj 
> Signed-off-by: Eric Auger 
> 
> ---
> v4 -> v5:
> - simplified struct iommu_fault_event comment
> - Moved IOMMU_FAULT_PERM outside of the struct
> - Removed IOMMU_FAULT_PERM_INST
> - s/IOMMU_FAULT_PAGE_REQUEST_PASID_PRESENT/
>   IOMMU_FAULT_PAGE_REQUEST_PASID_VALID
> 
> v3 -> v4:
> - use a union containing either an unrecoverable fault or a page
>   request message. Move the device private data in the page request
>   structure. Reshuffle the fields and use flags.
> - move fault perm attributes to the uapi
> - remove a bunch of iommu_fault_reason enum values that were related
>   to internal errors
> ---
>  include/linux/iommu.h      |  44 ++
>  include/uapi/linux/iommu.h | 115 +++
>  2 files changed, 159 insertions(+)
>  create mode 100644 include/uapi/linux/iommu.h
> 
> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> index ffbbc7e39cee..c6f398f7e6e0 100644
> --- a/include/linux/iommu.h
> +++ b/include/linux/iommu.h
> @@ -25,6 +25,7 @@
>  #include <linux/errno.h>
>  #include <linux/err.h>
>  #include <linux/of.h>
> +#include <uapi/linux/iommu.h>
>  
>  #define IOMMU_READ   (1 << 0)
>  #define IOMMU_WRITE  (1 << 1)
> @@ -48,6 +49,7 @@ struct bus_type;
>  struct device;
>  struct iommu_domain;
>  struct notifier_block;
> +struct iommu_fault_event;
>  
>  /* iommu fault flags */
>  #define IOMMU_FAULT_READ 0x0
> @@ -55,6 +57,7 @@ struct notifier_block;
>  
>  typedef int (*iommu_fault_handler_t)(struct iommu_domain *,
>   struct device *, unsigned long, int, void *);
> +typedef int (*iommu_dev_fault_handler_t)(struct iommu_fault_event *, void *);
>  struct iommu_domain_geometry {
>   dma_addr_t aperture_start; /* First address that can be mapped */
> @@ -247,6 +250,46 @@ struct iommu_device {
>   struct device *dev;
>  };
>  
> +/**
> + * struct iommu_fault_event - Generic fault event
> + *
> + * Can represent recoverable faults such as a page requests or
> + * unrecoverable faults such as DMA or IRQ remapping faults.
> + *
> + * @fault: fault descriptor
> + * @iommu_private: used by the IOMMU driver for storing fault-specific
> + * data. Users should not modify this field before
> + * sending the fault response.
> + */
> +struct iommu_fault_event {
> + struct iommu_fault fault;
> + u64 iommu_private;
> +};
> +
> +/**
> + * struct iommu_fault_param - per-device IOMMU fault data
> + * @dev_fault_handler: Callback function to handle IOMMU faults at device level
> + * @data: handler private data
> + *
> + */
> +struct iommu_fault_param {
> + iommu_dev_fault_handler_t handler;
> + void *data;
> +};
> +
> +/**
> + * struct iommu_param - collection of per-device IOMMU data
> + *
> + * @fault_param: IOMMU detected device fault reporting data
> + *
> + * TODO: migrate other per device data pointers under iommu_dev_data, e.g.
> + *   struct iommu_group  *iommu_group;
> + *   struct iommu_fwspec *iommu_fwspec;
> + */
> +struct iommu_param {
> + struct iommu_fault_param *fault_param;
> +};
> +
>  int  iommu_device_register(struct iommu_device *iommu);
>  void iommu_device_unregister(struct iommu_device *iommu);
>  int  iommu_device_sysfs_add(struct iommu_device *iommu,
> @@ -422,6 +465,7 @@ struct iommu_ops {};
>  struct iommu_group {};
>  struct iommu_fwspec {};
>  struct iommu_device {};
> +struct iommu_fault_param {};
>  
>  static inline bool iommu_present(struct bus_type *bus)
>  {
> diff --git a/include/uapi/linux/iommu.h b/include/uapi/linux/iommu.h
> new file mode 100644
> index ..edcc0dda7993
> --- /dev/null
> +++ b/include/uapi/linux/iommu.h
> @@ -0,0 +1,115 @@
> +/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
> +/*
> + * IOMMU user API definitions
> + */
> +
> +#ifndef _UAPI_IOMMU_H
> +#define _UAPI_IOMMU_H
> +
> +#include 
> +
> +#define IOMMU_FAULT_PERM_WRITE   (1 << 0) /* write */
> +#define IOMMU_FAULT_PERM_EXEC(1 << 1) /* exec */
> +#define IOMMU_FAULT_PERM_PRIV(1 << 2) /* privileged */
> +
> +/*  Generic fault types, can be expanded IRQ remapping fault */
> +enum iommu_fault_type {
> + IOMMU_FAULT_DMA_UNRECOV = 1,/* unrecoverable fault */
> + IOMMU_FAULT_PAGE_REQ,   /* page request fault */
> +};
> +
> +enum iommu_fault_reason {
> + IOMMU_FAULT_REASON_UNKNOWN = 0,
> +
> + /* Could not access the PASID table (fetch caused external abort) */
> + 
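
As an illustration of how the structures above are meant to be used, an
IOMMU driver might report an unrecoverable fault roughly like this
(a sketch; iommu_report_device_fault() is introduced in patch 03/22 of
this series, and the field names are assumed from the truncated uapi):

	struct iommu_fault_event evt = {
		.fault.type = IOMMU_FAULT_DMA_UNRECOV,
		/* reason/address fields filled in from hardware state */
	};

	/* hand the fault to whoever registered a handler for this device */
	iommu_report_device_fault(dev, &evt);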

Re: [PATCH v6 03/22] iommu: introduce device fault report API

2019-03-21 Thread Alex Williamson
On Sun, 17 Mar 2019 18:22:13 +0100
Eric Auger  wrote:

> From: Jacob Pan 
> 
> Traditionally, device specific faults are detected and handled within
> their own device drivers. When IOMMU is enabled, faults such as DMA
> related transactions are detected by IOMMU. There is no generic
> reporting mechanism to report faults back to the in-kernel device
> driver or the guest OS in case of assigned devices.
> 
> This patch introduces a registration API for device specific fault
> handlers. This differs from the existing iommu_set_fault_handler/
> report_iommu_fault infrastructures in several ways:
> - it allows to report more sophisticated fault events (both
>   unrecoverable faults and page request faults) due to the nature
>   of the iommu_fault struct
> - it is device specific and not domain specific.
> 
> The current iommu_report_device_fault() implementation only handles
> the "shoot and forget" unrecoverable fault case. Handling of page
> request faults or stalled faults will come later.
> 
> Signed-off-by: Jacob Pan 
> Signed-off-by: Ashok Raj 
> Signed-off-by: Jean-Philippe Brucker 
> Signed-off-by: Eric Auger 
> 
> ---
> 
> v4 -> v5:
> - remove stuff related to recoverable faults
> ---
>  drivers/iommu/iommu.c | 134 +-
>  include/linux/iommu.h |  36 +++-
>  2 files changed, 168 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> index 33a982e33716..56d5bf68de53 100644
> --- a/drivers/iommu/iommu.c
> +++ b/drivers/iommu/iommu.c
> @@ -648,6 +648,13 @@ int iommu_group_add_device(struct iommu_group *group, struct device *dev)
>   goto err_free_name;
>   }
>  
> + dev->iommu_param = kzalloc(sizeof(*dev->iommu_param), GFP_KERNEL);
> + if (!dev->iommu_param) {
> + ret = -ENOMEM;
> + goto err_free_name;
> + }
> + mutex_init(>iommu_param->lock);
> +
>   kobject_get(group->devices_kobj);
>  
>   dev->iommu_group = group;
> @@ -678,6 +685,7 @@ int iommu_group_add_device(struct iommu_group *group, struct device *dev)
>   mutex_unlock(&group->mutex);
>   dev->iommu_group = NULL;
>   kobject_put(group->devices_kobj);
> + kfree(dev->iommu_param);
>  err_free_name:
>   kfree(device->name);
>  err_remove_link:
> @@ -724,7 +732,7 @@ void iommu_group_remove_device(struct device *dev)
>   sysfs_remove_link(>kobj, "iommu_group");
>  
>   trace_remove_device_from_group(group->id, dev);
> -
> + kfree(dev->iommu_param);
>   kfree(device->name);
>   kfree(device);
>   dev->iommu_group = NULL;
> @@ -858,6 +866,130 @@ int iommu_group_unregister_notifier(struct iommu_group *group,
>  }
>  EXPORT_SYMBOL_GPL(iommu_group_unregister_notifier);
>  
> +/**
> + * iommu_register_device_fault_handler() - Register a device fault handler
> + * @dev: the device
> + * @handler: the fault handler
> + * @data: private data passed as argument to the handler
> + *
> + * When an IOMMU fault event is received, this handler gets called with the
> + * fault event and data as argument.
> + *
> + * Return 0 if the fault handler was installed successfully, or an error.
> + */
> +int iommu_register_device_fault_handler(struct device *dev,
> + iommu_dev_fault_handler_t handler,
> + void *data)
> +{
> + struct iommu_param *param = dev->iommu_param;
> + int ret = 0;
> +
> + /*
> +  * Device iommu_param should have been allocated when device is
> +  * added to its iommu_group.
> +  */
> + if (!param)
> + return -EINVAL;
> +
> + mutex_lock(>lock);
> + /* Only allow one fault handler registered for each device */
> + if (param->fault_param) {
> + ret = -EBUSY;
> + goto done_unlock;
> + }
> +
> + get_device(dev);
> + param->fault_param =
> + kzalloc(sizeof(struct iommu_fault_param), GFP_KERNEL);
> + if (!param->fault_param) {
> + put_device(dev);
> + ret = -ENOMEM;
> + goto done_unlock;
> + }
> + mutex_init(>fault_param->lock);
> + param->fault_param->handler = handler;
> + param->fault_param->data = data;
> + INIT_LIST_HEAD(>fault_param->faults);
> +
> +done_unlock:
> + mutex_unlock(>lock);
> +
> + return ret;
> +}
> +EXPORT_SYMBOL_GPL(iommu_register_device_fault_handler);
> +
> +/**
> + * iommu_unregister_device_fault_handler() - Unregister the device fault handler
> + * @dev: the device
> + *
> + * Remove the device fault handler installed with
> + * iommu_register_device_fault_handler().
> + *
> + * Return 0 on success, or an error.
> + */
> +int iommu_unregister_device_fault_handler(struct device *dev)
> +{
> + struct iommu_param *param = dev->iommu_param;
> + int ret = 0;
> +
> + if (!param)
> + return -EINVAL;
> +
> + mutex_lock(>lock);
> +
> + if (!param->fault_param)
> + 
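
Usage-wise, a device driver (or VFIO, later in this series) would
consume this API along these lines (a sketch; the mydrv names are
hypothetical):

static int mydrv_fault_handler(struct iommu_fault_event *evt, void *data)
{
	struct mydrv *drv = data;

	/* evt->fault is the struct iommu_fault from patch 02/22 */
	dev_warn(drv->dev, "unrecoverable IOMMU fault\n");
	return 0;
}

	/* in probe: */
	ret = iommu_register_device_fault_handler(dev, mydrv_fault_handler,
						  drv);
	if (ret)
		return ret;

	/* in remove/teardown: */
	iommu_unregister_device_fault_handler(dev);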

Re: [PATCH 1/4] ACPI/IORT: Check ATS capability in root complex nodes

2019-03-21 Thread Sinan Kaya

On 3/20/2019 1:36 PM, Jean-Philippe Brucker wrote:

err = pci_for_each_dma_alias(to_pci_dev(dev),
 iort_pci_iommu_init, &info);
+
+   if (!err && !iort_pci_rc_supports_ats(node))
+   dev->iommu_fwspec->flags |= IOMMU_FWSPEC_PCI_NO_ATS;


Is it possible to remove the !err check here? ATS support is just a flag
in the IORT table. Does this decision have to rely on having a correct
DMA alias?


Re: [PATCH 3/4] iommu/arm-smmu-v3: Add support for PCI ATS

2019-03-21 Thread Sinan Kaya

On 3/20/2019 1:36 PM, Jean-Philippe Brucker wrote:

pos = pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_ATS);
+   if (!pos)
+   return -ENOSYS;
+


You don't need this. pci_enable_ats() validates this via:

if (!dev->ats_cap)
return -EINVAL;




Re: [PATCH v6 05/22] iommu: Introduce cache_invalidate API

2019-03-21 Thread Auger Eric
Hi jean, Jacob,

On 3/21/19 3:13 PM, Jean-Philippe Brucker wrote:
> On 21/03/2019 13:54, Auger Eric wrote:
>> Hi Jacob, Jean-Philippe,
>>
>> On 3/20/19 5:50 PM, Jean-Philippe Brucker wrote:
>>> On 20/03/2019 16:37, Jacob Pan wrote:
>>> [...]
> +struct iommu_inv_addr_info {
> +#define IOMMU_INV_ADDR_FLAGS_PASID   (1 << 0)
> +#define IOMMU_INV_ADDR_FLAGS_ARCHID  (1 << 1)
> +#define IOMMU_INV_ADDR_FLAGS_LEAF (1 << 2)
> + __u32   flags;
> + __u32   archid;
> + __u64   pasid;
> + __u64   addr;
> + __u64   granule_size;
> + __u64   nb_granules;
> +};
> +
> +/**
> + * First level/stage invalidation information
> + * @cache: bitfield that allows to select which caches to invalidate
> + * @granularity: defines the lowest granularity used for the
> invalidation:
> + * domain > pasid > addr
> + *
> + * Not all the combinations of cache/granularity make sense:
> + *
> > + * type        |   DEV_IOTLB   |     IOTLB     |     PASID     |
> > + * granularity |               |               |     cache     |
> > + * ------------+---------------+---------------+---------------+
> > + * DOMAIN      |      N/A      |       Y       |       Y       |
> > + * PASID       |       Y       |       Y       |       Y       |
> > + * ADDR        |       Y       |       Y       |      N/A      |
> + */
> +struct iommu_cache_invalidate_info {
> +#define IOMMU_CACHE_INVALIDATE_INFO_VERSION_1 1
> + __u32   version;
> +/* IOMMU paging structure cache */
> +#define IOMMU_CACHE_INV_TYPE_IOTLB   (1 << 0) /* IOMMU IOTLB */
> +#define IOMMU_CACHE_INV_TYPE_DEV_IOTLB (1 << 1) /* Device IOTLB */
> +#define IOMMU_CACHE_INV_TYPE_PASID     (1 << 2) /* PASID cache */
 Just a clarification, this used to be an enum. You do intend to issue a
 single invalidation request on multiple cache types? Perhaps for
 virtio-IOMMU? I only see a single cache type in your patch #14. For VT-d
 we plan to issue one cache type at a time for now. So this format works
 for us.
>>>
>>> Yes for virtio-iommu I'd like as little overhead as possible, which
>>> means a single invalidation message to hit both IOTLB and ATC at once,
>>> and the ability to specify multiple pages with @nb_granules.
>> The original request/explanation from Jean-Philippe can be found here:
>> https://lkml.org/lkml/2019/1/28/1497
>>
>>>
 However, if multiple cache types are issued in a single invalidation.
 They must share a single granularity, not all combinations are valid.
 e.g. dev IOTLB does not support domain granularity. Just a reminder,
 not an issue. Driver could filter out invalid combinations.
>> Sure I will add a comment about this restriction.
>>>
>>> Agreed. Even the core could filter out invalid combinations based on the
>>> table above: IOTLB and domain granularity are N/A.
>> I don't get this sentence. What about vtd IOTLB domain-selective
>> invalidation:
> 
> My mistake: I meant dev-IOTLB and domain granularity are N/A

Ah OK, no worries.

How do we proceed further with those user APIs? Besides the comment to
be added above and previous suggestion from Jean ("Invalidations by
@granularity use field ...), have we reached a consensus now on:

- attach/detach_pasid_table
- cache_invalidate
- fault data and fault report API?

If not, please let me know.

Thanks

Eric


> 
> Thanks,
> Jean
> 
>> "
>> • IOTLB entries caching mappings associated with the specified domain-id
>> are invalidated.
>> • Paging-structure-cache entries caching mappings associated with the
>> specified domain-id are invalidated.
>> "
>>
>> Thanks
>>
>> Eric
>>
>>>
>>> Thanks,
>>> Jean
>>>

> + __u8cache;
> + __u8granularity;
> + __u8padding[2];
> + union {
> + __u64   pasid;
> + struct iommu_inv_addr_info addr_info;
> + };
> +};
> +
> +
>  #endif /* _UAPI_IOMMU_H */

 [Jacob Pan]

Re: [PATCH v6 05/22] iommu: Introduce cache_invalidate API

2019-03-21 Thread Jean-Philippe Brucker
On 21/03/2019 13:54, Auger Eric wrote:
> Hi Jacob, Jean-Philippe,
> 
> On 3/20/19 5:50 PM, Jean-Philippe Brucker wrote:
>> On 20/03/2019 16:37, Jacob Pan wrote:
>> [...]
 +struct iommu_inv_addr_info {
 +#define IOMMU_INV_ADDR_FLAGS_PASID (1 << 0)
 +#define IOMMU_INV_ADDR_FLAGS_ARCHID   (1 << 1)
 +#define IOMMU_INV_ADDR_FLAGS_LEAF (1 << 2)
 +  __u32   flags;
 +  __u32   archid;
 +  __u64   pasid;
 +  __u64   addr;
 +  __u64   granule_size;
 +  __u64   nb_granules;
 +};
 +
 +/**
 + * First level/stage invalidation information
 + * @cache: bitfield that allows to select which caches to invalidate
 + * @granularity: defines the lowest granularity used for the
 invalidation:
 + * domain > pasid > addr
 + *
 + * Not all the combinations of cache/granularity make sense:
 + *
 + * type        |   DEV_IOTLB   |     IOTLB     |     PASID     |
 + * granularity |               |               |     cache     |
 + * ------------+---------------+---------------+---------------+
 + * DOMAIN      |      N/A      |       Y       |       Y       |
 + * PASID       |       Y       |       Y       |       Y       |
 + * ADDR        |       Y       |       Y       |      N/A      |
 + */
 +struct iommu_cache_invalidate_info {
 +#define IOMMU_CACHE_INVALIDATE_INFO_VERSION_1 1
 +  __u32   version;
 +/* IOMMU paging structure cache */
 +#define IOMMU_CACHE_INV_TYPE_IOTLB     (1 << 0) /* IOMMU IOTLB */
 +#define IOMMU_CACHE_INV_TYPE_DEV_IOTLB (1 << 1) /* Device IOTLB */
 +#define IOMMU_CACHE_INV_TYPE_PASID     (1 << 2) /* PASID cache */
>>> Just a clarification, this used to be an enum. You do intend to issue a
>>> single invalidation request on multiple cache types? Perhaps for
>>> virtio-IOMMU? I only see a single cache type in your patch #14. For VT-d
>>> we plan to issue one cache type at a time for now. So this format works
>>> for us.
>>
>> Yes for virtio-iommu I'd like as little overhead as possible, which
>> means a single invalidation message to hit both IOTLB and ATC at once,
>> and the ability to specify multiple pages with @nb_granules.
> The original request/explanation from Jean-Philippe can be found here:
> https://lkml.org/lkml/2019/1/28/1497
> 
>>
>>> However, if multiple cache types are issued in a single invalidation.
>>> They must share a single granularity, not all combinations are valid.
>>> e.g. dev IOTLB does not support domain granularity. Just a reminder,
>>> not an issue. Driver could filter out invalid combinations.
> Sure I will add a comment about this restriction.
>>
>> Agreed. Even the core could filter out invalid combinations based on the
>> table above: IOTLB and domain granularity are N/A.
> I don't get this sentence. What about vtd IOTLB domain-selective
> invalidation:

My mistake: I meant dev-IOTLB and domain granularity are N/A

Thanks,
Jean

> "
> • IOTLB entries caching mappings associated with the specified domain-id
> are invalidated.
> • Paging-structure-cache entries caching mappings associated with the
> specified domain-id are invalidated.
> "
> 
> Thanks
> 
> Eric
> 
>>
>> Thanks,
>> Jean
>>
>>>
 +  __u8cache;
 +  __u8granularity;
 +  __u8padding[2];
 +  union {
 +  __u64   pasid;
 +  struct iommu_inv_addr_info addr_info;
 +  };
 +};
 +
 +
  #endif /* _UAPI_IOMMU_H */
>>>
>>> [Jacob Pan]

Re: [PATCH v6 05/22] iommu: Introduce cache_invalidate API

2019-03-21 Thread Auger Eric
Hi Jacob, Jean-Philippe,

On 3/20/19 5:50 PM, Jean-Philippe Brucker wrote:
> On 20/03/2019 16:37, Jacob Pan wrote:
> [...]
>>> +struct iommu_inv_addr_info {
>>> +#define IOMMU_INV_ADDR_FLAGS_PASID (1 << 0)
>>> +#define IOMMU_INV_ADDR_FLAGS_ARCHID (1 << 1)
>>> +#define IOMMU_INV_ADDR_FLAGS_LEAF  (1 << 2)
>>> +   __u32   flags;
>>> +   __u32   archid;
>>> +   __u64   pasid;
>>> +   __u64   addr;
>>> +   __u64   granule_size;
>>> +   __u64   nb_granules;
>>> +};
>>> +
>>> +/**
>>> + * First level/stage invalidation information
>>> + * @cache: bitfield that allows to select which caches to invalidate
>>> + * @granularity: defines the lowest granularity used for the
>>> invalidation:
>>> + * domain > pasid > addr
>>> + *
>>> + * Not all the combinations of cache/granularity make sense:
>>> + *
>>> + * type        |   DEV_IOTLB   |     IOTLB     |     PASID     |
>>> + * granularity |               |               |     cache     |
>>> + * ------------+---------------+---------------+---------------+
>>> + * DOMAIN      |      N/A      |       Y       |       Y       |
>>> + * PASID       |       Y       |       Y       |       Y       |
>>> + * ADDR        |       Y       |       Y       |      N/A      |
>>> + */
>>> +struct iommu_cache_invalidate_info {
>>> +#define IOMMU_CACHE_INVALIDATE_INFO_VERSION_1 1
>>> +   __u32   version;
>>> +/* IOMMU paging structure cache */
>>> +#define IOMMU_CACHE_INV_TYPE_IOTLB (1 << 0) /* IOMMU IOTLB */
>>> +#define IOMMU_CACHE_INV_TYPE_DEV_IOTLB (1 << 1) /* Device IOTLB */
>>> +#define IOMMU_CACHE_INV_TYPE_PASID     (1 << 2) /* PASID cache */
>> Just a clarification, this used to be an enum. You do intend to issue a
>> single invalidation request on multiple cache types? Perhaps for
>> virtio-IOMMU? I only see a single cache type in your patch #14. For VT-d
>> we plan to issue one cache type at a time for now. So this format works
>> for us.
> 
> Yes for virtio-iommu I'd like as little overhead as possible, which
> means a single invalidation message to hit both IOTLB and ATC at once,
> and the ability to specify multiple pages with @nb_granules.
The original request/explanation from Jean-Philippe can be found here:
https://lkml.org/lkml/2019/1/28/1497

> 
>> However, if multiple cache types are issued in a single invalidation.
>> They must share a single granularity, not all combinations are valid.
>> e.g. dev IOTLB does not support domain granularity. Just a reminder,
>> not an issue. Driver could filter out invalid combinations.
Sure I will add a comment about this restriction.
> 
> Agreed. Even the core could filter out invalid combinations based on the
> table above: IOTLB and domain granularity are N/A.
I don't get this sentence. What about vtd IOTLB domain-selective
invalidation:
"
• IOTLB entries caching mappings associated with the specified domain-id
are invalidated.
• Paging-structure-cache entries caching mappings associated with the
specified domain-id are invalidated.
"

Thanks

Eric

> 
> Thanks,
> Jean
> 
>>
>>> +   __u8cache;
>>> +   __u8granularity;
>>> +   __u8padding[2];
>>> +   union {
>>> +   __u64   pasid;
>>> +   struct iommu_inv_addr_info addr_info;
>>> +   };
>>> +};
>>> +
>>> +
>>>  #endif /* _UAPI_IOMMU_H */
>>
>> [Jacob Pan]