Re: [PATCH] iommu/vt-d: Add RPLS to quirk list to skip TE disabling

2022-02-23 Thread Lu Baolu

On 2/23/22 2:29 PM, Tejas Upadhyay wrote:

The VT-d spec requires (10.4.4 Global Command Register, TE
field) that:

Hardware implementations supporting DMA draining must drain
any in-flight DMA read/write requests queued within the
Root-Complex before completing the translation enable
command and reflecting the status of the command through
the TES field in the Global Status register.

Unfortunately, some integrated graphics devices fail to do
so after some kind of power state transition. As a result,
the system might get stuck in iommu_disable_translation(),
waiting for the completion of the TE transition.

This adds RPLS to the quirk list for those devices and skips
TE disabling if the quirk hits.

Fixes: https://gitlab.freedesktop.org/drm/intel/-/issues/4898
Fixes: LCK-10789


Remove this please.


Tested-by: Raviteja Goud Talla 
Cc: Rodrigo Vivi 
Signed-off-by: Tejas Upadhyay 
---
  drivers/iommu/intel/iommu.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index 92fea3fbbb11..be9487516617 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -5743,7 +5743,7 @@ static void quirk_igfx_skip_te_disable(struct pci_dev *dev)
ver = (dev->device >> 8) & 0xff;
if (ver != 0x45 && ver != 0x46 && ver != 0x4c &&
ver != 0x4e && ver != 0x8a && ver != 0x98 &&
-   ver != 0x9a)
+   ver != 0x9a && ver != 0xa7)
return;
  
  	if (risky_device(dev))


This is a quirk for integrated graphics devices. Rodrigo, does this
hardware need this quirk as well?

Best regards,
baolu
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v6 01/11] iommu: Add dma ownership management interfaces

2022-02-23 Thread Lu Baolu

On 2/24/22 1:16 PM, Lu Baolu wrote:

Hi Robin and Jason,

On 2/24/22 2:02 AM, Jason Gunthorpe wrote:

On Wed, Feb 23, 2022 at 06:00:06PM +, Robin Murphy wrote:


...and equivalently just set owner_cnt directly to 0 here. I don't see a
realistic use-case for any driver to claim the same group more than once,
and allowing it in the API just feels like opening up various potential
corners for things to get out of sync.

I am Ok if we toss it out to get this merged, as there is no in-kernel
user right now.


So we don't need the owner pointer in the API anymore, right?


Oh, NO.

The owner token represents that the group has been claimed for user
space access. And the default domain auto-attach policy will be changed
accordingly.

So we still need this. Sorry for the noise.

Best regards,
baolu
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v6 01/11] iommu: Add dma ownership management interfaces

2022-02-23 Thread Lu Baolu

On 2/24/22 2:00 AM, Robin Murphy wrote:

On 2022-02-18 00:55, Lu Baolu wrote:
[...]

+/**
+ * iommu_group_claim_dma_owner() - Set DMA ownership of a group
+ * @group: The group.
+ * @owner: Caller specified pointer. Used for exclusive ownership.
+ *
+ * This is to support backward compatibility for vfio which manages
+ * the dma ownership in iommu_group level. New invocations on this
+ * interface should be prohibited.
+ */
+int iommu_group_claim_dma_owner(struct iommu_group *group, void *owner)
+{
+    int ret = 0;
+
+    mutex_lock(&group->mutex);
+    if (group->owner_cnt) {


To clarify the comment buried in the other thread, I really think we 
should just unconditionally flag the error here...



+    if (group->owner != owner) {
+    ret = -EPERM;
+    goto unlock_out;
+    }
+    } else {
+    if (group->domain && group->domain != group->default_domain) {
+    ret = -EBUSY;
+    goto unlock_out;
+    }
+
+    group->owner = owner;
+    if (group->domain)
+    __iommu_detach_group(group->domain, group);
+    }
+
+    group->owner_cnt++;
+unlock_out:
+    mutex_unlock(&group->mutex);
+
+    return ret;
+}
+EXPORT_SYMBOL_GPL(iommu_group_claim_dma_owner);
+
+/**
+ * iommu_group_release_dma_owner() - Release DMA ownership of a group
+ * @group: The group.
+ *
+ * Release the DMA ownership claimed by iommu_group_claim_dma_owner().
+ */
+void iommu_group_release_dma_owner(struct iommu_group *group)
+{
+    mutex_lock(&group->mutex);
+    if (WARN_ON(!group->owner_cnt || !group->owner))
+    goto unlock_out;
+
+    if (--group->owner_cnt > 0)
+    goto unlock_out;


...and equivalently just set owner_cnt directly to 0 here. I don't see a 
realistic use-case for any driver to claim the same group more than 
once, and allowing it in the API just feels like opening up various 
potential corners for things to get out of sync.


Yeah! Both make sense to me. I will also drop the owner token in the API
as it's no longer necessary after the change.

I think that's the only significant concern I have left with the series 
as a whole - you can consider my other grumbles non-blocking :)


Thank you and very appreciated for your time!

Best regards,
baolu
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [PATCH v6 01/11] iommu: Add dma ownership management interfaces

2022-02-23 Thread Lu Baolu

Hi Robin and Jason,

On 2/24/22 2:02 AM, Jason Gunthorpe wrote:

On Wed, Feb 23, 2022 at 06:00:06PM +, Robin Murphy wrote:


...and equivalently just set owner_cnt directly to 0 here. I don't see a
realistic use-case for any driver to claim the same group more than once,
and allowing it in the API just feels like opening up various potential
corners for things to get out of sync.

I am Ok if we toss it out to get this merged, as there is no in-kernel
user right now.


So we don't need the owner pointer in the API anymore, right? As we will
only allow the claiming interface to be called once, this token is
unnecessary.

Best regards,
baolu
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v4 3/8] hisi_ptt: Register PMU device for PTT trace

2022-02-23 Thread Yicong Yang via iommu
On 2022/2/22 19:17, John Garry wrote:
> 
>> +
>>   static irqreturn_t hisi_ptt_irq(int irq, void *context)
>>   {
>>   struct hisi_ptt *hisi_ptt = context;
>> @@ -169,7 +233,7 @@ static irqreturn_t hisi_ptt_irq(int irq, void *context)
>>   if (!(status & HISI_PTT_TRACE_INT_STAT_MASK))
>>   return IRQ_NONE;
>>   -    return IRQ_HANDLED;
>> +    return IRQ_WAKE_THREAD;
>>   }
>>     static void hisi_ptt_irq_free_vectors(void *pdev)
>> @@ -192,8 +256,10 @@ static int hisi_ptt_register_irq(struct hisi_ptt 
>> *hisi_ptt)
>>   if (ret < 0)
>>   return ret;
>>   -    ret = devm_request_irq(&pdev->dev, pci_irq_vector(pdev, 
>> HISI_PTT_TRACE_DMA_IRQ),
>> -   hisi_ptt_irq, 0, DRV_NAME, hisi_ptt);
>> +    ret = devm_request_threaded_irq(&pdev->dev,
> 
> why add code in patch 2/8 and then immediately change 3/8?
> 

A bad patch split on my side. As replied to Patch 2, the whole IRQ handler
part will be removed in Patch 2, so we won't have this change here.

>> +    pci_irq_vector(pdev, HISI_PTT_TRACE_DMA_IRQ),
>> +    hisi_ptt_irq, hisi_ptt_isr, 0,
>> +    DRV_NAME, hisi_ptt);
>>   if (ret) {
>>   pci_err(pdev, "failed to request irq %d, ret = %d.\n",
>>   pci_irq_vector(pdev, HISI_PTT_TRACE_DMA_IRQ), ret);
>> @@ -270,6 +336,429 @@ static void hisi_ptt_init_ctrls(struct hisi_ptt 
>> *hisi_ptt)
>>   hisi_ptt->trace_ctrl.default_cpu = 
>> cpumask_first(cpumask_of_node(dev_to_node(&pdev->dev)));
>>   }
>>   +#define HISI_PTT_PMU_FILTER_IS_PORT    BIT(19)
>> +#define HISI_PTT_PMU_FILTER_VAL_MASK    GENMASK(15, 0)
>> +#define HISI_PTT_PMU_DIRECTION_MASK    GENMASK(23, 20)
>> +#define HISI_PTT_PMU_TYPE_MASK    GENMASK(31, 24)
>> +#define HISI_PTT_PMU_FORMAT_MASK    GENMASK(35, 32)
>> +
>> +static ssize_t available_root_port_filters_show(struct device *dev,
>> +    struct device_attribute *attr,
>> +    char *buf)
>> +{
>> +    struct hisi_ptt *hisi_ptt = to_hisi_ptt(dev_get_drvdata(dev));
>> +    struct hisi_ptt_filter_desc *filter;
>> +    int pos = 0;
>> +
>> +    if (list_empty(&hisi_ptt->port_filters))
>> +    return sysfs_emit(buf, "\n");
>> +
>> +    mutex_lock(&hisi_ptt->mutex);
>> +    list_for_each_entry(filter, &hisi_ptt->port_filters, list)
>> +    pos += sysfs_emit_at(buf, pos, "%s    0x%05lx\n",
>> + pci_name(filter->pdev),
>> + hisi_ptt_get_filter_val(filter->pdev) |
>> + HISI_PTT_PMU_FILTER_IS_PORT);
>> +
>> +    mutex_unlock(&hisi_ptt->mutex);
>> +    return pos;
>> +}
>> +static DEVICE_ATTR_ADMIN_RO(available_root_port_filters);
>> +
>> +static ssize_t available_requester_filters_show(struct device *dev,
>> +    struct device_attribute *attr,
>> +    char *buf)
>> +{
>> +    struct hisi_ptt *hisi_ptt = to_hisi_ptt(dev_get_drvdata(dev));
>> +    struct hisi_ptt_filter_desc *filter;
>> +    int pos = 0;
>> +
>> +    if (list_empty(&hisi_ptt->port_filters))
> 
> is this supposed to be req_filters? And is it safe to access without locking?
> 

Thanks for catching this! Will fix it.

>> +    return sysfs_emit(buf, "\n");
>> +
>> +    mutex_lock(&hisi_ptt->mutex);
>> +    list_for_each_entry(filter, &hisi_ptt->req_filters, list)
>> +    pos += sysfs_emit_at(buf, pos, "%s    0x%05x\n",
>> + pci_name(filter->pdev),
>> + hisi_ptt_get_filter_val(filter->pdev));
>> +
>> +    mutex_unlock(&hisi_ptt->mutex);
>> +    return pos;
>> +}
>> +static DEVICE_ATTR_ADMIN_RO(available_requester_filters);
>> +
>> +PMU_FORMAT_ATTR(filter,    "config:0-19");
>> +PMU_FORMAT_ATTR(direction,    "config:20-23");
>> +PMU_FORMAT_ATTR(type,    "config:24-31");
>> +PMU_FORMAT_ATTR(format,    "config:32-35");
>> +
>> +static struct attribute *hisi_ptt_pmu_format_attrs[] = {
>> +    &format_attr_filter.attr,
>> +    &format_attr_direction.attr,
>> +    &format_attr_type.attr,
>> +    &format_attr_format.attr,
>> +    NULL
>> +};
>> +
>> +static struct attribute_group hisi_ptt_pmu_format_group = {
>> +    .name = "format",
>> +    .attrs = hisi_ptt_pmu_format_attrs,
>> +};
>> +
>> +static struct attribute *hisi_ptt_pmu_filter_attrs[] = {
>> +    &dev_attr_available_root_port_filters.attr,
>> +    &dev_attr_available_requester_filters.attr,
>> +    NULL
>> +};
>> +
>> +static struct attribute_group hisi_ptt_pmu_filter_group = {
>> +    .attrs = hisi_ptt_pmu_filter_attrs,
>> +};
>> +
>> +static const struct attribute_group *hisi_ptt_pmu_groups[] = {
>> +    &hisi_ptt_pmu_format_group,
>> +    &hisi_ptt_pmu_filter_group,
>> +    NULL
>> +};
>> +
>> +/*
>> + * The supported value of the direction parameter. See hisi_ptt.rst
>> + * documentation for more details.
>> + */
>> +static u32 hisi_ptt_trace_available_direction[] = {
>> +    0,
>> +    1,
>> +    2,
>> +    3,
> 
> this seems a very odd array.
> 

I copied part of the definition of this parameter from the hisi_ptt.rst below:

When the desired format is 4DW, directions and 

Re: [PATCH v4 2/8] hwtracing: Add trace function support for HiSilicon PCIe Tune and Trace device

2022-02-23 Thread Yicong Yang via iommu
On 2022/2/22 19:06, John Garry wrote:
> On 21/02/2022 08:43, Yicong Yang wrote:
>> HiSilicon PCIe tune and trace device(PTT) is a PCIe Root Complex
>> integrated Endpoint(RCiEP) device, providing the capability
>> to dynamically monitor and tune the PCIe traffic, and trace
>> the TLP headers.
>>
>> Add the driver for the device to enable the trace function.
>> This patch adds the basic trace function, including the device's
>> probe and initialization, functions for trace buffer allocation
>> and trace enable/disable, and registers an interrupt handler to
>> simply respond to the DMA events. The user interface of trace
>> will be added in the following patch.
>>
> 
> Fill commit message lines upto 75 characters

Hi John,

Thanks for the comments.

The commit message is within 75 characters. I checked again and checkpatch
didn't warn about this.

> 
>> Signed-off-by: Yicong Yang 
>> ---
>>   drivers/Makefile |   1 +
>>   drivers/hwtracing/Kconfig    |   2 +
>>   drivers/hwtracing/ptt/Kconfig    |  11 +
>>   drivers/hwtracing/ptt/Makefile   |   2 +
>>   drivers/hwtracing/ptt/hisi_ptt.c | 370 +++
>>   drivers/hwtracing/ptt/hisi_ptt.h | 149 +
>>   6 files changed, 535 insertions(+)
>>   create mode 100644 drivers/hwtracing/ptt/Kconfig
>>   create mode 100644 drivers/hwtracing/ptt/Makefile
>>   create mode 100644 drivers/hwtracing/ptt/hisi_ptt.c
>>   create mode 100644 drivers/hwtracing/ptt/hisi_ptt.h
>>
>> diff --git a/drivers/Makefile b/drivers/Makefile
>> index a110338c860c..ab3411e4eba5 100644
>> --- a/drivers/Makefile
>> +++ b/drivers/Makefile
>> @@ -175,6 +175,7 @@ obj-$(CONFIG_USB4)    += thunderbolt/
>>   obj-$(CONFIG_CORESIGHT)    += hwtracing/coresight/
>>   obj-y    += hwtracing/intel_th/
>>   obj-$(CONFIG_STM)    += hwtracing/stm/
>> +obj-$(CONFIG_HISI_PTT)    += hwtracing/ptt/
>>   obj-$(CONFIG_ANDROID)    += android/
>>   obj-$(CONFIG_NVMEM)    += nvmem/
>>   obj-$(CONFIG_FPGA)    += fpga/
>> diff --git a/drivers/hwtracing/Kconfig b/drivers/hwtracing/Kconfig
>> index 13085835a636..911ee977103c 100644
>> --- a/drivers/hwtracing/Kconfig
>> +++ b/drivers/hwtracing/Kconfig
>> @@ -5,4 +5,6 @@ source "drivers/hwtracing/stm/Kconfig"
>>     source "drivers/hwtracing/intel_th/Kconfig"
>>   +source "drivers/hwtracing/ptt/Kconfig"
>> +
>>   endmenu
>> diff --git a/drivers/hwtracing/ptt/Kconfig b/drivers/hwtracing/ptt/Kconfig
>> new file mode 100644
>> index ..41fa83921a07
>> --- /dev/null
>> +++ b/drivers/hwtracing/ptt/Kconfig
>> @@ -0,0 +1,11 @@
>> +# SPDX-License-Identifier: GPL-2.0-only
>> +config HISI_PTT
>> +    tristate "HiSilicon PCIe Tune and Trace Device"
>> +    depends on ARM64 && PCI && HAS_DMA && HAS_IOMEM
> 
> why no compile test support?
> 

I'll add compile test on ARM64.

>> +    help
>> +  HiSilicon PCIe Tune and Trace Device exists as a PCIe RCiEP
>> +  device, and it provides support for PCIe traffic tuning and
>> +  tracing TLP headers to the memory.
>> +
>> +  This driver can also be built as a module. If so, the module
>> +  will be called hisi_ptt.
>> diff --git a/drivers/hwtracing/ptt/Makefile b/drivers/hwtracing/ptt/Makefile
>> new file mode 100644
>> index ..908c09a98161
>> --- /dev/null
>> +++ b/drivers/hwtracing/ptt/Makefile
>> @@ -0,0 +1,2 @@
>> +# SPDX-License-Identifier: GPL-2.0
>> +obj-$(CONFIG_HISI_PTT) += hisi_ptt.o
>> diff --git a/drivers/hwtracing/ptt/hisi_ptt.c 
>> b/drivers/hwtracing/ptt/hisi_ptt.c
>> new file mode 100644
>> index ..a5b4f09ccd1e
>> --- /dev/null
>> +++ b/drivers/hwtracing/ptt/hisi_ptt.c
>> @@ -0,0 +1,370 @@
>> +// SPDX-License-Identifier: GPL-2.0
>> +/*
>> + * Driver for HiSilicon PCIe tune and trace device
>> + *
>> + * Copyright (c) 2022 HiSilicon Technologies Co., Ltd.
>> + * Author: Yicong Yang 
>> + */
>> +
>> +#include 
>> +#include 
>> +#include 
>> +#include 
>> +#include 
>> +#include 
>> +#include 
>> +#include 
>> +#include 
>> +#include 
>> +#include 
>> +
>> +#include "hisi_ptt.h"
>> +
>> +static u16 hisi_ptt_get_filter_val(struct pci_dev *pdev)
>> +{
>> +    if (pci_pcie_type(pdev) == PCI_EXP_TYPE_ROOT_PORT)
>> +    return BIT(HISI_PCIE_CORE_PORT_ID(PCI_SLOT(pdev->devfn)));
>> +
>> +    return PCI_DEVID(pdev->bus->number, pdev->devfn);
>> +}
>> +
>> +static int hisi_ptt_wait_trace_hw_idle(struct hisi_ptt *hisi_ptt)
>> +{
>> +    u32 val;
>> +
>> +    return readl_poll_timeout_atomic(hisi_ptt->iobase + HISI_PTT_TRACE_STS,
>> + val, val & HISI_PTT_TRACE_IDLE,
>> + HISI_PTT_WAIT_POLL_INTERVAL_US,
>> + HISI_PTT_WAIT_TIMEOUT_US);
>> +}
>> +
>> +static int hisi_ptt_wait_dma_reset_done(struct hisi_ptt *hisi_ptt)
>> +{
>> +    u32 val;
>> +
>> +    return readl_poll_timeout_atomic(hisi_ptt->iobase + 
>> HISI_PTT_TRACE_WR_STS,
>> + val, !val, HISI_PTT_RESET_POLL_INTERVAL_US,
>> + 

Re: [PATCH v6 10/11] vfio: Remove iommu group notifier

2022-02-23 Thread Lu Baolu

Hi Alex,

On 2/24/22 5:53 AM, Alex Williamson wrote:

On Fri, 18 Feb 2022 08:55:20 +0800
Lu Baolu  wrote:


The iommu core and driver core have been enhanced to avoid unsafe driver
binding to a live group after iommu_group_set_dma_owner(PRIVATE_USER)
has been called. There's no need to register an iommu group notifier. This
removes the iommu group notifier, which contains BUG_ON() and WARN().

The commit 5f096b14d421b ("vfio: Whitelist PCI bridges") allowed all
pcieport drivers to be bound with devices while the group is assigned to
user space. This is not always safe. For example, The shpchp_core driver
relies on the PCI MMIO access for the controller functionality. With its
downstream devices assigned to the userspace, the MMIO might be changed
through user initiated P2P accesses without any notification. This might
break the kernel driver integrity and lead to some unpredictable
consequences. As a result, currently we only allow the portdrv driver.

For any bridge driver, in order to avoid default kernel DMA ownership
claiming, we should consider:

  1) Does the bridge driver use DMA? Calling pci_set_master() or
     a dma_map_* API is a sure indication that the driver is doing DMA

  2) If the bridge driver uses MMIO, is it tolerant to hostile
 userspace also touching the same MMIO registers via P2P DMA
 attacks?

Conservatively if the driver maps an MMIO region at all, we can say that
it fails the test.


IIUC, there's a chance we're going to break user configurations if
they're assigning devices from a group containing a bridge that uses a
driver other than pcieport.  The recommendation to such an affected user
would be that the previously allowed host bridge driver was unsafe for
this use case and to continue to enable assignment of devices within
that group, the driver should be unbound from the bridge device or
replaced with the pci-stub driver.  Is that right?


Yes. You are right.

Another possible solution (for the long term) is to re-audit the bridge
driver code and set the .driver_managed_dma field on the premise that
the driver doesn't run into the potential hazards above.



Unfortunately I also think a bisect of such a breakage wouldn't land
here, I think it was actually broken in "vfio: Set DMA ownership for
VFIO" since that's where vfio starts to make use of
iommu_group_claim_dma_owner() which should fail due to
pci_dma_configure() calling iommu_device_use_default_domain() for
any driver not identifying itself as driver_managed_dma.


Yes. Great point. Thank you!



If that's correct, can we leave a breadcrumb in the correct commit log
indicating why this potential breakage is intentional and how the
bridge driver might be reconfigured to continue to allow assignment from
within the group more safely?  Thanks,


Sure. I will add the following to the commit message of "vfio: Set DMA
ownership for VFIO":


"
This change disallows some unsafe bridge drivers from binding to non-ACS
bridges while devices under them are assigned to user space. This is an
intentional enhancement and possibly breaks some existing
configurations. The recommendation to such an affected user would be
that the previously allowed host bridge driver was unsafe for this use
case and to continue to enable assignment of devices within that group,
the driver should be unbound from the bridge device or replaced with the
pci-stub driver.

For any bridge driver, we consider it unsafe if it satisfies any of the
following conditions:

  1) The bridge driver uses DMA. Calling pci_set_master() or calling any
 kernel DMA API (dma_map_*() etc.) is an indication that the
 driver is doing DMA.

  2) If the bridge driver uses MMIO, it should be tolerant to hostile
 userspace also touching the same MMIO registers via P2P DMA
 attacks.

If the bridge driver turns out to be a safe one, it could be used as
before by setting the driver's .driver_managed_dma field, just like what
we have done in the pcieport driver.
"

Best regards,
baolu
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: cleanup swiotlb initialization

2022-02-23 Thread Boris Ostrovsky


On 2/22/22 10:35 AM, Christoph Hellwig wrote:

Hi all,

this series tries to clean up the swiotlb initialization, including
that of swiotlb-xen.  To get there it also removes the x86 iommu table
infrastructure that massively obfuscates the initialization path.

Git tree:

 git://git.infradead.org/users/hch/misc.git swiotlb-init-cleanup



I haven't had a chance to look at this yet but this crashes as dom0:


[   37.377313] BUG: unable to handle page fault for address: c90042880018
[   37.378219] #PF: supervisor read access in kernel mode
[   37.378219] #PF: error_code(0x) - not-present page
[   37.378219] PGD 7c2f2ee067 P4D 7c2f2ee067 PUD 7bf019b067 PMD 105a30067 PTE 0
[   37.378219] Oops:  [#1] PREEMPT SMP NOPTI
[   37.378219] CPU: 14 PID: 1 Comm: swapper/0 Not tainted 5.17.0-rc5swiotlb #9
[   37.378219] Hardware name: Oracle Corporation ORACLE SERVER 
E1-2c/ASY,Generic,SM,E1-2c, BIOS 49004900 12/23/2020
[   37.378219] RIP: e030:init_iommu_one+0x248/0x2f0
[   37.378219] Code: 48 89 43 68 48 85 c0 74 c4 be 00 20 00 00 48 89 df e8 ea ee ff 
ff 48 89 43 78 48 85 c0 74 ae c6 83 98 00 00 00 00 48 8b 43 38 <48> 8b 40 18 a8 
01 74 07 83 8b a8 04 00 00 01 f6 83 a8 04 00 00 01
[   37.378219] RSP: e02b:c9004044bd18 EFLAGS: 00010286
[   37.378219] RAX: c9004288 RBX: 888107260800 RCX: 
[   37.378219] RDX: 8000 RSI: ea00041cab80 RDI: 
[   37.378219] RBP: c9004044bd38 R08: 0901 R09: ea00041cab00
[   37.378219] R10: 0002 R11:  R12: c90040435008
[   37.378219] R13: 0008 R14: efa0 R15: 
[   37.378219] FS:  () GS:88fef418() 
knlGS:
[   37.378219] CS:  e030 DS:  ES:  CR0: 80050033
[   37.378219] CR2: c90042880018 CR3: 0260a000 CR4: 00050660
[   37.378219] Call Trace:
[   37.378219]  
[   37.378219]  early_amd_iommu_init+0x3c5/0x72d
[   37.378219]  ? iommu_setup+0x284/0x284
[   37.378219]  state_next+0x158/0x68f
[   37.378219]  ? iommu_setup+0x284/0x284
[   37.378219]  iommu_go_to_state+0x28/0x2d
[   37.378219]  amd_iommu_init+0x15/0x4b
[   37.378219]  ? iommu_setup+0x284/0x284
[   37.378219]  pci_iommu_init+0x12/0x37
[   37.378219]  do_one_initcall+0x48/0x210
[   37.378219]  kernel_init_freeable+0x229/0x28c
[   37.378219]  ? rest_init+0xe0/0xe0
[   37.963966]  kernel_init+0x1a/0x130
[   37.979415]  ret_from_fork+0x22/0x30
[   37.991436]  
[   37.999465] Modules linked in:
[   38.007413] CR2: c90042880018
[   38.019416] ---[ end trace  ]---
[   38.023418] RIP: e030:init_iommu_one+0x248/0x2f0
[   38.023418] Code: 48 89 43 68 48 85 c0 74 c4 be 00 20 00 00 48 89 df e8 ea ee ff 
ff 48 89 43 78 48 85 c0 74 ae c6 83 98 00 00 00 00 48 8b 43 38 <48> 8b 40 18 a8 
01 74 07 83 8b a8 04 00 00 01 f6 83 a8 04 00 00 01
[   38.023418] RSP: e02b:c9004044bd18 EFLAGS: 00010286
[   38.023418] RAX: c9004288 RBX: 888107260800 RCX: 
[   38.155413] RDX: 8000 RSI: ea00041cab80 RDI: 
[   38.175965] Freeing initrd memory: 62640K
[   38.155413] RBP: c9004044bd38 R08: 0901 R09: ea00041cab00
[   38.155413] R10: 0002 R11:  R12: c90040435008
[   38.155413] R13: 0008 R14: efa0 R15: 
[   38.155413] FS:  () GS:88fef418() 
knlGS:
[   38.287414] CS:  e030 DS:  ES:  CR0: 80050033
[   38.309557] CR2: c90042880018 CR3: 0260a000 CR4: 00050660
[   38.332403] Kernel panic - not syncing: Fatal exception
[   38.351414] Rebooting in 20 seconds..



-boris

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [PATCH v6 10/11] vfio: Remove iommu group notifier

2022-02-23 Thread Alex Williamson
On Fri, 18 Feb 2022 08:55:20 +0800
Lu Baolu  wrote:

> The iommu core and driver core have been enhanced to avoid unsafe driver
> binding to a live group after iommu_group_set_dma_owner(PRIVATE_USER)
> has been called. There's no need to register iommu group notifier. This
> removes the iommu group notifier, which contains BUG_ON() and WARN().
> 
> The commit 5f096b14d421b ("vfio: Whitelist PCI bridges") allowed all
> pcieport drivers to be bound with devices while the group is assigned to
> user space. This is not always safe. For example, the shpchp_core driver
> relies on the PCI MMIO access for the controller functionality. With its
> downstream devices assigned to the userspace, the MMIO might be changed
> through user initiated P2P accesses without any notification. This might
> break the kernel driver integrity and lead to some unpredictable
> consequences. As a result, currently we only allow the portdrv driver.
> 
> For any bridge driver, in order to avoid default kernel DMA ownership
> claiming, we should consider:
> 
>  1) Does the bridge driver use DMA? Calling pci_set_master() or
>     a dma_map_* API is a sure indication that the driver is doing DMA
> 
>  2) If the bridge driver uses MMIO, is it tolerant to hostile
> userspace also touching the same MMIO registers via P2P DMA
> attacks?
> 
> Conservatively if the driver maps an MMIO region at all, we can say that
> it fails the test.

IIUC, there's a chance we're going to break user configurations if
they're assigning devices from a group containing a bridge that uses a
driver other than pcieport.  The recommendation to such an affected user
would be that the previously allowed host bridge driver was unsafe for
this use case and to continue to enable assignment of devices within
that group, the driver should be unbound from the bridge device or
replaced with the pci-stub driver.  Is that right?

Unfortunately I also think a bisect of such a breakage wouldn't land
here, I think it was actually broken in "vfio: Set DMA ownership for
VFIO" since that's where vfio starts to make use of
iommu_group_claim_dma_owner() which should fail due to
pci_dma_configure() calling iommu_device_use_default_domain() for
any driver not identifying itself as driver_managed_dma.

If that's correct, can we leave a breadcrumb in the correct commit log
indicating why this potential breakage is intentional and how the
bridge driver might be reconfigured to continue to allow assignment from
within the group more safely?  Thanks,

Alex

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v6 01/11] iommu: Add dma ownership management interfaces

2022-02-23 Thread Jason Gunthorpe via iommu
On Wed, Feb 23, 2022 at 06:20:36PM +, Robin Murphy wrote:
> On 2022-02-23 18:02, Jason Gunthorpe via iommu wrote:
> > On Wed, Feb 23, 2022 at 06:00:06PM +, Robin Murphy wrote:
> > 
> > > ...and equivalently just set owner_cnt directly to 0 here. I don't see a
> > > realistic use-case for any driver to claim the same group more than once,
> > > and allowing it in the API just feels like opening up various potential
> > > corners for things to get out of sync.
> > 
> > I am Ok if we toss it out to get this merged, as there is no in-kernel
> > user right now.
> > 
> > Something will have to come back for iommufd, but we can look at what
> > is best suited then.
> 
> If iommufd plans to be too dumb to keep track of whether it already owns a
> given group or not, I can't see it dealing with attaching that group to a
> single domain no more than once, either ;)

Indeed, this is why I'd like to use the device API :)

Jason
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v6 01/11] iommu: Add dma ownership management interfaces

2022-02-23 Thread Robin Murphy

On 2022-02-23 18:02, Jason Gunthorpe via iommu wrote:

On Wed, Feb 23, 2022 at 06:00:06PM +, Robin Murphy wrote:


...and equivalently just set owner_cnt directly to 0 here. I don't see a
realistic use-case for any driver to claim the same group more than once,
and allowing it in the API just feels like opening up various potential
corners for things to get out of sync.


I am Ok if we toss it out to get this merged, as there is no in-kernel
user right now.

Something will have to come back for iommufd, but we can look at what
is best suited then.


If iommufd plans to be too dumb to keep track of whether it already owns 
a given group or not, I can't see it dealing with attaching that group 
to a single domain no more than once, either ;)


Robin.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v5 07/14] PCI: Add driver dma ownership management

2022-02-23 Thread Jason Gunthorpe via iommu
On Wed, Feb 23, 2022 at 12:00:56PM -0600, Bjorn Helgaas wrote:

> >  static int pci_dma_configure(struct device *dev)
> >  {
> > +   struct pci_driver *driver = to_pci_driver(dev->driver);
> > struct device *bridge;
> > int ret = 0;
> >  
> > +   if (!driver->no_kernel_api_dma) {
> 
> Ugh.  Double negative, totally agree this needs a better name that
> reverses the sense.  Every place you use it, you negate it again.

Greg came up with driver_managed_dma which is in the v6 version:

https://lore.kernel.org/all/20220218005521.172832-5-baolu...@linux.intel.com/

Thanks,
Jason
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v8 05/11] ACPI/IORT: Add a helper to retrieve RMR memory regions

2022-02-23 Thread Lorenzo Pieralisi
On Mon, Feb 21, 2022 at 03:43:38PM +, Shameer Kolothum wrote:
> Add helper functions (iort_iommu_get/put_rmrs()) that
> retrieves/releases RMR memory descriptors associated
> with a given IOMMU. This will be used by IOMMU drivers
> to set up necessary mappings.
> 
> Invoke it from the generic iommu helper functions.
> 
> Signed-off-by: Shameer Kolothum 
> ---
>  drivers/acpi/arm64/iort.c | 56 +++
>  drivers/iommu/dma-iommu.c |  4 +++
>  include/linux/acpi_iort.h | 14 ++
>  3 files changed, 74 insertions(+)

I would squash this patch with the previous one - that way at least the
iommu_dma_get/put_rmrs() are actually used in the patch that is
adding them.

Lorenzo

> diff --git a/drivers/acpi/arm64/iort.c b/drivers/acpi/arm64/iort.c
> index 05da9ebff50a..b2c959c72fb2 100644
> --- a/drivers/acpi/arm64/iort.c
> +++ b/drivers/acpi/arm64/iort.c
> @@ -1055,6 +1055,57 @@ static void iort_find_rmrs(struct acpi_iort_node 
> *iommu, struct device *dev,
>   }
>  }
>  
> +/**
> + * iort_iommu_put_rmrs - Free any memory associated with RMRs.
> + * @iommu_fwnode: fwnode associated with IOMMU
> + * @head: Reserved region list
> + *
> + * This function goes through the provided reserved region list and
> + * frees up memory associated with RMR entries and deletes them from
> + * the list.
> + */
> +void iort_iommu_put_rmrs(struct fwnode_handle *iommu_fwnode,
> +  struct list_head *head)
> +{
> + struct iommu_resv_region *e, *tmp;
> +
> + /*
> +  * RMR entries will have mem allocated for fw_data.rmr.sids.
> +  * Free the mem and delete the node.
> +  */
> + list_for_each_entry_safe(e, tmp, head, list) {
> + if (e->fw_data.rmr.sids) {
> + kfree(e->fw_data.rmr.sids);
> + list_del(&e->list);
> + kfree(e);
> + }
> + }
> +}
> +
> +/**
> + *
> + * iort_iommu_get_rmrs - Retrieve Reserved Memory Regions(RMRs) 
> associated
> + *  with a given IOMMU and dev.
> + * @iommu_fwnode: fwnode associated with IOMMU
> + * @dev: Device associated with RMR(Optional)
> + * @list: RMR list to be populated
> + *
> + * This function populates the RMR list associated with a given IOMMU and
> + * dev(if provided). If dev is NULL, the function populates all the RMRs
> + * associated with the given IOMMU.
> + */
> +void iort_iommu_get_rmrs(struct fwnode_handle *iommu_fwnode, struct device 
> *dev,
> +  struct list_head *head)
> +{
> + struct acpi_iort_node *iommu;
> +
> + iommu = iort_get_iort_node(iommu_fwnode);
> + if (!iommu)
> + return;
> +
> + iort_find_rmrs(iommu, dev, head);
> +}
> +
>  /**
>   * iort_iommu_msi_get_resv_regions - Reserved region driver helper
>   * @dev: Device from iommu_get_resv_regions()
> @@ -1287,6 +1338,11 @@ int iort_iommu_msi_get_resv_regions(struct device 
> *dev, struct list_head *head)
>  { return 0; }
>  int iort_iommu_configure_id(struct device *dev, const u32 *input_id)
>  { return -ENODEV; }
> +void iort_iommu_get_rmrs(struct fwnode_handle *fwnode, struct device *dev,
> +  struct list_head *head)
> +{ }
> +void iort_iommu_put_rmrs(struct fwnode_handle *fwnode, struct list_head *head)
> +{ }
>  #endif
>  
>  static int nc_dma_get_range(struct device *dev, u64 *size)
> diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
> index 65ab01d5128b..b33e4df85de1 100644
> --- a/drivers/iommu/dma-iommu.c
> +++ b/drivers/iommu/dma-iommu.c
> @@ -382,12 +382,16 @@ void iommu_put_dma_cookie(struct iommu_domain *domain)
>  void iommu_dma_get_rmrs(struct fwnode_handle *iommu_fwnode, struct device *dev,
>   struct list_head *list)
>  {
> + if (!is_of_node(iommu_fwnode))
> + iort_iommu_get_rmrs(iommu_fwnode, dev, list);
>  }
>  EXPORT_SYMBOL(iommu_dma_get_rmrs);
>  
>  void iommu_dma_put_rmrs(struct fwnode_handle *iommu_fwnode,
>   struct list_head *list)
>  {
> + if (!is_of_node(iommu_fwnode))
> + iort_iommu_put_rmrs(iommu_fwnode, list);
>  }
>  EXPORT_SYMBOL(iommu_dma_put_rmrs);
>  
> diff --git a/include/linux/acpi_iort.h b/include/linux/acpi_iort.h
> index f1f0842a2cb2..212f7f178ec3 100644
> --- a/include/linux/acpi_iort.h
> +++ b/include/linux/acpi_iort.h
> @@ -38,6 +38,10 @@ int iort_dma_get_ranges(struct device *dev, u64 *size);
>  int iort_iommu_configure_id(struct device *dev, const u32 *id_in);
>  int iort_iommu_msi_get_resv_regions(struct device *dev, struct list_head *head);
>  phys_addr_t acpi_iort_dma_get_max_cpu_address(void);
> +void iort_iommu_get_rmrs(struct fwnode_handle *iommu_fwnode, struct device *dev,
> +  struct list_head *list);
> +void iort_iommu_put_rmrs(struct fwnode_handle *iommu_fwnode,
> +  struct list_head *list);
>  #else
>  static inline void acpi_iort_init(void) { }
>  static inline u32 

Re: [PATCH v6 01/11] iommu: Add dma ownership management interfaces

2022-02-23 Thread Jason Gunthorpe via iommu
On Wed, Feb 23, 2022 at 06:00:06PM +, Robin Murphy wrote:

> ...and equivalently just set owner_cnt directly to 0 here. I don't see a
> realistic use-case for any driver to claim the same group more than once,
> and allowing it in the API just feels like opening up various potential
> corners for things to get out of sync.

I am Ok if we toss it out to get this merged, as there is no in-kernel
user right now.

Something will have to come back for iommufd, but we can look at what
is best suited then.

Thanks,
Jason
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v5 07/14] PCI: Add driver dma ownership management

2022-02-23 Thread Bjorn Helgaas
In subject,

s/dma/DMA/ to match the other patches

On Tue, Jan 04, 2022 at 09:56:37AM +0800, Lu Baolu wrote:
> Multiple PCI devices may be placed in the same IOMMU group because
> they cannot be isolated from each other. These devices must either be
> entirely under kernel control or userspace control, never a mixture. This
> checks and sets DMA ownership during driver binding, and releases the
> ownership during driver unbinding.
> 
> The device driver may set a new flag (no_kernel_api_dma) to skip calling
> iommu_device_use_dma_api() during the binding process. For instance, the
> userspace framework drivers (vfio etc.) which need to manually claim
> their own dma ownership when assigning the device to userspace.

s/vfio/VFIO/ when used as an acronym (occurs in several patches)

> + * @no_kernel_api_dma: Device driver doesn't use kernel DMA API for DMA.
> + *   Drivers which don't require DMA or want to manually claim the
> + *   owner type (e.g. userspace driver frameworks) could set this
> + *   flag.

s/Drivers which/Drivers that/

>  static int pci_dma_configure(struct device *dev)
>  {
> + struct pci_driver *driver = to_pci_driver(dev->driver);
>   struct device *bridge;
>   int ret = 0;
>  
> + if (!driver->no_kernel_api_dma) {

Ugh.  Double negative, totally agree this needs a better name that
reverses the sense.  Every place you use it, you negate it again.

> + if (ret && !driver->no_kernel_api_dma)
> + iommu_device_unuse_dma_api(dev);

> +static void pci_dma_cleanup(struct device *dev)
> +{
> + struct pci_driver *driver = to_pci_driver(dev->driver);
> +
> + if (!driver->no_kernel_api_dma)
> + iommu_device_unuse_dma_api(dev);


Re: [PATCH v6 01/11] iommu: Add dma ownership management interfaces

2022-02-23 Thread Robin Murphy

On 2022-02-18 00:55, Lu Baolu wrote:
[...]

+/**
+ * iommu_group_claim_dma_owner() - Set DMA ownership of a group
+ * @group: The group.
+ * @owner: Caller specified pointer. Used for exclusive ownership.
+ *
+ * This is to support backward compatibility for vfio which manages
+ * the dma ownership in iommu_group level. New invocations on this
+ * interface should be prohibited.
+ */
+int iommu_group_claim_dma_owner(struct iommu_group *group, void *owner)
+{
+   int ret = 0;
+
+   mutex_lock(&group->mutex);
+   if (group->owner_cnt) {


To clarify the comment buried in the other thread, I really think we 
should just unconditionally flag the error here...



+   if (group->owner != owner) {
+   ret = -EPERM;
+   goto unlock_out;
+   }
+   } else {
+   if (group->domain && group->domain != group->default_domain) {
+   ret = -EBUSY;
+   goto unlock_out;
+   }
+
+   group->owner = owner;
+   if (group->domain)
+   __iommu_detach_group(group->domain, group);
+   }
+
+   group->owner_cnt++;
+unlock_out:
+   mutex_unlock(&group->mutex);
+
+   return ret;
+}
+EXPORT_SYMBOL_GPL(iommu_group_claim_dma_owner);
+
+/**
+ * iommu_group_release_dma_owner() - Release DMA ownership of a group
+ * @group: The group.
+ *
+ * Release the DMA ownership claimed by iommu_group_claim_dma_owner().
+ */
+void iommu_group_release_dma_owner(struct iommu_group *group)
+{
+   mutex_lock(&group->mutex);
+   if (WARN_ON(!group->owner_cnt || !group->owner))
+   goto unlock_out;
+
+   if (--group->owner_cnt > 0)
+   goto unlock_out;


...and equivalently just set owner_cnt directly to 0 here. I don't see a 
realistic use-case for any driver to claim the same group more than 
once, and allowing it in the API just feels like opening up various 
potential corners for things to get out of sync.


I think that's the only significant concern I have left with the series 
as a whole - you can consider my other grumbles non-blocking :)


Thanks,
Robin.


+
+   /*
+* The UNMANAGED domain should be detached before all USER
+* owners have been released.
+*/
+   if (!WARN_ON(group->domain) && group->default_domain)
+   __iommu_attach_group(group->default_domain, group);
+   group->owner = NULL;
+
+unlock_out:
+   mutex_unlock(&group->mutex);
+}
+EXPORT_SYMBOL_GPL(iommu_group_release_dma_owner);
+
+/**
+ * iommu_group_dma_owner_claimed() - Query group dma ownership status
+ * @group: The group.
+ *
+ * This provides status query on a given group. It is racy and only for
+ * non-binding status reporting.
+ */
+bool iommu_group_dma_owner_claimed(struct iommu_group *group)
+{
+   unsigned int user;
+
+   mutex_lock(&group->mutex);
+   user = group->owner_cnt;
+   mutex_unlock(&group->mutex);
+
+   return user;
+}
+EXPORT_SYMBOL_GPL(iommu_group_dma_owner_claimed);



Re: [PATCH v6 02/11] driver core: Add dma_cleanup callback in bus_type

2022-02-23 Thread Greg Kroah-Hartman
On Wed, Feb 23, 2022 at 05:05:23PM +, Robin Murphy wrote:
> On 2022-02-23 16:03, Greg Kroah-Hartman wrote:
> > On Wed, Feb 23, 2022 at 10:30:11AM -0400, Jason Gunthorpe wrote:
> > > On Wed, Feb 23, 2022 at 10:09:01AM -0400, Jason Gunthorpe wrote:
> > > > On Wed, Feb 23, 2022 at 03:06:35PM +0100, Greg Kroah-Hartman wrote:
> > > > > On Wed, Feb 23, 2022 at 09:46:27AM -0400, Jason Gunthorpe wrote:
> > > > > > On Wed, Feb 23, 2022 at 01:04:00PM +, Robin Murphy wrote:
> > > > > > 
> > > > > > > 1 - tmp->driver is non-NULL because tmp is already bound.
> > > > > > >1.a - If tmp->driver->driver_managed_dma == 0, the group must 
> > > > > > > currently be
> > > > > > > DMA-API-owned as a whole. Regardless of what driver dev has 
> > > > > > > unbound from,
> > > > > > > its removal does not release someone else's DMA API 
> > > > > > > (co-)ownership.
> > > > > > 
> > > > > > This is an uncommon locking pattern, but it does work. It relies on
> > > > > > the mutex being an effective synchronization barrier for an unlocked
> > > > > > store:
> > > > > > 
> > > > > >   WRITE_ONCE(dev->driver, NULL)
> > > > > 
> > > > > Only the driver core should be messing with the dev->driver pointer as
> > > > > when it does so, it already has the proper locks held.  Do I need to
> > > > > move that to a "private" location so that nothing outside of the 
> > > > > driver
> > > > > core can mess with it?
> > > > 
> > > > It would be nice, I've seen abuse and mislocking of it in drivers
> > > 
> > > Though to be clear, what Robin is describing is still keeping the
> > > dev->driver stores in dd.c, just reading it in a lockless way from
> > > other modules.
> > 
> > "other modules" should never care if a device has a driver bound to it
> > because instantly after the check happens, it can change so whatever
> > logic it wanted to do with that knowledge is gone.
> > 
> > Unless the bus lock is held that the device is on, but that should be
> > only accessible from within the driver core as it controls that type of
> > stuff, not any random other part of the kernel.
> > 
> > And in looking at this, ick, there are loads of places in the kernel
> > that are thinking that this pointer being set to something actually
> > means something.  Sometimes it does, but lots of places, it doesn't as
> > it can change.
> 
> That's fine. In this case we're only talking about the low-level IOMMU code
> which has to be in cahoots with the driver core to some degree (via these
> new callbacks) anyway, but if you're uncomfortable about relying on
> dev->driver even there, I can live with that. There are several potential
> places to capture the relevant information in IOMMU API private data, from
> the point in really_probe() where it *is* stable, and then never look at
> dev->driver ever again - even from .dma_cleanup() or future equivalent,
> which is the aspect from whence this whole proof-of-concept tangent sprang
> out.

For a specific driver core callback, like dma_cleanup(), all is fine,
but you shouldn't be caring about a driver pointer in your bus callback
for stuff like this as you "know" what happened by virtue of the
callback being called.

thanks,

greg k-h


Re: [PATCH v5 00/12] x86: Trenchboot secure dynamic launch Linux kernel support

2022-02-23 Thread Dave Hansen
On 2/16/22 19:54, Ross Philipson wrote:
> The larger focus of the TrenchBoot project (https://github.com/TrenchBoot) is to
> enhance the boot security and integrity in a unified manner. The first area of
> focus has been on the Trusted Computing Group's Dynamic Launch for establishing
> a hardware Root of Trust for Measurement, also known as DRTM (Dynamic Root of
> Trust for Measurement). The project has been and continues to work on providing
> a unified means to Dynamic Launch that is cross-platform (Intel and AMD) and
> cross-architecture (x86 and Arm), with our recent involvement in the upcoming
> Arm DRTM specification. The order of introducing DRTM to the Linux kernel
> follows the maturity of DRTM in the architectures. Intel's Trusted eXecution
> Technology (TXT) is present today and only requires a preamble loader, e.g. a
> boot loader, and an OS kernel that is TXT-aware. The AMD DRTM implementation has
> been present since the introduction of AMD-V but requires an additional
> component that is AMD-specific and referred to in the specification as the
> Secure Loader, for which the TrenchBoot project has an active prototype in
> development. Finally, Arm's implementation is in the specification development
> stage and the project is looking to support it when it becomes available.

What problem is this patch series solving?  Is the same problem solved
in a different way in the kernel today?  What is wrong with that
solution?  What effects will end users see if they apply this series?


Re: [PATCH v6 02/11] driver core: Add dma_cleanup callback in bus_type

2022-02-23 Thread Robin Murphy

On 2022-02-23 16:03, Greg Kroah-Hartman wrote:

On Wed, Feb 23, 2022 at 10:30:11AM -0400, Jason Gunthorpe wrote:

On Wed, Feb 23, 2022 at 10:09:01AM -0400, Jason Gunthorpe wrote:

On Wed, Feb 23, 2022 at 03:06:35PM +0100, Greg Kroah-Hartman wrote:

On Wed, Feb 23, 2022 at 09:46:27AM -0400, Jason Gunthorpe wrote:

On Wed, Feb 23, 2022 at 01:04:00PM +, Robin Murphy wrote:


1 - tmp->driver is non-NULL because tmp is already bound.
   1.a - If tmp->driver->driver_managed_dma == 0, the group must currently be
DMA-API-owned as a whole. Regardless of what driver dev has unbound from,
its removal does not release someone else's DMA API (co-)ownership.


This is an uncommon locking pattern, but it does work. It relies on
the mutex being an effective synchronization barrier for an unlocked
store:

  WRITE_ONCE(dev->driver, NULL)


Only the driver core should be messing with the dev->driver pointer as
when it does so, it already has the proper locks held.  Do I need to
move that to a "private" location so that nothing outside of the driver
core can mess with it?


It would be nice, I've seen abuse and mislocking of it in drivers


Though to be clear, what Robin is describing is still keeping the
dev->driver stores in dd.c, just reading it in a lockless way from
other modules.


"other modules" should never care if a device has a driver bound to it
because instantly after the check happens, it can change so whatever
logic it wanted to do with that knowledge is gone.

Unless the bus lock is held that the device is on, but that should be
only accessible from within the driver core as it controls that type of
stuff, not any random other part of the kernel.

And in looking at this, ick, there are loads of places in the kernel
that are thinking that this pointer being set to something actually
means something.  Sometimes it does, but lots of places, it doesn't as
it can change.


That's fine. In this case we're only talking about the low-level IOMMU 
code which has to be in cahoots with the driver core to some degree (via 
these new callbacks) anyway, but if you're uncomfortable about relying 
on dev->driver even there, I can live with that. There are several 
potential places to capture the relevant information in IOMMU API 
private data, from the point in really_probe() where it *is* stable, and 
then never look at dev->driver ever again - even from .dma_cleanup() or 
future equivalent, which is the aspect from whence this whole 
proof-of-concept tangent sprang out.


Cheers,
Robin.


Re: [PATCH v6 02/11] driver core: Add dma_cleanup callback in bus_type

2022-02-23 Thread Greg Kroah-Hartman
On Wed, Feb 23, 2022 at 10:30:11AM -0400, Jason Gunthorpe wrote:
> On Wed, Feb 23, 2022 at 10:09:01AM -0400, Jason Gunthorpe wrote:
> > On Wed, Feb 23, 2022 at 03:06:35PM +0100, Greg Kroah-Hartman wrote:
> > > On Wed, Feb 23, 2022 at 09:46:27AM -0400, Jason Gunthorpe wrote:
> > > > On Wed, Feb 23, 2022 at 01:04:00PM +, Robin Murphy wrote:
> > > > 
> > > > > 1 - tmp->driver is non-NULL because tmp is already bound.
> > > > >   1.a - If tmp->driver->driver_managed_dma == 0, the group must 
> > > > > currently be
> > > > > DMA-API-owned as a whole. Regardless of what driver dev has unbound 
> > > > > from,
> > > > > its removal does not release someone else's DMA API (co-)ownership.
> > > > 
> > > > This is an uncommon locking pattern, but it does work. It relies on
> > > > the mutex being an effective synchronization barrier for an unlocked
> > > > store:
> > > > 
> > > >   WRITE_ONCE(dev->driver, NULL)
> > > 
> > > Only the driver core should be messing with the dev->driver pointer as
> > > when it does so, it already has the proper locks held.  Do I need to
> > > move that to a "private" location so that nothing outside of the driver
> > > core can mess with it?
> > 
> > It would be nice, I've seen abuse and mislocking of it in drivers
> 
> Though to be clear, what Robin is describing is still keeping the
> dev->driver stores in dd.c, just reading it in a lockless way from
> other modules.

"other modules" should never care if a device has a driver bound to it
because instantly after the check happens, it can change so whatever
logic it wanted to do with that knowledge is gone.

Unless the bus lock is held that the device is on, but that should be
only accessible from within the driver core as it controls that type of
stuff, not any random other part of the kernel.

And in looking at this, ick, there are loads of places in the kernel
that are thinking that this pointer being set to something actually
means something.  Sometimes it does, but lots of places, it doesn't as
it can change.

In a semi-related incident right now, we currently have a syzbot failure
in the usb gadget code where it was manipulating the ->driver pointer
directly and other parts of the kernel are crashing.  See
https://lore.kernel.org/r/ph0pr11mb58805e3c4cf7d4c41d49bfcfda...@ph0pr11mb5880.namprd11.prod.outlook.com
for the thread.

I'll poke at this as a background task to try to clean up over time.

thanks,

greg k-h


Re: [PATCH v6 02/11] driver core: Add dma_cleanup callback in bus_type

2022-02-23 Thread Jason Gunthorpe via iommu
On Wed, Feb 23, 2022 at 10:09:01AM -0400, Jason Gunthorpe wrote:
> On Wed, Feb 23, 2022 at 03:06:35PM +0100, Greg Kroah-Hartman wrote:
> > On Wed, Feb 23, 2022 at 09:46:27AM -0400, Jason Gunthorpe wrote:
> > > On Wed, Feb 23, 2022 at 01:04:00PM +, Robin Murphy wrote:
> > > 
> > > > 1 - tmp->driver is non-NULL because tmp is already bound.
> > > >   1.a - If tmp->driver->driver_managed_dma == 0, the group must 
> > > > currently be
> > > > DMA-API-owned as a whole. Regardless of what driver dev has unbound 
> > > > from,
> > > > its removal does not release someone else's DMA API (co-)ownership.
> > > 
> > > This is an uncommon locking pattern, but it does work. It relies on
> > > the mutex being an effective synchronization barrier for an unlocked
> > > store:
> > > 
> > > WRITE_ONCE(dev->driver, NULL)
> > 
> > Only the driver core should be messing with the dev->driver pointer as
> > when it does so, it already has the proper locks held.  Do I need to
> > move that to a "private" location so that nothing outside of the driver
> > core can mess with it?
> 
> It would be nice, I've seen abuse and mislocking of it in drivers

Though to be clear, what Robin is describing is still keeping the
dev->driver stores in dd.c, just reading it in a lockless way from
other modules.

Jason


Re: [PATCH v6 02/11] driver core: Add dma_cleanup callback in bus_type

2022-02-23 Thread Jason Gunthorpe via iommu
On Wed, Feb 23, 2022 at 03:06:35PM +0100, Greg Kroah-Hartman wrote:
> On Wed, Feb 23, 2022 at 09:46:27AM -0400, Jason Gunthorpe wrote:
> > On Wed, Feb 23, 2022 at 01:04:00PM +, Robin Murphy wrote:
> > 
> > > 1 - tmp->driver is non-NULL because tmp is already bound.
> > >   1.a - If tmp->driver->driver_managed_dma == 0, the group must currently 
> > > be
> > > DMA-API-owned as a whole. Regardless of what driver dev has unbound from,
> > > its removal does not release someone else's DMA API (co-)ownership.
> > 
> > This is an uncommon locking pattern, but it does work. It relies on
> > the mutex being an effective synchronization barrier for an unlocked
> > store:
> > 
> >   WRITE_ONCE(dev->driver, NULL)
> 
> Only the driver core should be messing with the dev->driver pointer as
> when it does so, it already has the proper locks held.  Do I need to
> move that to a "private" location so that nothing outside of the driver
> core can mess with it?

It would be nice, I've seen abuse and mislocking of it in drivers

Jason


Re: [PATCH v6 02/11] driver core: Add dma_cleanup callback in bus_type

2022-02-23 Thread Greg Kroah-Hartman
On Wed, Feb 23, 2022 at 09:46:27AM -0400, Jason Gunthorpe wrote:
> On Wed, Feb 23, 2022 at 01:04:00PM +, Robin Murphy wrote:
> 
> > 1 - tmp->driver is non-NULL because tmp is already bound.
> >   1.a - If tmp->driver->driver_managed_dma == 0, the group must currently be
> > DMA-API-owned as a whole. Regardless of what driver dev has unbound from,
> > its removal does not release someone else's DMA API (co-)ownership.
> 
> This is an uncommon locking pattern, but it does work. It relies on
> the mutex being an effective synchronization barrier for an unlocked
> store:
> 
> WRITE_ONCE(dev->driver, NULL)

Only the driver core should be messing with the dev->driver pointer as
when it does so, it already has the proper locks held.  Do I need to
move that to a "private" location so that nothing outside of the driver
core can mess with it?

thanks,

greg k-h


Re: [PATCH v6 02/11] driver core: Add dma_cleanup callback in bus_type

2022-02-23 Thread Jason Gunthorpe via iommu
On Wed, Feb 23, 2022 at 01:04:00PM +, Robin Murphy wrote:

> 1 - tmp->driver is non-NULL because tmp is already bound.
>   1.a - If tmp->driver->driver_managed_dma == 0, the group must currently be
> DMA-API-owned as a whole. Regardless of what driver dev has unbound from,
> its removal does not release someone else's DMA API (co-)ownership.

This is an uncommon locking pattern, but it does work. It relies on
the mutex being an effective synchronization barrier for an unlocked
store:

  CPU0                                  CPU1
  WRITE_ONCE(dev->driver, NULL)
                                        mutex_lock(&group->lock)
                                        READ_ONCE(dev->driver) != NULL and no UAF
                                        mutex_unlock(&group->lock)
  mutex_lock(&group->lock)
  tmp = READ_ONCE(dev1->driver);
  if (tmp && tmp->blah) [..]
  mutex_unlock(&group->lock)
                                        mutex_lock(&group->lock)
                                        READ_ONCE(dev->driver) == NULL
                                        mutex_unlock(&group->lock)
  /* No other CPU can UAF dev->driver */
  kfree(driver)

Ie the CPU setting driver cannot pass to the next step without all
other CPUs observing the new value because of the release/acquire built
into the mutex_lock.

It is tricky, and can work in this instance, but the pattern's unlocked
design relies on ordering between the WRITE_ONCE and the locks - and
that ordering in dd.c isn't like that today.

Jason


Re: [PATCH v6 02/11] driver core: Add dma_cleanup callback in bus_type

2022-02-23 Thread Robin Murphy

On 2022-02-23 05:01, Lu Baolu wrote:

On 2/23/22 7:53 AM, Jason Gunthorpe wrote:

To spell it out, the scheme I'm proposing looks like this:

Well, I already got this, it is what is in driver_or_DMA_API_token()
that matters

I think you are suggesting to do something like:

    if (!READ_ONCE(dev->driver) ||  ???)
    return NULL;
    return group;  // A DMA_API 'token'

Which is locklessly reading dev->driver, and why you are talking about
races, I guess.



I am afraid that we are not able to implement a race-free
driver_or_DMA_API_token() helper. The lock problem between the IOMMU
core and driver core always exists.


It's not race-free. My point is that the races aren't harmful because 
what we might infer from the "wrong" information still leads to the 
right action. dev->driver is obviously always valid and constant for 
*claiming* ownership, since that either happens for the DMA API in the 
middle of really_probe() binding driver to dev, or while driver is 
actively using dev and calling iommu_group_claim_dma_owner(). The races 
exist during remove, but both probe and remove are serialised on the 
group mutex after respectively setting/clearing dev->driver, there are 
only 4 possibilities for the state of any other group sibling "tmp" 
during the time that dev holds that mutex in its remove path:


1 - tmp->driver is non-NULL because tmp is already bound.
  1.a - If tmp->driver->driver_managed_dma == 0, the group must 
currently be DMA-API-owned as a whole. Regardless of what driver dev has 
unbound from, its removal does not release someone else's DMA API 
(co-)ownership.
  1.b - If tmp->driver->driver_managed_dma == 1 and tmp->driver == 
group->owner, then dev must have unbound from the same driver, but 
either way that driver has not yet released ownership so dev's removal 
does not change anything.
  1.c - If tmp->driver->driver_managed_dma == 1 and tmp->driver != 
group->owner, it doesn't matter. Even if tmp->driver is currently 
waiting to attempt to claim ownership it can't do so until we release 
the mutex.


2 - tmp->driver is non-NULL because tmp is in the process of binding.
  2.a - If tmp->driver->driver_managed_dma == 0, tmp can be assumed to 
be waiting on the group mutex to claim DMA API ownership.
2.a.i - If the group is DMA API owned, this race is simply a 
short-cut to case 1.a - dev's ownership is effectively handed off 
directly to tmp, rather than potentially being released and immediately 
reclaimed. Once tmp gets its turn, it finds the group already 
DMA-API-owned as it wanted and all is well. This may be "unfair" if an 
explicit claim was also waiting, but not incorrect.
2.a.ii - If the group is driver-owned, it doesn't matter. Removing 
dev does not change the current ownership, and tmp's probe will 
eventually get its turn and find whatever it finds at that point in future.
  2.b - If tmp->driver->driver_managed_dma == 1, it doesn't matter. 
Either that driver already owns the group, or it might try to claim it 
after we've resolved dev's removal and released the mutex, in which case 
it will find whatever it finds.


3 - tmp->driver is NULL because tmp is unbound. Obviously no impact.

4 - tmp->driver is NULL because tmp is in the process of unbinding.
  4.a - If the group is DMA-API-owned, either way tmp has no further 
influence.
4.a.i - If tmp has unbound from a driver_managed_dma=0 driver, it 
must be waiting to release its DMA API ownership, thus if tmp would 
otherwise be the only remaining DMA API owner, the race is that dev's 
removal releases ownership on behalf of both devices. When tmp's own 
removal subsequently gets the mutex, it will either see that the group 
is already unowned, or maybe that someone else has re-claimed it in the 
interim, and either way do nothing, which is fine.
4.a.ii - If tmp has unbound from a driver_managed_dma=1 driver, it 
doesn't matter, as in case 1.c.
  4.b - If the group is driver-owned, it doesn't matter. That ownership 
can only change if that driver releases it, which isn't happening while 
we hold the mutex.


As I said yesterday, I'm really just airing out an idea here; I might 
write up some proper patches as part of the bus ops work, and we can 
give it proper scrutiny then.



For example, when we implemented iommu_group_store_type() to change the
default domain type of a device through sysfs, we had to compromise and
limit this functionality to singleton groups to avoid the lock
issue.


Indeed, but once the probe and remove paths for grouped devices have to 
serialise on the group mutex, as we're introducing here, the story 
changes and we gain a lot more power. In fact that's a good point I 
hadn't considered yet - that sysfs constraint is functionally equivalent 
to the one in iommu_attach_device(), so once we land this ownership 
concept we should be free to relax it from "singleton" to "unowned" in 
much the same way as your other series is doing for attach.



Unfortunately, that 

Re: [PATCH V2 1/2] Swiotlb: Add swiotlb_alloc_from_low_pages switch

2022-02-23 Thread Tianyu Lan




On 2/23/2022 12:00 AM, Christoph Hellwig wrote:

On Tue, Feb 22, 2022 at 11:07:19PM +0800, Tianyu Lan wrote:

Thanks for your comment. That means we need to expose a
swiotlb_device_init() interface to allocate the bounce buffer and
initialize the io tlb mem entry. The DMA API's current
rmem_swiotlb_device_init() only works for platforms with a device tree.
The new API should be called in the bus driver or a new DMA API. Could
you check whether this is the right way before we start the work.


Do these VMs use ACPI?  We'd probably really want some kind of higher
level configuration and not have the drivers request it themselves.


Yes, Hyper-V isolation VM uses ACPI. Devices are enumerated via vmbus 
host and there is no child device information in ACPI table. The host 
driver seems to be the right place to call new API.



Re: [PATCH v3 3/9] gpu: host1x: Add context device management code

2022-02-23 Thread Mikko Perttunen

On 2/22/22 18:24, Christoph Hellwig wrote:

On Fri, Feb 18, 2022 at 01:39:46PM +0200, Mikko Perttunen wrote:

+
+/*
+ * Due to an issue with T194 NVENC, only 38 bits can be used.
+ * Anyway, 256GiB of IOVA ought to be enough for anyone.
+ */
+static dma_addr_t context_device_dma_mask = DMA_BIT_MASK(38);


You need a mask per device.  Please don't share the same variable
for multiple masks.


+EXPORT_SYMBOL(host1x_context_alloc);


All this low-level code really should be EXPORT_SYMBOL_GPL.


Thanks, will fix (and same for patch 2).

Mikko