[PATCH AUTOSEL 5.14 29/40] swiotlb-xen: ensure to issue well-formed XENMEM_exchange requests

2021-09-27 Thread Sasha Levin
From: Jan Beulich 

[ Upstream commit 9074c79b62b6e0d91d7f716c6e4e9968eaf9e043 ]

While the hypervisor hasn't been enforcing this, we would still better
avoid issuing requests with GFNs not aligned to the requested order.
Instead of altering the value also in the call to panic(), drop it
there for being static and hence easy to determine without being part
of the panic message.

Signed-off-by: Jan Beulich 
Reviewed-by: Stefano Stabellini 
Link: https://lore.kernel.org/r/7b3998e3-1233-4e5a-89ec-d740e77eb...@suse.com
Signed-off-by: Juergen Gross 
Signed-off-by: Sasha Levin 
---
 drivers/xen/swiotlb-xen.c | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/drivers/xen/swiotlb-xen.c b/drivers/xen/swiotlb-xen.c
index dbb18dc956f3..de4f55154d49 100644
--- a/drivers/xen/swiotlb-xen.c
+++ b/drivers/xen/swiotlb-xen.c
@@ -232,10 +232,11 @@ void __init xen_swiotlb_init_early(void)
/*
 * Get IO TLB memory from any location.
 */
-   start = memblock_alloc(PAGE_ALIGN(bytes), PAGE_SIZE);
+   start = memblock_alloc(PAGE_ALIGN(bytes),
+  IO_TLB_SEGSIZE << IO_TLB_SHIFT);
if (!start)
-   panic("%s: Failed to allocate %lu bytes align=0x%lx\n",
- __func__, PAGE_ALIGN(bytes), PAGE_SIZE);
+   panic("%s: Failed to allocate %lu bytes\n",
+ __func__, PAGE_ALIGN(bytes));
 
/*
 * And replace that memory with pages under 4GB.
-- 
2.33.0

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH V5 12/12] net: netvsc: Add Isolation VM support for netvsc driver

2021-09-27 Thread Christoph Hellwig
On Mon, Sep 27, 2021 at 10:26:43PM +0800, Tianyu Lan wrote:
> Hi Christoph:
> Gentile ping. The swiotlb and shared memory mapping changes in this
> patchset needs your reivew. Could you have a look?

I'm a little too busy for a review of such a huge patchset right now.
That being said here are my comments from a very quick review:

 - the bare memremap usage in swiotlb looks strange and I'd
   definitively expect a well documented wrapper.
 - given that we can now hand out swiotlb memory for coherent mappings
   we need to carefully audit what happens when this memremaped
   memory gets mmaped or used through dma_get_sgtable
 - the netscv changes I'm not happy with at all.  A large part of it
   is that the driver already has a bad structure, but this series
   is making it significantly worse.  We'll need to find a way
   to use the proper dma mapping abstractions here.  One option
   if you want to stick to the double vmapped buffer would be something
   like using dma_alloc_noncontigous plus a variant of
   dma_vmap_noncontiguous that takes the shared_gpa_boundary into
   account.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH] iommu/mediatek: fix out-of-range warning with clang

2021-09-27 Thread Yong Wu
On Mon, 2021-09-27 at 14:18 +0200, Arnd Bergmann wrote:
> From: Arnd Bergmann 
> 
> clang-14 notices that a comparison is never true when
> CONFIG_PHYS_ADDR_T_64BIT is disabled:
> 
> drivers/iommu/mtk_iommu.c:553:34: error: result of comparison of
> constant 5368709120 with expression of type 'phys_addr_t' (aka
> 'unsigned int') is always false [-Werror,-Wtautological-constant-out-
> of-range-compare]
> if (dom->data->enable_4GB && pa >=
> MTK_IOMMU_4GB_MODE_REMAP_BASE)
>  ~~
> ^  ~
> 
> Add an explicit check for the type of the variable to skip the check
> and the warning in that case.
> 
> Fixes: b4dad40e4f35 ("iommu/mediatek: Adjust the PA for the 4GB
> Mode")
> Signed-off-by: Arnd Bergmann 

Reviewed-by: Yong Wu 

Thanks.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 4/8] x86/traps: Demand-populate PASID MSR via #GP

2021-09-27 Thread Dave Hansen
On 9/27/21 2:02 PM, Luck, Tony wrote:
> Or are you thinking of a helper that does both the check
> and the update ... so the code here could be:
> 
>   update_one_xsave_feature(XFEATURE_PASID, _val, sizeof(msr_val));
> 
> With the helper being something like:
> 
> void update_one_xsave_feature(enum xfeature xfeature, void *data, size_t size)
> {
>   if (xsave_state_in_memory(args ...)) {
>   addr = get_xsave_addr(xsave, xfeature);
>   memcpy(addr, data, size);
>   xsave->header.xfeatures |= (1 << xfeature);
>   return;
>   }
> 
>   switch (xfeature) {
>   case XFEATURE_PASID:
>   wrmsrl(MSR_IA32_PASID, *(u64 *)data);
>   break;
> 
>   case each_of_the_other_XFEATURE_enums:
>   code to update registers for that XFEATURE
>   }
> }
> 
> either way needs the definitive correct coding for xsave_state_in_memory()

That's close to what we want.

The size should probably be implicit.  If it isn't implicit, it needs to
at least be double-checked against the state sizes.

Not to get too fancy, but I think we also want to have a "replace"
operation which is separate from the "update".  Think of a case where
you are trying to set a bit:

struct pkru_state *pk = start_update_xstate(tsk, XSTATE_PKRU);
pk->pkru |= 0x100;
finish_update_xstate(tsk, XSTATE_PKRU, pk);

versus setting a whole new value:

struct pkru_state *pk = start_replace_xstate(tsk, XSTATE_PKRU);
memset(pkru, sizeof(*pk), 0);
pk->pkru = 0x1234;
finish_replace_xstate(tsk, XSTATE_PKRU, pk);

They look similar, but they actually might have very different costs for
big states like AMX.  We can also do some very different debugging for
them.  In normal operation, these _can_ just return pointers directly to
the fpu...->xstate in some case.  But, if we're debugging, we could
kmalloc() a buffer and do sanity checking before it goes into the task
buffer.

We don't have to do any of that fancy stuff now.  We don't even need to
do an "update" if all we need for now for XFEATURE_PASID is a "replace".

1. Hide whether we need to write to real registers
2. Hide whether we need to update the in-memory image
3. Hide other FPU infrastructure like the TIF flag.
4. Make the users deal with a *whole* state in the replace API
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v2 1/9] cxl: Convert "RBI" to enum

2021-09-27 Thread Dan Williams
Please spell out "register block indicator" in the subject so that the
shortlog remains somewhat readable.

On Thu, Sep 23, 2021 at 10:27 AM Ben Widawsky  wrote:
>
> In preparation for passing around the Register Block Indicator (RBI) as
> a parameter, it is desirable to convert the type to an enum so that the
> interface can use a well defined type checked parameter.

C wouldn't type check this unless it failed an integer conversion,
right? It would need to be a struct to get useful type checking.

I don't mind this for the self documenting properties it has for the
functions that will take this as a parameter, but maybe clarify what
you mean by type checked parameter?

>
> As a result of this change, should future versions of the spec add
> sparsely defined identifiers, it could become a problem if checking for
> invalid identifiers since the code currently checks for the max
> identifier. This is not an issue with current spec, and the algorithm to
> obtain the register blocks will change before those possible additions
> are made.

In general let's not spend changelog space trying to guess what future
specs may or may not do. I.e. I think this text can be dropped,
especially because enums can support sparse number spaces.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


RE: [PATCH v15 00/12] SMMUv3 Nested Stage Setup (IOMMU part)

2021-09-27 Thread Krishna Reddy via iommu
Hi Eric,
> This is based on Jean-Philippe's
> [PATCH v14 00/10] iommu: I/O page faults for SMMUv3
> https://www.spinics.net/lists/arm-kernel/msg886518.html
> (including the patches that were not pulled for 5.13)
>

Jean's patches have been merged to v5.14.
Do you anticipate IOMMU/VFIO part patches getting into upstream kernel soon?

-KR
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 4/8] x86/traps: Demand-populate PASID MSR via #GP

2021-09-27 Thread Luck, Tony
On Thu, Sep 23, 2021 at 04:17:05PM -0700, Andy Lutomirski wrote:
> > +   fpregs_lock();
> > +
> > +   /*
> > +* If the task's FPU doesn't need to be loaded or is valid, directly
> > +* write the IA32_PASID MSR. Otherwise, write the PASID state and
> > +* the MSR will be loaded from the PASID state before returning to
> > +* the user.
> > +*/
> > +   if (!test_thread_flag(TIF_NEED_FPU_LOAD) ||
> > +   fpregs_state_valid(fpu, smp_processor_id())) {
> > +   wrmsrl(MSR_IA32_PASID, msr_val);
> 
> Let me try to decode this.
> 
> If the current task's FPU state is live or if the state is in memory but the 
> CPU regs just happen to match the copy in memory, then write the MSR.  Else 
> write the value to memory.
> 
> This is wrong.  If !TIF_NEED_FPU_LOAD && fpregs_state_valid, you MUST NOT 
> MODIFY FPU STATE.  This is not negotiable -- you will break coherence between 
> CPU regs and the memory image.  The way you modify the current task's state 
> is either you modify it in CPU regs (if the kernel knows that the CPU regs 
> are the one and only source of truth) OR you modify it in memory and 
> invalidate any preserved copies (by zapping last_cpu). 
> 
> In any event, that particular bit of logic really doesn't belong in here -- 
> it belongs in some helper that gets it right, once.

Andy,

A helper sounds like a good idea. Can you flesh out what
you would like that to look like?

Is it just the "where is the live register state?" so the
above could be written:

if (xsave_state_in_memory(args ...))
update pasid bit of xsave state in memory
else
wrmsrl(MSR_IA32_PASID, msr_val);

Or are you thinking of a helper that does both the check
and the update ... so the code here could be:

update_one_xsave_feature(XFEATURE_PASID, _val, sizeof(msr_val));

With the helper being something like:

void update_one_xsave_feature(enum xfeature xfeature, void *data, size_t size)
{
if (xsave_state_in_memory(args ...)) {
addr = get_xsave_addr(xsave, xfeature);
memcpy(addr, data, size);
xsave->header.xfeatures |= (1 << xfeature);
return;
}

switch (xfeature) {
case XFEATURE_PASID:
wrmsrl(MSR_IA32_PASID, *(u64 *)data);
break;

case each_of_the_other_XFEATURE_enums:
code to update registers for that XFEATURE
}
}

either way needs the definitive correct coding for xsave_state_in_memory()

-Tony
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH] [RFC] qcom_scm: hide Kconfig symbol

2021-09-27 Thread Arnd Bergmann
On Mon, Sep 27, 2021 at 10:42 PM Bjorn Andersson
 wrote:
> On Mon 27 Sep 13:15 PDT 2021, Arnd Bergmann wrote:
> > On Mon, Sep 27, 2021 at 9:52 PM Bjorn Andersson 
> >  wrote:
> >
> > An easier option might be to find a way to build QCOM_SCM without
> > RESET_CONTROLLER for compile testing purposes. I don't know
> > what would break from that.
> >
>
> Afaict the reset API is properly stubbed and RESET_CONTROLLER is a bool,
> so I think we can simply drop the "select" and the kernel will still
> compile fine in all combinations.
>
> When it comes to runtime, we currently select RESET_CONTROLLER from the
> Qualcomm common clocks. If that is dropped (why would it...) it seems
> possible to build a custom kernel for msm8916 that we can boot and miss
> the stubbed out "mss restart" reset line from the SCM.
>
>
> So, let's just drop the select RESET_CONTROLLER from SCM for now.

Ok, I've made that change locally, giving it more time on the randconfig
build box now.

   Arnd
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH] [RFC] qcom_scm: hide Kconfig symbol

2021-09-27 Thread Bjorn Andersson
On Mon 27 Sep 13:15 PDT 2021, Arnd Bergmann wrote:

> On Mon, Sep 27, 2021 at 9:52 PM Bjorn Andersson
>  wrote:
> > On Mon 27 Sep 08:22 PDT 2021, Arnd Bergmann wrote:
> > > From: Arnd Bergmann 
> > >
> > >  - To avoid a circular dependency chain involving RESET_CONTROLLER
> > >and PINCTRL_SUNXI, change the 'depends on RESET_CONTROLLER' in
> > >the latter one to 'select'.
> >
> > Can you please help me understand why this is part of the same patch?
> 
> This can be done as a preparatory patch if we decide to do it this way,
> for the review it seemed better to spell out that this is required.
> 
> I still hope that we can avoid adding another 'select RESET_CONTROLLER'
> if someone can figure out what to do instead.
> 

Okay, thanks.

> The problem here is that QCOM_SCM selects RESET_CONTROLLER,
> and turning that into 'depends on' would in turn mean that any driver that
> wants to select QCOM_SCM would have to have the same RESET_CONTROLLER
> dependency.
> 

Right, and that will just be another thing we'll get wrong across the
tree.

> An easier option might be to find a way to build QCOM_SCM without
> RESET_CONTROLLER for compile testing purposes. I don't know
> what would break from that.
> 

Afaict the reset API is properly stubbed and RESET_CONTROLLER is a bool,
so I think we can simply drop the "select" and the kernel will still
compile fine in all combinations.

When it comes to runtime, we currently select RESET_CONTROLLER from the
Qualcomm common clocks. If that is dropped (why would it...) it seems
possible to build a custom kernel for msm8916 that we can boot and miss
the stubbed out "mss restart" reset line from the SCM.


So, let's just drop the select RESET_CONTROLLER from SCM for now.

Regards,
Bjorn
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH] [RFC] qcom_scm: hide Kconfig symbol

2021-09-27 Thread Arnd Bergmann
On Mon, Sep 27, 2021 at 9:52 PM Bjorn Andersson
 wrote:
> On Mon 27 Sep 08:22 PDT 2021, Arnd Bergmann wrote:
> > From: Arnd Bergmann 
> >
> >  - To avoid a circular dependency chain involving RESET_CONTROLLER
> >and PINCTRL_SUNXI, change the 'depends on RESET_CONTROLLER' in
> >the latter one to 'select'.
>
> Can you please help me understand why this is part of the same patch?

This can be done as a preparatory patch if we decide to do it this way,
for the review it seemed better to spell out that this is required.

I still hope that we can avoid adding another 'select RESET_CONTROLLER'
if someone can figure out what to do instead.

The problem here is that QCOM_SCM selects RESET_CONTROLLER,
and turning that into 'depends on' would in turn mean that any driver that
wants to select QCOM_SCM would have to have the same RESET_CONTROLLER
dependency.

An easier option might be to find a way to build QCOM_SCM without
RESET_CONTROLLER for compile testing purposes. I don't know
what would break from that.

 Arnd
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v3 04/20] PCI/P2PDMA: introduce helpers for dma_map_sg implementations

2021-09-27 Thread Logan Gunthorpe



On 2021-09-27 12:53 p.m., Bjorn Helgaas wrote:
> Acked-by: Bjorn Helgaas 
> 
> Ditto.

Thanks Bjorn, I'll make these changes and add your Acks for subsequent
postings.

Logan
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH] [RFC] qcom_scm: hide Kconfig symbol

2021-09-27 Thread Bjorn Andersson
On Mon 27 Sep 08:22 PDT 2021, Arnd Bergmann wrote:

> From: Arnd Bergmann 
> 
> Now that SCM can be a loadable module, we have to add another
> dependency to avoid link failures when ipa or adreno-gpu are
> built-in:
> 
> aarch64-linux-ld: drivers/net/ipa/ipa_main.o: in function `ipa_probe':
> ipa_main.c:(.text+0xfc4): undefined reference to `qcom_scm_is_available'
> 
> ld.lld: error: undefined symbol: qcom_scm_is_available
> >>> referenced by adreno_gpu.c
> >>>   gpu/drm/msm/adreno/adreno_gpu.o:(adreno_zap_shader_load) in 
> >>> archive drivers/built-in.a
> 
> This can happen when CONFIG_ARCH_QCOM is disabled and we don't select
> QCOM_MDT_LOADER, but some other module selects QCOM_SCM. Ideally we'd
> use a similar dependency here to what we have for QCOM_RPROC_COMMON,
> but that causes dependency loops from other things selecting QCOM_SCM.
> 
> This appears to be an endless problem, so try something different this
> time:
> 
>  - CONFIG_QCOM_SCM becomes a hidden symbol that nothing 'depends on'
>but that is simply selected by all of its users
> 
>  - All the stubs in include/linux/qcom_scm.h can go away
> 
>  - arm-smccc.h needs to provide a stub for __arm_smccc_smc() to
>allow compile-testing QCOM_SCM on all architectures.
> 
>  - To avoid a circular dependency chain involving RESET_CONTROLLER
>and PINCTRL_SUNXI, change the 'depends on RESET_CONTROLLER' in
>the latter one to 'select'.

Can you please help me understand why this is part of the same patch?

> 
> The last bit is rather annoying, as drivers should generally never
> 'select' another subsystem, and about half the users of the reset
> controller interface do this anyway.
> 
> Nevertheless, this version seems to pass all my randconfig tests
> and is more robust than any of the prior versions.
> 
> Comments?
> 

I like it!

It's less confusing than the current scheme, so should be easier to
get right. And I like the fact that we don't need the stubs anymore.


@John, could you please have a look at this as well, wrt GKI.

Regards,
Bjorn

> Signed-off-by: Arnd Bergmann 
> ---
>  drivers/firmware/Kconfig|  4 +-
>  drivers/gpu/drm/msm/Kconfig |  4 +-
>  drivers/iommu/Kconfig   |  2 +-
>  drivers/media/platform/Kconfig  |  2 +-
>  drivers/mmc/host/Kconfig|  2 +-
>  drivers/net/ipa/Kconfig |  1 +
>  drivers/net/wireless/ath/ath10k/Kconfig |  2 +-
>  drivers/pinctrl/qcom/Kconfig|  3 +-
>  drivers/pinctrl/sunxi/Kconfig   |  6 +--
>  include/linux/arm-smccc.h   | 10 
>  include/linux/qcom_scm.h| 71 -
>  11 files changed, 23 insertions(+), 84 deletions(-)
> 
> diff --git a/drivers/firmware/Kconfig b/drivers/firmware/Kconfig
> index 220a58cf0a44..f7dd82ef0b9c 100644
> --- a/drivers/firmware/Kconfig
> +++ b/drivers/firmware/Kconfig
> @@ -203,9 +203,7 @@ config INTEL_STRATIX10_RSU
> Say Y here if you want Intel RSU support.
>  
>  config QCOM_SCM
> - tristate "Qcom SCM driver"
> - depends on ARM || ARM64
> - depends on HAVE_ARM_SMCCC
> + tristate
>   select RESET_CONTROLLER
>  
>  config QCOM_SCM_DOWNLOAD_MODE_DEFAULT
> diff --git a/drivers/gpu/drm/msm/Kconfig b/drivers/gpu/drm/msm/Kconfig
> index e9c6af78b1d7..3ddf739a6f9b 100644
> --- a/drivers/gpu/drm/msm/Kconfig
> +++ b/drivers/gpu/drm/msm/Kconfig
> @@ -17,7 +17,7 @@ config DRM_MSM
>   select DRM_SCHED
>   select SHMEM
>   select TMPFS
> - select QCOM_SCM if ARCH_QCOM
> + select QCOM_SCM
>   select WANT_DEV_COREDUMP
>   select SND_SOC_HDMI_CODEC if SND_SOC
>   select SYNC_FILE
> @@ -55,7 +55,7 @@ config DRM_MSM_GPU_SUDO
>  
>  config DRM_MSM_HDMI_HDCP
>   bool "Enable HDMI HDCP support in MSM DRM driver"
> - depends on DRM_MSM && QCOM_SCM
> + depends on DRM_MSM
>   default y
>   help
> Choose this option to enable HDCP state machine
> diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
> index 124c41adeca1..989c83acbfee 100644
> --- a/drivers/iommu/Kconfig
> +++ b/drivers/iommu/Kconfig
> @@ -308,7 +308,7 @@ config APPLE_DART
>  config ARM_SMMU
>   tristate "ARM Ltd. System MMU (SMMU) Support"
>   depends on ARM64 || ARM || (COMPILE_TEST && !GENERIC_ATOMIC64)
> - depends on QCOM_SCM || !QCOM_SCM #if QCOM_SCM=m this can't be =y
> + select QCOM_SCM
>   select IOMMU_API
>   select IOMMU_IO_PGTABLE_LPAE
>   select ARM_DMA_USE_IOMMU if ARM
> diff --git a/drivers/media/platform/Kconfig b/drivers/media/platform/Kconfig
> index 157c924686e4..80321e03809a 100644
> --- a/drivers/media/platform/Kconfig
> +++ b/drivers/media/platform/Kconfig
> @@ -565,7 +565,7 @@ config VIDEO_QCOM_VENUS
>   depends on VIDEO_DEV && VIDEO_V4L2 && QCOM_SMEM
>   depends on (ARCH_QCOM && IOMMU_DMA) || COMPILE_TEST
>   select QCOM_MDT_LOADER if ARCH_QCOM
> - select QCOM_SCM if ARCH_QCOM
> + select QCOM_SCM
>  

Re: [PATCH v3 04/20] PCI/P2PDMA: introduce helpers for dma_map_sg implementations

2021-09-27 Thread Bjorn Helgaas
On Thu, Sep 16, 2021 at 05:40:44PM -0600, Logan Gunthorpe wrote:
> Add pci_p2pdma_map_segment() as a helper for simple dma_map_sg()
> implementations. It takes an scatterlist segment that must point to a
> pci_p2pdma struct page and will map it if the mapping requires a bus
> address.
> 
> The return value indicates whether the mapping required a bus address
> or whether the caller still needs to map the segment normally. If the
> segment should not be mapped, -EREMOTEIO is returned.
> 
> This helper uses a state structure to track the changes to the
> pgmap across calls and avoid needing to lookup into the xarray for
> every page.
> 
> Also add pci_p2pdma_map_bus_segment() which is useful for IOMMU
> dma_map_sg() implementations where the sg segment containing the page
> differs from the sg segment containing the DMA address.
> 
> Signed-off-by: Logan Gunthorpe 

Acked-by: Bjorn Helgaas 

Ditto.

> ---
>  drivers/pci/p2pdma.c   | 59 ++
>  include/linux/pci-p2pdma.h | 21 ++
>  2 files changed, 80 insertions(+)
> 
> diff --git a/drivers/pci/p2pdma.c b/drivers/pci/p2pdma.c
> index b656d8c801a7..58c34f1f1473 100644
> --- a/drivers/pci/p2pdma.c
> +++ b/drivers/pci/p2pdma.c
> @@ -943,6 +943,65 @@ void pci_p2pdma_unmap_sg_attrs(struct device *dev, 
> struct scatterlist *sg,
>  }
>  EXPORT_SYMBOL_GPL(pci_p2pdma_unmap_sg_attrs);
>  
> +/**
> + * pci_p2pdma_map_segment - map an sg segment determining the mapping type
> + * @state: State structure that should be declared outside of the 
> for_each_sg()
> + *   loop and initialized to zero.
> + * @dev: DMA device that's doing the mapping operation
> + * @sg: scatterlist segment to map
> + *
> + * This is a helper to be used by non-iommu dma_map_sg() implementations 
> where
> + * the sg segment is the same for the page_link and the dma_address.

s/non-iommu/non-IOMMU/

> + *
> + * Attempt to map a single segment in an SGL with the PCI bus address.
> + * The segment must point to a PCI P2PDMA page and thus must be
> + * wrapped in a is_pci_p2pdma_page(sg_page(sg)) check.
> + *
> + * Returns the type of mapping used and maps the page if the type is
> + * PCI_P2PDMA_MAP_BUS_ADDR.
> + */
> +enum pci_p2pdma_map_type
> +pci_p2pdma_map_segment(struct pci_p2pdma_map_state *state, struct device 
> *dev,
> +struct scatterlist *sg)
> +{
> + if (state->pgmap != sg_page(sg)->pgmap) {
> + state->pgmap = sg_page(sg)->pgmap;
> + state->map = pci_p2pdma_map_type(state->pgmap, dev);
> + state->bus_off = to_p2p_pgmap(state->pgmap)->bus_offset;
> + }
> +
> + if (state->map == PCI_P2PDMA_MAP_BUS_ADDR) {
> + sg->dma_address = sg_phys(sg) + state->bus_off;
> + sg_dma_len(sg) = sg->length;
> + sg_dma_mark_pci_p2pdma(sg);
> + }
> +
> + return state->map;
> +}
> +
> +/**
> + * pci_p2pdma_map_bus_segment - map an sg segment pre determined to
> + *   be mapped with PCI_P2PDMA_MAP_BUS_ADDR
> + * @pg_sg: scatterlist segment with the page to map
> + * @dma_sg: scatterlist segment to assign a dma address to

s/dma address/DMA address/, also below

> + *
> + * This is a helper for iommu dma_map_sg() implementations when the
> + * segment for the dma address differs from the segment containing the
> + * source page.
> + *
> + * pci_p2pdma_map_type() must have already been called on the pg_sg and
> + * returned PCI_P2PDMA_MAP_BUS_ADDR.
> + */
> +void pci_p2pdma_map_bus_segment(struct scatterlist *pg_sg,
> + struct scatterlist *dma_sg)
> +{
> + struct pci_p2pdma_pagemap *pgmap = to_p2p_pgmap(sg_page(pg_sg)->pgmap);
> +
> + dma_sg->dma_address = sg_phys(pg_sg) + pgmap->bus_offset;
> + sg_dma_len(dma_sg) = pg_sg->length;
> + sg_dma_mark_pci_p2pdma(dma_sg);
> +}
> +
>  /**
>   * pci_p2pdma_enable_store - parse a configfs/sysfs attribute store
>   *   to enable p2pdma
> diff --git a/include/linux/pci-p2pdma.h b/include/linux/pci-p2pdma.h
> index caac2d023f8f..e5a8d5bc0f51 100644
> --- a/include/linux/pci-p2pdma.h
> +++ b/include/linux/pci-p2pdma.h
> @@ -13,6 +13,12 @@
>  
>  #include 
>  
> +struct pci_p2pdma_map_state {
> + struct dev_pagemap *pgmap;
> + int map;
> + u64 bus_off;
> +};
> +
>  struct block_device;
>  struct scatterlist;
>  
> @@ -70,6 +76,11 @@ int pci_p2pdma_map_sg_attrs(struct device *dev, struct 
> scatterlist *sg,
>   int nents, enum dma_data_direction dir, unsigned long attrs);
>  void pci_p2pdma_unmap_sg_attrs(struct device *dev, struct scatterlist *sg,
>   int nents, enum dma_data_direction dir, unsigned long attrs);
> +enum pci_p2pdma_map_type
> +pci_p2pdma_map_segment(struct pci_p2pdma_map_state *state, struct device 
> *dev,
> +struct scatterlist *sg);
> +void pci_p2pdma_map_bus_segment(struct scatterlist *pg_sg,
> + struct scatterlist *dma_sg);
>  int pci_p2pdma_enable_store(const 

Re: [PATCH v3 13/20] PCI/P2PDMA: remove pci_p2pdma_[un]map_sg()

2021-09-27 Thread Bjorn Helgaas
On Thu, Sep 16, 2021 at 05:40:53PM -0600, Logan Gunthorpe wrote:
> This interface is superseded by support in dma_map_sg() which now supports
> heterogeneous scatterlists. There are no longer any users, so remove it.
> 
> Signed-off-by: Logan Gunthorpe 

Acked-by: Bjorn Helgaas 

Ditto.

> ---
>  drivers/pci/p2pdma.c   | 65 --
>  include/linux/pci-p2pdma.h | 27 
>  2 files changed, 92 deletions(-)
> 
> diff --git a/drivers/pci/p2pdma.c b/drivers/pci/p2pdma.c
> index 58c34f1f1473..4478633346bd 100644
> --- a/drivers/pci/p2pdma.c
> +++ b/drivers/pci/p2pdma.c
> @@ -878,71 +878,6 @@ enum pci_p2pdma_map_type pci_p2pdma_map_type(struct 
> dev_pagemap *pgmap,
>   return type;
>  }
>  
> -static int __pci_p2pdma_map_sg(struct pci_p2pdma_pagemap *p2p_pgmap,
> - struct device *dev, struct scatterlist *sg, int nents)
> -{
> - struct scatterlist *s;
> - int i;
> -
> - for_each_sg(sg, s, nents, i) {
> - s->dma_address = sg_phys(s) - p2p_pgmap->bus_offset;
> - sg_dma_len(s) = s->length;
> - }
> -
> - return nents;
> -}
> -
> -/**
> - * pci_p2pdma_map_sg_attrs - map a PCI peer-to-peer scatterlist for DMA
> - * @dev: device doing the DMA request
> - * @sg: scatter list to map
> - * @nents: elements in the scatterlist
> - * @dir: DMA direction
> - * @attrs: DMA attributes passed to dma_map_sg() (if called)
> - *
> - * Scatterlists mapped with this function should be unmapped using
> - * pci_p2pdma_unmap_sg_attrs().
> - *
> - * Returns the number of SG entries mapped or 0 on error.
> - */
> -int pci_p2pdma_map_sg_attrs(struct device *dev, struct scatterlist *sg,
> - int nents, enum dma_data_direction dir, unsigned long attrs)
> -{
> - struct pci_p2pdma_pagemap *p2p_pgmap =
> - to_p2p_pgmap(sg_page(sg)->pgmap);
> -
> - switch (pci_p2pdma_map_type(sg_page(sg)->pgmap, dev)) {
> - case PCI_P2PDMA_MAP_THRU_HOST_BRIDGE:
> - return dma_map_sg_attrs(dev, sg, nents, dir, attrs);
> - case PCI_P2PDMA_MAP_BUS_ADDR:
> - return __pci_p2pdma_map_sg(p2p_pgmap, dev, sg, nents);
> - default:
> - return 0;
> - }
> -}
> -EXPORT_SYMBOL_GPL(pci_p2pdma_map_sg_attrs);
> -
> -/**
> - * pci_p2pdma_unmap_sg_attrs - unmap a PCI peer-to-peer scatterlist that was
> - *   mapped with pci_p2pdma_map_sg()
> - * @dev: device doing the DMA request
> - * @sg: scatter list to map
> - * @nents: number of elements returned by pci_p2pdma_map_sg()
> - * @dir: DMA direction
> - * @attrs: DMA attributes passed to dma_unmap_sg() (if called)
> - */
> -void pci_p2pdma_unmap_sg_attrs(struct device *dev, struct scatterlist *sg,
> - int nents, enum dma_data_direction dir, unsigned long attrs)
> -{
> - enum pci_p2pdma_map_type map_type;
> -
> - map_type = pci_p2pdma_map_type(sg_page(sg)->pgmap, dev);
> -
> - if (map_type == PCI_P2PDMA_MAP_THRU_HOST_BRIDGE)
> - dma_unmap_sg_attrs(dev, sg, nents, dir, attrs);
> -}
> -EXPORT_SYMBOL_GPL(pci_p2pdma_unmap_sg_attrs);
> -
>  /**
>   * pci_p2pdma_map_segment - map an sg segment determining the mapping type
>   * @state: State structure that should be declared outside of the 
> for_each_sg()
> diff --git a/include/linux/pci-p2pdma.h b/include/linux/pci-p2pdma.h
> index e5a8d5bc0f51..0c33a40a86e7 100644
> --- a/include/linux/pci-p2pdma.h
> +++ b/include/linux/pci-p2pdma.h
> @@ -72,10 +72,6 @@ void pci_p2pmem_free_sgl(struct pci_dev *pdev, struct 
> scatterlist *sgl);
>  void pci_p2pmem_publish(struct pci_dev *pdev, bool publish);
>  enum pci_p2pdma_map_type pci_p2pdma_map_type(struct dev_pagemap *pgmap,
>struct device *dev);
> -int pci_p2pdma_map_sg_attrs(struct device *dev, struct scatterlist *sg,
> - int nents, enum dma_data_direction dir, unsigned long attrs);
> -void pci_p2pdma_unmap_sg_attrs(struct device *dev, struct scatterlist *sg,
> - int nents, enum dma_data_direction dir, unsigned long attrs);
>  enum pci_p2pdma_map_type
>  pci_p2pdma_map_segment(struct pci_p2pdma_map_state *state, struct device 
> *dev,
>  struct scatterlist *sg);
> @@ -135,17 +131,6 @@ pci_p2pdma_map_type(struct dev_pagemap *pgmap, struct 
> device *dev)
>  {
>   return PCI_P2PDMA_MAP_NOT_SUPPORTED;
>  }
> -static inline int pci_p2pdma_map_sg_attrs(struct device *dev,
> - struct scatterlist *sg, int nents, enum dma_data_direction dir,
> - unsigned long attrs)
> -{
> - return 0;
> -}
> -static inline void pci_p2pdma_unmap_sg_attrs(struct device *dev,
> - struct scatterlist *sg, int nents, enum dma_data_direction dir,
> - unsigned long attrs)
> -{
> -}
>  static inline enum pci_p2pdma_map_type
>  pci_p2pdma_map_segment(struct pci_p2pdma_map_state *state, struct device 
> *dev,
>  struct scatterlist *sg)
> @@ -181,16 +166,4 @@ static inline struct pci_dev 

Re: [PATCH v3 02/20] PCI/P2PDMA: attempt to set map_type if it has not been set

2021-09-27 Thread Bjorn Helgaas
On Thu, Sep 16, 2021 at 05:40:42PM -0600, Logan Gunthorpe wrote:
> Attempt to find the mapping type for P2PDMA pages on the first
> DMA map attempt if it has not been done ahead of time.
> 
> Previously, the mapping type was expected to be calculated ahead of
> time, but if pages are to come from userspace then there's no
> way to ensure the path was checked ahead of time.
> 
> With this change it's no longer invalid to call pci_p2pdma_map_sg()
> before the mapping type is calculated so drop the WARN_ON when that
> is the case.
> 
> Signed-off-by: Logan Gunthorpe 

Acked-by: Bjorn Helgaas 

Capitalize subject line.

> ---
>  drivers/pci/p2pdma.c | 6 +-
>  1 file changed, 5 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/pci/p2pdma.c b/drivers/pci/p2pdma.c
> index 50cdde3e9a8b..1192c465ba6d 100644
> --- a/drivers/pci/p2pdma.c
> +++ b/drivers/pci/p2pdma.c
> @@ -848,6 +848,7 @@ static enum pci_p2pdma_map_type 
> pci_p2pdma_map_type(struct dev_pagemap *pgmap,
>   struct pci_dev *provider = to_p2p_pgmap(pgmap)->provider;
>   struct pci_dev *client;
>   struct pci_p2pdma *p2pdma;
> + int dist;
>  
>   if (!provider->p2pdma)
>   return PCI_P2PDMA_MAP_NOT_SUPPORTED;
> @@ -864,6 +865,10 @@ static enum pci_p2pdma_map_type 
> pci_p2pdma_map_type(struct dev_pagemap *pgmap,
>   type = xa_to_value(xa_load(>map_types,
>  map_types_idx(client)));
>   rcu_read_unlock();
> +
> + if (type == PCI_P2PDMA_MAP_UNKNOWN)
> + return calc_map_type_and_dist(provider, client, , false);
> +
>   return type;
>  }
>  
> @@ -906,7 +911,6 @@ int pci_p2pdma_map_sg_attrs(struct device *dev, struct 
> scatterlist *sg,
>   case PCI_P2PDMA_MAP_BUS_ADDR:
>   return __pci_p2pdma_map_sg(p2p_pgmap, dev, sg, nents);
>   default:
> - WARN_ON_ONCE(1);
>   return 0;
>   }
>  }
> -- 
> 2.30.2
> 
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v3 19/20] PCI/P2PDMA: introduce pci_mmap_p2pmem()

2021-09-27 Thread Bjorn Helgaas
On Thu, Sep 16, 2021 at 05:40:59PM -0600, Logan Gunthorpe wrote:
> Introduce pci_mmap_p2pmem() which is a helper to allocate and mmap
> a hunk of p2pmem into userspace.
> 
> Pages are allocated from the genalloc in bulk and their reference count
> incremented. They are returned to the genalloc when the page is put.
> 
> The VMA does not take a reference to the pages when they are inserted
> with vmf_insert_mixed() (which is necessary for zone device pages) so
> the backing P2P memory is stored in a structures in vm_private_data.
> 
> A pseudo mount is used to allocate an inode for each PCI device. The
> inode's address_space is used in the file doing the mmap so that all
> VMAs are collected and can be unmapped if the PCI device is unbound.
> After unmapping, the VMAs are iterated through and their pages are
> put so the device can continue to be unbound. An active flag is used
> to signal to VMAs not to allocate any further P2P memory once the
> removal process starts. The flag is synchronized with concurrent
> access with an RCU lock.
> 
> The VMAs and inode will survive after the unbind of the device, but no
> pages will be present in the VMA and a subsequent access will result
> in a SIGBUS error.
> 
> Signed-off-by: Logan Gunthorpe 

Acked-by: Bjorn Helgaas 

I would capitalize "Introduce" in the subject line.

> ---
>  drivers/pci/p2pdma.c   | 263 -
>  include/linux/pci-p2pdma.h |  11 ++
>  include/uapi/linux/magic.h |   1 +
>  3 files changed, 273 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/pci/p2pdma.c b/drivers/pci/p2pdma.c
> index 2422af5a529c..a5adf57af53a 100644
> --- a/drivers/pci/p2pdma.c
> +++ b/drivers/pci/p2pdma.c
> @@ -16,14 +16,19 @@
>  #include 
>  #include 
>  #include 
> +#include 
> +#include 
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  struct pci_p2pdma {
>   struct gen_pool *pool;
>   bool p2pmem_published;
>   struct xarray map_types;
> + struct inode *inode;
> + bool active;
>  };
>  
>  struct pci_p2pdma_pagemap {
> @@ -32,6 +37,14 @@ struct pci_p2pdma_pagemap {
>   u64 bus_offset;
>  };
>  
> +struct pci_p2pdma_map {
> + struct kref ref;
> + struct pci_dev *pdev;
> + struct inode *inode;
> + void *kaddr;
> + size_t len;
> +};
> +
>  static struct pci_p2pdma_pagemap *to_p2p_pgmap(struct dev_pagemap *pgmap)
>  {
>   return container_of(pgmap, struct pci_p2pdma_pagemap, pgmap);
> @@ -100,6 +113,26 @@ static const struct attribute_group p2pmem_group = {
>   .name = "p2pmem",
>  };
>  
> +/*
> + * P2PDMA internal mount
> + * Fake an internal VFS mount-point in order to allocate struct address_space
> + * mappings to remove VMAs on unbind events.
> + */
> +static int pci_p2pdma_fs_cnt;
> +static struct vfsmount *pci_p2pdma_fs_mnt;
> +
> +static int pci_p2pdma_fs_init_fs_context(struct fs_context *fc)
> +{
> + return init_pseudo(fc, P2PDMA_MAGIC) ? 0 : -ENOMEM;
> +}
> +
> +static struct file_system_type pci_p2pdma_fs_type = {
> + .name = "p2dma",
> + .owner = THIS_MODULE,
> + .init_fs_context = pci_p2pdma_fs_init_fs_context,
> + .kill_sb = kill_anon_super,
> +};
> +
>  static void p2pdma_page_free(struct page *page)
>  {
>   struct pci_p2pdma_pagemap *pgmap = to_p2p_pgmap(page->pgmap);
> @@ -128,6 +161,9 @@ static void pci_p2pdma_release(void *data)
>   gen_pool_destroy(p2pdma->pool);
>   sysfs_remove_group(>dev.kobj, _group);
>   xa_destroy(>map_types);
> +
> + iput(p2pdma->inode);
> + simple_release_fs(_p2pdma_fs_mnt, _p2pdma_fs_cnt);
>  }
>  
>  static int pci_p2pdma_setup(struct pci_dev *pdev)
> @@ -145,17 +181,32 @@ static int pci_p2pdma_setup(struct pci_dev *pdev)
>   if (!p2p->pool)
>   goto out;
>  
> - error = devm_add_action_or_reset(>dev, pci_p2pdma_release, pdev);
> + error = simple_pin_fs(_p2pdma_fs_type, _p2pdma_fs_mnt,
> +   _p2pdma_fs_cnt);
>   if (error)
>   goto out_pool_destroy;
>  
> + p2p->inode = alloc_anon_inode(pci_p2pdma_fs_mnt->mnt_sb);
> + if (IS_ERR(p2p->inode)) {
> + error = -ENOMEM;
> + goto out_unpin_fs;
> + }
> +
> + error = devm_add_action_or_reset(>dev, pci_p2pdma_release, pdev);
> + if (error)
> + goto out_put_inode;
> +
>   error = sysfs_create_group(>dev.kobj, _group);
>   if (error)
> - goto out_pool_destroy;
> + goto out_put_inode;
>  
>   rcu_assign_pointer(pdev->p2pdma, p2p);
>   return 0;
>  
> +out_put_inode:
> + iput(p2p->inode);
> +out_unpin_fs:
> + simple_release_fs(_p2pdma_fs_mnt, _p2pdma_fs_cnt);
>  out_pool_destroy:
>   gen_pool_destroy(p2p->pool);
>  out:
> @@ -163,6 +214,45 @@ static int pci_p2pdma_setup(struct pci_dev *pdev)
>   return error;
>  }
>  
> +static void pci_p2pdma_map_free_pages(struct pci_p2pdma_map *pmap)
> +{
> + int i;
> +
> + if (!pmap->kaddr)
> + return;
> +
> + 

Re: [PATCH v3 03/20] PCI/P2PDMA: make pci_p2pdma_map_type() non-static

2021-09-27 Thread Bjorn Helgaas
On Thu, Sep 16, 2021 at 05:40:43PM -0600, Logan Gunthorpe wrote:
> pci_p2pdma_map_type() will be needed by the dma-iommu map_sg
> implementation because it will need to determine the mapping type
> ahead of actually doing the mapping to create the actual iommu mapping.

I don't expect this to go via the PCI tree, but if it did I would
silently:

  s/PCI/P2PDMA: make pci_p2pdma_map_type() non-static/
PCI/P2PDMA: Expose pci_p2pdma_map_type()/
  s/iommu/IOMMU/

and mention what this patch does in the commit log (in addition to the
subject) and fix a couple minor typos below.

> Signed-off-by: Logan Gunthorpe 

Acked-by: Bjorn Helgaas 

> ---
>  drivers/pci/p2pdma.c   | 24 +-
>  include/linux/pci-p2pdma.h | 41 ++
>  2 files changed, 56 insertions(+), 9 deletions(-)
> 
> diff --git a/drivers/pci/p2pdma.c b/drivers/pci/p2pdma.c
> index 1192c465ba6d..b656d8c801a7 100644
> --- a/drivers/pci/p2pdma.c
> +++ b/drivers/pci/p2pdma.c
> @@ -20,13 +20,6 @@
>  #include 
>  #include 
>  
> -enum pci_p2pdma_map_type {
> - PCI_P2PDMA_MAP_UNKNOWN = 0,
> - PCI_P2PDMA_MAP_NOT_SUPPORTED,
> - PCI_P2PDMA_MAP_BUS_ADDR,
> - PCI_P2PDMA_MAP_THRU_HOST_BRIDGE,
> -};
> -
>  struct pci_p2pdma {
>   struct gen_pool *pool;
>   bool p2pmem_published;
> @@ -841,8 +834,21 @@ void pci_p2pmem_publish(struct pci_dev *pdev, bool 
> publish)
>  }
>  EXPORT_SYMBOL_GPL(pci_p2pmem_publish);
>  
> -static enum pci_p2pdma_map_type pci_p2pdma_map_type(struct dev_pagemap 
> *pgmap,
> - struct device *dev)
> +/**
> + * pci_p2pdma_map_type - return the type of mapping that should be used for
> + *   a given device and pgmap
> + * @pgmap: the pagemap of a page to determine the mapping type for
> + * @dev: device that is mapping the page
> + *
> + * Returns one of:
> + *   PCI_P2PDMA_MAP_NOT_SUPPORTED - The mapping should not be done
> + *   PCI_P2PDMA_MAP_BUS_ADDR - The mapping should use the PCI bus address
> + *   PCI_P2PDMA_MAP_THRU_HOST_BRIDGE - The mapping should be done normally
> + *   using the CPU physical address (in dma-direct) or an IOVA
> + *   mapping for the IOMMU.
> + */
> +enum pci_p2pdma_map_type pci_p2pdma_map_type(struct dev_pagemap *pgmap,
> +  struct device *dev)
>  {
>   enum pci_p2pdma_map_type type = PCI_P2PDMA_MAP_NOT_SUPPORTED;
>   struct pci_dev *provider = to_p2p_pgmap(pgmap)->provider;
> diff --git a/include/linux/pci-p2pdma.h b/include/linux/pci-p2pdma.h
> index 8318a97c9c61..caac2d023f8f 100644
> --- a/include/linux/pci-p2pdma.h
> +++ b/include/linux/pci-p2pdma.h
> @@ -16,6 +16,40 @@
>  struct block_device;
>  struct scatterlist;
>  
> +enum pci_p2pdma_map_type {
> + /*
> +  * PCI_P2PDMA_MAP_UNKNOWN: Used internally for indicating the mapping
> +  * type hasn't been calculated yet. Functions that return this enum
> +  * never return this value.
> +  */
> + PCI_P2PDMA_MAP_UNKNOWN = 0,
> +
> + /*
> +  * PCI_P2PDMA_MAP_NOT_SUPPORTED: Indicates the transaction will
> +  * traverse the host bridge and the host bridge is not in the
> +  * whitelist. DMA Mapping routines should return an error when
> +  * this is returned.
> +  */
> + PCI_P2PDMA_MAP_NOT_SUPPORTED,
> +
> + /*
> +  * PCI_P2PDMA_BUS_ADDR: Indicates that two devices can talk to
> +  * eachother directly through a PCI switch and the transaction will
> +  * not traverse the host bridge. Such a mapping should program
> +  * the DMA engine with PCI bus addresses.

s/eachother/each other/

> +  */
> + PCI_P2PDMA_MAP_BUS_ADDR,
> +
> + /*
> +  * PCI_P2PDMA_MAP_THRU_HOST_BRIDGE: Indicates two devices can talk
> +  * to eachother, but the transaction traverses a host bridge on the
> +  * whitelist. In this case, a normal mapping either with CPU physical
> +  * addresses (in the case of dma-direct) or IOVA addresses (in the
> +  * case of IOMMUs) should be used to program the DMA engine.

s/eachother/each other/

> +  */
> + PCI_P2PDMA_MAP_THRU_HOST_BRIDGE,
> +};
> +
>  #ifdef CONFIG_PCI_P2PDMA
>  int pci_p2pdma_add_resource(struct pci_dev *pdev, int bar, size_t size,
>   u64 offset);
> @@ -30,6 +64,8 @@ struct scatterlist *pci_p2pmem_alloc_sgl(struct pci_dev 
> *pdev,
>unsigned int *nents, u32 length);
>  void pci_p2pmem_free_sgl(struct pci_dev *pdev, struct scatterlist *sgl);
>  void pci_p2pmem_publish(struct pci_dev *pdev, bool publish);
> +enum pci_p2pdma_map_type pci_p2pdma_map_type(struct dev_pagemap *pgmap,
> +  struct device *dev);
>  int pci_p2pdma_map_sg_attrs(struct device *dev, struct scatterlist *sg,
>   int nents, enum dma_data_direction dir, unsigned long attrs);
>  void pci_p2pdma_unmap_sg_attrs(struct device *dev, struct scatterlist *sg,
> @@ -83,6 +119,11 

Re: [PATCH v2 6/9] PCI: Add pci_find_dvsec_capability to find designated VSEC

2021-09-27 Thread Bjorn Helgaas
s/pci_find_dvsec_capability/pci_find_dvsec_capability()/ in subject
and commit log.

On Thu, Sep 23, 2021 at 10:26:44AM -0700, Ben Widawsky wrote:
> Add pci_find_dvsec_capability to locate a Designated Vendor-Specific
> Extended Capability with the specified DVSEC ID.

"specified Vendor ID and Capability ID".

> The Designated Vendor-Specific Extended Capability (DVSEC) allows one or
> more vendor specific capabilities that aren't tied to the vendor ID of
> the PCI component.
> 
> DVSEC is critical for both the Compute Express Link (CXL) driver as well
> as the driver for OpenCAPI coherent accelerator (OCXL).

Strictly speaking, not really relevant for the commit log.

> Cc: David E. Box 
> Cc: Jonathan Cameron 
> Cc: Bjorn Helgaas 
> Cc: Dan Williams 
> Cc: linux-...@vger.kernel.org
> Cc: linuxppc-...@lists.ozlabs.org
> Cc: Andrew Donnellan 
> Cc: Lu Baolu 
> Reviewed-by: Frederic Barrat 
> Signed-off-by: Ben Widawsky 

If you want to merge this with the series,

Acked-by: Bjorn Helgaas 

Or if you want me to merge this on a branch, let me know.

> ---
>  drivers/pci/pci.c   | 32 
>  include/linux/pci.h |  1 +
>  2 files changed, 33 insertions(+)
> 
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index ce2ab62b64cf..94ac86ff28b0 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -732,6 +732,38 @@ u16 pci_find_vsec_capability(struct pci_dev *dev, u16 
> vendor, int cap)
>  }
>  EXPORT_SYMBOL_GPL(pci_find_vsec_capability);
>  
> +/**
> + * pci_find_dvsec_capability - Find DVSEC for vendor
> + * @dev: PCI device to query
> + * @vendor: Vendor ID to match for the DVSEC
> + * @dvsec: Designated Vendor-specific capability ID
> + *
> + * If DVSEC has Vendor ID @vendor and DVSEC ID @dvsec return the capability
> + * offset in config space; otherwise return 0.
> + */
> +u16 pci_find_dvsec_capability(struct pci_dev *dev, u16 vendor, u16 dvsec)
> +{
> + int pos;
> +
> + pos = pci_find_ext_capability(dev, PCI_EXT_CAP_ID_DVSEC);
> + if (!pos)
> + return 0;
> +
> + while (pos) {
> + u16 v, id;
> +
> + pci_read_config_word(dev, pos + PCI_DVSEC_HEADER1, );
> + pci_read_config_word(dev, pos + PCI_DVSEC_HEADER2, );
> + if (vendor == v && dvsec == id)
> + return pos;
> +
> + pos = pci_find_next_ext_capability(dev, pos, 
> PCI_EXT_CAP_ID_DVSEC);
> + }
> +
> + return 0;
> +}
> +EXPORT_SYMBOL_GPL(pci_find_dvsec_capability);
> +
>  /**
>   * pci_find_parent_resource - return resource region of parent bus of given
>   * region
> diff --git a/include/linux/pci.h b/include/linux/pci.h
> index cd8aa6fce204..c93ccfa4571b 100644
> --- a/include/linux/pci.h
> +++ b/include/linux/pci.h
> @@ -1130,6 +1130,7 @@ u16 pci_find_ext_capability(struct pci_dev *dev, int 
> cap);
>  u16 pci_find_next_ext_capability(struct pci_dev *dev, u16 pos, int cap);
>  struct pci_bus *pci_find_next_bus(const struct pci_bus *from);
>  u16 pci_find_vsec_capability(struct pci_dev *dev, u16 vendor, int cap);
> +u16 pci_find_dvsec_capability(struct pci_dev *dev, u16 vendor, u16 dvsec);
>  
>  u64 pci_get_dsn(struct pci_dev *dev);
>  
> -- 
> 2.33.0
> 
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH] [RFC] qcom_scm: hide Kconfig symbol

2021-09-27 Thread Arnd Bergmann
From: Arnd Bergmann 

Now that SCM can be a loadable module, we have to add another
dependency to avoid link failures when ipa or adreno-gpu are
built-in:

aarch64-linux-ld: drivers/net/ipa/ipa_main.o: in function `ipa_probe':
ipa_main.c:(.text+0xfc4): undefined reference to `qcom_scm_is_available'

ld.lld: error: undefined symbol: qcom_scm_is_available
>>> referenced by adreno_gpu.c
>>>   gpu/drm/msm/adreno/adreno_gpu.o:(adreno_zap_shader_load) in 
>>> archive drivers/built-in.a

This can happen when CONFIG_ARCH_QCOM is disabled and we don't select
QCOM_MDT_LOADER, but some other module selects QCOM_SCM. Ideally we'd
use a similar dependency here to what we have for QCOM_RPROC_COMMON,
but that causes dependency loops from other things selecting QCOM_SCM.

This appears to be an endless problem, so try something different this
time:

 - CONFIG_QCOM_SCM becomes a hidden symbol that nothing 'depends on'
   but that is simply selected by all of its users

 - All the stubs in include/linux/qcom_scm.h can go away

 - arm-smccc.h needs to provide a stub for __arm_smccc_smc() to
   allow compile-testing QCOM_SCM on all architectures.

 - To avoid a circular dependency chain involving RESET_CONTROLLER
   and PINCTRL_SUNXI, change the 'depends on RESET_CONTROLLER' in
   the latter one to 'select'.

The last bit is rather annoying, as drivers should generally never
'select' another subsystem, and about half the users of the reset
controller interface do this anyway.

Nevertheless, this version seems to pass all my randconfig tests
and is more robust than any of the prior versions.

Comments?

Signed-off-by: Arnd Bergmann 
---
 drivers/firmware/Kconfig|  4 +-
 drivers/gpu/drm/msm/Kconfig |  4 +-
 drivers/iommu/Kconfig   |  2 +-
 drivers/media/platform/Kconfig  |  2 +-
 drivers/mmc/host/Kconfig|  2 +-
 drivers/net/ipa/Kconfig |  1 +
 drivers/net/wireless/ath/ath10k/Kconfig |  2 +-
 drivers/pinctrl/qcom/Kconfig|  3 +-
 drivers/pinctrl/sunxi/Kconfig   |  6 +--
 include/linux/arm-smccc.h   | 10 
 include/linux/qcom_scm.h| 71 -
 11 files changed, 23 insertions(+), 84 deletions(-)

diff --git a/drivers/firmware/Kconfig b/drivers/firmware/Kconfig
index 220a58cf0a44..f7dd82ef0b9c 100644
--- a/drivers/firmware/Kconfig
+++ b/drivers/firmware/Kconfig
@@ -203,9 +203,7 @@ config INTEL_STRATIX10_RSU
  Say Y here if you want Intel RSU support.
 
 config QCOM_SCM
-   tristate "Qcom SCM driver"
-   depends on ARM || ARM64
-   depends on HAVE_ARM_SMCCC
+   tristate
select RESET_CONTROLLER
 
 config QCOM_SCM_DOWNLOAD_MODE_DEFAULT
diff --git a/drivers/gpu/drm/msm/Kconfig b/drivers/gpu/drm/msm/Kconfig
index e9c6af78b1d7..3ddf739a6f9b 100644
--- a/drivers/gpu/drm/msm/Kconfig
+++ b/drivers/gpu/drm/msm/Kconfig
@@ -17,7 +17,7 @@ config DRM_MSM
select DRM_SCHED
select SHMEM
select TMPFS
-   select QCOM_SCM if ARCH_QCOM
+   select QCOM_SCM
select WANT_DEV_COREDUMP
select SND_SOC_HDMI_CODEC if SND_SOC
select SYNC_FILE
@@ -55,7 +55,7 @@ config DRM_MSM_GPU_SUDO
 
 config DRM_MSM_HDMI_HDCP
bool "Enable HDMI HDCP support in MSM DRM driver"
-   depends on DRM_MSM && QCOM_SCM
+   depends on DRM_MSM
default y
help
  Choose this option to enable HDCP state machine
diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
index 124c41adeca1..989c83acbfee 100644
--- a/drivers/iommu/Kconfig
+++ b/drivers/iommu/Kconfig
@@ -308,7 +308,7 @@ config APPLE_DART
 config ARM_SMMU
tristate "ARM Ltd. System MMU (SMMU) Support"
depends on ARM64 || ARM || (COMPILE_TEST && !GENERIC_ATOMIC64)
-   depends on QCOM_SCM || !QCOM_SCM #if QCOM_SCM=m this can't be =y
+   select QCOM_SCM
select IOMMU_API
select IOMMU_IO_PGTABLE_LPAE
select ARM_DMA_USE_IOMMU if ARM
diff --git a/drivers/media/platform/Kconfig b/drivers/media/platform/Kconfig
index 157c924686e4..80321e03809a 100644
--- a/drivers/media/platform/Kconfig
+++ b/drivers/media/platform/Kconfig
@@ -565,7 +565,7 @@ config VIDEO_QCOM_VENUS
depends on VIDEO_DEV && VIDEO_V4L2 && QCOM_SMEM
depends on (ARCH_QCOM && IOMMU_DMA) || COMPILE_TEST
select QCOM_MDT_LOADER if ARCH_QCOM
-   select QCOM_SCM if ARCH_QCOM
+   select QCOM_SCM
select VIDEOBUF2_DMA_CONTIG
select V4L2_MEM2MEM_DEV
help
diff --git a/drivers/mmc/host/Kconfig b/drivers/mmc/host/Kconfig
index 71313961cc54..95b3511b0560 100644
--- a/drivers/mmc/host/Kconfig
+++ b/drivers/mmc/host/Kconfig
@@ -547,7 +547,7 @@ config MMC_SDHCI_MSM
depends on MMC_SDHCI_PLTFM
select MMC_SDHCI_IO_ACCESSORS
select MMC_CQHCI
-   select QCOM_SCM if MMC_CRYPTO && ARCH_QCOM
+   select QCOM_SCM if MMC_CRYPTO
help
  This selects the Secure Digital 

Re: [RFC 06/20] iommu: Add iommu_device_init[exit]_user_dma interfaces

2021-09-27 Thread Jason Gunthorpe via iommu
On Mon, Sep 27, 2021 at 09:42:58AM +, Tian, Kevin wrote:

> +static int iommu_dev_viable(struct device *dev, void *data)
> +{
> + enum dma_hint hint = *data;
> + struct device_driver *drv = READ_ONCE(dev->driver);
> +
> + /* no conflict if the new device doesn't do DMA */
> + if (hint == DMA_FOR_NONE)
> + return 0;
> +
> + /* no conflict if this device is driver-less, or doesn't do DMA */
> + if (!drv || (drv->dma_hint == DMA_FOR_NONE))
> + return 0;

While it is kind of clever to fetch this in the drv like this, the
locking just doesn't work right.

The group itself needs to have an atomic that encodes what state it is
in. You can read the initial state from the drv, under the
device_lock, and update the atomic state

Also, don't call it "hint", there is nothing hinty about this, it has
definitive functional impacts.

Greg will want to see a definiate benefit from this extra global code,
so be sure to explain about why the BUG_ON is bad, and how driver core
involvement is needed to fix it properly.

Jason
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH V5 12/12] net: netvsc: Add Isolation VM support for netvsc driver

2021-09-27 Thread Tianyu Lan

Hi Christoph:
Gentile ping. The swiotlb and shared memory mapping changes in this
patchset needs your reivew. Could you have a look?

Thanks.

On 9/22/2021 6:34 PM, Tianyu Lan wrote:

Hi Christoph:
     This patch follows your purposal in the previous discussion.
Could you have a look?
     "use vmap_pfn as in the current series.  But in that case I think
     we should get rid of the other mapping created by vmalloc.  I
     though a bit about finding a way to apply the offset in vmalloc
     itself, but I think it would be too invasive to the normal fast
     path.  So the other sub-option would be to allocate the pages
     manually (maybe even using high order allocations to reduce TLB
     pressure) and then remap them(https://lkml.org/lkml/2021/9/2/112)

Otherwise, I merge your previous change for swiotlb into patch 9
“x86/Swiotlb: Add Swiotlb bounce buffer remap function for HV IVM”
You previous change 
link.(http://git.infradead.org/users/hch/misc.git/commit/8248f295928aded3364a1e54a4e0022e93d3610c) 
Please have a look.



Thanks.


On 9/16/2021 12:21 AM, Michael Kelley wrote:
From: Tianyu Lan   Sent: Tuesday, September 14, 
2021 6:39 AM


In Isolation VM, all shared memory with host needs to mark visible
to host via hvcall. vmbus_establish_gpadl() has already done it for
netvsc rx/tx ring buffer. The page buffer used by vmbus_sendpacket_
pagebuffer() stills need to be handled. Use DMA API to map/umap
these memory during sending/receiving packet and Hyper-V swiotlb
bounce buffer dma address will be returned. The swiotlb bounce buffer
has been masked to be visible to host during boot up.

Allocate rx/tx ring buffer via alloc_pages() in Isolation VM and map
these pages via vmap(). After calling vmbus_establish_gpadl() which
marks these pages visible to host, unmap these pages to release the
virtual address mapped with physical address below shared_gpa_boundary
and map them in the extra address space via vmap_pfn().

Signed-off-by: Tianyu Lan 
---
Change since v4:
* Allocate rx/tx ring buffer via alloc_pages() in Isolation VM
* Map pages after calling vmbus_establish_gpadl().
* set dma_set_min_align_mask for netvsc driver.

Change since v3:
* Add comment to explain why not to use dma_map_sg()
* Fix some error handle.
---
  drivers/net/hyperv/hyperv_net.h   |   7 +
  drivers/net/hyperv/netvsc.c   | 287 +-
  drivers/net/hyperv/netvsc_drv.c   |   1 +
  drivers/net/hyperv/rndis_filter.c |   2 +
  include/linux/hyperv.h    |   5 +
  5 files changed, 296 insertions(+), 6 deletions(-)

diff --git a/drivers/net/hyperv/hyperv_net.h 
b/drivers/net/hyperv/hyperv_net.h

index 315278a7cf88..87e8c74398a5 100644
--- a/drivers/net/hyperv/hyperv_net.h
+++ b/drivers/net/hyperv/hyperv_net.h
@@ -164,6 +164,7 @@ struct hv_netvsc_packet {
  u32 total_bytes;
  u32 send_buf_index;
  u32 total_data_buflen;
+    struct hv_dma_range *dma_range;
  };

  #define NETVSC_HASH_KEYLEN 40
@@ -1074,6 +1075,8 @@ struct netvsc_device {

  /* Receive buffer allocated by us but manages by NetVSP */
  void *recv_buf;
+    struct page **recv_pages;
+    u32 recv_page_count;
  u32 recv_buf_size; /* allocated bytes */
  struct vmbus_gpadl recv_buf_gpadl_handle;
  u32 recv_section_cnt;
@@ -1082,6 +1085,8 @@ struct netvsc_device {

  /* Send buffer allocated by us */
  void *send_buf;
+    struct page **send_pages;
+    u32 send_page_count;
  u32 send_buf_size;
  struct vmbus_gpadl send_buf_gpadl_handle;
  u32 send_section_cnt;
@@ -1731,4 +1736,6 @@ struct rndis_message {
  #define RETRY_US_HI    1
  #define RETRY_MAX    2000    /* >10 sec */

+void netvsc_dma_unmap(struct hv_device *hv_dev,
+  struct hv_netvsc_packet *packet);
  #endif /* _HYPERV_NET_H */
diff --git a/drivers/net/hyperv/netvsc.c b/drivers/net/hyperv/netvsc.c
index 1f87e570ed2b..7d5254bf043e 100644
--- a/drivers/net/hyperv/netvsc.c
+++ b/drivers/net/hyperv/netvsc.c
@@ -20,6 +20,7 @@
  #include 
  #include 
  #include 
+#include 

  #include 
  #include 
@@ -150,11 +151,33 @@ static void free_netvsc_device(struct rcu_head 
*head)

  {
  struct netvsc_device *nvdev
  = container_of(head, struct netvsc_device, rcu);
+    unsigned int alloc_unit;
  int i;

  kfree(nvdev->extension);
-    vfree(nvdev->recv_buf);
-    vfree(nvdev->send_buf);
+
+    if (nvdev->recv_pages) {
+    alloc_unit = (nvdev->recv_buf_size /
+    nvdev->recv_page_count) >> PAGE_SHIFT;
+
+    vunmap(nvdev->recv_buf);
+    for (i = 0; i < nvdev->recv_page_count; i++)
+    __free_pages(nvdev->recv_pages[i], alloc_unit);
+    } else {
+    vfree(nvdev->recv_buf);
+    }
+
+    if (nvdev->send_pages) {
+    alloc_unit = (nvdev->send_buf_size /
+    nvdev->send_page_count) >> PAGE_SHIFT;
+
+    vunmap(nvdev->send_buf);
+    for (i = 0; i < nvdev->send_page_count; i++)
+    __free_pages(nvdev->send_pages[i], 

Re: [RFC 06/20] iommu: Add iommu_device_init[exit]_user_dma interfaces

2021-09-27 Thread Jason Gunthorpe via iommu
On Mon, Sep 27, 2021 at 01:00:08PM +, Tian, Kevin wrote:

> > I think for such a narrow usage you should not change the struct
> > device_driver. Just have pci_stub call a function to flip back to user
> > mode.
> 
> Here we want to ensure that kernel dma should be blocked
> if the group is already marked for user-dma. If we just blindly
> do it for any driver at this point (as you commented earlier):
> 
> +   ret = iommu_set_kernel_ownership(dev);
> +   if (ret)
> +   return ret;
> 
> how would pci-stub reach its function to indicate that it doesn't 
> do dma and flip back?

> Do you envision a simpler policy that no driver can be bound
> to the group if it's already set for user-dma? what about vfio-pci
> itself?

Yes.. I'm not sure there is a good use case to allow the stub drivers
to load/unload while a VFIO is running. At least, not a strong enough
one to justify a global change to the driver core..

> > > +static int iommu_dev_viable(struct device *dev, void *data)
> > > +{
> > > + enum dma_hint hint = *data;
> > > + struct device_driver *drv = READ_ONCE(dev->driver);
> > 
> > Especially since this isn't locked properly or safe.
> 
> I have the same worry when copying from vfio. Not sure how
> vfio gets safe with this approach...

Fixing the locking in vfio_dev_viable is part of deleting the unbound
list. Once it properly uses the device_lock and doesn't race with the
driver core like this things are much better. Don't copy this stuff
into the iommu core without fixing it.

https://github.com/jgunthorpe/linux/commit/fa6abb318ccca114da12c0b5b123c99131ace926
https://github.com/jgunthorpe/linux/commit/45980bd90b023d1eea56df70d1c395bdf4cc7cf1

I can't remember if the above is contingent on some of the mdev
cleanups or not.. Have to get back to it.

Jason
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


RE: [RFC 06/20] iommu: Add iommu_device_init[exit]_user_dma interfaces

2021-09-27 Thread Tian, Kevin
> From: Lu Baolu 
> Sent: Monday, September 27, 2021 7:34 PM
> 
> On 2021/9/27 17:42, Tian, Kevin wrote:
> > +int iommu_device_set_dma_hint(struct device *dev, enum dma_hint hint)
> > +{
> > +   struct iommu_group *group;
> > +   int ret;
> > +
> > +   group = iommu_group_get(dev);
> > +   /* not an iommu-probed device */
> > +   if (!group)
> > +   return 0;
> > +
> > +   mutex_lock(>mutex);
> > +   ret = __iommu_group_viable(group, hint);
> > +   mutex_unlock(>mutex);
> > +
> > +   iommu_group_put(group);
> > +   return ret;
> > +}
> 
> Conceptually, we could also move iommu_deferred_attach() from
> iommu_dma_ops here to save unnecessary checks in the hot DMA API
> paths?
> 

Yes, it's possible. But just be curious, why doesn't iommu core 
manage deferred_attach when receiving BOUND_DRIVER event?
Is there other implication that deferred attach cannot be done
at driver binding time?

Thanks
Kevin
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


RE: [RFC 06/20] iommu: Add iommu_device_init[exit]_user_dma interfaces

2021-09-27 Thread Tian, Kevin
> From: Jason Gunthorpe 
> Sent: Monday, September 27, 2021 7:54 PM
> 
> On Mon, Sep 27, 2021 at 09:42:58AM +, Tian, Kevin wrote:
> > > From: Jason Gunthorpe 
> > > Sent: Wednesday, September 22, 2021 8:40 PM
> > >
> > > > > Ie the basic flow would see the driver core doing some:
> > > >
> > > > Just double confirm. Is there concern on having the driver core to
> > > > call iommu functions?
> > >
> > > It is always an interesting question, but I'd say iommu is
> > > foundantional to Linux and if it needs driver core help it shouldn't
> > > be any different from PM, pinctl, or other subsystems that have
> > > inserted themselves into the driver core.
> > >
> > > Something kind of like the below.
> > >
> > > If I recall, once it is done like this then the entire iommu notifier
> > > infrastructure can be ripped out which is a lot of code.
> >
> > Currently vfio is the only user of this notifier mechanism. Now
> > three events are handled in vfio_iommu_group_notifier():
> >
> > NOTIFY_ADD_DEVICE: this is basically for some sanity check. suppose
> > not required once we handle it cleanly in the iommu/driver core.
> >
> > NOTIFY_BOUND_DRIVER: the BUG_ON() logic to be fixed by this change.
> >
> > NOTIFY_UNBOUND_DRIVER: still needs some thoughts. Based on
> > the comments the group->unbound_list is used to avoid breaking
> 
> I have a patch series to delete the unbound_list, the scenario you
> describe is handled by the device_lock()

that's great!

> 
> > diff --git a/drivers/base/dd.c b/drivers/base/dd.c
> > index 68ea1f9..826a651 100644
> > +++ b/drivers/base/dd.c
> > @@ -566,6 +566,10 @@ static int really_probe(struct device *dev, struct
> device_driver *drv)
> > goto done;
> > }
> >
> > +   ret = iommu_device_set_dma_hint(dev, drv->dma_hint);
> > +   if (ret)
> > +   return ret;
> 
> I think for such a narrow usage you should not change the struct
> device_driver. Just have pci_stub call a function to flip back to user
> mode.

Here we want to ensure that kernel dma should be blocked
if the group is already marked for user-dma. If we just blindly
do it for any driver at this point (as you commented earlier):

+   ret = iommu_set_kernel_ownership(dev);
+   if (ret)
+   return ret;

how would pci-stub reach its function to indicate that it doesn't 
do dma and flip back?

Do you envision a simpler policy that no driver can be bound
to the group if it's already set for user-dma? what about vfio-pci
itself?

> 
> > +static int iommu_dev_viable(struct device *dev, void *data)
> > +{
> > +   enum dma_hint hint = *data;
> > +   struct device_driver *drv = READ_ONCE(dev->driver);
> 
> Especially since this isn't locked properly or safe.

I have the same worry when copying from vfio. Not sure how
vfio gets safe with this approach...

Thanks
Kevin
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH] iommu/mediatek: fix out-of-range warning with clang

2021-09-27 Thread Arnd Bergmann
From: Arnd Bergmann 

clang-14 notices that a comparison is never true when
CONFIG_PHYS_ADDR_T_64BIT is disabled:

drivers/iommu/mtk_iommu.c:553:34: error: result of comparison of constant 
5368709120 with expression of type 'phys_addr_t' (aka 'unsigned int') is always 
false [-Werror,-Wtautological-constant-out-of-range-compare]
if (dom->data->enable_4GB && pa >= MTK_IOMMU_4GB_MODE_REMAP_BASE)
 ~~ ^  ~

Add an explicit check for the type of the variable to skip the check
and the warning in that case.

Fixes: b4dad40e4f35 ("iommu/mediatek: Adjust the PA for the 4GB Mode")
Signed-off-by: Arnd Bergmann 
---
 drivers/iommu/mtk_iommu.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c
index d837adfd1da5..25b834104790 100644
--- a/drivers/iommu/mtk_iommu.c
+++ b/drivers/iommu/mtk_iommu.c
@@ -550,7 +550,9 @@ static phys_addr_t mtk_iommu_iova_to_phys(struct 
iommu_domain *domain,
phys_addr_t pa;
 
pa = dom->iop->iova_to_phys(dom->iop, iova);
-   if (dom->data->enable_4GB && pa >= MTK_IOMMU_4GB_MODE_REMAP_BASE)
+   if (IS_ENABLED(CONFIG_PHYS_ADDR_T_64BIT) &&
+   dom->data->enable_4GB &&
+   pa >= MTK_IOMMU_4GB_MODE_REMAP_BASE)
pa &= ~BIT_ULL(32);
 
return pa;
-- 
2.29.2

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [RFC 06/20] iommu: Add iommu_device_init[exit]_user_dma interfaces

2021-09-27 Thread Jason Gunthorpe via iommu
On Mon, Sep 27, 2021 at 09:42:58AM +, Tian, Kevin wrote:
> > From: Jason Gunthorpe 
> > Sent: Wednesday, September 22, 2021 8:40 PM
> > 
> > > > Ie the basic flow would see the driver core doing some:
> > >
> > > Just double confirm. Is there concern on having the driver core to
> > > call iommu functions?
> > 
> > It is always an interesting question, but I'd say iommu is
> > foundantional to Linux and if it needs driver core help it shouldn't
> > be any different from PM, pinctl, or other subsystems that have
> > inserted themselves into the driver core.
> > 
> > Something kind of like the below.
> > 
> > If I recall, once it is done like this then the entire iommu notifier
> > infrastructure can be ripped out which is a lot of code.
> 
> Currently vfio is the only user of this notifier mechanism. Now 
> three events are handled in vfio_iommu_group_notifier():
> 
> NOTIFY_ADD_DEVICE: this is basically for some sanity check. suppose
> not required once we handle it cleanly in the iommu/driver core.
> 
> NOTIFY_BOUND_DRIVER: the BUG_ON() logic to be fixed by this change.
> 
> NOTIFY_UNBOUND_DRIVER: still needs some thoughts. Based on
> the comments the group->unbound_list is used to avoid breaking

I have a patch series to delete the unbound_list, the scenario you
describe is handled by the device_lock()

> diff --git a/drivers/base/dd.c b/drivers/base/dd.c
> index 68ea1f9..826a651 100644
> +++ b/drivers/base/dd.c
> @@ -566,6 +566,10 @@ static int really_probe(struct device *dev, struct 
> device_driver *drv)
>   goto done;
>   }
>  
> + ret = iommu_device_set_dma_hint(dev, drv->dma_hint);
> + if (ret)
> + return ret;

I think for such a narrow usage you should not change the struct
device_driver. Just have pci_stub call a function to flip back to user
mode.

> +static int iommu_dev_viable(struct device *dev, void *data)
> +{
> + enum dma_hint hint = *data;
> + struct device_driver *drv = READ_ONCE(dev->driver);

Especially since this isn't locked properly or safe.

Jason
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [RFC 06/20] iommu: Add iommu_device_init[exit]_user_dma interfaces

2021-09-27 Thread Lu Baolu

On 2021/9/27 17:42, Tian, Kevin wrote:

+int iommu_device_set_dma_hint(struct device *dev, enum dma_hint hint)
+{
+   struct iommu_group *group;
+   int ret;
+
+   group = iommu_group_get(dev);
+   /* not an iommu-probed device */
+   if (!group)
+   return 0;
+
+   mutex_lock(>mutex);
+   ret = __iommu_group_viable(group, hint);
+   mutex_unlock(>mutex);
+
+   iommu_group_put(group);
+   return ret;
+}


Conceptually, we could also move iommu_deferred_attach() from
iommu_dma_ops here to save unnecessary checks in the hot DMA API
paths?

Best regards,
baolu
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


RE: [RFC 06/20] iommu: Add iommu_device_init[exit]_user_dma interfaces

2021-09-27 Thread Tian, Kevin
> From: Jason Gunthorpe 
> Sent: Wednesday, September 22, 2021 8:40 PM
> 
> > > Ie the basic flow would see the driver core doing some:
> >
> > Just double confirm. Is there concern on having the driver core to
> > call iommu functions?
> 
> It is always an interesting question, but I'd say iommu is
> foundantional to Linux and if it needs driver core help it shouldn't
> be any different from PM, pinctl, or other subsystems that have
> inserted themselves into the driver core.
> 
> Something kind of like the below.
> 
> If I recall, once it is done like this then the entire iommu notifier
> infrastructure can be ripped out which is a lot of code.

Currently vfio is the only user of this notifier mechanism. Now 
three events are handled in vfio_iommu_group_notifier():

NOTIFY_ADD_DEVICE: this is basically for some sanity check. suppose
not required once we handle it cleanly in the iommu/driver core.

NOTIFY_BOUND_DRIVER: the BUG_ON() logic to be fixed by this change.

NOTIFY_UNBOUND_DRIVER: still needs some thoughts. Based on
the comments the group->unbound_list is used to avoid breaking
group viability check between vfio_unregister_group_dev() and 
final dev/drv teardown. within that small window the device is
not tracked by vfio group but is still bound to a driver (e.g. vfio-pci
itself), while an external group user may hold a reference to the
group. Possibly it's not required now with the new mechanism as 
we rely on init/exit_user_dma() as the single switch to claim/
withdraw the group ownership. As long as exit_user_dma() is not 
called until vfio_group_release(), above small window is covered
thus no need to maintain a unbound_list.

But anyway since this corner case is tricky, will think more in case
of any oversight.

> 
> 
> diff --git a/drivers/base/dd.c b/drivers/base/dd.c
> index 68ea1f949daa90..e39612c99c6123 100644
> --- a/drivers/base/dd.c
> +++ b/drivers/base/dd.c
> @@ -566,6 +566,10 @@ static int really_probe(struct device *dev, struct
> device_driver *drv)
> goto done;
> }
> 
> +   ret = iommu_set_kernel_ownership(dev);
> +   if (ret)
> +   return ret;
> +
>  re_probe:
> dev->driver = drv;
> 
> @@ -673,6 +677,7 @@ static int really_probe(struct device *dev, struct
> device_driver *drv)
> dev->pm_domain->dismiss(dev);
> pm_runtime_reinit(dev);
> dev_pm_set_driver_flags(dev, 0);
> +   iommu_release_kernel_ownership(dev);
>  done:
> return ret;
>  }
> @@ -1214,6 +1219,7 @@ static void __device_release_driver(struct device
> *dev, struct device *parent)
> dev->pm_domain->dismiss(dev);
> pm_runtime_reinit(dev);
> dev_pm_set_driver_flags(dev, 0);
> +   iommu_release_kernel_ownership(dev);
> 
> klist_remove(>p->knode_driver);
> device_pm_check_callbacks(dev);

I expanded above into below conceptual draft. Please help check whether
it matches your thought:

diff --git a/drivers/base/dd.c b/drivers/base/dd.c
index 68ea1f9..826a651 100644
--- a/drivers/base/dd.c
+++ b/drivers/base/dd.c
@@ -566,6 +566,10 @@ static int really_probe(struct device *dev, struct 
device_driver *drv)
goto done;
}
 
+   ret = iommu_device_set_dma_hint(dev, drv->dma_hint);
+   if (ret)
+   return ret;
+
 re_probe:
dev->driver = drv;
 
@@ -673,6 +677,7 @@ static int really_probe(struct device *dev, struct 
device_driver *drv)
dev->pm_domain->dismiss(dev);
pm_runtime_reinit(dev);
dev_pm_set_driver_flags(dev, 0);
+   iommu_device_clear_dma_hint(dev);
 done:
return ret;
 }
@@ -1214,6 +1219,7 @@ static void __device_release_driver(struct device *dev, 
struct device *parent)
dev->pm_domain->dismiss(dev);
pm_runtime_reinit(dev);
dev_pm_set_driver_flags(dev, 0);
+   iommu_device_clear_dma_hint(dev);
 
klist_remove(>p->knode_driver);
device_pm_check_callbacks(dev);
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 3303d70..b12f335 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -1064,6 +1064,104 @@ void iommu_group_put(struct iommu_group *group)
 }
 EXPORT_SYMBOL_GPL(iommu_group_put);
 
+static int iommu_dev_viable(struct device *dev, void *data)
+{
+   enum dma_hint hint = *data;
+   struct device_driver *drv = READ_ONCE(dev->driver);
+
+   /* no conflict if the new device doesn't do DMA */
+   if (hint == DMA_FOR_NONE)
+   return 0;
+
+   /* no conflict if this device is driver-less, or doesn't do DMA */
+   if (!drv || (drv->dma_hint == DMA_FOR_NONE))
+   return 0;
+
+   /* kernel dma and user dma are exclusive */
+   if (hint != drv->dma_hint)
+   return -EINVAL;
+
+   /*
+* devices in the group could be bound to different user-dma