Re: [PATCH 6/9] swiotlb: refactor swiotlb_tbl_map_single

2021-02-22 Thread Christoph Hellwig
On Mon, Feb 22, 2021 at 02:29:37PM -0500, Konrad Rzeszutek Wilk wrote:
> > 'max_slots' should be 'unsigned long' here. Breaks SWIOTLB on RPi4. Do you
> > want me to send a fix, or do you prefer editing the patch?
> 
> I can roll it in. Thx!

Thanks to both of you!


Re: [PATCH] iommu/tegra-smmu: Fix mc errors on tegra124-nyan

2021-02-22 Thread Dmitry Osipenko
23.02.2021 05:13, Nicolin Chen writes:
> Hi Dmitry,
> 
> On Sat, Feb 20, 2021 at 08:16:22AM +0300, Dmitry Osipenko wrote:
>> 19.02.2021 01:07, Nicolin Chen пишет:
> >>> Commit 25938c73cd79 ("iommu/tegra-smmu: Rework tegra_smmu_probe_device()")
> >>> removed a hack in tegra_smmu_probe() by relying on the IOMMU core to
> >>> of_xlate the SMMU's SID per device, so as to get rid of tegra_smmu_find()
> >>> and tegra_smmu_configure(), which are typically done in the IOMMU core as
> >>> well.
>>>
> >>> This approach works, upon testing, for both existing devices that have
> >>> DT nodes and other devices (like PCI devices) that don't exist in DT, on
> >>> Tegra210 and Tegra3. However, Page Fault errors are reported on
> >>> tegra124-Nyan:
>>>
>>>   tegra-mc 70019000.memory-controller: display0a: read @0xfe056b40:
>>>  EMEM address decode error (SMMU translation error [--S])
>>>   tegra-mc 70019000.memory-controller: display0a: read @0xfe056b40:
>>>  Page fault (SMMU translation error [--S])
>>>
> >>> After debugging, I found that the mentioned commit changed the sequence
> >>> of some of tegra-smmu's function callbacks, resulting in the SMMU being
> >>> enabled for the display client before the display driver gets
> >>> initialized. I couldn't reproduce the exact same issue on Tegra210, as
> >>> Tegra124 (arm-32) differs at the arch-level code.
>>
>> Hello Nicolin,
>>
>> Could you please explain in more detail what exactly makes the
>> difference for the callback sequence?
> 
> Here is a log with 5.11.0-rc6:
> https://lava.collabora.co.uk/scheduler/job/3187849
> [dump_stack was added in some tegra-smmu functions]
> 
> And here is a corresponding log with reverting the original commit:
> https://lava.collabora.co.uk/scheduler/job/3187851
> 
> Here is a log with 5.11.0-rc7-next-20210210:
> https://lava.collabora.co.uk/scheduler/job/3210245
> 
> And here is a corresponding log with reverting the original commit:
> https://lava.collabora.co.uk/scheduler/job/3210596
> 
> Both failing logs show that the mc errors started right after the DC
> client got enabled by the ->attach_dev() callback, which in the passing
> logs was not called until Host1x driver init. Note also that the two
> failing logs show ->attach_dev() being called from two different
> sources, of_dma_configure_id() and arch_setup_dma_ops().
> 
> ->attach_dev() probably gets called early for the following reasons
> (sorry, I can't be 100% sure, as I don't have a Tegra124 or another
> 32-bit Tegra board to test):
> 1) With the commit reverted, all clients are probed in the "arch" stage,
>    which is even prior to IOMMU core initialization -- including its
>    setting of the default domain type. This probably mismatched the
>    type of the allocated domains against the default domain type; the
>    internal group is somehow affected as well. So some condition check
>    in the IOMMU core failed and then bypassed the ->attach_dev callback
>    in the really_probe stage, until the Host1x driver does attach_dev
>    again.
> 
> 2) On 32-bit ARM, arch_setup_dma_ops() does an additional round of
>    IOMMU domain allocation + attach_dev() after of_dma_configure_id()
>    has already done one. This isn't reproducible for me on Tegra210.
> 
> As debugging remotely isn't very efficient, and given that Thierry has
> been working on the linear mapping of the framebuffer carveout, I chose
> a partial revert as a quick fix.

The partial revert should be okay, but it's not clear to me what makes
the difference for T124, since I don't see that problem on T30, which
also has an active display at boot time.


Re: [PATCH] iommu/tegra-smmu: Fix mc errors on tegra124-nyan

2021-02-22 Thread Nicolin Chen
Hi Dmitry,

On Sat, Feb 20, 2021 at 08:16:22AM +0300, Dmitry Osipenko wrote:
> 19.02.2021 01:07, Nicolin Chen writes:
> > Commit 25938c73cd79 ("iommu/tegra-smmu: Rework tegra_smmu_probe_device()")
> > removed a hack in tegra_smmu_probe() by relying on the IOMMU core to
> > of_xlate the SMMU's SID per device, so as to get rid of tegra_smmu_find()
> > and tegra_smmu_configure(), which are typically done in the IOMMU core as
> > well.
> > 
> > This approach works, upon testing, for both existing devices that have
> > DT nodes and other devices (like PCI devices) that don't exist in DT, on
> > Tegra210 and Tegra3. However, Page Fault errors are reported on
> > tegra124-Nyan:
> > 
> >   tegra-mc 70019000.memory-controller: display0a: read @0xfe056b40:
> >  EMEM address decode error (SMMU translation error [--S])
> >   tegra-mc 70019000.memory-controller: display0a: read @0xfe056b40:
> >  Page fault (SMMU translation error [--S])
> > 
> > After debugging, I found that the mentioned commit changed the sequence
> > of some of tegra-smmu's function callbacks, resulting in the SMMU being
> > enabled for the display client before the display driver gets
> > initialized. I couldn't reproduce the exact same issue on Tegra210, as
> > Tegra124 (arm-32) differs at the arch-level code.
> 
> Hello Nicolin,
> 
> Could you please explain in more detail what exactly makes the
> difference for the callback sequence?

Here is a log with 5.11.0-rc6:
https://lava.collabora.co.uk/scheduler/job/3187849
[dump_stack was added in some tegra-smmu functions]

And here is a corresponding log with reverting the original commit:
https://lava.collabora.co.uk/scheduler/job/3187851

Here is a log with 5.11.0-rc7-next-20210210:
https://lava.collabora.co.uk/scheduler/job/3210245

And here is a corresponding log with reverting the original commit:
https://lava.collabora.co.uk/scheduler/job/3210596

Both failing logs show that the mc errors started right after the DC
client got enabled by the ->attach_dev() callback, which in the passing
logs was not called until Host1x driver init. Note also that the two
failing logs show ->attach_dev() being called from two different
sources, of_dma_configure_id() and arch_setup_dma_ops().

->attach_dev() probably gets called early for the following reasons
(sorry, I can't be 100% sure, as I don't have a Tegra124 or another
32-bit Tegra board to test):
1) With the commit reverted, all clients are probed in the "arch" stage,
   which is even prior to IOMMU core initialization -- including its
   setting of the default domain type. This probably mismatched the
   type of the allocated domains against the default domain type; the
   internal group is somehow affected as well. So some condition check
   in the IOMMU core failed and then bypassed the ->attach_dev callback
   in the really_probe stage, until the Host1x driver does attach_dev
   again.

2) On 32-bit ARM, arch_setup_dma_ops() does an additional round of
   IOMMU domain allocation + attach_dev() after of_dma_configure_id()
   has already done one. This isn't reproducible for me on Tegra210.

As debugging remotely isn't very efficient, and given that Thierry has
been working on the linear mapping of the framebuffer carveout, I chose
a partial revert as a quick fix.

Thanks

Re: [PATCH RFC v1 5/6] xen-swiotlb: convert variables to arrays

2021-02-22 Thread Stefano Stabellini
On Fri, 19 Feb 2021, Konrad Rzeszutek Wilk wrote:
> On Sun, Feb 07, 2021 at 04:56:01PM +0100, Christoph Hellwig wrote:
> > On Thu, Feb 04, 2021 at 09:40:23AM +0100, Christoph Hellwig wrote:
> > > So one thing that has been on my mind for a while:  I'd really like
> > > to kill the separate dma ops in Xen swiotlb.  If we compare xen-swiotlb
> > > to swiotlb the main difference seems to be:
> > > 
> > >  - additional reasons to bounce I/O vs the plain DMA capable
> > >  - the possibility to do a hypercall on arm/arm64
> > >  - an extra translation layer before doing the phys_to_dma and vice
> > >versa
> > >  - a special memory allocator
> > > 
> > > I wonder if, with a few jump labels or other no-overhead enablement
> > > options and possibly better use of the dma_range_map, we could kill
> > > off most of swiotlb-xen instead of maintaining all this code duplication?
> > 
> > So I looked at this a bit more.
> > 
> > For x86 with XENFEAT_auto_translated_physmap (how common is that?)
> 
> Juergen, Boris, please correct me if I am wrong, but
> XENFEAT_auto_translated_physmap only works for PVH guests?

ARM is always XENFEAT_auto_translated_physmap


> > pfn_to_gfn is a nop, so plain phys_to_dma/dma_to_phys do work as-is.
> > 
> > xen_arch_need_swiotlb always returns true for x86, and
> > range_straddles_page_boundary should never be true for the
> > XENFEAT_auto_translated_physmap case.
> 
> Correct. The kernel should have no clue of what the real MFNs are
> for PFNs.

On ARM, Linux knows the MFNs because for local pages MFN == PFN, and for
foreign pages it keeps track of them in arch/arm/xen/p2m.c. More on this
below.

xen_arch_need_swiotlb only returns true on ARM in rare situations where
bouncing on swiotlb buffers is required. Today that only happens on old
versions of Xen that don't support the cache flushing hypercall, but
there could be more cases in the future.


> > 
> > So as far as I can tell the mapping fast path for the
> > XENFEAT_auto_translated_physmap can be trivially reused from swiotlb.
> > 
> > That leaves us with the next more complicated case, x86 or fully cache
> > coherent arm{,64} without XENFEAT_auto_translated_physmap.  In that case
> > we need to patch in a phys_to_dma/dma_to_phys that performs the MFN
> > lookup, which could be done using alternatives or jump labels.
> > I think if that is done right we should also be able to let that cover
> > the foreign pages in is_xen_swiotlb_buffer/is_swiotlb_buffer, but in
> > the worst case that would need another alternative / jump label.
> > 
> > For non-coherent arm{,64} we'd also need to use alternatives or jump
> > labels for the cache maintenance ops, but that isn't a hard problem
> > either.

With the caveat that ARM is always XENFEAT_auto_translated_physmap, what
you wrote looks correct. I am writing down a brief explanation of how
swiotlb-xen is used on ARM.


pfn: address as seen by the guest, pseudo-physical address in ARM terminology
mfn (or bfn): real address, physical address in ARM terminology


On ARM dom0 is auto_translated (so Xen sets up the stage2 translation
in the MMU) and the translation is 1:1. So pfn == mfn for Dom0.

However, when another domain shares a page with Dom0, that page is not
mapped 1:1. Swiotlb-xen is used to retrieve the mfn for the foreign page
at xen_swiotlb_map_page. It does that with xen_phys_to_bus -> pfn_to_bfn,
which is implemented with an rbtree in arch/arm/xen/p2m.c.

In addition, swiotlb-xen is also used to cache-flush the page via
hypercall at xen_swiotlb_unmap_page. That is done because dev_addr is
really the mfn at unmap_page and we don't know the pfn for it. We can do
pfn-to-mfn but we cannot do mfn-to-pfn (there are, unfortunately, good
reasons for that). The only way to cache-flush by mfn is by issuing a
hypercall. The hypercall is implemented in arch/arm/xen/mm.c.

The pfn != bfn and pfn_valid() checks are used to detect whether the page
is local (belonging to Dom0) or foreign; they work thanks to the fact
that Dom0 is mapped 1:1.
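
To make that concrete, here is a minimal sketch of the check, assuming
the pfn_to_bfn() helper backed by arch/arm/xen/p2m.c and the 1:1 Dom0
mapping; illustrative only, not the exact kernel code:

	static bool page_is_foreign(phys_addr_t phys)
	{
		unsigned long pfn = XEN_PFN_DOWN(phys);

		/*
		 * Local Dom0 pages are mapped 1:1, so pfn == bfn; a
		 * mismatch (or an invalid pfn) means the page was granted
		 * by another domain and needs the hypercall path for
		 * cache flushes.
		 */
		return pfn_to_bfn(pfn) != pfn || !pfn_valid(pfn);
	}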


Getting back to what you wrote, yes if we had a way to do MFN lookups in
phys_to_dma, and a way to call the hypercall at unmap_page if the page
is foreign (e.g. if it fails a pfn_valid check) then I think we would be
good from an ARM perspective. The only exception is when
xen_arch_need_swiotlb returns true, in which case we need to actually
bounce on swiotlb buffers.


Re: [PATCH 6/9] swiotlb: refactor swiotlb_tbl_map_single

2021-02-22 Thread Konrad Rzeszutek Wilk
> > +static int find_slots(struct device *dev, size_t alloc_size)
> > +{
> > +   unsigned long boundary_mask = dma_get_seg_boundary(dev);
> > +   dma_addr_t tbl_dma_addr =
> > +   phys_to_dma_unencrypted(dev, io_tlb_start) & boundary_mask;
> > +   unsigned int max_slots = get_max_slots(boundary_mask);
> 
> 'max_slots' should be 'unsigned long' here. Breaks SWIOTLB on RPi4. Do you
> want me to send a fix, or do you prefer editing the patch?

I can roll it in. Thx!


Re: [PATCH 6/9] swiotlb: refactor swiotlb_tbl_map_single

2021-02-22 Thread Nicolas Saenz Julienne
Hi Christoph,

On Sun, 2021-02-07 at 17:03 +0100, Christoph Hellwig wrote:
> Split out a bunch of self-contained helpers to make the function easier
> to follow.
> 
> Signed-off-by: Christoph Hellwig 
> ---
>  kernel/dma/swiotlb.c | 179 +--
>  1 file changed, 89 insertions(+), 90 deletions(-)
> 
> diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
> index b38b1553c4663a..381c24ef1ac1d0 100644
> --- a/kernel/dma/swiotlb.c
> +++ b/kernel/dma/swiotlb.c
> @@ -468,134 +468,133 @@ static void swiotlb_bounce(phys_addr_t orig_addr, 
> phys_addr_t tlb_addr,
>   }
>  }
>  
> 
> -phys_addr_t swiotlb_tbl_map_single(struct device *hwdev, phys_addr_t 
> orig_addr,
> - size_t mapping_size, size_t alloc_size,
> - enum dma_data_direction dir, unsigned long attrs)
> -{
> - dma_addr_t tbl_dma_addr = phys_to_dma_unencrypted(hwdev, io_tlb_start);
> - unsigned long flags;
> - phys_addr_t tlb_addr;
> - unsigned int nslots, stride, index, wrap;
> - int i;
> - unsigned long mask;
> - unsigned long offset_slots;
> - unsigned long max_slots;
> - unsigned long tmp_io_tlb_used;
> -
> - if (no_iotlb_memory)
> - panic("Can not allocate SWIOTLB buffer earlier and can't now 
> provide you with the DMA bounce buffer");
> -
> - if (mem_encrypt_active())
> - pr_warn_once("Memory encryption is active and system is using 
> DMA bounce buffers\n");
> +#define slot_addr(start, idx)((start) + ((idx) << IO_TLB_SHIFT))
>  
> 
> - if (mapping_size > alloc_size) {
> - dev_warn_once(hwdev, "Invalid sizes (mapping: %zd bytes, alloc: 
> %zd bytes)",
> -   mapping_size, alloc_size);
> - return (phys_addr_t)DMA_MAPPING_ERROR;
> - }
> -
> - mask = dma_get_seg_boundary(hwdev);
> +/*
> + * Carefully handle integer overflow which can occur when boundary_mask == 
> ~0UL.
> + */
> +static inline unsigned long get_max_slots(unsigned long boundary_mask)
> +{
> + if (boundary_mask == ~0UL)
> + return 1UL << (BITS_PER_LONG - IO_TLB_SHIFT);
> + return nr_slots(boundary_mask + 1);
> +}
>  
> 
> - tbl_dma_addr &= mask;
> +static unsigned int wrap_index(unsigned int index)
> +{
> + if (index >= io_tlb_nslabs)
> + return 0;
> + return index;
> +}
>  
> 
> - offset_slots = nr_slots(tbl_dma_addr);
> +/*
> + * Find a suitable number of IO TLB entries size that will fit this request 
> and
> + * allocate a buffer from that IO TLB pool.
> + */
> +static int find_slots(struct device *dev, size_t alloc_size)
> +{
> + unsigned long boundary_mask = dma_get_seg_boundary(dev);
> + dma_addr_t tbl_dma_addr =
> + phys_to_dma_unencrypted(dev, io_tlb_start) & boundary_mask;
> + unsigned int max_slots = get_max_slots(boundary_mask);

'max_slots' should be 'unsigned long' here. Breaks SWIOTLB on RPi4. Do you
want me to send a fix, or do you prefer editing the patch?
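
For the record, a quick userspace illustration of the truncation,
assuming 64-bit longs and IO_TLB_SHIFT == 11 as in swiotlb (just the
arithmetic, not kernel code):

	#include <stdio.h>

	#define BITS_PER_LONG	64
	#define IO_TLB_SHIFT	11

	int main(void)
	{
		/* get_max_slots(~0UL) returns 1UL << 53 ... */
		unsigned long slots = 1UL << (BITS_PER_LONG - IO_TLB_SHIFT);
		/* ... whose low 32 bits are all zero, so assigning it
		 * to 'unsigned int' leaves max_slots == 0. */
		unsigned int truncated = slots;

		printf("%lu -> %u\n", slots, truncated);
		return 0;
	}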

Regards,
Nicolas




Re: [git pull] IOMMU Updates for Linux v5.12

2021-02-22 Thread pr-tracker-bot
The pull request you sent on Mon, 22 Feb 2021 17:17:35 +0100:

> git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu.git 
> tags/iommu-updates-v5.12

has been merged into torvalds/linux.git:
https://git.kernel.org/torvalds/c/d652ea30ba32db12fe8365182fad5ba2e7c22822

Thank you!

-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/prtracker.html


Re: [PATCH] iommu/amd: Fix event counter availability check

2021-02-22 Thread Suravee Suthikulpanit

This fix has recently been accepted upstream.

https://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu.git/commit/?h=x86/amd

Could you please give this a try?

Thanks,
Suravee

On 2/21/21 8:49 PM, Paul Menzel wrote:

Dear Suravee,


On 17.09.20 at 19:55, Alexander Monakov wrote:

On Tue, 16 Jun 2020, Suravee Suthikulpanit wrote:


Instead of blindly moving the code around to a spot that would just work,
I am trying to understand what might be required here. In this case,
init_device_table_dma() should not be needed. I suspect it's the IOMMU
invalidate-all command that's also needed here.

I'm also checking with the HW and BIOS team. Meanwhile, could you please
give the following change a try:

Hello. Can you give any update please?


[…]


Sorry for the late reply. I have a reproducer and am working with the HW
team to understand the issue. I should be able to provide an update with
a solution by the end of this week.


Hello, hope you are doing well. Has this investigation found anything?


I am wondering the same. It’d be great to have this fixed in the upstream Linux 
kernel.


Kind regards,

Paul


Re: [PATCH 2/4] iommu/vt-d: Enable write protect propagation from guest

2021-02-22 Thread Jacob Pan
Hi Kevin,

On Sat, 20 Feb 2021 02:38:02 +, "Tian, Kevin" 
wrote:

> > From: Jacob Pan 
> > Sent: Saturday, February 20, 2021 1:09 AM
> > 
> > Hi Kevin,
> > 
> > On Fri, 19 Feb 2021 06:19:04 +, "Tian, Kevin" 
> > wrote:
> >   
> > > > From: Jacob Pan 
> > > > Sent: Friday, February 19, 2021 5:31 AM
> > > >
> > > > The write-protect bit, when set, inhibits supervisor writes to
> > > > read-only pages. In guest supervisor shared virtual addressing
> > > > (SVA), write protect should be honored upon a guest's bind
> > > > supervisor PASID request.
> > > >
> > > > This patch extends the VT-d portion of the IOMMU UAPI to include
> > > > the WP bit. The WPE bit of the supervisor PASID entry will be set
> > > > to match the CPU's CR0.WP bit.
> > > >
> > > > Signed-off-by: Sanjay Kumar 
> > > > Signed-off-by: Jacob Pan 
> > > > ---
> > > >  drivers/iommu/intel/pasid.c | 5 +
> > > >  include/uapi/linux/iommu.h  | 3 ++-
> > > >  2 files changed, 7 insertions(+), 1 deletion(-)
> > > >
> > > > diff --git a/drivers/iommu/intel/pasid.c
> > > > b/drivers/iommu/intel/pasid.c index 0b7e0e726ade..c7a2ec930af4
> > > > 100644 --- a/drivers/iommu/intel/pasid.c
> > > > +++ b/drivers/iommu/intel/pasid.c
> > > > @@ -763,6 +763,11 @@ intel_pasid_setup_bind_data(struct  
> > intel_iommu  
> > > > *iommu, struct pasid_entry *pte,
> > > > return -EINVAL;
> > > > }
> > > > pasid_set_sre(pte);
> > > > +   /* Enable write protect WP if guest requested */
> > > > +   if (pasid_data->flags & IOMMU_SVA_VTD_GPASID_WPE) {
> > > > +   if (pasid_enable_wpe(pte))
> > > > +   return -EINVAL;  
> > >
> > > We should call pasid_set_wpe directly, as this binding is about the
> > > guest page table, and we assume the guest has done whatever checks
> > > are required (e.g. gCR0.WP) before setting this bit. pasid_enable_wpe
> > > has an additional check on host CR0.WP and is thus logically
> > > incorrect here.
> > >  
> > If the host CPU does not support WP, can the guest vCPU still support
> > WP? If so, I agree.
> >   
> 
> If you change 'support' to 'enable', then the answer is yes.

I agree, thanks for explaining. Will change it to pasid_set_wpe.
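
(For clarity, the resulting hunk would then look roughly like this; a
sketch of the discussed change, not the final patch:)

	/* Enable write protect WP if guest requested */
	if (pasid_data->flags & IOMMU_SVA_VTD_GPASID_WPE)
		pasid_set_wpe(pte);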

Thanks,

Jacob


Re: [PATCH 1/4] iommu/vt-d: Enable write protect for supervisor SVM

2021-02-22 Thread Jacob Pan
Hi Lu,

On Sat, 20 Feb 2021 09:56:26 +0800, Lu Baolu 
wrote:

> Hi Jacob and Sanjay,
> 
> On 2/19/21 5:31 AM, Jacob Pan wrote:
> > The write-protect bit, when set, inhibits supervisor writes to
> > read-only pages. In supervisor shared virtual addressing (SVA), where
> > page tables are shared between the CPU and DMA, the IOMMU PASID entry's
> > WPE bit should match the CR0.WP bit in the CPU.
> > This patch sets the WPE bit for supervisor PASIDs if CR0.WP is set.
> 
> From reading the commit message, the intention of this patch is to match
> the PASID entry WPE bit with CPU CR0.WP if 1) SRE is set (supervisor
> PASID); 2) the page table is shared between the CPU and IOMMU. Do I
> understand it right?
> 
Yes, that is my intention.

> But what the real code is doing is failing PASID entry setup for
> first-level translation if CPU CR0.WP is not set. That's not consistent
> with what is described above.
> 
> What I am thinking is that, as long as SRE is set, we should always set
> WPE in intel_pasid_setup_first_level(). For the supervisor SVA case, we
> should check CPU CR0.WP in intel_svm_bind_mm() and abort the binding if
> CR0.WP is not set.
> 
> Thought?
> 
This code only affects supervisor SVA, since the PASID_FLAG_SUPERVISOR_MODE
flag is not set for FL IOVA.
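
(For reference, a rough sketch of what the suggestion above would look
like; an illustration only, not a tested patch:)

	/* In intel_pasid_setup_first_level(): mirror SRE with WPE
	 * unconditionally for supervisor PASIDs. */
	if (flags & PASID_FLAG_SUPERVISOR_MODE) {
		pasid_set_sre(pte);
		pasid_set_wpe(pte);
	}

	/* In intel_svm_bind_mm(): reject supervisor SVA binding early
	 * when CR0.WP is clear. */
	if ((flags & SVM_FLAG_SUPERVISOR_MODE) && !(read_cr0() & X86_CR0_WP))
		return -EINVAL;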

> Best regards,
> baolu
> 
> > 
> > Signed-off-by: Sanjay Kumar 
> > Signed-off-by: Jacob Pan 
> > ---
> >   drivers/iommu/intel/pasid.c | 26 ++
> >   1 file changed, 26 insertions(+)
> > 
> > diff --git a/drivers/iommu/intel/pasid.c b/drivers/iommu/intel/pasid.c
> > index 0cceaabc3ce6..0b7e0e726ade 100644
> > --- a/drivers/iommu/intel/pasid.c
> > +++ b/drivers/iommu/intel/pasid.c
> > @@ -410,6 +410,15 @@ static inline void pasid_set_sre(struct
> > pasid_entry *pe) pasid_set_bits(&pe->val[2], 1 << 0, 1);
> >   }
> >   
> > +/*
> > + * Setup the WPE(Write Protect Enable) field (Bit 132) of a
> > + * scalable mode PASID entry.
> > + */
> > +static inline void pasid_set_wpe(struct pasid_entry *pe)
> > +{
> > +   pasid_set_bits(&pe->val[2], 1 << 4, 1 << 4);
> > +}
> > +
> >   /*
> >* Setup the P(Present) field (Bit 0) of a scalable mode PASID
> >* entry.
> > @@ -553,6 +562,20 @@ static void pasid_flush_caches(struct intel_iommu
> > *iommu, }
> >   }
> >   
> > +static inline int pasid_enable_wpe(struct pasid_entry *pte)
> > +{
> > +   unsigned long cr0 = read_cr0();
> > +
> > +   /* CR0.WP is normally set but just to be sure */
> > +   if (unlikely(!(cr0 & X86_CR0_WP))) {
> > +   pr_err_ratelimited("No CPU write protect!\n");
> > +   return -EINVAL;
> > +   }
> > +   pasid_set_wpe(pte);
> > +
> > +   return 0;
> > +};
> > +
> >   /*
> >* Set up the scalable mode pasid table entry for first only
> >* translation type.
> > @@ -584,6 +607,9 @@ int intel_pasid_setup_first_level(struct
> > intel_iommu *iommu, return -EINVAL;
> > }
> > pasid_set_sre(pte);
> > +   if (pasid_enable_wpe(pte))
> > +   return -EINVAL;
> > +
> > }
> >   
> > if (flags & PASID_FLAG_FL5LP) {
> >   


Thanks,

Jacob


[git pull] IOMMU Updates for Linux v5.12

2021-02-22 Thread Joerg Roedel
Hi Linus,

The following changes since commit 92bf22614b21a2706f4993b278017e437f7785b3:

  Linux 5.11-rc7 (2021-02-07 13:57:38 -0800)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu.git 
tags/iommu-updates-v5.12

for you to fetch changes up to 45e606f2726926b04094e1c9bf809bca4884c57f:

  Merge branches 'arm/renesas', 'arm/smmu', 'x86/amd', 'x86/vt-d' and 'core' 
into next (2021-02-12 15:27:17 +0100)


IOMMU Updates for Linux v5.12

Including:

- ARM SMMU and Mediatek updates from Will Deacon:

- Support for MT8192 IOMMU from Mediatek

- Arm v7s io-pgtable extensions for MT8192

- Removal of TLBI_ON_MAP quirk

- New Qualcomm compatible strings

- Allow SVA without hardware broadcast TLB maintenance
  on SMMUv3

- Virtualization Host Extension support for SMMUv3 (SVA)

- Allow SMMUv3 PMU (perf) driver to be built
  independently from IOMMU

- Some tidy-up in IOVA and core code

- Conversion of the AMD IOMMU code to use the generic
  IO-page-table framework

- Intel VT-d updates from Lu Baolu:

- Audit capability consistency among different IOMMUs

- Add SATC reporting structure support

- Add iotlb_sync_map callback support

- SDHI Support for Renesas IOMMU driver

- Misc cleanups and other small improvements


Adrian Huang (1):
  iommu/amd: Remove unnecessary assignment

Bjorn Andersson (2):
  dt-bindings: arm-smmu-qcom: Add Qualcomm SC8180X compatible
  iommu/arm-smmu-qcom: Add Qualcomm SC8180X impl

Bjorn Helgaas (1):
  iommu/vt-d: Fix 'physical' typos

Colin Ian King (1):
  iommu/mediatek: Fix unsigned domid comparison with less than zero

Dan Carpenter (1):
  iommu/mediatek: Fix error code in probe()

Douglas Anderson (1):
  iommu: Properly pass gfp_t in _iommu_map() to avoid atomic sleeping

Isaac J. Manjarres (1):
  iommu/arm-smmu-qcom: Fix mask extraction for bootloader programmed SMRs

Jean-Philippe Brucker (3):
  iommu/arm-smmu-v3: Split arm_smmu_tlb_inv_range()
  iommu/arm-smmu-v3: Make BTM optional for SVA
  iommu/arm-smmu-v3: Add support for VHE

Joerg Roedel (2):
  Merge tag 'arm-smmu-updates' of git://git.kernel.org/.../will/linux into 
arm/smmu
  Merge branches 'arm/renesas', 'arm/smmu', 'x86/amd', 'x86/vt-d' and 
'core' into next

John Garry (7):
  iova: Make has_iova_flush_queue() private
  iova: Delete copy_reserved_iova()
  iova: Stop exporting some more functions
  iommu: Stop exporting iommu_map_sg_atomic()
  iommu: Delete iommu_domain_window_disable()
  iommu: Delete iommu_dev_has_feature()
  driver/perf: Remove ARM_SMMU_V3_PMU dependency on ARM_SMMU_V3

Kyung Min Park (2):
  iommu/vt-d: Audit IOMMU Capabilities and add helper functions
  iommu/vt-d: Move capability check code to cap_audit files

Lianbo Jiang (2):
  dma-iommu: use static-key to minimize the impact in the fast-path
  iommu: use the __iommu_attach_device() directly for deferred attach

Lu Baolu (7):
  iommu/vt-d: Consolidate duplicate cache invaliation code
  iommu/vt-d: Add qi_submit trace event
  iommu/vt-d: Preset Access/Dirty bits for IOVA over FL
  iommu/vt-d: Clear PRQ overflow only when PRQ is empty
  iommu/vt-d: Use INVALID response code instead of FAILURE
  iommu/vt-d: Fix compile error [-Werror=implicit-function-declaration]
  iommu/vt-d: Add iotlb_sync_map callback

Lukas Bulwahn (1):
  MAINTAINERS: repair file pattern in MEDIATEK IOMMU DRIVER

Robin Murphy (3):
  iommu/arm-smmu-v3: Remove the page 1 fixup
  iommu/msm: Hook up iotlb_sync_map
  iommu/io-pgtable: Remove TLBI_ON_MAP quirk

Suravee Suthikulpanit (14):
  iommu/amd: Re-define amd_iommu_domain_encode_pgtable as inline
  iommu/amd: Prepare for generic IO page table framework
  iommu/amd: Move pt_root to struct amd_io_pgtable
  iommu/amd: Convert to using amd_io_pgtable
  iommu/amd: Declare functions as extern
  iommu/amd: Move IO page table related functions
  iommu/amd: Restructure code for freeing page table
  iommu/amd: Remove amd_iommu_domain_get_pgtable
  iommu/amd: Rename variables to be consistent with struct io_pgtable_ops
  iommu/amd: Refactor fetch_pte to use struct amd_io_pgtable
  iommu/amd: Introduce iommu_v1_iova_to_phys
  iommu/amd: Introduce iommu_v1_map_page and iommu_v1_unmap_page
  iommu/amd: Adopt IO page table framework for AMD IOMMU v1 page table
  iommu/amd: Fix performance counter initialization

Tom Rix (1):
  iommu/amd: remove h from printk format specifier

Vinod Koul (2):
  dt-bindings: arm-smmu: Add sm8

Re: [PATCH v11 01/13] vfio: VFIO_IOMMU_SET_PASID_TABLE

2021-02-22 Thread Auger Eric
Hi Keqian,

On 2/22/21 1:20 PM, Keqian Zhu wrote:
> Hi Eric,
> 
> On 2021/2/22 18:53, Auger Eric wrote:
>> Hi Keqian,
>>
>> On 2/2/21 1:34 PM, Keqian Zhu wrote:
>>> Hi Eric,
>>>
>>> On 2020/11/16 19:00, Eric Auger wrote:
 From: "Liu, Yi L" 

 This patch adds a VFIO_IOMMU_SET_PASID_TABLE ioctl
 which aims to pass the virtual IOMMU guest configuration
 to the host. The latter takes the form of the so-called
 PASID table.

 Signed-off-by: Jacob Pan 
 Signed-off-by: Liu, Yi L 
 Signed-off-by: Eric Auger 

 ---
 v11 -> v12:
 - use iommu_uapi_set_pasid_table
 - check SET and UNSET are not set simultaneously (Zenghui)

 v8 -> v9:
 - Merge VFIO_IOMMU_ATTACH/DETACH_PASID_TABLE into a single
   VFIO_IOMMU_SET_PASID_TABLE ioctl.

 v6 -> v7:
 - add a comment related to VFIO_IOMMU_DETACH_PASID_TABLE

 v3 -> v4:
 - restore ATTACH/DETACH
 - add unwind on failure

 v2 -> v3:
 - s/BIND_PASID_TABLE/SET_PASID_TABLE

 v1 -> v2:
 - s/BIND_GUEST_STAGE/BIND_PASID_TABLE
 - remove the struct device arg
 ---
  drivers/vfio/vfio_iommu_type1.c | 65 +
  include/uapi/linux/vfio.h   | 19 ++
  2 files changed, 84 insertions(+)

 diff --git a/drivers/vfio/vfio_iommu_type1.c 
 b/drivers/vfio/vfio_iommu_type1.c
 index 67e827638995..87ddd9e882dc 100644
 --- a/drivers/vfio/vfio_iommu_type1.c
 +++ b/drivers/vfio/vfio_iommu_type1.c
 @@ -2587,6 +2587,41 @@ static int vfio_iommu_iova_build_caps(struct 
 vfio_iommu *iommu,
return ret;
  }
  
 +static void
 +vfio_detach_pasid_table(struct vfio_iommu *iommu)
 +{
 +  struct vfio_domain *d;
 +
 +  mutex_lock(&iommu->lock);
 +  list_for_each_entry(d, &iommu->domain_list, next)
 +  iommu_detach_pasid_table(d->domain);
 +
 +  mutex_unlock(&iommu->lock);
 +}
 +
 +static int
 +vfio_attach_pasid_table(struct vfio_iommu *iommu, unsigned long arg)
 +{
 +  struct vfio_domain *d;
 +  int ret = 0;
 +
 +  mutex_lock(&iommu->lock);
 +
 +  list_for_each_entry(d, &iommu->domain_list, next) {
 +  ret = iommu_uapi_attach_pasid_table(d->domain, (void __user 
 *)arg);
>>> This design is not very clear to me. This assumes all iommu_domains share 
>>> the same pasid table.
>>>
>>> As I understand, it's reasonable when there is only one group in the 
>>> domain, and only one domain in the vfio_iommu.
>>> If there is more than one group in the vfio_iommu, the guest may put
>>> them into different guest iommu_domains, and then they have different
>>> pasid tables.
>>>
>>> Is this the use scenario?
>>
>> The vfio_iommu is attached to a container. All the groups within a
>> container share the same set of page tables (see
>> Documentation/driver-api/vfio.rst). So to me, if you want to use
>> different pasid tables, the groups need to be attached to different
>> containers. Does that make sense to you?
> OK, so this is what I understand about the design. A small question:
> when we perform attach_pasid_table on a container, maybe we ought to do
> a sanity check to make sure that only one group is in the container,
> instead of iterating over all domains?
> 
> To be frank, my main concern is that if we put each group into a
> different container under nested mode, we give up the possibility that
> they can share stage-2 page tables, which saves host memory and reduces
> the time needed to prepare the environment for a VM.

Referring to the QEMU integration, when you use a virtual IOMMU, there
is generally one VFIO container per viommu-protected device
(AddressSpace), independently of whether nested stage is being used. I
think the exception is if you put 2 assigned devices behind a virtual
PCIe to PCI bridge (pcie-pci-bridge), in that case they have the same
RID, they share the same QEMU AddressSpace and they are put in the same
container, if the kernel does not reject it (underlying pIOMMUs allow
it). See QEMU vfio_connect_container() in hw/vfio/common.c.

In that config, if the assigned devices belong to different groups, you
may end up with 2 groups set to the same container. But this case is not
supported by the guest kernel anyway (independently of the nested-stage
integration). You hit a BUG_ON, as reported a long time ago in

https://www.mail-archive.com/qemu-devel@nongnu.org/msg608047.html


> 
> To me, I'd like to understand the "container shares page table" to be:
> 1) share stage2 page table under nested mode.
Under nested mode they share S2, and with this design devices also share
the same PASID table, because on the guest they are in the same group.
> 2) share stage1 page table under non-nested mode.
In non-nested mode there is a single stage, S1 by default.
> 
> As when we perform "map" on a container:
> 1) under nested mode, we setup stage2 mapping.
> 2) under non-nested mode, we setup stage1 mapping.

[RFC PATCH 5/5] KVM: arm64: Make sure pinned vmid is released on VM exit

2021-02-22 Thread Shameer Kolothum
Since the pinned VMID space is not recycled, we need to make sure that
we release the vmid back into the pool when we are done with it.

Signed-off-by: Shameer Kolothum 
---
 arch/arm64/kvm/arm.c | 12 ++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index 8955968be49f..d9900ffb88f4 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -181,8 +181,16 @@ void kvm_arch_destroy_vm(struct kvm *kvm)
kvm_vgic_destroy(kvm);
 
for (i = 0; i < KVM_MAX_VCPUS; ++i) {
-   if (kvm->vcpus[i]) {
-   kvm_vcpu_destroy(kvm->vcpus[i]);
+   struct kvm_vcpu *vcpu = kvm->vcpus[i];
+
+   if (vcpu) {
+   struct kvm_vmid *vmid = &vcpu->arch.hw_mmu->vmid;
+
+   if (refcount_read(&vmid->pinned)) {
+   ida_free(&kvm_pinned_vmids, vmid->vmid);
+   refcount_set(&vmid->pinned, 0);
+   }
+   kvm_vcpu_destroy(vcpu);
kvm->vcpus[i] = NULL;
}
}
-- 
2.17.1



[RFC PATCH 3/5] KVM: ARM64: Add support for pinned VMIDs

2021-02-22 Thread Shameer Kolothum
On an ARM64 system with an SMMUv3 implementation that fully supports the
Broadcast TLB Maintenance (BTM) feature, the CPU TLB invalidate
instructions are received by the SMMU. This is very useful when the
SMMU shares the page tables with the CPU (e.g. the guest SVA use case).
For this to work, the SMMU must use the same VMID that is allocated
by KVM to configure the stage 2 translations.

At present KVM VMID allocations are recycled on rollover and may
change as a result. This will create issues if we have to share
the KVM VMID with the SMMU. Hence, we split the KVM VMID space into
two: the first half follows the normal recycle-on-rollover policy,
while the second half of the VMID space is used to allocate pinned
VMIDs. This feature is enabled based on the command line option
"kvm-arm.pinned_vmid_enable".

Signed-off-by: Shameer Kolothum 
---
 arch/arm64/include/asm/kvm_host.h |   2 +
 arch/arm64/kvm/Kconfig|   1 +
 arch/arm64/kvm/arm.c  | 104 +-
 3 files changed, 106 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/include/asm/kvm_host.h 
b/arch/arm64/include/asm/kvm_host.h
index 0cd9f0f75c13..db6441c6a580 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -25,6 +25,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
 #define __KVM_HAVE_ARCH_INTC_INITIALIZED
@@ -65,6 +66,7 @@ struct kvm_vmid {
/* The VMID generation used for the virt. memory system */
u64vmid_gen;
u32vmid;
+   refcount_t   pinned;
 };
 
 struct kvm_s2_mmu {
diff --git a/arch/arm64/kvm/Kconfig b/arch/arm64/kvm/Kconfig
index 043756db8f6e..c5c52953e842 100644
--- a/arch/arm64/kvm/Kconfig
+++ b/arch/arm64/kvm/Kconfig
@@ -40,6 +40,7 @@ menuconfig KVM
select HAVE_KVM_VCPU_RUN_PID_CHANGE
select TASKSTATS
select TASK_DELAY_ACCT
+   select HAVE_KVM_PINNED_VMID
help
  Support hosting virtualized guest machines.
 
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index c0ffb019ca8b..8955968be49f 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -56,6 +56,19 @@ static atomic64_t kvm_vmid_gen = ATOMIC64_INIT(1);
 static u32 kvm_next_vmid;
 static DEFINE_SPINLOCK(kvm_vmid_lock);
 
+static bool kvm_pinned_vmid_enable;
+
+static int __init early_pinned_vmid_enable(char *buf)
+{
+   return strtobool(buf, &kvm_pinned_vmid_enable);
+}
+
+early_param("kvm-arm.pinned_vmid_enable", early_pinned_vmid_enable);
+
+static DEFINE_IDA(kvm_pinned_vmids);
+static u32 kvm_pinned_vmid_start;
+static u32 kvm_pinned_vmid_end;
+
 static bool vgic_present;
 
 static DEFINE_PER_CPU(unsigned char, kvm_arm_hardware_enabled);
@@ -475,6 +488,10 @@ void force_vm_exit(const cpumask_t *mask)
 static bool need_new_vmid_gen(struct kvm_vmid *vmid)
 {
u64 current_vmid_gen = atomic64_read(&kvm_vmid_gen);
+
+   if (refcount_read(&vmid->pinned))
+   return false;
+
smp_rmb(); /* Orders read of kvm_vmid_gen and kvm->arch.vmid */
return unlikely(READ_ONCE(vmid->vmid_gen) != current_vmid_gen);
 }
@@ -485,6 +502,8 @@ static bool need_new_vmid_gen(struct kvm_vmid *vmid)
  */
 static void update_vmid(struct kvm_vmid *vmid)
 {
+   unsigned int vmid_bits;
+
if (!need_new_vmid_gen(vmid))
return;
 
@@ -521,7 +540,12 @@ static void update_vmid(struct kvm_vmid *vmid)
 
vmid->vmid = kvm_next_vmid;
kvm_next_vmid++;
-   kvm_next_vmid &= (1 << kvm_get_vmid_bits()) - 1;
+   if (kvm_pinned_vmid_enable)
+   vmid_bits = kvm_get_vmid_bits() - 1;
+   else
+   vmid_bits = kvm_get_vmid_bits();
+
+   kvm_next_vmid &= (1 << vmid_bits) - 1;
 
smp_wmb();
WRITE_ONCE(vmid->vmid_gen, atomic64_read(&kvm_vmid_gen));
@@ -569,6 +593,71 @@ static int kvm_vcpu_first_run_init(struct kvm_vcpu *vcpu)
return ret;
 }
 
+int kvm_arch_pinned_vmid_get(struct kvm *kvm)
+{
+   struct kvm_vcpu *vcpu;
+   struct kvm_vmid *kvm_vmid;
+   int ret;
+
+   if (!kvm_pinned_vmid_enable || !atomic_read(&kvm->online_vcpus))
+   return -EINVAL;
+
+   vcpu = kvm_get_vcpu(kvm, 0);
+   if (!vcpu)
+   return -EINVAL;
+
+   kvm_vmid = &vcpu->arch.hw_mmu->vmid;
+
+   spin_lock(&kvm_vmid_lock);
+
+   if (refcount_inc_not_zero(&kvm_vmid->pinned)) {
+   spin_unlock(&kvm_vmid_lock);
+   return kvm_vmid->vmid;
+   }
+
+   ret = ida_alloc_range(&kvm_pinned_vmids, kvm_pinned_vmid_start,
+ kvm_pinned_vmid_end, GFP_KERNEL);
+   if (ret < 0) {
+   spin_unlock(&kvm_vmid_lock);
+   return ret;
+   }
+
+   force_vm_exit(cpu_all_mask);
+   kvm_call_hyp(__kvm_flush_vm_context);
+
+   kvm_vmid->vmid = (u32)ret;
+   refcount_set(&kvm_vmid->pinned, 1);
+   spin_unlock(&kvm_vmid_lock);
+
+   return ret;
+}
+
+int kvm_arch_pinned_vmid_put(struct kvm *kvm)
+{
+   struc

[RFC PATCH 4/5] iommu/arm-smmu-v3: Use pinned VMID for NESTED stage with BTM

2021-02-22 Thread Shameer Kolothum
If the SMMU supports BTM and the device belongs to a NESTED domain with
a shared PASID table, we need to use the VMID allocated by KVM for the
stage 2 configuration. Hence, request a pinned VMID from KVM.

Signed-off-by: Shameer Kolothum 
---
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 49 -
 1 file changed, 47 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c 
b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 26bf7da1bcd0..04f83f7c8319 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -28,6 +28,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 
@@ -2195,6 +2196,33 @@ static void arm_smmu_bitmap_free(unsigned long *map, int 
idx)
clear_bit(idx, map);
 }
 
+static int arm_smmu_pinned_vmid_get(struct arm_smmu_domain *smmu_domain)
+{
+   struct arm_smmu_master *master;
+
+   master = list_first_entry_or_null(&smmu_domain->devices,
+ struct arm_smmu_master, domain_head);
+   if (!master)
+   return -EINVAL;
+
+   return kvm_pinned_vmid_get(master->dev);
+}
+
+static int arm_smmu_pinned_vmid_put(struct arm_smmu_domain *smmu_domain)
+{
+   struct arm_smmu_master *master;
+
+   master = list_first_entry_or_null(&smmu_domain->devices,
+ struct arm_smmu_master, domain_head);
+   if (!master)
+   return -EINVAL;
+
+   if (smmu_domain->s2_cfg.vmid)
+   return kvm_pinned_vmid_put(master->dev);
+
+   return 0;
+}
+
 static void arm_smmu_domain_free(struct iommu_domain *domain)
 {
struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
@@ -2215,8 +2243,11 @@ static void arm_smmu_domain_free(struct iommu_domain 
*domain)
mutex_unlock(&arm_smmu_asid_lock);
}
if (s2_cfg->set) {
-   if (s2_cfg->vmid)
-   arm_smmu_bitmap_free(smmu->vmid_map, s2_cfg->vmid);
+   if (s2_cfg->vmid) {
+   if (!(smmu->features & ARM_SMMU_FEAT_BTM) &&
+   smmu_domain->stage != ARM_SMMU_DOMAIN_NESTED)
+   arm_smmu_bitmap_free(smmu->vmid_map, 
s2_cfg->vmid);
+   }
}
 
kfree(smmu_domain);
@@ -3199,6 +3230,17 @@ static int arm_smmu_attach_pasid_table(struct 
iommu_domain *domain,
!(smmu->features & ARM_SMMU_FEAT_2_LVL_CDTAB))
goto out;
 
+   if (smmu->features & ARM_SMMU_FEAT_BTM) {
+   ret = arm_smmu_pinned_vmid_get(smmu_domain);
+   if (ret < 0)
+   goto out;
+
+   if (smmu_domain->s2_cfg.vmid)
+   arm_smmu_bitmap_free(smmu->vmid_map, 
smmu_domain->s2_cfg.vmid);
+
+   smmu_domain->s2_cfg.vmid = (u16)ret;
+   }
+
smmu_domain->s1_cfg.cdcfg.cdtab_dma = cfg->base_ptr;
smmu_domain->s1_cfg.s1cdmax = cfg->pasid_bits;
smmu_domain->s1_cfg.s1fmt = cfg->vendor_data.smmuv3.s1fmt;
@@ -3221,6 +3263,7 @@ static int arm_smmu_attach_pasid_table(struct 
iommu_domain *domain,
 static void arm_smmu_detach_pasid_table(struct iommu_domain *domain)
 {
struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
+   struct arm_smmu_device *smmu = smmu_domain->smmu;
struct arm_smmu_master *master;
unsigned long flags;
 
@@ -3237,6 +3280,8 @@ static void arm_smmu_detach_pasid_table(struct 
iommu_domain *domain)
arm_smmu_install_ste_for_dev(master);
spin_unlock_irqrestore(&smmu_domain->devices_lock, flags);
 
+   if (smmu->features & ARM_SMMU_FEAT_BTM)
+   arm_smmu_pinned_vmid_put(smmu_domain);
 unlock:
mutex_unlock(&smmu_domain->init_mutex);
 }
-- 
2.17.1



[RFC PATCH 2/5] KVM: Add generic infrastructure to support pinned VMIDs

2021-02-22 Thread Shameer Kolothum
Provide generic helper functions to get/put pinned VMIDs if the arch
supports it.

Signed-off-by: Shameer Kolothum 
---
 include/linux/kvm_host.h | 17 +
 virt/kvm/Kconfig |  2 ++
 virt/kvm/kvm_main.c  | 25 +
 3 files changed, 44 insertions(+)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 7f2e2a09ebbd..25856db74a08 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -836,6 +836,8 @@ bool kvm_vcpu_wake_up(struct kvm_vcpu *vcpu);
 void kvm_vcpu_kick(struct kvm_vcpu *vcpu);
 int kvm_vcpu_yield_to(struct kvm_vcpu *target);
 void kvm_vcpu_on_spin(struct kvm_vcpu *vcpu, bool usermode_vcpu_not_eligible);
+int kvm_pinned_vmid_get(struct device *dev);
+int kvm_pinned_vmid_put(struct device *dev);
 
 void kvm_flush_remote_tlbs(struct kvm *kvm);
 void kvm_reload_remote_mmus(struct kvm *kvm);
@@ -1478,4 +1480,19 @@ static inline void kvm_handle_signal_exit(struct 
kvm_vcpu *vcpu)
 }
 #endif /* CONFIG_KVM_XFER_TO_GUEST_WORK */
 
+#ifdef CONFIG_HAVE_KVM_PINNED_VMID
+int kvm_arch_pinned_vmid_get(struct kvm *kvm);
+int kvm_arch_pinned_vmid_put(struct kvm *kvm);
+#else
+static inline int kvm_arch_pinned_vmid_get(struct kvm *kvm)
+{
+   return -EINVAL;
+}
+
+static inline int kvm_arch_pinned_vmid_put(struct kvm *kvm)
+{
+   return -EINVAL;
+}
+#endif
+
 #endif
diff --git a/virt/kvm/Kconfig b/virt/kvm/Kconfig
index 1c37ccd5d402..bb55616c5616 100644
--- a/virt/kvm/Kconfig
+++ b/virt/kvm/Kconfig
@@ -63,3 +63,5 @@ config HAVE_KVM_NO_POLL
 
 config KVM_XFER_TO_GUEST_WORK
bool
+config HAVE_KVM_PINNED_VMID
+   bool
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 2541a17ff1c4..632d391f0e34 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -51,6 +51,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -2849,6 +2850,30 @@ bool kvm_vcpu_wake_up(struct kvm_vcpu *vcpu)
 }
 EXPORT_SYMBOL_GPL(kvm_vcpu_wake_up);
 
+int kvm_pinned_vmid_get(struct device *dev)
+{
+   struct kvm *kvm;
+
+   kvm = vfio_kvm_get_from_dev(dev);
+   if (!kvm)
+   return -EINVAL;
+
+   return kvm_arch_pinned_vmid_get(kvm);
+}
+EXPORT_SYMBOL_GPL(kvm_pinned_vmid_get);
+
+int kvm_pinned_vmid_put(struct device *dev)
+{
+   struct kvm *kvm;
+
+   kvm = vfio_kvm_get_from_dev(dev);
+   if (!kvm)
+   return -EINVAL;
+
+   return kvm_arch_pinned_vmid_put(kvm);
+}
+EXPORT_SYMBOL_GPL(kvm_pinned_vmid_put);
+
 #ifndef CONFIG_S390
 /*
  * Kick a sleeping VCPU, or a guest VCPU in guest mode, into host kernel mode.
-- 
2.17.1



[RFC PATCH 1/5] vfio: Add a helper to retrieve kvm instance from a dev

2021-02-22 Thread Shameer Kolothum
A device that belongs to a vfio_group has a kvm instance associated
with it. Add a helper to retrieve it.

Signed-off-by: Shameer Kolothum 
---
 drivers/vfio/vfio.c  | 12 
 include/linux/vfio.h |  1 +
 2 files changed, 13 insertions(+)

diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
index 2151bc7f87ab..d1968e4bf9f4 100644
--- a/drivers/vfio/vfio.c
+++ b/drivers/vfio/vfio.c
@@ -876,6 +876,18 @@ struct vfio_device *vfio_device_get_from_dev(struct device 
*dev)
 }
 EXPORT_SYMBOL_GPL(vfio_device_get_from_dev);
 
+struct kvm *vfio_kvm_get_from_dev(struct device *dev)
+{
+   struct vfio_group *group;
+
+   group = vfio_group_get_from_dev(dev);
+   if (!group)
+   return NULL;
+
+   return group->kvm;
+}
+EXPORT_SYMBOL_GPL(vfio_kvm_get_from_dev);
+
 static struct vfio_device *vfio_device_get_from_name(struct vfio_group *group,
 char *buf)
 {
diff --git a/include/linux/vfio.h b/include/linux/vfio.h
index 38d3c6a8dc7e..8716298fee27 100644
--- a/include/linux/vfio.h
+++ b/include/linux/vfio.h
@@ -56,6 +56,7 @@ extern void *vfio_del_group_dev(struct device *dev);
 extern struct vfio_device *vfio_device_get_from_dev(struct device *dev);
 extern void vfio_device_put(struct vfio_device *device);
 extern void *vfio_device_data(struct vfio_device *device);
+extern struct kvm *vfio_kvm_get_from_dev(struct device *dev);
 
 /**
  * struct vfio_iommu_driver_ops - VFIO IOMMU driver callbacks
-- 
2.17.1



[RFC PATCH 0/5] KVM/ARM64 Add support for pinned VMIDs

2021-02-22 Thread Shameer Kolothum
On an ARM64 system with an SMMUv3 implementation that fully supports the
Broadcast TLB Maintenance (BTM) feature as part of the Distributed
Virtual Memory (DVM) protocol, the CPU TLB invalidate instructions are
received by the SMMUv3. This is very useful when the SMMUv3 shares the
page tables with the CPU (e.g. the guest SVA use case). For this to work,
the SMMU must use the same VMID that is allocated by KVM to configure
the stage 2 translations. At present KVM VMID allocations are recycled
on rollover and may change as a result. This will create issues if we
have to share the KVM VMID with the SMMU.
 
Please see the discussion here,
https://lore.kernel.org/linux-iommu/20200522101755.GA3453945@myrica/

This series proposes a way to share the VMID between KVM and IOMMU
driver by,

1. Splitting the KVM VMID space into two equal halves based on the
   command line option "kvm-arm.pinned_vmid_enable".
2. First half of the VMID space follows the normal recycle on rollover
   policy.
3. Second half of the VMID space doesn't roll over and is used to
   allocate pinned VMIDs.
4. Provides helper function to retrieve the KVM instance associated
   with a device(if it is part of a vfio group).
5. Introduces generic interfaces to get/put pinned KVM VMIDs.

Open Items:
1. I couldn't figure out a way to determine whether a platform actually
   fully supports DVM/BTM or not. I am not sure we can make that call
   based on the SMMUv3 BTM feature bit alone. Perhaps we can get it from
   firmware via IORT?
2. The current splitting of the VMID space is only one way to do this and
   probably not the best. Maybe we can follow the pinned ASID method used
   in the SVA code. Suggestions welcome here.
3. The detach_pasid_table() interface is not very clear to me, as the
   current Qemu prototype is not using it. This requires fixing on my
   side.
 
This is based on Jean-Philippe's SVA series[1] and Eric's SMMUv3 dual-stage
support series[2].

The branch with the whole vSVA + BTM solution is here,
https://github.com/hisilicon/kernel-dev/tree/5.10-rc4-2stage-v13-vsva-btm-rfc

This is lightly tested on a HiSilicon D06 platform with uacce/zip dev test tool,
./zip_sva_per -k tlb

Thanks,
Shameer

1. https://github.com/Linaro/linux-kernel-uadk/commits/uacce-devel-5.10
2. 
https://lore.kernel.org/linux-iommu/20201118112151.25412-1-eric.au...@redhat.com/T/

Shameer Kolothum (5):
  vfio: Add a helper to retrieve kvm instance from a dev
  KVM: Add generic infrastructure to support pinned VMIDs
  KVM: ARM64: Add support for pinned VMIDs
  iommu/arm-smmu-v3: Use pinned VMID for NESTED stage with BTM
  KVM: arm64: Make sure pinned vmid is released on VM exit

 arch/arm64/include/asm/kvm_host.h   |   2 +
 arch/arm64/kvm/Kconfig  |   1 +
 arch/arm64/kvm/arm.c| 116 +++-
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c |  49 -
 drivers/vfio/vfio.c |  12 ++
 include/linux/kvm_host.h|  17 +++
 include/linux/vfio.h|   1 +
 virt/kvm/Kconfig|   2 +
 virt/kvm/kvm_main.c |  25 +
 9 files changed, 220 insertions(+), 5 deletions(-)

-- 
2.17.1


Re: [PATCH] iommu/tegra-smmu: Fix mc errors on tegra124-nyan

2021-02-22 Thread Guillaume Tucker
On 18/02/2021 22:07, Nicolin Chen wrote:
> Commit 25938c73cd79 ("iommu/tegra-smmu: Rework tegra_smmu_probe_device()")
> removed a hack in tegra_smmu_probe() by relying on the IOMMU core to
> of_xlate the SMMU's SID per device, so as to get rid of tegra_smmu_find()
> and tegra_smmu_configure(), which are typically done in the IOMMU core as
> well.
> 
> This approach works, upon testing, for both existing devices that have
> DT nodes and other devices (like PCI devices) that don't exist in DT, on
> Tegra210 and Tegra3. However, Page Fault errors are reported on
> tegra124-Nyan:
> 
>   tegra-mc 70019000.memory-controller: display0a: read @0xfe056b40:
>EMEM address decode error (SMMU translation error [--S])
>   tegra-mc 70019000.memory-controller: display0a: read @0xfe056b40:
>Page fault (SMMU translation error [--S])
> 
> After debugging, I found that the mentioned commit changed the sequence
> of some of tegra-smmu's function callbacks, resulting in the SMMU being
> enabled for the display client before the display driver gets
> initialized. I couldn't reproduce the exact same issue on Tegra210, as
> Tegra124 (arm-32) differs at the arch-level code.
> 
> Actually this Page Fault is a known issue: on most Tegra platforms, the
> display gets enabled by the bootloader for the splash screen feature, so
> it keeps filling the framebuffer memory. A proper fix is to linearly map
> the framebuffer memory 1:1 into the IOVA space, so the SMMU has the same
> address as the physical address in its page table. Yet Thierry has been
> working on that solution for a year, and it hasn't been merged.
> 
> Therefore, let's partially revert the mentioned commit to fix the errors.
> 
> The reason we do a partial revert here is that we can still set priv in
> the ->of_xlate() callback for PCI devices. Meanwhile, devices existing
> in DT, like display, will go through tegra_smmu_configure() at the
> bus_set_iommu() stage when the SMMU gets probed, as they did before we
> merged the mentioned commit.
> 
> Once we have the linear map solution for framebuffer memory, this change
> can be cleaned away.
> 
> [Big thanks to Guillaume, who reported the issue and helped with
> debugging/verification]
> 
> Fixes: 25938c73cd79 ("iommu/tegra-smmu: Rework tegra_smmu_probe_device()")
> Reported-by: Guillaume Tucker 

You're welcome.  I would actually prefer to see it as reported by
kernelci.org since I wouldn't have found it without the automated
testing and bisection.  If you're OK to change this trailer:

  Reported-by: "kernelci.org bot" 

> Signed-off-by: Nicolin Chen 
> ---
> 
> Guillaume, would you please give a "Tested-by" to this change? Thanks!

Sure. https://lava.collabora.co.uk/scheduler/job/3249387

  Tested-by: Guillaume Tucker 

Thanks,
Guillaume


Re: [PATCH v11 01/13] vfio: VFIO_IOMMU_SET_PASID_TABLE

2021-02-22 Thread Keqian Zhu
Hi Eric,

On 2021/2/22 18:53, Auger Eric wrote:
> Hi Keqian,
> 
> On 2/2/21 1:34 PM, Keqian Zhu wrote:
>> Hi Eric,
>>
>> On 2020/11/16 19:00, Eric Auger wrote:
>>> From: "Liu, Yi L" 
>>>
>>> This patch adds a VFIO_IOMMU_SET_PASID_TABLE ioctl
>>> which aims to pass the virtual IOMMU guest configuration
>>> to the host. The latter takes the form of the so-called
>>> PASID table.
>>>
>>> Signed-off-by: Jacob Pan 
>>> Signed-off-by: Liu, Yi L 
>>> Signed-off-by: Eric Auger 
>>>
>>> ---
>>> v11 -> v12:
>>> - use iommu_uapi_set_pasid_table
>>> - check SET and UNSET are not set simultaneously (Zenghui)
>>>
>>> v8 -> v9:
>>> - Merge VFIO_IOMMU_ATTACH/DETACH_PASID_TABLE into a single
>>>   VFIO_IOMMU_SET_PASID_TABLE ioctl.
>>>
>>> v6 -> v7:
>>> - add a comment related to VFIO_IOMMU_DETACH_PASID_TABLE
>>>
>>> v3 -> v4:
>>> - restore ATTACH/DETACH
>>> - add unwind on failure
>>>
>>> v2 -> v3:
>>> - s/BIND_PASID_TABLE/SET_PASID_TABLE
>>>
>>> v1 -> v2:
>>> - s/BIND_GUEST_STAGE/BIND_PASID_TABLE
>>> - remove the struct device arg
>>> ---
>>>  drivers/vfio/vfio_iommu_type1.c | 65 +
>>>  include/uapi/linux/vfio.h   | 19 ++
>>>  2 files changed, 84 insertions(+)
>>>
>>> diff --git a/drivers/vfio/vfio_iommu_type1.c 
>>> b/drivers/vfio/vfio_iommu_type1.c
>>> index 67e827638995..87ddd9e882dc 100644
>>> --- a/drivers/vfio/vfio_iommu_type1.c
>>> +++ b/drivers/vfio/vfio_iommu_type1.c
>>> @@ -2587,6 +2587,41 @@ static int vfio_iommu_iova_build_caps(struct 
>>> vfio_iommu *iommu,
>>> return ret;
>>>  }
>>>  
>>> +static void
>>> +vfio_detach_pasid_table(struct vfio_iommu *iommu)
>>> +{
>>> +   struct vfio_domain *d;
>>> +
>>> +   mutex_lock(&iommu->lock);
>>> +   list_for_each_entry(d, &iommu->domain_list, next)
>>> +   iommu_detach_pasid_table(d->domain);
>>> +
>>> +   mutex_unlock(&iommu->lock);
>>> +}
>>> +
>>> +static int
>>> +vfio_attach_pasid_table(struct vfio_iommu *iommu, unsigned long arg)
>>> +{
>>> +   struct vfio_domain *d;
>>> +   int ret = 0;
>>> +
>>> +   mutex_lock(&iommu->lock);
>>> +
>>> +   list_for_each_entry(d, &iommu->domain_list, next) {
>>> +   ret = iommu_uapi_attach_pasid_table(d->domain, (void __user 
>>> *)arg);
>> This design is not very clear to me. This assumes all iommu_domains share 
>> the same pasid table.
>>
>> As I understand, it's reasonable when there is only one group in the domain, 
>> and only one domain in the vfio_iommu.
>> If there is more than one group in the vfio_iommu, the guest may put
>> them into different guest iommu_domains, and then they have different
>> pasid tables.
>>
>> Is this the use scenario?
> 
> The vfio_iommu is attached to a container. All the groups within a
> container share the same set of page tables (see
> Documentation/driver-api/vfio.rst). So to me, if you want to use
> different pasid tables, the groups need to be attached to different
> containers. Does that make sense to you?
OK, so this is what I understand about the design. A small question:
when we perform attach_pasid_table on a container, maybe we ought to do a
sanity check to make sure that only one group is in the container,
instead of iterating over all domains?
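
(A rough sketch of such a check, reusing the names from the patch and
checking for a single domain as an approximation of the single-group
condition; illustrative only:)

	mutex_lock(&iommu->lock);
	if (!list_is_singular(&iommu->domain_list)) {
		mutex_unlock(&iommu->lock);
		return -EINVAL;
	}
	d = list_first_entry(&iommu->domain_list, struct vfio_domain, next);
	ret = iommu_uapi_attach_pasid_table(d->domain, (void __user *)arg);
	mutex_unlock(&iommu->lock);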

To be frank, my main concern is that if we put each group into a
different container under nested mode, we give up the possibility that
they can share stage-2 page tables, which saves host memory and reduces
the time needed to prepare the environment for a VM.

To me, I'd like to understand the "container shares page table" to be:
1) share stage2 page table under nested mode.
2) share stage1 page table under non-nested mode.

As when we perform "map" on a container:
1) under nested mode, we setup stage2 mapping.
2) under non-nested mode, we setup stage1 mapping.

Indeed, to realize stage2 mapping sharing, we should do much more work to 
refactor
SMMU_DOMAIN...

Hope you can consider this. :)

Thanks,
Keqian

> 
> Thanks
> 
> Eric
>>
>> Thanks,
>> Keqian
>>
>>> +   if (ret)
>>> +   goto unwind;
>>> +   }
>>> +   goto unlock;
>>> +unwind:
>>> +   list_for_each_entry_continue_reverse(d, &iommu->domain_list, next) {
>>> +   iommu_detach_pasid_table(d->domain);
>>> +   }
>>> +unlock:
>>> +   mutex_unlock(&iommu->lock);
>>> +   return ret;
>>> +}
>>> +
>>>  static int vfio_iommu_migration_build_caps(struct vfio_iommu *iommu,
>>>struct vfio_info_cap *caps)
>>>  {
>>> @@ -2747,6 +2782,34 @@ static int vfio_iommu_type1_unmap_dma(struct 
>>> vfio_iommu *iommu,
>>> -EFAULT : 0;
>>>  }
>>>  
>>> +static int vfio_iommu_type1_set_pasid_table(struct vfio_iommu *iommu,
>>> +   unsigned long arg)
>>> +{
>>> +   struct vfio_iommu_type1_set_pasid_table spt;
>>> +   unsigned long minsz;
>>> +   int ret = -EINVAL;
>>> +
>>> +   minsz = offsetofend(struct vfio_iommu_type1_set_pasid_table, flags);
>>> +
>>> +   if (copy_from_user(&spt, (void __user *)arg, minsz))
>>> +   return -EFAULT;

Re: [PATCH v11 01/13] vfio: VFIO_IOMMU_SET_PASID_TABLE

2021-02-22 Thread Auger Eric
Hi Keqian,

On 2/2/21 1:34 PM, Keqian Zhu wrote:
> Hi Eric,
> 
> On 2020/11/16 19:00, Eric Auger wrote:
>> From: "Liu, Yi L" 
>>
>> This patch adds a VFIO_IOMMU_SET_PASID_TABLE ioctl
>> which aims to pass the virtual IOMMU guest configuration
>> to the host. The latter takes the form of the so-called
>> PASID table.
>>
>> Signed-off-by: Jacob Pan 
>> Signed-off-by: Liu, Yi L 
>> Signed-off-by: Eric Auger 
>>
>> ---
>> v11 -> v12:
>> - use iommu_uapi_set_pasid_table
>> - check SET and UNSET are not set simultaneously (Zenghui)
>>
>> v8 -> v9:
>> - Merge VFIO_IOMMU_ATTACH/DETACH_PASID_TABLE into a single
>>   VFIO_IOMMU_SET_PASID_TABLE ioctl.
>>
>> v6 -> v7:
>> - add a comment related to VFIO_IOMMU_DETACH_PASID_TABLE
>>
>> v3 -> v4:
>> - restore ATTACH/DETACH
>> - add unwind on failure
>>
>> v2 -> v3:
>> - s/BIND_PASID_TABLE/SET_PASID_TABLE
>>
>> v1 -> v2:
>> - s/BIND_GUEST_STAGE/BIND_PASID_TABLE
>> - remove the struct device arg
>> ---
>>  drivers/vfio/vfio_iommu_type1.c | 65 +
>>  include/uapi/linux/vfio.h   | 19 ++
>>  2 files changed, 84 insertions(+)
>>
>> diff --git a/drivers/vfio/vfio_iommu_type1.c 
>> b/drivers/vfio/vfio_iommu_type1.c
>> index 67e827638995..87ddd9e882dc 100644
>> --- a/drivers/vfio/vfio_iommu_type1.c
>> +++ b/drivers/vfio/vfio_iommu_type1.c
>> @@ -2587,6 +2587,41 @@ static int vfio_iommu_iova_build_caps(struct 
>> vfio_iommu *iommu,
>>  return ret;
>>  }
>>  
>> +static void
>> +vfio_detach_pasid_table(struct vfio_iommu *iommu)
>> +{
>> +struct vfio_domain *d;
>> +
>> +mutex_lock(&iommu->lock);
>> +list_for_each_entry(d, &iommu->domain_list, next)
>> +iommu_detach_pasid_table(d->domain);
>> +
>> +mutex_unlock(&iommu->lock);
>> +}
>> +
>> +static int
>> +vfio_attach_pasid_table(struct vfio_iommu *iommu, unsigned long arg)
>> +{
>> +struct vfio_domain *d;
>> +int ret = 0;
>> +
>> +mutex_lock(&iommu->lock);
>> +
>> +list_for_each_entry(d, &iommu->domain_list, next) {
>> +ret = iommu_uapi_attach_pasid_table(d->domain, (void __user 
>> *)arg);
> This design is not very clear to me. This assumes all iommu_domains share the 
> same pasid table.
> 
> As I understand it, this is reasonable when there is only one group in
> the domain, and only one domain in the vfio_iommu.
> If there is more than one group in the vfio_iommu, the guest may put them
> into different guest iommu_domains, and then they have different pasid
> tables.
> 
> Is this the intended use scenario?

The vfio_iommu is attached to a container, and all the groups within a
container share the same set of page tables (see
Documentation/driver-api/vfio.rst). So, to me, if you want to use
different pasid tables, the groups need to be attached to different
containers (see the sketch below). Does that make sense to you?
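
Concretely, the userspace flow would look something like this (a
hypothetical sketch; the group numbers are made up and error handling is
omitted):

#include <fcntl.h>
#include <sys/ioctl.h>
#include <linux/vfio.h>

/* Two groups that need distinct PASID tables each get their own
 * container, since page tables are shared per container. */
int c1 = open("/dev/vfio/vfio", O_RDWR);
int c2 = open("/dev/vfio/vfio", O_RDWR);
int g1 = open("/dev/vfio/5", O_RDWR);
int g2 = open("/dev/vfio/7", O_RDWR);

ioctl(g1, VFIO_GROUP_SET_CONTAINER, &c1);
ioctl(g2, VFIO_GROUP_SET_CONTAINER, &c2);
/* After VFIO_SET_IOMMU on each container, each one can then receive
 * its own VFIO_IOMMU_SET_PASID_TABLE. */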

Thanks

Eric
> 
> Thanks,
> Keqian
> 
>> +if (ret)
>> +goto unwind;
>> +}
>> +goto unlock;
>> +unwind:
>> +list_for_each_entry_continue_reverse(d, &iommu->domain_list, next) {
>> +iommu_detach_pasid_table(d->domain);
>> +}
>> +unlock:
>> +mutex_unlock(&iommu->lock);
>> +return ret;
>> +}
>> +
>>  static int vfio_iommu_migration_build_caps(struct vfio_iommu *iommu,
>> struct vfio_info_cap *caps)
>>  {
>> @@ -2747,6 +2782,34 @@ static int vfio_iommu_type1_unmap_dma(struct 
>> vfio_iommu *iommu,
>>  -EFAULT : 0;
>>  }
>>  
>> +static int vfio_iommu_type1_set_pasid_table(struct vfio_iommu *iommu,
>> +unsigned long arg)
>> +{
>> +struct vfio_iommu_type1_set_pasid_table spt;
>> +unsigned long minsz;
>> +int ret = -EINVAL;
>> +
>> +minsz = offsetofend(struct vfio_iommu_type1_set_pasid_table, flags);
>> +
>> +if (copy_from_user(&spt, (void __user *)arg, minsz))
>> +return -EFAULT;
>> +
>> +if (spt.argsz < minsz)
>> +return -EINVAL;
>> +
>> +if (spt.flags & VFIO_PASID_TABLE_FLAG_SET &&
>> +spt.flags & VFIO_PASID_TABLE_FLAG_UNSET)
>> +return -EINVAL;
>> +
>> +if (spt.flags & VFIO_PASID_TABLE_FLAG_SET)
>> +ret = vfio_attach_pasid_table(iommu, arg + minsz);
>> +else if (spt.flags & VFIO_PASID_TABLE_FLAG_UNSET) {
>> +vfio_detach_pasid_table(iommu);
>> +ret = 0;
>> +}
>> +return ret;
>> +}
>> +
>>  static int vfio_iommu_type1_dirty_pages(struct vfio_iommu *iommu,
>>  unsigned long arg)
>>  {
>> @@ -2867,6 +2930,8 @@ static long vfio_iommu_type1_ioctl(void *iommu_data,
>>  return vfio_iommu_type1_unmap_dma(iommu, arg);
>>  case VFIO_IOMMU_DIRTY_PAGES:
>>  return vfio_iommu_type1_dirty_pages(iommu, arg);
>> +case VFIO_IOMMU_SET_PASID_TABLE:
>> +return vfio_iommu_type1_set_pasid_table(iommu, arg);
>>  default:
>>  return -ENOTTY;
>>  }
>> diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
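
The uapi hunk is truncated above, so for reference here is roughly how
userspace (e.g. a VMM) would be expected to drive the new ioctl. The
config member and the iommu_pasid_table_config field names are assumptions
taken from the companion IOMMU uapi patches of this series; the kernel
side above only guarantees that argsz and flags come first and that the
payload is read at arg + offsetofend(..., flags):

#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/iommu.h>
#include <linux/vfio.h>

static int set_guest_pasid_table(int container_fd, uint64_t pasid_table_gpa)
{
	struct vfio_iommu_type1_set_pasid_table spt;

	memset(&spt, 0, sizeof(spt));
	spt.argsz = sizeof(spt);
	spt.flags = VFIO_PASID_TABLE_FLAG_SET;
	/* Assumed config layout (uapi structs carry their own argsz): */
	spt.config.argsz = sizeof(spt.config);
	spt.config.version = PASID_TABLE_CFG_VERSION_1;
	spt.config.format = IOMMU_PASID_FORMAT_SMMUV3;
	spt.config.base_ptr = pasid_table_gpa;

	return ioctl(container_fd, VFIO_IOMMU_SET_PASID_TABLE, &spt);
}

static int unset_guest_pasid_table(int container_fd)
{
	struct vfio_iommu_type1_set_pasid_table spt = {
		.argsz = sizeof(spt),
		.flags = VFIO_PASID_TABLE_FLAG_UNSET,
	};

	return ioctl(container_fd, VFIO_IOMMU_SET_PASID_TABLE, &spt);
}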

Re: [PATCH 5/6] driver core: lift dma_default_coherent into common code

2021-02-22 Thread Maciej W. Rozycki
On Mon, 22 Feb 2021, Christoph Hellwig wrote:

> > I haven't booted Linux on my Malta for a while now, but it turns out to 
> > work just fine, and your patch set does not regress it booting multi-user 
> > NFS-rooted over FDDI.
> > 
> >  I note however that the system does not reboot properly:
> > 
> > sd 0:0:0:0: [sda] Synchronizing SCSI cache
> > reboot: Restarting system
> > Reboot failed -- System halted
> > 
> > which is a regression, and also the MMIO-mapped discrete CBUS UART (ttyS2) 
> > does not sign in anymore either:
> 
> Do you mean a regression with this series, or just compared to when you
> last tested?

 When last tested.  Years ago, so nothing for you to be concerned about.
I'll look into it sometime unless someone beats me to it.  Don't hold your
breath though.  Sorry to be unclear.

  Maciej


RE: [PATCH v13 00/15] SMMUv3 Nested Stage Setup (IOMMU part)

2021-02-22 Thread Shameerali Kolothum Thodi


> -Original Message-
> From: Auger Eric [mailto:eric.au...@redhat.com]
> Sent: 21 February 2021 18:21
> To: Shameerali Kolothum Thodi ;
> eric.auger@gmail.com; iommu@lists.linux-foundation.org;
> linux-ker...@vger.kernel.org; k...@vger.kernel.org;
> kvm...@lists.cs.columbia.edu; w...@kernel.org; j...@8bytes.org;
> m...@kernel.org; robin.mur...@arm.com; alex.william...@redhat.com
> Cc: jean-phili...@linaro.org; zhangfei@linaro.org;
> zhangfei@gmail.com; vivek.gau...@arm.com;
> jacob.jun@linux.intel.com; yi.l@intel.com; t...@semihalf.com;
> nicoleots...@gmail.com; yuzenghui ; Zengtao (B)
> ; linux...@openeuler.org
> Subject: Re: [PATCH v13 00/15] SMMUv3 Nested Stage Setup (IOMMU part)
> 
> Hi Shameer,
> On 1/8/21 6:05 PM, Shameerali Kolothum Thodi wrote:
> > Hi Eric,
> >
> >> -Original Message-
> >> From: Eric Auger [mailto:eric.au...@redhat.com]
> >> Sent: 18 November 2020 11:22
> >> To: eric.auger@gmail.com; eric.au...@redhat.com;
> >> iommu@lists.linux-foundation.org; linux-ker...@vger.kernel.org;
> >> k...@vger.kernel.org; kvm...@lists.cs.columbia.edu; w...@kernel.org;
> >> j...@8bytes.org; m...@kernel.org; robin.mur...@arm.com;
> >> alex.william...@redhat.com
> >> Cc: jean-phili...@linaro.org; zhangfei@linaro.org;
> >> zhangfei@gmail.com; vivek.gau...@arm.com; Shameerali Kolothum
> >> Thodi ;
> >> jacob.jun@linux.intel.com; yi.l@intel.com; t...@semihalf.com;
> >> nicoleots...@gmail.com; yuzenghui 
> >> Subject: [PATCH v13 00/15] SMMUv3 Nested Stage Setup (IOMMU part)
> >>
> >> This series brings the IOMMU part of HW nested paging support
> >> in the SMMUv3. The VFIO part is submitted separately.
> >>
> >> The IOMMU API is extended to support 2 new API functionalities:
> >> 1) pass the guest stage 1 configuration
> >> 2) pass stage 1 MSI bindings
> >>
> >> Those capabilities then get implemented in the SMMUv3 driver.
> >>
> >> The virtualizer passes information through the VFIO user API,
> >> which cascades it to the iommu subsystem. This allows the guest
> >> to own stage 1 tables and context descriptors (so-called PASID
> >> table) while the host owns stage 2 tables and main configuration
> >> structures (STE).
> >
> > I am seeing an issue with a guest testpmd run with this series.
> > I have two different setups, and testpmd works fine with the
> > first one but not with the second.
> >
> > 1) The guest doesn't have a kernel driver built in for the pass-through device.
> >
> > root@ubuntu:/# lspci -v
> > ...
> > 00:02.0 Ethernet controller: Huawei Technologies Co., Ltd. Device a22e (rev 21)
> > Subsystem: Huawei Technologies Co., Ltd. Device 
> > Flags: fast devsel
> > Memory at 800010 (64-bit, prefetchable) [disabled] [size=64K]
> > Memory at 80 (64-bit, prefetchable) [disabled] [size=1M]
> > Capabilities: [40] Express Root Complex Integrated Endpoint, MSI 00
> > Capabilities: [a0] MSI-X: Enable- Count=67 Masked-
> > Capabilities: [b0] Power Management version 3
> > Capabilities: [100] Access Control Services
> > Capabilities: [300] Transaction Processing Hints
> >
> > root@ubuntu:/# echo vfio-pci > /sys/bus/pci/devices/:00:02.0/driver_override
> > root@ubuntu:/# echo :00:02.0 > /sys/bus/pci/drivers_probe
> >
> > root@ubuntu:/mnt/dpdk/build/app# ./testpmd -w :00:02.0 --file-prefix socket0  -l 0-1 -n 2 -- -i
> > EAL: Detected 8 lcore(s)
> > EAL: Detected 1 NUMA nodes
> > EAL: Multi-process socket /var/run/dpdk/socket0/mp_socket
> > EAL: Selected IOVA mode 'VA'
> > EAL: No available hugepages reported in hugepages-32768kB
> > EAL: No available hugepages reported in hugepages-64kB
> > EAL: No available hugepages reported in hugepages-1048576kB
> > EAL: Probing VFIO support...
> > EAL: VFIO support initialized
> > EAL:   Invalid NUMA socket, default to 0
> > EAL:   using IOMMU type 1 (Type 1)
> > EAL: Probe PCI driver: net_hns3_vf (19e5:a22e) device: :00:02.0 (socket 0)
> > EAL: No legacy callbacks, legacy socket not created
> > Interactive-mode selected
> > testpmd: create a new mbuf pool : n=155456, size=2176, socket=0
> > testpmd: preferred mempool ops selected: ring_mp_mc
> >
> > Warning! port-topology=paired and odd forward ports number, the last port will pair with itself.
> >
> > Configuring Port 0 (socket 0)
> > Port 0: 8E:A6:8C:43:43:45
> > Checking link statuses...
> > Done
> > testpmd>
> >
> > 2) The guest has a kernel driver built in for the pass-through device.
> >
> > root@ubuntu:/# lspci -v
> > ...
> > 00:02.0 Ethernet controller: Huawei Technologies Co., Ltd. Device a22e (rev 21)
> > Subsystem: Huawei Technologies Co., Ltd. Device 
> > Flags: bus master, fast devsel, latency 0
> > Memory at 800010 (64-bit, prefetchable) [size=64K]
> > Memory at 80 (64-bit, prefetchable) [size=1M]
> > Capabilities: [40] Express Root Complex Integrated Endpoint, MSI 00
> > Capabilities: [a0] MSI-X: Enable+ Count=67 Masked-
> > Capabilities: [b0] Power Management version 3
> > Capabilities: [100] Access Control Services