Re: [PATCH v1 2/2] iommu/tegra-smmu: Revert workaround that was needed for Nyan Big Chromebook

2021-04-23 Thread Guillaume Tucker
On 02/04/2021 15:40, Dmitry Osipenko wrote:
> 01.04.2021 11:55, Nicolin Chen пишет:
>> On Mon, Mar 29, 2021 at 02:32:56AM +0300, Dmitry Osipenko wrote:
>>> The previous commit fixes problem where display client was attaching too
>>> early to IOMMU during kernel boot in a multi-platform kernel configuration
>>> which enables CONFIG_ARM_DMA_USE_IOMMU=y. The workaround that helped to
>>> defer the IOMMU attachment for Nyan Big Chromebook isn't needed anymore,
>>> revert it.
>>
>> Sorry for the late reply. I have been busy with downstream tasks.
>>
>> I will give them a try by the end of the week. Yet, probably it'd
>> be better to include Guillaume also as he has the Nyan platform.
>>
> 
> Indeed, thanks. Although, I'm pretty sure that it's the same issue which
> I reproduced on Nexus 7.
> 
> Guillaume, could you please give a test to these patches on Nyan Big?
> There should be no EMEM errors in the kernel log with this patches.
> 
> https://patchwork.ozlabs.org/project/linux-tegra/list/?series=236215

So sorry for the very late reply.  I have tried the patches but
hit some issues on linux-next, it's not reaching a login prompt
with next-20210422.  So I then tried with next-20210419 which
does boot but shows the IOMMU error:

<6>[2.995341] tegra-dc 5420.dc: Adding to iommu group 1
<4>[3.001070] Failed to attached device 5420.dc to IOMMU_mapping  

  https://lava.collabora.co.uk/scheduler/job/3570052#L1120

The branch I'm using with the patches applied can be found here:

  
https://gitlab.collabora.com/gtucker/linux/-/commits/next-20210419-nyan-big-drm-read/

Hope this helps, let me know if you need anything else to be
tested.

Best wishes,
Guillaume
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: next/master bisection: baseline.login on r8a77960-ulcb

2021-02-23 Thread Guillaume Tucker
Hi Christoph,

Please see the bisection report below about a boot failure on
r8a77960-ulcb on next-20210222.  

Reports aren't automatically sent to the public while we're
trialing new bisection features on kernelci.org but this one
looks valid.

The log shows a kernel panic, more details can be found here:

  https://kernelci.org/test/case/id/6034bde034504edc9faddd2c/

Please let us know if you need any help to debug the issue or try
a fix on this platform.

Best wishes,
Guillaume

On 23/02/2021 02:02, KernelCI bot wrote:
> * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
> * This automated bisection report was sent to you on the basis  *
> * that you may be involved with the breaking commit it has  *
> * found.  No manual investigation has been done to verify it,   *
> * and the root cause of the problem may be somewhere else.  *
> *   *
> * If you do send a fix, please include this trailer:*
> *   Reported-by: "kernelci.org bot"   *
> *   *
> * Hope this helps!  *
> * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
> 
> next/master bisection: baseline.login on r8a77960-ulcb
> 
> Summary:
>   Start:  37dfbfbdca66 Add linux-next specific files for 20210222
>   Plain log:  
> https://storage.kernelci.org/next/master/next-20210222/arm64/defconfig/clang-10/lab-baylibre/baseline-r8a77960-ulcb.txt
>   HTML log:   
> https://storage.kernelci.org/next/master/next-20210222/arm64/defconfig/clang-10/lab-baylibre/baseline-r8a77960-ulcb.html
>   Result: 567d877f9a7d swiotlb: refactor swiotlb_tbl_map_single
> 
> Checks:
>   revert: PASS
>   verify: PASS
> 
> Parameters:
>   Tree:   next
>   URL:
> https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
>   Branch: master
>   Target: r8a77960-ulcb
>   CPU arch:   arm64
>   Lab:lab-baylibre
>   Compiler:   clang-10
>   Config: defconfig
>   Test case:  baseline.login
> 
> Breaking commit found:
> 
> ---
> commit 567d877f9a7d6bf4e4bf0ecd6de23fec8039b123
> Author: Christoph Hellwig 
> Date:   Thu Feb 4 11:08:35 2021 +0100
> 
> swiotlb: refactor swiotlb_tbl_map_single
> 
> Split out a bunch of a self-contained helpers to make the function easier
> to follow.
> 
> Signed-off-by: Christoph Hellwig 
> Acked-by: Jianxiong Gao 
> Tested-by: Jianxiong Gao 
> Signed-off-by: Konrad Rzeszutek Wilk 
> 
> diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
> index b38b1553c466..381c24ef1ac1 100644
> --- a/kernel/dma/swiotlb.c
> +++ b/kernel/dma/swiotlb.c
> @@ -468,134 +468,133 @@ static void swiotlb_bounce(phys_addr_t orig_addr, 
> phys_addr_t tlb_addr,
>   }
>  }
>  
> -phys_addr_t swiotlb_tbl_map_single(struct device *hwdev, phys_addr_t 
> orig_addr,
> - size_t mapping_size, size_t alloc_size,
> - enum dma_data_direction dir, unsigned long attrs)
> -{
> - dma_addr_t tbl_dma_addr = phys_to_dma_unencrypted(hwdev, io_tlb_start);
> - unsigned long flags;
> - phys_addr_t tlb_addr;
> - unsigned int nslots, stride, index, wrap;
> - int i;
> - unsigned long mask;
> - unsigned long offset_slots;
> - unsigned long max_slots;
> - unsigned long tmp_io_tlb_used;
> -
> - if (no_iotlb_memory)
> - panic("Can not allocate SWIOTLB buffer earlier and can't now 
> provide you with the DMA bounce buffer");
> -
> - if (mem_encrypt_active())
> - pr_warn_once("Memory encryption is active and system is using 
> DMA bounce buffers\n");
> +#define slot_addr(start, idx)((start) + ((idx) << IO_TLB_SHIFT))
>  
> - if (mapping_size > alloc_size) {
> - dev_warn_once(hwdev, "Invalid sizes (mapping: %zd bytes, alloc: 
> %zd bytes)",
> -   mapping_size, alloc_size);
> - return (phys_addr_t)DMA_MAPPING_ERROR;
> - }
> -
> - mask = dma_get_seg_boundary(hwdev);
> +/*
> + * Carefully handle integer overflow which can occur when boundary_mask == 
> ~0UL.
> + */
> +static inline unsigned long get_max_slots(unsigned long boundary_mask)
> +{
> + if (boundary_mask == ~0UL)
> + return 1UL << (BITS_PER_LONG - IO_TLB_SHIFT);
> + return nr_slots(boundary_mask + 1);
> +}
>  
> - tbl_dma_addr &= mask;
> +static unsigned int wrap_index(unsigned int index)
> +{
> + if (index >= io_tlb_nslabs)
> + return 0;
> + return index;
> +}
>  
> - offset_slots = nr_slots(tbl_dma_addr);
> +/*
> + * Find a suitable number of IO TLB entries size that will fit this request 
> and
> + * allocate a buffer from that IO TLB pool.
> + */
> +static int find_slots(struct device *dev, size_t alloc_size)
> +{
> + unsigned long boundary_mask = 

Re: [PATCH] iommu/tegra-smmu: Fix mc errors on tegra124-nyan

2021-02-22 Thread Guillaume Tucker
On 18/02/2021 22:07, Nicolin Chen wrote:
> Commit 25938c73cd79 ("iommu/tegra-smmu: Rework tegra_smmu_probe_device()")
> removed certain hack in the tegra_smmu_probe() by relying on IOMMU core to
> of_xlate SMMU's SID per device, so as to get rid of tegra_smmu_find() and
> tegra_smmu_configure() that are typically done in the IOMMU core also.
> 
> This approach works for both existing devices that have DT nodes and other
> devices (like PCI device) that don't exist in DT, on Tegra210 and Tegra3
> upon testing. However, Page Fault errors are reported on tegra124-Nyan:
> 
>   tegra-mc 70019000.memory-controller: display0a: read @0xfe056b40:
>EMEM address decode error (SMMU translation error [--S])
>   tegra-mc 70019000.memory-controller: display0a: read @0xfe056b40:
>Page fault (SMMU translation error [--S])
> 
> After debugging, I found that the mentioned commit changed some function
> callback sequence of tegra-smmu's, resulting in enabling SMMU for display
> client before display driver gets initialized. I couldn't reproduce exact
> same issue on Tegra210 as Tegra124 (arm-32) differs at arch-level code.
> 
> Actually this Page Fault is a known issue, as on most of Tegra platforms,
> display gets enabled by the bootloader for the splash screen feature, so
> it keeps filling the framebuffer memory. A proper fix to this issue is to
> 1:1 linear map the framebuffer memory to IOVA space so the SMMU will have
> the same address as the physical address in its page table. Yet, Thierry
> has been working on the solution above for a year, and it hasn't merged.
> 
> Therefore, let's partially revert the mentioned commit to fix the errors.
> 
> The reason why we do a partial revert here is that we can still set priv
> in ->of_xlate() callback for PCI devices. Meanwhile, devices existing in
> DT, like display, will go through tegra_smmu_configure() at the stage of
> bus_set_iommu() when SMMU gets probed(), as what it did before we merged
> the mentioned commit.
> 
> Once we have the linear map solution for framebuffer memory, this change
> can be cleaned away.
> 
> [Big thank to Guillaume who reported and helped debugging/verification]
> 
> Fixes: 25938c73cd79 ("iommu/tegra-smmu: Rework tegra_smmu_probe_device()")
> Reported-by: Guillaume Tucker 

You're welcome.  I would actually prefer to see it as reported by
kernelci.org since I wouldn't have found it without the automated
testing and bisection.  If you're OK to change this trailer:

  Reported-by: "kernelci.org bot" 

> Signed-off-by: Nicolin Chen 
> ---
> 
> Guillaume, would you please give a "Tested-by" to this change? Thanks!

Sure. https://lava.collabora.co.uk/scheduler/job/3249387

  Tested-by: Guillaume Tucker 

Thanks,
Guillaume
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH RESEND v2 4/5] iommu/tegra-smmu: Rework tegra_smmu_probe_device()

2021-02-18 Thread Guillaume Tucker
On 18/02/2021 10:35, Nicolin Chen wrote:
> Hi Guillaume,
> 
> Thank you for the test results! And sorry for my belated reply.

No worries :)

> On Thu, Feb 11, 2021 at 03:50:05PM +0000, Guillaume Tucker wrote:
>>> On Sat, Feb 06, 2021 at 01:40:13PM +0000, Guillaume Tucker wrote:
>>>>> It'd be nicer if I can get both logs of the vanilla kernel (failing)
>>>>> and the commit-reverted version (passing), each applying this patch.
>>>>
>>>> Sure, I've run 3 jobs:
>>>>
>>>> * v5.11-rc6 as a reference, to see the original issue:
>>>>   https://lava.collabora.co.uk/scheduler/job/3187848
>>>>
>>>> * + your debug patch:
>>>>   https://lava.collabora.co.uk/scheduler/job/3187849
>>>>
>>>> * + the "breaking" commit reverted, passing the tests:
>>>>   https://lava.collabora.co.uk/scheduler/job/3187851
>>>
>>> Thanks for the help!
>>>
>>> I am able to figure out what's probably wrong, yet not so sure
>>> about the best solution at this point.
>>>
>>> Would it be possible for you to run one more time with another
>>> debugging patch? I'd like to see the same logs as previous:
>>> 1. Vanilla kernel + debug patch
>>> 2. Vanilla kernel + Reverted + debug patch
>>
>> As it turns out, next-20210210 is passing all the tests again so
>> it looks like this got fixed in the meantime:
>>
>>   https://lava.collabora.co.uk/scheduler/job/3210192
> 
> I checked this passing log, however, found that the regression is
> still there though test passed, as the prints below aren't normal:
>   tegra-mc 70019000.memory-controller: display0a: read @0xfe056b40:
>EMEM address decode error (SMMU translation error [--S])
>   tegra-mc 70019000.memory-controller: display0a: read @0xfe056b40:
>Page fault (SMMU translation error [--S])

Ah yes sorry, there are other KernelCI checks for kernel errors
but that wasn't enabled in the bisection so I didn't notice them.

> I was trying to think of a simpler solution than a revert. However,
> given the fact that the callback sequence could change -- guessing
> likely a recent change in iommu core, I feel it safer to revert my
> previous change, not necessarily being a complete revert though.
> 
> I attached my partial reverting change in this email. Would it be
> possible for you to run one more test for me to confirm it? It'd
> keep the tests passing while eliminating all error prints above.
> 
> If the fix works, I'll re-send it to mail list by adding a commit
> message.

Sure, here's next-20210218 as a reference:

  https://lava.collabora.co.uk/scheduler/job/3241236

and here with your patch applied on top of it:

  https://lava.collabora.co.uk/scheduler/job/3241246

The git branch I've used where your patch is applied:

  
https://gitlab.collabora.com/gtucker/linux/-/commits/next-20210218-nyan-big-drm-read/

The errors seem to have disappeared but I'll let you double check
that things are all back to a working state.

BTW: This thread is a good example of how having an "on-demand"
KernelCI service to let developers re-run tests with extra
patches would allow them to fix issues independently.  We'll keep
that in mind for the future.

Best wishes,
Guillaume
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH RESEND v2 4/5] iommu/tegra-smmu: Rework tegra_smmu_probe_device()

2021-02-11 Thread Guillaume Tucker
On 10/02/2021 08:20, Nicolin Chen wrote:
> Hi Guillaume,
> 
> On Sat, Feb 06, 2021 at 01:40:13PM +, Guillaume Tucker wrote:
>>> It'd be nicer if I can get both logs of the vanilla kernel (failing)
>>> and the commit-reverted version (passing), each applying this patch.
>>
>> Sure, I've run 3 jobs:
>>
>> * v5.11-rc6 as a reference, to see the original issue:
>>   https://lava.collabora.co.uk/scheduler/job/3187848
>>
>> * + your debug patch:
>>   https://lava.collabora.co.uk/scheduler/job/3187849
>>
>> * + the "breaking" commit reverted, passing the tests:
>>   https://lava.collabora.co.uk/scheduler/job/3187851
> 
> Thanks for the help!
> 
> I am able to figure out what's probably wrong, yet not so sure
> about the best solution at this point.
> 
> Would it be possible for you to run one more time with another
> debugging patch? I'd like to see the same logs as previous:
> 1. Vanilla kernel + debug patch
> 2. Vanilla kernel + Reverted + debug patch

As it turns out, next-20210210 is passing all the tests again so
it looks like this got fixed in the meantime:

  https://lava.collabora.co.uk/scheduler/job/3210192
  https://lava.collabora.co.uk/results/3210192/0_igt-kms-tegra

And here's a more extensive list of IGT tests on next-20210211,
all the regressions have been fixed:

  https://kernelci.org/test/plan/id/60254c42f51df36be53abe62/


I haven't run a reversed bisection to find the fix, but I guess
it wouldn't be too hard to find out what happened by hand anyway.
I see the drm/tegra/for-5.12-rc1 tag has been merged into
linux-next, maybe that solved the issue?

FYI I've also run some jobs with your debug patch and with the
breaking patch reverted:

  https://lava.collabora.co.uk/scheduler/job/3210245
  https://lava.collabora.co.uk/scheduler/job/3210596

Meanwhile I'll see what can be done to improve the automated
bisection so if there are new IGT regressions they would get
reported earlier.  I guess it would have saved us all some time
if it had been bisected in December.

Thanks,
Guillaume
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH RESEND v2 4/5] iommu/tegra-smmu: Rework tegra_smmu_probe_device()

2021-02-06 Thread Guillaume Tucker
On 05/02/2021 09:45, Nicolin Chen wrote:
> Hi Guillaume,
> 
> On Thu, Feb 04, 2021 at 09:24:23PM -0800, Nicolin Chen wrote:
>>> Please let us know if you need any help debugging this issue or
>>> to try a fix on this platform.
>>
>> Yes, I don't have any Tegra124 platform to run. It'd be very nice
>> if you can run some debugging patch (I can provide you) and a fix
>> after I root cause the issue.
> 
> Would it be possible for you to run with the given debugging patch?
> 
> It'd be nicer if I can get both logs of the vanilla kernel (failing)
> and the commit-reverted version (passing), each applying this patch.

Sure, I've run 3 jobs:

* v5.11-rc6 as a reference, to see the original issue:
  https://lava.collabora.co.uk/scheduler/job/3187848

* + your debug patch:
  https://lava.collabora.co.uk/scheduler/job/3187849

* + the "breaking" commit reverted, passing the tests:
  https://lava.collabora.co.uk/scheduler/job/3187851


You can see the history of the test branch I'm using here, with
the 3 revisions mentioned above:

  
https://gitlab.collabora.com/gtucker/linux/-/commits/linux-5.11-rc6-nyan-big-drm-read/


Hope that helps,
Guillaume
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH RESEND v2 4/5] iommu/tegra-smmu: Rework tegra_smmu_probe_device()

2021-02-04 Thread Guillaume Tucker
Hi Nicolin,

A regression was detected by kernelci.org in IGT's drm_read tests
on mainline, it was first seen on 17th December 2020.  You can
find some details here:

  https://kernelci.org/test/case/id/600b82dc1e3208f123d3dffc/

Then an automated bisection was run and it landed on this
patch (v5.10-rc3-4-g25938c73cd79 on mainline).  Normally, an
email is generated automatically but I had to start this one by
hand as there were issues getting it to complete.

You can see the failing test cases with this patch:

  https://lava.collabora.co.uk/results/3126405/0_igt-kms-tegra

Some errors are seen around this point in the log:

  https://lava.collabora.co.uk/scheduler/job/3126405#L1005

[3.029729] tegra-mc 70019000.memory-controller: display0a: read 
@0xfe00: EMEM address decode error (SMMU translation error [--S])
[3.042058] tegra-mc 70019000.memory-controller: display0a: read 
@0xfe00: Page fault (SMMU translation error [--S])


Here's the same test passing with this patch reverted:

  https://lava.collabora.co.uk/results/3126570/0_igt-kms-tegra
  

For completeness, you can see all the test jobs run by the
automated bisection here:

  
https://lava.collabora.co.uk/scheduler/device_type/tegra124-nyan-big?dt_length=25_search=bisection-gtucker-12#dt_


Please let us know if you need any help debugging this issue or
to try a fix on this platform.

Best wishes,
Guillaume

On 25/11/2020 10:10, Nicolin Chen wrote:
> The bus_set_iommu() in tegra_smmu_probe() enumerates all clients
> to call in tegra_smmu_probe_device() where each client searches
> its DT node for smmu pointer and swgroup ID, so as to configure
> an fwspec. But this requires a valid smmu pointer even before mc
> and smmu drivers are probed. So in tegra_smmu_probe() we added a
> line of code to fill mc->smmu, marking "a bit of a hack".
> 
> This works for most of clients in the DTB, however, doesn't work
> for a client that doesn't exist in DTB, a PCI device for example.
> 
> Actually, if we return ERR_PTR(-ENODEV) in ->probe_device() when
> it's called from bus_set_iommu(), iommu core will let everything
> carry on. Then when a client gets probed, of_iommu_configure() in
> iommu core will search DTB for swgroup ID and call ->of_xlate()
> to prepare an fwspec, similar to tegra_smmu_probe_device() and
> tegra_smmu_configure(). Then it'll call tegra_smmu_probe_device()
> again, and this time we shall return smmu->iommu pointer properly.
> 
> So we can get rid of tegra_smmu_find() and tegra_smmu_configure()
> along with DT polling code by letting the iommu core handle every
> thing, except a problem that we search iommus property in DTB not
> only for swgroup ID but also for mc node to get mc->smmu pointer
> to call dev_iommu_priv_set() and return the smmu->iommu pointer.
> So we'll need to find another way to get smmu pointer.
> 
> Referencing the implementation of sun50i-iommu driver, of_xlate()
> has client's dev pointer, mc node and swgroup ID. This means that
> we can call dev_iommu_priv_set() in of_xlate() instead, so we can
> simply get smmu pointer in ->probe_device().
> 
> This patch reworks tegra_smmu_probe_device() by:
> 1) Removing mc->smmu hack in tegra_smmu_probe() so as to return
>ERR_PTR(-ENODEV) in tegra_smmu_probe_device() during stage of
>tegra_smmu_probe/tegra_mc_probe().
> 2) Moving dev_iommu_priv_set() to of_xlate() so we can get smmu
>pointer in tegra_smmu_probe_device() to replace DTB polling.
> 3) Removing tegra_smmu_configure() accordingly since iommu core
>takes care of it.
> 
> This also fixes a problem that previously we could add clients to
> iommu groups before iommu core initializes its default domain:
> ubuntu@jetson:~$ dmesg | grep iommu
> platform 5000.host1x: Adding to iommu group 1
> platform 5700.gpu: Adding to iommu group 2
> iommu: Default domain type: Translated
> platform 5420.dc: Adding to iommu group 3
> platform 5424.dc: Adding to iommu group 3
> platform 5434.vic: Adding to iommu group 4
> 
> Though it works fine with IOMMU_DOMAIN_UNMANAGED, but will have
> warnings if switching to IOMMU_DOMAIN_DMA:
> iommu: Failed to allocate default IOMMU domain of type 0 for
>group (null) - Falling back to IOMMU_DOMAIN_DMA
> iommu: Failed to allocate default IOMMU domain of type 0 for
>group (null) - Falling back to IOMMU_DOMAIN_DMA
> 
> Now, bypassing the first probe_device() call from bus_set_iommu()
> fixes the sequence:
> ubuntu@jetson:~$ dmesg | grep iommu
> iommu: Default domain type: Translated
> tegra-host1x 5000.host1x: Adding to iommu group 0
> tegra-dc 5420.dc: Adding to iommu group 1
> tegra-dc 5424.dc: Adding to iommu group 1
> tegra-vic 5434.vic: Adding to iommu group 2
> nouveau 5700.gpu: Adding to iommu group 3
> 
> Note that dmesg log above is testing with IOMMU_DOMAIN_UNMANAGED.
> 
> Reviewed-by: Dmitry Osipenko 
> Tested-by: Dmitry Osipenko 
> 

Re: next/master bisection: baseline.login on panda

2020-05-20 Thread Guillaume Tucker
Please see the bisection report below about a boot failure.

Reports aren't automatically sent to the public while we're
trialing new bisection features on kernelci.org but this one
looks valid.

Unfortunately there isn't anything in the kernel log, it's
probably crashing very early on.  The bisection was run on
omap4-panda, and there seems to be the same issue on
omap3-beagle-xm as it's also failing to boot.

Please let us know if anyone is able to debug the issue or if we
need to rerun the KernelCI job with earlyprintk enabled or any
debug config option.

Thanks,
Guillaume

On 20/05/2020 09:34, kernelci.org bot wrote:
> * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
> * This automated bisection report was sent to you on the basis  *
> * that you may be involved with the breaking commit it has  *
> * found.  No manual investigation has been done to verify it,   *
> * and the root cause of the problem may be somewhere else.  *
> *   *
> * If you do send a fix, please include this trailer:*
> *   Reported-by: "kernelci.org bot"   *
> *   *
> * Hope this helps!  *
> * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
> 
> next/master bisection: baseline.login on panda
> 
> Summary:
>   Start:  fb57b1fabcb28 Add linux-next specific files for 20200519
>   Plain log:  
> https://storage.kernelci.org/next/master/next-20200519/arm/omap2plus_defconfig/gcc-8/lab-baylibre/baseline-omap4-panda.txt
>   HTML log:   
> https://storage.kernelci.org/next/master/next-20200519/arm/omap2plus_defconfig/gcc-8/lab-baylibre/baseline-omap4-panda.html
>   Result: ce574c27ae275 iommu: Move iommu_group_create_direct_mappings() 
> out of iommu_group_add_device()
> 
> Checks:
>   revert: PASS
>   verify: PASS
> 
> Parameters:
>   Tree:   next
>   URL:
> https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
>   Branch: master
>   Target: panda
>   CPU arch:   arm
>   Lab:lab-baylibre
>   Compiler:   gcc-8
>   Config: omap2plus_defconfig
>   Test case:  baseline.login
> 
> Breaking commit found:
> 
> ---
> commit ce574c27ae275bc51b6437883fc9cd1c46b498e5
> Author: Joerg Roedel 
> Date:   Wed Apr 29 15:36:50 2020 +0200
> 
> iommu: Move iommu_group_create_direct_mappings() out of 
> iommu_group_add_device()
> 
> After the previous changes the iommu group may not have a default
> domain when iommu_group_add_device() is called. With no default domain
> iommu_group_create_direct_mappings() will do nothing and no direct
> mappings will be created.
> 
> Rename iommu_group_create_direct_mappings() to
> iommu_create_device_direct_mappings() to better reflect that the
> function creates direct mappings only for one device and not for all
> devices in the group. Then move the call to the places where a default
> domain actually exists.
> 
> Signed-off-by: Joerg Roedel 
> Tested-by: Marek Szyprowski 
> Acked-by: Marek Szyprowski 
> Link: https://lore.kernel.org/r/20200429133712.31431-13-j...@8bytes.org
> Signed-off-by: Joerg Roedel 
> 
> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> index 7de0e29db3338..834a45da0ed0f 100644
> --- a/drivers/iommu/iommu.c
> +++ b/drivers/iommu/iommu.c
> @@ -89,6 +89,8 @@ static int __iommu_attach_group(struct iommu_domain *domain,
>   struct iommu_group *group);
>  static void __iommu_detach_group(struct iommu_domain *domain,
>struct iommu_group *group);
> +static int iommu_create_device_direct_mappings(struct iommu_group *group,
> +struct device *dev);
>  
>  #define IOMMU_GROUP_ATTR(_name, _mode, _show, _store)\
>  struct iommu_group_attribute iommu_group_attr_##_name =  \
> @@ -243,6 +245,8 @@ static int __iommu_probe_device_helper(struct device *dev)
>   if (group->default_domain)
>   ret = __iommu_attach_device(group->default_domain, dev);
>  
> + iommu_create_device_direct_mappings(group, dev);
> +
>   iommu_group_put(group);
>  
>   if (ret)
> @@ -263,6 +267,7 @@ static int __iommu_probe_device_helper(struct device *dev)
>  int iommu_probe_device(struct device *dev)
>  {
>   const struct iommu_ops *ops = dev->bus->iommu_ops;
> + struct iommu_group *group;
>   int ret;
>  
>   WARN_ON(dev->iommu_group);
> @@ -285,6 +290,10 @@ int iommu_probe_device(struct device *dev)
>   if (ret)
>   goto err_module_put;
>  
> + group = iommu_group_get(dev);
> + iommu_create_device_direct_mappings(group, dev);
> + iommu_group_put(group);
> +
>   if (ops->probe_finalize)
>   

Re: next/master bisection: baseline.login on jetson-tk1

2020-05-13 Thread Guillaume Tucker
On 12/05/2020 16:16, Joerg Roedel wrote:
> Hi Guillaume,
> 
> thanks for the report!
> 
> On Tue, May 12, 2020 at 07:05:13AM +0100, Guillaume Tucker wrote:
>>> Summary:
>>>   Start:  4b20e7462caa6 Add linux-next specific files for 20200511
>>>   Plain log:  
>>> https://storage.kernelci.org/next/master/next-20200511/arm/tegra_defconfig/gcc-8/lab-collabora/baseline-tegra124-jetson-tk1.txt
>>>   HTML log:   
>>> https://storage.kernelci.org/next/master/next-20200511/arm/tegra_defconfig/gcc-8/lab-collabora/baseline-tegra124-jetson-tk1.html
>>>   Result: 3eeeb45c6d044 iommu: Remove add_device()/remove_device() 
>>> code-paths
> 
> Okay, so it faults at
> 
>   PC is at __iommu_probe_device+0x20/0x1b8
> 
> Can you translate that for me into a code-line, please? That would help
> finding the issue.

Sure, sorry for the delay.  I've built my own image as vmlinux is
not stored by kernelci and reproduced the problem:

  https://lava.collabora.co.uk/scheduler/job/2403076#L544

which this time gave me:

<4>[2.540558] PC is at iommu_probe_device+0x1c/0x15c
<4>[2.545606] LR is at of_iommu_configure+0x15c/0x1c4
<4>[2.550736] pc : []lr : []psr: a013

which in turn brings us to:

(gdb) l *0xc092e0e4
0xc092e0e4 is in iommu_probe_device (drivers/iommu/iommu.c:232).
227 int ret;
228 
229 if (!dev_iommu_get(dev))
230 return -ENOMEM;
231 
232 if (!try_module_get(ops->owner)) {
233 ret = -EINVAL;
234 goto err_out;
235 }
236 


Hope this helps.

Guillaume
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: next/master bisection: baseline.login on jetson-tk1

2020-05-12 Thread Guillaume Tucker
Please see the bisection report below about a kernel panic.

Reports aren't automatically sent to the public while we're
trialing new bisection features on kernelci.org but this one
looks valid.

See the kernel Oops due to a NULL pointer followed by a panic:


https://storage.kernelci.org/next/master/next-20200511/arm/tegra_defconfig/gcc-8/lab-collabora/baseline-tegra124-jetson-tk1.html#L573

Stack trace:

<0>[2.953683] [] (__iommu_probe_device) from [] 
(iommu_probe_device+0x18/0x124)
<0>[2.962810] [] (iommu_probe_device) from [] 
(of_iommu_configure+0x154/0x1b8)
<0>[2.971853] [] (of_iommu_configure) from [] 
(of_dma_configure+0x144/0x2c8)
<0>[2.980722] [] (of_dma_configure) from [] 
(host1x_attach_driver+0x148/0x2c4)
<0>[2.989763] [] (host1x_attach_driver) from [] 
(host1x_driver_register_full+0x70/0xcc)
<0>[2.999585] [] (host1x_driver_register_full) from [] 
(host1x_drm_init+0x14/0x50)
<0>[3.008973] [] (host1x_drm_init) from [] 
(do_one_initcall+0x50/0x2b0)
<0>[3.017405] [] (do_one_initcall) from [] 
(kernel_init_freeable+0x188/0x200)
<0>[3.026361] [] (kernel_init_freeable) from [] 
(kernel_init+0x8/0x114)
<0>[3.034794] [] (kernel_init) from [] 
(ret_from_fork+0x14/0x2c)

Guillaume


On 12/05/2020 02:24, kernelci.org bot wrote:
> * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
> * This automated bisection report was sent to you on the basis  *
> * that you may be involved with the breaking commit it has  *
> * found.  No manual investigation has been done to verify it,   *
> * and the root cause of the problem may be somewhere else.  *
> *   *
> * If you do send a fix, please include this trailer:*
> *   Reported-by: "kernelci.org bot"   *
> *   *
> * Hope this helps!  *
> * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
> 
> next/master bisection: baseline.login on jetson-tk1
> 
> Summary:
>   Start:  4b20e7462caa6 Add linux-next specific files for 20200511
>   Plain log:  
> https://storage.kernelci.org/next/master/next-20200511/arm/tegra_defconfig/gcc-8/lab-collabora/baseline-tegra124-jetson-tk1.txt
>   HTML log:   
> https://storage.kernelci.org/next/master/next-20200511/arm/tegra_defconfig/gcc-8/lab-collabora/baseline-tegra124-jetson-tk1.html
>   Result: 3eeeb45c6d044 iommu: Remove add_device()/remove_device() 
> code-paths
> 
> Checks:
>   revert: PASS
>   verify: PASS
> 
> Parameters:
>   Tree:   next
>   URL:
> https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
>   Branch: master
>   Target: jetson-tk1
>   CPU arch:   arm
>   Lab:lab-collabora
>   Compiler:   gcc-8
>   Config: tegra_defconfig
>   Test case:  baseline.login
> 
> Breaking commit found:
> 
> ---
> commit 3eeeb45c6d0444b368cdeba9bdafa8bbcf5370d1
> Author: Joerg Roedel 
> Date:   Wed Apr 29 15:37:10 2020 +0200
> 
> iommu: Remove add_device()/remove_device() code-paths
> 
> All drivers are converted to use the probe/release_device()
> call-backs, so the add_device/remove_device() pointers are unused and
> the code using them can be removed.
> 
> Signed-off-by: Joerg Roedel 
> Tested-by: Marek Szyprowski 
> Acked-by: Marek Szyprowski 
> Link: https://lore.kernel.org/r/20200429133712.31431-33-j...@8bytes.org
> Signed-off-by: Joerg Roedel 
> 
> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> index 397fd4fd0c320..7f99e5ae432c6 100644
> --- a/drivers/iommu/iommu.c
> +++ b/drivers/iommu/iommu.c
> @@ -220,12 +220,20 @@ static int __iommu_probe_device(struct device *dev, 
> struct list_head *group_list
>   return ret;
>  }
>  
> -static int __iommu_probe_device_helper(struct device *dev)
> +int iommu_probe_device(struct device *dev)
>  {
>   const struct iommu_ops *ops = dev->bus->iommu_ops;
>   struct iommu_group *group;
>   int ret;
>  
> + if (!dev_iommu_get(dev))
> + return -ENOMEM;
> +
> + if (!try_module_get(ops->owner)) {
> + ret = -EINVAL;
> + goto err_out;
> + }
> +
>   ret = __iommu_probe_device(dev, NULL);
>   if (ret)
>   goto err_out;
> @@ -259,75 +267,23 @@ static int __iommu_probe_device_helper(struct device 
> *dev)
>  
>  err_release:
>   iommu_release_device(dev);
> +
>  err_out:
>   return ret;
>  
>  }
>  
> -int iommu_probe_device(struct device *dev)
> +void iommu_release_device(struct device *dev)
>  {
>   const struct iommu_ops *ops = dev->bus->iommu_ops;
> - struct iommu_group *group;
> - int ret;
> -
> - WARN_ON(dev->iommu_group);
> -
> - if (!ops)
> - return -EINVAL;
> -
> - if (!dev_iommu_get(dev))
> - return -ENOMEM;

Re: [15/15] dma-mapping: bypass indirect calls for dma-direct

2018-12-18 Thread Guillaume Tucker
On 07/12/2018 19:07, Christoph Hellwig wrote:
> Avoid expensive indirect calls in the fast path DMA mapping
> operations by directly calling the dma_direct_* ops if we are using
> the directly mapped DMA operations.
> 
> Signed-off-by: Christoph Hellwig 
> Signed-off-by: Christoph Hellwig 
> Reported-by: Marek Szyprowski 
> Tested-by: Marek Szyprowski 

I've run a semi-automated bisection on kernelci.org and found that this
patch appeared to cause some regressions in linux-next on the
rk3399-gru-kevin arm64 platform.  The bisection was run between
next-20181128 and its merge base in mainline master (6531e115b7ab) with
a plain defconfig.


The problems seem to start with this message:

[3.242163] mmc1: Unable to allocate ADMA buffers - falling back to standard 
DMA

then we can see this kind of warnings:

[3.424261] mmc1: asked for transfer of 512 bytes exceeds bounce buffer 0 
bytes
[3.432488] WARNING: CPU: 3 PID: 1596 at ../drivers/mmc/host/sdhci.c:1050 
sdhci_send_command+0x8f0/0xfe8

see also:

[   16.046084] rk_iommu ff8f3f00.iommu: DMA map error for DT


The full kernel log is available here:

  https://lava.collabora.co.uk/scheduler/job/1395093


Reverting this patch makes the errors go away, but I haven't done any
further investigation so the actual problem may well lie somewhere else.

Hope this helps!

Best wishes,
Guillaume

> ---
>  arch/alpha/include/asm/dma-mapping.h |   2 +-
>  arch/arc/mm/cache.c  |   2 +-
>  arch/arm/include/asm/dma-mapping.h   |   2 +-
>  arch/arm/mm/dma-mapping-nommu.c  |  14 +---
>  arch/arm64/mm/dma-mapping.c  |   3 -
>  arch/ia64/hp/common/hwsw_iommu.c |   2 +-
>  arch/ia64/hp/common/sba_iommu.c  |   4 +-
>  arch/ia64/kernel/dma-mapping.c   |   1 -
>  arch/mips/include/asm/dma-mapping.h  |   2 +-
>  arch/parisc/kernel/setup.c   |   4 -
>  arch/sparc/include/asm/dma-mapping.h |   4 +-
>  arch/x86/kernel/pci-dma.c|   2 +-
>  drivers/gpu/drm/vmwgfx/vmwgfx_drv.c  |   2 +-
>  drivers/iommu/amd_iommu.c|  13 +---
>  include/asm-generic/dma-mapping.h|   2 +-
>  include/linux/dma-direct.h   |  17 
>  include/linux/dma-mapping.h  | 111 +++
>  include/linux/dma-noncoherent.h  |   5 +-
>  kernel/dma/direct.c  |  37 ++---
>  kernel/dma/mapping.c |  40 ++
>  20 files changed, 150 insertions(+), 119 deletions(-)
> 
> diff --git a/arch/alpha/include/asm/dma-mapping.h 
> b/arch/alpha/include/asm/dma-mapping.h
> index 8beeafd4f68e..0ee6a5c99b16 100644
> --- a/arch/alpha/include/asm/dma-mapping.h
> +++ b/arch/alpha/include/asm/dma-mapping.h
> @@ -7,7 +7,7 @@ extern const struct dma_map_ops alpha_pci_ops;
>  static inline const struct dma_map_ops *get_arch_dma_ops(struct bus_type 
> *bus)
>  {
>  #ifdef CONFIG_ALPHA_JENSEN
> - return _direct_ops;
> + return NULL;
>  #else
>   return _pci_ops;
>  #endif
> diff --git a/arch/arc/mm/cache.c b/arch/arc/mm/cache.c
> index f2701c13a66b..e188bb3ede53 100644
> --- a/arch/arc/mm/cache.c
> +++ b/arch/arc/mm/cache.c
> @@ -1280,7 +1280,7 @@ void __init arc_cache_init_master(void)
>   /*
>* In case of IOC (say IOC+SLC case), pointers above could still be set
>* but end up not being relevant as the first function in chain is not
> -  * called at all for @dma_direct_ops
> +  * called at all for devices using coherent DMA.
>* arch_sync_dma_for_cpu() -> dma_cache_*() -> __dma_cache_*()
>*/
>  }
> diff --git a/arch/arm/include/asm/dma-mapping.h 
> b/arch/arm/include/asm/dma-mapping.h
> index 965b7c846ecb..31d3b96f0f4b 100644
> --- a/arch/arm/include/asm/dma-mapping.h
> +++ b/arch/arm/include/asm/dma-mapping.h
> @@ -18,7 +18,7 @@ extern const struct dma_map_ops arm_coherent_dma_ops;
>  
>  static inline const struct dma_map_ops *get_arch_dma_ops(struct bus_type 
> *bus)
>  {
> - return IS_ENABLED(CONFIG_MMU) ? _dma_ops : _direct_ops;
> + return IS_ENABLED(CONFIG_MMU) ? _dma_ops : NULL;
>  }
>  
>  #ifdef __arch_page_to_dma
> diff --git a/arch/arm/mm/dma-mapping-nommu.c b/arch/arm/mm/dma-mapping-nommu.c
> index 712416ecd8e6..f304b10e23a4 100644
> --- a/arch/arm/mm/dma-mapping-nommu.c
> +++ b/arch/arm/mm/dma-mapping-nommu.c
> @@ -22,7 +22,7 @@
>  #include "dma.h"
>  
>  /*
> - *  dma_direct_ops is used if
> + *  The generic direct mapping code is used if
>   *   - MMU/MPU is off
>   *   - cpu is v7m w/o cache support
>   *   - device is coherent
> @@ -209,16 +209,9 @@ const struct dma_map_ops arm_nommu_dma_ops = {
>  };
>  EXPORT_SYMBOL(arm_nommu_dma_ops);
>  
> -static const struct dma_map_ops *arm_nommu_get_dma_map_ops(bool coherent)
> -{
> - return coherent ? _direct_ops : _nommu_dma_ops;
> -}
> -
>  void arch_setup_dma_ops(struct device *dev, u64 dma_base, u64 size,
>   const struct iommu_ops *iommu, bool coherent)
>  {
> - const struct dma_map_ops *dma_ops;
> -
>   if