Re: [PATCH v5 0/5] iommu/arm-smmu: adreno-smmu page fault handling

2021-07-07 Thread Rob Clark
On Tue, Jul 6, 2021 at 10:12 PM John Stultz wrote:
>
> On Sun, Jul 4, 2021 at 11:16 AM Rob Clark wrote:
> >
> > I suspect you are getting a dpu fault, and need:
> >
> > https://lore.kernel.org/linux-arm-msm/CAF6AEGvTjTUQXqom-xhdh456tdLscbVFPQ+iud1H1gHc8A2=h...@mail.gmail.com/
> >
> > I suppose Bjorn was expecting me to send that patch
>
> If it's helpful, I applied that and it got the db845c booting mainline
> again for me (along with some reverts for a separate ext4 shrinker
> crash).
> Tested-by: John Stultz 
>

Thanks, I'll send a patch shortly

BR,
-R
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v5 0/5] iommu/arm-smmu: adreno-smmu page fault handling

2021-07-06 Thread John Stultz
On Sun, Jul 4, 2021 at 11:16 AM Rob Clark wrote:
>
> I suspect you are getting a dpu fault, and need:
>
> https://lore.kernel.org/linux-arm-msm/CAF6AEGvTjTUQXqom-xhdh456tdLscbVFPQ+iud1H1gHc8A2=h...@mail.gmail.com/
>
> I suppose Bjorn was expecting me to send that patch

If it's helpful, I applied that and it got the db845c booting mainline
again for me (along with some reverts for a separate ext4 shrinker
crash).
Tested-by: John Stultz 

thanks
-john


Re: [PATCH v5 0/5] iommu/arm-smmu: adreno-smmu page fault handling

2021-07-06 Thread Bjorn Andersson
On Sun 04 Jul 13:20 CDT 2021, Rob Clark wrote:

> I suspect you are getting a dpu fault, and need:
> 
> https://lore.kernel.org/linux-arm-msm/CAF6AEGvTjTUQXqom-xhdh456tdLscbVFPQ+iud1H1gHc8A2=h...@mail.gmail.com/
> 
> I suppose Bjorn was expecting me to send that patch
> 

No, I left that discussion with the same understanding as you... But I
ended up sidetracked by some other craziness.

Did you post this somewhere or would you still like me to test it and
spin a patch?

Regards,
Bjorn

> BR,
> -R
> 
> On Sun, Jul 4, 2021 at 5:53 AM Dmitry Baryshkov
>  wrote:
> >
> > Hi,
> >
> > > I had the splash screen disabled on my RB3. However, once I enabled it,
> > > I got the attached crash during boot on msm/msm-next. It
> > > looks like it is related to this particular set of changes.
> >
> > On 11/06/2021 00:44, Rob Clark wrote:
> > > From: Rob Clark 
> > >
> > > This picks up an earlier series[1] from Jordan, and adds additional
> > > support needed to generate GPU devcore dumps on iova faults.  Original
> > > description:
> > >
> > > > This is a stack to add an Adreno GPU specific handler for pagefaults. The first
> > > > patch starts by wiring up report_iommu_fault for arm-smmu. The next patch adds
> > > > an adreno-smmu-priv function hook to capture a handful of important debugging
> > > > registers such as TTBR0, CONTEXTIDR, FSYNR0 and others. This is used by the
> > > > third patch to print more detailed information on page fault such as the TTBR0
> > > > for the pagetable that caused the fault and the source of the fault as
> > > > determined by a combination of the FSYNR1 register and an internal GPU
> > > > register.
> > >
> > > This code provides a solid base that we can expand on later for even more
> > > extensive GPU side page fault debugging capabilities.
> > >
> > > v5: [Rob] Use RBBM_STATUS3.SMMU_STALLED_ON_FAULT to detect case where
> > >  GPU snapshotting needs to avoid crashdumper, and check the
> > >  RBBM_STATUS3.SMMU_STALLED_ON_FAULT in GPU hang irq paths
> > > v4: [Rob] Add support to stall SMMU on fault, and let the GPU driver
> > >  resume translation after it has had a chance to snapshot the GPUs
> > >  state
> > > v3: Always clear FSR even if the target driver is going to handle resume
> > > v2: Fix comment wording and function pointer check per Rob Clark
> > >
> > > [1] 
> > > https://lore.kernel.org/dri-devel/20210225175135.91922-1-jcro...@codeaurora.org/
> > >
> > > Jordan Crouse (3):
> > >iommu/arm-smmu: Add support for driver IOMMU fault handlers
> > >iommu/arm-smmu-qcom: Add an adreno-smmu-priv callback to get pagefault
> > >  info
> > >drm/msm: Improve the a6xx page fault handler
> > >
> > > Rob Clark (2):
> > >iommu/arm-smmu-qcom: Add stall support
> > >drm/msm: devcoredump iommu fault support
> > >
> > >   drivers/gpu/drm/msm/adreno/a5xx_gpu.c   |  23 +++-
> > >   drivers/gpu/drm/msm/adreno/a6xx_gpu.c   | 110 +++-
> > >   drivers/gpu/drm/msm/adreno/a6xx_gpu_state.c |  42 ++--
> > >   drivers/gpu/drm/msm/adreno/adreno_gpu.c |  15 +++
> > >   drivers/gpu/drm/msm/msm_gem.h   |   1 +
> > >   drivers/gpu/drm/msm/msm_gem_submit.c|   1 +
> > >   drivers/gpu/drm/msm/msm_gpu.c   |  48 +
> > >   drivers/gpu/drm/msm/msm_gpu.h   |  17 +++
> > >   drivers/gpu/drm/msm/msm_gpummu.c|   5 +
> > >   drivers/gpu/drm/msm/msm_iommu.c |  22 +++-
> > >   drivers/gpu/drm/msm/msm_mmu.h   |   5 +-
> > >   drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c  |  50 +
> > >   drivers/iommu/arm/arm-smmu/arm-smmu.c   |   9 +-
> > >   drivers/iommu/arm/arm-smmu/arm-smmu.h   |   2 +
> > >   include/linux/adreno-smmu-priv.h|  38 ++-
> > >   15 files changed, 367 insertions(+), 21 deletions(-)
> > >
> >
> >
> > --
> > With best wishes
> > Dmitry


Re: [PATCH v5 0/5] iommu/arm-smmu: adreno-smmu page fault handling

2021-07-04 Thread Rob Clark
I suspect you are getting a dpu fault, and need:

https://lore.kernel.org/linux-arm-msm/CAF6AEGvTjTUQXqom-xhdh456tdLscbVFPQ+iud1H1gHc8A2=h...@mail.gmail.com/

I suppose Bjorn was expecting me to send that patch

BR,
-R

On Sun, Jul 4, 2021 at 5:53 AM Dmitry Baryshkov
 wrote:
>
> Hi,
>
> I had the splash screen disabled on my RB3. However, once I enabled it,
> I got the attached crash during boot on msm/msm-next. It
> looks like it is related to this particular set of changes.
>
> On 11/06/2021 00:44, Rob Clark wrote:
> > From: Rob Clark 
> >
> > This picks up an earlier series[1] from Jordan, and adds additional
> > support needed to generate GPU devcore dumps on iova faults.  Original
> > description:
> >
> > This is a stack to add an Adreno GPU specific handler for pagefaults. The first
> > patch starts by wiring up report_iommu_fault for arm-smmu. The next patch adds
> > an adreno-smmu-priv function hook to capture a handful of important debugging
> > registers such as TTBR0, CONTEXTIDR, FSYNR0 and others. This is used by the
> > third patch to print more detailed information on page fault such as the TTBR0
> > for the pagetable that caused the fault and the source of the fault as
> > determined by a combination of the FSYNR1 register and an internal GPU
> > register.
> >
> > This code provides a solid base that we can expand on later for even more
> > extensive GPU side page fault debugging capabilities.
> >
> > v5: [Rob] Use RBBM_STATUS3.SMMU_STALLED_ON_FAULT to detect case where
> >  GPU snapshotting needs to avoid crashdumper, and check the
> >  RBBM_STATUS3.SMMU_STALLED_ON_FAULT in GPU hang irq paths
> > v4: [Rob] Add support to stall SMMU on fault, and let the GPU driver
> >  resume translation after it has had a chance to snapshot the GPU's
> >  state
> > v3: Always clear FSR even if the target driver is going to handle resume
> > v2: Fix comment wording and function pointer check per Rob Clark
> >
> > [1] 
> > https://lore.kernel.org/dri-devel/20210225175135.91922-1-jcro...@codeaurora.org/
> >
> > Jordan Crouse (3):
> >iommu/arm-smmu: Add support for driver IOMMU fault handlers
> >iommu/arm-smmu-qcom: Add an adreno-smmu-priv callback to get pagefault
> >  info
> >drm/msm: Improve the a6xx page fault handler
> >
> > Rob Clark (2):
> >iommu/arm-smmu-qcom: Add stall support
> >drm/msm: devcoredump iommu fault support
> >
> >   drivers/gpu/drm/msm/adreno/a5xx_gpu.c   |  23 +++-
> >   drivers/gpu/drm/msm/adreno/a6xx_gpu.c   | 110 +++-
> >   drivers/gpu/drm/msm/adreno/a6xx_gpu_state.c |  42 ++--
> >   drivers/gpu/drm/msm/adreno/adreno_gpu.c |  15 +++
> >   drivers/gpu/drm/msm/msm_gem.h   |   1 +
> >   drivers/gpu/drm/msm/msm_gem_submit.c|   1 +
> >   drivers/gpu/drm/msm/msm_gpu.c   |  48 +
> >   drivers/gpu/drm/msm/msm_gpu.h   |  17 +++
> >   drivers/gpu/drm/msm/msm_gpummu.c|   5 +
> >   drivers/gpu/drm/msm/msm_iommu.c |  22 +++-
> >   drivers/gpu/drm/msm/msm_mmu.h   |   5 +-
> >   drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c  |  50 +
> >   drivers/iommu/arm/arm-smmu/arm-smmu.c   |   9 +-
> >   drivers/iommu/arm/arm-smmu/arm-smmu.h   |   2 +
> >   include/linux/adreno-smmu-priv.h|  38 ++-
> >   15 files changed, 367 insertions(+), 21 deletions(-)
> >
>
>
> --
> With best wishes
> Dmitry


Re: [PATCH v5 0/5] iommu/arm-smmu: adreno-smmu page fault handling

2021-07-04 Thread Dmitry Baryshkov

Hi,

I had the splash screen disabled on my RB3. However, once I enabled it,
I got the attached crash during boot on msm/msm-next. It
looks like it is related to this particular set of changes.


On 11/06/2021 00:44, Rob Clark wrote:

From: Rob Clark 

This picks up an earlier series[1] from Jordan, and adds additional
support needed to generate GPU devcore dumps on iova faults.  Original
description:

This is a stack to add an Adreno GPU specific handler for pagefaults. The first
patch starts by wiring up report_iommu_fault for arm-smmu. The next patch adds
an adreno-smmu-priv function hook to capture a handful of important debugging
registers such as TTBR0, CONTEXTIDR, FSYNR0 and others. This is used by the
third patch to print more detailed information on page fault such as the TTBR0
for the pagetable that caused the fault and the source of the fault as
determined by a combination of the FSYNR1 register and an internal GPU
register.

This code provides a solid base that we can expand on later for even more
extensive GPU side page fault debugging capabilities.

v5: [Rob] Use RBBM_STATUS3.SMMU_STALLED_ON_FAULT to detect case where
 GPU snapshotting needs to avoid crashdumper, and check the
 RBBM_STATUS3.SMMU_STALLED_ON_FAULT in GPU hang irq paths
v4: [Rob] Add support to stall SMMU on fault, and let the GPU driver
 resume translation after it has had a chance to snapshot the GPU's
 state
v3: Always clear FSR even if the target driver is going to handle resume
v2: Fix comment wording and function pointer check per Rob Clark

[1] 
https://lore.kernel.org/dri-devel/20210225175135.91922-1-jcro...@codeaurora.org/

Jordan Crouse (3):
   iommu/arm-smmu: Add support for driver IOMMU fault handlers
   iommu/arm-smmu-qcom: Add an adreno-smmu-priv callback to get pagefault
 info
   drm/msm: Improve the a6xx page fault handler

Rob Clark (2):
   iommu/arm-smmu-qcom: Add stall support
   drm/msm: devcoredump iommu fault support

  drivers/gpu/drm/msm/adreno/a5xx_gpu.c   |  23 +++-
  drivers/gpu/drm/msm/adreno/a6xx_gpu.c   | 110 +++-
  drivers/gpu/drm/msm/adreno/a6xx_gpu_state.c |  42 ++--
  drivers/gpu/drm/msm/adreno/adreno_gpu.c |  15 +++
  drivers/gpu/drm/msm/msm_gem.h   |   1 +
  drivers/gpu/drm/msm/msm_gem_submit.c|   1 +
  drivers/gpu/drm/msm/msm_gpu.c   |  48 +
  drivers/gpu/drm/msm/msm_gpu.h   |  17 +++
  drivers/gpu/drm/msm/msm_gpummu.c|   5 +
  drivers/gpu/drm/msm/msm_iommu.c |  22 +++-
  drivers/gpu/drm/msm/msm_mmu.h   |   5 +-
  drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c  |  50 +
  drivers/iommu/arm/arm-smmu/arm-smmu.c   |   9 +-
  drivers/iommu/arm/arm-smmu/arm-smmu.h   |   2 +
  include/linux/adreno-smmu-priv.h|  38 ++-
  15 files changed, 367 insertions(+), 21 deletions(-)




--
With best wishes
Dmitry
Handling Cmd: set_active:a
SetActiveSlot: _a already active slot
Handling Cmd: download:061b8800
Download Finished
Handling Cmd: boot
A/B retry count NOT decremented
Booting Into Mission Mode
No dtbo partition is found, Skip dtbo
Exit key detection timer
GetVmData: making ScmCall to get HypInfo
GetVmData: No Vm data present! Status = (0x3)
No Ffbm cookie found, ignore: Not Found
Memory Base Address: 0x8000
Decompressing kernel image start: 32313 ms
Decompressing kernel image done: 35477 ms
BootLinux: failed to get dtbo image
DTB offset is incorrect, kernel image does not have appended DTB
Cmdline: ignore_loglevel console=ttyMSM0,115200n8 clk_ignore_unused pd_ignore_unused earlycon pcie_pme=nomsi root=PARTLABEL=userdata rootwait androidboot.bootdevice=1d84000.ufshc androidboot.serialno=8f186bb6 androidboot.baseband=msm msm_drm.dsi_display0=
RAM Partitions
Add Base: 0x8000 Available Length: 0xFDFA 
WARNING: Unsupported EFI_RAMPARTITION_PROTOCOL
ERROR: Could not get splash memory region node
kaslr-Seed is added to chosen node

Shutting Down UEFI Boot Services: 36987 ms
BDS: LogFs sync skipped, Unsupported
App Log Flush : 0 ms
Exit BS[37141] UEFI End
[0.00] Booting Linux on physical CPU 0x00 [0x517f803c]
[0.00] Linux version 5.13.0-rc3-00115-g9b6193ea776c-dirty (lumag@eriador) (aarch64-linux-gnu-gcc (Debian 11.1.0-1) 11.1.0, GNU ld (GNU Binutils for Debian) 2.35.2) #142 SMP PREEMPT Sun Jul 4 15:25:15 MSK 2021
[0.00] Machine model: Thundercomm Dragonboard 845c
[0.00] printk: debug: ignoring loglevel setting.
[0.00] earlycon: qcom_geni0 at MMIO 0x00a84000 (options '115200n8')
[0.00] printk: bootconsole [qcom_geni0] enabled
[0.00] efi: UEFI not found.
[0.00] NUMA: No NUMA configuration found
[0.00] NUMA: Faking a node at [mem 0x8000-0x00017df9]
[0.00] NUMA: NODE_DATA [mem 0x17d78c200-0x17d78dfff]
[0.00] Zone ranges:
[0.00]   DMA  [mem 

[PATCH v5 0/5] iommu/arm-smmu: adreno-smmu page fault handling

2021-06-10 Thread Rob Clark
From: Rob Clark 

This picks up an earlier series[1] from Jordan, and adds additional
support needed to generate GPU devcore dumps on iova faults.  Original
description:

This is a stack to add an Adreno GPU specific handler for pagefaults. The first
patch starts by wiring up report_iommu_fault for arm-smmu. The next patch adds
an adreno-smmu-priv function hook to capture a handful of important debugging
registers such as TTBR0, CONTEXTIDR, FSYNR0 and others. This is used by the
third patch to print more detailed information on page fault such as the TTBR0
for the pagetable that caused the fault and the source of the fault as
determined by a combination of the FSYNR1 register and an internal GPU
register.

This code provides a solid base that we can expand on later for even more
extensive GPU side page fault debugging capabilities.
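
For readers who want a feel for the capture step described above, here is a
minimal user-space sketch, not the code from the series. It assumes a mocked
context bank held in a plain array; the names mock_cb_reg, mock_fault_info,
mock_get_fault_info and mock_format_fault are hypothetical and exist only for
illustration. The idea is simply to copy the debugging registers in one pass
and then format a more detailed fault report from the copy.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical indices into a mocked context bank; these are NOT real
 * SMMU register offsets. */
enum mock_cb_reg {
	MOCK_FSR,
	MOCK_FAR,
	MOCK_TTBR0,
	MOCK_CONTEXTIDR,
	MOCK_FSYNR0,
	MOCK_FSYNR1,
	MOCK_CB_NREGS
};

/* Hypothetical snapshot of the registers named in the description. */
struct mock_fault_info {
	uint64_t far;
	uint64_t ttbr0;
	uint32_t contextidr;
	uint32_t fsr;
	uint32_t fsynr0;
	uint32_t fsynr1;
};

/* Copy the debug registers out of the mocked context bank in one pass,
 * so later reporting works from a stable snapshot. */
static void mock_get_fault_info(const uint64_t regs[MOCK_CB_NREGS],
				struct mock_fault_info *info)
{
	assert(regs && info);
	info->far = regs[MOCK_FAR];
	info->ttbr0 = regs[MOCK_TTBR0];
	info->contextidr = (uint32_t)regs[MOCK_CONTEXTIDR];
	info->fsr = (uint32_t)regs[MOCK_FSR];
	info->fsynr0 = (uint32_t)regs[MOCK_FSYNR0];
	info->fsynr1 = (uint32_t)regs[MOCK_FSYNR1];
}

/* Format a one-line fault report from the snapshot. Returns the value
 * of snprintf(), i.e. the length the full message needs. */
static int mock_format_fault(const struct mock_fault_info *info,
			     char *buf, size_t len)
{
	return snprintf(buf, len,
			"fault iova=0x%016" PRIx64 " ttbr0=0x%016" PRIx64
			" fsynr0=0x%08" PRIx32 " fsynr1=0x%08" PRIx32,
			info->far, info->ttbr0, info->fsynr0, info->fsynr1);
}
```

A caller would fill the mocked register array, call mock_get_fault_info()
once, and pass the resulting snapshot to mock_format_fault() with a buffer of
roughly 128 bytes or more.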

v5: [Rob] Use RBBM_STATUS3.SMMU_STALLED_ON_FAULT to detect case where
GPU snapshotting needs to avoid crashdumper, and check the
RBBM_STATUS3.SMMU_STALLED_ON_FAULT in GPU hang irq paths
v4: [Rob] Add support to stall SMMU on fault, and let the GPU driver
resume translation after it has had a chance to snapshot the GPU's
state
v3: Always clear FSR even if the target driver is going to handle resume
v2: Fix comment wording and function pointer check per Rob Clark
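
The ordering implied by the v3 and v4 notes can be sketched as a tiny
user-space state machine. This is an illustration under stated assumptions,
not the driver code: struct mock_smmu and the mock_* functions are
hypothetical, and the "hardware" is just a struct. It shows the fault status
being cleared in the IRQ path even when stalling is enabled, with the GPU
driver resuming translation only after it has taken its snapshot.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for one SMMU context bank plus driver state. */
struct mock_smmu {
	bool stall_enabled;	/* stall-on-fault requested by the GPU driver */
	bool stalled;		/* translation is currently stalled */
	uint32_t fsr;		/* mocked fault status, nonzero means pending */
	uint32_t last_fsr;	/* status saved for the fault report */
	int snapshots;		/* number of GPU state snapshots taken */
};

/* GPU driver opts in to (or out of) stalling on fault. */
static void mock_set_stall(struct mock_smmu *s, bool enable)
{
	assert(s);
	s->stall_enabled = enable;
}

/* Fault IRQ path: save the status, note the stall, and always clear
 * the fault status even though the driver will handle the resume. */
static void mock_fault_irq(struct mock_smmu *s)
{
	assert(s);
	s->last_fsr = s->fsr;
	if (s->stall_enabled)
		s->stalled = true;
	s->fsr = 0;
}

/* GPU driver side: snapshot state while the transaction is stalled,
 * then resume translation. Returns false if nothing was stalled. */
static bool mock_snapshot_and_resume(struct mock_smmu *s)
{
	assert(s);
	if (!s->stalled)
		return false;
	s->snapshots++;		/* stand-in for capturing GPU state */
	s->stalled = false;	/* stand-in for resuming translation */
	return true;
}
```

In this sketch a second call to mock_snapshot_and_resume() without a new
fault is a no-op, which mirrors the intent that resume happens exactly once
per stalled fault.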

[1] 
https://lore.kernel.org/dri-devel/20210225175135.91922-1-jcro...@codeaurora.org/

Jordan Crouse (3):
  iommu/arm-smmu: Add support for driver IOMMU fault handlers
  iommu/arm-smmu-qcom: Add an adreno-smmu-priv callback to get pagefault
info
  drm/msm: Improve the a6xx page fault handler

Rob Clark (2):
  iommu/arm-smmu-qcom: Add stall support
  drm/msm: devcoredump iommu fault support

 drivers/gpu/drm/msm/adreno/a5xx_gpu.c   |  23 +++-
 drivers/gpu/drm/msm/adreno/a6xx_gpu.c   | 110 +++-
 drivers/gpu/drm/msm/adreno/a6xx_gpu_state.c |  42 ++--
 drivers/gpu/drm/msm/adreno/adreno_gpu.c |  15 +++
 drivers/gpu/drm/msm/msm_gem.h   |   1 +
 drivers/gpu/drm/msm/msm_gem_submit.c|   1 +
 drivers/gpu/drm/msm/msm_gpu.c   |  48 +
 drivers/gpu/drm/msm/msm_gpu.h   |  17 +++
 drivers/gpu/drm/msm/msm_gpummu.c|   5 +
 drivers/gpu/drm/msm/msm_iommu.c |  22 +++-
 drivers/gpu/drm/msm/msm_mmu.h   |   5 +-
 drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c  |  50 +
 drivers/iommu/arm/arm-smmu/arm-smmu.c   |   9 +-
 drivers/iommu/arm/arm-smmu/arm-smmu.h   |   2 +
 include/linux/adreno-smmu-priv.h|  38 ++-
 15 files changed, 367 insertions(+), 21 deletions(-)

-- 
2.31.1
