Re: [PATCH v2] drm/panfrost:report the full raw fault information instead

2021-07-05 Thread Steven Price
On 02/07/2021 02:40, Chunyou Tang wrote:
> Hi Steve,
>> You didn't answer my previous question:
>>
>>> Is this device working with the kbase/DDK proprietary driver?
> 
> I don't know whether I used kbase/DDK; I only know that I used the
> panfrost driver in Linux 5.11.

kbase is the Linux kernel-side driver that pairs with Arm's proprietary
user-space driver. That proprietary driver is often called "the [Mali]
DDK" by Arm (or "the blob" by others). I guess "libmali.so" is the
closest it has to a real name.

>> What you are describing sounds like a hardware integration issue, so
>> it would be good to check that the hardware is working with the
>> proprietary driver to rule that out. And perhaps there is something
>> in the kbase for this device that is setting a chicken bit to 'fix'
>> the coherency?
> 
> I don't have the proprietary driver; I only used the driver in Linux 5.11.

Interesting - I would have expected the initial hardware bring up to
have happened with the proprietary driver from Arm. And my first step
would be to check whether any workarounds for integration issues were
applied in the kernel.

I think we'd need some input from the people who did the hardware
integration, and hopefully they would have access to the proprietary
driver from Arm.

Steve

> Thanks very much!
> 
> Chunyou.
> 
> 
> On Thu, 1 Jul 2021 11:15:14 +0100,
> Steven Price wrote:
> 
>> On 29/06/2021 04:04, Chunyou Tang wrote:
> 
> 
>>> Hi Steve,
>>> Thanks for your reply.
>>> I set the PTE in arm_lpae_prot_to_pte():
>>> ***
>>>     /*
>>>      * Also Mali has its own notions of shareability wherein its Inner
>>>      * domain covers the cores within the GPU, and its Outer domain is
>>>      * "outside the GPU" (i.e. either the Inner or System domain in CPU
>>>      * terms, depending on coherency).
>>>      */
>>>     if (prot & IOMMU_CACHE && data->iop.fmt != ARM_MALI_LPAE)
>>>         pte |= ARM_LPAE_PTE_SH_IS;
>>>     else
>>>         pte |= ARM_LPAE_PTE_SH_OS;
>>> ***
>>> Instead, I set pte |= ARM_LPAE_PTE_SH_NS.
>>>
>>> If I set the PTE to ARM_LPAE_PTE_SH_OS or ARM_LPAE_PTE_SH_IS, a GPU
>>> Fault occurs whether I use a single-core or a multi-core GPU.
>>> If I set the PTE to ARM_LPAE_PTE_SH_NS, no GPU Fault occurs whether
>>> I use a single-core or a multi-core GPU.
>>
>> Hi,
>>
>> So this is a difference between Panfrost and kbase. Panfrost (well
>> technically the IOMMU framework) enables the inner-shareable bit for
>> all memory, whereas kbase only enables it for some memory types (the
>> BASE_MEM_COHERENT_LOCAL flag in the UABI controls it). However this
>> should only be a performance/power difference (and AFAIK probably an
>> irrelevant one) and it's definitely required that "inner shareable"
>> (i.e. within the GPU) works for communication between the different
>> units of the GPU.
>>
>> You didn't answer my previous question:
>>
>>> Is this device working with the kbase/DDK proprietary driver?
>>
>> What you are describing sounds like a hardware integration issue, so
>> it would be good to check that the hardware is working with the
>> proprietary driver to rule that out. And perhaps there is something
>> in the kbase for this device that is setting a chicken bit to 'fix'
>> the coherency?
>>
>> Steve
> 
> 



Re: [PATCH v2] drm/panfrost:report the full raw fault information instead

2021-07-01 Thread Chunyou Tang
Hi Steve,
> You didn't answer my previous question:
> 
> > Is this device working with the kbase/DDK proprietary driver?

I don't know whether I used kbase/DDK; I only know that I used the
panfrost driver in Linux 5.11.

> What you are describing sounds like a hardware integration issue, so
> it would be good to check that the hardware is working with the
> proprietary driver to rule that out. And perhaps there is something
> in the kbase for this device that is setting a chicken bit to 'fix'
> the coherency?

I don't have the proprietary driver; I only used the driver in Linux 5.11.

Thanks very much!

Chunyou.


On Thu, 1 Jul 2021 11:15:14 +0100,
Steven Price wrote:

> On 29/06/2021 04:04, Chunyou Tang wrote:


> > Hi Steve,
> > Thanks for your reply.
> > I set the PTE in arm_lpae_prot_to_pte():
> > ***
> >     /*
> >      * Also Mali has its own notions of shareability wherein its Inner
> >      * domain covers the cores within the GPU, and its Outer domain is
> >      * "outside the GPU" (i.e. either the Inner or System domain in CPU
> >      * terms, depending on coherency).
> >      */
> >     if (prot & IOMMU_CACHE && data->iop.fmt != ARM_MALI_LPAE)
> >         pte |= ARM_LPAE_PTE_SH_IS;
> >     else
> >         pte |= ARM_LPAE_PTE_SH_OS;
> > ***
> > Instead, I set pte |= ARM_LPAE_PTE_SH_NS.
> >
> > If I set the PTE to ARM_LPAE_PTE_SH_OS or ARM_LPAE_PTE_SH_IS, a GPU
> > Fault occurs whether I use a single-core or a multi-core GPU.
> > If I set the PTE to ARM_LPAE_PTE_SH_NS, no GPU Fault occurs whether
> > I use a single-core or a multi-core GPU.
> 
> Hi,
> 
> So this is a difference between Panfrost and kbase. Panfrost (well
> technically the IOMMU framework) enables the inner-shareable bit for
> all memory, whereas kbase only enables it for some memory types (the
> BASE_MEM_COHERENT_LOCAL flag in the UABI controls it). However this
> should only be a performance/power difference (and AFAIK probably an
> irrelevant one) and it's definitely required that "inner shareable"
> (i.e. within the GPU) works for communication between the different
> units of the GPU.
> 
> You didn't answer my previous question:
> 
> > Is this device working with the kbase/DDK proprietary driver?
> 
> What you are describing sounds like a hardware integration issue, so
> it would be good to check that the hardware is working with the
> proprietary driver to rule that out. And perhaps there is something
> in the kbase for this device that is setting a chicken bit to 'fix'
> the coherency?
> 
> Steve




Re: [PATCH v2] drm/panfrost:report the full raw fault information instead

2021-07-01 Thread Steven Price
On 29/06/2021 04:04, Chunyou Tang wrote:
> Hi Steve,
>   Thanks for your reply.
>   I set the PTE in arm_lpae_prot_to_pte():
> ***
>     /*
>      * Also Mali has its own notions of shareability wherein its Inner
>      * domain covers the cores within the GPU, and its Outer domain is
>      * "outside the GPU" (i.e. either the Inner or System domain in CPU
>      * terms, depending on coherency).
>      */
>     if (prot & IOMMU_CACHE && data->iop.fmt != ARM_MALI_LPAE)
>         pte |= ARM_LPAE_PTE_SH_IS;
>     else
>         pte |= ARM_LPAE_PTE_SH_OS;
> ***
> Instead, I set pte |= ARM_LPAE_PTE_SH_NS.
>
>   If I set the PTE to ARM_LPAE_PTE_SH_OS or ARM_LPAE_PTE_SH_IS, a GPU
>   Fault occurs whether I use a single-core or a multi-core GPU.
>   If I set the PTE to ARM_LPAE_PTE_SH_NS, no GPU Fault occurs whether
>   I use a single-core or a multi-core GPU.

Hi,

So this is a difference between Panfrost and kbase. Panfrost (well
technically the IOMMU framework) enables the inner-shareable bit for all
memory, whereas kbase only enables it for some memory types (the
BASE_MEM_COHERENT_LOCAL flag in the UABI controls it). However this
should only be a performance/power difference (and AFAIK probably an
irrelevant one) and it's definitely required that "inner shareable"
(i.e. within the GPU) works for communication between the different
units of the GPU.
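
For readers following along, here is a minimal sketch of how the three shareability values discussed in this thread sit in the two-bit SH field of an LPAE-style page table entry. The macro names and the pte_sh_name() helper are hypothetical and not kernel code; the bit position (bits [9:8]) and the encodings (0, 2, 3) are assumed to match the ARM_LPAE_PTE_SH_* definitions in io-pgtable-arm.c and should be checked against the kernel tree in use.

```c
#include <stdint.h>

typedef uint64_t lpae_pte_t;

/* Assumed layout: SH field in bits [9:8] of the PTE. */
#define PTE_SH_SHIFT 8
#define PTE_SH_MASK  ((lpae_pte_t)3 << PTE_SH_SHIFT)
#define PTE_SH_NS    ((lpae_pte_t)0 << PTE_SH_SHIFT) /* non-shareable */
#define PTE_SH_OS    ((lpae_pte_t)2 << PTE_SH_SHIFT) /* outer-shareable */
#define PTE_SH_IS    ((lpae_pte_t)3 << PTE_SH_SHIFT) /* inner-shareable */

/* Hypothetical helper: name the shareability encoded in a PTE value. */
const char *pte_sh_name(lpae_pte_t pte)
{
    switch (pte & PTE_SH_MASK) {
    case PTE_SH_NS: return "non-shareable";
    case PTE_SH_OS: return "outer-shareable";
    case PTE_SH_IS: return "inner-shareable";
    default:        return "reserved";
    }
}
```

Dumping the SH field of the PTEs actually written for the GPU is one way to confirm which of the three settings is in effect before and after a local change.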

You didn't answer my previous question:

> Is this device working with the kbase/DDK proprietary driver?

What you are describing sounds like a hardware integration issue, so it
would be good to check that the hardware is working with the proprietary
driver to rule that out. And perhaps there is something in the kbase for
this device that is setting a chicken bit to 'fix' the coherency?

Steve


Re: [PATCH v2] drm/panfrost:report the full raw fault information instead

2021-06-28 Thread Chunyou Tang
Hi Robin,
Thanks for your reply.
I have sent more detail to Steve in another mail. Thanks for your
messages.

On Mon, 28 Jun 2021 15:17:51 +0100,
Robin Murphy wrote:

> On 2021-06-28 11:48, Steven Price wrote:
> > On 25/06/2021 10:49, Chunyou Tang wrote:
> >> Hi Steve,
> >>Thinks for your reply.
> >>When I only set the pte |= ARM_LPAE_PTE_SH_NS;there have
> >> no "GPU Fault",When I set the pte |= ARM_LPAE_PTE_SH_IS(or
> >> ARM_LPAE_PTE_SH_OS);there have "GPU Fault".I don't know how the pte
> >> effect this issue?
> >>Can you give me some suggestions again?
> >>
> >> Thinks.
> >>
> >> Chunyou
> > 
> > Hi Chunyou,
> > 
> > You haven't given me much context so I'm not entirely sure which
> > PTE you are talking about (GPU or CPU), or indeed where you are
> > changing the PTE values.
> > 
> > The PTEs control whether a page is shareable or not. The GPU
> > requires that accesses are consistent (i.e. either all accesses to
> > a page are shareable or all are non-shareable) and will raise a
> > fault if it detects this isn't the case. Mali also has a quirk for
> > its version of 'LPAE' where inner shareable actually means only
> > within the GPU and outer shareable means outside the GPU (which I
> > think usually means Inner Shareable on the external bus).
> 
> Furthermore, the way the io-pgtable code works for ARM_MALI_LPAE
> format means that *all* GPU mappings are unconditionally
> outer-shareable, so it's not clear how the GPU could observe a
> mismatch in the first place (other than major integration issues
> causing data corruption).
> 
> Robin.
> 
> > 
> > Steve
> > 
> >> On Thu, 24 Jun 2021 14:22:04 +0100,
> >> Steven Price wrote:
> >>
> >>> On 22/06/2021 02:40, Chunyou Tang wrote:
>  Hi Steve,
>   I will send a new patch with suitable subject/commit
>  message. But I send a V3 or a new patch?
> >>>
> >>> Send a V3 - it is a new version of this patch.
> >>>
>   I met a bug about the GPU,I have no idea about how to fix
>  it, If you can give me some suggestion,it is perfect.
> 
>  You can see such kernel log:
> 
>  Jun 20 10:20:13 icube kernel: [  774.566760] mvp_gpu
>  :05:00.0: GPU Fault 0x0088 (SHAREABILITY_FAULT) at
>  0x0310fd00 Jun 20 10:20:13 icube kernel: [  774.566764]
>  mvp_gpu :05:00.0: There were multiple GPU faults - some have
>  not been reported Jun 20 10:20:13 icube kernel: [  774.667542]
>  mvp_gpu :05:00.0: AS_ACTIVE bit stuck Jun 20 10:20:13 icube
>  kernel: [  774.767900] mvp_gpu :05:00.0: AS_ACTIVE bit stuck
>  Jun 20 10:20:13 icube kernel: [  774.868546] mvp_gpu
>  :05:00.0: AS_ACTIVE bit stuck Jun 20 10:20:13 icube kernel:
>  [  774.968910] mvp_gpu :05:00.0: AS_ACTIVE bit stuck Jun 20
>  10:20:13 icube kernel: [  775.069251] mvp_gpu :05:00.0:
>  AS_ACTIVE bit stuck Jun 20 10:20:22 icube kernel: [  783.693971]
>  mvp_gpu :05:00.0: gpu sched timeout, js=1, config=0x7300,
>  status=0x8, head=0x362c900, tail=0x362c100,
>  sched_job=3252fb84
> 
>  In
>  https://lore.kernel.org/dri-devel/20200510165538.19720-1-peron.c...@gmail.com/
>  there had a same bug like mine,and I found you at the mail list,I
>  don't know how it fixed?
> >>>
> >>> The GPU_SHAREABILITY_FAULT error means that a cache line has been
> >>> accessed both as shareable and non-shareable and therefore
> >>> coherency cannot be guaranteed. Although the "multiple GPU
> >>> faults" means that this may not be the underlying cause.
> >>>
> >>> The fact that your dmesg log has PCI style identifiers
> >>> (":05:00.0") suggests this is an unusual platform - I've not
> >>> previously been aware of a Mali device behind PCI. Is this device
> >>> working with the kbase/DDK proprietary driver? It would be worth
> >>> looking at the kbase kernel code for the platform to see if there
> >>> is anything special done for the platform.
> >>>
> >>>  From the dmesg logs all I can really tell is that the GPU seems
> >>> unhappy about the memory system.
> >>>
> >>> Steve
> >>>
>  I need your help!
> 
>  thinks very much!
> 
>  Chunyou
> 
>  ?? Mon, 21 Jun 2021 11:45:20 +0100
>  Steven Price  :
> 
> > On 19/06/2021 04:18, Chunyou Tang wrote:
> >> Hi Steve,
> >>1,Now I know how to write the subject
> >>2,the low 8 bits is the exception type in spec.
> >>
> >> and you can see panfrost_exception_name()
> >>
> >> switch (exception_code) {
> >>  /* Non-Fault Status code */
> >> case 0x00: return "NOT_STARTED/IDLE/OK";
> >> case 0x01: return "DONE";
> >> case 0x02: return "INTERRUPTED";
> >> case 0x03: return "STOPPED";
> >> case 0x04: return "TERMINATED";
> >> case 0x08: return "ACTIVE";
> >> 
> >> 
> >> case 0xD8: return "ACCESS_FLAG";
> >> case 0xD9 ... 0xDF: return "ACCESS_FLAG";
> >> 

Re: [PATCH v2] drm/panfrost:report the full raw fault information instead

2021-06-28 Thread Chunyou Tang
Hi Steve,
Thanks for your reply.
I set the PTE in arm_lpae_prot_to_pte():
***
    /*
     * Also Mali has its own notions of shareability wherein its Inner
     * domain covers the cores within the GPU, and its Outer domain is
     * "outside the GPU" (i.e. either the Inner or System domain in CPU
     * terms, depending on coherency).
     */
    if (prot & IOMMU_CACHE && data->iop.fmt != ARM_MALI_LPAE)
        pte |= ARM_LPAE_PTE_SH_IS;
    else
        pte |= ARM_LPAE_PTE_SH_OS;
***
Instead, I set pte |= ARM_LPAE_PTE_SH_NS.

If I set the PTE to ARM_LPAE_PTE_SH_OS or ARM_LPAE_PTE_SH_IS, a GPU
Fault occurs whether I use a single-core or a multi-core GPU.
If I set the PTE to ARM_LPAE_PTE_SH_NS, no GPU Fault occurs whether
I use a single-core or a multi-core GPU.

Thanks

Chunyou

On Mon, 28 Jun 2021 11:48:59 +0100,
Steven Price wrote:

> On 25/06/2021 10:49, Chunyou Tang wrote:
> > Hi Steve,
> > Thanks for your reply.
> > When I only set pte |= ARM_LPAE_PTE_SH_NS, there is no "GPU
> > Fault". When I set pte |= ARM_LPAE_PTE_SH_IS (or
> > ARM_LPAE_PTE_SH_OS), there is a "GPU Fault". I don't know how the
> > PTE affects this issue.
> > Can you give me some more suggestions?
> > 
> > Thinks.
> > 
> > Chunyou
> 
> Hi Chunyou,
> 
> You haven't given me much context so I'm not entirely sure which PTE
> you are talking about (GPU or CPU), or indeed where you are changing
> the PTE values.
> 
> The PTEs control whether a page is shareable or not. The GPU requires
> that accesses are consistent (i.e. either all accesses to a page are
> shareable or all are non-shareable) and will raise a fault if it
> detects this isn't the case. Mali also has a quirk for its version of
> 'LPAE' where inner shareable actually means only within the GPU and
> outer shareable means outside the GPU (which I think usually means
> Inner Shareable on the external bus).
> 
> Steve
> 
> > On Thu, 24 Jun 2021 14:22:04 +0100,
> > Steven Price wrote:
> > 
> >> On 22/06/2021 02:40, Chunyou Tang wrote:
> >>> Hi Steve,
> >>>   I will send a new patch with suitable subject/commit
> >>> message. But I send a V3 or a new patch?
> >>
> >> Send a V3 - it is a new version of this patch.
> >>
> >>>   I met a bug about the GPU,I have no idea about how to fix
> >>> it, If you can give me some suggestion,it is perfect.
> >>>
> >>> You can see such kernel log:
> >>>
> >>> Jun 20 10:20:13 icube kernel: [  774.566760] mvp_gpu :05:00.0:
> >>> GPU Fault 0x0088 (SHAREABILITY_FAULT) at 0x0310fd00
> >>> Jun 20 10:20:13 icube kernel: [  774.566764] mvp_gpu :05:00.0:
> >>> There were multiple GPU faults - some have not been reported Jun
> >>> 20 10:20:13 icube kernel: [  774.667542] mvp_gpu :05:00.0:
> >>> AS_ACTIVE bit stuck Jun 20 10:20:13 icube kernel: [  774.767900]
> >>> mvp_gpu :05:00.0: AS_ACTIVE bit stuck Jun 20 10:20:13 icube
> >>> kernel: [  774.868546] mvp_gpu :05:00.0: AS_ACTIVE bit stuck
> >>> Jun 20 10:20:13 icube kernel: [  774.968910] mvp_gpu :05:00.0:
> >>> AS_ACTIVE bit stuck Jun 20 10:20:13 icube kernel: [  775.069251]
> >>> mvp_gpu :05:00.0: AS_ACTIVE bit stuck Jun 20 10:20:22 icube
> >>> kernel: [  783.693971] mvp_gpu :05:00.0: gpu sched timeout,
> >>> js=1, config=0x7300, status=0x8, head=0x362c900, tail=0x362c100,
> >>> sched_job=3252fb84
> >>>
> >>> In
> >>> https://lore.kernel.org/dri-devel/20200510165538.19720-1-peron.c...@gmail.com/
> >>> there had a same bug like mine,and I found you at the mail list,I
> >>> don't know how it fixed?
> >>
> >> The GPU_SHAREABILITY_FAULT error means that a cache line has been
> >> accessed both as shareable and non-shareable and therefore
> >> coherency cannot be guaranteed. Although the "multiple GPU faults"
> >> means that this may not be the underlying cause.
> >>
> >> The fact that your dmesg log has PCI style identifiers
> >> (":05:00.0") suggests this is an unusual platform - I've not
> >> previously been aware of a Mali device behind PCI. Is this device
> >> working with the kbase/DDK proprietary driver? It would be worth
> >> looking at the kbase kernel code for the platform to see if there
> >> is anything special done for the platform.
> >>
> >> From the dmesg logs all I can really tell is that the GPU seems
> >> unhappy about the memory system.
> >>
> >> Steve
> >>
> >>> I need your help!
> >>>
> >>> thinks very much!
> >>>
> >>> Chunyou
> >>>
> >>> ?? Mon, 21 Jun 2021 11:45:20 +0100
> >>> Steven Price  :
> >>>
>  On 19/06/2021 04:18, Chunyou Tang wrote:
> > Hi Steve,
> > 1,Now I know how to write the subject
> > 2,the low 8 bits is the exception type in spec.
> >
> > and you can see prnfrost_exception_name()
> >
> > switch (exception_code) {
> 

Re: [PATCH v2] drm/panfrost:report the full raw fault information instead

2021-06-28 Thread Robin Murphy

On 2021-06-28 11:48, Steven Price wrote:
> On 25/06/2021 10:49, Chunyou Tang wrote:
>> Hi Steve,
>> Thanks for your reply.
>> When I only set pte |= ARM_LPAE_PTE_SH_NS, there is no "GPU
>> Fault". When I set pte |= ARM_LPAE_PTE_SH_IS (or
>> ARM_LPAE_PTE_SH_OS), there is a "GPU Fault". I don't know how the
>> PTE affects this issue.
>> Can you give me some more suggestions?
>>
>> Thanks.
>>
>> Chunyou
>
> Hi Chunyou,
>
> You haven't given me much context so I'm not entirely sure which PTE you
> are talking about (GPU or CPU), or indeed where you are changing the PTE
> values.
>
> The PTEs control whether a page is shareable or not. The GPU requires
> that accesses are consistent (i.e. either all accesses to a page are
> shareable or all are non-shareable) and will raise a fault if it detects
> this isn't the case. Mali also has a quirk for its version of 'LPAE'
> where inner shareable actually means only within the GPU and outer
> shareable means outside the GPU (which I think usually means Inner
> Shareable on the external bus).


Furthermore, the way the io-pgtable code works for ARM_MALI_LPAE format 
 means that *all* GPU mappings are unconditionally outer-shareable, so 
it's not clear how the GPU could observe a mismatch in the first place 
(other than major integration issues causing data corruption).
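
The point about ARM_MALI_LPAE can be illustrated with a small stand-alone sketch of the selection logic quoted earlier in the thread. This is a simplified model, not the kernel function: select_sh(), the enum and the PROT_CACHE flag are hypothetical stand-ins for arm_lpae_prot_to_pte(), enum io_pgtable_fmt and IOMMU_CACHE, and the SH encodings are assumed to match io-pgtable-arm.c.

```c
#include <stdint.h>

/* Stand-ins for the kernel's page table formats (values are arbitrary). */
enum pgtable_fmt { FMT_ARM_64_LPAE_S1, FMT_ARM_MALI_LPAE };

/* Stand-in for IOMMU_CACHE; assumed to be a single prot flag bit. */
#define PROT_CACHE (1 << 2)

/* Assumed SH encodings in PTE bits [9:8]. */
#define PTE_SH_OS ((uint64_t)2 << 8)
#define PTE_SH_IS ((uint64_t)3 << 8)

/*
 * Mirrors the condition quoted in the thread: inner-shareable is chosen
 * only for cacheable mappings in a non-Mali format, so the Mali format
 * always falls through to outer-shareable.
 */
uint64_t select_sh(int prot, enum pgtable_fmt fmt)
{
    if ((prot & PROT_CACHE) && fmt != FMT_ARM_MALI_LPAE)
        return PTE_SH_IS;
    return PTE_SH_OS;
}
```

With FMT_ARM_MALI_LPAE the result is PTE_SH_OS for every prot value, which is why a mix of shareability settings should not arise from this code path alone.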


Robin.



> Steve


On Thu, 24 Jun 2021 14:22:04 +0100,
Steven Price wrote:


On 22/06/2021 02:40, Chunyou Tang wrote:

Hi Steve,
I will send a new patch with suitable subject/commit
message. But I send a V3 or a new patch?


Send a V3 - it is a new version of this patch.


I met a bug about the GPU,I have no idea about how to fix
it, If you can give me some suggestion,it is perfect.

You can see such kernel log:

Jun 20 10:20:13 icube kernel: [  774.566760] mvp_gpu :05:00.0:
GPU Fault 0x0088 (SHAREABILITY_FAULT) at 0x0310fd00 Jun
20 10:20:13 icube kernel: [  774.566764] mvp_gpu :05:00.0:
There were multiple GPU faults - some have not been reported Jun 20
10:20:13 icube kernel: [  774.667542] mvp_gpu :05:00.0:
AS_ACTIVE bit stuck Jun 20 10:20:13 icube kernel: [  774.767900]
mvp_gpu :05:00.0: AS_ACTIVE bit stuck Jun 20 10:20:13 icube
kernel: [  774.868546] mvp_gpu :05:00.0: AS_ACTIVE bit stuck
Jun 20 10:20:13 icube kernel: [  774.968910] mvp_gpu :05:00.0:
AS_ACTIVE bit stuck Jun 20 10:20:13 icube kernel: [  775.069251]
mvp_gpu :05:00.0: AS_ACTIVE bit stuck Jun 20 10:20:22 icube
kernel: [  783.693971] mvp_gpu :05:00.0: gpu sched timeout,
js=1, config=0x7300, status=0x8, head=0x362c900, tail=0x362c100,
sched_job=3252fb84

In
https://lore.kernel.org/dri-devel/20200510165538.19720-1-peron.c...@gmail.com/
there had a same bug like mine,and I found you at the mail list,I
don't know how it fixed?


The GPU_SHAREABILITY_FAULT error means that a cache line has been
accessed both as shareable and non-shareable and therefore coherency
cannot be guaranteed. Although the "multiple GPU faults" means that
this may not be the underlying cause.

The fact that your dmesg log has PCI style identifiers
(":05:00.0") suggests this is an unusual platform - I've not
previously been aware of a Mali device behind PCI. Is this device
working with the kbase/DDK proprietary driver? It would be worth
looking at the kbase kernel code for the platform to see if there is
anything special done for the platform.

 From the dmesg logs all I can really tell is that the GPU seems
unhappy about the memory system.

Steve


I need your help!

thinks very much!

Chunyou

On Mon, 21 Jun 2021 11:45:20 +0100,
Steven Price wrote:


On 19/06/2021 04:18, Chunyou Tang wrote:

Hi Steve,
1. Now I know how to write the subject.
2. The low 8 bits are the exception type in the spec.

You can see this in panfrost_exception_name():

switch (exception_code) {
 /* Non-Fault Status code */
case 0x00: return "NOT_STARTED/IDLE/OK";
case 0x01: return "DONE";
case 0x02: return "INTERRUPTED";
case 0x03: return "STOPPED";
case 0x04: return "TERMINATED";
case 0x08: return "ACTIVE";


case 0xD8: return "ACCESS_FLAG";
case 0xD9 ... 0xDF: return "ACCESS_FLAG";
case 0xE0 ... 0xE7: return "ADDRESS_SIZE_FAULT";
case 0xE8 ... 0xEF: return "MEMORY_ATTRIBUTES_FAULT";
}
return "UNKNOWN";
}

The exception_code in the case labels is only 8 bits, so if
fault_status in panfrost_gpu_irq_handler() is not masked with & 0xFF,
it can't find the correct exception reason; it will always be UNKNOWN.


Yes, I'm happy with the change - I just need a patch that I can
apply. At the moment this patch only changes the first '0x%08x'
output rather than the call to panfrost_exception_name() as well.
So we just need a patch which does:

- fault_status & 0xFF, panfrost_exception_name(pfdev, fault_status),
+ fault_status, panfrost_exception_name(pfdev, fault_status & 0xFF),

along with a suitable subject/commit message describing the
change. If you can send me that I can apply it.
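
As a rough illustration of the requested behaviour (print the full raw status, but look up the name from the low 8 bits only), here is a self-contained sketch. It is not the panfrost code: exception_name() carries only a few of the codes quoted in this thread plus 0x88, which the kernel log in this thread pairs with SHAREABILITY_FAULT, and describe_fault() and struct fault_report are hypothetical names.

```c
#include <stdint.h>

/* Reduced lookup table; only a handful of codes are included here. */
const char *exception_name(uint32_t exception_code)
{
    switch (exception_code) {
    case 0x00: return "NOT_STARTED/IDLE/OK";
    case 0x01: return "DONE";
    case 0x08: return "ACTIVE";
    case 0x88: return "SHAREABILITY_FAULT";
    default:   return "UNKNOWN";
    }
}

/* Hypothetical container for what would be printed. */
struct fault_report {
    uint32_t raw;       /* full 32-bit status, reported unmodified */
    const char *name;   /* looked up from the low 8 bits only */
};

struct fault_report describe_fault(uint32_t fault_status)
{
    struct fault_report r = {
        fault_status,
        exception_name(fault_status & 0xFF),
    };
    return r;
}
```

Passing an unmasked status with any upper bits set straight to exception_name() falls through to "UNKNOWN", which is the failure described above; masking only at the lookup keeps the raw value intact for the log.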

Thanks,

Steve

PS. Sorry for going round in circles here - I'm trying to help 

Re: [PATCH v2] drm/panfrost:report the full raw fault information instead

2021-06-28 Thread Steven Price
On 25/06/2021 10:49, Chunyou Tang wrote:
> Hi Steve,
>   Thanks for your reply.
>   When I only set pte |= ARM_LPAE_PTE_SH_NS, there is no "GPU
> Fault". When I set pte |= ARM_LPAE_PTE_SH_IS (or
> ARM_LPAE_PTE_SH_OS), there is a "GPU Fault". I don't know how the
> PTE affects this issue.
>   Can you give me some more suggestions?
> 
> Thanks.
> 
> Chunyou

Hi Chunyou,

You haven't given me much context so I'm not entirely sure which PTE you
are talking about (GPU or CPU), or indeed where you are changing the PTE
values.

The PTEs control whether a page is shareable or not. The GPU requires
that accesses are consistent (i.e. either all accesses to a page are
shareable or all are non-shareable) and will raise a fault if it detects
this isn't the case. Mali also has a quirk for its version of 'LPAE'
where inner shareable actually means only within the GPU and outer
shareable means outside the GPU (which I think usually means Inner
Shareable on the external bus).

Steve

> On Thu, 24 Jun 2021 14:22:04 +0100,
> Steven Price wrote:
> 
>> On 22/06/2021 02:40, Chunyou Tang wrote:
>>> Hi Steve,
>>> I will send a new patch with suitable subject/commit
>>> message. But I send a V3 or a new patch?
>>
>> Send a V3 - it is a new version of this patch.
>>
>>> I met a bug about the GPU,I have no idea about how to fix
>>> it, If you can give me some suggestion,it is perfect.
>>>
>>> You can see such kernel log:
>>>
>>> Jun 20 10:20:13 icube kernel: [  774.566760] mvp_gpu :05:00.0:
>>> GPU Fault 0x0088 (SHAREABILITY_FAULT) at 0x0310fd00 Jun
>>> 20 10:20:13 icube kernel: [  774.566764] mvp_gpu :05:00.0:
>>> There were multiple GPU faults - some have not been reported Jun 20
>>> 10:20:13 icube kernel: [  774.667542] mvp_gpu :05:00.0:
>>> AS_ACTIVE bit stuck Jun 20 10:20:13 icube kernel: [  774.767900]
>>> mvp_gpu :05:00.0: AS_ACTIVE bit stuck Jun 20 10:20:13 icube
>>> kernel: [  774.868546] mvp_gpu :05:00.0: AS_ACTIVE bit stuck
>>> Jun 20 10:20:13 icube kernel: [  774.968910] mvp_gpu :05:00.0:
>>> AS_ACTIVE bit stuck Jun 20 10:20:13 icube kernel: [  775.069251]
>>> mvp_gpu :05:00.0: AS_ACTIVE bit stuck Jun 20 10:20:22 icube
>>> kernel: [  783.693971] mvp_gpu :05:00.0: gpu sched timeout,
>>> js=1, config=0x7300, status=0x8, head=0x362c900, tail=0x362c100,
>>> sched_job=3252fb84
>>>
>>> In
>>> https://lore.kernel.org/dri-devel/20200510165538.19720-1-peron.c...@gmail.com/
>>> there had a same bug like mine,and I found you at the mail list,I
>>> don't know how it fixed?
>>
>> The GPU_SHAREABILITY_FAULT error means that a cache line has been
>> accessed both as shareable and non-shareable and therefore coherency
>> cannot be guaranteed. Although the "multiple GPU faults" means that
>> this may not be the underlying cause.
>>
>> The fact that your dmesg log has PCI style identifiers
>> (":05:00.0") suggests this is an unusual platform - I've not
>> previously been aware of a Mali device behind PCI. Is this device
>> working with the kbase/DDK proprietary driver? It would be worth
>> looking at the kbase kernel code for the platform to see if there is
>> anything special done for the platform.
>>
>> From the dmesg logs all I can really tell is that the GPU seems
>> unhappy about the memory system.
>>
>> Steve
>>
>>> I need your help!
>>>
>>> thinks very much!
>>>
>>> Chunyou
>>>
>>> On Mon, 21 Jun 2021 11:45:20 +0100,
>>> Steven Price wrote:
>>>
>>>> On 19/06/2021 04:18, Chunyou Tang wrote:
>>>>> Hi Steve,
>>>>>   1. Now I know how to write the subject.
>>>>>   2. The low 8 bits are the exception type in the spec.
>>>>>
>>>>> You can see this in panfrost_exception_name():
>>>>>
>>>>> switch (exception_code) {
>>>>> /* Non-Fault Status code */
>>>>> case 0x00: return "NOT_STARTED/IDLE/OK";
>>>>> case 0x01: return "DONE";
>>>>> case 0x02: return "INTERRUPTED";
>>>>> case 0x03: return "STOPPED";
>>>>> case 0x04: return "TERMINATED";
>>>>> case 0x08: return "ACTIVE";
>>>>> 
>>>>> 
>>>>> case 0xD8: return "ACCESS_FLAG";
>>>>> case 0xD9 ... 0xDF: return "ACCESS_FLAG";
>>>>> case 0xE0 ... 0xE7: return "ADDRESS_SIZE_FAULT";
>>>>> case 0xE8 ... 0xEF: return "MEMORY_ATTRIBUTES_FAULT";
>>>>> }
>>>>> return "UNKNOWN";
>>>>> }
>>>>>
>>>>> The exception_code in the case labels is only 8 bits, so if
>>>>> fault_status in panfrost_gpu_irq_handler() is not masked with
>>>>> & 0xFF, it can't find the correct exception reason; it will
>>>>> always be UNKNOWN.
>>>>
>>>> Yes, I'm happy with the change - I just need a patch that I can
>>>> apply. At the moment this patch only changes the first '0x%08x'
>>>> output rather than the call to panfrost_exception_name() as well.
>>>> So we just need a patch which does:
>>>>
>>>> - fault_status & 0xFF, panfrost_exception_name(pfdev, fault_status),
>>>> + fault_status, panfrost_exception_name(pfdev, fault_status & 0xFF),
>>>>
>>>> along with a suitable subject/commit message describing the
>>>> change. If you can send me that I can apply it.

Re: [PATCH v2] drm/panfrost:report the full raw fault information instead

2021-06-25 Thread Chunyou Tang
Hi Steve,
Thanks for your reply.
When I only set pte |= ARM_LPAE_PTE_SH_NS, there is no "GPU Fault".
When I set pte |= ARM_LPAE_PTE_SH_IS (or ARM_LPAE_PTE_SH_OS), there is
a "GPU Fault". I don't know how the PTE affects this issue.
Can you give me some more suggestions?

Thanks.

Chunyou

On Thu, 24 Jun 2021 14:22:04 +0100,
Steven Price wrote:

> On 22/06/2021 02:40, Chunyou Tang wrote:
> > Hi Steve,
> > I will send a new patch with suitable subject/commit
> > message. But I send a V3 or a new patch?
> 
> Send a V3 - it is a new version of this patch.
> 
> > I met a bug about the GPU,I have no idea about how to fix
> > it, If you can give me some suggestion,it is perfect.
> > 
> > You can see such kernel log:
> > 
> > Jun 20 10:20:13 icube kernel: [  774.566760] mvp_gpu :05:00.0:
> > GPU Fault 0x0088 (SHAREABILITY_FAULT) at 0x0310fd00 Jun
> > 20 10:20:13 icube kernel: [  774.566764] mvp_gpu :05:00.0:
> > There were multiple GPU faults - some have not been reported Jun 20
> > 10:20:13 icube kernel: [  774.667542] mvp_gpu :05:00.0:
> > AS_ACTIVE bit stuck Jun 20 10:20:13 icube kernel: [  774.767900]
> > mvp_gpu :05:00.0: AS_ACTIVE bit stuck Jun 20 10:20:13 icube
> > kernel: [  774.868546] mvp_gpu :05:00.0: AS_ACTIVE bit stuck
> > Jun 20 10:20:13 icube kernel: [  774.968910] mvp_gpu :05:00.0:
> > AS_ACTIVE bit stuck Jun 20 10:20:13 icube kernel: [  775.069251]
> > mvp_gpu :05:00.0: AS_ACTIVE bit stuck Jun 20 10:20:22 icube
> > kernel: [  783.693971] mvp_gpu :05:00.0: gpu sched timeout,
> > js=1, config=0x7300, status=0x8, head=0x362c900, tail=0x362c100,
> > sched_job=3252fb84
> > 
> > In
> > https://lore.kernel.org/dri-devel/20200510165538.19720-1-peron.c...@gmail.com/
> > there had a same bug like mine,and I found you at the mail list,I
> > don't know how it fixed?
> 
> The GPU_SHAREABILITY_FAULT error means that a cache line has been
> accessed both as shareable and non-shareable and therefore coherency
> cannot be guaranteed. Although the "multiple GPU faults" means that
> this may not be the underlying cause.
> 
> The fact that your dmesg log has PCI style identifiers
> (":05:00.0") suggests this is an unusual platform - I've not
> previously been aware of a Mali device behind PCI. Is this device
> working with the kbase/DDK proprietary driver? It would be worth
> looking at the kbase kernel code for the platform to see if there is
> anything special done for the platform.
> 
> From the dmesg logs all I can really tell is that the GPU seems
> unhappy about the memory system.
> 
> Steve
> 
> > I need your help!
> > 
> > thinks very much!
> > 
> > Chunyou
> > 
> > ?? Mon, 21 Jun 2021 11:45:20 +0100
> > Steven Price  :
> > 
> >> On 19/06/2021 04:18, Chunyou Tang wrote:
> >>> Hi Steve,
> >>>   1,Now I know how to write the subject
> >>>   2,the low 8 bits is the exception type in spec.
> >>>
> >>> and you can see prnfrost_exception_name()
> >>>
> >>> switch (exception_code) {
> >>> /* Non-Fault Status code */
> >>> case 0x00: return "NOT_STARTED/IDLE/OK";
> >>> case 0x01: return "DONE";
> >>> case 0x02: return "INTERRUPTED";
> >>> case 0x03: return "STOPPED";
> >>> case 0x04: return "TERMINATED";
> >>> case 0x08: return "ACTIVE";
> >>> 
> >>> 
> >>> case 0xD8: return "ACCESS_FLAG";
> >>> case 0xD9 ... 0xDF: return "ACCESS_FLAG";
> >>> case 0xE0 ... 0xE7: return "ADDRESS_SIZE_FAULT";
> >>> case 0xE8 ... 0xEF: return "MEMORY_ATTRIBUTES_FAULT";
> >>> }
> >>> return "UNKNOWN";
> >>> }
> >>>
> >>> the exception_code in case is only 8 bits,so if fault_status
> >>> in panfrost_gpu_irq_handler() don't & 0xFF,it can't get correct
> >>> exception reason,it will be always UNKNOWN.
> >>
> >> Yes, I'm happy with the change - I just need a patch that I can
> >> apply. At the moment this patch only changes the first '0x%08x'
> >> output rather than the call to panfrost_exception_name() as well.
> >> So we just need a patch which does:
> >>
> >> - fault_status & 0xFF, panfrost_exception_name(pfdev,
> >> fault_status),
> >> + fault_status, panfrost_exception_name(pfdev, fault_status &
> >> 0xFF),
> >>
> >> along with a suitable subject/commit message describing the
> >> change. If you can send me that I can apply it.
> >>
> >> Thanks,
> >>
> >> Steve
> >>
> >> PS. Sorry for going round in circles here - I'm trying to help you
> >> get setup so you'll be able to contribute patches easily in
> >> future. An important part of that is ensuring you can send a
> >> properly formatted patch to the list.
> >>
> >> PPS. I'm still not receiving your emails directly. I don't think
> >> it's a problem at my end because I'm receiving other emails, but
> >> if you can somehow fix the problem you're likely to receive a
> >> faster response.
> >>
> >>> ?? Fri, 18 Jun 2021 13:43:24 +0100
> >>> Steven Price  :
> >>>
>  On 17/06/2021 07:20, ChunyouTang wrote:
> > From: ChunyouTang 
> >
> > of the 

Re: [PATCH v2] drm/panfrost:report the full raw fault information instead

2021-06-24 Thread Steven Price
On 22/06/2021 02:40, Chunyou Tang wrote:
> Hi Steve,
> I will send a new patch with a suitable subject/commit message.
> Should I send it as a V3 or as a new patch?

Send a V3 - it is a new version of this patch.

> I have also hit a GPU bug and have no idea how to fix it.
> If you could give me some suggestions, that would be great.
> 
> The kernel log shows:
> 
> Jun 20 10:20:13 icube kernel: [  774.566760] mvp_gpu :05:00.0: GPU Fault 0x0088 (SHAREABILITY_FAULT) at 0x0310fd00
> Jun 20 10:20:13 icube kernel: [  774.566764] mvp_gpu :05:00.0: There were multiple GPU faults - some have not been reported
> Jun 20 10:20:13 icube kernel: [  774.667542] mvp_gpu :05:00.0: AS_ACTIVE bit stuck
> Jun 20 10:20:13 icube kernel: [  774.767900] mvp_gpu :05:00.0: AS_ACTIVE bit stuck
> Jun 20 10:20:13 icube kernel: [  774.868546] mvp_gpu :05:00.0: AS_ACTIVE bit stuck
> Jun 20 10:20:13 icube kernel: [  774.968910] mvp_gpu :05:00.0: AS_ACTIVE bit stuck
> Jun 20 10:20:13 icube kernel: [  775.069251] mvp_gpu :05:00.0: AS_ACTIVE bit stuck
> Jun 20 10:20:22 icube kernel: [  783.693971] mvp_gpu :05:00.0: gpu sched timeout, js=1, config=0x7300, status=0x8, head=0x362c900, tail=0x362c100, sched_job=3252fb84
> 
> The thread at
> https://lore.kernel.org/dri-devel/20200510165538.19720-1-peron.c...@gmail.com/
> reports the same bug as mine, and I saw that you replied on the mailing
> list. Do you know how it was fixed?

The GPU_SHAREABILITY_FAULT error means that a cache line has been
accessed both as shareable and non-shareable and therefore coherency
cannot be guaranteed. However, the "multiple GPU faults" message means
that this may not be the underlying cause.

The fact that your dmesg log has PCI style identifiers (":05:00.0")
suggests this is an unusual platform - I've not previously been aware of
a Mali device behind PCI. Is this device working with the kbase/DDK
proprietary driver? It would be worth looking at the kbase kernel code
for the platform to see if there is anything special done for the platform.

From the dmesg logs all I can really tell is that the GPU seems unhappy
about the memory system.

Steve

> I need your help!
> 
> Thanks very much!
> 
> Chunyou
> 
> On Mon, 21 Jun 2021 11:45:20 +0100
> Steven Price wrote:
> 
>> On 19/06/2021 04:18, Chunyou Tang wrote:
>>> Hi Steve,
>>> 1. Now I know how to write the subject.
>>> 2. The low 8 bits are the exception type in the spec.
>>>
>>> You can see this in panfrost_exception_name():
>>>
>>> switch (exception_code) {
>>> /* Non-Fault Status code */
>>> case 0x00: return "NOT_STARTED/IDLE/OK";
>>> case 0x01: return "DONE";
>>> case 0x02: return "INTERRUPTED";
>>> case 0x03: return "STOPPED";
>>> case 0x04: return "TERMINATED";
>>> case 0x08: return "ACTIVE";
>>> 
>>> 
>>> case 0xD8: return "ACCESS_FLAG";
>>> case 0xD9 ... 0xDF: return "ACCESS_FLAG";
>>> case 0xE0 ... 0xE7: return "ADDRESS_SIZE_FAULT";
>>> case 0xE8 ... 0xEF: return "MEMORY_ATTRIBUTES_FAULT";
>>> }
>>> return "UNKNOWN";
>>> }
>>>
>>> The exception_code in the switch is only 8 bits, so if fault_status
>>> in panfrost_gpu_irq_handler() is not masked with 0xFF, the lookup
>>> cannot match the exception reason and returns UNKNOWN whenever any
>>> upper bit is set.
>>
>> Yes, I'm happy with the change - I just need a patch that I can apply.
>> At the moment this patch only changes the first '0x%08x' output rather
>> than the call to panfrost_exception_name() as well. So we just need a
>> patch which does:
>>
>> - fault_status & 0xFF, panfrost_exception_name(pfdev, fault_status),
>> + fault_status, panfrost_exception_name(pfdev, fault_status & 0xFF),
>>
>> along with a suitable subject/commit message describing the change. If
>> you can send me that I can apply it.
>>
>> Thanks,
>>
>> Steve
>>
>> PS. Sorry for going round in circles here - I'm trying to help you get
>> setup so you'll be able to contribute patches easily in future. An
>> important part of that is ensuring you can send a properly formatted
>> patch to the list.
>>
>> PPS. I'm still not receiving your emails directly. I don't think it's
>> a problem at my end because I'm receiving other emails, but if you can
>> somehow fix the problem you're likely to receive a faster response.
>>
>>> On Fri, 18 Jun 2021 13:43:24 +0100
>>> Steven Price wrote:
>>>
>>>> On 17/06/2021 07:20, ChunyouTang wrote:
>>>>> From: ChunyouTang
>>>>>
>>>>> of the low 8 bits.
>>>>
>>>> Please don't split the subject like this. The first line of the
>>>> commit should be a (very short) summary of the patch. Then a blank
>>>> line and then a longer description of what the purpose of the
>>>> patch is and why it's needed.
>>>>
>>>> Also you previously had this as part of a series (the first part
>>>> adding the "& 0xFF" in the panfrost_exception_name() call). I'm not
>>>> sure we need two patches for the single line, but as it stands this
>>>> patch doesn't apply.
>>>>
>>>> Also I'm still not receiving any emails from you directly (only via
>>>> the list), so it's possible I might 

Re: [PATCH v2] drm/panfrost:report the full raw fault information instead

2021-06-21 Thread Chunyou Tang
Hi Steve,
I will send a new patch with a suitable subject/commit message.
Should I send it as a V3 or as a new patch?

I have also hit a GPU bug and have no idea how to fix it.
If you could give me some suggestions, that would be great.

The kernel log shows:

Jun 20 10:20:13 icube kernel: [  774.566760] mvp_gpu :05:00.0: GPU Fault 0x0088 (SHAREABILITY_FAULT) at 0x0310fd00
Jun 20 10:20:13 icube kernel: [  774.566764] mvp_gpu :05:00.0: There were multiple GPU faults - some have not been reported
Jun 20 10:20:13 icube kernel: [  774.667542] mvp_gpu :05:00.0: AS_ACTIVE bit stuck
Jun 20 10:20:13 icube kernel: [  774.767900] mvp_gpu :05:00.0: AS_ACTIVE bit stuck
Jun 20 10:20:13 icube kernel: [  774.868546] mvp_gpu :05:00.0: AS_ACTIVE bit stuck
Jun 20 10:20:13 icube kernel: [  774.968910] mvp_gpu :05:00.0: AS_ACTIVE bit stuck
Jun 20 10:20:13 icube kernel: [  775.069251] mvp_gpu :05:00.0: AS_ACTIVE bit stuck
Jun 20 10:20:22 icube kernel: [  783.693971] mvp_gpu :05:00.0: gpu sched timeout, js=1, config=0x7300, status=0x8, head=0x362c900, tail=0x362c100, sched_job=3252fb84

The thread at
https://lore.kernel.org/dri-devel/20200510165538.19720-1-peron.c...@gmail.com/
reports the same bug as mine, and I saw that you replied on the mailing
list. Do you know how it was fixed?

I need your help!

Thanks very much!

Chunyou

On Mon, 21 Jun 2021 11:45:20 +0100
Steven Price wrote:

> On 19/06/2021 04:18, Chunyou Tang wrote:
> > Hi Steve,
> > 1. Now I know how to write the subject.
> > 2. The low 8 bits are the exception type in the spec.
> > 
> > You can see this in panfrost_exception_name():
> > 
> > switch (exception_code) {
> > /* Non-Fault Status code */
> > case 0x00: return "NOT_STARTED/IDLE/OK";
> > case 0x01: return "DONE";
> > case 0x02: return "INTERRUPTED";
> > case 0x03: return "STOPPED";
> > case 0x04: return "TERMINATED";
> > case 0x08: return "ACTIVE";
> > 
> > 
> > case 0xD8: return "ACCESS_FLAG";
> > case 0xD9 ... 0xDF: return "ACCESS_FLAG";
> > case 0xE0 ... 0xE7: return "ADDRESS_SIZE_FAULT";
> > case 0xE8 ... 0xEF: return "MEMORY_ATTRIBUTES_FAULT";
> > }
> > return "UNKNOWN";
> > }
> > 
> > The exception_code in the switch is only 8 bits, so if fault_status
> > in panfrost_gpu_irq_handler() is not masked with 0xFF, the lookup
> > cannot match the exception reason and returns UNKNOWN whenever any
> > upper bit is set.
> 
> Yes, I'm happy with the change - I just need a patch that I can apply.
> At the moment this patch only changes the first '0x%08x' output rather
> than the call to panfrost_exception_name() as well. So we just need a
> patch which does:
> 
> - fault_status & 0xFF, panfrost_exception_name(pfdev, fault_status),
> + fault_status, panfrost_exception_name(pfdev, fault_status & 0xFF),
> 
> along with a suitable subject/commit message describing the change. If
> you can send me that I can apply it.
> 
> Thanks,
> 
> Steve
> 
> PS. Sorry for going round in circles here - I'm trying to help you get
> setup so you'll be able to contribute patches easily in future. An
> important part of that is ensuring you can send a properly formatted
> patch to the list.
> 
> PPS. I'm still not receiving your emails directly. I don't think it's
> a problem at my end because I'm receiving other emails, but if you can
> somehow fix the problem you're likely to receive a faster response.
> 
> > On Fri, 18 Jun 2021 13:43:24 +0100
> > Steven Price wrote:
> > 
> >> On 17/06/2021 07:20, ChunyouTang wrote:
> >>> From: ChunyouTang 
> >>>
> >>> of the low 8 bits.
> >>
> >> Please don't split the subject like this. The first line of the
> >> commit should be a (very short) summary of the patch. Then a blank
> >> line and then a longer description of what the purpose of the
> >> patch is and why it's needed.
> >>
> >> Also you previously had this as part of a series (the first part
> >> adding the "& 0xFF" in the panfrost_exception_name() call). I'm not
> >> sure we need two patches for the single line, but as it stands this
> >> patch doesn't apply.
> >>
> >> Also I'm still not receiving any emails from you directly (only via
> >> the list), so it's possible I might have missed something you sent.
> >>
> >> Steve
> >>
> >>>
> >>> Signed-off-by: ChunyouTang 
> >>> ---
> >>>  drivers/gpu/drm/panfrost/panfrost_gpu.c | 2 +-
> >>>  1 file changed, 1 insertion(+), 1 deletion(-)
> >>>
> >>> diff --git a/drivers/gpu/drm/panfrost/panfrost_gpu.c b/drivers/gpu/drm/panfrost/panfrost_gpu.c
> >>> index 1fffb6a0b24f..d2d287bbf4e7 100644
> >>> --- a/drivers/gpu/drm/panfrost/panfrost_gpu.c
> >>> +++ b/drivers/gpu/drm/panfrost/panfrost_gpu.c
> >>> @@ -33,7 +33,7 @@ static irqreturn_t panfrost_gpu_irq_handler(int irq, void *data)
> >>>  		address |= gpu_read(pfdev, GPU_FAULT_ADDRESS_LO);
> >>>  
> >>>  		dev_warn(pfdev->dev, "GPU Fault 0x%08x (%s) at 0x%016llx\n",
> >>> -			 fault_status & 0xFF, panfrost_exception_name(pfdev, fault_status & 0xFF),
> >>> +			 fault_status,
> >>> 

Re: [PATCH v2] drm/panfrost:report the full raw fault information instead

2021-06-21 Thread Steven Price
On 19/06/2021 04:18, Chunyou Tang wrote:
> Hi Steve,
> 1. Now I know how to write the subject.
> 2. The low 8 bits are the exception type in the spec.
> 
> You can see this in panfrost_exception_name():
> 
> switch (exception_code) {
> /* Non-Fault Status code */
> case 0x00: return "NOT_STARTED/IDLE/OK";
> case 0x01: return "DONE";
> case 0x02: return "INTERRUPTED";
> case 0x03: return "STOPPED";
> case 0x04: return "TERMINATED";
> case 0x08: return "ACTIVE";
> 
> 
> case 0xD8: return "ACCESS_FLAG";
> case 0xD9 ... 0xDF: return "ACCESS_FLAG";
> case 0xE0 ... 0xE7: return "ADDRESS_SIZE_FAULT";
> case 0xE8 ... 0xEF: return "MEMORY_ATTRIBUTES_FAULT";
> }
> return "UNKNOWN";
> }
> 
> The exception_code in the switch is only 8 bits, so if fault_status
> in panfrost_gpu_irq_handler() is not masked with 0xFF, the lookup
> cannot match the exception reason and returns UNKNOWN whenever any
> upper bit is set.

Yes, I'm happy with the change - I just need a patch that I can apply.
At the moment this patch only changes the first '0x%08x' output rather
than the call to panfrost_exception_name() as well. So we just need a
patch which does:

- fault_status & 0xFF, panfrost_exception_name(pfdev, fault_status),
+ fault_status, panfrost_exception_name(pfdev, fault_status & 0xFF),

along with a suitable subject/commit message describing the change. If
you can send me that I can apply it.
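
To make the requested change concrete, here is a minimal user-space sketch (not the driver code) of the log line it would produce: the raw fault_status is printed in full, and only the value passed to the name lookup is masked. The helper names exception_name() and format_gpu_fault() and the sample values are assumptions for illustration; the 0x88 to SHAREABILITY_FAULT pairing is taken from the kernel log quoted elsewhere in this thread.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for panfrost_exception_name(); one code only. */
static const char *exception_name(uint32_t exception_code)
{
	switch (exception_code) {
	case 0x88: return "SHAREABILITY_FAULT";
	}
	return "UNKNOWN";
}

/*
 * Build the warning text the way the patched dev_warn() call would:
 * print the whole register value, but look up the name from the low
 * 8 bits only.
 */
static int format_gpu_fault(char *buf, size_t len,
			    uint32_t fault_status, uint64_t address)
{
	return snprintf(buf, len, "GPU Fault 0x%08x (%s) at 0x%016llx",
			(unsigned int)fault_status,
			exception_name(fault_status & 0xFF),
			(unsigned long long)address);
}
```

With a hypothetical fault_status of 0x00000188 and address 0x0310fd00, the buffer holds "GPU Fault 0x00000188 (SHAREABILITY_FAULT) at 0x000000000310fd00", so the upper bits stay visible while the name still resolves.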

Thanks,

Steve

PS. Sorry for going round in circles here - I'm trying to help you get
setup so you'll be able to contribute patches easily in future. An
important part of that is ensuring you can send a properly formatted
patch to the list.

PPS. I'm still not receiving your emails directly. I don't think it's a
problem at my end because I'm receiving other emails, but if you can
somehow fix the problem you're likely to receive a faster response.

> On Fri, 18 Jun 2021 13:43:24 +0100
> Steven Price wrote:
> 
>> On 17/06/2021 07:20, ChunyouTang wrote:
>>> From: ChunyouTang 
>>>
>>> of the low 8 bits.
>>
>> Please don't split the subject like this. The first line of the commit
>> should be a (very short) summary of the patch. Then a blank line and
>> then a longer description of what the purpose of the patch is and why
>> it's needed.
>>
>> Also you previously had this as part of a series (the first part
>> adding the "& 0xFF" in the panfrost_exception_name() call). I'm not
>> sure we need two patches for the single line, but as it stands this
>> patch doesn't apply.
>>
>> Also I'm still not receiving any emails from you directly (only via
>> the list), so it's possible I might have missed something you sent.
>>
>> Steve
>>
>>>
>>> Signed-off-by: ChunyouTang 
>>> ---
>>>  drivers/gpu/drm/panfrost/panfrost_gpu.c | 2 +-
>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/gpu/drm/panfrost/panfrost_gpu.c b/drivers/gpu/drm/panfrost/panfrost_gpu.c
>>> index 1fffb6a0b24f..d2d287bbf4e7 100644
>>> --- a/drivers/gpu/drm/panfrost/panfrost_gpu.c
>>> +++ b/drivers/gpu/drm/panfrost/panfrost_gpu.c
>>> @@ -33,7 +33,7 @@ static irqreturn_t panfrost_gpu_irq_handler(int irq, void *data)
>>>  		address |= gpu_read(pfdev, GPU_FAULT_ADDRESS_LO);
>>>  
>>>  		dev_warn(pfdev->dev, "GPU Fault 0x%08x (%s) at 0x%016llx\n",
>>> -			 fault_status & 0xFF, panfrost_exception_name(pfdev, fault_status & 0xFF),
>>> +			 fault_status, panfrost_exception_name(pfdev, fault_status & 0xFF),
>>>  			 address);
>>>  
>>>  		if (state & GPU_IRQ_MULTIPLE_FAULT)
>>>
> 
> 



Re: [PATCH v2] drm/panfrost:report the full raw fault information instead

2021-06-18 Thread Chunyou Tang
Hi Steve,
1. Now I know how to write the subject.
2. The low 8 bits are the exception type in the spec.

You can see this in panfrost_exception_name():

switch (exception_code) {
/* Non-Fault Status code */
case 0x00: return "NOT_STARTED/IDLE/OK";
case 0x01: return "DONE";
case 0x02: return "INTERRUPTED";
case 0x03: return "STOPPED";
case 0x04: return "TERMINATED";
case 0x08: return "ACTIVE";


case 0xD8: return "ACCESS_FLAG";
case 0xD9 ... 0xDF: return "ACCESS_FLAG";
case 0xE0 ... 0xE7: return "ADDRESS_SIZE_FAULT";
case 0xE8 ... 0xEF: return "MEMORY_ATTRIBUTES_FAULT";
}
return "UNKNOWN";
}

The exception_code in the switch is only 8 bits, so if fault_status
in panfrost_gpu_irq_handler() is not masked with 0xFF, the lookup
cannot match the exception reason and returns UNKNOWN whenever any
upper bit is set.
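
The effect can be reproduced outside the kernel with a small sketch. This is an assumption-laden illustration, not the driver source: exception_name() is an abbreviated stand-in that lists only the codes quoted above (with the range cases written as comparisons so it builds as standard C), fault_name() is a hypothetical helper, and the raw value 0x000001E8 is made up to show a status with bits set above the low 8.

```c
#include <stdint.h>

/* Abbreviated stand-in for the lookup quoted above. */
static const char *exception_name(uint32_t exception_code)
{
	if (exception_code >= 0xD8 && exception_code <= 0xDF)
		return "ACCESS_FLAG";
	if (exception_code >= 0xE0 && exception_code <= 0xE7)
		return "ADDRESS_SIZE_FAULT";
	if (exception_code >= 0xE8 && exception_code <= 0xEF)
		return "MEMORY_ATTRIBUTES_FAULT";

	switch (exception_code) {
	case 0x00: return "NOT_STARTED/IDLE/OK";
	case 0x01: return "DONE";
	case 0x02: return "INTERRUPTED";
	case 0x03: return "STOPPED";
	case 0x04: return "TERMINATED";
	case 0x08: return "ACTIVE";
	}
	return "UNKNOWN";
}

/* Hypothetical helper: keep only the low 8 bits before the lookup. */
static const char *fault_name(uint32_t fault_status)
{
	return exception_name(fault_status & 0xFF);
}
```

Here exception_name(0x000001E8) falls through to "UNKNOWN" because no case matches the 9-bit value, while fault_name(0x000001E8) returns "MEMORY_ATTRIBUTES_FAULT".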

On Fri, 18 Jun 2021 13:43:24 +0100
Steven Price wrote:

> On 17/06/2021 07:20, ChunyouTang wrote:
> > From: ChunyouTang 
> > 
> > of the low 8 bits.
> 
> Please don't split the subject like this. The first line of the commit
> should be a (very short) summary of the patch. Then a blank line and
> then a longer description of what the purpose of the patch is and why
> it's needed.
> 
> Also you previously had this as part of a series (the first part
> adding the "& 0xFF" in the panfrost_exception_name() call). I'm not
> sure we need two patches for the single line, but as it stands this
> patch doesn't apply.
> 
> Also I'm still not receiving any emails from you directly (only via
> the list), so it's possible I might have missed something you sent.
> 
> Steve
> 
> > 
> > Signed-off-by: ChunyouTang 
> > ---
> >  drivers/gpu/drm/panfrost/panfrost_gpu.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/drivers/gpu/drm/panfrost/panfrost_gpu.c b/drivers/gpu/drm/panfrost/panfrost_gpu.c
> > index 1fffb6a0b24f..d2d287bbf4e7 100644
> > --- a/drivers/gpu/drm/panfrost/panfrost_gpu.c
> > +++ b/drivers/gpu/drm/panfrost/panfrost_gpu.c
> > @@ -33,7 +33,7 @@ static irqreturn_t panfrost_gpu_irq_handler(int irq, void *data)
> >  		address |= gpu_read(pfdev, GPU_FAULT_ADDRESS_LO);
> >  
> >  		dev_warn(pfdev->dev, "GPU Fault 0x%08x (%s) at 0x%016llx\n",
> > -			 fault_status & 0xFF, panfrost_exception_name(pfdev, fault_status & 0xFF),
> > +			 fault_status, panfrost_exception_name(pfdev, fault_status & 0xFF),
> >  			 address);
> >  
> >  		if (state & GPU_IRQ_MULTIPLE_FAULT)
> > 




Re: [PATCH v2] drm/panfrost:report the full raw fault information instead

2021-06-18 Thread Steven Price
On 17/06/2021 07:20, ChunyouTang wrote:
> From: ChunyouTang 
> 
> of the low 8 bits.

Please don't split the subject like this. The first line of the commit
should be a (very short) summary of the patch. Then a blank line and
then a longer description of what the purpose of the patch is and why
it's needed.

Also you previously had this as part of a series (the first part adding
the "& 0xFF" in the panfrost_exception_name() call). I'm not sure we
need two patches for the single line, but as it stands this patch
doesn't apply.

Also I'm still not receiving any emails from you directly (only via the
list), so it's possible I might have missed something you sent.

Steve

> 
> Signed-off-by: ChunyouTang 
> ---
>  drivers/gpu/drm/panfrost/panfrost_gpu.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/panfrost/panfrost_gpu.c b/drivers/gpu/drm/panfrost/panfrost_gpu.c
> index 1fffb6a0b24f..d2d287bbf4e7 100644
> --- a/drivers/gpu/drm/panfrost/panfrost_gpu.c
> +++ b/drivers/gpu/drm/panfrost/panfrost_gpu.c
> @@ -33,7 +33,7 @@ static irqreturn_t panfrost_gpu_irq_handler(int irq, void *data)
>  		address |= gpu_read(pfdev, GPU_FAULT_ADDRESS_LO);
>  
>  		dev_warn(pfdev->dev, "GPU Fault 0x%08x (%s) at 0x%016llx\n",
> -			 fault_status & 0xFF, panfrost_exception_name(pfdev, fault_status & 0xFF),
> +			 fault_status, panfrost_exception_name(pfdev, fault_status & 0xFF),
>  			 address);
>  
>  		if (state & GPU_IRQ_MULTIPLE_FAULT)
> 



[PATCH v2] drm/panfrost:report the full raw fault information instead

2021-06-17 Thread ChunyouTang
From: ChunyouTang 

of the low 8 bits.

Signed-off-by: ChunyouTang 
---
 drivers/gpu/drm/panfrost/panfrost_gpu.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/panfrost/panfrost_gpu.c b/drivers/gpu/drm/panfrost/panfrost_gpu.c
index 1fffb6a0b24f..d2d287bbf4e7 100644
--- a/drivers/gpu/drm/panfrost/panfrost_gpu.c
+++ b/drivers/gpu/drm/panfrost/panfrost_gpu.c
@@ -33,7 +33,7 @@ static irqreturn_t panfrost_gpu_irq_handler(int irq, void *data)
 		address |= gpu_read(pfdev, GPU_FAULT_ADDRESS_LO);
 
 		dev_warn(pfdev->dev, "GPU Fault 0x%08x (%s) at 0x%016llx\n",
-			 fault_status & 0xFF, panfrost_exception_name(pfdev, fault_status & 0xFF),
+			 fault_status, panfrost_exception_name(pfdev, fault_status & 0xFF),
 			 address);
 
 		if (state & GPU_IRQ_MULTIPLE_FAULT)
-- 
2.25.1