Bug#1035587: linux: broken AHCI controller on MIPS Loongson 3 (regression from 5.10.162-1)

2023-12-20 Thread Salvatore Bonaccorso
Hi,

On Sat, Nov 18, 2023 at 04:50:36PM +0100, Aurelien Jarno wrote:
> Hi,
> 
> On 2023-10-30 09:46, Julien Cristau wrote:
> > Hi,
> > 
> > On Mon, Oct  9, 2023 at 09:08:31 +0100, Jiaxun Yang wrote:
> > 
> > > 
> > > 
> > > On 2023-10-08 at 11:11 AM, Aurelien Jarno wrote:
> > > > On 2023-07-19 16:28, Jiaxun Yang wrote:
> > > >> 
> > > >> 
> > > >> On 2023/7/8 18:11, Aurelien Jarno wrote:
> > > >> [...]
> > > >> > Any news about that? We need to be able to run the latest stable
> > > >> > kernel on the build daemon.
> > > >> 
> > > >> Hi all,
> > > >> 
> > > >> After receiving more reports on that patch, I think we should work
> > > >> around it in the kernel.
> > > >> 
> > > >> I have posted a patch to the kernel; see the kernel bug tracker [1].
> > > >> 
> > > >> [1]: https://bugzilla.kernel.org/show_bug.cgi?id=217680
> > > >
> > > > Any news about that? I haven't spotted any fix for this in Linus' tree
> > > > nor in next.
> > > 
> > > Still waiting for a response from PCI folks.
> > > Will resend the patch later.
> > > 
> > Any news on this?  It's been several months...
> 
> Gentle ping about the issue.

Good news: the issue was fixed in mainline in 6.7-rc6, and the change was
backported to 6.6.8, 6.1.69, and 5.10.205. So it will be in each of the
next rebases.
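
For anyone who wants to check a given kernel package against those
releases, here is a minimal sketch. The function names are made up for
illustration, and it only knows about the three stable series listed
above; it assumes a version string such as "5.10.178-3", where everything
after the first "-" is the Debian revision.

```python
from typing import Optional, Tuple

# First fixed upstream release per stable series, as listed in this thread.
FIRST_FIXED = {
    (5, 10): (5, 10, 205),
    (6, 1): (6, 1, 69),
    (6, 6): (6, 6, 8),
}

def upstream_tuple(version: str) -> Tuple[int, ...]:
    """Parse '5.10.178-3' into (5, 10, 178), dropping the Debian revision."""
    upstream = version.split("-", 1)[0]
    return tuple(int(part) for part in upstream.split("."))

def has_fix(version: str) -> Optional[bool]:
    """Return True/False for the series above, or None for any other series."""
    v = upstream_tuple(version)
    first_fixed = FIRST_FIXED.get(v[:2])
    if first_fixed is None:
        return None
    return v >= first_fixed
```

With this, has_fix("5.10.178-3") reports the version from the original
report as not yet fixed, and any series outside the table yields None
rather than a guess.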

Regards,
Salvatore



Bug#1035587: linux: broken AHCI controller on MIPS Loongson 3 (regression from 5.10.162-1)

2023-11-18 Thread Aurelien Jarno
Hi,

On 2023-10-30 09:46, Julien Cristau wrote:
> Hi,
> 
> On Mon, Oct  9, 2023 at 09:08:31 +0100, Jiaxun Yang wrote:
> 
> > 
> > 
> > On 2023-10-08 at 11:11 AM, Aurelien Jarno wrote:
> > > On 2023-07-19 16:28, Jiaxun Yang wrote:
> > >> 
> > >> 
> > >> On 2023/7/8 18:11, Aurelien Jarno wrote:
> > >> [...]
> > >> > Any news about that? We need to be able to run the latest stable kernel
> > >> > on the build daemon.
> > >> 
> > >> Hi all,
> > >> 
> > >> After receiving more reports on that patch, I think we should work
> > >> around it in the kernel.
> > >> 
> > >> I have posted a patch to the kernel; see the kernel bug tracker [1].
> > >> 
> > >> [1]: https://bugzilla.kernel.org/show_bug.cgi?id=217680
> > >
> > > Any news about that? I haven't spotted any fix for this in Linus' tree
> > > nor in next.
> > 
> > Still waiting for a response from PCI folks.
> > Will resend the patch later.
> > 
> Any news on this?  It's been several months...

Gentle ping about the issue.

Thanks,
Aurelien

-- 
Aurelien Jarno  GPG: 4096R/1DDD8C9B
aurel...@aurel32.net http://aurel32.net



Bug#1035587: linux: broken AHCI controller on MIPS Loongson 3 (regression from 5.10.162-1)

2023-10-30 Thread Julien Cristau
Hi,

On Mon, Oct  9, 2023 at 09:08:31 +0100, Jiaxun Yang wrote:

> 
> 
> On 2023-10-08 at 11:11 AM, Aurelien Jarno wrote:
> > On 2023-07-19 16:28, Jiaxun Yang wrote:
> >> 
> >> 
> >> On 2023/7/8 18:11, Aurelien Jarno wrote:
> >> [...]
> >> > Any news about that? We need to be able to run the latest stable kernel
> >> > on the build daemon.
> >> 
> >> Hi all,
> >> 
> >> After receiving more reports on that patch, I think we should work
> >> around it in the kernel.
> >> 
> >> I have posted a patch to the kernel; see the kernel bug tracker [1].
> >> 
> >> [1]: https://bugzilla.kernel.org/show_bug.cgi?id=217680
> >
> > Any news about that? I haven't spotted any fix for this in Linus' tree
> > nor in next.
> 
> Still waiting for a response from PCI folks.
> Will resend the patch later.
> 
Any news on this?  It's been several months...

Thanks,
Julien



Bug#1035587: linux: broken AHCI controller on MIPS Loongson 3 (regression from 5.10.162-1)

2023-10-09 Thread Jiaxun Yang



On 2023-10-08 at 11:11 AM, Aurelien Jarno wrote:
> On 2023-07-19 16:28, Jiaxun Yang wrote:
>> 
>> 
>> On 2023/7/8 18:11, Aurelien Jarno wrote:
>> [...]
>> > Any news about that? We need to be able to run the latest stable kernel
>> > on the build daemon.
>> 
>> Hi all,
>> 
>> After receiving more reports on that patch, I think we should work
>> around it in the kernel.
>> 
>> I have posted a patch to the kernel; see the kernel bug tracker [1].
>> 
>> [1]: https://bugzilla.kernel.org/show_bug.cgi?id=217680
>
> Any news about that? I haven't spotted any fix for this in Linus' tree
> nor in next.

Still waiting for a response from PCI folks.
Will resend the patch later.

Thanks
Jiaxun


>
> Thanks
> Aurelien
>
> -- 
> Aurelien Jarno  GPG: 4096R/1DDD8C9B
> aurel...@aurel32.net http://aurel32.net

-- 
- Jiaxun



Bug#1035587: linux: broken AHCI controller on MIPS Loongson 3 (regression from 5.10.162-1)

2023-10-08 Thread Aurelien Jarno
On 2023-07-19 16:28, Jiaxun Yang wrote:
> 
> 
> On 2023/7/8 18:11, Aurelien Jarno wrote:
> [...]
> > Any news about that? We need to be able to run the latest stable kernel
> > on the build daemon.
> 
> Hi all,
> 
> After receiving more reports on that patch, I think we should work
> around it in the kernel.
> 
> I have posted a patch to the kernel; see the kernel bug tracker [1].
> 
> [1]: https://bugzilla.kernel.org/show_bug.cgi?id=217680

Any news about that? I haven't spotted any fix for this in Linus' tree
nor in next.

Thanks
Aurelien

-- 
Aurelien Jarno  GPG: 4096R/1DDD8C9B
aurel...@aurel32.net http://aurel32.net



Bug#1035587: linux: broken AHCI controller on MIPS Loongson 3 (regression from 5.10.162-1)

2023-07-19 Thread Jiaxun Yang




On 2023/7/8 18:11, Aurelien Jarno wrote:
> [...]
>
> Any news about that? We need to be able to run the latest stable kernel
> on the build daemon.

Hi all,

After receiving more reports on that patch, I think we should work around
it in the kernel.

I have posted a patch to the kernel; see the kernel bug tracker [1].

[1]: https://bugzilla.kernel.org/show_bug.cgi?id=217680

Thanks
- Jiaxun

> Thanks,
> Aurelien





Bug#1035587: linux: broken AHCI controller on MIPS Loongson 3 (regression from 5.10.162-1)

2023-07-10 Thread Huacai Chen
On Mon, Jul 10, 2023 at 4:14 PM Jiaxun Yang  wrote:
>
> Hi all,
>
> I don't know much about the firmware, so I don't know whether a firmware
> update is possible for that system.
>
> @Huacai, is it acceptable to revert the MRRS/MPS workaround patch for all
> MIPS-based Loongson systems? Or to leave a cmdline option to configure the
> workaround type?
Could you contact the machine provider to get new firmware?

Huacai
>
> Thanks
> - Jiaxun
>
> On 2023/7/8 18:11, Aurelien Jarno wrote:
> > Hi,
> >
> > On 2023-06-24 11:46, Aurelien Jarno wrote:
> >> Hi,
> >>
> >> On 2023-06-19 09:37, Huacai Chen wrote:
> >>> On Sun, Jun 18, 2023 at 5:24 PM Aurelien Jarno  wrote:
>  Hi,
> 
>  On 2023-05-07 19:22, Jiaxun Yang wrote:
> >
> >> On 2023-05-06 at 01:58, YunQiang Su wrote:
> >>
> >> On Sat, 2023-05-06 at 04:30, Aurelien Jarno wrote:
> >>> Source: linux
> >>> Version: 5.10.178-3
> >>> Severity: important
> >>> X-Debbugs-Cc: d...@debian.org, debian-m...@lists.debian.org, 
> >>> s...@debian.org
> >>>
> >>> Following the point release, the buildd mipsel-osuosl-03.d.o does not
> >>> boot anymore, with errors in the AHCI controller:
> >>>
> >>> [   35.912147] ata4.00: exception Emask 0x0 SAct 0x2000 SErr 0x0 action 0x6 frozen
> >>> [   35.919769] ata4.00: failed command: WRITE FPDMA QUEUED
> >>> [   35.924968] ata4.00: cmd 61/20:e8:00:f0:e1/00:00:00:00:00/40 tag 29 ncq dma 16384 out
> >>> [   35.924968]  res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
> >>> [   35.940097] ata4.00: status: { DRDY }
> >>> [   35.943743] ata4: hard resetting link
> >>>
> >>> While that initially looks like a hardware issue, it appears that
> >>> reverting the kernel to 5.10.162-1 (from 5.10.178-3) fixes the issue.
> >>> Strangely mipsel.osuosl-05.d.o, which seems to be similar hardware 
> >>> (CPU,
> >>> motherboard and SATA drive), does not exhibit the same issue.
> >>>
> >> Maybe they are running different firmware versions...
> >> CCed Huacai and Jiaxun.
> > I’m unable to reproduce on my side. Perhaps different hardware.
> > Is it possible to bisect the kernel on that machine to see if reverting
> > those two commits helps?
>  I have bisected the issue and I confirm the intuition from Cyril. The
>  first bad commit is 654ae539254d10042869fdc77ad04c09e7eff1fd. Reverting
>  both commits (they are linked) indeed fixes the issue.
> >>> This seems to be a firmware bug; the latest firmware should configure a suitable MRRS.
> >> Ok, thanks for the feedback. Given it's not a kernel bug, I am closing
> >> it.
> >>
> >> That said, can someone please send us the procedure to upgrade the
> >> firmware on this machine, so that we can continue using it as a buildd?
> > Any news about that? We need to be able to run the latest stable kernel
> > on the build daemon.
> >
> > Thanks,
> > Aurelien
> >
>



Bug#1035587: linux: broken AHCI controller on MIPS Loongson 3 (regression from 5.10.162-1)

2023-07-10 Thread Jiaxun Yang

Hi all,

I don't know much about the firmware, so I don't know whether a firmware
update is possible for that system.

@Huacai, is it acceptable to revert the MRRS/MPS workaround patch for all
MIPS-based Loongson systems? Or to leave a cmdline option to configure the
workaround type?

Thanks
- Jiaxun

On 2023/7/8 18:11, Aurelien Jarno wrote:

Hi,

On 2023-06-24 11:46, Aurelien Jarno wrote:

Hi,

On 2023-06-19 09:37, Huacai Chen wrote:

On Sun, Jun 18, 2023 at 5:24 PM Aurelien Jarno  wrote:

Hi,

On 2023-05-07 19:22, Jiaxun Yang wrote:



On 2023-05-06 at 01:58, YunQiang Su wrote:

On Sat, 2023-05-06 at 04:30, Aurelien Jarno wrote:

Source: linux
Version: 5.10.178-3
Severity: important
X-Debbugs-Cc: d...@debian.org, debian-m...@lists.debian.org, s...@debian.org

Following the point release, the buildd mipsel-osuosl-03.d.o does not
boot anymore, with errors in the AHCI controller:

[   35.912147] ata4.00: exception Emask 0x0 SAct 0x2000 SErr 0x0 action 0x6 frozen
[   35.919769] ata4.00: failed command: WRITE FPDMA QUEUED
[   35.924968] ata4.00: cmd 61/20:e8:00:f0:e1/00:00:00:00:00/40 tag 29 ncq dma 16384 out
[   35.924968]  res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[   35.940097] ata4.00: status: { DRDY }
[   35.943743] ata4: hard resetting link

While that initially looks like a hardware issue, it appears that
reverting the kernel to 5.10.162-1 (from 5.10.178-3) fixes the issue.
Strangely mipsel.osuosl-05.d.o, which seems to be similar hardware (CPU,
motherboard and SATA drive), does not exhibit the same issue.


Maybe they are running different firmware versions...
CCed Huacai and Jiaxun.

I’m unable to reproduce on my side. Perhaps different hardware.
Is it possible to bisect the kernel on that machine to see if reverting
those two commits helps?

I have bisected the issue and I confirm the intuition from Cyril. The
first bad commit is 654ae539254d10042869fdc77ad04c09e7eff1fd. Reverting
both commits (they are linked) indeed fixes the issue.

This seems to be a firmware bug; the latest firmware should configure a suitable MRRS.

Ok, thanks for the feedback. Given it's not a kernel bug, I am closing
it.

That said, can someone please send us the procedure to upgrade the
firmware on this machine, so that we can continue using it as a buildd?

Any news about that? We need to be able to run the latest stable kernel
on the build daemon.

Thanks,
Aurelien





Bug#1035587: linux: broken AHCI controller on MIPS Loongson 3 (regression from 5.10.162-1)

2023-07-08 Thread Aurelien Jarno
Hi,

On 2023-06-24 11:46, Aurelien Jarno wrote:
> Hi,
> 
> On 2023-06-19 09:37, Huacai Chen wrote:
> > On Sun, Jun 18, 2023 at 5:24 PM Aurelien Jarno  wrote:
> > >
> > > Hi,
> > >
> > > On 2023-05-07 19:22, Jiaxun Yang wrote:
> > > >
> > > >
> > > > > On 2023-05-06 at 01:58, YunQiang Su wrote:
> > > > >
> > > > > On Sat, 2023-05-06 at 04:30, Aurelien Jarno wrote:
> > > > >>
> > > > >> Source: linux
> > > > >> Version: 5.10.178-3
> > > > >> Severity: important
> > > > >> X-Debbugs-Cc: d...@debian.org, debian-m...@lists.debian.org, 
> > > > >> s...@debian.org
> > > > >>
> > > > >> Following the point release, the buildd mipsel-osuosl-03.d.o does not
> > > > >> boot anymore, with errors in the AHCI controller:
> > > > >>
> > > > >> [   35.912147] ata4.00: exception Emask 0x0 SAct 0x2000 SErr 0x0 
> > > > >> action 0x6 frozen
> > > > >> [   35.919769] ata4.00: failed command: WRITE FPDMA QUEUED
> > > > >> [   35.924968] ata4.00: cmd 61/20:e8:00:f0:e1/00:00:00:00:00/40 tag 
> > > > >> 29 ncq dma 16384 out
> > > > >> [   35.924968]  res 40/00:00:00:00:00/00:00:00:00:00/00 
> > > > >> Emask 0x4 (timeout)
> > > > >> [   35.940097] ata4.00: status: { DRDY }
> > > > >> [   35.943743] ata4: hard resetting link
> > > > >>
> > > > >> While that initially looks like a hardware issue, it appears that
> > > > >> reverting the kernel to 5.10.162-1 (from 5.10.178-3) fixes the issue.
> > > > >> Strangely mipsel.osuosl-05.d.o, which seems to be similar hardware 
> > > > >> (CPU,
> > > > >> motherboard and SATA drive), does not exhibit the same issue.
> > > > >>
> > > > >
> > > > > Maybe they are running different firmware versions...
> > > > > CCed Huacai and Jiaxun.
> > > >
> > > > I’m unable to reproduce on my side. Perhaps different hardware.
> > > > Is it possible to bisect the kernel on that machine to see if
> > > > reverting those two commits helps?
> > >
> > > I have bisected the issue and I confirm the intuition from Cyril. The
> > > first bad commit is 654ae539254d10042869fdc77ad04c09e7eff1fd. Reverting
> > > both commits (they are linked) indeed fixes the issue.
> > This seems to be a firmware bug; the latest firmware should configure a suitable MRRS.
> 
> Ok, thanks for the feedback. Given it's not a kernel bug, I am closing
> it.
> 
> That said, can someone please send us the procedure to upgrade the
> firmware on this machine, so that we can continue using it as a buildd?

Any news about that? We need to be able to run the latest stable kernel
on the build daemon.

Thanks,
Aurelien

-- 
Aurelien Jarno  GPG: 4096R/1DDD8C9B
aurel...@aurel32.net http://aurel32.net



Bug#1035587: linux: broken AHCI controller on MIPS Loongson 3 (regression from 5.10.162-1)

2023-06-18 Thread Huacai Chen
On Sun, Jun 18, 2023 at 5:24 PM Aurelien Jarno  wrote:
>
> Hi,
>
> On 2023-05-07 19:22, Jiaxun Yang wrote:
> >
> >
> > > On 2023-05-06 at 01:58, YunQiang Su wrote:
> > >
> > > On Sat, 2023-05-06 at 04:30, Aurelien Jarno wrote:
> > >>
> > >> Source: linux
> > >> Version: 5.10.178-3
> > >> Severity: important
> > >> X-Debbugs-Cc: d...@debian.org, debian-m...@lists.debian.org, 
> > >> s...@debian.org
> > >>
> > >> Following the point release, the buildd mipsel-osuosl-03.d.o does not
> > >> boot anymore, with errors in the AHCI controller:
> > >>
> > >> [   35.912147] ata4.00: exception Emask 0x0 SAct 0x2000 SErr 0x0 action 0x6 frozen
> > >> [   35.919769] ata4.00: failed command: WRITE FPDMA QUEUED
> > >> [   35.924968] ata4.00: cmd 61/20:e8:00:f0:e1/00:00:00:00:00/40 tag 29 ncq dma 16384 out
> > >> [   35.924968]  res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
> > >> [   35.940097] ata4.00: status: { DRDY }
> > >> [   35.943743] ata4: hard resetting link
> > >>
> > >> While that initially looks like a hardware issue, it appears that
> > >> reverting the kernel to 5.10.162-1 (from 5.10.178-3) fixes the issue.
> > >> Strangely mipsel.osuosl-05.d.o, which seems to be similar hardware (CPU,
> > >> motherboard and SATA drive), does not exhibit the same issue.
> > >>
> > >
> > > Maybe they are running different firmware versions...
> > > CCed Huacai and Jiaxun.
> >
> > I’m unable to reproduce on my side. Perhaps different hardware.
> > Is it possible to bisect the kernel on that machine to see if reverting
> > those two commits helps?
>
> I have bisected the issue and I confirm the intuition from Cyril. The
> first bad commit is 654ae539254d10042869fdc77ad04c09e7eff1fd. Reverting
> both commits (they are linked) indeed fixes the issue.
This seems to be a firmware bug; the latest firmware should configure a suitable MRRS.
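
For readers unfamiliar with MRRS: it is the Max_Read_Request_Size field
(bits 14:12) of the PCIe Device Control register, and MPS is the
Max_Payload_Size field (bits 7:5); both encode a size of 128 << n bytes,
with n above 5 reserved. A minimal decoding sketch follows. The function
name and the register value in the usage note are illustrative only and
are not taken from this machine.

```python
from typing import Dict, Optional

def _size_from_field(field: int) -> Optional[int]:
    """Decode a 3-bit PCIe size field; encodings 6 and 7 are reserved."""
    if field > 5:
        return None
    return 128 << field

def decode_devctl(devctl: int) -> Dict[str, Optional[int]]:
    """Extract MPS and MRRS (in bytes) from a 16-bit Device Control value."""
    mps_field = (devctl >> 5) & 0x7    # Max_Payload_Size, bits 7:5
    mrrs_field = (devctl >> 12) & 0x7  # Max_Read_Request_Size, bits 14:12
    return {
        "mps_bytes": _size_from_field(mps_field),
        "mrrs_bytes": _size_from_field(mrrs_field),
    }
```

For example, decode_devctl(0x2810) gives an MPS of 128 bytes and an MRRS
of 512 bytes. On a running system the DevCtl line of "lspci -vv" shows
the same two values as MaxPayload and MaxReadReq.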

Huacai
>
> Aurelien
>
> --
> Aurelien Jarno  GPG: 4096R/1DDD8C9B
> aurel...@aurel32.net http://aurel32.net



Bug#1035587: linux: broken AHCI controller on MIPS Loongson 3 (regression from 5.10.162-1)

2023-06-18 Thread Aurelien Jarno
Hi,

On 2023-05-07 19:22, Jiaxun Yang wrote:
> 
> 
> > On 2023-05-06 at 01:58, YunQiang Su wrote:
> > 
> > On Sat, 2023-05-06 at 04:30, Aurelien Jarno wrote:
> >> 
> >> Source: linux
> >> Version: 5.10.178-3
> >> Severity: important
> >> X-Debbugs-Cc: d...@debian.org, debian-m...@lists.debian.org, 
> >> s...@debian.org
> >> 
> >> Following the point release, the buildd mipsel-osuosl-03.d.o does not
> >> boot anymore, with errors in the AHCI controller:
> >> 
> >> [   35.912147] ata4.00: exception Emask 0x0 SAct 0x2000 SErr 0x0 action 0x6 frozen
> >> [   35.919769] ata4.00: failed command: WRITE FPDMA QUEUED
> >> [   35.924968] ata4.00: cmd 61/20:e8:00:f0:e1/00:00:00:00:00/40 tag 29 ncq dma 16384 out
> >> [   35.924968]  res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
> >> [   35.940097] ata4.00: status: { DRDY }
> >> [   35.943743] ata4: hard resetting link
> >> 
> >> While that initially looks like a hardware issue, it appears that
> >> reverting the kernel to 5.10.162-1 (from 5.10.178-3) fixes the issue.
> >> Strangely mipsel.osuosl-05.d.o, which seems to be similar hardware (CPU,
> >> motherboard and SATA drive), does not exhibit the same issue.
> >> 
> > 
> > Maybe they are running different firmware versions...
> > CCed Huacai and Jiaxun.
> 
> I’m unable to reproduce on my side. Perhaps different hardware.
> Is it possible to bisect the kernel on that machine to see if reverting
> those two commits helps?

I have bisected the issue and I confirm the intuition from Cyril. The
first bad commit is 654ae539254d10042869fdc77ad04c09e7eff1fd. Reverting
both commits (they are linked) indeed fixes the issue.
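
For reference, the bisection amounts to a binary search over the ordered
commit range. Below is a minimal sketch of the idea, not the exact
procedure used here: first_bad and is_bad are hypothetical names, and
is_bad stands in for "build this kernel, boot it on the buildd, and check
for the AHCI timeouts".

```python
from typing import Callable, Sequence

def first_bad(commits: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Find the first bad commit in a list ordered oldest to newest.

    Assumes commits[0] is known good and commits[-1] is known bad, and
    that every commit after the first bad one is also bad.
    """
    lo, hi = 0, len(commits) - 1  # invariant: commits[lo] good, commits[hi] bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid
        else:
            lo = mid
    return commits[hi]
```

Each test halves the remaining range, so the number of kernels to build
and boot grows only with the logarithm of the number of commits between
the good and the bad version.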

Aurelien

-- 
Aurelien Jarno  GPG: 4096R/1DDD8C9B
aurel...@aurel32.net http://aurel32.net



Bug#1035587: linux: broken AHCI controller on MIPS Loongson 3 (regression from 5.10.162-1)

2023-05-07 Thread Jiaxun Yang



> On 2023-05-06 at 01:58, YunQiang Su wrote:
> 
> On Sat, 2023-05-06 at 04:30, Aurelien Jarno wrote:
>> 
>> Source: linux
>> Version: 5.10.178-3
>> Severity: important
>> X-Debbugs-Cc: d...@debian.org, debian-m...@lists.debian.org, s...@debian.org
>> 
>> Following the point release, the buildd mipsel-osuosl-03.d.o does not
>> boot anymore, with errors in the AHCI controller:
>> 
>> [   35.912147] ata4.00: exception Emask 0x0 SAct 0x2000 SErr 0x0 action 0x6 frozen
>> [   35.919769] ata4.00: failed command: WRITE FPDMA QUEUED
>> [   35.924968] ata4.00: cmd 61/20:e8:00:f0:e1/00:00:00:00:00/40 tag 29 ncq dma 16384 out
>> [   35.924968]  res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
>> [   35.940097] ata4.00: status: { DRDY }
>> [   35.943743] ata4: hard resetting link
>> 
>> While that initially looks like a hardware issue, it appears that
>> reverting the kernel to 5.10.162-1 (from 5.10.178-3) fixes the issue.
>> Strangely mipsel.osuosl-05.d.o, which seems to be similar hardware (CPU,
>> motherboard and SATA drive), does not exhibit the same issue.
>> 
> 
> Maybe they are running different firmware versions...
> CCed Huacai and Jiaxun.

I’m unable to reproduce on my side. Perhaps different hardware.
Is it possible to bisect the kernel on that machine to see if reverting
those two commits helps?

Thanks
Jiaxun


> 
>> You'll find attached the output of /proc/cpuinfo, lspci and the full
>> boot log.
> 
> 
> 
> -- 
> YunQiang Su



Bug#1035587: linux: broken AHCI controller on MIPS Loongson 3 (regression from 5.10.162-1)

2023-05-05 Thread YunQiang Su
On Sat, 2023-05-06 at 04:30, Aurelien Jarno wrote:
>
> Source: linux
> Version: 5.10.178-3
> Severity: important
> X-Debbugs-Cc: d...@debian.org, debian-m...@lists.debian.org, s...@debian.org
>
> Following the point release, the buildd mipsel-osuosl-03.d.o does not
> boot anymore, with errors in the AHCI controller:
>
> [   35.912147] ata4.00: exception Emask 0x0 SAct 0x2000 SErr 0x0 action 0x6 frozen
> [   35.919769] ata4.00: failed command: WRITE FPDMA QUEUED
> [   35.924968] ata4.00: cmd 61/20:e8:00:f0:e1/00:00:00:00:00/40 tag 29 ncq dma 16384 out
> [   35.924968]  res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
> [   35.940097] ata4.00: status: { DRDY }
> [   35.943743] ata4: hard resetting link
>
> While that initially looks like a hardware issue, it appears that
> reverting the kernel to 5.10.162-1 (from 5.10.178-3) fixes the issue.
> Strangely mipsel.osuosl-05.d.o, which seems to be similar hardware (CPU,
> motherboard and SATA drive), does not exhibit the same issue.
>

Maybe they are running different firmware versions...
CCed Huacai and Jiaxun.

> You'll find attached the output of /proc/cpuinfo, lspci and the full
> boot log.



-- 
YunQiang Su



Bug#1035587: linux: broken AHCI controller on MIPS Loongson 3 (regression from 5.10.162-1)

2023-05-05 Thread Cyril Brulebois
Hi,

Knowing nothing about HW, MIPS, etc. as usual…

Aurelien Jarno  (2023-05-05):
> Following the point release, the buildd mipsel-osuosl-03.d.o does not
> boot anymore, with errors in the AHCI controller:
> 
> [   35.912147] ata4.00: exception Emask 0x0 SAct 0x2000 SErr 0x0 action 0x6 frozen
> [   35.919769] ata4.00: failed command: WRITE FPDMA QUEUED
> [   35.924968] ata4.00: cmd 61/20:e8:00:f0:e1/00:00:00:00:00/40 tag 29 ncq dma 16384 out
> [   35.924968]  res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
> [   35.940097] ata4.00: status: { DRDY }
> [   35.943743] ata4: hard resetting link
> 
> While that initially looks like a hardware issue, it appears that
> reverting the kernel to 5.10.162-1 (from 5.10.178-3) fixes the issue.
> Strangely mipsel.osuosl-05.d.o, which seems to be similar hardware
> (CPU, motherboard and SATA drive), does not exhibit the same issue.

A quick search between both versions suggests 1 AHCI commit, and 2
Loongson ones, both in the PCI layer.

- ab711f3eda7a62c800b41997e818b675812f53a9 is AHCI, apparently
  Intel-only, so not interesting.
- 654ae539254d10042869fdc77ad04c09e7eff1fd and
  faa050d2ff8820f450b69b84645e74b6934ed5ad are about quirks, the first
  one adding them for LS7A, the second one extending that to more LS7A
  ports, and to LS2K.

Glancing at bootlog.txt, it seems both AHCI and PCI work together, and
that LS7A is involved, so maybe those two PCI commits are relevant? Why
the other similar machine isn't impacted, I don't know.

Direct links for commits in the linux-5.10.y branch:
 - https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=654ae539254d10042869fdc77ad04c09e7eff1fd
 - https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=faa050d2ff8820f450b69b84645e74b6934ed5ad


Cheers,
-- 
Cyril Brulebois (k...@debian.org)
D-I release manager -- Release team member -- Freelance Consultant




Bug#1035587: linux: broken AHCI controller on MIPS Loongson 3 (regression from 5.10.162-1)

2023-05-05 Thread Aurelien Jarno
Source: linux
Version: 5.10.178-3
Severity: important
X-Debbugs-Cc: d...@debian.org, debian-m...@lists.debian.org, s...@debian.org

Following the point release, the buildd mipsel-osuosl-03.d.o does not
boot anymore, with errors in the AHCI controller:

[   35.912147] ata4.00: exception Emask 0x0 SAct 0x2000 SErr 0x0 action 0x6 frozen
[   35.919769] ata4.00: failed command: WRITE FPDMA QUEUED
[   35.924968] ata4.00: cmd 61/20:e8:00:f0:e1/00:00:00:00:00/40 tag 29 ncq dma 16384 out
[   35.924968]  res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[   35.940097] ata4.00: status: { DRDY }
[   35.943743] ata4: hard resetting link
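
When comparing boot logs between machines, it can help to pull out just
the libata lines. This is a minimal sketch that assumes the bracketed
timestamp format shown above, with each message on a single line; the
function names are made up for illustration. Continuation lines such as
the "res ..." line carry no "ataN:" prefix and are skipped.

```python
import re
from typing import List, Tuple

# "[   35.919769] ata4.00: failed command: WRITE FPDMA QUEUED"
ATA_LINE = re.compile(r"^\[\s*(\d+\.\d+)\]\s+(ata\d+(?:\.\d+)?):\s+(.*)$")

def parse_ata_lines(log: str) -> List[Tuple[float, str, str]]:
    """Return (timestamp, port_or_device, message) for each libata line."""
    entries = []
    for line in log.splitlines():
        match = ATA_LINE.match(line.strip())
        if match:
            entries.append((float(match.group(1)), match.group(2), match.group(3)))
    return entries

def failed_commands(entries: List[Tuple[float, str, str]]) -> List[Tuple[str, str]]:
    """Return (device, command) pairs for every 'failed command' message."""
    prefix = "failed command: "
    return [(dev, msg[len(prefix):]) for _, dev, msg in entries if msg.startswith(prefix)]
```

Run over the excerpt above, failed_commands(parse_ata_lines(log)) yields a
single entry for ata4.00 with the command WRITE FPDMA QUEUED.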

While that initially looks like a hardware issue, it appears that
reverting the kernel to 5.10.162-1 (from 5.10.178-3) fixes the issue.
Strangely mipsel.osuosl-05.d.o, which seems to be similar hardware (CPU,
motherboard and SATA drive), does not exhibit the same issue.

You'll find attached the output of /proc/cpuinfo, lspci and the full
boot log.
PMON2000 MIPS Initializing. Standby...

node 0 N Voltage  write :
v ctrl err

node 0 N Voltage  read :
00080760uV 

node 0 P Voltage write :

0xbfe00190  : 5282a9b7b7a7
CPU CLK SEL : 0002
MEM CLK SEL : 0014
HT CLK SEL : 0014
Disable HT0 clock
Change the scale of HT1 clock
Change the scale of LS132 clock

BBGEN start  :
BBGEN config value  :00ff6431
Soft CLK SEL adjust begin
CORE & NODE:
Miku MAGIC Mismtach
04110c85
MEM   :0909017b
fdcoefficient  :0004
HT:
SYS_LOOPC:0012

DDR_LOOPC:0024
NO TLB cache init ...
Jump to 9fc
32 bit PCI space translate to 64 bit HT space

Check HT bus up.
01110020
set LS7A MISC and confbus base address done.
3A HT in soft freq cfg mode...ok
7A HT in soft freq cfg mode...ok

PLL check success.
Wait HT bus up.01110020>
01110020
Set 7A side HT:
Set width
0020
Set Freq
82251060
Set soft config
008a810a
Set Gen3 mode
81237008
Set retry mode
0081
Enable scrambling
0078
set buffer num
0fff
Set CPU side HT:
Set width
0020
Set Freq
82250060
Set soft config
0087c10a
Set GEN3 mode
81237008
Set retry mode
0081
Enable scrambling
0078
Reset Node 0 HT1-lo bus
0040
Wait HT bus down.>
0010

Dereset Node 0 HT1 bus

Wait HT bus up.>
0020

After reconnect, PLL check success.
Checking Node 0 HT1 CRC error.>
Checking Bridge HT CRC error bit.>
LS3A-7A linkup.
Disable ht regs.

Start Init Memory, wait a while..
NODE 0 MEMORY CONFIG BEGIN

Lock Scache
Lock Scache Done.

Probing DDR MC0 SLOT: 
Slot 0: s1 = 0x00114008__c3004000

Slot 1: s1 = 0x__

 T5 s1 = 00114008__c3005f00

 t0 = 0x__

Enable register space of MEMORY

init start
908:000f0f0300e1e1c1
 begin Reset MC 
init start
908:000f0f0300e1e1c1
 begin Reset MC 
init start
908:000f200300e1e1c1
 begin Reset MC 
init start
908:0016050a00e1e1c1
 begin Reset MC 
init start
908:001e0ce1e1c1
 begin Reset MC 
init start
908:001b0a0f00e1e1c1
 begin Reset MC 
init start
908:000f0f0200e1e1c1
 begin Reset MC 
init start
908:00200f1400e1e1c1
 begin Reset MC 
init start
908:0007070c00e1e1c1
 begin Reset MC 
init start
908:000f200300e1e1c1
 begin Reset MC 
init start
908:00200f0200e1e1c1
 begin Reset MC 
init start
908:0016160a00e1e1c1
 begin Reset MC 
init start
908:000c0c1100e1e1c1
 begin Reset MC 
init start
908:000f200200e1e1c1
 begin Reset MC 
init start
908:000c0ce1e1c1
 begin Reset MC 
init start
908:000f0f1300e1e1c1
 begin Reset MC 
init start
908:000c1e0100e1e1c1
 begin Reset MC 
init start
908:000c0ce1e1c1
 begin Reset MC 
init start
908:000a0a0f00e1e1c1
 begin Reset MC 
init start
908:0005050a00e1e1c1
init done
Start Hard Leveling...
Enable register space of MEMORY

start training of tPHY_WR

tPHY_WRLAT training successThe MC param after leveling is:
PHY:
:  0011
0008:  0037
0010:  0103
0018:  
0020:  0001
0028:  
0030:  0052010100040510
0038:  0144
0040:  0008002a
0048:  02041c3801010100
0050:  0008002a
0058:  02041c38
0060:  0008002a
0068:  02041c38
0070:  0008002a
0078:  03041c3800010100
0080:  0008002a
0088:  02041c38
0090:  0008002a
0098:  02041c38
00a0:  0008002a
00a8:  02041c38
00b0:  0008002a
00b8:  03041c3800010100
00c0:  0008002a
00c8:  02041c38
00d0:  0008002a
00d8:  02041c38
00e0:  0001ff000f00
00e8:  0001ff000f00
00f0:  0001ff000f00
00f8:  0001ff000f00
0100:  16002016
0108:  001c1918
0110:  820400880080
0118:  00140003
0120:  
0128:  
0130:  
0138:  
0140:  
0148:  
0150: