Re: Does '5.7.6-1' work on USIIIi for you? (was: Re: sparc64 kernel crashes, & using SAS/SATA drives instead of SCA/FC-AL)

2020-07-13 Thread Romain Dolbeau
On Mon, 13 Jul 2020 at 09:27, John Paul Adrian Glaubitz wrote:
> Please do thorough tests in the future before claiming a bug has been fixed.

That's the weird thing: I would have thought more than 10 hours of parallel
compilation was a 'thorough' test...
Apparently not :-(

And I did not claim the bug was fixed; I merely asked for further
tests to see whether it was:
> (...) try the current kernel and see if it fixes the problem? And if it
> doesn't, (...)

Clearly, it doesn't for everyone :-(

Cordially,

-- 
Romain Dolbeau



Re: Does '5.7.6-1' work on USIIIi for you? (was: Re: sparc64 kernel crashes, & using SAS/SATA drives instead of SCA/FC-AL)

2020-07-13 Thread John Paul Adrian Glaubitz



On 7/13/20 9:33 AM, Romain Dolbeau wrote:
> On Mon, 13 Jul 2020 at 09:01, John Paul Adrian Glaubitz wrote:
>> I switched one of the buildds which has an UltraSPARC IIIi to kernel 5.7.6 and
>> got this:
> 
> Looks like the crash I had, as far as I can remember.
> 
> Any idea about the workload at the time?
> Is there a lot of parallelism in the build system? Could the memory
> have been saturated somehow?
> On my side I was running at -j3 to force a bit of context switching
> (the SB has dual CPUs), but there's 8 GiB in there and I don't think I got
> close to the limit...

Try running the GCC or glibc testsuites; these will usually kill the machine.
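When a stress run ends with an unreachable machine and no console, it helps to know how far the run got. A minimal sketch, assuming Python 3 on the test box; `run_stress_rounds` and its log format are hypothetical, not an existing tool. It repeats a command and fsyncs a "start" and "end" line around every round, so the log read after a hard crash shows which round was running.

```python
import os
import subprocess
import sys
import time


def _log_line(log, text):
    """Append one line and push it to disk before returning."""
    log.write(text + "\n")
    log.flush()             # move Python's buffer into the kernel
    os.fsync(log.fileno())  # ask the kernel to write it to the device


def run_stress_rounds(cmd, rounds, log_path, cwd=None):
    """Run `cmd` `rounds` times, logging the start and end of every round.

    Returns the exit codes of the rounds that completed.
    """
    codes = []
    with open(log_path, "a", encoding="utf-8") as log:
        for i in range(rounds):
            _log_line(log, f"start round={i} t={time.time():.0f}")
            began = time.monotonic()
            proc = subprocess.run(cmd, cwd=cwd)
            secs = time.monotonic() - began
            codes.append(proc.returncode)
            _log_line(log, f"end round={i} rc={proc.returncode} secs={secs:.1f}")
            print(f"round {i} finished with rc={proc.returncode}", file=sys.stderr)
    return codes
```

For example, `run_stress_rounds(["make", "-k", "-j3", "check"], 5, "/var/tmp/stress.log", cwd="/path/to/gcc-build")`, where the paths are placeholders; `make -k check` is the usual way to run GCC's testsuite from a build tree. A final "start" line with no matching "end" marks the round during which the machine died.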

Adrian

-- 
 .''`.  John Paul Adrian Glaubitz
: :' :  Debian Developer - glaub...@debian.org
`. `'   Freie Universitaet Berlin - glaub...@physik.fu-berlin.de
  `-GPG: 62FF 8A75 84E0 2956 9546  0006 7426 3B37 F5B5 F913



Re: Does '5.7.6-1' work on USIIIi for you? (was: Re: sparc64 kernel crashes, & using SAS/SATA drives instead of SCA/FC-AL)

2020-07-13 Thread Romain Dolbeau
On Mon, 13 Jul 2020 at 09:01, John Paul Adrian Glaubitz wrote:
> I switched one of the buildds which has an UltraSPARC IIIi to kernel 5.7.6 and
> got this:

Looks like the crash I had, as far as I can remember.

Any idea about the workload at the time?
Is there a lot of parallelism in the build system? Could the memory
have been saturated somehow?
On my side I was running at -j3 to force a bit of context switching
(the SB has dual CPUs), but there's 8 GiB in there and I don't think I got
close to the limit...
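Whether memory was actually saturated can be checked by sampling `MemAvailable` from `/proc/meminfo` while the build runs. A minimal sketch, assuming a Linux kernel recent enough to report `MemAvailable` (3.14 or later); the function names are illustrative only.

```python
import time


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field: integer value}.

    Most fields are in kB; a few (e.g. HugePages_Total) are plain counts.
    """
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        fields = rest.split()
        if sep and fields:
            info[key.strip()] = int(fields[0])
    return info


def available_fraction(info):
    """Share of MemTotal the kernel estimates is available without swapping."""
    return info["MemAvailable"] / info["MemTotal"]


def lowest_available(samples, interval_s, path="/proc/meminfo"):
    """Sample `samples` times and return the lowest available fraction seen."""
    lowest = 1.0
    for _ in range(samples):
        with open(path, encoding="ascii") as f:
            lowest = min(lowest, available_fraction(parse_meminfo(f.read())))
        time.sleep(interval_s)
    return lowest
```

Running `lowest_available(720, 5)` in a second SSH session during a `-j3` build samples every 5 seconds for an hour; a minimum close to zero would point at memory pressure, while a comfortable minimum makes it an unlikely trigger.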

Cordially,

-- 
Romain Dolbeau



Re: Does '5.7.6-1' work on USIIIi for you? (was: Re: sparc64 kernel crashes, & using SAS/SATA drives instead of SCA/FC-AL)

2020-07-13 Thread John Paul Adrian Glaubitz
On 7/13/20 9:01 AM, John Paul Adrian Glaubitz wrote:
> On 7/13/20 8:17 AM, Romain Dolbeau wrote:
>> I really don't understand; I had two crashes before but cannot induce one now...
> 
> I switched one of the buildds which has an UltraSPARC IIIi to kernel 5.7.6 and
> got this:

And now the machine is no longer reachable.

Please do thorough tests in the future before claiming a bug has been fixed. I
will now have to spend several hours getting the machine working again, because
I don't have access to the console and have to give instructions to its
owner.

*sigh*




Re: Does '5.7.6-1' work on USIIIi for you? (was: Re: sparc64 kernel crashes, & using SAS/SATA drives instead of SCA/FC-AL)

2020-07-13 Thread John Paul Adrian Glaubitz
On 7/13/20 8:17 AM, Romain Dolbeau wrote:
> I really don't understand; I had two crashes before but cannot induce one now...

I switched one of the buildds which has an UltraSPARC IIIi to kernel 5.7.6 and
got this:

[ 7483.258336] Unable to handle kernel paging request at virtual address 71e76b6501458000
[ 7483.362427] tsk->{mm,active_mm}->context = 018a
[ 7483.435619] tsk->{mm,active_mm}->pgd = fff23825
[ 7483.504245]   \|/  \|/
 "@'/ .. \`@"
 /_| \__/ |_\
\__U_/
[ 7483.504249] kworker/0:3(2209): Oops [#1]
[ 7483.504257] CPU: 0 PID: 2209 Comm: kworker/0:3 Tainted: GE 5.7.0-1-sparc64-smp #1 Debian 5.7.6-1
[ 7483.504274] Workqueue: memcg_kmem_cache kmemcg_workfn
[ 7483.504281] TSTATE: 004480e01604 TPC: 0065eba0 TNPC: 0065eba4 Y: Tainted: GE
[ 7483.504292] TPC: 
[ 7483.504295] g0:  g1: 0018 g2: 00bc g3: 71e76b6501458ab8
[ 7483.504299] g4: fff03e1a6b40 g5: fff23d90a000 g6: fff238b38000 g7: 10772cbe72ed3bf5
[ 7483.504302] o0: 0200 o1: 000c0482dc60 o2: 0001 o3: f20145801800
[ 7483.504305] o4: 00fff2014580 o5: 00fff201 sp: fff238b3ac71 ret_pc: 0065ec18
[ 7483.504309] RPC: 
[ 7483.504313] l0: 00bc0199 l1: 00bc0199 l2: ff00 l3: ff00
[ 7483.504316] l4: 00ff l5: 00ff l6: ff00 l7: 00ff
[ 7483.504320] i0: fff03fc84680 i1: 000c0482dc60 i2: 71e76b6501458aa0 i3: 69672e6403457a5f
[ 7483.504323] i4: 71e76b6501458aa0 i5: fff201458000 i6: fff238b3adb1 i7: 0065f5a0
[ 7483.504327] I7: 
[ 7483.504330] Call Trace:
[ 7483.504335]  [0065f5a0] flush_cpu_slab+0x40/0x80
[ 7483.504344]  [0050deec] on_each_cpu_cond_mask+0x6c/0x80
[ 7483.504349]  [0050df20] on_each_cpu_cond+0x20/0x40
[ 7483.504354]  [006637a0] __kmem_cache_shrink+0x20/0x2a0
[ 7483.504359]  [00663a2c] __kmemcg_cache_deactivate_after_rcu+0xc/0x60
[ 7483.504364]  [0061080c] kmemcg_cache_deactivate_after_rcu+0xc/0x40
[ 7483.504369]  [006107e0] kmemcg_workfn+0x20/0x40
[ 7483.504379]  [0048bc58] process_one_work+0x1b8/0x4e0
[ 7483.504383]  [0048c0c0] worker_thread+0x140/0x540
[ 7483.504390]  [0049279c] kthread+0xdc/0x120
[ 7483.504399]  [004060a4] ret_from_fork+0x1c/0x2c
[ 7483.504403]  [] 0x0
[ 7483.504406] Disabling lock debugging due to kernel taint
[ 7483.504410] Caller[0065f5a0]: flush_cpu_slab+0x40/0x80
[ 7483.504415] Caller[0050deec]: on_each_cpu_cond_mask+0x6c/0x80
[ 7483.504419] Caller[0050df20]: on_each_cpu_cond+0x20/0x40
[ 7483.504423] Caller[006637a0]: __kmem_cache_shrink+0x20/0x2a0
[ 7483.504428] Caller[00663a2c]: __kmemcg_cache_deactivate_after_rcu+0xc/0x60
[ 7483.504433] Caller[0061080c]: kmemcg_cache_deactivate_after_rcu+0xc/0x40
[ 7483.504437] Caller[006107e0]: kmemcg_workfn+0x20/0x40
[ 7483.504442] Caller[0048bc58]: process_one_work+0x1b8/0x4e0
[ 7483.504445] Caller[0048c0c0]: worker_thread+0x140/0x540
[ 7483.504449] Caller[0049279c]: kthread+0xdc/0x120
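Comparing this oops with earlier ones ("looks like the crash I had") is easier once each log is reduced to its call chain. A sketch that pulls the CPU/PID/Comm header and the function names out of the Call Trace lines in the log above; the regexes assume exactly this log's layout, and `same_path` is only a rough heuristic, not a kernel tool.

```python
import re

# Matches Call Trace lines such as
#   [ 7483.504335]  [0065f5a0] flush_cpu_slab+0x40/0x80
# The archive has stripped some address digits, so any hex length is accepted.
TRACE_RE = re.compile(
    r"\s*\[\s*[\d.]+\]\s+\[([0-9a-f]+)\]\s+([\w.]+)\+0x([0-9a-f]+)/0x([0-9a-f]+)"
)
HEADER_RE = re.compile(r"CPU: (\d+) PID: (\d+) Comm: (\S+)")


def parse_oops(text):
    """Return ((cpu, pid, comm) or None, [function names, innermost first])."""
    header = None
    frames = []
    for line in text.splitlines():
        if header is None:
            m = HEADER_RE.search(line)
            if m:
                header = (int(m.group(1)), int(m.group(2)), m.group(3))
        t = TRACE_RE.match(line)  # "Caller[...]" duplicates are not matched
        if t:
            frames.append(t.group(2))
    return header, frames


def same_path(frames_a, frames_b, depth=4):
    """Rough duplicate check: do the innermost `depth` frames agree?"""
    return len(frames_a) >= depth and frames_a[:depth] == frames_b[:depth]
```

Applied to this log, the chain starts at `flush_cpu_slab` called from the `kmemcg_workfn` workqueue, which is what another report would need to match.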

Adrian




Re: Does '5.7.6-1' work on USIIIi for you? (was: Re: sparc64 kernel crashes, & using SAS/SATA drives instead of SCA/FC-AL)

2020-07-13 Thread Romain Dolbeau
On Sun, 12 Jul 2020 at 14:38, John Paul Adrian Glaubitz wrote:
> Sounds good. I will give it a try.

Thanks to Connor's message about Firefox, I discovered
snapshot.debian.org and was able to install the 5.6 kernel I couldn't find
before.
But I couldn't get it to crash with my GCC rebuild script; I gave up
near the end of stage 1 (by which point it had compiled, checked and installed
binutils/gmp/mpfr/mpc/isl).
Then I went back to 5.7 and tried again from the console, thinking
the crash might be related to working on the console rather than over SSH.
Again, no crash.

I really don't understand; I had two crashes before but cannot induce one now...

On Sun, 12 Jul 2020 at 15:27, Gregor Riepl wrote:
> From what I can gather on the net, the GPU on this card is a 3DLabs Wildcat

I believe so; thanks for the links. I only mentioned the lack of X because
it could be a factor in the crashes.
If I really wanted to run X11 on it under Linux (it should work on
Solaris), I have a spare XVR-100 I could swap in that should be OK.

Cordially,

-- 
Romain Dolbeau



Re: Does '5.7.6-1' work on USIIIi for you? (was: Re: sparc64 kernel crashes, & using SAS/SATA drives instead of SCA/FC-AL)

2020-07-12 Thread Gregor Riepl


> (the XVR-600 isn't supported in X).

From what I can gather on the net, the GPU on this card is a 3DLabs
Wildcat, with conflicting information about its generation (2 or 4).

Sadly, there are no open-source drivers for these cards, and the only
proprietary ones available[1] will likely not work on a modern box and
require an x86 CPU.

You *could* stuff another PCI card in there if you don't need the
OpenBoot messages... Maybe a Radeon 9200 or even this: [2]

[1] https://www.schneider-digital.com/support/download/driver/Grafikkarten/3Dlabs/
[2] https://www.zotac.com/pk/product/graphics_card/gt-610-pci



Re: Does '5.7.6-1' work on USIIIi for you? (was: Re: sparc64 kernel crashes, & using SAS/SATA drives instead of SCA/FC-AL)

2020-07-12 Thread John Paul Adrian Glaubitz
On 7/12/20 1:57 PM, Romain Dolbeau wrote:
> After failing to deliberately crash my home-cross-compiled vanilla 5.2
> on the Sun Blade 2500 Red, I installed the current kernel in Sid:
> 
> linux-image-5.7.0-1-sparc64-smp  5.7.6-1
> 
> And managed to do a full rebuild of GCC 10.1 (starting with recent
> binutils/gmp/mpfr/mpc/isl before a complete 3-stage bootstrap), and in
> parallel do a git checkout of Linux, some package installation and a
> configure/rebuild of ZFS. It took almost a day (with a shutdown/reboot in
> the middle of stage 2), no crash in sight, though I tried via SSH only
> (the XVR-600 isn't supported in X). The machine has been rock-solid so
> far (running from a SAS drive on a flashed 1068)...
> 
> For those with crashes - could you try the current kernel and see if
> it fixes the problem? And if it doesn't, what kind of workload do you
> have when the kernel crashes? I've seen the crashes myself but can't
> reproduce them anymore, and I no longer have the package for the 5.6 I might
> have been running at the time...

Sounds good. I will give it a try.

Adrian




Does '5.7.6-1' work on USIIIi for you? (was: Re: sparc64 kernel crashes, & using SAS/SATA drives instead of SCA/FC-AL)

2020-07-12 Thread Romain Dolbeau
On Thu, 9 Jul 2020 at 13:26, Romain Dolbeau wrote:
> So - has anyone made any progress on this, or do we still need a
> bisect? If the latter, is there any known way to quickly cause a crash
> to determine whether a tested kernel is good or bad?

I wanted to have a go at bisecting, so first I recovered a 4.17
package from the archive on my T5120 to have something that should work
as a backup - and in doing so, I removed the Debian kernels I had... maybe
that was a mistake, as I could have been running 5.6 at the time; I'm
not sure.
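`git bisect` (`git bisect start`, `git bisect good`, `git bisect bad`) does this bookkeeping over commits. The sketch below shows the same search over a hypothetical ordered list of kernel builds, each probe halving the range, so roughly log2(n) builds need testing. It also spells out the assumption that hurts with an intermittent crash: "good" only means "survived the test budget". `crashes_within` and its `trial` callable are hypothetical.

```python
def bisect_first_bad(candidates, is_bad):
    """Find the first bad entry in an ordered list of kernel builds.

    Assumes candidates[0] is known good, candidates[-1] is known bad, and
    that badness is monotonic (every build after the first bad one is also
    bad). Returns (first_bad, probes), where probes lists what was tested.
    """
    if len(candidates) < 2:
        raise ValueError("need at least a known-good and a known-bad entry")
    lo, hi = 0, len(candidates) - 1  # invariant: lo is good, hi is bad
    probes = []
    while hi - lo > 1:
        mid = (lo + hi) // 2
        probes.append(candidates[mid])
        if is_bad(candidates[mid]):
            hi = mid
        else:
            lo = mid
    return candidates[hi], probes


def crashes_within(trial, attempts):
    """Treat a build as bad if any of `attempts` stress trials crashes.

    `trial` returns True on a crash. A False result only means "no crash
    seen", which is weaker than "good" for an intermittent bug.
    """
    return any(trial() for _ in range(attempts))
```

A single wrong "good" verdict sends the search into the wrong half, so it is worth spending more attempts per probe than a deterministic bug would need.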

After failing to deliberately crash my home-cross-compiled vanilla 5.2
on the Sun Blade 2500 Red, I installed the current kernel in Sid:

linux-image-5.7.0-1-sparc64-smp  5.7.6-1

And managed to do a full rebuild of GCC 10.1 (starting with recent
binutils/gmp/mpfr/mpc/isl before a complete 3-stage bootstrap), and in
parallel do a git checkout of Linux, some package installation and a
configure/rebuild of ZFS. It took almost a day (with a shutdown/reboot in
the middle of stage 2), no crash in sight, though I tried via SSH only
(the XVR-600 isn't supported in X). The machine has been rock-solid so
far (running from a SAS drive on a flashed 1068)...
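The workload above mixed several unrelated jobs at once. A minimal sketch of launching such a mix from one script so each job's exit code and finishing time are recorded; `run_concurrently` is hypothetical and the commands in the usage note are placeholders, not the actual setup described here.

```python
import subprocess
import sys
import time


def run_concurrently(commands, poll_s=1.0):
    """Start every command at once and wait until all have exited.

    `commands` maps a label to an argv list. Returns {label: (exit_code,
    seconds_until_exit)}; the time is measured at the first poll that saw
    the process gone, so it is accurate to about `poll_s`.
    """
    start = time.monotonic()
    pending = {label: subprocess.Popen(argv) for label, argv in commands.items()}
    results = {}
    while pending:
        for label, proc in list(pending.items()):
            code = proc.poll()
            if code is not None:
                results[label] = (code, time.monotonic() - start)
                print(f"{label}: rc={code}", file=sys.stderr)
                del pending[label]
        if pending:
            time.sleep(poll_s)
    return results
```

For example, `run_concurrently({"gcc": ["make", "-j3"], "linux": ["git", "checkout", "v5.7"]})`, run from suitable working directories. Polling rather than waiting on each process in turn keeps the finishing times independent of the order the jobs were started.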

For those with crashes - could you try the current kernel and see if
it fixes the problem? And if it doesn't, what kind of workload do you
have when the kernel crashes? I've seen the crashes myself but can't
reproduce them anymore, and I no longer have the package for the 5.6 I might
have been running at the time...

Cordially,

-- 
Romain Dolbeau