Re: Regression in 028abd92 for Sun UltraSPARC T1

2021-03-25 Thread Christoph Hellwig
I have to admit I'm completely lost at this point.  This new trace looks
totally strange to me, and I'm pretty sure whatever symptoms you see are
due to different alignments / code sections etc just triggered by the
removal.  We need help from the real sparc experts.



Re: Regression in 028abd92 for Sun UltraSPARC T1

2021-03-24 Thread Frank Scheiner

On 24.03.21 17:33, Frank Scheiner wrote:

On 24.03.21 17:10, Christoph Hellwig wrote:

On Wed, Mar 24, 2021 at 04:58:39PM +0100, Frank Scheiner wrote:

[   20.090279] [<006c6494>] sys_mount+0x114/0x1e0
[   20.090338] [<006c6454>] sys_mount+0xd4/0x1e0
[   20.090499] [<00406274>] linux_sparc_syscall+0x34/0x44
[   20.090697] Disabling lock debugging due to kernel taint
[   20.090770] Caller[006c6494]: sys_mount+0x114/0x1e0
[   20.090926] Caller[006c6454]: sys_mount+0xd4/0x1e0
[   20.091133] Caller[00406274]: linux_sparc_syscall+0x34/0x44
[   20.091196] Caller[00100aa8]: 0x100aa8
[...]
```

[1]: https://pastebin.com/ApPYsMcu

Here the result for the suggested command:


Thanks.  And very strange, as I can't find what would free options
before.  Does the system boot if you comment out that kfree in line
3415 (even if that causes a memleak elsewhere)?


Unfortunately not, the result with the kfree() commented out in
fs/namespace.c:3415 looks pretty similar in my eyes.


Actually on second view the result looks different. :-/



Re: Regression in 028abd92 for Sun UltraSPARC T1

2021-03-24 Thread Frank Scheiner

On 24.03.21 17:10, Christoph Hellwig wrote:

On Wed, Mar 24, 2021 at 04:58:39PM +0100, Frank Scheiner wrote:

[   20.090279] [<006c6494>] sys_mount+0x114/0x1e0
[   20.090338] [<006c6454>] sys_mount+0xd4/0x1e0
[   20.090499] [<00406274>] linux_sparc_syscall+0x34/0x44
[   20.090697] Disabling lock debugging due to kernel taint
[   20.090770] Caller[006c6494]: sys_mount+0x114/0x1e0
[   20.090926] Caller[006c6454]: sys_mount+0xd4/0x1e0
[   20.091133] Caller[00406274]: linux_sparc_syscall+0x34/0x44
[   20.091196] Caller[00100aa8]: 0x100aa8
[...]
```

[1]: https://pastebin.com/ApPYsMcu

Here the result for the suggested command:


Thanks.  And very strange, as I can't find what would free options
before.  Does the system boot if you comment out that kfree in line
3415 (even if that causes a memleak elsewhere)?


Unfortunately not, the result with the kfree() commented out in
fs/namespace.c:3415 looks pretty similar in my eyes. The log is at [2].

[2]: https://pastebin.com/zmSFpv3R

Cheers,
Frank



Re: Regression in 028abd92 for Sun UltraSPARC T1

2021-03-24 Thread Christoph Hellwig
On Wed, Mar 24, 2021 at 04:58:39PM +0100, Frank Scheiner wrote:
> [   20.090279] [<006c6494>] sys_mount+0x114/0x1e0
> [   20.090338] [<006c6454>] sys_mount+0xd4/0x1e0
> [   20.090499] [<00406274>] linux_sparc_syscall+0x34/0x44
> [   20.090697] Disabling lock debugging due to kernel taint
> [   20.090770] Caller[006c6494]: sys_mount+0x114/0x1e0
> [   20.090926] Caller[006c6454]: sys_mount+0xd4/0x1e0
> [   20.091133] Caller[00406274]: linux_sparc_syscall+0x34/0x44
> [   20.091196] Caller[00100aa8]: 0x100aa8
> [...]
> ```
>
> [1]: https://pastebin.com/ApPYsMcu
>
> Here the result for the suggested command:

Thanks.  And very strange, as I can't find what would free options
before.  Does the system boot if you comment out that kfree in line
3415 (even if that causes a memleak elsewhere)?



Re: Regression in 028abd92 for Sun UltraSPARC T1

2021-03-24 Thread Frank Scheiner




On 24.03.21 16:22, Jan Engelhardt wrote:


On Wednesday 2021-03-24 14:57, Frank Scheiner wrote:


(gdb) l *(sys_mount+0x114/0x1e0)
0x6c6380 is in __se_sys_mount (fs/namespace.c:3390).


/0x1e0 does not normally belong there. Just

l *(sys_mount+0x114)



I guess this comes from my log on [1]:

```
[...]
[   20.089289] RPC: 
[   20.089415] l0: 8001f8885cc8 l1: 8001f8881380 l2: 8001ec434558 l3: 00201db0
[   20.089586] l4: 029c l5: 8001c1a0 l6: 8001ec79c000 l7: 006c6380
[   20.089802] i0: 1000 i1: 8001ec436000 i2: 006c6494 i3: 8001ec436000
[   20.089877] i4: 88405340 i5: 645396c0 i6: 8001ec79f561 i7: 006c6494
[   20.090051] I7: 
[   20.090186] Call Trace:
[   20.090279] [<006c6494>] sys_mount+0x114/0x1e0
[   20.090338] [<006c6454>] sys_mount+0xd4/0x1e0
[   20.090499] [<00406274>] linux_sparc_syscall+0x34/0x44
[   20.090697] Disabling lock debugging due to kernel taint
[   20.090770] Caller[006c6494]: sys_mount+0x114/0x1e0
[   20.090926] Caller[006c6454]: sys_mount+0xd4/0x1e0
[   20.091133] Caller[00406274]: linux_sparc_syscall+0x34/0x44
[   20.091196] Caller[00100aa8]: 0x100aa8
[...]
```

[1]: https://pastebin.com/ApPYsMcu

Here the result for the suggested command:
```
root@t1000:~/mnt/torvalds-linux# gdb -q vmlinux
Reading symbols from vmlinux...
(gdb) l *(sys_mount+0x114)
0x6c6494 is in __se_sys_mount (fs/namespace.c:3415).
3410		if (IS_ERR(options))
3411			goto out_data;
3412
3413		ret = do_mount(kernel_dev, dir_name, kernel_type, flags, options);
3414
3415		kfree(options);
3416	out_data:
3417		kfree(kernel_dev);
3418	out_dev:
3419		kfree(kernel_type);
(gdb)
```

Cheers,
Frank



Re: Regression in 028abd92 for Sun UltraSPARC T1

2021-03-24 Thread Jan Engelhardt


On Wednesday 2021-03-24 14:57, Frank Scheiner wrote:

> (gdb) l *(sys_mount+0x114/0x1e0)
> 0x6c6380 is in __se_sys_mount (fs/namespace.c:3390).

/0x1e0 does not normally belong there. Just

l *(sys_mount+0x114)
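
A short illustration of why the trailing `/0x1e0` changes the result: in a call trace entry of the form `symbol+0xOFFSET/0xSIZE`, the part after the slash is the size of the function, not part of the address. If the whole string is pasted into gdb, the expression is presumably evaluated with integer division, so `0x114/0x1e0` becomes 0 and gdb lists the start of the function. The sketch below is plain Python, not gdb scripting; the helper names are made up for illustration, and the start address 0x6c6380 is taken from the gdb output quoted in this thread.

```python
import re

# Matches the "symbol+0xOFFSET/0xSIZE" form printed in kernel call traces.
TRACE_RE = re.compile(
    r"(?P<symbol>[A-Za-z_]\w*)\+(?P<offset>0x[0-9a-fA-F]+)/(?P<size>0x[0-9a-fA-F]+)"
)


def parse_trace_entry(entry):
    """Split a call trace entry into (symbol, offset, function size)."""
    match = TRACE_RE.search(entry)
    if match is None:
        raise ValueError(f"no symbol+offset/size found in: {entry!r}")
    return (
        match.group("symbol"),
        int(match.group("offset"), 16),
        int(match.group("size"), 16),
    )


def gdb_list_command(entry):
    """Build the gdb 'list' command for an entry, dropping the size part."""
    symbol, offset, _size = parse_trace_entry(entry)
    return f"l *({symbol}+{offset:#x})"


entry = "[   20.090279] [<006c6494>] sys_mount+0x114/0x1e0"
symbol, offset, size = parse_trace_entry(entry)

sys_mount_start = 0x6C6380  # start of sys_mount as reported by gdb in this thread

# With "/0x1e0" left in, integer division turns the offset into 0.
address_with_size = sys_mount_start + offset // size
# Without it, the offset is applied as intended.
address_without_size = sys_mount_start + offset

print(gdb_list_command(entry))
print(hex(address_with_size), hex(address_without_size))
```

The second address matches the `[<006c6494>]` value printed in the trace itself, which is a quick sanity check that the offset was applied correctly.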



Re: Regression in 028abd92 for Sun UltraSPARC T1

2021-03-24 Thread Frank Scheiner

On 24.03.21 09:28, Christoph Hellwig wrote:

On Tue, Mar 23, 2021 at 11:17:41PM +0100, Frank Scheiner wrote:

028abd9222df0cf5855dab5014a5ebaf06f90565

...is broken on my T1000.

As I don't know how big attachments can be on this list, I put the logs
on pastebin.

A log for 028abd9222df is here:

https://pastebin.com/ApPYsMcu


Just to confirm:  in this tree line 304 in mm/slub.c is this BUG_ON:

	BUG_ON(object == fp); /* naive detection of double free or corruption */

which would mean we have a double free.  In that case it would be
interesting which call to kfree this is, which could be done by
calling gdb on vmlinux and then typing:

	l *(sys_mount+0x114/0x1e0)

Not that a double free caused by this conversion makes any sense to me..



Finally - a T1 thread is so slow (for untarring) that I untarred the
tarball from my X4270 cross-compile host to the T1000's root FS in the end:

```
root@t1000:~/mnt/torvalds-linux# git describe
v5.9-rc1-3-g028abd9222df
root@t1000:~/mnt/torvalds-linux# gdb -q vmlinux
Reading symbols from vmlinux...
(gdb) l *(sys_mount+0x114/0x1e0)
0x6c6380 is in __se_sys_mount (fs/namespace.c:3390).
3385		/* ... and return the root of (sub)tree on it */
3386		return path.dentry;
3387	}
3388	EXPORT_SYMBOL(mount_subtree);
3389
3390	SYSCALL_DEFINE5(mount, char __user *, dev_name, char __user *, dir_name,
3391			char __user *, type, unsigned long, flags, void __user *, data)
3392	{
3393		int ret;
3394		char *kernel_type;
(gdb)
```

...not sure if that adds anything to what Anatoly already provided apart
from the "correct" line numbers for the actually used kernel.

Cheers,
Frank



Re: Regression in 028abd92 for Sun UltraSPARC T1

2021-03-24 Thread Frank Scheiner

On 24.03.21 14:24, Anatoly Pugachev wrote:

On Wed, Mar 24, 2021 at 4:19 PM Frank Scheiner wrote:

On 24.03.21 14:16, John Paul Adrian Glaubitz wrote:

On 3/24/21 2:09 PM, Frank Scheiner wrote:

Kernel sources are not available on the T1000.


If need be, where do they need to exist and how should the directory be
named - `/usr/src/[...]`?


Try installing "linux-source" and the "-dbg" package for your Debian kernel.


But don't I need the source for the kernel at 028abd92? I figured I
need the sources in `/usr/src/linux-source-5.9.0-rc1+` because
"5.9.0-rc1+" is the version the corresponding modules are installed for -
could that be correct?


Frank,

I'm using gdb from the kernel sources directory (from which the kernel is
installed), like:

$ uname -a
Linux ttip 5.12.0-rc4 #203 SMP Wed Mar 24 15:50:29 MSK 2021 sparc64 GNU/Linux
$ cd linux-2.6
linux-2.6$ git describe
v5.12-rc4
linux-2.6$ gdb -q vmlinux
Reading symbols from vmlinux...
(gdb) l *(sys_mount+0x114/0x1e0)
0x6dd7c0 is in __se_sys_mount (fs/namespace.c:3431).
3426		/* ... and return the root of (sub)tree on it */
3427		return path.dentry;
3428	}
3429	EXPORT_SYMBOL(mount_subtree);
3430
3431	SYSCALL_DEFINE5(mount, char __user *, dev_name, char __user *, dir_name,
3432			char __user *, type, unsigned long, flags, void __user *, data)
3433	{
3434		int ret;
3435		char *kernel_type;
(gdb)



Ok, will try that approach. I'm currently `tar`ing the kernel sources
@028abd92 on the cross-compiling host and will move them over to the T1000.

Cheers,
Frank



Re: Regression in 028abd92 for Sun UltraSPARC T1

2021-03-24 Thread Anatoly Pugachev
On Wed, Mar 24, 2021 at 4:19 PM Frank Scheiner wrote:
> On 24.03.21 14:16, John Paul Adrian Glaubitz wrote:
> > On 3/24/21 2:09 PM, Frank Scheiner wrote:
> >> Kernel sources are not available on the T1000.
> >>
> >> If need be, where do they need to exist and how should the directory be
> >> named - `/usr/src/[...]`?
> >
> > Try installing "linux-source" and the "-dbg" package for your Debian kernel.
>
> But don't I need the source for the kernel at 028abd92? I figured I
> need the sources in `/usr/src/linux-source-5.9.0-rc1+` because
> "5.9.0-rc1+" is the version the corresponding modules are installed for -
> could that be correct?

Frank,

I'm using gdb from the kernel sources directory (from which the kernel is
installed), like:

$ uname -a
Linux ttip 5.12.0-rc4 #203 SMP Wed Mar 24 15:50:29 MSK 2021 sparc64 GNU/Linux
$ cd linux-2.6
linux-2.6$ git describe
v5.12-rc4
linux-2.6$ gdb -q vmlinux
Reading symbols from vmlinux...
(gdb) l *(sys_mount+0x114/0x1e0)
0x6dd7c0 is in __se_sys_mount (fs/namespace.c:3431).
3426		/* ... and return the root of (sub)tree on it */
3427		return path.dentry;
3428	}
3429	EXPORT_SYMBOL(mount_subtree);
3430
3431	SYSCALL_DEFINE5(mount, char __user *, dev_name, char __user *, dir_name,
3432			char __user *, type, unsigned long, flags, void __user *, data)
3433	{
3434		int ret;
3435		char *kernel_type;
(gdb)



Re: Regression in 028abd92 for Sun UltraSPARC T1

2021-03-24 Thread Frank Scheiner




On 24.03.21 14:16, John Paul Adrian Glaubitz wrote:

On 3/24/21 2:09 PM, Frank Scheiner wrote:

Kernel sources are not available on the T1000.


If need be, where do they need to exist and how should the directory be
named - `/usr/src/[...]`?


Try installing "linux-source" and the "-dbg" package for your Debian kernel.


But don't I need the source for the kernel at 028abd92? I figured I
need the sources in `/usr/src/linux-source-5.9.0-rc1+` because
"5.9.0-rc1+" is the version the corresponding modules are installed for -
could that be correct?

Cheers,
Frank



Re: Regression in 028abd92 for Sun UltraSPARC T1

2021-03-24 Thread John Paul Adrian Glaubitz
On 3/24/21 2:09 PM, Frank Scheiner wrote:
> Kernel sources are not available on the T1000.
> 
> If need be, where do they need to exist and how should the directory be
> named - `/usr/src/[...]`?

Try installing "linux-source" and the "-dbg" package for your Debian kernel.

Adrian

-- 
 .''`.  John Paul Adrian Glaubitz
: :' :  Debian Developer - glaub...@debian.org
`. `'   Freie Universitaet Berlin - glaub...@physik.fu-berlin.de
  `-GPG: 62FF 8A75 84E0 2956 9546  0006 7426 3B37 F5B5 F913



Re: Regression in 028abd92 for Sun UltraSPARC T1

2021-03-24 Thread Frank Scheiner




On 24.03.21 09:28, Christoph Hellwig wrote:

On Tue, Mar 23, 2021 at 11:17:41PM +0100, Frank Scheiner wrote:

028abd9222df0cf5855dab5014a5ebaf06f90565

...is broken on my T1000.

As I don't know how big attachments can be on this list, I put the logs
on pastebin.

A log for 028abd9222df is here:

https://pastebin.com/ApPYsMcu


Just to confirm:  in this tree line 304 in mm/slub.c is this BUG_ON:

	BUG_ON(object == fp); /* naive detection of double free or corruption */

which would mean we have a double free.  In that case it would be
interesting which call to kfree this is, which could be done by
calling gdb on vmlinux and then typing:

	l *(sys_mount+0x114/0x1e0)

Not that a double free caused by this conversion makes any sense to me..


This is what I get:

```
root@t1000:~/kernels-in-question# gdb vmlinux-028abd9222df-new
GNU gdb (Debian 9.2-1+b1) 9.2
Copyright (C) 2020 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later

This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "sparc64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
.
Find the GDB manual and other documentation resources online at:
.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from vmlinux-028abd9222df-new...
(gdb) l *(sys_mount+0x114/0x1e0)
0x6c6380 is in __se_sys_mount (fs/namespace.c:3390).
3385	fs/namespace.c: No such file or directory.
(gdb)
```

Kernel sources are not available on the T1000.

If need be, where do they need to exist and how should the directory be
named - `/usr/src/[...]`?

Cheers,
Frank



Re: Regression in 028abd92 for Sun UltraSPARC T1

2021-03-24 Thread John Paul Adrian Glaubitz
Hello Frank!

On 3/24/21 1:30 PM, Frank Scheiner wrote:
> Sorry, but I can't install `gdb` on my T1000 ATM, because it depends on
> "libpython3.8" for sparc64 (see [1]) and "libpython3.9" for the other
> architectures, but "libpython3.8" is actually not available for sparc64,
> "libpython3.9" is available for sparc64 though:

The reason for this is a bug in gdb [1] and the fact that we don't have cruft
in Debian Ports [2]. If someone knows how to disable individual tests in the
GDB testsuite, we could just disable the problematic test in src:gdb.

Adrian

> [1] https://sourceware.org/bugzilla/show_bug.cgi?id=26170
> [2] https://lists.debian.org/debian-sparc/2017/12/msg00060.html

-- 
 .''`.  John Paul Adrian Glaubitz
: :' :  Debian Developer - glaub...@debian.org
`. `'   Freie Universitaet Berlin - glaub...@physik.fu-berlin.de
  `-GPG: 62FF 8A75 84E0 2956 9546  0006 7426 3B37 F5B5 F913



Re: Regression in 028abd92 for Sun UltraSPARC T1

2021-03-24 Thread Frank Scheiner




On 24.03.21 13:42, Anatoly Pugachev wrote:

On Wed, Mar 24, 2021 at 3:31 PM Frank Scheiner wrote:

Sorry, but I can't install `gdb` on my T1000 ATM, because it depends on
"libpython3.8" for sparc64 (see [1]) and "libpython3.9" for the other
architectures, but "libpython3.8" is actually not available for sparc64,
"libpython3.9" is available for sparc64 though:
...
The following packages have unmet dependencies:
   gdb : Depends: libpython3.8 (>= 3.8.2) but it is not installable
 Recommends: libc-dbg
E: Unable to correct problems, you have held broken packages.
```
Something wrong with the dependencies. Any suggestions?


Frank,

you could use http://snapshot.debian.org to install old versions of
packages, i.e. gdb and libpython-3.8


Of course, didn't think about that. Will try that and report my findings.

Thanks and cheers,
Frank



Re: Regression in 028abd92 for Sun UltraSPARC T1

2021-03-24 Thread Anatoly Pugachev
On Wed, Mar 24, 2021 at 3:31 PM Frank Scheiner wrote:
> Sorry, but I can't install `gdb` on my T1000 ATM, because it depends on
> "libpython3.8" for sparc64 (see [1]) and "libpython3.9" for the other
> architectures, but "libpython3.8" is actually not available for sparc64,
> "libpython3.9" is available for sparc64 though:
> ...
> The following packages have unmet dependencies:
>   gdb : Depends: libpython3.8 (>= 3.8.2) but it is not installable
> Recommends: libc-dbg
> E: Unable to correct problems, you have held broken packages.
> ```
> Something wrong with the dependencies. Any suggestions?

Frank,

you could use http://snapshot.debian.org to install old versions of
packages, i.e. gdb and libpython-3.8



Re: Regression in 028abd92 for Sun UltraSPARC T1

2021-03-24 Thread Frank Scheiner

On 24.03.21 09:28, Christoph Hellwig wrote:

On Tue, Mar 23, 2021 at 11:17:41PM +0100, Frank Scheiner wrote:

028abd9222df0cf5855dab5014a5ebaf06f90565

...is broken on my T1000.

As I don't know how big attachments can be on this list, I put the logs
on pastebin.

A log for 028abd9222df is here:

https://pastebin.com/ApPYsMcu


Just to confirm:  in this tree line 304 in mm/slub.c is this BUG_ON:

	BUG_ON(object == fp); /* naive detection of double free or corruption */

which would mean we have a double free.  In that case it would be
interesting which call to kfree this is, which could be done by
calling gdb on vmlinux and then typing:

	l *(sys_mount+0x114/0x1e0)

Not that a double free caused by this conversion makes any sense to me..


Sorry, but I can't install `gdb` on my T1000 ATM, because it depends on
"libpython3.8" for sparc64 (see [1]) and "libpython3.9" for the other
architectures, but "libpython3.8" is actually not available for sparc64,
"libpython3.9" is available for sparc64 though:

```
root@t1000:~# apt install gdb
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
Some packages could not be installed. This may mean that you have
requested an impossible situation or if you are using the unstable
distribution that some required packages have not yet been created
or been moved out of Incoming.
The following information may help to resolve the situation:

The following packages have unmet dependencies:
 gdb : Depends: libpython3.8 (>= 3.8.2) but it is not installable
   Recommends: libc-dbg
E: Unable to correct problems, you have held broken packages.
```

[1]: https://packages.debian.org/sid/gdb

Something wrong with the dependencies. Any suggestions?

Cheers,
Frank



Re: Regression in 028abd92 for Sun UltraSPARC T1

2021-03-24 Thread Christoph Hellwig
On Tue, Mar 23, 2021 at 11:17:41PM +0100, Frank Scheiner wrote:
> 028abd9222df0cf5855dab5014a5ebaf06f90565
>
> ...is broken on my T1000.
>
> As I don't know how big attachments can be on this list, I put the logs
> on pastebin.
>
> A log for 028abd9222df is here:
>
> https://pastebin.com/ApPYsMcu

Just to confirm:  in this tree line 304 in mm/slub.c is this BUG_ON:

	BUG_ON(object == fp); /* naive detection of double free or corruption */

which would mean we have a double free.  In that case it would be
interesting which call to kfree this is, which could be done by
calling gdb on vmlinux and then typing:

	l *(sys_mount+0x114/0x1e0)

Not that a double free caused by this conversion makes any sense to me..
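
To illustrate what the quoted BUG_ON checks, here is a toy model in Python. It is not the SLUB allocator: the class and its fields are invented for illustration. It only mirrors the idea of the `object == fp` comparison, where the object being freed is compared against the current head of the free list. The sketch also shows why the kernel comment calls the detection naive: only an immediately repeated free of the same object is caught.

```python
class ToyFreelist:
    """Toy singly linked free list keyed by integer 'addresses' (hypothetical)."""

    def __init__(self):
        self.head = None      # most recently freed object, if any
        self.next_free = {}   # object -> next free object

    def free(self, obj):
        # Analogue of BUG_ON(object == fp): the object being freed is
        # already the head of the free list, so it was freed twice in a row.
        if obj == self.head:
            raise RuntimeError("double free or corruption detected")
        self.next_free[obj] = self.head
        self.head = obj


def second_free_is_caught(sequence):
    """Free every object in 'sequence' and report whether the check fired."""
    freelist = ToyFreelist()
    try:
        for obj in sequence:
            freelist.free(obj)
    except RuntimeError:
        return True
    return False


# Freeing the same object twice in a row trips the check.
immediate_repeat_caught = second_free_is_caught([0x1000, 0x1000])
# With another free in between, the head has changed and the repeat goes unnoticed.
interleaved_repeat_caught = second_free_is_caught([0x1000, 0x2000, 0x1000])

print(immediate_repeat_caught, interleaved_repeat_caught)
```

In this model a hit means the same pointer was freed twice in a row, or that the free-list head was corrupted so that it happens to equal the pointer being freed, which is why the trace alone does not settle which of the two happened.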



Re: Regression in 028abd92 for Sun UltraSPARC T1

2021-03-23 Thread Frank Scheiner

On 23.03.21 17:57, Christoph Hellwig wrote:

Frank, can you double check that commit
67e306c6906137020267eb9bbdbc127034da3627 really still works, and
only 028abd9222df0cf5855dab5014a5ebaf06f90565 broke your setup?


So I manually checked out both 67e306c6906137020267eb9bbdbc127034da3627
and 028abd9222df0cf5855dab5014a5ebaf06f90565 and recompiled both (doing
`make [...] mrproper` before each run).

The results didn't change from the ones from the bisecting process:

67e306c6906137020267eb9bbdbc127034da3627

...is working and:

028abd9222df0cf5855dab5014a5ebaf06f90565

...is broken on my T1000.

As I don't know how big attachments can be on this list, I put the logs
on pastebin.

A log for 028abd9222df is here:

https://pastebin.com/ApPYsMcu

A log for 67e306c69061 is here:

https://pastebin.com/uGLXX7RS

Cheers,
Frank



Re: Regression in 028abd92 for Sun UltraSPARC T1

2021-03-23 Thread Frank Scheiner

On 23.03.21 17:57, Christoph Hellwig wrote:

On Tue, Mar 23, 2021 at 05:50:59PM +0100, Jan Engelhardt wrote:

Some participants in the discussion over at the debian-sparc list mentioned
"NFS" and "Invalid argument", which is something I know just too well from
iptables. NFS is a filesystem that uses an extra data blob (5th argument to the
mount syscall). Such blobs have historically not always been designed to bear
the same layout between ILP32 and LP64 modes, and nfs's structs fell prey to
this as well.

My hypothesis now is that fs/nfs/fs_context.c line 1160:

	if (in_compat_syscall())
		nfs4_compat_mount_data_conv(data);

and ones similar to it (I didn't look too close where nfs3 gets to do its
conversion), no longer trigger as a result of compat_sys_mount being
wiped from the syscall table:


No, if in_compat_syscall() didn't trigger properly, the kernel
would not get this far.

That being said, the NFS compat code was moved out of the compat mount
handler and into nfs and refactored in the commit just before this one.

Frank, can you double check that commit
67e306c6906137020267eb9bbdbc127034da3627 really still works, and
only 028abd9222df0cf5855dab5014a5ebaf06f90565 broke your setup?


Indeed, I also expected 67e306c6906137020267eb9bbdbc127034da3627 to fail
because of its commit message, but from my log it did work correctly.

As the T1000 is at home and I don't have another T1 based system in my
storage location where I am now, I'll double check that in the evening
and report back.

Strangely for a V245 (with UltraSPARC IIIi) both commits seem to work
according to my testing, but 5.10.x (from Debian) doesn't work and
5.9.15 (also from Debian) does work - tested now both for boot from
network and boot from disk.

Possibly unrelated to the problem with the T1000, the V245 emits the
following for boot from disk with 5.10.x:

```
[...]
Loading Linux 5.10.0-5-sparc64-smp ...
Loading initial ramdisk ...

[2.602821] rtc_cmos rtc_cmos: IRQ index 0 not found
/dev/sda2: clean, 33516/8454144 files, 1105784/33798750 blocks
[   13.542728] autofs4:pid:1:autofs_fill_super: called with bogus options
[   13.628931] systemd[1]: proc-sys-fs-binfmt_misc.automount: Failed to initialize automounter: Invalid argument
[   13.759917] systemd[1]: Failed to set up automount Arbitrary Executable File Formats File System Automount Point.
[FAILED] Failed to set up automount  File System Automount Point.
[   14.456396] Unable to handle kernel paging request in mna handler
[   14.456400]  at virtual address da65f2fed110e482
[   14.597474] current->{active_,}mm->context = 00ce
[   14.597478] current->{active_,}mm->pgd = fff006d5c000
[   14.752380] Unable to handle kernel paging request in mna handler
[   14.752383]  at virtual address da65f2fed110e482
[   14.893509] current->{active_,}mm->context = 0094
[   14.969141] current->{active_,}mm->pgd = fff00011010e
[   15.040554] Kernel panic - not syncing: Attempted to kill init! exitcode=0x0009
[   15.141430] Press Stop-A (L1-A) from sun keyboard or send break
[   15.141430] twice on console to return to the boot prom
[   15.141459] kernel BUG at kernel/cpu.c:960
```

Cheers,
Frank



Re: Regression in 028abd92 for Sun UltraSPARC T1

2021-03-23 Thread Christoph Hellwig
On Tue, Mar 23, 2021 at 05:50:59PM +0100, Jan Engelhardt wrote:
> Some participants in the discussion over at the debian-sparc list mentioned
> "NFS" and "Invalid argument", which is something I know just too well from
> iptables. NFS is a filesystem that uses an extra data blob (5th argument to 
> the
> mount syscall). Such blobs have historically not always been designed to bear
> the same layout between ILP32 and LP64 modes, and nfs's structs fell prey to
> this as well.
> 
> My hypothesis now is that fs/nfs/fs_context.c line 1160:
> 
> 	if (in_compat_syscall())
> 		nfs4_compat_mount_data_conv(data);
> 
> and ones similar to it (I didn't look too close where nfs3 gets to do its
> conversion), no longer trigger as a result of compat_sys_mount being
> wiped from the syscall table:

No, if in_compat_syscall() didn't trigger properly, the kernel
would not get this far.

That being said, the NFS compat code was moved out of the compat mount
handler and into nfs and refactored in the commit just before this one.

Frank, can you double check that commit
67e306c6906137020267eb9bbdbc127034da3627 really still works, and
only 028abd9222df0cf5855dab5014a5ebaf06f90565 broke your setup?



Re: Regression in 028abd92 for Sun UltraSPARC T1

2021-03-23 Thread Jan Engelhardt


On Monday 2021-03-22 22:55, Frank Scheiner wrote:
>>> Riccardo Mottola first recognized a problem with 5.10.x kernels on his
>>> Sun T2000 with UltraSPARC T1 (details in [this thread]). I could verify
>>> the problem also on my Sun T1000 and it looks like this specific issue
>>> breaks the mounting of the root FS or maybe mounting file systems at
>>> all. This affects both booting from disk and from network.
>>> (...)
>>> ...as first bad commit.
>>>
>>> ```
>>> commit 028abd9222df0cf5855dab5014a5ebaf06f90565
>>> Author: Christoph Hellwig 
>>>  fs: remove compat_sys_mount

Some participants in the discussion over at the debian-sparc list mentioned
"NFS" and "Invalid argument", which is something I know just too well from
iptables. NFS is a filesystem that uses an extra data blob (5th argument to the
mount syscall). Such blobs have historically not always been designed to bear
the same layout between ILP32 and LP64 modes, and nfs's structs fell prey to
this as well.

My hypothesis now is that fs/nfs/fs_context.c line 1160:

	if (in_compat_syscall())
		nfs4_compat_mount_data_conv(data);

and ones similar to it (I didn't look too close where nfs3 gets to do its
conversion), no longer trigger as a result of compat_sys_mount being
wiped from the syscall table:

+++ arch/sparc/kernel/syscalls/syscall.tbl
@@ -201,7 +201,7 @@
 164	64	utrap_install		sys_utrap_install
 165	common	quotactl		sys_quotactl
 166	common	set_tid_address		sys_set_tid_address
-167	common	mount			sys_mount			compat_sys_mount
+167	common	mount			sys_mount
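
For readers unfamiliar with the table format, the following Python sketch parses a line of this table and shows what the change means for a 32-bit caller. The column order (number, ABI, name, native entry point, optional compat entry point) follows the diff above; the fallback rule in `entry_for_32bit` is an assumption about how the table is consumed, and both function names are invented for illustration.

```python
def parse_syscall_tbl_line(line):
    """Parse one whitespace-separated syscall.tbl line into a dict."""
    fields = line.split()
    if len(fields) < 4:
        raise ValueError(f"expected at least 4 columns, got: {line!r}")
    number, abi, name, native = fields[:4]
    compat = fields[4] if len(fields) > 4 else None
    return {
        "number": int(number),
        "abi": abi,
        "name": name,
        "native": native,
        "compat": compat,
    }


def entry_for_32bit(entry):
    """Entry point used for a 32-bit caller.

    Assumption: when no compat entry point is listed, 32-bit callers are
    routed to the native one.
    """
    return entry["compat"] or entry["native"]


before = parse_syscall_tbl_line("167 common mount sys_mount compat_sys_mount")
after = parse_syscall_tbl_line("167 common mount sys_mount")

print(entry_for_32bit(before))
print(entry_for_32bit(after))
```

Under that assumption, 32-bit and 64-bit callers both end up in `sys_mount` after the commit, and any layout conversion has to happen inside the filesystem code instead.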

I didn't extract from the debian-sparc discussion whether people were running
the all-LP64 userspace, or had some older Debian with an ILP32-on-64-bit-kernel
setup.


[But that's just a theory - a kernel theory!]
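
The layout concern raised in this message (mount data blobs that differ between ILP32 and LP64) can be sketched with Python's `struct` module. The struct below is hypothetical and is not the real NFS mount data layout; it only shows how a pointer field changes size and alignment between the two models, and what a 64-bit reader sees if it interprets a 32-bit blob without conversion. Big-endian formats are used because SPARC is big-endian.

```python
import struct

# Hypothetical mount-data struct, NOT the real NFS layout:
#   struct toy_mount_data { int version; char *hostname; };
ILP32_LAYOUT = ">iI"    # 4-byte int, 4-byte pointer
LP64_LAYOUT = ">i4xQ"   # 4-byte int, 4 bytes of padding, 8-byte pointer

ilp32_size = struct.calcsize(ILP32_LAYOUT)
lp64_size = struct.calcsize(LP64_LAYOUT)

# A blob as a 32-bit userspace process would build it.
blob_from_32bit_userspace = struct.pack(ILP32_LAYOUT, 1, 0x10002000)

# Read the same bytes with the LP64 layout, padding with zeros as if the
# following memory happened to be empty.
padded = blob_from_32bit_userspace.ljust(lp64_size, b"\0")
version, hostname_ptr = struct.unpack(LP64_LAYOUT, padded)

print(ilp32_size, lp64_size)
print(version, hex(hostname_ptr))
```

The version field survives because it sits at the same offset in both layouts, while the 32-bit pointer lands in what the LP64 layout treats as padding. A per-field conversion of the kind `nfs4_compat_mount_data_conv()` is named for would be what prevents this sort of misread.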



Re: Regression in 028abd92 for Sun UltraSPARC T1

2021-03-22 Thread Frank Scheiner

Hi,

On 22.03.21 22:48, John Paul Adrian Glaubitz wrote:

On 3/22/21 10:30 PM, Frank Scheiner wrote:

Riccardo Mottola first recognized a problem with 5.10.x kernels on his
Sun T2000 with UltraSPARC T1 (details in [this thread]). I could verify
the problem also on my Sun T1000 and it looks like this specific issue
breaks the mounting of the root FS or maybe mounting file systems at
all. This affects both booting from disk and from network.
(...)
...as first bad commit.

```
commit 028abd9222df0cf5855dab5014a5ebaf06f90565
Author: Christoph Hellwig 
Date:   Thu Sep 17 10:22:34 2020 +0200

    fs: remove compat_sys_mount

    compat_sys_mount is identical to the regular sys_mount now, so remove it
    and use the native version everywhere.
```

[1]:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=028abd9222df0cf5855dab5014a5ebaf06f90565


Looking at this change, I think it's rather unexpected that this particular
change would break the kernel on a specific CPU target. Are you sure that
this is the right bad commit?


Well, I strictly followed the `git bisect` process and tested each and
every proposed revision. It's indeed strange that this only affects
UltraSPARC T1s, but the changes match the behavior: mounting of (root)
FS is broken.


If you found the right commit, then I assume there is something wrong with
the syscall handling on UltraSPARC T1.


Could be, all in all the T1 is a first of its kind.

Cheers,
Frank



Re: Regression in 028abd92 for Sun UltraSPARC T1

2021-03-22 Thread John Paul Adrian Glaubitz
Hello!

On 3/22/21 10:30 PM, Frank Scheiner wrote:
> Riccardo Mottola first recognized a problem with 5.10.x kernels on his
> Sun T2000 with UltraSPARC T1 (details in [this thread]). I could verify
> the problem also on my Sun T1000 and it looks like this specific issue
> breaks the mounting of the root FS or maybe mounting file systems at
> all. This affects both booting from disk and from network.
> (...)
> ...as first bad commit.
> 
> ```
> commit 028abd9222df0cf5855dab5014a5ebaf06f90565
> Author: Christoph Hellwig 
> Date:   Thu Sep 17 10:22:34 2020 +0200
> 
>     fs: remove compat_sys_mount
> 
>     compat_sys_mount is identical to the regular sys_mount now, so remove it
>     and use the native version everywhere.
> ```
> 
> [1]:
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=028abd9222df0cf5855dab5014a5ebaf06f90565

Looking at this change, I think it's rather unexpected that this particular
change would break the kernel on a specific CPU target. Are you sure that
this is the right bad commit?

If you found the right commit, then I assume there is something wrong with
the syscall handling on UltraSPARC T1.

Adrian

-- 
 .''`.  John Paul Adrian Glaubitz
: :' :  Debian Developer - glaub...@debian.org
`. `'   Freie Universitaet Berlin - glaub...@physik.fu-berlin.de
  `-GPG: 62FF 8A75 84E0 2956 9546  0006 7426 3B37 F5B5 F913



Regression in 028abd92 for Sun UltraSPARC T1

2021-03-22 Thread Frank Scheiner

Dear all,

Riccardo Mottola first recognized a problem with 5.10.x kernels on his
Sun T2000 with UltraSPARC T1 (details in [this thread]). I could verify
the problem also on my Sun T1000 and it looks like this specific issue
breaks the mounting of the root FS or maybe mounting file systems at
all. This affects both booting from disk and from network.

[this thread]: https://lists.debian.org/debian-sparc/2021/03/msg4.html

I bisected the Linux kernel between:

bbf5c979011a099af5dc76498918ed7df445635b (good)

...and:

3650b228f83adda7e5ee532e2b90429c03f7b9ec (bad)

...and the process identified:

028abd9222df0cf5855dab5014a5ebaf06f90565 ([1])

...as first bad commit.

```
commit 028abd9222df0cf5855dab5014a5ebaf06f90565
Author: Christoph Hellwig 
Date:   Thu Sep 17 10:22:34 2020 +0200

    fs: remove compat_sys_mount

    compat_sys_mount is identical to the regular sys_mount now, so remove it
    and use the native version everywhere.
```

[1]:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=028abd9222df0cf5855dab5014a5ebaf06f90565

Details about the bisection are in [2].

[2]: https://lists.debian.org/debian-sparc/2021/03/msg00042.html
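
As a rough sketch of what the bisection described above does, the following Python snippet performs a binary search over a linear list of commits between a known-good and a known-bad one. It is a simplified model: `git bisect` itself works on the commit graph, and the commit labels and the `is_bad` predicate (standing in for "build, boot on the T1000, check whether mounting works") are hypothetical.

```python
def first_bad_commit(commits, is_bad):
    """Return (first bad commit, number of commits tested).

    Assumes commits[0] is known good, commits[-1] is known bad, and that
    every commit after the first bad one is also bad.
    """
    good, bad = 0, len(commits) - 1
    tested = 0
    while bad - good > 1:
        mid = (good + bad) // 2
        tested += 1
        if is_bad(commits[mid]):
            bad = mid    # the regression is at mid or earlier
        else:
            good = mid   # the regression is after mid
    return commits[bad], tested


# Hypothetical linear history c0..c15 where c11 introduced the regression.
history = [f"c{i}" for i in range(16)]
broken = set(history[11:])

culprit, builds_needed = first_bad_commit(history, lambda c: c in broken)
print(culprit, builds_needed)
```

Each step costs one kernel build and one boot test, which is why re-checking the last good and first bad commits by hand, as is done later in the thread, is a cheap way to confirm the result.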

So far this only affects UltraSPARC T1 processors. I didn't see that
problem on a T5220 with UltraSPARC T2 and I also didn't see that problem
on a Sun Ultra Enterprise 450 with UltraSPARC II when testing a recent
Debian installation media with 5.10.x kernel some weeks ago. Other
UltraSPARC processors weren't tested yet. I plan to check UltraSPARC
IIIi and maybe others if time allows.



Do you have any idea what could go wrong with 028abd92
specifically on an UltraSPARC T1 processor?

I can provide a full log of a broken (network) boot process if that's
useful, I just need to re-create it. IIRC the kernel oopses for each
hardware thread (similar to what Riccardo wrote on the debian-sparc
mailing list above) and then stops.

Cheers,
Frank