Re: [OpenAFS] kernel 5.14 RIP with openafs 1.8.8

2021-10-10 Thread Andrej Filipcic



Just to confirm: it works fine with the patch.

Andrej

On 10/7/21 8:09 PM, Cheyenne Wills wrote:

A patch has been submitted for review (https://gerrit.openafs.org/#/c/14826).
The fix itself was simple (2 lines): it explicitly sets the set_page_dirty
operation to the function that older Linux kernels used by default.

I was able to consistently reproduce the problem by using the iozone 
benchmarking tool (iozone -B -a).

Thanks to Andrej Filipcic for reporting the problem, and Michael Laß for
finding the offending Linux 5.14 commit.

Cheyenne Wills
[email protected]

From: [email protected]  on behalf of 
Cheyenne Wills 
Sent: Wednesday, October 6, 2021 10:19:05 AM
Cc: OpenAFS
Subject: Re: [OpenAFS] kernel 5.14 RIP with openafs 1.8.8

I've started to look at a fix for this.

Thanks for the report

From: [email protected]  on behalf of 
Michael Laß 
Sent: Wednesday, October 6, 2021 2:05 AM
To: Mark Vitale; Andrej Filipcic
Cc: OpenAFS
Subject: Re: [OpenAFS] kernel 5.14 RIP with openafs 1.8.8

OK, I might have found the culprit. OpenZFS users were experiencing the
same problem [1], so I looked at what they had to change to support
Linux 5.14.

I noticed this change: https://github.com/openzfs/zfs/pull/12427
It is required due to this upstream change:
https://github.com/torvalds/linux/commit/0af573780b0b13fceb7fabd49dc1b073cee9a507

In fact, filemap_page_mkwrite, which is at the top of the call traces
shown, calls set_page_dirty(page). If the set_page_dirty operation behind
that call is an invalid function pointer, things go wrong.

Best,
Michael

[1]:
https://forum.endeavouros.com/t/null-pointer-dereference-with-kernel-5-14-1/17312

On Wednesday, 06.10.2021 at 09:49 +0200, Michael Laß wrote:

[reposting from correct mail address and with small change]

Hi,

it looks like people using Arch Linux get this error as well after
updating to 5.14 [previously I wrote 5.14.9 but the reporter upgraded
from 5.13 so it could be any subversion]. Here is a bug report:

https://bugs.archlinux.org/task/72340

And here is a direct link to the reporter's crash log:
https://bugs.archlinux.org/task/72340?getfile=20754

So the error is the same as well:

Okt 04 09:18:48  kernel: Code: Unable to access opcode bytes at RIP
0xffd6.

The call trace is also basically identical:

Okt 04 09:18:48  kernel: Call Trace:
Okt 04 09:18:48  kernel:  filemap_page_mkwrite+0xdf/0x190
Okt 04 09:18:48  kernel:  do_page_mkwrite+0x55/0xb0
Okt 04 09:18:48  kernel:  do_wp_page+0x22b/0x2d0
Okt 04 09:18:48  kernel:  ? cp_new_stat+0x134/0x160
Okt 04 09:18:48  kernel:  __handle_mm_fault+0xd45/0x15c0
Okt 04 09:18:48  kernel:  handle_mm_fault+0xd5/0x2a0
Okt 04 09:18:48  kernel:  do_user_addr_fault+0x1de/0x690
Okt 04 09:18:48  kernel:  exc_page_fault+0x72/0x170
Okt 04 09:18:48  kernel:  ? asm_exc_page_fault+0x8/0x30
Okt 04 09:18:48  kernel:  asm_exc_page_fault+0x1e/0x30

Best,
Michael

On Monday, 04.10.2021 at 14:29 +, Mark Vitale wrote:

Andrej,


On Oct 4, 2021, at 5:47 AM, Andrej Filipcic wrote:

I tried kernel 5.14.9 and openafs 1.8.8. It fails just after login
with



any clues or patches?



I consulted with a colleague and we haven't seen a failure like
this.
The backtrace doesn't contain any AFS code either, so no clues
there.
Is it possible your OpenAFS kernel module wasn't rebuilt for this
kernel?  (mismatched kernel version)


Regards,
--
Mark Vitale
[email protected]



___
OpenAFS-info mailing list
[email protected]
https://lists.openafs.org/mailman/listinfo/openafs-info






--
_
   prof. dr. Andrej Filipcic,   E-mail: [email protected]
   Department of Experimental High Energy Physics - F9
   Jozef Stefan Institute, Jamova 39, P.o.Box 3000
   SI-1001 Ljubljana, Slovenia
   Tel.: +386-1-477-3674   Fax: +386-1-477-3166
-



Re: [OpenAFS] kernel 5.14 RIP with openafs 1.8.8

2021-10-07 Thread Cheyenne Wills
A patch has been submitted for review (https://gerrit.openafs.org/#/c/14826).
The fix itself was simple (2 lines): it explicitly sets the set_page_dirty
operation to the function that older Linux kernels used by default.

I was able to consistently reproduce the problem by using the iozone 
benchmarking tool (iozone -B -a).
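
For anyone without iozone at hand, here is a minimal sketch of the access
pattern involved. It assumes that what matters is a write to a page of a
shared file mapping (iozone's -B option selects mmap() I/O, and the call
trace in this thread goes through do_wp_page and filemap_page_mkwrite). The
function name and the file path are hypothetical, and this is not the test
used to verify the patch. On an affected kernel and module combination it may
trigger the oops, so run it only on a machine you can afford to crash.

```c
/* Sketch: dirty one page of a shared file mapping.
 * dirty_shared_mapping() is a hypothetical helper, not part of iozone
 * or OpenAFS. */
#include <assert.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

/* Returns 0 on success, -1 on any failure (errno is left set). */
int dirty_shared_mapping(const char *path)
{
    long page = sysconf(_SC_PAGESIZE);
    assert(page > 0);

    int fd = open(path, O_RDWR | O_CREAT, 0600);
    if (fd < 0)
        return -1;

    /* Make sure the file is at least one page long. */
    if (ftruncate(fd, page) != 0) {
        close(fd);
        return -1;
    }

    char *p = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE,
                   MAP_SHARED, fd, 0);
    if (p == MAP_FAILED) {
        close(fd);
        return -1;
    }

    /* Read first so the page is already mapped, then write to it.
     * The write is the step expected to reach the page_mkwrite path. */
    volatile char c = p[0];
    p[0] = (char)(c + 1);

    /* Flush the dirty page back to the file. */
    int rc = msync(p, (size_t)page, MS_SYNC);

    munmap(p, (size_t)page);
    close(fd);
    return rc;
}
```

Calling dirty_shared_mapping() with a path inside /afs exercises the cache
manager; with a path on a local filesystem it simply succeeds.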

Thanks to Andrej Filipcic for reporting the problem, and Michael Laß for
finding the offending Linux 5.14 commit.

Cheyenne Wills
[email protected]

From: [email protected]  on behalf 
of Cheyenne Wills 
Sent: Wednesday, October 6, 2021 10:19:05 AM
Cc: OpenAFS
Subject: Re: [OpenAFS] kernel 5.14 RIP with openafs 1.8.8

I've started to look at a fix for this.

Thanks for the report

From: [email protected]  on behalf 
of Michael Laß 
Sent: Wednesday, October 6, 2021 2:05 AM
To: Mark Vitale; Andrej Filipcic
Cc: OpenAFS
Subject: Re: [OpenAFS] kernel 5.14 RIP with openafs 1.8.8

OK, I might have found the culprit. OpenZFS users were experiencing the
same problem [1], so I looked at what they had to change to support
Linux 5.14.

I noticed this change: https://github.com/openzfs/zfs/pull/12427
It is required due to this upstream change:
https://github.com/torvalds/linux/commit/0af573780b0b13fceb7fabd49dc1b073cee9a507

In fact, filemap_page_mkwrite, which is at the top of the call traces
shown, calls set_page_dirty(page). If the set_page_dirty operation behind
that call is an invalid function pointer, things go wrong.

Best,
Michael

[1]:
https://forum.endeavouros.com/t/null-pointer-dereference-with-kernel-5-14-1/17312

On Wednesday, 06.10.2021 at 09:49 +0200, Michael Laß wrote:
> [reposting from correct mail address and with small change]
>
> Hi,
>
> it looks like people using Arch Linux get this error as well after
> updating to 5.14 [previously I wrote 5.14.9 but the reporter upgraded
> from 5.13 so it could be any subversion]. Here is a bug report:
>
> https://bugs.archlinux.org/task/72340
>
> And here is a direct link to the reporter's crash log:
> https://bugs.archlinux.org/task/72340?getfile=20754
>
> So the error is the same as well:
> > Okt 04 09:18:48  kernel: Code: Unable to access opcode bytes at RIP
> > 0xffd6.
>
> The call trace is also basically identical:
> > Okt 04 09:18:48  kernel: Call Trace:
> > Okt 04 09:18:48  kernel:  filemap_page_mkwrite+0xdf/0x190
> > Okt 04 09:18:48  kernel:  do_page_mkwrite+0x55/0xb0
> > Okt 04 09:18:48  kernel:  do_wp_page+0x22b/0x2d0
> > Okt 04 09:18:48  kernel:  ? cp_new_stat+0x134/0x160
> > Okt 04 09:18:48  kernel:  __handle_mm_fault+0xd45/0x15c0
> > Okt 04 09:18:48  kernel:  handle_mm_fault+0xd5/0x2a0
> > Okt 04 09:18:48  kernel:  do_user_addr_fault+0x1de/0x690
> > Okt 04 09:18:48  kernel:  exc_page_fault+0x72/0x170
> > Okt 04 09:18:48  kernel:  ? asm_exc_page_fault+0x8/0x30
> > Okt 04 09:18:48  kernel:  asm_exc_page_fault+0x1e/0x30
>
> Best,
> Michael
>
> On Monday, 04.10.2021 at 14:29 +, Mark Vitale wrote:
> > Andrej,
> >
> > > On Oct 4, 2021, at 5:47 AM, Andrej Filipcic wrote:
> > >
> > > I tried kernel 5.14.9 and openafs 1.8.8. It fails just after login
> > > with
> > >
> > > 
> > >
> > > any clues or patches?
> > >
> > >
> > I consulted with a colleague and we haven't seen a failure like
> > this.
> > The backtrace doesn't contain any AFS code either, so no clues
> > there.
> > Is it possible your OpenAFS kernel module wasn't rebuilt for this
> > kernel?  (mismatched kernel version)
> >
> >
> > Regards,
> > --
> > Mark Vitale
> > [email protected]
> >
> >
> >


Re: [OpenAFS] kernel 5.14 RIP with openafs 1.8.8

2021-10-06 Thread Cheyenne Wills
I've started to look at a fix for this.  

Thanks for the report

From: [email protected]  on behalf 
of Michael Laß 
Sent: Wednesday, October 6, 2021 2:05 AM
To: Mark Vitale; Andrej Filipcic
Cc: OpenAFS
Subject: Re: [OpenAFS] kernel 5.14 RIP with openafs 1.8.8

OK, I might have found the culprit. OpenZFS users were experiencing the
same problem [1], so I looked at what they had to change to support
Linux 5.14.

I noticed this change: https://github.com/openzfs/zfs/pull/12427
It is required due to this upstream change:
https://github.com/torvalds/linux/commit/0af573780b0b13fceb7fabd49dc1b073cee9a507

In fact, filemap_page_mkwrite, which is at the top of the call traces
shown, calls set_page_dirty(page). If the set_page_dirty operation behind
that call is an invalid function pointer, things go wrong.

Best,
Michael

[1]:
https://forum.endeavouros.com/t/null-pointer-dereference-with-kernel-5-14-1/17312

On Wednesday, 06.10.2021 at 09:49 +0200, Michael Laß wrote:
> [reposting from correct mail address and with small change]
>
> Hi,
>
> it looks like people using Arch Linux get this error as well after
> updating to 5.14 [previously I wrote 5.14.9 but the reporter upgraded
> from 5.13 so it could be any subversion]. Here is a bug report:
>
> https://bugs.archlinux.org/task/72340
>
> And here is a direct link to the reporter's crash log:
> https://bugs.archlinux.org/task/72340?getfile=20754
>
> So the error is the same as well:
> > Okt 04 09:18:48  kernel: Code: Unable to access opcode bytes at RIP
> > 0xffd6.
>
> The call trace is also basically identical:
> > Okt 04 09:18:48  kernel: Call Trace:
> > Okt 04 09:18:48  kernel:  filemap_page_mkwrite+0xdf/0x190
> > Okt 04 09:18:48  kernel:  do_page_mkwrite+0x55/0xb0
> > Okt 04 09:18:48  kernel:  do_wp_page+0x22b/0x2d0
> > Okt 04 09:18:48  kernel:  ? cp_new_stat+0x134/0x160
> > Okt 04 09:18:48  kernel:  __handle_mm_fault+0xd45/0x15c0
> > Okt 04 09:18:48  kernel:  handle_mm_fault+0xd5/0x2a0
> > Okt 04 09:18:48  kernel:  do_user_addr_fault+0x1de/0x690
> > Okt 04 09:18:48  kernel:  exc_page_fault+0x72/0x170
> > Okt 04 09:18:48  kernel:  ? asm_exc_page_fault+0x8/0x30
> > Okt 04 09:18:48  kernel:  asm_exc_page_fault+0x1e/0x30
>
> Best,
> Michael
>
> On Monday, 04.10.2021 at 14:29 +, Mark Vitale wrote:
> > Andrej,
> >
> > > On Oct 4, 2021, at 5:47 AM, Andrej Filipcic wrote:
> > >
> > > I tried kernel 5.14.9 and openafs 1.8.8. It fails just after login
> > > with
> > >
> > > 
> > >
> > > any clues or patches?
> > >
> > >
> > I consulted with a colleague and we haven't seen a failure like
> > this.
> > The backtrace doesn't contain any AFS code either, so no clues
> > there.
> > Is it possible your OpenAFS kernel module wasn't rebuilt for this
> > kernel?  (mismatched kernel version)
> >
> >
> > Regards,
> > --
> > Mark Vitale
> > [email protected]
> >
> >
> >


Re: [OpenAFS] kernel 5.14 RIP with openafs 1.8.8

2021-10-06 Thread Michael Laß
OK, I might have found the culprit. OpenZFS users were experiencing the
same problem [1], so I looked at what they had to change to support
Linux 5.14.

I noticed this change: https://github.com/openzfs/zfs/pull/12427
It is required due to this upstream change:
https://github.com/torvalds/linux/commit/0af573780b0b13fceb7fabd49dc1b073cee9a507

In fact, filemap_page_mkwrite, which is at the top of the call traces
shown, calls set_page_dirty(page). If the set_page_dirty operation behind
that call is an invalid function pointer, things go wrong.
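
To illustrate the mechanism, here is a small user-space model of the
dispatch. It is only a sketch based on my reading of the upstream commit:
before 5.14 an unset set_page_dirty operation fell back to a default, and
since 5.14 the operation is called without that fallback. All names below
(struct aops, default_set_dirty, set_dirty_with_fallback, set_dirty_strict,
unwired_ops, wired_ops) are made up for the example and are not kernel or
OpenAFS symbols.

```c
/* User-space model of the set_page_dirty dispatch change.
 * Every identifier here is hypothetical. */
#include <assert.h>
#include <stddef.h>

struct page {
    int dirty;
};

/* Stand-in for a filesystem's table of address-space operations. */
struct aops {
    int (*set_page_dirty)(struct page *);
};

/* Stand-in for the generic helper that used to be the fallback.
 * Returns 1 if the page was newly dirtied, 0 if it was already dirty. */
static int default_set_dirty(struct page *pg)
{
    int was_clean = !pg->dirty;
    pg->dirty = 1;
    return was_clean;
}

/* Old style: an unset operation silently falls back to the default. */
int set_dirty_with_fallback(const struct aops *ops, struct page *pg)
{
    assert(ops != NULL && pg != NULL);
    int (*spd)(struct page *) = ops->set_page_dirty;
    if (spd == NULL)
        spd = default_set_dirty;
    return spd(pg);
}

/* New style: the operation is used as-is, with no fallback. A real
 * kernel would jump through the NULL pointer at this point; the model
 * returns -1 instead so the difference can be observed safely. */
int set_dirty_strict(const struct aops *ops, struct page *pg)
{
    assert(ops != NULL && pg != NULL);
    if (ops->set_page_dirty == NULL)
        return -1;
    return ops->set_page_dirty(pg);
}

/* A table written against the old behaviour: operation left unset. */
const struct aops unwired_ops = { .set_page_dirty = NULL };

/* The same table with the operation wired up explicitly. */
const struct aops wired_ops = { .set_page_dirty = default_set_dirty };
```

With unwired_ops the fallback variant still dirties the page, while the
strict variant fails. With wired_ops both behave the same, which is the
shape of change OpenZFS had to make.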

Best,
Michael

[1]:
https://forum.endeavouros.com/t/null-pointer-dereference-with-kernel-5-14-1/17312

On Wednesday, 06.10.2021 at 09:49 +0200, Michael Laß wrote:
> [reposting from correct mail address and with small change]
> 
> Hi,
> 
> it looks like people using Arch Linux get this error as well after
> updating to 5.14 [previously I wrote 5.14.9 but the reporter upgraded
> from 5.13 so it could be any subversion]. Here is a bug report:
> 
> https://bugs.archlinux.org/task/72340
> 
> And here is a direct link to the reporter's crash log:
> https://bugs.archlinux.org/task/72340?getfile=20754
> 
> So the error is the same as well:
> > Okt 04 09:18:48  kernel: Code: Unable to access opcode bytes at RIP
> > 0xffd6.
>
> The call trace is also basically identical:
> > Okt 04 09:18:48  kernel: Call Trace:
> > Okt 04 09:18:48  kernel:  filemap_page_mkwrite+0xdf/0x190
> > Okt 04 09:18:48  kernel:  do_page_mkwrite+0x55/0xb0
> > Okt 04 09:18:48  kernel:  do_wp_page+0x22b/0x2d0
> > Okt 04 09:18:48  kernel:  ? cp_new_stat+0x134/0x160
> > Okt 04 09:18:48  kernel:  __handle_mm_fault+0xd45/0x15c0
> > Okt 04 09:18:48  kernel:  handle_mm_fault+0xd5/0x2a0
> > Okt 04 09:18:48  kernel:  do_user_addr_fault+0x1de/0x690
> > Okt 04 09:18:48  kernel:  exc_page_fault+0x72/0x170
> > Okt 04 09:18:48  kernel:  ? asm_exc_page_fault+0x8/0x30
> > Okt 04 09:18:48  kernel:  asm_exc_page_fault+0x1e/0x30
> 
> Best,
> Michael
> 
> On Monday, 04.10.2021 at 14:29 +, Mark Vitale wrote:
> > Andrej,
> >
> > > On Oct 4, 2021, at 5:47 AM, Andrej Filipcic wrote:
> > > 
> > > I tried kernel 5.14.9 and openafs 1.8.8. It fails just after login
> > > with
> > > 
> > > 
> > > 
> > > any clues or patches?
> > > 
> > > 
> > I consulted with a colleague and we haven't seen a failure like
> > this. 
> > The backtrace doesn't contain any AFS code either, so no clues
> > there. 
> > Is it possible your OpenAFS kernel module wasn't rebuilt for this
> > kernel?  (mismatched kernel version)
> > 
> > 
> > Regards,
> > --
> > Mark Vitale
> > [email protected]
> > 
> > 
> > 


Re: [OpenAFS] kernel 5.14 RIP with openafs 1.8.8

2021-10-06 Thread Michael Laß
[reposting from correct mail address and with small change]

Hi,

it looks like people using Arch Linux get this error as well after
updating to 5.14 [previously I wrote 5.14.9 but the reporter upgraded
from 5.13 so it could be any subversion]. Here is a bug report:

https://bugs.archlinux.org/task/72340

And here is a direct link to the reporter's crash log:
https://bugs.archlinux.org/task/72340?getfile=20754

So the error is the same as well:
> Okt 04 09:18:48  kernel: Code: Unable to access opcode bytes at RIP
> 0xffd6.

The call trace is also basically identical:
> Okt 04 09:18:48  kernel: Call Trace:
> Okt 04 09:18:48  kernel:  filemap_page_mkwrite+0xdf/0x190
> Okt 04 09:18:48  kernel:  do_page_mkwrite+0x55/0xb0
> Okt 04 09:18:48  kernel:  do_wp_page+0x22b/0x2d0
> Okt 04 09:18:48  kernel:  ? cp_new_stat+0x134/0x160
> Okt 04 09:18:48  kernel:  __handle_mm_fault+0xd45/0x15c0
> Okt 04 09:18:48  kernel:  handle_mm_fault+0xd5/0x2a0
> Okt 04 09:18:48  kernel:  do_user_addr_fault+0x1de/0x690
> Okt 04 09:18:48  kernel:  exc_page_fault+0x72/0x170
> Okt 04 09:18:48  kernel:  ? asm_exc_page_fault+0x8/0x30
> Okt 04 09:18:48  kernel:  asm_exc_page_fault+0x1e/0x30

Best,
Michael

On Monday, 04.10.2021 at 14:29 +, Mark Vitale wrote:
> Andrej,
>
> > On Oct 4, 2021, at 5:47 AM, Andrej Filipcic wrote:
> > 
> > I tried kernel 5.14.9 and openafs 1.8.8. It fails just after login
> > with
> > 
> > 
> > 
> > any clues or patches?
> > 
> > 
> I consulted with a colleague and we haven't seen a failure like
> this. 
> The backtrace doesn't contain any AFS code either, so no clues
> there. 
> Is it possible your OpenAFS kernel module wasn't rebuilt for this
> kernel?  (mismatched kernel version)
> 
> 
> Regards,
> --
> Mark Vitale
> [email protected]
> 
> 
> 


Re: [OpenAFS] kernel 5.14 RIP with openafs 1.8.8

2021-10-05 Thread Andrej Filipcic

On 04/10/2021 16:29, Mark Vitale wrote:

Andrej,


On Oct 4, 2021, at 5:47 AM, Andrej Filipcic  wrote:

I tried kernel 5.14.9 and openafs 1.8.8. It fails just after login with



any clues or patches?



I consulted with a colleague and we haven't seen a failure like this.  The 
backtrace doesn't contain any AFS code either, so no clues there.  Is it 
possible your OpenAFS kernel module wasn't rebuilt for this kernel?  
(mismatched kernel version)
It was built correctly. I will try again; maybe something else is
wrong. I have seen similar reports with some ZFS versions, though I am
not using ZFS in my case.


Best regards
Andrej



Regards,
--
Mark Vitale
[email protected]






--
_
   prof. dr. Andrej Filipcic,   E-mail: [email protected]
   Department of Experimental High Energy Physics - F9
   Jozef Stefan Institute, Jamova 39, P.o.Box 3000
   SI-1001 Ljubljana, Slovenia
   Tel.: +386-1-477-3674   Fax: +386-1-425-7074
-



Re: [OpenAFS] kernel 5.14 RIP with openafs 1.8.8

2021-10-04 Thread Mark Vitale
Andrej,

> On Oct 4, 2021, at 5:47 AM, Andrej Filipcic  wrote:
> 
> I tried kernel 5.14.9 and openafs 1.8.8. It fails just after login with
> 
> 
> 
> any clues or patches?
> 
> 
I consulted with a colleague and we haven't seen a failure like this.  The 
backtrace doesn't contain any AFS code either, so no clues there.  Is it 
possible your OpenAFS kernel module wasn't rebuilt for this kernel?  
(mismatched kernel version)


Regards,
--
Mark Vitale
[email protected]


