Re: [OpenAFS] kernel 5.14 RIP with openafs 1.8.8
Just to confirm: it works fine with the patch.

Andrej

On 10/7/21 8:09 PM, Cheyenne Wills wrote:
> A patch has been submitted for review (https://gerrit.openafs.org/#/c/14826).
> [...]

--
prof. dr. Andrej Filipcic
Department of Experimental High Energy Physics - F9
Jozef Stefan Institute, Jamova 39, P.o.Box 3000
SI-1001 Ljubljana, Slovenia
Tel.: +386-1-477-3674  Fax: +386-1-477-3166

___
OpenAFS-info mailing list
https://lists.openafs.org/mailman/listinfo/openafs-info
Re: [OpenAFS] kernel 5.14 RIP with openafs 1.8.8
A patch has been submitted for review (https://gerrit.openafs.org/#/c/14826).

The fix itself is simple (2 lines): it explicitly sets the set_page_dirty operation to the function that older Linux kernels used by default.

I was able to reproduce the problem consistently with the iozone benchmarking tool (iozone -B -a).

Thanks to Andrej Filipcic for reporting the problem, and to Michael Laß for finding the offending Linux 5.14 commit.

Cheyenne Wills
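The change described in this message can be illustrated with a small user-space model. This is only a sketch in Python, not kernel code and not the actual patch: the names AddressSpaceOps, default_set_page_dirty and the two dispatch functions are made up for illustration, and the fallback behaviour of older kernels is modelled purely from the description in this thread.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Page:
    dirty: bool = False


def default_set_page_dirty(page: Page) -> bool:
    """Hypothetical stand-in for the handler older kernels used by default."""
    newly_dirtied = not page.dirty
    page.dirty = True
    return newly_dirtied


@dataclass
class AddressSpaceOps:
    # Models a filesystem's operations table; None models an unset pointer.
    set_page_dirty: Optional[Callable[[Page], bool]] = None


def set_page_dirty_with_fallback(ops: AddressSpaceOps, page: Page) -> bool:
    """Model of the older behaviour: an unset operation falls back to a default."""
    handler = ops.set_page_dirty or default_set_page_dirty
    return handler(page)


def set_page_dirty_no_fallback(ops: AddressSpaceOps, page: Page) -> bool:
    """Model of the 5.14 behaviour: the operation is called unconditionally."""
    # With an unset operation this raises TypeError here; in the kernel the
    # equivalent is a jump through an invalid function pointer.
    return ops.set_page_dirty(page)


# A filesystem that never set the operation, and the same one after the fix.
unpatched = AddressSpaceOps()
patched = AddressSpaceOps(set_page_dirty=default_set_page_dirty)
```

In this model, set_page_dirty_with_fallback(unpatched, Page()) succeeds, set_page_dirty_no_fallback(unpatched, Page()) fails, and set_page_dirty_no_fallback(patched, Page()) succeeds again, which mirrors why a two-line change that wires up the operation explicitly is enough.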
Re: [OpenAFS] kernel 5.14 RIP with openafs 1.8.8
I've started to look at a fix for this. Thanks for the report.

From: Michael Laß
Sent: Wednesday, October 6, 2021 2:05 AM
To: Mark Vitale; Andrej Filipcic
Cc: OpenAFS
Subject: Re: [OpenAFS] kernel 5.14 RIP with openafs 1.8.8

> OK, I might have found the culprit.
> [...]
Re: [OpenAFS] kernel 5.14 RIP with openafs 1.8.8
OK, I might have found the culprit. OpenZFS users were experiencing the same problem [1], so I looked at what they had to change to support Linux 5.14. I noticed this change:

https://github.com/openzfs/zfs/pull/12427

It is required due to this upstream change:

https://github.com/torvalds/linux/commit/0af573780b0b13fceb7fabd49dc1b073cee9a507

In fact, filemap_page_mkwrite, which is at the top of the call traces shown, calls set_page_dirty(page). So if the set_page_dirty operation is an invalid function pointer, things go wrong.

Best,
Michael

[1]: https://forum.endeavouros.com/t/null-pointer-dereference-with-kernel-5-14-1/17312
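The call path named above (a write fault that ends in filemap_page_mkwrite) is what a store into a shared, file-backed mapping normally produces, and the -B option of iozone likewise makes it use mmap() files. The following is a minimal sketch of such an access in Python. The function name and default size are made up for illustration, and it has not been tested against an affected kernel, so whether it triggers the crash there is an assumption.

```python
import mmap
import os
import tempfile


def touch_via_mmap(path: str, size: int = 4096) -> bytes:
    """Create a file, modify its first byte through a shared mapping,
    and return that byte as read back through the normal file API."""
    with open(path, "wb") as f:
        f.write(b"\0" * size)

    with open(path, "r+b") as f:
        # On Unix the default mapping is shared and read/write.
        with mmap.mmap(f.fileno(), size) as m:
            _ = m[0]       # read first, so the page is mapped before the write
            m[0:1] = b"X"  # the store is what causes the write fault
            m.flush()      # write the dirty page back to the file

    with open(path, "rb") as f:
        return f.read(1)
```

Calling touch_via_mmap() on a path inside the file system under test (for example a scratch file in an AFS directory) performs the mapped write. On a working system it simply returns b"X".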
Re: [OpenAFS] kernel 5.14 RIP with openafs 1.8.8
[reposting from correct mail address and with small change]

Hi,

it looks like people using Arch Linux get this error as well after updating to 5.14 [previously I wrote 5.14.9, but the reporter upgraded from 5.13, so it could be any 5.14.x release]. Here is a bug report:

https://bugs.archlinux.org/task/72340

And here is a direct link to the reporter's crash log:

https://bugs.archlinux.org/task/72340?getfile=20754

The error is the same:

> Okt 04 09:18:48 kernel: Code: Unable to access opcode bytes at RIP 0xffd6.

The call trace is also basically identical:

> Okt 04 09:18:48 kernel: Call Trace:
> Okt 04 09:18:48 kernel: filemap_page_mkwrite+0xdf/0x190
> Okt 04 09:18:48 kernel: do_page_mkwrite+0x55/0xb0
> Okt 04 09:18:48 kernel: do_wp_page+0x22b/0x2d0
> Okt 04 09:18:48 kernel: ? cp_new_stat+0x134/0x160
> Okt 04 09:18:48 kernel: __handle_mm_fault+0xd45/0x15c0
> Okt 04 09:18:48 kernel: handle_mm_fault+0xd5/0x2a0
> Okt 04 09:18:48 kernel: do_user_addr_fault+0x1de/0x690
> Okt 04 09:18:48 kernel: exc_page_fault+0x72/0x170
> Okt 04 09:18:48 kernel: ? asm_exc_page_fault+0x8/0x30
> Okt 04 09:18:48 kernel: asm_exc_page_fault+0x1e/0x30

Best,
Michael
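Comparing two call traces like this by eye is easy to get wrong, so here is a small sketch of how the comparison could be automated. It is written in Python for illustration only. The regular expression and the function name trace_frames are assumptions, not part of any existing tool, and frames prefixed with "?" are skipped because the kernel marks them as unreliable.

```python
import re

# Matches e.g. "kernel: filemap_page_mkwrite+0xdf/0x190", optionally with "? ".
FRAME = re.compile(
    r"kernel:\s+(?P<guess>\?\s+)?(?P<symbol>[A-Za-z_][\w.]*)\+0x[0-9a-f]+/0x[0-9a-f]+"
)

# Trace lines as quoted in this message.
TRACE = """\
Okt 04 09:18:48 kernel: Call Trace:
Okt 04 09:18:48 kernel: filemap_page_mkwrite+0xdf/0x190
Okt 04 09:18:48 kernel: do_page_mkwrite+0x55/0xb0
Okt 04 09:18:48 kernel: do_wp_page+0x22b/0x2d0
Okt 04 09:18:48 kernel: ? cp_new_stat+0x134/0x160
Okt 04 09:18:48 kernel: __handle_mm_fault+0xd45/0x15c0
Okt 04 09:18:48 kernel: handle_mm_fault+0xd5/0x2a0
Okt 04 09:18:48 kernel: do_user_addr_fault+0x1de/0x690
Okt 04 09:18:48 kernel: exc_page_fault+0x72/0x170
Okt 04 09:18:48 kernel: ? asm_exc_page_fault+0x8/0x30
Okt 04 09:18:48 kernel: asm_exc_page_fault+0x1e/0x30
""".splitlines()


def trace_frames(log_lines):
    """Return the reliable frame symbols of a call trace, without offsets."""
    symbols = []
    for line in log_lines:
        match = FRAME.search(line)
        if match and not match.group("guess"):
            symbols.append(match.group("symbol"))
    return symbols
```

Two reports then show the same failure path when trace_frames() returns equal lists for both logs, regardless of timestamps or offsets.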
Re: [OpenAFS] kernel 5.14 RIP with openafs 1.8.8
On 04/10/2021 16:29, Mark Vitale wrote:
> I consulted with a colleague and we haven't seen a failure like this.
> The backtrace doesn't contain any AFS code either, so no clues there.
> Is it possible your OpenAFS kernel module wasn't rebuilt for this
> kernel? (mismatched kernel version)

It was built correctly. I will try again; maybe something else is wrong. I have seen similar reports with some ZFS versions, though I am not using ZFS in my case.

Best regards,
Andrej

--
prof. dr. Andrej Filipcic
Department of Experimental High Energy Physics - F9
Jozef Stefan Institute, Jamova 39, P.o.Box 3000
SI-1001 Ljubljana, Slovenia
Tel.: +386-1-477-3674  Fax: +386-1-425-7074
Re: [OpenAFS] kernel 5.14 RIP with openafs 1.8.8
Andrej,

> On Oct 4, 2021, at 5:47 AM, Andrej Filipcic wrote:
>
> I tried kernel 5.14.9 and openafs 1.8.8. It fails just after login with
>
>
> any clues or patches?

I consulted with a colleague and we haven't seen a failure like this. The backtrace doesn't contain any AFS code either, so no clues there. Is it possible your OpenAFS kernel module wasn't rebuilt for this kernel? (mismatched kernel version)

Regards,
--
Mark Vitale
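The mismatch asked about here can be checked mechanically, since a module's vermagic string begins with the kernel release it was built for. Below is a rough sketch in Python. The helper names are made up, the sample strings in the usage note are hypothetical, and the module name to pass to read_vermagic is an assumption that depends on how OpenAFS was packaged.

```python
import platform
import subprocess


def vermagic_release(vermagic: str) -> str:
    """Return the kernel release a module was built for (first vermagic field)."""
    fields = vermagic.split()
    return fields[0] if fields else ""


def module_matches_kernel(vermagic: str, running_release: str) -> bool:
    """True if the module was built for the given kernel release."""
    return vermagic_release(vermagic) == running_release


def read_vermagic(module: str) -> str:
    """Ask modinfo for the vermagic field of an installed module."""
    result = subprocess.run(
        ["modinfo", "-F", "vermagic", module],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout.strip()


def running_release() -> str:
    """Kernel release of the running system, as reported by uname -r."""
    return platform.release()
```

For example, module_matches_kernel("5.14.9-arch1-1 SMP preempt mod_unload", "5.14.9-arch1-1") is true, while the same vermagic compared against "5.13.13-arch1-1" is not. On a live system the two arguments would come from read_vermagic(<module name>) and running_release().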
