Bug#1053122: Fwd: Re: Bug#1053122: linux-image-6.5.0-1-amd64: using smp_processor_id() in preemptible
Control: tag -1 - moreinfo

Forwarded Message
From: Gabriel Francisco
To: Ben Hutchings
Subject: Re: Bug#1053122: linux-image-6.5.0-1-amd64: using smp_processor_id() in preemptible
Date: 12/10/23 20:23:30
Message-Id:

Hi,

> The CPU registers contain several addresses starting 89, except for
> rbx which starts 99 (and is the faulting address). That looks like
> a single bit got flipped.

Thanks for the explanation! (Now I know how to detect bit flips.) :D

> The first BUG message should be more meaningful than what comes after.
> This shows the kernel tried to access non-existent memory.

Yes, I should have reported the first one. I overthought it and ended up reporting the second one. Sorry about that.

> This could be due to a kernel bug, but is more likely a hardware
> problem. Please test the RAM with memtest86+. Also if you've enabled
> any overclocking options, turn those off.

Even with XMP (3000@1.35v) enabled (F4-3000C16-16GISB), memtest86+ ran for 3 hours and printed PASS on the screen.
I removed the XMP profile from my memory and ordered new RAM to check whether my current modules are faulty.

The message in dmesg appeared on only one occasion (but I reported it anyway).

The hang still happens with or without XMP when running the 6.5.x kernel series. It happens when maximizing a video (or from time to time when my cursor enters the video area). It does not happen with the 6.1.x kernel series.

I'm using the amdgpu module.

Greetings,
*Gabriel Francisco*
Linux User #507840
email: frc.gabriel[at]gmail.com

On Thu, Oct 5, 2023 at 1:15 AM Ben Hutchings wrote:

> Control: retitle -1 linux-image-6.5.0-1-amd64: Kernel page fault in
> process exit due to bit flip
> Control: tag -1 moreinfo
>
> On Wed, 2023-09-27 at 20:45 +0200, Gabriel Francisco wrote:
> > Package: src:linux
> > Version: 6.5.3-1
> > Severity: important
> > Tags: upstream
> > X-Debbugs-Cc: frc.gabr...@gmail.com
> >
> > Dear Maintainer,
> >
> > First of all thanks for your hard work!
> >
> > I noticed my computer started freezing for a few seconds when entering/exiting
> > full screen videos in youtube using firefox and while trying to check if the
> > issue also affected chromium I saw the following message in dmesg:
> >
> > [12569.564300] BUG: unable to handle page fault for address: 991989e936b8
> > [12569.564304] #PF: supervisor write access in kernel mode
> > [12569.564306] #PF: error_code(0x0002) - not-present page
>
> The first BUG message should be more meaningful than what comes after.
> This shows the kernel tried to access non-existent memory.
>
> > [12569.564308] PGD 0 P4D 0
> > [12569.564311] Oops: 0002 [#1] PREEMPT SMP NOPTI
> > [12569.564314] CPU: 10 PID: 328649 Comm: Chroot Helper Not tainted 6.5.0-1-amd64 #1 Debian 6.5.3-1
> > [12569.564317] Hardware name: ASUS System Product Name/ROG STRIX B550-F GAMING WIFI II, BIOS 3205 08/14/2023
> > [12569.564318] RIP: 0010:down_write+0x23/0x70
> > [12569.564324] Code: 90 90 90 90 90 90 90 f3 0f 1e fa 0f 1f 44 00 00 53 48 89 fb e8 2e bc ff ff bf 01 00 00 00 e8 74 3a 53 ff 31 c0 ba 01 00 00 00 48 0f b1 13 75 33 65 48 8b 04 25 80 29 03 00 48 89 43 08 bf 01
> > [12569.564326] RSP: 0018:a189d736fc70 EFLAGS: 00010246
> > [12569.564328] RAX: RBX: 991989e936b8 RCX: 891797aaef00
> > [12569.564330] RDX: 0001 RSI: 891989e645c0 RDI: 8e7c95dc
> > [12569.564331] RBP: R08: 0060 R09: 80400014
> > [12569.564333] R10: 8918cbfeb7f8 R11: 0006 R12: 7f7e5fd0
> > [12569.564334] R13: 0001 R14: 891989e645c0 R15: 891989e64958
>
> The CPU registers contain several addresses starting 89, except for
> rbx which starts 99 (and is the faulting address). That looks like
> a single bit got flipped.
>
> This could be due to a kernel bug, but is more likely a hardware
> problem. Please test the RAM with memtest86+. Also if you've enabled
> any overclocking options, turn those off.
>
> [...]
> > After that the computer can't shut down and systemd keeps waiting on
> > process PID 328649 (Chroot Helper).
>
> This (and the other BUG messages) are because that process crashed in
> kernel mode and couldn't properly exit.
>
> Ben.
>
> --
> Ben Hutchings
> Beware of bugs in the above code;
> I have only proved it correct, not tried it. - Donald Knuth
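The `#PF: error_code(0x0002)` line quoted above can be decoded by hand, which is where the "supervisor write access" and "not-present page" wording comes from. A minimal sketch, assuming the standard x86 page-fault error code layout (bit 0 = fault on a present page, bit 1 = write, bit 2 = user mode, bit 3 = reserved bit set, bit 4 = instruction fetch). The function name `decode_pf_error_code` is made up for illustration and is not a kernel API.

```python
def decode_pf_error_code(code: int) -> dict:
    """Split an x86 page-fault error code into its low flag bits."""
    return {
        "present": bool(code & 0x01),            # 0 means the page was not present
        "write": bool(code & 0x02),              # 0 means the access was a read
        "user": bool(code & 0x04),               # 0 means supervisor (kernel) mode
        "reserved_bit": bool(code & 0x08),       # reserved bit set in a page-table entry
        "instruction_fetch": bool(code & 0x10),  # fault happened on an instruction fetch
    }


# Error code taken from the oops quoted in this thread.
info = decode_pf_error_code(0x0002)

summary = "{} {} access, {} page".format(
    "user" if info["user"] else "supervisor",
    "write" if info["write"] else "read",
    "present" if info["present"] else "not-present",
)
print(summary)  # prints: supervisor write access, not-present page
```

Only bit 1 is set in `0x0002`, so the kernel itself tried to write to an address that had no page mapped, which is consistent with the faulting address being a corrupted pointer.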
Bug#1053122: linux-image-6.5.0-1-amd64: using smp_processor_id() in preemptible
Control: forwarded -1 https://mastodon.tn/@ghazi/111240903155846113

On Sunday, 15 October 2023 17:20:25 CEST Gabriel Francisco wrote:
> I found this link: https://gitlab.freedesktop.org/drm/amd/-/issues/2877
> where other people are facing the same issue as me
Bug#1053122: linux-image-6.5.0-1-amd64: using smp_processor_id() in preemptible
Just an update on this bug.

I found this link: https://gitlab.freedesktop.org/drm/amd/-/issues/2877
where other people are facing the same issue as me, and I can confirm that disabling FreeSync in my monitor settings makes the freezes/hangs disappear.

Best,
*Gabriel Francisco*
Linux User #507840
email: frc.gabriel[at]gmail.com

On Thu, Oct 12, 2023 at 8:36 PM Gabriel Francisco wrote:

> -- Forwarded message -
> From: Gabriel Francisco
> Date: Thu, Oct 12, 2023 at 8:23 PM
> Subject: Re: Bug#1053122: linux-image-6.5.0-1-amd64: using
> smp_processor_id() in preemptible
> To: Ben Hutchings
>
>
> Hi,
>
> > The CPU registers contain several addresses starting 89, except for
> > rbx which starts 99 (and is the faulting address). That looks like
> > a single bit got flipped.
>
> Thanks for the explanation! (Now I know how to detect bit flips.) :D
>
> > The first BUG message should be more meaningful than what comes after.
> > This shows the kernel tried to access non-existent memory.
>
> Yes, I should have reported the first one. I overthought it and
> ended up reporting the second one. Sorry about that.
>
> > This could be due to a kernel bug, but is more likely a hardware
> > problem. Please test the RAM with memtest86+. Also if you've enabled
> > any overclocking options, turn those off.
>
> Even with XMP (3000@1.35v) enabled (F4-3000C16-16GISB), memtest86+ ran for
> 3 hours and printed PASS on the screen.
> I removed the XMP profile from my memory and ordered new RAM to check
> whether my current modules are faulty.
>
> The message in dmesg appeared on only one occasion (but I reported it anyway).
>
> The hang still happens with or without XMP when running the 6.5.x kernel
> series. It happens when maximizing a video (or from time to time when my cursor
> enters the video area). It does not happen with the 6.1.x kernel series.
>
> I'm using the amdgpu module.
>
> Greetings,
>
> *Gabriel Francisco*
> Linux User #507840
> email: frc.gabriel[at]gmail.com
>
>
> On Thu, Oct 5, 2023 at 1:15 AM Ben Hutchings wrote:
>
>> Control: retitle -1 linux-image-6.5.0-1-amd64: Kernel page fault in
>> process exit due to bit flip
>> Control: tag -1 moreinfo
>>
>> On Wed, 2023-09-27 at 20:45 +0200, Gabriel Francisco wrote:
>> > Package: src:linux
>> > Version: 6.5.3-1
>> > Severity: important
>> > Tags: upstream
>> > X-Debbugs-Cc: frc.gabr...@gmail.com
>> >
>> > Dear Maintainer,
>> >
>> > First of all thanks for your hard work!
>> >
>> > I noticed my computer started freezing for a few seconds when entering/exiting
>> > full screen videos in youtube using firefox and while trying to check if the
>> > issue also affected chromium I saw the following message in dmesg:
>> >
>> > [12569.564300] BUG: unable to handle page fault for address: 991989e936b8
>> > [12569.564304] #PF: supervisor write access in kernel mode
>> > [12569.564306] #PF: error_code(0x0002) - not-present page
>>
>> The first BUG message should be more meaningful than what comes after.
>> This shows the kernel tried to access non-existent memory.
>>
>> > [12569.564308] PGD 0 P4D 0
>> > [12569.564311] Oops: 0002 [#1] PREEMPT SMP NOPTI
>> > [12569.564314] CPU: 10 PID: 328649 Comm: Chroot Helper Not tainted 6.5.0-1-amd64 #1 Debian 6.5.3-1
>> > [12569.564317] Hardware name: ASUS System Product Name/ROG STRIX B550-F GAMING WIFI II, BIOS 3205 08/14/2023
>> > [12569.564318] RIP: 0010:down_write+0x23/0x70
>> > [12569.564324] Code: 90 90 90 90 90 90 90 f3 0f 1e fa 0f 1f 44 00 00 53 48 89 fb e8 2e bc ff ff bf 01 00 00 00 e8 74 3a 53 ff 31 c0 ba 01 00 00 00 48 0f b1 13 75 33 65 48 8b 04 25 80 29 03 00 48 89 43 08 bf 01
>> > [12569.564326] RSP: 0018:a189d736fc70 EFLAGS: 00010246
>> > [12569.564328] RAX: RBX: 991989e936b8 RCX: 891797aaef00
>> > [12569.564330] RDX: 0001 RSI: 891989e645c0 RDI: 8e7c95dc
>> > [12569.564331] RBP: R08: 0060 R09: 80400014
>> > [12569.564333] R10: 8918cbfeb7f8 R11: 0006 R12: 7f7e5fd0
>> > [12569.564334] R13: 0001 R14: 891989e645c0 R15: 891989e64958
>>
>> The CPU registers contain several addresses starting 89, except for
>> rbx which starts 99 (and is the faulting address). That looks like
>> a single bit got flipped.
>>
>> This could be due to a kernel bug, but is more likely a hardware
>> problem. Please test the RAM with memtest86+. Also if you've enabled
>> any overclocking options, turn those off.
>>
>> [...]
>> > After that the computer can't shut down and systemd keeps waiting on
>> > process PID 328649 (Chroot Helper).
>>
>> This (and the other BUG messages) are because that process crashed in
>> kernel mode and couldn't properly exit.
>>
>> Ben.
>>
>> --
>> Ben Hutchings
>> Beware of bugs in the above code;
>> I have only proved it correct, not tried it. - Donald Knuth
Bug#1053122: linux-image-6.5.0-1-amd64: using smp_processor_id() in preemptible
Control: retitle -1 linux-image-6.5.0-1-amd64: Kernel page fault in process exit due to bit flip
Control: tag -1 moreinfo

On Wed, 2023-09-27 at 20:45 +0200, Gabriel Francisco wrote:
> Package: src:linux
> Version: 6.5.3-1
> Severity: important
> Tags: upstream
> X-Debbugs-Cc: frc.gabr...@gmail.com
>
> Dear Maintainer,
>
> First of all thanks for your hard work!
>
> I noticed my computer started freezing for a few seconds when entering/exiting
> full screen videos in youtube using firefox and while trying to check if the
> issue also affected chromium I saw the following message in dmesg:
>
> [12569.564300] BUG: unable to handle page fault for address: 991989e936b8
> [12569.564304] #PF: supervisor write access in kernel mode
> [12569.564306] #PF: error_code(0x0002) - not-present page

The first BUG message should be more meaningful than what comes after.
This shows the kernel tried to access non-existent memory.

> [12569.564308] PGD 0 P4D 0
> [12569.564311] Oops: 0002 [#1] PREEMPT SMP NOPTI
> [12569.564314] CPU: 10 PID: 328649 Comm: Chroot Helper Not tainted 6.5.0-1-amd64 #1 Debian 6.5.3-1
> [12569.564317] Hardware name: ASUS System Product Name/ROG STRIX B550-F GAMING WIFI II, BIOS 3205 08/14/2023
> [12569.564318] RIP: 0010:down_write+0x23/0x70
> [12569.564324] Code: 90 90 90 90 90 90 90 f3 0f 1e fa 0f 1f 44 00 00 53 48 89 fb e8 2e bc ff ff bf 01 00 00 00 e8 74 3a 53 ff 31 c0 ba 01 00 00 00 48 0f b1 13 75 33 65 48 8b 04 25 80 29 03 00 48 89 43 08 bf 01
> [12569.564326] RSP: 0018:a189d736fc70 EFLAGS: 00010246
> [12569.564328] RAX: RBX: 991989e936b8 RCX: 891797aaef00
> [12569.564330] RDX: 0001 RSI: 891989e645c0 RDI: 8e7c95dc
> [12569.564331] RBP: R08: 0060 R09: 80400014
> [12569.564333] R10: 8918cbfeb7f8 R11: 0006 R12: 7f7e5fd0
> [12569.564334] R13: 0001 R14: 891989e645c0 R15: 891989e64958

The CPU registers contain several addresses starting 89, except for
rbx which starts 99 (and is the faulting address). That looks like
a single bit got flipped.

This could be due to a kernel bug, but is more likely a hardware
problem. Please test the RAM with memtest86+. Also if you've enabled
any overclocking options, turn those off.

[...]
> After that the computer can't shut down and systemd keeps waiting on process
> PID 328649 (Chroot Helper).

This (and the other BUG messages) are because that process crashed in
kernel mode and couldn't properly exit.

Ben.

--
Ben Hutchings
Beware of bugs in the above code;
I have only proved it correct, not tried it. - Donald Knuth
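The "89 versus 99" observation in the message above can be checked mechanically by XOR-ing the two leading bytes and counting the set bits. A minimal sketch, assuming the register values exactly as they appear in this thread (12 hex digits each) and picking RSI as the reference pointer; that pairing is a choice made here for illustration, and the helper name `differing_bits` is hypothetical.

```python
def differing_bits(a: int, b: int) -> list:
    """Return the positions (0 = least significant) of the bits where a and b differ."""
    diff = a ^ b
    return [i for i in range(diff.bit_length()) if (diff >> i) & 1]


# Values copied from the register dump as shown in this thread.
RBX = 0x991989E936B8  # faulting address
RSI = 0x891989E645C0  # a nearby pointer that looks valid

# Compare only the leading byte ("99" vs "89"). The low bits differ anyway,
# because the two registers point at different objects.
top_rbx = RBX >> 40
top_rsi = RSI >> 40
bits = differing_bits(top_rbx, top_rsi)

print(f"{top_rbx:#x} vs {top_rsi:#x}: differing bit positions {bits}")
```

A result with exactly one differing position is what a single flipped bit looks like. It does not by itself say whether the flip came from faulty RAM or from a software bug that corrupted the pointer.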
Bug#1053122: linux-image-6.5.0-1-amd64: using smp_processor_id() in preemptible
Package: src:linux
Version: 6.5.3-1
Severity: important
Tags: upstream
X-Debbugs-Cc: frc.gabr...@gmail.com

Dear Maintainer,

First of all thanks for your hard work!

I noticed my computer started freezing for a few seconds when entering/exiting
full screen videos in youtube using firefox and while trying to check if the
issue also affected chromium I saw the following message in dmesg:

[12569.564300] BUG: unable to handle page fault for address: 991989e936b8
[12569.564304] #PF: supervisor write access in kernel mode
[12569.564306] #PF: error_code(0x0002) - not-present page
[12569.564308] PGD 0 P4D 0
[12569.564311] Oops: 0002 [#1] PREEMPT SMP NOPTI
[12569.564314] CPU: 10 PID: 328649 Comm: Chroot Helper Not tainted 6.5.0-1-amd64 #1 Debian 6.5.3-1
[12569.564317] Hardware name: ASUS System Product Name/ROG STRIX B550-F GAMING WIFI II, BIOS 3205 08/14/2023
[12569.564318] RIP: 0010:down_write+0x23/0x70
[12569.564324] Code: 90 90 90 90 90 90 90 f3 0f 1e fa 0f 1f 44 00 00 53 48 89 fb e8 2e bc ff ff bf 01 00 00 00 e8 74 3a 53 ff 31 c0 ba 01 00 00 00 48 0f b1 13 75 33 65 48 8b 04 25 80 29 03 00 48 89 43 08 bf 01
[12569.564326] RSP: 0018:a189d736fc70 EFLAGS: 00010246
[12569.564328] RAX: RBX: 991989e936b8 RCX: 891797aaef00
[12569.564330] RDX: 0001 RSI: 891989e645c0 RDI: 8e7c95dc
[12569.564331] RBP: R08: 0060 R09: 80400014
[12569.564333] R10: 8918cbfeb7f8 R11: 0006 R12: 7f7e5fd0
[12569.564334] R13: 0001 R14: 891989e645c0 R15: 891989e64958
[12569.564336] FS: () GS:891c8ec8() knlGS:
[12569.564338] CS: 0010 DS: ES: CR0: 80050033
[12569.564339] CR2: 991989e936b8 CR3: 0002c522 CR4: 00750ee0
[12569.564341] PKRU: 5554
[12569.564342] Call Trace:
[12569.564345]
[12569.564347] ? __die+0x23/0x70
[12569.564351] ? page_fault_oops+0x171/0x4f0
[12569.564354] ? srso_alias_return_thunk+0x5/0x7f
[12569.564358] ? exc_page_fault+0x175/0x180
[12569.564362] ? asm_exc_page_fault+0x26/0x30
[12569.564367] ? down_write+0x1c/0x70
[12569.564370] ? down_write+0x23/0x70
[12569.564373] free_pgtables+0x1ba/0x1d0
[12569.564379] exit_mmap+0x141/0x310
[12569.564385] __mmput+0x3e/0x130
[12569.564389] do_exit+0x305/0xb20
[12569.564393] do_group_exit+0x31/0x80
[12569.564396] __x64_sys_exit_group+0x18/0x20
[12569.564398] do_syscall_64+0x60/0xc0
[12569.564401] ? preempt_count_add+0x4b/0xa0
[12569.564404] ? srso_alias_return_thunk+0x5/0x7f
[12569.564407] ? up_read+0x3b/0x80
[12569.564410] ? srso_alias_return_thunk+0x5/0x7f
[12569.564412] ? do_user_addr_fault+0x235/0x640
[12569.564415] ? srso_alias_return_thunk+0x5/0x7f
[12569.564417] ? fpregs_assert_state_consistent+0x26/0x50
[12569.564419] ? srso_alias_return_thunk+0x5/0x7f
[12569.564422] ? exit_to_user_mode_prepare+0x40/0x1d0
[12569.564425] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[12569.564428] RIP: 0033:0x7f7e9e6f1995
[12569.564430] Code: Unable to access opcode bytes at 0x7f7e9e6f196b.
[12569.564432] RSP: 002b:7f7e798bd288 EFLAGS: 0206 ORIG_RAX: 00e7
[12569.564434] RAX: ffda RBX: 7f7e6aa5bee0 RCX: 7f7e9e6f1995
[12569.564435] RDX: 00e7 RSI: ff48 RDI:
[12569.564437] RBP: 7f7e798bd298 R08: R09:
[12569.564438] R10: 7f7e9e630178 R11: 0206 R12: 000b
[12569.564440] R13: R14: 7f7e6aa5bee0 R15: 7f7e798bd560
[12569.56]
[12569.564445] Modules linked in: uinput mptcp_diag xsk_diag tcp_diag udp_diag raw_diag inet_diag unix_diag af_packet_diag netlink_diag tls rfcomm nf_conntrack_netlink xfrm_user xfrm_algo snd_seq_dummy snd_hrtimer snd_seq snd_seq_device nvme_fabrics ctr ccm qrtr overlay nfnetlink_log cmac lz4 lz4_compress algif_hash zram algif_skcipher zsmalloc af_alg bnep wireguard libchacha20poly1305 chacha_x86_64 poly1305_x86_64 curve25519_x86_64 libcurve25519_generic libchacha ip6_udp_tunnel udp_tunnel nls_ascii nls_cp437 vfat fat nft_chain_nat xt_MASQUERADE nf_nat xt_addrtype xt_comment xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xt_multiport ipt_REJECT nf_reject_ipv4 xt_CHECKSUM xt_tcpudp nft_compat nf_tables mt7921e btusb mt7921_common btrtl btbcm btintel mt76_connac_lib btmtk mt76 intel_rapl_msr intel_rapl_common bluetooth mac80211 snd_hda_codec_realtek sha3_generic jitterentropy_rng snd_hda_codec_generic edac_mce_amd snd_hda_codec_hdmi binfmt_misc drbg ip_set_hash_net ansi_cprng kvm_amd ip_set snd_hda_intel
[12569.564507] snd_intel_dspcfg ecdh_generic libarc4 snd_intel_sdw_acpi ecc nfnetlink eeepc_wmi asus_nb_wmi kvm snd_hda_codec cfg80211 asus_wmi battery snd_hda_core ledtrig_audio snd_hwdep irqbypass sparse_keymap snd_pcm platform_profile rapl snd_timer rfkill wmi_bmof snd sp5100_tco ccp soundcore watchdog k10temp nct6775 nct6775_core joydev