Re: Kernel BUG: kernel NULL pointer dereference on Windows server connect/disconnect
There is now a patch for this issue, which is caused by a "use-after-free" situation. https://marc.info/?l=target-devel=167463560822293=2 On Wednesday, 25 January 2023 at 06:53:01 UTC+1 Forza wrote: > Thanks for the feedback. I have posted to the target-devel mailing list > with some more information about distro etc. Hopefully we can get to the > root of this elusive issue there. > > On Tuesday, 24 January 2023 at 19:33:27 UTC+1 The Lee-Man wrote: > >> This is a mailing list for open-iscsi, the iSCSI initiator. Your issues >> seems to be with LIO, the in-kernel target code. >> >> I suggest the target development kernel mailing list: >> target...@vger.kernel.org >> >> It would help if, when you post there, you include your distribution. >> Also, is there any other way to reproduce it, other than using a MS >> initiator (so that others might reproduce it)? >> >> I use LIO (targetcli) targets regularly and haven't seen this issue. >> >> On Monday, January 23, 2023 at 12:22:49 AM UTC-8 Forza wrote: >> >>> Hi! >>> I have an issue with spontaneous reboots of my ISCSI target server. It >>> seems to happen especially often when Windows Server 2016 clients reboot, >>> but I can't say for sure it happens every time either. >>> >>> The target is using a fileio backing store ontop of a BTRFS filesystem. >>> I have write-back enabled on this target (but it happens even with write >>> caching disabled). >>> >>> I am running Alpine Linux. The issue has happened since first installed >>> about a year ago. It is at least happeneing on kernels 5.15.60+ and 6.1.4, >>> 6.1.6. >>> >>> >>> I managed to capture the following trace using pstore: >>> >>> <6>[ 69.123671] RPC: Registered named UNIX socket transport module. >>> <6>[ 69.123674] RPC: Registered udp transport module. >>> <6>[ 69.123675] RPC: Registered tcp transport module. >>> <6>[ 69.123675] RPC: Registered tcp NFSv4.1 backchannel transport >>> module. >>> <6>[ 70.281192] NFSD: Using UMH upcall client tracking operations. >>> <6>[ 70.281199] NFSD: starting 90-second grace period (net f000) >>> <6>[ 75.683777] Rounding down aligned max_sectors from 4294967295 to >>> 4294967288 >>> <4>[ 76.014381] ignoring deprecated emulate_dpo attribute >>> <4>[ 76.014497] ignoring deprecated emulate_fua_read attribute >>> <4>[ 76.019775] dev[4a04ffbe]: Backstore name >>> 'sesoco3092-export' is too long for INQUIRY_MODEL, truncating to 15 >>> characters >>> <3>[ 76.813023] bond0: (slave igb0): speed changed to 0 on port 1 >>> <6>[ 76.868842] bond0: (slave igb0): link status definitely down, >>> disabling slave >>> <6>[ 76.868849] bond0: active interface up! >>> <3>[ 78.912829] bond0: (slave igb1): speed changed to 0 on port 2 >>> <6>[ 78.982125] bond0: (slave igb1): link status definitely down, >>> disabling slave >>> <6>[ 78.982149] bond0: now running without any active interface! >>> <6>[ 80.332786] igb :03:00.0 igb0: igb: igb0 NIC Link is Up 1000 >>> Mbps Full Duplex, Flow Control: RX >>> <6>[ 80.545683] bond0: (slave igb0): link status definitely up, 1000 >>> Mbps full duplex >>> <6>[ 80.545691] bond0: active interface up! >>> <6>[ 82.449424] igb :04:00.0 igb1: igb: igb1 NIC Link is Up 1000 >>> Mbps Full Duplex, Flow Control: RX >>> <6>[ 82.622345] bond0: (slave igb1): link status definitely up, 1000 >>> Mbps full duplex >>> <4>[ 7428.832735] hrtimer: interrupt took 7712 ns >>> <6>[ 8364.650898] ice :01:00.0 ice0: NIC Link is up 25 Gbps Full >>> Duplex, Requested FEC: RS-FEC, Negotiated FEC: RS-FEC, Autoneg Advertised: >>> Off, Autoneg Negotiated: False, Flow Control: None >>> <6>[ 8364.651003] IPv6: ADDRCONF(NETDEV_CHANGE): ice0: link becomes ready >>> <6>[11218.918482] wireguard: WireGuard 1.0.0 loaded. See >>> www.wireguard.com for information. >>> <6>[11218.918484] wireguard: Copyright (C) 2015-2019 Jason A. Donenfeld < >>> ja...@zx2c4.com>. All Rights Reserved. >>> <3>[154888.236235] Did not receive response to NOPIN on CID: 1, failing >>> connection for I_T Nexus >>> iqn.1991-05.com.microsoft:srv,i,0x41370001,iqn.2022-02.com.example.srv04:srv,t,0x01 >>> <3>[154908.716136] Time2Retain timer expired for SID: 1, cleaning up >>> iSCSI session. >>> <1>[154908.716177] BUG: kernel NULL pointer dereference, address: >>> 0140 >>> <1>[154908.717023] #PF: supervisor write access in kernel mode >>> <1>[154908.717842] #PF: error_code(0x0002) - not-present page >>> <6>[154908.718667] PGD 0 P4D 0 >>> <4>[154908.719486] Oops: 0002 [#1] PREEMPT SMP PTI >>> <4>[154908.720289] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.1.3-0-lts >>> #1-Alpine >>> <4>[154908.721089] Hardware name: Supermicro Super Server/X11SCL-F, BIOS >>> 1.9 09/21/2022 >>> <4>[154908.721888] RIP: 0010:sbitmap_queue_clear+0x3a/0xa0 >>> <4>[154908.722667] Code: 65 48 8b 04 25 28 00 00 00 48 89 44 24 08 31 c0 >>> 8b 4f 04 ba ff ff ff ff 89 f0 d3 e2 d3 e8 f7 d2 48 c1 e0 07 48 03 47 10 21 >>> f2 48 0f ab 50
Re: Kernel BUG: kernel NULL pointer dereference on Windows server connect/disconnect
Thanks for the feedback. I have posted to the target-devel mailing list with some more information about distro etc. Hopefully we can get to the root of this elusive issue there. On Tuesday, 24 January 2023 at 19:33:27 UTC+1 The Lee-Man wrote: > This is a mailing list for open-iscsi, the iSCSI initiator. Your issues > seems to be with LIO, the in-kernel target code. > > I suggest the target development kernel mailing list: > target...@vger.kernel.org > > It would help if, when you post there, you include your distribution. > Also, is there any other way to reproduce it, other than using a MS > initiator (so that others might reproduce it)? > > I use LIO (targetcli) targets regularly and haven't seen this issue. > > On Monday, January 23, 2023 at 12:22:49 AM UTC-8 Forza wrote: > >> Hi! >> I have an issue with spontaneous reboots of my ISCSI target server. It >> seems to happen especially often when Windows Server 2016 clients reboot, >> but I can't say for sure it happens every time either. >> >> The target is using a fileio backing store ontop of a BTRFS filesystem. I >> have write-back enabled on this target (but it happens even with write >> caching disabled). >> >> I am running Alpine Linux. The issue has happened since first installed >> about a year ago. It is at least happeneing on kernels 5.15.60+ and 6.1.4, >> 6.1.6. >> >> >> I managed to capture the following trace using pstore: >> >> <6>[ 69.123671] RPC: Registered named UNIX socket transport module. >> <6>[ 69.123674] RPC: Registered udp transport module. >> <6>[ 69.123675] RPC: Registered tcp transport module. >> <6>[ 69.123675] RPC: Registered tcp NFSv4.1 backchannel transport >> module. >> <6>[ 70.281192] NFSD: Using UMH upcall client tracking operations. >> <6>[ 70.281199] NFSD: starting 90-second grace period (net f000) >> <6>[ 75.683777] Rounding down aligned max_sectors from 4294967295 to >> 4294967288 >> <4>[ 76.014381] ignoring deprecated emulate_dpo attribute >> <4>[ 76.014497] ignoring deprecated emulate_fua_read attribute >> <4>[ 76.019775] dev[4a04ffbe]: Backstore name >> 'sesoco3092-export' is too long for INQUIRY_MODEL, truncating to 15 >> characters >> <3>[ 76.813023] bond0: (slave igb0): speed changed to 0 on port 1 >> <6>[ 76.868842] bond0: (slave igb0): link status definitely down, >> disabling slave >> <6>[ 76.868849] bond0: active interface up! >> <3>[ 78.912829] bond0: (slave igb1): speed changed to 0 on port 2 >> <6>[ 78.982125] bond0: (slave igb1): link status definitely down, >> disabling slave >> <6>[ 78.982149] bond0: now running without any active interface! >> <6>[ 80.332786] igb :03:00.0 igb0: igb: igb0 NIC Link is Up 1000 >> Mbps Full Duplex, Flow Control: RX >> <6>[ 80.545683] bond0: (slave igb0): link status definitely up, 1000 >> Mbps full duplex >> <6>[ 80.545691] bond0: active interface up! >> <6>[ 82.449424] igb :04:00.0 igb1: igb: igb1 NIC Link is Up 1000 >> Mbps Full Duplex, Flow Control: RX >> <6>[ 82.622345] bond0: (slave igb1): link status definitely up, 1000 >> Mbps full duplex >> <4>[ 7428.832735] hrtimer: interrupt took 7712 ns >> <6>[ 8364.650898] ice :01:00.0 ice0: NIC Link is up 25 Gbps Full >> Duplex, Requested FEC: RS-FEC, Negotiated FEC: RS-FEC, Autoneg Advertised: >> Off, Autoneg Negotiated: False, Flow Control: None >> <6>[ 8364.651003] IPv6: ADDRCONF(NETDEV_CHANGE): ice0: link becomes ready >> <6>[11218.918482] wireguard: WireGuard 1.0.0 loaded. See >> www.wireguard.com for information. >> <6>[11218.918484] wireguard: Copyright (C) 2015-2019 Jason A. Donenfeld < >> ja...@zx2c4.com>. All Rights Reserved. >> <3>[154888.236235] Did not receive response to NOPIN on CID: 1, failing >> connection for I_T Nexus >> iqn.1991-05.com.microsoft:srv,i,0x41370001,iqn.2022-02.com.example.srv04:srv,t,0x01 >> <3>[154908.716136] Time2Retain timer expired for SID: 1, cleaning up >> iSCSI session. >> <1>[154908.716177] BUG: kernel NULL pointer dereference, address: >> 0140 >> <1>[154908.717023] #PF: supervisor write access in kernel mode >> <1>[154908.717842] #PF: error_code(0x0002) - not-present page >> <6>[154908.718667] PGD 0 P4D 0 >> <4>[154908.719486] Oops: 0002 [#1] PREEMPT SMP PTI >> <4>[154908.720289] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.1.3-0-lts >> #1-Alpine >> <4>[154908.721089] Hardware name: Supermicro Super Server/X11SCL-F, BIOS >> 1.9 09/21/2022 >> <4>[154908.721888] RIP: 0010:sbitmap_queue_clear+0x3a/0xa0 >> <4>[154908.722667] Code: 65 48 8b 04 25 28 00 00 00 48 89 44 24 08 31 c0 >> 8b 4f 04 ba ff ff ff ff 89 f0 d3 e2 d3 e8 f7 d2 48 c1 e0 07 48 03 47 10 21 >> f2 48 0f ab 50 40 c7 44 24 04 01 00 00 00 48 8d 74 24 04 48 89 df >> <4>[154908.724376] RSP: 0018:a6e43d48 EFLAGS: 00010202 >> <4>[154908.725249] RAX: 0100 RBX: 8ddc43039428 RCX: >> 0005 >> <4>[154908.726137] RDX: 000b RSI: 004b RDI: >> 8ddc43039428 >>
Re: Kernel BUG: kernel NULL pointer dereference on Windows server connect/disconnect
This is a mailing list for open-iscsi, the iSCSI initiator. Your issues seems to be with LIO, the in-kernel target code. I suggest the target development kernel mailing list: target-de...@vger.kernel.org It would help if, when you post there, you include your distribution. Also, is there any other way to reproduce it, other than using a MS initiator (so that others might reproduce it)? I use LIO (targetcli) targets regularly and haven't seen this issue. On Monday, January 23, 2023 at 12:22:49 AM UTC-8 Forza wrote: > Hi! > I have an issue with spontaneous reboots of my ISCSI target server. It > seems to happen especially often when Windows Server 2016 clients reboot, > but I can't say for sure it happens every time either. > > The target is using a fileio backing store ontop of a BTRFS filesystem. I > have write-back enabled on this target (but it happens even with write > caching disabled). > > I am running Alpine Linux. The issue has happened since first installed > about a year ago. It is at least happeneing on kernels 5.15.60+ and 6.1.4, > 6.1.6. > > > I managed to capture the following trace using pstore: > > <6>[ 69.123671] RPC: Registered named UNIX socket transport module. > <6>[ 69.123674] RPC: Registered udp transport module. > <6>[ 69.123675] RPC: Registered tcp transport module. > <6>[ 69.123675] RPC: Registered tcp NFSv4.1 backchannel transport module. > <6>[ 70.281192] NFSD: Using UMH upcall client tracking operations. > <6>[ 70.281199] NFSD: starting 90-second grace period (net f000) > <6>[ 75.683777] Rounding down aligned max_sectors from 4294967295 to > 4294967288 > <4>[ 76.014381] ignoring deprecated emulate_dpo attribute > <4>[ 76.014497] ignoring deprecated emulate_fua_read attribute > <4>[ 76.019775] dev[4a04ffbe]: Backstore name > 'sesoco3092-export' is too long for INQUIRY_MODEL, truncating to 15 > characters > <3>[ 76.813023] bond0: (slave igb0): speed changed to 0 on port 1 > <6>[ 76.868842] bond0: (slave igb0): link status definitely down, > disabling slave > <6>[ 76.868849] bond0: active interface up! > <3>[ 78.912829] bond0: (slave igb1): speed changed to 0 on port 2 > <6>[ 78.982125] bond0: (slave igb1): link status definitely down, > disabling slave > <6>[ 78.982149] bond0: now running without any active interface! > <6>[ 80.332786] igb :03:00.0 igb0: igb: igb0 NIC Link is Up 1000 > Mbps Full Duplex, Flow Control: RX > <6>[ 80.545683] bond0: (slave igb0): link status definitely up, 1000 > Mbps full duplex > <6>[ 80.545691] bond0: active interface up! > <6>[ 82.449424] igb :04:00.0 igb1: igb: igb1 NIC Link is Up 1000 > Mbps Full Duplex, Flow Control: RX > <6>[ 82.622345] bond0: (slave igb1): link status definitely up, 1000 > Mbps full duplex > <4>[ 7428.832735] hrtimer: interrupt took 7712 ns > <6>[ 8364.650898] ice :01:00.0 ice0: NIC Link is up 25 Gbps Full > Duplex, Requested FEC: RS-FEC, Negotiated FEC: RS-FEC, Autoneg Advertised: > Off, Autoneg Negotiated: False, Flow Control: None > <6>[ 8364.651003] IPv6: ADDRCONF(NETDEV_CHANGE): ice0: link becomes ready > <6>[11218.918482] wireguard: WireGuard 1.0.0 loaded. See www.wireguard.com > for information. > <6>[11218.918484] wireguard: Copyright (C) 2015-2019 Jason A. Donenfeld < > ja...@zx2c4.com>. All Rights Reserved. > <3>[154888.236235] Did not receive response to NOPIN on CID: 1, failing > connection for I_T Nexus > iqn.1991-05.com.microsoft:srv,i,0x41370001,iqn.2022-02.com.example.srv04:srv,t,0x01 > <3>[154908.716136] Time2Retain timer expired for SID: 1, cleaning up iSCSI > session. > <1>[154908.716177] BUG: kernel NULL pointer dereference, address: > 0140 > <1>[154908.717023] #PF: supervisor write access in kernel mode > <1>[154908.717842] #PF: error_code(0x0002) - not-present page > <6>[154908.718667] PGD 0 P4D 0 > <4>[154908.719486] Oops: 0002 [#1] PREEMPT SMP PTI > <4>[154908.720289] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.1.3-0-lts > #1-Alpine > <4>[154908.721089] Hardware name: Supermicro Super Server/X11SCL-F, BIOS > 1.9 09/21/2022 > <4>[154908.721888] RIP: 0010:sbitmap_queue_clear+0x3a/0xa0 > <4>[154908.722667] Code: 65 48 8b 04 25 28 00 00 00 48 89 44 24 08 31 c0 > 8b 4f 04 ba ff ff ff ff 89 f0 d3 e2 d3 e8 f7 d2 48 c1 e0 07 48 03 47 10 21 > f2 48 0f ab 50 40 c7 44 24 04 01 00 00 00 48 8d 74 24 04 48 89 df > <4>[154908.724376] RSP: 0018:a6e43d48 EFLAGS: 00010202 > <4>[154908.725249] RAX: 0100 RBX: 8ddc43039428 RCX: > 0005 > <4>[154908.726137] RDX: 000b RSI: 004b RDI: > 8ddc43039428 > <4>[154908.727030] RBP: 004b R08: R09: > > <4>[154908.727921] R10: R11: R12: > > <4>[154908.728812] R13: 8ddc43039380 R14: 8ddc49015370 R15: > 8ddc490157e0 > <4>[154908.729715] FS: () GS:8df9aec0() >
Kernel BUG: kernel NULL pointer dereference on Windows server connect/disconnect
Hi! I have an issue with spontaneous reboots of my ISCSI target server. It seems to happen especially often when Windows Server 2016 clients reboot, but I can't say for sure it happens every time either. The target is using a fileio backing store ontop of a BTRFS filesystem. I have write-back enabled on this target (but it happens even with write caching disabled). I am running Alpine Linux. The issue has happened since first installed about a year ago. It is at least happeneing on kernels 5.15.60+ and 6.1.4, 6.1.6. I managed to capture the following trace using pstore: <6>[ 69.123671] RPC: Registered named UNIX socket transport module. <6>[ 69.123674] RPC: Registered udp transport module. <6>[ 69.123675] RPC: Registered tcp transport module. <6>[ 69.123675] RPC: Registered tcp NFSv4.1 backchannel transport module. <6>[ 70.281192] NFSD: Using UMH upcall client tracking operations. <6>[ 70.281199] NFSD: starting 90-second grace period (net f000) <6>[ 75.683777] Rounding down aligned max_sectors from 4294967295 to 4294967288 <4>[ 76.014381] ignoring deprecated emulate_dpo attribute <4>[ 76.014497] ignoring deprecated emulate_fua_read attribute <4>[ 76.019775] dev[4a04ffbe]: Backstore name 'sesoco3092-export' is too long for INQUIRY_MODEL, truncating to 15 characters <3>[ 76.813023] bond0: (slave igb0): speed changed to 0 on port 1 <6>[ 76.868842] bond0: (slave igb0): link status definitely down, disabling slave <6>[ 76.868849] bond0: active interface up! <3>[ 78.912829] bond0: (slave igb1): speed changed to 0 on port 2 <6>[ 78.982125] bond0: (slave igb1): link status definitely down, disabling slave <6>[ 78.982149] bond0: now running without any active interface! <6>[ 80.332786] igb :03:00.0 igb0: igb: igb0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX <6>[ 80.545683] bond0: (slave igb0): link status definitely up, 1000 Mbps full duplex <6>[ 80.545691] bond0: active interface up! <6>[ 82.449424] igb :04:00.0 igb1: igb: igb1 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX <6>[ 82.622345] bond0: (slave igb1): link status definitely up, 1000 Mbps full duplex <4>[ 7428.832735] hrtimer: interrupt took 7712 ns <6>[ 8364.650898] ice :01:00.0 ice0: NIC Link is up 25 Gbps Full Duplex, Requested FEC: RS-FEC, Negotiated FEC: RS-FEC, Autoneg Advertised: Off, Autoneg Negotiated: False, Flow Control: None <6>[ 8364.651003] IPv6: ADDRCONF(NETDEV_CHANGE): ice0: link becomes ready <6>[11218.918482] wireguard: WireGuard 1.0.0 loaded. See www.wireguard.com for information. <6>[11218.918484] wireguard: Copyright (C) 2015-2019 Jason A. Donenfeld . All Rights Reserved. <3>[154888.236235] Did not receive response to NOPIN on CID: 1, failing connection for I_T Nexus iqn.1991-05.com.microsoft:srv,i,0x41370001,iqn.2022-02.com.example.srv04:srv,t,0x01 <3>[154908.716136] Time2Retain timer expired for SID: 1, cleaning up iSCSI session. <1>[154908.716177] BUG: kernel NULL pointer dereference, address: 0140 <1>[154908.717023] #PF: supervisor write access in kernel mode <1>[154908.717842] #PF: error_code(0x0002) - not-present page <6>[154908.718667] PGD 0 P4D 0 <4>[154908.719486] Oops: 0002 [#1] PREEMPT SMP PTI <4>[154908.720289] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.1.3-0-lts #1-Alpine <4>[154908.721089] Hardware name: Supermicro Super Server/X11SCL-F, BIOS 1.9 09/21/2022 <4>[154908.721888] RIP: 0010:sbitmap_queue_clear+0x3a/0xa0 <4>[154908.722667] Code: 65 48 8b 04 25 28 00 00 00 48 89 44 24 08 31 c0 8b 4f 04 ba ff ff ff ff 89 f0 d3 e2 d3 e8 f7 d2 48 c1 e0 07 48 03 47 10 21 f2 48 0f ab 50 40 c7 44 24 04 01 00 00 00 48 8d 74 24 04 48 89 df <4>[154908.724376] RSP: 0018:a6e43d48 EFLAGS: 00010202 <4>[154908.725249] RAX: 0100 RBX: 8ddc43039428 RCX: 0005 <4>[154908.726137] RDX: 000b RSI: 004b RDI: 8ddc43039428 <4>[154908.727030] RBP: 004b R08: R09: <4>[154908.727921] R10: R11: R12: <4>[154908.728812] R13: 8ddc43039380 R14: 8ddc49015370 R15: 8ddc490157e0 <4>[154908.729715] FS: () GS:8df9aec0() knlGS: <4>[154908.730638] CS: 0010 DS: ES: CR0: 80050033 <4>[154908.731569] CR2: 0140 CR3: 0002dc20a005 CR4: 003706f0 <4>[154908.732526] DR0: DR1: DR2: <4>[154908.733487] DR3: DR6: fffe0ff0 DR7: 0400 <4>[154908.734453] Call Trace: <4>[154908.735411] <4>[154908.736371] target_release_cmd_kref+0xd1/0x1f0 [target_core_mod] <4>[154908.737425] transport_generic_free_cmd+0xd1/0x180 [target_core_mod] <4>[154908.738485] iscsit_free_cmd+0x53/0xd0 [iscsi_target_mod] <4>[154908.739536] iscsit_free_connection_recovery_entries+0x29d/0x320 [iscsi_target_mod]