Re: Kernel BUG: kernel NULL pointer dereference on Windows server connect/disconnect

2023-01-25 Thread Forza
There is now a patch  for this issue, which is caused by a "use-after-free" 
situation.
https://marc.info/?l=target-devel=167463560822293=2


On Wednesday, 25 January 2023 at 06:53:01 UTC+1 Forza wrote:

> Thanks for the feedback. I have posted to the target-devel mailing list 
> with some more information about distro etc. Hopefully we can get to the 
> root of this elusive issue there.
>
> On Tuesday, 24 January 2023 at 19:33:27 UTC+1 The Lee-Man wrote:
>
>> This is a mailing list for open-iscsi, the iSCSI initiator. Your issues 
>> seems to be with LIO, the in-kernel target code.
>>
>> I suggest the target development kernel mailing list: 
>> target...@vger.kernel.org
>>
>> It would help if, when you post there, you include your distribution. 
>> Also, is there any other way to reproduce it, other than using a MS 
>> initiator (so that others might reproduce it)?
>>
>> I use LIO (targetcli) targets regularly and haven't seen this issue.
>>
>> On Monday, January 23, 2023 at 12:22:49 AM UTC-8 Forza wrote:
>>
>>> Hi!
>>> I have an issue with spontaneous reboots of my ISCSI target server. It 
>>> seems to happen especially often when Windows Server 2016 clients reboot, 
>>> but I can't say for sure it happens every time either.
>>>
>>> The target is using a fileio backing store ontop of a BTRFS filesystem. 
>>> I have write-back enabled on this target (but it happens even with write 
>>> caching disabled).
>>>
>>> I am running Alpine Linux. The issue has happened since first installed 
>>> about a year ago. It is at least happeneing on kernels 5.15.60+ and 6.1.4, 
>>> 6.1.6.
>>>
>>>
>>> I managed to capture the following trace using pstore:
>>>
>>> <6>[   69.123671] RPC: Registered named UNIX socket transport module.
>>> <6>[   69.123674] RPC: Registered udp transport module.
>>> <6>[   69.123675] RPC: Registered tcp transport module.
>>> <6>[   69.123675] RPC: Registered tcp NFSv4.1 backchannel transport 
>>> module.
>>> <6>[   70.281192] NFSD: Using UMH upcall client tracking operations.
>>> <6>[   70.281199] NFSD: starting 90-second grace period (net f000)
>>> <6>[   75.683777] Rounding down aligned max_sectors from 4294967295 to 
>>> 4294967288
>>> <4>[   76.014381] ignoring deprecated emulate_dpo attribute
>>> <4>[   76.014497] ignoring deprecated emulate_fua_read attribute
>>> <4>[   76.019775] dev[4a04ffbe]: Backstore name 
>>> 'sesoco3092-export' is too long for INQUIRY_MODEL, truncating to 15 
>>> characters
>>> <3>[   76.813023] bond0: (slave igb0): speed changed to 0 on port 1
>>> <6>[   76.868842] bond0: (slave igb0): link status definitely down, 
>>> disabling slave
>>> <6>[   76.868849] bond0: active interface up!
>>> <3>[   78.912829] bond0: (slave igb1): speed changed to 0 on port 2
>>> <6>[   78.982125] bond0: (slave igb1): link status definitely down, 
>>> disabling slave
>>> <6>[   78.982149] bond0: now running without any active interface!
>>> <6>[   80.332786] igb :03:00.0 igb0: igb: igb0 NIC Link is Up 1000 
>>> Mbps Full Duplex, Flow Control: RX
>>> <6>[   80.545683] bond0: (slave igb0): link status definitely up, 1000 
>>> Mbps full duplex
>>> <6>[   80.545691] bond0: active interface up!
>>> <6>[   82.449424] igb :04:00.0 igb1: igb: igb1 NIC Link is Up 1000 
>>> Mbps Full Duplex, Flow Control: RX
>>> <6>[   82.622345] bond0: (slave igb1): link status definitely up, 1000 
>>> Mbps full duplex
>>> <4>[ 7428.832735] hrtimer: interrupt took 7712 ns
>>> <6>[ 8364.650898] ice :01:00.0 ice0: NIC Link is up 25 Gbps Full 
>>> Duplex, Requested FEC: RS-FEC, Negotiated FEC: RS-FEC, Autoneg Advertised: 
>>> Off, Autoneg Negotiated: False, Flow Control: None
>>> <6>[ 8364.651003] IPv6: ADDRCONF(NETDEV_CHANGE): ice0: link becomes ready
>>> <6>[11218.918482] wireguard: WireGuard 1.0.0 loaded. See 
>>> www.wireguard.com for information.
>>> <6>[11218.918484] wireguard: Copyright (C) 2015-2019 Jason A. Donenfeld <
>>> ja...@zx2c4.com>. All Rights Reserved.
>>> <3>[154888.236235] Did not receive response to NOPIN on CID: 1, failing 
>>> connection for I_T Nexus 
>>> iqn.1991-05.com.microsoft:srv,i,0x41370001,iqn.2022-02.com.example.srv04:srv,t,0x01
>>> <3>[154908.716136] Time2Retain timer expired for SID: 1, cleaning up 
>>> iSCSI session.
>>> <1>[154908.716177] BUG: kernel NULL pointer dereference, address: 
>>> 0140
>>> <1>[154908.717023] #PF: supervisor write access in kernel mode
>>> <1>[154908.717842] #PF: error_code(0x0002) - not-present page
>>> <6>[154908.718667] PGD 0 P4D 0 
>>> <4>[154908.719486] Oops: 0002 [#1] PREEMPT SMP PTI
>>> <4>[154908.720289] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.1.3-0-lts 
>>> #1-Alpine
>>> <4>[154908.721089] Hardware name: Supermicro Super Server/X11SCL-F, BIOS 
>>> 1.9 09/21/2022
>>> <4>[154908.721888] RIP: 0010:sbitmap_queue_clear+0x3a/0xa0
>>> <4>[154908.722667] Code: 65 48 8b 04 25 28 00 00 00 48 89 44 24 08 31 c0 
>>> 8b 4f 04 ba ff ff ff ff 89 f0 d3 e2 d3 e8 f7 d2 48 c1 e0 07 48 03 47 10 21 
>>> f2  48 0f ab 50 

Re: Kernel BUG: kernel NULL pointer dereference on Windows server connect/disconnect

2023-01-24 Thread Forza
Thanks for the feedback. I have posted to the target-devel mailing list 
with some more information about distro etc. Hopefully we can get to the 
root of this elusive issue there.

On Tuesday, 24 January 2023 at 19:33:27 UTC+1 The Lee-Man wrote:

> This is a mailing list for open-iscsi, the iSCSI initiator. Your issues 
> seems to be with LIO, the in-kernel target code.
>
> I suggest the target development kernel mailing list: 
> target...@vger.kernel.org
>
> It would help if, when you post there, you include your distribution. 
> Also, is there any other way to reproduce it, other than using a MS 
> initiator (so that others might reproduce it)?
>
> I use LIO (targetcli) targets regularly and haven't seen this issue.
>
> On Monday, January 23, 2023 at 12:22:49 AM UTC-8 Forza wrote:
>
>> Hi!
>> I have an issue with spontaneous reboots of my ISCSI target server. It 
>> seems to happen especially often when Windows Server 2016 clients reboot, 
>> but I can't say for sure it happens every time either.
>>
>> The target is using a fileio backing store ontop of a BTRFS filesystem. I 
>> have write-back enabled on this target (but it happens even with write 
>> caching disabled).
>>
>> I am running Alpine Linux. The issue has happened since first installed 
>> about a year ago. It is at least happeneing on kernels 5.15.60+ and 6.1.4, 
>> 6.1.6.
>>
>>
>> I managed to capture the following trace using pstore:
>>
>> <6>[   69.123671] RPC: Registered named UNIX socket transport module.
>> <6>[   69.123674] RPC: Registered udp transport module.
>> <6>[   69.123675] RPC: Registered tcp transport module.
>> <6>[   69.123675] RPC: Registered tcp NFSv4.1 backchannel transport 
>> module.
>> <6>[   70.281192] NFSD: Using UMH upcall client tracking operations.
>> <6>[   70.281199] NFSD: starting 90-second grace period (net f000)
>> <6>[   75.683777] Rounding down aligned max_sectors from 4294967295 to 
>> 4294967288
>> <4>[   76.014381] ignoring deprecated emulate_dpo attribute
>> <4>[   76.014497] ignoring deprecated emulate_fua_read attribute
>> <4>[   76.019775] dev[4a04ffbe]: Backstore name 
>> 'sesoco3092-export' is too long for INQUIRY_MODEL, truncating to 15 
>> characters
>> <3>[   76.813023] bond0: (slave igb0): speed changed to 0 on port 1
>> <6>[   76.868842] bond0: (slave igb0): link status definitely down, 
>> disabling slave
>> <6>[   76.868849] bond0: active interface up!
>> <3>[   78.912829] bond0: (slave igb1): speed changed to 0 on port 2
>> <6>[   78.982125] bond0: (slave igb1): link status definitely down, 
>> disabling slave
>> <6>[   78.982149] bond0: now running without any active interface!
>> <6>[   80.332786] igb :03:00.0 igb0: igb: igb0 NIC Link is Up 1000 
>> Mbps Full Duplex, Flow Control: RX
>> <6>[   80.545683] bond0: (slave igb0): link status definitely up, 1000 
>> Mbps full duplex
>> <6>[   80.545691] bond0: active interface up!
>> <6>[   82.449424] igb :04:00.0 igb1: igb: igb1 NIC Link is Up 1000 
>> Mbps Full Duplex, Flow Control: RX
>> <6>[   82.622345] bond0: (slave igb1): link status definitely up, 1000 
>> Mbps full duplex
>> <4>[ 7428.832735] hrtimer: interrupt took 7712 ns
>> <6>[ 8364.650898] ice :01:00.0 ice0: NIC Link is up 25 Gbps Full 
>> Duplex, Requested FEC: RS-FEC, Negotiated FEC: RS-FEC, Autoneg Advertised: 
>> Off, Autoneg Negotiated: False, Flow Control: None
>> <6>[ 8364.651003] IPv6: ADDRCONF(NETDEV_CHANGE): ice0: link becomes ready
>> <6>[11218.918482] wireguard: WireGuard 1.0.0 loaded. See 
>> www.wireguard.com for information.
>> <6>[11218.918484] wireguard: Copyright (C) 2015-2019 Jason A. Donenfeld <
>> ja...@zx2c4.com>. All Rights Reserved.
>> <3>[154888.236235] Did not receive response to NOPIN on CID: 1, failing 
>> connection for I_T Nexus 
>> iqn.1991-05.com.microsoft:srv,i,0x41370001,iqn.2022-02.com.example.srv04:srv,t,0x01
>> <3>[154908.716136] Time2Retain timer expired for SID: 1, cleaning up 
>> iSCSI session.
>> <1>[154908.716177] BUG: kernel NULL pointer dereference, address: 
>> 0140
>> <1>[154908.717023] #PF: supervisor write access in kernel mode
>> <1>[154908.717842] #PF: error_code(0x0002) - not-present page
>> <6>[154908.718667] PGD 0 P4D 0 
>> <4>[154908.719486] Oops: 0002 [#1] PREEMPT SMP PTI
>> <4>[154908.720289] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.1.3-0-lts 
>> #1-Alpine
>> <4>[154908.721089] Hardware name: Supermicro Super Server/X11SCL-F, BIOS 
>> 1.9 09/21/2022
>> <4>[154908.721888] RIP: 0010:sbitmap_queue_clear+0x3a/0xa0
>> <4>[154908.722667] Code: 65 48 8b 04 25 28 00 00 00 48 89 44 24 08 31 c0 
>> 8b 4f 04 ba ff ff ff ff 89 f0 d3 e2 d3 e8 f7 d2 48 c1 e0 07 48 03 47 10 21 
>> f2  48 0f ab 50 40 c7 44 24 04 01 00 00 00 48 8d 74 24 04 48 89 df
>> <4>[154908.724376] RSP: 0018:a6e43d48 EFLAGS: 00010202
>> <4>[154908.725249] RAX: 0100 RBX: 8ddc43039428 RCX: 
>> 0005
>> <4>[154908.726137] RDX: 000b RSI: 004b RDI: 
>> 8ddc43039428
>> 

Re: Kernel BUG: kernel NULL pointer dereference on Windows server connect/disconnect

2023-01-24 Thread The Lee-Man
This is a mailing list for open-iscsi, the iSCSI initiator. Your issues 
seems to be with LIO, the in-kernel target code.

I suggest the target development kernel mailing list: 
target-de...@vger.kernel.org

It would help if, when you post there, you include your distribution. Also, 
is there any other way to reproduce it, other than using a MS initiator (so 
that others might reproduce it)?

I use LIO (targetcli) targets regularly and haven't seen this issue.

On Monday, January 23, 2023 at 12:22:49 AM UTC-8 Forza wrote:

> Hi!
> I have an issue with spontaneous reboots of my ISCSI target server. It 
> seems to happen especially often when Windows Server 2016 clients reboot, 
> but I can't say for sure it happens every time either.
>
> The target is using a fileio backing store ontop of a BTRFS filesystem. I 
> have write-back enabled on this target (but it happens even with write 
> caching disabled).
>
> I am running Alpine Linux. The issue has happened since first installed 
> about a year ago. It is at least happeneing on kernels 5.15.60+ and 6.1.4, 
> 6.1.6.
>
>
> I managed to capture the following trace using pstore:
>
> <6>[   69.123671] RPC: Registered named UNIX socket transport module.
> <6>[   69.123674] RPC: Registered udp transport module.
> <6>[   69.123675] RPC: Registered tcp transport module.
> <6>[   69.123675] RPC: Registered tcp NFSv4.1 backchannel transport module.
> <6>[   70.281192] NFSD: Using UMH upcall client tracking operations.
> <6>[   70.281199] NFSD: starting 90-second grace period (net f000)
> <6>[   75.683777] Rounding down aligned max_sectors from 4294967295 to 
> 4294967288
> <4>[   76.014381] ignoring deprecated emulate_dpo attribute
> <4>[   76.014497] ignoring deprecated emulate_fua_read attribute
> <4>[   76.019775] dev[4a04ffbe]: Backstore name 
> 'sesoco3092-export' is too long for INQUIRY_MODEL, truncating to 15 
> characters
> <3>[   76.813023] bond0: (slave igb0): speed changed to 0 on port 1
> <6>[   76.868842] bond0: (slave igb0): link status definitely down, 
> disabling slave
> <6>[   76.868849] bond0: active interface up!
> <3>[   78.912829] bond0: (slave igb1): speed changed to 0 on port 2
> <6>[   78.982125] bond0: (slave igb1): link status definitely down, 
> disabling slave
> <6>[   78.982149] bond0: now running without any active interface!
> <6>[   80.332786] igb :03:00.0 igb0: igb: igb0 NIC Link is Up 1000 
> Mbps Full Duplex, Flow Control: RX
> <6>[   80.545683] bond0: (slave igb0): link status definitely up, 1000 
> Mbps full duplex
> <6>[   80.545691] bond0: active interface up!
> <6>[   82.449424] igb :04:00.0 igb1: igb: igb1 NIC Link is Up 1000 
> Mbps Full Duplex, Flow Control: RX
> <6>[   82.622345] bond0: (slave igb1): link status definitely up, 1000 
> Mbps full duplex
> <4>[ 7428.832735] hrtimer: interrupt took 7712 ns
> <6>[ 8364.650898] ice :01:00.0 ice0: NIC Link is up 25 Gbps Full 
> Duplex, Requested FEC: RS-FEC, Negotiated FEC: RS-FEC, Autoneg Advertised: 
> Off, Autoneg Negotiated: False, Flow Control: None
> <6>[ 8364.651003] IPv6: ADDRCONF(NETDEV_CHANGE): ice0: link becomes ready
> <6>[11218.918482] wireguard: WireGuard 1.0.0 loaded. See www.wireguard.com 
> for information.
> <6>[11218.918484] wireguard: Copyright (C) 2015-2019 Jason A. Donenfeld <
> ja...@zx2c4.com>. All Rights Reserved.
> <3>[154888.236235] Did not receive response to NOPIN on CID: 1, failing 
> connection for I_T Nexus 
> iqn.1991-05.com.microsoft:srv,i,0x41370001,iqn.2022-02.com.example.srv04:srv,t,0x01
> <3>[154908.716136] Time2Retain timer expired for SID: 1, cleaning up iSCSI 
> session.
> <1>[154908.716177] BUG: kernel NULL pointer dereference, address: 
> 0140
> <1>[154908.717023] #PF: supervisor write access in kernel mode
> <1>[154908.717842] #PF: error_code(0x0002) - not-present page
> <6>[154908.718667] PGD 0 P4D 0 
> <4>[154908.719486] Oops: 0002 [#1] PREEMPT SMP PTI
> <4>[154908.720289] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.1.3-0-lts 
> #1-Alpine
> <4>[154908.721089] Hardware name: Supermicro Super Server/X11SCL-F, BIOS 
> 1.9 09/21/2022
> <4>[154908.721888] RIP: 0010:sbitmap_queue_clear+0x3a/0xa0
> <4>[154908.722667] Code: 65 48 8b 04 25 28 00 00 00 48 89 44 24 08 31 c0 
> 8b 4f 04 ba ff ff ff ff 89 f0 d3 e2 d3 e8 f7 d2 48 c1 e0 07 48 03 47 10 21 
> f2  48 0f ab 50 40 c7 44 24 04 01 00 00 00 48 8d 74 24 04 48 89 df
> <4>[154908.724376] RSP: 0018:a6e43d48 EFLAGS: 00010202
> <4>[154908.725249] RAX: 0100 RBX: 8ddc43039428 RCX: 
> 0005
> <4>[154908.726137] RDX: 000b RSI: 004b RDI: 
> 8ddc43039428
> <4>[154908.727030] RBP: 004b R08:  R09: 
> 
> <4>[154908.727921] R10:  R11:  R12: 
> 
> <4>[154908.728812] R13: 8ddc43039380 R14: 8ddc49015370 R15: 
> 8ddc490157e0
> <4>[154908.729715] FS:  () GS:8df9aec0() 
> 

Kernel BUG: kernel NULL pointer dereference on Windows server connect/disconnect

2023-01-23 Thread Forza
Hi!
I have an issue with spontaneous reboots of my ISCSI target server. It 
seems to happen especially often when Windows Server 2016 clients reboot, 
but I can't say for sure it happens every time either.

The target is using a fileio backing store ontop of a BTRFS filesystem. I 
have write-back enabled on this target (but it happens even with write 
caching disabled).

I am running Alpine Linux. The issue has happened since first installed 
about a year ago. It is at least happeneing on kernels 5.15.60+ and 6.1.4, 
6.1.6.


I managed to capture the following trace using pstore:

<6>[   69.123671] RPC: Registered named UNIX socket transport module.
<6>[   69.123674] RPC: Registered udp transport module.
<6>[   69.123675] RPC: Registered tcp transport module.
<6>[   69.123675] RPC: Registered tcp NFSv4.1 backchannel transport module.
<6>[   70.281192] NFSD: Using UMH upcall client tracking operations.
<6>[   70.281199] NFSD: starting 90-second grace period (net f000)
<6>[   75.683777] Rounding down aligned max_sectors from 4294967295 to 
4294967288
<4>[   76.014381] ignoring deprecated emulate_dpo attribute
<4>[   76.014497] ignoring deprecated emulate_fua_read attribute
<4>[   76.019775] dev[4a04ffbe]: Backstore name 'sesoco3092-export' 
is too long for INQUIRY_MODEL, truncating to 15 characters
<3>[   76.813023] bond0: (slave igb0): speed changed to 0 on port 1
<6>[   76.868842] bond0: (slave igb0): link status definitely down, 
disabling slave
<6>[   76.868849] bond0: active interface up!
<3>[   78.912829] bond0: (slave igb1): speed changed to 0 on port 2
<6>[   78.982125] bond0: (slave igb1): link status definitely down, 
disabling slave
<6>[   78.982149] bond0: now running without any active interface!
<6>[   80.332786] igb :03:00.0 igb0: igb: igb0 NIC Link is Up 1000 Mbps 
Full Duplex, Flow Control: RX
<6>[   80.545683] bond0: (slave igb0): link status definitely up, 1000 Mbps 
full duplex
<6>[   80.545691] bond0: active interface up!
<6>[   82.449424] igb :04:00.0 igb1: igb: igb1 NIC Link is Up 1000 Mbps 
Full Duplex, Flow Control: RX
<6>[   82.622345] bond0: (slave igb1): link status definitely up, 1000 Mbps 
full duplex
<4>[ 7428.832735] hrtimer: interrupt took 7712 ns
<6>[ 8364.650898] ice :01:00.0 ice0: NIC Link is up 25 Gbps Full 
Duplex, Requested FEC: RS-FEC, Negotiated FEC: RS-FEC, Autoneg Advertised: 
Off, Autoneg Negotiated: False, Flow Control: None
<6>[ 8364.651003] IPv6: ADDRCONF(NETDEV_CHANGE): ice0: link becomes ready
<6>[11218.918482] wireguard: WireGuard 1.0.0 loaded. See www.wireguard.com 
for information.
<6>[11218.918484] wireguard: Copyright (C) 2015-2019 Jason A. Donenfeld 
. All Rights Reserved.
<3>[154888.236235] Did not receive response to NOPIN on CID: 1, failing 
connection for I_T Nexus 
iqn.1991-05.com.microsoft:srv,i,0x41370001,iqn.2022-02.com.example.srv04:srv,t,0x01
<3>[154908.716136] Time2Retain timer expired for SID: 1, cleaning up iSCSI 
session.
<1>[154908.716177] BUG: kernel NULL pointer dereference, address: 
0140
<1>[154908.717023] #PF: supervisor write access in kernel mode
<1>[154908.717842] #PF: error_code(0x0002) - not-present page
<6>[154908.718667] PGD 0 P4D 0 
<4>[154908.719486] Oops: 0002 [#1] PREEMPT SMP PTI
<4>[154908.720289] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.1.3-0-lts 
#1-Alpine
<4>[154908.721089] Hardware name: Supermicro Super Server/X11SCL-F, BIOS 
1.9 09/21/2022
<4>[154908.721888] RIP: 0010:sbitmap_queue_clear+0x3a/0xa0
<4>[154908.722667] Code: 65 48 8b 04 25 28 00 00 00 48 89 44 24 08 31 c0 8b 
4f 04 ba ff ff ff ff 89 f0 d3 e2 d3 e8 f7 d2 48 c1 e0 07 48 03 47 10 21 f2 
 48 0f ab 50 40 c7 44 24 04 01 00 00 00 48 8d 74 24 04 48 89 df
<4>[154908.724376] RSP: 0018:a6e43d48 EFLAGS: 00010202
<4>[154908.725249] RAX: 0100 RBX: 8ddc43039428 RCX: 
0005
<4>[154908.726137] RDX: 000b RSI: 004b RDI: 
8ddc43039428
<4>[154908.727030] RBP: 004b R08:  R09: 

<4>[154908.727921] R10:  R11:  R12: 

<4>[154908.728812] R13: 8ddc43039380 R14: 8ddc49015370 R15: 
8ddc490157e0
<4>[154908.729715] FS:  () GS:8df9aec0() 
knlGS:
<4>[154908.730638] CS:  0010 DS:  ES:  CR0: 80050033
<4>[154908.731569] CR2: 0140 CR3: 0002dc20a005 CR4: 
003706f0
<4>[154908.732526] DR0:  DR1:  DR2: 

<4>[154908.733487] DR3:  DR6: fffe0ff0 DR7: 
0400
<4>[154908.734453] Call Trace:
<4>[154908.735411]  
<4>[154908.736371]  target_release_cmd_kref+0xd1/0x1f0 [target_core_mod]
<4>[154908.737425]  transport_generic_free_cmd+0xd1/0x180 [target_core_mod]
<4>[154908.738485]  iscsit_free_cmd+0x53/0xd0 [iscsi_target_mod]
<4>[154908.739536]  iscsit_free_connection_recovery_entries+0x29d/0x320 
[iscsi_target_mod]