[BUG]: Null pointer exception from parallel calls to iscsi_stop_conn

[email protected] Tue, 16 Jul 2024 12:25:45 -0700

Hi. I reviewed a kdump generated by a NULL pointer exception during 
termination of an iSCSI session. In this instance, the session was being 
terminated because of a 'Target-Not-Found' error from the target during login.


The system is running SLES15 SP4 (kernel v5.14.21).
 
crash> bt
PID: 61755  TASK: ffff88ae57e4c380  CPU: 6   COMMAND: "kworker/u40:3"
 #0 [ffffc90006b6fae8] machine_kexec at ffffffff8106af4e
 #1 [ffffc90006b6fb38] __crash_kexec at ffffffff81168dce
 #2 [ffffc90006b6fc00] panic at ffffffff8191aa0f
 #3 [ffffc90006b6fc88] oops_end at ffffffff8102e3dd
 #4 [ffffc90006b6fca8] page_fault_oops at ffffffff8107b6fb
 #5 [ffffc90006b6fd28] exc_page_fault at ffffffff81923610
 #6 [ffffc90006b6fd50] asm_exc_page_fault at ffffffff81a00f39
    [exception RIP: iscsi_sw_tcp_release_conn+111]
    RIP: ffffffffc0c8243f  RSP: ffffc90006b6fe08  RFLAGS: 00010202
    RAX: 0000000000000000  RBX: ffff8881cb225388  RCX: 0000000000000001
    RDX: ffff88adbf660900  RSI: ffffffff81f7cb84  RDI: ffff88adbf660980
    RBP: ffff888ad68cd140   R8: 0000000000000001   R9: 0000000000000001
    R10: 0000000000000000  R11: 00000000000001d2  R12: ffff8881cb225388
    R13: ffff8881cb2256a8  R14: ffff8881cb2256a8  R15: ffff888105d8ca05
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 #7 [ffffc90006b6fe38] iscsi_sw_tcp_conn_stop at ffffffffc0c825fd [iscsi_tcp]
 #8 [ffffc90006b6fe58] iscsi_stop_conn at ffffffffc0f276f3 [scsi_transport_iscsi]
 #9 [ffffc90006b6fe78] iscsi_cleanup_conn_work_fn at ffffffffc0f277f8 [scsi_transport_iscsi]
#10 [ffffc90006b6fea0] process_one_work at ffffffff810b5766
#11 [ffffc90006b6fed8] worker_thread at ffffffff810b595d
#12 [ffffc90006b6ff10] kthread at ffffffff810bdb63
#13 [ffffc90006b6ff50] ret_from_fork at ffffffff8100204f

Based on code review and journal logs, iscsid detects the login error and 
initiates a TERM stop from user space. In parallel, the kernel driver 
detects a socket error and initiates a RECOVERY stop on the connection.  

*Initiated by iscsid*

iscsi_recv_login_rsp ->
  iscsi_login_eh ->
    session_conn_shutdown ->
      kstop_conn ->
        iscsi_if_transport_conn ->
          iscsi_if_stop_conn ->
            iscsi_stop_conn(conn, STOP_CONN_TERM)

*Initiated by error on TCP socket*

iscsi_sw_sk_state_check ->
  iscsi_conn_failure ->
    iscsi_conn_error_event ->
      queue_work(iscsi_conn_cleanup_workq, &conn->cleanup_work);
      .
      .
      iscsi_cleanup_conn_work_fn ->
        iscsi_stop_conn(conn, STOP_CONN_RECOVER);

The NULL pointer exception occurred in the *iscsi_stop_conn* call initiated 
from the cleanup worker thread. Both *iscsi_sw_tcp_conn_stop* and 
*iscsi_sw_tcp_release_conn* check for a NULL sock pointer in the connection, 
but *iscsi_sw_tcp_conn_restore_callbacks*, which is called from 
*iscsi_sw_tcp_release_conn*, does not. This leaves a small window in which 
the other *iscsi_stop_conn* call, running in parallel, can set the 
connection's socket pointer to NULL, resulting in this exception.
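
To make the window concrete, here is a rough user-space model of the check-then-use pattern described above. It is not the kernel code: `model_conn`, `model_sock`, the `interleave` hook and all function names are hypothetical stand-ins, and a return value of -1 marks the point where the real code would dereference NULL.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel structures, not the real definitions. */
struct model_sock {
    int callbacks_restored;
};

struct model_conn {
    struct model_sock *sock;
    /* Hook that runs where a parallel stop call could interleave. */
    void (*interleave)(struct model_conn *conn);
};

/* Models the other iscsi_stop_conn caller finishing first and clearing sock. */
static void other_stop_clears_sock(struct model_conn *conn)
{
    conn->sock = NULL;
}

/*
 * Models the restore-callbacks helper: it reads conn->sock again.
 * The real helper has no NULL check; the -1 here stands in for the oops.
 */
static int model_restore_callbacks(struct model_conn *conn)
{
    struct model_sock *sock = conn->sock;

    if (sock == NULL)
        return -1;
    sock->callbacks_restored = 1;
    return 0;
}

/* Models the release path: check sock, then call a helper that re-reads it. */
static int model_release_conn(struct model_conn *conn)
{
    assert(conn != NULL);

    if (conn->sock == NULL)
        return 0;                   /* already released, nothing to do */

    if (conn->interleave)
        conn->interleave(conn);     /* window between the check and the use */

    if (model_restore_callbacks(conn) < 0)
        return -1;                  /* the crash seen in the backtrace */

    conn->sock = NULL;
    return 1;                       /* released normally */
}
```

With no interleaving the model releases the socket once and later calls are no-ops. With `other_stop_clears_sock` installed as the hook, the initial check passes but the helper sees NULL, which is the sequence suspected here.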

It would be simple enough to add a check for a NULL socket pointer in 
*iscsi_sw_tcp_conn_restore_callbacks*, but I'm not convinced that is the 
correct solution. It looks to me like the resulting state of the session 
and connection would differ depending on which of the two calls executes 
first. If the cleanup thread successfully stops the connection with 
STOP_CONN_RECOVER, it sets the connection's socket pointer to NULL, which 
short-circuits the STOP_CONN_TERM stop from iscsid and keeps it from 
modifying the connection/session states.

Also, I noticed that the cleanup thread calls iscsi_stop_conn while holding 
ep_mutex, whereas the call made on behalf of iscsid does not. Should the 
iscsid-initiated call to iscsi_stop_conn also be made while holding ep_mutex?
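
As a rough illustration of what serializing both callers on one lock would look like, here is a minimal user-space sketch using a pthread mutex. Everything in it (`locked_conn`, `locked_stop_conn`, the `MODEL_STOP_*` flags) is hypothetical and only models the idea; it does not show how ep_mutex is actually used in the kernel, nor whether taking it on the iscsid path is safe there.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical flags standing in for the RECOVER and TERM stop requests. */
enum model_stop_flag {
    MODEL_STOP_RECOVER = 1,
    MODEL_STOP_TERM = 2
};

/* Hypothetical connection model, not the kernel's struct iscsi_cls_conn. */
struct locked_conn {
    pthread_mutex_t lock;   /* plays the role asked about for ep_mutex */
    int sock_open;          /* 1 while the model socket is attached */
    int stopped_with;       /* flag of the call that did the teardown, 0 if none */
    int teardown_count;     /* how many times teardown actually ran */
};

/*
 * Both stop paths call this. The check and the teardown happen under the
 * same lock, so the second caller sees sock_open == 0 and backs off.
 * Returns 1 if this call performed the teardown, 0 if it was a no-op.
 */
static int locked_stop_conn(struct locked_conn *conn, enum model_stop_flag flag)
{
    int did_teardown = 0;

    assert(conn != NULL);

    pthread_mutex_lock(&conn->lock);
    if (conn->sock_open) {
        conn->sock_open = 0;
        conn->stopped_with = flag;
        conn->teardown_count++;
        did_teardown = 1;
    }
    pthread_mutex_unlock(&conn->lock);

    return did_teardown;
}
```

The lock closes the check-then-use window, but the sketch also shows the ordering concern raised above: whichever caller takes the lock first decides `stopped_with`, and the later request is silently dropped.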

Thanks in advance, 
Adam 
