Re: [PATCH] libiscsi: add lock around task lists to fix list corruption regression
Hi again, Chris and others! We're hitting this again on 4.14.32. Looks like the path iscsi_queuecommand -> prepd_reject/prepd_fault results in iscsi_complete_task getting called with frwd_lock held instead of the requisite back_lock which might explain the task list corruption under heavy loads. An alternate possibly problematic situation seems to be fail_mgmt_tasks which calls iscsi_complete_task while holding frwd_lock from iscsi_start_session_recovery. See also: https://github.com/open-iscsi/open-iscsi/issues/56 Attached an ugly attempt at ensuring back_lock is held instead. - Ilkka On 5 March 2017 at 22:49, Ilkka Sovantowrote: > Hi! > > Was running pretty nicely for a week or so, until we got this one: > > [534630.679965] BUG: unable to handle kernel at 0078 > [534630.684035] IP: [] iscsi_xmit_task+0x29/0xc0 > [534630.685724] PGD a1fcd3067 > [534630.687346] Oops: 0002 [#1] > [534630.688846] Modules linked in: > [534630.690348] CPU: 17 PID: 7988 Comm: kworker/u56:6 Tainted: GW > [534630.693327] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), > BIOS 41113_115728-nilsson.home.kraxel.org 04/01/2014 > [534630.696404] Workqueue: iscsi_q_8 iscsi_xmitworker > [534630.697948] task: 8800bab62f00 ti: 880c0d9d8000 task.ti: > 880c0d9d8000 > [534630.700837] RIP: 0010:[] [] > iscsi_xmit_task+0x29/0xc0 > [534630.703689] RSP: 0018:880c0d9dbdb0 EFLAGS: 00010246 > [534630.705209] RAX: ffc3 RBX: 880fe3dfda10 RCX: > dead0200 > [534630.708062] RDX: RSI: 0200 RDI: > 880fe3dfda10 > [534630.710773] RBP: 880c0d9dbdc8 R08: 880fe3881c80 R09: > > [534630.713539] R10: 880fb799f710 R11: 0260 R12: > 880fe3dfda10 > [534630.716344] R13: R14: 880fe3881c80 R15: > 880fe3dfdae0 > [534630.719143] FS: () GS:880fff42() > [534630.722001] CS: 0010 DS: ES: CR0: 80050033 > [534630.723492] CR2: 0078 CR3: 000aa8e44000 CR4: > 000406e0 > [534630.726163] Stack: > [534630.727622] 880fe3dfdaa8 880fe3dfda10 880fe3dfdac0 > 880c0d9dbe18 > [534630.730906] 8171226d 880fe3dfdad0 880fe3881c00 > 880fe3dfdab0 > [534630.767489] 880ff7d98780 880fe3dfdae0 880ffec99400 > 880fe3dfdae8 > > [534630.770543] Call Trace: > [534630.771967] [] iscsi_xmitworker+0x1dd/0x310 > [534630.773523] [] process_one_work+0x149/0x3e0 > [534630.775084] [] worker_thread+0x69/0x470 > [534630.776585] [] ? process_one_work+0x3e0/0x3e0 > [534630.778131] [] ? process_one_work+0x3e0/0x3e0 > [534630.779682] [] kthread+0xea/0x100 > [534630.781300] [] ? kthread_create_on_node+0x1a0/0x1a0 > [534630.782872] [] ret_from_fork+0x3f/0x70 > [534630.784340] [] ? kthread_create_on_node+0x1a0/0x1a0 > [534630.785885] Code: > [534630.792699] RIP [] iscsi_xmit_task+0x29/0xc0 > [534630.794266] RSP > [534630.795617] CR2: 0078 > [534630.798051] ---[ end trace 9d963f95212dfbe2 ]--- > [534630.799788] Kernel panic - not syncing: Fatal exception in interrupt > [534630.802712] Kernel Offset: disabled > [534630.804304] ---[ end Kernel panic - not syncing: Fatal exception > in interrupt > > And here's the disassembly for iscsi_xmit_task: > > 8170efa0 : > 8170efa0: e8 0b fd 52 00 call > 81c3ecb0 <__fentry__> > 8170efa5: 55 push rbp > 8170efa6: b8 c3 ff ff ff moveax,0xffc3 > 8170efab: 48 89 e5movrbp,rsp > 8170efae: 41 55 push r13 > 8170efb0: 41 54 push r12 > 8170efb2: 53 push rbx > 8170efb3: 48 8b 97 f0 00 00 00movrdx,QWORD PTR > [rdi+0xf0] > 8170efba: 4c 8b af 90 00 00 00movr13,QWORD PTR > [rdi+0x90] > 8170efc1: 83 e2 02andedx,0x2 > 8170efc4: 75 71 jne > 8170f037 > 8170efc6: 48 89 fbmovrbx,rdi > *** > 8170efc9: f0 41 ff 45 78 lock inc DWORD PTR [r13+0x78] > *** > 8170efce: 48 8b 47 10 movrax,QWORD PTR > [rdi+0x10] > 8170efd2: 48 8d b8 18 01 00 00leardi,[rax+0x118] > 8170efd9: e8 b2 d3 52 00 call > 81c3c390 <_raw_spin_unlock_bh> > 8170efde: 48 8b 43 10 movrax,QWORD PTR > [rbx+0x10] > 8170efe2: 4c 89 efmovrdi,r13 > 8170efe5: 48 8b 80 00 01 00 00movrax,QWORD PTR > [rax+0x100] > 8170efec: ff 90 98 00 00 00 call QWORD PTR [rax+0x98] > 8170eff2: 41 89 c4movr12d,eax > 8170eff5: 48 8b 43 10 mov
Re: [PATCH] libiscsi: add lock around task lists to fix list corruption regression
Hi! Was running pretty nicely for a week or so, until we got this one: [534630.679965] BUG: unable to handle kernel at 0078 [534630.684035] IP: [] iscsi_xmit_task+0x29/0xc0 [534630.685724] PGD a1fcd3067 [534630.687346] Oops: 0002 [#1] [534630.688846] Modules linked in: [534630.690348] CPU: 17 PID: 7988 Comm: kworker/u56:6 Tainted: GW [534630.693327] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 41113_115728-nilsson.home.kraxel.org 04/01/2014 [534630.696404] Workqueue: iscsi_q_8 iscsi_xmitworker [534630.697948] task: 8800bab62f00 ti: 880c0d9d8000 task.ti: 880c0d9d8000 [534630.700837] RIP: 0010:[] [] iscsi_xmit_task+0x29/0xc0 [534630.703689] RSP: 0018:880c0d9dbdb0 EFLAGS: 00010246 [534630.705209] RAX: ffc3 RBX: 880fe3dfda10 RCX: dead0200 [534630.708062] RDX: RSI: 0200 RDI: 880fe3dfda10 [534630.710773] RBP: 880c0d9dbdc8 R08: 880fe3881c80 R09: [534630.713539] R10: 880fb799f710 R11: 0260 R12: 880fe3dfda10 [534630.716344] R13: R14: 880fe3881c80 R15: 880fe3dfdae0 [534630.719143] FS: () GS:880fff42() [534630.722001] CS: 0010 DS: ES: CR0: 80050033 [534630.723492] CR2: 0078 CR3: 000aa8e44000 CR4: 000406e0 [534630.726163] Stack: [534630.727622] 880fe3dfdaa8 880fe3dfda10 880fe3dfdac0 880c0d9dbe18 [534630.730906] 8171226d 880fe3dfdad0 880fe3881c00 880fe3dfdab0 [534630.767489] 880ff7d98780 880fe3dfdae0 880ffec99400 880fe3dfdae8 [534630.770543] Call Trace: [534630.771967] [] iscsi_xmitworker+0x1dd/0x310 [534630.773523] [] process_one_work+0x149/0x3e0 [534630.775084] [] worker_thread+0x69/0x470 [534630.776585] [] ? process_one_work+0x3e0/0x3e0 [534630.778131] [] ? process_one_work+0x3e0/0x3e0 [534630.779682] [] kthread+0xea/0x100 [534630.781300] [] ? kthread_create_on_node+0x1a0/0x1a0 [534630.782872] [] ret_from_fork+0x3f/0x70 [534630.784340] [] ? kthread_create_on_node+0x1a0/0x1a0 [534630.785885] Code: [534630.792699] RIP [] iscsi_xmit_task+0x29/0xc0 [534630.794266] RSP [534630.795617] CR2: 0078 [534630.798051] ---[ end trace 9d963f95212dfbe2 ]--- [534630.799788] Kernel panic - not syncing: Fatal exception in interrupt [534630.802712] Kernel Offset: disabled [534630.804304] ---[ end Kernel panic - not syncing: Fatal exception in interrupt And here's the disassembly for iscsi_xmit_task: 8170efa0 : 8170efa0: e8 0b fd 52 00 call 81c3ecb0 <__fentry__> 8170efa5: 55 push rbp 8170efa6: b8 c3 ff ff ff moveax,0xffc3 8170efab: 48 89 e5movrbp,rsp 8170efae: 41 55 push r13 8170efb0: 41 54 push r12 8170efb2: 53 push rbx 8170efb3: 48 8b 97 f0 00 00 00movrdx,QWORD PTR [rdi+0xf0] 8170efba: 4c 8b af 90 00 00 00movr13,QWORD PTR [rdi+0x90] 8170efc1: 83 e2 02andedx,0x2 8170efc4: 75 71 jne 8170f0378170efc6: 48 89 fbmovrbx,rdi *** 8170efc9: f0 41 ff 45 78 lock inc DWORD PTR [r13+0x78] *** 8170efce: 48 8b 47 10 movrax,QWORD PTR [rdi+0x10] 8170efd2: 48 8d b8 18 01 00 00leardi,[rax+0x118] 8170efd9: e8 b2 d3 52 00 call 81c3c390 <_raw_spin_unlock_bh> 8170efde: 48 8b 43 10 movrax,QWORD PTR [rbx+0x10] 8170efe2: 4c 89 efmovrdi,r13 8170efe5: 48 8b 80 00 01 00 00movrax,QWORD PTR [rax+0x100] 8170efec: ff 90 98 00 00 00 call QWORD PTR [rax+0x98] 8170eff2: 41 89 c4movr12d,eax 8170eff5: 48 8b 43 10 movrax,QWORD PTR [rbx+0x10] 8170eff9: 48 8d b8 18 01 00 00leardi,[rax+0x118] 8170f000: e8 9b d4 52 00 call 81c3c4a0 <_raw_spin_lock_bh> 8170f005: 45 85 e4test r12d,r12d 8170f008: 74 34 je 8170f03e 8170f00a: 48 8b 43 10 movrax,QWORD PTR [rbx+0x10] 8170f00e: 48 8d b8 1c 01 00 00leardi,[rax+0x11c] 8170f015: e8 e6 d3 52 00 call 81c3c400 <_raw_spin_lock> 8170f01a: 4c 89 efmovrdi,r13 8170f01d: e8 ce f5 ff ff call 8170e5f0 <__iscsi_put_task> 8170f022: 48 8b 7b 10 movrdi,QWORD PTR [rbx+0x10] 8170f026: 48 81 c7 1c 01 00
Re: [PATCH] libiscsi: add lock around task lists to fix list corruption regression
On 02/23/2017 07:25 PM, Chris Leech wrote: > Yikes, my git-send-email settings suppressed the important CCs. Sorry! > > Guilherme and Ilkka, can you comment about your testing results or review > please? Hi Chris, thanks for looping me. Patch seems nice, I do have some points below, most nitpicks heheh Feel free to accept or not the suggestions! Also, you can add my: Reviewed-by: Guilherme G. Piccoli> > - Chris Leech > > - Original Message - >> There's a rather long standing regression from commit >> 659743b [SCSI] libiscsi: Reduce locking contention in fast path >> Perhaps worth to follow the checkpatch "rule" of citing commits here? 659743b02c41 ("[SCSI] libiscsi: Reduce locking contention in fast path") >> Depending on iSCSI target behavior, it's possible to hit the case in >> iscsi_complete_task where the task is still on a pending list >> (!list_empty(>running)). When that happens the task is removed >> from the list while holding the session back_lock, but other task list >> modification occur under the frwd_lock. That leads to linked list >> corruption and eventually a panicked system. >> >> Rather than back out the session lock split entirely, in order to try >> and keep some of the performance gains this patch adds another lock to >> maintain the task lists integrity. >> >> Major enterprise supported kernels have been backing out the lock split >> for while now, thanks to the efforts at IBM where a lab setup has the >> most reliable reproducer I've seen on this issue. This patch has been >> tested there successfully. >> >> Signed-off-by: Chris Leech Guess we're missing here: (a) Fixes: 659743b02c41 ("[SCSI] libiscsi: Reduce locking contention in fast path") (b) CC: #v3.15+ Also, would be nice to point the original reporter, if possible: Reported-by: Prashantha Subbarao >> --- >> drivers/scsi/libiscsi.c | 26 +- >> include/scsi/libiscsi.h | 1 + >> 2 files changed, 26 insertions(+), 1 deletion(-) >> >> diff --git a/drivers/scsi/libiscsi.c b/drivers/scsi/libiscsi.c >> index 834d121..acb5ef3 100644 >> --- a/drivers/scsi/libiscsi.c >> +++ b/drivers/scsi/libiscsi.c >> @@ -560,8 +560,12 @@ static void iscsi_complete_task(struct iscsi_task *task, >> int state) >> WARN_ON_ONCE(task->state == ISCSI_TASK_FREE); >> task->state = state; >> >> -if (!list_empty(>running)) >> +spin_lock_bh(>taskqueuelock); >> +if (!list_empty(>running)) { >> +WARN_ONCE(1, "iscsi_complete_task while task on list"); I'm retesting this patch right now, and again I saw this giant warning. Perhaps worth to make it pr_debug()? So, it can be enabled as user desire and don't alarm all users too much. Thanks, Guilherme >> list_del_init(>running); >> +} >> +spin_unlock_bh(>taskqueuelock); >> >> if (conn->task == task) >> conn->task = NULL; >> @@ -783,7 +787,9 @@ __iscsi_conn_send_pdu(struct iscsi_conn *conn, struct >> iscsi_hdr *hdr, >> if (session->tt->xmit_task(task)) >> goto free_task; >> } else { >> +spin_lock_bh(>taskqueuelock); >> list_add_tail(>running, >mgmtqueue); >> +spin_unlock_bh(>taskqueuelock); >> iscsi_conn_queue_work(conn); >> } >> >> @@ -1474,8 +1480,10 @@ void iscsi_requeue_task(struct iscsi_task *task) >> * this may be on the requeue list already if the xmit_task callout >> * is handling the r2ts while we are adding new ones >> */ >> +spin_lock_bh(>taskqueuelock); >> if (list_empty(>running)) >> list_add_tail(>running, >requeue); >> +spin_unlock_bh(>taskqueuelock); >> iscsi_conn_queue_work(conn); >> } >> EXPORT_SYMBOL_GPL(iscsi_requeue_task); >> @@ -1512,22 +1520,26 @@ static int iscsi_data_xmit(struct iscsi_conn *conn) >> * only have one nop-out as a ping from us and targets should not >> * overflow us with nop-ins >> */ >> +spin_lock_bh(>taskqueuelock); >> check_mgmt: >> while (!list_empty(>mgmtqueue)) { >> conn->task = list_entry(conn->mgmtqueue.next, >> struct iscsi_task, running); >> list_del_init(>task->running); >> +spin_unlock_bh(>taskqueuelock); >> if (iscsi_prep_mgmt_task(conn, conn->task)) { >> /* regular RX path uses back_lock */ >> spin_lock_bh(>session->back_lock); >> __iscsi_put_task(conn->task); >> spin_unlock_bh(>session->back_lock); >> conn->task = NULL; >> +spin_lock_bh(>taskqueuelock); >> continue; >> } >> rc = iscsi_xmit_task(conn); >> if (rc) >> goto done; >> +spin_lock_bh(>taskqueuelock); >> } >> >> /*
Re: [PATCH] libiscsi: add lock around task lists to fix list corruption regression
Yikes, my git-send-email settings suppressed the important CCs. Sorry! Guilherme and Ilkka, can you comment about your testing results or review please? - Chris Leech - Original Message - > There's a rather long standing regression from commit > 659743b [SCSI] libiscsi: Reduce locking contention in fast path > > Depending on iSCSI target behavior, it's possible to hit the case in > iscsi_complete_task where the task is still on a pending list > (!list_empty(>running)). When that happens the task is removed > from the list while holding the session back_lock, but other task list > modification occur under the frwd_lock. That leads to linked list > corruption and eventually a panicked system. > > Rather than back out the session lock split entirely, in order to try > and keep some of the performance gains this patch adds another lock to > maintain the task lists integrity. > > Major enterprise supported kernels have been backing out the lock split > for while now, thanks to the efforts at IBM where a lab setup has the > most reliable reproducer I've seen on this issue. This patch has been > tested there successfully. > > Signed-off-by: Chris Leech> --- > drivers/scsi/libiscsi.c | 26 +- > include/scsi/libiscsi.h | 1 + > 2 files changed, 26 insertions(+), 1 deletion(-) > > diff --git a/drivers/scsi/libiscsi.c b/drivers/scsi/libiscsi.c > index 834d121..acb5ef3 100644 > --- a/drivers/scsi/libiscsi.c > +++ b/drivers/scsi/libiscsi.c > @@ -560,8 +560,12 @@ static void iscsi_complete_task(struct iscsi_task *task, > int state) > WARN_ON_ONCE(task->state == ISCSI_TASK_FREE); > task->state = state; > > - if (!list_empty(>running)) > + spin_lock_bh(>taskqueuelock); > + if (!list_empty(>running)) { > + WARN_ONCE(1, "iscsi_complete_task while task on list"); > list_del_init(>running); > + } > + spin_unlock_bh(>taskqueuelock); > > if (conn->task == task) > conn->task = NULL; > @@ -783,7 +787,9 @@ __iscsi_conn_send_pdu(struct iscsi_conn *conn, struct > iscsi_hdr *hdr, > if (session->tt->xmit_task(task)) > goto free_task; > } else { > + spin_lock_bh(>taskqueuelock); > list_add_tail(>running, >mgmtqueue); > + spin_unlock_bh(>taskqueuelock); > iscsi_conn_queue_work(conn); > } > > @@ -1474,8 +1480,10 @@ void iscsi_requeue_task(struct iscsi_task *task) >* this may be on the requeue list already if the xmit_task callout >* is handling the r2ts while we are adding new ones >*/ > + spin_lock_bh(>taskqueuelock); > if (list_empty(>running)) > list_add_tail(>running, >requeue); > + spin_unlock_bh(>taskqueuelock); > iscsi_conn_queue_work(conn); > } > EXPORT_SYMBOL_GPL(iscsi_requeue_task); > @@ -1512,22 +1520,26 @@ static int iscsi_data_xmit(struct iscsi_conn *conn) >* only have one nop-out as a ping from us and targets should not >* overflow us with nop-ins >*/ > + spin_lock_bh(>taskqueuelock); > check_mgmt: > while (!list_empty(>mgmtqueue)) { > conn->task = list_entry(conn->mgmtqueue.next, >struct iscsi_task, running); > list_del_init(>task->running); > + spin_unlock_bh(>taskqueuelock); > if (iscsi_prep_mgmt_task(conn, conn->task)) { > /* regular RX path uses back_lock */ > spin_lock_bh(>session->back_lock); > __iscsi_put_task(conn->task); > spin_unlock_bh(>session->back_lock); > conn->task = NULL; > + spin_lock_bh(>taskqueuelock); > continue; > } > rc = iscsi_xmit_task(conn); > if (rc) > goto done; > + spin_lock_bh(>taskqueuelock); > } > > /* process pending command queue */ > @@ -1535,19 +1547,24 @@ static int iscsi_data_xmit(struct iscsi_conn *conn) > conn->task = list_entry(conn->cmdqueue.next, struct iscsi_task, > running); > list_del_init(>task->running); > + spin_unlock_bh(>taskqueuelock); > if (conn->session->state == ISCSI_STATE_LOGGING_OUT) { > fail_scsi_task(conn->task, DID_IMM_RETRY); > + spin_lock_bh(>taskqueuelock); > continue; > } > rc = iscsi_prep_scsi_cmd_pdu(conn->task); > if (rc) { > if (rc == -ENOMEM || rc == -EACCES) { > + spin_lock_bh(>taskqueuelock); > list_add_tail(>task->running, > >cmdqueue); >
Re: [PATCH] libiscsi: add lock around task lists to fix list corruption regression
> "Chris" == Chris Leechwrites: Chris> There's a rather long standing regression from commit 659743b Chris> [SCSI] libiscsi: Reduce locking contention in fast path Chris> Depending on iSCSI target behavior, it's possible to hit the case Chris> in iscsi_complete_task where the task is still on a pending list Chris> (!list_empty(>running)). When that happens the task is Chris> removed from the list while holding the session back_lock, but Chris> other task list modification occur under the frwd_lock. That Chris> leads to linked list corruption and eventually a panicked system. Chris> Rather than back out the session lock split entirely, in order to Chris> try and keep some of the performance gains this patch adds Chris> another lock to maintain the task lists integrity. Chris> Major enterprise supported kernels have been backing out the lock Chris> split for while now, thanks to the efforts at IBM where a lab Chris> setup has the most reliable reproducer I've seen on this issue. Chris> This patch has been tested there successfully. Reviews, please! -- Martin K. Petersen Oracle Linux Engineering