Re: [PATCH] libiscsi: add lock around task lists to fix list corruption regression

2018-05-09 Thread Ilkka Sovanto
Hi again, Chris and others!

We're hitting this again on 4.14.32.

It looks like the path iscsi_queuecommand -> prepd_reject/prepd_fault
results in iscsi_complete_task getting called with frwd_lock held
instead of the requisite back_lock, which might explain the task list
corruption under heavy load.

Another possibly problematic path is fail_mgmt_tasks, which calls
iscsi_complete_task while still holding the frwd_lock taken in
iscsi_start_session_recovery.
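
For context, the two call sites look roughly like this (paraphrased from
the 4.14-era drivers/scsi/libiscsi.c; treat the exact labels and state
arguments as approximations rather than verbatim source):

/* Paraphrased sketch, not verbatim kernel source.  iscsi_complete_task()
 * is documented as requiring session->back_lock, yet both paths below
 * reach it while session->frwd_lock is held. */

int iscsi_queuecommand(struct Scsi_Host *host, struct scsi_cmnd *sc)
{
	struct iscsi_session *session = ...;
	struct iscsi_task *task = ...;

	spin_lock_bh(&session->frwd_lock);
	...
prepd_reject:
	/* frwd_lock still held here */
	iscsi_complete_task(task, ISCSI_TASK_REQUEUE_SCSIQ);
reject:
	spin_unlock_bh(&session->frwd_lock);
	return SCSI_MLQUEUE_TARGET_BUSY;

prepd_fault:
	/* frwd_lock still held here as well */
	iscsi_complete_task(task, ISCSI_TASK_REQUEUE_SCSIQ);
fault:
	spin_unlock_bh(&session->frwd_lock);
	...
}

/* Similarly, iscsi_start_session_recovery() takes frwd_lock and then
 * calls fail_mgmt_tasks(), which also ends up in iscsi_complete_task(). */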

See also: https://github.com/open-iscsi/open-iscsi/issues/56

Attached is an ugly attempt at ensuring back_lock is held instead.
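
For the list, the general shape of the attempt is roughly the following
(an illustration of the idea only, not the actual attached patch): drop
frwd_lock and take back_lock around the completion, the way the regular
RX path already does.

/* Illustration only -- not the attached patch.  Complete the task under
 * back_lock rather than frwd_lock in the queuecommand error paths. */
prepd_reject:
	spin_unlock_bh(&session->frwd_lock);
	spin_lock_bh(&session->back_lock);
	iscsi_complete_task(task, ISCSI_TASK_REQUEUE_SCSIQ);
	spin_unlock_bh(&session->back_lock);
	spin_lock_bh(&session->frwd_lock);
	/* fall through to reject:, which drops frwd_lock as before */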

 - Ilkka

On 5 March 2017 at 22:49, Ilkka Sovanto  wrote:
> Hi!
>
> Was running pretty nicely for a week or so, until we got this one:
>
> [534630.679965] BUG: unable to handle kernel NULL pointer dereference at 0078
> [534630.684035] IP: [] iscsi_xmit_task+0x29/0xc0
> [534630.685724] PGD a1fcd3067
> [534630.687346] Oops: 0002 [#1]
> [534630.688846] Modules linked in:
> [534630.690348] CPU: 17 PID: 7988 Comm: kworker/u56:6 Tainted: GW
> [534630.693327] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
> BIOS 41113_115728-nilsson.home.kraxel.org 04/01/2014
> [534630.696404] Workqueue: iscsi_q_8 iscsi_xmitworker
> [534630.697948] task: 8800bab62f00 ti: 880c0d9d8000 task.ti:
> 880c0d9d8000
> [534630.700837] RIP: 0010:[] []
> iscsi_xmit_task+0x29/0xc0
> [534630.703689] RSP: 0018:880c0d9dbdb0  EFLAGS: 00010246
> [534630.705209] RAX: ffc3 RBX: 880fe3dfda10 RCX:
> dead0200
> [534630.708062] RDX:  RSI: 0200 RDI:
> 880fe3dfda10
> [534630.710773] RBP: 880c0d9dbdc8 R08: 880fe3881c80 R09:
> 
> [534630.713539] R10: 880fb799f710 R11: 0260 R12:
> 880fe3dfda10
> [534630.716344] R13:  R14: 880fe3881c80 R15:
> 880fe3dfdae0
> [534630.719143] FS:  () GS:880fff42()
> [534630.722001] CS:  0010 DS:  ES:  CR0: 80050033
> [534630.723492] CR2: 0078 CR3: 000aa8e44000 CR4:
> 000406e0
> [534630.726163] Stack:
> [534630.727622]  880fe3dfdaa8 880fe3dfda10 880fe3dfdac0
> 880c0d9dbe18
> [534630.730906]  8171226d 880fe3dfdad0 880fe3881c00
> 880fe3dfdab0
> [534630.767489]  880ff7d98780 880fe3dfdae0 880ffec99400
> 880fe3dfdae8
>
> [534630.770543] Call Trace:
> [534630.771967]  [] iscsi_xmitworker+0x1dd/0x310
> [534630.773523]  [] process_one_work+0x149/0x3e0
> [534630.775084]  [] worker_thread+0x69/0x470
> [534630.776585]  [] ? process_one_work+0x3e0/0x3e0
> [534630.778131]  [] ? process_one_work+0x3e0/0x3e0
> [534630.779682]  [] kthread+0xea/0x100
> [534630.781300]  [] ? kthread_create_on_node+0x1a0/0x1a0
> [534630.782872]  [] ret_from_fork+0x3f/0x70
> [534630.784340]  [] ? kthread_create_on_node+0x1a0/0x1a0
> [534630.785885] Code:
> [534630.792699] RIP [] iscsi_xmit_task+0x29/0xc0
> [534630.794266]  RSP 
> [534630.795617] CR2: 0078
> [534630.798051] ---[ end trace 9d963f95212dfbe2 ]---
> [534630.799788] Kernel panic - not syncing: Fatal exception in interrupt
> [534630.802712] Kernel Offset: disabled
> [534630.804304] ---[ end Kernel panic - not syncing: Fatal exception
> in interrupt
>
> And here's the disassembly for iscsi_xmit_task:
>
> 8170efa0 <iscsi_xmit_task>:
> 8170efa0:   e8 0b fd 52 00          call   81c3ecb0 <__fentry__>
> 8170efa5:   55                      push   rbp
> 8170efa6:   b8 c3 ff ff ff          mov    eax,0xffc3
> 8170efab:   48 89 e5                mov    rbp,rsp
> 8170efae:   41 55                   push   r13
> 8170efb0:   41 54                   push   r12
> 8170efb2:   53                      push   rbx
> 8170efb3:   48 8b 97 f0 00 00 00    mov    rdx,QWORD PTR [rdi+0xf0]
> 8170efba:   4c 8b af 90 00 00 00    mov    r13,QWORD PTR [rdi+0x90]
> 8170efc1:   83 e2 02                and    edx,0x2
> 8170efc4:   75 71                   jne    8170f037
> 8170efc6:   48 89 fb                mov    rbx,rdi
> ***
> 8170efc9:   f0 41 ff 45 78          lock inc DWORD PTR [r13+0x78]
> ***
> 8170efce:   48 8b 47 10             mov    rax,QWORD PTR [rdi+0x10]
> 8170efd2:   48 8d b8 18 01 00 00    lea    rdi,[rax+0x118]
> 8170efd9:   e8 b2 d3 52 00          call   81c3c390 <_raw_spin_unlock_bh>
> 8170efde:   48 8b 43 10             mov    rax,QWORD PTR [rbx+0x10]
> 8170efe2:   4c 89 ef                mov    rdi,r13
> 8170efe5:   48 8b 80 00 01 00 00    mov    rax,QWORD PTR [rax+0x100]
> 8170efec:   ff 90 98 00 00 00       call   QWORD PTR [rax+0x98]
> 8170eff2:   41 89 c4                mov    r12d,eax
> 8170eff5:   48 8b 43 10 mov

Re: [PATCH] libiscsi: add lock around task lists to fix list corruption regression

2017-03-05 Thread Ilkka Sovanto
Hi!

Was running pretty nicely for a week or so, until we got this one:

[534630.679965] BUG: unable to handle kernel NULL pointer dereference at 0078
[534630.684035] IP: [] iscsi_xmit_task+0x29/0xc0
[534630.685724] PGD a1fcd3067
[534630.687346] Oops: 0002 [#1]
[534630.688846] Modules linked in:
[534630.690348] CPU: 17 PID: 7988 Comm: kworker/u56:6 Tainted: GW
[534630.693327] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
BIOS 41113_115728-nilsson.home.kraxel.org 04/01/2014
[534630.696404] Workqueue: iscsi_q_8 iscsi_xmitworker
[534630.697948] task: 8800bab62f00 ti: 880c0d9d8000 task.ti:
880c0d9d8000
[534630.700837] RIP: 0010:[] []
iscsi_xmit_task+0x29/0xc0
[534630.703689] RSP: 0018:880c0d9dbdb0  EFLAGS: 00010246
[534630.705209] RAX: ffc3 RBX: 880fe3dfda10 RCX:
dead0200
[534630.708062] RDX:  RSI: 0200 RDI:
880fe3dfda10
[534630.710773] RBP: 880c0d9dbdc8 R08: 880fe3881c80 R09:

[534630.713539] R10: 880fb799f710 R11: 0260 R12:
880fe3dfda10
[534630.716344] R13:  R14: 880fe3881c80 R15:
880fe3dfdae0
[534630.719143] FS:  () GS:880fff42()
[534630.722001] CS:  0010 DS:  ES:  CR0: 80050033
[534630.723492] CR2: 0078 CR3: 000aa8e44000 CR4:
000406e0
[534630.726163] Stack:
[534630.727622]  880fe3dfdaa8 880fe3dfda10 880fe3dfdac0
880c0d9dbe18
[534630.730906]  8171226d 880fe3dfdad0 880fe3881c00
880fe3dfdab0
[534630.767489]  880ff7d98780 880fe3dfdae0 880ffec99400
880fe3dfdae8

[534630.770543] Call Trace:
[534630.771967]  [] iscsi_xmitworker+0x1dd/0x310
[534630.773523]  [] process_one_work+0x149/0x3e0
[534630.775084]  [] worker_thread+0x69/0x470
[534630.776585]  [] ? process_one_work+0x3e0/0x3e0
[534630.778131]  [] ? process_one_work+0x3e0/0x3e0
[534630.779682]  [] kthread+0xea/0x100
[534630.781300]  [] ? kthread_create_on_node+0x1a0/0x1a0
[534630.782872]  [] ret_from_fork+0x3f/0x70
[534630.784340]  [] ? kthread_create_on_node+0x1a0/0x1a0
[534630.785885] Code:
[534630.792699] RIP [] iscsi_xmit_task+0x29/0xc0
[534630.794266]  RSP 
[534630.795617] CR2: 0078
[534630.798051] ---[ end trace 9d963f95212dfbe2 ]---
[534630.799788] Kernel panic - not syncing: Fatal exception in interrupt
[534630.802712] Kernel Offset: disabled
[534630.804304] ---[ end Kernel panic - not syncing: Fatal exception
in interrupt

And here's the disassembly for iscsi_xmit_task:

8170efa0 <iscsi_xmit_task>:
8170efa0:   e8 0b fd 52 00          call   81c3ecb0 <__fentry__>
8170efa5:   55                      push   rbp
8170efa6:   b8 c3 ff ff ff          mov    eax,0xffc3
8170efab:   48 89 e5                mov    rbp,rsp
8170efae:   41 55                   push   r13
8170efb0:   41 54                   push   r12
8170efb2:   53                      push   rbx
8170efb3:   48 8b 97 f0 00 00 00    mov    rdx,QWORD PTR [rdi+0xf0]
8170efba:   4c 8b af 90 00 00 00    mov    r13,QWORD PTR [rdi+0x90]
8170efc1:   83 e2 02                and    edx,0x2
8170efc4:   75 71                   jne    8170f037
8170efc6:   48 89 fb                mov    rbx,rdi
***
8170efc9:   f0 41 ff 45 78          lock inc DWORD PTR [r13+0x78]
***
8170efce:   48 8b 47 10             mov    rax,QWORD PTR [rdi+0x10]
8170efd2:   48 8d b8 18 01 00 00    lea    rdi,[rax+0x118]
8170efd9:   e8 b2 d3 52 00          call   81c3c390 <_raw_spin_unlock_bh>
8170efde:   48 8b 43 10             mov    rax,QWORD PTR [rbx+0x10]
8170efe2:   4c 89 ef                mov    rdi,r13
8170efe5:   48 8b 80 00 01 00 00    mov    rax,QWORD PTR [rax+0x100]
8170efec:   ff 90 98 00 00 00       call   QWORD PTR [rax+0x98]
8170eff2:   41 89 c4                mov    r12d,eax
8170eff5:   48 8b 43 10             mov    rax,QWORD PTR [rbx+0x10]
8170eff9:   48 8d b8 18 01 00 00    lea    rdi,[rax+0x118]
8170f000:   e8 9b d4 52 00          call   81c3c4a0 <_raw_spin_lock_bh>
8170f005:   45 85 e4                test   r12d,r12d
8170f008:   74 34                   je     8170f03e
8170f00a:   48 8b 43 10             mov    rax,QWORD PTR [rbx+0x10]
8170f00e:   48 8d b8 1c 01 00 00    lea    rdi,[rax+0x11c]
8170f015:   e8 e6 d3 52 00          call   81c3c400 <_raw_spin_lock>
8170f01a:   4c 89 ef                mov    rdi,r13
8170f01d:   e8 ce f5 ff ff          call   8170e5f0 <__iscsi_put_task>
8170f022:   48 8b 7b 10             mov    rdi,QWORD PTR [rbx+0x10]
8170f026:   48 81 c7 1c 01 00 

Re: [PATCH] libiscsi: add lock around task lists to fix list corruption regression

2017-02-24 Thread Guilherme G. Piccoli
On 02/23/2017 07:25 PM, Chris Leech wrote:
> Yikes, my git-send-email settings suppressed the important CCs.  Sorry!
> 
> Guilherme and Ilkka, can you comment about your testing results or review 
> please?

Hi Chris, thanks for looping me in. The patch seems nice; I do have some
points below, most of them nitpicks heheh
Feel free to accept the suggestions or not!

Also, you can add my:

Reviewed-by: Guilherme G. Piccoli 


> 
> - Chris Leech
> 
> - Original Message -
>> There's a rather long-standing regression from commit
>> 659743b [SCSI] libiscsi: Reduce locking contention in fast path
>>

Perhaps it's worth following the checkpatch "rule" of citing commits here:
659743b02c41 ("[SCSI] libiscsi: Reduce locking contention in fast path")


>> Depending on iSCSI target behavior, it's possible to hit the case in
>> iscsi_complete_task where the task is still on a pending list
>> (!list_empty(&task->running)).  When that happens the task is removed
>> from the list while holding the session back_lock, but other task list
>> modifications occur under the frwd_lock.  That leads to linked list
>> corruption and eventually a panicked system.
>>
>> Rather than back out the session lock split entirely, in order to try
>> and keep some of the performance gains, this patch adds another lock to
>> maintain the integrity of the task lists.
>>
>> Major enterprise-supported kernels have been backing out the lock split
>> for a while now, thanks to the efforts at IBM, where a lab setup has the
>> most reliable reproducer I've seen on this issue.  This patch has been
>> tested there successfully.
>>
>> Signed-off-by: Chris Leech 

I guess we're missing these here:

(a) Fixes: 659743b02c41 ("[SCSI] libiscsi: Reduce locking contention in
fast path")

(b) CC:  #v3.15+

Also, it would be nice to credit the original reporter, if possible:

Reported-by: Prashantha Subbarao 


>> ---
>>  drivers/scsi/libiscsi.c | 26 +-
>>  include/scsi/libiscsi.h |  1 +
>>  2 files changed, 26 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/scsi/libiscsi.c b/drivers/scsi/libiscsi.c
>> index 834d121..acb5ef3 100644
>> --- a/drivers/scsi/libiscsi.c
>> +++ b/drivers/scsi/libiscsi.c
>> @@ -560,8 +560,12 @@ static void iscsi_complete_task(struct iscsi_task *task,
>> int state)
>>  WARN_ON_ONCE(task->state == ISCSI_TASK_FREE);
>>  task->state = state;
>>  
>> -if (!list_empty(&task->running))
>> +spin_lock_bh(&conn->taskqueuelock);
>> +if (!list_empty(&task->running)) {
>> +WARN_ONCE(1, "iscsi_complete_task while task on list");

I'm retesting this patch right now, and again I saw this giant warning.
Perhaps it's worth making it a pr_debug()? That way it can be enabled as
the user desires and doesn't alarm all users too much.
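
Something along these lines, for instance (just a sketch; the message
text is made up):

	spin_lock_bh(&conn->taskqueuelock);
	if (!list_empty(&task->running)) {
		/* debug-level note instead of a one-shot WARN splat */
		pr_debug("iscsi_complete_task while task on list\n");
		list_del_init(&task->running);
	}
	spin_unlock_bh(&conn->taskqueuelock);

ISCSI_DBG_SESSION(conn->session, ...) would also work if a
session-scoped debug message is preferred.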

Thanks,


Guilherme


>>  list_del_init(&task->running);
>> +}
>> +spin_unlock_bh(&conn->taskqueuelock);
>>  
>>  if (conn->task == task)
>>  conn->task = NULL;
>> @@ -783,7 +787,9 @@ __iscsi_conn_send_pdu(struct iscsi_conn *conn, struct
>> iscsi_hdr *hdr,
>>  if (session->tt->xmit_task(task))
>>  goto free_task;
>>  } else {
>> +spin_lock_bh(&conn->taskqueuelock);
>>  list_add_tail(&task->running, &conn->mgmtqueue);
>> +spin_unlock_bh(&conn->taskqueuelock);
>>  iscsi_conn_queue_work(conn);
>>  }
>>  
>> @@ -1474,8 +1480,10 @@ void iscsi_requeue_task(struct iscsi_task *task)
>>   * this may be on the requeue list already if the xmit_task callout
>>   * is handling the r2ts while we are adding new ones
>>   */
>> +spin_lock_bh(&conn->taskqueuelock);
>>  if (list_empty(&task->running))
>>  list_add_tail(&task->running, &conn->requeue);
>> +spin_unlock_bh(&conn->taskqueuelock);
>>  iscsi_conn_queue_work(conn);
>>  }
>>  EXPORT_SYMBOL_GPL(iscsi_requeue_task);
>> @@ -1512,22 +1520,26 @@ static int iscsi_data_xmit(struct iscsi_conn *conn)
>>   * only have one nop-out as a ping from us and targets should not
>>   * overflow us with nop-ins
>>   */
>> +spin_lock_bh(&conn->taskqueuelock);
>> check_mgmt:
>>  while (!list_empty(&conn->mgmtqueue)) {
>>  conn->task = list_entry(conn->mgmtqueue.next,
>>   struct iscsi_task, running);
>>  list_del_init(&conn->task->running);
>> +spin_unlock_bh(&conn->taskqueuelock);
>>  if (iscsi_prep_mgmt_task(conn, conn->task)) {
>>  /* regular RX path uses back_lock */
>>  spin_lock_bh(&conn->session->back_lock);
>>  __iscsi_put_task(conn->task);
>>  spin_unlock_bh(&conn->session->back_lock);
>>  conn->task = NULL;
>> +spin_lock_bh(&conn->taskqueuelock);
>>  continue;
>>  }
>>  rc = iscsi_xmit_task(conn);
>>  if (rc)
>>  goto done;
>> +spin_lock_bh(&conn->taskqueuelock);
>>  }
>>  
>>  /* 

Re: [PATCH] libiscsi: add lock around task lists to fix list corruption regression

2017-02-23 Thread Chris Leech
Yikes, my git-send-email settings suppressed the important CCs.  Sorry!

Guilherme and Ilkka, can you comment about your testing results or review 
please?

- Chris Leech

- Original Message -
> There's a rather long-standing regression from commit
> 659743b [SCSI] libiscsi: Reduce locking contention in fast path
> 
> Depending on iSCSI target behavior, it's possible to hit the case in
> iscsi_complete_task where the task is still on a pending list
> (!list_empty(&task->running)).  When that happens the task is removed
> from the list while holding the session back_lock, but other task list
> modifications occur under the frwd_lock.  That leads to linked list
> corruption and eventually a panicked system.
> 
> Rather than back out the session lock split entirely, in order to try
> and keep some of the performance gains, this patch adds another lock to
> maintain the integrity of the task lists.
> 
> Major enterprise-supported kernels have been backing out the lock split
> for a while now, thanks to the efforts at IBM, where a lab setup has the
> most reliable reproducer I've seen on this issue.  This patch has been
> tested there successfully.
> 
> Signed-off-by: Chris Leech 
> ---
>  drivers/scsi/libiscsi.c | 26 +-
>  include/scsi/libiscsi.h |  1 +
>  2 files changed, 26 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/scsi/libiscsi.c b/drivers/scsi/libiscsi.c
> index 834d121..acb5ef3 100644
> --- a/drivers/scsi/libiscsi.c
> +++ b/drivers/scsi/libiscsi.c
> @@ -560,8 +560,12 @@ static void iscsi_complete_task(struct iscsi_task *task,
> int state)
>   WARN_ON_ONCE(task->state == ISCSI_TASK_FREE);
>   task->state = state;
>  
> - if (!list_empty(&task->running))
> + spin_lock_bh(&conn->taskqueuelock);
> + if (!list_empty(&task->running)) {
> + WARN_ONCE(1, "iscsi_complete_task while task on list");
>   list_del_init(&task->running);
> + }
> + spin_unlock_bh(&conn->taskqueuelock);
>  
>   if (conn->task == task)
>   conn->task = NULL;
> @@ -783,7 +787,9 @@ __iscsi_conn_send_pdu(struct iscsi_conn *conn, struct
> iscsi_hdr *hdr,
>   if (session->tt->xmit_task(task))
>   goto free_task;
>   } else {
> + spin_lock_bh(&conn->taskqueuelock);
>   list_add_tail(&task->running, &conn->mgmtqueue);
> + spin_unlock_bh(&conn->taskqueuelock);
>   iscsi_conn_queue_work(conn);
>   }
>  
> @@ -1474,8 +1480,10 @@ void iscsi_requeue_task(struct iscsi_task *task)
>* this may be on the requeue list already if the xmit_task callout
>* is handling the r2ts while we are adding new ones
>*/
> + spin_lock_bh(&conn->taskqueuelock);
>   if (list_empty(&task->running))
>   list_add_tail(&task->running, &conn->requeue);
> + spin_unlock_bh(&conn->taskqueuelock);
>   iscsi_conn_queue_work(conn);
>  }
>  EXPORT_SYMBOL_GPL(iscsi_requeue_task);
> @@ -1512,22 +1520,26 @@ static int iscsi_data_xmit(struct iscsi_conn *conn)
>* only have one nop-out as a ping from us and targets should not
>* overflow us with nop-ins
>*/
> + spin_lock_bh(&conn->taskqueuelock);
>  check_mgmt:
>   while (!list_empty(&conn->mgmtqueue)) {
>   conn->task = list_entry(conn->mgmtqueue.next,
>    struct iscsi_task, running);
>   list_del_init(&conn->task->running);
> + spin_unlock_bh(&conn->taskqueuelock);
>   if (iscsi_prep_mgmt_task(conn, conn->task)) {
>   /* regular RX path uses back_lock */
>   spin_lock_bh(&conn->session->back_lock);
>   __iscsi_put_task(conn->task);
>   spin_unlock_bh(&conn->session->back_lock);
>   conn->task = NULL;
> + spin_lock_bh(&conn->taskqueuelock);
>   continue;
>   }
>   rc = iscsi_xmit_task(conn);
>   if (rc)
>   goto done;
> + spin_lock_bh(&conn->taskqueuelock);
>   }
>  
>   /* process pending command queue */
> @@ -1535,19 +1547,24 @@ static int iscsi_data_xmit(struct iscsi_conn *conn)
>   conn->task = list_entry(conn->cmdqueue.next, struct iscsi_task,
>   running);
>   list_del_init(&conn->task->running);
> + spin_unlock_bh(&conn->taskqueuelock);
>   if (conn->session->state == ISCSI_STATE_LOGGING_OUT) {
>   fail_scsi_task(conn->task, DID_IMM_RETRY);
> + spin_lock_bh(&conn->taskqueuelock);
>   continue;
>   }
>   rc = iscsi_prep_scsi_cmd_pdu(conn->task);
>   if (rc) {
>   if (rc == -ENOMEM || rc == -EACCES) {
> + spin_lock_bh(&conn->taskqueuelock);
>   list_add_tail(&conn->task->running,
> &conn->cmdqueue);
>

Re: [PATCH] libiscsi: add lock around task lists to fix list corruption regression

2017-02-23 Thread Martin K. Petersen
> "Chris" == Chris Leech  writes:

Chris> There's a rather long-standing regression from commit 659743b
Chris> [SCSI] libiscsi: Reduce locking contention in fast path

Chris> Depending on iSCSI target behavior, it's possible to hit the case
Chris> in iscsi_complete_task where the task is still on a pending list
Chris> (!list_empty(&task->running)).  When that happens the task is
Chris> removed from the list while holding the session back_lock, but
Chris> other task list modifications occur under the frwd_lock.  That
Chris> leads to linked list corruption and eventually a panicked system.

Chris> Rather than back out the session lock split entirely, in order to
Chris> try and keep some of the performance gains this patch adds
Chris> another lock to maintain the task lists integrity.

Chris> Major enterprise-supported kernels have been backing out the lock
Chris> split for a while now, thanks to the efforts at IBM, where a lab
Chris> setup has the most reliable reproducer I've seen on this issue.
Chris> This patch has been tested there successfully.

Reviews, please!

-- 
Martin K. Petersen  Oracle Linux Engineering