Re: [PATCH] qla2xxx: Fix NULL pointer crash due to active timer for ABTS
James, >> Cc: #4.4+ Ugh. > Martin, can you fix this up with a rebase and I'll resync my tree? Done and pushed. -- Martin K. Petersen Oracle Linux Engineering
Re: [PATCH] qla2xxx: Fix NULL pointer crash due to active timer for ABTS
On Mon, 2018-02-12 at 10:28 -0800, Himanshu Madhani wrote: [...] > Cc: #4.4+ This is the wrong stable tag, which would lead to stable not picking up the patch automatically. The correct stable address is Cc: #4.4+ ^^ Please be more careful in future. Martin, can you fix this up with a rebase and I'll resync my tree? Thanks, James
Re: [PATCH] qla2xxx: Fix NULL pointer crash due to active timer for ABTS
Himanshu, > Can you please queue this patch for 4.16-rc3 Applied to 4.16/scsi-fixes with the missing Fixes: tag. Thank you! -- Martin K. Petersen Oracle Linux Engineering
Re: [PATCH] qla2xxx: Fix NULL pointer crash due to active timer for ABTS
Hi Martin, > On Feb 14, 2018, at 11:58 AM, Madhani, Himanshu > wrote: > > Hi John, > >> On Feb 13, 2018, at 2:54 AM, John Garry wrote: >> >> On 12/02/2018 18:28, Himanshu Madhani wrote: >>> This patch fixes NULL pointer crash due to active timer running >>> for abort IOCB. >>> From crash dump analysis it was discoverd that get_next_timer_interrupt() >>> encountered a corrupted entry on the timer list. >>> >>> #9 [95e1f6f0fd40] page_fault at 914fe8f8 >>> [exception RIP: get_next_timer_interrupt+440] >>> RIP: 90ea3088 RSP: 95e1f6f0fdf0 RFLAGS: 00010013 >>> RAX: 95e1f6451028 RBX: 000218e2389e5f40 RCX: 0001232ad600 >>> RDX: 0001 RSI: 95e1f6f0fdf0 RDI: 01232ad6 >>> RBP: 95e1f6f0fe40 R8: 95e1f6451188 R9: 0001 >>> R10: 0016 R11: 0016 R12: 0001232ad5f6 >>> R13: 95e1f645 R14: 95e1f6f0fdf8 R15: 95e1f6f0fe10 >>> ORIG_RAX: CS: 0010 SS: 0018 >>> >>> Looking at the assembly of get_next_timer_interrupt(), address came >>> from %r8 (95e1f6451188) which is pointing to list_head with single >>> entry at 95e5ff621178. >>> >>> 0x90ea307a : mov(%r8),%rdx >>> 0x90ea307d : cmp%r8,%rdx >>> 0x90ea3080 : je >>> 0x90ea30a7 >>> 0x90ea3082 : nopw >>> 0x0(%rax,%rax,1) >>> 0x90ea3088 : testb >>> $0x1,0x18(%rdx) >>> >>> crash> rd 95e1f6451188 10 >>> 95e1f6451188: 95e5ff621178 95e5ff621178 x.b.x.b. >>> 95e1f6451198: 95e1f6451198 95e1f6451198 ..E...E. >>> 95e1f64511a8: 95e1f64511a8 95e1f64511a8 ..E...E. >>> 95e1f64511b8: 95e77cf509a0 95e77cf509a0 ...|...| >>> 95e1f64511c8: 95e1f64511c8 95e1f64511c8 ..E...E. >>> >>> crash> rd 95e5ff621178 10 >>> 95e5ff621178: 0001 95e15936aa00 ..6Y >>> 95e5ff621188: >>> 95e5ff621198: 00a0 0010 >>> 95e5ff6211a8: 95e5ff621198 000c ..b. >>> 95e5ff6211b8: 0f58 95e751f8d720 X... ..Q >>> >>> 95e5ff621178 belongs to freed mempool object at 95e5ff621080. >>> >>> CACHENAME OBJSIZE ALLOCATED TOTAL SLABS >>> SSIZE >>> 95dc7fd74d00 mnt_cache384 19785 24948594 >>> 16k >>> SLAB MEMORYNODE TOTAL ALLOCATED FREE >>> dc5dabfd8800 95e5ff62 1 42 2913 >>> FREE / [ALLOCATED] >>> 95e5ff621080 (cpu 6 cache) >>> >>> Examining the contents of that memory reveals a pointer to a constant >>> string in the driver, "abort\0", which is set by qla24xx_async_abort_cmd(). >>> >>> crash> rd c059277c 20 >>> c059277c: 6e490074726f6261 0074707572726574 abort.Interrupt. >>> c059278c: 00676e696c6c6f50 6920726576697244 Polling.Driver i >>> c059279c: 646f6d207325206e 6974736554000a65 n %s mode..Testi >>> c05927ac: 636976656420676e 786c252074612065 ng device at %lx >>> c05927bc: 6b63656843000a2e 646f727020676e69 ...Checking prod >>> c05927cc: 6f20444920746375 0a2e706968632066 uct ID of chip.. >>> c05927dc: 5120646e756f4600 204130303232414c .Found QLA2200A >>> c05927ec: 43000a2e70696843 20676e696b636568 Chip...Checking >>> c05927fc: 65786f626c69616d 6c636e69000a2e73 mailboxes...incl >>> c059280c: 756e696c2f656475 616d2d616d642f78 ude/linux/dma-ma >>> >>> crash> struct -ox srb_iocb >>> struct srb_iocb { >>> union { >>> struct {...} logio; >>> struct {...} els_logo; >>> struct {...} tmf; >>> struct {...} fxiocb; >>> struct {...} abt; >>> struct ct_arg ctarg; >>> struct {...} mbx; >>> struct {...} nack; >>> [0x0 ] } u; >>> [0xb8] struct timer_list timer; >>> [0x108] void (*timeout)(void *); >>> } >>> SIZE: 0x110 >>> >>> crash> ! bc >>> ibase=16 >>> obase=10 >>> B8+40 >>> F8 >>> >>> The object is a srb_t, and at offset 0xf8 within that structure >>> (i.e. 95e5ff621080 + f8 -> 95e5ff621178) is a struct timer_list. >>> >>> Cc: #4.4+ >>> Signed-off-by: Himanshu Madhani >>> --- >>> Hi Martin, >>> >>> This patch addresses crash due to NULL pointer access because driver >>> left active timer running for abort IOCB. >>> >>> Please apply this patch to 4.16/scsi-fixes at your earliest convenience. >>> >>> Thanks, >>> Himanshu >>> --- >>> drivers/scsi/qla2xxx/qla_init.c | 1 + >>> 1 file changed, 1 insertion(+) >>> >>> diff --git a/drivers/scsi/qla2xxx/qla_init.c >>> b/drivers/scsi/qla2xxx/qla_init.c >>> index 2dea1129d396..04870621e712 100644 >>> --- a/d
Re: [PATCH] qla2xxx: Fix NULL pointer crash due to active timer for ABTS
Hi John, > On Feb 13, 2018, at 2:54 AM, John Garry wrote: > > On 12/02/2018 18:28, Himanshu Madhani wrote: >> This patch fixes NULL pointer crash due to active timer running >> for abort IOCB. >> >>> From crash dump analysis it was discoverd that get_next_timer_interrupt() >> encountered a corrupted entry on the timer list. >> >> #9 [95e1f6f0fd40] page_fault at 914fe8f8 >>[exception RIP: get_next_timer_interrupt+440] >>RIP: 90ea3088 RSP: 95e1f6f0fdf0 RFLAGS: 00010013 >>RAX: 95e1f6451028 RBX: 000218e2389e5f40 RCX: 0001232ad600 >>RDX: 0001 RSI: 95e1f6f0fdf0 RDI: 01232ad6 >>RBP: 95e1f6f0fe40 R8: 95e1f6451188 R9: 0001 >>R10: 0016 R11: 0016 R12: 0001232ad5f6 >>R13: 95e1f645 R14: 95e1f6f0fdf8 R15: 95e1f6f0fe10 >>ORIG_RAX: CS: 0010 SS: 0018 >> >> Looking at the assembly of get_next_timer_interrupt(), address came >> from %r8 (95e1f6451188) which is pointing to list_head with single >> entry at 95e5ff621178. >> >> 0x90ea307a : mov(%r8),%rdx >> 0x90ea307d : cmp%r8,%rdx >> 0x90ea3080 : je >> 0x90ea30a7 >> 0x90ea3082 : nopw >> 0x0(%rax,%rax,1) >> 0x90ea3088 : testb >> $0x1,0x18(%rdx) >> >> crash> rd 95e1f6451188 10 >> 95e1f6451188: 95e5ff621178 95e5ff621178 x.b.x.b. >> 95e1f6451198: 95e1f6451198 95e1f6451198 ..E...E. >> 95e1f64511a8: 95e1f64511a8 95e1f64511a8 ..E...E. >> 95e1f64511b8: 95e77cf509a0 95e77cf509a0 ...|...| >> 95e1f64511c8: 95e1f64511c8 95e1f64511c8 ..E...E. >> >> crash> rd 95e5ff621178 10 >> 95e5ff621178: 0001 95e15936aa00 ..6Y >> 95e5ff621188: >> 95e5ff621198: 00a0 0010 >> 95e5ff6211a8: 95e5ff621198 000c ..b. >> 95e5ff6211b8: 0f58 95e751f8d720 X... ..Q >> >> 95e5ff621178 belongs to freed mempool object at 95e5ff621080. >> >> CACHENAME OBJSIZE ALLOCATED TOTAL SLABS >> SSIZE >> 95dc7fd74d00 mnt_cache384 19785 24948594 >> 16k >> SLAB MEMORYNODE TOTAL ALLOCATED FREE >> dc5dabfd8800 95e5ff62 1 42 2913 >> FREE / [ALLOCATED] >>95e5ff621080 (cpu 6 cache) >> >> Examining the contents of that memory reveals a pointer to a constant >> string in the driver, "abort\0", which is set by qla24xx_async_abort_cmd(). >> >> crash> rd c059277c 20 >> c059277c: 6e490074726f6261 0074707572726574 abort.Interrupt. >> c059278c: 00676e696c6c6f50 6920726576697244 Polling.Driver i >> c059279c: 646f6d207325206e 6974736554000a65 n %s mode..Testi >> c05927ac: 636976656420676e 786c252074612065 ng device at %lx >> c05927bc: 6b63656843000a2e 646f727020676e69 ...Checking prod >> c05927cc: 6f20444920746375 0a2e706968632066 uct ID of chip.. >> c05927dc: 5120646e756f4600 204130303232414c .Found QLA2200A >> c05927ec: 43000a2e70696843 20676e696b636568 Chip...Checking >> c05927fc: 65786f626c69616d 6c636e69000a2e73 mailboxes...incl >> c059280c: 756e696c2f656475 616d2d616d642f78 ude/linux/dma-ma >> >> crash> struct -ox srb_iocb >> struct srb_iocb { >> union { >> struct {...} logio; >> struct {...} els_logo; >> struct {...} tmf; >> struct {...} fxiocb; >> struct {...} abt; >> struct ct_arg ctarg; >> struct {...} mbx; >> struct {...} nack; >>[0x0 ] } u; >>[0xb8] struct timer_list timer; >>[0x108] void (*timeout)(void *); >> } >> SIZE: 0x110 >> >> crash> ! bc >> ibase=16 >> obase=10 >> B8+40 >> F8 >> >> The object is a srb_t, and at offset 0xf8 within that structure >> (i.e. 95e5ff621080 + f8 -> 95e5ff621178) is a struct timer_list. >> >> Cc: #4.4+ >> Signed-off-by: Himanshu Madhani >> --- >> Hi Martin, >> >> This patch addresses crash due to NULL pointer access because driver >> left active timer running for abort IOCB. >> >> Please apply this patch to 4.16/scsi-fixes at your earliest convenience. >> >> Thanks, >> Himanshu >> --- >> drivers/scsi/qla2xxx/qla_init.c | 1 + >> 1 file changed, 1 insertion(+) >> >> diff --git a/drivers/scsi/qla2xxx/qla_init.c >> b/drivers/scsi/qla2xxx/qla_init.c >> index 2dea1129d396..04870621e712 100644 >> --- a/drivers/scsi/qla2xxx/qla_init.c >> +++ b/drivers/scsi/qla2xxx/qla_init.c >> @@ -1527,6 +1527,7 @@ `(void *ptr, int res) >> srb_t *sp = ptr; >> struct srb_iocb *abt =
Re: [PATCH] qla2xxx: Fix NULL pointer crash due to active timer for ABTS
On Tue, 2018-02-13 at 18:13 +, Madhani, Himanshu wrote: > Thanks for asking (I did not add fixes commit ID because it was pre > 4.x kernel) > > Here’s commit id 4440e46d5db7b445a961a8849b2a31fa7fd1 which is > fixed by this patch. Thanks for the info. In general having a Fixes: 4440e46d5db7 ("[SCSI] qla2xxx: Add IOCB Abort command asynchronous handling.") line is extremly valuable as many distributions have automation in place to check whether a fix for a backport arrived upstream. Thanks, Johannes -- Johannes Thumshirn Storage jthu msh...@suse.de +49 911 74053 689 SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg GF: Felix Imendörffer, Jane Smithard, Graham Norton HRB 21284 (AG Nürnberg) Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850
Re: [PATCH] qla2xxx: Fix NULL pointer crash due to active timer for ABTS
Hi John, > On Feb 13, 2018, at 2:54 AM, John Garry wrote: > > On 12/02/2018 18:28, Himanshu Madhani wrote: >> This patch fixes NULL pointer crash due to active timer running >> for abort IOCB. >> >>> From crash dump analysis it was discoverd that get_next_timer_interrupt() >> encountered a corrupted entry on the timer list. >> >> #9 [95e1f6f0fd40] page_fault at 914fe8f8 >>[exception RIP: get_next_timer_interrupt+440] >>RIP: 90ea3088 RSP: 95e1f6f0fdf0 RFLAGS: 00010013 >>RAX: 95e1f6451028 RBX: 000218e2389e5f40 RCX: 0001232ad600 >>RDX: 0001 RSI: 95e1f6f0fdf0 RDI: 01232ad6 >>RBP: 95e1f6f0fe40 R8: 95e1f6451188 R9: 0001 >>R10: 0016 R11: 0016 R12: 0001232ad5f6 >>R13: 95e1f645 R14: 95e1f6f0fdf8 R15: 95e1f6f0fe10 >>ORIG_RAX: CS: 0010 SS: 0018 >> >> Looking at the assembly of get_next_timer_interrupt(), address came >> from %r8 (95e1f6451188) which is pointing to list_head with single >> entry at 95e5ff621178. >> >> 0x90ea307a : mov(%r8),%rdx >> 0x90ea307d : cmp%r8,%rdx >> 0x90ea3080 : je >> 0x90ea30a7 >> 0x90ea3082 : nopw >> 0x0(%rax,%rax,1) >> 0x90ea3088 : testb >> $0x1,0x18(%rdx) >> >> crash> rd 95e1f6451188 10 >> 95e1f6451188: 95e5ff621178 95e5ff621178 x.b.x.b. >> 95e1f6451198: 95e1f6451198 95e1f6451198 ..E...E. >> 95e1f64511a8: 95e1f64511a8 95e1f64511a8 ..E...E. >> 95e1f64511b8: 95e77cf509a0 95e77cf509a0 ...|...| >> 95e1f64511c8: 95e1f64511c8 95e1f64511c8 ..E...E. >> >> crash> rd 95e5ff621178 10 >> 95e5ff621178: 0001 95e15936aa00 ..6Y >> 95e5ff621188: >> 95e5ff621198: 00a0 0010 >> 95e5ff6211a8: 95e5ff621198 000c ..b. >> 95e5ff6211b8: 0f58 95e751f8d720 X... ..Q >> >> 95e5ff621178 belongs to freed mempool object at 95e5ff621080. >> >> CACHENAME OBJSIZE ALLOCATED TOTAL SLABS >> SSIZE >> 95dc7fd74d00 mnt_cache384 19785 24948594 >> 16k >> SLAB MEMORYNODE TOTAL ALLOCATED FREE >> dc5dabfd8800 95e5ff62 1 42 2913 >> FREE / [ALLOCATED] >>95e5ff621080 (cpu 6 cache) >> >> Examining the contents of that memory reveals a pointer to a constant >> string in the driver, "abort\0", which is set by qla24xx_async_abort_cmd(). >> >> crash> rd c059277c 20 >> c059277c: 6e490074726f6261 0074707572726574 abort.Interrupt. >> c059278c: 00676e696c6c6f50 6920726576697244 Polling.Driver i >> c059279c: 646f6d207325206e 6974736554000a65 n %s mode..Testi >> c05927ac: 636976656420676e 786c252074612065 ng device at %lx >> c05927bc: 6b63656843000a2e 646f727020676e69 ...Checking prod >> c05927cc: 6f20444920746375 0a2e706968632066 uct ID of chip.. >> c05927dc: 5120646e756f4600 204130303232414c .Found QLA2200A >> c05927ec: 43000a2e70696843 20676e696b636568 Chip...Checking >> c05927fc: 65786f626c69616d 6c636e69000a2e73 mailboxes...incl >> c059280c: 756e696c2f656475 616d2d616d642f78 ude/linux/dma-ma >> >> crash> struct -ox srb_iocb >> struct srb_iocb { >> union { >> struct {...} logio; >> struct {...} els_logo; >> struct {...} tmf; >> struct {...} fxiocb; >> struct {...} abt; >> struct ct_arg ctarg; >> struct {...} mbx; >> struct {...} nack; >>[0x0 ] } u; >>[0xb8] struct timer_list timer; >>[0x108] void (*timeout)(void *); >> } >> SIZE: 0x110 >> >> crash> ! bc >> ibase=16 >> obase=10 >> B8+40 >> F8 >> >> The object is a srb_t, and at offset 0xf8 within that structure >> (i.e. 95e5ff621080 + f8 -> 95e5ff621178) is a struct timer_list. >> >> Cc: #4.4+ >> Signed-off-by: Himanshu Madhani >> --- >> Hi Martin, >> >> This patch addresses crash due to NULL pointer access because driver >> left active timer running for abort IOCB. >> >> Please apply this patch to 4.16/scsi-fixes at your earliest convenience. >> >> Thanks, >> Himanshu >> --- >> drivers/scsi/qla2xxx/qla_init.c | 1 + >> 1 file changed, 1 insertion(+) >> >> diff --git a/drivers/scsi/qla2xxx/qla_init.c >> b/drivers/scsi/qla2xxx/qla_init.c >> index 2dea1129d396..04870621e712 100644 >> --- a/drivers/scsi/qla2xxx/qla_init.c >> +++ b/drivers/scsi/qla2xxx/qla_init.c >> @@ -1527,6 +1527,7 @@ qla24xx_abort_sp_done(void *ptr, int res) >> srb_t *sp = ptr; >> st
Re: [PATCH] qla2xxx: Fix NULL pointer crash due to active timer for ABTS
Hi Johannes, > On Feb 13, 2018, at 2:38 AM, Johannes Thumshirn wrote: > > Thanks Himanshu, > Reviewed-by: Johannes Thumshirn > > Do you happen to know which commit it actually fixes? > Thanks for asking (I did not add fixes commit ID because it was pre 4.x kernel) Here’s commit id 4440e46d5db7b445a961a8849b2a31fa7fd1 which is fixed by this patch. > -- > Johannes Thumshirn Storage > jthu > msh...@suse.de+49 911 74053 689 > SUSE > LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg > GF: Felix Imendörffer, Jane > Smithard, Graham Norton > HRB 21284 (AG Nürnberg) > Key fingerprint = EC38 > 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850 Thanks, - Himanshu
Re: [PATCH] qla2xxx: Fix NULL pointer crash due to active timer for ABTS
On 12/02/2018 18:28, Himanshu Madhani wrote: This patch fixes NULL pointer crash due to active timer running for abort IOCB. From crash dump analysis it was discoverd that get_next_timer_interrupt() encountered a corrupted entry on the timer list. #9 [95e1f6f0fd40] page_fault at 914fe8f8 [exception RIP: get_next_timer_interrupt+440] RIP: 90ea3088 RSP: 95e1f6f0fdf0 RFLAGS: 00010013 RAX: 95e1f6451028 RBX: 000218e2389e5f40 RCX: 0001232ad600 RDX: 0001 RSI: 95e1f6f0fdf0 RDI: 01232ad6 RBP: 95e1f6f0fe40 R8: 95e1f6451188 R9: 0001 R10: 0016 R11: 0016 R12: 0001232ad5f6 R13: 95e1f645 R14: 95e1f6f0fdf8 R15: 95e1f6f0fe10 ORIG_RAX: CS: 0010 SS: 0018 Looking at the assembly of get_next_timer_interrupt(), address came from %r8 (95e1f6451188) which is pointing to list_head with single entry at 95e5ff621178. 0x90ea307a : mov(%r8),%rdx 0x90ea307d : cmp%r8,%rdx 0x90ea3080 : je 0x90ea30a7 0x90ea3082 : nopw 0x0(%rax,%rax,1) 0x90ea3088 : testb $0x1,0x18(%rdx) crash> rd 95e1f6451188 10 95e1f6451188: 95e5ff621178 95e5ff621178 x.b.x.b. 95e1f6451198: 95e1f6451198 95e1f6451198 ..E...E. 95e1f64511a8: 95e1f64511a8 95e1f64511a8 ..E...E. 95e1f64511b8: 95e77cf509a0 95e77cf509a0 ...|...| 95e1f64511c8: 95e1f64511c8 95e1f64511c8 ..E...E. crash> rd 95e5ff621178 10 95e5ff621178: 0001 95e15936aa00 ..6Y 95e5ff621188: 95e5ff621198: 00a0 0010 95e5ff6211a8: 95e5ff621198 000c ..b. 95e5ff6211b8: 0f58 95e751f8d720 X... ..Q 95e5ff621178 belongs to freed mempool object at 95e5ff621080. CACHENAME OBJSIZE ALLOCATED TOTAL SLABS SSIZE 95dc7fd74d00 mnt_cache384 19785 24948594 16k SLAB MEMORYNODE TOTAL ALLOCATED FREE dc5dabfd8800 95e5ff62 1 42 2913 FREE / [ALLOCATED] 95e5ff621080 (cpu 6 cache) Examining the contents of that memory reveals a pointer to a constant string in the driver, "abort\0", which is set by qla24xx_async_abort_cmd(). crash> rd c059277c 20 c059277c: 6e490074726f6261 0074707572726574 abort.Interrupt. c059278c: 00676e696c6c6f50 6920726576697244 Polling.Driver i c059279c: 646f6d207325206e 6974736554000a65 n %s mode..Testi c05927ac: 636976656420676e 786c252074612065 ng device at %lx c05927bc: 6b63656843000a2e 646f727020676e69 ...Checking prod c05927cc: 6f20444920746375 0a2e706968632066 uct ID of chip.. c05927dc: 5120646e756f4600 204130303232414c .Found QLA2200A c05927ec: 43000a2e70696843 20676e696b636568 Chip...Checking c05927fc: 65786f626c69616d 6c636e69000a2e73 mailboxes...incl c059280c: 756e696c2f656475 616d2d616d642f78 ude/linux/dma-ma crash> struct -ox srb_iocb struct srb_iocb { union { struct {...} logio; struct {...} els_logo; struct {...} tmf; struct {...} fxiocb; struct {...} abt; struct ct_arg ctarg; struct {...} mbx; struct {...} nack; [0x0 ] } u; [0xb8] struct timer_list timer; [0x108] void (*timeout)(void *); } SIZE: 0x110 crash> ! bc ibase=16 obase=10 B8+40 F8 The object is a srb_t, and at offset 0xf8 within that structure (i.e. 95e5ff621080 + f8 -> 95e5ff621178) is a struct timer_list. Cc: #4.4+ Signed-off-by: Himanshu Madhani --- Hi Martin, This patch addresses crash due to NULL pointer access because driver left active timer running for abort IOCB. Please apply this patch to 4.16/scsi-fixes at your earliest convenience. Thanks, Himanshu --- drivers/scsi/qla2xxx/qla_init.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/scsi/qla2xxx/qla_init.c b/drivers/scsi/qla2xxx/qla_init.c index 2dea1129d396..04870621e712 100644 --- a/drivers/scsi/qla2xxx/qla_init.c +++ b/drivers/scsi/qla2xxx/qla_init.c @@ -1527,6 +1527,7 @@ qla24xx_abort_sp_done(void *ptr, int res) srb_t *sp = ptr; struct srb_iocb *abt = &sp->u.iocb_cmd; + del_timer(&sp->u.iocb_cmd.timer); Hi, I am not familiar with this code. But I note that this abort seems to be "asynchronous" (I see it's setup in qla24xx_async_abort_cmd()), so do you need the "sync" variant of del_timer() for safety? This would be to ensure the timeout has not actually o
Re: [PATCH] qla2xxx: Fix NULL pointer crash due to active timer for ABTS
Thanks Himanshu, Reviewed-by: Johannes Thumshirn Do you happen to know which commit it actually fixes? -- Johannes Thumshirn Storage jthu msh...@suse.de +49 911 74053 689 SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg GF: Felix Imendörffer, Jane Smithard, Graham Norton HRB 21284 (AG Nürnberg) Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850
[PATCH] qla2xxx: Fix NULL pointer crash due to active timer for ABTS
This patch fixes NULL pointer crash due to active timer running for abort IOCB. >From crash dump analysis it was discoverd that get_next_timer_interrupt() encountered a corrupted entry on the timer list. #9 [95e1f6f0fd40] page_fault at 914fe8f8 [exception RIP: get_next_timer_interrupt+440] RIP: 90ea3088 RSP: 95e1f6f0fdf0 RFLAGS: 00010013 RAX: 95e1f6451028 RBX: 000218e2389e5f40 RCX: 0001232ad600 RDX: 0001 RSI: 95e1f6f0fdf0 RDI: 01232ad6 RBP: 95e1f6f0fe40 R8: 95e1f6451188 R9: 0001 R10: 0016 R11: 0016 R12: 0001232ad5f6 R13: 95e1f645 R14: 95e1f6f0fdf8 R15: 95e1f6f0fe10 ORIG_RAX: CS: 0010 SS: 0018 Looking at the assembly of get_next_timer_interrupt(), address came from %r8 (95e1f6451188) which is pointing to list_head with single entry at 95e5ff621178. 0x90ea307a : mov(%r8),%rdx 0x90ea307d : cmp%r8,%rdx 0x90ea3080 : je 0x90ea30a7 0x90ea3082 : nopw 0x0(%rax,%rax,1) 0x90ea3088 : testb $0x1,0x18(%rdx) crash> rd 95e1f6451188 10 95e1f6451188: 95e5ff621178 95e5ff621178 x.b.x.b. 95e1f6451198: 95e1f6451198 95e1f6451198 ..E...E. 95e1f64511a8: 95e1f64511a8 95e1f64511a8 ..E...E. 95e1f64511b8: 95e77cf509a0 95e77cf509a0 ...|...| 95e1f64511c8: 95e1f64511c8 95e1f64511c8 ..E...E. crash> rd 95e5ff621178 10 95e5ff621178: 0001 95e15936aa00 ..6Y 95e5ff621188: 95e5ff621198: 00a0 0010 95e5ff6211a8: 95e5ff621198 000c ..b. 95e5ff6211b8: 0f58 95e751f8d720 X... ..Q 95e5ff621178 belongs to freed mempool object at 95e5ff621080. CACHENAME OBJSIZE ALLOCATED TOTAL SLABS SSIZE 95dc7fd74d00 mnt_cache384 19785 24948594 16k SLAB MEMORYNODE TOTAL ALLOCATED FREE dc5dabfd8800 95e5ff62 1 42 2913 FREE / [ALLOCATED] 95e5ff621080 (cpu 6 cache) Examining the contents of that memory reveals a pointer to a constant string in the driver, "abort\0", which is set by qla24xx_async_abort_cmd(). crash> rd c059277c 20 c059277c: 6e490074726f6261 0074707572726574 abort.Interrupt. c059278c: 00676e696c6c6f50 6920726576697244 Polling.Driver i c059279c: 646f6d207325206e 6974736554000a65 n %s mode..Testi c05927ac: 636976656420676e 786c252074612065 ng device at %lx c05927bc: 6b63656843000a2e 646f727020676e69 ...Checking prod c05927cc: 6f20444920746375 0a2e706968632066 uct ID of chip.. c05927dc: 5120646e756f4600 204130303232414c .Found QLA2200A c05927ec: 43000a2e70696843 20676e696b636568 Chip...Checking c05927fc: 65786f626c69616d 6c636e69000a2e73 mailboxes...incl c059280c: 756e696c2f656475 616d2d616d642f78 ude/linux/dma-ma crash> struct -ox srb_iocb struct srb_iocb { union { struct {...} logio; struct {...} els_logo; struct {...} tmf; struct {...} fxiocb; struct {...} abt; struct ct_arg ctarg; struct {...} mbx; struct {...} nack; [0x0 ] } u; [0xb8] struct timer_list timer; [0x108] void (*timeout)(void *); } SIZE: 0x110 crash> ! bc ibase=16 obase=10 B8+40 F8 The object is a srb_t, and at offset 0xf8 within that structure (i.e. 95e5ff621080 + f8 -> 95e5ff621178) is a struct timer_list. Cc: #4.4+ Signed-off-by: Himanshu Madhani --- Hi Martin, This patch addresses crash due to NULL pointer access because driver left active timer running for abort IOCB. Please apply this patch to 4.16/scsi-fixes at your earliest convenience. Thanks, Himanshu --- drivers/scsi/qla2xxx/qla_init.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/scsi/qla2xxx/qla_init.c b/drivers/scsi/qla2xxx/qla_init.c index 2dea1129d396..04870621e712 100644 --- a/drivers/scsi/qla2xxx/qla_init.c +++ b/drivers/scsi/qla2xxx/qla_init.c @@ -1527,6 +1527,7 @@ qla24xx_abort_sp_done(void *ptr, int res) srb_t *sp = ptr; struct srb_iocb *abt = &sp->u.iocb_cmd; + del_timer(&sp->u.iocb_cmd.timer); complete(&abt->u.abt.comp); } -- 2.12.0