RE: [PATCH v2 1/2] scsi: fix race between simultaneous decrements of ->host_failed
-Original Message- From: Wei Fang [mailto:fangw...@huawei.com] Sent: June-01-16 10:37 PM >The concurrently decrements of host_failed only lead to abnormal of >host_failed, host_busy will be zero after error handler, and >the result may >be host_failed > host_busy forever. But in your case, host_busy > host_failed, >so I think it's not the same case. I'm >afraid that this patch can't fix your >scsi hang issue. I tested the patch and you (and others also suggested similar) are correct that it does not fix the error handling issue I am having with port multipliers and CD-ROM+HDD. Kevin -- To unsubscribe from this list: send the line "unsubscribe linux-doc" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2 1/2] scsi: fix race between simultaneous decrements of ->host_failed
On 2016/6/2 10:37, Wei Fang wrote: > Hi, Kevin, > > On 2016/6/1 22:36, Kevin Groeneveld wrote: >>> Subject: [PATCH v2 1/2] scsi: fix race between simultaneous decrements of >>> ->host_failed >> >> I wonder if this could be related to >> http://www.spinics.net/lists/linux-scsi/msg86808.html? >> >> I never did get to the bottom of that. If I have time I hope to retest my >> scsi hang issue with this patch. >> > > The concurrently decrements of host_failed only lead to abnormal > of host_failed, host_busy will be zero after error handler, and > the result may be host_failed > host_busy forever. But in your > case, host_busy > host_failed, so I think it's not the same > case. I'm afraid that this patch can't fix your scsi hang issue. Something wrong in my words. host_busy may not be zero after error handler, but the result is true that missing decrement of host_failed may lead to host_failed > host_busy. Thanks, Wei -- To unsubscribe from this list: send the line "unsubscribe linux-doc" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2 1/2] scsi: fix race between simultaneous decrements of ->host_failed
On 06/01/2016 07:36 AM, Kevin Groeneveld wrote: Subject: [PATCH v2 1/2] scsi: fix race between simultaneous decrements of ->host_failed I wonder if this could be related to http://www.spinics.net/lists/linux-scsi/msg86808.html? I never did get to the bottom of that. If I have time I hope to retest my scsi hang issue with this patch. I've never seen that hang with the ib_srp driver (which is a SCSI LLD). That probably means that the SATA code uses the SCSI error handler in another way than other SCSI LLDs. Bart. -- To unsubscribe from this list: send the line "unsubscribe linux-doc" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2 1/2] scsi: fix race between simultaneous decrements of ->host_failed
On Wed, 2016-06-01 at 08:29 -0700, Bart Van Assche wrote: > On 06/01/2016 07:36 AM, Kevin Groeneveld wrote: > > > Subject: [PATCH v2 1/2] scsi: fix race between simultaneous > > > decrements of ->host_failed > > > > I wonder if this could be related to > > http://www.spinics.net/lists/linux-scsi/msg86808.html? > > > > I never did get to the bottom of that. If I have time I hope to > > retest my scsi hang issue with this patch. > > I've never seen that hang with the ib_srp driver (which is a SCSI > LLD). That probably means that the SATA code uses the SCSI error > handler in another way than other SCSI LLDs. It's the ata_sas error handler, which is split between libsas and libata. It does one thread of execution per ata port when the strategy handler is invoked. This behaviour is pretty much unique at the moment. James -- To unsubscribe from this list: send the line "unsubscribe linux-doc" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2 1/2] scsi: fix race between simultaneous decrements of ->host_failed
On Tue, 2016-05-31 at 16:38 +0800, Wei Fang wrote: > sas_ata_strategy_handler() adds the works of the ata error handler > to system_unbound_wq. This workqueue asynchronously runs work items, > so the ata error handler will be performed concurrently on different > CPUs. In this case, ->host_failed will be decreased simultaneously in > scsi_eh_finish_cmd() on different CPUs, and become abnormal. > > It will lead to permanently inequal between ->host_failed and > ->host_busy, and scsi error handler thread won't become running. > IO errors after that won't be handled forever. > > Use atomic type for ->host_failed to fix this race. As I said previously, you don't need atomics to do this, could you just remove the decrement in scsi_eh_finish_command() and zero the counter after the strategy handler completes. Thanks, James -- To unsubscribe from this list: send the line "unsubscribe linux-doc" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2 1/2] scsi: fix race between simultaneous decrements of ->host_failed
On Tue, May 31, 2016 at 8:21 PM, Dan Williamswrote: > On Tue, May 31, 2016 at 1:38 AM, Wei Fang wrote: >> sas_ata_strategy_handler() adds the works of the ata error handler >> to system_unbound_wq. This workqueue asynchronously runs work items, >> so the ata error handler will be performed concurrently on different >> CPUs. In this case, ->host_failed will be decreased simultaneously in >> scsi_eh_finish_cmd() on different CPUs, and become abnormal. >> >> It will lead to permanently inequal between ->host_failed and >> ->host_busy, and scsi error handler thread won't become running. >> IO errors after that won't be handled forever. >> >> Use atomic type for ->host_failed to fix this race. >> >> This fixes the problem introduced in >> commit 50824d6c5657 ("[SCSI] libsas: async ata-eh"). >> >> Reviewed-by: Christoph Hellwig >> Signed-off-by: Wei Fang > > Acked-by: Dan Williams Should also tag this for -stable. -- To unsubscribe from this list: send the line "unsubscribe linux-doc" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2 1/2] scsi: fix race between simultaneous decrements of ->host_failed
On Tue, May 31, 2016 at 04:38:17PM +0800, Wei Fang wrote: > sas_ata_strategy_handler() adds the works of the ata error handler > to system_unbound_wq. This workqueue asynchronously runs work items, Are there more than one error handling work items per host? Thanks. -- tejun -- To unsubscribe from this list: send the line "unsubscribe linux-doc" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html