RE: [PATCH v2 1/2] scsi: fix race between simultaneous decrements of ->host_failed

2016-06-06 Thread Kevin Groeneveld
-Original Message-
From: Wei Fang [mailto:fangw...@huawei.com] 
Sent: June-01-16 10:37 PM

>The concurrently decrements of host_failed only lead to abnormal of 
>host_failed, host_busy will be zero after error handler, and >the result may 
>be host_failed > host_busy forever. But in your case, host_busy > host_failed, 
>so I think it's not the same case. I'm >afraid that this patch can't fix your 
>scsi hang issue.

I tested the patch and you (and others also suggested similar) are correct that 
it does not fix the error handling issue I am having with port multipliers and 
CD-ROM+HDD.


Kevin
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 1/2] scsi: fix race between simultaneous decrements of ->host_failed

2016-06-01 Thread Wei Fang

On 2016/6/2 10:37, Wei Fang wrote:
> Hi, Kevin,
> 
> On 2016/6/1 22:36, Kevin Groeneveld wrote:
>>> Subject: [PATCH v2 1/2] scsi: fix race between simultaneous decrements of 
>>> ->host_failed
>>
>> I wonder if this could be related to 
>> http://www.spinics.net/lists/linux-scsi/msg86808.html?
>>
>> I never did get to the bottom of that.  If I have time I hope to retest my 
>> scsi hang issue with this patch.
>>
> 
> The concurrently decrements of host_failed only lead to abnormal
> of host_failed, host_busy will be zero after error handler, and
> the result may be host_failed > host_busy forever. But in your
> case, host_busy > host_failed, so I think it's not the same
> case. I'm afraid that this patch can't fix your scsi hang issue.

Something wrong in my words. host_busy may not be zero after error
handler, but the result is true that missing decrement of host_failed
may lead to host_failed > host_busy.

Thanks,
Wei

--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 1/2] scsi: fix race between simultaneous decrements of ->host_failed

2016-06-01 Thread Bart Van Assche

On 06/01/2016 07:36 AM, Kevin Groeneveld wrote:

Subject: [PATCH v2 1/2] scsi: fix race between simultaneous decrements of 
->host_failed


I wonder if this could be related to 
http://www.spinics.net/lists/linux-scsi/msg86808.html?

I never did get to the bottom of that.  If I have time I hope to retest my scsi 
hang issue with this patch.


I've never seen that hang with the ib_srp driver (which is a SCSI LLD). 
That probably means that the SATA code uses the SCSI error handler in 
another way than other SCSI LLDs.


Bart.
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 1/2] scsi: fix race between simultaneous decrements of ->host_failed

2016-06-01 Thread James Bottomley
On Wed, 2016-06-01 at 08:29 -0700, Bart Van Assche wrote:
> On 06/01/2016 07:36 AM, Kevin Groeneveld wrote:
> > > Subject: [PATCH v2 1/2] scsi: fix race between simultaneous
> > > decrements of ->host_failed
> > 
> > I wonder if this could be related to 
> > http://www.spinics.net/lists/linux-scsi/msg86808.html?
> > 
> > I never did get to the bottom of that.  If I have time I hope to
> > retest my scsi hang issue with this patch.
> 
> I've never seen that hang with the ib_srp driver (which is a SCSI 
> LLD). That probably means that the SATA code uses the SCSI error 
> handler in another way than other SCSI LLDs.

It's the ata_sas error handler, which is split between libsas and
libata.  It does one thread of execution per ata port when the strategy
handler is invoked.  This behaviour is pretty much unique at the
moment.

James


--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 1/2] scsi: fix race between simultaneous decrements of ->host_failed

2016-06-01 Thread James Bottomley
On Tue, 2016-05-31 at 16:38 +0800, Wei Fang wrote:
> sas_ata_strategy_handler() adds the works of the ata error handler
> to system_unbound_wq. This workqueue asynchronously runs work items,
> so the ata error handler will be performed concurrently on different
> CPUs. In this case, ->host_failed will be decreased simultaneously in
> scsi_eh_finish_cmd() on different CPUs, and become abnormal.
> 
> It will lead to permanently inequal between ->host_failed and
>  ->host_busy, and scsi error handler thread won't become running.
> IO errors after that won't be handled forever.
> 
> Use atomic type for ->host_failed to fix this race.

As I said previously, you don't need atomics to do this, could you just
remove the decrement in scsi_eh_finish_command() and zero the counter
after the strategy handler completes.

Thanks,

James


--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 1/2] scsi: fix race between simultaneous decrements of ->host_failed

2016-05-31 Thread Dan Williams
On Tue, May 31, 2016 at 8:21 PM, Dan Williams  wrote:
> On Tue, May 31, 2016 at 1:38 AM, Wei Fang  wrote:
>> sas_ata_strategy_handler() adds the works of the ata error handler
>> to system_unbound_wq. This workqueue asynchronously runs work items,
>> so the ata error handler will be performed concurrently on different
>> CPUs. In this case, ->host_failed will be decreased simultaneously in
>> scsi_eh_finish_cmd() on different CPUs, and become abnormal.
>>
>> It will lead to permanently inequal between ->host_failed and
>>  ->host_busy, and scsi error handler thread won't become running.
>> IO errors after that won't be handled forever.
>>
>> Use atomic type for ->host_failed to fix this race.
>>
>> This fixes the problem introduced in
>> commit 50824d6c5657 ("[SCSI] libsas: async ata-eh").
>>
>> Reviewed-by: Christoph Hellwig 
>> Signed-off-by: Wei Fang 
>
> Acked-by: Dan Williams 

Should also tag this for -stable.
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 1/2] scsi: fix race between simultaneous decrements of ->host_failed

2016-05-31 Thread Tejun Heo
On Tue, May 31, 2016 at 04:38:17PM +0800, Wei Fang wrote:
> sas_ata_strategy_handler() adds the works of the ata error handler
> to system_unbound_wq. This workqueue asynchronously runs work items,

Are there more than one error handling work items per host?

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html