Re: [PATCH] pm80xx: remove the SCSI host before detaching from SAS transport

2015-11-09 Thread Martin K. Petersen
> "Benjamin" == Benjamin Rood  writes:

Benjamin> Previously, when this module was unloaded via 'rmmod' with at
Benjamin> least one drive attached, the SCSI error handler thread would
Benjamin> become stuck in an infinite recovery loop and lockup the
Benjamin> system, necessitating a reboot.

Applied.

-- 
Martin K. Petersen  Oracle Linux Engineering
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] pm80xx: remove the SCSI host before detaching from SAS transport

2015-11-03 Thread Christoph Hellwig
On Mon, Nov 02, 2015 at 09:29:13AM +0100, Jack Wang wrote:
> Thanks for digging it down. Looks good to me.
> 
> Seems other libsas based driver all affected, maybe we should also fix them?
> Christoph (cc-ed), could you share your opinion about this fix, as the
> commit cff549e4860f is from you?

scsi_remove_host should be done first in ->remove and the commit
was written under that assumption.  So it looks like the other drivers
will need fixes as well.
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] pm80xx: remove the SCSI host before detaching from SAS transport

2015-11-02 Thread Jack Wang
2015-10-30 21:01 GMT+01:00 Benjamin Rood :
> Previously, when this module was unloaded via 'rmmod' with at least one
> drive attached, the SCSI error handler thread would become stuck in an
> infinite recovery loop and lockup the system, necessitating a reboot.
>
> Once the SAS layer is detached, the driver will fail any subsequent
> commands since the target devices are removed.  However, removing the
> SCSI host generates a SYNCHRONIZE CACHE (10) command, which was failed
> and left the error handler no method of recovery.
>
> This patch simply removes the SCSI host first so that no more commands
> can come down, prior to cleaning up the SAS layer.  Note that the stack
> is built up with the SCSI host first, and then the SAS layer.  Perhaps
> it should be reversed for symmetry, so that commands cannot be sent to
> the pm80xx driver prior to attaching the SAS layer?
>
> What was really strange about this bug was that it was introduced at
> commit cff549e4860f ("[SCSI]: proper state checking and module refcount
> handling in scsi_device_get").  This commit appears to tinker with how
> the reference counting is performed for SCSI device objects.  My theory
> is that prior to this commit, the refcount for a device object was
> blindly incremented at some point during the teardown process which
> coincidentially made the device stick around during the procedure, which
> also coincidentially made any commands sent to the driver not fail
> (since the device was technically still "there").  After this commit was
> applied, my theory is the refcount for the device object is not being
> incremented at a specific point anymore, which makes the device go away,
> and thus made the pm80xx driver fail any subsequent commands.
>
> You may also want to see the following for more details:
>
> [1] http://www.spinics.net/lists/linux-scsi/msg37208.html
> [2] http://marc.info/?l=linux-scsi=144416476406993=2
>
> Signed-off-by: Benjamin Rood 
> ---
>  drivers/scsi/pm8001/pm8001_init.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/scsi/pm8001/pm8001_init.c 
> b/drivers/scsi/pm8001/pm8001_init.c
> index bdc624f..b1b2dd7 100644
> --- a/drivers/scsi/pm8001/pm8001_init.c
> +++ b/drivers/scsi/pm8001/pm8001_init.c
> @@ -1096,10 +1096,10 @@ static void pm8001_pci_remove(struct pci_dev *pdev)
> struct pm8001_hba_info *pm8001_ha;
> int i, j;
> pm8001_ha = sha->lldd_ha;
> +   scsi_remove_host(pm8001_ha->shost);
> sas_unregister_ha(sha);
> sas_remove_host(pm8001_ha->shost);
> list_del(_ha->list);
> -   scsi_remove_host(pm8001_ha->shost);
> PM8001_CHIP_DISP->interrupt_disable(pm8001_ha, 0xFF);
> PM8001_CHIP_DISP->chip_soft_rst(pm8001_ha);
>
> --
> 2.4.3
>
Thanks for digging it down. Looks good to me.

Seems other libsas based driver all affected, maybe we should also fix them?
Christoph (cc-ed), could you share your opinion about this fix, as the
commit cff549e4860f is from you?

Regards,
Jack
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html