Re: SCSI error handling -- one error blocks the whole SCSI host

2013-05-28 Thread Jeremy Linton
On 5/27/2013 8:32 PM, Baruch Even wrote: necessary but the command itself, if it is already actively handled, continues on its path. The abort only cancels those commands that are in the queue, and if there really was a problem and the disk is engaging in error recovery of its own you'll just

Re: SCSI error handling -- one error blocks the whole SCSI host

2013-05-28 Thread Baruch Even
On Tue, May 28, 2013 at 5:38 PM, Jeremy Linton jlin...@tributary.com wrote: This is another part of what formed my opinions about error isolation. If one of your devices goes out to lunch and isn't recovering via abort/LUN reset, it's done! Wrecking the rest of the SAN doing bus resets

Re: SCSI error handling -- one error blocks the whole SCSI host

2013-05-27 Thread Hannes Reinecke
On 05/27/2013 12:44 AM, James Bottomley wrote: On Thu, 2013-05-23 at 11:14 -0700, Roland Dreier wrote: At LSF this year, we had a discussion about error handling and in particular the problem that SCSI midlayer error handling waits for the entire SCSI host (HBA) to quiesce before it starts

Re: SCSI error handling -- one error blocks the whole SCSI host

2013-05-27 Thread James Bottomley
On Mon, 2013-05-27 at 16:39 +0200, Hannes Reinecke wrote: On 05/27/2013 12:44 AM, James Bottomley wrote: On Thu, 2013-05-23 at 11:14 -0700, Roland Dreier wrote: At LSF this year, we had a discussion about error handling and in particular the problem that SCSI midlayer error handling

Re: SCSI error handling -- one error blocks the whole SCSI host

2013-05-27 Thread Baruch Even
On Mon, May 27, 2013 at 11:41 PM, James Bottomley james.bottom...@hansenpartnership.com wrote: On Mon, 2013-05-27 at 16:39 +0200, Hannes Reinecke wrote: - LLDDs typically won't return a command status even for a command which has been aborted via ABORT TASK TMF. So the midlayer probably

Re: SCSI error handling -- one error blocks the whole SCSI host

2013-05-26 Thread James Bottomley
On Thu, 2013-05-23 at 11:14 -0700, Roland Dreier wrote: At LSF this year, we had a discussion about error handling and in particular the problem that SCSI midlayer error handling waits for the entire SCSI host (HBA) to quiesce before it starts to abort commands etc. James made the

Re: SCSI error handling -- one error blocks the whole SCSI host

2013-05-25 Thread James Smart
Roland, I agree, and am already working around that limitation. -- james s On 5/23/2013 2:14 PM, Roland Dreier wrote: At LSF this year, we had a discussion about error handling and in particular the problem that SCSI midlayer error handling waits for the entire SCSI host (HBA) to quiesce

SCSI error handling -- one error blocks the whole SCSI host

2013-05-23 Thread Roland Dreier
At LSF this year, we had a discussion about error handling and in particular the problem that SCSI midlayer error handling waits for the entire SCSI host (HBA) to quiesce before it starts to abort commands etc. James made the suggestion that FC should handle things the way SAS does, because SAS

Re: SCSI error handling -- one error blocks the whole SCSI host

2013-05-23 Thread Jack Wang
James, am I understanding your suggestion properly? If so can you explain what you meant about the libsas code -- I see that it has its own strategy handler but as I said before we've already stopped every device attached to the HBA before we ever get there. To recapitulate the problem