Re: [PATCH v2] scsi: handle special return codes for ABORTED COMMAND

2018-02-26 Thread Martin K. Petersen

Doug,

> I checked the code for the ABORTED COMMAND asc=0x10 case. That is a
> bunch of Protection Information errors which retrying would be
> pointless. And the existing code does check for that just before the
> new check for BLIST_ABORTED_CMD_QUIRK. The PI check has the cryptic
> comment /* DIF */ and we also have 'T10 DIF' appearing in
> defines. 'DIF' is mentioned _once_ in the T10 drafts thatI can find
> (in sbc4r15.pdf the last sentence of section 4.22.1) and is not
> cross-referenced anywhere that I can find.
>
> So anyway, Martin (P.), could we have and explanation somewhere
> prominent in our code what DIF and DIX mean in SCSI (and other
> standards) terms?

At the time this code was written, the T10 Protection Information Model
was called Data Integrity Field or DIF. DIF turned out to clash with the
three letter acronym for another T10 feature in development. As a
result, the integrity field was re-branded protection information.

However, the new name didn't really take off. And despite the spec being
consistent in its use of T10 PIM/PI, people in the industry generally
still refer to it as "DIF". Easier to pronounce and not much risk of
confusion (to my knowledge, the other "DIF" in T10 never really came to
fruition).

You'll note that the more recent integrity profile stuff is located in
block/t10-pi.c. But that the profile strings still use "DIF" to avoid
breaking existing apps.

I have been contemplating renaming sd_dif.c to sd_dix.c since all the
DIF/PI stuff has moved to sd.c over the years. The only thing left in
sd_dif.c at this point is DIX functionality...

-- 
Martin K. Petersen  Oracle Linux Engineering


Re: [PATCH v2] scsi: handle special return codes for ABORTED COMMAND

2018-02-26 Thread Martin K. Petersen

Martin,

> Introduce a new blist flag that indicates the device may return certain
> sense code/ASC/ASCQ combinations that indicate different treatment than
> normal. In particular, some devices need unconditional retry (aka
> ADD_TO_MLQUEUE) under certain conditions; otherwise path failures may be
> falsely detected in the "maybe_retry" case.

I'm still not convinced of the merits of all this complexity to save one
bit in a per-device data structure.

Introducing a BLIST_RETRY_ITF entry for 44/xx might be generally useful
for devices other than Symmetrix. So let's just do that.

As far as the vendor-specific Fujitsu ASC/ASCQ combo, I'm also OK with a
quirk flag. If we end up with many devices that needs vendor-specific
treatment we can revisit the bigger patch. But for now I think it's an
overkill.

Only caveat is that I think you need to tweak the blist_flags_t stuff to
accommodate the extra bits...

-- 
Martin K. Petersen  Oracle Linux Engineering


Re: [PATCH v2] scsi: handle special return codes for ABORTED COMMAND

2018-02-23 Thread Douglas Gilbert

On 2018-02-23 08:33 AM, Martin Wilck wrote:

Gentle reminder - reviews welcome ...

On Tue, 2018-01-30 at 11:25 +0100, Martin Wilck wrote:

Introduce a new blist flag that indicates the device may return
certain
sense code/ASC/ASCQ combinations that indicate different treatment
than
normal. In particular, some devices need unconditional retry (aka
ADD_TO_MLQUEUE) under certain conditions; otherwise path failures may
be
falsely detected in the "maybe_retry" case.

Because different devices use different sense codes to indicate this
condition, a single bit is not enough. But we also can't use a bit
for
every possible status code. Therefore the new flag
BLIST_ABORTED_CMD_QUIRK
just indicates that this is a device for which the return code should
be
checked. An extra "blacklist" in scsi_aborted_cmd_quirk() then checks
for
known ASC/ASCQ combinations, and handles them.

When such a combination is encountered for the first time, a message
is
printed. In systems that have several different peripherals using
this
flag, that might lead to a wrong match without a warning. This small
risk
is a compromise between exactness and the excessive resources that
would be
required to check for matching device vendor and model name every
time.
Also, if there were different devices (identified by vendor/model)
using
the same ASC/ASCQ, the code might print stray warnings. But the
additional
effort to required to handle this 100% correctly is hardly justified
with
the current minimal "blacklist".

I chose to recycle devinfo flag (1<<11) (former BLIST_INQUIRY_58,
free
since 496c91bbc910) for this purpose rather than extending
blist_flags_t to
64 bit. This could be easily changed of course.

This patch replaces the previously submitted patches
  "scsi: handle ABORTED_COMMAND on Fujitsu ETERNUS" and
  "scsi: Always retry internal target error" (Hannes Reinecke)

Changes in v2:
  - use ARRAY_SIZE (Bart van Assche)
  - make blist array static const and use separate bitmask for warned
flag
(Bart van Assche)
  - Fix string comparison for SCSI vendor and model
  - Print warning at scsi_logging_level 0, and improve message
formatting

Signed-off-by: Martin Wilck 


I checked the code for the ABORTED COMMAND asc=0x10 case. That is a bunch
of Protection Information errors which retrying would be pointless. And
the existing code does check for that just before the new check for
BLIST_ABORTED_CMD_QUIRK. The PI check has the cryptic comment /* DIF */
and we also have 'T10 DIF' appearing in defines. 'DIF' is mentioned _once_
in the T10 drafts thatI can find (in sbc4r15.pdf the last sentence of
section 4.22.1) and is not cross-referenced anywhere that I can find.

So anyway, Martin (P.), could we have and explanation somewhere prominent in
our code what DIF and DIX mean in SCSI (and other standards) terms?

But that is not this patch's problem, so:

Reviewed-by: Douglas Gilbert 




Re: [PATCH v2] scsi: handle special return codes for ABORTED COMMAND

2018-02-23 Thread Bart Van Assche
On Tue, 2018-01-30 at 11:25 +0100, Martin Wilck wrote:
> Introduce a new blist flag that indicates the device may return certain
> sense code/ASC/ASCQ combinations that indicate different treatment than
> normal. In particular, some devices need unconditional retry (aka
> ADD_TO_MLQUEUE) under certain conditions; otherwise path failures may be
> falsely detected in the "maybe_retry" case.

Reviewed-by: Bart Van Assche 





Re: [PATCH v2] scsi: handle special return codes for ABORTED COMMAND

2018-02-23 Thread Martin Wilck
Gentle reminder - reviews welcome ...

On Tue, 2018-01-30 at 11:25 +0100, Martin Wilck wrote:
> Introduce a new blist flag that indicates the device may return
> certain
> sense code/ASC/ASCQ combinations that indicate different treatment
> than
> normal. In particular, some devices need unconditional retry (aka
> ADD_TO_MLQUEUE) under certain conditions; otherwise path failures may
> be
> falsely detected in the "maybe_retry" case.
> 
> Because different devices use different sense codes to indicate this
> condition, a single bit is not enough. But we also can't use a bit
> for
> every possible status code. Therefore the new flag
> BLIST_ABORTED_CMD_QUIRK
> just indicates that this is a device for which the return code should
> be
> checked. An extra "blacklist" in scsi_aborted_cmd_quirk() then checks
> for
> known ASC/ASCQ combinations, and handles them.
> 
> When such a combination is encountered for the first time, a message
> is
> printed. In systems that have several different peripherals using
> this
> flag, that might lead to a wrong match without a warning. This small
> risk
> is a compromise between exactness and the excessive resources that
> would be
> required to check for matching device vendor and model name every
> time.
> Also, if there were different devices (identified by vendor/model)
> using
> the same ASC/ASCQ, the code might print stray warnings. But the
> additional
> effort to required to handle this 100% correctly is hardly justified
> with
> the current minimal "blacklist".
> 
> I chose to recycle devinfo flag (1<<11) (former BLIST_INQUIRY_58,
> free
> since 496c91bbc910) for this purpose rather than extending
> blist_flags_t to
> 64 bit. This could be easily changed of course.
> 
> This patch replaces the previously submitted patches
>  "scsi: handle ABORTED_COMMAND on Fujitsu ETERNUS" and
>  "scsi: Always retry internal target error" (Hannes Reinecke)
> 
> Changes in v2:
>  - use ARRAY_SIZE (Bart van Assche)
>  - make blist array static const and use separate bitmask for warned
> flag
>(Bart van Assche)
>  - Fix string comparison for SCSI vendor and model
>  - Print warning at scsi_logging_level 0, and improve message
> formatting
> 
> Signed-off-by: Martin Wilck 
> ---
>  drivers/scsi/scsi_devinfo.c |   4 +-
>  drivers/scsi/scsi_error.c   | 111
> 
>  include/scsi/scsi_devinfo.h |   6 +++
>  3 files changed, 120 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/scsi/scsi_devinfo.c
> b/drivers/scsi/scsi_devinfo.c
> index dfb8da83fa50..39455734d934 100644
> --- a/drivers/scsi/scsi_devinfo.c
> +++ b/drivers/scsi/scsi_devinfo.c
> @@ -161,12 +161,14 @@ static struct {
>   {"DGC", "RAID", NULL, BLIST_SPARSELUN}, /* Dell PV
> 650F, storage on LUN 0 */
>   {"DGC", "DISK", NULL, BLIST_SPARSELUN}, /* Dell PV
> 650F, no storage on LUN 0 */
>   {"EMC",  "Invista", "*", BLIST_SPARSELUN | BLIST_LARGELUN},
> - {"EMC", "SYMMETRIX", NULL, BLIST_SPARSELUN | BLIST_LARGELUN
> | BLIST_REPORTLUN2},
> + {"EMC", "SYMMETRIX", NULL, BLIST_SPARSELUN | BLIST_LARGELUN
> +  | BLIST_REPORTLUN2 | BLIST_ABORTED_CMD_QUIRK},
>   {"EMULEX", "MD21/S2 ESDI", NULL, BLIST_SINGLELUN},
>   {"easyRAID", "16P", NULL, BLIST_NOREPORTLUN},
>   {"easyRAID", "X6P", NULL, BLIST_NOREPORTLUN},
>   {"easyRAID", "F8", NULL, BLIST_NOREPORTLUN},
>   {"FSC", "CentricStor", "*", BLIST_SPARSELUN |
> BLIST_LARGELUN},
> + {"FUJITSU", "ETERNUS_DXM", "*", BLIST_ABORTED_CMD_QUIRK},
>   {"Generic", "USB SD Reader", "1.00", BLIST_FORCELUN |
> BLIST_INQUIRY_36},
>   {"Generic", "USB Storage-SMC", "0180", BLIST_FORCELUN |
> BLIST_INQUIRY_36},
>   {"Generic", "USB Storage-SMC", "0207", BLIST_FORCELUN |
> BLIST_INQUIRY_36},
> diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
> index 62b56de38ae8..d60568f71047 100644
> --- a/drivers/scsi/scsi_error.c
> +++ b/drivers/scsi/scsi_error.c
> @@ -26,6 +26,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  #include 
>  #include 
> @@ -39,6 +40,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  #include "scsi_priv.h"
>  #include "scsi_logging.h"
> @@ -432,6 +434,112 @@ static void scsi_report_sense(struct
> scsi_device *sdev,
>   }
>  }
>  
> +struct aborted_cmd_blist {
> + u8 asc;
> + u8 ascq;
> + int retval;
> + const char *vendor;
> + const char *model;
> +};
> +
> +/**
> + * scsi_strcmp - Compare space-padded string with reference string
> + * @device_str:  vendor or model field of struct scsi_device,
> + *   possibly space-padded
> + * @ref_str: reference string to compare with
> + * @len: max size of device_str: 8 for vendor, 16 for model
> + *
> + * Return value:
> + *   -1, 0, or 1, like strcmp().
> + */
> +static int scsi_strcmp(const char *device_str, const char *ref_str,
> int len)
> +{
> + int ref_len = strlen(ref_str);
> + int r, i;
> +
> + WARN_ON(ref_len > len);
> +