Re: [PATCH 1/3] libata: Whitelist SSDs that are known to properly return zeroes after TRIM

2014-12-08 Thread Theodore Ts'o
On Fri, Dec 05, 2014 at 10:58:09PM +, Elliott, Robert (Server Storage) 
wrote:
  I have a Crucial_CT256MX1 (i.e. MX100) and it does reliably zero.
 
 make me concerned about this whitelist approach.
 
 I think you need a manufacturer assertion that this is indeed
 the design intent; you cannot be certain based on observation
 from outside.

How is this different from a manufacturer assertion that they follow a
SCSI or ATA standard?  There have been cases in the distant past
(fortunately) of disk manufacturers that ignored a CACHE FLUSH command
just to get higher Winbench scores.  Does that mean we can't trust
them to do anything right?

What I'd suggest instead is that if a vendor states this on a spec
sheet --- more than just an e-mail assertion --- so they can be sued
if they knowingly misrepresent their product, that we take their word
at it.  Of course, there will be bugs, which is why we have
blacklists, or why we can remove them from the list if it turns out
there are edge conditions where it appears the disk doesn't quite do
the right thing.

After all, we generally take the manufacturer's word that air bags
will work as claimed, even if potentially 11 million of them are
currently subject to recall.  And do we think that the community
would necessarily be more suited than the vendors and the manufacturer
to figure out whether or not air bags or drives are working as
desired?

That being said, if someone wants to create a open source program
which stress tests SSD's to look for cases where it is dropping a
requested discard, that would certainly be a good thing to do...

Cheers,

 - Ted
 
--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/3] libata: Whitelist SSDs that are known to properly return zeroes after TRIM

2014-12-08 Thread James Bottomley
On Mon, 2014-12-08 at 10:15 -0500, Theodore Ts'o wrote:
 On Fri, Dec 05, 2014 at 10:58:09PM +, Elliott, Robert (Server Storage) 
 wrote:
   I have a Crucial_CT256MX1 (i.e. MX100) and it does reliably zero.
  
  make me concerned about this whitelist approach.
  
  I think you need a manufacturer assertion that this is indeed
  the design intent; you cannot be certain based on observation
  from outside.
 
 How is this different from a manufacturer assertion that they follow a
 SCSI or ATA standard?  There have been cases in the distant past
 (fortunately) of disk manufacturers that ignored a CACHE FLUSH command
 just to get higher Winbench scores.  Does that mean we can't trust
 them to do anything right?

That answer depends on device type manufacturer.  USB devices, hell no.
ATA devices, maybe and SCSI devices usually.

The main problem is usually testing.  Consumer devices like USB and
(s)ATA rarely get tested on anything but windows.  USB devices tend to
supply their own driver, so they're on the we fix it in the driver
model which is why they bite us so badly. (S)ATA usually comply, but
they only test what windows exercises, so if windows doesn't do it,
chances are it never got tested.  SCSI devices still tend to be tested
in legacy UNIX environments, which are as diverse as we are.

 What I'd suggest instead is that if a vendor states this on a spec
 sheet --- more than just an e-mail assertion --- so they can be sued
 if they knowingly misrepresent their product, that we take their word
 at it.  Of course, there will be bugs, which is why we have
 blacklists, or why we can remove them from the list if it turns out
 there are edge conditions where it appears the disk doesn't quite do
 the right thing.
 
 After all, we generally take the manufacturer's word that air bags
 will work as claimed, even if potentially 11 million of them are
 currently subject to recall.  And do we think that the community
 would necessarily be more suited than the vendors and the manufacturer
 to figure out whether or not air bags or drives are working as
 desired?
 
 That being said, if someone wants to create a open source program
 which stress tests SSD's to look for cases where it is dropping a
 requested discard, that would certainly be a good thing to do...

The purpose of DRAT and RZAT is to enable disk arrays deterministically
to use TRIM/Unmap so arrays know what happens to stripes on discard.
Arrays are being built of mostly SATA technology these days, so some
manufacturers have retargetted to arrays and consumer technology (and
are testing the array cases).  However, windows doesn't use either
feature, so manufacturers not targetting arrays will never test this
feature.  Hence, in this case, I think a whitelist does make sense.

James


--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/3] libata: Whitelist SSDs that are known to properly return zeroes after TRIM

2014-12-08 Thread One Thousand Gnomes
On Mon, 8 Dec 2014 10:15:59 -0500
Theodore Ts'o ty...@mit.edu wrote:

 On Fri, Dec 05, 2014 at 10:58:09PM +, Elliott, Robert (Server Storage) 
 wrote:
   I have a Crucial_CT256MX1 (i.e. MX100) and it does reliably zero.
  
  make me concerned about this whitelist approach.
  
  I think you need a manufacturer assertion that this is indeed
  the design intent; you cannot be certain based on observation
  from outside.
 
 How is this different from a manufacturer assertion that they follow a
 SCSI or ATA standard?  There have been cases in the distant past
 (fortunately) of disk manufacturers that ignored a CACHE FLUSH command
 just to get higher Winbench scores.  Does that mean we can't trust
 them to do anything right?

At the time they never promised to honour cache flush. The reason it was
became mandatory in the specification was in part so that the vendors
could all force each other to play fair. If its optional then it's
tough..., if they say they meet the standard it's class action 8)

If this is a promise then it ought to be good 

James:  USB devices tend to supply their own driver

has not been true for some years now. Microsoft provide an in-box driver
and vendors have the choice of using that or certifying their own via
WHQL, which is a bit like choosing between free ice cream and banging
your head against a plank cover in nails.

Alan
--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/3] libata: Whitelist SSDs that are known to properly return zeroes after TRIM

2014-12-05 Thread Paolo Bonzini


On 07/11/2014 06:08, Martin K. Petersen wrote:
 The whitelist is only meant as a starting point and is by no means
 comprehensive:
 
- All intel SSD models except for 510
- Micron M5*
- Samsung SSDs
- Seagate SSDs
 
 Signed-off-by: Martin K. Petersen martin.peter...@oracle.com
 ---
  drivers/ata/libata-core.c | 18 ++
  drivers/ata/libata-scsi.c | 10 ++
  include/linux/libata.h|  1 +
  3 files changed, 21 insertions(+), 8 deletions(-)
 
 diff --git a/drivers/ata/libata-core.c b/drivers/ata/libata-core.c
 index c5ba15af87d3..f41f24a8bc21 100644
 --- a/drivers/ata/libata-core.c
 +++ b/drivers/ata/libata-core.c
 @@ -4225,10 +4225,20 @@ static const struct ata_blacklist_entry 
 ata_device_blacklist [] = {
   { PIONEER DVD-RW  DVR-216D,   NULL,   ATA_HORKAGE_NOSETXFER },
  
   /* devices that don't properly handle queued TRIM commands */
 - { Micron_M500*,   NULL,   ATA_HORKAGE_NO_NCQ_TRIM, },
 - { Crucial_CT???M500SSD*,  NULL,   ATA_HORKAGE_NO_NCQ_TRIM, },
 - { Micron_M550*,   NULL,   ATA_HORKAGE_NO_NCQ_TRIM, },
 - { Crucial_CT*M550SSD*,NULL,   ATA_HORKAGE_NO_NCQ_TRIM, },
 + { Micron_M5?0*,   NULL,   ATA_HORKAGE_NO_NCQ_TRIM |
 + ATA_HORKAGE_ZERO_AFTER_TRIM, },
 + { Crucial_CT???M5?0SSD*,  NULL,   ATA_HORKAGE_NO_NCQ_TRIM, },

I have a Crucial_CT256MX1 (i.e. MX100) and it does reliably zero.

BTW. it's the same hardware as the M550, so probably the same set of
quirks should apply to both.

Paolo

 +
 + /*
 +  * DRAT/RZAT are weak guarantees. Explicitly black/whitelist
 +  * SSDs that provide reliable zero after TRIM.
 +  */
 + { INTEL*SSDSC2MH*,NULL,   0, }, /* Blacklist intel 510 */
 + { INTEL*SSD*, NULL,   ATA_HORKAGE_ZERO_AFTER_TRIM, },
 + { SSD*INTEL*, NULL,   ATA_HORKAGE_ZERO_AFTER_TRIM, },
 + { Samsung*SSD*,   NULL,   ATA_HORKAGE_ZERO_AFTER_TRIM, },
 + { SAMSUNG*SSD*,   NULL,   ATA_HORKAGE_ZERO_AFTER_TRIM, },
 + { ST[1248][0248]0[FH]*,   NULL,   ATA_HORKAGE_ZERO_AFTER_TRIM, },
  
   /*
* Some WD SATA-I drives spin up and down erratically when the link
 diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c
 index 0586f66d70fa..deaa6e34ed4d 100644
 --- a/drivers/ata/libata-scsi.c
 +++ b/drivers/ata/libata-scsi.c
 @@ -2515,13 +2515,15 @@ static unsigned int ata_scsiop_read_cap(struct 
 ata_scsi_args *args, u8 *rbuf)
   rbuf[15] = lowest_aligned;
  
   if (ata_id_has_trim(args-id)) {
 - rbuf[14] |= 0x80; /* TPE */
 + rbuf[14] |= 0x80; /* LBPME */
  
 - if (ata_id_has_zero_after_trim(args-id))
 - rbuf[14] |= 0x40; /* TPRZ */
 + if (ata_id_has_zero_after_trim(args-id) 
 + dev-horkage  ATA_HORKAGE_ZERO_AFTER_TRIM) {
 + ata_dev_warn(dev, Enabling 
 discard_zeroes_data\n);
 + rbuf[14] |= 0x40; /* LBPRZ */
 + }
   }
   }
 -
   return 0;
  }
  
 diff --git a/include/linux/libata.h b/include/linux/libata.h
 index bd5fefeaf548..45ac825b8366 100644
 --- a/include/linux/libata.h
 +++ b/include/linux/libata.h
 @@ -421,6 +421,7 @@ enum {
   ATA_HORKAGE_NO_NCQ_TRIM = (1  19),/* don't use queued TRIM */
   ATA_HORKAGE_NOLPM   = (1  20),/* don't use LPM */
   ATA_HORKAGE_WD_BROKEN_LPM = (1  21),  /* some WDs have broken LPM */
 + ATA_HORKAGE_ZERO_AFTER_TRIM = (1  22),/* guarantees zero after trim */
  
/* DMA mask for user DMA control: User visible values; DO NOT
   renumber */
 
--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH 1/3] libata: Whitelist SSDs that are known to properly return zeroes after TRIM

2014-12-05 Thread Elliott, Robert (Server Storage)


 -Original Message-
 From: linux-scsi-ow...@vger.kernel.org [mailto:linux-scsi-
 ow...@vger.kernel.org] On Behalf Of Martin K. Petersen
 Sent: Thursday, 06 November, 2014 11:08 PM
 To: linux-scsi@vger.kernel.org; linux-...@vger.kernel.org; linux-
 fsde...@vger.kernel.org; ne...@suse.de
 Cc: Martin K. Petersen
 Subject: [PATCH 1/3] libata: Whitelist SSDs that are known to properly return
 zeroes after TRIM
 
 As defined, the DRAT (Deterministic Read After Trim) and RZAT (Return
 Zero After Trim) flags in the ATA Command Set are unreliable in the
 sense that they only define what happens if the device successfully
 executed the DSM TRIM command. TRIM is only advisory, however, and the
 device is free to silently ignore all or parts of the request.
 
 In practice this renders the DRAT and RZAT flags completely useless and
 because the results are unpredictable we decided to disable discard in
 MD for 3.18 to avoid the risk of data corruption.
 
 Hardware vendors in the real world obviously need better guarantees than
 what the standards bodies provide. Unfortuntely those guarantees are
 encoded in product requirements documents rather than somewhere we can
 key off of them programatically. So we are compelled to disabling
 discard_zeroes_data for all devices unless we explicitly have data to
 support whitelisting them.
 
 This patch whitelists SSDs from a few of the main vendors. None of the
 whitelists are based on written guarantees. They are purely based on
 empirical evidence collected from internal and external users that have
 tested or qualified these drives in RAID deployments.
 
 The whitelist is only meant as a starting point and is by no means
 comprehensive:
 
- All intel SSD models except for 510
- Micron M5*
- Samsung SSDs
- Seagate SSDs
 

That description and Paolo's reply:
 From: linux-scsi-ow...@vger.kernel.org [mailto:linux-scsi-
 ow...@vger.kernel.org] On Behalf Of Paolo Bonzini
 Sent: Friday, 05 December, 2014 10:45 AM
...
 I have a Crucial_CT256MX1 (i.e. MX100) and it does reliably zero.

make me concerned about this whitelist approach.

I think you need a manufacturer assertion that this is indeed
the design intent; you cannot be certain based on observation
from outside.

Since the SCSI and ATA standards allow ignoring the hint, it 
might be honored most of the time, but ignored in some rare
cases (e.g., drive firmware has a malloc() failure that only
happens when the drive is handling an overtemperature
condition and six other problems at the same time).

Maybe there should be two levels:
* vendor asserts the drive is designed to always honor the hint
* community observes the drive always seems to honor the hint

and a sysfs flag for users to select the level at which 
they feel safe.

A user running 3 replicas of the data in different sites 
might be more trusting than a user for which this is the 
only copy of the data.

---
Rob ElliottHP Server Storage




--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/3] libata: Whitelist SSDs that are known to properly return zeroes after TRIM

2014-11-07 Thread Christoph Hellwig
On Fri, Nov 07, 2014 at 12:08:12AM -0500, Martin K. Petersen wrote:
   if (ata_id_has_trim(args-id)) {
 - rbuf[14] |= 0x80; /* TPE */
 + rbuf[14] |= 0x80; /* LBPME */
  
 - if (ata_id_has_zero_after_trim(args-id))
 - rbuf[14] |= 0x40; /* TPRZ */
 + if (ata_id_has_zero_after_trim(args-id) 
 + dev-horkage  ATA_HORKAGE_ZERO_AFTER_TRIM) {
 + ata_dev_warn(dev, Enabling 
 discard_zeroes_data\n);

I think this should _info, not _warn.

Otherwise looks good to me,

Reviewed-by: Christoph Hellwig h...@lst.de

It would be nice if there was a way to trigger the flag from userspace,
so that we don't need to rebuild the kernel to add a whitelist entry.
--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/3] libata: Whitelist SSDs that are known to properly return zeroes after TRIM

2014-11-07 Thread Martin K. Petersen
 Christoph == Christoph Hellwig h...@infradead.org writes:

 + ata_dev_warn(dev, Enabling discard_zeroes_data\n);

Christoph I think this should _info, not _warn.

Fixed.

-- 
Martin K. Petersen  Oracle Linux Engineering
--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html