2015-06-18 8:57 GMT+05:30 Nick Holland <n...@holland-consulting.net>:

> > What is then proper behavior for a program or system using an SSD, to
> > deal with SSD degradation?
>
> replace drive before it is an issue.
>
> > So say you have a program altering a file's contents all the time, or you
> > have file turnover on a system (rm f123; echo importantdata > f124). At
> > some point the SSD will shrink and down the line reach zero capacity.
>
> That's not how it works.
>
> The SSD has some number of spare storage blocks.  When it finds a bad
> block, it locks out the bad block and swaps in a good block.
>
..

> Neither SSDs nor magnetic disks "shrink" to the outside world.  The
> moment they need a replacement block that doesn't exist, the disk has
> lost data for you and you should call it failed...it has not "shrunk".
>
> Now, in both cases, this is assuming the drive fails in the way you
> expect -- that the "flaw" will be spotted on immediate read-after-write,
> while the data is still in the disk's cache or buffer.  There is more
> than one way magnetic disks fail, there's more than one way SSDs fail.
> People tend to hyperventilate over the one way and forget all the rest.
>
> Run your SSDs in production servers for two or three years, then swap
> them out.



Thank you very much for your remarks.

In particular for the distinctions that, for production purposes, a disk is
either good or bad, and that SSD failure can surface in any nasty way
imaginable.

I guess in a system where you already have some kind of data duplication
between disks, and for economic reasons you want to use the SSD until it
fails, you would have bad-disk detection trigger on any read or write IO
error, both directly reported (fopen/fwrite/fread/fclose misbehavior) and
appearing indirectly through an IO operation taking more time than usual ..
and perhaps a checksum mismatch?
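
To make the idea concrete, here is a minimal sketch of such a check, in
Python for brevity. The names checked_write, verify and DiskSuspect, and the
5-second latency limit, are all hypothetical choices for illustration, not
anything an existing tool provides. Note that the immediate read-back is
most likely served from the OS buffer cache, so it mainly catches errors
reported on the write path; silent corruption would only show up when
verify is run again later against the stored digest.

```python
import hashlib
import os
import time


class DiskSuspect(Exception):
    """Raised when a write/read-back cycle suggests the disk is failing."""


def checked_write(path, data, max_seconds=5.0):
    """Write data, force it to disk, read it back and compare checksums.

    max_seconds is an arbitrary (assumed) latency limit; tune it per system.
    Returns the SHA-256 hex digest so it can be stored for later checks.
    """
    expected = hashlib.sha256(data).hexdigest()
    start = time.monotonic()
    try:
        with open(path, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())  # ask the OS to push the data to the device
        with open(path, "rb") as f:
            actual = hashlib.sha256(f.read()).hexdigest()
    except OSError as e:
        # Directly reported I/O error: treat the disk as suspect.
        raise DiskSuspect(f"I/O error on {path}: {e}") from e
    elapsed = time.monotonic() - start
    if actual != expected:
        raise DiskSuspect(f"checksum mismatch on {path}")
    if elapsed > max_seconds:
        # Indirect symptom: the operation took far longer than usual.
        raise DiskSuspect(f"slow I/O on {path}: {elapsed:.1f}s")
    return expected


def verify(path, expected_digest):
    """Re-read a file later and check it against its stored digest."""
    try:
        with open(path, "rb") as f:
            return hashlib.sha256(f.read()).hexdigest() == expected_digest
    except OSError:
        return False
```

A caller would store the digest returned by checked_write("f124", data)
somewhere off the suspect disk, and on DiskSuspect or a failed verify fall
back to the duplicate copy and flag the drive for replacement.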

Could a broken SSD corrupt binaries without UFS checksumming or application
executable checksumming preventing the program from starting at all, so that
random SIGSEGVs count as a broken disk too? What checksumming do UFS and
executables have, and how reliable is it?
