-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Eberhard Roloff wrote:
> Randall R Schulz wrote:
>> On Friday 09 November 2007 09:43, Robert Smits wrote:
>>> On Friday 09 November 2007 01:12:31 G T Smith wrote:
>>>> Robert Smits information rather confirms what I have suspected for
>>>> some time about how one should assess a S.M.A.R.T report,
>>>> unfortunately Robert did not give a link for the paper he referred
>>>> to. I would be interested to have a look at it ...
>>> Happy to oblige.....
>>>
>>> http://209.85.163.132/papers/disk_failures.pdf
>> Thanks.
>>
>> Oddly enough, when I went to download that into my publications
>> directory, I discovered I already had a copy that I downloaded back in
>> February and which is byte-for-byte identical.
>>
>>
>
> While this study is great, one should not forget that the google usage
> environment of hundreds of thousands disks is not directly comparable to
> what most people do at work or at home.
>
> I.e. most people do not work in air-conditioned data centers and most
> desktops do not run 24x7.
>
> So while the google paper is certainly informative and a rare beast in
> regard to the observation of a very large population of commodity
> harddisks, I would not dare to use any of it's conclusions lightly for
> my home usage pattern.
>
> regards
> Eberhard
>
Thanks Rob for the link...

The paper is extremely useful but possibly flawed.

Eberhard may have missed a couple of points that are probably relevant
to home usage. The most important is that there seems to be a slight
increase in failure rate if the drive has light usage. The other is the
failure pattern, with the quaintly labelled 'infant mortality', in which
drives are more likely to fail early in use or when the drive is getting
on a bit (but the latter is more of a confirmation of what I expect most
of us know already).

The most difficult problem with the paper is the definition of failure.
S.M.A.R.T. mainly reports on the media access status, not so much on the
reliability of the electronics controlling that media. As one of the
most significant events in recent times was a recall of a large number
of a particular manufacturer's drives due to poor quality of the latter,
the failure to distinguish between media failure and electronic failure
is problematic. As this is a difficult problem to handle one cannot lay
fault at the authors for this, but one does need to take it into
consideration when considering their results.

Four parameters are identified as being critical, but the concentration
on annualized failure rate without analysis of mean time to failure
weakens the analysis somewhat. There is also an issue in that they
report on survival rates after the first event but do not report on
secondary events. Survival rates of the drive if there were no
subsequent failure reports would have been useful. I would make a
similar observation on the various sector error counts that they
examine. The mean time to failure statistic is also possibly more useful
to those dealing with a small quantity of drives.

I think the most interesting part is the conclusion that the S.M.A.R.T.
indicators are, on their own, probably nearly useless in predicting the
failure or survival of an individual drive, and that there are really
only four values one should take notice of.
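
To watch just a handful of values rather than the whole report, something
like the following could compare two snapshots and flag any counter that
has gone up. This is only a minimal sketch in Python. The counter names
are hypothetical labels (my reading of which four values the paper means,
not anything a particular S.M.A.R.T. tool prints), and the snapshot
numbers are made up for illustration.

```python
# Hypothetical counter names; map them to whatever your S.M.A.R.T. tool
# actually reports. They are assumptions, not real attribute identifiers.
WATCHED = (
    "scan_errors",
    "reallocated_sectors",
    "offline_reallocated_sectors",
    "pending_sectors",
)


def changed_counters(previous, current, watched=WATCHED):
    """Return {name: (old, new)} for every watched counter that increased.

    Counters missing from a snapshot are treated as zero.
    """
    changes = {}
    for name in watched:
        old = previous.get(name, 0)
        new = current.get(name, 0)
        if new > old:
            changes[name] = (old, new)
    return changes


# Two made-up snapshots of the same drive, taken a week apart.
last_week = {"scan_errors": 0, "reallocated_sectors": 3, "pending_sectors": 0}
today = {"scan_errors": 1, "reallocated_sectors": 3, "pending_sectors": 0}

for name, (old, new) in changed_counters(last_week, today).items():
    print(f"{name}: {old} -> {new}")
```

A non-zero but stable count (the three reallocated sectors here) is not
flagged; only a change between snapshots is.
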
(This does not mean do not use S.M.A.R.T.; it means take S.M.A.R.T. for
what it is, a useful tool for flagging a potential problem.)

If you are seeing a S.M.A.R.T. error but the file systems on the drive
pass all integrity tests, there is a fairly good chance this is a false
(or non-critical) positive, but one should monitor the situation and if
the values change adversely take appropriate action (in other words
DON'T PANIC!).

My conclusion is that this only emphasises the need for a good backup
strategy, preferably with two independent approaches if one feels that
paranoid. Also, for a good guarantee of the integrity of the data you
wish to back up, invest in at least dual-drive RAID 1 to ensure what you
back up is not affected by hardware issues. (No guarantee against
software SNAFUs of course.)

Of the S.M.A.R.T. reports, a scan error is probably the one of most
concern. Drives with sector allocation related errors could probably
still be used, if the values do not change, for non-critical testing or
in configurations such as RAID where there is some redundancy.

- --
==============================================================================
I have always wished that my computer would be as easy to use as my
telephone.
My wish has come true. I no longer know how to use my telephone.

Bjarne Stroustrup
==============================================================================
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.5 (GNU/Linux)
Comment: Using GnuPG with SUSE - http://enigmail.mozdev.org

iD8DBQFHNYSUasN0sSnLmgIRAoDEAKDZZoyrog1irAGP7NB/ZUB/zDp6wgCfeDlL
DNhJ2hGbqSNBbZGosXekqU8=
=Jgu7
-----END PGP SIGNATURE-----
