Steven Alligood wrote:
> On 02/09/2010 07:52 PM, Mike Lovell wrote:
>> does anyone have good recommendations as to some tools or utilities to
>> use for exercising or burning in new hard disks? where i work, we buy *a
>> lot* of disks and currently use a utility called thrash [1]. we just
>> have it do a couple million random writes to the disk. but i could use
>> some other tools to test the disks in different ways as well to get a
>> better idea if the disk is going to hold up. i've thought about using
>> bonnie++ or iozone as well. what you any of you use, if anything? thx.
>>
>> mike
>>
>> [1] http://www.csc.liv.ac.uk/~greg/thrash/
>>
>
> Just exactly how many disks do you find that fail with that method,
> and do you end up with less disks failing and needing replacement in
> the first few months of production versus more disks failing at one
> year, two years, etc?
>
> I guess I am asking why you bother to waste man hours thrashing the
> poor disks and removing potential life from them rather than just
> making sure they are all in good RAID sets and replacing them as they
> fail (hot spares and man-hours to replace rather than test)?
>
> My company deploys more than 100 new drives per week, and the testing
> alone would be much more time consuming to find the very few bad
> drives in testing versus replacing them as they fail in those first
> few weeks. Add to that the fact that the testing may reduce the life
> sufficiently that you have more failures at the one and two year
> points, and it seems a waste to test like that.
>
> I am always open to better ways of doing things, so please, if you
> find the thrashing helps, I would love to hear the results.
>
> -Steve

for one, these aren't going into RAID sets. they are used individually.

the thought is that we do some burn-in testing up front to get rid of the disks that would die soon after going into use: disk infant mortality. it is based on the idea that the mortality rates of disks follow a bathtub curve. a lot of disks fail at the beginning, fewer fail during most of the life of the disk, and then failure rates increase again near the end of the life span. we want to do the burn-in to get to the low point on that curve. but that does assume disks follow a bathtub curve, and i don't know whether ours actually do.

we also do more than 100 disks per week. i don't have numbers on how many we replace due to them dying quickly because i have been away from the actual deployment for a while. recently, the question has come up of whether we need to do more burn-in testing on the disks, so i thought i would ask the group.

mike
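
to make the bathtub curve idea concrete, here is a minimal sketch that models the failure rate as the sum of three terms: a decreasing one (infant mortality), a constant one (random failures), and an increasing one (wear-out). the function names and every parameter value are assumptions picked only to produce the bathtub shape. they are not fitted to our disks or to any real failure data.

```python
def weibull_hazard(t, shape, scale):
    """Weibull hazard rate at age t (hours, t > 0): (k/s) * (t/s)**(k-1)."""
    return (shape / scale) * (t / scale) ** (shape - 1)


def bathtub_hazard(t):
    """Illustrative bathtub-shaped failure rate at age t hours.

    All parameters below are made up for illustration, not measured.
    """
    # shape < 1 gives a falling hazard: early "infant mortality" failures
    infant = weibull_hazard(t, shape=0.5, scale=2000.0)
    # constant background rate of random failures during normal life
    background = 1.0 / 50000.0
    # shape > 1 gives a rising hazard: wear-out near end of life
    wearout = weibull_hazard(t, shape=5.0, scale=40000.0)
    return infant + background + wearout


if __name__ == "__main__":
    # compare the failure rate when new, after a week of burn-in,
    # in mid-life, and late in life
    for hours in (10, 168, 10000, 50000):
        print(hours, bathtub_hazard(hours))
```

under these assumed numbers the rate is highest for a brand-new disk, drops through the burn-in period, bottoms out in mid-life, and rises again late in life. whether burn-in pays off depends on how steep the early part of the curve really is for a given batch of disks, which is exactly the data i don't have.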
