On 08/30/2012 04:08 PM, Nomen Nescio wrote:
>>> Hi. I have a spare off the shelf consumer PC and was thinking about loading
>>> Solaris on it for a development box since I use Studio @work and like it
>>> better than gcc. I was thinking maybe it isn't so smart to use ZFS since it
>>> has only one drive. If ZFS detects something bad it might kernel panic and
>>> lose the whole system right? I realize UFS /might/ be ignorant of any
>>> corruption but it might be more usable and go happily on its way without
>>> noticing? Except then I have to size all the partitions and lose out on
>>> compression etc. Any suggestions thankfully received.
>>
>> Suppose you start getting checksum errors.  Then you *do* want to notice.
> 
> I'm not convinced. I understand the theoretical value of ZFS but it
> introduces a whole new layer of problems other filesystems don't have. Even
> if it's right in theory it doesn't always make things better in reality. I
> like the features it provides and not having to size filesystems like in
> the old days is great, but ZFS can and does have bugs and like anything else
> is not perfect. Aside from Microsoft which used to be guaranteed to corrupt
> filesystems I haven't ever had corruption that caused me any problems.
> Certainly there must have been corruptions because of software bugs and
> crappy hardware but they had no visible effect and that is good enough for
> me in this situation I asked about. I feel this issue is a little overblown
> given most of the world runs on other enterprise filesystems and the world
> hasn't come to an end yet. ZFS is an important step in the right direction
> but it doesn't mean you can't live without its error detection. We lived
> without it until now. What I find hard to live without is the management
> features it gives you which is why I have a dilemma.

1) Anecdotal evidence is nearly worthless in matters of technology.

2) Data corruption does happen, and HDD manufacturers can even pin a
   number to it (the typical bit error rate on modern HDDs is around
   10^-14, i.e. one bit error per ~10TB transferred). That it didn't
   hit your sensitive data but only some random pixel in an MPEG movie
   is good for you. But ZFS was built to handle environments where all
   data is critically important.

3) Data corruption also happens in-transit on the SATA/SAS buses and
   in memory (that's why there is such a thing as ECC memory).

4) If it so bothers you, simply set checksum=off and fly without the
   parachute (a single core of a modern CPU can checksum at a rate
   upwards of 4GB/s, but if the few CPU cycles are so important to you,
   turn it off).
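
To put a number on point 2, here is a rough sketch of the arithmetic
that turns a quoted bit error rate into "data transferred per expected
bit error". The function names are mine and purely illustrative, and the
rates are spec-sheet style figures, not measurements. With these
assumptions a rate of 10^-14 works out to roughly 12.5TB per bit error,
and 10^-13 to roughly 1.25TB.

```python
def terabytes_per_bit_error(bit_error_rate: float) -> float:
    """Average data volume (in decimal TB) read per expected bit error."""
    bits_between_errors = 1.0 / bit_error_rate
    bytes_between_errors = bits_between_errors / 8.0
    return bytes_between_errors / 1e12


def expected_bit_errors(bytes_transferred: float, bit_error_rate: float) -> float:
    """Expected number of flipped bits after moving the given number of bytes."""
    return bytes_transferred * 8.0 * bit_error_rate


if __name__ == "__main__":
    for rate in (1e-13, 1e-14, 1e-15):
        tb = terabytes_per_bit_error(rate)
        print(f"BER {rate:g}: about {tb:g} TB per expected bit error")
```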

> In this specific use case I would rather have a system that's still bootable
> and runs as best it can than an unbootable system that has detected an
> integrity problem especially at this point in ZFS's life. If ZFS would not
> panic the kernel and give the option to fail or mark file(s) bad, I would
> like it more. 

ZFS doesn't panic in case of an unrecoverable single-block error; it
simply returns an I/O error to the calling application. A panic *can*
only take place in case of a catastrophic pool failure, and even then it
isn't the default. See the zpool(1M) man page for the description of the
"failmode" property.
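
If you'd rather script this than type it, below is a minimal sketch that
builds the "zpool get" and "zpool set" command lines for the failmode
property. The helper names and the pool name "rpool" are assumptions for
illustration only; the three accepted values (wait, continue, panic) are
the ones listed in zpool(1M). Actually running the commands needs a
system with ZFS and sufficient privileges.

```python
import subprocess
from typing import List

# Values documented for the pool-level "failmode" property.
VALID_FAILMODES = ("wait", "continue", "panic")


def failmode_get_command(pool: str) -> List[str]:
    """Command line that shows the current failmode of a pool."""
    return ["zpool", "get", "failmode", pool]


def failmode_set_command(pool: str, mode: str) -> List[str]:
    """Command line that changes the failmode of a pool."""
    if mode not in VALID_FAILMODES:
        raise ValueError(f"failmode must be one of {VALID_FAILMODES}, got {mode!r}")
    return ["zpool", "set", f"failmode={mode}", pool]


def run(command: List[str]) -> str:
    """Run a command and return its standard output (needs ZFS installed)."""
    result = subprocess.run(command, check=True, capture_output=True, text=True)
    return result.stdout


if __name__ == "__main__":
    # "rpool" is a hypothetical pool name; substitute your own.
    print(" ".join(failmode_get_command("rpool")))
    print(" ".join(failmode_set_command("rpool", "continue")))
```

With failmode=continue, new writes to a failed pool get EIO instead of
blocking, which may be closer to the "keep running as best it can"
behaviour you describe.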

> But having the ability to manage the disk with one pool and the other nice
> features like compression plus the fact it works nicely on good hardware
> make it hard to go back once you made the jump. Choices, choices.

So you want to enable compression (which is a huge CPU hog) and worry
about checksumming (which is tiny in comparison)? If you're compressing
data, you've got all the more reason to enable checksumming, since
compression tends to make all data corruption much, much worse (e.g.
that's why a single-bit error in a compressed MPEG stream doesn't simply
slightly alter the color of a single pixel, but typically instead
results in a whole macroblock or row of macroblocks messing up completely).
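
The same effect is easy to reproduce outside of video. The sketch below
flips a single bit in a plain buffer and in a zlib-compressed copy of
it, then counts how many bytes of the original are affected. It uses
Python's zlib purely as a stand-in (it is not the ZFS compression code
path), and the helper names and sample data are made up for the
illustration. A stream that zlib refuses to decompress is counted as a
total loss.

```python
import zlib


def flip_bit(data: bytes, bit_index: int) -> bytes:
    """Return a copy of data with exactly one bit inverted."""
    buf = bytearray(data)
    buf[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(buf)


def damaged_bytes(original: bytes, received: bytes) -> int:
    """Count byte positions that differ, plus any length mismatch."""
    differing = sum(a != b for a, b in zip(original, received))
    return differing + abs(len(original) - len(received))


def damage_uncompressed(original: bytes, bit_index: int) -> int:
    """Damage caused by one flipped bit in the raw data."""
    return damaged_bytes(original, flip_bit(original, bit_index))


def damage_compressed(original: bytes, bit_index: int) -> int:
    """Damage caused by one flipped bit in the compressed stream."""
    corrupted = flip_bit(zlib.compress(original), bit_index)
    try:
        recovered = zlib.decompress(corrupted)
    except zlib.error:
        # The stream was rejected, so nothing of the original comes back.
        return len(original)
    return damaged_bytes(original, recovered)


if __name__ == "__main__":
    text = "".join(f"record {i}: value={i * i}\n" for i in range(2000)).encode()
    print("bytes in sample:           ", len(text))
    print("damaged, uncompressed flip:", damage_uncompressed(text, 4000))
    print("damaged, compressed flip:  ", damage_compressed(text, 4000))
```

The uncompressed flip always damages exactly one byte; the compressed
flip typically takes out far more, which is the point about wanting
checksums underneath compressed data.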

>>> Even if your system does crash, at least you now have an opportunity to
>>> recognize there is a problem, and think about your backups, rather than
>>> allowing the corruption to proliferate. 
> 
> This isn't a production box as I said it's an unused PC with a single drive,
> and I don't have anybody's bank accounts on it. I can rsync whatever I work
> on that day to a backup server. It won't be a disaster if UFS suddenly
> becomes unreliable and I lose a file or two, or if a drive fails, but it
> would be very annoying if ZFS barfed on a technicality and I had to
> reinstall the whole OS because of a kernel panic and an unbootable system.

As noted before, simple checksum errors won't panic your box, and
neither will a catastrophic pool failure (the default is failmode=wait).
You have to explicitly tell ZFS that you want it to panic your system in
this situation.

Cheers,
--
Saso
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
