I asked what I thought was a simple question, but most of the answers don't
have much to do with it. Now it seems to have become an argument that
your filesystem is better than any other filesystem. I don't think it is,
because I have seen the horror stories lurking on this list. I had no
intention of getting into this and I think you should have none
either. I like ZFS, I use it at work, and I am not here to knock it.

> 1) Anecdotal evidence is nearly worthless in matters of technology.

Agree but fail to see the relevance. Bug reports on this list aren't
worthless or the list wouldn't exist.

> 2) Data corruption does happen, and HDD manufacturers can even pin a
>    number to it (the typical bit error rate on modern HDDs is around
>    10^-13, i.e. one bit error per ~10TB transferred). That it didn't
>    hit your sensitive data but only some random pixel in an MPEG movie
>    is good for you. But ZFS was built to handle environments where all
>    data is critically important.

I don't think I have 10TB of source code ;)
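
For scale, here is a rough sketch of the arithmetic behind that figure. It
assumes the quoted rate is per bit read and that errors are independent; the
function names and the 50 GB tree size are made up for illustration.

```python
import math

def bytes_per_expected_error(bit_error_rate):
    """Average number of bytes transferred per single bit error."""
    return 1.0 / bit_error_rate / 8.0

def chance_of_any_error(bytes_read, bit_error_rate):
    """Probability of at least one bit error, assuming independent errors."""
    bits = bytes_read * 8
    # 1 - (1 - p)^n, written with log1p/expm1 so a tiny p does not round away
    return -math.expm1(bits * math.log1p(-bit_error_rate))

TB = 10**12
GB = 10**9

for rate in (1e-13, 1e-14):
    per_error_tb = bytes_per_expected_error(rate) / TB
    print(f"rate {rate:g}: about {per_error_tb:.2f} TB per expected bit error")

# Hypothetical workload: reading a 50 GB tree once at the lower rate.
print(f"chance of any error: {chance_of_any_error(50 * GB, 1e-14):.4f}")
```

On those assumptions, 10^-13 per bit comes to about 1.25 TB per expected
error, and the ~10TB figure matches a rate closer to 10^-14.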

Other file systems also handle critically important data. Every design has
its tradeoffs, and I don't believe ZFS is superior to everything else, although
it has many nice management features that aren't available together anywhere
else. I am not criticising ZFS, but I don't believe it solves every problem
either.

> 3) Data corruption also happens in-transit on the SATA/SAS buses and
>    in memory (that's why there is a thing as ECC memory).

Right.

> 
> 4) If it so bothers you, simply set checksum=off and fly without the
>    parachute (a single core of a modern CPU can checksum at a rate
>    upwards of 4GB/s, but if the few CPU cycles are so important to you,
>    turn it off).

You're making up imaginary motives and blaming them on me? I didn't say I
don't want to spend cycles on checksumming. I said I don't want to lose a
system because of a filesystem error. There's no need to be snide or
condescending. Maybe you need a vacation? Who's your boss?

> 
> > In this specific use case I would rather have a system that's still bootable
> > and runs as best it can than an unbootable system that has detected an
> > integrity problem especially at this point in ZFS's life. If ZFS would not
> > panic the kernel and give the option to fail or mark file(s) bad, I would
> > like it more. 
> 
> ZFS doesn't panic in case of an unrecoverable single-block error, it
> simply returns an I/O error to the calling application. The panic only
> *can* take place in case of a catastrophic pool failure and isn't the
> default anyway. See man zpool(1M) for the description of the "failmode"
> option.

ZFS is not perfect, and although it may be designed to do what you say, I
think errors in ZFS are more likely than bit errors on hard drives. I'm
betting on hardware, and /in this scenario/ I would prefer a filesystem that
tolerates errors, even ignorantly, rather than one that protects me from
myself. What I'd really like is an option in ZFS (maybe it exists) that says:
when a block fails a checksum, tell me which file it affects and let me decide
whether to proceed or dump.

> > But having the ability manage the disk with one pool and the other nice
> > features like compression plus the fact it works nicely on good hardware
> > make it hard to go back once you made the jump. Choices, choices.
> 
> So you want to enable compression (which is a huge CPU hog) and worry
> about checksumming (which is tiny in comparison)?

Yes, you got it right this time. You're the one trying to put words in my
mouth. Nowhere did I ever suggest CPU cycles are an issue. The issue is
what I said. Scroll up.

> If you're compressing data, you've got all the more reason to enable
> checksumming, since compression tends to make all data corruption much,
> much worse (e.g. that's why a single-bit error in a compressed MPEG stream
> doesn't simply slightly alter the color of a single pixel, but typically
> instead results in a whole macroblock or row of macroblocks messing up
> completely).

Sounds reasonable.
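
A minimal sketch of that effect, for anyone who wants to try it. zlib is used
here only as a stand-in compressor, the sample data is made up, and the helper
names are hypothetical. It flips one bit in plain data and one bit in the
middle of a raw deflate stream, then compares the damage.

```python
import zlib

def flip_bit(data, bit_index):
    """Return a copy of data with a single bit inverted."""
    out = bytearray(data)
    out[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(out)

def damaged_bytes(original, corrupted):
    """Count differing positions, plus any difference in length."""
    diff = sum(a != b for a, b in zip(original, corrupted))
    return diff + abs(len(original) - len(corrupted))

# Made-up sample data, varied enough that zlib emits real Huffman blocks.
original = "".join(f"record {i}: value {i * i}\n" for i in range(2000)).encode()

# Case 1: one flipped bit in uncompressed data damages exactly one byte.
raw_damage = damaged_bytes(original, flip_bit(original, 8 * 1000))

# Case 2: one flipped bit in the middle of a raw deflate stream
# (wbits=-15 means no zlib header and no built-in checksum).
compressor = zlib.compressobj(wbits=-15)
packed = compressor.compress(original) + compressor.flush()
bad = flip_bit(packed, 8 * (len(packed) // 2))
try:
    decoder = zlib.decompressobj(wbits=-15)
    recovered = decoder.decompress(bad) + decoder.flush()
    packed_damage = damaged_bytes(original, recovered)
except zlib.error:
    packed_damage = None  # the decoder rejected the corrupted stream

print("uncompressed:", raw_damage, "byte(s) damaged")
if packed_damage is None:
    print("compressed: stream could not be decoded")
else:
    print("compressed:", packed_damage, "byte(s) damaged")
```

The uncompressed case always loses one byte. The compressed result depends on
which bit gets hit, but it is typically either a decode error or damage that
runs well past the flipped bit.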

> 
> >>> Even if your system does crash, at least you now have an opportunity to
> >>> recognize there is a problem, and think about your backups, rather than
> >>> allowing the corruption to proliferate. 
> > 
> > This isn't a production box as I said it's an unused PC with a single drive,
> > and I don't have anybody's bank accounts on it. I can rsync whatever I work
> > on that day to a backup server. It won't be a disaster if UFS suddenly
> > becomes unreliable and I lose a file or two, or if a drive fails, but it
> > would be very annoying if ZFS barfed on a technicality and I had to
> > reinstall the whole OS because of a kernel panic and an unbootable system.
> 
> As noted before, simple checksum errors won't panic your box, and
> neither will catastrophic pool failure (the default failmode=wait). You
> have to explicitly tell ZFS that you want it to panic your system in
> this situation.

I have read reports on this list that show ZFS does panic the system by
default in some cases. It may not have been for checksum failures; I have no
idea why it did, but enough people wrote about crashed boxes to make me ask
the question I asked.

Thanks for the copies suggestion. I'm too busy to argue with you so please
pretend this thread never happened.

_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
