Re: [zfs-discuss] Best way to convert checksums

2009-10-05 Thread Brandon Mercer
On Mon, Oct 5, 2009 at 10:27 AM, David Dyer-Bennet d...@dd-b.net wrote:

 On Sat, October 3, 2009 17:18, Ray Clark wrote:

 Thank you all for your help, not to snub anyone, but Darren, Richard, and
 Cindy especially come to mind.  Thanks for sparring with me until we
 understood each other.

 I'd like to echo this (and extend the thanks to include Ray).  I'm now
 starting to feel that I understand this issue, and I didn't for quite a
 while.  And that I understand the risks better, and have a clearer idea of
 what the possible fixes are.  And I didn't before.  That I do now is due
 to Ray's persistence, and to the rest of your patience.  Thank you!

Excellent, can this thread die now? :P


Re: [zfs-discuss] Best way to convert checksums

2009-10-05 Thread Al Hopper
Question (for Richard E):  Is there a write-up on the ZFS broken fletcher fix?
Is the default checksum for new pool creation changed in U8?
Is the default checksum for new pool creation changed in OpenSolaris or
SXCE  (which versions)?
Is there a case open to allow the user to select the checksum to be
used when a ZIL is being created?

Interesting thread - and commiserations to the ZFS team on the broken
fletcher implementation - we (developers) all have bad days!!

Regards,

-- 
Al Hopper  Logical Approach Inc,Plano,TX a...@logical-approach.com
   Voice: 972.379.2133 Timezone: US CDT
OpenSolaris Governing Board (OGB) Member - Apr 2005 to Mar 2007
http://www.opensolaris.org/os/community/ogb/ogb_2005-2007/


Re: [zfs-discuss] Best way to convert checksums

2009-10-05 Thread Miles Nordin
 re == Richard Elling richard.ell...@gmail.com writes:

re As I said before, if the checksum matches, then the data is
re checked for sequence number = previous + 1, the blk_birth ==
re 0, and the size is correct. Since this data lives inside the
re block, it is unlikely that a collision would also result in a
re valid block.

That's just a description of how the zil works, not an additional
layer of protection for user data in the ZIL beyond the checksum.  The
point of all this is to avoid needing to write a synchronous commit
sector to mark the block valid.  Instead, the block becomes valid once
it's entirely written.  Yes, the checksum has an additional, critical,
use in the ZIL compared to its use in the bulk pool, but checking
these header fields for sanity does nothing to mitigate broken
fletcher2's weakness in detecting corruption of the user data stored
inside the zil records.  It's completely orthogonal.

If anything, the additional use of broken fletcher2 in the ZIL is a
reason it's even more important to fix the checksum in the ZIL:
checksum mismatches occur in the ZIL even during normal operation,
even when the storage is not misbehaving, because sometimes blocks are
incompletely written.  This is the normal case, not the exception,
because the ZIL is only read after unclean shutdown.

and AIUI you are saying fletcher2 is still the default for bulk pool
data, too?  even on newly created pools with the latest code?  The fix
was just to add the word ``deprecated'' to some documentation
somewhere, without actually performing the deprecation?  I feel like
FreeBSD/NetBSD would probably have left this bug open until it's
fixed.  :/  Ubuntu or Gentoo would probably keep closing and reopening
it though while people haggled in the comments section.




Re: [zfs-discuss] Best way to convert checksums

2009-10-05 Thread Victor Latushkin

On 05.10.09 23:07, Miles Nordin wrote:

re == Richard Elling richard.ell...@gmail.com writes:


re As I said before, if the checksum matches, then the data is
re checked for sequence number = previous + 1, the blk_birth ==
re 0, and the size is correct. Since this data lives inside the
re block, it is unlikely that a collision would also result in a
re valid block.

That's just a description of how the zil works, not an additional
layer of protection for user data in the ZIL beyond the checksum.  The
point of all this is to avoid needing to write a synchronous commit
sector to mark the block valid.  Instead, the block becomes valid once
it's entirely written.  Yes, the checksum has an additional, critical,
use in the ZIL compared to its use in the bulk pool, but checking
these header fields for sanity does nothing to mitigate broken
fletcher2's weakness in detecting corruption of the user data stored
inside the zil records.  It's completely orthogonal.

If anything, the additional use of broken fletcher2 in the ZIL is a
reason it's even more important to fix the checksum in the ZIL:
checksum mismatches occur in the ZIL even during normal operation,
even when the storage is not misbehaving, because sometimes blocks are
incompletely written.  This is the normal case, not the exception,
because the ZIL is only read after unclean shutdown.

and AIUI you are saying fletcher2 is still the default for bulk pool
data, too?  even on newly created pools with the latest code?


Here's essentially the fix:

http://src.opensolaris.org/source/diff/onnv/onnv-gate/usr/src/uts/common/fs/zfs/sys/zio.h?r2=%252Fonnv%252Fonnv-gate%252Fusr%252Fsrc%252Futs%252Fcommon%252Ffs%252Fzfs%252Fsys%252Fzio.h%409454%3A02e1ddcc9be7&r1=%252Fonnv%252Fonnv-gate%252Fusr%252Fsrc%252Futs%252Fcommon%252Ffs%252Fzfs%252Fsys%252Fzio.h%409443%3A2a96d8478e95

It changes setting of checksum=on to mean fletcher4, so it is used by default 
for all user data and metadata. Though you can still set it to fletcher2 
explicitly.
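
On builds that predate this change, checksum=on still means the weak fletcher2, but a stronger algorithm can be selected explicitly. A minimal sketch, assuming a hypothetical pool named tank (fletcher4 and sha256 are both long-standing values of the checksum property):

    zfs set checksum=fletcher4 tank   # new writes stop using fletcher2
    zfs get -r checksum tank          # confirm the setting on every dataset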


victor


Re: [zfs-discuss] Best way to convert checksums

2009-10-05 Thread Miles Nordin
 vl == Victor Latushkin victor.latush...@sun.com writes:

vl It changes setting of checksum=on to mean fletcher4

oh, good.  so it is only the ZIL that's unfixed?  At least that fix
could come from a simple upgrade, if it ever gets fixed.




Re: [zfs-discuss] Best way to convert checksums

2009-10-05 Thread Toby Thain


On 5-Oct-09, at 3:32 PM, Miles Nordin wrote:


bm == Brandon Mercer yourcomputer...@gmail.com writes:



I'm now starting to feel that I understand this issue,
and I didn't for quite a while.  And that I understand the
risks better, and have a clearer idea of what the possible
fixes are.  And I didn't before.


haha, yes, I think I can explain it to people when advocating ZFS, but
the story goes something like ``ZFS is serious business and pretty
useful, but it has some pretty hilarious problems that you wouldn't
expect


Let's talk about the hilarious problems that a naive RAID stack
has and that most users don't expect. For a start, no crash-safe
behaviour, and no way to self-heal from unexpected mirror desync.
Then we could compare always-consistent COW with conventionally  
fragile metadata needing regular consistency checks...




from some of the blog hype you read.  Let me give you a couple
examples of things that still aren't fixed


...and can't be fixed, in RAID, or conventional filesystems.

--Toby


and how the discussion
went...''

...


Re: [zfs-discuss] Best way to convert checksums

2009-10-04 Thread Miles Nordin
 re == Richard Elling richard.ell...@gmail.com writes:

re The probability of the garbage having both a valid fletcher2
re checksum at the proper offset and having the proper sequence
re number and having the right log chain link and having the
re right block size is considerably lower than the weakness of
re fletcher2.

I'm having trouble parsing this.  I think you're confusing a few 
different failure modes:

 * ZIL entry is written, but corrupted by the storage, so that, for
   example, an entry should be read from the mirrored ZIL instead.

   + broken fletcher2 detects the storage corruption
 CASE A: Good!

   + broken fletcher2 misses the error, so that corrupted data is
 replayed from ZIL into the proper pool, possibly adding a
 stronger checksum to the corrupt data while writing it.
 CASE B: Bad!

   + broken fletcher2 misinterprets storage corruption as signalling
 the end of the ZIL, and any data in the ZIL after the corrupt
 entry is truncated without even attempting to read the mirror.
 (does this happen?)
 CASE C: Bad!

 * ZIL entry is intentional garbage, either a partially-written entry
   or an old entry, and should be treated as the end of the ZIL

   + broken fletcher2 identifies the partially written entry by a
 checksum mismatch, or the sequence number identifies it as old
 CASE D: Good!

   + broken fletcher2 misidentifies a partially-written entry as
 complete because of a hash collision
 CASE E: Bad!

   + (hypothetical, only applies to non-existent fixed system) working
 fletcher2 or broken-good-enough fletcher4 misidentifies a
 partially-written entry as complete because of a hash collision
 CASE F: Bad!

If I read your sentence carefully and try to match it with this chart,
it seems like you're saying P(CASE F) < P(CASE E), which seems like
an argument for fixing the checksum.  While you don't say so, I
presume from your other posts you're trying to make a case for doing
nothing, so I'm confused.

I was mostly thinking about CASE B though.  It seems like the special
way the ZIL works has nothing to do with CASE B: if you send data
through the ZIL to a sha256 pool, it can be written to ZIL under
broken-fletcher2, corrupted by the storage, and then read in and
played back corrupt but covered with a sha256 checksum to the pool
proper.  AFAICT your relative-probability sentence has nothing to do
with CASE B.

re Unfortunately, the ZIL is also latency sensitive, so the
re performance case gets stronger 

The performance case advocating what?  not fixing the broken checksum?

re while the additional error checking already boosts the
re dependability case.

what additional error checking?

Isn't the whole specialness of the ZIL that the checksum is needed in
normal operation, absent storage subsystem corruption, as I originally
said?  It seems like the checksum's strength is more important here,
not less.




Re: [zfs-discuss] Best way to convert checksums

2009-10-04 Thread Richard Elling


On Oct 4, 2009, at 11:51 AM, Miles Nordin wrote:


re == Richard Elling richard.ell...@gmail.com writes:


   re The probability of the garbage having both a valid fletcher2
   re checksum at the proper offset and having the proper sequence
   re number and having the right log chain link and having the
   re right block size is considerably lower than the weakness of
   re fletcher2.

I'm having trouble parsing this.  I think you're confusing a few
different failure modes:

* ZIL entry is written, but corrupted by the storage, so that, for
  example, an entry should be read from the mirrored ZIL instead.


This is attempted, if you have a mirrored slog.


  + broken fletcher2 detects the storage corruption
CASE A: Good!

  + broken fletcher2 misses the error, so that corrupted data is
replayed from ZIL into the proper pool, possibly adding a
stronger checksum to the corrupt data while writing it.
CASE B: Bad!

  + broken fletcher2 misinterprets storage corruption as signalling
the end of the ZIL, and any data in the ZIL after the corrupt
entry is truncated without even attempting to read the mirror.
(does this happen?)
CASE C: Bad!

* ZIL entry is intentional garbage, either a partially-written entry
  or an old entry, and should be treated as the end of the ZIL

  + broken fletcher2 identifies the partially written entry by a
checksum mismatch, or the sequence number identifies it as old
CASE D: Good!


If the checksum mismatches, you can't go any further because
the pointer to the next ZIL log entry cannot be trusted. So the
roll forward stops.  This is how such logs work -- there is no
end-of-log record.


  + broken fletcher2 misidentifies a partially-written entry as
complete because of a hash collision
CASE E: Bad!

  + (hypothetical, only applies to non-existent fixed system) working
fletcher2 or broken-good-enough fletcher4 misidentifies a
partially-written entry as complete because of a hash collision
CASE F: Bad!


As I said before, if the checksum matches, then the data is
checked for sequence number = previous + 1, the blk_birth == 0,
and the size is correct. Since this data lives inside the block, it
is unlikely that a collision would also result in a valid block.
In other words, ZFS doesn't just trust the checksum for slog entries.
 -- richard


If I read your sentence carefully and try to match it with this chart,
it seems like you're saying P(CASE F) < P(CASE E), which seems like
an argument for fixing the checksum.  While you don't say so, I
presume from your other posts you're trying to make a case for doing
nothing, so I'm confused.

I was mostly thinking about CASE B though.  It seems like the special
way the ZIL works has nothing to do with CASE B: if you send data
through the ZIL to a sha256 pool, it can be written to ZIL under
broken-fletcher2, corrupted by the storage, and then read in and
played back corrupt but covered with a sha256 checksum to the pool
proper.  AFAICT your relative-probability sentence has nothing to do
with CASE B.

   re Unfortunately, the ZIL is also latency sensitive, so the
   re performance case gets stronger

The performance case advocating what?  not fixing the broken checksum?

   re while the additional error checking already boosts the
   re dependability case.

what additional error checking?

Isn't the whole specialness of the ZIL that the checksum is needed in
normal operation, absent storage subsystem corruption, as I originally
said?  It seems like the checksum's strength is more important here,
not less.


Re: [zfs-discuss] Best way to convert checksums

2009-10-03 Thread Ray Clark
Richard, with respect to:

This has been answered several times in this thread already.
set checksum=sha256 filesystem
copy your files -- all newly written data will have the sha256
checksums.

I understand that.  I understood it before the thread started.  I did not ask 
this.  It is a fact that there is no feature to convert checksums as a part of 
resilver or some such.  

I started by asking what utility to use, but quickly zeroed in on zfs send/receive as 
being the native and presumably best method, but had questions as to how to get 
the checksum property set correctly when the new file system was automatically created, etc.

Note that my focus in recent portions of the thread has changed to the 
underlying zpool.

Simply changing the checksum=sha256 and copying my data is analogous to hanging 
my data from a hierarchy of 0.256 welded steel chain, with the top of the 
hierarchy hanging it all from an 0.001 steel thread.  Well, that is not quite 
fair because there are probabilities involved.  Someone is going to pick a link 
randomly and go after it with a fingernail clipper.  If they pick a thick one, 
I have very little to worry about to say the least.  If they pick one of the 
few dozen? hundred? thousand? (I don't know how many) that contain the 
structure and services of the underlying zpool, then the nailclipper will not 
be stopped by the 0.001 thread.   I do have 8,000,000,000 links in the chain, 
and only a very small fraction are 0.001 thick, and that is strongly in my 
favor, but I would expect the heads to also spend a disproportionate amount of 
time over the intent log.  It is hard to know how it comes out.  I just don't 
want any 0.001 steel threads protecting my data from the
gremlins.  I moved to ZFS to avoid gambles.  If I wanted gambles I would use 
linux raid and lvm2.  They work well enough if there are no errors.  

I should have enumerated the knowns and unknowns in my list last night, then I 
would not have annoyed you with my apparent deafness.  (Hopefully I am not 
still being deaf).  I will clarify below, as I should have last night:


Given that I only have 1.6TB of data in a 4TB pool, what can I do to
change those blocks to sha256 or Fletcher4:

(1) Without destroying and recreating the zpool under U4

I know how to fix the user data (just change checksum property on the pool 
using zfs specifying the pool vs. a zfs file system, then copy the data).

I don't know (am ignorant of) blocks comprising the underlying zpool, and how 
to fix them without recreating the pool.  It makes sense to me that at least 
some would be rewritten in the course of using the system, but (1) I have had 
no confirmation or denial that this is the case, (2) I don't know if this is 
all of them or some of them, (3) I don't know if the checksum= parameter would 
affect these (relling's Oct 2 at 3:26 post implies that it does not by lack of 
reference to checksum property).  So I don't know yet how much exposure will 
remain.  I would think that if the user specified a stronger checksum for their 
data that the system would abandon its use of weaker ones in the underlying 
structure, but Richard's list seems to imply otherwise.

(2) With destroying and recreating the zpool under U4 (Which I don't
really have the resources to pull off)

Due to some of the non-technical factors in the situation, I cannot actually 
execute an experimental valid zpool command, but zpool create -o garbage 
gives me a usage that does not include any -o or -O.  So it appears that under 
U4 I cannot do this.  I wish there were someone who could confirm that I can or 
cannot do this before I arrange for and propose that we dive into this massive 
undertaking.  Also, from Richard's Oct 2 3:26 note, I infer that this will not 
change the checksum used by the underlying zpool anyway, so this might be 
fruitless.  But I am inferring... Richard gave a quick list, his attitude was 
not that of providing all level of precise detail so I really don't know.  Many 
of the answers I have received have turned out to recommend features that are 
not available in U4 but in later versions, even unreleased versions.  I have no 
way of sorting this out without the information being qualified with a version.

(3) With upgrading to U7 (Perhaps in a few months)
Not clear what this will support on zpool, or if it would be effective (similar 
to U4 above)

(4) With upgrading to U8
Not sure when it will come out, what it will support, or if it will be 
effective (similar to U7, U4 above).

So I can enable robust protection on my user data, but perhaps not the 
infrastructure needed to get at that user data, and perhaps not the intent log. 
 

The answer may be that I cannot be helped.  That is not the desired answer, but 
if that is the case, so be it.  Let's lay out the facts and the best way to 
move on from here, for me and everybody else.  Why leave us thrashing in the 
dark?  Am I a Mac user?  

I personally will still believe ZFS is the way to go in the short term because 
it is 

Re: [zfs-discuss] Best way to convert checksums

2009-10-03 Thread Bob Friesenhahn

On Fri, 2 Oct 2009, Ray Clark wrote:

With the current fletcher2 we have only a 50-50 chance of catching 
these multi-bit errors.  Probability of multiple bits being changed 
is not


What is the current fletcher2?  A while back I seem to recall reading 
a discussion in the zfs-code forum about how the original zfs 
fletcher2 was found to be unexpectedly weak and broken so they updated 
the fletcher2 algorithm and assigned it a new enumeration so that 
fresh blocks use the corrected algorithm.  I could be just imagining 
all of this but that is what I remember today.


Since you are using Solaris 10 U4 maybe you are using the dinosaur 
version of fletcher2?


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/


Re: [zfs-discuss] Best way to convert checksums

2009-10-03 Thread Richard Elling

On Oct 3, 2009, at 7:46 AM, Ray Clark wrote:


Richard, with respect to:

This has been answered several times in this thread already.
set checksum=sha256 filesystem
copy your files -- all newly written data will have the sha256
checksums.

I understand that.  I understood it before the thread started.  I  
did not ask this.  It is a fact that there is no feature to convert  
checksums as a part of resilver or some such.


There is no such feature.  There is a long-awaited RFE for block
pointer rewriting (checksums are stored in the block pointer)
which would add the underlying capabilities for this.

I started by asking what utility to use, but quickly zeroed in on zfs send/
receive as being the native and presumably best method, but had
questions as to how to get the checksum property set correctly when it was
automatically created, etc.


Say for example I have a pool called zwimming with some stuff in it
and checksum=on. To create a copy of the data using send/recv
in the same pool but with checksum=sha256 do:
zfs snapshot zwimming@now
zfs send zwimming@now | zfs receive zwimming/new

You will now have a new file system called zwimming/new with
the same data as zwimming, but with checksum=sha256.
If you then want to get back to the original directory structure
you can set the mountpoint properties, as desired. There are dozens
of other ways to accomplish the copy.

Note that my focus in recent portions of the thread has changed to  
the underlying zpool.


Simply changing the checksum=sha256 and copying my data is analogous
to hanging my data from a hierarchy of 0.256 welded steel chain,  
with the top of the hierarchy hanging it all from an 0.001 steel  
thread.  Well, that is not quite fair because there are  
probabilities involved.  Someone is going to pick a link randomly  
and go after it with a fingernail clipper.  If they pick a thick  
one, I have very little to worry about to say the least.  If they  
pick one of the few dozen? hundred? thousand? (I don't know how  
many) that contain the structure and services of the underlying  
zpool, then the nailclipper will not be stopped by the 0.001  
thread.   I do have 8,000,000,000 links in the chain, and only a
very small fraction are 0.001 thick, and that is strongly in my  
favor, but I would expect the heads to also spend a disproportionate  
amount of time over the intent log.  It is hard to know how it comes  
out.  I just don't want any 0.001 steel threads protecting my data
from the
 gremlins.  I moved to ZFS to avoid gambles.  If I wanted gambles I  
would use linux raid and lvm2.  They work well enough if there are  
no errors.


I think you are missing the concept of pools. Pools contain datasets.
One form of dataset is a file system. Pools do not contain data per se,
datasets contain data.  Reviewing the checksums used with this
hierarchy in mind:

Pool
    Label [SHA-256]
    Uberblock [SHA-256]
    Metadata [fletcher4]
    Gang block [SHA-256]
    ZIL log [fletcher2]

Dataset (file system or volume)
    Metadata [fletcher4]
    Data [fletcher2 (default, today), fletcher4, or SHA-256]
    Send stream [fletcher4]

With this in mind, I don't understand your steel analogy.
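
As an aside, only the per-dataset data checksum in that list is visible to and settable by the administrator; a small sketch, using the zfs01 pool name from this thread:

    zfs get -r checksum zfs01       # show the current setting for every dataset
    zfs set checksum=sha256 zfs01   # newly written data (and inheriting children) use sha256

The label, uberblock, gang block, and ZIL checksums are fixed by the implementation and are not affected by this property.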

wrt the ZIL, it is rarely used for normal file system access.  ZIL blocks
are allocated from the pool as needed and freed no more than 30
seconds later, unless there is a sudden halt. If the system is halted
then the ZIL is used to roll forward transactions. The heads do not
spend a disproportionate amount of time over the intent log.
 -- richard

I should have enumerated the knowns and unknowns in my list last  
night, then I would not have annoyed you with my apparent deafness.   
(Hopefully I am not still being deaf).  I will clarify below, as I  
should have last night:



Given that I only have 1.6TB of data in a 4TB pool, what can I do to
change those blocks to sha256 or Fletcher4:

(1) Without destroying and recreating the zpool under U4

I know how to fix the user data (just change checksum property on  
the pool using zfs specifying the pool vs. a zfs file system, then  
copy the data).


I don't know (am ignorant of) blocks comprising the underlying  
zpool, and how to fix them without recreating the pool.  It makes  
sense to me that at least some would be rewritten in the course of  
using the system, but (1) I have had no confirmation or denial that  
this is the case, (2) I don't know if this is all of them or some of  
them, (3) I don't know if the checksum= parameter would effect these  
(relling's Oct 2 at 3:26 post implies that it does not by lack of  
reference to checksum property).  So I don't know yet how much  
exposure will remain.  I would think that if the user specified a  
stronger checksum for their data that the system would abandon its  
use of weaker ones in the underlying structure, 

Re: [zfs-discuss] Best way to convert checksums

2009-10-03 Thread Miles Nordin
 re == Richard Elling richard.ell...@gmail.com writes:

re If I was to refer to Fletcher's algorithm, I would use
re Fletcher.  When I am referring to the ZFS checksum setting of
re fletcher2 I will continue to use fletcher2

haha okay, so to clarify, when reading a Richard Elling post:

 fletcher2 = ZFS's broken attempt to implement a 32-bit Fletcher checksum

 Fletcher  = hypothetical correct implementation of a Fletcher checksum

In that case, for clarity I think I'll have to use the word ``broken''
a lot more often.

  How does the fix included in S10u8 and snv_114 work?

re The best I can tell, the comments are changed to indicate
re fletcher2 is deprecated.

You are saying the ``fix'' was a change in documentation, nothing
else?  The default is still fletcher2, and there is no correct
implementation of the Fletcher checksum, only the
good-enough-but-broken fletcher4, which is not the default?

Also, there is no way to use a non-broken checksum on the ZIL?

doesn't sound fixed to me.  At least there is some transparency,
though, and a partial workaround.




Re: [zfs-discuss] Best way to convert checksums

2009-10-03 Thread Bob Friesenhahn

On Sat, 3 Oct 2009, Miles Nordin wrote:

   re The best I can tell, the comments are changed to indicate
   re fletcher2 is deprecated.

You are saying the ``fix'' was a change in documentation, nothing
else?  The default is still fletcher2, and there is no correct
implementation of the Fletcher checksum only the
good-enough-but-broken fletcher4, which is not the default?


It seems that my memory is kind of crappy (like fletcher2).

There were discussions of the fletcher2 issue on the zfs-code list 
starting in March and ending in May:


http://mail.opensolaris.org/pipermail/zfs-code/2009-March/thread.html
http://mail.opensolaris.org/pipermail/zfs-code/2009-April/thread.html
http://mail.opensolaris.org/pipermail/zfs-code/2009-May/thread.html

Unless someone has a legal requirement to prove data integrity, it 
does not seem like the fletcher2 woes are much for most people to be 
worried about.  After all, before zfs, this level of validation did 
not exist at all.  Fletcher2 will still catch most instances of data 
corruption.


One thing I did learn from this discussion is that when accessing 
uncached memory, the performance of fletcher2 and fletcher4 is roughly 
equivalent so there is usually no penalty for enabling fletcher4.  It 
does seem like there could be some CPU impact for synchronous writes 
from fletcher4 since it is more likely that the data is in cache for a 
synchronous write.


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/


Re: [zfs-discuss] Best way to convert checksums

2009-10-03 Thread Richard Elling


On Oct 3, 2009, at 12:22 PM, Miles Nordin wrote:


re == Richard Elling richard.ell...@gmail.com writes:


   re If I was to refer to Fletcher's algorithm, I would use
   re Fletcher.  When I am referring to the ZFS checksum setting of
   re fletcher2 I will continue to use fletcher2

haha okay, so to clarify, when reading a Richard Elling post:

fletcher2 = ZFS's broken attempt to implement a 32-bit Fletcher  
checksum


Fletcher  = hypothetical correct implementation of a Fletcher checksum

In that case, for clarity I think I'll have to use the word ``broken''
a lot more often.


How does the fix included in S10u8 and snv_114 work?


   re The best I can tell, the comments are changed to indicate
   re fletcher2 is deprecated.

You are saying the ``fix'' was a change in documentation, nothing
else?  The default is still fletcher2, and there is no correct
implementation of the Fletcher checksum only the
good-enough-but-broken fletcher4, which is not the default?

Also, there is no way to use a non-broken checksum on the ZIL?


The ZIL is a slightly different beast. If there is a checksum mismatch
while processing the log, it signals the effective end of the log.  This
is why log entries are self-checksummed.  In other words, if you reach
garbage, then you've reached the end of the log. The probability of
the garbage having both a valid fletcher2 checksum at the proper
offset and having the proper sequence number and having the right
log chain link and having the right block size is considerably lower
than the weakness of fletcher2.

Unfortunately, the ZIL is also latency sensitive, so the performance
case gets stronger while the additional error checking already
boosts the dependability case.
 -- richard



doesn't sound fixed to me.  At least there is some transparency,
though, and a partial workaround.


Re: [zfs-discuss] Best way to convert checksums

2009-10-03 Thread Ray Clark
With respect to relling's Oct 3 2009 7:46 AM Post:

 I think you are missing the concept of pools. Pools contain datasets.
 One form of dataset is a file system. Pools do not contain data per se,
 datasets contain data. Reviewing the checksums used with this
 heirarchy in mind:

 Pool
 Label [SHA-256]
 Uberblock [SHA-256]
 Metadata [fletcher4]
 Gang block [SHA-256]
 ZIL log [fletcher2]

 Dataset (file system or volume)
 Metadata [fletcher4]
 Data [fletcher2 (default, today), fletcher4, or SHA-256]
 Send stream [fletcher4]

 With this in mind, I don't understand your steel analogy.

I am assuming based on the context of our presentation that the above list of 
pool stuff is exhaustive, that this is everything not in a dataset.

My steel analogy is based on the assumption that the pool-level stuff that 
you list above is needed to gain access to the dataset.  If the dataset can be 
accessed with all of the pool stuff trashed, then my steel thread does not 
exist.  But it also means that the pool stuff is extraneous, so I doubt that 
this is the case.

Given that all of the pool stuff is either sha256 or fletcher4 except for the 
ZIL, I have new understanding that suggests (though I don't understand the 
details of the system) that I am not depending on Fletcher2 protected data, and 
my steel thread is actually pretty thick, not 0.001.

Based on your comments regarding the ZIL, I am inferring that stuff is written 
there and never used except for a restart after a messy shutdown.  I might be 
exposed to whatever weakness the Fletcher2 has as implemented, but only in 
these rare circumstances.  Normal transactions and data would not be impacted 
by corruption in the ZIL blocks since they would never be used.  So a large 
layer of probability protects me:  I would have to have a crash coinciding with 
a corruption in the ZIL that hits on a fletcher2 weakness.

Based on all of this I believe I am relatively happy simply copying my data, 
not recreating my zpool.  

As Darren Moffat taught me, I can zfs set checksum=sha256 zfs01 where zfs01 
is the zpool, then zfs send zfs01/home@snapshot | zfs receive zfs01/home.new 
and the new file system will all be sha256 as long as I don't specify the -R 
option on the zfs send, and all of this is supported in U4.  I believe it has 
to be supported due to the presence of files with properties in the (odd?) zfs 
file system that exists at the zfs01 zpool level before creation of zfs file 
systems.
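
Spelled out end to end, that sequence might look something like this (snapshot and dataset names are illustrative, and it assumes the pool has room for a second copy of the data while both file systems exist):

    zfs set checksum=sha256 zfs01
    zfs snapshot zfs01/home@move
    zfs send zfs01/home@move | zfs receive zfs01/home.new
    zfs get checksum zfs01/home.new   # expect sha256, inherited from zfs01
    # after verifying the copy (see the rsync/MD5 discussion further down):
    zfs rename zfs01/home zfs01/home.old
    zfs rename zfs01/home.new zfs01/home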

So assuming the above process works, this thread is done as far as I am 
concerned right now.  

Thank you all for your help, not to snub anyone, but Darren, Richard, and Cindy 
especially come to mind.  Thanks for sparring with me until we understood each 
other.

--Ray


Re: [zfs-discuss] Best way to convert checksums

2009-10-02 Thread Ray Clark
Data security.  I migrated my organization from Linux to Solaris driven away 
from Linux by the shortfalls of fsck on TB size file systems, and towards 
Solaris by the features of ZFS.

At the time I tried to dig up information concerning tradeoffs associated with 
Fletcher2 vs. 4 vs. SHA256 and found nothing.  Studying the algorithms, I 
decided that fletcher2 would tend to be weak for periodic data, which 
characterizes my data.  I ran throughput tests and got 67MB/Sec for Fletcher2 
and 4 and 48MB/Sec for SHA256.  I projected (perhaps without basis) SHA256's 
cryptographic strength to also mean strength as a hash, and chose it since 
48MB/Sec is more than I need.
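
Roughly how such a throughput comparison might be run (illustrative only; a meaningful test needs a data set much larger than RAM and should account for write caching):

    zfs create zfs01/cktest
    for ck in fletcher2 fletcher4 sha256; do
        zfs set checksum=$ck zfs01/cktest
        ptime dd if=/dev/zero of=/zfs01/cktest/$ck.dat bs=1024k count=8192
    done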

21 months later (9/15/09) I lost everything to a corrupt metadata (Not sure 
where this was printed) ZFS-8000-CS.  No clue why to date, I will never know.  
The person who restored from tape was not informed to set checksum=sha256, so 
it all went in with the default, Fletcher2.

Before taking rather disruptive actions to correct this, I decided to question 
my original decision and found schlie's post stating that a bug in fletcher2 
makes it essentially a one bit parity on the entire block:
http://opensolaris.org/jive/thread.jspa?threadID=69655&tstart=30  While 
this is twice as good as any other file system in the world that has NO such 
checksum, this does not provide the security I migrated for.  Especially given 
that I did not know what caused the original data loss, it is all I have to 
lean on.

Convinced that I need to convert all of the checksums to sha256 to have the 
data security ZFS purports to deliver and in the absence of a checksum 
conversion capability, I need to copy the data.  It appears that all of the 
implementations of the various means of copying data, from tar and cpio to cp 
to rsync to pax have ghosts in their closets, each living in glass houses, and 
each throwing stones at the other with respect to various issues with file 
size, filename lengths, pathname lengths, ACLs, extended attributes, sparse 
files, etc. etc. etc.  

It seems like zfs send/receive *should* be safe from all such issues as part of 
the zfs family, but the questions raised here are ambiguous once one starts to 
think about it.  If the file system is faithfully duplicated, it should also 
duplicate all properties, including the checksum used on each block.  It 
appears (to my advantage) that this is not what is done.  This enables the 
filesystem spontaneously created by zfs receive to inherit from the pool, which 
evidently can be set to sha256 though it is a pool not a file system in the 
pool.  The present question is protection on the base pool.  This can be set 
when the pool is created, though not with U4 which I am running.  It is not 
clear (yet) if this is simply not documented with the current release or if the 
version that supports this has not been released yet.  If I were to upgrade 
(Which I cannot do in a timely fashion), it would only be to U7.  I cannot run 
a weekly build type of OS on my production server.  Any way 
 it goes I am hosed.  In short there is surely some structure, some blocks with 
stuff written in them when a pool is created but before anything else is done, 
else it would be a blank disk, not a zfs pool.  Are these protected by 
Fletcher2 as the default?  I have learned that the uberblock is protected by 
SHA256, other parts by Fletcher4.  Is this everything?  In U4 was it fletcher4, 
or was this a recent change stemming from Schlie's report?

In short, what is the situation with regard to the data security I switched to 
Solaris/ZFS for, and what can I do to achieve it?  What *do* the tools do?  Are 
there tools for what needs to be done to convert things, to copy things, to 
verify things, and to do so completely and correctly?  

So here is where I am:  I should zfs send/receive, but I cannot have confidence 
that there are not fletcher2 protected blocks (1 bit parity) at the most 
fundamental levels of the zpool.  To verify data, I cannot depend on existing 
tools since diff is not large file aware.  My best idea at this point is to 
calculate and compare MD5 sums of every file and spot check other properties as 
best I can.
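
One way the MD5 comparison might be scripted with the digest(1) utility that ships with Solaris (paths illustrative; this checks file contents only, not ACLs or other properties):

    cd /zfs01/home     && find . -type f -exec digest -v -a md5 {} \; | sort > /var/tmp/old.md5
    cd /zfs01/home.new && find . -type f -exec digest -v -a md5 {} \; | sort > /var/tmp/new.md5
    diff /var/tmp/old.md5 /var/tmp/new.md5    # no output means every file matched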

Given this rather full perspective, help or comments very appreciated.  I still 
think zfs is the way to go, but the road is a little bumpy at the moment.


Re: [zfs-discuss] Best way to convert checksums

2009-10-02 Thread Ray Clark
Apologies that the preceding post appears out of context.  I expected it to 
indent as I pushed the reply button on myxiplx' Oct 1, 2009 1:47 post.  It 
was in response to his question.  I will try to remember to provide links 
internal to my messages.


Re: [zfs-discuss] Best way to convert checksums

2009-10-02 Thread Tomas Ögren
On 02 October, 2009 - Ray Clark sent me these 4,4K bytes:

 Data security.  I migrated my organization from Linux to Solaris
 driven away from Linux by the the shortfalls of fsck on TB size file
 systems, and towards Solaris by the features of ZFS.
[...]
 Before taking rather disruptive actions to correct this, I decided to
 question my original decision and found schlie's post stating that a
 bug in fletcher2 makes it essentially a one bit parity on the entire
 block:
 http://opensolaris.org/jive/thread.jspa?threadID=69655&tstart=30
 While this is twice as good as any other file system in the world that
 has NO such checksum, this does not provide the security I migrated
 for.  Especially given that I did not know what caused the original
 data loss, it is all I have to lean on.
...

That post refers to bug 6740597
http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6740597
which also refers to
http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=2178540

So it seems like it's fixed in snv114 and s10u8, which won't help your
s10u4 unless you update..

/Tomas
-- 
Tomas Ögren, st...@acc.umu.se, http://www.acc.umu.se/~stric/
|- Student at Computing Science, University of Umeå
`- Sysadmin at {cs,acc}.umu.se


Re: [zfs-discuss] Best way to convert checksums

2009-10-02 Thread Ray Clark
Replying to Cindy's Oct 1, 2009 3:34 PM post:

Thank you.   The second part was my attempt to guess my way out of this.  If 
the fundamental structure of the pool (That which was created before I set the 
checksum=sha256 property) is using fletcher2, perhaps as I use the pool all of 
this structure will be updated, and therefore automatically migrate to the new 
checksum.  It would be very difficult for me to recreate the pool, but I have 
space to duplicate the user files (and so get the new checksum).  Perhaps 
this will also result in the underlying structure of the pool being converted 
in the course of normal use.  

Comments for or against?


Re: [zfs-discuss] Best way to convert checksums

2009-10-02 Thread Ray Clark
Replying to relling's October 1, 2009 3:34 post:

Richard, regarding when a pool is created, there is only metadata which uses 
fletcher4.  Was this true in U4, or is this a new change of default with U4 
using fletcher2?  Similarly, did the uberblock use sha256 in U4?  I am running 
U4.

--Ray


Re: [zfs-discuss] Best way to convert checksums

2009-10-02 Thread Ross
Interesting answer, thanks :)

I'd like to dig a little deeper if you don't mind, just to further my own 
understanding (which is usually rudimentary compared to a lot of the guys on 
here).  My belief is that ZFS stores two copies of the metadata for any block, 
so corrupt metadata really shouldn't happen often.

Could I ask what the structure of your pool is, what level of redundancy do you 
have there?  The very fact that you had a 'corrupt metadata' error implies to 
me that the checksums have done their job in finding an error, and I'm 
wondering if the true cause could be further down the line.

I'm still taking all this in though - we'll be using sha256 on our secondary 
system, just in case :)


Re: [zfs-discuss] Best way to convert checksums

2009-10-02 Thread Ray Clark
My pool was the default, with checksum=sha256.  The default has two copies of all 
metadata (as I understand it), and one copy of user data.  It was a raidz2 with 
eight 750GB drives, yielding just over 4TB of usable space.  

I am not happy with the situation, but I recognize that I am 2x better off (1 
bit parity) than I would be with any other file system.


Re: [zfs-discuss] Best way to convert checksums

2009-10-02 Thread Marion Hakanson
webcl...@rochester.rr.com said:
  To verify data, I cannot depend on existing tools since diff is not large
 file aware.  My best idea at this point is to calculate and compare MD5 sums
 of every file and spot check other properties as best I can. 

Ray,

I recommend that you use rsync's -c to compare copies.  It reads all the
source files, computes a checksum for them, then does the same for the
destination and compares checksums.  As far as I know, the only thing
that rsync can't do in your situation is the ZFS/NFSv4 ACL's.  I've used
it to migrate many TB's of data.
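
For example, a dry run along these lines reports any file whose contents differ between the source and the copy (paths illustrative; the trailing slashes matter to rsync):

    rsync -n -a -c -v /zfs01/home/ /zfs01/home.new/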

Regards,

Marion






Re: [zfs-discuss] Best way to convert checksums

2009-10-02 Thread Cindy Swearingen

Ray,

The checksums are set on the file systems, not the pool.

If a new checksum is set and *you* rewrite the data, then the rewritten
data will contain the new checksum. If your pool has the space for you
to duplicate the user data and the new checksum is set, then the duplicated
data will have the new checksum.


ZFS doesn't rewrite data as part of normal operations. I confirmed with
a simple test (like Darren's) that even if you have a single-disk pool 
and the disk is replaced and all the data is resilvered and a new 
checksum is set, you'll see data with the previous checksum and the new
checksum.
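
A rough reconstruction of that kind of test using small file-backed vdevs (illustrative only):

    mkfile 128m /var/tmp/d1 /var/tmp/d2
    zpool create testpool /var/tmp/d1
    cp /etc/release /testpool/before.txt             # written with the default checksum
    zfs set checksum=sha256 testpool
    cp /etc/release /testpool/after.txt              # written with sha256
    zpool replace testpool /var/tmp/d1 /var/tmp/d2   # resilver copies blocks as-is
    # zdb's block-pointer output will still show the old checksum on before.txt
    zpool destroy testpool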

Cindy

On 10/02/09 08:44, Ray Clark wrote:

Replying to Cindys Oct 1, 2009 3:34 PM post:

Thank you.   The second part was my attempt to guess my way out of this.  If the fundamental structure of the pool (That which was created before I set the checksum=sha256 property) is using fletcher2, perhaps as I use the pool all of this structure will be updated, and therefore automatically migrate to the new checksum.  It would be very difficult for me to recreate the pool, but I have space to duplicate the user files (and so get the new checksum).  Perhaps this will also result in the underlying structure of the pool being converted in the course of normal use.  


Comments for or against?



Re: [zfs-discuss] Best way to convert checksums

2009-10-02 Thread Richard Elling

On Oct 2, 2009, at 7:46 AM, Ray Clark wrote:


Replying to relling's October 1, 2009 3:34 post:

Richard, regarding when a pool is created, there is only metadata  
which uses fletcher4.  Was this true in U4, or is this a new change  
of default with U4 using fletcher2?  Similarly, did the uberblock
use sha256 in U4?  I am running U4.


ZFS uses different checksums for different things. Briefly,

use           checksum
------------  -------------------------------------------------
uberblock     SHA-256, self-checksummed
labels        SHA-256
metadata      fletcher4
data          fletcher2 (default), set with checksum parameter
ZIL log       fletcher2, self-checksummed
gang block    SHA-256, self-checksummed

The parent holds the checksum for an entity that is not self-checksummed.

The big question, that is currently unanswered, is do we see single
bit faults in disk-based storage systems? The answer to this question
must be known before the effectiveness of a checksum can be evaluated.
The overwhelming empirical evidence suggests that fletcher2 catches
many storage system corruptions.
 -- richard



Re: [zfs-discuss] Best way to convert checksums

2009-10-02 Thread Miles Nordin
 re == Richard Elling richard.ell...@gmail.com writes:
 r == Ross  myxi...@googlemail.com writes:

re The answer to this question must be known before the
re effectiveness of a checksum can be evaluated.

...well...we can use math to know that a checksum is effective.  What
you are really suggesting we evaluate ``empirically'' is the degree of
INeffectiveness of the broken checksum.

 r ZFS stores two copies of the metadata for any block, so
 r corrupt metadata really shouldn't happen often.

the other copy probably won't be read if the first copy read has a
valid checksum.  I think it'll more likely just lazy-panic instead.
If that's the case, the two copies won't help cover up the broken
checksum bug.  but Richard's table says metadata has fletcher4 which
the OP said is as good as the correct algorithm would have been, even
in its broken implementation, so long as it's only used up to
128kByte.  It's only data and ZIL that has the relevantly-broken
checksum, according to his math.

re The overwhelming empirical evidence suggests that fletcher2
re catches many storage system corruptions.

What do you mean by the word ``many''?  It's a weasel-word.  It
basically means, AFAICT, ``the broken checksum still trips
sometimes.''  But have you any empirical evidence about the fraction
of real world errors which are still caught by the broken checksum
vs. those that are not?  I don't see how you could.

How about cases where checksums are not used to correct bit-flip
gremlins but relied upon to determine whether a data structure is
fully present (committed) yet, like in the ZIL, or to determine which
half of a mirror is stale---these are cases where checksums could be
wrong even if the storage subsystem is functioning in an ideal way.

Checksum weakness on ZFS where checksums are presumed good by other
parts of the design could potentially be worse overall than a
checksumless design.  That's not my impression, but it's the right
place to put the bar.  Ray's ``well at least it's better than no
checksums'' is wrong because it presumes ZFS could function as well as
another filesystem if ZFS were using a hypothetical null checksum.  It
couldn't.

Anyway I'm glad the problem is both fixed and also avoidable on the
broken systems.  I just think the doublespeak after the fact is, once
again, not helping anyone.




Re: [zfs-discuss] Best way to convert checksums

2009-10-02 Thread Richard Elling

Hi Miles, good to hear from you again.

On Oct 2, 2009, at 1:20 PM, Miles Nordin wrote:


re == Richard Elling richard.ell...@gmail.com writes:
r == Ross  myxi...@googlemail.com writes:


   re The answer to this question must be known before the
   re effectiveness of a checksum can be evaluated.

...well...we can use math to know that a checksum is effective.  What
you are really suggesting we evaluate ``empirically'' is the degree of
INeffectiveness of the broken checksum.


By your logic, SECDED ECC for memory is broken because it only
corrects 1 bit per symbol and only detects brokenness of 2 bits per
symbol. However, the empirical evidence suggests that ECC provides
a useful function for many people. Do we know how many triple bit
errors occur in memories? I can compute the probability, but have
never seen a field failure analysis. So, if ECC is good enough for
DRAM, is fletcher2 good enough for storage?

NB, for DRAM the symbol size is usually 64 bits. For the ZFS case, the
symbol size is 4,096 to 1,048,576 bits. AFAIK, no collisions have been
found in SHA-256 digests for symbols of size 1,048,576, but it has not
been proven that they do not exist.


r ZFS stores two copies of the metadata for any block, so
r corrupt metadata really shouldn't happen often.

the other copy probably won't be read if the first copy read has a
valid checksum.  I think it'll more likely just lazy-panic instead.
If that's the case, the two copies won't help cover up the broken
checksum bug.  but Richard's table says metadata has fletcher4 which
the OP said is as good as the correct algorithm would have been, even
in its broken implementation, so long as it's only used up to
128kByte.  It's only data and ZIL that has the relevantly-broken
checksum, according to his math.

   re The overwhelming empirical evidence suggests that fletcher2
   re catches many storage system corruptions.

What do you mean by the word ``many''?  It's a weasel-word.


I'll blame the lawyers. They are causing me to remove certain words
from my vocabulary :-(


 It
basically means, AFAICT, ``the broken checksum still trips
sometimes.''  But have you any empirical evidence about the fraction
of real world errors which are still caught by the broken checksum
vs. those that are not?  I don't see how you could.


Question for the zfs-discuss participants, have you seen a data corruption
that was not detected when using fletcher2?

Personally, I've seen many corruptions of data stored on file systems
lacking checksums.


How about cases where checksums are not used to correct bit-flip
gremlins but relied upon to determine whether a data structure is
fully present (committed) yet, like in the ZIL, or to determine which
half of a mirror is stale---these are cases where checksums could be
wrong even if the storage subsystem is functioning in an ideal way.

Checksum weakness on ZFS where checksums are presumed good by other
parts of the design could potentially be worse overall than a
checksumless design.  That's not my impression, but it's the right
place to put the bar.  Ray's ``well at least it's better than no
checksums'' is wrong because it presumes ZFS could function as well as
another filesystem if ZFS were using a hypothetical null checksum.  It
couldn't.


I'm in Ray's camp. I've got far too many scars from data corruption and I'd
rather not add more.
 -- richard



Anyway I'm glad the problem is both fixed and also avoidable on the
broken systems.  I just think the doublespeak after the fact is, once
again, not helping anyone.


Re: [zfs-discuss] Best way to convert checksums

2009-10-02 Thread Ray Clark
Replying to hakanson's Oct 2, 2009 2:01 post:

Thanks.  I suppose it is true that I am not even trying to compare the 
peripheral stuff, and simple presence of a file and the data matching covers 
some of them.  

Using it for moving data, one encounters a longer list:  sparse files, ACL 
handling, extended attributes, length of filenames, length of pathnames, large 
files.  And probably other interesting things that may not be handled 
correctly.

Most information about misbehavior of the various archive / backup / data 
movement utilities is very old.  One wonders how they behave today.  This would 
be a useful compilation, but I can't do it.


Re: [zfs-discuss] Best way to convert checksums

2009-10-02 Thread Ray Clark
Cindy's Oct 2, 2009 2:59 post:  Thanks for staying with me.

Re: The checksums are set on the file systems not the pool.:

But previous responses seem to indicate that I can set them for files stored in 
the filesystem that appears to be the pool, at the pool level, before I create 
any new ones.  One post seems to indicate that there is a checksum property for 
this file system, and independently for the pool.  (This topic needs a 
picture).  

Re: If a new checksum is set and *you* rewrite the data ... then the 
duplicated data will have the new checksum.

Understand.  Now I am on to being concerned for the blocks that comprise the 
zpool that *contain* the file system.

Re: ZFS doesn't rewrite data as part of normal operations.  I confirmed with a 
simple test (like Darren's) that even if you have a single-disk pool and the 
disk is replaced and all the data is resilvered and a new checksum is set, 
you'll see data with the previous checksum and the new checksum.

Yes, ... a resilver duplicates exactly.  Darren's example showed that without 
the -R, no properties were sent and the zfs receive had no choice but to use 
the pool default for the zfs filesystem that it created.  This also implies 
that there was a property associated with the pool.  So my previous comment 
about zfs send/receive not duplicating exactly was not fair.  The man page / 
admin guide should be clear as to what is sent without -R.  I would have 
guessed everything, just not descendent file systems.  It is a shame that zdb 
is totally undocumented.  I thought I had discovered a gold mine when I first 
read Darren's note!
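
For what it is worth, zdb does take a handful of single-letter options even though they are undocumented; from memory (details vary by release, so treat this as a rough guide):

    zdb zfs01      # dump the pool configuration, uberblocks, and dataset/object summaries
    zdb -C zfs01   # just the cached pool configuration
    zdb -b zfs01   # traverse the pool and account for every allocated block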

--Ray


Re: [zfs-discuss] Best way to convert checksums

2009-10-02 Thread Ray Clark
Re: relling's Oct 2, 2009 3:26 Post:

(1) Is this list everything?
(2) Is this the same for U4?
(3) If I change the zpool checksum property on creation as you indicated in 
your Oct 1, 12:51 post (evidently very recent versions only), does this change 
the checksums used for this list?  Why would not the strongest checksum be used 
for the most fundamental data rather than fool around, allowing the user to 
compromise only when the tradeoff pays back on the 99% bulk of the data?

Re: The big question, that is currently unanswered, is do we see single bit 
faults in disk-based storage systems?

I don't think this is the question.  I believe the implication of schlie's post 
is not that single bit faults will get through, but that the current fletcher2 
is equivalent to a single bit checksum.  You could have 1,000 bits in error, or 
4095, and still have a 50-50 chance of detecting it.  A single bit error would 
be certain to be detected (I think) even with the current code.
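
One way to see how bad it is -- this is my reading of the OpenSolaris 
fletcher_2_native source, simplified by ignoring the two interleaved streams, 
so treat the details as illustrative rather than authoritative.  The checksum 
is essentially two running 64-bit sums over the 64-bit words w of the block:

   a = (a + w) mod 2^64
   b = (b + a) mod 2^64

Flipping the most significant bit of one word changes that word by +2^63 or 
-2^63, and both are congruent to 2^63 mod 2^64, so the final a always shifts by 
exactly 2^63.  Flip the top bit of two words and a shifts by 2*2^63 = 2^64, 
i.e. not at all; b shifts by (n-i)*2^63 + (n-j)*2^63 for word positions i and j 
in a block of n words, which is also 0 mod 2^64 whenever i and j have the same 
parity.  So whole families of multi-bit corruptions confined to the high-order 
bits cancel out completely and are never detected, which is the sense in which 
the implementation is far weaker than a proper Fletcher checksum.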
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Best way to convert checksums

2009-10-02 Thread Ray Clark
Re: Miles Nordin Oct 2, 2009 4:20:

Re: Anyway, I'm glad the problem is both fixed...

I want to know HOW it can be fixed?  If they fixed it, this will invalidate 
every pool that has not been changed from the default (Probably almost all of 
them!).  This can't be!  So what WAS done?  In the interest of honesty in 
advertising and enabling people to evaluate their own risks, I think we should 
know how it was fixed.  Something either ingenious or potentially misleading 
must have been done.  I am not suggesting that it was not the best way to 
handle a difficult situation, but I don't see how it can be transparent.  If 
the string fletcher2 does the same thing, it is not fixed.  If it does 
something different, it is misleading.  

... and avoidable on the broken systems.

Please tell me how!  Without destroying and recreating my zpool, I can only fix 
the zfs file system blocks, not the underlying zpool blocks.  WITH destroying 
and recreating my zpool, I can only control the checksum on the underlying 
zpool using a version of Solaris that is not yet available.  And even then (pending 
relling's response) it may or may not *still* affect the blocks I am concerned 
about.  So how is this avoidable?  It is partially avoidable (so far) IF I have 
the luxury of doing significant rebuilding.  No?
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Best way to convert checksums

2009-10-02 Thread Richard Elling

On Oct 2, 2009, at 3:05 PM, Ray Clark wrote:


Re: relling's Oct 2, 2009 3:26 Post:

(1) Is this list everything?


AFAIK


(2) Is this the same for U4?


Yes.  This hasn't changed in a very long time.

(3) If I change the zpool checksum property on creation as you  
indicated in your Oct 1, 12:51 post (evidently very recent versions  
only), does this change the checksums used for this list?  Why would  
not the strongest checksum be used for the most fundamental data  
rather than fool around, allowing the user to compromise only when  
the tradeoff pays back on the 99% bulk of the data?


Performance.  Many people value performance over dependability.

Re: The big question, that is currently unanswered, is do we see  
single bit faults in disk-based storage systems?


I don't think this is the question.  I believe the implication of  
schlie's post is not that single bit faults will get through, but  
that the current fletcher2 is equivalent to a single bit checksum.   
You could have 1,000 bits in error, or 4095, and still have a 50-50  
chance of detecting it.  A single bit error would be certain to be  
detected (I think) even with the current code.


I don't believe schlie posted the number of fletcher2 collisions for the
symbol size used by ZFS. I do not believe it will be anywhere near
50% collisions.
 -- richard

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Best way to convert checksums

2009-10-02 Thread Ray Clark
Re: relling's Oct 2 5:06 Post:

Re: analogy to ECC memory... 

I appreciate the support, but the ECC memory analogy does not hold water.  ECC 
memory is designed to correct for multiple independent events, such as 
electrical noise, bits flipped due to alpha particles from the DRAM package, or 
cosmic rays.  The probability of these independent events coinciding in time 
and space is very small indeed.  It works well.  

ZFS does purport to cover errors such as these in the crummy double-layer 
boards without sufficient decoupling, microcontrollers and memories without 
parity or ECC, etc. found in the cost-reduced-to-the-razor's-edge hardware most 
of us run on, but it also covers system-level errors such as entire blocks 
being replaced, or large fractions of them being corrupted by high-level bugs.  
With the current fletcher2 we have only a 50-50 chance of catching these 
multi-bit errors.  The probability of multiple bits being changed is not small, 
because the probabilities of the error mechanism affecting the 4096~1048576 
bits in the block are not independent.  Indeed, in many of the show-cased 
mechanisms, it is a sure bet - the entire disk sector is written with the wrong 
data, for sure!  Although there is a good chance that many of the bits in the 
sector happen to match, there is an excellent chance that many are different.  
And the mechanisms that caused these differences were not independent.

Re: AFAIK, no collisions have been found in SHA-256 digests for symbols of 
size 1,048,576, but it has not been proven that they do not exist

For sure they exist: by simple counting, for every SHA256 digest there are on 
average 2^(1,048,576 - 256) distinct 1,048,576-bit blocks that will produce it.  
One hopes that the same properties that make SHA256 a good cryptographic hash 
also make it a good hash, period.  This, I admit, is a leap of ignorance (at 
least I know what cliff I am leaping off of).

Regarding the question of what people have seen, I have seen lots of 
unexplained things happen, and by definition one never knows why.  I am not 
interested in seeing any more.  I see the potential for disaster, and my time, 
and the time of my group, is better spent doing other things.  That is why I 
moved to ZFS.
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Best way to convert checksums

2009-10-02 Thread Miles Nordin
 re == Richard Elling richard.ell...@gmail.com writes:

re By your logic, SECDED ECC for memory is broken because it only
re corrects

ECC is not a checksum.

Go ahead, get out your dictionary, enter severe-pedantry-mode.  But it
is relevantly different.  In data transmission scenarios, for example,
FECs like ECC are often used along with a strong non-correcting
checksum over a larger block.

The OP further described scenarios plausible for storage, like ``long
string of zeroes with 1 bit flipped'', that produce collisions with
the misimplemented fletcher2 (but, obviously, not with any strong
checksum like correct-fletcher2).

re is fletcher2 good enough for storage?

yes, it probably is good enough, but ZFS implements some other broken
algorithm and calls it fletcher2.  so, please stop saying fletcher2.

re I'll blame the lawyers. They are causing me to remove certain
re words from my vocabulary :-(

yeah, well, allow me to add a word back to the vocabulary: BROKEN.

If you are not legally allowed to use words like broken and working,
then find another identity from which to talk, please.

re Question for the zfs-discuss participants, have you seen a
re data corruption that was not detected when using fletcher2?

This is ridiculous.  It's not fletcher2, it's brokenfletcher2.  It's
avoidably extremely weak.  It's reasonable to want to use a real
checksum, and this PR game you are playing is frustrating and
confidence-harming for people who want that.  

This does not have to become a big deal, unless you try to spin it
with a 7200rpm PR machine like IBM did with their broken Deathstar
drives before they became HGST.

Please, what we need to do is admit that the checksum is relevantly
broken in a way that compromises the integrity guarantees with which
ZFS was sold to many customers, fix the checksum, and learn how to
conveniently migrate our data.

Based on the table you posted, I guess file data can be set to
fletcher4 or sha256 using filesystem properties to work around the
bug on Solaris versions with the broken implementation.

 1. What's needed to avoid fletcher2 on the ZIL on broken Solaris
versions?

 2. I understand the workaround, but not the fix.  

How does the fix included S10u8 and snv_114 work?  Is there a ZFS
version bump?  Does the fix work by implementing fletcher2
correctly?  or does it just disable fletcher2 and force everything
to use brokenfletcher4 which is good enough?  If the former, how
are the broken and correct versions of fletcher2
distinguished---do they show up with different names in the pool
properties?

Once you have the fixed software, how do you make sure fixed
checksums are actually covering data blocks originally written by
old broken software?  I assume you have to use rsync or zfs
send/recv to rewrite all the data with the new checksum?  If yes,
what do you have to do before rewriting---upgrade solaris and then
'zfs upgrade' each filesystem one by one?  Will zfs send/recv work
across the filesystem versions, or does the copying have to be
done with rsync?

 3. speaking of which, what about the checksum in zfs send streams?
is it also fletcher2, and if so was it also fixed in
s10u8/snv_114, and how does this affect compatibility for people
who have ignored my advice and stored streams instead of zpools?
Will a newer 'zfs recv' always work with an older 'zfs send' but
not the other way around?

there is basically no information about implementing the fix in the
bug, and we can't write to the bug from outside Sun.  Whatever
sysadmins need to do to get their data under the strength of checksum
they thought it was under, it might be nice to describe it in the bug
for whoever gets referred to the bug and has an affected version.


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Best way to convert checksums

2009-10-02 Thread Ray Clark
Let me try to refocus:

Given that I have a U4 system with a zpool created with Fletcher2:

What blocks in the system are protected by Fletcher2, or even Fletcher4 
(although that does not worry me so much)?

Given that I only have 1.6TB of data in a 4TB pool, what can I do to change 
those blocks to sha256 or Fletcher4:

(1) Without destroying and recreating the zpool under U4

(2) With destroying and recreating the zpool under U4 (Which I don't really 
have the resources to pull off)

(3) With upgrading to U7 (Perhaps in a few months)

(4) With upgrading to U8

Thanks.
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Best way to convert checksums

2009-10-02 Thread Richard Elling

On Oct 2, 2009, at 3:44 PM, Ray Clark wrote:


Let me try to refocus:

Given that I have a U4 system with a zpool created with Fletcher2:

What blocks in the system are protected by Fletcher2, or even  
Fletcher4 although that does not worry me so much.


Given that I only have 1.6TB of data in a 4TB pool, what can I do to  
change those blocks to sha256 or Fletcher4:


(1) Without destroying and recreating the zpool under U4

(2) With destroying and recreating the zpool under U4 (Which I don't  
really have the resources to pull off)


(3) With upgrading to U7 (Perhaps in a few months)

(4) With upgrading to U8


This has been answered several times in this thread already:

   zfs set checksum=sha256 filesystem

then copy your files -- all newly written data will have the sha256
checksums.
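
For example, something along these lines (the dataset names are just 
illustrative, and the copy tool is up to you):

   zfs set checksum=sha256 zfs01
   zfs create zfs01/home.sha256              (the new dataset inherits sha256)
   cp -rp /zfs01/home/* /zfs01/home.sha256/  (or rsync; watch out for dot-files
                                              at the top level with the * glob)
   zfs get checksum zfs01/home.sha256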


 -- richard

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Best way to convert checksums

2009-10-02 Thread Richard Elling

On Oct 2, 2009, at 3:36 PM, Miles Nordin wrote:


re == Richard Elling richard.ell...@gmail.com writes:


   re By your logic, SECDED ECC for memory is broken because it only
   re corrects

ECC is not a checksum.


SHA-256 is not a checksum, either, but that isn't the point. The concern is
that corruption can be detected.  ECC has very, very limited detection
capabilities, yet it is good enough for many people. We know that
MOS memories have certain failure modes that cause bit flips and by
using ECC and interleaving, the dependability is improved. The big
question is, what does the corrupted data look like in storage? Random
bit flips? Big chunks of zeros? 55aa patterns? Since the concern with
the broken fletcher2 is restricted to the most significant bits, we are
most concerned with failures where the most significant bits are set to
ones. But as I said, we have no real idea what the corrupted data
should look like, and if it is zero-filled, then fletcher2 will catch it.



Go ahead, get out your dictionary, enter severe-pedantry-mode.  but it
is relevantly different.  In for example data transmission scenarios,
FEC's like ECC are often used along with a strong noncorrecting
checksum over a larger block.

The OP further described scenarios plausible for storage, like ``long
string of zeroes with 1 bit flipped'', that produce collisions with
the misimplemented fletcher2 (but, obviously, not with any strong
checksum like correct-fletcher2).

   re is fletcher2 good enough for storage?

yes, it probably is good enough, but ZFS implements some other broken
algorithm and calls it fletcher2.  so, please stop saying fletcher2.


If I were to refer to Fletcher's algorithm, I would use Fletcher.  When I
am referring to the ZFS checksum setting of fletcher2 I will continue
to use fletcher2.


   re I'll blame the lawyers. They are causing me to remove certain
   re words from my vocabulary :-(

yeah, well, allow me to add a word back to the vocabulary: BROKEN.

If you are not legally allowed to use words like broken and working,
then find another identity from which to talk, please.

   re Question for the zfs-discuss participants, have you seen a
   re data corruption that was not detected when using fletcher2?

This is ridiculous.  It's not fletcher2, it's brokenfletcher2.  It's
avoidably extremely weak.  It's reasonable to want to use a real
checksum, and this PR game you are playing is frustrating and
confidence-harming for people who want that.


There is no PR campaign. It is what it is. What is done is done.


This does not have to become a big deal, unless you try to spin it
with a 7200rpm PR machine like IBM did with their broken Deathstar
drives before they became HGST.

Please, what we need to do is admit that the checksum is relevantly
broken in a way that compromises the integrity guarantees with which
ZFS was sold to many customers, fix the checksum, and learn how to
conveniently migrate our data.


Unfortunately, there is a backwards compatibility issue that
requires the current fletcher2 to live for a very long time. The
only question for debate is whether it should be the default.
To date, I see no field data that suggests it is not detecting
corruption.


Based on the table you posted, I guess file data can be set to
fletcher4 or sha256 using filesystem properties to work around the
bug on Solaris versions with the broken implementation.

1. What's needed to avoid fletcher2 on the ZIL on broken Solaris
   versions?


Please file RFEs at bugs.opensolaris.org


2. I understand the workaround, but not the fix.

   How does the fix included S10u8 and snv_114 work?  Is there a ZFS
   version bump?  Does the fix work by implementing fletcher2
   correctly?  or does it just disable fletcher2 and force everything
   to use brokenfletcher4 which is good enough?  If the former, how
   are the broken and correct versions of fletcher2
   distinguished---do they show up with different names in the pool
   properties?


As best I can tell, the comments are changed to indicate fletcher2 is
deprecated. However, it must live on (forever) because of backwards
compatibility. I presume one day the default will change to fletcher4
or something else. This is implied by zfs(1m):

 checksum=on | off | fletcher2 | fletcher4 | sha256

 Controls the checksum used to verify data integrity. The
 default  value  is  on,  which  automatically selects an
 appropriate algorithm (currently,  fletcher2,  but  this
 may  change  in future releases). The value off disables
 integrity checking on user data. Disabling checksums  is
 NOT a recommended practice.
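
To see what a given pool's datasets are using today, something like the 
following works (the pool name here is just an example):

  zfs get -r checksum zfs01

The SOURCE column shows whether the value is the default, was set locally, or 
was inherited.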


   Once you have the fixed software, how do you make sure fixed
   checksums are actually covering data blocks originally written by
   old broken software?  I assume you have to use rsync or zfs
   send/recv to rewrite all the data with the new checksum?  If yes,
   what do you have to do before rewriting---upgrade 

Re: [zfs-discuss] Best way to convert checksums

2009-10-01 Thread Darren J Moffat

Ray Clark wrote:

Dynamite!

I don't feel comfortable leaving things implicit.  That is how misunderstandings happen.  


It isn't implicit; it is explicitly inherited.  That is how ZFS is designed 
to (and does) work.



Would you please acknowledge that zfs send | zfs receive uses the checksum 
setting on the receiving pool instead of preserving the checksum algorithm used 
by the sending block?


For now it depends whether or not you pass -R to 'zfs send'. 
Without the -R argument the send stream does not have any properties in 
it, so it will (by design) use those that would be used if the dataset 
was created by 'zfs create'.


In the future there will be a distinction between the local and the 
received values see the recently (yesterday) approved case PSARC/2009/510:


http://arc.opensolaris.org/caselog/PSARC/2009/510/20090924_tom.erickson

Lets look at how it works just now:

portellen:pts/2# zpool create dummy c7t3d0
portellen:pts/2# zfs create dummy/home
portellen:pts/2# cp /etc/profile /dummy/home
portellen:pts/2# zfs get checksum dummy/home
NAME        PROPERTY  VALUE   SOURCE
dummy/home  checksum  on      default
portellen:pts/2# zfs snapshot dummy/home@1
portellen:pts/2# zfs set checksum=sha256 dummy
portellen:pts/2# zfs send dummy/home@1 | zfs recv -F dummy/home.sha256
portellen:pts/2# zfs get checksum dummy/home.sha256
NAME               PROPERTY  VALUE   SOURCE
dummy/home.sha256  checksum  sha256  inherited from dummy

Now let's verify using zdb; we should have two plain file blocks 
(/etc/profile fits in a single ZFS block), one from the original 
dummy/home and one from the newly received home.sha256.


portellen:pts/2# zdb -vvv -S user:all dummy
0	2048	1	ZFS plain file	fletcher4	uncompressed	8040e8f120:a2c635bc0556:73b5ba539e9699:3b4d66984ac9d6b4
0	2048	1	ZFS plain file	SHA256	uncompressed	57f1e8168c58e8cf:3b20be148f57852e:f72ee8e3358f:1bfae4ae0599577c




--
Darren J Moffat
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Best way to convert checksums

2009-10-01 Thread Frank Middleton

On 10/01/09 05:08 AM, Darren J Moffat wrote:


In the future there will be a distinction between the local and the
received values see the recently (yesterday) approved case PSARC/2009/510:

http://arc.opensolaris.org/caselog/PSARC/2009/510/20090924_tom.erickson


Currently non-recursive incremental streams send properties and full
streams don't. Will the p flag reverse its meaning for incremental
streams? For my purposes the current behavior is the exact opposite
of what I need and it isn't obvious that the case addresses this
peculiar inconsistency without going through a lot of hoops. I suppose
the new properties can be sent initially so that subsequent incremental
streams won't override the possibly changed local properties, but that
seems so complicated :-). If I understand the case correctly, we can
now set a flag that says ignore properties sent by any future incremental
non-recursive stream. This instead of having a flag for incremental
streams that says don't send properties. What happens if sometimes
we do and sometimes we don't? Sounds like a static property when a
dynamic flag is really what is wanted and this is a complicated way of
working around a design inconsistency. But maybe I missed something :-)

So what would the semantics of the new p flag be for non-recursive
incremental streams?

Thanks -- Frank
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Best way to convert checksums

2009-10-01 Thread Ray Clark
Darren, thank you very much!  Not only have you answered my question, you have 
made me aware of a tool to verify, and probably do a lot more (zdb).

Can you comment on my concern regarding what checksum is used in the base zpool 
before anything is created in it?  (No doubt my terminology is wrong, but you 
get the idea I am sure).  

The single critical feature of ZFS is debatably that every block on ZFS is 
checksummed to enable detection of corruption, but it appears that the user 
does not have the ability to choose the checksum for the highest levels of the 
pool itself.  Given the issue with fletcher2, this is of concern!  Since this 
activity was kicked off by a Corrupt Metadata ZFS-8000-CS, I am trying to 
move away from fletcher2.  Don't know if that was the cause, but my goal is to 
restore the safety that we went to ZFS for.

Is my understanding correct?
Are there ways to control the checksum algorithm on the empty zpool?

Thanks, again.

--Ray
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Best way to convert checksums

2009-10-01 Thread Richard Elling

On Oct 1, 2009, at 7:10 AM, Ray Clark wrote:

Darren, thank you very much!  Not only have you answered my  
question, you have made me aware of a tool to verify, and probably  
do alot more (zdb).


Can you comment on my concern regarding what checksum is used in the  
base zpool before anything is created in it?  (No doubt my  
terminology is wrong, but you get the idea I am sure).


The single critical feature of ZFS is debatably that every block on  
ZFS is checksummed to enable detection of corruption, but it appears  
that the user does not have the ability to choose the checksum for  
the highest levels of the pool itself.  Given the issue with  
fletcher2, this is of concern!  Since this activity was kicked off  
by a Corrupt Metadata ZFS-8000-CS, I am trying to move away from  
fletcher2.  Don't know if that was the cause, but my goal is to  
restore the safety that we went to ZFS for.


Is my understanding correct?
Are there ways to control the checksum algorithm on the empty zpool?


You can set both zpool (-o option) and zfs (-O option) options when you
create the zpool. See zpool(1m)
 -- richard


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Best way to convert checksums

2009-10-01 Thread Ross
Ray, if you don't mind me asking, what was the original problem you had on your 
system that makes you think the checksum type is the problem?
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Best way to convert checksums

2009-10-01 Thread Ray Clark
U4 zpool does not appear to support the -o option...   The current zpool 
manpage online lists the valid properties for zpool -o, and 
checksum is not one of them.  Are you mistaken or am I missing something?

Another thought is that *perhaps* all of the blocks that comprise an empty 
zpool are re-written sooner or later, and once the checksum is changed with 
zfs set checksum=sha256 zfs01 (The pool name) they will be re-written with 
the new checksum very soon anyway.  Is this true?  This would require an 
understanding of the on-disk structure and when what is rewritten.

--Ray
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Best way to convert checksums

2009-10-01 Thread Ross
Ray, if you use -o it sets properties for the pool.  If you use -O (capital), 
it sets the filesystem properties for the default filesystem created with the 
pool.

zpool create -O can use any valid zfs filesystem property.

But I agree, it's not very clearly documented.
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Best way to convert checksums

2009-10-01 Thread Cindy Swearingen
You are correct. The zpool create -O option isn't available in a Solaris 10 
release but will be soon. This will allow you to set the file system 
checksum property when the pool is created:

# zpool create -O checksum=sha256 pool c1t1d0
# zfs get checksum pool
NAME  PROPERTY  VALUE  SOURCE
pool  checksum  sha256 local

Otherwise, you would have to set it like this:

# zpool create pool c1t1d0
# zfs set checksum=sha256 pool
# zfs get checksum pool
NAME  PROPERTY  VALUE  SOURCE
pool  checksum  sha256 local

I'm not sure I understand the second part of your comments but will add:

If *you* rewrite your data then the new data will contain the new
checksum. I believe an upcoming project will provide the ability to
revise file system properties on the fly.
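
For what it's worth, zdb can show which checksum algorithm actually covers each 
block after the data is rewritten -- an invocation along the lines of

  # zdb -vvv -S user:all pool

lists the user data blocks along with their checksum algorithm.  zdb is 
undocumented, though, so treat it as a debugging aid rather than a supported 
interface.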


On 10/01/09 12:21, Ray Clark wrote:

U4 zpool does not appear to support the -o option...   Reading a current zpool 
manpage online lists the valid properties for the current zpool -o, and 
checksum is not one of them.  Are you mistaken or am I missing something?

Another thought is that *perhaps* all of the blocks that comprise an empty zpool are 
re-written sooner or later, and once the checksum is changed with zfs set 
checksum=sha256 zfs01 (The pool name) they will be re-written with the new checksum 
very soon anyway.  Is this true?  This would require an understanding of the on-disk 
structure and when what is rewritten.

--Ray

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Best way to convert checksums

2009-10-01 Thread Richard Elling
Also, when a pool is created, there is only metadata, which uses fletcher4[*].
So it is not a crime if you set the checksum after the pool is created and 
before data is written :-)

* note: the uberblock uses SHA-256
 -- richard


On Oct 1, 2009, at 12:34 PM, Cindy Swearingen wrote:

You are correct. The zpool create -O option isn't available in a  
Solaris 10 release but will be soon. This will allow you to set the  
file system

checksum property when the pool is created:

# zpool create -O checksum=sha256 pool c1t1d0
# zfs get checksum pool
NAME  PROPERTY  VALUE  SOURCE
pool  checksum  sha256 local

Otherwise, you would have to set it like this:

# zpool create pool c1t1d0
# zfs set checksum=sha256 pool
# zfs get checksum pool
NAME  PROPERTY  VALUE  SOURCE
pool  checksum  sha256 local

I'm not sure I understand the second part of your comments but will  
add:


If *you* rewrite your data then the new data will contain the new
checksum. I believe an upcoming project will provide the ability to
revise file system properties on the fly.


On 10/01/09 12:21, Ray Clark wrote:
U4 zpool does not appear to support the -o option...   Reading a  
current zpool manpage online lists the valid properties for the  
current zpool -o, and checksum is not one of them.  Are you  
mistaken or am I missing something?
Another thought is that *perhaps* all of the blocks that comprise  
an empty zpool are re-written sooner or later, and once the  
checksum is changed with zfs set checksum=sha256 zfs01 (The pool  
name) they will be re-written with the new checksum very soon  
anyway.  Is this true?  This would require an understanding of the  
on-disk structure and when what is rewritten.

--Ray

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Best way to convert checksums

2009-09-30 Thread Darren J Moffat

Ray Clark wrote:

When using zfs send/receive to do the conversion, the receive creates a new 
file system:

   zfs snapshot zfs01/home@before
   zfs send zfs01/home@before | zfs receive afx01/home.sha256

Where do I get the chance to zfs set checksum=sha256 on the new file system 
before all of the files are written ???


Set it on the afx01 dataset before you do the receive and it will be 
inherited.
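
Roughly (using the names from your commands; I have not run this exact 
sequence, so treat it as a sketch):

   zfs set checksum=sha256 afx01
   zfs send zfs01/home@before | zfs receive afx01/home.sha256
   zfs get checksum afx01/home.sha256    (should report sha256, inherited
                                          from afx01)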


--
Darren J Moffat
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Best way to convert checksums

2009-09-30 Thread Ray Clark
I made a typo... I only have one pool.  I should have typed:

   zfs snapshot zfs01/home@before
   zfs send zfs01/home@before | zfs receive zfs01/home.sha256

Does that change the answer?

And independently of whether it does or not, zfs01 is a pool, and the property is on 
the home zfs file system.

I cannot change it on the file system before doing the receive because the file 
system does not exist - it is created by the receive.

This raises a related question: is the file system on the receiving end created 
entirely using the checksum property from the source file system, or are the 
blocks and their present mix of checksums faithfully recreated in the 
received file system?

Finally, is there any way to verify behavior after it is done?

Thanks for helping on this.
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Best way to convert checksums

2009-09-30 Thread Darren J Moffat

Ray Clark wrote:

I made a typo... I only have one pool.  I should have typed:

   zfs snapshot zfs01/home@before
   zfs send zfs01/home@before | zfs receive zfs01/home.sha256

Does that change the answer?


No, it doesn't change my answer.


And independently if it does or not, zfs01 is a pool, and the property is on 
the home zfs file system.


It doesn't matter if zfs01 is the top-level dataset or not.

Before you do the receive do this:

zfs set checksum=sha256 zfs01

--
Darren J Moffat
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Best way to convert checksums

2009-09-30 Thread Ray Clark
Dynamite!

I don't feel comfortable leaving things implicit.  That is how 
misunderstandings happen.  

Would you please acknowledge that zfs send | zfs receive uses the checksum 
setting on the receiving pool instead of preserving the checksum algorithm used 
by the sending block?

Thanks a million!
--Ray
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Best way to convert checksums

2009-09-30 Thread Ray Clark
Sinking feeling...

zfs01 was originally created with fletcher2.  Doesn't this mean that the sort 
of root-level stuff in the zfs pool exists with fletcher2 and so is not well 
protected?

If so, is there a way to fix this short of a backup and restore?
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Best way to convert checksums

2009-09-29 Thread Ray Clark
When using zfs send/receive to do the conversion, the receive creates a new 
file system:

   zfs snapshot zfs01/home@before
   zfs send zfs01/home@before | zfs receive afx01/home.sha256

Where do I get the chance to zfs set checksum=sha256 on the new file system 
before all of the files are written ???

The new filesystem is created automatically by the receive command!

Although it does not say so in the man page or zfs admin guide, it certainly 
seems reasonable that I don't get a chance - the idea is that send/receive 
recreates the file system exactly.  

This would still have an ambiguity as to whether the new blocks are 
created/copied with the checksum algorithm they had in the source filesystem 
(Which would not result in the conversion I am trying to accomplish), or are 
they created and checksumed with the algorithm specified by the checksum 
PROPERTY set in the source file system at the time of the send/receive (which 
WOULD do the conversion I am trying to accomplish)?

Is there a way to use send/receive to duplicate a filesystem with a different 
checksum, or do I use cpio or tar?  (I pick on cpio and tar because they are 
specifically called out in the zfs admin manual as saving and restoring zfs 
file attributes and ACLs).

Thanks.

--Ray
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Best way to convert checksums

2009-09-26 Thread Orvar Korvar
I had this same question. I was recommended to use rsync or zfs send. I used 
both just to be safe. With zfs send, you create a snapshot and then send the 
snapshot. After deleting the snapshot on the target, you have identical copies. 
rsync seems to be used for this task also. And also zfs send.
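
If you go the rsync route, note that a plain archive copy like

   rsync -a /zfs01/home/ /zfs01/home.sha256/

carries the file data and ordinary permissions but may not carry ZFS/NFSv4 ACLs 
or extended attributes, so spot-check those afterwards.  The paths here are 
just placeholders.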
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Best way to convert checksums

2009-09-25 Thread Ray Clark
What is the best way to convert the checksums of an existing ZFS file system 
from one checksum to another?  To me "best" means safest and most complete.

My zpool is 39% used, so there is plenty of space available.

Thanks.
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Best way to convert checksums

2009-09-25 Thread Ray Clark
I didn't want my question to lead toward a particular answer, but perhaps I 
should have given more information.  My idea is to copy the file system with one of the following:
   cp -rp
   zfs send | zfs receive
   tar
   cpio
But I don't know what would be the best.

Then I would do a diff -r on them before deleting the old.

I don't know the obscure (for me) secondary things like attributes, links, 
extended modes, etc.

Thanks again.
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss