Re: [zfs-discuss] Access to ZFS checksums would be nice and very useful feature

2006-09-15 Thread Ceri Davies
On Thu, Sep 14, 2006 at 05:08:18PM -0500, Nicolas Williams wrote:
 On Thu, Sep 14, 2006 at 10:32:59PM +0200, Henk Langeveld wrote:
  Bady, Brant RBCM:EX wrote:
  Part of the archiving process is to generate checksums (I happen to use
  MD5), and store them with other metadata about the digital object in
  order to verify data integrity and demonstrate the authenticity of the
  digital object over time.
  
  Wouldn't it be helpful if there was a utility to access/read  the
  checksum data created by ZFS, and use it for those same purposes.
  
  Doesn't ZFS use block-level checksums?
 
 Yes, but the checksum is stored with the pointer.
 
 So then, for each file/directory there's a dnode, and that dnode has
 several block pointers to data blocks or indirect blocks, and indirect
 blocks have pointers to... and so on.

Does ZFS have block fragments?  If so, then updating an unrelated file
would change the checksum.

Ceri
-- 
That must be wonderful!  I don't understand it at all.
  -- Moliere


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Access to ZFS checksums would be nice and very useful feature

2006-09-15 Thread Luke Scharf

Luke Scharf wrote:
It sounded to me like he wanted to implement tripwire, but save some 
time and CPU power by querying the checksumming-work that was already 
done by ZFS.
Never mind.  The e-mail client that I chose to use broke up the thread, 
and I didn't see that the issue had already been thoroughly discussed.


-Luke





Re: [zfs-discuss] Access to ZFS checksums would be nice and very useful feature

2006-09-15 Thread Nicolas Williams
On Fri, Sep 15, 2006 at 09:31:04AM +0100, Ceri Davies wrote:
 On Thu, Sep 14, 2006 at 05:08:18PM -0500, Nicolas Williams wrote:
  Yes, but the checksum is stored with the pointer.
  
  So then, for each file/directory there's a dnode, and that dnode has
  several block pointers to data blocks or indirect blocks, and indirect
  blocks have pointers to... and so on.
 
 Does ZFS have block fragments?  If so, then updating an unrelated file
 would change the checksum.

No.  It has variable sized blocks.

A block pointer in ZFS is much more than just a block number.  Among
other things a block pointer has the checksum of the block it points to.
See the on-disk layout document for more info.

There is no way that updating one file could change another's checksum.

What does matter is that the ZFS checksum of a file, to be O(1), depends
on the on-disk layout of the file, and anything that would change that
(today nothing would) would change the ZFS checksum of the file.  So I
think that ZFS checksums, if exposed, are best left as a file change
test optimization, not as an actual checksum of the file.
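
To illustrate the layout dependence, here is a toy sketch (not ZFS code).
It assumes fixed-size blocks, SHA-256, and a single level of pointers, and
the function names are made up for the example.  A checksum taken over the
per-block checksums changes when the block size changes, even though the
file contents are identical:

```python
import hashlib


def block_checksums(data, block_size):
    """Split data into fixed-size blocks and checksum each block."""
    return [hashlib.sha256(data[i:i + block_size]).digest()
            for i in range(0, len(data), block_size)]


def pointer_checksum(data, block_size):
    """Checksum over the per-block checksums, not over the data itself."""
    h = hashlib.sha256()
    for c in block_checksums(data, block_size):
        h.update(c)
    return h.hexdigest()


data = b"archive me " * 1000

# Same contents, same layout: the pointer-level checksum is stable.
same_layout = pointer_checksum(data, 4096) == pointer_checksum(data, 4096)

# Same contents, different block size: the pointer-level checksum differs.
other_layout = pointer_checksum(data, 4096) == pointer_checksum(data, 8192)

# A checksum of the contents alone does not depend on the layout at all.
content_digest = hashlib.sha256(data).hexdigest()
```

Here same_layout is true and other_layout is false, which is why the
pointer-level value works as a cheap change test but not as a portable
checksum of the file.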

Nico
-- 


Re: [zfs-discuss] Access to ZFS checksums would be nice and very useful feature

2006-09-14 Thread Chad Lewis


On Sep 14, 2006, at 1:32 PM, Henk Langeveld wrote:


 Bady, Brant RBCM:EX wrote:
 Part of the archiving process is to generate checksums (I happen to use
 MD5), and store them with other metadata about the digital object in
 order to verify data integrity and demonstrate the authenticity of the
 digital object over time.

 Wouldn't it be helpful if there was a utility to access/read  the
 checksum data created by ZFS, and use it for those same purposes.

 Doesn't ZFS use block-level checksums?
 Hoping to see something like that in a future release, or a command line
 utility that could do the same.

 It might be possible to add a user set property to a file with the md5sum
 and a timestamp when it was computed.

 But what would this protect against?  If you need to avoid tampering, you
 need the checksums offline anyway - cf. tripwire.

 Cheers,
 Henk



Better still would be the forthcoming cryptographic extensions in some
kind of digital-signature mode.

ckl



Re: [zfs-discuss] Access to ZFS checksums would be nice and very useful feature

2006-09-14 Thread James C. McPherson

Bady, Brant RBCM:EX wrote:
I am working in the area of archiving (in the true sense of the word - 
e.g. using the OAIS reference model) electronic data for long term 
preservation and access.  ZFS now makes magnetic disk arrays a bit more 
suitable for that.


Part of the archiving process is to generate checksums (I happen to use 
MD5), and store them with other metadata about the digital object in 
order to verify data integrity and demonstrate the authenticity of the 
digital object over time.


Wouldn’t it be helpful if there was a utility to access/read  the 
checksum data created by ZFS, and use it for those same purposes.



Would you want a single checksum per file, or the list of every checksum
for every block that the file referenced?

The second option might get unwieldy.

The first option - a meta-checksum if you like - would require some
interesting design.



James C. McPherson
--
Solaris kernel software engineer, system admin and troubleshooter
  http://www.jmcp.homeunix.com/blog
Find me on LinkedIn @ http://www.linkedin.com/pub/2/1ab/967


RE: [zfs-discuss] Access to ZFS checksums would be nice and very useful feature

2006-09-14 Thread Bady, Brant RBCM:EX
Actually, to clarify: what I want to do is read the checksums that ZFS
creates for a file and then store them in an external system, most
likely an Oracle database.

It's just a way of avoiding having to compute MD5s on everything when ZFS
is doing checksums as well.

If ZFS does block-level checksums, then I guess it's not so easy to use
them in that way.

I will check out the crypto extensions when they become available.

Thanks


Brant Bady
Access and Information Management
Royal British Columbia Museum
Telephone:  (250) 387-4126
Email:  [EMAIL PROTECTED]



-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] 
Sent: Thursday, September 14, 2006 1:46 PM
To: Henk Langeveld
Cc: Bady, Brant RBCM:EX; zfs-discuss@opensolaris.org
Subject: Re: [zfs-discuss] Access to ZFS checksums would be nice and
very useful feature


On Sep 14, 2006, at 1:32 PM, Henk Langeveld wrote:

 Bady, Brant RBCM:EX wrote:
 Part of the archiving process is to generate checksums (I happen to
 use MD5), and store them with other metadata about the digital object
 in order to verify data integrity and demonstrate the authenticity of
 the digital object over time.

 Wouldn't it be helpful if there was a utility to access/read  the 
 checksum data created by ZFS, and use it for those same purposes.

 Doesn't ZFS use block-level checksums?
 Hoping to see something like that in a future release, or a command 
 line utility that could do the same.

 It might be possible to add a user set property to a file with the 
 md5sum and a timestamp when it was computed.

 But what would this protect against?  If you need to avoid tampering, 
 you need the checksums offline anyway - cf. tripwire.

 Cheers,
 Henk


Better still would be the forthcoming cryptographic extensions in some
kind of digital-signature mode.

ckl




Re: [zfs-discuss] Access to ZFS checksums would be nice and very useful feature

2006-09-14 Thread Nicolas Williams
On Thu, Sep 14, 2006 at 10:32:59PM +0200, Henk Langeveld wrote:
 Bady, Brant RBCM:EX wrote:
 Part of the archiving process is to generate checksums (I happen to use
 MD5), and store them with other metadata about the digital object in
 order to verify data integrity and demonstrate the authenticity of the
 digital object over time.
 
 Wouldn't it be helpful if there was a utility to access/read  the
 checksum data created by ZFS, and use it for those same purposes.
 
 Doesn't ZFS use block-level checksums?

Yes, but the checksum is stored with the pointer.

So then, for each file/directory there's a dnode, and that dnode has
several block pointers to data blocks or indirect blocks, and indirect
blocks have pointers to... and so on.

If a bit of data in a file changes, then a new block will be written,
and the pointer to the previous block will be changed in the indirect
block that pointed to it or the dnode itself if there was no indirect
block, and so on, and a new block will be written for each indirect
block and dnode so modified.  All in one transaction.  That's how COW
works.

And this will necessarily change any checksum of the dnode itself
(assuming there are no collisions in the checksum algorithm).
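
As a rough illustration of that propagation, here is a toy model, not the
real on-disk format.  The fan-out of 2, SHA-256, and the names cksum and
dnode_checksum are assumptions made for the example.  Rewriting one data
block changes the checksum held by its parent, and so on up to the top:

```python
import hashlib


def cksum(b):
    """Checksum of one block (data block or block of child checksums)."""
    return hashlib.sha256(b).digest()


def dnode_checksum(blocks, fanout=2):
    """Hash block checksums level by level until one top-level value is left."""
    level = [cksum(b) for b in blocks]
    while len(level) > 1:
        # Each "indirect block" stores the checksums of its children.
        level = [cksum(b"".join(level[i:i + fanout]))
                 for i in range(0, len(level), fanout)]
    return level[0].hex()


blocks = [b"block-%d" % i for i in range(8)]
before = dnode_checksum(blocks)

# Copy-on-write: the old blocks are left alone and a new version is written.
new_blocks = list(blocks)
new_blocks[5] = b"block-5 with one changed bit"
after = dnode_checksum(new_blocks)

changed = before != after
```

The old version still checksums to the old value, and the new top-level
value differs from it, barring collisions.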

So, a checksum of a dnode will capture the entire file's contents and
meta-data.  Read from the file, update the atime, and so change its
checksum.  ZFS could export a dnode checksum that only covers the data,
and another that covers both, data and meta-data.

Of course, a filesystem scrub (if one is implemented, but I think it
will be necessary) would change all such checksums.  So these checksums
may not have the desired property.

 Hoping to see something like that in a future release, or a command line
 utility that could do the same.
 
 It might be possible to add a user set property to a file with the md5sum
 and a timestamp when it was computed.

That would be slow.

 But what would this protect against?  If you need to avoid tampering, you
 need the checksums offline anyway - cf. tripwire.

ZFS can very quickly compute a checksum of a file's data by checksumming
all the top-level block pointers in the file's dnode.  Or the data and
meta-data by checksumming the entire dnode.  That's O(1), no matter how
large the file.  That'd be nice indeed!

But because of the semantics for when such checksums can/could change
(see above), ZFS checksums can only be used to detect the possibility of
change, and so there may be false positives, IMO.  Which means that for
tamper detection one would need to compute a checksum of the file
contents and then store it and the ZFS checksum together, using the ZFS
checksum only as a way to optimize against checksumming the entire file
most of the time.
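
A minimal sketch of that two-tier check follows.  ZFS does not export such
a checksum today, so fast_checksum is a hypothetical caller-supplied
function standing in for it.  Record, content_checksum and verify are also
names invented for this example, and SHA-256 is an arbitrary choice:

```python
import hashlib
import os
import tempfile
from dataclasses import dataclass


@dataclass
class Record:
    fast: str     # cheap, layout-dependent value (hypothetical ZFS export)
    content: str  # full checksum of the file contents


def content_checksum(path):
    """Read the whole file and return its SHA-256 digest."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def verify(path, record, fast_checksum):
    """Return (status, updated record); read the file only when needed."""
    fast_now = fast_checksum(path)
    if fast_now == record.fast:
        return "unchanged", record          # cheap path, file is not read
    content_now = content_checksum(path)    # possible change: full read
    status = "false positive" if content_now == record.content else "modified"
    return status, Record(fast_now, content_now)


# Demo with a stand-in for the fast checksum (not a real ZFS interface).
with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "object.bin")
    with open(path, "wb") as f:
        f.write(b"digital object")
    fake = {"value": "layout-1"}
    fast = lambda p: fake["value"]
    rec = Record(fast(path), content_checksum(path))

    status_same, rec = verify(path, rec, fast)
    fake["value"] = "layout-2"      # blocks rewritten, contents identical
    status_moved, rec = verify(path, rec, fast)
    with open(path, "wb") as f:
        f.write(b"tampered object")
    fake["value"] = "layout-3"
    status_changed, rec = verify(path, rec, fast)
```

The full read happens only when the cheap value differs, and a differing
cheap value with an unchanged content checksum is reported as a false
positive rather than as tampering.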

Nico
-- 


Re: [zfs-discuss] Access to ZFS checksums would be nice and very useful feature

2006-09-14 Thread Mike Gerdts

On 9/14/06, Chad Lewis [EMAIL PROTECTED] wrote:

Better still would be the forthcoming cryptographic extensions in some
kind of digital-signature mode.


When I first saw extended attributes I thought that would be a great
place to store a digital signature of the file.  I'm not saying that
it is up to ZFS to generate or manage the signature.

The nice thing about it is that so long as the private key is secret,
the signature stays with the file as it is moved, taken to tape, other
file systems, etc. so long as the file manipulation mechanisms support
extended-attributes.

Mike

--
Mike Gerdts
http://mgerdts.blogspot.com/


Re: [zfs-discuss] Access to ZFS checksums would be nice and very useful feature

2006-09-14 Thread Nicolas Williams
On Thu, Sep 14, 2006 at 06:26:46PM -0500, Mike Gerdts wrote:
 On 9/14/06, Chad Lewis [EMAIL PROTECTED] wrote:
 Better still would be the forthcoming cryptographic extensions in some
 kind of digital-signature mode.
 
 When I first saw extended attributes I thought that would be a great
 place to store a digital signature of the file.  I'm not saying that
 it is up to ZFS to generate or manage the signature.
 
 The nice thing about it is that so long as the private key is secret,
 the signature stays with the file as it is moved, taken to tape, other
 file systems, etc. so long as the file manipulation mechanisms support
 extended-attributes.

Hmm.  Picture a magic attribute that returns a checksum of the file's
contents and which recomputes this checksum only the first time it is
read after the file has changed.  Internally ZFS could invalidate this
checksum whenever the file changes, then recompute and store the
attribute when the attribute is next read.  That sounds useful, but if
read at unexpected times it would be observed as a slow down by users.
I think I'd rather ZFS export a ZFS checksum (O(1)) instead (also as a
magic attribute) and let auditing systems do any additional checksumming
explicitly.
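
For what it's worth, the invalidate-on-write idea can be sketched in a few
lines.  This is an in-memory toy, not a proposal for how ZFS would do it.
ToyFile and its recomputations counter are invented for the example, and
SHA-256 is an arbitrary choice:

```python
import hashlib


class ToyFile:
    """In-memory stand-in for a file with a lazily computed checksum."""

    def __init__(self, data=b""):
        self._data = bytearray(data)
        self._cached = None
        self.recomputations = 0

    def write(self, offset, payload):
        self._data[offset:offset + len(payload)] = payload
        self._cached = None  # any change invalidates the stored checksum

    @property
    def checksum(self):
        # Recompute only on the first read after a change; this is the
        # read that a user would notice as slow on a large file.
        if self._cached is None:
            self._cached = hashlib.sha256(bytes(self._data)).hexdigest()
            self.recomputations += 1
        return self._cached


f = ToyFile(b"hello world")
first = f.checksum    # computed here
second = f.checksum   # served from the cached value
f.write(0, b"J")      # invalidates the cached value
third = f.checksum    # recomputed here
```

Two reads with one write in between cost two full computations, and the
cost lands on whoever happens to read the attribute first.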

Nico
-- 


Re: [zfs-discuss] Access to ZFS checksums would be nice and very useful feature

2006-09-14 Thread Matthew Ahrens

Bady, Brant RBCM:EX wrote:

Actually to clarify - what I want to do is to be able to read the
associated checksums ZFS creates for a file and then store them in an
external system e.g. an oracle database most likely


Rather than storing the checksum externally, you could simply let ZFS 
verify the integrity of the data.  Whenever you want to check it, just 
run 'zpool scrub'.


Of course, if you don't trust ZFS to do that for you, you probably 
wouldn't trust it to tell you the checksum either!


--matt