Re: [zones-discuss] [zfs-discuss] Zones on shared storage - a warning

2010-01-08 Thread Mike Gerdts
On Fri, Jan 8, 2010 at 6:55 AM, Darren J Moffat darr...@opensolaris.org wrote:
> Frank Batschulat (Home) wrote:
>
>> This just can't be an accident, there must be some coincidence and thus
>> there's a good chance that these CHKSUM errors must have a common source,
>> either in ZFS or in NFS?
>
> What are you using for on-the-wire protection with NFS? Is it shared using
> krb5i, or do you have IPsec configured? If not, I'd recommend trying one of
> those to see if your symptoms change.

Shouldn't a scrub pick that up? And why would there be no errors from
zoneadm install, which under the covers does a pkg image create
followed by *multiple* pkg install invocations? No checksum errors
pop up there.

-- 
Mike Gerdts
http://mgerdts.blogspot.com/
___
zones-discuss mailing list
zones-discuss@opensolaris.org


Re: [zones-discuss] [zfs-discuss] Zones on shared storage - a warning

2010-01-08 Thread Frank Batschulat (Home)
On Fri, 08 Jan 2010 13:55:13 +0100, Darren J Moffat darr...@opensolaris.org wrote:

> Frank Batschulat (Home) wrote:
>> This just can't be an accident, there must be some coincidence and thus
>> there's a good chance that these CHKSUM errors must have a common source,
>> either in ZFS or in NFS?
>
> What are you using for on-the-wire protection with NFS? Is it shared
> using krb5i, or do you have IPsec configured? If not, I'd recommend
> trying one of those to see if your symptoms change.

Hey Darren, using krb5i is certainly a good idea for additional protection in
general. However, I doubt that NFS over-the-wire (OTW) corruption would produce
exactly the same wrong checksums in 2 totally different setups and networks,
which is what comparing Mike's results with mine showed [see 1].

cheers
frankB

[1]

osoldev.batschul./export/home/batschul.= fmdump -eV | grep cksum_actual | sort | uniq -c | sort -n | tail
      2 cksum_actual = 0x4bea1a77300 0xf6decb1097980 0x217874c80a8d9100 0x7cd81ca72df5ccc0
      2 cksum_actual = 0x5c1c805253 0x26fa7270d8d2 0xda52e2079fd74 0x3d2827dd7ee4f21
      6 cksum_actual = 0x28e08467900 0x479d57f76fc80 0x53bca4db5209300 0x983ddbb8c4590e40
*A    6 cksum_actual = 0x348e6117700 0x765aa1a547b80 0xb1d6d98e59c3d00 0x89715e34fbf9cdc0
*B    7 cksum_actual = 0x0 0x0 0x0 0x0
*C   11 cksum_actual = 0x1184cb07d00 0xd2c5aab5fe80 0x69ef5922233f00 0x280934efa6d20f40
*D   14 cksum_actual = 0x175bb95fc00 0x1767673c6fe00 0xfa9df17c835400 0x7e0aef335f0c7f00
*E   17 cksum_actual = 0x2eb772bf800 0x5d8641385fc00 0x7cf15b214fea800 0xd4f1025a8e66fe00
*F   20 cksum_actual = 0xbaddcafe00 0x5dcc54647f00 0x1f82a459c2aa00 0x7f84b11b3fc7f80
*G   25 cksum_actual = 0x5d6ee57f00 0x178a70d27f80 0x3fc19c3a19500 0x82804bc6ebcfc0

==

now compare this with Mike's error output as posted here:

http://www.mail-archive.com/zfs-disc...@opensolaris.org/msg33041.html

# fmdump -eV | grep cksum_actual | sort | uniq -c | sort -n | tail

      2 cksum_actual = 0x14c538b06b6 0x2bb571a06ddb0 0x3e05a7c4ac90c62 0x290cbce13fc59dce
*D    3 cksum_actual = 0x175bb95fc00 0x1767673c6fe00 0xfa9df17c835400 0x7e0aef335f0c7f00
*E    3 cksum_actual = 0x2eb772bf800 0x5d8641385fc00 0x7cf15b214fea800 0xd4f1025a8e66fe00
*B    4 cksum_actual = 0x0 0x0 0x0 0x0
      4 cksum_actual = 0x1d32a7b7b00 0x248deaf977d80 0x1e8ea26c8a2e900 0x330107da7c4bcec0
      5 cksum_actual = 0x14b8f7afe6 0x915db8d7f87 0x205dc7979ad73 0x4e0b3a8747b8a8
*C    6 cksum_actual = 0x1184cb07d00 0xd2c5aab5fe80 0x69ef5922233f00 0x280934efa6d20f40
*A    6 cksum_actual = 0x348e6117700 0x765aa1a547b80 0xb1d6d98e59c3d00 0x89715e34fbf9cdc0
*F   16 cksum_actual = 0xbaddcafe00 0x5dcc54647f00 0x1f82a459c2aa00 0x7f84b11b3fc7f80
*G   48 cksum_actual = 0x5d6ee57f00 0x178a70d27f80 0x3fc19c3a19500 0x82804bc6ebcfc0

and observe that the values in 'cksum_actual' that eventually cause our CHKSUM
pool errors, because they mismatch what had been expected, are the SAME for 2
totally different client systems and 2 different NFS servers (mine vs. Mike's);
see the entries marked with *A to *G.
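
The comparison of the two listings above was done by eye. As a rough sketch
only, it could also be automated by saving each pipeline's output and
intersecting the checksum values. The names parse_counts and shared_checksums
are made up for this illustration, and the sample text holds just a few lines
copied from the two listings, with the *A to *G markers left out since they are
annotations and not part of the fmdump output.

```python
import re

# One entry of `uniq -c` output: a count, then the four hex checksum words.
# \s* tolerates a count fused to the label and words wrapped onto a new line.
LINE_RE = re.compile(r"(\d+)\s*cksum_actual\s*=\s*((?:0x[0-9a-fA-F]+\s*){4})")


def parse_counts(text):
    """Map each 4-word checksum (a tuple of hex strings) to its count."""
    counts = {}
    for match in LINE_RE.finditer(text):
        words = tuple(match.group(2).split())
        counts[words] = counts.get(words, 0) + int(match.group(1))
    return counts


def shared_checksums(text_a, text_b):
    """Return the checksums seen in both outputs, with the count from each."""
    a, b = parse_counts(text_a), parse_counts(text_b)
    return {ck: (a[ck], b[ck]) for ck in a.keys() & b.keys()}


# A few lines copied from Frank's and Mike's listings (not the full output).
FRANK = """
   2 cksum_actual = 0x5c1c805253 0x26fa7270d8d2 0xda52e2079fd74 0x3d2827dd7ee4f21
   7 cksum_actual = 0x0 0x0 0x0 0x0
  25 cksum_actual = 0x5d6ee57f00 0x178a70d27f80 0x3fc19c3a19500 0x82804bc6ebcfc0
"""
MIKE = """
   5 cksum_actual = 0x14b8f7afe6 0x915db8d7f87 0x205dc7979ad73 0x4e0b3a8747b8a8
   4 cksum_actual = 0x0 0x0 0x0 0x0
  48 cksum_actual = 0x5d6ee57f00 0x178a70d27f80 0x3fc19c3a19500 0x82804bc6ebcfc0
"""

if __name__ == "__main__":
    # Print "count-on-first count-on-second checksum" for every shared value.
    for ck, (n_frank, n_mike) in sorted(shared_checksums(FRANK, MIKE).items()):
        print(n_frank, n_mike, " ".join(ck))
```

Fed the complete listings, this should report the same seven values marked *A
to *G; any value that shows up on only one system is dropped.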