On Fri, 08 Jan 2010 13:55:13 +0100, Darren J Moffat
wrote:
> Frank Batschulat (Home) wrote:
>> This just can't be an accident, there must be some coincidence and thus
>> there's a good chance
>> that these CHKSUM errors must have a common source, either in ZFS or in NFS ?
>
> What are you using for on the wire protection with NFS ? Is it shared
> using krb5i or do you have IPsec configured ? If not I'd recommend
> trying one of those and see if your symptoms change.
Hey Darren, doing krb5i is certainly a good idea for additional protection in
general. However, I doubt that NFS OTW corruption would produce the exact same
wrong checksums in 2 totally different setups and networks, as comparing
Mike's and my results shows [see 1].
cheers
frankB
[1]
osoldev.batschul./export/home/batschul.=> fmdump -eV | grep cksum_actual | sort | uniq -c | sort -n | tail
      2 cksum_actual = 0x4bea1a77300 0xf6decb1097980 0x217874c80a8d9100 0x7cd81ca72df5ccc0
      2 cksum_actual = 0x5c1c805253 0x26fa7270d8d2 0xda52e2079fd74 0x3d2827dd7ee4f21
      6 cksum_actual = 0x28e08467900 0x479d57f76fc80 0x53bca4db5209300 0x983ddbb8c4590e40
*A    6 cksum_actual = 0x348e6117700 0x765aa1a547b80 0xb1d6d98e59c3d00 0x89715e34fbf9cdc0
*B    7 cksum_actual = 0x0 0x0 0x0 0x0
*C   11 cksum_actual = 0x1184cb07d00 0xd2c5aab5fe80 0x69ef5922233f00 0x280934efa6d20f40
*D   14 cksum_actual = 0x175bb95fc00 0x1767673c6fe00 0xfa9df17c835400 0x7e0aef335f0c7f00
*E   17 cksum_actual = 0x2eb772bf800 0x5d8641385fc00 0x7cf15b214fea800 0xd4f1025a8e66fe00
*F   20 cksum_actual = 0xbaddcafe00 0x5dcc54647f00 0x1f82a459c2aa00 0x7f84b11b3fc7f80
*G   25 cksum_actual = 0x5d6ee57f00 0x178a70d27f80 0x3fc19c3a19500 0x82804bc6ebcfc0
==
now compare this with Mike's error output as posted here:
http://www.mail-archive.com/zfs-disc...@opensolaris.org/msg33041.html
# fmdump -eV | grep cksum_actual | sort | uniq -c | sort -n | tail
      2 cksum_actual = 0x14c538b06b6 0x2bb571a06ddb0 0x3e05a7c4ac90c62 0x290cbce13fc59dce
*D    3 cksum_actual = 0x175bb95fc00 0x1767673c6fe00 0xfa9df17c835400 0x7e0aef335f0c7f00
*E    3 cksum_actual = 0x2eb772bf800 0x5d8641385fc00 0x7cf15b214fea800 0xd4f1025a8e66fe00
*B    4 cksum_actual = 0x0 0x0 0x0 0x0
      4 cksum_actual = 0x1d32a7b7b00 0x248deaf977d80 0x1e8ea26c8a2e900 0x330107da7c4bcec0
      5 cksum_actual = 0x14b8f7afe6 0x915db8d7f87 0x205dc7979ad73 0x4e0b3a8747b8a8
*C    6 cksum_actual = 0x1184cb07d00 0xd2c5aab5fe80 0x69ef5922233f00 0x280934efa6d20f40
*A    6 cksum_actual = 0x348e6117700 0x765aa1a547b80 0xb1d6d98e59c3d00 0x89715e34fbf9cdc0
*F   16 cksum_actual = 0xbaddcafe00 0x5dcc54647f00 0x1f82a459c2aa00 0x7f84b11b3fc7f80
*G   48 cksum_actual = 0x5d6ee57f00 0x178a70d27f80 0x3fc19c3a19500 0x82804bc6ebcfc0
and observe that the 'cksum_actual' values that eventually cause our CHKSUM
pool errors, by not matching what was expected, are the SAME for 2 totally
different client systems and 2 different NFS servers (mine vs. Mike's);
see the entries marked *A to *G.
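
For anyone who wants to repeat the comparison without matching the lines by eye, here is a rough sketch in Python. It assumes the raw `fmdump -eV` output of each host has been saved as text; the function names and the sample values at the bottom are made up for illustration and are not taken from either system.

```python
import re
from collections import Counter

# One cksum_actual entry: four hex words. \s+ between the words also
# tolerates a value that was wrapped onto a second line by a mailer.
CKSUM_RE = re.compile(
    r"cksum_actual\s*=\s*((?:0x[0-9a-fA-F]+\s+){3}0x[0-9a-fA-F]+)"
)

def count_checksums(fmdump_text):
    """Count how often each cksum_actual value occurs in the given text."""
    counts = Counter()
    for match in CKSUM_RE.finditer(fmdump_text):
        # Collapse whitespace so wrapped and unwrapped values compare equal.
        counts[" ".join(match.group(1).split())] += 1
    return counts

def shared_checksums(text_a, text_b):
    """Return {checksum: (count_a, count_b)} for values seen on both hosts."""
    a = count_checksums(text_a)
    b = count_checksums(text_b)
    return {ck: (a[ck], b[ck]) for ck in a.keys() & b.keys()}

if __name__ == "__main__":
    # Illustrative input only, not real ereport data.
    host_a = (
        "cksum_actual = 0x0 0x0 0x0 0x0\n"
        "cksum_actual = 0x0 0x0 0x0 0x0\n"
        "cksum_actual = 0x1 0x2 0x3 0x4\n"
    )
    host_b = (
        "cksum_actual = 0x0 0x0 0x0 0x0\n"
        "cksum_actual = 0x5 0x6 0x7 0x8\n"
    )
    print(shared_checksums(host_a, host_b))
```

Unlike the `tail` in the pipeline above, this counts every distinct value rather than only the ten most frequent, so a shared checksum with a low count on one host would still show up.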
___
zones-discuss mailing list
zones-discuss@opensolaris.org