On 19.01.13 18:17, Bob Friesenhahn wrote:
I know, and it's really bugging me that I seem to have these checksum
errors on all of my machines, be it Sun gear or Dell.
On Sat, 19 Jan 2013, Stephan Budach wrote:
Now, this zpool is made of 3-way mirrors and currently 13 out of 15
vdevs are resilvering (which they had gone through yesterday as well)
and I never got any error while resilvering. I have been all over the
setup to find any glitch or bad part, but I couldn't come up with
Doesn't this sound improbable? Wouldn't one expect to encounter other
checksum errors while the resilver is running?
I can't attest to checksum errors, since I have yet to see one on my
machines (though I have seen several complete disk failures, and disks
faulted by the system). Checksum errors are bad, and not seeing them
should be the normal case.
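A quick way to confirm that "not seeing them" really holds is to look at the CKSUM column of zpool status. The Python sketch below, written for illustration and not part of any ZFS tooling, lists every row whose CKSUM counter is not "0". It assumes the usual NAME/STATE/READ/WRITE/CKSUM layout and keeps the counter as a string, since large counts may be printed in abbreviated form; the function name is made up.

```python
import re

# A pool, vdev or device row in the config section of "zpool status":
#   NAME  STATE  READ  WRITE  CKSUM  [optional note]
# Only the common vdev states are recognised; other rows are ignored.
ROW = re.compile(
    r"^\s+(\S+)\s+(ONLINE|DEGRADED|FAULTED|OFFLINE|UNAVAIL|REMOVED)"
    r"\s+(\S+)\s+(\S+)\s+(\S+)"
)


def devices_with_cksum_errors(status_text: str) -> dict:
    """Map each row name to its CKSUM counter, for rows where it is not "0".

    Counters stay strings, so an abbreviated value is reported as printed
    instead of being misparsed as a small integer.
    """
    flagged = {}
    for line in status_text.splitlines():
        m = ROW.match(line)
        if m and m.group(5) != "0":
            flagged[m.group(1)] = m.group(5)
    return flagged
```

Feeding it the text of zpool status for a pool (for example the stdout of subprocess.run(["zpool", "status", "tank"], capture_output=True, text=True), with "tank" as a placeholder pool name) returns an empty dict when no row shows checksum errors.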
A resilver may in fact just be verifying, via metadata, that the pool
disks are coherent. This might happen if the Fibre Channel is
Regarding the dire Fibre Channel issue, are you using Fibre Channel
switches or direct connections to the storage array(s)? If you are
using switches, are they stable, or are they doing something terrible
like resetting? Do you have duplex connectivity? Have you verified
that your FC HBAs' firmware is correct?
Looking at my FC switches, I am noticing errors like these:
[Thu Dec 06 03:33:04.795 UTC 2012][I][8600.001E][Port][Port: 2][PortID 0x30200 PortWWN 10:00:00:06:2b:12:d3:55 logged out of nameserver.]
[Thu Dec 06 03:33:05.829 UTC 2012][I][8600.0020][Port][Port:
[Thu Dec 06 03:37:08.077 UTC 2012][I][8600.001F][Port][Port:
[Thu Dec 06 03:37:10.582 UTC 2012][I][8600.001D][Port][Port: 2][PortID 0x30200 PortWWN 10:00:00:06:2b:12:d3:55 logged into nameserver.]
[Sun Dec 09 04:18:32.324 UTC 2012][I][8600.001E][Port][Port: 10][PortID 0x30a00 PortWWN 21:01:00:1b:32:22:30:53 logged out of
[Sun Dec 09 04:18:32.326 UTC 2012][I][8600.0020][Port][Port:
[Sun Dec 09 04:18:32.913 UTC 2012][I][8600.001F][Port][Port:
[Sun Dec 09 04:18:33.024 UTC 2012][I][8600.001D][Port][Port: 10][PortID 0x30a00 PortWWN 21:01:00:1b:32:22:30:53 logged into nameserver.]
Just ignore the timestamps, as the time does not seem to be set
correctly, but the dates match my two issues from today and Thursday,
which accounts for three days. I didn't catch that before, but it seems
to clearly indicate a problem with the FC connection…
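One way to read the switch log is to pair each "logged out of nameserver" event with the next "logged into nameserver" event for the same PortWWN and look at the gap. The Python sketch below does that. The line layout is inferred only from the excerpt quoted here, not from any switch documentation, the function name is made up, and wrapped or truncated lines are simply skipped. Because the switch clock is off, only the gap lengths are meaningful, not the absolute times.

```python
import re
from datetime import datetime

# Layout assumed from the quoted excerpt only:
# [ddd Mon DD HH:MM:SS.mmm UTC YYYY][sev][code][Port][Port: n][PortID id PortWWN wwn logged out of|into ...
EVENT = re.compile(
    r"^\[(?P<ts>\w{3} \w{3} \d{2} \d{2}:\d{2}:\d{2}\.\d{3}) UTC (?P<year>\d{4})\]"
    r"\[\w\]\[[0-9A-Fa-f.]+\]\[Port\]\[Port: (?P<port>\d+)\]"
    r"\[PortID (?P<pid>0x[0-9a-fA-F]+) PortWWN (?P<wwn>[0-9a-fA-F:]+) "
    r"logged (?P<dir>out of|into)\b"
)


def nameserver_outages(lines):
    """Pair each logout with the next login of the same PortWWN.

    Returns a list of (port, wwn, logout_time, login_time, gap) tuples,
    where gap is a datetime.timedelta.
    """
    pending = {}  # wwn -> (port, logout_time) still waiting for a login
    outages = []
    for line in lines:
        m = EVENT.match(line)
        if not m:
            continue  # truncated/wrapped line or some other event type
        when = datetime.strptime(f"{m['ts']} {m['year']}", "%a %b %d %H:%M:%S.%f %Y")
        wwn = m["wwn"]
        if m["dir"] == "out of":
            pending[wwn] = (int(m["port"]), when)
        elif wwn in pending:
            port, left = pending.pop(wwn)
            outages.append((port, wwn, left, when, when - left))
    return outages
```

For the two complete logout/login pairs quoted above, this works out to roughly four minutes on Dec 06 (port 2) and about 0.7 seconds on Dec 09 (port 10), which makes the difference between a brief blip and a real outage easy to see.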
But what do I make of this information?
Well, this is the scariest part to me. Neither fmdump nor dmesg
showed anything that would indicate a connectivity issue, at least not
the last time.
Did you check for messages in /var/adm/messages that might indicate
when and how FC connectivity was lost?
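Going through /var/adm/messages by hand is tedious when the switch and host clocks disagree, so here is a rough Python sketch that pulls out syslog lines inside a time window that match caller-supplied patterns. The function name is hypothetical. The classic syslog "Mon DD HH:MM:SS" prefix carries no year, so the caller supplies one. No list of FC driver messages is assumed: which wording signals a lost link depends on the HBA driver, so the patterns are left to the reader.

```python
import re
from datetime import datetime

# Classic syslog prefix, e.g. "Jan 19 18:17:00 host ..." (single-digit days are space-padded).
PREFIX = re.compile(r"^(?P<ts>\w{3} +\d{1,2} \d{2}:\d{2}:\d{2}) ")


def fc_related_messages(lines, year, start, end, patterns):
    """Yield (timestamp, line) for syslog lines with start <= timestamp <= end
    that match any of the case-insensitive regular expressions in `patterns`.

    `patterns` must be supplied by the caller; nothing driver-specific is assumed.
    """
    regexes = [re.compile(p, re.IGNORECASE) for p in patterns]
    for line in lines:
        m = PREFIX.match(line)
        if not m:
            continue  # continuation or non-syslog line
        # strptime treats the space in the format as "any whitespace", so "Jan  9" parses.
        ts = datetime.strptime(f"{year} {m['ts']}", "%Y %b %d %H:%M:%S")
        if start <= ts <= end and any(r.search(line) for r in regexes):
            yield ts, line.rstrip("\n")
```

Usage would be along the lines of opening /var/adm/messages and passing the file object as `lines`, with a window around the suspected outage and whatever patterns your HBA driver is known to log; a bare r"offline" would only be a placeholder.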
zfs-discuss mailing list