Re: ZFS error logging

2016-09-22  Peter Ross via luv-main
Hi Russell,

I would assume that the resilvering is related to the checksum errors. From
the zpool(8) manpage:

Scrubbing and resilvering are very similar operations. The difference
is that resilvering only examines data that ZFS knows to be out of
date (for example, when attaching a new device to a mirror or
replacing an existing device), whereas scrubbing examines all data to
discover silent errors due to hardware faults or disk failure.
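
So if you want to check the whole pool now, rather than wait for the next
resilver to touch the bad data, a scrub is the way to do it. Roughly (using
the pool name from your output below):

  # read every block in the pool and verify checksums
  zpool scrub server
  # watch progress and the per-device READ/WRITE/CKSUM counters
  zpool status -v server
  # if you decide the errors were transient, reset the counters
  zpool clear server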


For the messages: FreeBSD has a sysctl, vfs.zfs.debug. My Google 'research'
(e.g.
http://askubuntu.com/questions/228386/how-do-you-apply-performance-tuning-settings-for-native-zfs)
indicates that this sysctl approach was ported to Linux, so you may be able
to use it there too.
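
I have not tried it on Linux myself, so treat the following as a sketch: the
FreeBSD sysctl is real, the Linux parameter name is my assumption based on
that page, so check what your version ships with:

  # FreeBSD: read and set the ZFS debug sysctl
  sysctl vfs.zfs.debug
  sysctl vfs.zfs.debug=1

  # Linux (ZFS on Linux): the tunables are module parameters, not sysctls;
  # list what your version actually offers first
  ls /sys/module/zfs/parameters/
  # zfs_flags appears to be the debug bitmask there (untested by me)
  echo 1 > /sys/module/zfs/parameters/zfs_flags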

BTW: There is a Nagios/Icinga check_zfs plugin.

I did not know about "mon" before... How does it compare to Nagios/Icinga?
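
Independently of either framework, "zpool status -x" makes a check script
almost trivial, since it prints "all pools are healthy" when there is nothing
to report. A minimal sketch (not the check_zfs plugin, just the idea):

  #!/bin/sh
  # exit 0 when every pool is healthy, otherwise print the details and fail
  if zpool status -x | grep -q 'all pools are healthy'; then
      exit 0
  fi
  # show which device is accumulating READ/WRITE/CKSUM errors
  zpool status -v
  exit 1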

Regards
Peter


On Thu, Sep 22, 2016 at 10:54 PM, Russell Coker via luv-main
<luv-main@luv.asn.au> wrote:

> Below is part of the output of "zpool status".  It seems that sdr is
> defective, it has a steadily increasing number of checksum errors.
>
> Would the "resilvered 763M" part be about the 121 checksum errors?  If so
> does
> that mean each checksum error required resilvering on average 6M of data?
>
> The kernel message log has NOTHING about this.  I'm used to Ext* and BTRFS
> which give kernel message log entries about filesystem errors.  Can ZFS be
> configured to give similar logging?
>
> As an aside, I've written a mon module for monitoring such ZFS errors.
> I'll release it sometime soon, but I'd be happy to give anyone who wants it
> a version that's quite usable although not ready for full release.
>
> status: One or more devices has experienced an unrecoverable error.  An
>         attempt was made to correct the error.  Applications are unaffected.
> action: Determine if the device needs to be replaced, and clear the errors
>         using 'zpool clear' or replace the device with 'zpool replace'.
>    see: http://zfsonlinux.org/msg/ZFS-8000-9P
>   scan: resilvered 763M in 0h0m with 0 errors on Thu Aug 18 14:48:53 2016
> config:
>
> NAME          STATE     READ WRITE CKSUM
> server        ONLINE       0     0     0
>   raidz1-0    ONLINE       0     0     0
>     sdj       ONLINE       0     0     0
>     sdk       ONLINE       0     0     0
>     sdl       ONLINE       0     0     0
>     sdm       ONLINE       0     0     0
>     sdn       ONLINE       0     0     0
>     sdo       ONLINE       0     0     0
>     sdp       ONLINE       0     0     0
>     sdq       ONLINE       0     0     0
>     sdr       ONLINE       0     0   121
>
> --
> My Main Blog http://etbe.coker.com.au/
> My Documents Blog http://doc.coker.com.au/
>
_______________________________________________
luv-main mailing list
luv-main@luv.asn.au
https://lists.luv.asn.au/cgi-bin/mailman/listinfo/luv-main


ZFS error logging

2016-09-22  Russell Coker via luv-main
Below is part of the output of "zpool status".  It seems that sdr is 
defective, it has a steadily increasing number of checksum errors.

Would the "resilvered 763M" part be about the 121 checksum errors?  If so does 
that mean each checksum error required resilvering on average 6M of data?

The kernel message log has NOTHING about this.  I'm used to Ext* and BTRFS 
which give kernel message log entries about filesystem errors.  Can ZFS be 
configured to give similar logging?

As an aside, I've written a mon module for monitoring such ZFS errors.
I'll release it sometime soon, but I'd be happy to give anyone who wants it a
version that's quite usable although not ready for full release.

status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://zfsonlinux.org/msg/ZFS-8000-9P
  scan: resilvered 763M in 0h0m with 0 errors on Thu Aug 18 14:48:53 2016
config:

NAME          STATE     READ WRITE CKSUM
server        ONLINE       0     0     0
  raidz1-0    ONLINE       0     0     0
    sdj       ONLINE       0     0     0
    sdk       ONLINE       0     0     0
    sdl       ONLINE       0     0     0
    sdm       ONLINE       0     0     0
    sdn       ONLINE       0     0     0
    sdo       ONLINE       0     0     0
    sdp       ONLINE       0     0     0
    sdq       ONLINE       0     0     0
    sdr       ONLINE       0     0   121

-- 
My Main Blog http://etbe.coker.com.au/
My Documents Blog http://doc.coker.com.au/

_______________________________________________
luv-main mailing list
luv-main@luv.asn.au
https://lists.luv.asn.au/cgi-bin/mailman/listinfo/luv-main