On 06.06.2010 08:06, devsk wrote:
> I had an unclean shutdown because of a hang and suddenly my pool is degraded 
> (I realized something is wrong when python dumped core a couple of times).
> 
> This is before I ran scrub:
> 
>   pool: mypool
>  state: DEGRADED
> status: One or more devices has experienced an error resulting in data
>         corruption.  Applications may be affected.
> action: Restore the file in question if possible.  Otherwise restore the
>         entire pool from backup.
>    see: http://www.sun.com/msg/ZFS-8000-8A
>  scan: scrub repaired 0 in 0h7m with 0 errors on Mon May 31 09:00:27 2010
> config:
> 
>         NAME        STATE     READ WRITE CKSUM
>         mypool      DEGRADED     0     0     0
>           c6t0d0s0  DEGRADED     0     0     0  too many errors
> 
> errors: Permanent errors have been detected in the following files:
> 
>         mypool/ROOT/May25-2010-Image-Update:<0x3041e>
>         mypool/ROOT/May25-2010-Image-Update:<0x31524>
>         mypool/ROOT/May25-2010-Image-Update:<0x26d24>
>         mypool/ROOT/May25-2010-Image-Update:<0x37234>
>         //var/pkg/download/d6/d6be0ef348e3c81f18eca38085721f6d6503af7a
>         mypool/ROOT/May25-2010-Image-Update:<0x25db3>
>         //var/pkg/download/cb/cbb0ff02bcdc6649da3763900363de7cff78ec72
>         mypool/ROOT/May25-2010-Image-Update:<0x26cf6>
> 
> 
> I ran scrub and this is what it has to say afterwards.
> 
>   pool: mypool
>  state: DEGRADED
> status: One or more devices has experienced an unrecoverable error.  An
>         attempt was made to correct the error.  Applications are unaffected.
> action: Determine if the device needs to be replaced, and clear the errors
>         using 'zpool clear' or replace the device with 'zpool replace'.
>    see: http://www.sun.com/msg/ZFS-8000-9P
>  scan: scrub repaired 0 in 0h11m with 0 errors on Sat Jun  5 22:43:54 2010
> config:
> 
>         NAME        STATE     READ WRITE CKSUM
>         mypool      DEGRADED     0     0     0
>           c6t0d0s0  DEGRADED     0     0     0  too many errors
> 
> errors: No known data errors
> 
> A few questions:
> 
> 1. Have the errors really gone away? Can I just clear and be content that 
> errors are really gone?
> 
> 2. Why did the errors occur anyway if ZFS guarantees on-disk consistency? I 
> wasn't writing anything. Those files were definitely not being touched when 
> the hang and unclean shutdown happened.
> 
> I mean, I don't mind if I create or modify a file and it doesn't land on disk 
> because an unclean shutdown happened, but a bunch of unrelated files getting 
> corrupted is sort of painful to digest.
> 
> 3. The action says "Determine if the device needs to be replaced". How the 
> heck do I do that?


Is it possible that this system is running inside VirtualBox? I've seen
exactly this happen in a VirtualBox guest, but never on a real machine.

The reason the errors have gone away might be that ZFS keeps redundant
copies of metadata (three copies IIRC). So if the corruption on your disk
was confined to metadata, scrubbing the pool can repair it from the good
copies.
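
If you want to check question 1 mechanically, the two things to look at in `zpool status -v` after a scrub are the `errors:` line and the READ/WRITE/CKSUM counters. Below is a minimal Python sketch of that check. `parse_zpool_status` and `PoolReport` are hypothetical names of my own, and the parser assumes the plain-text layout shown in your output above, which is not a stable interface.

```python
# Hypothetical helper: summarise `zpool status` text so that "data errors are
# still listed" can be told apart from "device counters/state not yet cleared".
from dataclasses import dataclass, field


@dataclass
class PoolReport:
    state: str = ""
    # device name -> (STATE, READ, WRITE, CKSUM); counters are kept as strings
    # because zpool may print abbreviated values such as "1.2K"
    devices: dict = field(default_factory=dict)
    errors_line: str = ""

    def data_errors_remain(self) -> bool:
        # After a clean scrub zpool prints "errors: No known data errors"
        return self.errors_line != "No known data errors"

    def counters_nonzero(self) -> bool:
        return any(
            count != "0"
            for _state, *counts in self.devices.values()
            for count in counts
        )


def parse_zpool_status(text: str) -> PoolReport:
    report = PoolReport()
    in_config = False
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("state:"):
            report.state = line.split(":", 1)[1].strip()
        elif line.startswith("config:"):
            in_config = True
        elif line.startswith("errors:"):
            in_config = False
            report.errors_line = line.split(":", 1)[1].strip()
        elif in_config:
            parts = line.split()
            # Device rows look like: NAME STATE READ WRITE CKSUM [note ...]
            if len(parts) >= 5 and parts[0] != "NAME":
                report.devices[parts[0]] = tuple(parts[1:5])
    return report


# Copied from the post-scrub output quoted above
SAMPLE_AFTER_SCRUB = """\
  pool: mypool
 state: DEGRADED
config:

        NAME        STATE     READ WRITE CKSUM
        mypool      DEGRADED     0     0     0
          c6t0d0s0  DEGRADED     0     0     0  too many errors

errors: No known data errors
"""

if __name__ == "__main__":
    r = parse_zpool_status(SAMPLE_AFTER_SCRUB)
    print(r.state, r.data_errors_remain(), r.counters_nonzero())
```

On your post-scrub output it reports no listed data errors and all-zero counters, while the device is still marked DEGRADED. That leftover fault state is what the `zpool clear` suggested in the action text resets; running another `zpool scrub mypool` afterwards is a cheap way to confirm nothing new shows up.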

smartmontools might help you figure out whether the disk is failing. But
if you only had an unexpected shutdown and everything is clean after a
scrub, I wouldn't expect the disk to be broken. You can get smartmontools
from opencsw.org.
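
If you'd rather script the smartmontools check, here is a rough Python sketch. It assumes `smartctl` is on the PATH and relies on the health line smartctl usually prints (`SMART overall-health self-assessment test result: PASSED` for ATA disks, `SMART Health Status: OK` for SCSI/SAS disks). The helper names and the device path you pass in are placeholders, and depending on the controller you may need an extra `-d` device-type option.

```python
# Hypothetical helpers around `smartctl -H` (smartmontools).
import subprocess
from typing import Optional


def smart_health(output: str) -> Optional[bool]:
    """Return True/False for a recognised health verdict, None otherwise."""
    for line in output.splitlines():
        # ATA/SATA disks
        if "overall-health self-assessment test result:" in line:
            return line.rsplit(":", 1)[1].strip() == "PASSED"
        # SCSI/SAS disks
        if line.startswith("SMART Health Status:"):
            return line.split(":", 1)[1].strip() == "OK"
    return None


def check_disk(device: str) -> Optional[bool]:
    # smartctl's exit status is a bit mask, so parse the text instead of
    # treating any non-zero return code as "disk failed".
    result = subprocess.run(
        ["smartctl", "-H", device], capture_output=True, text=True
    )
    return smart_health(result.stdout)
```

Usage would be something like `check_disk("/dev/rdsk/c6t0d0")`, adjusted to your device. A PASSED verdict is only the drive's own opinion; `smartctl -a` shows the full attribute table if you want to look at things like reallocated sectors.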

If your system really is running in VirtualBox, I'd recommend turning off
VirtualBox's disk write caching. Search the OpenSolaris forum on the
VirtualBox site; there is an article somewhere describing how to do this.
IIRC the subject is something like 'zfs pool corruption'. It is also
covered somewhere in the docs.
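
For the record, the setting I have in mind is, as far as I remember from the VirtualBox manual's section on guest flush requests, the `IgnoreFlush` extradata key. VirtualBox ignores IDE/SATA flush requests by default, and setting the key to 0 makes it pass them on to the host disk. The Python sketch below only builds the `VBoxManage setextradata` command line. The VM name, controller and LUN are placeholders, and you should double-check the exact key against the manual for your VirtualBox version.

```python
# Hypothetical helper that builds (but does not run) the VBoxManage command.
import shlex

# Device names as used in the manual's IgnoreFlush examples (assumption:
# PIIX3 IDE or AHCI SATA controller, first controller instance "0").
CONTROLLERS = {"ide": "piix3ide", "sata": "ahci"}


def ignore_flush_key(bus: str, lun: int) -> str:
    # For IDE the LUN is 0-3 (primary master .. secondary slave);
    # for SATA it is the port number the virtual disk is attached to.
    dev = CONTROLLERS[bus]
    return f"VBoxInternal/Devices/{dev}/0/LUN#{lun}/Config/IgnoreFlush"


def honour_flushes_cmd(vm_name: str, bus: str = "sata", lun: int = 0) -> list:
    # IgnoreFlush=0: forward the guest's cache-flush requests to the host
    # instead of silently dropping them.
    return ["VBoxManage", "setextradata", vm_name, ignore_flush_key(bus, lun), "0"]


if __name__ == "__main__":
    # "OpenSolaris" is a placeholder VM name
    print(shlex.join(honour_flushes_cmd("OpenSolaris")))
```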

HTH,
Thomas
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss