Today my production server crashed 4 times. THIS IS A NIGHTMARE!
Self-healing file system?! For me ZFS is a SELF-KILLING filesystem.

I cannot fsck it, there's no such tool.
I cannot scrub it, it crashes 30-40 minutes after scrub starts.
I cannot use it, it crashes several times every day! And with every crash
the number of checksum failures is growing:

        NAME        STATE     READ WRITE CKSUM
        box5        ONLINE       0     0     0
...after a few hours...
        box5        ONLINE       0     0     4
...after a few hours...
        box5        ONLINE       0     0     62
...after another few hours...
        box5        ONLINE       0     0     120
...crash! and we start again...
        box5        ONLINE       0     0     0
...etc...
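
For anyone who wants to log how fast that counter climbs between crashes, here is a minimal sketch in Python. It only parses text laid out like the rows above; the function name parse_cksum and the SAMPLE string are made up for illustration, and it assumes the CKSUM column holds a plain integer (newer zpool versions may abbreviate large counts, which this does not handle).

```python
def parse_cksum(status_text, pool):
    """Return the CKSUM count from the config row named `pool`, or None.

    Assumes rows shaped like: NAME STATE READ WRITE CKSUM
    """
    for line in status_text.splitlines():
        fields = line.split()
        # Match only the row whose first column is the pool name and whose
        # fifth column is a plain integer; header and "pool:" lines are skipped.
        if len(fields) >= 5 and fields[0] == pool and fields[4].isdigit():
            return int(fields[4])
    return None


# Hypothetical sample built from the rows quoted in this post.
SAMPLE = """\
        NAME        STATE     READ WRITE CKSUM
        box5        ONLINE       0     0     62
"""

print(parse_cksum(SAMPLE, "box5"))
```

Feeding it the saved output of `zpool status box5` from a cron job, together with a timestamp, would give a simple record of the growth before each crash.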

Actually, 120 is the record; sometimes it crashes as soon as it boots.

And there's always a permanent error:
errors: Permanent errors have been detected in the following files:
        box5:<0x0>

And the very wise self-healing advice:
http://www.sun.com/msg/ZFS-8000-8A
Restore the file in question if possible.  Otherwise restore the entire pool 
from backup.

Thanks, but if I restore it from backup it won't be ZFS anymore, that's for 
sure.

It's not an I/O problem. AFAIK, the default ZFS I/O error behavior is to "wait"
for repair (I have 10U4, where it's non-configurable). Then why does it panic?

Recently there were discussions about the failure of the OpenSolaris community.
Now it's been more than half a month since I reported this error. Nobody has
even posted something like "RTFM". Come on guys, I know you are there and busy
with enterprise customers... but at least give me some troubleshooting ideas.
I'm totally lost.

Just a reminder: it's a heavily loaded fs with 3-4 million files and folders.

Link to original post:
http://www.opensolaris.org/jive/thread.jspa?threadID=57425
 
 
This message posted from opensolaris.org
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss