Today my production server crashed 4 times. THIS IS A NIGHTMARE! Self-healing file system?! For me, ZFS is a SELF-KILLING filesystem.
I cannot fsck it; there's no such tool. I cannot scrub it; it crashes 30-40 minutes after the scrub starts. I cannot use it; it crashes several times every day! And with every crash the number of checksum failures keeps growing:

        NAME        STATE     READ WRITE CKSUM
        box5        ONLINE       0     0     0

...after a few hours...

        box5        ONLINE       0     0     4

...after a few hours...

        box5        ONLINE       0     0    62

...after another few hours...

        box5        ONLINE       0     0   120

...crash! And we start again...

        box5        ONLINE       0     0     0

...etc...

Actually, 120 is the record; sometimes it crashes as soon as it boots. And there is always a permanent error:

errors: Permanent errors have been detected in the following files:

        box5:<0x0>

and some very wise self-healing advice:

http://www.sun.com/msg/ZFS-8000-8A
Restore the file in question if possible. Otherwise restore the entire pool from backup.

Thanks, but if I restore it from backup it won't be ZFS anymore, that's for sure.

It's not an I/O problem. AFAIK, the default ZFS behavior on an I/O error is to "wait" for repair (I have 10U4, where this is non-configurable). Then why does it panic?

Recently there were discussions about the failure of the OpenSolaris community. Now it's been more than half a month since I reported this error. Nobody has even posted something like "RTFM". Come on guys, I know you are there and busy with enterprise customers... but at least give me some troubleshooting ideas. I'm totally lost.

Just a reminder: it's a heavily loaded filesystem with 3-4 million files and folders.

Link to original post:
http://www.opensolaris.org/jive/thread.jspa?threadID=57425

This message posted from opensolaris.org
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss