Re: automatic fsck on gmirror failure
On Fri, 22 Feb 2008, Wojciech Puchar wrote:

>> gmirror(8) / geom(8) should automatically remove (degrade) components
>> with bad I/O operations after a certain threshold, but I'm pretty sure
>> it doesn't.
>
> but i'm absolutely sure it does because it did several times for me

Finally I had some time to research.

939 of geom_mirror -- kern.geom.mirror.disconnect_on_failure --

It's a newer 6.x thing apparently. Behavior is not tunable. It happens on a single failure. I ask about tunable behavior because some cheap IDE (Maxtor) disks can fail, then recover.

6.3/amd64:

[EMAIL PROTECTED] /usr/src-RELENG_6_3]# sysctl -a|grep -i kern.geom.mirror
kern.geom.mirror.sync_requests: 2
kern.geom.mirror.disconnect_on_failure: 1
kern.geom.mirror.idletime: 5
kern.geom.mirror.timeout: 4
kern.geom.mirror.debug: 0

But:

FreeBSD wingspan 5.5-RELEASE-p10 FreeBSD 5.5-RELEASE-p10 #0: Fri Jan 12
[EMAIL PROTECTED]:/home/seklecki$ sysctl -a|grep -i kern.geom.mirror
kern.geom.mirror.debug: 0
kern.geom.mirror.timeout: 0
kern.geom.mirror.idletime: 5
kern.geom.mirror.reqs_per_sync: 5
kern.geom.mirror.syncs_per_sec: 1000

___
freebsd-questions@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-questions
To unsubscribe, send any mail to "[EMAIL PROTECTED]"
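
For anyone comparing releases the same way, here is a minimal sketch that reads "sysctl -a" style output on stdin and reports whether the disconnect_on_failure knob is present and set. The function name mirror_disconnect_policy is made up for illustration; the sample input is copied from the 6.3/amd64 listing in this message.

```shell
#!/bin/sh
# Sketch only: classify sysctl output read from stdin.
# mirror_disconnect_policy is a hypothetical helper name, not part of FreeBSD.
mirror_disconnect_policy() {
    awk -F': ' '
        $1 == "kern.geom.mirror.disconnect_on_failure" { found = 1; value = $2 }
        END {
            if (!found)          print "unavailable"   # knob absent (e.g. the 5.5 box)
            else if (value == 1) print "enabled"       # component dropped on failure
            else                 print "disabled"
        }'
}

# Sample input taken from the 6.3/amd64 output quoted in this message.
printf '%s\n' \
    'kern.geom.mirror.sync_requests: 2' \
    'kern.geom.mirror.disconnect_on_failure: 1' \
    'kern.geom.mirror.idletime: 5' | mirror_disconnect_policy
```

On a live system the same function could be fed with "sysctl -a | mirror_disconnect_policy".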
Re: automatic fsck on gmirror failure
On Fri, 22 Feb 2008, Wojciech Puchar wrote:

>> $ grep -i fsck /etc/defaults/rc.conf
>> fsck_y_enable="NO"   # Set to YES to do fsck -y if the initial preen fails.
>>
>> gmirror(8) / geom(8) should automatically remove (degrade) components
>> with bad I/O operations after a certain threshold, but I'm pretty sure
>> it doesn't.
>
> yes it does

Maybe my experiences didn't hit the threshold. I'm checking the code now. The threshold is likely compile-time adjustable.

~BAS
Re: automatic fsck on gmirror failure
> gmirror(8) / geom(8) should automatically remove (degrade) components
> with bad I/O operations after a certain threshold, but I'm pretty sure
> it doesn't.

but i'm absolutely sure it does because it did several times for me
Re: automatic fsck on gmirror failure
On Sun, 2008-02-03 at 23:39 +0100, Wojciech Puchar wrote:
> it failed while rebuilding with badly written data on the disk that was
> used, while other rebuild.
>
> now it can't read it.
>
> if you are sure that it doesn't pass through fsck before second reboot, do
> the following.
>
> 1) turn off gmirror
>
> 2) clear gmirror header on both providers
>
> 3) run fsck the other drive (not ad6, but the other used on mirror).
>

Also don't forget about:

$ grep -i fsck /etc/defaults/rc.conf
fsck_y_enable="NO"   # Set to YES to do fsck -y if the initial preen fails.

gmirror(8) / geom(8) should automatically remove (degrade) components with bad I/O operations after a certain threshold, but I'm pretty sure it doesn't.

~BAS

> 4) pray
>
> 5) after fsck will end it successfully (it should), create gmirror with
> the disk you checked
>
> gmirror label gmirror-name /dev/thedisk
>
> 6) reboot and start the system. should go well.
Re: automatic fsck on gmirror failure
> $ grep -i fsck /etc/defaults/rc.conf
> fsck_y_enable="NO"   # Set to YES to do fsck -y if the initial preen fails.
>
> gmirror(8) / geom(8) should automatically remove (degrade) components
> with bad I/O operations after a certain threshold, but I'm pretty sure
> it doesn't.

yes it does
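
For reference, the fsck_y_enable default quoted above would be overridden in /etc/rc.conf rather than by editing /etc/defaults/rc.conf. A minimal sketch follows; the background_fsck line is an addition not discussed in this thread, included on the assumption that a foreground check is wanted before the system goes multi-user.

```shell
# /etc/rc.conf (fragment; rc.conf uses sh variable syntax)

# From the thread: fall back to "fsck -y" if the initial preen fails.
fsck_y_enable="YES"

# ASSUMPTION, not from the thread: run fsck in the foreground at boot
# instead of in the background after the system is up.
background_fsck="NO"
```

Note that "fsck -y" answers yes to every repair prompt, so it trades unattended recovery for the chance of discarding data that a manual run might have preserved.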
Re: automatic fsck on gmirror failure
it failed while rebuilding with badly written data on the disk that was used, while other rebuild.

now it can't read it.

if you are sure that it doesn't pass through fsck before second reboot, do the following.

1) turn off gmirror

2) clear gmirror header on both providers

3) run fsck the other drive (not ad6, but the other used on mirror).

4) pray

5) after fsck will end it successfully (it should), create gmirror with the disk you checked

gmirror label gmirror-name /dev/thedisk

6) reboot and start the system. should go well.

7) after system is running and not too much needing disk I/O, do

gmirror insert gmirror-name /dev/ad6

8) pray again, but with much less fear.

9) if gmirror will finish rebuild, all right. if you got write errors in log, ad6 needs to be replaced.

wish it helps.
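
The steps above can be sketched as a command sequence. This is a dry run that only prints the commands; it is a sketch under stated assumptions, not a tested procedure. gm0, ad6 and the s1f partition come from the original report; ad4 is a placeholder for the healthy disk, which the thread never names, and recovery_plan is a made-up function name. Steps 1 to 5 would probably have to be run from a fixit or live environment, since a mirror holding mounted filesystems cannot simply be stopped.

```shell
#!/bin/sh
# Dry run: print the commands for the recovery steps, do not execute them.
MIRROR=gm0      # mirror name, from the log lines (mirror/gm0s1f)
BAD=ad6         # disk reporting READ_DMA timeouts, from the log lines
GOOD=ad4        # ASSUMPTION: placeholder for the healthy disk
FS=s1f          # /usr partition in the report; other partitions need fsck too

recovery_plan() {
    echo "gmirror stop $MIRROR"              # 1) turn off gmirror
    echo "gmirror clear $GOOD $BAD"          # 2) clear gmirror metadata on both
    echo "fsck -t ufs -y /dev/$GOOD$FS"      # 3) fsck the good disk directly
    echo "gmirror label $MIRROR /dev/$GOOD"  # 5) recreate the mirror on it
    echo "gmirror insert $MIRROR /dev/$BAD"  # 7) re-add ad6 once the system is up
    echo "gmirror status $MIRROR"            # 9) watch the rebuild
}

recovery_plan
```

Printing the plan first makes it easy to check the device names before anything touches the disks, which matters here because "gmirror clear" on the wrong provider is not reversible.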
automatic fsck on gmirror failure
Hi there,

I have a RAID 1 mirror implemented with gmirror and we recently had some power issues at our data centre which caused fsck to fail mysteriously.

The server lost power unexpectedly, then came back up again for a minute, power died again and shortly after the next boot the following appears in my /var/log/messages

Feb 2 05:20:19 myserver fsck: /dev/mirror/gm0s1f: INCORRECT BLOCK COUNT I=777684 (8 should be 0) (CORRECTED)
Feb 2 05:20:19 myserver fsck: /dev/mirror/gm0s1f: CANNOT READ BLK: 12417184
Feb 2 05:20:19 myserver fsck: /dev/mirror/gm0s1f: UNEXPECTED SOFT UPDATE INCONSISTENCY; RUN fsck MANUALLY.

gm0s1f is my /usr partition. This was followed by countless errors that look like

Feb 2 05:20:38 myserver ad6: TIMEOUT - READ_DMA retrying (1 retry left) LBA=29096879
Feb 2 05:20:43 myserver ad6: TIMEOUT - READ_DMA retrying (0 retries left) LBA=29096879
Feb 2 05:20:48 myserver ad6: FAILURE - READ_DMA timed out LBA=29096879
Feb 2 05:20:48 myserver g_vfs_done():mirror/gm0s1f[READ(offset=6357598208, length=16384)]error = 5

and with it went any sort of remote access to the box. We had to get physical access, fsck -y and reboot for the machine to be put back into service.

Now my question is: Why did fsck die on me? I thought in this day and age file system corruptions caused by power failures are repaired automatically upon reboot. Or is it possible that interrupting fsck itself caused the problem when the system went down again after the very brief uptime in between?

I am really concerned about this as this caused a lot of unnecessary downtime and I really don't want this to ever happen again. I know, solving the power issues is the real solution but I want my several layers of peace of mind.

Oh, I run 6.2 RELEASE.

Gunther
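
With "countless" errors of this kind, it can help to see whether the failures cluster on a few sectors of one disk. A minimal sketch, assuming syslog lines in the format quoted above; failed_lbas is a made-up helper name, and the sample input is the three ad6 lines from this message.

```shell
#!/bin/sh
# Sketch: count final READ_DMA failures per LBA from syslog lines on stdin.
# failed_lbas is a hypothetical helper name.
failed_lbas() {
    awk '/FAILURE - READ_DMA/ {
        # Find the LBA=<n> token and tally it.
        for (i = 1; i <= NF; i++)
            if ($i ~ /^LBA=/) { sub(/^LBA=/, "", $i); count[$i]++ }
    }
    END { for (lba in count) print lba, count[lba] }' | sort -n
}

# Sample input copied from the log excerpt in this message.
printf '%s\n' \
    'Feb 2 05:20:38 myserver ad6: TIMEOUT - READ_DMA retrying (1 retry left) LBA=29096879' \
    'Feb 2 05:20:43 myserver ad6: TIMEOUT - READ_DMA retrying (0 retries left) LBA=29096879' \
    'Feb 2 05:20:48 myserver ad6: FAILURE - READ_DMA timed out LBA=29096879' | failed_lbas
```

Against the real log this would be run as "failed_lbas < /var/log/messages". Only the FAILURE lines are counted, so retries that eventually succeeded do not inflate the totals.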