Re: Weird problem with gmirror - cannot add the Good disk when previously failed SATA disk is online
On Monday 18 May 2009 22:07:24, Miroslav Lachman wrote:
> Achilleas Mantzios wrote:
>> Hello,
>> I run 7.1-PRERELEASE; it's a home server. This morning, after a power failure, the rebuild of my root gm0 failed on disk ad4. The messages were:
>>
>> May 18 08:02:02 panix kernel: ad4: WARNING - WRITE_DMA UDMA ICRC error (retrying request) LBA=268091264
>> May 18 08:02:08 panix kernel: drm0: Intel i865G GMCH on vgapci0
>> May 18 08:02:08 panix kernel: info: [drm] AGP at 0xf000 128MB
>> May 18 08:02:08 panix kernel: info: [drm] Initialized i915 1.5.0 20060119
>> May 18 08:02:08 panix kernel: drm0: [ITHREAD]
>> May 18 08:02:08 panix kernel: ad4: FAILURE - device detached
>> May 18 08:02:08 panix kernel: subdisk4: detached
>> May 18 08:02:08 panix kernel: ad4: detached
>> May 18 08:02:08 panix kernel: GEOM_MIRROR: Device gm0: provider ad4 disconnected.
>> May 18 08:02:08 panix kernel: GEOM_MIRROR: Device gm0: rebuilding provider ad4 stopped.
>>
>> I read http://www.eztiger.org/2008/08/removing-and-re-adding-a-disk-in-gmirror/ hoping that the rebuild failure was temporary, and so I tried to just run
>>
>> # gmirror forget gm0
>> # gmirror insert gm0 ad4
>
> The correct order of commands is:
>
> atacontrol list
> gmirror list
> gmirror forget gm0
> gmirror clear -v ad4
> gmirror insert -v gm0 ad4

Thanks.

> Miroslav Lachman

--
Achilleas Mantzios
___
freebsd-questions@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-questions
To unsubscribe, send any mail to freebsd-questions-unsubscr...@freebsd.org
Re: Weird problem with gmirror - cannot add the Good disk when previously failed SATA disk is online
On Monday 18 May 2009 13:27:58, Manolis Kiagias wrote:
> It looks to me like you have a bad disk now.

Manoli, thanks. I replaced the bad disk and the system looks OK; it is rebuilding gm0.

--
Achilleas Mantzios
Re: Weird problem with gmirror - cannot add the Good disk when previously failed SATA disk is online
Achilleas Mantzios wrote:
> Hello,
> (in advance, sorry for the cross-posting; it is just that freebsd-geom didn't seem that populated.)
> I run 7.1-PRERELEASE; it's a home server. This morning, after a power failure, the rebuild of my root gm0 failed on disk ad4. The messages were:
>
> May 18 08:02:02 panix kernel: ad4: WARNING - WRITE_DMA UDMA ICRC error (retrying request) LBA=268091264
> May 18 08:02:08 panix kernel: drm0: Intel i865G GMCH on vgapci0
> May 18 08:02:08 panix kernel: info: [drm] AGP at 0xf000 128MB
> May 18 08:02:08 panix kernel: info: [drm] Initialized i915 1.5.0 20060119
> May 18 08:02:08 panix kernel: drm0: [ITHREAD]
> May 18 08:02:08 panix kernel: ad4: FAILURE - device detached
> May 18 08:02:08 panix kernel: subdisk4: detached
> May 18 08:02:08 panix kernel: ad4: detached
> May 18 08:02:08 panix kernel: GEOM_MIRROR: Device gm0: provider ad4 disconnected.
> May 18 08:02:08 panix kernel: GEOM_MIRROR: Device gm0: rebuilding provider ad4 stopped.

It looks to me like you have a bad disk now.

> I read http://www.eztiger.org/2008/08/removing-and-re-adding-a-disk-in-gmirror/ hoping that the rebuild failure was temporary, and so I tried to just run
>
> # gmirror forget gm0
> # gmirror insert gm0 ad4
>
> But the system responded (if I remember correctly) "Unknown provider ad4". The system could no longer see ad4 as being online. So I rebooted the system many times and had these results:
>
> - With ad4 offline (physically disconnected), the system booted OK.
> - With both disks online, the system consistently responded with:
>   GEOM_MIRROR: Cannot add disk ad6 to gm0 (error=22).
>   Which IMO is not very OK, since gm0 should add ad6 without problem, no matter whether ad4 is online or not.
> - With only ad4 online, it simply cannot find gm0 at all (kind of reasonable).
>
> So my only option is to have only ad6 online, with the current gmirror status:
>
> panix# gmirror status
>       Name    Status  Components
> mirror/gm0  COMPLETE  ad6
>
> Does anyone have an idea of how I should proceed (besides buying a UPS unit!)?
> Is it meaningful to get a new disk to replace the current ad4?

I'd recommend attaching the bad disk on its own to a system and performing tests on it. Is the BIOS recognizing it properly? I would run hardware tests on it, either the manufacturer's ones or something like sysutils/smartmontools. You could also try installing FreeBSD on it and see if it works. And probably use dd to clean all the contents, especially the partition table and the last sector, where the geom information is stored.

> Why is the presence of the supposedly bad disk ad4 affecting gm0, when I have already told gm0 to forget about ad4?

The bad disk may be sending confusing signals to the bus / IDE interface. I've had this once (although it was due to a bad cable). The entire mirror would disappear suddenly.
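As a rough illustration of the dd suggestion above, here is a minimal Python sketch that only builds the two dd command lines: one zeroing the first sector (partition table) and one zeroing the last sector (where gmirror keeps its metadata). The function name, the 512-byte sector size, and the sector count are assumptions made up for the example; on a real system the size would have to be read from the disk itself, and the commands are printed, never executed.

```python
def wipe_commands(device, total_sectors, sector_size=512):
    """Build (but do not run) dd commands that zero the first and last sector.

    device, total_sectors and sector_size are illustrative inputs only;
    nothing here queries or touches a real disk.
    """
    if total_sectors < 2:
        raise ValueError("need at least two sectors")
    last = total_sectors - 1  # sectors are numbered from 0
    return [
        # First sector: partition table / boot record.
        f"dd if=/dev/zero of={device} bs={sector_size} count=1",
        # Last sector: where gmirror stores its on-disk metadata.
        f"dd if=/dev/zero of={device} bs={sector_size} seek={last} count=1",
    ]


if __name__ == "__main__":
    # Hypothetical disk of 1000 sectors, purely for demonstration.
    for cmd in wipe_commands("/dev/ad4", total_sectors=1000):
        print(cmd)
```

Generating the commands as strings keeps the sketch harmless: it can be reviewed before anything is pasted into a root shell on the machine that holds the suspect disk.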
Re: Weird problem with gmirror - cannot add the Good disk when previously failed SATA disk is online
Hey Manoli! Glad to see you again.

On Monday 18 May 2009 13:27:58, Manolis Kiagias wrote:
> Achilleas Mantzios wrote:
>> Hello,
>> (in advance, sorry for the cross-posting; it is just that freebsd-geom didn't seem that populated.)
>> I run 7.1-PRERELEASE; it's a home server. This morning, after a power failure, the rebuild of my root gm0 failed on disk ad4. The messages were:
>>
>> May 18 08:02:02 panix kernel: ad4: WARNING - WRITE_DMA UDMA ICRC error (retrying request) LBA=268091264
>> May 18 08:02:08 panix kernel: drm0: Intel i865G GMCH on vgapci0
>> May 18 08:02:08 panix kernel: info: [drm] AGP at 0xf000 128MB
>> May 18 08:02:08 panix kernel: info: [drm] Initialized i915 1.5.0 20060119
>> May 18 08:02:08 panix kernel: drm0: [ITHREAD]
>> May 18 08:02:08 panix kernel: ad4: FAILURE - device detached
>> May 18 08:02:08 panix kernel: subdisk4: detached
>> May 18 08:02:08 panix kernel: ad4: detached
>> May 18 08:02:08 panix kernel: GEOM_MIRROR: Device gm0: provider ad4 disconnected.
>> May 18 08:02:08 panix kernel: GEOM_MIRROR: Device gm0: rebuilding provider ad4 stopped.
>
> It looks to me like you have a bad disk now.

I certainly hope so, since there is nothing else I can do.

>> I read http://www.eztiger.org/2008/08/removing-and-re-adding-a-disk-in-gmirror/ hoping that the rebuild failure was temporary, and so I tried to just run
>>
>> # gmirror forget gm0
>> # gmirror insert gm0 ad4
>>
>> But the system responded (if I remember correctly) "Unknown provider ad4". The system could no longer see ad4 as being online. So I rebooted the system many times and had these results:
>>
>> - With ad4 offline (physically disconnected), the system booted OK.
>> - With both disks online, the system consistently responded with:
>>   GEOM_MIRROR: Cannot add disk ad6 to gm0 (error=22).
>>   Which IMO is not very OK, since gm0 should add ad6 without problem, no matter whether ad4 is online or not.
>> - With only ad4 online, it simply cannot find gm0 at all (kind of reasonable).
>>
>> So my only option is to have only ad6 online, with the current gmirror status:
>>
>> panix# gmirror status
>>       Name    Status  Components
>> mirror/gm0  COMPLETE  ad6
>>
>> Does anyone have an idea of how I should proceed (besides buying a UPS unit!)?
>> Is it meaningful to get a new disk to replace the current ad4?
>
> I'd recommend attaching the bad disk on its own to a system and performing tests on it. Is the BIOS recognizing it properly?

Yes, the BIOS recognizes it OK, I suppose.

> I would run hardware tests on it, either the manufacturer's ones or something like sysutils/smartmontools. You could also try installing FreeBSD on it and see if it works. And probably use dd to clean all the contents, especially the partition table and the last sector, where the geom information is stored.

Thanks. Lacking time, I think I will try a brand new identical disk.

>> Why is the presence of the supposedly bad disk ad4 affecting gm0, when I have already told gm0 to forget about ad4?
>
> The bad disk may be sending confusing signals to the bus / IDE interface. I've had this once (although it was due to a bad cable). The entire mirror would disappear suddenly.

--
Achilleas Mantzios
Re: Weird problem with gmirror - cannot add the Good disk when previously failed SATA disk is online
panix panix wrote:
> Hello,
> (in advance, sorry for the cross-posting; it is just that freebsd-geom didn't seem that populated.)
> I run 7.1-PRERELEASE; it's a home server. This morning, after a power failure, the rebuild of my root gm0 failed on disk ad4. The messages were:
>
> May 18 08:02:02 panix kernel: ad4: WARNING - WRITE_DMA UDMA ICRC error (retrying request) LBA=268091264
> May 18 08:02:08 panix kernel: drm0: Intel i865G GMCH on vgapci0
> May 18 08:02:08 panix kernel: info: [drm] AGP at 0xf000 128MB
> May 18 08:02:08 panix kernel: info: [drm] Initialized i915 1.5.0 20060119
> May 18 08:02:08 panix kernel: drm0: [ITHREAD]
> May 18 08:02:08 panix kernel: ad4: FAILURE - device detached
> May 18 08:02:08 panix kernel: subdisk4: detached
> May 18 08:02:08 panix kernel: ad4: detached
> May 18 08:02:08 panix kernel: GEOM_MIRROR: Device gm0: provider ad4 disconnected.
> May 18 08:02:08 panix kernel: GEOM_MIRROR: Device gm0: rebuilding provider ad4 stopped.
>
> I read http://www.eztiger.org/2008/08/removing-and-re-adding-a-disk-in-gmirror/ hoping that the rebuild failure was temporary, and so I tried to just run
>
> # gmirror forget gm0
> # gmirror insert gm0 ad4
>
> But the system responded (if I remember correctly) "Unknown provider ad4". The system could no longer see ad4 as being online.

Yes, as the "device detached" message told you: after that point ad4 was removed from /dev.

> So I rebooted the system many times and had these results:
>
> - With ad4 offline (physically disconnected), the system booted OK.
> - With both disks online, the system consistently responded with:
>   GEOM_MIRROR: Cannot add disk ad6 to gm0 (error=22).

Which means that gm0 was somehow created before, maybe from the stale ad4 copy? If so, you are attempting to add a newer generation of data (from ad6) to a gm0 instantiated from an older generation (from ad4). This could explain the error code (22 = invalid argument).

OTOH, if you only have ad6 in the system, this means you are trying to insert ad6 into a mirror which is already instantiated by ad6, which is trivially wrong.

>   Which IMO is not very OK, since gm0 should add ad6 without problem, no matter whether ad4 is online or not.

You cannot really expect the system to behave correctly with broken hardware.

> - With only ad4 online, it simply cannot find gm0 at all (kind of reasonable).

Relatively. Is ad4 recognized by the system? You didn't really clear the metadata on ad4, so it should be recognized, but as a stale version (hopefully). If it isn't recognized at all, then it's broken.

> So my only option is to have only ad6 online, with the current gmirror status:
>
> panix# gmirror status
>       Name    Status  Components
> mirror/gm0  COMPLETE  ad6

This is OK.

> Does anyone have an idea of how I should proceed (besides buying a UPS unit!)?
> Is it meaningful to get a new disk to replace the current ad4?

Yes. Then proceed with gmirror insert.

> Why is the presence of the supposedly bad disk ad4 affecting gm0, when I have already told gm0 to forget about ad4?

It's relatively common (it was more common in the days of PATA cables) to have a bad drive interfering with the rest of the system.
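For anyone wanting to double-check the reading of error=22 given above, here is a small sketch using Python's standard errno module. The helper name describe_errno is made up for the example, and the description string comes from the platform's C library, so its exact wording may vary between systems.

```python
import errno
import os


def describe_errno(code):
    """Return (symbolic name, description) for a numeric errno value."""
    name = errno.errorcode.get(code, "UNKNOWN")
    return name, os.strerror(code)


if __name__ == "__main__":
    # 22 is the value from "Cannot add disk ad6 to gm0 (error=22)".
    name, text = describe_errno(22)
    print(f"error=22 is {name}: {text}")
```

This only decodes the number; it says nothing about why GEOM_MIRROR returned it, which is what the stale-generation explanation above tries to address.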
Re: Weird problem with gmirror - cannot add the Good disk when previously failed SATA disk is online
Achilleas Mantzios wrote:
> Hello,
> I run 7.1-PRERELEASE; it's a home server. This morning, after a power failure, the rebuild of my root gm0 failed on disk ad4. The messages were:
>
> May 18 08:02:02 panix kernel: ad4: WARNING - WRITE_DMA UDMA ICRC error (retrying request) LBA=268091264
> May 18 08:02:08 panix kernel: drm0: Intel i865G GMCH on vgapci0
> May 18 08:02:08 panix kernel: info: [drm] AGP at 0xf000 128MB
> May 18 08:02:08 panix kernel: info: [drm] Initialized i915 1.5.0 20060119
> May 18 08:02:08 panix kernel: drm0: [ITHREAD]
> May 18 08:02:08 panix kernel: ad4: FAILURE - device detached
> May 18 08:02:08 panix kernel: subdisk4: detached
> May 18 08:02:08 panix kernel: ad4: detached
> May 18 08:02:08 panix kernel: GEOM_MIRROR: Device gm0: provider ad4 disconnected.
> May 18 08:02:08 panix kernel: GEOM_MIRROR: Device gm0: rebuilding provider ad4 stopped.
>
> I read http://www.eztiger.org/2008/08/removing-and-re-adding-a-disk-in-gmirror/ hoping that the rebuild failure was temporary, and so I tried to just run
>
> # gmirror forget gm0
> # gmirror insert gm0 ad4

The correct order of commands is:

atacontrol list
gmirror list
gmirror forget gm0
gmirror clear -v ad4
gmirror insert -v gm0 ad4

Miroslav Lachman
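The command order given above can be written down as a small dry-run helper. This is only a sketch: the function name readd_disk_commands is made up, the mirror and disk names are parameters rather than detected values, and the commands are returned as strings for review, not executed.

```python
def readd_disk_commands(mirror, disk):
    """Return, in order, the commands for re-adding a disk to a gmirror.

    Dry run only: the list is meant to be read and typed by hand.
    """
    return [
        "atacontrol list",                     # check that the ATA disk is visible
        "gmirror list",                        # inspect the current mirror state
        f"gmirror forget {mirror}",            # drop components that are not connected
        f"gmirror clear -v {disk}",            # clear old gmirror metadata from the disk
        f"gmirror insert -v {mirror} {disk}",  # add the disk back to the mirror
    ]


if __name__ == "__main__":
    for cmd in readd_disk_commands("gm0", "ad4"):
        print(cmd)
```

The point of the ordering is that the clear step sits between forget and insert, which is the step missing from the two-command attempt quoted in the message.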