Re: Unexpected gmirror behavior: Is this a bug?
> i think it's a bug but only happens with such massive mirror. very few
> people do more than 2-way mirrors that's probably why it wasn't caught.
>
> please do report the bug - it's critical.

In fact I just confirmed that if we reduce our mirror to just two members the problem does not occur. The returning member, even if it is the first drive, is always re-synced with the data from the other drive and no data is lost.

And yes, it's definitely a critical bug. I'm filing a bug report now, but we may have to fix this in-house before we can release our product. There is too great a risk for customers to lose data.

Peter

___
freebsd-questions@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-questions
To unsubscribe, send any mail to "freebsd-questions-unsubscr...@freebsd.org"
Re: Unexpected gmirror behavior: Is this a bug?
> By "kicked out" you mean "overwritten"?
>
> You should definitely look at "gmirror list" before and after.

Sorry for the confusion. By "kicked out", what I meant was that as gmirror started up it took ad4 as the principal member, saw that it was previously part of a mirror with three other drives, and tried to add those drives. These drives could not be added for some reason, so the system eventually completed the process, leaving a degraded mirror with only 1/4 members active. When the system had finished booting, a gmirror list showed that the mirror consisted of only a single member of the expected four.

We have software that runs automatically when a system has booted to make sure all drives are participating in the mirror. In this case it discovered 3 of the 4 drives were missing and proceeded to add them back in. This is where their old data gets destroyed of course.

If I go through this exact same process with any of the other drives everything works as it should--that drive gets reinserted and none of the other drives lose any data. The problem only occurs when the drive that's pulled is the first drive, which in our case is ad4.
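One way a boot-time repair tool like the one described above could avoid making things worse is to refuse to re-add drives when the running mirror holds only a minority of the expected members. The following is a minimal Python sketch of that idea. The function name `safe_to_auto_reinsert` and the majority rule are assumptions made for illustration; they are not part of gmirror or of the software discussed in this thread.

```python
def safe_to_auto_reinsert(expected, active):
    """Hypothetical guard: allow automatic re-adding of missing drives
    only when a strict majority of the expected members is already active.

    expected: names of all drives that should be in the mirror
    active:   names of the drives currently active in the mirror
    """
    active_expected = set(active) & set(expected)
    # A lone survivor out of four is suspicious: it may be the stale disk.
    return 2 * len(active_expected) > len(expected)


expected = ["ad4", "ad6", "ad8", "ad10"]

# Normal degraded case: one drive missing, three active.
print(safe_to_auto_reinsert(expected, ["ad6", "ad8", "ad10"]))

# Case from this thread: only ad4 came up, three drives "missing".
print(safe_to_auto_reinsert(expected, ["ad4"]))
```

With this rule the 1/4 situation would be left for an operator to inspect instead of being repaired automatically. Note that a 2-way mirror with one member missing would also be held back under a strict majority, so the threshold is a design choice to adjust.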
Re: Unexpected gmirror behavior: Is this a bug?
2009/4/24 Peter Steele:
>> This definitely looks like a bug. Try asking again on the freebsd-geom@
>> list. Provide output of "gmirror list".
>
> I'll try that list...
>
>> So, your steps were:
>>
>> 1. ad4, ad6, ad8 and ad10 in a 4-way mirror
>> 2. ad4 fails. At this point did you do a "gmirror list"? I.e. did
>>    gmirror detect it failing? If I read it correctly, the GenID field
>>    should have been increased in this case.
>> 3. The system continues to be used
>> 4. You power it down, take out and reinsert ad4
>> 5. On boot, ad4 is detected, inserted in the mirror but as a "known
>>    good" copy, not a stale one.
>>
>> Correct?
>
> Yes, that's basically the sequence that occurred. I can easily recreate the
> condition though. I shut my box down, took out ad4 and then rebooted. The
> system complained about ad4 being missing and proceeded with a mirror using
> 3/4 of the drives. I then created a file on the system, shut down again, and
> then reinserted ad4. On reboot, as the system was starting up the gmirror
> driver, it detected ad4 but instead of reinserting it in the most recent
> mirror made up of the other drives, it became the active drive and kicked
> out its old partners,

By "kicked out" you mean "overwritten"?

You should definitely look at "gmirror list" before and after.
Re: Unexpected gmirror behavior: Is this a bug?
i think it's a bug but only happens with such massive mirror. very few people do more than 2-way mirrors that's probably why it wasn't caught.

please do report the bug - it's critical.

On Fri, 24 Apr 2009, Peter Steele wrote:

>> This only happens with ad4. If ad6 for example goes offline in the same
>> way, when it is reinserted it does not become the dominant drive and
>> resync its data with the other drives. Rather its data is overwritten
>> with the data from the 3 member mirror, as you'd expect.
>
>> looks like very strange bug. many times i got drives disconnected and
>> always gmirror resynced
>
> If I just pull ad4 and then reinsert it without doing a reboot, everything
> works fine. The problem occurs when ad4 is pulled and then reinserted after
> the system is shut down. When the system comes up, it doesn't get added back
> to the existing mirror but rather becomes the principal member of the
> mirror, using its old data. Not good.
Re: Unexpected gmirror behavior: Is this a bug?
> This only happens with ad4. If ad6 for example goes offline in the same way,
> when it is reinserted it does not become the dominant drive and resync its
> data with the other drives. Rather its data is overwritten with the data
> from the 3 member mirror, as you'd expect.

> looks like very strange bug. many times i got drives disconnected and
> always gmirror resynced

If I just pull ad4 and then reinsert it without doing a reboot, everything works fine. The problem occurs when ad4 is pulled and then reinserted after the system is shut down. When the system comes up, it doesn't get added back to the existing mirror but rather becomes the principal member of the mirror, using its old data. Not good.
Re: Unexpected gmirror behavior: Is this a bug?
> This definitely looks like a bug. Try asking again on the freebsd-geom@
> list. Provide output of "gmirror list".

I'll try that list...

> So, your steps were:
>
> 1. ad4, ad6, ad8 and ad10 in a 4-way mirror
> 2. ad4 fails. At this point did you do a "gmirror list"? I.e. did
>    gmirror detect it failing? If I read it correctly, the GenID field
>    should have been increased in this case.
> 3. The system continues to be used
> 4. You power it down, take out and reinsert ad4
> 5. On boot, ad4 is detected, inserted in the mirror but as a "known
>    good" copy, not a stale one.
>
> Correct?

Yes, that's basically the sequence that occurred. I can easily recreate the condition though. I shut my box down, took out ad4 and then rebooted. The system complained about ad4 being missing and proceeded with a mirror using 3/4 of the drives. I then created a file on the system, shut down again, and then reinserted ad4. On reboot, as the system was starting up the gmirror driver, it detected ad4 but instead of reinserting it in the most recent mirror made up of the other drives, it became the active drive and kicked out its old partners. When the old drives were reinserted manually into that mirror they were of course synced with the data from ad4, destroying their more recent data. This does not happen if I do the same thing with a drive other than ad4.

> What version of FreeBSD are you using?

7.0-p10 or so.
Re: Unexpected gmirror behavior: Is this a bug?
> This only happens with ad4. If ad6 for example goes offline in the same way,
> when it is reinserted it does not become the dominant drive and resync its
> data with the other drives. Rather its data is overwritten with the data
> from the 3 member mirror, as you'd expect.

looks like very strange bug. many times i got drives disconnected and always gmirror resynced
Re: Unexpected gmirror behavior: Is this a bug?
Ivan Voras wrote:
> Peter Steele wrote:
>> We had a somewhat startling scenario occur with gmirror. We have systems
>> with four drives ad4, ad6, ad8, and ad10, with the OS set up on a mirrored
>> slice across all four drives. The ad4 drive failed at one point, due to a
>> simple bad connection in its drive bay. While it was offline, the system
>> continued to be used for a while and new data was added to the mirrored
>> file system.
>>
>> We eventually took the box down to deal with ad4, and tried simply pulling
>> and reinserting the drive. On reboot we saw that the BIOS detected the
>> drive, so that was good. However, when FreeBSD got to the point of starting
>> up the GEOM driver, instead of reinserting ad4 into the more current mirror
>> consisting of ad6/ad8/ad10 and resyncing it with that data, the GEOM driver
>> assumed ad4 was the "good" mirror and ended up resyncing ad6/ad8/ad10 with
>> the data from ad4, causing the new files we had added to those drives to be
>> lost.
>>
>> This only happens with ad4. If ad6 for example goes offline in the same way,
>> when it is reinserted it does not become the dominant drive and resync its
>> data with the other drives. Rather its data is overwritten with the data
>> from the 3 member mirror, as you'd expect.
>>
>> So, clearly ad4, the first disk, is treated specially. The question is: is
>> this a bug or a feature? Is there any way to prevent this behavior? This
>> would be a disastrous thing to happen in the field on one of our customer
>> systems.
>
> This definitely looks like a bug. Try asking again on the freebsd-geom@
> list. Provide output of "gmirror list".
>
> From what you said it looks like you did the procedure safely - you
> turned off the server, then pulled the drive and reinserted it, then
> turned it on again, right?

Sorry, that was a useless response - what I said should be a no-op.

So, your steps were:

1. ad4, ad6, ad8 and ad10 in a 4-way mirror
2. ad4 fails. At this point did you do a "gmirror list"? I.e. did gmirror detect it failing? If I read it correctly, the GenID field should have been increased in this case.
3. The system continues to be used
4. You power it down, take out and reinsert ad4
5. On boot, ad4 is detected, inserted in the mirror but as a "known good" copy, not a stale one.

Correct?

What version of FreeBSD are you using?
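The GenID reasoning in the steps above can be illustrated with a small Python sketch. This is a toy model only, not gmirror's actual code: the `Member` class, the example `genid` values, and `split_current_and_stale` are hypothetical, and the model assumes that the generation counter is bumped on the surviving disks when a member drops out, as the message above suggests.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Member:
    name: str
    genid: int  # hypothetical per-disk generation counter


def split_current_and_stale(members):
    """Return (current, stale) name lists.

    Members carrying the highest generation are treated as current;
    anything with a lower generation is treated as stale and should be
    resynced from the current set, never the other way round.
    """
    newest = max(m.genid for m in members)
    current = [m.name for m in members if m.genid == newest]
    stale = [m.name for m in members if m.genid < newest]
    return current, stale


# Scenario from the thread: ad4 drops out while the other three keep
# running, so (by assumption) only their generation is increased.
members = [
    Member("ad4", 0),
    Member("ad6", 1),
    Member("ad8", 1),
    Member("ad10", 1),
]
current, stale = split_current_and_stale(members)
print("current:", current)
print("stale:", stale)
```

Under this model ad4 would be the stale copy on the next boot, which is the outcome the steps above expect and the opposite of what was observed.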
Re: Unexpected gmirror behavior: Is this a bug?
Peter Steele wrote:
> We had a somewhat startling scenario occur with gmirror. We have systems with
> four drives ad4, ad6, ad8, and ad10, with the OS set up on a mirrored slice
> across all four drives. The ad4 drive failed at one point, due to a simple
> bad connection in its drive bay. While it was offline, the system continued
> to be used for a while and new data was added to the mirrored file system.
>
> We eventually took the box down to deal with ad4, and tried simply pulling
> and reinserting the drive. On reboot we saw that the BIOS detected the drive,
> so that was good. However, when FreeBSD got to the point of starting up the
> GEOM driver, instead of reinserting ad4 into the more current mirror
> consisting of ad6/ad8/ad10 and resyncing it with that data, the GEOM driver
> assumed ad4 was the "good" mirror and ended up resyncing ad6/ad8/ad10 with
> the data from ad4, causing the new files we had added to those drives to be
> lost.
>
> This only happens with ad4. If ad6 for example goes offline in the same way,
> when it is reinserted it does not become the dominant drive and resync its
> data with the other drives. Rather its data is overwritten with the data from
> the 3 member mirror, as you'd expect.
>
> So, clearly ad4, the first disk, is treated specially. The question is: is
> this a bug or a feature? Is there any way to prevent this behavior? This
> would be a disastrous thing to happen in the field on one of our customer
> systems.

This definitely looks like a bug. Try asking again on the freebsd-geom@ list. Provide output of "gmirror list".

From what you said it looks like you did the procedure safely - you turned off the server, then pulled the drive and reinserted it, then turned it on again, right?