Re: Unexpected gmirror behavior: Is this a bug?

2009-04-27 Thread Peter Steele
> I think it's a bug, but one that only shows up with such a large mirror. Very
> few people run more than 2-way mirrors, which is probably why it wasn't caught.
>
> Please do report the bug - it's critical.

In fact I just confirmed that if we reduce our mirror to just two members the 
problem does not occur. The returning member, even if it is the first drive, is 
always re-synced with the data from the other drive and no data is lost. 

And yes, it's definitely a critical bug. I'm filing a bug report now, but we 
may have to fix this in-house before we can release our product with this 
problem. There is too great a risk for customers to lose data. 

Peter 

___
freebsd-questions@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-questions
To unsubscribe, send any mail to "freebsd-questions-unsubscr...@freebsd.org"


Re: Unexpected gmirror behavior: Is this a bug?

2009-04-24 Thread Peter Steele
> By "kicked out" you mean "overwritten"? 
> 
>You should definitely look at "gmirror list" before and after. 

Sorry for the confusion. By "kicked out", what I meant was that as gmirror
started up, it took ad4 as the principal member, saw that it was previously part
of a mirror with three other drives, and tried to add those drives. These drives
could not be added for some reason, so the system eventually completed the
process, leaving a degraded mirror with only 1 of 4 members active. Once the
system had finished booting, "gmirror list" showed that the mirror consisted of
only a single member of the expected four.

We have software that runs automatically when a system has booted to make sure 
all drives are participating in the mirror. In this case it discovered 3 of the 4
drives were missing and proceeded to add them back in. This is where their old 
data gets destroyed of course. If I go through this exact same process with any 
of the other drives everything works as it should--that drive gets reinserted 
and none of the other drives lose any data. The problem only occurs when the 
drive that's pulled is the first drive, which in our case is ad4. 
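
A minimal sketch of the kind of post-boot check described above, assuming the
expected and active member lists have already been obtained somehow (for
example from "gmirror list"). The function name and the safeguard at the end
are hypothetical and are not taken from the software discussed in this thread.

```python
def missing_members(expected, active):
    """Return the expected providers that are not active, in expected order."""
    active_set = set(active)
    return [prov for prov in expected if prov not in active_set]


expected = ["ad4", "ad6", "ad8", "ad10"]
active = ["ad4"]  # the single member left after the bad boot

to_reinsert = missing_members(expected, active)
print(to_reinsert)

# Hypothetical safeguard, not part of the software described above: when more
# members are missing than present, the "survivor" may itself be the stale copy.
suspicious = len(to_reinsert) > len(active)
print("needs manual review:", suspicious)
```

In the failure case from this thread the check would list ad6, ad8 and ad10 for
reinsertion, which is exactly the step that overwrites their newer data.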



Re: Unexpected gmirror behavior: Is this a bug?

2009-04-24 Thread Ivan Voras
2009/4/24 Peter Steele:
>> This definitely looks like a bug. Try asking again on the freebsd-geom@
>> list. Provide output of "gmirror list".
>
> I'll try that list...
>
>>So, your steps were:
>
>>1. ad4, ad6, ad8 and ad10 in a 4-way mirror
>>2. ad4 fails. At this point did you do a "gmirror list"? I.e. did
>>gmirror detect it failing? If I read it correctly, the GenID field
>>should have been increased in this case.
>>3. The system continues to be used
>>4. You power it down, take out and reinsert ad4
>>5. On boot, ad4 is detected, inserted in the mirror but as a "known
>>good" copy, not a stale one.
>>
>>Correct?
>
> Yes, that's basically the sequence that occurred. I can easily recreate the
> condition though. I shut my box down, took out ad4 and then rebooted. The
> system complained about ad4 being missing and proceeded with a mirror using
> 3/4 of the drives. I then created a file on the system, shut down again, and
> then reinserted ad4. On reboot, as the system was starting up the gmirror
> driver, it detected ad4 but instead of reinserting it into the most recent
> mirror made up of the other drives, it became the active drive and kicked
> out its old partners,

By "kicked out" you mean "overwritten"?

You should definitely look at "gmirror list" before and after.


Re: Unexpected gmirror behavior: Is this a bug?

2009-04-24 Thread Wojciech Puchar
I think it's a bug, but one that only shows up with such a large mirror. Very
few people run more than 2-way mirrors, which is probably why it wasn't caught.

Please do report the bug - it's critical.

On Fri, 24 Apr 2009, Peter Steele wrote:

>>> This only happens with ad4. If ad6 for example goes offline in the same way,
>>> when it is reinserted it does not become the dominant drive and resync its
>>> data with the other drives. Rather its data is overwritten with the data
>>> from the 3 member mirror, as you'd expect.
>>
>> Looks like a very strange bug. Many times I have had drives disconnected, and
>> gmirror always resynced.
>
> If I just pull ad4 and then reinsert it without doing a reboot, everything
> works fine. The problem occurs when ad4 is pulled and then reinserted after
> the system is shut down. When the system comes up, it doesn't get added back
> to the existing mirror but rather becomes the principal member of the mirror,
> using its old data. Not good.



Re: Unexpected gmirror behavior: Is this a bug?

2009-04-24 Thread Peter Steele
>> This only happens with ad4. If ad6 for example goes offline in the same way,
>> when it is reinserted it does not become the dominant drive and resync its
>> data with the other drives. Rather its data is overwritten with the data
>> from the 3 member mirror, as you'd expect.
>
> Looks like a very strange bug. Many times I have had drives disconnected, and
> gmirror always resynced.

If I just pull ad4 and then reinsert it without doing a reboot, everything 
works fine. The problem occurs when ad4 is pulled and then reinserted after the 
system is shut down. When the system comes up, it doesn't get added back to the
existing mirror but rather becomes the principal member of the mirror, using 
its old data. Not good. 



Re: Unexpected gmirror behavior: Is this a bug?

2009-04-24 Thread Peter Steele
> This definitely looks like a bug. Try asking again on the freebsd-geom@ 
> list. Provide output of "gmirror list". 

I'll try that list... 

>So, your steps were: 

>1. ad4, ad6, ad8 and ad10 in a 4-way mirror 
>2. ad4 fails. At this point did you do a "gmirror list"? I.e. did 
>gmirror detect it failing? If I read it correctly, the GenID field 
>should have been increased in this case. 
>3. The system continues to be used 
>4. You power it down, take out and reinsert ad4 
>5. On boot, ad4 is detected, inserted in the mirror but as a "known 
>good" copy, not a stale one. 
> 
>Correct? 

Yes, that's basically the sequence that occurred. I can easily recreate the 
condition though. I shut my box down, took out ad4 and then rebooted. The 
system complained about ad4 being missing and proceeded with a mirror using 3/4 
of the drives. I then created a file on the system, shut down again, and then
reinserted ad4. On reboot, as the system was starting up the gmirror driver, it
detected ad4 but instead of reinserting it into the most recent mirror made up of
the other drives, it became the active drive and kicked out its old partners.
When the old drives were reinserted manually into that mirror they were of
course synced with the data from ad4, destroying their more recent data. 

This does not happen if I do the same thing with a drive other than ad4. 

>What version of FreeBSD are you using? 

7.0-p10 or so. 



Re: Unexpected gmirror behavior: Is this a bug?

2009-04-24 Thread Wojciech Puchar

> This only happens with ad4. If ad6 for example goes offline in the same way,
> when it is reinserted it does not become the dominant drive and resync its data
> with the other drives. Rather its data is overwritten with the data from the 3
> member mirror, as you'd expect.

Looks like a very strange bug. Many times I have had drives disconnected, and
gmirror always resynced.



Re: Unexpected gmirror behavior: Is this a bug?

2009-04-24 Thread Ivan Voras
Ivan Voras wrote:
> Peter Steele wrote:
>> We had a somewhat startling scenario occur with gmirror. We have systems 
>> with four drives ad4, ad6, ad8, and ad10, with the OS set up on a mirrored
>> slice across all four drives. The ad4 drive failed at one point, due to a
>> simple bad connection in its drive bay. While it was offline, the system
>> continued to be used for a while and new data was added to the mirrored file
>> system.
>>
>> We eventually took the box down to deal with ad4, and tried simply pulling 
>> and reinserting the drive. On reboot we saw that the BIOS detected the 
>> drive, so that was good. However, when FreeBSD got to the point of starting 
>> up the GEOM driver, instead of reinserting ad4 into the more current mirror 
>> consisting of ad6/ad8/ad10 and resyncing it with that data, the GEOM driver 
>> assumed ad4 was the "good" mirror and ended up resyncing ad6/ad8/ad10 with 
>> the data from ad4, causing the new files we had added to those drives to be 
>> lost. 
>>
>> This only happens with ad4. If ad6 for example goes offline in the same way, 
>> when it is reinserted it does not become the dominant drive and resync its 
>> data with the other drives. Rather its data is overwritten with the data 
>> from the 3 member mirror, as you'd expect. 
>>
>> So, clearly ad4, the first disk, is treated specially. The question is: is
>> this a bug or a feature? Is there any way to prevent this behavior? This would
>> be a disastrous thing to happen in the field on one of our customer systems.
> 
> This definitely looks like a bug. Try asking again on the freebsd-geom@
> list. Provide output of "gmirror list".
> 
> From what you said it looks like you did the procedure safely - you
> turned off the server, then pulled the drive and reinserted it, then
> turned it on again, right?

Sorry, that was a useless response - what I said should be a no-op.

So, your steps were:

1. ad4, ad6, ad8 and ad10 in a 4-way mirror
2. ad4 fails. At this point did you do a "gmirror list"? I.e. did
gmirror detect it failing? If I read it correctly, the GenID field
should have been increased in this case.
3. The system continues to be used
4. You power it down, take out and reinsert ad4
5. On boot, ad4 is detected, inserted in the mirror but as a "known
good" copy, not a stale one.

Correct?

What version of FreeBSD are you using?
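
A toy model of the expectation in step 2, assuming (as the steps above suggest)
that each component carries a generation counter that the surviving members
increase when one drops out. The class and function names are hypothetical and
this is not gmirror's actual selection code; it only illustrates which disk
should be treated as stale at step 5.

```python
from dataclasses import dataclass


@dataclass
class Component:
    name: str
    genid: int  # generation counter kept in the component's on-disk metadata


def split_fresh_and_stale(components):
    """Return (fresh, stale) component names, judged by genid alone."""
    newest = max(c.genid for c in components)
    fresh = [c.name for c in components if c.genid == newest]
    stale = [c.name for c in components if c.genid < newest]
    return fresh, stale


# Step 1: all four members agree.
mirror = [Component(n, genid=0) for n in ("ad4", "ad6", "ad8", "ad10")]

# Steps 2-3: ad4 drops out; the survivors bump their genid and keep taking writes.
for c in mirror:
    if c.name != "ad4":
        c.genid += 1

# Steps 4-5: ad4 comes back at boot still carrying its old genid.
fresh, stale = split_fresh_and_stale(mirror)
print("fresh:", fresh)
print("stale:", stale)
```

Under this model ad4 is the only stale component, so it should be the one that
gets resynced, which is the opposite of what was observed.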






Re: Unexpected gmirror behavior: Is this a bug?

2009-04-24 Thread Ivan Voras
Peter Steele wrote:
> We had a somewhat startling scenario occur with gmirror. We have systems with 
> four drives ad4, ad6, ad8, and ad10, with the OS set up on a mirrored slice
> across all four drives. The ad4 drive failed at one point, due to a simple
> bad connection in its drive bay. While it was offline, the system continued
> to be used for a while and new data was added to the mirrored file
> system.
> 
> We eventually took the box down to deal with ad4, and tried simply pulling 
> and reinserting the drive. On reboot we saw that the BIOS detected the drive, 
> so that was good. However, when FreeBSD got to the point of starting up the 
> GEOM driver, instead of reinserting ad4 into the more current mirror 
> consisting of ad6/ad8/ad10 and resyncing it with that data, the GEOM driver 
> assumed ad4 was the "good" mirror and ended up resyncing ad6/ad8/ad10 with 
> the data from ad4, causing the new files we had added to those drives to be 
> lost. 
> 
> This only happens with ad4. If ad6 for example goes offline in the same way, 
> when it is reinserted it does not become the dominant drive and resync its 
> data with the other drives. Rather its data is overwritten with the data from 
> the 3 member mirror, as you'd expect. 
> 
> So, clearly ad4, the first disk, is treated specially. The question is: is this
> a bug or a feature? Is there any way to prevent this behavior? This would be a
> disastrous thing to happen in the field on one of our customer systems.

This definitely looks like a bug. Try asking again on the freebsd-geom@
list. Provide output of "gmirror list".

From what you said it looks like you did the procedure safely - you
turned off the server, then pulled the drive and reinserted it, then
turned it on again, right?



