I believe it's a single delayed conflict write attempt and no successes in that
same interval.
On Aug 7, 2010, at 7:51 PM, Damien Katz wrote:
> Looks like all that's necessary is a single delayed conflict write attempt,
> and all subsequent delayed commits won't be commit, the header never gets
> written.
>
> 1.0 loses data. This is ridiculously bad.
>
> We need a test to reproduce this and fix.
>
> -Damien
>
> On Aug 7, 2010, at 4:35 PM, Adam Kocoloski wrote:
>
>> Good sleuthing guys, and my apologies for letting this through. Randall,
>> your patch in COUCHDB-794 was actually fine, it was my reworking of it that
>> caused this serious bug.
>>
>> With respect to that gist 513282, I think it would be better to return
>> Db#db{waiting_delayed_commit=nil} when the headers match instead of moving
>> the cancel_timer() command as you did. After all, we did perform the check
>> here -- it was just that nothing needed to be committed.
>>
>> Adam
>>
>> On Aug 7, 2010, at 6:55 PM, Damien Katz wrote:
>>
>>> Yes, I think it requires 2 conflicting writes in row, because it needs to
>>> trigger the delayed_commit timer without actually having anything to
>>> commit, so the header never changes.
>>>
>>> Try to reproduce this and add a test case.
>>>
>>> -Damien
>>>
>>>
>>> On Aug 7, 2010, at 3:47 PM, Randall Leeds wrote:
>>>
>>>> I think you may be right, Damien.
>>>> If ever a write happens that only contains conflicts while waiting for
>>>> a delayed commit message we might still be cancelling the timer. Is
>>>> this what you're thinking? This would be the fix:
>>>> http://gist.github.com/513282
>>>>
>>>> On Sat, Aug 7, 2010 at 15:42, Damien Katz <[email protected]> wrote:
>>>>> I think the problem might be that 2 conflicting write attempts in row can
>>>>> leave the #db.waiting_delayed_commit set but the timer has been
>>>>> cancelled. One that happens, the header may never be written, as it
>>>>> always thinks a delayed commit will fire soon.
>>>>>
>>>>> -Damien
>>>>>
>>>>>
>>>>> On Aug 7, 2010, at 12:08 PM, Randall Leeds wrote:
>>>>>
>>>>>> On Sat, Aug 7, 2010 at 11:56, Randall Leeds <[email protected]>
>>>>>> wrote:
>>>>>>> I agree completely! I immediately thought of this because I wrote that
>>>>>>> change. I spent a while staring at it last night but still can't
>>>>>>> imagine how it's a problem.
>>>>>>>
>>>>>>> On Sat, Aug 7, 2010 at 11:12, Damien Katz <[email protected]> wrote:
>>>>>>>> SVN commit r954043 looks suspicious. Digging further.
>>>>>>>>
>>>>>>>> -Damien
>>>>>>>
>>>>>>
>>>>>> I still want to stare at r954043, but it looks to me like there's at
>>>>>> least one situation where we do not commit data correctly during
>>>>>> compaction. This has to do with the way we now use the path to sync
>>>>>> outside the couch_file:process. Check this diff:
>>>>>> http://gist.github.com/513081
>>>>>
>>>>>
>>>
>>
>