On Wed, Oct 19, 2016 at 2:37 PM, Heikki Linnakangas <hlinn...@iki.fi> wrote:

>
>>
> Actually, this is still not 100% safe. Flushing the WAL before modifying
> the FSM page is not enough. We also need to WAL-log a full-page image of
> the FSM page, otherwise we are still vulnerable to the torn page problem.
>
> I came up with the attached. This is fortunately much simpler than my
> previous attempt. I replaced the MarkBufferDirtyHint() calls with
> MarkBufferDirty(), to fix the original issue, plus WAL-logging a full-page
> image to fix the torn page issue.
>
>
Looks good to me.
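
For anyone reading this in the archives, the torn-page argument can be
illustrated with a toy model. This is only a sketch in Python, not
PostgreSQL code: the sector size, the `torn_write` and `replay` helpers,
and the record layout are all made up for illustration. It assumes a crash
after only part of a page reached disk, and compares recovery with and
without a full-page image in the WAL.

```python
# Toy model only: none of these names are PostgreSQL APIs.
SECTOR = 4  # bytes per "sector" on this imaginary disk; a page spans two


def torn_write(disk_page: bytes, new_page: bytes, sectors_written: int) -> bytes:
    """Simulate a crash after only `sectors_written` sectors reached disk."""
    cut = sectors_written * SECTOR
    return new_page[:cut] + disk_page[cut:]


def replay(disk_page: bytes, wal: list) -> bytes:
    """Toy crash recovery: restore any full-page image found in the WAL."""
    page = disk_page
    for record in wal:
        if record["kind"] == "full_page_image":
            # The whole page is overwritten, so a torn copy does not matter.
            page = record["image"]
    return page


old = b"AAAAAAAA"
new = b"BBBBBBBB"

# Crash while the new page version was being written out.
torn = torn_write(old, new, sectors_written=1)

# Case 1: nothing in the WAL covers the page, so the torn copy survives.
recovered_without_fpi = replay(torn, wal=[])

# Case 2: a full-page image was logged before the page could reach disk.
wal = [{"kind": "full_page_image", "image": new}]
recovered_with_fpi = replay(torn, wal)
```

In this model the torn page stays half old and half new when the WAL has
nothing to replay, while the full-page image restores a consistent copy.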


>> BTW any thoughts on race-condition on the primary? Comments at
>> MarkBufferDirtyHint() seems to suggest that a race condition is possible
>> which might leave the buffer without the DIRTY flag, but I'm not sure if
>> that can only happen when the page is locked in shared mode.
>>
>
> I think the race condition can only happen when the page is locked in
> shared mode. In any case, with this proposed fix, we'll use
> MarkBufferDirty() rather than MarkBufferDirtyHint(), so it's moot.
>
>
Yes, the fix will cover that problem (if it exists). The reason I was
curious is that there have been several reports of similar errors in the
past, and some of them did not involve a standby. Those reports mostly
remained unresolved, and I wondered if this explains them. But yeah, my
conclusion was that the race is not possible with the page locked in
EXCLUSIVE mode. So maybe there is another problem somewhere, or a crash
recovery may have left the FSM in an inconsistent state.

Anyways, we seem good to go with the patch.

Thanks,
Pavan
-- 
 Pavan Deolasee                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
