Re: [HACKERS] FSM corruption leading to errors

Pavan Deolasee Mon, 17 Oct 2016 00:15:52 -0700

On Tue, Oct 11, 2016 at 5:20 AM, Michael Paquier <[email protected]>
wrote:


>
> >
> > Once the underlying bug is fixed, I don't see why it should break again.
> I
> > added the above code to mostly deal with already corrupt FSMs. May be we
> can
> > just document and leave it to the user to run some correctness checks
> (see
> > below), especially given that the code is not cheap and adds overheads
> for
> > everybody, irrespective of whether they have or will ever have corrupt
> FSM.
>
> Yep. I'd leave it for the release notes to hold a diagnostic method.
> That's annoying, but this has been done in the past like for the
> multixact issues..
>

I'm okay with that. It's annoying, especially because the bug may show up
when your primary is down and you just failed over for HA, only to find
that the standby won't work correctly. But I don't have ideas how to fix
existing corruption without adding significant penalty to normal path.


>
> What if you restart the standby, and then do a diagnostic query?
> Wouldn't that be enough? (Something just based on
> pg_freespace(pg_relation_size(oid) / block_size) != 0)
>
>
Yes, that will enough once the fix is in place.

I think this is a major bug and I would appreciate any ideas to get the
patch in a committable shape before the next minor release goes out. We
probably need a committer to get interested in this to make progress.

Thanks,
Pavan

-- 
 Pavan Deolasee                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: [HACKERS] FSM corruption leading to errors

Reply via email to