Re: [ClusterLabs] PAF fails to promote slave: Can not get current node LSN location

Tiemen Ruiten Tue, 09 Jul 2019 04:22:19 -0700

On Mon, Jul 8, 2019 at 10:01 PM Jehan-Guillaume de Rorthais <[email protected]> wrote:


> I should have stepped up in this thread, sorry :)
>

Really appreciate all the assistance so far.


> The real problem is not how many transactions you will lose during failover,
> but how we can choose the best standby to elect. This election needs the
> timeline and LSN location of all standbys. And today, to fetch the timeline,
> we must issue a CHECKPOINT, then read the controldata file.
>
> I dug into xlog.c today. Maybe I can write a small extension to get the
> timeline directly from shared memory and make pgsqlms use it when it detects
> the extension. Then people can decide whether it is too invasive or really
> needed for their use case. Maybe in the next release. What do you think?
> Would it be useful to you?
>

Yes, that would be a really useful addition IMO. I would definitely use it.
If we can avoid taking a checkpoint, that will save precious minutes during
a failover and drastically reduce the risk of timeouts. I would be happy to
test it if you want!
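To make the election step from the quoted message concrete (pick the best standby using each node's timeline and LSN), here is a minimal sketch, assuming every standby can report its timeline and its LSN as text. The function names, the Standby class and the node names are hypothetical and for illustration only; PAF's real pgsqlms agent is written in Perl and its election logic may differ. The only external detail relied on is that pg_controldata prints a "Latest checkpoint's TimeLineID:" line. Ordering by timeline first, then LSN, is a simplification chosen for this sketch.

```python
import re
from dataclasses import dataclass


def parse_lsn(lsn: str) -> int:
    """Convert a textual LSN such as '0/3000060' into a 64-bit integer.

    The part before the slash holds the high 32 bits and the part after it
    the low 32 bits, both in hexadecimal.
    """
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def timeline_from_controldata(output: str) -> int:
    """Extract the timeline from the text printed by pg_controldata."""
    match = re.search(r"^Latest checkpoint's TimeLineID:\s+(\d+)", output, re.MULTILINE)
    if match is None:
        raise ValueError("no \"Latest checkpoint's TimeLineID\" line found")
    return int(match.group(1))


@dataclass
class Standby:
    name: str      # hypothetical node name
    timeline: int  # e.g. obtained via timeline_from_controldata()
    lsn: str       # the node's LSN location, e.g. '0/5000100'


def elect(standbys: list[Standby]) -> Standby:
    """Pick the standby on the highest timeline, then with the highest LSN."""
    return max(standbys, key=lambda s: (s.timeline, parse_lsn(s.lsn)))
```

For example, with node2 at timeline 3, LSN 0/5000100 and node3 at timeline 3, LSN 0/50000F8, elect() returns node2. Note that LSNs must be compared as integers, not as strings, since the hexadecimal parts are not zero-padded.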


>
> >
> > I have already reduced the average checkpoint duration compared to what
> > I mentioned in that thread, mainly by decreasing checkpoint_timeout and
> > setting full_page_writes = off (ostensibly not necessary on ZFS).
>
> The "full_page_writes" setting helps lower the amount of WAL produced, not
> the amount of writes to sync during the checkpoint. But I am sure it helps
> your performance :)
>

If I'm saturating the IO capacity of my system during a forced checkpoint,
and full_page_writes = off reduces IO by reducing the amount of WAL, then
shouldn't it help indirectly?


>
> Lowering "checkpoint_timeout" probably helps. As checkpoints occur more
> frequently, there is statistically less data to sync when a forced
> checkpoint happens during a failover.
>
> Regards,
>
>
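The quoted argument about checkpoint_timeout can be made concrete with a rough back-of-the-envelope model. This is a sketch under strong simplifying assumptions, not a measurement: checkpoints are triggered only by the timeout, new dirty data accumulates at a constant rate, and the failover happens at a uniformly random moment within a checkpoint interval, so on average half an interval's worth of writes is still waiting to be synced. The 10 MB/s write rate is a hypothetical figure; 300 s is PostgreSQL's default checkpoint_timeout.

```python
def expected_dirty_mb(write_rate_mb_s: float, checkpoint_timeout_s: float) -> float:
    """Rough average amount of unsynced data when a forced checkpoint starts.

    Model assumptions (all simplifications):
    - checkpoints are triggered only by checkpoint_timeout, never by WAL volume;
    - new dirty data accumulates at a constant rate and is neither capped by
      shared_buffers nor flushed early by the background writer;
    - the failover instant falls uniformly at random within a checkpoint
      interval, so on average half an interval of writes is still pending.
    """
    return write_rate_mb_s * checkpoint_timeout_s / 2


# Compare the default timeout (300 s) with a lowered one (60 s),
# using a hypothetical write rate of 10 MB/s.
for timeout_s in (300, 60):
    pending = expected_dirty_mb(10, timeout_s)
    print(f"checkpoint_timeout={timeout_s}s -> ~{pending:.0f} MB to sync")
```

Under these assumptions the expected amount to sync scales linearly with checkpoint_timeout, which is the intuition behind lowering it; real workloads that rewrite the same pages repeatedly will see a smaller effect.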

_______________________________________________
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/
