Re: [HACKERS] Directory pg_replslot is not properly cleaned

Fabrízio de Royes Mello Wed, 07 Jun 2017 11:47:32 -0700

On Wed, Jun 7, 2017 at 3:30 PM, Andres Freund <and...@anarazel.de> wrote:
>
>
>
> On June 7, 2017 11:29:28 AM PDT, "Fabrízio de Royes Mello" <
fabriziome...@gmail.com> wrote:
> >On Fri, Jun 2, 2017 at 6:37 PM, Fabrízio de Royes Mello <
> >fabriziome...@gmail.com> wrote:
> >>
> >>
> >> On Fri, Jun 2, 2017 at 6:32 PM, Fabrízio de Royes Mello <
> >fabriziome...@gmail.com> wrote:
> >> >
> >> > Hi all,
> >> >
> >> > This week I faced a out of disk space trouble in 8TB production
> >cluster. During investigation we notice that pg_replslot was the
> >culprit
> >growing more than 1TB in less than 1 (one) hour.
> >> >
> >> > We're using PostgreSQL 9.5.6 with pglogical 1.2.2 replicating to a
> >new
> >9.6 instance and planning the upgrade soon.
> >> >
> >> > What I did? I freed some disk space just to startup PostgreSQL and
> >begin the investigation. During the 'startup recovery' simply the files
> >inside the pg_replslot was tottaly removed. So our trouble with 'out of
> >disk space' disappear. Then the server went up and physical slaves
> >attached
> >normally to master but logical slaves doesn't, staying stalled in
> >'catchup'
> >state.
> >> >
> >> > At this moment the "pg_replslot" directory started growing fast
> >again
> >and forced us to drop the logical replication slot and we lost the
> >logical
> >slave.
> >> >
> >> > Googling awhile I found this thread [1] about a similar issue
> >reported
> >by Dmitriy Sarafannikov and replied by Andres and Álvaro.
> >> >
> >> > I ran the test case provided by Dmitriy [1] against branches:
> >> > - REL9_4_STABLE
> >> > - REL9_5_STABLE
> >> > - REL9_6_STABLE
> >> > - master
> >> >
> >> > After all test the issue remains... and also using the new Logical
> >Replication stuff (CREATE PUB/CREATE SUB). Just after a restart the
> >"pg_replslot" was properly cleaned. The typo in
> >ReorderBufferIterTXNInit
> >complained by Dimitriy was fixed but the issue remains.
> >> >
> >> > Seems no one complain again about this issue and the thread was
> >lost.
> >> >
> >> > The attached is a reworked version of Dimitriy's patch that seems
> >solve
> >the issue. I confess I don't know enough about replication slots code
> >to
> >really know if it's the best solution.
> >> >
> >> > Regards,
> >> >
> >> > [1]
> >
https://www.postgresql.org/message-id/1457621358.355011041%40f382.i.mail.ru
> >> >
> >>
> >> Just adding Dimitriy to conversation... previous email I provided was
> >wrong.
> >>
> >
> >Does anyone have some thought about this critical issue?
> >
>
> I plan to look into it over the next few days.
>


Thanks...


--
Fabrízio de Royes Mello
Consultoria/Coaching PostgreSQL
>> Timbira: http://www.timbira.com.br
>> Blog: http://fabriziomello.github.io
>> Linkedin: http://br.linkedin.com/in/fabriziomello
>> Twitter: http://twitter.com/fabriziomello
>> Github: http://github.com/fabriziomello

Re: [HACKERS] Directory pg_replslot is not properly cleaned

Reply via email to