On June 7, 2017 11:29:28 AM PDT, "Fabrízio de Royes Mello" <fabriziome...@gmail.com> wrote:
>On Fri, Jun 2, 2017 at 6:37 PM, Fabrízio de Royes Mello <fabriziome...@gmail.com> wrote:
>>
>>
>> On Fri, Jun 2, 2017 at 6:32 PM, Fabrízio de Royes Mello <fabriziome...@gmail.com> wrote:
>> >
>> > Hi all,
>> >
>> > This week I faced an out-of-disk-space problem on an 8TB production cluster. During the investigation we noticed that pg_replslot was the culprit, growing by more than 1TB in less than one hour.
>> >
>> > We're using PostgreSQL 9.5.6 with pglogical 1.2.2, replicating to a new 9.6 instance, and are planning the upgrade soon.
>> >
>> > What did I do? I freed some disk space just to start up PostgreSQL and begin the investigation. During 'startup recovery' the files inside pg_replslot were simply removed entirely, so our 'out of disk space' problem disappeared. The server then came up and the physical slaves attached to the master normally, but the logical slaves didn't; they stayed stalled in the 'catchup' state.
>> >
>> > At that point the "pg_replslot" directory started growing fast again, which forced us to drop the logical replication slot, and we lost the logical slave.
>> >
>> > After googling for a while I found this thread [1] about a similar issue reported by Dmitriy Sarafannikov and answered by Andres and Álvaro.
>> >
>> > I ran the test case provided by Dmitriy [1] against these branches:
>> > - REL9_4_STABLE
>> > - REL9_5_STABLE
>> > - REL9_6_STABLE
>> > - master
>> >
>> > After all the tests the issue remains... also when using the new logical replication feature (CREATE PUBLICATION/CREATE SUBSCRIPTION). Only after a restart was "pg_replslot" properly cleaned. The typo in ReorderBufferIterTXNInit that Dmitriy complained about was fixed, but the issue remains.
>> >
>> > It seems no one complained about this issue again and the thread was lost.
>> >
>> > Attached is a reworked version of Dmitriy's patch that seems to solve the issue. I confess I don't know enough about the replication slots code to really know whether it's the best solution.
>> >
>> > Regards,
>> >
>> > [1] https://www.postgresql.org/message-id/1457621358.355011041%40f382.i.mail.ru
>> >
>>
>> Just adding Dmitriy to the conversation... the previous email I provided was wrong.
>>
>
>Does anyone have any thoughts about this critical issue?
>
I plan to look into it over the next few days.

Andres