Re: [HACKERS] Help required to debug pg_repack breaking logical replication

2017-10-12 Thread Robert Haas
On Sat, Oct 7, 2017 at 2:37 PM, Daniele Varrazzo
 wrote:
> (with a
> custom addition to update the relfrozenxid which seems backwards to me
> as it sets the older frozen xid on the new table [3]).
>
> [3] https://github.com/reorg/pg_repack/issues/152

Wow.  That's really bad.  It will corrupt your database.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Help required to debug pg_repack breaking logical replication

2017-10-10 Thread Petr Jelinek
On 08/10/17 15:21, Craig Ringer wrote:
> On 8 October 2017 at 02:37, Daniele Varrazzo  
> wrote:
>> Hello,
>>
>> we have been reported, and I have experienced a couple of times,
>> pg_repack breaking logical replication.
>>
>> - https://github.com/reorg/pg_repack/issues/135
>> - https://github.com/2ndQuadrant/pglogical/issues/113
> 
> Yeah, I was going to say I've seen reports of this with pglogical, but
> I see you've linked to them.
> 
> I haven't had a chance to look into it though, and haven't had a
> suitable reproducible test case.
> 
>> In the above issue #113, Petr Jelinek commented:
>>
>>> From quick look at pg_repack, the way it does table rewrite is almost 
>>> guaranteed
>>> to break logical decoding unless there is zero unconsumed changes for a 
>>> given table
>>> as it does not build the necessary mappings info for logical decoding that 
>>> standard
>>> heap rewrite in postgres does.
>>
>> unfortunately he didn't follow up to further details requests.
> 
> At a guess he's referring to src/backend/access/heap/rewriteheap.c .
> 
> I'd explain better if I understood what was going on myself, but I
> haven't really understood the logical decoding parts of that code.
> 
>> - Is Petr diagnosis right and freezing of logical replication is to be
>> blamed to missing mapping?
>> - Can you suggest a test to reproduce the issue reliably?
>> - What are mapped relations anyway?
> 
> I can't immediately give you the answers you seek, but start by
> studying src/backend/access/heap/rewriteheap.c . Notably
> logical_end_heap_rewrite, logical_rewrite_heap_tuple,
> logical_begin_heap_rewrite.
> 

Yes that's exactly it. When table is rewritten we need to create mapping
for every tuple that was created or removed (ie, insert, update or
delete operation happened on it) since the oldest replication slot xmin
for logical decoding to continue to work on that table after the
rewrite. And pg_repack doesn't create that mapping.

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Help required to debug pg_repack breaking logical replication

2017-10-08 Thread Craig Ringer
On 8 October 2017 at 02:37, Daniele Varrazzo  wrote:
> Hello,
>
> we have been reported, and I have experienced a couple of times,
> pg_repack breaking logical replication.
>
> - https://github.com/reorg/pg_repack/issues/135
> - https://github.com/2ndQuadrant/pglogical/issues/113

Yeah, I was going to say I've seen reports of this with pglogical, but
I see you've linked to them.

I haven't had a chance to look into it though, and haven't had a
suitable reproducible test case.

> In the above issue #113, Petr Jelinek commented:
>
>> From quick look at pg_repack, the way it does table rewrite is almost 
>> guaranteed
>> to break logical decoding unless there is zero unconsumed changes for a 
>> given table
>> as it does not build the necessary mappings info for logical decoding that 
>> standard
>> heap rewrite in postgres does.
>
> unfortunately he didn't follow up to further details requests.

At a guess he's referring to src/backend/access/heap/rewriteheap.c .

I'd explain better if I understood what was going on myself, but I
haven't really understood the logical decoding parts of that code.

> - Is Petr diagnosis right and freezing of logical replication is to be
> blamed to missing mapping?
> - Can you suggest a test to reproduce the issue reliably?
> - What are mapped relations anyway?

I can't immediately give you the answers you seek, but start by
studying src/backend/access/heap/rewriteheap.c . Notably
logical_end_heap_rewrite, logical_rewrite_heap_tuple,
logical_begin_heap_rewrite.

At a wild "I haven't read any of the relevant code in detail yet" stab
in the dark, pg_repack is failing to do the bookkeeping required by
logical decoding around relfilenode changes, cmin/cmax, etc.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Help required to debug pg_repack breaking logical replication

2017-10-07 Thread Daniele Varrazzo
Hello,

we have been reported, and I have experienced a couple of times,
pg_repack breaking logical replication.

- https://github.com/reorg/pg_repack/issues/135
- https://github.com/2ndQuadrant/pglogical/issues/113

In my experience, after the botched run, the replication slot was
"stuck", and any attempt of reading (including
pg_logical_slot_peek_changes()) blocked until ctrl-c. I've tried
replicating the issue but first attempts have failed to fail.

In the above issue #113, Petr Jelinek commented:

> From quick look at pg_repack, the way it does table rewrite is almost 
> guaranteed
> to break logical decoding unless there is zero unconsumed changes for a given 
> table
> as it does not build the necessary mappings info for logical decoding that 
> standard
> heap rewrite in postgres does.

unfortunately he didn't follow up to further details requests.

I've started drilling down the problem, observing that the swap
function, swap_heap_or_index_files() [1] was cargoculted by the
original author from the CLUSTER command code as of PG 8.2 [2] (with a
custom addition to update the relfrozenxid which seems backwards to me
as it sets the older frozen xid on the new table [3]).

[1] https://github.com/reorg/pg_repack/blob/ver_1.4.1/lib/repack.c#L1082
[2] 
https://github.com/postgres/postgres/blob/REL8_2_STABLE/src/backend/commands/cluster.c#L783
[3] https://github.com/reorg/pg_repack/issues/152

so that code is effectively missing a good 10 years of development.
Before jumping into fast-forwarding it, I would like to ask for some
help, i.e.

- Is Petr diagnosis right and freezing of logical replication is to be
blamed to missing mapping?
- Can you suggest a test to reproduce the issue reliably?
- What are mapped relations anyway?

Thank you in advance for any help (either info about how to fix the
issue properly, or a patch by someone who happens to really understand
what we are talking about).

-- Daniele


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers