On Sun, Feb 4, 2018 at 10:47 AM, Amit Kapila <amit.kapil...@gmail.com> wrote:
> On Fri, Feb 2, 2018 at 2:11 PM, amul sul <sula...@gmail.com> wrote:
>> On Fri, Jan 26, 2018 at 11:58 AM, Amit Kapila <amit.kapil...@gmail.com> 
>> wrote:
>> [....]
>>> I think you can manually (via debugger) hit this by using the
>>> PUBLICATION/SUBSCRIPTION syntax for logical replication.  I think what
>>> you need to do is: in node-1, create a partitioned table and subscribe
>>> to it on node-2.  Now, perform an UPDATE on node-1, then stop the
>>> logical replication worker before it calls heap_lock_tuple.  Now, in
>>> node-2, update the same row such that the row is moved.  Now, continue
>>> the logical replication worker.  It should hit your new code; if not,
>>> then we need to think of some other way.
>>>
>>
>> I am able to hit the changed code using the above steps.  Thanks a lot
>> for the step-by-step guide; I really needed that.
>>
>> One strange behavior I found in logical replication, which is
>> reproducible without the attached patch as well -- when I update on
>> node2 while holding a breakpoint before the heap_lock_tuple call in the
>> replication worker, I can see that a duplicate row gets inserted on
>> node2, see this:
>>
> ..
>>
>> I am thinking of reporting this in a separate thread, but I am not sure
>> whether this is already known behaviour.
>>
>
> I think it is worth discussing this behavior in a separate thread.
> However, if possible, try to reproduce it without partitioning and
> then report it.
>
Logical replication behavior for a normal table is as expected; this happens
only with a partitioned table.  I will start a new thread for this on
-hackers.

>>
>> Updated patch attached -- it corrects the changes in execReplication.c.
>>
>
> Your changes look correct to me.
>
> I wonder what the behavior of this patch will be with
> wal_consistency_checking [1].  I think it will generate a failure, as
> there is nothing in the WAL to replay it.  Can you try it once?  If we
> see a failure with the WAL consistency checker, then we need to think
> about whether (a) we want to deal with it by logging this information,
> (b) we want to mask it, or (c) something else.
>
>
> [1] -  
> https://www.postgresql.org/docs/devel/static/runtime-config-developer.html
>

Yes, you are correct -- the standby stopped with the following error:

 FATAL:  inconsistent page found, rel 1663/13260/16390, forknum 0, blkno 0
 CONTEXT:  WAL redo at 0/3002510 for Heap/DELETE: off 6 KEYS_UPDATED
 LOG:  startup process (PID 22791) exited with exit code 1
 LOG:  terminating any other active server processes
 LOG:  database system is shut down
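
The failure is understandable given how the patch works.  Roughly, in
heap_delete() (a sketch only; the changingPart argument and the macro name
follow the patch set under discussion, so treat the exact spelling as
illustrative):

    /*
     * When the DELETE is really one half of a cross-partition UPDATE,
     * the old tuple's ctid is overwritten with a magic "moved to another
     * partition" marker instead of the usual self-pointer.
     */
    if (changingPart)
        HeapTupleHeaderSetMovedPartitions(tp.t_data);   /* magic ctid */
    else
        tp.t_data->t_ctid = tp.t_self;                  /* plain delete */

Since the xl_heap_delete record carries no trace of changingPart, redo on
the standby always reconstructs the plain-delete ctid, so the page
comparison done by wal_consistency_checking sees a mismatch.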

I have tested a warm-standby replication setup using the attached script.
Without the wal_consistency_checking setting it works fine, and data is
replicated from master to standby as expected.  If that guarantee is
enough, then I think we could skip this error in the WAL consistency check
for such deleted tuples (I guess that is option (b) that you suggested).
Thoughts?
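
To make option (b) concrete, here is a minimal sketch of the masking that
could be added to heap_mask() in src/backend/access/heap/heapam.c, next to
the existing ctid normalization for speculatively inserted tuples.  The
HeapTupleHeaderIndicatesMovedPartitions() test is an assumed name for
illustration, since the marker itself comes from this patch set:

    for (off = FirstOffsetNumber; off <= PageGetMaxOffsetNumber(page); off++)
    {
        ItemId          iid = PageGetItemId(page, off);
        HeapTupleHeader htup;

        if (!ItemIdIsNormal(iid))
            continue;
        htup = (HeapTupleHeader) PageGetItem(page, iid);

        /*
         * Normalize the unlogged "moved to another partition" ctid
         * marker before the consistency comparison, the same way
         * heap_mask() already normalizes the ctid of a speculatively
         * inserted tuple.  (Macro name is hypothetical.)
         */
        if (HeapTupleHeaderIndicatesMovedPartitions(htup))
            ItemPointerSet(&htup->t_ctid, blkno, off);
    }

Masking does hide the marker from verification entirely, though, so it is
worth weighing against option (a), i.e. WAL-logging this information.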

Attachment: test.sh
Description: Bourne shell script
