Re: [HACKERS] Issues with logical replication

2018-01-05 Thread Stas Kelvich
> On 3 Jan 2018, at 23:35, Alvaro Herrera wrote:
>
> Pushed. Will you (Konstantin, Stas, Masahiko) please verify that after
> this commit all the problems reported with logical replication are
> fixed?
Checked that with and without extra sleep in AssignTransactionId(). In both cases patch work

Re: [HACKERS] Issues with logical replication

2018-01-03 Thread Alvaro Herrera
Alvaro Herrera wrote:
> Will push this shortly after lunch.
Pushed. Will you (Konstantin, Stas, Masahiko) please verify that after this commit all the problems reported with logical replication are fixed?
Thanks
--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Developm

Re: [HACKERS] Issues with logical replication

2018-01-03 Thread Alvaro Herrera
Stas Kelvich wrote:
> Seems that having busy loop is the best idea out of several discussed.
>
> I thought about small sleep at the bottom of that loop if we reached topmost
> transaction, but taking into account low probability of that event may be
> it is faster to do just busy wait.
In other

Re: [HACKERS] Issues with logical replication

2017-12-01 Thread Petr Jelinek
On 30/11/17 11:48, Simon Riggs wrote:
> On 30 November 2017 at 11:30, Petr Jelinek wrote:
>> On 30/11/17 00:47, Andres Freund wrote:
>>> On 2017-11-30 00:45:44 +0100, Petr Jelinek wrote:
>>>> I don't understand. I mean sure the SnapBuildWaitSnapshot() can live with it, but the problematic

Re: [HACKERS] Issues with logical replication

2017-11-30 Thread Simon Riggs
On 30 November 2017 at 11:30, Petr Jelinek wrote:
> On 30/11/17 00:47, Andres Freund wrote:
>> On 2017-11-30 00:45:44 +0100, Petr Jelinek wrote:
>>> I don't understand. I mean sure the SnapBuildWaitSnapshot() can live
>>> with it, but the problematic logic happens inside the
>>> XactLockTableInser

Re: [HACKERS] Issues with logical replication

2017-11-30 Thread Stas Kelvich
> On 30 Nov 2017, at 03:30, Petr Jelinek wrote:
>
> On 30/11/17 00:47, Andres Freund wrote:
>> On 2017-11-30 00:45:44 +0100, Petr Jelinek wrote:
>>> I don't understand. I mean sure the SnapBuildWaitSnapshot() can live
>>> with it, but the problematic logic happens inside the
>>> XactLockTableIns

Re: [HACKERS] Issues with logical replication

2017-11-29 Thread Petr Jelinek
On 30/11/17 00:47, Andres Freund wrote:
> On 2017-11-30 00:45:44 +0100, Petr Jelinek wrote:
>> I don't understand. I mean sure the SnapBuildWaitSnapshot() can live
>> with it, but the problematic logic happens inside the
>> XactLockTableInsert() and SnapBuildWaitSnapshot() has no way of
>> detectin

Re: [HACKERS] Issues with logical replication

2017-11-29 Thread Andres Freund
On 2017-11-30 00:45:44 +0100, Petr Jelinek wrote:
> I don't understand. I mean sure the SnapBuildWaitSnapshot() can live
> with it, but the problematic logic happens inside the
> XactLockTableInsert() and SnapBuildWaitSnapshot() has no way of
> detecting the situation short of reimplementing the
>

Re: [HACKERS] Issues with logical replication

2017-11-29 Thread Petr Jelinek
On 30/11/17 00:40, Andres Freund wrote:
> On 2017-11-30 00:25:58 +0100, Petr Jelinek wrote:
>> Yes that helps thanks. Now that I reproduced it I understand, I was
>> confused by the backtrace that said xid was 0 on the input to
>> XactLockTableWait() but that's not the case, it's what xid is change

Re: [HACKERS] Issues with logical replication

2017-11-29 Thread Andres Freund
On 2017-11-30 00:25:58 +0100, Petr Jelinek wrote:
> Yes that helps thanks. Now that I reproduced it I understand, I was
> confused by the backtrace that said xid was 0 on the input to
> XactLockTableWait() but that's not the case, it's what xid is changed to
> in the inner loop.
> So what happens

Re: [HACKERS] Issues with logical replication

2017-11-29 Thread Petr Jelinek
On 29/11/17 20:11, Stas Kelvich wrote:
>
>> On 29 Nov 2017, at 18:46, Petr Jelinek wrote:
>>
>> What I don't understand is how it leads to crash (and I could not
>> reproduce it using the pgbench file attached in this thread either) and
>> moreover how it leads to 0 xid being logged. The only exp

Re: [HACKERS] Issues with logical replication

2017-11-29 Thread Stas Kelvich
> On 29 Nov 2017, at 18:46, Petr Jelinek wrote:
>
> What I don't understand is how it leads to crash (and I could not
> reproduce it using the pgbench file attached in this thread either) and
> moreover how it leads to 0 xid being logged. The only explanation I can
> come up is that some kind of

Re: [HACKERS] Issues with logical replication

2017-11-29 Thread Petr Jelinek
Hi,
(sorry for not being active here, I am still catching up after being away for some family issues)
On 16/11/17 21:12, Robert Haas wrote:
> On Thu, Nov 16, 2017 at 2:41 PM, Andres Freund wrote:
>>> To me, it seems like SnapBuildWaitSnapshot() is fundamentally
>>> misdesigned
>>
>> Maybe I'm co

Re: [HACKERS] Issues with logical replication

2017-11-21 Thread Simon Riggs
On 4 October 2017 at 10:35, Petr Jelinek wrote:
> On 02/10/17 18:59, Petr Jelinek wrote:
>>>
>>> Now fix the trigger function:
>>> CREATE OR REPLACE FUNCTION replication_trigger_proc() RETURNS TRIGGER AS $$
>>> BEGIN
>>> RETURN NEW;
>>> END $$ LANGUAGE plpgsql;
>>>
>>> And manually perform at ma

Re: [HACKERS] Issues with logical replication

2017-11-21 Thread Craig Ringer
On 4 October 2017 at 07:35, Petr Jelinek wrote:
> On 02/10/17 18:59, Petr Jelinek wrote:
> >>
> >> Now fix the trigger function:
> >> CREATE OR REPLACE FUNCTION replication_trigger_proc() RETURNS TRIGGER AS $$
> >> BEGIN
> >> RETURN NEW;
> >> END $$ LANGUAGE plpgsql;
> >>
> >> And manually pe

Re: [HACKERS] Issues with logical replication

2017-11-21 Thread Michael Paquier
On Fri, Nov 17, 2017 at 5:12 AM, Robert Haas wrote:
> On Thu, Nov 16, 2017 at 2:41 PM, Andres Freund wrote:
>>> To me, it seems like SnapBuildWaitSnapshot() is fundamentally
>>> misdesigned
>>
>> Maybe I'm confused, but why is it fundamentally misdesigned? It's not
>> such an absurd idea to wait

Re: [HACKERS] Issues with logical replication

2017-11-16 Thread Robert Haas
On Thu, Nov 16, 2017 at 2:41 PM, Andres Freund wrote:
>> To me, it seems like SnapBuildWaitSnapshot() is fundamentally
>> misdesigned
>
> Maybe I'm confused, but why is it fundamentally misdesigned? It's not
> such an absurd idea to wait for an xid in a WAL record. I get that
> there's a race con

Re: [HACKERS] Issues with logical replication

2017-11-16 Thread Andres Freund
On 2017-11-16 10:36:40 -0500, Robert Haas wrote:
> On Wed, Nov 15, 2017 at 8:20 PM, Stas Kelvich wrote:
> > I did a sketch of first approach just to confirm that it solves the problem.
> > But there I hold ProcArrayLock during update of flag. Since only reader is
> > GetRunningTransactionData i

Re: [HACKERS] Issues with logical replication

2017-11-16 Thread Robert Haas
On Wed, Nov 15, 2017 at 8:20 PM, Stas Kelvich wrote:
> I did a sketch of first approach just to confirm that it solves the problem.
> But there I hold ProcArrayLock during update of flag. Since only reader is
> GetRunningTransactionData it possible to have a custom lock there. In
> this case GetRu

Re: [HACKERS] Issues with logical replication

2017-11-16 Thread Stas Kelvich
> On 15 Nov 2017, at 23:09, Robert Haas wrote:
>
> Ouch. This seems like a bug that needs to be fixed, but do you think
> it's related to Petr's proposed fix to set es_output_cid? That fix
> looks reasonable, since we shouldn't try to lock tuples without a
> valid CommandId.
>
> Now, havin

Re: [HACKERS] Issues with logical replication

2017-11-15 Thread Robert Haas
On Mon, Oct 9, 2017 at 9:19 PM, Stas Kelvich wrote:
> I investigated this case and it seems that XactLockTableWait() in SnapBuildWaitSnapshot()
> not always work as expected. XactLockTableWait() waits on LockAcquire() for xid to be
> completed and if we finally got this lock but transactio