Hi,

On 2018-08-11 01:55:43 +0200, Tomas Vondra wrote:
> On 08/10/2018 11:59 PM, Tomas Vondra wrote:
> > 
> > ...
> > 
> > I suspect there's some other ingredient, e.g. some manipulation with the
> > subscription. Or maybe it's not needed at all and I'm just imagining things.
> > 
> 
> Indeed, the manipulation with the subscription seems to be the key here.
> I pretty reliably get the 'could not read block' error when doing this:
> 
> 1) start the insert pgbench
> 
>    pgbench -n -c 4 -T 300 -p 5433 -f insert.sql test
> 
> 2) start the vacuum full pgbench
> 
>    pgbench -n -f vacuum.sql -T 300 -p 5433 test
> 
> 3) try to create a subscription, but with a small amount of conflicting
> data so that the sync fails like this:
> 
>   LOG:  logical replication table synchronization worker for
>   subscription "s", table "t" has started
>   ERROR:  duplicate key value violates unique constraint "t_pkey"
>   DETAIL:  Key (a)=(5997542) already exists.
>   CONTEXT:  COPY t, line 1
>   LOG:  worker process: logical replication worker for subscription
>   16458 sync 16397 (PID 31983) exited with exit code 1
> 
> 4) At this point the insert pgbench (at least some clients) should have
> failed with the error. If not, rinse and repeat.
> 
> This kinda explains why I've been seeing the error only occasionally,
> because it only happened when I forgot to clean the table on the
> subscriber while recreating the subscription.

I'll try to reproduce this.  If you're also looking, I suspect a good
first hint would be to just change the ERROR into a PANIC and look at
the backtrace from the generated core file.
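
A rough sketch of the core-file part, assuming gdb is available and that
core files land where the kernel's core pattern puts them. The PGBIN and
CORE paths and the bt_cmd helper are hypothetical placeholders, not
anything from the tree:

```shell
#!/bin/sh
# Hypothetical paths; adjust to the local installation and core pattern.
PGBIN=${PGBIN:-/usr/local/pgsql/bin/postgres}
CORE=${CORE:-core}

# Allow core files in this shell. The postmaster has to be started from a
# shell with this limit raised, otherwise a PANIC leaves no core behind.
ulimit -c unlimited 2>/dev/null || echo "could not raise core file limit" >&2

# Print the gdb invocation that dumps a full backtrace from a core file.
# $1 = path to the postgres binary, $2 = path to the core file.
bt_cmd() {
    printf 'gdb --batch -ex "bt full" %s %s\n' "$1" "$2"
}

bt_cmd "$PGBIN" "$CORE"
```

The script only prints the gdb command rather than running it, so it can
be checked before a core file actually exists.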

To the point that I wonder if we shouldn't just change the ERROR into a
PANIC on master (but not REL_11_STABLE), so the buildfarm gives us
feedback.  I don't think the problem can fundamentally be related to
subscriptions, given the error occurs before any subscriptions are
created in the schedule.

Greetings,

Andres Freund
