On Wed, Dec 02, 2020 at 01:50:25PM +0530, Amit Kapila wrote:
> On Wed, Dec 2, 2020 at 1:20 PM Dilip Kumar <dilipbal...@gmail.com> wrote:
> > > On Mon, Nov 30, 2020 at 6:49 PM Amit Kapila <amit.kapil...@gmail.com> 
> > > wrote:
> > > > On Mon, Nov 30, 2020 at 3:14 AM Noah Misch <n...@leadboat.com> wrote:
> > > > > On Mon, Sep 07, 2020 at 12:00:41PM +0530, Amit Kapila wrote:
> > > > > > Thanks, I have pushed the last patch. Let's wait for a day or so to
> > > > > > see the buildfarm reports
> > > > >
> > > > > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=sungazer&dt=2020-09-08%2006%3A24%3A14
> > > > > failed the new 015_stream.pl test with the subscriber looping like this:
> > > > >
> > > > > 2020-09-08 11:22:49.848 UTC [13959252:1] LOG:  logical replication apply worker for subscription "tap_sub" has started
> > > > > 2020-09-08 11:22:54.045 UTC [13959252:2] ERROR:  could not open temporary file "16393-510.changes.0" from BufFile "16393-510.changes": No such file or directory
> > > > > 2020-09-08 11:22:54.055 UTC [7602182:1] LOG:  logical replication apply worker for subscription "tap_sub" has started
> > > > > 2020-09-08 11:22:54.101 UTC [31785284:4] LOG:  background worker "logical replication worker" (PID 13959252) exited with exit code 1
> > > > > 2020-09-08 11:23:01.142 UTC [7602182:2] ERROR:  could not open temporary file "16393-510.changes.0" from BufFile "16393-510.changes": No such file or directory
> > > > > ...

> > > > The above kind of error can happen due to the following reasons: (a)
> > > > the first time we sent the stream and created the file and that got
> > > > removed before the second stream reached the subscriber. (b) from the
> > > > publisher-side, we never sent the indication that it is the first
> > > > stream and the subscriber directly tries to open the file thinking it
> > > > is already there.

Further testing showed it was a file location problem, not a deletion problem.
The worker tried to open
base/pgsql_tmp/pgsql_tmp9896408.1.sharedfileset/16393-510.changes.0, but these
were the files that actually existed:

[nm@power-aix 0:2 2020-12-08T13:56:35 64gcc 0]$ ls -la $(find src/test/subscription/tmp_check -name '*sharedfileset*')
src/test/subscription/tmp_check/t_015_stream_subscriber_data/pgdata/base/pgsql_tmp/pgsql_tmp9896408.0.sharedfileset:
total 408
drwx------    2 nm       usr             256 Dec 08 03:20 .
drwx------    4 nm       usr             256 Dec 08 03:20 ..
-rw-------    1 nm       usr          207806 Dec 08 03:20 16393-510.changes.0

src/test/subscription/tmp_check/t_015_stream_subscriber_data/pgdata/base/pgsql_tmp/pgsql_tmp9896408.1.sharedfileset:
total 0
drwx------    2 nm       usr             256 Dec 08 03:20 .
drwx------    4 nm       usr             256 Dec 08 03:20 ..
-rw-------    1 nm       usr               0 Dec 08 03:20 16393-511.changes.0

> > I have executed "make check" in the loop with only this file.  I have
> > repeated it 5000 times but no failure, I am wondering shall we try to
> > execute in the same machine in a loop where it failed once?
> 
> Yes, that might help. Noah, would it be possible for you to try that

The problem is xidhash using strcmp() to compare keys; it needs memcmp().  For
this to matter, xidhash must contain more than one element.  Existing tests
rarely exercise the multi-element scenario.  Under heavy load, on this system,
the test publisher can have two active transactions at once, in which case it
does exercise multi-element xidhash.  (The publisher is sensitive to timing,
but the subscriber is not; once WAL contains interleaved records of two XIDs,
the subscriber fails every time.)  This would be much harder to reproduce on a
little-endian system, where strcmp(&xid, &xid_plus_one)!=0.  On big-endian,
every small XID has zero in the first octet; they all look like empty strings.
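
To make that concrete, here is a small standalone C sketch (not part of the patch; the XID values just mirror the 510/511 seen in the logs above).  The strcmp() result of zero assumes a big-endian host, where the all-zero most significant byte comes first:

#include <stdio.h>
#include <string.h>
#include <stdint.h>

typedef uint32_t TransactionId;	/* stand-in for the backend typedef */

int
main(void)
{
	TransactionId xid = 510;			/* first streamed transaction */
	TransactionId xid_plus_one = 511;	/* concurrent second transaction */

	/*
	 * The big-endian layout of 510 is 00 00 01 FE, so strcmp() stops at the
	 * leading zero byte and reports the keys equal (two "empty strings"),
	 * while memcmp() over the full key width sees the difference.
	 */
	printf("strcmp: %d\n",
		   strcmp((const char *) &xid, (const char *) &xid_plus_one));
	printf("memcmp: %d\n",
		   memcmp(&xid, &xid_plus_one, sizeof(TransactionId)));
	return 0;
}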

The attached patch has the one-line fix and some test suite changes that make
this reproduce frequently on any big-endian system.  I'm currently planning to
drop the test suite changes from the commit, but I could keep them if folks
like them.  (They'd need more comments and timeout handling.)
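
For reference, why the one-line change is sufficient, as I understand dynahash's defaults: with HASH_ELEM alone, hash_create() treats the key as a NUL-terminated string (string_hash plus a strcmp-style comparator), whereas HASH_BLOBS makes it hash and compare exactly keysize bytes of binary data (memcmp).  A sketch of the patched call site follows; the keysize assignment is my recollection of the line just above the quoted hunk:

	hash_ctl.keysize = sizeof(TransactionId);	/* fixed-width binary key */
	hash_ctl.entrysize = sizeof(StreamXidHash);
	hash_ctl.hcxt = ApplyContext;

	/*
	 * HASH_BLOBS makes dynahash compare all sizeof(TransactionId) bytes of
	 * the key.  Without it, the comparison stops at the first zero byte,
	 * which on big-endian hardware is the leading byte of every small XID.
	 */
	xidhash = hash_create("StreamXidHash", 1024, &hash_ctl,
						  HASH_ELEM | HASH_BLOBS | HASH_CONTEXT);
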
Author:     Noah Misch <n...@leadboat.com>
Commit:     Noah Misch <n...@leadboat.com>

    Use HASH_BLOBS for xidhash.
    
    This caused BufFile errors on buildfarm member sungazer, and SIGSEGV was
    possible.  Conditions for reaching those symptoms were more frequent on
    big-endian systems.
    
    Reviewed by FIXME.
    
    Discussion: https://postgr.es/m/20201129214441.ga691...@rfd.leadboat.com

diff --git a/src/backend/replication/logical/worker.c b/src/backend/replication/logical/worker.c
index c37aafe..fce1dee 100644
--- a/src/backend/replication/logical/worker.c
+++ b/src/backend/replication/logical/worker.c
@@ -804,7 +804,7 @@ apply_handle_stream_start(StringInfo s)
                hash_ctl.entrysize = sizeof(StreamXidHash);
                hash_ctl.hcxt = ApplyContext;
                xidhash = hash_create("StreamXidHash", 1024, &hash_ctl,
-                                                         HASH_ELEM | HASH_CONTEXT);
+                                                         HASH_ELEM | HASH_BLOBS | HASH_CONTEXT);
        }
 
        /* open the spool file for this transaction */
diff --git a/src/test/perl/PostgresNode.pm b/src/test/perl/PostgresNode.pm
index 1488bff..40610f1 100644
--- a/src/test/perl/PostgresNode.pm
+++ b/src/test/perl/PostgresNode.pm
@@ -1626,6 +1626,42 @@ sub interactive_psql
        return $harness;
 }
 
+# return IPC::Run harness object for non-interactive psql
+# FIXME pick a better name, and add POD docs
+sub psql_printable
+{
+       my ($self, $dbname, $stdin, $stdout, $timer, %params) = @_;
+
+       my $replication       = $params{replication};
+
+       my @psql_params       = (
+               'psql',
+               '-XAtq',
+               '-d',
+               $self->connstr($dbname)
+                 . (defined $replication ? " replication=$replication" : ""),
+               '-f',
+               '-');
+
+       $params{on_error_stop} = 1 unless defined $params{on_error_stop};
+
+       push @psql_params, '-v', 'ON_ERROR_STOP=1' if $params{on_error_stop};
+       push @psql_params, @{ $params{extra_params} }
+         if defined $params{extra_params};
+
+       # Ensure there is no data waiting to be sent:
+       $$stdin = "" if ref($stdin);
+       # IPC::Run would otherwise append to existing contents:
+       $$stdout = "" if ref($stdout);
+
+       my $harness = IPC::Run::start \@psql_params,
+         '<', $stdin, '>', $stdout, $timer;
+
+       die "psql startup timed out" if $timer->is_expired;
+
+       return $harness;
+}
+
 =pod
 
 =item $node->poll_query_until($dbname, $query [, $expected ])
diff --git a/src/test/subscription/t/015_stream.pl b/src/test/subscription/t/015_stream.pl
index fffe001..9ebe166 100644
--- a/src/test/subscription/t/015_stream.pl
+++ b/src/test/subscription/t/015_stream.pl
@@ -47,14 +47,35 @@ my $result =
 is($result, qq(2|2|2), 'check initial data was copied to subscriber');
 
 # Insert, update and delete enough rows to exceed the 64kB limit.
-$node_publisher->safe_psql('postgres', q{
+my $in  = '';
+my $out = '';
+
+my $timer = IPC::Run::timer(180);
+
+my $h = $node_publisher->psql_printable('postgres', \$in, \$out, $timer);
+
+$in .= q{
 BEGIN;
 INSERT INTO test_tab SELECT i, md5(i::text) FROM generate_series(3, 5000) s(i);
 UPDATE test_tab SET b = md5(b) WHERE mod(a,2) = 0;
 DELETE FROM test_tab WHERE mod(a,3) = 0;
+};
+$h->pump;
+
+$node_publisher->safe_psql('postgres', q{
+BEGIN;
+INSERT INTO test_tab SELECT i, md5(i::text) FROM generate_series(5001, 9999) s(i);
+DELETE FROM test_tab WHERE a > 5000;
 COMMIT;
 });
 
+$in .= q{
+COMMIT;
+\q
+};
+$h->pump;
+$h->finish;
+
 $node_publisher->wait_for_catchup($appname);
 
 $result =
