Re: [BUG] Logical replica crash if there was an error in a function.

2023-04-06 Thread Anton A. Melnikov

On 05.04.2023 17:35, Tom Lane wrote:

"Anton A. Melnikov"  writes:

On 03.04.2023 21:49, Tom Lane wrote:

I did not think this case was worth memorializing in a test before,
and I still do not.  I'm inclined to reject this patch.



Could you help me to figure out, please.


The problem was an Assert that was speculative when it went in,
and which we eventually found was wrong in the context of logical
replication.  We removed the Assert.  I don't think we need a test
case to keep us from putting back the Assert.  That line of thinking
leads to test suites that run for fourteen hours and are near useless
because developers can't run them easily.

regards, tom lane


Ok, i understand! Thanks a lot for the clarification. A rather important point,
i'll take it into account for the future.
Let's do that. Revoked the patch.

With the best wishes!

--
Anton A. Melnikov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




Re: [BUG] Logical replica crash if there was an error in a function.

2023-04-05 Thread Tom Lane
"Anton A. Melnikov"  writes:
> On 03.04.2023 21:49, Tom Lane wrote:
>> I did not think this case was worth memorializing in a test before,
>> and I still do not.  I'm inclined to reject this patch.

> Could you help me to figure out, please.

The problem was an Assert that was speculative when it went in,
and which we eventually found was wrong in the context of logical
replication.  We removed the Assert.  I don't think we need a test
case to keep us from putting back the Assert.  That line of thinking
leads to test suites that run for fourteen hours and are near useless
because developers can't run them easily.

regards, tom lane




Re: [BUG] Logical replica crash if there was an error in a function.

2023-04-05 Thread Anton A. Melnikov

Hello!

On 03.04.2023 21:49, Tom Lane wrote:

"Anton A. Melnikov"  writes:

Now there are no any pending questions, so moved it to RFC.


I did not think this case was worth memorializing in a test before,
and I still do not.  I'm inclined to reject this patch.


Earlier, when discussing this test, there was a suggestion like this:


If we were just adding a
query or two to an existing scenario, that could be okay;


The current version of the test seems to be satisfies this condition.
The queries added do not affect the total test time within the measurement 
error.
This case is rare, of cause, but it really took place in practice.

So either there are some more reasons why this test should not be accepted that
i do not understand, or i misunderstood something from the above.

Could you help me to figure out, please.

Would be very grateful.

Sincerely yours,

--
Anton A. Melnikov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




Re: [BUG] Logical replica crash if there was an error in a function.

2023-04-03 Thread Tom Lane
"Anton A. Melnikov"  writes:
> Now there are no any pending questions, so moved it to RFC.

I did not think this case was worth memorializing in a test before,
and I still do not.  I'm inclined to reject this patch.

regards, tom lane




Re: [BUG] Logical replica crash if there was an error in a function.

2023-03-16 Thread Anton A. Melnikov

Hello!

On 15.03.2023 21:29, Gregory Stark (as CFM) wrote:


These patches that are "Needs Review" and have received no comments at
all since before March 1st are these. If your patch is amongst this
list I would suggest any of:

1) Move it yourself to the next CF (or withdraw it)
2) Post to the list with any pending questions asking for specific
feedback -- it's much more likely to get feedback than just a generic
"here's a patch plz review"...
3) Mark it Ready for Committer and possibly post explaining the
resolution to any earlier questions to make it easier for a committer
to understand the state



There were some remarks:
1) very poor use of test cycles (by Tom Lane)
Solved in v2 by adding few extra queries to an existent test without any 
additional syncing.
2) the patch-tester fails on all platforms (by Andres Freund)
Fixed in v3.
3) warning with "my" variable $result and suggestion to correct comment (by 
vignesh C)
Both fixed in v4.

Now there are no any pending questions, so moved it to RFC.

With the best regards!

--
Anton A. Melnikov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Companycommit 16beb574708e0e2e993eb2fa1b093be97b8e53cf
Author: Anton A. Melnikov 
Date:   Sun Dec 11 06:26:03 2022 +0300

Add test for syntax error in the function in a a logical replication
worker. See dea834938.

diff --git a/src/test/subscription/t/100_bugs.pl b/src/test/subscription/t/100_bugs.pl
index 036576acab..b241a7d498 100644
--- a/src/test/subscription/t/100_bugs.pl
+++ b/src/test/subscription/t/100_bugs.pl
@@ -69,6 +69,51 @@ $node_publisher->wait_for_catchup('sub1');
 
 pass('index predicates do not cause crash');
 
+# https://www.postgresql.org/message-id/flat/adf0452f-8c6b-7def-d35e-ab516c80088e%40inbox.ru
+
+# The bug was that when a syntax error occurred in a SQL-language or PL/pgSQL-language
+# CREATE FUNCTION or DO command executed in a logical replication worker,
+# we'd suffer a null pointer dereference or assertion failure.
+
+$node_subscriber->safe_psql('postgres', q{
+	create or replace procedure rebuild_test(
+	) as
+	$body$
+	declare
+		l_code  text:= E'create or replace function public.test_selector(
+	) returns setof public.tab1 as
+	\$body\$
+		select * from  error_name
+	\$body\$ language sql;';
+	begin
+		execute l_code;
+		perform public.test_selector();
+	end
+	$body$ language plpgsql;
+	create or replace function test_trg()
+	returns trigger as
+	$body$
+		declare
+		begin
+			call public.rebuild_test();
+			return NULL;
+		end
+	$body$ language plpgsql;
+	create trigger test_trigger after insert on tab1 for each row
+execute function test_trg();
+	alter table tab1 enable replica trigger test_trigger;
+});
+
+# This command will cause a crash on the replica if that bug hasn't been fixed
+$node_publisher->safe_psql('postgres', "INSERT INTO tab1 VALUES (3, 4)");
+
+my $result = $node_subscriber->wait_for_log(
+	"ERROR:  relation \"error_name\" does not exist at character"
+);
+
+ok($result,
+	"ERROR: Logical decoding doesn't fail on function error");
+
 # We'll re-use these nodes below, so drop their replication state.
 $node_subscriber->safe_psql('postgres', "DROP SUBSCRIPTION sub1");
 $node_publisher->safe_psql('postgres', "DROP PUBLICATION pub1");
@@ -352,7 +397,7 @@ $node_subscriber->wait_for_subscription_sync($node_publisher, 'tap_sub_sch');
 
 # Subscriber's sch1.t1 should receive the row inserted into the new sch1.t1,
 # but not the row inserted into the old sch1.t1 post-rename.
-my $result = $node_subscriber->safe_psql('postgres', "SELECT * FROM sch1.t1");
+$result = $node_subscriber->safe_psql('postgres', "SELECT * FROM sch1.t1");
 is( $result, qq(1
 2), 'check data in subscriber sch1.t1 after schema rename');
 


Re: [BUG] Logical replica crash if there was an error in a function.

2023-01-07 Thread Anton A. Melnikov

Thanks for your remarks.

On 07.01.2023 15:27, vignesh C wrote:


Few suggestions:
1) There is a warning:
+# This would crash on the subscriber if not fixed
+$node_publisher->safe_psql('postgres', "INSERT INTO tab1 VALUES (3, 4)");
+
+my $result = $node_subscriber->wait_for_log(
+   "ERROR:  relation \"error_name\" does not exist at character"
+);

  "my" variable $result masks earlier declaration in same scope at
t/100_bugs.pl line 400.

You can change:
my $result = $node_subscriber->safe_psql('postgres', "SELECT * FROM sch1.t1");
to
$result = $node_subscriber->safe_psql('postgres', "SELECT * FROM sch1.t1");


The reason is that the patch fell behind the master.
Fixed in v4 together with rebasing on current master.


2) Now that the crash is fixed, you could change it to a better message:
+# This would crash on the subscriber if not fixed
+$node_publisher->safe_psql('postgres', "INSERT INTO tab1 VALUES (3, 4)");
+
+my $result = $node_subscriber->wait_for_log(
+   "ERROR:  relation \"error_name\" does not exist at character"
+);



Tried to make this comment more clear.

Best wishes for the new year!


--
Anton A. Melnikov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Companycommit 16beb574708e0e2e993eb2fa1b093be97b8e53cf
Author: Anton A. Melnikov 
Date:   Sun Dec 11 06:26:03 2022 +0300

Add test for syntax error in the function in a a logical replication
worker. See dea834938.

diff --git a/src/test/subscription/t/100_bugs.pl b/src/test/subscription/t/100_bugs.pl
index 036576acab..b241a7d498 100644
--- a/src/test/subscription/t/100_bugs.pl
+++ b/src/test/subscription/t/100_bugs.pl
@@ -69,6 +69,51 @@ $node_publisher->wait_for_catchup('sub1');
 
 pass('index predicates do not cause crash');
 
+# https://www.postgresql.org/message-id/flat/adf0452f-8c6b-7def-d35e-ab516c80088e%40inbox.ru
+
+# The bug was that when a syntax error occurred in a SQL-language or PL/pgSQL-language
+# CREATE FUNCTION or DO command executed in a logical replication worker,
+# we'd suffer a null pointer dereference or assertion failure.
+
+$node_subscriber->safe_psql('postgres', q{
+	create or replace procedure rebuild_test(
+	) as
+	$body$
+	declare
+		l_code  text:= E'create or replace function public.test_selector(
+	) returns setof public.tab1 as
+	\$body\$
+		select * from  error_name
+	\$body\$ language sql;';
+	begin
+		execute l_code;
+		perform public.test_selector();
+	end
+	$body$ language plpgsql;
+	create or replace function test_trg()
+	returns trigger as
+	$body$
+		declare
+		begin
+			call public.rebuild_test();
+			return NULL;
+		end
+	$body$ language plpgsql;
+	create trigger test_trigger after insert on tab1 for each row
+execute function test_trg();
+	alter table tab1 enable replica trigger test_trigger;
+});
+
+# This command will cause a crash on the replica if that bug hasn't been fixed
+$node_publisher->safe_psql('postgres', "INSERT INTO tab1 VALUES (3, 4)");
+
+my $result = $node_subscriber->wait_for_log(
+	"ERROR:  relation \"error_name\" does not exist at character"
+);
+
+ok($result,
+	"ERROR: Logical decoding doesn't fail on function error");
+
 # We'll re-use these nodes below, so drop their replication state.
 $node_subscriber->safe_psql('postgres', "DROP SUBSCRIPTION sub1");
 $node_publisher->safe_psql('postgres', "DROP PUBLICATION pub1");
@@ -352,7 +397,7 @@ $node_subscriber->wait_for_subscription_sync($node_publisher, 'tap_sub_sch');
 
 # Subscriber's sch1.t1 should receive the row inserted into the new sch1.t1,
 # but not the row inserted into the old sch1.t1 post-rename.
-my $result = $node_subscriber->safe_psql('postgres', "SELECT * FROM sch1.t1");
+$result = $node_subscriber->safe_psql('postgres', "SELECT * FROM sch1.t1");
 is( $result, qq(1
 2), 'check data in subscriber sch1.t1 after schema rename');
 


Re: [BUG] Logical replica crash if there was an error in a function.

2023-01-07 Thread vignesh C
On Sun, 11 Dec 2022 at 09:21, Anton A. Melnikov  wrote:
>
>
> On 07.12.2022 21:03, Andres Freund wrote:
>
> >
> > This CF entry causes tests to fail on all platforms:
> > https://cirrus-ci.com/build/5755408111894528
> >
> > E.g.
> > https://api.cirrus-ci.com/v1/artifact/task/5298457144459264/testrun/build/testrun/subscription/100_bugs/log/regress_log_100_bugs
> >
> >  Begin standard error
> > psql::1: NOTICE:  dropped replication slot "sub1" on publisher
> >  End standard error
> > timed out waiting for match: ERROR:  relation "error_name" does not exist 
> > at character at /tmp/cirrus-ci-build/src/test/subscription/t/100_bugs.pl 
> > line 115.
> >
> > Greetings,
> >
> > Andres Freund
>
> Thank you for reminding!
>
> There was a conflict when applying v2 on current master.
> Rebased v3 is attached.

Few suggestions:
1) There is a warning:
+# This would crash on the subscriber if not fixed
+$node_publisher->safe_psql('postgres', "INSERT INTO tab1 VALUES (3, 4)");
+
+my $result = $node_subscriber->wait_for_log(
+   "ERROR:  relation \"error_name\" does not exist at character"
+);

 "my" variable $result masks earlier declaration in same scope at
t/100_bugs.pl line 400.

You can change:
my $result = $node_subscriber->safe_psql('postgres', "SELECT * FROM sch1.t1");
to
$result = $node_subscriber->safe_psql('postgres', "SELECT * FROM sch1.t1");

2) Now that the crash is fixed, you could change it to a better message:
+# This would crash on the subscriber if not fixed
+$node_publisher->safe_psql('postgres', "INSERT INTO tab1 VALUES (3, 4)");
+
+my $result = $node_subscriber->wait_for_log(
+   "ERROR:  relation \"error_name\" does not exist at character"
+);

Regards,
Vignesh




Re: [BUG] Logical replica crash if there was an error in a function.

2022-12-10 Thread Anton A. Melnikov


On 07.12.2022 21:03, Andres Freund wrote:



This CF entry causes tests to fail on all platforms:
https://cirrus-ci.com/build/5755408111894528

E.g.
https://api.cirrus-ci.com/v1/artifact/task/5298457144459264/testrun/build/testrun/subscription/100_bugs/log/regress_log_100_bugs

 Begin standard error
psql::1: NOTICE:  dropped replication slot "sub1" on publisher
 End standard error
timed out waiting for match: ERROR:  relation "error_name" does not exist at 
character at /tmp/cirrus-ci-build/src/test/subscription/t/100_bugs.pl line 115.

Greetings,

Andres Freund


Thank you for reminding!

There was a conflict when applying v2 on current master.
Rebased v3 is attached.

Best wishes!

--
Anton A. Melnikov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Companycommit 07eaa674953ed700a53174410a6e1eb81151d7e8
Author: Anton A. Melnikov 
Date:   Sun Dec 11 06:26:03 2022 +0300

Add test for syntax error in the function in a a logical replication
worker. See dea834938.

diff --git a/src/test/subscription/t/100_bugs.pl b/src/test/subscription/t/100_bugs.pl
index 7b3cd66be5..0bf4fdfa47 100644
--- a/src/test/subscription/t/100_bugs.pl
+++ b/src/test/subscription/t/100_bugs.pl
@@ -69,6 +69,51 @@ $node_publisher->wait_for_catchup('sub1');
 
 pass('index predicates do not cause crash');
 
+# https://www.postgresql.org/message-id/flat/adf0452f-8c6b-7def-d35e-ab516c80088e%40inbox.ru
+
+# The bug was that when a syntax error occurred in a SQL-language or PL/pgSQL-language
+# CREATE FUNCTION or DO command executed in a logical replication worker,
+# we'd suffer a null pointer dereference or assertion failure.
+
+$node_subscriber->safe_psql('postgres', q{
+	create or replace procedure rebuild_test(
+	) as
+	$body$
+	declare
+		l_code  text:= E'create or replace function public.test_selector(
+	) returns setof public.tab1 as
+	\$body\$
+		select * from  error_name
+	\$body\$ language sql;';
+	begin
+		execute l_code;
+		perform public.test_selector();
+	end
+	$body$ language plpgsql;
+	create or replace function test_trg()
+	returns trigger as
+	$body$
+		declare
+		begin
+			call public.rebuild_test();
+			return NULL;
+		end
+	$body$ language plpgsql;
+	create trigger test_trigger after insert on tab1 for each row
+execute function test_trg();
+	alter table tab1 enable replica trigger test_trigger;
+});
+
+# This would crash on the subscriber if not fixed
+$node_publisher->safe_psql('postgres', "INSERT INTO tab1 VALUES (3, 4)");
+
+my $result = $node_subscriber->wait_for_log(
+	"ERROR:  relation \"error_name\" does not exist at character"
+);
+
+ok($result,
+	"ERROR: Logical decoding doesn't fail on function error");
+
 # We'll re-use these nodes below, so drop their replication state.
 # We don't bother to drop the tables though.
 $node_subscriber->safe_psql('postgres', "DROP SUBSCRIPTION sub1");


Re: [BUG] Logical replica crash if there was an error in a function.

2022-12-07 Thread Andres Freund
Hi,

On 2022-11-16 17:52:50 +0300, Anton A. Melnikov wrote:
> Sorry, i didn't fully understand what is required and
> added some functions to the test that spend extra cpu time. But i found
> that it is possible to make a test according to previous remarks by adding
> only a few extra queries to an existent test without any additional syncing.
> 
> Experimentally, i could not observe any significant difference in
> CPU usage due to this test addition.
> The difference in the CPU usage was less than standard error.
> See 100_bugs-CPU-time.txt attached.
> 
> > > Additionally
> > > i've tried to reduce overall number of nodes previously
> > > used in this test in a similar way.
> > 
> > Optimizing existing tests isn't an answer to that.  We could
> > install those optimizations without adding a new test case.
> 
> Oh sure! Here is a separate patch for this optimization:
> https://www.postgresql.org/message-id/eb7aa992-c2d7-6ce7-4942-0c784231a362%40inbox.ru
> On the contrary with previous case in that one the difference in the CPU usage
> during 100_bugs.pl is essential; it decreases approximately by 30%.

This CF entry causes tests to fail on all platforms:
https://cirrus-ci.com/build/5755408111894528

E.g.
https://api.cirrus-ci.com/v1/artifact/task/5298457144459264/testrun/build/testrun/subscription/100_bugs/log/regress_log_100_bugs

 Begin standard error
psql::1: NOTICE:  dropped replication slot "sub1" on publisher
 End standard error
timed out waiting for match: ERROR:  relation "error_name" does not exist at 
character at /tmp/cirrus-ci-build/src/test/subscription/t/100_bugs.pl line 115.

Greetings,

Andres Freund




Re: [BUG] Logical replica crash if there was an error in a function.

2022-11-16 Thread Anton A. Melnikov

Thanks a lot for the fast reply!

On 03.11.2022 18:29, Tom Lane wrote:

If we were just adding a
query or two to an existing scenario, that could be okay; but spinning
up and syncing a whole new primary and standby database is *expensive*
when you multiply it by the number of times developers and buildfarm
animals are going to run the tests in the future.



On 15.11.2022 04:59, Tom Lane wrote:

"Anton A. Melnikov"  writes:

On 02.11.2022 21:02, Tom Lane wrote:

I don't think the cost of that test case is justified by the tiny
probability that it'd ever catch anything.



Seems it is possible to do a test without these remarks. The attached
test uses existing nodes and checks the specific error message.


My opinion remains unchanged: this would be a very poor use of test
cycles.


Sorry, i didn't fully understand what is required and
added some functions to the test that spend extra cpu time. But i found
that it is possible to make a test according to previous remarks by adding
only a few extra queries to an existent test without any additional syncing.

Experimentally, i could not observe any significant difference in
CPU usage due to this test addition.
The difference in the CPU usage was less than standard error.
See 100_bugs-CPU-time.txt attached.


Additionally
i've tried to reduce overall number of nodes previously
used in this test in a similar way.


Optimizing existing tests isn't an answer to that.  We could
install those optimizations without adding a new test case.


Oh sure! Here is a separate patch for this optimization:
https://www.postgresql.org/message-id/eb7aa992-c2d7-6ce7-4942-0c784231a362%40inbox.ru
On the contrary with previous case in that one the difference in the CPU usage
during 100_bugs.pl is essential; it decreases approximately by 30%.


With the best wishes!

--
Anton A. Melnikov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Companycommit 2449158ba576d7c6d97852d14f85dadb3aced262
Author: Anton A. Melnikov 
Date:   Wed Nov 16 11:46:54 2022 +0300

Add test for syntax error in the function in a a logical replication
worker. See dea834938.

diff --git a/src/test/subscription/t/100_bugs.pl b/src/test/subscription/t/100_bugs.pl
index 6247aa7730..eb4ab6d359 100644
--- a/src/test/subscription/t/100_bugs.pl
+++ b/src/test/subscription/t/100_bugs.pl
@@ -69,6 +69,51 @@ $node_publisher->wait_for_catchup('sub1');
 
 pass('index predicates do not cause crash');
 
+# https://www.postgresql.org/message-id/flat/adf0452f-8c6b-7def-d35e-ab516c80088e%40inbox.ru
+
+# The bug was that when a syntax error occurred in a SQL-language or PL/pgSQL-language
+# CREATE FUNCTION or DO command executed in a logical replication worker,
+# we'd suffer a null pointer dereference or assertion failure.
+
+$node_subscriber->safe_psql('postgres', q{
+	create or replace procedure rebuild_test(
+	) as
+	$body$
+	declare
+		l_code  text:= E'create or replace function public.test_selector(
+	) returns setof public.tab1 as
+	\$body\$
+		select * from  error_name
+	\$body\$ language sql;';
+	begin
+		execute l_code;
+		perform public.test_selector();
+	end
+	$body$ language plpgsql;
+	create or replace function test_trg()
+	returns trigger as
+	$body$
+		declare
+		begin
+			call public.rebuild_test();
+			return NULL;
+		end
+	$body$ language plpgsql;
+	create trigger test_trigger after insert on tab1 for each row
+execute function test_trg();
+	alter table tab1 enable replica trigger test_trigger;
+});
+
+# This would crash on the subscriber if not fixed
+$node_publisher->safe_psql('postgres', "INSERT INTO tab1 VALUES (3, 4)");
+
+my $result = $node_subscriber->wait_for_log(
+	"ERROR:  relation \"error_name\" does not exist at character"
+);
+
+ok($result,
+	"ERROR: Logical decoding doesn't fail on function error");
+
 $node_publisher->stop('fast');
 $node_subscriber->stop('fast');
 
## src/test/subscription/100_bugs.pl 
Before adding test:
Files=1, Tests=7, 14 wallclock secs ( 0.02 usr  0.00 sys +  7.96 cusr  2.24 
csys = 10.22 CPU)
Files=1, Tests=7, 15 wallclock secs ( 0.02 usr  0.00 sys +  8.21 cusr  2.13 
csys = 10.36 CPU)
Files=1, Tests=7, 14 wallclock secs ( 0.01 usr  0.01 sys +  8.69 cusr  2.17 
csys = 10.88 CPU)
Files=1, Tests=7, 15 wallclock secs ( 0.01 usr  0.00 sys +  8.34 cusr  2.22 
csys = 10.57 CPU)
Files=1, Tests=7, 15 wallclock secs ( 0.01 usr  0.00 sys +  8.64 cusr  2.19 
csys = 10.84 CPU)
Files=1, Tests=7, 14 wallclock secs ( 0.02 usr  0.00 sys +  8.18 cusr  2.20 
csys = 10.40 CPU)
Files=1, Tests=7, 13 wallclock secs ( 0.01 usr  0.00 sys +  8.02 cusr  2.23 
csys = 10.26 CPU)
Files=1, Tests=7, 14 wallclock secs ( 0.01 usr  0.01 sys +  8.20 cusr  2.14 
csys = 10.36 CPU)
Files=1, Tests=7, 13 wallclock secs ( 0.02 usr  0.00 sys +  8.04 cusr  2.19 
csys = 10.25 CPU)
Files=1, Tests=7, 14 wallclock secs ( 0.01 usr  0.00 sys +  8.20 cusr  2.09 
csys = 10.30 CPU)
Average CPU usage = 10.44+-0.07s

After adding test:
Files=1, Tests=8, 14 

Re: [BUG] Logical replica crash if there was an error in a function.

2022-11-14 Thread Tom Lane
"Anton A. Melnikov"  writes:
> On 02.11.2022 21:02, Tom Lane wrote:
>> I don't think the cost of that test case is justified by the tiny
>> probability that it'd ever catch anything.

> Seems it is possible to do a test without these remarks. The attached
> test uses existing nodes and checks the specific error message.

My opinion remains unchanged: this would be a very poor use of test
cycles.

> Additionally
> i've tried to reduce overall number of nodes previously
> used in this test in a similar way.

Optimizing existing tests isn't an answer to that.  We could
install those optimizations without adding a new test case.

regards, tom lane




Re: [BUG] Logical replica crash if there was an error in a function.

2022-11-14 Thread Anton A. Melnikov

Hello!

On 02.11.2022 21:02, Tom Lane wrote:

So I'm now good with the idea of just not failing.  I don't like
the patch as presented though.  First, the cfbot is quite rightly
complaining about the "uninitialized variable" warning it draws.
Second, I don't see a good reason to tie the change to logical
replication in any way.  Let's just change the Assert to an if(),
as attached.


Fully agreed that is the most optimal solution for that case. Thanks!
Surely it's very rare one but there was a real segfault at production server.
Someone came up with the idea to modify function like public.test_selector()
in repcmd.sql (see above) on the fly with adding to it :last_modification:
field from current time and some other parameters with the help of replace()
inside the creation of the rebuild_test() procedure.

On 03.11.2022 18:29, Tom Lane wrote:

Amit Kapila  writes:

LGTM. I don't know if it is a good idea to omit the test case for this
scenario. If required, we can reuse the test case from Sawada-San's
patch in the email [1].


I don't think the cost of that test case is justified by the tiny
probability that it'd ever catch anything.  If we were just adding a
query or two to an existing scenario, that could be okay; but spinning
up and syncing a whole new primary and standby database is *expensive*
when you multiply it by the number of times developers and buildfarm
animals are going to run the tests in the future.

There's also the little issue that I'm not sure it would actually
detect a problem if we had one.  The case is going to fail, and
what we want to know is just how messily it fails, and I think the
TAP infrastructure isn't very sensitive to that ... especially
if the test isn't even checking for specific error messages.

Seems it is possible to do a test without these remarks. The attached
test uses existing nodes and checks the specific error message. Additionally
i've tried to reduce overall number of nodes previously
used in this test in a similar way.

Would be glad for comments and remarks.

With best wishes,

--
Anton A. Melnikov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

commit 722fa6d8c629eb83b3d11ea88b49bab1f700b48d
Author: Anton A. Melnikov 
Date:   Mon Nov 14 08:30:26 2022 +0300

Add test for syntax error in the function in a a logical replication
worker and combine some test nodes.

diff --git a/src/test/subscription/t/100_bugs.pl b/src/test/subscription/t/100_bugs.pl
index 6247aa7730..76a6c12cae 100644
--- a/src/test/subscription/t/100_bugs.pl
+++ b/src/test/subscription/t/100_bugs.pl
@@ -69,6 +69,9 @@ $node_publisher->wait_for_catchup('sub1');
 
 pass('index predicates do not cause crash');
 
+$node_publisher->safe_psql('postgres',
+	"CHECKPOINT");
+
 $node_publisher->stop('fast');
 $node_subscriber->stop('fast');
 
@@ -81,9 +84,8 @@ $node_subscriber->stop('fast');
 # identity set before accepting updates.  If it did not it would cause
 # an error when an update was attempted.
 
-$node_publisher = PostgreSQL::Test::Cluster->new('publisher2');
-$node_publisher->init(allows_streaming => 'logical');
-$node_publisher->start;
+$node_publisher->rotate_logfile();
+$node_publisher->start();
 
 $node_publisher->safe_psql('postgres',
 	"CREATE PUBLICATION pub FOR ALL TABLES");
@@ -102,6 +104,9 @@ is( $node_publisher->psql(
 	'update to unlogged table without replica identity with FOR ALL TABLES publication'
 );
 
+$node_publisher->safe_psql('postgres',
+	"CHECKPOINT");
+
 $node_publisher->stop('fast');
 
 # Bug #16643 - https://postgr.es/m/16643-eaadeb2a1a58d...@postgresql.org
@@ -226,13 +231,13 @@ $node_sub->stop('fast');
 # target table's relcache was not being invalidated. This leads to skipping
 # UPDATE/DELETE operations during apply on the subscriber side as the columns
 # required to search corresponding rows won't get logged.
-$node_publisher = PostgreSQL::Test::Cluster->new('publisher3');
-$node_publisher->init(allows_streaming => 'logical');
-$node_publisher->start;
 
-$node_subscriber = PostgreSQL::Test::Cluster->new('subscriber3');
-$node_subscriber->init(allows_streaming => 'logical');
-$node_subscriber->start;
+$node_publisher->rotate_logfile();
+$node_publisher->start();
+
+$node_subscriber->rotate_logfile();
+$node_subscriber->rotate_logfile(); # call twice to synchronize lognames between master and replica
+$node_subscriber->start();
 
 $node_publisher->safe_psql('postgres',
 	"CREATE TABLE tab_replidentity_index(a int not null, b int not null)");
@@ -296,7 +301,89 @@ is( $node_subscriber->safe_psql(
 	qq(-1|1),
 	"update works with REPLICA IDENTITY");
 
+$node_publisher->safe_psql('postgres',
+	"CHECKPOINT");
+
 $node_publisher->stop('fast');
 $node_subscriber->stop('fast');
 
+# https://www.postgresql.org/message-id/flat/adf0452f-8c6b-7def-d35e-ab516c80088e%40inbox.ru
+
+# The bug was that when a syntax error occurred in a SQL-language or PL/pgSQL-language
+# CREATE FUNCTION or DO command executed in a logical replication 

Re: [BUG] Logical replica crash if there was an error in a function.

2022-11-03 Thread Tom Lane
Amit Kapila  writes:
> LGTM. I don't know if it is a good idea to omit the test case for this
> scenario. If required, we can reuse the test case from Sawada-San's
> patch in the email [1].

I don't think the cost of that test case is justified by the tiny
probability that it'd ever catch anything.  If we were just adding a
query or two to an existing scenario, that could be okay; but spinning
up and syncing a whole new primary and standby database is *expensive*
when you multiply it by the number of times developers and buildfarm
animals are going to run the tests in the future.

There's also the little issue that I'm not sure it would actually
detect a problem if we had one.  The case is going to fail, and
what we want to know is just how messily it fails, and I think the
TAP infrastructure isn't very sensitive to that ... especially
if the test isn't even checking for specific error messages.

regards, tom lane




Re: [BUG] Logical replica crash if there was an error in a function.

2022-11-02 Thread Amit Kapila
On Wed, Nov 2, 2022 at 11:32 PM Tom Lane  wrote:
>
> So I'm now good with the idea of just not failing.  I don't like
> the patch as presented though.  First, the cfbot is quite rightly
> complaining about the "uninitialized variable" warning it draws.
> Second, I don't see a good reason to tie the change to logical
> replication in any way.  Let's just change the Assert to an if(),
> as attached.
>

LGTM. I don't know if it is a good idea to omit the test case for this
scenario. If required, we can reuse the test case from Sawada-San's
patch in the email [1].

[1] - 
https://www.postgresql.org/message-id/CAD21AoDKA%2BMB4M9BOnct_%3DZj5bNHbkYn6oKZ2aOQp8m%3D3x2GhQ%40mail.gmail.com

-- 
With Regards,
Amit Kapila.




Re: [BUG] Logical replica crash if there was an error in a function.

2022-11-02 Thread Tom Lane
Michael Paquier  writes:
> Yeah, the query string is not available in this context, but it surely
> looks wrong to me to assume that something as low-level as
> function_parse_error_transpose() needs to be updated for the sake of a
> logical worker, while we have other areas that would expect a portal
> to be set up.

After thinking about this some more, I'm withdrawing my opposition to
fixing it by making function_parse_error_transpose() cope with not
having an active portal.  I have a few reasons:

* A Portal is intended to contain an executor state.  While worker.c
does fake up an EState, there's certainly no plan tree or planstate tree,
and I doubt it'd be sane to create dummy ones.  So even if we made a
Portal, it'd be lacking a lot of the stuff one would expect to find there.
I fear that moving the cause of this sort of problem from "there's no
ActivePortal" to "there's an ActivePortal but it lacks field X" wouldn't
be an improvement.

* There is actually just one other consumer of ActivePortal,
namely EnsurePortalSnapshotExists, and that doesn't offer a lot of
support for the idea that ActivePortal must always be set.  It says

 * Nothing to do if a snapshot is set.  (We take it on faith that the
 * outermost active snapshot belongs to some Portal; or if there is no
 * Portal, it's somebody else's responsibility to manage things.)

and "it's somebody else's responsibility" summarizes the situation
here pretty perfectly.  worker.c *does* set up an active snapshot.

* The comment in function_parse_error_transpose() freely admits that
looking at the ActivePortal is a hack.  It works, more or less, for
the intended case of reporting a function-body syntax error nicely
during CREATE FUNCTION.  But it's capable of getting false-positive
matches, so maybe someday we should replace it with something more
bulletproof.

* There isn't any strong reason why function_parse_error_transpose()
has to succeed at finding the original query text.  Its fallback
approach of treating the syntax error position as internal to the
function body text is fine, in fact it's just what we want here.


So I'm now good with the idea of just not failing.  I don't like
the patch as presented though.  First, the cfbot is quite rightly
complaining about the "uninitialized variable" warning it draws.
Second, I don't see a good reason to tie the change to logical
replication in any way.  Let's just change the Assert to an if(),
as attached.

regards, tom lane

diff --git a/src/backend/catalog/pg_proc.c b/src/backend/catalog/pg_proc.c
index a9fe45e347..e03b98bcd2 100644
--- a/src/backend/catalog/pg_proc.c
+++ b/src/backend/catalog/pg_proc.c
@@ -1017,7 +1017,6 @@ function_parse_error_transpose(const char *prosrc)
 {
 	int			origerrposition;
 	int			newerrposition;
-	const char *queryText;
 
 	/*
 	 * Nothing to do unless we are dealing with a syntax error that has a
@@ -1035,11 +1034,22 @@ function_parse_error_transpose(const char *prosrc)
 	}
 
 	/* We can get the original query text from the active portal (hack...) */
-	Assert(ActivePortal && ActivePortal->status == PORTAL_ACTIVE);
-	queryText = ActivePortal->sourceText;
+	if (ActivePortal && ActivePortal->status == PORTAL_ACTIVE)
+	{
+		const char *queryText = ActivePortal->sourceText;
 
-	/* Try to locate the prosrc in the original text */
-	newerrposition = match_prosrc_to_query(prosrc, queryText, origerrposition);
+		/* Try to locate the prosrc in the original text */
+		newerrposition = match_prosrc_to_query(prosrc, queryText,
+			   origerrposition);
+	}
+	else
+	{
+		/*
+		 * Quietly give up if no ActivePortal.  This is an unusual situation
+		 * but it can happen in, e.g., logical replication workers.
+		 */
+		newerrposition = -1;
+	}
 
 	if (newerrposition > 0)
 	{


Re: [BUG] Logical replica crash if there was an error in a function.

2022-10-11 Thread Michael Paquier
On Sun, Oct 09, 2022 at 12:24:23PM +0300, Anton A. Melnikov wrote:
> On the other hand, function_parse_error_transpose() tries to get
> the original query text  (INSERT INTO test VALUES ('1') in our case) from
> the ActivePortal to clarify the location of the error.
> But in the logrep worker there is no way to restore original query text
> from the logrep message. There is only 'zipped' query equivalent to the 
> original.
> So any function_parse_error_transpose() call seems to be useless
> in the logrep worker.

Yeah, the query string is not available in this context, but it surely
looks wrong to me to assume that something as low-level as
function_parse_error_transpose() needs to be updated for the sake of a
logical worker, while we have other areas that would expect a portal
to be set up.
--
Michael


signature.asc
Description: PGP signature


Re: [BUG] Logical replica crash if there was an error in a function.

2022-10-10 Thread Alvaro Herrera
On 2022-Sep-24, Tom Lane wrote:

> "Anton A. Melnikov"  writes:
> > [ v4-0001-Fix-logical-replica-assert-on-func-error.patch ]
> 
> I took a quick look at this.  I think you're solving the
> problem in the wrong place.  The real issue is why are
> we not setting up ActivePortal correctly when running
> user-defined code in a logrep worker?  There is other code
> that expects that to be set, eg EnsurePortalSnapshotExists.

Right ... mostly, the logical replication *does* attempt to set up a
transaction and active snapshot when replaying actions (c.f.
begin_replication_step()).  Is this firing at an inappropriate time,
perhaps?

-- 
Álvaro Herrera PostgreSQL Developer  —  https://www.EnterpriseDB.com/




Re: [BUG] Logical replica crash if there was an error in a function.

2022-10-09 Thread Anton A. Melnikov

Hello!

Thanks for reply!

On 24.09.2022 20:27, Tom Lane wrote:

I think you're solving the
problem in the wrong place.  The real issue is why are
we not setting up ActivePortal correctly when running
user-defined code in a logrep worker?

During a common query from the backend ActivePortal becomes defined
in the PortalRun and then AfterTriggerEndQuery starts with
non-NULL ActivePortal after ExecutorFinish.
When the logrep worker is applying messages there are neither
PortalStart nor PortalRun calls. And AfterTriggerEndQuery starts
with undefined ActivePortal after finish-edata().
May be it's normal behavior?


There is other code
that expects that to be set, eg EnsurePortalSnapshotExists.


When the logrep worker is applying message it doesn't have to use
ActivePortal in EnsurePortalSnapshotExists because ActiveSnapshot is already 
installed.
It is set at the beginning of each transaction in the begin_replication_step 
call.

On the other hand, function_parse_error_transpose() tries to get
the original query text  (INSERT INTO test VALUES ('1') in our case) from
the ActivePortal to clarify the location of the error.
But in the logrep worker there is no way to restore original query text
from the logrep message. There is only 'zipped' query equivalent to the 
original.
So any function_parse_error_transpose() call seems to be useless
in the logrep worker.

And it looks like we can simply omit match_prosrc_to_query() call there.
The attached patch does it.

Best wishes,

--
Anton A. Melnikov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Companycommit bfaa02ac7a7cbeb793747be71962a7799c60c21c
Author: Anton A. Melnikov 
Date:   Sun Oct 9 11:56:20 2022 +0300

Fix logical replica crash if there was an error in a user function.

diff --git a/src/backend/catalog/pg_proc.c b/src/backend/catalog/pg_proc.c
index a9fe45e347..1e8f1b1097 100644
--- a/src/backend/catalog/pg_proc.c
+++ b/src/backend/catalog/pg_proc.c
@@ -36,6 +36,7 @@
 #include "parser/parse_coerce.h"
 #include "parser/parse_type.h"
 #include "pgstat.h"
+#include "replication/logicalworker.h"
 #include "rewrite/rewriteHandler.h"
 #include "tcop/pquery.h"
 #include "tcop/tcopprot.h"
@@ -1034,12 +1035,20 @@ function_parse_error_transpose(const char *prosrc)
 			return false;
 	}
 
-	/* We can get the original query text from the active portal (hack...) */
-	Assert(ActivePortal && ActivePortal->status == PORTAL_ACTIVE);
-	queryText = ActivePortal->sourceText;
+	/*
+	 * In the logical replication worker there is no way to restore original
+	 * query text from the logical replication message. There is only 'zipped'
+	 * query equivalent to the original text.
+	 */
+	if (!IsLogicalWorker())
+	{
+		/* We can get the original query text from the active portal (hack...) */
+		Assert(ActivePortal && ActivePortal->status == PORTAL_ACTIVE);
+		queryText = ActivePortal->sourceText;
 
-	/* Try to locate the prosrc in the original text */
-	newerrposition = match_prosrc_to_query(prosrc, queryText, origerrposition);
+		/* Try to locate the prosrc in the original text */
+		newerrposition = match_prosrc_to_query(prosrc, queryText, origerrposition);
+	}
 
 	if (newerrposition > 0)
 	{


Re: [BUG] Logical replica crash if there was an error in a function.

2022-09-24 Thread Tom Lane
"Anton A. Melnikov"  writes:
> [ v4-0001-Fix-logical-replica-assert-on-func-error.patch ]

I took a quick look at this.  I think you're solving the
problem in the wrong place.  The real issue is why are
we not setting up ActivePortal correctly when running
user-defined code in a logrep worker?  There is other code
that expects that to be set, eg EnsurePortalSnapshotExists.

regards, tom lane




Re: [BUG] Logical replica crash if there was an error in a function.

2022-09-08 Thread Anton A. Melnikov

Hello!

Added a TAP test for this case.

On 30.08.2022 10:09, Anton A. Melnikov wrote:

Hello!

The patch was rebased on current master.
And here is a simplified crash reproduction:
1) On primary with 'wal_level = logical' execute:
  CREATE TABLE public.test (id int NOT NULL, val integer);
  CREATE PUBLICATION test_pub FOR TABLE test;

2) On replica replace  in the repcmd.sql  attached with primary port and 
execute it:
psql -f repcmd.sql

3) On master execute command:
INSERT INTO test VALUES ('1');

 
With best regards,


--
Anton A. Melnikov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Companycommit d539e1e36ef7af33e1a89eaee00183739c716797
Author: Anton A. Melnikov 
Date:   Sun Aug 21 18:27:44 2022 +0300

Fix logical replica crash if there was an error in a function.

diff --git a/src/backend/catalog/pg_proc.c b/src/backend/catalog/pg_proc.c
index a9fe45e347..1381fae575 100644
--- a/src/backend/catalog/pg_proc.c
+++ b/src/backend/catalog/pg_proc.c
@@ -1007,8 +1007,9 @@ sql_function_parse_error_callback(void *arg)
  * anonymous-block handler, not only for SQL-language functions.
  * It is assumed that the syntax error position is initially relative to the
  * function body string (as passed in).  If possible, we adjust the position
- * to reference the original command text; if we can't manage that, we set
- * up an "internal query" syntax error instead.
+ * to reference the original command text; if we can't manage that or
+ * can't get the original command text when ActivePortal is not defined,
+ * we set up an "internal query" syntax error instead.
  *
  * Returns true if a syntax error was processed, false if not.
  */
@@ -1016,7 +1017,7 @@ bool
 function_parse_error_transpose(const char *prosrc)
 {
 	int			origerrposition;
-	int			newerrposition;
+	int			newerrposition = 0;
 	const char *queryText;
 
 	/*
@@ -1034,12 +1035,15 @@ function_parse_error_transpose(const char *prosrc)
 			return false;
 	}
 
-	/* We can get the original query text from the active portal (hack...) */
-	Assert(ActivePortal && ActivePortal->status == PORTAL_ACTIVE);
-	queryText = ActivePortal->sourceText;
+	if (ActivePortal)
+	{
+		/* We can get the original query text from the active portal (hack...) */
+		Assert(ActivePortal->status == PORTAL_ACTIVE);
+		queryText = ActivePortal->sourceText;
 
-	/* Try to locate the prosrc in the original text */
-	newerrposition = match_prosrc_to_query(prosrc, queryText, origerrposition);
+		/* Try to locate the prosrc in the original text */
+		newerrposition = match_prosrc_to_query(prosrc, queryText, origerrposition);
+	}
 
 	if (newerrposition > 0)
 	{
@@ -1052,7 +1056,8 @@ function_parse_error_transpose(const char *prosrc)
 	else
 	{
 		/*
-		 * If unsuccessful, convert the position to an internal position
+		 * If unsuccessful or ActivePortal not defined and original command
+		 * text is unreachable, convert the position to an internal position
 		 * marker and give the function text as the internal query.
 		 */
 		errposition(0);
diff --git a/src/test/recovery/t/034_logical_replica_on_error.pl b/src/test/recovery/t/034_logical_replica_on_error.pl
new file mode 100644
index 00..380ad74590
--- /dev/null
+++ b/src/test/recovery/t/034_logical_replica_on_error.pl
@@ -0,0 +1,105 @@
+# Copyright (c) 2022, Postgres Professional
+
+# There was an assertion if function text contains an error. See PGPRO-7025
+# Тhis file has a prefix (120_) to prevent prefix collision with
+# upstream test files.
+
+use strict;
+use warnings;
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Utils;
+use Test::More tests => 2;
+
+# Initialize primary node
+my $node_primary = PostgreSQL::Test::Cluster->new('primary');
+$node_primary->init(allows_streaming => 1);
+$node_primary->append_conf(
+	'postgresql.conf', qq(
+wal_level = logical
+));
+$node_primary->start;
+
+$node_primary->safe_psql('postgres',
+	'CREATE TABLE public.test (id int NOT NULL, val integer);');
+
+$node_primary->safe_psql('postgres',
+	'CREATE PUBLICATION test_pub FOR TABLE test;');
+
+# Initialize logical replica node
+my $node_replica = PostgreSQL::Test::Cluster->new('replica');
+$node_replica->init(has_streaming => 1,
+	has_restoring => 1);
+$node_replica->start;
+
+$node_replica->safe_psql('postgres',
+	'CREATE TABLE test (id int NOT NULL, val integer);');
+
+$node_replica->safe_psql('postgres', q{
+	create or replace procedure rebuild_test(
+	) as
+	$body$
+	declare
+		l_code  text:= E'create or replace function public.test_selector(
+	) returns setof public.test as
+	\$body\$
+		select * from  error_name
+	\$body\$ language sql;';
+	begin
+		execute l_code;
+		perform public.test_selector();
+	end
+	$body$ language plpgsql;
+});
+
+$node_replica->safe_psql('postgres', '
+	create or replace function test_trg()
+	returns trigger as
+	$body$
+		declare
+		begin
+			call public.rebuild_test();
+			return NULL;
+		end
+	$body$ language plpgsql;
+');
+
+$node_replica->safe_psql('postgres',
+	

Re: [BUG] Logical replica crash if there was an error in a function.

2022-08-30 Thread Anton A. Melnikov

Hello!

The patch was rebased on current master.
And here is a simplified crash reproduction:
1) On primary with 'wal_level = logical' execute:
 CREATE TABLE public.test (id int NOT NULL, val integer);
 CREATE PUBLICATION test_pub FOR TABLE test;

2) On replica replace  in the repcmd.sql  attached with primary port and 
execute it:
psql -f repcmd.sql

3) On master execute command:
INSERT INTO test VALUES ('1');

With best regards,

--
Anton A. Melnikov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Companycommit 4de66e1b1ffaeaacdd72f3e72789ca05b114476b
Author: Anton A. Melnikov 
Date:   Sun Aug 21 18:27:44 2022 +0300

Fix logical replica crash if there was an error in a function.

diff --git a/src/backend/catalog/pg_proc.c b/src/backend/catalog/pg_proc.c
index a9fe45e347..1381fae575 100644
--- a/src/backend/catalog/pg_proc.c
+++ b/src/backend/catalog/pg_proc.c
@@ -1007,8 +1007,9 @@ sql_function_parse_error_callback(void *arg)
  * anonymous-block handler, not only for SQL-language functions.
  * It is assumed that the syntax error position is initially relative to the
  * function body string (as passed in).  If possible, we adjust the position
- * to reference the original command text; if we can't manage that, we set
- * up an "internal query" syntax error instead.
+ * to reference the original command text; if we can't manage that or
+ * can't get the original command text when ActivePortal is not defined,
+ * we set up an "internal query" syntax error instead.
  *
  * Returns true if a syntax error was processed, false if not.
  */
@@ -1016,7 +1017,7 @@ bool
 function_parse_error_transpose(const char *prosrc)
 {
 	int			origerrposition;
-	int			newerrposition;
+	int			newerrposition = 0;
 	const char *queryText;
 
 	/*
@@ -1034,12 +1035,15 @@ function_parse_error_transpose(const char *prosrc)
 			return false;
 	}
 
-	/* We can get the original query text from the active portal (hack...) */
-	Assert(ActivePortal && ActivePortal->status == PORTAL_ACTIVE);
-	queryText = ActivePortal->sourceText;
+	if (ActivePortal)
+	{
+		/* We can get the original query text from the active portal (hack...) */
+		Assert(ActivePortal->status == PORTAL_ACTIVE);
+		queryText = ActivePortal->sourceText;
 
-	/* Try to locate the prosrc in the original text */
-	newerrposition = match_prosrc_to_query(prosrc, queryText, origerrposition);
+		/* Try to locate the prosrc in the original text */
+		newerrposition = match_prosrc_to_query(prosrc, queryText, origerrposition);
+	}
 
 	if (newerrposition > 0)
 	{
@@ -1052,7 +1056,8 @@ function_parse_error_transpose(const char *prosrc)
 	else
 	{
 		/*
-		 * If unsuccessful, convert the position to an internal position
+		 * If unsuccessful or ActivePortal not defined and original command
+		 * text is unreachable, convert the position to an internal position
 		 * marker and give the function text as the internal query.
 		 */
 		errposition(0);


repcmd.sql
Description: application/sql


Re: [BUG] Logical replica crash if there was an error in a function.

2022-08-21 Thread Anton A. Melnikov

Hello!

On 21.08.2022 17:33, Anton A. Melnikov wrote:

Hello!

Here is a fix for the bug first described in:
https://www.postgresql.org/message-id/flat/adf0452f-8c6b-7def-d35e-ab516c80088e%40inbox.ru



Sorry, there was a wrong patch in the first letter.
Here is a right version.


With best regards,
--
Anton A. Melnikov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Companycommit 9bff84bda455609820c259bdc47d200bebcba7ab
Author: Anton A. Melnikov 
Date:   Sun Aug 21 18:27:44 2022 +0300

Fix logical replica crash if there was an error in a function.

diff --git a/src/backend/catalog/pg_proc.c b/src/backend/catalog/pg_proc.c
index a9fe45e347..1381fae575 100644
--- a/src/backend/catalog/pg_proc.c
+++ b/src/backend/catalog/pg_proc.c
@@ -1007,8 +1007,9 @@ sql_function_parse_error_callback(void *arg)
  * anonymous-block handler, not only for SQL-language functions.
  * It is assumed that the syntax error position is initially relative to the
  * function body string (as passed in).  If possible, we adjust the position
- * to reference the original command text; if we can't manage that, we set
- * up an "internal query" syntax error instead.
+ * to reference the original command text; if we can't manage that or
+ * can't get the original command text when ActivePortal is not defined,
+ * we set up an "internal query" syntax error instead.
  *
  * Returns true if a syntax error was processed, false if not.
  */
@@ -1016,7 +1017,7 @@ bool
 function_parse_error_transpose(const char *prosrc)
 {
 	int			origerrposition;
-	int			newerrposition;
+	int			newerrposition = 0;
 	const char *queryText;
 
 	/*
@@ -1034,12 +1035,15 @@ function_parse_error_transpose(const char *prosrc)
 			return false;
 	}
 
-	/* We can get the original query text from the active portal (hack...) */
-	Assert(ActivePortal && ActivePortal->status == PORTAL_ACTIVE);
-	queryText = ActivePortal->sourceText;
+	if (ActivePortal)
+	{
+		/* We can get the original query text from the active portal (hack...) */
+		Assert(ActivePortal->status == PORTAL_ACTIVE);
+		queryText = ActivePortal->sourceText;
 
-	/* Try to locate the prosrc in the original text */
-	newerrposition = match_prosrc_to_query(prosrc, queryText, origerrposition);
+		/* Try to locate the prosrc in the original text */
+		newerrposition = match_prosrc_to_query(prosrc, queryText, origerrposition);
+	}
 
 	if (newerrposition > 0)
 	{
@@ -1052,7 +1056,8 @@ function_parse_error_transpose(const char *prosrc)
 	else
 	{
 		/*
-		 * If unsuccessful, convert the position to an internal position
+		 * If unsuccessful or ActivePortal not defined and original command
+		 * text is unreachable, convert the position to an internal position
 		 * marker and give the function text as the internal query.
 		 */
 		errposition(0);


[BUG] Logical replica crash if there was an error in a function.

2022-08-21 Thread Anton A. Melnikov

Hello!

Here is a fix for the bug first described in:
https://www.postgresql.org/message-id/flat/adf0452f-8c6b-7def-d35e-ab516c80088e%40inbox.ru

Reproduction:
1) On master with 'wal_level = logical' execute mascmd.sql attached.

2) On replica substitute the correct port in repcmd.sql and execute it.

3) On master execute command:
INSERT INTO rul_rule_set VALUES ('1', 'name','1','age','true');

Replica will crash with:
FailedAssertion("ActivePortal && ActivePortal->status == PORTAL_ACTIVE", File: 
"pg_proc.c", Line: 1038, PID: 42894)
in infinite loop.

After applying this patch replica will give the correct error message instead 
of assertion:

2022-08-21 17:08:39.935 MSK [143171] ERROR:  relation "rul_rule_set" does not 
exist at character 172
2022-08-21 17:08:39.935 MSK [143171] QUERY:
-- Last modified: 2022-08-21 17:08:39.930842+03
with parameters as (
<<--- skipped by me --- >>>
 )
2022-08-21 17:08:39.935 MSK [143171] CONTEXT:  SQL statement "create or replace 
function public.rule_set_selector(
<<--- skipped by me --- >>>
SQL statement "call public.rebuild_rule_set_selector()"
PL/pgSQL function public.rul_rule_set_trg() line 4 at CALL
processing remote data for replication origin "pg_16401" during "INSERT"
for replication target relation "public.rul_rule_set" in transaction 
741 finished at 0/17BE180

With best regards,
--
Anton A. Melnikov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

mascmd.sql
Description: application/sql


repcmd.sql
Description: application/sql
commit 585d0cd944d952f08f7649d02f1d6b6644e93611
Author: Peter Eisentraut 
Date:   Sat Aug 20 20:48:47 2022 +0200

Remove dummyret definition

This hasn't been used in a while (last use removed by 50d22de932, and
before that 84b6d5f359), and since we are now preferring inline
functions over complex macros, it's unlikely to be needed again.

Reviewed-by: Daniel Gustafsson 
Discussion: https://www.postgresql.org/message-id/flat/7110ab37-8ddd-437f-905c-6aa6205c6185%40enterprisedb.com

diff --git a/src/include/c.h b/src/include/c.h
index 65e91a6b89..dfc366b026 100644
--- a/src/include/c.h
+++ b/src/include/c.h
@@ -333,16 +333,6 @@
 	_61,_62,_63,  N, ...) \
 	(N)
 
-/*
- * dummyret is used to set return values in macros that use ?: to make
- * assignments.  gcc wants these to be void, other compilers like char
- */
-#ifdef __GNUC__	/* GNU cc */
-#define dummyret	void
-#else
-#define dummyret	char
-#endif
-
 /*
  * Generic function pointer.  This can be used in the rare cases where it's
  * necessary to cast a function pointer to a seemingly incompatible function