Re: cpluspluscheck/headerscheck require build in REL_16_STABLE
On 2024-04-27 09:14, John Naylor wrote:
> On Wed, Apr 17, 2024 at 7:21 AM John Naylor wrote:
>> On Mon, Apr 15, 2024 at 9:20 PM Marina Polyakova wrote:
>>> Everything seems to work with this patch, thank you!
>>
>> Glad to hear it -- I'll push next week when I get back from vacation,
>> unless there are objections.
>
> Pushed, thanks for the report!

Thank you again!

--
Marina Polyakova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Re: cpluspluscheck/headerscheck require build in REL_16_STABLE
On 2024-04-13 08:40, John Naylor wrote:
> On Fri, Apr 12, 2024 at 11:51 PM Marina Polyakova wrote:
>> ... In the other branches everything is fine: these problems begin with
>> commits [2] (jsonpath_gram.h) and [3] (gram.h), and in the master branch
>> there are no such problems after commit [4]. The attached diff.patch
>> fixes this issue for me. (IIUC internal headers generated by Bison are
>> usually excluded from such checks, so I also excluded gramparse.h and
>> jsonpath_internal.h...)
>
> I'm not in favor of this patch, because these files build fine on master,
> so there is no good reason to exclude them. We should arrange so that
> they build fine on PG16 as well. The problem is, not all the required
> headers are generated when invoking `make headerscheck`. The attached
> patch brings in some Makefile rules from master to make this work. Does
> this fix it for you?

Everything seems to work with this patch, thank you!

--
Marina Polyakova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
cpluspluscheck/headerscheck require build in REL_16_STABLE
Hello, hackers!

When running cpluspluscheck/headerscheck on REL_16_STABLE [1] I found that
everything was OK only if it was preceded by a build and make
maintainer-clean was not used:

$ ./configure --without-icu --with-perl --with-python > configure.txt && make -j16 > make.txt && make -j16 clean > clean.txt && make -j16 cpluspluscheck > cpluspluscheck.txt

$ ./configure --without-icu --with-perl --with-python > configure_1.txt && make -j16 > make.txt && make -j16 distclean > clean.txt && ./configure --without-icu --with-perl --with-python > configure_2.txt && make -j16 cpluspluscheck > cpluspluscheck.txt

Otherwise cpluspluscheck/headerscheck will fail:

$ ./configure --without-icu --with-perl --with-python > configure_1.txt && make -j16 > make.txt && make -j16 maintainer-clean > clean.txt && ./configure --without-icu --with-perl --with-python > configure_2.txt && make -j16 cpluspluscheck > cpluspluscheck.txt

$ ./configure --without-icu --with-perl --with-python > configure.txt && make -j16 cpluspluscheck > cpluspluscheck.txt

In file included from /tmp/cpluspluscheck.Zy4645/test.cpp:3:
/home/marina/postgrespro/postgrespro/src/backend/parser/gramparse.h:29:10: fatal error: gram.h: No such file or directory
   29 | #include "gram.h"
      |          ^~~~
compilation terminated.
In file included from /tmp/cpluspluscheck.Zy4645/test.cpp:3:
/home/marina/postgrespro/postgrespro/src/backend/utils/adt/jsonpath_internal.h:26:10: fatal error: jsonpath_gram.h: No such file or directory
   26 | #include "jsonpath_gram.h"
      |          ^
compilation terminated.
make: *** [GNUmakefile:141: cpluspluscheck] Error 1

In the other branches everything is fine: these problems begin with commits
[2] (jsonpath_gram.h) and [3] (gram.h), and in the master branch there are
no such problems after commit [4].

The attached diff.patch fixes this issue for me. (IIUC internal headers
generated by Bison are usually excluded from such checks, so I also
excluded gramparse.h and jsonpath_internal.h...)
[1] https://github.com/postgres/postgres/commit/e177da5c87a10abac97c028bfb427bafb7353aa2
[2] https://github.com/postgres/postgres/commit/dac048f71ebbcf2f980d280711f8ff8001331c5d
[3] https://github.com/postgres/postgres/commit/ecaf7c5df54f7fa9df2fdc7225d2bb4e283f0081
[4] https://github.com/postgres/postgres/commit/721856ff24b3722ce8e894e5a32c9c063cd48455

--
Marina Polyakova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

diff --git a/src/tools/pginclude/cpluspluscheck b/src/tools/pginclude/cpluspluscheck
index 4e09c4686b30b973fd4ca2443d04b5228e904f4b..e77979c97ebd09dff18cb98090254bd3724de0ff 100755
--- a/src/tools/pginclude/cpluspluscheck
+++ b/src/tools/pginclude/cpluspluscheck
@@ -121,14 +121,15 @@ do
 	# We can't make these Bison output files compilable standalone
 	# without using "%code require", which old Bison versions lack.
-	# parser/gram.h will be included by parser/gramparse.h anyway.
 	test "$f" = contrib/cube/cubeparse.h && continue
 	test "$f" = contrib/seg/segparse.h && continue
 	test "$f" = src/backend/bootstrap/bootparse.h && continue
 	test "$f" = src/backend/parser/gram.h && continue
+	test "$f" = src/backend/parser/gramparse.h && continue
 	test "$f" = src/backend/replication/repl_gram.h && continue
 	test "$f" = src/backend/replication/syncrep_gram.h && continue
 	test "$f" = src/backend/utils/adt/jsonpath_gram.h && continue
+	test "$f" = src/backend/utils/adt/jsonpath_internal.h && continue
 	test "$f" = src/bin/pgbench/exprparse.h && continue
 	test "$f" = src/pl/plpgsql/src/pl_gram.h && continue
 	test "$f" = src/interfaces/ecpg/preproc/preproc.h && continue
diff --git a/src/tools/pginclude/headerscheck b/src/tools/pginclude/headerscheck
index 8dee1b56709d06a9ef1f5e80b956a079fa8a1e6a..30e0e47684e571e8c86459654a0ad37326d798e0 100755
--- a/src/tools/pginclude/headerscheck
+++ b/src/tools/pginclude/headerscheck
@@ -116,14 +116,15 @@ do
 	# We can't make these Bison output files compilable standalone
 	# without using "%code require", which old Bison versions lack.
-	# parser/gram.h will be included by parser/gramparse.h anyway.
 	test "$f" = contrib/cube/cubeparse.h && continue
 	test "$f" = contrib/seg/segparse.h && continue
 	test "$f" = src/backend/bootstrap/bootparse.h && continue
 	test "$f" = src/backend/parser/gram.h && continue
+	test "$f" = src/backend/parser/gramparse.h && continue
 	test "$f" = src/backend/replication/repl_gram.h && continue
 	test "$f" = src/backend/replication/syncrep_gram.h && continue
 	test "$f" = src/backend/utils/adt/jsonpath_gram.h && continue
+	test "$f" = src/backend/utils/adt/jsonpath_internal.h && continue
 	test "$f" = src/bin/pgbench/exprparse.h && continue
 	test "$f" = src/pl/plpgsql/src/pl_gram.h && continue
 	test "$f" = src/interfaces/ecpg/preproc/preproc.h && continue
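The failure mode the Makefile-rule fix addresses can be sketched with a toy checker. All file names below are stand-ins, and a simple "does every locally-included header exist?" test replaces the real compiler invocation used by cpluspluscheck:

```shell
# Minimal sketch of the dependency problem: a header that includes a
# Bison-generated header can only be checked after the generator has run.
# File names and the check itself are invented for illustration.
set -eu
tmp=$(mktemp -d)
cd "$tmp"

# gramparse.h includes a header that only exists after Bison has run
printf '#include "gram.h"\n' > gramparse.h

check_header() {
    # extract the "..." includes and require each one to be present
    sed -n 's/#include "\(.*\)"/\1/p' "$1" | while read -r inc; do
        [ -f "$inc" ] || exit 1
    done
}

if check_header gramparse.h; then
    echo "gramparse.h: ok"
else
    echo "gramparse.h: fails before gram.h is generated"
fi

: > gram.h   # stand-in for the Bison step that a prior full build would run
check_header gramparse.h && echo "gramparse.h: ok after generation"
```

This is why the original report only saw failures after `make maintainer-clean`, which removes the generated headers, and why the committed fix generates the required headers as part of the check target rather than excluding the files.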
Re: Conflict between regression tests namespace & transactions due to recent changes
On 2023-05-19 17:59, Tom Lane wrote:
> Marina Polyakova writes:
>> On 2023-05-19 09:03, Michael Paquier wrote:
>>> FYI, the buildfarm is seeing some spurious failures as well:
>>> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=schnauzer=2023-05-1904%3A29%3A42
>>
>> Yes, it is the same error. Here's another one in version 13:
>> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=schnauzer=2023-05-18%2022:37:49
>
> Right. I went ahead and pushed the fix in hopes of stabilizing things.
> (I went with "trans_abc" as the new table name, for consistency with
> some other nearby names.)
>
> regards, tom lane

Thank you! I missed the same changes in version 11 :(

--
Marina Polyakova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Re: Conflict between regression tests namespace & transactions due to recent changes
On 2023-05-19 09:03, Michael Paquier wrote:
> On Wed, May 17, 2023 at 02:39:10PM +0900, Michael Paquier wrote:
>> On Tue, May 16, 2023 at 11:02:45AM +0300, Marina Polyakova wrote:
>>> It confuses me a little that different methods are used for the same
>>> purpose. But the namespace test checks schemas. So see
>>> diff_abc_to_txn_table.patch which replaces abc with txn_table in the
>>> transaction test.
>>
>> Looks OK seen from here. Thanks!
>
> FYI, the buildfarm is seeing some spurious failures as well:
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=schnauzer=2023-05-19 04%3A29%3A42
> --
> Michael

Yes, it is the same error. Here's another one in version 13:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=schnauzer=2023-05-18%2022:37:49

--
Marina Polyakova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Re: Conflict between regression tests namespace & transactions due to recent changes
On 2023-05-16 02:19, Michael Paquier wrote:
> On Mon, May 15, 2023 at 11:23:18PM +0300, Marina Polyakova wrote:
>> Maybe use a separate schema for all new objects in the transaction
>> test?.. See diff_set_tx_schema.patch.
>
> Sure, you could do that to bypass the failure (without the "public"
> actually?), leaving non-generic names around. Still I'd agree with Tom
> here and just rename the objects to something more in line with the
> context of the test to make things a bit more greppable. These could be
> renamed as transaction_tab or transaction_view, for example.
> --
> Michael

It confuses me a little that different methods are used for the same
purpose. But the namespace test checks schemas. So see
diff_abc_to_txn_table.patch which replaces abc with txn_table in the
transaction test.

--
Marina Polyakova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

diff --git a/src/test/regress/expected/transactions.out b/src/test/regress/expected/transactions.out
index 2b2cff7d9120a139532d1a6a2f2bc79e463bc4fa..21efb900ef1bc30338f1102f6a40f1d25be0db9c 100644
--- a/src/test/regress/expected/transactions.out
+++ b/src/test/regress/expected/transactions.out
@@ -609,10 +609,10 @@ drop function inverse(int);
 -- performed in the aborted subtransaction
 begin;
 savepoint x;
-create table abc (a int);
-insert into abc values (5);
-insert into abc values (10);
-declare foo cursor for select * from abc;
+create table txn_table (a int);
+insert into txn_table values (5);
+insert into txn_table values (10);
+declare foo cursor for select * from txn_table;
 fetch from foo;
  a
---
@@ -625,11 +625,11 @@ fetch from foo;
 ERROR: cursor "foo" does not exist
 commit;
 begin;
-create table abc (a int);
-insert into abc values (5);
-insert into abc values (10);
-insert into abc values (15);
-declare foo cursor for select * from abc;
+create table txn_table (a int);
+insert into txn_table values (5);
+insert into txn_table values (10);
+insert into txn_table values (15);
+declare foo cursor for select * from txn_table;
 fetch from foo;
  a
---
@@ -698,7 +698,7 @@ COMMIT;
 DROP FUNCTION create_temp_tab();
 DROP FUNCTION invert(x float8);
 -- Tests for AND CHAIN
-CREATE TABLE abc (a int);
+CREATE TABLE txn_table (a int);
 -- set nondefault value so we have something to override below
 SET default_transaction_read_only = on;
 START TRANSACTION ISOLATION LEVEL REPEATABLE READ, READ WRITE, DEFERRABLE;
@@ -720,8 +720,8 @@ SHOW transaction_deferrable;
 on
 (1 row)
 
-INSERT INTO abc VALUES (1);
-INSERT INTO abc VALUES (2);
+INSERT INTO txn_table VALUES (1);
+INSERT INTO txn_table VALUES (2);
 COMMIT AND CHAIN; -- TBLOCK_END
 SHOW transaction_isolation;
 transaction_isolation
@@ -741,11 +741,11 @@ SHOW transaction_deferrable;
 on
 (1 row)
 
-INSERT INTO abc VALUES ('error');
+INSERT INTO txn_table VALUES ('error');
 ERROR: invalid input syntax for type integer: "error"
-LINE 1: INSERT INTO abc VALUES ('error');
-                               ^
-INSERT INTO abc VALUES (3); -- check it's really aborted
+LINE 1: INSERT INTO txn_table VALUES ('error');
+                                     ^
+INSERT INTO txn_table VALUES (3); -- check it's really aborted
 ERROR: current transaction is aborted, commands ignored until end of transaction block
 COMMIT AND CHAIN; -- TBLOCK_ABORT_END
 SHOW transaction_isolation;
@@ -766,7 +766,7 @@ SHOW transaction_deferrable;
 on
 (1 row)
 
-INSERT INTO abc VALUES (4);
+INSERT INTO txn_table VALUES (4);
 COMMIT;
 START TRANSACTION ISOLATION LEVEL REPEATABLE READ, READ WRITE, DEFERRABLE;
 SHOW transaction_isolation;
@@ -788,10 +788,10 @@ SHOW transaction_deferrable;
 (1 row)
 
 SAVEPOINT x;
-INSERT INTO abc VALUES ('error');
+INSERT INTO txn_table VALUES ('error');
 ERROR: invalid input syntax for type integer: "error"
-LINE 1: INSERT INTO abc VALUES ('error');
-                               ^
+LINE 1: INSERT INTO txn_table VALUES ('error');
+                                     ^
 COMMIT AND CHAIN; -- TBLOCK_ABORT_PENDING
 SHOW transaction_isolation;
 transaction_isolation
@@ -811,7 +811,7 @@ SHOW transaction_deferrable;
 on
 (1 row)
 
-INSERT INTO abc VALUES (5);
+INSERT INTO txn_table VALUES (5);
 COMMIT;
 START TRANSACTION ISOLATION LEVEL REPEATABLE READ, READ WRITE, DEFERRABLE;
 SHOW transaction_isolation;
@@ -873,7 +873,7 @@ SHOW transaction_deferrable;
 off
 (1 row)
 
-INSERT INTO abc VALUES (6);
+INSERT INTO txn_table VALUES (6);
 ROLLBACK AND CHAIN; -- TBLOCK_ABORT_PENDING
 SHOW transaction_isolation;
 transaction_isolation
@@ -893,10 +893,10 @@ SHOW transaction_deferrable;
 off
 (1 row)
 
-INSERT INTO abc VALUES ('error');
+INSERT INTO txn_table VALUES ('error');
 ERROR: invalid input syntax for type integer: "error"
-LINE 1: INSERT INTO abc VALUES ('error');
-                               ^
+LINE 1: INSERT INTO txn_table VALUES ('error');
+                                     ^
 ROLLBACK AND CHAIN
Re: Conflict between regression tests namespace & transactions due to recent changes
On 2023-05-15 19:16, Tom Lane wrote:
> Marina Polyakova writes:
>> IIUC the conflict was caused by
>>
>> +SET search_path to public, test_ns_schema_1;
>> +CREATE SCHEMA test_ns_schema_2
>> +       CREATE VIEW abc_view AS SELECT a FROM abc;
>>
>> because the parallel regression test transactions had already created
>> the table abc and was trying to drop it.
>
> Hmm. I'd actually pin the blame on transactions.sql here. Creating a
> table named as generically as "abc" is horribly bad practice in a set
> of concurrent tests. namespace.sql is arguably okay, since it's
> creating that table name in a private schema. I'd be inclined to fix
> this by doing s/abc/something-else/g in transactions.sql.
>
> regards, tom lane

Maybe use a separate schema for all new objects in the transaction
test?.. See diff_set_tx_schema.patch.

--
Marina Polyakova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

diff --git a/src/test/regress/expected/transactions.out b/src/test/regress/expected/transactions.out
index 2b2cff7d9120a139532d1a6a2f2bc79e463bc4fa..6a3758bafb54327fbe341dfd89457e53ffce47dd 100644
--- a/src/test/regress/expected/transactions.out
+++ b/src/test/regress/expected/transactions.out
@@ -1,6 +1,10 @@
 --
 -- TRANSACTIONS
 --
+-- Use your schema to avoid conflicts with objects with the same names in
+-- parallel tests.
+CREATE SCHEMA test_tx_schema;
+SET search_path TO test_tx_schema,public;
 BEGIN;
 CREATE TABLE xacttest (a smallint, b real);
 INSERT INTO xacttest VALUES
diff --git a/src/test/regress/sql/transactions.sql b/src/test/regress/sql/transactions.sql
index 7ee5f6aaa55a686f5484664c3fae224c08bde42a..0e6d6453edaa776425524d43abf5ae3cfca3ac45 100644
--- a/src/test/regress/sql/transactions.sql
+++ b/src/test/regress/sql/transactions.sql
@@ -2,6 +2,11 @@
 -- TRANSACTIONS
 --
 
+-- Use your schema to avoid conflicts with objects with the same names in
+-- parallel tests.
+CREATE SCHEMA test_tx_schema;
+SET search_path TO test_tx_schema,public;
+
 BEGIN;
 CREATE TABLE xacttest (a smallint, b real);
Conflict between regression tests namespace & transactions due to recent changes
Hello, hackers!

When running tests for version 15, we found a conflict between the
regression tests namespace & transactions due to recent changes [1].

diff -w -U3 .../src/test/regress/expected/transactions.out .../src/bin/pg_upgrade/tmp_check/results/transactions.out
--- .../src/test/regress/expected/transactions.out ...
+++ .../src/bin/pg_upgrade/tmp_check/results/transactions.out ...
@@ -899,6 +899,9 @@
 RESET default_transaction_read_only;
 DROP TABLE abc;
+ERROR: cannot drop table abc because other objects depend on it
+DETAIL: view test_ns_schema_2.abc_view depends on table abc
+HINT: Use DROP ... CASCADE to drop the dependent objects too.
 -- Test assorted behaviors around the implicit transaction block created
 -- when multiple SQL commands are sent in a single Query message. These
 -- tests rely on the fact that psql will not break SQL commands apart at a
...

IIUC the conflict was caused by

+SET search_path to public, test_ns_schema_1;
+CREATE SCHEMA test_ns_schema_2
+       CREATE VIEW abc_view AS SELECT a FROM abc;

because the parallel regression test transactions had already created the
table abc and was trying to drop it. ISTM the patch diff.patch fixes this
problem...
[1] https://github.com/postgres/postgres/commit/dbd5795e7539ec9e15c0d4ed2d05b1b18d2a3b09

--
Marina Polyakova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

diff --git a/src/test/regress/expected/namespace.out b/src/test/regress/expected/namespace.out
index a62fd8ded015e6ee56784e4341e5ac1a4ed5621f..bcb6a75cf230fdc30f728201c2a2baef1838ed18 100644
--- a/src/test/regress/expected/namespace.out
+++ b/src/test/regress/expected/namespace.out
@@ -35,10 +35,15 @@ SHOW search_path;
 -- verify that the correct search_path preserved
 -- after creating the schema and on commit
+-- create your table to not use the same table from transactions test
 BEGIN;
 SET search_path to public, test_ns_schema_1;
 CREATE SCHEMA test_ns_schema_2
-       CREATE VIEW abc_view AS SELECT a FROM abc;
+       CREATE VIEW abc_view AS SELECT a FROM abc
+       CREATE TABLE abc (
+               a serial,
+               b int UNIQUE
+       );
 SHOW search_path;
  search_path
--
@@ -53,7 +58,9 @@ SHOW search_path;
 (1 row)
 
 DROP SCHEMA test_ns_schema_2 CASCADE;
-NOTICE: drop cascades to view test_ns_schema_2.abc_view
+NOTICE: drop cascades to 2 other objects
+DETAIL: drop cascades to table test_ns_schema_2.abc
+drop cascades to view test_ns_schema_2.abc_view
 -- verify that the objects were created
 SELECT COUNT(*) FROM pg_class WHERE relnamespace =
 (SELECT oid FROM pg_namespace WHERE nspname = 'test_ns_schema_1');
diff --git a/src/test/regress/sql/namespace.sql b/src/test/regress/sql/namespace.sql
index 3474f5ecf4215068fd89ec52f9ee8b95ddff36bb..95b9ed7b00807222e94e30221348f58039c63257 100644
--- a/src/test/regress/sql/namespace.sql
+++ b/src/test/regress/sql/namespace.sql
@@ -28,10 +28,16 @@ SHOW search_path;
 -- verify that the correct search_path preserved
 -- after creating the schema and on commit
+-- create your table to not use the same table from transactions test
 BEGIN;
 SET search_path to public, test_ns_schema_1;
 CREATE SCHEMA test_ns_schema_2
-       CREATE VIEW abc_view AS SELECT a FROM abc;
+       CREATE VIEW abc_view AS SELECT a FROM abc
+       CREATE TABLE abc (
+               a serial,
+               b int UNIQUE
+       );
 SHOW search_path;
 COMMIT;
 SHOW search_path;
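The collision at the heart of this thread is easy to reproduce outside the database. As a loose analogy (directories standing in for tables in a shared namespace; nothing here touches pg_regress, and all names are invented), two concurrent tests that pick the same generic object name interfere, while test-prefixed names cannot:

```shell
# Toy analogy: mkdir/rmdir stand in for CREATE/DROP of an object in a
# namespace shared by parallel tests.
set -u
tmp=$(mktemp -d)
cd "$tmp"

create() { mkdir "$1" 2>/dev/null; }   # fails if the name is already taken
drop()   { rmdir "$1"; }

create abc && echo "test A: created abc"
create abc || echo "test B: cannot create abc, name already in use"
drop abc

# Names scoped per test (in the spirit of the pushed trans_abc rename)
# cannot collide:
create transactions_abc
create namespace_abc
echo "prefixed names: no conflict"
```

This is the rationale behind both proposed fixes: either give each test its own schema (a private namespace), or, as was committed, give the objects test-specific, greppable names.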
Re: Fix order of checking ICU options in initdb and create database
On 2022-11-12 22:43, Jose Arthur Benetasso Villanova wrote:
> Hello Marina. I just reviewed your patch. It applied cleanly at my
> current master (commit d6a3dbe14f98d867b2fc3faeb99d2d3c2a48ca67). Also,
> it worked as described in email. Since it's a clarification in an error
> message, I think the documentation is fine. I played a bit with "make
> check", creating a database in my native language (pt_BR), testing with
> some data and everything worked as expected.

Hello! Thank you!

--
Marina Polyakova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Re: Fix order of checking ICU options in initdb and create database
On 2022-10-29 14:33, Marina Polyakova wrote:
> Hello! This is the last proposed patch on this subject [1] moved to a
> separate thread for Commitfest.

Also added a patch to export with_icu when running src/bin/scripts tests
[1]. The problem can be reproduced as

$ meson setup && ninja && meson test --print-errorlogs --suite setup --suite scripts

[1] https://cirrus-ci.com/task/4825664661487616

--
Marina Polyakova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

From a6de5922fc533c88c7288051d8797d021ae91dae Mon Sep 17 00:00:00 2001
From: Marina Polyakova
Date: Sat, 29 Oct 2022 14:25:05 +0300
Subject: [PATCH v1 2/2] Fix order of checking ICU options in initdb and
 create database

First check if ICU is supported in this build. Then check that the
selected encoding is supported BY ICU (the encoding can be set from
lc_ctype which can be set from an environment variable). Only then check
the option for the ICU locale.
---
 src/backend/commands/dbcommands.c |  6 ++
 src/backend/utils/adt/pg_locale.c | 10 ++---
 src/bin/initdb/initdb.c           | 36 ++-
 src/bin/initdb/t/001_initdb.pl    | 21 ++
 src/bin/scripts/t/020_createdb.pl | 31 ++
 src/include/utils/pg_locale.h     |  2 +-
 6 files changed, 69 insertions(+), 37 deletions(-)

diff --git a/src/backend/commands/dbcommands.c b/src/backend/commands/dbcommands.c
index 16e422138b..4c1e79fca3 100644
--- a/src/backend/commands/dbcommands.c
+++ b/src/backend/commands/dbcommands.c
@@ -1030,6 +1030,11 @@ createdb(ParseState *pstate, const CreatedbStmt *stmt)
 	if (dblocprovider == COLLPROVIDER_ICU)
 	{
+#ifndef USE_ICU
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("ICU is not supported in this build")));
+#else
 		if (!(is_encoding_supported_by_icu(encoding)))
 			ereport(ERROR,
 					(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
@@ -1046,6 +1051,7 @@ createdb(ParseState *pstate, const CreatedbStmt *stmt)
 				 errmsg("ICU locale must be specified")));
 
 		check_icu_locale(dbiculocale);
+#endif
 	}
 	else
 	{
diff --git a/src/backend/utils/adt/pg_locale.c b/src/backend/utils/adt/pg_locale.c
index 2b42d9ccd8..743f11e1d1 100644
--- a/src/backend/utils/adt/pg_locale.c
+++ b/src/backend/utils/adt/pg_locale.c
@@ -1945,15 +1945,12 @@ icu_set_collation_attributes(UCollator *collator, const char *loc)
 	}
 }
 
-#endif							/* USE_ICU */
-
 /*
  * Check if the given locale ID is valid, and ereport(ERROR) if it isn't.
  */
 void
 check_icu_locale(const char *icu_locale)
 {
-#ifdef USE_ICU
 	UCollator  *collator;
 	UErrorCode	status;
 
@@ -1967,13 +1964,10 @@ check_icu_locale(const char *icu_locale)
 	if (U_ICU_VERSION_MAJOR_NUM < 54)
 		icu_set_collation_attributes(collator, icu_locale);
 	ucol_close(collator);
-#else
-	ereport(ERROR,
-			(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-			 errmsg("ICU is not supported in this build")));
-#endif
 }
 
+#endif							/* USE_ICU */
+
 /*
  * These functions convert from/to libc's wchar_t, *not* pg_wchar_t.
  * Therefore we keep them here rather than with the mbutils code.
diff --git a/src/bin/initdb/initdb.c b/src/bin/initdb/initdb.c
index f61a043055..82e6644f89 100644
--- a/src/bin/initdb/initdb.c
+++ b/src/bin/initdb/initdb.c
@@ -2047,9 +2047,12 @@ check_locale_encoding(const char *locale, int user_enc)
  *
  * this should match the similar check in the backend createdb() function
  */
-static bool
+static void
 check_icu_locale_encoding(int user_enc)
 {
+#ifndef USE_ICU
+	pg_fatal("ICU is not supported in this build");
+#else
 	if (!(is_encoding_supported_by_icu(user_enc)))
 	{
 		pg_log_error("encoding mismatch");
@@ -2058,9 +2061,17 @@ check_icu_locale_encoding(int user_enc)
 		pg_log_error_hint("Rerun %s and either do not specify an encoding explicitly, "
 						  "or choose a matching combination.",
 						  progname);
-		return false;
+		exit(1);
 	}
-	return true;
+
+	if (!icu_locale)
+		pg_fatal("ICU locale must be specified");
+
+	/*
+	 * In supported builds, the ICU locale ID will be checked by the backend
+	 * during post-bootstrap initialization.
+	 */
+#endif
 }
 
 /*
@@ -2113,20 +2124,6 @@ setlocales(void)
 	check_locale_name(LC_CTYPE, lc_messages, );
 	lc_messages = canonname;
 #endif
-
-	if (locale_provider == COLLPROVIDER_ICU)
-	{
-		if (!icu_locale)
-			pg_fatal("ICU locale must be specified");
-
-		/*
-		 * In supported builds, the ICU locale ID will be checked by the
-		 * backend during post-bootstrap initialization.
-		 */
-#ifndef USE_ICU
-		pg_fatal("ICU is not supported in this build");
-#endif
-	}
 }
 
 /*
@@ -2388,9 +2385,8 @@ setup_locale_encoding(void)
 		!check_locale_encoding(lc_collate, encodingid))
 		exit(1);				/* check_locale_encoding printed the error */
 
-	if (locale_provider == COLLPROVIDER_ICU &&
-		!check_icu_locale_enc
Fix order of checking ICU options in initdb and create database
Hello! This is the last proposed patch on this subject [1] moved to a
separate thread for Commitfest.

It looks like the current order of checking ICU options in initdb and
create database in PG 15+ is not user-friendly. Examples:

1. initdb reports a missing ICU locale although it may already report that
the selected encoding cannot be used:

$ initdb --encoding sql-ascii --locale-provider icu hoge
...
initdb: error: ICU locale must be specified

$ initdb --encoding sql-ascii --locale-provider icu --icu-locale en-US hoge
...
initdb: error: encoding mismatch
initdb: detail: The encoding you selected (SQL_ASCII) is not supported with the ICU provider.
initdb: hint: Rerun initdb and either do not specify an encoding explicitly, or choose a matching combination.

2. initdb/create database report problems with the ICU locale/encoding
although they may already report that ICU is not supported in this build:

2.1.

$ initdb --locale-provider icu hoge
...
initdb: error: ICU locale must be specified

$ initdb --locale-provider icu --icu-locale en-US hoge
...
initdb: error: ICU is not supported in this build

$ createdb --locale-provider icu hoge
createdb: error: database creation failed: ERROR: ICU locale must be specified

$ createdb --locale-provider icu --icu-locale en-US hoge
createdb: error: database creation failed: ERROR: ICU is not supported in this build

2.2.

$ createdb --locale-provider icu --icu-locale en-US --encoding sql-ascii hoge
createdb: error: database creation failed: ERROR: encoding "SQL_ASCII" is not supported with ICU provider

$ createdb --locale-provider icu --icu-locale en-US --encoding utf8 hoge
createdb: error: database creation failed: ERROR: ICU is not supported in this build

The patch v1-0001-Fix-order-of-checking-ICU-options-in-initdb-and-c.patch
fixes this:

1.

$ initdb --encoding sql-ascii --locale-provider icu hoge
...
initdb: error: encoding mismatch
initdb: detail: The encoding you selected (SQL_ASCII) is not supported with the ICU provider.
initdb: hint: Rerun initdb and either do not specify an encoding explicitly, or choose a matching combination.

2.1.

$ initdb --locale-provider icu hoge
...
initdb: error: ICU is not supported in this build

$ createdb --locale-provider icu hoge
createdb: error: database creation failed: ERROR: ICU is not supported in this build

2.2.

$ createdb --locale-provider icu --icu-locale en-US --encoding sql-ascii hoge
createdb: error: database creation failed: ERROR: ICU is not supported in this build

A side effect of the proposed patch in initdb is that if the ICU locale is
missing (or ICU is not supported in this build), the provider, locales and
encoding are reported before the error message:

$ initdb --locale-provider icu hoge
The files belonging to this database system will be owned by user "marina".
This user must also own the server process.

The database cluster will be initialized with this locale configuration:
  provider:    icu
  LC_COLLATE:  en_US.UTF-8
  LC_CTYPE:    en_US.UTF-8
  LC_MESSAGES: en_US.UTF-8
  LC_MONETARY: ru_RU.UTF-8
  LC_NUMERIC:  ru_RU.UTF-8
  LC_TIME:     ru_RU.UTF-8
The default database encoding has been set to "UTF8".
initdb: error: ICU locale must be specified

$ initdb --locale-provider icu hoge
The files belonging to this database system will be owned by user "marina".
This user must also own the server process.

The database cluster will be initialized with this locale configuration:
  provider:    icu
  LC_COLLATE:  en_US.UTF-8
  LC_CTYPE:    en_US.UTF-8
  LC_MESSAGES: en_US.UTF-8
  LC_MONETARY: ru_RU.UTF-8
  LC_NUMERIC:  ru_RU.UTF-8
  LC_TIME:     ru_RU.UTF-8
The default database encoding has been set to "UTF8".
initdb: error: ICU is not supported in this build

I was thinking about another master-only version of the patch that first
checks everything for provider, locales and encoding, but IMO it is worse
[2]...
[1] https://www.postgresql.org/message-id/e94aca035bf0b92fac42d204ad385552%40postgrespro.ru
[2] https://www.postgresql.org/message-id/79f410460c4fc9534000785adb8bf39a%40postgrespro.ru

--
Marina Polyakova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

From da761095a7e39463ec6083484c4b1b924cacf387 Mon Sep 17 00:00:00 2001
From: Marina Polyakova
Date: Sat, 29 Oct 2022 14:25:05 +0300
Subject: [PATCH v1] Fix order of checking ICU options in initdb and create
 database

First check if ICU is supported in this build. Then check that the
selected encoding is supported BY ICU (the encoding can be set from
lc_ctype which can be set from an environment variable). Only then check
the option for the ICU locale.
---
 src/backend/commands/dbcommands.c |  6 ++
 src/backend/utils/adt/pg_locale.c | 10 ++---
 src/bin/initdb/initdb.c           | 36 ++-
 src/bin/initdb/t/001_initdb.pl    | 21 ++
 src/bin/scripts/t/020_create
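The ordering the patch establishes can be sketched as a small shell function. The function, its arguments, and the simplified messages are invented stand-ins for the real initdb/createdb checks (the real encoding test is is_encoding_supported_by_icu, not a literal SQL_ASCII comparison):

```shell
# Hypothetical sketch of the reordered checks: build support first, then
# the encoding, and only then the --icu-locale option.
set -u
check_icu_options() {
    # $1 = build has ICU (yes/no), $2 = encoding, $3 = ICU locale (may be empty)
    [ "$1" = yes ] || { echo "error: ICU is not supported in this build"; return 1; }
    case "$2" in
        SQL_ASCII) echo "error: encoding \"$2\" is not supported with ICU provider"; return 1 ;;
    esac
    [ -n "$3" ] || { echo "error: ICU locale must be specified"; return 1; }
    echo "ok"
}

# Build without ICU: the build error wins, even though the locale is also missing
check_icu_options no UTF8 ""
# Build with ICU but unsupported encoding: the encoding error comes before the locale one
check_icu_options yes SQL_ASCII ""
# Only when the build and encoding checks pass is the missing locale reported
check_icu_options yes UTF8 ""
# All three satisfied
check_icu_options yes UTF8 en-US
```

The point of the reorder is that each reported error is the first one the user can actually do nothing about by fixing a later option, which matches the examples shown above.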
Fix database creation during installchecks for ICU cluster
Hello! This is a copy of [1] moved to a separate thread for Commitfest.

I discovered an interesting behaviour during installcheck runs on PG 15+
when the cluster was initialized with the ICU locale provider:

$ initdb --locale-provider icu --icu-locale en-US -D data && pg_ctl -D data -l logfile start

1) The ECPG tests fail because they use the SQL_ASCII encoding [2], the
database template0 uses the ICU locale provider, and SQL_ASCII is not
supported by ICU:

$ make -C src/interfaces/ecpg/ installcheck
...
== creating database "ecpg1_regression" ==
ERROR: encoding "SQL_ASCII" is not supported with ICU provider
ERROR: database "ecpg1_regression" does not exist
command failed: "/home/marina/postgresql/master/my/inst/bin/psql" -X -c "CREATE DATABASE \"ecpg1_regression\" TEMPLATE=template0 ENCODING='SQL_ASCII'" -c "ALTER DATABASE \"ecpg1_regression\" SET lc_messages TO 'C';ALTER DATABASE \"ecpg1_regression\" SET lc_monetary TO 'C';ALTER DATABASE \"ecpg1_regression\" SET lc_numeric TO 'C';ALTER DATABASE \"ecpg1_regression\" SET lc_time TO 'C';ALTER DATABASE \"ecpg1_regression\" SET bytea_output TO 'hex';ALTER DATABASE \"ecpg1_regression\" SET timezone_abbreviations TO 'Default';" "postgres"

2) The option --no-locale in pg_regress is described as "use C locale" [3].
But in this case the created databases actually use the ICU locale provider
with the ICU cluster locale from template0 (see
diff_check_backend_used_provider.txt):

$ make NO_LOCALE=1 installcheck

In regression.diffs:

diff -U3 /home/marina/postgresql/master/src/test/regress/expected/test_setup.out /home/marina/postgresql/master/src/test/regress/results/test_setup.out
--- /home/marina/postgresql/master/src/test/regress/expected/test_setup.out	2022-09-27 05:31:27.674628815 +0300
+++ /home/marina/postgresql/master/src/test/regress/results/test_setup.out	2022-10-21 15:09:31.232992885 +0300
@@ -143,6 +143,798 @@
 \set filename :abs_srcdir '/data/person.data'
 COPY person FROM :'filename';
 VACUUM ANALYZE person;
+NOTICE: varstrfastcmp_locale sss->collate_c 0 sss->locale 0xefacd0
+NOTICE: varstrfastcmp_locale sss->locale->provider i
+NOTICE: varstrfastcmp_locale sss->locale->info.icu.locale en-US
...

The patch v1-0001-Fix-database-creation-during-installchecks-for-IC.patch
fixes both issues for me.
[1] https://www.postgresql.org/message-id/727b5d5160f845dcf5e0818e625a6e56%40postgrespro.ru
[2] https://github.com/postgres/postgres/blob/ce20f8b9f4354b46b40fd6ebf7ce5c37d08747e0/src/interfaces/ecpg/test/Makefile#L18
[3] https://github.com/postgres/postgres/blob/ce20f8b9f4354b46b40fd6ebf7ce5c37d08747e0/src/test/regress/pg_regress.c#L1992

--
Marina Polyakova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

diff --git a/src/backend/access/hash/hashfunc.c b/src/backend/access/hash/hashfunc.c
index b57ed946c42bb54ede800e95045aa937a8dbad85..b3c0f6f753f8428274389844ccf9778a7ed47ea4 100644
--- a/src/backend/access/hash/hashfunc.c
+++ b/src/backend/access/hash/hashfunc.c
@@ -281,6 +281,14 @@ hashtext(PG_FUNCTION_ARGS)
 	if (!lc_collate_is_c(collid))
 		mylocale = pg_newlocale_from_collation(collid);
 
+	elog(NOTICE, "hashtext lc_collate_is_c(collid) %d mylocale %p", lc_collate_is_c(collid), mylocale);
+	if (mylocale)
+	{
+		elog(NOTICE, "hashtext mylocale->provider %c", mylocale->provider);
+		if (mylocale->provider == COLLPROVIDER_ICU)
+			elog(NOTICE, "hashtext mylocale->info.icu.locale %s", mylocale->info.icu.locale ? mylocale->info.icu.locale : "(null)");
+	}
+
 	if (!mylocale || mylocale->deterministic)
 	{
 		result = hash_any((unsigned char *) VARDATA_ANY(key),
@@ -337,6 +345,14 @@ hashtextextended(PG_FUNCTION_ARGS)
 	if (!lc_collate_is_c(collid))
 		mylocale = pg_newlocale_from_collation(collid);
 
+	elog(NOTICE, "hashtextextended lc_collate_is_c(collid) %d mylocale %p", lc_collate_is_c(collid), mylocale);
+	if (mylocale)
+	{
+		elog(NOTICE, "hashtextextended mylocale->provider %c", mylocale->provider);
+		if (mylocale->provider == COLLPROVIDER_ICU)
+			elog(NOTICE, "hashtextextended mylocale->info.icu.locale %s", mylocale->info.icu.locale ? mylocale->info.icu.locale : "(null)");
+	}
+
 	if (!mylocale || mylocale->deterministic)
 	{
 		result = hash_any_extended((unsigned char *) VARDATA_ANY(key),
diff --git a/src/backend/regex/regc_pg_locale.c b/src/backend/regex/regc_pg_locale.c
index 02d462a659778016f3c4479d425ba0a84feb6e26..9627c84a7ccfb4c4013556a51c989e9e6d611634 100644
--- a/src/backend/regex/regc_pg_locale.c
+++ b/src/backend/regex/regc_pg_locale.c
@@ -243,6 +243,8 @@ pg_set_regex_collation(Oid collation)
 			 errhint("Use the COLLATE clause to
Re: ICU for global collation
Hello! I discovered an interesting behaviour during installcheck runs when the cluster was initialized with ICU locale provider: $ initdb --locale-provider icu --icu-locale en-US -D data && pg_ctl -D data -l logfile start 1) The ECPG tests fail because they use the SQL_ASCII encoding [1], the database template0 uses the ICU locale provider and SQL_ASCII is not supported by ICU: $ make -C src/interfaces/ecpg/ installcheck ... == creating database "ecpg1_regression" == ERROR: encoding "SQL_ASCII" is not supported with ICU provider ERROR: database "ecpg1_regression" does not exist command failed: "/home/marina/postgresql/master/my/inst/bin/psql" -X -c "CREATE DATABASE \"ecpg1_regression\" TEMPLATE=template0 ENCODING='SQL_ASCII'" -c "ALTER DATABASE \"ecpg1_regression\" SET lc_messages TO 'C';ALTER DATABASE \"ecpg1_regression\" SET lc_monetary TO 'C';ALTER DATABASE \"ecpg1_regression\" SET lc_numeric TO 'C';ALTER DATABASE \"ecpg1_regression\" SET lc_time TO 'C';ALTER DATABASE \"ecpg1_regression\" SET bytea_output TO 'hex';ALTER DATABASE \"ecpg1_regression\" SET timezone_abbreviations TO 'Default';" "postgres" 2) The option --no-locale in pg_regress is described as "use C locale" [2]. 
But in this case the created databases actually use the ICU locale provider with the ICU cluster locale from template0 (see diff_check_backend_used_provider.patch): $ make NO_LOCALE=1 installcheck In regression.diffs: diff -U3 /home/marina/postgresql/master/src/test/regress/expected/test_setup.out /home/marina/postgresql/master/src/test/regress/results/test_setup.out --- /home/marina/postgresql/master/src/test/regress/expected/test_setup.out 2022-09-27 05:31:27.674628815 +0300 +++ /home/marina/postgresql/master/src/test/regress/results/test_setup.out 2022-10-21 15:09:31.232992885 +0300 @@ -143,6 +143,798 @@ \set filename :abs_srcdir '/data/person.data' COPY person FROM :'filename'; VACUUM ANALYZE person; +NOTICE: varstrfastcmp_locale sss->collate_c 0 sss->locale 0xefacd0 +NOTICE: varstrfastcmp_locale sss->locale->provider i +NOTICE: varstrfastcmp_locale sss->locale->info.icu.locale en-US ... The patch diff_fix_pg_regress_create_database.patch fixes both issues for me. [1] https://github.com/postgres/postgres/blob/ce20f8b9f4354b46b40fd6ebf7ce5c37d08747e0/src/interfaces/ecpg/test/Makefile#L18 [2] https://github.com/postgres/postgres/blob/ce20f8b9f4354b46b40fd6ebf7ce5c37d08747e0/src/test/regress/pg_regress.c#L1992 -- Marina Polyakova Postgres Professional: http://www.postgrespro.com The Russian Postgres Companydiff --git a/src/backend/access/hash/hashfunc.c b/src/backend/access/hash/hashfunc.c index b57ed946c42bb54ede800e95045aa937a8dbad85..b3c0f6f753f8428274389844ccf9778a7ed47ea4 100644 --- a/src/backend/access/hash/hashfunc.c +++ b/src/backend/access/hash/hashfunc.c @@ -281,6 +281,14 @@ hashtext(PG_FUNCTION_ARGS) if (!lc_collate_is_c(collid)) mylocale = pg_newlocale_from_collation(collid); + elog(NOTICE, "hashtext lc_collate_is_c(collid) %d mylocale %p", lc_collate_is_c(collid), mylocale); + if (mylocale) + { + elog(NOTICE, "hashtext mylocale->provider %c", mylocale->provider); + if (mylocale->provider == COLLPROVIDER_ICU) + elog(NOTICE, "hashtext 
mylocale->info.icu.locale %s", mylocale->info.icu.locale ? mylocale->info.icu.locale : "(null)"); + } + if (!mylocale || mylocale->deterministic) { result = hash_any((unsigned char *) VARDATA_ANY(key), @@ -337,6 +345,14 @@ hashtextextended(PG_FUNCTION_ARGS) if (!lc_collate_is_c(collid)) mylocale = pg_newlocale_from_collation(collid); + elog(NOTICE, "hashtextextended lc_collate_is_c(collid) %d mylocale %p", lc_collate_is_c(collid), mylocale); + if (mylocale) + { + elog(NOTICE, "hashtextextended mylocale->provider %c", mylocale->provider); + if (mylocale->provider == COLLPROVIDER_ICU) + elog(NOTICE, "hashtextextended mylocale->info.icu.locale %s", mylocale->info.icu.locale ? mylocale->info.icu.locale : "(null)"); + } + if (!mylocale || mylocale->deterministic) { result = hash_any_extended((unsigned char *) VARDATA_ANY(key), diff --git a/src/backend/regex/regc_pg_locale.c b/src/backend/regex/regc_pg_locale.c index 02d462a659778016f3c4479d425ba0a84feb6e26..9627c84a7ccfb4c4013556a51c989e9e6d611634 100644 --- a/src/backend/regex/regc_pg_locale.c +++ b/src/backend/regex/regc_pg_locale.c @@ -243,6 +243,8 @@ pg_set_regex_collation(Oid collation) errhint("Use the COLLATE clause to set the collation explicitly."))); } + elog(NOTICE, "pg_set_regex_collation lc_ctype_is_c(collid) %d", lc_ctype_is_c(collation)); + if (lc_ctype_is_c(collation)) { /* C/
Re: ICU for global collation
On 2022-10-01 15:07, Peter Eisentraut wrote: On 22.09.22 20:06, Marina Polyakova wrote: On 2022-09-21 17:53, Peter Eisentraut wrote: Committed with that test, thanks. I think that covers all the ICU issues you reported for PG15 for now? I thought about the order of the ICU checks - if it is ok to check that the selected encoding is supported by ICU after printing all the locale & encoding information, why not move almost all the ICU checks here?.. It's possible that we can do better, but I'm not going to add things like that to PG 15 at this point unless it fixes a faulty behavior. Will PG 15 always have this order of ICU checks? Is the current behaviour correct enough? On the other hand, there may be a better fix for PG 16+ and not all changes can be backported... On 2022-09-16 10:56, Peter Eisentraut wrote: On 15.09.22 17:41, Marina Polyakova wrote: I agree with you. Here's another version of the patch. The locale/encoding checks and reports in initdb have been reordered, because now the encoding is set first and only then the ICU locale is checked. I committed something based on the first version of your patch. This reordering of the messages here was a little too much surgery for me at this point. For instance, there are also messages in #ifdef WIN32 code that would need to be reordered as well. I kept the overall structure of the code the same and just inserted the additional proposed checks. If you want to pursue the reordering of the checks and messages overall, a patch for the master branch could be considered. I've worked on this again (see attached patch) but I'm not sure if the encoding mismatch messages are clear enough without the full locale information. 
For

$ initdb -D data --icu-locale en --locale-provider icu

compare the outputs:

The database cluster will be initialized with this locale configuration:
  provider:    icu
  ICU locale:  en
  LC_COLLATE:  de_DE.iso885915@euro
  LC_CTYPE:    de_DE.iso885915@euro
  LC_MESSAGES: en_US.utf8
  LC_MONETARY: de_DE.iso885915@euro
  LC_NUMERIC:  de_DE.iso885915@euro
  LC_TIME:     de_DE.iso885915@euro
The default database encoding has been set to "UTF8".
initdb: error: encoding mismatch
initdb: detail: The encoding you selected (UTF8) and the encoding that the selected locale uses (LATIN9) do not match. This would lead to misbehavior in various character string processing functions.
initdb: hint: Rerun initdb and either do not specify an encoding explicitly, or choose a matching combination.

and

Encoding "UTF8" implied by locale will be set as the default database encoding.
initdb: error: encoding mismatch
initdb: detail: The encoding you selected (UTF8) and the encoding that the selected locale uses (LATIN9) do not match. This would lead to misbehavior in various character string processing functions.
initdb: hint: Rerun initdb and either do not specify an encoding explicitly, or choose a matching combination.

The same without ICU, e.g. for

$ initdb -D data

the output with locale information:

The database cluster will be initialized with this locale configuration:
  provider:    libc
  LC_COLLATE:  en_US.utf8
  LC_CTYPE:    de_DE.iso885915@euro
  LC_MESSAGES: en_US.utf8
  LC_MONETARY: de_DE.iso885915@euro
  LC_NUMERIC:  de_DE.iso885915@euro
  LC_TIME:     de_DE.iso885915@euro
The default database encoding has accordingly been set to "LATIN9".
initdb: error: encoding mismatch
initdb: detail: The encoding you selected (LATIN9) and the encoding that the selected locale uses (UTF8) do not match. This would lead to misbehavior in various character string processing functions.
initdb: hint: Rerun initdb and either do not specify an encoding explicitly, or choose a matching combination.
and the "shorter" version: Encoding "LATIN9" implied by locale will be set as the default database encoding. initdb: error: encoding mismatch initdb: detail: The encoding you selected (LATIN9) and the encoding that the selected locale uses (UTF8) do not match. This would lead to misbehavior in various character string processing functions. initdb: hint: Rerun initdb and either do not specify an encoding explicitly, or choose a matching combination. BTW, what did you mean that "there are also messages in #ifdef WIN32 code that would need to be reordered as well"?.. -- Marina Polyakova Postgres Professional: http://www.postgrespro.com The Russian Postgres Companydiff --git a/src/backend/commands/dbcommands.c b/src/backend/commands/dbcommands.c index 96b46cbc020ef8b85b6d54d3d4ca8ad116277832..242d68f58287aeb6f95619c2ce8b78e38433cf18 100644 --- a/src/backend/commands/dbcommands.c +++ b/src/backend/commands/dbcommands.c @@ -1030,6 +1030,11 @@ createdb(ParseState *pstate, const CreatedbStmt *stmt) if (dblocprovider == COLLPROVIDER_ICU) { +#ifndef USE_ICU + ereport(ERROR, +(errcode(ERRCODE_FEATURE_NOT_SUPPORTE
Re: ICU for global collation
On 2022-09-21 17:53, Peter Eisentraut wrote: Committed with that test, thanks. I think that covers all the ICU issues you reported for PG15 for now?

I thought about the order of the ICU checks - if it is ok to check that the selected encoding is supported by ICU after printing all the locale & encoding information, why not move almost all the ICU checks here?.. Examples of the work of the attached patch:

1. ICU locale vs supported encoding:

1.1.

$ initdb --encoding sql-ascii --locale-provider icu hoge
...
initdb: error: encoding mismatch
initdb: detail: The encoding you selected (SQL_ASCII) is not supported with the ICU provider.
initdb: hint: Rerun initdb and either do not specify an encoding explicitly, or choose a matching combination.

1.2. (like before)

$ initdb --encoding sql-ascii --icu-locale en-US hoge
initdb: error: --icu-locale cannot be specified unless locale provider "icu" is chosen

$ createdb --encoding sql-ascii --icu-locale en-US hoge
createdb: error: database creation failed: ERROR: ICU locale cannot be specified unless locale provider is ICU

2. For builds without ICU:

2.1.

$ initdb --locale-provider icu hoge
...
initdb: error: ICU is not supported in this build

$ createdb --locale-provider icu hoge
createdb: error: database creation failed: ERROR: ICU is not supported in this build

2.2. (like before)

$ initdb --icu-locale en-US hoge
initdb: error: --icu-locale cannot be specified unless locale provider "icu" is chosen

$ createdb --icu-locale en-US hoge
createdb: error: database creation failed: ERROR: ICU locale cannot be specified unless locale provider is ICU

2.3.

$ createdb --locale-provider icu --icu-locale en-US --encoding sql-ascii hoge
createdb: error: database creation failed: ERROR: ICU is not supported in this build

4. About errors in initdb:

4.1. If icu_locale is not specified, but it is required, then we get this:

$ initdb --locale-provider icu hoge
The files belonging to this database system will be owned by user "marina".
This user must also own the server process.

The database cluster will be initialized with this locale configuration:
  provider:    icu
  LC_COLLATE:  en_US.UTF-8
  LC_CTYPE:    en_US.UTF-8
  LC_MESSAGES: en_US.UTF-8
  LC_MONETARY: ru_RU.UTF-8
  LC_NUMERIC:  ru_RU.UTF-8
  LC_TIME:     ru_RU.UTF-8
The default database encoding has been set to "UTF8".
initdb: error: ICU locale must be specified

Almost the same if ICU is not supported in this build:

$ initdb --locale-provider icu hoge
The files belonging to this database system will be owned by user "marina".
This user must also own the server process.

The database cluster will be initialized with this locale configuration:
  provider:    icu
  LC_COLLATE:  en_US.UTF-8
  LC_CTYPE:    en_US.UTF-8
  LC_MESSAGES: en_US.UTF-8
  LC_MONETARY: ru_RU.UTF-8
  LC_NUMERIC:  ru_RU.UTF-8
  LC_TIME:     ru_RU.UTF-8
The default database encoding has been set to "UTF8".
initdb: error: ICU is not supported in this build

4.2. If icu_locale is specified for the wrong provider, the error is reported at program start, as before:

$ initdb --icu-locale en-US hoge
initdb: error: --icu-locale cannot be specified unless locale provider "icu" is chosen

-- Marina Polyakova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
diff --git a/src/backend/commands/dbcommands.c b/src/backend/commands/dbcommands.c index e0753c1badcc299e2bac45f3bdd2f23f59d70cbc..2589a2523e07c9543c99c7d7b446438d62382b89 100644 --- a/src/backend/commands/dbcommands.c +++ b/src/backend/commands/dbcommands.c @@ -1030,6 +1030,11 @@ createdb(ParseState *pstate, const CreatedbStmt *stmt) if (dblocprovider == COLLPROVIDER_ICU) { +#ifndef USE_ICU + ereport(ERROR, +(errcode(ERRCODE_FEATURE_NOT_SUPPORTED), + errmsg("ICU is not supported in this build"))); +#else if (!(is_encoding_supported_by_icu(encoding))) ereport(ERROR, (errcode(ERRCODE_INVALID_PARAMETER_VALUE), @@ -1046,6 +1051,7 @@ createdb(ParseState *pstate, const CreatedbStmt *stmt) errmsg("ICU locale must be specified"))); 
check_icu_locale(dbiculocale); +#endif } else { diff --git a/src/backend/utils/adt/pg_locale.c b/src/backend/utils/adt/pg_locale.c index 2b42d9ccd8b751bcb5cda3a1f4c7803a68bc0a4a..743f11e1d1fd62d500fc2846976472d13e08046b 100644 --- a/src/backend/utils/adt/pg_locale.c +++ b/src/backend/utils/adt/pg_locale.c @@ -1945,15 +1945,12 @@ icu_set_collation_attributes(UCollator *collator, const char *loc) } } -#endif /* USE_ICU */ - /* * Check if the given locale ID is valid, and ereport(ERROR) if it isn't. */ void check_icu_locale(const char *icu_locale) { -#ifdef USE_ICU UCollator *collator; UErrorCode status; @@ -1967,13 +1964,10 @@ check_icu_locale(const char *icu_locale) if (U_ICU_VERSION_MAJOR_NUM < 54) icu_set_collati
Re: ICU for global collation
On 2022-09-20 12:59, Peter Eisentraut wrote: On 17.09.22 10:33, Marina Polyakova wrote: 3. The locale provider is ICU, but it has not yet been set from the template database: $ initdb --locale-provider icu --icu-locale en-US -D data && pg_ctl -D data -l logfile start && createdb --icu-locale ru-RU --template template0 mydb ... createdb: error: database creation failed: ERROR: ICU locale cannot be specified unless locale provider is ICU Please see attached patch for a fix. Does that work for you? Yes, it works. The following test checks this fix: diff --git a/src/bin/scripts/t/020_createdb.pl b/src/bin/scripts/t/020_createdb.pl index b87d8fc63b5246b02bcd4499aae815269b60df7c..c2464a99618cd7ca5616cc21121e1e4379b52baf 100644 --- a/src/bin/scripts/t/020_createdb.pl +++ b/src/bin/scripts/t/020_createdb.pl @@ -71,6 +71,14 @@ if ($ENV{with_icu} eq 'yes') $node2->command_ok( [ 'createdb', '-T', 'template0', '--locale-provider=libc', 'foobar55' ], 'create database with libc provider from template database with icu provider'); + + $node2->command_ok( + [ + 'createdb', '-T', 'template0', '--icu-locale', + 'en-US', 'foobar56' + ], + 'create database with icu locale from template database with icu provider' + ); } else { -- Marina Polyakova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: ICU for global collation
On 2022-09-16 10:56, Peter Eisentraut wrote: On 15.09.22 17:41, Marina Polyakova wrote: I agree with you. Here's another version of the patch. The locale/encoding checks and reports in initdb have been reordered, because now the encoding is set first and only then the ICU locale is checked. I committed something based on the first version of your patch. This reordering of the messages here was a little too much surgery for me at this point. For instance, there are also messages in #ifdef WIN32 code that would need to be reordered as well. I kept the overall structure of the code the same and just inserted the additional proposed checks. If you want to pursue the reordering of the checks and messages overall, a patch for the master branch could be considered. Thank you! I already wrote about the order of the ICU checks in initdb/create database, they were the only reason to propose such changes... -- Marina Polyakova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: ICU for global collation
Thanks to Kyotaro Horiguchi's review, we found out that there are interesting cases due to the order of some ICU checks:

1. ICU locale vs supported encoding:

1.1. On 2022-09-15 09:52, Kyotaro Horiguchi wrote: If I executed initdb as follows, I would be told to specify --icu-locale option.

$ initdb --encoding sql-ascii --locale-provider icu hoge
...
initdb: error: ICU locale must be specified

However, when I reran the command, it complains about incompatible encoding this time. I think it's more user-friendly to check for the encoding compatibility before the check for missing --icu-locale option.

1.2. (ok?)

$ initdb --encoding sql-ascii --icu-locale en-US hoge
initdb: error: --icu-locale cannot be specified unless locale provider "icu" is chosen

$ initdb --encoding sql-ascii --icu-locale en-US --locale-provider icu hoge
...
initdb: error: encoding mismatch
initdb: detail: The encoding you selected (SQL_ASCII) is not supported with the ICU provider.
initdb: hint: Rerun initdb and either do not specify an encoding explicitly, or choose a matching combination.

$ createdb --encoding sql-ascii --icu-locale en-US hoge
createdb: error: database creation failed: ERROR: ICU locale cannot be specified unless locale provider is ICU

$ createdb --encoding sql-ascii --icu-locale en-US --locale-provider icu hoge
createdb: error: database creation failed: ERROR: encoding "SQL_ASCII" is not supported with ICU provider

2. For builds without ICU:

2.1.

$ initdb --locale-provider icu hoge
...
initdb: error: ICU locale must be specified

$ initdb --locale-provider icu --icu-locale en-US hoge
...
initdb: error: ICU is not supported in this build

$ createdb --locale-provider icu hoge
createdb: error: database creation failed: ERROR: ICU locale must be specified

$ createdb --locale-provider icu --icu-locale en-US hoge
createdb: error: database creation failed: ERROR: ICU is not supported in this build

IMO, it would be more user-friendly to report the unsupported build on the first run too..
2.2. (ok?)

$ initdb --icu-locale en-US hoge
initdb: error: --icu-locale cannot be specified unless locale provider "icu" is chosen

$ initdb --icu-locale en-US --locale-provider icu hoge
...
initdb: error: ICU is not supported in this build

$ createdb --icu-locale en-US hoge
createdb: error: database creation failed: ERROR: ICU locale cannot be specified unless locale provider is ICU

$ createdb --icu-locale en-US --locale-provider icu hoge
createdb: error: database creation failed: ERROR: ICU is not supported in this build

2.3.

$ createdb --locale-provider icu --icu-locale en-US --encoding sql-ascii hoge
createdb: error: database creation failed: ERROR: encoding "SQL_ASCII" is not supported with ICU provider

$ createdb --locale-provider icu --icu-locale en-US --encoding utf8 hoge
createdb: error: database creation failed: ERROR: ICU is not supported in this build

IMO, it would be more user-friendly to report the unsupported build on the first run too..

3. The locale provider is ICU, but it has not yet been set from the template database:

$ initdb --locale-provider icu --icu-locale en-US -D data && pg_ctl -D data -l logfile start && createdb --icu-locale ru-RU --template template0 mydb
...
createdb: error: database creation failed: ERROR: ICU locale cannot be specified unless locale provider is ICU

-- Marina Polyakova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
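The ordering questions in cases 1-3 reduce to which check fires first. Below is a minimal Python model (not PostgreSQL source; the function name and the encoding table are invented for illustration) of the ordering argued for in this thread: report an unsupported build first, then encoding compatibility, then the missing ICU locale.

```python
# Hypothetical model of the ICU option checks discussed above; the encoding
# set is illustrative and deliberately incomplete.
ICU_SUPPORTED_ENCODINGS = {"UTF8", "LATIN1", "LATIN9", "EUC_JP"}

def icu_checks(encoding, icu_locale, provider="icu", built_with_icu=True):
    """Return the first error message that would be reported, or None."""
    # --icu-locale with a non-ICU provider is rejected up front, as today.
    if icu_locale is not None and provider != "icu":
        return '--icu-locale cannot be specified unless locale provider "icu" is chosen'
    if provider != "icu":
        return None
    # Proposed ordering: unsupported build first...
    if not built_with_icu:
        return "ICU is not supported in this build"
    # ...then encoding compatibility...
    if encoding not in ICU_SUPPORTED_ENCODINGS:
        return f'encoding "{encoding}" is not supported with ICU provider'
    # ...and only then the missing locale.
    if icu_locale is None:
        return "ICU locale must be specified"
    return None
```

Under this ordering, `initdb --locale-provider icu hoge` on a build without ICU reports the missing ICU support on the very first run (case 2.1), and an unsupported encoding is reported before the missing --icu-locale complaint (case 1.1).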
Re: ICU for global collation
On 2022-09-16 11:11, Kyotaro Horiguchi wrote: At Fri, 16 Sep 2022 09:49:28 +0300, Marina Polyakova wrote in In continuation of options check: AFAICS the following checks in initdb if (locale_provider == COLLPROVIDER_ICU) { if (!icu_locale) pg_fatal("ICU locale must be specified"); /* * In supported builds, the ICU locale ID will be checked by the * backend during post-bootstrap initialization. */ #ifndef USE_ICU pg_fatal("ICU is not supported in this build"); #endif } are executed approximately when they are executed in create database after getting all the necessary data from the template database: initdb doesn't work that way, but anyway, I realized that I am proposing to move that code in setlocales() to the caller function as the result. I don't think setlocales() is the place for the code because icu locale has no business with what the function does. That being said there's no obvious reason we *need* to move the code out to its caller. Excuse me, but could you explain your last sentence in more detail? I read that this code is not for setlocales and then - that it should not be moved from here, so I'm confused... +errmsg("encoding \"%s\" is not supported with ICU provider", + pg_log_error("encoding \"%s\" is not supported with ICU provider", +pg_encoding_to_char(encodingid)); I might be wrong, but the messages look wrong to me. The alternatives below might work. "encoding \"%s\" is not supported by ICU" "encoding \"%s\" cannot be used for/with ICU locales" The message indicates that the selected encoding cannot be used with the ICU provider because it does not support it. But if the text of the message becomes better and clearer, I will only be glad. + pg_log_error_hint("Rerun %s and choose a matching combination.", + progname); This doesn't seem to provide users with useful information. 
It was committed in a more verbose form: pg_log_error_hint("Rerun %s and either do not specify an encoding explicitly, " "or choose a matching combination.", -- Marina Polyakova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: ICU for global collation
On 2022-09-16 10:57, Peter Eisentraut wrote: On 16.09.22 09:31, Marina Polyakova wrote: IMO it is hardly understandable from the program output either - it looks like I manually chose the encoding UTF8. Maybe first inform about selected encoding?.. Yes, I included something like that in the patch just committed. diff --git a/src/bin/initdb/initdb.c b/src/bin/initdb/initdb.c index 6aeec8d426c52414b827686781c245291f27ed1f..348bbbeba0f5bc7ff601912bf883510d580b814c 100644 --- a/src/bin/initdb/initdb.c +++ b/src/bin/initdb/initdb.c @@ -2310,7 +2310,11 @@ setup_locale_encoding(void) } if (!encoding && locale_provider == COLLPROVIDER_ICU) + { encodingid = PG_UTF8; + printf(_("The default database encoding has been set to \"%s\" for a better experience with the ICU provider.\n"), + pg_encoding_to_char(encodingid)); + } else if (!encoding) { int ctype_enc; Thank you! -- Marina Polyakova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: ICU for global collation
On 2022-09-16 07:55, Kyotaro Horiguchi wrote: At Thu, 15 Sep 2022 18:41:31 +0300, Marina Polyakova wrote in P.S. While working on the patch, I discovered that UTF8 encoding is always used for the ICU provider in initdb unless it is explicitly specified by the user: if (!encoding && locale_provider == COLLPROVIDER_ICU) encodingid = PG_UTF8; IMO this creates additional errors for locales with other encodings: $ initdb --locale de_DE.iso885915@euro --locale-provider icu --icu-locale de-DE ... initdb: error: encoding mismatch initdb: detail: The encoding you selected (UTF8) and the encoding that the selected locale uses (LATIN9) do not match. This would lead to misbehavior in various character string processing functions. initdb: hint: Rerun initdb and either do not specify an encoding explicitly, or choose a matching combination. And ICU supports many encodings, see the contents of pg_enc2icu_tbl in encnames.c... It seems to me the best default that fits almost all cases using icu locales. So, we need to specify encoding explicitly in that case. $ initdb --encoding iso-8859-15 --locale de_DE.iso885915@euro --locale-provider icu --icu-locale de-DE However, I think it is hardly understandable from the documentation. (I checked this using euc-jp [1] so it might be wrong..) [1] initdb --encoding euc-jp --locale ja_JP.eucjp --locale-provider icu --icu-locale ja-x-icu regards. Thank you! IMO it is hardly understandable from the program output either - it looks like I manually chose the encoding UTF8. Maybe first inform about selected encoding?.. 
diff --git a/src/bin/initdb/initdb.c b/src/bin/initdb/initdb.c index 6aeec8d426c52414b827686781c245291f27ed1f..348bbbeba0f5bc7ff601912bf883510d580b814c 100644 --- a/src/bin/initdb/initdb.c +++ b/src/bin/initdb/initdb.c @@ -2310,7 +2310,11 @@ setup_locale_encoding(void) } if (!encoding && locale_provider == COLLPROVIDER_ICU) + { encodingid = PG_UTF8; + printf(_("The default database encoding has been set to \"%s\" for a better experience with the ICU provider.\n"), + pg_encoding_to_char(encodingid)); + } else if (!encoding) { int ctype_enc; ISTM that such choices (e.g. UTF8 for Windows in some cases) are described in the documentation [1] as By default, initdb uses the locale provider libc, takes the locale settings from the environment, and determines the encoding from the locale settings. This is almost always sufficient, unless there are special requirements. [1] https://www.postgresql.org/docs/devel/app-initdb.html -- Marina Polyakova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
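The default-encoding behavior under discussion — the quoted `if (!encoding && locale_provider == COLLPROVIDER_ICU) encodingid = PG_UTF8;` — can be summarized with a short sketch. This is a hypothetical Python model, not initdb code, and the locale-to-encoding table is an invented stand-in:

```python
# Illustrative mapping only; real initdb derives this from the OS locale.
LOCALE_IMPLIED_ENCODING = {"de_DE.iso885915@euro": "LATIN9", "en_US.utf8": "UTF8"}

def choose_encoding(explicit_encoding, locale_provider, lc_ctype):
    """Model of the default selection: an explicit -E wins; otherwise the
    ICU provider defaults to UTF8; otherwise the encoding implied by
    LC_CTYPE is used."""
    if explicit_encoding:
        return explicit_encoding
    if locale_provider == "icu":
        return "UTF8"
    return LOCALE_IMPLIED_ENCODING[lc_ctype]
```

This is why `initdb --locale de_DE.iso885915@euro --locale-provider icu` ends up with UTF8 and then fails the later encoding-mismatch check against the LATIN9 that the libc locale implies.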
Re: ICU for global collation
On 2022-09-15 09:52, Kyotaro Horiguchi wrote: If I executed initdb as follows, I would be told to specify --icu-locale option. $ initdb --encoding sql-ascii --locale-provider icu hoge ... initdb: error: ICU locale must be specified However, when I reran the command, it complains about incompatible encoding this time. I think it's more user-friendly to check for the encoding compatibility before the check for missing --icu-locale option. regards. In continuation of options check: AFAICS the following checks in initdb if (locale_provider == COLLPROVIDER_ICU) { if (!icu_locale) pg_fatal("ICU locale must be specified"); /* * In supported builds, the ICU locale ID will be checked by the * backend during post-bootstrap initialization. */ #ifndef USE_ICU pg_fatal("ICU is not supported in this build"); #endif } are executed approximately when they are executed in create database after getting all the necessary data from the template database: if (dblocprovider == COLLPROVIDER_ICU) { /* * This would happen if template0 uses the libc provider but the new * database uses icu. */ if (!dbiculocale) ereport(ERROR, (errcode(ERRCODE_INVALID_PARAMETER_VALUE), errmsg("ICU locale must be specified"))); } if (dblocprovider == COLLPROVIDER_ICU) check_icu_locale(dbiculocale); But perhaps the check that --icu-locale cannot be specified unless locale provider icu is chosen should also be moved here? So all these checks will be in one place and it will use the provider from the template database (which could be icu): $ initdb --locale-provider icu --icu-locale en-US -D data && pg_ctl -D data -l logfile start && createdb --icu-locale ru-RU --template template0 mydb ... createdb: error: database creation failed: ERROR: ICU locale cannot be specified unless locale provider is ICU -- Marina Polyakova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
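In other words, the suggestion is to validate --icu-locale against the provider that is effectively inherited from the template database, not only against the command-line option. A rough Python model of that idea (hypothetical names; this is not the dbcommands.c code):

```python
def createdb_icu_check(opt_provider, opt_icu_locale, template_provider):
    """Model: the effective locale provider defaults to the template
    database's provider, and the --icu-locale sanity check uses that
    effective value. Returns an error message or None."""
    provider = opt_provider or template_provider
    if opt_icu_locale and provider != "icu":
        return "ICU locale cannot be specified unless locale provider is ICU"
    return None
```

Under this model, `createdb --icu-locale ru-RU --template template0 mydb` succeeds when template0 itself uses the icu provider, instead of failing as in the transcript above.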
Re: ICU for global collation
On 2022-09-15 09:52, Kyotaro Horiguchi wrote: If I executed initdb as follows, I would be told to specify --icu-locale option. $ initdb --encoding sql-ascii --locale-provider icu hoge ... initdb: error: ICU locale must be specified However, when I reran the command, it complains about incompatible encoding this time. I think it's more user-friendly to check for the encoding compatibility before the check for missing --icu-locale option. regards. I agree with you. Here's another version of the patch. The locale/encoding checks and reports in initdb have been reordered, because now the encoding is set first and only then the ICU locale is checked. P.S. While working on the patch, I discovered that UTF8 encoding is always used for the ICU provider in initdb unless it is explicitly specified by the user: if (!encoding && locale_provider == COLLPROVIDER_ICU) encodingid = PG_UTF8; IMO this creates additional errors for locales with other encodings: $ initdb --locale de_DE.iso885915@euro --locale-provider icu --icu-locale de-DE ... initdb: error: encoding mismatch initdb: detail: The encoding you selected (UTF8) and the encoding that the selected locale uses (LATIN9) do not match. This would lead to misbehavior in various character string processing functions. initdb: hint: Rerun initdb and either do not specify an encoding explicitly, or choose a matching combination. And ICU supports many encodings, see the contents of pg_enc2icu_tbl in encnames.c... 
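The failure mode described here is the encoding-mismatch check comparing the selected (or ICU-defaulted) encoding against the encoding implied by the libc locale. A small illustrative sketch — the table and names are invented, and this is a model rather than initdb's actual check:

```python
# Invented sample of what a libc locale name implies; not exhaustive.
LIBC_LOCALE_ENCODING = {"de_DE.iso885915@euro": "LATIN9", "ja_JP.eucjp": "EUC_JP"}

def encoding_mismatch(selected_encoding, lc_ctype):
    """Return an error detail string if the selected encoding conflicts with
    the encoding the locale uses, else None."""
    implied = LIBC_LOCALE_ENCODING.get(lc_ctype)
    if implied and implied != selected_encoding:
        return (f"The encoding you selected ({selected_encoding}) and the encoding "
                f"that the selected locale uses ({implied}) do not match.")
    return None
```

With the ICU provider forcing UTF8 as the default, any libc locale implying a different encoding trips this check unless the encoding is given explicitly.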
-- Marina Polyakova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
diff --git a/src/backend/commands/dbcommands.c b/src/backend/commands/dbcommands.c index 6ff48bb18f3639ae45d9528b32df51a4aebc60c0..f248ad42b77c8c0cf2089963d4357b120914ce20 100644 --- a/src/backend/commands/dbcommands.c +++ b/src/backend/commands/dbcommands.c @@ -1034,6 +1034,12 @@ createdb(ParseState *pstate, const CreatedbStmt *stmt) if (dblocprovider == COLLPROVIDER_ICU) { + if (!(is_encoding_supported_by_icu(encoding))) + ereport(ERROR, + (errcode(ERRCODE_INVALID_PARAMETER_VALUE), + errmsg("encoding \"%s\" is not supported with ICU provider", + pg_encoding_to_char(encoding)))); + /* * This would happen if template0 uses the libc provider but the new * database uses icu. @@ -1042,10 +1048,9 @@ createdb(ParseState *pstate, const CreatedbStmt *stmt) ereport(ERROR, (errcode(ERRCODE_INVALID_PARAMETER_VALUE), errmsg("ICU locale must be specified"))); - } - if (dblocprovider == COLLPROVIDER_ICU) check_icu_locale(dbiculocale); + } /* * Check that the new encoding and locale settings match the source diff --git a/src/bin/initdb/initdb.c b/src/bin/initdb/initdb.c index 6aeec8d426c52414b827686781c245291f27ed1f..999e7c2bf8dc707063eddf19a5c27eed03d3ca09 100644 --- a/src/bin/initdb/initdb.c +++ b/src/bin/initdb/initdb.c @@ -2092,20 +2092,30 @@ setlocales(void) check_locale_name(LC_CTYPE, lc_messages, &canonname); lc_messages = canonname; #endif +} - if (locale_provider == COLLPROVIDER_ICU) +static void +check_icu_locale_encoding(void) +{ + if (!(is_encoding_supported_by_icu(encodingid))) { - if (!icu_locale) - pg_fatal("ICU locale must be specified"); + pg_log_error("encoding \"%s\" is not supported with ICU provider", + pg_encoding_to_char(encodingid)); + pg_log_error_hint("Rerun %s and choose a matching combination.", + progname); + exit(1); + } - /* - * In supported builds, the ICU locale ID will be checked by the - * backend during post-bootstrap initialization. 
- */ + if (!icu_locale) + pg_fatal("ICU locale must be specified"); + + /* + * In supported builds, the ICU locale ID will be checked by the backend + * during post-bootstrap initialization. + */ #ifndef USE_ICU - pg_fatal("ICU is not supported in this build"); + pg_fatal("ICU is not supported in this build"); #endif - } } /* @@ -2281,34 +2291,6 @@ setup_locale_encoding(void) { setlocales(); - if (locale_provider == COLLPROVIDER_LIBC && - strcmp(lc_ctype, lc_collate) == 0 && - strcmp(lc_ctype, lc_time) == 0 && - strcmp(lc_ctype, lc_numeric) == 0 && - strcmp(lc_ctype, lc_monetary) == 0 && - strcmp(lc_ctype, lc_messages) == 0 && - (!icu_locale || strcmp(lc_ctype, icu_locale) == 0)) - printf(_("The database cluster will be initialized with locale \"%s\".\n"), lc_ctype); - else - { - printf(_("The database cluster will be initialized with this locale configuration:\n")); - printf(_(" provider:%s\n"), collprovider_name(locale_provider)); - if (icu_locale) - printf(_(" ICU locale: %s\n"), icu_locale); - printf(_(" LC_COLLATE: %s\n" - " LC_CTYPE:%s\n" - " LC_MESSAGES: %s\n" - " LC_MONETARY: %s\n"
Re: ICU for global collation
Hello! I was surprised that it is allowed to create clusters/databases where the default ICU collations do not actually work due to unsupported encodings:

$ initdb --encoding SQL_ASCII --locale-provider icu --icu-locale en-US -D data && pg_ctl -D data -l logfile start && psql -c "SELECT 'a' < 'b'" template1
...
waiting for server to start done
server started
ERROR: encoding "SQL_ASCII" not supported by ICU

$ createdb --encoding SQL_ASCII --locale-provider icu --icu-locale en-US --template template0 mydb && psql -c "SELECT 'a' < 'b'" mydb
ERROR: encoding "SQL_ASCII" not supported by ICU

The patch diff_check_icu_encoding.patch prohibits the creation of such objects...

-- Marina Polyakova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company

diff --git a/src/backend/commands/dbcommands.c b/src/backend/commands/dbcommands.c
index 6ff48bb18f3639ae45d9528b32df51a4aebc60c0..07758d15e8613d5a049537ddf2c5992e57ad6424 100644
--- a/src/backend/commands/dbcommands.c
+++ b/src/backend/commands/dbcommands.c
@@ -1042,11 +1042,16 @@ createdb(ParseState *pstate, const CreatedbStmt *stmt)
 			ereport(ERROR,
 					(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
 					 errmsg("ICU locale must be specified")));
-	}
 
-	if (dblocprovider == COLLPROVIDER_ICU)
 		check_icu_locale(dbiculocale);
+
+		if (!(is_encoding_supported_by_icu(encoding)))
+			ereport(ERROR,
+					(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+					 errmsg("encoding \"%s\" is not supported with ICU provider",
+							pg_encoding_to_char(encoding))));
+	}
+
 	/*
 	 * Check that the new encoding and locale settings match the source
 	 * database.  We insist on this because we simply copy the source data ---
diff --git a/src/bin/initdb/initdb.c b/src/bin/initdb/initdb.c
index e00837ecacf4885cf2a176168c283f3e67c6eb53..8a762ced8340c9d8256f7832a4c19a43f1d5538a 100644
--- a/src/bin/initdb/initdb.c
+++ b/src/bin/initdb/initdb.c
@@ -2362,6 +2362,16 @@ setup_locale_encoding(void)
 	if (!check_locale_encoding(lc_ctype, encodingid) ||
 		!check_locale_encoding(lc_collate, encodingid))
 		exit(1);				/* check_locale_encoding printed the error */
+
+	if (locale_provider == COLLPROVIDER_ICU &&
+		!(is_encoding_supported_by_icu(encodingid)))
+	{
+		pg_log_error("encoding \"%s\" is not supported with ICU provider",
+					 pg_encoding_to_char(encodingid));
+		pg_log_error_hint("Rerun %s and choose a matching combination.",
+						  progname);
+		exit(1);
+	}
 }
diff --git a/src/bin/initdb/t/001_initdb.pl b/src/bin/initdb/t/001_initdb.pl
index a37f6dd9b334b6ee22d9fdd4d51422795cb54a39..e4bb3d0cdd9c23729c5fb97886374f8df558f239 100644
--- a/src/bin/initdb/t/001_initdb.pl
+++ b/src/bin/initdb/t/001_initdb.pl
@@ -118,6 +118,15 @@ if ($ENV{with_icu} eq 'yes')
 		],
 		qr/FATAL: could not open collator for locale/,
 		'fails for invalid ICU locale');
+
+	command_fails_like(
+		[
+			'initdb', '--no-sync',
+			'--locale-provider=icu', '--icu-locale=en',
+			'--encoding=SQL_ASCII', "$tempdir/dataX"
+		],
+		qr/error: encoding "SQL_ASCII" is not supported with ICU provider/,
+		'encoding "SQL_ASCII" is not supported with ICU provider');
 }
 else
 {
diff --git a/src/bin/scripts/t/020_createdb.pl b/src/bin/scripts/t/020_createdb.pl
index e91c1d013d08d8bd1e3a92f2aba958c5c7713ca6..eaab3caa32669ead068719d98bb953c5c6ff5a17 100644
--- a/src/bin/scripts/t/020_createdb.pl
+++ b/src/bin/scripts/t/020_createdb.pl
@@ -50,6 +50,16 @@ if ($ENV{with_icu} eq 'yes')
 		],
 		'fails for invalid ICU locale');
 
+	$node->command_fails_like(
+		[
+			'createdb', '-T',
+			'template0', '--locale-provider=icu',
+			'--icu-locale=en', '--encoding=SQL_ASCII',
+			'foobarX'
+		],
+		qr/ERROR: encoding "SQL_ASCII" is not supported with ICU provider/,
+		'encoding "SQL_ASCII" is not supported with ICU provider');
+
 	# additional node, which uses the icu provider
 	my $node2 = PostgreSQL::Test::Cluster->new('icu');
 	$node2->init(extra => ['--locale-provider=icu', '--icu-locale=en']);
Re: ICU for global collation
On 2022-09-13 15:41, Peter Eisentraut wrote: On 13.09.22 07:34, Marina Polyakova wrote: I agree with you that it is more comfortable and more similar to what has already been done in initdb. IMO it would be easier to do it like this: diff --git a/src/bin/scripts/createdb.c b/src/bin/scripts/createdb.c index e523e58b2189275dc603a06324a2f28b0f49d8b7..a1482df3d981a680dd3322052e7c03ddacc8dc26 100644 --- a/src/bin/scripts/createdb.c +++ b/src/bin/scripts/createdb.c @@ -161,12 +161,10 @@ main(int argc, char *argv[]) if (locale) { - if (lc_ctype) - pg_fatal("only one of --locale and --lc-ctype can be specified"); - if (lc_collate) - pg_fatal("only one of --locale and --lc-collate can be specified"); - lc_ctype = locale; - lc_collate = locale; + if (!lc_ctype) + lc_ctype = locale; + if (!lc_collate) + lc_collate = locale; } if (encoding) done that way Thank you! BTW it's somewhat crummy that it uses a string comparison, so if you write "UTF8" without a dash, it says this; it took me a few minutes to see the difference... postgres=# create database a LC_COLLATE "en_US.UTF8" LC_CTYPE "en_US.UTF8" LOCALE "en_US.UTF8"; ERROR: new collation (en_US.UTF8) is incompatible with the collation of the template database (en_US.UTF-8) Perhaps we could check the locale itself with the function normalize_libc_locale_name (collationcmds.c). But ISTM that the current check is a safety net in case the function pg_get_encoding_from_locale (chklocale.c) returns -1 or PG_SQL_ASCII... This is not new behavior in PG15, is it? No, it has always existed [1] AFAICS.. [1] https://github.com/postgres/postgres/commit/61d967498802ab86d8897cb3c61740d7e9d712f6 -- Marina Polyakova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: ICU for global collation
On 2022-09-09 19:46, Justin Pryzby wrote: In pg14: |postgres=# create database a LC_COLLATE C LC_CTYPE C LOCALE C; |ERROR: conflicting or redundant options |DETAIL: LOCALE cannot be specified together with LC_COLLATE or LC_CTYPE. In pg15: |postgres=# create database a LC_COLLATE "en_US.UTF-8" LC_CTYPE "en_US.UTF-8" LOCALE "en_US.UTF-8" ; |CREATE DATABASE f2553d430 actually relaxed the restriction by removing this check: - if (dlocale && (dcollate || dctype)) - ereport(ERROR, - (errcode(ERRCODE_SYNTAX_ERROR), -errmsg("conflicting or redundant options"), -errdetail("LOCALE cannot be specified together with LC_COLLATE or LC_CTYPE."))); But isn't the right fix to do the corresponding thing in createdb (relaxing the frontend restriction rather than reverting its relaxation in the backend). diff --git a/src/bin/scripts/createdb.c b/src/bin/scripts/createdb.c index e523e58b218..5b80e56dfd9 100644 --- a/src/bin/scripts/createdb.c +++ b/src/bin/scripts/createdb.c @@ -159,15 +159,10 @@ main(int argc, char *argv[]) exit(1); } - if (locale) - { - if (lc_ctype) - pg_fatal("only one of --locale and --lc-ctype can be specified"); - if (lc_collate) - pg_fatal("only one of --locale and --lc-collate can be specified"); + if (locale && !lc_ctype) lc_ctype = locale; + if (locale && !lc_collate) lc_collate = locale; - } if (encoding) { I agree with you that it is more comfortable and more similar to what has already been done in initdb. 
IMO it would be easier to do it like this: diff --git a/src/bin/scripts/createdb.c b/src/bin/scripts/createdb.c index e523e58b2189275dc603a06324a2f28b0f49d8b7..a1482df3d981a680dd3322052e7c03ddacc8dc26 100644 --- a/src/bin/scripts/createdb.c +++ b/src/bin/scripts/createdb.c @@ -161,12 +161,10 @@ main(int argc, char *argv[]) if (locale) { - if (lc_ctype) - pg_fatal("only one of --locale and --lc-ctype can be specified"); - if (lc_collate) - pg_fatal("only one of --locale and --lc-collate can be specified"); - lc_ctype = locale; - lc_collate = locale; + if (!lc_ctype) + lc_ctype = locale; + if (!lc_collate) + lc_collate = locale; } if (encoding) Should we change the behaviour of createdb and CREATE DATABASE in previous major versions?.. BTW it's somewhat crummy that it uses a string comparison, so if you write "UTF8" without a dash, it says this; it took me a few minutes to see the difference... postgres=# create database a LC_COLLATE "en_US.UTF8" LC_CTYPE "en_US.UTF8" LOCALE "en_US.UTF8"; ERROR: new collation (en_US.UTF8) is incompatible with the collation of the template database (en_US.UTF-8) Perhaps we could check the locale itself with the function normalize_libc_locale_name (collationcmds.c). But ISTM that the current check is a safety net in case the function pg_get_encoding_from_locale (chklocale.c) returns -1 or PG_SQL_ASCII... -- Marina Polyakova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: ICU for global collation
Hello! IMO after adding ICU for global collations [1] the behaviour of createdb and CREATE DATABASE is a bit inconsistent when both the locale and lc_collate (or locale and lc_ctype) options are used:

$ createdb mydb --locale C --lc-collate C --template template0
createdb: error: only one of --locale and --lc-collate can be specified

$ psql -c "create database mydb locale = 'C' lc_collate = 'C' template = 'template0'" postgres
CREATE DATABASE

From the CREATE DATABASE documentation [2]:

locale
This is a shortcut for setting LC_COLLATE and LC_CTYPE at once. If you specify this, you cannot specify either of those parameters.

The patch diff_return_back_create_database_error.patch restores the removed check in CREATE DATABASE so that it behaves like createdb, as before...

[1] https://github.com/postgres/postgres/commit/f2553d43060edb210b36c63187d52a632448e1d2
[2] https://www.postgresql.org/docs/devel/sql-createdatabase.html

-- Marina Polyakova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company

diff --git a/src/backend/commands/dbcommands.c b/src/backend/commands/dbcommands.c
index 6ff48bb18f3639ae45d9528b32df51a4aebc60c0..0a22cace11d9df6b5fc085bfd7b86319f4b13165 100644
--- a/src/backend/commands/dbcommands.c
+++ b/src/backend/commands/dbcommands.c
@@ -851,6 +851,12 @@ createdb(ParseState *pstate, const CreatedbStmt *stmt)
 				 parser_errposition(pstate, defel->location)));
 	}
 
+	if (dlocale && (dcollate || dctype))
+		ereport(ERROR,
+				(errcode(ERRCODE_SYNTAX_ERROR),
+				 errmsg("conflicting or redundant options"),
+				 errdetail("LOCALE cannot be specified together with LC_COLLATE or LC_CTYPE.")));
+
 	if (downer && downer->arg)
 		dbowner = defGetString(downer);
 	if (dtemplate && dtemplate->arg)
Re: ICU for global collation
My colleague Andrew Bille found another bug in master (b4e936859dc441102eb0b6fb7a104f3948c90490) and REL_15_STABLE (2c63b0930aee1bb5c265fad4a65c9d0b62b1f9da): pg_collation.colliculocale is not dumped. See check_icu_locale.sh: In the old cluster: SELECT collname, colliculocale FROM pg_collation WHERE collname = 'testcoll_backwards' collname | colliculocale +--- testcoll_backwards | @colBackwards=yes (1 row) In the new cluster: SELECT collname, colliculocale FROM pg_collation WHERE collname = 'testcoll_backwards' collname | colliculocale +--- testcoll_backwards | (1 row) diff_dump_colliculocale.patch works for me. -- Marina Polyakova Postgres Professional: http://www.postgrespro.com The Russian Postgres Companyinitdb -D data_old && pg_ctl -D data_old -l logfile_old start && psql -ac "CREATE COLLATION testcoll_backwards (provider = icu, locale = '@colBackwards=yes')" postgres && echo "In the old cluster:" && psql -ac "SELECT collname, colliculocale FROM pg_collation WHERE collname = 'testcoll_backwards'" postgres && pg_dump postgres > dump_postgres.sql && pg_ctl -D data_old stop && initdb -D data_new && pg_ctl -D data_new -l logfile_new start && psql -v ON_ERROR_STOP=1 -f dump_postgres.sql postgres && echo "In the new cluster:" && psql -ac "SELECT collname, colliculocale FROM pg_collation WHERE collname = 'testcoll_backwards'" postgres && pg_ctl -D data_new stop diff --git a/src/bin/pg_dump/Makefile b/src/bin/pg_dump/Makefile index 2f524b09bf53a55037013d78148f8cbca4fa7eee..9dc5a784dd2d1ce58a6284f19c54730364779c4d 100644 --- a/src/bin/pg_dump/Makefile +++ b/src/bin/pg_dump/Makefile @@ -17,6 +17,7 @@ top_builddir = ../../.. 
include $(top_builddir)/src/Makefile.global export GZIP_PROGRAM=$(GZIP) +export with_icu override CPPFLAGS := -I$(libpq_srcdir) $(CPPFLAGS) LDFLAGS_INTERNAL += -L$(top_builddir)/src/fe_utils -lpgfeutils $(libpq_pgport) diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c index 2c6891573296b395c5bd6b627d061de657355e9c..e8e8f69e30c8a787e5dff157f5d1619917638d7d 100644 --- a/src/bin/pg_dump/pg_dump.c +++ b/src/bin/pg_dump/pg_dump.c @@ -13084,9 +13084,11 @@ dumpCollation(Archive *fout, const CollInfo *collinfo) int i_collisdeterministic; int i_collcollate; int i_collctype; + int i_colliculocale; const char *collprovider; const char *collcollate; const char *collctype; + const char *colliculocale; /* Do nothing in data-only dump */ if (dopt->dataOnly) @@ -13117,6 +13119,13 @@ dumpCollation(Archive *fout, const CollInfo *collinfo) appendPQExpBufferStr(query, "true AS collisdeterministic, "); + if (fout->remoteVersion >= 15) + appendPQExpBufferStr(query, + "colliculocale, "); + else + appendPQExpBufferStr(query, + "NULL AS colliculocale, "); + appendPQExpBuffer(query, "collcollate, " "collctype " @@ -13130,10 +13139,24 @@ dumpCollation(Archive *fout, const CollInfo *collinfo) i_collisdeterministic = PQfnumber(res, "collisdeterministic"); i_collcollate = PQfnumber(res, "collcollate"); i_collctype = PQfnumber(res, "collctype"); + i_colliculocale = PQfnumber(res, "colliculocale"); collprovider = PQgetvalue(res, 0, i_collprovider); - collcollate = PQgetvalue(res, 0, i_collcollate); - collctype = PQgetvalue(res, 0, i_collctype); + + if (!PQgetisnull(res, 0, i_collcollate)) + collcollate = PQgetvalue(res, 0, i_collcollate); + else + collcollate = NULL; + + if (!PQgetisnull(res, 0, i_collctype)) + collctype = PQgetvalue(res, 0, i_collctype); + else + collctype = NULL; + + if (!PQgetisnull(res, 0, i_colliculocale)) + colliculocale = PQgetvalue(res, 0, i_colliculocale); + else + colliculocale = NULL; appendPQExpBuffer(delq, "DROP COLLATION %s;\n", 
fmtQualifiedDumpable(collinfo)); @@ -13156,17 +13179,28 @@ dumpCollation(Archive *fout, const CollInfo *collinfo) if (strcmp(PQgetvalue(res, 0, i_collisdeterministic), "f") == 0) appendPQExpBufferStr(q, ", deterministic = false"); - if (strcmp(collcollate, collctype) == 0) + if (colliculocale != NULL) { appendPQExpBufferStr(q, ", locale = "); - appendStringLiteralAH(q, collcollate, fout); + appendStringLiteralAH(q, colliculocale, fout); } else { - appendPQExpBufferStr(q, ", lc_collate = "); - appendStringLiteralAH(q, collcollate, fout); - appendPQExpBufferStr(q, ", lc_ctype = "); - appendStringLiteralAH(q, collctype, fout); + Assert(collcollate != NULL); + Assert(collctype != NULL); + + if (strcmp(collcollate, collctype) == 0) + { + appendPQExpBufferStr(q, ", locale = "); + appendStringLi
Re: ICU for global collation
On 2022-08-22 17:10, Peter Eisentraut wrote: On 15.08.22 14:06, Marina Polyakova wrote:

1.1) It looks like there's a bug in the function get_db_infos (src/bin/pg_upgrade/info.c), where the version of the old cluster is always checked:

if (GET_MAJOR_VERSION(old_cluster.major_version) < 1500)
	snprintf(query + strlen(query), sizeof(query) - strlen(query),
			 "'c' AS datlocprovider, NULL AS daticulocale, ");
else
	snprintf(query + strlen(query), sizeof(query) - strlen(query),
			 "datlocprovider, daticulocale, ");

With the simple patch

diff --git a/src/bin/pg_upgrade/info.c b/src/bin/pg_upgrade/info.c
index df374ce4b362b4c6c87fc1fd0e476e5d6d353d9e..53ea348e211d3ac38334292bc16cb814bc13bb87 100644
--- a/src/bin/pg_upgrade/info.c
+++ b/src/bin/pg_upgrade/info.c
@@ -319,7 +319,7 @@ get_db_infos(ClusterInfo *cluster)
 	snprintf(query, sizeof(query),
 			 "SELECT d.oid, d.datname, d.encoding, d.datcollate, d.datctype, ");
-	if (GET_MAJOR_VERSION(old_cluster.major_version) < 1500)
+	if (GET_MAJOR_VERSION(cluster->major_version) < 1500)
 		snprintf(query + strlen(query), sizeof(query) - strlen(query),
 				 "'c' AS datlocprovider, NULL AS daticulocale, ");
 	else

fixed

1.2) It looks like the mentioned assertion in dbcommands.c conflicts with the following lines earlier:

if (dbiculocale == NULL)
	dbiculocale = src_iculocale;

fixed

I'm wondering if this is not a fully-supported feature (because createdb creates an SQL command with LC_COLLATE and LC_CTYPE options instead of the LOCALE option) or is it a bug in CREATE DATABASE?.. From src/backend/commands/dbcommands.c:

if (dblocprovider == COLLPROVIDER_ICU && !dbiculocale)
{
	if (dlocale && dlocale->arg)
		dbiculocale = defGetString(dlocale);
}

I think this piece of code was left over from some earlier attempts to specify the libc locale and the icu locale with one option, which never really worked well. The CREATE DATABASE man page does not mention that LOCALE provides the default for ICU_LOCALE. Hence, I think we should delete this.

Thank you!
-- Marina Polyakova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: ICU for global collation
first 3 databases in the cluster in much the same way... P.S. FYI there seems to be a bug for very old ICU versions: in master (92fce4e1eda9b24d73f583fbe9b58f4e03f097a4): $ initdb -D data && pg_ctl -D data -l logfile start && psql -c "CREATE DATABASE mydb LOCALE \"C.UTF-8\" LOCALE_PROVIDER icu TEMPLATE template0" postgres && psql -c "SELECT 1" mydb WARNING: database "mydb" has a collation version mismatch DETAIL: The database was created using collation version 49.192.0.42, but the operating system provides version 49.192.5.42. HINT: Rebuild all objects in this database that use the default collation and run ALTER DATABASE mydb REFRESH COLLATION VERSION, or build PostgreSQL with the right library version. See the additional output (diff_log_icu_collator_locale.patch) in the logfile: 2022-08-20 11:38:30.162 MSK [136546] LOG: check_icu_locale uloc_getDefault() en_US 2022-08-20 11:38:30.162 MSK [136546] STATEMENT: CREATE DATABASE mydb LOCALE "C.UTF-8" LOCALE_PROVIDER icu TEMPLATE template0 2022-08-20 11:38:30.163 MSK [136546] LOG: check_icu_locale icu_locale C.UTF-8 valid_locale en_US version 49.192.0.42 2022-08-20 11:38:30.163 MSK [136546] STATEMENT: CREATE DATABASE mydb LOCALE "C.UTF-8" LOCALE_PROVIDER icu TEMPLATE template0 2022-08-20 11:38:30.163 MSK [136546] LOG: get_collation_actual_version uloc_getDefault() en_US 2022-08-20 11:38:30.163 MSK [136546] STATEMENT: CREATE DATABASE mydb LOCALE "C.UTF-8" LOCALE_PROVIDER icu TEMPLATE template0 2022-08-20 11:38:30.163 MSK [136546] LOG: get_collation_actual_version icu_locale C.UTF-8 valid_locale en_US version 49.192.0.42 2022-08-20 11:38:30.163 MSK [136546] STATEMENT: CREATE DATABASE mydb LOCALE "C.UTF-8" LOCALE_PROVIDER icu TEMPLATE template0 2022-08-20 11:38:30.224 MSK [136548] LOG: make_icu_collator uloc_getDefault() c 2022-08-20 11:38:30.225 MSK [136548] LOG: make_icu_collator icu_locale C.UTF-8 valid_locale root version 49.192.5.42 2022-08-20 11:38:30.225 MSK [136548] LOG: get_collation_actual_version 
uloc_getDefault() c 2022-08-20 11:38:30.225 MSK [136548] LOG: get_collation_actual_version icu_locale C.UTF-8 valid_locale root version 49.192.5.42 2022-08-20 11:38:30.225 MSK [136548] WARNING: database "mydb" has a collation version mismatch 2022-08-20 11:38:30.225 MSK [136548] DETAIL: The database was created using collation version 49.192.0.42, but the operating system provides version 49.192.5.42. 2022-08-20 11:38:30.225 MSK [136548] HINT: Rebuild all objects in this database that use the default collation and run ALTER DATABASE mydb REFRESH COLLATION VERSION, or build PostgreSQL with the right library version. [1] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=loach=2022-08-18%2006%3A25%3A17 -- Marina Polyakova Postgres Professional: http://www.postgrespro.com The Russian Postgres Companydiff --git a/src/backend/utils/adt/pg_locale.c b/src/backend/utils/adt/pg_locale.c index 1a047a97d74e7fe5b2cad22cd54cf30f7129bf84..21cfbfffdd1b58df821adbdd577fb599bc2747cf 100644 --- a/src/backend/utils/adt/pg_locale.c +++ b/src/backend/utils/adt/pg_locale.c @@ -1406,6 +1406,14 @@ make_icu_collator(const char *iculocstr, #ifdef USE_ICU UCollator *collator; UErrorCode status; + const char *default_icu_locale; + UVersionInfo versioninfo; + char buf[U_MAX_VERSION_STRING_LENGTH]; + const char *valid_locale; + + default_icu_locale = uloc_getDefault(); + elog(LOG, "make_icu_collator uloc_getDefault() %s", + default_icu_locale ? 
default_icu_locale : "(null)"); status = U_ZERO_ERROR; collator = ucol_open(iculocstr, ); @@ -1417,6 +1425,20 @@ make_icu_collator(const char *iculocstr, if (U_ICU_VERSION_MAJOR_NUM < 54) icu_set_collation_attributes(collator, iculocstr); + status = U_ZERO_ERROR; + valid_locale = ucol_getLocaleByType(collator, ULOC_VALID_LOCALE, ); + if (U_FAILURE(status) || valid_locale == NULL) + ereport(ERROR, +(errmsg("failed to get valid locale for collator with " + "requested locale \"%s\": %s", + iculocstr, u_errorName(status; + + ucol_getVersion(collator, versioninfo); + u_versionToString(versioninfo, buf); + + elog(LOG, "make_icu_collator icu_locale %s valid_locale %s version %s", + iculocstr ? iculocstr : "(null)", valid_locale, buf); + /* We will leak this string if the caller errors later :-( */ resultp->info.icu.locale = MemoryContextStrdup(TopMemoryContext, iculocstr); resultp->info.icu.ucol = collator; @@ -1654,6 +1676,12 @@ get_collation_actual_version(char collprovider, const char *collcollate) UErrorCode status; UVersionInfo versioninfo; char buf[U_MAX_VERSION_STRING_LENGTH]; + const char *default_icu_locale; + const char *valid_locale; + + default_icu_locale = uloc_getDefault(); + elog(LOG, "get_collation_actual_versi
Re: ICU for global collation
(errcode(ERRCODE_INVALID_PARAMETER_VALUE), errmsg("ICU locale must be specified"))); } + else + dbiculocale = NULL; if (dblocprovider == COLLPROVIDER_ICU) check_icu_locale(dbiculocale); 2) CREATE DATABASE does not always require the icu locale unlike initdb and createdb: $ initdb -D data --locale en_US.UTF-8 --locale-provider icu ... initdb: error: ICU locale must be specified $ initdb -D data --locale en_US.UTF-8 $ pg_ctl -D data -l logfile start $ createdb mydb --locale en_US.UTF-8 --template template0 --locale-provider icu createdb: error: database creation failed: ERROR: ICU locale must be specified $ psql -c "CREATE DATABASE mydb LOCALE \"en_US.UTF-8\" TEMPLATE template0 LOCALE_PROVIDER icu" postgres CREATE DATABASE $ psql -c "CREATE DATABASE mydb TEMPLATE template0 LOCALE_PROVIDER icu" postgres ERROR: ICU locale must be specified I'm wondering if this is not a fully-supported feature (because createdb creates an SQL command with LC_COLLATE and LC_CTYPE options instead of LOCALE option) or is it a bug in CREATE DATABASE?.. From src/backend/commands/dbcommands.c: if (dblocprovider == COLLPROVIDER_ICU && !dbiculocale) { if (dlocale && dlocale->arg) dbiculocale = defGetString(dlocale); } [1] https://github.com/postgres/postgres/commit/f2553d43060edb210b36c63187d52a632448e1d2 -- Marina Polyakova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: Reduce the time required for a database recovery from archive.
Hello everyone in this thread! On 2021-03-18 18:04, David Steele wrote:

Seems like there should have been a patch attached?

IMO there's a technical problem with sending, receiving (or displaying on the site) emails from the pgsql-hackers list. By subscribing to this list I received the attached patch from the email [1]. And my colleague Roman Zharkov said that the 'Resend email' button at that link helped him to receive the email with the attached patch. On the other hand, following this link in the browser I do not see the attached patch. Do you think it is worth writing about this issue to webmaster(dot)postgresql(dot)org?.. Just in case I'm lucky, this email contains the lost patch.

[1] https://www.postgresql.org/message-id/4047CC05-1AF5-454B-850B-ED37374A2AC0%40postgrespro.ru

-- Marina Polyakova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company

From c071e8ee78aac811feaf54c4374c1a998409733e Mon Sep 17 00:00:00 2001
From: Dmitry Shulga
Date: Fri, 18 Dec 2020 12:38:58 +0700
Subject: [PATCH] Reduce time required to recover database from archive.

Originally, database recovery from archive was performed by sequentially receiving files with WAL records and applying them against the database. Delivery of the files containing WAL records is performed by running a command specified by the GUC parameter restore_command. When receiving each WAL file takes a long time, recovery stands idle most of the time, waiting for the files to arrive. If the time required to apply the WAL records from an archive file is significantly less than the time required to deliver the file from the archive, this leads to unproductive idling after the current WAL segment is applied and before the next WAL segment is received. As a consequence, the wall time required to recover a database from an archive log can be unacceptably long.
To reduce the total time required to restore a database from archive, the procedure for delivering WAL files was redesigned to allow concurrent loading of WAL files. At postmaster start, a few background processes are spawned to load WAL files from the archive in parallel. The number of processes started to preload WAL files is determined by the new GUC parameter wal_prefetch_workers. The number of WAL files to prefetch from the archive is limited by the new GUC parameter wal_max_prefetch_amount.

Additionally, refactoring was done to extract duplicate code, used in the files xlogarchive.c and xlogrestore.c, into standalone functions and move it to the file xlogutils.c.

Author: Dmitry Shulga
Reviewed-by: Anna Akenteva
Tested-by: Roman Zharkov

diff --git a/src/backend/access/transam/Makefile b/src/backend/access/transam/Makefile
index 595e02de722..ffbf8090f45 100644
--- a/src/backend/access/transam/Makefile
+++ b/src/backend/access/transam/Makefile
@@ -32,7 +32,8 @@ OBJS = \
 	xlogfuncs.o \
 	xloginsert.o \
 	xlogreader.o \
-	xlogutils.o
+	xlogutils.o \
+	xlogrestore.o
 
 include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 13f1d8c3dc7..f0a0c68725e 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -37,6 +37,7 @@
 #include "access/xloginsert.h"
 #include "access/xlogreader.h"
 #include "access/xlogutils.h"
+#include "access/xlogrestore.h"
 #include "catalog/catversion.h"
 #include "catalog/pg_control.h"
 #include "catalog/pg_database.h"
@@ -3684,10 +3685,11 @@ XLogFileRead(XLogSegNo segno, int emode, TimeLineID tli,
 				 xlogfname);
 		set_ps_display(activitymsg);
 
-		restoredFromArchive = RestoreArchivedFile(path, xlogfname,
-												  "RECOVERYXLOG",
-												  wal_segment_size,
-												  InRedo);
+		restoredFromArchive = RestoreCommandXLog(path, xlogfname,
+												 "RECOVERYXLOG",
+												 wal_segment_size,
+												 InRedo);
+
 		if (!restoredFromArchive)
 			return -1;
 		break;
diff --git
a/src/backend/access/transam/xlogarchive.c b/src/backend/access/transam/xlogarchive.c
index f39dc4ddf1a..fb4023f1cec 100644
--- a/src/backend/access/transam/xlogarchive.c
+++ b/src/backend/access/transam/xlogarchive.c
@@ -22,6 +22,7 @@
 #include "access/xlog.h"
 #include "access/xlog_internal.h"
 #include "access/xlogarchive.h"
+#include "access/xlogutils.h"
 #include "common/archive.h"
 #include "miscadmin.h"
 #include "postmaster/startup.h"
@@ -55,13 +56,8 @@ RestoreArchivedFile(char *path, const char *xlogfname,
 					bool cleanupEnabled)
 {
 	char		xlogpath[MAXPGPATH];
-	char	   *xlogRestoreCmd;
 	char		lastRestartPointFname[MAXPGPATH];
 	int			rc;
-	struct stat stat_buf;
-	XLogSegNo	restartSegNo;
-	XLogRecPtr	restartRedoPtr;
-	TimeLineID	restartTli;
 
 	/*
 	 * Ignore restore_command when not in archive recovery (meaning we are
Re: pgbench: option delaying queries till connections establishment?
Hello! On 2020-11-14 20:07, Alexander Korotkov wrote: Hmm... Let's see the big picture. You've recently committed a patchset, which greatly improved the performance of GetSnapshotData(). And you're making further improvements in this direction. But you're getting trouble in measuring the effect, because Postgres is still stuck on ProcArrayLock. And in this thread you propose a workaround for that implemented on the pgbench side. My very dumb idea is following: should we finally give a chance to more fair lwlocks rather than inventing workarounds? As I remember, your major argument against more fair lwlocks was the idea that we should fix lwlocks use-cases rather than lwlock mechanism themselves. But can we expect that we fix all the lwlocks use-case in any reasonable prospect? My guess is 'no'. Links 1. https://www.postgresql.org/message-id/CAPpHfdvJhO1qutziOp%3Ddy8TO8Xb4L38BxgKG4RPa1up1Lzh_UQ%40mail.gmail.com Sorry I'm not familiar with the internal architecture of snapshots, locks etc. in postgres, but I wanted to ask - what exact kind of patch for fair lwlocks do you want to offer to the community? I applied the 6-th version of the patch for fair lwlocks from [1] to the old master branch (see commit [2]), started many threads in pgbench (-M prepared -c 1000 -j 500 -T 10 -P1 -S) and I did not receive stable first progress reports, which IIUC are one of the advantages of the discussed patch for the pgbench (see [3])... [1] https://www.postgresql.org/message-id/CAPpHfduV3v3EG7K74-9htBZz_mpE993zGz-%3D2k5RNA3tqabUAA%40mail.gmail.com [2] https://github.com/postgres/postgres/commit/84d514887f9ca673ae688d00f8b544e70f1ab270 [3] https://www.postgresql.org/message-id/20200227185129.hikscyenomnlrord%40alap3.anarazel.de -- Marina Polyakova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: pgbench: option delaying queries till connections establishment?
Hello! On 2020-11-13 08:44, kuroda.hay...@fujitsu.com wrote: Dear Fabien, and this will wait till its time comes. In the mean time, I think that you should put the patch status as you see fit, independently of the other patch: it seems unlikely that they would be committed together, and I'll have to merge the remaining one anyway. OK. I found the related thread[1], and I understood you will submit another patch on the thread. PostgreSQL Patch Tester says all regression tests are passed, and I change the status to "Ready for committer." [1]: https://commitfest.postgresql.org/31/2827/ Thank you for discussing with me. Hayato Kuroda FUJITSU LIMITED From the mentioned thread [2]: While trying to test a patch that adds a synchronization barrier in pgbench [1] on Windows, Thanks for trying that, I do not have a windows setup for testing, and the sync code I wrote for Windows is basically blind coding:-( FYI: 1) It looks like pgbench will no longer support Windows XP due to the function DeleteSynchronizationBarrier. From https://docs.microsoft.com/en-us/windows/win32/api/synchapi/nf-synchapi-deletesynchronizationbarrier : Minimum supported client: Windows 8 [desktop apps only] Minimum supported server: Windows Server 2012 [desktop apps only] On Windows Server 2008 R2 (MSVC 2013) the 6-th version of the patch [1] has compiled without (new) warnings, but when running pgbench I got the following error: The procedure entry point DeleteSynchronizationBarrier could not be located in the dynamic link library KERNEL32.dll. IMO, it looks like either old Windows systems should not call new functions, or we should throw them a compilation error. (Update MIN_WINNT to 0x0602 = Windows 8 in src/include/port/win32.h?) 
In the second case it looks like the documentation should be updated too, see doc/src/sgml/installation.sgml:

PostgreSQL can be expected to work on these operating systems: Linux (all recent distributions), Windows (XP and later), FreeBSD, OpenBSD, NetBSD, macOS, AIX, HP/UX, and Solaris. Other Unix-like systems may also work but are not currently being tested. In most cases, all CPU architectures supported by a given operating system will work. Look below to see if there is information specific to your operating system, particularly if using an older system. <...> The native Windows port requires a 32 or 64-bit version of Windows 2000 or later. Earlier operating systems do not have sufficient infrastructure (but Cygwin may be used on those). MinGW, the Unix-like build tools, and MSYS, a collection of Unix tools required to run shell scripts like configure, can be downloaded from http://www.mingw.org/. Neither is required to run the resulting binaries; they are needed only for creating the binaries.

[2] https://www.postgresql.org/message-id/e5a09b790db21356376b6e73673aa07c%40postgrespro.ru

-- Marina Polyakova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: pgbench stopped supporting large number of client connections on Windows
On 2020-11-07 01:01, Fabien COELHO wrote: Hello Marina, Hello, Fabien! Thank you for your comments! While trying to test a patch that adds a synchronization barrier in pgbench [1] on Windows, Thanks for trying that, I do not have a Windows setup for testing, and the sync code I wrote for Windows is basically blind coding:-(

FYI: 1) It looks like pgbench will no longer support Windows XP due to the function DeleteSynchronizationBarrier. From https://docs.microsoft.com/en-us/windows/win32/api/synchapi/nf-synchapi-deletesynchronizationbarrier : Minimum supported client: Windows 8 [desktop apps only] Minimum supported server: Windows Server 2012 [desktop apps only] On Windows Server 2008 R2 (MSVC 2013) the sixth version of the patch [1] compiled without (new) warnings, but when running pgbench I got the following error: The procedure entry point DeleteSynchronizationBarrier could not be located in the dynamic link library KERNEL32.dll.

2) On Windows Server 2019 (MSVC 2019) the sixth version of the patch [1] with fix_max_client_conn_on_Windows.patch compiled without (new) warnings. I made a few runs (-M prepared -c 100 -j 10 -T 10 -P1 -S) with and without your patches. On Linux (-M prepared -c 1000 -j 500 -T 10 -P1 -S) your patches fix problems with progress reports as in [2], but on Windows I did not notice such changes, see attached pgbench_runs_linux_vs_windows.zip.

Almost the same thing happens with reindexdb and vacuumdb (built on commit [3]): Windows fd implementation is somehow buggy because it does not return the smallest number available, and then with the assumption that select uses a dense array indexed with them (true on Linux, less so on Windows, which probably uses a sparse array), so that the number gets over the limit, even if fewer are actually used, hence the catch, as you noted. I agree with you. It looks like the structure fd_set just contains the sockets used by this application on Windows, and the macro FD_SETSIZE is used only here.
From https://docs.microsoft.com/en-us/windows/win32/api/winsock/ns-winsock-fd_set : typedef struct fd_set { u_int fd_count; SOCKET fd_array[FD_SETSIZE]; } fd_set, FD_SET, *PFD_SET, *LPFD_SET; From https://docs.microsoft.com/en-us/windows/win32/winsock/maximum-number-of-sockets-supported-2 : The maximum number of sockets that a Windows Sockets application can use is not affected by the manifest constant FD_SETSIZE. This value defined in the Winsock2.h header file is used in constructing the FD_SET structures used with the select function.

IIUC the checks below are not correct on Windows, since on this system sockets can have values equal to or greater than FD_SETSIZE (see Windows documentation [4] and pgbench debug output in attached pgbench_debug.txt). Okay. But then, how may one detect that there are too many fds in the set? I think that an earlier version of the code needed to make assumptions about the internal implementation of Windows (there is a counter somewhere in the Windows fd_set struct), which was rejected because it was breaking the interface. Now your patch is basically resurrecting that. I tried to keep the behaviour "we check if the socket value can be used in select() at runtime", but now I will also read that thread... Why not, if there is no other solution, but this is quite depressing, and because it breaks the interface it would be broken if Windows changed its internals for some reason:-( It looks like if the internals of the structure fd_set are changed, we will also have problems with the function pgwin32_select from src/backend/port/win32/socket.c, because it uses fd_set.fd_count too?.. (I'm writing responses to the rest of your comments but it takes time...) [1] https://www.postgresql.org/message-id/alpine.DEB.2.22.394.2011021726390.989361%40pseudo [2] https://www.postgresql.org/message-id/20200227185129.hikscyenomnlrord%40alap3.anarazel.de -- Marina Polyakova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: pgbench stopped supporting large number of client connections on Windows
On 2020-11-06 23:54, Ranier Vilela wrote: Hi Marina, Hello! Nice catch. Thank you! src/bin/pgbench/pgbench.c, the function add_socket_to_set: if (fd < 0 || fd >= FD_SETSIZE) { /* * Doing a hard exit here is a bit grotty, but it doesn't seem worth * complicating the API to make it less grotty. */ pg_log_fatal("too many client connections for select()"); exit(1); } It seems to me that the limit is hardcoded in src/backend/port/win32/socket.c: FD_SETSIZE * 2, which would be 2048? 1) If you mean the function pgwin32_select in the file src/backend/port/win32/socket.c, IIUC it is only used in the backend, see src/include/port/win32_port.h: #ifndef FRONTEND <...> #define select(n, r, w, e, timeout) pgwin32_select(n, r, w, e, timeout) <...> #endif /* FRONTEND */ 2) It looks like FD_SETSIZE does not set a limit on the socket value on Windows, see https://docs.microsoft.com/en-us/windows/win32/winsock/maximum-number-of-sockets-supported-2 : The maximum number of sockets that a Windows Sockets application can use is not affected by the manifest constant FD_SETSIZE. This value defined in the Winsock2.h header file is used in constructing the FD_SET structures used with the select function. -- Marina Polyakova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
pgbench stopped supporting large number of client connections on Windows
Hello, hackers! While trying to test a patch that adds a synchronization barrier in pgbench [1] on Windows, I found that since the commit "Use ppoll(2), if available, to wait for input in pgbench." [2] I cannot use a large number of client connections in pgbench on my Windows virtual machines (Windows Server 2008 R2 and Windows Server 2019), for example:

bin\pgbench.exe -c 90 -S -T 3 postgres starting vacuum...end. too many client connections for select()

Almost the same thing happens with reindexdb and vacuumdb (built on commit [3]):

bin\reindexdb.exe -j 95 postgres reindexdb: fatal: too many jobs for this platform -- try 90 bin\vacuumdb.exe -j 95 postgres vacuumdb: vacuuming database "postgres" vacuumdb: fatal: too many jobs for this platform -- try 90

IIUC the checks below are not correct on Windows, since on this system sockets can have values equal to or greater than FD_SETSIZE (see Windows documentation [4] and pgbench debug output in attached pgbench_debug.txt). src/bin/pgbench/pgbench.c, the function add_socket_to_set: if (fd < 0 || fd >= FD_SETSIZE) { /* * Doing a hard exit here is a bit grotty, but it doesn't seem worth * complicating the API to make it less grotty. */ pg_log_fatal("too many client connections for select()"); exit(1); } src/bin/scripts/scripts_parallel.c, the function ParallelSlotsSetup: /* * Fail and exit immediately if trying to use a socket in an * unsupported range. POSIX requires open(2) to use the lowest * unused file descriptor and the hint given relies on that. */ if (PQsocket(conn) >= FD_SETSIZE) { pg_log_fatal("too many jobs for this platform -- try %d", i); exit(1); } I tried to fix this, see attached fix_max_client_conn_on_Windows.patch (based on commit [3]). I checked it for reindexdb and vacuumdb, and it works for simple databases (1025 jobs are not allowed and 1024 jobs are ok).
Unfortunately, pgbench was getting connection errors when it tried to use 1000 jobs on my virtual machines, although there were no errors for fewer jobs (500) and the same number of clients (1000)... Any suggestions are welcome! [1] https://www.postgresql.org/message-id/flat/20200227180100.zyvjwzcpiokfsqm2%40alap3.anarazel.de [2] https://github.com/postgres/postgres/commit/60e612b602999e670f2d57a01e52799eaa903ca9 [3] https://github.com/postgres/postgres/commit/48e1291342dd7771cf8c67aa1d7ec1f394b95dd8 [4] From https://docs.microsoft.com/en-us/windows/win32/api/winsock2/nf-winsock2-select : Internally, socket handles in an fd_set structure are not represented as bit flags as in Berkeley Unix. Their data representation is opaque. -- Marina Polyakova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company

diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c index 3057665bbec567331ad5ea03d31af707f5e91b4c..7a54638db191982d538cabf007d82715fa254b6a 100644 --- a/src/bin/pgbench/pgbench.c +++ b/src/bin/pgbench/pgbench.c @@ -62,6 +62,7 @@ #include "common/string.h" #include "fe_utils/cancel.h" #include "fe_utils/conditional.h" +#include "fe_utils/socket_utils.h" #include "getopt_long.h" #include "libpq-fe.h" #include "pgbench.h" @@ -6698,7 +6699,7 @@ clear_socket_set(socket_set *sa) static void add_socket_to_set(socket_set *sa, int fd, int idx) { - if (fd < 0 || fd >= FD_SETSIZE) + if (fd < 0 || !check_fd_set_size(fd, &sa->fds)) { /* * Doing a hard exit here is a bit grotty, but it doesn't seem worth diff --git a/src/bin/scripts/scripts_parallel.c b/src/bin/scripts/scripts_parallel.c index ec264a269a7d7a506c347112af6be183b2aec9ce..61977741c7c5f76b1966ab8e59792a1d2b53b934 100644 --- a/src/bin/scripts/scripts_parallel.c +++ b/src/bin/scripts/scripts_parallel.c @@ -25,6 +25,7 @@ #include "common.h" #include "common/logging.h" #include "fe_utils/cancel.h" +#include "fe_utils/socket_utils.h" #include "scripts_parallel.h" static void
init_slot(ParallelSlot *slot, PGconn *conn); @@ -144,6 +145,16 @@ ParallelSlotsGetIdle(ParallelSlot *slots, int numslots) if (sock < 0) continue; + /* + * Fail and exit immediately if trying to use a socket in an + * unsupported range. + */ + if (!check_fd_set_size(sock, &slotset)) + { + pg_log_fatal("too many jobs for this platform -- try %d", i); + exit(1); + } + FD_SET(sock, &slotset); if (sock > maxFd) maxFd = sock; @@ -221,18 +232,6 @@ ParallelSlotsSetup(const ConnParams *cparams, for (i = 1; i < numslots; i++) { conn = connectDatabase(cparams, progname, echo, false, true); - - /* - * Fail and exit immediately if trying to use a socket in an - * unsupported range. POSIX requires open(2) to use the lowest - * unused file descriptor and the hint given relies o
Re: pg_upgrade check fails on Solaris 10
On 2019-09-23 19:41, Alvaro Herrera wrote: On 2019-Sep-23, Marina Polyakova wrote: The branch REL9_4_STABLE (commit 8a17afe84be6fefe76d0d2f4d26c5ee075e64487) has the same issue - according to the release table [2] it is still supported, isn't it?... Yeah, but pg_upgrade is in contrib/ in 9.4, so nowhere as good as from 9.5 onwards; and it's going to die in a couple of months anyway, so I'm not thrilled about fixing this there. If you *need* to have this fixed in 9.4, we can do that, but do you? No, we don't. I just noticed :-) -- Marina Polyakova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: pg_upgrade check fails on Solaris 10
On 2019-09-18 17:36, Alvaro Herrera wrote: On 2019-Sep-17, Marina Polyakova wrote: Hello, hackers! We got an error for pg_upgrade check on the branch REL_11_STABLE (commit 40ad4202513c72f5c1beeb03e26dfbc8890770c0) on Solaris 10 because IIUC the argument to the sed command is not enclosed in quotation marks (see [1]): Hmm, I'm surprised it has taken this long to detect the problem. Looking at the buildfarm members [1] castoroides and protosciurus, IIUC they do not check pg_upgrade. And I was the lucky one who ran the branch with the latest commits on our buildfarm... Attached diff.patch fixes the problem. I have pushed it to all branches that have src/bin/pg_upgrade (namely, 9.5 onwards), thanks. I hope this won't make the msys/mingw machines angry ;-) Thank you! I ran pg_upgrade tests for MSYS, everything is fine. The branch REL9_4_STABLE (commit 8a17afe84be6fefe76d0d2f4d26c5ee075e64487) has the same issue - according to the release table [2] it is still supported, isn't it?... [1] https://buildfarm.postgresql.org/cgi-bin/show_members.pl [2] https://www.postgresql.org/support/versioning/ -- Marina Polyakova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
pg_upgrade check fails on Solaris 10
Hello, hackers! We got an error for pg_upgrade check on the branch REL_11_STABLE (commit 40ad4202513c72f5c1beeb03e26dfbc8890770c0) on Solaris 10 because IIUC the argument to the sed command is not enclosed in quotation marks (see [1]): $ gmake -C src/bin/pg_upgrade/ check <...> MAKE=gmake bindir="/home/buildfarm/mpolyakova/postgrespro_REL_11_STABLE/inst/bin" libdir="/home/buildfarm/mpolyakova/postgrespro_REL_11_STABLE/inst/lib" EXTRA_REGRESS_OPTS="" /bin/sh test.sh --install test.sh: MSYS/MINGW/: not found gmake: *** [check] Error 1 gmake: Leaving directory `/home/buildfarm/mpolyakova/postgrespro_REL_11_STABLE/src/bin/pg_upgrade' $ sed: command garbled: s/ Attached diff.patch fixes the problem. About the system: SunOS, Release 5.10, KernelID Generic_141444-09. About the used shell: according to the manual, it comes from the package SUNWcsu. Thanks to Victor Wagner for his help to investigate this issue. [1] $ man sh <...> Quoting The following characters have a special meaning to the shell and cause termination of a word unless quoted: ; & ( ) | ^ < > newline space tab -- Marina Polyakova Postgres Professional: http://www.postgrespro.com The Russian Postgres Companydiff --git a/src/bin/pg_upgrade/test.sh b/src/bin/pg_upgrade/test.sh index 424d89c3feeced91531920e919a2d9e06cc4baa6..f258983b18f8276cfef3db0a425d59fead6493d9 100644 --- a/src/bin/pg_upgrade/test.sh +++ b/src/bin/pg_upgrade/test.sh @@ -33,7 +33,7 @@ standard_initdb() { # What flavor of host are we on? # Treat MINGW* (msys1) and MSYS* (msys2) the same. -testhost=`uname -s | sed s/^MSYS/MINGW/` +testhost=`uname -s | sed 's/^MSYS/MINGW/'` # Establish how the server will listen for connections case $testhost in
Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors
On 2018-11-16 22:59, Alvaro Herrera wrote: On 2018-Sep-05, Marina Polyakova wrote: v11-0001-Pgbench-errors-use-the-RandomState-structure-for.patch - a patch for the RandomState structure (this is used to reset a client's random seed during the repeating of transactions after serialization/deadlock failures). Pushed this one with minor stylistic changes (the most notable of which is the move of initRandomState to where the rest of the random generator infrastructure is, instead of in a totally random place). Thanks, Thank you very much! I'm going to send a new patch set by the end of this week (I'm sorry, I was very busy with the release of Postgres Pro 11...). -- Marina Polyakova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors
On 12-09-2018 17:04, Fabien COELHO wrote: Hello Marina, You can get other errors that cannot happen for only one client if you use shell commands in meta commands: Or if you use untrusted procedural languages in SQL expressions (see the used file in the attachments): Or if you try to create a function and perhaps replace an existing one: Sure. Indeed there can be shell errors, perl errors, create function conflicts... I do not understand what your point is wrt these. I'm mostly saying that your patch should focus on implementing the retry feature when appropriate, and avoid changing the behavior (error displayed, abort or not) on features unrelated to serialization & deadlock errors. Maybe there are inconsistencies, and a "bug"/"feature" worth fixing, but if so that should be a separate patch, if possible, and if these are bugs they could be backpatched. For now I'm still convinced that pgbench should keep on aborting on "\set" or SQL syntax errors, and show clear error messages on these, and your examples have not changed my mind on that point. I'm fine with renaming the field if it makes things clearer. They are all counters, so naming them "cnt" or "total_cnt" does not help much. Maybe "succeeded" or "success" to show what is really counted? Perhaps renaming StatsData.cnt is better than just adding a comment to this field. But IMO we have the same problem (They are all counters, so naming them "cnt" or "total_cnt" does not help much.) for CState.cnt, which cannot be named in the same way because it also includes skipped and failed transactions. Hmmm. CState's cnt seems only used to implement -t anyway? I'm okay if it has a different name, especially if it has different semantics. Ok! I think I was arguing only about cnt in StatsData. The discussion about this has become entangled from the beginning because, as I wrote in [1], at first I misread your original proposal...
[1] https://www.postgresql.org/message-id/d318cdee8f96de6b1caf2ce684ffe4db%40postgrespro.ru -- Marina Polyakova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors
0 DO $$ Or if you try to create a function and perhaps replace an existing one: starting vacuum...end. client 0 got an error in command 0 (SQL) of script 0; ERROR: duplicate key value violates unique constraint "pg_proc_proname_args_nsp_index" DETAIL: Key (proname, proargtypes, pronamespace)=(my_function, , 2200) already exists. client 0 got an error in command 0 (SQL) of script 0; ERROR: tuple concurrently updated client 1 got an error in command 0 (SQL) of script 0; ERROR: tuple concurrently updated client 1 got an error in command 0 (SQL) of script 0; ERROR: tuple concurrently updated client 1 got an error in command 0 (SQL) of script 0; ERROR: tuple concurrently updated client 1 got an error in command 0 (SQL) of script 0; ERROR: tuple concurrently updated client 0 got an error in command 0 (SQL) of script 0; ERROR: tuple concurrently updated client 1 got an error in command 0 (SQL) of script 0; ERROR: tuple concurrently updated client 1 got an error in command 0 (SQL) of script 0; ERROR: tuple concurrently updated client 0 got an error in command 0 (SQL) of script 0; ERROR: tuple concurrently updated transaction type: pgbench_create_function.sql scaling factor: 1 query mode: simple number of clients: 2 number of threads: 1 number of transactions per client: 10 number of transactions actually processed: 10/20 number of failures: 10 (50.000%) number of serialization failures: 0 (0.000%) number of deadlock failures: 0 (0.000%) number of other SQL failures: 10 (50.000%) maximum number of tries: 1 latency average = 82.881 ms (including failures) tps = 12.065492 (including connections establishing) tps = 12.092216 (excluding connections establishing) statement latencies in milliseconds and failures: 82.549 10 CREATE OR REPLACE FUNCTION my_function() RETURNS integer AS 'select 1;' LANGUAGE SQL; Why not handle client errors that can occur (but they may also not occur) the same way? 
(For example, always abort the client, or conversely do not abort in these cases.) Here's an example of such an error: client 5 got an error in command 1 (SQL) of script 0; ERROR: division by zero This is an interesting case. For me we must stop the script because the client is asking for something "stupid", and retrying the same won't change the outcome, the division will still be by zero. It is the client's responsibility not to ask for something stupid, the bench script is buggy, it should not submit illegal SQL queries. This is quite different from submitting something legal which happens to fail. ... I'm not sure that having "--debug" imply this option is useful: As there are two distinct options, the user may be allowed to trigger one or the other as they wish? I'm not sure that the main debugging output will give a good clue of what's happened without full messages about errors, retries and failures... I'm mostly arguing for letting the user decide what they want. These lines are quite long - do you suggest wrapping them this way? Sure, if it is too long, then wrap. Ok! The name of the function getTransactionStatus does not seem to correspond fully to what the function does. There is a passthru case which should be either avoided or clearly commented. I don't quite understand you - do you mean that in fact this function finds out whether we are in a (failed) transaction block or not? Or do you mean that the case of PQTRANS_INTRANS is also ok?... The former: although the function is named "getTransactionStatus", it does not really return the "status" of the transaction (aka PQstatus()?). Thank you, I'll think about how to improve it. Perhaps the name checkTransactionStatus will be better... I'd insist in a comment that "cnt" does not include "skipped" transactions (anymore). If you mean CState.cnt, I'm not sure if this is practically useful because the code uses only the sum of all client transactions including skipped and failed...
Maybe we can rename this field to nxacts or total_cnt? I'm fine with renaming the field if it makes things clearer. They are all counters, so naming them "cnt" or "total_cnt" does not help much. Maybe "succeeded" or "success" to show what is really counted? Perhaps renaming StatsData.cnt is better than just adding a comment to this field. But IMO we have the same problem (They are all counters, so naming them "cnt" or "total_cnt" does not help much.) for CState.cnt, which cannot be named in the same way because it also includes skipped and failed transactions. -- Marina Polyakova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company

DO $$ my $directory = "my_directory"; mkdir $directory or elog(ERROR, qq{could not create the directory "$directory": $!}); select(undef, undef, undef, 0.001); rmdir $directory or elog(ERROR, qq{could not delete the directory "$directory": $!}); $$ LANGUAGE plperlu;
Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors
On 11-09-2018 16:47, Marina Polyakova wrote: On 08-09-2018 16:03, Fabien COELHO wrote: Hello Marina, I'd insist in a comment that "cnt" does not include "skipped" transactions (anymore). If you mean CState.cnt, I'm not sure if this is practically useful because the code uses only the sum of all client transactions including skipped and failed... Maybe we can rename this field to nxacts or total_cnt? Sorry, I misread your proposal the first time. Ok! -- Marina Polyakova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors
g"); + clientAborted(st, + "perhaps the backend died while processing"); keep on one line? I tried not to break the limit of 80 characters, but if you think that this is better, I'll change it. Overall, the comment text in StatsData is very clear. However they are not clearly linked to the struct fields. I'd suggest that each field when used should be quoted, so as to separate English from code, and the struct name should always be used explicitly when possible. Ok! I'd insist in a comment that "cnt" does not include "skipped" transactions (anymore). If you mean CState.cnt, I'm not sure if this is practically useful because the code uses only the sum of all client transactions including skipped and failed... Maybe we can rename this field to nxacts or total_cnt? About v11-4. I do not feel that these changes are very useful/important for now. I'd propose that you prioritize updating 11-3 so that we can have another round about it as soon as possible, and keep that one for later. Ok! -- Marina Polyakova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors
On 08-09-2018 10:17, Fabien COELHO wrote: Hello Marina, Hello, Fabien! About the two first preparatory patches. v11-0001-Pgbench-errors-use-the-RandomState-structure-for.patch - a patch for the RandomState structure (this is used to reset a client's random seed during the repeating of transactions after serialization/deadlock failures). Same version as the previous one, which was ok. Still applies, compiles, passes tests. Fine with me. v11-0002-Pgbench-errors-use-the-Variables-structure-for-c.patch - a patch for the Variables structure (this is used to reset client variables during the repeating of transactions after serialization/deadlock failures). Simpler version, applies cleanly on top of previous patch, compiles and global & local "make check" are ok. Fine with me as well. Glad to hear it :) -- Marina Polyakova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors
On 17-08-2018 14:04, Fabien COELHO wrote: ... Or perhaps we can use a more detailed failure status so for each type of failure we always know the command name (argument "cmd") and whether the client is aborted. Something like this (but in comparison with the first variant ISTM overly complicated): I agree. I do not think that it would be useful given that the same thing is done on all meta-command error cases in the end. Ok! -- Marina Polyakova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors
On 17-08-2018 10:49, Fabien COELHO wrote: Hello Marina, Detailed -r report. I understand from the doc that the retry number on the detailed per-statement report is to identify at what point errors occur? Probably this is more or less always at the same point on a given script, so that the most interesting feature is to report the number of retries at the script level. This may depend on various factors.. for example: [...] 21.239 5 36 UPDATE xy3 SET y = y + :delta WHERE x = :x3; 21.360 5 39 UPDATE xy2 SET y = y + :delta WHERE x = :x2; Ok, not always the same point, and you confirm that it identifies where the error is raised which leads to a retry. Yes, I confirm this. I'll try to write more clearly about this in the documentation... So we can write something like this: All the transactions are divided into several types depending on their execution. Firstly, they can be divided into transactions that we started to execute, and transactions which were skipped (it was too late to execute them). Secondly, running transactions fall into 2 main types: is there any command that got a failure during the last execution of the transaction script or not? Thus Here is an attempt at having a more precise and shorter version, not sure it is much better than yours, though: """ Transactions are counted depending on their execution and outcome. First a transaction may have started or not: skipped transactions occur under --rate and --latency-limit when the client is too late to execute them. Secondly, a started transaction may ultimately succeed or fail on some error, possibly after some retries when --max-tries is not one. Thus """ Thank you! I'd suggest to put "ANOTHER_SQL_FAILURE" as the last option, otherwise "another" does not make sense yet. Maybe firstly put a general group, and then special cases?... I understand it more as a catch all default "none of the above" case. Ok! 
commandFailed: I'm not thrilled by the added boolean, which is partially redundant with the second argument. Do you mean that it is partially redundant with the argument "cmd" and, for example, that meta command errors never cause the client to abort? Yes. And also I'm not sure we should want this boolean at all. Perhaps we can use a separate function to print the messages about the client's abortion, something like this (it is assumed that all abortions happen when processing SQL commands): static void clientAborted(CState *st, const char *message) { pgbench_error(..., "client %d aborted in command %d (SQL) of script %d; %s\n", st->id, st->command, st->use_file, message); } Or perhaps we can use a more detailed failure status so for each type of failure we always know the command name (argument "cmd") and whether the client is aborted. Something like this (but in comparison with the first variant ISTM overly complicated): /* * For the failures during script execution. */ typedef enum FailureStatus { NO_FAILURE = 0, /* * Failures in meta commands. In these cases the failed transaction is * terminated. */ META_SET_FAILURE, META_SETSHELL_FAILURE, META_SHELL_FAILURE, META_SLEEP_FAILURE, META_IF_FAILURE, META_ELIF_FAILURE, /* * Failures in SQL commands. In cases of serialization/deadlock failures a * failed transaction is re-executed from the very beginning if possible; * otherwise the failed transaction is terminated. */ SERIALIZATION_FAILURE, DEADLOCK_FAILURE, OTHER_SQL_FAILURE, /* other failures in SQL commands that are not * listed by themselves above */ /* * Failures while processing SQL commands. In this case the client is * aborted. */ SQL_CONNECTION_FAILURE } FailureStatus; [...] If in such cases one command is placed on several lines, ISTM that the code is more understandable if curly brackets are used... Hmmm.
Such basic style changes are avoided because they break backpatching, so we try to avoid gratuitous changes unless there is a strong added value, which does not seem to be the case here. Ok! -- Marina Polyakova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors
I'm not thrilled by the added boolean, which is partially redundant with the second argument. Do you mean that it is partially redundant with the argument "cmd" and, for example, that meta command errors never cause the client to abort? if (per_script_stats) - accumStats(&sql_script[st->use_file].stats, skipped, latency, lag); + { + accumStats(&sql_script[st->use_file].stats, skipped, latency, lag, + st->failure_status, st->retries); + } } I do not see the point of changing the style here. If in such cases one command is placed on several lines, ISTM that the code is more understandable if curly brackets are used... [1] https://www.postgresql.org/message-id/fcc2512cdc9e6bc49d3b489181f454da%40postgrespro.ru -- Marina Polyakova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors
On 12-08-2018 12:14, Fabien COELHO wrote: Hello Marina, Hello, Fabien! v10-0003-Pgbench-errors-use-the-Variables-structure-for-c.patch - a patch for the Variables structure (this is used to reset client variables during the repeating of transactions after serialization/deadlock failures). This patch adds an explicit structure to manage Variables, which is useful to reset these on pgbench script retries, which is the purpose of the whole patch series. About part 3: Patch applies cleanly, On 12-08-2018 12:17, Fabien COELHO wrote: About part 3: Patch applies cleanly, I forgot: compiles, global & local "make check" are ok. I'm glad to hear it :-) * typo in comments: "varaibles" I'm sorry, I'll fix it. * About enlargeVariables: multiple INT_MAX error handling looks strange, especially as this code can never be triggered because pgbench would be dead long before having allocated INT_MAX variables. So I would not bother to add such checks. ... I'm not sure that the size_t casts here and there are useful for any practical values likely to be encountered by pgbench. Looking at the code of the functions, for example, ParseScript and psql_scan_setup, where the integer variable is used for the size of the entire script - ISTM that you are right... Therefore the size_t casts will also be removed. ISTM that if something is amiss it will fail in pg_realloc anyway. IIUC, if physical RAM is not enough, this may depend on the size of the swap. Also I do not like the ExpBuf stuff, as usual. The exponential allocation seems overkill. I'd simply add a constant number of slots, with a simple rule: /* reallocated with a margin */ if (max_vars < needed) max_vars = needed + 8; So in the end the function should be much simpler. Ok! -- Marina Polyakova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors
On 10-08-2018 17:19, Arthur Zakirov wrote: On Fri, Aug 10, 2018 at 04:46:04PM +0300, Marina Polyakova wrote: > +1 from me to keep initial name "pgbench_error". "pgbench_log" for new > function looks nice to me. I think it is better than just "log", > because "log" may conflict with natural logarithmic function (see "man 3 > log"). Do you think that pgbench_log (or another whose name speaks only about logging) will look good, for example, with FATAL? Because this means that the logging function also processes errors and calls exit(1) if necessary... Yes, why not. "_log" just means that you want to log some message with the specified log level. Moreover, those messages sometimes aren't errors: pgbench_error(LOG, "starting vacuum..."); "pgbench_log" is already used as the default filename prefix for transaction logging. > I agree with Fabien. Calling pgbench_error() inside pgbench_error() > could be dangerous. I think "fmt" checking could be removed, or we may > use Assert() I would prefer not to use Assert in this case because IIUC they are mostly used for testing. I'd vote to remove this check altogether. I don't see any place where it is possible to call pgbench_error() passing an empty "fmt". pgbench_error(..., "%s", PQerrorMessage(con)); ? -- Marina Polyakova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors
On 10-08-2018 15:53, Arthur Zakirov wrote: On Thu, Aug 09, 2018 at 06:17:22PM +0300, Marina Polyakova wrote: > * ErrorLevel > > If ErrorLevel is used for things which are not errors, its name should > not include "Error"? Maybe "LogLevel"?

On the one hand, this sounds better to me too. On the other hand, will not this be in some kind of conflict with error level codes in elog.h?..

I think it shouldn't, because those error levels are backend levels. pgbench is a client-side utility with its own code; it shares some code with libpq and other utilities, but elog.h isn't one of them.

I agree with you on this :) I just meant that maybe it would be better to call this group in the same way because they are used in general for the same purpose?..

> This point also suggests that maybe "pgbench_error" is misnamed as well > (ok, I know I suggested it in place of ereport, but e stands for error > there), as it is called on errors, but also on other things. Maybe > "pgbench_log"? Or just simply "log" or "report", as it is really a > local function, which does not need a prefix? That would mean that > "pgbench_simple_error", which is indeed called on errors, could keep > its initial name "pgbench_error", and be called on errors.

About the name "log" - we already have the function doLog, so perhaps the name "report" will be better.. But as with ErrorLevel, will not this be in some kind of conflict with ereport which is also used for the levels DEBUG... / LOG / INFO?

+1 from me to keep the initial name "pgbench_error". "pgbench_log" for the new function looks nice to me. I think it is better than just "log", because "log" may conflict with the natural logarithmic function (see "man 3 log").

Do you think that pgbench_log (or another name that speaks only about logging) will look good, for example, with FATAL? Because this means that the logging function also processes errors and calls exit(1) if necessary..

> pgbench_error calls pgbench_error. Hmmm, why not. I agree with Fabien.
Calling pgbench_error() inside pgbench_error() could be dangerous. I think "fmt" checking could be removed, or we may use Assert() I would like not to use Assert in this case because IIUC they are mostly used for testing. or fprintf()+exit(1) at least. Ok! -- Marina Polyakova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors
On 10-08-2018 11:33, Fabien COELHO wrote: Hello Marina,

I'd suggest to let lookupCreateVariable, putVariable* as they are, call pgbench_error with a level which does not stop the execution, and abort if necessary from the callers with an "aborted because of putVariable/eval/... error" message, as it was done before.

There's one more problem: if this is a client failure, an error message inside any of these functions should be printed at the level DEBUG_FAILS; otherwise it should be printed at the level LOG. Or do you suggest using the error level as an argument for these functions?

No. I suggest that the called function does only one simple thing, probably "DEBUG", and that the *caller* prints a message if it is unhappy about the failure of the called function, as it is currently done. This allows the caller to provide context as well, eg "setting variable %s failed while ". The user can rerun under debug for precision if they need it.

Ok!

I'm still not over-enthusiastic with these changes, and still think that it should be an independent patch, not submitted together with the "retry on error" feature.

In the next version I will put the error patch last, so it will be possible to compare the "retry on error" feature with it and without it, and let the committer decide how it is better) -- Marina Polyakova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors
report, but e stands for error there), as it is called on errors, but also on other things. Maybe "pgbench_log"? Or just simply "log" or "report", as it is really a local function, which does not need a prefix? That would mean that "pgbench_simple_error", which is indeed called on errors, could keep its initial name "pgbench_error", and be called on errors.

About the name "log" - we already have the function doLog, so perhaps the name "report" will be better.. But as with ErrorLevel, will not this be in some kind of conflict with ereport which is also used for the levels DEBUG... / LOG / INFO?

Alternatively, the debug/logging code could be left as it is (i.e. direct print to stderr) and the function only called when there is some kind of error, in which case it could be named with "error" in its name (or elog/ereport...).

As I wrote in [2]: because of my patch there may be a lot of such code (see v7 in [1]):

    - fprintf(stderr,
    -         "malformed variable \"%s\" value: \"%s\"\n",
    -         var->name, var->svalue);
    + if (debug_level >= DEBUG_FAILS)
    + {
    +     fprintf(stderr,
    +             "malformed variable \"%s\" value: \"%s\"\n",
    +             var->name, var->svalue);
    + }

    - if (debug)
    + if (debug_level >= DEBUG_ALL)
          fprintf(stderr, "client %d sending %s\n", st->id, sql);

That's why it was suggested to make the error function which hides all these things (see [2]): There is a lot of checks like "if (debug_level >= DEBUG_FAILS)" with corresponding fprintf(stderr..) I think it's time to do it like in the main code, wrap with some function like log(level, msg). And IIUC macros will not help in the absence of __VA_ARGS__.

* PQExpBuffer

I still do not see a positive value from importing PQExpBuffer complexity and cost into pgbench, as the resulting code is not very readable and it adds malloc/free cycles, so I'd try to avoid using PQExpBuf as much as possible. ISTM that all usages could be avoided in the patch, and most should be avoided even if ExpBuffer is imported because it is really useful somewhere.
- to call pgbench_error from pgbench_simple_error, you can do a pgbench_log_va(level, format, va_list) version called both from pgbench_error & pgbench_simple_error.

- for the PGBENCH_DEBUG function, do separate calls per type; the very small partial code duplication is worth avoiding ExpBuf IMO.

- for doCustom debug: I'd just let the printf as it is, with a comment, as it is really very internal stuff for debug. Or I'd just snprintf something in a static buffer.

- for syntax_error: it should terminate, so it should call pgbench_error(FATAL, ...). Idem, I'd either keep the printf then call pgbench_error(FATAL, "syntax error found\n") for a final message, or snprintf in a static buffer.

- for listAvailableScript: I'd simply call "pgbench_error(LOG" several times, once per line.

I see building a string with a format (printfExpBuf..) and then calling the pgbench_error function with just a "%s" format on the result as not very elegant, because the second format is somehow hacked around.

Ok! About using a static buffer in doCustom debug or in syntax_error - I'm not sure that this is always possible because ISTM that the variable name can be quite large.

* bool client

I'm unconvinced by this added boolean just to switch the level on encountered errors. I'd suggest to let lookupCreateVariable, putVariable* as they are, call pgbench_error with a level which does not stop the execution, and abort if necessary from the callers with an "aborted because of putVariable/eval/... error" message, as it was done before.

There's one more problem: if this is a client failure, an error message inside any of these functions should be printed at the level DEBUG_FAILS; otherwise it should be printed at the level LOG. Or do you suggest using the error level as an argument for these functions?

pgbench_error calls pgbench_error. Hmmm, why not.
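The first suggestion above - a shared va_list worker so that both entry points forward to one place and pgbench_error never calls itself - could look roughly like this. The level names and signatures are assumptions for illustration (the int return, echoing vfprintf, is just a convenience), not the committed API:

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical error levels, loosely modeled on the discussion above. */
typedef enum { ELEVEL_DEBUG, ELEVEL_LOG, ELEVEL_FATAL } ErrorLevel;

/* Shared worker: both public entry points forward their va_list here. */
static int
pgbench_log_va(ErrorLevel level, const char *fmt, va_list args)
{
    int     n = vfprintf(stderr, fmt, args);

    if (level >= ELEVEL_FATAL)
        exit(1);                /* fatal messages terminate the run */
    return n;
}

static int
pgbench_error(ErrorLevel level, const char *fmt, ...)
{
    va_list args;
    int     n;

    va_start(args, fmt);
    n = pgbench_log_va(level, fmt, args);
    va_end(args);
    return n;
}

static int
pgbench_simple_error(const char *fmt, ...)
{
    va_list args;
    int     n;

    va_start(args, fmt);
    n = pgbench_log_va(ELEVEL_LOG, fmt, args);
    va_end(args);
    return n;
}
```

With this shape, neither public function needs to call the other, which sidesteps the recursion concern raised earlier in the thread.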
[1] https://www.postgresql.org/message-id/alpine.DEB.2.21.1806100837380.3655%40lancre [2] https://www.postgresql.org/message-id/b692de21caaed13c59f31c06d0098488%40postgrespro.ru -- Marina Polyakova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors
On 07-08-2018 19:21, Fabien COELHO wrote: Hello Marina, Hello, Fabien!

v10-0001-Pgbench-errors-use-the-RandomState-structure-for.patch - a patch for the RandomState structure (this is used to reset a client's random seed during the repeating of transactions after serialization/deadlock failures).

About this v10 part 1: Patch applies cleanly, compiles, global & local make check both ok. The random state is cleanly separated so that it will be easy to reset it on client error handling. ISTM that the pgbench side is deterministic with the separation of the seeds for different uses. Code is clean, comments are clear. :-)

I'm wondering what is the rationale for the "xseed" field name? In particular, what does the "x" stand for?

I called it "...seed" instead of "data" because perhaps "data" is too general a name for use here (but I'm not entirely sure what Alvaro Herrera meant in [1], see my answer in [2]). I called it "xseed" to match the arguments of the functions _dorand48 / pg_erand48 / pg_jrand48 in the file erand48.c. IIUC they use a linear congruential generator, and perhaps "xseed" means the sequence with the name X of pseudorandom values of size 48 bits (X_0, X_1, ... X_n) where X_0 is the seed / the start value.

[1] https://www.postgresql.org/message-id/20180711180417.3ytmmwmonsr5lra7@alvherre.pgsql

LGTM, though I'd rename the random_state struct members so that it wouldn't look as confusing. Maybe that's just me.

[2] https://www.postgresql.org/message-id/cb2cde10e4e7a10a38b48e9cae8fbd28%40postgrespro.ru -- Marina Polyakova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors
On 11-07-2018 22:34, Fabien COELHO wrote: can we try something like this? PGBENCH_ERROR_START(DEBUG_FAIL) { PGBENCH_ERROR("client %d repeats the failed transaction (try %d",

Argh, no? I was thinking of something much more trivial:

    pgbench_error(DEBUG, "message format %d %s...", 12, "hello world");

If you really need some complex dynamic buffer, and I would prefer that you avoid that, then the fallback is:

    if (level >= DEBUG)
    {
        initPQstuff();
        ...
        pgbench_error(DEBUG, "fixed message... %s\n", msg);
        freePQstuff();
    }

The point is to avoid building the message with dynamic allocation and so on if in the end it is not used.

Ok! About avoidance - I'm afraid there's one more piece of debugging code with the same problem:

    else if (command->type == META_COMMAND)
    {
        ...
        initPQExpBuffer(&errmsg_buf);
        printfPQExpBuffer(&errmsg_buf, "client %d executing \\%s",
                          st->id, argv[0]);
        for (i = 1; i < argc; i++)
            appendPQExpBuffer(&errmsg_buf, " %s", argv[i]);
        appendPQExpBufferChar(&errmsg_buf, '\n');
        ereport(ELEVEL_DEBUG, (errmsg("%s", errmsg_buf.data)));
        termPQExpBuffer(&errmsg_buf);

-- Marina Polyakova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors
On 11-07-2018 21:04, Alvaro Herrera wrote: Just a quick skim while refreshing what were those error reporting API changes about ... Thank you!

On 2018-May-21, Marina Polyakova wrote: v9-0001-Pgbench-errors-use-the-RandomState-structure-for-.patch - a patch for the RandomState structure (this is used to reset a client's random seed during the repeating of transactions after serialization/deadlock failures).

LGTM, though I'd rename the random_state struct members so that it wouldn't look as confusing. Maybe that's just me.

IIUC, do you like "xseed" instead of "data"?

     typedef struct RandomState
     {
    -    unsigned short data[3];
    +    unsigned short xseed[3];
     } RandomState;

Or do you want to rename "random_state" in the structures RetryState / CState / TState? Thanks to Fabien Coelho's comments in [1], TState can contain several RandomStates for different purposes, something like this:

    /*
     * Thread state
     */
    typedef struct
    {
        ...
        /*
         * Separate randomness for each thread. Each thread option uses its
         * own random state to make all of them independent of each other
         * and therefore deterministic at the thread level.
         */
        RandomState choose_script_rs; /* random state for selecting a script */
        RandomState throttling_rs;    /* random state for transaction throttling */
        RandomState sampling_rs;      /* random state for log sampling */
        ...
    } TState;

v9-0002-Pgbench-errors-use-the-Variables-structure-for-cl.patch - a patch for the Variables structure (this is used to reset client variables during the repeating of transactions after serialization/deadlock failures).

Please don't allocate Variable structs one by one. First time allocate some decent number (say 8) and then enlarge by doubling the size. That way you save realloc overhead. We use this technique everywhere else, no reason to do differently here. Other than that, LGTM. Ok!
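Alvaro's allocation scheme (start at a decent number of slots, double when full) is the classic amortized-growth pattern; a minimal sketch, with all names hypothetical:

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Sketch of the suggested scheme: allocate 8 slots up front and double
 * when full, amortizing the realloc overhead.  Names are illustrative,
 * not from the actual patch.
 */
typedef struct
{
    char  **slots;      /* variable slots */
    int     nused;      /* slots in use */
    int     nallocated; /* slots allocated */
} VariablePool;

static void
ensure_slot(VariablePool *p)
{
    if (p->nused < p->nallocated)
        return;                     /* a free slot is available */

    p->nallocated = (p->nallocated == 0) ? 8 : p->nallocated * 2;
    p->slots = realloc(p->slots, p->nallocated * sizeof(char *));
    assert(p->slots != NULL);       /* pg_realloc would exit instead */
}
```

Doubling gives O(1) amortized insertion cost: n insertions trigger only O(log n) realloc calls, which is why the backend uses the same technique elsewhere.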
[1] https://www.postgresql.org/message-id/alpine.DEB.2.21.1806090810090.5307%40lancre

While reading your patch, it occurs to me that a run is not deterministic at the thread level under throttling and sampling, because the random state is solicited differently depending on when transactions end. This suggests that maybe each thread random_state use should have its own random state.

-- Marina Polyakova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors
On 11-07-2018 20:49, Alvaro Herrera wrote: On 2018-Jul-11, Marina Polyakova wrote: can we try something like this?

    PGBENCH_ERROR_START(DEBUG_FAIL)
    {
        PGBENCH_ERROR("client %d repeats the failed transaction (try %d",
                      st->id, st->retries + 1);
        if (max_tries)
            PGBENCH_ERROR("/%d", max_tries);
        if (latency_limit)
        {
            PGBENCH_ERROR(", %.3f%% of the maximum time of tries was used",
                          getLatencyUsed(st, ));
        }
        PGBENCH_ERROR(")\n");
    }
    PGBENCH_ERROR_END();

I didn't quite understand what these PGBENCH_ERROR() functions/macros are supposed to do. Care to explain?

It is used only to print a string with the given arguments to stderr. Probably it might be just the function pgbench_error and not a macro..

P.S. This is my mistake, I did not think that PGBENCH_ERROR_END does not know the elevel for calling exit(1) if the elevel >= ERROR.

-- Marina Polyakova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors
or not. ISTM that it will be shown in the final report. If I want debug, I ask for --debug, otherwise I think that the command should do what it was asked for, i.e. run scripts, collect performance statistics and show them at the end.

In particular, when running with retries enabled, the user is expecting deadlock/serialization errors, so they are not "errors" as such for them. They also help to understand which limit of retries was violated or how close we were to these limits during the execution of a specific transaction. But I agree with you that they are costly and can be skipped if the failure type is never retried. Maybe it is better to split them into multiple error function calls?..

Debugging message costs should only be incurred when under --debug, not otherwise. Ok!

IIUC instead of this part of the code

    initPQExpBuffer(&errmsg_buf);
    printfPQExpBuffer(&errmsg_buf,
                      "client %d repeats the failed transaction (try %d",
                      st->id, st->retries + 1);
    if (max_tries)
        appendPQExpBuffer(&errmsg_buf, "/%d", max_tries);
    if (latency_limit)
    {
        appendPQExpBuffer(&errmsg_buf,
                          ", %.3f%% of the maximum time of tries was used",
                          getLatencyUsed(st, ));
    }
    appendPQExpBufferStr(&errmsg_buf, ")\n");
    pgbench_error(DEBUG_FAIL, "%s", errmsg_buf.data);
    termPQExpBuffer(&errmsg_buf);

can we try something like this?

    PGBENCH_ERROR_START(DEBUG_FAIL)
    {
        PGBENCH_ERROR("client %d repeats the failed transaction (try %d",
                      st->id, st->retries + 1);
        if (max_tries)
            PGBENCH_ERROR("/%d", max_tries);
        if (latency_limit)
        {
            PGBENCH_ERROR(", %.3f%% of the maximum time of tries was used",
                          getLatencyUsed(st, ));
        }
        PGBENCH_ERROR(")\n");
    }
    PGBENCH_ERROR_END();

You have added 20-column alignment prints. This looks like too much and generates much too large lines. Probably 10 (billion) would be enough. I have already asked you about this in [2]: Probably:-) The variables for the numbers of failures and retries are of type int64 since the variable for the total number of transactions has the same type.
That's why such a large alignment (as I understand it now, 20 characters are enough). Do you prefer floating alignments, depending on the maximum number of failures/retries for any command in any script?

An int64 counter is not likely to reach its limit anytime soon:-) If the column display limit is ever reached, ISTM that then the text is just misaligned, which is a minor and rare inconvenience. If very wide columns are used, then it does not fit my terminal and the report text will always be wrapped around, which makes it harder to read, every time. Ok!

The latency limit to 900 ms try is a bad idea because it takes a lot of time. I did such tests before and they were removed by Tom Lane because of determinism and time issues. I would comment this test out for now. Ok! If it doesn't bother you - can you tell me more about the causes of these determinism issues?.. Tests for some other failures that cannot be retried are already added to 001_pgbench_with_server.pl.

Some farm animals are very slow, so you cannot really assume much about time one way or another. Thanks!

I do not understand why there is so much text about the "in failed sql transaction" stuff, while we are mainly interested in serialization & deadlock errors, and this only falls in some "other" category. There seems to be more details about other errors than about deadlock & serializable errors. The reporting should focus on what is of interest, either all errors, or some detailed split of these errors.

<...>

* "errors_in_failed_tx" is some subcounter of "errors", for a special case. Why it is there fails me [I finally understood, and I think it should be removed, see end of review]. If we wanted to distinguish, then we should distinguish homogeneously: maybe just count the different error types, eg have things like "deadlock_errors", "serializable_errors", "other_errors", "internal_pgbench_errors", which would be orthogonal to one another, and "errors" could be recomputed from these. Thank you, I agree with you.
Unfortunately each new error type adds 1 or 2 new columns of maximum width 20 to the per-statement report.

The fact that some data are collected does not mean that they should all be reported in detail. We can have a detailed error count and report the sum of these errors for instance, or have some more verbose/detailed reports as options (eg --latencies does just that). Ok!

-- Marina Polyakova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors
(trigger one or a few deadlocks), idem for serializable, maybe idem for other errors if any.

<...>

The latency limit to 900 ms try is a bad idea because it takes a lot of time. I did such tests before and they were removed by Tom Lane because of determinism and time issues. I would comment this test out for now. Ok! If it doesn't bother you - can you tell me more about the causes of these determinism issues?.. Tests for some other failures that cannot be retried are already added to 001_pgbench_with_server.pl.

The challenge is to do that reliably and efficiently, i.e. so that the test does not rely on chance and is still quite efficient. The trick you use is to run an interactive psql in parallel to pgbench so as to play with concurrent locks. That is interesting, but deserves more comments and explanation, eg before the test functions. Maybe this could be achieved within pgbench by using some wait stuff in PL/pgSQL so that concurrent clients can wait for one another based on data in an unlogged table updated by a CALL within an "embedded" transaction? Not sure.

<...>

Anyway, TAP tests should be much lighter (in total time), and if possible much simpler. I'll try, thank you.. Otherwise, maybe a (simple) pgbench-side thread barrier could help, but this would require more thinking. Tests must pass if we use --disable-thread-safety..

Documentation = Not looked at in much detail for now. Just a few comments: Having the "most important settings" on lines 1-6 and 8 (i.e. skipping 7) looks silly. The important ones should simply be the first ones, and the 8th is not that important, or it is in 7th position. Ok!

I do not understand why there is so much text about the "in failed sql transaction" stuff, while we are mainly interested in serialization & deadlock errors, and this only falls in some "other" category. There seems to be more details about other errors than about deadlock & serializable errors.
The reporting should focus on what is of interest, either all errors, or some detailed split of these errors.

<...>

* "errors_in_failed_tx" is some subcounter of "errors", for a special case. Why it is there fails me [I finally understood, and I think it should be removed, see end of review]. If we wanted to distinguish, then we should distinguish homogeneously: maybe just count the different error types, eg have things like "deadlock_errors", "serializable_errors", "other_errors", "internal_pgbench_errors", which would be orthogonal to one another, and "errors" could be recomputed from these. Thank you, I agree with you.

Unfortunately each new error type adds 1 or 2 new columns of maximum width 20 to the per-statement report (to report errors and possibly retries of this type in this statement), and we already have 2 new columns for all errors and retries. So I'm not sure that we need to add anything other than statistics about all the errors and all the retries in general.

The documentation should state clearly what the counted errors are, and then what their effects on the reported stats are. The "Errors and Serialization/Deadlock Retries" section is a good start in that direction, but it does not talk about pgbench internal errors (eg "cos(true)"). I think it should be more explicit about errors. Thank you, I'll try to improve it.

Option --max-tries default value should be spelled out in the doc. If you mean that it is set to 1 if neither of the options --max-tries or --latency-limit is explicitly used, I'll fix this.

"Client's run is aborted", do you mean "Pgbench run is aborted"? No, other clients continue their run as usual.

* FailureStatus states are not homogeneously named. I'd suggest to use *_FAILURE for all cases. The miscellaneous case should probably be the last. I do not understand the distinction between ANOTHER_FAILURE & IN_FAILED_SQL_TRANSACTION. Why should it be needed?
[again, see end of review]

<...>

"If a failed transaction block does not terminate in the current script": this just looks like a very bad idea, and explains my general ranting above about this error condition. ISTM that the only reasonable option is that a pgbench script should be enforced as a transaction, or a set of transactions, but cannot be a "piece" of a transaction, i.e. a pgbench script with "BEGIN;" but without a corresponding "COMMIT" is a user error and warrants an abort, so that there is no need to manage these "in aborted transaction" errors everywhere and report about them and document them extensively.

This means adding a check when a script is finished or starting that PQtransactionStatus(const PGconn *conn) == PQTRANS_IDLE, and abort if not with a fatal error. Then we can forget about these "in tx errors" counting, reporting and so on, and just have to document the restriction. Ok!

[1] https://www.postgresql.org/message-id/alpine.DEB.2.20.1801031720270.20034%40lancre [2] https://www.postgresql.org/message-id/e4c5e8cefa4a8e88f1273b0f1ee29...@postgrespro.ru -- Marina Polyakova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors
On 13-06-2018 22:44, Fabien COELHO wrote: Hello Marina,

I suppose that this is related; because of my patch there may be a lot of such code (see v7 in [1]):

    - fprintf(stderr,
    -         "malformed variable \"%s\" value: \"%s\"\n",
    -         var->name, var->svalue);
    + if (debug_level >= DEBUG_FAILS)
    + {
    +     fprintf(stderr,
    +             "malformed variable \"%s\" value: \"%s\"\n",
    +             var->name, var->svalue);
    + }

    - if (debug)
    + if (debug_level >= DEBUG_ALL)
          fprintf(stderr, "client %d sending %s\n", st->id, sql);

I'm not sure that debug messages need to be kept after debug, if it is about debugging pgbench itself. That is debatable.

AFAICS it is not about debugging pgbench itself, but about more detailed information that can be used to understand what exactly happened during its launch. In the case of errors this helps to distinguish between failures or errors by type (including which limit for retries was violated and how far it was exceeded for the serialization/deadlock errors).

The code adapts/duplicates existing server-side "ereport" stuff and brings it to the frontend, where the logging needs are somehow quite different. I'd prefer to avoid duplication and/or have some code sharing.

I was recommended to use the same interface in [3]: On elog/errstart: we already have a convention for what ereport() calls look like; I suggest to use that instead of inventing your own.

The "elog" interface already exists, it is not an invention. "ereport" is a hack which is somehow necessary in some cases. I prefer a simple function call if possible for the purpose, and ISTM that this is the case. That is a lot of complication which is justified server-side where logging requirements are special, but in this case I see it as overkill. I think we need ereport() if we want to make detailed error messages (see examples in [1])..

If it really needs to be duplicated, I'd suggest to put all this stuff in separate files.
If we want to do that, I think that it would belong to fe_utils, where it could/should be used by all front-end programs. I'll try to do it..

Dunno. If you only need one "elog" function which prints a message to stderr and decides whether to abort/exit/whatever, maybe it can just be kept in pgbench. If there are several complicated functions and macros, better with a file. So I'd say it depends. So my current view is that if you only need an "elog" function, it is simpler to add it to "pgbench.c". Thank you!

For logging purposes, ISTM that the "elog" macro interface is nicer, closer to the existing "fprintf(stderr", as it would not introduce the additional parentheses hack for "rest".

I was also recommended to use ereport() instead of elog() in [3]: Probably. Are you hoping that advice from different reviewers will be consistent? That seems optimistic:-) To make the patch committable there should be no objection to it..

[1] https://www.postgresql.org/message-id/c89fcc380a19380260b5ea463efc1416%40postgrespro.ru -- Marina Polyakova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors
On 13-06-2018 22:59, Alvaro Herrera wrote: For context: in the backend, elog() is only used for internal messages (i.e. "can't-happen" conditions), and ereport() is used for user-facing messages. There are many things ereport() has that elog() doesn't, such as additional message fields (HINT, DETAIL, etc) that I think could have some use in pgbench as well. If you use elog() then you can't have that.

AFAIU originally it was not supposed that the pgbench error messages have these fields, so will it be good to change the final output to stderr?.. For example:

    - fprintf(stderr, "%s", PQerrorMessage(con));
    - fprintf(stderr, "(ignoring this error and continuing anyway)\n");
    + ereport(LOG,
    +         (errmsg("Ignoring the server error and continuing anyway"),
    +          errdetail("%s", PQerrorMessage(con))));

    - fprintf(stderr, "%s", PQerrorMessage(con));
    - if (sqlState && strcmp(sqlState, ERRCODE_UNDEFINED_TABLE) == 0)
    - {
    -     fprintf(stderr, "Perhaps you need to do initialization (\"pgbench -i\") in database \"%s\"\n",
    -             PQdb(con));
    - }
    -
    - exit(1);
    + ereport(ERROR,
    +         (errmsg("Server error"),
    +          errdetail("%s", PQerrorMessage(con)),
    +          sqlState && strcmp(sqlState, ERRCODE_UNDEFINED_TABLE) == 0 ?
    +          errhint("Perhaps you need to do initialization (\"pgbench -i\") in database \"%s\"\n",
    +                  PQdb(con)) : 0));

-- Marina Polyakova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors
which maybe would exit or abort depending on level, and possibly not actually report under some levels and/or some conditions. For that, it could be enough to just provide a nice "elog" function.

I agree that elog() can be coded in this way. To use ereport() I need a structure to store the error level as a condition to exit.

In conclusion, which you can disagree with because maybe I have missed something... anyway I currently think that:

- it should be an independent submission
- possibly at "fe_utils" level
- possibly just a nice "elog" function is enough, if so just do that.

I hope I answered all this above..

[1] https://www.postgresql.org/message-id/453fa52de88477df2c4a2d82e09e461c%40postgrespro.ru [2] https://www.postgresql.org/message-id/20180405180807.0bc1114f%40wp.localdomain [3] https://www.postgresql.org/message-id/20180508105832.6o3uf3npfpjgk5m7%40alvherre.pgsql -- Marina Polyakova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors
On 09-06-2018 16:31, Fabien COELHO wrote: Hello Marina, Hello!

v9-0002-Pgbench-errors-use-the-Variables-structure-for-cl.patch - a patch for the Variables structure (this is used to reset client variables during the repeating of transactions after serialization/deadlock failures).

About this second patch: This extracts the variable-holding structure, so that it is somehow easier to reset them to their initial state on transaction failures, the management of which is the ultimate aim of this patch series. It is also cleaner this way. Patch applies cleanly on top of the previous one (there are no real interactions with it). It compiles cleanly. Global & pgbench "make check" are both ok. :-)

The structure typedef does not need a name. "typedef struct { } V...". Ok!

I tend to disagree with naming things after their type, eg "array". I'd suggest "vars" instead. "nvariables" could be "nvars" for consistency with that and "vars_sorted", and because "foo.variables->nvariables" starts looking heavy. I'd suggest to put the "Variables" type declaration just after the "Variable" type declaration in the file. Thank you, I agree and I'll fix all this.

-- Marina Polyakova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors
On 09-06-2018 9:55, Fabien COELHO wrote: Hello Marina, Hello!

v9-0001-Pgbench-errors-use-the-RandomState-structure-for-.patch - a patch for the RandomState structure (this is used to reset a client's random seed during the repeating of transactions after serialization/deadlock failures).

A few comments about this first patch. Thank you very much!

Patch applies cleanly, compiles, global & pgbench "make check" ok. I'm mostly ok with the changes, which cleanly separate the different uses of random between threads (script choice, throttle delay, sampling...) and clients (random*() calls). Glad to hear it :)

This change is necessary so that a client can restart a transaction deterministically (at the client level at least), which is the ultimate aim of the patch series.

A few remarks: The RandomState struct is 6 bytes, which will induce some padding when used. This is life and pre-existing. No problem. ISTM that the struct itself does not need a name, ie. "typedef struct { ... } RandomState" is enough. Ok!

There could be clear comments, say in the TState and CState structs, about what randomness is impacted (i.e. script choices, etc.). Thank you, I'll add them.

getZipfianRand, computeHarmonicZipfian: the "thread" parameter was justified because it was used for two fields. As the random state is separated, I'd suggest that the other argument should be a zipfcache pointer. I agree with you and I will change it.

While reading your patch, it occurs to me that a run is not deterministic at the thread level under throttling and sampling, because the random state is solicited differently depending on when transactions end. This suggests that maybe each thread random_state use should have its own random state. Thank you, I'll fix this.

In passing, and totally unrelated to this patch: I've always been a little puzzled about why a quite small 48-bit internal state random generator is used.
I understand the need for pg to have a portable & state-controlled thread-safe random generator, but why this particular small one was chosen escapes me. The source code (src/port/erand48.c, copyright 1993...) looks optimized for 16-bit architectures, which is probably pretty inefficient to run on 64-bit architectures. Maybe this could be updated with something more consistent with today's processors, providing more quality at a lower cost. This sounds interesting, thanks! *went to look for a multiplier and a summand that are large enough and are mutually prime..* -- Marina Polyakova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
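For reference, the generator under discussion is the classic rand48-family 48-bit linear congruential generator, x(n+1) = (a*x(n) + c) mod 2^48, with the standard constants a = 0x5DEECE66D and c = 0xB. On a 64-bit machine the whole step is one multiply-add plus a mask — a minimal sketch (not the erand48.c code itself, which juggles three 16-bit limbs):

```c
#include <assert.h>
#include <stdint.h>

/* The standard rand48 constants (POSIX drand48/erand48). */
#define RAND48_A	UINT64_C(0x5DEECE66D)
#define RAND48_C	UINT64_C(0xB)
#define RAND48_MASK	((UINT64_C(1) << 48) - 1)

/*
 * One step of the 48-bit LCG: x = (a*x + c) mod 2^48.
 * A single 64-bit multiply-add and a mask, versus the 16-bit limb
 * arithmetic the 1993-era code carries around.
 */
static uint64_t
lcg48_next(uint64_t *state)
{
	*state = (RAND48_A * *state + RAND48_C) & RAND48_MASK;
	return *state;
}
```

This is also why the state is only 6 bytes: the entire generator state is the 48-bit value of x.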
Re: [HACKERS] path toward faster partition pruning
On 09-05-2018 17:30, Alvaro Herrera wrote: Marina Polyakova wrote: Hello everyone in this thread! I got a similar server crash as in [1] on the master branch since the commit 9fdb675fc5d2de825414e05939727de8b120ae81 when the assertion fails because the second argument ScalarArrayOpExpr is not a Const or an ArrayExpr, but is an ArrayCoerceExpr (see [2]): Hello Marina, thanks for reporting this. I have pushed all fixes derived from this report -- thanks to Amit and Michaël for those. I verified your test case no longer crashes. If you have more elaborate test cases, please do try these too. Hello, thank you all very much! :) -- Marina Polyakova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: [HACKERS] path toward faster partition pruning
On 07-05-2018 4:37, Michael Paquier wrote: On Fri, May 04, 2018 at 12:32:23PM +0300, Marina Polyakova wrote: I got a similar server crash as in [1] on the master branch since the commit 9fdb675fc5d2de825414e05939727de8b120ae81 when the assertion fails because the second argument ScalarArrayOpExpr is not a Const or an ArrayExpr, but is an ArrayCoerceExpr (see [2]): Indeed, I can see the crash. I have been playing with this stuff and I am in the middle of writing the patch, but let's track this properly for now. Thank you very much! -- Marina Polyakova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: [HACKERS] path toward faster partition pruning
Hello everyone in this thread! I got a similar server crash as in [1] on the master branch since commit 9fdb675fc5d2de825414e05939727de8b120ae81: the assertion fails because the second argument of the ScalarArrayOpExpr is not a Const or an ArrayExpr, but an ArrayCoerceExpr (see [2]):

=# create table list_parted ( a varchar ) partition by list (a);
=# create table part_ab_cd partition of list_parted for values in ('ab', 'cd');
=# CREATE OR REPLACE FUNCTION public.x_stl_text_integer ( )
RETURNS text STABLE AS
$body$
BEGIN
	RAISE NOTICE 's text integer';
	RETURN 1::text;
END;
$body$
LANGUAGE 'plpgsql';
=# explain (costs off) select * from list_parted where a in ('ab', 'cd', x_stl_text_integer());
server closed the connection unexpectedly
	This probably means the server terminated abnormally
	before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.
!>

[1] https://www.postgresql.org/message-id/CAKcux6nCsCmu9oUnnuKZkeBenYvUFbU2Lt4q2MFNEb7QErzn8w%40mail.gmail.com
[2] partprune.c, function match_clause_to_partition_key:

if (IsA(rightop, Const))
{
	...
}
else
{
	ArrayExpr  *arrexpr = castNode(ArrayExpr, rightop);		# fails here
	...
}

-- Marina Polyakova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: Add support for printing/reading MergeAction nodes
Sorry for this late reply, I was very busy with the patch for pgbench.. On 04-04-2018 20:07, Simon Riggs wrote: ... Which debug mode are we talking about, please? -d 5 -- Marina Polyakova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: Add support for printing/reading MergeAction nodes
I'm sorry I was very busy with the patch for pgbench.. On 04-04-2018 19:19, Tom Lane wrote: ... BTW, poking around in the grammar, I notice that MergeStmt did not get added to RuleActionStmt. That seems like a rather serious omission. Thank you very much! I will try to do this, if you do not mind, of course.. -- Marina Polyakova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Add support for printing/reading MergeAction nodes
Hello, hackers! When debugging is enabled for server logging, isolation tests fail because there're no corresponding output functions for InsertStmt / DeleteStmt / UpdateStmt that are used in the output of the MergeAction nodes (see the attached regressions diffs and output). I also attached a try that makes the tests pass. Sorry if I missed that it was already discussed somewhere. -- Marina Polyakova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company

diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index c8d9626..2411658 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -409,6 +409,68 @@ _outMergeAction(StringInfo str, const MergeAction *node)
 }
 
 static void
+_outInferClause(StringInfo str, const InferClause *node)
+{
+	WRITE_NODE_TYPE("INFERCLAUSE");
+
+	WRITE_NODE_FIELD(indexElems);
+	WRITE_NODE_FIELD(whereClause);
+	WRITE_STRING_FIELD(conname);
+	WRITE_LOCATION_FIELD(location);
+}
+
+static void
+_outOnConflictClause(StringInfo str, const OnConflictClause *node)
+{
+	WRITE_NODE_TYPE("ONCONFLICTCLAUSE");
+
+	WRITE_ENUM_FIELD(action, OnConflictAction);
+	WRITE_NODE_FIELD(infer);
+	WRITE_NODE_FIELD(targetList);
+	WRITE_NODE_FIELD(whereClause);
+	WRITE_LOCATION_FIELD(location);
+}
+
+static void
+_outInsertStmt(StringInfo str, const InsertStmt *node)
+{
+	WRITE_NODE_TYPE("INSERT");
+
+	WRITE_NODE_FIELD(relation);
+	WRITE_NODE_FIELD(cols);
+	WRITE_NODE_FIELD(selectStmt);
+	WRITE_NODE_FIELD(onConflictClause);
+	WRITE_NODE_FIELD(returningList);
+	WRITE_NODE_FIELD(withClause);
+	WRITE_ENUM_FIELD(override, OverridingKind);
+}
+
+static void
+_outDeleteStmt(StringInfo str, const DeleteStmt *node)
+{
+	WRITE_NODE_TYPE("DELETE");
+
+	WRITE_NODE_FIELD(relation);
+	WRITE_NODE_FIELD(usingClause);
+	WRITE_NODE_FIELD(whereClause);
+	WRITE_NODE_FIELD(returningList);
+	WRITE_NODE_FIELD(withClause);
+}
+
+static void
+_outUpdateStmt(StringInfo str, const UpdateStmt *node)
+{
+	WRITE_NODE_TYPE("UPDATE");
+
+	WRITE_NODE_FIELD(relation);
+	WRITE_NODE_FIELD(targetList);
+	WRITE_NODE_FIELD(whereClause);
+	WRITE_NODE_FIELD(fromClause);
+	WRITE_NODE_FIELD(returningList);
+	WRITE_NODE_FIELD(withClause);
+}
+
+static void
 _outAppend(StringInfo str, const Append *node)
 {
 	WRITE_NODE_TYPE("APPEND");
@@ -3682,6 +3744,21 @@ outNode(StringInfo str, const void *obj)
 			case T_MergeAction:
 				_outMergeAction(str, obj);
 				break;
+			case T_InferClause:
+				_outInferClause(str, obj);
+				break;
+			case T_OnConflictClause:
+				_outOnConflictClause(str, obj);
+				break;
+			case T_InsertStmt:
+				_outInsertStmt(str, obj);
+				break;
+			case T_DeleteStmt:
+				_outDeleteStmt(str, obj);
+				break;
+			case T_UpdateStmt:
+				_outUpdateStmt(str, obj);
+				break;
 			case T_Append:
 				_outAppend(str, obj);
 				break;
diff --git a/src/backend/nodes/readfuncs.c b/src/backend/nodes/readfuncs.c
index 4518fa0..13891b1 100644
--- a/src/backend/nodes/readfuncs.c
+++ b/src/backend/nodes/readfuncs.c
@@ -1620,6 +1620,93 @@ _readMergeAction(void)
 }
 
 /*
+ * _readInferClause
+ */
+static InferClause *
+_readInferClause(void)
+{
+	READ_LOCALS(InferClause);
+
+	READ_NODE_FIELD(indexElems);
+	READ_NODE_FIELD(whereClause);
+	READ_STRING_FIELD(conname);
+	READ_LOCATION_FIELD(location);
+
+	READ_DONE();
+}
+
+/*
+ * _readOnConflictClause
+ */
+static OnConflictClause *
+_readOnConflictClause(void)
+{
+	READ_LOCALS(OnConflictClause);
+
+	READ_ENUM_FIELD(action, OnConflictAction);
+	READ_NODE_FIELD(infer);
+	READ_NODE_FIELD(targetList);
+	READ_NODE_FIELD(whereClause);
+	READ_LOCATION_FIELD(location);
+
+	READ_DONE();
+}
+
+/*
+ * _readInsertStmt
+ */
+static InsertStmt *
+_readInsertStmt(void)
+{
+	READ_LOCALS(InsertStmt);
+
+	READ_NODE_FIELD(relation);
+	READ_NODE_FIELD(cols);
+	READ_NODE_FIELD(selectStmt);
+	READ_NODE_FIELD(onConflictClause);
+	READ_NODE_FIELD(returningList);
+	READ_NODE_FIELD(withClause);
+	READ_ENUM_FIELD(override, OverridingKind);
+
+	READ_DONE();
+}
+
+/*
+ * _readDeleteStmt
+ */
+static DeleteStmt *
+_readDeleteStmt(void)
+{
+	READ_LOCALS(DeleteStmt);
+
+	READ_NODE_FIELD(relation);
+	READ_NODE_FIELD(usingClause);
+	READ_NODE_FIELD(whereClause);
+	READ_NODE_FIELD(returningList);
+	READ_NODE_FIELD(withClause);
+
+	READ_DONE();
+}
+
+/*
+ * _readUpdateStmt
+ */
+static UpdateStmt *
+_readUpdateStmt(void)
+{
+	READ_LOCALS(UpdateStmt);
+
+	READ_NODE_FIELD(relation);
+	READ_NODE_FIELD(targetList);
+	READ_NODE_FIELD(whereClause);
+	READ_NODE_FIELD(fromClause);
+	READ_NODE_FIELD(returningList);
+	READ_NODE_FIELD(withClause);
+
+	READ_DONE();
+}
+
+/*
  * _readAppend
  */
 static Append *
@@ -2620,6 +2707,16 @@ parseNodeString(void)
 		return_value = _readModifyTable();
 	else if (MATCH("MERGEACTION", 11))
 		return_value = _readMergeAction();
+	else if (MATCH("INFERCLAUSE", 11))
+		return_value = _readInferClause();
+	else if (MATCH("ONCONFLICT
Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors
On 25-03-2018 15:23, Fabien COELHO wrote: Hm, I took a look at both threads about the patch, and it seems to me now that it's overcomplicated. With the recently committed enhancements of pgbench (\if, \when) it becomes close to impossible to retry a transaction in case of failure. So the initial approach of just rolling back such a transaction looks more attractive. Yep. Many thanks to both of you! I'm working on a patch in this direction.. I think that the best approach for now is simply to reset (command zero, random generator) and start over the whole script, without attempting to be more intelligent. The limitations should be clearly documented (one transaction per script), though. That would be a significant enhancement already. I'm not sure that we can always do this, because we can get new errors until we finish the failed transaction block, and we need to destroy the conditional stack.. -- Marina Polyakova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
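The "reset and start over" approach Fabien describes can be sketched in a few lines. All names here (`CState`, `cs_func_rs`, `restartAfterFailure`) are hypothetical stand-ins for pgbench's client state, not the patch's actual code; the point is just that rewinding to command zero and restoring the saved random seed makes the retry replay the same random choices:

```c
#include <assert.h>

/* Hypothetical per-client random state, as in the RandomState patch. */
typedef struct
{
	unsigned short xseed[3];	/* 48-bit seed for random*() calls */
} RandomState;

/* Hypothetical, heavily simplified pgbench client state. */
typedef struct
{
	int			command;		/* current command number in the script */
	RandomState cs_func_rs;		/* random state used by script functions */
	int			retries;		/* how many times we started over */
} CState;

/*
 * Start the whole script over after a serialization/deadlock failure:
 * rewind to command zero and restore the random state saved at script
 * start, so the retried run is deterministic at the client level.
 */
static void
restartAfterFailure(CState *st, const RandomState *saved)
{
	st->command = 0;
	st->cs_func_rs = *saved;
	st->retries++;
}
```

Marina's caveat still applies: before this reset can happen, the failed transaction block must be finished and the conditional (\if) stack torn down.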
Re: WIP Patch: Pgbench Serialization and deadlock errors
On 05-03-2018 18:21, David Steele wrote: Hello Marina, Hello, David! On 1/12/18 12:01 PM, Marina Polyakova wrote: ... This patch was marked Waiting on Author on Jan 8 and no new patch was submitted before this commitfest. I think we should mark this patch as Returned with Feedback. I'm sorry, I was very busy with the patch to precalculate stable functions.. I'm working on a new version of this patch for pgbench unless, of course, it's too late for v11. -- Marina Polyakova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: WIP Patch: Precalculate stable functions, infrastructure v1
Ok! On 02-03-2018 22:56, Andres Freund wrote: Hi, On 2018-03-02 11:22:01 +0300, Marina Polyakova wrote: I fixed the failure that Thomas pointed out to me, and I'm finishing work on it, but it took me a while to study this part of the executor.. I unfortunately think that makes this too late for v11, and we should mark this as returned with feedback. Greetings, Andres Freund -- Marina Polyakova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: WIP Patch: Precalculate stable functions, infrastructure v1
Hello! I fixed the failure that Thomas pointed out to me, and I'm finishing work on it, but it took me a while to study this part of the executor.. On 02-03-2018 0:11, Andres Freund wrote: Hi, On 2018-02-01 08:01:48 +0300, Marina Polyakova wrote: > ISTM, there might be some value to consider all of them in the design of > the new mechanism. I'm sorry, the other parts have occupied all the time, and I'll work on it.. That has, as far as I can see, not happened. And the patch has been reported as failing by Thomas a while ago. So I'm inclined to mark this as returned with feedback for this CF? - Andres -- Marina Polyakova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: master check fails on Windows Server 2008
On 21-02-2018 18:51, Tom Lane wrote: Marina Polyakova <m.polyak...@postgrespro.ru> writes: On 20-02-2018 21:23, Tom Lane wrote: I continue to wonder if it's not better to just remove the option and thereby simplify our lives. What's the actual value of having it anymore? I agree with you, but I have too little experience to vote for removing this option. I've started a separate thread to propose removal of the option, at https://postgr.es/m/10862.1519228...@sss.pgh.pa.us Thank you! -- Marina Polyakova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: master check fails on Windows Server 2008
On 20-02-2018 21:23, Tom Lane wrote: Marina Polyakova <m.polyak...@postgrespro.ru> writes: On 20-02-2018 3:37, Tom Lane wrote: 4. Try to tweak the stats_ext.sql test conditions in some more refined way to get the test to pass everywhere. This'd be a lot of work with no guarantee of success, so I'm not too excited about it. Thank you for your explanations! I'll try to do something in this direction.. OK. The least painful fix might be to establish a different work_mem setting just for that one query. However, if you're intent on putting work into continued support of --disable-float8-byval, I would *strongly* suggest setting up a buildfarm member that runs that way, because otherwise we're pretty much guaranteed to break it again. Oh, thank you again! I continue to wonder if it's not better to just remove the option and thereby simplify our lives. What's the actual value of having it anymore? I agree with you, but I have too little experience to vote for removing this option. -- Marina Polyakova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: master check fails on Windows Server 2008
On 20-02-2018 3:37, Tom Lane wrote: Ah-hah. I can reproduce the described failure if I configure with --disable-float8-byval on an otherwise 64-bit machine. It appears that the minimum work_mem setting that will allow this query to use a hashagg plan on such a configuration is about 155kB (which squares with the results you show). On the other hand, in a normal 64-bit configuration, work_mem settings of 160kB or more cause other failures (due to other plans switching to hashagg), and on a 32-bit machine I see such failures with work_mem of 150kB or more. So there's basically no setting of work_mem that would make all these tests pass everywhere. I see several things we could do about this:

1. Nothing; just say "sorry, we don't promise that the regression tests pass with no plan differences on nonstandard configurations". Given that --disable-float8-byval has hardly any real-world use, there is not a lot of downside to that.

2. Decide that --disable-float8-byval, and for that matter --disable-float4-byval, have no real-world use at all and take them out. There was some point in those options back when we cared about binary compatibility with version-zero C functions, but now I'm not sure why anyone would use them.

3. Drop that one test case from stats_ext.sql; I'm not sure how much additional test value it's actually bringing.

4. Try to tweak the stats_ext.sql test conditions in some more refined way to get the test to pass everywhere. This'd be a lot of work with no guarantee of success, so I'm not too excited about it.

Thank you for your explanations! I'll try to do something in this direction..

5. Something else?

regards, tom lane

-- Marina Polyakova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
master check fails on Windows Server 2008
Hello, hackers! I got a permanent failure of master (commit 2a41507dab0f293ff241fe8ae326065998668af8) check on Windows Server 2008. Regression output and diffs as well as config.pl are attached. I used the following commands:

build.bat > build.txt
vcregress.bat check > check.txt

Binary search has shown that this failure begins with commit bed9ef5a16239d91d97a1fa2efd9309c3cbbc4b2 (Rework the stats_ext test). On the previous commit (70ec3f1f8f0b753c38a1a582280a02930d7cac5f) the check passes. I'm trying to figure out what went wrong, and any suspicions are welcome. About the system: Windows Server 2008 R2 Standard, Service Pack 1, 64-bit. -- Marina Polyakova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company

*** C:/Users/buildfarm/mpolyakova/postgrespro_master/src/test/regress/expected/stats_ext.out	Fri Feb 16 12:56:00 2018
--- C:/Users/buildfarm/mpolyakova/postgrespro_master/src/test/regress/results/stats_ext.out	Fri Feb 16 13:09:47 2018
***************
*** 312,322 ****
  EXPLAIN (COSTS off) SELECT COUNT(*) FROM ndistinct GROUP BY b, c, d;
        QUERY PLAN
! -
! HashAggregate
     Group Key: b, c, d
     ->  Seq Scan on ndistinct
! (3 rows)
  
  EXPLAIN (COSTS off) SELECT COUNT(*) FROM ndistinct GROUP BY a, d;
--- 312,324 ----
  EXPLAIN (COSTS off) SELECT COUNT(*) FROM ndistinct GROUP BY b, c, d;
        QUERY PLAN
! ---
! GroupAggregate
     Group Key: b, c, d
+   ->  Sort
+     Sort Key: b, c, d
     ->  Seq Scan on ndistinct
! (5 rows)

==

test tablespace               ... ok
parallel group (20 tests):  boolean char name varchar text int2 int4 txid int8 bit float8 float4 oid pg_lsn regproc rangetypes enum money uuid numeric
     boolean                  ... ok
     char                     ... ok
     name                     ... ok
     varchar                  ... ok
     text                     ... ok
     int2                     ... ok
     int4                     ... ok
     int8                     ... ok
     oid                      ... ok
     float4                   ... ok
     float8                   ... ok
     bit                      ... ok
     numeric                  ... ok
     txid                     ... ok
     uuid                     ... ok
     enum                     ... ok
     money                    ... ok
     rangetypes               ... ok
     pg_lsn                   ... ok
     regproc                  ... ok
test strings                  ... ok
test numerology               ... ok
parallel group (20 tests):  lseg box path line point tstypes reltime circle interval date time timetz macaddr8 tinterval abstime macaddr inet timestamptz timestamp polygon
     point                    ... ok
     lseg                     ... ok
     line                     ... ok
     box                      ... ok
     path                     ... ok
     polygon                  ... ok
     circle                   ... ok
     date                     ... ok
     time                     ... ok
     timetz                   ... ok
     timestamp                ... ok
     timestamptz              ... ok
     interval                 ... ok
     abstime                  ... ok
     reltime                  ... ok
     tinterval                ... ok
     inet                     ... ok
     macaddr                  ... ok
     macaddr8                 ... ok
     tstypes                  ... ok
parallel group (9 tests):  geometry horology expressions oidjoins misc_sanity type_sanity comments regex opr_sanity
     geometry                 ... ok
     horology                 ... ok
     regex                    ... ok
     oidjoins                 ... ok
     type_sanity              ... ok
     opr_sanity               ... ok
     misc_sanity              ... ok
     comments                 ... ok
     expressions              ... ok
test insert                   ... ok
test insert_conflict          ... ok
test create_function_1        ... ok
test create_type              ... ok
test create_table             ... ok
test create_function_2        ... ok
parallel group (3 tests):  copy copydml copyselect
     copy                     ... ok
     copyselect               ... ok
     copydml                  ... ok
parallel group (3 tests):  create_misc create_operator create_procedure
     create_misc              ... ok
     create_operator          ... ok
     create_procedure         ... ok
parallel group (2 tests):  create_view create_index
     create_index             ... ok
     create_view              ... ok
parallel group (15 tests):  create_aggregate create_function_3 create_cast roleattributes create_am typed_table hash_func vacuum drop_if_exis
Re: master plpython check fails on Solaris 10
On 14-02-2018 17:54, Tom Lane wrote: Marina Polyakova <m.polyak...@postgrespro.ru> writes: On 14-02-2018 3:43, Peter Eisentraut wrote: OK, can you get some kind of stack trace or other debugging information? I got this backtrace from gdb: Hmm, so the only question in my mind is how did this ever work for anyone? The basic problem here is that, instead of using the ErrorContextCallback.arg field provided for the purpose, plpython_error_callback is using PLy_current_execution_context() to try to identify the context it's supposed to report on. In the example, that points to the context associated with the inner DO block, not with the outer procedure. That context looks like it should reliably have curr_proc->proname == NULL, so how come this test case doesn't crash for everybody? In any case the expected output for the transaction_test4 case is obviously wrong. Rather than reporting the transaction_test4 function and then the inner DO block, it's reporting 2 levels of transaction_test4. That seems to trace directly to both levels of error context callback looking at the same execution context. I think we need to fix the error callback code so that it uses the "arg" field to find the relevant procedure, and that that's a back-patchable fix, because nested plpython functions would show this up as wrong in any case. That would also let us undo the not-terribly-safe practice of having code between the PLy_push_execution_context call and the PG_TRY that is supposed to ensure the context gets popped again. Thank you very much! I'll try to implement this. While I'm bitching ... I wonder how long it's been since the comment for PLy_procedure_name() had anything to do with its actual behavior. AFAIU this has always been like that, since the commit 1ca717f3... -- Marina Polyakova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
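The fix Tom describes — identify the procedure through `ErrorContextCallback.arg` rather than through a global "current execution context" — follows PostgreSQL's standard error-context pattern. A minimal standalone sketch (the stack-walking "error report" is simulated here; `run_with_context` and `last_context` are hypothetical demo scaffolding, not the real elog machinery):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Simplified version of PostgreSQL's error-context callback stack. */
typedef struct ErrorContextCallback
{
	struct ErrorContextCallback *previous;
	void		(*callback) (void *arg);
	void	   *arg;
} ErrorContextCallback;

static ErrorContextCallback *error_context_stack = NULL;
static char last_context[64];

/*
 * The point of the fix: report the procedure name handed to us via
 * "arg", not whatever some global "current execution context" says.
 */
static void
error_callback(void *arg)
{
	strncpy(last_context, (const char *) arg, sizeof(last_context) - 1);
}

static void
run_with_context(const char *proname)
{
	ErrorContextCallback cb;

	cb.previous = error_context_stack;
	cb.callback = error_callback;
	cb.arg = (void *) proname;	/* identifies *this* procedure */
	error_context_stack = &cb;

	/* ... body; on error, elog would walk the stack like this ... */
	for (ErrorContextCallback *c = error_context_stack; c; c = c->previous)
		c->callback(c->arg);

	error_context_stack = cb.previous;
}
```

With `arg` carrying the procedure, a nested DO block pushes its own entry without disturbing what the outer function's callback reports — which is exactly what goes wrong when both callbacks consult the same global context.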
Re: master plpython check fails on Solaris 10
On 14-02-2018 3:43, Peter Eisentraut wrote: On 2/13/18 05:40, Marina Polyakova wrote: Binary search has shown that this failure begins with commit 8561e4840c81f7e345be2df170839846814fa004 (Transaction control in PL procedures.). On the previous commit (b9ff79b8f17697f3df492017d454caa9920a7183) there's no plpython_transaction test and plpython check passes. OK, can you get some kind of stack trace or other debugging information? I got this backtrace from gdb:

Program received signal SIGSEGV, Segmentation fault.
0x7d13bb50 in strlen () from /lib/64/libc.so.1
(gdb) bt
#0  0x7d13bb50 in strlen () from /lib/64/libc.so.1
#1  0x7d1aabf8 in _ndoprnt () from /lib/64/libc.so.1
#2  0x7d1ad3d4 in vsnprintf () from /lib/64/libc.so.1
#3  0x0001009ab174 in pvsnprintf (buf=0x100d680e3 "PL/Python function \"", '\177' times>..., len=933, fmt=0x100d684a0 "PL/Python function \"%s\"", args=0x7fff77f8) at psprintf.c:121
#4  0x000100492bd0 in appendStringInfoVA (str=0x7fff7738, fmt=0x100d684a0 "PL/Python function \"%s\"", args=0x7fff77f8) at stringinfo.c:130
#5  0x000100911038 in errcontext_msg (fmt=0x7aa1eb98 "PL/Python function \"%s\"") at elog.c:1021
#6  0x7aa11ea8 in plpython_error_callback (arg=0x100e7d7e8) at plpy_main.c:402
#7  0x00010090e9f8 in errfinish (dummy=0) at elog.c:438
#8  0x000100482e78 in SPI_commit () at spi.c:211
#9  0x7aa13e90 in PLy_commit (self=0x0, args=0x0) at plpy_plpymodule.c:602
#10 0x71a1ca40 in PyEval_EvalFrameEx (f=0x70619548, throwflag=2147449896) at Python/ceval.c:4004
#11 0x71a1d8a0 in PyEval_EvalFrameEx (f=0x706193a0, throwflag=2147450456) at Python/ceval.c:4106
#12 0x71a1ed20 in PyEval_EvalCodeEx (co=0x7060e6b0, globals=0x705236b4, locals=0x71bf9940 <_PyThreadState_Current>, args=0x0, argcount=0, kws=0x0, kwcount=0, defs=0x0, defcount=0, closure=0x0) at Python/ceval.c:3252
#13 0x71a1ef64 in PyEval_EvalCode (co=0x7060e6b0, globals=0x70525c58, locals=0x70525c58) at Python/ceval.c:667
#14 0x7aa10938 in PLy_procedure_call (proc=0x7fff8580, kargs=0x7aa1e128 "args", vargs=0x7060d290) at plpy_exec.c:1031
#15 0x7aa0cf08 in PLy_exec_function (fcinfo=0x7fff86f0, proc=0x7fff8580) at plpy_exec.c:107
#16 0x7aa11c64 in plpython_inline_handler (fcinfo=0x7fff8be0) at plpy_main.c:353
#17 0x00010091e44c in OidFunctionCall1Coll (functionId=16385, collation=0, arg1=4311287992) at fmgr.c:1327
#18 0x0001003746ec in ExecuteDoStmt (stmt=0x100f8fa70, atomic=1 '\001') at functioncmds.c:2183
#19 0x0001006f420c in standard_ProcessUtility (pstmt=0x100f90768, queryString=0x70528054 "DO LANGUAGE plpythonu $x$ plpy.commit() $x$", context=PROCESS_UTILITY_QUERY, params=0x0, queryEnv=0x0, dest=0x100cd6708 , completionTag=0x7fff93f8 "") at utility.c:533
#20 0x0001006f3b04 in ProcessUtility (pstmt=0x100f90768, queryString=0x70528054 "DO LANGUAGE plpythonu $x$ plpy.commit() $x$", context=PROCESS_UTILITY_QUERY, params=0x0, queryEnv=0x0, dest=0x100cd6708 , completionTag=0x7fff93f8 "") at utility.c:358
#21 0x0001004886e0 in _SPI_execute_plan (plan=0x7fff95b0, paramLI=0x0, snapshot=0x0, crosscheck_snapshot=0x0, read_only=0 '\000', fire_triggers=1 '\001', tcount=0) at spi.c:2207
#22 0x000100483980 in SPI_execute (src=0x70528054 "DO LANGUAGE plpythonu $x$ plpy.commit() $x$", read_only=0 '\000', tcount=0) at spi.c:416
#23 0x7aa17698 in PLy_spi_execute_query (query=0x70528054 "DO LANGUAGE plpythonu $x$ plpy.commit() $x$", limit=0) at plpy_spi.c:331
#24 0x7aa16c3c in PLy_spi_execute (self=0x0, args=0x7051fb50) at plpy_spi.c:168
#25 0x719af824 in PyCFunction_Call (func=0x7062a098, arg=0x7051fb50, kw=0x0) at Objects/methodobject.c:116
#26 0x71a1cfb0 in PyEval_EvalFrameEx (f=0x7af29c20, throwflag=2147457704) at Python/ceval.c:4020
#27 0x71a1d8a0 in PyEval_EvalFrameEx (f=0x706191f8, throwflag=2147458264) at Python/ceval.c:4106
#28 0x71a1ed20 in PyEval_EvalCodeEx (co=0x7072cf30, globals=0x70523624, locals=0x71bf9940 <_PyThreadState_Current>, args=0x0, argcount=0, kws=0x0, kwcount=0, defs=0x0, defcount=0, closure=0x0) at Python/ceval.c:3252
#29 0x71a1ef64 in PyEval_EvalCode (co=0x7072cf30, globals=0x70525910, locals=0x70525910) at Python/ceval.c:667
---Type <return> to continue, or q to quit---
#30 0x7aa10938 in PLy_procedure_
master plpython check fails on Solaris 10
Hello, hackers! I got a permanent failure of master (commit ebdb42a0d6a61b93a5bb9f4204408edf5959332c) plpython check on Solaris 10. Regression output and diffs are attached. I used the following commands:

./configure CC="ccache gcc" CFLAGS="-m64 -I/opt/csw/include" PKG_CONFIG_PATH="/opt/csw/lib/pkgconfig:/usr/local/lib/pkgconfig" LDFLAGS="-L/opt/csw/lib/sparcv9 -L/usr/local/lib/64" --enable-cassert --enable-debug --enable-nls --enable-tap-tests --with-perl --with-tcl --with-python --with-gssapi --with-openssl --with-ldap --with-libxml --with-libxslt --with-icu
gmake > make_results.txt
gmake -C src/pl/plpython check

Binary search has shown that this failure begins with commit 8561e4840c81f7e345be2df170839846814fa004 (Transaction control in PL procedures.). On the previous commit (b9ff79b8f17697f3df492017d454caa9920a7183) there's no plpython_transaction test and plpython check passes. About the system: SunOS, Release 5.10, KernelID Generic_141444-09.

P.S. It seems that there's a similar failure on Windows, and I'm trying to reproduce it..

-- Marina Polyakova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company

*** /home/buildfarm/mpolyakova/postgrespro_core_353_stable_func/src/pl/plpython/expected/plpython_transaction.out	Tue Feb 13 13:00:33 2018
--- /home/buildfarm/mpolyakova/postgrespro_core_353_stable_func/src/pl/plpython/results/plpython_transaction.out	Tue Feb 13 13:34:39 2018
***************
*** 88,136 ****
  return 1
  $$;
  SELECT transaction_test4();
! ERROR:  spiexceptions.InvalidTransactionTermination: invalid transaction termination
! CONTEXT:  Traceback (most recent call last):
!   PL/Python function "transaction_test4", line 2, in <module>
!     plpy.execute("DO LANGUAGE plpythonu $x$ plpy.commit() $x$")
! PL/Python function "transaction_test4"
! -- commit inside subtransaction (prohibited)
! DO LANGUAGE plpythonu $$
! s = plpy.subtransaction()
! s.enter()
! plpy.commit()
! $$;
! WARNING:  forcibly aborting a subtransaction that has not been exited
! ERROR:  cannot commit while a subtransaction is active
! CONTEXT:  PL/Python anonymous code block
! -- commit inside cursor loop
! CREATE TABLE test2 (x int);
! INSERT INTO test2 VALUES (0), (1), (2), (3), (4);
! TRUNCATE test1;
! DO LANGUAGE plpythonu $$
! for row in plpy.cursor("SELECT * FROM test2 ORDER BY x"):
!     plpy.execute("INSERT INTO test1 (a) VALUES (%s)" % row['x'])
!     plpy.commit()
! $$;
! ERROR:  cannot commit transaction while a cursor is open
! CONTEXT:  PL/Python anonymous code block
! SELECT * FROM test1;
!  a | b
! ---+---
! (0 rows)
!
! -- rollback inside cursor loop
! TRUNCATE test1;
! DO LANGUAGE plpythonu $$
! for row in plpy.cursor("SELECT * FROM test2 ORDER BY x"):
!     plpy.execute("INSERT INTO test1 (a) VALUES (%s)" % row['x'])
!     plpy.rollback()
! $$;
! ERROR:  cannot abort transaction while a cursor is open
! CONTEXT:  PL/Python anonymous code block
! SELECT * FROM test1;
!  a | b
! ---+---
! (0 rows)
!
! DROP TABLE test1;
! DROP TABLE test2;
--- 88,94 ----
  return 1
  $$;
  SELECT transaction_test4();
! server closed the connection unexpectedly
! 	This probably means the server terminated abnormally
! 	before or while processing the request.
! connection to server was lost

==

*** /home/buildfarm/mpolyakova/postgrespro_core_353_stable_func/src/pl/plpython/expected/plpython_drop.out	Wed Feb 7 17:27:50 2018
--- /home/buildfarm/mpolyakova/postgrespro_core_353_stable_func/src/pl/plpython/results/plpython_drop.out	Tue Feb 13 13:34:39 2018
***************
*** 1,6 ****
! --
! -- For paranoia's sake, don't leave an untrusted language sitting around
! --
! SET client_min_messages = WARNING;
! DROP EXTENSION plpythonu CASCADE;
! DROP EXTENSION IF EXISTS plpython2u CASCADE;
--- 1 ----
! psql: FATAL:  the database system is in recovery mode

==

test plpython_schema          ... ok
test plpython_populate        ... ok
test plpython_test            ... ok
test plpython_do              ... ok
test plpython_global          ... ok
test plpython_import          ... ok
test plpython_spi             ... ok
test plpython_newline         ... ok
test plpython_void            ... ok
test plpython_call            ... ok
test plpython_params          ... ok
test plpython_setof           ... ok
test plpython_record          ... ok
test plpython_trigger         ... ok
test plpython_types           ... ok
test plpython_error           ... ok
test plpython_ereport         ... ok
test plpython_unicode         ... ok
test plpython_quote           ... ok
test plpython_composite      ... ok
test plpython_subtransaction ... ok
test plpython_transaction    ... FAILED (test process exited with exit code 2)
test plpython_drop           ... FAILED (test process exited with exit code 2)
Re: WIP Patch: Precalculate stable functions, infrastructure v1
Hello! Thank you for reporting! I'll try to get it on our buildfarm.. On 05-02-2018 0:10, Thomas Munro wrote: On Thu, Feb 1, 2018 at 6:01 PM, Marina Polyakova <m.polyak...@postgrespro.ru> wrote: This is the 8-th version of the patch for the precalculation of stable or immutable functions, stable or immutable operators and other nonvolatile expressions. This is a try to fix the most problems (I'm sorry, it took some time..) that Tom Lane and Andres Freund mentioned in [1], [2] and [3]. It is based on the top of master, and on my computer make check-world passes. And I'll continue work on it. Hi Marina, FYI I saw a repeatable crash in the contrib regression tests when running make check-world with this patch applied. test hstore_plperl... FAILED (test process exited with exit code 2) test hstore_plperlu ... FAILED (test process exited with exit code 2) test create_transform ... FAILED I'm not sure why it passes for you but fails here, but we can see from the backtrace[1] that ExecInitExprRec is receiving a null node pointer on this Ubuntu Trusty GCC 4.8 amd64 system. [1] https://travis-ci.org/postgresql-cfbot/postgresql/builds/337255374 -- Marina Polyakova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: WIP Patch: Precalculate stable functions, infrastructure v1
Hello! On 24-01-2018 23:36, Andres Freund wrote: On 2018-01-24 15:10:56 -0500, Tom Lane wrote: ... Keeping the stored value of a CachedExpr in a Param slot is an interesting idea indeed. We keep coming back to this, IIRC we had a pretty similar discussion around redesigning caseValue_datum/isNull domainValue_datum/isNull to be less ugly. There also was https://www.postgresql.org/message-id/20171116182208.kcvf75nfaldv3...@alap3.anarazel.de where we discussed using something similar to PARAM_EXEC Param nodes to allow inlining of volatile functions. ISTM, there might be some value to consider all of them in the design of the new mechanism. Thank you both very much for this discussion and for the link on that thread! Now I'm working on the patch, thanks to Tom Lane's comments earlier [1], and I'll try to implement something of this.. [1] https://www.postgresql.org/message-id/403e0ae329c6868b3f3467eac92cc04d%40postgrespro.ru -- Marina Polyakova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: WIP Patch: Precalculate stable functions, infrastructure v1
On 17-01-2018 1:05, Tom Lane wrote: [ I'm sending this comment separately because I think it's an issue Andres might take an interest in. ] Marina Polyakova <m.polyak...@postgrespro.ru> writes: [ v7-0001-Precalculate-stable-and-immutable-functions.patch ] Another thing that's bothering me is that the execution semantics you're proposing for CachedExpr seem rather inflexible. AFAICS, once a CachedExpr has run once, it will hang on to the result value and keep returning that for the entire lifespan of the compiled expression. We already noted that that breaks plpgsql's "simple expression" logic, and it seems inevitable to me that it will be an issue for other places as well. I think it'd be a better design if we had some provision for resetting the cached values, short of recompiling the expression from scratch. One way that occurs to me to do this is to replace the simple boolean isExecuted flags with a generation counter, and add a master generation counter to ExprState. The rule for executing CachedExpr would be "if my generation counter is different from the ExprState's counter, then evaluate the subexpression and copy the ExprState's counter into mine". Then the procedure for forcing recalculation of cached values is just to increment the ExprState's counter. There are other ways one could imagine doing this --- for instance, I initially thought of keeping the master counter in the ExprContext being used to run the expression. But you need some way to remember what counter value was used last with a particular expression, so probably keeping it in ExprState is better. Or we could just brute-force it by providing a function that runs through a compiled expression step list and resets the isExecuted flag for each EEOP_CACHEDEXPR_IF_CACHED step it finds. A slightly less brute-force way is to link those steps together in a list, so that the function doesn't have to visit irrelevant steps. 
If the reset function were seldom used then the extra cycles for this wouldn't be very expensive. But I'm not sure it will be seldom used --- it seems like plpgsql simple expressions will be doing this every time --- so I think the counter approach might be a better idea. Thank you very much! I'll try to implement something from this. I'm curious to know whether Andres has some other ideas, or whether he feels this is all a big wart on the compiled-expression concept. I don't think there are any existing cases where we keep any meaningful state across executions of a compiled-expression data structure; maybe that's a bad idea in itself. -- Marina Polyakova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
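The generation-counter scheme Tom describes can be made concrete with a minimal C sketch. All names here are hypothetical stand-ins, not the actual executor structures or the real Datum-based cached values:

```c
#include <stdint.h>
#include <assert.h>

/* Hypothetical simplification of ExprState: only the master counter. */
typedef struct ExprStateSketch
{
    uint64_t generation;    /* bump this to invalidate every cached value */
} ExprStateSketch;

/* Hypothetical simplification of a CachedExpr's per-expression state. */
typedef struct CachedExprSketch
{
    uint64_t generation;    /* counter value seen at the last evaluation */
    int      cached_value;  /* stand-in for the saved Datum */
} CachedExprSketch;

int n_evals = 0;            /* counts real evaluations, for demonstration */

int evaluate_subexpr(void)
{
    n_evals++;
    return 42;              /* pretend this is an expensive stable function */
}

/* "If my generation counter differs from the ExprState's counter, then
 * evaluate the subexpression and copy the ExprState's counter into mine." */
int eval_cached(ExprStateSketch *es, CachedExprSketch *ce)
{
    if (ce->generation != es->generation)
    {
        ce->cached_value = evaluate_subexpr();
        ce->generation = es->generation;
    }
    return ce->cached_value;
}

/* Forcing recalculation is one increment, no matter how many CachedExprs
 * the compiled expression contains. */
void reset_cached_exprs(ExprStateSketch *es)
{
    es->generation++;
}
```

Starting the master counter at a value no CachedExpr is initialized with (e.g. master at 1, caches at 0) makes the first execution evaluate naturally without a special case.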
Re: WIP Patch: Precalculate stable functions, infrastructure v1
the infrastructure of ParamListInfoData. And what in the world is CheckBoundParams about? The internal documentation in this patch isn't quite nonexistent, but it's well short of being in a committable state IMO. I'll try to improve it, for CheckBoundParams (if I understood you correctly) and others. -- Marina Polyakova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: master make check fails on Solaris 10
On 18-01-2018 20:49, Marina Polyakova wrote: On 18-01-2018 20:34, Tom Lane wrote: ... What you could do in the meantime is work on finding a variation of Victor's test that will detect the bug regardless of -O level. If we do have hope that future gcc versions will handle this correctly, we'll need a better test rather than just summarily dismissing host_cpu = sparc. Thanks, I'll try.. I tried different gcc options, but it did not help.. Perhaps searching the gcc source code would clarify something, but I'm sorry, I'm too busy for that right now.. -- Marina Polyakova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: master make check fails on Solaris 10
On 18-01-2018 20:34, Tom Lane wrote: I wrote: ... But ... let's not panic, but wait and see the final result of the discussion on the gcc PR. Jakub at least seems to think it ought to be a supportable case. What you could do in the meantime is work on finding a variation of Victor's test that will detect the bug regardless of -O level. If we do have hope that future gcc versions will handle this correctly, we'll need a better test rather than just summarily dismissing host_cpu = sparc. Thanks, I'll try.. -- Marina Polyakova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: master make check fails on Solaris 10
On 18-01-2018 20:24, Tom Lane wrote: Marina Polyakova <m.polyak...@postgrespro.ru> writes: On 18-01-2018 19:53, Tom Lane wrote: So basically, we're outta luck and we have to consider __int128 as unsupportable on SPARC. I'm inclined to mechanize that as a test on $host_cpu. At least that means we don't need an AC_RUN test ;-) %-)) :-) Can I do something else about this problem?.. I don't see any other workable alternative. The box we're in as far as the interaction with MAXALIGN goes is still the same as it was a month ago: raising MAXALIGN is impractical, and so is allowing some datatypes to have more-than-MAXALIGN alignment specs. I suppose you could imagine declaring int128s that are in any sort of palloc'd storage as, in effect, char[16], and always memcpy'ing to and from local variables that're declared int128 whenever you want to do arithmetic with them. But ugh. I can't see taking that sort of notational and performance hit for just one non-mainstream architecture. Really, this is something that the compiler ought to do for us, IMO. If the gcc guys don't want to be bothered, OK, but that tells you more about the priority they place on SPARC support than anything else. Thank you very much for your explanations! So I'll go through all of your comments on my patch about stable functions when the next work day begins in Moscow) -- Marina Polyakova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
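For illustration, the char[16]-plus-memcpy workaround Tom sketches (and rejects on notational and performance grounds) would look roughly like this. This is a hypothetical sketch, not anything PostgreSQL adopted; the type and function names are made up:

```c
#include <string.h>
#include <assert.h>

/* Keep int128 values in palloc'd structures as raw bytes, so the struct
 * needs only 1-byte alignment instead of 16-byte alignment. */
typedef struct Int128Storage
{
    char data[16];          /* effectively char[16] */
} Int128Storage;

__int128 load_int128(const Int128Storage *src)
{
    __int128 tmp;           /* a local variable is properly aligned */
    memcpy(&tmp, src->data, sizeof(tmp));
    return tmp;
}

void store_int128(Int128Storage *dst, __int128 val)
{
    memcpy(dst->data, &val, sizeof(val));
}

/* Every arithmetic step pays two memcpys -- the performance hit in question. */
void accum_int128(Int128Storage *state, __int128 newval)
{
    store_int128(state, load_int128(state) + newval);
}
```

Note that __int128 is a gcc/clang extension, so this sketch assumes one of those compilers on a 64-bit target.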
Re: master make check fails on Solaris 10
On 18-01-2018 19:53, Tom Lane wrote: Marina Polyakova <m.polyak...@postgrespro.ru> writes: On 18-01-2018 17:56, Tom Lane wrote: Weird. Maybe the gcc bug only manifests with certain optimization flags? That's not what I'd have expected from Victor's theory about why the code is wrong, but if it only shows up some of the time, it's hard to think of another explanation. Thank you! Using ./configure CC="gcc" CFLAGS="-m64 -O1" on commit 9c7d06d60680 with your patch, I got this: [ configure check passes ] But make check got the same failures, and I see the same debug output as in [1].. Interesting. Maybe the parameter-passing misbehavior that Victor's test is looking for isn't the only associated bug. P.S. As I understand it, this comment on bugzilla [2] is also about this. [2] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83925#c6 Even more interesting, see c7 that was just posted there: Eric Botcazou 2018-01-18 16:22:48 UTC 128-bit types require 128-bit alignment on SPARC64 so we cannot support that. So basically, we're outta luck and we have to consider __int128 as unsupportable on SPARC. I'm inclined to mechanize that as a test on $host_cpu. At least that means we don't need an AC_RUN test ;-) %-)) :-) Can I do something else about this problem?.. -- Marina Polyakova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: master make check fails on Solaris 10
On 18-01-2018 17:56, Tom Lane wrote: Marina Polyakova <m.polyak...@postgrespro.ru> writes: Applying your patch on commit f033462d8f77c40b7d6b33c5116e50118fb4699d and using the configuration command from [1], I got: checking for __int128... yes checking for __int128 alignment bug... broken ... And make check-world passes. Victor said that he used a much simpler configuration command, and I'm trying to figure out what's changed.. Weird. Maybe the gcc bug only manifests with certain optimization flags? That's not what I'd have expected from Victor's theory about why the code is wrong, but if it only shows up some of the time, it's hard to think of another explanation. Thank you! Using ./configure CC="gcc" CFLAGS="-m64 -O1" on commit 9c7d06d60680 with your patch, I got this: checking for __int128... yes checking for __int128 alignment bug... ok checking alignment of PG_INT128_TYPE... 16 In pg_config.h: /* The normal alignment of `PG_INT128_TYPE', in bytes. */ #define ALIGNOF_PG_INT128_TYPE 16 ... /* Define to the name of a signed 128-bit integer type. */ #define PG_INT128_TYPE __int128 But make check got the same failures, and I see the same debug output as in [1].. P.S. As I understand it, this comment on bugzilla [2] is also about this. [1] https://www.postgresql.org/message-id/90ab676392c8f9c84431976147097cf0%40postgrespro.ru [2] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83925#c6 -- Marina Polyakova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: master make check fails on Solaris 10
On 17-01-2018 18:50, Marina Polyakova wrote: On 17-01-2018 18:28, Tom Lane wrote: BTW, now that you've demonstrated that the bug exists in a current gcc release, you should definitely file a bug at https://gcc.gnu.org/bugzilla/ I think you can just give them int128test2.c as-is as a test case. Please do that and let me know the PR number --- I think it would be good to cite the bug specifically in the comments for our configure code. Thanks, I'll try to do it. If I understand correctly, its PR number is 83925 (see [1]). [1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83925 -- Marina Polyakova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: master make check fails on Solaris 10
Thank you! On 17-01-2018 19:33, Tom Lane wrote: Attached is a draft patch to incorporate Victor's slimmed-down test into configure. If you have a chance, could you confirm it does the right thing on your Sparc machine? Victor Wagner <vitus(at)wagner(dot)pp(dot)ru> writes: It seems that what it does is not exactly a right thing. I've applied it to commit 9c7d06d60680 in master and see following checking for __int128 alignment bug... ok As far as I understand your patch, there should be: checking for __int128 alignment bug... broken ... Then in the pg_config.h I see /* The normal alignment of `PG_INT128_TYPE', in bytes. */ #define ALIGNOF_PG_INT128_TYPE 16 /* Define to the name of a signed 128-bit integer type. */ #define PG_INT128_TYPE __int128 ... However, make check passes. Uh ... how could that be? If the output of configure is exactly the same as before the patch, how could the end result be different? Applying your patch on commit f033462d8f77c40b7d6b33c5116e50118fb4699d and using the configuration command from [1], I got: checking for __int128... yes checking for __int128 alignment bug... broken Nothing is defined for int128 in pg_config.h: /* The normal alignment of `PG_INT128_TYPE', in bytes. */ /* #undef ALIGNOF_PG_INT128_TYPE */ ... /* Define to the name of a signed 128-bit integer type. */ /* #undef PG_INT128_TYPE */ And make check-world passes. Victor said that he used a much simpler configuration command, and I'm trying to figure out what's changed.. BTW, it would be a good idea to set up a buildfarm member on that machine, if you care about whether that configuration continues to work in the future. Victor answered this in [2]: Really we already have buildfarm member on this machine. It is just member of PostgresPro private buildfarm, not of big comminity buildfarm. So, I'll register it in the big buildfarm as soon as I figure out how to distribute limited resources of this machine between two buildfarms. P.S. 
I found trailing whitespace in line 80: ! int128a q; [1] https://www.postgresql.org/message-id/0d3a9fa264cebe1cb9966f37b7c06e86%40postgrespro.ru [2] https://www.postgresql.org/message-id/20180117203648.2626d97a%40wagner.wagner.home -- Marina Polyakova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: master make check fails on Solaris 10
On 17-01-2018 18:28, Tom Lane wrote: BTW, now that you've demonstrated that the bug exists in a current gcc release, you should definitely file a bug at https://gcc.gnu.org/bugzilla/ I think you can just give them int128test2.c as-is as a test case. Please do that and let me know the PR number --- I think it would be good to cite the bug specifically in the comments for our configure code. Thanks, I'll try to do it. -- Marina Polyakova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: master make check fails on Solaris 10
Sorry, diff.patch is attached now. On 17-01-2018 18:02, Marina Polyakova wrote: [I added Victor Wagner as co-researcher of this problem] On 13-01-2018 21:10, Tom Lane wrote: In the end this might just be an instance of the old saw about avoiding dot-zero releases. Have you tried a newer gcc? (Digging in their bugzilla finds quite a number of __int128 bugs fixed in 5.4.x, though none look to be specifically about misaligned data.) gcc 5.5.0 (from [1]) did not fix the problem.. On 16-01-2018 2:41, Tom Lane wrote: Marina Polyakova <m.polyak...@postgrespro.ru> writes: On 13-01-2018 21:10, Tom Lane wrote: I'm not sure there's much we can do about this. Dropping the use of the alignment spec isn't a workable option. If there were a simple way for configure to detect that the compiler generates bad code for that, we could have it do so and reject use of __int128, but it'd be up to you to come up with a workable test. I'll think about it.. Attached is a possible test program. I can confirm it passes on a machine with working __int128, but I have no idea whether it will detect the problem on yours. If not, maybe you can tweak it? Thank you! Using gcc 5.5.0 it prints that everything is ok. But, investigating the regression diffs, we found out that the error occurs when we pass int128 as not the first argument to the function (perhaps its value is replaced by the value of some address):

-- Use queries from random.sql
SELECT count(*) FROM onek; -- Everything is ok
...
SELECT random, count(random) FROM RANDOM_TBL GROUP BY random HAVING count(random) > 3; -- Everything is ok

postgres=# SELECT * FROM RANDOM_TBL ORDER BY random; -- Print current data
 random
     78
     86
     98
     98
(4 rows)

postgres=# SELECT AVG(random) FROM RANDOM_TBL
postgres-# HAVING AVG(random) NOT BETWEEN 80 AND 120; -- Oops!
 avg
---
 79446934848446476698976780288
(1 row)

Debug output from the last query (see attached diff.patch, it is based on commit 9c7d06d60680c7f00d931233873dee81fdb311c6 of master):

makeInt128AggState
int8_avg_accum val 98
int8_avg_accum val_int128 as 2 x int64: 0 98
int8_avg_accum val_int128 bytes: 0062
int8_avg_accum state 100e648d8
int8_avg_accum 1007f2e94
do_int128_accum int128 newval as 2 x int64: 4306826968 0
do_int128_accum int128 newval bytes: 000100B4F6D8
do_int128_accum state 100e648d8
do_int128_accum 1007f1e30
int8_avg_accum val 86
int8_avg_accum val_int128 as 2 x int64: 0 86
int8_avg_accum val_int128 bytes: 0056
int8_avg_accum state 100e648d8
int8_avg_accum 1007f2e94
do_int128_accum int128 newval as 2 x int64: 4306826968 0
do_int128_accum int128 newval bytes: 000100B4F6D8
do_int128_accum state 100e648d8
do_int128_accum 1007f1e30
int8_avg_accum val 98
int8_avg_accum val_int128 as 2 x int64: 0 98
int8_avg_accum val_int128 bytes: 0062
int8_avg_accum state 100e648d8
int8_avg_accum 1007f2e94
do_int128_accum int128 newval as 2 x int64: 4306826968 0
do_int128_accum int128 newval bytes: 000100B4F6D8
do_int128_accum state 100e648d8
do_int128_accum 1007f1e30
int8_avg_accum val 78
int8_avg_accum val_int128 as 2 x int64: 0 78
int8_avg_accum val_int128 bytes: 004E
int8_avg_accum state 100e648d8
int8_avg_accum 1007f2e94
do_int128_accum int128 newval as 2 x int64: 4306826968 0
do_int128_accum int128 newval bytes: 000100B4F6D8
do_int128_accum state 100e648d8
do_int128_accum 1007f1e30
numeric_poly_avg
int128_to_numericvar
int128_to_numericvar int128 val as 2 x int64: 17227307872 0
int128_to_numericvar int128 val bytes: 000402D3DB60

(val_int128 in the function int8_avg_accum is correct, but newval in the function do_int128_accum is not equal to it. val in the function int128_to_numericvar is (4 * 4306826968).) Based on this, we modified the test program (see attached).
Here is its output on Solaris 10 for different alignment requirements for int128 (on my machine, where make check-world passes, everything is OK) (ALIGNOF_PG_INT128_TYPE is 16 on Solaris 10):

$ gcc -D PG_ALIGN_128=16 -m64 -o int128test2 int128test2.c
$ ./int128test2
basic aritmetic OK
pass int 16 OK
pass uint 16 OK
pass int 32 OK
pass int 64 OK
pass int 128 OK
$ gcc -D PG_ALIGN_128=8 -m64 -o int128test2 int128test2.c
$ ./int128test2
basic aritmetic OK
pass int 16 FAILED
pass uint 16 FAILED
pass int 32 FAILED
pass int 64 FAILED
pass int 128 OK

Maybe some pass test from int128test2.c can be used to test __int128?

P.S. I suppose g.b should be 97656250 to get 4005:

struct glob128 {
    __int128 start;
    char pad;
    int128a a;
    int128a b;
    int128a c;
    int128a d;
} g = {0, 'p', 48828125, 97656255, 0, 0};
...
g.b = (g.b << 12) + 5;    /* 4005 */
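The failure mode described above -- an __int128 value corrupted when passed as a non-first argument -- can be distilled into a check along these lines. This is a sketch in the spirit of the "pass" tests in int128test2.c, not the actual test file; the function names are made up:

```c
#include <assert.h>

/* The callee should see exactly the value the caller passed, regardless of
 * argument position; in the broken SPARC64 case discussed above, it
 * reportedly did not. noinline keeps the call from being optimized away. */
__attribute__((noinline))
__int128 pass_second(int dummy, __int128 val)
{
    (void) dummy;
    return val;
}

int check_pass_second(void)
{
    __int128 v = ((__int128) 1 << 64) + 98;     /* wider than 64 bits */

    return pass_second(4, v) == v;
}
```

On a working ABI this trivially succeeds; the point of such a test is to detect the miscompilation at configure time instead of at make check time.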
Re: master make check fails on Solaris 10
[I added Victor Wagner as co-researcher of this problem] On 13-01-2018 21:10, Tom Lane wrote: In the end this might just be an instance of the old saw about avoiding dot-zero releases. Have you tried a newer gcc? (Digging in their bugzilla finds quite a number of __int128 bugs fixed in 5.4.x, though none look to be specifically about misaligned data.) gcc 5.5.0 (from [1]) did not fix the problem.. On 16-01-2018 2:41, Tom Lane wrote: Marina Polyakova <m.polyak...@postgrespro.ru> writes: On 13-01-2018 21:10, Tom Lane wrote: I'm not sure there's much we can do about this. Dropping the use of the alignment spec isn't a workable option. If there were a simple way for configure to detect that the compiler generates bad code for that, we could have it do so and reject use of __int128, but it'd be up to you to come up with a workable test. I'll think about it.. Attached is a possible test program. I can confirm it passes on a machine with working __int128, but I have no idea whether it will detect the problem on yours. If not, maybe you can tweak it? Thank you! Using gcc 5.5.0 it prints that everything is ok. But, investigating the regression diffs, we found out that the error occurs when we pass int128 as not the first argument to the function (perhaps its value is replaced by the value of some address):

-- Use queries from random.sql
SELECT count(*) FROM onek; -- Everything is ok
...
SELECT random, count(random) FROM RANDOM_TBL GROUP BY random HAVING count(random) > 3; -- Everything is ok

postgres=# SELECT * FROM RANDOM_TBL ORDER BY random; -- Print current data
 random
     78
     86
     98
     98
(4 rows)

postgres=# SELECT AVG(random) FROM RANDOM_TBL
postgres-# HAVING AVG(random) NOT BETWEEN 80 AND 120; -- Oops!
 avg
---
 79446934848446476698976780288
(1 row)

Debug output from the last query (see attached diff.patch, it is based on commit 9c7d06d60680c7f00d931233873dee81fdb311c6 of master):

makeInt128AggState
int8_avg_accum val 98
int8_avg_accum val_int128 as 2 x int64: 0 98
int8_avg_accum val_int128 bytes: 0062
int8_avg_accum state 100e648d8
int8_avg_accum 1007f2e94
do_int128_accum int128 newval as 2 x int64: 4306826968 0
do_int128_accum int128 newval bytes: 000100B4F6D8
do_int128_accum state 100e648d8
do_int128_accum 1007f1e30
int8_avg_accum val 86
int8_avg_accum val_int128 as 2 x int64: 0 86
int8_avg_accum val_int128 bytes: 0056
int8_avg_accum state 100e648d8
int8_avg_accum 1007f2e94
do_int128_accum int128 newval as 2 x int64: 4306826968 0
do_int128_accum int128 newval bytes: 000100B4F6D8
do_int128_accum state 100e648d8
do_int128_accum 1007f1e30
int8_avg_accum val 98
int8_avg_accum val_int128 as 2 x int64: 0 98
int8_avg_accum val_int128 bytes: 0062
int8_avg_accum state 100e648d8
int8_avg_accum 1007f2e94
do_int128_accum int128 newval as 2 x int64: 4306826968 0
do_int128_accum int128 newval bytes: 000100B4F6D8
do_int128_accum state 100e648d8
do_int128_accum 1007f1e30
int8_avg_accum val 78
int8_avg_accum val_int128 as 2 x int64: 0 78
int8_avg_accum val_int128 bytes: 004E
int8_avg_accum state 100e648d8
int8_avg_accum 1007f2e94
do_int128_accum int128 newval as 2 x int64: 4306826968 0
do_int128_accum int128 newval bytes: 000100B4F6D8
do_int128_accum state 100e648d8
do_int128_accum 1007f1e30
numeric_poly_avg
int128_to_numericvar
int128_to_numericvar int128 val as 2 x int64: 17227307872 0
int128_to_numericvar int128 val bytes: 000402D3DB60

(val_int128 in the function int8_avg_accum is correct, but newval in the function do_int128_accum is not equal to it. val in the function int128_to_numericvar is (4 * 4306826968).) Based on this, we modified the test program (see attached).
Here is its output on Solaris 10 for different alignment requirements for int128 (on my machine, where make check-world passes, everything is OK) (ALIGNOF_PG_INT128_TYPE is 16 on Solaris 10):

$ gcc -D PG_ALIGN_128=16 -m64 -o int128test2 int128test2.c
$ ./int128test2
basic aritmetic OK
pass int 16 OK
pass uint 16 OK
pass int 32 OK
pass int 64 OK
pass int 128 OK
$ gcc -D PG_ALIGN_128=8 -m64 -o int128test2 int128test2.c
$ ./int128test2
basic aritmetic OK
pass int 16 FAILED
pass uint 16 FAILED
pass int 32 FAILED
pass int 64 FAILED
pass int 128 OK

Maybe some pass test from int128test2.c can be used to test __int128?

P.S. I suppose g.b should be 97656250 to get 4005:

struct glob128 {
    __int128 start;
    char pad;
    int128a a;
    int128a b;
    int128a c;
    int128a d;
} g = {0, 'p', 48828125, 97656255, 0, 0};
...
g.b = (g.b << 12) + 5;    /* 4005 */

[1] https://www.opencsw.org -- Marina Polyakova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: WIP Patch: Precalculate stable functions, infrastructure v1
Thank you so much for your comments!! I'll answer a bit later because now I'm trying to find a test for int128 on Solaris 10.. [1] On 17-01-2018 1:05, Tom Lane wrote: [ I'm sending this comment separately because I think it's an issue Andres might take an interest in. ] Marina Polyakova <m.polyak...@postgrespro.ru> writes: [ v7-0001-Precalculate-stable-and-immutable-functions.patch ] Another thing that's bothering me is that the execution semantics you're proposing for CachedExpr seem rather inflexible. AFAICS, once a CachedExpr has run once, it will hang on to the result value and keep returning that for the entire lifespan of the compiled expression. We already noted that that breaks plpgsql's "simple expression" logic, and it seems inevitable to me that it will be an issue for other places as well. I think it'd be a better design if we had some provision for resetting the cached values, short of recompiling the expression from scratch. One way that occurs to me to do this is to replace the simple boolean isExecuted flags with a generation counter, and add a master generation counter to ExprState. The rule for executing CachedExpr would be "if my generation counter is different from the ExprState's counter, then evaluate the subexpression and copy the ExprState's counter into mine". Then the procedure for forcing recalculation of cached values is just to increment the ExprState's counter. There are other ways one could imagine doing this --- for instance, I initially thought of keeping the master counter in the ExprContext being used to run the expression. But you need some way to remember what counter value was used last with a particular expression, so probably keeping it in ExprState is better. Or we could just brute-force it by providing a function that runs through a compiled expression step list and resets the isExecuted flag for each EEOP_CACHEDEXPR_IF_CACHED step it finds. 
A slightly less brute-force way is to link those steps together in a list, so that the function doesn't have to visit irrelevant steps. If the reset function were seldom used then the extra cycles for this wouldn't be very expensive. But I'm not sure it will be seldom used --- it seems like plpgsql simple expressions will be doing this every time --- so I think the counter approach might be a better idea. I'm curious to know whether Andres has some other ideas, or whether he feels this is all a big wart on the compiled-expression concept. I don't think there are any existing cases where we keep any meaningful state across executions of a compiled-expression data structure; maybe that's a bad idea in itself. regards, tom lane [1] https://www.postgresql.org/message-id/18209.1516059711%40sss.pgh.pa.us -- Marina Polyakova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: master make check fails on Solaris 10
On 12-01-2018 14:05, Marina Polyakova wrote: - on the previous commit (272c2ab9fd0a604e3200030b1ea26fd464c44935) the same failures occur (see the attached regression diffs and output); - on commit bf54c0f05c0a58db17627724a83e1b6d4ec2712c make check-world passes. I'll try to find out from what commit it started.. Binary search has shown that all these failures begin with commit 7518049980be1d90264addab003476ae105f70d4 (Prevent int128 from requiring more than MAXALIGN alignment.). On the previous commit (91aec93e6089a5ba49cce0aca3bf7f7022d62ea4) make check-world passes. -- Marina Polyakova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: master make check fails on Solaris 10
On 12-01-2018 18:12, Alvaro Herrera wrote: Marina Polyakova wrote: - on the previous commit (272c2ab9fd0a604e3200030b1ea26fd464c44935) the same failures occur (see the attached regression diffs and output); - on commit bf54c0f05c0a58db17627724a83e1b6d4ec2712c make check-world passes. I'll try to find out from what commit it started.. Don't you have any suspicions?) Perhaps you can use "git bisect". Thanks, I'm doing the same thing :) -- Marina Polyakova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: WIP Patch: Pgbench Serialization and deadlock errors
On 12-01-2018 15:47, Fabien COELHO wrote: Hello Marina, A partial answer, to focus on how the feature may be simplified. Thank you! I apologise as it seems that I overlooked one of your mails. Changing the thread has not helped. I'm wondering whether the whole feature could be simplified by considering that one script is one "transaction" (it is from the report point of view at least), and that any retry is for the full script only, from its beginning. That would remove the trying to guess at transactions begin or end, avoid scanning manually for subcommands, and so on. - Would it make sense? - Would it be ok for your use case? I'm not sure if this makes sense: if we retry the script from the very beginning including successful transactions, there may be errors.. What I'm suggesting is to simplify the problem by adding some restrictions to what kind of case is handled, so as to simplify the code a lot. I'd start by stating (i.e. documenting) that the feature assumes that one script is just *one* transaction. Note that pgbench somehow already assumes that one script is one transaction when it reports performance anyway. If you want 2 transactions, then you have to put them in two scripts, which looks fine with me. Different transactions are expected to be independent, otherwise they should be merged into one transaction. Therefore if the script consists of several single statements (= several transactions), you cannot retry them. For example, the script looked like this: UPDATE xy1 SET x = 1 WHERE y = 1; UPDATE xy2 SET x = 2 WHERE y = 2; UPDATE xy3 SET x = 3 WHERE y = 3; If this restriction is ok for you, I'll simplify the code :) Under these restrictions, ISTM that a retry is something like: case ABORTED: if (we want to retry) { // do necessary stats // reset the initial state (random, vars, current command) state = START_TX; // loop } else { // count as failed... state = FINISHED; // or done. 
} break; If we successfully complete a failed transaction block and process its end command in CSTATE_END_COMMAND, we may want to make a retry. So do you think that in this case it is ok to go to CSTATE_ABORTED at the end of CSTATE_END_COMMAND?.. Once this works, maybe it could go a step further by restarting at savepoints. I'd put restrictions there to ease detecting a savepoint so that it cannot occur in a compound command for instance. This would probably require a new state. Fine. We discussed the savepoints earlier in [1]: - if there's a failure what savepoint we should rollback to and start the execution again? ... Maybe to go to the last one, if it is not successful go to the previous one etc. Retrying the entire transaction may take less time.. Well, I do not know that. My 0.02 € is that if there was a savepoint then this is natural the restarting point of a transaction which has some recoverable error. A detail: For file contents, maybe the << 'EOF' here-document syntax would help instead of using concatenated backslashed strings everywhere. Thanks, but I'm not sure that it is better to open file handlers for printing in variables.. Perl here-document stuff is just a multiline string, no file is involved, it is different from the shell. Oh, now googling was successful, thanks) [1] https://www.postgresql.org/message-id/alpine.DEB.2.20.1707141637300.7871%40lancre -- Marina Polyakova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
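Under the one-script-is-one-transaction restriction, the retry step Fabien sketches could look roughly like this in pgbench's state machine. All names here are hypothetical; the real pgbench client state is more involved:

```c
#include <assert.h>

typedef enum
{
    RETRY_STATE_START_TX,       /* rerun the whole script from the beginning */
    RETRY_STATE_FINISHED        /* give up and count the script as failed */
} RetryNextState;

typedef struct RetryClient
{
    unsigned int random_state;  /* RNG state as saved at transaction start */
    int          command;       /* index of the current command in the script */
    int          tries;         /* attempts made so far, including the first */
    int          max_tries;     /* give up after this many attempts */
} RetryClient;

/* Called from the ABORTED case after a serialization failure or deadlock. */
RetryNextState handle_aborted(RetryClient *st, unsigned int saved_random)
{
    if (st->tries < st->max_tries)
    {
        /* record retry stats here, then reset the initial state */
        st->random_state = saved_random;
        st->command = 0;
        st->tries++;
        return RETRY_STATE_START_TX;
    }
    /* record the failure here */
    return RETRY_STATE_FINISHED;
}
```

Restoring the RNG state saved at transaction start is what makes the retried script draw the same random values, so the retry really replays the same transaction rather than a new one.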