Re: Tab completion for CREATE TABLE ... AS

2023-11-15 Thread Gilles Darold

On 15/11/2023 at 03:58, Michael Paquier wrote:

On Thu, Nov 02, 2023 at 07:27:02PM +0300, Gilles Darold wrote:

Looks like the tab completion for CREATE TABLE ... AS is not
proposed.

+   /* Complete CREATE TABLE  AS with list of keywords */
+   else if (TailMatches("CREATE", "TABLE", MatchAny, "AS") ||
+TailMatches("CREATE", "TEMP|TEMPORARY|UNLOGGED", "TABLE", MatchAny, "AS"))
+   COMPLETE_WITH("SELECT", "WITH");

There is a bit more than SELECT and WITH as possible query for a CTAS.
How about VALUES, TABLE or even EXECUTE (itself able to handle a
SELECT, TABLE or VALUES)?
--
Michael


Right, I don't know how I missed the sql-createtableas page in the 
documentation.
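For reference, all of these query forms are valid targets for CREATE TABLE ... AS in PostgreSQL (table and statement names below are made up for illustration):

```sql
CREATE TABLE t1 AS SELECT generate_series(1, 3) AS n;
CREATE TABLE t2 AS VALUES (1), (2), (3);
CREATE TABLE t3 AS TABLE t1;          -- TABLE is shorthand for SELECT * FROM
CREATE TABLE t4 AS WITH cte AS (SELECT 1 AS n) SELECT * FROM cte;
PREPARE q AS SELECT * FROM t1;
CREATE TABLE t5 AS EXECUTE q;         -- EXECUTE of a prepared statement
```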


Patch v2 fixes the keyword list; I have also sorted the CREATE TABLE 
completion alphabetically (AS was at the end of the list).


It has also been rebased on current master.
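For anyone curious how the completion rules fire, here is a rough Python sketch of the TailMatches()/COMPLETE_WITH() logic used above. This is illustrative only, not psql's actual implementation (which is C, in tab-complete.c); the "MatchAny" string and the "A|B" alternative syntax stand in for psql's macros.

```python
def tail_matches(words, pattern):
    """True if the last len(pattern) input words match the pattern.
    'MatchAny' matches any single word; 'A|B' matches either keyword."""
    if len(words) < len(pattern):
        return False
    for word, pat in zip(words[-len(pattern):], pattern):
        if pat == "MatchAny":
            continue
        if word.upper() not in pat.upper().split("|"):
            return False
    return True

def complete(line):
    """Return the completions offered after CREATE [TEMP|...] TABLE <name> AS."""
    words = line.split()
    if (tail_matches(words, ["CREATE", "TABLE", "MatchAny", "AS"]) or
            tail_matches(words, ["CREATE", "TEMP|TEMPORARY|UNLOGGED",
                                 "TABLE", "MatchAny", "AS"])):
        return ["EXECUTE", "SELECT", "TABLE", "VALUES", "WITH"]
    return []

print(complete("CREATE TABLE test AS"))
print(complete("CREATE UNLOGGED TABLE t AS"))
```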

--
Gilles Darold
http://www.darold.net/
diff --git a/src/bin/psql/tab-complete.c b/src/bin/psql/tab-complete.c
index a1dfc11f6b..e6c88cf1a1 100644
--- a/src/bin/psql/tab-complete.c
+++ b/src/bin/psql/tab-complete.c
@@ -2501,11 +2501,15 @@ psql_completion(const char *text, int start, int end)
 	/* Complete CREATE TABLE  with '(', OF or PARTITION OF */
 	else if (TailMatches("CREATE", "TABLE", MatchAny) ||
 			 TailMatches("CREATE", "TEMP|TEMPORARY|UNLOGGED", "TABLE", MatchAny))
-		COMPLETE_WITH("(", "OF", "PARTITION OF");
+		COMPLETE_WITH("(", "AS", "OF", "PARTITION OF");
 	/* Complete CREATE TABLE  OF with list of composite types */
 	else if (TailMatches("CREATE", "TABLE", MatchAny, "OF") ||
 			 TailMatches("CREATE", "TEMP|TEMPORARY|UNLOGGED", "TABLE", MatchAny, "OF"))
 		COMPLETE_WITH_SCHEMA_QUERY(Query_for_list_of_composite_datatypes, NULL);
+	/* Complete CREATE TABLE  AS with list of keywords */
+	else if (TailMatches("CREATE", "TABLE", MatchAny, "AS") ||
+			TailMatches("CREATE", "TEMP|TEMPORARY|UNLOGGED", "TABLE", MatchAny, "AS"))
+		COMPLETE_WITH("EXECUTE", "SELECT", "TABLE", "VALUES", "WITH");
 	/* Complete CREATE TABLE name (...) with supported options */
 	else if (TailMatches("CREATE", "TABLE", MatchAny, "(*)") ||
 			 TailMatches("CREATE", "UNLOGGED", "TABLE", MatchAny, "(*)"))


Tab completion for CREATE TABLE ... AS

2023-11-02 Thread Gilles Darold

Hi,


Looks like the tab completion for CREATE TABLE ... AS is not proposed.


gilles=# CREATE TABLE test
( OF    PARTITION OF

The attached patch fixes that and also proposes the further completion 
after the AS keyword.



gilles=# CREATE TABLE test
( AS    OF    PARTITION OF
gilles=# CREATE TABLE test AS
SELECT  WITH

Adding the patch to the current commitfest.


Best regards,

--
Gilles Darold
http://www.darold.net/
diff --git a/src/bin/psql/tab-complete.c b/src/bin/psql/tab-complete.c
index 93742fc6ac..d793a94d6a 100644
--- a/src/bin/psql/tab-complete.c
+++ b/src/bin/psql/tab-complete.c
@@ -3231,11 +3231,15 @@ psql_completion(const char *text, int start, int end)
 	/* Complete CREATE TABLE  with '(', OF or PARTITION OF */
 	else if (TailMatches("CREATE", "TABLE", MatchAny) ||
 			 TailMatches("CREATE", "TEMP|TEMPORARY|UNLOGGED", "TABLE", MatchAny))
-		COMPLETE_WITH("(", "OF", "PARTITION OF");
+		COMPLETE_WITH("(", "OF", "PARTITION OF", "AS");
 	/* Complete CREATE TABLE  OF with list of composite types */
 	else if (TailMatches("CREATE", "TABLE", MatchAny, "OF") ||
 			 TailMatches("CREATE", "TEMP|TEMPORARY|UNLOGGED", "TABLE", MatchAny, "OF"))
 		COMPLETE_WITH_SCHEMA_QUERY(Query_for_list_of_composite_datatypes);
+	/* Complete CREATE TABLE  AS with list of keywords */
+	else if (TailMatches("CREATE", "TABLE", MatchAny, "AS") ||
+			 TailMatches("CREATE", "TEMP|TEMPORARY|UNLOGGED", "TABLE", MatchAny, "AS"))
+		COMPLETE_WITH("SELECT", "WITH");
 	/* Complete CREATE TABLE name (...) with supported options */
 	else if (TailMatches("CREATE", "TABLE", MatchAny, "(*)") ||
 			 TailMatches("CREATE", "UNLOGGED", "TABLE", MatchAny, "(*)"))


Re: [Proposal] Allow pg_dump to include all child tables with the root table

2023-03-14 Thread Gilles Darold

On 14/03/2023 at 10:50, Stephane Tachoires wrote:

The following review has been posted through the commitfest application:
make installcheck-world:  tested, passed
Implements feature:   tested, passed
Spec compliant:   tested, passed
Documentation:tested, passed

V5 is ok (aside from LLVM 14 deprecation warnings, but that's another problem) 
with meson compile and meson test on Ubuntu 20.04.2.
The code fits well and looks standard, --help explains what it does clearly, and 
the documentation is ok (but as a Frenchman, I'm not an expert in English).



Thanks Stephane, I've changed the commitfest status to "Ready for Committer".

--
Gilles Darold





Re: [Proposal] Allow pg_dump to include all child tables with the root table

2023-03-13 Thread Gilles Darold

On 12/03/2023 at 19:05, Stéphane Tachoires wrote:


Hi Gilles,

On Ubuntu 22.04.2 all deb's updated, I have an error on a test
I'll try to find where and why, but I think you should know first.

1/1 postgresql:pg_dump / pg_dump/002_pg_dump        ERROR    24.40s   exit status 1

stderr:
#   Failed test 'only_dump_measurement: should dump CREATE TABLE test_compression'
#   at /media/hddisk/stephane/postgresql/src/postgresql/src/bin/pg_dump/t/002_pg_dump.pl line 4729.
# Review only_dump_measurement results in /media/hddisk/stephane/postgresql/build/testrun/pg_dump/002_pg_dump/data/tmp_test_iJxJ

# Looks like you failed 1 test of 10264.


Hi Stephane,


Odd, I do not have this error when running make installcheck, and I have 
the same OS version as you.



    +++ tap check in src/bin/pg_dump +++
    t/001_basic.pl  ok
    t/002_pg_dump.pl .. ok
    t/003_pg_dump_with_server.pl .. ok
    t/010_dump_connstr.pl . ok
    All tests successful.
    Files=4, Tests=9531, 11 wallclock secs ( 0.33 usr  0.04 sys + 3.05 
cusr  1.22 csys =  4.64 CPU)

    Result: PASS

Anyway, this test had to be fixed, and this is done in version v5 of the 
patch attached here.



Thanks for the review.

--
Gilles Darold
diff --git a/doc/src/sgml/ref/pg_dump.sgml b/doc/src/sgml/ref/pg_dump.sgml
index 334e4b7fd1..8ce4d5ff41 100644
--- a/doc/src/sgml/ref/pg_dump.sgml
+++ b/doc/src/sgml/ref/pg_dump.sgml
@@ -775,6 +775,16 @@ PostgreSQL documentation
   
  
 
+ 
+  --exclude-table-and-children=pattern
+  
+   
+Same as -T/--exclude-table option but automatically
+exclude partitions or child tables of the specified tables if any.
+   
+  
+ 
+
  
   --exclude-table-data=pattern
   
@@ -793,6 +803,17 @@ PostgreSQL documentation
   
  
 
+ 
+  --exclude-table-data-and-children=pattern
+  
+   
+Same as --exclude-table-data  but automatically
+exclude partitions or child tables of the specified tables if any.
+   
+  
+ 
+
+
  
   --extra-float-digits=ndigits
   
@@ -1158,6 +1179,16 @@ PostgreSQL documentation
   
  
 
+ 
+  --table-and-children=pattern
+  
+   
+Same as -t/--table option but automatically
+include partitions or child tables of the specified tables if any.
+   
+  
+ 
+
  
   --use-set-session-authorization
   
diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index 4217908f84..09d37991d6 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -130,6 +130,10 @@ static SimpleOidList foreign_servers_include_oids = {NULL, NULL};
 static SimpleStringList extension_include_patterns = {NULL, NULL};
 static SimpleOidList extension_include_oids = {NULL, NULL};
 
+static SimpleStringList table_include_patterns_and_children = {NULL, NULL};
+static SimpleStringList table_exclude_patterns_and_children = {NULL, NULL};
+static SimpleStringList tabledata_exclude_patterns_and_children = {NULL, NULL};
+
 static const CatalogId nilCatalogId = {0, 0};
 
 /* override for standard extra_float_digits setting */
@@ -180,7 +184,8 @@ static void expand_foreign_server_name_patterns(Archive *fout,
 static void expand_table_name_patterns(Archive *fout,
 	   SimpleStringList *patterns,
 	   SimpleOidList *oids,
-	   bool strict_names);
+	   bool strict_names,
+	   bool with_child_tables);
 static void prohibit_crossdb_refs(PGconn *conn, const char *dbname,
   const char *pattern);
 
@@ -421,6 +426,9 @@ main(int argc, char **argv)
 		{"on-conflict-do-nothing", no_argument, &dopt.do_nothing, 1},
 		{"rows-per-insert", required_argument, NULL, 10},
 		{"include-foreign-data", required_argument, NULL, 11},
+		{"table-and-children", required_argument, NULL, 12},
+		{"exclude-table-and-children", required_argument, NULL, 13},
+		{"exclude-table-data-and-children", required_argument, NULL, 14},
 
 		{NULL, 0, NULL, 0}
 	};
@@ -631,6 +639,19 @@ main(int argc, char **argv)
 		  optarg);
 break;
 
+			case 12:			/* include table(s) with children */
+				simple_string_list_append(&table_include_patterns_and_children, optarg);
+				dopt.include_everything = false;
+				break;
+
+			case 13:			/* exclude table(s) with children */
+				simple_string_list_append(&table_exclude_patterns_and_children, optarg);
+				break;
+
+			case 14:			/* exclude table(s) data */
+				simple_string_list_append(&tabledata_exclude_patterns_and_children, optarg);
+				break;
+
 			default:
 /* getopt_long already emitted a complaint */
 pg_log_error_hint("Try \"%s --help\" fo

Re: [Proposal] Allow pg_dump to include all child tables with the root table

2023-03-12 Thread Gilles Darold

On 11/03/2023 at 19:51, Gilles Darold wrote:

On 04/03/2023 at 20:18, Tom Lane wrote:

As noted, "childs" is bad English and "partitions" is flat out wrong
(unless you change it to recurse only to partitions, which doesn't
seem like a better definition).  We could go with
--[exclude-]table-and-children, or maybe
--[exclude-]table-and-child-tables, but those are getting into
carpal-tunnel-syndrome-inducing territory.  I lack a better
naming suggestion offhand.



Attached is version 3 of the patch; it includes the 
options suggested by Stephane and Tom:


    --table-and-children,

    --exclude-table-and-children

    --exclude-table-data-and-children.

The documentation has been updated too.


Thanks



New version v4 of the patch is attached, with a typo in the documentation fixed.

--
Gilles Darold.
diff --git a/doc/src/sgml/ref/pg_dump.sgml b/doc/src/sgml/ref/pg_dump.sgml
index 334e4b7fd1..8ce4d5ff41 100644
--- a/doc/src/sgml/ref/pg_dump.sgml
+++ b/doc/src/sgml/ref/pg_dump.sgml
@@ -775,6 +775,16 @@ PostgreSQL documentation
   
  
 
+ 
+  --exclude-table-and-children=pattern
+  
+   
+Same as -T/--exclude-table option but automatically
+exclude partitions or child tables of the specified tables if any.
+   
+  
+ 
+
  
   --exclude-table-data=pattern
   
@@ -793,6 +803,17 @@ PostgreSQL documentation
   
  
 
+ 
+  --exclude-table-data-and-children=pattern
+  
+   
+Same as --exclude-table-data  but automatically
+exclude partitions or child tables of the specified tables if any.
+   
+  
+ 
+
+
  
   --extra-float-digits=ndigits
   
@@ -1158,6 +1179,16 @@ PostgreSQL documentation
   
  
 
+ 
+  --table-and-children=pattern
+  
+   
+Same as -t/--table option but automatically
+include partitions or child tables of the specified tables if any.
+   
+  
+ 
+
  
   --use-set-session-authorization
   
diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index 4217908f84..09d37991d6 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -130,6 +130,10 @@ static SimpleOidList foreign_servers_include_oids = {NULL, NULL};
 static SimpleStringList extension_include_patterns = {NULL, NULL};
 static SimpleOidList extension_include_oids = {NULL, NULL};
 
+static SimpleStringList table_include_patterns_and_children = {NULL, NULL};
+static SimpleStringList table_exclude_patterns_and_children = {NULL, NULL};
+static SimpleStringList tabledata_exclude_patterns_and_children = {NULL, NULL};
+
 static const CatalogId nilCatalogId = {0, 0};
 
 /* override for standard extra_float_digits setting */
@@ -180,7 +184,8 @@ static void expand_foreign_server_name_patterns(Archive *fout,
 static void expand_table_name_patterns(Archive *fout,
 	   SimpleStringList *patterns,
 	   SimpleOidList *oids,
-	   bool strict_names);
+	   bool strict_names,
+	   bool with_child_tables);
 static void prohibit_crossdb_refs(PGconn *conn, const char *dbname,
   const char *pattern);
 
@@ -421,6 +426,9 @@ main(int argc, char **argv)
 		{"on-conflict-do-nothing", no_argument, &dopt.do_nothing, 1},
 		{"rows-per-insert", required_argument, NULL, 10},
 		{"include-foreign-data", required_argument, NULL, 11},
+		{"table-and-children", required_argument, NULL, 12},
+		{"exclude-table-and-children", required_argument, NULL, 13},
+		{"exclude-table-data-and-children", required_argument, NULL, 14},
 
 		{NULL, 0, NULL, 0}
 	};
@@ -631,6 +639,19 @@ main(int argc, char **argv)
 		  optarg);
 break;
 
+			case 12:			/* include table(s) with children */
+				simple_string_list_append(&table_include_patterns_and_children, optarg);
+				dopt.include_everything = false;
+				break;
+
+			case 13:			/* exclude table(s) with children */
+				simple_string_list_append(&table_exclude_patterns_and_children, optarg);
+				break;
+
+			case 14:			/* exclude table(s) data */
+				simple_string_list_append(&tabledata_exclude_patterns_and_children, optarg);
+				break;
+
 			default:
 /* getopt_long already emitted a complaint */
 pg_log_error_hint("Try \"%s --help\" for more information.", progname);
@@ -814,17 +835,34 @@ main(int argc, char **argv)
 	{
 		expand_table_name_patterns(fout, &table_include_patterns,
 								   &table_include_oids,
-								   strict_names);
+								   strict_names, false);
 		if (table_include_oids.head == NULL)
 			pg_fatal("no matching tables were found");
 	}
 	expand_table_name_patterns(fout, &table_exclude_patterns,
 							   &table_exclude_oids,
-							   false);
+							   false, false);
 
 	expand_table_name_patterns(fout, &tabledata_exclude_patterns,
 							   &tabledata_exclude_oids,
-							   false);
+							   false, false);
+
+	/* Expand table and children selection patterns in

Re: [Proposal] Allow pg_dump to include all child tables with the root table

2023-03-11 Thread Gilles Darold

On 04/03/2023 at 20:18, Tom Lane wrote:

As noted, "childs" is bad English and "partitions" is flat out wrong
(unless you change it to recurse only to partitions, which doesn't
seem like a better definition).  We could go with
--[exclude-]table-and-children, or maybe
--[exclude-]table-and-child-tables, but those are getting into
carpal-tunnel-syndrome-inducing territory.  I lack a better
naming suggestion offhand.



Attached is version 3 of the patch; it includes the options 
suggested by Stephane and Tom:


    --table-and-children,

    --exclude-table-and-children

    --exclude-table-data-and-children.

The documentation has been updated too.


Thanks

--
Gilles Darold
diff --git a/doc/src/sgml/ref/pg_dump.sgml b/doc/src/sgml/ref/pg_dump.sgml
index 334e4b7fd1..e97b73194e 100644
--- a/doc/src/sgml/ref/pg_dump.sgml
+++ b/doc/src/sgml/ref/pg_dump.sgml
@@ -775,6 +775,16 @@ PostgreSQL documentation
   
  
 
+ 
+  --exclude-table-and-children=pattern
+  
+   
+Same as -T/--exclude-table option but automatically
+exclude partitions or child tables of the specified tables if any.
+   
+  
+ 
+
  
   --exclude-table-data=pattern
   
@@ -793,6 +803,17 @@ PostgreSQL documentation
   
  
 
+ 
+  --exclude-table-data-and-children=pattern
+  
+   
+Same as --exclude-table-data  but automatically
+exclude partitions or child tables of the specified tables if any.
+   
+  
+ 
+
+
  
   --extra-float-digits=ndigits
   
@@ -1158,6 +1179,16 @@ PostgreSQL documentation
   
  
 
+ 
+  --table-with-children=pattern
+  
+   
+Same as -t/--table option but automatically
+include partitions or child tables of the specified tables if any.
+   
+  
+ 
+
  
   --use-set-session-authorization
   
diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index 4217908f84..09d37991d6 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -130,6 +130,10 @@ static SimpleOidList foreign_servers_include_oids = {NULL, NULL};
 static SimpleStringList extension_include_patterns = {NULL, NULL};
 static SimpleOidList extension_include_oids = {NULL, NULL};
 
+static SimpleStringList table_include_patterns_and_children = {NULL, NULL};
+static SimpleStringList table_exclude_patterns_and_children = {NULL, NULL};
+static SimpleStringList tabledata_exclude_patterns_and_children = {NULL, NULL};
+
 static const CatalogId nilCatalogId = {0, 0};
 
 /* override for standard extra_float_digits setting */
@@ -180,7 +184,8 @@ static void expand_foreign_server_name_patterns(Archive *fout,
 static void expand_table_name_patterns(Archive *fout,
 	   SimpleStringList *patterns,
 	   SimpleOidList *oids,
-	   bool strict_names);
+	   bool strict_names,
+	   bool with_child_tables);
 static void prohibit_crossdb_refs(PGconn *conn, const char *dbname,
   const char *pattern);
 
@@ -421,6 +426,9 @@ main(int argc, char **argv)
 		{"on-conflict-do-nothing", no_argument, &dopt.do_nothing, 1},
 		{"rows-per-insert", required_argument, NULL, 10},
 		{"include-foreign-data", required_argument, NULL, 11},
+		{"table-and-children", required_argument, NULL, 12},
+		{"exclude-table-and-children", required_argument, NULL, 13},
+		{"exclude-table-data-and-children", required_argument, NULL, 14},
 
 		{NULL, 0, NULL, 0}
 	};
@@ -631,6 +639,19 @@ main(int argc, char **argv)
 		  optarg);
 break;
 
+			case 12:			/* include table(s) with children */
+				simple_string_list_append(&table_include_patterns_and_children, optarg);
+				dopt.include_everything = false;
+				break;
+
+			case 13:			/* exclude table(s) with children */
+				simple_string_list_append(&table_exclude_patterns_and_children, optarg);
+				break;
+
+			case 14:			/* exclude table(s) data */
+				simple_string_list_append(&tabledata_exclude_patterns_and_children, optarg);
+				break;
+
 			default:
 /* getopt_long already emitted a complaint */
 pg_log_error_hint("Try \"%s --help\" for more information.", progname);
@@ -814,17 +835,34 @@ main(int argc, char **argv)
 	{
 		expand_table_name_patterns(fout, &table_include_patterns,
 								   &table_include_oids,
-								   strict_names);
+								   strict_names, false);
 		if (table_include_oids.head == NULL)
 			pg_fatal("no matching tables were found");
 	}
 	expand_table_name_patterns(fout, &table_exclude_patterns,
 							   &table_exclude_oids,
-							   false);
+							   false, false);
 
 	expand_table_name_patterns(fout, &tabledata_exclude_patterns,
 							   &tabledata_exclude_oids,
-							   false);
+							   false, false);
+
+	/* Expand table and children selection patterns into OID lists */
+	if (table_include_patterns_and_children.head != NULL)
+	{
+		expand_table_name_patterns(fout, &table_include_patterns_an

Re: [Proposal] Allow pg_dump to include all child tables with the root table

2023-03-05 Thread Gilles Darold

On 04/03/2023 at 19:18, Tom Lane wrote:

Gilles Darold writes:

But I disagree with the use of --table-with-childs and
--exclude-table-with-childs because we already have the --table and
--exclude-table, and it will add a lot of code where we just need a switch
to include children tables.

I quite dislike the idea of a separate --with-whatever switch, because
it will (presumably) apply to all of your --table and --exclude-table
switches, where you may need it to apply to just some of them.
Spelling considerations aside, attaching the property to the
individual switches seems far superior.  And I neither believe that
this would add a lot of code, nor accept that as an excuse even if
it's true.



Right, this is not a lot of code, but still more code where I think we 
just need a switch. I would much prefer that it apply to all --table / 
--exclude-table patterns, because this is generally the behavior we want for 
all root/parent tables. But I agree that in some cases users could want this 
behavior to apply only to some selected tables, and the proposed new options 
answer that need, even if in similar cases several pg_dump commands can 
usually be used instead. This is just my opinion; I will adapt the patch to 
use the proposed new options.



But what do you think about having pg_dump dump child tables with 
--table / --exclude-table by default? I was very surprised that this 
was not the case the first time I saw it. In that case we could add 
--[exclude-]table-no-child-tables. I think this form would be used less often 
than the form where we need the child tables to be dumped with the parent 
table, meaning that most pg_dump commands using --table and 
--exclude-table could be kept untouched, and those using extra regexps to 
dump child tables could be simplified. I'm not sure that backward 
compatibility is an argument here against changing the default behavior of 
pg_dump.


--

Gilles



--
Gilles Darold





Re: [Proposal] Allow pg_dump to include all child tables with the root table

2023-02-25 Thread Gilles Darold

On 25/02/2023 at 16:40, Stéphane Tachoires wrote:

Hi,

I'm not sure about the "child" -> "partition" change, as it also 
selects children that are not partitions.
I'm more dubious about the --with-childs option; I'd rather have 
--table-with-childs= and 
--exclude-table-with-childs=. That will be clearer about what 
is what.


I'm working on that, but am having a hard time with the 
pg_dump/002_pg_dump test (it's brand new to me).


Stéphane

On Fri, 24 Feb 2023 at 23:50, Cary Huang wrote:

The following review has been posted through the commitfest
application:
make installcheck-world:  tested, passed
Implements feature:       tested, passed
Spec compliant:           not tested
Documentation:            not tested

Hi

the patch applies fine on the current master branch and it works as
described. However, I would suggest changing the new option name
from "--with-childs" to "--with-partitions", for several reasons.

"childs" is grammatically incorrect and in the PG community, the
term "partitioned table" is normally used to denote a parent
table, and the term "partition" is used to denote the child table
under the parent table. We should use these terms to stay
consistent with the community.

Also, I would rephrase the documentation as:

Used in conjunction with
-t/--table or
-T/--exclude-table options to
include or exclude partitions of the specified tables if any.

thank you

Cary Huang

HighGo Software Canada
www.highgo.ca



Hi,


That's right, this patch also works with inherited tables, so 
--with-partitions can be confusing; this is why --with-childs was chosen. 
But I disagree with using --table-with-childs and 
--exclude-table-with-childs, because we already have --table and 
--exclude-table, and it would add a lot of code where we just need a switch 
to include child tables. Actually, my first thought was that this 
behavior (dumping child tables when the root table is dumped using --table) 
should be the default in pg_dump, but the problem is that it could break 
existing scripts using pg_dump, so I preferred to implement the 
--with-childs option.



I think we can use --with-partitions, provided that the documentation 
makes clear that this option also works with inheritance.



Attached is a new patch v2 using --with-partitions, with the 
documentation fixed.



--
Gilles Darold
diff --git a/doc/src/sgml/ref/pg_dump.sgml b/doc/src/sgml/ref/pg_dump.sgml
index 49d218905f..15ada5c8ee 100644
--- a/doc/src/sgml/ref/pg_dump.sgml
+++ b/doc/src/sgml/ref/pg_dump.sgml
@@ -1173,6 +1173,18 @@ PostgreSQL documentation
   
  
 
+ 
+  --with-partitions
+  
+   
+	Used in conjunction with -t/--table
+	or -T/--exclude-table options to
+	include or exclude partitions of the specified tables if any.
+	This option is also valid for tables using inheritance.
+   
+  
+ 
+
  
-?
--help
diff --git a/src/bin/pg_dump/pg_backup.h b/src/bin/pg_dump/pg_backup.h
index aba780ef4b..2e7e919391 100644
--- a/src/bin/pg_dump/pg_backup.h
+++ b/src/bin/pg_dump/pg_backup.h
@@ -200,6 +200,7 @@ typedef struct _dumpOptions
 
 	int			sequence_data;	/* dump sequence data even in schema-only mode */
 	int			do_nothing;
+	bool			with_partitions;
 } DumpOptions;
 
 /*
diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index 24ba936332..2b7a122d7e 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -421,6 +421,7 @@ main(int argc, char **argv)
 		{"on-conflict-do-nothing", no_argument, &dopt.do_nothing, 1},
 		{"rows-per-insert", required_argument, NULL, 10},
 		{"include-foreign-data", required_argument, NULL, 11},
+		{"with-partitions", no_argument, NULL, 12},
 
 		{NULL, 0, NULL, 0}
 	};
@@ -631,6 +632,10 @@ main(int argc, char **argv)
 		  optarg);
 break;
 
+			case 12:			/* dump child table too */
+dopt.with_partitions = true;
+break;
+
 			default:
 /* getopt_long already emitted a complaint */
 pg_log_error_hint("Try \"%s --help\" for more information.", progname);
@@ -809,6 +814,14 @@ main(int argc, char **argv)
 false);
 	/* non-matching exclusion patterns aren't an error */
 
+	/*
+	 * The include child option require that there is
+	 * at least one table inclusion
+	 */
+	if (dopt.with_partitions && table_include_patterns.head == NULL
+			&& table_exclude_patterns.head == NULL)
+		pg_fatal("option --with-partitions requires option -t/--table or -T/--exclude-table");
+
 	/* Expand table selection patterns into OID lists */
 	if (table_include_patterns.head != NULL)
 	{
@@ -1087,6 +1100,9 @@ help(const char *progname)
 	printf(_("  --use-set-session-authorization\n"
 			 "   

Re: fix and document CLUSTER privileges

2023-01-14 Thread Gilles Darold

On 11/01/2023 at 18:54, Nathan Bossart wrote:

On Wed, Jan 11, 2023 at 02:22:26PM +0100, Gilles Darold wrote:

I'm moving this commitfest entry to Ready for Committers.

Thank you for reviewing.

I have changed the status to "Returned with feedback": as per commit 
ff9618e8, this patch may no longer need to be applied, if I have understood correctly.



Nathan, please confirm and fix the status of this commitfest entry.


Best regards,

--
Gilles Darold





[Proposal] Allow pg_dump to include all child tables with the root table

2023-01-11 Thread Gilles Darold

Hi all,


I would like to propose a new pg_dump option called --with-child to 
include in, or exclude from, a dump all child and partition tables when a 
parent table is specified using the -t/--table or -T/--exclude-table option. 
The whole tree is dumped with the root table.



To include all partitions or child tables with inheritance in a table 
dump, we usually use a wildcard, for example:



    pg_dump -d mydb -t "root_tbname*" > out.sql


This assumes that all child/partition tables use the prefix root_tbname 
in their object names. This is often the case but, if you are as lucky as 
me, the partitions can have totally different names; needless to say, 
for inheritance this is rarely the case. The other problem is that 
with the wildcard you can also dump relations that are not at all relevant 
to what you want to dump. Using the --with-child option allows you to 
specify just the root relation, and all child/partition definitions 
and/or data will be part of the dump.



    pg_dump -d mydb --table "root_tbname" --with-childs > out.sql


To exclude a whole inheritance tree from a dump:


    pg_dump -d mydb --exclude-table "root_tbname" --with-childs > out.sql
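Internally, collecting the whole tree boils down to a recursive walk over the pg_inherits catalog, which records both partitions and inheritance children. A rough sketch of the kind of query involved (with root_tbname as the placeholder parent name, as above; schema qualification omitted for brevity):

```sql
WITH RECURSIVE child_tree(relid) AS (
    SELECT c.oid
    FROM pg_class c
    WHERE c.relname = 'root_tbname'
  UNION ALL
    SELECT i.inhrelid
    FROM pg_inherits i
    JOIN child_tree ct ON i.inhparent = ct.relid
)
SELECT relid::regclass FROM child_tree;
```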


Attached is the patch that adds this feature to pg_dump.


Is there any interest in this feature?


Best regards,

--
Gilles Darold
https://www.migops.com/
diff --git a/doc/src/sgml/ref/pg_dump.sgml b/doc/src/sgml/ref/pg_dump.sgml
index 2c938cd7e1..f9635442f9 100644
--- a/doc/src/sgml/ref/pg_dump.sgml
+++ b/doc/src/sgml/ref/pg_dump.sgml
@@ -1172,6 +1172,17 @@ PostgreSQL documentation
   
  
 
+ 
+  --with-childs
+  
+   
+Include or exclude from a dump all child and partition tables when a parent
+table is specified using option -t/--table
+or -T/--exclude-table.
+   
+  
+ 
+
  
-?
--help
diff --git a/src/bin/pg_dump/pg_backup.h b/src/bin/pg_dump/pg_backup.h
index aba780ef4b..09284c82be 100644
--- a/src/bin/pg_dump/pg_backup.h
+++ b/src/bin/pg_dump/pg_backup.h
@@ -200,6 +200,7 @@ typedef struct _dumpOptions
 
 	int			sequence_data;	/* dump sequence data even in schema-only mode */
 	int			do_nothing;
+	bool			with_childs;
 } DumpOptions;
 
 /*
diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index 5e800dc79a..83c092080e 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -421,6 +421,7 @@ main(int argc, char **argv)
 		{"on-conflict-do-nothing", no_argument, &dopt.do_nothing, 1},
 		{"rows-per-insert", required_argument, NULL, 10},
 		{"include-foreign-data", required_argument, NULL, 11},
+		{"with-childs", no_argument, NULL, 12},
 
 		{NULL, 0, NULL, 0}
 	};
@@ -631,6 +632,10 @@ main(int argc, char **argv)
 		  optarg);
 break;
 
+			case 12:			/* dump child table too */
+dopt.with_childs = true;
+break;
+
 			default:
 /* getopt_long already emitted a complaint */
 pg_log_error_hint("Try \"%s --help\" for more information.", progname);
@@ -810,6 +815,14 @@ main(int argc, char **argv)
 false);
 	/* non-matching exclusion patterns aren't an error */
 
+	/*
+	 * The include child option require that there is
+	 * at least one table inclusion
+	 */
+	if (dopt.with_childs && table_include_patterns.head == NULL
+			&& table_exclude_patterns.head == NULL)
+		pg_fatal("option --with-childs requires option -t/--table or -T/--exclude-table");
+
 	/* Expand table selection patterns into OID lists */
 	if (table_include_patterns.head != NULL)
 	{
@@ -1088,6 +1101,9 @@ help(const char *progname)
 	printf(_("  --use-set-session-authorization\n"
 			 "   use SET SESSION AUTHORIZATION commands instead of\n"
 			 "   ALTER OWNER commands to set ownership\n"));
+	printf(_("  --with-childs            include or exclude from a dump all child and partition\n"
+			 "   tables when a parent table is specified using\n"
+			 "   -t/--table or -T/--exclude-table\n"));
 
 	printf(_("\nConnection options:\n"));
 	printf(_("  -d, --dbname=DBNAME  database to dump\n"));
@@ -1520,6 +1536,15 @@ expand_table_name_patterns(Archive *fout,
 		PQExpBufferData dbbuf;
 		int			dotcnt;
 
+		/*
+		 * With --include_child we look recursively to the inheritance
+		 * tree to find the childs tables of the matching include filter
+		 */
+		if (fout->dopt->with_childs)
+		{
+			appendPQExpBuffer(query, "WITH RECURSIVE child_tree (relid) AS (\n");
+		}
+
 		/*
 		 * Query must remain ABSOLUTELY devoid of unqualified names.  This
 		 * would be unnecessary given a pg_table_is_visible() variant taking a
@@ -1547,6 +1572,20 @@ expand_table_name_patterns(Archive *fout,
 			prohibit_crossdb_refs(Get

Re: fix and document CLUSTER privileges

2023-01-11 Thread Gilles Darold

On 06/01/2023 at 01:26, Nathan Bossart wrote:

Apparently I forgot to run all the tests before posting v4.  Here is a new
version of the patch that should pass all tests.


Review status:


The patch applies and compiles without issues; make check and 
make installcheck tests run without error.


It aims to limit the permission checks for running the CLUSTER command on a 
partition to ownership and the MAINTAIN privilege, which it actually does.


In commit 3f19e17, which made CLUSTER ignore partitions not owned by the 
caller, there was still a useless check of database ownership or shared 
relations in get_tables_to_cluster_partitioned().



The documentation has been updated to explain the conditions for 
successful execution of the CLUSTER command.



Additionally, this patch adds a warning when a partition is skipped 
due to lack of permission, just as VACUUM does:


    WARNING:  permission denied to vacuum "ptnowner2", skipping it

With CLUSTER we now have the same message:

    WARNING:  permission denied to cluster "ptnowner2", skipping it

Previous behavior was to skip the partition silently.


Tests of the CLUSTER command have been modified to suppress warning 
messages, except partially in src/test/regress/sql/cluster.sql, to validate 
the presence of the warning.



I'm moving this commitfest entry to Ready for Committer.


Regards,

--
Gilles Darold





Re: fix and document CLUSTER privileges

2023-01-05 Thread Gilles Darold

On 05/01/2023 at 06:12, Nathan Bossart wrote:

On Wed, Jan 04, 2023 at 11:27:05PM +0100, Gilles Darold wrote:

Got it, this is patch add_cluster_skip_messages.patch. IMHO this patch
should be part of this commitfest, as it is directly based on this one. You
could create a second patch here that adds the warning messages so that
committers can decide whether they should be applied.

That's fine with me.  I added the warning messages in v4.



This is a bit confusing: this commitfest entry's patch is also included in 
another commitfest entry [1], in patch 
v3-0001-fix-maintain-privs.patch, with some additional conditions.



Committers should be aware that this commitfest entry must be withdrawn 
if [1] is committed first.  There is no status or dependency field that 
I can use; I could move this one to "Ready for Committer" status, but 
that would not be accurate until [1] has been committed or withdrawn.



[1] https://commitfest.postgresql.org/41/4070/


--
Gilles Darold





Re: fix and document CLUSTER privileges

2023-01-04 Thread Gilles Darold

Le 04/01/2023 à 19:18, Nathan Bossart a écrit :

On Wed, Jan 04, 2023 at 02:25:13PM +0100, Gilles Darold wrote:

This is the current behavior of the CLUSTER command, and the current patch adds
a sentence about the silent behavior to the documentation. This is good, but I
just want to ask whether we want to fix this behavior too or keep things as
they are, with the lack of noise.

I've proposed something like what you are describing in another thread [0].
I intended to simply fix and document the current behavior in this thread
and to take up any new changes in the other one.

[0] https://commitfest.postgresql.org/41/4070/



Got it, this is patch add_cluster_skip_messages.patch. IMHO this patch 
should be part of this commitfest, as it is directly based on this one. 
You could create a second patch here that adds the warning messages so 
that committers can decide whether they should be applied.



--
Gilles Darold





Re: fix and document CLUSTER privileges

2023-01-04 Thread Gilles Darold

Le 16/12/2022 à 05:57, Nathan Bossart a écrit :

Here is a new version of the patch.  I've moved the privilege checks to a
new function, and I added a note in the docs about clustering partitioned
tables in a transaction block (it's not allowed).



Getting into the review of this patch, I wonder why the CLUSTER command 
does not react like the VACUUM FULL command when there are insufficient 
privileges. For example, take a partitioned table (ptnowner) and two 
partitions (ptnowner1 and ptnowner2), with the second partition owned by 
another user, let's say usr2. We get the following report when executing 
vacuum as usr2:


testdb=> VACUUM FULL ptnowner;
WARNING:  permission denied to vacuum "ptnowner", skipping it
WARNING:  permission denied to vacuum "ptnowner1", skipping it
VACUUM

Here only ptnowner2 has been vacuumed, which is correct and expected.

For the cluster command:

testdb=> CLUSTER;
CLUSTER


I would have expected something like:

testdb=> CLUSTER;
WARNING:  permission denied to cluster "ptnowner1", skipping it
CLUSTER

I mean that the silent behavior is not very helpful.


This is the current behavior of the CLUSTER command, and the current patch 
adds a sentence about the silent behavior to the documentation. This is 
good, but I just want to ask whether we want to fix this behavior too or 
keep things as they are, with the lack of noise.



Best regards,

--
Gilles Darold


Re: [PATCH] pg_dump: lock tables in batches

2023-01-03 Thread Gilles Darold

Le 08/12/2022 à 01:03, Tom Lane a écrit :

Andres Freund  writes:

On 2022-12-07 17:53:05 -0500, Tom Lane wrote:

Is "-s" mode actually a relevant criterion here?  With per-table COPY
commands added into the mix you could not possibly get better than 2x
improvement, and likely a good deal less.

Well, -s isn't something used all that rarely, so it'd not be insane to
optimize it in isolation. But more importantly, I think the potential win
without -s is far bigger than 2x, because the COPYs can be done in parallel,
whereas the locking happens in the non-parallel stage.

True, and there's the reduce-the-lock-window issue that Jacob mentioned.


With just a 5ms delay, very well within normal network latency range, I get:
[ a nice win ]

OK.  I'm struggling to figure out why I rejected this idea last year.
I know that I thought about it and I'm fairly certain I actually
tested it.  Maybe I only tried it with near-zero-latency local
loopback; but I doubt that, because the potential for network
latency was certainly a factor in that whole discussion.

One idea is that I might've tried it before getting rid of all the
other per-object queries, at which point it wouldn't have stood out
quite so much.  But I'm just guessing.  I have a nagging feeling
there was something else.

Oh well, I guess we can always revert it if we discover a problem later.

regards, tom lane


Hi,


I have reviewed this patch; it applies cleanly on current master 
and compiles without problems.


make check, installcheck and check-world run without failure; the pg_dump 
TAP tests work fine too.


I have also tried the benchmark mentioned in the previous posts, and I 
see the same performance gain with the -s option.



As there seems to be consensus to apply this patch, I will change the 
commitfest status to Ready for Committer.



Regards,

--
Gilles Darold





Re: [BUG] pg_dump blocked

2022-11-18 Thread Gilles Darold

Le 17/11/2022 à 17:59, Tom Lane a écrit :

Gilles Darold  writes:

I see incorrect behavior with pg_dump prior to PG version 15. With
PostgreSQL 15, thanks to commit e3fcbbd623b9ccc16cdbda374654d91a4727d173,
the problem is gone, but for older versions it persists with locks on
partitioned tables.

I didn't want to back-patch e3fcbbd62 at the time, but it's probably aged
long enough now to be safe to back-patch.  If we do anything here,
it should be to back-patch the whole thing, else we've only partially
fixed the issue.



Here are the patches for PostgreSQL versions 11 to 14; they should apply 
on the corresponding stable branches. The patches only cover moving the 
unsafe function calls, pg_get_partkeydef() and pg_get_expr(). They should 
all apply without problems on their respective branches, and the pg_dump 
TAP regression tests passed on all versions.


Regards,

--
Gilles Darold
diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index e110257395..cbe7c1e01d 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -6322,14 +6322,15 @@ getTables(Archive *fout, int *numTables)
 	int			i_foreignserver;
 	int			i_is_identity_sequence;
 	int			i_changed_acl;
-	int			i_partkeydef;
 	int			i_ispartition;
-	int			i_partbound;
 	int			i_amname;
 
 	/*
 	 * Find all the tables and table-like objects.
 	 *
+	 * We must fetch all tables in this phase because otherwise we cannot
+	 * correctly identify inherited columns, owned sequences, etc.
+	 *
 	 * We include system catalogs, so that we can work if a user table is
 	 * defined to inherit from a system catalog (pretty weird, but...)
 	 *
@@ -6343,8 +6344,10 @@ getTables(Archive *fout, int *numTables)
 	 *
 	 * Note: in this phase we should collect only a minimal amount of
 	 * information about each table, basically just enough to decide if it is
-	 * interesting. We must fetch all tables in this phase because otherwise
-	 * we cannot correctly identify inherited columns, owned sequences, etc.
+	 * interesting.  In particular, since we do not yet have lock on any user
+	 * table, we MUST NOT invoke any server-side data collection functions
+	 * (for instance, pg_get_partkeydef()).  Those are likely to fail or give
+	 * wrong answers if any concurrent DDL is happening.
 	 *
 	 * We purposefully ignore toast OIDs for partitioned tables; the reason is
 	 * that versions 10 and 11 have them, but 12 does not, so emitting them
@@ -6353,9 +6356,7 @@ getTables(Archive *fout, int *numTables)
 
 	if (fout->remoteVersion >= 90600)
 	{
-		char	   *partkeydef = "NULL";
 		char	   *ispartition = "false";
-		char	   *partbound = "NULL";
 		char	   *relhasoids = "c.relhasoids";
 
 		PQExpBuffer acl_subquery = createPQExpBuffer();
@@ -6374,11 +6375,7 @@ getTables(Archive *fout, int *numTables)
 		 */
 
 		if (fout->remoteVersion >= 10)
-		{
-			partkeydef = "pg_get_partkeydef(c.oid)";
 			ispartition = "c.relispartition";
-			partbound = "pg_get_expr(c.relpartbound, c.oid)";
-		}
 
 		/* In PG12 upwards WITH OIDS does not exist anymore. */
 		if (fout->remoteVersion >= 12)
@@ -6419,7 +6416,7 @@ getTables(Archive *fout, int *numTables)
 		  "CASE WHEN c.relkind = 'f' THEN "
 		  "(SELECT ftserver FROM pg_catalog.pg_foreign_table WHERE ftrelid = c.oid) "
 		  "ELSE 0 END AS foreignserver, "
-		  "CASE WHEN c.reloftype <> 0 THEN c.reloftype::pg_catalog.regtype ELSE NULL END AS reloftype, "
+		  "c.reloftype, "
 		  "d.refobjid AS owning_tab, "
 		  "d.refobjsubid AS owning_col, "
 		  "(SELECT spcname FROM pg_tablespace t WHERE t.oid = c.reltablespace) AS reltablespace, "
@@ -6439,9 +6436,7 @@ getTables(Archive *fout, int *numTables)
 		  "OR %s IS NOT NULL"
 		  "))"
 		  "AS changed_acl, "
-		  "%s AS partkeydef, "
-		  "%s AS ispartition, "
-		  "%s AS partbound "
+		  "%s AS ispartition "
 		  "FROM pg_class c "
 		  "LEFT JOIN pg_depend d ON "
 		  "(c.relkind = '%c' AND "
@@ -6467,9 +6462,7 @@ getTables(Archive *fout, int *numTables)
 		  attracl_subquery->data,
 		  attinitacl_subquery->data,
 		  attinitracl_subquery->data,
-		  partkeydef,
 		  ispartition,
-		  partbound,
 		  RELKIND_SEQUENCE,
 		  RELKIND_PARTITIONED_TABLE,
 		  RELKIND_RELATION, RELKIND_SEQUENCE,
@@ -6512,7 +6505,7 @@ getTables(Archive *fout, int *numTables)
 		  "CASE WHEN c.relkind = 'f' THEN "
 		  "(SELECT ftserver FROM pg_catalog.pg_foreign_table WHERE ftrelid = c.oid) "
 		  "ELSE 0 END AS foreignserver, "
-		  "CASE WHEN c.reloftype <> 0 THEN c.reloftype::pg_catalog.regtype ELSE NULL END A

Re: [BUG] pg_dump blocked

2022-11-17 Thread Gilles Darold

Le 17/11/2022 à 17:59, Tom Lane a écrit :

Gilles Darold  writes:

I see incorrect behavior with pg_dump prior to PG version 15. With
PostgreSQL 15, thanks to commit e3fcbbd623b9ccc16cdbda374654d91a4727d173,
the problem is gone, but for older versions it persists with locks on
partitioned tables.

I didn't want to back-patch e3fcbbd62 at the time, but it's probably aged
long enough now to be safe to back-patch.  If we do anything here,
it should be to back-patch the whole thing, else we've only partially
fixed the issue.



I can handle this work.

--
Gilles Darold





[BUG] pg_dump blocked

2022-11-17 Thread Gilles Darold

Hi,


I see incorrect behavior with pg_dump prior to PG version 15. With 
PostgreSQL 15, thanks to commit e3fcbbd623b9ccc16cdbda374654d91a4727d173, 
the problem is gone, but for older versions it persists with locks on 
partitioned tables.



When we try to dump a database where a table is locked, pg_dump waits 
until the lock is released; this is expected. Now, if the table is 
excluded from the dump using the -T option, pg_dump should obviously not 
be affected by the lock. Unfortunately this is not the case when the table 
is partitioned, because of the calls to pg_get_partkeydef() and pg_get_expr() 
in the query generated in getTables().  Here is a use case to reproduce it.



In a psql session execute:

   BEGIN;

   LOCK TABLE measurement;

then run a pg_dump command excluding the measurement partitions:

    pg_dump -d test -T "measurement*" > /dev/null

it will not end until the lock on the partitioned table is released.

I think the problem is the same if we use a schema exclusion where the 
partitioned table is locked.



Is it possible to consider a backported fix? If yes, is adding the 
table/schema filters to the query generated in getTables() enough, or 
are you thinking of a kind of backport of commit 
e3fcbbd623b9ccc16cdbda374654d91a4727d173?



Best regards,

--
Gilles Darold


Re: [Proposal] vacuumdb --schema only

2022-04-25 Thread Gilles Darold
Le 25/04/2022 à 03:27, Nathan Bossart a écrit :
> On Fri, Apr 22, 2022 at 10:57:46PM -0700, Nathan Bossart wrote:
>> On Fri, Apr 22, 2022 at 11:57:05AM +0200, Gilles Darold wrote:
>>> Patch v10 attached.
>> Thanks!  I've attached a v11 with some minor editorialization.  I think I
>> was able to improve the error handling for invalid combinations of
>> command-line options a bit, but please let me know what you think.
> I've attached a v12 of the patch that further simplifies check_objfilter().
> Apologies for the noise.
>

Looks good to me. There is a failure on cfbot on FreeBSD, but I have run
a CI with latest master and it passes.


Can I change the commitfest status to Ready for Committer?


-- 
Gilles Darold





Re: [Proposal] vacuumdb --schema only

2022-04-22 Thread Gilles Darold

Le 20/04/2022 à 19:38, Nathan Bossart a écrit :

Thanks for the new patch!  I think this is on the right track.

On Wed, Apr 20, 2022 at 05:15:02PM +0200, Gilles Darold wrote:

Le 18/04/2022 à 23:56, Nathan Bossart a écrit :

-   if (!tables_listed)
+   if (!objects_listed || objfilter == OBJFILTER_SCHEMA)

Do we need to check for objects_listed here?  IIUC we can just check for
objfilter != OBJFILTER_TABLE.

Yes, we need it; otherwise the 'vacuumdb with view' test fails because we are
not trying to vacuum the view, so PG doesn't report:

     WARNING:  cannot vacuum non-tables or special system tables

My point is that the only time we don't want to filter for relevant
relation types is when the user provides a list of tables.  So my
suggestion would be to simplify this to the following:

if (objfilter != OBJFILTER_TABLE)
{
appendPQExpBufferStr(...);
has_where = true;
}



Right, I must have gotten mixed up in the test results. Fixed.



Unless I'm missing something, schema_is_exclude appears to only be used for
error checking and doesn't impact the generated catalog query.  It looks
like the relevant logic disappeared after v4 of the patch.

Right, removed.

I don't think -N works at the moment.  I tested it out, and vacuumdb was
still processing tables in schemas I excluded.  Can we add a test case for
this, too?



Fixed, and regression tests added, as well as some others to test the 
compatibility of the filter options.




+/*
+ * Verify that the filters used at command line are compatible
+ */
+void
+check_objfilter(VacObjectFilter curr_objfilter, VacObjectFilter curr_option)
+{
+   switch (curr_option)
+   {
+   case OBJFILTER_NONE:
+   break;
+   case OBJFILTER_DATABASE:
+   /* When filtering on database name, vacuum on all 
database is not allowed. */
+   if (curr_objfilter == OBJFILTER_ALL)
+   pg_fatal("cannot vacuum all databases and a specific 
one at the same time");
+   break;
[...]
+   }
+}
I don't think this handles all combinations.  For example, the following
command does not fail:

vacuumdb -a -N test postgres



Right, I have fixed them all in this new patch.



Furthermore, do you think it'd be possible to dynamically generate the
message?  If it doesn't add too much complexity, this might be a nice way
to simplify this function.



I have tried to avoid repeating the same error message several times by 
using a new enum and a function filter_error(). I also use the same 
messages for the --schema and --exclude-schema related errors.



Patch v10 attached.


--
Gilles Darold
diff --git a/doc/src/sgml/ref/vacuumdb.sgml b/doc/src/sgml/ref/vacuumdb.sgml
index 956c0f01cb..0de001ef24 100644
--- a/doc/src/sgml/ref/vacuumdb.sgml
+++ b/doc/src/sgml/ref/vacuumdb.sgml
@@ -39,6 +39,40 @@ PostgreSQL documentation
dbname
   
 
+  
+   vacuumdb
+   connection-option
+   option
+
+   
+
+ 
+   
+
+ 
+  -n
+  --schema
+ 
+ schema
+
+   
+
+   
+
+ 
+  -N
+  --exclude-schema
+ 
+ schema
+
+   
+ 
+
+   
+
+   dbname
+  
+
   
vacuumdb
connection-option
@@ -244,6 +278,28 @@ PostgreSQL documentation
   
  
 
+ 
+  -n schema
+  --schema=schema
+  
+   
+Clean or analyze all tables in schema only.
+Multiple schemas can be vacuumed by writing multiple -n switches.
+   
+  
+ 
+
+ 
+  -N schema
+  --exclude-schema=schema
+  
+   
+Clean or analyze all tables NOT in schema.
+Multiple schemas can be excluded from the vacuum by writing multiple -N switches.
+   
+  
+ 
+
  
   --no-index-cleanup
   
@@ -619,6 +675,14 @@ PostgreSQL documentation
 $ vacuumdb --analyze --verbose --table='foo(bar)' xyzzy
 
 
+   
+To clean all tables in the foo and bar schemas
+only in a database named xyzzy:
+
+$ vacuumdb --schema='foo' --schema='bar' xyzzy
+
+
+
  
 
  
diff --git a/src/bin/scripts/t/100_vacuumdb.pl b/src/bin/scripts/t/100_vacuumdb.pl
index 96a818a3c1..9ef5c789e0 100644
--- a/src/bin/scripts/t/100_vacuumdb.pl
+++ b/src/bin/scripts/t/100_vacuumdb.pl
@@ -103,6 +103,8 @@ $node->safe_psql(
   CREATE TABLE funcidx (x int);
   INSERT INTO funcidx VALUES (0),(1),(2),(3);
   CREATE INDEX i0 ON funcidx ((f1(x)));
+  CREATE SCHEMA "Foo";
+  CREATE TABLE "Foo".bar(id int);
 |);
 $node->command_ok([qw|vacuumdb -Z --table="need""q(uot"(")x") postgres|],
 	'column list');
@@ -146,5 +148,38 @@ $node->issues_sql_like(
 	[ 'vacuumdb', '--min-xid-age', '2147483001', 'postgres' ],
 	qr/GREATEST.*relfrozenxid.*2147483001/,
 	'vacuumdb --table --min-xid-age');
+$node->issues_sql_like(

Re: [Proposal] vacuumdb --schema only

2022-04-20 Thread Gilles Darold

Le 18/04/2022 à 23:56, Nathan Bossart a écrit :

On Thu, Apr 14, 2022 at 10:27:46PM +0200, Gilles Darold wrote:

Attached is v8 of the patch; it tries to address the remarks above, fixes
the patch apply failure on master, and replaces calls to pg_log_error+exit
with pg_fatal.

Thanks for the new patch.


+enum trivalue schema_is_exclude = TRI_DEFAULT;
+
+/*
+ * The kind of object use in the command line filter.
+ *   OBJFILTER_NONE: no filter used
+ *   OBJFILTER_SCHEMA: -n | --schema or -N | --exclude-schema
+ *   OBJFILTER_TABLE: -t | --table
+ */
+enum VacObjectFilter
+{
+   OBJFILTER_NONE,
+   OBJFILTER_TABLE,
+   OBJFILTER_SCHEMA
+};
+
+enum VacObjectFilter objfilter = OBJFILTER_NONE;

I still think we ought to remove schema_is_exclude in favor of adding
OBJFILTER_SCHEMA_EXCLUDE to the enum.  I think that would simplify some of
the error handling and improve readability.  IMO we should add
OBJFILTER_ALL, too.


Fixed.



-   simple_string_list_append(, 
optarg);
+   /* When filtering on schema name, 
filter by table is not allowed. */
+   if (schema_is_exclude != TRI_DEFAULT)
+   pg_fatal("cannot vacuum all tables 
in schema(s) and specific table(s) at the same time");
+   simple_string_list_append(, 
optarg);
+   objfilter = OBJFILTER_TABLE;
tbl_count++;
break;
}
@@ -202,6 +224,32 @@ main(int argc, char *argv[])
  
_workers))
exit(1);
break;
+   case 'n':   /* include schema(s) */
+   {
+   /* When filtering on schema name, 
filter by table is not allowed. */
+   if (tbl_count)
+   pg_fatal("cannot vacuum all tables 
in schema(s) and specific table(s) at the same time");
+
+   if (schema_is_exclude == TRI_YES)
+   pg_fatal("cannot vacuum all tables 
in schema(s) and exclude specific schema(s) at the same time");
+   simple_string_list_append(, 
optarg);
+   objfilter = OBJFILTER_SCHEMA;
+   schema_is_exclude = TRI_NO;
+   break;
+   }
+   case 'N':   /* exclude schema(s) */
+   {
+   /* When filtering on schema name, 
filter by table is not allowed. */
+   if (tbl_count)
+   pg_fatal("cannot vacuum all tables 
in schema(s) and specific table(s) at the same time");
+   if (schema_is_exclude == TRI_NO)
+   pg_fatal("cannot vacuum all tables 
in schema(s) and exclude specific schema(s) at the same time");
+
+   simple_string_list_append(, 
optarg);
+   objfilter = OBJFILTER_SCHEMA;
+   schema_is_exclude = TRI_YES;
+   break;

I was expecting these to check objfilter.


Fixed.



Another possible improvement could be to move the pg_fatal() calls to a
helper function that generates the message based on the current objfilter
setting and the current option.  I'm envisioning something like
check_objfilter(VacObjFilter curr_objfilter, VacObjFilter curr_option).


I agree, done.



+   /*
+* When filtering on schema name, filter by table is not allowed.
+* The schema name can already be set to a fqdn table name.
+*/
+   if (tbl_count && objfilter == OBJFILTER_SCHEMA && objects.head != NULL)
+   pg_fatal("cannot vacuum all tables in schema(s) and specific 
table(s) at the same time");

Isn't this redundant with the error in the option handling?


Fixed.



-   if (tables.head != NULL)
+   if (objfilter == OBJFILTER_SCHEMA && objects.head != NULL)
+   {
+   if (schema_is_exclude == TRI_YES)
+   pg_fatal("cannot exclude from vacuum specific 
schema(s) in all databases");
+   else if (schema_is_exclude == TRI_NO)
+   pg_fatal("cann

Re: [Proposal] vacuumdb --schema only

2022-04-14 Thread Gilles Darold
Le 11/04/2022 à 20:37, Nathan Bossart a écrit :
> On Fri, Apr 08, 2022 at 05:16:06PM +0200, Gilles Darold wrote:
>> Attached v7 of the patch that should pass cfbot.
> Thanks for the new patch!  Unfortunately, it looks like some recent changes
> have broken it again.
>
>> +enum trivalue schema_is_exclude = TRI_DEFAULT;
>> +
>> +/*
>> + * The kind of object filter to use. '0': none, 'n': schema, 't': table
>> + * these values correspond to the -n | -N and -t command line options.
>> + */
>> +char objfilter = '0';
> I think these should be combined into a single enum for simplicity and
> readability (e.g., OBJFILTER_NONE, OBJFILTER_INCLUDE_SCHEMA,
> OBJFILTER_EXCLUDE_SCHEMA, OBJFILTER_TABLE).
>
>>   * Instead, let the server decide whether a given relation can be
>>   * processed in which case the user will know about it.
>>   */
>> -if (!tables_listed)
>> +if (!objects_listed || objfilter == 'n')
>>  {
>>  appendPQExpBufferStr(_query, " WHERE c.relkind 
>> OPERATOR(pg_catalog.=) ANY (array["
>>   
>> CppAsString2(RELKIND_RELATION) ", "
> I think this deserves a comment.
>

Attached is v8 of the patch; it tries to address the remarks above, fixes
the patch apply failure on master, and replaces calls to pg_log_error+exit
with pg_fatal.


Thanks.

-- 
Gilles Darold
diff --git a/doc/src/sgml/ref/vacuumdb.sgml b/doc/src/sgml/ref/vacuumdb.sgml
index 956c0f01cb..0de001ef24 100644
--- a/doc/src/sgml/ref/vacuumdb.sgml
+++ b/doc/src/sgml/ref/vacuumdb.sgml
@@ -39,6 +39,40 @@ PostgreSQL documentation
dbname
   
 
+  
+   vacuumdb
+   connection-option
+   option
+
+   
+
+ 
+   
+
+ 
+  -n
+  --schema
+ 
+ schema
+
+   
+
+   
+
+ 
+  -N
+  --exclude-schema
+ 
+ schema
+
+   
+ 
+
+   
+
+   dbname
+  
+
   
vacuumdb
connection-option
@@ -244,6 +278,28 @@ PostgreSQL documentation
   
  
 
+ 
+  -n schema
+  --schema=schema
+  
+   
+Clean or analyze all tables in schema only.
+Multiple schemas can be vacuumed by writing multiple -n switches.
+   
+  
+ 
+
+ 
+  -N schema
+  --exclude-schema=schema
+  
+   
+Clean or analyze all tables NOT in schema.
+Multiple schemas can be excluded from the vacuum by writing multiple -N switches.
+   
+  
+ 
+
  
   --no-index-cleanup
   
@@ -619,6 +675,14 @@ PostgreSQL documentation
 $ vacuumdb --analyze --verbose --table='foo(bar)' xyzzy
 
 
+   
+To clean all tables in the foo and bar schemas
+only in a database named xyzzy:
+
+$ vacuumdb --schema='foo' --schema='bar' xyzzy
+
+
+
  
 
  
diff --git a/src/bin/scripts/t/100_vacuumdb.pl b/src/bin/scripts/t/100_vacuumdb.pl
index 96a818a3c1..7bbfb97246 100644
--- a/src/bin/scripts/t/100_vacuumdb.pl
+++ b/src/bin/scripts/t/100_vacuumdb.pl
@@ -103,6 +103,8 @@ $node->safe_psql(
   CREATE TABLE funcidx (x int);
   INSERT INTO funcidx VALUES (0),(1),(2),(3);
   CREATE INDEX i0 ON funcidx ((f1(x)));
+  CREATE SCHEMA "Foo";
+  CREATE TABLE "Foo".bar(id int);
 |);
 $node->command_ok([qw|vacuumdb -Z --table="need""q(uot"(")x") postgres|],
 	'column list');
@@ -146,5 +148,15 @@ $node->issues_sql_like(
 	[ 'vacuumdb', '--min-xid-age', '2147483001', 'postgres' ],
 	qr/GREATEST.*relfrozenxid.*2147483001/,
 	'vacuumdb --table --min-xid-age');
+$node->issues_sql_like(
+	[ 'vacuumdb', '--schema', '"Foo"', 'postgres' ],
+	qr/VACUUM "Foo".bar/,
+	'vacuumdb --schema schema only');
+$node->command_fails(
+	[ 'vacuumdb',   '-n', 'pg_catalog', '-t', 'pg_class', 'postgres' ],
+	'cannot vacuum all tables in schema(s) and specific table(s) at the same time');
+$node->command_fails(
+	[ 'vacuumdb',   '-n', 'pg_catalog', '-N',  '"Foo"', 'postgres' ],
+	'cannot use option -n | --schema and -N | --exclude-schema at the same time');
 
 done_testing();
diff --git a/src/bin/scripts/t/101_vacuumdb_all.pl b/src/bin/scripts/t/101_vacuumdb_all.pl
index 1dcf411767..b122c995b1 100644
--- a/src/bin/scripts/t/101_vacuumdb_all.pl
+++ b/src/bin/scripts/t/101_vacuumdb_all.pl
@@ -15,5 +15,8 @@ $node->issues_sql_like(
 	[ 'vacuumdb', '-a' ],
 	qr/statement: VACUUM.*statement: VACUUM/s,
 	'vacuum all databases');
+$node->command_fails(
+	[ 'vacuumdb', '-a',  '-n', 'pg_catalog' ],
+	'cannot vacuum specific schema(s) in all databases');
 
 done_testing();
diff --git a/src/bin/scripts/vacuumdb.c b/src/bin/scripts/vacuumdb.c
index 92f1ffe147..ce353593ce 100644
--- a/src/bin/scripts/vacuumdb.c
+++ b/src/bin/scripts/vacuumdb.c
@@ -46,11 +46,28 @@ typedef struct vacuuming

Re: [Proposal] vacuumdb --schema only

2022-04-08 Thread Gilles Darold

Le 08/04/2022 à 02:46, Justin Pryzby a écrit :

On Wed, Apr 06, 2022 at 07:43:42PM +0200, Gilles Darold wrote:

Thanks for the review, all these changes are available in new version v6
of the patch and attached here.

This is failing in CI (except on macos, which is strangely passing).
http://cfbot.cputube.org/gilles-darold.html

https://api.cirrus-ci.com/v1/artifact/task/5379693443547136/log/src/bin/scripts/tmp_check/log/regress_log_100_vacuumdb

not ok 59 - vacuumdb --schema "Foo" postgres exit code 0

#   Failed test 'vacuumdb --schema "Foo" postgres exit code 0'
#   at t/100_vacuumdb.pl line 151.
not ok 60 - vacuumdb --schema schema only: SQL found in server log

#   Failed test 'vacuumdb --schema schema only: SQL found in server log'
#   at t/100_vacuumdb.pl line 151.
#   '2022-04-06 18:15:36.313 UTC [34857][not initialized] 
[[unknown]][:0] LOG:  connection received: host=[local]
# 2022-04-06 18:15:36.314 UTC [34857][client backend] [[unknown]][3/2801:0] 
LOG:  connection authorized: user=postgres database=postgres 
application_name=100_vacuumdb.pl
# 2022-04-06 18:15:36.318 UTC [34857][client backend] 
[100_vacuumdb.pl][3/2802:0] LOG:  statement: SELECT 
pg_catalog.set_config('search_path', '', false);
# 2022-04-06 18:15:36.586 UTC [34857][client backend] [100_vacuumdb.pl][:0] 
LOG:  disconnection: session time: 0:00:00.273 user=postgres database=postgres 
host=[local]
# '
# doesn't match '(?^:VACUUM "Foo".bar)'



OK, got it, with the help of rjuju. Actually it compiled fine using gcc, 
but clang gives some warnings. Fixing these warnings makes CI happy.



Attached v7 of the patch that should pass cfbot.

--
Gilles Darold
diff --git a/doc/src/sgml/ref/vacuumdb.sgml b/doc/src/sgml/ref/vacuumdb.sgml
index 956c0f01cb..0de001ef24 100644
--- a/doc/src/sgml/ref/vacuumdb.sgml
+++ b/doc/src/sgml/ref/vacuumdb.sgml
@@ -39,6 +39,40 @@ PostgreSQL documentation
dbname
   
 
+  
+   vacuumdb
+   connection-option
+   option
+
+   
+
+ 
+   
+
+ 
+  -n
+  --schema
+ 
+ schema
+
+   
+
+   
+
+ 
+  -N
+  --exclude-schema
+ 
+ schema
+
+   
+ 
+
+   
+
+   dbname
+  
+
   
vacuumdb
connection-option
@@ -244,6 +278,28 @@ PostgreSQL documentation
   
  
 
+ 
+  -n schema
+  --schema=schema
+  
+   
+Clean or analyze all tables in schema only.
+Multiple schemas can be vacuumed by writing multiple -n switches.
+   
+  
+ 
+
+ 
+  -N schema
+  --exclude-schema=schema
+  
+   
+Clean or analyze all tables NOT in schema.
+Multiple schemas can be excluded from the vacuum by writing multiple -N switches.
+   
+  
+ 
+
  
   --no-index-cleanup
   
@@ -619,6 +675,14 @@ PostgreSQL documentation
 $ vacuumdb --analyze --verbose --table='foo(bar)' xyzzy
 
 
+   
+To clean all tables in the foo and bar schemas
+only in a database named xyzzy:
+
+$ vacuumdb --schema='foo' --schema='bar' xyzzy
+
+
+
  
 
  
diff --git a/src/bin/scripts/t/100_vacuumdb.pl b/src/bin/scripts/t/100_vacuumdb.pl
index 96a818a3c1..7bbfb97246 100644
--- a/src/bin/scripts/t/100_vacuumdb.pl
+++ b/src/bin/scripts/t/100_vacuumdb.pl
@@ -103,6 +103,8 @@ $node->safe_psql(
   CREATE TABLE funcidx (x int);
   INSERT INTO funcidx VALUES (0),(1),(2),(3);
   CREATE INDEX i0 ON funcidx ((f1(x)));
+  CREATE SCHEMA "Foo";
+  CREATE TABLE "Foo".bar(id int);
 |);
 $node->command_ok([qw|vacuumdb -Z --table="need""q(uot"(")x") postgres|],
 	'column list');
@@ -146,5 +148,15 @@ $node->issues_sql_like(
 	[ 'vacuumdb', '--min-xid-age', '2147483001', 'postgres' ],
 	qr/GREATEST.*relfrozenxid.*2147483001/,
 	'vacuumdb --table --min-xid-age');
+$node->issues_sql_like(
+	[ 'vacuumdb', '--schema', '"Foo"', 'postgres' ],
+	qr/VACUUM "Foo".bar/,
+	'vacuumdb --schema schema only');
+$node->command_fails(
+	[ 'vacuumdb',   '-n', 'pg_catalog', '-t', 'pg_class', 'postgres' ],
+	'cannot vacuum all tables in schema(s) and specific table(s) at the same time');
+$node->command_fails(
+	[ 'vacuumdb',   '-n', 'pg_catalog', '-N',  '"Foo"', 'postgres' ],
+	'cannot use option -n | --schema and -N | --exclude-schema at the same time');
 
 done_testing();
diff --git a/src/bin/scripts/t/101_vacuumdb_all.pl b/src/bin/scripts/t/101_vacuumdb_all.pl
index 1dcf411767..b122c995b1 100644
--- a/src/bin/scripts/t/101_vacuumdb_all.pl
+++ b/src/bin/scripts/t/101_vacuumdb_all.pl
@@ -15,5 +15,8 @@ $node->issues_sql_like(
 	[ 'vacuumdb', '-a' ],
 	qr/statement: VACUUM.*statement: VACUUM/s,
 	'vacuum all databases');
+$node->command_fails(
+	[ 'vacuumdb', '-a',  '-n', 'pg_catalog' ],
+	'cannot vacuum specific schema(s) in all databases');
 
 done_testing();
diff --git a/src/bin/scri

Re: [Proposal] vacuumdb --schema only

2022-04-08 Thread Gilles Darold

Le 08/04/2022 à 02:46, Justin Pryzby a écrit :

On Wed, Apr 06, 2022 at 07:43:42PM +0200, Gilles Darold wrote:

Thanks for the review, all these changes are available in new version v6
of the patch and attached here.

This is failing in CI (except on macos, which is strangely passing).
http://cfbot.cputube.org/gilles-darold.html

https://api.cirrus-ci.com/v1/artifact/task/5379693443547136/log/src/bin/scripts/tmp_check/log/regress_log_100_vacuumdb

not ok 59 - vacuumdb --schema "Foo" postgres exit code 0

#   Failed test 'vacuumdb --schema "Foo" postgres exit code 0'
#   at t/100_vacuumdb.pl line 151.
not ok 60 - vacuumdb --schema schema only: SQL found in server log

#   Failed test 'vacuumdb --schema schema only: SQL found in server log'
#   at t/100_vacuumdb.pl line 151.
#   '2022-04-06 18:15:36.313 UTC [34857][not initialized] 
[[unknown]][:0] LOG:  connection received: host=[local]
# 2022-04-06 18:15:36.314 UTC [34857][client backend] [[unknown]][3/2801:0] 
LOG:  connection authorized: user=postgres database=postgres 
application_name=100_vacuumdb.pl
# 2022-04-06 18:15:36.318 UTC [34857][client backend] 
[100_vacuumdb.pl][3/2802:0] LOG:  statement: SELECT 
pg_catalog.set_config('search_path', '', false);
# 2022-04-06 18:15:36.586 UTC [34857][client backend] [100_vacuumdb.pl][:0] 
LOG:  disconnection: session time: 0:00:00.273 user=postgres database=postgres 
host=[local]
# '
# doesn't match '(?^:VACUUM "Foo".bar)'


I'm surprised, because make check does not report errors running on 
Ubuntu 20.04 and CentOS 8:



t/010_clusterdb.pl  ok
t/011_clusterdb_all.pl  ok
t/020_createdb.pl . ok
t/040_createuser.pl ... ok
t/050_dropdb.pl ... ok
t/070_dropuser.pl . ok
t/080_pg_isready.pl ... ok
t/090_reindexdb.pl  ok
t/091_reindexdb_all.pl  ok
t/100_vacuumdb.pl . ok
t/101_vacuumdb_all.pl . ok
t/102_vacuumdb_stages.pl .. ok
t/200_connstr.pl .. ok
All tests successful.
Files=13, Tests=233, 17 wallclock secs ( 0.09 usr  0.02 sys + 6.63 cusr  
2.68 csys =  9.42 CPU)

Result: PASS


In tmp_check/log/regress_log_100_vacuumdb:

# Running: vacuumdb --schema "Foo" postgres
vacuumdb: vacuuming database "postgres"
ok 59 - vacuumdb --schema "Foo" postgres exit code 0
ok 60 - vacuumdb --schema schema only: SQL found in server log

In PG log:

2022-04-08 11:01:44.519 CEST [17223] 100_vacuumdb.pl LOG: statement: 
RESET search_path;
2022-04-08 11:01:44.519 CEST [17223] 100_vacuumdb.pl LOG: statement: 
WITH listed_objects (object_oid, column_list) AS (
  VALUES ('"Foo"'::pg_catalog.regnamespace::pg_catalog.oid, 
NULL::pg_catalog.text)

    )
    SELECT c.relname, ns.nspname, listed_objects.column_list FROM 
pg_catalog.pg_class c
 JOIN pg_catalog.pg_namespace ns ON c.relnamespace 
OPERATOR(pg_catalog.=) ns.oid
 LEFT JOIN pg_catalog.pg_class t ON c.reltoastrelid 
OPERATOR(pg_catalog.=) t.oid
 JOIN listed_objects ON listed_objects.object_oid 
OPERATOR(pg_catalog.=) ns.oid

 WHERE c.relkind OPERATOR(pg_catalog.=) ANY (array['r', 'm'])
 ORDER BY c.relpages DESC;
2022-04-08 11:01:44.521 CEST [17223] 100_vacuumdb.pl LOG: statement: 
SELECT pg_catalog.set_config('search_path', '', false);
2022-04-08 11:01:44.521 CEST [17223] 100_vacuumdb.pl LOG: statement: 
VACUUM "Foo".bar;


And if I run the command manually:

$ /usr/local/pgsql/bin/vacuumdb -e -h localhost --schema '"Foo"' -d 
contrib_regress -U postgres

SELECT pg_catalog.set_config('search_path', '', false);
vacuumdb: vacuuming database "contrib_regress"
RESET search_path;
WITH listed_objects (object_oid, column_list) AS (
  VALUES ('"Foo"'::pg_catalog.regnamespace::pg_catalog.oid, 
NULL::pg_catalog.text)

)
SELECT c.relname, ns.nspname, listed_objects.column_list FROM 
pg_catalog.pg_class c
 JOIN pg_catalog.pg_namespace ns ON c.relnamespace 
OPERATOR(pg_catalog.=) ns.oid
 LEFT JOIN pg_catalog.pg_class t ON c.reltoastrelid 
OPERATOR(pg_catalog.=) t.oid
 JOIN listed_objects ON listed_objects.object_oid 
OPERATOR(pg_catalog.=) ns.oid

 WHERE c.relkind OPERATOR(pg_catalog.=) ANY (array['r', 'm'])
 ORDER BY c.relpages DESC;
SELECT pg_catalog.set_config('search_path', '', false);

VACUUM "Foo".bar;


$ echo $?
0

I don't know what happens on cfbot; investigating...


--
Gilles Darold





Re: [Proposal] vacuumdb --schema only

2022-04-06 Thread Gilles Darold
On 30/03/2022 at 23:22, Nathan Bossart wrote:
> I took a look at the v4 patch.
>
> 'git-apply' complains about whitespace errors:

Fixed.


> +   
> +To clean all tables in the Foo and 
> bar schemas
> +only in a database named xyzzy:
> +
> +$ vacuumdb --schema='"Foo"' --schema='bar' 
> xyzzy
> +
>
> nitpicks: I think the phrasing should be "To only clean tables in the...".
> Also, is there any reason to use a schema name with a capital letter as an
> example?  IMO that just adds unnecessary complexity to the example.

I had thought that an example of a schema with case sensitivity was
missing from the documentation, but I agree with your comment; this is
probably not the best place for it. Fixed.


> +$node->issues_sql_like(
> +[ 'vacuumdb', '--schema', '"Foo"', 'postgres' ],
> +qr/VACUUM "Foo".*/,
> +'vacuumdb --schema schema only');
>
> IIUC there should only be one table in the schema.  Can we avoid matching
> "*" and check for the exact command instead?

Fixed.


> I think there should be a few more test cases.  For example, we should test
> using -n and -N at the same time, and we should test what happens when
> those options are used for missing schemas.

Fixed


> +/*
> + * When filtering on schema name, filter by table is not allowed.
> + * The schema name can already be set to a fqdn table name.
> + */
> +if (tbl_count && (schemas.head != NULL))
> +{
> +pg_log_error("cannot vacuum all tables in schema(s) and specific 
> table(s) at the same time");
> +exit(1);
> +}
>
> I think there might be some useful refactoring we can do that would
> simplify adding similar options in the future.  Specifically, can we have a
> global variable that stores the type of vacuumdb command (e.g., all,
> tables, or schemas)?  If so, perhaps the tables list could be renamed and
> reused for schemas (and any other objects that need listing in the future).

I don't think many more options like this one will be added to this
command, but I have changed the patch that way anyway.


> +if (schemas != NULL && schemas->head != NULL)
> +{
> +appendPQExpBufferStr(_query,
> + " AND c.relnamespace");
> +if (schema_is_exclude == TRI_YES)
> +appendPQExpBufferStr(_query,
> +" OPERATOR(pg_catalog.!=) ALL (ARRAY[");
> +else if (schema_is_exclude == TRI_NO)
> +appendPQExpBufferStr(_query,
> +" OPERATOR(pg_catalog.=) ANY (ARRAY[");
> +
> +for (cell = schemas->head; cell != NULL; cell = cell->next)
> +{
> +appendStringLiteralConn(_query, cell->val, conn);
> +
> +if (cell->next != NULL)
> +appendPQExpBufferStr(_query, ", ");
> +}
> +
> +/* Finish formatting schema filter */
> +appendPQExpBufferStr(_query, 
> "]::pg_catalog.regnamespace[])\n");
> +}
>
> IMO we should use a CTE for specified schemas like we do for the specified
> tables.  I wonder if we could even have a mostly-shared CTE code path for
> all vacuumdb commands with a list of names.

Fixed


> -/*
> - * If no tables were listed, filter for the relevant relation types.  If
> - * tables were given via --table, don't bother filtering by relation 
> type.
> - * Instead, let the server decide whether a given relation can be
> - * processed in which case the user will know about it.
> - */
> -if (!tables_listed)
> +else
>  {
> +/*
> + * If no tables were listed, filter for the relevant relation types. 
>  If
> + * tables were given via --table, don't bother filtering by relation 
> type.
> + * Instead, let the server decide whether a given relation can be
> + * processed in which case the user will know about it.
> + */
> nitpick: This change seems unnecessary.

Fixed


Thanks for the review; all these changes are available in the new
version (v6) of the patch, attached here.


Best regards,

-- 
Gilles Darold
diff --git a/doc/src/sgml/ref/vacuumdb.sgml b/doc/src/sgml/ref/vacuumdb.sgml
index 956c0f01cb..0de001ef24 100644
--- a/doc/src/sgml/ref/vacuumdb.sgml
+++ b/doc/src/sgml/ref/vacuumdb.sgml
@@ -39,6 +39,40 @@ PostgreSQL documentation
dbname
   
 
+  
+   vacuumdb
+   connection-option
+   option
+
+   
+
+ 
+   
+
+ 
+  -n
+  --schema
+ 
+ 

Re: [Proposal] vacuumdb --schema only

2022-03-09 Thread Gilles Darold
On 09/03/2022 at 22:10, Justin Pryzby wrote:
> On Mon, Mar 07, 2022 at 08:38:04AM +0100, Gilles Darold wrote:
>>> Maybe it's clearer to write this with =ANY() / != ALL() ?
>>> See 002.
>> I have applied your changes and produced a new version v3 of the patch,
>> thanks for the improvements. The patch have been added to commitfest
>> interface, see here https://commitfest.postgresql.org/38/3587/
> I wondered whether my patches were improvements, and it occurred to me that
> your patch didn't fail if the specified schema didn't exist.  That's arguably
> preferable, but that's the pre-existing behavior for tables.  So I think the
> behavior of my patch is more consistent.

+1

-- 
Gilles Darold


Re: [Proposal] vacuumdb --schema only

2022-03-09 Thread Gilles Darold
Hi,

New version v4 of the patch to fix a typo in a comment.

-- 
Gilles Darold
diff --git a/doc/src/sgml/ref/vacuumdb.sgml b/doc/src/sgml/ref/vacuumdb.sgml
index 956c0f01cb..378328afb3 100644
--- a/doc/src/sgml/ref/vacuumdb.sgml
+++ b/doc/src/sgml/ref/vacuumdb.sgml
@@ -39,6 +39,40 @@ PostgreSQL documentation
dbname
   
 
+  
+   vacuumdb
+   connection-option
+   option
+
+   
+
+ 
+	   
+	
+	 
+	  -n
+	  --schema
+	 
+	 schema
+	
+	   
+
+	   
+	
+	 
+	  -N
+	  --exclude-schema
+	 
+	 schema
+	
+	   
+ 
+
+   
+
+   dbname
+  
+
   
vacuumdb
connection-option
@@ -244,6 +278,28 @@ PostgreSQL documentation
   
  
 
+ 
+  -n schema
+  --schema=schema
+  
+   
+Clean or analyze all tables in schema only.
+Multiple schemas can be vacuumed by writing multiple -n switches.
+   
+  
+ 
+
+ 
+  -N schema
+  --exclude-schema=schema
+  
+   
+Clean or analyze all tables NOT in schema.
+Multiple schemas can be excluded from the vacuum by writing multiple -N switches.
+   
+  
+ 
+
  
   --no-index-cleanup
   
@@ -619,6 +675,14 @@ PostgreSQL documentation
 $ vacuumdb --analyze --verbose --table='foo(bar)' xyzzy
 
 
+   
+To clean all tables in the Foo and bar schemas
+only in a database named xyzzy:
+
+$ vacuumdb --schema='"Foo"' --schema='bar' xyzzy
+
+
+
  
 
  
diff --git a/src/bin/scripts/t/100_vacuumdb.pl b/src/bin/scripts/t/100_vacuumdb.pl
index 96a818a3c1..4c4f47e32a 100644
--- a/src/bin/scripts/t/100_vacuumdb.pl
+++ b/src/bin/scripts/t/100_vacuumdb.pl
@@ -103,6 +103,8 @@ $node->safe_psql(
   CREATE TABLE funcidx (x int);
   INSERT INTO funcidx VALUES (0),(1),(2),(3);
   CREATE INDEX i0 ON funcidx ((f1(x)));
+  CREATE SCHEMA "Foo";
+  CREATE TABLE "Foo".bar(id int);
 |);
 $node->command_ok([qw|vacuumdb -Z --table="need""q(uot"(")x") postgres|],
 	'column list');
@@ -146,5 +148,12 @@ $node->issues_sql_like(
 	[ 'vacuumdb', '--min-xid-age', '2147483001', 'postgres' ],
 	qr/GREATEST.*relfrozenxid.*2147483001/,
 	'vacuumdb --table --min-xid-age');
+$node->issues_sql_like(
+	[ 'vacuumdb', '--schema', '"Foo"', 'postgres' ],
+	qr/VACUUM "Foo".*/,
+	'vacuumdb --schema schema only');
+$node->command_fails(
+	[ 'vacuumdb',   '-n', 'pg_catalog', '-t pg_class', 'postgres' ],
+	'cannot vacuum all tables in schema(s) and specific table(s) at the same time');
 
 done_testing();
diff --git a/src/bin/scripts/t/101_vacuumdb_all.pl b/src/bin/scripts/t/101_vacuumdb_all.pl
index 1dcf411767..b122c995b1 100644
--- a/src/bin/scripts/t/101_vacuumdb_all.pl
+++ b/src/bin/scripts/t/101_vacuumdb_all.pl
@@ -15,5 +15,8 @@ $node->issues_sql_like(
 	[ 'vacuumdb', '-a' ],
 	qr/statement: VACUUM.*statement: VACUUM/s,
 	'vacuum all databases');
+$node->command_fails(
+	[ 'vacuumdb', '-a',  '-n', 'pg_catalog' ],
+	'cannot vacuum specific schema(s) in all databases');
 
 done_testing();
diff --git a/src/bin/scripts/vacuumdb.c b/src/bin/scripts/vacuumdb.c
index 4f6917fd39..d94f7459d5 100644
--- a/src/bin/scripts/vacuumdb.c
+++ b/src/bin/scripts/vacuumdb.c
@@ -46,10 +46,12 @@ typedef struct vacuumingOptions
 	bool		process_toast;
 } vacuumingOptions;
 
+enum trivalue schema_is_exclude = TRI_DEFAULT;
 
 static void vacuum_one_database(ConnParams *cparams,
 vacuumingOptions *vacopts,
 int stage,
+SimpleStringList *schemas,
 SimpleStringList *tables,
 int concurrentCons,
 const char *progname, bool echo, bool quiet);
@@ -94,6 +96,8 @@ main(int argc, char *argv[])
 		{"verbose", no_argument, NULL, 'v'},
 		{"jobs", required_argument, NULL, 'j'},
 		{"parallel", required_argument, NULL, 'P'},
+		{"schema", required_argument, NULL, 'n'},
+		{"exclude-schema", required_argument, NULL, 'N'},
 		{"maintenance-db", required_argument, NULL, 2},
 		{"analyze-in-stages", no_argument, NULL, 3},
 		{"disable-page-skipping", no_argument, NULL, 4},
@@ -125,6 +129,7 @@ main(int argc, char *argv[])
 	SimpleStringList tables = {NULL, NULL};
 	int			concurrentCons = 1;
 	int			tbl_count = 0;
+	SimpleStringList schemas = {NULL, NULL};
 
 	/* initialize options */
 	memset(, 0, sizeof(vacopts));
@@ -140,7 +145,7 @@ main(int argc, char *argv[])
 
 	handle_help_version_opts(argc, argv, "vacuumdb", help);
 
-	while ((c = getopt_long(argc, argv, "h:p:U:wWeqd:zZFat:fvj:P:", long_options, )) != -1)
+	while ((c = getopt_long(argc, argv, "h:p:U:wWeqd:zZFat:fvj:P:n:N:", long_options, )) != -1)
 	{
 		switch (c)
 		{
@@ -202,6 +207,26 @@ main(int argc, char *argv[])
 	  _workers))
 	exit(1);
 break;
+			case 'n':   /* include schema(s) */
+if (schema_is_exclude == TRI_YES)
+	

Re: [Proposal] vacuumdb --schema only

2022-03-06 Thread Gilles Darold
On 06/03/2022 at 16:04, Justin Pryzby wrote:
> On Sun, Mar 06, 2022 at 09:39:37AM +0100, Gilles Darold wrote:
>> Attached a new patch version that adds the -N | --exclude-schema option
>> to the vacuumdb command as suggested. Documentation updated too.
>>
>> +pg_log_error("cannot vacuum all tables in schema(s) and and 
>> exclude specific schema(s) at the same time");
> and and
>
> It's odd that schema_exclusion is a global var, but schemas/excluded are not.
>
> Also, it seems unnecessary to have two schemas vars, since they can't be used
> together.  Maybe there's a better way than what I did in 003.
>
>> +for (cell = schemas ? schemas->head : NULL; cell; cell = 
>> cell->next)
> It's preferred to write cell != NULL
>
>> +   boolschemas_listed = false;
> ...
>> +for (cell = schemas ? schemas->head : NULL; cell; cell = 
>> cell->next)
>> +{
>> +if (!schemas_listed) {
>> +appendPQExpBufferStr(_query,
>> + " AND 
>> pg_catalog.quote_ident(ns.nspname)");
>> +if (schema_exclusion)
>> +appendPQExpBufferStr(_query, " 
>> NOT IN (");
>> +else
>> +appendPQExpBufferStr(_query, " 
>> IN (");
>> +
>> +schemas_listed = true;
>> +}
>> +else
>> +appendPQExpBufferStr(_query, ", ");
>> +
>> +appendStringLiteralConn(_query, cell->val, 
>> conn);
>> +appendPQExpBufferStr(_query, 
>> "::pg_catalog.regnamespace::pg_catalog.name");
>> +
>> +}
>> +/* Finish formatting schema filter */
>> +if (schemas_listed)
>> +appendPQExpBufferStr(_query, ")\n");
>>  }
> Maybe it's clearer to write this with =ANY() / != ALL() ?
> See 002.
>

I have applied your changes and produced a new version (v3) of the patch;
thanks for the improvements. The patch has been added to the commitfest
interface, see https://commitfest.postgresql.org/38/3587/


-- 
Gilles Darold
diff --git a/doc/src/sgml/ref/vacuumdb.sgml b/doc/src/sgml/ref/vacuumdb.sgml
index 956c0f01cb..378328afb3 100644
--- a/doc/src/sgml/ref/vacuumdb.sgml
+++ b/doc/src/sgml/ref/vacuumdb.sgml
@@ -39,6 +39,40 @@ PostgreSQL documentation
dbname
   
 
+  
+   vacuumdb
+   connection-option
+   option
+
+   
+
+ 
+	   
+	
+	 
+	  -n
+	  --schema
+	 
+	 schema
+	
+	   
+
+	   
+	
+	 
+	  -N
+	  --exclude-schema
+	 
+	 schema
+	
+	   
+ 
+
+   
+
+   dbname
+  
+
   
vacuumdb
connection-option
@@ -244,6 +278,28 @@ PostgreSQL documentation
   
  
 
+ 
+  -n schema
+  --schema=schema
+  
+   
+Clean or analyze all tables in schema only.
+Multiple schemas can be vacuumed by writing multiple -n switches.
+   
+  
+ 
+
+ 
+  -N schema
+  --exclude-schema=schema
+  
+   
+Clean or analyze all tables NOT in schema.
+Multiple schemas can be excluded from the vacuum by writing multiple -N switches.
+   
+  
+ 
+
  
   --no-index-cleanup
   
@@ -619,6 +675,14 @@ PostgreSQL documentation
 $ vacuumdb --analyze --verbose --table='foo(bar)' xyzzy
 
 
+   
+To clean all tables in the Foo and bar schemas
+only in a database named xyzzy:
+
+$ vacuumdb --schema='"Foo"' --schema='bar' xyzzy
+
+
+
  
 
  
diff --git a/src/bin/scripts/t/100_vacuumdb.pl b/src/bin/scripts/t/100_vacuumdb.pl
index 96a818a3c1..4c4f47e32a 100644
--- a/src/bin/scripts/t/100_vacuumdb.pl
+++ b/src/bin/scripts/t/100_vacuumdb.pl
@@ -103,6 +103,8 @@ $node->safe_psql(
   CREATE TABLE funcidx (x int);
   INSERT INTO funcidx VALUES (0),(1),(2),(3);
   CREATE INDEX i0 ON funcidx ((f1(x)));
+  CREATE SCHEMA "Foo";
+  CREATE TABLE "Foo".bar(id int);
 |);
 $node->command_ok([qw|vacuumdb -Z --table="need""q(uot"(")x") postgres|],
 	'column list');
@@ -146,5 +148,12 @@ $node->issues_sql_like(
 	[ 'vacuumdb', '--min-xid-age', '2147483001', 'postgres' ],
 	qr/GREATEST.*relfrozenxid.*2147483001/,
 	'vacuumdb --table --min-xid-age');
+$node->issues_sql_like(
+	[ 'vacuumdb', '--schema', '"Foo"', 'postgres' ],
+	qr/VACUUM "Foo".*/,
+	'vacuumdb --schema schema only');
+$node->command_fails(
+	[ 'vacuumdb',   '-n', 'pg_

Re: [Proposal] vacuumdb --schema only

2022-03-06 Thread Gilles Darold
On 04/03/2022 at 11:56, Justin Pryzby wrote:
> On Fri, Mar 04, 2022 at 10:11:28AM +0100, Gilles Darold wrote:
>> The attached patch implements that. Option -n | --schema can be used
>> multiple time and can not be used together with options -a or -t.
> Yes, thanks.
>
> I suggest there should also be an --exclude-schema.
>
>> I do not propose to extend the VACUUM and ANALYZE commands because their
>> current syntax doesn't allow me to see an easy way to do that
> I think this would be easy with the parenthesized syntax.
> I'm not suggesting to do it there, though.
>
>> +/*
>> + * When filtereing on schema name, filter by table is not allowed.
>> + * The schema name can already be set in a fqdn table name.
> set *to*
>

Attached is a new patch version that adds the -N | --exclude-schema
option to the vacuumdb command, as suggested. The documentation is
updated too.


I will add this patch to the commitfest unless there are objections to
adding these options.


-- 
Gilles Darold
diff --git a/doc/src/sgml/ref/vacuumdb.sgml b/doc/src/sgml/ref/vacuumdb.sgml
index 956c0f01cb..378328afb3 100644
--- a/doc/src/sgml/ref/vacuumdb.sgml
+++ b/doc/src/sgml/ref/vacuumdb.sgml
@@ -39,6 +39,40 @@ PostgreSQL documentation
dbname
   
 
+  
+   vacuumdb
+   connection-option
+   option
+
+   
+
+ 
+	   
+	
+	 
+	  -n
+	  --schema
+	 
+	 schema
+	
+	   
+
+	   
+	
+	 
+	  -N
+	  --exclude-schema
+	 
+	 schema
+	
+	   
+ 
+
+   
+
+   dbname
+  
+
   
vacuumdb
connection-option
@@ -244,6 +278,28 @@ PostgreSQL documentation
   
  
 
+ 
+  -n schema
+  --schema=schema
+  
+   
+Clean or analyze all tables in schema only.
+Multiple schemas can be vacuumed by writing multiple -n switches.
+   
+  
+ 
+
+ 
+  -N schema
+  --exclude-schema=schema
+  
+   
+Clean or analyze all tables NOT in schema.
+Multiple schemas can be excluded from the vacuum by writing multiple -N switches.
+   
+  
+ 
+
  
   --no-index-cleanup
   
@@ -619,6 +675,14 @@ PostgreSQL documentation
 $ vacuumdb --analyze --verbose --table='foo(bar)' xyzzy
 
 
+   
+To clean all tables in the Foo and bar schemas
+only in a database named xyzzy:
+
+$ vacuumdb --schema='"Foo"' --schema='bar' xyzzy
+
+
+
  
 
  
diff --git a/src/bin/scripts/t/100_vacuumdb.pl b/src/bin/scripts/t/100_vacuumdb.pl
index 96a818a3c1..4c4f47e32a 100644
--- a/src/bin/scripts/t/100_vacuumdb.pl
+++ b/src/bin/scripts/t/100_vacuumdb.pl
@@ -103,6 +103,8 @@ $node->safe_psql(
   CREATE TABLE funcidx (x int);
   INSERT INTO funcidx VALUES (0),(1),(2),(3);
   CREATE INDEX i0 ON funcidx ((f1(x)));
+  CREATE SCHEMA "Foo";
+  CREATE TABLE "Foo".bar(id int);
 |);
 $node->command_ok([qw|vacuumdb -Z --table="need""q(uot"(")x") postgres|],
 	'column list');
@@ -146,5 +148,12 @@ $node->issues_sql_like(
 	[ 'vacuumdb', '--min-xid-age', '2147483001', 'postgres' ],
 	qr/GREATEST.*relfrozenxid.*2147483001/,
 	'vacuumdb --table --min-xid-age');
+$node->issues_sql_like(
+	[ 'vacuumdb', '--schema', '"Foo"', 'postgres' ],
+	qr/VACUUM "Foo".*/,
+	'vacuumdb --schema schema only');
+$node->command_fails(
+	[ 'vacuumdb',   '-n', 'pg_catalog', '-t pg_class', 'postgres' ],
+	'cannot vacuum all tables in schema(s) and specific table(s) at the same time');
 
 done_testing();
diff --git a/src/bin/scripts/t/101_vacuumdb_all.pl b/src/bin/scripts/t/101_vacuumdb_all.pl
index 1dcf411767..b122c995b1 100644
--- a/src/bin/scripts/t/101_vacuumdb_all.pl
+++ b/src/bin/scripts/t/101_vacuumdb_all.pl
@@ -15,5 +15,8 @@ $node->issues_sql_like(
 	[ 'vacuumdb', '-a' ],
 	qr/statement: VACUUM.*statement: VACUUM/s,
 	'vacuum all databases');
+$node->command_fails(
+	[ 'vacuumdb', '-a',  '-n', 'pg_catalog' ],
+	'cannot vacuum specific schema(s) in all databases');
 
 done_testing();
diff --git a/src/bin/scripts/vacuumdb.c b/src/bin/scripts/vacuumdb.c
index 4f6917fd39..3dca22e1c8 100644
--- a/src/bin/scripts/vacuumdb.c
+++ b/src/bin/scripts/vacuumdb.c
@@ -46,10 +46,12 @@ typedef struct vacuumingOptions
 	bool		process_toast;
 } vacuumingOptions;
 
+static bool schema_exclusion = false;
 
 static void vacuum_one_database(ConnParams *cparams,
 vacuumingOptions *vacopts,
 int stage,
+SimpleStringList *schemas,
 SimpleStringList *tables,
 int concurrentCons,
 const char *progname, bool echo, bool quiet);
@@ -94,6 +96,8 @@ main(int argc, char *argv[])
 		{"verbose", no_argument, NULL, 'v'},
 		{"jobs", required_argument, NULL, 'j'},
 		{"parallel", required_argument, NULL, 'P'},
+		{"schema", required_argument, NULL, 'n'},
+		{"exclude-schema", required_argument, NULL, 'N'},
 		{"

Re: [Proposal] vacuumdb --schema only

2022-03-04 Thread Gilles Darold

On 04/03/2022 at 11:56, Justin Pryzby wrote:

On Fri, Mar 04, 2022 at 10:11:28AM +0100, Gilles Darold wrote:

The attached patch implements that. Option -n | --schema can be used
multiple time and can not be used together with options -a or -t.

Yes, thanks.

I suggest there should also be an --exclude-schema.



Ok, I will add it too.





I do not propose to extend the VACUUM and ANALYZE commands because their
current syntax doesn't allow me to see an easy way to do that

I think this would be easy with the parenthesized syntax.
I'm not suggesting to do it there, though.



Yes, this is what I thought: something à la EXPLAIN, for example 
"VACUUM (ANALYZE, SCHEMA foo)", but this is a change to the VACUUM syntax 
that needs to keep compatibility with the current syntax. We would have 
two syntaxes, something like "VACUUM ANALYZE FULL dbname" and "VACUUM 
(ANALYZE, FULL) dbname". The other syntax "problem" is being able to use 
multiple schema values in the VACUUM command, perhaps "VACUUM (ANALYZE, 
SCHEMA (foo, bar))".




+   /*
+* When filtereing on schema name, filter by table is not allowed.
+* The schema name can already be set in a fqdn table name.

set *to*


Thanks, it will be fixed in the next patch version.


--
Gilles Darold





[Proposal] vacuumdb --schema only

2022-03-04 Thread Gilles Darold

Hi,


When we want to vacuum and/or analyze all tables in a dedicated schema, 
say pg_catalog for example, there is no easy way to do that. The 
VACUUM command doesn't allow it, so we have to use \gexec or a SQL script. 
We have an external command, vacuumdb, that could be used to 
simplify this task. For example, the following command can be used to 
clean all tables stored in the pg_catalog schema:


    vacuumdb --schema pg_catalog -d foo

The attached patch implements that. The -n | --schema option can be used 
multiple times and cannot be used together with the -a or -t options.



Common use cases are an application that creates lots of temporary 
objects and then drops them, which can heavily bloat the catalog, or one 
that does heavy work in some schemas only. Of course, the good practice 
is to find the bloated tables and execute VACUUM on each of them, but if 
most of the tables in the schema are regularly bloated, using vacuumdb 
--schema can save time.



I do not propose to extend the VACUUM and ANALYZE commands, because their 
current syntax doesn't allow me to see an easy way to do that, and also 
because I'm not really in favor of such a change. But if there is interest 
in improving these commands, I will be pleased to do so with the 
suggested syntax.



Best regards,

--
Gilles Darold
diff --git a/doc/src/sgml/ref/vacuumdb.sgml b/doc/src/sgml/ref/vacuumdb.sgml
index 956c0f01cb..e4f6d32ba9 100644
--- a/doc/src/sgml/ref/vacuumdb.sgml
+++ b/doc/src/sgml/ref/vacuumdb.sgml
@@ -39,6 +39,24 @@ PostgreSQL documentation
dbname
   
 
+  
+   vacuumdb
+   connection-option
+   option
+
+   
+
+ 
+  -n
+  --schema
+ 
+ schema
+
+   
+
+   dbname
+  
+
   
vacuumdb
connection-option
@@ -244,6 +262,17 @@ PostgreSQL documentation
   
  
 
+ 
+  -n schema
+  --schema=schema
+  
+   
+Clean or analyze all tables in schema only.
+Multiple schemas can be vacuumed by writing multiple -n switches.
+   
+  
+ 
+
  
   --no-index-cleanup
   
@@ -619,6 +648,14 @@ PostgreSQL documentation
 $ vacuumdb --analyze --verbose --table='foo(bar)' xyzzy
 
 
+   
+To clean all tables in the Foo and bar schemas
+only in a database named xyzzy:
+
+$ vacuumdb --schema='"Foo"' --schema='bar' xyzzy
+
+
+
  
 
  
diff --git a/src/bin/scripts/t/100_vacuumdb.pl b/src/bin/scripts/t/100_vacuumdb.pl
index 96a818a3c1..4c4f47e32a 100644
--- a/src/bin/scripts/t/100_vacuumdb.pl
+++ b/src/bin/scripts/t/100_vacuumdb.pl
@@ -103,6 +103,8 @@ $node->safe_psql(
   CREATE TABLE funcidx (x int);
   INSERT INTO funcidx VALUES (0),(1),(2),(3);
   CREATE INDEX i0 ON funcidx ((f1(x)));
+  CREATE SCHEMA "Foo";
+  CREATE TABLE "Foo".bar(id int);
 |);
 $node->command_ok([qw|vacuumdb -Z --table="need""q(uot"(")x") postgres|],
 	'column list');
@@ -146,5 +148,12 @@ $node->issues_sql_like(
 	[ 'vacuumdb', '--min-xid-age', '2147483001', 'postgres' ],
 	qr/GREATEST.*relfrozenxid.*2147483001/,
 	'vacuumdb --table --min-xid-age');
+$node->issues_sql_like(
+	[ 'vacuumdb', '--schema', '"Foo"', 'postgres' ],
+	qr/VACUUM "Foo".*/,
+	'vacuumdb --schema schema only');
+$node->command_fails(
+	[ 'vacuumdb',   '-n', 'pg_catalog', '-t pg_class', 'postgres' ],
+	'cannot vacuum all tables in schema(s) and specific table(s) at the same time');
 
 done_testing();
diff --git a/src/bin/scripts/t/101_vacuumdb_all.pl b/src/bin/scripts/t/101_vacuumdb_all.pl
index 1dcf411767..b122c995b1 100644
--- a/src/bin/scripts/t/101_vacuumdb_all.pl
+++ b/src/bin/scripts/t/101_vacuumdb_all.pl
@@ -15,5 +15,8 @@ $node->issues_sql_like(
 	[ 'vacuumdb', '-a' ],
 	qr/statement: VACUUM.*statement: VACUUM/s,
 	'vacuum all databases');
+$node->command_fails(
+	[ 'vacuumdb', '-a',  '-n', 'pg_catalog' ],
+	'cannot vacuum specific schema(s) in all databases');
 
 done_testing();
diff --git a/src/bin/scripts/vacuumdb.c b/src/bin/scripts/vacuumdb.c
index 4f6917fd39..69b470598f 100644
--- a/src/bin/scripts/vacuumdb.c
+++ b/src/bin/scripts/vacuumdb.c
@@ -46,10 +46,10 @@ typedef struct vacuumingOptions
 	bool		process_toast;
 } vacuumingOptions;
 
-
 static void vacuum_one_database(ConnParams *cparams,
 vacuumingOptions *vacopts,
 int stage,
+SimpleStringList *schemas,
 SimpleStringList *tables,
 int concurrentCons,
 const char *progname, bool echo, bool quiet);
@@ -94,6 +94,7 @@ main(int argc, char *argv[])
 		{"verbose", no_argument, NULL, 'v'},
 		{"jobs", required_argument, NULL, 'j'},
 		{"parallel", required_argument, NULL, 'P'},
+		{"schema", required_argument, NULL, 'n'},
 		{"maintenance-db", required_argument, NULL, 2},
 		{"analyze-in-stages", no_argument, NULL, 3},
 		{"disable-page-skipping", no_argument, NULL, 4},
@@ -125,6 +126,7 @@ main(int argc, char

Re: [PATCH] proposal for regexp_count, regexp_instr, regexp_substr and regexp_replace

2021-12-15 Thread Gilles Darold

On 15/12/2021 at 13:41, Peter Eisentraut wrote:

On 03.08.21 19:10, Tom Lane wrote:

Gilles Darold  writes:

Sorry I have missed that, but I'm fine with this implemenation so let's
keep the v6 version of the patch and drop this one.


Pushed, then.  There's still lots of time to tweak the behavior of 
course.


I have a documentation follow-up to this.  It seems that these new 
functions are almost a de facto standard, whereas the SQL-standard 
functions are not implemented anywhere.  I propose the attached patch 
to update the subsection in the pattern-matching section to give more 
detail on this and suggest equivalent functions among these newly 
added ones.  What do you think?



I'm in favor of applying your changes to the documentation. It is a good 
thing to clarify the relation between this implementation of the regexp_* 
functions and the SQL standard.


--
Gilles Darold






Re: Pasword expiration warning

2021-11-21 Thread Gilles Darold
On 20/11/2021 at 14:48, Andrew Dunstan wrote:
> On 11/19/21 19:17, Bossart, Nathan wrote:
>> On 11/19/21, 7:56 AM, "Tom Lane"  wrote:
>>> That leads me to wonder about server-side solutions.  It's easy
>>> enough for the server to see that it's used a password with an
>>> expiration N days away, but how could that be reported to the
>>> client?  The only idea that comes to mind that doesn't seem like
>>> a protocol break is to issue a NOTICE message, which doesn't
>>> seem like it squares with your desire to only do this interactively.
>>> (Although I'm not sure I believe that's a great idea.  If your
>>> application breaks at 2AM because its password expired, you
>>> won't be any happier than if your interactive sessions start to
>>> fail.  Maybe a message that would leave a trail in the server log
>>> would be best after all.)
>> I bet it's possible to use the ClientAuthentication_hook for this.  In
>> any case, I agree that it probably belongs server-side so that other
>> clients can benefit from this.
>>
> +1 for a server side solution. The people most likely to benefit from
> this are the people least likely to be using psql IMNSHO.


OK, I can try to implement something on the server side using a NOTICE message.


-- 
Gilles Darold





Re: Pasword expiration warning

2021-11-19 Thread Gilles Darold

On 19/11/2021 at 16:55, Tom Lane wrote:

Gilles Darold  writes:

Now that the security policy is getting stronger, it is not uncommon to
create users with a password expiration date (VALID UNTIL).

TBH, I thought people were starting to realize that forced password
rotations are a net security negative.  It's true that a lot of
places haven't gotten the word yet.


I'm wondering if we might be interested in having this feature in psql?

This proposal kind of seems like a hack, because
(1) not everybody uses psql



Yes, for me it's a comfort feature. When a user connects to a PG backend 
using an account whose password has expired, there is no indication that 
the problem is password expiration. The message returned to the user is 
just "FATAL: password authentication failed for user "foo"". We have to 
check in the log file that the problem is related to "DETAIL:  User 
"foo" has an expired password.". If the user were warned beforehand to 
change the password, it would probably save some time.




(2) psql can't really tell whether rolvaliduntil is relevant.
 (It can see whether the server demanded a password, but
 maybe that went to LDAP or some other auth method.)



I agree. I hope that in the case of external authentication rolvaliduntil 
is not set, and in that case I guess there are other notification 
channels to inform the user that the password will expire. Otherwise, 
yes, the warning message could be a false positive, but rolvaliduntil 
can be changed to infinity to fix this case.




That leads me to wonder about server-side solutions.  It's easy
enough for the server to see that it's used a password with an
expiration N days away, but how could that be reported to the
client?  The only idea that comes to mind that doesn't seem like
a protocol break is to issue a NOTICE message, which doesn't
seem like it squares with your desire to only do this interactively.
(Although I'm not sure I believe that's a great idea.  If your
application breaks at 2AM because its password expired, you
won't be any happier than if your interactive sessions start to
fail.  Maybe a message that would leave a trail in the server log
would be best after all.)



I think it is the responsibility of the client to display a warning when 
the password is about to expire; the backend could help the application 
by sending a NOTICE, but the application would still have to report that 
notice. I mean that it can just as well continue to do all the work of 
verifying that the password is about to expire.




Default value is 0 like today no warning at all.

Off-by-default is pretty much guaranteed to not help most people.


Right, I was thinking of backward compatibility, but that does not apply 
here. So a default of 7 days would be better.



To sum up, as I said above, this is just a convenience notification 
dedicated to psql and local PG accounts, to avoid having to dig through 
the log file for users who forget to change their password.



--
Gilles Darold



Pasword expiration warning

2021-11-19 Thread Gilles Darold

Hi all,


Now that security policies are getting stricter, it is not uncommon to 
create users with a password expiration date (VALID UNTIL). The problem 
is that the user only becomes aware that the password has expired when 
they can no longer log in, unless the application they are connecting 
with notifies them beforehand.



I'm wondering whether we might be interested in having this feature in psql. 
For example, for a user whose password expires in 3 days:



gilles=# CREATE ROLE foo LOGIN PASSWORD 'foo' VALID UNTIL '2021-11-22';
CREATE ROLE
gilles=# \c - foo
Password for user foo:
psql (15devel, server 14.1 (Ubuntu 14.1-2.pgdg20.04+1))
** Warning: your password expires in 3 days **
You are now connected to database "gilles" as user "foo".


My idea is to add a psql variable, definable in psqlrc, that specifies 
how many days before the user's password expires a warning should start 
being printed. The warning message is only displayed in interactive 
mode. Example:


$ cat /etc/postgresql-common/psqlrc
\set PASSWORD_EXPIRE_WARNING 7

The default value is 0, meaning no warning at all, as today.
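For reference, the information the client needs is available to any session through pg_roles; a purely illustrative sketch of the check (not the patch's actual code) could be:

```sql
-- Illustrative only: how a client could compute days until password expiry.
-- rolvaliduntil is readable by unprivileged roles through pg_roles.
SELECT ceil(extract(epoch FROM rolvaliduntil - now()) / 86400) AS days_left
FROM pg_roles
WHERE rolname = current_user
  AND rolvaliduntil IS NOT NULL
  AND rolvaliduntil <> 'infinity';
```

psql would then print the warning whenever days_left is positive and not greater than PASSWORD_EXPIRE_WARNING.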


Of course, any other client application has to implement its own advance 
expiration notice, but psql does not have one at the moment. If there is 
interest in this psql feature, I can post the patch.



--
Gilles Darold



Re: [PATCH] fix references to like_regex

2021-11-02 Thread Gilles Darold

On 02/11/2021 at 16:50, Tom Lane wrote:

Gilles Darold  writes:

Since we have the regexp_like operator, I have found two references in
the documentation to PostgreSQL lacking a LIKE_REGEX implementation.
Here is a patch to fix the documentation. I simply remove the statement
that LIKE_REGEX does not exist in PostgreSQL from chapter "9.7.3.8
Differences from XQuery", and modify chapter "9.16.2.3. SQL/JSON Regular
Expressions" to mention the REGEXP_LIKE operator. There may be better
wording for the second fix.

I don't think we should change these (yet).  regexp_like() is *not*
LIKE_REGEX, precisely because it's using a slightly different
regular-expression language than what the spec calls for.
At some point we may provide a skin for the regexp engine that
duplicates the spec's definition, and then we can implement
LIKE_REGEX for real.



Thanks for clarifying, I thought it was an oversight.


Regards

--
Gilles Darold



[PATCH] fix references to like_regex

2021-11-02 Thread Gilles Darold

Hi,


Since we have the regexp_like operator, I have found two references in 
the documentation to PostgreSQL lacking a LIKE_REGEX implementation. 
Here is a patch to fix the documentation. I simply remove the statement 
that LIKE_REGEX does not exist in PostgreSQL from chapter "9.7.3.8 
Differences from XQuery", and modify chapter "9.16.2.3. SQL/JSON Regular 
Expressions" to mention the REGEXP_LIKE operator. There may be better 
wording for the second fix.



Best regards,

--
Gilles Darold

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 4b49dff2ff..cf3b2cc827 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -7353,26 +7353,12 @@ SELECT regexp_match('abc01234xyz', '(?:(.*?)(\d+)(.*)){1,1}');
 
 

-   Differences from XQuery (LIKE_REGEX)
-
-   
-LIKE_REGEX
-   
+   Differences from XQuery
 

 XQuery regular expressions

 
-
- Since SQL:2008, the SQL standard includes
- a LIKE_REGEX operator that performs pattern
- matching according to the XQuery regular expression
- standard.  PostgreSQL does not yet
- implement this operator, but you can get very similar behavior using
- the regexp_match() function, since XQuery
- regular expressions are quite close to the ARE syntax described above.
-
-
 
  Notable differences between the existing POSIX-based
  regular-expression feature and XQuery regular expressions include:
@@ -17456,11 +17442,11 @@ $[*] ? (@ like_regex "^[aeiou]" flag "i")
 
  The SQL/JSON standard borrows its definition for regular expressions
  from the LIKE_REGEX operator, which in turn uses the
- XQuery standard.  PostgreSQL does not currently support the
- LIKE_REGEX operator.  Therefore,
- the like_regex filter is implemented using the
- POSIX regular expression engine described in
- .  This leads to various minor
+ XQuery standard.  PostgreSQL supports the REGEX_LIKE
+ operator which is the equivalent minus some differences described in
+ . The like_regex
+ filter is implemented using the POSIX regular expression engine, see
+ . This leads to various minor
  discrepancies from standard SQL/JSON behavior, which are cataloged in
  .
  Note, however, that the flag-letter incompatibilities described there


Re: [PATCH] Proposal for HIDDEN/INVISIBLE column

2021-10-28 Thread Gilles Darold
On 28/10/2021 at 16:31, Bruce Momjian wrote:
> On Thu, Oct 28, 2021 at 11:30:27AM +0200, Gilles Darold wrote:
>> Fixed with new patch version v7 attached. It also fixes unwanted change
>> of some regression tests output reported by the cfbot because I forgot
>> to change my locale.
>>
>>
>> I will also add a pg_dump test to verify that ALTER ... SET UNEXPANDED
>> statements are well generated in the dump.
> I want to state I still think this feature is not generally desired, and
> is better implemented at the query level.

I think that an implementation at the query level would cover the user's
need but not the developer's need to "hide" technical columns; it also
does not cover INSERT statements without a column list.


Personally I will not try to convince anyone further, as I am running out
of arguments; I just wanted to attach a fully working patch to test the
proposal. So unless more people are interested in this feature, I suggest
we not waste more time on it.


-- 
Gilles Darold





Re: [PATCH] Proposal for HIDDEN/INVISIBLE column

2021-10-18 Thread Gilles Darold
On 18/10/2021 at 17:24, Vik Fearing wrote:
> On 10/18/21 8:44 AM, Gilles Darold wrote:
>> On 17/10/2021 at 23:48, Isaac Morland wrote:
>>> On Sun, 17 Oct 2021 at 17:42, Gilles Darold <gil...@migops.com> wrote:
>>>
>>> Note that psql doesn't display a separate line for each row in this
>>> case, but the actual result coming back from the server does contain
>>> the appropriate number of rows. 
>> I was not aware of that. In this case perhaps we can remove the
>> restriction on having at least one expandable column, and we will have
>> the same behavior, though I can't think of a reason to allow that.
> Allowing no-column tables removed the need to handle a bunch of corner
> cases.  Useful for users or not, the precedent is set.


I agree. Now that I know it is perfectly possible to return N rows
without any data/columns, I also think we should allow it, for
consistency with PostgreSQL's behavior for a table with no columns. I
will remove the check in SET UNEXPANDED.

-- 
Gilles Darold





Re: [PATCH] Proposal for HIDDEN/INVISIBLE column

2021-10-18 Thread Gilles Darold
On 17/10/2021 at 23:48, Isaac Morland wrote:
> On Sun, 17 Oct 2021 at 17:42, Gilles Darold <gil...@migops.com> wrote:
>
> Perhaps I misunderstand what you are saying, but a no-columns table
> definitely can return rows:
>
> psql (12.2)
> Type "help" for help.
>
> postgres=# create table nada ();
> CREATE TABLE
> postgres=# insert into nada default values;
> INSERT 0 1
> postgres=# insert into nada default values;
> INSERT 0 1
> postgres=# table nada;
> --
> (2 rows)
>
> postgres=#
>
> Note that psql doesn't display a separate line for each row in this
> case, but the actual result coming back from the server does contain
> the appropriate number of rows. 


I was not aware of that. In this case perhaps we can remove the
restriction on having at least one expandable column, and we will have
the same behavior, though I can't think of a reason to allow that.


-- 
Gilles Darold



Re: [PATCH] Proposal for HIDDEN/INVISIBLE column

2021-10-17 Thread Gilles Darold
On 17/10/2021 at 23:04, Vik Fearing wrote:
> On 10/17/21 11:01 PM, Gilles Darold wrote:
>>   - Add a check into SET UNEXPANDED code to verify that there is at
>> least one column expanded.
> What is the point of this?  Postgres allows column-less tables.
>
> Both of these statements are valid:
>
>  - CREATE TABLE nada ();
>  - SELECT;


Yes, my first thought was to allow all columns to be unexpandable, like a
table without columns, but the problem is that when you execute
"SELECT * FROM nada" it returns no rows, which is not the case for a table
with hidden columns. I could fix that to return no rows if all columns
are unexpandable, but I think hiding all columns is nonsense, so I
preferred to disallow it and raise an error.


Also, I've just realized that applying the unexpandable column feature to
PL/pgSQL breaks the use of %ROWTYPE. It contains all columns, so when
used as a variable to receive a SELECT * or RETURNING * INTO, it will not
work. I will try to fix that.


-- 
Gilles Darold





Re: [PATCH] Proposal for HIDDEN/INVISIBLE column

2021-10-16 Thread Gilles Darold
On 15/10/2021 at 20:51, Bruce Momjian wrote:
> Why is this not better addressed by creating a view on the original
> table, even perhaps renaming the original table and create a view using
> the old table name.

Because when you use the view for the SELECT, you cannot use the
"hidden" column in your query, for example in the WHERE or ORDER BY
clause.  Also, if you have a hundred tables, say each with a tsvector
column that you want to unexpand, you will have to create a hundred
views.  The other problem is writing through the view: if you have a
complex modification involving other tables in the query, you have to
define rules. Handling a technical column through a view over the real
table requires a lot of work; this feature would save a lot of that time.
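For comparison, the view-based workaround discussed here looks roughly like this (table and column names are hypothetical):

```sql
-- Hide a technical tsvector column behind a view (hypothetical names):
ALTER TABLE article RENAME TO article_real;
CREATE VIEW article AS
    SELECT id, title, body          -- search_vec deliberately omitted
    FROM article_real;
-- Writes through the view now require rules or INSTEAD OF triggers,
-- and WHERE/ORDER BY on search_vec is no longer possible via the view.
```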

-- 
Gilles Darold





Re: [PATCH] Proposal for HIDDEN/INVISIBLE column

2021-10-16 Thread Gilles Darold
On 15/10/2021 at 18:42, Aleksander Alekseev wrote:
> Hi Gilles,
>
>> Yes, I didn't want to offend you or to troll. This was just to point
>> out that the position that "SELECT * is bad practice" is not a good
>> argument from my point of view, simply because it is allowed for
>> everyone. I mean that in an extension, or a client that allows user
>> query input, we must handle the case.
> Sure, no worries. And my apologies if my feedback seemed a little harsh.
>
> I'm sure our goal is mutual - to make PostgreSQL even better than it
> is now. Finding a consensus occasionally can take time though.
>
Right, no problem Aleksander; my English speaking and understanding are
not very good, which doesn't help either.  Let's have a beer next time :-)





Re: [PATCH] Proposal for HIDDEN/INVISIBLE column

2021-10-15 Thread Gilles Darold

On 15/10/2021 at 21:52, Andrew Dunstan wrote:

On 10/15/21 2:51 PM, Bruce Momjian wrote:

On Fri, Oct 15, 2021 at 11:32:53AM +0200, Laurenz Albe wrote:

On Thu, 2021-10-14 at 13:16 +0200, Gilles Darold wrote:

Here is a proposal to implement a HIDDEN columns feature in PostgreSQL.

User-defined columns are always visible in PostgreSQL. If a user
wants to hide some column(s) from the values returned by SELECT *, the
hidden columns feature is useful. A hidden column can always be used and
returned by referring to it explicitly in the query.

When I read your proposal, I had strangely mixed feelings:
"This is cute!" versus "Do we need that?".  After some thinking, I think
that it boils down to the following:

That feature is appealing to people who type SQL statements into psql,
which is probably the majority of the readers on this list.  It is
immediately clear that this can be used for all kinds of nice things.

On the other hand: a relational database is not a spreadsheet, where
I want to hide or highlight columns.  Sure, the interactive user may
use it in that way, but that is not the target of a relational database.
Databases usually are not user visible, but used by an application.
So the appeal for the interactive user is really pretty irrelevant.

Now this patch makes certain things easier, but it adds no substantially
new functionality: I can exclude a column from display as it is, simply
by listing all the other columns.  Sure, that's a pain for the interactive
user, but it is irrelevant for a query in an application.

This together with the fact that it poses complicated questions when
we dig deeper, such as "what about whole-row references?", tilts my vote.
If it were for free, I would say +1.  But given the ratio of potential
headache versus added real-life benefit, I find myself voting -1.

I can see the usefulness of this, though UNEXPANDED seems clearer.
However, it also is likely to confuse someone who does SELECT * and then
can't figure out why another query is showing a column that doesn't
appear in SELECT *.  I do think SELECT * EXCEPT is the better and less
confusing solution.  I can imagine people using different EXCEPT columns
for different queries, which HIDDEN/UNEXPANDED does not allow.  I
frankly can't think of a single case where output is specified at the
DDL level.

Why is this not better addressed by creating a view on the original
table, even perhaps renaming the original table and create a view using
the old table name.


That's pretty much my feeling. This seems a bit too cute.


I have a little function I use to create a skeleton query on tables with
lots of columns just so I can delete a few and leave the rest, a problem
that would be solved neatly by the EXCEPT proposal and not but the
HIDDEN proposal.



I have nothing against seeing EXCEPT included in core, except that it is 
a big strain on the SQL standard, and I personally doubt I would use it, 
for portability reasons. That said, this syntax would also encourage the 
use of SELECT *, which is in contradiction with the common opinion.



But again, I don't think this is the same feature; SELECT * EXCEPT is 
only useful for a single non-portable statement. It does not help to 
extend PostgreSQL through extensions, nor does it solve application 
migration issues. I'm a bit surprised by this confusion with the EXCEPT 
syntax.



--
Gilles Darold



Re: [PATCH] Proposal for HIDDEN/INVISIBLE column

2021-10-15 Thread Gilles Darold

On 15/10/2021 at 14:24, Aleksander Alekseev wrote:

Hi Gilles,


If we really want SELECT * to be reserved to DBA then why not removing the
star from PG unless you have the admin privilege?

Respectfully, I perceive this as trolling (presumably non-intentional)
and am not going to answer it.



Yes, I didn't want to offend you or to troll. This was just to point out 
that the position that "SELECT * is bad practice" is not a good argument 
from my point of view, simply because it is allowed for everyone. I mean 
that in an extension, or a client that allows user query input, we must 
handle the case.



--
Gilles Darold





Re: [PATCH] Proposal for HIDDEN/INVISIBLE column

2021-10-15 Thread Gilles Darold

On 15/10/2021 at 10:37, Aleksander Alekseev wrote:

Hi Gilles,

> I can turn the column hidden and I will not have to modify my old 
very good application.


I see your point. At the same time, I believe the statement above 
shows the root reason why we have a different view on this feature. 
The application should never have used SELECT * in the first place. 
This is a terrible design - you add a column or change their order and 
the application is broken. And I don't believe the DBMS core is the 
right place for placing hacks for applications like this. This should 
be solved in the application itself or in some sort of proxy server 
between the application and DBMS. SELECT * is intended to be used by 
people e.g. DBA.



Yes, I understand this point. Personally, I have used PostgreSQL, and 
exclusively PostgreSQL, for 25 years, so I am aware of that and try to do 
my best regarding SQL code quality. But we see more and more applications 
coming from other RDBMS, sometimes with no real possibility of modifying 
the code, or only at the cost of a lot of work. To give another use case: 
some time ago I wrote an extension (https://github.com/darold/pgtt-rsl) 
that uses a technical column, based on a composite type built from the 
backend start time and pid, to emulate global temporary tables. To hide 
this column from the user's queries, I had to create a view and route any 
action on that view to the real underlying table in the extension's C 
code. If the hidden column feature had been implemented, it would have 
saved me some time. I see several other possible extensions that could 
benefit from this feature.



As I said, when you develop an extension you cannot just tell users never 
to use SELECT * if they want to use your extension. At least, that is 
something I would never say, even if it is a bad practice, so I have to 
find a solution to avoid exposing technical columns. If we really want 
SELECT * to be reserved for DBAs, then why not remove the star from 
PostgreSQL unless you have admin privileges?



--
Gilles Darold





Re: [PATCH] Proposal for HIDDEN/INVISIBLE column

2021-10-15 Thread Gilles Darold

On 15/10/2021 at 09:47, Aleksander Alekseev wrote:



Just to remind here, there was recently a proposal to handle this
problem another way - provide a list of columns to skip for "star
selection" aka "SELECT * EXCEPT col1...".

https://postgrespro.com/list/id/d51371a2-f221-1cf3-4a7d-b2242d4da...@gmail.com

[...]

I feel using EXCEPT would be a lot clearer; no one is likely to be
misled into thinking that it is a security feature, unlike 'HIDDEN'.
Also you know that SELECT * will select all columns.

If this kind of feature were to be added, then I'd give a +1 to use the
EXCEPT syntax.

+1 to that, personally I would love to have SELECT * EXCEPT ... syntax
in PostgreSQL. Also, I discovered this feature was requested even
earlier, in 2007 [1]


I don't think the EXCEPT syntax will be adopted, as it changes the
SQL syntax for SELECT in a non-standard way. That is not the case for the
hidden column feature, which doesn't touch the SELECT or INSERT syntax.

HIDDEN columns affect SELECT and INSERT behaviour in the same
non-standard way, although maybe without changing the syntax.
Personally, I believe this is even worse. The difference is that with
`SELECT * EXCEPT` you explicitly state what you want, while HIDDEN
columns do this implicitly. Extending the syntax beyond standards in a
reasonable way doesn't seem to be a problem. As a recent example in
this thread [2] the community proposed to change the syntax in
multiple places at the same time.

`SELECT * EXCEPT` solves the same problem as HIDDEN columns, but is
much easier to implement and maintain. Since it's a simple syntax
sugar it doesn't affect the rest of the system.



That's not true; this is not the same feature. The EXCEPT clause will 
not return columns that you don't want in a specific request. I have 
nothing against that, but you have to name them explicitly. I'm thinking 
of the kind of bad design we commonly find, like a table with 
attribute1 ... attribute20. If we could use a regexp with EXCEPT, like 
'attribute\d+', that could be helpful too. But this is another thread.



The hidden column feature hides the column for all queries using the 
wildcard on the concerned table. For example, if I have to import a 
database with OIDs enabled from an old dump and I want to prevent the OID 
column from being returned through star expansion, I can make the column 
hidden and will not have to modify my good old application. I caricature, 
but this is the kind of thing that could happen. I see several other 
possible uses of this feature in extensions that could use a technical 
column the user must not see via the wildcard. Also, as Vik and Dave 
mention, being able to hide all tsvector columns from queries without 
having to specify them as exceptions in each query can save some time.
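As a sketch of that OID scenario, using the ALTER COLUMN syntax discussed upthread (table name hypothetical; SET UNEXPANDED per the later revisions of the patch):

```sql
-- Hypothetical: keep the legacy oid column but drop it from star expansion.
ALTER TABLE legacy_app_table ALTER COLUMN oid SET UNEXPANDED;
-- SELECT * FROM legacy_app_table       -> oid no longer returned
-- SELECT oid, * FROM legacy_app_table  -> oid still available explicitly
```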



IMHO this is definitely not the same feature.


--
Gilles Darold





Re: [PATCH] Proposal for HIDDEN/INVISIBLE column

2021-10-14 Thread Gilles Darold
On 14/10/2021 at 22:01, Gavin Flower wrote:
> On 15/10/21 07:01, Josef Šimánek wrote:
>> On Thu, 14 Oct 2021 at 13:17, Gilles Darold wrote:
>>> Hi,
>>>
>>>
>>> Here is a proposal to implement a HIDDEN columns feature in PostgreSQL.
>>>
>>> User-defined columns are always visible in PostgreSQL. If a user
>>> wants to hide some column(s) from the values returned by SELECT *,
>>> the hidden columns feature is useful. A hidden column can always be
>>> used and returned by referring to it explicitly in the query.
>>>
>>> I agree that views are meant for that, and that using SELECT * is a
>>> bad practice, but sometimes we may need to "technically" prevent some
>>> columns from being part of star expansion without being forced to use
>>> views + rules.
>> Just to remind here, there was recently a proposal to handle this
>> problem another way - provide a list of columns to skip for "star
>> selection" aka "SELECT * EXCEPT col1...".
>>
>> https://postgrespro.com/list/id/d51371a2-f221-1cf3-4a7d-b2242d4da...@gmail.com
>>
>
> [...]
>
> I feel using EXCEPT would be a lot clearer; no one is likely to be
> misled into thinking that it is a security feature, unlike 'HIDDEN'.
> Also you know that SELECT * will select all columns.
>
> If this kind of feature were to be added, then I'd give a +1 to use
> the EXCEPT syntax.


I don't think the EXCEPT syntax will be adopted, as it changes the
SQL syntax for SELECT in a non-standard way. That is not the case for the
hidden column feature, which doesn't touch the SELECT or INSERT syntax.





Re: [PATCH] Proposal for HIDDEN/INVISIBLE column

2021-10-14 Thread Gilles Darold

On 14/10/2021 at 20:55, Gilles Darold wrote:


gilles=# SELECT row_to_json(t.*) FROM htest0 t;
   row_to_json
--
 {"a":1,"b":"htest0 one"}
 {"a":2,"b":"htest0 two"}
(2 rows)

gilles=# SELECT row_to_json(t) FROM htest0 t;
   row_to_json
--
 {"a":1,"b":"htest0 one"}
 {"a":2,"b":"htest0 two"}
(2 rows)



Tom, I have probably misunderstood what you said about the cases above. 
Do you mean that the column should not be visible there either? I thought 
not, but maybe I'm wrong; I will fix that.



--
Gilles Darold





Re: [PATCH] Proposal for HIDDEN/INVISIBLE column

2021-10-14 Thread Gilles Darold

On 14/10/2021 at 20:26, Tom Lane wrote:

"David G. Johnston"  writes:

Taking this a bit further, I dislike tying the suppression of the column
from the select-list star to the behavior of insert without a column list
provided.  I’m not fully on board with having an attribute that is not
fundamental to the data model but rather an instruction about how that
column interacts with SQL; separating the two aspects, though, would help.
I accept the desire to avoid star expansion much more than default columns
for insert.

Yeah, me too.  I think it would add a lot of clarity if we defined this
as "this affects the behavior of SELECT * and nothing else" ... although
even then, there are squishy questions about how much it affects the
behavior of composite datums that are using the column's rowtype.
But as soon as you want it to bleed into INSERT, you start having a
lot of questions about what else it should bleed into, as Aleksander
already mentioned.



I don't agree; expansion is performed when no column list is provided, 
and this affects both SELECT and INSERT. It covers the same need: being 
able to remove a column from the target list when it is not explicitly 
set. This is how the feature is known, and I'm not in favor of tearing 
off a leg.



--
Gilles Darold





Re: [PATCH] Proposal for HIDDEN/INVISIBLE column

2021-10-14 Thread Gilles Darold

On 14/10/2021 at 20:43, Tom Lane wrote:

Re-reading that, I realize I probably left way too much unstated,
so let me spell it out.

Should this feature affect
SELECT * FROM my_table t;
?  Yes, absolutely.

How about
SELECT t.* FROM my_table t;
?  Yup, one would think so.

Now how about
SELECT row_to_json(t.*) FROM my_table t;
?  All of a sudden, I'm a lot less sure --- not least because we *can't*
simply omit some columns, without the composite datum suddenly not being
of the table's rowtype anymore, which could have unexpected effects on
query semantics.  In particular, if we have a user-defined function
that's defined to accept composite type my_table, I don't think we can
suppress columns in
SELECT myfunction(t.*) FROM my_table t;

And don't forget that these can also be spelled like
SELECT row_to_json(t) FROM my_table t;
without any star visible anywhere.

So the more I think about this, the squishier it gets.  I'm now sharing
the fears expressed upthread about whether it's even possible to define
this in a way that won't have a lot of gotchas.

regards, tom lane



You mean this ? :-)


gilles=# CREATE TABLE htest0 (a int PRIMARY KEY, b text NOT NULL HIDDEN);
CREATE TABLE
gilles=# INSERT INTO htest0 (a, b) VALUES (1, 'htest0 one');
INSERT 0 1
gilles=# INSERT INTO htest0 (a, b) VALUES (2, 'htest0 two');
INSERT 0 1

gilles=# SELECT * FROM htest0 t;
 a
---
 1
 2
(2 rows)

gilles=# SELECT t.* FROM htest0 t;
 a
---
 1
 2
(2 rows)

gilles=# SELECT row_to_json(t.*) FROM htest0 t;
   row_to_json
--
 {"a":1,"b":"htest0 one"}
 {"a":2,"b":"htest0 two"}
(2 rows)

gilles=# SELECT row_to_json(t) FROM htest0 t;
   row_to_json
--
 {"a":1,"b":"htest0 one"}
 {"a":2,"b":"htest0 two"}
(2 rows)


You should have a look at the patch; I don't think the way it is done 
there can have gotchas.


--
Gilles Darold





Re: [PATCH] Proposal for HIDDEN/INVISIBLE column

2021-10-14 Thread Gilles Darold

On 14/10/2021 at 19:44, Tom Lane wrote:

As for the proposal itself, I'm kind of allergic to the terminology
you've suggested, because the column is in no way hidden.  It's
still visible in the catalogs, you can still select it explicitly,
etc.  Anybody who thinks this is useful from a security standpoint
is mistaken, but these words suggest that it is.  Perhaps some
terminology like "not expanded" or "unexpanded" would serve better
to indicate that "SELECT *" doesn't expand to include the column.
Or STAR versus NO STAR, maybe.



Agreed, I had this feeling too. I decided to use HIDDEN, as in DB2, just 
because UNEXPANDED seems harder for users to understand, and hidden or 
invisible columns are well known; this is a kind of "vendor standard" 
now. But I agree that it can confuse uninformed people and doesn't 
reflect the real feature. I will rename the keyword to "UNEXPANDED".




I also do not care for the syntax you propose: AFAICS the only reason
you've gotten away with making HIDDEN not fully reserved is that you
require it to be the last attribute of a column, which is something
that will trip users up all the time.  Plus, it does not scale to the
next thing we might want to add.  So if you can't make it a regular,
position-independent element of the ColQualList you shouldn't do it
at all.



Yes I have also noted that and wanted to improve this later if the 
proposal was accepted.




What I think is actually important is the ALTER COLUMN syntax.
We could easily get away with having that be the only syntax for
this --- compare the precedent of ALTER COLUMN SET STATISTICS.



OK, great, I'm fine with that, especially given the previous point :-) I 
will remove it from the CREATE TABLE syntax, except in the LIKE ... 
INCLUDING option.




BTW, you do NOT get to add an information_schema column for
this.  The information_schema is defined in the SQL standard.
Yes, I'm aware that mysql feels free to "extend" the standard
in that area; but our policy is that the only point of having the
information_schema views at all is if they're standard-compliant.


Ok, I will remove it.


--
Gilles Darold





Re: [PATCH] Proposal for HIDDEN/INVISIBLE column

2021-10-14 Thread Gilles Darold

On 14/10/2021 at 17:38, Jaime Casanova wrote:

On Thu, Oct 14, 2021 at 01:16:45PM +0200, Gilles Darold wrote:

Hi,


Here is a proposal to implement a HIDDEN columns feature in PostgreSQL.


Great! Actually I found this very useful, especially for those people
using big fields (geometry, files, large texts).


User-defined columns are always visible in PostgreSQL. If a user
wants to hide some column(s) from the values returned by SELECT *, the
hidden columns feature is useful. A hidden column can always be used and
returned by referring to it explicitly in the query.

I agree that views are meant for that, and that using SELECT * is a bad
practice

And a common one, even if we want to think otherwise. I have found it at
almost every customer where I have had the bad luck to see the code or
the SELECTs.

Not counting that sometimes we have columns for optimization, as Dave
said about hiding a ts_vector column.

Another use case I can think of is not covered in this patch, but it
could be (I hope!) or even if not I would like opinions on this idea.
What about a boolean GUC log_hidden_column that throws a LOG message when
a hidden column is used directly?

The intention is to mark a to-be-deleted column as HIDDEN and then check
the logs to see whether it is still being used somewhere. I know systems
where they carry the baggage of deprecated columns only because they
don't know if some system is still using them.

I know this would be extending your original proposal, and understand if
you decide is not a first patch material.



Why not; I will add it if there is a consensus about logging hidden 
column use. It is not much work.



--
Gilles Darold





Re: [PATCH] Proposal for HIDDEN/INVISIBLE column

2021-10-14 Thread Gilles Darold

On 14/10/2021 at 14:28, Pavel Stehule wrote:



On Thu, 14 Oct 2021 at 14:13, Vik Fearing <v...@postgresfriends.org> wrote:


On 10/14/21 1:47 PM, Aleksander Alekseev wrote:
> Hi Gilles,
>
>> Any though and interest in this feature?
>
> Personally, I wouldn't call this feature particularly useful.
> `SELECT *` is intended for people who are working with the DBMS
> directly, e.g. via psql, and want to see ALL columns.

I disagree strongly with this.  It is really annoying when working
interactively with psql on a table that has a PostGIS geometry column,
or any other large blobby type column.

I have not looked at the patch, but +1 for the feature.


Wouldn't it be better to redefine some output strategies for certain types?

I can agree that sometimes, in some environments, the proposed feature 
can be nice, but it can be a strong footgun too.


Maybe awkward data could be filtered in psql, and that might be a better 
solution. I agree that printing a long geometry value in psql is usually 
useless.



Pavel, this doesn't concern only output but input too; think about 
INSERT or COPY without a column list. We could add such a filter in psql, 
but what about other clients? They would all have to implement their own 
filtering method. I think the HIDDEN attribute provides a common, basic 
way to implement this for all client applications.



--
Gilles Darold
http://www.darold.net/



Re: [PATCH] Proposal for HIDDEN/INVISIBLE column

2021-10-14 Thread Gilles Darold

On 14/10/2021 at 14:09, Aleksander Alekseev wrote:

Hi again,

> So all in all, -1. [...]

Here is something I would like to add:

1. As far as I know, "all the rest of DBMS have this" was never a good 
argument in the PostgreSQL community. Generally, using it will turn 
people against you.



I cited the implementations in other RDBMS because it helps to 
understand the feature; it shows the state of the art and illustrates my 
needs. If making references to other implementations turns people 
against me, I think they have the wrong approach to this proposal, and 
if we refuse features because they are implemented in other RDBMS, that 
is even worse. I do not agree with this comment.



2. I recall there was a proposal of making the SQL syntax itself 
extendable. To my knowledge, this is still a wanted feature [1]. In 
theory, that would allow you to implement the feature you want in an 
extension.



From what I've read in this thread, 
https://www.postgresql.org/message-id/flat/20210501072458.adqjoaqnmhg4l34l%40nol, 
there is no real consensus on how this feature should be implemented. 
But I agree that if an implementation through an extension were possible, 
I would not write a patch for core but an extension; that is my usual 
approach.



Best regards,

--
Gilles Darold
http://www.darold.net/





Re: [PATCH] Proposal for HIDDEN/INVISIBLE column

2021-10-14 Thread Gilles Darold

On 14/10/2021 at 13:47, Aleksander Alekseev wrote:

Hi Gilles,


Any though and interest in this feature?

Personally, I wouldn't call this feature particularly useful. `SELECT
*` is intended for people who are working with DBMS directly e.g. via
psql and want to see ALL columns. The applications should never use
`SELECT *`. So I can't see any real benefits of adding this feature to
PostgreSQL. It will only make the existing code and the existing user
interface even more complicated than they are now.



Thanks for your comments Aleksander. This was also my thought at the 
beginning, but unfortunately there are cases where things are not so simple 
and just relying on SELECT * is dirty or forbidden.  Hidden columns are not 
only useful for SELECT * but also for INSERT without a column list, 
although INSERT without a column list is also a bad practice.




Also, every yet another feature is x N corner cases when this feature
works with other N features of PostgreSQL. How should it work with
partitioned or inherited tables? Or with logical replication? With
pg_dump? With COPY?



I recommend you have a look at my patch, because the partitioned and 
inherited cases are covered. For logical replication I guess that any 
change in pg_attribute is also replicated, so I would say that it is fully 
supported. But obviously I may have missed something. pg_dump and COPY are 
also supported.



Actually the patch only prevents a hidden column from being part of star 
expansion in the returned columns. I don't think there are corner cases in 
the other parts of the code, apart from the need to prevent a table from 
having all of its columns hidden. But I could have missed something, I agree.




So all in all, -1. This being said, I very much appreciate your
attempt to improve PostgreSQL. However next time before writing the
code I suggest submitting an RFC first.



Don't worry about my time spent for the PG community; this patch is a speck 
of dust in my contributions to open source :-) I have provided the patch to 
show the concept and how easily it can be implemented.  It can also be used 
in some PostgreSQL forks if anyone is interested in this feature.



--

Gilles Darold





Re: Schema variables - new implementation for Postgres 15

2021-09-09 Thread Gilles Darold

On 09/09/2021 at 11:40, Pavel Stehule wrote:

Hi

On Thu, Sep 9, 2021 at 12:21, Erik Rijkers <mailto:e...@xs4all.nl> wrote:


 > [schema-variables-20210909.patch]

Hi Pavel,

The patch applies and compiles fine but 'make check' for the
assert-enabled fails on 131 out of 210 tests.


This morning I tested it. I'll recheck it.

Pavel



I did not have this problem yesterday.


--
Gilles Darold
http://www.darold.net/



Re: Schema variables - new implementation for Postgres 15

2021-09-08 Thread Gilles Darold

On 08/09/2021 at 13:41, Pavel Stehule wrote:

Hi

On Sat, Aug 28, 2021 at 11:57, Gilles Darold <mailto:gil...@darold.net> wrote:


Hi,

Review resume:


This patch implements Schema Variables that are database objects
that can hold a single or composite value following the data type
used at variable declaration. Schema variables, like relations,
exist within a schema and their access is controlled via GRANT and
REVOKE commands. The schema variable can be created by the CREATE
VARIABLE command, altered using ALTER VARIABLE and removed using
DROP VARIABLE.

The value of a schema variable is local to the current session.
Retrieving a variable's value returns either a NULL or a default
value, unless its value is set to something else in the current
session with a LET command. The content of a variable is not
transactional. This is the same as in regular variables in PL
languages.

Schema variables are retrieved by the SELECT SQL command. Their
value is set with the LET SQL command. While schema variables
share properties with tables, their value cannot be updated with
an UPDATE command.


The patch applies with the patch command without problems and
compilation reports no warnings or errors. Regression tests pass
successfully using make check or make installcheck.
It also includes all documentation and regression tests.

Performance is close to that of setting a plpgsql variable,
which is impressive:

do $$
declare var1 int ; i int;
begin
  for i in 1..100
  loop
    var1 := i;
  end loop;
end;
$$;
DO
Time: 71,515 ms

CREATE VARIABLE var1 AS integer;
do $$
declare i int ;
begin
  for i in 1..100
  loop
    let var1 = i;
  end loop;
end;
$$;
DO
Time: 94,658 ms

There is just one thing that puzzles me. We can use:

    CREATE VARIABLE var1 AS date NOT NULL;
    postgres=# SELECT var1;
    ERROR:  null value is not allowed for NOT NULL schema variable
"var1"

which I understand and is the right behavior. But if we use:

    CREATE IMMUTABLE VARIABLE var1 AS date NOT NULL;
    postgres=# SELECT var1;
    ERROR:  null value is not allowed for NOT NULL schema variable
"var1"
    DETAIL:  The schema variable was not initialized yet.
    postgres=# LET var1=current_date;
    ERROR:  schema variable "var1" is declared IMMUTABLE

It would probably be better not to allow NOT NULL when IMMUTABLE
is used, because the variable cannot be used at all.  Probably
IMMUTABLE without a DEFAULT value should also be restricted, as
it makes no sense. If the user wants the variable to be NULL he
must use DEFAULT NULL. This is just a thought; the above error
messages are explicit and the user can understand what wrong
declaration he has made.


I wrote a check that disables this case.  Please see the attached 
patch. I agree that this case is confusing, and it is better to disable it.




Great, I also think it is better not to confuse the user.

    postgres=# CREATE IMMUTABLE VARIABLE var1 AS date NOT NULL;
    ERROR:  IMMUTABLE NOT NULL variable requires default expression

Working as expected. I have moved the patch to "Ready for committers". 
Thanks for this feature.



--
Gilles Darold
http://www.darold.net/



Re: [PATCH] Hooks at XactCommand level

2021-09-04 Thread Gilles Darold

I have changed the status of this proposal to rejected.


To summarize the final state of this proposal: there is no consensus on the 
interest of adding a hook on transaction command start. Also, the only 
useful case for this hook was to allow server-side automatic rollback at 
statement level. That is regrettable because I don't think PostgreSQL will 
have such a feature for a long time (which is probably better), and a way 
to implement it externally through an extension would be helpful for 
migrations from other RDBMS like DB2 or Oracle. The only ways to have this 
feature are to handle the rollback at client side using savepoints, which 
is at least 3 times slower than a server-side implementation, or not to use 
it at all. Besides not being performant, it does not scale due to txid 
wraparound. The last option is to use a proprietary fork of PostgreSQL; 
some offer this feature.


--
Gilles Darold
http://www.darold.net/






Re: Schema variables - new implementation for Postgres 15

2021-08-28 Thread Gilles Darold
Hi,

Review resume:


This patch implements Schema Variables that are database objects that
can hold a single or composite value following the data type used at
variable declaration. Schema variables, like relations, exist within a
schema and their access is controlled via GRANT and REVOKE commands. The
schema variable can be created by the CREATE VARIABLE command, altered
using ALTER VARIABLE and removed using DROP VARIABLE.

The value of a schema variable is local to the current session.
Retrieving a variable's value returns either a NULL or a default value,
unless its value is set to something else in the current session with a
LET command. The content of a variable is not transactional. This is the
same as in regular variables in PL languages.

Schema variables are retrieved by the SELECT SQL command. Their value is
set with the LET SQL command. While schema variables share properties
with tables, their value cannot be updated with an UPDATE command.


The patch applies with the patch command without problems and compilation
reports no warnings or errors. Regression tests pass successfully using
make check or make installcheck.
It also includes all documentation and regression tests.

Performance is close to that of setting a plpgsql variable, which is
impressive:

do $$
declare var1 int ; i int;
begin
  for i in 1..100
  loop
    var1 := i;
  end loop;
end;
$$;
DO
Time: 71,515 ms

CREATE VARIABLE var1 AS integer;
do $$
declare i int ;
begin
  for i in 1..100
  loop
    let var1 = i;
  end loop;
end;
$$;
DO
Time: 94,658 ms

There is just one thing that puzzles me. We can use:

    CREATE VARIABLE var1 AS date NOT NULL;
    postgres=# SELECT var1;
    ERROR:  null value is not allowed for NOT NULL schema variable "var1"

which I understand and is the right behavior. But if we use:

    CREATE IMMUTABLE VARIABLE var1 AS date NOT NULL;
    postgres=# SELECT var1;
    ERROR:  null value is not allowed for NOT NULL schema variable "var1"
    DETAIL:  The schema variable was not initialized yet.
    postgres=# LET var1=current_date;
    ERROR:  schema variable "var1" is declared IMMUTABLE

It would probably be better not to allow NOT NULL when IMMUTABLE is
used, because the variable cannot be used at all.  Probably IMMUTABLE
without a DEFAULT value should also be restricted, as it makes no
sense. If the user wants the variable to be NULL he must use DEFAULT
NULL. This is just a thought; the above error messages are explicit and
the user can understand what wrong declaration he has made.

Apart from that, I think this patch is ready for committers, so if there is
no other opinion in favor of restricting the use of IMMUTABLE with NOT
NULL and DEFAULT, I will change the status to ready for committers.

-- 
Gilles Darold
http://www.darold.net/



Re: [PATCH] Hooks at XactCommand level

2021-08-13 Thread Gilles Darold
On 13/08/2021 at 11:58, Andres Freund wrote:
> Hi,
>
> On 2021-08-10 10:12:26 +0200, Gilles Darold wrote:
>> Sorry for the response delay. I have though about adding this odd hook to be
>> able to implement this feature through an extension because I don't think
>> this is something that should be implemented in core. There were also
>> patches proposals which were all rejected.
>>
>> We usually implement the feature at client side which is imo enough for the
>> use cases. But the problem is that this a catastrophe in term of
>> performances. I have done a small benchmark to illustrate the problem. This
>> is a single process client on the same host than the PG backend.
>>
>> For 10,000 tuples inserted with 50% of failures and rollback at statement
>> level handled at client side:
>>
>>     Expected: 5001, Count:  5001
>>     DML insert took: 13 wallclock secs ( 0.53 usr +  0.94 sys =  1.47
>> CPU)
> Something seems off here. This suggests every insert took 2.6ms. That
> seems awfully long, unless your network latency is substantial. I did a
> quick test implementing this in the naive-most way in pgbench, and I get
> better times - and there's *lots* of room for improvement.
>
> I used a pgbench script that sent the following:
> BEGIN;
> SAVEPOINT insert_fail;
> INSERT INTO testinsert(data) VALUES (1);
> ROLLBACK TO SAVEPOINT insert_fail;
> SAVEPOINT insert_success;
> INSERT INTO testinsert(data) VALUES (1);
> RELEASE SAVEPOINT insert_success;
> {repeat 5 times}
> COMMIT;
>
> I.e. 5 failing and 5 succeeding insertions wrapped in one transaction. I
> get >2500 tps, i.e. > 25k rows/sec. And it's not hard to optimize that
> further - the {ROLLBACK TO,RELEASE} SAVEPOINT; SAVEPOINT; INSERT can be
> sent in one roundtrip. That gets me to somewhere around 40k rows/sec.
>
>
> BEGIN;
>
> \startpipeline
> SAVEPOINT insert_fail;
> INSERT INTO testinsert(data) VALUES (1);
> \endpipeline
>
> \startpipeline
> ROLLBACK TO SAVEPOINT insert_fail;
> SAVEPOINT insert_success;
> INSERT INTO testinsert(data) VALUES (1);
> \endpipeline
>
> \startpipeline
> RELEASE SAVEPOINT insert_success;
> SAVEPOINT insert_fail;
> INSERT INTO testinsert(data) VALUES (1);
> \endpipeline
>
> \startpipeline
> ROLLBACK TO SAVEPOINT insert_fail;
> SAVEPOINT insert_success;
> INSERT INTO testinsert(data) VALUES (1);
> \endpipeline
>
> {repeat last two blocks three times}
> COMMIT;
>
> Greetings,
>
> Andres Freund


I have written a Perl script to mimic what I found in an Oracle
batch script importing data into a table. I had this use case in a recent
migration; the only difference is that the batch was written in Java.


$dbh->do("BEGIN") or die "FATAL: " . $dbh->errstr . "\n";
my $start = new Benchmark;
my $sth = $dbh->prepare("INSERT INTO t1 VALUES (?, ?)");
exit 1 if (not defined $sth);
for (my $i = 0; $i <= 1; $i++)
{
    $dbh->do("SAVEPOINT foo") or die "FATAL: " . $dbh->errstr . "\n";
    # Generate a duplicate key each two rows inserted
    my $val = $i;
    $val = $i-1 if ($i % 2 != 0);
    unless ($sth->execute($val, 'insert '.$i)) {
        $dbh->do("ROLLBACK TO foo") or die "FATAL: " . $dbh->errstr . "\n";
    } else {
        $dbh->do("RELEASE foo") or die "FATAL: " . $dbh->errstr . "\n";
    }
}
$sth->finish();
my $end = new Benchmark;

$dbh->do("COMMIT;");

my $td = timediff($end, $start);
print "DML insert took: " . timestr($td) . "\n";


The timings reported are from my personal computer; there is no network
latency, it uses localhost. Anyway, the objective was not to benchmark DML
throughput but the overhead of rollback at statement level handled at
client side versus server side. I guess that you might see the same
speed gain of around x3 to x5 or more depending on the number of tuples?


The full script can be found here
https://github.com/darold/pg_statement_rollbackv2/blob/main/test/batch_script_example.pl
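For readers who want to try the same client-side pattern without a PostgreSQL instance, here is a minimal sketch of the per-statement savepoint logic in Python using sqlite3 (which also supports SAVEPOINT); the table and savepoint names are illustrative, not taken from the benchmark script:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.isolation_level = None          # manage transactions manually
conn.execute("CREATE TABLE t1 (id INTEGER PRIMARY KEY, v TEXT)")

conn.execute("BEGIN")
inserted = 0
for i in range(10):
    # Generate a duplicate key for every odd i, as in the Perl script
    val = i - 1 if i % 2 != 0 else i
    conn.execute("SAVEPOINT foo")        # per-statement savepoint
    try:
        conn.execute("INSERT INTO t1 VALUES (?, ?)", (val, "insert %d" % i))
        conn.execute("RELEASE foo")      # keep the successful insert
        inserted += 1
    except sqlite3.IntegrityError:
        conn.execute("ROLLBACK TO foo")  # undo only the failed statement
conn.execute("COMMIT")

count = conn.execute("SELECT count(*) FROM t1").fetchone()[0]
print(inserted, count)  # prints: 5 5
```

The round trip per statement (SAVEPOINT, the statement itself, then RELEASE or ROLLBACK TO) is exactly the overhead the benchmark above measures against a server-side implementation.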


Cheers,

-- 
Gilles Darold






Re: [PATCH] Hooks at XactCommand level

2021-08-10 Thread Gilles Darold

On 30/07/2021 at 23:49, Tom Lane wrote:

Andres Freund  writes:

On 2021-07-30 13:58:51 -0400, Tom Lane wrote:

I've not read this version of the patch, but I see from the cfbot's
results that it's broken postgres_fdw.

I think this may partially be an issue with the way that postgres_fdw
uses the callback than with the patch. It disconnects from the server
*regardless* of the XactEvent passed to the callback. That makes it
really hard to extend the callback mechanism to further events...

Perhaps.  Nonetheless, I thought upthread that adding these events
as Xact/SubXactCallback events was the wrong design, and I still
think that.  A new hook would be a more sensible way.


I'm *very* unconvinced it makes sense to implement a feature like this
in an extension / that we should expose API for that purpose. To me the
top-level transaction state is way too tied to our internals for it to
be reasonably dealt with in an extension.

Yeah, that's the other major problem --- the use-case doesn't seem
very convincing.  I'm not even sold on the goal, let alone on trying
to implement it by hooking into these particular places.  I think
that'll end up being buggy and fragile as well as not very performant.



I've attached the new version v5 of the patch, which uses a hook instead of 
a xact callback. Compared to the first implementation, calls to the hook 
have been extracted from the start_xact_command() function. The test 
extension has also been updated.



If I understand the last discussions correctly, there is no chance of 
having this hook included. If there is no contrary opinion I will withdraw 
the patch from the commitfest. However, thank you so much for having taken 
the time to review this proposal.



Best regards,

--
Gilles Darold

diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 530caa520b..bc62a2cb98 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -207,6 +207,8 @@ static void log_disconnections(int code, Datum arg);
 static void enable_statement_timeout(void);
 static void disable_statement_timeout(void);
 
+/* Hooks for plugins to get control at end of start_xact_command() */
+XactCommandStart_hook_type start_xact_command_hook = NULL;
 
 /* 
  *		routines to obtain user input
@@ -989,6 +991,13 @@ exec_simple_query(const char *query_string)
 	 */
 	start_xact_command();
 
+	/*
+	 * Now give loadable modules a chance to execute code
+	 * before a transaction command is processed.
+	 */
+	if (start_xact_command_hook)
+		(*start_xact_command_hook) ();
+
 	/*
 	 * Zap any pre-existing unnamed statement.  (While not strictly necessary,
 	 * it seems best to define simple-Query mode as if it used the unnamed
@@ -1082,6 +1091,13 @@ exec_simple_query(const char *query_string)
 		/* Make sure we are in a transaction command */
 		start_xact_command();
 
+		/*
+		 * Now give loadable modules a chance to execute code
+		 * before a transaction command is processed.
+		 */
+		if (start_xact_command_hook)
+			(*start_xact_command_hook) ();
+
 		/*
 		 * If using an implicit transaction block, and we're not already in a
 		 * transaction block, start an implicit block to force this statement
@@ -1361,6 +1377,13 @@ exec_parse_message(const char *query_string,	/* string to execute */
 	 */
 	start_xact_command();
 
+	/*
+	 * Now give loadable modules a chance to execute code
+	 * before a transaction command is processed.
+	 */
+	if (start_xact_command_hook)
+		(*start_xact_command_hook) ();
+
 	/*
 	 * Switch to appropriate context for constructing parsetrees.
 	 *
@@ -1647,6 +1670,13 @@ exec_bind_message(StringInfo input_message)
 	 */
 	start_xact_command();
 
+	/*
+	 * Now give loadable modules a chance to execute code
+	 * before a transaction command is processed.
+	 */
+	if (start_xact_command_hook)
+		(*start_xact_command_hook) ();
+
 	/* Switch back to message context */
 	MemoryContextSwitchTo(MessageContext);
 
@@ -2140,6 +2170,13 @@ exec_execute_message(const char *portal_name, long max_rows)
 	 */
 	start_xact_command();
 
+	/*
+	 * Now give loadable modules a chance to execute code
+	 * before a transaction command is processed.
+	 */
+	if (start_xact_command_hook)
+		(*start_xact_command_hook) ();
+
 	/*
 	 * If we re-issue an Execute protocol request against an existing portal,
 	 * then we are only fetching more rows rather than completely re-executing
@@ -2546,6 +2583,13 @@ exec_describe_statement_message(const char *stmt_name)
 	 */
 	start_xact_command();
 
+	/*
+	 * Now give loadable modules a chance to execute code
+	 * before a transaction command is processed.
+	 */
+	if (start_xact_command_hook)
+		(*start_xact_command_hook) ();
+
 	/* Switch back to message context */
 	MemoryContextSwitchTo(MessageContext);
 
@@ -2641,6 +2685,13 @@ exec_describe_portal_message(const char *portal_name)
 	 */
 	start_xact_command();
 
+	/*
+	 * Now give loadable modules a chance to execute

Re: [PATCH] Hooks at XactCommand level

2021-08-10 Thread Gilles Darold

On 31/07/2021 at 01:28, Andres Freund wrote:



I'm *very* unconvinced it makes sense to implement a feature like this
in an extension / that we should expose API for that purpose. To me the
top-level transaction state is way too tied to our internals for it to
be reasonably dealt with in an extension.

Yeah, that's the other major problem --- the use-case doesn't seem
very convincing.  I'm not even sold on the goal, let alone on trying
to implement it by hooking into these particular places.  I think
that'll end up being buggy and fragile as well as not very performant.

I'm more favorable than you on the overall goal. Migrations to PG are a
frequent and good thing and as discussed before, lots of PG forks ended
up implementing a version of this. Clearly there's demand.



Sorry for the response delay. I have thought about adding this odd hook 
to be able to implement this feature through an extension, because I 
don't think this is something that should be implemented in core. There 
were also patch proposals, which were all rejected.


We usually implement the feature at client side, which is imo enough for 
the use cases. But the problem is that this is a catastrophe in terms of 
performance. I have done a small benchmark to illustrate the problem. 
This is a single-process client on the same host as the PG backend.


For 10,000 tuples inserted with a 50% failure rate and rollback at 
statement level handled at client side:


    Expected: 5001, Count:  5001
    DML insert took: 13 wallclock secs ( 0.53 usr +  0.94 sys =  
1.47 CPU)


Now with rollback at statement level handled at server side, using the 
hook and the extension:


    Expected: 5001, Count:  5001
    DML insert took:  4 wallclock secs ( 0.27 usr +  0.32 sys =  
0.59 CPU)



And with 100,000 tuples it is worse. Without the extension:

    Expected: 50001, Count:  50001
    DML insert took: 1796 wallclock secs (14.95 usr + 20.29 sys = 
35.24 CPU)


With server-side rollback at statement level:

    Expected: 50001, Count:  50001
    DML insert took: 372 wallclock secs ( 4.85 usr +  5.45 sys = 
10.30 CPU)



I think these are not such uncommon use cases, and that could show the 
interest of such an extension.




However, I think a proper implementation would require a substantial
amount of effort. Including things like optimizing the subtransaction
logic so that switching the feature on doesn't lead to xid wraparound
issues. Adding odd hooks doesn't move us towards a real solution imo.


I would like to help on this part but unfortunately I have no idea how 
we can improve that.



Best regards,

--
Gilles Darold



Re: [PATCH] proposal for regexp_count, regexp_instr, regexp_substr and regexp_replace

2021-08-03 Thread Gilles Darold

On 03/08/2021 at 15:39, Tom Lane wrote:

Erik Rijkers  writes:

On 8/3/21 1:26 PM, Gilles Darold wrote:

On 03/08/2021 at 11:45, Gilles Darold wrote:

Actually I just found that the regexp_like() function doesn't support
the start parameter which is something we should support. I saw that
Oracle do not support it but DB2 does and I think we should also
support it. I will post a new version of the patch once it is done.

+1
I for one am in favor of this 'start'-argument addition.  Slightly
harder usage, but more precise manipulation.

As I said upthread, I am *not* in favor of making those DB2 additions.
We do not need to create ambiguities around those functions like the
one we have for regexp_replace.  If Oracle doesn't have those options,
why do we need them?



Sorry, I missed that, but I'm fine with this implementation, so let's 
keep the v6 version of the patch and drop this one.


--
Gilles Darold





Re: [PATCH] proposal for regexp_count, regexp_instr, regexp_substr and regexp_replace

2021-08-03 Thread Gilles Darold

On 03/08/2021 at 11:45, Gilles Darold wrote:
Actually I just found that the regexp_like() function doesn't support 
the start parameter, which is something we should support. I saw that 
Oracle does not support it but DB2 does, and I think we should also 
support it. I will post a new version of the patch once it is done.



Here is a new version of the patch that adds the start parameter to the 
regexp_like() function, but while adding support for this parameter it 
became less obvious to me that we should implement it. However, feel free 
not to use this version if you think that adding the start parameter has 
no real interest.
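To illustrate what the start parameter buys, here is a quick sketch in Python of a regexp_like-style predicate that begins matching at the start'th character (1-based, as in the SQL functions); the helper name and the slicing-based semantics are mine, a simplification of the patch's behavior, not its code:

```python
import re

def regexp_like(string, pattern, start=1, flags=""):
    """Return True if `pattern` matches within `string`, considering
    only the substring beginning at the 1-based `start` position."""
    f = re.IGNORECASE if "i" in flags else 0
    return re.search(pattern, string[start - 1:], f) is not None

print(regexp_like("Hello World", "world$", 1, "i"))  # True
print(regexp_like("Hello World", "Hello", 7))        # False: before start
print(regexp_like("Hello World", "World", 7))        # True
```

The difference versus wrapping substr() around the argument is only convenience, which is part of why the addition felt less compelling.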



Best regards,

--
Gilles Darold

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index a5b6adc4bb..2bc9060e47 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -3108,6 +3108,80 @@ repeat('Pg', 4) PgPgPgPg

   
 
+  
+   
+
+ regexp_count
+
+regexp_count ( string text, pattern text
+ [, start integer
+ [, flags text ] ] )
+integer
+   
+   
+Returns the number of times the POSIX regular
+expression pattern matches in
+the string; see
+.
+   
+   
+regexp_count('123456789012', '\d\d\d', 2)
+3
+   
+  
+
+  
+   
+
+ regexp_instr
+
+regexp_instr ( string text, pattern text
+ [, start integer
+ [, N integer
+ [, endoption integer
+ [, flags text
+ [, subexpr integer ] ] ] ] ] )
+integer
+   
+   
+Returns the position within string where
+the N'th match of the POSIX regular
+expression pattern occurs, or zero if there is
+no such match; see .
+   
+   
+regexp_instr('ABCDEF', 'c(.)(..)', 1, 1, 0, 'i')
+3
+   
+   
+regexp_instr('ABCDEF', 'c(.)(..)', 1, 1, 0, 'i', 2)
+5
+   
+  
+
+  
+   
+
+ regexp_like
+
+regexp_like ( string text, pattern text
+ [, start integer
+ [, flags text ] ] )
+boolean
+   
+   
+Checks whether a match of the POSIX regular
+expression pattern occurs
+within string starting at
+the start'th character; see
+.
+   
+   
+regexp_like('Hello World', 'world$', 1, 'i')
+t
+   
+  
+
   

 
@@ -3117,8 +3191,9 @@ repeat('Pg', 4) PgPgPgPg
 text[]


-Returns captured substrings resulting from the first match of a POSIX
-regular expression to the string; see
+Returns captured substrings resulting from the first match of the
+POSIX regular expression pattern to
+the string; see
 .


@@ -3136,10 +3211,11 @@ repeat('Pg', 4) PgPgPgPg
 setof text[]


-Returns captured substrings resulting from the first match of a
-POSIX regular expression to the string,
-or multiple matches if the g flag is used;
-see .
+Returns captured substrings resulting from the first match of the
+POSIX regular expression pattern to
+the string, or all matches if
+the g flag is used; see
+.


 regexp_matches('foobarbequebaz', 'ba.', 'g')
@@ -3156,14 +3232,16 @@ repeat('Pg', 4) PgPgPgPg
 
  regexp_replace
 
-regexp_replace ( string text, pattern text, replacement text [, flags text ] )
+regexp_replace ( string text, pattern text, replacement text
+ [, start integer ]
+ [, flags text ] )
 text


-Replaces substrings resulting from the first match of a
-POSIX regular expression, or multiple substring matches
-if the g flag is used; see .
+Replaces the substring that is the first match to the POSIX
+regular expression pattern, or all matches
+if the g flag is used; see
+.


 regexp_replace('Thomas', '.[mN]a.', 'M')
@@ -3171,6 +3249,26 @@ repeat('Pg', 4) PgPgPgPg

   
 
+  
+   
+regexp_replace ( string text, pattern text, replacement text,
+ start integer,
+ N integer
+ [, flags text ] )
+text
+   
+   
+Replaces the substring that is the N'th
+match to the POSIX regular expression pattern,
+or all matches if N is zero; see
+.
+   
+   
+regexp_replace('Thomas', '.', 'X', 3, 2)
+ThoXas
+   
+  
+
   

 
@@ -3213,6 +3311,35 @@ repeat('Pg', 4) PgPgPgPg

   
 
+  
+   
+
+ regexp_substr
+
+regexp_substr ( string text, pattern text
+ [, start integer
+ [, N integer
+ [, flags text
+ [, subexpr

Re: [PATCH] proposal for regexp_count, regexp_instr, regexp_substr and regexp_replace

2021-08-03 Thread Gilles Darold

On 02/08/2021 at 23:22, Gilles Darold wrote:

On 02/08/2021 at 01:21, Tom Lane wrote:

Gilles Darold  writes:

[ v5-0001-regexp-foo-functions.patch ]

I've gone through this whole patch now, and found quite a lot that I did
not like.  In no particular order:

* Wrapping parentheses around the user's regexp doesn't work.  It can
turn an invalid regexp into a valid one: for example 'a)(b' should draw
a syntax error.  With this patch, no error would be thrown, but the
"outer" parens wouldn't do what you expected.  Worse, it can turn a
valid regexp into an invalid one: the metasyntax options described in
9.7.3.4 only work at the start of the regexp.  So we have to handle
whole-regexp cases honestly rather than trying to turn them into an
instance of the parenthesized-subexpression case.

* You did a lot of things quite inefficiently, apparently to avoid
touching any existing code.  I think it's better to extend
setup_regexp_matches() and replace_text_regexp() a little bit so that
they can support the behaviors these new functions need.  In both of
them, it's absolutely trivial to allow a search start position to be
passed in; and it doesn't take much to teach replace_text_regexp()
to replace only the N'th match.

* Speaking of N'th, there is not much of anything that I like
about Oracle's terminology for the function arguments, and I don't
think we ought to adopt it.  If we're documenting the functions as
processing the "N'th match", it seems to me to be natural to call
the parameter "N" not "occurrence".  Speaking of the "occurrence'th
occurrence" is just silly, not to mention long and easy to misspell.
Likewise, "position" is a horribly vague term for the search start
position; it could be interpreted to mean several other things.
"start" seems much better.  "return_opt" is likewise awfully unclear.
I went with "endoption" below, though I could be talked into something
else.  The only one of Oracle's choices that I like is "subexpr" for
subexpression number ... but you went with DB2's rather vague "group"
instead.  I don't want to use their "capture group" terminology,
because that appears nowhere else in our documentation.  Our existing
terminology is "parenthesized subexpression", which seems fine to me
(and also agrees with Oracle's docs).

* I spent a lot of time on the docs too.  A lot of the syntax specs
were wrong (where you put the brackets matters), many of the examples
seemed confusingly overcomplicated, and the text explanations needed
copy-editing.

* Also, the regression tests seemed misguided.  This patch is not
responsible for testing the regexp engine as such; we have tests
elsewhere that do that.  So I don't think we need complex regexps
here.  We just need to verify that the parameters of these functions
act properly, and check their error cases.  That can be done much
more quickly and straightforwardly than what you had.


So here's a revised version that I like better.  I think this
is pretty nearly committable, aside from the question of whether
a too-large subexpression number should be an error or not.


Thanks a lot for the patch improvements and the guidance. I have read the
patch and I agree with your choices; I think I was trying too hard to
mimic the Oracle-isms. I don't think we should handle the too-large
subexpression number: the regexp writer should always test his regular
expression, and this check would not prevent him from choosing a wrong
capture group number, only a nonexistent one.



Actually I just found that the regexp_like() function doesn't support 
the start parameter, which is something we should support. I saw that 
Oracle does not support it but DB2 does, and I think we should also support 
it. I will post a new version of the patch once it is done.



Best regards,

--
Gilles Darold





Re: [PATCH] proposal for regexp_count, regexp_instr, regexp_substr and regexp_replace

2021-08-02 Thread Gilles Darold
On 02/08/2021 at 01:21, Tom Lane wrote:
> Gilles Darold  writes:
>> [ v5-0001-regexp-foo-functions.patch ]
> I've gone through this whole patch now, and found quite a lot that I did
> not like.  In no particular order:
>
> * Wrapping parentheses around the user's regexp doesn't work.  It can
> turn an invalid regexp into a valid one: for example 'a)(b' should draw
> a syntax error.  With this patch, no error would be thrown, but the
> "outer" parens wouldn't do what you expected.  Worse, it can turn a
> valid regexp into an invalid one: the metasyntax options described in
> 9.7.3.4 only work at the start of the regexp.  So we have to handle
> whole-regexp cases honestly rather than trying to turn them into an
> instance of the parenthesized-subexpression case.
>
> * You did a lot of things quite inefficiently, apparently to avoid
> touching any existing code.  I think it's better to extend
> setup_regexp_matches() and replace_text_regexp() a little bit so that
> they can support the behaviors these new functions need.  In both of
> them, it's absolutely trivial to allow a search start position to be
> passed in; and it doesn't take much to teach replace_text_regexp()
> to replace only the N'th match.
>
> * Speaking of N'th, there is not much of anything that I like
> about Oracle's terminology for the function arguments, and I don't
> think we ought to adopt it.  If we're documenting the functions as
> processing the "N'th match", it seems to me to be natural to call
> the parameter "N" not "occurrence".  Speaking of the "occurrence'th
> occurrence" is just silly, not to mention long and easy to misspell.
> Likewise, "position" is a horribly vague term for the search start
> position; it could be interpreted to mean several other things.
> "start" seems much better.  "return_opt" is likewise awfully unclear.
> I went with "endoption" below, though I could be talked into something
> else.  The only one of Oracle's choices that I like is "subexpr" for
> subexpression number ... but you went with DB2's rather vague "group"
> instead.  I don't want to use their "capture group" terminology,
> because that appears nowhere else in our documentation.  Our existing
> terminology is "parenthesized subexpression", which seems fine to me
> (and also agrees with Oracle's docs).
>
> * I spent a lot of time on the docs too.  A lot of the syntax specs
> were wrong (where you put the brackets matters), many of the examples
> seemed confusingly overcomplicated, and the text explanations needed
> copy-editing.
>
> * Also, the regression tests seemed misguided.  This patch is not
> responsible for testing the regexp engine as such; we have tests
> elsewhere that do that.  So I don't think we need complex regexps
> here.  We just need to verify that the parameters of these functions
> act properly, and check their error cases.  That can be done much
> more quickly and straightforwardly than what you had.
>
>
> So here's a revised version that I like better.  I think this
> is pretty nearly committable, aside from the question of whether
> a too-large subexpression number should be an error or not.


Thanks a lot for the patch improvement and the guidance. I have read the
patch and I agree with your choices; I think I was trying too hard to
mimic the Oracle-isms. I don't think we should handle the too-large
subexpression number: the regexp writer should always test their regular
expression, and an error here would only catch a non-existing capture
group number, not a wrong one.


Best regards,

-- 
Gilles Darold






Re: [PATCH] proposal for regexp_count, regexp_instr, regexp_substr and regexp_replace

2021-08-01 Thread Gilles Darold
Le 01/08/2021 à 19:23, Tom Lane a écrit :
> I've been working through this patch, and trying to verify
> compatibility against Oracle and DB2, and I see some points that need
> discussion or at least recording for the archives.
>
> * In Oracle, while the documentation for regexp_instr says that
> return_option should only be 0 or 1, experimentation with sqlfiddle
> shows that any nonzero value is silently treated as 1.  The patch
> raises an error for other values, which I think is a good idea.
> (IBM's docs say that DB2 raises an error too, though I can't test
> that.)  We don't need to be bug-compatible to that extent.
>
> * What should happen when the subexpression/capture group number of
> regexp_instr or regexp_substr exceeds the number of parenthesized
> subexpressions of the regexp?  Oracle silently returns a no-match
> result (0 or NULL), as does this patch.  However, IBM's docs say
> that DB2 raises an error.  I'm inclined to think that this is
> likewise taking bug-compatibility too far, and that we should
> raise an error like DB2.  There are clearly cases where throwing
> an error would help debug a faulty call, while I'm less clear on
> a use-case where not throwing an error would be useful.
>
> * IBM's docs say that both regexp_count and regexp_like have
> arguments "string, pattern [, start] [, flags]" --- that is,
> each of start and flags can be independently specified or omitted.
> The patch follows Oracle, which has no start option for 
> regexp_like, and where you can't write flags for regexp_count
> without writing start.  This is fine by me, because doing these
> like DB2 would introduce the same which-argument-is-this issues
> as we're being forced to cope with for regexp_replace.  I don't
> think we need to accept ambiguity in these cases too.  But it's
> worth memorializing this decision in the thread.
>
> * The patch has most of these functions silently ignoring the 'g'
> flag, but I think they should raise errors instead.  Oracle doesn't
> accept a 'g' flag for these, so why should we?  The only case where
> that logic doesn't hold is regexp_replace, because depending on which
> syntax you use the 'g' flag might or might not be meaningful.  So
> for regexp_replace, I'd vote for silently ignoring 'g' if the
> occurrence-number parameter is given, while honoring it if not.
>
> I've already made changes in my local copy per the last item,
> but I've not done anything about throwing errors for out-of-range
> subexpression numbers.  Anybody have an opinion about that one?


I thought about this while I was implementing the functions and chose
not to throw an error because of the Oracle behavior and also because of
other regular expression implementations. For example, in Perl there is no error:


$ perl -e '$str="hello world"; $str =~ s/(l)/$20/; print "$str\n";'
helo world


Usually a regular expression is tested by its creator to make sure it is
the right one and that it does what is expected. But I agree
that an error could help the writer debug their RE.


Also, if I recall correctly, Oracle and DB2 limit capture-group back
references: \1 to \9 for Oracle and \0 to \9 for DB2. I have
chosen not to apply this limit; I don't see the point of such a
limitation.
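For comparison with the Perl behavior shown above, here is a quick illustrative analogue using Python's re module (not PostgreSQL's engine): Python takes the stricter approach being discussed and raises an error both for an out-of-range back reference in the replacement and for a non-existing capture group number.

```python
import re

# Perl (above) silently expands the out-of-range $20 to an empty string;
# Python instead rejects the replacement template outright:
try:
    re.sub(r'(l)', r'\20', 'hello world')
except re.error as exc:
    print('sub:', exc)        # invalid group reference 20 ...

# Asking for a capture group that does not exist is also an error,
# matching the DB2-style behavior for regexp_substr()/regexp_instr():
m = re.search(r'(123)(4(56)(78))', '1234567890')
try:
    m.group(5)
except IndexError as exc:
    print('group:', exc)      # no such group
```

So the "silent no-match" and "raise an error" behaviors both have precedent in other regexp implementations.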



-- 
Gilles Darold
http://www.darold.net/



Re: [PATCH] proposal for regexp_count, regexp_instr, regexp_substr and regexp_replace

2021-08-01 Thread Gilles Darold
Le 30/07/2021 à 23:38, Tom Lane a écrit :
> Gilles Darold  writes:
>> Le 26/07/2021 à 21:56, Tom Lane a écrit :
>>> I'm inclined to just drop the regexp_replace additions.  I don't think
>>> that the extra parameters Oracle provides here are especially useful.
>>> They're definitely not useful enough to justify creating compatibility
>>> hazards for.
>> I would not say that being able to replace the Nth occurrence of a
>> pattern matching is not useful but i agree that this is not a common
>> case with replacement. Both Oracle [1] and IBM DB2 [2] propose this form
>> and I have though that we can not have compatibility issues because of
>> the different data type at the 4th parameter.
> Well, here's an example of the potential issues:
>
> [...]


Thanks for pointing out this case; I had not realized that a prepared
statement could lead to this confusion.


>> Anyway, maybe we can just
>> rename the function even if I would prefer that regexp_replace() be
>> extended. For example:
>>     regexp_replace(source, pattern, replacement [, flags ]);
>>     regexp_substitute(source, pattern, replacement [, position ] [,
>> occurrence ] [, flags ]);
> Hmm.  Of course the entire selling point of this patch seems to be
> bug-compatibility with Oracle, so using different names is largely
> defeating the point :-(
>
> Maybe we should just hold our noses and do it.  The point that
> you'd get a recognizable failure if the wrong function were chosen
> reassures me a little bit.  We've seen a lot of cases where this
> sort of ambiguity results in the system just silently doing something
> different from what you expected, and I was afraid that that could
> happen here.


Attached is a new version of the patch that includes a check of the option
parameter in the basic form of regexp_replace() and returns an error in
ambiguous cases.


PREPARE rr AS SELECT regexp_replace('healthy, wealthy, and
wise','(\w+)thy', '\1ish', $1);
EXECUTE rr(1);
ERROR:  ambiguous use of the option parameter in regex_replace(),
value: 1
HINT:  you might set the occurrence parameter to force the use of
the extended form of regex_replace()


This is done by checking whether the option parameter value is an integer and
throwing the error in that case. I can't think of anything better.
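The disambiguation logic can be sketched as follows (a Python illustration only, not the actual C code from the patch): the fourth argument of the short form is normally the flags string, but through a prepared-statement parameter it could equally have been meant as the integer start position of the extended form, so a value that parses as an integer is rejected.

```python
def check_regexp_replace_option(option: str) -> None:
    """Reject a 4th argument that could be either flags or a start position."""
    try:
        int(option)
    except ValueError:
        return  # clearly a flags string such as 'gi'
    # The value looks like an integer: the caller may have intended the
    # extended form's start position, so refuse to guess.
    raise ValueError(
        "ambiguous use of the option parameter in regexp_replace(), "
        "value: %s" % option)

check_regexp_replace_option('gi')   # accepted: unambiguous flags
try:
    check_regexp_replace_option('1')
except ValueError as exc:
    print(exc)                      # ambiguous use of the option parameter ...
```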


Best regards,

-- 
Gilles Darold

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index a5b6adc4bb..02d1f72e1e 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -3108,6 +3108,66 @@ repeat('Pg', 4) PgPgPgPg

   
 
+  
+   
+
+ regexp_count
+
+regexp_count ( string text, pattern text [, position integer ] [, flags text ] )
+integer
+   
+   
+Returns the number of times a pattern occurs for a match of a POSIX
+regular expression to the string; see
+.
+   
+   
+regexp_count('123456789012', '\d{3}', 3)
+3
+   
+  
+
+  
+   
+
+ regexp_instr
+
+regexp_instr ( string text, pattern text [, position integer ] [, occurrence integer ] [, returnopt integer ] [, flags text ] [, group integer ] )
+integer
+   
+   
+   Returns the position within string where the
+   match of a POSIX regular expression occurs. It returns an integer
+   indicating the beginning or ending position of the matched substring,
+   depending on the value of the returnopt argument
+   (default beginning). If no match is found the function returns 0;
+   see .
+   
+   
+regexp_instr('1234567890', '(123)(4(56)(78))', 1, 1, 0, 'i', 4)
+7
+   
+  
+
+  
+   
+
+ regexp_like
+
+regexp_like ( string text, pattern text [, flags text ] )
+boolean
+   
+   
+Evaluates the existence of a match to a POSIX regular expression
+in string; see .
+   
+   
+regexp_like('Hello'||chr(10)||'world', '^world$', 'm')
+t
+   
+  
+
+
   

 
@@ -3156,7 +3216,7 @@ repeat('Pg', 4) PgPgPgPg
 
  regexp_replace
 
-regexp_replace ( string text, pattern text, replacement text [, flags text ] )
+regexp_replace ( string text, pattern text, replacement text [, position integer ] [, occurrence integer ] [, flags text ] )
 text


@@ -3171,6 +3231,24 @@ repeat('Pg', 4) PgPgPgPg

   
 
+  
+   
+
+ regexp_substr
+
+regexp_substr ( string text, pattern text [, position integer ] [, occurrence integer ] [, flags text ] [, group integer ] )
+text
+   
+   
+   Return the substring within string corresponding to the
+   match o

Re: Case expression pushdown

2021-07-28 Thread Gilles Darold

Le 26/07/2021 à 18:03, Alexander Pyhalov a écrit :

Tom Lane писал 2021-07-26 18:18:

Alexander Pyhalov  writes:

[ 0001-Allow-pushing-CASE-expression-to-foreign-server-v6.patch ]


This doesn't compile cleanly:

deparse.c: In function 'foreign_expr_walker.isra.4':
deparse.c:920:8: warning: 'collation' may be used uninitialized in
this function [-Wmaybe-uninitialized]
 if (collation != outer_cxt->collation)
    ^
deparse.c:914:3: warning: 'state' may be used uninitialized in this
function [-Wmaybe-uninitialized]
   switch (state)
   ^~

These uninitialized variables very likely explain the fact that it fails
regression tests, both for me and for the cfbot.  Even if this 
weren't an

outright bug, we don't tolerate code that produces warnings on common
compilers.

    regards, tom lane


Hi.

Of course, this is a patch issue. Don't understand how I overlooked this.
Rebased on master and fixed it. Tests are passing here (but they also 
passed for previous patch version).


What exact tests are failing?



I confirm that there is no compilation warning and all regression tests 
pass successfully for the v7 patch. I have not checked the previous patch, 
but this one doesn't fail on cfbot either.



Best regards,

--
Gilles Darold





Re: [PATCH] proposal for regexp_count, regexp_instr, regexp_substr and regexp_replace

2021-07-27 Thread Gilles Darold
Le 26/07/2021 à 21:56, Tom Lane a écrit :
> Gilles Darold  writes:
>> [ v4-0001-regexp-foo-functions.patch ]
> I started to work through this and was distressed to realize that
> it's trying to redefine regexp_replace() in an incompatible way.
> We already have
>
> regression=# \df regexp_replace
>List of functions
>Schema   |  Name  | Result data type |  Argument data types   | 
> Type 
> ++--++--
>  pg_catalog | regexp_replace | text | text, text, text   | 
> func
>  pg_catalog | regexp_replace | text | text, text, text, text | 
> func
> (2 rows)
>
> The patch proposes to add (among other alternatives)
>
> +{ oid => '9608', descr => 'replace text using regexp',
> +  proname => 'regexp_replace', prorettype => 'text',
> +  proargtypes => 'text text text int4', prosrc => 
> 'textregexreplace_extended_no_occurrence' },
>
> which is going to be impossibly confusing for both humans and machines.
> I don't think we should go there.  Even if you managed to construct
> examples that didn't result in "ambiguous function" failures, that
> doesn't mean that ordinary mortals won't get bit that way.
>
> I'm inclined to just drop the regexp_replace additions.  I don't think
> that the extra parameters Oracle provides here are especially useful.
> They're definitely not useful enough to justify creating compatibility
> hazards for.


I would not say that being able to replace the Nth occurrence of a
pattern match is not useful, but I agree that this is not a common
replacement case. Both Oracle [1] and IBM DB2 [2] offer this form,
and I had thought we could not have compatibility issues because of
the different data type of the 4th parameter. Anyway, maybe we can just
rename the function, even if I would prefer that regexp_replace() be
extended. For example:


    regexp_replace(source, pattern, replacement [, flags ]);

    regexp_substitute(source, pattern, replacement [, position ] [,
occurrence ] [, flags ]);


Of course, with only 3 parameters the two functions are the same.


What do you think about the renaming proposal instead of simply dropping
the extended form of the function?


Best regards,


[1] https://docs.oracle.com/database/121/SQLRF/functions163.htm#SQLRF06302

[2] https://www.ibm.com/docs/en/db2oc?topic=functions-regexp-replace


-- 
Gilles Darold
http://www.darold.net/






Re: [PATCH] Hooks at XactCommand level

2021-07-16 Thread Gilles Darold

Le 14/07/2021 à 21:26, Tom Lane a écrit :

Gilles Darold  writes:

I have renamed the patch and the title of this proposal registered in
the commitfest "Xact/SubXact event callback at command start" to reflect
the last changes that do not include new hooks anymore.

Hmm, it doesn't seem like this has addressed my concern at all.
The callbacks are still effectively defined as "run during
start_xact_command", so they're not any less squishy semantically
than they were before.  The point of my criticism was that you
should move the call site to someplace that's more organically
connected to execution of commands.

Another thing I'm not too pleased with in this formulation is that it's
very unclear what the distinction is between XACT_EVENT_COMMAND_START
and SUBXACT_EVENT_COMMAND_START.  AFAICS, *every* potential use-case
for this would have to hook into both callback chains, and most likely
would treat the two events alike.


Please find attached the new version v2 of the patch; I hope this 
time I have understood your advice correctly. My apologies for the wasted 
time.



I have moved the call to the callback out of start_xact_command() and 
limited its call sites to exec_simple_query() and exec_parse_message(). There 
are other calls to start_xact_command() elsewhere, but these two 
places are enough for what I'm doing with the extensions. I have updated 
the extension test cases to check the behavior when autocommit is on or 
off, an error in the execution of a prepared statement, and an error in 
UPDATE ... WHERE CURRENT OF cursor. But there is certainly a case that I have missed.



The other calls to start_xact_command() are in exec_bind_message(), 
exec_execute_message(), exec_describe_statement_message(), 
exec_describe_portal_message() and PostgresMain(). In my tests they are 
either not reached or generate calls to the callback that duplicate 
those from exec_simple_query() and exec_parse_message().



Also, CallXactStartCommand() now uses only one event, 
XACT_EVENT_COMMAND_START, and makes a single call:


CallXactCallbacks(XACT_EVENT_COMMAND_START);



Plus, as you note, the goalposts have suddenly been moved for the
amount of overhead it's okay to have in an XactCallback or SubXactCallback
function.  So that might cause problems for unrelated code.  It's probably
better to not try to re-use that infrastructure.



About this, maybe I was not clear in my benchmark: the overhead is not 
introduced by the patch or the callback itself (there is no overhead 
there) but by the statement-level rollback extension. If this was 
already clear and you still think we must not reuse this callback 
infrastructure, do you mean that I should fall back to a hook?



Best regards,

--
Gilles Darold
http://www.darold.net/

diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index 441445927e..7384345a48 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -6118,3 +6118,27 @@ MarkSubTransactionAssigned(void)
 
 	CurrentTransactionState->assigned = true;
 }
+
+/*
+ * CallXactStartCommand
+ *
+ * Wrapper over CallXactCallbacks called in postgres.c after the call to
+ * start_xact_command(). It allows user-defined code to be executed
+ * for the start of any command through a registered xact callback. This
+ * function does nothing if a transaction is not started.
+ *
+ * The related XactEvent is XACT_EVENT_COMMAND_START.
+ */
+void
+CallXactStartCommand(void)
+{
+	TransactionState s = CurrentTransactionState;
+
+	if (s->blockState == TBLOCK_DEFAULT || s->blockState == TBLOCK_STARTED)
+		return;
+
+	/*
+	 * Call start-of-xact callbacks with start command event
+	 */
+	CallXactCallbacks(XACT_EVENT_COMMAND_START);
+}
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 8cea10c901..97a1a19d10 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -989,6 +989,13 @@ exec_simple_query(const char *query_string)
 	 */
 	start_xact_command();
 
+	/*
+	 * Instruct registered callbacks on xact that a new command is to be
+	 * executed. It lets user-defined code run before any new
+	 * statement is executed.
+	 */
+	CallXactStartCommand();
+
 	/*
 	 * Zap any pre-existing unnamed statement.  (While not strictly necessary,
 	 * it seems best to define simple-Query mode as if it used the unnamed
@@ -1082,6 +1089,13 @@ exec_simple_query(const char *query_string)
 		/* Make sure we are in a transaction command */
 		start_xact_command();
 
+		/*
+		 * Instruct registered callbacks on xact that a new command is to be
+		 * executed. It lets user-defined code run before any new
+		 * statement is executed.
+		 */
+		CallXactStartCommand();
+
 		/*
 		 * If using an implicit transaction block, and we're not already in a
 		 * transaction block, start an implicit block to force this statement
@@ -1361,6 +1375,13 @@ exec_parse_message(const char *query_string,	/* string to execute */
 	 */
 	start_xact_command();
 
+	/*
+	 * Instruct

Re: [PATCH] Hooks at XactCommand level

2021-07-15 Thread Gilles Darold

Le 15/07/2021 à 09:44, Gilles Darold a écrit :

Le 14/07/2021 à 21:26, Tom Lane a écrit :

Gilles Darold  writes:

I have renamed the patch and the title of this proposal registered in
the commitfest "Xact/SubXact event callback at command start" to reflect
the last changes that do not include new hooks anymore.

Hmm, it doesn't seem like this has addressed my concern at all.
The callbacks are still effectively defined as "run during
start_xact_command", so they're not any less squishy semantically
than they were before.  The point of my criticism was that you
should move the call site to someplace that's more organically
connected to execution of commands.


I would like to move it closer to the command execution, but the only 
place I see would be in BeginCommand(), which is currently waiting for 
code to execute; for the moment that function does nothing. I don't 
see another possible place after the start_xact_command() call. All my 
attempts to inject the callback after start_xact_command() resulted in a 
failure. If you see another place, I will be pleased to give it a test.




Looks like I have misunderstood again: maybe you want me to move 
the callback to just after start_xact_command() so that it is not 
"hidden" in the "run during start_xact_command" path. OK, I will do that.


--
Gilles Darold
http://www.darold.net/






Re: [PATCH] Hooks at XactCommand level

2021-07-15 Thread Gilles Darold

Le 14/07/2021 à 21:26, Tom Lane a écrit :

Gilles Darold  writes:

I have renamed the patch and the title of this proposal registered in
the commitfest "Xact/SubXact event callback at command start" to reflect
the last changes that do not include new hooks anymore.

Hmm, it doesn't seem like this has addressed my concern at all.
The callbacks are still effectively defined as "run during
start_xact_command", so they're not any less squishy semantically
than they were before.  The point of my criticism was that you
should move the call site to someplace that's more organically
connected to execution of commands.


I would like to move it closer to the command execution, but the only 
place I see would be in BeginCommand(), which is currently waiting for 
code to execute; for the moment that function does nothing. I don't see 
another possible place after the start_xact_command() call. All my attempts 
to inject the callback after start_xact_command() resulted in a failure. 
If you see another place, I will be pleased to give it a test.




Another thing I'm not too pleased with in this formulation is that it's
very unclear what the distinction is between XACT_EVENT_COMMAND_START
and SUBXACT_EVENT_COMMAND_START.  AFAICS, *every* potential use-case
for this would have to hook into both callback chains, and most likely
would treat the two events alike.  Plus, as you note, the goalposts
have suddenly been moved for the amount of overhead it's okay to have
in an XactCallback or SubXactCallback function.  So that might cause
problems for unrelated code.  It's probably better to not try to
re-use that infrastructure.


Actually, XACT_EVENT_COMMAND_START occurs only after the BEGIN call, when 
a transaction starts, whereas SUBXACT_EVENT_COMMAND_START occurs for all 
subsequent statement executions in this transaction. This makes it possible 
to perform different actions depending on the event. In the example extension 
only SUBXACT_EVENT_COMMAND_START is used, but I could, for example, use 
XACT_EVENT_COMMAND_START to avoid sending a RELEASE for a savepoint that 
doesn't exist. I currently detect this case differently, but this could be 
an improvement in the extension.








The objective of this new callback is to be able to call user-defined
code before any new statement is executed. For example it can call a
rollback to savepoint if there was an error in the previous transaction
statement, which allow to implements Rollback at Statement Level at
server side using a PostgreSQL extension, see [1] .

Urgh.  Isn't this re-making the same mistake we made years ago, namely
trying to implement autocommit on the server side?  I fear this will
be a disaster even larger than that was, because if it's an extension
then pretty much no applications will be prepared for the new semantics.
I strongly urge you to read the discussions that led up to f85f43dfb,
and to search the commit history before that for mentions of
"autocommit", to see just how extensive the mess was.

I spent a little time trying to locate said discussions; it's harder
than it should be because we didn't have the practice of citing email
threads in the commit log at the time.  I did find

https://www.postgresql.org/message-id/flat/Pine.LNX.4.44.0303172059170.1975-10%40peter.localdomain#7ae26ed5c1bfbf9b22a420dfd8b8e69f

which seems to have been the proximate decision, and here are
a few threads talking about all the messes that were created
for JDBC etc:

https://www.postgresql.org/message-id/flat/3D793A93.703%40xythos.com#4a2e2d9bdf2967906a6e0a75815d6636
https://www.postgresql.org/message-id/flat/3383060E-272E-11D7-BA14-000502E740BA%40wellsgaming.com
https://www.postgresql.org/message-id/flat/Law14-F37PIje6n0ssr0bc1%40hotmail.com

Basically, changing transactional semantics underneath clients is
a horrid idea.  Having such behavior in an extension rather than
the core doesn't make it less horrid.  If we'd designed it to act
that way from day one, maybe it'd have been fine.  But as things
stand, we are quite locked into the position that this has to be
managed on the client side.



Yes, I have suffered from this implementation of server-side autocommit; 
it was reverted in PG 7.4 if I remember correctly. I'm old enough to remember 
that :-). I'm also against restoring this feature inside PG core, but the 
fact that the subject comes up again almost every 2 years means that there 
is a need for this feature. This is why I'm proposing to make it possible to 
use an extension for those who really need the feature, with all the 
associated warnings.



For example, in my case the first time I needed this feature was to 
emulate the behavior of DB2, which allows rollback at statement level. 
This is not exactly autocommit, because the transaction still needs to be 
committed or rolled back at the end; it is just that an error will not 
invalidate the full transaction, only the failing statement. I think 
that this is different. Actually I have an extension that is doing that 
for most

Re: [PATCH] Hooks at XactCommand level

2021-07-14 Thread Gilles Darold

Hi,


I have renamed the patch and the title of this proposal registered in 
the commitfest to "Xact/SubXact event callback at command start" to reflect 
the latest changes, which no longer include new hooks.



Here is the new description corresponding to the current patch.


This patch allows executing user-defined code at the start of any 
command through a registered xact callback. It introduces two new events 
in the XactEvent and SubXactEvent enums, called respectively 
XACT_EVENT_COMMAND_START and SUBXACT_EVENT_COMMAND_START. The callback 
is not called if a transaction is not started.



The objective of this new callback is to be able to call user-defined 
code before any new statement is executed. For example, it can issue a 
rollback to a savepoint if there was an error in the previous transaction 
statement, which allows implementing statement-level rollback on the 
server side using a PostgreSQL extension; see [1].



The patch compiles, and regression tests with asserts enabled pass 
successfully.


There is no regression test for this feature, but the extension at [1] has 
been used to validate the new callback.



The patch adds insignificant overhead by checking for an existing callback 
definition, but clearly it is the responsibility of the developer to 
evaluate the performance impact of their user-defined code for this 
callback, as it will be called before each statement. Here is a very 
simple test using pgbench -c 20 -j 8 -T 30:


    tps = 669.930274 (without user-defined code)
    tps = 640.718361 (with user-defined code from extension [1])

The overhead for this extension is around 4.5%, which I think is not so 
bad for such a feature (internally it issues RELEASE + SAVEPOINT before 
each write statement execution and, in case of error, a ROLLBACK TO 
savepoint).
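The savepoint dance the extension performs around each write statement can be sketched like this (a Python pseudo-client for illustration only; the real extension does this server-side in C):

```python
def run_with_statement_rollback(execute, statements):
    """Run statements inside an open transaction so that a failing
    statement is rolled back alone instead of aborting the whole
    transaction.  `execute` is any callable that runs one SQL string
    and raises on error.  Returns the list of (sql, error) pairs."""
    errors = []
    execute("SAVEPOINT sls")
    for sql in statements:
        try:
            execute(sql)
            # Statement succeeded: release and re-arm the savepoint.
            execute("RELEASE SAVEPOINT sls")
            execute("SAVEPOINT sls")
        except Exception as exc:
            # Statement failed: undo only this statement, keep the
            # enclosing transaction alive.
            errors.append((sql, exc))
            execute("ROLLBACK TO SAVEPOINT sls")
    return errors
```

Each successful statement pays for one RELEASE and one SAVEPOINT round trip, which is where the roughly 4.5% overhead measured above comes from.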



[1] https://github.com/darold/pg_statement_rollbackv2

--
Gilles Darold
http://www.darold.net/

diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index 441445927e..3b5f6bfc2d 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -6118,3 +6118,32 @@ MarkSubTransactionAssigned(void)
 
 	CurrentTransactionState->assigned = true;
 }
+
+/*
+ * CallXactStartCommand
+ *
+ * Wrapper over CallXactCallbacks or CallSubXactCallbacks called in postgres.c
+ * at end of start_xact_command(). It allows user-defined code to be executed
+ * for the start of any command through a registered xact callback. This function
+ * does nothing if a transaction is not started.
+ *
+ * The related events XactEvent/SubXactEvent are XACT_EVENT_COMMAND_START and
+ * SUBXACT_EVENT_COMMAND_START.
+ */
+void
+CallXactStartCommand(void)
+{
+	TransactionState s = CurrentTransactionState;
+
+	if (s->blockState == TBLOCK_DEFAULT || s->blockState == TBLOCK_STARTED)
+		return;
+
+	/*
+	 * Call start-of-xact callbacks with start command event
+	 */
+	if (s->parent && s->subTransactionId)
+		CallSubXactCallbacks(SUBXACT_EVENT_COMMAND_START, s->subTransactionId,
+			 s->parent->subTransactionId);
+	else
+		CallXactCallbacks(XACT_EVENT_COMMAND_START);
+}
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 8cea10c901..a0f4a17c51 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -2708,6 +2708,16 @@ start_xact_command(void)
 		!get_timeout_active(CLIENT_CONNECTION_CHECK_TIMEOUT))
 		enable_timeout_after(CLIENT_CONNECTION_CHECK_TIMEOUT,
 			 client_connection_check_interval);
+
+	/*
+	 * Instruct registered callbacks on xact that a new command is to be
+	 * executed. It lets user-defined code run before any new
+	 * statement is executed. It is the responsibility of the user code
+	 * to take care of the state of the transaction, it can be in an ABORT
+	 * state. The related xact events are XACT_EVENT_COMMAND_START and
+	 * SUBXACT_EVENT_COMMAND_START.
+	 */
+	CallXactStartCommand();
 }
 
 static void
diff --git a/src/include/access/xact.h b/src/include/access/xact.h
index 134f6862da..190fc7151b 100644
--- a/src/include/access/xact.h
+++ b/src/include/access/xact.h
@@ -119,7 +119,8 @@ typedef enum
 	XACT_EVENT_PREPARE,
 	XACT_EVENT_PRE_COMMIT,
 	XACT_EVENT_PARALLEL_PRE_COMMIT,
-	XACT_EVENT_PRE_PREPARE
+	XACT_EVENT_PRE_PREPARE,
+	XACT_EVENT_COMMAND_START
 } XactEvent;
 
 typedef void (*XactCallback) (XactEvent event, void *arg);
@@ -129,7 +130,8 @@ typedef enum
 	SUBXACT_EVENT_START_SUB,
 	SUBXACT_EVENT_COMMIT_SUB,
 	SUBXACT_EVENT_ABORT_SUB,
-	SUBXACT_EVENT_PRE_COMMIT_SUB
+	SUBXACT_EVENT_PRE_COMMIT_SUB,
+	SUBXACT_EVENT_COMMAND_START
 } SubXactEvent;
 
 typedef void (*SubXactCallback) (SubXactEvent event, SubTransactionId mySubid,
@@ -467,4 +469,6 @@ extern void EnterParallelMode(void);
 extern void ExitParallelMode(void);
 extern bool IsInParallelMode(void);
 
+extern void CallXactStartCommand(void);
+
 #endif			/* XACT_H */


Re: Introduce pg_receivewal gzip compression tests

2021-07-12 Thread Gilles Darold
Le 12/07/2021 à 12:27, gkokola...@pm.me a écrit :
>>>>
>>>> Shouldn't this be coded as a loop going through @gzip_wals?
>>> I would hope that there is only one gz file created. There is a line
>>>
>>> further up that tests exactly that.
>>>
>>> -   is (scalar(@gzip_wals), 1, "one gzip compressed WAL was created");
>> Let me amend that. The line should be instead:
>>
>> is (scalar(keys @gzip_wals), 1, "one gzip compressed WAL was created");
>>
>> To properly test that there is one entry.
>>
>> Let me provide with v2 to fix this.


The following tests are not correct Perl, even though Perl happens to
return the right value.

+    is (scalar(keys @gzip_wals), 1, "one gzip compressed WAL was created");


+    is (scalar(keys @gzip_partial_wals), 1,
+        "one partial gzip compressed WAL was created");


The keys and values functions are meant for hashes, but here you are using
arrays. To obtain the length of an array you can just use the scalar
function, as Perl returns the length of an array when it is evaluated in
scalar context. Please use the following instead:


+    is (scalar(@gzip_wals), 1, "one gzip compressed WAL was created");


+    is (scalar(@gzip_partial_wals), 1,
+        "one partial gzip compressed WAL was created");


-- 
Gilles Darold
http://www.darold.net/






Re: Case expression pushdown

2021-07-07 Thread Gilles Darold

Le 07/07/2021 à 18:55, Gilles Darold a écrit :

Le 07/07/2021 à 18:50, Gilles Darold a écrit :


Great, I'm changing the state in the commitfest to "Ready for committers".


I'm attaching the v5 patch again as it doesn't appear in the Latest 
attachment list in the commitfest.




And the review summary:


This patch allows pushing CASE expressions to foreign servers, so that:

  - more types of updates can be executed directly on the remote server
  - full foreign table scans can be avoided
  - more aggregate functions can be pushed down

The patch compiles, and regression tests with asserts enabled pass 
successfully.

There is a compiler warning but it is not related to this patch:

    deparse.c: In function ‘foreign_expr_walker.isra.0’:
    deparse.c:891:28: warning: ‘collation’ may be used 
uninitialized in this function [-Wmaybe-uninitialized]

  891 |   outer_cxt->collation = collation;
  |   ~^~~
    deparse.c:874:10: warning: ‘state’ may be used uninitialized in 
this function [-Wmaybe-uninitialized]

  874 |  else if (state == outer_cxt->state)
  |  ^

The regression tests for this feature contain the use cases where push
down of the CASE clause is useful.

Nested CASE expressions are also part of the regression tests.

The patch adds insignificant overhead by processing CASE expressions
further than before, but overall it brings a major performance improvement
for queries on foreign tables that use a CASE WHEN clause in a predicate
or in an aggregate function.



This patch does what it claims to do without any detected problem; as
expected, the CASE clause is not pushed down when a local table is
involved in the CASE expression or if a non-default collation is used.


Ready for committers.





Re: Case expression pushdown

2021-07-07 Thread Gilles Darold

On 07/07/2021 at 18:50, Gilles Darold wrote:


Great, I am changing the state in the commitfest to "Ready for committers".


I'm attaching the v5 patch again as it doesn't appear in the Latest
attachment list in the commitfest.



--
Gilles Darold
MigOps Inc

diff --git a/contrib/postgres_fdw/deparse.c b/contrib/postgres_fdw/deparse.c
index 31919fda8c..6177a7b252 100644
--- a/contrib/postgres_fdw/deparse.c
+++ b/contrib/postgres_fdw/deparse.c
@@ -186,6 +186,8 @@ static void appendFunctionName(Oid funcid, deparse_expr_cxt *context);
 static Node *deparseSortGroupClause(Index ref, List *tlist, bool force_colno,
 	deparse_expr_cxt *context);
 
+static void deparseCaseExpr(CaseExpr *node, deparse_expr_cxt *context);
+
 /*
  * Helper functions
  */
@@ -509,6 +511,54 @@ foreign_expr_walker(Node *node,
 	state = FDW_COLLATE_UNSAFE;
 			}
 			break;
+		case T_CaseTestExpr:
+			{
+CaseTestExpr *c = (CaseTestExpr *) node;
+
+/*
+ * Collation rule is same as for function nodes.
+ */
+collation = c->collation;
+if (collation == InvalidOid)
+	state = FDW_COLLATE_NONE;
+else if (inner_cxt.state == FDW_COLLATE_SAFE &&
+		 collation == inner_cxt.collation)
+	state = FDW_COLLATE_SAFE;
+else if (collation == DEFAULT_COLLATION_OID)
+	state = FDW_COLLATE_NONE;
+else
+	state = FDW_COLLATE_UNSAFE;
+			}
+			break;
+		case T_CaseExpr:
+			{
+ListCell   *lc;
+ 
+/* Recurse to case clause subexpressions. */
+foreach(lc, ((CaseExpr *) node)->args)
+{
+	if (!foreign_expr_walker((Node *) lfirst(lc),
+			 glob_cxt, &inner_cxt))
+		return false;
+}
+			}
+			break;
+		case T_CaseWhen:
+			{
+CaseWhen   *whenExpr = (CaseWhen *) node;
+ 
+/* Recurse to case clause expression. */
+if (!foreign_expr_walker((Node *) whenExpr->expr,
+		 glob_cxt, &inner_cxt))
+	return false;
+/* Recurse to result expression. */
+if (!foreign_expr_walker((Node *) whenExpr->result,
+		 glob_cxt, &inner_cxt))
+	return false;
+/* Don't apply exprType() to the case when expr. */
+check_type = false;
+			}
+			break;
 		case T_OpExpr:
 		case T_DistinctExpr:	/* struct-equivalent to OpExpr */
 			{
@@ -2462,6 +2512,9 @@ deparseExpr(Expr *node, deparse_expr_cxt *context)
 		case T_Aggref:
 			deparseAggref((Aggref *) node, context);
 			break;
+		case T_CaseExpr:
+			deparseCaseExpr((CaseExpr *) node, context);
+			break;
 		default:
 			elog(ERROR, "unsupported expression type for deparse: %d",
  (int) nodeTag(node));
@@ -3179,6 +3232,52 @@ appendAggOrderBy(List *orderList, List *targetList, deparse_expr_cxt *context)
 	}
 }
 
+/*
+ * Deparse CASE expression
+ */
+static void
+deparseCaseExpr(CaseExpr *node, deparse_expr_cxt *context)
+{
+	StringInfo  buf = context->buf;
+	ListCell   *lc = NULL;
+ 
+	appendStringInfoString(buf, "(CASE");
+ 
+	/* If this is a CASE arg WHEN clause process arg first */
+	if (node->arg != NULL)
+	{
+		appendStringInfoString(buf, " ");
+		deparseExpr(node->arg, context);
+	}
+ 
+	/* Add each condition/result of the CASE clause */
+	foreach(lc, node->args)
+	{
+		CaseWhen   *whenclause = (CaseWhen *) lfirst(lc);
+ 
+		/* WHEN */
+		appendStringInfoString(buf, " WHEN ");
+		if (node->arg == NULL)  /* CASE WHEN */
+			deparseExpr(whenclause->expr, context);
+		else/* CASE arg WHEN */
+			deparseExpr(lsecond(((OpExpr *) whenclause->expr)->args), context);
+ 
+		/* THEN */
+		appendStringInfoString(buf, " THEN ");
+		deparseExpr(whenclause->result, context);
+	}
+ 
+	/* add ELSE if needed */
+	if (node->defresult != NULL)
+	{
+		appendStringInfoString(buf, " ELSE ");
+		deparseExpr(node->defresult, context);
+	}
+ 
+	/* append END */
+	appendStringInfoString(buf, " END)");
+}
+
 /*
  * Print the representation of a parameter to be sent to the remote side.
  *
diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index b510322c4e..dfef4efd79 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -5561,6 +5561,150 @@ UPDATE ft2 d SET c2 = CASE WHEN random() >= 0 THEN d.c2 ELSE 0 END
 
 UPDATE ft2 d SET c2 = CASE WHEN random() >= 0 THEN d.c2 ELSE 0 END
   FROM ft2 AS t WHERE d.c1 = t.c1 AND d.c1 > 1000;
+-- Test CASE pushdown
+EXPLAIN (VERBOSE, COSTS OFF)
+UPDATE ft2 d SET c2 = CASE WHEN c2 > 0 THEN c2 ELSE 0 END
+WHERE c1 > 1000;
+   QUERY PLAN   
+
+ Update on public.ft2 d
+   ->  Foreign Update on public.ft2 d
+ Remote SQL: UPDATE "S 1"."T 1" SET c2 = (CASE WHEN (c2 > 0) THEN c2

Re: Case expression pushdown

2021-07-07 Thread Gilles Darold

On 07/07/2021 at 17:39, Alexander Pyhalov wrote:

Hi.

Gilles Darold wrote on 2021-07-07 15:02:


On 22/06/2021 at 15:39, Alexander Pyhalov wrote:


Seino Yuki wrote on 2021-06-22 16:03:
On 2021-06-16 01:29, Alexander Pyhalov wrote:
Hi.

Ashutosh Bapat wrote on 2021-06-15 16:24:
Looks quite useful to me. Can you please add this to the next
commitfest?

Added to commitfest. Here is an updated patch version.


Thanks for posting the patch.
I agree with this content.


+ Foreign Scan on public.ft2 (cost=156.58..165.45 rows=394
width=14)

It's not a big issue, but is there any intention behind the pattern of
outputting costs in regression tests?

Hi.

No, I don't think it makes much sense. Updated tests (also added case
with empty else).

The patch doesn't apply to master anymore; I attach an updated version
of your patch. This is your patch rebased and untouched, minus a comment
in the test, and renamed to v4.

I could have missed something, but I don't think the additional struct
elements case_args in structs foreign_loc_cxt and deparse_expr_cxt are
necessary. They appear to be useless.


I thought we should compare arg collation and expression collation and
didn't suggest that we can take CaseTestExpr's collation directly,
without deriving it from CaseExpr's arg. Your version of course looks
saner.




The patch would also be clearer if the CaseWhen node were handled
separately in foreign_expr_walker() instead of being handled in the
T_CaseExpr case. That way the T_CaseExpr case just needs to call
foreign_expr_walker() recursively. I also think that the code in
T_CaseTestExpr should just check the collation; there is nothing more
to do there, as you noted in the comment on the function
deparseCaseTestExpr(). That function can be removed, as it does nothing
once the case_args elements are removed.

There is a problem in the regression test with nested CASE clauses:


EXPLAIN (VERBOSE, COSTS OFF)
SELECT c1,c2,c3 FROM ft2 WHERE CASE CASE WHEN c2 > 0 THEN c2 END
WHEN 100 THEN 601 WHEN c2 THEN c2 ELSE 0 END > 600 ORDER BY c1;


the original query uses "WHERE CASE CASE WHEN" but the remote query in
the plan is not the same:


Remote SQL: SELECT "C 1", c2, c3 FROM "S 1"."T 1" WHERE (((CASE WHEN
((CASE WHEN (c2 > 0) THEN c2 ELSE NULL::integer END) = 100) THEN 601
WHEN ((CASE WHEN (c2 > 0) THEN c2 ELSE NULL::integer END) = c2) THEN
c2 ELSE 0 END) > 600)) ORDER BY "C 1" ASC NULLS LAST


Here it is "WHERE (((CASE WHEN ((CASE WHEN"; I expected it to be
unchanged, i.e. "WHERE (((CASE (CASE WHEN".


I'm not sure this is an issue (as we change CASE A WHEN B ... to
CASE WHEN (A=B)), and expressions should be free from side effects,
but again your version looks better.



Right, it returns the same result, but I think it is confusing not to
see the same CASE form in the remote query.





Thanks for improving the patch, it looks saner now.



Great, I am changing the state in the commitfest to "Ready for committers".


--
Gilles Darold
MigOps Inc





Re: Case expression pushdown

2021-07-07 Thread Gilles Darold

On 22/06/2021 at 15:39, Alexander Pyhalov wrote:

Seino Yuki wrote on 2021-06-22 16:03:

On 2021-06-16 01:29, Alexander Pyhalov wrote:

Hi.

Ashutosh Bapat wrote on 2021-06-15 16:24:
Looks quite useful to me. Can you please add this to the next 
commitfest?




Added to commitfest. Here is an updated patch version.


Thanks for posting the patch.
I agree with this content.


+ Foreign Scan on public.ft2 (cost=156.58..165.45 rows=394 width=14)

It's not a big issue, but is there any intention behind the pattern of
outputting costs in regression tests?


Hi.

No, I don't think it makes much sense. Updated tests (also added case 
with empty else).



The patch doesn't apply to master anymore; I attach an updated version
of your patch. This is your patch rebased and untouched, minus a comment
in the test, and renamed to v4.



I could have missed something, but I don't think the additional struct
elements case_args in structs foreign_loc_cxt and deparse_expr_cxt are
necessary. They appear to be useless.


The patch would also be clearer if the CaseWhen node were handled
separately in foreign_expr_walker() instead of being handled in the
T_CaseExpr case. That way the T_CaseExpr case just needs to call
foreign_expr_walker() recursively. I also think that the code in
T_CaseTestExpr should just check the collation; there is nothing more
to do there, as you noted in the comment on the function
deparseCaseTestExpr(). That function can be removed, as it does nothing
once the case_args elements are removed.



There is a problem in the regression test with nested CASE clauses:

   EXPLAIN (VERBOSE, COSTS OFF)
   SELECT c1,c2,c3 FROM ft2 WHERE CASE CASE WHEN c2 > 0 THEN c2 END
   WHEN 100 THEN 601 WHEN c2 THEN c2 ELSE 0 END > 600 ORDER BY c1;

the original query uses "WHERE CASE CASE WHEN" but the remote query in
the plan is not the same:


   Remote SQL: SELECT "C 1", c2, c3 FROM "S 1"."T 1" WHERE (((CASE WHEN
   ((CASE WHEN (c2 > 0) THEN c2 ELSE NULL::integer END) = 100) THEN 601
   WHEN ((CASE WHEN (c2 > 0) THEN c2 ELSE NULL::integer END) = c2) THEN
   c2 ELSE 0 END) > 600)) ORDER BY "C 1" ASC NULLS LAST

Here it is "WHERE (((CASE WHEN ((CASE WHEN"; I expected it to be
unchanged, i.e. "WHERE (((CASE (CASE WHEN".



Also, I would like the following regression tests to be added. They test
that a CASE clause inside an aggregate function is pushed down, as well
as the aggregate function itself. This was the original use case that I
wanted to address with this feature.


   -- CASE in aggregate function, both must be pushed down
   EXPLAIN (VERBOSE, COSTS OFF)
   SELECT sum(CASE WHEN mod(c1, 4) = 0 THEN 1 ELSE 2 END) FROM ft1;
   -- Same but without the ELSE clause
   EXPLAIN (VERBOSE, COSTS OFF)
   SELECT sum(CASE WHEN mod(c1, 4) = 0 THEN 1 END) FROM ft1;


For convenience I'm attaching a new patch v5 that changes the code
following my comments above, fixes the nested CASE issue, and adds more
regression tests.



Best regards,

--
Gilles Darold

diff --git a/contrib/postgres_fdw/deparse.c b/contrib/postgres_fdw/deparse.c
index 31919fda8c..f2441d4945 100644
--- a/contrib/postgres_fdw/deparse.c
+++ b/contrib/postgres_fdw/deparse.c
@@ -87,6 +87,7 @@ typedef struct foreign_loc_cxt
 {
 	Oid			collation;		/* OID of current collation, if any */
 	FDWCollateState state;		/* state of current collation choice */
+	Expr	   *case_arg;		/* the last case arg to inspect */
 } foreign_loc_cxt;
 
 /*
@@ -101,6 +102,7 @@ typedef struct deparse_expr_cxt
  * a base relation. */
 	StringInfo	buf;			/* output buffer to append to */
 	List	  **params_list;	/* exprs that will become remote Params */
+	List	   *case_args;		/* list of args to deparse CaseTestExpr */
 } deparse_expr_cxt;
 
 #define REL_ALIAS_PREFIX	"r"
@@ -186,6 +188,9 @@ static void appendFunctionName(Oid funcid, deparse_expr_cxt *context);
 static Node *deparseSortGroupClause(Index ref, List *tlist, bool force_colno,
 	deparse_expr_cxt *context);
 
+static void deparseCaseExpr(CaseExpr *node, deparse_expr_cxt *context);
+static void deparseCaseTestExpr(CaseTestExpr *node, deparse_expr_cxt *context);
+
 /*
  * Helper functions
  */
@@ -254,6 +259,7 @@ is_foreign_expr(PlannerInfo *root,
 		glob_cxt.relids = baserel->relids;
 	loc_cxt.collation = InvalidOid;
 	loc_cxt.state = FDW_COLLATE_NONE;
+	loc_cxt.case_arg = NULL;
	if (!foreign_expr_walker((Node *) expr, &glob_cxt, &loc_cxt))
 		return false;
 
@@ -312,6 +318,7 @@ foreign_expr_walker(Node *node,
 	/* Set up inner_cxt for possible recursion to child nodes */
 	inner_cxt.collation = InvalidOid;
 	inner_cxt.state = FDW_COLLATE_NONE;
+	inner_cxt.case_arg = outer_cxt->case_arg;
 
 	switch (nodeTag(node))
 	{
@@ -509,6 +516,62 @@ foreign_expr_walker(Node *node,
 	state = FDW_COLLATE_UNSAFE;
 			}
 			break;
+		case T_CaseExpr:
+			{
+CaseExpr   *ce = (CaseExpr *) node;
+ListCell   *arg;
+
+if (ce->arg)
+	inner_cxt.c

Re: [PATCH][postgres_fdw] Add push down of CASE WHEN clauses

2021-07-06 Thread Gilles Darold
On 07/07/2021 at 06:59, David Rowley wrote:
> On Wed, 7 Jul 2021 at 10:18, Gilles Darold  wrote:
>> I have noticed that postgres_fdw do not push down the CASE WHEN clauses. In 
>> the following case this normal:
> This looks very similar to [1] which is in the current commitfest.
>
> Are you able to look over that patch and check to ensure you're not
> doing anything extra that the other patch isn't. If so, then likely
> the best way to progress would be for you to test and review that
> patch.
>
> David
>
> [1] https://commitfest.postgresql.org/33/3171/


Strange, I searched the commitfest yesterday but without success; this
is clearly a duplicate. Anyway, thanks for the pointer, and yes, I will
review Alexander's patch as I know the subject now :-)


Best regards

-- 
Gilles Darold
MigOps Inc (https://migops.com/)





[PATCH][postgres_fdw] Add push down of CASE WHEN clauses

2021-07-06 Thread Gilles Darold

Hi,


I have noticed that postgres_fdw does not push down CASE WHEN clauses.
In the following case this is normal:


   contrib_regression=# EXPLAIN (ANALYZE, VERBOSE) SELECT (CASE WHEN
   mod(c1, 4) = 0 THEN 1 ELSE 2 END) FROM ft1;
   QUERY PLAN
   
-
 Foreign Scan on public.ft1  (cost=100.00..146.00 rows=1000
   width=4) (actual time=0.306..0.844 rows=822 loops=1)
   Output: CASE WHEN (mod(c1, 4) = 0) THEN 1 ELSE 2 END
   Remote SQL: SELECT "C 1" FROM "S 1"."T 1"
 Planning Time: 0.139 ms
 Execution Time: 1.057 ms
   (5 rows)


but in these other cases it is a performance killer; all records are
fetched:



   contrib_regression=# EXPLAIN (ANALYZE, VERBOSE) SELECT sum(CASE WHEN
   mod(c1, 4) = 0 THEN 1 ELSE 2 END) FROM ft1;
  QUERY PLAN
   
---
 Aggregate  (cost=148.50..148.51 rows=1 width=8) (actual
   time=1.421..1.422 rows=1 loops=1)
   Output: sum(CASE WHEN (mod(c1, 4) = 0) THEN 1 ELSE 2 END)
   ->  Foreign Scan on public.ft1  (cost=100.00..141.00 rows=1000
   width=4) (actual time=0.694..1.366 rows=822 loops=1)
 Output: c1
 Remote SQL: SELECT "C 1" FROM "S 1"."T 1"
 Planning Time: 1.531 ms
 Execution Time: 3.901 ms
   (7 rows)


   contrib_regression=# EXPLAIN (ANALYZE, VERBOSE) SELECT * FROM ft1
   WHERE c1 > (CASE WHEN mod(c1, 4) = 0 THEN 1 ELSE 100 END);
   QUERY PLAN
   
-
 Foreign Scan on public.ft1  (cost=100.00..148.48 rows=333
   width=47) (actual time=0.763..3.003 rows=762 loops=1)
   Output: c1, c2, c3, c4, c5, c6, c7, c8
   Filter: (ft1.c1 > CASE WHEN (mod(ft1.c1, 4) = 0) THEN 1 ELSE 100
   END)
   Rows Removed by Filter: 60
   Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S
   1"."T 1"
 Planning Time: 0.584 ms
 Execution Time: 3.392 ms
   (7 rows)


The attached patch adds push down of CASE WHEN clauses. Queries above 
have the following plans when this patch is applied:



   contrib_regression=# EXPLAIN (ANALYZE, VERBOSE) SELECT sum(CASE WHEN
   mod(c1, 4) = 0 THEN 1 ELSE 2 END) FROM ft1;
  QUERY PLAN
   
--
 Foreign Scan  (cost=107.50..128.53 rows=1 width=8) (actual
   time=2.022..2.024 rows=1 loops=1)
   Output: (sum(CASE WHEN (mod(c1, 4) = 0) THEN 1 ELSE 2 END))
   Relations: Aggregate on (public.ft1)
   Remote SQL: SELECT sum(CASE  WHEN (mod("C 1", 4) = 0) THEN 1
   ELSE 2 END) FROM "S 1"."T 1"
 Planning Time: 0.252 ms
 Execution Time: 2.684 ms
   (6 rows)

   contrib_regression=# EXPLAIN (ANALYZE, VERBOSE) SELECT * FROM ft1
   WHERE c1 > (CASE WHEN mod(c1, 4) = 0 THEN 1 ELSE 100 END);
   QUERY PLAN

   

   --
 Foreign Scan on public.ft1  (cost=100.00..135.16 rows=333
   width=47) (actual time=1.797..3.463 rows=762 loops=1)
   Output: c1, c2, c3, c4, c5, c6, c7, c8
   Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S
   1"."T 1" WHERE (("C 1" > CASE  WHEN (mod("C 1", 4) = 0)
   THEN 1 ELSE 100 END))
 Planning Time: 0.745 ms
     Execution Time: 3.860 ms
   (5 rows)


I don't see a good reason to never push down the CASE WHEN clause, but
perhaps I'm missing something. Any thoughts?



Best regards,

--
Gilles Darold
MigOps Inc (http://migops.com)

diff --git a/contrib/postgres_fdw/deparse.c b/contrib/postgres_fdw/deparse.c
index 31919fda8c..8b8ca91e25 100644
--- a/contrib/postgres_fdw/deparse.c
+++ b/contrib/postgres_fdw/deparse.c
@@ -185,6 +185,7 @@ static void appendAggOrderBy(List *orderList, List *targetList,
 static void appendFunctionName(Oid funcid, deparse_expr_cxt *context);
 static Node *deparseSortGroupClause(Index ref, List *tlist, bool force_colno,
 	deparse_expr_cxt *context);
+static void deparseCaseExpr(CaseExpr *node, deparse_expr_cxt *context);
 
 /*
  * Helper functions
@@ -796,6 +797,53 @@ foreign_expr_walker(Node *node,
 	state = FDW_COLLATE_UNSAFE;
 			}
 			break;
+		case T_CaseTestExpr:
+			{
+CaseTestExpr *c = (CaseTestExpr *) node;
+
+/*
+ * If the expression has nondefault collation, either it's of a
+ * non-builtin type,

Re: [PATCH] Hooks at XactCommand level

2021-07-05 Thread Gilles Darold

On 01/07/2021 at 18:47, Tom Lane wrote:

Nicolas CHAHWEKILIAN  writes:

As far as I am concerned, I am totally awaiting for this kind of feature
exposed here, for one single reason at this time : the extension
pg_statement_rollback will be much more valuable with the ability of
processing "rollback to savepoint" without the need for explicit
instruction from client side (and this patch is giving this option).

What exactly do these hooks do that isn't done as well or better
by the RegisterXactCallback and RegisterSubXactCallback mechanisms?
Perhaps we need to define some additional event types for those?
Or pass more data to the callback functions?

I quite dislike inventing a hook that's defined as "run during
start_xact_command", because there is basically nothing that's
not ad-hoc about that function: it's internal to postgres.c
and both its responsibilities and its call sites have changed
over time.  I think anyone hooking into that will be displeased
by the stability of their results.



Sorry Tom, it seems that I totally misinterpreted your comments; Google
Translate was not a great help for my understanding, but Julien was.
Thanks Julien.



I'm attaching a new patch v4 that removes the need for any hook and adds
new events XACT_EVENT_COMMAND_START and SUBXACT_EVENT_COMMAND_START that
can be caught in the xact callbacks when a new command is to be executed.



--
Gilles Darold
http://www.darold.net/

diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index 441445927e..3b5f6bfc2d 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -6118,3 +6118,32 @@ MarkSubTransactionAssigned(void)
 
 	CurrentTransactionState->assigned = true;
 }
+
+/*
+ * CallXactStartCommand
+ *
+ * Wrapper over CallXactCallbacks or CallSubXactCallbacks called in postgres.c
+ * at end of start_xact_command(). It allows user-defined code to be executed
+ * at the start of any command through a registered xact callback. This function
+ * does nothing if a transaction is not started.
+ *
+ * The related events XactEvent/SubXactEvent are XACT_EVENT_COMMAND_START and
+ * SUBXACT_EVENT_COMMAND_START.
+ */
+void
+CallXactStartCommand(void)
+{
+	TransactionState s = CurrentTransactionState;
+
+	if (s->blockState == TBLOCK_DEFAULT || s->blockState == TBLOCK_STARTED)
+		return;
+
+	/*
+	 * Call start-of-xact callbacks with start command event
+	 */
+	if (s->parent && s->subTransactionId)
+		CallSubXactCallbacks(SUBXACT_EVENT_COMMAND_START, s->subTransactionId,
+			 s->parent->subTransactionId);
+	else
+		CallXactCallbacks(XACT_EVENT_COMMAND_START);
+}
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 8cea10c901..a0f4a17c51 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -2708,6 +2708,16 @@ start_xact_command(void)
 		!get_timeout_active(CLIENT_CONNECTION_CHECK_TIMEOUT))
 		enable_timeout_after(CLIENT_CONNECTION_CHECK_TIMEOUT,
 			 client_connection_check_interval);
+
+	/*
+	 * Instruct callbacks registered on the xact that a new command is to be
+	 * executed. It allows user-defined code to be executed before any new
+	 * statement is executed. It is the responsibility of the user code
+	 * to take care of the state of the transaction, which can be in an
+	 * ABORT state. The related xact events are XACT_EVENT_COMMAND_START and
+	 * SUBXACT_EVENT_COMMAND_START.
+	 */
+	CallXactStartCommand();
 }
 
 static void
diff --git a/src/include/access/xact.h b/src/include/access/xact.h
index 134f6862da..190fc7151b 100644
--- a/src/include/access/xact.h
+++ b/src/include/access/xact.h
@@ -119,7 +119,8 @@ typedef enum
 	XACT_EVENT_PREPARE,
 	XACT_EVENT_PRE_COMMIT,
 	XACT_EVENT_PARALLEL_PRE_COMMIT,
-	XACT_EVENT_PRE_PREPARE
+	XACT_EVENT_PRE_PREPARE,
+	XACT_EVENT_COMMAND_START
 } XactEvent;
 
 typedef void (*XactCallback) (XactEvent event, void *arg);
@@ -129,7 +130,8 @@ typedef enum
 	SUBXACT_EVENT_START_SUB,
 	SUBXACT_EVENT_COMMIT_SUB,
 	SUBXACT_EVENT_ABORT_SUB,
-	SUBXACT_EVENT_PRE_COMMIT_SUB
+	SUBXACT_EVENT_PRE_COMMIT_SUB,
+	SUBXACT_EVENT_COMMAND_START
 } SubXactEvent;
 
 typedef void (*SubXactCallback) (SubXactEvent event, SubTransactionId mySubid,
@@ -467,4 +469,6 @@ extern void EnterParallelMode(void);
 extern void ExitParallelMode(void);
 extern bool IsInParallelMode(void);
 
+extern void CallXactStartCommand(void);
+
 #endif			/* XACT_H */


Re: [PATCH] Hooks at XactCommand level

2021-07-03 Thread Gilles Darold
On 01/07/2021 at 18:47, Tom Lane wrote:
> Nicolas CHAHWEKILIAN  writes:
>> As far as I am concerned, I am totally awaiting for this kind of feature
>> exposed here, for one single reason at this time : the extension
>> pg_statement_rollback will be much more valuable with the ability of
>> processing "rollback to savepoint" without the need for explicit
>> instruction from client side (and this patch is giving this option).
> What exactly do these hooks do that isn't done as well or better
> by the RegisterXactCallback and RegisterSubXactCallback mechanisms?
> Perhaps we need to define some additional event types for those?
> Or pass more data to the callback functions?


Sorry it took me some time to recall the reason for the hooks. Actually
the problem is that the callbacks are not called when a statement is
executed after an error, so we fall back to the error:

    ERROR:  current transaction is aborted, commands ignored until end
of transaction block

For example with the rollback at statement level extension:


BEGIN;
UPDATE tbl_rsl SET id = 'two', val = 2 WHERE id = 1; -- >>>>> will fail
LOG:  statement: UPDATE tbl_rsl SET id = 'two', val = 2 WHERE id = 1;
ERROR:  invalid input syntax for type integer: "two"
LINE 1: UPDATE tbl_rsl SET id = 'two', val = 2 WHERE id = 1;
    ^
UPDATE tbl_rsl SET id = 'two', val = 2 WHERE id = 1; -- >>>>> will
fail again
LOG:  statement: UPDATE tbl_rsl SET id = 'two', val = 2 WHERE id = 1;
ERROR:  current transaction is aborted, commands ignored until end
of transaction block
SELECT * FROM tbl_rsl; -- Should show records id 1 + 2
LOG:  statement: SELECT * FROM tbl_rsl;
ERROR:  current transaction is aborted, commands ignored until end
of transaction block


With the extension and the hook on start_xact_command() we can continue
and execute all the following statements.


I have updated the patch to only keep the hook on start_xact_command();
as you've suggested, the other hooks can be replaced by the use of the
xact callbacks. The extension has also been updated for testing the
feature, available here: https://github.com/darold/pg_statement_rollbackv2


> I quite dislike inventing a hook that's defined as "run during
> start_xact_command", because there is basically nothing that's
> not ad-hoc about that function: it's internal to postgres.c
> and both its responsibilities and its call sites have changed
> over time.  I think anyone hooking into that will be displeased
> by the stability of their results.

Unfortunately I have not found a better solution, but I just tried
placing the hook in the function BeginCommand() in src/backend/tcop/dest.c
and the extension works as expected. Do you think that would be a better
place? In that case I can update the patch. For this feature we need a
hook that is executed before any command, even if the transaction is in
an abort state, to be able to inject the rollback to savepoint; maybe
I'm not looking at the right place to do that.


Thanks

-- 
Gilles Darold
http://www.darold.net/

diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 8cea10c901..4b9f8eeb3c 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -207,6 +207,8 @@ static void log_disconnections(int code, Datum arg);
 static void enable_statement_timeout(void);
 static void disable_statement_timeout(void);
 
+/* Hooks for plugins to get control at end of start_xact_command() */
+XactCommandStart_hook_type start_xact_command_hook = NULL;
 
 /* 
  *		routines to obtain user input
@@ -2708,6 +2710,13 @@ start_xact_command(void)
 		!get_timeout_active(CLIENT_CONNECTION_CHECK_TIMEOUT))
 		enable_timeout_after(CLIENT_CONNECTION_CHECK_TIMEOUT,
 			 client_connection_check_interval);
+
+	/*
+	 * Now give loadable modules a chance to execute code before a transaction
+	 * command is processed.
+	 */
+	if (start_xact_command_hook)
+		(*start_xact_command_hook) ();
 }
 
 static void
diff --git a/src/include/tcop/pquery.h b/src/include/tcop/pquery.h
index 2318f04ff0..540ede42fd 100644
--- a/src/include/tcop/pquery.h
+++ b/src/include/tcop/pquery.h
@@ -48,4 +48,8 @@ extern bool PlannedStmtRequiresSnapshot(struct PlannedStmt *pstmt);
 
 extern void EnsurePortalSnapshotExists(void);
 
+/* Hook for plugins to get control in start_xact_command() */
+typedef void (*XactCommandStart_hook_type) (void);
+extern PGDLLIMPORT XactCommandStart_hook_type start_xact_command_hook;
+
 #endif			/* PQUERY_H */
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 64c06cf952..f473c2dc39 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -2934,6 +2934,7 @@ XPVIV
 XPVMG
 XactCallback
 XactCallbackItem
+XactCommandStart_hook_type
 XactEvent
 XactLockTableWaitInfo
 XidBoundsViolation


Re: Deparsing rewritten query

2021-06-28 Thread Gilles Darold
On 28/06/2021 at 18:41, Julien Rouhaud wrote:
> Thanks for the feedback Gilles!
>
> On Mon, Jun 28, 2021 at 04:06:54PM +0200, Gilles Darold wrote:
>> If we could at least call get_query_def() through an extension if we didn't
>> have a function, it would be ideal for DBAs. I agree this is unusual but when
>> it does happen to you, being able to call get_query_def() helps a lot.
> Since at least 2 other persons seems to be interested in that feature, I can
> take care of writing and maintaining such an extension, provided that the
> required infrastructure is available in core.
>
> PFA v2 of the patch which only adds the required alias and exposes
> get_query_def().


Thanks a lot Julien, such facilities are really helpful for DBAs and
make a difference compared to other RDBMS. I don't think that this
feature exists elsewhere.


-- 
Gilles Darold
http://www.darold.net/






Re: Deparsing rewritten query

2021-06-28 Thread Gilles Darold

On 27/06/2021 at 17:14, Tom Lane wrote:

Julien Rouhaud  writes:

On Sun, Jun 27, 2021 at 10:34:52AM -0400, Tom Lane wrote:

It's not very hard to imagine someday moving view
expansion into the planner on efficiency grounds, leaving the rewriter
handling only the rare uses of INSERT/UPDATE/DELETE rules.

Agreed.  On the other hand, having such a function in core may ensure that any
significant change in those area will keep an API to retrieve the final query
representation.

My point is precisely that I'm unwilling to make such a promise.

I do not buy that this capability is worth very much, given that
we've gotten along fine without it for twenty-plus years.  If you
want to have it as an internal, might-change-at-any-time API,
that seems all right.  If you're trying to lock it down as something
that will be there forevermore, you're likely to end up with nothing.

regards, tom lane



I have to say that such a feature would be very helpful for some DBAs,
especially for migration work. The problem is when you have tons of views
that call other views in the FROM or JOIN clauses. These views also call
other views, etc. I have had instances with up to 25 nested view calls.
When you want to clean up this kind of code, using get_query_def() helps
save a lot of manual rewrite time and headache in getting to the final
executed code.



If we could at least call get_query_def() through an extension if we
didn't have a function, it would be ideal for DBAs. I agree this is
unusual, but when it does happen to you, being able to call
get_query_def() helps a lot.



--
Gilles Darold
http://www.darold.net/



Re: [PATCH] proposal for regexp_count, regexp_instr, regexp_substr and regexp_replace

2021-03-21 Thread Gilles Darold
On 21/03/2021 at 15:53, Tom Lane wrote:
> Chapman Flack  writes:
>> If this turns out to be a case of "attached the wrong patch, here's
>> the one that does implement foo_regex functions!" then I reserve an
>> objection to that. :)
>>

And the patch renamed.

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index fee0561961..36e446cb7b 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -3097,6 +3097,66 @@ repeat('Pg', 4) PgPgPgPg

   
 
+  
+   
+
+ regexp_count
+
+regexp_count ( string text, pattern text [, position integer ] [, flags text ] )
+integer
+   
+   
+Returns the number of times the POSIX regular expression
+pattern matches in the string; see
+.
+   
+   
+regexp_count('123456789012', '\d{3}', 3)
+3
+   
+  
+
+  
+   
+
+ regexp_instr
+
+regexp_instr ( string text, pattern text [, position integer ] [, occurrence integer ] [, returnopt integer ] [, flags text ] [, group integer ] )
+integer
+   
+   
+   Returns the position within string where the
+   match of a POSIX regular expression occurs. It returns an integer
+   indicating the beginning or ending position of the matched substring,
+   depending on the value of the returnopt argument
+   (default beginning). If no match is found the function returns 0;
+   see .
+   
+   
+regexp_instr('1234567890', '(123)(4(56)(78))', 1, 1, 0, 'i', 4)
+7
+   
+  
+
+  
+   
+
+ regexp_like
+
+regexp_like ( string text, pattern text [, flags text ] )
+boolean
+   
+   
+Evaluates the existence of a match to a POSIX regular expression
+in string; see .
+   
+   
+regexp_like('Hello'||chr(10)||'world', '^world$', 'm')
+t
+   
+  
+
+
   

 
@@ -3144,7 +3204,7 @@ repeat('Pg', 4) PgPgPgPg
 
  regexp_replace
 
-regexp_replace ( string text, pattern text, replacement text [, flags text ] )
+regexp_replace ( string text, pattern text, replacement text [, position integer ] [, occurence integer ]  [, flags text ] )
 text


@@ -3157,6 +3217,24 @@ repeat('Pg', 4) PgPgPgPg

   
 
+  
+   
+
+ regexp_substr
+
+regexp_substr ( string text, pattern text [, position integer ] [, occurence integer ] [, flags text ] [, group integer ] )
+text
+   
+   
+   Return the substring within string corresponding to the
+   match of a POSIX regular expression; see .
+   
+   
+regexp_substr('1234567890 1234557890', '(123)(4(5[56])(78))', 1, 2, 'i', 3)
+55
+   
+  
+
   

 
@@ -5295,6 +5373,18 @@ substring('foobar' similar '#"o_b#"%' escape '#')NULL
 regexp_split_to_array

+   
+regexp_like
+   
+   
+regexp_count
+   
+   
+regexp_instr
+   
+   
+regexp_substr
+   
 

  lists the available
@@ -5451,6 +5541,8 @@ substring('foobar' from 'o(.)b')   o
  It has the syntax
  regexp_replace(source,
  pattern, replacement
+ , position 
+ , occurrence 
  , flags ).
  The source string is returned unchanged if
  there is no match to the pattern.  If there is a
@@ -5464,12 +5556,19 @@ substring('foobar' from 'o(.)b')   o
  substring matching the entire pattern should be inserted.  Write
  \\ if you need to put a literal backslash in the replacement
  text.
- The flags parameter is an optional text
- string containing zero or more single-letter flags that change the
+ pattern is searched in string starting
+ from an optional position or from the beginning
+ of source by default.  Optional occurrence
+ parameter indicates which occurrence of pattern in
+ source should be replaced.  Default value for occurrence
+ is 1, replace only the first occurrence.  The flags parameter
+ is an optional text string containing zero or more single-letter flags that change the
  function's behavior.  Flag i specifies case-insensitive
  matching, while flag g specifies replacement of each matching
- substring rather than only the first one.  Supported flags (though
- not g) are
+ substring rather than only the first one. When occurrence
+ is set flag g has no effect. If occurrence
+ is set to zero, all occurrences are replaced which is similar to flag g.
+ Supported flags (though not g) are
  described in .
 
 
@@ -5482,6 +5581,10 @@ regexp_replace('foobarbaz', 'b..', 'X', 'g')
fooXX
 regexp_replace('foobarbaz', 'b(..)', 'X\1Y', 'g')
fooXarYXazY
+regexp_replace('A PostgreSQL function', 

Re: [PATCH] proposal for regexp_count, regexp_instr, regexp_substr and regexp_replace

2021-03-21 Thread Gilles Darold
Le 21/03/2021 à 15:53, Tom Lane a écrit :
> Chapman Flack  writes:
>> If this turns out to be a case of "attached the wrong patch, here's
>> the one that does implement foo_regex functions!" then I reserve an
>> objection to that. :)
> +1 to that.  Just to add a note, I do have some ideas about extending
> our regex parser so that it could duplicate the XQuery syntax --- none
> of the points we mention in 9.7.3.8 seem insurmountable.  I'm not
> planning to work on that in the near future, mind you, but I definitely
> think that we don't want to paint ourselves into a corner where we've
> already implemented the XQuery regex functions with the wrong behavior.
>
>   regards, tom lane


I apologize for the confusion caused by the words and phrases I used. This
patch implements the regexp_foo() functions that are available in most
RDBMSs, with the behavior described in the documentation. I have modified
the title of the patch in the commitfest to remove the incorrect reference
to XQuery.


I also don't know whether the other RDBMSs respect the XQuery behavior, but
from what I've seen of Oracle, it uses a limited set of regexp modifiers,
sometimes with different letters than PostgreSQL for the same behavior. I
have implemented these functions with the Oracle behavior in Orafce [1]
with a function that checks the modifiers used. This patch doesn't mimic
the Oracle behavior; it uses the PostgreSQL regexp behavior, the one used
by regexp_replace() and regexp_matches(). All regexp modifiers can be used.


[1] https://github.com/orafce/orafce/blob/master/orafce--3.14--3.15.sql


-- 
Gilles Darold
http://www.darold.net/



Re: [PATCH] proposal for regexp_count, regexp_instr, regexp_substr and regexp_replace

2021-03-21 Thread Gilles Darold
Le 21/03/2021 à 12:07, e...@xs4all.nl a écrit :
>> On 2021.03.20. 19:48 Gilles Darold  wrote:
>>  
>> This is a new version of the patch that now implements all the XQuery
>> regexp functions as described in the standard, minus the differences of
>> PostgreSQL regular expressions explained in [1].
>>
>> The SQL standard describes functions like_regex(), occurrences_regex(),
>> position_regex(), substring_regex() and translate_regex(), which
>> correspond to the commonly named functions regexp_like(),
>> regexp_count(), regexp_instr(), regexp_substr() and regexp_replace(), as
>> reported by Chapman Flack in [2]. All these functions are implemented in
>> [v2-0001-xquery-regexp-functions.patch]
> Hi,
>
> Apply, compile and (world)check are fine. I haven't found errors in 
> functionality.
>
> I went through the docs, and came up with these changes in func.sgml, and 
> pg_proc.dat.
>
> Useful functions - thanks!
>
> Erik Rijkers


Thanks a lot Erik, here is a version of the patch with your corrections.


-- 
Gilles Darold
LzLabs GmbH
http://www.lzlabs.com/

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index fee0561961..36e446cb7b 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -3097,6 +3097,66 @@ repeat('Pg', 4) PgPgPgPg

   
 
+  
+   
+
+ regexp_count
+
+regexp_count ( string text, pattern text [, position integer ] [, flags text ] )
+integer
+   
+   
+Returns the number of times a pattern occurs for a match of a POSIX
+regular expression to the string; see
+.
+   
+   
+regexp_count('123456789012', '\d{3}', 3)
+3
+   
+  
+
+  
+   
+
+ regexp_instr
+
+regexp_instr ( string text, pattern text [, position integer ] [, occurence integer ] [, returnopt integer ] [, flags text ] [, group integer ] )
+integer
+   
+   
+   Returns the position within string where the
+   match of a POSIX regular expression occurs. It returns an integer
+   indicating the beginning or ending position of the matched substring,
+   depending on the value of the returnopt argument
+   (default beginning). If no match is found the function returns 0;
+   see .
+   
+   
+regexp_instr('1234567890', '(123)(4(56)(78))', 1, 1, 0, 'i', 4)
+7
+   
+  
+
+  
+   
+
+ regexp_like
+
+regexp_like ( string text, pattern text [, flags text ] )
+boolean
+   
+   
+Evaluates the existence of a match to a POSIX regular expression
+in string; see .
+   
+   
+regexp_like('Hello'||chr(10)||'world', '^world$', 'm')
+t
+   
+  
+
+
   

 
@@ -3144,7 +3204,7 @@ repeat('Pg', 4) PgPgPgPg
 
  regexp_replace
 
-regexp_replace ( string text, pattern text, replacement text [, flags text ] )
+regexp_replace ( string text, pattern text, replacement text [, position integer ] [, occurence integer ]  [, flags text ] )
 text


@@ -3157,6 +3217,24 @@ repeat('Pg', 4) PgPgPgPg

   
 
+  
+   
+
+ regexp_substr
+
+regexp_substr ( string text, pattern text [, position integer ] [, occurence integer ] [, flags text ] [, group integer ] )
+text
+   
+   
+   Return the substring within string corresponding to the
+   match of a POSIX regular expression; see .
+   
+   
+regexp_substr('1234567890 1234557890', '(123)(4(5[56])(78))', 1, 2, 'i', 3)
+55
+   
+  
+
   

 
@@ -5295,6 +5373,18 @@ substring('foobar' similar '#"o_b#"%' escape '#')NULL
 regexp_split_to_array

+   
+regexp_like
+   
+   
+regexp_count
+   
+   
+regexp_instr
+   
+   
+regexp_substr
+   
 

  lists the available
@@ -5451,6 +5541,8 @@ substring('foobar' from 'o(.)b')   o
  It has the syntax
  regexp_replace(source,
  pattern, replacement
+ , position 
+ , occurrence 
  , flags ).
  The source string is returned unchanged if
  there is no match to the pattern.  If there is a
@@ -5464,12 +5556,19 @@ substring('foobar' from 'o(.)b')   o
  substring matching the entire pattern should be inserted.  Write
  \\ if you need to put a literal backslash in the replacement
  text.
- The flags parameter is an optional text
- string containing zero or more single-letter flags that change the
+ pattern is searched in string starting
+ from an optional position or from the beginning
+ of source by default.  Optional occurrence
+ parameter indicates which occurrence of pattern in
+ source should be replaced.  Default value 

Re: [PATCH] proposal for regexp_count, regexp_instr, regexp_substr and regexp_replace

2021-03-21 Thread Gilles Darold
Le 20/03/2021 à 19:48, Gilles Darold a écrit :
>
> Hi,
>
>
> This is a new version of the patch that now implements all the XQuery
> regexp functions as described in the standard, minus the differences
> of PostgreSQL regular expressions explained in [1].
>
>
> The SQL standard describes functions like_regex(), occurrences_regex(),
> position_regex(), substring_regex() and translate_regex(), which
> correspond to the commonly named functions regexp_like(),
> regexp_count(), regexp_instr(), regexp_substr() and regexp_replace(),
> as reported by Chapman Flack in [2]. All these functions are
> implemented in the patch. The syntax of the functions is:
>
>
> - regexp_like(string, pattern [, flags ])
>
> - regexp_count( string, pattern [, position ] [, flags ])
>
> - regexp_instr( string, pattern [, position ] [, occurrence ] [,
> returnopt ] [, flags ] [, group ])
>
> - regexp_substr( string, pattern [, position ] [, occurrence ] [,
> flags ] [, group ])
>
> - regexp_replace(source, pattern, replacement [, position ] [,
> occurrence ] [, flags ])
>
>
> In addition to the previous patch version I have added the regexp_like()
> function and extended the existing regexp_replace() function. The
> patch documents these functions and adds regression tests for all
> functions. I will add it to the commitfest.
>
>
> Another regexp function, regexp_positions(), which returns all
> occurrences matching a POSIX regular expression, is also being
> developed by Joel Jacobson, see [2]. This function expands the list of
> regexp functions described in XQuery.
>
>
> [1]
> https://www.postgresql.org/docs/13/functions-matching.html#FUNCTIONS-POSIX-REGEXP
>
> [2]
> https://www.postgresql.org/message-id/flat/bfd5-909d-408b-8531-95b32f18d4ab%40www.fastmail.com#3ec8ba658eeabcae2ac6ccca33bd1aed
>
>

I would like to see these functions in PG 14, but it is a bit too late;
I have added the patch to commitfest 2021-07.


-- 
Gilles Darold
LzLabs GmbH
http://www.lzlabs.com/



Re: [PATCH] proposal for regexp_count, regexp_instr, regexp_substr and regexp_replace

2021-03-20 Thread Gilles Darold
Hi,


This is a new version of the patch that now implements all the XQuery
regexp functions as described in the standard, minus the differences of
PostgreSQL regular expressions explained in [1].


The SQL standard describes functions like_regex(), occurrences_regex(),
position_regex(), substring_regex() and translate_regex(), which
correspond to the commonly named functions regexp_like(),
regexp_count(), regexp_instr(), regexp_substr() and regexp_replace(), as
reported by Chapman Flack in [2]. All these functions are implemented in
the patch. The syntax of the functions is:


- regexp_like(string, pattern [, flags ])

- regexp_count( string, pattern [, position ] [, flags ])

- regexp_instr( string, pattern [, position ] [, occurrence ] [,
returnopt ] [, flags ] [, group ])

- regexp_substr( string, pattern [, position ] [, occurrence ] [, flags
] [, group ])

- regexp_replace(source, pattern, replacement [, position ] [,
occurrence ] [, flags ])
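As a rough cross-check of the intended semantics of the documented examples (an illustration only, using Python's re module, not the patch's C implementation), keeping in mind that the SQL functions use 1-based positions:

```python
import re

# Rough Python analogues of the documented examples; the SQL functions use
# 1-based positions, so 1 is added to/subtracted from Python's 0-based offsets.

# regexp_count('123456789012', '\d{3}', 3): non-overlapping matches from position 3
count = len(re.findall(r'\d{3}', '123456789012'[3 - 1:]))
print(count)  # 3

# regexp_instr('1234567890', '(123)(4(56)(78))', 1, 1, 0, 'i', 4):
# start position of capture group 4 of the first occurrence
m = re.search(r'(123)(4(56)(78))', '1234567890', re.I)
print(m.start(4) + 1)  # 7

# regexp_substr('1234567890 1234557890', '(123)(4(5[56])(78))', 1, 2, 'i', 3):
# capture group 3 of the second occurrence
occ = list(re.finditer(r'(123)(4(5[56])(78))', '1234567890 1234557890', re.I))
print(occ[2 - 1].group(3))  # 55

# regexp_like('Hello'||chr(10)||'world', '^world$', 'm'): multiline anchor match
print(bool(re.search(r'^world$', 'Hello\nworld', re.M)))  # True
```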


In addition to the previous patch version I have added the regexp_like()
function and extended the existing regexp_replace() function. The patch
documents these functions and adds regression tests for all functions. I
will add it to the commitfest.


Another regexp function, regexp_positions(), which returns all
occurrences matching a POSIX regular expression, is also being developed
by Joel Jacobson, see [2]. This function expands the list of regexp
functions described in XQuery.


[1]
https://www.postgresql.org/docs/13/functions-matching.html#FUNCTIONS-POSIX-REGEXP

[2]
https://www.postgresql.org/message-id/flat/bfd5-909d-408b-8531-95b32f18d4ab%40www.fastmail.com#3ec8ba658eeabcae2ac6ccca33bd1aed


-- 
Gilles Darold
LzLabs GmbH
http://www.lzlabs.com/

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index fee0561961..f5f08f1509 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -3097,6 +3097,66 @@ repeat('Pg', 4) PgPgPgPg

   
 
+  
+   
+
+ regexp_count
+
+regexp_count ( string text, pattern text [, position integer ] [, flags text ] )
+integer
+   
+   
+Return the number of times a pattern occurs for a match of a POSIX
+regular expression to the string; see
+.
+   
+   
+regexp_count('123456789012', '\d{3}', 3)
+3
+   
+  
+
+  
+   
+
+ regexp_instr
+
+regexp_instr ( string text, pattern text [, position integer ] [, occurence integer ] [, returnopt integer ] [, flags text ] [, group integer ] )
+integer
+   
+   
+   Return the position within string where the
+   match of a POSIX regular expression occurs. It returns an integer
+   indicating the beginning or ending position of the matched substring,
+   depending on the value of the returnopt argument
+   (default beginning). If no match is found, then the function returns 0;
+   see .
+   
+   
+regexp_instr('1234567890', '(123)(4(56)(78))', 1, 1, 0, 'i', 4)
+7
+   
+  
+
+  
+   
+
+ regexp_like
+
+regexp_like ( string text, pattern text [, flags text ] )
+boolean
+   
+   
+Evaluate the existence of a match to a POSIX regular expression
+in string; see .
+   
+   
+regexp_like('Hello'||chr(10)||'world', '^world$', 'm')
+t
+   
+  
+
+
   

 
@@ -3144,7 +3204,7 @@ repeat('Pg', 4) PgPgPgPg
 
  regexp_replace
 
-regexp_replace ( string text, pattern text, replacement text [, flags text ] )
+regexp_replace ( string text, pattern text, replacement text [, position integer ] [, occurence integer ]  [, flags text ] )
 text


@@ -3157,6 +3217,24 @@ repeat('Pg', 4) PgPgPgPg

   
 
+  
+   
+
+ regexp_substr
+
+regexp_substr ( string text, pattern text [, position integer ] [, occurence integer ] [, flags text ] [, group integer ] )
+text
+   
+   
+   Return the substring within string corresponding to the
+   match of a POSIX regular expression; see .
+   
+   
+regexp_substr('1234567890 1234557890', '(123)(4(5[56])(78))', 1, 2, 'i', 3)
+55
+   
+  
+
   

 
@@ -5295,6 +5373,18 @@ substring('foobar' similar '#"o_b#"%' escape '#')NULL
 regexp_split_to_array

+   
+regexp_like
+   
+   
+regexp_count
+   
+   
+regexp_instr
+   
+   
+regexp_substr
+   
 

  lists the available
@@ -5451,6 +5541,8 @@ substring('foobar' from 'o(.)b')   o
  It has the syntax
  regexp_replace(source,
  pattern, replacement
+ , position 
+ , occurrence 
  , flags ).
  The source string is returned unchanged if
  there is no match to the pattern.  If there is a
@@ -5464,12 +5556,19 @@ substri

Re: [PATCH] Hooks at XactCommand level

2021-03-19 Thread Gilles Darold
Le 12/03/2021 à 06:55, Julien Rouhaud a écrit :
> Hi,
>
> On Tue, Dec 08, 2020 at 11:15:12AM +0100, Gilles Darold wrote:
>> Based on a PoC reported in a previous thread [1] I'd like to propose new
>> hooks around transaction commands. The objective of this patch is to
>> allow PostgreSQL extension to act at start and end (including abort) of
>> a SQL statement in a transaction.
>>
>> The idea for these hooks is born from the no go given to Takayuki
>> Tsunakawa's patch[2] proposing an in core implementation of
>> statement-level rollback transaction and the pg_statement_rollback
>> extension[3] that we have developed at LzLabs. The extension
>> pg_statement_rollback has two limitation, the first one is that the
>> client still have to call the ROLLBACK TO SAVEPOINT when an error is
>> encountered and the second is that it generates a crash when PostgreSQL
>> is compiled with assert that can not be fixed at the extension level.
> This topic came up quite often on the mailing list, the last being from Álvaro
> at [1].  I think there's a general agreement that customers want that feature,
> won't stop asking for it, and many if not all forks ended up implementing it.
>
> I would still prefer if he had a way to support if in vanilla postgres, with 
> of
> course all possible safeguards to avoid an epic fiasco.


I have added Álvaro and Takayuki to the thread; this patch is inspired
by their proposals. I wrote this patch after reading the thread and
concluding that a core implementation doesn't seem to have reached
consensus, and that this feature could instead be made available to
users through an extension.


> I personally think that Álvaro's previous approach, giving the ability to
> specify the rollback behavior in the TransactionStmt grammar, would be enough
> (I mean without the GUC part) to cover realistic and sensible usecases, which
> is where the client fully decides whether there's a statement level rollback 
> or
> not.  One could probably build a custom module on top of that to introduce 
> some
> kind of GUC to change the behavior more globally if it wants to take that 
> risk.


Yes, probably. With this patch I just want to propose an external
implementation of the feature. The extension implementation "just"
requires these three hooks to offer the same feature as if it were
implemented in vanilla postgres. The feature can simply be enabled or
disabled through a custom user-defined variable, either before a
transaction is started or globally for all transactions.


> If such an approach is still not wanted for core inclusion, then I'm in favor
> of adding those hooks.  There's already a published extension that tries to
> implement that (for complete fairness I'm one of the people to blame), but as
> Gilles previously mentioned this is very hackish and the currently available
> hooks makes it very hard if not impossible to have a perfect implementation.
> It's clear that people will never stop to try doing it, so at least let's make
> it possible using a custom module.
>
> It's also probably worthwhile to mention that the custom extension 
> implementing
> server side statement level rollback wasn't implemented because it wasn't
> doable in the client side, but because the client side implementation was
> causing a really big overhead due to the need of sending the extra commands,
> and putting it on the server side lead to really significant performance
> improvement.

Right, the closest extension to this feature is the one we developed at
LzLabs [2], but it still requires a rollback to savepoint on the client
side in case of error. The extension [3] using these hooks doesn't have
this limitation; everything is handled server side.
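To make the limitation concrete, here is a toy sketch of the client-side protocol that the hook-based extension makes unnecessary: wrap every statement in a savepoint and roll back to it on error, so a single failing statement does not abort the whole transaction. The names are hypothetical and the "connection" is an in-memory stand-in, not a real driver.

```python
class ToyConnection:
    """Minimal in-memory stand-in for a database connection in a transaction."""

    def __init__(self):
        self.log = []           # statements that "took effect"
        self._savepoint = None  # snapshot taken at the last SAVEPOINT

    def execute(self, sql):
        if sql.startswith("SAVEPOINT"):
            self._savepoint = list(self.log)
        elif sql.startswith("ROLLBACK TO SAVEPOINT"):
            self.log = list(self._savepoint)
        elif "boom" in sql:
            raise RuntimeError("statement failed")
        else:
            self.log.append(sql)


def run_with_statement_rollback(conn, statements):
    """Client-side statement-level rollback: a savepoint before each
    statement, and on error roll back only that statement."""
    for sql in statements:
        conn.execute("SAVEPOINT s")
        try:
            conn.execute(sql)
        except RuntimeError:
            conn.execute("ROLLBACK TO SAVEPOINT s")


conn = ToyConnection()
run_with_statement_rollback(conn, ["INSERT 1", "SELECT boom", "INSERT 2"])
print(conn.log)  # ['INSERT 1', 'INSERT 2'] -- the failure did not abort the rest
```

With a server-side implementation based on these hooks, the extra SAVEPOINT/ROLLBACK round trips disappear, which is where the performance gain mentioned above comes from.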


>> Although that I have not though about other uses for these hooks, they
>> will allow a full server side statement-level rollback feature like in
>> commercial DBMSs like DB2 and Oracle. This feature is very often
>> requested by users that want to migrate to PostgreSQL.
> I also thought about it, and I don't really see other possible usage for those
> hooks.


I too don't have much imagination for other possible uses of these
hooks, but I hope this feature by itself can justify them. I did think
that if we exposed the query_string at the command_start hook we could
allow external modules to modify it, but that is surely the worst idea I
can produce.


>> There is no additional syntax or GUC, the patch just adds three new hooks:
>>
>>
>> * start_xact_command_hook called at end of the start_xact_command()
>> function.
>> * finish_xact_command called in finish_xact_command() just before
>> CommitTransactionCommand().
>> * abort_current_transaction_hook called after an error is encountered at
>> end of 

Re: MultiXact\SLRU buffers configuration

2021-03-15 Thread Gilles Darold
Le 12/03/2021 à 13:44, Andrey Borodin a écrit :
>
>> 11 марта 2021 г., в 20:50, Gilles Darold  написал(а):
>>
>>
>> The patch doesn't apply anymore on master because of this error: patch failed: 
>> src/backend/utils/init/globals.c:150
>>
>>
>>
>> Another remark about this patch is that it should be mentioned in the 
>> documentation (doc/src/sgml/config.sgml) that the new configuration 
>> variables need a server restart, for example by adding "This parameter can 
>> only be set at server start." like for shared_buffers. The patch on 
>> postgresql.conf mentions it.
>>
>> And some typo to be fixed:
>>
>>
>>
>> s/Tipically/Typically/
>>
>> s/asincronous/asynchronous/
>>
>> s/confugured/configured/
>>
>> s/substrnsactions/subtransactions/
>>
>>
> Thanks, Gilles! Fixed.
>
> Best regards, Andrey Borodin.
>

Hi Andrey,

I found two problems in this patch, first in src/include/miscadmin.h
multixact_members_slru_buffers is declared twice:


 extern PGDLLIMPORT int max_parallel_workers;
+extern PGDLLIMPORT int multixact_offsets_slru_buffers;
+extern PGDLLIMPORT int multixact_members_slru_buffers;
+extern PGDLLIMPORT int multixact_members_slru_buffers;  <-
+extern PGDLLIMPORT int subtrans_slru_buffers;


In file src/backend/access/transam/multixact.c the second variable
should be multixact_members_slru_buffers and not
multixact_offsets_slru_buffers.


@@ -1848,13 +1848,13 @@ MultiXactShmemInit(void)
    MultiXactMemberCtl->PagePrecedes = MultiXactMemberPagePrecedes;

    SimpleLruInit(MultiXactOffsetCtl,
- "MultiXactOffset",
NUM_MULTIXACTOFFSET_BUFFERS, 0,
+ "MultiXactOffset",
multixact_offsets_slru_buffers, 0,
  MultiXactOffsetSLRULock,
"pg_multixact/offsets",
  LWTRANCHE_MULTIXACTOFFSET_BUFFER,
  SYNC_HANDLER_MULTIXACT_OFFSET);
    SlruPagePrecedesUnitTests(MultiXactOffsetCtl,
MULTIXACT_OFFSETS_PER_PAGE);
    SimpleLruInit(MultiXactMemberCtl,
- "MultiXactMember",
NUM_MULTIXACTMEMBER_BUFFERS, 0,
+ "MultiXactMember",
multixact_offsets_slru_buffers, 0,    <--
      MultiXactMemberSLRULock,
"pg_multixact/members",
  LWTRANCHE_MULTIXACTMEMBER_BUFFER,
  SYNC_HANDLER_MULTIXACT_MEMBER);



Please fix them so that I can end the review.


-- 
Gilles Darold
LzLabs GmbH
http://www.lzlabs.com/





Re: MultiXact\SLRU buffers configuration

2021-03-11 Thread Gilles Darold

Le 15/02/2021 à 18:17, Andrey Borodin a écrit :



23 дек. 2020 г., в 21:31, Gilles Darold  написал(а):

Sorry for the response delay; we have run several other tests trying to figure 
out the performance gain per patch, but unfortunately we have very erratic 
results. With the same parameters and patches, the test doesn't return the same 
results depending on the day or the hour of the day. This is very frustrating and 
I suppose it is related to the Azure architecture. The only thing I am 
sure of is that we had the best performance results with all patches and

multixact_offsets_slru_buffers = 256
multixact_members_slru_buffers = 512
multixact_local_cache_entries = 4096


but I cannot say whether all or only some of the patches improve performance. 
My feeling is that the performance gains related to patches 1 (shared lock) and 3 
(conditional variable) do not contribute much compared to just tuning the 
multixact buffers. This is when multixact contention is observed, but perhaps 
they are delaying the contention. It's all the more frustrating that we had a 
test case to reproduce the contention, but apparently not the architecture.

Hi! Thanks for the input.
I think we have a consensus here that configuring SLRU size is beneficial for 
MultiXacts.
There is a proposal in a nearby thread [0] on changing the default value of the 
commit_ts SLRU buffers.
In my experience, from time to time there can be problems with subtransactions 
that are cured by extending the subtrans SLRU.

Let's make all SLRUs configurable?
PFA patch with draft of these changes.

Best regards, Andrey Borodin.


[0] 
https://www.postgresql.org/message-id/flat/20210115220744.GA24457%40alvherre.pgsql



The patch doesn't apply anymore on master because of this error: patch failed: 
src/backend/utils/init/globals.c:150



Another remark about this patch is that it should be mentioned in the 
documentation (doc/src/sgml/config.sgml) that the new configuration 
variables need a server restart, for example by adding "This parameter 
can only be set at server start." like for shared_buffers. The patch on 
postgresql.conf mentions it.


And some typo to be fixed:


   s/Tipically/Typically/

   s/asincronous/asynchronous/

   s/confugured/configured/

   s/substrnsactions/subtransactions/


--
Gilles Darold
LzLabs GmbH



Re: [PATCH] regexp_positions ( string text, pattern text, flags text ) → setof int4range[]

2021-03-04 Thread Gilles Darold

Le 04/03/2021 à 16:40, Tom Lane a écrit :

"Joel Jacobson"  writes:

Having abandoned the cute idea that didn't work,
here comes a new patch with a regexp_positions() instead returning
setof record (start_pos integer[], end_pos integer[]).

I wonder if a 2-D integer array wouldn't be a better idea,
ie {{startpos1,length1},{startpos2,length2},...}.  My experience
with working with parallel arrays in SQL has been unpleasant.

Also, did you see

https://www.postgresql.org/message-id/fc160ee0-c843-b024-29bb-97b5da61971f%40darold.net

Seems like there may be some overlap in these proposals.



The object of regexp_positions() is to return the start and end of all 
captured substrings; it overlaps a little with regexp_instr(), which 
returns the start or end position of a specific captured substring. I 
think it is a good idea to have a function that returns all positions 
instead of a single one like regexp_instr(); this is not the same usage. 
Actually regexp_positions() is exactly the same as regexp_matches() 
except that it returns positions instead of substrings.



I also think that it should return a setof 2-D integer array; another 
solution is to return all start/end positions of an occurrence chained 
in a flat integer array {start1,end1,start2,end2,...}.
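To illustrate the shape being discussed (a sketch only, using Python's re module; the real function would be implemented in C), the setof 2-D array variant suggested above would look like:

```python
import re

def regexp_positions(string, pattern, flags=0):
    """Sketch: one {{start,length},...}-style row per match, with 1-based
    starts, mirroring regexp_matches() but yielding positions instead of
    the matched substrings."""
    return [[m.start() + 1, m.end() - m.start()]
            for m in re.finditer(pattern, string, flags)]

print(regexp_positions('foobarbaz', 'b..'))  # [[4, 3], [7, 3]]
```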



Regards,

--
Gilles Darold





Re: [PATCH] proposal for regexp_count, regexp_instr, regexp_substr and regexp_replace

2021-03-03 Thread Gilles Darold
My apologies for the links in the header, the email formatting and the
missing patch; I accidentally sent the email too early.

--

Gilles

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index bf99f82149..88e08b40d2 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -3097,6 +3097,47 @@ repeat('Pg', 4) PgPgPgPg

   
 
+  
+   
+
+ regexp_count
+
+regexp_count ( string text, pattern text [, position integer ] [, flags text ] )
+integer
+   
+   
+Return the number of times a pattern occurs for a match of a POSIX
+regular expression to the string; see
+.
+   
+   
+regexp_count('123456789012', '\d{3}', 3)
+3
+   
+  
+
+  
+   
+
+ regexp_instr
+
+regexp_instr ( string text, pattern text [, position integer ] [, occurence integer ] [, returnopt integer ] [, flags text ] [, group integer ] )
+integer
+   
+   
+   Return the position within string where the
+   match of a POSIX regular expression occurs. It returns an integer
+   indicating the beginning or ending position of the matched substring,
+   depending on the value of the returnopt argument
+   (default beginning). If no match is found, then the function returns 0;
+   see .
+   
+   
+regexp_instr('1234567890', '(123)(4(56)(78))', 1, 1, 0, 'i', 4)
+7
+   
+  
+
   

 
@@ -3157,6 +3198,24 @@ repeat('Pg', 4) PgPgPgPg

   
 
+  
+   
+
+ regexp_substr
+
+regexp_substr ( string text, pattern text [, position integer ] [, occurence integer ] [, flags text ] [, group integer ] )
+text
+   
+   
+   Return the substring within string corresponding to the
+   match of a POSIX regular expression; see .
+   
+   
+regexp_substr('1234567890 1234557890', '(123)(4(5[56])(78))', 1, 2, 'i', 3)
+55
+   
+  
+
   

 
@@ -5295,6 +5354,15 @@ substring('foobar' similar '#"o_b#"%' escape '#')NULL
 regexp_split_to_array

+   
+regexp_count
+   
+   
+regexp_instr
+   
+   
+regexp_substr
+   
 

  lists the available
@@ -5669,6 +5737,132 @@ SELECT foo FROM regexp_split_to_table('the quick brown fox', '\s*') AS foo;
 in practice.  Other software systems such as Perl use similar definitions.

 
+
+ The regexp_count function returns the number of
+ captured substring(s) resulting from matching a POSIX regular
+ expression pattern to a string.  It has the syntax regexp_count(
+ string, pattern ,
+ position  , flags ).
+ pattern is searched in string starting
+ from an optional position or from the beginning of string
+ by default.  The flags parameter is an optional text string
+ containing zero or more single-letter flags that change the function's behavior.
+ regexp_count accepts all the flags
+ shown in .
+ The g flag is forced internally to count all matches.
+ This function returns the number of matches as an integer, or 0 if
+ there is no match.
+
+
+   
+Some examples:
+
+SELECT regexp_count('123123123123', '\d{3}', 1);
+ regexp_count 
+--
+4
+(1 row)
+
+SELECT regexp_count('Hello'||CHR(10)||'world!', '^world!$', 1, 'm');
+ regexp_count 
+--
+1
+(1 row)
+
+   
+
+
+ The regexp_instr function returns the beginning or ending
+ position of the matched substring resulting from matching a POSIX regular
+ expression pattern to a string.  It has the syntax regexp_instr(
+ string, pattern ,
+ position  , occurrence 
+ , returnopt 
+ , flags 
+ , group ).
+ pattern is searched in string starting
+ from an optional position or from the beginning
+ of string by default.  occurrence
+ indicates which occurrence of pattern in string
+ should be searched.  When returnopt is set to 0 (default) the function
+ returns the position of the first character of the occurrence.  When set to 1 returns the position
+ of the character after the occurrence.
+ The flags parameter is an optional text string
+ containing zero or more single-letter flags that change the function's behavior.
+ regexp_instr accepts all the flags
+ shown in .
+ The g flag is forced internally to track all matches.
+ For a pattern with capture groups, group is an integer indicating
+ which capture in pattern is the target of the function.  A capture group is a part of the pattern
+ enclosed in parentheses. Capture groups can be nested.  They are numbered in order in which their
+ left parentheses appear in pattern. If group is zero, then the position
+ of the entire substring that matches the pattern is returned. If pattern
+ does not have at least group 

[PATCH] proposal for regexp_count, regexp_instr, regexp_substr and regexp_replace

2021-03-03 Thread Gilles Darold
position and occurrence
capabilities.

The patch is ready for testing with documentation and regression tests.
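The extended regexp_replace() semantics (search from position, replace only the Nth occurrence, occurrence = 0 meaning replace all) can be sketched as follows; this is an illustrative Python model of the described behavior, not the patch code:

```python
import re

def regexp_replace(source, pattern, replacement, position=1, occurrence=1, flags=0):
    """Model of the extended semantics: search from `position` (1-based) and
    replace only the `occurrence`-th match; occurrence=0 replaces all
    matches, like the g flag."""
    prefix, rest = source[:position - 1], source[position - 1:]
    if occurrence == 0:
        return prefix + re.sub(pattern, replacement, rest, flags=flags)
    hits = list(re.finditer(pattern, rest, flags))
    if len(hits) < occurrence:
        return source  # no such occurrence: string returned unchanged
    m = hits[occurrence - 1]
    return prefix + rest[:m.start()] + m.expand(replacement) + rest[m.end():]

print(regexp_replace('foobarbaz', 'b..', 'X', 1, 2))  # foobarX
print(regexp_replace('foobarbaz', 'b..', 'X', 1, 0))  # fooXX
```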


Best regards,

-- 
Gilles Darold
LzLabs GmbH






Re: MultiXact\SLRU buffers configuration

2020-12-23 Thread Gilles Darold

Le 13/12/2020 à 18:24, Andrey Borodin a écrit :



13 дек. 2020 г., в 14:17, Gilles Darold  написал(а):

I've done more review on these patches.

Thanks, Gilles! I'll incorporate all your fixes to patchset.
Can you also benchmark conditional variable sleep? The patch "Add conditional 
variable to wait for next MultXact offset in corner case"?
The problem manifests on Standby when Primary is heavily loaded with 
MultiXactOffsetControlLock.

Thanks!

Best regards, Andrey Borodin.


Hi Andrey,


Sorry for the response delay; we have run several other tests trying to 
figure out the performance gain per patch, but unfortunately we have 
very erratic results. With the same parameters and patches, the test 
doesn't return the same results depending on the day or the hour of the 
day. This is very frustrating and I suppose it is related to the 
Azure architecture. The only thing I am sure of is that we had the 
best performance results with all patches and


   multixact_offsets_slru_buffers = 256
   multixact_members_slru_buffers = 512
   multixact_local_cache_entries = 4096

but I cannot say whether all or only some of the patches improve 
performance. My feeling is that the performance gains related to patches 1 
(shared lock) and 3 (conditional variable) do not contribute much compared 
to just tuning the multixact buffers. 
This is when multixact contention is observed, but perhaps they are 
delaying the contention. It's all the more frustrating that we had a 
test case to reproduce the contention, but apparently not the architecture.



Can't do much more at this point.


Best regards,

--
Gilles Darold
LzLabs GmbH
http://www.lzlabs.com/



Re: [BUG] orphaned function

2020-12-18 Thread Gilles Darold

Le 18/12/2020 à 00:26, Tom Lane a écrit :

Gilles Darold  writes:

The same problem applies if the returned type or the procedural language
is dropped. I have tried to fix that in ProcedureCreate() by using an
AccessShareLock for the data types and languages involved in the
function declaration. With this patch the race conditions with parameter
types, return types and PL languages are fixed. Only data types and
procedural languages with Oid greater than FirstBootstrapObjectId will
be locked. But this is probably more complex than this fix, so it
is proposed as a separate patch
(v1-003-orphaned_function_types_language.patch) so as not to interfere with
the applying of Bertrand's patch.

Indeed.  This points up one of the things that is standing in the way
of any serious consideration of applying this patch.  To my mind there
are two fundamental, somewhat interrelated considerations that haven't
been meaningfully addressed:

1. What set of objects is it worth trying to remove this race condition
for, and with respect to what dependencies?  Bertrand gave no
justification for worrying about function-to-namespace dependencies
specifically, and you've likewise given none for expanding the patch
to consider function-to-datatype dependencies.  There are dozens more
cases that could be considered; but I sure don't want to be processing
another ad-hoc fix every time someone thinks they're worried about
another specific case.

Taking a practical viewpoint, I'm far more worried about dependencies
of tables than those of any other kind of object.  A messed-up function
definition doesn't cost you much: you can just drop and recreate the
function.  A table full of data is way more trouble to recreate, and
indeed the data might be irreplaceable.  So it seems pretty silly to
propose that we try to remove race conditions for functions' dependencies
on datatypes without removing the same race condition for tables'
dependencies on their column datatypes.

But any of these options lead to the same question: why stop there?
An approach that would actually be defensible, perhaps, is to incorporate
this functionality into the dependency mechanism: any time we're about to
create a pg_depend or pg_shdepend entry, take out an AccessShareLock on
the referenced object, and then check to make sure it still exists.
This would improve the dependency mechanism so it prevents creation-time
race conditions, not just attempts to drop dependencies of
already-committed objects.  It would mean that the one patch would be
the end of the problem, rather than looking forward to a steady drip of
new fixes.
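The lock-then-recheck idea described above can be sketched as a toy single-process model. This is purely illustrative Python, not PostgreSQL's lock manager or catalog code; all names (catalog, record_dependency, the OID values) are invented for the sketch:

```python
import threading

# Toy catalog: oid -> description. One lock per object stands in for
# the object-level lock that both dependency creation (shared) and
# DROP (exclusive) would have to acquire.
catalog = {1001: "schema tobeorph"}
object_locks = {1001: threading.Lock()}

def record_dependency(referenced_oid):
    """Sketch of the proposal: lock the referenced object, then verify
    it still exists before creating the pg_depend entry."""
    with object_locks[referenced_oid]:   # blocks if a concurrent DROP holds it
        if referenced_oid not in catalog:
            raise LookupError(f"object {referenced_oid} was concurrently dropped")
        return ("pg_depend", referenced_oid)  # dependency safely recorded

def drop_object(oid):
    with object_locks[oid]:
        del catalog[oid]

print(record_dependency(1001))  # works while the referenced schema exists
drop_object(1001)
# record_dependency(1001) would now raise LookupError instead of
# silently creating a dependency on a vanished object.
```

The point of the model is the ordering: the existence check happens while the lock is held, so a concurrent DROP can no longer slip in between the check and the creation of the dependency row.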

2. How much are we willing to pay for this added protection?  The fact
that we've gotten along fine without it for years suggests that the
answer needs to be "not much".  But I'm not sure that that actually
is the answer, especially if we don't have a policy that says "we'll
protect against these race conditions but no other ones".  I think
there are possibly-serious costs in three different areas:

* Speed.  How much do all these additional lock acquisitions slow
down a typical DDL command?

* Number of locks taken per transaction.  This'd be particularly an
issue for pg_restore runs using single-transaction mode: they'd take
not only locks on the objects they create, but also a bunch of
reference-protection locks.  It's not very hard to believe that this'd
make a difference in whether restoring a large database is possible
without increasing max_locks_per_transaction.
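To put rough numbers on that concern: PostgreSQL's shared lock table is sized at approximately max_locks_per_transaction * (max_connections + max_prepared_transactions) slots. A back-of-the-envelope calculation (the restore object counts here are invented for illustration) might look like:

```python
# Approximate number of slots in the shared lock table, per the
# documented sizing formula.
def lock_table_slots(max_locks_per_transaction=64,
                     max_connections=100,
                     max_prepared_transactions=0):
    return max_locks_per_transaction * (max_connections + max_prepared_transactions)

# With default settings: 64 * (100 + 0) slots shared by all backends.
print(lock_table_slots())  # 6400

# A hypothetical single-transaction pg_restore creating 5000 objects,
# each additionally taking a few reference-protection locks, would need
# far more than one backend's fair share of those slots:
objects, ref_locks_per_object = 5000, 3
print(objects * (1 + ref_locks_per_object))  # 20000 locks in one transaction
```

Which is exactly the scenario where the restore fails unless max_locks_per_transaction is raised.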

* Risk of deadlock.  The reference locks themselves should be sharable,
so maybe there isn't much of a problem, but I want to see this question
seriously analyzed not just ignored.

Obviously, the magnitude of these costs depends a lot on how many
dependencies we want to remove the race condition for.  But that's
exactly the reason why I don't want a piecemeal approach of fixing
some problems now and some other problems later.  That's way too
much like the old recipe for boiling a frog: we could gradually get
into serious performance problems without anyone ever having stopped
to consider the issue.

In short, I think we should either go for a 100% solution if careful
analysis shows we can afford it, or else have a reasoned policy
why we are going to close these specific race conditions and no others
(implying that we'll reject future patches in the same area).  We
haven't got either thing in this thread as it stands, so I do not
think it's anywhere near being ready to commit.

regards, tom lane



Thanks for the review,


Honestly, I have never met these race conditions in 25 years of 
PostgreSQL and probably never will, but clearly I don't have the Amazon 
database services workload. I let Bertrand decide what to do with the 
patch and address the race conditions with a more global approach 
following your advice, if more work is done on this topic. I tag the 
patch as "Waiting on author".



Best regards,

--

Re: [UNVERIFIED SENDER] Re: [BUG] orphaned function

2020-12-17 Thread Gilles Darold

Le 02/12/2020 à 11:46, Drouvot, Bertrand a écrit :


I ended up making more changes in QualifiedNameGetCreationNamespace() 
(https://doxygen.postgresql.org/namespace_8c.html#a2cde9c8897e89ae47a99c103c4f96d31) 
to mimic the retry() logic that is in 
RangeVarGetAndCheckCreationNamespace().


The attached v2 patch, now protects the function to be orphaned in 
those 2 scenarios:


Scenario 1: first, session 1 begins the CREATE FUNCTION, then session 2 
drops the schema: the DROP SCHEMA is blocked and would produce (once the 
CREATE FUNCTION finishes):


bdt=# 2020-12-02 10:12:46.364 UTC [15810] ERROR:  cannot drop schema 
tobeorph because other objects depend on it
2020-12-02 10:12:46.364 UTC [15810] DETAIL:  function 
tobeorph.bdttime() depends on schema tobeorph
2020-12-02 10:12:46.364 UTC [15810] HINT:  Use DROP ... CASCADE to 
drop the dependent objects too.

2020-12-02 10:12:46.364 UTC [15810] STATEMENT:  drop schema tobeorph;

Scenario 2: first, session 1 begins the DROP SCHEMA, then session 2 
creates the function: the CREATE FUNCTION is blocked and would produce 
(once the DROP SCHEMA finishes):


2020-12-02 10:14:29.468 UTC [15822] ERROR:  schema "tobeorph" does not 
exist


Thanks

Bertrand



Hi,

This is a review for the orphaned function bug fix 
https://www.postgresql.org/message-id/flat/5a9daaae-5538-b209-6279-e903c3ea2...@amazon.com



Contents & Purpose
==

This patch fixes a historical race condition in PostgreSQL that leaves a 
function orphaned of its namespace. It happens when a function is created 
in a namespace inside a not-yet-committed transaction while another 
session drops the namespace. The function becomes orphaned and can no 
longer be used.


postgres=# \df *.bdttime
                               List of functions
 Schema |  Name   |      Result data type       | Argument data types | Type
--------+---------+-----------------------------+---------------------+------
        | bdttime | timestamp without time zone |                     | func



Initial Run
===

The patch applies cleanly to HEAD. The regression tests all pass 
successfully with the new behavior. Given the example case to reproduce, 
the second session now waits until the first session commits, and when 
it does, an error is thrown:


    postgres=# drop schema tobeorph;
    ERROR:  cannot drop schema tobeorph because other objects depend on it
    DETAIL:  function tobeorph.bdttime() depends on schema tobeorph
    HINT:  Use DROP ... CASCADE to drop the dependent objects too.

The function is left well defined in the catalog.


Comments


As Tom mentioned, there are still pending DDL concurrency problems. For 
example, if session 1 executes the following:


    CREATE TYPE public.foo as enum ('one', 'two');
    BEGIN;
    CREATE OR REPLACE FUNCTION tobeorph.bdttime(num foo) RETURNS 
TIMESTAMP AS $$

    DECLARE
       outTS TIMESTAMP;
    BEGIN
       perform pg_sleep(10);
       RETURN CURRENT_TIMESTAMP;
    END;
    $$ LANGUAGE plpgsql volatile;

if session 2 drops the data type before session 1 commits, the function 
will be declared with an invalid parameter data type.


The same problem applies if the returned type or the procedural language 
is dropped. I have tried to fix that in ProcedureCreate() by using an 
AccessShareLock for the data types and languages involved in the 
function declaration. With this patch the race conditions with parameter 
types, return types and PL languages are fixed. Only data types and 
procedural languages with Oid greater than FirstBootstrapObjectId will 
be locked. But this is probably more complex than this fix, so it 
is proposed as a separate patch 
(v1-003-orphaned_function_types_language.patch) so as not to interfere 
with the applying of Bertrand's patch.



Conclusion
==

I tag this patch (v1-002-orphaned_function.patch) as ready for 
committer review, as it fixes the reported bug.



--
Gilles Darold
LzLabs GmbH
https://www.lzlabs.com/

diff --git a/src/backend/catalog/pg_proc.c b/src/backend/catalog/pg_proc.c
index 1dd9ecc063..9b305896a3 100644
--- a/src/backend/catalog/pg_proc.c
+++ b/src/backend/catalog/pg_proc.c
@@ -34,6 +34,7 @@
 #include "nodes/nodeFuncs.h"
 #include "parser/parse_coerce.h"
 #include "parser/parse_type.h"
+#include "storage/lmgr.h"
 #include "tcop/pquery.h"
 #include "tcop/tcopprot.h"
 #include "utils/acl.h"
@@ -55,7 +56,7 @@ static int	match_prosrc_to_query(const char *prosrc, const char *queryText,
   int cursorpos);
 static bool match_prosrc_to_literal(const char *prosrc, const char *literal,
 	int cursorpos, int *newcursorpos);
-
+static ObjectAddress ObjectAddressSetWithShareLock(Oid classId, Oid objectId);
 
 /* 
  *		ProcedureCreate
@@ -115,6 +116,7 @@ ProcedureCreate(const char *procedureName,
 	int			i;
 	Oid			trfid;
 	ObjectAddresses *addrs;
+	List 
