Re: appendBinaryStringInfo stuff

2023-02-10 Thread Corey Huinker
On Fri, Feb 10, 2023 at 7:16 AM Peter Eisentraut <
peter.eisentr...@enterprisedb.com> wrote:

> On 19.12.22 07:13, Peter Eisentraut wrote:
> > Also, the argument type of appendBinaryStringInfo() is char *.  There is
> > some code that uses this function to assemble some kind of packed binary
> > layout, which requires a bunch of casts because of this.  I think
> > functions taking binary data plus length should take void * instead,
> > like memcpy() for example.
>
> I found a little follow-up for this one: Make the same change to
> pq_sendbytes(), which is a thin wrapper around appendBinaryStringInfo().
>   This would allow getting rid of further casts at call sites.
>

+1

Has all the benefits that 54a177a948b0a773c25c6737d1cc3cc49222a526 had.

Passes make check-world.


Re: proposal: psql: psql variable BACKEND_PID

2023-02-05 Thread Corey Huinker
The following review has been posted through the commitfest application:
make installcheck-world:  tested, passed
Implements feature:       tested, passed
Spec compliant:           tested, passed
Documentation:            tested, passed

A small but helpful feature.

The new status of this patch is: Ready for Committer


Re: proposal: psql: psql variable BACKEND_PID

2023-02-05 Thread Corey Huinker
>
>
>>
>> Clearly, it is hard to write a regression test for an externally volatile
>> value. `SELECT sign(:BACKEND_PID)` would technically do the job, if we're
>> striving for completeness.
>>
>
> I did simple test - :BACKEND_PID should be equal pg_backend_pid()
>
>

Even better.
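For reference, a sketch of such a check (the actual test in the patch may be worded differently):

```sql
-- expect a single row of 't'
SELECT :BACKEND_PID = pg_backend_pid() AS backend_pid_matches;
```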


>
>>
>> Do we want to change %p to pull from this variable and save the
>> snprintf()? Not a measurable savings, more or a don't-repeat-yourself thing.
>>
>
> I checked the code, and I don't think so. The current state is safer (I
> think). The psql variables are not protected, so it is safer to read the
> value for the prompt directly via the libpq API instead of reading a
> possibly "customized" variable. I see a possible inconsistency, but the
> same inconsistency already exists for the USER and DBNAME variables, and
> I am not able to say which is significantly better. The prompt shows the
> real value, and the related variable is a +/- copy taken at connection
> time.
>
> I am not 100% sure what is better in this area, but the change requires
> a wider, more general discussion, and I don't think the benefits of a
> possible change are enough. There are just two possible solutions: we
> can protect some psql variables (which may cause compatibility issues),
> or we accept that the value in the prompt can be fake. It is better not
> to touch it :-).
>

I agree it is out of scope of this patch, but I like the idea of protected
psql variables, and I doubt users are intentionally re-using these vars to
any positive effect. The more likely case is that newer psql vars just
happen to use the names chosen by somebody's script in the past.


>
> done
>
>
>
Everything passes. Docs look good. Test looks good.


Re: proposal: psql: psql variable BACKEND_PID

2023-02-04 Thread Corey Huinker
>
> with doc and unsetting variable
>
> Regards
>
> Pavel
>
>
Patch applies.

Manually testing confirms that it works, at least for the connected state.
I don't actually know how to get psql to invoke a DISCONNECT, so I killed
the dev server and can confirm:

[81:14:57:01 EST] corey=# \echo :BACKEND_PID
81
[81:14:57:04 EST] corey=# select 1;
FATAL:  terminating connection due to administrator command
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.
The connection to the server was lost. Attempting reset: Failed.
Time: 1.554 ms
[:15:02:31 EST] !> \echo :BACKEND_PID
:BACKEND_PID

Clearly, it is hard to write a regression test for an externally volatile
value. `SELECT sign(:BACKEND_PID)` would technically do the job, if we're
striving for completeness.

The inability to easily DISCONNECT via psql, and the deleterious effect
that would have on other regression tests, tell me that we can leave that
one untested.

Notes:

This effectively makes the %p prompt (which I use in the example above) the
same as %:BACKEND_PID:, and we may want to note that in the documentation.

Do we want to change %p to pull from this variable and save the snprintf()?
Not a measurable savings, more or a don't-repeat-yourself thing.

In the varlistentry, I suggest we add "This variable is unset when the
connection is lost." after "but can be changed or unset."


Re: proposal: psql: show current user in prompt

2023-02-03 Thread Corey Huinker
On Fri, Feb 3, 2023 at 9:56 AM Pavel Stehule 
wrote:

> Hi
>
> A visitor at p2d2 (Prague PostgreSQL Developer Day) asked if it is
> possible to show the current role in psql's prompt. It is not currently
> possible, but fortunately (with some limits) almost all the necessary
> work is done, and the patch is short.
>
> In the attached patch I implemented a new prompt placeholder %N, which
> shows the current role name.
>
> (2023-02-03 15:52:28) postgres=# \set PROMPT1 '%n as %N at '%/%=%#
> pavel as pavel at postgres=#set role to admin;
> SET
> pavel as admin at postgres=>set role to default;
> SET
> pavel as pavel at postgres=#
>
> Comments, notes are welcome.
>
> Regards
>
> Pavel
>

This patch is cluttered with the BACKEND_PID patch and some guc_tables.c
stuff that I don't think is related.

We'd have to document the %N.

I think there is some value here for people who have to connect as several
different users (tech support), and need a reminder-at-a-glance whether
they are operating in the desired role. It may be helpful in audit
documentation as well.


Re: proposal: psql: psql variable BACKEND_PID

2023-02-03 Thread Corey Huinker
On Fri, Feb 3, 2023 at 5:42 AM Pavel Stehule 
wrote:

> Hi
>
> We can simply allow access to the backend process id through a psql
> variable. I propose the name "BACKEND_PID". The advantages are simple
> accessibility via the command \set, and less typing than using the
> function pg_backend_pid(), because psql variables are supported by the
> tab-completion routine. The implementation is very simple, because we can
> use the function PQbackendPID().
>
> Comments, notes?
>
> Regards
>
> Pavel
>

Interesting, and probably useful.

It needs a corresponding line in UnsyncVariables():

SetVariable(pset.vars, "BACKEND_PID", NULL);

That will set the variable back to null when the connection goes away.


Re: Remove some useless casts to (void *)

2023-02-02 Thread Corey Huinker
On Thu, Feb 2, 2023 at 5:22 PM Peter Eisentraut <
peter.eisentr...@enterprisedb.com> wrote:

> I have found that in some corners of the code some calls to standard C
> functions are decorated with casts to (void *) for no reason, and this
> code pattern then gets copied around.  I have gone through and cleaned
> this up a bit, in the attached patches.
>
> The involved functions are: repalloc, memcpy, memset, memmove, memcmp,
> qsort, bsearch
>
> Also hash_search(), for which there was a historical reason (the
> argument used to be char *), but not anymore.


+1

All code is example code.

Applies.
Passes make check-world.


Re: transition tables and UPDATE

2023-02-02 Thread Corey Huinker
>
>
> even uglier than what I already had.  So yeah, I think it might be
> useful if we had a way to inject a counter or something in there.
>
>
This came up for me when I was experimenting with making the referential
integrity triggers fire on statements rather than rows. Doing so has the
potential to take a lot of the sting out of big deletes where the
referencing column isn't indexed (thus resulting in N sequential scans of
the referencing table). If that were 1 statement then we'd get a single
(still ugly) hash join, but it's an improvement.

It has been suggested that the overhead of forming the tuplestores of
affected rows and reconstituting them into EphemeralNamedRelations could be
reduced by instead storing arrays of old and new ctids, which would
obviously be in the same order. If we had a means of reconstituting those
with just the columns needed for the check (and generating a fake
ordinality column for your needs), that would be considerably lighter
weight than the tuplestores, and it might make statement-level triggers
more useful all around.


Re: Add SHELL_EXIT_CODE to psql

2023-01-30 Thread Corey Huinker
>
>
> Unfortunately, there is a fail in FreeBSD
> https://cirrus-ci.com/task/6466749487382528
>
> Maybe, this patch is need to be rebased?
>
>
That failure is in postgres_fdw, which this code doesn't touch.

I'm not able to get to
/tmp/cirrus-ci-build/build/testrun/postgres_fdw-running/regress/regression.out;
it's not in either of the browseable zip files, and the 2nd zip file isn't
downloadable, so it's hard to see what's going on.

I rebased, but there are no code differences.
From e99c7b8aa3b81725773c03583bcff50a88a58a38 Mon Sep 17 00:00:00 2001
From: coreyhuinker 
Date: Wed, 11 Jan 2023 17:27:23 -0500
Subject: [PATCH v8 1/3] Add PG_OS_TARGET environment variable to enable
 OS-specific changes to regression tests.

---
 src/test/regress/pg_regress.c | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/src/test/regress/pg_regress.c b/src/test/regress/pg_regress.c
index 6cd5998b9d..bf0ada4560 100644
--- a/src/test/regress/pg_regress.c
+++ b/src/test/regress/pg_regress.c
@@ -594,6 +594,17 @@ initialize_environment(void)
 #endif
 	}
 
+		/*
+		 * Regression tests that invoke OS commands must occasionally use
+		 * OS-specific syntax (example: NUL vs /dev/null). This env var allows
+		 * those tests to adjust commands accordingly.
+		 */
+#if defined(WIN32)
+		setenv("PG_OS_TARGET", "WIN32", 1);
+#else
+		setenv("PG_OS_TARGET", "OTHER", 1);
+#endif
+
 	/*
 	 * Set translation-related settings to English; otherwise psql will
 	 * produce translated messages and produce diffs.  (XXX If we ever support
-- 
2.39.1

From 496a84ccf758ed34d2c9bff0e9b458bef5fa9e58 Mon Sep 17 00:00:00 2001
From: coreyhuinker 
Date: Thu, 12 Jan 2023 23:04:31 -0500
Subject: [PATCH v8 2/3] Add wait_result_to_exit_code().

---
 src/common/wait_error.c | 15 +++
 src/include/port.h  |  1 +
 2 files changed, 16 insertions(+)

diff --git a/src/common/wait_error.c b/src/common/wait_error.c
index 4a3c3c61af..1eb7ff3fa2 100644
--- a/src/common/wait_error.c
+++ b/src/common/wait_error.c
@@ -127,3 +127,18 @@ wait_result_is_any_signal(int exit_status, bool include_command_not_found)
 		return true;
 	return false;
 }
+
+/*
+ * Return the POSIX exit code (0 to 255) that corresponds to the argument.
+ * The argument is a return code returned by wait(2) or waitpid(2), which
+ * also applies to pclose(3) and system(3).
+ */
+int
+wait_result_to_exit_code(int exit_status)
+{
+	if (WIFEXITED(exit_status))
+		return WEXITSTATUS(exit_status);
+	if (WIFSIGNALED(exit_status))
+		return WTERMSIG(exit_status);
+	return 0;
+}
diff --git a/src/include/port.h b/src/include/port.h
index e66193bed9..1b119a3c56 100644
--- a/src/include/port.h
+++ b/src/include/port.h
@@ -495,6 +495,7 @@ extern char *escape_single_quotes_ascii(const char *src);
 extern char *wait_result_to_str(int exitstatus);
 extern bool wait_result_is_signal(int exit_status, int signum);
 extern bool wait_result_is_any_signal(int exit_status, bool include_command_not_found);
+extern int wait_result_to_exit_code(int exit_status);
 
 /*
  * Interfaces that we assume all Unix system have.  We retain individual macros
-- 
2.39.1

From e3dc0f2117b52f9661a75e79097741ed6c0f3b20 Mon Sep 17 00:00:00 2001
From: coreyhuinker 
Date: Mon, 23 Jan 2023 16:46:16 -0500
Subject: [PATCH v8 3/3] Add psql variables SHELL_ERROR and SHELL_EXIT_CODE
 which report the success or failure of the most recent psql OS-command
 executed via the \! command or `backticks`. These variables are roughly
 analogous to ERROR and SQLSTATE, but for OS level operations instead of SQL
 commands.

---
 doc/src/sgml/ref/psql-ref.sgml | 21 ++
 src/bin/psql/command.c | 14 
 src/bin/psql/help.c|  4 
 src/bin/psql/psqlscanslash.l   | 35 +-
 src/test/regress/expected/psql.out | 29 +
 src/test/regress/sql/psql.sql  | 25 +
 6 files changed, 122 insertions(+), 6 deletions(-)

diff --git a/doc/src/sgml/ref/psql-ref.sgml b/doc/src/sgml/ref/psql-ref.sgml
index dc6528dc11..3d8a7353b3 100644
--- a/doc/src/sgml/ref/psql-ref.sgml
+++ b/doc/src/sgml/ref/psql-ref.sgml
@@ -4264,6 +4264,27 @@ bar
 
   
 
+  
+   SHELL_ERROR
+   
+
+ true if the last shell command failed, false if
+ it succeeded. This applies to shell commands invoked via either \!
+ or `. See also SHELL_EXIT_CODE.
+
+   
+  
+
+  
+   SHELL_EXIT_CODE
+   
+
+ The exit code return by the last shell command, invoked via either \! or `.
+ 0 means no error. See also SHELL_ERROR.
+
+   
+  
+
   
 SHOW_ALL_RESULTS
 
diff --git a/src/bin/psql/command.c b/src/bin/psql/command.c
index b5201edf55..ea79d4fe57 100644
--- a/src/bin/psql/command.c
+++ b/src/bin/psql/command.c
@@ -5032,6 +5032,20 @@ do_shell(const char *command)
 	else
 		result = system(command);
 
+	if (result == 0)
+	{
+		

Re: Add n_tup_newpage_upd to pg_stat table views

2023-01-30 Thread Corey Huinker
On Fri, Jan 27, 2023 at 6:55 PM Andres Freund  wrote:

> Hi,
>
> On 2023-01-27 18:23:39 -0500, Corey Huinker wrote:
> > This patch adds the n_tup_newpage_upd to all the table stat views.
> >
> > Just as we currently track HOT updates, it should be beneficial to track
> > updates where the new tuple cannot fit on the existing page and must go
> to
> > a different one.
>
> I like that idea.
>
>
> I wonder if it's quite detailed enough - we can be forced to do out-of-page
> updates because we actually are out of space, or because we reach the max
> number of line pointers we allow in a page. HOT pruning can't remove dead
> line
> pointers, so that doesn't necessarily help.
>

I must be missing something; I only see the check for running out of space,
not the check for exhausting line pointers. I agree dividing them would be
interesting.



> Similarly, it's a bit sad that we can't distinguish between the number of
> potential-HOT out-of-page updates and the other out-of-page updates. But
> that'd mean even more counters.
>

I wondered that too, but the combinations of "would have been HOT but no
space" and "key update suggested not-HOT but it was id=id so today's your
lucky HOT" started to get away from me.

I wondered if there was interest in knowing if the tuple had to get
TOASTed, and further wondered if we would be interested in the number of
updates that had to wait on a lock. Again, more counters.


Add n_tup_newpage_upd to pg_stat table views

2023-01-27 Thread Corey Huinker
This patch adds the n_tup_newpage_upd to all the table stat views.

Just as we currently track HOT updates, it should be beneficial to track
updates where the new tuple cannot fit on the existing page and must go to
a different one.

Hopefully this can give users some insight as to whether their current
fillfactor settings need to be adjusted.
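For example, with the new column one could eyeball the share of out-of-page updates per table with something like this (hypothetical query; column names per this patch):

```sql
SELECT relname,
       n_tup_upd,
       n_tup_hot_upd,
       n_tup_newpage_upd,
       round(100.0 * n_tup_newpage_upd / NULLIF(n_tup_upd, 0), 1)
           AS newpage_upd_pct
FROM pg_stat_all_tables
WHERE n_tup_upd > 0
ORDER BY newpage_upd_pct DESC NULLS LAST;
```

A high percentage on a heavily-updated table would suggest lowering fillfactor.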

My chosen implementation replaces the hot-update boolean with an
update_type, which is currently a three-value enum. I favored that only
slightly over adding a separate newpage-update boolean, because the two
events are mutually exclusive, and fewer parameters means less overhead and
one less assertion check. The relative wisdom of this choice may not come
to light until we add a new measurement and see whether that new
measurement overlaps either is-hot or is-new-page.
From 55204b3d2f719f5dd8c308ea722606a40b3d09b8 Mon Sep 17 00:00:00 2001
From: coreyhuinker 
Date: Fri, 27 Jan 2023 17:56:59 -0500
Subject: [PATCH v1] Add n_tup_newpage_upd to pg_stat_all_tables and
 pg_stat_xact_all_tables. This value reflects the number of tuples updated
 where the new tuple was placed on a different page than the previous version.

---
 doc/src/sgml/monitoring.sgml |  9 +
 src/backend/access/heap/heapam.c | 10 ++
 src/backend/catalog/system_views.sql |  4 +++-
 src/backend/utils/activity/pgstat_relation.c |  8 ++--
 src/backend/utils/adt/pgstatfuncs.c  | 18 ++
 src/include/catalog/pg_proc.dat  |  9 +
 src/include/pgstat.h | 14 --
 src/test/regress/expected/rules.out  | 12 +---
 8 files changed, 72 insertions(+), 12 deletions(-)

diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 1756f1a4b6..f0291a21fb 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -4523,6 +4523,15 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i
   
  
 
+ 
+  
+   n_tup_newpage_upd bigint
+  
+  
+   Number of rows updated where the new row is on a different page than the previous version
+  
+ 
+
  
   
n_live_tup bigint
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index e6024a980b..f5aa429a9a 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -3155,7 +3155,8 @@ heap_update(Relation relation, ItemPointer otid, HeapTuple newtup,
 pagefree;
 	bool		have_tuple_lock = false;
 	bool		iscombo;
-	bool		use_hot_update = false;
+	PgStat_HeapUpdateType update_type = PGSTAT_HEAPUPDATE_NON_HOT;
+
 	bool		key_intact;
 	bool		all_visible_cleared = false;
 	bool		all_visible_cleared_new = false;
@@ -3838,10 +3839,11 @@ l2:
 		 * changed.
 		 */
 		if (!bms_overlap(modified_attrs, hot_attrs))
-			use_hot_update = true;
+			update_type = PGSTAT_HEAPUPDATE_HOT;
 	}
 	else
 	{
+		update_type = PGSTAT_HEAPUPDATE_NEW_PAGE;
 		/* Set a hint that the old page could use prune/defrag */
 		PageSetFull(page);
 	}
@@ -3875,7 +3877,7 @@ l2:
 	 */
 	PageSetPrunable(page, xid);
 
-	if (use_hot_update)
+	if (update_type == PGSTAT_HEAPUPDATE_HOT)
 	{
 		/* Mark the old tuple as HOT-updated */
 		HeapTupleSetHotUpdated();
@@ -3986,7 +3988,7 @@ l2:
 	if (have_tuple_lock)
 		UnlockTupleTuplock(relation, &(oldtup.t_self), *lockmode);
 
-	pgstat_count_heap_update(relation, use_hot_update);
+	pgstat_count_heap_update(relation, update_type);
 
 	/*
 	 * If heaptup is a private copy, release it.  Don't forget to copy t_self
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 8608e3fa5b..292a9b88b3 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -665,6 +665,7 @@ CREATE VIEW pg_stat_all_tables AS
 pg_stat_get_tuples_updated(C.oid) AS n_tup_upd,
 pg_stat_get_tuples_deleted(C.oid) AS n_tup_del,
 pg_stat_get_tuples_hot_updated(C.oid) AS n_tup_hot_upd,
+pg_stat_get_tuples_newpage_updated(C.oid) AS n_tup_newpage_upd,
 pg_stat_get_live_tuples(C.oid) AS n_live_tup,
 pg_stat_get_dead_tuples(C.oid) AS n_dead_tup,
 pg_stat_get_mod_since_analyze(C.oid) AS n_mod_since_analyze,
@@ -696,7 +697,8 @@ CREATE VIEW pg_stat_xact_all_tables AS
 pg_stat_get_xact_tuples_inserted(C.oid) AS n_tup_ins,
 pg_stat_get_xact_tuples_updated(C.oid) AS n_tup_upd,
 pg_stat_get_xact_tuples_deleted(C.oid) AS n_tup_del,
-pg_stat_get_xact_tuples_hot_updated(C.oid) AS n_tup_hot_upd
+pg_stat_get_xact_tuples_hot_updated(C.oid) AS n_tup_hot_upd,
+pg_stat_get_xact_tuples_newpage_updated(C.oid) AS n_tup_newpage_upd
 FROM pg_class C LEFT JOIN
  pg_index I ON C.oid = I.indrelid
  LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace)
diff --git 

Re: Add SHELL_EXIT_CODE to psql

2023-01-23 Thread Corey Huinker
>
> Thanks! But the CF bot is still not happy. I think we should address the
> issues from here: https://cirrus-ci.com/task/5391002618298368
>

Sure enough, exit codes are shell-dependent... adjusted the tests to
reflect that.
From 237b892e5efe739bc8e75d4af30140520d445491 Mon Sep 17 00:00:00 2001
From: coreyhuinker 
Date: Wed, 11 Jan 2023 17:27:23 -0500
Subject: [PATCH v7 1/3] Add PG_OS_TARGET environment variable to enable
 OS-specific changes to regression tests.

---
 src/test/regress/pg_regress.c | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/src/test/regress/pg_regress.c b/src/test/regress/pg_regress.c
index 6cd5998b9d..bf0ada4560 100644
--- a/src/test/regress/pg_regress.c
+++ b/src/test/regress/pg_regress.c
@@ -594,6 +594,17 @@ initialize_environment(void)
 #endif
 	}
 
+		/*
+		 * Regression tests that invoke OS commands must occasionally use
+		 * OS-specific syntax (example: NUL vs /dev/null). This env var allows
+		 * those tests to adjust commands accordingly.
+		 */
+#if defined(WIN32)
+		setenv("PG_OS_TARGET", "WIN32", 1);
+#else
+		setenv("PG_OS_TARGET", "OTHER", 1);
+#endif
+
 	/*
 	 * Set translation-related settings to English; otherwise psql will
 	 * produce translated messages and produce diffs.  (XXX If we ever support
-- 
2.39.0

From 01b5d159e3bae3506b74eb22ec066601454fa442 Mon Sep 17 00:00:00 2001
From: coreyhuinker 
Date: Thu, 12 Jan 2023 23:04:31 -0500
Subject: [PATCH v7 2/3] Add wait_result_to_exit_code().

---
 src/common/wait_error.c | 15 +++
 src/include/port.h  |  1 +
 2 files changed, 16 insertions(+)

diff --git a/src/common/wait_error.c b/src/common/wait_error.c
index 4a3c3c61af..1eb7ff3fa2 100644
--- a/src/common/wait_error.c
+++ b/src/common/wait_error.c
@@ -127,3 +127,18 @@ wait_result_is_any_signal(int exit_status, bool include_command_not_found)
 		return true;
 	return false;
 }
+
+/*
+ * Return the POSIX exit code (0 to 255) that corresponds to the argument.
+ * The argument is a return code returned by wait(2) or waitpid(2), which
+ * also applies to pclose(3) and system(3).
+ */
+int
+wait_result_to_exit_code(int exit_status)
+{
+	if (WIFEXITED(exit_status))
+		return WEXITSTATUS(exit_status);
+	if (WIFSIGNALED(exit_status))
+		return WTERMSIG(exit_status);
+	return 0;
+}
diff --git a/src/include/port.h b/src/include/port.h
index e66193bed9..1b119a3c56 100644
--- a/src/include/port.h
+++ b/src/include/port.h
@@ -495,6 +495,7 @@ extern char *escape_single_quotes_ascii(const char *src);
 extern char *wait_result_to_str(int exitstatus);
 extern bool wait_result_is_signal(int exit_status, int signum);
 extern bool wait_result_is_any_signal(int exit_status, bool include_command_not_found);
+extern int wait_result_to_exit_code(int exit_status);
 
 /*
  * Interfaces that we assume all Unix system have.  We retain individual macros
-- 
2.39.0

From bc936340ba99f9b7d2bbdb36a24f2fcb531d720c Mon Sep 17 00:00:00 2001
From: coreyhuinker 
Date: Mon, 23 Jan 2023 16:46:16 -0500
Subject: [PATCH v7 3/3] Add psql variables SHELL_ERROR and SHELL_EXIT_CODE
 which report the success or failure of the most recent psql OS-command
 executed via the \! command or `backticks`. These variables are roughly
 analogous to ERROR and SQLSTATE, but for OS level operations instead of SQL
 commands.

---
 doc/src/sgml/ref/psql-ref.sgml | 21 ++
 src/bin/psql/command.c | 14 
 src/bin/psql/help.c|  4 
 src/bin/psql/psqlscanslash.l   | 35 +-
 src/test/regress/expected/psql.out | 29 +
 src/test/regress/sql/psql.sql  | 25 +
 6 files changed, 122 insertions(+), 6 deletions(-)

diff --git a/doc/src/sgml/ref/psql-ref.sgml b/doc/src/sgml/ref/psql-ref.sgml
index dc6528dc11..3d8a7353b3 100644
--- a/doc/src/sgml/ref/psql-ref.sgml
+++ b/doc/src/sgml/ref/psql-ref.sgml
@@ -4264,6 +4264,27 @@ bar
 
   
 
+  
+   SHELL_ERROR
+   
+
+ true if the last shell command failed, false if
+ it succeeded. This applies to shell commands invoked via either \!
+ or `. See also SHELL_EXIT_CODE.
+
+   
+  
+
+  
+   SHELL_EXIT_CODE
+   
+
+ The exit code return by the last shell command, invoked via either \! or `.
+ 0 means no error. See also SHELL_ERROR.
+
+   
+  
+
   
 SHOW_ALL_RESULTS
 
diff --git a/src/bin/psql/command.c b/src/bin/psql/command.c
index b5201edf55..ea79d4fe57 100644
--- a/src/bin/psql/command.c
+++ b/src/bin/psql/command.c
@@ -5032,6 +5032,20 @@ do_shell(const char *command)
 	else
 		result = system(command);
 
+	if (result == 0)
+	{
+		SetVariable(pset.vars, "SHELL_EXIT_CODE", "0");
+		SetVariable(pset.vars, "SHELL_ERROR", "false");
+	}
+	else
+	{
+		int		exit_code = wait_result_to_exit_code(result);
+		char	buf[32];
+		snprintf(buf, sizeof(buf), "%d", exit_code);
+		SetVariable(pset.vars, 

Re: Add SHELL_EXIT_CODE to psql

2023-01-23 Thread Corey Huinker
On Mon, Jan 23, 2023 at 2:53 PM Robert Haas  wrote:

> On Mon, Jan 23, 2023 at 1:59 PM Corey Huinker 
> wrote:
> > SHELL_ERROR is helpful in that it is a ready-made boolean that works for
> \if tests in the same way that ERROR is set to true any time SQLSTATE is
> nonzero. We don't yet have inline expressions for \if so the ready-made
> boolean is a convenience.
>
> Oh, that seems a bit sad, but I guess it makes sense.
>

I agree, but there hasn't been much appetite for deciding what expressions
would look like, or how we'd implement them. My instinct would be to not
create our own expression engine, but instead integrate one that is already
available. For my needs, the Unix `expr` command would be ideal (it
compares numbers and strings, and can do regexes and complex expressions),
but it's not cross-platform.


Re: Add SHELL_EXIT_CODE to psql

2023-01-23 Thread Corey Huinker
On Fri, Jan 20, 2023 at 8:54 AM Robert Haas  wrote:

> On Wed, Jan 4, 2023 at 2:09 AM Corey Huinker 
> wrote:
> > 2. There are now two psql variables, SHELL_EXIT_CODE, which has the
> return code, and SHELL_ERROR, which is a true/false flag based on whether
> the exit code was nonzero. These variables are the shell result analogues
> of SQLSTATE and ERROR.
>
> Seems redundant.
>

SHELL_ERROR is helpful in that it is a ready-made boolean that works for
\if tests in the same way that ERROR is set to true any time SQLSTATE is
nonzero. We don't yet have inline expressions for \if so the ready-made
boolean is a convenience.

Or are you suggesting that I should have just set ERROR itself rather than
create SHELL_ERROR?
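For context, the sort of script-level test SHELL_ERROR enables looks like this (hypothetical command and message):

```psql
\! pg_dump -f /tmp/backup.sql mydb
\if :SHELL_ERROR
    \echo 'backup failed, aborting'
    \q
\endif
```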


Re: Add SHELL_EXIT_CODE to psql

2023-01-12 Thread Corey Huinker
>
> I believe we need proper includes.
>

Given that wait_error.c already seems to have the right includes worked out
for WEXITSTATUS/WIFSTOPPED/etc, I decided to just add a function there.

I named it wait_result_to_exit_code(), but I welcome suggestions of a
better name.
From 9e2827a6f955e7cebf87ca538fab113a359951b4 Mon Sep 17 00:00:00 2001
From: coreyhuinker 
Date: Wed, 11 Jan 2023 17:27:23 -0500
Subject: [PATCH v6 1/3] Add PG_OS_TARGET environment variable to enable
 OS-specific changes to regression tests.

---
 src/test/regress/pg_regress.c | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/src/test/regress/pg_regress.c b/src/test/regress/pg_regress.c
index 6cd5998b9d..bf0ada4560 100644
--- a/src/test/regress/pg_regress.c
+++ b/src/test/regress/pg_regress.c
@@ -594,6 +594,17 @@ initialize_environment(void)
 #endif
 	}
 
+		/*
+		 * Regression tests that invoke OS commands must occasionally use
+		 * OS-specific syntax (example: NUL vs /dev/null). This env var allows
+		 * those tests to adjust commands accordingly.
+		 */
+#if defined(WIN32)
+		setenv("PG_OS_TARGET", "WIN32", 1);
+#else
+		setenv("PG_OS_TARGET", "OTHER", 1);
+#endif
+
 	/*
 	 * Set translation-related settings to English; otherwise psql will
 	 * produce translated messages and produce diffs.  (XXX If we ever support
-- 
2.39.0

From 2c73156874bd0c222f414a2988271c4870d4293b Mon Sep 17 00:00:00 2001
From: coreyhuinker 
Date: Thu, 12 Jan 2023 23:04:31 -0500
Subject: [PATCH v6 2/3] Add wait_result_to_exit_code().

---
 src/common/wait_error.c | 15 +++
 src/include/port.h  |  1 +
 2 files changed, 16 insertions(+)

diff --git a/src/common/wait_error.c b/src/common/wait_error.c
index 4a3c3c61af..1eb7ff3fa2 100644
--- a/src/common/wait_error.c
+++ b/src/common/wait_error.c
@@ -127,3 +127,18 @@ wait_result_is_any_signal(int exit_status, bool include_command_not_found)
 		return true;
 	return false;
 }
+
+/*
+ * Return the POSIX exit code (0 to 255) that corresponds to the argument.
+ * The argument is a return code returned by wait(2) or waitpid(2), which
+ * also applies to pclose(3) and system(3).
+ */
+int
+wait_result_to_exit_code(int exit_status)
+{
+	if (WIFEXITED(exit_status))
+		return WEXITSTATUS(exit_status);
+	if (WIFSIGNALED(exit_status))
+		return WTERMSIG(exit_status);
+	return 0;
+}
diff --git a/src/include/port.h b/src/include/port.h
index e66193bed9..1b119a3c56 100644
--- a/src/include/port.h
+++ b/src/include/port.h
@@ -495,6 +495,7 @@ extern char *escape_single_quotes_ascii(const char *src);
 extern char *wait_result_to_str(int exitstatus);
 extern bool wait_result_is_signal(int exit_status, int signum);
 extern bool wait_result_is_any_signal(int exit_status, bool include_command_not_found);
+extern int wait_result_to_exit_code(int exit_status);
 
 /*
  * Interfaces that we assume all Unix system have.  We retain individual macros
-- 
2.39.0

From 9a969b50eec614cc181f020e25bd8dd00e7802f8 Mon Sep 17 00:00:00 2001
From: coreyhuinker 
Date: Thu, 12 Jan 2023 23:28:14 -0500
Subject: [PATCH v6 3/3] Add psql variables SHELL_ERROR and SHELL_EXIT_CODE
 which report the success or failure of the most recent psql OS-command
 executed via the \! command or `backticks`. These variables are roughly
 analogous to ERROR and SQLSTATE, but for OS level operations instead of SQL
 commands.

---
 doc/src/sgml/ref/psql-ref.sgml | 21 ++
 src/bin/psql/command.c | 14 
 src/bin/psql/help.c|  4 
 src/bin/psql/psqlscanslash.l   | 35 +-
 src/test/regress/expected/psql.out | 25 +
 src/test/regress/sql/psql.sql  | 21 ++
 6 files changed, 114 insertions(+), 6 deletions(-)

diff --git a/doc/src/sgml/ref/psql-ref.sgml b/doc/src/sgml/ref/psql-ref.sgml
index dc6528dc11..3d8a7353b3 100644
--- a/doc/src/sgml/ref/psql-ref.sgml
+++ b/doc/src/sgml/ref/psql-ref.sgml
@@ -4264,6 +4264,27 @@ bar
 
   
 
+  
+   SHELL_ERROR
+   
+
+ true if the last shell command failed, false if
+ it succeeded. This applies to shell commands invoked via either \!
+ or `. See also SHELL_EXIT_CODE.
+
+   
+  
+
+  
+   SHELL_EXIT_CODE
+   
+
+ The exit code return by the last shell command, invoked via either \! or `.
+ 0 means no error. See also SHELL_ERROR.
+
+   
+  
+
   
 SHOW_ALL_RESULTS
 
diff --git a/src/bin/psql/command.c b/src/bin/psql/command.c
index b5201edf55..ea79d4fe57 100644
--- a/src/bin/psql/command.c
+++ b/src/bin/psql/command.c
@@ -5032,6 +5032,20 @@ do_shell(const char *command)
 	else
 		result = system(command);
 
+	if (result == 0)
+	{
+		SetVariable(pset.vars, "SHELL_EXIT_CODE", "0");
+		SetVariable(pset.vars, "SHELL_ERROR", "false");
+	}
+	else
+	{
+		int		exit_code = wait_result_to_exit_code(result);
+		char	buf[32];
+		snprintf(buf, sizeof(buf), 

Re: Add SHELL_EXIT_CODE to psql

2023-01-11 Thread Corey Huinker
>
>
>
> The patch does not apply on top of HEAD as in [1]; please post a rebased
> patch:
>
>
The conflict was due to the doc patch applying id tags to psql variable names.
I've rebased and added my own id tags to the two new variables.
From 9e2827a6f955e7cebf87ca538fab113a359951b4 Mon Sep 17 00:00:00 2001
From: coreyhuinker 
Date: Wed, 11 Jan 2023 17:27:23 -0500
Subject: [PATCH v5 1/2] Add PG_OS_TARGET environment variable to enable
 OS-specific changes to regression tests.

---
 src/test/regress/pg_regress.c | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/src/test/regress/pg_regress.c b/src/test/regress/pg_regress.c
index 6cd5998b9d..bf0ada4560 100644
--- a/src/test/regress/pg_regress.c
+++ b/src/test/regress/pg_regress.c
@@ -594,6 +594,17 @@ initialize_environment(void)
 #endif
 	}
 
+		/*
+		 * Regression tests that invoke OS commands must occasionally use
+		 * OS-specific syntax (example: NUL vs /dev/null). This env var allows
+		 * those tests to adjust commands accordingly.
+		 */
+#if defined(WIN32)
+		setenv("PG_OS_TARGET", "WIN32", 1);
+#else
+		setenv("PG_OS_TARGET", "OTHER", 1);
+#endif
+
 	/*
 	 * Set translation-related settings to English; otherwise psql will
 	 * produce translated messages and produce diffs.  (XXX If we ever support
-- 
2.39.0

From 4ed7ef0a21a541277250641c63a7bceea4c1ef62 Mon Sep 17 00:00:00 2001
From: coreyhuinker 
Date: Wed, 11 Jan 2023 17:28:17 -0500
Subject: [PATCH v5 2/2] Add psql variables SHELL_ERROR and SHELL_EXIT_CODE
 which report the success or failure of the most recent psql OS-command
 executed via the \! command or `backticks`. These variables are roughly
 analogous to ERROR and SQLSTATE, but for OS level operations instead of SQL
 commands.

---
 doc/src/sgml/ref/psql-ref.sgml | 21 +
 src/bin/psql/command.c |  8 +++
 src/bin/psql/help.c|  4 
 src/bin/psql/psqlscanslash.l   | 38 ++
 src/test/regress/expected/psql.out | 25 
 src/test/regress/sql/psql.sql  | 21 +
 6 files changed, 112 insertions(+), 5 deletions(-)

diff --git a/doc/src/sgml/ref/psql-ref.sgml b/doc/src/sgml/ref/psql-ref.sgml
index dc6528dc11..3d8a7353b3 100644
--- a/doc/src/sgml/ref/psql-ref.sgml
+++ b/doc/src/sgml/ref/psql-ref.sgml
@@ -4264,6 +4264,27 @@ bar
 
   
 
+  
+   SHELL_ERROR
+   
+
+ true if the last shell command failed, false if
+ it succeeded. This applies to shell commands invoked via either \!
+ or `. See also SHELL_EXIT_CODE.
+
+   
+  
+
+  
+   SHELL_EXIT_CODE
+   
+
+ The exit code returned by the last shell command, invoked via either \! or `.
+ 0 means no error. See also SHELL_ERROR.
+
+   
+  
+
   
 SHOW_ALL_RESULTS
 
diff --git a/src/bin/psql/command.c b/src/bin/psql/command.c
index b5201edf55..5d15c2143b 100644
--- a/src/bin/psql/command.c
+++ b/src/bin/psql/command.c
@@ -5005,6 +5005,7 @@ static bool
 do_shell(const char *command)
 {
 	int			result;
+	char		exit_code_buf[32];
 
 	fflush(NULL);
 	if (!command)
@@ -5032,6 +5033,13 @@ do_shell(const char *command)
 	else
 		result = system(command);
 
+	snprintf(exit_code_buf, sizeof(exit_code_buf), "%d", WEXITSTATUS(result));
+	SetVariable(pset.vars, "SHELL_EXIT_CODE", exit_code_buf);
+	if (result == 0)
+		SetVariable(pset.vars, "SHELL_ERROR", "false");
+	else
+		SetVariable(pset.vars, "SHELL_ERROR", "true");
+
 	if (result == 127 || result == -1)
 	{
 		pg_log_error("\\!: failed");
diff --git a/src/bin/psql/help.c b/src/bin/psql/help.c
index e45c4aaca5..6d5226f793 100644
--- a/src/bin/psql/help.c
+++ b/src/bin/psql/help.c
@@ -455,6 +455,10 @@ helpVariables(unsigned short int pager)
 		  "show all results of a combined query (\\;) instead of only the last\n");
 	HELP0("  SHOW_CONTEXT\n"
 		  "controls display of message context fields [never, errors, always]\n");
+	HELP0("  SHELL_ERROR\n"
+		  "true if last shell command failed, else false\n");
+	HELP0("  SHELL_EXIT_CODE\n"
+		  "Exit code of the last shell command\n");
 	HELP0("  SINGLELINE\n"
 		  "if set, end of line terminates SQL commands (same as -S option)\n");
 	HELP0("  SINGLESTEP\n"
diff --git a/src/bin/psql/psqlscanslash.l b/src/bin/psql/psqlscanslash.l
index 8449ee1a52..f6f03bd816 100644
--- a/src/bin/psql/psqlscanslash.l
+++ b/src/bin/psql/psqlscanslash.l
@@ -23,6 +23,9 @@
 #include "fe_utils/conditional.h"
 
 #include "libpq-fe.h"
+#include "settings.h"
+#include "variables.h"
+#include <sys/wait.h>
 }
 
 %{
@@ -774,6 +777,8 @@ evaluate_backtick(PsqlScanState state)
 	bool		error = false;
 	char		buf[512];
 	size_t		result;
+	int			exit_code = 0;
+	char		exit_code_buf[32];
 
	initPQExpBuffer(&cmd_output);
 
@@ -783,6 +788,7 @@ evaluate_backtick(PsqlScanState state)
 	{
 		pg_log_error("%s: %m", cmd);
 		error = true;
+		exit_code = -1;
 	}
 
 	if (!error)
@@ -790,7 

Re: Add SHELL_EXIT_CODE to psql

2023-01-10 Thread Corey Huinker
On Tue, Jan 10, 2023 at 3:54 AM Maxim Orlov  wrote:

>
>
> On Mon, 9 Jan 2023 at 21:36, Corey Huinker 
> wrote:
>
>>
>> I chose a name that would avoid collisions with anything a user might
>> potentially throw into their environment, so if the var "OS" is fairly
>> standard is a reason to avoid using it. Also, going with our own env var
>> allows us to stay in perfect synchronization with the build's #ifdef WIN32
>> ... and whatever #ifdefs may come in the future for new OSes. If there is
>> already an environment variable that does that for us, I would rather use
>> that, but I haven't found it.
>>
>> Perhaps, I didn't make myself clear. Your solution is perfectly adapted
> to our needs.
> But all Windows since 2000 already have an environment variable
> OS=Windows_NT. So, if env OS is defined and equal Windows_NT, this had to
> be Windows.
> May we use it in our case? I don't insist, just asking.
>

Ah, that makes more sense. I don't have a strong opinion on which we should
use, and I would probably defer to someone who does windows (and possibly
cygwin) builds more often than I do.


Re: Add SHELL_EXIT_CODE to psql

2023-01-09 Thread Corey Huinker
On Mon, Jan 9, 2023 at 10:01 AM Maxim Orlov  wrote:

> Hi!
>
> In overall, I think we move in the right direction. But we could make code
> better, should we?
>
> +   /* Capture exit code for SHELL_EXIT_CODE */
> +   close_exit_code = pclose(fd);
> +   if (close_exit_code == -1)
> +   {
> +   pg_log_error("%s: %m", cmd);
> +   error = true;
> +   }
> +   if (WIFEXITED(close_exit_code))
> +   exit_code=WEXITSTATUS(close_exit_code);
> +   else if(WIFSIGNALED(close_exit_code))
> +   exit_code=WTERMSIG(close_exit_code);
> +   else if(WIFSTOPPED(close_exit_code))
> +   exit_code=WSTOPSIG(close_exit_code);
> +   if (exit_code)
> +   error = true;
> I think, it's better to add spaces around middle if block. It will be easy
> to read.
> Also, consider, adding spaces around assignment in this block.
>

Noted and will implement.


> +   /*
> +   snprintf(exit_code_buf, sizeof(exit_code_buf), "%d",
> WEXITSTATUS(exit_code));
> +   */
> Probably, this is not needed.
>

Heh. Oops.


> > 1. pg_regress now creates an environment variable called PG_OS_TARGET
> Maybe, we can use env "OS"? I do not know much about Windows, but I think
> this is kind of standard environment variable there.
>

I chose a name that would avoid collisions with anything a user might
potentially throw into their environment, so if the var "OS" is fairly
standard is a reason to avoid using it. Also, going with our own env var
allows us to stay in perfect synchronization with the build's #ifdef WIN32
... and whatever #ifdefs may come in the future for new OSes. If there is
already an environment variable that does that for us, I would rather use
that, but I haven't found it.

The 0001 patch is unchanged from last time (aside from anything rebasing
might have done).

>
From cb2ac7393cf0ce0a7c336dfc749ff91142a88b09 Mon Sep 17 00:00:00 2001
From: coreyhuinker 
Date: Tue, 3 Jan 2023 22:16:20 -0500
Subject: [PATCH v4 1/2] Add PG_OS_TARGET environment variable to enable
 OS-specific changes to regression tests.

---
 src/test/regress/pg_regress.c | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/src/test/regress/pg_regress.c b/src/test/regress/pg_regress.c
index 40e6c231a3..56c59e0b10 100644
--- a/src/test/regress/pg_regress.c
+++ b/src/test/regress/pg_regress.c
@@ -595,6 +595,17 @@ initialize_environment(void)
 #endif
 	}
 
+		/*
+		 * Regression tests that invoke OS commands must occasionally use
+		 * OS-specific syntax (example: NUL vs /dev/null). This env var allows
+		 * those tests to adjust commands accordingly.
+		 */
+#if defined(WIN32)
+		setenv("PG_OS_TARGET", "WIN32", 1);
+#else
+		setenv("PG_OS_TARGET", "OTHER", 1);
+#endif
+
 	/*
 	 * Set translation-related settings to English; otherwise psql will
 	 * produce translated messages and produce diffs.  (XXX If we ever support
-- 
2.39.0

From 9d4bec9e046a80f89ce484e2dc4f25c8f1d99e9c Mon Sep 17 00:00:00 2001
From: coreyhuinker 
Date: Mon, 9 Jan 2023 13:25:08 -0500
Subject: [PATCH v4 2/2] Add psql variables SHELL_ERROR and SHELL_EXIT_CODE
 which report the success or failure of the most recent psql OS-command
 executed via the \! command or `backticks`. These variables are roughly
 analogous to ERROR and SQLSTATE, but for OS level operations instead of SQL
 commands.

---
 doc/src/sgml/ref/psql-ref.sgml | 21 +
 src/bin/psql/command.c |  8 +++
 src/bin/psql/help.c|  4 
 src/bin/psql/psqlscanslash.l   | 38 ++
 src/test/regress/expected/psql.out | 25 
 src/test/regress/sql/psql.sql  | 21 +
 6 files changed, 112 insertions(+), 5 deletions(-)

diff --git a/doc/src/sgml/ref/psql-ref.sgml b/doc/src/sgml/ref/psql-ref.sgml
index 3f994a3592..711a6d24fd 100644
--- a/doc/src/sgml/ref/psql-ref.sgml
+++ b/doc/src/sgml/ref/psql-ref.sgml
@@ -4264,6 +4264,27 @@ bar
 
   
 
+  
+   SHELL_ERROR
+   
+
+ true if the last shell command failed, false if
+ it succeeded. This applies to shell commands invoked via either \!
+ or `. See also SHELL_EXIT_CODE.
+
+   
+  
+
+  
+   SHELL_EXIT_CODE
+   
+
+ The exit code returned by the last shell command, invoked via either \! or `.
+ 0 means no error. See also SHELL_ERROR.
+
+   
+  
+
   
 SHOW_ALL_RESULTS
 
diff --git a/src/bin/psql/command.c b/src/bin/psql/command.c
index b5201edf55..5d15c2143b 100644
--- a/src/bin/psql/command.c
+++ b/src/bin/psql/command.c
@@ -5005,6 +5005,7 @@ static bool
 do_shell(const char *command)
 {
 	int			result;
+	char		exit_code_buf[32];
 
 	fflush(NULL);
 	if (!command)
@@ -5032,6 +5033,13 @@ do_shell(const char *command)
 

Re: Generate pg_stat_get_xact*() functions with Macros

2023-01-05 Thread Corey Huinker
On Thu, Jan 5, 2023 at 8:50 AM Drouvot, Bertrand <
bertranddrouvot...@gmail.com> wrote:

> Hi hackers,
>
> Please find attached a patch proposal to $SUBJECT.
>
> This is the same kind of work that has been done in 83a1a1b566 and
> 8018ffbf58 but this time for the
> pg_stat_get_xact*() functions (as suggested by Andres in [1]).
>
> The function names remain the same, but some fields have to be renamed.
>
> While at it, I also took the opportunity to create the macros for
> pg_stat_get_xact_function_total_time(),
> pg_stat_get_xact_function_self_time() and
> pg_stat_get_function_total_time(), pg_stat_get_function_self_time()
> (even if the same code pattern is only repeated two 2 times).
>
> Now that this patch renames some fields, I think that, for consistency,
> those ones should be renamed too (aka remove the f_ and t_ prefixes):
>
> PgStat_FunctionCounts.f_numcalls
> PgStat_StatFuncEntry.f_numcalls
> PgStat_TableCounts.t_truncdropped
> PgStat_TableCounts.t_delta_live_tuples
> PgStat_TableCounts.t_delta_dead_tuples
> PgStat_TableCounts.t_changed_tuples
>
> But I think it would be better to do it in a follow-up patch (once this
> one get committed).
>
> [1]:
> https://www.postgresql.org/message-id/20230105002733.ealhzubjaiqis6ua%40awork3.anarazel.de
>
> Looking forward to your feedback,
>
> Regards,
>
> --
> Bertrand Drouvot
> PostgreSQL Contributors Team
> RDS Open Source Databases
> Amazon Web Services: https://aws.amazon.com


I like code cleanups like this. It makes sense, it results in less code,
and anyone doing a `git grep pg_stat_get_live_tuples` will quickly find the
macro definition.

Unsurprisingly, it passes `make check-world`.

So I think it's good to go as-is.

It does get me wondering, however, if we reordered the three typedefs to
group like-typed registers together, we could make them an array with the
names becoming defined constant index values (or keeping them via a union),
then the typedefs effectively become:

typedef struct PgStat_FunctionCallUsage
{
PgStat_FunctionCounts *fs;
instr_time  time_counters[3];
} PgStat_FunctionCallUsage;


typedef struct PgStat_BackendSubEntry
{
PgStat_Counter counters[2];
} PgStat_BackendSubEntry;


typedef struct PgStat_TableCounts
{
boolt_truncdropped;
PgStat_Counter counters[12];
} PgStat_TableCounts;


Then we'd only have 3 actual C functions:

pg_stat_get_xact_counter(oid, int)
pg_stat_get_xact_subtrans_counter(oid, int)
pg_stat_get_xact_function_time_counter(oid, int)

and then the existing functions become SQL standard function body calls,
something like this:

CREATE OR REPLACE FUNCTION pg_stat_get_xact_numscans(oid)
 RETURNS bigint
 LANGUAGE sql
 STABLE PARALLEL RESTRICTED COST 1
RETURN pg_stat_get_xact_counter($1, 0);


CREATE OR REPLACE FUNCTION pg_stat_get_xact_tuples_returned(oid)
 RETURNS bigint
 LANGUAGE sql
 STABLE PARALLEL RESTRICTED COST 1
RETURN pg_stat_get_xact_counter($1, 1);



The most obvious drawback to this approach is that the C functions would
need to do runtime bounds checking on the index parameter, and the amount
of memory footprint saved by going from 17 short functions to 3 is not
enough to make any real difference. So I think your approach is better, but
I wanted to throw this idea out there.


Re: Add SHELL_EXIT_CODE to psql

2023-01-03 Thread Corey Huinker
On Tue, Jan 3, 2023 at 5:36 AM vignesh C  wrote:

> On Wed, 21 Dec 2022 at 11:04, Corey Huinker 
> wrote:
> >
> > I've rebased and updated the patch to include documentation.
> >
> > Regression tests have been moved to a separate patchfile because error
> messages will vary by OS and configuration, so we probably can't do a
> stable regression test, but having them handy at least demonstrates the
> feature.
> >
> > On Sun, Dec 4, 2022 at 12:35 AM Corey Huinker 
> wrote:
> >>
> >> Rebased. Still waiting on feedback before working on documentation.
> >>
> >> On Fri, Nov 4, 2022 at 5:23 AM Corey Huinker 
> wrote:
> >>>
> >>> Oops, that sample output was from a previous run, should have been:
> >>>
> >>> -- SHELL_EXIT_CODE is undefined
> >>> \echo :SHELL_EXIT_CODE
> >>> :SHELL_EXIT_CODE
> >>> -- bad \!
> >>> \! borp
> >>> sh: line 1: borp: command not found
> >>> \echo :SHELL_EXIT_CODE
> >>> 127
> >>> -- bad backtick
> >>> \set var `borp`
> >>> sh: line 1: borp: command not found
> >>> \echo :SHELL_EXIT_CODE
> >>> 127
> >>> -- good \!
> >>> \! true
> >>> \echo :SHELL_EXIT_CODE
> >>> 0
> >>> -- play with exit codes
> >>> \! exit 4
> >>> \echo :SHELL_EXIT_CODE
> >>> 4
> >>> \set var `exit 3`
> >>> \echo :SHELL_EXIT_CODE
> >>> 3
> >>>
> >>>
> >>> On Fri, Nov 4, 2022 at 5:08 AM Corey Huinker 
> wrote:
> >>>>
> >>>>
> >>>> Over in
> https://www.postgresql.org/message-id/eaf326ad693e74eba068f33a7f518...@oss.nttdata.com
> Justin Pryzby suggested that psql might need the ability to capture the
> shell exit code.
> >>>>
> >>>> This is a POC patch that does that, but doesn't touch on the
> ON_ERROR_STOP stuff.
> >>>>
> >>>> I've added some very rudimentary tests, but haven't touched the
> documentation, because I strongly suspect that someone will suggest a
> better name for the variable.
> >>>>
> >>>> But basically, it works like this
> >>>>
> >>>> -- SHELL_EXIT_CODE is undefined
> >>>> \echo :SHELL_EXIT_CODE
> >>>> :SHELL_EXIT_CODE
> >>>> -- bad \!
> >>>> \! borp
> >>>> sh: line 1: borp: command not found
> >>>> \echo :SHELL_EXIT_CODE
> >>>> 32512
> >>>> -- bad backtick
> >>>> \set var `borp`
> >>>> sh: line 1: borp: command not found
> >>>> \echo :SHELL_EXIT_CODE
> >>>> 127
> >>>> -- good \!
> >>>> \! true
> >>>> \echo :SHELL_EXIT_CODE
> >>>> 0
> >>>> -- play with exit codes
> >>>> \! exit 4
> >>>> \echo :SHELL_EXIT_CODE
> >>>> 1024
> >>>> \set var `exit 3`
> >>>> \echo :SHELL_EXIT_CODE
> >>>> 3
> >>>>
> >>>>
> >>>> Feedback welcome.
>
> CFBot shows some compilation errors as in [1], please post an updated
> version for the same:
> [02:35:49.924] psqlscanslash.l: In function ‘evaluate_backtick’:
> [02:35:49.924] psqlscanslash.l:822:11: error: implicit declaration of
> function ‘WIFSTOPPED’ [-Werror=implicit-function-declaration]
> [02:35:49.924] 822 | exit_code=WSTOPSIG(exit_code);
> [02:35:49.924] | ^~
> [02:35:49.924] psqlscanslash.l:823:14: error: implicit declaration of
> function ‘WSTOPSIG’ [-Werror=implicit-function-declaration]
> [02:35:49.924] 823 | }
> [02:35:49.924] | ^
> [02:35:49.924] cc1: all warnings being treated as errors
> [02:35:49.925] make[3]: *** [: psqlscanslash.o] Error 1
>
> [1] - https://cirrus-ci.com/task/5424476720988160
>
> Regards,
> Vignesh
>

Thanks. I had left sys/wait.h out of psqlscanslash.

Attached is v3 of this patch, I've made the following changes:

1. pg_regress now creates an environment variable called PG_OS_TARGET,
which regression tests can use to manufacture os-specific commands. For our
purposes, this allows the regression test to manufacture a shell command
that has either "2> /dev/null" or "2> NUL". This seemed the least invasive
way to make this possible. If for some reason it becomes useful in general
psql scripting, then we can consider promoting it to a regular psql var.

2. There are now two psql variables, SHELL_EXIT_CODE, which has the return
code, and SHE

Re: CAST(... ON DEFAULT) - WIP build on top of Error-Safe User Functions

2023-01-03 Thread Corey Huinker
On Mon, Jan 2, 2023 at 10:57 AM Tom Lane  wrote:

> Corey Huinker  writes:
> > The proposed changes are as follows:
> > CAST(expr AS typename)
> > continues to behave as before.
> > CAST(expr AS typename ERROR ON ERROR)
> > has the identical behavior as the unadorned CAST() above.
> > CAST(expr AS typename NULL ON ERROR)
> > will use error-safe functions to do the cast of expr, and will return
> > NULL if the cast fails.
> > CAST(expr AS typename DEFAULT expr2 ON ERROR)
> > will use error-safe functions to do the cast of expr, and will return
> > expr2 if the cast fails.
>
> While I approve of trying to get some functionality in this area,
> I'm not sure that extending CAST is a great idea, because I'm afraid
> that the SQL committee will do something that conflicts with it.
> If we know that they are about to standardize exactly this syntax,
> where is that information available?  If we don't know that,
> I'd prefer to invent some kind of function or other instead of
> extending the grammar.
>
> regards, tom lane
>

I'm going off the spec that Vik presented in
https://www.postgresql.org/message-id/f8600a3b-f697-2577-8fea-f40d3e18b...@postgresfriends.org
which is his effort to get it through the SQL committee. I was
already thinking about how to get the SQLServer TRY_CAST() function into
postgres, so this seemed like the logical next step.

While the syntax may change, the underlying infrastructure would remain
basically the same: we would need the ability to detect that a typecast had
failed, and replace it with the default value, and handle that at parse
time, or executor time, and handle array casts where the array has the
default but the underlying elements can't.

It would be simple to move the grammar changes to their own patch if that
removes a barrier for people.


Re: Add SHELL_EXIT_CODE to psql

2022-12-31 Thread Corey Huinker
On Sat, Dec 31, 2022 at 5:28 PM Isaac Morland 
wrote:

> On Sat, 31 Dec 2022 at 16:47, Corey Huinker 
> wrote:
>
>>
>>> I wonder if there is value in setting up a psql on/off var
>>> SHELL_ERROR_OUTPUT construct that when set to "off/false"
>>> suppresses standard error via appending "2> /dev/null" (or "2> nul" if
>>> #ifdef WIN32). At the very least, it would allow for tests like this to be
>>> done with standard regression scripts.
>>>
>>
>> Thinking on this some more a few ideas came up:
>>
>> 1. The SHELL_ERROR_OUTPUT var above. This is good for simple situations,
>> but it would fail if the user took it upon themselves to redirect output,
>> and suddenly "myprog 2> /dev/null" works fine on its own but will fail when
>> we append our own "2> /dev/null" to it.
>>
>
> Rather than attempting to append redirection directives to the command,
> would it work to redirect stderr before invoking the shell? This seems to
> me to be more reliable and it should allow an explicit redirection in the
> shell command to still work. The difference between Windows and Unix then
> becomes the details of what system calls we use to accomplish the
> redirection (or maybe none, if an existing abstraction layer takes care of
> that - I'm not really up on Postgres internals enough to know), rather than
> what we append to the provided command.
>
>
Inside psql, it's a call to the system() function which takes a single
string. All output, stdout or stderr, is printed. So for the regression
test we have to compose a command that is OS appropriate AND suppresses
stderr.


Re: Add SHELL_EXIT_CODE to psql

2022-12-31 Thread Corey Huinker
On Fri, Dec 30, 2022 at 2:17 PM Corey Huinker 
wrote:

> On Wed, Dec 28, 2022 at 5:59 AM Maxim Orlov  wrote:
>
>> Hi!
>>
>> The patch is implementing what is declared to do. Shell return code is
>> now accessible is psql var.
>> Overall code is in a good condition. Applies with no errors on master.
>> Unfortunately, regression tests are failing on the macOS due to the
>> different shell output.
>>
>
> That was to be expected.
>
> I wonder if there is value in setting up a psql on/off var
> SHELL_ERROR_OUTPUT construct that when set to "off/false"
> suppresses standard error via appending "2> /dev/null" (or "2> nul" if
> #ifdef WIN32). At the very least, it would allow for tests like this to be
> done with standard regression scripts.
>

Thinking on this some more a few ideas came up:

1. The SHELL_ERROR_OUTPUT var above. This is good for simple situations,
but it would fail if the user took it upon themselves to redirect output,
and suddenly "myprog 2> /dev/null" works fine on its own but will fail when
we append our own "2> /dev/null" to it.

2. Exposing a PSQL var that identifies the shell snippet for "how to
suppress standard error", which would be "2> /dev/null" for -nix systems
and "2> NUL" for windows systems. This again, presumes that users will
actually need this feature, and I'm not convinced that they will.

3. Exposing a PSQL var that is basically an "is this environment running on
windows" and let them construct it from there. That gets complicated with
WSL and the like, so I'm not wild about this, even though it would be
comparatively simple to implement.

4. We could inject an environment variable into our own regression tests
directly based on the "#ifdef WIN32" in our own code, and we can use a
couple of \gset commands to construct the error-suppressed shell commands
as needed. This seems like the best starting point, and we can always turn
that env var into a real variable if it proves useful elsewhere.


Re: Add SHELL_EXIT_CODE to psql

2022-12-30 Thread Corey Huinker
On Wed, Dec 28, 2022 at 5:59 AM Maxim Orlov  wrote:

> Hi!
>
> The patch is implementing what is declared to do. Shell return code is now
> accessible is psql var.
> Overall code is in a good condition. Applies with no errors on master.
> Unfortunately, regression tests are failing on the macOS due to the
> different shell output.
>

That was to be expected.

I wonder if there is value in setting up a psql on/off var
SHELL_ERROR_OUTPUT construct that when set to "off/false"
suppresses standard error via appending "2> /dev/null" (or "2> nul" if
#ifdef WIN32). At the very least, it would allow for tests like this to be
done with standard regression scripts.

>


Re: Add SHELL_EXIT_CODE to psql

2022-12-20 Thread Corey Huinker
I've rebased and updated the patch to include documentation.

Regression tests have been moved to a separate patchfile because error
messages will vary by OS and configuration, so we probably can't do a
stable regression test, but having them handy at least demonstrates the
feature.

On Sun, Dec 4, 2022 at 12:35 AM Corey Huinker 
wrote:

> Rebased. Still waiting on feedback before working on documentation.
>
> On Fri, Nov 4, 2022 at 5:23 AM Corey Huinker 
> wrote:
>
>> Oops, that sample output was from a previous run, should have been:
>>
>> -- SHELL_EXIT_CODE is undefined
>> \echo :SHELL_EXIT_CODE
>> :SHELL_EXIT_CODE
>> -- bad \!
>> \! borp
>> sh: line 1: borp: command not found
>> \echo :SHELL_EXIT_CODE
>> 127
>> -- bad backtick
>> \set var `borp`
>> sh: line 1: borp: command not found
>> \echo :SHELL_EXIT_CODE
>> 127
>> -- good \!
>> \! true
>> \echo :SHELL_EXIT_CODE
>> 0
>> -- play with exit codes
>> \! exit 4
>> \echo :SHELL_EXIT_CODE
>> 4
>> \set var `exit 3`
>> \echo :SHELL_EXIT_CODE
>> 3
>>
>>
>> On Fri, Nov 4, 2022 at 5:08 AM Corey Huinker 
>> wrote:
>>
>>>
>>> Over in
>>> https://www.postgresql.org/message-id/eaf326ad693e74eba068f33a7f518...@oss.nttdata.com
>>>  Justin
>>> Pryzby suggested that psql might need the ability to capture the shell exit
>>> code.
>>>
>>> This is a POC patch that does that, but doesn't touch on the
>>> ON_ERROR_STOP stuff.
>>>
>>> I've added some very rudimentary tests, but haven't touched the
>>> documentation, because I strongly suspect that someone will suggest a
>>> better name for the variable.
>>>
>>> But basically, it works like this
>>>
>>> -- SHELL_EXIT_CODE is undefined
>>> \echo :SHELL_EXIT_CODE
>>> :SHELL_EXIT_CODE
>>> -- bad \!
>>> \! borp
>>> sh: line 1: borp: command not found
>>> \echo :SHELL_EXIT_CODE
>>> 32512
>>> -- bad backtick
>>> \set var `borp`
>>> sh: line 1: borp: command not found
>>> \echo :SHELL_EXIT_CODE
>>> 127
>>> -- good \!
>>> \! true
>>> \echo :SHELL_EXIT_CODE
>>> 0
>>> -- play with exit codes
>>> \! exit 4
>>> \echo :SHELL_EXIT_CODE
>>> 1024
>>> \set var `exit 3`
>>> \echo :SHELL_EXIT_CODE
>>> 3
>>>
>>>
>>> Feedback welcome.
>>>
>>
From 443be6f64ca89bbd6367a011f2a98aa9324f27a9 Mon Sep 17 00:00:00 2001
From: coreyhuinker 
Date: Wed, 21 Dec 2022 00:24:47 -0500
Subject: [PATCH 1/2] Make the exit code of shell commands executed via psql
 visible via the variable SHELL_EXIT_CODE.

---
 doc/src/sgml/ref/psql-ref.sgml |  9 +
 src/bin/psql/command.c |  4 
 src/bin/psql/help.c|  2 ++
 src/bin/psql/psqlscanslash.l   | 24 +---
 4 files changed, 36 insertions(+), 3 deletions(-)

diff --git a/doc/src/sgml/ref/psql-ref.sgml b/doc/src/sgml/ref/psql-ref.sgml
index 8a5285da9a..d0c80b4528 100644
--- a/doc/src/sgml/ref/psql-ref.sgml
+++ b/doc/src/sgml/ref/psql-ref.sgml
@@ -4260,6 +4260,15 @@ bar
 
   
 
+  
+   SHELL_EXIT_CODE
+   
+
+ The exit code returned by the last shell command. 0 means no error.
+
+   
+  
+
   
 SHOW_ALL_RESULTS
 
diff --git a/src/bin/psql/command.c b/src/bin/psql/command.c
index de6a3a71f8..f6d6a489a9 100644
--- a/src/bin/psql/command.c
+++ b/src/bin/psql/command.c
@@ -4998,6 +4998,7 @@ static bool
 do_shell(const char *command)
 {
 	int			result;
+	char		exit_code_buf[32];
 
 	fflush(NULL);
 	if (!command)
@@ -5025,6 +5026,9 @@ do_shell(const char *command)
 	else
 		result = system(command);
 
+	snprintf(exit_code_buf, sizeof(exit_code_buf), "%d", WEXITSTATUS(result));
+	SetVariable(pset.vars, "SHELL_EXIT_CODE", exit_code_buf);
+
 	if (result == 127 || result == -1)
 	{
 		pg_log_error("\\!: failed");
diff --git a/src/bin/psql/help.c b/src/bin/psql/help.c
index b4e0ec2687..caf13e2ed2 100644
--- a/src/bin/psql/help.c
+++ b/src/bin/psql/help.c
@@ -455,6 +455,8 @@ helpVariables(unsigned short int pager)
 		  "show all results of a combined query (\\;) instead of only the last\n");
 	HELP0("  SHOW_CONTEXT\n"
 		  "controls display of message context fields [never, errors, always]\n");
+	HELP0("  SHELL_EXIT_CODE\n"
+		  "Exit code of the last shell command\n");
 	HELP0("  SINGLELINE\n"
 		  "if set, end of line terminates SQL commands (same as -S option)\n");
 	HELP0("  SINGLESTEP\n"
diff -

Re: Common function for percent placeholder replacement

2022-12-20 Thread Corey Huinker
>
> How about this new one with variable arguments?


I like this a lot, but I also see merit in Alvaro's PERCENT_OPT variadic,
which at least avoids the two lists getting out of sync.

Initially, I was going to ask that we have shell-quote-safe equivalents of
whatever fixed parameters we baked in, but this allows the caller to do
that as needed. It seems we could now just copy quote_identifier and strip
out the keyword checks to get the desired effect. Has anyone else had a
need for quote-safe args in the shell commands?


CAST(... ON DEFAULT) - WIP build on top of Error-Safe User Functions

2022-12-19 Thread Corey Huinker
Attached is my work in progress to implement the changes to the CAST()
function as proposed by Vik Fearing.

This work builds upon the Error-safe User Functions work currently ongoing.

The proposed changes are as follows:

CAST(expr AS typename)
continues to behave as before.

CAST(expr AS typename ERROR ON ERROR)
has the identical behavior as the unadorned CAST() above.

CAST(expr AS typename NULL ON ERROR)
will use error-safe functions to do the cast of expr, and will return
NULL if the cast fails.

CAST(expr AS typename DEFAULT expr2 ON ERROR)
will use error-safe functions to do the cast of expr, and will return
expr2 if the cast fails.

There is an additional FORMAT parameter that I have not yet implemented, my
understanding is that it is largely intended for DATE/TIME field
conversions, but others are certainly possible.
CAST(expr AS typename FORMAT fmt DEFAULT expr2 ON ERROR)

What is currently working:
- Any scalar expression that can be evaluated at parse time. These tests
from cast.sql all currently work:

VALUES (CAST('error' AS integer));
VALUES (CAST('error' AS integer ERROR ON ERROR));
VALUES (CAST('error' AS integer NULL ON ERROR));
VALUES (CAST('error' AS integer DEFAULT 42 ON ERROR));

SELECT CAST('{123,abc,456}' AS integer[] DEFAULT '{-789}' ON ERROR) as
array_test1;

- Scalar values evaluated at runtime.

CREATE TEMPORARY TABLE t(t text);
INSERT INTO t VALUES ('a'), ('1'), ('b'), (2);
SELECT CAST(t.t AS integer DEFAULT -1 ON ERROR) AS foo FROM t;
 foo
-----
  -1
   1
  -1
   2
(4 rows)


Along the way, I made a few design decisions, each of which is up for
debate:

First, I created OidInputFunctionCallSafe, which is to OidInputFunctionCall
what InputFunctionCallSafe is to InputFunctionCall. Given that the only
place I ended up using it was stringTypeDatumSafe(), it may be possible to
just move that code inside stringTypeDatumSafe.

Next, I had a need for FuncExpr, CoerceViaIO, and ArrayCoerce to all report
if their expr argument failed, and if not, just past the evaluation of
expr2. Rather than duplicate this logic in several places, I chose instead
to modify CoalesceExpr to allow for an error-test mode in addition to its
default null-test mode, and then to provide this altered node with two
expressions, the first being the error-safe typecast of expr and the second
being the non-error-safe typecast of expr2.

I still don't have array-to-array casts working, as the changed I would
likely need to make to ArrayCoerce get somewhat invasive, so this seemed
like a good time to post my work so far and solicit some feedback beyond
what I've already been getting from Jeff Davis and Michael Paquier.

I've sidestepped domains as well for the time being as well as avoiding JIT
issues entirely.

No documentation is currently prepared. All but one of the regression test
queries work, the one that is currently failing is:

SELECT CAST('{234,def,567}'::text[] AS integer[] DEFAULT '{-1011}' ON
ERROR) as array_test2;

Other quirks:
- an unaliased CAST ON DEFAULT will return the column name of "coalesce",
which internally is true, but obviously would be quite confusing to a user.

As a side observation, I noticed that the optimizer already tries to
resolve expressions based on constants and to collapse expression trees
where possible, which makes me wonder if the work done to do the same in
transformTypeCast/ and coerce_to_target_type and coerce_type isn't also
wasted.
From 897c9c68a29ad0fa57f28734df0c77553e026d80 Mon Sep 17 00:00:00 2001
From: coreyhuinker 
Date: Sun, 18 Dec 2022 16:20:01 -0500
Subject: [PATCH 1/3] add OidInputFunctionCall

---
 src/backend/utils/fmgr/fmgr.c | 13 +
 src/include/fmgr.h|  4 
 2 files changed, 17 insertions(+)

diff --git a/src/backend/utils/fmgr/fmgr.c b/src/backend/utils/fmgr/fmgr.c
index 0d37f69298..e9a19ce653 100644
--- a/src/backend/utils/fmgr/fmgr.c
+++ b/src/backend/utils/fmgr/fmgr.c
@@ -1701,6 +1701,19 @@ OidInputFunctionCall(Oid functionId, char *str, Oid typioparam, int32 typmod)
	return InputFunctionCall(&flinfo, str, typioparam, typmod);
 }
 
+bool
+OidInputFunctionCallSafe(Oid functionId, char *str, Oid typioparam,
+		 int32 typmod, fmNodePtr escontext,
+		 Datum *result)
+{
+	FmgrInfo			flinfo;
+
	fmgr_info(functionId, &flinfo);
+
	return InputFunctionCallSafe(&flinfo, str, typioparam, typmod,
+ escontext, result);
+}
+
 char *
 OidOutputFunctionCall(Oid functionId, Datum val)
 {
diff --git a/src/include/fmgr.h b/src/include/fmgr.h
index b7832d0aa2..b835ef72b5 100644
--- a/src/include/fmgr.h
+++ b/src/include/fmgr.h
@@ -706,6 +706,10 @@ extern bool InputFunctionCallSafe(FmgrInfo *flinfo, char *str,
   Datum *result);
 extern Datum OidInputFunctionCall(Oid functionId, char *str,
   Oid typioparam, int32 typmod);
+extern bool
+OidInputFunctionCallSafe(Oid functionId, char *str, Oid typioparam,
+		 int32 typmod, fmNodePtr escontext,
+		 Datum *result);
 extern char *OutputFunctionCall(FmgrInfo 

Re: Error-safe user functions

2022-12-10 Thread Corey Huinker
On Sat, Dec 10, 2022 at 9:20 AM Tom Lane  wrote:

> Alvaro Herrera  writes:
> > On 2022-Dec-09, Tom Lane wrote:
> >> ...  So I think it might be
> >> okay to say "if you want soft error treatment for a domain,
> >> make sure its check constraints don't throw errors".
>
> > I think that's fine.  If the user does, say "CHECK (value > 0)" and that
> > results in a soft error, that seems to me enough support for now.  If
> > they want to do something more elaborate, they can write C functions.
> > Maybe eventually we'll want to offer some other mechanism that doesn't
> > require C, but let's figure out what the requirements are.  I don't
> > think we know that, at this point.
>
> A fallback we can offer to anyone with such a problem is "write a
> plpgsql function and wrap the potentially-failing bit in an exception
> block".  Then they get to pay the cost of the subtransaction, while
> we're not imposing one on everybody else.
>
> regards, tom lane
>

That exception block will prevent parallel plans. I'm not saying it isn't
the best way forward for us, but wanted to make that side effect clear.


Re: Error-safe user functions

2022-12-09 Thread Corey Huinker
On Fri, Dec 9, 2022 at 11:17 AM Amul Sul  wrote:

> On Fri, Dec 9, 2022 at 9:08 PM Andrew Dunstan  wrote:
> >
> >
> > On 2022-12-09 Fr 10:16, Tom Lane wrote:
> > > Andrew Dunstan  writes:
> > >> On 2022-12-08 Th 21:59, Tom Lane wrote:
> > >>> Yeah, I was planning to take a look at that before walking away from
> > >>> this stuff.  (I'm sure not volunteering to convert ALL the input
> > >>> functions, but I'll do the datetime code.)
> > >> Awesome. Perhaps if there are no more comments you can commit what you
> > >> currently have so people can start work on other input functions.
> > > Pushed.
> >
> >
> > Great!
> >
> >
> > > As I said, I'll take a look at the datetime area.  Do we
> > > have any volunteers for other input functions?
> > >
> > >
> >
> >
> > I am currently looking at the json types. I think that will be enough to
> > let us rework the sql/json patches as discussed a couple of months ago.
> >
>
> I will pick a few other input functions, thanks.
>
> Regards,
> Amul
>

I can do a few as well, as I need them done for the CAST With Default
effort.

Amul, please let me know which ones you pick so we don't duplicate work.


Re: Error-safe user functions

2022-12-07 Thread Corey Huinker
On Wed, Dec 7, 2022 at 12:17 PM Tom Lane  wrote:

> Corey Huinker  writes:
> > In my attempt to implement CAST...DEFAULT, I noticed that I immediately
> > needed an
> > OidInputFunctionCallSafe, which was trivial but maybe something we want
> to
> > add to the infra patch, but the comments around that function also
> somewhat
> > indicate that we might want to just do the work in-place and call
> > InputFunctionCallSafe directly. Open to both ideas.
>
> I'm a bit skeptical of that.  IMO using OidInputFunctionCall is only
> appropriate in places that will be executed just once per query.
>

That is what's happening when the expr of the existing CAST ( expr AS
typename ) is a constant and we want to just resolve the constant at parse
time.


>
>


Re: Error-safe user functions

2022-12-07 Thread Corey Huinker
On Wed, Dec 7, 2022 at 9:20 AM Tom Lane  wrote:

> Andrew Dunstan  writes:
> > Perhaps we should add a type in the regress library that will never have
> > a safe input function, so we can test that the mechanism works as
> > expected in that case even after we adjust all the core data types'
> > input functions.
>
> I was intending that the existing "widget" type be that.  0003 already
> adds a comment to widget_in saying not to "fix" its one ereport call.
>
> Returning to the naming quagmire -- it occurred to me just now that
> it might be helpful to call this style of error reporting "soft"
> errors rather than "safe" errors, which'd provide a nice contrast
> with "hard" errors thrown by longjmp'ing.  That would lead to naming
> all the variant functions XXXSoft not XXXSafe.  There would still
> be commentary to the effect that "soft errors must be safe, in the
> sense that there's no question whether it's safe to continue
> processing the transaction".  Anybody think that'd be an
> improvement?


In my attempt to implement CAST...DEFAULT, I noticed that I immediately
needed an
OidInputFunctionCallSafe, which was trivial but maybe something we want to
add to the infra patch, but the comments around that function also somewhat
indicate that we might want to just do the work in-place and call
InputFunctionCallSafe directly. Open to both ideas.

Looking forward cascades up into coerce_type and its brethren, and
reimplementing those from a Node returner to a boolean returner with a Node
parameter seems a bit of a stretch, so I have to pick a point where the
code pivots from passing down a safe-mode indicator and passing back a
found_error indicator (which may be combine-able, as safe is always true
when the found_error pointer is not null, and always false when it is null),
but for the most part things look do-able.


Re: Error-safe user functions

2022-12-06 Thread Corey Huinker
On Tue, Dec 6, 2022 at 6:46 AM Andrew Dunstan  wrote:

>
> On 2022-12-05 Mo 20:06, Tom Lane wrote:
> > Andres Freund  writes:
> >
> >> But perhaps it's even worth having such a function properly exposed:
> >> It's not at all rare to parse text data during ETL and quite often
> >> erroring out fatally is undesirable. As savepoints are undesirable
> >> overhead-wise, there's a lot of SQL out there that tries to do a
> >> pre-check about whether some text could be cast to some other data
> >> type. A function that'd try to cast input to a certain type without
> >> erroring out hard would be quite useful for that.
> > Corey and Vik are already talking about a non-error CAST variant.
>
>
> /metoo! :-)
>
>
> > Maybe we should leave this in abeyance until something shows up
> > for that?  Otherwise we'll be making a nonstandard API for what
> > will probably ultimately be SQL-spec functionality.  I don't mind
> > that as regression-test infrastructure, but I'm a bit less excited
> > about exposing it as a user feature.
> >
>
>
> I think a functional mechanism could be very useful. Who knows when the
> standard might specify something in this area?
>
>
>
Vik's working on the standard (he put the spec in earlier in this thread).
I'm working on implementing it on top of Tom's work, but I'm one patchset
behind at the moment.

Once completed, it should be leverage-able in several places, COPY being
the most obvious.

What started all this was me noticing that if I implemented TRY_CAST in
pl/pgsql with an exception block, then I wasn't able to use parallel query.


Re: ANY_VALUE aggregate

2022-12-05 Thread Corey Huinker
On Mon, Dec 5, 2022 at 12:57 PM David G. Johnston <
david.g.johns...@gmail.com> wrote:

> On Mon, Dec 5, 2022 at 7:57 AM Vik Fearing 
> wrote:
>
>> The SQL:2023 Standard defines a new aggregate named ANY_VALUE.  It
>> returns an implementation-dependent (i.e. non-deterministic) value from
>> the rows in its group.
>>
>> PFA an implementation of this aggregate.
>>
>>
> Can we please add "first_value" and "last_value" if we are going to add
> "some_random_value" to our library of aggregates?
>
> Also, maybe we should have any_value do something like compute a 50/50
> chance that any new value seen replaces the existing chosen value, instead
> of simply returning the first value all the time.  Maybe even prohibit the
> first value from being chosen so long as a second value appears.
>
> David J.
>

Adding to the pile of wanted aggregates: in the past I've lobbied for
only_value(), which is like first_value() but raises an error on
encountering a second value.


Re: Error-safe user functions

2022-12-05 Thread Corey Huinker
On Mon, Dec 5, 2022 at 1:00 PM Tom Lane  wrote:

> Andrew Dunstan  writes:
> > Wait a minute!  Oh, no, sorry, as you were, 'errsave' is fine.
>
> Seems like everybody's okay with errsave.  I'll make a v2 in a
> little bit.  I'd like to try updating array_in and/or record_in
> just to verify that indirection cases work okay, before we consider
> the design to be set.
>

+1 to errsave.


Re: Error-safe user functions

2022-12-05 Thread Corey Huinker
On Mon, Dec 5, 2022 at 11:36 AM Andrew Dunstan  wrote:

>
> On 2022-12-05 Mo 11:20, Robert Haas wrote:
> > On Mon, Dec 5, 2022 at 11:09 AM Tom Lane  wrote:
> >> Robert Haas  writes:
> >>> On Sat, Dec 3, 2022 at 10:57 PM Corey Huinker 
> wrote:
> >>>> 2. ereturn_* => errfeedback / error_feedback / feedback
> >>> Oh, I like that, especially errfeedback.
> >> efeedback?  But TBH I do not think any of these are better than ereturn.
> > I do. Having a macro name that is "return" plus one character is going
> > to make people think that it returns. I predict that if you insist on
> > using that name people are still going to be making mistakes based on
> > that confusion 10 years from now.
> >
>
> OK, I take both this point and Tom's about trying to keep it the same
> length. So we need something that's  7 letters, doesn't say 'return' and
> preferably begins with 'e'. I modestly suggest 'eseterr', or if we like
> the 'feedback' idea 'efeedbk'.
>
>
>
Consulting https://www.thesaurus.com/browse/feedback again:
ereply clocks in at 7 characters.


Re: Add SHELL_EXIT_CODE to psql

2022-12-03 Thread Corey Huinker
Rebased. Still waiting on feedback before working on documentation.

On Fri, Nov 4, 2022 at 5:23 AM Corey Huinker 
wrote:

> Oops, that sample output was from a previous run, should have been:
>
> -- SHELL_EXIT_CODE is undefined
> \echo :SHELL_EXIT_CODE
> :SHELL_EXIT_CODE
> -- bad \!
> \! borp
> sh: line 1: borp: command not found
> \echo :SHELL_EXIT_CODE
> 127
> -- bad backtick
> \set var `borp`
> sh: line 1: borp: command not found
> \echo :SHELL_EXIT_CODE
> 127
> -- good \!
> \! true
> \echo :SHELL_EXIT_CODE
> 0
> -- play with exit codes
> \! exit 4
> \echo :SHELL_EXIT_CODE
> 4
> \set var `exit 3`
> \echo :SHELL_EXIT_CODE
> 3
>
>
> On Fri, Nov 4, 2022 at 5:08 AM Corey Huinker 
> wrote:
>
>>
>> Over in
>> https://www.postgresql.org/message-id/eaf326ad693e74eba068f33a7f518...@oss.nttdata.com
>>  Justin
>> Pryzby suggested that psql might need the ability to capture the shell exit
>> code.
>>
>> This is a POC patch that does that, but doesn't touch on the
>> ON_ERROR_STOP stuff.
>>
>> I've added some very rudimentary tests, but haven't touched the
>> documentation, because I strongly suspect that someone will suggest a
>> better name for the variable.
>>
>> But basically, it works like this
>>
>> -- SHELL_EXIT_CODE is undefined
>> \echo :SHELL_EXIT_CODE
>> :SHELL_EXIT_CODE
>> -- bad \!
>> \! borp
>> sh: line 1: borp: command not found
>> \echo :SHELL_EXIT_CODE
>> 32512
>> -- bad backtick
>> \set var `borp`
>> sh: line 1: borp: command not found
>> \echo :SHELL_EXIT_CODE
>> 127
>> -- good \!
>> \! true
>> \echo :SHELL_EXIT_CODE
>> 0
>> -- play with exit codes
>> \! exit 4
>> \echo :SHELL_EXIT_CODE
>> 1024
>> \set var `exit 3`
>> \echo :SHELL_EXIT_CODE
>> 3
>>
>>
>> Feedback welcome.
>>
>
From fef0e52706c9136659f0f8846bae289f0dbfb469 Mon Sep 17 00:00:00 2001
From: coreyhuinker 
Date: Fri, 4 Nov 2022 04:45:39 -0400
Subject: [PATCH] POC: expose shell exit code as a psql variable

---
 src/bin/psql/command.c             |  4 ++++
 src/bin/psql/help.c                |  2 ++
 src/bin/psql/psqlscanslash.l       | 28 +++++++++++++++++++++++++---
 src/test/regress/expected/psql.out | 11 +++++++++++
 src/test/regress/sql/psql.sql      |  7 +++++++
 5 files changed, 49 insertions(+), 3 deletions(-)

diff --git a/src/bin/psql/command.c b/src/bin/psql/command.c
index de6a3a71f8..f6d6a489a9 100644
--- a/src/bin/psql/command.c
+++ b/src/bin/psql/command.c
@@ -4998,6 +4998,7 @@ static bool
 do_shell(const char *command)
 {
 	int			result;
+	char		exit_code_buf[32];
 
 	fflush(NULL);
 	if (!command)
@@ -5025,6 +5026,9 @@ do_shell(const char *command)
 	else
 		result = system(command);
 
+	snprintf(exit_code_buf, sizeof(exit_code_buf), "%d", WEXITSTATUS(result));
+	SetVariable(pset.vars, "SHELL_EXIT_CODE", exit_code_buf);
+
 	if (result == 127 || result == -1)
 	{
 		pg_log_error("\\!: failed");
diff --git a/src/bin/psql/help.c b/src/bin/psql/help.c
index b4e0ec2687..caf13e2ed2 100644
--- a/src/bin/psql/help.c
+++ b/src/bin/psql/help.c
@@ -455,6 +455,8 @@ helpVariables(unsigned short int pager)
 		  "show all results of a combined query (\\;) instead of only the last\n");
 	HELP0("  SHOW_CONTEXT\n"
 		  "controls display of message context fields [never, errors, always]\n");
+	HELP0("  SHELL_EXIT_CODE\n"
+		  "Exit code of the last shell command\n");
 	HELP0("  SINGLELINE\n"
 		  "if set, end of line terminates SQL commands (same as -S option)\n");
 	HELP0("  SINGLESTEP\n"
diff --git a/src/bin/psql/psqlscanslash.l b/src/bin/psql/psqlscanslash.l
index a467b72144..30e6f5dcd4 100644
--- a/src/bin/psql/psqlscanslash.l
+++ b/src/bin/psql/psqlscanslash.l
@@ -27,6 +27,8 @@
 
 %{
 #include "fe_utils/psqlscan_int.h"
+#include "settings.h"
+#include "variables.h"
 
 /*
  * We must have a typedef YYSTYPE for yylex's first argument, but this lexer
@@ -774,6 +776,8 @@ evaluate_backtick(PsqlScanState state)
 	bool		error = false;
 	char		buf[512];
 	size_t		result;
+	int			exit_code = 0;
+	char		exit_code_buf[32];
 
 	initPQExpBuffer(_output);
 
@@ -783,6 +787,7 @@ evaluate_backtick(PsqlScanState state)
 	{
 		pg_log_error("%s: %m", cmd);
 		error = true;
+		exit_code = -1;
 	}
 
 	if (!error)
@@ -800,10 +805,25 @@ evaluate_backtick(PsqlScanState state)
 		} while (!feof(fd));
 	}
 
-	if (fd && pclose(fd) == -1)
+	if (fd)
 	{
-		pg_log_error("%s: %m", cmd);
-		error = true;
+		exit_code = pclose(fd);
+		if (exit_code == -1)
+		{
+			pg_log_error("%s: %m", cmd);
+			error = true;
+		}
+		if

Re: Make ON_ERROR_STOP stop on shell script failure

2022-12-03 Thread Corey Huinker
On Tue, Nov 22, 2022 at 6:16 PM Matheus Alcantara  wrote:

> --- Original Message ---
> On Tuesday, November 22nd, 2022 at 20:10, bt22nakamorit <
> bt22nakamo...@oss.nttdata.com> wrote:
>
>
> > There was a mistake in the error message for \! so I updated the patch.
> >
> > Best,
> > Tatsuhiro Nakamori
>
> Hi
>
> I was checking your patch and seems that it failed to be applied into the
> master cleanly. Could you please rebase it?
>

Yes. My apologies, I had several life events get in the way.


Re: Error-safe user functions

2022-12-03 Thread Corey Huinker
>
> I think there are just a couple of loose ends here:
>
> 1. Bikeshedding on my name choices is welcome.  I know Robert is
> dissatisfied with "ereturn", but I'm content with that so I didn't
> change it here.
>

1. details_please => include_error_data

as this hints the reader directly to the struct to be filled out

2. ereturn_* => errfeedback / error_feedback / efeedback

It is returned, but it's not taking control and the caller could ignore it.
I arrived at this after checking https://www.thesaurus.com/browse/report and
https://www.thesaurus.com/browse/hint.


> 2. Everybody has struggled with just where to put the declaration
> of the error context structure.  The most natural home for it
> probably would be elog.h, but that's out because it cannot depend
> on nodes.h, and the struct has to be a Node type to conform to
> the fmgr safety guidelines.  What I've done here is to drop it
> in nodes.h, as we've done with a couple of other hard-to-classify
> node types; but I can't say I'm satisfied with that.
>
> Other plausible answers seem to be:
>
> * Drop it in fmgr.h.  The only real problem is that historically
> we've not wanted fmgr.h to depend on nodes.h either.  But I'm not
> sure how strong the argument for that really is/was.  If we did
> do it like that we could clean up a few kluges, both in this patch
> and pre-existing (fmNodePtr at least could go away).
>
> * Invent a whole new header just for this struct.  But then we're
> back to the question of what to call it.  Maybe something along the
> lines of utils/elog_extras.h ?
>

Would moving ErrorReturnContext and the ErrorData struct to their own
util/errordata.h allow us to avoid the void * pointer for ercontext? If so,
that'd be a win because typed pointers give the reader some idea of what is
expected there, as well as aiding doxygen-like tools.

Overall this looks like a good foundation.

My own effort was getting bogged down in the number of changes I needed to
make in code paths that would never want a failover case for their
typecasts, so I'm going to refactor my work on top of this and see where I
get stuck.


Re: Error-safe user functions

2022-12-02 Thread Corey Huinker
On Fri, Dec 2, 2022 at 1:46 PM Tom Lane  wrote:

> Corey Huinker  writes:
> > I'm still working on organizing my patch, but it grew out of a desire to
> do
> > this:
> > CAST(value AS TypeName DEFAULT expr)
> > This is a thing that exists in other forms in other databases and while
> it
> > may look unrelated, it is essentially the SQL/JSON casts within a nested
> > data structure issue, just a lot simpler.
>
> Okay, maybe that's why I was thinking we had a requirement for
> failure-free casts.  Sure, you can transform it to the other thing
> by always implementing this as a cast-via-IO, but you could run into
> semantics issues that way.  (If memory serves, we already have cases
> where casting X to Y gives a different result from casting X to text
> to Y.)
>

Yes, I was setting aside the issue of direct cast functions for my v0.1


>
> > My original plan had been two new params to all _in() functions: a
> boolean
> > error_mode and a default expression Datum.
> > After consulting with Jeff Davis and Michael Paquier, the notion of
> > modifying fcinfo itself two booleans:
> >   allow_error (this call is allowed to return if there was an error with
> > INPUT) and
> >   has_error (this function has the concept of a purely-input-based error,
> > and found one)
>
> Hmm ... my main complaint about that is the lack of any way to report
> the details of the error.  I realize that a plain boolean failure flag
> might be enough for our immediate use-cases, but I don't want to expend
> the amount of effort this is going to involve and then later find we
> have a use-case where we need the error details.
>

I agree, but then we're past a boolean for allow_error, and we probably get
into a list of modes like this:

CAST_ERROR_ERROR  /* default ereport(), what we do now */
CAST_ERROR_REPORT_FULL /* report that the cast failed, everything that you
would have put in the ereport() instead put in a struct that gets returned
to caller */
CAST_ERROR_REPORT_SILENT /* report that the cast failed, but nobody cares
why so don't even form the ereport strings, good for bulk operations */
CAST_ERROR_WARNING /* report that the cast failed, but emit ereport() as a
warning */
CAST_ERROR_[NOTICE,LOG,DEBUG1,..DEBUG5] /* same, but some other loglevel */


>
> The sketch that's in my head at the moment is to make use of the existing
> "context" field of FunctionCallInfo: if that contains a node of some
> to-be-named type, then we are requesting that errors not be thrown
> but instead be reported by passing back an ErrorData using a field of
> that node.  The issue about not constructing an ErrorData if the outer
> caller doesn't need it could perhaps be addressed by adding some boolean
> flag fields in that node, but the details of that need not be known to
> the functions reporting errors this way; it'd be a side channel from the
> outer caller to elog.c.
>

That should be a good place for it, assuming it's not already used like
fn_extra is. It would also squash those cases above into just three: ERROR,
REPORT_FULL, and REPORT_SILENT, leaving it up to the node what type of
erroring/logging is appropriate.


>
> The main objection I can see to this approach is that we only support
> one context value per call, so you could not easily combine this
> functionality with existing use-cases for the context field.  A quick
> census of InitFunctionCallInfoData calls finds aggregates, window
> functions, triggers, and procedures, none of which seem like plausible
> candidates for wanting no-error behavior, so I'm not too concerned
> about that.  (Maybe the error-reporting node could be made a sub-node
> of the context node in any future cases where we do need it?)
>

A subnode had occurred to me when fiddling about with fn_extra, so that
applies here, but if we're doing a sub-node, then maybe it's worth its own
parameter. I struggled with that because of how few of these functions will
use it vs how often they're executed.


>
> > The one gap I see so far in the patch presented is that it returns a null
> > value on bad input, which might be ok if the node has the default, but
> that
> > then presents the node with having to understand whether it was a null
> > because of bad input vs a null that was expected.
>
> Yeah.  That's something we could probably get away with for the case of
> input functions only, but I think explicit out-of-band signaling that
> there was an error is a more future-proof solution.
>

I think we'll run into it fairly soon, because if I recall correctly,
SQL/JSON has a formatting spec that essentially means that we're not
calling input functions, we're calling TO_CHAR() and TO_DATE(), but they're
very similar to input functions.

One positive side effect of all this is we can get an isa(value, typname)
construct like this "for free", we just try the cast and return the value.

I'm still working on my patch even though it will probably be sidelined in
the hopes that it informs us of any subsequent issues.


Re: Error-safe user functions

2022-12-02 Thread Corey Huinker
On Fri, Dec 2, 2022 at 9:34 AM Andrew Dunstan  wrote:

>
> On 2022-12-02 Fr 09:12, Tom Lane wrote:
> > Robert Haas  writes:
> >> I think the design is evolving in your head as you think about this
> >> more, which is totally understandable and actually very good. However,
> >> this is also why I think that you should produce the patch you
> >> actually want instead of letting other people repeatedly submit
> >> patches and then complain that they weren't what you had in mind.
> > OK, Corey hasn't said anything, so I will have a look at this over
> > the weekend.
> >
> >
>
>
> Great. Let's hope we can get this settled early next week and then we
> can get to work on the next tranche of functions, those that will let
> the SQL/JSON work restart.
>
>
> cheers
>
>
> andrew
>
> --
> Andrew Dunstan
> EDB: https://www.enterprisedb.com
>
>
I'm still working on organizing my patch, but it grew out of a desire to do
this:

CAST(value AS TypeName DEFAULT expr)

This is a thing that exists in other forms in other databases and while it
may look unrelated, it is essentially the SQL/JSON casts within a nested
data structure issue, just a lot simpler.

My original plan had been two new params to all _in() functions: a boolean
error_mode and a default expression Datum.

After consulting with Jeff Davis and Michael Paquier, the notion of
modifying fcinfo itself two booleans:
  allow_error (this call is allowed to return if there was an error with
INPUT) and
  has_error (this function has the concept of a purely-input-based error,
and found one)

The nice part about this is that unaware functions can ignore these values,
and custom data types that did not check these values would continue to
work as before. It wouldn't respect the CAST default, but that's up to the
extension writer to fix, and that's a pretty acceptable failure mode.

Where this gets tricky is arrays and complex types: the default expression
applies only to the object explicitly casted, so if somebody tried CAST
('{"123","abc"}'::text[] AS integer[] DEFAULT '{0}') the inner casts need
to know that they _can_ allow input errors, but have no default to offer,
they need merely report their failure upstream...and that's where the
issues align with the SQL/JSON issue.

In pursuing this, I see that my method was simultaneously too little and
too much. Too much in the sense that it alters the structure for all fmgr
functions, though in a very minor and back-compatible way, and too little
in the sense that we could actually return the ereport info in a structure
and leave it to the node to decide whether to raise it or not. Though I
should add that there are many situations where we don't care about the
specifics of the error, we just want to know that one existed and move on,
so time spent forming that return structure would be time wasted.

The one gap I see so far in the patch presented is that it returns a null
value on bad input, which might be ok if the node has the default, but that
then presents the node with having to understand whether it was a null
because of bad input vs a null that was expected.


Re: Error-safe user functions

2022-12-02 Thread Corey Huinker
On Fri, Dec 2, 2022 at 9:12 AM Tom Lane  wrote:

> Robert Haas  writes:
> > I think the design is evolving in your head as you think about this
> > more, which is totally understandable and actually very good. However,
> > this is also why I think that you should produce the patch you
> > actually want instead of letting other people repeatedly submit
> > patches and then complain that they weren't what you had in mind.
>
> OK, Corey hasn't said anything, so I will have a look at this over
> the weekend.
>
> regards, tom lane
>

Sorry, had several life issues intervene. Glancing over what was discussed
because it seems pretty similar to what I had in mind. Will respond back in
detail shortly.


Re: psql: Add command to use extended query protocol

2022-11-21 Thread Corey Huinker
On Tue, Nov 15, 2022 at 8:29 AM Peter Eisentraut <
peter.eisentr...@enterprisedb.com> wrote:

> On 09.11.22 00:12, Corey Huinker wrote:
> > As for the docs, they're very clear and probably sufficient as-is, but I
> > wonder if we should explicitly state that the bind-state and bind
> > parameters do not "stay around" after the query is executed? Suggestions
> > in bold:
> >
> >   This command causes the extended query protocol (see <xref
> >   linkend="protocol-query-concepts"/>) to be used, unlike normal
> >   psql operation, which uses the simple
> >   query protocol. *Extended query protocol will be used* *even
> > if no parameters are specified,* so this command can be useful to test
> > the extended
> >   query protocol from psql. *This command affects only the next
> > query executed, all subsequent queries will use the regular query
> > protocol by default.*
> >
> > Tests seem comprehensive. I went looking for the TAP test that this
> > would have replaced, but found none, and it seems the only test where we
> > exercise PQsendQueryParams is libpq_pipeline.c, so these tests are a
> > welcome addition.
> >
> > Aside from the possible doc change, it looks ready to go.
>
> Committed with those doc changes.  Thanks.
>
>
I got thinking about this, and while things may be fine as-is, I would like
to hear some opinions as to whether this behavior is correct:

String literals can include spaces

[16:51:35 EST] corey=# select $1, $2 \bind 'abc def' gee \g
 ?column? | ?column?
--+--
 abc def  | gee
(1 row)


String literal includes spaces, but also includes quotes:

Time: 0.363 ms
[16:51:44 EST] corey=# select $1, $2 \bind "abc def" gee \g
 ?column?  | ?column?
---+--
 "abc def" | gee
(1 row)

Semi-colon does not terminate an EQP statement; the ';' is seen as a parameter:

[16:51:47 EST] corey=# select $1, $2 \bind "abc def" gee ;
corey-# \g
ERROR:  bind message supplies 3 parameters, but prepared statement ""
requires 2


Confirming that slash-commands must be unquoted

[16:52:23 EST] corey=# select $1, $2 \bind "abc def" '\\g' \g
 ?column?  | ?column?
---+--
 "abc def" | \g
(1 row)

[16:59:00 EST] corey=# select $1, $2 \bind "abc def" '\watch' \g
 ?column?  | ?column?
---+--
 "abc def" | watch
(1 row)

Confirming that any slash command terminates the bind list, but ';' does not

[16:59:54 EST] corey=# select $1, $2 \bind "abc def" gee \watch 5
Mon 21 Nov 2022 05:00:07 PM EST (every 5s)

 ?column?  | ?column?
---+--
 "abc def" | gee
(1 row)

Time: 0.422 ms
Mon 21 Nov 2022 05:00:12 PM EST (every 5s)

 ?column?  | ?column?
---+--
 "abc def" | gee
(1 row)

Is this all working as expected?


Re: Error-safe user functions

2022-11-20 Thread Corey Huinker
On Tue, Nov 15, 2022 at 11:36 AM Andrew Dunstan  wrote:

>
> On 2022-10-07 Fr 13:37, Tom Lane wrote:
>
>
> [ lots of detailed review ]
>
> > Basically, this patch set should be a lot smaller and not have ambitions
> > beyond "get the API right" and "make one or two datatypes support COPY
> > NULL_ON_ERROR".  Add more code once that core functionality gets reviewed
> > and committed.
> >
> >
>
>
> Nikita,
>
> just checking in, are you making progress on this? I think we really
> need to get this reviewed and committed ASAP if we are to have a chance
> to get the SQL/JSON stuff reworked to use it in time for release 16.
>
>
I'm making an attempt at this or something very similar to it. I don't yet
have a patch ready.


Re: Multitable insert syntax support on Postgres?

2022-11-20 Thread Corey Huinker
>
> WITH data_src AS (SELECT * FROM source_tbl),
>   insert_a AS (INSERT INTO a SELECT * FROM data_src WHERE d < 5),
>   insert_b AS (INSERT INTO b SELECT * FROM data_src WHERE d >= 5)
> INSERT INTO c SELECT * FROM data_src WHERE d < 5
>

I suppose you could just do a dummy SELECT at the bottom to make it look
more symmetrical

WITH data_src AS (SELECT * FROM source_tbl),
  insert_a AS (INSERT INTO a SELECT * FROM data_src WHERE d < 5),
  insert_b AS (INSERT INTO b SELECT * FROM data_src WHERE d >= 5),
  insert_c AS (INSERT INTO c SELECT * FROM data_src WHERE d < 5)
SELECT true AS inserts_complete;

Or maybe get some diagnostics out of it:

WITH data_src AS (SELECT * FROM source_tbl),
  insert_a AS (INSERT INTO a SELECT * FROM data_src WHERE d < 5 RETURNING
NULL),
  insert_b AS (INSERT INTO b SELECT * FROM data_src WHERE d >= 5 RETURNING
NULL),
  insert_c AS (INSERT INTO c SELECT * FROM data_src WHERE d < 5 RETURNING
NULL)
SELECT
  (SELECT COUNT(*) FROM insert_a) AS new_a_rows,
  (SELECT COUNT(*) FROM insert_b) AS new_b_rows,
  (SELECT COUNT(*) FROM insert_c) AS new_c_rows;


Re: Multitable insert syntax support on Postgres?

2022-11-20 Thread Corey Huinker
On Mon, Nov 14, 2022 at 7:06 PM Alexandre hadjinlian guerra <
alexhgue...@gmail.com> wrote:

> Hello
> Are there any plans to incorporate a formal syntax multitable/conditional
> insert , similar to the syntax below? snowflake does have the same feature
>
> https://oracle-base.com/articles/9i/multitable-inserts
>
> Today, im resorting to a function that receives the necessary parameters
> from the attributes definition/selection area in a sql select query, called
> by each tuple retrieved. A proper syntax show be real cool
>
> Thanks!
>

I'm not aware of any efforts to implement this at this time, mostly because
I don't think it's supported in the SQL Standard. Being in the standard
would change the question from "why" to "why not".

I've used that feature when I worked with Oracle in a data warehouse
situation. I found it most useful when migrating data dumps from mainframes
where the data file contained subrecords and in cases where one field in a
row changes the meaning of subsequent fields in the same row. That may
sound like a First Normal Form violation, and it is, but such data formats
are common in the IBM VSAM world, or at least they were in the data dumps
that I had to import.


Re: Document parameter count limit

2022-11-10 Thread Corey Huinker
>
>
> +if you are reading this prepatorily, please redesign your
> query to use temporary tables or arrays
>

I agree with the documentation of this parameter.
I agree with dissuading anyone from attempting to change it
The wording is bordering on snark (however well deserved) and I think the
voice is slightly off.

Alternate suggestion:

Queries approaching this limit usually can be refactored to use arrays or
temporary tables, thus reducing parameter overhead.


The bit about parameter overhead appeals to the reader's desire for
performance, rather than just focusing on "you shouldn't want this".


Re: refactor ownercheck and aclcheck functions

2022-11-09 Thread Corey Huinker
>
> After considering this again, I decided to brute-force this and get rid
> of all the trivial wrapper functions and also several of the special
> cases.  That way, there is less confusion at the call sites about why
> this or that style is used in a particular case.  Also, it now makes
> sure you can't accidentally use the generic functions when a particular
> one should be used.
>

+1

However, the aclcheck patch isn't applying for me now. That patch modifies
37 files, so it's hard to say just which commit conflicts.


Re: psql: Add command to use extended query protocol

2022-11-08 Thread Corey Huinker
>
>
> Btw., this also allows doing things like
>
> SELECT $1, $2
> \bind '1' '2' \g
> \bind '3' '4' \g
>

That's one of the things I was hoping for. Very cool.


>
> This isn't a prepared statement being reused, but it relies on the fact
> that psql \g with an empty query buffer resends the previous query.
> Still kind of neat.


Yeah, if they wanted a prepared statement there's nothing stopping them.

Review:

Patch applies, tests pass.

Code is quite straightforward.

As for the docs, they're very clear and probably sufficient as-is, but I
wonder if we should we explicitly state that the bind-state and bind
parameters do not "stay around" after the query is executed? Suggestions in
bold:

 This command causes the extended query protocol (see ) to be used, unlike normal
 psql operation, which uses the simple
 query protocol.  *Extended query protocol will be used even if no
parameters are specified*, so this command can be useful to test the
extended query protocol from psql. *This command affects only the next
query executed; all subsequent queries will use the regular query protocol
by default.*

Tests seem comprehensive. I went looking for the TAP test that this would
have replaced, but found none, and it seems the only test where we exercise
PQsendQueryParams is libpq_pipeline.c, so these tests are a welcome
addition.

Aside from the possible doc change, it looks ready to go.


Re: psql: Add command to use extended query protocol

2022-11-07 Thread Corey Huinker
On Mon, Nov 7, 2022 at 4:12 PM Tom Lane  wrote:

> Corey Huinker  writes:
> > I thought about basically reserving the \$[0-9]+ space as bind variables,
> > but it is possible, though unlikely, that users have been naming their
> > variables like that.
>
> Don't we already reserve that syntax as Params?  Not sure whether there
> would be any conflicts versus Params, but these are definitely not legal
> as SQL identifiers.
>
> regards, tom lane
>

I think Pavel was hinting at something like:

\set $1 foo
\set $2 123
UPDATE mytable SET value = $1 WHERE id = $2;

Which wouldn't step on anything, because I tested it, and \set $1 foo
already returns 'Invalid variable name "$1"'.

So far, there seem to be two possible variations on how to go about this:

1. Have special variables or a variable namespace that are known to be bind
variables. So long as one of them is defined, queries are sent using
extended query protocol.
2. Bind parameters one-time-use, applied strictly to the query currently in
the buffer in positional order, and once that query is run their
association with being binds is gone.

Each has its merits, I guess it comes down to how much we expect users to
want to re-use some or all the bind params of the previous query.


Re: psql: Add command to use extended query protocol

2022-11-07 Thread Corey Huinker
>
>
>
> what about introduction new syntax for psql variables that should be
> passed as bind variables.
>

I thought about basically reserving the \$[0-9]+ space as bind variables,
but it is possible, though unlikely, that users have been naming their
variables like that.

It's unclear from your example if that's what you meant, or if you wanted
actual named variables ($name, $timestamp_before, $x).

Actual named variables might cause problems with CREATE FUNCTION AS ...
$body$ ... $body$, as well as the need to deduplicate them.

So while it is less seamless, I do like the \bind x y z \g idea because it
requires no changes in variable interpolation, and the list can be
terminated with a slash command or ;

To your point about forcing extended query protocol even when no parameters
are, that would be SELECT 1 \bind \g

It hasn't been discussed, but the question of how to handle output
parameters seems fairly straightforward: the value of the bind variable is
the name of the psql variable to be set a la \gset.


Re: pg_dump: Refactor code that constructs ALTER ... OWNER TO commands

2022-11-05 Thread Corey Huinker
On Wed, Nov 2, 2022 at 5:30 PM Peter Eisentraut <
peter.eisentr...@enterprisedb.com> wrote:

> On 01.11.22 13:59, Corey Huinker wrote:
> > On Mon, Oct 24, 2022 at 5:54 AM Peter Eisentraut
> >  > <mailto:peter.eisentr...@enterprisedb.com>> wrote:
> >
> > Avoid having to list all the possible object types twice.  Instead,
> > only
> > _getObjectDescription() needs to know about specific object types.
> It
> > communicates back to _printTocEntry() whether an owner is to be set.
> >
> > In passing, remove the logic to use ALTER TABLE to set the owner of
> > views and sequences.  This is no longer necessary.  Furthermore, if
> > pg_dump doesn't recognize the object type, this is now a fatal error,
> > not a warning.
> >
> >
> > Makes sense, passes all tests.
>
> Committed.
>
> > It's clearly out of scope for this very focused patch, but would it make
> > sense for the TocEntry struct to be populated with a type enumeration
> > integer as well as the type string to make for clearer and faster
> > sifting later?
>
> That could be better, but wouldn't that mean a change of the format of
> pg_dump archives?
>

Sorry for the confusion, I was thinking strictly of the in memory
representation after it is extracted from the dictionary.


Re: psql: Add command to use extended query protocol

2022-11-05 Thread Corey Huinker
On Fri, Nov 4, 2022 at 11:45 AM Peter Eisentraut <
peter.eisentr...@enterprisedb.com> wrote:

> On 02.11.22 01:18, Corey Huinker wrote:
> >
> >   SELECT $1, $2 \gp 'foo' 'bar'
> >
> >
> > I think this is a great idea, but I foresee people wanting to send that
> > output to a file or a pipe like \g allows. If we assume everything after
> > the \gp is a param, don't we paint ourselves into a corner?
>
> Any thoughts on how that syntax could be generalized?
>

A few:

The most compact idea I can think of is to have \bind and \endbind (or more
terse equivalents \bp and \ebp)

SELECT * FROM foo WHERE type_id = $1 AND cost > $2 \bind 'param1' 'param2'
\endbind $2 \g filename.csv

Maybe the end-bind param isn't needed at all, we just insist that bind
params be single quoted strings or numbers, so the next slash command ends
the bind list.

If that proves difficult, we might save bind params like registers

something like this, positional:

\bind 1 'param1'
\bind 2 'param2'
SELECT * FROM foo WHERE type_id = $1 AND cost > $2 \g filename.csv
\unbind

or all the binds on one line

\bindmany 'param1' 'param2'
SELECT * FROM foo WHERE type_id = $1 AND cost > $2 \g filename.csv
\unbind

Then psql would merely have to check if it had any bound registers, and if
so, the next query executed is extended query protocol, and \unbind wipes
out the binds to send us back to regular mode.
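To make the "next slash command ends the bind list" rule concrete, here is a
rough Python sketch of the tokenizing idea (hypothetical; psql's real lexer
in psqlscanslash.l would do this in flex, and real quote handling is more
involved):

```python
def parse_bind(args):
    """Split the text after \\bind into bind parameters and the trailing
    command, stopping at the first backslash command or bare semicolon."""
    params = []
    tokens = args.split()
    for i, tok in enumerate(tokens):
        if tok.startswith("\\") or tok == ";":
            # everything from here on belongs to the next slash command
            return params, " ".join(tokens[i:])
        params.append(tok.strip("'"))  # accept quoted strings or numbers
    return params, ""

params, rest = parse_bind("'param1' 'param2' \\g filename.csv")
print(params)  # ['param1', 'param2']
print(rest)    # \g filename.csv
```

With this rule, `\g filename.csv` (and any other output redirection) stays
available after the parameter list, avoiding the painted-in corner.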


Re: Add SHELL_EXIT_CODE to psql

2022-11-04 Thread Corey Huinker
Oops, that sample output was from a previous run, should have been:

-- SHELL_EXIT_CODE is undefined
\echo :SHELL_EXIT_CODE
:SHELL_EXIT_CODE
-- bad \!
\! borp
sh: line 1: borp: command not found
\echo :SHELL_EXIT_CODE
127
-- bad backtick
\set var `borp`
sh: line 1: borp: command not found
\echo :SHELL_EXIT_CODE
127
-- good \!
\! true
\echo :SHELL_EXIT_CODE
0
-- play with exit codes
\! exit 4
\echo :SHELL_EXIT_CODE
4
\set var `exit 3`
\echo :SHELL_EXIT_CODE
3
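The difference between this run and the earlier one (127 vs. 32512, 4 vs.
1024) is the wait-status encoding: the raw value returned by system() and
pclose() carries the child's exit code in the high byte, and the corrected
output above applies WEXITSTATUS first. A small Python sketch of the POSIX
decoding (the raw values are the ones from the earlier run):

```python
import os

raw_not_found = 32512  # shell couldn't find the command: 127 << 8
raw_exit_4 = 1024      # child called exit(4): 4 << 8

for raw in (raw_not_found, raw_exit_4):
    assert os.WIFEXITED(raw)       # normal termination, not a signal
    print(os.WEXITSTATUS(raw))     # the value psql should report
```

This mirrors the WIFEXITED/WEXITSTATUS logic the patch adds to
evaluate_backtick() and do_shell().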


On Fri, Nov 4, 2022 at 5:08 AM Corey Huinker 
wrote:

>
> Over in
> https://www.postgresql.org/message-id/eaf326ad693e74eba068f33a7f518...@oss.nttdata.com
>  Justin
> Pryzby suggested that psql might need the ability to capture the shell exit
> code.
>
> This is a POC patch that does that, but doesn't touch on the ON_ERROR_STOP
> stuff.
>
> I've added some very rudimentary tests, but haven't touched the
> documentation, because I strongly suspect that someone will suggest a
> better name for the variable.
>
> But basically, it works like this
>
> -- SHELL_EXIT_CODE is undefined
> \echo :SHELL_EXIT_CODE
> :SHELL_EXIT_CODE
> -- bad \!
> \! borp
> sh: line 1: borp: command not found
> \echo :SHELL_EXIT_CODE
> 32512
> -- bad backtick
> \set var `borp`
> sh: line 1: borp: command not found
> \echo :SHELL_EXIT_CODE
> 127
> -- good \!
> \! true
> \echo :SHELL_EXIT_CODE
> 0
> -- play with exit codes
> \! exit 4
> \echo :SHELL_EXIT_CODE
> 1024
> \set var `exit 3`
> \echo :SHELL_EXIT_CODE
> 3
>
>
> Feedback welcome.
>


Re: Make ON_ERROR_STOP stop on shell script failure

2022-11-04 Thread Corey Huinker
>
> I think it'd be a lot better to expose the script status to psql.
> (without having to write "foo; echo status=$?").
>

I agree, and I hacked up a proof of concept, but started another thread at
https://www.postgresql.org/message-id/CADkLM=cWao2x2f+UDw15W1JkVFr_bsxfstw=ngea7r9m4j-...@mail.gmail.com
so as not to clutter up this one.


Add SHELL_EXIT_CODE to psql

2022-11-04 Thread Corey Huinker
Over in
https://www.postgresql.org/message-id/eaf326ad693e74eba068f33a7f518...@oss.nttdata.com
Justin
Pryzby suggested that psql might need the ability to capture the shell exit
code.

This is a POC patch that does that, but doesn't touch on the ON_ERROR_STOP
stuff.

I've added some very rudimentary tests, but haven't touched the
documentation, because I strongly suspect that someone will suggest a
better name for the variable.

But basically, it works like this

-- SHELL_EXIT_CODE is undefined
\echo :SHELL_EXIT_CODE
:SHELL_EXIT_CODE
-- bad \!
\! borp
sh: line 1: borp: command not found
\echo :SHELL_EXIT_CODE
32512
-- bad backtick
\set var `borp`
sh: line 1: borp: command not found
\echo :SHELL_EXIT_CODE
127
-- good \!
\! true
\echo :SHELL_EXIT_CODE
0
-- play with exit codes
\! exit 4
\echo :SHELL_EXIT_CODE
1024
\set var `exit 3`
\echo :SHELL_EXIT_CODE
3


Feedback welcome.
From 119837575f4d0da804d92ec797bbf11e8075e595 Mon Sep 17 00:00:00 2001
From: coreyhuinker 
Date: Fri, 4 Nov 2022 04:45:39 -0400
Subject: [PATCH] POC: expose shell exit code as a psql variable

---
 src/bin/psql/command.c |  4 
 src/bin/psql/help.c|  2 ++
 src/bin/psql/psqlscanslash.l   | 28 +---
 src/test/regress/expected/psql.out | 11 +++
 src/test/regress/sql/psql.sql  |  7 +++
 5 files changed, 49 insertions(+), 3 deletions(-)

diff --git a/src/bin/psql/command.c b/src/bin/psql/command.c
index ab613dd49e..4666a63051 100644
--- a/src/bin/psql/command.c
+++ b/src/bin/psql/command.c
@@ -4957,6 +4957,7 @@ static bool
 do_shell(const char *command)
 {
 	int			result;
+	char		exit_code_buf[32];
 
 	fflush(NULL);
 	if (!command)
@@ -4984,6 +4985,9 @@ do_shell(const char *command)
 	else
 		result = system(command);
 
+	snprintf(exit_code_buf, sizeof(exit_code_buf), "%d", WEXITSTATUS(result));
+	SetVariable(pset.vars, "SHELL_EXIT_CODE", exit_code_buf);
+
 	if (result == 127 || result == -1)
 	{
 		pg_log_error("\\!: failed");
diff --git a/src/bin/psql/help.c b/src/bin/psql/help.c
index f8ce1a0706..04f996332e 100644
--- a/src/bin/psql/help.c
+++ b/src/bin/psql/help.c
@@ -454,6 +454,8 @@ helpVariables(unsigned short int pager)
 		  "show all results of a combined query (\\;) instead of only the last\n");
 	HELP0("  SHOW_CONTEXT\n"
 		  "controls display of message context fields [never, errors, always]\n");
+	HELP0("  SHELL_EXIT_CODE\n"
+		  "Exit code of the last shell command\n");
 	HELP0("  SINGLELINE\n"
 		  "if set, end of line terminates SQL commands (same as -S option)\n");
 	HELP0("  SINGLESTEP\n"
diff --git a/src/bin/psql/psqlscanslash.l b/src/bin/psql/psqlscanslash.l
index a467b72144..30e6f5dcd4 100644
--- a/src/bin/psql/psqlscanslash.l
+++ b/src/bin/psql/psqlscanslash.l
@@ -27,6 +27,8 @@
 
 %{
 #include "fe_utils/psqlscan_int.h"
+#include "settings.h"
+#include "variables.h"
 
 /*
  * We must have a typedef YYSTYPE for yylex's first argument, but this lexer
@@ -774,6 +776,8 @@ evaluate_backtick(PsqlScanState state)
 	bool		error = false;
 	char		buf[512];
 	size_t		result;
+	int			exit_code = 0;
+	char		exit_code_buf[32];
 
 	initPQExpBuffer(&cmd_output);
 
@@ -783,6 +787,7 @@ evaluate_backtick(PsqlScanState state)
 	{
 		pg_log_error("%s: %m", cmd);
 		error = true;
+		exit_code = -1;
 	}
 
 	if (!error)
@@ -800,10 +805,25 @@ evaluate_backtick(PsqlScanState state)
 		} while (!feof(fd));
 	}
 
-	if (fd && pclose(fd) == -1)
+	if (fd)
 	{
-		pg_log_error("%s: %m", cmd);
-		error = true;
+		exit_code = pclose(fd);
+		if (exit_code == -1)
+		{
+			pg_log_error("%s: %m", cmd);
+			error = true;
+		}
+		if (WIFEXITED(exit_code))
+		{
+			exit_code=WEXITSTATUS(exit_code);
+		}
+		else if(WIFSIGNALED(exit_code)) {
+			exit_code=WTERMSIG(exit_code);
+		}
+		else if(WIFSTOPPED(exit_code)) {
+			//If you need to act upon the process stopping, do it here.
+			exit_code=WSTOPSIG(exit_code);
+		}
 	}
 
 	if (PQExpBufferDataBroken(cmd_output))
@@ -826,5 +846,7 @@ evaluate_backtick(PsqlScanState state)
 		appendBinaryPQExpBuffer(output_buf, cmd_output.data, cmd_output.len);
 	}
 
+	snprintf(exit_code_buf, sizeof(exit_code_buf), "%d", exit_code);
+	SetVariable(pset.vars, "SHELL_EXIT_CODE", exit_code_buf);
 	termPQExpBuffer(&cmd_output);
 }
diff --git a/src/test/regress/expected/psql.out b/src/test/regress/expected/psql.out
index a7f5700edc..33202ffda7 100644
--- a/src/test/regress/expected/psql.out
+++ b/src/test/regress/expected/psql.out
@@ -1275,6 +1275,17 @@ execute q;
 ++-+
 
 deallocate q;
+-- test SHELL_EXIT_CODE
+\! nosuchcommand
+sh: line 1: nosuchcommand: command not found
+\echo :SHELL_EXIT_CODE
+127
+\set nosuchvar `nosuchcommand`
+sh: line 1: nosuchcommand: command not found
+\! nosuchcommand
+sh: line 1: nosuchcommand: command not found
+\echo :SHELL_EXIT_CODE
+127
 -- test single-line header and data
 prepare q as select repeat('x',2*n) as "0123456789abcdef", repeat('y',20-2*n) as "0123456789" from generate_series(1,10) as n;
 \pset 

Re: psql: Add command to use extended query protocol

2022-11-01 Thread Corey Huinker
>
>
>  SELECT $1, $2 \gp 'foo' 'bar'
>
>
I think this is a great idea, but I foresee people wanting to send that
output to a file or a pipe like \g allows. If we assume everything after
the \gp is a param, don't we paint ourselves into a corner?


Re: pg_dump: Refactor code that constructs ALTER ... OWNER TO commands

2022-11-01 Thread Corey Huinker
On Mon, Oct 24, 2022 at 5:54 AM Peter Eisentraut <
peter.eisentr...@enterprisedb.com> wrote:

> Avoid having to list all the possible object types twice.  Instead, only
> _getObjectDescription() needs to know about specific object types.  It
> communicates back to _printTocEntry() whether an owner is to be set.
>
> In passing, remove the logic to use ALTER TABLE to set the owner of
> views and sequences.  This is no longer necessary.  Furthermore, if
> pg_dump doesn't recognize the object type, this is now a fatal error,
> not a warning.


Makes sense, passes all tests.

It's clearly out of scope for this very focused patch, but would it make
sense for the TocEntry struct to be populated with a type enumeration
integer as well as the type string to make for clearer and faster sifting
later?


Re: XMAX_LOCK_ONLY and XMAX_COMMITTED (fk/multixact code)

2022-10-20 Thread Corey Huinker
On Tue, Sep 20, 2022 at 2:32 PM Nathan Bossart 
wrote:

> Here is a rebased patch for cfbot.
>
>
>
Applies, passes make check world.

Patch is straightforward, but the previous code is less so. It purported to
set XMAX_COMMITTED _or_ XMAX_INVALID, but never seemed to un-set
XMAX_COMMITTED; was that the source of the double-setting?


Re: refactor ownercheck and aclcheck functions

2022-10-19 Thread Corey Huinker
On Fri, Oct 14, 2022 at 3:39 AM Peter Eisentraut <
peter.eisentr...@enterprisedb.com> wrote:

> These patches take the dozens of mostly-duplicate pg_foo_ownercheck()
> and pg_foo_aclcheck() functions and replace (most of) them by common
> functions that are driven by the ObjectProperty table.  All the required
> information is already in that table.
>
> This is similar to the consolidation of the drop-by-OID functions that
> we did a while ago (b1d32d3e3230f00b5baba08f75b4f665c7d6dac6).


Nice reduction in footprint!

I'd be inclined to remove the highly used ones as well. That way the
codebase would have more examples of object_ownercheck() for readers to
see. Seeing the existence of pg_FOO_ownercheck implies that a
pg_BAR_ownercheck might exist, and if BAR is missing they might be inclined
to re-add it.

If we do keep them, would it make sense to go the extra step and turn the
remaining six "regular" into static inline functions or even #define-s?


Re: ts_locale.c: why no t_isalnum() test?

2022-10-19 Thread Corey Huinker
On Wed, Oct 5, 2022 at 3:53 PM Tom Lane  wrote:

> I happened to wonder why various places are testing things like
>
> #define ISWORDCHR(c)(t_isalpha(c) || t_isdigit(c))
>
> rather than using an isalnum-equivalent test.  The direct answer
> is that ts_locale.c/.h provides no such test function, which
> apparently is because there's not a lot of potential callers in
> the core code.  However, both pg_trgm and ltree could benefit
> from adding one.
>
> There's no semantic hazard here: the documentation I consulted
> is all pretty explicit that is[w]alnum is true exactly when
> either is[w]alpha or is[w]digit are.  For example, POSIX saith
>
> The iswalpha() and iswalpha_l() functions shall test whether wc is a
> wide-character code representing a character of class alpha in the
> current locale, or in the locale represented by locale, respectively;
> see XBD Locale.
>
> The iswdigit() and iswdigit_l() functions shall test whether wc is a
> wide-character code representing a character of class digit in the
> current locale, or in the locale represented by locale, respectively;
> see XBD Locale.
>
> The iswalnum() and iswalnum_l() functions shall test whether wc is a
> wide-character code representing a character of class alpha or digit
> in the current locale, or in the locale represented by locale,
> respectively; see XBD Locale.
>
> While I didn't try to actually measure it, these functions don't
> look remarkably cheap.  Doing char2wchar() twice when we only need
> to do it once seems silly, and the libc functions themselves are
> probably none too cheap for multibyte characters either.
>
> Hence, I propose the attached.  I got rid of some places that were
> unnecessarily checking pg_mblen before applying t_iseq(), too.
>
> regards, tom lane
>
>
I see this is already committed, but I'm curious, why do t_isalpha and
t_isdigit have the pair of /* TODO */ comments? This unfinished business
isn't explained anywhere in the file.


Re: Getting rid of SQLValueFunction

2022-10-18 Thread Corey Huinker
On Fri, Sep 30, 2022 at 2:04 AM Michael Paquier  wrote:

> Hi all,
>
> I have bumped a few days ago on the fact that COERCE_SQL_SYNTAX
> (introduced by 40c24bf) and SQLValueFunction are around to do the
> exact same thing, as known as enforcing single-function calls with
> dedicated SQL keywords.  For example, keywords like SESSION_USER,
> CURRENT_DATE, etc. go through SQLValueFunction and rely on the parser
> to set a state that gets then used in execExprInterp.c.  And it is
> rather easy to implement incorrect SQLValueFunctions, as these rely on
> more hardcoded assumptions in the parser and executor than the
> equivalent FuncCalls (like collation to assign when using a text-like
> SQLValueFunctions).
>
> There are two categories of single-value functions:
> - The ones returning names, where we enforce a C collation in two
> places of the code (current_role, role, current_catalog,
> current_schema, current_database, current_user), even if
> get_typcollation() should do that for name types.
> - The ones working on time, date and timestamps (localtime[stamp],
> current_date, current_time[stamp]), for 9 patterns as these accept an
> optional typmod.
>
> I have dug into the possibility to unify all that with a single
> interface, and finish with the attached patch set which is a reduction
> of code, where all the SQLValueFunctions are replaced by a set of
> FuncCalls:
>  25 files changed, 338 insertions(+), 477 deletions(-)
>
> 0001 is the move done for the name-related functions, cleaning up two
> places in the executor when a C collation is assigned to those
> function expressions.  0002 is the remaining cleanup for the
> time-related ones, moving a set of parser-side checks to the execution
> path within each function, so as all this knowledge is now local to
> each file holding the date and timestamp types.  Most of the gain is
> in 0002, obviously.
>
> The pg_proc entries introduced for the sake of the move use the same
> name as the SQL keywords.  These should perhaps be prefixed with a
> "pg_" at least.  There would be an exception with pg_localtime[stamp],
> though, where we could use a pg_localtime[stamp]_sql for the function
> name for prosrc.  I am open to suggestions for these names.
>
> Thoughts?
> --
> Michael
>

I like this a lot. Deleted code is debugged code.

Patch applies and passes make check-world.

No trace of SQLValueFunction is left in the codebase, at least according to
`git grep -l`.

I have only one non-nitpick question about the code:

+ /*
+ * we're not too tense about good error message here because grammar
+ * shouldn't allow wrong number of modifiers for TIME
+ */
+ if (n != 1)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("invalid type modifier")));


I agree that we shouldn't spend too much effort on a good error message
here, but perhaps we should have the message mention that it is
date/time-related? A person git-grepping for this error message will get 4
hits in .c files (date.c, timestamp.c, varbit.c, varchar.c) so even a
slight variation in the message could save them some time.

This is an extreme nitpick, but the patchset seems like it should have been
1 file or 3 (remove name functions, remove time functions, remove
SQLValueFunction infrastructure), but that will only matter in the unlikely
case that we find a need for SQLValueFunction but we want to leave the
timestamp function as COERCE_SQL_SYNTAX.


Re: predefined role(s) for VACUUM and ANALYZE

2022-10-14 Thread Corey Huinker
>
> Sounds good.  Here's a new patch set with aclitem's typalign fixed.
>

Patch applies.
Passes make check and make check-world.
Test coverage seems adequate.

Coding is very clear and very much in the style of the existing code. Any
quibbles I have with the coding style are ones I have with the overall
pg-style, and this isn't the forum for that.

I haven't done any benchmarking yet, but it seems that the main question
will be the impact on ordinary DML statements.

I have no opinion about the design debate earlier in this thread, but I do
think that this patch is ready and adds something concrete to the ongoing
discussion.


WIP: Analyze whether our docs need more granular refentries.

2022-10-13 Thread Corey Huinker
In reviewing another patch, I noticed that the documentation had an xref to
a fairly large page of documentation (create_table.sgml), and I wondered if
that link was chosen because the original author genuinely felt the entire
page was relevant, or merely because a more granular link did not exist at
the time, and this link had been carried forward since then while the
referenced page grew in complexity.

In the interest of narrowing the problem down to a manageable size, I wrote
a script (attached) to find all xrefs and rank them by criteria[1] that I
believe hints at the possibility that the xrefs should be more granular
than they are.

I intend to use the script output below as a guide for manually reviewing
the references and seeing if there are opportunities to guide the reader to
the relevant section of those pages.

In case anyone is curious, here is a top excerpt of the script output:

file_name                      link_name                     link_count  line_count  num_refentries
-----------------------------  ----------------------------  ----------  ----------  --------------
ref/psql-ref.sgml              app-psql                              20        5215               1
ecpg.sgml                      ecpg-sql-allocate-descriptor           4       10101              17
ref/create_table.sgml          sql-createtable                       23        2437               1
ref/select.sgml                sql-select                            23        2207               1
ref/create_function.sgml       sql-createfunction                    30         935               1
ref/alter_table.sgml           sql-altertable                        12        1776               1
ref/pg_dump.sgml               app-pgdump                            11        1545               1
ref/pg_basebackup.sgml         app-pgbasebackup                      11        1008               1
ref/create_type.sgml           sql-createtype                        10        1029               1
ref/create_index.sgml          sql-createindex                        9         999               1
ref/postgres-ref.sgml          app-postgres                          10         845               1
ref/copy.sgml                  sql-copy                               7        1081               1
ref/create_role.sgml           sql-createrole                        13         511               1
ref/grant.sgml                 sql-grant                             13         507               1
ref/create_foreign_table.sgml  sql-createforeigntable                14         455               1
ref/insert.sgml                sql-insert                             8         792               1
ref/pg_ctl-ref.sgml            app-pg-ctl                             8         713               1
ref/create_trigger.sgml        sql-createtrigger                      7         777               1
ref/set.sgml                   sql-set                               15         332               1
ref/create_aggregate.sgml      sql-createaggregate                    6         805               1
ref/initdb.sgml                app-initdb                             8         588               1
ref/create_policy.sgml         sql-createpolicy                       7         655               1
dblink.sgml                    contrib-dblink-connect                 1        2136              19
ref/create_subscription.sgml   sql-createsubscription                 9         472               1

Some of these will clearly be false positives. For instance, dblink.sgml
and ecpg.sgml have a lot of refentries, but they seem to lack a global
"top" refentry which I assumed would be there.

On the other hand, I have to wonder if the references to psql might be to a
specific feature of the tool, and perhaps we can create refentries to those.

[1] The criteria is: must be first refentry in file, file must be at least
200 lines long, then rank by lines*references, 2x for referencing the top
refentry when others exist
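The ranking in [1] can be sketched as follows (a Python simplification of
the attached shell script's logic; the function name and weights are just
restatements of the criteria above):

```python
def xref_score(line_count, ref_count, refs_top_entry, num_refentries):
    """Rank a referenced page: long, frequently-referenced pages whose
    xrefs all point at the top refentry are the best split candidates."""
    if line_count < 200:
        return 0  # short pages don't need more granular anchors
    score = line_count * ref_count
    if refs_top_entry and num_refentries > 1:
        score *= 2  # a more granular target existed but wasn't used
    return score

print(xref_score(5215, 20, True, 1))   # ref/psql-ref.sgml: 104300
print(xref_score(10101, 4, True, 17))  # ecpg.sgml: 80808
```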


xref-analysis.sh
Description: application/shellscript


Re: future of serial and identity columns

2022-10-12 Thread Corey Huinker
>
> The feedback was pretty positive, so I dug through all the tests to at
> least get to the point where I could see the end of it.  The attached
> patch 0001 is the actual code and documentation changes.  The 0002 patch
> is just tests randomly updated or disabled to make the whole suite pass.
>   This reveals that there are a few things that would warrant further
> investigation, in particular around extensions and partitioning.  To be
> continued.
>

I like what I see so far!

Question: the xref  refers the reader to sql-createtable, which is a pretty
big page, which could leave the reader lost. Would it make sense to create
a SQL-CREATETABLE-IDENTITY anchor and link to that instead?


Re: [patch] \g with multiple result sets and \watch with copy queries

2022-10-10 Thread Corey Huinker
>
> This is a bit more complicated than the usual tests, but not
> that much.
> Any opinions on this?


+1

I think that because it is more complicated than typical psql tests, we may
want to
comment on the intention of the tests and some of the less-than-common psql
elements (\set concatenation, resetting \o, etc). If you see value in that
I can amend the patch.

Are there any options on COPY (header, formats) that we think we should
test as well?


Re: Error-safe user functions

2022-10-10 Thread Corey Huinker
>
>
> The idea is simple -- introduce new "error-safe" calling mode of user
> functions by passing special node through FunctCallInfo.context, in
> which function should write error info and return instead of throwing
> it.  Also such functions should manually free resources before
> returning an error.  This gives ability to avoid PG_TRY/PG_CATCH and
> subtransactions.
>
> I tried something similar when trying to implement TRY_CAST (
https://learn.microsoft.com/en-us/sql/t-sql/functions/try-cast-transact-sql?view=sql-server-ver16)
late last year. I also considered having a default datum rather than just
returning NULL.

I had not considered a new node type. I had considered having every
function have a "safe" version, which would be a big duplication of logic
requiring a lot of regression tests and possibly fuzzing tests.

Instead, I extended every core input function to have an extra boolean
parameter to indicate if failures were allowed, and then an extra Datum
parameter for the default value. The Input function wouldn't need to check
the value of the new parameters until it was already in a situation where
it found invalid data, but the extra overhead still remained, and it meant
that basically every third party type extension would need to be changed.

Then I considered whether the cast failure should be completely silent, or
if the previous error message should instead be emitted as a LOG/INFO/WARN,
and if we'd want that to be configurable, so then the boolean parameter
became an integer enum:

* regular fail (0)
* use default silently (1)
* use default emit LOG/NOTICE/WARNING (2,3,4)

At the time, all of this seemed like too big of a change for a function
that isn't even in the SQL Standard, but maybe SQL/JSON changes that.

If so, it would allow for a can-cast-to test that users would find very
useful. Something like:

SELECT CASE WHEN 'abc' CAN BE integer THEN 'Integer' ELSE 'Nope' END

There's obviously no standard syntax to support that, but the data
cleansing possibilities would be great.
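The can-cast-to test described above behaves like a try-cast that reports
success rather than a value; a minimal Python analogue (the SQL syntax in
the example is hypothetical, and so is this helper):

```python
def can_cast(value, caster):
    """Return True if value survives the cast, instead of raising."""
    try:
        caster(value)
        return True
    except (ValueError, TypeError):
        return False

# mirrors: SELECT CASE WHEN 'abc' CAN BE integer THEN 'Integer' ELSE 'Nope' END
print('Integer' if can_cast('abc', int) else 'Nope')  # Nope
print('Integer' if can_cast('123', int) else 'Nope')  # Integer
```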


Re: Query generates infinite loop

2022-05-10 Thread Corey Huinker
>
> Less sure about that.  ISTM the reason that the previous proposal failed
> was that it introduced too much ambiguity about how to resolve
> unknown-type arguments.  Wouldn't the same problems arise here?
>

If I recall, the problem was that the lack of a date-specific
generate_series function would result in a date value being coerced to
timestamp, and thus adding generate_series(date, date, step) would change
behavior of existing code, and that was a POLA violation (among other bad
things).

By adding a different function, there is no prior behavior to worry about.
So we should be safe with the following signatures doing the right thing,
yes?:
generate_finite_series(start timestamp, step interval, num_elements
integer)
generate_finite_series(start date, step integer, num_elements integer)
generate_finite_series(start date, step interval year to month,
num_elements integer)


Re: Query generates infinite loop

2022-05-09 Thread Corey Huinker
On Mon, May 9, 2022 at 12:02 AM Tom Lane  wrote:

> Corey Huinker  writes:
> > On Wed, May 4, 2022 at 3:01 PM Jeff Janes  wrote:
> >> On Wed, Apr 20, 2022 at 5:43 PM Tom Lane  wrote:
> >>> Oh --- looks like numeric generate_series() already throws error for
> >>> this, so we should just make the timestamp variants do the same.
>
> > This came up once before
> >
> https://www.postgresql.org/message-id/CAB7nPqQUuUh_W3s55eSiMnt901Ud3meF7f_96yPkKcqfd1ZaMg%40mail.gmail.com
>
> Oh!  I'd totally forgotten that thread, but given that discussion,
> and particularly the counterexample at
>
> https://www.postgresql.org/message-id/16807.1456091547%40sss.pgh.pa.us
>
> it now feels to me like maybe this change was a mistake.  Perhaps
> instead of the committed change, we ought to go the other way and
> rip out the infinity checks in numeric generate_series().
>

The infinite-upper-bound-with-limit-pushdown counterexample makes sense, but
seems like we're using generate_series() only because we lack a function
that generates a series of N elements, without a specified upper bound,
something like

 generate_finite_series( start, step, num_elements )

And if we did that, I'd lobby that we have one that takes dates as well as
one that takes timestamps, because that was my reason for starting the
thread above.


Re: Query generates infinite loop

2022-05-08 Thread Corey Huinker
On Wed, May 4, 2022 at 3:01 PM Jeff Janes  wrote:

> On Wed, Apr 20, 2022 at 5:43 PM Tom Lane  wrote:
>
>> I wrote:
>> > it's true that infinities as generate_series endpoints are going
>> > to work pretty oddly, so I agree with the idea of forbidding 'em.
>>
>> > Numeric has infinity as of late, so the numeric variant would
>> > need to do this too.
>>
>> Oh --- looks like numeric generate_series() already throws error for
>> this, so we should just make the timestamp variants do the same.
>>
>
> The regression test you added for this change causes an infinite loop when
> run against an unpatched server with --install-check.  That is a bit
> unpleasant.  Is there something we can and should do about that?  I was
> expecting regression test failures of course but not an infinite loop
> leading towards disk exhaustion.
>
> Cheers,
>
> Jeff
>

This came up once before
https://www.postgresql.org/message-id/CAB7nPqQUuUh_W3s55eSiMnt901Ud3meF7f_96yPkKcqfd1ZaMg%40mail.gmail.com


Re: Window Function "Run Conditions"

2022-03-16 Thread Corey Huinker
On Tue, Mar 15, 2022 at 5:24 PM Greg Stark  wrote:

> This looks like an awesome addition.
>
> I have one technical questions...
>
> Is it possible to actually transform the row_number case into a LIMIT
> clause or make the planner support for this case equivalent to it (in
> which case we can replace the LIMIT clause planning to transform into
> a window function)?
>
> The reason I ask is because the Limit plan node is actually quite a
> bit more optimized than the general window function plan node. It
> calculates cost estimates based on the limit and can support Top-N
> sort nodes.
>
> But the bigger question is whether this patch is ready for a committer
> to look at? Were you able to resolve Andy Fan's bug report? Did you
> resolve the two questions in the original email?
>

+1 to all this

It seems like this effort would aid in implementing what some other
databases implement via the QUALIFY clause, which is to window functions
what HAVING is to aggregate functions.
example:
https://cloud.google.com/bigquery/docs/reference/standard-sql/query-syntax#qualify_clause
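For readers unfamiliar with it, QUALIFY filters on a window function result
directly; in PostgreSQL today the same effect requires a subquery. A rough
equivalence (t, k, and ts are illustrative names):

```sql
-- BigQuery-style (not valid PostgreSQL syntax):
--   SELECT * FROM t
--   QUALIFY row_number() OVER (PARTITION BY k ORDER BY ts DESC) = 1;
-- PostgreSQL equivalent:
SELECT *
FROM (SELECT t.*,
             row_number() OVER (PARTITION BY k ORDER BY ts DESC) AS rn
      FROM t) s
WHERE rn = 1;
```

The run-condition optimization discussed here is what could let such a
filter stop consuming rows early instead of evaluating the window over the
whole partition.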


Re: WIP: System Versioned Temporal Table

2022-02-20 Thread Corey Huinker
>
>
> The spec does not allow schema changes at all on a a system versioned
> table, except to change the system versioning itself.
>
>
That would greatly simplify things!


Re: WIP: System Versioned Temporal Table

2022-01-23 Thread Corey Huinker
>
>
> > 2. Putting data in a side table. This makes DROP SYSTEM VERSIONING
> > fairly trivial, but it complicates many DDL commands (please make a
> > list?) and requires the optimizer to know about this and cater to it,
> > possibly complicating plans. Neither issue is insurmountable, but it
> > becomes more intrusive.
>
> I'd vouch for this being the way to go; you completely sidestep issues
> like partitioning, unique constraints, optimization, etc. Especially
> true when 90% of the time, SELECTs will only be looking at
> currently-active data. MDB seems to have gone with the single-table
> approach (unless you partition) and I've run into a bug where I can't
> add a unique constraint because historical data fails.
>
>  System versioning & Application versioning
> I saw that there is an intent to harmonize system versioning with
> application versioning. Haven't read the AV thread so not positive if
> that meant intending to split tables by application versioning and
> system versioning both: to me it seems like maybe it would be good to
> use a separate table for SV, but keep AV in the same table. Reasons
> include:
>

The proposed AV uses just one table.


> - ISO states only one AV config per table, but there's no reason this
> always has to be the case; maybe you're storing products that are
> active for a period of time, EOL for a period of time, and obsolete
> for a period of time. If ISO sometime decides >1 AV config is OK,
> there would be a mess trying to split that into tables.
>

The proposed AV (so far) allows for that.


> - DB users who are allowed to change AV items likely won't be allowed
> to rewrite history by changing SV items. My proposed schema would keep
> these separate.
> - Table schemas change, and all (SV active) AV items would logically
> need to fit the active schema or be updated to do so. Different story
> for SV, nothing there should ever need to be changed.
>

Yeah, there's a mess (which you state below) about what happens if you
create a table and then rename a column, or drop a column and later add
back a same-named column of another type, etc. In theory, this means that
the valid set of columns and their types changes according to the time
range specified. I may not be remembering correctly, but Vik stated that
the SQL spec seemed to imply that you had to track all those things.


> - Partitioning for AV tables isn't as clear as with SV and is likely
> better to be user-defined
>

So this was something I was asking various parties about at PgConf NYC just
a few weeks ago. I am supposing that the main reason for SV is a regulatory
concern, what tolerance to regulators have for the ability to manipulate
the SV side-table? Is it possible to directly insert rows into one? If not,
then moving rows into a new partition becomes impossible, and you'd be
stuck with the partitioning strategy (if any) that you defined at SV
creation time.

The feedback I got was "well, you're already a superuser, if a regulator
had a problem with that then they would have required that the SV table's
storage be outside the server, either a foreign table, a csv foreign data
wrapper of some sort, or a trigger writing to a non-db storage (which
wouldn't even need SV)."

From that, I concluded that every single AV partition would have its own
SV table, which could in turn be partitioned. In a sense, it might be
helpful to think of the SV tables as partitions of the main table, and the
period definition would effectively be the constraint that prunes the SV
partition.


>
> Sorry for acronyms, SV=system versioning, AV=application versioning
>
> In general, I think AV should be treated literally as extra rows in
> the main DB, plus the extra PK element and shortcut functions. SV
> though, needs to have a lot more nuance.
>
>  ALTER TABLE
> On to ideas about how ALTER TABLE could work. I don't think the
> question was ever answered "Do schema changes need to be tracked?" I'm
> generally in favor of saying that it should be possible to recreate
> the table exactly as it was, schema and all, at a specific period of
> time (perhaps for a view) using a fancy combination of SELECT ... AS
> and such - but it doesn't need to be straightforward. In any case, no
> data should ever be deleted by ALTER TABLE. As someone pointed out
> earlier, speed and storage space of ALTER TABLE are likely low
> considerations for system versioned tables.
>
> - ADD COLUMN easy, add the column to both the current and historical
> table, all null in historical
> - DROP COLUMN delete the column from the current table. Historical is
> difficult, because what happens if a new column with the same name is
> added? Maybe `DROP COLUMN col1` would rename col1 to _col1_1642929683
> (epoch time) in the historical table or something like that.
> - RENAME COLUMN is a bit tricky too - from a usability standpoint, the
> historical table should be renamed as well. A quick thought is maybe
> `RENAME col1 TO new_name` would 

Re: Push down time-related SQLValue functions to foreign server

2022-01-17 Thread Corey Huinker
>
> Hmm ... not really, because for these particular functions, the
> point is exactly that we *don't* translate them to some function
> call on the remote end.  We evaluate them locally and push the
> resulting constant to the far side, thus avoiding issues like
> clock skew.
>

Ah, my pattern matching brain was so excited to see a use for routine
mapping that I didn't notice that important detail.
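A sketch of the distinction, with a hypothetical postgres_fdw foreign table
ft; the remote SQL shown is only indicative:

```sql
-- Local query:
SELECT * FROM ft WHERE ts < CURRENT_TIMESTAMP;
-- What gets shipped to the remote server is (roughly) a pre-evaluated
-- constant:
--   SELECT ... FROM tbl WHERE ts < '2022-01-17 12:00:00+00'::timestamptz
-- rather than the expression CURRENT_TIMESTAMP itself, so the remote
-- server's clock never enters into the result.
```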


Re: Push down time-related SQLValue functions to foreign server

2022-01-17 Thread Corey Huinker
> The implementation of converting now() to CURRENT_TIMESTAMP
> seems like an underdocumented kluge, too.
>

I'm very late to the party, but it seems to me that this effort is
describing a small subset of what "routine mapping" seems to be for:
defining function calls that can be pushed down to the foreign server, and
the analogous function on the foreign side. We may want to consider
implementing just enough of CREATE ROUTINE MAPPING and DROP ROUTINE MAPPING
to support these specific fixed functions.


Re: Add 64-bit XIDs into PostgreSQL 15

2022-01-07 Thread Corey Huinker
>
>  I'd be
> curious to know where we found the bits for that -- the tuple header
> isn't exactly replete with extra bit space.
>

+1 - and can we somehow shoehorn a version # into the new format so we
never have to look for spare bits again.


Re: SQL:2011 application time

2022-01-06 Thread Corey Huinker
>
>
>  But
> the standard says that dropping system versioning should automatically
> drop all historical records (2 under Part 2: Foundation, 11.30 <system
> versioning clause>). That actually makes sense though: when you
> do DML we automatically update the start/end columns, but we don't
> do DML we automatically update the start/end columns, but we don't
> save copies of the previous data (and incidentally the end column will
> always be the max value.)


This is what I was referring to when I mentioned a side-table.
deleting history would be an O(1) operation. Any other
misunderstandings are all mine.


Re: Suggestion: optionally return default value instead of error on failed cast

2022-01-06 Thread Corey Huinker
On Thu, Jan 6, 2022 at 12:18 PM Andrew Dunstan  wrote:

>
> On 1/4/22 22:17, Corey Huinker wrote:
> >
> > currently a failed cast throws an error. It would be useful to have a
> > way to get a default value instead.
> >
> >
> > I've recently encountered situations where this would have been
> > helpful. Recently I came across some client code:
> >
> > CREATE OR REPLACE FUNCTION is_valid_json(str text) RETURNS boolean
> > LANGUAGE PLPGSQL
> > AS $$
> > DECLARE
> > j json;
> > BEGIN
> > j := str::json;
> > return true;
> > EXCEPTION WHEN OTHERS THEN return false;
> > END
> > $$;
> >
> >
> > This is a double-bummer. First, the function discards the value so we
> > have to recompute it, and secondly, the exception block prevents the
> > query from being parallelized.
>
>
> This particular case is catered for in the SQL/JSON patches which
> several people are currently reviewing:
>
>
That's great to know, but it would still be parsing the json twice, once to
learn that it is legit json, and once to get the cast value.

Also, I had a similar issue with type numeric, so having generic "x is a
type_y" support would essentially do everything that a try_catch()-ish
construct would need to do, and be more generic.


Re: SQL:2011 application time

2022-01-05 Thread Corey Huinker
On Wed, Jan 5, 2022 at 11:07 AM Peter Eisentraut <
peter.eisentr...@enterprisedb.com> wrote:

> On 21.11.21 02:51, Paul A Jungwirth wrote:
> > Here are updated patches. They are rebased and clean up some of my
> > TODOs.
>
> This patch set looks very interesting.  It's also very big, so it's
> difficult to see how to get a handle on it.  I did a pass through it
> to see if there were any obvious architectural or coding style
> problems.  I also looked at some of your TODO comments to see if I had
> something to contribute there.
>
> I'm confused about how to query tables based on application time
> periods.  Online, I see examples using AS OF, but in the SQL standard
> I only see this used for system time, which we are not doing here.
> What is your understanding of that?
>

Paul has previously supplied me with this document
https://cs.ulb.ac.be/public/_media/teaching/infoh415/tempfeaturessql2011.pdf
and that formed the basis of a lot of my questions a few months earlier.

There was similar work being done for system periods, which are a bit
simpler but require a side (history) table to be created. I was picking
people's brains about some aspects of system versioning to see if I could
help bringing that into this already very large patchset, but haven't yet
felt like I had done enough research to post it.

It is my hope that we can at least get the syntax for both application and
system versioning committed, even if it's just stubbed in with
not-yet-supported errors.


Re: Suggestion: optionally return default value instead of error on failed cast

2022-01-04 Thread Corey Huinker
>
> currently a failed cast throws an error. It would be useful to have a
> way to get a default value instead.
>

I've recently encountered situations where this would have been helpful.
Recently I came across some client code:

CREATE OR REPLACE FUNCTION is_valid_json(str text) RETURNS boolean LANGUAGE
PLPGSQL
AS $$
DECLARE
j json;
BEGIN
j := str::json;
return true;
EXCEPTION WHEN OTHERS THEN return false;
END
$$;


This is a double-bummer. First, the function discards the value so we have
to recompute it, and secondly, the exception block prevents the query from
being parallelized.


>
> T-SQL has try_cast [1]
>

I'd be more in favor of this if we learn that there's no work (current or
proposed) in the SQL standard.


> Oracle has CAST(... AS .. DEFAULT ... ON CONVERSION ERROR) [2]
>

If the SQL group has suggested anything, I'd bet it looks a lot like this.


>
> The DEFAULT ... ON CONVERSION ERROR syntax seems like it could be
> implemented in PostgreSQL. Even if only DEFAULT NULL was supported (at
> first) that would already help.
>
> The short syntax could be extended for the DEFAULT NULL case, too:
>
> SELECT '...'::type -- throws error
> SELECT '...':::type -- returns NULL
>

I think I'm against adding a ::: operator, because too many people are
going to type (or omit) the third : by accident, and that would be a really
subtle bug. The CAST/TRY_CAST syntax is wordy but it makes it very clear
that you expected janky input and have specified a contingency plan.

The TypeCast node seems like it wouldn't need too much modification to
allow for this. The big lift, from what I can tell, is either creating
versions of every $foo_in() function to return NULL instead of raising an
error, and then effectively wrapping that inside a coalesce() to process
the default. Alternatively, we could add an extra boolean parameter
("nullOnFailure"? "suppressErrors"?) to the existing $foo_in() functions
that makes them return null instead of raising an error, with the default
handled in coerce_to_target_type(). Either of those would create a fair
amount of work for extensions that add types, but I think the value would
be worth it.
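For comparison, the closest thing available today is an exception block,
which is exactly the pattern with the parallelism problem noted earlier in
the thread; try_cast_int here is a hypothetical helper, not an existing
function:

```sql
CREATE OR REPLACE FUNCTION try_cast_int(str text, fallback integer)
RETURNS integer LANGUAGE plpgsql AS $$
BEGIN
    RETURN str::integer;
EXCEPTION WHEN invalid_text_representation THEN
    -- swallow the cast failure and return the caller's default
    RETURN fallback;
END
$$;

SELECT try_cast_int('123', -1);  -- 123
SELECT try_cast_int('abc', -1);  -- -1
```

A built-in DEFAULT ... ON CONVERSION ERROR would avoid both the
exception-block overhead and the parallelism restriction.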

I do remember when I proposed the "void"/"black hole"/"meh" datatype (all
values map to NULL) I ran into a fairly fundamental rule that types must
map any not-null input to a not-null output, and this could potentially
violate that, but I'm not sure.

Does anyone know if the SQL standard has anything to say on this subject?


Re: Foreign key joins revisited

2021-12-27 Thread Corey Huinker
>
>
> First, there is only one FK in permission pointing to role, and we write a
> query leaving out the key columns.
> Then, another different FK in permission pointing to role is later added,
> and our old query is suddenly in trouble.
>
>
We already have that problem with cases where two tables have a common x
column:

SELECT x FROM a, b


so this would be on-brand for the standards body. And worst case scenario
you're just back to the situation you have now.
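A minimal illustration of that existing ambiguity:

```sql
CREATE TABLE a (x int);
CREATE TABLE b (x int);
SELECT x FROM a, b;
-- ERROR:  column reference "x" is ambiguous
```

Adding a second matching FK would put a formerly-valid key-based join into
the same failure mode, which is no worse than what we already live with.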


Re: Foreign key joins revisited

2021-12-26 Thread Corey Huinker
>
>
>
> Perhaps this would be more SQL idiomatic:
>
> FROM permission p
>LEFT JOIN ON KEY role IN p AS r
>LEFT JOIN team_role AS tr ON KEY role TO r
>LEFT JOIN ON KEY team IN tr AS t
>LEFT JOIN user_role AS ur ON KEY role TO r
>LEFT JOIN ON KEY user IN ur AS u
>
>
My second guess would be:

FROM permission p
LEFT JOIN role AS r ON [FOREIGN] KEY [(p.col1 [, p.col2 ...])]


where the key spec is only required when there are multiple foreign keys in
permission pointing to role.

But my first guess would be that the standards group won't get around to it.


Re: simplifying foreign key/RI checks

2021-12-20 Thread Corey Huinker
>
>
>
> Good catch, thanks.  Patch updated.
>
>
>
Applies clean. Passes check-world.


Re: Allow DELETE to use ORDER BY and LIMIT/OFFSET

2021-12-20 Thread Corey Huinker
>
> Out of curiosity, could you please tell me the concrete situations
> where you wanted to delete one of two identical records?
>

In my case, there is a table with known duplicates, and we would like to
delete all but the one with the lowest ctid, and then add a unique index to
the table which then allows us to use INSERT ON CONFLICT in a meaningful
way.

The other need for a DELETE...LIMIT or UPDATE...LIMIT is when you're
worried about flooding a replica, so you parcel out the DML into chunks
that don't cause unacceptable lag on the replica.

Both of these can be accomplished via  DELETE FROM foo WHERE ctid IN (
SELECT ... FROM foo ... LIMIT 1000), but until recently such a construct
would result in a full table scan, and you'd take the same hit with each
subsequent DML.

I *believe* that the ctid range scan now can limit those scans, especially
if you can order the limited set by ctid, but those techniques are not
widely known at this time.
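As a concrete sketch of the dedup-then-index workflow (foo and k are
illustrative names), batched so each statement touches at most 1000 rows:

```sql
-- Repeat until zero rows are deleted: keep the lowest ctid per key k.
DELETE FROM foo
WHERE ctid IN (
    SELECT ctid
    FROM (SELECT ctid,
                 row_number() OVER (PARTITION BY k ORDER BY ctid) AS rn
          FROM foo) d
    WHERE rn > 1
    LIMIT 1000
);
-- Then the unique index, enabling INSERT ... ON CONFLICT:
CREATE UNIQUE INDEX ON foo (k);
```

Note the inner query still scans foo on each pass, which is the repeated
hit described above.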


Re: simplifying foreign key/RI checks

2021-12-19 Thread Corey Huinker
>
>
>
> I wasn't able to make much inroads into how we might be able to get
> rid of the DETACH-related partition descriptor hacks, the item (3),
> though I made some progress on items (1) and (2).
>
> For (1), the attached 0001 patch adds a new isolation suite
> fk-snapshot.spec to exercise snapshot behaviors in the cases where we
> no longer go through SPI.  It helped find some problems with the
> snapshot handling in the earlier versions of the patch, mainly with
> partitioned PK tables.  It also contains a test along the lines of the
> example you showed upthread, which shows that the partition descriptor
> hack requiring ActiveSnapshot to be set results in wrong results.
> Patch includes the buggy output for that test case and marked as such
> in a comment above the test.
>
> In updated 0002, I fixed things such that the snapshot-setting
> required by the partition descriptor hack is independent of
> snapshot-setting of the RI query such that it no longer causes the PK
> index scan to return rows that the RI query mustn't see.  That fixes
> the visibility bug illustrated in your example, and as mentioned, also
> exercised in the new test suite.
>
> I also moved find_leaf_pk_rel() into execPartition.c with a new name
> and a new set of parameters.
>
> --
> Amit Langote
> EDB: http://www.enterprisedb.com


Sorry for the delay. This patch no longer applies; it has some conflict
with d6f96ed94e73052f99a2e545ed17a8b2fdc1fb8a


Re: Getting rid of regression test input/ and output/ files

2021-12-19 Thread Corey Huinker
On Sun, Dec 19, 2021 at 7:00 PM Tom Lane  wrote:

> Corey Huinker  writes:
> > Which brings up a tangential question, is there value in having something
> > that brings in one or more env vars as psql vars directly. I'm thinking
> > something like:
>
> > \importenv pattern [prefix]
>
> Meh ... considering we've gone this long without any getenv capability
> in psql at all, that seems pretty premature.
>
> regards, tom lane
>

Fair enough.

Patches didn't apply with `git apply` but did fine with `patch -p1`, from
there it passes make check-world.


Re: Getting rid of regression test input/ and output/ files

2021-12-19 Thread Corey Huinker
On Sun, Dec 19, 2021 at 5:48 PM Tom Lane  wrote:

> Corey Huinker  writes:
> > I have a nitpick about the \getenv FOO FOO lines.
> > It's a new function to everyone, and to anyone who hasn't seen the
> > documentation it won't be immediately obvious which one is the ENV var
> and
> > which one is the local var. Lowercasing the local var would be a way to
> > reinforce which is which to the reader. It would also be consistent with
> > var naming in the rest of the script.
>
> Reasonable idea.  Another thing I was wondering about was whether
> to attach PG_ prefixes to the environment variable names, since
> those are in a more-or-less global namespace.  If we do that,
> then a different method for distinguishing the psql variables
> is to not prefix them.


+1 to that as well.

Which brings up a tangential question, is there value in having something
that brings in one or more env vars as psql vars directly. I'm thinking
something like:

\importenv pattern [prefix]


(alternate names: \getenv_multi \getenv_pattern, \getenvs, etc)

which could be used like

\importenv PG* env_


which would import PGFOO and PGBAR as env_PGFOO and env_PGBAR, awkward
names but leaving no doubt about where a previously unreferenced variable
came from.

I don't *think* we need it for this specific case, but since the subject of
env vars has come up I thought I'd throw it out there.


Re: Getting rid of regression test input/ and output/ files

2021-12-19 Thread Corey Huinker
>
>
> 0001 adds the \getenv command to psql; now with documentation
> and a simple regression test.
>

+1. Wish I had added this years ago when I had a need for it.


>
> 0002 tweaks pg_regress to export the needed values as environment
> variables, and modifies the test scripts to use those variables.
> (For ease of review, this patch modifies the scripts in-place,
> and then 0003 will move them.)  A few comments on this:
>
> * I didn't see any value in exporting @testtablespace@ as a separate
> variable; we might as well let the test script know how to construct
> that path name.
>
> * I concluded that the right way to handle the concatenation issue
> is *not* to rely on SQL literal concatenation, but to use psql's
> \set command to concatenate parts of a string.  In particular this
>

+1 to that, much better than the multi-line thing.

I have a nitpick about the \getenv FOO FOO lines.
It's a new function to everyone, and to anyone who hasn't seen the
documentation it won't be immediately obvious which one is the ENV var and
which one is the local var. Lowercasing the local var would be a way to
reinforce which is which to the reader. It would also be consistent with
var naming in the rest of the script.
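Concretely, the convention being suggested (lowercased psql variable,
PG_-prefixed environment variable) would look like this; the variable names
are illustrative:

```sql
-- \getenv sets a psql variable from an environment variable:
--     \getenv psql_var ENV_VAR
-- so, per the nitpick, instead of:
--     \getenv LIBDIR LIBDIR
-- write:
\getenv libdir PG_LIBDIR
\echo :libdir
```

That makes it obvious at a glance that the first argument is the psql
variable and the second is the environment variable.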


>
> 0004 finally removes the no-longer-needed infrastructure in
>

+1
Deleted code is debugged code.


Re: Add id's to various elements in protocol.sgml

2021-12-14 Thread Corey Huinker
On Sun, Dec 5, 2021 at 11:15 AM Daniel Gustafsson  wrote:

> > On 5 Dec 2021, at 16:51, Brar Piening  wrote:
>
> > The attached patch adds id's to various elements in protocol.sgml to
> > make them more accessible via the public html documentation interface.
>
> Off the cuff without having checked the compiled results yet, it seems
> like a good idea.
>
> —
> Daniel Gustafsson
>

I wanted to do something similar for every function specification in the
docs. This may inspire me to take another shot at that.


Re: automatically generating node support functions

2021-10-11 Thread Corey Huinker
>
> build support and made the Perl code more portable, so that the cfbot
> doesn't have to be sad.
>

Was this also the reason for doing the output with print statements rather
than using one of the templating libraries? I'm mostly just curious, and
certainly don't want it to get in the way of working code.


Re: WIP: System Versioned Temporal Table

2021-09-19 Thread Corey Huinker
On Sun, Sep 19, 2021 at 3:12 PM Hannu Krosing  wrote:

> A side table has the nice additional benefit that we can very easily
> version the *table structure* so when we ALTER TABLE and the table
> structure changes we just make a new side table with now-currents
> structure.
>

It's true that would allow for perfect capture of changes to the table
structure, but how would you query the thing?

If a system versioned table was created with a column foo that is type
float, and then we dropped that column, how would we ever query what the
value of foo was in the past?

Would the columns returned from SELECT * change based on the timeframe
requested?

If we then later added another column that happened to also be named foo
but now was type JSONB, would we change the datatype returned based on the
time period being queried?

Is the change in structure a system versioning which itself must be
captured?


> Also we may want different set of indexes on historic table(s) for
> whatever reason
>

+1


>
> And we may even want to partition history tables for speed, storage
> cost  or just to drop very ancient history
>

+1


Re: WIP: System Versioned Temporal Table

2021-09-19 Thread Corey Huinker
>
> Thanks for giving this a lot of thought. When you asked the question
> the first time you hadn't discussed how that might work, but now we
> have something to discuss.
>

My ultimate goal is to unify this effort with the application period
effort. Step 1 in that was to understand what each was doing and why they
were doing it. If you check out the other thread, you'll see a highly
similar message that I sent over there.


> There are 3 implementation routes that I see, so let me explain so
> that others can join the discussion.
>
> 1. Putting all data in one table. This makes DROP SYSTEM VERSIONING
> effectively impossible. It requires access to the table to be
> rewritten to add in historical quals for non-historical access and it
> requires some push-ups around indexes. (The current patch adds the
> historic quals by kludging the parser, which is wrong place, since it
> doesn't work for joins etc.. However, given that issue, the rest seems
> to follow on naturally).
>
> 2. Putting data in a side table. This makes DROP SYSTEM VERSIONING
> fairly trivial, but it complicates many DDL commands (please make a
> list?) and requires the optimizer to know about this and cater to it,
> possibly complicating plans. Neither issue is insurmountable, but it
> becomes more intrusive.
>
> The current patch could go in either of the first 2 directions with
> further work.
>
> 3. Let the Table Access Method handle it. I call this out separately
> since it avoids making changes to the rest of Postgres, which might be
> a good thing, with the right TAM implementation.
>

I'd like to hear more about this idea number 3.

I could see value in allowing the history table to be a foreign table,
perhaps writing to csv/parquet/whatever files, and that sort of setup could
be persuasive to a regulator who wants extra-double-secret-proof that
auditing cannot be tampered with. But with that we'd have to give up the
relkind idea, which itself was going to be a cheap way to prevent updates
outside of the system triggers.


Re: Undocumented AT TIME ZONE INTERVAL syntax

2021-09-19 Thread Corey Huinker
>
>
>> Yeah, I really didn't expect to change the behavior, but wanted to make
> sure that the existing behavior was understood. I'll whip up a patch.
>

Attached is an attempt at an explanation of the edge cases I was
encountering, as well as some examples. If nothing else, the examples will
draw eyes and searches to the explanations that were already there.
From 618a7fbd5606510b993697a0a1968fde5f02fbb2 Mon Sep 17 00:00:00 2001
From: coreyhuinker 
Date: Sun, 19 Sep 2021 22:34:42 -0400
Subject: [PATCH 1/2] Explain some of the nuances with INTERVAL data when the
 string literal offers fewer positions of information than the fields
 specification requires.

---
 doc/src/sgml/datatype.sgml | 25 ++++++++++++++++++++++++-
 1 file changed, 24 insertions(+), 1 deletion(-)

diff --git a/doc/src/sgml/datatype.sgml b/doc/src/sgml/datatype.sgml
index 50a2c8e5f1..83ebb68333 100644
--- a/doc/src/sgml/datatype.sgml
+++ b/doc/src/sgml/datatype.sgml
@@ -2816,7 +2816,30 @@ P  years-months-
  defined with a fields specification, the interpretation of
  unmarked quantities depends on the fields.  For
  example INTERVAL '1' YEAR is read as 1 year, whereas
- INTERVAL '1' means 1 second.  Also, field values
+ INTERVAL '1' means 1 second. If the string provided contains
+ a : then the number to the left of the first :
+ will be used to fill in the most significant sub-day part of the
+ fields speficification. 
+
+
+SELECT INTERVAL '2' HOUR TO SECOND;
+ interval
+----------
+ 00:00:02
+
+SELECT INTERVAL '4:2' HOUR TO SECOND;
+ interval
+----------
+ 04:02:00
+
+
+SELECT INTERVAL '2:' DAY TO SECOND;
+ interval
+----------
+ 02:00:00
+
+
+ Also, field values
  to the right of the least significant field allowed by the
  fields specification are silently discarded.  For
  example, writing INTERVAL '1 day 2:03:04' HOUR TO MINUTE
-- 
2.30.2

From 042467c84a4f9d3cbbdb7660b0b174d99dadde9d Mon Sep 17 00:00:00 2001
From: coreyhuinker 
Date: Sun, 19 Sep 2021 23:26:28 -0400
Subject: [PATCH 2/2] Add example to show the effect of AT TIME ZONE INTERVAL.

---
 doc/src/sgml/func.sgml | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 78812b2dbe..eee10f2db9 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -10435,6 +10435,9 @@ SELECT TIMESTAMP WITH TIME ZONE '2001-02-16 20:38:40-05' AT TIME ZONE 'America/D
 
 SELECT TIMESTAMP '2001-02-16 20:38:40' AT TIME ZONE 'Asia/Tokyo' AT TIME ZONE 'America/Chicago';
 Result: 2001-02-16 05:38:40
+
+SELECT TIMESTAMP '2001-02-16 20:38:40' AT TIME ZONE 'UTC' AT TIME ZONE INTERVAL '3:21:20' HOUR TO SECOND;
+Result: 2001-02-17 00:00:00+00
 
 The first example adds a time zone to a value that lacks it, and
 displays the value using the current TimeZone
@@ -10442,7 +10445,8 @@ SELECT TIMESTAMP '2001-02-16 20:38:40' AT TIME ZONE 'Asia/Tokyo' AT TIME ZONE 'A
 to the specified time zone, and returns the value without a time zone.
 This allows storage and display of values different from the current
 TimeZone setting.  The third example converts
-Tokyo time to Chicago time.
+Tokyo time to Chicago time. The fourth example adds the time zone UTC
+to a value that lacks it, and then applies a highly contrived fixed offset.

 

-- 
2.30.2



Re: Undocumented AT TIME ZONE INTERVAL syntax

2021-09-19 Thread Corey Huinker
On Sun, Sep 19, 2021 at 10:56 AM Tom Lane  wrote:

> Corey Huinker  writes:
> >> SELECT '2018-03-04' AT TIME ZONE INTERVAL '2' HOUR TO MINUTE;
>
> > ... But none of this is in our own documentation.
>
> That's not entirely true.  [1] says
>
> When writing an interval constant with a fields specification, or when
> assigning a string to an interval column that was defined with a
> fields specification, the interpretation of unmarked quantities
> depends on the fields. For example INTERVAL '1' YEAR is read as 1
> year, whereas INTERVAL '1' means 1 second. Also, field values “to the
> right” of the least significant field allowed by the fields
> specification are silently discarded. For example, writing INTERVAL '1
> day 2:03:04' HOUR TO MINUTE results in dropping the seconds field, but
> not the day field.
>

That text addresses the case of the unadorned string (seconds) and the
overflow
case (more string values than places to put them), but doesn't really
address
the underflow.


>
> But I'd certainly agree that a couple of examples are not a specification.
> Looking at DecodeInterval, it looks like the rule is that unmarked or
> ambiguous fields are matched to the lowest field mentioned by the typmod
> restriction.  Thus
>
> regression=# SELECT INTERVAL '4:2' HOUR TO MINUTE;
>  interval
> ----------
>  04:02:00
> (1 row)
>
> regression=# SELECT INTERVAL '4:2' MINUTE TO SECOND;
>  interval
> ----------
>  00:04:02
> (1 row)


# SELECT INTERVAL '04:02' HOUR TO SECOND;
 interval
----------
 04:02:00

This result was a bit unexpected, and the existing documentation doesn't
address underflow cases like this.

So, restating all this to get ready to document it, the rule seems to be:


1. Integer strings with no spaces or colons will always apply to the
rightmost end of the restriction given, lack of a restriction means seconds.


Example:


# SELECT INTERVAL '2' HOUR TO SECOND, INTERVAL '2' HOUR TO MINUTE, INTERVAL
'2';
 interval | interval | interval
----------+----------+----------
 00:00:02 | 00:02:00 | 00:00:02
(1 row)



2. Strings with time context (space separator for days, : for everything
else) will apply starting with the leftmost part of the spec that fits,
continuing to the right until string values are exhausted.


Examples:


# SELECT INTERVAL '4:2' HOUR TO SECOND, INTERVAL '4:2' DAY TO SECOND;
 interval | interval
----------+----------
 04:02:00 | 04:02:00
(1 row)



> If you wanted to improve this para it'd be cool with me.
>

I think people's eyes are naturally drawn to the example tables, and
because the rules for handling string underflow are subtle, I think a few
concrete examples are the way to go.



>
> > Before I write a patch to add this to the documentation, I'm curious what
> > level of sloppiness we should tolerate in the interval calculation.
> Should
> > we enforce the time string to actually conform to the format laid out in
> > the X TO Y spec?
>
> We have never thrown away high-order fields:
>

And with the above I'm now clear that we're fine with the existing behavior
for underflow.


>
> I'm not sure what the SQL spec says here, but I'd be real hesitant to
> change the behavior of cases that we've accepted for twenty-plus
> years, unless they're just obviously insane.  Which these aren't IMO.
>

Yeah, I really didn't expect to change the behavior, but wanted to make
sure that the existing behavior was understood. I'll whip up a patch.


Undocumented AT TIME ZONE INTERVAL syntax

2021-09-18 Thread Corey Huinker
In reviewing Paul's application period patch, I noticed some very curious
syntax in the test cases. I learned that Paul is equally confused by it,
and has asked about it in his PgCon 2020 presentation

> SELECT '2018-03-04' AT TIME ZONE INTERVAL '2' HOUR TO MINUTE;
  timezone
---------------------
 2018-03-04 05:02:00
(1 row)

Searching around, I found several instances of this syntax being used
[1][2][3], but with one important clarifying difference: the expected
syntax was

> SELECT '2018-03-04' AT TIME ZONE INTERVAL '2:00' HOUR TO MINUTE;
  timezone
---------------------
 2018-03-04 07:00:00
(1 row)

Now I understand that the user probably meant to do this:

# SELECT '2018-03-04' AT TIME ZONE INTERVAL '2' HOUR;
  timezone
---------------------
 2018-03-04 07:00:00
(1 row)

But none of this is in our own documentation.

Before I write a patch to add this to the documentation, I'm curious what
level of sloppiness we should tolerate in the interval calculation. Should
we enforce the time string to actually conform to the format laid out in
the X TO Y spec? If we don't require that, is it correct to say that the
values will be filled from order of least significance to greatest?

[1]
https://www.vertica.com/docs/9.2.x/HTML/Content/Authoring/SQLReferenceManual/DataTypes/Date-Time/TIMESTAMPATTIMEZONE.htm
[2]
https://docs.teradata.com/r/kmuOwjp1zEYg98JsB8fu_A/aWY6mGNJ5CYJlSDrvgDQag
[3]
https://community.snowflake.com/s/question/0D50Z9AqIaSSAV/is-it-possible-to-add-an-interval-of-5-hours-to-the-session-timezone-


Re: SQL:2011 application time

2021-09-18 Thread Corey Huinker
In IBM DB2 you can only have one because application-time periods must
> be named "business_time" (not joking).
>

I saw that as well, and it made me think that someone at IBM is a fan of
Flight Of The Conchords.


> Personally I feel like it's a weird limitation and I wouldn't mind
> supporting more, but my current implementation only allows for one,
> and I'd have to rethink some things to do it differently.
>

I'm satisfied that it's not something we need to do in the first MVP.


>
> Yes. Even though the name "SYSTEM_TIME" is technically enough, I'd
> still include a pertype column to make distinguishing system vs
> application periods easier and more obvious.
>

SYSTEM_TIME seems to allow for DATE values in the start_time and end_time
fields, though I cannot imagine how that would ever be practical, unless it
were somehow desirable to reject subsequent updates within a 24 hour
timeframe. I have seen instances where home-rolled application periods used
date values, which had similar problems where certain intermediate updates
would simply have to be discarded in favor of the one that was still
standing at midnight.


>
> > 2. The system versioning effort has chosen 'infinity' as their end-time
> value, whereas you have chosen NULL as that makes sense for an unbounded
> range. Other databases seem to leverage '9999-12-31 23:59:59' (SQLServer,
> IIRC) whereas some others seem to use '2999-12-31 23:59:59' but those
> might have been home-rolled temporal implementations. To further add to the
> confusion, the syntax seems to specify the keyword of MAXVALUE, which
> further muddies things. The system versioning people went with 'infinity'
> seemingly because it doesn't prescribe an end to the world the way
> SQLServer did, but also because it allowed for a primary key based on
> (id, endtime) and that's just not possible with NULL endtime values.
>
> I think it's a little weird that our system-time patch mutates your
> primary key. None of the other RDMBSes do that. I don't think it's
> incompatible (as long as the system time patch knows how to preserve
> the extra period/range data in an application-time temporal key), but
> it feels messy to me.
>

Per outline below, I'm proposing an alternate SYSTEM_TIME implementation
that would leave the PK as-is.


> I would prefer if system-time and application-time used the same value
> to mean "unbounded". Using null means we can support any type (not
> just types with +-Infinity). And it pairs nicely with range types. If
> the only reason for system-time to use Infinity is the primary key, I
> think it would be better not to mutate the primary key (and store the
> historical records in a separate table as other RDMSes do).
>

The two  "big wins" of infinity seemed (to me) to be:

1. the ability to add "AND end_time = 'infinity'" as a cheap way to get
current rows
2. clauses like "WHERE CURRENT_DATE - 3 BETWEEN start_time AND end_time"
would work. Granted, there's very specific new syntax to do that properly,
but you know somebody's gonna see the columns and try to do it that way.
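To make both patterns concrete, here are sketches of the two queries (table
and column names are hypothetical, for illustration only):

```sql
-- (1) cheap filter for current rows when 'infinity' marks "still valid"
SELECT * FROM emp WHERE end_time = 'infinity';

-- (2) the ad-hoc point-in-time query people will inevitably write in
-- place of the dedicated FOR SYSTEM_TIME AS OF syntax
SELECT * FROM emp
WHERE CURRENT_DATE - 3 BETWEEN start_time AND end_time;
```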


>
> Btw Oracle also uses NULL to mean "unbounded".
>

Huh, I missed that one. That is good in that it gives some precedence to
how you've approached it.


>
> We presently forbid PKs from including expressions, but my patch lifts
> that exception so it can index a rangetype expression built from the
> period start & end columns. So even if we must include the system-time
> end column in a PK, perhaps it can use a COALESCE expression to store
> Infinity even while using NULL to signify "currently true" from a user
> perspective.
>

Either way seems viable, but I understand why you want to leverage ranges
in this way.
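A minimal sketch of that COALESCE idea (all names hypothetical; note this
assumes the patch that lifts the no-expressions-in-primary-keys restriction,
so it won't run on a stock server):

```sql
CREATE TABLE emp (
  id        int NOT NULL,
  sys_start timestamptz NOT NULL,
  sys_end   timestamptz,   -- NULL to users means "currently true"
  -- indexed as 'infinity' so the key stays non-null and unique
  PRIMARY KEY (id, COALESCE(sys_end, 'infinity'::timestamptz))
);
```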


>
> > 3. I noticed some inconsistency in the results from various "SELECT *
> FROM portion_of_test" examples. In some, the "valid_at" range is shown but
> not columns that make it up, and in some others, the "valid_from" and
> "valid_to" columns are shown, with no mention of the period. From what I've
> seen, the period column should be invisible unless invoked, like ctid or
> xmin.
>
> In most cases the tests test the same functionality with both PERIODs
> and rangetype columns. For FKs they test all four combinations of
> PERIOD-referencing-PERIOD, PERIOD-referencing-range,
> range-referencing-PERIOD, and range-referencing-range. If valid_at is
> a genuine column, it is included in SELECT *, but not if it is a
> PERIOD.
>

Ok, I'll have to look back over the test coverage to make sure that I
understand the behavior now.


>
> > 4. The syntax '2018-03-04' AT TIME ZONE INTERVAL '2'  HOUR TO MINUTE
> simply confounded me.
>
> Me too! I have no idea what that is supposed to mean. But that
> behavior predates my patch. I only had to deal with it because it
> creates a shift-reduce conflict with `FOR PORTION OF valid_at FROM x
> TO y`, where x & y are expressions. I asked about this syntax at my
> PgCon 2020 talk, but I haven't ever received an answer. Perhaps
> someone else knows 

Re: WIP: System Versioned Temporal Table

2021-09-18 Thread Corey Huinker
>
>
>
> 1. Much of what I have read about temporal tables seemed to imply or
> almost assume that system temporal tables would be implemented as two
> actual separate tables. Indeed, SQLServer appears to do it that way [1]
> with syntax like
>
> WITH (SYSTEM_VERSIONING = ON (HISTORY_TABLE = dbo.WebsiteUserInfoHistory));
>
>
> Q 1.1. Was that implementation considered and if so, what made this
> implementation more appealing?
>
>
I've been digging some more on this point, and I've reached the conclusion
that a separate history table is the better implementation. It would make
the act of removing system versioning into little more than a DROP TABLE,
plus adjusting the base table to reflect that it is no longer system
versioned.

What do you think of this method:

1. The regular table remains unchanged, but a pg_class attribute named
"relissystemversioned" would be set to true
2. I'm unsure if the standard allows dropping a column from a table while
it is system versioned, and the purpose behind system versioning makes me
believe the answer is a strong "no" and requiring DROP COLUMN to fail
on relissystemversioned = 't' seems pretty straightforward.
3. The history table would be given a default name of $FOO_history (space
permitting), but could be overridden with the history_table option.
4. The history table would have relkind = 'h'
5. The history table will only have rows that are not current, so it is
created empty.
6. As such, the table is effectively append-only, in a way that vacuum can
actually leverage, and likewise the fill factor of such a table should
never be less than 100.
7. The history table could only be updated only via system defined triggers
(insert,update,delete, alter to add columns), or row migration similar to
that found in partitioning. It seems like this would work as the two tables
working as partitions of the same table, but presently we can't have
multi-parent partitions.
8. The history table would be indexed the same as the base table, except
that all unique indexes would be made non-unique, and an index of pk +
start_time + end_time would be added
9. The primary key of the base table would remain the existing pk vals, and
would basically function normally, with triggers to carry forth changes to
the history table. The net effect of this is that the end_time value of all
rows in the main table would always be the chosen "current" value
(infinity, null, 9999-12-31, etc) and as such might not actually _need_ to
be stored.
10. Queries that omit the FOR SYSTEM_TIME clause, as well as ones that use
FOR SYSTEM_TIME AS OF CURRENT_TIMESTAMP, would simply use the base table
directly with no quals to add.
11. Queries that use FOR SYSTEM_TIME and not FOR SYSTEM_TIME AS
OF CURRENT_TIMESTAMP, then the query would do a union of the base table and
the history table with quals applied to both.
12. It's a fair question whether the history table would be something that
could be queried directly. I'm inclined to say no, because that allows for
things like SELECT FOR UPDATE, which of course we'd have to reject.
13. If a history table is directly referenceable, then SELECT permission
can be granted or revoked as normal, but all insert/update/delete/truncate
options would raise an error.
14. DROP SYSTEM VERSIONING from a table would be quite straightforward -
the history table would be dropped along with the triggers that reference
it, setting relissystemversioned = 'f' on the base table.
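A rough DDL sketch of what points 3, 5, and 8 might look like (every name
here is hypothetical, and the relkind = 'h' marking plus the system-defined
trigger machinery would be new server-side work, not something creatable
today):

```sql
-- base table, unchanged apart from the proposed pg_class flag
CREATE TABLE account (
  id         int PRIMARY KEY,
  balance    numeric,
  start_time timestamptz NOT NULL,
  end_time   timestamptz NOT NULL DEFAULT 'infinity'
);

-- companion history table: same shape, created empty, append-only
CREATE TABLE account_history (LIKE account);

-- point 8: former unique index becomes non-unique, plus pk + times
CREATE INDEX ON account_history (id);
CREATE INDEX ON account_history (id, start_time, end_time);
```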

I think this would have some key advantages:

1. MVCC bloat is no worse than it was before.
2. No changes whatsoever to referential integrity.
3. DROP SYSTEM VERSIONING becomes an O(1) operation.

Thoughts?

I'm going to be making a similar proposal to the people doing the
application time effort, but I'm very much hoping that we can reach some
consensus and combine efforts.


Re: WIP: System Versioned Temporal Table

2021-09-13 Thread Corey Huinker
On Sun, Sep 12, 2021 at 12:02 PM Simon Riggs 
wrote:

> On Fri, 10 Sept 2021 at 19:30, Jaime Casanova
>  wrote:
> >
> > On Tue, Aug 10, 2021 at 01:20:14PM +0100, Simon Riggs wrote:
> > > On Wed, 14 Jul 2021 at 12:48, vignesh C  wrote:
> > >
> > > > The patch does not apply on Head anymore, could you rebase and post a
> > > > patch. I'm changing the status to "Waiting for Author".
> > >
> > > OK, so I've rebased the patch against current master to take it to v15.
> > >
> > > I've then worked on the patch some myself to make v16 (attached),
> > > adding these things:
> > >
> >
> > Hi Simon,
> >
> > This one doesn't apply nor compile anymore.
> > Can we expect a rebase soon?
>
> Hi Jaime,
>
> Sorry for not replying.
>
> Yes, I will rebase again to assist the design input I have requested.
> Please expect that on Sep 15.
>
> Cheers
>
> --
> Simon Riggs    http://www.EnterpriseDB.com/
>
>
>
I've been interested in this patch, especially with how it will
interoperate with the work on application periods in
https://www.postgresql.org/message-id/CA+renyUApHgSZF9-nd-a0+OPGharLQLO=mdhcy4_qq0+noc...@mail.gmail.com
. I've written up a few observations and questions in that thread, and
wanted to do the same here, as the questions are a bit narrower but no less
interesting.

1. Much of what I have read about temporal tables seemed to imply or almost
assume that system temporal tables would be implemented as two actual
separate tables. Indeed, SQLServer appears to do it that way [1] with
syntax like

WITH (SYSTEM_VERSIONING = ON (HISTORY_TABLE = dbo.WebsiteUserInfoHistory));


Q 1.1. Was that implementation considered and if so, what made this
implementation more appealing?

2. The endtime column constraint which enforces GENERATED ALWAYS AS ROW END
seems like it would have appeal outside of system versioning, as a lot of
tables have a last_updated column, and it would be nice if it could handle
itself and not rely on fallible application programmers or require trigger
overhead.

Q 2.1. Is that something we could break out into its own patch?

3. It is possible to have bi-temporal tables (having both a system_time
period and a named application period) as described in [2], the specific
example was

CREATE TABLE Emp(
  ENo INTEGER,
  EStart DATE,
  EEnd DATE,
  EDept INTEGER,
  PERIOD FOR EPeriod (EStart, EEnd),
  Sys_start TIMESTAMP(12) GENERATED ALWAYS AS ROW START,
  Sys_end TIMESTAMP(12) GENERATED ALWAYS AS ROW END,
  EName VARCHAR(30),
  PERIOD FOR SYSTEM_TIME(Sys_start, Sys_end),
  PRIMARY KEY (ENo, EPeriod WITHOUT OVERLAPS),
  FOREIGN KEY (Edept, PERIOD EPeriod) REFERENCES Dept (DNo, PERIOD DPeriod)
) WITH SYSTEM VERSIONING


What's interesting here is that in the case of a bitemporal table, it was
the application period that got the defined primary key. The paper went on
that only the _current_ rows of the table needed to be unique for, as it
wasn't possible to create rows with past system temporal values. This
sounds like a partial index to me, and luckily postgres can do referential
integrity on any unique index, not just primary keys. In light of the
assumption of a history side-table, I guess I shouldn't be surprised.

Q 3.1. Do you think that it would be possible to implement system
versioning with just a unique index?
Q 3.2. Are there any barriers to using a partial index as the hitch for a
foreign key? Would it be any different than the implied "and endtime =
'infinity'" that's already being done?
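For what it's worth, the partial-index half of that idea can be sketched in
present-day SQL (names hypothetical; whether a foreign key could latch onto
such an index is exactly what Q 3.2 asks):

```sql
-- uniqueness enforced only for "current" rows
CREATE UNIQUE INDEX emp_current_uniq
  ON emp (eno)
  WHERE sys_end = 'infinity';
```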

4. The choice of 'infinity' seemed like a good one initially - it's not
null so it can be used in a primary key, it's not some hackish magic date
like SQLServer's '9999-12-31 23:59:59.999'. However, it may not jibe as
well with application versioning, which is built very heavily upon range
types (and multirange types), and those ranges are capable of saying that a
record is valid for an unbounded amount of time in the future, that's
represented with NULL, not infinity. It could be awkward to have the system
endtime be infinity and the application period endtime be NULL.

Q 4.1. Do you have any thoughts about how to resolve this?
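To illustrate the mismatch with stock range functions (a sketch, not a
proposal):

```sql
-- application periods (range types): unbounded future is a NULL bound
SELECT upper(tstzrange('2018-03-04', NULL));   -- NULL
-- system versioning as proposed: unbounded future is 'infinity'
SELECT 'infinity'::timestamptz;                -- infinity
```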

5. System versioning columns were indicated with additional columns in
pg_attribute.

Q 5.1. If you were to implement application versioning yourself, would you
just add additional columns to pg_attribute for those?

6. The current effort to implement application versioning creates an
INFORMATION_SCHEMA view called PERIODS. I wasn't aware of this one before
but there seems to be precedent for it existing.

Q 6.1. Would system versioning belong in such a view?

7. This is a trifle, but the documentation is inconsistent about starttime
vs StartTime and endtime vs EndTime.

8. Overall, I'm really excited about both of these efforts, and I'm looking
for ways to combine the efforts, perhaps starting with a patch that
implements the SQL syntax, but raises not-implemented errors, and each
effort could then build off of that.

[1] 

Re: SQL:2011 application time

2021-09-13 Thread Corey Huinker
So I've been eagerly watching this thread and hoping to have time to devote
to it. I've also been looking at the thread at
https://www.postgresql.org/message-id/calay4q8pp699qv-pjzc4tos-e2nzrjkrvax-xqg1aqj2q+w...@mail.gmail.com
that covers system versioning, and per our conversation far too long ago
(again, my bad) it's obvious that the two efforts shouldn't do anything
that would be in conflict with one another, as we eventually have to
support bitemporal [1] tables: tables that have both system versioning and
an application period.

Below is a list of observations and questions about this proposed patch of
itself in isolation, but mostly about how it relates to the work being done
for system versioning.

1. This patch creates a pg_period catalog table, whereas the system
versioning relies on additions to pg_attribute to identify the start/end
columns. Initially I thought this was because it was somehow possible to
have *multiple* application periods defined on a table, but in reading [1]
I see that there are some design suppositions that would make a second
application period impossible[2]. I can also see where having this table
would facilitate the easy creation of INFORMATION_SCHEMA.PERIODS. I was
previously unaware that this info schema table was a thing, but I have
found references to it, though I'm unclear as to whether it's supposed to
have information about system versioned tables in it as well.

Q 1.1. Would a bitemporal table have two entries in that view?
Q 1.2. Could you see being able to implement this without pg_period, using
only additions to pg_attribute (start/end for system temporal, start/end
for application, plus an addition for period name)?
Q 1.3. Can you see a way to represent the system versioning in pg_period
such that bitemporal tables were possible?

 2. The system versioning effort has chosen 'infinity' as their end-time
value, whereas you have chosen NULL as that makes sense for an unbounded
range. Other databases seem to leverage '9999-12-31 23:59:59' (SQLServer,
IIRC) whereas some others seem to use '2999-12-31 23:59:59' but those
might have been home-rolled temporal implementations. To further add to the
confusion, the syntax seems to specify the keyword of MAXVALUE, which
further muddies things. The system versioning people went with 'infinity'
seemingly because it doesn't prescribe an end to the world the way SQLServer
did, but also because it allowed for a primary key based on (id, endtime)
and that's just not possible with NULL endtime values.

Q 2.1. Do you have any thoughts about how to resolve this notational logjam?

3. I noticed some inconsistency in the results from various "SELECT * FROM
portion_of_test" examples. In some, the "valid_at" range is shown but not
columns that make it up, and in some others, the "valid_from" and
"valid_to" columns are shown, with no mention of the period. From what I've
seen, the period column should be invisible unless invoked, like ctid or
xmin.

4. The syntax '2018-03-04' AT TIME ZONE INTERVAL '2'  HOUR TO MINUTE simply
confounded me. I googled around for it, but could find no matches for
postgres exception in mailing list discussions circa 2003. I tried it out
myself and, lo and behold

# SELECT '2018-03-04' AT TIME ZONE INTERVAL '2'  HOUR TO MINUTE;
  timezone
---------------------
 2018-03-04 05:02:00
(1 row)


I really didn't expect that to work, or even "work". I can see that it
added 2 minutes to UTC's perspective on my local concept of midnight, but I
don't understand what it's supposed to mean.

Q 4.1. What does it mean?

5. I haven't seen any actual syntax conflicts between this patch and the
system versioning patch. Both teams added basically the same keywords,
though I haven't dove more deeply into any bison incompatibilities. Still,
it's a great start.

6. Overall, I'm really excited about what this will mean for data
governance in postgres.

[1]
https://cs.ulb.ac.be/public/_media/teaching/infoh415/tempfeaturessql2011.pdf
[2] In the bitemporal table example in [1] - the application period gets the
defined primary key, and the system_time period would be merely unique

On Mon, Sep 13, 2021 at 12:12 AM Paul A Jungwirth <
p...@illuminatedcomputing.com> wrote:

> On Fri, Sep 10, 2021 at 6:50 PM Jaime Casanova
>  wrote:
> >
> > patch 01: does apply but gives a compile warning (which is fixed by patch
> > 02)
> > [snip]
> > patch 03: produces these compile errors.
>
> I did a rebase and fixed this new error, as well as the warnings.
>
> On Mon, Sep 6, 2021 at 1:40 PM Zhihong Yu  wrote:
> >
> > + * Portions Copyright (c) 1996-2018, PostgreSQL Global Development Group
> >
> > It seems the year (2018) should be updated to 2021.
>
> Done.
>
> > For RemovePeriodById(), it seems table_open() can be called after
> SearchSysCache1(). This way, if HeapTupleIsValid(tup) is true, table_open()
> can be skipped.
>
> This seems like it permits a race condition when two connections both
> try to drop the period, right?
>
> > For tablecmds.c, 

Re: simplifying foreign key/RI checks

2021-08-29 Thread Corey Huinker
>
> Rebased patches attached.


I'm reviewing the changes since v6, which was my last review.

Making ExecLockTableTuple() it's own function makes sense.
Snapshots are now accounted for.
The changes that account for n-level partitioning makes sense as well.

Passes make check-world.
Not user facing, so no user documentation required.
Marking as ready for committer again.


Nitpick/question: Use of aliases for global variables in functions

2021-08-19 Thread Corey Huinker
I'm using an ongoing patch review to educate myself on parts of the
codebase.

In src/backend/access/transam/xact.c, I'm noticing a code style
inconsistency.

In some cases, a function will declare a variable of some struct pointer
type, assign it to a globally declared struct, and then use it like this:

bool
IsTransactionState(void)
{
    TransactionState s = CurrentTransactionState;

    /*
     * TRANS_DEFAULT and TRANS_ABORT are obviously unsafe states.  However, we
     * also reject the startup/shutdown states TRANS_START, TRANS_COMMIT,
     * TRANS_PREPARE since it might be too soon or too late within those
     * transition states to do anything interesting.  Hence, the only "valid"
     * state is TRANS_INPROGRESS.
     */
    return (s->state == TRANS_INPROGRESS);
}


And in other cases, the function will just reference the global directly

bool
TransactionStartedDuringRecovery(void)
{
    return CurrentTransactionState->startedInRecovery;
}


Clearly, this isn't a big deal. I'm willing to bet that the compiler looks
at the variable declaration and thinks "welp, it's just going to end up in
an address register anyway, may as well optimize it away" at even really
low optimization levels, so I'd be surprised if there's even a difference
in the machine code generated, though it might hinder inlining the function
as a whole. Mostly I'm just noticing the inconsistent coding style.

Good points about the alias: I can see where the alias variable acts as a
reminder of the data type, but even that is the type of a pointer to the
struct, so we're still 1 level of indirection away instead of 2. I can also
see where it might just be an artifact of getting stuff to fit within 80
chars.

Bad points about the alias: I can see where things might hinder function
inlining, and might also hinder git grep searches for the global variable
name (though the global in question is scoped to the file, so that's not a
big concern).

Is this something worth standardizing, and if so, which style do we like
better?

