Re: Convert node test compile-time settings into run-time parameters

2024-05-24 Thread Peter Eisentraut

On 21.05.24 20:48, Andres Freund wrote:

Where I'd be more concerned about performance is the added branch in
READ_LOCATION_FIELD.  There are a lot of calls to that; adding runtime
branches to each, with external function calls inside, is somewhat likely to
be measurable.


Ok, I have an improved plan.  I'm wrapping all the code related to this 
in #ifdef DEBUG_NODE_TESTS_ENABLED.  This in turn is defined in 
assert-enabled builds, or if you define it explicitly, or if you define 
one of the legacy individual symbols.  That way you get the run-time 
settings in a normal development build, but there is no new run-time 
overhead.  This is the same scheme that we use for debug_discard_caches.
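
For illustration, the wiring looks roughly like this (a sketch, not the 
exact patch text; write_read_check() is a placeholder name):

    /* pg_config_manual.h: define the umbrella symbol for assert-enabled
     * builds, or when it or one of the legacy symbols is defined explicitly */
    #if !defined(DEBUG_NODE_TESTS_ENABLED) && \
        (defined(USE_ASSERT_CHECKING) || \
         defined(COPY_PARSE_PLAN_TREES) || \
         defined(WRITE_READ_PARSE_PLAN_TREES) || \
         defined(RAW_EXPRESSION_COVERAGE_TEST))
    #define DEBUG_NODE_TESTS_ENABLED
    #endif

    /* call sites: production builds compile the branch out entirely */
    #ifdef DEBUG_NODE_TESTS_ENABLED
        if (debug_write_read_parse_plan_trees)
            querytree = write_read_check(querytree);
    #endif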


(An argument could be made to enable this code if and only if assertions 
are enabled, since these tests are themselves kind of assertions.  But I 
think having a separate symbol documents the purpose of the various code 
sections better.)



Another thought:  Do we really need three separate settings?


Maybe not three settings, but a single setting, with multiple values, like
debug_io_direct?


Yeah, good idea.  Let's get some more feedback on this before I code up 
a complicated list parser.


Another approach might be levels.  My testing showed that the overhead 
of the copy_parse_plan_trees and raw_expression_coverage_test flags is 
hardly noticeable, but write_read_parse_plan_trees has some noticeable 
impact.  So you could do 0=off, 1=only the cheap ones, 2=all tests.
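
For illustration, with a hypothetical single integer setting (the name 
debug_node_tests is invented here), the gating could look like:

    if (debug_node_tests >= 1)
    {
        /* cheap: copy check (plus raw expression coverage during parsing) */
        raw_parsetree_list = copyObject(raw_parsetree_list);
    }
    if (debug_node_tests >= 2)
    {
        /* expensive: pass the trees through the node writer and reader */
        char       *str = nodeToStringWithLocations(raw_parsetree_list);

        raw_parsetree_list = (List *) stringToNodeWithLocations(str);
        pfree(str);
    }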


In fact, if we could make "only the cheap ones" the default for 
assert-enabled builds, then most people won't even need to worry about 
this setting: the only way to break write_read_parse_plan_trees is 
by writing custom node support, which is rare.  But the raw expression 
coverage still needs to be maintained by hand, so it's more often 
valuable to have it checked automatically.
From 80f35c90e3414dabd879e8832ce1b89c685e5de5 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Fri, 24 May 2024 11:42:02 +0200
Subject: [PATCH v2] Convert node test compile-time settings into run-time
 parameters

This converts

COPY_PARSE_PLAN_TREES
WRITE_READ_PARSE_PLAN_TREES
RAW_EXPRESSION_COVERAGE_TEST

into run-time parameters

debug_copy_parse_plan_trees
debug_write_read_parse_plan_trees
debug_raw_expression_coverage_test

They can be activated for tests using PG_TEST_INITDB_EXTRA_OPTS.

The compile-time symbols are kept for build farm compatibility, but
they now just determine the default value of the run-time settings.

Furthermore, support for these settings is not compiled in at all
unless assertions are enabled, or the new symbol
DEBUG_NODE_TESTS_ENABLED is defined at compile time, or any of the
legacy compile-time setting symbols are defined.  So there is no
run-time overhead in production builds.  (This is similar to the
handling of DISCARD_CACHES_ENABLED.)

Discussion: 
https://www.postgresql.org/message-id/flat/30747bd8-f51e-4e0c-a310-a6e2c37ec8aa%40eisentraut.org
---
 .cirrus.tasks.yml   |  3 +-
 doc/src/sgml/config.sgml| 71 +
 src/backend/nodes/README|  9 ++--
 src/backend/nodes/read.c| 12 ++---
 src/backend/nodes/readfuncs.c   |  4 +-
 src/backend/parser/analyze.c| 44 ++
 src/backend/tcop/postgres.c | 30 +++-
 src/backend/utils/misc/guc_tables.c | 59 
 src/include/nodes/nodes.h   |  2 +-
 src/include/nodes/readfuncs.h   |  2 +-
 src/include/pg_config_manual.h  | 34 +++---
 src/include/utils/guc.h |  6 +++
 12 files changed, 212 insertions(+), 64 deletions(-)

diff --git a/.cirrus.tasks.yml b/.cirrus.tasks.yml
index a2388cd5036..6a9ff066391 100644
--- a/.cirrus.tasks.yml
+++ b/.cirrus.tasks.yml
@@ -133,9 +133,10 @@ task:
 DISK_SIZE: 50
 
 CCACHE_DIR: /tmp/ccache_dir
-CPPFLAGS: -DRELCACHE_FORCE_RELEASE -DCOPY_PARSE_PLAN_TREES -DWRITE_READ_PARSE_PLAN_TREES -DRAW_EXPRESSION_COVERAGE_TEST -DENFORCE_REGRESSION_TEST_NAME_RESTRICTIONS
+CPPFLAGS: -DRELCACHE_FORCE_RELEASE -DENFORCE_REGRESSION_TEST_NAME_RESTRICTIONS
 CFLAGS: -Og -ggdb
 
+PG_TEST_INITDB_EXTRA_OPTS: -c debug_copy_parse_plan_trees=on -c debug_write_read_parse_plan_trees=on -c debug_raw_expression_coverage_test=on
 PG_TEST_PG_UPGRADE_MODE: --link
 
   <<: *freebsd_task_template
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 698169afdb6..c121a13b4c3 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -11372,6 +11372,29 @@ Developer Options
   
  
 
+ 
+  debug_copy_parse_plan_trees (boolean)
+  
+   debug_copy_parse_plan_trees configuration parameter
+  
+  
+  
+   
+Enabling this forces all parse and plan trees to be passed through
+copyObject(), to facilitate catching errors and
+omissions in copyObject().  Th

Re: AIX support

2024-05-23 Thread Peter Eisentraut

On 22.05.24 18:15, Sriram RK wrote:

Please find the attached patch.

Apart from the AIX-specific changes, there is a minor change in this 
file with respect to XLC; below is the error for which we removed inline.


Later, the build and tests passed with both XLC (16.1.0.18) and gcc (12).


I think what you should do next is aggressively trim anything that does 
not apply to current versions of AIX or the current compiler.


For example,

+  # Old xlc versions (<13.1) don't have support for -qvisibility. Use expfull to force


+   
+AIX versions before 7.1 are no longer
+tested nor supported by the PostgreSQL
+community.
+   

(Probably most of that section needs to be retested and rewritten.)

+  # Native memset() is faster, tested on:
+  # - AIX 5.1 and 5.2, XLC 6.0 (IBM's cc)
+  # - AIX 5.3 ML3, gcc 4.0.1
+  memset_loop_limit = 0

+   # for the base executable (AIX 4.2 and up)

+ * "IBM XL C/C++ for AIX, V12.1" miscompiles, for 32-bit, some inline


One of the reasons that the AIX port ultimately became unmaintainable 
was that so many hacks and caveats were accumulated over the years.  A 
new port should set a more recent baseline and trim all those hacks.






Re: Virtual generated columns

2024-05-22 Thread Peter Eisentraut

On 29.04.24 20:54, Corey Huinker wrote:

  -- generation expression must be immutable
-CREATE TABLE gtest_err_4 (a int PRIMARY KEY, b double precision GENERATED ALWAYS AS (random()) STORED);
+CREATE TABLE gtest_err_4 (a int PRIMARY KEY, b double precision GENERATED ALWAYS AS (random()) VIRTUAL);

Does a VIRTUAL generated column have to be immutable? I can see where 
the STORED one has to be, but consider the following:


CREATE TABLE foo (
created_at timestamptz DEFAULT CURRENT_TIMESTAMP,
row_age interval GENERATED ALWAYS AS (CURRENT_TIMESTAMP - created_at) VIRTUAL
);


I have been hesitant about this, but I'm now leaning toward allowing this.



  -- can't have generated column that is a child of normal column
  CREATE TABLE gtest_normal (a int, b int);
-CREATE TABLE gtest_normal_child (a int, b int GENERATED ALWAYS AS (a * 2) STORED) INHERITS (gtest_normal);  -- error
+CREATE TABLE gtest_normal_child (a int, b int GENERATED ALWAYS AS (a * 2) VIRTUAL) INHERITS (gtest_normal);  -- error

This is the barrier to the partitioning reorganization scheme I 
described above. Is there any hard rule why a child table couldn't have 
a generated column matching the parent's regular column? I can see where 
it might prevent indexing that column on the parent table, but is there 
some other dealbreaker or is this just a "it doesn't work yet" situation?


We had quite a difficult time getting the inheritance business of 
stored generated columns working correctly.  I'm sticking to the 
well-trodden path here.  We can possibly expand this if someone wants to 
work out the details.


One last thing to keep in mind is that there are two special case 
expressions in the spec:


GENERATED ALWAYS AS ROW START
GENERATED ALWAYS AS ROW END

and we'll need to be able to fit those into the catalog. I'll start 
another thread for that unless you prefer I keep it here.


I think this is a separate feature.





Re: Schema variables - new implementation for Postgres 15

2024-05-22 Thread Peter Eisentraut

On 18.05.24 13:29, Alvaro Herrera wrote:

I want to note that when we discussed this patch series at the dev
meeting in FOSDEM, a sort-of conclusion was reached that we didn't want
schema variables at all because of the fact that creating a variable
would potentially change the meaning of queries by shadowing table
columns.  But this turns out to be incorrect: it's _variables_ that are
shadowed by table columns, not the other way around.


But that's still bad, because seemingly unrelated schema changes can 
make variables appear and disappear.  For example, if you have


SELECT a, b FROM table1

and then you drop column b, maybe the above query continues to work 
because there is also a variable b.  Or maybe it now does different 
things because b is of a different type.  This all has the potential to 
be very confusing.






Re: Pgoutput not capturing the generated columns

2024-05-22 Thread Peter Eisentraut

On 08.05.24 09:13, Shubham Khanna wrote:

The attached patch has the changes to support capturing generated
column data using the ‘pgoutput’ and ‘test_decoding’ plugins. Now if the
‘include_generated_columns’ option is specified, the generated column
information and generated column data will also be sent.


It might be worth keeping half an eye on the development of virtual 
generated columns [0].  I think it won't be possible to include those 
in the replication output stream.


I think having an option for including stored generated columns is in 
general ok.



[0]: 
https://www.postgresql.org/message-id/flat/a368248e-69e4-40be-9c07-6c3b5880b...@eisentraut.org





Re: tydedef extraction - back to the future

2024-05-22 Thread Peter Eisentraut

On 20.05.24 23:11, Andrew Dunstan wrote:
Attached is an attempt to thread this needle. The core is a new perl 
module that imports the current buildfarm client logic. The intention is 
that once we have this, the buildfarm client will switch to using the 
module (if found) rather than its own built-in logic. There is precedent 
for this sort of arrangement (AdjustUpgrade.pm). Accompanying the new 
module is a standalone perl script that uses the new module, and 
replaces the current shell script (thus making it more portable).


It looks like this code could use a bit of work to modernize and clean 
up cruft, such as


+   my $sep = $using_msvc ? ';' : ':';

This can be done with File::Spec.

+   next if $bin =~ m!bin/(ipcclean|pltcl_)!;

Those are long gone.

+   next if $bin =~ m!/postmaster.exe$!;# sometimes a copy not a link


Also gone.

+   elsif ($using_osx)
+   {
+   # On OS X, we need to examine the .o files

Update the name.

+   # exclude ecpg/test, which pgindent does too
+   my $obj_wanted = sub {
+   /^.*\.o\z/s
+ && !($File::Find::name =~ m!/ecpg/test/!s)
+ && push(@testfiles, $File::Find::name);
+   };
+
+   File::Find::find($obj_wanted, $binloc);
+   }

Not clear why this is specific to that platform.

Also, some instructions should be provided.  It looks like this is meant 
to be run on the installation tree?  A README and/or a build target 
would be good.


The code distinguishes between srcdir and bindir, but it's not clear 
what the latter is.  It rather looks like the installation prefix.  Does 
this code support out-of-tree builds?  This should be cleared up.






Re: in parentesis is not usual on DOCs

2024-05-22 Thread Peter Eisentraut

On 20.05.24 02:00, jian he wrote:

Removing the parentheses means we need to rephrase this sentence,
so I came up with the following rephrasing:

The context_item specifies the input data to query, the
path_expression is a JSON path expression defining the query, and
json_path_name is an optional name for the path_expression. The
optional PASSING clause can provide data values to the
path_expression.


Based on this, I wrote a simple patch.


Your patch kind of messes up the indentation of the text you are 
changing.  Please check that.





Re: Convert node test compile-time settings into run-time parameters

2024-05-21 Thread Peter Eisentraut

On 20.05.24 15:59, Tom Lane wrote:

Peter Eisentraut  writes:

This patch converts the compile-time settings
  COPY_PARSE_PLAN_TREES
  WRITE_READ_PARSE_PLAN_TREES
  RAW_EXPRESSION_COVERAGE_TEST



into run-time parameters



  debug_copy_parse_plan_trees
  debug_write_read_parse_plan_trees
  debug_raw_expression_coverage_test


I'm kind of down on this.  It seems like forcing a bunch of
useless-in-production debug support into the standard build.
What of this would be of any use to any non-developer?


We have a bunch of other debug_* settings that are available in 
production builds, such as


debug_print_parse
debug_print_rewritten
debug_print_plan
debug_pretty_print
debug_discard_caches
debug_io_direct
debug_parallel_query
debug_logical_replication_streaming

Maybe we could hide all of them behind some #ifdef DEBUG_OPTIONS, but in 
any case, I don't think the ones being proposed here are so different 
from the existing ones that they would require separate treatment.


My goal is to make these facilities easier to use, avoiding hand-editing 
pg_config_manual.h and having to recompile.






Re: Speed up clean meson builds by ~25%

2024-05-20 Thread Peter Eisentraut

On 19.05.24 00:09, Andres Freund wrote:

On 2024-05-18 00:35:08 +0200, Peter Eisentraut wrote:

I retested the patch from 2024-04-07 (I think that's the one that "fixed
that"?  There are multiple "v1" patches in this thread.) using gcc-14 and
clang-18, with ccache disabled of course.  The measured effects of the patch
are:

gcc-14:   1% slower
clang-18: 3% faster

So with that, it doesn't seem very interesting.

I wonder whether the reason you're seeing less benefit than Jelte is that - I'm
guessing - you now used ninja 1.12 and Jelte something older.  Ninja 1.12
prioritizes building edges using a "critical path" heuristic, leading to
scheduling nodes with more incoming dependencies, and deeper in the dependency
graph earlier.


Yes!  Very interesting!

With ninja 1.11 and gcc-14, I see the patch gives about a 17% speedup.




Convert node test compile-time settings into run-time parameters

2024-05-20 Thread Peter Eisentraut

This patch converts the compile-time settings

COPY_PARSE_PLAN_TREES
WRITE_READ_PARSE_PLAN_TREES
RAW_EXPRESSION_COVERAGE_TEST

into run-time parameters

debug_copy_parse_plan_trees
debug_write_read_parse_plan_trees
debug_raw_expression_coverage_test

They can be activated for tests using PG_TEST_INITDB_EXTRA_OPTS.

The effect is the same, but now you don't need to recompile in order to 
use these checks.


The compile-time symbols are kept for build farm compatibility, but they 
now just determine the default value of the run-time settings.


Possible concerns:

- Performance?  pg_parse_query() and its siblings, for example, already 
check other debugging settings like log_parser_stats in the main code 
path, so this doesn't seem to be a concern (see the sketch below).


- Access control?  I have these settings as PGC_USERSET for now. Maybe 
they should be PGC_SUSET?
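
For reference, a sketch of that pattern, modeled on the existing 
COPY_PARSE_PLAN_TREES block in pg_parse_query(), with the compile-time 
guard replaced by the setting:

    /* pg_parse_query() already branches on debug settings today: */
    if (log_parser_stats)
        ResetUsage();

    raw_parsetree_list = raw_parser(query_string, RAW_PARSE_DEFAULT);

    if (log_parser_stats)
        ShowUsage("PARSER STATISTICS");

    /* ...so the proposed checks add branches of the same, cheap kind: */
    if (debug_copy_parse_plan_trees)
    {
        /* check that copyObject() and equal() agree on the raw trees */
        List       *new_list = copyObject(raw_parsetree_list);

        if (!equal(new_list, raw_parsetree_list))
            elog(WARNING, "copyObject() failed to produce an equal raw parse tree");
        else
            raw_parsetree_list = new_list;
    }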


Another thought:  Do we really need three separate settings?
From 568c620eb97f82aa8eab850cb3ce703c5c94a912 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Mon, 20 May 2024 09:13:23 +0200
Subject: [PATCH v1] Convert node test compile-time settings into run-time
 parameters

This converts

COPY_PARSE_PLAN_TREES
WRITE_READ_PARSE_PLAN_TREES
RAW_EXPRESSION_COVERAGE_TEST

into run-time parameters

debug_copy_parse_plan_trees
debug_write_read_parse_plan_trees
debug_raw_expression_coverage_test

They can be activated for tests using PG_TEST_INITDB_EXTRA_OPTS.

The compile-time symbols are kept for build farm compatibility, but
they now just determine the default value of the run-time settings.

TODO: config.sgml documentation
---
 .cirrus.tasks.yml   |  3 +-
 src/backend/nodes/README|  9 ++---
 src/backend/nodes/read.c| 15 +---
 src/backend/nodes/readfuncs.c   | 10 +-
 src/backend/parser/analyze.c| 14 +++-
 src/backend/tcop/postgres.c | 18 --
 src/backend/utils/misc/guc_tables.c | 55 +
 src/include/nodes/nodes.h   |  2 --
 src/include/nodes/readfuncs.h   |  2 --
 src/include/pg_config_manual.h  | 21 ---
 src/include/utils/guc.h |  4 +++
 11 files changed, 79 insertions(+), 74 deletions(-)

diff --git a/.cirrus.tasks.yml b/.cirrus.tasks.yml
index a2388cd5036..6a9ff066391 100644
--- a/.cirrus.tasks.yml
+++ b/.cirrus.tasks.yml
@@ -133,9 +133,10 @@ task:
 DISK_SIZE: 50
 
 CCACHE_DIR: /tmp/ccache_dir
-CPPFLAGS: -DRELCACHE_FORCE_RELEASE -DCOPY_PARSE_PLAN_TREES -DWRITE_READ_PARSE_PLAN_TREES -DRAW_EXPRESSION_COVERAGE_TEST -DENFORCE_REGRESSION_TEST_NAME_RESTRICTIONS
+CPPFLAGS: -DRELCACHE_FORCE_RELEASE -DENFORCE_REGRESSION_TEST_NAME_RESTRICTIONS
 CFLAGS: -Og -ggdb
 
+PG_TEST_INITDB_EXTRA_OPTS: -c debug_copy_parse_plan_trees=on -c debug_write_read_parse_plan_trees=on -c debug_raw_expression_coverage_test=on
 PG_TEST_PG_UPGRADE_MODE: --link
 
   <<: *freebsd_task_template
diff --git a/src/backend/nodes/README b/src/backend/nodes/README
index 52364470205..f8bbd605386 100644
--- a/src/backend/nodes/README
+++ b/src/backend/nodes/README
@@ -98,10 +98,11 @@ Suppose you want to define a node Foo:
node types to find all the places to touch.
(Except for frequently-created nodes, don't bother writing a creator
function in makefuncs.c.)
-4. Consider testing your new code with COPY_PARSE_PLAN_TREES,
-   WRITE_READ_PARSE_PLAN_TREES, and RAW_EXPRESSION_COVERAGE_TEST to ensure
-   support has been added everywhere that it's necessary; see
-   pg_config_manual.h about these.
+4. Consider testing your new code with debug_copy_parse_plan_trees,
+   debug_write_read_parse_plan_trees, and
+   debug_raw_expression_coverage_test to ensure support has been added
+   everywhere that it's necessary (e.g., run the tests with
+   PG_TEST_INITDB_EXTRA_OPTS='-c debug_...=on').
 
 Adding a new node type moves the numbers associated with existing
 tags, so you'll need to recompile the whole tree after doing this.
diff --git a/src/backend/nodes/read.c b/src/backend/nodes/read.c
index 4eb42445c52..e2d2ce7374d 100644
--- a/src/backend/nodes/read.c
+++ b/src/backend/nodes/read.c
@@ -32,9 +32,7 @@
 static const char *pg_strtok_ptr = NULL;
 
 /* State flag that determines how readfuncs.c should treat location fields */
-#ifdef WRITE_READ_PARSE_PLAN_TREES
 bool   restore_location_fields = false;
-#endif
 
 
 /*
@@ -42,17 +40,14 @@ bool        restore_location_fields = false;
  *   builds a Node tree from its string representation (assumed valid)
  *
  * restore_loc_fields instructs readfuncs.c whether to restore location
- * fields rather than set them to -1.  This is currently only supported
- * in builds with the WRITE_READ_PARSE_PLAN_TREES debugging flag set.
+ * fields rather than set them to -1.
  */
 static void *
 stringToNodeInternal(const char *str, bool restore_loc_fields)
 {
void   *retval;
const char *save_

Re: [PATCH] Avoid mixing custom and OpenSSL BIO functions

2024-05-19 Thread Peter Eisentraut

On 19.05.24 20:07, David Benjamin wrote:
On Sun, May 19, 2024 at 12:21 PM Tom Lane wrote:


Per the cfbot [1], this patch needs a rebase over the ALPN-related
commits.  It still isn't likely to get human attention before the
July commitfest, but you can save some time by making sure it's
in a reviewable state before that.


Rebased version attached. (The conflict was pretty trivial. Both patches 
add a field to some struct.)


Note that there is a concurrent plan to drop support for OpenSSL older 
than 1.1.1 [0], which would obsolete your configure test for 
BIO_get_data.  Whoever commits these should be sure to coordinate this.



[0]: https://www.postgresql.org/message-id/flat/zg3jnursg69dz...@paquier.xyz





Re: Speed up clean meson builds by ~25%

2024-05-17 Thread Peter Eisentraut

On 17.05.24 23:01, Robert Haas wrote:

On Fri, May 17, 2024 at 4:59 PM Tom Lane  wrote:

As I mentioned upthread, I'm more worried about confusing error
reports than the machine time.


Well, Jelte fixed that.


I grant that there are people who are more affected, but still, I'd
just as soon not contort the build rules for a temporary problem.


Arguably by doing this, but I don't think it's enough of a contortion
to get excited about.

Anyone else want to vote?


I retested the patch from 2024-04-07 (I think that's the one that "fixed 
that"?  There are multiple "v1" patches in this thread.) using gcc-14 
and clang-18, with ccache disabled of course.  The measured effects of 
the patch are:


gcc-14:   1% slower
clang-18: 3% faster

So with that, it doesn't seem very interesting.





Re: commitfest.postgresql.org is no longer fit for purpose

2024-05-17 Thread Peter Eisentraut

On 17.05.24 14:42, Joe Conway wrote:
Namely, the week before commitfest I don't actually know if I will 
have the time during that month, but I will make sure my patch is in 
the commitfest just in case I get a few clear days to work on it. 
Because if it isn't there, I can't take advantage of those "found" hours.


A solution to both of these issues (yours and mine) would be to allow 
things to be added *during* the CF month. What is the point of having a 
"freeze" before every CF anyway? Especially if they start out clean. If 
something is ready for review on day 8 of the CF, why not let it be 
added for review?


Maybe this all indicates that the idea of bimonthly commitfests has run 
its course.  The original idea might have been, if we have like 50 
patches, we can process all of them within a month.  We're clearly not 
doing that anymore.  What would the development process look like if we 
just had one commitfest per dev cycle, running from July 1st to March 31st?






Re: commitfest.postgresql.org is no longer fit for purpose

2024-05-17 Thread Peter Eisentraut

On 17.05.24 09:32, Heikki Linnakangas wrote:
Dunno about having to click a link or sparkly gold borders, but +1 on 
having a free-form text box for notes like that. Things like "cfbot says 
this has bitrotted" or "Will review this after other patch this depends 
on". On the mailing list, notes like that are both noisy and easily lost 
in the threads. But as a little free-form text box on the commitfest, it 
would be handy.


One risk is that if we start to rely too much on that, or on the other 
fields in the commitfest app for that matter, we de-value the mailing 
list archives. I'm not too worried about it, the idea is that the 
summary box just summarizes what's already been said on the mailing 
list, or is transient information like "I'll get to this tomorrow" 
that's not interesting to archive.


We already have the annotations feature, which is kind of this.




Re: commitfest.postgresql.org is no longer fit for purpose

2024-05-17 Thread Peter Eisentraut

On 17.05.24 13:36, Daniel Gustafsson wrote:

On 17 May 2024, at 13:13, Robert Haas  wrote:



But if we are to guess what those reasons might be, Tom has already
admitted he does that for CI, and I do the same, so probably other
people also do it. I also suspect that some people are essentially
using the CF app as a personal todo list. By sticking patches in there
that they intend to commit next cycle, they both (1) feel virtuous,
because they give at least the appearance of following the community
process and inviting review before they commit and (2) avoid losing
track of the stuff they plan to commit.

There may be other reasons, too.


I think there is one more which is important: 3) Giving visibility into "this
is what I intend to commit".  Few can follow -hackers to the level where they
can have an overview of ongoing and/or finished work which will go in.  The CF
app does however provide that overview.  This is essentially the TODO list
aspect, but sharing one's TODO isn't all bad, especially for maintainers.


Ok, but these cases shouldn't use a separate "parking lot".  They should 
use the same statuses and flow diagram as the rest.  (Maybe with more 
dotted lines, sure.)






Re: GUC names in messages

2024-05-17 Thread Peter Eisentraut

On 17.05.24 05:31, Peter Smith wrote:

I think we should accept your two patches

v6-0001-GUC-names-docs.patch
v6-0002-GUC-names-add-quotes.patch

which effectively everyone was in favor of and which seem to be the most
robust and sustainable solution.

(The remaining three patches from the v6 set would be PG18 material at
this point.)

Thanks very much for taking an interest in resurrecting this thread.

It was always my intention to come back to this when the dust had
settled on PG17. But it would be even better if the docs for the rule
"just quote everything", and anything else you deem acceptable, can be
pushed sooner.

Of course, there will still be plenty more to do for PG18, including
locating examples in newly pushed code for messages that have slipped
through the cracks during the last few months using different formats,
and other improvements, but those tasks should become easier if we can
get some of these v6 patches out of the way first.


I committed your 0001 and 0002 now, with some small fixes.

There has also been quite a bit of new code, of course, since you posted 
your patches, so we'll probably find a few more things that could use 
adjustment.


I'd be happy to consider the rest of your patch set after beta1 and/or 
for PG18.






Re: Ignore Visual Studio's Temp Files While Working with PG on Windows

2024-05-17 Thread Peter Eisentraut

On 17.05.24 08:09, Yasir wrote:
I have been playing with PG on the Windows platform recently. An 
annoying thing I faced is that a lot of Visual Studio's temp files kept 
appearing in git changed files. Therefore, I am submitting this very 
trivial patch to ignore these temp files.


Our general recommendation is that you put such things into your 
personal global git ignore file.


For example, I have in ~/.gitconfig

[core]
excludesFile = ~/.gitexcludes

and then in ~/.gitexcludes I have various ignores that are specific to 
my local tooling.


That way we don't have to maintain ignore lists for all the tools in the 
world in the PostgreSQL source tree.






Re: Minor cleanups in the SSL tests

2024-05-16 Thread Peter Eisentraut

On 16.05.24 23:27, Daniel Gustafsson wrote:

On 16 May 2024, at 11:43, Peter Eisentraut  wrote:



You might want to run your patch through pgperltidy.  The result doesn't look 
bad, but a bit different than what you had crafted.


Ugh, I thought I had but clearly had forgotten. Fixed now.


append_conf() opens and closes the file for each call.  It might be nice if it 
could accept a list.  Or you can just pass the whole block as one string, like 
it was done for pg_ident.conf before.


The attached v2 pass the whole block as a here-doc which seemed like the best
option to retain readability of the config.


Works for me.





Re: Why does pgindent's README say to download typedefs.list from the buildfarm?

2024-05-16 Thread Peter Eisentraut

On 16.05.24 16:45, Tom Lane wrote:

Peter Eisentraut  writes:

In these cases, I think for
NotificationHash
ResourceOwnerData
WalSyncMethod
we can just get rid of the typedef.


I have no objection to dealing with NotificationHash as you have here.


ReadBuffersFlags shouldn't be an enum at all, because its values are
used as flag bits.


Yeah, that was bothering me too, but I went for the minimum delta.
I did think that a couple of integer macros would be a better idea,
so +1 for what you did here.


I committed this, and Michael took care of WaitEventExtension, so we 
should be all clear here.






Re: commitfest.postgresql.org is no longer fit for purpose

2024-05-16 Thread Peter Eisentraut

On 17.05.24 04:26, Robert Haas wrote:

I do *emphatically* think we need a parking lot.


People seem to like this idea; I'm not quite sure I follow it.  If you 
just want the automated patch testing, you can do that on your own by 
setting up github/cirrus for your own account.  If you keep emailing the 
public your patches just because you don't want to set up your private 
testing tooling, that seems a bit abusive?


Are there other reasons why developers might want their patches 
registered in a parking lot?


We also need to consider that the cfbot/cirrus resources are limited. 
Whatever changes we make, we should make sure that they are prioritized 
well.






Re: commitfest.postgresql.org is no longer fit for purpose

2024-05-16 Thread Peter Eisentraut

On 17.05.24 00:13, Tom Lane wrote:

So a third status that encompasses the various other situations like
maybe forgotten by author, disagreements between author and reviewer,
process difficulties, needs some senior developer intervention, etc.
could be helpful.


Hmm, "forgotten by author" seems to generally turn into "this has been
in WOA state a long time".  Not sure we have a problem representing
that, only with a process for eventually retiring such entries.
Your other three examples all sound like "needs senior developer
attention", which could be a helpful state that's distinct from "ready
for committer".  It's definitely not the same as "Unclear".


Yeah, some fine-tuning might be required.  But I would be wary of 
over-designing too many new states at this point.  I think the key idea 
is that there ought to be a state that communicates "needs attention 
from someone other than author, reviewer, or committer".






Re: commitfest.postgresql.org is no longer fit for purpose

2024-05-16 Thread Peter Eisentraut

On 16.05.24 23:46, Tom Lane wrote:

Peter Eisentraut  writes:

The problem is if we have 180 patches in Needs Review, and only 20 are
really actually ready to be reviewed.  And a second-order problem is
that if you already know that this will be the case, you give up before
even looking.


Right, so what can we do about that?  Does Needs Review state need to
be subdivided, and if so how?


Maybe a new state "Unclear".  Then a commitfest manager, or someone like 
Robert just now, can more easily power through the list and set 
everything that's doubtful to Unclear, with the understanding that the 
author can set it back to Needs Review to signal that they actually 
think it's ready.  Or, as a variant of what someone else was saying, 
just set all patches that carry over from March to July as Unclear.  Or 
something like that.


I think, if we consider the core mission of the commitfest app, we need 
to be more protective of the Needs Review state.


I have been, from time to time, when I'm in commitfest management mode, 
aggressive in setting entries back to Waiting on Author, but that's not 
always optimal.


So a third status that encompasses the various other situations like 
maybe forgotten by author, disagreements between author and reviewer, 
process difficulties, needs some senior developer intervention, etc. 
could be helpful.


This could also help scale the commitfest manager role, because then the 
head CFM could have like three helpers and just tell them, look at all 
the "Unclear" ones and help resolve them.





Re: commitfest.postgresql.org is no longer fit for purpose

2024-05-16 Thread Peter Eisentraut

On 16.05.24 23:06, Joe Conway wrote:

On 5/16/24 16:57, Jacob Champion wrote:

On Thu, May 16, 2024 at 1:31 PM Joe Conway  wrote:

Maybe we should just make it a policy that *nothing* gets moved forward
from commitfest-to-commitfest and therefore the author needs to care
enough to register for the next one?


I think that's going to severely disadvantage anyone who doesn't do
this as their day job. Maybe I'm bristling a bit too much at the
wording, but not having time to shepherd a patch is not the same as
not caring.


Maybe the word "care" was a poor choice, but forcing authors to think 
about and decide if they have the "time to shepherd a patch" for the 
*next CF* is exactly the point. If they don't, why clutter the CF with it.


Objectively, I think this could be quite effective.  You need to prove 
your continued interest in your project by pressing a button every two 
months.


But there is a high risk that this will double the annoyance for 
contributors whose patches aren't getting reviews.  Now, not only are 
you being ignored, but you need to prove that you're still there every 
two months.






Re: commitfest.postgresql.org is no longer fit for purpose

2024-05-16 Thread Peter Eisentraut

On 16.05.24 23:04, Tom Lane wrote:

I think it'd be great if we could separate "I'm actively reviewing
this" from "I'm interested in this".  As a bonus, adding yourself
to the "interested" list would be a fine proxy for the thumbs-up
or star markers mentioned upthread.

If those were separate columns, we could implement some sort of
aging scheme whereby somebody who'd not commented for (say)
a week or two would get quasi-automatically moved from the "active
reviewer" column to the "interested" column, whereupon it wouldn't
be impolite for someone else to sign up for active review.


Yes, I think some variant of this could be quite useful.




Re: commitfest.postgresql.org is no longer fit for purpose

2024-05-16 Thread Peter Eisentraut

On 16.05.24 22:46, Melanie Plageman wrote:

I was reflecting on why a general purpose patch tracker sounded
appealing to me, and I realized that, at least at this time of year, I
have a few patches that really count as "waiting on author" that I
know I need to do additional work on before they need more review but
which aren't currently my top priority. I should probably simply
withdraw and re-register them. My justification was that I'll lose
them if I don't keep them in the commitfest app. But, I could just,
you know, save them somewhere myself instead of polluting the
commitfest app with them. I don't know if others are in this
situation. Anyway, I'm definitely currently guilty of parking.


I don't have a problem with that at all.  It's pretty understandable 
that many patches are parked between say April and July.


The key is to keep the status up to date.

If we have 890 patches in Waiting for Author and 100 patches in Ready 
for Committer and 10 patches in Needs Review, and those 10 patches are 
actually reviewable, then that's good.  There might need to be a 
"background process" to make sure those 890 waiting patches are not 
parked forever and those 100 patches actually get committed sometime, 
but in principle this is the system working.


The problem is if we have 180 patches in Needs Review, and only 20 are 
really actually ready to be reviewed.  And a second-order problem is 
that if you already know that this will be the case, you give up before 
even looking.






avoid MERGE_ACTION keyword?

2024-05-16 Thread Peter Eisentraut

I wonder if we can avoid making MERGE_ACTION a keyword.

I think we could parse it initially as a function and then transform it 
to a more special node later.  In the attached patch, I'm doing this in 
parse analysis.  We could try to do it even later and actually execute 
it as a function, if we could get the proper context passed into it somehow.
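
Concretely, the transformation would amount to something like this in 
parse analysis (a sketch; F_MERGE_ACTION stands for the constant that the 
new pg_proc.dat entry would generate):

    /* once a FuncCall has resolved to the merge_action() catalog function,
     * replace it with the dedicated node instead of ever executing it */
    if (funcid == F_MERGE_ACTION)
    {
        MergeSupportFunc *m = makeNode(MergeSupportFunc);

        if (pstate->p_expr_kind != EXPR_KIND_MERGE_RETURNING)
            ereport(ERROR,
                    errmsg("MERGE_ACTION() can only be used in the RETURNING list of a MERGE command"),
                    parser_errposition(pstate, location));
        m->msftype = TEXTOID;
        m->location = location;
        return (Node *) m;
    }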


I'm thinking about this with half an eye on future features.  For 
example, row pattern recognition might introduce similar magic functions 
match_number() and classifier() (somewhat the inspiration for the 
merge_action() syntax), which could use similar treatment.


Thoughts?

From 6e0132ca3f3472fb6c5f8c5b2ec7296de1f83811 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Thu, 16 May 2024 16:09:57 +0200
Subject: [PATCH] WIP: Avoid MERGE_ACTION keyword

---
 src/backend/parser/gram.y   | 12 +-
 src/backend/parser/parse_expr.c | 73 ++---
 src/include/catalog/pg_proc.dat |  2 +
 src/include/parser/kwlist.h |  1 -
 4 files changed, 44 insertions(+), 44 deletions(-)

diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index 4d582950b72..b717b704972 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -751,7 +751,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
LEADING LEAKPROOF LEAST LEFT LEVEL LIKE LIMIT LISTEN LOAD LOCAL
LOCALTIME LOCALTIMESTAMP LOCATION LOCK_P LOCKED LOGGED
 
-   MAPPING MATCH MATCHED MATERIALIZED MAXVALUE MERGE MERGE_ACTION METHOD
+   MAPPING MATCH MATCHED MATERIALIZED MAXVALUE MERGE METHOD
MINUTE_P MINVALUE MODE MONTH_P MOVE
 
NAME_P NAMES NATIONAL NATURAL NCHAR NESTED NEW NEXT NFC NFD NFKC NFKD NO
@@ -16077,14 +16077,6 @@ func_expr_common_subexpr:
n->location = @1;
$$ = (Node *) n;
}
-   | MERGE_ACTION '(' ')'
-   {
-   MergeSupportFunc *m = makeNode(MergeSupportFunc);
-
-   m->msftype = TEXTOID;
-   m->location = @1;
-   $$ = (Node *) m;
-   }
| JSON_QUERY '('
json_value_expr ',' a_expr json_passing_clause_opt
json_returning_clause_opt
@@ -17936,7 +17928,6 @@ col_name_keyword:
| JSON_TABLE
| JSON_VALUE
| LEAST
-   | MERGE_ACTION
| NATIONAL
| NCHAR
| NONE
@@ -18334,7 +18325,6 @@ bare_label_keyword:
| MATERIALIZED
| MAXVALUE
| MERGE
-   | MERGE_ACTION
| METHOD
| MINVALUE
| MODE
diff --git a/src/backend/parser/parse_expr.c b/src/backend/parser/parse_expr.c
index aba3546ed1a..12fc6829fc0 100644
--- a/src/backend/parser/parse_expr.c
+++ b/src/backend/parser/parse_expr.c
@@ -55,7 +55,6 @@ static Node *transformAExprDistinct(ParseState *pstate, A_Expr *a);
 static Node *transformAExprNullIf(ParseState *pstate, A_Expr *a);
 static Node *transformAExprIn(ParseState *pstate, A_Expr *a);
 static Node *transformAExprBetween(ParseState *pstate, A_Expr *a);
-static Node *transformMergeSupportFunc(ParseState *pstate, MergeSupportFunc *f);
 static Node *transformBoolExpr(ParseState *pstate, BoolExpr *a);
 static Node *transformFuncCall(ParseState *pstate, FuncCall *fn);
static Node *transformMultiAssignRef(ParseState *pstate, MultiAssignRef *maref);
@@ -238,11 +237,6 @@ transformExprRecurse(ParseState *pstate, Node *expr)
result = transformGroupingFunc(pstate, (GroupingFunc *) expr);
break;
 
-   case T_MergeSupportFunc:
-   result = transformMergeSupportFunc(pstate,
-      (MergeSupportFunc *) expr);
-   break;
-
case T_NamedArgExpr:
{
NamedArgExpr *na = (NamedArgExpr *) expr;
@@ -1374,31 +1368,6 @@ transformAExprBetween(ParseState *pstate, A_Expr *a)
return transformExprRecurse(pstate, result);
 }
 
-static Node *
-transformMergeSupportFunc(ParseState *pstate, MergeSupportFunc *f)
-{
-   /*
-* All we need to do is check that we're in the RETURNING list of a MERGE
-* command.  If so, we just return the node as-is.
-*/
-   if (pstate->p_expr_kind != EXPR_KIND_MERGE_RETURNING)
-   {
-   ParseState *parent_pstate = pstate->p

Re: Adding the extension name to EData / log_line_prefix

2024-05-16 Thread Peter Eisentraut

On 15.05.24 17:50, Tom Lane wrote:

We kind of already have something like this, for NLS.  If you look for
pg_bindtextdomain(TEXTDOMAIN) and ereport_domain(), this information
already trickles into the vicinity of the error data.  Maybe the same
thing could just be used for this, by wiring up the macros a bit
differently.

Hmm, cute idea, but it'd only help for extensions that are
NLS-enabled.  Which I bet is a tiny fraction of the population.
So far as I can find, we don't even document how to set up
TEXTDOMAIN for an extension --- you have to cargo-cult the
macro definition from some in-core extension.


Yeah, the whole thing is a bit mysterious, and we don't need to use the 
exact mechanism we have now.


But abstractly, we should only have to specify the, uh, domain of the 
log messages once.  Whether that is used for building a message catalog 
or tagging the server log, those are just two different downstream uses 
of the same piece of information.






Re: GUC names in messages

2024-05-16 Thread Peter Eisentraut

On 04.01.24 07:53, Peter Smith wrote:

Now that I read this again, I think this is wrong.

We should decide the quoting for a category, not the actual content.
Like, quote all file names; do not quote keywords.

This led to the attempted patch to decide the quoting of GUC parameter
names dynamically based on the actual content, which no one really
liked.  But then, to preserve consistency, we also need to be uniform in
quoting GUC parameter names where the name is hardcoded.



I agree. By attempting to define when to and when not to use quotes it
has become overcomplicated.

Earlier in the thread, I counted how quotes were used in the existing
messages [5]; there were ~39 quoted and 164 not quoted. Based on that
we chose to stay with the majority, and leave all the unquoted ones so
only adding quotes "when necessary". In hindsight, that was probably
the wrong choice because it opened a can of worms about what "when
necessary" even means (e.g. what about underscores, mixed case etc).

Certainly one simple rule "just quote everything" is easiest to follow.


I've been going through the translation updates for PG17 these days and 
was led back around to this issue.  It seems we left it in an 
intermediate state that no one was really happy with and which is 
arguably as inconsistent or more so than before.


I think we should accept your two patches

v6-0001-GUC-names-docs.patch
v6-0002-GUC-names-add-quotes.patch

which effectively everyone was in favor of and which seem to be the most 
robust and sustainable solution.


(The remaining three patches from the v6 set would be PG18 material at 
this point.)






Re: [PATCH] Add --syntax to postgres for SQL syntax checking

2024-05-16 Thread Peter Eisentraut

On 15.05.24 21:05, Robert Haas wrote:

I don't think it's at all obvious that this belongs on the server side
rather than the client side. I think it could be done in either place,
or both. I just think we don't have the infrastructure to do it
cleanly, at present.


I think if you're going to do a syntax-check-with-catalog-lookup mode, 
you need authentication and access control.  The mode without catalog 
lookup doesn't need that.  But it might be better to offer both modes 
through a similar interface (or at least plan ahead for that).






Re: An improved README experience for PostgreSQL

2024-05-16 Thread Peter Eisentraut

On 15.05.24 21:34, Nathan Bossart wrote:

On Wed, May 15, 2024 at 07:23:19AM +0200, Peter Eisentraut wrote:

I think for CONTRIBUTING.md, a better link would be
<https://www.postgresql.org/developer/>.


WFM


This patch version looks good to me.





Re: Why does pgindent's README say to download typedefs.list from the buildfarm?

2024-05-16 Thread Peter Eisentraut

On 16.05.24 01:32, Tom Lane wrote:

As for the remainder, they aren't showing up because no variable
or field is declared using them, which means no debug symbol
table entry is made for them.  This means we could just drop those
typedefs and be little the worse off notationally.  I experimented
with a patch for that, as attached.  (In the case of NotificationHash,
I thought it better to arrange for there to be a suitable variable;
but it could certainly be done the other way too.)  Is this too anal?


I agree we should get rid of these.

Over the last release cycle, I've been leaning a bit more toward not 
typedef'ing enums and structs that are only in local use, in part because 
of the implied need to keep the typedefs list up to date.


In these cases, I think for

NotificationHash
ResourceOwnerData
WalSyncMethod

we can just get rid of the typedef.

ReadBuffersFlags shouldn't be an enum at all, because its values are 
used as flag bits.


WaitEventExtension, I'm not sure, it's like, an extensible enum?  I 
guess let's remove the typedef there, too.


Attached is a variant patch.
diff --git a/src/backend/commands/async.c b/src/backend/commands/async.c
index d0891e3f0e0..ab4c72762d8 100644
--- a/src/backend/commands/async.c
+++ b/src/backend/commands/async.c
@@ -396,10 +396,10 @@ typedef struct NotificationList
 
 #define MIN_HASHABLE_NOTIFIES 16   /* threshold to build hashtab */
 
-typedef struct NotificationHash
+struct NotificationHash
 {
Notification *event;/* => the actual Notification struct */
-} NotificationHash;
+};
 
 static NotificationList *pendingNotifies = NULL;
 
@@ -2299,7 +2299,7 @@ AddEventToPendingNotifies(Notification *n)
 
/* Create the hash table */
hash_ctl.keysize = sizeof(Notification *);
-   hash_ctl.entrysize = sizeof(NotificationHash);
+   hash_ctl.entrysize = sizeof(struct NotificationHash);
hash_ctl.hash = notification_hash;
hash_ctl.match = notification_match;
hash_ctl.hcxt = CurTransactionContext;
diff --git a/src/backend/utils/resowner/resowner.c b/src/backend/utils/resowner/resowner.c
index ab9343bc5cf..505534ee8d3 100644
--- a/src/backend/utils/resowner/resowner.c
+++ b/src/backend/utils/resowner/resowner.c
@@ -107,7 +107,7 @@ StaticAssertDecl(RESOWNER_HASH_MAX_ITEMS(RESOWNER_HASH_INIT_SIZE) >= RESOWNER_AR
 /*
  * ResourceOwner objects look like this
  */
-typedef struct ResourceOwnerData
+struct ResourceOwnerData
 {
ResourceOwner parent;   /* NULL if no parent (toplevel owner) */
ResourceOwner firstchild;   /* head of linked list of children */
@@ -155,7 +155,7 @@ typedef struct ResourceOwnerData
 
/* The local locks cache. */
LOCALLOCK  *locks[MAX_RESOWNER_LOCKS];  /* list of owned locks */
-} ResourceOwnerData;
+};
 
 
 /*
@@ -415,7 +415,7 @@ ResourceOwnerCreate(ResourceOwner parent, const char *name)
ResourceOwner owner;
 
owner = (ResourceOwner) MemoryContextAllocZero(TopMemoryContext,
-      sizeof(ResourceOwnerData));
+      sizeof(struct ResourceOwnerData));
owner->name = name;
 
if (parent)
diff --git a/src/include/access/xlog.h b/src/include/access/xlog.h
index 76787a82673..1a1f11a943f 100644
--- a/src/include/access/xlog.h
+++ b/src/include/access/xlog.h
@@ -19,14 +19,14 @@
 
 
 /* Sync methods */
-typedef enum WalSyncMethod
+enum WalSyncMethod
 {
WAL_SYNC_METHOD_FSYNC = 0,
WAL_SYNC_METHOD_FDATASYNC,
WAL_SYNC_METHOD_OPEN,   /* for O_SYNC */
WAL_SYNC_METHOD_FSYNC_WRITETHROUGH,
WAL_SYNC_METHOD_OPEN_DSYNC  /* for O_DSYNC */
-} WalSyncMethod;
+};
 extern PGDLLIMPORT int wal_sync_method;
 
 extern PGDLLIMPORT XLogRecPtr ProcLastRecPtr;
diff --git a/src/include/storage/bufmgr.h b/src/include/storage/bufmgr.h
index 42211bfec4f..08364447c74 100644
--- a/src/include/storage/bufmgr.h
+++ b/src/include/storage/bufmgr.h
@@ -107,14 +107,10 @@ typedef struct BufferManagerRelation
 #define BMR_REL(p_rel) ((BufferManagerRelation){.rel = p_rel})
 #define BMR_SMGR(p_smgr, p_relpersistence) ((BufferManagerRelation){.smgr = 
p_smgr, .relpersistence = p_relpersistence})
 
-typedef enum ReadBuffersFlags
-{
-   /* Zero out page if reading fails. */
-   READ_BUFFERS_ZERO_ON_ERROR = (1 << 0),
-
-   /* Call smgrprefetch() if I/O necessary. */
-   READ_BUFFERS_ISSUE_ADVICE = (1 << 1),
-} ReadBuffersFlags;
+/* Zero out page if reading fails. */
+#define READ_BUFFERS_ZERO_ON_ERROR (1 << 0)
+/* Call smgrprefetch() if I/O necessary. */
+#define READ_BUFFERS_ISSUE_ADVICE (1 << 1)
 
 struct ReadBuffersOperation
 {


Re: Minor cleanups in the SSL tests

2024-05-16 Thread Peter Eisentraut

On 16.05.24 09:24, Daniel Gustafsson wrote:

When writing a new SSL test for another patch it struck me that the SSL tests
are doing configuration management without using the test framework API's.  The
attached patches cleans this up, no testcases are altered as part of this.

0001 makes the test for PG_TEST_EXTRA a top-level if statement not attached to
any other conditional.  There is no change in functionality, it's mainly for
readability (PG_TEST_EXTRA is its own concept, not tied to library presence).


Makes sense to me.


0002 ports over editing configfiles to using append_conf() instead of opening
and writing to them directly.


Yes, it's probably preferable to use append_conf() here.  You might want 
to run your patch through pgperltidy.  The result doesn't look bad, but 
a bit different than what you had crafted.


append_conf() opens and closes the file for each call.  It might be nice 
if it could accept a list.  Or you can just pass the whole block as one 
string, like it was done for pg_ident.conf before.






Re: SQL:2011 application time

2024-05-16 Thread Peter Eisentraut

On 15.05.24 11:39, Peter Eisentraut wrote:
Attached are the individual revert patches.  I'm supplying these here 
mainly so that future efforts can use those instead of the original 
patches, since that would have to redo all the conflict resolution and 
also miss various typo fixes etc. that were applied in the meantime.  I 
will commit this as one squashed patch.


This has been done.





Re: Underscore in positional parameters?

2024-05-16 Thread Peter Eisentraut

On 16.05.24 01:11, Michael Paquier wrote:

On Wed, May 15, 2024 at 01:59:36PM +0200, Peter Eisentraut wrote:

On 14.05.24 18:07, Erik Wienhold wrote:

Patch 0002 replaces atol with pg_strtoint32_safe in the backend parser
and strtoint in ECPG.  This fixes overflows like:


Seems like a good idea, but as was said, this is an older issue, so let's
look at that separately.


Hmm, yeah.  I would be really tempted to fix that now.

Now, it has been this way for ages, and with my RMT hat on (aka I need
to show the example), I'd suggest to wait for when the v18 branch
opens as there is no urgency.  I'm OK to apply it myself at the end,
the patch is a good idea.


On this specific patch, maybe reword "parameter too large" to "parameter 
number too large".


Also, I was bemused by the use of atol(), which is notoriously 
unportable (sizeof(long)).  So I poked around and found more places that 
might need fixing.  I'm attaching a patch here with annotations to look 
at later.
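
To illustrate the hazard, a standalone example (not part of the patch):

    #include <stdio.h>
    #include <stdlib.h>

    int
    main(void)
    {
        /* a tablespace size, in kB, that an 8TB cluster could report */
        const char *val = "8589934592";

        /*
         * On LP64 platforms atol() handles this fine, but where long is
         * 32 bits (notably 64-bit Windows) the behavior on overflow is
         * undefined and in practice the value is silently garbled.
         */
        printf("atol:  %ld\n", atol(val));
        printf("atoll: %lld\n", atoll(val));    /* long long is >= 64 bits */
        return 0;
    }
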
From 2d3ad223b1f9b7bb5bebc6b6ef8cbba0d1c0022b Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Thu, 16 May 2024 08:36:21 +0200
Subject: [PATCH] WIP: atol() investigation

---
 src/backend/parser/scan.l| 2 +-
 src/bin/pg_basebackup/pg_basebackup.c| 2 +-
 src/bin/pg_basebackup/streamutil.c   | 2 +-
 src/bin/pg_rewind/libpq_source.c | 2 +-
 src/interfaces/ecpg/ecpglib/execute.c| 2 +-
 src/interfaces/ecpg/preproc/ecpg.trailer | 2 +-
 src/interfaces/ecpg/preproc/pgc.l| 2 +-
 7 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/src/backend/parser/scan.l b/src/backend/parser/scan.l
index b499975e9c4..383d3f2d39a 100644
--- a/src/backend/parser/scan.l
+++ b/src/backend/parser/scan.l
@@ -993,7 +993,7 @@ other   .
 
 {param}{
SET_YYLLOC();
-   yylval->ival = atol(yytext + 1);
+   yylval->ival = atol(yytext + 1); // FIXME with overflow check
return PARAM;
}
 {param_junk}   {
diff --git a/src/bin/pg_basebackup/pg_basebackup.c b/src/bin/pg_basebackup/pg_basebackup.c
index 8f3dd04fd22..4ebc5e3b2b8 100644
--- a/src/bin/pg_basebackup/pg_basebackup.c
+++ b/src/bin/pg_basebackup/pg_basebackup.c
@@ -2056,7 +2056,7 @@ BaseBackup(char *compression_algorithm, char *compression_detail,
tablespacecount = PQntuples(res);
for (i = 0; i < PQntuples(res); i++)
{
-   totalsize_kb += atol(PQgetvalue(res, i, 2));
+   totalsize_kb += atoll(PQgetvalue(res, i, 2)); // FIXME: atol() might truncate if sizeof(long)==4
 
/*
 * Verify tablespace directories are empty. Don't bother with 
the
diff --git a/src/bin/pg_basebackup/streamutil.c b/src/bin/pg_basebackup/streamutil.c
index d0efd8600ca..1d303d16ef5 100644
--- a/src/bin/pg_basebackup/streamutil.c
+++ b/src/bin/pg_basebackup/streamutil.c
@@ -631,7 +631,7 @@ GetSlotInformation(PGconn *conn, const char *slot_name,
 
/* current TLI */
if (!PQgetisnull(res, 0, 2))
-   tli_loc = (TimeLineID) atol(PQgetvalue(res, 0, 2));
+   tli_loc = (TimeLineID) atol(PQgetvalue(res, 0, 2)); // FIXME: use strtoul()
 
PQclear(res);
 
diff --git a/src/bin/pg_rewind/libpq_source.c b/src/bin/pg_rewind/libpq_source.c
index 7d898c3b501..618b175dcc4 100644
--- a/src/bin/pg_rewind/libpq_source.c
+++ b/src/bin/pg_rewind/libpq_source.c
@@ -294,7 +294,7 @@ libpq_traverse_files(rewind_source *source, process_file_callback_t callback)
}
 
path = PQgetvalue(res, i, 0);
-   filesize = atol(PQgetvalue(res, i, 1));
+   filesize = atoll(PQgetvalue(res, i, 1)); // FIXME: atol() might truncate
isdir = (strcmp(PQgetvalue(res, i, 2), "t") == 0);
link_target = PQgetvalue(res, i, 3);
 
diff --git a/src/interfaces/ecpg/ecpglib/execute.c b/src/interfaces/ecpg/ecpglib/execute.c
index 04d0b40c537..c578c21cf66 100644
--- a/src/interfaces/ecpg/ecpglib/execute.c
+++ b/src/interfaces/ecpg/ecpglib/execute.c
@@ -278,7 +278,7 @@ ecpg_is_type_an_array(int type, const struct statement *stmt, const struct varia
isarray = ECPG_ARRAY_NONE;
else
{
-   isarray = (atol((char *) PQgetvalue(query, 0, 0)) == -1) ? ECPG_ARRAY_ARRAY : ECPG_ARRAY_VECTOR;
+   isarray = (atoi((char *) PQgetvalue(query, 0, 0)) == -1) ? ECPG_ARRAY_ARRAY : ECPG_ARRAY_VECTOR;
if (ecpg_dynamic_type(type) == SQL3_CHARACTER ||
ecpg_dynamic_type(type) == 
SQL3_CHARACTER_VARYING)
{
diff --git a/src/interfaces/ecpg/preproc/ecpg.trailer b/src/interfaces/ecpg/preproc/ecpg.trailer
index b2aa44f36dd..8ac1c

Re: Adding the extension name to EData / log_line_prefix

2024-05-15 Thread Peter Eisentraut

On 14.05.24 01:11, Tom Lane wrote:

The mechanism that Andres describes for sourcing the name seems a bit
overcomplex though.  Why not just allow/require each extension to
specify its name as a constant string?  We could force the matter by
redefining PG_MODULE_MAGIC as taking an argument:

PG_MODULE_MAGIC("hstore");


We kind of already have something like this, for NLS.  If you look for 
pg_bindtextdomain(TEXTDOMAIN) and ereport_domain(), this information 
already trickles into the vicinity of the error data.  Maybe the same 
thing could just be used for this, by wiring up the macros a bit 
differently.
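
For context, this is roughly the pattern an NLS-enabled extension uses 
today (a sketch; hstore is just a familiar example):

    #include "postgres.h"
    #include "fmgr.h"

    PG_MODULE_MAGIC;

    /* Override elog.h's default (NULL) so that ereport(), which expands to
     * ereport_domain(elevel, TEXTDOMAIN, ...), carries this module's domain. */
    #undef TEXTDOMAIN
    #define TEXTDOMAIN PG_TEXTDOMAIN("hstore")

    void
    _PG_init(void)
    {
        /* register the extension's message catalogs with gettext */
        pg_bindtextdomain(TEXTDOMAIN);
    }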






Re: cataloguing NOT NULL constraints

2024-05-15 Thread Peter Eisentraut

On 15.05.24 09:50, Alvaro Herrera wrote:

On 2024-May-14, Bruce Momjian wrote:


Turns out these commits generated a single release note item, which I
have now removed with the attached committed patch.


Hmm, but the commits about not-null constraints for domains were not
reverted, only the ones for constraints on relations.  I think the
release notes don't properly address the ones on domains.  I think it's
at least these two commits:


-Author: Peter Eisentraut 
-2024-03-20 [e5da0fe3c] Catalog domain not-null constraints
-Author: Peter Eisentraut 
-2024-04-15 [9895b35cb] Fix ALTER DOMAIN NOT NULL syntax


I'm confused that these were kept.  The first one was specifically to 
make the catalog representation of domain not-null constraints 
consistent with table not-null constraints.  But the table part was 
reverted, so now the domain constraints are inconsistent again.


The second one refers to the first one, but it might also fix some 
additional older issue, so it would need more investigation.






Re: Underscore in positional parameters?

2024-05-15 Thread Peter Eisentraut

On 14.05.24 18:07, Erik Wienhold wrote:

Patch 0001 changes rules param and param_junk to only accept digits 0-9.


I have committed this patch to PG16 and master.

I was a little bit on the fence about what the behavior should be, but I 
checked Perl for comparison:


print 1000;   # ok
print 1_000;  # ok
print $1000;  # ok
print $1_000; # error

So this seems alright.


Patch 0002 replaces atol with pg_strtoint32_safe in the backend parser
and strtoint in ECPG.  This fixes overflows like:


Seems like a good idea, but as was said, this is an older issue, so 
let's look at that separately.






Re: doc: some fixes for environment sections in ref pages

2024-05-15 Thread Peter Eisentraut

On 13.05.24 13:02, Daniel Gustafsson wrote:

On 13 May 2024, at 10:48, Peter Eisentraut  wrote:



Patches attached.


All patches look good.


I think the first one is a bug fix.


Agreed.


Committed, thanks.





Re: Postgres and --config-file option

2024-05-15 Thread Peter Eisentraut

On 15.05.24 04:07, Michael Paquier wrote:

Not sure that these additions in --help or the docs are necessary.
The rest looks OK.

-"You must specify the --config-file or -D invocation "
+"You must specify the --config-file (or equivalent -c) or -D invocation "

How about "You must specify the --config-file, -c
\"config_file=VALUE\" or -D invocation"?  There is some practice for
--opt=VALUE in .po files.


Yeah, some of this becomes quite unwieldy if we document and 
mention each spelling variant of each option everywhere.


Maybe if the original problem is that the option --config-file is not 
explicitly in the --help output, let's add it to the --help output?






Re: pgsql: Fix overread in JSON parsing errors for incomplete byte sequence

2024-05-15 Thread Peter Eisentraut

On 15.05.24 02:00, Michael Paquier wrote:

On Tue, May 14, 2024 at 10:39:36AM +0200, Peter Eisentraut wrote:

I saw the same thing.  The problem is that there is incomplete dependency
information in the makefiles (not meson) between src/common/ and what is
using it.  So whenever anything changes in src/common/, you pretty much have
to do a forced rebuild of everything.


Is that a recent regression?  I have some blurry memories from
working on these areas that changing src/common/ reflected on the
compiled pieces effectively at some point.


One instance of this problem that I can reproduce at least back to PG12 is

1. touch src/common/exec.c
2. make -C src/bin/pg_dump

This will rebuild libpgcommon, but it will not relink pg_dump.




Re: An improved README experience for PostgreSQL

2024-05-14 Thread Peter Eisentraut

On 14.05.24 19:33, Nathan Bossart wrote:

On Tue, May 14, 2024 at 06:12:26PM +0200, Alvaro Herrera wrote:

On 2024-May-14, Tom Lane wrote:


I don't have a position on whether we want
these additional files or not; but if we do, I think the best answer
is to stick 'em under .github/ where they are out of the way but yet
updatable by any committer.


+1 for .github/, that was my first reaction as well after reading the
link Peter posted.


Here's an updated patch that uses .github/.


I'm fine with putting them under .github/.

I think for CONTRIBUTING.md, a better link would be 
.






Re: Requiring LLVM 14+ in PostgreSQL 18

2024-05-14 Thread Peter Eisentraut

On 15.05.24 06:21, Thomas Munro wrote:

And as I'm looking up how this was previously handled, I notice that
this list of clang-NN versions was last updated equally sneakily as part
of your patch to trim off LLVM <10 (820b5af73dc).  I wonder if the
original intention of that configure code was that maintaining the
versioned list above clang-7/llvm-config-7 was not needed, because the
unversioning programs could be used, or maybe because pkg-config could
be used.  It would be nice if we could get rid of having to update that.

I probably misunderstood why we were doing that, perhaps something to
do with the way some distro (Debian?) was doing things with older
versions, and yeah I see that we went a long time after 7 without
touching it and nobody cared.  Yeah, it would be nice to get rid of
it.  Here's a patch.  Meson didn't have that.


Yes, let's get that v3-0001 patch into PG17.





Re: pgsql: Fix overread in JSON parsing errors for incomplete byte sequence

2024-05-14 Thread Peter Eisentraut

On 10.05.24 14:23, Alvaro Herrera wrote:

On 2024-May-10, Alvaro Herrera wrote:


Not sure what's going on here, or why it fails for me while the
buildfarm is all happy.


Ah, I ran 'git clean -dfx' and now it works correctly.  I must have had
an incomplete rebuild.


I saw the same thing.  The problem is that there is incomplete 
dependency information in the makefiles (not meson) between src/common/ 
and what is using it.  So whenever anything changes in src/common/, you 
pretty much have to do a forced rebuild of everything.






Re: An improved README experience for PostgreSQL

2024-05-14 Thread Peter Eisentraut

On 13.05.24 17:26, Nathan Bossart wrote:

On Sun, May 12, 2024 at 05:17:42PM +0200, Peter Eisentraut wrote:

I don't know, I find these files kind of "yelling".  It's fine to have a
couple, but now it's getting a bit much, and there are more that could be
added.


I'm not sure what you mean by this.  Do you mean that the contents are too
blunt?  That there are too many files?  Something else?


I mean the all-caps file names, cluttering up the top-level directory.


If we want to enhance the GitHub experience, we can also add these files to
the organization instead: 
https://docs.github.com/en/communities/setting-up-your-project-for-healthy-contributions/creating-a-default-community-health-file


This was the intent of my patch.  There might be a few others that we could
use, but I figured we could start with the low-hanging fruit that would
have the most impact on the GitHub experience.


My point is, in order to get that enhanced GitHub experience, you don't 
actually have to commit these files into the individual source code 
repository.  You can add them to the organization and they will apply to 
all repositories under the organization.  This is explained at the above 
link.


However, I don't think these files are actually that useful.  People can
go to the web site to find out things about the PostgreSQL
community.  We don't need to add a bunch of $X.md files that just say,
essentially, go to postgresql.org/$X.





Re: An improved README experience for PostgreSQL

2024-05-14 Thread Peter Eisentraut

On 13.05.24 17:43, Alvaro Herrera wrote:

On 2024-May-13, Nathan Bossart wrote:


If we want to enhance the GitHub experience, we can also add these files to
the organization instead: 
https://docs.github.com/en/communities/setting-up-your-project-for-healthy-contributions/creating-a-default-community-health-file


This was the intent of my patch.  There might be a few others that we could
use, but I figured we could start with the low-hanging fruit that would
have the most impact on the GitHub experience.


Can't we add these two lines per topic to the README.md?


The point of these special file names is that GitHub will produce 
special links to them.  If you look at Nathan's tree


https://github.com/nathan-bossart/postgres/tree/special-files

and scroll down to the README display, you will see links for "Code of 
Conduct", "License", and "Security" across the top.


Whether it's worth having these files just to produce these links is the 
debate.






Re: consider -Wmissing-variable-declarations

2024-05-14 Thread Peter Eisentraut

On 10.05.24 11:53, Heikki Linnakangas wrote:

On 09/05/2024 12:23, Peter Eisentraut wrote:

In [0] I had noticed that we have no automated verification that global
variables are declared in header files.  (For global functions, we have
this through -Wmissing-prototypes.)  As I mentioned there, I discovered
the Clang compiler option -Wmissing-variable-declarations, which does
exactly that.  Clang has supported this for quite some time, and GCC 14,
which was released a few days ago, now also supports it.  I went and
installed this option into the standard build flags and cleaned up the
warnings it found, which revealed a number of interesting things.


Nice! More checks like this is good in general.


Attached are patches organized by sub-topic.  The most dubious stuff is
in patches 0006 and 0007.  A bunch of GUC-related variables are not in
header files but are pulled in via ad-hoc extern declarations.  I can't
recognize an intentional scheme there, probably just done for
convenience or copied from previous practice.  These should be organized
into appropriate header files.


+1 for moving all these to header files. Also all the "other stuff" in 
patch 0007.


I have found a partial explanation for the "other stuff".  We have in 
launch_backend.c:


/*
 * The following need to be available to the save/restore_backend_variables
 * functions.  They are marked NON_EXEC_STATIC in their home modules.
 */
extern slock_t *ShmemLock;
extern slock_t *ProcStructLock;
extern PGPROC *AuxiliaryProcs;
extern PMSignalData *PMSignalState;
extern pg_time_t first_syslogger_file_time;
extern struct bkend *ShmemBackendArray;
extern bool redirection_done;

So these are notionally static variables that had to be sneakily 
exported for the purposes of EXEC_BACKEND.


(This probably also means my current patch set won't work cleanly on 
EXEC_BACKEND builds.  I'll need to check that further.)


However, it turns out that that comment is not completely true. 
ShmemLock, ShmemBackendArray, and redirection_done are not in fact 
NON_EXEC_STATIC.  I think they probably once were, but then they were 
needed elsewhere and people thought, if launch_backend.c (formerly 
postmaster.c) gets at them via its own extern declaration, then I will 
do that too.


ShmemLock has been like that for a longer time, but ShmemBackendArray 
and redirection_done are new like that in PG17, probably from all the 
postmaster.c refactoring.


ShmemLock and redirection_done have now escaped for wider use and should 
be in header files, as my patches are proposing.


ShmemBackendArray only exists if EXEC_BACKEND, so it's fine, but the 
comment is slightly misleading.  Maybe sticking a NON_EXEC_STATIC onto 
ShmemBackendArray, even though it's useless, would make this more 
consistent.






Re: Large files for relations

2024-05-13 Thread Peter Eisentraut

On 06.03.24 22:54, Thomas Munro wrote:

Rebased.  I had intended to try to get this into v17, but a couple of
unresolved problems came up while rebasing over the new incremental
backup stuff.  You snooze, you lose.  Hopefully we can sort these out
in time for the next commitfest:

* should pg_combinebasebackup read the control file to fetch the segment size?
* hunt for other segment-size related problems that may be lurking in
new incremental backup stuff
* basebackup_incremental.c wants to use memory in proportion to
segment size, which looks like a problem, and I wrote about that in a
new thread[1]


Overall, I like this idea, and the patch seems to have many bases covered.

The patch will need a rebase.  I was able to test it on 
master@{2024-03-13}, but after that there are conflicts.


In .cirrus.tasks.yml, one of the test tasks uses 
--with-segsize-blocks=6, but you are removing that option.  You could 
replace that with something like


PG_TEST_INITDB_EXTRA_OPTS='--rel-segsize=48kB'

But that won't work exactly because

initdb: error: argument of --rel-segsize must be a power of two

I suppose that's ok as a change, since it makes the arithmetic more 
efficient.  But maybe it should be called out explicitly in the commit 
message.


If I run it with 64kB, the test pgbench/001_pgbench_with_server fails 
consistently, so it seems there is still a gap somewhere.


A minor point, the initdb error message

initdb: error: argument of --rel-segsize must be a multiple of BLCKSZ

would be friendlier if it actually showed the value of the block size
instead of just the symbol.  Similarly for the nearby error message 
about the off_t size.


In the control file, all the other fields use unsigned types.  Should 
relseg_size be uint64?


PG_CONTROL_VERSION needs to be changed.





Re: SQL:2011 application time

2024-05-13 Thread Peter Eisentraut

On 03.04.24 07:30, Paul Jungwirth wrote:
But is it *literally* unique? Well two identical keys, e.g. (5, 
'[Jan24,Mar24)') and (5, '[Jan24,Mar24)'), do have overlapping ranges, 
so the second is excluded. Normally a temporal unique index is *more* 
restrictive than a standard one, since it forbids other values too (e.g. 
(5, '[Jan24,Feb24)')). But sadly there is one exception: the ranges in 
these keys do not overlap: (5, 'empty'), (5, 'empty'). With 
ranges/multiranges, `'empty' && x` is false for all x. You can add that 
key as many times as you like, despite a PK/UQ constraint:


     postgres=# insert into t values
     ('[1,2)', 'empty', 'foo'),
     ('[1,2)', 'empty', 'bar');
     INSERT 0 2
     postgres=# select * from t;
       id   | valid_at | name
     -------+----------+------
      [1,2) | empty    | foo
      [1,2) | empty    | bar
     (2 rows)

Cases like this shouldn't actually happen for temporal tables, since 
empty is not a meaningful value. An UPDATE/DELETE FOR PORTION OF would 
never cause an empty. But we should still make sure they don't cause 
problems.


We should give temporal primary keys an internal CHECK constraint saying 
`NOT isempty(valid_at)`. The problem is analogous to NULLs in parts of a 
primary key. NULLs prevent two identical keys from ever comparing as 
equal. And just as a regular primary key cannot contain NULLs, so a 
temporal primary key should not contain empties.


The standard effectively prevents this with PERIODs, because a PERIOD 
adds a constraint saying start < end. But our ranges enforce only start 
<= end. If you say `int4range(4,4)` you get `empty`. If we constrain 
primary keys as I'm suggesting, then they are literally unique, and 
indisunique seems safer.


Should we add the same CHECK constraint to temporal UNIQUE indexes? I'm 
inclined toward no, just as we don't forbid NULLs in parts of a UNIQUE 
key. We should try to pick what gives users more options, when possible. 
Even if it is questionably meaningful, I can see use cases for allowing 
empty ranges in a temporal table. For example it lets you "disable" a 
row, preserving its values but marking it as never true.


It looks like we missed some of these fundamental design questions early 
on, and it might be too late now to fix them for PG17.


For example, the discussion on unique constraints misses that the 
question of null values in unique constraints itself is controversial 
and that there is now a way to change the behavior.  So I imagine there 
is also a selection of possible behaviors you might want for empty 
ranges.  Intuitively, I don't think empty ranges are sensible for 
temporal unique constraints.  But anyway, it's a bit late now to be 
discussing this.


I'm also concerned that if ranges have this fundamental incompatibility 
with periods, then the plan to eventually evolve this patch set to 
support standard periods will also have as-yet-unknown problems.


Some of these issues might be design flaws in the underlying mechanisms, 
like range types and exclusion constraints.  Like, if you're supposed to 
use this for scheduling but you can use empty ranges to bypass exclusion 
constraints, how is one supposed to use this?  Yes, a check constraint 
using isempty() might be the right answer.  But I don't see this 
documented anywhere.
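
As a sketch of that answer (illustrative only, not something the patch
set adds automatically; table and column names are made up):

    -- Forbid empty ranges explicitly, so that the exclusion-based key
    -- is literally unique.
    CREATE TABLE t (
        id       int4range,
        valid_at daterange CHECK (NOT isempty(valid_at)),
        name     text,
        EXCLUDE USING gist (id WITH =, valid_at WITH &&)
    );

    -- Without the CHECK, both rows from the example above are accepted,
    -- because 'empty' && 'empty' is false; with it, they are rejected:
    INSERT INTO t VALUES ('[1,2)', 'empty', 'foo');
    -- ERROR:  new row for relation "t" violates check constraint ...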


On the technical side, adding an implicit check constraint as part of a 
primary key constraint is quite a difficult implementation task, as I 
think you are discovering.  I'm just reminded about how the patch for 
catalogued not-null constraints struggled with linking these not-null 
constraints to primary keys correctly.  This sounds a bit similar.


I'm afraid that these issues cannot be resolved in good time for this 
release, so we should revert this patch set for now.






doc: some fixes for environment sections in ref pages

2024-05-13 Thread Peter Eisentraut
I noticed that the reference pages for initdb and pg_ctl claim in the 
Environment section that libpq variables are used, which does not seem 
correct to me.  I think this was accidentally copied when this blurb was 
added to other pages.


While I was checking around that, I also noticed that pg_amcheck and 
pg_upgrade don't have Environment sections on their reference pages, so 
I added them.  For pg_amcheck I copied the standard text for client 
programs.  pg_upgrade has its own specific list of environment variables.


Patches attached.  I think the first one is a bug fix.
From 3f573c5935d46b20de7e7129cd0bf69abed1df6c Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Mon, 13 May 2024 10:10:21 +0200
Subject: [PATCH 1/3] doc: Remove claims that initdb and pg_ctl use libpq
 environment variables

Erroneously introduced by 571df93cff8.
---
 doc/src/sgml/ref/initdb.sgml | 6 --
 doc/src/sgml/ref/pg_ctl-ref.sgml | 7 ---
 2 files changed, 13 deletions(-)

diff --git a/doc/src/sgml/ref/initdb.sgml b/doc/src/sgml/ref/initdb.sgml
index 6c96c0c0681..74008a9a82f 100644
--- a/doc/src/sgml/ref/initdb.sgml
+++ b/doc/src/sgml/ref/initdb.sgml
@@ -632,12 +632,6 @@ Environment

   
 
-  
-   This utility, like most other PostgreSQL 
utilities,
-   also uses the environment variables supported by 
libpq
-   (see ).
-  
-
  
 
  
diff --git a/doc/src/sgml/ref/pg_ctl-ref.sgml b/doc/src/sgml/ref/pg_ctl-ref.sgml
index 46906966eb9..a0287bb81d6 100644
--- a/doc/src/sgml/ref/pg_ctl-ref.sgml
+++ b/doc/src/sgml/ref/pg_ctl-ref.sgml
@@ -578,13 +578,6 @@ Environment
unless PGDATA is set.
   
 
-  
-   pg_ctl, like most other 
PostgreSQL
-   utilities,
-   also uses the environment variables supported by 
libpq
-   (see ).
-  
-
   
For additional variables that affect the server,
see .

base-commit: 3ca43dbbb67fbfb96dec8de2e268b96790555148
-- 
2.44.0

From 362f6ed36bebc2d1f0bdaf00f54d9212b344d98f Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Mon, 13 May 2024 10:12:02 +0200
Subject: [PATCH 2/3] doc: Add standard Environment section to pg_amcheck ref
 page

---
 doc/src/sgml/ref/pg_amcheck.sgml | 18 ++
 1 file changed, 18 insertions(+)

diff --git a/doc/src/sgml/ref/pg_amcheck.sgml b/doc/src/sgml/ref/pg_amcheck.sgml
index 067c806b46d..2b9634b3ac2 100644
--- a/doc/src/sgml/ref/pg_amcheck.sgml
+++ b/doc/src/sgml/ref/pg_amcheck.sgml
@@ -644,6 +644,24 @@ Options
   
  
 
+ 
+  Environment
+
+  
+   pg_amcheck, like most other 
PostgreSQL
+   utilities,
+   also uses the environment variables supported by 
libpq
+   (see ).
+  
+
+  
+   The environment variable PG_COLOR specifies whether to use
+   color in diagnostic messages. Possible values are
+   always, auto and
+   never.
+  
+ 
+
  
   Notes
 
-- 
2.44.0

From f276cf62d1ebb6dc4c501a5d1b397aeea6630fcb Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Mon, 13 May 2024 10:25:16 +0200
Subject: [PATCH 3/3] doc: Add standard Environment section to pg_upgrade ref
 page

---
 doc/src/sgml/ref/pgupgrade.sgml | 98 +
 1 file changed, 98 insertions(+)

diff --git a/doc/src/sgml/ref/pgupgrade.sgml b/doc/src/sgml/ref/pgupgrade.sgml
index 10c842adb14..9877f2f01c6 100644
--- a/doc/src/sgml/ref/pgupgrade.sgml
+++ b/doc/src/sgml/ref/pgupgrade.sgml
@@ -939,6 +939,104 @@ Reverting to old cluster
 
  
 
+ 
+  Environment
+
+  
+   Some environment variables can be used to provide defaults for command-line 
options:
+
+   
+
+ PGBINOLD
+
+ 
+  
+   The old PostgreSQL executable directory; option
+   -b/--old-bindir.
+  
+ 
+
+
+
+ PGBINNEW
+
+ 
+  
+   The new PostgreSQL executable directory; option
+   -B/--new-bindir.
+  
+ 
+
+
+
+ PGDATAOLD
+
+ 
+  
+   The old database cluster configuration directory; option
+   -d/--old-datadir.
+  
+ 
+
+
+
+ PGDATANEW
+
+ 
+  
+   The new database cluster configuration directory; option
+   -D/--new-datadir.
+  
+ 
+
+
+
+ PGPORTOLD
+
+ 
+  
+   The old cluster port number; option
+   -p/--old-port.
+  
+ 
+
+
+
+ PGPORTNEW
+
+ 
+  
+   The new cluster port number; option
+   -P/--new-port.
+  
+ 
+
+
+
+ PGSOCKETDIR
+
+ 
+  
+   Directory to use for postmaster sockets during upgrade; option
+   -s/--socketdir.
+  
+ 
+
+
+
+ PGUSER
+
+ 
+  
+   Cluster's install user name; option
+   -U/--username.
+  
+ 
+
+   
+  
+ 
+
  
   Notes
 
-- 
2.44.0



Re: Converting README documentation to Markdown

2024-05-13 Thread Peter Eisentraut

On 08.04.24 21:29, Daniel Gustafsson wrote:

Over in [0] I asked whether it would be worthwhile converting all our README
files to Markdown, and since it wasn't met with pitchforks I figured it would
be an interesting exercise to see what it would take (my honest gut feeling
was that it would be way too intrusive).  Markdown does bring a few key
features however, so IMHO it's worth attempting to see:

* New developers are very used to reading/writing it
* Using a defined format ensures some level of consistency
* Many users and contributors, new *as well as* old, like reading documentation
  nicely formatted in a browser
* The documentation now prints really well
* pandoc et al. can be used to render nice-looking PDFs
* All the same benefits as discussed in [0]

The plan was to follow Gruber's original motivation for Markdown closely:

"The idea is that a Markdown-formatted document should be publishable
as-is, as plain text, without looking like it’s been marked up with
tags or formatting instructions."

This translates to making the fewest changes needed to achieve a) retained
plain-text readability at today's level, b) proper Markdown rendering, not
looking like text files in an HTML window, and c) absolutely no reflows and
minimal impact on git blame.


I started looking through this and immediately found a bunch of tiny 
problems.  (This is probably in part because the READMEs under 
src/backend/access/ are some of the more complicated ones, but then they 
are also the ones that might benefit most from better rendering.)


One general problem is that original Markdown and GitHub-flavored 
Markdown (GFM) are incompatible in some interesting aspects.  For 
example, the line


A split initially marks the left page with the F_FOLLOW_RIGHT flag.

is rendered by GFM as you'd expect.  But original Markdown converts it to

A split initially marks the left page with the FFOLLOWRIGHT
flag.

This kind of problem is pervasive, as you'd expect.

Another incompatibility is that GFM accepts "1)" as a list marker (which 
appears to be used often in the READMEs), but original Markdown does 
not.  This then also affects surrounding formatting.


Also, the READMEs often do not indent lists in a non-ambiguous way.  For 
example, if you look into src/backend/optimizer/README, section "Join 
Tree Construction", there are two list items, but it's not immediately 
clear which paragraphs belong to the list and which ones follow the 
list.  This also interacts with the previous point.  The resulting 
formatting in GFM is quite misleading.


src/port/README.md is a similar case.

There are also various places where whitespace is used for ad-hoc 
formatting.  Consider for example in src/backend/access/gin/README


  the "category" of the null entry.  These are the possible categories:

1 = ordinary null key value extracted from an indexable item
2 = placeholder for zero-key indexable item
3 = placeholder for null indexable item

  Placeholder null entries are inserted into the index because otherwise

But this does not preserve the list-like formatting, it just flows it 
together.


There is a similar case with the authors list at the end of 
src/backend/access/gist/README.md.


src/test/README.md wasn't touched by your patch, but it also needs 
adjustments for list formatting.



In summary, I think before we could accept this, we'd need to go through 
this with a fine-toothed comb line by line and page by page to make sure 
the formatting is still sound.  And we'd need to figure out which 
Markdown flavor to target.






Convert sepgsql tests to TAP

2024-05-13 Thread Peter Eisentraut
The sepgsql tests have not been integrated into the Meson build system 
yet.  I propose to fix that here.


One problem there was that the tests use a very custom construction 
where a top-level shell script internally calls make.  I have converted 
this to a TAP script that does the preliminary checks and then calls 
pg_regress directly, without make.  This seems to get the job done. 
Also, once you have your SELinux environment set up as required, the 
test now works fully automatically; you don't have to do any manual prep 
work.  The whole thing is guarded by PG_TEST_EXTRA=sepgsql now.


Some comments and questions:

- Do we want to keep the old way to run the test?  I don't know all the 
testing scenarios that people might be interested in, but of course it 
would also be good to cut down on the duplication in the test files.


- Strangely, there was apparently no way so far to get to the build
directory from a TAP script.  TAP scripts only ever want to read files
from the source directory.  So I had to add that.


- If you go through the pre-test checks in contrib/sepgsql/test_sepgsql, 
I have converted most of these checks to the Perl script.  Some of the 
checks are obsolete, because they check whether the database has been 
correctly initialized, which is now done by the TAP script anyway.  One 
check that I wasn't sure about is the


# 'psql' command must be executable from test domain

The old test was checking the installation tree, which I guess could be 
set up in random ways.  But do we need this kind of check if we are 
using a temporary installation?


As mentioned in the patch, the documentation needs to be updated.  This 
depends on the outcome of the question above whether we want to keep the 
old tests in some way.
From 2f8e73932b1068caf696582487de9e100fcd46be Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Mon, 13 May 2024 07:55:55 +0200
Subject: [PATCH v1] Convert sepgsql tests to TAP

Add a TAP test for sepgsql.  This automates the previously required
manual setup before the test.  The actual tests are still run by
pg_regress, but now called from within the TAP Perl script.

TODO: remove the old test scripts?
---
 contrib/sepgsql/.gitignore   |   3 +
 contrib/sepgsql/Makefile |   3 +
 contrib/sepgsql/meson.build  |  11 +-
 contrib/sepgsql/t/001_sepgsql.pl | 248 +++
 doc/src/sgml/regress.sgml|  11 ++
 doc/src/sgml/sepgsql.sgml|   2 +
 meson.build  |   1 +
 src/Makefile.global.in   |   1 +
 8 files changed, 279 insertions(+), 1 deletion(-)
 create mode 100644 contrib/sepgsql/t/001_sepgsql.pl

diff --git a/contrib/sepgsql/.gitignore b/contrib/sepgsql/.gitignore
index 31613e011f5..7cadb9419e9 100644
--- a/contrib/sepgsql/.gitignore
+++ b/contrib/sepgsql/.gitignore
@@ -1,7 +1,10 @@
 /sepgsql.sql
+# FIXME
 /sepgsql-regtest.fc
 /sepgsql-regtest.if
 /sepgsql-regtest.pp
 /tmp
 # Generated subdirectories
+/log/
 /results/
+/tmp_check/
diff --git a/contrib/sepgsql/Makefile b/contrib/sepgsql/Makefile
index afca75b693f..5cc9697736c 100644
--- a/contrib/sepgsql/Makefile
+++ b/contrib/sepgsql/Makefile
@@ -15,6 +15,9 @@ OBJS = \
 DATA_built = sepgsql.sql
 PGFILEDESC = "sepgsql - SELinux integration"
 
+TAP_TESTS = 1
+
+# FIXME
 # Note: because we don't tell the Makefile there are any regression tests,
 # we have to clean those result files explicitly
 EXTRA_CLEAN = -r $(pg_regress_clean_files) tmp/ *.pp sepgsql-regtest.if 
sepgsql-regtest.fc
diff --git a/contrib/sepgsql/meson.build b/contrib/sepgsql/meson.build
index 9544efe0287..5817ba30a58 100644
--- a/contrib/sepgsql/meson.build
+++ b/contrib/sepgsql/meson.build
@@ -40,4 +40,13 @@ contrib_targets += custom_target('sepgsql.sql',
   install_dir: contrib_data_args['install_dir'],
 )
 
-# TODO: implement sepgsql tests
+tests += {
+  'name': 'sepgsql',
+  'sd': meson.current_source_dir(),
+  'bd': meson.current_build_dir(),
+  'tap': {
+'tests': [
+  't/001_sepgsql.pl',
+],
+  },
+}
diff --git a/contrib/sepgsql/t/001_sepgsql.pl b/contrib/sepgsql/t/001_sepgsql.pl
new file mode 100644
index 000..82bba5774ce
--- /dev/null
+++ b/contrib/sepgsql/t/001_sepgsql.pl
@@ -0,0 +1,248 @@
+
+# Copyright (c) 2024, PostgreSQL Global Development Group
+
+use strict;
+use warnings FATAL => 'all';
+
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Utils;
+
+use Test::More;
+
+if (!$ENV{PG_TEST_EXTRA} || $ENV{PG_TEST_EXTRA} !~ /\bsepgsql\b/)
+{
+   plan skip_all =>
+ 'Potentially unsafe test sepgsql not enabled in PG_TEST_EXTRA';
+}
+
+note "checking selinux environment";
+
+# matchpathcon must be present to assess whether the installation environment
+# is OK.
+note "checking for matchpathcon";
+if (system('matchpathcon -n . >/dev/null 2>&1') != 0)
+{
+   diag </dev/null 2>&1') != 0)
+{
+   diag </dev/null 2>&1') != 0)
+{
+   diag </dev/null | sed 's/:/

Re: elog/ereport VS misleading backtrace_function function address

2024-05-12 Thread Peter Eisentraut

On 07.05.24 09:43, Jakub Wartak wrote:

NOTE: for anyone testing this: one cannot ./configure with
--enable-debug, as it prevents the compiler optimizations that produce
the ".cold" branches that cause backtrace() to return the wrong
address.


Is that configuration useful?  If you're interested in backtraces, 
wouldn't you also want debug symbols?  Don't production builds use debug 
symbols nowadays as well?



I recall speculating about whether we could somehow invoke gdb
to get a more comprehensive and accurate backtrace, but I don't
really have a concrete idea how that could be made to work.

Oh no, I'm more about micro-fix rather than doing some bigger
revolution. The patch simply adds this one instruction in macro, so
that now backtrace_functions behaves better:

0x00773d28 <+105>:   call   0x79243f 
0x00773d2d <+110>:   movzbl -0x12(%rbp),%eax  << this ends up being added by the patch
0x00773d31 <+114>:   call   0xdc1a0 

I'll put that in the commitfest for PG18, but maybe it could be
backpatched too - no hard opinion on that (I don't see how that
ERROR/FATAL path could cause any performance regressions)


I'm missing a principled explanation of what this does.  I just see that 
it sprinkles some no-op code to make this particular thing work a bit 
differently, but this might just behave differently with the next 
compiler next year.  I'd like to see a bit more of an abstract 
explanation of what is happening here.






Re: An improved README experience for PostgreSQL

2024-05-12 Thread Peter Eisentraut

On 17.04.24 04:36, Nathan Bossart wrote:

On Wed, Feb 28, 2024 at 02:21:49PM -0600, Nathan Bossart wrote:

I see many projects have files like SECURITY.md, CODE_OF_CONDUCT.md, and
CONTRIBUTING.md, and I think it would be relatively easy to add content to
each of those for PostgreSQL, even if they just link elsewhere.

Here's a first attempt at this.  You can see some of the effects of these
files at [0].  More information about these files is available at [1] [2]
[3].

I figured we'd want to keep these pretty bare-bones to avoid duplicating
information that's already available at postgresql.org, but perhaps it's
worth filling them out a bit more.  Anyway, I just wanted to gauge interest
in this stuff.


I don't know, I find these files kind of "yelling".  It's fine to have a 
couple, but now it's getting a bit much, and there are more that could 
be added.


If we want to enhance the GitHub experience, we can also add these files 
to the organization instead: 
https://docs.github.com/en/communities/setting-up-your-project-for-healthy-contributions/creating-a-default-community-health-file






Re: Requiring LLVM 14+ in PostgreSQL 18

2024-05-12 Thread Peter Eisentraut

On 24.04.24 01:43, Thomas Munro wrote:

Rebased over ca89db5f.


These patches look fine to me.  The new cut-off makes sense, and it does 
save quite a bit of code.  We do need to get the Cirrus CI Debian images 
updated first, as you had already written.


As part of this patch, you also sneak in support for LLVM 18 
(llvm-config-18, clang-18 in configure).  Should this be a separate patch?


And as I'm looking up how this was previously handled, I notice that 
this list of clang-NN versions was last updated equally sneakily as part 
of your patch to trim off LLVM <10 (820b5af73dc).  I wonder if the 
original intention of that configure code was that maintaining the 
versioned list above clang-7/llvm-config-7 was not needed, because the 
unversioning programs could be used, or maybe because pkg-config could 
be used.  It would be nice if we could get rid of having to update that.






Re: Logging which interface was connected to in log_line_prefix

2024-05-12 Thread Peter Eisentraut

On 01.05.24 19:04, Greg Sabino Mullane wrote:
Thank you for taking the time to review this. I've attached a new 
rebased version, which has no significant changes.


There is a comment in the patch that states:

/* We do not need clean_ipv6_addr here: just report verbatim */

I am not quite sure what it means, but I am guessing it means that
the patch does not need to format the IPv6 addresses in any specific
way.


Yes, basically correct. There is a kluge (their word, not mine) in 
utils/adt/network.c to strip the zone - see the comment for the  
clean_ipv6_addr() function in that file. I added the patch comment in 
case some future person wonders why we don't "clean up" the ipv6 
address, like other places in the code base do. We don't need to pass it 
back to anything else, so we can simply output the correct version, zone 
and all.


clean_ipv6_addr() needs to be called before trying to convert a string 
representation into inet/cidr types.  This is not what is happening 
here.  So the comment is not applicable.






Re: Logging which interface was connected to in log_line_prefix

2024-05-12 Thread Peter Eisentraut

On 06.03.24 16:59, Greg Sabino Mullane wrote:
Someone on -general was asking about this, as they are listening on 
multiple IPs and would like to know which exact one clients were 
hitting. I took a quick look and we already have that information, so I 
grabbed some stuff from inet_server_addr and added it as part of a "%L" 
(for 'local interface'). Quick patch / POC attached.


I was confused by this patch title.  This feature does not log the 
interface (like "eth0" or "lo"), but the local address.  Please adjust 
the terminology.


I noticed that for Unix-domain socket connections, %r and %h write 
"[local]".  I think that should be done for this new placeholder as well.






Re: [PATCH] Replace magic constant 3 with NUM_MERGE_MATCH_KINDS

2024-05-12 Thread Peter Eisentraut

On 19.04.24 11:47, Aleksander Alekseev wrote:

Thanks. I see a few pieces of code that use special FOO_NUMBER enum
values instead of a macro. Should we refactor these pieces
accordingly? PFA another patch.


I think this is a sensible improvement.

But please keep the trailing commas on the last enum items.





Re: Build the docs if there are changes in docs and don't run other tasks if the changes are only in docs

2024-05-12 Thread Peter Eisentraut

On 14.12.23 14:40, Nazir Bilal Yavuz wrote:

On Fri, 6 Oct 2023 at 17:07, Tom Lane  wrote:


As a quick cross-check, I searched our commit log to see how many
README-only commits there were so far this year.  I found 11 since
January.  (Several were triggered by the latest round of pgindent
code and process changes, so maybe this is more than typical.)

Not sure what that tells us about the value of changing the CI
logic, but it does seem like it could be worth the one-liner
change needed to teach buildfarm animals to ignore READMEs.


I agree that it could be worth implementing this logic on buildfarm animals.

In case we want to implement the same logic on the CI, I added a new
version of the patch; it skips CI completely if the changes are only
in the README files.


I don't see how this could be applicable widely enough to be useful:

- While there are some patches that touch on README files, very few of 
those end up in a commit fest.


- If someone manually pushes a change to their own CI environment, I 
don't see why we need to second-guess that.


- Buildfarm runs generally batch several commits together, so it is very 
unlikely that this would be hit.


I think unless some concrete reason for this change can be shown, we 
should drop it.






Re: Add TAP tests for backtrace functionality (was Re: Add test module for verifying backtrace functionality)

2024-05-12 Thread Peter Eisentraut

On 16.03.24 05:25, Bharath Rupireddy wrote:

Postgres has a good amount of code for dealing with backtraces - two
GUCs backtrace_functions and backtrace_on_internal_error,
errbacktrace; all of which use core function set_backtrace from
elog.c. I've not seen this code being tested at all, see code coverage
report - 
https://coverage.postgresql.org/src/backend/utils/error/elog.c.gcov.html.

I think adding a simple test module (containing no .c files) with only
TAP tests will help cover this code. I ended up having it as a
separate module under src/test/modules/test_backtrace as I was not
able to find an existing TAP file in src/test to add these tests.  I'm
able to verify the backtrace related code with the attached patch
consistently. The TAP tests rely on the fact that the server emits
text "BACKTRACE: " to server logs before logging the backtrace, and
the backtrace contains the function name in which the error occurs.
I've turned off query statement logging (set log_statement = none,
log_min_error_statement = fatal) so that the tests get to see the
functions only in the backtrace. Although the CF bot is happy with the
attached patch 
https://github.com/BRupireddy2/postgres/tree/add_test_module_for_bcktrace_functionality_v1,
there might be some more flakiness to it.

Thoughts?


Ran pgperltidy on the new TAP test file added. Please see the attached v2 patch.


I've now moved the new TAP test file to src/test/modules/test_misc/t
as opposed to a new test module to keep it simple. I was not sure why
I hadn't done that in the first place.


Note that backtrace_on_internal_error has been removed, so this patch 
will need to be adjusted for that.


I suggest you consider joining forces with thread [0] where a 
replacement for backtrace_on_internal_error is being discussed.  Having 
some test coverage for whatever is being developed there might be useful.



[0]: 
https://www.postgresql.org/message-id/flat/CAGECzQTpdujCEt2SH4DBwRLoDq4HJArGDaxJSsWX0G=tnnz...@mail.gmail.com






Re: SQL:2011 application time

2024-05-10 Thread Peter Eisentraut
I have committed the 
v2-0001-Fix-ON-CONFLICT-DO-NOTHING-UPDATE-for-temporal-in.patch from 
this (confusingly, there was also a v2 earlier in this thread), and I'll 
continue working on the remaining items.



On 09.05.24 06:24, Paul Jungwirth wrote:
Here are a couple new patches, rebased to e305f715, addressing Peter's 
feedback. I'm still working on integrating jian he's suggestions for the 
last patch, so I've omitted that one here.


On 5/8/24 06:51, Peter Eisentraut wrote:
About v2-0001-Fix-ON-CONFLICT-DO-NOTHING-UPDATE-for-temporal-in.patch, 
I think the
ideas are right, but I wonder if we can fine-tune the new conditionals 
a bit.


--- a/src/backend/executor/execIndexing.c
+++ b/src/backend/executor/execIndexing.c
@@ -210,7 +210,7 @@ ExecOpenIndices(ResultRelInfo *resultRelInfo, bool 
speculative)
  * If the indexes are to be used for speculative 
insertion, add extra

  * information required by unique index entries.
  */
-   if (speculative && ii->ii_Unique)
+   if (speculative && ii->ii_Unique && 
!ii->ii_HasWithoutOverlaps)

 BuildSpeculativeIndexInfo(indexDesc, ii);

Here, I think we could check !indexDesc->rd_index->indisexclusion 
instead.  So we

wouldn't need ii_HasWithoutOverlaps.


Okay.

Or we could push this into BuildSpeculativeIndexInfo(); it could just 
skip the rest
if an exclusion constraint is passed, on the theory that all the 
speculative index

info is already present in that case.


I like how BuildSpeculativeIndexInfo starts with an Assert that it's 
given a unique index, so I've left the check outside the function. This 
seems cleaner anyway: the function stays more focused.



--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -815,7 +815,7 @@ infer_arbiter_indexes(PlannerInfo *root)
  */
 if (indexOidFromConstraint == idxForm->indexrelid)
 {
-   if (!idxForm->indisunique && onconflict->action == 
ONCONFLICT_UPDATE)
+   if ((!idxForm->indisunique || idxForm->indisexclusion) && 
onconflict->action == ONCONFLICT_UPDATE)

 ereport(ERROR,
 (errcode(ERRCODE_WRONG_OBJECT_TYPE),
  errmsg("ON CONFLICT DO UPDATE not supported 
with exclusion constraints")));


Shouldn't this use only idxForm->indisexclusion anyway?  Like

+   if (idxForm->indisexclusion && onconflict->action == 
ONCONFLICT_UPDATE)


That matches what the error message is reporting afterwards.


Agreed.


  * constraints), so index under consideration can be immediately
  * skipped if it's not unique
  */
-   if (!idxForm->indisunique)
+   if (!idxForm->indisunique || idxForm->indisexclusion)
 goto next;

Maybe here we need a comment.  Or make that a separate statement, like


Yes, that is nice. Done.

Yours,







consider -Wmissing-variable-declarations

2024-05-09 Thread Peter Eisentraut
In [0] I had noticed that we have no automated verification that global 
variables are declared in header files.  (For global functions, we have 
this through -Wmissing-prototypes.)  As I mentioned there, I discovered 
the Clang compiler option -Wmissing-variable-declarations, which does 
exactly that.  Clang has supported this for quite some time, and GCC 14, 
which was released a few days ago, now also supports it.  I went and 
installed this option into the standard build flags and cleaned up the 
warnings it found, which revealed a number of interesting things.


I think these checks are important.  We have been trying to mark global 
variables as PGDLLIMPORT consistently, but that only catches variables 
declared in header files.  Also, a threading project would surely 
benefit from global variables (thread-local variables?) having 
consistent declarations.


Attached are patches organized by sub-topic.  The most dubious stuff is 
in patches 0006 and 0007.  A bunch of GUC-related variables are not in 
header files but are pulled in via ad-hoc extern declarations.  I can't 
recognize an intentional scheme there, probably just done for 
convenience or copied from previous practice.  These should be organized 
into appropriate header files.



[0]: 
https://www.postgresql.org/message-id/c4ac402f-9d7b-45fa-bbc1-4a5bf0a9f...@eisentraut.org
From 296a1c465de6cfea333bf7bb7950f02b3ef80b12 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Wed, 8 May 2024 13:49:37 +0200
Subject: [PATCH v1 1/7] Add -Wmissing-variable-declarations to the standard
 compilation flags

This warning flag detects global variables not declared in header
files.  This is similar to what -Wmissing-prototypes does for
functions.  (More correctly, it is similar to what
-Wmissing-declarations does for functions, but -Wmissing-prototypes is
a superset of that in C.)

This flag is new in GCC 14.  Clang has supported it for a while.

XXX many warnings to fix first
---
 configure | 99 +++
 configure.ac  |  9 +++
 meson.build   | 10 +++
 src/Makefile.global.in|  1 +
 src/interfaces/ecpg/test/Makefile.regress |  2 +-
 src/interfaces/ecpg/test/meson.build  |  1 +
 src/makefiles/meson.build |  2 +
 src/tools/pg_bsd_indent/Makefile  |  2 +
 src/tools/pg_bsd_indent/meson.build   |  1 +
 9 files changed, 126 insertions(+), 1 deletion(-)

diff --git a/configure b/configure
index 89644f2249e..70fd1de88b5 100755
--- a/configure
+++ b/configure
@@ -741,6 +741,7 @@ CXXFLAGS_SL_MODULE
 CFLAGS_SL_MODULE
 CFLAGS_VECTORIZE
 CFLAGS_UNROLL_LOOPS
+PERMIT_MISSING_VARIABLE_DECLARATIONS
 PERMIT_DECLARATION_AFTER_STATEMENT
 LLVM_BINPATH
 LLVM_CXXFLAGS
@@ -6080,6 +6081,104 @@ if test x"$pgac_cv_prog_CXX_cxxflags__Wformat_security" 
= x"yes"; then
 fi
 
 
+  # gcc 14+, clang for a while
+  save_CFLAGS=$CFLAGS
+
+{ $as_echo "$as_me:${as_lineno-$LINENO}: checking whether ${CC} supports 
-Wmissing-variable-declarations, for CFLAGS" >&5
+$as_echo_n "checking whether ${CC} supports -Wmissing-variable-declarations, 
for CFLAGS... " >&6; }
+if ${pgac_cv_prog_CC_cflags__Wmissing_variable_declarations+:} false; then :
+  $as_echo_n "(cached) " >&6
+else
+  pgac_save_CFLAGS=$CFLAGS
+pgac_save_CC=$CC
+CC=${CC}
+CFLAGS="${CFLAGS} -Wmissing-variable-declarations"
+ac_save_c_werror_flag=$ac_c_werror_flag
+ac_c_werror_flag=yes
+cat confdefs.h - <<_ACEOF >conftest.$ac_ext
+/* end confdefs.h.  */
+
+int
+main ()
+{
+
+  ;
+  return 0;
+}
+_ACEOF
+if ac_fn_c_try_compile "$LINENO"; then :
+  pgac_cv_prog_CC_cflags__Wmissing_variable_declarations=yes
+else
+  pgac_cv_prog_CC_cflags__Wmissing_variable_declarations=no
+fi
+rm -f core conftest.err conftest.$ac_objext conftest.$ac_ext
+ac_c_werror_flag=$ac_save_c_werror_flag
+CFLAGS="$pgac_save_CFLAGS"
+CC="$pgac_save_CC"
+fi
+{ $as_echo "$as_me:${as_lineno-$LINENO}: result: 
$pgac_cv_prog_CC_cflags__Wmissing_variable_declarations" >&5
+$as_echo "$pgac_cv_prog_CC_cflags__Wmissing_variable_declarations" >&6; }
+if test x"$pgac_cv_prog_CC_cflags__Wmissing_variable_declarations" = x"yes"; 
then
+  CFLAGS="${CFLAGS} -Wmissing-variable-declarations"
+fi
+
+
+  PERMIT_MISSING_VARIABLE_DECLARATIONS=
+  if test x"$save_CFLAGS" != x"$CFLAGS"; then
+PERMIT_MISSING_VARIABLE_DECLARATIONS=-Wno-missing-variable-declarations
+  fi
+
+  { $as_echo "$as_me:${as_lineno-$LINENO}: checking whether ${CXX} supports 
-Wmissing-variable-declarations, for CXXFLAGS" >&5
+$as_echo_n "checking whether ${CXX} supports -Wmissing-variable-declarations, 
for CXXFLAGS... " >&6; }
+if ${pgac_cv_prog_CXX_cxxflags__Wmissing_variable_declarations+:} false; then :
+  $as_echo_n &

Re: SQL:2011 application time

2024-05-08 Thread Peter Eisentraut

On 30.04.24 18:39, Paul Jungwirth wrote:

On 4/30/24 09:24, Robert Haas wrote:

Peter, could you have a look at
http://postgr.es/m/47550967-260b-4180-9791-b224859fe...@illuminatedcomputing.com
and express an opinion about whether each of those proposals are (a)
good or bad ideas and (b) whether they need to be fixed for the
current release?


Here are the same patches but rebased.


I have committed 
v2-0002-Add-test-for-REPLICA-IDENTITY-with-a-temporal-key.patch.

About v2-0001-Fix-ON-CONFLICT-DO-NOTHING-UPDATE-for-temporal-in.patch, I think 
the
ideas are right, but I wonder if we can fine-tune the new conditionals a bit.

--- a/src/backend/executor/execIndexing.c
+++ b/src/backend/executor/execIndexing.c
@@ -210,7 +210,7 @@ ExecOpenIndices(ResultRelInfo *resultRelInfo, bool 
speculative)
 * If the indexes are to be used for speculative insertion, add 
extra
 * information required by unique index entries.
 */
-   if (speculative && ii->ii_Unique)
+   if (speculative && ii->ii_Unique && !ii->ii_HasWithoutOverlaps)
BuildSpeculativeIndexInfo(indexDesc, ii);

Here, I think we could check !indexDesc->rd_index->indisexclusion instead.  So 
we
wouldn't need ii_HasWithoutOverlaps.

Or we could push this into BuildSpeculativeIndexInfo(); it could just skip the 
rest
if an exclusion constraint is passed, on the theory that all the speculative 
index
info is already present in that case.

--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -815,7 +815,7 @@ infer_arbiter_indexes(PlannerInfo *root)
 */
if (indexOidFromConstraint == idxForm->indexrelid)
{
-   if (!idxForm->indisunique && onconflict->action == 
ONCONFLICT_UPDATE)
+   if ((!idxForm->indisunique || idxForm->indisexclusion) && 
onconflict->action == ONCONFLICT_UPDATE)
ereport(ERROR,
(errcode(ERRCODE_WRONG_OBJECT_TYPE),
 errmsg("ON CONFLICT DO UPDATE not supported with exclusion 
constraints")));

Shouldn't this use only idxForm->indisexclusion anyway?  Like

+   if (idxForm->indisexclusion && onconflict->action == 
ONCONFLICT_UPDATE)

That matches what the error message is reporting afterwards.

 * constraints), so index under consideration can be immediately
 * skipped if it's not unique
 */
-   if (!idxForm->indisunique)
+   if (!idxForm->indisunique || idxForm->indisexclusion)
goto next;

Maybe here we need a comment.  Or make that a separate statement, like

/* not supported yet etc. */
if (idxForm->indisexclusion)
    goto next;
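
In user-visible terms, the behavior these conditionals enforce is
roughly the following (a sketch; table and constraint names are made up,
and the temporal key is assumed to be backed by an exclusion-style
index):

    -- DO NOTHING is accepted with a temporal (WITHOUT OVERLAPS) key:
    INSERT INTO tbl VALUES ('[1,2)', daterange('2024-01-01', '2024-02-01'))
    ON CONFLICT DO NOTHING;

    -- DO UPDATE is not:
    INSERT INTO tbl VALUES ('[1,2)', daterange('2024-01-01', '2024-02-01'))
    ON CONFLICT ON CONSTRAINT tbl_pkey DO UPDATE
        SET valid_at = EXCLUDED.valid_at;
    -- ERROR:  ON CONFLICT DO UPDATE not supported with exclusion constraints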





Re: AIX support

2024-05-08 Thread Peter Eisentraut

On 08.05.24 13:39, Sriram RK wrote:
We would like to understand your inputs/plans on reverting the changes 
for AIX.


I think the ship has sailed for PG17.  The way forward would be that you 
submit a patch for new, modernized AIX support for PG18.






Re: PERIOD foreign key feature

2024-05-08 Thread Peter Eisentraut

On 07.05.24 18:43, Paul Jungwirth wrote:

On 5/7/24 08:23, David G. Johnston wrote:
On Tue, May 7, 2024 at 7:54 AM Bruce Momjian > wrote:

    In the two marked lines, it says "if one side of the foreign key uses
    PERIOD, the other side must too."  However, looking at the example
    queries, it seems like if the foreign side has PERIOD, the primary 
side

    must have WITHOUT OVERLAPS, not PERIOD.

    Does this doc text need correcting?


The text is factually correct, though a bit hard to parse.

"the other side" refers to the part after "REFERENCES":

FOREIGN KEY ( column_name [, ... ] [, PERIOD column_name ] ) 
REFERENCES reftable [ ( refcolumn [, ... ] [, PERIOD column_name ] ) ]


***(shouldn't the second occurrence be [, PERIOD refcolumn] ?)

The text is pointing out that since the refcolumn specification is 
optional you may very well not see a second PERIOD keyword in the 
clause.  Instead it will be inferred from the PK.


Maybe:

Finally, if the foreign key has a PERIOD column_name specification the 
corresponding refcolumn, if present, must also be marked PERIOD.  If 
the refcolumn clause is omitted, and thus the reftable's primary key 
constraint chosen, the primary key must have its final column marked 
WITHOUT OVERLAPS.


Yes, David is correct here on all points. I like his suggestion to 
clarify the language here also. If you need a patch from me let me know, 
but I assume it's something a committer can just make happen?


In principle yes, but it's also very helpful if someone produces an 
actual patch file, with complete commit message, credits, mailing list 
link, etc.
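
For reference, the shape of the feature being described is roughly this
(a sketch against the syntax quoted above; all names are made up):

    CREATE TABLE pt (
        id       int4range,
        valid_at daterange,
        PRIMARY KEY (id, valid_at WITHOUT OVERLAPS)
    );

    CREATE TABLE ft (
        id       int4range,
        valid_at daterange,
        pt_id    int4range,
        -- PERIOD appears on both sides; if the refcolumn list were
        -- omitted, the referenced side would be inferred from the
        -- primary key, whose last column must be WITHOUT OVERLAPS:
        FOREIGN KEY (pt_id, PERIOD valid_at)
            REFERENCES pt (id, PERIOD valid_at)
    );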






Re: add --no-sync to pg_upgrade's calls to pg_dump and pg_dumpall

2024-05-08 Thread Peter Eisentraut

On 03.05.24 19:13, Nathan Bossart wrote:

This is likely small potatoes compared to some of the other
pg_upgrade-related improvements I've proposed [0] [1] or plan to propose,
but this is easy enough, and I already wrote the patch, so here it is.
AFAICT there's no reason to bother syncing these dump files to disk.  If
someone pulls the plug during pg_upgrade, it's not like you can resume
pg_upgrade from where it left off.  Also, I think we skipped syncing before
v10, anyway, as the --no-sync flag was only added in commit 96a7128, which
added the code to sync dump files, too.


Looks good to me.





Re: partitioning and identity column

2024-05-07 Thread Peter Eisentraut

On 30.04.24 12:59, Ashutosh Bapat wrote:

PFA patch which fixes all three problems.


I have committed this patch.  Thanks all.




Re: wrong comment in libpq.h

2024-05-06 Thread Peter Eisentraut

On 04.05.24 00:29, David Zhang wrote:

On 2024-05-03 4:53 a.m., Daniel Gustafsson wrote:

On 3 May 2024, at 13:48, Peter Eisentraut  wrote:
Maybe it's easier if we just replaced

    prototypes for functions in xxx.c

with

    declarations for xxx.c

throughout src/include/libpq/libpq.h.

+1

+1


It looks like this wording "prototypes for functions in" is used many 
times in src/include/, in many cases equally inaccurately, so I would 
suggest creating a more comprehensive patch for this.






Re: Cutting support for OpenSSL 1.0.1 and 1.0.2 in 17~?

2024-05-03 Thread Peter Eisentraut

On 03.05.24 10:39, Daniel Gustafsson wrote:

  I would
recommend updating the documentation of PQinitSSL and PQinitOpenSSL
to say that these become useless and are deprecated.

They are no-ops when linking against v18, but writing an extension which
targets all supported versions of postgres along with their respective
supported OpenSSL versions makes them still required, or am I missing something?


I don't think extensions come into play here, since this is libpq, so 
only the shared library interface compatibility matters.






Re: Support LIKE with nondeterministic collations

2024-05-03 Thread Peter Eisentraut

On 03.05.24 17:47, Daniel Verite wrote:

Peter Eisentraut wrote:


  However, off the top of my head, this definition has three flaws: (1)
It would make the single-character wildcard effectively an
any-number-of-characters wildcard, but only in some circumstances, which
could be confusing, (2) it would be difficult to compute, because you'd
have to check equality against all possible single-character strings,
and (3) it is not what the SQL standard says.


For #1 we're currently using the definition of a "character" as
being any single code point,


That is the definition that is used throughout SQL and PostgreSQL.  We 
can't change that without redefining everything.  To pick just one 
example, the various trim function also behave in seemingly inconsistent 
ways when you apply then to strings in different normalization forms. 
The better fix there is to enforce the normalization form somehow.



Intuitively I think that our interpretation of "character" here should
be whatever sequence of code points are between character
boundaries [1], and that the equality of such characters would be the
equality of their sequences of code points, with the string equality
check of the collation, whatever the length of these sequences.

[1]:
https://unicode-org.github.io/icu/userguide/boundaryanalysis/#character-boundary


Even that page says that what we are calling a character here is really 
called a grapheme cluster.


In a different world, pattern matching, character trimming, etc. would 
work by grapheme, but it does not.
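
To make the code-point definition concrete, a small sketch using
length() and normalize() (the Unicode escapes spell 'é' as one
precomposed code point vs. 'e' plus a combining acute):

    SELECT length(U&'\00E9'), length(U&'e\0301');   -- 1 and 2 "characters"
    SELECT normalize(U&'e\0301', NFC) = U&'\00E9';  -- true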






Re: Support LIKE with nondeterministic collations

2024-05-03 Thread Peter Eisentraut

On 03.05.24 16:58, Daniel Verite wrote:

* Generating bounds for a sort key (prefix matching)

Having sort keys for strings allows for easy creation of bounds -
sort keys that are guaranteed to be smaller or larger than any sort
key from a give range. For example, if bounds are produced for a
sortkey of string “smith”, strings between upper and lower bounds
with one level would include “Smith”, “SMITH”, “sMiTh”. Two kinds
of upper bounds can be generated - the first one will match only
strings of equal length, while the second one will match all the
strings with the same initial prefix.

CLDR 1.9/ICU 4.6 and later map U+FFFF to a collation element with
the maximum primary weight, so that for example the string
“smith\uFFFF” can be used as the upper bound rather than modifying
the sort key for “smith”.

In other words it says that

   col LIKE 'smith%' collate "nd"

is equivalent to:

   col >= 'smith' collate "nd" AND col < U&'smith\FFFF' collate "nd"

which could be obtained from an index scan, assuming a btree
index on "col" collate "nd".

U+FFFF is a valid code point but a "non-character" [1] so it's
not supposed to be present in normal strings.


Thanks, this could be very useful!
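
A sketch of what that rewrite could look like in practice (assuming a
nondeterministic collation "nd" and a made-up table tbl with a text
column col):

    CREATE INDEX ON tbl (col COLLATE "nd");

    SELECT * FROM tbl
    WHERE col >= 'smith' COLLATE "nd"
      AND col <  U&'smith\FFFF' COLLATE "nd";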





Re: pg_trgm comparison bug on cross-architecture replication due to different char implementation

2024-05-03 Thread Peter Eisentraut

On 03.05.24 16:13, Tom Lane wrote:

Peter Eisentraut  writes:

On 30.04.24 19:29, Tom Lane wrote:

Also, the bigger picture here is the seeming assumption that "if
we change pg_trgm then it will be safe to replicate from x86 to
arm".  I don't believe that that's a good idea and I'm unwilling
to promise that it will work, regardless of what we do about
char signedness.  That being the case, I don't want to invest a
lot of effort in the signedness issue.  Option (1) is clearly
a small change with little if any risk of future breakage.



But note that option 1 would prevent some replication that is currently
working.


The point of this thread though is that it's working only for small
values of "work".  People are rightfully unhappy if it seems to work
and then later they get bitten by compatibility problems.

Treating char signedness as a machine property in pg_control would
signal that we don't intend to make it work, and would ensure that
even the most minimal testing would find out that it doesn't work.

If we do not do that, it seems to me we have to buy into making
it work.  That would mean dealing with the consequences of an
incompatible change in pg_trgm indexes, and then going through
the same dance again the next time(s) similar problems are found.


Yes, that is understood.  But anecdotally, replicating between x86-64 and 
arm64 is occasionally used for upgrades or migrations.  In practice, 
this appears to have mostly worked.  If we now discover that it won't 
work with certain index extension modules, it's still usable for most users. 
Even if we say, you have to reindex everything afterwards, it's probably 
still useful for these scenarios.


The way I understand the original report, the issue has to do 
specifically with how signed and unsigned chars compare differently.  I 
don't imagine this is used anywhere in the table/heap code.  So it's 
plausible that this issue is really contained to indexes.


On the other hand, if we put in a check against this, then at least we 
can answer any user questions about this with more certainty: No, won't 
work, here is why.






Re: Support LIKE with nondeterministic collations

2024-05-03 Thread Peter Eisentraut

On 03.05.24 15:20, Robert Haas wrote:

On Fri, May 3, 2024 at 4:52 AM Peter Eisentraut  wrote:

What the implementation does is, it walks through the pattern.  It sees
'_', so it steps over one character in the input string, which is '.'
here.  Then we have 'foo.' left to match in the input string.  Then it
takes from the pattern the next substring up to but not including either
a wildcard character or the end of the string, which is 'oo', and then
it checks if a prefix of the remaining input string can be found that is
"equal to" 'oo'.  So here it would try in turn

  '' = 'oo' collate ign_punct ?
  'f'= 'oo' collate ign_punct ?
  'fo'   = 'oo' collate ign_punct ?
  'foo'  = 'oo' collate ign_punct ?
  'foo.' = 'oo' collate ign_punct ?

and they all fail, so the match fails.


Interesting. Does that imply that these matches are slower than normal ones?


Yes, certainly, and there is also no indexing support (other than for 
exact matches).
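
For concreteness, the kind of setup involved looks roughly like this (a
sketch; the ICU locale string used to make punctuation ignorable is an
assumption):

    CREATE COLLATION ign_punct (
        provider = icu,
        locale = 'und-u-ka-shifted',
        deterministic = false
    );

    SELECT '.foo.' LIKE 'foo' COLLATE ign_punct;  -- true, '.' is ignored
    SELECT '.foo.' LIKE '_oo' COLLATE ign_punct;  -- false, per the
                                                  -- walkthrough above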






Re: Document NULL

2024-05-03 Thread Peter Eisentraut

On 02.05.24 17:23, David G. Johnston wrote:
Version 2 attached.  Still a draft, focused on topic picking and overall 
structure.  Examples and links planned plus the usual semantic markup stuff.


I chose to add a new sect1 in the user guide (The SQL Language) chapter, 
"Data".


Please, let's not.

A stylistic note: "null" is an adjective.  You can talk about a "null 
value" or a value "is null".  These are lower-cased (or maybe 
title-cased).  You can use upper-case when referring to SQL syntax 
elements (in which case also tag it with something like <literal>), and 
also to the C-language symbol (tagged with <symbol>).  We had recently 
cleaned this up, so I think the rest of the documentation should be 
pretty consistent about this.





Re: Tarball builds in the new world order

2024-05-03 Thread Peter Eisentraut

On 29.04.24 18:14, Tom Lane wrote:

Peter Eisentraut  writes:

On 26.04.24 21:24, Tom Lane wrote:

Concretely, I'm proposing the attached.  Peter didn't like
PG_COMMIT_HASH, so I have PG_COMMIT_REFSPEC below, but I'm not
wedded to that if a better name is proposed.



This seems ok to me, but note that we do have an equivalent
implementation in meson.  If we don't want to update that in a similar
way, maybe we should disable it.


OK.  After poking at that for awhile, it seemed like "default to
HEAD" fits into meson a lot better than "throw an error if the
variable isn't set", so I switched to doing it like that.
One reason is that AFAICT you can only set the variable during
"meson setup" not during "ninja".  This won't matter to the
tarball build script, which does a one-off configuration run
anyway.  But for manual use, a movable target like HEAD might be
more convenient given that behavior.


This patch looks good to me.





Re: A failure in prepared_xacts test

2024-05-03 Thread Peter Eisentraut

On 29.04.24 07:11, Tom Lane wrote:

Up to now, we've only worried about whether tests running in parallel
within a single test suite can interact.  It's quite scary to think
that the meson setup has expanded the possibility of interactions
to our entire source tree.  Maybe that was a bad idea and we should
fix the meson infrastructure to not do that.  I fear that otherwise,
we'll get bit regularly by very-low-probability bugs of this kind.


I don't think there is anything fundamentally different in the 
parallelism setups of the make-based and the meson-based tests.  There 
are just different implementation details that might affect the likely 
orderings and groupings.






Re: pg_trgm comparison bug on cross-architecture replication due to different char implementation

2024-05-03 Thread Peter Eisentraut

On 30.04.24 19:29, Tom Lane wrote:

1) Assume that char signedness is somehow a property of bits-on-disk
even though it's weird.  Then pg_trgm indexes are correct, but we need
to store char signedness in pg_control.
2) Assume that char signedness is not a property of bits-on-disk.
Then pg_trgm indexes are buggy and need to be fixed.
What do you think?

Also, the bigger picture here is the seeming assumption that "if
we change pg_trgm then it will be safe to replicate from x86 to
arm".  I don't believe that that's a good idea and I'm unwilling
to promise that it will work, regardless of what we do about
char signedness.  That being the case, I don't want to invest a
lot of effort in the signedness issue.  Option (1) is clearly
a small change with little if any risk of future breakage.


But note that option 1 would prevent some replication that is currently 
working.






Re: [PATCH] json_lex_string: don't overread on bad UTF8

2024-05-03 Thread Peter Eisentraut

On 30.04.24 19:39, Jacob Champion wrote:

Tangentially: Should we maybe rethink pieces of the json_lex_string
error handling? For example, do we really want to echo an incomplete
multibyte sequence once we know it's bad?


I can't quite find the place you might be looking at in 
json_lex_string(), but for the general encoding conversion we have what 
would appear to be the same behavior in report_invalid_encoding(), and 
we go out of our way there to produce a verbose error message including 
the invalid data.






Re: wrong comment in libpq.h

2024-05-03 Thread Peter Eisentraut

On 03.05.24 00:37, David Zhang wrote:

Hi Hackers,

There is a comment like below in src/include/libpq/libpq.h,

  /*
   * prototypes for functions in be-secure.c
   */
extern PGDLLIMPORT char *ssl_library;
extern PGDLLIMPORT char *ssl_cert_file;

...

However, 'ssl_library', 'ssl_cert_file' and the rest are global 
parameter settings, not functions. To address this confusion, it would 
be better to move all global configuration settings to the proper 
section, such as /* GUCs */, to maintain consistency.


Maybe it's easier if we just replaced

prototypes for functions in xxx.c

with

declarations for xxx.c

throughout src/include/libpq/libpq.h.





Re: tablecmds.c/MergeAttributes() cleanup

2024-05-03 Thread Peter Eisentraut

On 30.04.24 21:48, Robert Haas wrote:

I took a look at this patch. Currently this case crashes:

CREATE TABLE cmdata(f1 text COMPRESSION pglz);
CREATE TABLE cmdata3(f1 text);
CREATE TABLE cminh() INHERITS (cmdata, cmdata3);

The patch makes this succeed, but I was initially unclear why it
didn't make it fail with an error instead: you can argue that cmdata
has pglz and cmdata3 has default and those are different. It seems
that prior precedent goes both ways -- we treat the absence of a
STORAGE specification as STORAGE EXTENDED and it conflicts with an
explicit storage specification on some other inheritance parent - but
on the other hand, we treat the absence of a default as compatible
with any explicit default, similar to what happens here.
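
As a concrete sketch of those two precedents (the table names here are 
hypothetical, and the outcomes are taken from the description above, not 
verified here):

CREATE TABLE s1 (f1 text STORAGE MAIN);
CREATE TABLE s2 (f1 text);               -- absence treated as STORAGE EXTENDED
CREATE TABLE sinh () INHERITS (s1, s2);  -- error: storage settings conflict

CREATE TABLE d1 (f1 text DEFAULT 'x');
CREATE TABLE d2 (f1 text);               -- no explicit default
CREATE TABLE dinh () INHERITS (d1, d2);  -- succeeds: absence is compatible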


The actual behavior here is arguably not ideal.  It was the purpose of 
the other thread mentioned upthread to improve that, but that was not 
successful for the time being.



So now I think this is committable, but I can't do it now because I
won't be around for the next few hours in case the buildfarm blows up.
I can do it tomorrow, or perhaps Peter would like to handle it since
it seems to have been his commit that introduced the issue.


I have committed it now.





Re: Support LIKE with nondeterministic collations

2024-05-03 Thread Peter Eisentraut

On 03.05.24 02:11, Robert Haas wrote:

On Thu, May 2, 2024 at 9:38 AM Peter Eisentraut  wrote:

On 30.04.24 14:39, Daniel Verite wrote:

postgres=# SELECT '.foo.' like '_oo' COLLATE ign_punct;
 ?column?
----------
 f
(1 row)

The first two results look fine, but the next one is inconsistent.


This is correct, because '_' means "any single character".  This is
independent of the collation.


Seems really counterintuitive. I had to think for a long time to be
able to guess what was happening here. Finally I came up with this
guess:

If the collation-aware matching tries to match up f with the initial
period, the period is skipped and the f matches f. But when the
wildcard is matched to the initial period, that uses up the wildcard
and then we're left trying to match o with f, which doesn't work.


Formally, what

X like '_oo'

means is, can X be partitioned into substrings such that the first 
substring is a single character and the second substring is equal to 
'oo' under the applicable collation?  This is false in this case, there 
is no such partitioning.


What the implementation does is, it walks through the pattern.  It sees 
'_', so it steps over one character in the input string, which is '.' 
here.  Then we have 'foo.' left to match in the input string.  Then it 
takes from the pattern the next substring up to but not including either 
a wildcard character or the end of the string, which is 'oo', and then 
it checks if a prefix of the remaining input string can be found that is 
"equal to" 'oo'.  So here it would try in turn


'' = 'oo' collate ign_punct ?
'f'= 'oo' collate ign_punct ?
'fo'   = 'oo' collate ign_punct ?
'foo'  = 'oo' collate ign_punct ?
'foo.' = 'oo' collate ign_punct ?

and they all fail, so the match fails.
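
For reference, here is a minimal runnable sketch of this walk-through 
(with the proposed patch applied).  The collation definition is an 
assumption on my part: the ICU locale 'und-u-ka-shifted' makes 
punctuation ignorable, and the upthread examples only presuppose some 
punctuation-ignoring collation named ign_punct:

CREATE COLLATION ign_punct (
    provider = icu,
    locale = 'und-u-ka-shifted',
    deterministic = false
);

SELECT '.foo.' LIKE 'foo' COLLATE ign_punct;  -- true, punctuation ignored
SELECT '.foo.' LIKE '_oo' COLLATE ign_punct;  -- false, per the steps above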


It'd probably be good to use something like this as an example in the
documentation. My intuition is that if foo matches a string, then _oo
f_o and fo_ should also match that string. Apparently that's not the
case, but I doubt I'll be the last one who thinks it should be.


This intuition fails because with nondeterministic collations, strings 
of different lengths can be equal, and so the question arises what the 
pattern '_' means.  It could mean either (1) a single character, or 
perhaps something like (2) a string that is equal to some other string 
of length one.


The second definition would satisfy the expectation here, because then 
'.f' matches '_' because '.f' is equal to some string of length one, 
such as 'f'.  (And then 'oo.' matches 'oo' for the rest of the pattern.) 
However, off the top of my head, this definition has three flaws: (1) 
It would make the single-character wildcard effectively an 
any-number-of-characters wildcard, but only in some circumstances, which 
could be confusing, (2) it would be difficult to compute, because you'd 
have to check equality against all possible single-character strings, 
and (3) it is not what the SQL standard says.


In any case, yes, some explanation and examples should be added.





Re: Rename libpq trace internal functions

2024-05-02 Thread Peter Eisentraut

On 24.04.24 12:34, Yugo NAGATA wrote:

On Wed, 24 Apr 2024 09:39:02 +0200
Peter Eisentraut  wrote:


libpq's pqTraceOutputMessage() used to look like this:

  case 'Z':   /* Ready For Query */
  pqTraceOutputZ(conn->Pfdebug, message, &logCursor);
  break;

Commit f4b54e1ed98 introduced macros for protocol characters, so now
it looks like this:

  case PqMsg_ReadyForQuery:
  pqTraceOutputZ(conn->Pfdebug, message, &logCursor);
  break;

But this introduced a disconnect between the symbol in the switch case
and the function name to be called, so this made the manageability of
this file a bit worse.

This patch changes the function names to match, so now it looks like
this:

  case PqMsg_ReadyForQuery:
  pqTraceOutput_ReadyForQuery(conn->Pfdebug, message, &logCursor);
  break;


+1

I prefer the new function names since it seems more natural and easier to read.

I noticed pqTraceOutputNR() is left as is; is this intentional? Or should this
be changed to pqTraceOutput_NoticeResponse()?


pqTraceOutputNR() is shared code used internally by _ErrorResponse() and 
_NoticeResponse().  I have updated the comments a bit to make this clearer.


With that, I have committed it.  Thanks.





Re: Support LIKE with nondeterministic collations

2024-05-02 Thread Peter Eisentraut

On 30.04.24 14:39, Daniel Verite wrote:

   postgres=# SELECT '.foo.' like '_oo' COLLATE ign_punct;
    ?column?
   ----------
    f
   (1 row)

The first two results look fine, but the next one is inconsistent.


This is correct, because '_' means "any single character".  This is 
independent of the collation.


I think with nondeterministic collations, the single-character wildcard 
is often not going to be all that useful.






Re: small documentation fixes related to collations/ICU

2024-05-02 Thread Peter Eisentraut

On 29.04.24 09:18, Kashif Zeeshan wrote:

Looks good.

On Mon, Apr 29, 2024 at 12:05 PM Peter Eisentraut wrote:


I found two mistakes related to collation and/or ICU support in the
documentation that should probably be fixed and backpatched.  See
attached patches.


Committed, thanks.





Re: Tarball builds in the new world order

2024-04-29 Thread Peter Eisentraut

On 26.04.24 21:24, Tom Lane wrote:

Concretely, I'm proposing the attached.  Peter didn't like
PG_COMMIT_HASH, so I have PG_COMMIT_REFSPEC below, but I'm not
wedded to that if a better name is proposed.


Um, "refspec" leads me here, which seems 
like the wrong concept.  I think the more correct concept is "revision" 
(https://git-scm.com/docs/gitrevisions), so something like PG_GIT_REVISION?






Re: Tarball builds in the new world order

2024-04-29 Thread Peter Eisentraut

On 26.04.24 21:24, Tom Lane wrote:

Concretely, I'm proposing the attached.  Peter didn't like
PG_COMMIT_HASH, so I have PG_COMMIT_REFSPEC below, but I'm not
wedded to that if a better name is proposed.


This seems ok to me, but note that we do have an equivalent 
implementation in meson.  If we don't want to update that in a similar 
way, maybe we should disable it.






Re: Support a wildcard in backtrace_functions

2024-04-29 Thread Peter Eisentraut

On 27.04.24 00:16, Michael Paquier wrote:

On Fri, Apr 26, 2024 at 02:39:16PM -0400, Tom Lane wrote:

Well, in that case we have to have some kind of control GUC, and
I think the consensus is that the one we have now is under-designed.
So I also vote for a full revert and try again in v18.


Okay, fine by me to move on with a revert.


Ok, it's reverted.





small documentation fixes related to collations/ICU

2024-04-29 Thread Peter Eisentraut
I found two mistakes related to collation and/or ICU support in the 
documentation that should probably be fixed and backpatched.  See 
attached patches.

From 44ea0d75f2739b6a3eed9a0233c3dcb2a64b2538 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Mon, 29 Apr 2024 08:50:20 +0200
Subject: [PATCH 1/2] doc: Fix description of --with-icu option

It was claiming that the ICU locale provider is used by default, which
is not correct.  (From commit fcb21b3acdc; it was once contemplated to
make it the default, but it wouldn't have been part of that patch in
any case.)

TODO backpatch 16
---
 doc/src/sgml/installation.sgml | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/doc/src/sgml/installation.sgml b/doc/src/sgml/installation.sgml
index a453f804cd6..1b32d5ca62c 100644
--- a/doc/src/sgml/installation.sgml
+++ b/doc/src/sgml/installation.sgml
@@ -167,7 +167,7 @@ Requirements
 
 
  
-  The ICU locale provider (see ) is used 
by default. If you don't want to use it then you must specify the 
--without-icu option to configure. Using 
this option disables support for ICU collation features (see ).
+  The ICU library is used by default. If you don't want to use it then you 
must specify the --without-icu option to 
configure. Using this option disables support for ICU 
collation features (see ).
  
  
   ICU support requires the ICU4C package to be

base-commit: 5c9f35fc48ea99e59300a267e090e3eafd1b3b0e
-- 
2.44.0

From c2a2fdd1272b24f2513e18f199370491b848c1b9 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Mon, 29 Apr 2024 08:55:45 +0200
Subject: [PATCH 2/2] doc: Fix description of deterministic flag of CREATE
 COLLATION

The documentation said that you need to pick a suitable LC_COLLATE
setting in addition to setting the DETERMINISTIC flag.  This would
have been correct if the libc provider supported nondeterministic
collations, but since it doesn't, you actually need to set the LOCALE
option.

TODO backpatch 12
---
 doc/src/sgml/ref/create_collation.sgml | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/doc/src/sgml/ref/create_collation.sgml 
b/doc/src/sgml/ref/create_collation.sgml
index 85f18cbbe5d..e34bfc97c3d 100644
--- a/doc/src/sgml/ref/create_collation.sgml
+++ b/doc/src/sgml/ref/create_collation.sgml
@@ -154,7 +154,7 @@ Parameters
logically equal by the comparison.  PostgreSQL breaks ties using a
byte-wise comparison.  Comparison that is not deterministic can make the
collation be, say, case- or accent-insensitive.  For that, you need to
-   choose an appropriate LC_COLLATE setting
+   choose an appropriate LOCALE setting
and set the collation to not deterministic here.
   
 
-- 
2.44.0



Support LIKE with nondeterministic collations

2024-04-29 Thread Peter Eisentraut
This patch adds support for using LIKE with nondeterministic collations. 
 So you can do things such as


col LIKE 'foo%' COLLATE case_insensitive

This currently results in a "not supported" error.  The reason for that 
is that when I first developed support for nondeterministic collations, 
I didn't know what the semantics of that should be, especially since 
with nondeterministic collations, strings of different lengths could be 
equal, and then dropped the issue for a while.


After further research, the SQL standard's definition of the LIKE 
predicate actually provides a clear definition of the semantics: The 
pattern is partitioned into substrings at wildcard characters (so 
'foo%bar' is partitioned into 'foo', '%', 'bar'), and then the whole 
predicate matches if a match can be found for each partition under the 
applicable collation (so for 'foo%bar' we look to partition the input 
string into s1 || s2 || s3 such that s1 = 'foo', s2 is anything, and s3 
= 'bar'.)  The only difference from deterministic collations is that with 
deterministic collations we can optimize this by matching character by 
character, but with nondeterministic collations we have to go substring by 
substring.

From 3f6b584a0f34cabecac69f3cfd663dadfd34f464 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Mon, 29 Apr 2024 07:58:20 +0200
Subject: [PATCH v1] Support LIKE with nondeterministic collations

This allows for example using LIKE with case-insensitive collations.
There was previously no internal implementation of this, so it was met
with a not-supported error.  This adds the internal implementation and
removes the error.

Unlike with deterministic collations, the LIKE matching cannot go
character by character but has to go substring by substring.  For
example, if we are matching against LIKE 'foo%bar', we can't start by
looking for an 'f', then an 'o', but instead we have to find
something that matches 'foo'.  This is because the collation could
consider substrings of different lengths to be equal.  This is all
internal to MatchText() in like_match.c.

The changes in GenericMatchText() in like.c just pass through the
locale information to MatchText(), which was previously not needed.
This matches exactly Generic_Text_IC_like() below.

Note that ILIKE is not affected.  It's unclear whether ILIKE makes
sense under nondeterministic collations.

This also updates match_pattern_prefix() in like_support.c to support
optimizing the case of an exact pattern with nondeterministic
collations.  This was already alluded to in the previous code.
---
 doc/src/sgml/charset.sgml |   2 +-
 doc/src/sgml/func.sgml|   7 +-
 src/backend/utils/adt/like.c  |  30 +++--
 src/backend/utils/adt/like_match.c| 118 ++
 src/backend/utils/adt/like_support.c  |  29 ++---
 .../regress/expected/collate.icu.utf8.out |  81 ++--
 src/test/regress/sql/collate.icu.utf8.sql |  23 +++-
 7 files changed, 250 insertions(+), 40 deletions(-)

diff --git a/doc/src/sgml/charset.sgml b/doc/src/sgml/charset.sgml
index daf671e6205..ec7ffb360dd 100644
--- a/doc/src/sgml/charset.sgml
+++ b/doc/src/sgml/charset.sgml
@@ -1197,7 +1197,7 @@ Nondeterministic Collations
  to a performance penalty.  Note, in particular, that B-tree cannot use
  deduplication with indexes that use a nondeterministic collation.  Also,
  certain operations are not possible with nondeterministic collations,
- such as pattern matching operations.  Therefore, they should be used
+ such as some pattern matching operations.  Therefore, they should be used
  only in cases where they are specifically wanted.
 
 
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 1928de57623..6181cc64712 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -5374,9 +5374,10 @@ Pattern Matching

 

-The pattern matching operators of all three kinds do not support
-nondeterministic collations.  If required, apply a different collation to
-the expression to work around this limitation.
+SIMILAR TO and POSIX-style regular
+expressions do not support nondeterministic collations.  If required, use
+LIKE or apply a different collation to the expression
+to work around this limitation.

 
   
diff --git a/src/backend/utils/adt/like.c b/src/backend/utils/adt/like.c
index 57ead66b5aa..bbbe6c09d18 100644
--- a/src/backend/utils/adt/like.c
+++ b/src/backend/utils/adt/like.c
@@ -149,22 +149,32 @@ SB_lower_char(unsigned char c, pg_locale_t locale, bool 
locale_is_c)
 static inline int
 GenericMatchText(const char *s, int slen, const char *p, int plen, Oid 
collation)
 {
-   if (collation && !lc_ctype_is_c(collation))
-   {
-   pg_locale_t locale = pg_newlocale_from_collation(collation);
+   pg_locale_t locale = 0;
+   boollocale_is_c = false;
 
-   if (!pg_locale_de

Re: Use XLOG_CONTROL_FILE macro everywhere?

2024-04-27 Thread Peter Eisentraut

On 26.04.24 22:51, Tom Lane wrote:

Robert Haas  writes:

On Wed, Apr 24, 2024 at 8:04 PM Michael Paquier  wrote:

Not sure that I would bother with a second one.  But, well, why not if
people want to rename it, as long as you keep compatibility.



I vote for just standardizing on XLOG_CONTROL_FILE. That name seems
sufficiently intuitive to me, and I'd rather have one identifier for
this than two. It's simpler that way.


+1.  Back when we did the great xlog-to-wal renaming, we explicitly
agreed that we wouldn't change internal symbols referring to xlog.
It might or might not be appropriate to revisit that decision,
but I sure don't want to do it piecemeal, one symbol at a time.

Also, if we did rename this one, the logical choice would be
WAL_CONTROL_FILE not PG_CONTROL_FILE.


My reasoning was mainly that I don't see pg_control as controlling just 
the WAL.  But I don't feel strongly about instigating a great renaming 
here or something.






Re: Remove unnecessary code rom be_lo_put()

2024-04-25 Thread Peter Eisentraut

On 24.04.24 15:25, Tom Lane wrote:

Peter Eisentraut  writes:

On 24.04.24 11:59, Yugo NAGATA wrote:

I noticed that a permission check is performed in be_lo_put()
just after returning from inv_open(), but the permission is
already checked in inv_open(), so I think we can remove this
part of the code. I attached a patch for this fix.



Yes, I think you are right.


I agree.  Do you want to do the honors?


done





Re: Remove unnecessary code rom be_lo_put()

2024-04-25 Thread Peter Eisentraut

On 25.04.24 01:50, Michael Paquier wrote:

On Wed, Apr 24, 2024 at 09:25:09AM -0400, Tom Lane wrote:

I agree.  Do you want to do the honors?


Good catch.  The same check happens when the object is opened.  Note
that you should be able to remove utils/acl.h at the top of
be-fsstubs.c as this would remove the last piece of code that does an
ACL check in this file.  No objections with doing that now, removing
this code.


utils/acl.h is still needed for object_ownercheck() called in 
be_lo_unlink().






Re: AIX support

2024-04-25 Thread Peter Eisentraut

On 25.04.24 06:20, Tom Lane wrote:

Something I've been mulling over is whether to suggest that the
proposed "new port" should only target building with gcc.

On the one hand, that would (I think) remove a number of annoying
issues, and the average end user is unlikely to care which compiler
their database server was built with.  On the other hand, I'm a strong
proponent of avoiding software monocultures, and xlc is one of the few
C compilers still standing that aren't gcc or clang.


My understanding is that the old xlc is dead and has been replaced by 
"xlclang", which is presumably an xlc-compatible frontend on top of 
clang/llvm.  Hopefully, that will have fewer issues.






Re: Is it acceptable making COPY format extendable?

2024-04-25 Thread Peter Eisentraut

On 25.04.24 06:53, Sutou Kouhei wrote:

I haven't got any reply since 2024-03-15:
https://www.postgresql.org/message-id/flat/20240315.173754.2049843193122381085.kou%40clear-code.com#07aefc636d8165204ddfba971dc9a490
(I sent some pings.)

So I described this status as "stalled".

I'm not familiar with PostgreSQL's development
style. What should I do in this case? Should I just wait
for a reply from others without doing anything?


PostgreSQL development is currently in feature freeze for PostgreSQL 17 
(since April 8).  So most participants will not be looking very closely 
at development projects for releases beyond that.  The next commitfest 
for patches targeting PostgreSQL 18 is in July, and I see your patch 
registered there.  It's possible that your thread is not going to get 
much attention until then.


So, in summary, you are doing everything right, you just have to be a 
bit more patient right now. :)






Re: Experiments with Postgres and SSL

2024-04-24 Thread Peter Eisentraut

On 01.03.24 22:49, Jacob Champion wrote:

If we're interested in ALPN negotiation in the future, we may also
want to look at GREASE [1] to keep those options open in the presence
of third-party implementations. Unfortunately OpenSSL doesn't do this
automatically yet.

If we don't have a reason not to, it'd be good to follow the strictest
recommendations from [2] to avoid cross-protocol attacks. (For anyone
currently running web servers and Postgres on the same host, they
really don't want browsers "talking" to their Postgres servers.) That
would mean checking the negotiated ALPN on both the server and client
side, and failing if it's not what we expect.


I've been reading up on ALPN.  There is another thread that is 
discussing PostgreSQL protocol version negotiation, and ALPN also has 
"protocol negotiation" in the name and there is some discussion in this 
thread about the granularity of the protocol names.


I'm concerned that there appears to be some confusion over whether ALPN 
is a performance feature or a security feature.  RFC 7301 appears to be 
pretty clear that it's for performance, not for security.


Looking at the ALPACA attack, I'm not convinced that it's very relevant 
for PostgreSQL.  It's basically just a case of, you connected to the 
wrong server.  And web browsers routinely open additional connections 
based on what data they have previously received, and they liberally 
send along session cookies to those new connections, so I understand 
that this can be a problem.  But I don't see how ALPN is a good defense. 
It can help only if all services other than HTTP 
implement it and say, you're a web browser, go away.  And what if the 
rogue server is in fact a web server, then it doesn't help at all.  I 
guess there could be some common configurations where there is a web 
server, and ftp server, and some mail servers running on the same TLS 
end point.  But in how many cases is there also a PostgreSQL server 
running on the same end point?  The page about ALPACA also suggests SNI 
as a mitigation, which seems more sensible, because the burden is then 
on the client to do the right thing, and not on all other servers to 
send away clients doing the wrong thing.  And of course libpq already 
supports SNI.


For the protocol negotiation aspect, how does this work if the wrapped 
protocol already has a version negotiation system?  For example, various 
HTTP versions are registered as separate protocols for ALPN.  What if 
ALPN says it's HTTP/1.0 but the actual HTTP requests specify 1.1, or 
vice versa?  What is the actual mechanism where the performance benefits 
(saving round-trips) are created?  I haven't caught up with HTTP 2 and 
so on, so maybe there are additional things at play there, but it is not 
fully explained in the RFCs.  I suppose PostgreSQL would keep its 
internal protocol version negotiation in any case, but then what do we 
need ALPN on top for?






Re: Experiments with Postgres and SSL

2024-04-24 Thread Peter Eisentraut

On 08.04.24 10:38, Heikki Linnakangas wrote:

On 08/04/2024 04:25, Heikki Linnakangas wrote:

One important open item now is that we need to register a proper ALPN
protocol ID with IANA.


I sent a request for that: 
https://mailarchive.ietf.org/arch/msg/tls-reg-review/9LWPzQfOpbc8dTT7vc9ahNeNaiw/


Why did you ask for "pgsql"?  The IANA protocol name for port 5432 is 
"postgres".  This seems confusing.






Re: Tarball builds in the new world order

2024-04-24 Thread Peter Eisentraut

On 24.04.24 00:05, Tom Lane wrote:

It makes tarballs all right, but whatever commit ID you specify
is semi-ignored, and you get a tarball corresponding to HEAD
of master.  (The PDFs come from the right version, though!)

The reason for that is that the mk-one-release script does this
(shorn of not-relevant-here details):

export BASE=/home/pgsql
export GIT_DIR=$BASE/postgresql.git

mkdir pgsql

# Export the selected git ref
git archive ${gitref} | tar xf - -C pgsql


Where does ${gitref} come from?  Why doesn't this line use git archive 
HEAD | ... ?



What I suggest is doing this in mk-one-release:

-make dist
+make dist PG_COMMIT_HASH=${gitref}

and changing the "make dist" rules to write $(PG_COMMIT_HASH) not
HEAD.  The extra make variable will have no effect in the back
branches, while it should cause the right thing to happen with
the new implementation of "make dist".


I suppose we could do something like that, but we'd also need to come up 
with a meson version.


(Let's not use "hash" though, since other ways to specify a 
commit can be used.)



This change seems like a good thing anyway for anyone who's tempted
to use "make dist" manually, since they wouldn't necessarily want
to package HEAD either.


A tin-foil-hat argument is that we might not want to encourage that, 
because for reproducibility, we need a known git commit and also a known 
implementation of make dist.  If in the future someone uses the make 
dist implementation of PG19 to build a tarball for PG17, it might not 
come out the same way as using the make dist implementation of PG17.






Re: Remove unnecessary code rom be_lo_put()

2024-04-24 Thread Peter Eisentraut

On 24.04.24 11:59, Yugo NAGATA wrote:

I noticed that a permission check is performed in be_lo_put()
just after returning from inv_open(), but the permission is
already checked in inv_open(), so I think we can remove this
part of the code. I attached a patch for this fix.


Yes, I think you are right.

This check was added in 8d9881911f0, but then the refactoring in 
ae20b23a9e7 should probably have removed it.






Re: doc: create table improvements

2024-04-24 Thread Peter Eisentraut

> +   The reliability characteristics of a table are governed by its
> +   persistence mode.  The default mode is described
> +   here
> +   There are two alternative modes that can be specified during
> +   table creation:
> +   temporary and
> +   unlogged.

Not sure reliability is the best word here.  I mean, a temporary table 
isn't any less reliable than any other table.  It just does different 
things.






Re: Why does pgindent's README say to download typedefs.list from the buildfarm?

2024-04-24 Thread Peter Eisentraut

On 22.04.24 22:28, Tom Lane wrote:

Nathan Bossart  writes:

On Mon, Apr 22, 2024 at 04:08:08PM -0400, Tom Lane wrote:

I think the actual plan now is that we'll sync the in-tree copy
with the buildfarm's results (and then do a tree-wide pgindent)
every so often, probably shortly before beta every year.

Okay.  Is this just to resolve the delta between the manual updates and a
clean autogenerated copy every once in a while?

The main reason there's a delta is that people don't manage to
maintain the in-tree copy perfectly (at least, they certainly
haven't done so for this past year).  So we need to do that
to clean up every now and then.

A secondary reason is that the set of typedefs we absorb from
system include files changes over time.


Is the code to extract typedefs available somewhere independent of the 
buildfarm?  It would be useful sometimes to be able to run this locally, 
like before and after some patch, to keep the in-tree typedefs list updated.






Re: Q: Escapes in jsonpath Idents

2024-04-24 Thread Peter Eisentraut

On 18.03.24 01:09, Erik Wienhold wrote:

The error message 'syntax error at or near "$oo" of jsonpath input' for
the second case ($.f$oo), however, looks as if the scanner identifies
'$oo' as a variable instead of continuing the scan of the identifier (f$oo)
for the member accessor.  Looks like a bug to me because a variable
doesn't even make sense in that place.

Right. Maybe the docs should be updated to say that a literal dollar
sign isn’t supported in identifiers, unlike in JavaScript, except
through escapes like this:

Unfortunately, I don't have access to that part of the SQL spec.  So I
don't know how the jsonpath grammar is specified.


The SQL spec says that <JSON path identifier> corresponds to Identifier 
in ECMAScript.


But it also says,

A <JSON path identifier> is classified as follows.

Case:

a) A <JSON path identifier> that is a <dollar sign> is a <JSON path 
   context variable>.

b) A <JSON path identifier> that begins with <dollar sign> is a
   <JSON path named variable>.

c) Otherwise, a <JSON path identifier> is a <JSON path key name>.

Does this help?  I wasn't following all the discussion to see if there 
is anything wrong with the implementation.
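
For a quick illustration of the behavior in question (a sketch; the 
error text is as quoted upthread):

SELECT jsonb_path_query('{"f$oo": 42}', '$."f$oo"');
-- returns 42: the quoted member accessor works

SELECT jsonb_path_query('{"f$oo": 42}', '$.f$oo');
-- fails: syntax error at or near "$oo" of jsonpath input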





