Re: New WAL record to detect the checkpoint redo location

2023-10-18 Thread Michael Paquier
On Wed, Oct 18, 2023 at 10:24:50AM -0400, Robert Haas wrote:
> I added a variant of this test case. Here's v10.

+-- Verify that an XLOG_CHECKPOINT_REDO record begins at precisely the redo LSN
+-- of the checkpoint we just performed.
+SELECT redo_lsn FROM pg_control_checkpoint() \gset
+SELECT start_lsn = :'redo_lsn'::pg_lsn AS same_lsn, resource_manager,
+record_type FROM pg_get_wal_record_info(:'redo_lsn');
+ same_lsn | resource_manager |   record_type   
+----------+------------------+-----------------
+ t        | XLOG             | CHECKPOINT_REDO
+(1 row)

Seems fine to me.  Thanks for considering the idea.
--
Michael




Re: pgBufferUsage.blk_{read|write}_time are zero although there are pgBufferUsage.local_blks_{read|written}

2023-10-18 Thread Michael Paquier
On Wed, Oct 18, 2023 at 02:56:42PM +0900, Michael Paquier wrote:
> Thanks for the new versions.  I have applied 0001 and backpatched it
> for now.  0002 and 0003 look in much cleaner shape than previously.

0002 and 0003 have now been applied.  I have split 0003 into two parts
at the end, mainly on clarity grounds: one for the counters with
EXPLAIN and a second for pg_stat_statements.

There were a few things in the patch set.  Per my notes:
- Some incorrect indentation.
- The additions to show_buffer_usage() did not correctly handle the
addition of a comma before/after the local timing block.  The code
area for has_local_timing needs to check for has_temp_timing, while
the area for has_shared_timing needs to check for (has_local_timing ||
has_temp_timing).
- explain.sgml was missing an update for the information related to
the read/write timings of the local blocks.

What remains is deciding what to do about the "shared/local" string in
show_buffer_usage() for v16 and v15, as "local" is unrelated to that.
Perhaps we should just switch to "shared" in this case, or remove
the string entirely?  Still, that implies changing the output of
EXPLAIN on a stable branch, so there could be an argument
for leaving this stuff alone.
--
Michael




Re: Patch: Improve Boolean Predicate JSON Path Docs

2023-10-18 Thread jian he
On Tue, Oct 17, 2023 at 10:56 AM David E. Wheeler  wrote:
>
>
> Oh, I thought it would report issues from the files they were found in. 
> You’re right, I forgot a title. Fixed in v4.
>
> David
>

+Returns the result of a JSON path
+predicate
+check for the specified JSON value. If the result is not Boolean,
+then NULL is returned. Do not use with non-predicate
+JSON path expressions.

"Do not use with non-predicate", double negative is not easy to
comprehend. Maybe we can simplify it.

16933: value. Use only SQL-standard JSON path expressions, not not
There are two "not"s here.

15842: SQL-standard JSON path expressions, not not
There are two "not"s here as well.




Re: boolin comment not moved when code was refactored

2023-10-18 Thread Peter Smith
On Thu, Oct 19, 2023 at 3:26 PM Vik Fearing  wrote:
>
> On 10/19/23 06:17, Peter Smith wrote:
> >> In short, maybe the whole comment should just be
> >>
> >> /*
> >>   *  boolin - input function for type boolean
> >>   */
> >>
> > How about "boolin - converts a boolean string value to 1 or 0"
>
>
> Personally, I do not like exposing the implementation of a boolean (it
> is a base type that is not a numeric), so I prefer Tom's suggestion.

OK. Done that way.

PSA v2.

==
Kind Regards,
Peter Smith.
Fujitsu Australia


v2-0001-Relocate-or-remove-stale-comments.patch
Description: Binary data


Re: run pgindent on a regular basis / scripted manner

2023-10-18 Thread Andres Freund
Hi,

On 2023-10-18 19:18:13 -0700, Andres Freund wrote:
> On 2023-10-18 21:29:37 -0400, Tom Lane wrote:
> Ah, I see. If I interpret that correctly, the code filters out symbols it
> doesn't find in some .[chly] file in the *source* directory. This code is,
> uh, barely readable and massively underdocumented.
>
> I guess I need to reimplement that :/.  Don't immediately see how this could
> be implemented for in-tree autoconf builds...
>
> > > But in the attached patch I've implemented this slightly differently. If 
> > > the
> > > tooling to do so is available, the indent-* targets explained above,
> > > use/depend on src/tools/pgindent/typedefs.list.merged (in the build dir),
> > > which is the combination of a src/tools/pgindent/typedefs.list.local 
> > > generated
> > > for the local binaries/libraries and the source tree
> > > src/tools/pgindent/typedefs.list.
> >
> > Hmm ... that allows indenting your C files, but how do you get from that
> > to a non-noisy patch to commit to typedefs.list?
>
> It doesn't provide that on its own. Being able to painlessly indent the files
> seems pretty worthwhile already. But clearly it'd much better if we can
> automatically update typedefs.list.

With code for that added, things seem to work quite nicely.  I added similar
logic to the buildfarm code that builds a list of all tokens in the source
code.

With that, the in-tree typedefs.list can be updated with new tokens found
locally *and* typedefs that aren't used anymore can be removed from the in-tree
typedefs.list (detected by no matching tokens found in the source code).

The only case this approach can't handle is newly referenced typedefs in code
that isn't built locally - which I think isn't particularly common and seems
somewhat fundamental. In those cases typedefs.list still can be updated
manually (and the sorting will still be "fixed" if necessary).


The code is still in a somewhat rough shape and I'll not finish polishing it
today. I've attached the code anyway, don't be too rough :).

The changes from running "ninja update-typedefs indent-tree" on debian and
macos are attached as 0004 - the set of changes looks quite correct to me.


The buildfarm code filtered out a few typedefs manually:
  push(@badsyms, 'date', 'interval', 'timestamp', 'ANY');
but I don't really see why? Possibly that was needed with an older
pg_bsd_indent to prevent odd stuff?


Right now building a new unified typedefs.list and copying it to the source
tree are still separate targets, but that probably makes less sense now? Or
perhaps it should be copied to the source tree when reindenting?


I've only handled linux and macos in the typedefs gathering code. But the
remaining OSs should be "just a bit of work" [TM].

Greetings,

Andres Freund
>From f7520fd87764cddbde9d6c5d25fdfe5314ec5cbc Mon Sep 17 00:00:00 2001
From: Andres Freund 
Date: Sat, 27 May 2023 11:38:27 -0700
Subject: [PATCH v4 1/4] Add meson targets to run pgindent

Author:
Reviewed-By:
Discussion: https://postgr.es/m/
Backpatch:
---
 meson.build | 38 +
 src/tools/pgindent/pgindent | 12 ++--
 2 files changed, 48 insertions(+), 2 deletions(-)

diff --git a/meson.build b/meson.build
index 862c955453f..2cfeb60eb07 100644
--- a/meson.build
+++ b/meson.build
@@ -3013,6 +3013,44 @@ run_target('install-test-files',
 
 
 
+###
+# Indentation and similar targets
+###
+
+indent_base_cmd = [perl, files('src/tools/pgindent/pgindent'),
+'--indent', pg_bsd_indent.full_path(),
+'--sourcetree=@SOURCE_ROOT@']
+indent_depend = [pg_bsd_indent]
+
+# Reindent the entire tree
+run_target('indent-tree',
+  command: indent_base_cmd + ['.'],
+  depends: indent_depend
+)
+
+# Reindent the last commit
+run_target('indent-head',
+  command: indent_base_cmd + ['--commit=HEAD'],
+  depends: indent_depend
+)
+
+# Silently check if any file is not correctly indented (fails the build if
+# there are differences)
+run_target('indent-check',
+  command: indent_base_cmd + ['--silent-diff', '.'],
+  depends: indent_depend
+)
+
+# Show a diff of all the unindented files (doesn't fail the build if there are
+# differences)
+run_target('indent-diff',
+  command: indent_base_cmd + ['--show-diff', '.'],
+  depends: indent_depend
+)
+
+
+
 ###
 # Test prep
 ###
diff --git a/src/tools/pgindent/pgindent b/src/tools/pgindent/pgindent
index bce63d95daf..08b0c95814f 100755
--- a/src/tools/pgindent/pgindent
+++ b/src/tools/pgindent/pgindent
@@ -23,7 +23,7 @@ my $devnull = File::Spec->devnull;
 
 my ($typedefs_file, $typedef_str, @excludes,
 	$indent, $build, $show_diff,
-	$silent_diff, $help, @commits,);
+	$silent_diff, $sourcetree, $help, @commits,);
 
 $help = 0;
 
@@ -35,7 +35,8 @@ 

Re: UniqueKey v2

2023-10-18 Thread jian he
On Tue, Oct 17, 2023 at 11:21 AM  wrote:
>
>
> thanks for the really good suggestion.  Here is the newer version:
>

--- a/src/backend/optimizer/path/meson.build
+++ b/src/backend/optimizer/path/meson.build
@@ -10,4 +10,5 @@ backend_sources += files(
   'joinrels.c',
   'pathkeys.c',
   'tidpath.c',
+  'uniquekey.c'
 )
diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h
index 3ac25d47..5ed550ca 100644
--- a/src/include/optimizer/paths.h
+++ b/src/include/optimizer/paths.h
@@ -264,7 +264,10 @@ extern PathKey *make_canonical_pathkey(PlannerInfo *root,

int strategy, bool nulls_first);
 extern void add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel,

 List *live_childrels);
-
+/*
+ * uniquekey.c
+ * uniquekey.c related functions.
+ */

-
I did some simple tests using the text data type.

It works with the primary key, but not with unique indexes.
It also does not work when the column is unique and not null.

The following is my test.

begin;
CREATE COLLATION case_insensitive (provider = icu, locale =
'und-u-ks-level2', deterministic = false);
CREATE COLLATION upper_first (provider = icu, locale = 'und-u-kf-upper');
commit;
begin;
CREATE TABLE test_uniquekey3(a text, b text);
CREATE TABLE test_uniquekey4(a text, b text);
CREATE TABLE test_uniquekey5(a text, b text);
CREATE TABLE test_uniquekey6(a text, b text);
CREATE TABLE test_uniquekey7(a text not null, b text not null);
CREATE TABLE test_uniquekey8(a text not null, b text not null);
CREATE TABLE test_uniquekey9(a text primary key COLLATE upper_first, b
text not null);
CREATE TABLE test_uniquekey10(a text primary key COLLATE
case_insensitive, b text not null);
create unique index on test_uniquekey3 (a COLLATE case_insensitive nulls first)
nulls distinct
with (fillfactor = 80);
create unique index on test_uniquekey4 (a COLLATE case_insensitive nulls first)
nulls not distinct
with (fillfactor = 80);
create unique index on test_uniquekey5 (a COLLATE upper_first nulls first)
nulls distinct;
create unique index on test_uniquekey6 (a COLLATE upper_first nulls first)
nulls not distinct;
create unique index on test_uniquekey7 (a COLLATE upper_first nulls
first) nulls distinct;
create unique index on test_uniquekey8 (a COLLATE case_insensitive
nulls first) nulls not distinct;
insert into test_uniquekey3(a,b) select g::text, (g+10)::text from
generate_series(1,1e5) g;
insert into test_uniquekey4(a,b) select g::text, (g+10)::text from
generate_series(1,1e5) g;
insert into test_uniquekey5(a,b) select g::text, (g+10)::text from
generate_series(1,1e5) g;
insert into test_uniquekey6(a,b) select g::text, (g+10)::text from
generate_series(1,1e5) g;
insert into test_uniquekey7(a,b) select g::text, (g+10)::text from
generate_series(1,1e5) g;
insert into test_uniquekey8(a,b) select g::text, (g+10)::text from
generate_series(1,1e5) g;
insert into test_uniquekey9(a,b) select g::text, (g+10)::text from
generate_series(1,1e5) g;
insert into test_uniquekey10(a,b) select g::text, (g+10)::text from
generate_series(1,1e5) g;
insert into test_uniquekey3(a) VALUES(null),(null),(null);
insert into test_uniquekey4(a) VALUES(null);
insert into test_uniquekey5(a) VALUES(null),(null),(null);
insert into test_uniquekey6(a) VALUES(null);
commit;

ANALYZE test_uniquekey3, test_uniquekey4, test_uniquekey5
,test_uniquekey6,test_uniquekey7, test_uniquekey8
,test_uniquekey9, test_uniquekey10;

explain (costs off) select distinct a from test_uniquekey3;
explain (costs off) select distinct a from test_uniquekey4;
explain (costs off) select distinct a from test_uniquekey5;
explain (costs off) select distinct a from test_uniquekey6;
explain (costs off) select distinct a from test_uniquekey7;
explain (costs off) select distinct a from test_uniquekey8;
explain (costs off) select distinct a from test_uniquekey9;
explain (costs off) select distinct a from test_uniquekey10;
explain (costs off) select distinct a from test_uniquekey3 where a < '2000';
explain (costs off) select distinct a from test_uniquekey4 where a < '2000';
explain (costs off) select distinct a from test_uniquekey5 where a < '2000';
explain (costs off) select distinct a from test_uniquekey6 where a < '2000';
explain (costs off) select distinct a from test_uniquekey7 where a < '2000';
explain (costs off) select distinct a from test_uniquekey8 where a < '2000';
explain (costs off) select distinct a from test_uniquekey9 where a < '2000';
explain (costs off) select distinct a from test_uniquekey10 where a < '2000';

--very high selectivity
explain (costs off) select distinct a from test_uniquekey3 where a < '1001';
explain (costs off) select distinct a from test_uniquekey4 where a < '1001';
explain (costs off) select distinct a from test_uniquekey5 where a < '1001';
explain (costs off) select distinct a from test_uniquekey6 where a < '1001';
explain (costs off) select distinct a from test_uniquekey7 where a < '1001';
explain (costs off) select distinct a from 

RE: [PoC] pg_upgrade: allow to upgrade publisher node

2023-10-18 Thread Zhijie Hou (Fujitsu)
On Wednesday, October 18, 2023 5:26 PM Kuroda, Hayato/黒田 隼人 
 wrote:
> 
> Thank you for reviewing! PSA new version.
> Note that 0001 and 0002 are combined into one patch.

Thanks for updating the patch; here are a few comments on the test.

1.

>
# The TAP Cluster.pm assigns default 'max_wal_senders' and 'max_connections' to
# the same value (10) but PG12 and prior considered max_walsenders as a subset
# of max_connections, so setting the same value will fail.
if ($old_publisher->pg_version->major < 12)
{
$old_publisher->append_conf(
'postgresql.conf', qq[
max_wal_senders = 5
max_connections = 10
]);
>

I think we already set max_wal_senders to 5 in the init() function (in Cluster.pm),
so is this necessary?  And 002_pg_upgrade.pl doesn't seem to set this.

2.

SELECT pg_create_logical_replication_slot('test_slot1', 
'test_decoding', false, true);
SELECT pg_create_logical_replication_slot('test_slot2', 
'test_decoding', false, true);

I think we don't need to set the last two parameters here, as we don't check
that info in the tests.
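
For example, a minimal sketch relying on the defaults (both optional
parameters, temporary and twophase, default to false, which seems sufficient
for this test):

SELECT pg_create_logical_replication_slot('test_slot1', 'test_decoding');
SELECT pg_create_logical_replication_slot('test_slot2', 'test_decoding');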

3.

# Set extra params if cross-version checks are required. This is needed to
# avoid using previously initdb'd cluster
if (defined($ENV{oldinstall}))
{
my @initdb_params = ();
push @initdb_params, ('--encoding', 'UTF-8');
push @initdb_params, ('--locale', 'C');

I am not sure I understand the comment; would it be possible to provide a bit more
explanation about the purpose of this setting?  And I see 002_pg_upgrade always
has these settings even if oldinstall is not defined, so shall we follow the
same?

4.

+   command_ok(
+   [
+   'pg_upgrade', '--no-sync',
+   '-d', $old_publisher->data_dir,
+   '-D', $new_publisher->data_dir,
+   '-b', $oldbindir,
+   '-B', $newbindir,
+   '-s', $new_publisher->host,
+   '-p', $old_publisher->port,
+   '-P', $new_publisher->port,
+   $mode,
+   ],

I think all the pg_upgrade commands in the test are the same, so we could save
the command in a variable and pass it to command_xx().  That would save some effort
in checking the differences between the commands and also reduce some code.

Best Regards,
Hou zj


Re: BRIN minmax multi - incorrect distance for infinite timestamp/date

2023-10-18 Thread Ashutosh Bapat
On Wed, Oct 18, 2023 at 8:23 PM Tomas Vondra
 wrote:

>
> I did use that many values to actually force "compaction" and merging of
> points into ranges. We only keep 32 values per page range, so with 2
> values we'll not build a range. You're right it may still trigger the
> overflow (we probably still calculate distances, I didn't realize that),
> but without the compaction we can't check the query plans.
>
> However, I agree 60 values may be a bit too much. And I realized we can
> reduce the count quite a bit by using the values_per_range option. We
> could set it to 8 (which is the minimum).
>

I haven't read the BRIN code, except the comments at the beginning of the
file. From what you describe, it seems we store the first 32 values as
is, but later, as the number of values grows, create ranges from those?
Please point me to the relevant source code/documentation. Anyway, if
we can reduce the number of rows we insert, that will be good.
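
For reference, the values_per_range knob Tomas mentions is an opclass
parameter of the minmax-multi opclasses, so a reduced-count setup might look
roughly like this (a sketch with hypothetical table and index names, not the
actual regression test):

CREATE TABLE brin_date_test (d date);
CREATE INDEX brin_date_test_idx ON brin_date_test
  USING brin (d date_minmax_multi_ops (values_per_range = 8));
-- a modest insert just to exercise range summarization
INSERT INTO brin_date_test
  SELECT '2000-01-01'::date + i FROM generate_series(1, 1000) AS i;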

> >
>
> Not sure what you mean by "crash". Yes, it doesn't hit an assert,
> because there's none when calculating distance for date. It however
> should fail in the query plan check due to the incorrect order of
> subtractions.
>
> Also, the commit message does not claim to fix overflow. In fact, it
> says it can't overflow ...
>


Reading the commit message
"Tests for overflows with dates and timestamps in BRIN ...

...

The new regression tests check this for date and timestamp data types.
It adds tables with data close to the allowed min/max values, and builds
a minmax-multi index on it."

I expected the CREATE INDEX statement to throw an error or fail the
"Assert(delta >= 0);" in brin_minmax_multi_distance_date(). But a
later commit mentions that the overflow is not possible.

> >
> > We may want to combine various test cases though. Like the test adding
> > infinity and extreme values could be combined. Also the number of
> > values it inserts in the table for the reasons stated above.
> >
>
> I prefer not to do that. I find it more comprehensible to keep the tests
> separate / testing different things. If the tests were expensive to
> setup or something like that, that'd be a different situation.

Fair enough.

> >>
> >
> > Right. Do we highlight that in the commit message so that the person
> > writing release notes picks it up from there?
> >
>
> Yes, I think I'll mention what impact each issue can have on indexes.

Thanks.


-- 
Best Wishes,
Ashutosh Bapat




Re: boolin comment not moved when code was refactored

2023-10-18 Thread Vik Fearing

On 10/19/23 06:17, Peter Smith wrote:

In short, maybe the whole comment should just be

/*
  *  boolin - input function for type boolean
  */


How about "boolin - converts a boolean string value to 1 or 0"



Personally, I do not like exposing the implementation of a boolean (it 
is a base type that is not a numeric), so I prefer Tom's suggestion.

--
Vik Fearing





Re: boolin comment not moved when code was refactored

2023-10-18 Thread Peter Smith
On Thu, Oct 19, 2023 at 2:55 PM Tom Lane  wrote:
>
> Richard Guo  writes:
> > On Thu, Oct 19, 2023 at 10:35 AM Peter Smith  wrote:
> >> I happened upon a function comment referring to non-existent code
> >> (that code was moved to another location many years ago).
> >>
> >> Probably better to move that comment too. Thoughts?
>
> > Agreed. +1 to move that comment.
>
> Hm, I'm inclined to think that the comment lines just above:
>
>  *        boolin          - converts "t" or "f" to 1 or 0
>  *
>  * Check explicitly for "true/false" and TRUE/FALSE, 1/0, YES/NO, ON/OFF.
>  * Reject other values.
>
> are also well past their sell-by date.  The one-line summary
> "converts "t" or "f" to 1 or 0" is not remotely accurate anymore.
> Perhaps we should just drop it?  Or else reword to something
> vaguer, like "input function for boolean".  The "Check explicitly"
> para no longer describes logic in this function.  We could move
> it to parse_bool_with_len, but that seems to have a suitable
> comment already.
>

Yes, I had the same thought about the rest of the comment being
outdated but just wanted to test the water to see if a small change
was accepted before I did too much.

> In short, maybe the whole comment should just be
>
> /*
>  *  boolin - input function for type boolean
>  */
>

How about "boolin - converts a boolean string value to 1 or 0"

==
Kind Regards,
Peter Smith.
Fujitsu Australia.




Re: Asymmetric partition-wise JOIN

2023-10-18 Thread Andrei Lepikhov

On 18/10/2023 16:59, Ashutosh Bapat wrote:

On Wed, Oct 18, 2023 at 10:55 AM Andrei Lepikhov
 wrote:



But the clauses of A parameterized by P will produce different
translations for each of the partitions. I think we will need
different RelOptInfos (for A) to store these translations.


Does the answer above resolve this issue?


May be. There are other problematic areas like EvalPlanQual, Rescans,
reparameterised paths which can blow up if we use the same RelOptInfo
for different scans of the same relation. It will be good to test


Yeah, now I get it. This is the second place where I have seen a
reference to a kind of hidden rule that an rte entry (or RelOptInfo)
must correspond to only one plan node. I don't have a quick answer for
now - maybe it is a kind of architectural agreement - and I will
consider this issue during development.



those. And also A need not be a simple relation; it could be join as
well.


For a join RelOptInfo, as well as for any subtree, we have the same 
logic: the pathlist of this subtree is already formed during the 
previous level of the search and will not be changed.





The relid is also used to track the scans at executor level. Since we
have so many scans on A, each may be using different plan, we will
need different ids for those.


I don't understand this sentence. In what way does the executor use this index of
the RelOptInfo?


See Scan::scanrelid



--
regards,
Andrey Lepikhov
Postgres Professional





Re: boolin comment not moved when code was refactored

2023-10-18 Thread Tom Lane
Richard Guo  writes:
> On Thu, Oct 19, 2023 at 10:35 AM Peter Smith  wrote:
>> I happened upon a function comment referring to non-existent code
>> (that code was moved to another location many years ago).
>> 
>> Probably better to move that comment too. Thoughts?

> Agreed. +1 to move that comment.

Hm, I'm inclined to think that the comment lines just above:

 *        boolin          - converts "t" or "f" to 1 or 0
 *
 * Check explicitly for "true/false" and TRUE/FALSE, 1/0, YES/NO, ON/OFF.
 * Reject other values.

are also well past their sell-by date.  The one-line summary
"converts "t" or "f" to 1 or 0" is not remotely accurate anymore.
Perhaps we should just drop it?  Or else reword to something
vaguer, like "input function for boolean".  The "Check explicitly"
para no longer describes logic in this function.  We could move
it to parse_bool_with_len, but that seems to have a suitable
comment already.
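
(For instance, a quick check of the accepted spellings, which boolin() now
delegates to parse_bool_with_len(), shows why the old one-liner is stale:

SELECT 'yes'::boolean, 'off'::boolean, '1'::boolean, 't'::boolean;

returns t, f, t, t rather than anything resembling 1 or 0.)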

In short, maybe the whole comment should just be

/*
 *  boolin - input function for type boolean
 */

Agreed with your original point, though.

regards, tom lane




Re: pg_upgrade's interaction with pg_resetwal seems confusing

2023-10-18 Thread vignesh C
On Mon, 16 Oct 2023 at 10:33, Hayato Kuroda (Fujitsu)
 wrote:
>
> Dear Vignesh,
>
> Thank you for reviewing! PSA new version.
>
> >
> > Few comments:
> > 1) Most of the code in binary_upgrade_set_next_oid is similar to
> > SetNextObjectId, but SetNextObjectId has the error handling to report
> > an error if an invalid nextOid is specified:
> > if (ShmemVariableCache->nextOid > nextOid)
> > elog(ERROR, "too late to advance OID counter to %u, it is now %u",
> > nextOid, ShmemVariableCache->nextOid);
> >
> > Is this check been left out from binary_upgrade_set_next_oid function
> > intentionally? Have you left this because it could be like a dead
> > code. If so, should we have an assert for this here?
>
> Yeah, they were removed intentionally, but on reflection they could be
> combined. ereport() would be skipped during upgrade mode. Thoughts?
>
> Regarding the first ereport(ERROR), it just requires that we are doing initdb.
>
> As for the second ereport(ERROR), it requires that the next OID is not rolled back.
> The restriction seems OK during initialization, but it is not appropriate for
> upgrading: wraparound of the OID counter might have occurred on the old cluster,
> but we try to restore the counter anyway.
>
> > 2) How about changing issue_warnings_and_set_oid function name to
> > issue_warnings_and_set_next_oid?
> >  void
> > -issue_warnings_and_set_wal_level(void)
> > +issue_warnings_and_set_oid(void)
> >  {
>
> Fixed.
>
> > 3) We have removed these comments, is there any change to the rsync
> > instructions? If so we could update the comments accordingly.
> > -* We unconditionally start/stop the new server because
> > pg_resetwal -o set
> > -* wal_level to 'minimum'.  If the user is upgrading standby
> > servers using
> > -* the rsync instructions, they will need pg_upgrade to write its 
> > final
> > -* WAL record showing wal_level as 'replica'.
> >
>
> Hmm, I thought the comments for rsync seemed outdated, so I removed them. I still
> think this is not needed. Since controlfile->wal_level is not updated to 'minimal'
> anymore, the unconditional startup is not required for a physical standby.

We can update the commit message with these details; it will
help readers understand that this was done intentionally.

There are a couple of typos in the new patch:
1) "uprade logical replication slot" should be "upgrade logical
replication slot":
Previously, the OID counter is restored by invoking pg_resetwal with the -o
option, at the end of upgrade. This is not problematic for now, but WAL removals
are not happy if we want to uprade logical replication slot. Therefore, a new
upgrade function is introduced to reset next OID.
2) "becasue the value" should be "because the value":
Note that we only update the on-memory data to avoid concurrent update of
control with the chekpointer. It is harmless becasue the value would be set at
shutdown checkpoint.

Regards,
Vignesh




Re: Removing unneeded self joins

2023-10-18 Thread Andrei Lepikhov

On 19/10/2023 01:50, Alexander Korotkov wrote:

On Mon, Oct 16, 2023 at 11:28 AM Andrei Lepikhov
 wrote:

On 12/10/2023 18:32, Alexander Korotkov wrote:

On Thu, Oct 5, 2023 at 12:17 PM Andrei Lepikhov
 wrote:

On 4/10/2023 14:34, Alexander Korotkov wrote:

   > Relid replacement machinery is the most contradictory code here. We used
   > a utilitarian approach and implemented a simplistic variant.

   > > 2) It would be nice to skip the insertion of IS NOT NULL checks when
   > > they are not necessary.  [1] points that infrastructure from [2] might
   > > be useful.  The patchset from [2] seems committed mow.  However, I
   > > can't see it is directly helpful in this matter.  Could we just skip
   > > adding IS NOT NULL clause for the columns, that have
   > > pg_attribute.attnotnull set?
   > Thanks for the links, I will look into that case.

To be more precise, in the attachment, you can find a diff to the main
patch, which shows the volume of changes to achieve the desired behaviour.
Some explains in regression tests shifted. So, I've made additional tests:

DROP TABLE test CASCADE;
CREATE TABLE test (a int, b int not null);
CREATE UNIQUE INDEX abc ON test(b);
explain SELECT * FROM test t1 JOIN test t2 ON (t1.a=t2.a)
WHERE t1.b=t2.b;
CREATE UNIQUE INDEX abc1 ON test(a,b);
explain SELECT * FROM test t1 JOIN test t2 ON (t1.a=t2.a)
WHERE t1.b=t2.b;
explain SELECT * FROM test t1 JOIN test t2 ON (t1.a=t2.a)
WHERE t1.b=t2.b AND (t1.a=t2.a OR t2.a=t1.a);
DROP INDEX abc1;
explain SELECT * FROM test t1 JOIN test t2 ON (t1.a=t2.a)
WHERE t1.b=t2.b AND (t1.b=t2.b OR t2.b=t1.b);

We have almost the results we wanted. But in the last explain
you can see that nothing happened with the OR clause. We should use the
expression mutator instead of the walker to handle such clauses, but it
doesn't process the RestrictInfo node ... I'm inclined to put a solution
to this issue off for a while.


OK.  I don't think it is worth eliminating IS NULL quals with this
complexity (at least at this stage of the work).

I made improvements over the code.  Mostly new comments, grammar
corrections of existing comments and small refactoring.

Also, I found that the suggestion from David Rowley [1] to qsort the
array of relations to find duplicates faster was still unaddressed.
I've implemented it.  That helps avoid quadratic complexity with a
large number of relations.

Also, I've incorporated the improvements from Alena Rybakina, except the one
for skipping SJ removal when no SJ quals are found.  It's not yet clear to
me whether this check fixes any cases, but at least the optimization got skipped
in some useful cases (as you can see in the regression tests).


I would like to propose one more minor improvement (see in attachment).
The idea here is that after removing a self-join and changing clauses we
should re-probe the set of relids with the same Oid, because we can find
more removable self-joins (see the demo test in join.sql).
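
A minimal sketch (not the actual join.sql demo) of the kind of query
involved: several scans of the same table joined on a unique column, where
removing one pair leaves another candidate pair to re-probe.

CREATE TABLE sj_demo (a int PRIMARY KEY, b int);
EXPLAIN (COSTS OFF)
SELECT *
FROM sj_demo s1
JOIN sj_demo s2 ON s1.a = s2.a
JOIN sj_demo s3 ON s2.a = s3.a;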



Thank you, I've integrated that improvement into the patch.  BTW, the patch
introduces two new GUC variables: enable_self_join_removal and
self_join_search_limit.  The enable_self_join_removal variable turns
the optimization on/off entirely.  The self_join_search_limit variable limits
its usage by the number of joins.  AFAICS, self_join_search_limit is
intended to protect us from the quadratic complexity self-join removal
has.  I tried to reproduce the extreme case.

SELECT count(*) FROM pgbench_accounts a0, pgbench_accounts a1, ...,
pgbench_accounts a100 WHERE a0.aid = 1 AND a1.aid = a0.aid + 1 AND ...
AND a100.aid = a99.aid + 1;

This query took 3778.432 ms with self-join removal disabled, and
3756.009 ms with self-join removal enabled.  So, no measurable
overhead.  The same holds for higher numbers of joins.  Can you imagine
some extreme case where self-join removal could introduce significant
overhead in comparison with other optimizer parts?  If not, should we
remove the self_join_search_limit GUC?

Thanks,
It was Zhihong Yu who worried about that case [1], and my purpose was to
show a method to avoid such a problem if it were needed.
I guess the concern is the case where we have a lot of self-joins, but only a
few of them (or none) can be removed.
I can't imagine a practical situation where we would get stuck on that,
so I vote to remove this GUC.



[1] 
https://www.postgresql.org/message-id/CALNJ-vTyL-LpvSOPZxpC63Et3LJLUAFZSfRqGEhT5Rj7_EEj7w%40mail.gmail.com


--
regards,
Andrey Lepikhov
Postgres Professional





Re: [PoC] pg_upgrade: allow to upgrade publisher node

2023-10-18 Thread vignesh C
On Wed, 18 Oct 2023 at 14:55, Hayato Kuroda (Fujitsu)
 wrote:
>
> Dear Peter,
>
> Thank you for reviewing! PSA new version.
> Note that 0001 and 0002 are combined into one patch.
>
> > Here are some review comments for v51-0001
> >
> > ==
> > src/bin/pg_upgrade/check.c
> >
> > 0.
> > +check_old_cluster_for_valid_slots(bool live_check)
> > +{
> > + char output_path[MAXPGPATH];
> > + FILE*script = NULL;
> > +
> > + prep_status("Checking for valid logical replication slots");
> > +
> > + snprintf(output_path, sizeof(output_path), "%s/%s",
> > + log_opts.basedir,
> > + "invalid_logical_relication_slots.txt");
> >
> > 0a
> > typo /invalid_logical_relication_slots/invalid_logical_replication_slots/
>
> Fixed.
>
> > 0b.
> > Since the non-upgradable slots are not strictly "invalid", is this an
> > appropriate filename for the bad ones?
> >
> > But I don't have very good alternatives. Maybe:
> > - non_upgradable_logical_replication_slots.txt
> > - problem_logical_replication_slots.txt
>
> Per discussion [1], I kept current style.
>
> > src/bin/pg_upgrade/t/003_upgrade_logical_replication_slots.pl
> >
> > 1.
> > +# --
> > +# TEST: Confirm pg_upgrade fails when wrong GUC is set on new cluster
> > +#
> > +# There are two requirements for GUCs - wal_level and 
> > max_replication_slots,
> > +# but only max_replication_slots will be tested here. This is because to
> > +# reduce the execution time of the test.
> >
> >
> > SUGGESTION
> > # TEST: Confirm pg_upgrade fails when the new cluster has wrong GUC values.
> > #
> > # Two GUCs are required - 'wal_level' and 'max_replication_slots' - but to
> > # reduce the test execution time, only 'max_replication_slots' is tested 
> > here.
>
> First part was fixed. Second part was removed per [1].
>
> > 2.
> > +# Preparations for the subsequent test:
> > +# 1. Create two slots on the old cluster
> > +$old_publisher->start;
> > +$old_publisher->safe_psql('postgres',
> > + "SELECT pg_create_logical_replication_slot('test_slot1',
> > 'test_decoding', false, true);"
> > +);
> > +$old_publisher->safe_psql('postgres',
> > + "SELECT pg_create_logical_replication_slot('test_slot2',
> > 'test_decoding', false, true);"
> > +);
> >
> >
> > Can't you combine those SQL in the same $old_publisher->safe_psql.
>
> Combined.
>
> > 3.
> > +# Clean up
> > +rmtree($new_publisher->data_dir . "/pg_upgrade_output.d");
> > +# Set max_replication_slots to the same value as the number of slots. Both 
> > of
> > +# slots will be used for subsequent tests.
> > +$new_publisher->append_conf('postgresql.conf', "max_replication_slots = 
> > 1");
> >
> > The code doesn't seem to match the comment - is this correct? The
> > old_publisher created 2 slots, so why are you setting new_publisher
> > "max_replication_slots = 1" again?
>
> Fixed to "max_replication_slots = 2" Note that previous test worked well 
> because
> GUC checking on new cluster is done after checking the status of slots.
>
> > 4.
> > +# Preparations for the subsequent test:
> > +# 1. Generate extra WAL records. Because these WAL records do not get
> > consumed
> > +# it will cause the upcoming pg_upgrade test to fail.
> > +$old_publisher->start;
> > +$old_publisher->safe_psql('postgres',
> > + "CREATE TABLE tbl AS SELECT generate_series(1, 10) AS a;");
> > +
> > +# 2. Advance the slot test_slot2 up to the current WAL location
> > +$old_publisher->safe_psql('postgres',
> > + "SELECT pg_replication_slot_advance('test_slot2', NULL);");
> > +
> > +# 3. Emit a non-transactional message. test_slot2 detects the message so 
> > that
> > +# this slot will be also reported by upcoming pg_upgrade.
> > +$old_publisher->safe_psql('postgres',
> > + "SELECT count(*) FROM pg_logical_emit_message('false', 'prefix',
> > 'This is a non-transactional message');"
> > +);
> >
> >
> > I felt this test would be clearer if you emphasised the state of the
> > test_slot1 also. e.g.
> >
> > 4a.
> > BEFORE
> > +# 1. Generate extra WAL records. Because these WAL records do not get
> > consumed
> > +# it will cause the upcoming pg_upgrade test to fail.
> >
> > SUGGESTION
> > # 1. Generate extra WAL records. At this point neither test_slot1 nor 
> > test_slot2
> > #has consumed them.
>
> Fixed.
>
> > 4b.
> > BEFORE
> > +# 2. Advance the slot test_slot2 up to the current WAL location
> >
> > SUGGESTION
> > # 2. Advance the slot test_slot2 up to the current WAL location, but 
> > test_slot2
> > #still has unconsumed WAL records.
>
> IIUC, test_slot2 is caught up by pg_replication_slot_advance('test_slot2'). I 
> think
> "but test_slot1 still has unconsumed WAL records." is appropriate. Fixed.
>
> > 5.
> > +# pg_upgrade will fail because the slot still has unconsumed WAL records
> > +command_checks_all(
> >
> > /because the slot still has/because there are slots still having/
>
> Fixed.
>
> > 6.
> > + [qr//],
> > + 'run of pg_upgrade of old cluster with slot having unconsumed WAL records'
> > +);
> >
> > /slot/slots/
>
> Fixed.
>
> > 7.

Re: boolin comment not moved when code was refactored

2023-10-18 Thread Richard Guo
On Thu, Oct 19, 2023 at 10:35 AM Peter Smith  wrote:

> Hi.
>
> I happened upon a function comment referring to non-existent code
> (that code was moved to another location many years ago).
>
> Probably better to move that comment too. Thoughts?


Agreed. +1 to move that comment.

Thanks
Richard


Re: Add the ability to limit the amount of memory that can be allocated to backends.

2023-10-18 Thread Andrei Lepikhov

On 19/10/2023 02:00, Stephen Frost wrote:

Greetings,

* Andrei Lepikhov (a.lepik...@postgrespro.ru) wrote:

On 29/9/2023 09:52, Andrei Lepikhov wrote:

On 22/5/2023 22:59, reid.thomp...@crunchydata.com wrote:

Attach patches updated to master.
Pulled from patch 2 back to patch 1 a change that was also pertinent
to patch 1.

+1 to the idea, have doubts on the implementation.

I have a question. I see the feature triggers ERROR on the exceeding of
the memory limit. The superior PG_CATCH() section will handle the error.
As I see, many such sections use memory allocations. What if some
routine, like the CopyErrorData(), exceeds the limit, too? In this case,
we could repeat the error until the top PG_CATCH(). Is this correct
behaviour? Maybe to check in the exceeds_max_total_bkend_mem() for
recursion and allow error handlers to slightly exceed this hard limit?



With the patch in the attachment I try to show the sort of problems I'm worried
about. In some PG_CATCH() sections we do CopyErrorData() (allocating some
memory) before aborting the transaction. So, an allocation error can move
us out of this section before aborting. We expect a soft ERROR message but
will face harder consequences.


While it's an interesting idea to consider making exceptions to the
limit, and perhaps we'll do that (or have some kind of 'reserve' for
such cases), this isn't really any different than today, is it?  We
might have a malloc() failure in the main path, end up in PG_CATCH() and
then try to do a CopyErrorData() and have another malloc() failure.

If we can rearrange the code to make this less likely to happen, by
doing a bit more work to free() resources used in the main path before
trying to do new allocations, then, sure, let's go ahead and do that,
but that's independent from this effort.


I agree that rearranging efforts can be made independently. The code in
the message above was shown just as a demo of the case I'm worried about.
IMO, what should be implemented here is a recursion level for
the memory limit: if we hit this limit again while processing an error,
we should ignore it.
I can imagine custom extensions that use PG_CATCH() and allocate some data
there. At the least, we could raise the level of the error to FATAL.


--
regards,
Andrey Lepikhov
Postgres Professional





boolin comment not moved when code was refactored

2023-10-18 Thread Peter Smith
Hi.

I happened upon a function comment referring to non-existent code
(that code was moved to another location many years ago).

Probably better to move that comment too. Thoughts?

PSA.

==
Kind Regards,
Peter Smith.
Fujitsu Australia


v1-0001-Relocate-old-comment.patch
Description: Binary data


Re: pgBufferUsage.blk_{read|write}_time are zero although there are pgBufferUsage.local_blks_{read|written}

2023-10-18 Thread Michael Paquier
On Tue, Oct 03, 2023 at 12:44:36PM -0400, Robert Haas wrote:
> My first thought was to wonder whether this was even a bug. I
> remembered that EXPLAIN treats shared, local, and temp buffers as
> three separate categories of things. But it seems that someone decided
> to conflate two of them for I/O timing purposes:
> 
> if (has_timing)
> {
> appendStringInfoString(es->str, "
> shared/local");
> 
>  Notice this bit in particular.

I was reviewing the whole, and this is an oversight specific to
efb0ef909f60, because we've never incremented the write/read counters
for local buffers, even with this commit applied, for both the EXPLAIN
reports and anything stored in pg_stat_statement.  It seems to me that
the origin of the confusion comes down to pg_stat_database where
blk_{read|write}_time increments on both local and shared blocks, but
on EXPLAIN this stuff only reflects data about shared buffers.  So the
"shared" part of the string is right, but the "local" part is not in
v15 and v16.
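
For anyone reproducing this, the two places these timings surface can be
compared with something like the following sketch ("some_table" is a
hypothetical table, and track_io_timing must be enabled):

SET track_io_timing = on;
EXPLAIN (ANALYZE, BUFFERS) SELECT count(*) FROM some_table;
SELECT blk_read_time, blk_write_time
  FROM pg_stat_database WHERE datname = current_database();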
--
Michael




Re: run pgindent on a regular basis / scripted manner

2023-10-18 Thread Andres Freund
Hi,

On 2023-10-18 21:29:37 -0400, Tom Lane wrote:
> Andres Freund  writes:
> > It turns out that updating the in-tree typedefs.list would be very noisy. On
> > my local linux system I get
> >  1 file changed, 422 insertions(+), 1 deletion(-)
> > On a mac mini I get
> >  1 file changed, 351 insertions(+), 1 deletion(-)
>
> That seems like it needs a considerably closer look.  What exactly
> is getting added/deleted?

Types from bison, openssl, libxml, libxslt, icu and libc, at least. If I
enable LLVM, there are even more.

(I think I figured out what's happening further down)



> > We could possibly address that by updating the in-tree typedefs.list a bit
> > more aggressively. Sure looks like the source systems are on the older side.
>
> Really?  Per [1] we've currently got contributions from calliphoridae
> which is Debian sid, crake which is Fedora 38, indri/sifaka which are
> macOS Sonoma.  Were you really expecting something newer, and if so what?

It's quite odd, I see plenty more types than those. I can't really explain why
they're not being picked up on those animals.

E.g. for me bison generated files contain typedefs like

typedef int_least8_t yytype_int8;
typedef signed char yytype_int8;
typedef yytype_int8 yy_state_t;

yet they don't show up in the buildfarm typedefs output.  It's not an issue with
the binary; I checked that the symbols are present.

I have a hard time parsing the buildfarm code for generating the typedefs
file, tbh.



Ah, I see. If I interpret that correctly, the code filters out symbols it
doesn't find in some .[chly] file in the *source* directory. This code is,
uh, barely readable and massively underdocumented.

I guess I need to reimplement that :/.  Don't immediately see how this could
be implemented for in-tree autoconf builds...


> > But in the attached patch I've implemented this slightly differently. If the
> > tooling to do so is available, the indent-* targets explained above,
> > use/depend on src/tools/pgindent/typedefs.list.merged (in the build dir),
> > which is the combination of a src/tools/pgindent/typedefs.list.local 
> > generated
> > for the local binaries/libraries and the source tree
> > src/tools/pgindent/typedefs.list.
>
> Hmm ... that allows indenting your C files, but how do you get from that
> to a non-noisy patch to commit to typedefs.list?

It doesn't provide that on its own. Being able to painlessly indent the files
seems pretty worthwhile already. But clearly it'd much better if we can
automatically update typedefs.list.

Greetings,

Andres Freund




Re: [PoC] pg_upgrade: allow to upgrade publisher node

2023-10-18 Thread Peter Smith
Here are some review comments for v52-0001

==
src/bin/pg_upgrade/t/003_upgrade_logical_replication_slots.pl

1.
+ # 2. max_replication_slots is set to smaller than the number of slots (2)
+ # present on the old cluster

SUGGESTION
2. Set 'max_replication_slots' to be less than the number of slots (2)
present on the old cluster.

~~~

2.
+ # Set max_replication_slots to the same value as the number of slots. Both
+ # of slots will be used for subsequent tests.

SUGGESTION
Set 'max_replication_slots' to match the number of slots (2) present
on the old cluster.
Both slots will be used for subsequent tests.

~~~

3.
+ # 3. Emit a non-transactional message. test_slot2 detects the message so
+ # that this slot will be also reported by upcoming pg_upgrade.
+ $old_publisher->safe_psql('postgres',
+ "SELECT count(*) FROM pg_logical_emit_message('false', 'prefix',
'This is a non-transactional message');"
+ );

SUGGESTION
3. Emit a non-transactional message. This will cause test_slot2 to
detect the unconsumed WAL record.

~~~

4.
+ # Preparations for the subsequent test:
+ # 1. Generate extra WAL records. At this point neither test_slot1 nor
+ # test_slot2 has consumed them.
+ $old_publisher->start;
+ $old_publisher->safe_psql('postgres',
+ "CREATE TABLE tbl AS SELECT generate_series(1, 10) AS a;");
+
+ # 2. Advance the slot test_slot2 up to the current WAL location, but
+ # test_slot1 still has unconsumed WAL records.
+ $old_publisher->safe_psql('postgres',
+ "SELECT pg_replication_slot_advance('test_slot2', NULL);");
+
+ # 3. Emit a non-transactional message. test_slot2 detects the message so
+ # that this slot will be also reported by upcoming pg_upgrade.
+ $old_publisher->safe_psql('postgres',
+ "SELECT count(*) FROM pg_logical_emit_message('false', 'prefix',
'This is a non-transactional message');"
+ );
+
+ $old_publisher->stop;

All of the above are executed sequentially via
old_publisher->safe_psql, so consider whether it is worth combining them
all into a single call (still keeping comments 1, 2, 3 separate).

For example,

$old_publisher->start;
$old_publisher->safe_psql('postgres', qq[
  CREATE TABLE tbl AS SELECT generate_series(1, 10) AS a;
  SELECT pg_replication_slot_advance('test_slot2', NULL);
  SELECT count(*) FROM pg_logical_emit_message('false', 'prefix',
'This is a non-transactional message');
]);
$old_publisher->stop;

~~~

5.
+ # Clean up
+ $subscriber->stop();
+ $new_publisher->stop();

Should this also drop the 'test_slot1' and 'test_slot2'?

~~~

6.
+# Verify that logical replication slots cannot be migrated.  This function
+# will be executed when the old cluster is PG16 and prior.
+sub test_upgrade_from_pre_PG17
+{
+ my ($old_publisher, $new_publisher, $mode) = @_;
+
+ my $oldbindir = $old_publisher->config_data('--bindir');
+ my $newbindir = $new_publisher->config_data('--bindir');

SUGGESTION (let's not mention lots of different numbers; just refer to 17)
This function will be executed when the old cluster version is prior to PG17.

~~

7.
+ # Actual run, successful upgrade is expected
+ command_ok(
+ [
+ 'pg_upgrade', '--no-sync',
+ '-d', $old_publisher->data_dir,
+ '-D', $new_publisher->data_dir,
+ '-b', $oldbindir,
+ '-B', $newbindir,
+ '-s', $new_publisher->host,
+ '-p', $old_publisher->port,
+ '-P', $new_publisher->port,
+ $mode,
+ ],
+ 'run of pg_upgrade of old cluster');
+
+ ok( !-d $new_publisher->data_dir . "/pg_upgrade_output.d",
+ "pg_upgrade_output.d/ removed after pg_upgrade success");

7a.
The comment is wrong?

SUGGESTION
# pg_upgrade should NOT be successful

~

7b.
There is a blank line here before the ok() function, but in the other
tests, there was none. Better to be consistent.

~~~

8.
+ # Clean up
+ $new_publisher->stop();

Should this also drop the 'test_slot'?

~~~

9.
+# The TAP Cluster.pm assigns default 'max_wal_senders' and 'max_connections' to
+# the same value (10) but PG12 and prior considered max_walsenders as a subset
+# of max_connections, so setting the same value will fail.
+if ($old_publisher->pg_version->major < 12)
+{
+ $old_publisher->append_conf(
+ 'postgresql.conf', qq[
+ max_wal_senders = 5
+ max_connections = 10
+ ]);
+}

If the comment is correct, then PG12 *and* prior, should be testing
"<= 12", not "< 12". right?

~~~

10.
+# Test according to the major version of the old cluster.
+# Upgrading logical replication slots has been supported only since PG17.
+if ($old_publisher->pg_version->major >= 17)

This comment seems wrong IMO. I think we are always running the latest
version of pg_upgrade, so slot migration is always "supported" from now
on. IIUC you intended this comment to say something about the
old_publisher slots.

BEFORE
Upgrading logical replication slots has been supported only since PG17.

SUGGESTION
Upgrading logical replication slots from versions older than PG17 is
not supported.

==
Kind Regards,
Peter Smith.
Fujitsu Australia




Re: run pgindent on a regular basis / scripted manner

2023-10-18 Thread Tom Lane
Andres Freund  writes:
> It turns out that updating the in-tree typedefs.list would be very noisy. On
> my local linux system I get
>  1 file changed, 422 insertions(+), 1 deletion(-)
> On a mac mini I get
>  1 file changed, 351 insertions(+), 1 deletion(-)

That seems like it needs a considerably closer look.  What exactly
is getting added/deleted?

> We could possibly address that by updating the in-tree typedefs.list a bit
> more aggressively. Sure looks like the source systems are on the older side.

Really?  Per [1] we've currently got contributions from calliphoridae
which is Debian sid, crake which is Fedora 38, indri/sifaka which are
macOS Sonoma.  Were you really expecting something newer, and if so what?

> But in the attached patch I've implemented this slightly differently. If the
> tooling to do so is available, the indent-* targets explained above,
> use/depend on src/tools/pgindent/typedefs.list.merged (in the build dir),
> which is the combination of a src/tools/pgindent/typedefs.list.local generated
> for the local binaries/libraries and the source tree
> src/tools/pgindent/typedefs.list.

Hmm ... that allows indenting your C files, but how do you get from that
to a non-noisy patch to commit to typedefs.list?

regards, tom lane

[1] https://buildfarm.postgresql.org/cgi-bin/typedefs.pl?show_list




Re: Parent/child context relation in pg_get_backend_memory_contexts()

2023-10-18 Thread Andres Freund
Hi,

On 2023-10-18 15:53:30 -0400, Stephen Frost wrote:
> > Here how pg_backend_memory_contexts would look like with this patch:
> > 
> > postgres=# SELECT name, id, parent, parent_id, path
> > FROM pg_backend_memory_contexts
> > ORDER BY total_bytes DESC LIMIT 10;
> >            name           | id  |      parent      | parent_id |     path
> > --------------------------+-----+------------------+-----------+--------------
> >  CacheMemoryContext       |  27 | TopMemoryContext |         0 | {0}
> >  Timezones                | 124 | TopMemoryContext |         0 | {0}
> >  TopMemoryContext         |   0 |                  |           |
> >  MessageContext           |   8 | TopMemoryContext |         0 | {0}
> >  WAL record construction  | 118 | TopMemoryContext |         0 | {0}
> >  ExecutorState            |  18 | PortalContext    |        17 | {0,16,17}
> >  TupleSort main           |  19 | ExecutorState    |        18 | {0,16,17,18}
> >  TransactionAbortContext  |  14 | TopMemoryContext |         0 | {0}
> >  smgr relation table      |  10 | TopMemoryContext |         0 | {0}
> >  GUC hash table           | 123 | GUCMemoryContext |       122 | {0,122}
> > (10 rows)
> > 
> > An example query to calculate the total_bytes including its children for a
> > context (say CacheMemoryContext) would look like this:
> > 
> > WITH contexts AS (
> > SELECT * FROM pg_backend_memory_contexts
> > )
> > SELECT sum(total_bytes)
> > FROM contexts
> > WHERE ARRAY[(SELECT id FROM contexts WHERE name = 'CacheMemoryContext')] <@
> > path;
> 
> I wonder if we should perhaps just include
> "total_bytes_including_children" as another column?  Certainly seems
> like a very useful thing that folks would like to see.

The "issue" is where to stop - should we also add that for some of the other
columns? They are a bit less important, but not that much.
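
For illustration, a rolled-up number for every context could be computed from
the proposed path column along these lines (a sketch; the
total_bytes_including_children name is just the one suggested above):

WITH contexts AS MATERIALIZED (
    SELECT * FROM pg_backend_memory_contexts
)
SELECT p.name, p.id,
       sum(c.total_bytes) AS total_bytes_including_children
FROM contexts AS p
JOIN contexts AS c
  ON c.id = p.id OR ARRAY[p.id] <@ c.path
GROUP BY p.name, p.id
ORDER BY total_bytes_including_children DESC;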


> > We still need to use cte since ids are not persisted and might change in
> > each run of pg_backend_memory_contexts. Materializing the result can
> > prevent any inconsistencies due to id change. Also it can be even good for
> > performance reasons as well.
> 
> I don't think we really want this to be materialized, do we?  Where this
> is particularly interesting is when it's being dumped to the log ( ...
> though I wish we could do better than that and hope we do in the future)
> while something is ongoing in a given backend and if we do that a few
> times we are able to see what's changing in terms of allocations,
> whereas if we materialized it (when?  transaction start?  first time
> it's asked for?) then we'd only ever get the one view from whenever the
> snapshot was taken.

I think the comment was just about the need to use a CTE, because self-joining
with divergent versions of pg_backend_memory_contexts would not always work
out well.

Greetings,

Andres Freund




Re: run pgindent on a regular basis / scripted manner

2023-10-18 Thread Andres Freund
Hi,

On 2023-10-18 12:15:51 -0700, Andres Freund wrote:
> I still think that one of the more important things we ought to do is to make
> it trivial to check if code is correctly indented and reindent it for the
> user.  I've posted a preliminary patch to add a 'indent-tree' target a few
> months back, at
> https://postgr.es/m/20230527184201.2zdorrijg2inqt6v%40alap3.anarazel.de
> 
> I've updated that patch, now it has
> - indent-tree, reindents the entire tree
> - indent-head, which pgindent --commit=HEAD
> - indent-check, fails if the tree isn't correctly indented
> - indent-diff, like indent-check, but also shows the diff
> 
> If we tought pgindent to emit the list of files it processes to a dependency
> file, we could make it cheap to call indent-check repeatedly, by teaching
> meson/ninja to not reinvoke it if the input files haven't changed.  Personally
> that'd make it more bearable to script indentation checks to happen
> frequently.
> 
> 
> I'll look into writing a command to update typedefs.list with all the local
> changes.

It turns out that updating the in-tree typedefs.list would be very noisy. On
my local linux system I get
 1 file changed, 422 insertions(+), 1 deletion(-)

On a mac mini I get
 1 file changed, 351 insertions(+), 1 deletion(-)

We could possibly address that by updating the in-tree typedefs.list a bit
more aggressively. Sure looks like the source systems are on the older side.


But in the attached patch I've implemented this slightly differently. If the
tooling to do so is available, the indent-* targets explained above,
use/depend on src/tools/pgindent/typedefs.list.merged (in the build dir),
which is the combination of a src/tools/pgindent/typedefs.list.local generated
for the local binaries/libraries and the source tree
src/tools/pgindent/typedefs.list.

src/tools/pgindent/typedefs.list.local is generated in fragments, with one
fragment for each build target. That way the whole file doesn't have to be
regenerated all the time, which can save a good bit of time (although obviously
less when hacking on the backend).

This makes it quite quick to indent locally, without needing to fiddle
around with manually modifying typedefs.list or using a separate
typedefs.list.


In a third commit I added a 'nitpick' configure time option, defaulting to
off, which runs an indentation check. The failure mode of that currently isn't
very helpful though, as it just uses --silent-diff.


All of this currently is meson only, largely because I don't feel like
spending the time messing with the configure build, particularly before there
is any agreement on this being the thing to do.

Greetings,

Andres Freund
>From f7520fd87764cddbde9d6c5d25fdfe5314ec5cbc Mon Sep 17 00:00:00 2001
From: Andres Freund 
Date: Sat, 27 May 2023 11:38:27 -0700
Subject: [PATCH v3 1/3] Add meson targets to run pgindent

Author:
Reviewed-By:
Discussion: https://postgr.es/m/
Backpatch:
---
 meson.build | 38 +
 src/tools/pgindent/pgindent | 12 ++--
 2 files changed, 48 insertions(+), 2 deletions(-)

diff --git a/meson.build b/meson.build
index 862c955453f..2cfeb60eb07 100644
--- a/meson.build
+++ b/meson.build
@@ -3013,6 +3013,44 @@ run_target('install-test-files',
 
 
 
+###
+# Indentation and similar targets
+###
+
+indent_base_cmd = [perl, files('src/tools/pgindent/pgindent'),
+'--indent', pg_bsd_indent.full_path(),
+'--sourcetree=@SOURCE_ROOT@']
+indent_depend = [pg_bsd_indent]
+
+# Reindent the entire tree
+run_target('indent-tree',
+  command: indent_base_cmd + ['.'],
+  depends: indent_depend
+)
+
+# Reindent the last commit
+run_target('indent-head',
+  command: indent_base_cmd + ['--commit=HEAD'],
+  depends: indent_depend
+)
+
+# Silently check if any file is not correctly indented (fails the build if
+# there are differences)
+run_target('indent-check',
+  command: indent_base_cmd + ['--silent-diff', '.'],
+  depends: indent_depend
+)
+
+# Show a diff of all the unindented files (doesn't fail the build if there are
+# differences)
+run_target('indent-diff',
+  command: indent_base_cmd + ['--show-diff', '.'],
+  depends: indent_depend
+)
+
+
+
 ###
 # Test prep
 ###
diff --git a/src/tools/pgindent/pgindent b/src/tools/pgindent/pgindent
index bce63d95daf..08b0c95814f 100755
--- a/src/tools/pgindent/pgindent
+++ b/src/tools/pgindent/pgindent
@@ -23,7 +23,7 @@ my $devnull = File::Spec->devnull;
 
 my ($typedefs_file, $typedef_str, @excludes,
 	$indent, $build, $show_diff,
-	$silent_diff, $help, @commits,);
+	$silent_diff, $sourcetree, $help, @commits,);
 
 $help = 0;
 
@@ -35,7 +35,8 @@ my %options = (
 	"excludes=s" => \@excludes,
 	"indent=s" => \$indent,
 	"show-diff" => 

Re: pgsql: Generate automatically code and documentation related to wait ev

2023-10-18 Thread Michael Paquier
On Wed, Oct 18, 2023 at 05:59:19PM +0200, Christoph Berg wrote:
> I'm not entirely sure this is the commit to blame, but it's certainly
> close:

That would be in this area, thanks for the report.

> A Debian user is complaining that in PG17, the installed
> /usr/include/postgresql/17/server/utils/wait_event.h file is
> referencing utils/wait_event_types.h, but that file doesn't get
> installed by the (autoconf) build system.

On a fresh install of HEAD (3f9b1f26ca), I get:
$ cd $(pg_config --includedir-server)
$ find . -name wait_event.h
./utils/wait_event.h
$ find . -name wait_event_types.h
./utils/wait_event_types.h

But I have missed a piece related to VPATH builds for
wait_event_types.h in src/include/Makefile (see around probes.h).
Will fix as per the attached.  Thanks for the report.
--
Michael
diff --git a/src/include/Makefile b/src/include/Makefile
index 2d5242561c..a7ca277803 100644
--- a/src/include/Makefile
+++ b/src/include/Makefile
@@ -54,7 +54,7 @@ install: all installdirs
 	  $(INSTALL_DATA) $(srcdir)/$$dir/*.h '$(DESTDIR)$(includedir_server)'/$$dir || exit; \
 	done
 ifeq ($(vpath_build),yes)
-	for file in catalog/schemapg.h catalog/system_fk_info.h catalog/pg_*_d.h storage/lwlocknames.h utils/probes.h; do \
+	for file in catalog/schemapg.h catalog/system_fk_info.h catalog/pg_*_d.h storage/lwlocknames.h utils/probes.h utils/wait_event_types.h; do \
 	  $(INSTALL_DATA) $$file '$(DESTDIR)$(includedir_server)'/$$file || exit; \
 	done
 endif


signature.asc
Description: PGP signature


Re: The danger of deleting backup_label

2023-10-18 Thread David G. Johnston
On Wednesday, October 18, 2023, David Steele  wrote:

> On 10/18/23 08:39, Robert Haas wrote:
>
>> On Tue, Oct 17, 2023 at 4:17 PM David Steele  wrote:
>>
>>> Given that the above can't be back patched, I'm thinking we don't need
>>> backup_label at all going forward. We just write the values we need for
>>> recovery into pg_control and return *that* from pg_backup_stop() and
>>> tell the user to store it with their backup. We already have "These
>>> files are vital to the backup working and must be written byte for byte
>>> without modification, which may require opening the file in binary
>>> mode." in the documentation so dealing with pg_control should not be a
>>> problem. pg_control also has a CRC so we will know if it gets munged.
>>>
>>
>> Yeah, I was thinking about this kind of idea, too. I think it might be
>> a good idea, although I'm not completely certain about that, either.
>>
>
> 
>
> First, anything that is stored in the backup_label but not the control
>> file has to (a) move into the control file,
>>
>
> I'd rather avoid this.
>
>
If we are OK outputting custom pg_control file content from pg_backup_stop
to govern this, then I'd probably just include
"tablespace_map_required=true|false" and "backup_label_required=true" lines
in it and leave everything else mostly the same, including the name.  In
order for a server with those lines in its pg_control to boot, it requires
that a signal file be present along with any of the named files where
*_required is true.  Upon successful completion those lines are removed
from pg_control.

It seems unnecessary to move any and all relevant content into pg_control;
just a flag to ensure that the files that are needed are present in the
backup directory, plus whatever validation of those files is needed to ensure
they provide sufficient data.
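
For reference, the existing low-level API already hands the relevant content
back to the caller as query output, roughly like this (a simplified sketch of
the PG 15+ functions; the label is arbitrary):

-- begin the backup; 'nightly' is just an example label
SELECT pg_backup_start(label => 'nightly', fast => true);
-- ... the backup tool copies the data directory here ...
-- end the backup and capture what must be stored alongside the copy
SELECT lsn, labelfile, spcmapfile
  FROM pg_backup_stop(wait_for_archive => true);

Whatever split we choose between pg_control and returned text, the tooling
workflow keeps that general shape.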

If the user removes those files without a backup the server is not going to
start unless they make further obvious attempts to circumvent the design.
Manually editing pg_control would be an obvious circumvention.

This would seem to be a compatible change.  If you fail to use the
pg_control from pg_backup_stop you don't get the safeguard, but everything
still works.  Do we really believe we need to break/force-upgrade tooling
to use this new procedure?  That may depend on the answer to the torn
pg_control file problem, which may indeed warrant such breakage.

David J.


Re: Add support for AT LOCAL

2023-10-18 Thread Noah Misch
On Wed, Oct 18, 2023 at 04:45:46PM -0400, Tom Lane wrote:
> > On Wed, Oct 18, 2023 at 12:02 PM Tom Lane  wrote:
> >> Probably.  Independent of that, it's fair to ask why we're still
> >> testing against xlc 12.1 and not the considerably-more-recent xlclang,
> >> or at least xlc 16.1.  (I also wonder why we're still testing AIX 7.1
> >> rather than an OS version that's not EOL.)

> The machine belongs to OSU (via the gcc compile farm), and I see
> that they have another one that's POWER8 and is running AIX 7.3 [1].
> So in principle the buildfarm animals could just be moved over
> to that one.
> 
> Perhaps Noah has some particular attachment to 7.1, but now that that's
> EOL it seems like we shouldn't be paying so much attention to it.
> My guess is that it's still there in the compile farm because the gcc
> people think it's still useful to have access to POWER7 hardware; but
> I doubt there's enough difference for our purposes to be worth dealing
> with a dead OS and ancient compiler.

No particular attachment.  From 2019 to 2023-08, hoverfly tested xlc16 on AIX
7.2; its run ended when cfarm119's owner replaced cfarm119 with an AIX 7.3,
ibm-clang v17.1.1 machine.  Since 2015, hornet and mandrill have tested xlc12
on AIX 7.1.  That said, given your finding that later xlc versions have the
same code generation bug, the choice of version is a side issue.  A migration
to ibm-clang wouldn't have prevented this week's xlc-prompted commits.

I feel the gravity and longevity of xlc bugs has been out of proportion with
the compiler's contribution to PostgreSQL.  I would find it reasonable to
revoke xlc support in v17+, leaving AIX gcc support in place.  The main
contribution of AIX has been to find the bug behind commit a1b8aa1.  That
benefited from the AIX kernel, not from any particular compiler.  hornet and
mandrill would continue to test v16-.

By the way, I once tried to report an xlc bug.  Their system was tailored to
accept bugs from paid support customers only.  I submitted it via some sales
inquiry form, just in case, but never heard back.




Re: The danger of deleting backup_label

2023-10-18 Thread David Steele

On 10/18/23 08:39, Robert Haas wrote:

On Tue, Oct 17, 2023 at 4:17 PM David Steele  wrote:

Given that the above can't be back patched, I'm thinking we don't need
backup_label at all going forward. We just write the values we need for
recovery into pg_control and return *that* from pg_backup_stop() and
tell the user to store it with their backup. We already have "These
files are vital to the backup working and must be written byte for byte
without modification, which may require opening the file in binary
mode." in the documentation so dealing with pg_control should not be a
problem. pg_control also has a CRC so we will know if it gets munged.


Yeah, I was thinking about this kind of idea, too. I think it might be
a good idea, although I'm not completely certain about that, either.





First, anything that is stored in the backup_label but not the control
file has to (a) move into the control file, 


I'd rather avoid this.


(b) be stored someplace
else, 


I don't think the additional fields *need* to be stored anywhere at all, 
at least not by us. We can provide them as output from pg_backup_stop() 
and the caller can do as they please. None of those fields are part of 
the restore process.


or (c) be eliminated as a concept. 


I think we should keep them as above since I don't believe they hurt 
anything.



We're likely to get
complaints about (a), especially if the data in question is anything
big. Any proposal to do (b) risks undermining the whole theory under
which this is a good proposal, namely that removing backup_label gives
us one less thing to worry about. So that brings us to (c).
Personally, I would lose very little sleep if the LABEL field died and
never came back, and I wouldn't miss START TIME and STOP TIME either,
but somebody else might well feel differently. I don't think it's
trivial to get rid of BACKUP METHOD, as there unfortunately seems to
be code that depends on knowing the difference between BACKUP FROM:
streamed and BACKUP FROM: pg_rewind. I suspect that BACKUP FROM:
primary/standby might have the same issue, but I'm not sure. 


BACKUP FROM: primary/standby we can definitely handle, and BACKUP 
METHOD: pg_rewind just means backupEndRequired is false. That will need 
to be written by pg_rewind (instead of writing backup_label).



STOP
TIMELINE could be a problem too. I think that if somebody could do
some rejiggering to eliminate some of the differences between the
cases here, that could be really good general cleanup irrespective of
what we decide about this proposal, and moving some things in to
pg_control is probably reasonable too. For instance, it would seem
crazy to me to argue that storing the backup end location in the
control file is OK, but storing the backup end TLI there would not be
OK. 


We have a place in pg_control for the end TLI (minRecoveryPointTLI). 
That's only valid for backup from standby since a primary backup can 
never change timelines.



But the point here is that there's probably a good deal of careful
thinking that would need to be done here about exactly where all of
the stuff that currently exists in the backup_label file but not in
pg_control needs to end up.


Agreed.


Second, right now, the stuff that we return at the end of a backup is
all text data. With this proposal, it becomes binary data. I entirely
realize that people should only be doing these kinds of backups using
automated tools that that those automated tools should be perfectly
capable of handling binary data without garbling anything. But that's
about as realistic as supposing that people won't instantly remove the
backup_label file the moment it seems like it will solve some problem,
even when the directions clearly state that this should only be done
in some other situation that is not the one the user is facing. 


Well, we do specify that backup_label and tablespace_map should be saved 
byte for byte. But we've already seen users mess this up in the past and 
add \r characters that made the files unreadable.


If it can be done wrong, it will be done wrong by somebody.


It
just occurred to me that one thing we could do to improve the user
experience here is offer some kind of command-line utility to assist
with taking a low-level backup. This could be done even if we don't
proceed with this proposal e.g.

pg_llbackup -d $CONNTR --backup-label=PATH --tablespace-map=PATH
--copy-data-directory=SHELLCOMMAND

I don't know for sure how much that would help, but I wonder if it
might actually help quite a bit, because right now people do things
like use psql in a shell script to try to juggle a database connection
and then in some other part of the shell script do the data copying.
But it is quite easy to screw up the error handling or the psql
session lifetime or something like that, and this would maybe provide
a nicer interface. 


I think in most cases where this would be useful the user should just be 
using pg_basebackup. If the backup is trying to use snapshots, 

Re: Add support for AT LOCAL

2023-10-18 Thread Tom Lane
Robert Haas  writes:
> On Wed, Oct 18, 2023 at 12:02 PM Tom Lane  wrote:
>> Probably.  Independent of that, it's fair to ask why we're still
>> testing against xlc 12.1 and not the considerably-more-recent xlclang,
>> or at least xlc 16.1.  (I also wonder why we're still testing AIX 7.1
>> rather than an OS version that's not EOL.)

> Well, according to Wikipedia, AIX 7.3 (released in 2021) requires
> POWER8. AIX 7.2 (released 2015) only requires POWER7, and according to
> the buildfarm page, this machine is POWER7. So it could possibly be
> upgraded from 7.1 to 7.2, supposing that it is indeed compatible with
> that release and that Noah's willing to do it and that there's not an
> exorbitant fee and so on, but that still leaves you running an OS
> version that is almost certainly closer to EOL than it is to the
> original release date. Anything newer would require buying new
> hardware, or so I guess.

The machine belongs to OSU (via the gcc compile farm), and I see
that they have another one that's POWER8 and is running AIX 7.3 [1].
So in principle the buildfarm animals could just be moved over
to that one.

Perhaps Noah has some particular attachment to 7.1, but now that that's
EOL it seems like we shouldn't be paying so much attention to it.
My guess is that it's still there in the compile farm because the gcc
people think it's still useful to have access to POWER7 hardware; but
I doubt there's enough difference for our purposes to be worth dealing
with a dead OS and ancient compiler.

regards, tom lane

[1] https://portal.cfarm.net/machines/list/




Re: pg_dump needs SELECT privileges on irrelevant extension table

2023-10-18 Thread Tom Lane
Stephen Frost  writes:
> Greetings,
> * Tom Lane (t...@sss.pgh.pa.us) wrote:
>> I wrote:
>>> Why are we marking extension member objects as being subject to SECLABEL
>>> or POLICY dumping?

> This change would mean that policies added by a user after the extension
> is created would just be lost by a pg_dump/reload, wouldn't it?

Yes.  But I'd say that's unsupported, just like making other ad-hoc
changes to extension objects is unsupported (and the effects will be
lost on dump/reload).  We specifically have support for user-added
ACLs, and that's good, but don't claim that we have support for
doing the same with policies.
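
To make that concrete (a hypothetical sketch; the extension and table names
are invented for illustration):

CREATE EXTENSION some_ext;                     -- say it creates table some_ext.config
GRANT SELECT ON some_ext.config TO reporting;  -- user-added ACL, tracked via pg_init_privs
ALTER TABLE some_ext.config ENABLE ROW LEVEL SECURITY;
CREATE POLICY per_tenant ON some_ext.config
    USING (tenant = current_user);             -- user-added policy, no comparable tracking

The GRANT survives dump/reload because pg_init_privs lets pg_dump compute a
delta; the policy has no equivalent baseline to diff against.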

As far as I can see, the current behavior is that we'll dump and
try to reload policies (and seclabels) on extension objects even
if those properties were set by the extension creation script.
That has many more problems than just the one Jacob is moaning
about: you'll see failures at reload if you're not superuser,
and if the destination installation has a newer version of the
extension than what was dumped, the old properties might be
completely inappropriate.  So IMO there's basically nothing
that works properly about this.  To make it work, we'd need
infrastructure comparable to the pg_init_privs infrastructure.

regards, tom lane




Re: pg_dump needs SELECT privileges on irrelevant extension table

2023-10-18 Thread Stephen Frost
Greetings,

* Tom Lane (t...@sss.pgh.pa.us) wrote:
> I wrote:
> > Why are we marking extension member objects as being subject to SECLABEL
> > or POLICY dumping?  As the comment notes, that isn't really sensible
> > unless what we are dumping is a delta from the extension's initial
> > assignments.  But we have no infrastructure for that, and none seems
> > likely to appear in the near future.
> 
> Here's a quick patch that does it that way.  The test changes
> are identical to Jacob's v3-0001.

What the comment is talking about is that we don't support initial
policies, not that we don't support policies on extension tables at all.
That said ... even the claim that we don't support such policies isn't
supported by code and there are people out there doing it, which creates
its own set of problems (ones we should really try to find solutions to
though..).

This change would mean that policies added by a user after the extension
is created would just be lost by a pg_dump/reload, wouldn't it?

Thanks,

Stephen


signature.asc
Description: PGP signature


Re: Parent/child context relation in pg_get_backend_memory_contexts()

2023-10-18 Thread Stephen Frost
Greetings,

* Melih Mutlu (m.melihmu...@gmail.com) wrote:
> Melih Mutlu  wrote on Fri, 16 Jun 2023 at 17:03:
> 
> > With this change, here's a query to find how much space used by each
> > context including its children:
> >
> > > WITH RECURSIVE cte AS (
> > > SELECT id, total_bytes, id as root, name as root_name
> > > FROM memory_contexts
> > > UNION ALL
> > > SELECT r.id, r.total_bytes, cte.root, cte.root_name
> > > FROM memory_contexts r
> > > INNER JOIN cte ON r.parent_id = cte.id
> > > ),
> > > memory_contexts AS (
> > > SELECT * FROM pg_backend_memory_contexts
> > > )
> > > SELECT root as id, root_name as name, sum(total_bytes)
> > > FROM cte
> > > GROUP BY root, root_name
> > > ORDER BY sum DESC;
> 
> Given that the above query to get total bytes including all children is
> still a complex one, I decided to add an additional field to
> pg_backend_memory_contexts.
> The new "path" field displays an integer array that consists of ids of all
> parents for the current context. This way it's easier to tell whether a
> context is a child of another context, and we don't need to use recursive
> queries to get this info.

Nice, this does seem quite useful.

> Here how pg_backend_memory_contexts would look like with this patch:
> 
> postgres=# SELECT name, id, parent, parent_id, path
> FROM pg_backend_memory_contexts
> ORDER BY total_bytes DESC LIMIT 10;
>   name   | id  |  parent  | parent_id | path
> -+-+--+---+--
>  CacheMemoryContext  |  27 | TopMemoryContext | 0 | {0}
>  Timezones   | 124 | TopMemoryContext | 0 | {0}
>  TopMemoryContext|   0 |  |   |
>  MessageContext  |   8 | TopMemoryContext | 0 | {0}
>  WAL record construction | 118 | TopMemoryContext | 0 | {0}
>  ExecutorState   |  18 | PortalContext|17 | {0,16,17}
>  TupleSort main  |  19 | ExecutorState|18 | {0,16,17,18}
>  TransactionAbortContext |  14 | TopMemoryContext | 0 | {0}
>  smgr relation table |  10 | TopMemoryContext | 0 | {0}
>  GUC hash table  | 123 | GUCMemoryContext |   122 | {0,122}
> (10 rows)
> 
> An example query to calculate the total_bytes including its children for a
> context (say CacheMemoryContext) would look like this:
> 
> WITH contexts AS (
> SELECT * FROM pg_backend_memory_contexts
> )
> SELECT sum(total_bytes)
> FROM contexts
> WHERE ARRAY[(SELECT id FROM contexts WHERE name = 'CacheMemoryContext')] <@
> path;

I wonder if we should perhaps just include
"total_bytes_including_children" as another column?  Certainly seems
like a very useful thing that folks would like to see.  We could do that
either with C, or even something as simple as changing the view to do
something like:

WITH contexts AS MATERIALIZED (
  SELECT * FROM pg_get_backend_memory_contexts()
)
SELECT
  *,
  coalesce
  (
(
  (SELECT sum(total_bytes) FROM contexts WHERE ARRAY[a.id] <@ path)
  + total_bytes
),
total_bytes
  ) AS total_bytes_including_children
FROM contexts a;
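
As a usage sketch against that definition (assuming the view keeps its
current name and simply gains the extra column):

SELECT name, total_bytes, total_bytes_including_children
  FROM pg_backend_memory_contexts
 ORDER BY total_bytes_including_children DESC
 LIMIT 5;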

> We still need to use a CTE since ids are not persisted and might change in
> each run of pg_backend_memory_contexts. Materializing the result can
> prevent any inconsistencies due to id changes. It can also be good for
> performance reasons.

I don't think we really want this to be materialized, do we?  Where this
is particularly interesting is when it's being dumped to the log ( ...
though I wish we could do better than that and hope we do in the future)
while something is ongoing in a given backend and if we do that a few
times we are able to see what's changing in terms of allocations,
whereas if we materialized it (when?  transaction start?  first time
it's asked for?) then we'd only ever get the one view from whenever the
snapshot was taken.

> Any thoughts?

Generally +1 from me for working on improving this.

Thanks!

Stephen


signature.asc
Description: PGP signature


Avoid race condition for event_triggers regress test

2023-10-18 Thread Mikhail Gribkov
Hi hackers,

After committing the on-login trigger
(e83d1b0c40ccda8955f1245087f0697652c4df86) the event_trigger regress test
became sensitive to any other parallel tests, not only DDL. Thus it should
be placed in a separate parallel schedule group.

The current problem is that a race condition may occur on some systems
when the oidjoins test starts a moment later than normal and affects the login
count for the on-login trigger test. The problem is quite a rare one and I only
faced it once. But rare or not - the problem is a problem and it should be
addressed.

Such a race condition can be simulated by adding "select pg_sleep(2);" and
"\c" at the very beginning of oidjoins.sql and adding "select pg_sleep(5);"
after the creation of the login trigger in event_trigger.sql.
The resulting symptoms are quite recognizable: the regression.diffs file will
contain an unexpected welcome message for the oidjoins test and an unexpectedly
increased result of "SELECT COUNT(*) FROM user_logins;" for the event_triggers
test. (These are accompanied by the expected responses to the newly added
commands, of course.)
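
Spelled out, the simulation amounts to edits along these lines (a sketch
showing only the added lines):

-- at the top of src/test/regress/sql/oidjoins.sql
select pg_sleep(2);
\c

-- in event_trigger.sql, immediately after creating the login event trigger
select pg_sleep(5);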

To get rid of the unexpected results the oidjoins and event_triggers tests
should be split into separate parallel schedule groups. This is exactly
what the proposed (attached) patch does.

What do you think?
--
 best regards,
Mikhail A. Gribkov

e-mail: youzh...@gmail.com


v001_avoid_race_condition_for_event_trigger_test.patch
Description: Binary data


Re: Guiding principle for dropping LLVM versions?

2023-10-18 Thread Thomas Munro
*rouge




Re: run pgindent on a regular basis / scripted manner

2023-10-18 Thread Andres Freund
Hi,

On 2023-10-16 20:45:00 -0400, Tom Lane wrote:
> Peter Geoghegan  writes:
> 2. We could raise awareness of this issue by adding indent verification
> to CI testing.  I hesitate to suggest that, though, for a couple of
> reasons:
>2a. It seems fairly expensive, though I might be misjudging.

Compared to other things it's not that expensive. On my workstation, which is
slower on a per-core basis than CI, a whole tree pgindent --silent-diff takes
6.8s.  That's doing things serially; it shouldn't be that hard to parallelize
the per-file processing.

For comparison, the current compiler warnings task takes 6-15min, depending on
the state of the ccache "database". Even when ccache is primed, running
cpluspluscheck or headerscheck is ~30s each. Adding a few more seconds for an
indentation check wouldn't be a problem.


>2b. It's often pretty handy to submit patches that aren't fully
>indent-clean; I have such a patch in flight right now at [1].
>
> 2b could be ameliorated by making the indent check be a separate
> test process that doesn't obscure the results of other testing.

The compiler warnings task already executes a number of tests even if prior
tests have failed (to be able to find compiler warnings in different compilers
at once). Adding pgindent cleanliness to that would be fairly simple.


I still think that one of the more important things we ought to do is to make
it trivial to check if code is correctly indented and reindent it for the
user.  I've posted a preliminary patch to add a 'indent-tree' target a few
months back, at
https://postgr.es/m/20230527184201.2zdorrijg2inqt6v%40alap3.anarazel.de

I've updated that patch; now it has
- indent-tree, which reindents the entire tree
- indent-head, which runs pgindent --commit=HEAD
- indent-check, which fails if the tree isn't correctly indented
- indent-diff, like indent-check, but also shows the diff

If we taught pgindent to emit the list of files it processes to a dependency
file, we could make it cheap to call indent-check repeatedly, by teaching
meson/ninja to not reinvoke it if the input files haven't changed.  Personally
that'd make it more bearable to script indentation checks to happen
frequently.


I'll look into writing a command to update typedefs.list with all the local
changes.

Greetings,

Andres Freund
>From 288c109286fda4dbd40fab345fbaf71d91d2db39 Mon Sep 17 00:00:00 2001
From: Andres Freund 
Date: Sat, 27 May 2023 11:38:27 -0700
Subject: [PATCH v2] Add meson targets to run pgindent

Author:
Reviewed-By:
Discussion: https://postgr.es/m/
Backpatch:
---
 meson.build | 36 
 src/tools/pgindent/pgindent | 12 ++--
 2 files changed, 46 insertions(+), 2 deletions(-)

diff --git a/meson.build b/meson.build
index 862c955453f..7d4d02e7624 100644
--- a/meson.build
+++ b/meson.build
@@ -3013,6 +3013,42 @@ run_target('install-test-files',
 
 
 
+###
+# Indentation and similar targets
+###
+
+indent_base = [perl, files('src/tools/pgindent/pgindent'),
+'--indent', pg_bsd_indent.full_path(),
+'--sourcetree=@SOURCE_ROOT@']
+
+# reindent the entire tree
+run_target('indent-tree',
+  command: indent_base + ['.'],
+  depends: pg_bsd_indent
+)
+
+# reindent the last commit
+run_target('indent-head',
+  command: indent_base + ['--commit=HEAD'],
+  depends: pg_bsd_indent
+)
+
+# silently check if any file is not correctly indented (fails the build if
+# there are differences)
+run_target('indent-check',
+  command: indent_base + ['--silent-diff', '.'],
+  depends: pg_bsd_indent
+)
+
+# show a diff of all the unindented files (doesn't fail the build if there are
+# differences)
+run_target('indent-diff',
+  command: indent_base + ['--show-diff', '.'],
+  depends: pg_bsd_indent
+)
+
+
+
 ###
 # Test prep
 ###
diff --git a/src/tools/pgindent/pgindent b/src/tools/pgindent/pgindent
index bce63d95daf..08b0c95814f 100755
--- a/src/tools/pgindent/pgindent
+++ b/src/tools/pgindent/pgindent
@@ -23,7 +23,7 @@ my $devnull = File::Spec->devnull;
 
 my ($typedefs_file, $typedef_str, @excludes,
 	$indent, $build, $show_diff,
-	$silent_diff, $help, @commits,);
+	$silent_diff, $sourcetree, $help, @commits,);
 
 $help = 0;
 
@@ -35,7 +35,8 @@ my %options = (
 	"excludes=s" => \@excludes,
 	"indent=s" => \$indent,
 	"show-diff" => \$show_diff,
-	"silent-diff" => \$silent_diff,);
+	"silent-diff" => \$silent_diff,
+	"sourcetree=s" => \$sourcetree);
 GetOptions(%options) || usage("bad command line argument");
 
 usage() if $help;
@@ -53,6 +54,13 @@ $typedefs_file ||= $ENV{PGTYPEDEFS};
 # get indent location for environment or default
 $indent ||= $ENV{PGINDENT} || $ENV{INDENT} || "pg_bsd_indent";
 
+# When invoked from the build directory, change into source tree, otherwise
+# 

Re: Guiding principle for dropping LLVM versions?

2023-10-18 Thread Thomas Munro
On Thu, Sep 21, 2023 at 12:27 PM Devrim Gündüz  wrote:
> Even though older LLVM versions exist on both RHEL and Fedora, they
> don't provide older Clang packages, which means we have to link to the
> latest release anyway (like currently Fedora 38 packages are waiting for
> LLVM 16 patch, as they cannot be linked against LLVM 15)

That's quite interesting, because it means that RHEL doesn't act as
the "lanterne route" for this, ie the most conservative relevant
distribution.

If we used Debian as a yardstick, PostgreSQL 17 wouldn't need anything
older than LLVM 14 AFAICS.  Who else do we need to ask?  Where could
we find this sort of information in machine-readable form (that is
feedback I got discussing the wiki page idea with people, ie that it
would be bound to become stale and abandoned)?

Fresh from doing battle with this stuff, I wanted to see what it would
look like if we dropped 3.9...13 in master:

 11 files changed, 12 insertions(+), 367 deletions(-)

I noticed in passing that the LLVMOrcRegisterJITEventListener
configure probes are not present in meson.


0001-jit-Require-at-least-LLVM-14-if-enabled.patch
Description: Binary data


Re: Add the ability to limit the amount of memory that can be allocated to backends.

2023-10-18 Thread Stephen Frost
Greetings,

* Andrei Lepikhov (a.lepik...@postgrespro.ru) wrote:
> On 29/9/2023 09:52, Andrei Lepikhov wrote:
> > On 22/5/2023 22:59, reid.thomp...@crunchydata.com wrote:
> > > Attach patches updated to master.
> > > Pulled from patch 2 back to patch 1 a change that was also pertinent
> > > to patch 1.
> > +1 to the idea, have doubts on the implementation.
> > 
> > I have a question. I see the feature triggers ERROR on the exceeding of
> > the memory limit. The superior PG_CATCH() section will handle the error.
> > As I see, many such sections use memory allocations. What if some
> > routine, like the CopyErrorData(), exceeds the limit, too? In this case,
> > we could repeat the error until the top PG_CATCH(). Is this correct
> > behaviour? Maybe to check in the exceeds_max_total_bkend_mem() for
> > recursion and allow error handlers to slightly exceed this hard limit?

> By the patch in attachment I try to show which sort of problems I'm worrying
> about. In some PG_CATCH() sections we do CopyErrorData (allocate some
> memory) before aborting the transaction. So, the allocation error can move
> us out of this section before aborting. We await for soft ERROR message but
> will face more hard consequences.

While it's an interesting idea to consider making exceptions to the
limit, and perhaps we'll do that (or have some kind of 'reserve' for
such cases), this isn't really any different than today, is it?  We
might have a malloc() failure in the main path, end up in PG_CATCH() and
then try to do a CopyErrorData() and have another malloc() failure.

If we can rearrange the code to make this less likely to happen, by
doing a bit more work to free() resources used in the main path before
trying to do new allocations, then, sure, let's go ahead and do that,
but that's independent from this effort.

Thanks!

Stephen


signature.asc
Description: PGP signature


Re: Removing unneeded self joins

2023-10-18 Thread Alexander Korotkov
On Mon, Oct 16, 2023 at 11:28 AM Andrei Lepikhov
 wrote:
> On 12/10/2023 18:32, Alexander Korotkov wrote:
> > On Thu, Oct 5, 2023 at 12:17 PM Andrei Lepikhov
> >  wrote:
> >> On 4/10/2023 14:34, Alexander Korotkov wrote:
> >>>   > Relid replacement machinery is the most contradictory code here. We 
> >>> used
> >>>   > a utilitarian approach and implemented a simplistic variant.
> >>>
> >>>   > > 2) It would be nice to skip the insertion of IS NOT NULL checks when
> >>>   > > they are not necessary.  [1] points that infrastructure from [2] 
> >>> might
> >>>   > > be useful.  The patchset from [2] seems committed mow.  However, I
> >>>   > > can't see it is directly helpful in this matter.  Could we just skip
> >>>   > > adding IS NOT NULL clause for the columns, that have
> >>>   > > pg_attribute.attnotnull set?
> >>>   > Thanks for the links, I will look into that case.
> >> To be more precise, in the attachment, you can find a diff to the main
> >> patch, which shows the volume of changes to achieve the desired behaviour.
> >> Some explains in regression tests shifted. So, I've made additional tests:
> >>
> >> DROP TABLE test CASCADE;
> >> CREATE TABLE test (a int, b int not null);
> >> CREATE UNIQUE INDEX abc ON test(b);
> >> explain SELECT * FROM test t1 JOIN test t2 ON (t1.a=t2.a)
> >> WHERE t1.b=t2.b;
> >> CREATE UNIQUE INDEX abc1 ON test(a,b);
> >> explain SELECT * FROM test t1 JOIN test t2 ON (t1.a=t2.a)
> >> WHERE t1.b=t2.b;
> >> explain SELECT * FROM test t1 JOIN test t2 ON (t1.a=t2.a)
> >> WHERE t1.b=t2.b AND (t1.a=t2.a OR t2.a=t1.a);
> >> DROP INDEX abc1;
> >> explain SELECT * FROM test t1 JOIN test t2 ON (t1.a=t2.a)
> >> WHERE t1.b=t2.b AND (t1.b=t2.b OR t2.b=t1.b);
> >>
> >> We have almost the results we wanted to have. But in the last explain
> >> you can see that nothing happened with the OR clause. We should use the
> >> expression mutator instead of walker to handle such clauses. But It
> >> doesn't process the RestrictInfo node ... I'm inclined to put a solution
> >> of this issue off for a while.
> >
> > OK.  I think it doesn't worth to eliminate IS NULL quals with this
> > complexity (at least at this stage of work).
> >
> > I made improvements over the code.  Mostly new comments, grammar
> > corrections of existing comments and small refactoring.
> >
> > Also, I found that the suggestion from David Rowley [1] to qsort the
> > array of relations to find duplicates faster is still unaddressed.
> > I've implemented it.  That helps to evade quadratic complexity with a
> > large number of relations.
> >
> > Also I've incorporated improvements from Alena Rybakina except one for
> > skipping SJ removal when no SJ quals are found.  It's not yet clear to
> > me if this check fixes some cases. But at least the optimization got skipped
> > in some useful cases (as you can see in regression tests).
>
> I would like to propose one more minor improvement (see in attachment).
> The idea here is that after removing a self-join and changing clauses we
> should re-probe the set of relids with the same Oid, because we can find
> more removable self-joins (see the demo test in join.sql).


Thank you, I've integrated this into the patch.  BTW, the patch
introduces two new GUC variables: enable_self_join_removal and
self_join_search_limit.  The enable_self_join_removal variable turns
the optimization on/off entirely, while self_join_search_limit limits
its application by the number of joins.  AFAICS, self_join_search_limit is
intended to protect us from the quadratic complexity that self-join removal
has.  I tried to reproduce the extreme case.

SELECT count(*) FROM pgbench_accounts a0, pgbench_accounts a1, ...,
pgbench_accounts a100 WHERE a0.aid = 1 AND a1.aid = a0.aid + 1 AND ...
AND a100.aid = a99.aid + 1;

This query took 3778.432 ms with self-join removal disabled, and
3756.009 ms with self-join removal enabled.  So, no measurable
overhead; the same holds with a higher number of joins.  Can you imagine
some extreme case where self-join removal could introduce significant
overhead in comparison with other optimizer parts?  If not, should we
remove the self_join_search_limit GUC?
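
For anyone trying the patch, the basic shape of a removable self-join is just
a table joined to itself on a unique key (a minimal sketch;
enable_self_join_removal is the GUC added by this patch):

CREATE TABLE t (id int PRIMARY KEY, payload text);
SET enable_self_join_removal = on;
EXPLAIN (COSTS OFF)
SELECT t1.payload FROM t t1 JOIN t t2 ON t1.id = t2.id;
-- with the patch this should collapse to a single scan of t instead of a join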

--
Regards,
Alexander Korotkov


0001-Remove-useless-self-joins-v45.patch
Description: Binary data


Re: [PoC/RFC] Multiple passwords, interval expirations

2023-10-18 Thread Stephen Frost
Greetings,

* Jeff Davis (pg...@j-davis.com) wrote:
> On Tue, 2023-10-17 at 22:52 -0400, Stephen Frost wrote:
> 
> > Reading back through the thread, from a user perspective, the primary
> > one seems to be that passwords are expected to be named.  I'm
> > surprised
> > this is being brought up as such a serious concern.  Lots and lots
> > and
> > lots of things in the system require naming, after all, and the idea
> > that this is somehow harder or more of an issue is quite odd to me.
> 
> In the simplest intended use case, the names will be arbitrary and
> temporary. It's easy for me to imagine someone wondering "was I
> supposed to delete 'bear' or 'lion'?". For indexes and other objects,
> there's a lot more to go on, easily visible with \d.

Sure, agreed.

> Now, obviously that is not the end of the world, and the user could
> prevent that problem a number of different ways. And we can do things
> like improve the monitoring of password use, and store the password
> creation time, to help users if they are confused. So I don't raise
> concerns about naming as an objection to the feature overall, but
> rather a concern that we aren't getting it quite right.

Right, we need more observability, agreed, but that's not strictly
necessary of this patch and could certainly be added independently.  Is
there really a need to make this observability a requirement of this
particular change?

> Maybe a name should be entirely optional, more like a comment, and the
> passwords can be referenced by the salt? The salt needs to be unique
> for a given user anyway.

I proposed an approach in the email you replied to that explicitly suggests
a way we could make the name optional, so, sure, I'm generally on
board with that idea.  Note that it'd be optional for the user to
provide; if omitted, we'd simply generate one for them, and that name is
what would be used to refer to that password later.

> (Aside: is the uniqueness of the salt enforced in the current patch?)

Err, the salt has to be *identical* for each password of a given user,
not unique, so I'm a bit confused here.

Thanks,

Stephen


signature.asc
Description: PGP signature


Re: LLVM 16 (opaque pointers)

2023-10-18 Thread Devrim Gündüz


Hi,

On Thu, 2023-10-19 at 06:20 +1300, Thomas Munro wrote:
> Pushed, after digging up some old LLVM skeletons to test, and those
> "old LLVM" animals are turning green now.  I went ahead and pushed the
> much smaller and simpler patch for LLVM 17.

Thank you! I can confirm that RPMs built fine on Fedora 39 with those
patches, which ships LLVM 17.0.2 as of today.

I can also confirm that builds are not broken on RHEL 9 and 8 and Fedora
37 which ship LLVM 15, and Fedora 38 (LLVM 16).

Thanks again!

Regards,
-- 
Devrim Gündüz
Open Source Solution Architect, PostgreSQL Major Contributor
Twitter: @DevrimGunduz , @DevrimGunduzTR




Re: LLVM 16 (opaque pointers)

2023-10-18 Thread Andres Freund
Hi,

On 2023-10-19 06:20:26 +1300, Thomas Munro wrote:
> On Thu, Oct 19, 2023 at 1:06 AM Thomas Munro  wrote:
> > On Thu, Oct 19, 2023 at 12:02 AM Thomas Munro  
> > wrote:
> > > I pushed the first patch, for LLVM 16, and the build farm told me that
> > > some old LLVM versions don't like it.  The problem seems to be the
> > > function LLVMGlobalGetValueType().  I can see that that was only added
> > > to the C API in 2018, so it looks like I may need to back-port that
> > > (trivial) wrapper into our own llvmjit_wrap.cpp for LLVM < 8.
> >
> > Concretely something like the attached should probably fix it, but
> > it'll take me a little while to confirm that...
> 
> Pushed, after digging up some old LLVM skeletons to test, and those
> "old LLVM" animals are turning green now.  I went ahead and pushed the
> much smaller and simpler patch for LLVM 17.

I enabled a new set of buildfarm animals to test LLVM 16 and 17. I initially
forgot to disable them for 11, which means we'll have those failed builds on 11
until they age out :/.

Particularly for the LLVM debug builds it'll take a fair bit to run on all
branches. Each branch takes about 3h.

Greetings,

Andres Freund




Re: LLVM 16 (opaque pointers)

2023-10-18 Thread Thomas Munro
On Thu, Oct 19, 2023 at 1:06 AM Thomas Munro  wrote:
> On Thu, Oct 19, 2023 at 12:02 AM Thomas Munro  wrote:
> > I pushed the first patch, for LLVM 16, and the build farm told me that
> > some old LLVM versions don't like it.  The problem seems to be the
> > function LLVMGlobalGetValueType().  I can see that that was only added
> > to the C API in 2018, so it looks like I may need to back-port that
> > (trivial) wrapper into our own llvmjit_wrap.cpp for LLVM < 8.
>
> Concretely something like the attached should probably fix it, but
> it'll take me a little while to confirm that...

Pushed, after digging up some old LLVM skeletons to test, and those
"old LLVM" animals are turning green now.  I went ahead and pushed the
much smaller and simpler patch for LLVM 17.

Interestingly, a new problem just showed up on the RHEL9 s390x
machine "lora", where a previously reported problem [1] apparently
re-appeared.  It complains about incompatible layout, previously
blamed on a mismatch between clang and LLVM versions.  I can see that
its clang is v15 from clues in the config log, but I don't know which
version of LLVM is being used.  However, I see now that --with-llvm
was literally just turned on, so there is no reason to think that this
would have worked before or that this work is relevant.  Strange though --
we must be able to JIT further than that on s390x because we have
crash reports in other threads (ie we made it past this and into other
more advanced brokenness).

[1] 
https://www.postgresql.org/message-id/flat/20210319190047.7o4bwhbp5dzkqif3%40alap3.anarazel.de#ec51b488ca8eac8c603d91c0439d38b2




Re: Add support for AT LOCAL

2023-10-18 Thread Robert Haas
On Wed, Oct 18, 2023 at 12:02 PM Tom Lane  wrote:
> Probably.  Independent of that, it's fair to ask why we're still
> testing against xlc 12.1 and not the considerably-more-recent xlclang,
> or at least xlc 16.1.  (I also wonder why we're still testing AIX 7.1
> rather than an OS version that's not EOL.)

Well, according to Wikipedia, AIX 7.3 (released in 2021) requires
POWER8. AIX 7.2 (released 2015) only requires POWER7, and according to
the buildfarm page, this machine is POWER7. So it could possibly be
upgraded from 7.1 to 7.2, supposing that it is indeed compatible with
that release and that Noah's willing to do it and that there's not an
exorbitant fee and so on, but that still leaves you running an OS
version that is almost certainly closer to EOL than it is to the
original release date. Anything newer would require buying new
hardware, or so I guess.

Put otherwise, I think the reason we're testing on this AIX rather
than anything else is probably that there is exactly 1 person
associated with the project who has >0 pieces of hardware that can run
AIX, and that person has one, so we're testing on that one. That might
be a reason to question whether that particular strain of hardware has
a bright future, at least in terms of PostgreSQL support.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Postgres Architecture

2023-10-18 Thread Bruce Momjian
On Tue, Oct 17, 2023 at 08:39:49AM +1100, Timothy Nelson wrote:
> Great!  I'm not surprised it's been around a long time -- I didn't think I
> could be the only one to think of it.  
> 
> Thanks for the heads-up on Postgres-XL -- I'd missed that one somehow. 
> 
> I'm going to include the words "architecture" and "replication" so that people
> searching the archives in the future have more chance of finding this
> conversation. 

You can get some of this using foreign data wrappers to other Postgres
servers.
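
For example, a minimal postgres_fdw setup looks roughly like this (host,
credentials, and table definitions are placeholders):

CREATE EXTENSION postgres_fdw;
CREATE SERVER shard1 FOREIGN DATA WRAPPER postgres_fdw
    OPTIONS (host 'shard1.example.com', dbname 'app');
CREATE USER MAPPING FOR CURRENT_USER SERVER shard1
    OPTIONS (user 'app', password 'secret');
CREATE FOREIGN TABLE accounts_shard1 (aid int, balance numeric)
    SERVER shard1 OPTIONS (schema_name 'public', table_name 'accounts');

Queries against accounts_shard1 are then pushed to the remote server where
possible, which covers part of what a scale-out architecture would want.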

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  Only you can decide what is important to you.




Re: run pgindent on a regular basis / scripted manner

2023-10-18 Thread Bruce Momjian
On Tue, Oct 17, 2023 at 11:01:44AM -0400, Robert Haas wrote:
> In fact, that particular experience is one of the worst things about
> being a committer. It actively discourages me, at least, from trying
> to get other people's patches committed. This particular problem is
> minor, but the overall experience of trying to get things committed is
> that you have to check 300 things for every patch and if you get every
> one of them right then nothing happens and if you get one of them
> wrong then you get a bunch of irritated emails criticizing your
> laziness, sloppiness, or whatever, and you have to drop everything to
> go fix it immediately. What a deal! I'm sure this isn't the only
> reason why we have such a huge backlog of patches needing committer
> attention, but it sure doesn't help. And there is absolutely zero need
> for this to be yet another thing that you can find out you did wrong
> in the 1-24 hour period AFTER you type 'git push'.

This comment resonated with me.  I do all my git operations with shell
scripts so I can check for all the mistakes I have made in the past and
generate errors. Even with all of that, committing is an
anxiety-producing activity because any small mistake is quickly revealed
to the world.  There aren't many things I do in a day where mistakes are
so impactful.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  Only you can decide what is important to you.




Re: [PoC/RFC] Multiple passwords, interval expirations

2023-10-18 Thread Jeff Davis
On Tue, 2023-10-17 at 22:52 -0400, Stephen Frost wrote:

> Reading back through the thread, from a user perspective, the primary
> one seems to be that passwords are expected to be named.  I'm
> surprised
> this is being brought up as such a serious concern.  Lots and lots
> and
> lots of things in the system require naming, after all, and the idea
> that this is somehow harder or more of an issue is quite odd to me.

In the simplest intended use case, the names will be arbitrary and
temporary. It's easy for me to imagine someone wondering "was I
supposed to delete 'bear' or 'lion'?". For indexes and other objects,
there's a lot more to go on, easily visible with \d.

Now, obviously that is not the end of the world, and the user could
prevent that problem a number of different ways. And we can do things
like improve the monitoring of password use, and store the password
creation time, to help users if they are confused. So I don't raise
concerns about naming as an objection to the feature overall, but
rather a concern that we aren't getting it quite right.

Maybe a name should be entirely optional, more like a comment, and the
passwords can be referenced by the salt? The salt needs to be unique
for a given user anyway.

(Aside: is the uniqueness of the salt enforced in the current patch?)

Regards,
Jeff Davis





Re: RFC: Logging plan of the running query

2023-10-18 Thread James Coleman
On Wed, Oct 18, 2023 at 2:09 AM torikoshia  wrote:
>
> On 2023-10-16 18:46, Ashutosh Bapat wrote:
> > On Thu, Oct 12, 2023 at 6:51 PM torikoshia 
> > wrote:
> >>
> >> On 2023-10-11 16:22, Ashutosh Bapat wrote:
> >> >
> >> > Considering the similarity with auto_explain I wondered whether this
> >> > function should be part of auto_explain contrib module itself? If we
> >> > do that users will need to load auto_explain extension and thus
> >> > install executor hooks when this function doesn't need those. So may
> >> > not be such a good idea. I didn't see any discussion on this.
> >>
> >> I once thought about adding this to auto_explain, but I left it asis
> >> for
> >> below reasons:
> >>
> >> - One of the typical use case of pg_log_query_plan() would be
> >> analyzing
> >> slow query on customer environments. On such environments, We cannot
> >> always control what extensions to install.
> >
> > The same argument applies to auto_explain functionality as well. But
> > it's not part of the core.
>
> Yeah, and when we have a situation where we want to run
> pg_log_query_plan(), we can run it in any environment as long as it is
>   bundled with the core.
> On the other hand, if it is built into auto_explain, we need to start by
> installing auto_explain if we do not have auto_explain, which is often
> difficult to do in production environments.
>
> >>Of course auto_explain is a major extension and it is quite
> >> possible
> >> that they installed auto_explain, but it is also possible they do
> >> not.
> >> - It seems a bit counter-intuitive that pg_log_query_plan() is in an
> >> extension called auto_explain, since it `manually` logs plans
> >>
> >
> > pg_log_query_plan() may not fit auto_explain but
> > pg_explain_backend_query() does. What we are logging is more than just
> > plan of the query, it might expand to be closer to explain output.
> > While auto in auto_explain would refer to its automatically logging
> > explain outputs, it can provide an additional function which provides
> > similar functionality by manually triggering it.
> >
> > But we can defer this to a committer, if you want.
> >
> > I am more interested in avoiding the duplication of code, esp. the
> > first comment in my reply
>
> If there are no objections, I will try porting it to auto_explain and
> see its feasibility.
>
> >>> There is a lot of similarity between what this feature does and what
> >>> auto explain does. I see the code is also duplicated. There is some
> >>> merit in avoiding this duplication
> >>> 1. we will get all the features of auto_explain automatically like
> >>> choosing a format (this was expressed somebody earlier in this
> >>> thread), setings etc.
> >>> 2. avoid bugs. E.g your code switches context after ExplainState has
> >>> been allocated. These states may leak depending upon when this
> >>> function gets called.
> >>> 3. Building features on top as James envisions will be easier.

In my view the fact that auto_explain is itself not part of core is a
mistake, and I know there are more prominent hackers than myself who
hold that view.

While that decision as regards auto_explain has long since been made
(and there would be work to undo it), I don't believe that we should
repeat that choice here. I'm -10 on moving this into auto_explain.

However perhaps there is still an opportunity for moving some of the
auto_explain code into core so as to enable deduplicating the code.
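
For readers following along, the interface under discussion is invoked
against a running backend, along these lines (a sketch, assuming the function
takes a backend PID in the style of pg_log_backend_memory_contexts()):

-- find a long-running backend of interest
SELECT pid, now() - query_start AS runtime, query
  FROM pg_stat_activity WHERE state = 'active';
-- ask it to log the plan of its currently running query; the plan shows up
-- in the server log
SELECT pg_log_query_plan(12345);   -- 12345 is a hypothetical PID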

Regards,
James Coleman




Re: Query execution in Perl TAP tests needs work

2023-10-18 Thread Alvaro Herrera
On 2023-Oct-18, Robert Haas wrote:

> Without FFI::Platypus, we have to write Perl code that can speak the
> wire protocol directly. Basically, we're writing our own PostgreSQL
> driver for Perl, though we might need only a subset of the things a
> real driver would need to handle, and we might add some extra things,
> like code that can send intentionally botched protocol messages.

We could revive the old src/interfaces/perl5 code, which was a libpq
wrapper -- at least the subset of it that the tests need.  It was moved
to gborg by commit 9a0b4d7f8474 and a couple more versions were made
there, which can be seen at
https://www.postgresql.org/ftp/projects/gborg/pgperl/stable/,
version 2.1.1 being apparently the latest.  The complete driver was
about 3000 lines, judging by the commit that removed it.  Presumably we
don't need the whole of that.

Apparently the project was migrated from gborg to pgFoundry at some
point, because this exists
https://www.postgresql.org/ftp/projects/pgFoundry/pgperl/
and maybe they did some additional changes there, but at least
our FTP site doesn't show anything.  Perhaps there were never any
releases, and we don't have the CVSROOT.  But I doubt any changes at
that point would have been critical.

-- 
Álvaro Herrera   48°01'N 7°57'E  —  https://www.EnterpriseDB.com/




Re: Add support for AT LOCAL

2023-10-18 Thread Tom Lane
Robert Haas  writes:
> On Tue, Oct 17, 2023 at 7:35 PM Tom Lane  wrote:
>> Discounting the Windows animals, it looks like the xlc animals are
>> our only remaining ones that use anything except gcc or clang.

> After some research I determined that the release date for xlc 12.1
> seems to be June 1, 2012. At that time, clang 3.1 was current and just
> after, GCC release version 4.7.1 was released. The oldest version of
> clang that I find in the buildfarm is 3.9, and the oldest version of
> gcc I find in the buildfarm is 4.6.3. So, somewhat to my surprise, xlc
> is not the oldest compiler that we're still supporting in the
> buildfarm. But it is very old, and it seems like gcc and clang
> are going to continue to gain ground against xlc and other proprietary
> compilers for some time to come.

Probably.  Independent of that, it's fair to ask why we're still
testing against xlc 12.1 and not the considerably-more-recent xlclang,
or at least xlc 16.1.  (I also wonder why we're still testing AIX 7.1
rather than an OS version that's not EOL.)

> What does concern me is finding and coding
> around compiler bugs. 19fa977311b9da9c6c84f0108600e78213751a38 is just
> ridiculous, IMHO.

I would agree, except for the downthread discovery that the bug is
still present in current xlc and xlclang.  Short of blowing off AIX
altogether, it seems like we need to do something about it.

regards, tom lane




Re: pgsql: Generate automatically code and documentation related to wait ev

2023-10-18 Thread Christoph Berg
Re: Michael Paquier
> Generate automatically code and documentation related to wait events

Hi,

I'm not entirely sure this is the commit to blame, but it's certainly
close:

A Debian user is complaining that in PG17, the installed
/usr/include/postgresql/17/server/utils/wait_event.h file is
referencing utils/wait_event_types.h, but that file doesn't get
installed by the (autoconf) build system.

See
https://pgdgbuild.dus.dg-i.net/job/postgresql-17-binaries-snapshot/242/architecture=amd64,distribution=sid/consoleText
for a build log with file listings near the end.

Christoph




Re: Query execution in Perl TAP tests needs work

2023-10-18 Thread Robert Haas
On Wed, Oct 18, 2023 at 11:43 AM Tom Lane  wrote:
> Having said that ... I read the man page for FFI::Platypus,
> and I'm failing to draw a straight line between what it can do
> and what we need.  Aren't we going to need a big chunk of new
> Perl code anyway?  If so, why not write a big chunk of new Perl
> that doesn't have new external dependencies?

I think that the question here is what exactly the Perl code that we'd
have to write would be doing.

If we use FFI::Platypus, our Perl code can directly call libpq
functions. The Perl code would just be a wrapper around those function
calls, basically. I'm sure there's some work to be done there but a
lot of it is probably boilerplate.

Without FFI::Platypus, we have to write Perl code that can speak the
wire protocol directly. Basically, we're writing our own PostgreSQL
driver for Perl, though we might need only a subset of the things a
real driver would need to handle, and we might add some extra things,
like code that can send intentionally botched protocol messages.

I think it's a judgement call which is better. Depending on
FFI::Platypus is annoying, because nobody likes dependencies. But
writing a new implementation of the wire protocol is probably more
work, and once we wrote it, we'd also need to maintain it and debug it
and stuff. We would probably be able to gain some test coverage of
situations that libpq won't let you create, but we would also perhaps
lose some test coverage for libpq itself.

I feel like either way is a potentially viable way forward, and where
we end up might end up depending on who is willing to do the work and
what that person would prefer to do.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Query execution in Perl TAP tests needs work

2023-10-18 Thread Tom Lane
Robert Haas  writes:
> On Wed, Oct 18, 2023 at 10:28 AM Tom Lane  wrote:
>> I did a bit of research on this on my favorite platforms, and did
>> not like the results:

> Hmm. That's unfortunate. Is perl -MCPAN -e 'install Platypus::FFI' a
> viable alternative?

Probably, see my followup.

regards, tom lane




Re: Query execution in Perl TAP tests needs work

2023-10-18 Thread Tom Lane
Dagfinn Ilmari Mannsåker  writes:
> Tom Lane  writes:
>> gfortran?   Really??  I mean, I don't care about the disk space,
>> but this is not promising for anyone who has to build it themselves.

> The Fortran support for FFI::Platypus is in a separate CPAN distribution
> (FFI-Platypus-Lang-Fortran), so that must be some quirk of the Fedora
> packaging and not a problem for people building it themselves.

Ah, they must've decided to combine FFI-Platypus-Lang-Fortran into the
same RPM.  Not quite as bad then, since clearly we don't need that.

Another thing to worry about is whether it runs on the oldest
perl versions we support.  I tried it on a 5.14.0 installation,
and it at least compiles and passes its self-test, so that's
promising.  "cpanm install FFI::Platypus" took about 5 minutes
(on a 2012 mac mini, not the world's fastest machine).  The
list of dependencies it pulled in is still kinda long:

Capture-Tiny-0.48
ExtUtils-ParseXS-3.51
Test-Simple-1.302195
constant-1.33
Scalar-List-Utils-1.63
Term-Table-0.017
Test2-Suite-0.000156
File-Which-1.27
FFI-CheckLib-0.31
Try-Tiny-0.31
Test-Fatal-0.017
Test-Needs-0.002010
Test-Warnings-0.032
URI-5.21
Algorithm-Diff-1.201
Text-Diff-1.45
Spiffy-0.46
Test-Base-0.89
Test-YAML-1.07
Test-Deep-1.204
YAML-1.30
File-chdir-0.1011
Path-Tiny-0.144
Alien-Build-2.80
Alien-Build-Plugin-Download-GitHub-0.10
Net-SSLeay-1.92
HTTP-Tiny-0.088
Mozilla-CA-20230821
IO-Socket-SSL-2.083
Mojo-DOM58-3.001
Alien-FFI-0.27
FFI-Platypus-2.08

A couple of these are things a buildfarm animal would need anyway,
but on the whole it seems like this is pretty far up the food chain
compared to our previous set of TAP dependencies (only three
non-core-Perl modules).

Still, writing our own equivalent is probably more work than it's
worth, if this is a suitable solution in all other respects.

Having said that ... I read the man page for FFI::Platypus,
and I'm failing to draw a straight line between what it can do
and what we need.  Aren't we going to need a big chunk of new
Perl code anyway?  If so, why not write a big chunk of new Perl
that doesn't have new external dependencies?

regards, tom lane




Re: Query execution in Perl TAP tests needs work

2023-10-18 Thread Robert Haas
On Wed, Oct 18, 2023 at 10:28 AM Tom Lane  wrote:
> I did a bit of research on this on my favorite platforms, and did
> not like the results:

Hmm. That's unfortunate. Is perl -MCPAN -e 'install FFI::Platypus' a
viable alternative?

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Add support for AT LOCAL

2023-10-18 Thread Robert Haas
On Tue, Oct 17, 2023 at 7:35 PM Tom Lane  wrote:
> Thomas Munro  writes:
> > On Wed, Oct 18, 2023 at 11:54 AM Tom Lane  wrote:
> >> Should we be testing against xlclang instead?
>
> > I hesitated to suggest it because it's not my animal/time we're
> > talking about but it seems to make more sense.  It appears to be IBM's
> > answer to the nothing-builds-with-this-thing phenomenon, since it
> > accepts a lot of GCCisms via Clang's adoption of them.  From a quick
> > glance at [1], it lacks the atomics builtins but we have our own
> > assembler magic for POWER.  So maybe it'd all just work™.
>
> Discounting the Windows animals, it looks like the xlc animals are
> our only remaining ones that use anything except gcc or clang.
> That feels uncomfortably like a compiler monoculture to me, so
> I can understand the reasoning for keeping hornet/mandrill going.
> Still, maybe we should just accept the fact that gcc/clang have
> outcompeted everything else in the C compiler universe.  It's
> getting hard to imagine that anyone would bring out some new product
> that didn't try to be bug-compatible with gcc, for precisely the
> reason you mention.

After some research I determined that the release date for xlc 12.1
seems to be June 1, 2012. At that time, clang 3.1 was current and just
after, GCC release version 4.7.1 was released. The oldest version of
clang that I find in the buildfarm is 3.9, and the oldest version of
gcc I find in the buildfarm is 4.6.3. So, somewhat to my surprise, xlc
is not the oldest compiler that we're still supporting in the
buildfarm. But it is very old, and it seems like that gcc and clang
are going to continue to gain ground against gcc and other proprietary
compilers for some time to come. I think it's reasonable to ask
ourselves whether we really want to go to the trouble of maintaining
something that is likely to get so little real-world usage.

To be honest, I'm not particularly concerned about the need to adjust
compiler and linker options from time to time, even though I know that
probably annoys Andres. What does concern me is finding and coding
around compiler bugs. 19fa977311b9da9c6c84f0108600e78213751a38 is just
ridiculous, IMHO. If an end-of-life compiler for an end-of-life
operating system has bugs that mean that C code that's doing nothing
more than a bit of arithmetic isn't compiling properly, it's time to
pull the plug. Nor is this the first example of working around a bug
that only manifests in ancient xlc.

I think that, when there was more real diversity in the software
ecosystem, testing on a lot of platforms was a good way of finding out
whether you'd done something that was in general correct or just
something that happened to work on the machine you had in front of
you. But hornet and mandrill are not telling us about things we've
done that are incorrect in general yet happen to work on gcc and
clang. What they seem to be telling us about, in this case and some
others, are things that are CORRECT in general yet happen NOT to work
on ancient xlc. And that's an important difference, because if we were
finding real mistakes for which other platforms were not punishing us,
then we could hope that fixing those mistakes would improve
compatibility with other, equally niche platforms, potentially
including future platforms that haven't come along yet. As it is, it's
hard to believe that any work we put into this is going to have any
benefit on any system other than ancient AIX. If there are other niche
systems out there that have a similar number of bugs, they'll probably
be *different* bugs.

Sources for release dates:

https://www.ibm.com/support/pages/fix-list-xl-c-aix
https://releases.llvm.org/
https://gcc.gnu.org/releases.html

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: pg_upgrade's interaction with pg_resetwal seems confusing

2023-10-18 Thread Alvaro Herrera
Note that this patch falsifies the comment in SetNextObjectId that
taking the lock is pro forma only -- it no longer is, since in upgrade
mode there can be multiple subprocesses running -- so I think it should
be updated.

-- 
Álvaro Herrera PostgreSQL Developer  —  https://www.EnterpriseDB.com/




Re: Query execution in Perl TAP tests needs work

2023-10-18 Thread Dagfinn Ilmari Mannsåker
Tom Lane  writes:

> Robert Haas  writes:
>> On Sat, Sep 2, 2023 at 2:42 PM Andrew Dunstan  wrote:
 How much burden is it? Would anyone actually mind?
>
>> ... At the same time, fallbacks can be a problem too,
>> because then you can end up with things that work one way on one
>> developer's machine (or BF machine) and another way on another
>> developer's machine (or BF machine) and it's not obvious that the
>> reason for the difference is that one machine is using a fallback and
>> the other is not.
>
> I agree with this worry.
>
>> In terms of what that faster and better thing might be, AFAICS, there
>> are two main options. First, we could proceed with the approach you've
>> tried here, namely requiring everybody to get Platypus::FFI. I find
>> that it's included in MacPorts on my machine, which is at least
>> somewhat of a good sign that perhaps this is fairly widely available.
>
> I did a bit of research on this on my favorite platforms, and did
> not like the results:
>
> RHEL8: does not seem to be packaged at all.
>
> Fedora 37: available, but the dependencies are a bit much:
>
> $ sudo yum install perl-FFI-Platypus
> Last metadata expiration check: 2:07:42 ago on Wed Oct 18 08:05:40 2023.
> Dependencies resolved.
> 
>  Package Architecture Version   Repository 
> Size
> 
> Installing:
>  perl-FFI-Platypus   x86_64   2.08-1.fc37   updates   417 
> k
> Installing dependencies:
>  libgfortran x86_64   12.3.1-1.fc37 updates   904 
> k
>  libquadmath x86_64   12.3.1-1.fc37 updates   206 
> k
>  libquadmath-devel   x86_64   12.3.1-1.fc37 updates48 
> k
>  perl-FFI-CheckLib   noarch   0.29-2.fc37   updates29 
> k
> Installing weak dependencies:
>  gcc-gfortranx86_64   12.3.1-1.fc37 updates12 
> M
>
> Transaction Summary
> 
> Install  6 Packages
>
> Total download size: 14 M
> Installed size: 37 M
> Is this ok [y/N]: 
>
> gfortran?   Really??  I mean, I don't care about the disk space,
> but this is not promising for anyone who has to build it themselves.

The Fortran support for FFI::Platypus is in a separate CPAN distribution
(FFI-Platypus-Lang-Fortran), so that must be some quirk of the Fedora
packaging and not a problem for people building it themselves.  They
just need libffi and a working Perl/CPAN setup.

On Debian the only things besides Perl and core perl modules it
(build-)depends on are libffi, Capture::Tiny, FFI::Checklib (which
depends on File::Which), Test2::Suite and pkg-config.

- ilmari




Re: BRIN minmax multi - incorrect distance for infinite timestamp/date

2023-10-18 Thread Tomas Vondra
On 10/18/23 12:47, Ashutosh Bapat wrote:
> On Wed, Oct 18, 2023 at 1:55 AM Tomas Vondra
>  wrote:
>>
>> Hi,
>>
>> Here's a couple cleaned-up patches fixing the various issues discussed here.
>> I've tried to always add a regression test demonstrating the issue
>> first, and then fix it in the next patch.
> 
> It will be good to commit the test changes as well.
> 

I do plan to commit them, ofc. I was just explaining why I'm adding the
tests first, and then fixing the issue in a separate commit.

>>
>> In particular, this deals with these issues:
>>
>> 1) overflows in distance calculation for large timestamp values (0002)
> 
> I could reduce the SQL for timestamp overflow test to just
> -- test overflows during CREATE INDEX with extreme timestamp values
> CREATE TEMPORARY TABLE brin_timestamp_test(a TIMESTAMPTZ);
> 
> SET datestyle TO iso;
> 
> INSERT INTO brin_timestamp_test VALUES
> ('4713-01-01 00:00:30 BC'),
> ('294276-12-01 00:00:01');
> 
> CREATE INDEX ON brin_timestamp_test USING brin (a
> timestamptz_minmax_multi_ops) WITH (pages_per_range=1);
> 
> I didn't understand the purpose of adding 60 odd values to the table.
> It didn't tell which of those values triggers the overflow. Minimal
> set above is much easier to understand IMO. Using a temporary table
> just avoids DROP TABLE statement. But I am ok if you want to use
> non-temporary table with DROP.
> 
> Code changes in 0002 look fine. Do we want to add a comment "cast to a
> wider datatype to avoid overflow"? Or is that too explicit?
> 
> The code changes fix the timestamp issue but there's a diff in case of
> 

I did use that many values to actually force "compaction" and merging of
points into ranges. We only keep 32 values per page range, so with 2
values we'll not build a range. You're right it may still trigger the
overflow (we probably still calculate distances, I didn't realize that),
but without the compaction we can't check the query plans.

However, I agree 60 values may be a bit too much. And I realized we can
reduce the count quite a bit by using the values_per_range option. We
could set it to 8 (which is the minimum).
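
Just to illustrate (an untested sketch, not the committed test): with the
opclass parameter lowered to its minimum, a dozen rows near the upper
boundary should be enough to force merging of values into ranges:

SET datestyle TO iso;

CREATE TABLE brin_timestamp_test(a TIMESTAMPTZ);

INSERT INTO brin_timestamp_test
SELECT '294276-12-01 00:00:01'::timestamptz - i * interval '1 day'
FROM generate_series(0, 11) AS i;

CREATE INDEX ON brin_timestamp_test
  USING brin (a timestamptz_minmax_multi_ops(values_per_range = 8))
  WITH (pages_per_range = 1);

DROP TABLE brin_timestamp_test;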

>>
>> 2) incorrect subtraction in distance for date values (0003)
> 
> The test case for date brin index didn't crash though. Even after
> applying 0003 patch. The reason why date subtraction can't overflow is
> a bit obscure. PostgreSQL doesn't allow dates beyond 4714-12-31 BC
> because of the code below
> #define IS_VALID_DATE(d) \
> ((DATETIME_MIN_JULIAN - POSTGRES_EPOCH_JDATE) <= (d) && \
> (d) < (DATE_END_JULIAN - POSTGRES_EPOCH_JDATE))
> This prevents the lower side to be well within the negative int32
> overflow threshold and we always subtract higher value from the lower
> one. May be good to elaborate this? A later patch does use float 8
> casting eliminating "any" possibility of overflow. So the comment may
> not be necessary after squashing all the changes.
> 

Not sure what you mean by "crash". Yes, it doesn't hit an assert,
because there's none when calculating distance for date. It however
should fail in the query plan check due to the incorrect order of
subtractions.

Also, the commit message does not claim to fix overflow. In fact, it
says it can't overflow ...

>>
>> 3) incorrect distance for infinite date/timestamp values (0005)
> 
> The tests could use a minimal set of rows here too.
> 
> The code changes look fine and fix the problem seen with the tests alone.
> 

OK

>>
>> 4) failing distance for extreme interval values (0007)
> 
> I could reproduce the issue with a minimal set of values
> -- test handling of overflow for interval values
> CREATE TABLE brin_interval_test(a INTERVAL);
> 
> INSERT INTO brin_interval_test VALUES
> ('17785 years'),
> ('-17800 years');
> 
> CREATE INDEX ON brin_interval_test USING brin (a
> interval_minmax_multi_ops) WITH (pages_per_range=1);
> DROP TABLE brin_interval_test;
> 
> The code looks fine and fixed the issue seen with the test.
> 
> We may want to combine various test cases though. Like the test adding
> infinity and extreme values could be combined. Also, the number of
> values inserted into the table could be reduced, for the reasons stated above.
> 

I prefer not to do that. I find it more comprehensible to keep the tests
separate / testing different things. If the tests were expensive to
set up or something like that, that'd be a different situation.

>>
>> All the problems except "2" have been discussed earlier, but this seems
>> a bit more serious than the other issues, as it's easier to hit. It
>> subtracts the values in the opposite order (smaller - larger), so the
>> distances are negated. Which means we actually merge the values from the
>> most distant ones, and thus are "guaranteed" to build a very
>> inefficient summary. People with multi-minmax indexes on "date" columns
>> probably will need to reindex.
>>
> 
> Right. Do we highlight that in the commit message so that the person
> writing release notes picks it up from there?
> 

Yes, I think I'll mention 

Re: The danger of deleting backup_label

2023-10-18 Thread David Steele

On 10/17/23 22:13, Kyotaro Horiguchi wrote:

At Tue, 17 Oct 2023 16:16:42 -0400, David Steele  wrote in

Given that the above can't be back patched, I'm thinking we don't need
backup_label at all going forward. We just write the values we need
for recovery into pg_control and return *that* from pg_backup_stop()
and tell the user to store it with their backup. We already have
"These files are vital to the backup working and must be written byte
for byte without modification, which may require opening the file in
binary mode." in the documentation so dealing with pg_control should
not be a problem. pg_control also has a CRC so we will know if it gets
munged.


I'm somewhat perplexed regarding the objective of this thread.

This thread began with the intent of preventing users from removing
the backup_label from a backup. At the beginning, the proposal aimed
to achieve this by injecting an invalid value into the pg_control file
located in the generated backup. However, this (and previous) proposal
seems to deviate from that initial objective. It now eliminates the
need to be concerned about the pg_control version that is copied into
the generated backup. However, if someone removes the backup_label
from a backup, the initial concerns could still surface.


Yeah, the discussion has moved around quite a bit, but the goal remains 
the same, to make Postgres error when it does not have the information 
it needs to proceed with recovery. Right now if you delete backup_label 
recovery appears to complete successfully, silently corrupting the database.


In the proposal as it stands now there would be no backup_label at all, 
so no danger of removing it.


We have also gotten a bit sidetracked by our hope to use this proposal 
to address torn reads of pg_control during the backup, at least in HEAD.


Regards,
-David




Re: Rename backup_label to recovery_control

2023-10-18 Thread David Steele

On 10/18/23 03:07, Peter Eisentraut wrote:

On 16.10.23 17:15, David Steele wrote:

I also do wonder with recovery_control is really a better name. Maybe
I just have backup_label too firmly stuck in my head, but is what that
file does really best described as recovery control? I'm not so sure
about that.


The thing it does that describes it as "recovery control" in my view 
is that it contains the LSN where Postgres must start recovery (plus 
TLI, backup method, etc.). There is some other informational stuff in 
there, but the important fields are all about ensuring consistent 
recovery.


At the end of the day the entire point of backup *is* recovery and 
users will interact with this file primarily in recovery scenarios.


Maybe "restore" is better than "recovery", since recovery also happens 
separate from backups, but restoring is something you do with a backup 
(and there is also restore_command etc.).


I would not object to restore (there is restore_command) but I do think 
of what PostgreSQL does as "recovery" as opposed to "restore", which 
comes before the recovery. Recovery is used a lot in the docs and there 
is also recovery.signal.


But based on the discussion in [1] I think we might be able to do away 
with backup_label entirely, which would make this change moot.


Regards,
-David

[1] 
https://www.postgresql.org/message-id/0f948866-7caf-0759-d53c-93c3e266ec3f%40pgmasters.net





Re: BRIN minmax multi - incorrect distance for infinite timestamp/date

2023-10-18 Thread Tomas Vondra



On 10/18/23 12:13, Dean Rasheed wrote:
> On Tue, 17 Oct 2023 at 21:25, Tomas Vondra
>  wrote:
>>
>> Here's a couple cleaned-up patches fixing the various issues discussed here.
>> I've tried to always add a regression test demonstrating the issue
>> first, and then fix it in the next patch.
>>
> 
> This looks good to me.
> 
>> 2) incorrect subtraction in distance for date values (0003)
>>
>> All the problems except "2" have been discussed earlier, but this seems
>> a bit more serious than the other issues, as it's easier to hit. It
>> subtracts the values in the opposite order (smaller - larger), so the
>> distances are negated. Which means we actually merge the values from the
>> most distant ones, and thus are "guaranteed" to build a very
>> inefficient summary.
>>
> 
> Yeah, that's not good. Amusingly this accidentally made infinite dates
> behave correctly, since they were distance 0 away from anything else,
> which was larger than all the other negative distances! But yes, that
> needed fixing properly.
> 

Right. Apparently two wrongs can make a right, after all ;-)


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Query execution in Perl TAP tests needs work

2023-10-18 Thread Tom Lane
Robert Haas  writes:
> On Sat, Sep 2, 2023 at 2:42 PM Andrew Dunstan  wrote:
>>> How much burden is it? Would anyone actually mind?

> ... At the same time, fallbacks can be a problem too,
> because then you can end up with things that work one way on one
> developer's machine (or BF machine) and another way on another
> developer's machine (or BF machine) and it's not obvious that the
> reason for the difference is that one machine is using a fallback and
> the other is not.

I agree with this worry.

> In terms of what that faster and better thing might be, AFAICS, there
> are two main options. First, we could proceed with the approach you've
> tried here, namely requiring everybody to get Platypus::FFI. I find
> that it's included in MacPorts on my machine, which is at least
> somewhat of a good sign that perhaps this is fairly widely available.

I did a bit of research on this on my favorite platforms, and did
not like the results:

RHEL8: does not seem to be packaged at all.

Fedora 37: available, but the dependencies are a bit much:

$ sudo yum install perl-FFI-Platypus
Last metadata expiration check: 2:07:42 ago on Wed Oct 18 08:05:40 2023.
Dependencies resolved.

 Package Architecture Version   Repository Size

Installing:
 perl-FFI-Platypus   x86_64   2.08-1.fc37   updates   417 k
Installing dependencies:
 libgfortran x86_64   12.3.1-1.fc37 updates   904 k
 libquadmath x86_64   12.3.1-1.fc37 updates   206 k
 libquadmath-devel   x86_64   12.3.1-1.fc37 updates48 k
 perl-FFI-CheckLib   noarch   0.29-2.fc37   updates29 k
Installing weak dependencies:
 gcc-gfortranx86_64   12.3.1-1.fc37 updates12 M

Transaction Summary

Install  6 Packages

Total download size: 14 M
Installed size: 37 M
Is this ok [y/N]: 

gfortran?   Really??  I mean, I don't care about the disk space,
but this is not promising for anyone who has to build it themselves.

So I'm afraid that requiring Platypus::FFI might be a bridge too
far for a lot of our older buildfarm machines.

> Another thing, also already mentioned, that we can do is cache psql
> connections instead of continuously respawing psql.

This seems like it's worth thinking about.  I agree with requiring
the re-use to be explicit within a TAP test, else we might have
mysterious behavioral changes (akin to connection-pooler-induced
bugs).

regards, tom lane




Re: New WAL record to detect the checkpoint redo location

2023-10-18 Thread Robert Haas
On Tue, Oct 17, 2023 at 8:35 PM Michael Paquier  wrote:
> Suggestion is from here, with a test for pg_walinspect after it runs
> its online checkpoint (see the full-page case):
> https://www.postgresql.org/message-id/ZOvf1tu6rfL/b...@paquier.xyz
>
> +-- Check presence of REDO record.
> +SELECT redo_lsn FROM pg_control_checkpoint() \gset
> +SELECT start_lsn = :'redo_lsn'::pg_lsn AS same_lsn, record_type
> +  FROM pg_get_wal_record_info(:'redo_lsn');

I added a variant of this test case. Here's v10.

-- 
Robert Haas
EDB: http://www.enterprisedb.com


v10-0001-During-online-checkpoints-insert-XLOG_CHECKPOINT.patch
Description: Binary data


Re: run pgindent on a regular basis / scripted manner

2023-10-18 Thread Robert Haas
On Wed, Oct 18, 2023 at 3:21 AM Peter Eisentraut  wrote:
> On 18.10.23 06:40, David Rowley wrote:
> > I agree that it's not nice to add yet another way of breaking the
> > buildfarm and even more so when the committer did make check-world
> > before committing. We have --enable-tap-tests, we could have
> > --enable-indent-checks and have pgindent check the code is correctly
> > indented during make check-world. Then just not have
> > --enable-indent-checks in CI.
>
> This approach seems like a good improvement, even independent of
> everything else we might do about this.  Making it easier to use and
> less likely to be forgotten.  Also, this way, non-committer contributors
> can opt-in, if they want to earn bonus points.

Yeah. I'm not going to push anything that doesn't pass make
check-world, so this is appealing in that something that I'm already
doing would (or could be configured to) catch this problem also.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Query execution in Perl TAP tests needs work

2023-10-18 Thread Robert Haas
On Sat, Sep 2, 2023 at 2:42 PM Andrew Dunstan  wrote:
> I confess I'm a little reluctant to impose this burden on buildfarm owners. 
> We should think about some sort of fallback in case this isn't supported on 
> some platform, either due to technological barriers or buildfarm owner 
> reluctance.

How much burden is it? Would anyone actually mind?

I definitely don't want to put ourselves in a situation where we add a
bunch of annoying dependencies that are required to be able to run the
tests, not just because it will inconvenience buildfarm owners, but
also because it will potentially inconvenience developers, and in
particular, me. At the same time, fallbacks can be a problem too,
because then you can end up with things that work one way on one
developer's machine (or BF machine) and another way on another
developer's machine (or BF machine) and it's not obvious that the
reason for the difference is that one machine is using a fallback and
the other is not. I feel like this tends to create so much aggravation
in practice that it's best not to have fallbacks in this kind of
situation - my vote is that we either stick with the current method
and live with the performance characteristics thereof, or we put in
place something that is faster and better and that new thing becomes a
hard dependency for anyone who wants to be able to run the TAP tests.

In terms of what that faster and better thing might be, AFAICS, there
are two main options. First, we could proceed with the approach you've
tried here, namely requiring everybody to get Platypus::FFI. I find
that it's included in MacPorts on my machine, which is at least
somewhat of a good sign that perhaps this is fairly widely available.
That might need more investigation, though. Second, we could write a
pure-Perl implementation, as you proposed earlier. That would be more
work to write and maintain, but would avoid needing FFI. Personally, I
feel like either an FFI-based approach or a pure-Perl approach would
be pretty reasonable, as long as Platypus::FFI is widely
available/usable. If we go with pure Perl, the hard part might be
managing the authentication methods, but as Thomas pointed out to me
yesterday, we now have UNIX sockets on Windows, and thus everywhere,
so maybe we could get to a point where the pure-Perl implementation
wouldn't need to do any non-trivial authentication.

Another thing, also already mentioned, that we can do is cache psql
connections instead of continuously respawing psql. That doesn't
require any fundamentally new mechanism, and in some sense it's
independent of the approaches above, because they could be implemented
without caching connections, but they would benefit from caching
connections, as the currently psql-based approach also does. I think
it would be good to introduce new syntax for this, e.g.:

$conn_primary = $node_primary->connect();
$conn_primary->simple_query('whatever');
$conn_primary->simple_query('whatever 2');
$conn_primary->disconnect();

Something like this would require a fairly large amount of mechanical
work to implement across all of our TAP test cases, but I think it
would be effort well spent. If we try to introduce connection caching
"transparently," I think it will turn into another foot-gun that
people keep getting wrong because they don't realize there is magic
under the hood, or forget how it works.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Should consider materializing the cheapest inner path in consider_parallel_nestloop()

2023-10-18 Thread Alena Rybakina

Hi!

Thank you for your work on the subject.


I reviewed your patch and found that your commit message does not fully 
explain your code; in addition, I found several spelling mistakes.


I think it's better to change to:

With parallel seqscan, we should consider materializing the cheapest 
inner path in
case of nested loop if it doesn't contain a unique node or it is unsafe 
to use it in a subquery.



Besides, I couldn't understand why we check again that the material path is
parallel safe:


if (matpath != NULL && matpath->parallel_safe)
        try_partial_nestloop_path(root, joinrel, outerpath, matpath,
                                  pathkeys, jointype, extra);

--
Regards,
Alena Rybakina


Re: run pgindent on a regular basis / scripted manner

2023-10-18 Thread Vik Fearing

On 10/17/23 16:23, Tom Lane wrote:
An alternative I was thinking about after reading your earlier email was 
going back to the status quo ante, but doing the manual tree-wide 
reindents significantly more often than once a year. Adding one at the 
conclusion of each commitfest would be a natural thing to do, for 
instance. It's hard to say what frequency would lead to the least 
rebasing pain, but we know once-a-year isn't ideal.



This is basically how the SQL Committee functions.  The Change Proposals 
(patches) submitted every meeting (commitfest) are always against the 
documents as they exist after the application of papers (commits) from 
the previous meeting.


One major difference is that Change Proposals are against the text, and 
patches are against the code.  It is not dissimilar to people saying 
what our documentation should say, and then someone implementing that 
change.


So I am in favor of a pgindent run *at least* at the end of each 
commitfest, giving a full month for patch authors to rebase before the 
next fest.

--
Vik Fearing





Re: The danger of deleting backup_label

2023-10-18 Thread Robert Haas
On Tue, Oct 17, 2023 at 4:17 PM David Steele  wrote:
> Given that the above can't be back patched, I'm thinking we don't need
> backup_label at all going forward. We just write the values we need for
> recovery into pg_control and return *that* from pg_backup_stop() and
> tell the user to store it with their backup. We already have "These
> files are vital to the backup working and must be written byte for byte
> without modification, which may require opening the file in binary
> mode." in the documentation so dealing with pg_control should not be a
> problem. pg_control also has a CRC so we will know if it gets munged.

Yeah, I was thinking about this kind of idea, too. I think it might be
a good idea, although I'm not completely certain about that, either.

On the positive side, you can't remove backup_label in error if
backup_label is not a thing. You certainly can't remove the control
file. You can, however, use the original control file instead of the
one that you were supposed to use. However, that is not really any
different from failing to write the backup_label into the backup
directory, which you can already do today. Also, it adds very little
net complexity to the low-level backup procedure. Instead of needing
to write the backup_label into the backup directory, you write the
control file -- but that's instead, not in addition. So overall it
seems like the complexity is similar to what we have today but one
possible mistake is eliminated.
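
For reference, the current low-level procedure on v15+ is roughly:

-- both calls must run in the same session
SELECT pg_backup_start('example', false);
-- ... copy the data directory with an external tool ...
SELECT lsn, labelfile, spcmapfile FROM pg_backup_stop();

and under the proposal the last step would hand back pg_control contents
to store in the backup instead of the labelfile text.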

Also on the positive side, I suspect we could remove a decent amount
of code for dealing with backup_label files. We wouldn't have to read
them any more (and the code to do that is pretty rough-and-ready) and
we wouldn't have to do different things based on whether the
backup_label exists or not. The logic in xlog*.c is extremely
complicated, and everything that we can do to reduce the number of
cases that need to be considered is not just good, but great.

But there are also some negatives.

First, anything that is stored in the backup_label but not the control
file has to (a) move into the control file, (b) be stored someplace
else, or (c) be eliminated as a concept. We're likely to get
complaints about (a), especially if the data in question is anything
big. Any proposal to do (b) risks undermining the whole theory under
which this is a good proposal, namely that removing backup_label gives
us one less thing to worry about. So that brings us to (c).
Personally, I would lose very little sleep if the LABEL field died and
never came back, and I wouldn't miss START TIME and STOP TIME either,
but somebody else might well feel differently. I don't think it's
trivial to get rid of BACKUP METHOD, as there unfortunately seems to
be code that depends on knowing the difference between BACKUP FROM:
streamed and BACKUP FROM: pg_rewind. I suspect that BACKUP FROM:
primary/standby might have the same issue, but I'm not sure. STOP
TIMELINE could be a problem too. I think that if somebody could do
some rejiggering to eliminate some of the differences between the
cases here, that could be really good general cleanup irrespective of
what we decide about this proposal, and moving some things in to
pg_control is probably reasonable too. For instance, it would seem
crazy to me to argue that storing the backup end location in the
control file is OK, but storing the backup end TLI there would not be
OK. But the point here is that there's probably a good deal of careful
thinking that would need to be done here about exactly where all of
the stuff that currently exists in the backup_label file but not in
pg_control needs to end up.

Second, right now, the stuff that we return at the end of a backup is
all text data. With this proposal, it becomes binary data. I entirely
realize that people should only be doing these kinds of backups using
automated tools that that those automated tools should be perfectly
capable of handling binary data without garbling anything. But that's
about as realistic as supposing that people won't instantly remove the
backup_label file the moment it seems like it will solve some problem,
even when the directions clearly state that this should only be done
in some other situation that is not the one the user is facing. It
just occurred to me that one thing we could do to improve the user
experience here is offer some kind of command-line utility to assist
with taking a low-level backup. This could be done even if we don't
proceed with this proposal e.g.

pg_llbackup -d $CONNTR --backup-label=PATH --tablespace-map=PATH
--copy-data-directory=SHELLCOMMAND

I don't know for sure how much that would help, but I wonder if it
might actually help quite a bit, because right now people do things
like use psql in a shell script to try to juggle a database connection
and then in some other part of the shell script do the data copying.
But it is quite easy to screw up the error handling or the psql
session lifetime or something like that, and this would maybe provide
a 

Re: LLVM 16 (opaque pointers)

2023-10-18 Thread Thomas Munro
On Thu, Oct 19, 2023 at 12:02 AM Thomas Munro  wrote:
> I pushed the first patch, for LLVM 16, and the build farm told me that
> some old LLVM versions don't like it.  The problem seems to be the
> function LLVMGlobalGetValueType().  I can see that that was only added
> to the C API in 2018, so it looks like I may need to back-port that
> (trivial) wrapper into our own llvmjit_wrap.cpp for LLVM < 8.

Concretely something like the attached should probably fix it, but
it'll take me a little while to confirm that...


0001-Supply-missing-LLVMGlobalGetValueType-for-LLVM-8.patch
Description: Binary data


Re: More new SQL/JSON item methods

2023-10-18 Thread jian he
On Fri, Oct 6, 2023 at 7:47 PM Peter Eisentraut  wrote:
>
> On 29.08.23 09:05, Jeevan Chalke wrote:
> > v1-0001-Implement-jsonpath-.bigint-.integer-and-.number-m.patch
> >
> > This commit implements jsonpath .bigint(), .integer(), and .number()
> > methods.  The JSON string or a numeric value is converted to the
> > bigint, int4, and numeric type representation.
>
> A comment that applies to all of these: These add various keywords,
> switch cases, documentation entries in some order.  Are we happy with
> that?  Should we try to reorder all of that for better maintainability
> or readability?
>
> > v1-0002-Implement-.date-.time-.time_tz-.timestamp-and-.ti.patch
> >
> > This commit implements jsonpath .date(), .time(), .time_tz(),
> > .timestamp(), .timestamp_tz() methods.  The JSON string representing
> > a valid date/time is converted to the specific date or time type
> > representation.
> >
> > The changes use the infrastructure of the .datetime() method and
> > perform the datatype conversion as appropriate.  All these methods
> > accept no argument and use ISO datetime formats.
>
> These should accept an optional precision argument.  Did you plan to add
> that?

The compiler warnings issue is resolved.

I figured out how to use the precision argument.
But I don't know how to get the precision argument in the parse stage.

Attached is my attempt to implement: select
jsonb_path_query('"2017-03-10 11:11:01.123"', '$.timestamp(2)');
I am not that familiar with src/backend/utils/adt/jsonpath_gram.y. Imitating
the decimal method failed: decimal has two arguments (precision and scale),
while here there is only one argument.

I am looking for hints.
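
For reference, the existing .datetime() method already takes an optional
argument that is handled in jsonpath_gram.y (a format template string),
which may be the closest model, e.g.:

select jsonb_path_query('"10-03-2017"', '$.datetime("dd-mm-yyyy")');
-- returns "2017-03-10"
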
diff --git a/src/backend/utils/adt/jsonpath.c b/src/backend/utils/adt/jsonpath.c
index 42f325bd..460e43cb 100644
--- a/src/backend/utils/adt/jsonpath.c
+++ b/src/backend/utils/adt/jsonpath.c
@@ -1144,6 +1144,7 @@ jspGetLeftArg(JsonPathItem *v, JsonPathItem *a)
 		   v->type == jpiDiv ||
 		   v->type == jpiMod ||
 		   v->type == jpiDecimal ||
+		   v->type == jpiTimestamp ||
 		   v->type == jpiStartsWith);
 
 	jspInitByBuffer(a, v->base, v->content.args.left);
diff --git a/src/backend/utils/adt/jsonpath_exec.c b/src/backend/utils/adt/jsonpath_exec.c
index 94f1052d..4241837f 100644
--- a/src/backend/utils/adt/jsonpath_exec.c
+++ b/src/backend/utils/adt/jsonpath_exec.c
@@ -1096,7 +1096,7 @@ executeItemOptUnwrapTarget(JsonPathExecContext *cxt, JsonPathItem *jsp,
 		case jpiStringFunc:
 			{
 JsonbValue	jbv;
-char	   *tmp;
+char	   *tmp = NULL;
 
 switch (JsonbType(jb))
 {
@@ -1159,6 +1159,7 @@ executeItemOptUnwrapTarget(JsonPathExecContext *cxt, JsonPathItem *jsp,
 res = jperOk;
 
jb = &jbv;
+Assert(tmp != NULL);	/* above switch case, covered all the case jbvType */
 jb->val.string.val = (jb->type == jbvString) ? tmp : pstrdup(tmp);
 jb->val.string.len = strlen(jb->val.string.val);
 jb->type = jbvString;
@@ -1400,9 +1401,9 @@ executeItemOptUnwrapTarget(JsonPathExecContext *cxt, JsonPathItem *jsp,
 	/* cast string as number */
 	Datum		datum;
 	bool		noerr;
-	char	   *numstr = pnstrdup(jb->val.string.val,
-  jb->val.string.len);
 	ErrorSaveContext escontext = {T_ErrorSaveContext};
+	numstr		= pnstrdup(jb->val.string.val,
+			jb->val.string.len);
 
 	noerr = DirectInputFunctionCallSafe(numeric_in, numstr,
 		InvalidOid, -1,
@@ -1439,7 +1440,6 @@ executeItemOptUnwrapTarget(JsonPathExecContext *cxt, JsonPathItem *jsp,
  */
 if (jsp->type == jpiDecimal && jsp->content.args.left)
 {
-	JsonPathItem elem;
 	Datum		numdatum;
 	Datum		dtypmod;
 	int32		precision;
@@ -2443,6 +2443,7 @@ executeDateTimeMethod(JsonPathExecContext *cxt, JsonPathItem *jsp,
 			break;
 		case jpiTimestamp:
 			{
+Timestamp	tmp;
 /* Convert result type to timestamp without time zone */
 switch (typid)
 {
@@ -2468,6 +2469,9 @@ executeDateTimeMethod(JsonPathExecContext *cxt, JsonPathItem *jsp,
 }
 
 typid = TIMESTAMPOID;
+tmp = DatumGetTimestamp(value);
+AdjustTimestampForTypmod(&tmp, 2, NULL);
+value = TimestampGetDatum(tmp);
 			}
 			break;
 		case jpiTimestampTz:
diff --git a/src/backend/utils/adt/jsonpath_gram.y b/src/backend/utils/adt/jsonpath_gram.y
index da031e35..ec188472 100644
--- a/src/backend/utils/adt/jsonpath_gram.y
+++ b/src/backend/utils/adt/jsonpath_gram.y
@@ -264,6 +264,18 @@ accessor_op:
 	 errmsg("invalid input syntax for type %s", "jsonpath"),
 	 errdetail(".decimal() can only have an optional precision[,scale].")));
 	}
+	| '.' TIMESTAMP_P '(' opt_csv_list ')'
+	{
+		if (list_length($4) == 0)
+			$$ = makeItemBinary(jpiTimestamp, NULL, NULL);
+		else if (list_length($4) == 1)
+			$$ = makeItemBinary(jpiTimestamp, linitial($4), NULL);
+		else
+			ereturn(escontext, false,
+	(errcode(ERRCODE_SYNTAX_ERROR),
+	 errmsg("invalid input syntax for type %s", "jsonpath"),
+	 errdetail(".jpiTimestamp() can only have an optional precision.")));
+	}
 	| 

Re: LLVM 16 (opaque pointers)

2023-10-18 Thread Thomas Munro
I pushed the first patch, for LLVM 16, and the build farm told me that
some old LLVM versions don't like it.  The problem seems to be the
function LLVMGlobalGetValueType().  I can see that that was only added
to the C API in 2018, so it looks like I may need to back-port that
(trivial) wrapper into our own llvmjit_wrap.cpp for LLVM < 8.




Re: BRIN minmax multi - incorrect distance for infinite timestamp/date

2023-10-18 Thread Dean Rasheed
On Wed, 18 Oct 2023 at 09:16, Tomas Vondra
 wrote:
>
> BTW when adding the tests with extreme values, I noticed this:
>
>   test=# select '5874897-01-01'::date;
>date
>   ---
>5874897-01-01
>   (1 row)
>
>   test=# select '5874897-01-01'::date + '1 second'::interval;
>   ERROR:  date out of range for timestamp
>

That's correct because date + interval returns timestamp, and the
value is out of range for a timestamp. This is equivalent to:

select '5874897-01-01'::date::timestamp + '1 second'::interval;
ERROR:  date out of range for timestamp

and I think it's good that it gives a different error from this:

select '294276-01-01'::date::timestamp + '1 year'::interval;
ERROR:  timestamp out of range

so you can tell that the overflow in the first case happens before the addition.

Of course a side effect of internally casting first is that you can't
do things like this:

select '5874897-01-01'::date - '5872897 years'::interval;
ERROR:  date out of range for timestamp

which arguably ought to return '2000-01-01 00:00:00'. In practice
though, I think it would be far more trouble than it's worth trying to
change that.
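
(Plain integer day arithmetic does still work at those extremes, since
date +/- integer stays in the date domain, e.g.:

select '5874897-01-01'::date - 1;
-- 5874896-12-31

so the limitation really is specific to going through timestamp.)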

Regards,
Dean




Re: Synchronizing slots from primary to standby

2023-10-18 Thread Amit Kapila
On Tue, Oct 17, 2023 at 2:01 PM shveta malik  wrote:
>
> On Tue, Oct 17, 2023 at 12:44 PM Peter Smith  wrote:
> >
> > FYI - the latest patch failed to apply.
> >
> > [postgres@CentOS7-x64 oss_postgres_misc]$ git apply
> > ../patches_misc/v24-0001-Allow-logical-walsenders-to-wait-for-the-physica.patch
> > error: patch failed: src/include/utils/guc_hooks.h:160
> > error: src/include/utils/guc_hooks.h: patch does not apply
>
> Rebased v24. PFA.
>

Few comments:
==
1.
+List of physical replication slots that logical replication
with failover
+enabled waits for.

/logical replication/logical replication slots

2.
 If
+enable_syncslot is not enabled on the
+corresponding standbys, then it may result in indefinite waiting
+on the primary for physical replication slots configured in
+standby_slot_names
+   

Why does the above lead to indefinite waiting? I think we should just ignore
standby_slot_names and probably LOG a message in the server for the
same.

3.
+++ b/src/backend/replication/logical/tablesync.c
@@ -1412,7 +1412,8 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
  */
  walrcv_create_slot(LogRepWorkerWalRcvConn,
 slotname, false /* permanent */ , false /* two_phase */ ,
-CRS_USE_SNAPSHOT, origin_startpos);
+false /* enable_failover */ , CRS_USE_SNAPSHOT,
+origin_startpos);

As per this code, we won't enable failover for tablesync slots. So,
what happens if we need to failover to new node after the tablesync
worker has reached SUBREL_STATE_FINISHEDCOPY or SUBREL_STATE_DATASYNC?
I think we won't be able to continue replication from failed over
node. If this theory is correct, we have two options (a) enable
failover for sync slots as well, if it is enabled for main slot; but
then after we drop the slot on the primary once sync is complete, the same
needs to be taken care of on the standby. (b) enable failover even for the
main slot after all tables are in ready state, something similar to
what we do for two_phase.

4.
+ /* Verify syntax */
+ if (!validate_slot_names(newval, ))
+ return false;
+
+ /* Now verify if these really exist and have correct type */
+ if (!validate_standby_slots(elemlist))

These two functions have quite similar functionality, which makes
their naming quite confusing. Can we directly move the functionality
of validate_slot_names() into validate_standby_slots()?

5.
+SlotSyncInitConfig(void)
+{
+ char*rawname;
+
+ /* Free the old one */
+ list_free(standby_slot_names_list);
+ standby_slot_names_list = NIL;
+
+ if (strcmp(standby_slot_names, "") != 0)
+ {
+ rawname = pstrdup(standby_slot_names);
+ SplitIdentifierString(rawname, ',', _slot_names_list);

How does this handle the case where '*' is specified for standby_slot_names?


-- 
With Regards,
Amit Kapila.




Re: BRIN minmax multi - incorrect distance for infinite timestamp/date

2023-10-18 Thread Ashutosh Bapat
On Wed, Oct 18, 2023 at 1:55 AM Tomas Vondra
 wrote:
>
> Hi,
>
> Here's a couple cleaned-up patches fixing the various issues discussed here.
> I've tried to always add a regression test demonstrating the issue
> first, and then fix it in the next patch.

It will be good to commit the test changes as well.

>
> In particular, this deals with these issues:
>
> 1) overflows in distance calculation for large timestamp values (0002)

I could reduce the SQL for timestamp overflow test to just
-- test overflows during CREATE INDEX with extreme timestamp values
CREATE TEMPORARY TABLE brin_timestamp_test(a TIMESTAMPTZ);

SET datestyle TO iso;

INSERT INTO brin_timestamp_test VALUES
('4713-01-01 00:00:30 BC'),
('294276-12-01 00:00:01');

CREATE INDEX ON brin_timestamp_test USING brin (a
timestamptz_minmax_multi_ops) WITH (pages_per_range=1);

I didn't understand the purpose of adding 60 odd values to the table.
It didn't tell which of those values triggers the overflow. Minimal
set above is much easier to understand IMO. Using a temporary table
just avoids DROP TABLE statement. But I am ok if you want to use
non-temporary table with DROP.

Code changes in 0002 look fine. Do we want to add a comment "cast to a
wider datatype to avoid overflow"? Or is that too explicit?
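
To spell out the hazard such a comment would point at: timestamps are
64-bit microsecond counts internally, and subtracting values from the
opposite ends of the range overflows int8 unless done in a wider type,
e.g.:

select 9223372036854775807::int8 - (-1)::int8;
ERROR:  bigint out of range

select 9223372036854775807::numeric - (-1);
-- 9223372036854775808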

The code changes fix the timestamp issue but there's a diff in case of

>
> 2) incorrect subtraction in distance for date values (0003)

The test case for date brin index didn't crash though. Even after
applying 0003 patch. The reason why date subtraction can't overflow is
a bit obscure. PostgreSQL doesn't allow dates beyond 4714-12-31 BC
because of the code below
#define IS_VALID_DATE(d) \
((DATETIME_MIN_JULIAN - POSTGRES_EPOCH_JDATE) <= (d) && \
(d) < (DATE_END_JULIAN - POSTGRES_EPOCH_JDATE))
This prevents the lower side to be well within the negative int32
overflow threshold and we always subtract higher value from the lower
one. May be good to elaborate this? A later patch does use float 8
casting eliminating "any" possibility of overflow. So the comment may
not be necessary after squashing all the changes.

>
> 3) incorrect distance for infinite date/timestamp values (0005)

The tests could use a minimal set of rows here too.

The code changes look fine and fix the problem seen with the tests alone.

>
> 4) failing distance for extreme interval values (0007)

I could reproduce the issue with a minimal set of values
-- test handling of overflow for interval values
CREATE TABLE brin_interval_test(a INTERVAL);

INSERT INTO brin_interval_test VALUES
('17785 years'),
('-17800 years');

CREATE INDEX ON brin_interval_test USING brin (a
interval_minmax_multi_ops) WITH (pages_per_range=1);
DROP TABLE brin_interval_test;

The code looks fine and fixed the issue seen with the test.

We may want to combine various test cases though. Like the test adding
infinity and extreme values could be combined. Also, the number of
values inserted into the table could be reduced, for the reasons stated above.

>
> All the problems except "2" have been discussed earlier, but this seems
> a bit more serious than the other issues, as it's easier to hit. It
> subtracts the values in the opposite order (smaller - larger), so the
> distances are negated. Which means we actually merge the values from the
> most distant ones, and thus are "guaranteed" to build a very
> inefficient summary. People with multi-minmax indexes on "date" columns
> probably will need to reindex.
>

Right. Do we highlight that in the commit message so that the person
writing release notes picks it up from there?

--
Best Wishes,
Ashutosh Bapat




Re: On login trigger: take three

2023-10-18 Thread Mikhail Gribkov
Hi Alexander,

Sorry for being offline for so long, and thanks for the activity. So should we close
the patch on the commitfest page now?

By the way I had one more issue with the login trigger tests (quite a rare
one though). A race condition may occur on some systems when the oidjoins test
starts a moment later than normal and affects the logins count for the on-login
trigger test. Thus I had to split the event_trigger and oidjoins tests into
separate parallel groups. I'll post this as an independent patch then.

--
 best regards,
Mikhail A. Gribkov

e-mail: youzh...@gmail.com
http://www.flickr.com/photos/youzhick/albums
http://www.strava.com/athletes/5085772
phone: +7(916)604-71-12
Telegram: @youzhick



On Mon, Oct 16, 2023 at 4:05 AM Alexander Korotkov 
wrote:

> On Mon, Oct 16, 2023 at 4:00 AM Michael Paquier 
> wrote:
> > On Mon, Oct 16, 2023 at 02:47:03AM +0300, Alexander Korotkov wrote:
> > > The attached revision fixes test failures spotted by
> > > commitfest.cputube.org.  Also, perl scripts passed perltidy.
> >
> > Still you've missed a few things.  At quick glance:
> > - The code indentation was off a bit in event_trigger.c.
> > - 005_login_trigger.pl fails if the code is compiled with
> > ENFORCE_REGRESSION_TEST_NAME_RESTRICTIONS because a WARNING is
> > reported in test "create tmp objects: err equals".
> > - 005_sspi.pl is older than the new test 005_login_trigger.pl, could
> > you rename it with a different number?
>
> You are very fast and sharp eye!
> Thank you for fixing the indentation.  I just pushed fixes for the rest.
>
> --
> Regards,
> Alexander Korotkov
>


Re: Synchronizing slots from primary to standby

2023-10-18 Thread shveta malik
On Wed, Oct 18, 2023 at 10:20 AM Amit Kapila  wrote:
>
> On Tue, Oct 17, 2023 at 9:06 PM Drouvot, Bertrand
>  wrote:
> >
> > On 10/13/23 10:35 AM, shveta malik wrote:
> > > On Thu, Oct 12, 2023 at 9:18 AM shveta malik  
> > > wrote:
> > >>
> >
> > Code:
> >
> > +   True if this logical slot is enabled to be synced to the physical 
> > standbys
> > +   so that logical replication is not blocked after failover. Always 
> > false
> > +   for physical slots.
> >
> > Not sure "not blocked" is the right wording. "can be resumed from the new 
> > primary" maybe?
> >
>
> Yeah, your proposed wording sounds better. Also, I think we should
> document the impact of not doing so because I think the replication
> can continue after failover but it may lead to data inconsistency.
>
> BTW, I noticed that the code for Create Subscription is updated but
> not the corresponding docs. By looking at other parameters like
> password_required, streaming, two_phase where true or false indicates
> whether that option is enabled or not, I am thinking about whether
> enable_failover is an appropriate name for this option. The other
> option name that comes to mind is 'failover' where true indicates that
> the corresponding subscription will be enabled for failover. What do
> you think?

+1.  'failover' seems more in sync with other options' names.

> --
> With Regards,
> Amit Kapila.




Re: On login trigger: take three

2023-10-18 Thread Alexander Korotkov
Hi, Mikhail.

On Wed, Oct 18, 2023 at 1:30 PM Mikhail Gribkov  wrote:
> Sorry for being offline for so long, and thanks for the activity. So should we close the
> patch on the commitfest page now?

I have just done this.

> By the way I had one more issue with the login trigger tests (quite a rare 
> one though). A race condition may occur on some systems when the oidjoins test
> starts a moment later than normal and affects the logins count for the on-login
> trigger test. Thus I had to split the event_trigger and oidjoins tests into
> separate parallel groups. I'll post this as an independent patch then.

Thank you for catching it.  Please, post this.

--
Regards,
Alexander Korotkov




Re: BRIN minmax multi - incorrect distance for infinite timestamp/date

2023-10-18 Thread Dean Rasheed
On Tue, 17 Oct 2023 at 21:25, Tomas Vondra
 wrote:
>
> Here's a couple cleaned-up patches fixing the various issues discussed here.
> I've tried to always add a regression test demonstrating the issue
> first, and then fix it in the next patch.
>

This looks good to me.

> 2) incorrect subtraction in distance for date values (0003)
>
> All the problems except "2" have been discussed earlier, but this seems
> a bit more serious than the other issues, as it's easier to hit. It
> subtracts the values in the opposite order (smaller - larger), so the
> distances are negated. Which means we actually merge the values from the
> most distant ones, and thus are "guaranteed" to build a very
> inefficient summary.
>

Yeah, that's not good. Amusingly this accidentally made infinite dates
behave correctly, since they were distance 0 away from anything else,
which was larger than all the other negative distances! But yes, that
needed fixing properly.

Thanks for taking care of this.

Regards,
Dean




Re: RFC: Logging plan of the running query

2023-10-18 Thread Ashutosh Bapat
On Wed, Oct 18, 2023 at 11:39 AM torikoshia  wrote:
> >
> > I am more interested in avoiding the duplication of code, esp. the
> > first comment in my reply
>
> If there are no objections, I will try porting it to auto_explain and
> see its feasibility.

If you want it in core, you will need to port relevant parts of
auto_explain code to core.

-- 
Best Wishes,
Ashutosh Bapat




Re: Asymmetric partition-wise JOIN

2023-10-18 Thread Ashutosh Bapat
On Wed, Oct 18, 2023 at 10:55 AM Andrei Lepikhov
 wrote:
>
> > But the clauses of A parameterized by P will produce different
> > translations for each of the partitions. I think we will need
> > different RelOptInfos (for A) to store these translations.
>
> Does the answer above resolve this issue?

Maybe. There are other problematic areas like EvalPlanQual, Rescans,
reparameterised paths which can blow up if we use the same RelOptInfo
for different scans of the same relation. It will be good to test
those. And also A need not be a simple relation; it could be join as
well.

>
> > The relid is also used to track the scans at executor level. Since we
> > have so many scans on A, each may be using different plan, we will
> > need different ids for those.
>
> I don't understand this sentence. In which way does the executor use this
> index of RelOptInfo?

See Scan::scanrelid

-- 
Best Wishes,
Ashutosh Bapat




Re: Oversight in reparameterize_path_by_child leading to executor crash

2023-10-18 Thread Andrei Lepikhov

On 18/10/2023 13:39, Richard Guo wrote:


On Fri, Oct 13, 2023 at 6:18 PM Andrei Lepikhov 
mailto:a.lepik...@postgrespro.ru>> wrote:


On 23/8/2023 12:37, Richard Guo wrote:
 > To fix it we may need to modify RelOptInfos for Path, BitmapHeapPath,
 > ForeignPath and CustomPath, and modify IndexOptInfos for
IndexPath.  It
 > seems that that is not easily done without postponing
reparameterization
 > of paths until createplan.c.
 >
 > Attached is a patch which is v5 + fix for this new issue.

Having looked into the patch for a while, I couldn't answer to myself
for some stupid questions:


Thanks for reviewing this patch!  I think these are great questions.

1. If we postpone parameterization until the plan creation, why do we
still copy the path node in the FLAT_COPY_PATH macros? Do we really
need it?


Good point.  The NestPath's original inner path should not be referenced
any more after the reparameterization, so it seems safe to adjust the
path itself, without the need of a flat-copy.  I've done that in v8
patch.

2. I see big switches on path nodes. May it be time to create a
path_walker function? I recall some thread where such a suggestion was
declined, but I don't remember why.


I'm not sure.  But this seems a separate topic, so maybe it's better to
discuss it separately?


Agree.


3. Clause changing is postponed. Why does it not influence the
calc_joinrel_size_estimate? We will use statistics on the parent table
here. Am I wrong?


Hmm, I think the reparameterization does not change the estimated
size/costs.  Quoting the comment


Agree. I have looked at the code and figured it out - you're right. But 
it seems strange: maybe I don't understand something. Why not estimate 
selectivity for parameterized clauses based on the leaf partition statistics,
not the parent's?


--
regards,
Andrey Lepikhov
Postgres Professional





Re: run pgindent on a regular basis / scripted manner

2023-10-18 Thread David Rowley
On Wed, 18 Oct 2023 at 22:07, Jelte Fennema  wrote:
> But based on the contents of the fixup commits a commonality seems to
> be that the fixup only fixes a few lines, quite often touching only
> comments. So it seems like the main reason for breaking koel is
> forgetting to re-run pgindent after some final cleanup/wording
> changes/typo fixes. And that seems like an expected flaw of being
> human instead of a robot, which can only be worked around with better
> automation.

I wonder if you might just be assuming these were caused by
last-minute comment adjustments. I may have missed something on the
thread, but it could be that, or it could be due to the fact that
pgindent just simply does more adjustments to comments than it does
with code lines.

On Wed, 18 Oct 2023 at 06:40, David Rowley  wrote:
> > I agree that it's not nice to add yet another way of breaking the
> > buildfarm and even more so when the committer did make check-world
> > before committing. We have --enable-tap-tests, we could have
> > --enable-indent-checks and have pgindent check the code is correctly
> > indented during make check-world. Then just not have
> > --enable-indent-checks in CI.
>
> I think --enable-indent-checks sounds like a good improvement to the
> status quo. But I'm not confident that it will help remove the cases
> where only a comment needs to be re-indented. Do commiters really
> always run check-world again when only changing a typo in a comment? I
> know I probably wouldn't (or at least not always).

I can't speak for others, but I always make edits in a dev branch and
do "make check-world" before doing "git format-patch" before I "git
am" that patch into a clean repo.  Before I push, I'll always run
"make check-world" again as sometimes master might have moved on a few
commits from where the dev branch was taken (perhaps I need to update
the expected output of some newly added EXPLAIN tests if say doing a
planner adjustment).  I personally never adjust any code or comments
after the git am. I only sometimes adjust the commit message.

So, in theory at least, if --enable-indent-checks existed and I used
it, I shouldn't break koel...  let's see if I just jinxed myself.

It would be good to learn how many of the committers out of the ones
you listed that --enable-indent-checks would have saved from breaking
koel.

David




RE: [PoC] pg_upgrade: allow to upgrade publisher node

2023-10-18 Thread Hayato Kuroda (Fujitsu)
Dear Peter,

Thank you for reviewing! New patch is available in [1].

> ==
> src/bin/pg_upgrade/t/003_upgrade_logical_replication_slots.pl
> 
> 1.
> +# Set max_wal_senders to a lower value if the old cluster is prior to PG12.
> +# Such clusters regard max_wal_senders as part of max_connections, but the
> +# current TAP tester sets these GUCs to the same value.
> +if ($old_publisher->pg_version < 12)
> +{
> + $old_publisher->append_conf('postgresql.conf', "max_wal_senders = 5");
> +}
> 
> 1a.
> I was initially unsure what the above comment meant -- thanks for the
> offline explanation.
> 
> SUGGESTION
> The TAP Cluster.pm assigns default 'max_wal_senders' and
> 'max_connections' to the same value (10) but PG12 and prior considered
> max_walsenders as a subset of max_connections, so setting the same
> value will fail.

Fixed.

> 1b.
> I also felt it is better to explicitly set both values in the < PG12
> configuration because otherwise, you are still assuming knowledge that
> the TAP default max_connections is 10.
> 
> SUGGESTION
> $old_publisher->append_conf('postgresql.conf', qq{
> max_wal_senders = 5
> max_connections = 10
> });

Fixed.

> 2.
> +# Switch workloads depend on the major version of the old cluster.  Upgrading
> +# logical replication slots has been supported since PG17.
> +if ($old_publisher->pg_version <= 16)
> +{
> + test_for_16_and_prior($old_publisher, $new_publisher, $mode);
> +}
> +else
> +{
> + test_for_17_and_later($old_publisher, $new_publisher, $mode);
> +}
> 
> IMO it is less confusing to have fewer version numbers floating around
> in comments and names and code. So instead of referring to 16 and 17,
> how about just referring to 17 everywhere?
> 
> For example
> 
> SUGGESTION
> # Test according to the major version of the old cluster.
> # Upgrading logical replication slots has been supported only since PG17.
> 
> if ($old_publisher->pg_version >= 17)
> {
>   test_upgrade_from_PG17_and_later($old_publisher, $new_publisher, $mode);
> }
> else
> {
>   test_upgrade_from_pre_PG17($old_publisher, $new_publisher, $mode);
> }

In the HEAD code, pg_version is "17devel", and that string compared as smaller
than 17 in Perl (i.e., "17devel" >= 17 evaluated to false).
For the purpose of comparing only the major version, pg_version->major was used.

Also, I removed the support for PG9.4 and older. I could not find it documented,
but according to [2], Cluster.pm does not support such binaries
(cluster_name is set when the server process is started, but that GUC was only
added in PG9.5).

[1]: 
https://www.postgresql.org/message-id/TYCPR01MB5870EBEBC89F5224F6B3788CF5D5A%40TYCPR01MB5870.jpnprd01.prod.outlook.com
[2]: https://www.postgresql.org/message-id/YsUrUDrRhUbuU/6k%40paquier.xyz

Best Regards,
Hayato Kuroda
FUJITSU LIMITED



RE: [PoC] pg_upgrade: allow to upgrade publisher node

2023-10-18 Thread Hayato Kuroda (Fujitsu)
Dear Peter,

Thank you for reviewing! PSA new version.
Note that 0001 and 0002 are combined into one patch.

> Here are some review comments for v51-0001
> 
> ==
> src/bin/pg_upgrade/check.c
> 
> 0.
> +check_old_cluster_for_valid_slots(bool live_check)
> +{
> + char output_path[MAXPGPATH];
> + FILE*script = NULL;
> +
> + prep_status("Checking for valid logical replication slots");
> +
> + snprintf(output_path, sizeof(output_path), "%s/%s",
> + log_opts.basedir,
> + "invalid_logical_relication_slots.txt");
> 
> 0a
> typo /invalid_logical_relication_slots/invalid_logical_replication_slots/

Fixed.

> 0b.
> Since the non-upgradable slots are not strictly "invalid", is this an
> appropriate filename for the bad ones?
> 
> But I don't have very good alternatives. Maybe:
> - non_upgradable_logical_replication_slots.txt
> - problem_logical_replication_slots.txt

Per the discussion in [1], I kept the current style.

> src/bin/pg_upgrade/t/003_upgrade_logical_replication_slots.pl
> 
> 1.
> +# --
> +# TEST: Confirm pg_upgrade fails when wrong GUC is set on new cluster
> +#
> +# There are two requirements for GUCs - wal_level and max_replication_slots,
> +# but only max_replication_slots will be tested here. This is because to
> +# reduce the execution time of the test.
> 
> 
> SUGGESTION
> # TEST: Confirm pg_upgrade fails when the new cluster has wrong GUC values.
> #
> # Two GUCs are required - 'wal_level' and 'max_replication_slots' - but to
> # reduce the test execution time, only 'max_replication_slots' is tested here.

The first part was fixed. The second part was removed per [1].

> 2.
> +# Preparations for the subsequent test:
> +# 1. Create two slots on the old cluster
> +$old_publisher->start;
> +$old_publisher->safe_psql('postgres',
> + "SELECT pg_create_logical_replication_slot('test_slot1',
> 'test_decoding', false, true);"
> +);
> +$old_publisher->safe_psql('postgres',
> + "SELECT pg_create_logical_replication_slot('test_slot2',
> 'test_decoding', false, true);"
> +);
> 
> 
> Can't you combine those SQL in the same $old_publisher->safe_psql.

Combined.
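
Roughly like this (just a sketch of the combined call):

    $old_publisher->safe_psql(
        'postgres', qq[
            SELECT pg_create_logical_replication_slot('test_slot1', 'test_decoding', false, true);
            SELECT pg_create_logical_replication_slot('test_slot2', 'test_decoding', false, true);
        ]);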

> 3.
> +# Clean up
> +rmtree($new_publisher->data_dir . "/pg_upgrade_output.d");
> +# Set max_replication_slots to the same value as the number of slots. Both of
> +# slots will be used for subsequent tests.
> +$new_publisher->append_conf('postgresql.conf', "max_replication_slots = 1");
> 
> The code doesn't seem to match the comment - is this correct? The
> old_publisher created 2 slots, so why are you setting new_publisher
> "max_replication_slots = 1" again?

Fixed to "max_replication_slots = 2" Note that previous test worked well because
GUC checking on new cluster is done after checking the status of slots.

> 4.
> +# Preparations for the subsequent test:
> +# 1. Generate extra WAL records. Because these WAL records do not get
> consumed
> +# it will cause the upcoming pg_upgrade test to fail.
> +$old_publisher->start;
> +$old_publisher->safe_psql('postgres',
> + "CREATE TABLE tbl AS SELECT generate_series(1, 10) AS a;");
> +
> +# 2. Advance the slot test_slot2 up to the current WAL location
> +$old_publisher->safe_psql('postgres',
> + "SELECT pg_replication_slot_advance('test_slot2', NULL);");
> +
> +# 3. Emit a non-transactional message. test_slot2 detects the message so that
> +# this slot will be also reported by upcoming pg_upgrade.
> +$old_publisher->safe_psql('postgres',
> + "SELECT count(*) FROM pg_logical_emit_message('false', 'prefix',
> 'This is a non-transactional message');"
> +);
> 
> 
> I felt this test would be clearer if you emphasised the state of the
> test_slot1 also. e.g.
> 
> 4a.
> BEFORE
> +# 1. Generate extra WAL records. Because these WAL records do not get
> consumed
> +# it will cause the upcoming pg_upgrade test to fail.
> 
> SUGGESTION
> # 1. Generate extra WAL records. At this point neither test_slot1 nor 
> test_slot2
> #has consumed them.

Fixed.

> 4b.
> BEFORE
> +# 2. Advance the slot test_slot2 up to the current WAL location
> 
> SUGGESTION
> # 2. Advance the slot test_slot2 up to the current WAL location, but 
> test_slot2
> #still has unconsumed WAL records.

IIUC, test_slot2 is caught up by pg_replication_slot_advance('test_slot2'). I 
think 
"but test_slot1 still has unconsumed WAL records." is appropriate. Fixed.

> 5.
> +# pg_upgrade will fail because the slot still has unconsumed WAL records
> +command_checks_all(
> 
> /because the slot still has/because there are slots still having/

Fixed.

> 6.
> + [qr//],
> + 'run of pg_upgrade of old cluster with slot having unconsumed WAL records'
> +);
> 
> /slot/slots/

Fixed.

> 7.
> +# And check the content. Both of slots must be reported that they have
> +# unconsumed WALs after confirmed_flush_lsn.
> 
> SUGGESTION
> # Check the file content. Both slots should be reporting that they have
> # unconsumed WAL records.

Fixed.

> 
> 8.
> +# Preparations for the subsequent test:
> +# 1. Setup logical replication
> +my $old_connstr = 

Re: run pgindent on a regular basis / scripted manner

2023-10-18 Thread Jelte Fennema
On Wed, 18 Oct 2023 at 06:40, David Rowley  wrote:
> How many of the committers who have broken koel are repeat offenders?

I just checked the commits and there don't seem to be real repeat
offenders. The maximum number of times someone broke koel since its
inception is two, and that was the case for only two people. The other
eight people each caused only one breakage.

> What is their opinion on this?
> Did they just forget once or do they hate the process and want to go back?

The committers that broke koel since its inception are:
- Alexander Korotkov
- Amit Kapila
- Amit Langote
- Andres Freund
- Etsuro Fujita
- Jeff Davis
- Michael Paquier
- Peter Eisentraut
- Tatsuo Ishii
- Tomas Vondra

I included all of them in the To field of this message, in the hope
that they share their viewpoint. Because otherwise it stays guessing
what they think.

But based on the contents of the fixup commits a commonality seems to
be that the fixup only fixes a few lines, quite often touching only
comments. So it seems like the main reason for breaking koel is
forgetting to re-run pgindent after some final cleanup/wording
changes/typo fixes. And that seems like an expected flaw of being
human instead of a robot, which can only be worked around with better
automation.

> I agree that it's not nice to add yet another way of breaking the
> buildfarm and even more so when the committer did make check-world
> before committing. We have --enable-tap-tests, we could have
> --enable-indent-checks and have pgindent check the code is correctly
> indented during make check-world. Then just not have
> --enable-indent-checks in CI.

I think --enable-indent-checks sounds like a good improvement to the
status quo. But I'm not confident that it will help remove the cases
where only a comment needs to be re-indented. Do committers really
always run check-world again when only changing a typo in a comment? I
know I probably wouldn't (or at least not always).
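
One way to take the human factor out of that locally might be a git
pre-push hook, e.g. (just a sketch; it assumes pg_bsd_indent is built and
that pgindent's --silent-diff option is available):

    #!/bin/sh
    # .git/hooks/pre-push: refuse the push if pgindent would change
    # anything in the working tree.
    cd "$(git rev-parse --show-toplevel)" || exit 1
    src/tools/pgindent/pgindent --silent-diff || {
        echo "pgindent reports mis-indented code; run pgindent first." >&2
        exit 1
    }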




Re: run pgindent on a regular basis / scripted manner

2023-10-18 Thread Jelte Fennema
On Tue, 17 Oct 2023 at 23:01, Magnus Hagander  wrote:
> And unless we're only enforcing it on master, we'd also need to make
> provisions for different versions of it on different branches, I
> think?

Only enforcing it on master sounds fine to me; that's what koel is doing
too, afaik. In practice this seems to be enough to solve my main issue
of having to manually remove unrelated indents when rebasing my
patches. Enforcing on different branches seems like it would add a lot
of complexity. So I'm not sure that's worth doing at this point, since
currently some committers are proposing to stop enforcing continuous
indentation because of problems with the current flow. I think we
should only enforce it on more branches once we have the flow
mastered.

> It does need a lot more
> sandboxing than what's in there now, but that's not too hard of a
> problem to solve, *if* this is what we want.

Yeah, I didn't bother with that, since it seems very tightly coupled
with the environment that the git server is running in, and I have no
clue what that environment is.




Re: BRIN minmax multi - incorrect distance for infinite timestamp/date

2023-10-18 Thread Tomas Vondra
On 10/17/23 22:25, Tomas Vondra wrote:
> Hi,
> 
> Here's a couple cleaned-up patches fixing the various discussed here.
> I've tried to always add a regression test demonstrating the issue
> first, and then fix it in the next patch.
> 
> In particular, this deals with these issues:
> 
> 1) overflows in distance calculation for large timestamp values (0002)
> 
> 2) incorrect subtraction in distance for date values (0003)
> 
> 3) incorrect distance for infinite date/timestamp values (0005)
> 
> 4) failing distance for extreme interval values (0007)
> 
> All the problems except "2" have been discussed earlier, but this seems
> a bit more serious than the other issues, as it's easier to hit. It
> subtracts the values in the opposite order (smaller - larger), so the
> distances are negated. Which means we actually merge the most distant
> values first, and thus are "guaranteed" to build a very inefficient
> summary. People with multi-minmax indexes on "date" columns will
> probably need to reindex.
> 

BTW when adding the tests with extreme values, I noticed this:

  test=# select '5874897-01-01'::date;
   date
  ---
   5874897-01-01
  (1 row)

  test=# select '5874897-01-01'::date + '1 second'::interval;
  ERROR:  date out of range for timestamp

IIUC this happens because the first thing date_pl_interval does is
date2timestamp, ignoring the fact that the ranges of those data types
are different - dates allow values up to '5874897 AD' while timestamps
only allow values up to '294276 AD'.
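
For instance, the cast alone hits the same limit:

  test=# select '5874897-01-01'::date::timestamp;
  ERROR:  date out of range for timestamp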

This seems to be a long-standing behavior, added by a9e08392dd6f in
2004. Not sure how serious it is, I just noticed when I tried to do
arithmetics on the extreme values in tests.


regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Fix one hint message in 002_pg_upgrade.pl

2023-10-18 Thread Michael Paquier
On Wed, Oct 18, 2023 at 07:27:45AM +, Zhijie Hou (Fujitsu) wrote:
> The test is to confirm that the output directory has been removed after
> pg_upgrade --check, but the message here says the opposite. Attached is a
> small patch to fix it.

Indeed, will fix.  Thanks!
--
Michael


signature.asc
Description: PGP signature


Re: run pgindent on a regular basis / scripted manner

2023-10-18 Thread Magnus Hagander
On Tue, Oct 17, 2023 at 11:07 PM Tom Lane  wrote:
>
> Magnus Hagander  writes:
> > If it doesn't know how to rebuild it, aren't we going to be stuck in a
> > catch-22 if we need to change it in certain ways? Since an old version
> > of pg_bsd_indent would reject the patch that might include updating
> > it. (And when it does, one should expect the push to take quite a long
> > time, but given the infrequency I agree that part is probably not an
> > issue)
>
> Everyone who has proposed this has included a caveat that there must
> be a way to override the check.  Given that minimal expectation, it
> shouldn't be too hard to deal with pg_bsd_indent-updating commits.

I haven't managed to fully keep up with the thread, so I missed that.
And i can't directly find it looking back either - but as long as
three's an actual idea for how to do that, the problem goes away :)

-- 
 Magnus Hagander
 Me: https://www.hagander.net/
 Work: https://www.redpill-linpro.com/




Fix one hint message in 002_pg_upgrade.pl

2023-10-18 Thread Zhijie Hou (Fujitsu)
Hi,

There is one hint message in 002_pg_upgrade.pl that is not consistent with the
purpose of the test.

# --check command works here, cleans up pg_upgrade_output.d.
command_ok(
[
'pg_upgrade', '--no-sync', '-d', $oldnode->data_dir,
...
ok(!-d $newnode->data_dir . "/pg_upgrade_output.d",
-   "pg_upgrade_output.d/ not removed after pg_upgrade --check success");
+   "pg_upgrade_output.d/ removed after pg_upgrade --check success");

The test is to confirm that the output directory has been removed after
pg_upgrade --check, but the message here says the opposite. Attached is a
small patch to fix it.

Best Regards,
Hou Zhijie



0001-Fix-one-hint-message-in-002_pg_upgrade.pl.patch
Description: 0001-Fix-one-hint-message-in-002_pg_upgrade.pl.patch


Re: run pgindent on a regular basis / scripted manner

2023-10-18 Thread Peter Eisentraut

On 18.10.23 06:40, David Rowley wrote:
I agree that it's not nice to add yet another way of breaking the 
buildfarm and even more so when the committer did make check-world 
before committing. We have --enable-tap-tests, we could have 
--enable-indent-checks and have pgindent check the code is correctly 
indented during make check-world. Then just not have 
--enable-indent-checks in CI.


This approach seems like a good improvement, even independent of 
everything else we might do about this.  Making it easier to use and 
less likely to be forgotten.  Also, this way, non-committer contributors 
can opt-in, if they want to earn bonus points.






Re: Rename backup_label to recovery_control

2023-10-18 Thread Peter Eisentraut

On 16.10.23 17:15, David Steele wrote:

I also do wonder whether recovery_control is really a better name. Maybe
I just have backup_label too firmly stuck in my head, but is what that
file does really best described as recovery control? I'm not so sure
about that.


The thing it does that describes it as "recovery control" in my view is 
that it contains the LSN where Postgres must start recovery (plus TLI, 
backup method, etc.). There is some other informational stuff in there, 
but the important fields are all about ensuring consistent recovery.


At the end of the day the entire point of backup *is* recovery and users 
will interact with this file primarily in recovery scenarios.


Maybe "restore" is better than "recovery", since recovery also happens 
separately from backups, but restoring is something you do with a backup 
(and there is also restore_command etc.).






Re: pg_rewind WAL segments deletion pitfall

2023-10-18 Thread torikoshia

Thanks for the patch.

I tested the v6 patch using the test script attached to [1], and the old
primary successfully became the new standby.


I have some very minor questions on the regression tests, mainly regarding
consistency with the other pg_rewind tests:




+setup_cluster;
+create_standby;


Would it be better to add parentheses?
Also, should we add "RewindTest::" to these functions?



+primary_psql("create table t(a int)");
+primary_psql("insert into t values(0)");
+primary_psql("select pg_switch_wal()");

..

Should 'select', 'create', etc. be capitalized?



my $false = "$^X -e 'exit(1)'";

I feel it's hard to understand what this means.
Wouldn't it be better to add a comment describing that this is for Windows
environments?
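
For example, something like this (just a sketch):

    # Run a command that is guaranteed to fail.  /bin/false is not
    # available on Windows, so use the current Perl interpreter ($^X)
    # to exit with a non-zero status instead.
    my $false = "$^X -e 'exit(1)'";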



--
Regards,

--
Atsushi Torikoshi
NTT DATA Group Corporation




Re: Oversight in reparameterize_path_by_child leading to executor crash

2023-10-18 Thread Richard Guo
On Fri, Oct 13, 2023 at 6:18 PM Andrei Lepikhov 
wrote:

> On 23/8/2023 12:37, Richard Guo wrote:
> > To fix it we may need to modify RelOptInfos for Path, BitmapHeapPath,
> > ForeignPath and CustomPath, and modify IndexOptInfos for IndexPath.  It
> > seems that that is not easily done without postponing reparameterization
> > of paths until createplan.c.
> >
> > Attached is a patch which is v5 + fix for this new issue.
>
> Having looked into the patch for a while, I couldn't answer some stupid
> questions for myself:


Thanks for reviewing this patch!  I think these are great questions.


> 1. If we postpone parameterization until the plan creation, why do we
> still copy the path node in the FLAT_COPY_PATH macros? Do we really need
> it?


Good point.  The NestPath's original inner path should not be referenced
any more after the reparameterization, so it seems safe to adjust the
path itself without the need for a flat-copy.  I've done that in the v8
patch.


> 2. I see big switches on path nodes. May it be time to create a
> path_walker function? I recall some thread where such a suggestion was
> declined, but I don't remember why.


I'm not sure.  But this seems a separate topic, so maybe it's better to
discuss it separately?


> 3. Clause changing is postponed. Why does it not influence the
> calc_joinrel_size_estimate? We will use statistics on the parent table
> here. Am I wrong?


Hmm, I think the reparameterization does not change the estimated
size/costs.  Quoting the comment

* The cost, number of rows, width and parallel path properties depend upon
* path->parent, which does not change during the translation.


Hi Tom, I'm wondering if you've had a chance to follow this issue.  What
do you think about the proposed patch?

Thanks
Richard


v8-0001-Postpone-reparameterization-of-paths-until-when-creating-plans.patch
Description: Binary data


Re: Remove wal_level settings for subscribers in tap tests

2023-10-18 Thread Michael Paquier
On Wed, Oct 18, 2023 at 02:59:52AM +, Hayato Kuroda (Fujitsu) wrote:
> While discussing [1], I found that in the TAP tests, wal_level was set to
> logical for subscribers too. The setting is not needed on the subscriber
> side, and it may cause misunderstanding for newcomers. Therefore, I would
> like to propose a patch that removes the unnecessary
> "allows_streaming => 'logical'".
> I grepped for the string and checked the necessity of each occurrence one by one.
> 
> What do you think?
> 
> [1]: https://commitfest.postgresql.org/45/4273/

Hmm, okay.  On top of your argument, this may be a good idea for a
different reason: it makes the tests a bit cheaper as "logical"
generates a bit more WAL.  Still the gain is marginal. 
--
Michael


signature.asc
Description: PGP signature


Re: Test 026_overwrite_contrecord fails on very slow machines (under Valgrind)

2023-10-18 Thread Michael Paquier
On Thu, Oct 12, 2023 at 04:46:02PM -0700, Andres Freund wrote:
> The machine skink is hosted on runs numerous buildfarm animals (24 I think
> right now, about to be 28). While it has plenty resources (16 cores/32
> threads, 128GB RAM), test runtime is still pretty variable depending on what
> other tests are running at the same time...

Okay.  It seems to me that just setting checkpoint_timeout to 1h
should leave plenty of room to make sure that the test is not unstable
compared to the default of 5 mins.  So, why not just do that and see
what happens for a few days?
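
That is, just something like this in the test setup (sketch; the node
variable name is assumed):

    $node->append_conf('postgresql.conf', 'checkpoint_timeout = 1h');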
--
Michael


signature.asc
Description: PGP signature


Re: Clean up some pg_dump tests

2023-10-18 Thread Peter Eisentraut

On 10.10.23 10:03, Peter Eisentraut wrote:

On 09.10.23 11:20, Alvaro Herrera wrote:

I tried this out.  I agree it's a good change.  BTW, this made me
realize that "unlike" is not a good name: maybe it should be called
"except".


right


I would add quotes to the words "like" and "unlike" there.  Otherwise,
these sentences are hard to parse.  Also, some commentary on what this
is about seems warranted: maybe "Check that this test properly defines
which dumps the output should match on." or similar.


Done.

I also moved the code a bit earlier, before the checks for supported 
compression libraries etc., so it runs even if those cause a skip.



I didn't like using diag(), because automated runs will not alert to any
problems.  Now maybe that's not critical, but I fear that people would
not notice problems if they are just noise in the output.  Let's make
them test errors.  fail() seems good enough: with the lines I quote
above and omitting the test corrections, I get this, which seems good
enough:


After researching this a bit more, I think "die" is the convention for 
problems in the test definitions themselves.  (Otherwise, you're writing 
a test about the tests, which would be a bit weird.)  The result is 
approximately the same.
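
(For illustration only, not the committed code: the distinction is roughly
the following, assuming the %tests hash structure in 002_pg_dump.pl.)

    # A malformed test definition is a bug in the test suite itself, so
    # report it with die rather than as a test failure.
    foreach my $test (sort keys %tests)
    {
        die "test \"$test\" does not define which dumps it should match on"
          unless defined $tests{$test}->{like};
    }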


committed




Re: RFC: Logging plan of the running query

2023-10-18 Thread torikoshia

On 2023-10-16 18:46, Ashutosh Bapat wrote:
On Thu, Oct 12, 2023 at 6:51 PM torikoshia  
wrote:


On 2023-10-11 16:22, Ashutosh Bapat wrote:
>
> Considering the similarity with auto_explain I wondered whether this
> function should be part of auto_explain contrib module itself? If we
> do that users will need to load auto_explain extension and thus
> install executor hooks when this function doesn't need those. So may
> not be such a good idea. I didn't see any discussion on this.

I once thought about adding this to auto_explain, but I left it as-is for
the reasons below:

- One of the typical use cases of pg_log_query_plan() would be analyzing
slow queries in customer environments. In such environments, we cannot
always control which extensions are installed.


The same argument applies to auto_explain functionality as well. But
it's not part of the core.


Yeah, and when we are in a situation where we want to run
pg_log_query_plan(), we can run it in any environment as long as it is
bundled with the core.
On the other hand, if it is built into auto_explain, we would first need to
install auto_explain if it is not already there, which is often
difficult to do in production environments.

   Of course auto_explain is a major extension and it is quite possible
that they have installed auto_explain, but it is also possible that they
have not.
- It seems a bit counter-intuitive that pg_log_query_plan() is in an
extension called auto_explain, since it "manually" logs plans.



pg_log_query_plan() may not fit auto_explain but
pg_explain_backend_query() does. What we are logging is more than just the
plan of the query; it might expand to be closer to the explain output.
While auto in auto_explain would refer to its automatically logging
explain outputs, it can provide an additional function which provides
similar functionality by manually triggering it.

But we can defer this to a committer, if you want.

I am more interested in avoiding the duplication of code, esp. the
first comment in my reply


If there are no objections, I will try porting it to auto_explain and
see its feasibility.


There is a lot of similarity between what this feature does and what
auto explain does. I see the code is also duplicated. There is some
merit in avoiding this duplication
1. we will get all the features of auto_explain automatically, like
choosing a format (this was expressed by somebody earlier in this
thread), settings, etc.
2. avoid bugs. E.g. your code switches context after ExplainState has
been allocated. These states may leak depending upon when this
function gets called.
3. Building features on top as James envisions will be easier.




   =# select pg_log_query_plan(pid), application_name, backend_type from
   pg_stat_activity where backend_type = 'autovacuum launcher';
   WARNING:  PID 63323 is not a PostgreSQL client backend process
    pg_log_query_plan | application_name |    backend_type
   -------------------+------------------+---------------------
    f                 |                  | autovacuum launcher


> I am also wondering whether it's better to report the WARNING as
> status column in the output. E.g. instead of
> #select pg_log_query_plan(100);
> WARNING:  PID 100 is not a PostgreSQL backend process
>  pg_log_query_plan
> -------------------
>  f
> (1 row)
> we output
> #select pg_log_query_plan(100);
>  pg_log_query_plan |   status
> ---+-
>  f | PID 100 is not a PostgreSQL backend process
> (1 row)
>
> That looks neater and can easily be handled by scripts, applications
> and such. But it will be inconsistent with other functions like
> pg_terminate_backend() and pg_log_backend_memory_contexts().

It seems neater, but it might be inconvenient because we can no longer
use it in a select list like the following query, as you wrote:

   #select pg_log_query_plan(pid), application_name, backend_type from
   pg_stat_activity where backend_type = 'autovacuum launcher';


Why is that?


Sorry, I misunderstood; I have confirmed that we can run queries like the one below:

```
=# CREATE OR REPLACE FUNCTION pg_log_query_plan_stab(i int)
   RETURNS TABLE(
      pg_log_query_plan bool,
      status text
   ) AS $$
   DECLARE
   BEGIN
   RETURN QUERY SELECT false::bool, 'PID 100 is not a PostgreSQL backend

Re: [PoC] pg_upgrade: allow to upgrade publisher node

2023-10-18 Thread Peter Smith
Here are some comments for the patch v51-0002

==
src/bin/pg_upgrade/t/003_upgrade_logical_replication_slots.pl

1.
+# Set max_wal_senders to a lower value if the old cluster is prior to PG12.
+# Such clusters regard max_wal_senders as part of max_connections, but the
+# current TAP tester sets these GUCs to the same value.
+if ($old_publisher->pg_version < 12)
+{
+ $old_publisher->append_conf('postgresql.conf', "max_wal_senders = 5");
+}

1a.
I was initially unsure what the above comment meant -- thanks for the
offline explanation.

SUGGESTION
The TAP Cluster.pm assigns default 'max_wal_senders' and
'max_connections' to the same value (10) but PG12 and prior considered
max_wal_senders as a subset of max_connections, so setting the same
value will fail.

~

1b.
I also felt it is better to explicitly set both values in the < PG12
configuration because otherwise, you are still assuming knowledge that
the TAP default max_connections is 10.

SUGGESTION
$old_publisher->append_conf('postgresql.conf', qq{
max_wal_senders = 5
max_connections = 10
});

~~~

2.
+# Switch workloads depend on the major version of the old cluster.  Upgrading
+# logical replication slots has been supported since PG17.
+if ($old_publisher->pg_version <= 16)
+{
+ test_for_16_and_prior($old_publisher, $new_publisher, $mode);
+}
+else
+{
+ test_for_17_and_later($old_publisher, $new_publisher, $mode);
+}

IMO it is less confusing to have fewer version numbers floating around
in comments and names and code. So instead of referring to 16 and 17,
how about just referring to 17 everywhere?

For example

SUGGESTION
# Test according to the major version of the old cluster.
# Upgrading logical replication slots has been supported only since PG17.

if ($old_publisher->pg_version >= 17)
{
  test_upgrade_from_PG17_and_later($old_publisher, $new_publisher, $mode);
}
else
{
  test_upgrade_from_pre_PG17($old_publisher, $new_publisher, $mode);
}

==
Kind Regards,
Peter Smith.
Fujitsu Australia