Re: Reuse child_relids in try_partitionwise_join was Re: Assert failure on bms_equal(child_joinrel->relids, child_joinrelids)

2024-07-24 Thread Ashutosh Bapat
On Tue, Jul 23, 2024 at 2:05 PM Richard Guo  wrote:
>
> On Wed, Jun 5, 2024 at 3:48 PM Ashutosh Bapat
>  wrote:
> > This is different. But it needs a rebase. PFA rebased patch. The revised 
> > commit message explains the change better.
>
> I looked through this patch and I think it is in a good shape.
>
> Some minor comments:
>
> * I think it's better to declare 'child_relids' as type Relids
> rather than Bitmapset *.  While they are the same thing, Bitmapset *
> isn't typically used in joinrels.c.

Agreed. Sorry. Fixed now.

> And I doubt that it is necessary
> to explicitly initialize it to NULL.

Done.

>
> * This patch changes the signature of function build_child_join_rel().
> I recommend arranging the two newly added parameters as 'int
> nappinfos, AppendRelInfo **appinfos' to maintain consistency with
> other functions that include these parameters.

Yes. Done.

>
> * At the end of the for loop in try_partitionwise_join(), we attempt
> to free the variables used within the loop.  I suggest freeing these
> variables in the order or reverse order of their allocation, rather
> than arbitrarily.

Done. Are you suggesting this for aesthetic purposes, or is there something
else (read: less fragmentation, performance gain, etc.)? I am
curious.

> Also, in non-partitionwise-join planning, we do not
> free local variables in this manner.  I understand that we do this for
> partitionwise-join because accumulating local variables can lead to
> significant memory usage, particularly with many partitions.  I think
> it would be beneficial to add a comment in try_partitionwise_join()
> explaining this behavior.

Sounds useful. Done.

PFA patches:
0001 - same as previous one
0002 - addresses your review comments

--
Best Wishes,
Ashutosh Bapat
From 53be9fd80aa89bf7612105ddfbbe48200b1f0ce0 Mon Sep 17 00:00:00 2001
From: Ashutosh Bapat 
Date: Wed, 26 Jul 2023 12:08:55 +0530
Subject: [PATCH 1/2] Avoid repeated computation in try_partitionwise_join()
 and build_child_join_rel()

try_partitionwise_join() computes the bms_union() of the Bitmapsets representing
the joining child relations and fetches the AppendRelInfos corresponding to the
child base relations participating in the join. The same computation is
repeated in build_child_join_rel(). Avoid this repeated computation by
performing it only once in try_partitionwise_join() and passing the
AppendRelInfos gathered there to build_child_join_rel().

Bitmapsets representing child relids consume a large amount of memory when
thousands of partitions are involved. By performing the bms_union() only once
and freeing the result when it's no longer required, we save memory. The
memory savings are considerable when many partitioned tables with
thousands of partitions are joined using partitionwise join.

Author: Ashutosh Bapat
Reviewed-by: David Christensen
Discussion: https://www.postgresql.org/message-id/flat/CAExHW5snUW7pD2RdtaBa1T_TqJYaY6W_YPVjWDrgSf33i-0uqA%40mail.gmail.com#1f9e0950518310dd300e21574979646f
---
 src/backend/optimizer/path/joinrels.c | 10 ++
 src/backend/optimizer/util/relnode.c  | 18 +++---
 src/include/optimizer/pathnode.h  |  3 ++-
 3 files changed, 11 insertions(+), 20 deletions(-)

diff --git a/src/backend/optimizer/path/joinrels.c b/src/backend/optimizer/path/joinrels.c
index a3677f824f..f9ab82a23f 100644
--- a/src/backend/optimizer/path/joinrels.c
+++ b/src/backend/optimizer/path/joinrels.c
@@ -1547,6 +1547,7 @@ try_partitionwise_join(PlannerInfo *root, RelOptInfo *rel1, RelOptInfo *rel2,
 		RelOptInfo *child_joinrel;
 		AppendRelInfo **appinfos;
 		int			nappinfos;
+		Bitmapset  *child_relids = NULL;
 
 		if (joinrel->partbounds_merged)
 		{
@@ -1642,9 +1643,8 @@ try_partitionwise_join(PlannerInfo *root, RelOptInfo *rel1, RelOptInfo *rel2,
 			   child_rel2->relids);
 
 		/* Find the AppendRelInfo structures */
-		appinfos = find_appinfos_by_relids(root,
-		   bms_union(child_rel1->relids,
-	 child_rel2->relids),
+		child_relids = bms_union(child_rel1->relids, child_rel2->relids);
+		appinfos = find_appinfos_by_relids(root, child_relids,
 		   &nappinfos);
 
 		/*
@@ -1662,7 +1662,8 @@ try_partitionwise_join(PlannerInfo *root, RelOptInfo *rel1, RelOptInfo *rel2,
 		{
 			child_joinrel = build_child_join_rel(root, child_rel1, child_rel2,
  joinrel, child_restrictlist,
- child_sjinfo);
+ child_sjinfo, appinfos,
+ nappinfos);
 			joinrel->part_rels[cnt_parts] = child_joinrel;
 			joinrel->live_parts = bms_add_member(joinrel->live_parts, cnt_parts);
 			joinrel->all_partrels = bms_add_members(joinrel->all_partrels,
@@ -1681,6 +1682,7 @@ try_partitionwise_join(PlannerInfo *root, RelOptInfo *rel1, RelOptInfo *rel2,
 
 		pfree(appinfos);
 		free_child_join_sjinfo(child_sjinfo);
+		bms_free(child_relids);
 	}
 }
 
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index e05b21c884..989bee0fa8 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/

Re: Vacuum ERRORs out considering freezing dead tuples from before OldestXmin

2024-07-24 Thread John Naylor
On Wed, Jul 24, 2024 at 5:40 AM Masahiko Sawada  wrote:

> Without MEMORY_CONTEXT_CHECKING, if size is 16 bytes, required_size is
> also 16 bytes as it's already 8-byte aligned and Bump_CHUNKHDRSZ is 0.
> On the other hand, with MEMORY_CONTEXT_CHECKING, the required_size is
> bumped to 40 bytes as chunk_size is 24 bytes and Bump_CHUNKHDRSZ is 16
> bytes. Therefore, with MEMORY_CONTEXT_CHECKING, we allocate more memory
> and use more Bump memory blocks, resulting in filling up TidStore in
> the test cases. We can easily reproduce this test failure with a
> PostgreSQL server built without --enable-cassert. It seems that
> copperhead is the sole BF animal that doesn't use --enable-cassert but
> runs recovery-check.

It seems we could force the bitmaps to be larger, and also reduce the
number of updated tuples by updating only the last few tuples (say
5-10) by looking at the ctid's offset. This requires some trickery,
but I believe I've done it in the past by casting to text and
extracting with a regex. (I'm assuming the number of tuples updated is
more important than the number of tuples inserted on a newly created
table.)
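
Something along these lines, for illustration (the table name and the offset
cutoff below are made up; the real test would use its own table and numbers):

-- update only tuples near the end of each heap page, by extracting the
-- offset part of the ctid (text form is "(block,offset)") with a regex
UPDATE test SET a = a + 1
WHERE substring(ctid::text from ',(\d+)\)')::int > 220;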

As for lowering the limit, we've experimented with 256kB here:

https://www.postgresql.org/message-id/canwcazzutvz3lsypauyqvzcezxz7qe+9ntnhgyzdtwxpul+...@mail.gmail.com

As I mention there, going lower than that would need a small amount of
reorganization in the radix tree. Not difficult -- the thing I'm
concerned about is that we'd likely need to document a separate
minimum for DSA, since that behaves strangely with 256kB and might not
work at all lower than that.




Re: race condition when writing pg_control

2024-07-24 Thread Alexander Lakhin

Hello Thomas,

15.07.2024 06:44, Thomas Munro wrote:

I'm going to upgrade this to a proposal:

https://commitfest.postgresql.org/49/5124/

I wonder how often this happens in the wild.


Please look at a recent failure [1], produced by buildfarm animal
culicidae, which tests EXEC_BACKEND. I guess it's caused by the issue
discussed.

Maybe it would make sense to construct a reliable reproducer for the
issue (I could not find a ready-to-use recipe in this thread)...

What do you think?

[1] 
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=culicidae&dt=2024-07-24%2004%3A08%3A23

Best regards,
Alexander




Re: Make COPY format extendable: Extract COPY TO format implementations

2024-07-24 Thread Sutou Kouhei
Hi,

In <9172d4eb-6de0-4c6d-beab-8210b7a22...@enterprisedb.com>
  "Re: Make COPY format extendable: Extract COPY TO format implementations" on 
Mon, 22 Jul 2024 14:36:40 +0200,
  Tomas Vondra  wrote:

> Thanks for the summary/responses. I still think it'd be better to post a
> summary as a separate message, not as yet another post responding to
> someone else. If I was reading the thread, I would not have noticed this
> is meant to be a summary. I'd even consider putting a "THREAD SUMMARY"
> title on the first line, or something like that. Up to you, of course.

It makes sense. I'll do it as a separate e-mail.

> My suggestions would be to maintain this as a series of patches, making
> incremental changes, with the "more complex" or "more experimental"
> parts larger in the series. For example, I can imagine doing this:
> 
> 0001 - minimal version of the patch (e.g. current v17)
> 0002 - switch existing formats to the new interface
> 0003 - extend the interface to add bits needed for columnar formats
> 0004 - add DML to create/alter/drop custom implementations
> 0005 - minimal patch with extension adding support for Arrow
> 
> Or something like that. The idea is that we still have a coherent story
> of what we're trying to do, and can discuss the incremental changes
> (easier than looking at a large patch). It's even possible to commit
> earlier parts before the later parts are quite cleaned up for commit.
> And some changes may not even be meant for commit (e.g. the
> extension) but as guidance / validation for the earlier parts.

OK. I attach the v18 patch set:

0001: add a basic feature (Copy{From,To}Routine)
  (same as the v17 but it's based on the current master)
0002: use Copy{From,To}Routine for the existing formats
  (this may not be committed because there is a
  profiling related concern)
0003: add support for specifying custom format by "COPY
  ... WITH (format 'my-format')"
  (this also has a test)
0004: export Copy{From,To}StateData
  (but this isn't enough to implement custom COPY
  FROM/TO handlers as an extension)
0005: add opaque member to Copy{From,To}StateData and export
  some functions to read the next data and flush the buffer
  (we can implement a PoC Apache Arrow COPY FROM/TO
  handler as an extension with this)

https://github.com/kou/pg-copy-arrow is a PoC Apache Arrow
COPY FROM/TO handler as an extension.


Notes:

* 0002: We use "static inline" and "constant argument" for
  optimization.
* 0002: This removes NextCopyFromRawFields() from the public
  header because it's not used elsewhere in PostgreSQL and we want to
  use "static inline" for it. If that's a problem, we can keep
  it and create an internal function for the "static inline" version.
* 0003: We use "CREATE FUNCTION" to register a custom COPY
  FROM/TO handler. It's the same approach as tablesample.
* 0004 and 0005: We could mix them, but this patch set splits
  them for ease of review. 0004 just moves existing
  code. It doesn't change the existing code.
* PoC: I provide it as a separate repository instead of a
  patch because an extension generally exists as a separate
  project. If that's a problem, I can provide it as a patch
  for contrib/.
* This patch set still has minimal Copy{From,To}Routine. For
  example, custom COPY FROM/TO handlers can't process their
  own options with this patch set. We may add more callbacks
  to Copy{From,To}Routine later based on real world use-cases.

> Unfortunately, there's not much information about what exactly the tests
> did, context (hardware, ...). So I don't know, really. But if you share
> enough information on how to reproduce this, I'm willing to take a look
> and investigate.

Thanks. Here is related information based on the past
e-mails from Michael:

* Use -O2 for optimization build flag
  ("meson setup --buildtype=release" may be used)
* Use tmpfs for PGDATA
* Disable fsync
* Run on scissors (what is "scissors" in this context...?)
  
https://www.postgresql.org/message-id/flat/Zbr6piWuVHDtFFOl%40paquier.xyz#dbbec4d5c54ef2317be01a54abaf495c
* Unlogged table may be used
* Use a table that has 30 integer columns (*1)
* Use 5M rows (*2)
* Use '/dev/null' for COPY TO (*3)
* Use blackhole_am for COPY FROM (*4)
  https://github.com/michaelpq/pg_plugins/tree/main/blackhole_am
* perf is used but used options are unknown (sorry)

(*1) This SQL may be used to create the table:

CREATE OR REPLACE FUNCTION create_table_cols(tabname text, num_cols int)
RETURNS VOID AS
$func$
DECLARE
  query text;
BEGIN
  query := 'CREATE UNLOGGED TABLE ' || tabname || ' (';
  FOR i IN 1..num_cols LOOP
query := query || 'a_' || i::text || ' int default 1';
IF i != num_cols THEN
  query := query || ', ';
END IF;
  END LOOP;
  query := query || ')';
  EXECUTE format(query);
END
$func$ LANGUAGE plpgsql;
SELECT create_table_cols ('to_tab_30', 30);
SELECT create_table_cols ('from_tab_30', 30);

(*2) This SQL may be used to insert 5M rows:

INSERT INTO to_tab_30 SE

Re: pgsql: Add more SQL/JSON constructor functions

2024-07-24 Thread jian he
drop domain if exists djs;
create domain djs as jsonb check ( value <> '"11"' );
SELECT JSON_QUERY(jsonb '"aaa"', '$.a' RETURNING djs keep quotes
DEFAULT '"11"' ON empty);
SELECT JSON_QUERY(jsonb '"aaa"', '$.a' RETURNING djs omit quotes
DEFAULT '"11"' ON empty);
SELECT JSON_QUERY(jsonb '"11"', '$' RETURNING djs omit quotes DEFAULT
'"11"' ON empty);

SELECT JSON_QUERY(jsonb '"aaa"', '$.a' RETURNING jsonb keep quotes
DEFAULT '"11"' ON empty);
SELECT JSON_QUERY(jsonb '"aaa"', '$.a' RETURNING jsonb omit quotes
DEFAULT '"11"' ON empty);
SELECT JSON_QUERY(jsonb '"aaa"', '$.a' RETURNING int4range omit quotes
DEFAULT '"[1,2]"'::jsonb ON empty);
SELECT JSON_QUERY(jsonb '"aaa"', '$.a' RETURNING int4range keep quotes
DEFAULT '"[1,2]"'::jsonb ON empty);
SELECT JSON_value(jsonb '"aaa"', '$.a' RETURNING int4range  DEFAULT
'"[1,2]"'::jsonb ON empty);


I found 2 issues with the above tests.
1. When the RETURNING type is jsonb or a domain over jsonb, the default
expression does not respect omit/keep quotes,
but other RETURNING types do. Maybe this is fine.

2. Shouldn't domain over jsonb fail just like domains over other types?
RETURNING djs keep quotes DEFAULT '"11"' ON empty
should fail with
ERROR:  could not coerce ON EMPTY expression (DEFAULT) to the RETURNING type
DETAIL:  value for domain djs violates check constraint "djs_check"



errcode(ERRCODE_CANNOT_COERCE),
errmsg("cannot cast behavior expression of
type %s to %s",
   format_type_be(exprType(expr)),
   format_type_be(returning->typid)),
errhint("You will need to cast the expression."),
parser_errposition(pstate, exprLocation(expr)));

maybe
errhint("You will need to explicitly cast the expression to type %s",
format_type_be(returning->typid))




Re: [PATCH] GROUP BY ALL

2024-07-24 Thread Jelte Fennema-Nio
On Mon, 22 Jul 2024 at 22:55, David Christensen  wrote:
> I see that there'd been some chatter but not a lot of discussion about
> a GROUP BY ALL feature/functionality.  There certainly is utility in
> such a construct IMHO.

+1 from me. When exploring data, this is extremely useful because you
don't have to update the GROUP BY clause every time the SELECT list changes.

Regarding the arguments against this:
1. I don't think this is any more unreadable than being able to GROUP
BY 1, 2, 3, or being able to use column aliases from the SELECT in the
GROUP BY clause; both are already allowed. Personally I actually
think it's more readable than specifying e.g. 5 columns in the group
by, because then I have to cross-reference them with the columns in the SELECT
clause to find out if they are the same. With ALL I instantly know
it's grouped by all of them.
2. This is indeed not part of the standard. But we have many things
that are not part of the standard. I think as long as we use the same
syntax as Snowflake, Databricks and DuckDB, I personally don't see a
big problem. Then we could try to get it into the next version of
the standard.
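
For illustration, with the proposed syntax the two queries below would be
equivalent (table and columns are hypothetical):

-- today: every non-aggregated output column must be repeated
SELECT region, product, count(*), sum(amount)
FROM sales
GROUP BY region, product;

-- proposed: group by all non-aggregated output columns
SELECT region, product, count(*), sum(amount)
FROM sales
GROUP BY ALL;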




Re: Use pgBufferUsage for block reporting in analyze

2024-07-24 Thread Anthonin Bonnefoy
On Mon, Jul 22, 2024 at 10:59 PM Masahiko Sawada  wrote:
> The first line would vary depending on whether an autovacuum worker or
> not. And the above suggestion includes a change of term "row" to
> "tuple" for better consistency with VACUUM VERBOSE outputs. I think it
> would be great if autoanalyze also writes logs in the same format.
> IIUC with the patch, autoanalyze logs don't include the page and tuple
> statistics.

One issue is that the number of scanned pages, live tuples and dead
tuples is only available in acquire_sample_rows which is where the log
containing those stats is emitted. I've tried to implement the
following in 0003:
- Sampling functions now accept an AcquireSampleStats struct to store
pages and tuples stats
- Log is removed from sampling functions
- do_analyze_rel now outputs scanned-page and tuple statistics when
relevant. Sampling from an fdw doesn't provide those statistics, so they
are not displayed in those cases.

This ensures that analyze logs are only emitted in do_analyze_rel,
making it possible to display the same output for both autoanalyze and ANALYZE
VERBOSE. With those changes, we have the following output for analyze
verbose of a table:

analyze (verbose) pgbench_accounts ;
INFO:  analyzing "public.pgbench_accounts"
INFO:  analyze of table "postgres.public.pgbench_accounts"
pages: 1640 of 1640 scanned
tuples: 10 live tuples, 0 are dead; 3 tuples in samples,
10 estimated total tuples
avg read rate: 174.395 MB/s, avg write rate: 0.000 MB/s
buffer usage: 285 hits, 1384 reads, 0 dirtied
WAL usage: 14 records, 0 full page images, 1343 bytes
system usage: CPU: user: 0.05 s, system: 0.00 s, elapsed: 0.06 s

For a file_fdw, the output will look like:

analyze (verbose) pglog;
INFO:  analyzing "public.pglog"
INFO:  analyze of table "postgres.public.pglog"
tuples: 3 tuples in samples, 60042 estimated total tuples
avg read rate: 0.000 MB/s, avg write rate: 0.000 MB/s
buffer usage: 182 hits, 0 reads, 0 dirtied
WAL usage: 118 records, 0 full page images, 13086 bytes
system usage: CPU: user: 0.40 s, system: 0.00 s, elapsed: 0.41 s

I've also slightly modified 0002 to display "automatic analyze" when
we're inside an autovacuum worker, similar to what's done with vacuum
output.

Regards,
Anthonin


v3-0002-Output-buffer-and-wal-usage-with-verbose-analyze.patch
Description: Binary data


v3-0001-Use-pgBufferUsage-for-block-reporting-in-analyze.patch
Description: Binary data


v3-0003-Output-pages-and-tuples-stats-in-ANALYZE-VERBOSE.patch
Description: Binary data


Re: [PATCH] GROUP BY ALL

2024-07-24 Thread Jelte Fennema-Nio
On Tue, 23 Jul 2024 at 15:22, Andrei Borodin  wrote:
> I'd like to have GROUP BY AUTO (I also proposed the version GROUP BY SURPRISE 
> ME). But I wouldn't like to open the Pandora's box of syntax-sugar extensions which 
> may be incompatible with future standards.
> If we could have an extensible grammar, I'd be happy to have a lot of such 
> enhancements. My top 2 are FROM table SELECT column and better GROUP BY.

Personally my number one enhancement would be allowing a trailing
comma after the last column in the SELECT clause.




Re: [PATCH] GROUP BY ALL

2024-07-24 Thread Pavel Stehule
Hi

st 24. 7. 2024 v 10:57 odesílatel Jelte Fennema-Nio 
napsal:

> On Mon, 22 Jul 2024 at 22:55, David Christensen  wrote:
> > I see that there'd been some chatter but not a lot of discussion about
> > a GROUP BY ALL feature/functionality.  There certainly is utility in
> > such a construct IMHO.
>
> +1 from me. When exploring data, this is extremely useful because you
> don't have to update the GROUP BY clause every time
>
> Regarding the arguments against this:
> 1. I don't think this is any more unreadable than being able to GROUP
> BY 1, 2, 3. Or being able to use column aliases from the SELECT in the
> GROUP BY clause. Again this is already allowed. Personally I actually
> think it's more readable than specifying e.g. 5 columns in the group
> by, because then I have to cross-reference with columns in the SELECT
> clause to find out if they are the same. With ALL I instantly know
> it's grouped by all
> 2. This is indeed not part of the standard. But we have many things
> that are not part of the standard. I think as long as we use the same
> syntax as snowflake, databricks and duckdb I personally don't see a
> big problem. Then we could try and make this be part of the standard
> in the next version of the standard.
>

Aggregation over more columns is pretty slow and memory-expensive in
Postgres.

DuckDB is an analytic database with different storage, different executors.
I like it very much, but I am not sure if I want to see these features in
Postgres.

A lot of developers are not very smart, and with the proposed feature,
instead of writing a correct and efficient query, they will just use GROUP BY
ALL. A slow query should look like a slow query :-)

Regards

Pavel


Re: Logical Replication of sequences

2024-07-24 Thread shveta malik
On Wed, Jul 24, 2024 at 11:52 AM shveta malik  wrote:
>
> On Wed, Jul 24, 2024 at 9:17 AM Peter Smith  wrote:
> >
>
> I had a look at patches v20240720* (considering these as the latest
> one) and tried to do some basic testing (WIP). Few comments:
>
> 1)
> I see 'last_value' is updated wrongly after create-sub.  Steps:
>
> ---
> pub:
> CREATE SEQUENCE myseq0 INCREMENT 5 START 100;
> SELECT nextval('myseq0');
> SELECT nextval('myseq0');
> --last_value on pub is 105
> select * from pg_sequences;
> create publication pub1 for all tables, sequences;
>
> Sub:
> CREATE SEQUENCE myseq0 INCREMENT 5 START 100;
> create subscription sub1 connection 'dbname=postgres host=localhost
> user=shveta port=5433' publication pub1;
>
> --check 'r' state is reached
> select pc.relname, pr.srsubstate, pr.srsublsn from pg_subscription_rel
> pr, pg_class pc where (pr.srrelid = pc.oid);
>
> --check 'last_value', it shows some random value as 136
> select * from pg_sequences;

Okay, I see that in fetch_remote_sequence_data(), we are inserting
'last_value + log_cnt' fetched from the remote as 'last_val' on the subscriber,
which leads to the above behaviour. I do not understand why this is
done. This may result in an issue when we insert data into a table with an
identity column on the subscriber (whose internal sequence is replicated);
the identity column in this case will end up with a wrong value.
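
For what it's worth, that also explains the exact value seen above: on the
publisher, after the two nextval() calls, something like the query below should
show last_value = 105 and log_cnt = 31 (assuming the default behaviour of
WAL-logging 32 values ahead), and 105 + 31 gives the 136 observed on the
subscriber.

SELECT last_value, log_cnt, last_value + log_cnt AS copied_value
FROM myseq0;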

thanks
Shveta




Re: Slow catchup of 2PC (twophase) transactions on replica in LR

2024-07-24 Thread Amit Kapila
On Tue, Jul 23, 2024 at 4:55 PM Amit Kapila  wrote:
>
> On Mon, Jul 22, 2024 at 2:48 PM Peter Smith  wrote:
> >
> > Hi, Patch v22-0001 LGTM apart from the following nitpicks
> >
>
> I have included these in the attached. The patch looks good to me. I
> am planning to push this tomorrow unless there are more comments.
>

I merged these changes, made a few other cosmetic changes, and pushed the patch.

-- 
With Regards,
Amit Kapila.




Re: Parent/child context relation in pg_get_backend_memory_contexts()

2024-07-24 Thread David Rowley
On Tue, 23 Jul 2024 at 22:14, Melih Mutlu  wrote:
> Fixed the name. Also I needed to cast parameters when calling that function 
> as below to get rid of some warnings.
>
> +   get_memory_context_name_and_ident(context,
> + 
> (const char **)&name,
> + 
> (const char **) &ident);

Thanks for fixing all those.

I've only had a quick look so far, but I think the patch is now in the
right shape. Unless there's some objections to how things are being
done in v10, I plan to commit this in the next few days... modulo any
minor adjustments.

David




Detect buffer underflow in get_th()

2024-07-24 Thread Alexander Kuznetsov

Hello everyone,

In src/backend/utils/adt/formatting.c:1516, there is a get_th() function 
utilized to return ST/ND/RD/TH suffixes for simple numbers.
Upon reviewing its behavior, it appears capable of receiving non-numeric inputs 
(this is verified by a check at formatting.c:1527).

Given that the function can accept non-numeric inputs,
it is plausible that it could also receive an empty input,
although a brief examination of its calls did not reveal any such instances.

Nevertheless, if the function were to receive an empty input of zero length,
a buffer underflow would occur when attempting to compute *(num + (len - 1)),
as (len - 1) would result in a negative offset.
To mitigate this issue, I propose a patch incorporating the 
zero_length_character_string error code, as detailed in the attachment.
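
For reference, get_th() is normally reached through to_char()'s TH/th ordinal
suffixes, e.g.:

SELECT to_char(412, '999TH');  -- '412TH' (with a leading sign space)
SELECT to_char(1, '9th');      -- '1st'   (with a leading sign space)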

--
Best regards,
Alexander Kuznetsov
From de36780802b72b643b47ec129979292e9832e9e7 Mon Sep 17 00:00:00 2001
From: Alexander Kuznetsov 
Date: Wed, 24 Jul 2024 12:31:45 +0300
Subject: [PATCH] Detect buffer underflow in get_th()

If get_th() can receive input that is not a number,
then it can also receive empty input.

Empty input with zero length can result in a buffer underflow
when accessing *(num + (len - 1)), as (len - 1) would produce a negative index.
Add a check for zero-length input to prevent it.

This was found by ALT Linux Team.
---
 src/backend/utils/adt/formatting.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/src/backend/utils/adt/formatting.c b/src/backend/utils/adt/formatting.c
index 8736ada4be..0ab5ac5bf0 100644
--- a/src/backend/utils/adt/formatting.c
+++ b/src/backend/utils/adt/formatting.c
@@ -1518,6 +1518,11 @@ get_th(char *num, int type)
 	int			len = strlen(num),
 last;
 
+	if (len == 0)
+		ereport(ERROR,
+(errcode(ERRCODE_ZERO_LENGTH_CHARACTER_STRING),
+ errmsg("input cannot be empty string")));
+
 	last = *(num + (len - 1));
 	if (!isdigit((unsigned char) last))
 		ereport(ERROR,
-- 
2.42.2



Re: Add on_error and log_verbosity options to file_fdw

2024-07-24 Thread torikoshia

On 2024-07-23 08:57, Michael Paquier wrote:

On Mon, Jul 22, 2024 at 03:07:46PM -0700, Masahiko Sawada wrote:

I'm slightly concerned that users might not want to see the NOTICE
message for every scan. Unlike COPY FROM, scanning a file via file_fdw
could be frequent.


Agreed.


Yeah, I also have some concerns regarding the noise that this could
produce if called on a foreign table on a regular basis.  The verbose
mode is disabled by default so I don't see why we should not allow it
if the relation owner wants to show it.

Perhaps we should first do a silence mode for log_verbosity to skip
the NOTICE produced at the end of the COPY FROM summarizing the whole?


I like this idea.
If there are no objections, I'm going to make a patch for this.
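
Just to sketch the idea (the value name "silent" is hypothetical at this
point): the option would suppress the summary NOTICE that on_error 'ignore'
currently emits at the end of COPY, and the same option would be exposed
through file_fdw:

COPY test FROM '/tmp/test.csv'
  WITH (FORMAT csv, ON_ERROR ignore, LOG_VERBOSITY silent);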


It would be confusing to have different defaults between COPY and
file_fdw, but having the option to silence that completely is also
appealing from the user point of view.


I'm not sure we should change the defaults.
If the default for file_fdw is silence mode, I am a little concerned that 
there may be cases where people think they have no errors, but in fact 
they do.



   QUERY PLAN

 Foreign Scan on public.test  (cost=0.00..1.10 rows=1 width=12)
   Output: a, b, c
   Foreign File: test.csv
   Foreign File Size: 12 b
   Skipped Rows: 10


Interesting idea linked to the idea of pushing the error state to
something else than the logs.  Sounds like a separate feature.


+1


--
Regards,

--
Atsushi Torikoshi
NTT DATA Group Corporation




Re: Incremental backup from a streaming replication standby fails

2024-07-24 Thread Laurenz Albe
On Mon, 2024-07-22 at 09:37 -0400, Robert Haas wrote:
> On Fri, Jul 19, 2024 at 6:07 PM Laurenz Albe  wrote:
> > Here is a patch.
> > I went for both the errhint and some documentation.
> 
> Hmm, the hint doesn't end up using the word "standby" anywhere. That
> seems like it might not be optimal?

I guessed that the user was aware that she is taking the backup on
a standby server...

Anyway, I reworded the hint to

  This can happen for incremental backups on a standby if there was
  little activity since the previous backup.

> Hmm. I feel like I'm about to be super-nitpicky, but this seems
> imprecise to me in multiple ways.

On the contrary, your comments and explanations are valuable.

> How about something like this:
> 
> An incremental backup is only possible if replay would begin from a
> later checkpoint than for the previous backup upon which it depends.
> On the primary, this condition is always satisfied, because each
> backup triggers a new checkpoint. On a standby, replay begins from the
> most recent restartpoint. As a result, an incremental backup may fail
> on a standby if there has been very little activity since the previous
> backup. Attempting to take an incremental backup that is lagging
> behind the primary (or some other standby) using a prior backup taken
> at a later WAL position may fail for the same reason.
> 
> I'm not saying that's perfect, but let me know your thoughts.

I tinkered with this some more, and the attached patch has

  An incremental backup is only possible if replay would begin from a later
  checkpoint than the checkpoint that started the previous backup upon which
  it depends.  If you take the incremental backup on the primary, this
  condition is always satisfied, because each backup triggers a new
  checkpoint.  On a standby, replay begins from the most recent restartpoint.
  Therefore, an incremental backup of a standby server can fail if there has
  been very little activity since the previous backup, since no new
  restartpoint might have been created.

Yours,
Laurenz Albe
From 407c6b4ab695956bb9f207efddf477d130c820ba Mon Sep 17 00:00:00 2001
From: Laurenz Albe 
Date: Wed, 24 Jul 2024 12:42:24 +0200
Subject: [PATCH v2] Add documentation and hint for incremental backup on
 standbys

Taking an incremental backup on a streaming replication standby immediately
after the previous backup can result in the error

  manifest requires WAL from final timeline n ending at XXX, but this backup starts at YYY

This message looks scary, even though the only problem is that the backup
would be empty and thus it makes no sense to take it anyway.

Add a clarifying errhint and some documentation to mitigate the problem.

Author: Laurenz Albe
Reviewed-by: Robert Haas, David Steele
Discussion: https://postgr.es/m/04f4277e5ed4046773e46837110bed1381a2583f.ca...@cybertec.at

Backpatch to v17.
---
 doc/src/sgml/backup.sgml| 11 +++
 src/backend/backup/basebackup_incremental.c |  3 ++-
 2 files changed, 13 insertions(+), 1 deletion(-)

diff --git a/doc/src/sgml/backup.sgml b/doc/src/sgml/backup.sgml
index 91da3c26ba..13b60c39bb 100644
--- a/doc/src/sgml/backup.sgml
+++ b/doc/src/sgml/backup.sgml
@@ -925,6 +925,17 @@ test ! -f /mnt/server/archivedir/000100A90065 && cp pg_wal/0
 to manage. For a large database all of which is heavily modified,
 incremental backups won't be much smaller than full backups.

+
+   
+An incremental backup is only possible if replay would begin from a later
+checkpoint than the checkpoint that started the previous backup upon which
+it depends.  If you take the incremental backup on the primary, this
+condition is always satisfied, because each backup triggers a new
+checkpoint.  On a standby, replay begins from the most recent restartpoint.
+Therefore, an incremental backup of a standby server can fail if there has
+been very little activity since the previous backup, since no new
+restartpoint might have been created.
+   
   
 
   
diff --git a/src/backend/backup/basebackup_incremental.c b/src/backend/backup/basebackup_incremental.c
index 2108702397..a023e62440 100644
--- a/src/backend/backup/basebackup_incremental.c
+++ b/src/backend/backup/basebackup_incremental.c
@@ -441,7 +441,8 @@ PrepareForIncrementalBackup(IncrementalBackupInfo *ib,
 		 errmsg("manifest requires WAL from final timeline %u ending at %X/%X, but this backup starts at %X/%X",
 range->tli,
 LSN_FORMAT_ARGS(range->end_lsn),
-LSN_FORMAT_ARGS(backup_state->startpoint))));
+LSN_FORMAT_ARGS(backup_state->startpoint)),
+		 errhint("This can happen for incremental backups on a standby if there was little activity since the previous backup.")));
 		}
 		else
 		{
-- 
2.45.2



Re: Speed up JSON escape processing with SIMD plus other optimisations

2024-07-24 Thread Heikki Linnakangas

On 02/07/2024 07:49, David Rowley wrote:

I've attached a rebased set of patches.  The previous set no longer applied.


I looked briefly at the first patch. Seems reasonable.

One little thing that caught my eye is that in populate_scalar(), you 
sometimes make a temporary copy of the string to add the 
null-terminator, but then call escape_json() which doesn't need the 
null-terminator anymore. See attached patch to avoid that. However, it's 
not clear to me how to reach that codepath, or if it is reachable at all. I 
tried to add a NOTICE there and ran the regression tests, but got no 
failures.


--
Heikki Linnakangas
Neon (https://neon.tech)
diff --git a/src/backend/utils/adt/jsonfuncs.c b/src/backend/utils/adt/jsonfuncs.c
index fd3c6b8f432..55c0fc4d73d 100644
--- a/src/backend/utils/adt/jsonfuncs.c
+++ b/src/backend/utils/adt/jsonfuncs.c
@@ -3133,18 +3133,6 @@ populate_scalar(ScalarIOData *io, Oid typid, int32 typmod, JsValue *jsv,
 
json = jsv->val.json.str;
Assert(json);
-   if (len >= 0)
-   {
-   /* Need to copy non-null-terminated string */
-   str = palloc(len + 1 * sizeof(char));
-   memcpy(str, json, len);
-   str[len] = '\0';
-   }
-   else
-   {
-   str = unconstify(char *, json);
-   len = strlen(str);
-   }
 
/* If converting to json/jsonb, make string into valid JSON 
literal */
if ((typid == JSONOID || typid == JSONBOID) &&
@@ -3153,12 +3141,24 @@ populate_scalar(ScalarIOData *io, Oid typid, int32 typmod, JsValue *jsv,
StringInfoData buf;
 
initStringInfo(&buf);
-   escape_json(&buf, str, len);
-   /* free temporary buffer */
-   if (str != json)
-   pfree(str);
+   if (len >= 0)
+   escape_json(&buf, json, len);
+   else
+   escape_json_cstring(&buf, json);
str = buf.data;
}
+   else if (len >= 0)
+   {
+   /* Need to copy non-null-terminated string */
+   str = palloc(len + 1 * sizeof(char));
+   memcpy(str, json, len);
+   str[len] = '\0';
+   }
+   else
+   {
+   /* string is already null-terminated */
+   str = unconstify(char *, json);
+   }
}
else
{


PG buildfarm member cisticola

2024-07-24 Thread Alvaro Herrera
Hello

A couple of days ago, PG buildfarm member cisticola started failing:
https://buildfarm.postgresql.org/cgi-bin/show_history.pl?nm=cisticola&br=HEAD

The failures[1] are pretty mysterious:

make[3]: 
\347\246\273\345\274\200\347\233\256\345\275\225\342\200\234/home/postgres/buildfarm/HEAD/pgsql.build/src/backend/utils\342\200\235
ccache gcc -Wall -Wmissing-prototypes -Wpointer-arith 
-Wdeclaration-after-statement -Werror=vla -Wendif-labels 
-Wmissing-format-attribute -Wimplicit-fallthrough=3 -Wcast-function-type 
-Wshadow=compatible-local -Wformat-security -fno-strict-aliasing -fwrapv 
-fexcess-precision=standard -Wno-format-truncation -Wno-stringop-truncation -g 
-O2 access/brin/brin.o [...] libpq/be-secure-openssl.o [...] 
../../src/timezone/localtime.o ../../src/timezone/pgtz.o 
../../src/timezone/strftime.o jit/jit.o ../../src/common/libpgcommon_srv.a 
../../src/port/libpgport_srv.a -L../../src/port -L../../src/common   
-Wl,--as-needed 
-Wl,-rpath,'/home/postgres/buildfarm/HEAD/inst/lib',--enable-new-dtags  
-Wl,--export-dynamic -lxslt -lxml2 -lssl -lcrypto -lgssapi_krb5 -lz -lpthread 
-lrt -ldl -lm -lldap -licui18n -licuuc -licudata -o postgres
/usr/bin/ld: libpq/be-secure-openssl.o: in function `.L388':
be-secure-openssl.c:(.text+0x1f8c): undefined reference to `ERR_new'
/usr/bin/ld: libpq/be-secure-openssl.o: in function `.L530':
be-secure-openssl.c:(.text+0x1fa4): undefined reference to `ERR_set_debug'
/usr/bin/ld: libpq/be-secure-openssl.o: in function `.LVL614':
be-secure-openssl.c:(.text+0x1fb8): undefined reference to `ERR_set_error'
/usr/bin/ld: libpq/be-secure-openssl.o: in function `.L539':
be-secure-openssl.c:(.text+0x2010): undefined reference to `ERR_new'
/usr/bin/ld: libpq/be-secure-openssl.o: in function `.L417':
be-secure-openssl.c:(.text+0x20b0): undefined reference to 
`SSL_get1_peer_certificate'
collect2: error: ld returned 1 exit status

Previously, this was working fine.

In this run, openssl is 
  checking for openssl... /usr/bin/openssl
  configure: using openssl: OpenSSL 1.1.1g FIPS  21 Apr 2020

but that's the same that was used in the last successful run[2], too.

Not sure what else could be relevant.


[1] 
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=cisticola&dt=2024-07-24%2010%3A20%3A37
[2] 
https://buildfarm.postgresql.org/cgi-bin/show_stage_log.pl?nm=cisticola&dt=2024-07-20%2022%3A20%3A38&stg=configure

-- 
Álvaro HerreraBreisgau, Deutschland  —  https://www.EnterpriseDB.com/
"No renuncies a nada. No te aferres a nada."




Re: Removing unneeded self joins

2024-07-24 Thread Dean Rasheed
On Sat, 20 Jul 2024 at 12:39, Alexander Korotkov  wrote:
>
> > The new function replace_relid() looks to be the same as adjust_relid_set().
>
> They are similar, not the same.  replace_relid() has handling for
> negative newId, while adjust_relid_set() hasn't.  One thing I'd like
> to borrow from adjust_relid_set() to replace_relid() is the usage of
> IS_SPECIAL_VARNO() macro.

Ah, that makes sense. In that case, I'd say that replace_relid()
should go in analyzejoins.c (and be a local function there), since
that's the only place that requires this special negative newId
handling.

> It would be probably nice to move this logic into bms_replace_member()
> residing at bitmapset.c.  What do you think?

Maybe. It feels a little specialised though, so maybe it's not worth the effort.

I have been reviewing more of the patch, mainly focusing on the logic
in analyzejoins.c that decides when to apply SJE.

I understand broadly what the code is doing, but I still find it
somewhat hard to follow. One thing that makes it hard is that in
analyzejoins.c, "inner" and "outer" get swapped round at various
points. For example generate_join_implied_equalities() is defined like
this:

List *
generate_join_implied_equalities(PlannerInfo *root,
 Relids join_relids,
 Relids outer_relids,
 RelOptInfo *inner_rel,
 SpecialJoinInfo *sjinfo);

but remove_self_joins_one_group() calls it like this:

restrictlist = generate_join_implied_equalities(root, joinrelids,
inner->relids,
outer, NULL);

So you have to remember that "inner" is "outer" and "outer" is "inner"
when going into generate_join_implied_equalities() from
remove_self_joins_one_group(). And the same thing happens when calling
innerrel_is_unique_ext() and match_unique_clauses(). I think all that
could be resolved by swapping "inner" and "outer" in the variable
names and comments in remove_self_joins_one_group().

Another thing I noticed in remove_self_joins_one_group() was this:

/*
 * To enable SJE for the only degenerate case without any self
 * join clauses at all, add baserestrictinfo to this list. The
 * degenerate case works only if both sides have the same clause.
 * So doesn't matter which side to add.
 */
selfjoinquals = list_concat(selfjoinquals, outer->baserestrictinfo);

That appears to be pointless, because is_innerrel_unique_for() will
filter the restrictlist it is given, removing those baserestrictinfo
clauses (because I think they'll always have can_join = false). And
then relation_has_unique_index_ext() will re-add them:

/*
 * Examine the rel's restriction clauses for usable var = const clauses
 * that we can add to the restrictlist.
 */
foreach(ic, rel->baserestrictinfo)
{
... add suitable clauses
}

where "rel" is "innerrel" from is_innerrel_unique_for(), which is
"outer" from remove_self_joins_one_group(), so it's the same set of
baserestrictinfo clauses.

Something else that looks a little messy is this in innerrel_is_unique_ext():

/*
 * innerrel_is_unique_ext
 *Do the same as innerrel_is_unique(), but also set to '*extra_clauses'
 *additional clauses from a baserestrictinfo list that were used to prove
 *uniqueness.  A non NULL 'extra_clauses' indicates that we're checking
 *for self-join and correspondingly dealing with filtered clauses.
 */
bool
innerrel_is_unique_ext(PlannerInfo *root,
   ...
   List **extra_clauses)
{
boolself_join = (extra_clauses != NULL);

[logic depending on self_join]
}

This presumes that any caller interested in knowing the extra
baserestrictinfo clauses used to prove uniqueness must be looking at a
self join. That may be true today, but it doesn't seem like a good API
design choice. I think it would be better to just add "self_join" as
an extra parameter, and also maybe have the function return the
UniqueRelInfo containing the "extra_clauses", or NULL if it's not
unique. That way, it would be more extensible, if we wanted it to
return more information in the future.

Instead of adding relation_has_unique_index_ext(), maybe it would be
OK to just change the signature of relation_has_unique_index_for(). It
looks like it's only called from one other place outside
analyzejoins.c. Perhaps the same is true for innerrel_is_unique_ext().

Should match_unique_clauses() be comparing mergeopfamilies or opnos to
ensure that the clauses are using the same equality operator?

Regards,
Dean




Re: [PATCH] GROUP BY ALL

2024-07-24 Thread Andrey M. Borodin



> On 24 Jul 2024, at 13:58, Jelte Fennema-Nio  wrote:
> 
> On Tue, 23 Jul 2024 at 15:22, Andrei Borodin  wrote:
>> I'd like to have GROUP BY AUTO (I also proposed the version GROUP BY SURPRISE 
>> ME). But I wouldn't like to open the Pandora's box of syntax-sugar extensions 
>> which may be incompatible with future standards.
>> If we could have extensible grammar - I'd be happy to have a lot of such 
>> enhancements. My top 2 are FROM table SELECT column and better GROUP BY.
> 
> Personally my number one enhancement would be allowing a trailing
> comma after the last column in the SELECT clause.

Yes, trailing comma sounds great too.

One more similar piece of syntax sugar I can think of. I see lots of queries like
SELECT something
FROM table1
WHERE 1=1
and id = x
--and col1 = val1
and col2 = val2

I was wondering where that "1=1" comes from. It's because developers comment out 
conditions one by one, like "--and col1 = val1", and they do not want to cope 
with the and/or continuation.


Best regards, Andrey Borodin.

PS. Seems like Mail.App mangled my previous message despite using plain text. 
It's totally broken in archives...



Re: PG buildfarm member cisticola

2024-07-24 Thread Alvaro Herrera
On 2024-Jul-24, Alvaro Herrera wrote:

> be-secure-openssl.c:(.text+0x1f8c): undefined reference to `ERR_new'
> be-secure-openssl.c:(.text+0x1fa4): undefined reference to `ERR_set_debug'
> be-secure-openssl.c:(.text+0x1fb8): undefined reference to `ERR_set_error'
> be-secure-openssl.c:(.text+0x2010): undefined reference to `ERR_new'
> be-secure-openssl.c:(.text+0x20b0): undefined reference to 
> `SSL_get1_peer_certificate'

> In this run, openssl is 
>   checking for openssl... /usr/bin/openssl
>   configure: using openssl: OpenSSL 1.1.1g FIPS  21 Apr 2020

Hmm, it appears that these symbols did not exist in 1.1.1g, and since
none of them is invoked directly by the Postgres code, maybe the
reason for this is that the OpenSSL headers being used do not correspond
to the library being linked.

-- 
Álvaro Herrera   48°01'N 7°57'E  —  https://www.EnterpriseDB.com/




RE: [Proposal] Add foreign-server health checks infrastructure

2024-07-24 Thread Hayato Kuroda (Fujitsu)
Dear Fujii-san,

> On 2024/07/18 19:49, Hayato Kuroda (Fujitsu) wrote:
> >> Shouldn't this test also check if the returned user_name is valid?
> >
> > You meant to say that we should print the user_name, right? Done.
> 
> Yes, I think it's better to test if the value in the user_name column is as 
> expected.

But this might cause a test failure in cfbot. See the comment below.

> > - I found an inconsistency of name between source and doc,
> >so I unified to postgres_fdw_can_verify_connection().
> 
> I'm unsure about the necessity of introducing a standalone function to check
> POLLRDHUP availability. Instead of providing
> postgres_fdw_can_verify_connection(),
> could we modify postgres_fdw_get_connections() to return NULL in the "closed"
> column on platforms where POLLRDHUP isn't supported?

Initially I felt that users might not be able to distinguish whether 1) the
connection has not been established yet or 2) the check cannot be done on
this platform. But after considering it more, such servers are not listed by
the function anyway. So I modified it like that.

> > - Also, test patch (0002) was combined into this.
> > 0002:
> > - user_name was added after the `valid` to preserve ordering of the old 
> > version.
> 
> Do we really need to keep this ordering? Wouldn't it be more intuitive to
> have the user_name column next to server_name? In pg_stat_statements,
> for example, the ordering isn't always preserved, as seen with
> WAL-related columns being added in the middle.

I also prefer the style, so changed accordingly.

> > Thanks for reviewing the patch! PSA new version.
> 
> Thanks for updating the patches!
> 
> 
> The regression test for postgres_fdw failed with the following diff.
> 
> --
>   SELECT * FROM postgres_fdw_get_connections() ORDER BY 1;
>server_name | valid | user_name | used_in_xact | closed
>   -+---+---+--+
> - loopback| f | hayato| t|
> + loopback| f | runner| t|
>| f |   | t|
>   (2 rows)
> --

This was because the user_name depends on the environment.
It doesn't look great, but one approach is to print `user_name = CURRENT_USER`
to test the feature. For loopback3, the user_name is set to NULL so that the 
column will be NULL as well. What do you think? Do you have a better idea?
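
Something like this in the regression test, for example (a sketch based on the
patched function's output columns shown above):

SELECT server_name, valid, user_name = CURRENT_USER AS user_name_ok,
       used_in_xact, closed
FROM postgres_fdw_get_connections() ORDER BY 1;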

> +  * If requested and the connection is not invalidated,
> check the
> +  * status of the remote connection from the backend
> process and
> +  * return the result. Otherwise returns NULL.
> +  */
> + if (require_verify && !entry->invalidated &&
> entry->conn)
> 
> Should we also consider checking invalidated connections? Even though
> a connection is marked as invalidated, it can still be used until
> the current transaction ends. Therefore, if an invalidated connection
> is used in this transaction (i.e., used_in_xact = true) and has
> already been closed (closed = true), users might want to consider
> rolling back the transaction promptly. Thought?

I confirmed the meaning of the `invalidated` attribute. IIUC:

- It is set to true when the server or user mapping is altered, but
- the connection has already been opened within the transaction.

If the entry is invalidated, the server_name of postgres_fdw_get_connections is
set to NULL (because the entry may have been modified). Also, the connection is
discarded when the transaction ends.
Based on that understanding, yes, the connection should also be checked. One
concern is that users may not recognize which connection is lost (because the
column may be blank).

> + -- Set client_min_messages to ERROR temporary because the
> following
> + -- function only throws a WARNING on the supported platform.
> 
> Is this still true? From my reading of the code, it doesn't appear
> that the function throws a WARNING.

Good finding, removed.

Attached patches contain above fixes and comment improvements per request from 
GPT-4o.

Best regards,
Hayato Kuroda
FUJITSU LIMITED



v42-0001-postgres_fdw-Allow-postgres_fdw_get_connections-.patch
Description:  v42-0001-postgres_fdw-Allow-postgres_fdw_get_connections-.patch


v42-0002-Extend-postgres_fdw_get_connections-to-return-us.patch
Description:  v42-0002-Extend-postgres_fdw_get_connections-to-return-us.patch


Re: [18] Policy on IMMUTABLE functions and Unicode updates

2024-07-24 Thread Noah Misch
On Tue, Jul 23, 2024 at 01:07:49PM -0700, Jeff Davis wrote:
> On Tue, 2024-07-23 at 07:39 -0700, Noah Misch wrote:
> > Short-term, we should remedy the step backward that pg_c_utf8 has taken:
> > https://postgr.es/m/20240718233908.52.nmi...@google.com
> > https://postgr.es/m/486d71991a3f80ec1c47e1bd7931e2ef3627b6b3.ca...@cybertec.at
> 
> Obviously I disagree that we've taken a step backwards.

Yes.

> Can you articulate the principle by which all of the other problems
> with IMMUTABLE are just fine, but updates to Unicode are intolerable,
> and only for PG_C_UTF8?

No, because I don't think all the other problems with IMMUTABLE are just fine.
The two messages linked cover the comparisons I do consider important,
especially the comparison between pg_c_utf8 and packager-frozen ICU.




Re: Vacuum ERRORs out considering freezing dead tuples from before OldestXmin

2024-07-24 Thread John Naylor
On Wed, Jul 24, 2024 at 2:42 PM John Naylor  wrote:
> As for lowering the limit, we've experimented with 256kB here:
>
> https://www.postgresql.org/message-id/canwcazzutvz3lsypauyqvzcezxz7qe+9ntnhgyzdtwxpul+...@mail.gmail.com
>
> As I mention there, going lower than that would need a small amount of
> reorganization in the radix tree. Not difficult -- the thing I'm
> concerned about is that we'd likely need to document a separate
> minimum for DSA, since that behaves strangely with 256kB and might not
> work at all lower than that.

For experimentation, here's a rough patch (really two, squashed
together for now) that allows m_w_m to go down to 64kB.

drop table if exists test;
create table test (a int) with (autovacuum_enabled=false, fillfactor=10);
insert into test (a) select i from generate_series(1,2000) i;
create index on test (a);
update test set a = a + 1;

set maintenance_work_mem = '64kB';
vacuum (verbose) test;

INFO:  vacuuming "john.public.test"
INFO:  finished vacuuming "john.public.test": index scans: 3
pages: 0 removed, 91 remain, 91 scanned (100.00% of total)

The advantage with this is that we don't need to care about
MEMORY_CONTEXT_CHECKING or 32/64 bit-ness, since allocating a single
large node will immediately blow the limit, and that will happen
fairly quickly regardless. I suspect going this low will not work with
dynamic shared memory and if so would need a warning comment.
From 25ccfb9ed812c72194bd585906f9f11e18400422 Mon Sep 17 00:00:00 2001
From: John Naylor 
Date: Wed, 24 Jul 2024 18:41:51 +0700
Subject: [PATCH v1] WIP: set minimum maintenance_work_mem to 64kB, matching
 work_mem

To allow this to work for vacuum, create radix tree node
contexts lazily.
---
 src/backend/utils/misc/guc_tables.c |  2 +-
 src/include/lib/radixtree.h | 44 +
 2 files changed, 34 insertions(+), 12 deletions(-)

diff --git a/src/backend/utils/misc/guc_tables.c b/src/backend/utils/misc/guc_tables.c
index f6fcdebb03..853ad8dba1 100644
--- a/src/backend/utils/misc/guc_tables.c
+++ b/src/backend/utils/misc/guc_tables.c
@@ -2445,7 +2445,7 @@ struct config_int ConfigureNamesInt[] =
 			GUC_UNIT_KB
 		},
 		&maintenance_work_mem,
-		65536, 1024, MAX_KILOBYTES,
+		65536, 64, MAX_KILOBYTES,
 		NULL, NULL, NULL
 	},
 
diff --git a/src/include/lib/radixtree.h b/src/include/lib/radixtree.h
index 88bf695e3f..bd83b670dd 100644
--- a/src/include/lib/radixtree.h
+++ b/src/include/lib/radixtree.h
@@ -834,12 +834,30 @@ RT_ALLOC_NODE(RT_RADIX_TREE * tree, const uint8 kind, const RT_SIZE_CLASS size_c
 	RT_CHILD_PTR allocnode;
 	RT_NODE*node;
 	size_t		allocsize;
+	RT_SIZE_CLASS_ELEM class_info;
 
-	allocsize = RT_SIZE_CLASS_INFO[size_class].allocsize;
+	class_info = RT_SIZE_CLASS_INFO[size_class];
+	allocsize = class_info.allocsize;
 
 #ifdef RT_SHMEM
 	allocnode.alloc = dsa_allocate(tree->dsa, allocsize);
 #else
+
+	/*
+	 * The context for RT_CLASS_4 should have been created when the tree was
+	 * initialized.
+	 */
+	Assert(tree->node_slabs[RT_CLASS_4] != NULL);
+		if (size_class > RT_CLASS_4 && tree->node_slabs[size_class] == NULL)
+	{
+		size_t		blocksize = RT_SLAB_BLOCK_SIZE(allocsize);
+
+		tree->node_slabs[size_class] = SlabContextCreate(tree->context,
+		 class_info.name,
+		 blocksize,
+		 allocsize);
+	}
+
 	allocnode.alloc = (RT_PTR_ALLOC) MemoryContextAlloc(tree->node_slabs[size_class],
 		allocsize);
 #endif
@@ -1827,6 +1845,9 @@ RT_CREATE(MemoryContext ctx)
 	RT_CHILD_PTR rootnode;
 #ifdef RT_SHMEM
 	dsa_pointer dp;
+#else
+	RT_SIZE_CLASS_ELEM class_info;
+	size_t		node_blocksize;
 #endif
 
 	old_ctx = MemoryContextSwitchTo(ctx);
@@ -1852,17 +1873,18 @@ RT_CREATE(MemoryContext ctx)
 #else
 	tree->ctl = (RT_RADIX_TREE_CONTROL *) palloc0(sizeof(RT_RADIX_TREE_CONTROL));
 
-	/* Create a slab context for each size class */
-	for (int i = 0; i < RT_NUM_SIZE_CLASSES; i++)
-	{
-		RT_SIZE_CLASS_ELEM size_class = RT_SIZE_CLASS_INFO[i];
-		size_t		inner_blocksize = RT_SLAB_BLOCK_SIZE(size_class.allocsize);
+	/*
+	 * For now only create a slab context for the smallest size class. The
+	 * rest will be created on demand. This allows us to stay within memory
+	 * settings that are smaller than the size of the larger slab blocks.
+	 */
+	class_info = RT_SIZE_CLASS_INFO[RT_CLASS_4];
+	node_blocksize = RT_SLAB_BLOCK_SIZE(class_info.allocsize);
 
-		tree->node_slabs[i] = SlabContextCreate(ctx,
-size_class.name,
-inner_blocksize,
-size_class.allocsize);
-	}
+	tree->node_slabs[RT_CLASS_4] = SlabContextCreate(ctx,
+	 class_info.name,
+	 node_blocksize,
+	 class_info.allocsize);
 
 	/* By default we use the passed context for leaves. */
 	tree->leaf_context = tree->context;
-- 
2.45.2



Re: [PATCH] Add additional extended protocol commands to psql: \parse and \bindx

2024-07-24 Thread Aleksander Alekseev
Hi,

> It took me a couple of days to get back to it, but attached is what I
> have finished with.  This was mostly OK, except for a few things:
> - \close was inconsistent with the other two commands, where no
> argument was treated as the unnamed prepared statement.  I think that
> this should be made consistent with \parse and \bindx, requiring an
> argument, where '' is the unnamed statement.
> - The docs did not mention the case of the unnamed statement, so added
> some notes about that.
> - Some free() calls were not needed in the command executions, where
> psql_scan_slash_option() returns NULL.
> - Tests missing when no argument is provided for the new commands.
>
> One last thing I have found really confusing is that this leads to the
> addition of two more status flags in pset for the close and parse
> parts, with \bind and \bindx sharing the third one while deciding
> which path to use depending on if the statement name is provided.
> That's fragile.  I think that it would be much cleaner to put all that
> behind an enum, falling back to PQsendQuery() by default.  I am
> attaching that as 0002, for clarity, but my plan is to merge both 0001
> and 0002 together.

I reviewed and tested v6. I believe it's ready to be merged.

-- 
Best regards,
Aleksander Alekseev




Re: [18] Policy on IMMUTABLE functions and Unicode updates

2024-07-24 Thread Robert Haas
On Wed, Jul 24, 2024 at 12:42 AM Peter Eisentraut  wrote:
> Fair enough.  My argument was, that topic is distinct from the topic of
> this thread.

OK, that's fair. But I think the solutions are the same: we complain
all the time about glibc and ICU shipping collations and not
versioning them. We shouldn't make the same kinds of mistakes. Even if
ctype is less likely to break things than collations, it still can,
and we should move in the direction of letting people keep the v17
behavior for the foreseeable future while at the same time having a
way that they can also get the new behavior if they want it (and the
new behavior should be the default).

I note in passing that the last time I saw a customer query with
UPPER() in the join clause was... yesterday. The problems there had
nothing to do with CTYPE, but there's no reason to suppose that it
couldn't have had such a problem. I suspect the reason we don't hear
about ctype problems now is that the collation problems are worse and
happen in similar situations. But if all the collation problems went
away, a subset of the same users would then be unhappy about ctype.
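
For context, the sort of pattern at stake looks like this (schema hypothetical):
an expression index whose stored values depend on ctype, so a change in case
mapping between Unicode versions can leave the index inconsistent with freshly
computed upper() results.

CREATE INDEX ON customers (upper(name));

SELECT c.id, o.id
FROM customers c
JOIN orders o ON upper(o.customer_name) = upper(c.name);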

So I don't want to see us sit on our hands and assert that we don't
need to worry about ctype because it's minor in comparison with
collation. It *is* minor in comparison with collation. But one problem
can be small in comparison with another and still bad. If an aircraft
is on fire whilst experiencing a dual engine failure, it's still in a
lot of trouble even if the fire can be put out.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Addressing SECURITY DEFINER Function Vulnerabilities in PostgreSQL Extensions

2024-07-24 Thread Ashutosh Sharma
Hi All,

Here is the v4 patch with the following new changes:

1) Now allows users to explicitly set search_path to $extension_schema.

2) Includes updates to the documentation.

3) Adds comments where previously absent.

Note: The new control file parameter is currently named "protected",
which is not a precise name given that it only solves a small portion of
the security problems related to extensions. I intend to rename it to
something more appropriate later; any suggestions are welcome.

Besides, should we consider restricting the installation of extensions
in schemas where a UDF with the same name as one the extension intends
to create already exists? Similarly, such a restriction could also
be applied if a UDF being created shares its name with a function
already created by an extension in that schema. I haven't looked at
the feasibility, but thought of sharing it because something of this
sort would probably solve most of the issues related to extensions.

--
With Regards,
Ashutosh Sharma.


v4-0001-Introduce-new-control-file-parameter-protected-to-de.patch
Description: Binary data


Re: Add privileges test for pg_stat_statements to improve coverage

2024-07-24 Thread Fujii Masao




On 2024/07/24 10:23, kuroda.keis...@nttcom.co.jp wrote:

I've slightly modified the comments in the regression tests for clarity.
Attached is the v6 patch. If there are no objections,
I will push this version.


Thank you for updating the patch! I have no objection.


Pushed. Thanks!

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Re: Interrupts vs signals

2024-07-24 Thread Heikki Linnakangas

On 10/07/2024 09:48, Thomas Munro wrote:

On Mon, Jul 8, 2024 at 9:38 PM Heikki Linnakangas  wrote:

The patch actually does both: it still does kill(SIGUSR1) and also sets
the latch.


Yeah, I had some ideas about supporting old extension code that really
wanted a SIGUSR1, but on reflection, the only reason anyone ever wants
that is so that sigusr1_handler can SetLatch(), which pairs with
WaitLatch() in WaitForBackgroundWorker*().  Let's go all the way and
assume that.


+1


I think it would be nice if RegisterDynamicBackgroundWorker() had a
"bool notify_me" argument, instead of requiring the caller to set
"bgw_notify_pid = MyProcPid" before the call. That's a
backwards-compatibility break, but maybe we should bite the bullet and
do it. Or we could do this in RegisterDynamicBackgroundWorker():

if (worker->bgw_notify_pid == MyProcPid)
  worker->bgw_notify_pgprocno = MyProcNumber;

I think we can forbid setting pgw_notify_pid to anything else than 0 or
MyProcPid.


Another idea: we could keep the bgw_notify_pid field around for a
while, documented as unused and due to be removed in future.  We could
automatically capture the caller's proc number.  So you get latch
wakeups by default, which I expect many people want, and most people
could cope with even if they don't want them.  If you really really
don't want them, you could set a new flag BGW_NO_NOTIFY.


Ok. I was going to say that it feels excessive to change the default 
like that. However, searching for RegisterDynamicBackgroundWorker() on 
GitHub, I can't actually find any callers that don't set bgw_notify_pid. 
So yeah, that makes sense.



I have now done this part of the change in a new first patch.  This
particular use of kill(SIGUSR1) is separate from the ProcSignal
removal, it's just that it relies on ProcSignal's handler's default
action of calling SetLatch().  It's needed so the ProcSignal-ectomy
can fully delete sigusr1_handler(), but it's not directly the same
thing, so it seemed good to split the patch.


PostmasterMarkPIDForWorkerNotify() is now unused. Which means that 
bgworker_notify is never set, and BackgroundWorkerStopNotifications() is 
never called either.



A SetLatchRobust would be nice. Looking at SetLatch(), I don't think it
can do any damage if you called it on a pointer to garbage, except if
the pointer itself is bogus, then just dereferencing it can cause a
segfault. So it would be nice to have a version specifically designed
with that in mind. For example, it could assume that the Latch's pid is
never legally equal to MyProcPid, because postmaster cannot own any latches.


Yeah I'm starting to think that all we need to do here is range-check
the proc number.  Here's a version that adds:
ProcSetLatch(proc_number).  Another idea would be for SetLatch(latch)
to sanitise the address of a latch, ie its offset and range.


Hmm, I don't think postmaster should trust ProcGlobal->allProcCount either.
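
For concreteness, a sketch of the range-checked variant being discussed (an illustration, not the attached patch, and it deliberately ignores the point just raised about trusting allProcCount from the postmaster):

```c
/* Sketch only: set another process's latch, ignoring bogus proc numbers. */
void
ProcSetLatch(ProcNumber procno)
{
	if (procno < 0 || procno >= (ProcNumber) ProcGlobal->allProcCount)
		return;					/* out of range: do nothing */

	SetLatch(&GetPGProcByNumber(procno)->procLatch);
}
```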


The next problems to remove are, I think, the various SIGUSR2, SIGINT,
SIGTERM signals sent by the postmaster.  These should clearly become
SendInterrupt() or ProcSetLatch().


+1


The problem here is that the
postmaster doesn't have the proc numbers yet.  One idea is to teach
the postmaster to assign them!  Not explored yet.


I've been thinking that we should:

a) assign every child process a PGPROC entry, and make postmaster 
responsible for assigning them like you suggest. We'll need more PGPROC 
entries, because currently a process doesn't reserve one until 
authentication has happened. Or we change that behavior.


or

b) Use the pmsignal.c slot numbers for this, instead of ProcNumbers. 
Postmaster already assigns those.


I'm kind of leaning towards b) for now, because it feels like a much 
smaller patch. In the long run, it would be nice if every child process 
had a ProcNumber, though. It was a nice simplification in v17 that we 
don't have separate BackendId and ProcNumber anymore; similarly it would 
be nice to not have separate PMChildSlot and ProcNumber concepts.


--
Heikki Linnakangas
Neon (https://neon.tech)





Sporadic connection-setup-related test failures on Cygwin in v15-

2024-07-24 Thread Alexander Lakhin

Hello hackers,

A recent lorikeet (a Cygwin animal) failure [1] revealed one more
long-standing (see also [2], [3], [4]) issue related to Cygwin:
 SELECT dblink_connect('dtest1', connection_parameters());
- dblink_connect
-
- OK
-(1 row)
-
+ERROR:  could not establish connection
+DETAIL:  could not connect to server: Connection refused

where inst/logfile contains:
2024-07-16 05:38:21.492 EDT [66963f67.7823:4] LOG:  could not accept new 
connection: Software caused connection abort
2024-07-16 05:38:21.492 EDT [66963f8c.79e5:170] pg_regress/dblink ERROR:  could 
not establish connection
2024-07-16 05:38:21.492 EDT [66963f8c.79e5:171] pg_regress/dblink DETAIL:  
could not connect to server: Connection refused
    Is the server running locally and accepting
    connections on Unix domain socket 
"/home/andrew/bf/root/tmp/buildfarm-DK1yh4/.s.PGSQL.5838"?

I made a standalone reproducing script (assuming the dblink extension
installed):
numclients=50
for ((i=1;i<=1000;i++)); do
echo "iteration $i"

for ((c=1;c<=numclients;c++)); do
cat << 'EOF' | /usr/local/pgsql/bin/psql >/dev/null 2>&1 &

SELECT 'dbname='|| current_database()||' port='||current_setting('port')
  AS connstr
\gset

SELECT * FROM dblink('service=no_service', 'SELECT 1') AS t(i int);

SELECT * FROM
dblink(:'connstr', 'SELECT 1') AS t1(i int),
dblink(:'connstr', 'SELECT 2') AS t2(i int),
dblink(:'connstr', 'SELECT 3') AS t3(i int),
dblink(:'connstr', 'SELECT 4') AS t4(i int),
dblink(:'connstr', 'SELECT 5') AS t5(i int);
EOF
done
wait

grep -A1 "Software caused connection abort" server.log && break;
done

which fails for me as below:
iteration 318
2024-07-24 04:19:46.511 PDT [29062:6][postmaster][:0] LOG:  could not accept new connection: Software caused connection 
abort

2024-07-24 04:19:46.512 PDT [25312:8][client backend][36/1996:0] ERROR:  could 
not establish connection

The important fact here is that this failure is not reproduced after
7389aad63 (in v16), so it seems that it's somehow related to signal
processing. Given that, I'm inclined to stop here, without digging deeper,
at least until there are plans to backport that fix or something...

[1] 
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=lorikeet&dt=2024-07-16%2009%3A18%3A31
 (REL_13_STABLE)
[2] 
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=lorikeet&dt=2022-07-21%2000%3A36%3A44
 (REL_14_STABLE)
[3] 
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=lorikeet&dt=2023-07-06%2009%3A19%3A36
 (REL_12_STABLE)
[4] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=lorikeet&dt=2022-02-12%2001%3A40%3A56 (REL_13_STABLE, 
postgres_fdw)


Best regards,
Alexander




Re: XID formatting and SLRU refactorings (was: Add 64-bit XIDs into PostgreSQL 15)

2024-07-24 Thread Noah Misch
On Tue, Jul 23, 2024 at 06:01:44PM +0900, Michael Paquier wrote:
> Hearing nothing but cicadas as now is their season, I have taken the
> initiative here to address this open item.
> 
> 0001 felt a bit too complicated in slru.h, so I have simplified it and
> kept all the details in slru.c with SlruFileName().
> 
> I have reviewed all the code that uses SLRUs, and spotted three more
> problematic code paths in predicate.c that needed an update like the
> others for some pagenos.  I've added these, and applied 0002.  We
> should be good now.

I'm still seeing need for s/int/int64/ at:

- "pagesegno" variable
- return value of MultiXactIdToOffsetSegment()
- return value of MXOffsetToMemberSegment()
- callers of previous two

Only the first should be a live bug, since multixact isn't electing the higher
pageno ceiling.




Re: apply_scanjoin_target_to_paths and partitionwise join

2024-07-24 Thread Ashutosh Bapat
On Wed, Jul 24, 2024 at 9:42 AM Richard Guo  wrote:
>
> On Wed, May 22, 2024 at 3:57 PM Ashutosh Bapat
>  wrote:
> > I will create patches for the back-branches once the patch for master is in 
> > a committable state.
>
> AFAIU, this patch prevents apply_scanjoin_target_to_paths() from
> discarding old paths of partitioned joinrels.  Therefore, we can
> retain non-partitionwise join paths if the cheapest path happens to be
> among them.

Right. Thanks for the summary.

>
> One concern from me is that if the cheapest path of a joinrel is a
> partitionwise join path, following this approach could lead to
> undesirable cross-platform plan variations, as detailed in the
> original comment.

I read through the email thread [3] referenced in the commit
(1d338584062b3e53b738f987ecb0d2b67745232a) which added that comment.
The change is mentioned in [4] first. Please notice that this change
is unrelated to the bug that started the thread. [5], [6] talk about
the costs of projection path above Append vs project path below
Append. But I don't see any example of any cross-platform plan
variations. I also do not see an example in that thread where such a
plan variation results in bad performance. If the costs of
partitionwise and non-partitionwise join paths are so close to each
other that platform specific arithmetic can swing it one way or the
other, possibly their performance is going to be comparable. Without
an example query it's hard to assess this possibility or address the
concern, especially when we have examples of the behaviour otherwise.

>
> Is there a specific query that demonstrates benefits from this change?
> I'm curious about scenarios where a partitionwise join runs slower
> than a non-partitionwise join.

[1] provides a testcase where a nonpartitionwise join is better than
partitionwise join. This testcase is derived from a bug reported by an
EDB customer. [2] is another bug report on pgsql-bugs.


[1] 
https://www.postgresql.org/message-id/CAKZiRmyaFFvxyEYGG_hu0F-EVEcqcnveH23MULhW6UY_jwykGw%40mail.gmail.com
[2] 
https://www.postgresql.org/message-id/flat/786.1565541557%40sss.pgh.pa.us#9d50e1b375201f29bbf17072d75569e3
[3] 
https://www.postgresql.org/message-id/flat/15669-02fb3296cca26203%40postgresql.org
[4] https://www.postgresql.org/message-id/20477.1551819776%40sss.pgh.pa.us
[5]  https://www.postgresql.org/message-id/15350.1551973953%40sss.pgh.pa.us
[6] https://www.postgresql.org/message-id/24357.1551984010%40sss.pgh.pa.us

--
Best Wishes,
Ashutosh Bapat




Re: Reducing the log spam

2024-07-24 Thread Rafia Sabih
Hello Laurenz,

I liked the idea for this patch. I will also go for the default being
an empty string.
I went through this patch and have some comments on the code,

1. In general, I don't like the idea of using goto; maybe we can have a
free_something function to call here instead.

2.
if (!SplitIdentifierString(new_copy, ',', &states))
{
    GUC_check_errdetail("List syntax is invalid.");
    goto failed;
}
Here, we don't need all that free-ing, we can just return false here.

3.
/*
 * Check that the values are alphanumeric and convert them to upper case
 * (SplitIdentifierString converted them to lower case).
 */
for (p = state; *p != '\0'; p++)
    if (*p >= 'a' && *p <= 'z')
        *p += 'A' - 'a';
    else if (*p < '0' || *p > '9')
    {
        GUC_check_errdetail("error codes can only contain digits and ASCII letters.");
        goto failed;
    }
I was thinking, maybe we can use the toupper() function here.

4.
    list_free(states);
    pfree(new_copy);

    *extra = statelist;
    return true;

failed:
    list_free(states);
    pfree(new_copy);
    guc_free(statelist);
    return false;

This looks like duplication that can be easily avoided.
You could have a free_something function that does all the free-ing,
and its caller can then have the return statement.
E.g. for here:
    free_states(states, new_copy, statelist);
    return true;

5. Also, for the alphanumeric check, maybe we can have something like:
if (isalnum(*state) == 0)
{
    GUC_check_errdetail("error codes can only contain digits and ASCII letters.");
    goto failed;
}
and we can do this at the beginning, after the length check.
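
A minimal sketch of the cleanup-helper idea from point 4 (the function name and its exact signature are invented here, not taken from the patch):

```c
/* Sketch: free everything the check hook allocated along the way. */
static void
free_check_hook_state(List *states, char *new_copy, char *statelist)
{
	list_free(states);
	pfree(new_copy);
	if (statelist != NULL)
		guc_free(statelist);
}
```

Each early exit then becomes a call to this helper followed by "return false;", and the success path frees everything except statelist before assigning it to *extra.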

On Tue, 18 Jun 2024 at 18:49, Laurenz Albe  wrote:
>
> On Mon, 2024-06-17 at 16:40 -0500, Justin Pryzby wrote:
> > > The feature will become much less useful if unique voilations keep 
> > > getting logged.
> >
> > Uh, to be clear, your patch is changing the *defaults*, which I found
> > surprising, even after reaading the thread.  Evidently, the current
> > behavior is not what you want, and you want to change it, but I'm *sure*
> > that whatever default you want to use at your site/with your application
> > is going to make someone else unhappy.  I surely want unique violations
> > to be logged, for example.
>
> I was afraid that setting the default non-empty would cause objections.
>
> > > +   
> > > +This setting is useful to exclude error messages from the log 
> > > that are
> > > +frequent but irrelevant.
> >
> > I think you should phrase the feature as ".. *allow* skipping error
> > logging for messages ... that are frequent but irrelevant for a given
> > site/role/DB/etc."
>
> I have reworded that part.
>
> > I suggest that this patch should not change the default behavior at all:
> > its default should be empty.  That you, personally, would plan to
> > exclude this or that error code is pretty uninteresting.  I think the
> > idea of changing the default behavior will kill the patch, and even if
> > you want to propose to do that, it should be a separate discussion.
> > Maybe you should make it an 002 patch.
>
> I have attached a new version that leaves the parameter empty by default.
>
> The patch is not motivated by my personal dislike of certain error messages.
> A well-written application would not need that parameter at all.
> The motivation for me is based on my dealing with customers' log files,
> which are often full of messages that are only distracting from serious
> problems and fill up the disk.
>
> But you are probably right that it would be hard to find a default setting
> that nobody has quibbles with, and the default can always be changed with
> a future patch.
>
> Yours,
> Laurenz Albe



-- 
Regards,
Rafia Sabih




Re: [18] Policy on IMMUTABLE functions and Unicode updates

2024-07-24 Thread Jeremy Schneider
On Wed, Jul 24, 2024 at 6:20 AM Robert Haas  wrote:

>
> I note in passing that the last time I saw a customer query with
> UPPER() in the join clause was... yesterday. The problems there had
> nothing to do with CTYPE, but there's no reason to suppose that it
> couldn't have had such a problem. I suspect the reason we don't hear
> about ctype problems now is that the collation problems are worse and
> happen in similar situations. But if all the collation problems went
> away, a subset of the same users would then be unhappy about ctype.


I have seen and created indexes on upper() functions a number of times too,
and I think this is not an uncommon pattern for case-insensitive searching.
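
For example, something along these lines (a hypothetical sketch of the pattern, not a real schema):

```sql
-- Case-insensitive lookups backed by an expression index; a ctype change
-- that alters upper() for any indexed value leaves the index inconsistent
-- with what upper() now computes, so lookups can silently miss rows.
CREATE INDEX users_email_upper_idx ON users (upper(email));
SELECT * FROM users WHERE upper(email) = upper('Alice@Example.com');
```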

Before glibc 2.28, there was at least one mailing list thread where an
unhappy person complained about collation problems; but for a number of
years before 2.28 I guess the collation changes were uncommon so it didn’t
get enough momentum to be considered a real problem until the problem
became widespread a few years ago?

https://www.postgresql.org/message-id/flat/BA6132ED-1F6B-4A0B-AC22-81278F5AB81E%40tripadvisor.com

I myself would prefer an approach here that sets a higher bar for
pg_upgrade not corrupting indexes, rather than saying it’s ok as long as
it’s rare

-Jeremy


Re: improve ssl error code, 2147483650

2024-07-24 Thread Peter Eisentraut

On 25.06.24 16:21, Tom Lane wrote:

Peter Eisentraut  writes:

On 21.06.24 16:53, Tom Lane wrote:

Most of libpq gets at strerror_r via SOCK_STRERROR for Windows
portability.  Is that relevant here?



Looking inside the OpenSSL code, it makes no efforts to translate
between winsock error codes and standard error codes, so I don't think
our workaround/replacement code needs to do that either.


Fair enough.


Btw., our source code comments say something like
"ERR_reason_error_string randomly refuses to map system errno values."
The reason it doesn't is exactly that it can't do it while maintaining
thread-safety.


Ah.  Do you want to improve that comment while you're at it?


Here is a patch that fixes the strerror() call and updates the comments 
a bit.


This ought to be backpatched like the original fix; ideally for the next 
minor releases in about two weeks.
From fb61f5f897e6151957b6ea6e2dfd7905b48b7ca2 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Wed, 24 Jul 2024 14:58:12 +0200
Subject: [PATCH] libpq: Use strerror_r instead of strerror

Commit 453c4687377 introduced a use of strerror() into libpq, but that
is not thread-safe.  Fix by using strerror_r() instead.

In passing, update some of the code comments added by 453c4687377, as
we have learned more about the reason for the change in OpenSSL that
started this.

Discussion: https://postgr.es/m/b6fb018b-f05c-4afd-abd3-318c649fa...@highgo.ca
---
 src/backend/libpq/be-secure-openssl.c    |  9 +++++----
 src/interfaces/libpq/fe-secure-openssl.c | 11 ++++++-----
 2 files changed, 11 insertions(+), 9 deletions(-)

diff --git a/src/backend/libpq/be-secure-openssl.c b/src/backend/libpq/be-secure-openssl.c
index 39b1a66236b..95b4d94a2e6 100644
--- a/src/backend/libpq/be-secure-openssl.c
+++ b/src/backend/libpq/be-secure-openssl.c
@@ -1453,10 +1453,11 @@ SSLerrmessage(unsigned long ecode)
 		return errreason;
 
 	/*
-	 * In OpenSSL 3.0.0 and later, ERR_reason_error_string randomly refuses to
-	 * map system errno values.  We can cover that shortcoming with this bit
-	 * of code.  Older OpenSSL versions don't have the ERR_SYSTEM_ERROR macro,
-	 * but that's okay because they don't have the shortcoming either.
+	 * In OpenSSL 3.0.0 and later, ERR_reason_error_string does not map system
+	 * errno values anymore.  (See OpenSSL source code for the explanation.)
+	 * We can cover that shortcoming with this bit of code.  Older OpenSSL
+	 * versions don't have the ERR_SYSTEM_ERROR macro, but that's okay because
+	 * they don't have the shortcoming either.
 	 */
 #ifdef ERR_SYSTEM_ERROR
 	if (ERR_SYSTEM_ERROR(ecode))
diff --git a/src/interfaces/libpq/fe-secure-openssl.c b/src/interfaces/libpq/fe-secure-openssl.c
index 5c867106fb0..b6fffd7b9b0 100644
--- a/src/interfaces/libpq/fe-secure-openssl.c
+++ b/src/interfaces/libpq/fe-secure-openssl.c
@@ -1759,15 +1759,16 @@ SSLerrmessage(unsigned long ecode)
 #endif
 
 	/*
-	 * In OpenSSL 3.0.0 and later, ERR_reason_error_string randomly refuses to
-	 * map system errno values.  We can cover that shortcoming with this bit
-	 * of code.  Older OpenSSL versions don't have the ERR_SYSTEM_ERROR macro,
-	 * but that's okay because they don't have the shortcoming either.
+	 * In OpenSSL 3.0.0 and later, ERR_reason_error_string does not map system
+	 * errno values anymore.  (See OpenSSL source code for the explanation.)
+	 * We can cover that shortcoming with this bit of code.  Older OpenSSL
+	 * versions don't have the ERR_SYSTEM_ERROR macro, but that's okay because
+	 * they don't have the shortcoming either.
 	 */
 #ifdef ERR_SYSTEM_ERROR
 	if (ERR_SYSTEM_ERROR(ecode))
 	{
-		strlcpy(errbuf, strerror(ERR_GET_REASON(ecode)), SSL_ERR_LEN);
+		strerror_r(ERR_GET_REASON(ecode), errbuf, SSL_ERR_LEN);
 		return errbuf;
 	}
 #endif
-- 
2.45.2



Re: Send duration output to separate log files

2024-07-24 Thread Pavel Stehule
Hi

st 10. 7. 2024 v 17:58 odesílatel Greg Sabino Mullane 
napsal:

> Please find attached a patch to allow for durations to optionally be sent
> to separate log files. In other words, rather than cluttering up our
> postgres202007.log file with tons of output from
> log_min_duration_statement, duration lines are sent instead to the file
> postgres202007.duration.
>
> Over the years, durations have been the number one cause of very large log
> files, in which the more "important" items get buried in the noise. Also,
> programs that are scanning for durations typically do not care about the
> normal, non-duration output. Some people have a policy of logging
> everything, which in effect means setting log_min_duration_statement to 0,
> which in turn makes your log files nearly worthless for spotting
> non-duration items. This feature will also be very useful for those who
> need to temporarily turn on log_min_duration_statement, for some quick
> auditing of exactly what is being run on their database. When done, you can
> move or remove the duration file without messing up your existing log
> stream.
>
> This only covers the case when both the duration and statement are set on
> the same line. In other words, log_min_duration_statement output, but not
> log_duration (which is best avoided anyway). It also requires
> logging_collector to be on, obviously.
>
> Details:
>
> The edata structure is expanded to have a new message_type, with a
> matching function errmessagetype() created.
> [include/utils/elog.h]
> [backend/utils/elog.c]
>
> Any errors that have both a duration and a statement are marked via
> errmessagetype()
> [backend/tcop/postgres.c]
>
> A new GUC named "log_duration_destination" is created, which supports any
> combination of stderr, csvlog, and jsonlog. It does not need to match
> log_destination, in order to support different use cases. For example, the
> user wants durations sent to a CSV file for processing by some other tool,
> but still wants everything else going to a normal text log file.
>
> Code: [include/utils/guc_hooks.h] [backend/utils/misc/guc_tables.c]
> Docs: [sgml/config.sgml]  [backend/utils/misc/postgresql.conf.sample]
>
> Create a new flag called PIPE_PROTO_DEST_DURATION
> [include/postmaster/syslogger.h]
>
> Create new flags:
>   LOG_DESTINATION_DURATION,
>   LOG_DESTINATION_DURATION_CSV,
>   LOG_DESTINATION_DURATION_JSON
> [include/utils/elog.h]
>
> Routing and mapping LOG_DESTINATION to PIPE_PROTO
> [backend/utils/error/elog.c]
>
> Minor rerouting when using alternate forms
> [backend/utils/error/csvlog.c]
> [backend/utils/error/jsonlog.c]
>
> Create new filehandles, do log rotation, map PIPE_PROTO to
> LOG_DESTINATION. Rotation and entry into the "current_logfiles" file are
> the same as existing log files. The new names/suffixes are duration,
> duration.csv, and duration.json.
> [backend/postmaster/syslogger.c]
>
> Testing to ensure combinations of log_destination and
> log_duration_destination work as intended
> [bin/pg_ctl/meson.build]
> [bin/pg_ctl/t/005_log_duration_destination.pl]
>
> Questions I've asked along the way, and perhaps other might as well:
>
> What about logging other things? Why just duration?
>
> Duration logging is a very unique event in our logs. There is nothing
> quite like it - it's always client-driven, yet automatically generated. And
> it can be extraordinarily verbose. Removing it from the existing logging
> stream has no particular downsides. Almost any other class of log message
> would likely meet resistance as far as moving it to a separate log file,
> with good reason.
>
> Why not build a more generic log filtering case?
>
> I looked into this, but it would be a large undertaking, given the way our
> logging system works. And as per above, I don't think the pain would be
> worth it, as duration covers 99% of the use cases for separate logs.
> Certainly, nothing else other than a recurring ERROR from the client can
> cause massive bloat in the size of the files. (There is a nearby patch to
> exclude certain errors from the log file as a way to mitigate the error
> spam - I don't like that idea, but should mention it here as another effort
> to keep the log files a manageable size)
>
> Why not use an extension for this?
>
> I did start this as an extension, but it only goes so far. We can use
> emit_log_hook, but it requires careful coordination of open filehandles,
> has to do inefficient regex of every log message, and cannot do things like
> log rotation.
>
> Why not bitmap PIPE_PROTO *and* LOG_DESTINATION?
>
> I tried to do both as simple bitmaps (i.e. duration+csv = duration.csv),
> and not have to use e.g. LOG_DESTINATION_DURATION_CSV, but size_rotation_for
> ruined it for me. Since our PIPE always sends one thing at a time, a single
> new flag enables it to stay as a clean bits8 type.
>
> What about Windows?
>
> Untested. I don't have access to a Windows build, but I think in theory it
> should work fine.
>

I like the proposed feature.

Re: warning: dereferencing type-punned pointer

2024-07-24 Thread Tom Lane
Tatsuo Ishii  writes:
> So I think the warnings in ExecEvalJsonExprPath are better fixed
> because these are the only places where IsA (nodeTag) macro are used
> and the warning is printed. Patch attached.

I'm not very thrilled with these changes.  It's not apparent why
your compiler is warning about these usages of IsA and not any other
ones, nor is it apparent why these changes suppress the warnings.
(The code's not fundamentally different, so I'd think the underlying
problem is still there, if there really is one at all.)
I'm afraid we'd be playing whack-a-mole to suppress similar warnings
on various compiler versions, with no end result other than making
the code uglier and less consistent.

If we can figure out why the warning is appearing, maybe it'd be
possible to adjust the definition of IsA() to prevent it.

regards, tom lane




Re: [PATCH] GROUP BY ALL

2024-07-24 Thread Ashutosh Bapat
On Tue, Jul 23, 2024 at 6:53 PM David Christensen  wrote:
>
> On Mon, Jul 22, 2024 at 4:34 PM David G. Johnston
>  wrote:
> >
> > On Mon, Jul 22, 2024 at 1:55 PM David Christensen  wrote:
> >>
> >> I see that there'd been some chatter but not a lot of discussion about
> >> a GROUP BY ALL feature/functionality.  There certainly is utility in
> >> such a construct IMHO.
> >>
> >> Still need some docs; just throwing this out there and getting some 
> >> feedback.
> >>
> >
> > I strongly dislike adding this feature.  I'd only consider supporting it if 
> > it was part of the SQL standard.
> >
> > Code is written once and read many times.  This feature caters to the 
> > writer, not the reader.  And furthermore usage of this is prone to be to 
> > the writer's detriment as well.
>
> I'd say this feature (at least for me) caters to the investigator;
> someone who is interactively looking at data hence why it would cater
> to the writer.  Consider acquainting yourself with a large table that
> has a large number of annoying-named fields where you want to look at
> how different data is correlated or broken-down.  Something along the
> lines of:
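
For illustration, a hypothetical query of that kind (names invented); as proposed, GROUP BY ALL stands in for spelling out every non-aggregated column in the select list:

```sql
-- Equivalent to GROUP BY region, product under the proposed semantics.
SELECT region, product, count(*) AS orders, sum(amount) AS total
FROM sales
GROUP BY ALL;
```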

To me this looks like a feature that a data exploration tool may
implement instead of being part of the server. It would then provide
more statistics about each correlation/column set etc.

-- 
Best Wishes,
Ashutosh Bapat




Re: pg_upgrade failing for 200+ million Large Objects

2024-07-24 Thread Justin Pryzby
On Mon, Apr 01, 2024 at 03:28:26PM -0400, Tom Lane wrote:
> Nathan Bossart  writes:
> > The one design point that worries me a little is the non-configurability of
> > --transaction-size in pg_upgrade.  I think it's fine to default it to 1,000
> > or something, but given how often I've had to fiddle with
> > max_locks_per_transaction, I'm wondering if we might regret hard-coding it.
> 
> Well, we could add a command-line switch to pg_upgrade, but I'm
> unconvinced that it'd be worth the trouble.  I think a very large
> fraction of users invoke pg_upgrade by means of packager-supplied
> scripts that are unlikely to provide a way to pass through such
> a switch.  I'm inclined to say let's leave it as-is until we get
> some actual field requests for a switch.

I've been importing our schemas and doing upgrade testing, and was
surprised when a postgres backend was killed for OOM during pg_upgrade:

Killed process 989302 (postgres) total-vm:5495648kB, anon-rss:5153292kB, ...

Upgrading from v16 => v16 doesn't use nearly as much RAM.

While tracking down the responsible commit, I reproduced the problem
using a subset of tables; at 959b38d770, the backend process used
~650 MB RAM, and at its parent commit used at most ~120 MB.

959b38d770b Invent --transaction-size option for pg_restore.

By changing RESTORE_TRANSACTION_SIZE to 100, backend RAM use goes to
180 MB during pg_upgrade, which is reasonable.

With partitioning, we have a lot of tables, some of them wide (126
partitioned tables, 8942 children, total 1019315 columns).  I didn't track
if certain parts of our schema contribute most to the high backend mem
use, just that it's now 5x (while testing a subset) to 50x higher.

We'd surely prefer that the transaction size be configurable.
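
For reference, the knob already exists at the pg_restore level thanks to that commit; the ask here is for pg_upgrade to let users pass something like this through (values are illustrative):

```shell
pg_restore --transaction-size=100 -d newdb dump.custom
```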

-- 
Justin




pg_upgrade adds unexpected pg_constraint entries to pg_depend

2024-07-24 Thread Stan Hu
I've noticed that after running `pg_upgrade` that my `pg_depend` table
contains unexpected dependencies for sequences. Before the upgrade
from PostgreSQL 15.7:

```
% psql -d gitlabhq_production
psql (16.3, server 15.7)
Type "help" for help.

gitlabhq_production=# SELECT seq_pg_class.relname AS seq_name,
   dep_pg_class.relname AS table_name,
   pg_attribute.attname AS col_name,
   pg_depend.classid,
   classid_class.relname AS classid_relname,
   pg_depend.refclassid,
   refclassid_class.relname AS refclassid_relname
FROM pg_class seq_pg_class
INNER JOIN pg_depend ON seq_pg_class.oid = pg_depend.objid
INNER JOIN pg_class dep_pg_class ON pg_depend.refobjid = dep_pg_class.oid
INNER JOIN pg_attribute ON dep_pg_class.oid = pg_attribute.attrelid
   AND pg_depend.refobjsubid = pg_attribute.attnum
INNER JOIN pg_class classid_class ON pg_depend.classid = classid_class.oid
INNER JOIN pg_class refclassid_class ON pg_depend.refclassid =
refclassid_class.oid
WHERE seq_pg_class.relkind = 'S'
  AND (dep_pg_class.relname = 'p_ci_builds' OR dep_pg_class.relname =
'ci_builds');
     seq_name     | table_name  | col_name | classid | classid_relname | refclassid | refclassid_relname
------------------+-------------+----------+---------+-----------------+------------+--------------------
 ci_builds_id_seq | p_ci_builds | id       |    1259 | pg_class        |       1259 | pg_class
(1 row)
```

After the upgrade to PostgreSQL 16.3, I see these dependencies:

```
                 seq_name                 | table_name  |   col_name   | classid | classid_relname | refclassid | refclassid_relname
------------------------------------------+-------------+--------------+---------+-----------------+------------+--------------------
 ci_builds_id_seq                         | p_ci_builds | id           |    1259 | pg_class        |       1259 | pg_class
 note_metadata_note_id_seq                | ci_builds   | stage_id     |    2606 | pg_constraint   |       1259 | pg_class
 note_metadata_note_id_seq                | ci_builds   | partition_id |    2606 | pg_constraint   |       1259 | pg_class
 project_repository_storage_moves_id_seq  | ci_builds   | id           |    2606 | pg_constraint   |       1259 | pg_class
 project_repository_storage_moves_id_seq  | ci_builds   | partition_id |    2606 | pg_constraint   |       1259 | pg_class
 x509_commit_signatures_id_seq            | ci_builds   | id           |    2606 | pg_constraint   |       1259 | pg_class
 x509_commit_signatures_id_seq            | ci_builds   | partition_id |    2606 | pg_constraint   |       1259 | pg_class
(7 rows)
```

What's odd is that the `pg_constraint` entries don't seem to be
deterministic: I often see different entries every time I run
`pg_upgrade`.

Are these entries expected to be there, or is this a bug?

Here's what I did to reproduce. I use `asdf` to manage multiple
versions, so I used the ASDF_POSTGRES_VERSION environment variable to
override which version to use:


1. First, install both PostgreSQL 15.7 and 16.3 via `asdf` (e.g. `asdf
install postgres 15.7 && asdf install postgres 16.3`). You may use any
two major versions.

2. Then run:

```shell
export ASDF_POSTGRES_VERSION=15.7
initdb /tmp/data.15
curl -O 
https://gitlab.com/gitlab-org/gitlab/-/raw/16-11-stable-ee/db/structure.sql
postgres -D /tmp/data.15
```

3. In another terminal, load this schema:

```shell
psql -d template1 -c 'create database gitlabhq_production'
psql -d gitlabhq_production < structure.sql
```

4. Check the constraints that `ci_builds_id_seq` is the only entry:

```sql
psql -d gitlabhq_production

gitlabhq_production=# SELECT seq_pg_class.relname AS seq_name,
dep_pg_class.relname AS table_name, pg_attribute.attname AS col_name,
deptype
FROM pg_class seq_pg_class
INNER JOIN pg_depend ON seq_pg_class.oid = pg_depend.objid
INNER JOIN pg_class dep_pg_class ON pg_depend.refobjid = dep_pg_class.oid
INNER JOIN pg_attribute ON dep_pg_class.oid = pg_attribute.attrelid
AND pg_depend.refobjsubid = pg_attribute.attnum
WHERE seq_pg_class.relkind = 'S'
  AND dep_pg_class.relname = 'p_ci_builds';
 seq_name | table_name  | col_name | deptype
--+-+--+-
 ci_builds_id_seq | p_ci_builds | id   | a
(1 row)
```

5. Terminate `postgres` in the other window.
6. Now let's upgrade to PostgreSQL 16 and run the database:

```shell
export ASDF_POSTGRES_VERSION=16.3
initdb /tmp/data.16
pg_upgrade -b ~/.asdf/installs/postgres/15.7/bin -B
~/.asdf/installs/postgres/16.3/bin -d /tmp/data.15 -D /tmp/data.16
postgres -D /tmp/data.16
```

7. Now try the query and see the new entries:

```sql
gitlabhq_production=# SELECT seq_pg_class.relname AS seq_name,
dep_pg_class.relname AS table_name, pg_attribute.attname AS col_name,
deptype
FROM pg_class seq_pg_class
INNER JOIN pg_depend ON seq_pg_class.oid = pg_depend.objid
INNER JOIN pg_class dep_pg_class ON pg_depend.refobjid = dep_pg_class.oid
INNER JOIN pg_attribute ON dep_pg_class.oid = pg_attribute.attrelid
AND pg_depend.r

Re: Convert sepgsql tests to TAP

2024-07-24 Thread Andreas Karlsson
I took a quick look at the patch and I like that we standardize things a 
bit. But one thing I am not a fan of is all the use of sed and awk in 
the Perl script. I would prefer if that logic happened entirely in Perl, 
especially since some of it is already in Perl (e.g. chomp). I also wonder 
whether we shouldn't use IPC::Run to run the tests, since we already 
depend on it for the other TAP tests.


I have not yet set up an VM with selinux to try the patch out for real 
but will do so later.


On 5/13/24 8:16 AM, Peter Eisentraut wrote:
- Do we want to keep the old way to run the test?  I don't know all the 
testing scenarios that people might be interested in, but of course it 
would also be good to cut down on the duplication in the test files.


I cannot see why. Having two ways to run the tests seems only like a bad 
thing to me.


- If you go through the pre-test checks in contrib/sepgsql/test_sepgsql, 
I have converted most of these checks to the Perl script.  Some of the 
checks are obsolete, because they check whether the database has been 
correctly initialized, which is now done by the TAP script anyway.  One 
check that I wasn't sure about is the


# 'psql' command must be executable from test domain

The old test was checking the installation tree, which I guess could be 
set up in random ways.  But do we need this kind of check if we are 
using a temporary installation?


Yeah, that does not seem necessary.

Andreas




Re: [PATCH] Add min/max aggregate functions to BYTEA

2024-07-24 Thread Marat Bukharov
V5 patch. I've added more tests with different bytea sizes

--

With best regards,
Marat Bukharov

чт, 4 июл. 2024 г. в 15:29, Aleksander Alekseev :
>
> Hi Marat,
>
> > V4 path with fixed usage PG_GETARG_BYTEA_PP instead of PG_GETARG_TEXT_PP
>
> Thanks for the patch.
>
> Please add it to the nearest open commitfest [1].
>
> ```
> +select min(v) from bytea_test_table;
> + min
> +--
> + \xaa
> +(1 row)
> +
> +select max(v) from bytea_test_table;
> + max
> +--
> + \xff
> +(1 row)
> ```
>
> If I understand correctly, all the v's are of the same size. If this
> is the case you should add more test cases.
>
> [1]: https://commitfest.postgresql.org/
>
> --
> Best regards,
> Aleksander Alekseev
From 3bc3786ede117442b06d6cbe36a7b015d76a7cf4 Mon Sep 17 00:00:00 2001
From: Marat Bukharov 
Date: Wed, 24 Jul 2024 17:37:31 +0300
Subject: [PATCH v5] add bytea agg funcs

---
 doc/src/sgml/func.sgml   |  4 +--
 src/backend/utils/adt/varlena.c  | 38 
 src/include/catalog/pg_aggregate.dat |  6 
 src/include/catalog/pg_proc.dat  | 13 
 src/test/regress/expected/aggregates.out | 28 -
 src/test/regress/expected/opr_sanity.out |  2 ++
 src/test/regress/sql/aggregates.sql  | 11 ++-
 7 files changed, 98 insertions(+), 4 deletions(-)

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index f1f22a1960..891f7a4259 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -21976,7 +21976,7 @@ SELECT NULLIF(value, '(none)') ...

 Computes the maximum of the non-null input
 values.  Available for any numeric, string, date/time, or enum type,
-as well as inet, interval,
+as well as bytea, inet, interval,
 money, oid, pg_lsn,
 tid, xid8,
 and arrays of any of these types.
@@ -21995,7 +21995,7 @@ SELECT NULLIF(value, '(none)') ...

 Computes the minimum of the non-null input
 values.  Available for any numeric, string, date/time, or enum type,
-as well as inet, interval,
+as well as bytea, inet, interval,
 money, oid, pg_lsn,
 tid, xid8,
 and arrays of any of these types.
diff --git a/src/backend/utils/adt/varlena.c b/src/backend/utils/adt/varlena.c
index d2e2e9bbba..e80c789442 100644
--- a/src/backend/utils/adt/varlena.c
+++ b/src/backend/utils/adt/varlena.c
@@ -3956,6 +3956,44 @@ byteacmp(PG_FUNCTION_ARGS)
 	PG_RETURN_INT32(cmp);
 }
 
+Datum
+bytea_larger(PG_FUNCTION_ARGS)
+{
+	bytea	   *arg1 = PG_GETARG_BYTEA_PP(0);
+	bytea	   *arg2 = PG_GETARG_BYTEA_PP(1);
+	bytea	   *result;
+	int			len1,
+				len2;
+	int			cmp;
+
+	len1 = VARSIZE_ANY_EXHDR(arg1);
+	len2 = VARSIZE_ANY_EXHDR(arg2);
+
+	cmp = memcmp(VARDATA_ANY(arg1), VARDATA_ANY(arg2), Min(len1, len2));
+	result = ((cmp > 0) || ((cmp == 0) && (len1 > len2)) ? arg1 : arg2);
+
+	PG_RETURN_BYTEA_P(result);
+}
+
+Datum
+bytea_smaller(PG_FUNCTION_ARGS)
+{
+	bytea	   *arg1 = PG_GETARG_BYTEA_PP(0);
+	bytea	   *arg2 = PG_GETARG_BYTEA_PP(1);
+	bytea	   *result;
+	int			len1,
+				len2;
+	int			cmp;
+
+	len1 = VARSIZE_ANY_EXHDR(arg1);
+	len2 = VARSIZE_ANY_EXHDR(arg2);
+
+	cmp = memcmp(VARDATA_ANY(arg1), VARDATA_ANY(arg2), Min(len1, len2));
+	result = ((cmp < 0) || ((cmp == 0) && (len1 < len2)) ? arg1 : arg2);
+
+	PG_RETURN_BYTEA_P(result);
+}
+
 Datum
 bytea_sortsupport(PG_FUNCTION_ARGS)
 {
diff --git a/src/include/catalog/pg_aggregate.dat b/src/include/catalog/pg_aggregate.dat
index 5f13532abc..970d9a70fd 100644
--- a/src/include/catalog/pg_aggregate.dat
+++ b/src/include/catalog/pg_aggregate.dat
@@ -158,6 +158,9 @@
 { aggfnoid => 'max(xid8)', aggtransfn => 'xid8_larger',
   aggcombinefn => 'xid8_larger', aggsortop => '>(xid8,xid8)',
   aggtranstype => 'xid8' },
+{ aggfnoid => 'max(bytea)', aggtransfn => 'bytea_larger',
+  aggcombinefn => 'bytea_larger', aggsortop => '>(bytea,bytea)',
+  aggtranstype => 'bytea' },
 
 # min
 { aggfnoid => 'min(int8)', aggtransfn => 'int8smaller',
@@ -226,6 +229,9 @@
 { aggfnoid => 'min(xid8)', aggtransfn => 'xid8_smaller',
   aggcombinefn => 'xid8_smaller', aggsortop => '<(xid8,xid8)',
   aggtranstype => 'xid8' },
+{ aggfnoid => 'min(bytea)', aggtransfn => 'bytea_smaller',
+  aggcombinefn => 'bytea_smaller', aggsortop => '<(bytea,bytea)',
+  aggtranstype => 'bytea' },
 
 # count
 { aggfnoid => 'count(any)', aggtransfn => 'int8inc_any',
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index d4ac578ae6..f16f17ef3a 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -1278,6 +1278,13 @@
   proname => 'text_smaller', proleakproof => 't', prorettype => 'text',
   proargtypes => 'text text', prosrc => 'text_smaller' },
 
+{ oid => '6347', descr => 'larger of two',
+  proname => 'bytea_larger', proleakproof => 't', prorettype => 'bytea',
+  proargtypes => 'bytea bytea', prosrc => 'bytea_larger' },
+{ oid => '6348', descr => 'smaller of two',
+  proname => 'by

Re: [PATCH] Add min/max aggregate functions to BYTEA

2024-07-24 Thread Marat Bukharov
Added patch to commitfest https://commitfest.postgresql.org/49/5138/

--

With best regards,
Marat Bukharov

>
> Hi,
>
> > What part of commitfest should I put the current patch to: "SQL
> > Commands", "Miscellaneous" or something else? I can't figure it out.
>
> Personally I qualified a similar patch [1] as "Server Features",
> although I'm not 100% sure if this was the best choice.
>
> [1]: https://commitfest.postgresql.org/48/4905/
>
> --
> Best regards,
> Aleksander Alekseev




Re: Meson far from ready on Windows

2024-07-24 Thread Dave Page
On Sat, 20 Jul 2024 at 21:56, Andres Freund  wrote:

> Hi,
>
> On 2024-07-17 09:49:47 -0700, Andres Freund wrote:
> > On 2024-07-16 15:53:45 -0500, Tristan Partin wrote:
> > > Other than that, it looks pretty solid.
> >
> > Thanks for looking!  I'm thinking of pushing the first few patches
> soon-ish.
> >
> > I'm debating between going for 17 + HEAD or also applying it to 16, to
> keep
> > the trees more similar.
>
> Pushed a number of these to 16 - HEAD.
>

Thanks. I've updated winpgbuild with the additional pkgconfig file needed
for ICU now, so it should better match a *nix build.

Any chance you can look at the GSSAPI/OpenSSL X509_NAME conflict one? I'm
still having to patch around that to build with all the dependencies.

https://www.postgresql.org/message-id/flat/CA%2BOCxoxwsgi8QdzN8A0OPGuGfu_1vEW3ufVBnbwd3gfawVpsXw%40mail.gmail.com

-- 
Dave Page
pgAdmin: https://www.pgadmin.org
PostgreSQL: https://www.postgresql.org
EDB: https://www.enterprisedb.com

PGDay UK 2024, 11th September, London: https://2024.pgday.uk/


Re: Built-in CTYPE provider

2024-07-24 Thread Noah Misch
On Wed, Jul 17, 2024 at 03:03:26PM -0700, Noah Misch wrote:
> On Wed, Jul 17, 2024 at 08:48:46AM -0700, Jeff Davis wrote:
> > On Thu, 2024-07-11 at 05:50 -0700, Noah Misch wrote:
> > > > This is still marked as an open item for 17, but you've already
> > > > acknowledged[1] that no code changes are necessary in version 17.
> > > 
> > > Later posts on the thread made that obsolete.  The next step is to
> > > settle the
> > > question at https://postgr.es/m/20240706195129...@rfd.leadboat.com. 
> > > If that
> > > conclusion entails a remedy, v17 code changes may be part of that
> > > remedy.
> > 
> > This is the first time you've mentioned a code change in version 17. If
> 
> That's right.
> 
> > you have something in mind, please propose it. However, this feature
> > followed the right policies at the time of commit, so there would need
> > to be a strong consensus to accept such a change.
> 
> If I'm counting the votes right, you and Tom have voted that the feature's
> current state is okay, and I and Laurenz have voted that it's not okay.  I
> still hope more people will vote, to avoid dealing with the tie.  Daniel,
> Peter, and Jeremy, you're all listed as reviewers on commit f69319f.  Are you
> willing to vote one way or the other on the question in
> https://postgr.es/m/20240706195129...@rfd.leadboat.com?

The last vote arrived 6 days ago.  So far, we have votes from Jeff, Noah, Tom,
Daniel, and Laurenz.  I'll keep the voting open for another 24 hours from now
or 36 hours after the last vote, whichever comes last.  If that schedule is
too compressed for anyone, do share.




Re: Built-in CTYPE provider

2024-07-24 Thread Peter Eisentraut

On 24.07.24 17:19, Noah Misch wrote:

On Wed, Jul 17, 2024 at 03:03:26PM -0700, Noah Misch wrote:

On Wed, Jul 17, 2024 at 08:48:46AM -0700, Jeff Davis wrote:

On Thu, 2024-07-11 at 05:50 -0700, Noah Misch wrote:

This is still marked as an open item for 17, but you've already
acknowledged[1] that no code changes are necessary in version 17.


Later posts on the thread made that obsolete.  The next step is to
settle the
question at https://postgr.es/m/20240706195129...@rfd.leadboat.com.
If that
conclusion entails a remedy, v17 code changes may be part of that
remedy.


This is the first time you've mentioned a code change in version 17. If


That's right.


you have something in mind, please propose it. However, this feature
followed the right policies at the time of commit, so there would need
to be a strong consensus to accept such a change.


If I'm counting the votes right, you and Tom have voted that the feature's
current state is okay, and I and Laurenz have voted that it's not okay.  I
still hope more people will vote, to avoid dealing with the tie.  Daniel,
Peter, and Jeremy, you're all listed as reviewers on commit f69319f.  Are you
willing to vote one way or the other on the question in
https://postgr.es/m/20240706195129...@rfd.leadboat.com?


The last vote arrived 6 days ago.  So far, we have votes from Jeff, Noah, Tom,
Daniel, and Laurenz.  I'll keep the voting open for another 24 hours from now
or 36 hours after the last vote, whichever comes last.  If that schedule is
too compressed for anyone, do share.


My opinion is that it is okay to release as is.





Re: [PATCH] Add additional extended protocol commands to psql: \parse and \bindx

2024-07-24 Thread Peter Eisentraut

On 24.07.24 07:04, Michael Paquier wrote:

This commit introduces three additional commands: \parse, \bindx and
\close.
\parse creates a prepared statement using extended protocol.
\bindx binds and execute an existing prepared statement using extended
protocol.
\close closes an existing prepared statement using extended protocol.


This commit message confused me, because I don't think this is what the 
\bindx command actually does.  AFAICT, it only binds, it does not 
execute.  At least that is what the documentation included in the 
patch appears to indicate.


I'm not sure \bindx is such a great name.  The "x" stands for "I ran out 
of ideas". ;-)  Maybe \bind_named or \bindn or something like that.  Or 
use the existing \bind with a -name argument?






Re: Built-in CTYPE provider

2024-07-24 Thread Joe Conway

On 7/24/24 11:19, Noah Misch wrote:

On Wed, Jul 17, 2024 at 03:03:26PM -0700, Noah Misch wrote:

On Wed, Jul 17, 2024 at 08:48:46AM -0700, Jeff Davis wrote:
> you have something in mind, please propose it. However, this feature
> followed the right policies at the time of commit, so there would need
> to be a strong consensus to accept such a change.

If I'm counting the votes right, you and Tom have voted that the feature's
current state is okay, and I and Laurenz have voted that it's not okay.  I
still hope more people will vote, to avoid dealing with the tie.  Daniel,
Peter, and Jeremy, you're all listed as reviewers on commit f69319f.  Are you
willing to vote one way or the other on the question in
https://postgr.es/m/20240706195129...@rfd.leadboat.com?


The last vote arrived 6 days ago.  So far, we have votes from Jeff, Noah, Tom,
Daniel, and Laurenz.  I'll keep the voting open for another 24 hours from now
or 36 hours after the last vote, whichever comes last.  If that schedule is
too compressed for anyone, do share.



It isn't entirely clear to me exactly what we are voting on.

* If someone votes +1 (current state is ok) -- that is pretty clear.
* But if someone votes -1 (current state is not ok?) what does that mean
  in practice?
  - a revert?
  - we hold shipping 17 until we get consensus (via some plan or
mitigation or whatever)?
  - something else?

In any case, I am a hard -1 against reverting. +0.5 on "current state is 
ok", and +1 on "current state is ok with agreement on what to do in 18"


--
Joe Conway
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com





Re: Detect buffer underflow in get_th()

2024-07-24 Thread Peter Eisentraut

On 24.07.24 11:43, Alexander Kuznetsov wrote:

Hello everyone,

In src/backend/utils/adt/formatting.c:1516, there is a get_th() function 
utilized to return ST/ND/RD/TH suffixes for simple numbers.
Upon reviewing its behavior, it appears capable of receiving non-numeric 
inputs (this is verified by a check at formatting.c:1527).


Given that the function can accept non-numeric inputs,
it is plausible that it could also receive an empty input,
although a brief examination of its calls did not reveal any such 
instances.


Nevertheless, if the function were to receive an empty input of zero 
length,
a buffer underflow would occur when attempting to compute *(num + (len - 
1)), as (len - 1) would result in a negative offset.
To mitigate this issue, I propose a patch incorporating the 
zero_length_character_string error code, as detailed in the attachment.
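
For illustration, the kind of guard being proposed, placed at the top of get_th() before the length is used (a sketch, not the attached patch; the error-code name follows the proposal, the message text is invented):

```c
	if (len == 0)
		ereport(ERROR,
				(errcode(ERRCODE_ZERO_LENGTH_CHARACTER_STRING),
				 errmsg("input string must not be empty")));

	/* only now is *(num + (len - 1)) safe to evaluate */
```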


If it can't happen in practice, maybe an assertion would be enough?





Re: Support prepared statement invalidation when result types change

2024-07-24 Thread Tom Lane
Jelte Fennema  writes:
> The cached plan for a prepared statements can get invalidated when DDL
> changes the tables used in the query, or when search_path changes.
> ...
> However, we would throw an error if the the result of the query is of a
> different type than it was before:
> ERROR: cached plan must not change result type

Yes, this is intentional.

> This patch starts to allow a prepared statement to continue to work even
> when the result type changes.

What this is is a wire protocol break.  What if the client has
previously done a Describe Statement on the prepared statement?
We have no mechanism for notifying it that that information is
now falsified.  The error is thrown to prevent us from getting
into a situation where we'd need to do that.

regards, tom lane




Re: Slow catchup of 2PC (twophase) transactions on replica in LR

2024-07-24 Thread Tom Lane
Amit Kapila  writes:
> I merged these changes, made a few other cosmetic changes, and pushed the 
> patch.

There is a CF entry pointing at this thread [1].  Should it be closed?

regards, tom lane

[1] https://commitfest.postgresql.org/48/4867/




Re: Built-in CTYPE provider

2024-07-24 Thread Jeremy Schneider
On Wed, Jul 24, 2024 at 9:27 AM Peter Eisentraut 
wrote:

> > The last vote arrived 6 days ago.  So far, we have votes from Jeff,
> Noah, Tom,
> > Daniel, and Laurenz.  I'll keep the voting open for another 24 hours
> from now
> > or 36 hours after the last vote, whichever comes last.  If that schedule
> is
> > too compressed for anyone, do share.
>
> My opinion is that it is okay to release as is.


Like Jeff, I don’t think counting votes or putting names on one side or
another is the best way to decide things. Everyone has unique opinions and
nuances, it’s not like there’s two groups that all agree together on
everything and disagree with the other group. I don’t want my name put on a
list this way; there are some places where I agree and some places where I
disagree with most people 🙂

I don’t know the code as intimately as some others on the lists, but I’m
not aware of any one-way doors that would create major difficulties for
future v18+ ideas being discussed

fwiw, I don’t want to pull this feature out of v17, I think it’s okay to
release it

-Jeremy


Re: tests fail on windows with default git settings

2024-07-24 Thread Dave Page
Hi

On Sat, 13 Jul 2024 at 23:01, Thomas Munro  wrote:

> On Fri, Jul 12, 2024 at 3:49 AM Dave Page  wrote:
> > So I received an off-list tip to checkout [1], a discussion around
> GSSAPI causing test failures on windows that Alexander Lakhin was looking
> at. Thomas Munro's v2 patch to try to address the issue brought me down to
> just a single test failure with GSSAPI enabled on 17b2 (with a second,
> simple fix for the OpenSSL/Kerberos/x509 issue): pg_dump/002_pg_dump. The
> relevant section from the log looks like this:
>
> I pushed that (ba9fcac7).
>
> > [15:28:42.692](0.006s) not ok 2 - connecting to a non-existent database:
> matches
> > [15:28:42.693](0.001s) #   Failed test 'connecting to a non-existent
> database: matches'
> > #   at C:/Users/dpage/git/postgresql/src/bin/pg_dump/t/002_pg_dump.pl
> line 4689.
> > [15:28:42.694](0.001s) #   'pg_dump: error: connection
> to server at "127.0.0.1", port 53834 failed: could not initiate GSSAPI
> security context: No credentials were supplied, or the credentials were
> unavailable or inaccessible: Credential cache is empty
> > # connection to server at "127.0.0.1", port 53834 failed: FATAL:
> database "qqq" does not exist
> > # '
> > # doesn't match '(?^:pg_dump: error: connection to server .* failed:
> FATAL:  database "qqq" does not exist)'
>
> Does it help if you revert 29992a6?
>

Sorry for the delay - things got crazy busy for a while.

No, reverting that commit does not help.

-- 
Dave Page
pgAdmin: https://www.pgadmin.org
PostgreSQL: https://www.postgresql.org
EDB: https://www.enterprisedb.com

PGDay UK 2024, 11th September, London: https://2024.pgday.uk/


Re: Wrong security context for deferred triggers?

2024-07-24 Thread Pavel Stehule
po 8. 7. 2024 v 14:36 odesílatel Laurenz Albe 
napsal:

> On 7/8/24 04:14, Joseph Koshakow wrote:
> > Given the above and the fact that the patch is a breaking change, my
> > vote would still be to keep the current behavior and update the
> > documentation. Though I'd be happy to be overruled by someone with more
> > knowledge of triggers.
>
> Thanks for that feedback.
> Based on that, the patch should be rejected.
>
> Since there were a couple of other opinions early in the thread, I'll let
> it sit like that for now, and judgement can be passed at the end of the
> commitfest.  Perhaps somebody else wants to chime in.
>

It is hard to say what the expected behaviour should be in this case. I think
the best option is just to document this issue and change nothing.

Regards

Pavel


>
> Yours,
> Laurenz Albe
>
>
>
>


Re: Inconsistency between try_mergejoin_path and create_mergejoin_plan

2024-07-24 Thread Tom Lane
Richard Guo  writes:
> Thank you for confirming.  Here is an updated patch with some tweaks to
> the comments and commit message.  I've parked this patch in the July
> commitfest.

I took a brief look at this.  I think the basic idea is sound,
but I have a couple of nits:

* It's not entirely obvious that the checks preceding these additions
are sufficient to ensure that the clauses are OpExprs.  They probably
are, since the clauses are marked hashable or mergeable, but that test
is mighty far away.  More to the point, if they ever weren't OpExprs
the result would likely be to pass a bogus OID to get_commutator and
thus silently fail, allowing the problem to go undetected for a long
time.  I'd suggest using castNode() rather than a hard-wired
assumption that the clause is an OpExpr.

* Do we really need to construct a whole new set of bogus operators
and opclasses to test this?  As you noted, the regression tests
already set up an incomplete opclass for other purposes.  Why can't
we extend that test, to reduce the amount of cycles wasted forevermore
on this rather trivial point?
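
To make the first point concrete, the castNode() form being suggested would look roughly like this (variable names are assumed from context, not quoted from the patch):

```c
	/* Assert()s the node type in assert-enabled builds instead of silently
	 * passing a bogus OID to get_commutator() */
	OpExpr	   *opexpr = castNode(OpExpr, clause);
	Oid			commutator = get_commutator(opexpr->opno);
```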

(I'm actually wondering whether we really need to memorialize this
with a regression test case at all.  But I'd settle for minimizing
the amount of test cycles added.)

regards, tom lane




Re: Make query cancellation keys longer

2024-07-24 Thread Heikki Linnakangas

On 04/07/2024 15:20, Jelte Fennema-Nio wrote:

On Thu, 4 Jul 2024 at 12:32, Heikki Linnakangas  wrote:

We currently don't do any locking on the ProcSignal array. For query
cancellations, that's good because a query cancel packet is processed
without having a PGPROC entry, so we cannot take LWLocks. We could use
spinlocks though. In this patch, I used memory barriers to ensure that
we load/store the pid and the cancellation key in a sensible order, so
that you cannot e.g. send a cancellation signal to a backend that's just
starting up and hasn't advertised its cancellation key in ProcSignal
yet. But I think this might be simpler and less error-prone by just
adding a spinlock to each ProcSignal slot. That would also fix the
existing race condition where we might set the pss_signalFlags flag for
a slot, when the process concurrently terminates and the slot is reused
for a different process. Because of that, we currently have this caveat:
"... all the signals are such that no harm is done if they're mistakenly
fired". With a spinlock, we could eliminate that race.


I think a spinlock would make this thing a whole concurrency stuff a
lot easier to reason about.

+   slot->pss_cancel_key_valid = false;
+   slot->pss_cancel_key = 0;

If no spinlock is added, I think these accesses should still be made
atomic writes. Otherwise data-races on those fields are still
possible, resulting in undefined behaviour. The memory barriers you
added don't prevent that afaict. With atomic operations there are
still race conditions, but no data-races.

Actually it seems like that same argument applies to the already
existing reading/writing of pss_pid: it's written/read using
non-atomic operations so data-races can occur and thus undefined
behaviour too.


Ok, here's a version with spinlocks.

I went back and forth on what exactly is protected by the spinlock. I 
kept the "volatile sig_atomic_t" type for pss_signalFlags, so that it 
can still be safely read without holding the spinlock in 
CheckProcSignal, but all the functions that set the flags now hold the 
spinlock. That removes the race condition that you might set the flag 
for wrong slot, if the target backend exits and the slot is recycled. 
The race condition was harmless and there were comments to note it, but 
it doesn't occur anymore with the spinlock.
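To make the shape of this concrete, here is a rough sketch (not the actual patch; the field
names and exact slot layout are assumptions) of a spinlock-protected slot and the writer side:

    typedef struct ProcSignalSlot
    {
        pid_t       pss_pid;
        bool        pss_cancel_key_valid;
        int32       pss_cancel_key;
        slock_t     pss_mutex;          /* protects the fields above */
        /* ... pss_signalFlags, barrier state, etc. ... */
    } ProcSignalSlot;

    /* backend advertises its PID and cancel key under the lock */
    SpinLockAcquire(&slot->pss_mutex);
    slot->pss_pid = MyProcPid;
    slot->pss_cancel_key = MyCancelKey;
    slot->pss_cancel_key_valid = true;
    SpinLockRelease(&slot->pss_mutex);

Readers that want a consistent pid/key pair take the same spinlock, while pss_signalFlags can
stay "volatile sig_atomic_t" so CheckProcSignal() can still read it lock-free, as described above.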


(Note: Thomas's "Interrupts vs signals" patch will remove pss_signalFlags 
altogether. I'm looking forward to that.)



-	volatile pid_t	pss_pid;
+	pid_t			pss_pid;

Why remove the volatile modifier?


Because I introduced a memory barrier to ensure the reads/writes of 
pss_pid become visible to other processes in right order. That makes the 
'volatile' unnecessary IIUC. With the spinlock, the point is moot however.


--
Heikki Linnakangas
Neon (https://neon.tech)
From 2e6035837c028abd3b645a8c038cac7179016eef Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas 
Date: Wed, 24 Jul 2024 19:06:32 +0300
Subject: [PATCH v4 1/1] Move cancel key generation to after forking the
 backend

Move responsibility of generating the cancel key to the backend
process. The cancel key is now generated after forking, and the
backend advertises it in the ProcSignal array. When a cancel request
arrives, the backend handling it scans the ProcSignal array to find
the target pid and cancel key. This is similar to how this previously
worked in the EXEC_BACKEND case with the ShmemBackendArray, just
reusing the ProcSignal array.

One notable change is that we no longer generate cancellation keys for
non-backend processes. We generated them before just to prevent a
malicious user from canceling them, the keys for non-backend processes
were never actually given to anyone. There is now an explicit flag
indicating whether a process has a valid key or not.

I wrote this originally in preparation for supporting longer cancel
keys, but it's a nice cleanup on its own.

Reviewed-by: Jelte Fennema-Nio
Discussion: https://www.postgresql.org/message-id/508d0505-8b7a-4864-a681-e7e5edfe3...@iki.fi
---
 src/backend/postmaster/auxprocess.c |   2 +-
 src/backend/postmaster/launch_backend.c |   6 -
 src/backend/postmaster/postmaster.c | 196 +---
 src/backend/storage/ipc/ipci.c  |  11 --
 src/backend/storage/ipc/procsignal.c| 172 -
 src/backend/tcop/backend_startup.c  |   9 +-
 src/backend/tcop/postgres.c |  21 +++
 src/backend/utils/init/globals.c|   3 +-
 src/backend/utils/init/postinit.c   |   2 +-
 src/include/miscadmin.h |   1 +
 src/include/postmaster/postmaster.h |   9 --
 src/include/storage/procsignal.h|   3 +-
 12 files changed, 175 insertions(+), 260 deletions(-)

diff --git a/src/backend/postmaster/auxprocess.c b/src/backend/postmaster/auxprocess.c
index b21eae7136..74b8a00c94 100644
--- a/src/backend/postmaster/auxprocess.c
+++ b/src/backend/postmaster/auxprocess.c
@@ -71,7 +71,7 @@ AuxiliaryProcessMainCommon(void)
 
 	BaseInit

Re: Convert sepgsql tests to TAP

2024-07-24 Thread Andreas Karlsson

On 7/24/24 4:31 PM, Andreas Karlsson wrote:
I have not yet set up a VM with selinux to try the patch out for real 
but will do so later.


I almost got the tests running but it required way too many manual steps 
to just get there and I gave up after just getting segfaults. I had to 
edit sepgsql-regtest.te because sepgsql-regtest.pp would not build 
otherwise on Debian bookworm, but after I had done that instead of 
getting test failures as I expected I just got segfaults. Maybe those 
are caused by an incorrect sepgsql-regtest.pp but this was not nice at 
all to try to get running for someone like me who does not know selinux 
well.


Peter, what did you do to get the tests running? And should we fix these 
tests to make them more user friendly?


Andreas





Re: Convert sepgsql tests to TAP

2024-07-24 Thread Peter Eisentraut

On 24.07.24 18:29, Andreas Karlsson wrote:

On 7/24/24 4:31 PM, Andreas Karlsson wrote:
I have not yet set up an VM with selinux to try the patch out for real 
but will do so later.


I almost got the tests running but it required way too many manual steps 
to just get there and I gave up after just getting segfaults. I had to 
edit sepgsql-regtest.te because sepgsql-regtest.pp would not build 
otherwise on Debian bookworm, but after I had done that instead of 
getting test failures as I expected I just got segfaults. Maybe those 
are caused by an incorrect sepgsql-regtest.pp but this was not nice at 
all to try to get running for someone like me who does not know selinux 
well.


Peter, what did you do to get the tests running? And should we fix these 
tests to make them more user friendly?


In my experience, the tests (both the old and the proposed new) only 
work on Red Hat-like platforms.  I had also tried on Debian but decided 
that it won't work.






Re: Convert sepgsql tests to TAP

2024-07-24 Thread Peter Eisentraut

On 24.07.24 16:31, Andreas Karlsson wrote:
I took a quick look at the patch and I like that we standardize things a 
bit. But one thing I am not a fan of is all the use of sed and awk in 
the Perl script. I would prefer if that logic happened all in Perl, 
especially since we have some of it in Perl (e.g. chomp). Also I wonder 
if we should not use IPC::Run to do the tests since we already depend on 
it for the other TAP tests.


In principle yes, but here I tried not rewriting the tests too much but 
just port them to a newer environment.  I think the adjustments you 
describe could be done as a second step.


(I don't really have any expertise in sepgsql or selinux, I'm just doing 
this to reduce the dependency on makefiles for testing.  So I'm trying 
to use as light a touch as possible.)






Re: pg_upgrade and logical replication

2024-07-24 Thread Nathan Bossart
On Wed, Jul 24, 2024 at 11:32:47AM +0530, Amit Kapila wrote:
> LGTM.

Thanks for reviewing.  Committed and back-patched to v17.

-- 
nathan




Re: Convert sepgsql tests to TAP

2024-07-24 Thread Andreas Karlsson

On 7/24/24 6:31 PM, Peter Eisentraut wrote:

On 24.07.24 18:29, Andreas Karlsson wrote:
Peter, what did you do to get the tests running? And should we fix 
these tests to make them more user friendly?


In my experience, the tests (both the old and the proposed new) only 
work on Red Hat-like platforms.  I had also tried on Debian but decided 
that it won't work.


Thanks, will try to run them on Rocky Linux when I have calmed down a 
bit. :)


Andreas




Re: Convert sepgsql tests to TAP

2024-07-24 Thread Tom Lane
Peter Eisentraut  writes:
> In my experience, the tests (both the old and the proposed new) only 
> work on Red Hat-like platforms.  I had also tried on Debian but decided 
> that it won't work.

Yeah, Red Hat is pretty much the only vendor that has pushed SELinux
far enough to be usable by non-wizards.  I'm not surprised if there
are outright bugs in other distros' versions of it, as AFAIK
nobody else turns it on by default.

regards, tom lane




Re: Convert sepgsql tests to TAP

2024-07-24 Thread Joe Conway

On 7/24/24 12:36, Tom Lane wrote:

Peter Eisentraut  writes:
In my experience, the tests (both the old and the proposed new) only 
work on Red Hat-like platforms.  I had also tried on Debian but decided 
that it won't work.


Yeah, Red Hat is pretty much the only vendor that has pushed SELinux
far enough to be usable by non-wizards.  I'm not surprised if there
are outright bugs in other distros' versions of it, as AFAIK
nobody else turns it on by default.


I tried some years ago to get it working on my Debian-derived Linux Mint 
desktop and gave up. I think SELinux is a really good tool on RHEL 
variants, but I don't think many people use it on anything else. As Tom 
says, perhaps there are a few wizards out there though...


--
Joe Conway
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com





Re: proposal: schema variables

2024-07-24 Thread Laurenz Albe
On Wed, 2024-07-24 at 17:19 +0200, Pavel Stehule wrote:
> >   This is buggy:
> > 
> >     CREATE VARIABLE str AS text NOT NULL DEFAULT NULL;
> > 
> >   Ugh.
> > 
> >     SELECT str;
> >     ERROR:  null value is not allowed for NOT NULL session variable 
> > "laurenz.str"
> >     DETAIL:  The result of DEFAULT expression is NULL.
> > 
> >   Perhaps that is a leftover from the previous coding, but I think there 
> > need be
> >   no check upon SELECT.  It should be enough to check during CREATE 
> > VARIABLE and
> >   LET.
> 
> I think it is correct. When you use SELECT str, then DEFAULT expression is
> executed, and then the result is assigned to a variable, and there is NOT NULL
> guard, which raises an exception. The variable is not initialized when you run
> DDL, but it is initialized when you first read or write from/to the variable.
> The DEFAULT expression is not evaluated in DDL time. In this case, I can 
> detect
> the problem in DDL time because the result is transformed to NULL node, but
> generally there can be SQL non immutable function, and then I need to wait 
> until
> the DEFAULT expression will be evaluated - and it is time of first reading.
> Unfortunately, there is not an available check if some expression is NULL,
> that I can use in DDL time, but I have it in plpgsql_check.

That makes sense to me.

In that case, I think we can drop the requirement that NOT NULL variables
need a default clause.

> 
> >   I can see the usefulness of IMMUTABLE variables, but I am surprised that
> >   they are reset by DISCARD.  What is the use case you have in mind?
> >   The use case I can envision is an application that sets a value right 
> > after
> >   authentication, for use with row-level security.  But then it would be 
> > harmful
> >   if the user could reset the variable with DISCARD.
> 
> Primary I think about IMMUTABLE variables like about some form of cache.
> This cache is protected against unwanted repeated write - and can help with
> detection of this situation.

We can leave it as it is.  The IMMUTABLE feature need not go into the first
release, so that can be discussed some more later.

Thanks for the new patch set; I'll look at it soon.

Yours,
Laurenz Albe




Re: Detect buffer underflow in get_th()

2024-07-24 Thread Alexander Kuznetsov



24.07.2024 18:39, Peter Eisentraut wrote:

If it can't happen in practice, maybe an assertion would be enough?



In practice, the function should not receive non-numeric strings either; 
however, since there is an exception in place for such cases, I thought it 
would be good to add a check for zero-length input in a similar manner.

But of course it's open for discussion, and it's the team's decision whether this should be 
addressed as an assertion or handled differently.
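For illustration, the two options on the table could look roughly like this (a sketch only;
the error code and wording actually used by get_th() may differ):

    int     len = strlen(num);

    /* Option 1: treat empty input as a can't-happen condition */
    Assert(len > 0);

    /* Option 2: mirror the existing guard against non-numeric input */
    if (len == 0)
        ereport(ERROR,
                (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
                 errmsg("\"%s\" is not a number", num)));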

--
Best regards,
Alexander Kuznetsov





Re: [18] Policy on IMMUTABLE functions and Unicode updates

2024-07-24 Thread Jeff Davis
On Tue, 2024-07-23 at 06:31 -0600, Jeremy Schneider wrote:
> Other RDBMS are very careful not to corrupt databases, afaik
> including function based indexes, by changing Unicode. I’m not aware
> of any other RDBMS that updates Unicode versions in place; instead
> they support multiple Unicode versions and do not drop the old ones.

I'm curious about the details of what other RDBMSs do.

Let's simplify and say that there's one database-wide collation at
version 1, and the application doesn't use any COLLATE clause or other
specifications for queries or DDL.

Then, version 2 of that collation becomes available. When a query comes
into the database, which version of the collation does it use, 1 or 2?
If it uses the latest available (version 2), then all the old indexes
are effectively useless.

So I suppose there's some kind of migration process where you
rebuild/fix objects to use the new collation, and when that's done then
you change the default so that queries use version 2. How does all that
work?

Regards,
Jeff Davis





Re: Fixing backslash dot for COPY FROM...CSV

2024-07-24 Thread Tom Lane
Sutou Kouhei  writes:
> I read through this thread. It seems that this thread
> discuss 2 things:
> 1. \. in CSV mode
> 2. \. in non-CSV mode
> Recent messages discussed mainly 2. but how about create a
> separated thread for 2.? Because the original mail focused
> on 1. and it seems that we can handle them separately.

Well, we don't want to paint ourselves into a corner by considering
only part of the problem.

What I'm currently thinking is that we can't remove the special
treatment of \. in text mode.  It's not arguably a bug, because
it's been part of the specification since day one; and there
are too many compatibility risks in pg_dump and elsewhere.
I think we should fix it so that \. that's not alone on a line
throws an error, but I wouldn't go further than that.

> How about introducing a new COPY option that controls
> whether "\." is ignored or not instead of this approach?

No thanks.  Once we create such an option we'll never be
able to get rid of it.

regards, tom lane




Re: warning: dereferencing type-punned pointer

2024-07-24 Thread Peter Eisentraut

On 24.07.24 08:55, Tatsuo Ishii wrote:

origin.c: In function ‘StartupReplicationOrigin’:
origin.c:773:16: warning: dereferencing type-punned pointer will break strict-aliasing rules [-Wstrict-aliasing]
  773 |         file_crc = *(pg_crc32c *) &disk_state;
      |                    ^



I am not so sure about StartupReplicationOrigin. Should we fix it now?
For me the code looks sane as long as we keep -fno-strict-aliasing
option. Or maybe better to fix so that someday we could remove the
compiler option?


This is basically the textbook example of an aliasing violation, isn't it? 
Wouldn't it be just as simple to do


memcpy(&file_crc, &disk_state, sizeof(file_crc));
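For context, the surrounding declarations in StartupReplicationOrigin() are roughly as follows
(paraphrased, not quoted exactly from origin.c):

    ReplicationStateOnDisk disk_state;
    pg_crc32c   file_crc;

    /* currently: the cast the compiler warns about */
    file_crc = *(pg_crc32c *) &disk_state;

Since pg_crc32c is a plain 32-bit integer type, the suggested memcpy() typically compiles down
to the same load while giving the compiler nothing to complain about under strict aliasing.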





Re: [18] Policy on IMMUTABLE functions and Unicode updates

2024-07-24 Thread Jeff Davis
On Wed, 2024-07-24 at 08:20 -0400, Robert Haas wrote:
> I note in passing that the last time I saw a customer query with
> UPPER() in the join clause was... yesterday.

Can you expand on that? This thread is mostly about durable state so I
don't immediately see the connection.

> So I don't want to see us sit on our hands and assert that we don't
> need to worry about ctype because it's minor in comparison with
> collation. It *is* minor in comparison with collation. 

...

> But one problem
> can be small in comparison with another and still bad. If an aircraft
> is on fire whilst experiencing a dual engine failure, it's still in a
> lot of trouble even if the fire can be put out.

There's a qualitative difference between a collation update which can
break your PKs and FKs, and a ctype update which definitely will not.
Your analogy doesn't quite capture this distinction. I don't mean to
over-emphasize this point, but I do think we need to keep some
perspective here.

But I agree with your general point that we shouldn't dismiss the
problem just because it's minor. We should expect the problem to
surface at some point and be reasonably prepared.

Regards,
Jeff Davis





Re: warning: dereferencing type-punned pointer

2024-07-24 Thread Tom Lane
Peter Eisentraut  writes:
> This is basically the textbook example of aliasing violation, isn't it? 
> Wouldn't it be just as simple to do

> memcpy(&file_crc, &disk_state, sizeof(file_crc));

+1.  Also, it seems thoroughly bizarre to me that this case is handled
before checking for read failure.  I'd move the stanza to after the
"if (readBytes < 0)" one.

regards, tom lane




Re: warning: dereferencing type-punned pointer

2024-07-24 Thread Peter Eisentraut

On 24.07.24 16:05, Tom Lane wrote:

I'm not very thrilled with these changes.  It's not apparent why
your compiler is warning about these usages of IsA and not any other
ones,


I think one difference is that normally IsA is called on a Node * (since 
you call IsA to decide what to cast it to), but in this case it's called 
on a pointer that is already of type ErrorSaveContext *.  You wouldn't 
normally need to call IsA on that, but it comes with the 
SOFT_ERROR_OCCURRED macro.  Another difference is that most nodes are 
dynamically allocated but in this case the ErrorSaveContext object (not 
a pointer to it) is part of another struct, and so I think some 
different aliasing rules might apply, but I'm not sure.


I think here you could just bypass the SOFT_ERROR_OCCURRED macro:

-   if (SOFT_ERROR_OCCURRED(&jsestate->escontext))
+   if (jsestate->escontext.error_occurred)




Re: warning: dereferencing type-punned pointer

2024-07-24 Thread Tom Lane
Peter Eisentraut  writes:
> On 24.07.24 16:05, Tom Lane wrote:
>> I'm not very thrilled with these changes.  It's not apparent why
>> your compiler is warning about these usages of IsA and not any other
>> ones,

> I think one difference is that normally IsA is called on a Node * (since 
> you call IsA to decide what to cast it to), but in this case it's called 
> on a pointer that is already of type ErrorSaveContext *.

Hmm.  But there are boatloads of places where we call IsA on a
pointer of type Expr *, or sometimes other things.  Why aren't
those triggering the same warning?

> I think here you could just bypass the SOFT_ERROR_OCCURRED macro:
> -   if (SOFT_ERROR_OCCURRED(&jsestate->escontext))
> +   if (jsestate->escontext.error_occurred)

Perhaps.  That's a bit sad because it's piercing a layer of
abstraction.  I do not like compiler warnings that can't be
gotten rid of without making the code objectively worse.

regards, tom lane




Re: [18] Policy on IMMUTABLE functions and Unicode updates

2024-07-24 Thread Peter Eisentraut

On 24.07.24 14:20, Robert Haas wrote:

On Wed, Jul 24, 2024 at 12:42 AM Peter Eisentraut  wrote:

Fair enough.  My argument was, that topic is distinct from the topic of
this thread.


OK, that's fair. But I think the solutions are the same: we complain
all the time about glibc and ICU shipping collations and not
versioning them. We shouldn't make the same kinds of mistakes. Even if
ctype is less likely to break things than collations, it still can,
and we should move in the direction of letting people keep the v17
behavior for the foreseeable future while at the same time having a
way that they can also get the new behavior if they want it (and the
new behavior should be the default).


Versioning is possibly part of the answer, but I think it would be 
different versioning from the collation version.


The collation versions are in principle designed to change rarely.  Some 
languages' rules might change once in twenty years, some never.  Maybe 
you have a database mostly in English and a few tables in, I don't know, 
Swedish (unverified examples).  Most of the time nothing happens during 
upgrades, but one time in many years you need to reindex the Swedish 
tables, and the system starts warning you about that as soon as you 
access the Swedish tables.  (Conversely, if you never actually access 
the Swedish tables, then you don't get warned about it.)


If we wanted a similar versioning system for the Unicode updates, it 
would be separate.  We'd write the Unicode version that was current when 
the system catalogs were initialized into, say, a pg_database column. 
And then at run-time, when someone runs say the normalize() function or 
some regular expression character classification, then we check what the 
version of the current compiled-in Unicode tables are, and then we'd 
issue a warning when they are different.


A possible problem is that the Unicode version changes in practice with 
every major PostgreSQL release, so this approach would end up warning 
users after every upgrade.  To avoid that, we'd probably need to keep 
support for multiple Unicode versions around, as has been suggested in 
this thread already.
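As a purely hypothetical sketch of the run-time check described above (all names here -- the
catalog lookup, PG_UNICODE_VERSION, and the function itself -- are invented for illustration):

    static void
    warn_on_unicode_version_mismatch(void)
    {
        const char *dbver = get_database_unicode_version();    /* from pg_database */

        if (strcmp(dbver, PG_UNICODE_VERSION) != 0)
            ereport(WARNING,
                    (errmsg("database was initialized with Unicode %s, but this server provides Unicode %s",
                            dbver, PG_UNICODE_VERSION),
                     errhint("Expression indexes and constraints that depend on Unicode-aware functions may need to be revalidated.")));
    }

Presumably such a check would be issued at most once per session (or per affected function
family) to avoid flooding the log.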






Re: warning: dereferencing type-punned pointer

2024-07-24 Thread Peter Eisentraut

On 24.07.24 20:09, Tom Lane wrote:

Peter Eisentraut  writes:

On 24.07.24 16:05, Tom Lane wrote:

I'm not very thrilled with these changes.  It's not apparent why
your compiler is warning about these usages of IsA and not any other
ones,

I think one difference is that normally IsA is called on a Node * (since
you call IsA to decide what to cast it to), but in this case it's called
on a pointer that is already of type ErrorSaveContext *.

Hmm.  But there are boatloads of places where we call IsA on a
pointer of type Expr *, or sometimes other things.  Why aren't
those triggering the same warning?


It must have to do with the fact that the escontext field in 
JsonExprState has the object inline, not as a pointer.  AIUI, with 
dynamically allocated objects you have more liberties about what type to 
interpret them as than with actually declared objects.


If you change the member to a pointer

-   ErrorSaveContext escontext;
+   ErrorSaveContext *escontext;
 } JsonExprState;

and make the required adjustments elsewhere in the code, the warning 
goes away.


This arrangement would also appear to be more consistent with other 
executor nodes (e.g., ExprState, ExprEvalStep), so it might be worth it 
for consistency in any case.
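Spelled out a little more, the suggested change would look roughly like this (a sketch; the
actual allocation site lives in the JsonExpr initialization code and is only hinted at here):

    typedef struct JsonExprState
    {
        /* ... */
        ErrorSaveContext *escontext;    /* now allocated separately */
    } JsonExprState;

    /* at initialization time */
    jsestate->escontext = makeNode(ErrorSaveContext);

    /* call sites then pass the pointer directly */
    if (SOFT_ERROR_OCCURRED(jsestate->escontext))
        /* ... handle the soft error as before ... */ ;

This keeps the SOFT_ERROR_OCCURRED() abstraction intact while giving the compiler a dynamically
allocated object to reason about.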






Re: warning: dereferencing type-punned pointer

2024-07-24 Thread Tom Lane
BTW, I tried the same experiment of building without
-fno-strict-aliasing using gcc 11.4.1 (from RHEL9).
I see one more warning than Tatsuo-san did:

In file included from verify_heapam.c:18:
verify_heapam.c: In function ‘check_tuple_attribute’:
../../src/include/access/toast_internals.h:37:11: warning: dereferencing type-punned pointer will break strict-aliasing rules [-Wstrict-aliasing]
   37 |         (((toast_compress_header *) (ptr))->tcinfo >> VARLENA_EXTSIZE_BITS)
      |          ~^~~~
verify_heapam.c:1693:24: note: in expansion of macro ‘TOAST_COMPRESS_METHOD’
 1693 |         cmid = TOAST_COMPRESS_METHOD(&toast_pointer);
      |                ^

This looks a bit messy to fix: we surely don't want to pierce
the abstraction TOAST_COMPRESS_METHOD provides.  Perhaps
the toast_pointer local variable could be turned into a union
of struct varatt_external and toast_compress_header, but that
would impose a good deal of notational overhead on the rest
of this function.
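For what it's worth, the union idea might look roughly like this (purely a sketch to show the
notational overhead; the member names are invented, and whether this actually satisfies
-Wstrict-aliasing here is unverified):

    union
    {
        struct varatt_external pointer;
        toast_compress_header compress_hdr;
    }           toast_pointer;

    VARATT_EXTERNAL_GET_POINTER(toast_pointer.pointer, attr);
    ...
    /* every later reference in the function has to become toast_pointer.pointer */
    if (VARATT_EXTERNAL_IS_COMPRESSED(toast_pointer.pointer))
        cmid = TOAST_COMPRESS_METHOD(&toast_pointer.pointer);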

The good news is that we get through check-world (although
I didn't try very many build options).

regards, tom lane




Re: warning: dereferencing type-punned pointer

2024-07-24 Thread Tom Lane
Peter Eisentraut  writes:
> If you change the member to a pointer

> -   ErrorSaveContext escontext;
> +   ErrorSaveContext *escontext;
>   } JsonExprState;

> and make the required adjustments elsewhere in the code, the warning 
> goes away.

> This arrangement would also appear to be more consistent with other 
> executor nodes (e.g., ExprState, ExprEvalStep), so it might be worth it 
> for consistency in any case.

+1, makes sense to me.

regards, tom lane




Re: improve performance of pg_dump with many sequences

2024-07-24 Thread Nathan Bossart
I ran Euler's tests again on the v6 patch set.

for i in `seq 1 1`; do psql postgres -c "CREATE SEQUENCE s$i;"; done
time pg_dump -f - -s -d postgres > /dev/null

HEAD:0.607s
0001 + 0002: 0.094s
all patches: 0.094s

Barring additional feedback, I am planning to commit these patches early
next week.

-- 
nathan




Regarding t_cid in Neon heap WAL records

2024-07-24 Thread Muhammad Malik
Neon added a t_cid field to heap WAL records 
https://github.com/yibit/neon-postgresql/blob/main/docs/core_changes.md#add-t_cid-to-heap-wal-records.

However, when replaying the delete log record, it is discarding the combo flag 
and storing the raw cmax on the old tuple 
https://github.com/neondatabase/neon/blob/main/pgxn/neon_rmgr/neon_rmgr.c#L376. 
This will make the tuple header different from what is in the buffer cache if 
the deleted tuple was using a combocid. Similarly, there was no t_cid added for 
the old tuple in xl_neon_heap_update, and it is using the t_cid of the new 
tuple to set cmax on the old tuple during redo_neon_heap_update.

Why is this not a problem when a visibility check is performed on the tuple 
after reading from storage, since it won't get the correct cmin value on the 
old tuple?
Also, what is the need of adding the t_cid of the new tuple in 
xl_neon_heap_update when it is already present in the xl_neon_heap_header? 
Seems like it is sending the same t_cid twice with the update WAL record.
Thanks,
Muhammad


Re: [18] Policy on IMMUTABLE functions and Unicode updates

2024-07-24 Thread Robert Haas
On Wed, Jul 24, 2024 at 1:45 PM Jeff Davis  wrote:
> There's a qualitative difference between a collation update which can
> break your PKs and FKs, and a ctype update which definitely will not.

I don't think that's true. All you need is a unique index on UPPER(somecol).

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: add function argument names to regex* functions.

2024-07-24 Thread Tom Lane
jian he  writes:
> On Fri, Jul 19, 2024 at 5:48 AM Tom Lane  wrote:
>> The larger issue is that contrib/citext offers versions of some of
>> these functions that are meant to be drop-in replacements using
>> citext input.  Hence, we need to add the same parameter names to
>> those functions, or they'll fail to replace some calls.

> citext module, these functions:
> regexp_match()
> regexp_matches()
> regexp_replace()
> regexp_split_to_array()
> regexp_split_to_table()
> were created in contrib/citext/citext--1.4.sql, we can add the CREATE
> OR REPLACE FUNCTION to 1.4.sql.
> but to avoid unintended consequences I just add these to the newly
> created file citext--1.6--1.7.sql,
> to make a version bump.

Yes.  You *have to* do it like that, the shortcut is not an option,
because without an extension update script there is no way to
upgrade an existing installation to the new definition.  Basically,
once we ship a given release of an extension, that script is frozen
in amber.

I haven't heard any further bikeshedding on the argument names,
so I'll move forward with committing this soon.

regards, tom lane




Re: proposal: schema variables

2024-07-24 Thread Pavel Stehule
On Tue, Jul 23, 2024 at 23:41, Laurenz Albe 
wrote:

> On Tue, 2024-07-23 at 16:34 +0200, Laurenz Albe wrote:
> > CREATE VARIABLE command:
> >
> >   This is buggy:
> >
> > CREATE VARIABLE str AS text NOT NULL DEFAULT NULL;
> >
> >   Ugh.
> >
> > SELECT str;
> > ERROR:  null value is not allowed for NOT NULL session variable
> "laurenz.str"
> > DETAIL:  The result of DEFAULT expression is NULL.
> >
> >   Perhaps that is a leftover from the previous coding, but I think there
> need be
> >   no check upon SELECT.  It should be enough to check during CREATE
> VARIABLE and
> >   LET.
>
> I'm having second thoughts about that.
>
> I was thinking of a variable like of a table column, but there is a
> fundamental
> difference: there is a clear moment when a tuple is added (INSERT or
> UPDATE),
> which is the point where a column can be checked for NULL values.
>
> A variable can be SELECTed without having been LET before, in which case it
> has the default value.  But there is no way to test the default value
> before
> the variable is SELECTed.  So while DEFAULT NULL for a non-nullable
> variable
> seems weird, it is no worse than DEFAULT somefunc() for a function that
> returns
> NULL.
>
> So perhaps the behavior I complained about above is actually the right one.
> In the view of that, it doesn't seem necessary to enforce a DEFAULT value
> for
> a NOT NULL variable: NOT NULL might just as well mean "you have to LET it
> before
> you can SELECT it".
>

exactly


>
> > IMMUTABLE variables:
> >
> > +   
> > +IMMUTABLE
> > +
> > + 
> > +  The assigned value of the session variable can not be changed.
> > +  Only if the session variable doesn't have a default value, a
> single
> > +  initialization is allowed using the LET
> command. Once
> > +  done, no further change is allowed until end of transaction
> > +  if the session variable was created with clause ON
> TRANSACTION
> > +  END RESET, or until reset of all session variables
> by
> > +  DISCARD VARIABLES, or until reset of all
> session
> > +  objects by command DISCARD ALL.
> > + 
> > +
> > +   
> >
> >   I can see the usefulness of IMMUTABLE variables, but I am surprised
> that
> >   they are reset by DISCARD.  What is the use case you have in mind?
> >   The use case I can envision is an application that sets a value right
> after
> >   authentication, for use with row-level security.  But then it would be
> harmful
> >   if the user could reset the variable with DISCARD.
>
> I'm beginning to be uncertain about that as well.  You might want to use a
> connection pool, and you LET the variable when you take it out of the pool.
> When the session is returned to the pool, variables get DISCARDed.
>
> Sure, a user can call DISCARD, but only if he or she is in an interactive
> session.
>
> So perhaps it is good as it is.
>

I think this design should work. There are a lot of scenarios where
session variables can be used well, and sure, there will be scenarios where
they don't work well, but for now I think it is a good balance between
usability, complexity and code complexity. There are a lot of lines, but
the code is mostly quite simple.

Regards

Pavel


>
> Yours,
> Laurenz Albe
>


Re: [18] Policy on IMMUTABLE functions and Unicode updates

2024-07-24 Thread Jeff Davis
On Wed, 2024-07-24 at 14:47 -0400, Robert Haas wrote:
> On Wed, Jul 24, 2024 at 1:45 PM Jeff Davis  wrote:
> > There's a qualitative difference between a collation update which
> > can
> > break your PKs and FKs, and a ctype update which definitely will
> > not.
> 
> I don't think that's true. All you need is a unique index on
> UPPER(somecol).

Primary keys are on plain column references, not expressions; and don't
support WHERE clauses, so I don't see how a ctype update would affect a
PK.

In any case, you are correct that Unicode updates could put some
constraints at risk, including unique indexes, CHECK, and partition
constraints. But someone has to actually use one of the affected
functions somewhere, and that's the main distinction that I'm trying to
draw.

The reason why collation is qualitatively a much bigger problem is
because there's no obvious indication that you are doing anything
related to collation at all. A very plain "CREATE TABLE x(t text
PRIMARY KEY)" is at risk.

Regards,
Jeff Davis





Re: [18] Policy on IMMUTABLE functions and Unicode updates

2024-07-24 Thread Robert Haas
On Wed, Jul 24, 2024 at 3:12 PM Jeff Davis  wrote:
> In any case, you are correct that Unicode updates could put some
> constraints at risk, including unique indexes, CHECK, and partition
> constraints. But someone has to actually use one of the affected
> functions somewhere, and that's the main distinction that I'm trying to
> draw.
>
> The reason why collation is qualitatively a much bigger problem is
> because there's no obvious indication that you are doing anything
> related to collation at all. A very plain "CREATE TABLE x(t text
> PRIMARY KEY)" is at risk.

Well, I don't know. I agree that collation is a much bigger problem,
but not for that reason. I think a user who is familiar with the
problems in this area will see the danger either way, and one who
isn't, won't. For me, the only real difference is that a unique index
on a text column is a lot more common than one that involves UPPER.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: CI, macports, darwin version problems

2024-07-24 Thread Joe Conway

On 7/23/24 10:44, Joe Conway wrote:

I guess if all else fails I will have to get the mac mini with more
built in storage in order to accommodate sonoma.


I *think* I finally have it in a good place. I replaced the nvme 
enclosure that I bought the other day (which had a 10G interface speed) 
with a new one (which has 40G rated speed). The entire ~/.tart directory 
is a symlink to /Volumes/extnvme. The last two runs completed 
successfully and at about the same speed as the PGX macmini does.


Let me know if you see any issues.

--
Joe Conway
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com





Re: Incremental backup from a streaming replication standby fails

2024-07-24 Thread Robert Haas
On Wed, Jul 24, 2024 at 6:46 AM Laurenz Albe  wrote:
>   An incremental backup is only possible if replay would begin from a later
>   checkpoint than the checkpoint that started the previous backup upon which
>   it depends.

My concern here is that the previous backup might have been taken on a
standby, and therefore it did not start with a checkpoint. For a
standby backup, replay will begin from a checkpoint record, but that
record may be quite a bit earlier in the WAL. For instance, imagine
checkpoint_timeout is set to 30 minutes on the standby. When the
backup is taken, the most recent restartpoint could be up to 30
minutes ago -- and it is the checkpoint record for that restartpoint
from which replay will begin. I think that in my phrasing, it's always
about the checkpoint from which replay would begin (which is always
well-defined) not the checkpoint that started the backup (which is
only logical on the primary).

>  If you take the incremental backup on the primary, this
>   condition is always satisfied, because each backup triggers a new
>   checkpoint.  On a standby, replay begins from the most recent restartpoint.
>   Therefore, an incremental backup of a standby server can fail if there has
>   been very little activity since the previous backup, since no new
>   restartpoint might have been created.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Regarding t_cid in Neon heap WAL records

2024-07-24 Thread Heikki Linnakangas

On 24/07/2024 21:44, Muhammad Malik wrote:
Neon added a t_cid field to heap WAL records 
https://github.com/yibit/neon-postgresql/blob/main/docs/core_changes.md#add-t_cid-to-heap-wal-records .


This isn't really a pgsql-hackers topic, so I've opened an issue on the 
neon github issue tracker for this: 
https://github.com/neondatabase/neon/issues/8499. (Thanks for the report 
though!)


--
Heikki Linnakangas
Neon (https://neon.tech)





Re: [18] Policy on IMMUTABLE functions and Unicode updates

2024-07-24 Thread Jeremy Schneider
On Wed, Jul 24, 2024 at 12:47 PM Robert Haas  wrote:

On Wed, Jul 24, 2024 at 1:45 PM Jeff Davis  wrote:
> There's a qualitative difference between a collation update which can
> break your PKs and FKs, and a ctype update which definitely will not.

I don't think that's true. All you need is a unique index on UPPER(somecol).


I doubt it’s common to have unique on upper()

But non-unique indexes for case-insensitive searches will be more common.
Historically this is the most common way people did case-insensitive searches
on Oracle.

Changing ctype would mean these queries can return wrong results

The impact would be similar to the critical problem TripAdvisor hit in 2014
with their read replicas, in the Postgres email thread I linked above

-Jeremy


Re: [18] Policy on IMMUTABLE functions and Unicode updates

2024-07-24 Thread Robert Haas
On Wed, Jul 24, 2024 at 3:43 PM Jeremy Schneider
 wrote:
> But non-unique indexes for case insensitive searches will be more common. 
> Historically this is the most common way people did case insensitive on 
> oracle.
>
> Changing ctype would mean these queries can return wrong results

Yeah. I mentioned earlier that I very recently saw a customer query
with UPPER() in the join condition. If someone is doing foo JOIN bar
ON upper(foo.x) = upper(bar.x), it is not unlikely that one or both of
those expressions are indexed. Not guaranteed, of course, but very
plausible.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Convert sepgsql tests to TAP

2024-07-24 Thread Andreas Karlsson

On 7/24/24 6:33 PM, Peter Eisentraut wrote:

On 24.07.24 16:31, Andreas Karlsson wrote:
I took a quick look at the patch and I like that we standardize things 
a bit. But one thing I am not a fan of is all the use of sed and awk 
in the Perl script. I would prefer if that logic happened all in Perl, 
especially since we have some of it in Perl (e.g. chomp). Also I 
wonder if we should not use IPC::Run to do the tests since we already 
depend on it for the other TAP tests.


In principle yes, but here I tried not rewriting the tests too much but 
just port them to a newer environment.  I think the adjustments you 
describe could be done as a second step.


That reasoning makes a lot of sense and I am in agreement. Cleaning that 
up is best for another patch.


And managed to get the tests running on Rocky Linux 9 with both 
autotools and meson and everything work as it should.


So I have two comments:

1) As I said earlier I think we should remove the old code.

2) If we remove the old code I think the launcher script can be merged 
into the TAP test instead of being a separate shell script. But I am 
fine if you think that is also something for a separate commit.


I like this kind of clean up patch. Good work! :)

Andreas




Re: [BUG?] check_exclusion_or_unique_constraint false negative

2024-07-24 Thread Michail Nikolaev
Hello, everyone!

Updates so far:
* issue happens with both LOGGED and UNLOGGED relations
* issue happens with DirtySnapshot
* not happens with SnapshotSelf
* not happens with SnapshotAny
* not related to speculatively inserted tuples - I have commented out the code
that inserts them, and the issue continues to occur.

Best regards,
Mikhail.


Re: DRAFT: Pass sk_attno to consistent function

2024-07-24 Thread Tom Lane
=?utf-8?Q?Micha=C5=82_K=C5=82eczek?=  writes:
> I understand extensibility of GIST makes many operators opaque to the planner 
> and it is the “consistent” function that can perform optimisations (or we can 
> come up with some additional mechanism during planning phase).
> Providing more information to “consistent” function would make it possible to 
> implement optimisations not only for array scan keys but for ranges and 
> others.

> What we can do is to provide the index attribute number (redundantly) as an 
> option - it is going to work but is somewhat ugly - especially since this 
> information is already available when calling the consistent function.

While the proposed change is simple enough and wouldn't break
anything, I share Matthias' distaste for it, because the motivation
you've given for it is deeply unsatisfying and indeed troubling.
A GIST consistent function is surely the wrong place to be optimizing
away unnecessary partitions: that consideration is not unique to
GIST indexes (or even to indexscans), much less to particular opclasses.
Moreover, having a consistent function inspect catalog state seems
like a kluge of the first order, not least because there'd be no good
place to cache the lookup results, so you'd be doing them over and
over again.

I like Matthias' suggestion of seeing whether you could use a
planner support function to split up the query by partitions
during planning.

There are bits of what you mentioned that I would gladly take
patches for --- for example, I think the reason GIST lacks SAOP
support is mostly lack of round tuits, not that it'd be a bad
thing.  But it's not clear to me that that alone would help you
much.  The whole design as you've described it seems like multiple
layers of kluges, so that getting rid of any one of them doesn't
really make it not klugy.  (I also have serious doubts about how well
it would perform for you, even with this additional kluge in place.
I don't think that a multidimensional GIST over zillions of rows will
perform very well: the upper tree layers are just not going to be very
selective.)

regards, tom lane




Re: Convert sepgsql tests to TAP

2024-07-24 Thread Tom Lane
Andreas Karlsson  writes:
> 1) As I said earlier I think we should remove the old code.

I agree that carrying two versions of the test doesn't seem great.
However, a large part of the purpose of test_sepgsql is to help
people debug their sepgsql setup, which is why it goes to great
lengths to print helpful error messages.  I'm worried that making
it into a TAP test will degrade the usefulness of that, simply
because the TAP infrastructure is pretty damn unfriendly when it
comes to figuring out why a test failed.  You have to know where
to even look for the test logfile, and then you have to ignore
a bunch of useless-to-you chatter.  I'm not sure if there is much
we can do to improve that.  (Although if we could, it would
yield benefits across the whole tree.)

OTOH, I suspect there are so few people using sepgsql that this
doesn't matter too much.  Probably most of them will be advanced
hackers who won't blink at digging through a TAP log.  We should
update the docs to explain that though.

regards, tom lane




Re: Sporadic connection-setup-related test failures on Cygwin in v15-

2024-07-24 Thread Thomas Munro
On Thu, Jul 25, 2024 at 1:00 AM Alexander Lakhin  wrote:
> The important fact here is that this failure is not reproduced after
> 7389aad63 (in v16), so it seems that it's somehow related to signal
> processing. Given that, I'm inclined to stop here, without digging deeper,
> at least until there are plans to backport that fix or something...

+1.  I'm not planning to back-patch that work.  Perhaps lorikeet
could stop testing releases < 16?  They don't work and it's not our
bug[1].  We decided not to drop Cygwin support[2], but I don't think
we're learning anything from investigating that noise in the
known-broken branches.

[1] https://sourceware.org/legacy-ml/cygwin/2017-08/msg00048.html
[2] 
https://www.postgresql.org/message-id/5e6797e9-bc26-ced7-6c9c-59bca415598b%40dunslane.net




Re: Convert sepgsql tests to TAP

2024-07-24 Thread Andreas Karlsson

On 7/24/24 10:35 PM, Tom Lane wrote:

Andreas Karlsson  writes:

1) As I said earlier I think we should remove the old code.


I agree that carrying two versions of the test doesn't seem great.
However, a large part of the purpose of test_sepgsql is to help
people debug their sepgsql setup, which is why it goes to great
lengths to print helpful error messages.  I'm worried that making
it into a TAP test will degrade the usefulness of that, simply
because the TAP infrastructure is pretty damn unfriendly when it
comes to figuring out why a test failed.  You have to know where
to even look for the test logfile, and then you have to ignore
a bunch of useless-to-you chatter.  I'm not sure if there is much
we can do to improve that.  (Although if we could, it would
yield benefits across the whole tree.)


For me personally, the output when running it with meson was good 
enough, while the output when running with autotools was usable but 
annoying to work with. Meson's integration with TAP is pretty good. But 
with that said, I am a power user and developer used to both meson and 
autotools. It is unclear what skill level we should expect from the 
target audience of test_sepgsql.


Andreas




Re: Remove dependence on integer wrapping

2024-07-24 Thread Matthew Kim
On Mon, Jul 22, 2024 at 5:52 PM Alexander Lakhin

wrote:

> Also there are several trap-producing cases with date types:
> SELECT make_date(-2147483648, 1, 1);

Hi, I’ve attached a patch that fixes the make_date overflow. I’ve updated
the patches accordingly to reflect Joseph’s committed fix.

Thank you,
Matthew Kim


v16-0004-Handle-negative-years-overflow-in-make_date.patch
Description: Binary data


v16-0002-Remove-dependence-on-integer-wrapping-for-jsonb.patch
Description: Binary data


v16-0001-Remove-dependence-on-integer-wrapping.patch
Description: Binary data


v16-0003-Handle-overflows-in-do_to_timestamp.patch
Description: Binary data


Re: CI, macports, darwin version problems

2024-07-24 Thread Thomas Munro
On Thu, Jul 25, 2024 at 7:25 AM Joe Conway  wrote:
> I *think* I finally have it in a good place. I replaced the nvme
> enclosure that I bought the other day (which had a 10G interface speed)
> with a new one (which has 40G rated speed). The entire ~/.tart directory
> is a symlink to /Volumes/extnvme. The last two runs completed
> successfully and at about the same speed as the PGX macmini does.

Looking good!  Thanks.  I have now pushed the patch to switch CI to
Sonoma, back-patched as far as 15.  Let's see how that goes.  I have
also paused the pgx machine for now, until Christophe is available to
help us fix it.




Re: XID formatting and SLRU refactorings (was: Add 64-bit XIDs into PostgreSQL 15)

2024-07-24 Thread Michael Paquier
On Wed, Jul 24, 2024 at 06:00:59AM -0700, Noah Misch wrote:
> I'm still seeing need for s/int/int64/ at:

Nice catches!  I've missed these.

> - "pagesegno" variable
> - return value of MultiXactIdToOffsetSegment()

Only used in four places for two elog(DEBUG1) entries with %x.

> - return value of MXOffsetToMemberSegment()

Also used in four places for two elog(DEBUG1) entries with %x, plus
three callers in PerformMembersTruncation(), nothing fancy.

> Only the first should be a live bug, since multixact isn't electing the higher
> pageno ceiling.

Yes, and it makes a switch to long segment names everywhere easier.
There is a patch in the air to do that, without the pg_upgrade
additions required to do the transfer, though.

I am attaching a patch for all these you have spotted, switching the
logs to use %llx.  Does that look fine for you?
--
Michael
From d2bae5b020ab6a5182a52f4a8e58a39a7d3a5947 Mon Sep 17 00:00:00 2001
From: Michael Paquier 
Date: Thu, 25 Jul 2024 10:47:51 +0900
Subject: [PATCH] Fix a couple of extra spots missing the int->int64 switch for
 SLRU pages

---
 src/backend/access/transam/multixact.c | 39 +-
 src/backend/access/transam/slru.c  |  2 +-
 2 files changed, 21 insertions(+), 20 deletions(-)

diff --git a/src/backend/access/transam/multixact.c b/src/backend/access/transam/multixact.c
index 675affe4f7..b7b47ef076 100644
--- a/src/backend/access/transam/multixact.c
+++ b/src/backend/access/transam/multixact.c
@@ -120,7 +120,7 @@ MultiXactIdToOffsetEntry(MultiXactId multi)
return multi % MULTIXACT_OFFSETS_PER_PAGE;
 }
 
-static inline int
+static inline int64
 MultiXactIdToOffsetSegment(MultiXactId multi)
 {
return MultiXactIdToOffsetPage(multi) / SLRU_PAGES_PER_SEGMENT;
@@ -174,7 +174,7 @@ MXOffsetToMemberPage(MultiXactOffset offset)
return offset / MULTIXACT_MEMBERS_PER_PAGE;
 }
 
-static inline int
+static inline int64
 MXOffsetToMemberSegment(MultiXactOffset offset)
 {
return MXOffsetToMemberPage(offset) / SLRU_PAGES_PER_SEGMENT;
@@ -3039,10 +3039,10 @@ SlruScanDirCbFindEarliest(SlruCtl ctl, char *filename, int64 segpage, void *data
 static void
 PerformMembersTruncation(MultiXactOffset oldestOffset, MultiXactOffset newOldestOffset)
 {
-   const int   maxsegment = MXOffsetToMemberSegment(MaxMultiXactOffset);
-   int startsegment = MXOffsetToMemberSegment(oldestOffset);
-   int endsegment = MXOffsetToMemberSegment(newOldestOffset);
-   int segment = startsegment;
+   const int64 maxsegment = MXOffsetToMemberSegment(MaxMultiXactOffset);
+   int64   startsegment = MXOffsetToMemberSegment(oldestOffset);
+   int64   endsegment = MXOffsetToMemberSegment(newOldestOffset);
+   int64   segment = startsegment;
 
/*
 * Delete all the segments but the last one. The last segment can still
@@ -3050,7 +3050,8 @@ PerformMembersTruncation(MultiXactOffset oldestOffset, MultiXactOffset newOldest
 */
while (segment != endsegment)
{
-   elog(DEBUG2, "truncating multixact members segment %x", segment);
+   elog(DEBUG2, "truncating multixact members segment %llx",
+(unsigned long long) segment);
SlruDeleteSegment(MultiXactMemberCtl, segment);
 
/* move to next segment, handling wraparound correctly */
@@ -3201,14 +3202,14 @@ TruncateMultiXact(MultiXactId newOldestMulti, Oid newOldestMultiDB)
}
 
elog(DEBUG1, "performing multixact truncation: "
-"offsets [%u, %u), offsets segments [%x, %x), "
-"members [%u, %u), members segments [%x, %x)",
+"offsets [%u, %u), offsets segments [%llx, %llx), "
+"members [%u, %u), members segments [%llx, %llx)",
 oldestMulti, newOldestMulti,
-MultiXactIdToOffsetSegment(oldestMulti),
-MultiXactIdToOffsetSegment(newOldestMulti),
+(unsigned long long) MultiXactIdToOffsetSegment(oldestMulti),
+(unsigned long long) MultiXactIdToOffsetSegment(newOldestMulti),
 oldestOffset, newOldestOffset,
-MXOffsetToMemberSegment(oldestOffset),
-MXOffsetToMemberSegment(newOldestOffset));
+(unsigned long long) MXOffsetToMemberSegment(oldestOffset),
+(unsigned long long) MXOffsetToMemberSegment(newOldestOffset));
 
/*
 * Do truncation, and the WAL logging of the truncation, in a critical
@@ -3461,14 +3462,14 @@ multixact_redo(XLogReaderState *record)
   SizeOfMultiXactTruncate);
 
elog(DEBUG1, "replaying multixact truncation: "
-"offsets [%u, %u), offsets segments [%x, %x), "
-"members [%u, %u), members segments [%x, %x)",
+"offsets

Re: [PATCH] Add additional extended protocol commands to psql: \parse and \bind

2024-07-24 Thread Michael Paquier
On Wed, Jul 24, 2024 at 05:33:07PM +0200, Peter Eisentraut wrote:
> This commit message confused me, because I don't think this is what the
> \bindx command actually does.  AFAICT, it only binds, it does not execute.
> At least that is what the documentation in the content of the patch appears
> to indicate.

Yep.  FWIW, I always edit these before commit, and noticed that it was
incorrect.  Just took the original message for now.

> I'm not sure \bindx is such a great name.  The "x" stands for "I ran out of
> ideas". ;-)  Maybe \bind_named or \bindn or something like that.  Or use the
> existing \bind with a -name argument?

Not sure that I like much the additional option embedded in the
existing command; I'd rather keep a separate command for each libpq
call, that seems cleaner.  So I would be OK with your suggested
\bind_named.  Fine by me to be outvoted, of course.
--
Michael


signature.asc
Description: PGP signature


Re: CI, macports, darwin version problems

2024-07-24 Thread Thomas Munro
On Thu, Jul 25, 2024 at 11:35 AM Thomas Munro  wrote:
> Looking good!  Thanks.  I have now pushed the patch to switch CI to
> Sonoma, back-patched as far as 15.  Let's see how that goes.  I have
> also paused the pgx machine for now, until Christophe is available to
> help us fix it.

Cfbot builds are working nicely on Sonoma.

But... unfortunately the github.com/postgres/postgres CI was working
for REL_15_STABLE only, and not 16, 17 or master.  After scratching my
head for a moment, I realised that the new logic for finding the most
recent version of MacPorts is now picking up a new beta version of
MacPorts that has just been published, and apparently it doesn't work
quite right and couldn't install meson.  D'oh!

I have pushed a fix for that: I went back to requesting 2.9.3 until we
are ready to change it.

Long version: I had imagined we might one day need to nail down the
version in a commented out line already, I just failed to think about
non-release versions appearing in the list.  An alternative might be
to use a pattern that matches stable releases eg [0-9][0-9.]* to
exclude stuff like 2.10-beta1 but for now I have cold feet about that
lack of control.  While thinking about that, it occurred to me that it
might be better if it also re-installs from scratch whenever our
script that installs MacPorts changes, so I included its md5 in the
cache key.  Otherwise it's a bit hard to test, since the cached
installation survives across rebuilds by design (that's why cfbot
isn't affected, it installed 2.9.3 before they unleashed 2.10-beta1
and cached the result).  This way, if someone changes 2.9.3 to
2.whatever in that script, it'll do a fresh installation of MacPorts
on the first build in a given github account, and then later builds
will work from the cached copy.  Seems like desirable behaviour for
future maintenance work.




Re: optimizing pg_upgrade's once-in-each-database steps

2024-07-24 Thread Nathan Bossart
On Mon, Jul 22, 2024 at 03:07:10PM -0500, Nathan Bossart wrote:
> Here is a new patch set.  I've included the latest revision of the patch to
> fix get_db_subscription_count() from the other thread [0] as 0001 since I
> expect that to be committed soon.  I've also moved the patch that moves the
> "live_check" variable to "user_opts" to 0002 since I plan on committing
> that sooner than later, too.  Otherwise, I've tried to address all feedback
> provided thus far.

Here is just the live_check patch.  I rebased it, gave it a commit message,
and fixed a silly mistake.  Barring objections or cfbot indigestion, I plan
to commit this within the next couple of days.

-- 
nathan
From 963ca637e9b2481eb762de35b4d4eaa30faf5f47 Mon Sep 17 00:00:00 2001
From: Nathan Bossart 
Date: Wed, 24 Jul 2024 21:55:15 -0500
Subject: [PATCH v6 1/1] pg_upgrade: Move live_check variable to user_opts.

At the moment, pg_upgrade stores whether it is doing a "live check"
(i.e., the user specified --check and the old server is still
running) in a variable scoped to main().  This live_check variable
is passed to several functions.  To further complicate matters, a
few calls to said functions provide a hard-coded "false" as the
live_check argument.  Specifically, this is done when calling these
functions for the new cluster, for which any live-check-only paths
won't apply.

This commit moves the live_check variable to the global user_opts
variable, which stores information about the options the user
provided on the command line.  This allows us to remove the
"live_check" parameter from several functions.  For the functions
with callers that provide a hard-coded "false" as the live_check
argument (e.g., get_control_data()), we simply verify the given
cluster is the old cluster before taking any live-check-only paths.

This small refactoring effort aims to simplify proposed future
commits that would parallelize many of pg_upgrade's
once-in-each-database tasks using libpq's asynchronous APIs.
Specifically, by removing the live_check parameter, we can more
easily convert the functions to callbacks for the new parallel
system.

Reviewed-by: Daniel Gustafsson
Discussion: https://postgr.es/m/20240516211638.GA1688936%40nathanxps13
---
 src/bin/pg_upgrade/check.c   | 30 +++---
 src/bin/pg_upgrade/controldata.c |  3 ++-
 src/bin/pg_upgrade/info.c| 12 +---
 src/bin/pg_upgrade/option.c  |  4 ++--
 src/bin/pg_upgrade/pg_upgrade.c  | 21 ++---
 src/bin/pg_upgrade/pg_upgrade.h  | 13 +++--
 6 files changed, 41 insertions(+), 42 deletions(-)

diff --git a/src/bin/pg_upgrade/check.c b/src/bin/pg_upgrade/check.c
index 51e30a2f23..5038231731 100644
--- a/src/bin/pg_upgrade/check.c
+++ b/src/bin/pg_upgrade/check.c
@@ -29,7 +29,7 @@ static void check_for_new_tablespace_dir(void);
 static void check_for_user_defined_encoding_conversions(ClusterInfo *cluster);
 static void check_new_cluster_logical_replication_slots(void);
 static void check_new_cluster_subscription_configuration(void);
-static void check_old_cluster_for_valid_slots(bool live_check);
+static void check_old_cluster_for_valid_slots(void);
 static void check_old_cluster_subscription_state(void);
 
 /*
@@ -555,9 +555,9 @@ fix_path_separator(char *path)
 }
 
 void
-output_check_banner(bool live_check)
+output_check_banner(void)
 {
-   if (user_opts.check && live_check)
+   if (user_opts.live_check)
{
pg_log(PG_REPORT,
   "Performing Consistency Checks on Old Live Server\n"
@@ -573,18 +573,18 @@ output_check_banner(bool live_check)
 
 
 void
-check_and_dump_old_cluster(bool live_check)
+check_and_dump_old_cluster(void)
 {
/* -- OLD -- */
 
-   if (!live_check)
+   if (!user_opts.live_check)
start_postmaster(&old_cluster, true);
 
/*
	 * Extract a list of databases, tables, and logical replication slots from
 * the old cluster.
 */
-   get_db_rel_and_slot_infos(&old_cluster, live_check);
+   get_db_rel_and_slot_infos(&old_cluster);
 
init_tablespaces();
 
@@ -605,7 +605,7 @@ check_and_dump_old_cluster(bool live_check)
	 * Logical replication slots can be migrated since PG17. See comments
 * atop get_old_cluster_logical_slot_infos().
 */
-   check_old_cluster_for_valid_slots(live_check);
+   check_old_cluster_for_valid_slots();
 
/*
	 * Subscriptions and their dependencies can be migrated since PG17.
@@ -669,7 +669,7 @@ check_and_dump_old_cluster(bool live_check)
if (!user_opts.check)
generate_old_dump();
 
-   if (!live_check)
+   if (!user_opts.live_check)
stop_postmaster(false);
 }
 
@@ -677,7 +677,7 @@ check_and_dump_old_cluster(bool live_check)
 void
 check_new_cluster(void)
 {
-   get_db_rel_and_slot_infos(&new_cluster, false);
+   get_

Re: Sporadic connection-setup-related test failures on Cygwin in v15-

2024-07-24 Thread Alexander Lakhin

24.07.2024 23:58, Thomas Munro wrote:

+1.  I'm not planning to back-patch that work.  Perhaps lorikeet
could stop testing releases < 16?  They don't work and it's not our
bug[1].  We decided not to drop Cygwin support[2], but I don't think
we're learning anything from investigating that noise in the
known-broken branches.


Yeah, it looks like lorikeet votes +[1] for your proposal.
(I suppose it failed due to the same signal processing issue, just another
way.)

[1] 
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=lorikeet&dt=2024-07-24%2008%3A54%3A07

Best regards,
Alexander



