Re: Revert: Remove useless self-joins *and* -DREALLOCATE_BITMAPSETS make server crash, regress test fail.

2024-05-06 Thread David Rowley
On Tue, 7 May 2024 at 17:28, Tom Lane  wrote:
>
> David Rowley  writes:
> > Yeah, before the revert, that did:
> > -   sjinf->syn_lefthand = replace_relid(sjinf->syn_lefthand, relid, 
> > subst);
> > That replace code seems to have always done a bms_copy()
>
> Hmm, not always; see e0477837c.

It was the discussion on that thread that led to the invention of
REALLOCATE_BITMAPSETS.

> What I'm trying to figure out here is whether we have a live bug
> in this area in released branches; and if so, why we've not seen
> reports of that.

We could check what portions of REALLOCATE_BITMAPSETS are
backpatchable. It may not be applicable very far back because of v16's
00b41463c. The bms_del_member() would have left a zero set rather than
doing bms_free() prior to that commit.  There could be a bug in v16.

David




Re: Use pgstat_kind_infos to read fixed shared stats structs

2024-05-06 Thread Tristan Partin

On Mon May 6, 2024 at 9:50 PM CDT, Michael Paquier wrote:

On Mon, May 06, 2024 at 02:07:53PM -0500, Tristan Partin wrote:
> This was originally thought of by Andres in
> 5891c7a8ed8f2d3d577e7eea34dacff12d7b6bbd.

+1 because you are removing a duplication between the order of the
items in PgStat_Kind and the order when these are read.  I suspect
that nobody would mess up with the order if adding a stats kind with a
fixed number of objects, but that makes maintenance slightly easier in
the long-term :)

> Not a fix, per se, but it does remove a comment. Perhaps the discussion will
> just lead to someone deleting the comment, and keeping the code as is.
> Either way, a win :).

Yup.  Let's leave that as something to do for v18.


Thanks for the feedback Michael. Should I just throw the patch in the 
next commitfest so it doesn't get left behind?


--
Tristan Partin
Neon (https://neon.tech)




Re: Revert: Remove useless self-joins *and* -DREALLOCATE_BITMAPSETS make server crash, regress test fail.

2024-05-06 Thread Tom Lane
David Rowley  writes:
> Yeah, before the revert, that did:
> -   sjinf->syn_lefthand = replace_relid(sjinf->syn_lefthand, relid, 
> subst);
> That replace code seems to have always done a bms_copy()

Hmm, not always; see e0477837c.

What I'm trying to figure out here is whether we have a live bug
in this area in released branches; and if so, why we've not seen
reports of that.

regards, tom lane




Re: Revert: Remove useless self-joins *and* -DREALLOCATE_BITMAPSETS make server crash, regress test fail.

2024-05-06 Thread David Rowley
On Tue, 7 May 2024 at 15:35, jian he  wrote:
> i also found that not specifying c_args: `-DREALLOCATE_BITMAPSETS`,
> the regress test won't fail.

It would be good to get some build farm coverage of this so we don't
have to rely on manual testing.  I wonder if it's a good idea to just
define REALLOCATE_BITMAPSETS whenever CLOBBER_FREED_MEMORY is defined,
or if we should ask on the buildfarm-members list whether someone
wouldn't mind defining it?
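
As a rough sketch of the first idea (purely illustrative; the exact spelling
and location, say next to where CLOBBER_FREED_MEMORY is handled, are
assumptions and not a committed change):

/*
 * Sketch only: turn on REALLOCATE_BITMAPSETS automatically whenever
 * CLOBBER_FREED_MEMORY is in effect, so that assert-enabled builds and
 * buildfarm animals exercise it without needing extra c_args.
 */
#if defined(CLOBBER_FREED_MEMORY) && !defined(REALLOCATE_BITMAPSETS)
#define REALLOCATE_BITMAPSETS
#endif

That would trade some extra overhead in assert-enabled builds for coverage of
stale-bitmapset bugs like the one in this thread.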

David




Re: Revert: Remove useless self-joins *and* -DREALLOCATE_BITMAPSETS make server crash, regress test fail.

2024-05-06 Thread David Rowley
On Tue, 7 May 2024 at 17:11, David Rowley  wrote:
> sjinf->syn_lefthand = bms_del_member(sjinf->syn_lefthand, relid);
>
> I've not looked, but I assumed the revert must have removed some
> common code that was added and reverted with SJE.

Yeah, before the revert, that did:

-   sjinf->syn_lefthand = replace_relid(sjinf->syn_lefthand, relid, subst);

That replace code seems to have always done a bms_copy()

-static Bitmapset *
-replace_relid(Relids relids, int oldId, int newId)
-{
-   if (oldId < 0)
-   return relids;
-
-   /* Delete relid without substitution. */
-   if (newId < 0)
-   return bms_del_member(bms_copy(relids), oldId);
-
-   /* Substitute newId for oldId. */
-   if (bms_is_member(oldId, relids))
-   return bms_add_member(bms_del_member(bms_copy(relids), oldId), newId);
-
-   return relids;
-}


David




Re: Revert: Remove useless self-joins *and* -DREALLOCATE_BITMAPSETS make server crash, regress test fail.

2024-05-06 Thread David Rowley
On Tue, 7 May 2024 at 17:01, Tom Lane  wrote:
>
> Richard Guo  writes:
> > Thank you for the report.  I looked at this a little bit and I think
> > here is what happened.  In deconstruct_distribute_oj_quals we call
> > distribute_quals_to_rels using the uncopied sjinfo->syn_lefthand as
> > outerjoin_nonnullable, which eventually becomes rinfo->outer_relids.
> > Later on, when we remove useless left joins, we modify
> > sjinfo->syn_lefthand using bms_del_member and recycle it.  That causes
> > rinfo->outer_relids to become invalid, which finally triggers this issue
> > in join_clause_is_movable_to.
>
> Hmm, the SJE code didn't really touch any of this logic, so why
> didn't we see the failure before?

The bms_free() occurs in remove_rel_from_query() on:

sjinf->syn_lefthand = bms_del_member(sjinf->syn_lefthand, relid);

I've not looked, but I assumed the revert must have removed some
common code that was added and reverted with SJE.

David




Re: Revert: Remove useless self-joins *and* -DREALLOCATE_BITMAPSETS make server crash, regress test fail.

2024-05-06 Thread Tom Lane
Richard Guo  writes:
> Thank you for the report.  I looked at this a little bit and I think
> here is what happened.  In deconstruct_distribute_oj_quals we call
> distribute_quals_to_rels using the uncopied sjinfo->syn_lefthand as
> outerjoin_nonnullable, which eventually becomes rinfo->outer_relids.
> Later on, when we remove useless left joins, we modify
> sjinfo->syn_lefthand using bms_del_member and recycle it.  That causes
> rinfo->outer_relids to become invalid, which finally triggers this issue
> in join_clause_is_movable_to.

Hmm, the SJE code didn't really touch any of this logic, so why
didn't we see the failure before?

regards, tom lane




Re: Revert: Remove useless self-joins *and* -DREALLOCATE_BITMAPSETS make server crash, regress test fail.

2024-05-06 Thread David Rowley
On Tue, 7 May 2024 at 16:47, Richard Guo  wrote:
> --- a/src/backend/optimizer/plan/initsplan.c
> +++ b/src/backend/optimizer/plan/initsplan.c
> @@ -1888,7 +1888,7 @@ deconstruct_distribute_oj_quals(PlannerInfo *root,
> qualscope = bms_union(sjinfo->syn_lefthand, sjinfo->syn_righthand);
> qualscope = bms_add_member(qualscope, sjinfo->ojrelid);
> ojscope = bms_union(sjinfo->min_lefthand, sjinfo->min_righthand);
> -   nonnullable_rels = sjinfo->syn_lefthand;
> +   nonnullable_rels = bms_copy(sjinfo->syn_lefthand);

I was busy looking at this too and I came to the same conclusion.

David




Re: Revert: Remove useless self-joins *and* -DREALLOCATE_BITMAPSETS make server crash, regress test fail.

2024-05-06 Thread Richard Guo
On Tue, May 7, 2024 at 11:35 AM jian he  wrote:

> hi,
>
> SELECT table_name, column_name, is_updatable
>   FROM information_schema.columns
>  WHERE table_name LIKE E'r_\\_view%'
>  ORDER BY table_name, ordinal_position;
>
> At d1d286d83c0eed695910cb20d970ea9bea2e5001,
> this query from src/test/regress/sql/updatable_views.sql
> makes the regress tests fail.  Other queries may fail too,
> but this is the first one that triggers the server crash.


Thank you for the report.  I looked at this a little bit and I think
here is what happened.  In deconstruct_distribute_oj_quals we call
distribute_quals_to_rels using the uncopied sjinfo->syn_lefthand as
outerjoin_nonnullable, which eventually becomes rinfo->outer_relids.
Later on, when we remove useless left joins, we modify
sjinfo->syn_lefthand using bms_del_member and recycle it.  That causes
rinfo->outer_relids to become invalid, which finally triggers this issue
in join_clause_is_movable_to.

Maybe we want to bms_copy sjinfo->syn_lefthand first before using it as
nonnullable_rels.

--- a/src/backend/optimizer/plan/initsplan.c
+++ b/src/backend/optimizer/plan/initsplan.c
@@ -1888,7 +1888,7 @@ deconstruct_distribute_oj_quals(PlannerInfo *root,
qualscope = bms_union(sjinfo->syn_lefthand, sjinfo->syn_righthand);
qualscope = bms_add_member(qualscope, sjinfo->ojrelid);
ojscope = bms_union(sjinfo->min_lefthand, sjinfo->min_righthand);
-   nonnullable_rels = sjinfo->syn_lefthand;
+   nonnullable_rels = bms_copy(sjinfo->syn_lefthand);
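
To illustrate why the uncopied set is dangerous, here is a minimal standalone
sketch of the aliasing pattern (plain C with a toy stand-in for Bitmapset, not
the actual planner code):

#include <stdio.h>
#include <stdlib.h>

/* Toy stand-in for a Bitmapset: a heap-allocated struct of member flags. */
typedef struct FakeSet
{
	int		nmembers;
	int		members[8];
} FakeSet;

/* Models bms_del_member() on v16 and later: frees the set once it is empty. */
static FakeSet *
fake_del_member(FakeSet *set, int x)
{
	if (set->members[x])
	{
		set->members[x] = 0;
		set->nmembers--;
	}
	if (set->nmembers == 0)
	{
		free(set);				/* any other pointer to the set now dangles */
		return NULL;
	}
	return set;
}

int
main(void)
{
	FakeSet    *syn_lefthand = calloc(1, sizeof(FakeSet));

	syn_lefthand->members[3] = 1;
	syn_lefthand->nmembers = 1;

	/* Like distribute_quals_to_rels(): store the pointer without copying. */
	FakeSet    *outer_relids = syn_lefthand;

	/* Like left-join removal: delete the rel; the shared set gets freed. */
	syn_lefthand = fake_del_member(syn_lefthand, 3);

	if (syn_lefthand == NULL)
		printf("set freed; outer_relids is now a dangling pointer\n");

	(void) outer_relids;
	return 0;
}

Under -DREALLOCATE_BITMAPSETS the real bms_del_member() moves the set on every
modification, so a stale alias like this ends up pointing at freed memory and
trips the bms_is_valid_set() assertion seen in the report.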

I will take a closer look in the afternoon.

Thanks
Richard


Re: [PATCH] Fix bug when calling strncmp in check_authmethod_valid

2024-05-06 Thread Jingxian Li
Hi Daniel,
Thank you for explaining the ins and outs of this problem.

On 2024/4/30 17:14, Daniel Gustafsson wrote:
>> On 30 Apr 2024, at 04:41, Jingxian Li  wrote:
>
> > Attached is a patch that fixes a bug in a strncmp() call, where the
> > third argument (authmethod - strchr(authmethod, ' '))
> > may be negative, which is not as expected.
>
> The calculation is indeed incorrect, but the lack of complaints about it being
> broken made me wonder why this exists in the first place.  This dates back to
> e7029b212755, just shy of 2 decades old, which added --auth with support for
> strings with auth-options to ident and pam like --auth 'pam ' and
> 'ident sameuser'.  Support for options to ident was removed in 01c1a12a5bb4,
> but options to pam are still supported (although not documented); this was
> AFAICT broken in commit 8a02339e9ba3 some 12 years ago with this strncmp().
>
> -  if (strncmp(authmethod, *p, (authmethod - strchr(authmethod, ' '))) == 0)
> +  if (strncmp(authmethod, *p, (strchr(authmethod, ' ') - authmethod)) == 0)
>
> This will compare "pam postgresql" with "pam" and not "pam ", so the length
> should be "(strchr(authmethod, ' ') - authmethod + 1)", since "pam " is a
> method separate from "pam" in auth_methods_{host|local}.  We don't want to
> allow "md5 " as that's not a method in the array of valid methods.
>
> But, since it's been broken in all supported versions of postgres and has
> AFAICT never been documented to exist, should we fix it or just remove it?  We
> don't support auth-options for any other methods, like clientcert to cert for
> example.  If we fix it we should also document that it works IMHO.

You mentioned that auth-options are not supported for auth methods other than
pam, but I found that some methods (such as ldap and radius) also require
auth-options, and there are no corresponding auth methods ending with a space
(such as "ldap " and "radius ") present in the auth_methods_host and
auth_methods_local arrays.

--
Jingxian Li


Re: Control flow in logical replication walsender

2024-05-06 Thread Ashutosh Bapat
On Tue, May 7, 2024 at 12:00 AM Christophe Pettus  wrote:

> Thank you for the reply!
>
> > On May 1, 2024, at 02:18, Ashutosh Bapat 
> wrote:
> > Is there a large transaction which is failing to be replicated
> repeatedly - timeouts, crashes on upstream or downstream?
>
> AFAIK, no, although I am doing this somewhat by remote control (I don't
> have direct access to the failing system).  This did bring up one other
> question, though:
>
> Are subtransactions written to their own individual reorder buffers (and
> thus potentially spill files), or are they appended to the topmost
> transaction's reorder buffer?


IIRC, they have their own RB, but once they commit, they are transferred to
the topmost transaction's RB.  So they can produce spill files.

-- 
Best Wishes,
Ashutosh Bapat


Re: 2024-05-09 release announcement draft

2024-05-06 Thread Tom Lane
David Rowley  writes:
> Why not "Fix INSERT with multi-row VALUES clauses ..."?

To my mind, the VALUES clause is the data source for INSERT,
so "from" seems appropriate.  I'm not going to argue hard
about it.

regards, tom lane




Re: 2024-05-09 release announcement draft

2024-05-06 Thread David Rowley
On Tue, 7 May 2024 at 15:48, Tom Lane  wrote:
>
> David Rowley  writes:
> > I know this is the same wording as Tom added in [1], I might just have
> > failed to comprehend something, but if I strip out the links and try
> > to make sense of "Fix INSERT from multiple VALUES rows into", I just
> > can't figure out how to parse it.  I'm pretty sure it means "Fix
> > multiple-row VALUES clauses with INSERT statements when ...", but I'm
> > not sure.
>
> The problem happens in commands like
> INSERT INTO tab VALUES (1,2), (3,4), ...
> We treat this separately from the single-VALUES-row case for
> efficiency reasons.

Yeah, I know about the multi-row VALUES. What I'm mostly struggling to
parse is the "from" and the double plural of "VALUES" and "rows".
Also, why is it "from" and not "with"?  I get that "VALUES" is a
keyword that happens to be plural, but try reading it out loud.

Why not "Fix INSERT with multi-row VALUES clauses ..."?

David




Re: 2024-05-09 release announcement draft

2024-05-06 Thread Tom Lane
David Rowley  writes:
> I know this is the same wording as Tom added in [1], I might just have
> failed to comprehend something, but if I strip out the links and try
> to make sense of "Fix INSERT from multiple VALUES rows into", I just
> can't figure out how to parse it.  I'm pretty sure it means "Fix
> multiple-row VALUES clauses with INSERT statements when ...", but I'm
> not sure.

The problem happens in commands like
INSERT INTO tab VALUES (1,2), (3,4), ...
We treat this separately from the single-VALUES-row case for
efficiency reasons.

regards, tom lane




Re: [PATCH] json_lex_string: don't overread on bad UTF8

2024-05-06 Thread Michael Paquier
On Fri, May 03, 2024 at 07:05:38AM -0700, Jacob Champion wrote:
> On Fri, May 3, 2024 at 4:54 AM Peter Eisentraut  wrote:
>> but for the general encoding conversion we have what
>> would appear to be the same behavior in report_invalid_encoding(), and
>> we go out of our way there to produce a verbose error message including
>> the invalid data.

I was looking for that a couple of days ago in the backend but could
not put my finger on it.  Thanks.

> We could port something like that to src/common. IMO that'd be more
> suited for an actual conversion routine, though, as opposed to a
> parser that for the most part assumes you didn't lie about the input
> encoding and is just trying not to crash if you're wrong. Most of the
> time, the parser just copies bytes between delimiters around and it's
> up to the caller to handle encodings... the exceptions to that are the
> \u escapes and the error handling.

Hmm.  That would still leave the backpatch issue at hand, which is
kind of confusing to leave as it is.  Would it be complicated to
truncate the entire byte sequence in the error message and just give
up because we cannot do better if the input byte sequence is
incomplete?  We could still have some information depending on the
string given in input, which should be enough, hopefully.  With the
location pointing to the beginning of the sequence, even better.
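
As a standalone sketch of that idea for UTF-8 input specifically (not the
json_lex_string() code; in the backend this would presumably go through the
existing encoding routines such as pg_encoding_mblen()), the loop below stops
at the start of an incomplete trailing sequence, which is also the location
one would point at in the error message:

#include <stdio.h>

/* Expected length of a UTF-8 sequence, from its first byte. */
static size_t
utf8_seq_len(unsigned char b)
{
	if (b < 0x80)
		return 1;
	if ((b & 0xE0) == 0xC0)
		return 2;
	if ((b & 0xF0) == 0xE0)
		return 3;
	if ((b & 0xF8) == 0xF0)
		return 4;
	return 1;					/* invalid lead byte: treat it as one byte */
}

int
main(void)
{
	/* "é" (two bytes) followed by only the first byte of a 3-byte sequence */
	const unsigned char buf[] = {0xC3, 0xA9, 0xE2};
	size_t		len = sizeof(buf);
	size_t		safe = 0;

	while (safe < len)
	{
		size_t		w = utf8_seq_len(buf[safe]);

		if (safe + w > len)
			break;				/* incomplete trailing sequence starts here */
		safe += w;
	}
	printf("complete prefix: %zu of %zu bytes\n", safe, len);
	return 0;
}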

> Offhand, are all of our supported frontend encodings
> self-synchronizing? By that I mean, is it safe to print a partial byte
> sequence if the locale isn't UTF-8? (As I type this I'm staring at
> Shift-JIS, and thinking "probably not.")
>
> Actually -- hopefully this is not too much of a tangent -- that
> further crystallizes a vague unease about the API that I have. The
> JsonLexContext is initialized with something called the
> "input_encoding", but that encoding is necessarily also the output
> encoding for parsed string literals and error messages. For the server
> side that's fine, but frontend clients have the input_encoding locked
> to UTF-8, which seems like it might cause problems? Maybe I'm missing
> code somewhere, but I don't see a conversion routine from
> json_errdetail() to the actual client/locale encoding. (And the parser
> does not support multibyte input_encodings that contain ASCII in trail
> bytes.)

Referring to json_lex_string() that does UTF-8 -> ASCII -> give-up in
its conversion for FRONTEND, I guess?  Yep.  This limitation looks
like a problem, especially if plugging that to libpq.
--
Michael




Revert: Remove useless self-joins *and* -DREALLOCATE_BITMAPSETS make server crash, regress test fail.

2024-05-06 Thread jian he
hi,

SELECT table_name, column_name, is_updatable
  FROM information_schema.columns
 WHERE table_name LIKE E'r_\\_view%'
 ORDER BY table_name, ordinal_position;

At d1d286d83c0eed695910cb20d970ea9bea2e5001,
this query from src/test/regress/sql/updatable_views.sql
makes the regress tests fail.  Other queries may fail too,
but this is the first one that triggers the server crash.


src3=# SELECT table_name, column_name, is_updatable
  FROM information_schema.columns
 WHERE table_name LIKE E'r_\\_view%'
 ORDER BY table_name, ordinal_position;
TRAP: failed Assert("bms_is_valid_set(a)"), File:
"../../Desktop/pg_src/src3/postgres/src/backend/nodes/bitmapset.c",
Line: 515, PID: 158266
postgres: jian src3 [local] SELECT(ExceptionalCondition+0x106)[0x5579188c0b6f]
postgres: jian src3 [local] SELECT(bms_is_member+0x56)[0x5579183581c7]
postgres: jian src3 [local]
SELECT(join_clause_is_movable_to+0x72)[0x557918439711]
postgres: jian src3 [local] SELECT(+0x73e26c)[0x5579183a126c]
postgres: jian src3 [local] SELECT(create_index_paths+0x3b8)[0x55791839d0ce]
postgres: jian src3 [local] SELECT(+0x719b4d)[0x55791837cb4d]
postgres: jian src3 [local] SELECT(+0x719400)[0x55791837c400]
postgres: jian src3 [local] SELECT(+0x718e90)[0x55791837be90]
postgres: jian src3 [local] SELECT(make_one_rel+0x187)[0x55791837bac5]
postgres: jian src3 [local] SELECT(query_planner+0x4e8)[0x5579183d2dc2]
postgres: jian src3 [local] SELECT(+0x7734ad)[0x5579183d64ad]
postgres: jian src3 [local] SELECT(subquery_planner+0x14b9)[0x5579183d57e4]
postgres: jian src3 [local] SELECT(standard_planner+0x365)[0x5579183d379e]
postgres: jian src3 [local] SELECT(planner+0x81)[0x5579183d3426]
postgres: jian src3 [local] SELECT(pg_plan_query+0xbb)[0x5579186100c8]
postgres: jian src3 [local] SELECT(pg_plan_queries+0x11a)[0x5579186102de]
postgres: jian src3 [local] SELECT(+0x9ad8f1)[0x5579186108f1]
postgres: jian src3 [local] SELECT(PostgresMain+0xd4a)[0x557918618603]
postgres: jian src3 [local] SELECT(+0x9a76a8)[0x55791860a6a8]
postgres: jian src3 [local]
SELECT(postmaster_child_launch+0x14d)[0x5579184d3430]
postgres: jian src3 [local] SELECT(+0x879c28)[0x5579184dcc28]
postgres: jian src3 [local] SELECT(+0x875278)[0x5579184d8278]
postgres: jian src3 [local] SELECT(PostmasterMain+0x205f)[0x5579184d7837]


version

 PostgreSQL 17devel_debug_build on x86_64-linux, compiled by gcc-11.4.0, 64-bit


meson config:
-Dpgport=5458 \
-Dplperl=enabled \
-Dplpython=enabled \
-Dssl=openssl \
-Dldap=enabled \
-Dlibxml=enabled \
-Dlibxslt=enabled \
-Duuid=e2fs \
-Dzstd=enabled \
-Dlz4=enabled \
-Dcassert=true \
-Db_coverage=true \
-Dicu=enabled \
-Dbuildtype=debug \
-Dwerror=true \
-Dc_args='-Wunused-variable
-Wuninitialized
-Werror=maybe-uninitialized
-Wreturn-type
-DWRITE_READ_PARSE_PLAN_TREES
-DREALLOCATE_BITMAPSETS
-DCOPY_PARSE_PLAN_TREES
-DRAW_EXPRESSION_COVERAGE_TEST -fno-omit-frame-pointer' \
-Ddocs_pdf=disabled \
-Ddocs_html_style=website \
-Dllvm=disabled \
-Dtap_tests=enabled \
-Dextra_version=_debug_build


This commit makes it fail: d1d286d83c0eed695910cb20d970ea9bea2e5001
(Revert: Remove useless self-joins).

The preceding commit (81b2252e609cfa74550dd6804949485c094e4b85)
doesn't make the regress tests fail.

I also found that without `-DREALLOCATE_BITMAPSETS` in c_args,
the regress tests don't fail.


Later, I found out that `select 1 from information_schema.columns`
would also crash the server.

The information_schema.columns view is very complex.
I took its definition and omitted the unnecessary const and
WHERE-qual parts of it, so the minimum reproducer is:

SELECT 1
FROM (
(pg_attribute a LEFT JOIN pg_attrdef ad ON (((a.attrelid =
ad.adrelid) AND (a.attnum = ad.adnum
JOIN (pg_class c JOIN pg_namespace nc ON (c.relnamespace = nc.oid)) ON
(a.attrelid = c.oid))
JOIN (pg_type t JOIN pg_namespace nt ON ((t.typnamespace = nt.oid)))
ON (a.atttypid = t.oid))
LEFT JOIN (pg_type bt JOIN pg_namespace nbt ON (bt.typnamespace =
nbt.oid))  ON ( t.typbasetype = bt.oid  ))
LEFT JOIN (pg_collation co JOIN pg_namespace nco ON ( co.collnamespace
= nco.oid)) ON (a.attcollation = co.oid))
LEFT JOIN (pg_depend dep JOIN pg_sequence seq ON (dep.objid =
seq.seqrelid )) ON (((dep.refobjid = c.oid) AND (dep.refobjsubid =
a.attnum;




Re: 2024-05-09 release announcement draft

2024-05-06 Thread David Rowley
On Tue, 7 May 2024 at 15:09, Jonathan S. Katz  wrote:
> I opted for that; and it turned out the other fix was simple, so here's
> an updated draft.

Thanks

> * Fix [`INSERT`](https://www.postgresql.org/docs/current/sql-insert.html) from
> multiple [`VALUES`](https://www.postgresql.org/docs/current/sql-values.html)
> rows into a target column that is a domain over an array or composite type.

I know this is the same wording as Tom added in [1], I might just have
failed to comprehend something, but if I strip out the links and try
to make sense of "Fix INSERT from multiple VALUES rows into", I just
can't figure out how to parse it.  I'm pretty sure it means "Fix
multiple-row VALUES clauses with INSERT statements when ...", but I'm
not sure.

> * Require the [SELECT 
> privilege](https://www.postgresql.org/docs/current/sql-grant.html)
> on the target table when using 
> [`MERGE`](https://www.postgresql.org/docs/current/sql-merge.html)
> when using `MERGE ... DO NOTHING`.

I think the last line should just be "with `DO NOTHING`"

David

[1] 
https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=7155cc4a60e7bfc837233b2dea2563a2edc673fd




Re: 2024-05-09 release announcement draft

2024-05-06 Thread Jonathan S. Katz

On 5/6/24 11:05 PM, David Rowley wrote:

On Tue, 7 May 2024 at 14:58, Jonathan S. Katz  wrote:

* Throw an error if an index is accessed while it is being reindexed.




Based on this, I'd vote to just remove it from the release announcement.


I'd prefer that over leaving the wording the way it is.  Looking at
the test case in [1], it does not seem like a very likely thing for
people to hit. It basically requires someone to be telling lies about
a function's immutability.


I opted for that; and it turned out the other fix was simple, so here's 
an updated draft.


Jonathan

The PostgreSQL Global Development Group has released an update to all supported
versions of PostgreSQL, including 16.3, 15.7, 14.12, 13.15, and 12.19.
This release fixes over 55 bugs reported over the last several months.

For the full list of changes, please review the
[release notes](https://www.postgresql.org/docs/release/).

PostgreSQL 12 EOL Notice


PostgreSQL 12 will stop receiving fixes on November 14, 2024. If you are
running PostgreSQL 12 in a production environment, we suggest that you make
plans to upgrade to a newer, supported version of PostgreSQL. Please see our
[versioning policy](https://www.postgresql.org/support/versioning/) for more
information.

Bug Fixes and Improvements
--
 
This update fixes over 55 bugs that were reported in the last several months.
The issues listed below affect PostgreSQL 16. Some of these issues may also
affect other supported versions of PostgreSQL.

* Fix [`INSERT`](https://www.postgresql.org/docs/current/sql-insert.html) from
multiple [`VALUES`](https://www.postgresql.org/docs/current/sql-values.html)
rows into a target column that is a domain over an array or composite type.
* Require the [SELECT 
privilege](https://www.postgresql.org/docs/current/sql-grant.html)
on the target table when using 
[`MERGE`](https://www.postgresql.org/docs/current/sql-merge.html)
when using `MERGE ... DO NOTHING`.
* Per the SQL standard, throw an error if a target row in `MERGE` joins to more
than one source row during a modification.
* Fix incorrect pruning of `NULL` partition when a table is partitioned on
a boolean column and the query has a boolean `IS NOT` clause.
* Make [`ALTER FOREIGN TABLE ... SET 
SCHEMA`](https://www.postgresql.org/docs/current/sql-alterforeigntable.html)
move any owned sequences into the new schema.
* [`CREATE 
DATABASE`](https://www.postgresql.org/docs/current/sql-createdatabase.html)
now recognizes `STRATEGY` keywords case-insensitively.
* Fix how [EXPLAIN](https://www.postgresql.org/docs/current/sql-explain.html)
counts heap pages during bitmap heap scan to show all counted pages, not just
ones with visible tuples.
* Avoid deadlock during removal of orphaned temporary tables.
* Several fixes for 
[`VACUUM`](https://www.postgresql.org/docs/current/sql-vacuum.html),
including one that can reduce unnecessary I/O.
* Several query planner fixes.
* Add optimization for certain operations where an installation has thousands
of roles.
* Fix confusion for SQL-language procedures that return a single composite-type
column.
* Fix incorrect rounding and overflow hazards in 
[`date_bin()`](https://www.postgresql.org/docs/current/functions-datetime.html#FUNCTIONS-DATETIME-BIN).
* Detect integer overflow when adding or subtracting an
[interval](https://www.postgresql.org/docs/current/datatype-datetime.html#DATATYPE-INTERVAL-INPUT)
to/from a timestamp.
* Fix several race conditions with logical replication, including determining if
a table sync operation is required.
* Disconnect if a new server session's client socket cannot be put into 
non-blocking mode.
* [`initdb -c`](https://www.postgresql.org/docs/current/app-initdb.html) now
matches parameter names case-insensitively.
* Avoid leaking a query result from 
[`psql`](https://www.postgresql.org/docs/current/app-psql.html)
after the query is cancelled.
* Fix how PL/pgSQL parses [single-line
comments](https://www.postgresql.org/docs/current/plpgsql-structure.html)
(`-- style comments`) following an expression.

Updating


All PostgreSQL update releases are cumulative. As with other minor releases,
users are not required to dump and reload their database or use `pg_upgrade` in
order to apply this update release; you may simply shutdown PostgreSQL and
update its binaries.

Users who have skipped one or more update releases may need to run additional
post-update steps; please see the release notes from earlier versions for
details.

For more details, please see the
[release notes](https://www.postgresql.org/docs/release/).

Links
-
* [Download](https://www.postgresql.org/download/)
* [Release Notes](https://www.postgresql.org/docs/release/)
* [Security](https://www.postgresql.org/support/security/)
* [Versioning Policy](https://www.postgresql.org/support/versioning/)
* [PostgreSQL 16 Release Announcement](https://www.postgresql.org/about/press/)
* [Follow @postgresql on 

Re: 2024-05-09 release announcement draft

2024-05-06 Thread David Rowley
On Tue, 7 May 2024 at 14:58, Jonathan S. Katz  wrote:
> >> * Throw an error if an index is accessed while it is being reindexed.
> >
>
> Based on this, I'd vote to just remove it from the release announcement.

I'd prefer that over leaving the wording the way it is.  Looking at
the test case in [1], it does not seem like a very likely thing for
people to hit. It basically requires someone to be telling lies about
a function's immutability.

David

[1] https://www.postgresql.org/message-id/18363-e3598a5a572d0...@postgresql.org




Re: 2024-05-09 release announcement draft

2024-05-06 Thread Tom Lane
"Jonathan S. Katz"  writes:
> On 5/6/24 5:08 PM, David Rowley wrote:
>>> * Throw an error if an index is accessed while it is being reindexed.

>>Maybe we should give more detail
>> there so people don't think we made such a silly mistake. The release
>> note version I think does have enough words to allow the reader to
>> understand that the mistake is more complex. Maybe we could add
>> something here to make it sound like less of an embarrassing mistake?

> Based on this, I'd vote to just remove it from the release announcement.

+1.  This is hardly a major bug fix --- it's just blocking off
something that people shouldn't be doing in the first place.

regards, tom lane




Re: 2024-05-09 release announcement draft

2024-05-06 Thread Jonathan S. Katz

On 5/6/24 5:36 PM, Sutou Kouhei wrote:

Hi,

In 
<779790c7-45d7-4010-9305-c3f9e6a60...@postgresql.org>
   "2024-05-09 release announcement draft" on Mon, 6 May 2024 13:44:05 -0400,
   "Jonathan S. Katz"  wrote:


* Added optimization for certain operations where an installation has thousands
of roles.


Added ->
Add


Fixed.


* [Follow @postgresql on Twitter](https://twitter.com/postgresql)


Twitter ->
X


I think this one is less clear, from browsing around. I think 
"X/Twitter" is considered acceptable, and regardless the domain is still 
pointing to "Twitter". However, I'll go with the hybrid adjustment.


Jonathan





Re: 2024-05-09 release announcement draft

2024-05-06 Thread Jonathan S. Katz

On 5/6/24 5:08 PM, David Rowley wrote:

On Tue, 7 May 2024 at 05:44, Jonathan S. Katz  wrote:

Please provide feedback no later (and preferably sooner) than 2024-05-09
12:00 UTC.


Thanks for the draft.  Here's some feedback.


* Fix [`INSERT`](https://www.postgresql.org/docs/current/sql-insert.html) from
multiple [`VALUES`](https://www.postgresql.org/docs/current/sql-values.html)
rows into a target column that is a domain over an array or composite type.
including requiring the [SELECT 
privilege](https://www.postgresql.org/docs/current/sql-grant.html)
on the target table when using 
[`MERGE`](https://www.postgresql.org/docs/current/sql-merge.html)
when using `MERGE ... DO NOTHING`.


Something looks wrong with the above. Are two separate items merged
into one?  52898c63e and a3f5d2056?


Ugh, I see what happened. I was originally planning to combine them, and 
then had one be the lede, then the other. Given I ended up consolidating 
quite a bit, I'll just have them each stand on their own. I'll fix this 
in the next draft (which I'll upload on my Tuesday).



* Fix confusion for SQL-language procedures that returns a single composite-type
column.


Should "returns" be singular here?


Fixed.


* Throw an error if an index is accessed while it is being reindexed.


  I know you want to keep these short and I understand the above is the
same wording from release notes, but these words sound like a terrible
oversight that we allow any concurrent query to still use the table
while a reindex is in progress.


Yeah, I was not happy with this one at all.

  Maybe we should give more detail

there so people don't think we made such a silly mistake. The release
note version I think does have enough words to allow the reader to
understand that the mistake is more complex. Maybe we could add
something here to make it sound like less of an embarrassing mistake?


Based on this, I'd vote to just remove it from the release announcement.

Jonathan





Re: Use pgstat_kind_infos to read fixed shared stats structs

2024-05-06 Thread Michael Paquier
On Mon, May 06, 2024 at 02:07:53PM -0500, Tristan Partin wrote:
> This was originally thought of by Andres in
> 5891c7a8ed8f2d3d577e7eea34dacff12d7b6bbd.

+1 because you are removing a duplication between the order of the
items in PgStat_Kind and the order when these are read.  I suspect
that nobody would mess up with the order if adding a stats kind with a
fixed number of objects, but that makes maintenance slightly easier in
the long-term :)

> Not a fix, per se, but it does remove a comment. Perhaps the discussion will
> just lead to someone deleting the comment, and keeping the code as is.
> Either way, a win :).

Yup.  Let's leave that as something to do for v18.
--
Michael




Re: Weird test mixup

2024-05-06 Thread Michael Paquier
On Mon, May 06, 2024 at 02:23:24PM -0700, Noah Misch wrote:
> Here's how I've patched it locally.  It does avoid changing the backend-side,
> which has some attraction.  Shall I just push this?

It looks like you did not rebase on top of HEAD to avoid the spinlock
taken with InjectionPointDetach() in the shmem callback.  I think that
you'd mean the attached, once rebased (apologies I've reset the author
field).

+ *s1: local-attach to POINT
+ *s2: yield CPU before InjectionPointRun(POINT) calls injection_callback()
+ *s1: exit()
+ *s2: run POINT as though it had been non-local

I see.  So you are taking a shortcut in the shape of never resetting
the name of a condition, so that the point at step 4 can still check
the runtime condition against an entry that remains stored even though
the point has been detached and removed from the hash table.

 if (strcmp(condition->name, name) == 0)
 {
+condition->valid = false;
 condition->pid = 0;
-condition->name[0] = '\0';
 }
 }

As of HEAD, we rely on InjectionPointCondition->name to be set to
check if a condition is valid.  Your patch adds a different variable
to do mostly the same thing, and never clears the name in any existing
conditions.  A side effect is that this causes the conditions to pile
up on a running server when running installcheck, and assuming that
many test suites are run on a server left running this could cause
spurious failures when failing to find a new slot.  Always resetting
condition->name when detaching a point is a simpler flow and saner
IMO.

Overall, this switches from one detach behavior to a different one,
which may or may not be intuitive depending on what one is looking
for.  FWIW, I see InjectionPointCondition as something that should be
around as long as its injection point exists, with the condition
entirely gone once the point is detached because it should not exist
anymore on the server running, with no information left in shmem.

Through your patch, conditions take on a different meaning: they are
"local", but also kind of permanent, since they keep a trace of the
point's name in shmem.  I find the behavior of the patch
less intuitive.  Perhaps it would be interesting to see first the bug
and/or problem you are trying to tackle with this different behavior
as I feel like we could do something even with the module as-is.  As
far as I understand, the implementation of the module on HEAD allows
one to emulate a breakpoint with a wait/wake, which can avoid the
window mentioned in step 2.  Even if a wait point is detached
concurrently, it can be awakened, with its traces in shmem removed.
--
Michael
From 9cf38b2f6dadd5b4bb81fe5ccea350d259e6a241 Mon Sep 17 00:00:00 2001
From: Michael Paquier 
Date: Tue, 7 May 2024 10:15:28 +0900
Subject: [PATCH v2] Fix race condition in backend exit after
 injection_points_set_local().

Symptoms were like those prompting commit
f587338dec87d3c35b025e131c5977930ac69077.  That is, under installcheck,
backends from unrelated tests could run the injection points.

Reviewed by FIXME.

Discussion: https://postgr.es/m/FIXME
---
 .../injection_points/injection_points.c   | 27 ++-
 1 file changed, 20 insertions(+), 7 deletions(-)

diff --git a/src/test/modules/injection_points/injection_points.c b/src/test/modules/injection_points/injection_points.c
index a74a4a28af..8c9a3ebd74 100644
--- a/src/test/modules/injection_points/injection_points.c
+++ b/src/test/modules/injection_points/injection_points.c
@@ -37,7 +37,13 @@ PG_MODULE_MAGIC;
 
 /*
  * Conditions related to injection points.  This tracks in shared memory the
- * runtime conditions under which an injection point is allowed to run.
+ * runtime conditions under which an injection point is allowed to run.  Once
+ * set, a name is never changed.  That avoids a race condition:
+ *
+ *	s1: local-attach to POINT
+ *	s2: yield CPU before InjectionPointRun(POINT) calls injection_callback()
+ *	s1: exit()
+ *	s2: run POINT as though it had been non-local
  *
  * If more types of runtime conditions need to be tracked, this structure
  * should be expanded.
@@ -49,6 +55,9 @@ typedef struct InjectionPointCondition
 
 	/* ID of the process where the injection point is allowed to run */
 	int			pid;
+
+	/* Should "pid" run this injection point, or not? */
+	bool		valid;
 } InjectionPointCondition;
 
 /* Shared state information for injection points. */
@@ -133,7 +142,7 @@ injection_point_allowed(const char *name)
 	{
 		InjectionPointCondition *condition = _state->conditions[i];
 
-		if (strcmp(condition->name, name) == 0)
+		if (condition->valid && strcmp(condition->name, name) == 0)
 		{
 			/*
 			 * Check if this injection point is allowed to run in this
@@ -175,9 +184,11 @@ injection_points_cleanup(int code, Datum arg)
 	{
 		InjectionPointCondition *condition = _state->conditions[i];
 
-		if (condition->name[0] == '\0')
+		if (!condition->valid)
 			

Re: Cutting support for OpenSSL 1.0.1 and 1.0.2 in 17~?

2024-05-06 Thread Michael Paquier
On Fri, May 03, 2024 at 10:39:15AM +0200, Daniel Gustafsson wrote:
> They are no-ops when linking against v18, but writing an extension which
> targets all supported versions of postgres along with their respective
> supported OpenSSL versions make them still required, or am I missing 
> something?

Yeah, that depends on how many versions you expect your application to
work with.  Still, it seems to me that there's value in mentioning that
if your application does not care about anything older than OpenSSL
1.1.0, like PG 18 assuming that this patch is merged, then these calls
are pointless for HEAD.  The routine definitions would be around only
for .so compatibility.
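
As a generic illustration of that point (an OpenSSL-level sketch, not
something taken from this thread's patch), code that still has to support
pre-1.1.0 OpenSSL typically guards the old explicit initialization calls,
which 1.1.0 and later made unnecessary:

#include <openssl/opensslv.h>
#include <openssl/ssl.h>

static void
init_ssl_library(void)
{
#if OPENSSL_VERSION_NUMBER < 0x10100000L
	/* Required before OpenSSL 1.1.0; harmless no-ops afterwards. */
	SSL_library_init();
	SSL_load_error_strings();
#else
	/* 1.1.0 and later initialize themselves; an explicit call is optional. */
	OPENSSL_init_ssl(0, NULL);
#endif
}

int
main(void)
{
	init_ssl_library();
	return 0;
}

Once everything older than 1.1.0 is out of the picture, the whole #if block
can simply go away, which is the kind of simplification being argued for here.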
--
Michael




WHERE CURRENT OF with RLS quals that are ctid conditions

2024-05-06 Thread Tom Lane
Robert pointed out [1] that the planner fails if we have $SUBJECT,
because tidpath.c can seize on the RLS-derived ctid constraint
instead of the CurrentOfExpr.  Since the executor can only handle
CurrentOfExpr in a TidScan's tidquals, that leads to a confusing
runtime error.

Here's a patch for that.

However ... along the way to testing it, I found that you can only
get such an RLS qual to work if it accepts "(InvalidBlockNumber,0)",
because that's what the ctid field will look like in a
not-yet-stored-to-disk tuple.  That's sufficiently weird, and so
unduly in bed with undocumented implementation details, that I can't
imagine anyone is actually using such an RLS condition or ever will.
So maybe this is not really worth fixing.  Thoughts?

regards, tom lane

[1] 
https://www.postgresql.org/message-id/CA%2BTgmobwgL1XyV4uyUd26Nxff5WVA7%2B9XUED4yjpvft83_MBAw%40mail.gmail.com

diff --git a/src/backend/optimizer/path/tidpath.c b/src/backend/optimizer/path/tidpath.c
index 2ae5ddfe43..eb11bc79c7 100644
--- a/src/backend/optimizer/path/tidpath.c
+++ b/src/backend/optimizer/path/tidpath.c
@@ -225,42 +225,37 @@ IsCurrentOfClause(RestrictInfo *rinfo, RelOptInfo *rel)
 }
 
 /*
- * Extract a set of CTID conditions from the given RestrictInfo
- *
- * Returns a List of CTID qual RestrictInfos for the specified rel (with
- * implicit OR semantics across the list), or NIL if there are no usable
- * conditions.
+ * Is the RestrictInfo usable as a CTID qual for the specified rel?
  *
  * This function considers only base cases; AND/OR combination is handled
- * below.  Therefore the returned List never has more than one element.
- * (Using a List may seem a bit weird, but it simplifies the caller.)
+ * below.
  */
-static List *
-TidQualFromRestrictInfo(PlannerInfo *root, RestrictInfo *rinfo, RelOptInfo *rel)
+static bool
+RestrictInfoIsTidQual(PlannerInfo *root, RestrictInfo *rinfo, RelOptInfo *rel)
 {
 	/*
 	 * We may ignore pseudoconstant clauses (they can't contain Vars, so could
 	 * not match anyway).
 	 */
 	if (rinfo->pseudoconstant)
-		return NIL;
+		return false;
 
 	/*
 	 * If clause must wait till after some lower-security-level restriction
 	 * clause, reject it.
 	 */
 	if (!restriction_is_securely_promotable(rinfo, rel))
-		return NIL;
+		return false;
 
 	/*
-	 * Check all base cases.  If we get a match, return the clause.
+	 * Check all base cases.
 	 */
 	if (IsTidEqualClause(rinfo, rel) ||
 		IsTidEqualAnyClause(root, rinfo, rel) ||
 		IsCurrentOfClause(rinfo, rel))
-		return list_make1(rinfo);
+		return true;
 
-	return NIL;
+	return false;
 }
 
 /*
@@ -270,12 +265,22 @@ TidQualFromRestrictInfo(PlannerInfo *root, RestrictInfo *rinfo, RelOptInfo *rel)
  * implicit OR semantics across the list), or NIL if there are no usable
  * equality conditions.
  *
- * This function is just concerned with handling AND/OR recursion.
+ * This function is mainly concerned with handling AND/OR recursion.
+ * However, we do have a special rule to enforce: if there is a CurrentOfExpr
+ * qual, we *must* return that and only that, else the executor may fail.
+ * Ordinarily a CurrentOfExpr would be all alone anyway because of grammar
+ * restrictions, but it is possible for RLS quals to appear AND'ed with it.
+ * It's even possible (if fairly useless) for the RLS quals to be CTID quals.
+ * So we must scan the whole rlist to see if there's a CurrentOfExpr.  Since
+ * we have to do that, we also apply some very-trivial preference rules about
+ * which of the other possibilities should be chosen, in the unlikely event
+ * that there's more than one choice.
  */
 static List *
 TidQualFromRestrictInfoList(PlannerInfo *root, List *rlist, RelOptInfo *rel)
 {
-	List	   *rlst = NIL;
+	RestrictInfo *tidclause = NULL; /* best simple CTID qual so far */
+	List	   *orlist = NIL;	/* best OR'ed CTID qual so far */
 	ListCell   *l;
 
 	foreach(l, rlist)
@@ -284,6 +289,7 @@ TidQualFromRestrictInfoList(PlannerInfo *root, List *rlist, RelOptInfo *rel)
 
 		if (restriction_is_or_clause(rinfo))
 		{
+			List	   *rlst = NIL;
 			ListCell   *j;
 
 			/*
@@ -308,7 +314,10 @@ TidQualFromRestrictInfoList(PlannerInfo *root, List *rlist, RelOptInfo *rel)
 	RestrictInfo *ri = castNode(RestrictInfo, orarg);
 
 	Assert(!restriction_is_or_clause(ri));
-	sublist = TidQualFromRestrictInfo(root, ri, rel);
+	if (RestrictInfoIsTidQual(root, ri, rel))
+		sublist = list_make1(ri);
+	else
+		sublist = NIL;
 }
 
 /*
@@ -326,25 +335,44 @@ TidQualFromRestrictInfoList(PlannerInfo *root, List *rlist, RelOptInfo *rel)
  */
 rlst = list_concat(rlst, sublist);
 			}
+
+			if (rlst)
+			{
+/*
+ * Accept the OR'ed list if it's the first one, or if it's
+ * shorter than the previous one.
+ */
+if (orlist == NIL || list_length(rlst) < list_length(orlist))
+	orlist = rlst;
+			}
 		}
 		else
 		{
 			/* Not an OR clause, so handle base cases */
-			rlst = TidQualFromRestrictInfo(root, 

Re: Incorrect explain output for updates/delete operations with returning-list on partitioned tables

2024-05-06 Thread David Rowley
On Tue, 7 May 2024 at 09:18, Tom Lane  wrote:
>
> SAIKIRAN AVULA  writes:
> > I have been working on partitioned tables recently, and I have noticed
> > something that doesn't seem correct with the EXPLAIN output of an
> > update/delete query with a returning list.
>
> What do you think is not right exactly?  The output has to use some
> one of the correlation names for the partitioned table.  I think
> it generally chooses the one corresponding to the first Append arm,
> but really any would be good enough for EXPLAIN's purposes.

Also looks harmless to me.  But just a slight correction, you're
talking about the deparse Append condition that's in
set_deparse_plan().  Whereas the code that controls this for the
returningList is the following in nodeModifyTable.c:

/*
* Initialize result tuple slot and assign its rowtype using the first
* RETURNING list.  We assume the rest will look the same.
*/
mtstate->ps.plan->targetlist = (List *) linitial(node->returningLists);

David




Re: backend stuck in DataFileExtend

2024-05-06 Thread Thomas Munro
On Tue, May 7, 2024 at 6:21 AM Justin Pryzby  wrote:
> FWIW: both are running zfs-2.2.3 RPMs from zfsonlinux.org.
...
> Yes, they're running centos7 with the indicated kernels.

So far we've got:

* spurious EIO when opening a file (your previous report)
* hanging with CPU spinning (?) inside pwritev()
* old kernel, bleeding edge ZFS

From an (uninformed) peek at the ZFS code, if it really is spinning
there, it seems like a pretty low-level problem: it has finished the
write and is now just trying to release (something like our unpin) and
unlock the buffers, which involves various code paths that might touch
various mutexes and spinlocks.  To get stuck like that, I guess it has
either corrupted itself or it is deadlocking against something else,
but what?  Do you see any other processes (including kernel threads)
with any stuck stacks that might be a deadlock partner?

While looking around for reported issues I found your abandoned report
against an older ZFS version from a few years ago, same old Linux
version:

https://github.com/openzfs/zfs/issues/11641

I don't know enough to say anything useful about that but it certainly
smells similar...

I see you've been busy reporting lots of issues, which seems to
involve big data, big "recordsize" (= ZFS block sizes), compression
and PostgreSQL:

https://github.com/openzfs/zfs/issues?q=is%3Aissue+author%3Ajustinpryzby




Re: Is it acceptable making COPY format extendable?

2024-05-06 Thread Sutou Kouhei
Hi,

In <7a978dd2-371b-4d10-b58e-cb32ef728...@eisentraut.org>
  "Re: Is it acceptable making COPY format extendable?" on Thu, 25 Apr 2024 
09:59:18 +0200,
  Peter Eisentraut  wrote:

> PostgreSQL development is currently in feature freeze for PostgreSQL
> 17 (since April 8).  So most participants will not be looking very
> closely at development projects for releases beyond that.  The next
> commitfest for patches targeting PostgreSQL 18 is in July, and I see
> your patch registered there.  It's possible that your thread is not
> going to get much attention until then.
> 
> So, in summary, you are doing everything right, you just have to be a
> bit more patient right now. :)

I see. I'll wait for the next commitfest.


Thanks,
-- 
kou




Re: 2024-05-09 release announcement draft

2024-05-06 Thread Sutou Kouhei
Hi,

In 
<779790c7-45d7-4010-9305-c3f9e6a60...@postgresql.org>
  "2024-05-09 release announcement draft"
on Mon, 6 May 2024 13:44:05 -0400,
  "Jonathan S. Katz"  wrote:

> * Added optimization for certain operations where an installation has 
> thousands
> of roles.

Added ->
Add

> * [Follow @postgresql on Twitter](https://twitter.com/postgresql)

Twitter ->
X


Thanks,
-- 
kou




[PATCH] pg_stat_activity: make slow/hanging authentication more visible

2024-05-06 Thread Jacob Champion
Hi all,

Recently I dealt with a server where PAM had hung a connection
indefinitely, suppressing our authentication timeout and preventing a
clean shutdown. Worse, the xmin that was pinned by the opening
transaction cascaded to replicas and started messing things up
downstream.

The DBAs didn't know what was going on, because pg_stat_activity
doesn't report the authenticating connection or its open transaction.
It just looked like a Postgres bug. And while talking about it with
Euler, he mentioned he'd seen similar "invisible" hangs with
misbehaving LDAP deployments. I think we can do better to show DBAs
what's happening.

0001, attached, changes InitPostgres() to report a nearly-complete
pgstat entry before entering client authentication, then fills it in
the rest of the way once we know who the user is. Here's a sample
entry for a client that's hung during a SCRAM exchange:

=# select * from pg_stat_activity where state = 'authenticating';
-[ RECORD 1 ]+--
datid|
datname  |
pid  | 745662
leader_pid   |
usesysid |
usename  |
application_name |
client_addr  | 127.0.0.1
client_hostname  |
client_port  | 38304
backend_start| 2024-05-06 11:25:23.905923-07
xact_start   |
query_start  |
state_change |
wait_event_type  | Client
wait_event   | ClientRead
state| authenticating
backend_xid  |
backend_xmin | 784
query_id |
query|
backend_type | client backend

0002 goes even further, and adds wait events for various forms of
external authentication, but it's not fully baked. The intent is for a
DBA to be able to see when a bunch of connections are piling up
waiting for PAM/Kerberos/whatever. (I'm also motivated by my OAuth
patchset, where there's a server-side plugin that we have no control
over, and we'd want to be able to correctly point fingers at it if
things go wrong.)

= Open Issues, Idle Thoughts =

Maybe it's wishful thinking, but it'd be cool if a misbehaving
authentication exchange did not impact replicas in any way. Is there a
way to make that opening transaction lighter-weight?

0001 may be a little too much code. There are only two parts of
pgstat_bestart() that need to be modified: omit the user ID, and fill
in the state as 'authenticating' rather than unknown. I could just add
the `pre_auth` boolean to the signature of pgstat_bestart() directly,
if we don't mind adjusting all the call sites. We could also avoid
changing the signature entirely, and just assume that we're
authenticating if SessionUserId isn't set. That felt like a little too
much global magic to me, though.

Would anyone like me to be more aggressive, and create a pgstat entry
as soon as we have the opening transaction? Or... as soon as a
connection is made?

0002 is abusing the "IPC" wait event class. If the general idea seems
okay, maybe we could add an "External" class that encompasses the
general idea of "it's not our fault, it's someone else's"?

I had trouble deciding how granular to make the areas that are covered
by the new wait events. Ideally they would kick in only when we call
out to an external system, but for some authentication types, that's a
lot of calls to wrap. On the other extreme, we don't want to go too
high in the call stack and accidentally nest wait events (such as
those generated during pq_getmessage()). What I have now is not very
principled.

I haven't decided how to test these patches. Seems like a potential
use case for injection points, but I think I'd need to preload an
injection library rather than using the existing extension. Does that
seem like an okay way to go?

Thanks,
--Jacob


0001-pgstat-report-in-earlier-with-STATE_AUTHENTICATING.patch
Description: Binary data


0002-WIP-report-external-auth-calls-as-wait-events.patch
Description: Binary data


Re: Weird test mixup

2024-05-06 Thread Noah Misch
On Mon, May 06, 2024 at 10:03:37AM +0900, Michael Paquier wrote:
> On Thu, May 02, 2024 at 12:35:55PM -0700, Noah Misch wrote:
> > I should have given a simpler example:
> > 
> > s1: local-attach to POINT
> > s2: enter InjectionPointRun(POINT), yield CPU just before 
> > injection_callback()
> > s1: exit
> > s2: wake up and run POINT as though it had been non-local

Here's how I've patched it locally.  It does avoid changing the backend-side,
which has some attraction.  Shall I just push this?

> Hmm.  Even if you were to emulate that in a controlled manner, you
> would need a second injection point that does a wait in s2, which is
> something that would happen before injection_callback() and before
> scanning the local entry.  This relies on the fact that we're holding
> CPU in s2 between the backend shmem hash table lookup and the callback
> being called.

Right.  We would need "second-level injection points" to write a test for that
race in the injection point system.
Author: Noah Misch 
Commit: Noah Misch 

Fix race condition in backend exit after injection_points_set_local().

Symptoms were like those prompting commit
f587338dec87d3c35b025e131c5977930ac69077.  That is, under installcheck,
backends from unrelated tests could run the injection points.

Reviewed by FIXME.

Discussion: https://postgr.es/m/FIXME

diff --git a/src/test/modules/injection_points/injection_points.c 
b/src/test/modules/injection_points/injection_points.c
index ace386d..75466d3 100644
--- a/src/test/modules/injection_points/injection_points.c
+++ b/src/test/modules/injection_points/injection_points.c
@@ -37,7 +37,13 @@ PG_MODULE_MAGIC;
 
 /*
  * Conditions related to injection points.  This tracks in shared memory the
- * runtime conditions under which an injection point is allowed to run.
+ * runtime conditions under which an injection point is allowed to run.  Once
+ * set, a name is never changed.  That avoids a race condition:
+ *
+ * s1: local-attach to POINT
+ * s2: yield CPU before InjectionPointRun(POINT) calls injection_callback()
+ * s1: exit()
+ * s2: run POINT as though it had been non-local
  *
  * If more types of runtime conditions need to be tracked, this structure
  * should be expanded.
@@ -49,6 +55,9 @@ typedef struct InjectionPointCondition
 
/* ID of the process where the injection point is allowed to run */
int pid;
+
+   /* Should "pid" run this injection point, or not? */
+   boolvalid;
 } InjectionPointCondition;
 
 /* Shared state information for injection points. */
@@ -133,7 +142,7 @@ injection_point_allowed(const char *name)
{
InjectionPointCondition *condition = _state->conditions[i];
 
-   if (strcmp(condition->name, name) == 0)
+   if (condition->valid && strcmp(condition->name, name) == 0)
{
/*
 * Check if this injection point is allowed to run in 
this
@@ -168,15 +177,16 @@ injection_points_cleanup(int code, Datum arg)
{
InjectionPointCondition *condition = _state->conditions[i];
 
-   if (condition->name[0] == '\0')
+   if (!condition->valid)
continue;
+   Assert(condition->name[0] != '\0');
 
if (condition->pid != MyProcPid)
continue;
 
/* Detach the injection point and unregister condition */
InjectionPointDetach(condition->name);
-   condition->name[0] = '\0';
+   condition->valid = false;
condition->pid = 0;
}
SpinLockRelease(_state->lock);
@@ -299,11 +309,13 @@ injection_points_attach(PG_FUNCTION_ARGS)
{
InjectionPointCondition *condition = 
_state->conditions[i];
 
-   if (condition->name[0] == '\0')
+   if (strcmp(condition->name, name) == 0 ||
+   condition->name[0] == '\0')
{
index = i;
strlcpy(condition->name, name, INJ_NAME_MAXLEN);
condition->pid = MyProcPid;
+   condition->valid = true;
break;
}
}
@@ -416,8 +428,8 @@ injection_points_detach(PG_FUNCTION_ARGS)
 
if (strcmp(condition->name, name) == 0)
{
+   condition->valid = false;
condition->pid = 0;
-   condition->name[0] = '\0';
}
}
SpinLockRelease(_state->lock);


Re: Incorrect explain output for updates/delete operations with returning-list on partitioned tables

2024-05-06 Thread Tom Lane
SAIKIRAN AVULA  writes:
> I have been working on partitioned tables recently, and I have noticed
> something that doesn't seem correct with the EXPLAIN output of an
> update/delete query with a returning list.

What do you think is not right exactly?  The output has to use some
one of the correlation names for the partitioned table.  I think
it generally chooses the one corresponding to the first Append arm,
but really any would be good enough for EXPLAIN's purposes.

> 1. In the 'grouping_planner()' function, while generating paths for the
> final relation (
> https://github.com/postgres/postgres/blob/master/src/backend/optimizer/plan/planner.c#L1857),
> we only take care of adjusting the append_rel_attributes in returningList
> for resultRelation. Shouldn't we do that for other relations as well in
> query?

If the only difference is which way variables get labeled in EXPLAIN,
I'd be kind of disinclined to spend extra cycles.  But in any case,
I rather suspect you'll find that this actively breaks things.
Whether we change the varno on a Var isn't really optional, and there
are cross-checks in setrefs.c to make sure things match up.

regards, tom lane




Re: 2024-05-09 release announcement draft

2024-05-06 Thread David Rowley
On Tue, 7 May 2024 at 05:44, Jonathan S. Katz  wrote:
> Please provide feedback no later (and preferably sooner) than 2024-05-09
> 12:00 UTC.

Thanks for the draft.  Here's some feedback.

> * Fix [`INSERT`](https://www.postgresql.org/docs/current/sql-insert.html) from
> multiple [`VALUES`](https://www.postgresql.org/docs/current/sql-values.html)
> rows into a target column that is a domain over an array or composite type.
> including requiring the [SELECT 
> privilege](https://www.postgresql.org/docs/current/sql-grant.html)
> on the target table when using 
> [`MERGE`](https://www.postgresql.org/docs/current/sql-merge.html)
> when using `MERGE ... DO NOTHING`.

Something looks wrong with the above. Are two separate items merged
into one?  52898c63e and a3f5d2056?

> * Fix confusion for SQL-language procedures that returns a single 
> composite-type
> column.

Should "returns" be singular here?

> * Throw an error if an index is accessed while it is being reindexed.

I know you want to keep these short and I understand the above is the
same wording as the release notes, but these words sound like a terrible
oversight, as if we allow any concurrent query to still use the table
while a reindex is in progress.  Maybe we should give more detail
there so people don't think we made such a silly mistake. The release
note version, I think, does have enough words to allow the reader to
understand that the mistake is more complex. Maybe we could add
something here to make it sound like less of an embarrassing mistake?

David




Incorrect explain output for updates/delete operations with returning-list on partitioned tables

2024-05-06 Thread SAIKIRAN AVULA
Hi PostgreSQL Community,

I have been working on partitioned tables recently, and I have noticed
something that doesn't seem correct with the EXPLAIN output of an
update/delete query with a returning list.

For example, consider two partitioned tables, "t1" and "t2," with
partitions "t11," "t12," and "t21," "t22," respectively. The table
definitions are as follows:

```sql
postgres=# \d+ t1
 Partitioned table "public.t1"
 Column |  Type   | Collation | Nullable | Default | Storage | Compression | Stats target | Description
--------+---------+-----------+----------+---------+---------+-------------+--------------+-------------
 a      | integer |           |          |         | plain   |             |              |
 b      | integer |           |          |         | plain   |             |              |
 c      | integer |           |          |         | plain   |             |              |
Partition key: RANGE (a)
Partitions: t11 FOR VALUES FROM (0) TO (1000),
t12 FOR VALUES FROM (1000) TO (1)

postgres=# \d+ t2
 Partitioned table "public.t2"
 Column |  Type   | Collation | Nullable | Default | Storage | Compression | Stats target | Description
--------+---------+-----------+----------+---------+---------+-------------+--------------+-------------
 a      | integer |           |          |         | plain   |             |              |
 b      | integer |           |          |         | plain   |             |              |
 c      | integer |           |          |         | plain   |             |              |
Partition key: RANGE (a)
Partitions: t21 FOR VALUES FROM (0) TO (1000),
t22 FOR VALUES FROM (1000) TO (1)
```

The EXPLAIN output for an update query with a returning list doesn't seem
correct to me. Here are the examples (the part that doesn't seem right is
highlighted in bold):

*Query1:*
```
postgres=# explain verbose update t1 set b = 10 from t2 where t1.a = t2.a
 returning t1.c;
                                        QUERY PLAN
---------------------------------------------------------------------------
 Update on public.t1  (cost=0.00..125187.88 rows=41616 width=14)
   *Output: t1_1.c -> something not right??*
   Update on public.t11 t1_1
   Update on public.t12 t1_2
   ->  Append  (cost=0.00..125187.88 rows=41616 width=14)
 ->  Nested Loop  (cost=0.00..62489.90 rows=20808 width=14)
   Output: 10, t1_1.tableoid, t1_1.ctid
   Join Filter: (t1_1.a = t2_1.a)
   ->  Seq Scan on public.t11 t1_1  (cost=0.00..30.40 rows=2040
width=14)
 Output: t1_1.a, t1_1.tableoid, t1_1.ctid
   ->  Materialize  (cost=0.00..40.60 rows=2040 width=4)
 Output: t2_1.a
 ->  Seq Scan on public.t21 t2_1  (cost=0.00..30.40
rows=2040 width=4)
   Output: t2_1.a
 ->  Nested Loop  (cost=0.00..62489.90 rows=20808 width=14)
   Output: 10, t1_2.tableoid, t1_2.ctid
   Join Filter: (t1_2.a = t2_2.a)
   ->  Seq Scan on public.t12 t1_2  (cost=0.00..30.40 rows=2040
width=14)
 Output: t1_2.a, t1_2.tableoid, t1_2.ctid
   ->  Materialize  (cost=0.00..40.60 rows=2040 width=4)
 Output: t2_2.a
 ->  Seq Scan on public.t22 t2_2  (cost=0.00..30.40
rows=2040 width=4)
   Output: t2_2.a
(23 rows)
```

*Query2:*

```
postgres=# explain verbose update t1 set b = 10 from t2 where t1.a = t2.a  returning t2.c;
                                        QUERY PLAN
---------------------------------------------------------------------------
 Update on public.t1  (cost=0.00..125187.88 rows=41616 width=18)
   *Output: t2.c*
   Update on public.t11 t1_1
   Update on public.t12 t1_2
   ->  Append  (cost=0.00..125187.88 rows=41616 width=18)
 ->  Nested Loop  (cost=0.00..62489.90 rows=20808 width=18)
   Output: 10, t2_1.c, t1_1.tableoid, t1_1.ctid
   Join Filter: (t1_1.a = t2_1.a)
   ->  Seq Scan on public.t11 t1_1  (cost=0.00..30.40 rows=2040
width=14)
 Output: t1_1.a, t1_1.tableoid, t1_1.ctid
   ->  Materialize  (cost=0.00..40.60 rows=2040 width=8)
 Output: t2_1.c, t2_1.a
 ->  Seq Scan on public.t21 t2_1  (cost=0.00..30.40
rows=2040 width=8)
   Output: t2_1.c, t2_1.a
 ->  Nested Loop  (cost=0.00..62489.90 rows=20808 width=18)
   Output: 10, t2_2.c, t1_2.tableoid, t1_2.ctid
   Join Filter: (t1_2.a = t2_2.a)
   ->  Seq Scan on public.t12 t1_2  (cost=0.00..30.40 rows=2040
width=14)
 Output: t1_2.a, t1_2.tableoid, t1_2.ctid
   ->  Materialize  (cost=0.00..40.60 rows=2040 width=8)
 Output: t2_2.c, t2_2.a
 ->  Seq Scan on public.t22 t2_2  

Re: 2024-05-09 release announcement draft

2024-05-06 Thread Erik Rijkers

Op 5/6/24 om 19:44 schreef Jonathan S. Katz:

Hi,

Please find the draft of the 2024-05-09 release announcement.



'procedures that returns'  should be
'procedures that return'





Re: On disable_cost

2024-05-06 Thread Peter Geoghegan
On Mon, May 6, 2024 at 8:27 AM Robert Haas  wrote:
> Stepping back a bit, my current view of this area is: disable_cost is
> highly imperfect both as an idea and as implemented in PostgreSQL.
> Although I'm discovering that the current implementation gets more
> things right than I had realized, it also sometimes gets things wrong.
> The original poster gave an example of that, and there are others.
> Furthermore, the current implementation has some weird
> inconsistencies. Therefore, I would like something better.

FWIW I always found those weird inconsistencies to be annoying at
best, and confusing at worst. I speak as somebody that uses
disable_cost a lot.

I certainly wouldn't ask anybody to make it a priority for that reason
alone -- it's not *that* bad. I've given my opinion on this because
it's already under discussion.

-- 
Peter Geoghegan




Re: On disable_cost

2024-05-06 Thread Tom Lane
Robert Haas  writes:
> I'll look into this, unless you want to do it.

I have a draft patch already.  Need to add a test case.

> Incidentally, another thing I just noticed is that
> IsCurrentOfClause()'s test for (node->cvarno == rel->relid) is
> possibly dead code. At least, there are no examples in our test suite
> where it fails to hold. Which seems like it makes sense, because if it
> didn't, then how did the clause end up in baserestrictinfo? Maybe this
> is worth keeping as defensive coding, or maybe it should be changed to
> an Assert or something.

I wouldn't remove it, but maybe an Assert is good enough.  The tests
on Vars' varno should be equally pointless no?

regards, tom lane




Re: Removing unneeded self joins

2024-05-06 Thread Robert Haas
On Mon, May 6, 2024 at 3:27 PM Alexander Korotkov  wrote:
> I agree it was hasty to put the plan into the commit message.  I think
> Tom already gave valuable feedback [1] and we will probably get more.
> So, the plan is to be decided.  One way or the other, I'm not going to
> re-commit this without Tom's explicit consent.

Thanks. I hope we find a way to make it happen.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: On disable_cost

2024-05-06 Thread Robert Haas
On Mon, May 6, 2024 at 3:26 PM Tom Lane  wrote:
> Nah, I'm wrong: we do treat it as leakproof, and the comment about
> that in contain_leaked_vars_walker shows that the interaction with
> RLS quals *was* thought about.  What wasn't thought about was the
> possibility of RLS quals that themselves could be usable as tidquals,
> which breaks this assumption in TidQualFromRestrictInfoList:
>
>  * Stop as soon as we find any usable CTID condition.  In theory we
>  * could get CTID equality conditions from different AND'ed clauses,
>  * in which case we could try to pick the most efficient one.  In
>  * practice, such usage seems very unlikely, so we don't bother; we
>  * just exit as soon as we find the first candidate.

Right. I had noticed this but didn't spell it out.

> The executor doesn't seem to be prepared to cope with multiple AND'ed
> TID clauses (only OR'ed ones).  So we need to fix this at least to the
> extent of looking for a CurrentOfExpr qual, and preferring that over
> anything else.
>
> I'm also now wondering about this assumption in the executor:
>
> /* CurrentOfExpr could never appear OR'd with something else */
> Assert(list_length(tidstate->tss_tidexprs) == 1 ||
>!tidstate->tss_isCurrentOf);
>
> It still seems OK, because anything that might come in from RLS quals
> would be AND'ed not OR'ed with the CurrentOfExpr.

This stuff I had not noticed.

> > In general I think you're right that something less rickety than
> > the disable_cost hack would be a good idea to ensure the desired
> > TidPath gets chosen, but this problem is not the fault of that.
> > We're not making the TidPath with the correct contents in the first
> > place.
>
> Still true.

I'll look into this, unless you want to do it.

Incidentally, another thing I just noticed is that
IsCurrentOfClause()'s test for (node->cvarno == rel->relid) is
possibly dead code. At least, there are no examples in our test suite
where it fails to hold. Which seems like it makes sense, because if it
didn't, then how did the clause end up in baserestrictinfo? Maybe this
is worth keeping as defensive coding, or maybe it should be changed to
an Assert or something.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Removing unneeded self joins

2024-05-06 Thread Alexander Korotkov
On Mon, May 6, 2024 at 5:44 PM Robert Haas  wrote:
> I want to go on record right now as disagreeing with the plan proposed
> in the commit message for the revert commit, namely, committing this
> again early in the v18 cycle. I don't think Tom would have proposed
> reverting this feature unless he believed that it had more serious
> problems than could be easily fixed in a short period of time. I think
> that concern is well-founded, given the number of fixes that were
> committed. It seems likely that the patch needs significant rework and
> stabilization before it gets committed again, and I think it shouldn't
> be committed again without explicit agreement from Tom or one of the
> other committers who have significant experience with the query
> planner. That is not to say that I don't approve generally of the idea
> of committing things earlier in the release cycle: I certainly do. It
> gives us more time to shake out problems with patches before we ship.
> But it only makes sense if we collectively believe that the patch is
> mostly correct, and only needs fine-tuning, and I think there are good
> reasons to believe that we shouldn't have that level of confidence in
> this case.

I agree it was hasty to put the plan into the commit message.  I think
Tom already gave valuable feedback [1] and we will probably get more.
So, the plan is to be decided.  One way or the other, I'm not going to
re-commit this without Tom's explicit consent.

Links.
1. https://www.postgresql.org/message-id/3622801.1715010885%40sss.pgh.pa.us

--
Regards,
Alexander Korotkov
Supabase




Re: On disable_cost

2024-05-06 Thread Tom Lane
I wrote:
> Robert Haas  writes:
>> ... Which then allowed me to
>> construct the example above, where there are two possible TID quals
>> and the logic in tidpath.c latches onto the wrong one.

> Hmm.  Without having traced through it, I'm betting that the
> CurrentOfExpr qual is rejected as a tidqual because it's not
> considered leakproof.

Nah, I'm wrong: we do treat it as leakproof, and the comment about
that in contain_leaked_vars_walker shows that the interaction with
RLS quals *was* thought about.  What wasn't thought about was the
possibility of RLS quals that themselves could be usable as tidquals,
which breaks this assumption in TidQualFromRestrictInfoList:

 * Stop as soon as we find any usable CTID condition.  In theory we
 * could get CTID equality conditions from different AND'ed clauses,
 * in which case we could try to pick the most efficient one.  In
 * practice, such usage seems very unlikely, so we don't bother; we
 * just exit as soon as we find the first candidate.

The executor doesn't seem to be prepared to cope with multiple AND'ed
TID clauses (only OR'ed ones).  So we need to fix this at least to the
extent of looking for a CurrentOfExpr qual, and preferring that over
anything else.

I'm also now wondering about this assumption in the executor:

/* CurrentOfExpr could never appear OR'd with something else */
Assert(list_length(tidstate->tss_tidexprs) == 1 ||
   !tidstate->tss_isCurrentOf);

It still seems OK, because anything that might come in from RLS quals
would be AND'ed not OR'ed with the CurrentOfExpr.

> In general I think you're right that something less rickety than
> the disable_cost hack would be a good idea to ensure the desired
> TidPath gets chosen, but this problem is not the fault of that.
> We're not making the TidPath with the correct contents in the first
> place.

Still true.

regards, tom lane




Re: Removing unneeded self joins

2024-05-06 Thread Alexander Korotkov
On Mon, May 6, 2024 at 6:54 PM Tom Lane  wrote:
> Robert Haas  writes:
> > I want to go on record right now as disagreeing with the plan proposed
> > in the commit message for the revert commit, namely, committing this
> > again early in the v18 cycle. I don't think Tom would have proposed
> > reverting this feature unless he believed that it had more serious
> > problems than could be easily fixed in a short period of time. I think
> > that concern is well-founded, given the number of fixes that were
> > committed. It seems likely that the patch needs significant rework and
> > stabilization before it gets committed again, and I think it shouldn't
> > be committed again without explicit agreement from Tom or one of the
> > other committers who have significant experience with the query
> > planner.
>
> FWIW I accept some of the blame here, for not having paid any
> attention to the SJE work earlier.  I had other things on my mind
> for most of last year, and not enough bandwidth to help.
>
> The main thing I'd like to understand before we try this again is
> why SJE needed so much new query-tree-manipulation infrastructure.
> I would have expected it to be very similar to the left-join
> elimination we do already, and therefore to mostly just share the
> existing infrastructure.  (I also harbor suspicions that some of
> the new code existed just because someone didn't research what
> was already there --- for instance, the now-removed replace_varno
> sure looks like ChangeVarNodes should have been used instead.)

Thank you for pointing this out.  This area certainly requires more investigation.

> Another thing that made me pretty sad was 8c441c082 (Forbid SJE with
> result relation).  While I don't claim that that destroyed the entire
> use case for SJE, it certainly knocked its usefulness down by many
> notches, maybe even to the point where it's not worth putting in the
> effort needed to get it to re-committability.  So I think we need to
> look harder at finding a way around that.  Is the concern that
> RETURNING should return either old or new values depending on which
> RTE is mentioned?  If so, maybe the feature Dean has proposed to
> allow RETURNING to access old values [1] is a prerequisite to moving
> forward.  Alternatively, perhaps it'd be good enough to forbid SJE
> only when the non-target relation is actually mentioned in RETURNING.

Another problem is EPQ.  During EPQ, we use the most recent tuples for
the target relation and snapshot-satisfying tuples for joined relations.
That affects RETURNING as well.  If we need to return values for a
joined relation, those wouldn't be old values, but values from the
snapshot-satisfying tuple, which might be even older.

Proper support for this looks like quite an amount of work to me.
Committing SJE to v18 with this looks challenging.  AFAICS, going this
way would require substantial help from you.

--
Regards,
Alexander Korotkov
Supabase




Use pgstat_kind_infos to read fixed shared stats structs

2024-05-06 Thread Tristan Partin

Instead of needing to be explicit, we can just iterate the
pgstat_kind_infos array to find the memory locations to read into.

This was originally thought of by Andres in
5891c7a8ed8f2d3d577e7eea34dacff12d7b6bbd.

Not a fix, per se, but it does remove a comment. Perhaps the discussion 
will just lead to someone deleting the comment, and keeping the code 
as is. Either way, a win :).


--
Tristan Partin
Neon (https://neon.tech)
From 5cc5a67edbd3dee145ea582c024b6ee207ae4085 Mon Sep 17 00:00:00 2001
From: Tristan Partin 
Date: Mon, 6 May 2024 13:52:58 -0500
Subject: [PATCH v1] Use pgstat_kind_infos to read fixed shared stats structs

Instead of needing to be explicit, we can just iterate the
pgstat_kind_infos array to find the memory locations to read into.

This was originally thought of by Andres in
5891c7a8ed8f2d3d577e7eea34dacff12d7b6bbd.
---
 src/backend/utils/activity/pgstat.c | 75 ++---
 src/include/utils/pgstat_internal.h |  6 +++
 2 files changed, 42 insertions(+), 39 deletions(-)

diff --git ./src/backend/utils/activity/pgstat.c incoming/src/backend/utils/activity/pgstat.c
index dcc2ad8d954..81e3d702ccd 100644
--- ./src/backend/utils/activity/pgstat.c
+++ incoming/src/backend/utils/activity/pgstat.c
@@ -342,6 +342,10 @@ static const PgStat_KindInfo pgstat_kind_infos[PGSTAT_NUM_KINDS] = {
 
 		.fixed_amount = true,
 
+		.shared_ctl_off = offsetof(PgStat_ShmemControl, archiver),
+		.shared_data_off = offsetof(PgStatShared_Archiver, stats),
+		.shared_data_len = sizeof(((PgStatShared_Archiver *) 0)->stats),
+
 		.reset_all_cb = pgstat_archiver_reset_all_cb,
 		.snapshot_cb = pgstat_archiver_snapshot_cb,
 	},
@@ -351,6 +355,10 @@ static const PgStat_KindInfo pgstat_kind_infos[PGSTAT_NUM_KINDS] = {
 
 		.fixed_amount = true,
 
+		.shared_ctl_off = offsetof(PgStat_ShmemControl, bgwriter),
+		.shared_data_off = offsetof(PgStatShared_BgWriter, stats),
+		.shared_data_len = sizeof(((PgStatShared_BgWriter *) 0)->stats),
+
 		.reset_all_cb = pgstat_bgwriter_reset_all_cb,
 		.snapshot_cb = pgstat_bgwriter_snapshot_cb,
 	},
@@ -360,6 +368,10 @@ static const PgStat_KindInfo pgstat_kind_infos[PGSTAT_NUM_KINDS] = {
 
 		.fixed_amount = true,
 
+		.shared_ctl_off = offsetof(PgStat_ShmemControl, checkpointer),
+		.shared_data_off = offsetof(PgStatShared_Checkpointer, stats),
+		.shared_data_len = sizeof(((PgStatShared_Checkpointer *) 0)->stats),
+
 		.reset_all_cb = pgstat_checkpointer_reset_all_cb,
 		.snapshot_cb = pgstat_checkpointer_snapshot_cb,
 	},
@@ -369,6 +381,10 @@ static const PgStat_KindInfo pgstat_kind_infos[PGSTAT_NUM_KINDS] = {
 
 		.fixed_amount = true,
 
+		.shared_ctl_off = offsetof(PgStat_ShmemControl, io),
+		.shared_data_off = offsetof(PgStatShared_IO, stats),
+		.shared_data_len = sizeof(((PgStatShared_IO *) 0)->stats),
+
 		.reset_all_cb = pgstat_io_reset_all_cb,
 		.snapshot_cb = pgstat_io_snapshot_cb,
 	},
@@ -378,6 +394,10 @@ static const PgStat_KindInfo pgstat_kind_infos[PGSTAT_NUM_KINDS] = {
 
 		.fixed_amount = true,
 
+		.shared_ctl_off = offsetof(PgStat_ShmemControl, slru),
+		.shared_data_off = offsetof(PgStatShared_SLRU, stats),
+		.shared_data_len = sizeof(((PgStatShared_SLRU *) 0)->stats),
+
 		.reset_all_cb = pgstat_slru_reset_all_cb,
 		.snapshot_cb = pgstat_slru_snapshot_cb,
 	},
@@ -387,6 +407,10 @@ static const PgStat_KindInfo pgstat_kind_infos[PGSTAT_NUM_KINDS] = {
 
 		.fixed_amount = true,
 
+		.shared_ctl_off = offsetof(PgStat_ShmemControl, wal),
+		.shared_data_off = offsetof(PgStatShared_Wal, stats),
+		.shared_data_len = sizeof(((PgStatShared_Wal *) 0)->stats),
+
 		.reset_all_cb = pgstat_wal_reset_all_cb,
 		.snapshot_cb = pgstat_wal_snapshot_cb,
 	},
@@ -1520,47 +1544,20 @@ pgstat_read_statsfile(void)
 		format_id != PGSTAT_FILE_FORMAT_ID)
 		goto error;
 
-	/*
-	 * XXX: The following could now be generalized to just iterate over
-	 * pgstat_kind_infos instead of knowing about the different kinds of
-	 * stats.
-	 */
+	/* Read various stats structs */
+	for (PgStat_Kind kind = PGSTAT_KIND_FIRST_VALID; kind < PGSTAT_NUM_KINDS; ++kind)
+	{
+		char	   *ptr;
+		const PgStat_KindInfo *info;
 
-	/*
-	 * Read archiver stats struct
-	 */
-	if (!read_chunk_s(fpin, &shmem->archiver.stats))
-		goto error;
+		info = pgstat_kind_infos + kind;
+		if (!info->fixed_amount)
+			continue;
 
-	/*
-	 * Read bgwriter stats struct
-	 */
-	if (!read_chunk_s(fpin, &shmem->bgwriter.stats))
-		goto error;
-
-	/*
-	 * Read checkpointer stats struct
-	 */
-	if (!read_chunk_s(fpin, &shmem->checkpointer.stats))
-		goto error;
-
-	/*
-	 * Read IO stats struct
-	 */
-	if (!read_chunk_s(fpin, &shmem->io.stats))
-		goto error;
-
-	/*
-	 * Read SLRU stats struct
-	 */
-	if (!read_chunk_s(fpin, &shmem->slru.stats))
-		goto error;
-
-	/*
-	 * Read WAL stats struct
-	 */
-	if (!read_chunk_s(fpin, &shmem->wal.stats))
-		goto error;
+		ptr = ((char *) shmem) + info->shared_ctl_off + info->shared_data_off;
+		if (!read_chunk(fpin, ptr, info->shared_data_len))
+			goto error;
+	}
 
 	/*
 	 * We found an existing statistics 

Re: On disable_cost

2024-05-06 Thread Tom Lane
Robert Haas  writes:
> On Mon, May 6, 2024 at 9:39 AM Robert Haas  wrote:
>> It's not very clear that this mechanism is actually 100% reliable,

> It isn't. Here's a test case.

Very interesting.

> ... Which then allowed me to
> construct the example above, where there are two possible TID quals
> and the logic in tidpath.c latches onto the wrong one.

Hmm.  Without having traced through it, I'm betting that the
CurrentOfExpr qual is rejected as a tidqual because it's not
considered leakproof.  It's not obvious to me why we couldn't consider
it as leakproof, though.  If we don't want to do that in general,
then we need some kind of hack in TidQualFromRestrictInfo to accept
CurrentOfExpr quals anyway.

In general I think you're right that something less rickety than
the disable_cost hack would be a good idea to ensure the desired
TidPath gets chosen, but this problem is not the fault of that.
We're not making the TidPath with the correct contents in the first
place.

regards, tom lane




Re: Control flow in logical replication walsender

2024-05-06 Thread Christophe Pettus
Thank you for the reply!

> On May 1, 2024, at 02:18, Ashutosh Bapat  wrote:
> Is there a large transaction which is failing to be replicated repeatedly - 
> timeouts, crashes on upstream or downstream?

AFAIK, no, although I am doing this somewhat by remote control (I don't have 
direct access to the failing system).  This did bring up one other question, 
though:

Are subtransactions written to their own individual reorder buffers (and thus 
potentially spill files), or are they appended to the topmost transaction's 
reorder buffer?



Re: backend stuck in DataFileExtend

2024-05-06 Thread Justin Pryzby
On Mon, May 06, 2024 at 10:51:08AM -0700, Andres Freund wrote:
> Hi,
> 
> On 2024-05-06 12:37:26 -0500, Justin Pryzby wrote:
> > On Mon, May 06, 2024 at 10:04:13AM -0700, Andres Freund wrote:
> > > Hi,
> > >
> > > On 2024-05-06 09:05:38 -0500, Justin Pryzby wrote:
> > > > In March, I noticed that a backend got stuck overnight doing:
> > > >
> > > > backend_start| 2024-03-27 22:34:12.774195-07
> > > > xact_start   | 2024-03-27 22:34:39.741933-07
> > > > query_start  | 2024-03-27 22:34:41.997276-07
> > > > state_change | 2024-03-27 22:34:41.997307-07
> > > > wait_event_type  | IO
> > > > wait_event   | DataFileExtend
> > > > state| active
> > > > backend_xid  | 330896991
> > > > backend_xmin | 330896991
> > > > query_id | -3255287420265634540
> > > > query| PREPARE mml_1 AS INSERT INTO child.un...
> > > >
> > > > The backend was spinning at 100% CPU:
> > > >
> > > > [pryzbyj@telsa2021 ~]$ ps -O wchan,pcpu 7881
> > > >PID WCHAN  %CPU S TTY  TIME COMMAND
> > > >   7881 ?  99.4 R ?08:14:55 postgres: telsasoft ts [local] 
> > > > INSERT
> > > >
> > > > This was postgres 16 STABLE compiled at 14e991db8.
> > > >
> > > > It's possible that this is a rare issue that we haven't hit before.
> > > > It's also possible that this is a recent regression.  We previously
> > > > compiled at b2c9936a7 without hitting any issue (possibly by chance).
> > > >
> > > > I could neither strace the process nor attach a debugger.  They got
> > > > stuck attaching.  Maybe it's possible there's a kernel issue.  This is a
> > > > VM running centos7 / 3.10.0-1160.49.1.el7.x86_64.
> > >
> > > > $ awk '{print $14, $15}' /proc/7881/stat # usr / sys
> > > > 229 3088448
> > > >
> > > > When I tried to shut down postgres (hoping to finally be able to attach
> > > > a debugger), instead it got stuck:
> > >
> > > That strongly indicates either a kernel bug or storage having an issue. It
> > > can't be postgres' fault if an IO never completes.
> >
> > Is that for sure even though wchan=? (which I take to mean "not in a system
> > call"), and the process is stuck in user mode ?
> 
> Postgres doesn't do anything to prevent a debugger from working, so this is
> just indicative that the kernel is stuck somewhere that it didn't set up
> information about being blocked - because it's busy doing something.
> 
> 
> > > What do /proc/$pid/stack say?
> >
> > [pryzbyj@telsasoft-centos7 ~]$ sudo cat /proc/18468/stat
> > 18468 (postgres) R 2274 18468 18468 0 -1 4857920 91836 0 3985 0 364 3794271 
> > 0 0 20 0 1 0 6092292660 941846528 10 18446744073709551615 4194304 12848820 
> > 140732995870240 140732995857304 139726958536394 0 4194304 19929088 
> > 536896135 0 0 0 17 3 0 0 1682 0 0 14949632 15052146 34668544 
> > 140732995874457 140732995874511 140732995874511 140732995874781 0
> 
> stack, not stat...

Ah, that is illuminating - thanks.

[pryzbyj@telsasoft-centos7 ~]$ sudo cat /proc/18468/stack 
[] __cond_resched+0x26/0x30
[] dbuf_rele+0x1e/0x40 [zfs]
[] dmu_buf_rele_array.part.6+0x40/0x70 [zfs]
[] dmu_write_uio_dnode+0x11a/0x180 [zfs]
[] dmu_write_uio_dbuf+0x54/0x70 [zfs]
[] zfs_write+0xb9b/0xfb0 [zfs]
[] zpl_aio_write+0x152/0x1a0 [zfs]
[] do_sync_readv_writev+0x7b/0xd0
[] do_readv_writev+0xce/0x260
[] vfs_writev+0x35/0x60
[] SyS_pwritev+0xc2/0xf0
[] system_call_fastpath+0x25/0x2a
[] 0x

FWIW: both are running zfs-2.2.3 RPMs from zfsonlinux.org.

It's surely possible that there's an issue that affects older kernels
but not more recent ones.

> > > > Full disclosure: the VM that hit this issue today has had storage-level
> > > > errors (reported here at ZZqr_GTaHyuW7fLp@pryzbyj2023), as recently as 3
> > > > days ago.
> > >
> > > So indeed, my suspicion from above is confirmed.
> >
> > I'd be fine with that conclusion (as in the earlier thread), except this
> > has now happened on 2 different VMs, and the first one has no I/O
> > issues.  If this were another symptom of a storage failure, and hadn't
> > previously happened on another VM, I wouldn't be re-reporting it.
> 
> Is it the same VM hosting environment? And the same old distro?

Yes, they're running centos7 with the indicated kernels.

dmidecode shows they're both running:

Product Name: VMware Virtual Platform

But they're different customers, so I'd be somewhat surprised if they're
running the same versions of the hypervisor.

-- 
Justin




Re: On disable_cost

2024-05-06 Thread Robert Haas
On Mon, May 6, 2024 at 9:39 AM Robert Haas  wrote:
> It's not very clear that this mechanism is actually 100% reliable,

It isn't. Here's a test case. As a non-superuser, do this:

create table foo (a int, b text, primary key (a));
insert into foo values (1, 'Apple');
alter table foo enable row level security;
alter table foo force row level security;
create policy p1 on foo as permissive using (ctid in ('(0,1)', '(0,2)'));
begin;
declare c cursor for select * from foo;
fetch from c;
explain update foo set b = 'Manzana' where current of c;
update foo set b = 'Manzana' where current of c;

The explain produces this output:

 Update on foo  (cost=100.00..108.02 rows=0 width=0)
   ->  Tid Scan on foo  (cost=100.00..108.02 rows=1 width=38)
 TID Cond: (ctid = ANY ('{"(0,1)","(0,2)"}'::tid[]))
 Filter: CURRENT OF c

Unless I'm quite confused, the point of the code is to force
CurrentOfExpr to be a TID Cond, and it normally succeeds in doing so,
because WHERE CURRENT OF cursor_name has to be the one and only WHERE
condition for a normal UPDATE. I tried various cases involving views
and CTEs and got nowhere. But then I wrote a patch to make the
regression tests fail if a baserel's restrictinfo list contains a
CurrentOfExpr and also some other qual, and a couple of row-level
security tests failed (and nothing else). Which then allowed me to
construct the example above, where there are two possible TID quals
and the logic in tidpath.c latches onto the wrong one. The actual
UPDATE fails like this:

ERROR:  WHERE CURRENT OF is not supported for this table type

...because ExecEvalCurrentOfExpr supposes that the only way it can be
reached is for an FDW without the necessary support, but actually in
this case it's a planner error that gets us here.

Fortunately, there's no real reason for anyone to ever do something
like this, or at least I can't see one, so the fact that it doesn't
work probably doesn't really matter that much. And you can argue that
the only problem here is that the costing hack just didn't get updated
for RLS and now needs to be a bit more clever. But I think it'd be
better to find a way of making it less hacky. With the way the code is
structured right now, the chances of anyone understanding that RLS
might have an impact on its correctness were just about nil, IMHO.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: backend stuck in DataFileExtend

2024-05-06 Thread Andres Freund
Hi,

On 2024-05-06 12:37:26 -0500, Justin Pryzby wrote:
> On Mon, May 06, 2024 at 10:04:13AM -0700, Andres Freund wrote:
> > Hi,
> >
> > On 2024-05-06 09:05:38 -0500, Justin Pryzby wrote:
> > > In March, I noticed that a backend got stuck overnight doing:
> > >
> > > backend_start| 2024-03-27 22:34:12.774195-07
> > > xact_start   | 2024-03-27 22:34:39.741933-07
> > > query_start  | 2024-03-27 22:34:41.997276-07
> > > state_change | 2024-03-27 22:34:41.997307-07
> > > wait_event_type  | IO
> > > wait_event   | DataFileExtend
> > > state| active
> > > backend_xid  | 330896991
> > > backend_xmin | 330896991
> > > query_id | -3255287420265634540
> > > query| PREPARE mml_1 AS INSERT INTO child.un...
> > >
> > > The backend was spinning at 100% CPU:
> > >
> > > [pryzbyj@telsa2021 ~]$ ps -O wchan,pcpu 7881
> > >PID WCHAN  %CPU S TTY  TIME COMMAND
> > >   7881 ?  99.4 R ?08:14:55 postgres: telsasoft ts [local] 
> > > INSERT
> > >
> > > This was postgres 16 STABLE compiled at 14e991db8.
> > >
> > > It's possible that this is a rare issue that we haven't hit before.
> > > It's also possible that this is a recent regression.  We previously
> > > compiled at b2c9936a7 without hitting any issue (possibly by chance).
> > >
> > > I could neither strace the process nor attach a debugger.  They got
> > > stuck attaching.  Maybe it's possible there's a kernel issue.  This is a
> > > VM running centos7 / 3.10.0-1160.49.1.el7.x86_64.
> >
> > > $ awk '{print $14, $15}' /proc/7881/stat # usr / sys
> > > 229 3088448
> > >
> > > When I tried to shut down postgres (hoping to finally be able to attach
> > > a debugger), instead it got stuck:
> >
> > That strongly indicates either a kernel bug or storage having an issue. It
> > can't be postgres' fault if an IO never completes.
>
> Is that for sure even though wchan=? (which I take to mean "not in a system
> call"), and the process is stuck in user mode ?

Postgres doesn't do anything to prevent a debugger from working, so this is
just indicative that the kernel is stuck somewhere that it didn't set up
information about being blocked - because it's busy doing something.


> > What do /proc/$pid/stack say?
>
> [pryzbyj@telsasoft-centos7 ~]$ sudo cat /proc/18468/stat
> 18468 (postgres) R 2274 18468 18468 0 -1 4857920 91836 0 3985 0 364 3794271 0 
> 0 20 0 1 0 6092292660 941846528 10 18446744073709551615 4194304 12848820 
> 140732995870240 140732995857304 139726958536394 0 4194304 19929088 536896135 
> 0 0 0 17 3 0 0 1682 0 0 14949632 15052146 34668544 140732995874457 
> 140732995874511 140732995874511 140732995874781 0

stack, not stat...


> > > Full disclosure: the VM that hit this issue today has had storage-level
> > > errors (reported here at ZZqr_GTaHyuW7fLp@pryzbyj2023), as recently as 3
> > > days ago.
> >
> > So indeed, my suspicion from above is confirmed.
>
> I'd be fine with that conclusion (as in the earlier thread), except this
> has now happened on 2 different VMs, and the first one has no I/O
> issues.  If this were another symptom of a storage failure, and hadn't
> previously happened on another VM, I wouldn't be re-reporting it.

Is it the same VM hosting environment? And the same old distro?

Greetings,

Andres Freund




2024-05-09 release announcement draft

2024-05-06 Thread Jonathan S. Katz

Hi,

Please find the draft of the 2024-05-09 release announcement.

Please review for corrections and any omissions that should be called 
out as part of this release.


Please provide feedback no later (and preferably sooner) than 2024-05-09 
12:00 UTC.


Thanks,

Jonathan
The PostgreSQL Global Development Group has released an update to all supported
versions of PostgreSQL, including 16.3, 15.7, 14.12, 13.15, and 12.19.
This release fixes over 55 bugs reported over the last several months.

For the full list of changes, please review the
[release notes](https://www.postgresql.org/docs/release/).

PostgreSQL 12 EOL Notice


PostgreSQL 12 will stop receiving fixes on November 14, 2024. If you are
running PostgreSQL 12 in a production environment, we suggest that you make
plans to upgrade to a newer, supported version of PostgreSQL. Please see our
[versioning policy](https://www.postgresql.org/support/versioning/) for more
information.

Bug Fixes and Improvements
--
 
This update fixes over 55 bugs that were reported in the last several months.
The issues listed below affect PostgreSQL 16. Some of these issues may also
affect other supported versions of PostgreSQL.

* Fix [`INSERT`](https://www.postgresql.org/docs/current/sql-insert.html) from
multiple [`VALUES`](https://www.postgresql.org/docs/current/sql-values.html)
rows into a target column that is a domain over an array or composite type.
including requiring the [SELECT 
privilege](https://www.postgresql.org/docs/current/sql-grant.html)
on the target table when using 
[`MERGE`](https://www.postgresql.org/docs/current/sql-merge.html)
when using `MERGE ... DO NOTHING`.
* Per the SQL standard, throw an error if a target row in `MERGE` joins to more
than one source row during a modification.
* Fix incorrect pruning of `NULL` partition when a table is partitioned on
a boolean column and the query has a boolean `IS NOT` clause.
* Make [`ALTER FOREIGN TABLE ... SET 
SCHEMA`](https://www.postgresql.org/docs/current/sql-alterforeigntable.html)
move any owned sequences into the new schema.
* [`CREATE 
DATABASE`](https://www.postgresql.org/docs/current/sql-createdatabase.html)
now recognizes `STRATEGY` keywords case-insensitively.
* Fix how [EXPLAIN](https://www.postgresql.org/docs/current/sql-explain.html)
counts heap pages during bitmap heap scan to show all counted pages, not just
ones with visible tuples.
* Avoid deadlock during removal of orphaned temporary tables.
* Several fixes for 
[`VACUUM`](https://www.postgresql.org/docs/current/sql-vacuum.html),
including one that can reduce unnecessary I/O.
* Several query planner fixes.
* Add an optimization for certain operations where an installation has thousands
of roles.
* Fix confusion for SQL-language procedures that returns a single composite-type
column.
* Fix incorrect rounding and overflow hazards in 
[`date_bin()`](https://www.postgresql.org/docs/current/functions-datetime.html#FUNCTIONS-DATETIME-BIN).
* Detect integer overflow when adding or subtracting an
[interval](https://www.postgresql.org/docs/current/datatype-datetime.html#DATATYPE-INTERVAL-INPUT)
to/from a timestamp.
* Throw an error if an index is accessed while it is being reindexed.
* Fix several race conditions with logical replication, including determining if
a table sync operation is required.
* Disconnect if a new server session's client socket cannot be put into 
non-blocking mode.
* [`initdb -c`](https://www.postgresql.org/docs/current/app-initdb.html) now
matches parameter names case-insensitively.
* Avoid leaking a query result from 
[`psql`](https://www.postgresql.org/docs/current/app-psql.html)
after the query is cancelled.
* Fix how PL/pgSQL parses [single-line
comments](https://www.postgresql.org/docs/current/plpgsql-structure.html)
(`--`-style comments) following an expression.

Updating


All PostgreSQL update releases are cumulative. As with other minor releases,
users are not required to dump and reload their database or use `pg_upgrade` in
order to apply this update release; you may simply shutdown PostgreSQL and
update its binaries.

Users who have skipped one or more update releases may need to run additional
post-update steps; please see the release notes from earlier versions for
details.

For more details, please see the
[release notes](https://www.postgresql.org/docs/release/).

Links
-
* [Download](https://www.postgresql.org/download/)
* [Release Notes](https://www.postgresql.org/docs/release/)
* [Security](https://www.postgresql.org/support/security/)
* [Versioning Policy](https://www.postgresql.org/support/versioning/)
* [PostgreSQL 16 Release Announcement](https://www.postgresql.org/about/press/)
* [Follow @postgresql on Twitter](https://twitter.com/postgresql)
* [Donate](https://www.postgresql.org/about/donate/)

If you have corrections or suggestions for this release announcement, please
send them to the _pgsql-www@lists.postgresql.org_ public

Re: backend stuck in DataFileExtend

2024-05-06 Thread Justin Pryzby
On Mon, May 06, 2024 at 10:04:13AM -0700, Andres Freund wrote:
> Hi,
> 
> On 2024-05-06 09:05:38 -0500, Justin Pryzby wrote:
> > In March, I noticed that a backend got stuck overnight doing:
> > 
> > backend_start| 2024-03-27 22:34:12.774195-07
> > xact_start   | 2024-03-27 22:34:39.741933-07
> > query_start  | 2024-03-27 22:34:41.997276-07
> > state_change | 2024-03-27 22:34:41.997307-07
> > wait_event_type  | IO
> > wait_event   | DataFileExtend
> > state| active
> > backend_xid  | 330896991
> > backend_xmin | 330896991
> > query_id | -3255287420265634540
> > query| PREPARE mml_1 AS INSERT INTO child.un...
> > 
> > The backend was spinning at 100% CPU:
> > 
> > [pryzbyj@telsa2021 ~]$ ps -O wchan,pcpu 7881
> >PID WCHAN  %CPU S TTY  TIME COMMAND
> >   7881 ?  99.4 R ?08:14:55 postgres: telsasoft ts [local] INSERT
> > 
> > This was postgres 16 STABLE compiled at 14e991db8.
> > 
> > It's possible that this is a rare issue that we haven't hit before.
> > It's also possible that this is a recent regression.  We previously
> > compiled at b2c9936a7 without hitting any issue (possibly by chance).
> > 
> > I could neither strace the process nor attach a debugger.  They got
> > stuck attaching.  Maybe it's possible there's a kernel issue.  This is a
> > VM running centos7 / 3.10.0-1160.49.1.el7.x86_64.
> 
> > $ awk '{print $14, $15}' /proc/7881/stat # usr / sys
> > 229 3088448
> > 
> > When I tried to shut down postgres (hoping to finally be able to attach
> > a debugger), instead it got stuck:
> 
> That strongly indicates either a kernel bug or storage having an issue. It
> can't be postgres' fault if an IO never completes.

Is that for sure even though wchan=? (which I take to mean "not in a system
call"), and the process is stuck in user mode ?

> What do /proc/$pid/stack say?

[pryzbyj@telsasoft-centos7 ~]$ sudo cat /proc/18468/stat
18468 (postgres) R 2274 18468 18468 0 -1 4857920 91836 0 3985 0 364 3794271 0 0 
20 0 1 0 6092292660 941846528 10 18446744073709551615 4194304 12848820 
140732995870240 140732995857304 139726958536394 0 4194304 19929088 536896135 0 
0 0 17 3 0 0 1682 0 0 14949632 15052146 34668544 140732995874457 
140732995874511 140732995874511 140732995874781 0

> > In both cases, the backend got stuck after 10pm, which is when a backup
> > job kicks off, followed by other DB maintenance.  Our backup job uses
> > pg_export_snapshot() + pg_dump --snapshot.  In today's case, the pg_dump
> > would've finished and snapshot closed at 2023-05-05 22:15.  The backup
> > job did some more pg_dumps involving historic tables without snapshots
> > and finished at 01:11:40, at which point a reindex job started, but it
> > looks like the DB was already stuck for the purpose of reindex, and so
> > the script ended after a handful of commands were "[canceled] due to
> > statement timeout".
> 
> Is it possible that you're "just" waiting for very slow IO? Is there a lot of
> dirty memory? Particularly on these old kernels that can lead to very extreme
> delays.
> 
> grep -Ei 'dirty|writeback' /proc/meminfo

[pryzbyj@telsasoft-centos7 ~]$ grep -Ei 'dirty|writeback' /proc/meminfo
Dirty:28 kB
Writeback: 0 kB
WritebackTmp:  0 kB

> > Full disclosure: the VM that hit this issue today has had storage-level
> > errors (reported here at ZZqr_GTaHyuW7fLp@pryzbyj2023), as recently as 3
> > days ago.
> 
> So indeed, my suspicion from above is confirmed.

I'd be fine with that conclusion (as in the earlier thread), except this
has now happened on 2 different VMs, and the first one has no I/O
issues.  If this were another symptom of a storage failure, and hadn't
previously happened on another VM, I wouldn't be re-reporting it.

-- 
Justin




Re: Removing unneeded self joins

2024-05-06 Thread Bruce Momjian
On Mon, May  6, 2024 at 12:24:41PM -0400, Robert Haas wrote:
> Now that being said, I do also agree that the planner code is quite
> hard to understand, for various reasons. I don't think the structure
> of that code and the assumptions underlying it are as well-documented
> as they could be, and neither do I think that all of them are optimal.
> It has taken me a long time to learn as much as I know, and there is
> still quite a lot that I don't know. And I also agree that the planner
> does an unfortunate amount of in-place modification of existing
> structures without a lot of clarity about how it all works, and an
> unfortunate amount of data copying in some places, and even that the
> partition-wise join code isn't all that it could be. But I do not
> think that adds up to a conclusion that we should just be less
> ambitious with planner changes. Indeed, I would like to see us do
> more. There is certainly a lot of useful work that could be done. The
> trick is figuring out how to do it without breaking too many things,
> and that is not easy.

I agree with Robert.  While writing the Postgres 17 release notes, I am
excited to see the many optimizer improvements, and removing self-joins
from that list will be unfortunate.

I did write a blog entry in 2021 that suggested we could have
optimizer aggressiveness control to allow for more expensive
optimizations:

https://momjian.us/main/blogs/pgblog/2021.html#May_14_2021

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  Only you can decide what is important to you.




Re: backend stuck in DataFileExtend

2024-05-06 Thread Andres Freund
Hi,

On 2024-05-06 09:05:38 -0500, Justin Pryzby wrote:
> In March, I noticed that a backend got stuck overnight doing:
> 
> backend_start| 2024-03-27 22:34:12.774195-07
> xact_start   | 2024-03-27 22:34:39.741933-07
> query_start  | 2024-03-27 22:34:41.997276-07
> state_change | 2024-03-27 22:34:41.997307-07
> wait_event_type  | IO
> wait_event   | DataFileExtend
> state| active
> backend_xid  | 330896991
> backend_xmin | 330896991
> query_id | -3255287420265634540
> query| PREPARE mml_1 AS INSERT INTO child.un...
> 
> The backend was spinning at 100% CPU:
> 
> [pryzbyj@telsa2021 ~]$ ps -O wchan,pcpu 7881
>PID WCHAN  %CPU S TTY  TIME COMMAND
>   7881 ?  99.4 R ?08:14:55 postgres: telsasoft ts [local] INSERT
> 
> This was postgres 16 STABLE compiled at 14e991db8.
> 
> It's possible that this is a rare issue that we haven't hit before.
> It's also possible that this is a recent regression.  We previously
> compiled at b2c9936a7 without hitting any issue (possibly by chance).
> 
> I could neither strace the process nor attach a debugger.  They got
> stuck attaching.  Maybe it's possible there's a kernel issue.  This is a
> VM running centos7 / 3.10.0-1160.49.1.el7.x86_64.

> $ awk '{print $14, $15}' /proc/7881/stat # usr / sys
> 229 3088448
> 
> When I tried to shut down postgres (hoping to finally be able to attach
> a debugger), instead it got stuck:

That strongly indicates either a kernel bug or storage having an issue. It
can't be postgres' fault if an IO never completes.

What do /proc/$pid/stack say?


> In both cases, the backend got stuck after 10pm, which is when a backup
> job kicks off, followed by other DB maintenance.  Our backup job uses
> pg_export_snapshot() + pg_dump --snapshot.  In today's case, the pg_dump
> would've finished and snapshot closed at 2023-05-05 22:15.  The backup
> job did some more pg_dumps involving historic tables without snapshots
> and finished at 01:11:40, at which point a reindex job started, but it
> looks like the DB was already stuck for the purpose of reindex, and so
> the script ended after a handful of commands were "[canceled] due to
> statement timeout".

Is it possible that you're "just" waiting for very slow IO? Is there a lot of
dirty memory? Particularly on these old kernels that can lead to very extreme
delays.

grep -Ei 'dirty|writeback' /proc/meminfo


> [...]
> Full disclosure: the VM that hit this issue today has had storage-level
> errors (reported here at ZZqr_GTaHyuW7fLp@pryzbyj2023), as recently as 3
> days ago.

So indeed, my suspicion from above is confirmed.

Greetings,

Andres Freund




Re: pg17 issues with not-null contraints

2024-05-06 Thread Justin Pryzby
On Mon, May 06, 2024 at 06:34:16PM +0200, Alvaro Herrera wrote:
> On 2024-May-06, Justin Pryzby wrote:
> 
> > > (Do you really want the partition to be
> > > created without the primary key already there?)
> > 
> > Why not ?  The PK will be added when I attach it one moment later.
> > 
> > CREATE TABLE part (LIKE parent);
> > ALTER TABLE parent ATTACH PARTITION part ...
> 
> Well, if you load data in the meantime, you'll spend time during `ALTER
> TABLE parent` for the index to be created.  (On the other hand, you may
> want to first create the table, then load data, then create the
> indexes.)

To be clear, I'm referring to the case of CREATE+ATTACH to avoid a
strong lock while creating a partition in advance of loading data.  See:
20220718143304.gc18...@telsasoft.com
f170b572d2b4cc232c5b6d391b4ecf3e368594b7
898e5e3290a72d288923260143930fb32036c00c
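
The pattern in question is roughly this (a sketch; the object names and
the bound specification are illustrative only):

  -- create the future partition without taking a strong lock on the parent
  CREATE TABLE part (LIKE parent);
  -- ... load data into part here if desired ...
  -- attach it afterwards, so the parent isn't locked strongly while the
  -- partition is being created and populated
  ALTER TABLE parent ATTACH PARTITION part DEFAULT;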

> This would also solve your complaint, because then the table would have
> the not-null constraint in all cases.

I agree that it would solve my complaint, but at this time I've no
further opinion.

-- 
Justin




Re: pg17 issues with not-null contraints

2024-05-06 Thread Alvaro Herrera
On 2024-May-06, Justin Pryzby wrote:

> > (Do you really want the partition to be
> > created without the primary key already there?)
> 
> Why not ?  The PK will be added when I attach it one moment later.
> 
> CREATE TABLE part (LIKE parent);
> ALTER TABLE parent ATTACH PARTITION part ...

Well, if you load data in the meantime, you'll spend time during `ALTER
TABLE parent` for the index to be created.  (On the other hand, you may
want to first create the table, then load data, then create the
indexes.)

> Do you really think that after ATTACH, the constraints should be
> different depending on whether the child was created INCLUDING INDEXES ?
> I'll continue to think about this, but I still find that surprising.

I don't think I have a choice about this, because the standard says that
the resulting table must have NOT NULL on all columns whose
nullability characteristic is known not nullable; and the primary key
forces that to be the case.

Thinking again, maybe this is wrong in the opposite direction: perhaps
we should have not-null constraints on those columns even if INCLUDING
CONSTRAINTS is given, because failing to do that (which is the current
behavior) is non-conformant.  In turn, this suggests that in order to
make the partitioning behavior consistent, we should _in addition_ make
CREATE TABLE PARTITION OF add explicit not-null constraints to the
columns of the primary key of the partitioned table.

This would also solve your complaint, because then the table would have
the not-null constraint in all cases.

This might be taking the whole affair too far, though; not sure.
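
To spell the idea out with a sketch (this shows the proposed behavior,
not what the code does today; the table mirrors the one from upthread):

  CREATE TABLE t (i int PRIMARY KEY) PARTITION BY RANGE (i);
  -- would copy an explicit NOT NULL for i, derived from the primary key
  CREATE TABLE t1 (LIKE t INCLUDING CONSTRAINTS);
  -- would likewise add an explicit NOT NULL for i
  CREATE TABLE t2 PARTITION OF t FOR VALUES FROM (0) TO (1000);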

-- 
Álvaro HerreraBreisgau, Deutschland  —  https://www.EnterpriseDB.com/
"We’ve narrowed the problem down to the customer’s pants being in a situation
 of vigorous combustion" (Robert Haas, Postgres expert extraordinaire)




Re: Removing unneeded self joins

2024-05-06 Thread Robert Haas
On Mon, May 6, 2024 at 12:01 PM Andrei Lepikhov
 wrote:
> Right now, it evolves extensively - new structures, variables,
> alternative copies of the same node trees with slightly changed
> properties ... This approach lets us quickly introduce planning
> features (the many changes in planner logic since PG16 are evidence of
> that), and with still-growing computing resources it lets Postgres stay
> within RAM and reasonable planning time. But maybe we want to be more modest?
> The work Ashutosh has been doing this year shows how expensive the
> planner sometimes is. Perhaps we want machinery that will check the
> integrity of planning data beyond setrefs, which occasionally fails to
> detect such problems?
> If an extensive approach is the only viable option, then it's clear that
> this and many other features are simply not suitable for the Postgres
> planner. It's disheartening that this patch didn't elicit such
> high-level feedback.

Well, as I said before, I think self-join elimination is a good
feature, and I believe that it belongs in PostgreSQL. However, I don't
believe that this implementation was done as well as it needed to be
done. A great deal of the work involved in a feature like this lies in
figuring out at what stage of processing certain kinds of
transformations ought to be done, and what cleanup is needed
afterward. It is difficult for anyone to get that completely right the
first time around; left join elimination also provoked a series of
after-the-fact bug fixes. However, I think those were fewer in number
and spread over a longer period of time.

Now that being said, I do also agree that the planner code is quite
hard to understand, for various reasons. I don't think the structure
of that code and the assumptions underlying it are as well-documented
as they could be, and neither do I think that all of them are optimal.
It has taken me a long time to learn as much as I know, and there is
still quite a lot that I don't know. And I also agree that the planner
does an unfortunate amount of in-place modification of existing
structures without a lot of clarity about how it all works, and an
unfortunate amount of data copying in some places, and even that the
partition-wise join code isn't all that it could be. But I do not
think that adds up to a conclusion that we should just be less
ambitious with planner changes. Indeed, I would like to see us do
more. There is certainly a lot of useful work that could be done. The
trick is figuring out how to do it without breaking too many things,
and that is not easy.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Add new protocol message to change GUCs for usage with future protocol-only GUCs

2024-05-06 Thread Jelte Fennema-Nio
On Tue, 23 Apr 2024 at 19:39, Jacob Champion
 wrote:
>
> On Mon, Apr 22, 2024 at 2:20 PM Jelte Fennema-Nio  wrote:
> > 1. I strongly believe minor protocol version bumps after the initial
> > 3.1 one can be made painless for clients/poolers (so the ones to
> > 3.2/3.3/etc). Similar to how TLS 1.3 can be safely introduced, and not
> > having to worry about breaking TLS 1.2 communication.
>
> Apologies for focusing on a single portion of your argument, but this
> claim in particular stuck out to me. To my understanding, IETF worried
> a _lot_ about breaking TLS 1.2 implementations with the TLS 1.3
> change, to the point that TLS 1.3 clients and servers advertise
> themselves as TLS 1.2 and sneak the actual version used into a TLS
> extension (roughly analogous to the _pq_ stuff). I vaguely recall that
> the engineering work done for that update was pretty far from
> painless.

My bad... I guess TLS 1.3 was a bad example, due to it changing the
handshake itself so significantly.




Re: Add new protocol message to change GUCs for usage with future protocol-only GUCs

2024-05-06 Thread Jelte Fennema-Nio
On Tue, 23 Apr 2024 at 17:03, Robert Haas  wrote:
> If a client doesn't know about encrypted columns and sets that
> bit at random, that will break things, and formally I think that's a
> risk, because I don't believe we document anywhere that you shouldn't
> set unused bits in the format mask. But practically, it's not likely.
> (And also, maybe we should document that you shouldn't do that.)

Currently Postgres errors out when you set anything other than 0 or 1
in the format field. And it's already documented that these are the
only allowed values: "Each must presently be zero (text) or one
(binary)."

> Let's consider a hypothetical country much like Canada except that
> there are three official languages rather than two: English, French,
> and Robertish.

I don't really understand the point you are trying to make with this
analogy. Sure an almost equivalent unused language maybe shouldn't be
put on the signs, but having two different names (aka protocol
versions) for English and Robertish seems quite useful to avoid
confusion.

> And so here. If someone codes a connection pooler in the way you
> suppose, then it will break. But, first of all, they probably won't do
> that, both because it's not particularly likely that someone wants to
> gather that particular set of statistics and also because erroring out
> seems like an overreaction.

Is it really that much of an overreaction? Postgres happily errors on
any weirdness in the protocol including unknown format codes. Why
wouldn't a pooler/proxy in the middle be allowed to do the same? The
pooler has no clue what the new format means, maybe it requires some
session state to be passed on to the server to be interpreted
correctly. The pooler would be responsible for syncing that session
state but might not have synced it because it doesn't know that the
format code requires that. An example of this would be a compressed
value of which the compression algorithm is configured using a GUC.
Thus the pooler being strict in what it accepts would be a good way of
notifying the client that the pooler needs to be updated (or the
feature not used).

But I do agree that it seems quite unlikely that a pooler would
implement it like that though. However, a pooler logging a warning
seems not that crazy. And in that case the connection might not be
closed, but the logs would be flooded.

> And secondly, let's imagine that we do
> bump the protocol version and think about whether and how that solves
> the problem. A client will request from the pooler a version 3.1
> connection and the pooler will say, sorry, no can do, I only
> understand 3.0. So the client will now say, oh ok, no problem, I'm
> going to refrain from setting that parameter format bit. Cool, right?
>
> Well, no, not really. First, now the client application is probably
> broken. If the client is varying its behavior based on the server's
> protocol version, that must mean that it cares about accessing
> encrypted columns, and that means that the bit in question is not an
> optional feature. So actually, the fact that the pooler can force the
> client to downgrade hasn't fixed anything at all.

It seems quite a lot nicer if the client can safely fallback to not
writing data using the new format code, instead of getting an error
from the pooler/flooding the pooler logs/having the pooler close the
connection.

> Third, applications, drivers, and connection poolers now all need to
> worry about handling downgrades smoothly. If a connection pooler
> requests a v3.1 connection to the server and gets v3.0, it had better
> make sure that it only advertises 3.0 to the client.

This seems quite straightforward to solve from a pooler perspective:
1. Connect to the server first to find out what the maximum version is
that it supports
2. If a client asks for a higher version, advertise the server version

> If the client
> requests v3.0, the pooler had better make sure to either request v3.0
> from the server. Or alternatively, the pooler can be prepared to
> translate between 3.0 and 3.1 wherever that's needed, in either
> direction. But it's not at all clear what that would look like for
> something like TCE. Will the pooler arrange to encrypt parameters
> destined for encrypted tables if the client doesn't do so? Will it
> arrange to decrypt values coming from encrypted tables if the client
> doesn't understand encryption?

This is a harder problem, and indeed one I hadn't considered much.
From a pooler perspective I really would like to be able to have just
one pool of connections to the server all initially connected with the
3.1 protocol and use those 3.1 server connections for clients that use
the 3.1 protocol but also for clients that use the 3.0 protocol.
Creating separate pools of server connections for different protocol
versions is super annoying to manage for operators (e.g. how should
these pools be sized). One of the reasons I'm proposing the
ParameterSet message is to avoid this problem for protocol 

Re: pg17 issues with not-null contraints

2024-05-06 Thread Justin Pryzby
On Mon, May 06, 2024 at 05:56:54PM +0200, Alvaro Herrera wrote:
> On 2024-May-04, Alvaro Herrera wrote:
> > On 2024-May-03, Justin Pryzby wrote:
> > 
> > > But if it's created with LIKE:
> > > postgres=# CREATE TABLE t1 (LIKE t);
> > > postgres=# ALTER TABLE t ATTACH PARTITION t1 DEFAULT ;
> > > 
> > > ..one also sees:
> > > 
> > > Not-null constraints:
> > > "t1_i_not_null" NOT NULL "i"
> > 
> > Hmm, I think the problem here is not ATTACH; the not-null constraint is
> > there immediately after CREATE.  I think this is all right actually,
> > because we derive a not-null constraint from the primary key and this is
> > definitely intentional.  But I also think that if you do CREATE TABLE t1
> > (LIKE t INCLUDING CONSTRAINTS) then you should get only the primary key
> > and no separate not-null constraint.  That will make the table more
> > similar to the one being copied.
> 
> I misspoke -- it's INCLUDING INDEXES that we need here, not INCLUDING
> CONSTRAINTS ... and it turns out we already do it that way, so with this
> script
> 
> CREATE TABLE t (i int PRIMARY KEY) PARTITION BY RANGE (i);
> CREATE TABLE t1 (LIKE t INCLUDING INDEXES);
> ALTER TABLE t ATTACH PARTITION t1 DEFAULT ;
> 
> you end up with this
> 
> 55432 17devel 71313=# \d+ t
>   Partitioned table "public.t"
>  Column │  Type   │ Collation │ Nullable │ Default │ Storage │ Compression │ Stats target │ Description 
> ┼─┼───┼──┼─┼─┼─┼──┼─
>  i  │ integer │   │ not null │ │ plain   │ │              │ 
> Partition key: RANGE (i)
> Indexes:
> "t_pkey" PRIMARY KEY, btree (i)
> Partitions: t1 DEFAULT
> 
> 55432 17devel 71313=# \d+ t1
>Table "public.t1"
>  Column │  Type   │ Collation │ Nullable │ Default │ Storage │ Compression │ Stats target │ Description 
> ┼─┼───┼──┼─┼─┼─┼──┼─
>  i  │ integer │   │ not null │ │ plain   │ │              │ 
> Partition of: t DEFAULT
> No partition constraint
> Indexes:
> "t1_pkey" PRIMARY KEY, btree (i)
> Access method: heap
> 
> which I think is what you want.  (Do you really want the partition to be
> created without the primary key already there?)

Why not ?  The PK will be added when I attach it one moment later.

CREATE TABLE part (LIKE parent);
ALTER TABLE parent ATTACH PARTITION part ...

Do you really think that after ATTACH, the constraints should be
different depending on whether the child was created INCLUDING INDEXES ?
I'll continue to think about this, but I still find that surprising.

-- 
Justin




Re: Removing unneeded self joins

2024-05-06 Thread Andrei Lepikhov

On 6/5/2024 21:44, Robert Haas wrote:

On Sat, May 4, 2024 at 10:46 PM Andrei Lepikhov
 wrote:

Having no objective negative feedback, we have no reason to change
anything in the design or any part of the code. It looks regrettable and
unusual.


To me, this sounds like you think it's someone else's job to tell you
what is wrong with the patch, or how to fix it, and if they don't,
then you should get to have the patch as part of PostgreSQL. But that
is not how we do things, nor should we. I agree that it sucks when you
need feedback and don't get it, and I've written about that elsewhere
and recently. But if you don't get feedback and as a result you can't
get the patch to an acceptable level, 

I'm really sorry that the level of my language caused a misunderstanding.
The main purpose of this work is to form a more or less certain view of 
the direction of the planner's development.
Right now, it evolves extensively - new structures, variables, 
alternative copies of the same node trees with slightly changed 
properties ... This approach lets us introduce planning features 
quickly (the volume of changes in planner logic since PG16 is 
evidence of that), and with still-growing computing resources it 
keeps Postgres within reasonable RAM and planning-time budgets. But 
maybe we want to be more modest? The work Ashutosh has been doing 
this year shows how expensive the planner can sometimes be. Perhaps 
we want machinery that checks the integrity of planning data beyond 
setrefs, which occasionally fails to detect such problems?
If an extensive approach is the only viable option, then it's clear that 
this and many other features are simply not suitable for Postgres 
Planner. It's disheartening that this patch didn't elicit such 
high-level feedback.


--
regards,
Andrei Lepikhov





Re: pg17 issues with not-null contraints

2024-05-06 Thread Alvaro Herrera
On 2024-May-04, Alvaro Herrera wrote:

> On 2024-May-03, Justin Pryzby wrote:
> 
> > But if it's created with LIKE:
> > postgres=# CREATE TABLE t1 (LIKE t);
> > postgres=# ALTER TABLE t ATTACH PARTITION t1 DEFAULT ;
> > 
> > ..one also sees:
> > 
> > Not-null constraints:
> > "t1_i_not_null" NOT NULL "i"
> 
> Hmm, I think the problem here is not ATTACH; the not-null constraint is
> there immediately after CREATE.  I think this is all right actually,
> because we derive a not-null constraint from the primary key and this is
> definitely intentional.  But I also think that if you do CREATE TABLE t1
> (LIKE t INCLUDING CONSTRAINTS) then you should get only the primary key
> and no separate not-null constraint.  That will make the table more
> similar to the one being copied.

I misspoke -- it's INCLUDING INDEXES that we need here, not INCLUDING
CONSTRAINTS ... and it turns out we already do it that way, so with this
script

CREATE TABLE t (i int PRIMARY KEY) PARTITION BY RANGE (i);
CREATE TABLE t1 (LIKE t INCLUDING INDEXES);
ALTER TABLE t ATTACH PARTITION t1 DEFAULT ;

you end up with this

55432 17devel 71313=# \d+ t
  Partitioned table "public.t"
 Column │  Type   │ Collation │ Nullable │ Default │ Storage │ Compression │ Stats target │ Description 
┼─┼───┼──┼─┼─┼─┼──┼─
 i  │ integer │   │ not null │ │ plain   │ │              │ 
Partition key: RANGE (i)
Indexes:
"t_pkey" PRIMARY KEY, btree (i)
Partitions: t1 DEFAULT

55432 17devel 71313=# \d+ t1
   Table "public.t1"
 Column │  Type   │ Collation │ Nullable │ Default │ Storage │ Compression │ Stats target │ Description 
┼─┼───┼──┼─┼─┼─┼──┼─
 i  │ integer │   │ not null │ │ plain   │ │              │ 
Partition of: t DEFAULT
No partition constraint
Indexes:
"t1_pkey" PRIMARY KEY, btree (i)
Access method: heap

which I think is what you want.  (Do you really want the partition to be
created without the primary key already there?)


Now maybe in https://www.postgresql.org/docs/devel/sql-createtable.html
we need some explanation for this.  Right now we have

 INCLUDING INDEXES 
Indexes, PRIMARY KEY, UNIQUE, and EXCLUDE constraints on the original table
will be created on the new table. Names for the new indexes and constraints
are chosen according to the default rules, regardless of how the originals were
named. (This behavior avoids possible duplicate-name failures for the new
indexes.)

Maybe something like this before the naming considerations:
When creating a table like another that has a primary key and indexes
are excluded, a not-null constraint will be added to every column of
the primary key.

resulting in


 INCLUDING INDEXES 
Indexes, PRIMARY KEY, UNIQUE, and EXCLUDE constraints on the original table
will be created on the new table.  [When/If ?] indexes are excluded while
creating a table like another that has a primary key, a not-null
constraint will be added to every column of the primary key.

Names for the new indexes and constraints are chosen according to
the default rules, regardless of how the originals were named. (This
behavior avoids possible duplicate-name failures for the new
indexes.)

What do you think?

-- 
Álvaro HerreraBreisgau, Deutschland  —  https://www.EnterpriseDB.com/




Re: SQL function which allows to distinguish a server being in point in time recovery mode and an ordinary replica

2024-05-06 Thread m . litsarev

simple there are already too many of them.  Perhaps we should begin
tracking all that as a set of bitmasks, then plug in the tracked state
in shmem for consumption in some SQL function.


Hi!

Michael, Tristan
as a first step I have introduced the `SharedRecoveryDataFlags` bitmask 
instead of the three boolean flags SharedHotStandbyActive, 
SharedPromoteIsTriggered, and SharedStandbyModeRequested (the last one 
from my previous patch) and made minimal updates to the corresponding 
code based on that change.
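
For reviewers, a minimal sketch of the SQL-level usage of the function added
by the attached patch (behavior as described in the patch comment):

-- true if standby.signal was found by the startup process, false otherwise
SELECT pg_is_standby_requested();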


Respectfully,

Mikhail Litsarev
Postgres Professional: https://postgrespro.com


From e05e86baf90a14efec29b0efcb5696504796af9b Mon Sep 17 00:00:00 2001
From: Mikhail Litsarev 
Date: Mon, 6 May 2024 15:23:52 +0300
Subject: [PATCH 1/2] Introduce pg_is_standby_requested().

Introduce a bit array SharedRecoveryDataFlags and
pg_is_standby_requested() function to distinguish
a replica from a regular instance in point in time recovery mode.
---
 src/backend/access/transam/xlogfuncs.c| 14 +
 src/backend/access/transam/xlogrecovery.c | 75 ++-
 src/include/access/xlogrecovery.h |  1 +
 src/include/catalog/pg_proc.dat   |  4 ++
 4 files changed, 79 insertions(+), 15 deletions(-)

diff --git a/src/backend/access/transam/xlogfuncs.c b/src/backend/access/transam/xlogfuncs.c
index 92bdb17ed52..3ce934fcedb 100644
--- a/src/backend/access/transam/xlogfuncs.c
+++ b/src/backend/access/transam/xlogfuncs.c
@@ -747,3 +747,17 @@ pg_promote(PG_FUNCTION_ARGS)
 		   wait_seconds)));
 	PG_RETURN_BOOL(false);
 }
+
+/*
+ * Returns bool with a current value of StandbyModeIsRequested flag
+ * to distinguish a replica from a regular instance in a
+ * Point In Time Recovery (PITR) mode.
+ *
+ * Returns	true	if standby.signal file is found at startup process
+ * 			false	otherwise
+ */
+Datum
+pg_is_standby_requested(PG_FUNCTION_ARGS)
+{
+	PG_RETURN_BOOL(StandbyModeIsRequested());
+}
diff --git a/src/backend/access/transam/xlogrecovery.c b/src/backend/access/transam/xlogrecovery.c
index 29c5bec0847..5337c4974f0 100644
--- a/src/backend/access/transam/xlogrecovery.c
+++ b/src/backend/access/transam/xlogrecovery.c
@@ -170,13 +170,13 @@ static XLogRecPtr RedoStartLSN = InvalidXLogRecPtr;
 static TimeLineID RedoStartTLI = 0;
 
 /*
- * Local copy of SharedHotStandbyActive variable. False actually means "not
+ * Local copy of XLR_HOT_STANDBY_ACTIVE flag. False actually means "not
  * known, need to check the shared state".
  */
 static bool LocalHotStandbyActive = false;
 
 /*
- * Local copy of SharedPromoteIsTriggered variable. False actually means "not
+ * Local copy of XLR_PROMOTE_IS_TRIGGERED flag. False actually means "not
  * known, need to check the shared state".
  */
 static bool LocalPromoteIsTriggered = false;
@@ -298,22 +298,42 @@ static char *replay_image_masked = NULL;
 static char *primary_image_masked = NULL;
 
 
+/*
+ * The flag indicates if we allow hot standby queries to be
+ * run.  Protected by info_lck.
+ */
+#define	XLR_HOT_STANDBY_ACTIVE		0x01
+/*
+ * The flag indicates if a standby promotion has been
+ * triggered.  Protected by info_lck.
+ */
+#define	XLR_PROMOTE_IS_TRIGGERED	0x02
+/*
+ * The flag indicates if we're in a standby mode at
+ * start, while recovery mode is on.
+ */
+#define	XLR_STANDBY_MODE_REQUESTED	0x04
+
 /*
  * Shared-memory state for WAL recovery.
  */
 typedef struct XLogRecoveryCtlData
 {
 	/*
-	 * SharedHotStandbyActive indicates if we allow hot standby queries to be
-	 * run.  Protected by info_lck.
-	 */
-	bool		SharedHotStandbyActive;
-
-	/*
-	 * SharedPromoteIsTriggered indicates if a standby promotion has been
-	 * triggered.  Protected by info_lck.
+	 * This bit array is introduced to keep the following states:
+	 *
+	 *	--	XLR_HOT_STANDBY_ACTIVE indicates if we allow hot standby queries
+	 *		to be run. Protected by info_lck.
+	 *
+	 *	--	XLR_PROMOTE_IS_TRIGGERED indicates if a standby promotion
+	 *		has been triggered. Protected by info_lck.
+	 *
+	 * 	--	XLR_STANDBY_MODE_REQUESTED indicates if we're in a standby mode
+	 *		at start, while recovery mode is on. No info_lck protection.
+	 *
+	 *	and can be extended in future.
 	 */
-	bool		SharedPromoteIsTriggered;
+	uint32		SharedRecoveryDataFlags;
 
 	/*
 	 * recoveryWakeupLatch is used to wake up the startup process to continue
@@ -1082,10 +1102,17 @@ readRecoverySignalFile(void)
 
 	StandbyModeRequested = false;
 	ArchiveRecoveryRequested = false;
+	/*
+	 * There is no need to use Spinlock here because only the startup
+	 * process modifies the SharedRecoveryDataFlags bit here and
+	 * no other processes are reading it at that time.
+	 */
+	XLogRecoveryCtl->SharedRecoveryDataFlags &= ~XLR_STANDBY_MODE_REQUESTED;
 	if (standby_signal_file_found)
 	{
 		StandbyModeRequested = true;
 		ArchiveRecoveryRequested = true;
+		XLogRecoveryCtl->SharedRecoveryDataFlags |= XLR_STANDBY_MODE_REQUESTED;
 	}
 	else if (recovery_signal_file_found)
 	{
@@ -2259,7 +2286,7 @@ CheckRecoveryConsistency(void)
 		

Re: Removing unneeded self joins

2024-05-06 Thread Tom Lane
Robert Haas  writes:
> I want to go on record right now as disagreeing with the plan proposed
> in the commit message for the revert commit, namely, committing this
> again early in the v18 cycle. I don't think Tom would have proposed
> reverting this feature unless he believed that it had more serious
> problems than could be easily fixed in a short period of time. I think
> that concern is well-founded, given the number of fixes that were
> committed. It seems likely that the patch needs significant rework and
> stabilization before it gets committed again, and I think it shouldn't
> be committed again without explicit agreement from Tom or one of the
> other committers who have significant experience with the query
> planner.

FWIW I accept some of the blame here, for not having paid any
attention to the SJE work earlier.  I had other things on my mind
for most of last year, and not enough bandwidth to help.

The main thing I'd like to understand before we try this again is
why SJE needed so much new query-tree-manipulation infrastructure.
I would have expected it to be very similar to the left-join
elimination we do already, and therefore to mostly just share the
existing infrastructure.  (I also harbor suspicions that some of
the new code existed just because someone didn't research what
was already there --- for instance, the now-removed replace_varno
sure looks like ChangeVarNodes should have been used instead.)

Another thing that made me pretty sad was 8c441c082 (Forbid SJE with
result relation).  While I don't claim that that destroyed the entire
use case for SJE, it certainly knocked its usefulness down by many
notches, maybe even to the point where it's not worth putting in the
effort needed to get it to re-committability.  So I think we need to
look harder at finding a way around that.  Is the concern that
RETURNING should return either old or new values depending on which
RTE is mentioned?  If so, maybe the feature Dean has proposed to
allow RETURNING to access old values [1] is a prerequisite to moving
forward.  Alternatively, perhaps it'd be good enough to forbid SJE
only when the non-target relation is actually mentioned in RETURNING.

regards, tom lane

[1] 
https://www.postgresql.org/message-id/flat/CAEZATCWx0J0-v=Qjc6gXzR=KtsdvAE7Ow=D=mu50agoe+pv...@mail.gmail.com




Re: Removing unneeded self joins

2024-05-06 Thread Bruce Momjian
On Mon, May  6, 2024 at 10:44:33AM -0400, Robert Haas wrote:
> I want to go on record right now as disagreeing with the plan proposed
> in the commit message for the revert commit, namely, committing this
> again early in the v18 cycle. I don't think Tom would have proposed
> reverting this feature unless he believed that it had more serious
> problems than could be easily fixed in a short period of time. I think
> that concern is well-founded, given the number of fixes that were
> committed. It seems likely that the patch needs significant rework and
> stabilization before it gets committed again, and I think it shouldn't
> be committed again without explicit agreement from Tom or one of the
> other committers who have significant experience with the query
> planner. That is not to say that I don't approve generally of the idea
> of committing things earlier in the release cycle: I certainly do. It
> gives us more time to shake out problems with patches before we ship.
> But it only makes sense if we collectively believe that the patch is
> mostly correct, and only needs fine-tuning, and I think there are good
> reasons to believe that we shouldn't have that level of confidence in
> this case.

I think what Robert is saying is that it is an unacceptable plan to just
dump the code into PG 18 and clean it up in the following months --- it
needs more research before it is re-added to git.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  Only you can decide what is important to you.




Restrict EXPLAIN (ANALYZE) for RLS and security_barrier views

2024-05-06 Thread Laurenz Albe
Currently, it is pretty easy to subvert the restrictions imposed
by row-level security and security_barrier views.  All you have to
do is use EXPLAIN (ANALYZE) and see how many rows were filtered
out by the RLS policy or the view condition.

This is not considered a security bug (I asked), but I still think
it should be fixed.
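
To illustrate the leak, here is a minimal sketch (role, table, and policy
names are made up; EXPLAIN output abbreviated):

-- setup as a privileged user
CREATE ROLE snoop LOGIN;
CREATE TABLE secrets (owner text, payload text);
INSERT INTO secrets SELECT 'somebody_else', 'x' FROM generate_series(1, 42);
ALTER TABLE secrets ENABLE ROW LEVEL SECURITY;
CREATE POLICY p ON secrets USING (owner = current_user);
GRANT SELECT ON secrets TO snoop;

-- as snoop, who is not allowed to see any rows
SET ROLE snoop;
EXPLAIN (ANALYZE) SELECT * FROM secrets;
--  Seq Scan on secrets  (... actual rows=0 loops=1)
--    Filter: (owner = CURRENT_USER)
--    Rows Removed by Filter: 42    <- reveals how many rows the policy hid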

My idea is to forbid EXPLAIN (ANALYZE) for ordinary users whenever
a statement uses either of these features.  But restricting it to
superusers would be too restrictive (with a superuser, you can never
observe RLS, since superusers are exempt) and it would also be
dangerous (you shouldn't perform DML on untrusted tables as superuser).

So I thought we could restrict the use of EXPLAIN (ANALYZE) in these
situations to the members of a predefined role.  That could be a new
predefined role, but I think it might as well be "pg_read_all_stats",
since that role allows you to view sensitive data like the MCV in
pg_statistic, and EXPLAIN (ANALYZE) can be seen as providing executor
statistics.

Attached is a POC patch that implements that (documentation and
regression tests are still missing) to form a basis for a discussion.

There are a few things I would like feedback about:

- is it OK to use "pg_read_all_stats" for that?

- should the check be moved to standard_ExplainOneQuery()?

Yours,
Laurenz Albe
From f31ee5919a9d30f51ff9d54adc7397cb98dfa370 Mon Sep 17 00:00:00 2001
From: Laurenz Albe 
Date: Mon, 6 May 2024 12:43:02 +0200
Subject: [PATCH v1] Restrict EXPLAIN (ANALYZE) on security relevant statements

Using EXPLAIN (ANALYZE), it is easy to work around restrictions
imposed by row-level security or security barrier views.
This is not considered a security bug, but we ought to do better.
Restricting the use of EXPLAIN (ANALYZE) to superusers in such
cases would be too much, and superusers bypass row-level security,
so that would leave no way to debug such statements.

Consequently, restrict EXPLAIN (ANALYZE) on statements that
involve security_barrier views or row-level security to members
of the predefined role pg_read_all_stats.
---
 src/backend/commands/explain.c | 33 +
 1 file changed, 33 insertions(+)

diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index c0c73aa3c9..85782e614e 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -14,6 +14,7 @@
 #include "postgres.h"
 
 #include "access/xact.h"
+#include "catalog/pg_authid_d.h"
 #include "catalog/pg_type.h"
 #include "commands/createas.h"
 #include "commands/defrem.h"
@@ -21,6 +22,7 @@
 #include "foreign/fdwapi.h"
 #include "jit/jit.h"
 #include "libpq/pqformat.h"
+#include "miscadmin.h"
 #include "nodes/extensible.h"
 #include "nodes/makefuncs.h"
 #include "nodes/nodeFuncs.h"
@@ -29,6 +31,7 @@
 #include "rewrite/rewriteHandler.h"
 #include "storage/bufmgr.h"
 #include "tcop/tcopprot.h"
+#include "utils/acl.h"
 #include "utils/builtins.h"
 #include "utils/guc_tables.h"
 #include "utils/json.h"
@@ -431,6 +434,36 @@ ExplainOneQuery(Query *query, int cursorOptions,
 		return;
 	}
 
+	/*
+	 * Since EXPLAIN (ANALYZE) shows data like the number of rows removed by a
+	 * filter, it can be used to work around security restrictions that hide
+	 * table data from the user, such as security barrier views and row-level
+	 * security.  Only members of pg_read_all_stats and superusers can see such
+	 * statistics.  The check is here rather than in standard_ExplainOneQuery
+	 * to keep plugins from inadvertently subverting security.
+	 */
+	if (es->analyze &&
+		!has_privs_of_role(GetUserId(), ROLE_PG_READ_ALL_STATS))
+	{
+		ListCell   *rtable;
+
+		foreach(rtable, query->rtable)
+		{
+			RangeTblEntry *rte = lfirst(rtable);
+
+			if (rte->security_barrier)
+				ereport(ERROR,
+						(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
+						 errmsg("permission denied to use EXPLAIN (ANALYZE) with security_barrier views"),
+						 errdetail("Only members of pg_read_all_stats may see statement execution statistics.")));
+		}
+
+		if (query->hasRowSecurity)
+			ereport(ERROR,
+					(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
+					 errmsg("permission denied to use EXPLAIN (ANALYZE) with row-level security"),
+					 errdetail("Only members of pg_read_all_stats may see statement execution statistics.")));
+	}
+
 	/* if an advisor plugin is present, let it manage things */
 	if (ExplainOneQuery_hook)
 		(*ExplainOneQuery_hook) (query, cursorOptions, into, es,
-- 
2.45.0



Re: Removing unneeded self joins

2024-05-06 Thread Robert Haas
On Sat, May 4, 2024 at 10:46 PM Andrei Lepikhov
 wrote:
> Having no objective negative feedback, we have no reason to change
> anything in the design or any part of the code. It looks regrettable and
> unusual.

To me, this sounds like you think it's someone else's job to tell you
what is wrong with the patch, or how to fix it, and if they don't,
then you should get to have the patch as part of PostgreSQL. But that
is not how we do things, nor should we. I agree that it sucks when you
need feedback and don't get it, and I've written about that elsewhere
and recently. But if you don't get feedback and as a result you can't
get the patch to an acceptable level, or if you do get feedback but
the patch fails to reach an acceptable level anyway, then the only
correct decision is for us to not ship that code. That obviously sucks
from the point of view of the patch author, and also of the committer,
but consider the alternative. Once patches get through an initial
release and become part of the product, the responsibility for fixing
problems is understood to slowly move from the original committer to
the community as a whole. In practice, that means that a lot of the
work of fixing things that are broken, after some initial period, ends
up falling on committers other than the person who did the initial
commit. Even one or two problematic commits can generate an enormous
amount of work for people who weren't involved in the original
development and may not even have agreed with the development
direction, and it is more than fair for those people to express a view
about whether they are willing to carry that burden or not. When they
aren't, I do think that's regrettable, but I don't think it's unusual.
Just in this release, we've removed at least two previously-released
features because they're in bad shape and nobody's willing to maintain
them (snapshot too old, AIX support).

> After designing the feature, fixing its bugs, and reviewing joint
> patches on the commitfest, the question more likely lies in the planner
> design. For example, I wonder if anyone here knows why exactly the
> optimiser makes a copy of the whole query subtree in some places.
> Another example is PlannerInfo. Can we really control all the
> consequences of introducing, let's say, a new JoinDomain entity?

Bluntly, if you can't control those consequences, then you aren't
allowed to make that change.

I know first-hand how difficult some of these problems are. Sometime
in the last year or three, I spent weeks getting rid of ONE global
variable (ThisTimeLineID). It took an absolutely inordinate amount of
time, and it became clear to me that I was never going to get rid of
enough global variables in that part of the code to be able to write a
patch for the feature I wanted without risk of unforeseen
consequences. So I gave up on the entire feature. Maybe I'll try again
at some point, or maybe somebody else will feel like cleaning up that
code and then I can try again with a cleaner base, but what I don't
get to do is write a buggy patch for the feature I want and commit it
anyway. I either figure out a way to do it that I believe is low-risk
and that the community judges to be acceptable, or I don't do it.

I want to go on record right now as disagreeing with the plan proposed
in the commit message for the revert commit, namely, committing this
again early in the v18 cycle. I don't think Tom would have proposed
reverting this feature unless he believed that it had more serious
problems than could be easily fixed in a short period of time. I think
that concern is well-founded, given the number of fixes that were
committed. It seems likely that the patch needs significant rework and
stabilization before it gets committed again, and I think it shouldn't
be committed again without explicit agreement from Tom or one of the
other committers who have significant experience with the query
planner. That is not to say that I don't approve generally of the idea
of committing things earlier in the release cycle: I certainly do. It
gives us more time to shake out problems with patches before we ship.
But it only makes sense if we collectively believe that the patch is
mostly correct, and only needs fine-tuning, and I think there are good
reasons to believe that we shouldn't have that level of confidence in
this case.

--
Robert Haas
EDB: http://www.enterprisedb.com




Re: Help update PostgreSQL 13.12 to 13.14

2024-05-06 Thread Bruce Momjian


This seven email thread should never have appeared on the hackers list. 
It is more appropriate for the pgsql-gene...@postgresql.org email list.

---

On Mon, May  6, 2024 at 08:53:15AM +0500, Kashif Zeeshan wrote:
> Hi
> 
> Upgrade works when you have an existing Postgres installation with the server
> running.
> If you run the following command then it will upgrade the
> existing installation of Postgres.
> sudo dnf install -y postgresql13-server
> 
> But you don't need to execute the commands below, as they will create a data
> directory and start the server on it. 
> sudo /usr/pgsql-13/bin/postgresql-13-setup initdb
> sudo systemctl enable postgresql-13
> sudo systemctl start postgresql-13
> 
> You just need to install the version of postgres you need; it will upgrade
> the existing installation, and then you just need to restart the server.
> sudo systemctl restart postgresql-13
> 
> The following commands you mentioned are going to set up the repos for postgres,
> which will be used to download and install postgres packages.
> # Install the RPM from the repository:
> sudo dnf install -y https://download.postgresql.org/pub/repos/yum/reporpms/
> EL-8-x86_64/pgdg-redhat-repo-latest.noarch.rpm
> But you don't need to disable it, as this will disable the repo you installed
> above.
> sudo dnf -qy module disable postgresql
> 
> 
> Regards
> Kashif Zeeshan
> Bitnine Global
> 
> On Fri, May 3, 2024 at 7:28 PM •Isaac Rv  wrote:
> 
> Hi, these are the steps they give me
> 
> But is this only for installing the yum repo? Or does it install a new
> instance of PostgreSQL?
> And what happens if yum is already installed but badly configured, as you
> say?
> 
> # Install the RPM from the repository:
> sudo dnf install -y 
> https://download.postgresql.org/pub/repos/yum/reporpms/
> EL-8-x86_64/pgdg-redhat-repo-latest.noarch.rpm
> 
> sudo dnf -qy module disable postgresql
> 
> sudo dnf install -y postgresql13-server
> 
> sudo /usr/pgsql-13/bin/postgresql-13-setup initdb
> sudo systemctl enable postgresql-13
> sudo systemctl start postgresql-13
> 
> 
> I await your reply
> 
> 
> Regards 
> 
> On Mon, 29 Apr 2024 at 21:59, Kashif Zeeshan 
> ()
> wrote:
> 
> 
> 
> On Mon, Apr 29, 2024 at 9:07 PM •Isaac Rv 
> wrote:
> 
> Ok, I understand, yes, but my version is still on 13.12 and I need it
> to be 13.14; it tells me there are no more updates, but there
> really are.
> I no longer know how to update it to 13.14
> 
> 
> Hi
> 
> Please make sure that your postgres repository is set properly, that's
> the only reason that it's not finding V13.14. Please follow the link
> below.
> 
> https://www.postgresql.org/download/linux/redhat/
> 
> There is another way to avoid it: download V13.14 on your
> system and then install that version, which will upgrade
> your existing installation.
> 
> Regards
> Kashif Zeeshan
> Bitnine Global
> 
> 
> On Sat, 27 Apr 2024 at 9:29, Kashif Zeeshan (<
> kashi.zees...@gmail.com>) wrote:
> 
> Glad to be of help.
> pg_uprade is used with major version upgrade e.g. from PG13 to
> 14 etc
> 
> Regards
> Kashif Zeeshan
> Bitnine Global
> 
> On Fri, Apr 26, 2024 at 10:47 PM •Isaac Rv <
> isaacrodr...@gmail.com> wrote:
> 
> Hi, I just did it, it went fine; then I stopped the
> server, applied sudo yum update
> postgresql13 again, and it again returned the message that there
> are no more pending updates.
> I see there is pg_upgrade, but I don't quite understand how
> to use it
> 
> Regards and many thanks
> 
> On Fri, 26 Apr 2024 at 11:34, Kashif Zeeshan (<
> kashi.zees...@gmail.com>) wrote:
> 
> 
> 
> On Fri, Apr 26, 2024 at 9:22 PM •Isaac Rv <
> isaacrodr...@gmail.com> wrote:
> 
> Look, I tried with yum and it did update, but it
> nevertheless did not update to 13.14  
> 
> sudo yum update postgresql13
> Updating Subscription Management repositories.
> 
> This system is registered with an entitlement
> server, but is not receiving updates. You can use
> subscription-manager to assign subscriptions.
> 
> Last metadata expiration check: 0:07:02 ago on Fri
> 26 Apr 2024 10:01:36 AM CST.
>   

Re: AIX support

2024-05-06 Thread Sriram RK
Hi Team, on further investigation we were able to resolve the perl issue by 
setting the right PERL env location. Earlier it was pointing to the 32-bit perl, 
which is what caused the perl lib mismatch.
Now we have successfully built release 15 and 16 stable branches on the OSU lab 
node.

p9aix (OSU)
OS: AIX 72Z
RELEASE 16
p9aix:REL_16_STABLE [08:31:26] OK
 log passed to send_result ===
Branch: REL_16_STABLE
All stages succeeded

RELEASE 15
p9aix:REL_15_STABLE [08:55:37] OK
 log passed to send_result ===
Branch: REL_15_STABLE
All stages succeeded


Also, we successfully built the release 16 branch on our local nodes as well
OS: AIX 71C
pgsql-aix71C:REL_16_STABLE [02:25:32] OK
 log passed to send_result ===
Branch: REL_16_STABLE
All stages succeeded

OS: AIX72Z
pgsql-aix72Z:REL_16_STABLE [02:35:03] OK
 log passed to send_result ===
Branch: REL_16_STABLE
All stages succeeded

OS: AIX73D
pgsql-aix73D:REL_16_STABLE [05:32:29] OK
 log passed to send_result ===
Branch: REL_16_STABLE
All stages succeeded


Regards,
Sriram.




backend stuck in DataFileExtend

2024-05-06 Thread Justin Pryzby
In March, I noticed that a backend got stuck overnight doing:

backend_start| 2024-03-27 22:34:12.774195-07
xact_start   | 2024-03-27 22:34:39.741933-07
query_start  | 2024-03-27 22:34:41.997276-07
state_change | 2024-03-27 22:34:41.997307-07
wait_event_type  | IO
wait_event   | DataFileExtend
state| active
backend_xid  | 330896991
backend_xmin | 330896991
query_id | -3255287420265634540
query| PREPARE mml_1 AS INSERT INTO child.un...

The backend was spinning at 100% CPU:

[pryzbyj@telsa2021 ~]$ ps -O wchan,pcpu 7881
   PID WCHAN  %CPU S TTY  TIME COMMAND
  7881 ?  99.4 R ?08:14:55 postgres: telsasoft ts [local] INSERT

This was postgres 16 STABLE compiled at 14e991db8.

It's possible that this is a rare issue that we haven't hit before.
It's also possible this this is a recent regression.  We previously
compiled at b2c9936a7 without hitting any issue (possibly by chance).

I could neither strace the process nor attach a debugger.  They got
stuck attaching.  Maybe it's possible there's a kernel issue.  This is a
VM running centos7 / 3.10.0-1160.49.1.el7.x86_64.

$ awk '{print $14, $15}' /proc/7881/stat # usr / sys
229 3088448

When I tried to shut down postgres (hoping to finally be able to attach
a debugger), instead it got stuck:

$ ps -fu postgres
UID PID   PPID  C STIME TTY  TIME CMD
postgres   7881 119674 99 mar27 ?08:38:06 postgres: telsasoft ts 
[local] INSERT
postgres 119674  1  0 mar25 ?00:07:13 /usr/pgsql-16/bin/postgres -D 
/var/lib/pgsql/16/data/
postgres 119676 119674  0 mar25 ?00:00:11 postgres: logger 
postgres 119679 119674  0 mar25 ?00:11:56 postgres: checkpointer 

This first occurred on Mar 27, but I see today that it's now recurring
at a different customer:

backend_start   | 2024-05-05 22:19:17.009477-06
xact_start  | 2024-05-05 22:20:18.129305-06
query_start | 2024-05-05 22:20:19.409555-06
state_change| 2024-05-05 22:20:19.409556-06
pid | 18468
wait_event_type | IO
wait_event  | DataFileExtend
state   | active
backend_xid | 4236249136
backend_xmin| 4236221661
query_id| 2601062835886299431
left| PREPARE mml_1 AS INSERT INTO chil

This time it's running v16.2 (REL_16_STABLE compiled at b78fa8547),
under centos7 / 3.10.0-1160.66.1.el7.x86_64.

The symptoms are the same: backend stuck using 100% CPU in user mode:
[pryzbyj@telsasoft-centos7 ~]$ awk '{print $14, $15}' /proc/18468/stat # usr / 
sys
364 2541168

This seems to have affected other backends, which are now waiting on
RegisterSyncRequest, frozenid, and CheckpointerComm.
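
(The per-backend fields shown above are pg_stat_activity columns; a query
along these lines lists the stuck backend and the sessions piling up behind
it:)

SELECT pid, state, wait_event_type, wait_event,
       now() - query_start AS running_for, left(query, 40) AS query
FROM pg_stat_activity
WHERE wait_event IS NOT NULL AND state = 'active'
ORDER BY query_start;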

In both cases, the backend got stuck after 10pm, which is when a backup
job kicks off, followed by other DB maintenance.  Our backup job uses
pg_export_snapshot() + pg_dump --snapshot.  In today's case, the pg_dump
would've finished and snapshot closed at 2023-05-05 22:15.  The backup
job did some more pg_dumps involving historic tables without snapshots
and finished at 01:11:40, at which point a reindex job started, but it
looks like the DB was already stuck for the purpose of reindex, and so
the script ended after a handful of commands were "[canceled] due to
statement timeout".
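
(For reference, the snapshot-export pattern used by that backup job looks
roughly like this; the snapshot id is invented:)

-- in a long-lived session that stays open for the duration of the dump:
BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
SELECT pg_export_snapshot();   -- e.g. returns '00000004-0000003A-1'
-- then, from the shell, while the transaction above remains open:
--   pg_dump --snapshot=00000004-0000003A-1 -t some_table dbname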

Full disclosure: the VM that hit this issue today has had storage-level
errors (reported here at ZZqr_GTaHyuW7fLp@pryzbyj2023), as recently as 3
days ago.

Maybe more importantly, this VM also seems to suffer from some memory
leak, and the leaky process was Killed shortly after the stuck query
started.  This makes me suspect a race condition which was triggered
while swapping:

May  5 22:24:05 localhost kernel: Out of memory: Kill process 17157 (python3.8) 
score 134 or sacrifice child

We don't have as good logs from March, but I'm not aware of killed
processes on the VM where we hit this in March, but it's true that the
I/O there is not fast.

Also, I fibbed when I said these were compiled at 16 STABLE - I'd
backpatched a small number of patches from master:

a97bbe1f1df Reduce branches in heapgetpage()'s per-tuple loop
98f320eb2ef Increase default vacuum_buffer_usage_limit to 2MB.
44086b09753 Preliminary refactor of heap scanning functions
959b38d770b Invent --transaction-size option for pg_restore.
a45c78e3284 Rearrange pg_dump's handling of large objects for better efficiency.
9d1a5354f58 Fix costing bug in MergeAppend
a5cf808be55 Read include/exclude commands for dump/restore from file
8c16ad3b432 Allow using syncfs() in frontend utilities.
6cdeb32 Add support for syncfs() in frontend support functions.
3ed19567198 Make enum for sync methods available to frontend code.
f39b265808b Move PG_TEMP_FILE* macros to file_utils.h.
a14354cac0e Add GUC parameter "huge_pages_status"

I will need to restart services this morning, but if someone wants to
suggest diagnostic measures, I will see whether the command gets stuck
or not.

-- 
Justin




Re: partitioning and identity column

2024-05-06 Thread Dmitry Dolgov
> On Mon, May 06, 2024 at 06:52:41PM +0530, Ashutosh Bapat wrote:
> On Sun, May 5, 2024 at 1:43 AM Dmitry Dolgov <9erthali...@gmail.com> wrote:
> > I had a quick look, it covers the issues mentioned above in the thread.
> > Few nitpicks/questions:
> >
> > * I think it makes sense to verify if the ptup is valid. This approach
> >   would fail if the target column of the root partition is marked as
> >   attisdropped.
> >
>
> The column is searched by name which is derived from attno of child
> partition. So it has to exist in the root partition. If it doesn't
> something is seriously wrong. Do you have a reproducer? We may want to add
> Assert(HeapTupleIsValid(ptup)) just in case. But it seems unnecessary to me.

Sure, normally it should work. I don't have any particular situation in
mind where attisdropped might be set on a root partition, but obviously
setting it manually crashes this path. Consider it mostly as a suggestion
for a more defensive implementation, "just in case".

> >  Oid
> > -getIdentitySequence(Oid relid, AttrNumber attnum, bool missing_ok)
> > +getIdentitySequence(Relation rel, AttrNumber attnum, bool missing_ok)
> >  {
> >
> > [...]
> >
> > +   relid = llast_oid(ancestors);
> > +   ptup = SearchSysCacheAttName(relid, attname);
> > +   attnum = ((Form_pg_attribute) GETSTRUCT(ptup))->attnum;
> >
> > * getIdentitySequence is used in build_column_default, which in turn
> >   often appears in loops over table attributes. AFAICT it means that the
> >   same root partition search will be repeated multiple times in such
> >   situations if there is more than one identity. I assume the
> >   performance impact of this repetition is negligible?
> >
>
> I thought having multiple identity columns would be rare and hence avoided
> making the code complex. Otherwise we would have to get the root partition
> somewhere higher in the caller hierarchy, separating the related logic much
> farther apart. Usually the ancestor entries will be somewhere in the cache.

Yeah, agree, it's reasonable to expect that the case with multiple
identity columns will be rare.




Re: On disable_cost

2024-05-06 Thread Robert Haas
On Sat, May 4, 2024 at 12:57 PM Tom Lane  wrote:
> There is also some weirdness around needing to force use of tidscan
> if we have WHERE CURRENT OF.  But perhaps a different hack could be
> used for that.

Yeah, figuring out what to do about this was the trickiest part of the
experimental patch that I wrote last week. The idea of the current
code is that cost_qual_eval_walker charges disable_cost for
CurrentOfExpr, but cost_tidscan then subtracts disable_cost if
tidquals contains a CurrentOfExpr, so that we effectively disable
everything except TID scan paths and, I think, also any TID scan paths
that don't use the CurrentOfExpr as a qual. I'm not entirely sure
whether the last can happen, but I imagine that it might be possible
if the cursor refers to a query that itself contains some other kind
of TID qual.
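
(For context, the statement shape that produces a CurrentOfExpr, and
therefore must end up as a TID scan on the target relation, is the cursor
form of UPDATE/DELETE; a minimal sketch with made-up table and column names:)

BEGIN;
DECLARE c CURSOR FOR SELECT * FROM accounts WHERE balance < 0 FOR UPDATE;
FETCH NEXT FROM c;
UPDATE accounts SET flagged = true WHERE CURRENT OF c;
COMMIT;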

It's not very clear that this mechanism is actually 100% reliable,
because we know it's possible in general for the costs of two paths to
be different by more than disable_cost. Maybe that's not possible in
this specific context, though: I'm not sure.

The approach I took for my experimental patch was pretty grotty, and
probably not quite complete, but basically I defined the case where we
currently subtract out disable_cost as a "forced TID-scan". I passed
around a Boolean called forcedTidScan which gets set to true if we
discover that some plan is a forced TID-scan path, and then we discard
any other paths and then only add other forced TID-scan paths after
that point. There can be more than one, because of parameterization.

But I think that the right thing to do is probably to pull some of the
logic up out of create_tidscan_paths() and decide ONCE whether we're
in a forced TID-scan situation or not. If we are, then
set_plain_rel_pathlist() should arrange to create only forced TID-scan
paths; otherwise, it should proceed as it does now.

Maybe if I try to do that I'll find problems, but the current approach
seems backwards to me, like going to a restaurant and ordering one of
everything on the menu, then cancelling all of the orders except the
stuff you actually want.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Increase the length of identifers from 63 characters to 128 characters or more

2024-05-06 Thread Alvaro Herrera
On 2024-May-06, David G. Johnston wrote:

> On Monday, May 6, 2024, Peter Burbery  wrote:
> 
> >
> > Business Use-case: I want to create a table named things_that_take_up_a_
> > lot_of_storage_and_space_on_a_computer_and_hard_drive of 75 characters. I
> > also want to create a column named thing_that_takes_up_a_
> > lot_of_storage_and_space_on_a_computer_and_hard_drive_id of 78
> > characters. People have asked for this feature before.
>
> We have a mailing list archive.  You should do the same research of past
> requests and discussions on it since that is where you will find the
> developers involved in the discussion and can find out the past reasoning
> as to why this hasn’t been changed.

There's been a lot of discussion on this topic -- the most recent seems
to be [1] which will lead you to the older patches by John Naylor at [2].

In short, it's not that we're completely opposed to it, just that
somebody needs to do some more work in order to have a good
implementation for it.

[1] https://www.postgresql.org/message-id/324703.1696948627%40sss.pgh.pa.us
[2] 
https://www.postgresql.org/message-id/flat/CALSd-crdmj9PGdvdioU%3Da5W7P%3DTgNmEB2QP9wiF6DTUbBuMXrQ%40mail.gmail.com

-- 
Álvaro Herrera PostgreSQL Developer  —  https://www.EnterpriseDB.com/
"The first law of live demonstrations is: don't try to use the system.
Write a script that touches nothing, so it can't do any damage." (Jakob Nielsen)




Re: partitioning and identity column

2024-05-06 Thread Ashutosh Bapat
On Sun, May 5, 2024 at 1:43 AM Dmitry Dolgov <9erthali...@gmail.com> wrote:

> > On Tue, Apr 30, 2024 at 04:29:11PM +0530, Ashutosh Bapat wrote:
> > On Sun, Apr 28, 2024 at 12:29 PM Alexander Lakhin 
> > wrote:
> >
> > > 27.04.2024 18:00, Alexander Lakhin wrote:
> > > >
> > > > Please look also at another script, which produces the same error:
> > >
> > > I've discovered yet another problematic case:
> > > CREATE TABLE tbl1 (a int GENERATED ALWAYS AS IDENTITY, b text)
> > >  PARTITION BY LIST (a);
> > > CREATE TABLE tbl2 (b text, a int NOT NULL);
> > > ALTER TABLE tbl1 ATTACH PARTITION tbl2 DEFAULT;
> > >
> > > INSERT INTO tbl2 DEFAULT VALUES;
> > > ERROR:  no owned sequence found
> > >
> > > Though it works with tbl2(a int NOT NULL, b text)...
> > > Take a look at this too, please.
> > >
> >
> > Thanks Alexander for the report.
> >
> > PFA patch which fixes all the three problems.
> >
> > I had not fixed getIdentitySequence() to fetch identity sequence
> associated
> > with the partition because I thought it would be better to fail with an
> > error when it's not used correctly. But these bugs show 1. the error is
> > misleading and unwanted 2. there are more places where adding that logic
> > to  getIdentitySequence() makes sense. Fixed the function in these
> patches.
> > Now callers like transformAlterTableStmt have to be careful not to call
> the
> > function on a partition.
> >
> > I have examined all the callers of getIdentitySequence() and they seem to
> > be fine. The code related to SetIdentity, DropIdentity is not called for
> > partitions, errors for which are thrown elsewhere earlier.
>
> Thanks for the fix.
>
> I had a quick look, it covers the issues mentioned above in the thread.
> Few nitpicks/questions:
>
> * I think it makes sense to verify if the ptup is valid. This approach
>   would fail if the target column of the root partition is marked as
>   attisdropped.
>

The column is searched by name which is derived from attno of child
partition. So it has to exist in the root partition. If it doesn't
something is seriously wrong. Do you have a reproducer? We may want to add
Assert(HeapTupleIsValid(ptup)) just in case. But it seems unnecessary to me.


>
>  Oid
> -getIdentitySequence(Oid relid, AttrNumber attnum, bool missing_ok)
> +getIdentitySequence(Relation rel, AttrNumber attnum, bool missing_ok)
>  {
>
> [...]
>
> +   relid = llast_oid(ancestors);
> +   ptup = SearchSysCacheAttName(relid, attname);
> +   attnum = ((Form_pg_attribute) GETSTRUCT(ptup))->attnum;
>
> * getIdentitySequence is used in build_column_default, which in turn
>   often appears in loops over table attributes. AFAICT it means that the
>   same root partition search will be repeated multiple times in such
>   situations if there is more than one identity. I assume the
>   performance impact of this repetition is negligible?
>

I thought having multiple identity columns would be rare and hence avoided
making the code complex. Otherwise we would have to get the root partition
somewhere higher in the caller hierarchy, separating the related logic much
farther apart. Usually the ancestor entries will be somewhere in the cache.


>
> * Maybe a silly question, but since I'm not aware about all the details
>   here, I'm curious -- the approach of mapping attributes of a partition
>   to the root partition attributes, how robust is it? I guess there is
>   no way that the root partition column will be not what is expected,
>   e.g. due to some sort of concurrency?
>

Any such thing would require a lock on the partition relation in
question, which is taken before rel is passed around. So it shouldn't happen.

-- 
Best Wishes,
Ashutosh Bapat


Re: apply_scanjoin_target_to_paths and partitionwise join

2024-05-06 Thread Ashutosh Bapat
On Mon, May 6, 2024 at 4:26 PM Jakub Wartak 
wrote:

> Hi Ashutosh & hackers,
>
> On Mon, Apr 15, 2024 at 9:00 AM Ashutosh Bapat
>  wrote:
> >
> > Here's patch with
> >
> [..]
> > Adding to the next commitfest but better to consider this for the next
> set of minor releases.
>
> 1. The patch does not pass cfbot -
> https://cirrus-ci.com/task/5486258451906560 on master due to test
> failure "not ok 206   + partition_join"
>

So I need to create a patch for master first. I thought CFBot somehow knew
that the patch was created for PG 15. :)


>
> 2. Without the patch applied, the result of the meson test on master
> was clean (no failures , so master is fine). After applying patch
> there were expected some hunk failures (as the patch was created for
> 15_STABLE):
>
> patching file src/backend/optimizer/plan/planner.c
> Hunk #1 succeeded at 7567 (offset 468 lines).
> Hunk #2 succeeded at 7593 (offset 468 lines).
> patching file src/test/regress/expected/partition_join.out
> Hunk #1 succeeded at 4777 (offset 56 lines).
> Hunk #2 succeeded at 4867 (offset 56 lines).
> patching file src/test/regress/sql/partition_join.sql
> Hunk #1 succeeded at 1136 (offset 1 line).
>
> 3. Without patch there is performance regression/bug on master (cost
> is higher with enable_partitionwise_join=on that without it):
>
> data preparation:
> -- Test the process_outer_partition() code path
> CREATE TABLE plt1_adv (a int, b int, c text) PARTITION BY LIST (c);
> CREATE TABLE plt1_adv_p1 PARTITION OF plt1_adv FOR VALUES IN ('',
> '0001', '0002');
> CREATE TABLE plt1_adv_p2 PARTITION OF plt1_adv FOR VALUES IN ('0003',
> '0004');
> INSERT INTO plt1_adv SELECT i, i, to_char(i % 5, 'FM') FROM
> generate_series(0, 24) i;
> ANALYZE plt1_adv;
>
> CREATE TABLE plt2_adv (a int, b int, c text) PARTITION BY LIST (c);
> CREATE TABLE plt2_adv_p1 PARTITION OF plt2_adv FOR VALUES IN ('0002');
> CREATE TABLE plt2_adv_p2 PARTITION OF plt2_adv FOR VALUES IN ('0003',
> '0004');
> INSERT INTO plt2_adv SELECT i, i, to_char(i % 5, 'FM') FROM
> generate_series(0, 24) i WHERE i % 5 IN (2, 3, 4);
> ANALYZE plt2_adv;
>
> CREATE TABLE plt3_adv (a int, b int, c text) PARTITION BY LIST (c);
> CREATE TABLE plt3_adv_p1 PARTITION OF plt3_adv FOR VALUES IN ('0001');
> CREATE TABLE plt3_adv_p2 PARTITION OF plt3_adv FOR VALUES IN ('0003',
> '0004');
> INSERT INTO plt3_adv SELECT i, i, to_char(i % 5, 'FM') FROM
> generate_series(0, 24) i WHERE i % 5 IN (1, 3, 4);
> ANALYZE plt3_adv;
>
> off:
> EXPLAIN SELECT t1.a, t1.c, t2.a, t2.c, t3.a, t3.c FROM (plt1_adv t1
> LEFT JOIN plt2_adv t2 ON (t1.c = t2.c)) FULL JOIN plt3_adv t3 ON (t1.c
> = t3.c) WHERE coalesce(t1.a, 0) % 5 != 3 AND coalesce(t1.a, 0) % 5 !=
> 4 ORDER BY t1.c, t1.a, t2.a, t3.a;
>   QUERY PLAN
>
> ---
>  Sort  (cost=22.02..22.58 rows=223 width=27)
>Sort Key: t1.c, t1.a, t2.a, t3.a
>->  Hash Full Join  (cost=4.83..13.33 rows=223 width=27)
> [..]
>
>
> with enable_partitionwise_join=ON (see the jump from cost 22.02 -> 27.65):
> EXPLAIN SELECT t1.a, t1.c, t2.a, t2.c, t3.a, t3.c FROM (plt1_adv t1
> LEFT JOIN plt2_adv t2 ON (t1.c = t2.c)) FULL JOIN plt3_adv t3 ON (t1.c
> = t3.c) WHERE coalesce(t1.a, 0) % 5 != 3 AND coalesce(t1.a, 0) % 5 !=
> 4 ORDER BY t1.c, t1.a, t2.a, t3.a;
>   QUERY PLAN
>
> ---
>  Sort  (cost=27.65..28.37 rows=289 width=27)
>Sort Key: t1.c, t1.a, t2.a, t3.a
>->  Append  (cost=2.23..15.83 rows=289 width=27)
>  ->  Hash Full Join  (cost=2.23..4.81 rows=41 width=27)
> [..]
>  ->  Hash Full Join  (cost=2.45..9.57 rows=248 width=27)
> [..]
>
>
> However with the patch applied the plan with minimal cost is always
> chosen ("22"):
>
> explain SELECT t1.a, t1.c, t2.a, t2.c, t3.a, t3.c FROM (plt1_adv t1 LEFT
> JOIN
> plt2_adv t2 ON (t1.c = t2.c)) FULL JOIN plt3_adv t3 ON (t1.c = t3.c) WHERE
> coalesce(t1.a, 0) % 5 != 3 AND coalesce(t1.a, 0) % 5 != 4 ORDER BY
> t1.c, t1.a, t2.a, t3.a;
>   QUERY PLAN
>
> ---
>  Sort  (cost=22.02..22.58 rows=223 width=27)
>Sort Key: t1.c, t1.a, t2.a, t3.a
>->  Hash Full Join  (cost=4.83..13.33 rows=223 width=27)
> [..]
>
>
> set enable_partitionwise_join to on;
> explain SELECT t1.a, t1.c, t2.a, t2.c, t3.a, t3.c FROM (plt1_adv t1 LEFT
> JOIN
> plt2_adv t2 ON (t1.c = t2.c)) FULL JOIN plt3_adv t3 ON (t1.c = t3.c) WHERE
> coalesce(t1.a, 0) % 5 != 3 AND coalesce(t1.a, 0) % 5 != 4 ORDER BY
> t1.c, t1.a, t2.a, t3.a;
>   QUERY PLAN
>
> ---
>  Sort  (cost=22.02..22.58 rows=223 width=27)
>Sort 

Re: Increase the length of identifers from 63 characters to 128 characters or more

2024-05-06 Thread David G. Johnston
On Monday, May 6, 2024, Peter Burbery  wrote:

>
> Business Use-case: I want to create a table named things_that_take_up_a_
> lot_of_storage_and_space_on_a_computer_and_hard_drive of 75 characters. I
> also want to create a column named thing_that_takes_up_a_
> lot_of_storage_and_space_on_a_computer_and_hard_drive_id of 78
> characters. People have asked for this feature before.
>
>
We have a mailing list archive.  You should do the same research of past
requests and discussions on it since that is where you will find the
developers involved in the discussion and can find out the past reasoning
as to why this hasn’t been changed.

 David J.


Increase the length of identifers from 63 characters to 128 characters or more

2024-05-06 Thread Peter Burbery
One-line Summary: There are use cases for long table names, and people might
use Oracle or MS SQL Server because Postgres does not support table names
longer than 63 characters, so the maximum identifier length should be increased
to 128 or higher.

Business Use-case: I want to create a table
named 
things_that_take_up_a_lot_of_storage_and_space_on_a_computer_and_hard_drive
of 75 characters. I also want to create a column
named 
thing_that_takes_up_a_lot_of_storage_and_space_on_a_computer_and_hard_drive_id
of 78 characters. People have asked for this feature before. For more
details, please see and visit
https://www.reddit.com/r/PostgreSQL/comments/6kyyev/i_have_hit_the_table_name_length_limit_a_number/.
In particular, see
https://www.reddit.com/r/PostgreSQL/comments/6kyyev/i_have_hit_the_table_name_length_limit_a_number/#:~:text=been%20wondering%20the%20same
which states I've been wondering the same. It comes down to changing
NAMEDATALEN documented here:
https://www.postgresql.org/docs/9.3/static/runtime-config-preset.html and
someone made a patch that changes it to 256 characters:
https://gist.github.com/langner/5c7bc1d74a8b957cab26

The Postgres people seem to be resistant to changing this. I accept that,
where possible, one should keep identifiers short but descriptive, but I
haven't seen a good reason WHY it shouldn't be changed, since there are many
instances where remaining within 63 characters is quite difficult.
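
For reference, with the default NAMEDATALEN of 64 the server silently
truncates such names to 63 bytes and only emits a NOTICE, roughly like this
(exact truncation point shown approximately, and a similar NOTICE appears for
the column name):

CREATE TABLE things_that_take_up_a_lot_of_storage_and_space_on_a_computer_and_hard_drive (
    thing_that_takes_up_a_lot_of_storage_and_space_on_a_computer_and_hard_drive_id int
);
-- NOTICE:  identifier "things_that_take_up_a_lot_of_storage_and_space_on_a_computer_and_hard_drive"
--          will be truncated to "things_that_take_up_a_lot_of_storage_and_space_on_a_computer_an"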

Sincerely,

Peter Burbery


Possible to include xid8 in logical replication

2024-05-06 Thread Dave Cramer
Greetings,

I've been asked by the debezium developers if it is possible to include
xid8 in the logical replication protocol. Are there any previous threads on
this topic?

Any reason why we wouldn't include the epoch ?

Dave Cramer


Re: On disable_cost

2024-05-06 Thread Robert Haas
On Sat, May 4, 2024 at 9:16 AM David Rowley  wrote:
> I don't think you'd need to wait longer than where we do set_cheapest
> and find no paths to find out that there's going to be a problem.

I'm confused by this response, because I thought that the main point
of my previous email was explaining why that's not true. I showed an
example where you do find paths at set_cheapest() time and yet are
unable to complete planning.

> I don't think redoing planning is going to be easy or even useful. I
> mean what do you change when you replan? You can't just do
> enable_seqscan and enable_nestloop as if there's no index to provide
> sorted input and the plan requires some sort, then you still can't
> produce a plan.  Adding enable_sort to the list does not give me much
> confidence we'll never fail to produce a plan either. It just seems
> impossible to know which of the disabled ones caused the RelOptInfo to
> have no paths.  Also, you might end up enabling one that caused the
> planner to do something different than it would do today.  For
> example, a Path that today incurs 2x disable_cost vs a Path that only
> receives 1x disable_cost might do something different if you just went
> and enabled a bunch of enable* GUCs before replanning.

I agree that there are problems here, both in terms of implementation
complexity and also in terms of what behavior you actually get, but I
do not think that a proposal which changes some current behavior
should be considered dead on arrival. Whatever new behavior we might
want to implement needs to make sense, and there need to be good
reasons for making whatever changes are contemplated, but I don't
think we should take the position that it has to be identical to
current.

> I think the int Path.disabledness idea is worth coding up to try it.
> I imagine that a Path will incur the maximum of its subpath's
> disabledness's then add_path() just needs to prefer lower-valued
> disabledness Paths.

It definitely needs to be sum, not max. Otherwise you can't get the
matest example from the regression tests right, where one child lacks
the ability to comply with the GUC setting.

> That doesn't get you the benefits of fewer CPU cycles, but where did
> that come from as a motive to change this? There's no shortage of
> other ways to make the planner faster if that's an issue.

Well, I don't agree with that at all. If there are lots of ways to
make the planner faster, we should definitely do a bunch of that
stuff, because "will slow down the planner too much" has been a
leading cause of proposed planner patches being rejected for as long
as I've been involved with the project. My belief was that we were
rather short of good ideas in that area, actually. But even if it's
true that we have lots of other ways to speed up the planner, that
doesn't mean that it wouldn't be good to do it here, too.

Stepping back a bit, my current view of this area is: disable_cost is
highly imperfect both as an idea and as implemented in PostgreSQL.
Although I'm discovering that the current implementation gets more
things right than I had realized, it also sometimes gets things wrong.
The original poster gave an example of that, and there are others.
Furthermore, the current implementation has some weird
inconsistencies. Therefore, I would like something better. Better, to
me, could mean any combination of (a) superior behavior, (b) superior
performance, and (c) simpler, more elegant code. In a perfect world,
we'd be able to come up with something that wins in all three of those
areas, but I'm not seeing a way to achieve that, so I'm trying to
figure out what is achievable. And because we need to reach consensus
on whatever is to be done, I'm sharing raw research results rather
than just dropping a completed patch. I don't think it's at all easy
to understand what the realistic possibilities are in this area;
certainly it isn't for me. At some point I'm hoping that there will be
a patch (or a bunch of patches) that we can all agree are an
improvement over now and the best we can reasonably do, but I don't
yet know what the shape of those will be, because I'm still trying to
understand (and document on-list) what all the problems are.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: apply_scanjoin_target_to_paths and partitionwise join

2024-05-06 Thread Jakub Wartak
Hi Ashutosh & hackers,

On Mon, Apr 15, 2024 at 9:00 AM Ashutosh Bapat
 wrote:
>
> Here's patch with
>
[..]
> Adding to the next commitfest but better to consider this for the next set of 
> minor releases.

1. The patch does not pass cfbot -
https://cirrus-ci.com/task/5486258451906560 on master due to test
failure "not ok 206   + partition_join"

2. Without the patch applied, the result of the meson test on master
was clean (no failures, so master is fine). After applying the patch
there were some expected hunk failures (as the patch was created for
15_STABLE):

patching file src/backend/optimizer/plan/planner.c
Hunk #1 succeeded at 7567 (offset 468 lines).
Hunk #2 succeeded at 7593 (offset 468 lines).
patching file src/test/regress/expected/partition_join.out
Hunk #1 succeeded at 4777 (offset 56 lines).
Hunk #2 succeeded at 4867 (offset 56 lines).
patching file src/test/regress/sql/partition_join.sql
Hunk #1 succeeded at 1136 (offset 1 line).

3. Without the patch there is a performance regression/bug on master (the
cost is higher with enable_partitionwise_join=on than without it):

data preparation:
-- Test the process_outer_partition() code path
CREATE TABLE plt1_adv (a int, b int, c text) PARTITION BY LIST (c);
CREATE TABLE plt1_adv_p1 PARTITION OF plt1_adv FOR VALUES IN ('0000',
'0001', '0002');
CREATE TABLE plt1_adv_p2 PARTITION OF plt1_adv FOR VALUES IN ('0003', '0004');
INSERT INTO plt1_adv SELECT i, i, to_char(i % 5, 'FM0000') FROM
generate_series(0, 24) i;
ANALYZE plt1_adv;

CREATE TABLE plt2_adv (a int, b int, c text) PARTITION BY LIST (c);
CREATE TABLE plt2_adv_p1 PARTITION OF plt2_adv FOR VALUES IN ('0002');
CREATE TABLE plt2_adv_p2 PARTITION OF plt2_adv FOR VALUES IN ('0003', '0004');
INSERT INTO plt2_adv SELECT i, i, to_char(i % 5, 'FM0000') FROM
generate_series(0, 24) i WHERE i % 5 IN (2, 3, 4);
ANALYZE plt2_adv;

CREATE TABLE plt3_adv (a int, b int, c text) PARTITION BY LIST (c);
CREATE TABLE plt3_adv_p1 PARTITION OF plt3_adv FOR VALUES IN ('0001');
CREATE TABLE plt3_adv_p2 PARTITION OF plt3_adv FOR VALUES IN ('0003', '0004');
INSERT INTO plt3_adv SELECT i, i, to_char(i % 5, 'FM0000') FROM
generate_series(0, 24) i WHERE i % 5 IN (1, 3, 4);
ANALYZE plt3_adv;

off:
EXPLAIN SELECT t1.a, t1.c, t2.a, t2.c, t3.a, t3.c FROM (plt1_adv t1
LEFT JOIN plt2_adv t2 ON (t1.c = t2.c)) FULL JOIN plt3_adv t3 ON (t1.c
= t3.c) WHERE coalesce(t1.a, 0) % 5 != 3 AND coalesce(t1.a, 0) % 5 !=
4 ORDER BY t1.c, t1.a, t2.a, t3.a;
  QUERY PLAN
---
 Sort  (cost=22.02..22.58 rows=223 width=27)
   Sort Key: t1.c, t1.a, t2.a, t3.a
   ->  Hash Full Join  (cost=4.83..13.33 rows=223 width=27)
[..]


with enable_partitionwise_join=ON (see the jump from cost 22.02 -> 27.65):
EXPLAIN SELECT t1.a, t1.c, t2.a, t2.c, t3.a, t3.c FROM (plt1_adv t1
LEFT JOIN plt2_adv t2 ON (t1.c = t2.c)) FULL JOIN plt3_adv t3 ON (t1.c
= t3.c) WHERE coalesce(t1.a, 0) % 5 != 3 AND coalesce(t1.a, 0) % 5 !=
4 ORDER BY t1.c, t1.a, t2.a, t3.a;
  QUERY PLAN
---
 Sort  (cost=27.65..28.37 rows=289 width=27)
   Sort Key: t1.c, t1.a, t2.a, t3.a
   ->  Append  (cost=2.23..15.83 rows=289 width=27)
 ->  Hash Full Join  (cost=2.23..4.81 rows=41 width=27)
[..]
 ->  Hash Full Join  (cost=2.45..9.57 rows=248 width=27)
[..]


However, with the patch applied, the plan with the minimal cost (22.02) is
always chosen:

explain SELECT t1.a, t1.c, t2.a, t2.c, t3.a, t3.c FROM (plt1_adv t1 LEFT JOIN
plt2_adv t2 ON (t1.c = t2.c)) FULL JOIN plt3_adv t3 ON (t1.c = t3.c) WHERE
coalesce(t1.a, 0) % 5 != 3 AND coalesce(t1.a, 0) % 5 != 4 ORDER BY
t1.c, t1.a, t2.a, t3.a;
  QUERY PLAN
---
 Sort  (cost=22.02..22.58 rows=223 width=27)
   Sort Key: t1.c, t1.a, t2.a, t3.a
   ->  Hash Full Join  (cost=4.83..13.33 rows=223 width=27)
[..]


set enable_partitionwise_join to on;
explain SELECT t1.a, t1.c, t2.a, t2.c, t3.a, t3.c FROM (plt1_adv t1 LEFT JOIN
plt2_adv t2 ON (t1.c = t2.c)) FULL JOIN plt3_adv t3 ON (t1.c = t3.c) WHERE
coalesce(t1.a, 0) % 5 != 3 AND coalesce(t1.a, 0) % 5 != 4 ORDER BY
t1.c, t1.a, t2.a, t3.a;
  QUERY PLAN
---
 Sort  (cost=22.02..22.58 rows=223 width=27)
   Sort Key: t1.c, t1.a, t2.a, t3.a
   ->  Hash Full Join  (cost=4.83..13.33 rows=223 width=27)
[..]

With the patch applied, the cost always stays at the minimum of the
available plans, whether the toggle is on or off. We cannot provide a
reproducer for the real performance regression, but for the affected
customer it took 530+s (with enable_partitionwise_join=on) and without
that GUC it was 

Re: Extension for PostgreSQL cast jsonb to hstore WIP

2024-05-06 Thread Антуан Виолин
On 2024-04-03 We 04:21, Andrew Dunstan wrote:
> I don't think a cast that doesn't cater for all the forms json can take
> is going to work very well. At the very least you would need to error out
> in cases you didn't want to cover, and have tests for all of those
> errors. But the above is only a tiny fraction of those. If the error cases
> are going to be so much more than the cases that work it seems a bit
> pointless.
Hi everyone
I have changed my mail account so that my name is displayed properly in the
correspondence. I also made the cast raise an error if we are given an
incorrect value. I believe that PostgreSQL needs such a cast, since we
already have several incomplete casts, and they perform their duties well
and help in the right situations.
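
For reference, a sketch of the intended usage (the cast only exists with
the patch applied, and the exact behaviour for values I assume it rejects,
such as nested objects, depends on the patch):

CREATE EXTENSION IF NOT EXISTS hstore;
-- a flat jsonb object with scalar values is the case the cast targets
SELECT '{"a": "1", "b": null}'::jsonb::hstore;
-- nested objects and arrays have no natural hstore representation, so
-- the cast is expected to raise an error here
SELECT '{"a": {"b": 1}}'::jsonb::hstore;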

cheers

Antoine


cast_jsonb_to_hstore2.patch
Description: Binary data


Re: psql: fix variable existence tab completion

2024-05-06 Thread Alexander Korotkov
Hi, Anton!

On Mon, May 6, 2024 at 9:05 AM Anton A. Melnikov
 wrote:
> On 14.03.2024 17:57, Alexander Korotkov wrote:
> > On Sun, Mar 3, 2024 at 5:37 PM Erik Wienhold  wrote:
> >> On 2024-03-03 03:00 +0100, Steve Chavez wrote:
> >>> psql has the :{?name} syntax for testing a psql variable existence.
> >>>
> >>> But currently doing \echo :{?VERB doesn't trigger tab completion.
> >>>
> >>> This patch fixes it. I've also included a TAP test.
> >>
> >> Thanks.  The code looks good, all tests pass, and the tab completion
> >> works as expected when testing manually.
>
> I'm not sure whether Debian 10 is still relevant for the current master. But,
> if it is, I suggest a patch, since the test will not work under this OS.
> The thing is that Debian 10 backslash-escapes curly braces and the question
> mark, so TAB completion produces a string like this:
>
> \echo :\{\?VERBOSITY\}
>
> instead of the expected:
>
> \echo :{?VERBOSITY}
>
> The attached patch fixes the 010_tab_completion.pl test in the same way as
> [1].

Thank you for the fix.  As I understand it, the fix teaches
010_tab_completion.pl to tolerate the invalid result of tab
completion.  Do you think we could fix it in another way, so that the
result of tab completion is correct?

--
Regards,
Alexander Korotkov
Supabase




Re: wrong comment in libpq.h

2024-05-06 Thread Peter Eisentraut

On 04.05.24 00:29, David Zhang wrote:

On 2024-05-03 4:53 a.m., Daniel Gustafsson wrote:

On 3 May 2024, at 13:48, Peter Eisentraut  wrote:
Maybe it's easier if we just replaced

    prototypes for functions in xxx.c

with

    declarations for xxx.c

throughout src/include/libpq/libpq.h.

+1

+1


It looks like this wording "prototypes for functions in" is used many 
times in src/include/, in many cases equally inaccurately, so I would 
suggest creating a more comprehensive patch for this.






Re: TerminateOtherDBBackends code comments inconsistency.

2024-05-06 Thread Amit Kapila
On Tue, Apr 30, 2024 at 10:36 PM Noah Misch  wrote:
>
> >
> > >  One could argue the function should also check
> > > isBackgroundWorker and ignore even bgworkers that set proc->roleId, but 
> > > I've
> > > not done that.
> >
> > What is the argument for ignoring such workers?
>
> One of the proposed code comments says, "For bgworker authors, it's convenient
> to be able to recommend FORCE if a worker is blocking DROP DATABASE
> unexpectedly."  That argument is debatable, but I do think it applies equally
> to bgworkers whether or not they set proc->roleId.
>

Agreed.

-- 
With Regards,
Amit Kapila.




Test equivclass interferes with tests tsearch and advisory_lock

2024-05-06 Thread Alexander Lakhin

Hello hackers,

While trying to catch a sporadic regression test failure, I've discovered
that the tsearch and advisory_lock tests in the following parallel_schedule group:
test: select_views ...  tsearch ... advisory_lock indirect_toast equivclass

might fail, depending on timing, because the test equivclass creates and
drops int alias operators.

With the attached patch applied to make the tests in question repeatable, the
following test run fails for me:
(printf 'test: test_setup\n'; \
 printf 'test: tsearch equivclass %.0s\n' `seq 300`) >/tmp/test_schedule;
make -s check-tests TESTS="--schedule=/tmp/test_schedule"
...
# 17 of 601 tests failed.

regression.diffs contains:
diff -U3 .../src/test/regress/expected/tsearch.out 
.../src/test/regress/results/tsearch.out
--- .../src/test/regress/expected/tsearch.out   2024-05-06 
08:11:54.892649407 +
+++ .../src/test/regress/results/tsearch.out    2024-05-06 
08:13:35.514113420 +
@@ -16,10 +16,7 @@
 WHERE prsnamespace = 0 OR prsstart = 0 OR prstoken = 0 OR prsend = 0 OR
   -- prsheadline is optional
   prslextype = 0;
- oid | prsname
--+-
-(0 rows)
-
+ERROR:  cache lookup failed for type 18517
 SELECT oid, dictname
...

Or with advisory_lock:
(printf 'test: test_setup\n'; \
 printf 'test: advisory_lock equivclass %.0s\n' `seq 300`) >/tmp/test_schedule;
make -s check-tests TESTS="--schedule=/tmp/test_schedule"
...
# 1 of 601 tests failed.

regression.diffs contains:
diff -U3 .../src/test/regress/expected/advisory_lock.out 
.../src/test/regress/results/advisory_lock.out
--- .../src/test/regress/expected/advisory_lock.out 2024-04-10 
14:36:57.709586678 +
+++ .../src/test/regress/results/advisory_lock.out  2024-05-06 
08:15:09.235456794 +
@@ -14,40 +14,17 @@
 SELECT locktype, classid, objid, objsubid, mode, granted
    FROM pg_locks WHERE locktype = 'advisory' AND database = :datoid
    ORDER BY classid, objid, objsubid;
- locktype | classid | objid | objsubid | mode  | granted
---+-+---+--+---+-
- advisory |   0 | 1 |    1 | ExclusiveLock | t
- advisory |   0 | 2 |    1 | ShareLock | t
- advisory |   1 | 1 |    2 | ExclusiveLock | t
- advisory |   2 | 2 |    2 | ShareLock | t
-(4 rows)
-
+ERROR:  cache lookup failed for type 17976
 -- pg_advisory_unlock_all() shouldn't release xact locks
...

With backtrace_functions = 'getBaseTypeAndTypmod' specified,
I see the following stack trace of the error:
2024-05-06 08:30:51.344 UTC client backend[869479] pg_regress/tsearch ERROR:  
cache lookup failed for type 18041
2024-05-06 08:30:51.344 UTC client backend[869479] pg_regress/tsearch BACKTRACE:
getBaseTypeAndTypmod at lsyscache.c:2550:4
getBaseType at lsyscache.c:2526:1
find_coercion_pathway at parse_coerce.c:3131:18
can_coerce_type at parse_coerce.c:598:6
func_match_argtypes at parse_func.c:939:6
oper_select_candidate at parse_oper.c:327:5
oper at parse_oper.c:428:6
make_op at parse_oper.c:696:30
transformBoolExpr at parse_expr.c:1431:9
 (inlined by) transformExprRecurse at parse_expr.c:226:13
transformExpr at parse_expr.c:133:22
transformWhereClause at parse_clause.c:1867:1
transformSelectStmt at analyze.c:1382:20
 (inlined by) transformStmt at analyze.c:368:15
parse_analyze_fixedparams at analyze.c:110:15
pg_analyze_and_rewrite_fixedparams at postgres.c:691:5
 (inlined by) exec_simple_query at postgres.c:1190:20
PostgresMain at postgres.c:4680:7
BackendMain at backend_startup.c:61:2

Best regards,
Alexander

diff --git a/src/test/regress/expected/equivclass.out b/src/test/regress/expected/equivclass.out
index 3d5de28354..2605554380 100644
--- a/src/test/regress/expected/equivclass.out
+++ b/src/test/regress/expected/equivclass.out
@@ -1,3 +1,7 @@
+SET client_min_messages TO 'warning';
+DROP TYPE IF EXISTS int8alias1, int8alias2 CASCADE;
+DROP TABLE IF EXISTS ec0, ec1, ec2 CASCADE;
+RESET client_min_messages;
 --
 -- Tests for the planner's "equivalence class" mechanism
 --
diff --git a/src/test/regress/expected/tsearch.out b/src/test/regress/expected/tsearch.out
index cfa391ac51..6839be5f37 100644
--- a/src/test/regress/expected/tsearch.out
+++ b/src/test/regress/expected/tsearch.out
@@ -1,3 +1,6 @@
+SET client_min_messages TO 'warning';
+DROP TABLE IF EXISTS test_tsvector, test_tsquery CASCADE;
+RESET client_min_messages;
 -- directory paths are passed to us in environment variables
 \getenv abs_srcdir PG_ABS_SRCDIR
 --
diff --git a/src/test/regress/sql/equivclass.sql b/src/test/regress/sql/equivclass.sql
index 77dd964ebf..fec4a416c8 100644
--- a/src/test/regress/sql/equivclass.sql
+++ b/src/test/regress/sql/equivclass.sql
@@ -1,3 +1,7 @@
+SET client_min_messages TO 'warning';
+DROP TYPE IF EXISTS int8alias1, int8alias2 CASCADE;
+DROP TABLE IF EXISTS ec0, ec1, ec2 CASCADE;
+RESET client_min_messages;
 --
 -- Tests for the planner's "equivalence class" mechanism
 --
diff --git a/src/test/regress/sql/tsearch.sql 

Re: UniqueKey v2

2024-05-06 Thread Andy Fan


Hello Antonin,

Thanks for your interest in this topic!

> * I think that, before EC is considered suitable for an UK, its ec_opfamilies
>   field needs to be checked. I try to do that in
>   find_ec_position_matching_expr(), see 0004.

Could you make the reason for adding 'List *opfamily_lists;' into
UniqueKey clearer?  You said "This is needed to create ECs in the parent
query if the upper relation represents a subquery." but I didn't get
it. Since we need to maintain the UniqueKey in many places, I'd like
to keep it as simple as possible. Of course, anything essential should
be added for sure.

>
> * Set-returning functions (SRFs) can make a "distinct path" necessary even if
>   the join output is unique.

You are right on this point; I will fix it in the coming version.

>
> * RelOptInfo.notnullattrs
>
>   My understanding is that this field contains the set attributes whose
>   uniqueness is guaranteed by the unique key. They are acceptable because they
>   are either 1) not allowed to be NULL due to NOT NULL constraint or 2) NULL
>   value makes the containing row filtered out, so the row cannot break
>   uniqueness of the output. Am I right?
>
>   If so, I suggest to change the field name to something less generic, maybe
>   'uniquekey_attrs' or 'uniquekey_candidate_attrs', and adding a comment that
>   more checks are needed before particular attribute can actually be used in
>   UniqueKey.

I don't think so; UniqueKey is just one of the places that use this
not-null property, see 3af704098 for another use case of it.

(Because of 3af704098, we should leverage notnullattnums somehow in this
patch, which will be included in the next version as well).

> * add_uniquekey_for_uniqueindex()
>
>   I'd appreciate an explanation why the "single-row UK" is created. I think
>   the reason for unique_exprs==NIL is that a restriction clause VAR=CONST
>   exists for each column of the unique index. Whether I'm right or not, a
>   comment should state clearly what the reason is.

You are understanding it correctly. I will add comments in the next
version.

>
> * uniquekey_useful_for_merging()
>
>   How does uniqueness relate to merge join? In README.uniquekey you seem to
>   point out that a single row is always sorted, but I don't think this
>   function is related to that fact. (Instead, I'd expect that pathkeys are
>   added to all paths for a single-row relation, but I'm not sure you do that
>   in the current version of the patch.)

The merging is for "mergejoinable join clauses"; see the function
eclass_useful_for_merging. Usually I think of it as an operator like "t1.a = t2.a".

> * is_uniquekey_useful_afterjoin()
>
>   Now that my patch (0004) allows propagation of the unique keys from a
>   subquery to the upper query, I was wondering if the UniqueKey structure
>   needs the 'use_for_distinct field' I mean we should now propagate the unique
>   keys to the parent query whether the subquery has DISTINCT clause or not. I
>   noticed that the field is checked by is_uniquekey_useful_afterjoin(), so I
>   changed the function to always returned true. However nothing changed in the
>   output of regression tests (installcheck). Do you insist that the
>   'use_for_distinct' field is needed?
>
>
> * uniquekey_contains_multinulls()
>
>   ** Instead of calling the function when trying to use the UK, how about
>  checking the ECs when considering creation of the UK? If the tests fail,
>  just don't create the UK.

I don't think so, since we maintain the UniqueKey from the bottom up; you
can double-check whether my reasoning is appropriate.

CREATE TABLE t1(a int);
CREATE TABLE t2(a int);
CREATE UNIQUE INDEX ON t1(a);

SELECT DISTINCT t1.a FROM t1 JOIN t2 USING (a);

We need to create the UniqueKey on the baserel for t1, and the NULL
values are only filtered out in the joinrel, so we have to create it
while still allowing NULL values first.

>   ** What does the 'multi' word in the function name mean?

"multi" means multiple; I think we use this short name in many
places, for example bt_multi_page_stats, after a quick search.

> * relation_is_distinct_for()
>
>   The function name is too similar to rel_is_distinct_for(). I think the name
>   should indicate that you are checking the relation against a set of
>   pathkeys. Maybe rel_is_distinct_for_pathkeys() (and remove 'distinct' from
>   the argument name)? At the same time, it might be good to rename
>   rel_is_distinct_for() to rel_is_distinct_for_clauses().

OK.

> * uniquekey_contains_in()
>
>   Shouldn't this be uniquekey_contained_in()? And likewise, shouldn't the
>   comment be " ... if UniqueKey is contained in the list of EquivalenceClass"
>   ?

OK.
>
>   (In general, even though I'm not English native speaker, I think I see quite
>   a few grammar issues, which often make reading of the comments/documentation

Your English is really good:)

>
>
> * Combining the UKs
>
>   IMO this is the most problematic part of the patch. You call
>   populate_joinrel_uniquekeys() for the same join multiple 

Re: psql: fix variable existence tab completion

2024-05-06 Thread Anton A. Melnikov

Hello!

On 14.03.2024 17:57, Alexander Korotkov wrote:

On Sun, Mar 3, 2024 at 5:37 PM Erik Wienhold  wrote:

On 2024-03-03 03:00 +0100, Steve Chavez wrote:

psql has the :{?name} syntax for testing a psql variable existence.

But currently doing \echo :{?VERB doesn't trigger tab completion.

This patch fixes it. I've also included a TAP test.


Thanks.  The code looks good, all tests pass, and the tab completion
works as expected when testing manually.


I'm not sure whether Debian 10 is still relevant for the current master. But,
if it is, I suggest a patch, since the test will not work under this OS.
The thing is that Debian 10 backslash-escapes curly braces and the question
mark, so TAB completion produces a string like this:

\echo :\{\?VERBOSITY\}

instead of the expected:

\echo :{?VERBOSITY}

The attached patch fixes the 010_tab_completion.pl test in the same way as [1].
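
For context, here is a minimal interactive check of the variable-existence
syntax that the completion targets (VERBOSITY is a built-in psql variable;
no_such_var is assumed to be undefined):

-- :{?name} expands to TRUE or FALSE depending on whether the variable is set
\echo :{?VERBOSITY}
\if :{?no_such_var}
  \echo 'no_such_var is set'
\else
  \echo 'no_such_var is not set'
\endif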

With the best regards,

--
Anton A. Melnikov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

[1] https://www.postgresql.org/message-id/960764.1643751...@sss.pgh.pa.us

From 455bec0d785b0f5057fc7e91a5fede458cf8fd36 Mon Sep 17 00:00:00 2001
From: "Anton A. Melnikov" 
Date: Mon, 6 May 2024 08:40:45 +0300
Subject: [PATCH] Fix variable tab completion for broken libedit

---
 src/bin/psql/t/010_tab_completion.pl | 7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/src/bin/psql/t/010_tab_completion.pl b/src/bin/psql/t/010_tab_completion.pl
index b45c39f0f5..358478e935 100644
--- a/src/bin/psql/t/010_tab_completion.pl
+++ b/src/bin/psql/t/010_tab_completion.pl
@@ -414,12 +414,15 @@ check_completion(
 clear_query();
 
 # check completion for psql variable test
+# note: broken versions of libedit want to backslash curly braces and the question mark;
+# not much we can do about that
 check_completion(
 	"\\echo :{?VERB\t",
-	qr/:\{\?VERBOSITY} /,
+	qr/:\\?\{\\?\?VERBOSITY\\?} /,
 	"complete a psql variable test");
 
-clear_query();
+# broken versions of libedit require clear_line not clear_query here
+clear_line();
 
 # check no-completions code path
 check_completion("blarg \t\t", qr//, "check completion failure path");
-- 
2.43.2