Re: GUC names in messages

2023-11-27 Thread Laurenz Albe
On Tue, 2023-11-28 at 07:53 +0900, Michael Paquier wrote:
> On Mon, Nov 27, 2023 at 01:35:44AM -0500, Tom Lane wrote:
> > Laurenz Albe  writes:
> > > On Mon, 2023-11-27 at 13:41 +1100, Peter Smith wrote:
> > > > In the documentation and in guc_tables.c they are all described in
> > > > MixedCase (e.g. "DateStyle" instead of "datestyle"), so I felt the
> > > > messages should use the same case as the documentation, which is why I
> > > > changed all the ones you are referring to.
> > 
> > > I agree with that decision; we should use mixed case for these parameters.
> > > Otherwise we might get complaints that the following query does not return
> > > any results:
> > >   SELECT * FROM pg_settings WHERE name = 'timezone';
> 
> (I'm sure that you mean the opposite.  This query does not return any
> results on HEAD, but it would with "TimeZone".)

No, I meant it just like I said.  If all messages suggest that the parameter
is called "timezone", and not "TimeZone" (because we convert the name to lower
case), then it is surprising that the above query does not return results.

It would be better to call the parameter "TimeZone" everywhere.

(It would be best to convert the parameter to lower case, but I am worried
about the compatibility-pain-to-benefit ratio.)
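To illustrate the surprise (a minimal example; the row counts assume the
stock spelling "TimeZone" in the catalog):

  SELECT name FROM pg_settings WHERE name = 'timezone';         -- 0 rows
  SELECT name FROM pg_settings WHERE name = 'TimeZone';         -- 1 row
  SELECT name FROM pg_settings WHERE lower(name) = 'timezone';  -- 1 row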

Yours,
Laurenz Albe




Re: BackgroundPsql's set_query_timer_restart() may not work

2023-11-27 Thread Tom Lane
Bharath Rupireddy  writes:
> A nitpick on the patch - how about honoring the passed-in parameter
> with something like $self->{query_timer_restart} = 1 if !defined
> $self->{query_timer_restart}; instead of just setting it to 1 (a value
> other than undef) with $self->{query_timer_restart} = 1;?

I wondered about that too, but the evidence of existing callers is
that nobody cares.  If we did make the code do something like that,
(a) I don't think your fragment is right, and (b) we'd need to rewrite
the function's comment to explain it.  I'm not seeing a reason to
think it's worth spending effort on.

regards, tom lane




Re: Synchronizing slots from primary to standby

2023-11-27 Thread Drouvot, Bertrand

Hi,

On 11/28/23 4:13 AM, shveta malik wrote:
> On Mon, Nov 27, 2023 at 4:08 PM Amit Kapila  wrote:
>> On Mon, Nov 27, 2023 at 2:27 PM Zhijie Hou (Fujitsu)  wrote:
>>> Here is the updated version (v39_2) which includes all the changes made
>>> in 0002.  Please use it for review, and sorry for the confusion.
>>
>> --- a/src/backend/replication/logical/launcher.c
>> +++ b/src/backend/replication/logical/launcher.c
>> @@ -8,20 +8,27 @@
>>   *   src/backend/replication/logical/launcher.c
>>   *
>>   * NOTES
>> - *   This module contains the logical replication worker launcher which
>> - *   uses the background worker infrastructure to start the logical
>> - *   replication workers for every enabled subscription.
>> + *   This module contains the replication worker launcher which
>> + *   uses the background worker infrastructure to:
>> + *   a) start the logical replication workers for every enabled subscription
>> + *  when not in standby_mode.
>> + *   b) start the slot sync worker for logical failover slots synchronization
>> + *  from the primary server when in standby_mode.
>>
>> I was wondering whether we really need a launcher on standby to invoke
>> the sync-slot worker.  If so, why?  I guess it may have been required in
>> previous versions, where we were managing work for multiple slot-sync
>> workers (which is itself questionable in the sense of whether the
>> launcher is the right candidate for that), but now, with a single
>> slot-sync worker, it doesn't seem worth having.  What do you think?
>
> Yes, earlier a manager process was needed to manage multiple slot-sync
> workers and distribute load among them, but now that does not seem
> necessary. I gave it a try (PoC) and it seems to work well. If there
> are no objections to this approach, I can share the patch soon.

+1 on this new approach, thanks!

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: [PATCH] pg_convert improvement

2023-11-27 Thread Drouvot, Bertrand

Hi,

On 11/28/23 2:16 AM, Yurii Rashkovskii wrote:
> Hi Bertrand,
>
>> Did some minor changes in the attached:
>>
>> - Started the multi-line comment with an upper case and finished
>> it with a "." and re-worded a bit.
>> - Ran pgindent
>>
>> What do you think about the attached?
>
> It looks great!
>
>> Also, might be good to create a CF entry [1] so that the patch proposal
>> does not get lost and gets visibility.
>
> Just submitted it to the CF. Thank you for the review!



Thanks! Just marked it as "Ready for Committer".

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: BackgroundPsql's set_query_timer_restart() may not work

2023-11-27 Thread Bharath Rupireddy
On Tue, Nov 28, 2023 at 6:48 AM Masahiko Sawada  wrote:
>
> Hi,
>
> While adding some TAP tests, I realized that set_query_timer_restart()
> in BackgroundPsql may not work. Specifically, it seems not to work
> unless we pass an argument to the function. Here is the test script I
> used:
>
> If calling set_query_timer_restart() works properly, this test would
> pass since we reset the query timeout before executing the second
> pg_sleep(3). However, this test fails in my environment unless I use
> set_query_timer_restart(1) (i.e. passing something to the function).

Right.

> Currently authentication/t/001_password.pl is the sole user of
> the set_query_timer_restart() function. I think we should assign a value
> to query_timer_restart in set_query_timer_restart() even if
> no argument is passed, as in the attached patch, or we should change
> the caller to pass 1 to the function.

It was added by commit 664d7575, and I agree that calling the function
without an argument doesn't reset query_timer_restart as intended.

A nitpick on the patch - how about honoring the passed-in parameter
with something like $self->{query_timer_restart} = 1 if !defined
$self->{query_timer_restart}; instead of just setting it to 1 (a value
other than undef) with $self->{query_timer_restart} = 1;?
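For comparison, a sketch of a variant that would actually honor a
caller-supplied value (assuming the flag arrives as the first method
argument; this is not the committed code):

  sub set_query_timer_restart
  {
      my ($self, $value) = @_;

      # Honor an explicit argument; default to 1 when none is given.
      $self->{query_timer_restart} = defined($value) ? $value : 1;
      return $self->{query_timer_restart};
  }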

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: PostgreSql: Canceled on conflict out to old pivot

2023-11-27 Thread Ilya Kosmodemiansky
Hi,

> On 28. Nov 2023, at 06:41, Wirch, Eduard  wrote:
> 
> [...] But I fail to explain this kind of error:
> 
> ERROR: could not serialize access due to read/write dependencies among 
> transactions
>   Detail: Reason code: Canceled on identification as a pivot, with conflict 
> out to old committed transaction 61866959.
> 
> There is a variation of the error:
> 
> PSQLException: ERROR: could not serialize access due to read/write 
> dependencies among transactions
>   Detail: Reason code: Canceled on conflict out to old pivot 61940806.

Could you show EXPLAIN ANALYZE output for the queries which fail with such
errors?






PostgreSql: Canceled on conflict out to old pivot

2023-11-27 Thread Wirch, Eduard
Hi

I posted this question already on pgsql-general, but it got no answers.
Maybe the topic is too technical? So I'm trying it here. Maybe an SSI
specialist is on this list.

We have a PostgreSQL 15 server serving around 30 databases, one schema each
with the same layout. Each database is used by one application instance.
The application consistently uses transactions with isolation level
serializable to access the database, optimizing by using explicit read-only
transactions where applicable. Once the server reaches 100% CPU load, we
see an increased number of serialization conflict errors. This is expected,
due to more concurrent access. But I fail to explain this kind of error:

ERROR: could not serialize access due to read/write dependencies among
transactions
  Detail: Reason code: Canceled on identification as a pivot, with conflict
out to old committed transaction 61866959.

There is a variation of the error:

PSQLException: ERROR: could not serialize access due to read/write
dependencies among transactions
  Detail: Reason code: Canceled on conflict out to old pivot 61940806.

We're logging the id, begin, and end of every transaction. Transaction
61940806 was committed without errors. The transaction responsible for the
above error was started 40 minutes later (and failed immediately). With
61866959 it is even more extreme: the first conflict error occurred 2.5
hours after 61866959 was committed.

The DB table access pattern is too complex to lay out here. There are some
20 tables that are read from and written to. Transactions are usually
short-lived. The longest transaction that could occur is one minute long.
My understanding of serializable isolation is that only overlapping
transactions can conflict. I can be pretty sure that in the above cases
there is no single transaction that overlaps both 61940806 and the failing
transaction 40 minutes later. Such long-running transactions would cause
different types of errors in our system ("out of shared memory", "You
might need to increase max_pred_locks_per_transaction").

Why does PostgreSQL detect a conflict with a transaction that was committed
more than an hour earlier? Can there be a long dependency chain between
many short-running transactions? Does the high load prevent Postgres from
doing some cleanup?
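One quick sanity check for an overlooked long-running transaction would be
a query against the standard pg_stat_activity view (threshold arbitrary):

  SELECT pid, xact_start, state
  FROM pg_stat_activity
  WHERE xact_start < now() - interval '10 minutes';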

Cheers,
Eduard


Re: [PoC] pg_upgrade: allow to upgrade publisher node

2023-11-27 Thread Masahiko Sawada
On Thu, Nov 9, 2023 at 7:07 PM Amit Kapila  wrote:
>
> On Wed, Nov 8, 2023 at 11:05 PM vignesh C  wrote:
> >
> > On Wed, 8 Nov 2023 at 08:43, vignesh C  wrote:
> >
> > Here is a small improvement: num_slots need not be initialized,
> > as it is now used only after the result is assigned. The attached
> > patch has the changes for the same.
> >
>
> Pushed!
>

Thank you for your work on this feature!

One month has already passed since the main patch was committed, but
reading this change, I have some questions about the new
binary_upgrade_logical_slot_has_caught_up() function:

Is there any reason why this function can be executed only in binary
upgrade mode? It seems to me that the other functions in
pg_upgrade_support.c must be called only in binary upgrade mode because
they make some hacky changes internally. On the other hand,
binary_upgrade_logical_slot_has_caught_up() just calls
LogicalReplicationSlotHasPendingWal(), which doesn't change anything
internally. If we made this function usable in normal mode, the user
would be able to check each slot's upgradability without the pg_upgrade
--check command (or without stopping the server, if the user can ensure
no more meaningful WAL records are generated).

---
Also, the function checks whether the user has the REPLICATION privilege,
but I think that only superusers can connect to the server in binary
upgrade mode in the first place.

---
The following error message doesn't match the function name:

/* We must check before dereferencing the argument */
if (PG_ARGISNULL(0))
elog(ERROR, "null argument to
binary_upgrade_validate_wal_records is not allowed");

---
{ oid => '8046', descr => 'for use by pg_upgrade',
  proname => 'binary_upgrade_logical_slot_has_caught_up', proisstrict => 'f',
  provolatile => 'v', proparallel => 'u', prorettype => 'bool',
  proargtypes => 'name',
  prosrc => 'binary_upgrade_logical_slot_has_caught_up' },

The function is not marked strict, yet it checks internally whether the
passed argument is NULL. I think it would be clearer to make it a strict
function.
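Concretely, that suggestion amounts to flipping the pg_proc.dat entry
quoted above (a sketch; the PG_ARGISNULL() check would then be dropped):

  -  proname => 'binary_upgrade_logical_slot_has_caught_up', proisstrict => 'f',
  +  proname => 'binary_upgrade_logical_slot_has_caught_up', proisstrict => 't',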

---
LogicalReplicationSlotHasPendingWal() is defined in logical.c, but I
guess it's more suitable for slotfuncs.c, where similar functions
such as pg_logical_replication_slot_advance() are also defined.

Regards,

-- 
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com




Re: Improve tab completion for ALTER DEFAULT PRIVILEGE and ALTER TABLE

2023-11-27 Thread Shubham Khanna
On Mon, Nov 27, 2023 at 9:58 PM vignesh C  wrote:
>
> On Fri, 24 Nov 2023 at 18:37, Shubham Khanna
>  wrote:
> >
> > On Fri, Nov 24, 2023 at 6:33 PM vignesh C  wrote:
> > >
> > > Hi,
> > >
> > > Improved tab completion for "ALTER DEFAULT PRIVILEGES" and "ALTER TABLE":
> > > 1) The GRANT, REVOKE and FOR USER keywords were not displayed in the tab
> > > completion of ALTER DEFAULT PRIVILEGES, as in the statements below:
> > > ALTER DEFAULT PRIVILEGES GRANT INSERT ON tables TO PUBLIC;
> > > ALTER DEFAULT PRIVILEGES REVOKE INSERT ON tables FROM PUBLIC;
> > > ALTER DEFAULT PRIVILEGES FOR USER vignesh revoke INSERT ON tables FROM 
> > > dba1;
> > >
> > > 2) USER was not displayed for "ALTER DEFAULT PRIVILEGES IN SCHEMA
> > > public FOR " as in the statement below:
> > > ALTER DEFAULT PRIVILEGES IN SCHEMA public FOR USER dba1 GRANT INSERT
> > > ON TABLES TO PUBLIC;
> > >
> > > 3) "FOR GRANT OPTION" was not display for "ALTER DEFAULT PRIVILEGES
> > > REVOKE " like in below statement:
> > > alter default privileges revoke grant option for select ON tables FROM 
> > > dba1;
> > >
> > > 4) "DATA TYPE" was missing in "ALTER TABLE table-name ALTER COLUMN
> > > column-name SET" as in:
> > > ALTER TABLE t1 ALTER COLUMN c1 SET DATA TYPE text;
> > >
> > > Attached patch has the changes for the same.
> >
> > +COMPLETE_WITH("ROLE", "USER");
> > +  /* ALTER DEFAULT PRIVILEGES REVOKE */
> > +  else if (Matches("ALTER", "DEFAULT", "PRIVILEGES", "REVOKE"))
> > +COMPLETE_WITH("SELECT", "INSERT", "UPDATE", "DELETE", "TRUNCATE",
> > +    "REFERENCES", "TRIGGER", "CREATE", "EXECUTE", "USAGE",
> > +    "MAINTAIN", "ALL", "GRANT OPTION FOR");
> >
> > I could not find "alter default privileges revoke maintain", should
> > this be removed?
>
> This was reverted as part of:
> 151c22deee66a3390ca9a1c3675e29de54ae73fc.
> Revert MAINTAIN privilege and pg_maintain predefined role.
>
> This reverts the following commits: 4dbdb82513, c2122aae63,
> 5b1a879943, 9e1e9d6560, ff9618e82a, 60684dd834, 4441fc704d,
> and b5d6382496.  A role with the MAINTAIN privilege may be able to
> use search_path tricks to escalate privileges to the table owner.
> Unfortunately, it is too late in the v16 development cycle to apply
> the proposed fix, i.e., restricting search_path when running
> maintenance commands.
>
I have tested the given changes and they are working fine.

Thanks and Regards,
Shubham Khanna.




Re: remaining sql/json patches

2023-11-27 Thread John Naylor
On Mon, Nov 27, 2023 at 8:57 PM Andrew Dunstan  wrote:
> Interesting. But inferring a speed effect from such changes is
> difficult. I don't have a good idea about measuring parser speed, but a
> tool to do that would be useful. Amit has made a start on such
> measurements, but it's only a start. I'd prefer to have evidence rather
> than speculation.

Tom shared this test a while back, and that's the one I've used in the
past. The downside of a micro-benchmark like that is that it can
monopolize the CPU cache. Cache misses in real-world queries are
likely much more dominant.

https://www.postgresql.org/message-id/14616.1558560...@sss.pgh.pa.us

Aside on the relevance of parser speed: I've seen customers
successfully lower their monthly cloud bills by moving away from
prepared statements, allowing smaller-memory instances.




Re: New instability in stats regression test

2023-11-27 Thread Michael Paquier
On Sun, Nov 26, 2023 at 10:34:59PM -0500, Tom Lane wrote:
> I'm good with that answer --- I doubt that this test sequence is
> proving anything that's worth the cycles it takes.  If it'd catch
> oversights like failing to add new stats types to the "reset all"
> code path, then I'd be for keeping it; but I don't see how the
> test could notice that.

For now I've applied a patch that removes the whole sequence.  I'll
keep an eye on the buildfarm for a few days in case there are more
failures.
--
Michael




Re: SSL tests fail on OpenSSL v3.2.0

2023-11-27 Thread Tom Lane
Michael Paquier  writes:
> Or even simpler: plant a (ssl\/tls|sslv3) in these strings.

Yeah, weakening the pattern match was what I had in mind.
I was thinking of something like "ssl[a-z0-9/]*" but your
proposal works too.
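For instance (an illustrative pattern only, not the committed fix; the
message text comes from the reported failures):

  qr/SSL error: ssl[a-z0-9\/]* alert certificate revoked/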

regards, tom lane




Re: SSL tests fail on OpenSSL v3.2.0

2023-11-27 Thread Michael Paquier
On Tue, Nov 28, 2023 at 12:55:37PM +0900, Michael Paquier wrote:
> Sigh.  We could use an extra check_pg_config() with a routine new in
> 3.2.0.  Looking at CHANGES.md, SSL_get0_group_name() seems to be one
> generic choice here.

Or even simpler: plant a (ssl\/tls|sslv3) in these strings.
--
Michael




Re: POC, WIP: OR-clause support for indexes

2023-11-27 Thread Peter Geoghegan
On Mon, Nov 27, 2023 at 5:07 PM Peter Geoghegan  wrote:
> One of the reasons why we shouldn't do this during parse analysis is
> because query rewriting might matter. But that doesn't mean that the
> transformation/normalization process must fundamentally be the
> responsibility of the optimizer, through process of elimination.
>
> Maybe it should be the responsibility of some other phase of query
> processing, invented solely to make life easier for the optimizer, but
> not formally part of query planning per se.

Support for SEARCH and CYCLE clauses for recursive CTEs (added by
commit 3696a600e2) works by literally rewriting a parse node into a
form involving RowExpr and ScalarArrayOpExpr during rewriting. See
rewriteSearchAndCycle(). These implementation details are even
mentioned in user-facing docs.

Separately, the planner has long relied on certain generic
normalization steps from rewriteHandler.c. For example, it reorders
the targetlist from INSERT and UPDATE statements into what it knows to
be standard order within the planner, for the planner's convenience.

I'm not suggesting that these are any kind of precedent to follow now.
Just that they hint that rewriting/transformation prior to query
planning proper could be the right general approach. AFAICT that
really is what is needed. That, plus the work of fixing any
undesirable/unintended side effects that the transformations lead to,
which might be a difficult task in its own right (it likely requires
work in the planner).

-- 
Peter Geoghegan




Re: SSL tests fail on OpenSSL v3.2.0

2023-11-27 Thread Michael Paquier
On Mon, Nov 27, 2023 at 09:04:23PM -0500, Tom Lane wrote:
> I can confirm that we also fail when using up-to-date MacPorts, which
> seems to have started shipping 3.2.0 last week or so.  I tried the v3
> patch, and while that stops the crash, it looks like 3.2.0 has also
> made some random changes in error messages:
> 
> Failed 3/205 subtests 
> t/002_scram.pl . ok
> t/003_sslinfo.pl ... ok
> 
> Guess we'll need to adjust the test script a bit too.

Sigh.  We could use an extra check_pg_config() with a routine new in
3.2.0.  Looking at CHANGES.md, SSL_get0_group_name() seems to be one
generic choice here.
--
Michael




Re: POC, WIP: OR-clause support for indexes

2023-11-27 Thread Alena Rybakina

On 25.11.2023 19:13, Alexander Korotkov wrote:
> On Sat, Nov 25, 2023 at 1:10 PM Alena Rybakina  wrote:
>> On 25.11.2023 04:13, Alexander Korotkov wrote:
>>> It seems to me there is a confusion.  I didn't mean we need to move
>>> conversion of OR-expressions to ANY into choose_bitmap_and() function
>>> or anything like this.  My idea was to avoid degradation of plans,
>>> which I've seen in [1].  Current code for generation of bitmap paths
>>> considers the possibility to split OR-expressions into distinct bitmap
>>> index scans.  But it doesn't consider this possibility for
>>> ANY-expressions.  So, my idea was to enhance our bitmap scan
>>> generation to consider split values of ANY-expressions into distinct
>>> bitmap index scans.  So, in the example [1] and similar queries
>>> conversion of OR-expressions to ANY wouldn't affect the generation of
>>> bitmap paths.
>>
>> Thanks for the explanation, yes, I did not understand the idea correctly
>> at first. I will try to implement something similar.
>
> Alena, great, thank you.  I'm looking forward to the updated patch.

I wrote the patch (any_to_or.diff.txt). Although it is still under
development (not all regression tests pass so far), it is already clear
that for a query where a bad plan was chosen before, a more optimal
query plan is now chosen.


postgres=# set enable_or_transformation = on;
SET
postgres=# explain select * from test where (x = 1 or x = 2) and y = 100;
  QUERY PLAN
--
 Bitmap Heap Scan on test  (cost=8.60..12.62 rows=1 width=12)
   Recheck Cond: (((y = '100'::double precision) AND (x = 1)) OR ((y = '100'::double precision) AND (x = 2)))
   ->  BitmapOr  (cost=8.60..8.60 rows=1 width=0)
         ->  Bitmap Index Scan on test_x_1_y  (cost=0.00..4.30 rows=1 width=0)
               Index Cond: (y = '100'::double precision)
         ->  Bitmap Index Scan on test_x_2_y  (cost=0.00..4.30 rows=1 width=0)
               Index Cond: (y = '100'::double precision)
(7 rows)

I'm now thinking about how to combine it.

BTW, while I was figuring out how create_index_paths() works and how
bitmap scan index paths are created, I think I found a bug involving
uninitialized memory (the fix is in bugfix.diff.txt). Without that fix,
it crashes at the stage of assigning a value to either of the variables
skip_lower_saop and skip_nonnative_saop. I discovered it when I forced
a bitmap index plan to be formed for ANY (any_to_or.diff.txt), and it
causes a problem with my OR->ANY conversion patch.


--
Regards,
Alena Rybakina
Postgres Professional
diff --git a/src/backend/optimizer/path/indxpath.c 
b/src/backend/optimizer/path/indxpath.c
index 03a5fbdc6dc..c242eec34d6 100644
--- a/src/backend/optimizer/path/indxpath.c
+++ b/src/backend/optimizer/path/indxpath.c
@@ -34,6 +34,7 @@
 #include "optimizer/restrictinfo.h"
 #include "utils/lsyscache.h"
 #include "utils/selfuncs.h"
+#include "parser/parse_expr.h"
 
 
 /* XXX see PartCollMatchesExprColl */
@@ -715,8 +716,8 @@ get_index_paths(PlannerInfo *root, RelOptInfo *rel,
 				List **bitindexpaths)
 {
 	List	   *indexpaths;
-	bool		skip_nonnative_saop = false;
-	bool		skip_lower_saop = false;
+	bool		skip_nonnative_saop;
+	bool		skip_lower_saop;
 	ListCell   *lc;
 
 	/*
@@ -737,7 +738,7 @@ get_index_paths(PlannerInfo *root, RelOptInfo *rel,
 	 * that supports them, then try again including those clauses.  This will
 	 * produce paths with more selectivity but no ordering.
 	 */
-	if (skip_lower_saop)
+	if (skip_lower_saop || enable_or_transformation)
 	{
 		indexpaths = list_concat(indexpaths,
 								 build_index_paths(root, rel,
@@ -778,7 +779,7 @@ get_index_paths(PlannerInfo *root, RelOptInfo *rel,
 	 * natively, generate bitmap scan paths relying on executor-managed
 	 * ScalarArrayOpExpr.
 	 */
-	if (skip_nonnative_saop)
+	if (skip_nonnative_saop || enable_or_transformation)
 	{
 		indexpaths = build_index_paths(root, rel,
 									   index, clauses,
@@ -908,7 +912,7 @@ build_index_paths(PlannerInfo *root, RelOptInfo *rel,
 	{
 		if (!index->amsearcharray)
 		{
-			if (skip_nonnative_saop)
+			if (skip_nonnative_saop || enable_or_transformation)
 			{
 				/* Ignore because not supported by index */
 				*skip_nonnative_saop = true;
@@ -919,7 +923,7 @@ build_index_paths(PlannerInfo *root, Re

Re: SSL tests fail on OpenSSL v3.2.0

2023-11-27 Thread Michael Paquier
On Mon, Nov 27, 2023 at 08:32:28PM -0500, Tom Lane wrote:
> Since this is something we'd need to back-patch, OpenSSL 0.9.8
> and later are relevant: the v12 branch still supports those.
> It's moot given Bo's claim about the origin of the function,
> though.

Yep, unfortunately this needs to be checked down to 0.9.8.  I've just
done this exercise yesterday for another backpatch..
--
Michael




Re: [PATCH] psql: Add tab-complete for optional view parameters

2023-11-27 Thread Shubham Khanna
On Thu, Nov 23, 2023 at 4:37 PM Dean Rasheed  wrote:
>
> On Mon, 14 Aug 2023 at 18:34, David Zhang  wrote:
> >
> > it would be great to switch the order of the 3rd and the 4th line to make a
> > better match for "CREATE" and "CREATE OR REPLACE" .
> >
>
> I took a look at this, and I think it's probably neater to keep the
> "AS SELECT" completion for CREATE [OR REPLACE] VIEW xxx WITH (*)
> separate from the already existing support for "AS SELECT" without
> WITH.
>
> A couple of other points:
>
> 1. It looks slightly neater, and works better, to complete one word at
> a time -- e.g., "WITH" then "(", instead of "WITH (", since the latter
> doesn't work if the user has already typed "WITH".
>
> 2. It should also complete with "=" after the option, where appropriate.
>
> 3. CREATE VIEW should offer "local" and "cascaded" after
> "check_option" (though there's no point in doing likewise for the
> boolean options, since they default to true, if present, and false
> otherwise).
>
> Attached is an updated patch, incorporating those comments.
>
> Barring any further comments, I think this is ready for commit.

I reviewed the given patch and it is working fine.

Thanks and Regards,
Shubham Khanna.




Re: Synchronizing slots from primary to standby

2023-11-27 Thread shveta malik
On Mon, Nov 27, 2023 at 4:08 PM Amit Kapila  wrote:
>
> On Mon, Nov 27, 2023 at 2:27 PM Zhijie Hou (Fujitsu)
>  wrote:
> >
> > Here is the updated version (v39_2) which includes all the changes made
> > in 0002.  Please use it for review, and sorry for the confusion.
> >
>
> --- a/src/backend/replication/logical/launcher.c
> +++ b/src/backend/replication/logical/launcher.c
> @@ -8,20 +8,27 @@
>   *   src/backend/replication/logical/launcher.c
>   *
>   * NOTES
> - *   This module contains the logical replication worker launcher which
> - *   uses the background worker infrastructure to start the logical
> - *   replication workers for every enabled subscription.
> + *   This module contains the replication worker launcher which
> + *   uses the background worker infrastructure to:
> + *   a) start the logical replication workers for every enabled subscription
> + *  when not in standby_mode.
> + *   b) start the slot sync worker for logical failover slots synchronization
> + *  from the primary server when in standby_mode.
>
> I was wondering whether we really need a launcher on standby to invoke
> the sync-slot worker.  If so, why?  I guess it may have been required in
> previous versions, where we were managing work for multiple slot-sync
> workers (which is itself questionable in the sense of whether the
> launcher is the right candidate for that), but now, with a single
> slot-sync worker, it doesn't seem worth having.  What do you think?
>
> --

Yes, earlier a manager process was needed to manage multiple slot-sync
workers and distribute load among them, but now that does not seem
necessary. I gave it a try (PoC) and it seems to work well. If there
are no objections to this approach, I can share the patch soon.

thanks
Shveta




Re: proposal: change behavior on collation version mismatch

2023-11-27 Thread Jeff Davis
On Mon, 2023-11-27 at 15:35 -0800, Jeremy Schneider wrote:

> For many enterprise customers, if they asked why their database
> wouldn't accept connections after an OS upgrade and we explained that
> durability & correctness are prioritized over availability, I think
> they would agree we're doing the right thing.

They may agree, but their database is still down, and they'll be
looking for a clear process to get it going again, ideally within their
maintenance window.

It would be really nice to agree on such a process, or even better, to
implement it in code.

> After some years of ICU and PG, I'm just coming to a conclusion that
> the right thing to do is stay safe and don't change ICU versions (or
> glibc versions) for existing databases in-place.

I don't disagree, but for a lot of users that depend on their operating
system and packaging infrastructure, that is not very practical.

Regards,
Jeff Davis





Re: SSL tests fail on OpenSSL v3.2.0

2023-11-27 Thread Tom Lane
I can confirm that we also fail when using up-to-date MacPorts, which
seems to have started shipping 3.2.0 last week or so.  I tried the v3
patch, and while that stops the crash, it looks like 3.2.0 has also
made some random changes in error messages:

# +++ tap check in src/test/ssl +++
t/001_ssltests.pl .. 163/? 
#   Failed test 'certificate authorization fails with revoked client cert: 
matches'
#   at t/001_ssltests.pl line 775.
#   'psql: error: connection to server at "127.0.0.1", port 
58332 failed: SSL error: ssl/tls alert certificate revoked'
# doesn't match '(?^:SSL error: sslv3 alert certificate revoked)'
#   Failed test 'certificate authorization fails with revoked client cert with 
server-side CRL directory: matches'
#   at t/001_ssltests.pl line 880.
#   'psql: error: connection to server at "127.0.0.1", port 
58332 failed: SSL error: ssl/tls alert certificate revoked'
# doesn't match '(?^:SSL error: sslv3 alert certificate revoked)'
#   Failed test 'certificate authorization fails with revoked UTF-8 client cert 
with server-side CRL directory: matches'
#   at t/001_ssltests.pl line 893.
#   'psql: error: connection to server at "127.0.0.1", port 
58332 failed: SSL error: ssl/tls alert certificate revoked'
# doesn't match '(?^:SSL error: sslv3 alert certificate revoked)'
# Looks like you failed 3 tests of 205.
t/001_ssltests.pl .. Dubious, test returned 3 (wstat 768, 0x300)
Failed 3/205 subtests 
t/002_scram.pl . ok
t/003_sslinfo.pl ... ok

Guess we'll need to adjust the test script a bit too.

regards, tom lane




Re: Testing autovacuum wraparound (including failsafe)

2023-11-27 Thread Masahiko Sawada
On Mon, Nov 27, 2023 at 10:40 PM Daniel Gustafsson  wrote:
>
> > On 27 Nov 2023, at 14:06, Masahiko Sawada  wrote:
>
> > Is it true that we can modify the timeout after creating
> > BackgroundPsql object? If so, it seems we don't need to introduce the
> > new timeout argument. But how?
>
> I can't remember if that's leftovers that incorrectly remain from an earlier
> version of the BackgroundPsql work, or if it's a very bad explanation of
> ->set_query_timer_restart().  The timeout will use the timeout_default value
> and that cannot be overridden; it can only be reset per query.

Thank you for confirming this. I see the same problem also in
interactive_psql(), so I've attached the 0001 patch to fix these
documentation issues, which could be backpatched.

> With your patch the timeout still cannot be changed, but it can at least be
> set at start, which seems good enough until there are tests warranting more
> complexity.  The docs should be corrected to reflect this in your patch.
> The docs should be corrected to reflect this in your patch.

I've incorporated the comments except for the following one and
attached updated version of the rest patches:

On Fri, Sep 29, 2023 at 7:20 PM vignesh C  wrote:
> Few comments:
> 1) Should we have some validation for the inputs given:
> +PG_FUNCTION_INFO_V1(consume_xids_until);
> +Datum
> +consume_xids_until(PG_FUNCTION_ARGS)
> +{
> +   FullTransactionId targetxid =
> FullTransactionIdFromU64((uint64) PG_GETARG_INT64(0));
> +   FullTransactionId lastxid;
> +
> +   if (!FullTransactionIdIsNormal(targetxid))
> +   elog(ERROR, "targetxid %llu is not normal", (unsigned
> long long) U64FromFullTransactionId(targetxid));
>
> If not it will take inputs like -1 and 999.
> Also the notice messages might confuse for the above values, as it
> will show a different untilxid value like the below:
> postgres=# SELECT consume_xids_until(999);
> NOTICE:  consumed up to 0:1809 / 232830:2764472319

The full transaction ids shown in the notice messages are separated
into epoch and xid, so it's not a different value. This epoch-and-xid
style is also used in pg_controldata output, and it makes sense to me
for showing the progress of xid consumption.
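For reference, the epoch:xid notation encodes epoch * 2^32 + xid, so the
untilxid shown above is the single 64-bit value:

  SELECT 232830::numeric * 4294967296 + 2764472319;  -- 999999999999999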

Once the new test gets committed, I'll prepare a new buildfarm animal
for that if possible.

Regards,

-- 
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com


v6-0001-fix-wrong-description-of-BackgroundPsql-s-timeout.patch
Description: Binary data


v6-0003-Add-tests-for-XID-wraparound.patch
Description: Binary data


v6-0002-Add-option-to-specify-timeout-seconds-to-Backgrou.patch
Description: Binary data


Re: remaining sql/json patches

2023-11-27 Thread jian he
On Thu, Nov 23, 2023 at 6:46 PM jian he  wrote:
>
> -however these four will not fail.
> SELECT JSON_QUERY(jsonb '{"a":[3,4]}', '$.z' RETURNING int4range empty
> object on error);
> SELECT JSON_QUERY(jsonb '{"a":[3,4]}', '$.z' RETURNING int4range empty
> array on error);
> SELECT JSON_QUERY(jsonb '{"a":[3,4]}', '$.z' RETURNING int4range empty
> array on empty);
> SELECT JSON_QUERY(jsonb '{"a":[3,4]}', '$.z' RETURNING int4range empty
> object on empty);
>
> should the last four queries fail or just return null?

I refactored so that the above four queries fail.
SELECT JSON_QUERY(jsonb '{"a":[3,4]}', '$.z' RETURNING int4range empty
object on error);
The new error is: ERROR:  cannot cast DEFAULT expression of type jsonb
to int4range.

I also made the following query fail, which is as expected, IMHO:
select  json_value(jsonb '{"a":[123.45,1]}', '$.z' returning text
error on empty);


v1-0001-handle-key-words-empty-array-and-empty-object.nocfbot
Description: Binary data


Re: SSL tests fail on OpenSSL v3.2.0

2023-11-27 Thread Tom Lane
"Tristan Partin"  writes:
> On Mon Nov 27, 2023 at 7:14 PM CST, Tom Lane wrote:
>> ... If the function
>> does exist in 0.9.8 then I concur that we don't need to test.

> I have gone back all the way to 1.0.0 and confirmed that the function 
> exists. Didn't choose to go further than that since Postgres doesn't 
> support it.

Since this is something we'd need to back-patch, OpenSSL 0.9.8
and later are relevant: the v12 branch still supports those.
It's moot given Bo's claim about the origin of the function,
though.

regards, tom lane




Re: SSL tests fail on OpenSSL v3.2.0

2023-11-27 Thread Tristan Partin

On Mon Nov 27, 2023 at 7:14 PM CST, Tom Lane wrote:
> "Tristan Partin"  writes:
>> On Mon Nov 27, 2023 at 6:21 PM CST, Tom Lane wrote:
>>> What about LibreSSL?  In general, I'm not too pleased with just assuming
>>> that BIO_get_app_data exists.
>
>> Falling back to what existed before is invalid.
>
> Well, sure it only worked by accident, but it did work with older
> OpenSSL versions.  If we assume that BIO_get_app_data exists, and
> somebody tries to use it with a version that hasn't got that,
> it won't work.
>
> Having said that, my concern was mainly driven by the comments in
> configure.ac claiming that this was an OpenSSL 1.1.0 addition.
> Looking at the relevant commits, 593d4e47d and 5c6df67e0, it seems
> that that was less about "the function doesn't exist before 1.1.0"
> and more about "in 1.1.0 we have to use the function because we
> can no longer directly access the ptr field".  If the function
> does exist in 0.9.8 then I concur that we don't need to test.

I have gone back all the way to 1.0.0 and confirmed that the function
exists. Didn't choose to go further than that since Postgres doesn't
support it.


--
Tristan Partin
Neon (https://neon.tech)




BackgroundPsql's set_query_timer_restart() may not work

2023-11-27 Thread Masahiko Sawada
Hi,

While adding some TAP tests, I realized that set_query_timer_restart()
in BackgroundPsql may not work. Specifically, it seems not to work
unless we pass an argument to the function. Here is the test script I
used:

use strict;
use warnings;
use PostgreSQL::Test::Cluster;
use PostgreSQL::Test::Utils;
use Test::More;

my $node = PostgreSQL::Test::Cluster->new('main');
$node->init;
$node->start;

$PostgreSQL::Test::Utils::timeout_default = 5;
my $bg_psql = $node->background_psql('postgres', on_error_stop => 1);

$bg_psql->query_safe("select pg_sleep(3)");
$bg_psql->set_query_timer_restart();
$bg_psql->query_safe("select pg_sleep(3)");
$bg_psql->quit;
is(1,1,"dummy");

$node->stop;
done_testing();


If calling set_query_timer_restart() works properly, this test would
pass since we reset the query timeout before executing the second
pg_sleep(3). However, this test fails in my environment unless I use
set_query_timer_restart(1) (i.e. passing something to the function).

Currently authentication/t/001_password.pl is the sole user of
the set_query_timer_restart() function. I think we should assign a value
to query_timer_restart in set_query_timer_restart() even if
no argument is passed, as in the attached patch, or we should change
the caller to pass 1 to the function.

Regards,

-- 
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com


fix_set_query_timer_restart.patch
Description: Binary data


Re: [PATCH] pg_convert improvement

2023-11-27 Thread Yurii Rashkovskii
Hi Bertrand,


> Did some minor changes in the attached:
>
> - Started the multi-line comment with an upper case and finished
> it with a "." and re-worded a bit.
> - Ran pgindent
>
> What do you think about the attached?
>

It looks great!


>
> Also, might be good to create a CF entry [1] so that the patch proposal
> does not get lost and gets visibility.
>

Just submitted it to the CF. Thank you for the review!

-- 
Y.


Re: SSL tests fail on OpenSSL v3.2.0

2023-11-27 Thread Tom Lane
"Tristan Partin"  writes:
> On Mon Nov 27, 2023 at 6:21 PM CST, Tom Lane wrote:
>> What about LibreSSL?  In general, I'm not too pleased with just assuming
>> that BIO_get_app_data exists.

> Falling back to what existed before is invalid.

Well, sure it only worked by accident, but it did work with older
OpenSSL versions.  If we assume that BIO_get_app_data exists, and
somebody tries to use it with a version that hasn't got that,
it won't work.

Having said that, my concern was mainly driven by the comments in
configure.ac claiming that this was an OpenSSL 1.1.0 addition.
Looking at the relevant commits, 593d4e47d and 5c6df67e0, it seems
that that was less about "the function doesn't exist before 1.1.0"
and more about "in 1.1.0 we have to use the function because we
can no longer directly access the ptr field".  If the function
does exist in 0.9.8 then I concur that we don't need to test.

regards, tom lane




Re: POC, WIP: OR-clause support for indexes

2023-11-27 Thread Peter Geoghegan
On Mon, Nov 27, 2023 at 4:07 PM Robert Haas  wrote:
> > I am sure that there is a great deal of truth to this. The general
> > conclusion about parse analysis being the wrong place for this seems
> > very hard to argue with. But I'm much less sure that there needs to be
> > a conventional cost model.
>
> I'm not sure about that part, either. The big reason we shouldn't do
> this in parse analysis is that parse analysis is supposed to produce
> an internal representation which is basically just a direct
> translation of what the user entered. The representation should be
> able to be deparsed to produce more or less what the user entered
> without significant transformations. References to objects like tables
> and operators do get resolved to OIDs at this stage, so deparsing
> results will vary if objects are renamed or the search_path changes
> and more or less schema-qualification is required or things like that,
> but the output of parse analysis is supposed to preserve the meaning
> of the query as entered by the user.

One of the reasons why we shouldn't do this during parse analysis is
because query rewriting might matter. But that doesn't mean that the
transformation/normalization process must fundamentally be the
responsibility of the optimizer, through process of elimination.

Maybe it should be the responsibility of some other phase of query
processing, invented solely to make life easier for the optimizer, but
not formally part of query planning per se.

> The right place to do
> optimization is in the optimizer.

Then why doesn't the optimizer do query rewriting? Isn't that also a
kind of optimization, at least in part?

> > The planner's cost model is supposed to have some basis in physical
> > runtime costs, which is not the case for any of these transformations.
> > Not in any general sense; they're just transformations that enable
> > finding a cheaper way to execute the query. While they have to pay for
> > themselves, in some sense, I think that that's purely a matter of
> > managing the added planner cycles. In principle they shouldn't have
> > any direct impact on the physical costs incurred by physical
> > operators. No?
>
> Right. It's just that, as a practical matter, some of the operators
> deal with one form better than the other. So if we waited until we
> knew which operator we were using to decide on which form to pick,
> that would let us be smart.

ISTM that the real problem is that this is true in the first place. If
the optimizer had only one representation for any two semantically
equivalent spellings of the same qual, then it would always use the
best available representation. That seems even smarter, because that
way the planner can be dumb and still look fairly smart at runtime.

I am trying to be pragmatic, too (at least I think so). If having only
one representation turns out to be very hard, then maybe they weren't
ever really equivalent -- meaning it really is an optimization
problem, and the responsibility of the planner. It seems like it would
be more useful to spend time on making the world simpler for the
optimizer, rather than spending time on making the optimizer smarter.
Especially if we're talking about teaching the optimizer about what
are actually fairly accidental differences that come from
implementation details.

I understand that it'll never be black and white. There are practical
constraints on how far you go with this. We throw around terms like
"semantically equivalent" as if everybody agreed on precisely what
that means, which isn't really true (users complain when their view
definition has "<>" instead of "!="). Even still, I bet that we could
bring things far closer to this theoretical ideal, to good effect.

> > As I keep pointing out, there is a sound theoretical basis to the idea
> > of normalizing to conjunctive normal form as its own standard step in
> > query processing. To some extent we do this already, but it's all
> > rather ad-hoc. Even if (say) the nbtree preprocessing transformations
> > that I described were something that the planner already knew about
> > directly, they still wouldn't really need to be costed. They're pretty
> > much strictly better at runtime (at most you only have to worry about
> > the fixed cost of determining if they apply at all).
>
> It's just a matter of figuring out where we can put the logic and have
> the result make sense. We'd like to put it someplace where it's not
> too expensive and gets the right answer.

Agreed.


--
Peter Geoghegan




Re: GUC names in messages

2023-11-27 Thread Peter Smith
Here is patch set v3.

Patches 0001 and 0002 are unchanged from v2.

Patch 0003 now uses a "%s%s%s" format specifier with GUC_FORMAT macro
in guc.c, as recently suggested by Michael [1].

~

(Meanwhile, the MixedCase stuff is still an open question, to be
addressed in a later patch version)

==
[1] https://www.postgresql.org/message-id/ZWQVxu8zWIx64V7l%40paquier.xyz

Kind Regards,
Peter Smith.
Fujitsu Australia


v3-0001-GUC-names-docs.patch
Description: Binary data


v3-0003-GUC-names-maybe-add-quotes.patch
Description: Binary data


v3-0002-GUC-names-fix-quotes.patch
Description: Binary data


Re: autovectorize page checksum code included elsewhere

2023-11-27 Thread Andres Freund
Hi,

On 2023-11-25 14:09:14 +0700, John Naylor wrote:
> * Note: I have seen the threads with the idea of compiling multiple
> entire binaries, and switching at postmaster start. I think it's a
> good idea, but I also suspect detecting flags from the packager is an
> easier intermediate step.

It's certainly an easier incremental step - but will it get us all that far?
Other architectures have similar issues, e.g. ARM, with ARMv8.1-A having much
more scalable atomic instructions than plain ARMv8.0. And even on x86-64,
relying on distros to roll out a new x86_64-v2 minimum doesn't get us that
far - we'd be up to a uarch from ~2010. And I doubt that they'll raise the bar
to v4 in the next few years, so we'll leave years of improvements on the
table. And even PG-specific repos can't just roll out a hard requirement for
a new, more modern uarch that will crash some time after startup on some
hardware.

Greetings,

Andres Freund




Re: SSL tests fail on OpenSSL v3.2.0

2023-11-27 Thread Tristan Partin

On Mon Nov 27, 2023 at 6:21 PM CST, Tom Lane wrote:
> Michael Paquier  writes:
>> Interesting.  I have yet to look at that in details, but
>> BIO_get_app_data() exists down to 0.9.8, which is the oldest version
>> we need to support for stable branches.  So that looks like a safe
>> bet.
>
> What about LibreSSL?  In general, I'm not too pleased with just assuming
> that BIO_get_app_data exists.  If we can do that, we can probably remove
> most of the OpenSSL function probes that configure.ac has today.  Even
> if that's a good idea in HEAD, I doubt we want to do it all the way back.

As Bo said, this has been available since before LibreSSL forked off of
OpenSSL.

> I'd be inclined to form the patch more along the lines of
> s/BIO_get_data/BIO_get_app_data/g, with a configure check for
> BIO_get_app_data and falling back to the existing direct use of
> bio->ptr if it's not there.

Falling back to what existed before is invalid. BIO::ptr is private data
for the BIO implementation. BIO_{get,set}_app_data() does something
completely different than setting BIO::ptr. In Postgres we call
BIO_meth_set_create() with BIO_meth_get_create() from BIO_s_socket().
The create function we pass allocates bi->ptr to a struct bss_sock_st *,
as previously stated, and that's been the case since March 10, 2022 [0].
Essentially, Postgres only worked because the BIO implementation didn't
use the private data section until the linked commit. I don't see any
reason to keep compatibility with what only worked by accident.


[0]: 
https://github.com/openssl/openssl/commit/a3e53d56831adb60d6875297b3339a4251f735d2
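To make the distinction concrete, here is a minimal sketch (standalone,
with invented names like conn_state -- not code from the Postgres tree):
app data lives in the BIO's ex_data slot, so the parent method's private
pointer stays untouched.

  #include <openssl/bio.h>

  static BIO *
  make_socket_bio(int sock, void *conn_state)
  {
      BIO *bio = BIO_new(BIO_s_socket());

      if (bio == NULL)
          return NULL;
      BIO_set_fd(bio, sock, BIO_NOCLOSE);

      /* ex_data slot: coexists with BIO_s_socket()'s private data */
      BIO_set_app_data(bio, conn_state);

      /*
       * BIO_set_data(bio, conn_state) here would instead clobber the
       * struct bss_sock_st * that the parent create function stored
       * in bio->ptr.
       */
      return bio;
  }

  /* later, e.g. in a read callback: void *st = BIO_get_app_data(bio); */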

--
Tristan Partin
Neon (https://neon.tech)




Re: SSL tests fail on OpenSSL v3.2.0

2023-11-27 Thread Tom Lane
Michael Paquier  writes:
> Interesting.  I have yet to look at that in details, but
> BIO_get_app_data() exists down to 0.9.8, which is the oldest version
> we need to support for stable branches.  So that looks like a safe
> bet.

What about LibreSSL?  In general, I'm not too pleased with just assuming
that BIO_get_app_data exists.  If we can do that, we can probably remove
most of the OpenSSL function probes that configure.ac has today.  Even
if that's a good idea in HEAD, I doubt we want to do it all the way back.

I'd be inclined to form the patch more along the lines of
s/BIO_get_data/BIO_get_app_data/g, with a configure check for
BIO_get_app_data and falling back to the existing direct use of
bio->ptr if it's not there.

regards, tom lane




Re: POC, WIP: OR-clause support for indexes

2023-11-27 Thread Robert Haas
On Mon, Nov 27, 2023 at 5:16 PM Peter Geoghegan  wrote:
> [ various observations ]

This all seems to make sense but I don't have anything particular to
say about it.

> I am sure that there is a great deal of truth to this. The general
> conclusion about parse analysis being the wrong place for this seems
> very hard to argue with. But I'm much less sure that there needs to be
> a conventional cost model.

I'm not sure about that part, either. The big reason we shouldn't do
this in parse analysis is that parse analysis is supposed to produce
an internal representation which is basically just a direct
translation of what the user entered. The representation should be
able to be deparsed to produce more or less what the user entered
without significant transformations. References to objects like tables
and operators do get resolved to OIDs at this stage, so deparsing
results will vary if objects are renamed or the search_path changes
and more or less schema-qualification is required or things like that,
but the output of parse analysis is supposed to preserve the meaning
of the query as entered by the user. The right place to do
optimization is in the optimizer.

But where in the optimizer to do it is an open question in my mind.

Previous discussion suggests to me that we might not really have
enough information at the beginning, because it seems like the right
thing to do depends on which plan we ultimately choose to use, which
gets to what you say here:

> The planner's cost model is supposed to have some basis in physical
> runtime costs, which is not the case for any of these transformations.
> Not in any general sense; they're just transformations that enable
> finding a cheaper way to execute the query. While they have to pay for
> themselves, in some sense, I think that that's purely a matter of
> managing the added planner cycles. In principle they shouldn't have
> any direct impact on the physical costs incurred by physical
> operators. No?

Right. It's just that, as a practical matter, some of the operators
deal with one form better than the other. So if we waited until we
knew which operator we were using to decide on which form to pick,
that would let us be smart.

> As I keep pointing out, there is a sound theoretical basis to the idea
> of normalizing to conjunctive normal form as its own standard step in
> query processing. To some extent we do this already, but it's all
> rather ad-hoc. Even if (say) the nbtree preprocessing transformations
> that I described were something that the planner already knew about
> directly, they still wouldn't really need to be costed. They're pretty
> much strictly better at runtime (at most you only have to worry about
> the fixed cost of determining if they apply at all).

It's just a matter of figuring out where we can put the logic and have
the result make sense. We'd like to put it someplace where it's not
too expensive and gets the right answer.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: POC, WIP: OR-clause support for indexes

2023-11-27 Thread Matthias van de Meent
On Mon, 27 Nov 2023, 23:16 Peter Geoghegan,  wrote:

> On Mon, Nov 27, 2023 at 1:04 PM Robert Haas  wrote:
> > The use of op_mergejoinable() seems pretty random to me. Why should we
> > care about that? If somebody writes a<1 or a<2 or a<3 or a<4, you can
> > transform that to a < ANY ('{1,2,3,4}') if you want. I'm not sure
> > whether that's a good idea, but I think it's a legal transformation.
>
> That kind of transformation is likely to be a very good idea, because
> nbtree's _bt_preprocess_array_keys() function knows how to perform
> preprocessing that makes the final index qual "a < 1". Obviously that
> could be far more efficient.
>

a < 4, you mean? The example mentioned ANY, not ALL

> Further suppose you have a machine generated query "a<1 or a<2 or a<3
> or a<4 AND a = 2" -- same as before, except that I added "AND a = 2"
> to the end. Now _bt_preprocess_array_keys() will be able to do the
> aforementioned inequality preprocessing, just as before. But this time
> _bt_preprocess_keys() (a different function with a similar name) can
> see that the quals are contradictory. That makes the entire index scan
> end, before it ever really began.
>

With the given WHERE-clause I would hope it did *not* return before
scanning the index, given that any row with a < 3 is valid for that
constraint with current rules of operator precedence.
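That precedence is easy to demonstrate (a throwaway illustration, not
from the patch):

  SELECT a, (a < 1 OR a < 2 OR a < 3 OR a < 4 AND a = 2) AS qual
  FROM (VALUES (0), (2), (3)) AS t(a);
  -- a = 0 and a = 2 pass, a = 3 does not: AND binds tighter than OR,
  -- so the qual reduces to a < 3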

- Matthias


Re: SSL tests fail on OpenSSL v3.2.0

2023-11-27 Thread Tristan Partin

On Mon Nov 27, 2023 at 5:53 PM CST, Michael Paquier wrote:
> On Mon, Nov 27, 2023 at 12:33:49PM -0600, Tristan Partin wrote:
>> -#ifndef HAVE_BIO_GET_DATA
>> -#define BIO_get_data(bio) (bio->ptr)
>> -#define BIO_set_data(bio, data) (bio->ptr = data)
>> -#endif
>
> Shouldn't this patch do a refresh of configure.ac and remove the check
> on BIO_get_data() if HAVE_BIO_GET_DATA is gone?

See the attached v3. I am unfamiliar with autotools, so I just hand
edited the configure.ac script instead of whatever "refresh" means.


--
Tristan Partin
Neon (https://neon.tech)
From 2aa28288c389a2d8f9cbc77a8a710c905227383f Mon Sep 17 00:00:00 2001
From: Tristan Partin 
Date: Mon, 27 Nov 2023 11:49:52 -0600
Subject: [PATCH v3] Use BIO_{get,set}_app_data() instead of
 BIO_{get,set}_data()

Compiling Postgres against OpenSSL 3.2 exposed a regression:

$ psql "postgresql://$DB?sslmode=require"
psql: error: connection to server at "redacted" (redacted), port 5432 failed: ERROR:  Parameter 'user' is missing in startup packet.
double free or corruption (out)
Aborted (core dumped)

Analyzing the backtrace, openssl was overwriting heap-allocated data in
our PGconn struct because it thought BIO::ptr was a struct bss_sock_st
*. OpenSSL then called a memset() on a member of that struct, and we
zeroed out data in our PGconn struct.

BIO_get_data(3) says the following:

> These functions are mainly useful when implementing a custom BIO.
>
> The BIO_set_data() function associates the custom data pointed to by ptr
> with the BIO a. This data can subsequently be retrieved via a call to
> BIO_get_data(). This can be used by custom BIOs for storing
> implementation specific information.

If you take a look at my_BIO_s_socket(), we create a partially custom
BIO, but for the most part are defaulting to the methods defined by
BIO_s_socket(). We need to set application-specific data and not BIO
private data, so that the BIO implementation we rely on, can properly
assert that its private data is what it expects.

Co-authored-by: Bo Andreson 
---
 configure|  2 +-
 configure.ac |  2 +-
 meson.build  |  1 -
 src/backend/libpq/be-secure-openssl.c| 11 +++
 src/include/pg_config.h.in   |  3 ---
 src/interfaces/libpq/fe-secure-openssl.c | 11 +++
 src/tools/msvc/Solution.pm   |  2 --
 7 files changed, 8 insertions(+), 24 deletions(-)

diff --git a/configure b/configure
index c064115038..bf3ea690db 100755
--- a/configure
+++ b/configure
@@ -12836,7 +12836,7 @@ done
   # defines OPENSSL_VERSION_NUMBER to claim version 2.0.0, even though it
   # doesn't have these OpenSSL 1.1.0 functions. So check for individual
   # functions.
-  for ac_func in OPENSSL_init_ssl BIO_get_data BIO_meth_new ASN1_STRING_get0_data HMAC_CTX_new HMAC_CTX_free
+  for ac_func in OPENSSL_init_ssl BIO_meth_new ASN1_STRING_get0_data HMAC_CTX_new HMAC_CTX_free
 do :
   as_ac_var=`$as_echo "ac_cv_func_$ac_func" | $as_tr_sh`
 ac_fn_c_check_func "$LINENO" "$ac_func" "$as_ac_var"
diff --git a/configure.ac b/configure.ac
index f220b379b3..fed7167e65 100644
--- a/configure.ac
+++ b/configure.ac
@@ -1367,7 +1367,7 @@ if test "$with_ssl" = openssl ; then
   # defines OPENSSL_VERSION_NUMBER to claim version 2.0.0, even though it
   # doesn't have these OpenSSL 1.1.0 functions. So check for individual
   # functions.
-  AC_CHECK_FUNCS([OPENSSL_init_ssl BIO_get_data BIO_meth_new ASN1_STRING_get0_data HMAC_CTX_new HMAC_CTX_free])
+  AC_CHECK_FUNCS([OPENSSL_init_ssl BIO_meth_new ASN1_STRING_get0_data HMAC_CTX_new HMAC_CTX_free])
   # OpenSSL versions before 1.1.0 required setting callback functions, for
   # thread-safety. In 1.1.0, it's no longer required, and CRYPTO_lock()
   # function was removed.
diff --git a/meson.build b/meson.build
index ee58ee7a06..0095fb183a 100644
--- a/meson.build
+++ b/meson.build
@@ -1285,7 +1285,6 @@ if sslopt in ['auto', 'openssl']
   # doesn't have these OpenSSL 1.1.0 functions. So check for individual
   # functions.
   ['OPENSSL_init_ssl'],
-  ['BIO_get_data'],
   ['BIO_meth_new'],
   ['ASN1_STRING_get0_data'],
   ['HMAC_CTX_new'],
diff --git a/src/backend/libpq/be-secure-openssl.c b/src/backend/libpq/be-secure-openssl.c
index 31b6a6eacd..1b8b32c5b3 100644
--- a/src/backend/libpq/be-secure-openssl.c
+++ b/src/backend/libpq/be-secure-openssl.c
@@ -842,11 +842,6 @@ be_tls_write(Port *port, void *ptr, size_t len, int *waitfor)
  * see sock_read() and sock_write() in OpenSSL's crypto/bio/bss_sock.c.
  */
 
-#ifndef HAVE_BIO_GET_DATA
-#define BIO_get_data(bio) (bio->ptr)
-#define BIO_set_data(bio, data) (bio->ptr = data)
-#endif
-
 static BIO_METHOD *my_bio_methods = NULL;
 
 static int
@@ -856,7 +851,7 @@ my_sock_read(BIO *h, char *buf, int size)
 
 	if (buf != NULL)
 	{
-		res = secure_raw_read(((Port *) BIO_get_data(h)), buf, size);
+		res = secure_raw_read(((Port *) BIO_get_app_data(h)), buf, size);
 		BIO_clear_r

Re: SSL tests fail on OpenSSL v3.2.0

2023-11-27 Thread Michael Paquier
On Mon, Nov 27, 2023 at 12:33:49PM -0600, Tristan Partin wrote:
> - res = secure_raw_read(((Port *) BIO_get_data(h)), buf, size);
> + res = secure_raw_read(((Port *) BIO_get_app_data(h)), buf, size);
>   BIO_clear_retry_flags(h);
>   if (res <= 0)

Interesting.  I have yet to look at that in details, but
BIO_get_app_data() exists down to 0.9.8, which is the oldest version
we need to support for stable branches.  So that looks like a safe
bet.

> -#ifndef HAVE_BIO_GET_DATA
> -#define BIO_get_data(bio) (bio->ptr)
> -#define BIO_set_data(bio, data) (bio->ptr = data)
> -#endif

Shouldn't this patch do a refresh of configure.ac and remove the check
on BIO_get_data() if HAVE_BIO_GET_DATA is gone?
--
Michael




Re: Partial aggregates pushdown

2023-11-27 Thread Robert Haas
First of all, that last email of mine was snippy, and I apologize for it. Sorry.

That said:

On Mon, Nov 27, 2023 at 4:23 PM Tom Lane  wrote:
> Well, one of the founding principles of postgres_fdw was to be able
> to talk to PG servers that are not of the same version as yours.
> If we break that in the name of performance, we are going to have
> a lot of unhappy users.  Even the ones who do get the benefit of
> the speedup are going to be unhappy when it breaks because they
> didn't upgrade local and remote at exactly the same time.

I agree with this.

> Just because we'd like to have it doesn't make the patch workable
> in the real world.

And also with this in concept - I'd like to plan arbitrarily
complicated queries perfectly and near-instantly, and then execute
them at faster-than-light speed, but we can't. However, I don't
understand the fatalism with respect to the feature at hand. As I said
before, it's not like no other product has made this work. Sure, some
of those products may not have the extensible system of data types
that we do, or may not care about cross-version communication, but
those don't seem like good enough reasons to just immediately give up.

TBH, I suspect even some PG forks have made this work, like maybe PGXC
or PGXL, although I don't know for certain. We might not like the
trade-offs they made to get there, but we haven't even talked through
possible design ideas yet, so it seems way too early to give up.

One of the things that I think is a problem in this area is that the
ways we have to configure FDW connections are just not very rich.
We're trying to cram everything into a set of strings that can be
attached to the foreign server or the user mapping, but that's not a
very good fit for something like how all the local SQL functions that
might exist map onto all of the remote SQL functions that might exist.
Now you might well say that we don't want the act of configuring a
foreign data wrapper to be insanely complicated, and I would agree
with that. But, on the other hand, as Larry Wall once said, a good
programming language makes simple things simple and complicated things
possible. I think our current configuration system is only
accomplishing the first of those goals.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Incorrect comment in tableam.h regarding GetHeapamTableAmRoutine()

2023-11-27 Thread Michael Paquier
On Mon, Nov 27, 2023 at 02:14:32PM +0800, Richard Guo wrote:
> +1. Nice catch.

Thanks, applied it.
--
Michael




Re: [PATCH] Add CHECK_FOR_INTERRUPTS in scram_SaltedPassword loop.

2023-11-27 Thread Michael Paquier
On Mon, Nov 27, 2023 at 10:05:49AM +0100, Daniel Gustafsson wrote:
> I don't see any reason to backpatch further down than 16 given how low the
> hardcoded value is set there, scanning the archives I see no complaints about
> it either.  As a reference, CREATE ROLE using 4096 iterations takes 14ms on my
> 10 year old laptop (1M iterations, 244x the default, takes less than a 
> second).

Agreed, so done it this way.  \password has the same problem, where we
could perhaps do something with a callback or something like that, or
perhaps that's just not worth bothering.
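
For those following along, the fix is shaped roughly like this sketch
(scram-common.c is shared with frontend code, which has no
CHECK_FOR_INTERRUPTS(), hence the guard; the exact committed hunk may
differ):

	for (i = 2; i <= iterations; i++)
	{
		/* ... one HMAC round folding Ui into the result ... */

#ifndef FRONTEND
		/*
		 * Huge iteration counts can keep this loop busy for a long
		 * time, so let the backend honor query cancel/die interrupts.
		 */
		CHECK_FOR_INTERRUPTS();
#endif
	}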
--
Michael




Re: proposal: change behavior on collation version mismatch

2023-11-27 Thread Jeremy Schneider
On 11/27/23 12:29 PM, Jeff Davis wrote:
>> 2) "most users would rather have ease-of-use than 100% safety, since
>> it's uncommon"
>>
>> And I think this led to the current behavior of issuing a warning
>> rather
>> than an error
> The elevel trade-off is *availability* vs safety, not ease-of-use vs
> safety. It's harder to reason about what most users might want in that
> situation.

I'm not in agreement with the idea that this is hard to reason about;
I've always thought durability & correctness is generally supposed to be
prioritized over availability in databases. For many enterprise
customers, if they ask why their database wouldn't accept connections
after an OS upgrade and we explained that durability & correctness is
prioritized over availability, I think they would agree we're doing the
right thing.

In practice this always happens after a major operating system update of
some kind (it would be an unintentional bug in a minor OS upgrade).  In
most cases, I hope the error will happen immediately because users
ideally won't even be able to connect (for DB-level glibc and for ICU
default setting).  Giving a hard error quickly after an OS upgrade is
actually pretty easy for most people to deal with. For most users,
they'll immediately understand that something went wrong related to the
OS upgrade.  And basic testing would turn up connection errors before
the production upgrade as long as a connection was attempted as part of
the test.

It seems to me that much of the hand-wringing is around taking a hard
line on not allowing in-place OS upgrades. We're all aware that when
you're talking about tens of terabytes, in-place upgrade is just a lot
more convenient and easy than the alternatives. And we're aware that
some other relational databases support this (and also bundle collation
libs directly in the DB rather than using external libraries).

I myself wouldn't frame this as an availability issue, I think it's more
about ease-of-use in the sense of allowing low-downtime major OS
upgrades without the complexity of logical replication (but perhaps with
a risk of data loss, because with unicode nobody can actually be 100%
sure there's no risky characters stored in the DB, and even those of us
with extensive expert knowledge struggle to accurately characterize the
risk level).

The hand-wringing often comes down to the argument "but MAYBE en_US
didn't change in those 3 major version releases of ICU that you jumped
across to land a new Ubuntu LTS release" ~~ however I believe it's one
thing to make this argument with ISO 8859 but in the unicode world en_US
has default sort rules for japanese, chinese, arabic, cyrillic, nepalese,
and all kinds of strings with nonsensical combinations of all these
characters.  After some years of ICU and PG, I'm just coming to a
conclusion that the right thing to do is stay safe and don't change ICU
versions (or glibc versions) for existing databases in-place.

-Jeremy


-- 
Jeremy Schneider
Performance Engineer
Amazon Web Services


Re: Don't use bms_membership in places where it's not needed

2023-11-27 Thread David Rowley
On Tue, 28 Nov 2023 at 11:21, Andres Freund  wrote:
> Hm, does this ever matter from a performance POV? The current code does look
> simpler to read to me. If the overhead is relevant, I'd instead just move the
> code into a static inline?

I didn't particularly find the code in examine_variable() easy to
read. I think what's there now is quite a bit better than what was
there.

bms_get_singleton_member() was added in d25367ec4 for this purpose, so
it seems kinda weird not to use it.

David




Re: GUC names in messages

2023-11-27 Thread Michael Paquier
On Mon, Nov 27, 2023 at 01:35:44AM -0500, Tom Lane wrote:
> Laurenz Albe  writes:
> > On Mon, 2023-11-27 at 13:41 +1100, Peter Smith wrote:
>>> In the documentation and in the guc_tables.c they are all described in
>>> MixedCase (e.g. "DateStyle" instead of "datestyle"), so I felt the
>>> messages should use the same case the documentation, which is why I
>>> changed all the ones you are referring to.
> 
>> I agree with that decision; we should use mixed case for these parameters.
>> Otherwise we might get complaints that the following query does not return
>> any results:
>>   SELECT * FROM pg_settings WHERE name = 'timezone';

(I'm sure that you mean the opposite.  This query does not return any
results on HEAD, but it would with "TimeZone".)

> Yeah.  Like Michael upthread, I've wondered occasionally about changing
> these names to all-lower-case.  It'd surely be nicer if we'd done it
> like that to begin with.  But I can't convince myself that the ensuing
> user pain would be justified.

Perhaps not.  I'd like to think that a lot of queries on pg_settings
have the wisdom to apply a lower() or upper(), but that's very
unlikely.

-errhint("Perhaps you need a different \"datestyle\" setting.")));
+errhint("Perhaps you need a different DateStyle setting.")));

Saying that, I'd let this one be in 0002.  It causes a lot of diff
churn in the tests and quoting it based on Alvaro's suggestion would
still be correct because it's fully lower-case.  (Yeah, I'm perhaps
nit-ing here, so feel free to counter-argue if you prefer what the
patch does.)
--
Michael




Re: New instability in stats regression test

2023-11-27 Thread Michael Paquier
On Mon, Nov 27, 2023 at 02:01:51PM -0500, Tom Lane wrote:
> The problem as I see it is that this test:
> 
> SELECT :io_stats_post_reset < :io_stats_pre_reset;
> 
> requires an assumption that less I/O has happened since the commanded
> reset action than happened before it (extending back to the previous
> reset, or cluster start).  Since concurrent processes might be doing
> I/O, this has a race condition.  If we are slow enough about obtaining
> :io_stats_post_reset, the test *will* fail eventually.  But the shorter
> the distance back to the previous reset, the bigger the odds of
> observable trouble; thus Michael's concern that adding more reset
> tests in future would increase the risk of failure.

The new reset added just before checking the contents of pg_stat_io
reduces :io_stats_pre_reset from 7M to 50k.  That's a threshold easy
to reach if you have a checkpoint or an autovacuum running in
parallel.  I have not checked the buildfarm logs in details, but I'd
put a coin on a checkpoint triggered by time if the issue happened on
a slow machine.
--
Michael




Re: Adding facility for injection points (or probe points?) for more advanced tests

2023-11-27 Thread Michael Paquier
On Mon, Nov 27, 2023 at 12:14:05PM +0530, Ashutosh Bapat wrote:
> Since you wrote "(still need to improve ...)" I thought you were
> working on v6. No problem. Sorry for the confusion.

I see why my previous message could be confusing.  Sorry about that.
--
Michael




Re: Dynamically generate a nested JSON file

2023-11-27 Thread David G. Johnston
On Mon, Nov 27, 2023 at 2:10 PM Rushabh Shah  wrote:

>
> I want to dynamically generate a nested json file. I have written a
> function for it in PL/PGSQL that accepts 3 arrays. First one is an array of
> all json fields, second one is an array of all json fields with columns
> from tables present in db, third one mentions the type for all the fields
> inside the json file.
>
>
This is a completely inappropriate question for this list - it is for
discussions related to writing patches to the project source code.

The -general mailing list is where you can solicit help for problems you
are having while using PostgreSQL.

David J.


Re: autovectorize page checksum code included elsewhere

2023-11-27 Thread Nathan Bossart
On Mon, Nov 27, 2023 at 03:21:53PM -0600, Nathan Bossart wrote:
> On Sat, Nov 25, 2023 at 02:09:14PM +0700, John Naylor wrote:
>> Sorry, I wasn't clear, I mean: detect that a packager passed
>> "-march=x86_64-v2" in the CFLAGS, so that a symbol in header files
>> would cause inlining of functions containing certain intrinsics.
>> Exhaustively checking features defeats the purpose of having an
>> industry-standard shorthand, and says nothing about what the package
>> does or does not require of the target machine.
> 
> I'm not sure why I thought checking each feature might be necessary.
> --with-isa-level could essentially just be an alias for adding all the
> CFLAGS for the extensions provided at that level, and --with-isa-level=auto
> would just mean -march=native.  With those flags set, the ./configure
> checks would succeed with the base set of compiler flags passed in, which
> could be used as a heuristic for inlining (like CRC32C does today).

Or, perhaps you mean removing those ./configure checks completely and
assuming that the compiler knows about the intrinsics required for the
specified ISA level...

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: proposal: change behavior on collation version mismatch

2023-11-27 Thread Jeff Davis
On Mon, 2023-11-27 at 22:37 +0100, Magnus Hagander wrote:
> That is, set it to "warnings only", insert a single row into the
> table
> with an "unlucky" key, set it back to "errors always" and you now
> have
> a corrupt database, but your setting reflects that it shouldn't be
> corrupt.

You would be giving the setting too much credit if you assume that
consistently keeping it on "error" is a guarantee against corruption.

It only affects what we do when we detect potential corruption, but our
detection is subject to both false positives and false negatives.

We'd need to document the setting so that users understand the
consequences and limitations.

I won't push strongly for such a setting to exist because I know that
it's far from a complete solution. But I believe it would be sensible
considering that this problem is going to take a while to resolve.

Regards,
Jeff Davis





Re: Don't use bms_membership in places where it's not needed

2023-11-27 Thread Andres Freund
Hi,

On 2023-11-24 17:06:25 +1300, David Rowley wrote:
> While working on the patch in [1], I noticed that ever since
> 00b41463c, it's now suboptimal to do the following:
> 
> switch (bms_membership(relids))
> {
>     case BMS_EMPTY_SET:
>         /* handle empty set */
>         break;
>     case BMS_SINGLETON:
>         /* call bms_singleton_member() and handle singleton set */
>         break;
>     case BMS_MULTIPLE:
>         /* handle multi-member set */
>         break;
> }
> 
> The following is cheaper as we don't need to call bms_membership() and
> bms_singleton_member() for singleton sets. It also saves function call
> overhead for empty sets.
> 
> if (relids == NULL)
>     /* handle empty set */
> else
> {
>     int relid;
>
>     if (bms_get_singleton_member(relids, &relid))
>         /* handle singleton set */
>     else
>         /* handle multi-member set */
> }

Hm, does this ever matter from a performance POV? The current code does look
simpler to read to me. If the overhead is relevant, I'd instead just move the
code into a static inline?
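
For concreteness, the static-inline alternative could look roughly like
this (illustrative only, not committed code):

static inline BMS_Membership
bms_membership_inline(const Bitmapset *a)
{
	BMS_Membership result = BMS_EMPTY_SET;

	if (a == NULL)				/* since 00b41463c, empty sets are NULL */
		return BMS_EMPTY_SET;

	for (int wordnum = 0; wordnum < a->nwords; wordnum++)
	{
		bitmapword	w = a->words[wordnum];

		if (w != 0)
		{
			/* a second set bit anywhere makes it a multi-member set */
			if (result == BMS_SINGLETON || (w & (w - 1)) != 0)
				return BMS_MULTIPLE;
			result = BMS_SINGLETON;
		}
	}
	return result;
}

so callers could keep the readable three-way switch without paying for
an external function call.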

Greetings,

Andres Freund




Re: common signal handler protection

2023-11-27 Thread Nathan Bossart
Thanks for taking a look.

On Wed, Nov 22, 2023 at 02:59:45PM -0800, Andres Freund wrote:
> On 2023-11-22 15:59:44 -0600, Nathan Bossart wrote:
>> +/*
>> + * Except when called with SIG_IGN or SIG_DFL, pqsignal() sets up this 
>> function
>> + * as the handler for all signals.  This wrapper handler function checks 
>> that
>> + * it is called within a process that the server knows about, and not a
>> + * grandchild process forked by system(3), etc.
> 
> Similar comment to earlier - the grandchildren bit seems like a dangling
> reference. And also too specific - I think we could encounter this in single
> user mode as well?
> 
> Perhaps it could be reframed to "postgres processes, as determined by having
> called InitProcessGlobals()"?

Eh, apparently my attempt at being clever didn't pan out.  I like your idea
of specifying InitProcessGlobals(), but I might also include "e.g., client
backends", too.

> Hm. I wonder if this indicates an issue.  Do the preceding changes perhaps
> make it more likely that a child process of the startup process could hang
> around for longer, because the signal we're delivering doesn't terminate child
> processes, because we'd just reset the signal handler?  I think it's fine for
> the startup process, because we ask the startup process to shut down with
> SIGTERM, which defaults to exiting.
> 
> But we do have a few processes that we do ask to shut down with other
> signals, which do not trigger an exit by default, e.g. Checkpointer, archiver,
> walsender are asked to shut down using SIGUSR2 IIRC.  The only one where that
> could be an issue is archiver, I guess?

This did cross my mind.  AFAICT most default handlers already trigger an
exit (including SIGUSR2), and for the ones that don't, we'd just end up in
the same situation as if the signal arrived a moment later after the child
process has installed its own handlers.  And postmaster doesn't send
certain signals (e.g., SIGHUP) to the whole process group, so we don't have
the opposite problem where things like reloading configuration files
terminates these child processes.

So, I didn't notice any potential issues.  Did you have anything else in
mind?

>> -/* not safe if forked by system(), etc. */
>> -if (MyProc->pid != (int) getpid())
>> -elog(PANIC, "AuxiliaryProcKill() called in child process");
>> -
>>  auxproc = &AuxiliaryProcs[proctype];
>>  
>>  Assert(MyProc == auxproc);
> 
> I think we should leave these checks. It's awful to debug situations where a
> proc gets reused and the cost of the checks is irrelevant. The checks don't
> just protect against child processes, they also protect e.g. against calling
> those functions twice (we IIRC had cases of that).

Sure, that's no problem.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: POC, WIP: OR-clause support for indexes

2023-11-27 Thread Peter Geoghegan
On Mon, Nov 27, 2023 at 1:04 PM Robert Haas  wrote:
> The use of op_mergejoinable() seems pretty random to me. Why should we
> care about that? If somebody writes a<1 or a<2 or a<3 or a<4, you can
> transform that to a < ANY(ARRAY[1,2,3,4]) if you want. I'm not sure
> whether that's a good idea, but I think it's a legal transformation.

That kind of transformation is likely to be a very good idea, because
nbtree's _bt_preprocess_array_keys() function knows how to perform
preprocessing that makes the final index qual "a < 1". Obviously that
could be far more efficient.

Further suppose you have a machine generated query  "a<1 or a<2 or a<3
or a<4 AND a = 2" -- same as before, except that I added "AND a = 2"
to the end. Now _bt_preprocess_array_keys() will be able to do the
aforementioned inequality preprocessing, just as before. But this time
_bt_preprocess_keys() (a different function with a similar name) can
see that the quals are contradictory. That makes the entire index scan
end, before it ever really began.

Obviously, this is an example of a more general principle: a great
deal of the benefit of these transformations is indirect, in the sense
that they come from enabling further transformations/optimizations,
that apply in the context of some particular query. So you have to
think holistically.

It's perhaps a little unfortunate that all of this nbtree
preprocessing stuff is totally opaque to the planner. Tom has
expressed concerns about that in the past, FWIW:

https://www.postgresql.org/message-id/2587523.1647982...@sss.pgh.pa.us
(see the final paragraph for the reference)

There might be some bigger picture to doing all of these
transformations, in a way that maximizes opportunities to apply
further transformations/optimizations. You know much more about the
optimizer than I do, so maybe this is already very obvious. Just
pointing it out.

> Honestly, it seems very hard to avoid the conclusion that this
> transformation is being done at too early a stage. Parse analysis is
> not the time to try to do query optimization. I can't really believe
> that there's a way to produce a committable patch along these lines.
> Ideally, a transformation like this should be done after we know what
> plan shape we're using (or considering using), so that we can make
> cost-based decisions about whether to transform or not. But at the
> very least it should happen somewhere in the planner. There's really
> no justification for parse analysis rewriting the SQL that the user
> entered.

I am sure that there is a great deal of truth to this. The general
conclusion about parse analysis being the wrong place for this seems
very hard to argue with. But I'm much less sure that there needs to be
a conventional cost model.

The planner's cost model is supposed to have some basis in physical
runtime costs, which is not the case for any of these transformations.
Not in any general sense; they're just transformations that enable
finding a cheaper way to execute the query. While they have to pay for
themselves, in some sense, I think that that's purely a matter of
managing the added planner cycles. In principle they shouldn't have
any direct impact on the physical costs incurred by physical
operators. No?

As I keep pointing out, there is a sound theoretical basis to the idea
of normalizing to conjunctive normal form as its own standard step in
query processing. To some extent we do this already, but it's all
rather ad-hoc. Even if (say) the nbtree preprocessing transformations
that I described were something that the planner already knew about
directly, they still wouldn't really need to be costed. They're pretty
much strictly better at runtime (at most you only have to worry about
the fixed cost of determining if they apply at all).

-- 
Peter Geoghegan




Re: logical decoding and replication of sequences, take 2

2023-11-27 Thread Peter Smith
FWIW, here are some more minor review comments for v20231127-3-0001

==
doc/src/sgml/logicaldecoding.sgml

1.
+  The txn parameter contains meta information about
+  the transaction the sequence change is part of. Note however that for
+  non-transactional updates, the transaction may be NULL, depending on
+  if the transaction already has an XID assigned.
+  The sequence_lsn has the WAL location of the
+  sequence update. transactional says if the
+  sequence has to be replayed as part of the transaction or directly.

/says if/specifies whether/

==
src/backend/commands/sequence.c

2. DecodeSeqTuple

+ memcpy(((char *) tuple->tuple.t_data),
+data + sizeof(xl_seq_rec),
+SizeofHeapTupleHeader);
+
+ memcpy(((char *) tuple->tuple.t_data) + SizeofHeapTupleHeader,
+data + sizeof(xl_seq_rec) + SizeofHeapTupleHeader,
+datalen);

Maybe I am misreading but isn't this just copying 2 contiguous pieces
of data? Won't a single memcpy of (SizeofHeapTupleHeader + datalen)
achieve the same?
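
i.e. something like this sketch of the suggested single copy:

memcpy((char *) tuple->tuple.t_data,
       data + sizeof(xl_seq_rec),
       SizeofHeapTupleHeader + datalen);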

==
.../replication/logical/reorderbuffer.c

3.
+ *   To decide if a sequence change is transactional, we maintain a hash
+ *   table of relfilenodes created in each (sub)transactions, along with
+ *   the XID of the (sub)transaction that created the relfilenode. The
+ *   entries from substransactions are copied to the top-level transaction
+ *   to make checks cheaper. The hash table gets cleaned up when the
+ *   transaction completes (commit/abort).

/substransactions/subtransactions/

~~~

4.
+ * A naive approach would be to just loop through all transactions and check
+ * each of them, but there may be (easily thousands) of subtransactions, and
+ * the check happens for each sequence change. So this could be very costly.

/may be (easily thousands) of/may be (easily thousands of)/

~~~

5. ReorderBufferSequenceCleanup

+ while ((ent = (ReorderBufferSequenceEnt *) hash_seq_search(&scan_status)) != NULL)
+ {
+ (void) hash_search(txn->toptxn->sequences_hash,
+(void *) &ent->rlocator,
+HASH_REMOVE, NULL);
+ }

Typically, other HASH_REMOVE code I saw would check result for NULL to
give elog(ERROR, "hash table corrupted");

~~~

6. ReorderBufferQueueSequence

+ if (xid != InvalidTransactionId)
+ txn = ReorderBufferTXNByXid(rb, xid, true, NULL, lsn, true);

How about using the macro: TransactionIdIsValid

~~~

7. ReorderBufferQueueSequence

+ if (reloid == InvalidOid)
+ elog(ERROR, "could not map filenode \"%s\" to relation OID",
+ relpathperm(rlocator,
+ MAIN_FORKNUM));

How about using the macro: OidIsValid

~~~

8.
+ /*
+ * Calculate the first value of the next batch (at which point we
+ * generate and decode another WAL record.
+ */

Missing ')'

~~~

9. ReorderBufferAddRelFileLocator

+ /*
+ * We only care about sequence relfilenodes for now, and those always have
+ * a XID. So if there's no XID, don't bother adding them to the hash.
+ */
+ if (xid == InvalidTransactionId)
+ return;

How about using the macro: TransactionIdIsValid

~~~

10. ReorderBufferProcessTXN

+ if (reloid == InvalidOid)
+ elog(ERROR, "could not map filenode \"%s\" to relation OID",
+ relpathperm(change->data.sequence.locator,
+ MAIN_FORKNUM));

How about using the macro: OidIsValid

~~~

11. ReorderBufferChangeSize

+ if (tup)
+ {
+ sz += sizeof(HeapTupleData);
+ len = tup->tuple.t_len;
+ sz += len;
+ }

Why is the 'sz' increment split into 2 parts?

==
Kind Regards,
Peter Smith.
Fujitsu Australia




Re: Don't use bms_membership in places where it's not needed

2023-11-27 Thread David Rowley
On Fri, 24 Nov 2023 at 19:54, Richard Guo  wrote:
> +1 to the idea.
>
> I think you have a typo in distribute_restrictinfo_to_rels.  We should
> remove the call of bms_singleton_member and use relid instead.

Thanks for reviewing.  I've now pushed this.

David




Re: proposal: change behavior on collation version mismatch

2023-11-27 Thread Magnus Hagander
On Mon, Nov 27, 2023 at 9:30 PM Jeff Davis  wrote:
>
> On Mon, 2023-11-27 at 11:06 -0800, Jeremy Schneider wrote:
> > If we want to have a GUC that
> > allows warning behavior, I think that's OK but I think it should be
> > superuser-only and documented as a "developer" setting similar to
> > zero_damaged_pages.
>
> A GUC seems sensible to express the availability-vs-safety trade-off. I
> suspect we can get a GUC that defaults to "warning" committed, but
> anything else will be controversial.

A guc like this would bring a set of problems similar to what we have
e.g. with fsync.

That is, set it to "warnings only", insert a single row into the table
with an "unlucky" key, set it back to "errors always" and you now have
a corrupt database, but your setting reflects that it shouldn't be
corrupt. Sure, people shouldn't do that - but people will, and it will
make things harder to debug.

There's been talk before about adding a "tainted" flag or similar to
pg_control that gets set if you ever start the system with fsync=off.
Similar things could be done here of course, but I'd worry a bit about
adding another flag like this which can lead to
hard-to-determine-state without resolving that.

(The fact that we have "fsync" under WAL config and not developer
options is an indication that we can't really use the classification
of the config parameters are a good indicator of what's safe and not
safe to set)

I could get behind turning it into an error though :)

//Magnus




Re: Partial aggregates pushdown

2023-11-27 Thread Tom Lane
Robert Haas  writes:
> On Mon, Nov 27, 2023 at 3:59 PM Tom Lane  wrote:
>> TBH, I think this entire proposal is dead in the water.  Which is
>> sad from a performance standpoint, but I can't see any way that
>> we would not regret shipping a feature that makes such assumptions.

> I think it's ridiculous to just hold our breath and pretend like this
> feature isn't needed -- it's at least half a decade overdue. We engage
> in endless hand-wringing over local-remote symmetry in cases where
> other systems seem to effortlessly make that assumption and then get
> on with building new features.

Well, one of the founding principles of postgres_fdw was to be able
to talk to PG servers that are not of the same version as yours.
If we break that in the name of performance, we are going to have
a lot of unhappy users.  Even the ones who do get the benefit of
the speedup are going to be unhappy when it breaks because they
didn't upgrade local and remote at exactly the same time.

Just because we'd like to have it doesn't make the patch workable
in the real world.

regards, tom lane




Re: autovectorize page checksum code included elsewhere

2023-11-27 Thread Nathan Bossart
On Sat, Nov 25, 2023 at 02:09:14PM +0700, John Naylor wrote:
> On Thu, Nov 23, 2023 at 11:51 PM Nathan Bossart
>  wrote:
>>
>> On Thu, Nov 23, 2023 at 05:50:48PM +0700, John Naylor wrote:
>> > On Thu, Nov 23, 2023 at 1:49 AM Nathan Bossart  
>> > wrote:
>> >> One half-formed idea I have is to introduce some sort of ./configure flag
>> >> that enables all the newer instructions that your CPU understands.
>> >
>> > That's not doable,
>>
>> It's not?
> 
> What exactly would our build system do differently with e.g.
> "-march=native" (which is already possible for people who compile
> their own binaries, via CFLAGS), if I understand you correctly?

It would probably just be an easier way of doing that than adjusting COPT
in src/Makefile.custom.

>> Maybe we have something like --with-isa-level where you can specify
>> x86-64-v[1-4] or "auto" to mean "build whatever the current machine can
>> handle."
> 
> That could work, but with the below OS's, it should work
> automatically. Also, we may not be many years off from the day we can
> move our baseline as well, such that older buildfarm members (if we
> have any) would need to pass --disable-isa-extensions, but that may be
> pushing things too much for too little benefit. *

You are probably right.  I guess I'm wondering whether we need to make all
this configurable.  Maybe we could get away with moving our baseline to v2
soon, but if we'd also like to start adding AVX2 enhancements (and I think
we will), I'm assuming we'll want to provide an easy way for users to
declare that they are building for v3+ CPUs.

>> > Not sure how to detect that easily.
>>
>> I briefly looked around and didn't see a portable way to do so.  We might
>> have to exhaustively check the features, which doesn't seem like it'd be
>> too bad for x86_64, but I haven't looked too closely at other
>> architectures.
> 
> Sorry, I wasn't clear, I mean: detect that a packager passed
> "-march=x86_64-v2" in the CFLAGS, so that a symbol in header files
> would cause inlining of functions containing certain intrinsics.
> Exhaustively checking features defeats the purpose of having an
> industry-standard shorthand, and says nothing about what the package
> does or does not require of the target machine.

I'm not sure why I thought checking each feature might be necessary.
--with-isa-level could essentially just be an alias for adding all the
CFLAGS for the extensions provided at that level, and --with-isa-level=auto
would just mean -march=native.  With those flags set, the ./configure
checks would succeed with the base set of compiler flags passed in, which
could be used as a heuristic for inlining (like CRC32C does today).
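
As a concrete illustration of that heuristic, a sketch modeled on the
existing CRC32C arrangement (the macro spellings here are illustrative):

/*
 * If the compiler flags already imply the ISA level (-march=x86-64-v2
 * defines __SSE4_2__), bind directly to the intrinsic implementation at
 * compile time; otherwise go through the runtime-dispatched version.
 */
#if defined(__SSE4_2__)
#define COMP_CRC32C(crc, data, len) \
	((crc) = pg_comp_crc32c_sse42((crc), (data), (len)))
#else
#define COMP_CRC32C(crc, data, len) \
	((crc) = pg_comp_crc32c((crc), (data), (len)))
#endif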

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: Partial aggregates pushdown

2023-11-27 Thread Robert Haas
On Mon, Nov 27, 2023 at 3:59 PM Tom Lane  wrote:
> Even if the partial-aggregate serialization value isn't "internal"
> but some more-narrowly-defined type, it is still an internal
> implementation detail of the aggregate.  You have no right to assume
> that the remote server implements the aggregate the same way the
> local one does.  If we start making such an assumption then we'll
> be unable to revise the implementation of an aggregate ever again.
>
> TBH, I think this entire proposal is dead in the water.  Which is
> sad from a performance standpoint, but I can't see any way that
> we would not regret shipping a feature that makes such assumptions.

I think it's ridiculous to just hold our breath and pretend like this
feature isn't needed -- it's at least half a decade overdue. We engage
in endless hand-wringing over local-remote symmetry in cases where
other systems seem to effortlessly make that assumption and then get
on with building new features. It's not that I disagree with the
concern; we're *already* doing stuff that is unprincipled in a bunch
of different areas and that could and occasionally does cause queries
that push things to the remote side to return wrong answers, and I
hate that. But the response to that can't be to refuse to add new
features and maybe rip out the features we already have. Users don't
like it when pushdown causes queries to return wrong answers, but they
like it even less when the pushdown doesn't happen in the first place
and the query runs until the heat death of the universe. I'm not
entirely sure what the right design ideas are here, but giving up and
refusing to add features ought to be completely off the table.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Dynamically generate a nested JSON file

2023-11-27 Thread Rushabh Shah
Hi,

I want to dynamically generate a nested json file. I have written a
function for it in PL/PGSQL that accepts 3 arrays. First one is an array of
all json fields, second one is an array of all json fields with columns
from tables present in db, third one mentions the type for all the fields
inside the json file.

This what I have so for that is working:

declare
  outputs text;
begin
  outputs = '';
  for i in 1 .. array_upper(fieldtype, 1) loop
    select case
             when lower(fieldtype[i]) = 'field' then (outputs || '' ||
               jsonb_build_object(fname[i], tcolumn[i])::text)::text

             when lower(fieldtype[i]) = 'json object' then (outputs || '' ||
               jsonb_build_object(fname[i], jsonb_build_object())::text)::text

             when lower(fieldtype[i]) = 'json array' then (outputs || '' ||
               json_build_array(fname[i], json_build_array())::text)::text

             else 'It is not field, object or an array'::text
           end into outputs  -- a CASE expression closes with END, not END CASE
    from tblname;
  end loop;
  return outputs;
end;

So, now for example the input for my function is:
fname: ['passenger', 'firstname', 'lastname', 'address', 'city', 'state',
'country']
tcolumn: [, 'pass.fname', 'pass.lname', , 'address.city', 'address.state',
'address.country']
ftype: ['json object', 'field', 'field', 'json array', 'field', 'field',
'field']

This is what I want my output to look like:
{
  "passenger": {
    "firstname": "john",
    "lastname": "smith",
    "address": [
      {
        "city": "Houston",
        "state": "Texas",
        "country": "USA"
      }
    ]
  }
}

But currently I am having difficulty adding firstname inside passenger json
object.

I know that I need to again loop through the json field names array to go
to next one inside jsonb_build_object() function to get the fields and
arrays inside but that would make my function very big. This is what I need
some assistance with.

Thanks for all the help.


Re: Missing docs on AT TIME ZONE precedence?

2023-11-27 Thread Andrew Dunstan



On 2023-11-27 Mo 15:34, Tom Lane wrote:
> Alvaro Herrera  writes:
>> (TBH I don't think the added comments really explain the problems fully.
>> That's most likely because I don't actually understand what the problems
>> are.)
>
> The actual problem is that nobody has applied a cluestick to the SQL
> committee about writing an unambiguous grammar :-(.  But I digress.
>
> I don't like the existing coding for more reasons than just
> underdocumentation.  Global assignment of precedence is a really,
> really dangerous tool for solving ambiguous-grammar problems, because
> it can mask problems unrelated to the one you think you are solving:
> basically, it eliminates bison's complaints about grammar ambiguities
> related to the token you mark.  (Commits 12b716457 and 28a61fc6c are
> relevant here.)  Attaching precedence to individual productions is
> far safer, because it won't have any effect that extends beyond that
> production.  You still need a precedence attached to the lookahead
> token; but I think we should try very hard to not assign a precedence
> different from IDENT's to any unreserved keywords.
>
> After a bit of fooling around I found a patch that seems to meet
> that criterion; attached.

Looks good. Perhaps the comments above the UNBOUNDED precedence setting 
(esp. the first paragraph) need strengthening, with a stern injunction 
to avoid different precedence for non-reserved keywords if at all possible.

cheers

andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: POC, WIP: OR-clause support for indexes

2023-11-27 Thread Robert Haas
On Mon, Nov 27, 2023 at 3:02 AM Andrei Lepikhov
 wrote:
> On 25/11/2023 08:23, Alexander Korotkov wrote:
> > I think patch certainly gets better in this aspect.  One thing I can't
> > understand is why do we use home-grown code for resolving
> > hash-collisions.  You can just define custom hash and match functions
> > in HASHCTL.  Even if we need to avoid repeated JumbleExpr calls, we
> > still can save pre-calculated hash value into hash entry and use
> > custom hash and match.  This doesn't imply us to write our own
> > collision-resolving code.
>
> Thanks, it was an insightful suggestion.
> I implemented it, and the code has become shorter (see attachment).

Neither the code comments nor the commit message really explain the
design idea here. That's unfortunate, principally because it makes
review difficult.

I'm very skeptical about the idea of using JumbleExpr for any part of
this. It seems fairly expensive, and it might produce false matches.
If expensive is OK, then why not just use equal()? If it's not, then
this probably isn't really OK either. But in any case there should be
comments explaining why this strategy was chosen.

The use of op_mergejoinable() seems pretty random to me. Why should we
care about that? If somebody writes a<1 or a<2 or a<3 or a<4, you can
transform that to a < ANY(ARRAY[1,2,3,4]) if you want. I'm not sure
whether that's a good idea, but I think it's a legal transformation.

Honestly, it seems very hard to avoid the conclusion that this
transformation is being done at too early a stage. Parse analysis is
not the time to try to do query optimization. I can't really believe
that there's a way to produce a committable patch along these lines.
Ideally, a transformation like this should be done after we know what
plan shape we're using (or considering using), so that we can make
cost-based decisions about whether to transform or not. But at the
very least it should happen somewhere in the planner. There's really
no justification for parse analysis rewriting the SQL that the user
entered.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: locked reads for atomics

2023-11-27 Thread Nathan Bossart
Here's a v2 of the patch set in which I've attempted to address all
feedback.  I've also added a pg_write_membarrier_u* pair of functions that
provide an easy way to write to an atomic variable with full barrier
semantics.  In the generic implementation, these are just aliases for an
atomic exchange.

0002 demonstrates how these functions might be used to eliminate the
arch_lck spinlock, which is only ever used for one boolean variable.  My
hope is that the membarrier functions make eliminating spinlocks for
non-performance-sensitive code easy to reason about.

(We might be able to use a pg_atomic_flag instead for 0002, but that code
seems intended for a slightly different use-case and has more complicated
barrier semantics.)
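
To illustrate the intended pattern from 0002, here is a sketch following
pgarch.c's existing arch_lck/force_dir_scan usage (the field's type
changes from bool to pg_atomic_uint32; the actual patch may differ):

/* before: a spinlock guarding a single flag */
SpinLockAcquire(&PgArch->arch_lck);
PgArch->force_dir_scan = true;
SpinLockRelease(&PgArch->arch_lck);

/* after: an atomic flag written and read with full-barrier semantics */
pg_atomic_write_membarrier_u32(&PgArch->force_dir_scan, 1);

/* ... and on the reader side ... */
if (pg_atomic_read_membarrier_u32(&PgArch->force_dir_scan) != 0)
	pg_atomic_write_membarrier_u32(&PgArch->force_dir_scan, 0);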

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
From 53f717e84a3aa912f9d89dce4be7f77d9bffb3c2 Mon Sep 17 00:00:00 2001
From: Nathan Bossart 
Date: Fri, 24 Nov 2023 20:21:12 -0600
Subject: [PATCH v2 1/2] add membarrier operations

---
 src/include/port/atomics.h | 59 ++
 src/include/port/atomics/generic.h | 36 ++
 2 files changed, 95 insertions(+)

diff --git a/src/include/port/atomics.h b/src/include/port/atomics.h
index bbff945eba..a889f0236e 100644
--- a/src/include/port/atomics.h
+++ b/src/include/port/atomics.h
@@ -239,6 +239,26 @@ pg_atomic_read_u32(volatile pg_atomic_uint32 *ptr)
 	return pg_atomic_read_u32_impl(ptr);
 }
 
+/*
+ * pg_atomic_read_membarrier_u32 - read with barrier semantics.
+ *
+ * This read is guaranteed to return the current value, provided that the value
+ * is only ever updated via operations with barrier semantics, such as
+ * pg_atomic_compare_exchange_u32() and pg_atomic_write_membarrier_u32().  Note
+ * that this is less performant than pg_atomic_read_u32(), but it may be easier
+ * to reason about correctness with this function in less performance-sensitive
+ * code.
+ *
+ * Full barrier semantics.
+ */
+static inline uint32
+pg_atomic_read_membarrier_u32(volatile pg_atomic_uint32 *ptr)
+{
+	AssertPointerAlignment(ptr, 4);
+
+	return pg_atomic_read_membarrier_u32_impl(ptr);
+}
+
 /*
  * pg_atomic_write_u32 - write to atomic variable.
  *
@@ -276,6 +296,27 @@ pg_atomic_unlocked_write_u32(volatile pg_atomic_uint32 *ptr, uint32 val)
 	pg_atomic_unlocked_write_u32_impl(ptr, val);
 }
 
+/*
+ * pg_atomic_write_membarrier_u32 - write with barrier semantics.
+ *
+ * The write is guaranteed to succeed as a whole, i.e., it's not possible to
+ * observe a partial write for any reader.  Note that this correctly interacts
+ * with both pg_atomic_compare_exchange_u32() and
+ * pg_atomic_read_membarrier_u32().  While this may be less performant than
+ * pg_atomic_write_u32() and pg_atomic_unlocked_write_u32(), it may be easier
+ * to reason about correctness with this function in less performance-sensitive
+ * code.
+ *
+ * Full barrier semantics.
+ */
+static inline void
+pg_atomic_write_membarrier_u32(volatile pg_atomic_uint32 *ptr, uint32 val)
+{
+	AssertPointerAlignment(ptr, 4);
+
+	pg_atomic_write_membarrier_u32_impl(ptr, val);
+}
+
 /*
  * pg_atomic_exchange_u32 - exchange newval with current value
  *
@@ -429,6 +470,15 @@ pg_atomic_read_u64(volatile pg_atomic_uint64 *ptr)
 	return pg_atomic_read_u64_impl(ptr);
 }
 
+static inline uint64
+pg_atomic_read_membarrier_u64(volatile pg_atomic_uint64 *ptr)
+{
+#ifndef PG_HAVE_ATOMIC_U64_SIMULATION
+	AssertPointerAlignment(ptr, 8);
+#endif
+	return pg_atomic_read_membarrier_u64_impl(ptr);
+}
+
 static inline void
 pg_atomic_write_u64(volatile pg_atomic_uint64 *ptr, uint64 val)
 {
@@ -438,6 +488,15 @@ pg_atomic_write_u64(volatile pg_atomic_uint64 *ptr, uint64 val)
 	pg_atomic_write_u64_impl(ptr, val);
 }
 
+static inline void
+pg_atomic_write_membarrier_u64(volatile pg_atomic_uint64 *ptr, uint64 val)
+{
+#ifndef PG_HAVE_ATOMIC_U64_SIMULATION
+	AssertPointerAlignment(ptr, 8);
+#endif
+	pg_atomic_write_membarrier_u64_impl(ptr, val);
+}
+
 static inline uint64
 pg_atomic_exchange_u64(volatile pg_atomic_uint64 *ptr, uint64 newval)
 {
diff --git a/src/include/port/atomics/generic.h b/src/include/port/atomics/generic.h
index 95d99dd0be..9bcc3b48ce 100644
--- a/src/include/port/atomics/generic.h
+++ b/src/include/port/atomics/generic.h
@@ -243,6 +243,24 @@ pg_atomic_sub_fetch_u32_impl(volatile pg_atomic_uint32 *ptr, int32 sub_)
 }
 #endif
 
+#if !defined(PG_HAVE_ATOMIC_READ_MEMBARRIER_U32) && defined(PG_HAVE_ATOMIC_FETCH_ADD_U32)
+#define PG_HAVE_ATOMIC_READ_MEMBARRIER_U32
+static inline uint32
+pg_atomic_read_membarrier_u32_impl(volatile pg_atomic_uint32 *ptr)
+{
+	return pg_atomic_fetch_add_u32_impl(ptr, 0);
+}
+#endif
+
+#if !defined(PG_HAVE_ATOMIC_WRITE_MEMBARRIER_U32) && defined(PG_HAVE_ATOMIC_EXCHANGE_U32)
+#define PG_HAVE_ATOMIC_WRITE_MEMBARRIER_U32
+static inline void
+pg_atomic_write_membarrier_u32_impl(volatile pg_atomic_uint32 *ptr, uint32 val)
+{
+	(void) pg_atomic_exchange_u32_impl(ptr, val);
+}
+#endif
+
 #if !defined(PG_HAVE_ATOMIC_EXCHANGE_U64)

Re: Partial aggregates pushdown

2023-11-27 Thread Tom Lane
Robert Haas  writes:
> Also, I want to make one other point here about security and
> reliability. Right now, there is no way for a user to feed arbitrary
> data to a deserialization function. Since serialization and
> deserialization functions are only used in the context of parallel
> query, we always know that the data fed to the deserialization
> function must have come from the serialization function on the same
> machine. Nor can users call the deserialization function directly with
> arbitrary data of their own choosing, because users cannot call
> functions that take or return internal. But with this system, it
> becomes possible to feed arbitrary data to a deserialization function.

Ouch.  That is absolutely horrid --- we have a lot of stuff that
depends on users not being able to get at "internal" values, and
it sounds like the current proposal breaks all of that.

Quite aside from security concerns, there is no justification for
assuming that the "internal" values used on one platform/PG version
are identical to those used on another.  So if the idea is to
ship back "internal" values from the remote server to the local one,
I think it's basically impossible to make that work.

Even if the partial-aggregate serialization value isn't "internal"
but some more-narrowly-defined type, it is still an internal
implementation detail of the aggregate.  You have no right to assume
that the remote server implements the aggregate the same way the
local one does.  If we start making such an assumption then we'll
be unable to revise the implementation of an aggregate ever again.

TBH, I think this entire proposal is dead in the water.  Which is
sad from a performance standpoint, but I can't see any way that
we would not regret shipping a feature that makes such assumptions.

regards, tom lane




Re: proposal: change behavior on collation version mismatch

2023-11-27 Thread Jeff Davis
On Mon, 2023-11-27 at 20:19 +0100, Laurenz Albe wrote:
> I forgot to add that the problem will remain a problem until the
> day we start keeping our own copy of the ICU library in the source
> tree...

Another option is for packagers to keep specific ICU versions around
for an extended time, and make it possible for Postgres to link to the
right one more flexibly (e.g. tie at initdb time, or some kind of
multi-lib system).

Even if ICU is available, we still have the problem of defaults. initdb
defaults to libc, and so does CREATE COLLATION (even if the database
collation is ICU). So it will be a long time before it's used widely
enough to consider the problem solved.

And even after all of that, ICU is not perfect, and our support for it
still has various rough edges.

Regards,
Jeff Davis





Re: Missing docs on AT TIME ZONE precedence?

2023-11-27 Thread Tom Lane
Alvaro Herrera  writes:
> (TBH I don't think the added comments really explain the problems fully.
> That's most likely because I don't actually understand what the problems
> are.)

The actual problem is that nobody has applied a cluestick to the SQL
committee about writing an unambiguous grammar :-(.  But I digress.

I don't like the existing coding for more reasons than just
underdocumentation.  Global assignment of precedence is a really,
really dangerous tool for solving ambiguous-grammar problems, because
it can mask problems unrelated to the one you think you are solving:
basically, it eliminates bison's complaints about grammar ambiguities
related to the token you mark.  (Commits 12b716457 and 28a61fc6c are
relevant here.)  Attaching precedence to individual productions is
far safer, because it won't have any effect that extends beyond that
production.  You still need a precedence attached to the lookahead
token; but I think we should try very hard to not assign a precedence
different from IDENT's to any unreserved keywords.

After a bit of fooling around I found a patch that seems to meet
that criterion; attached.

regards, tom lane

diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index 8c00b119ec..340823588a 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -821,11 +821,6 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 %nonassoc	BETWEEN IN_P LIKE ILIKE SIMILAR NOT_LA
 %nonassoc	ESCAPE			/* ESCAPE must be just above LIKE/ILIKE/SIMILAR */
 
-/* SQL/JSON related keywords */
-%nonassoc	UNIQUE JSON
-%nonassoc	KEYS OBJECT_P SCALAR VALUE_P
-%nonassoc	WITH WITHOUT
-
 /*
  * To support target_el without AS, it used to be necessary to assign IDENT an
  * explicit precedence just less than Op.  While that's not really necessary
@@ -850,9 +845,15 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
  * an explicit priority lower than '(', so that a rule with CUBE '(' will shift
  * rather than reducing a conflicting rule that takes CUBE as a function name.
  * Using the same precedence as IDENT seems right for the reasons given above.
+ *
+ * KEYS, OBJECT_P, SCALAR, VALUE_P, WITH, and WITHOUT are similarly assigned
+ * the same precedence as IDENT.  This allows resolving conflicts in the
+ * json_predicate_type_constraint and json_key_uniqueness_constraint_opt
+ * productions (see comments there).
  */
 %nonassoc	UNBOUNDED		/* ideally would have same precedence as IDENT */
 %nonassoc	IDENT PARTITION RANGE ROWS GROUPS PRECEDING FOLLOWING CUBE ROLLUP
+			KEYS OBJECT_P SCALAR VALUE_P WITH WITHOUT
 %left		Op OPERATOR		/* multi-character ops and user-defined operators */
 %left		'+' '-'
 %left		'*' '/' '%'
@@ -16519,21 +16520,35 @@ json_returning_clause_opt:
 			| /* EMPTY */			{ $$ = NULL; }
 		;
 
+/*
+ * We must assign the only-JSON production a precedence less than IDENT in
+ * order to favor shifting over reduction when JSON is followed by VALUE_P,
+ * OBJECT_P, or SCALAR.  (ARRAY doesn't need that treatment, because it's a
+ * fully reserved word.)  Because json_predicate_type_constraint is always
+ * followed by json_key_uniqueness_constraint_opt, we also need the only-JSON
+ * production to have precedence less than WITH and WITHOUT.  UNBOUNDED isn't
+ * really related to this syntax, but it's a convenient choice because it
+ * already has a precedence less than IDENT for other reasons.
+ */
 json_predicate_type_constraint:
-			JSON	{ $$ = JS_TYPE_ANY; }
+			JSON	%prec UNBOUNDED	{ $$ = JS_TYPE_ANY; }
 			| JSON VALUE_P			{ $$ = JS_TYPE_ANY; }
 			| JSON ARRAY			{ $$ = JS_TYPE_ARRAY; }
 			| JSON OBJECT_P			{ $$ = JS_TYPE_OBJECT; }
 			| JSON SCALAR			{ $$ = JS_TYPE_SCALAR; }
 		;
 
-/* KEYS is a noise word here */
+/*
+ * KEYS is a noise word here.  To avoid shift/reduce conflicts, assign the
+ * KEYS-less productions a precedence less than IDENT (i.e., less than KEYS).
+ * This prevents reducing them when the next token is KEYS.
+ */
 json_key_uniqueness_constraint_opt:
 			WITH UNIQUE KEYS			{ $$ = true; }
-			| WITH UNIQUE				{ $$ = true; }
+			| WITH UNIQUE				%prec UNBOUNDED	{ $$ = true; }
 			| WITHOUT UNIQUE KEYS		{ $$ = false; }
-			| WITHOUT UNIQUE			{ $$ = false; }
-			| /* EMPTY */ %prec KEYS		{ $$ = false; }
+			| WITHOUT UNIQUE			%prec UNBOUNDED	{ $$ = false; }
+			| /* EMPTY */ %prec UNBOUNDED	{ $$ = false; }
 		;
 
 json_name_and_value_list:


Re: proposal: change behavior on collation version mismatch

2023-11-27 Thread Jeff Davis
On Mon, 2023-11-27 at 11:06 -0800, Jeremy Schneider wrote:
> I've been tracking the discussions around collation here on the lists
> and I've had a number of conversations with folks working deeply in
> this
> area inside and outside of AWS, and I was part of the effort to
> address
> it at AWS since we first became aware of it many years ago.

For the record, I don't have a strong opinion on your specific
proposals. Not because I don't care, but because the available options
all seem pretty bad -- including the status quo.

My general opinion (not tied specifically to your proposals) is that we
need to pursue a lot of different approaches and hope to mitigate the
problem. With that in mind, I think your proposals have merit but we of
course need to consider the downsides.

> 2) "most users would rather have ease-of-use than 100% safety, since
> it's uncommon"
> 
> And I think this led to the current behavior of issuing a warning
> rather
> than an error

The elevel trade-off is *availability* vs safety, not ease-of-use vs
safety. It's harder to reason about what most users might want in that
situation.

> First: I'd suggest that a collation version mismatch should cause an
> ERROR rather than a WARNING by default.

Is this proposal based on our current notion of collation version?
There's been a lot of reasonable skepticism that what's stored in
datcollversion is a good indicator.

> If we want to have a GUC that
> allows warning behavior, I think that's OK but I think it should be
> superuser-only and documented as a "developer" setting similar to
> zero_damaged_pages.

A GUC seems sensible to express the availability-vs-safety trade-off. I
suspect we can get a GUC that defaults to "warning" committed, but
anything else will be controversial.

Regards,
Jeff Davis





Re: Table AM Interface Enhancements

2023-11-27 Thread Mark Dilger



> On Nov 25, 2023, at 9:47 AM, Alexander Korotkov  wrote:
> 
>> Should the patch at least document which parts of the EState are expected to 
>> be in which states, and which parts should be viewed as undefined?  If the 
>> implementors of table AMs rely on any/all aspects of EState, doesn't that 
>> prevent future changes to how that structure is used?
> 
> New tuple tuple_insert_with_arbiter() table AM API method needs EState
> argument to call executor functions: ExecCheckIndexConstraints(),
> ExecUpdateLockMode(), and ExecInsertIndexTuples().  I think we
> probably need to invent some opaque way to call this executor function
> without revealing EState to table AM.  Do you think this could work?

We're clearly not accessing all of the EState, just some specific fields, such 
as es_per_tuple_exprcontext.  I think you could at least refactor to pass the 
minimum amount of state information through the table AM API.

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company







Re: Partial aggregates pushdown

2023-11-27 Thread Robert Haas
On Wed, Nov 22, 2023 at 5:16 AM fujii.y...@df.mitsubishielectric.co.jp
 wrote:
> I did not choose Approach2 because I was not confident that the disadvantage 
> mentioned in 2.(2)(a)
> would be accepted by the PostgreSQL development community.
> If it is accepted, I think Approach 2 is smarter.
> Could you please provide your opinion on which
> approach is preferable after comparing these two approaches?
> If we cannot say anything without comparing the amount of source code, as 
> Mr.Momjian mentioned,
> we need to estimate the amount of source code required to implement Approach2.

I've had the same concern, that approach #2 would draw objections, so
I think you were right to be cautious about it. I don't think it is a
wonderful approach in all ways, but I do think that it is superior to
approach #1. If we add dedicated support to the grammar, it is mostly
a one-time effort, and after that, there should not be much need for
anyone to be concerned about it. If we instead add extra aggregates,
then that generates extra work every time someone writes a patch that
adds a new aggregate to core. I have a difficult time believing that
anyone will prefer an approach that involves an ongoing maintenance
effort of that type over one that doesn't.

One point that seems to me to be of particular importance is that if
we add new aggregates, there is a risk that some future aggregate
might do that incorrectly, so that the main aggregate works, but the
secondary aggregate created for this feature does not work. That seems
like it would be very frustrating for future code authors so I'd like
to avoid the risk as much as we can.

Also, I want to make one other point here about security and
reliability. Right now, there is no way for a user to feed arbitrary
data to a deserialization function. Since serialization and
deserialization functions are only used in the context of parallel
query, we always know that the data fed to the deserialization
function must have come from the serialization function on the same
machine. Nor can users call the deserialization function directly with
arbitrary data of their own choosing, because users cannot call
functions that take or return internal. But with this system, it
becomes possible to feed arbitrary data to a deserialization function.
The user could redefine the function on the remote side so that it
produces arbitrary data of their choosing, and the local
deserialization function will ingest it.

That's potentially quite a significant problem. Consider for example
that numericvar_deserialize() does no validity checking on any of the
weight, sign, or dscale, but not all values for those fields are
legal. Right now that doesn't matter, but if you can feed arbitrary
data to that function, then it is. I don't know exactly what the
consequences are if you can get that function to spit out a NumericVar
with values outside the normal legal range. What can you do then?
Store a bogus numeric on disk? Crash the server? Worst case, some
problem like this could be a security issue allowing for escalation to
superuser; more likely, it would be a crash bug, corrupt your
database, or lead to unexpected and strange error messages.

Unfortunately, I have the unpleasant suspicion that most internal-type
aggregates will be affected by this problem. Consider, for example,
string_agg_deserialize(). Generally, strings are one of the
least-constrained data types, so you might hope that this function
would be OK. But it doesn't look very promising. The first int4 in the
serialized representation is the cursor, which would have to be
bounds-checked, lest someone provide a cursor that falls outside the
bounds of the StringInfo and, maybe, cause a reference to an arbitrary
memory location. Then the rest of the representation is the actual
data, which could be anything. This function is used for both bytea
and for text, and for bytea, letting the payload be anything is OK.
But for text, the supplied data shouldn't contain an embedded zero
byte, or otherwise be invalid in the server encoding. If it were, that
would provide a vector to inject invalidly encoded data into the
database. This feature can't be allowed to do that.
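
To make that concrete, here is a sketch of the kind of validation a
deserialization function would need before trusting its input (loosely
modeled on string_agg_deserialize(); the current code omits such checks
because today's input is known to come from the matching serialization
function):

	bytea	   *sstate = PG_GETARG_BYTEA_PP(0);
	char	   *data = VARDATA_ANY(sstate);
	int			datalen = VARSIZE_ANY_EXHDR(sstate);
	int			cursor;

	if (datalen < (int) sizeof(int32))
		ereport(ERROR,
				(errcode(ERRCODE_INVALID_BINARY_REPRESENTATION),
				 errmsg("malformed serialized aggregate state")));
	memcpy(&cursor, data, sizeof(int32));

	/* the cursor must fall within the payload that follows it */
	if (cursor < 0 || cursor > datalen - (int) sizeof(int32))
		ereport(ERROR,
				(errcode(ERRCODE_INVALID_BINARY_REPRESENTATION),
				 errmsg("malformed serialized aggregate state")));

	/* for text (not bytea), reject bytes invalid in the server encoding */
	if (!pg_verifymbstr(data + sizeof(int32), datalen - sizeof(int32), true))
		ereport(ERROR,
				(errcode(ERRCODE_CHARACTER_NOT_IN_REPERTOIRE),
				 errmsg("invalid byte sequence in serialized aggregate state")));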

What could be a solution to this class of problems? One option is to
just give up on supporting this feature for internal-type aggregates
for now. That's easy enough to do, and just means we have less
functionality, but it's sad because that's functionality we'd like to
have. Another approach is to add necessary sanity checks to the
relevant deserialization functions, but that seems a little hard to
get right, and it would slow down parallel query cases which are
probably going to be more common than the use of this feature. I think
the slowdown might be significant, too. A third option is to change
those aggregates in some way, like giving them a transition function
that operates on some data type other than internal, but there again
we have to be careful of slowdowns. A final option i

Re: proposal: change behavior on collation version mismatch

2023-11-27 Thread Laurenz Albe
I forgot to add that the problem will remain a problem until the
day we start keeping our own copy of the ICU library in the source
tree...

Yours,
Laurenz Albe




Re: proposal: change behavior on collation version mismatch

2023-11-27 Thread Laurenz Albe
On Mon, 2023-11-27 at 11:06 -0800, Jeremy Schneider wrote:
> First: I'd suggest that a collation version mismatch should cause an
> ERROR rather than a WARNING by default. If we want to have a GUC that
> allows warning behavior, I think that's OK but I think it should be
> superuser-only and documented as a "developer" setting similar to
> zero_damaged_pages.
> 
> Second: I'd suggest that all of the "alter ... refresh collation"
> commands should be strictly superuser-only rather than
> owner-of-collation-privs, and that they should be similarly documented
> as something that is generally advised against and exists for
> extraordinary circumstances.

Thanks for spending thought on this painful subject.

I can get behind changing the collation version mismatch warning into
an error.  It would cause more annoyance, but might avert bigger pain
later on.

But I don't think that ALTER DATABASE ... REFRESH COLLATION VERSION
need be superuser-only.  Whoever creates an object may alter it in
PostgreSQL, and system collations are owned by the bootstrap superuser
anyway.  The point of the warning (or proposed error) is that the user
knows "here is a potential problem, have a closer look".

Yours,
Laurenz Albe




Re: Do away with a few backwards compatibility macros

2023-11-27 Thread Nathan Bossart
On Mon, Nov 27, 2023 at 04:29:18PM +0530, Bharath Rupireddy wrote:
> I think it's easy to miss/enforce a documented policy. IMV, moving
> towards pg_attribute_deprecated as Alvaro Herrera said in the other
> thread 
> https://www.postgresql.org/message-id/202311141920.edtj56saukiv%40alvherre.pgsql
> can help. Authors then can declare the variables and functions as
> deprecated so that the code compilation with
> -Wno-deprecated-declarations can help track all such deprecated code.

I'm +1 for adding pg_attribute_deprecated once we have something to use it
for.
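
For reference, a minimal sketch of what such a macro could look like
once there is something to use it for (hypothetical; this is not what
c.h defines today):

/* In c.h, gated on compiler support (GCC and Clang understand this): */
#if defined(__GNUC__) || defined(__clang__)
#define pg_attribute_deprecated() __attribute__((deprecated))
#else
#define pg_attribute_deprecated()
#endif

/* Usage: keep the old name compiling, but warn at every call site. */
extern void old_function(void) pg_attribute_deprecated();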

> Having said that, I'm all +1 if the v1 patch proposed in this thread gets in.

Committed.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: Partial aggregates pushdown

2023-11-27 Thread Robert Haas
On Wed, Nov 22, 2023 at 1:32 AM Alexander Pyhalov
 wrote:
> Hi. HAVING is also a problem. Consider the following query
>
> SELECT count(a) FROM t HAVING count(a) > 10 - we can't push it down to
> foreign server as HAVING needs full aggregate result, but foreign server
> don't know it.

I don't see it that way. What we would push to the foreign server
would be something like SELECT count(a) FROM t. Then, after we get the
results back and combine the various partial counts locally, we would
locally evaluate the HAVING clause afterward. That is, partial
aggregation is a barrier to pushing down HAVING clause itself, but it
doesn't preclude pushing down the aggregation.
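
In other words, a sketch of the intended plan shape (not actual
postgres_fdw output):

-- Original query:
SELECT count(a) FROM t HAVING count(a) > 10;

-- Pushed to each foreign server (partial aggregation only):
SELECT count(a) FROM t;

-- Locally: combine the partial counts, then evaluate the HAVING clause.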

-- 
Robert Haas
EDB: http://www.enterprisedb.com




proposal: change behavior on collation version mismatch

2023-11-27 Thread Jeremy Schneider
I had some interesting conversations with a couple PostgreSQL community
members at PASS Data Summit the week before last about the collation
problem, and then - just in this last week - I saw two more people on
public channels hitting corruption problems. One person on the public
PostgreSQL Slack, we eventually figured out they had upgraded from
Ubuntu 18.04 to 22.04 which hits glibc 2.28; a second person here on the
pgsql-general list reported by Daniel Westermann and I assume
representing a client of dbi services [1]. Everyone who's been tracking
this over the past few years has seen the steady stream of quiet
complaints in public from people at almost every major PG company,
largely around the glibc 2.28 debacle.

I've been tracking the discussions around collation here on the lists
and I've had a number of conversations with folks working deeply in this
area inside and outside of AWS, and I was part of the effort to address
it at AWS since we first became aware of it many years ago.

It seems to me the general perspective on the mailing lists is that:

1) "collation changes are uncommon" (which is relatively correct)
2) "most users would rather have ease-of-use than 100% safety, since
it's uncommon"

And I think this led to the current behavior of issuing a warning rather
than an error, and providing a SQL command "alter ... refresh collation"
which simply instructs the database to permanently forget the warning
without changing anything. I agree that some users might prefer this
behavior, but I think businesses like banks or healthcare companies
would be appalled, and would prefer to do the extra work and have
certainty of avoiding small but known probabilities of silent data
corruption.

As I said on the pgsql-general thread: glibc 2.28 has certainly been the
most obvious and impactful case, so the focus is understandable, but
there's a bit of a myth in the general public that the problem is only
with glibc 2.28 (and not ICU or other glibc versions or data structures
other than indexes). Unfortunately, contrary to current popular belief,
the only truly safe way to update an operating system under PostgreSQL is
with logical dump/load or logical replication, or continuing to compile
and use the exact older version of ICU from the old OS (if you use ICU).
I think the ICU folks are generally careful enough that it'll be far
less likely for compiler changes and new compiler optimizations to
inadvertently change collation on newer operating systems and newer
build toolchains (e.g. for strings which don't have linguistically defined
collation, like mixing characters from multiple languages and classes).

It's been two years now since I published the collation torture test
over on github, which directly compares 10 years of both glibc and ICU
changes and demonstrates clearly that both ICU and glibc libraries have
regular small changes, and both libraries have had at least one release
with a massive number of changes. [2]

I also published a blog post this past March with a step-by-step
reproducible demonstration of silent corruption without any indexes
involved by using ICU (not glibc) with an OS upgrade from Ubuntu 20.04
to 22.04. [3]  The warning was not even displayed to the user, because
it happened at connection time rather than query time.

That blog also listed many reasons that glibc & ICU regularly include
small changes and linked to real examples from ICU: new characters
(which use existing code points), fixing incorrect rules, governments or
universities clarifying rules, general improvements, and unintentional
changes from code changes or refactors (like glibc in Ubuntu 15.04
changing sort order for 22 thousand CJK characters, many years prior to
2.28).

My own personal opinion here is that PostgreSQL is on a clear trajectory
to soon be the dominant database of businesses like banks and healthcare
companies, and that the PostgreSQL default configuration with regard to
safety and durability should be bank defaults rather than "easy" defaults.

For this reason, I'd like to revisit two specific current behaviors of
PostgreSQL and get a sense of how strongly everyone feels about them.

First: I'd suggest that a collation version mismatch should cause an
ERROR rather than a WARNING by default. If we want to have a GUC that
allows warning behavior, I think that's OK but I think it should be
superuser-only and documented as a "developer" setting similar to
zero_damaged_pages.

Second: I'd suggest that all of the "alter ... refresh collation"
commands should be strictly superuser-only rather than
owner-of-collation-privs, and that they should be similarly documented
as something that is generally advised against and exists for
extraordinary circumstances.

I know these things have been discussed before, and I realize the
implications are important and inconvenient for many users, and also I
realize that I'm not often involved in discussions here on the hackers
email list. (I usually catch up on hackers from

Re: trying again to get incremental backup

2023-11-27 Thread Robert Haas
On Thu, Nov 23, 2023 at 11:18 PM Thomas Munro  wrote:
> Robert pinged me to see if I had any ideas.

Thanks, Thomas.

> The reason it fails on Windows is because there is a special code path
> for WIN32 in the patch's src/bin/pg_combinebackup/copy_file.c, but it
> is incomplete: it returns early without feeding the data into the
> checksum, so all the checksums have the same initial and bogus value.
> I commented that part out so it took the normal path like Unix, and it
> passed.

Yikes, that's embarrassing. Thanks for running it down. There is logic
in the caller to figure out whether we need to recompute the checksum
or can reuse one we already have, but copy_file() doesn't understand
that it should take the slow path if a new checksum computation is
required.
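
Presumably the fix is along these lines (a sketch only; win32_fast_copy()
is a hypothetical stand-in for the patch's WIN32 fast-path helper):

/*
 * In copy_file(): the fast path may only be taken when no new checksum
 * has to be computed, since it never feeds the data through
 * pg_checksum_update().
 */
if (checksum_ctx->type == CHECKSUM_TYPE_NONE)
{
#ifdef WIN32
	win32_fast_copy(src, dst);
	return;
#endif
}

/* Slow path: buffered read/write loop, updating the checksum per block. */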

> The reason it fails on Linux 32 bit with -fsanitize is because this
> has uncovered a bug in xlogreader.c, which overflows a 32 bit pointer
> when doing a size test that could easily be changed to non-overflowing
> formulation.  AFAICS it is not a live bug because it comes to the
right conclusion without dereferencing the pointer due to other checks,
> but the sanitizer is not wrong to complain about it and I will post a
> patch to fix that in a new thread.  With the draft patch I am testing,
> the sanitizer is happy and this passes too.

Thanks so much.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: New instability in stats regression test

2023-11-27 Thread Tom Lane
Andres Freund  writes:
> I am probably under-caffeinated: What precisely is the potential race? Just
> that the timestamps on some system might not be granular enough?

The problem as I see it is that this test:

SELECT :io_stats_post_reset < :io_stats_pre_reset;

requires an assumption that less I/O has happened since the commanded
reset action than happened before it (extending back to the previous
reset, or cluster start).  Since concurrent processes might be doing
I/O, this has a race condition.  If we are slow enough about obtaining
:io_stats_post_reset, the test *will* fail eventually.  But the shorter
the distance back to the previous reset, the bigger the odds of
observable trouble; thus Michael's concern that adding more reset
tests in future would increase the risk of failure.
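
For readers following along, the pattern under discussion is roughly
this (simplified; the exact columns aggregated in stats.sql differ):

SELECT sum(reads + writes + extends + fsyncs) AS io_stats_pre_reset
  FROM pg_stat_io \gset
SELECT pg_stat_reset_shared('io');
SELECT sum(reads + writes + extends + fsyncs) AS io_stats_post_reset
  FROM pg_stat_io \gset
-- Any backend doing I/O between the reset and the second snapshot
-- inflates the post-reset number, so this comparison can (rarely) fail:
SELECT :io_stats_post_reset < :io_stats_pre_reset;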

regards, tom lane




Re: Add recovery to pg_control and remove backup_label

2023-11-27 Thread Robert Haas
On Sun, Nov 26, 2023 at 3:42 AM Stephen Frost  wrote:
> What would really be helpful would be hearing from these individuals
> directly as to what the issues are with the changes, such that perhaps
> we can do things better in the future to avoid whatever the issue is
> they're having with the changes.  Simply saying we shouldn't make
> changes in this area isn't workable and the constant push-back is
> actively discouraging to folks trying to make improvements.  Obviously
> it's a biased view, but we've not had issues making the necessary
> adjustments in pgbackrest with each release and I feel like if the
> authors of wal-g or barman did that they would have spoken up.

I'm happy if people show up to comment on proposed changes, but I
think you're being a little bit unrealistic here. I have had to help
plenty of people who have screwed up their backups in one way or
another, generally by using some home-grown script, sometimes by
misusing some existing backup tool. Those people are EDB customers;
they don't read and participate in discussions here. If they did,
perhaps they wouldn't be paying EDB to have me and my colleagues sort
things out for them when it all goes wrong. I'm not trying to say that
EDB doesn't have customers who participate in mailing list
discussions, because we do, but it's a small minority, and I don't
think that should surprise anyone. Moreover, the people who don't
wouldn't necessarily have the background, expertise, or *time* to
assess specific proposals in detail. If your point is that my
perspective on what's helpful or unhelpful is not valid because I've
only helped 30 people who had problems in this area, but that the
perspective of those 30 people who were helped would be more valid,
well, I don't agree with that. I think your perspective and David's
are valid precisely *because* you've worked a lot on pgbackrest and no
doubt interacted with lots of users; I think Andres's perspective is
valid precisely *because* of his experience working with the fleet at
Microsoft and individual customers at EDB and 2Q before that; and I
think my perspective is valid for the same kinds of reasons.

I am more in agreement with the idea that it would be nice to hear
from backup tool authors, but I think even that has limited value.
Surely we can all agree that if the backup tool is correctly written,
none of this matters, because you'll make the tool do the right things
and then you'll be fine. The difficulty here, and the motivation
behind this proposal and others like it, is that too many users fail
to follow the procedure correctly. If we hear from the authors of
well-written backup tools, I expect they will tell us they can adapt
their tool to whatever we do. And if we hear from the authors of
poorly-written tools, well, I don't think their opinions would form a
great basis for making decisions.

> [ lengthy discussion of tools that don't work any more ]

What confuses me here is that you seem to be arguing that we should
*once again* make a breaking change to the backup API, but at the same
time you're acknowledging that there are plenty of tools out there on
the Internet that have gotten broken by previous rounds of changes.
It's only one step from there to conclude that whacking the API around
does more harm than good, but you seem to reject that conclusion.

Personally, I haven't yet seen any evidence that the removal of
exclusive backup mode made any real difference one way or the other. I
think I've heard about people needing to adjust code for it, but not
about that being a problem. I have yet to run into anyone who was
previously using it but, because it was deprecated, switched to doing
something better and safer. Have you?

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Assert failure on 'list_member_ptr(rel->joininfo, restrictinfo)'

2023-11-27 Thread Alexander Korotkov
On Mon, Nov 27, 2023 at 8:07 PM Andres Freund  wrote:
>
> On 2023-11-27 11:29:48 +0530, Ashutosh Bapat wrote:
> > How do we ensure that we are not making unnecessary copies of Bitmapsets?
>
> We don't - but that's not specific to this patch. Bitmapsets typically aren't
> very large, I doubt that it's a significant proportion of the memory
> usage. Adding refcounts or such would likely add more overhead than it'd save,
> both in time and memory.
>
> I am a bit worried about the maintainability of remove_rel_from_query() et
> al. Is there any infrastructure for detecting that some PlannerInfo field that
> needs updating wasn't updated?  There's not even a note in PlannerInfo that
> documents that that needs to happen.

That makes sense, thank you.  We need at least a comment about this.
I'll write a patch adding this comment.

BTW, what do you think about the patches upthread [1].

Links
1. 
https://www.postgresql.org/message-id/CAPpHfdtLgCryACcrmLv=Koq9rAB3=tr5y9d84dggvuhscvj...@mail.gmail.com

--
Regards,
Alexander Korotkov




Re: SSL tests fail on OpenSSL v3.2.0

2023-11-27 Thread Tristan Partin

Here is a v2 which adds back a comment that was not meant to be removed.

--
Tristan Partin
Neon (https://neon.tech)
From 4bcb73eab9ceba950581a890c52820a81134f7e4 Mon Sep 17 00:00:00 2001
From: Tristan Partin 
Date: Mon, 27 Nov 2023 11:49:52 -0600
Subject: [PATCH v2] Use BIO_{get,set}_app_data() instead of
 BIO_{get,set}_data()

Compiling Postgres against OpenSSL 3.2 exposed a regression:

$ psql "postgresql://$DB?sslmode=require"
psql: error: connection to server at "redacted" (redacted), port 5432 failed: ERROR:  Parameter 'user' is missing in startup packet.
double free or corruption (out)
Aborted (core dumped)

Analyzing the backtrace, openssl was overwriting heap-allocated data in
our PGconn struct because it thought BIO::ptr was a struct bss_sock_st
*. OpenSSL then called a memset() on a member of that struct, and we
zeroed out data in our PGconn struct.

BIO_get_data(3) says the following:

> These functions are mainly useful when implementing a custom BIO.
>
> The BIO_set_data() function associates the custom data pointed to by ptr
> with the BIO a. This data can subsequently be retrieved via a call to
> BIO_get_data(). This can be used by custom BIOs for storing
> implementation specific information.

If you take a look at my_BIO_s_socket(), we create a partially custom
BIO, but for the most part are defaulting to the methods defined by
BIO_s_socket(). We need to set application-specific data and not BIO
private data, so that the BIO implementation we rely on can properly
assert that its private data is what it expects.

Co-authored-by: Bo Andreson 
---
 meson.build  |  1 -
 src/backend/libpq/be-secure-openssl.c| 11 +++
 src/include/pg_config.h.in   |  3 ---
 src/interfaces/libpq/fe-secure-openssl.c | 11 +++
 src/tools/msvc/Solution.pm   |  2 --
 5 files changed, 6 insertions(+), 22 deletions(-)

diff --git a/meson.build b/meson.build
index ee58ee7a06..0095fb183a 100644
--- a/meson.build
+++ b/meson.build
@@ -1285,7 +1285,6 @@ if sslopt in ['auto', 'openssl']
   # doesn't have these OpenSSL 1.1.0 functions. So check for individual
   # functions.
   ['OPENSSL_init_ssl'],
-  ['BIO_get_data'],
   ['BIO_meth_new'],
   ['ASN1_STRING_get0_data'],
   ['HMAC_CTX_new'],
diff --git a/src/backend/libpq/be-secure-openssl.c b/src/backend/libpq/be-secure-openssl.c
index 31b6a6eacd..1b8b32c5b3 100644
--- a/src/backend/libpq/be-secure-openssl.c
+++ b/src/backend/libpq/be-secure-openssl.c
@@ -842,11 +842,6 @@ be_tls_write(Port *port, void *ptr, size_t len, int *waitfor)
  * see sock_read() and sock_write() in OpenSSL's crypto/bio/bss_sock.c.
  */
 
-#ifndef HAVE_BIO_GET_DATA
-#define BIO_get_data(bio) (bio->ptr)
-#define BIO_set_data(bio, data) (bio->ptr = data)
-#endif
-
 static BIO_METHOD *my_bio_methods = NULL;
 
 static int
@@ -856,7 +851,7 @@ my_sock_read(BIO *h, char *buf, int size)
 
 	if (buf != NULL)
 	{
-		res = secure_raw_read(((Port *) BIO_get_data(h)), buf, size);
+		res = secure_raw_read(((Port *) BIO_get_app_data(h)), buf, size);
 		BIO_clear_retry_flags(h);
 		if (res <= 0)
 		{
@@ -876,7 +871,7 @@ my_sock_write(BIO *h, const char *buf, int size)
 {
 	int			res = 0;
 
-	res = secure_raw_write(((Port *) BIO_get_data(h)), buf, size);
+	res = secure_raw_write(((Port *) BIO_get_app_data(h)), buf, size);
 	BIO_clear_retry_flags(h);
 	if (res <= 0)
 	{
@@ -952,7 +947,7 @@ my_SSL_set_fd(Port *port, int fd)
 		SSLerr(SSL_F_SSL_SET_FD, ERR_R_BUF_LIB);
 		goto err;
 	}
-	BIO_set_data(bio, port);
+	BIO_set_app_data(bio, port);
 
 	BIO_set_fd(bio, fd, BIO_NOCLOSE);
 	SSL_set_bio(port->ssl, bio, bio);
diff --git a/src/include/pg_config.h.in b/src/include/pg_config.h.in
index d8a2985567..5f16918243 100644
--- a/src/include/pg_config.h.in
+++ b/src/include/pg_config.h.in
@@ -66,9 +66,6 @@
 /* Define to 1 if you have the `backtrace_symbols' function. */
 #undef HAVE_BACKTRACE_SYMBOLS
 
-/* Define to 1 if you have the `BIO_get_data' function. */
-#undef HAVE_BIO_GET_DATA
-
 /* Define to 1 if you have the `BIO_meth_new' function. */
 #undef HAVE_BIO_METH_NEW
 
diff --git a/src/interfaces/libpq/fe-secure-openssl.c b/src/interfaces/libpq/fe-secure-openssl.c
index 4aeaf08312..e669bdbf1d 100644
--- a/src/interfaces/libpq/fe-secure-openssl.c
+++ b/src/interfaces/libpq/fe-secure-openssl.c
@@ -1815,11 +1815,6 @@ PQsslAttribute(PGconn *conn, const char *attribute_name)
  * see sock_read() and sock_write() in OpenSSL's crypto/bio/bss_sock.c.
  */
 
-#ifndef HAVE_BIO_GET_DATA
-#define BIO_get_data(bio) (bio->ptr)
-#define BIO_set_data(bio, data) (bio->ptr = data)
-#endif
-
 /* protected by ssl_config_mutex */
 static BIO_METHOD *my_bio_methods;
 
@@ -1828,7 +1823,7 @@ my_sock_read(BIO *h, char *buf, int size)
 {
 	int			res;
 
-	res = pqsecure_raw_read((PGconn *) BIO_get_data(h), buf, size);
+	res = pqsecure_raw_read((PGconn *) BIO_get_app_data(h), buf, size);
 	BIO_clear_retry_flags(h);
 	if (res < 0)
 	{
@@ -1858,7 +1853,7

Re: SSL tests fail on OpenSSL v3.2.0

2023-11-27 Thread Tristan Partin

Nazir,

Thanks for opening a thread. I was just about to start one; here's what
we came up with so far.


Homebrew users discovered a regression[0] when using Postgres compiled 
and linked against OpenSSL version 3.2.


$ psql "postgresql://$DB?sslmode=require"
psql: error: connection to server at "redacted" (redacted), port 5432 failed: 
ERROR:  Parameter 'user' is missing in startup packet.
double free or corruption (out)
Aborted (core dumped)

Analyzing the backtrace, OpenSSL was overwriting heap-allocated data in
our PGconn struct because it thought BIO::ptr was a struct bss_sock_st
*. OpenSSL then called a memset() on a member of that struct, and we
zeroed out data in our PGconn struct.

BIO_get_data(3) says the following:


These functions are mainly useful when implementing a custom BIO.

The BIO_set_data() function associates the custom data pointed to by ptr
with the BIO a. This data can subsequently be retrieved via a call to
BIO_get_data(). This can be used by custom BIOs for storing
implementation specific information.


If you take a look at my_BIO_s_socket(), we create a partially custom
BIO, but for the most part are defaulting to the methods defined by
BIO_s_socket(). We need to set application-specific data and not BIO
private data, so that the BIO implementation we rely on can properly
assert that its private data is what it expects.

The ssl test suite continues to pass with this patch. This patch should
most likely be backported to every supported Postgres version.


[0]: https://github.com/Homebrew/homebrew-core/issues/155651

--
Tristan Partin
Neon (https://neon.tech)
From 672103a67aaf0dae5be6c5adcc5ce19e5b6d7676 Mon Sep 17 00:00:00 2001
From: Tristan Partin 
Date: Mon, 27 Nov 2023 11:49:52 -0600
Subject: [PATCH v1] Use BIO_{get,set}_app_data() instead of
 BIO_{get,set}_data()

Compiling Postgres against OpenSSL 3.2 exposed a regression:

$ psql "postgresql://$DB?sslmode=require"
psql: error: connection to server at "redacted" (redacted), port 5432 failed: ERROR:  Parameter 'user' is missing in startup packet.
double free or corruption (out)
Aborted (core dumped)

Analyzing the backtrace, openssl was overwriting heap-allocated data in
our PGconn struct because it thought BIO::ptr was a struct bss_sock_st
*. OpenSSL then called a memset() on a member of that struct, and we
zeroed out data in our PGconn struct.

BIO_get_data(3) says the following:

> These functions are mainly useful when implementing a custom BIO.
>
> The BIO_set_data() function associates the custom data pointed to by ptr
> with the BIO a. This data can subsequently be retrieved via a call to
> BIO_get_data(). This can be used by custom BIOs for storing
> implementation specific information.

If you take a look at my_BIO_s_socket(), we create a partially custom
BIO, but for the most part are defaulting to the methods defined by
BIO_s_socket(). We need to set application-specific data and not BIO
private data, so that the BIO implementation we rely on, can properly
assert that its private data is what it expects.

Co-authored-by: Bo Andreson 
---
 meson.build  |  1 -
 src/backend/libpq/be-secure-openssl.c| 11 +++
 src/include/pg_config.h.in   |  3 ---
 src/interfaces/libpq/fe-secure-openssl.c | 20 +++-
 src/tools/msvc/Solution.pm   |  2 --
 5 files changed, 6 insertions(+), 31 deletions(-)

diff --git a/meson.build b/meson.build
index ee58ee7a06..0095fb183a 100644
--- a/meson.build
+++ b/meson.build
@@ -1285,7 +1285,6 @@ if sslopt in ['auto', 'openssl']
   # doesn't have these OpenSSL 1.1.0 functions. So check for individual
   # functions.
   ['OPENSSL_init_ssl'],
-  ['BIO_get_data'],
   ['BIO_meth_new'],
   ['ASN1_STRING_get0_data'],
   ['HMAC_CTX_new'],
diff --git a/src/backend/libpq/be-secure-openssl.c b/src/backend/libpq/be-secure-openssl.c
index 31b6a6eacd..1b8b32c5b3 100644
--- a/src/backend/libpq/be-secure-openssl.c
+++ b/src/backend/libpq/be-secure-openssl.c
@@ -842,11 +842,6 @@ be_tls_write(Port *port, void *ptr, size_t len, int *waitfor)
  * see sock_read() and sock_write() in OpenSSL's crypto/bio/bss_sock.c.
  */
 
-#ifndef HAVE_BIO_GET_DATA
-#define BIO_get_data(bio) (bio->ptr)
-#define BIO_set_data(bio, data) (bio->ptr = data)
-#endif
-
 static BIO_METHOD *my_bio_methods = NULL;
 
 static int
@@ -856,7 +851,7 @@ my_sock_read(BIO *h, char *buf, int size)
 
 	if (buf != NULL)
 	{
-		res = secure_raw_read(((Port *) BIO_get_data(h)), buf, size);
+		res = secure_raw_read(((Port *) BIO_get_app_data(h)), buf, size);
 		BIO_clear_retry_flags(h);
 		if (res <= 0)
 		{
@@ -876,7 +871,7 @@ my_sock_write(BIO *h, const char *buf, int size)
 {
 	int			res = 0;
 
-	res = secure_raw_write(((Port *) BIO_get_data(h)), buf, size);
+	res = secure_raw_write(((Port *) BIO_get_app_data(h)), buf, size);
 	BIO_clear_retry_flags(h);
 	if (res <= 0)
 	{
@@ -952,7 +947,7 @@ my_SSL_set_fd(Port *port, int fd)
 		SSLerr(SSL_F_SSL

Re: Assert failure on 'list_member_ptr(rel->joininfo, restrictinfo)'

2023-11-27 Thread Andres Freund
On 2023-11-27 11:29:48 +0530, Ashutosh Bapat wrote:
> How do we ensure that we are not making unnecessary copies of Bitmapsets?

We don't - but that's not specific to this patch. Bitmapsets typically aren't
very large, I doubt that it's a significant proportion of the memory
usage. Adding refcounts or such would likely add more overhead than it'd save,
both in time and memory.

I am a bit worried about the maintainability of remove_rel_from_query() et
al. Is there any infrastructure for detecting that some PlannerInfo field that
needs updating wasn't updated?  There's not even a note in PlannerInfo that
documents that that needs to happen.




SSL tests fail on OpenSSL v3.2.0

2023-11-27 Thread Nazir Bilal Yavuz
Hi,

SSL tests fail on OpenSSL v3.2.0. I tested both on macOS (CI) and
Debian (my local machine) and both failed with the same errors. To trigger
these errors on CI, you may need to clear the repository cache;
otherwise macOS won't install v3.2.0 of OpenSSL.

001_ssltests:
psql exited with signal 6 (core dumped): 'psql: error: connection to
server at "127.0.0.1", port 56718 failed: server closed the connection
unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
SSL SYSCALL error: Connection reset by peer' while running 'psql -XAtq
-d sslkey=invalid sslcert=invalid sslrootcert=invalid sslcrl=invalid
sslcrldir=invalid user=ssltestuser dbname=trustdb hostaddr=127.0.0.1
host=common-name.pg-ssltest.test sslrootcert=invalid sslmode=require
-f - -w' at /Users/admin/pgsql/src/test/perl/PostgreSQL/Test/Cluster.pm


002_scram:
psql exited with signal 6 (core dumped): 'psql: error: connection to
server at "127.0.0.1", port 54531 failed: server closed the connection
unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
SSL SYSCALL error: Connection reset by peer' while running 'psql -XAtq
-d dbname=trustdb sslmode=require sslcert=invalid sslrootcert=invalid
hostaddr=127.0.0.1 host=localhost user=ssltestuser -f - -w' at
/Users/admin/pgsql/src/test/perl/PostgreSQL/Test/Cluster.pm line 1997.


003_sslinfo:
psql exited with signal 6 (core dumped): 'psql: error: connection to
server at "127.0.0.1", port 59337 failed: server closed the connection
unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
SSL SYSCALL error: Connection reset by peer' while running 'psql -XAtq
-d sslkey=invalid sslcert=invalid sslrootcert=invalid sslcrl=invalid
sslcrldir=invalid sslrootcert=ssl/root+server_ca.crt sslmode=require
dbname=certdb hostaddr=127.0.0.1 host=localhost user=ssltestuser
sslcert=ssl/client_ext.crt
sslkey=/Users/admin/pgsql/build/testrun/ssl/003_sslinfo/data/tmp_test_q11O/client_ext.key
-f - -w' at /Users/admin/pgsql/src/test/perl/PostgreSQL/Test/Cluster.pm
line 1997.

macOS CI run: https://cirrus-ci.com/task/5128008789393408

I couldn't find the cause yet but just wanted to inform you.

-- 
Regards,
Nazir Bilal Yavuz
Microsoft




Re: New instability in stats regression test

2023-11-27 Thread Andres Freund
Hi,

On 2023-11-27 11:56:19 +0900, Michael Paquier wrote:
> I was ready to argue that we'd better keep this test and keep it close
> to the end of stats.sql while documenting why things are kept in this
> order, but two resets done on the same shared stats type would still
> be prone to race conditions without all the previous activity done in
> the tests (like pg_stat_wal).

I am probably under-caffeinated: What precisely is the potential race? Just
that the timestamps on some system might not be granular enough?

Greetings,

Andres Freund




Re: New instability in stats regression test

2023-11-27 Thread Andres Freund
Hi,

On 2023-11-27 15:49:01 +0530, Bharath Rupireddy wrote:
> On Mon, Nov 27, 2023 at 8:26 AM Michael Paquier  wrote:
> > but two resets done on the same shared stats type would still
> > be prone to race conditions without all the previous activity done in
> > the tests (like pg_stat_wal).
> 
> Can running stats.sql in non-parallel mode help stabilize the tests as-is?

I think that'd be a cure *way* worse than the disease. Having concurrent stats
activity isn't exactly uncommon. And because of checkpoints, autovacuum etc,
you'd still have rare situations of concurrency.

Greetings,

Andres Freund




Re: remaining sql/json patches

2023-11-27 Thread Andres Freund
Hi,

On 2023-11-27 15:06:12 +0100, Alvaro Herrera wrote:
> On 2023-Nov-27, Andrew Dunstan wrote:
>
> > Interesting. But inferring a speed effect from such changes is difficult. I
> > don't have a good idea about measuring parser speed, but a tool to do that
> > would be useful. Amit has made a start on such measurements, but it's only a
> > start. I'd prefer to have evidence rather than speculation.

Yea, the parser table sizes are influenced by the increase in complexity of
the grammar, but it's not a trivial correlation. Bison attempts to compress
the state space and it looks like there are some heuristics involved.


> At this point one thing that IMO we cannot afford to do, is stop feature
> progress work on the name of parser speed.

Agreed - I don't think anyone advocated that though.


> But at some point we'll probably have to fix that by parsing differently (a
> top-down parser, perhaps?  Split the parser in smaller pieces that each deal
> with subsets of the whole thing?)

Yea. Both perhaps. Being able to have sub-grammars would be quite powerful I
think, and we might be able to do it without loosing cross-checking from bison
that our grammar is conflict free.  Even if the resulting combined state space
is larger, better locality should more than make up for that.



> The amount of effort spent on the parsing aspect on this thread seems in
> line with what we should always be doing: keep an eye on it, but not
> disregard the work just because the parser tables have grown.

I think we've, in other threads, not paid enough attention to it and just
added stuff to the grammar in the first way that didn't produce shift/reduce
conflicts... Of course a decent part of the problem here is the SQL standard
that so seems to like adding one-off forms of grammar (yes,
func_expr_common_subexpr, I'm looking at you)...

Greetings,

Andres Freund




Re: Missing docs on AT TIME ZONE precedence?

2023-11-27 Thread Alvaro Herrera
We could do something like this.  Is this good?

I tried to merge WITH and WITHOUT with the precedence class immediately
above, but that failed: the main grammar compiles fine and no tests
fail, but ECPG does fail to compile the sqljson.pgc test, so there's
some problem there.  Now, the ecpg grammar stuff *is* absolute black
magic to me, so I have no idea what to do about that.

(TBH I don't think the added comments really explain the problems fully.
That's most likely because I don't actually understand what the problems
are.)

-- 
Álvaro Herrera   48°01'N 7°57'E  —  https://www.EnterpriseDB.com/
Thou shalt study thy libraries and strive not to reinvent them without
cause, that thy code may be short and readable and thy days pleasant
and productive. (7th Commandment for C Programmers)
>From 3499494625a59e2b5bda85c207b2d0e1181d703d Mon Sep 17 00:00:00 2001
From: Alvaro Herrera 
Date: Mon, 27 Nov 2023 18:01:44 +0100
Subject: [PATCH v1] Merge JSON-related grammar keyword precedence classes

Some of these classes were unnecessary.  Grouping them this way also
helps us document the reason for these to exist at all.
---
 src/backend/parser/gram.y | 20 ++--
 1 file changed, 14 insertions(+), 6 deletions(-)

diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index 8c00b119ec..3060dc88f8 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -821,11 +821,6 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 %nonassoc	BETWEEN IN_P LIKE ILIKE SIMILAR NOT_LA
 %nonassoc	ESCAPE			/* ESCAPE must be just above LIKE/ILIKE/SIMILAR */
 
-/* SQL/JSON related keywords */
-%nonassoc	UNIQUE JSON
-%nonassoc	KEYS OBJECT_P SCALAR VALUE_P
-%nonassoc	WITH WITHOUT
-
 /*
  * To support target_el without AS, it used to be necessary to assign IDENT an
  * explicit precedence just less than Op.  While that's not really necessary
@@ -850,9 +845,22 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
  * an explicit priority lower than '(', so that a rule with CUBE '(' will shift
  * rather than reducing a conflicting rule that takes CUBE as a function name.
  * Using the same precedence as IDENT seems right for the reasons given above.
+ *
+ * We also need to support WITH UNIQUE KEYS and WITHOUT UNIQUE KEYS, where
+ * KEYS is optional and the whole rule is also optional.  We can do this by
+ * putting UNIQUE above KEYS, and that one above WITH and WITHOUT.  We also
+ * have to declare the precedence of the empty production to be that of KEYS.
+ * We arbitrarily choose to put UNIQUE together with UNBOUNDED, and KEYS
+ * together with IDENT, then WITH and WITHOUT in a class of their own.
+ *
+ * For IS JSON {VALUE,OBJECT,SCALAR}, we need to split out the latter three
+ * from JSON; we put JSON with UNBOUNDED and the other three keywords in the
+ * group below.
  */
-%nonassoc	UNBOUNDED		/* ideally would have same precedence as IDENT */
+%nonassoc	UNBOUNDED UNIQUE JSON		/* ideally would have same precedence as IDENT */
 %nonassoc	IDENT PARTITION RANGE ROWS GROUPS PRECEDING FOLLOWING CUBE ROLLUP
+			KEYS OBJECT_P SCALAR VALUE_P
+%nonassoc	WITH WITHOUT
 %left		Op OPERATOR		/* multi-character ops and user-defined operators */
 %left		'+' '-'
 %left		'*' '/' '%'
-- 
2.39.2



Re: walwriter interacts quite badly with synchronous_commit=off

2023-11-27 Thread Andres Freund
On 2023-11-27 17:55:34 +0200, Heikki Linnakangas wrote:
> On 25/10/2023 21:59, Andres Freund wrote:
> > > See attached. It's the same logic as in your patch, just formulatd more
> > > clearly IMHO.
> > Yep, makes sense!
> 
> Pushed this. Thanks for the investigation!

Thanks!




Re: Missing docs on AT TIME ZONE precedence?

2023-11-27 Thread Alvaro Herrera
On 2023-Nov-26, Tom Lane wrote:

> I am, however, feeling a little bit on the warpath about the
> grammar comments for the SQL/JSON keyword precedences:
> 
> /* SQL/JSON related keywords */
> %nonassoc UNIQUE JSON
> %nonassoc KEYS OBJECT_P SCALAR VALUE_P
> %nonassoc WITH WITHOUT
> 
> Every other case where we're doing this has a para of explanation
> in the block comment just below here.  These not only have no
> meaningful explanation, they are in the wrong place --- it looks
> like they are unrelated to the block comment, whereas actually
> (I think) they are another instance of it.  I consider this
> well below project standard.

I introduced those in commit 6ee30209a6f1.  That is the minimal set of
keywords for which the precedence had to be declared that was necessary
so that the grammar would compile for the new feature; I extracted that
from a much larger set that was in the original patch submission.  I
spent a long time trying to figure out whether the block comment
applied, and I wasn't sure, so I ended up leaving the comment at what
you see there.

Looking at it again:

UNIQUE and KEYS are there for "WITH UNIQUE KEYS" (& WITHOUT), where KEYS
is optional and the whole clause is optional in some rules.  So as I
understand it, we need to establish the relative precedence of UNIQUE
(first group), KEYS (second group) and WITH/WITHOUT (third group).
We also have a "%prec KEYS" declaration in the
json_key_uniqueness_constraint_opt rule for this.

We also need a relative precedence between JSON and the set below:
VALUE, OBJECT, SCALAR, for the "IS JSON {VALUE/OBJECT/SCALAR}"
construct.

I put KEYS in the same set as the three above just because it was not a
problem to do so; likewise UNIQUE together with JSON.  (I think it would
also work to put WITH and WITHOUT in the second group, but I only ran
bison to verify this, didn't run any tests.)

I am also not sure if the current location of those three groups (or
two, if we merge those) relative to the rest of the groups below the
large block comment is a good one.  As far as compilability of the
grammar goes, it looks like they could even be at the very bottom of the
precedence list, below the join operators.

-- 
Álvaro Herrera   48°01'N 7°57'E  —  https://www.EnterpriseDB.com/
"¿Qué importan los años?  Lo que realmente importa es comprobar que
a fin de cuentas la mejor edad de la vida es estar vivo"  (Mafalda)




Re: PATCH: Add REINDEX tag to event triggers

2023-11-27 Thread jian he
On Mon, Nov 27, 2023 at 6:58 PM Ajin Cherian  wrote:
>
> I just started reviewing the patch. Some minor comments:
> In patch 0001:
> In standard_ProcessUtility(), since you are unconditionally calling
> ProcessUtilitySlow() in case of T_ReindexStmt, you really don't need
> the case statement for T_ReindexStmt just like all the other commands
> which have event trigger support. It will call ProcessUtilitySlow() as
> default.
>
> In patch 0004:
> No need to duplicate reindex_event_trigger_collect in indexcmds.c
> since it is already present in index.c. Add it to index.h and make the
> function extern so that it can be accessed in both index.c and
> indexcmds.c
>
> regards,
> Ajin Cherian
> Fujitsu Australia

Thanks for reviewing it!
The above 2 suggestions are good, now the code is more neat.
From 6a4f1bee35867565b5f65902db642a0d111be6e6 Mon Sep 17 00:00:00 2001
From: pgaddict 
Date: Mon, 27 Nov 2023 23:43:01 +0800
Subject: [PATCH v6 1/4] make event triggers facility react to ReindexStmt

Move ReindexStmt from standard_ProcessUtility to ProcessUtilitySlow.
By default ProcessUtilitySlow will call the event-trigger-related functions,
so the event trigger facility is able to support REINDEX statements.
---
 src/backend/tcop/utility.c| 10 ++
 src/include/tcop/cmdtaglist.h |  2 +-
 2 files changed, 7 insertions(+), 5 deletions(-)

diff --git a/src/backend/tcop/utility.c b/src/backend/tcop/utility.c
index e3ccf6c7..37161eda 100644
--- a/src/backend/tcop/utility.c
+++ b/src/backend/tcop/utility.c
@@ -960,10 +960,6 @@ standard_ProcessUtility(PlannedStmt *pstmt,
 			  (RecoveryInProgress() ? 0 : CHECKPOINT_FORCE));
 			break;
 
-		case T_ReindexStmt:
-			ExecReindex(pstate, (ReindexStmt *) parsetree, isTopLevel);
-			break;
-
 			/*
 			 * The following statements are supported by Event Triggers only
 			 * in some cases, so we "fast path" them in the other cases.
@@ -1574,6 +1570,12 @@ ProcessUtilitySlow(ParseState *pstate,
 }
 break;
 
+			case T_ReindexStmt:
+ExecReindex(pstate, (ReindexStmt *) parsetree, isTopLevel);
+/* commandCollected is set deep inside ExecReindex */
+commandCollected = true;
+break;
+
 			case T_CreateExtensionStmt:
 address = CreateExtension(pstate, (CreateExtensionStmt *) parsetree);
 break;
diff --git a/src/include/tcop/cmdtaglist.h b/src/include/tcop/cmdtaglist.h
index 553a3187..320ee915 100644
--- a/src/include/tcop/cmdtaglist.h
+++ b/src/include/tcop/cmdtaglist.h
@@ -194,7 +194,7 @@ PG_CMDTAG(CMDTAG_PREPARE, "PREPARE", false, false, false)
 PG_CMDTAG(CMDTAG_PREPARE_TRANSACTION, "PREPARE TRANSACTION", false, false, false)
 PG_CMDTAG(CMDTAG_REASSIGN_OWNED, "REASSIGN OWNED", false, false, false)
 PG_CMDTAG(CMDTAG_REFRESH_MATERIALIZED_VIEW, "REFRESH MATERIALIZED VIEW", true, false, false)
-PG_CMDTAG(CMDTAG_REINDEX, "REINDEX", false, false, false)
+PG_CMDTAG(CMDTAG_REINDEX, "REINDEX", true, false, false)
 PG_CMDTAG(CMDTAG_RELEASE, "RELEASE", false, false, false)
 PG_CMDTAG(CMDTAG_RESET, "RESET", false, false, false)
 PG_CMDTAG(CMDTAG_REVOKE, "REVOKE", true, false, false)
-- 
2.34.1

From 141b176ba6226e61698a08e0c429d7ba61939e7b Mon Sep 17 00:00:00 2001
From: pgaddict 
Date: Fri, 24 Nov 2023 20:44:44 +0800
Subject: [PATCH v6 2/4] tag ReindexStmt as ddl in GetCommandLogLevel.

In GetCommandLogLevel, change T_ReindexStmt from lev = LOGSTMT_ALL to lev = LOGSTMT_DDL.
So log_statement >= 'ddl' will cause REINDEX statements to be logged.
---
 src/backend/tcop/utility.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/backend/tcop/utility.c b/src/backend/tcop/utility.c
index 37161eda..e115b0ca 100644
--- a/src/backend/tcop/utility.c
+++ b/src/backend/tcop/utility.c
@@ -3622,7 +3622,7 @@ GetCommandLogLevel(Node *parsetree)
 			break;
 
 		case T_ReindexStmt:
-			lev = LOGSTMT_ALL;	/* should this be DDL? */
+			lev = LOGSTMT_DDL;
 			break;
 
 		case T_CreateConversionStmt:
-- 
2.34.1

From f6a660d14b3649f046c2d54c810ccf28f51f65db Mon Sep 17 00:00:00 2001
From: pgaddict 
Date: Tue, 28 Nov 2023 00:05:50 +0800
Subject: [PATCH v6 3/4] refactor multiple functions to support tracking
 reindex using event trigger.

---
 src/backend/catalog/index.c  |  8 ++---
 src/backend/commands/cluster.c   |  2 +-
 src/backend/commands/indexcmds.c | 60 +---
 src/backend/commands/tablecmds.c |  2 +-
 src/include/catalog/index.h  |  4 +--
 5 files changed, 40 insertions(+), 36 deletions(-)

diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index 143fae01..88ff8c5a 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -3558,7 +3558,7 @@ IndexGetRelation(Oid indexId, bool missing_ok)
  * reindex_index - This routine is used to recreate a single index
  */
 void
-reindex_index(Oid indexId, bool skip_constraint_checks, char persistence,
+reindex_index(const ReindexStmt *stmt, Oid indexId, bool skip_constraint_checks, char persistence,
 			  const ReindexParams *params)

Re: Improve tab completion for ALTER DEFAULT PRIVILEGE and ALTER TABLE

2023-11-27 Thread vignesh C
On Fri, 24 Nov 2023 at 18:37, Shubham Khanna
 wrote:
>
> n Fri, Nov 24, 2023 at 6:33 PM vignesh C  wrote:
> >
> > Hi,
> >
> > Improved tab completion for "ALTER DEFAULT PRIVILEGE" and "ALTER TABLE":
> > 1) GRANT, REVOKE and FOR USER keyword was not displayed in tab
> > completion of alter default privileges like the below statement:
> > ALTER DEFAULT PRIVILEGES GRANT INSERT ON tables TO PUBLIC;
> > ALTER DEFAULT PRIVILEGES REVOKE INSERT ON tables FROM PUBLIC;
> > ALTER DEFAULT PRIVILEGES FOR USER vignesh revoke INSERT ON tables FROM dba1;
> >
> > 2) USER was not displayed for "ALTER DEFAULT PRIVILEGES IN SCHEMA
> > public FOR " like in below statement:
> > ALTER DEFAULT PRIVILEGES IN SCHEMA public FOR USER dba1 GRANT INSERT
> > ON TABLES TO PUBLIC;
> >
> > 3) "FOR GRANT OPTION" was not display for "ALTER DEFAULT PRIVILEGES
> > REVOKE " like in below statement:
> > alter default privileges revoke grant option for select ON tables FROM dba1;
> >
> > 4) "DATA TYPE" was missing in "ALTER TABLE table-name ALTER COLUMN
> > column-name SET" like in:
> > ALTER TABLE t1 ALTER COLUMN c1 SET DATA TYPE text;
> >
> > Attached patch has the changes for the same.
>
> +COMPLETE_WITH("ROLE", "USER");
> +  /* ALTER DEFAULT PRIVILEGES REVOKE */
> +  else if (Matches("ALTER", "DEFAULT", "PRIVILEGES", "REVOKE"))
> +COMPLETE_WITH("SELECT", "INSERT", "UPDATE", "DELETE", "TRUNCATE",
> +    "REFERENCES", "TRIGGER", "CREATE", "EXECUTE", "USAGE",
> +    "MAINTAIN", "ALL", "GRANT OPTION FOR");
>
> I could not find "alter default privileges revoke maintain", should
> this be removed?

This was reverted as part of:
151c22deee66a3390ca9a1c3675e29de54ae73fc.
Revert MAINTAIN privilege and pg_maintain predefined role.

This reverts the following commits: 4dbdb82513, c2122aae63,
5b1a879943, 9e1e9d6560, ff9618e82a, 60684dd834, 4441fc704d,
and b5d6382496.  A role with the MAINTAIN privilege may be able to
use search_path tricks to escalate privileges to the table owner.
Unfortunately, it is too late in the v16 development cycle to apply
the proposed fix, i.e., restricting search_path when running
maintenance commands.

The attached v2 version has the changes for the same.

Regards,
Vignesh
From 511c14a51ba012400b8b849813d16c0a591666d0 Mon Sep 17 00:00:00 2001
From: Vignesh C 
Date: Sun, 2 Jul 2023 19:53:50 +0530
Subject: [PATCH v2 2/2] Fix missing tab completion in "ALTER TABLE table-name
 ALTER COLUMN column-name SET"

"DATA TYPE" was missing in "ALTER TABLE table-name ALTER COLUMN
column-name SET" lke in:
ALTER TABLE t1 ALTER COLUMN c1 SET DATA TYPE text;
---
 src/bin/psql/tab-complete.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/bin/psql/tab-complete.c b/src/bin/psql/tab-complete.c
index 8fd639d102..737d253805 100644
--- a/src/bin/psql/tab-complete.c
+++ b/src/bin/psql/tab-complete.c
@@ -2488,7 +2488,7 @@ psql_completion(const char *text, int start, int end)
 	/* ALTER TABLE ALTER [COLUMN]  SET */
 	else if (Matches("ALTER", "TABLE", MatchAny, "ALTER", "COLUMN", MatchAny, "SET") ||
 			 Matches("ALTER", "TABLE", MatchAny, "ALTER", MatchAny, "SET"))
-		COMPLETE_WITH("(", "COMPRESSION", "DEFAULT", "GENERATED", "NOT NULL", "STATISTICS", "STORAGE",
+		COMPLETE_WITH("(", "COMPRESSION", "DATA TYPE", "DEFAULT", "GENERATED", "NOT NULL", "STATISTICS", "STORAGE",
 		/* a subset of ALTER SEQUENCE options */
 	  "INCREMENT", "MINVALUE", "MAXVALUE", "START", "NO", "CACHE", "CYCLE");
 	/* ALTER TABLE ALTER [COLUMN]  SET ( */
-- 
2.34.1

From f5839d1eadba5019e760e361e545c4a778774a3e Mon Sep 17 00:00:00 2001
From: Vignesh C 
Date: Sun, 2 Jul 2023 19:35:46 +0530
Subject: [PATCH v2 1/2] Fix missing tab completion in "ALTER DEFAULT
 PRIVILEGES"

GRANT, REVOKE and FOR USER keywords were not displayed in tab completion of alter default privileges, as in the statements below:
ALTER DEFAULT PRIVILEGES GRANT INSERT ON tables TO PUBLIC;
ALTER DEFAULT PRIVILEGES REVOKE INSERT ON tables FROM PUBLIC;
ALTER DEFAULT PRIVILEGES FOR USER vignesh revoke INSERT ON tables FROM vignesh;

USER was not displayed for "ALTER DEFAULT PRIVILEGES IN SCHEMA public FOR ", as in the statement below:
ALTER DEFAULT PRIVILEGES IN SCHEMA public FOR USER vignesh GRANT INSERT ON TABLES TO PUBLIC;

"FOR GRANT OPTION" was not display for "ALTER DEFAULT PRIVILEGES REVOKE " like in below statement:
alter default privileges revoke grant option for select ON tables FROM testdba;
---
 src/bin/psql/tab-complete.c | 13 +
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/src/bin/psql/tab-complete.c b/src/bin/psql/tab-complete.c
index 006e10f5d2..8fd639d102 100644
--- a/src/bin/psql/tab-complete.c
+++ b/src/bin/psql/tab-complete.c
@@ -2137,10 +2137,15 @@ psql_completion(const char *text, int start, int end)
 
 	/* ALTER DEFAULT PRIVILEGES */
 	else if (Matches("ALTER", "DEFAULT", "PRIVILEGES"))
-		COMPLETE_WITH("FOR ROLE", "IN SCHEMA");
+		COMPLETE_WITH("FOR", "GRANT", "IN SCHEMA", "REVOKE");
 	/* ALTER DEFAULT PRIVILEGES FOR */

Re: walwriter interacts quite badly with synchronous_commit=off

2023-11-27 Thread Heikki Linnakangas

On 25/10/2023 21:59, Andres Freund wrote:

See attached. It's the same logic as in your patch, just formulatd more
clearly IMHO.

Yep, makes sense!


Pushed this. Thanks for the investigation!

--
Heikki Linnakangas
Neon (https://neon.tech)





Re: brininsert optimization opportunity

2023-11-27 Thread Tomas Vondra
On 11/27/23 11:34, Tomas Vondra wrote:
> 
> 
> On 11/27/23 08:37, Richard Guo wrote:
>>
>> On Mon, Nov 27, 2023 at 1:53 PM Soumyadeep Chakraborty wrote:
>>
>> On Sun, Nov 26, 2023 at 9:28 PM Richard Guo wrote:
>> > It seems that we have an oversight in this commit.  If there is no tuple
>> > that has been inserted, we wouldn't have an available insert state in
>> > the clean up phase.  So the Assert in brininsertcleanup() is not always
>> > right.  For example:
>> >
>> > regression=# update brin_summarize set value = brin_summarize.value;
>> > server closed the connection unexpectedly
>>
>> I wasn't able to repro the issue on
>> 86b64bafc19c4c60136a4038d2a8d1e6eecc59f2.
>> with UPDATE/INSERT:
>>
>> This could be because since c5b7ba4e67aeb5d6f824b74f94114d99ed6e42b7,
>> we have moved ExecOpenIndices() from ExecInitModifyTable() to
>> ExecInsert(). Since we never open the indices if nothing is inserted,
>> we would never attempt to close them with ExecCloseIndices() while the
>> ii_AmCache is NULL (which is what causes this assertion failure).
>>
>>
>> AFAICS we would also open the indices from ExecUpdate().  So if we
>> update the table in a way that no new tuples are inserted, we will have
>> this issue.  As I showed previously, the query below crashes for me on
>> latest master (dc9f8a7983).
>>
>> regression=# update brin_summarize set value = brin_summarize.value;
>> server closed the connection unexpectedly
>>
>> There are other code paths that call ExecOpenIndices(), such as 
>> ExecMerge().  I believe it's not hard to create queries that trigger
>> this Assert for those cases.
>>
> 
> FWIW I can readily reproduce it like this:
> 
>   drop table t;
>   create table t (a int);
>   insert into t values (1);
>   create index on t using brin (a);
>   update t set a = a;
> 
> I however wonder if maybe we should do the check in index_insert_cleanup
> and not in the AM callback. That seems simpler / better, because the AM
> callbacks then can't make this mistake.
> 

I did it this way (check in index_insert_cleanup) and pushed. Thanks for
the report!
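
For the archives, the shape of that guard is presumably something like
this (a sketch; the committed code may differ in detail):

void
index_insert_cleanup(Relation indexRelation, IndexInfo *indexInfo)
{
	RELATION_CHECKS;
	Assert(indexInfo);

	/*
	 * Only call the AM's cleanup callback if an insert state was actually
	 * built; ii_AmCache stays NULL when no tuple was ever inserted.
	 */
	if (indexRelation->rd_indam->aminsertcleanup && indexInfo->ii_AmCache)
		indexRelation->rd_indam->aminsertcleanup(indexInfo);
}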

regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Add semi-join pushdown to postgres_fdw

2023-11-27 Thread Alexander Pyhalov

Alexander Korotkov wrote on 2023-11-27 03:49:


Thank you for the revision.

I've revised the patch myself.  I've replaced StringInfo with
additional conds into a list of strings as I proposed before.  I think
the code became much clearer.  Also, it gets rid of some unnecessary
allocations.

I think the code itself is not in bad shape.  But patch lacks some
high-level description of semi-joins processing as well as comments on
each manipulation with additional conds.  Could you please add this?



Hi. The updated patch looks better. It seems I had failed to fix the logic in
deparseFromExprForRel() when I tried to convert StringInfos to Lists.


I've added some comments. The most complete description of how SEMI-JOIN
is processed is located in deparseFromExprForRel(). Unfortunately,
there seems to be no single place describing the current JOIN deparsing
logic.
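
To illustrate what the patch produces (hypothetical table names; ft1/ft2
are foreign tables over remote t1/t2):

-- Local query with a semi-join between two foreign tables:
SELECT * FROM ft1 WHERE x IN (SELECT x FROM ft2);

-- Deparsed remote query: the semi-join becomes an EXISTS condition,
-- collected in additional_conds and emitted by appendWhereClause():
SELECT ... FROM t1 r1 WHERE EXISTS
  (SELECT NULL FROM t2 r2 WHERE r2.x = r1.x);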


--
Best regards,
Alexander Pyhalov,
Postgres ProfessionalFrom c17e05d09d5721d22785ed11bed053162d67d967 Mon Sep 17 00:00:00 2001
From: Alexander Pyhalov 
Date: Mon, 27 Nov 2023 14:35:29 +0300
Subject: [PATCH] postgres_fdw: add support for deparsing semi joins

---
 contrib/postgres_fdw/deparse.c| 234 ++---
 .../postgres_fdw/expected/postgres_fdw.out| 320 --
 contrib/postgres_fdw/postgres_fdw.c   |  94 -
 contrib/postgres_fdw/postgres_fdw.h   |   3 +
 contrib/postgres_fdw/sql/postgres_fdw.sql | 126 ++-
 5 files changed, 696 insertions(+), 81 deletions(-)

diff --git a/contrib/postgres_fdw/deparse.c b/contrib/postgres_fdw/deparse.c
index 09fd489a901..8670524578b 100644
--- a/contrib/postgres_fdw/deparse.c
+++ b/contrib/postgres_fdw/deparse.c
@@ -180,11 +180,15 @@ static void appendConditions(List *exprs, deparse_expr_cxt *context);
 static void deparseFromExprForRel(StringInfo buf, PlannerInfo *root,
   RelOptInfo *foreignrel, bool use_alias,
   Index ignore_rel, List **ignore_conds,
+  List **additional_conds,
   List **params_list);
+static void appendWhereClause(List *exprs, List *additional_conds,
+			  deparse_expr_cxt *context);
 static void deparseFromExpr(List *quals, deparse_expr_cxt *context);
 static void deparseRangeTblRef(StringInfo buf, PlannerInfo *root,
 			   RelOptInfo *foreignrel, bool make_subquery,
-			   Index ignore_rel, List **ignore_conds, List **params_list);
+			   Index ignore_rel, List **ignore_conds,
+			   List **additional_conds, List **params_list);
 static void deparseAggref(Aggref *node, deparse_expr_cxt *context);
 static void appendGroupByClause(List *tlist, deparse_expr_cxt *context);
 static void appendOrderBySuffix(Oid sortop, Oid sortcoltype, bool nulls_first,
@@ -1370,6 +1374,7 @@ deparseFromExpr(List *quals, deparse_expr_cxt *context)
 {
 	StringInfo	buf = context->buf;
 	RelOptInfo *scanrel = context->scanrel;
+	List	   *additional_conds = NIL;
 
 	/* For upper relations, scanrel must be either a joinrel or a baserel */
 	Assert(!IS_UPPER_REL(context->foreignrel) ||
@@ -1379,14 +1384,11 @@ deparseFromExpr(List *quals, deparse_expr_cxt *context)
 	appendStringInfoString(buf, " FROM ");
 	deparseFromExprForRel(buf, context->root, scanrel,
 		  (bms_membership(scanrel->relids) == BMS_MULTIPLE),
-		  (Index) 0, NULL, context->params_list);
-
-	/* Construct WHERE clause */
-	if (quals != NIL)
-	{
-		appendStringInfoString(buf, " WHERE ");
-		appendConditions(quals, context);
-	}
+		  (Index) 0, NULL, &additional_conds,
+		  context->params_list);
+	appendWhereClause(quals, additional_conds, context);
+	if (additional_conds != NIL)
+		list_free_deep(additional_conds);
 }
 
 /*
@@ -1598,6 +1600,42 @@ appendConditions(List *exprs, deparse_expr_cxt *context)
 	reset_transmission_modes(nestlevel);
 }
 
+/*
+ * Append WHERE clause, containing conditions
+ * from exprs and additional_conds, to context->buf.
+ */
+static void
+appendWhereClause(List *exprs, List *additional_conds, deparse_expr_cxt *context)
+{
+	StringInfo	buf = context->buf;
+	bool		need_and = false;
+	ListCell   *lc;
+
+	if (exprs != NIL || additional_conds != NIL)
+		appendStringInfoString(buf, " WHERE ");
+
+	/*
+	 * If there are some filters, append them.
+	 */
+	if (exprs != NIL)
+	{
+		appendConditions(exprs, context);
+		need_and = true;
+	}
+
+	/*
+	 * If there are some EXISTS conditions, coming from SEMI-JOINS, append
+	 * them.
+	 */
+	foreach(lc, additional_conds)
+	{
+		if (need_and)
+			appendStringInfoString(buf, " AND ");
+		appendStringInfoString(buf, (char *) lfirst(lc));
+		need_and = true;
+	}
+}
+
 /* Output join name for given join type */
 const char *
 get_jointype_name(JoinType jointype)
@@ -1616,6 +1654,9 @@ get_jointype_name(JoinType jointype)
 		case JOIN_FULL:
 			return "FULL";
 
+		case JOIN_SEMI:
+			return "SEMI";
+
 		default:
 			/* Shouldn't come here, but protect from buggy code. */
 			elog(ERROR, "unsupported join type %d", jointype);
@@ -1712,11 +1753,14 @@ deparseSubqueryTargetList(

Re: Questions regarding Index AMs and natural ordering

2023-11-27 Thread Matthias van de Meent
On Fri, 24 Nov 2023, 19:58 Tom Lane,  wrote:
>
> Peter Geoghegan  writes:
> > On Fri, Nov 24, 2023 at 8:44 AM Matthias van de Meent
> >  wrote:
> >> Yes, the part where btree opclasses determine a type's ordering is
> >> clear. But what I'm looking for is "how do I, as an index AM
> >> implementation, get the signal that I need to return column-ordered
> >> data?" If that signal is "index AM marked amcanorder == index must
> >> return ordered data", then that's suboptimal for the index AM writer,
> >> but understandable. If amcanorder does not imply always ordered
> >> retrieval, then I'd like to know how it is signaled to the AM. But as
> >> of yet I've not found a clear and conclusive answer either way.
>
> > I suppose that amcanorder=true cannot mean that, since we have the
> > SAOP path key thing (at least for now).
>
> As things stand, amcanorder definitely means that the index always
> returns ordered data, since the planner will unconditionally apply
> pathkeys to the indexscan Paths generated for it (see plancat.c's
> get_relation_info which sets up info->sortopfamily, and
> build_index_pathkeys which uses that).  We could reconsider that
> definition if there were a reason to, but so far it hasn't seemed
> interesting.

For GIST there is now a case for improving the support for optionally
ordered retrieval, as there is a patch that tries to hack ORDER BY
support into GIST. Right now that patch applies (what I consider) a
hack: it implicitly adds an operator ordering clause for each ORDER BY
column, using the column type's btree ordering operator. With improved
APIs it shouldn't need such a hacky approach.

> The hack in 807a40c5 is a hack, without a doubt, but
> that doesn't necessarily mean we should spend time on generalizing it,
> and even less that we should have done so in 2012.  Maybe by now there
> are non-core index AMs that have cases where it's worth being pickier.
> We'd have to invent some API that allows the index AM to have a say in
> what pathkeys get applied.

I think that would be quite useful, as it would allow indexes to
return ordered results in orders other than the defined key order, and
it would allow e.g. BRIN to run its sort for ordered retrieval inside
the index scan node (rather than requiring its own sort node type).
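
To spell out how little the AM currently gets to say: the whole signal
today is one boolean in the handler. A sketch for a hypothetical "myam"
(everything but the two ordering-related flags elided):

#include "postgres.h"

#include "access/amapi.h"
#include "fmgr.h"

PG_MODULE_MAGIC;

PG_FUNCTION_INFO_V1(myam_handler);

Datum
myam_handler(PG_FUNCTION_ARGS)
{
	IndexAmRoutine *amroutine = makeNode(IndexAmRoutine);

	/*
	 * This is the whole signal: with amcanorder = true the planner
	 * unconditionally builds pathkeys for scans of this index, i.e. the
	 * AM promises key-ordered output for every scan.
	 */
	amroutine->amcanorder = true;

	/*
	 * amcanorderbyop only covers ORDER BY <col> <op> <const> ordering
	 * (the KNN-GiST distance case), not plain column order.
	 */
	amroutine->amcanorderbyop = false;

	/* all other flags and support callbacks elided for brevity */
	amroutine->amgettuple = NULL;	/* a real AM would set its gettuple here */

	PG_RETURN_POINTER(amroutine);
}

An AM-side veto (or per-pathkey negotiation) would have to come in as a
new field here, which is exactly the kind of API invention Tom mentions.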

CC: Tomas, maybe you have some ideas about this as well? What was the
reason for moving BRIN-assisted sort into its own node? Was there more
to it than "BRIN currently doesn't have amgettuple, and amgettuple
can't always be used"?

Kind regards,

Matthias van de Meent




Re: pgoutput incorrectly replaces missing values with NULL since PostgreSQL 15

2023-11-27 Thread Nikhil Benesch
Thank you for turning this around so quickly!




Re: Synchronizing slots from primary to standby

2023-11-27 Thread Drouvot, Bertrand

Hi,

On 11/27/23 1:23 PM, Zhijie Hou (Fujitsu) wrote:
> On Monday, November 27, 2023 8:05 PM Drouvot, Bertrand  wrote:
>> Hi,
>>
>> On 11/6/23 2:30 AM, Zhijie Hou (Fujitsu) wrote:
>>> On Friday, November 3, 2023 7:32 PM Amit Kapila wrote:
>>>>
>>>> I don't see a corresponding change in repl_gram.y. I think the
>>>> following part of the code needs to be changed:
>>>> /* CREATE_REPLICATION_SLOT slot [TEMPORARY] LOGICAL plugin [options] */
>>>> | K_CREATE_REPLICATION_SLOT IDENT opt_temporary K_LOGICAL IDENT
>>>> create_slot_options
>>>
>>> I think after 0266e98, we started to use the new syntax (see the
>>> generic_option_list rule) and we can avoid changing the repl_gram.y
>>> when adding new options. The new failover option can be detected when
>>> parsing the generic option list (in parseCreateReplSlotOptions).
>>
>> Did not look in details but it looks like there is more to do here as
>> this is failing (with v39_2):
>>
>> "
>> postgres@primary: psql replication=database
>> psql (17devel)
>> Type "help" for help.
>>
>> postgres=# CREATE_REPLICATION_SLOT test_logical20 LOGICAL pgoutput FAILOVER;
>> ERROR:  syntax error
>
> I think the command you executed is of old syntax style, which was kept for
> compatibility with older releases. And I think we can avoid supporting the
> new option for the old syntax, as described in the original thread[1] of
> commit 0266e98. So, the "syntax error" is as expected IIUC.
>
> The new style command is like:
> CREATE_REPLICATION_SLOT test_logical20 LOGICAL pgoutput (FAILOVER);



As we are not going to support the old syntax for the FAILOVER option,
I think we can get rid of the check on "use_new_options_syntax" here:

-
+   if (failover)
+   {
+   appendStringInfoString(&cmd, "FAILOVER");
+   if (use_new_options_syntax)
+   appendStringInfoString(&cmd, ", ");
+   else
+   appendStringInfoChar(&cmd, ' ');
+   }

as we'd error out before if using the old syntax.
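
That is, something along these lines (untested sketch):

+	if (failover)
+		appendStringInfoString(&cmd, "FAILOVER, ");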

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: remaining sql/json patches

2023-11-27 Thread Alvaro Herrera
On 2023-Nov-27, Andrew Dunstan wrote:

> Interesting. But inferring a speed effect from such changes is difficult. I
> don't have a good idea about measuring parser speed, but a tool to do that
> would be useful. Amit has made a start on such measurements, but it's only a
> start. I'd prefer to have evidence rather than speculation.

At this point one thing that IMO we cannot afford to do is stop feature
progress work in the name of parser speed.  I mean, parser speed is
important, and we need to be mindful that what we add is reasonable.
But at some point we'll probably have to fix that by parsing
differently (a top-down parser, perhaps?  Split the parser in smaller
pieces that each deal with subsets of the whole thing?)

Peter told me earlier today that he noticed that while the parser
changes he proposed make the parser source code smaller, they result in
larger parser tables (in terms of the number of states, I think he
said).  But source code maintainability is also very important, so my
suggestion would be that those changes be absorbed into Amit's commits
nonetheless.

The amount of effort spent on the parsing aspect on this thread seems in
line with what we should always be doing: keep an eye on it, but not
disregard the work just because the parser tables have grown.

-- 
Álvaro Herrera   48°01'N 7°57'E  —  https://www.EnterpriseDB.com/
"La persona que no quería pecar / estaba obligada a sentarse
 en duras y empinadas sillas/ desprovistas, por cierto
 de blandos atenuantes"  (Patricio Vogel)




Re: Random pg_upgrade test failure on drongo

2023-11-27 Thread Andrew Dunstan



On 2023-11-27 Mo 07:39, Andrew Dunstan wrote:
>
> On 2023-11-27 Mo 07:00, Alexander Lakhin wrote:
>> Hello Kuroda-san,
>>
>> 25.11.2023 18:19, Hayato Kuroda (Fujitsu) wrote:
>>> Thanks for attaching a program. This helps us to understand the issue.
>>> I wanted to confirm your env - this failure occurred on Windows
>>> Server, right?
>>
>> I see that behavior on:
>> Windows 10 Version 1607 (OS Build 14393.0)
>> Windows Server 2016 Version 1607 (OS Build 14393.0)
>> Windows Server 2019 Version 1809 (OS Build 17763.1)
>>
>> But it's not reproduced on:
>> Windows 10 Version 1809 (OS Build 17763.1) (triple-checked)
>> Windows Server 2019 Version 1809 (OS Build 17763.592)
>> Windows 10 Version 22H2 (OS Build 19045.3693)
>> Windows 11 Version 21H2 (OS Build 22000.613)
>>
>> So it looks like the failure occurs depending not on the Windows
>> edition, but rather on its build. For Windows Server 2019 the "good"
>> build is somewhere between 17763.1 and 17763.592, but for Windows 10
>> it's between 14393.0 and 17763.1.
>> (Maybe there was some change related to FILE_DISPOSITION_POSIX_SEMANTICS/
>> FILE_DISPOSITION_ON_CLOSE implementation; I don't know where to find
>> information about that change.)
>>
>> It would also be interesting to know the full version/build of the OS
>> on drongo and fairywren.
>
> It's WS 2019 1809/17763.4252. The latest available AFAICT is 17763.5122





I've updated it to 17763.5122 now.
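
In case anyone wants a self-contained probe for the build dependency,
something like the sketch below should exercise the POSIX-semantics
delete path (my own reconstruction, not Alexander's actual program;
needs a Windows 10 RS1+ SDK for FileDispositionInfoEx):

#include <windows.h>
#include <stdio.h>

int
wmain(int argc, wchar_t **argv)
{
	HANDLE		h;
	FILE_DISPOSITION_INFO_EX info = {0};

	if (argc != 2)
	{
		fwprintf(stderr, L"usage: %ls <path>\n", argv[0]);
		return 2;
	}

	/* we only need DELETE access to set the disposition */
	h = CreateFileW(argv[1], DELETE,
					FILE_SHARE_READ | FILE_SHARE_WRITE | FILE_SHARE_DELETE,
					NULL, OPEN_EXISTING, 0, NULL);
	if (h == INVALID_HANDLE_VALUE)
	{
		fwprintf(stderr, L"CreateFileW failed: %lu\n", GetLastError());
		return 1;
	}

	/* ask for the name to disappear immediately, POSIX-style */
	info.Flags = FILE_DISPOSITION_FLAG_DELETE |
		FILE_DISPOSITION_FLAG_POSIX_SEMANTICS;
	if (!SetFileInformationByHandle(h, FileDispositionInfoEx,
									&info, sizeof(info)))
	{
		fwprintf(stderr, L"SetFileInformationByHandle failed: %lu\n",
				 GetLastError());
		CloseHandle(h);
		return 1;
	}

	CloseHandle(h);
	return 0;
}

The interesting part on the affected builds is whether another process
can still open the file by name right after SetFileInformationByHandle()
succeeds, while this handle is kept open.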


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: remaining sql/json patches

2023-11-27 Thread Andrew Dunstan



On 2023-11-27 Mo 05:42, Alvaro Herrera wrote:
> On 2023-Nov-27, Amit Langote wrote:
>
>> For example, the jump between 13 and 14 looks worse.
>> (I do wonder what happened there.)
>>
>> The following commit sounds like it might be related?
>
> Yes, but not only that one.  I did some more trolling in the commit log
> for the 14 timeframe further and found that the following commits are
> the ones with highest additions to YYLAST during that cycle:
>
>  yylast │ yylast_addition │   commit   │ subject
> ────────┼─────────────────┼────────────┼─────────
>  106051 │            1883 │ 92bf7e2d02 │ Provide the OR REPLACE option for CREATE TRIGGER.
>  105325 │            1869 │ 06a7c3154f │ Allow most keywords to be used as column labels without requiring AS.
>  104395 │            1816 │ 45b9805706 │ Allow CURRENT_ROLE where CURRENT_USER is accepted
>  107537 │            1139 │ a4d75c86bf │ Extended statistics on expressions
>  105410 │            1067 │ b5913f6120 │ Refactor CLUSTER and REINDEX grammar to use DefElem for option lists
>  106007 │             965 │ 3696a600e2 │ SEARCH and CYCLE clauses
>  106864 │             733 │ be45be9c33 │ Implement GROUP BY DISTINCT
>  105886 │             609 │ 844fe9f159 │ Add the ability for the core grammar to have more than one parse target.
>  108400 │             571 │ ec48314708 │ Revert per-index collation version tracking feature.
>  108939 │             539 │ e6241d8e03 │ Rethink definition of pg_attribute.attcompression.
>
> but we also have these:
>
>  105521 │            -530 │ 926fa801ac │ Remove undocumented IS [NOT] OF syntax.
>  104202 │            -640 │ c4325cefba │ Fold AlterForeignTableStmt into AlterTableStmt
>  104168 │            -718 │ 40c24bfef9 │ Improve our ability to regurgitate SQL-syntax function calls.
>  108111 │            -828 │ e56bce5d43 │ Reconsider the handling of procedure OUT parameters.
>  106398 │            -834 │ 71f4c8c6f7 │ ALTER TABLE ... DETACH PARTITION ... CONCURRENTLY
>  104402 │            -923 │ 2453ea1422 │ Support for OUT parameters in procedures
>  103456 │            -939 │ 1ed6b89563 │ Remove support for postfix (right-unary) operators.
>  104343 │           -1178 │ 873ea9ee69 │ Refactor parsing rules for option lists of EXPLAIN, VACUUM and ANALYZE
>  102784 │           -1417 │ 8f5b596744 │ Refactor AlterExtensionContentsStmt grammar
> (59 rows)



Interesting. But inferring a speed effect from such changes is 
difficult. I don't have a good idea about measuring parser speed, but a 
tool to do that would be useful. Amit has made a start on such 
measurements, but it's only a start. I'd prefer to have evidence rather 
than speculation.
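
Something as small as the sketch below might be enough to get ballpark
numbers — assuming a trivial loadable C function (names invented here)
that just loops over raw_parser():

#include "postgres.h"

#include "fmgr.h"
#include "parser/parser.h"
#include "portability/instr_time.h"
#include "utils/builtins.h"

PG_MODULE_MAGIC;

PG_FUNCTION_INFO_V1(parse_bench);

/* parse_bench(query text, iterations int) returns milliseconds spent */
Datum
parse_bench(PG_FUNCTION_ARGS)
{
	char	   *query = text_to_cstring(PG_GETARG_TEXT_PP(0));
	int32		iters = PG_GETARG_INT32(1);
	instr_time	start,
				duration;

	INSTR_TIME_SET_CURRENT(start);
	for (int32 i = 0; i < iters; i++)
		(void) raw_parser(query, RAW_PARSE_DEFAULT);
	INSTR_TIME_SET_CURRENT(duration);
	INSTR_TIME_SUBTRACT(duration, start);

	/* the parse trees leak into the calling context; fine for a test run */
	PG_RETURN_FLOAT8(INSTR_TIME_GET_MILLISEC(duration));
}

Declared with CREATE FUNCTION parse_bench(text, int) RETURNS float8 ...
LANGUAGE C STRICT, that would let us time the same statement across
branches and collect evidence rather than speculation.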



cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: logical decoding and replication of sequences, take 2

2023-11-27 Thread Tomas Vondra
On 11/27/23 13:08, Hayato Kuroda (Fujitsu) wrote:
> Dear Amit, Tomas,
> 

>>>> I am wondering that instead of building the infrastructure to know
>>>> whether a particular change is transactional on the decoding side,
>>>> can't we have some flag in the WAL record to note whether the change
>>>> is transactional or not? I have discussed this point with my colleague
>>>> Kuroda-San and we thought that it may be worth exploring whether we
>>>> can use rd_createSubid/rd_newRelfilelocatorSubid in RelationData to
>>>> determine if the sequence is created/changed in the current
>>>> subtransaction and then record that in WAL record. By this, we need to
>>>> have additional information in the WAL record like XLOG_SEQ_LOG but we
>>>> can probably do it only with wal_level as logical.

>>>
>>> I may not understand the proposal exactly, but it's not enough to know
>>> if it was created in the same subxact. It might have been created in
>>> some earlier subxact in the same top-level xact.
>>>
>>
>> We should be able to detect even some earlier subxact or top-level
>> xact based on rd_createSubid/rd_newRelfilelocatorSubid.
> 
> Here is a small PoC patchset to help your understanding. Please see attached
> files.
> 
> 0001, 0002 were not changed, and 0004 was reassigned to 0003.
> (For now, I focused only on test_decoding, because it is only for evaluation 
> purpose.)
> 
> 0004 is what we really wanted to say. is_transactional is added in the WAL
> record, and it stores whether the operation is transactional. In order to
> distinguish the status, rd_createSubid and rd_newRelfilelocatorSubid are
> used. According to the comment, they would be a valid value only when the
> relation was changed within the transaction.
> Also, sequences_hash was not needed anymore, so it and related functions
> were removed.
> 
> How do you think?
> 

I think it's a very nice idea, assuming it maintains the current
behavior. It makes a lot of code unnecessary, etc.
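
(For readers following along: the core of the 0004 idea, as I read it,
is a test along these lines when emitting the WAL record — a sketch,
using the RelationData fields mentioned above:)

	/*
	 * Transactional iff the sequence relation was created, or assigned a
	 * new relfilenode, in the current (sub)transaction; both fields are
	 * InvalidSubTransactionId otherwise.
	 */
	bool		is_transactional =
		(seqrel->rd_createSubid != InvalidSubTransactionId ||
		 seqrel->rd_newRelfilelocatorSubid != InvalidSubTransactionId);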


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: logical decoding and replication of sequences, take 2

2023-11-27 Thread Tomas Vondra



On 11/27/23 12:11, Amit Kapila wrote:
> On Mon, Nov 27, 2023 at 4:17 PM Tomas Vondra
>  wrote:
>>
>> On 11/27/23 11:13, Amit Kapila wrote:
>>> On Mon, Nov 27, 2023 at 11:34 AM Amit Kapila  
>>> wrote:

>>>> On Mon, Nov 27, 2023 at 6:41 AM Tomas Vondra  wrote:
>>>>>
>>>>> While going over 0001, I realized there might be an optimization for
>>>>> ReorderBufferSequenceIsTransactional. As coded in 0001, it always
>>>>> searches through all top-level transactions, and if there's many of them
>>>>> that might be expensive, even if very few of them have any relfilenodes
>>>>> in the hash table. It's still linear search, and it needs to happen for
>>>>> each sequence change.
>>>>>
>>>>> But can the relfilenode even be in some other top-level transaction? How
>>>>> could it be - our transaction would not see it, and wouldn't be able to
>>>>> generate the sequence change. So we should be able to simply check *our*
>>>>> transaction (or if it's a subxact, the top-level transaction). Either
>>>>> it's there (and it's transactional change), or not (and then it's
>>>>> non-transactional change).
>>>>
>>>> I also think the relfilenode should be part of either the current
>>>> top-level xact or one of its subxact, so looking at all the top-level
>>>> transactions for each change doesn't seem advisable.
>>>>
>>>>> The 0004 does this.
>>>>>
>>>>> This of course hinges on when exactly the transactions get created, and
>>>>> assignments processed. For example if this would fire before the txn
>>>>> gets assigned to the top-level one, this would break. I don't think this
>>>>> can happen thanks to the immediate logging of assignments, but I'm too
>>>>> tired to think about it now.
>>>>
>>>> This needs some thought because I think we can't guarantee the
>>>> association till we reach the point where we can actually decode the
>>>> xact. See comments in AssertTXNLsnOrder() [1].

>>
>> I suppose you mean the comment before the SnapBuildXactNeedsSkip call,
>> which says:
>>
>>   /*
>>* Skip the verification if we don't reach the LSN at which we start
>>* decoding the contents of transactions yet because until we reach
>>* the LSN, we could have transactions that don't have the association
>>* between the top-level transaction and subtransaction yet and
>>* consequently have the same LSN.  We don't guarantee this
>>* association until we try to decode the actual contents of
>>* transaction. The ordering of the records prior to the
>>* start_decoding_at LSN should have been checked before the restart.
>>*/
>>
>> But doesn't this say that after we actually start decoding / stop
>> skipping, we should have seen the assignment? We're already decoding
>> transaction contents (because sequence change *is* part of xact, even if
>> we decide to replay it in the non-transactional way).
>>
> 
> It means to say that the assignment is decided after start_decoding_at
> point. We haven't decided that we are past start_decoding_at by the
> time the patch is computing the transactional flag.
> 

Ah, I see. We're deciding if the change is transactional before calling
SnapBuildXactNeedsSkip. That's a bit unfortunate.

>>>
>>> I am wondering that instead of building the infrastructure to know
>>> whether a particular change is transactional on the decoding side,
>>> can't we have some flag in the WAL record to note whether the change
>>> is transactional or not? I have discussed this point with my colleague
>>> Kuroda-San and we thought that it may be worth exploring whether we
>>> can use rd_createSubid/rd_newRelfilelocatorSubid in RelationData to
>>> determine if the sequence is created/changed in the current
>>> subtransaction and then record that in WAL record. By this, we need to
>>> have additional information in the WAL record like XLOG_SEQ_LOG but we
>>> can probably do it only with wal_level as logical.
>>>
>>
>> I may not understand the proposal exactly, but it's not enough to know
>> if it was created in the same subxact. It might have been created in
>> some earlier subxact in the same top-level xact.
>>
> 
> We should be able to detect even some earlier subxact or top-level
> xact based on rd_createSubid/rd_newRelfilelocatorSubid.
> 

Interesting. I admit I haven't considered using these fields before, so
I need to familiarize with it a bit, and try if it'd work.

>> FWIW I think one of the earlier patch versions did something like this,
>> by adding a "created" flag in the xlog record. And we concluded doing
>> this on the decoding side is a better solution.
>>
> 
> oh, I thought it would be much simpler than what we are doing on the
> decoding-side. Can you please point me to the email discussion where
> this is concluded or share the reason?
> 

I think the discussion started around [1], and then in a bunch of
following messages (search for "relfilenode").
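
(And for completeness, the narrower lookup I mentioned above would have
roughly this shape — only a sketch, assuming each top-level
ReorderBufferTXN keeps a hash of the sequence relfilenodes it created;
names follow my loose reading of the patch, not anything committed:)

static bool
ReorderBufferSequenceIsTransactional(ReorderBuffer *rb,
									 TransactionId xid,
									 RelFileLocator rlocator)
{
	ReorderBufferTXN *txn;

	/* look up the transaction, without creating it */
	txn = ReorderBufferTXNByXid(rb, xid, false, NULL,
								InvalidXLogRecPtr, false);
	if (txn == NULL)
		return false;

	/* relfilenodes are tracked in the top-level transaction */
	if (rbtxn_is_known_subxact(txn))
		txn = txn->toptxn;

	return txn->sequences_hash != NULL &&
		hash_search(txn->sequences_hash, &rlocator, HASH_FIND, NULL) != NULL;
}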

regards


[1]
https://www.postgresql.org/message-id/CAExHW5v_vVqkhF4ehST9EzpX1L3bemD1S%2BkTk_-ZVu_ir-nKDw%40mail.gma

Re: Testing autovacuum wraparound (including failsafe)

2023-11-27 Thread Daniel Gustafsson
> On 27 Nov 2023, at 14:06, Masahiko Sawada  wrote:

> Is it true that we can modify the timeout after creating
> BackgroundPsql object? If so, it seems we don't need to introduce the
> new timeout argument. But how?

I can't remember if that's leftovers that incorrectly remain from an earlier
version of the BackgroundPsql work, or if it's a very bad explanation of
->set_query_timer_restart().  The timeout will use the timeout_default value,
and that cannot be overridden; it can only be reset per query.

With your patch the timeout still cannot be changed, but it can at least be
set during start, which seems good enough until there are tests warranting
more complexity.  The docs should be corrected to reflect this in your patch.

--
Daniel Gustafsson




