Re: Intermittent failure with t/003_logical_slots.pl test on windows

2023-12-26 Thread Nisha Moond
Thanks for working on it. I tested the patch on my system and it
resolved the issue with commands run with -V (version check).

As you mentioned, I am still seeing intermittent errors even with
the patch applied, as below.
In 'pg_upgrade/002_pg_upgrade':

# Running: pg_upgrade --no-sync -d
D:\Project\pg2\postgres\build/testrun/pg_upgrade/002_pg_upgrade\data/t_002_pg_upgrade_old_node_data/pgdata
-D 
D:\Project\pg2\postgres\build/testrun/pg_upgrade/002_pg_upgrade\data/t_002_pg_upgrade_new_node_data/pgdata
-b D:/Project/pg2/postgres/build/tmp_install/Project/pg2/postgresql/bin
-B D:/Project/pg2/postgres/build/tmp_install/Project/pg2/postgresql/bin
-s 127.0.0.1 -p 56095 -P 56096 --copy --check
Performing Consistency Checks
-
Checking cluster versions ok

The source cluster lacks cluster state information:
Failure, exiting
[12:37:38.666](3.317s) not ok 12 - run of pg_upgrade --check for new instance
[12:37:38.666](0.000s) #   Failed test 'run of pg_upgrade --check for
new instance'
#   at D:/Project/pg2/postgres/src/bin/pg_upgrade/t/002_pg_upgrade.pl line 375.

And in 'pg_upgrade/003_logical_slots':

[12:35:33.773](0.001s) not ok 6 - run of pg_upgrade of old cluster
with slots having unconsumed WAL records stdout /(?^:Your installation
contains logical replication slots that can't be upgraded.)/
[12:35:33.773](0.000s) #   Failed test 'run of pg_upgrade of old
cluster with slots having unconsumed WAL records stdout /(?^:Your
installation contains logical replication slots that can't be
upgraded.)/'
#   at D:/Project/pg2/postgres/src/bin/pg_upgrade/t/003_logical_slots.pl
line 102.
[12:35:33.773](0.000s) #   'Performing Consistency Checks
# -
# Checking cluster versions ok
#
# The target cluster lacks cluster state information:
# Failure, exiting

It seems the 'Performing Consistency Checks' step fails due to missing
cluster state information, so it is possible that it can also be fixed
along the same lines.

--
Thanks,
Nisha




Re: A tiny improvement of psql

2023-12-26 Thread Kirk Wolak
On Tue, Dec 26, 2023 at 11:26 AM Kevin Wang  wrote:

> Hello hackers!
>
> I am an Oracle/PostgreSQL DBA, I am not a PG hacker.  During my daily job,
> I find a pain that should be fixed.
>
> As you know, we can use the UP arrow key to get the previous command to
> avoid extra typing. This is a wonderful feature to save the lives of every
> DBA. However, if I type the commands like this sequence: A, B, B, B, B, B,
> B,  as you can see, B is the last command I execute.
>
> But if I try to get command A, I have to press the UP key 7 times.  I
> think the best way is: when you press the UP key, psql should show the
> command that is different from the previous command, so the recall sequence
> should be B -> A, not B -> B -> ... -> A.  Then I only press the UP key 2
> times to get command A.
>
> I think this should change little code in psql, but it will make all DBA's
> lives much easier.  This is a strong requirement from the real DBA. Hope to
> get some feedback on this.
>
Kevin,
  with readline, I use ctrl-r (incremental search backwards).
But if you are willing to modify your .inputrc, you can enable the Windows
cmd-style F5/F8 keys (search backward/forward), where you type the start of
the command (e.g. B) and then press the key.

inputrc:
# Map F8 (back) F5(forward) search like CMD
"\e[19~": history-search-backward
"\e[15~": history-search-forward

There are commented-out lines tying them to Page Up/Page Down as well...
but after 30 years at a CMD prompt, F5/F8 is what I reach for.
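If you prefer Page Up/Page Down instead, the bindings would look something
like this (a sketch; the escape sequences below are the common xterm ones
and can vary by terminal):

# Optional: Page Up / Page Down variants of the same searches
"\e[5~": history-search-backward
"\e[6~": history-search-forward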

The upside is that this works in bash and other programs as well...

HTH

Kirk Out!

>


Re: A tiny improvement of psql

2023-12-26 Thread Deepak M
For repeating the execution of the last command in psql, we can always use
the commands below to send the current query buffer to the server:

\g
\gx  (with expanded output mode, which always comes in handy)
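For example, a minimal interactive sketch:

SELECT count(*) FROM pg_class;
\g
\gx

Here \g re-sends the most recent query buffer, and \gx does the same with
expanded output.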

On Tue, Dec 26, 2023 at 9:56 PM Kevin Wang  wrote:

> Hello hackers!
>
> I am an Oracle/PostgreSQL DBA, I am not a PG hacker.  During my daily job,
> I find a pain that should be fixed.
>
> As you know, we can use the UP arrow key to get the previous command to
> avoid extra typing. This is a wonderful feature to save the lives of every
> DBA. However, if I type the commands like this sequence: A, B, B, B, B, B,
> B,  as you can see, B is the last command I execute.
>
> But if I try to get command A, I have to press the UP key 7 times.  I
> think the best way is: when you press the UP key, psql should show the
> command that is different from the previous command, so the recall sequence
> should be B -> A, not B -> B -> ... -> A.  Then I only press the UP key 2
> times to get command A.
>
> I think this should change little code in psql, but it will make all DBA's
> lives much easier.  This is a strong requirement from the real DBA. Hope to
> get some feedback on this.
>
> Another requirement is: could we use / to repeat executing the last
> command in psql just like sqlplus in Oracle?
>
> I will try to learn how to fix it sooner or later, but if some
> proficient hacker focuses on this, it can be fixed quickly, I guess.
>
> Thoughts?
>
> Regards,
>
> Kevin
>


Re: Should "CRC" be in uppercase?

2023-12-26 Thread John Naylor
On Mon, Dec 25, 2023 at 12:51 PM Kyotaro Horiguchi
 wrote:
>
> A new function check_control_file() in pg_combinebackup.c has the
> following message.
>
> >   pg_fatal("%s: crc is incorrect", controlpath);
>
> I think "crc" should be in all uppercase in general and a brief
> grep'ing told me that it is almost always or consistently used in
> uppercase in our tree.

I pushed this also.




Re: A typo in a messsage?

2023-12-26 Thread John Naylor
On Fri, Dec 22, 2023 at 1:50 PM Kyotaro Horiguchi
 wrote:
>
> I found the following message introduced by a recent commit.
>
> > errdetail("The first unsummarized LSN is this range is %X/%X.",
>
> Shouldn't the "is" following "LSN" be "in"?

Pushed.




Re: Tab complete for CREATE SUBSCRIPTION ... CONECTION does not work

2023-12-26 Thread Japin Li


On Tue, 26 Dec 2023 at 18:10, Shubham Khanna  
wrote:
> On Tue, Dec 26, 2023 at 3:02 PM Japin Li  wrote:
>>
>>
>> Hi hacker,
>>
>> As $subject detailed, tab-completion does not work in cases such as:
>>
>>CREATE SUBSCRIPTION sub CONNECTION 'dbname=postgres port=6543' \t
>>
>> It seems that get_previous_words() cannot parse the single quote.
>>
>> OTOH, it works for CREATE SUBSCRIPTION sub CONNECTION xx \t, should we fix 
>> it?
>>
> I reviewed the Patch and it looks fine to me.
>
Thanks for the review.

--
Regards,
Japin Li
ChengDu WenWu Information Technology Co., Ltd.




Re: Synchronizing slots from primary to standby

2023-12-26 Thread Masahiko Sawada
Hi,

Thank you for working on this.

On Tue, Dec 26, 2023 at 9:27 PM shveta malik  wrote:
>
> On Tue, Dec 26, 2023 at 4:41 PM Zhijie Hou (Fujitsu)
>  wrote:
> >
> > On Wednesday, December 20, 2023 7:37 PM Amit Kapila 
> >  wrote:
> > >
> > > On Wed, Dec 20, 2023 at 3:29 PM shveta malik 
> > > wrote:
> > > >
> > > > On Wed, Dec 20, 2023 at 9:12 AM Amit Kapila 
> > > wrote:
> > > > >
> > > > > On Tue, Dec 19, 2023 at 5:30 PM shveta malik 
> > > wrote:
> > > > > >
> > > > > > Thanks for reviewing. I have addressed these in v50.
> > > > > >
> > > > >
> > > > > I was looking at this patch to see if something smaller could be
> > > > > independently committable. I think we can extract
> > > > > pg_get_slot_invalidation_cause() and commit it as that function
> > > > > could be independently useful as well. What do you think?
> > > > >
> > > >
> > > > Sure, forked another thread [1]
> > > > [1]:
> > > >
> > > https://www.postgresql.org/message-id/CAJpy0uBpr0ym12%2B0mXpjcRFA6
> > > N%3D
> > > > anX%2BYk9aGU4EJhHNu%3DfWykQ%40mail.gmail.com
> > > >
> > >
> > > Thanks, thinking more, we can split the patch into the following three 
> > > patches
> > > which can be committed separately (a) Allowing the failover property to 
> > > be set
> > > for a slot via SQL API and subscription commands
> > > (b) sync slot worker infrastructure (c) GUC standby_slot_names and the
> > > corresponding wait logic on the server side.
> > >
> > > Thoughts?
> >
> > I agree. Here is the V54 patch set which was split based on the suggestion.
> > The commit message in each patch is also improved.
> >
>
> I would like to revisit the current dependency of the slotsync worker on
> the dbname used in the 002 patch. Currently we accept dbname in
> primary_conninfo, and thus the user has to make sure to provide one (by
> manually altering it) even in the case of a conf file auto-generated by
> "pg_basebackup -R".
> Thus I would like to discuss whether there are better ways to do it.
> The complete background is as follows:
>
> We need dbname for 2 purposes:
>
> 1) to connect to the remote db in order to run SELECT queries to fetch the
> info needed by the slotsync worker.
> 2) to make a connection in the slot-sync worker itself in order to be able
> to use the libpq APIs for 1).
>
> We currently run 3 kinds of SELECT queries in the slot-sync worker:
>
> a) To fetch all failover slots (logical slots) info at once in
> synchronize_slots().
> b) To fetch a particular slot info during
> wait_for_primary_slot_catchup() logic (logical slot).
> c) To validate primary slot (physical one) and also to distinguish
> between standby and cascading standby by running pg_is_in_recovery().
>
>  1) One approach to avoid the dependency on dbname is using commands
> instead of SELECT.  This would need a LIST_SLOTS command implemented for
> a); for b) we can use LIST_SLOTS and fetch everything (even though it is
> not all needed), have LIST_SLOTS with a filter on slot name, or extend
> READ_REPLICATION_SLOT; and for c) we can have some other command to get
> the pg_is_in_recovery() info. But I feel that by relying on commands we
> will be making the extension of the slot-sync feature difficult. In the
> future, if there is some further requirement to fetch any other info, we
> would have to implement a command for that too. I am not sure that is a
> good and extensible approach.
>
> 2) Another way to avoid asking for a dbname in primary_conninfo is to
> use the default dbname internally. This brings us to two questions:
> 'How' and 'Which default db'?
>
> 2.1) To answer 'How':
> Using a default dbname is simpler for the purpose of the slot-sync worker
> having its own db connection, but is a little tricky for the purpose of
> the connection to the remote db. This is because we have to inject this
> dbname internally into our connection info.
>
> 2.1.1) Say we use primary_conninfo (i.e. original one w/o dbname),
> then currently it could have 2 formats:
>
> a) The simple "=" format for key-value pairs, example:
> 'user=replication host=127.0.0.1 port=5433 dbname=postgres'.
> b) URI format, example:
> postgresql://other@localhost/otherdb?connect_timeout=10&application_name=myapp
>
> We can distinguish between the 2 formats using 'uri_prefix_length' but
> injecting the dbname part will be messy, especially for the URI format.  If
> we want to do it w/o injecting and only by changing the libpq interfaces
> to accept dbname separately from the conninfo, then there is no simple
> way currently available. It would need a good amount of changes
> in libpq.
>
> 2.1.2) Another way is to not rely on primary_conninfo directly but
> rely on 'WalRcv->conninfo' in order to connect to remote_db. This is
> because the latter is never in URI format; it is a parsed format, and
> appending may work. As an example, primary_conninfo =
> 'postgresql://replication@localhost:5433', WalRcv->conninfo loaded
> internally is:
> "user=replication passfile=/home/shveta/.pgpass channel_binding=prefer
> dbname=replication host=localhost port=5433
> fallback_application_name=walreceiver sslmode=prefer 

Re: POC: GROUP BY optimization

2023-12-26 Thread Tom Lane
Andrei Lepikhov  writes:
> To be clear. In [1], I mentioned we can perform micro-benchmarks and 
> structure costs of operators. At least for fixed-length operators, it is 
> relatively easy.

I repeat what I said: this is a fool's errand.  You will not get
trustworthy results even for the cases you measured, let alone
all the rest.  I'd go as far as to say I would not believe your
microbenchmarks, because they would only apply for one platform,
compiler, backend build, phase of the moon, etc.

regards, tom lane




[PATCH] pg_dump: Do not dump statistics for excluded tables

2023-12-26 Thread Rian McGuire

Hi hackers,

I've attached a patch against master that addresses a small bug in pg_dump.

Previously, pg_dump would include CREATE STATISTICS statements for
tables that were excluded from the dump, causing reload to fail if any
excluded tables had extended statistics.

The patch skips the creation of the StatsExtInfo if the associated
table does not have the DUMP_COMPONENT_DEFINITION flag set. This is
similar to how getPublicationTables behaves if a table is excluded.

I've covered this with a regression test by altering one of the CREATE
STATISTICS examples to work with the existing 'exclude_test_table'
run. Without the fix, that causes the test to fail with:
# Failed test 'exclude_test_table: should not dump CREATE STATISTICS
extended_stats_no_options'
# at t/002_pg_dump.pl line 4934.

Regards,
Rian

From 0fe06338728981fa727e3e6b99247741deda75fb Mon Sep 17 00:00:00 2001
From: Rian McGuire 
Date: Wed, 27 Dec 2023 13:09:31 +1100
Subject: [PATCH] pg_dump: Do not dump statistics for excluded tables

Previously, pg_dump would include CREATE STATISTICS statements for
tables that were excluded from the dump, causing reload to fail.
---
 src/bin/pg_dump/pg_dump.c| 43 +++-
 src/bin/pg_dump/t/002_pg_dump.pl |  5 ++--
 2 files changed, 34 insertions(+), 14 deletions(-)

diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index 8c0b5486b9..c67ed416e9 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -7288,11 +7288,13 @@ getExtendedStatistics(Archive *fout)
int ntups;
int i_tableoid;
int i_oid;
+   int i_stxrelid;
int i_stxname;
int i_stxnamespace;
int i_stxowner;
int i_stattarget;
-   int i;
+   int i,
+   j;
 
/* Extended statistics were new in v10 */
if (fout->remoteVersion < 10)
@@ -7301,11 +7303,11 @@ getExtendedStatistics(Archive *fout)
query = createPQExpBuffer();
 
if (fout->remoteVersion < 13)
-   appendPQExpBufferStr(query, "SELECT tableoid, oid, stxname, "
+   appendPQExpBufferStr(query, "SELECT tableoid, oid, stxrelid, stxname, "
 "stxnamespace, stxowner, (-1) AS stxstattarget "
 "FROM pg_catalog.pg_statistic_ext");
else
-   appendPQExpBufferStr(query, "SELECT tableoid, oid, stxname, "
+   appendPQExpBufferStr(query, "SELECT tableoid, oid, stxrelid, stxname, "
 "stxnamespace, stxowner, stxstattarget "
 "FROM pg_catalog.pg_statistic_ext");
 
@@ -7315,27 +7317,44 @@ getExtendedStatistics(Archive *fout)
 
i_tableoid = PQfnumber(res, "tableoid");
i_oid = PQfnumber(res, "oid");
+   i_stxrelid = PQfnumber(res, "stxrelid");
i_stxname = PQfnumber(res, "stxname");
i_stxnamespace = PQfnumber(res, "stxnamespace");
i_stxowner = PQfnumber(res, "stxowner");
i_stattarget = PQfnumber(res, "stxstattarget");
 
+   /* this allocation may be more than we need */
statsextinfo = (StatsExtInfo *) pg_malloc(ntups * sizeof(StatsExtInfo));
+   j = 0;
 
for (i = 0; i < ntups; i++)
{
-   statsextinfo[i].dobj.objType = DO_STATSEXT;
-   statsextinfo[i].dobj.catId.tableoid = atooid(PQgetvalue(res, i, i_tableoid));
-   statsextinfo[i].dobj.catId.oid = atooid(PQgetvalue(res, i, i_oid));
-   AssignDumpId(&statsextinfo[i].dobj);
-   statsextinfo[i].dobj.name = pg_strdup(PQgetvalue(res, i, i_stxname));
-   statsextinfo[i].dobj.namespace =
+   Oid stxrelid;
+   TableInfo  *tbinfo;
+
+   /*
+* Only dump extended statistics if we're going to dump the definition
+* of the table that they're associated with.
+*/
+   stxrelid = atoi(PQgetvalue(res, i, i_stxrelid));
+   tbinfo = findTableByOid(stxrelid);
+   if (tbinfo == NULL || !(tbinfo->dobj.dump & DUMP_COMPONENT_DEFINITION))
+   continue;
+
+   statsextinfo[j].dobj.objType = DO_STATSEXT;
+   statsextinfo[j].dobj.catId.tableoid = atooid(PQgetvalue(res, i, i_tableoid));
+   statsextinfo[j].dobj.catId.oid = atooid(PQgetvalue(res, i, i_oid));
+   AssignDumpId(&statsextinfo[j].dobj);
+   statsextinfo[j].dobj.name = pg_strdup(PQgetvalue(res, i, i_stxname));
+   statsextinfo[j].dobj.namespace =
findNamespace(atooid(PQgetvalue(res, 

Re: POC: GROUP BY optimization

2023-12-26 Thread Andrei Lepikhov

On 27/12/2023 11:15, Alexander Korotkov wrote:

On Wed, Dec 27, 2023 at 5:23 AM Tom Lane  wrote:

Alexander Korotkov  writes:

2) An accurate estimate of the sorting cost is quite a difficult task.


Indeed.


What if we make a simple rule of thumb that sorting integers and
floats is cheaper than sorting numerics and strings with collation C,
in turn, that is cheaper than sorting collation-aware strings
(probably more groups)?  Within the group, we could keep the original
order of items.


I think it's a fool's errand to even try to separate different sort
column orderings by cost.  We simply do not have sufficiently accurate
cost information.  The previous patch in this thread got reverted because
of that (well, also some implementation issues, but mostly that), and
nothing has happened to make me think that another try will fare any
better.
To be clear: in [1], I mentioned we can perform micro-benchmarks and
structure the costs of operators. At least for fixed-length operators, that
is relatively easy. So the main blocker here is an accurate prediction of
ndistinct for different combinations of columns. Does it make sense to
continue designing the feature in the direction of enabling a choice
between different sort column orderings when we have extended statistics
on the columns?


[1] 
https://www.postgresql.org/message-id/e3602ccb-e643-2e79-ed2c-1175a8053...@postgrespro.ru


--
regards,
Andrei Lepikhov
Postgres Professional





Re: POC: GROUP BY optimization

2023-12-26 Thread Tom Lane
Alexander Korotkov  writes:
> On Wed, Dec 27, 2023 at 5:23 AM Tom Lane  wrote:
>> I think it's a fool's errand to even try to separate different sort
>> column orderings by cost.

> Besides sorting column orderings by cost, this patch also tries to
> match GROUP BY pathkeys to input pathkeys and ORDER BY pathkeys.  Do
> you think there is a chance for the second part if we leave the cost
> part aside?

I think it's definitely reasonable to try to match up available
orderings, because that doesn't really require fine distinctions
of cost: either it matches or it doesn't.  Eliminating a sort step
entirely is clearly a win.  (Incremental sort complicates this though.
I doubt our cost model for incremental sorts is any good either, so
I am not eager to rely on that more heavily.)

regards, tom lane




Re: POC: GROUP BY optimization

2023-12-26 Thread Alexander Korotkov
On Wed, Dec 27, 2023 at 5:23 AM Tom Lane  wrote:
> Alexander Korotkov  writes:
> > 2) An accurate estimate of the sorting cost is quite a difficult task.
>
> Indeed.
>
> > What if we make a simple rule of thumb that sorting integers and
> > floats is cheaper than sorting numerics and strings with collation C,
> > in turn, that is cheaper than sorting collation-aware strings
> > (probably more groups)?  Within the group, we could keep the original
> > order of items.
>
> I think it's a fool's errand to even try to separate different sort
> column orderings by cost.  We simply do not have sufficiently accurate
> cost information.  The previous patch in this thread got reverted because
> of that (well, also some implementation issues, but mostly that), and
> nothing has happened to make me think that another try will fare any
> better.

If there is a choice of what to compare first: 8-byte integers or
collation-aware texts, possibly toasted, then the answer is pretty
evident for me.  For sure, there are cases when this choice is wrong.
But even if all the integers appear to be the same, the penalty isn't
that much.

Besides sorting column orderings by cost, this patch also tries to
match GROUP BY pathkeys to input pathkeys and ORDER BY pathkeys.  Do
you think there is a chance for the second part if we leave the cost
part aside?

--
Regards,
Alexander Korotkov




Re: POC: GROUP BY optimization

2023-12-26 Thread Tom Lane
Alexander Korotkov  writes:
> 2) An accurate estimate of the sorting cost is quite a difficult task.

Indeed.

> What if we make a simple rule of thumb that sorting integers and
> floats is cheaper than sorting numerics and strings with collation C,
> in turn, that is cheaper than sorting collation-aware strings
> (probably more groups)?  Within the group, we could keep the original
> order of items.

I think it's a fool's errand to even try to separate different sort
column orderings by cost.  We simply do not have sufficiently accurate
cost information.  The previous patch in this thread got reverted because
of that (well, also some implementation issues, but mostly that), and
nothing has happened to make me think that another try will fare any
better.

regards, tom lane




Re: [PoC] Improve dead tuple storage for lazy vacuum

2023-12-26 Thread John Naylor
On Tue, Dec 26, 2023 at 12:43 PM Masahiko Sawada  wrote:
>
> On Thu, Dec 21, 2023 at 4:41 PM John Naylor  wrote:

> > +TidStoreSetBlockOffsets(TidStore *ts, BlockNumber blkno, OffsetNumber 
> > *offsets,
> > + int num_offsets)
> > +{
> > + char buf[MaxBlocktableEntrySize];
> > + BlocktableEntry *page = (BlocktableEntry *) buf;
> >
> > I'm not sure this is safe with alignment. Maybe rather than plain
> > "char", it needs to be a union with BlocktableEntry, or something.
>
> I tried it in the new patch set but could you explain why it could not
> be safe with alignment?

I was thinking because "buf" is just an array of bytes. But, since the
next declaration is a cast to a pointer to the actual type, maybe we
can rely on the compiler to do the right thing. (It seems to on my
machine in any case)

> > About separation of responsibilities for locking: The only thing
> > currently where the tid store is not locked is tree iteration. That's
> > a strange exception. Also, we've recently made RT_FIND return a
> > pointer, so the caller must somehow hold a share lock, but I think we
> > haven't exposed callers the ability to do that, and we rely on the tid
> > store lock for that. We have a mix of tree locking and tid store
> > locking. We will need to consider carefully how to make this more
> > clear, maintainable, and understandable.
>
> Yes, tidstore should be locked during the iteration.
>
> One simple direction about locking is that the radix tree has the lock
> but no APIs hold/release it. It's the caller's responsibility. If a
> data structure using a radix tree for its storage has its own lock
> (like tidstore), it can use it instead of the radix tree's one. A

It looks like the only reason tidstore has its own lock is because it
has no way to delegate locking to the tree's lock. Instead of working
around the limitations of the thing we've designed, let's make it work
for the one use case we have. I think we need to expose RT_LOCK_*
functions to the outside, and have tid store use them. That would
allow us to simplify all those "if (TidStoreIsShared(ts)
LWLockAcquire(..., ...)" calls, which are complex and often redundant.

At some point, we'll probably want to keep locking inside, at least to
smooth the way for fine-grained locking you mentioned.

> > In a green field, it'd be fine to replace these with an expression of
> > "num_offsets", but it adds a bit of noise for reviewers and the git
> > log. Is it really necessary?
>
> I see your point. I think we can live with having both
> has_lpdead_items and num_offsets. But we will have to check if these
> values are consistent, which could be less maintainable.

It would be clearer if that removal was split out into a separate patch.

> >  I'm also not quite sure why "deadoffsets" and "lpdead_items" got
> > moved to the PruneState. The latter was renamed in a way that makes
> > more sense, but I don't see why the churn is necessary.
...
> > I guess it was added here, 800 lines away? If so, why?
>
> The above changes are related. The idea is not to use tidstore in a
> one-pass strategy. If the table doesn't have any indexes, in
> lazy_scan_prune() we collect offset numbers of dead tuples on the page
> and vacuum the page using them. In this case, we don't need to use
> tidstore so we pass the offsets array to lazy_vacuum_heap_page(). The
> LVPagePruneState is a convenient place to store collected offset
> numbers.

Okay, that makes sense, but if it was ever explained, I don't
remember, and there is nothing in the commit message either.

I'm not sure this can be split up easily, but if so it might help reviewing.

This change also leads to a weird-looking control flow:

if (vacrel->nindexes == 0)
{
  if (prunestate.num_offsets > 0)
  {
...
  }
}
else if (prunestate.num_offsets > 0)
{
  ...
}




Re: POC: GROUP BY optimization

2023-12-26 Thread Alexander Korotkov
On Tue, Dec 26, 2023 at 1:37 PM Andrei Lepikhov
 wrote:
> On 21/12/2023 17:53, Alexander Korotkov wrote:
> > On Sun, Oct 1, 2023 at 11:45 AM Andrei Lepikhov
> >  wrote:
> >> New version of the patch. Fixed minor inconsistencies and rebased onto
> >> current master.
> > Thank you (and other authors) for working on this subject.  Indeed,
> > GROUP BY clauses are order-agnostic.  Reordering them into the most
> > suitable order could give us significant query planning benefits.  I
> > went through the thread: I see significant work has already been done
> > on this patch, and the code is quite polished.
> Maybe, but the issues mentioned in [1] are still not resolved. That is the
> only reason why this thread hasn't been active.

Yes, this makes sense.  I have a couple of points from me on this subject.
1) The patch reorders GROUP BY items not only to make comparison
cheaper but also to match the ordering of input paths and to match the
ORDER BY clause.  Thus, even if we leave aside for now sorting GROUP
BY items by their cost, the patch will remain valuable.
2) An accurate estimate of the sorting cost is quite a difficult task.
What if we make a simple rule of thumb that sorting integers and
floats is cheaper than sorting numerics and strings with collation C,
in turn, that is cheaper than sorting collation-aware strings
(probably more groups)?  Within the group, we could keep the original
order of items.

--
Regards,
Alexander Korotkov




Re: Assert failure on 'list_member_ptr(rel->joininfo, restrictinfo)'

2023-12-26 Thread Alexander Korotkov
On Sun, Dec 24, 2023 at 2:02 PM Alexander Korotkov  wrote:
> The most noticeable thing for me is that self-join removal doesn't work with 
> partitioned tables.  I think this is the direction for future work on this 
> subject.  In non-partitioned cases, patchset gives a small memory overhead.  
> However, the memory consumption is still much less than it is without the 
> self-join removal.  So, removing the join still lowers memory consumption 
> even if it copies some Bitmapsets.  Given that patchset [1] is required for 
> the correctness of memory manipulations in Bitmapsets during join removals, 
> I'm going to push it if there are no objections.

Pushed!

--
Regards,
Alexander Korotkov




Re: [HACKERS] Changing references of password encryption to hashing

2023-12-26 Thread Bruce Momjian
On Tue, Nov 28, 2023 at 10:01:57AM -0500, Robert Haas wrote:
> On Tue, Nov 28, 2023 at 9:55 AM Stephen Frost  wrote:
> > I do think we should use the correct terminology in our documentation
> > and would support your working on improving things in this area.
> 
> +1.

Attached is a draft patch to use the term "hash" instead of "encrypt"
for password storage.  I was not able to use Michael Paquier's version
from 2017 because the code has changed too much.

I did _not_ change the user API, so CREATE/ALTER ROLE still uses
[ENCRYPTED] PASSWORD, the GUC is still called password_encryption, and
the libpq function is still called PQencryptPasswordConn().  This makes
the user interface confusing since the API uses "encryption" but the
text calls it "hashing".  Is there support for renaming the API to use
"hash" and keeping "encrypt" for backward compatiblity.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  Only you can decide what is important to you.
diff --git a/contrib/passwordcheck/expected/passwordcheck.out b/contrib/passwordcheck/expected/passwordcheck.out
index 2027681daf..627b556bdf 100644
--- a/contrib/passwordcheck/expected/passwordcheck.out
+++ b/contrib/passwordcheck/expected/passwordcheck.out
@@ -11,7 +11,7 @@ ERROR:  password must not contain user name
 -- error: contains only letters
 ALTER USER regress_passwordcheck_user1 PASSWORD 'alessnicelongpassword';
 ERROR:  password must contain both letters and nonletters
--- encrypted ok (password is "secret")
+-- hashed ok (password is "secret")
 ALTER USER regress_passwordcheck_user1 PASSWORD 'md592350e12ac34e52dd598f90893bb3ae7';
 -- error: password is user name
 ALTER USER regress_passwordcheck_user1 PASSWORD 'md507a112732ed9f2087fa90b192d44e358';
diff --git a/contrib/passwordcheck/expected/passwordcheck_1.out b/contrib/passwordcheck/expected/passwordcheck_1.out
index 5d8d5dcc1c..cff6da9a63 100644
--- a/contrib/passwordcheck/expected/passwordcheck_1.out
+++ b/contrib/passwordcheck/expected/passwordcheck_1.out
@@ -11,7 +11,7 @@ ERROR:  password must not contain user name
 -- error: contains only letters
 ALTER USER regress_passwordcheck_user1 PASSWORD 'alessnicelongpassword';
 ERROR:  password must contain both letters and nonletters
--- encrypted ok (password is "secret")
+-- hashed ok (password is "secret")
 ALTER USER regress_passwordcheck_user1 PASSWORD 'md592350e12ac34e52dd598f90893bb3ae7';
 -- error: password is user name
 ALTER USER regress_passwordcheck_user1 PASSWORD 'md507a112732ed9f2087fa90b192d44e358';
diff --git a/contrib/passwordcheck/passwordcheck.c b/contrib/passwordcheck/passwordcheck.c
index ae4a0abe2e..6abd5c1634 100644
--- a/contrib/passwordcheck/passwordcheck.c
+++ b/contrib/passwordcheck/passwordcheck.c
@@ -35,13 +35,13 @@ static check_password_hook_type prev_check_password_hook = NULL;
 /*
  * check_password
  *
- * performs checks on an encrypted or unencrypted password
+ * performs checks on a hashed or unencrypted password
  * ereport's if not acceptable
  *
  * username: name of role being created or changed
- * password: new password (possibly already encrypted)
+ * password: new password (possibly already hashed)
  * password_type: PASSWORD_TYPE_* code, to indicate if the password is
- *			in plaintext or encrypted form.
+ *			in plaintext or hashed form.
  * validuntil_time: password expiration time, as a timestamptz Datum
  * validuntil_null: true if password expiration time is NULL
  *
@@ -64,9 +64,9 @@ check_password(const char *username,
 	if (password_type != PASSWORD_TYPE_PLAINTEXT)
 	{
 		/*
-		 * Unfortunately we cannot perform exhaustive checks on encrypted
+		 * Unfortunately we cannot perform exhaustive checks on hashed
 		 * passwords - we are restricted to guessing. (Alternatively, we could
-		 * insist on the password being presented non-encrypted, but that has
+		 * insist on the password being presented non-hashed, but that has
 		 * its own security disadvantages.)
 		 *
 		 * We only check for username = password.
diff --git a/contrib/passwordcheck/sql/passwordcheck.sql b/contrib/passwordcheck/sql/passwordcheck.sql
index 1fbd6b0e96..f7befd2e41 100644
--- a/contrib/passwordcheck/sql/passwordcheck.sql
+++ b/contrib/passwordcheck/sql/passwordcheck.sql
@@ -14,7 +14,7 @@ ALTER USER regress_passwordcheck_user1 PASSWORD 'xyzregress_passwordcheck_user1'
 -- error: contains only letters
 ALTER USER regress_passwordcheck_user1 PASSWORD 'alessnicelongpassword';
 
--- encrypted ok (password is "secret")
+-- hashed ok (password is "secret")
 ALTER USER regress_passwordcheck_user1 PASSWORD 'md592350e12ac34e52dd598f90893bb3ae7';
 
 -- error: password is user name
diff --git a/contrib/pgcrypto/crypt-des.c b/contrib/pgcrypto/crypt-des.c
index 98c30ea122..4d4bac7c39 100644
--- a/contrib/pgcrypto/crypt-des.c
+++ b/contrib/pgcrypto/crypt-des.c
@@ -750,7 +750,7 @@ px_crypt_des(const char *key, const char *setting)
 		output[0] = setting[0];
 
 		/*
-		 

Re: Update docs for default value of fdw_tuple_cost

2023-12-26 Thread Richard Guo
On Wed, Dec 27, 2023 at 12:27 AM Umair Shahid 
wrote:

> Commit cac169d686eddb277880a0d8a760ac3007b4846a updated the default value
> of fdw_tuple_cost from 0.01 to 0.2. The attached patch updates the docs to
> reflect this change.
>

+1. Nice catch.

Thanks
Richard

>


Multidimensional Histograms

2023-12-26 Thread Alexander Cheshev
Hello Hackers,

To improve the selectivity estimates of queries, I suggest adding support
for multidimensional histograms as described in paper [1].

To query multidimensional histograms efficiently we can use H-trees as
described in paper [2].
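As a rough illustration of what this buys (a minimal SQL sketch only, not
the papers' algorithms; the table and columns are made up), a two-dimensional
equi-width histogram stored as per-bucket fractions could estimate the joint
selectivity of two range predicates, which per-column statistics would
otherwise treat as independent:

CREATE TABLE hist2d (
    a_lo numeric, a_hi numeric,   -- bucket bounds on column a
    b_lo numeric, b_hi numeric,   -- bucket bounds on column b
    frac double precision         -- fraction of rows in this bucket
);

-- estimated selectivity of: a BETWEEN 10 AND 20 AND b BETWEEN 5 AND 15
-- (summing whole overlapping buckets; a real implementation would
-- interpolate partial buckets)
SELECT coalesce(sum(frac), 0) AS est_selectivity
FROM hist2d
WHERE a_hi >= 10 AND a_lo <= 20
  AND b_hi >= 5  AND b_lo <= 15;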

Postgres has limited support for multivariate statistics:
 * MCV lists are only useful for columns with a small number of distinct values;
 * functional dependencies only reflect dependencies among columns
(not column values).

[1] http://www.cs.cmu.edu/~rcarlson/docs/RyanCarlson_databases.pdf
[2] https://dl.acm.org/doi/pdf/10.1145/50202.50205

-- 
Regards,
Alexander Cheshev




Re: A tiny improvement of psql

2023-12-26 Thread Jelte Fennema-Nio
On Tue, 26 Dec 2023 at 22:45, Vik Fearing  wrote:
> It is kind of something we control.  Per the psql docs, setting
>
>  HISTCONTROL=ignoredups
>
> will do the trick.

Yeah, the easiest "fix" (that I know of) for a user is to set
HISTCONTROL in ~/.psqlrc to ignoredups using:

\set HISTCONTROL ignoredups

But honestly, I think that should probably be made the default. I
can't really think of a reason why anyone would actually want the current
default of "none". And while we're at it, maybe there are some other
defaults in psql that are worth changing. The main ones from my psqlrc
that seem like good defaults for pretty much everyone:

\x auto
\pset linestyle unicode

And maybe fix the major pitfall I always run into with psql: having
ON_ERROR_STOP default to on when a script is passed in using -f/--file.
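For reference, a minimal ~/.psqlrc carrying these settings could look like
this (ON_ERROR_STOP included too, though as noted it mostly matters when a
script is run):

\set HISTCONTROL ignoredups
\x auto
\pset linestyle unicode
\set ON_ERROR_STOP on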




Re: Fix Brin Private Spool Initialization (src/backend/access/brin/brin.c)

2023-12-26 Thread Tomas Vondra



On 12/26/23 19:10, Ranier Vilela wrote:
> Hi,
> 
> The commit b437571  I
> think has an oversight.
> When allocating memory and initializing the private spool in the function:
> _brin_leader_participate_as_worker
> 
> The behavior is the bs_spool (heap and index fields)
> are left empty.
> 
> The code affected is:
>   buildstate->bs_spool = (BrinSpool *) palloc0(sizeof(BrinSpool));
> - buildstate->bs_spool->heap = buildstate->bs_spool->heap;
> - buildstate->bs_spool->index = buildstate->bs_spool->index;
> + buildstate->bs_spool->heap = heap;
> + buildstate->bs_spool->index = index;
> 
> Is the fix correct?
> 

Thanks for noticing this. Yes, I believe this is a bug - the assignments
are certainly wrong; they leave the fields set to NULL.

I wonder how this didn't fail during testing. Surely, if the leader
participates as a worker, tuplesort_begin_index_brin will be called
with heap/index being NULL, leading to some failure during the sort. But
maybe this means we don't actually need the heap/index fields, it's just
a copy of TuplesortIndexArg, but BRIN does not need that because we sort
the tuples by blkno, and we don't need the descriptors for that.

In any case, the _brin_parallel_scan_and_build does not actually need
the separate heap/index arguments, those are already in the spool.

I'll try to figure out if we want to simplify the tuplesort or remove
the arguments from _brin_parallel_scan_and_build.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Add the ability to limit the amount of memory that can be allocated to backends.

2023-12-26 Thread Tomas Vondra
Hi,

I wanted to take a look at the patch, and I noticed it's broken since
3d51cb5197 renamed a couple pgstat functions in August. I plan to maybe
do some benchmarks etc. preferably on current master, so here's a
version fixing that minor bitrot.

As for the patch, I only skimmed through the thread so far, to get some
idea of what the approach and goals are, etc. I didn't look at the code
yet, so can't comment on that.

However, at pgconf.eu a couple of weeks ago I had quite a few discussions
about how such a "backend memory limit" could/should work in principle, and
I've been thinking about ways to implement this. So let me share some
thoughts about how this patch aligns with that ...

(FWIW it's not my intent to hijack or derail this patch in any way, but
there's a couple things I think we should do differently.)

I'm 100% on board with having a memory limit "above" work_mem. It's
really annoying that we have no way to restrict the amount of memory a
backend can allocate for complex queries, etc.

But I find it a bit strange that we aim to introduce a "global" memory
limit for all backends combined first. I'm not against having that too,
but it's not the feature I usually wish to have. I need some protection
against runaway backends that happen to allocate a lot of memory.

Similarly, I'd like to be able to have different limits depending on
what the backend does - a backend doing OLAP may naturally need more
memory, while a backend doing OLTP may have a much tighter limit.

But with a single global limit none of this is possible. It may help
reducing the risk of unexpected OOM issues (not 100%, but useful), but
it can't limit the impact to the one backend - if memory starts running
out, it will affect all other backends a bit randomly (depending on the
order in which the backends happen to allocate memory). And it does not
consider what workloads the backends execute.

Let me propose a slightly different architecture that I imagined while
thinking about this. It's not radically different from what the patch
does, but it focuses on the local accounting first. I believe it's
possible to extend this to enforce the global limit too.

FWIW I haven't tried implementing this - I don't want to "hijack" this
thread and do my own thing. I can take a stab at a PoC if needed.

Firstly, I'm not quite happy with how all the memory contexts have to
do their own version of the accounting and memory checking. I think we
should move that into a new abstraction which I call "memory pool".
It's very close to "memory context" but it only deals with allocating
blocks, not the chunks requested by palloc() etc. So when someone does
palloc(), that may be AllocSetAlloc(). And instead of doing malloc()
that would do MemoryPoolAlloc(blksize), and then that would do all the
accounting and checks, and then do malloc().

This may sound like an unnecessary indirection, but the idea is that a
single memory pool would back many memory contexts (perhaps all for
a given backend). In principle we might even establish separate memory
pools for different parts of the memory context hierarchy, but I'm not
sure we need that.

I can imagine the pool could also cache blocks for cases when we create
and destroy contexts very often, but glibc should already do that for
us, I believe.

For me, the accounting and memory context is the primary goal. I wonder
if we considered this context/pool split while working on the accounting
for hash aggregate, but I think we were too attached to doing all of it
in the memory context hierarchy.

Of course, this memory pool is per backend, and so would be the memory
accounting and limit enforced by it. But I can imagine extending to do
a global limit similar to what the current patch does - using a counter
in shared memory, or something. I haven't reviewed what's the overhead
or how it handles cases when a backend terminates in some unexpected
way. But whatever the current patch does, memory pool could do too.


Secondly, I think there's an annoying issue with the accounting at the
block level - it makes it problematic to use low limit values. We double
the block size, so we may quickly end up with a block size of a couple MBs,
which means the accounting granularity gets very coarse.

I think it'd be useful to introduce a "backpressure" between the memory
pool and the memory context, depending on how close we are to the limit.
For example if the limit is 128MB and the backend allocated 16MB so far,
we're pretty far away from the limit. So if the backend requests 8MB
block, that's fine and the memory pool should malloc() that. But if we
already allocated 100MB, maybe we should be careful and not allow 8MB
blocks - the memory pool should be allowed to override this and return
just 1MB block. Sure, this would have to be optional, and not all places
can accept a smaller block than requested (when the chunk would not fit
into the smaller block). It would require a suitable memory pool API
and more work in the memory contexts, 

Re: A tiny improvement of psql

2023-12-26 Thread Vik Fearing

On 12/26/23 17:36, Tom Lane wrote:

Kevin Wang  writes:

As you know, we can use the UP arrow key to get the previous command to
avoid extra typing. This is a wonderful feature to save the lives of every
DBA. However, if I type the commands like this sequence: A, B, B, B, B, B,
B,  as you can see, B is the last command I execute.



But if I try to get command A, I have to press the UP key 7 times.  I think
the best way is: when you press the UP key, psql should show the command
that is different from the previous command, so the recall sequence should
be B -> A, not B -> B -> ... -> A.  Then I only press the UP key 2 times to
get command A.


This is driven by libreadline, not anything we control.  I have
seen the behavior you describe in some other programs, so I wonder
whether it's configurable.


It is kind of something we control.  Per the psql docs, setting

HISTCONTROL=ignoredups

will do the trick.

https://www.postgresql.org/docs/current/app-psql.html#APP-PSQL-VARIABLES-HISTCONTROL
--
Vik Fearing





Re: pg_stat_statements: more test coverage

2023-12-26 Thread Peter Eisentraut

On 24.12.23 03:03, Michael Paquier wrote:

On Sat, Dec 23, 2023 at 03:18:01PM +0100, Peter Eisentraut wrote:

+/* LCOV_EXCL_START */
+PG_FUNCTION_INFO_V1(pg_stat_statements);
  PG_FUNCTION_INFO_V1(pg_stat_statements_1_2);
+/* LCOV_EXCL_STOP */


The only reason why I've seen this used at the C level was to bump up
the coverage requirements because of some internal company projects.
I'm pretty sure to have proposed in the past at least one patch that
would make use of that, but it got rejected.  It is not the only code
area that has a similar pattern.  So why introducing that now?


What other code areas have similar patterns (meaning, extension entry 
points for upgrade support that are not covered by currently available 
extension installation files)?



That's a lot of bloat.  This relies on pg_stat_statements.max's
default to be at 100.


The default is 5000.  I set 100 explicitly in the configuration file for 
the test.



- Use a DO block of a PL function, say with something like that to
ensure an amount of N queries?  Say with something like that after
tweaking pg_stat_statements.track:
CREATE OR REPLACE FUNCTION create_tables(num_tables int)
  RETURNS VOID AS
$func$
BEGIN
  FOR i IN 1..num_tables LOOP
    EXECUTE format('
      CREATE TABLE IF NOT EXISTS %I (id int)', 't_' || i);
  END LOOP;
END
$func$ LANGUAGE plpgsql;


I tried it like this first, but this doesn't register as separately 
executed commands for pg_stat_statements.



- Switch the minimum to be around 40~50 in the local
pg_stat_statements.conf used for the regression tests.


100 is the hardcoded minimum for the setting.


+is( $node->safe_psql(
+   'postgres',
+   "SELECT count(*) FROM pg_stat_statements WHERE query LIKE 
'%t1%'"),
+   '2',
+   'pg_stat_statements data kept across restart');


Checking that the contents match would be a bit more verbose than just
a count.  One trick I've used in the patch is in
027_stream_regress.pl, where there is a query grouping the stats
depending on the beginning of the queries.  Not exact, a bit more
verbose.


Yeah, this could be expanded a bit.
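For example, the kind of grouping check mentioned above could look roughly
like this (a sketch; the prefix length and filter are arbitrary):

SELECT left(query, 15) AS query_prefix, count(*) AS n
  FROM pg_stat_statements
 WHERE query LIKE 'CREATE TABLE%'
 GROUP BY 1
 ORDER BY 1;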





Re: No LINGUAS file yet for pg_combinebackup

2023-12-26 Thread Peter Eisentraut

On 26.12.23 13:18, Michael Paquier wrote:

On Mon, Dec 25, 2023 at 12:48:17PM -0500, Tom Lane wrote:

I don't particularly care to see that warning until whenever
it is that the translations first get populated.  Perhaps
we should hack up nls-global.mk to hide the warning, but
that might bite somebody someday.  I'm inclined to just
add an empty LINGUAS file as a temporary measure.
Thoughts?


I've noticed this noise as well, so +1 for the temporary empty file.


I checked that meson also complains about the missing file, except that 
we hadn't equipped pg_combinebackup with NLS support in meson yet.  So I 
added that as well.  So having the empty file is the correct solution 
for both build systems.  Tom has already added the empty file.






Re: [meson] expose buildtype debug/optimization info to pg_config

2023-12-26 Thread Peter Eisentraut

On 14.12.23 10:24, Junwang Zhao wrote:

On Thu, Dec 14, 2023 at 4:50 PM Peter Eisentraut  wrote:


On 12.12.23 11:40, Junwang Zhao wrote:

The configure-based build system sets VAL_CFLAGS with debug and
optimization flags, so pg_config will show this info. Some
extensions depend on this mechanism.

This patch exposes these flags with a typo fixed together.


I have committed the typo fix.

But I would like to learn more about the requirements of extensions in
this area.  This seems a bit suspicious to me.


This is what I found when building citus against an installation
of a meson debug-build pg instance: since CFLAGS doesn't
contain the -g flag, the binary doesn't include the debug information,
which is different behavior from the configure build system.


Ok, that makes sense.

I think a better place to add those options would be the variable
var_cflags, which holds the combined C flags that we export to
Makefile.global and pg_config.  The cflags variable that you used is 
more for internal use, for passing to the actual compilation commands, 
so adding more options there would be duplicative.


And then set var_cxxflags as well.

Maybe you should also check whether the compiler takes unix-style 
arguments, perhaps using cc.get_argument_syntax().






Re: Bug in nbtree optimization to skip > operator comparisons (or < comparisons in backwards scans)

2023-12-26 Thread Alexander Korotkov
Pavel,

On Mon, Dec 25, 2023 at 8:32 PM Pavel Borisov  wrote:
> I've reviewed both patches:
> 0001 is pure refactoring, replacing argument transfer via a struct member
> with passing it explicitly as a function argument. It's justified by the
> fact that firstPage is localized to only a few places. The patch looks
> simple and good enough.
>
> 0002:
> continuescanPrechecked is semantically much better than the previous
> requiredMatchedByPrecheck, which confused me earlier. Thanks!
>
> From the new comments, it is a little bit hard to understand who does
> what. The "if caller told" style of comment semantics looks clearer to me.
> Could you give particular attention to these comments:
>
> "If they wouldn't be matched, then the *continuescan flag would be set for 
> the current item and the last item on the page accordingly."
> "If the key is required for the opposite direction scan, we need to know 
> there was already at least one matching item on the page.  For those keys."
>
> > Prechecking the value of the continuescan flag for the last item on the
> >+ * page (according to the scan direction).
> Maybe, in this case, it would be more clear like: "...(for backwards scan it 
> will be the first item on a page)"
>
> Otherwise the patch 0002 looks like a good fix for the bug to be pushed.

Thank you for your review.  I've revised comments to meet your suggestions.

--
Regards,
Alexander Korotkov


0001-Remove-BTScanOpaqueData.firstPage-v4.patch
Description: Binary data


0002-Improvements-and-fixes-for-e0b1ee17dc-v4.patch
Description: Binary data


Re: Statistics Import and Export

2023-12-26 Thread Tom Lane
Bruce Momjian  writes:
> I think we need a robust API to handle two cases:

> *  changes in how we store statistics
> *  changes in how data type values are represented in the statistics

> We have had such changes in the past, and I think these two issues are
> what have prevented import/export of statistics up to this point.
> Developing an API that doesn't cleanly handle these will cause long-term
> pain.

Agreed.

> In summary, I think we need an SQL-level command for this.

I think a SQL command is an actively bad idea.  It'll just add development
and maintenance overhead that we don't need.  When I worked on this topic
years ago at Salesforce, I had things set up with simple functions, which
pg_dump would invoke by writing more or less

SELECT pg_catalog.load_statistics();

This has a number of advantages, not least of which is that an extension
could plausibly add compatible functions to older versions.  The trick,
as you say, is to figure out what the argument lists ought to be.
Unfortunately I recall few details of what I wrote for Salesforce,
but I think I had it broken down in a way where there was a separate
function call occurring for each pg_statistic "slot", thus roughly

load_statistics(table regclass, attname text, stakind int, stavalue ...);

I might have had a separate load_statistics_xxx function for each
stakind, which would ease the issue of deciding what the datatype
of "stavalue" is.  As mentioned already, we'd also need some sort of
version identifier, and we'd expect the load_statistics() functions
to be able to transform the data if the old version used a different
representation.  I agree with the idea that an explicit representation
of the source table attribute's type would be wise, too.
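To make the shape concrete, a dump might then emit per-slot calls along these
lines (the function name, argument list, and values are purely hypothetical,
just to illustrate the idea sketched above):

SELECT pg_catalog.load_statistics_mcv(
        'public.orders'::regclass,              -- source table
        'status',                               -- attribute name
        1,                                      -- stats format version
        '{shipped,pending,returned}'::text[],   -- stavalues, in text form
        '{0.6,0.3,0.1}'::float4[]);             -- stanumbers (frequencies)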

regards, tom lane




Two small bugs in guc.c

2023-12-26 Thread Tom Lane
I investigated the report at [1] about pg_file_settings not reporting
invalid values of "log_connections".  It turns out it's broken for
PGC_BACKEND and PGC_SU_BACKEND parameters, but not other ones.
The cause is a bit of premature optimization in this logic:

 * If a PGC_BACKEND or PGC_SU_BACKEND parameter is changed in
 * the config file, we want to accept the new value in the
 * postmaster (whence it will propagate to
 * subsequently-started backends), but ignore it in existing
 * backends. ...

Upon detecting that case, set_config_option just returns -1 immediately
without bothering to validate the value.  It should check for invalid
input before returning -1, which we can mechanize with a one-line fix:

-return -1;
+changeVal = false;

While studying this, I also noted that the bit to prevent changes in
parallel workers seems seriously broken:

if (IsInParallelMode() && changeVal && action != GUC_ACTION_SAVE)
ereport(elevel,
(errcode(ERRCODE_INVALID_TRANSACTION_STATE),
 errmsg("cannot set parameters during a parallel operation")));

This is evidently assuming that ereport() won't return, which seems
like a very dubious assumption given the various values that elevel
can have.  Maybe it's accidentally true -- I don't recall any
reports of trouble here -- but it sure looks fragile.

Hence, proposed patch attached.

regards, tom lane

[1] 
https://www.postgresql.org/message-id/flat/CAK30z9T9gaF_isNquccZxi7agXCSjPjMsFXiifmkfu4VpZguxw%40mail.gmail.com

diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 959a1c76bf..4126b90ad7 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -3412,9 +3412,12 @@ set_config_with_handle(const char *name, config_handle *handle,
 	 * Other changes might need to affect other workers, so forbid them.
 	 */
 	if (IsInParallelMode() && changeVal && action != GUC_ACTION_SAVE)
+	{
 		ereport(elevel,
 (errcode(ERRCODE_INVALID_TRANSACTION_STATE),
  errmsg("cannot set parameters during a parallel operation")));
+		return -1;
+	}
 
 	/* if handle is specified, no need to look up option */
 	if (!handle)
@@ -3513,7 +3516,9 @@ set_config_with_handle(const char *name, config_handle *handle,
  * postmaster (whence it will propagate to
  * subsequently-started backends), but ignore it in existing
  * backends.  This is a tad klugy, but necessary because we
- * don't re-read the config file during backend start.
+ * don't re-read the config file during backend start.  Handle
+ * this by clearing changeVal but continuing, since we do want
+ * to report incorrect values.
  *
  * In EXEC_BACKEND builds, this works differently: we load all
  * non-default settings from the CONFIG_EXEC_PARAMS file
@@ -3526,7 +3531,7 @@ set_config_with_handle(const char *name, config_handle *handle,
  * applies.
  */
 if (IsUnderPostmaster && !is_reload)
-	return -1;
+	changeVal = false;
 			}
 			else if (context != PGC_POSTMASTER &&
 	 context != PGC_BACKEND &&


Re: Moving forward with TDE

2023-12-26 Thread Bruce Momjian
On Sun, Dec 17, 2023 at 06:30:50AM +, Chris Travers wrote:
> Hi,
> 
> I was re-reading the patches here  and there was one thing I didn't 
> understand.
> 
> There are provisions for a separation of data encryption keys for primary and 
> replica I see, and these share a single WAL key.
> 
> But if I am setting up a replica from the primary, and the primary is already 
> encrypted, then do these forcibly share the same data encryption keys?  Is
> there a need to have (possibly in a follow-up patch) an ability to decrypt 
> and re-encrypt in pg_basebackup (which would need access to both keys) or is 
> this handled already and I just missed it?

Yes, decrypt and re-encrypt in pg_basebackup would be necessary, or in
the actual protocol stream.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  Only you can decide what is important to you.




Re: Statistics Import and Export

2023-12-26 Thread Bruce Momjian
On Tue, Dec 26, 2023 at 02:18:56AM +0100, Tomas Vondra wrote:
> interfaces
> --
> 
> When I thought about the ability to dump/load statistics in the past, I
> usually envisioned some sort of DDL that would do the export and import.
> So for example we'd have EXPORT STATISTICS / IMPORT STATISTICS commands,
> or something like that, and that'd do all the work. This would mean
> stats are "first-class citizens" and it'd be fairly straightforward to
> add this into pg_dump, for example. Or at least I think so ...
> 
> Alternatively we could have the usual "functional" interface, with a
> functions to export/import statistics, replacing the DDL commands.
> 
> Unfortunately, none of this works for the pg_upgrade use case, because
> existing cluster versions would not support this new interface, of
> course. That's a significant flaw, as it'd make this useful only for
> upgrades of future versions.
> 
> So I think for the pg_upgrade use case, we don't have much choice other
> than using "custom" export through a view, which is what the patch does.
> 
> However, for the other use case (tweaking optimizer stats) this is not
> really an issue - that always happens on the same instance, so no issue
> with not having the "export" function and so on. I'd bet there are more
> convenient ways to do this than using the export view. I'm sure it could
> share a lot of the infrastructure, ofc.
> 
> I suggest we focus on the pg_upgrade use case for now. In particular, I
> think we really need to find a good way to integrate this into
> pg_upgrade. I'm not against having custom CLI commands, but it's still a
> manual thing - I wonder if we could extend pg_dump to dump stats, or
> make it built-in into pg_upgrade in some way (possibly disabled by
> default, or something like that).

I have some thoughts on this too.  I understand the desire to add
something that can be used for upgrades _to_ PG 17, but I am concerned
that this will give us a cumbersome API that will hamper future
development.  I think we should develop the API we want, regardless of
how useful it is for upgrades _to_ PG 17, and then figure out what
short-term hacks we can add to get it working for upgrades _to_ PG 17; 
these hacks can eventually be removed.  Even if they can't be removed,
they are export-only and we can continue developing the import SQL
command cleanly, and I think import is going to need the most long-term
maintenance.

I think we need a robust API to handle two cases:

*  changes in how we store statistics
*  changes in how data type values are represented in the statistics

We have had such changes in the past, and I think these two issues are
what have prevented import/export of statistics up to this point.
Developing an API that doesn't cleanly handle these will cause long-term
pain.

In summary, I think we need an SQL-level command for this.  I think we
need to embed the Postgres export version number into the statistics
export file (maybe in the COPY header), and then load the file via COPY
internally (not JSON) into a temporary table that we know matches the
exported Postgres version.  We then need to use SQL to make any
adjustments to it before loading it into pg_statistic.  Doing that
internally in JSON just isn't efficient.  If people want JSON for such
cases, I suggest we add a JSON format to COPY.
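
As a rough illustration of the shape being discussed here (not a proposed
interface), the export side could be little more than a COPY of a query that
carries the exporting server's version along with some per-column statistics;
the file path, table name and column selection below are purely illustrative:

    COPY (
        SELECT current_setting('server_version_num') AS export_version,
               c.relname,
               a.attname,
               s.stanullfrac, s.stawidth, s.stadistinct
        FROM pg_statistic s
        JOIN pg_class c     ON c.oid = s.starelid
        JOIN pg_attribute a ON a.attrelid = s.starelid
                           AND a.attnum = s.staattnum
        WHERE c.relname = 'my_table'
    ) TO '/tmp/my_table_stats.copy';

The import side would then COPY such a file into a staging table keyed by
export_version, adjust the rows in SQL for any representation changes, and
only then write into pg_statistic, matching the flow described above.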

I think we can then look at pg_upgrade to see if we can simulate the
export action which can use the statistics import SQL command.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  Only you can decide what is important to you.




Fix Brin Private Spool Initialization (src/backend/access/brin/brin.c)

2023-12-26 Thread Ranier Vilela
Hi,

I think commit b437571 has an oversight.
When allocating memory and initializing the private spool in the function
_brin_leader_participate_as_worker, the heap and index fields of bs_spool
are left empty.

The code affected is:
  buildstate->bs_spool = (BrinSpool *) palloc0(sizeof(BrinSpool));
- buildstate->bs_spool->heap = buildstate->bs_spool->heap;
- buildstate->bs_spool->index = buildstate->bs_spool->index;
+ buildstate->bs_spool->heap = heap;
+ buildstate->bs_spool->index = index;

Is the fix correct?

best regards,
Ranier Vilela


001-fix-brin-private-spool-initialization.patch
Description: Binary data


Re: Add the ability to limit the amount of memory that can be allocated to backends.

2023-12-26 Thread Tomas Vondra
On 12/26/23 11:49, Anton A. Melnikov wrote:
> Hello!
> 
> Earlier in this thread, pgbench results were published where, with a
> strong memory limit of 100MB, a significant decrease in TPS of about
> 10% was observed [1].
> 
> Using a dedicated server with 12GB RAM and the methodology described in
> [3], I performed five series of measurements for the patches from [2].

Can you share some info about the hardware? For example the CPU model,
number of cores, and so on. 12GB RAM is not quite huge, so presumably it
was a small machine.

> The series were like this:
> 1) unpatched 16th version at the REL_16_BETA1 (e0b82fc8e83) as close to
> [2] in time.
> 2) patched REL_16_BETA1 at e0b82fc8e83 with undefined
> max_total_backend_memory GUC (with default value = 0).
> 3) patched REL_16_BETA1 with max_total_backend_memory = 16GB
> 4) the same with max_total_backend_memory = 8GB
> 5) and again with max_total_backend_memory = 200MB
> 

OK

> Measurements with max_total_backend_memory = 100MB were not carried out;
> with a 100MB limit the server gave an error on startup:
> FATAL:  configured max_total_backend_memory 100MB is <=
> shared_memory_size 143MB
> So I used 200MB to keep all other GUCs the same.
> 

I'm not very familiar with the patch yet, but this seems a bit strange.
Why should shared_buffers be included this limit?

> Pgbench gave the following results:
> 1) and 2) were almost the same: ~6350 TPS. See the orange and green
> distributions on the attached graph.png respectively.
> 3) and 4) were identical to each other (~6315 TPS) and a bit slower than 1)
> and 2) by ~0.6%.
> See the blue and yellow distributions respectively.
> 5) is slightly slower (~6285 TPS) than 3) and 4) by another 0.5% (grey
> distribution).
> The standard error in all series was ~0.2%. The raw data is in
> raw_data.txt.
> 

I think 6350 is a pretty terrible number, especially for scale 8, which
is maybe 150MB of data. I think that's a pretty clear sign the system
was hitting some other bottleneck, which can easily mask regressions in
the memory allocation code. AFAICS the pgbench runs were regular r/w
benchmarks, so I'd bet it was hitting I/O, and possibly even subject to
some random effects at that level.

I think what would be interesting are runs with

   pgbench -M prepared -S -c $N -j $N

i.e. read-only tests (to not hit I/O), and $N being sufficiently large
to maybe also show some concurrency/locking bottlenecks, etc.
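
For instance, a concrete run of that shape might look like the following
(scale, duration and client/thread counts are only illustrative and would
need to be sized to the machine):

    pgbench -i -s 100 bench
    pgbench -n -M prepared -S -T 60 -c 64 -j 64 -P 10 bench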

I may do some benchmarks if I happen to find a bit of time, but maybe
you could collect such numbers too?

The other benchmark that might be interesting is more OLAP, with low
concurrency but backends allocating a lot of memory.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: A tiny improvement of psql

2023-12-26 Thread Tom Lane
Kevin Wang  writes:
> As you know, we can use the UP arrow key to get the previous command to
> avoid extra typing. This is a wonderful feature to save the lives of every
> DBA. However, if I type the commands like this sequence: A, B, B, B, B, B,
> B,  as you can see, B is the last command I execute.

> But if I try to get command A, I have to press the UP key 7 times.  I think
> the best way is: when you press the UP key, psql should show the command
> that is different from the previous command, so the recall sequence should
> be B -> A, not B -> B -> ... -> A.  Then I only press the UP key 2 times to
> get command A.

This is driven by libreadline, not anything we control.  I have
seen the behavior you describe in some other programs, so I wonder
whether it's configurable.

> Another requirement is: could we use / to repeat executing the last command
> in psql just like sqlplus in Oracle?

I'm pretty certain you can configure this for yourself with readline.

regards, tom lane
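
As a side note, much of this is already reachable with stock psql features;
a minimal example, assuming a psql built with readline support:

    -- in ~/.psqlrc: don't store a line that matches the previous history
    -- line, so UP skips over consecutive repeats of the same command
    \set HISTCONTROL ignoredups

    -- and \g with an empty query buffer re-runs the most recent query,
    -- which is close to sqlplus's "/"
    SELECT now();
    \g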




Update docs for default value of fdw_tuple_cost

2023-12-26 Thread Umair Shahid
Hi

Commit cac169d686eddb277880a0d8a760ac3007b4846a updated the default value
of fdw_tuple_cost from 0.01 to 0.2. The attached patch updates the docs to
reflect this change.


Thanks!

Umair Shahid | Founder


Professional Services for PostgreSQL
https://stormatics.tech/


update-doc-default-fdw_tuple_value.patch
Description: Binary data


A tiny improvement of psql

2023-12-26 Thread Kevin Wang
Hello hackers!

I am an Oracle/PostgreSQL DBA, I am not a PG hacker.  During my daily job,
I find a pain that should be fixed.

As you know, we can use the UP arrow key to get the previous command to
avoid extra typing. This is a wonderful feature to save the lives of every
DBA. However, if I type the commands like this sequence: A, B, B, B, B, B,
B,  as you can see, B is the last command I execute.

But if I try to get command A, I have to press the UP key 7 times.  I think
the best way is: when you press the UP key, psql should show the command
that is different from the previous command, so the recall sequence should
be B -> A, not B -> B -> ... -> A.  Then I only press the UP key 2 times to
get command A.

I think this should change little code in psql, but it will make all DBA's
lives much easier.  This is a strong requirement from the real DBA. Hope to
get some feedback on this.

Another requirement is: could we use / to repeat executing the last command
in psql just like sqlplus in Oracle?

I will try to learn how to fix it sooner or later, but if some
proficient hacker focuses on this, it can be fixed quickly, I guess.

Thoughts?

Regards,

Kevin


Re: Track in pg_replication_slots the reason why slots conflict?

2023-12-26 Thread Isaac Morland
On Thu, 21 Dec 2023 at 09:26, Amit Kapila  wrote:


> > A conflicting column where NULL indicates no conflict, and other
> > values indicate the reason for the conflict, doesn't seem too bad.
> >
>
> This is fine too.
>

I prefer this option. There is precedent for doing it this way, for example
in pg_stat_activity.wait_event_type.

The most common test of this field is likely to be "is there a conflict"
and it's better to write this as "[fieldname] IS NOT NULL" than to
introduce a magic constant. Also, it makes clear to future maintainers that
this field has one purpose: saying what type of conflict there is, if any.
If we find ourselves wanting to record a new non-conflict status (no idea
what that could be: "almost conflict"? "probably conflict soon"?) there
would be less temptation to break existing tests for "is there a conflict".
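
To make that concrete, a single nullable text column keeps the common check
trivial; the column name below is only a placeholder, since the thread has
not settled on one:

    SELECT slot_name, conflict_reason
    FROM pg_replication_slots
    WHERE conflict_reason IS NOT NULL;   -- "is there a conflict, and why?"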


Re: Show WAL write and fsync stats in pg_stat_io

2023-12-26 Thread Nazir Bilal Yavuz
Hi,

On Tue, 26 Dec 2023 at 13:10, Michael Paquier  wrote:
>
> On Tue, Dec 26, 2023 at 11:27:16AM +0300, Nazir Bilal Yavuz wrote:
> > Maybe it is better to create a pg_stat_io_wal view like you said
> > before. We could remove unused columns and add op_bytes for each
> > writes and reads. Also, we can track both the number of bytes and the
> > number of the operations. This doesn't fully solve the problem but it
> > will be easier to modify it to meet our needs.
>
> I am not sure, as the whole point of the exercise is to have all the
> I/O related data in a single view.  Something that I've also found a
> bit disturbing yesterday while looking at your patch is the fact that
> the operation size is guessed from the context and object type when
> querying the view because now everything is tied to BLCKSZ.  This
> patch extends it with two more operation sizes, and there are even
> cases where it may be a variable.  Could it be a better option to
> extend pgstat_count_io_op_time() so as callers can themselves give the
> size of the operation?

Do you mean removing the op_bytes column and tracking the number of
bytes in reads, writes, and extends? If so, that makes sense to me but
I don't want to remove the number of operations; I believe that has a
value too. We can extend the pgstat_count_io_op_time() so it can both
track the number of bytes and the number of operations.
Also, it is not directly related to this patch but vectored IO [1] is
coming soon; so the number of operations could be wrong since vectored
IO could merge a couple of operations.

>
> The whole patch is kind of itself complicated enough, so I'd be OK to
> discard the case of the WAL receiver for now.  Now, if we do so, the
> code stack of pgstat_io.c should handle WAL receivers as something
> entirely disabled until all the known issues are solved.  There is
> still a lot of value in tracking WAL data associated to the WAL
> writer, normal backends and WAL senders.

Why can't we add comments and leave it as it is? Is it because this
could cause misunderstandings?

If we want to entirely disable it, we can add

if (MyBackendType == B_WAL_RECEIVER && io_object == IOOBJECT_WAL)
return;

to the top of the pgstat_count_io_op_time() since all IOOBJECT_WAL
calls are done by this function, then we can disable it at
pgstat_tracks_io_bktype().

[1] 
https://www.postgresql.org/message-id/CA%2BhUKGJkOiOCa%2Bmag4BF%2BzHo7qo%3Do9CFheB8%3Dg6uT5TUm2gkvA%40mail.gmail.com

--
Regards,
Nazir Bilal Yavuz
Microsoft




Re: Synchronizing slots from primary to standby

2023-12-26 Thread shveta malik
On Tue, Dec 26, 2023 at 4:41 PM Zhijie Hou (Fujitsu)
 wrote:
>
> On Wednesday, December 20, 2023 7:37 PM Amit Kapila  
> wrote:
> >
> > On Wed, Dec 20, 2023 at 3:29 PM shveta malik 
> > wrote:
> > >
> > > On Wed, Dec 20, 2023 at 9:12 AM Amit Kapila 
> > wrote:
> > > >
> > > > On Tue, Dec 19, 2023 at 5:30 PM shveta malik 
> > wrote:
> > > > >
> > > > > Thanks for reviewing. I have addressed these in v50.
> > > > >
> > > >
> > > > I was looking at this patch to see if something smaller could be
> > > > independently committable. I think we can extract
> > > > pg_get_slot_invalidation_cause() and commit it as that function
> > > > could be independently useful as well. What do you think?
> > > >
> > >
> > > Sure, forked another thread [1]
> > > [1]:
> > >
> > > https://www.postgresql.org/message-id/CAJpy0uBpr0ym12%2B0mXpjcRFA6N%3DanX%2BYk9aGU4EJhHNu%3DfWykQ%40mail.gmail.com
> > >
> >
> > Thanks, thinking more, we can split the patch into the following three 
> > patches
> > which can be committed separately (a) Allowing the failover property to be 
> > set
> > for a slot via SQL API and subscription commands
> > (b) sync slot worker infrastructure (c) GUC standby_slot_names and the
> > corresponding wait logic on the server side.
> >
> > Thoughts?
>
> I agree. Here is the V54 patch set which was split based on the suggestion.
> The commit message in each patch is also improved.
>

I would like to revisit the current dependency of slotsync worker on
dbname used in 002 patch. Currently we accept dbname in
primary_conninfo and thus the user has to make sure to provide one (by
manually altering it) even in case of a conf file auto-generated by
"pg_basebackup -R".
Thus I would like to discuss if there are better ways to do it.
The complete background is as follows:

We need dbname for 2 purposes:

1) to connect to remote db in order to run SELECT queries to fetch the
info needed by slotsync worker.
2) to make connection in slot-sync worker itself in order to be able
to use libpq APIs for 1)

We currently run 3 kinds of SELECT queries in the slot-sync worker:

a) To fetch all failover slots (logical slots) info at once in
synchronize_slots().
b) To fetch a particular slot info during
wait_for_primary_slot_catchup() logic (logical slot).
c) To validate primary slot (physical one) and also to distinguish
between standby and cascading standby by running pg_is_in_recovery().

 1) One approach to avoid dependency on dbname is using commands
instead of SELECT.  This will need implementing LIST_SLOTS command for
a), and for b) we can use LIST_SLOTS and fetch everything (even though
it is not needed) or have LIST_SLOTS with a filter on slot-name or
extend READ_REPLICATION_SLOT,  and for c) we can have some other
command to get pg_is_in_recovery() info. But I feel that by relying on
commands we will be making it difficult to extend the slot-sync feature.
In the future, if there is a requirement to fetch any other info, we
would have to implement a command for that too. I am not sure that is a
good and extensible approach.

2) Another way to avoid asking for a dbname in primary_conninfo is to
use the default dbname internally. This brings us to two questions:
'How' and 'Which default db'?

2.1) To answer 'How':
Using default dbname is simpler for the purpose of slot-sync worker
having its own db-connection, but is a little tricky for the purpose
of connection to remote_db. This is because we have to inject this
dbname internally in our connection-info.

2.1.1) Say we use primary_conninfo (i.e. original one w/o dbname),
then currently it could have 2 formats:

a) The simple "=" format for key-value pairs, example:
'user=replication host=127.0.0.1 port=5433 dbname=postgres'.
b) URI format, example:
postgresql://other@localhost/otherdb?connect_timeout=10&application_name=myapp

We can distinguish between the 2 formats using 'uri_prefix_length', but
injecting the dbname part will be messy, especially for the URI format.  If
we want to do it without injecting, and only by changing the libpq
interfaces to accept dbname separately from the conninfo, then there is
no simpler way currently available. It would need a good amount of
changes in libpq.
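
(One observation worth noting here: libpq's PQconninfoParse() already
understands both the key/value and the URI spellings, so a "does the
conninfo already carry a dbname?" check can be written against it. The
sketch below is only an illustration of that, with error handling trimmed,
not something the patch currently does.)

    #include <stdbool.h>
    #include <string.h>
    #include <libpq-fe.h>

    static bool
    conninfo_has_dbname(const char *conninfo)
    {
        char       *errmsg = NULL;
        PQconninfoOption *opts = PQconninfoParse(conninfo, &errmsg);
        bool        found = false;

        if (opts == NULL)
        {
            if (errmsg)
                PQfreemem(errmsg);      /* unparsable conninfo */
            return false;
        }

        for (PQconninfoOption *opt = opts; opt->keyword != NULL; opt++)
        {
            if (strcmp(opt->keyword, "dbname") == 0 &&
                opt->val != NULL && opt->val[0] != '\0')
            {
                found = true;
                break;
            }
        }

        PQconninfoFreeMem(opts);
        return found;
    }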

2.1.2) Another way is to not rely on primary_conninfo directly but
rely on 'WalRcv->conninfo' in order to connect to remote_db. This is
because the latter is never URI format, it is some parsed format and
appending may work. As an example, primary_conninfo =
'postgresql://replication@localhost:5433', WalRcv->conninfo loaded
internally is:
"user=replication passfile=/home/shveta/.pgpass channel_binding=prefer
dbname=replication host=localhost port=5433
fallback_application_name=walreceiver sslmode=prefer sslcompression=0
sslcertmode=allow sslsni=1 ssl_min_protocol_version=TLSv1.2
gssencmode=disable krbsrvname=postgres gssdelegation=0
target_session_attrs=any load_balance_hosts=disable", '\000'

So we can try appending our default dbname to this. But all the
defaults loaded in WalRcv->conninfo need some careful 

Re: No LINGUAS file yet for pg_combinebackup

2023-12-26 Thread Michael Paquier
On Mon, Dec 25, 2023 at 12:48:17PM -0500, Tom Lane wrote:
> I don't particularly care to see that warning until whenever
> it is that the translations first get populated.  Perhaps
> we should hack up nls-global.mk to hide the warning, but
> that might bite somebody someday.  I'm inclined to just
> add an empty LINGUAS file as a temporary measure.
> Thoughts?

I've noticed this noise as well, so +1 for the temporary empty file.
--
Michael


signature.asc
Description: PGP signature


Re: Intermittent failure with t/003_logical_slots.pl test on windows

2023-12-26 Thread Shlok Kyal
Hi,
The same intermittent failure is reproducible on my system.
Investigating the intermittent issues, I found that many of them are due
to commands like 'psql -V' not returning any output.
To reproduce it in an easy way, I wrote a script (.bat file) that runs
different binaries with the '--version' option, and found that it gave
no output for some commands (which ones varies for each run).
Then I ran the same script after adding 'fflush(stdout)' to the
function called for the '--version' option, and it started to give
output for each command.
I noticed the same for the '--help' option and made the same change there.
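
For readers following along, the shape of the change being described is
roughly this (the function and file are illustrative, not the actual ones
the patch touches; PG_VERSION comes from pg_config.h):

    #include <stdio.h>
    #include "pg_config.h"          /* for PG_VERSION */

    static void
    print_version(const char *progname)
    {
        printf("%s (PostgreSQL) %s\n", progname, PG_VERSION);

        /*
         * The report above is that, on Windows, piped output sometimes
         * never reaches the reader unless it is flushed explicitly.
         */
        fflush(stdout);
    }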

I have attached the test script (extension changed to .txt as Gmail
blocks it) and the output of the test before the changes.
I have also attached the patch with the changes that resolved the above issue.

This change has resolved most of the intermittent issues for me. I am
still facing some more intermittent issues and will analyse and share
them as well.

Thanks and regards
Shlok Kyal

On Tue, 7 Nov 2023 at 11:05, Kyotaro Horiguchi  wrote:
>
> At Mon, 6 Nov 2023 19:42:21 +0530, Nisha Moond  
> wrote in
> > > Appending '2>&1 test:
> > > The command still results in NULL and ends up failing as no data is
> > > returned. Which means even no error message is returned. The error log
>
> Thanks for confirmation. So, at least the child process was launched
> successfully in cmd.exe's view.
>
> Upon a quick check on my end with Windows' _popen, I have observed the
> following:
>
> - Once a child process is started, it seems to go undetected as an
>   error by _popen or subsequent fgets calls if the process ends
>   abnormally, with a non-zero exit status or even with a SEGV.
>
> - After the child process has flushed data to stdout, it is possible
>   to read from the pipe even if the child process crashes or ends
>   thereafter.
>
> - Even if fgets is called before the program starts, it will correctly
>   block until the program outputs something. Specifically, when I used
>   popen("sleep 5 & target.exe") and immediately performed fgets on the
>   pipe, I was able to read the output of target.exe as the first line.
>
> Therefore, based on the information available, it is conceivable that
> the child process was killed by something right after it started, or
> the program terminated on its own without any error messages.
>
> By the way, in the case of aforementioned SEGV, Application Errors
> corresponding to it were identifiable in the Event
> Viewer. Additionally, regarding the exit statuses, they can be
> captured by using a wrapper batch file (.bat) that records
> %ERRORLEVEL% after running the target program. This may yield
> insights, although its effectiveness is not guaranteed.
>
> regards.
>
> --
> Kyotaro Horiguchi
> NTT Open Source Software Center
>
>

D:\project\pg_meson_64\bin>pg_ctl -V   
pg_ctl (PostgreSQL) 17devel

D:\project\pg_meson_64\bin>pg_controldata -V  
pg_controldata (PostgreSQL) 17devel

D:\project\pg_meson_64\bin>pg_dump -V  
pg_dump (PostgreSQL) 17devel

D:\project\pg_meson_64\bin>pg_dumpall -V  
pg_dumpall (PostgreSQL) 17devel

D:\project\pg_meson_64\bin>pg_ctl -V  
pg_ctl (PostgreSQL) 17devel

D:\project\pg_meson_64\bin>pg_controldata -V  
pg_controldata (PostgreSQL) 17devel

D:\project\pg_meson_64\bin>pg_dump -V  
pg_dump (PostgreSQL) 17devel

D:\project\pg_meson_64\bin>pg_dumpall -V  
pg_dumpall (PostgreSQL) 17devel

D:\project\pg_meson_64\bin>pg_ctl -V  
pg_ctl (PostgreSQL) 17devel

D:\project\pg_meson_64\bin>pg_controldata -V  
pg_controldata (PostgreSQL) 17devel

D:\project\pg_meson_64\bin>pg_dump -V  
pg_dump (PostgreSQL) 17devel

D:\project\pg_meson_64\bin>pg_dumpall -V  
pg_dumpall (PostgreSQL) 17devel

D:\project\pg_meson_64\bin>pg_ctl -V  
pg_ctl (PostgreSQL) 17devel

D:\project\pg_meson_64\bin>pg_controldata -V  
pg_controldata (PostgreSQL) 17devel

D:\project\pg_meson_64\bin>pg_dump -V  
pg_dump (PostgreSQL) 17devel

D:\project\pg_meson_64\bin>pg_dumpall -V  
pg_dumpall (PostgreSQL) 17devel

D:\project\pg_meson_64\bin>pg_ctl -V  
pg_ctl (PostgreSQL) 17devel

D:\project\pg_meson_64\bin>pg_controldata -V  
pg_controldata (PostgreSQL) 17devel

D:\project\pg_meson_64\bin>pg_dump -V  
pg_dump (PostgreSQL) 17devel

D:\project\pg_meson_64\bin>pg_dumpall -V  
pg_dumpall (PostgreSQL) 17devel

D:\project\pg_meson_64\bin>pg_ctl -V  
pg_ctl (PostgreSQL) 17devel

D:\project\pg_meson_64\bin>pg_controldata -V  

D:\project\pg_meson_64\bin>pg_dump -V  
pg_dump (PostgreSQL) 17devel

D:\project\pg_meson_64\bin>pg_dumpall -V  
pg_dumpall (PostgreSQL) 17devel

D:\project\pg_meson_64\bin>pg_ctl -V  
pg_ctl (PostgreSQL) 17devel

D:\project\pg_meson_64\bin>pg_controldata -V  
pg_controldata (PostgreSQL) 17devel

D:\project\pg_meson_64\bin>pg_dump -V  
pg_dump (PostgreSQL) 17devel

D:\project\pg_meson_64\bin>pg_dumpall -V  
pg_dumpall (PostgreSQL) 17devel

D:\project\pg_meson_64\bin>pg_ctl -V  
pg_ctl (PostgreSQL) 17devel

D:\project\pg_meson_64\bin>pg_controldata -V  
pg_controldata 

Re: POC: GROUP BY optimization

2023-12-26 Thread Andrei Lepikhov

On 21/12/2023 17:53, Alexander Korotkov wrote:

On Sun, Oct 1, 2023 at 11:45 AM Andrei Lepikhov
 wrote:

New version of the patch. Fixed minor inconsistencies and rebased onto
current master.

Thank you (and other authors) for working on this subject.  Indeed, the
GROUP BY clauses are order-agnostic.  Reordering them into the most
suitable order could give us significant query planning benefits.  I
went through the thread: I see significant work has been already made
on this patch, the code is quite polished.
Maybe, but the issues mentioned in [1] are still not resolved. That is the
only reason why this thread hasn't been active.

I'd like to make some notes.
1) As already mentioned, there is clearly a repetitive pattern for the
code following after get_useful_group_keys_orderings() calls.  I think
it would be good to extract it into a separate function.  Please, do
this as a separate patch coming before the group-by patch. That would
simplify the review.
Yeah, these parts of the code are a bit different. I will try to make a
common routine.

2) I wonder what planning overhead this patch could introduce?  Could
you try to measure the worst case?  What if we have a table with a lot
of indexes and a long list of group-by clauses partially patching
every index.  This should give us an understanding on whether we need
a separate GUC to control this feature.

Ok.

3) I see that get_useful_group_keys_orderings() makes 3 calls to
get_cheapest_group_keys_order() function.  Each time
get_cheapest_group_keys_order() performs the cost estimate and
reorders the free keys.  However, cost estimation implies the system
catalog lookups (that is quite expensive).  I wonder if we could
change the algorithm.  Could we just sort the group-by keys by cost
once, save this ordering and then just re-use it.  So, every time we
need to reorder a group by, we can just pull the required keys to the
top and use saved ordering for the rest.  I also wonder if we could do
this once for add_paths_to_grouping_rel() and
create_partial_grouping_paths() calls.  So, it probably should be
somewhere in create_ordinary_grouping_paths().
Thanks for the idea!

4) I think we can do some optimizations when enable_incremental_sort
== off.  Then in get_useful_group_keys_orderings() we should only deal
with input_path fully matching the group-by clause, and try only full
match of group-by output to the required order.
Oh, this was designed before incremental sort was invented. We will see
what we can do here.


[1] 
https://www.postgresql.org/message-id/60610df1-c32f-ebdf-e58c-7a664431f452%40enterprisedb.com


--
regards,
Andrei Lepikhov
Postgres Professional





An improvement on parallel DISTINCT

2023-12-26 Thread Richard Guo
While reviewing Heikki's Omit-junk-columns patchset[1], I noticed that
root->upper_targets[] is used to set target for partial_distinct_rel,
which is not great because root->upper_targets[] is not supposed to be
used by the core code.  The comment in grouping_planner() says:

  * Save the various upper-rel PathTargets we just computed into
  * root->upper_targets[].  The core code doesn't use this, but it
  * provides a convenient place for extensions to get at the info.

Then while fixing this issue, I noticed an opportunity for improvement
in how we generate Gather/GatherMerge paths for the two-phase DISTINCT.
The Gather/GatherMerge paths are added by generate_gather_paths(), which
does not consider ordering that might be useful above the GatherMerge
node.  This can be improved by using generate_useful_gather_paths()
instead.  With this change I can see query plan improvement from the
regression test "select_distinct.sql".  For instance,

-- Test parallel DISTINCT
SET parallel_tuple_cost=0;
SET parallel_setup_cost=0;
SET min_parallel_table_scan_size=0;
SET max_parallel_workers_per_gather=2;

-- Ensure we get a parallel plan
EXPLAIN (costs off)
SELECT DISTINCT four FROM tenk1;

-- on master
EXPLAIN (costs off)
SELECT DISTINCT four FROM tenk1;
 QUERY PLAN
----------------------------------------------
 Unique
   ->  Sort
 Sort Key: four
 ->  Gather
   Workers Planned: 2
   ->  HashAggregate
 Group Key: four
 ->  Parallel Seq Scan on tenk1
(8 rows)

-- on patched
EXPLAIN (costs off)
SELECT DISTINCT four FROM tenk1;
 QUERY PLAN
----------------------------------------------
 Unique
   ->  Gather Merge
 Workers Planned: 2
 ->  Sort
   Sort Key: four
   ->  HashAggregate
 Group Key: four
 ->  Parallel Seq Scan on tenk1
(8 rows)

I believe the second plan is better.

Attached is a patch that includes this change and also eliminates the
usage of root->upper_targets[] in the core code.  It also makes some
tweaks for the comment.

Any thoughts?

[1]
https://www.postgresql.org/message-id/flat/2ca5865b-4693-40e5-8f78-f3b45d5378fb%40iki.fi

Thanks
Richard


v1-0001-Improve-parallel-DISTINCT.patch
Description: Binary data


Re: Show WAL write and fsync stats in pg_stat_io

2023-12-26 Thread Michael Paquier
On Tue, Dec 26, 2023 at 11:27:16AM +0300, Nazir Bilal Yavuz wrote:
> Maybe it is better to create a pg_stat_io_wal view like you said
> before. We could remove unused columns and add op_bytes for each
> writes and reads. Also, we can track both the number of bytes and the
> number of the operations. This doesn't fully solve the problem but it
> will be easier to modify it to meet our needs.

I am not sure, as the whole point of the exercise is to have all the
I/O related data in a single view.  Something that I've also found a
bit disturbing yesterday while looking at your patch is the fact that
the operation size is guessed from the context and object type when
querying the view because now everything is tied to BLCKSZ.  This
patch extends it with two more operation sizes, and there are even
cases where it may be a variable.  Could it be a better option to
extend pgstat_count_io_op_time() so as callers can themselves give the
size of the operation?

The whole patch is kind of itself complicated enough, so I'd be OK to
discard the case of the WAL receiver for now.  Now, if we do so, the
code stack of pgstat_io.c should handle WAL receivers as something
entirely disabled until all the known issues are solved.  There is
still a lot of value in tracking WAL data associated to the WAL
writer, normal backends and WAL senders.
--
Michael


signature.asc
Description: PGP signature


Re: Tab complete for CREATE SUBSCRIPTION ... CONECTION does not work

2023-12-26 Thread Shubham Khanna
On Tue, Dec 26, 2023 at 3:02 PM Japin Li  wrote:
>
>
> Hi hacker,
>
> As $subject details, tab-completion does not work for commands such as:
>
>CREATE SUBSCRIPTION sub CONNECTION 'dbname=postgres port=6543' \t
>
> It seems that get_previous_words() cannot parse the single quote.
>
> OTOH, it works for CREATE SUBSCRIPTION sub CONNECTION xx \t. Should we fix it?
>
I reviewed the Patch and it looks fine to me.

Thanks and Regards,
Shubham Khanna.




Re: Synchronizing slots from primary to standby

2023-12-26 Thread Amit Kapila
On Tue, Dec 26, 2023 at 3:00 PM Hayato Kuroda (Fujitsu)
 wrote:
>
> > I think we should be able to detect it if we want but do we want to
> > add this restriction considering that users can always install the
> > required plugins after standby gets promoted? I think we can do either
> > way in this case but as we are not going to use these slots till the
> > standby node is promoted, it seems okay to validate the plugins after
> > promotion once users use the synced slots.
>
> Personally, I think it should be detected, but I want to hear opinions from others.
> Below are my reasons:
>
> 1)
> We can avoid the possibility that users miss installing the plugins.
> Basically, we should detect the problem before it actually occurs.
>
> 2)
> Rules around here might be inconsistent. Slots which will be synchronized can 
> be
> created either way:
>
> a) manual creation via SQL function, or
> b) automatic creation by slotsync worker.
>
> In case a), the decoding context is created at slot creation, so the plugin
> must be installed. In case b), however, we allow the plugin not to be
> installed beforehand. I felt this might be confusing for users. Thoughts?
>

I think the (a) way could lead to the setting of incorrect LSNs
(restart_LSN and confirmed_flush_lsn) considering they are not copied
from the primary.

-- 
With Regards,
Amit Kapila.




Re: pg_basebackup has an accidentaly separated help message

2023-12-26 Thread Michael Paquier
On Mon, Dec 25, 2023 at 05:07:28PM +0900, Kyotaro Horiguchi wrote:
> Yes. So, it turns out that they're found after they have been
> committed.

No problem.  I've just applied what you had.  I hope this makes your
life a bit easier ;)
--
Michael


signature.asc
Description: PGP signature


Specify description of the SpecialJoinInfo structure

2023-12-26 Thread Andrei Lepikhov

Hi,

Working on Asymmetric Join, I found a slight inconsistency in the
description of SpecialJoinInfo: join type JOIN_ANTI can be accompanied
by a zero value of ojrelid if the join was created by the
transformation of a NOT EXISTS subquery.


--
regards,
Andrei Lepikhov
Postgres Professional

diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index ed85dc7414..189b59f127 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -2791,7 +2791,8 @@ typedef struct PlaceHolderVar
  *
  * ojrelid is the RT index of the join RTE representing this outer join,
  * if there is one.  It is zero when jointype is INNER or SEMI, and can be
- * zero for jointype ANTI (if the join was transformed from a SEMI join).
+ * zero for jointype ANTI (if the join was transformed from a SEMI join or
+ * converted from a sublink).
  * One use for this field is that when constructing the output targetlist of a
  * join relation that implements this OJ, we add ojrelid to the varnullingrels
  * and phnullingrels fields of nullable (RHS) output columns, so that the


RE: Synchronizing slots from primary to standby

2023-12-26 Thread Hayato Kuroda (Fujitsu)
Dear Amit,

> I think we should be able to detect it if we want but do we want to
> add this restriction considering that users can always install the
> required plugins after standby gets promoted? I think we can do either
> way in this case but as we are not going to use these slots till the
> standby node is promoted, it seems okay to validate the plugins after
> promotion once users use the synced slots.

Personally, I think it should be detected, but I want to hear opinions from others.
Below are my reasons:

1)
We can avoid the possibility that users miss installing the plugins.
Basically, we should detect the problem before it actually occurs.

2)
Rules around here might be inconsistent. Slots which will be synchronized can be
created either way:

a) manual creation via SQL function, or
b) automatic creation by slotsync worker.

In case a), the decoding context is created at slot creation, so the plugin
must be installed. In case b), however, we allow the plugin not to be installed
beforehand. I felt this might be confusing for users. Thoughts?

Best Regards,
Hayato Kuroda
FUJITSU LIMITED



Re: planner chooses incremental but not the best one

2023-12-26 Thread ywgrit
Hi,Tomas

Recently, I looked at papers related to estimation of cardinality with
selection. I may be biased towards the scheme provided by the paper
"Distinct Sampling for Highly-Accurate Answers to Distinct Values Queries
and Event Reports". This paper uses distinct sampling as opposed to the
current uniform sampling, and the main differences between the two are as
follows.

1) It not only counts the distinct values on each column or combination of
columns, but also saves some rows corresponding to each distinct value,
i.e., it saves part of the rows of the original relation as samples. The
purpose of saving these rows is to accommodate restrictions on the
queries, such as where clauses.

2) The queries are executed on the samples, and the result of the execution
is used as the statistical value of cardinality.

The advantages of this paper over existing practices are as follows.

1) The samples collected can be applied to arbitrary predicates, e.g.
predicates that are correlated with the columns of group by clauses.

2) The accuracy is very high, and in some scenarios the statistical error
can be reduced by a factor of hundreds compared to uniform sampling.

However, the scheme provided in this paper also has some drawbacks: as
mentioned above, it relies on the collected samples, which will lead to
a significant increase in the storage overhead of the statistics.

I'd like to hear your opinions.

ywgrit.

ywgrit wrote on Fri, Dec 22, 2023 at 16:20:

> The possible solution of one scenario I can come up with so far is the
> query's predicate columns and group columns belonging to one table .
>
> For a query that contains where clause, perhaps num_groups could be
> estimated according to the following formula.
>
> num_groups = ndistinct(pred_col_1, pred_col_2, ... pred_col_n) with where
> clause * ndistinct(pred_col_1, pred_col_2, ... pred_col_n, sort_col_1,
> sort_col_2, ... sort_col_m) / ndistinct(pred_col_1, pred_col_2, ...
> pred_col_n).
>
> ndistinct(pred_col_1, pred_col_2, ... pred_col_n) with where clause =
> ndistinct(pred_var_1, pred_var_2, ... pred_var_n) * selectivity of rel.
>
> pred_col_n belong to the columns involved in the where clause and
> sort_col_m belong to the columns involved in the group by clause.
>
> The reason for multiplying by selectivity of rel directly is that the
> selectivity of rel depends on only pred_col not sort_col. So the above
> formula can be simplified as follows.
>
> num_groups = ndistinct(pred_col_1, pred_col_2, ... pred_col_n, sort_col_1,
> sort_col_2, ... sort_col_m) * selectivity of rel.
>
> The correctness of the above formula depends on the following conditions.
>
> - ndistinct(pred_col_1, pred_col_2, ... pred_col_n) and
>   ndistinct(pred_col_1, pred_col_2, ... pred_col_n, sort_col_1, sort_col_2,
>   ... sort_col_m) statistics already exist, and need to be accurate.
>
> - Both pred_col_n and sort_col_m are uniformly distributed; if not,
>   statistics such as MCV are needed for correction.
>
> - The tuples of rel are the total number of tuples of the table, not
>   the number of filtered tuples.
>
> After experimentation, in the scenario mentioned in the previous thread,
> the estimated num_groups is 3. The accuracy of the result strongly relies
> on the uniform distribution of b, which makes it possible to estimate
> ndistinct(pred_col_1, pred_col_2, ... pred_col_n) with the where clause
> accurately.
>
> I'd like to hear your opinions.
>
> Regards.
>
> ywgrit.
>
> Tomas Vondra wrote on Mon, Dec 18, 2023 at 20:53:
>
>>
>>
>> On 12/18/23 11:40, Richard Guo wrote:
>> >
>> > On Mon, Dec 18, 2023 at 7:31 AM Tomas Vondra
>> > wrote:
>> >
>> > Oh! Now I see what you meant by using the new formula in 84f9a35e3
>> > depending on how we sum tuples. I agree that seems like the right
>> > thing.
>> >
>> > I'm not sure it'll actually help with the issue, though - if I
>> > apply the patch, the plan does not actually change (and the cost
>> > changes just a little bit).
>> >
>> > I looked at this only very briefly, but I believe it's due to the
>> > assumption of independence I mentioned earlier - we end up using
>> > the new formula introduced in 84f9a35e3, but it assumes the
>> > selectivity and number of groups are independent. But that's not the
>> > case here, because the groups are very clearly correlated (with the
>> > condition on "b").
>> >
>> >
>> > You're right.  The patch allows us to adjust the estimate of distinct
>> > values for appendrels using the new formula introduced in 84f9a35e3.
>> > But if the restrictions are correlated with the grouping expressions,
>> > the new formula does not behave well.  That's why the patch does not
>> > help in case [1], where 'b' and 'c' are correlated.
>> >
>> > OTOH, if the restrictions are not correlated with the grouping
>> > expressions, the new formula would perform quite well.  And in this case
>> > the patch would 

Re: Track in pg_replication_slots the reason why slots conflict?

2023-12-26 Thread Bertrand Drouvot
Hi,

On Tue, Dec 26, 2023 at 05:23:56PM +0900, Michael Paquier wrote:
> On Tue, Dec 26, 2023 at 08:44:44AM +0530, Amit Kapila wrote:
> > Does anyone else have a preference on whether to change the existing
> > column or add a new one?
> 
> Just to be clear here, I'd vote for replacing the existing boolean
> with a text.

Same here, I'd vote to avoid 2 columns having the same "meaning".

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: Synchronizing slots from primary to standby

2023-12-26 Thread Amit Kapila
On Thu, Dec 21, 2023 at 6:37 PM Hayato Kuroda (Fujitsu)
 wrote:
>
> 10. synchronize_one_slot
>
> IIUC, this function can synchronize slots even if the used plugin on primary 
> is
> not installed on the secondary server. If the slot is created by the slotsync
> worker, users will recognize it after the server is promoted and the decode is
> starting. I felt it is not good specification. Can we detect in the validation
> phase?
>

I think we should be able to detect it if we want but do we want to
add this restriction considering that users can always install the
required plugins after standby gets promoted? I think we can do either
way in this case but as we are not going to use these slots till the
standby node is promoted, it seems okay to validate the plugins after
promotion once users use the synced slots.

-- 
With Regards,
Amit Kapila.




Re: Synchronizing slots from primary to standby

2023-12-26 Thread shveta malik
On Fri, Dec 22, 2023 at 7:59 PM Bertrand Drouvot
 wrote:
>
> Hi,
>
> On Fri, Dec 22, 2023 at 04:02:21PM +0530, shveta malik wrote:
> > PFA v53. Changes are:
>
> Thanks!
>
> > patch002:
> > 2) Addressed comments in [2] for v52-002.
> > 3) Fixed CFBot failure. The failure was caused by an assert in
> > wait_for_primary_slot_catchup() for null confirmed_lsn received. In
> > wait_for_primary_slot_catchup(), we had an assumption that if
> > restart_lsn is valid and 'conflicting' is also false, then we must
> > have non-null confirmed_lsn. But this is not true. It is possible to
> > get null values for confirmed_lsn and catalog_xmin if on the primary
> > server the slot is just created with a valid restart_lsn and slot-sync
> > worker has fetched the slot before the primary server could set valid
> > confirmed_lsn and catalog_xmin. In
> > pg_create_logical_replication_slot(), there is a small window between
> > CreateInitDecodingContext-->ReplicationSlotReserveWal() which sets
> > restart_lsn and DecodingContextFindStartpoint() which sets
> > confirmed_lsn. If the slot-sync worker fetches the slot in this
> > window, confirmed_lsn received will be NULL. Corrected the code to
> > remove assert and added one additional condition that confirmed_lsn
> > should be valid before moving the slot to 'r'.
> >
>
> Looking at v53-0002 commit message:
>
> It states:
>
> "
> If a logical slot on the primary is valid but is invalidated on the standby,
> then that slot is dropped and recreated on the standby in next sync-cycle.
> "
>
> and one of the reasons mentioned is:
>
> "
> - The primary changes wal_level to a level lower than logical.
> "
>
> I think that as long at there is still logical replication slot on the primary
> that should not be possible. The primary should fail to start with messages 
> like:
>
> "
> 2023-12-22 14:06:09.281 UTC [31824] FATAL:  logical replication slot 
> "logical_slot" exists, but wal_level < logical
> "

Yes, right. It fails in such a case.

>
> Now, if:
>
> - The standby is shutdown
> - All the logical replication slots are removed on the primary
> - wal_level is set to < logical on the primary and it is restarted
>
> Then when the standby starts, the "synced" slots will be invalidated and later
> removed but not re-created on the next sync-cycle (because they don't exist
> anymore on the primary).
>
> Worth to reword a bit that part?

yes, will change these details. Thanks!

> Regards,
>
> --
> Bertrand Drouvot
> PostgreSQL Contributors Team
> RDS Open Source Databases
> Amazon Web Services: https://aws.amazon.com




Re: Show WAL write and fsync stats in pg_stat_io

2023-12-26 Thread Nazir Bilal Yavuz
Hi,

On Tue, 26 Dec 2023 at 03:06, Michael Paquier  wrote:
>
> On Mon, Dec 25, 2023 at 04:09:34PM +0300, Nazir Bilal Yavuz wrote:
> > On Wed, 9 Aug 2023 at 21:52, Melanie Plageman  
> > wrote:
> >> If there is any combination of BackendType and IOContext which will
> >> always read XLOG_BLCKSZ bytes, we could use XLOG_BLCKSZ for that row's
> >> op_bytes. For other cases, we may have to consider using op_bytes 1 and
> >> tracking reads and write IOOps in number of bytes (instead of number of
> >> pages). I don't actually know if there is a clear separation by
> >> BackendType for these different cases.
> >
> > Using op_bytes as 1 solves this problem but since it will be different
> > from the rest of the pg_stat_io view it could be hard to understand.
> > There is no clear separation by backends as it can be seen from the 
> > walsender.
>
> I find the use of 1 in this context a bit confusing, because when
> referring to a counter at N, it can be understood as doing an
> operation N times, but it would be much less than that.  Another
> solution would be to use NULL (as a synonym of "I don't know") and
> then document that in this case all the bigint counters of pg_stat_io
> track the number of bytes rather than the number of operations?

Yes, that makes sense.

Maybe it is better to create a pg_stat_io_wal view like you said
before. We could remove unused columns and add op_bytes for each
writes and reads. Also, we can track both the number of bytes and the
number of the operations. This doesn't fully solve the problem but it
will be easier to modify it to meet our needs.

-- 
Regards,
Nazir Bilal Yavuz
Microsoft




Re: Track in pg_replication_slots the reason why slots conflict?

2023-12-26 Thread Michael Paquier
On Tue, Dec 26, 2023 at 08:44:44AM +0530, Amit Kapila wrote:
> Does anyone else have a preference on whether to change the existing
> column or add a new one?

Just to be clear here, I'd vote for replacing the existing boolean
with a text.
--
Michael


signature.asc
Description: PGP signature


Re: "pgoutput" options missing on documentation

2023-12-26 Thread Amit Kapila
On Thu, Dec 21, 2023 at 7:16 PM Emre Hasegeli  wrote:
>
> Fixed versions are attached.
>

Pushed!

-- 
With Regards,
Amit Kapila.




Re: Change GUC hashtable to use simplehash?

2023-12-26 Thread John Naylor
On Wed, Dec 20, 2023 at 1:48 PM John Naylor  wrote:
>
> On Wed, Dec 20, 2023 at 3:23 AM Jeff Davis  wrote:
> >
> > The reason I looked here is that the inner while statement (to find the
> > chunk size) looked out of place and possibly slow, and there's a
> > bitwise trick we can use instead.
>
> There are other bit tricks we can use. In v11-0005 Just for fun, I
> translated a couple more into C from
>
> https://github.com/openbsd/src/blob/master/lib/libc/arch/amd64/string/strlen.S

I wanted to see if this gets us anything, so I ran a couple of microbenchmarks.

0001-0003 are same as earlier
0004 takes Jeff's idea and adds in an optimization from NetBSD's
strlen (I said OpenBSD earlier, but it goes back further). I added
stub code to simulate big-endian when requested at compile time, but a
later patch removes it. Since it benched well, I made the extra effort
to generalize it for other callers. After adding to the hash state, it
returns the length so the caller can pass it to the finalizer.
0005 is the benchmark (not for commit) -- I took the parser keyword
list and added enough padding to make every string aligned when the
whole thing is copied to an alloc'd area.
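
(For context on the word-at-a-time lookahead in 0004: this is the classic
zero-byte detector such strlen variants build on -- a textbook sketch, not
the patch's code.)

    #include <stdint.h>

    /*
     * Nonzero iff some byte of "chunk" is zero: subtracting 1 from every
     * byte only borrows across a byte boundary where a zero byte occurs,
     * and the ~chunk mask discards bytes whose high bit was already set.
     */
    static inline uint64_t
    haszero64(uint64_t chunk)
    {
        return (chunk - UINT64_C(0x0101010101010101)) &
               ~chunk &
               UINT64_C(0x8080808080808080);
    }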

Each of the bench_*.sql files named below are just running the
similarly-named function, all with the same argument, e.g. "select *
from bench_pgstat_hash_fh(10);", so not attached.

Strings:

-- strlen + hash_bytes
pgbench -n -T 20 -f bench_hash_bytes.sql -M prepared | grep latency
latency average = 1036.732 ms

-- word-at-a-time hashing, with bytewise lookahead
pgbench -n -T 20 -f bench_cstr_unaligned.sql -M prepared | grep latency
latency average = 664.632 ms

-- word-at-a-time for both hashing and lookahead (Jeff's aligned
coding plus a technique from NetBSD strlen)
pgbench -n -T 20 -f bench_cstr_aligned.sql -M prepared | grep latency
latency average = 436.701 ms

So, the fully optimized aligned case is worth it if it's convenient.

0006 adds a byteswap for big-endian so we can reuse little endian
coding for the lookahead.

0007 - I also wanted to put numbers to 0003 (pgstat hash). While the
motivation for that was cleanup, I had a hunch it would shave cycles
and take up less binary space. It does on both counts:

-- 3x murmur + hash_combine
pgbench -n -T 20 -f bench_pgstat_orig.sql -M prepared | grep latency
latency average = 333.540 ms

-- fasthash32 (simple call, no state setup and final needed for a single value)
pgbench -n -T 20 -f bench_pgstat_fh.sql -M prepared | grep latency
latency average = 277.591 ms

0008 - We can optimize the tail load when it's 4 bytes -- to save
loads, shifts, and OR's. My compiler can't figure this out for the
pgstat hash, with its fixed 4-byte tail. It's pretty simple and should
help other cases:

pgbench -n -T 20 -f bench_pgstat_fh.sql -M prepared | grep latency
latency average = 226.113 ms
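
To illustrate the kind of specialization 0008 refers to (names invented,
not the patch's code): when the tail is known to be exactly 4 bytes, one
32-bit load via memcpy can replace the generic byte-at-a-time accumulation.

    #include <stdint.h>
    #include <string.h>

    /* generic tail: up to len loads, shifts and ORs (little-endian layout) */
    static inline uint64_t
    load_tail_generic(const unsigned char *p, size_t len)
    {
        uint64_t    v = 0;

        for (size_t i = 0; i < len; i++)
            v |= (uint64_t) p[i] << (8 * i);
        return v;
    }

    /* specialized 4-byte tail: a single load, byte order left to the CPU */
    static inline uint64_t
    load_tail_4(const unsigned char *p)
    {
        uint32_t    v;

        memcpy(&v, p, sizeof(v));
        return v;
    }
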
From 778e3bdfc761dace149cd6c136e4c2847f793c61 Mon Sep 17 00:00:00 2001
From: John Naylor 
Date: Sun, 24 Dec 2023 09:46:44 +0700
Subject: [PATCH v11 5/8] Add benchmark for hashing C strings

---
 contrib/bench_hash/Makefile|  23 +
 contrib/bench_hash/aligned_keywords.h  | 991 +
 contrib/bench_hash/bench_hash--1.0.sql |  21 +
 contrib/bench_hash/bench_hash.c| 103 +++
 contrib/bench_hash/bench_hash.control  |   5 +
 contrib/bench_hash/meson.build |  19 +
 contrib/meson.build|   1 +
 7 files changed, 1163 insertions(+)
 create mode 100644 contrib/bench_hash/Makefile
 create mode 100644 contrib/bench_hash/aligned_keywords.h
 create mode 100644 contrib/bench_hash/bench_hash--1.0.sql
 create mode 100644 contrib/bench_hash/bench_hash.c
 create mode 100644 contrib/bench_hash/bench_hash.control
 create mode 100644 contrib/bench_hash/meson.build

diff --git a/contrib/bench_hash/Makefile b/contrib/bench_hash/Makefile
new file mode 100644
index 00..5327080376
--- /dev/null
+++ b/contrib/bench_hash/Makefile
@@ -0,0 +1,23 @@
+# contrib/bench_hash/Makefile
+
+MODULE_big = bench_hash
+OBJS = \
+	$(WIN32RES) \
+	bench_hash.o
+PGFILEDESC = "bench_hash - microbenchmark for hashing C strings"
+
+EXTENSION = bench_hash
+DATA = bench_hash--1.0.sql
+
+REGRESS = bench_hash
+
+ifdef USE_PGXS
+PG_CONFIG = pg_config
+PGXS := $(shell $(PG_CONFIG) --pgxs)
+include $(PGXS)
+else
+subdir = contrib/bench_hash
+top_builddir = ../..
+include $(top_builddir)/src/Makefile.global
+include $(top_srcdir)/contrib/contrib-global.mk
+endif
diff --git a/contrib/bench_hash/aligned_keywords.h b/contrib/bench_hash/aligned_keywords.h
new file mode 100644
index 00..c2bd67c856
--- /dev/null
+++ b/contrib/bench_hash/aligned_keywords.h
@@ -0,0 +1,991 @@
+/* created by copying from kwlist_d.h with this patch:
+
+--- a/src/tools/gen_keywordlist.pl
 b/src/tools/gen_keywordlist.pl
+@@ -97,7 +97,9 @@ while (<$kif>)
+ {
+if (/^PG_KEYWORD\("(\w+)"/)
+{
+-   push @keywords, $1;
++   my $len = length($1) +