date:20240321

Nathan Bossart  writes:
> The Bloom filter appears to help a bit, although it regresses the
> create-roles.sql portion of the test.  I'm assuming that's thanks to all
> the extra pallocs and pfrees, which are probably avoidable if we store the
> filter in a long-lived context and just clear it at the beginning of each
> call to roles_is_member_of().

The zero-fill to reinitialize the filter probably costs a good deal
all by itself, considering we're talking about at least a megabyte.
Maybe a better idea is to not enable the filter till we're dealing
with at least 1000 or so entries in roles_list, though obviously that
will complicate the logic.

+if (bloom_lacks_element(bf, (unsigned char *) , 
sizeof(Oid)))
+roles_list = lappend_oid(roles_list, otherid);
+else
+roles_list = list_append_unique_oid(roles_list, otherid);
+bloom_add_element(bf, (unsigned char *) , sizeof(Oid));

Hmm, I was imagining something more like

if (bloom_lacks_element(bf, (unsigned char *) , sizeof(Oid)) ||
!list_member_oid(roles_list, otherid))
{
roles_list = lappend_oid(roles_list, otherid);
bloom_add_element(bf, (unsigned char *) , sizeof(Oid));
}

to avoid duplicate bloom_add_element calls.  Probably wouldn't move
the needle much in this specific test case, but this formulation
would simplify letting the filter kick in later.  Very roughly,

if ((bf && bloom_lacks_element(bf, (unsigned char *) , 
sizeof(Oid))) ||
!list_member_oid(roles_list, otherid))
{
if (bf == NULL && list_length(roles_list) > 1000)
{
... create bf and populate with existing list entries
}
roles_list = lappend_oid(roles_list, otherid);
if (bf)
bloom_add_element(bf, (unsigned char *) , sizeof(Oid));
}


regards, tom lane

Re: Slow GRANT ROLE on PostgreSQL 16 with thousands of ROLEs

2024-03-21 Thread alex work

First of all thank you for looking into this.

At the moment we workaround the problem by altering `acc` ROLE into a SUPERUSER
in PostgreSQL 16 instances. It sidestep the problem and having the lowest cost
to implement for us. While at first we think this feels like opening a security
hole, it does not introduce side effects for **our use case** by the way our
application make use of this `acc` ROLE.

Of course we cannot recommend the workaround we took to others having similar
situation.

On Fri, Mar 22, 2024 at 7:59 AM Tom Lane  wrote:
>
> Nathan Bossart  writes:
> > On Thu, Mar 21, 2024 at 03:40:12PM -0500, Nathan Bossart wrote:
> >> On Thu, Mar 21, 2024 at 04:31:45PM -0400, Tom Lane wrote:
> >>> I don't think we have any really cheap way to de-duplicate the role
> >>> OIDs, especially seeing that it has to be done on-the-fly within the
> >>> collection loop, and the order of roles_list is at least potentially
> >>> interesting.  Not sure how to make further progress without a lot of
> >>> work.
>
> >> Assuming these are larger lists, this might benefit from optimizations
> >> involving SIMD intrinsics.
>
> > Never mind.  With the reproduction script, I'm only seeing a ~2%
> > improvement with my patches.
>
> Yeah, you cannot beat an O(N^2) problem by throwing SIMD at it.
>
> However ... I just remembered that we have a Bloom filter implementation
> in core now (src/backend/lib/bloomfilter.c).  How about using that
> to quickly reject (hopefully) most role OIDs, and only do the
> list_member_oid check if the filter passes?
>
> regards, tom lane

Re: Slow GRANT ROLE on PostgreSQL 16 with thousands of ROLEs

On Thu, Mar 21, 2024 at 08:03:32PM -0500, Nathan Bossart wrote:
> On Thu, Mar 21, 2024 at 08:59:54PM -0400, Tom Lane wrote:
>> However ... I just remembered that we have a Bloom filter implementation
>> in core now (src/backend/lib/bloomfilter.c).  How about using that
>> to quickly reject (hopefully) most role OIDs, and only do the
>> list_member_oid check if the filter passes?
> 
> Seems worth a try.  I've been looking for an excuse to use that...

The Bloom filter appears to help a bit, although it regresses the
create-roles.sql portion of the test.  I'm assuming that's thanks to all
the extra pallocs and pfrees, which are probably avoidable if we store the
filter in a long-lived context and just clear it at the beginning of each
call to roles_is_member_of().

HEAD  hash  hash+bloom
create  2.02  2.06  2.92
grant   4.63  0.27  0.08

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
diff --git a/src/backend/utils/adt/acl.c b/src/backend/utils/adt/acl.c
index cf5d08576a..445a14646c 100644
--- a/src/backend/utils/adt/acl.c
+++ b/src/backend/utils/adt/acl.c
@@ -36,6 +36,7 @@
 #include "common/hashfn.h"
 #include "foreign/foreign.h"
 #include "funcapi.h"
+#include "lib/bloomfilter.h"
 #include "lib/qunique.h"
 #include "miscadmin.h"
 #include "utils/acl.h"
@@ -4946,6 +4947,7 @@ roles_is_member_of(Oid roleid, enum RoleRecurseType type,
 	ListCell   *l;
 	List	   *new_cached_roles;
 	MemoryContext oldctx;
+	bloom_filter *bf;
 
 	Assert(OidIsValid(admin_of) == PointerIsValid(admin_role));
 	if (admin_role != NULL)
@@ -4987,6 +4989,9 @@ roles_is_member_of(Oid roleid, enum RoleRecurseType type,
 	 */
 	roles_list = list_make1_oid(roleid);
 
+	bf = bloom_create(1024, work_mem, 0);
+	bloom_add_element(bf, (unsigned char *) , sizeof(Oid));
+
 	foreach(l, roles_list)
 	{
 		Oid			memberid = lfirst_oid(l);
@@ -5023,16 +5028,29 @@ roles_is_member_of(Oid roleid, enum RoleRecurseType type,
 			 * graph, we must test for having already seen this role. It is
 			 * legal for instance to have both A->B and A->C->B.
 			 */
-			roles_list = list_append_unique_oid(roles_list, otherid);
+			if (bloom_lacks_element(bf, (unsigned char *) , sizeof(Oid)))
+roles_list = lappend_oid(roles_list, otherid);
+			else
+roles_list = list_append_unique_oid(roles_list, otherid);
+			bloom_add_element(bf, (unsigned char *) , sizeof(Oid));
 		}
 		ReleaseSysCacheList(memlist);
 
 		/* implement pg_database_owner implicit membership */
 		if (memberid == dba && OidIsValid(dba))
-			roles_list = list_append_unique_oid(roles_list,
-ROLE_PG_DATABASE_OWNER);
+		{
+			Oid dbowner = ROLE_PG_DATABASE_OWNER;
+
+			if (bloom_lacks_element(bf, (unsigned char *) , sizeof(Oid)))
+roles_list = lappend_oid(roles_list, ROLE_PG_DATABASE_OWNER);
+			else
+roles_list = list_append_unique_oid(roles_list, ROLE_PG_DATABASE_OWNER);
+			bloom_add_element(bf, (unsigned char *) , sizeof(Oid));
+		}
 	}
 
+	bloom_free(bf);
+
 	/*
 	 * Copy the completed list into TopMemoryContext so it will persist.
 	 */

Re: Catalog domain not-null constraints

2024-03-21 Thread Vik Fearing


On 3/22/24 01:46, Tom Lane wrote:

Vik Fearing  writes:

Anyway, I will bring this up with the committee and report back.  My
proposed solution will be for CAST to check domain constraints even if
the input is NULL.


Please do not claim that that is the position of the Postgres project.



Everything that I do on the SQL Committee is in my own name.

I do not speak for either PostgreSQL or for EDB (my employer), even 
though my opinions are of course often influenced by some combination of 
both.

--
Vik Fearing

Re: Slow GRANT ROLE on PostgreSQL 16 with thousands of ROLEs

On Thu, Mar 21, 2024 at 08:59:54PM -0400, Tom Lane wrote:
> However ... I just remembered that we have a Bloom filter implementation
> in core now (src/backend/lib/bloomfilter.c).  How about using that
> to quickly reject (hopefully) most role OIDs, and only do the
> list_member_oid check if the filter passes?

Seems worth a try.  I've been looking for an excuse to use that...

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

Re: Slow GRANT ROLE on PostgreSQL 16 with thousands of ROLEs

Nathan Bossart  writes:
> On Thu, Mar 21, 2024 at 03:40:12PM -0500, Nathan Bossart wrote:
>> On Thu, Mar 21, 2024 at 04:31:45PM -0400, Tom Lane wrote:
>>> I don't think we have any really cheap way to de-duplicate the role
>>> OIDs, especially seeing that it has to be done on-the-fly within the
>>> collection loop, and the order of roles_list is at least potentially
>>> interesting.  Not sure how to make further progress without a lot of
>>> work.

>> Assuming these are larger lists, this might benefit from optimizations
>> involving SIMD intrinsics.

> Never mind.  With the reproduction script, I'm only seeing a ~2%
> improvement with my patches.

Yeah, you cannot beat an O(N^2) problem by throwing SIMD at it.

However ... I just remembered that we have a Bloom filter implementation
in core now (src/backend/lib/bloomfilter.c).  How about using that
to quickly reject (hopefully) most role OIDs, and only do the
list_member_oid check if the filter passes?

regards, tom lane

Re: remaining sql/json patches

2024-03-21 Thread Kyotaro Horiguchi

At Wed, 20 Mar 2024 21:53:52 +0900, Amit Langote  
wrote in 
> I'll push 0001 tomorrow.

This patch (v44-0001-Add-SQL-JSON-query-functions.patch) introduced the 
following new erro message:

+errmsg("can only specify 
constant, non-aggregate"
+   " function, or 
operator expression for"
+   " DEFAULT"),

I believe that our convention here is to write an error message in a
single string literal, not split into multiple parts, for better
grep'ability.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

Re: Slow GRANT ROLE on PostgreSQL 16 with thousands of ROLEs

On Thu, Mar 21, 2024 at 03:40:12PM -0500, Nathan Bossart wrote:
> On Thu, Mar 21, 2024 at 04:31:45PM -0400, Tom Lane wrote:
>> I don't think we have any really cheap way to de-duplicate the role
>> OIDs, especially seeing that it has to be done on-the-fly within the
>> collection loop, and the order of roles_list is at least potentially
>> interesting.  Not sure how to make further progress without a lot of
>> work.
> 
> Assuming these are larger lists, this might benefit from optimizations
> involving SIMD intrinsics.  I looked into that a while ago [0], but the
> effort was abandoned because we didn't have any concrete use-cases at the
> time.  (I'm looking into some additional optimizations in a separate thread
> [1] that would likely apply here, too.)

Never mind.  With the reproduction script, I'm only seeing a ~2%
improvement with my patches.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

Re: Catalog domain not-null constraints

Vik Fearing  writes:
> On 3/22/24 00:17, Tom Lane wrote:
>> Vik Fearing  writes:
>>> As also said somewhere in that thread, I think that 
>>> short-cutting a NULL input value without considering the constraints of
>>> a domain is a bug that needs to be fixed in the standard.

>> I think it's probably intentional.  It certainly fits with the lack of
>> syntax for DOMAIN NOT NULL.  Also, it's been like that since SQL99;
>> do you think nobody's noticed it for 25 years?

> Haven't we (postgres) had bug reports of similar age?

Well, they've looked it at it since then.  SQL99 has

c) If SV is the null value, then the result is the null value.

SQL:2008 and later have the text I quoted:

c) If SV is the null value, then the result of CS is the null
value and no further General Rules of this Sub-clause are
applied.

I find it *extremely* hard to believe that they would have added
that explicit text without noticing exactly which operations they
were saying to skip.

> Anyway, I will bring this up with the committee and report back.  My 
> proposed solution will be for CAST to check domain constraints even if 
> the input is NULL.

Please do not claim that that is the position of the Postgres project.

regards, tom lane

Re: DRAFT: Pass sk_attno to consistent function

2024-03-21 Thread Michał Kłeczek




> On 21 Mar 2024, at 23:42, Matthias van de Meent 
>  wrote:
> 
> On Tue, 19 Mar 2024 at 17:00, Michał Kłeczek  wrote:
> 
>> With this operator we can write our queries like:
>> 
>> account_number ||= [list of account numbers] AND
>> account_number = ANY ([list of account numbers]) — redundant for partition 
>> pruning as it does not understand ||=
>> 
>> and have optimal plans:
>> 
>> Limit
>> — Merge Append
>> —— Index scan of relevant partitions
>> 
>> The problem is that now each partition scan is for the same list of accounts.
>> The “consistent” function cannot assume anything about contents of the table 
>> so it has to check all elements of the list
>> and that in turn causes reading unnecessary index pages for accounts not in 
>> this partition.
> 
> You seem to already be using your own operator class, so you may want
> to look into CREATE FUNCTION's support_function parameter; which would
> handle SupportRequestIndexCondition and/or SupportRequestSimplify.
> I suspect a support function that emits multiple clauses that each
> apply to only a single partition at a time should get you quite far if
> combined with per-partition constraints that filter all but one of
> those. Also, at plan-time you can get away with much more than at
> runtime.

Thanks for suggestion.

I am afraid I don’t understand how it might actually work though:

1) I think planning time is too early for us - we are heavily using 
planner_mode = force_generic_plan:
- we have many partitions and planning times started to dominate execution time
- generic plans are not worse than specialised

2) I am not sure how I could transform
"col ||= [array]" to multiple criteria to make sure it works well with 
partition pruning and planner.

It looks like what you are suggesting is to generate something like:
(part_condition AND col ||= [subarray1]) OR (part_condition AND col ||= 
[subarray2])
and hope the planner would generate proper Merge Append node (which I doubt 
would happen and planner would generate Bitmap scan due to lack of OR support 
in Gist).
What’s more - there is no part_condition for hash partitions.

> 
>> What we need is a way for the “consistent” function to be able to 
>> pre-process input query array and remove elements
>> that are not relevant for this scan. To do that it is necessary to have 
>> enough information to read necessary metadata from the catalog.
> 
>> The proposed patch addresses this need and seems (to me) largely 
>> uncontroversial as it does not break any existing extensions.
> 
> I don't think "my index operator class will go into the table
> definition and check if parts of the scankey are consistent with the
> table constraints" is a good reason to expose the index column to
> operator classes.

Quite possibly but I still don’t see any other way to do that TBH.

—
Michal

Re: documentation structure

2024-03-21 Thread Bruce Momjian

On Fri, Mar 22, 2024 at 01:12:30AM +0100, Daniel Gustafsson wrote:
> > On 22 Mar 2024, at 00:33, Peter Eisentraut  wrote:
> > 
> > On 19.03.24 14:50, Tom Lane wrote:
> >> Daniel Gustafsson  writes:
> >>> It's actually not very odd, the reference section is using  
> >>> elements
> >>> and we had missed the arabic numerals setting on those.  The attached 
> >>> fixes
> >>> that for me.  That being said, we've had roman numerals for the reference
> >>> section since forever (all the way down to the 7.2 docs online has it) so 
> >>> maybe
> >>> it was intentional?
> >> I'm quite sure it *was* intentional.  Maybe it was a bad idea, but
> >> it's not that way simply because nobody thought about it.
> > 
> > Looks to me it was just that way because it's the default setting of the 
> > stylesheets.
> 
> That's quite possible.  I don't have strong opinions on whether we should
> change, or keep it the way it is.

If we can't justify why it should be different, it should be like the
surrounding sections.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  Only you can decide what is important to you.

Re: documentation structure

> On 22 Mar 2024, at 00:33, Peter Eisentraut  wrote:
> 
> On 19.03.24 14:50, Tom Lane wrote:
>> Daniel Gustafsson  writes:
>>> It's actually not very odd, the reference section is using  
>>> elements
>>> and we had missed the arabic numerals setting on those.  The attached fixes
>>> that for me.  That being said, we've had roman numerals for the reference
>>> section since forever (all the way down to the 7.2 docs online has it) so 
>>> maybe
>>> it was intentional?
>> I'm quite sure it *was* intentional.  Maybe it was a bad idea, but
>> it's not that way simply because nobody thought about it.
> 
> Looks to me it was just that way because it's the default setting of the 
> stylesheets.

That's quite possible.  I don't have strong opinions on whether we should
change, or keep it the way it is.

--
Daniel Gustafsson

Re: Flushing large data immediately in pqcomm

2024-03-21 Thread Melih Mutlu

Hi,

PSA v3.

Jelte Fennema-Nio , 21 Mar 2024 Per, 12:58 tarihinde
şunu yazdı:

> On Thu, 21 Mar 2024 at 01:24, Melih Mutlu  wrote:
> > What if I do a simple comparison like PqSendStart == PqSendPointer
> instead of calling pq_is_send_pending()
>
> Yeah, that sounds worth trying out. So the new suggestions to fix the
> perf issues on small message sizes would be:
>
> 1. add "inline" to internal_flush function
> 2. replace pq_is_send_pending() with PqSendStart == PqSendPointer
> 3. (optional) swap the order of PqSendStart == PqSendPointer and len
> >= PqSendBufferSize
>

I did all of the above changes and it seems like those resolved the
regression issue.
Since the previous results were with unix sockets, I share here the results
of v3 when using unix sockets for comparison.
Sharing only the case where all messages are 100 bytes, since this was when
the regression was most visible.

row size = 100 bytes, # of rows = 100
┌───┬┬──┬──┬──┬──┬──┐
│   │ 1400 bytes │ 2KB  │ 4KB  │ 8KB  │ 16KB │ 32KB │
├───┼┼──┼──┼──┼──┼──┤
│ HEAD  │ 1106   │ 1006 │ 947  │ 920  │ 899  │ 888  │
├───┼┼──┼──┼──┼──┼──┤
│ patch │ 1094   │ 997  │ 943  │ 913  │ 894  │ 881  │
├───┼┼──┼──┼──┼──┼──┤
│ no buffer │ 6389   │ 6195 │ 6214 │ 6271 │ 6325 │ 6211 │
└───┴┴──┴──┴──┴──┴──┘

David Rowley , 21 Mar 2024 Per, 00:57 tarihinde şunu
yazdı:

> On Fri, 15 Mar 2024 at 01:46, Heikki Linnakangas  wrote:
> > - the "(int *) )" cast is not ok, and will break visibly on
> > big-endian systems where sizeof(int) != sizeof(size_t).
>
> I think fixing this requires adjusting the signature of
> internal_flush_buffer() to use size_t instead of int.   That also
> means that PqSendStart and PqSendPointer must also become size_t, or
> internal_flush() must add local size_t variables to pass to
> internal_flush_buffer and assign these back again to the global after
> the call.  Upgrading the globals might be the cleaner option.
>
> David


This is done too.

I actually tried to test it over a real network for a while. However, I
couldn't get reliable-enough numbers with both HEAD and the patch due to
network related issues.
I've decided to go with Jelte's suggestion [1]  which is decreasing MTU of
the loopback interface to 1500 and using localhost.

Here are the results:

1- row size = 100 bytes, # of rows = 100
┌───┬┬───┬───┬───┬───┬───┐
│   │ 1400 bytes │ 2KB   │  4KB  │  8KB  │  16KB │  32KB │
├───┼┼───┼───┼───┼───┼───┤
│ HEAD  │ 1351   │ 1233  │ 1074  │  988  │  944  │  916  │
├───┼┼───┼───┼───┼───┼───┤
│ patch │ 1369   │ 1232  │ 1073  │  981  │  928  │  907  │
├───┼┼───┼───┼───┼───┼───┤
│ no buffer │ 14949  │ 14533 │ 14791 │ 14864 │ 14612 │ 14751 │
└───┴┴───┴───┴───┴───┴───┘

2-  row size = half of the rows are 1KB and rest is 10KB , # of rows =
100
┌───┬┬───┬───┬───┬───┬───┐
│   │ 1400 bytes │ 2KB   │ 4KB   │ 8KB   │ 16KB  │ 32KB  │
├───┼┼───┼───┼───┼───┼───┤
│ HEAD  │ 37212  │ 31372 │ 25520 │ 21980 │ 20311 │ 18864 │
├───┼┼───┼───┼───┼───┼───┤
│ patch │ 23006  │ 23127 │ 23147 │ 9 │ 20367 │ 19155 │
├───┼┼───┼───┼───┼───┼───┤
│ no buffer │ 30725  │ 31090 │ 30917 │ 30796 │ 30984 │ 30813 │
└───┴┴───┴───┴───┴───┴───┘

3-  row size = half of the rows are 1KB and rest is 1MB , # of rows = 1000
┌───┬┬──┬──┬──┬──┬──┐
│   │ 1400 bytes │ 2KB  │ 4KB  │ 8KB  │ 16KB │ 32KB │
├───┼┼──┼──┼──┼──┼──┤
│ HEAD  │ 4296   │ 3713 │ 3040 │ 2711 │ 2528 │ 2449 │
├───┼┼──┼──┼──┼──┼──┤
│ patch │ 2401   │ 2411 │ 2404 │ 2374 │ 2395 │ 2408 │
├───┼┼──┼──┼──┼──┼──┤
│ no buffer │ 2399   │ 2403 │ 2408 │ 2389 │ 2402 │ 2403 │
└───┴┴──┴──┴──┴──┴──┘

4-  row size = all rows are 1MB , # of rows = 1000
┌───┬┬──┬──┬──┬──┬──┐
│   │ 1400 bytes │ 2KB  │ 4KB  │ 8KB  │ 16KB │ 32KB │
├───┼┼──┼──┼──┼──┼──┤
│ HEAD  │ 8335   │ 7370 │ 6017 │ 5368 │ 5009 │ 4843 │
├───┼┼──┼──┼──┼──┼──┤
│ patch │ 4711   │ 4722 │ 4708 │ 4693 │ 4724 │ 4717 │
├───┼┼──┼──┼──┼──┼──┤
│ no buffer │ 4704   │ 4712 │ 4746 │ 4728 │ 4709 │ 4730

Re: documentation structure


On 21.03.24 15:31, Robert Haas wrote:

On Thu, Mar 21, 2024 at 9:38 AM Tom Lane  wrote:

I'd follow the extend.sgml precedent: have a file corresponding to the
chapter and containing any top-level text we need, then that includes
a file per sect1.


OK, here's a new patch set. I've revised 0003 and 0004 to use this
approach, and I've added a new 0005 that does essentially the same
thing for the PL chapters.


I'm highly against this.  If I want to read about PL/Python, why should 
I have to wade through PL/Perl and PL/Tcl?


I think, abstractly, in a book, PL/Python should be a chapter of its 
own.  Just like GiST should be a chapter of its own.  Because they are 
self-contained topics.

Re: Catalog domain not-null constraints

2024-03-21 Thread Vik Fearing


On 3/22/24 00:17, Tom Lane wrote:

Vik Fearing  writes:

On 3/21/24 15:30, Tom Lane wrote:

The SQL spec's answer to that conundrum appears to be "NULL is
a valid value of every domain, and if you don't like it, tough".



I don't see how you can infer this from the standard at all.


I believe where we got that from is 6.13 ,
which quoth (general rule 2):

 c) If SV is the null value, then the result of CS is the null
 value and no further General Rules of this Subclause are applied.

In particular, that short-circuits application of the domain
constraints (GR 23), implying that CAST(NULL AS some_domain) is
always successful.  Now you could argue that there's some other
context that would reject nulls, but being inconsistent with
CAST would seem more like a bug than a feature.



I think the main bug is in what you quoted from .

I believe that the POLA for casting to a domain is for all constraints 
of the domain to be verified for ALL values including the null value.




As also said somewhere in that thread, I think that 
short-cutting a NULL input value without considering the constraints of
a domain is a bug that needs to be fixed in the standard.


I think it's probably intentional.  It certainly fits with the lack of
syntax for DOMAIN NOT NULL.  Also, it's been like that since SQL99;
do you think nobody's noticed it for 25 years?



Haven't we (postgres) had bug reports of similar age?

There is also the possibility that no one has noticed because major 
players have not implemented domains.  For example, Oracle only just got 
them last year: 
https://blogs.oracle.com/coretec/post/less-coding-with-sql-domains-in-23c


Anyway, I will bring this up with the committee and report back.  My 
proposed solution will be for CAST to check domain constraints even if 
the input is NULL.

--
Vik Fearing

Re: documentation structure


On 20.03.24 17:43, Robert Haas wrote:

0001 removes the "Installation from Binaries" chapter. The whole thing
is four sentences. I moved the most important information into the
"Installation from Source Code" chapter and retitled it
"Installation".


But this separation was explicitly added a few years ago, because most 
people just want to read about the binaries.

Re: documentation structure


On 19.03.24 14:50, Tom Lane wrote:

Daniel Gustafsson  writes:

It's actually not very odd, the reference section is using  elements
and we had missed the arabic numerals setting on those.  The attached fixes
that for me.  That being said, we've had roman numerals for the reference
section since forever (all the way down to the 7.2 docs online has it) so maybe
it was intentional?


I'm quite sure it *was* intentional.  Maybe it was a bad idea, but
it's not that way simply because nobody thought about it.


Looks to me it was just that way because it's the default setting of the 
stylesheets.

Re: Optimizing nbtree ScalarArrayOp execution, allowing multi-column ordered scans, skip scan

2024-03-21 Thread Peter Geoghegan

On Mon, Mar 18, 2024 at 9:25 AM Matthias van de Meent
 wrote:
> I was thinking about a more unified processing model, where
> _bt_preprocess_keys would iterate over all keys, including processing
> of array keys (by merging and reduction to normal keys) if and when
> found. This would also reduce the effort expended when there are
> contradictory scan keys, as array preprocessing is relatively more
> expensive than other scankeys and contradictions are then found before
> processing of later keys.

Does it really matter? I just can't get excited about maybe getting
one less syscache lookup in queries that involve *obviously*
contradictory keys at the SQL level. Especially because we're so much
better off with the new design here anyway; calling
_bt_preprocess_keys once rather than once per distinct set of array
keys is an enormous improvement, all on its own.

My concern with preprocessing overhead is almost entirely limited to
pathological performance issues involving extreme (even adversarial)
input scan keys/arrays. I feel that if we're going to guarantee that
we won't choke in _bt_checkkeys, even given billions of distinct sets
of array keys, we should make the same guarantee in
_bt_preprocess_keys -- just to avoid POLA violations. But that's the
only thing that seems particularly important in the general area of
preprocessing performance. (Preprocessing performance probably does
matter quite a bit in more routine cases, where />= are
mixed together on the same attribute. But there'll be no new
_bt_preprocess_keys operator/function lookups for stuff like that.)

The advantage of not completely merging _bt_preprocess_array_keys with
_bt_preprocess_keys is that preserving this much of the existing code
structure allows us to still decide how many array keys we'll need for
the scan up front (if the scan doesn't end up having an unsatisfiable
qual). _bt_preprocess_array_keys will eliminate redundant arrays
early, in practically all cases (the exception is when it has to deal
with an incomplete opfamily lacking the appropriate cross-type ORDER
proc), so we don't have to think about merging arrays after that
point. Rather like how we don't have to worry about "WHERE a <
any('{1, 2, 3}')" type inequalities after _bt_preprocess_array_keys
does an initial round of array-specific preprocessing
(_bt_preprocess_keys can deal with those in the same way as it will
with standard inequalities).

> > This preprocessing work should all be happening during planning, not
> > during query execution -- that's the design that makes the most sense.
> > This is something we've discussed in the past in the context of skip
> > scan (see my original email to this thread for the reference).
>
> Yes, but IIRC we also agreed that it's impossible to do this fully in
> planning, amongst others due to joins on array fields.

Even with a nested loop join's inner index scan, where the constants
used for each btrescan are not really predictable in the planner, we
can still do most preprocessing in the planner, at least most of the
time.

We could still easily do analysis that is capable of ruling out
redundant or contradictory scan keys for any possible parameter value
-- seen or unseen. I'd expect this to be the common case -- most of
the time these inner index scans need only one simple = operator
(maybe 2 = operators). Obviously, tjat approach still requires that
btrescan at least accept a new set of constants for each new inner
index scan invocation. But that's still much cheaper than calling
_bt_preprocess_keys from scratch every time btresca is called.

> > It
> > would be especially useful for the very fancy kinds of preprocessing
> > that are described by the MDAM paper, like using an index scan for a
> > NOT IN() list/array (this can actually make sense with a low
> > cardinality index).
>
> Yes, indexes such as those on enums. Though, in those cases the NOT IN
> () could be transformed into IN()-lists by the planner, but not the
> index.

I think that it would probably be built as just another kind of index
path, like the ones we build for SAOPs. Anti-SAOPs?

Just as with SAOPs, the disjunctive accesses wouldn't really be
something that the planner would need too much direct understanding of
(except during costing). I'd only expect the plan to use such an index
scan when it wasn't too far off needing a full index scan anyway. Just
like with skip scan, the distinction between these NOT IN() index scan
paths and a simple full index scan path is supposed to be fuzzy (maybe
they'll actually turn out to be full scans at runtime, since the
number of primitive index scans is determined dynamically, based on
the structure of the index at runtime).

> > The structure for preprocessing that I'm working towards (especially
> > in v15) sets the groundwork for making those shifts in the planner,
> > because we'll no longer treat each array constant as its own primitive
> > index scan during preprocessing.
>
> I hope that's going to be a

Re: Catalog domain not-null constraints

Vik Fearing  writes:
> On 3/21/24 15:30, Tom Lane wrote:
>> The SQL spec's answer to that conundrum appears to be "NULL is
>> a valid value of every domain, and if you don't like it, tough".

> I don't see how you can infer this from the standard at all.

I believe where we got that from is 6.13 ,
which quoth (general rule 2):

c) If SV is the null value, then the result of CS is the null
value and no further General Rules of this Subclause are applied.

In particular, that short-circuits application of the domain
constraints (GR 23), implying that CAST(NULL AS some_domain) is
always successful.  Now you could argue that there's some other
context that would reject nulls, but being inconsistent with
CAST would seem more like a bug than a feature.

> As also said somewhere in that thread, I think that  
> short-cutting a NULL input value without considering the constraints of 
> a domain is a bug that needs to be fixed in the standard.

I think it's probably intentional.  It certainly fits with the lack of
syntax for DOMAIN NOT NULL.  Also, it's been like that since SQL99;
do you think nobody's noticed it for 25 years?

regards, tom lane

Re: Flushing large data immediately in pqcomm

2024-03-21 Thread Melih Mutlu

Heikki Linnakangas , 14 Mar 2024 Per, 15:46 tarihinde şunu
yazdı:

> On 14/03/2024 13:22, Melih Mutlu wrote:
> > @@ -1282,14 +1283,32 @@ internal_putbytes(const char *s, size_t len)
> >   if (internal_flush())
> >   return EOF;
> >   }
> > - amount = PqSendBufferSize - PqSendPointer;
> > - if (amount > len)
> > - amount = len;
> > - memcpy(PqSendBuffer + PqSendPointer, s, amount);
> > - PqSendPointer += amount;
> > - s += amount;
> > - len -= amount;
> > +
> > + /*
> > +  * If the buffer is empty and data length is larger than
> the buffer
> > +  * size, send it without buffering. Otherwise, put as much
> data as
> > +  * possible into the buffer.
> > +  */
> > + if (!pq_is_send_pending() && len >= PqSendBufferSize)
> > + {
> > + int start = 0;
> > +
> > + socket_set_nonblocking(false);
> > + if (internal_flush_buffer(s, , (int *)))
> > + return EOF;
> > + }
> > + else
> > + {
> > + amount = PqSendBufferSize - PqSendPointer;
> > + if (amount > len)
> > + amount = len;
> > + memcpy(PqSendBuffer + PqSendPointer, s, amount);
> > + PqSendPointer += amount;
> > + s += amount;
> > + len -= amount;
> > + }
> >   }
> > +
> >   return 0;
> >  }
>
> Two small bugs:
>
> - the "(int *) )" cast is not ok, and will break visibly on
> big-endian systems where sizeof(int) != sizeof(size_t).
>
> - If internal_flush_buffer() cannot write all the data in one call, it
> updates 'start' for how much it wrote, and leaves 'end' unchanged. You
> throw the updated 'start' value away, and will send the same data again
> on next iteration.
>

There are two possible options for internal_flush_buffer() in
internal_putbytes() case:
1- Write all the data and return 0. We don't need start or end of the data
in this case.
2- Cannot write all and return EOF. In this case internal_putbytes() also
returns EOF immediately and does not really retry. There will be no next
iteration.

If it was non-blocking, then we may need to keep the new value. But I think
we do not need the updated start value in both cases here. What do you
think?

Thanks,
-- 
Melih Mutlu
Microsoft

Re: [HACKERS] make async slave to wait for lsn to be replayed


On 17.03.24 15:09, Alexander Korotkov wrote:

My current attempt was to commit minimal implementation as less
invasive as possible.  A new clause for BEGIN doesn't require
additional keywords and doesn't introduce additional statements.  But
yes, this is still a new qual.  And, yes, Amit you're right that even
if I had committed that, there was still a high risk of further
debates and revert.


I had written in [0] about my questions related to using this with 
connection poolers.  I don't think this was addressed at all.  I haven't 
seen any discussion about how to make this kind of facility usable in a 
full system.  You have to manually query and send LSNs; that seems 
pretty cumbersome.  Sure, this is part of something that could be 
useful, but how would an actual user with actual application code get to 
use this?


[0]: 
https://www.postgresql.org/message-id/8b5b172f-0ae7-d644-8358-e2851dded43b%40enterprisedb.com

Re: documentation structure

2024-03-21 Thread David G. Johnston

On Thu, Mar 21, 2024 at 11:30 AM Robert Haas  wrote:

>
> My second thought is that the stuff from "VII. Internals" that I
> categorized as reference material should move into section "VI.
> Reference". I think we should also consider moving appendix F,
> "Additional Supplied Modules and Extensions," and appendix G,
> "Additional Supplied Programs" to the reference section.
>
>
For "VI. Reference" I propose the following Chapters:

SQL Commands
PL/pgSQL
Cumulative Statistics Views
System Views
System Catalogs
Client Applications
Server Applications
Modules and Extensions

-- Remove Appendix G (Programs) altogether and just note for the two that
are listed that they are in contrib as opposed to core.

-- The PostgreSQL qualifier doesn't seem helpful and once you add the
additional chapters its unusual presence stands out even more.

-- PL/pgSQL gets its own reference chapter since we wrote it.  Stuff like
Perl and Python have entire books that the user can consult as reference
material for those languages.

David J.

Re: Catalog domain not-null constraints

2024-03-21 Thread Vik Fearing

On 3/21/24 15:30, Tom Lane wrote:

Peter Eisentraut writes:

A quick reading of the SQL standard suggests to me that the way we are
doing null handling in domain constraints is all wrong. The standard
says that domain constraints are only checked on values that are not
null. So both the handling of constraints using the CHECK syntax is
nonstandard and the existence of explicit NOT NULL constraints is an
extension. The CREATE DOMAIN reference page already explains why all of
this is a bad idea. Do we want to document all of that further, or
maybe we just want to rip out domain not-null constraints, or at least
not add further syntax for it?

Yeah. The real problem with domain not null is: how can a column
that's propagated up through the nullable side of an outer join
still be considered to belong to such a domain?

Per spec, it is not considered to be so. The domain only applies to
table storage and CASTs and gets "forgotten" in a query.

The SQL spec's answer to that conundrum appears to be "NULL is
a valid value of every domain, and if you don't like it, tough".

I don't see how you can infer this from the standard at all.

I'm too lazy to search the archives, but we have had at least one
previous discussion about how we should adopt the spec's semantics.
It'd be an absolutely trivial fix in CoerceToDomain (succeed
immediately if input is NULL), but the question is what to do
with existing "DOMAIN NOT NULL" DDL.

Here is a semi-random link into a conversation you and I have recently
had about this:
https://www.postgresql.org/message-id/a13db59c-c68f-4a30-87a5-177fe135665e%40postgresfriends.org

As also said somewhere in that thread, I think that
short-cutting a NULL input value without considering the constraints of
a domain is a bug that needs to be fixed in the standard.

--
Vik Fearing

Re: [HACKERS] make async slave to wait for lsn to be replayed


On 19.03.24 18:38, Kartyshov Ivan wrote:

  CALL pg_wait_lsn('0/3002AE8', 1);
  BEGIN;
  SELECT * FROM tbl; // read fresh insertions
  COMMIT;


I'm not endorsing this or any other approach, but I think the timeout 
parameter should be of type interval, not an integer with a unit that is 
hidden in the documentation.

Re: DRAFT: Pass sk_attno to consistent function

2024-03-21 Thread Matthias van de Meent

On Tue, 19 Mar 2024 at 17:00, Michał Kłeczek  wrote:
>
> Hi All,
>
> Since it looks like there is not much interest in the patch I will try to 
> provide some background to explain why I think it is needed.
>
[...]
> What we really need is for Gist to support “= ANY (…)”.
> As Gist index is extensible in terms of queries it supports it is quite easy 
> to implement an operator class/family with operator:
>
> ||= (text, text[])
>
> that has semantics the same as “= ANY (…)”

I've had a similar idea while working on BRIN, and was planning on
overloading `@>` for @>(anyarray, anyelement) or using a new
`@>>(anyarray, anyelement)` operator. No implementation yet, though.

> With this operator we can write our queries like:
>
> account_number ||= [list of account numbers] AND
> account_number = ANY ([list of account numbers]) — redundant for partition 
> pruning as it does not understand ||=
>
> and have optimal plans:
>
> Limit
> — Merge Append
> —— Index scan of relevant partitions
>
> The problem is that now each partition scan is for the same list of accounts.
> The “consistent” function cannot assume anything about contents of the table 
> so it has to check all elements of the list
> and that in turn causes reading unnecessary index pages for accounts not in 
> this partition.

You seem to already be using your own operator class, so you may want
to look into CREATE FUNCTION's support_function parameter; which would
handle SupportRequestIndexCondition and/or SupportRequestSimplify.
I suspect a support function that emits multiple clauses that each
apply to only a single partition at a time should get you quite far if
combined with per-partition constraints that filter all but one of
those. Also, at plan-time you can get away with much more than at
runtime.

> What we need is a way for the “consistent” function to be able to pre-process 
> input query array and remove elements
> that are not relevant for this scan. To do that it is necessary to have 
> enough information to read necessary metadata from the catalog.

> The proposed patch addresses this need and seems (to me) largely 
> uncontroversial as it does not break any existing extensions.

I don't think "my index operator class will go into the table
definition and check if parts of the scankey are consistent with the
table constraints" is a good reason to expose the index column to
operator classes.
Note that operator classes were built specifically so that they're
independent from their column position. It doens't really make sense
to expose this. Maybe my suggestion up above helps?

Kind regards,

Matthias van de Meent

Re: documentation structure

2024-03-21 Thread David G. Johnston

On Wed, Mar 20, 2024 at 9:43 AM Robert Haas  wrote:

> On Tue, Mar 19, 2024 at 5:39 PM Andrew Dunstan 
> wrote:
> > +many for improving the index.
>
> Here's a series of four patches.

I reviewed the most recent set of 5 patches.

> Taken together, they cut down the
> number of numbered chapters from 76 to 68. I think we could easily
> save that much again if I wrote a few more patches along similar
> lines, but I'm posting these first to see what people think.
>
> 0001 removes the "Installation from Binaries" chapter. The whole thing
> is four sentences. I moved the most important information into the
> "Installation from Source Code" chapter and retitled it
> "Installation".
>

Makes sense

> 0002 removes the "Monitoring Disk Usage" chapter by folding it into
> the immediately-preceding "Monitoring Database Activity" chapter. I
> kind of feel like the "Monitoring Disk Usage" chapter might be in need
> of a bigger rewrite or just outright removal, but there's surely not
> enough content here to justify making it a top-level chapter.
>

Just going to note that the section on the cumulative statistics views
being a single page is still a strongly bothersome issue here.  Though the
quick fix actually requires upgrading the section to chapter status...

Maybe we can stub out that section in the "Monitoring Database Activity"
chapter and move that entire section after "System Views" in the Internals
part?

I agree with subordinating Monitoring Disk Usage.

> 0003 merges all of the "Internals" chapters whose names are the names
> of built-in index access methods (Btree, Gin, etc.) into a single
> chapter called "Built-In Index Access Methods". All of these chapters
> have a very similar structure and none of them are very long, so it
> makes a lot of sense, at least in my mind, to consolidate them into
> one.
>

One of the more impactful and wanted improvements, IMO.

> 0004 merges the "Generic WAL Records" and "Custom WAL Resource
> Managers" chapter together, creating a new chapter called "Write Ahead
> Logging for Extensions".
>
>
The positioning of this and the preceding Built-in Index Access Methods
chapter seem like they should be switched.

If this sticks we should add an introductory paragraph for the chapter.

and I've added a new 0005 that does essentially the same
> thing for the PL chapters.
>

The following page needs to be reworded to take the new structure into
account:

https://www.postgresql.org/docs/current/xfunc-pl.html

Not having pl/pgsql appear on the main ToC seems like a loss but the others
make sense and a special exception for it probably isn't warranted.

Maybe "pl/pgsql and Other Procedural Languages" as the title?

David J.

Re: [PATCH] ltree hash functions

jian he  writes:
> I also made the following changes:
> from

> uint64 levelHash = hash_any_extended((unsigned char *) al->name, al->len, 
> seed);
> uint32 levelHash = hash_any((unsigned char *) al->name, al->len);

> to
> uint64 levelHash = DatumGetUInt64(hash_any_extended((unsigned char *)
> al->name, al->len, seed));
> uint32 levelHash = DatumGetUInt32(hash_any((unsigned char *) al->name,
> al->len));

Yeah, that'd fail on 32-bit machines.

Pushed v5 with some minor cosmetic tweaking.

regards, tom lane

Re: Adding comments to help understand psql hidden queries


On 21.03.24 18:31, David Christensen wrote:

Thanks for the feedback.  Enclosed is a v2 of this series(?) rebased
and with that warning fixed; @Greg Sabino Mullane I just created a
commit on your behalf with the message to hackers.  I'm also creating
a commit-fest entry for this thread.


I don't think your patch takes into account that the

/**... QUERY ...**/
...
/**...   ...**/

lines are supposed to align vertically.  With your patch, the first line 
would have variable length depending on the command.

Re: doc issues in event-trigger-matrix.html

> On 21 Mar 2024, at 22:47, Peter Eisentraut  wrote:
> 
> On 19.03.24 10:34, Daniel Gustafsson wrote:
 "Only for local objects"
 is there any reference explaining "local objects"?
 I think local object means objects that only affect one single database?
>> That's a bigger problem than the table representation, we never define what
>> "local object" mean anywhere in the EVT docs.  EV's are global for a 
>> database,
>> but not a cluster, so I assume what this means is that EVs for non-DDL 
>> commands
>> like COMMENT can only fire for a specific relation they are attached to and 
>> not
>> database wide?
> 
> I think we could replace this whole table by a few definitions:

Simply extending the "Overview of Event Trigger Behavior" section slightly
might even be enough?

> If tomorrow someone changes ... will they remember to update this table?

Highly unlikely.

--
Daniel Gustafsson

Re: Refactoring of pg_resetwal/t/001_basic.pl


On 21.03.24 17:58, Maxim Orlov wrote:
In commit 7b5275eec more tests and test coverage were added into 
pg_resetwal/t/001_basic.pl .
All the added stuff are pretty useful in my view.  Unfortunately, there 
were some magic constants
been used.  In overall, this is not a problem.  But while working on 64 
bit XIDs I've noticed these
changes and spent some time to figure it out what this magic values are 
stands fore.


And it turns out that I’m not the only one.

So, by Svetlana Derevyanko's suggestion, I made this patch.  I add 
constants, just like we did

in verify_heapam tests.


Ok, this sounds like a reasonable idea.



Sidenote here: in defines in multixact.c TransactionId type used, but 
I'm sure this is not correct,
since we're dealing here with MultiXactId and MultiXactOffset.  For now, 
this is obviously not a
problem, since sizes of this particular types are equal.  But this will 
manifest itself when we switch

to the 64 bits types for MultiXactOffset or MultiXactId.


Please send a separate patch for this if you want to propose any changes.

Re: Optimizing nbtree ScalarArrayOp execution, allowing multi-column ordered scans, skip scan

2024-03-21 Thread Peter Geoghegan

On Thu, Mar 7, 2024 at 10:42 AM Benoit Tigeot  wrote:
> I am not up to date with the last version of patch but I did a regular 
> benchmark with version 11 and typical issue we have at the moment and the 
> result are still very very good. [1]

Thanks for providing the test case. It was definitely important back
when the ideas behind the patch had not yet fully developed. It helped
me to realize that my thinking around non-required arrays (meaning
arrays that cannot reposition the scan, and just filter out
non-matching tuples) was still sloppy.

> In term of performance improvement the last proposals could be a real game 
> changer for 2 of our biggest databases. We hope that Postgres 17 will contain 
> those improvements.

Current plan is to commit this patch in the next couple of weeks,
ahead of Postgres 17 feature freeze.

-- 
Peter Geoghegan

Re: Partial aggregates pushdown

2024-03-21 Thread Bruce Momjian

On Thu, Mar 21, 2024 at 11:37:50AM +, 
fujii.y...@df.mitsubishielectric.co.jp wrote:
> Hi. Mr.Momjian, Mr.Lane, Mr.Haas, hackers.
> 
> I apologize for any misunderstanding regarding the context of the attached 
> patch and
> the points on which I requested a review. Could you please allow me to 
> clarify?
> 
> In the review around early December 2023, I received the following three 
> issues pointed out by Mr.Haas[1].
> 1. Transmitting state value safely between machines
> 2. Making the patch clearer by adding SQL keywords
> 3. Fixing the behavior when the HAVING clause is present
> 
> In the email sent on February 22, 2024[2], I provided an update on the 
> progress made in addressing these issues.
> Regarding issue 1, I have only provided a proposed solution in the email and 
> have not started the programming. 
> Therefore, the latest patch is not in a commit-ready state. As mentioned 
> later, we have also temporarily reverted the changes made to the 
> documentation.
> Before proceeding with the programming, I would like to discuss the proposed 
> solution with the community and seek consensus.
> If it is necessary to have source code in order to discuss, I can create a 
> simple prototype so that I can receive your feedback.
> Would you be able to provide your opinions on it?
> 
> Regarding issue 2., I have confirmed that creating a prototype allows us to 
> address the issue and clear the patch.
> In this prototype creation, the main purpose was to verify if the patch can 
> be cleared and significant revisions were made to the previous version.
> Therefore, I have removed all the document differences.
> I have submitted a patch [3] that includes the fixes for issue 3. to the 
> patch that was posted in [2].
> Regarding the proposed solution for issue 1, unlike the patch posted in [3], 
> we have a policy of not performing partial aggregation pushdown if we cannot 
> guarantee compatibility and safety.
> The latest patch in [3] is a POC patch. The patch that Mr. Momjian reviewed 
> is this.
> If user-facing documentation is needed for this POC patch, it can be added.
> 
> I apologize for the lack of explanation regarding this positioning, which may 
> have caused misunderstandings regarding the patch posted in [3].

That makes sense.  Let's get you answers to those questions first before
you continue.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  Only you can decide what is important to you.

Re: doc issues in event-trigger-matrix.html


On 19.03.24 10:34, Daniel Gustafsson wrote:

"Only for local objects"
is there any reference explaining "local objects"?
I think local object means objects that only affect one single database?

That's a bigger problem than the table representation, we never define what
"local object" mean anywhere in the EVT docs.  EV's are global for a database,
but not a cluster, so I assume what this means is that EVs for non-DDL commands
like COMMENT can only fire for a specific relation they are attached to and not
database wide?


I think we could replace this whole table by a few definitions:

- "Local objects" are everything except "global objects".

- "Global objects", for the purpose of event triggers, are databases, 
tablespaces, roles, role memberships, and parameter ACLs.


- DDL commands are all commands except SELECT, INSERT, UPDATE, DELETE, 
MERGE.


- Events triggers are supported for all DDL commands on all local objects.

Is this table saying anything else?

Is there any way to check if it's even correct?  For example, it shows 
that the event "sql_drop" can fire for a few ALTER commands, but how is 
this determined?  If tomorrow someone changes ALTER DOMAIN to possibly 
do a table rewrite, will they remember to update this table?

Re: Bytea PL/Perl transform

Alexander Korotkov  writes:
> On Tue, Jan 30, 2024 at 8:46 PM Pavel Stehule  wrote:
>> I marked this patch as ready for committer.

> The last version of the patch still provides transform for builtin
> type in a separate extension.  As discussed upthread such transforms
> don't need separate extensions, and could be provided as part of
> upgrades of existing extensions.  There is probably no consensus yet
> on what to do with existing extensions like jsonb_plperl and
> jsonb_plpython, but we clearly shouldn't spread such cases.

Yeah, I think including this as part of "plperl[u] 1.1" is probably
the best way forward.  The patch of record doesn't do that, so
I've set the CF entry back to Waiting On Author.

Taking a quick look at the rest of the patch ... I think the
documentation is pretty inadequate, as it just says that the transform
causes byteas to be "passed and returned as native Perl octet
strings", a term that it doesn't define, and googling doesn't exactly
clarify either.  The "example" is no example at all, as it does not
show what happens or how the results are different.  (Admittedly
the nearby example for the bool transform is nearly as bad, but we
could improve that too while we're at it.)

regards, tom lane

Re: Avoiding inadvertent debugging mode for pgbench

On Wed, Mar 20, 2024 at 11:57:25AM -0400, Greg Sabino Mullane wrote:
> My mistake. Attached please find version 3, which should hopefully make
> cfbot happy again.

Here is what I have staged for commit.  I plan to commit this within the
next few days.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
>From 25523d7513615128ef234c03c87612c3b5eb98e5 Mon Sep 17 00:00:00 2001
From: Nathan Bossart 
Date: Thu, 21 Mar 2024 16:06:03 -0500
Subject: [PATCH v4 1/1] pgbench: Disambiguate -d option.

Many other utilities use -d to specify the database to use, but
pgbench uses it to enable debugging mode.  This is causing some
users to accidentally enable debugging mode.  This commit moves the
debugging mode to --debug, converts -d to accept the database name,
and introduces --dbname.  This is a backward-incompatible change,
but that has been judged to be worth the trade-off.

Author: Greg Sabino Mullane
Reviewed-by: Tomas Vondra, Euler Taveira, Alvaro Herrera, David Christensen
Discussion: https://postgr.es/m/CAKAnmmLjAzwVtb%3DVEaeuCtnmOLpzkJ1uJ_XiQ362YdD9B72HSg%40mail.gmail.com
---
 doc/src/sgml/ref/pgbench.sgml|  4 +--
 src/bin/pgbench/pgbench.c| 32 
 src/bin/pgbench/t/001_pgbench_with_server.pl |  2 +-
 src/bin/pgbench/t/002_pgbench_no_server.pl   |  2 +-
 4 files changed, 24 insertions(+), 16 deletions(-)

diff --git a/doc/src/sgml/ref/pgbench.sgml b/doc/src/sgml/ref/pgbench.sgml
index 279bb0ad7d..c3d0ec240b 100644
--- a/doc/src/sgml/ref/pgbench.sgml
+++ b/doc/src/sgml/ref/pgbench.sgml
@@ -162,7 +162,8 @@ pgbench  options  d
 
 
  
-  dbname
+  -d dbname
+  --dbname=dbname
   

 Specifies the name of the database to test in. If this is
@@ -463,7 +464,6 @@ pgbench  options  d
  
 
  
-  -d
   --debug
   

diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index af1f75257f..af776b31d8 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -933,7 +933,8 @@ usage(void)
 		   "  --show-script=NAME   show builtin script code, then exit\n"
 		   "  --verbose-errors print messages of all errors\n"
 		   "\nCommon options:\n"
-		   "  -d, --debug  print debugging output\n"
+		   "  --debug  print debugging output\n"
+		   "  -d, --dbname=DBNAME  database name to connect to\n"
 		   "  -h, --host=HOSTNAME  database server host or socket directory\n"
 		   "  -p, --port=PORT  database server port number\n"
 		   "  -U, --username=USERNAME  connect as specified database user\n"
@@ -6620,7 +6621,7 @@ main(int argc, char **argv)
 		{"builtin", required_argument, NULL, 'b'},
 		{"client", required_argument, NULL, 'c'},
 		{"connect", no_argument, NULL, 'C'},
-		{"debug", no_argument, NULL, 'd'},
+		{"dbname", required_argument, NULL, 'd'},
 		{"define", required_argument, NULL, 'D'},
 		{"file", required_argument, NULL, 'f'},
 		{"fillfactor", required_argument, NULL, 'F'},
@@ -6661,6 +6662,7 @@ main(int argc, char **argv)
 		{"max-tries", required_argument, NULL, 14},
 		{"verbose-errors", no_argument, NULL, 15},
 		{"exit-on-abort", no_argument, NULL, 16},
+		{"debug", no_argument, NULL, 17},
 		{NULL, 0, NULL, 0}
 	};
 
@@ -6732,7 +6734,7 @@ main(int argc, char **argv)
 	if (!set_random_seed(getenv("PGBENCH_RANDOM_SEED")))
 		pg_fatal("error while setting random seed from PGBENCH_RANDOM_SEED environment variable");
 
-	while ((c = getopt_long(argc, argv, "b:c:CdD:f:F:h:iI:j:lL:M:nNp:P:qrR:s:St:T:U:v", long_options, )) != -1)
+	while ((c = getopt_long(argc, argv, "b:c:Cd:D:f:F:h:iI:j:lL:M:nNp:P:qrR:s:St:T:U:v", long_options, )) != -1)
 	{
 		char	   *script;
 
@@ -6773,7 +6775,7 @@ main(int argc, char **argv)
 is_connect = true;
 break;
 			case 'd':
-pg_logging_increase_verbosity();
+dbName = pg_strdup(optarg);
 break;
 			case 'D':
 {
@@ -6998,6 +7000,9 @@ main(int argc, char **argv)
 benchmarking_option_set = true;
 exit_on_abort = true;
 break;
+			case 17:			/* debug */
+pg_logging_increase_verbosity();
+break;
 			default:
 /* getopt_long already emitted a complaint */
 pg_log_error_hint("Try \"%s --help\" for more information.", progname);
@@ -7048,16 +7053,19 @@ main(int argc, char **argv)
 	 */
 	throttle_delay *= nthreads;
 
-	if (argc > optind)
-		dbName = argv[optind++];
-	else
+	if (dbName == NULL)
 	{
-		if ((env = getenv("PGDATABASE")) != NULL && *env != '\0')
-			dbName = env;
-		else if ((env = getenv("PGUSER")) != NULL && *env != '\0')
-			dbName = env;
+		if (argc > optind)
+			dbName = argv[optind++];
 		else
-			dbName = get_user_name_or_exit(progname);
+		{
+			if ((env = getenv("PGDATABASE")) != NULL && *env != '\0')
+dbName = env;
+			else if ((env = getenv("PGUSER")) != NULL && *env != '\0')
+dbName = env;
+			else
+dbName = get_user_name_or_exit(progname);
+		}
 	}
 
 	if (optind < argc)
diff --git

Re: Building with meson on NixOS/nixpkgs

2024-03-21 Thread Wolfgang Walther


Nazir Bilal Yavuz:

0001 & 0002: Adding code comments to explain why they have fallback
could be nice.
0003: Looks good to me.


Added some comments in the attached.

Best,

WolfgangFrom 2d271aafd96a0ea21710a06ac5236e47217c36d1 Mon Sep 17 00:00:00 2001
From: Wolfgang Walther 
Date: Sat, 2 Mar 2024 17:18:38 +0100
Subject: [PATCH v2 1/3] Fallback to uuid for ossp-uuid with meson

The upstream name for the ossp-uuid package / pkg-config file is "uuid". Many
distributions change this to be "ossp-uuid" to not conflict with e2fsprogs.

This lookup fails on distributions which don't change this name, for example
NixOS / nixpkgs. Both "ossp-uuid" and "uuid" are also checked in configure.ac.
---
 meson.build | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/meson.build b/meson.build
index c8fdfeb0ec3..942c79c8be3 100644
--- a/meson.build
+++ b/meson.build
@@ -1346,7 +1346,8 @@ if uuidopt != 'none'
 uuidfunc = 'uuid_to_string'
 uuidheader = 'uuid.h'
   elif uuidopt == 'ossp'
-uuid = dependency('ossp-uuid', required: true)
+# upstream is called "uuid", but many distros change this to "ossp-uuid"
+uuid = dependency('ossp-uuid', 'uuid', required: true)
 uuidfunc = 'uuid_export'
 uuidheader = 'uuid.h'
   else
-- 
2.44.0

From f150a8dcb92b08eab40b5dfec130a18f297c709f Mon Sep 17 00:00:00 2001
From: Wolfgang Walther 
Date: Sat, 2 Mar 2024 22:06:25 +0100
Subject: [PATCH v2 2/3] Fallback to clang in PATH with meson

Some distributions put clang into a different path than the llvm binary path.

For example, this is the case on NixOS / nixpkgs, which failed to find clang
with meson before this patch.
---
 meson.build | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/meson.build b/meson.build
index 942c79c8be3..f9e96b01cfa 100644
--- a/meson.build
+++ b/meson.build
@@ -759,7 +759,10 @@ if add_languages('cpp', required: llvmopt, native: false)
 llvm_binpath = llvm.get_variable(configtool: 'bindir')
 
 ccache = find_program('ccache', native: true, required: false)
-clang = find_program(llvm_binpath / 'clang', required: true)
+
+# Some distros put LLVM and clang in different paths, so fallback to
+# find via PATH, too.
+clang = find_program(llvm_binpath / 'clang', 'clang', required: true)
   endif
 elif llvmopt.auto()
   message('llvm requires a C++ compiler')
-- 
2.44.0

From fddee56b5e27a3b5e4c406e8caa2d230b49eb447 Mon Sep 17 00:00:00 2001
From: Wolfgang Walther 
Date: Mon, 11 Mar 2024 19:54:41 +0100
Subject: [PATCH v2 3/3] Support absolute bindir/libdir in regression tests
 with meson

Passing an absolute bindir/libdir will install the binaries and libraries to
/tmp_install/ and /tmp_install/ respectively.

This is path is correctly passed to the regression test suite via configure/make,
but not via meson, yet. This is because the "/" operator in the following expression
throws away the whole left side when the right side is an absolute path:

  test_install_location / get_option('libdir')

This was already correctly handled for dir_prefix, which is likely absolute as well.
This patch handles both bindir and libdir in the same way - prefixing absolute paths
with the tmp_install path correctly.
---
 meson.build | 11 ++-
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/meson.build b/meson.build
index f9e96b01cfa..144c443b5e4 100644
--- a/meson.build
+++ b/meson.build
@@ -3048,15 +3048,17 @@ test_install_destdir = meson.build_root() / 'tmp_install/'
 if build_system != 'windows'
   # On unixoid systems this is trivial, we just prepend the destdir
   assert(dir_prefix.startswith('/')) # enforced by meson
-  test_install_location = '@0@@1@'.format(test_install_destdir, dir_prefix)
+  temp_install_bindir = '@0@@1@'.format(test_install_destdir, dir_prefix / dir_bin)
+  temp_install_libdir = '@0@@1@'.format(test_install_destdir, dir_prefix / dir_lib)
 else
   # drives, drive-relative paths, etc make this complicated on windows, call
   # into a copy of meson's logic for it
   command = [
 python, '-c',
 'import sys; from pathlib import PurePath; d1=sys.argv[1]; d2=sys.argv[2]; print(str(PurePath(d1, *PurePath(d2).parts[1:])))',
-test_install_destdir, dir_prefix]
-  test_install_location = run_command(command, check: true).stdout().strip()
+test_install_destdir]
+  temp_install_bindir = run_command(command, dir_prefix / dir_bin, check: true).stdout().strip()
+  temp_install_libdir = run_command(command, dir_prefix / dir_lib, check: true).stdout().strip()
 endif
 
 meson_install_args = meson_args + ['install'] + {
@@ -3093,7 +3095,6 @@ testport = 4
 
 test_env = environment()
 
-temp_install_bindir = test_install_location / get_option('bindir')
 test_initdb_template = meson.build_root() / 'tmp_install' / 'initdb-template'
 test_env.set('PG_REGRESS', pg_regress.full_path())
 test_env.set('REGRESS_SHLIB', regress_module.full_path())
@@ -3108,7 +3109,7 @@ test_env.set('PG_TEST_EXTRA',

Re: Slow GRANT ROLE on PostgreSQL 16 with thousands of ROLEs

On Thu, Mar 21, 2024 at 04:31:45PM -0400, Tom Lane wrote:
> I wrote:
>> ... I still see the problematic GRANT taking ~250ms, compared
>> to 5ms in v15.  roles_is_member_of is clearly on the hook for that.
> 
> Ah: looks like that is mainly the fault of the list_append_unique_oid
> calls in roles_is_member_of.  That's also an O(N^2) cost of course,
> though with a much smaller constant factor.
> 
> I don't think we have any really cheap way to de-duplicate the role
> OIDs, especially seeing that it has to be done on-the-fly within the
> collection loop, and the order of roles_list is at least potentially
> interesting.  Not sure how to make further progress without a lot of
> work.

Assuming these are larger lists, this might benefit from optimizations
involving SIMD intrinsics.  I looked into that a while ago [0], but the
effort was abandoned because we didn't have any concrete use-cases at the
time.  (I'm looking into some additional optimizations in a separate thread
[1] that would likely apply here, too.)

[0] https://postgr.es/m/20230308002502.GA3378449%40nathanxps13
[1] https://postgr.es/m/20240321183823.GA1800896%40nathanxps13

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

Re: Slow GRANT ROLE on PostgreSQL 16 with thousands of ROLEs

I wrote:
> ... I still see the problematic GRANT taking ~250ms, compared
> to 5ms in v15.  roles_is_member_of is clearly on the hook for that.

Ah: looks like that is mainly the fault of the list_append_unique_oid
calls in roles_is_member_of.  That's also an O(N^2) cost of course,
though with a much smaller constant factor.

I don't think we have any really cheap way to de-duplicate the role
OIDs, especially seeing that it has to be done on-the-fly within the
collection loop, and the order of roles_list is at least potentially
interesting.  Not sure how to make further progress without a lot of
work.

regards, tom lane

Re: Avoiding inadvertent debugging mode for pgbench

2024-03-21 Thread David Christensen

The following review has been posted through the commitfest application:
make installcheck-world:  tested, passed
Implements feature:   tested, passed
Spec compliant:   not tested
Documentation:tested, passed

Did a quick review of this one; CFbot is now happy, local regression tests all 
pass.

I think the idea here is sane; it's particularly confusing for some tools 
included in the main distribution to have different semantics, and this seems 
low on the breakage risk here, so worth the tradeoffs IMO.

The new status of this patch is: Ready for Committer

Re: Slow GRANT ROLE on PostgreSQL 16 with thousands of ROLEs

I wrote:
> It looks like part of the blame might be ascribable to catcache.c,
> as if you look at the problem microscopically you find that
> roles_is_member_of is causing catcache to make a ton of AUTHMEMMEMROLE
> catcache lists, and SearchSysCacheList is just iterating linearly
> through the cache's list-of-lists, so that search is where the O(N^2)
> time is actually getting taken.  Up to now that code has assumed that
> any one catcache would not have very many catcache lists.  Maybe it's
> time to make that smarter; but since we've gotten away with this
> implementation for decades, I can't help feeling that the real issue
> is with roles_is_member_of's usage pattern.

I wrote a quick finger exercise to make catcache.c use a hash table
instead of a single list for CatCLists, modeling it closely on the
existing hash logic for simple catcache entries.  This helps a good
deal, but I still see the problematic GRANT taking ~250ms, compared
to 5ms in v15.  roles_is_member_of is clearly on the hook for that.

regards, tom lane

diff --git a/src/backend/utils/cache/catcache.c b/src/backend/utils/cache/catcache.c
index d5a3c1b591..569f51cb33 100644
--- a/src/backend/utils/cache/catcache.c
+++ b/src/backend/utils/cache/catcache.c
@@ -88,6 +88,8 @@ static void CatCachePrintStats(int code, Datum arg);
 #endif
 static void CatCacheRemoveCTup(CatCache *cache, CatCTup *ct);
 static void CatCacheRemoveCList(CatCache *cache, CatCList *cl);
+static void RehashCatCache(CatCache *cp);
+static void RehashCatCacheLists(CatCache *cp);
 static void CatalogCacheInitializeCache(CatCache *cache);
 static CatCTup *CatalogCacheCreateEntry(CatCache *cache,
 		HeapTuple ntp, SysScanDesc scandesc,
@@ -444,6 +446,7 @@ CatCachePrintStats(int code, Datum arg)
 	long		cc_neg_hits = 0;
 	long		cc_newloads = 0;
 	long		cc_invals = 0;
+	long		cc_nlists = 0;
 	long		cc_lsearches = 0;
 	long		cc_lhits = 0;
 
@@ -453,7 +456,7 @@ CatCachePrintStats(int code, Datum arg)
 
 		if (cache->cc_ntup == 0 && cache->cc_searches == 0)
 			continue;			/* don't print unused caches */
-		elog(DEBUG2, "catcache %s/%u: %d tup, %ld srch, %ld+%ld=%ld hits, %ld+%ld=%ld loads, %ld invals, %ld lsrch, %ld lhits",
+		elog(DEBUG2, "catcache %s/%u: %d tup, %ld srch, %ld+%ld=%ld hits, %ld+%ld=%ld loads, %ld invals, %d lists, %ld lsrch, %ld lhits",
 			 cache->cc_relname,
 			 cache->cc_indexoid,
 			 cache->cc_ntup,
@@ -465,6 +468,7 @@ CatCachePrintStats(int code, Datum arg)
 			 cache->cc_searches - cache->cc_hits - cache->cc_neg_hits - cache->cc_newloads,
 			 cache->cc_searches - cache->cc_hits - cache->cc_neg_hits,
 			 cache->cc_invals,
+			 cache->cc_nlist,
 			 cache->cc_lsearches,
 			 cache->cc_lhits);
 		cc_searches += cache->cc_searches;
@@ -472,10 +476,11 @@ CatCachePrintStats(int code, Datum arg)
 		cc_neg_hits += cache->cc_neg_hits;
 		cc_newloads += cache->cc_newloads;
 		cc_invals += cache->cc_invals;
+		cc_nlists += cache->cc_nlist;
 		cc_lsearches += cache->cc_lsearches;
 		cc_lhits += cache->cc_lhits;
 	}
-	elog(DEBUG2, "catcache totals: %d tup, %ld srch, %ld+%ld=%ld hits, %ld+%ld=%ld loads, %ld invals, %ld lsrch, %ld lhits",
+	elog(DEBUG2, "catcache totals: %d tup, %ld srch, %ld+%ld=%ld hits, %ld+%ld=%ld loads, %ld invals, %ld lists, %ld lsrch, %ld lhits",
 		 CacheHdr->ch_ntup,
 		 cc_searches,
 		 cc_hits,
@@ -485,6 +490,7 @@ CatCachePrintStats(int code, Datum arg)
 		 cc_searches - cc_hits - cc_neg_hits - cc_newloads,
 		 cc_searches - cc_hits - cc_neg_hits,
 		 cc_invals,
+		 cc_nlists,
 		 cc_lsearches,
 		 cc_lhits);
 }
@@ -573,6 +579,8 @@ CatCacheRemoveCList(CatCache *cache, CatCList *cl)
 	 cache->cc_keyno, cl->keys);
 
 	pfree(cl);
+
+	--cache->cc_nlist;
 }
 
 
@@ -611,14 +619,19 @@ CatCacheInvalidate(CatCache *cache, uint32 hashValue)
 	 * Invalidate *all* CatCLists in this cache; it's too hard to tell which
 	 * searches might still be correct, so just zap 'em all.
 	 */
-	dlist_foreach_modify(iter, >cc_lists)
+	for (int i = 0; i < cache->cc_nlbuckets; i++)
 	{
-		CatCList   *cl = dlist_container(CatCList, cache_elem, iter.cur);
+		dlist_head *bucket = >cc_lbucket[i];
 
-		if (cl->refcount > 0)
-			cl->dead = true;
-		else
-			CatCacheRemoveCList(cache, cl);
+		dlist_foreach_modify(iter, bucket)
+		{
+			CatCList   *cl = dlist_container(CatCList, cache_elem, iter.cur);
+
+			if (cl->refcount > 0)
+cl->dead = true;
+			else
+CatCacheRemoveCList(cache, cl);
+		}
 	}
 
 	/*
@@ -691,14 +704,19 @@ ResetCatalogCache(CatCache *cache)
 	int			i;
 
 	/* Remove each list in this cache, or at least mark it dead */
-	dlist_foreach_modify(iter, >cc_lists)
+	for (i = 0; i < cache->cc_nlbuckets; i++)
 	{
-		CatCList   *cl = dlist_container(CatCList, cache_elem, iter.cur);
+		dlist_head *bucket = >cc_lbucket[i];
 
-		if (cl->refcount > 0)
-			cl->dead = true;
-		else
-			CatCacheRemoveCList(cache, cl);
+		dlist_foreach_modify(iter, bucket)
+		{
+			CatCList   *cl = dlist_container(CatCList, cache_elem,

Re: Statistics Import and Export

2024-03-21 Thread Corey Huinker

>
>
>
> But ideally we'd just make it safe to dump and reload stats on your own
> tables, and then not worry about it.
>

That is my strong preference, yes.


>
> > Not off hand, no.
>
> To me it seems like inconsistent data to have most_common_freqs in
> anything but descending order, and we should prevent it.
>

Sorry, I misunderstood, I thought we were talking about values, not the
frequencies. Yes, the frequencies should only be monotonically
non-increasing (i.e. it can go down or flatline from N->N+1). I'll add a
test case for that.

Re: Statistics Import and Export

On Thu, 2024-03-21 at 15:10 -0400, Corey Huinker wrote:
> 
> In which case wouldn't the checkCanModify on pg_statistic would be a
> proxy for is_superuser/has_special_role_we_havent_created_yet.

So if someone pg_dumps their table and gets the statistics in the SQL,
then they will get errors loading it unless they are a member of a
special role?

If so we'd certainly need to make --no-statistics the default, and have
some way of skipping stats during reload of the dump (perhaps make the
set function a no-op based on a GUC?).

But ideally we'd just make it safe to dump and reload stats on your own
tables, and then not worry about it.

> Not off hand, no.

To me it seems like inconsistent data to have most_common_freqs in
anything but descending order, and we should prevent it.

> > 
Regards,
Jeff Davis

RE: Popcount optimization using AVX512

2024-03-21 Thread Amonson, Paul D

> -Original Message-
> From: David Rowley 
> Sent: Wednesday, March 20, 2024 5:28 PM
> To: Amonson, Paul D 
> Cc: Nathan Bossart ; Andres Freund
>
> I'm not sure about this "extern negates inline" comment.  It seems to me the
> compiler is perfectly free to inline a static function into an external 
> function
> and it's free to inline the static function elsewhere within the same .c file.
> 
> The final sentence of the following comment that the 0001 patch removes
> explains this:
> 
> /*
>  * When the POPCNT instruction is not available, there's no point in using
>  * function pointers to vary the implementation between the fast and slow
>  * method.  We instead just make these actual external functions when
>  * TRY_POPCNT_FAST is not defined.  The compiler should be able to inline
>  * the slow versions here.
>  */
> 
> Also, have a look at [1].  You'll see f_slow() wasn't even compiled and the 
> code
> was just inlined into f().  I just added the
> __attribute__((noinline)) so that usage() wouldn't just perform constant
> folding and just return 6.
> 
> I think, unless you have evidence that some common compiler isn't inlining the
> static into the extern then we shouldn't add the macros.
> It adds quite a bit of churn to the patch and will break out of core code as 
> you
> no longer have functions named pg_popcount32(),
> pg_popcount64() and pg_popcount().

This may be a simple misunderstanding extern != static. If I use the "extern" 
keyword then a symbol *will* be generated and inline will be ignored. This is 
NOT true of "static inline", where the compiler will try to inline the method. 
:)

In this patch set:
* I removed the macro implementation.
* Made everything that could possibly be inlined marked with the "static 
inline" keyword.
* Conditionally made the *_slow() functions "static inline" when 
TRY_POPCONT_FAST is not set.
* Found and fixed some whitespace errors in the AVX code implementation.

Thanks,
Paul


v12-0001-Refactor-Split-pg_popcount-functions-into-multiple-f.patch
Description: v12-0001-Refactor-Split-pg_popcount-functions-into-multiple-f.patch


v12-0002-Feature-Added-AVX-512-acceleration-to-the-pg_popcoun.patch
Description: v12-0002-Feature-Added-AVX-512-acceleration-to-the-pg_popcoun.patch

Re: Statistics Import and Export

2024-03-21 Thread Corey Huinker

>
> How about just some defaults then? Many of them have a reasonable
> default, like NULL or an empty array. Some are parallel arrays and
> either both should be specified or neither (e.g.
> most_common_vals+most_common_freqs), but you can check for that.
>

+1
Default NULL has been implemented for all parameters after n_distinct.


>
> > > Why are you calling checkCanModifyRelation() twice?
> >
> > Once for the relation itself, and once for pg_statistic.
>
> Nobody has the privileges to modify pg_statistic except superuser,
> right? I thought the point of a privilege check is that users could
> modify statistics for their own tables, or the tables they maintain.
>

In which case wouldn't the checkCanModify on pg_statistic would be a proxy
for is_superuser/has_special_role_we_havent_created_yet.



>
> >
> > I can see making it void and returning an error for everything that
> > we currently return false for, but if we do that, then a statement
> > with one pg_set_relation_stats, and N pg_set_attribute_stats (which
> > we lump together in one command for the locking benefits and atomic
> > transaction) would fail entirely if one of the set_attributes named a
> > column that we had dropped. It's up for debate whether that's the
> > right behavior or not.
>
> I'd probably make the dropped column a WARNING with a message like
> "skipping dropped column whatever". Regardless, have some kind of
> explanatory comment.
>

That's certainly do-able.




>
> >
> > I pulled most of the hardcoded values from pg_stats itself. The
> > sample set is trivially small, and the values inserted were in-order-
> > ish. So maybe that's why.
>
> In my simple test, most_common_freqs is descending:
>
>CREATE TABLE a(i int);
>INSERT INTO a VALUES(1);
>INSERT INTO a VALUES(2);
>INSERT INTO a VALUES(2);
>INSERT INTO a VALUES(3);
>INSERT INTO a VALUES(3);
>INSERT INTO a VALUES(3);
>INSERT INTO a VALUES(4);
>INSERT INTO a VALUES(4);
>INSERT INTO a VALUES(4);
>INSERT INTO a VALUES(4);
>ANALYZE a;
>SELECT most_common_vals, most_common_freqs
>  FROM pg_stats WHERE tablename='a';
> most_common_vals | most_common_freqs
>--+---
> {4,3,2}  | {0.4,0.3,0.2}
>(1 row)
>
> Can you show an example where it's not?
>

Not off hand, no.



>
> >
> > Maybe we could have the functions restricted to a role or roles:
> >
> > 1. pg_write_all_stats (can modify stats on ANY table)
> > 2. pg_write_own_stats (can modify stats on tables owned by user)
>
> If we go that route, we are giving up on the ability for users to
> restore stats on their own tables. Let's just be careful about
> validating data to mitigate this risk.
>

A great many test cases coming in the next patch.

Re: Comments on Custom RMGRs

On Thu, 2024-03-21 at 19:47 +0700, Danil Anisimow wrote:
> The proposed patch is not a complete solution for pgss and may not
> work correctly with replication.

Also, what is the desired behavior during replication? Should queries
on the primary be represented in pgss on the replica? If the answer is
yes, should they be differentiated somehow so that you can know where
the slow queries are running?

Regards,
Jeff Davis

Re: Comments on Custom RMGRs

On Thu, 2024-03-21 at 19:47 +0700, Danil Anisimow wrote:
> [pgss_001.v1.patch] adds a custom resource manager to the
> pg_stat_statements extension.

Did you consider moving the logic for loading the initial contents from
disk from pgss_shmem_startup to .rmgr_startup?

> The rm_checkpoint hook allows saving shared memory data to disk at
> each checkpoint. However, for pg_stat_statements, it matters when the
> checkpoint occurred. When the server shuts down, pgss deletes the
> temporary file of query texts. In other cases, this is unacceptable.
> To provide this capability, a flags parameter was added to the
> rm_checkpoint hook. The changes are presented in [rmgr_003.v2.patch].

Overall this seems fairly reasonable to me. I think this will work for
similar extensions, where the data being stored is independent from the
buffers.

My biggest concern is that it might not be quite right for a table AM
that has complex state that needs action to be taken at a slightly
different time, e.g. right after CheckPointBuffers().

Then again, the rmgr is a low-level API, and any extension using it
should be prepared to adapt to changes. If it works for pgss, then we
know it works for at least one thing, and we can always improve it
later. For instance, we might call the hook several times and pass it a
"phase" argument.

Regards,
Jeff Davis

Re: Refactor SASL exchange in preparation for OAuth Bearer

> On 20 Mar 2024, at 15:28, Daniel Gustafsson  wrote:
> 
>> On 29 Feb 2024, at 20:58, Jacob Champion  
>> wrote:
>> 
>> On Wed, Feb 28, 2024 at 2:54 PM Daniel Gustafsson  wrote:
>>> I rank that as one of my better typos actually. Fixed though.
>> 
>> LGTM!
> 
> Thanks for review, and since Heikki marked it ready for committer I assume 
> that
> counting as a +1 as well.  Attached is a rebase on top of HEAD to get a fresh
> run from the CFBot before applying this.

And after another pass over it I ended up pushing it today.

--
Daniel Gustafsson

Re: Slow GRANT ROLE on PostgreSQL 16 with thousands of ROLEs

2024-03-21 Thread walther


Tom Lane:

Actually, roles_is_member_of sucks before v16 too; the new thing
is only that it's being invoked during GRANT ROLE.  Using the
roles created by the given test case, I see in v15:

[...]
So it takes ~3.5s to populate the roles_is_member_of cache for "acc"
given this membership set.  This is actually considerably worse than
in v16 or HEAD, where the same test takes about 1.6s for me.


Ah, this reminds me that I hit the same problem about a year ago, but 
haven't had the time to put together a test-case, yet. In my case, it's 
like this:
- I have one role "authenticator" with which the application (PostgREST) 
connects to the database.
- This role has been granted all of the actual user roles and will then 
do a SET ROLE for each authenticated request it handles.
- In my case that's currently about 120k roles granted to 
"authenticator", back then it was probably around 60k.
- The first request (SET ROLE) for each session took between 5 and 10 
*minutes* to succeed - subsequent requests were instant.
- When the "authenticator" role is made SUPERUSER, the first request is 
instant, too.


I guess this matches exactly what you are observing.

There is one more thing that is actually even worse, though: When you 
try to cancel the query or terminate the backend while the SET ROLE is 
still running, this will not work. It will not only not cancel the 
query, but somehow leave the process for that backend in some kind of 
zombie state that is impossible to recover from.


All of this was v15.

Best,

Wolfgang

Re: An improved README experience for PostgreSQL

On Thu, Mar 21, 2024 at 10:24:12AM -0700, Andres Freund wrote:
> I was out while this was proposed and committed. Just wanted to say: Thanks!
> It was high time that we added this...

Definitely.  I hope we are able to build on this in the near future.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

Re: add AVX2 support to simd.h

On Thu, Mar 21, 2024 at 12:09:44PM -0500, Nathan Bossart wrote:
> On Thu, Mar 21, 2024 at 11:30:30AM +0700, John Naylor wrote:
>> Further, now that the algorithm is more SIMD-appropriate, I wonder
>> what doing 4 registers at a time is actually buying us for either SSE2
>> or AVX2. It might just be a matter of scale, but that would be good to
>> understand.
> 
> I'll follow up with these numbers shortly.

It looks like the 4-register code still outperforms the 2-register code,
except for a handful of cases where there aren't many elements.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

Re: documentation structure

On Thu, Mar 21, 2024 at 12:43 PM Alvaro Herrera  wrote:
> which is a bit odd: why are the two WAL chapters in the middle of the
> chapters 62 and 63 talking about AMs?  Maybe put 66 right after 63
> instead.Also, is it really better to have 62/63 first and 66
> later?  It sounds to me like 66 is more user-oriented and the other two
> are developer-oriented, so I'm inclined to suggest putting them the
> other way around, but I'm not really sure about this.  (Also, starting
> chapter 66 straight with 66.1 BTree without any intro text looks a bit
> odd; maybe one short introductory paragraph is sufficient?)

I had similar thoughts. I think that we should consider some changes
to the chapter ordering, but I didn't want to try to change too many
things all at once, since somebody only has to hate one thing about
the patch to sink the whole thing.

But since you brought it up, what I've been thinking about is that the
whole division into parts might need to be rethought a bit. I feel
like "VII. Internals" is a mix of about four different kinds of
content. First, the biggest portion of it is information about
developing certain kinds of C extensions -- all the "Writing a
Whatever" chapters, the "Whatever Access Method Interface Definition"
chapters, "Generic WAL Records", "Custom WAL Resource Managers", and
all the index-related chapters. Second, we've got some information
that I think is mainly of interest to people developing PostgreSQL
itself, namely, "PostgreSQL Coding Conventions", "Native Language
Support", and "System Catalog Declarations and Initial Contents". You
*might* care about these if you're developing extensions, or even if
you're not a developer at all, but then again you might not. Third,
we've got some reference material, namely "System Catalogs", "System
Views", and perhaps "Frontend/Backend Protocol". I distinguish these
from the previous two categories because I think you could care about
this stuff as a random user, or a developer of products that
interoperate with PostgreSQL but don't link with it or share any
common code. Finally, there's just a bunch of random bits and bobs
that we've decided to document here for one reason or another, either
because somebody else did a bunch of the work, like "Overview of
PostgreSQL Internals", or because some developer did something and
someone said "hey, that should be documented!", like "Backup Manifest
Format."

So my first thought is to pull out the stuff that's mainly for
PostgreSQL core developers and move it to an appendix. I propose we
create an appendix called "Developer Guide" and that it absorb the
existing appendix I, "The Source Code Repository", possibly all or
part of appendix J, "Documentation", and the chapters from "VII.
Internals" that are mostly of developer interest. I think that
possibly some of what's in "J. Documentation" should actually be moved
into the "Installation" chapter where we talk about building the
source code, because it doesn't make much sense to document the build
tool chain in one part of the documentation and the documentation
toolchain someplace else entirely, but "J.6. Style Guide" is developer
information, not build instructions.

My second thought is that the stuff from "VII. Internals" that I
categorized as reference material should move into section "VI.
Reference". I think we should also consider moving appendix F,
"Additional Supplied Modules and Extensions," and appendix G,
"Additional Supplied Programs" to the reference section. However,
prior to doing that, I think that appendix G needs some cleanup or
possibly we should just find a way to remove it outright. We're
shipping an appendix G with two major subsections, one of which is
completely empty and has been since v14, and the other of which
contains only two things. I think we should just remove the empty
sub-section entirely. I'm not sure what to do about the only with only
2 things in it (vacuumlo and oid2name). Would it be a bad idea to just
merge those bits into the client applications reference section?

My third thought is about what to do with the material in "VII.
Internals" that is about developing specific kind of extensions, like,
say, "Writing a Foreign Data Wrapper." If you look at "V. Server
Programming", you see that we actually have some very similar sections
there, like chapter 47, "Background Worker Processes" and chapter 50,
"Archive Modules". I think it's not very clear in the current
structure where topics that are relevant for extension developers
should go under "Server Programming" or under "Internals", and it
looks to me like different people have just done different things and
it's all a bit haphazard. One idea is to decide that the correct
answer is "Server Programming" and move all of the internals chapters
that fall into this category over to there. I don't think this is the
right answer, because that section also contains information about a
bunch of stuff that's strictly SQL-level, like rules and triggers. So

Re: broken JIT support on Fedora 40

2024-03-21 Thread Dmitry Dolgov

> On Sun, Mar 17, 2024 at 09:02:08PM +0100, Dmitry Dolgov wrote:
> > On Fri, Mar 15, 2024 at 01:54:38PM +1300, Thomas Munro wrote:
> > For me it seems that the LLVMRunPasses() call, new in
> >
> > commit 76200e5ee469e4a9db5f9514b9d0c6a31b496bff
> > Author: Thomas Munro 
> > Date:   Wed Oct 18 22:15:54 2023 +1300
> >
> > jit: Changes for LLVM 17.
> >
> > is reaching code that segfaults inside libLLVM, specifically in
> > llvm::InlineFunction(llvm::CallBase&, llvm::InlineFunctionInfo&, bool,
> > llvm::AAResults*, bool, llvm::Function*).  First obvious question
> > would be: is that NULL argument still acceptable?  Perhaps it wants
> > our LLVMTargetMachineRef there:
> >
> >err = LLVMRunPasses(module, passes, NULL, options);
> >
> > But then when we see what is does with that argument, it arrives at a
> > place that apparently accepts nullptr.
> >
> > https://github.com/llvm/llvm-project/blob/6b2bab2839c7a379556a10287034bd55906d7094/llvm/lib/Passes/PassBuilderBindings.cpp#L56
> > https://github.com/llvm/llvm-project/blob/6b2bab2839c7a379556a10287034bd55906d7094/llvm/include/llvm/Passes/PassBuilder.h#L124
> >
> > Hrmph.  Might need an assertion build to learn more.  I'll try to look
> > again next week or so.
>
> Looks like I can reproduce this as well, libLLVM crashes when reaching
> AddReturnAttributes inside InlineFunction, when trying to access
> operands of the return instruction. I think, for whatever reason, the
> latest LLVM doesn't like (i.e. do not expect this when performing
> inlining pass) return instructions that do not have a return value, and
> this is what happens in the outblock of deform function we generate
> (slot_compile_deform).
>
> For verification, I've modified the deform.outblock to call LLVMBuildRet
> instead of LLVMBuildRetVoid and this seems to help -- inline and deform
> stages are still performed as before, but nothing crashes. But of course
> it doesn't sound right that inlining pass cannot process such code.
> Unfortunately I don't see any obvious change in the recent LLVM history
> that would justify this outcome, might be a genuine bug, will
> investigate further.

I think I found the change that got it all started [1], the commit has a
set of tags like 18.1.0-rc1 and is relatively recent. The message
doesn't say anything related to the crash that we see, so I assume it's
indeed a bug. I've opened an issue to confirm this understanding [2]
(wow, issues were indeed moved to github since the last time I was
touching LLVM), let's see what would be the response.

[1]: 
https://github.com/llvm/llvm-project/commit/2da4960f20f7e5d88a68ce25636a895284dc66d8
[2]: https://github.com/llvm/llvm-project/issues/86162

Re: Adding comments to help understand psql hidden queries

2024-03-21 Thread David Christensen

Created the CF entry in commitfest 48 but didn't see it was already in 47; 
closing the CFEntry in 48. (Doesn't appear to be a different status than 
"Withdrawn"...)

Re: Improve WALRead() to suck data directly from WAL buffers when possible

2024-03-21 Thread Bharath Rupireddy

On Tue, Feb 20, 2024 at 11:40 AM Bharath Rupireddy
 wrote:
>
> Ran pgperltidy on the new TAP test file added. Please see the attached
> v25 patch set.

Please find the v26 patches after rebasing.

-- 
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com


v26-0001-Add-check-in-WALReadFromBuffers-against-requeste.patch
Description: Binary data


v26-0002-Add-test-module-for-verifying-read-from-WAL-buff.patch
Description: Binary data


v26-0003-Use-WALReadFromBuffers-in-more-places.patch
Description: Binary data


v26-0004-Demonstrate-reading-unflushed-WAL-directly-from-.patch
Description: Binary data

Re: Introduce XID age and inactive timeout based replication slot invalidation

2024-03-21 Thread Bharath Rupireddy

On Thu, Mar 21, 2024 at 4:25 PM Amit Kapila  wrote:
>
> This makes sense to me. Apart from this, few more comments on 0001.

Thanks for looking into it.

> 1.
> - "%s as caught_up, conflict_reason IS NOT NULL as invalid "
> + "%s as caught_up, invalidation_reason IS NOT NULL as invalid "
>   live_check ? "FALSE" :
> - "(CASE WHEN conflict_reason IS NOT NULL THEN FALSE "
> + "(CASE WHEN conflicting THEN FALSE "
>
> I think here at both places we need to change 'conflict_reason' to
> 'conflicting'.

Basically, the idea there is to not live_check for invalidated logical
slots. It has nothing to do with conflicting. Up until now,
conflict_reason is also reporting wal_removed (although wrongly
including rows_removed, wal_level_insufficient, the two reasons for
conflicts). So, I think invalidation_reason is right for invalid
column. Also, I think we need to change conflicting to
invalidation_reason for live_check. So, I've changed that to use
invalidation_reason for both columns.

> 2.
>
> Can the reasons 'rows_removed' and 'wal_level_insufficient' appear for
> physical slots?

No. They can only occur for logical slots, check
InvalidatePossiblyObsoleteSlot, only the logical slots get
invalidated.

> If not, then it is not clear from above text.

I've stated that "It is set only for logical slots." for rows_removed
and wal_level_insufficient. Other reasons can occur for both slots.

> 3.
> -# Verify slots are reported as non conflicting in pg_replication_slots
> +# Verify slots are reported as valid in pg_replication_slots
>  is( $node_standby->safe_psql(
>   'postgres',
>   q[select bool_or(conflicting) from
> -   (select conflict_reason is not NULL as conflicting
> -from pg_replication_slots WHERE slot_type = 'logical')]),
> +   (select conflicting from pg_replication_slots
> + where slot_type = 'logical')]),
>   'f',
> - 'Logical slots are reported as non conflicting');
> + 'Logical slots are reported as valid');
>
> I don't think we need to change the comment or success message in this test.

Yes. There the intention of the test case is to verify logical slots
are reported as non conflicting. So, I changed them.

Please find the v14-0001 patch for now. I'll post the other patches soon.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

v14-0001-Track-invalidation_reason-in-pg_replication_slot.patch
Description: Binary data

Re: Catalog domain not-null constraints

2024-03-21 Thread Isaac Morland

On Thu, 21 Mar 2024 at 10:30, Tom Lane  wrote:

> The SQL spec's answer to that conundrum appears to be "NULL is
> a valid value of every domain, and if you don't like it, tough".
>

To be fair, NULL is a valid value of every type. Even VOID has NULL.

In this context, it’s a bit weird to be able to decree up front when
defining a type that no table column of that type, anywhere, may ever
contain a NULL. It would be nice if there was a way to reverse the default
so that if you (almost or) never want NULLs anywhere that’s what you get
without saying "NOT NULL" all over the place, and instead just specify
"NULLABLE" (or something) where you want. But that effectively means
optionally changing the behaviour of CREATE TABLE and ALTER TABLE.

Re: Statistics Import and Export

On Thu, 2024-03-21 at 03:27 -0400, Corey Huinker wrote:
> 
> They can, but part of what I wanted to show was that the values that
> aren't directly passed in as parameters (staopN, stacollN) get set to
> the correct values, and those values aren't guaranteed to match
> across databases, hence testing them in the regression test rather
> than in a TAP test. I'd still like to be able to test that.

OK, that's fine.

> > The function signature for pg_set_attribute_stats could be more
> > friendly 
...
> 1. We'd have to compare the stats provided against the stats that are
> already there, make that list in-memory, and then re-order what
> remains
> 2. There would be no way to un-set statistics of a given stakind,
> unless we added an "actually set it null" boolean for each parameter
> that can be null. 
> 3. I tried that with the JSON formats, it made the code even messier
> than it already was.

How about just some defaults then? Many of them have a reasonable
default, like NULL or an empty array. Some are parallel arrays and
either both should be specified or neither (e.g.
most_common_vals+most_common_freqs), but you can check for that.

> > Why are you calling checkCanModifyRelation() twice?
> 
> Once for the relation itself, and once for pg_statistic.

Nobody has the privileges to modify pg_statistic except superuser,
right? I thought the point of a privilege check is that users could
modify statistics for their own tables, or the tables they maintain.

> 
> I can see making it void and returning an error for everything that
> we currently return false for, but if we do that, then a statement
> with one pg_set_relation_stats, and N pg_set_attribute_stats (which
> we lump together in one command for the locking benefits and atomic
> transaction) would fail entirely if one of the set_attributes named a
> column that we had dropped. It's up for debate whether that's the
> right behavior or not.

I'd probably make the dropped column a WARNING with a message like
"skipping dropped column whatever". Regardless, have some kind of
explanatory comment.

> 
> I pulled most of the hardcoded values from pg_stats itself. The
> sample set is trivially small, and the values inserted were in-order-
> ish. So maybe that's why.

In my simple test, most_common_freqs is descending:

   CREATE TABLE a(i int);
   INSERT INTO a VALUES(1);
   INSERT INTO a VALUES(2);
   INSERT INTO a VALUES(2);
   INSERT INTO a VALUES(3);
   INSERT INTO a VALUES(3);
   INSERT INTO a VALUES(3);
   INSERT INTO a VALUES(4);
   INSERT INTO a VALUES(4);
   INSERT INTO a VALUES(4);
   INSERT INTO a VALUES(4);
   ANALYZE a;
   SELECT most_common_vals, most_common_freqs
 FROM pg_stats WHERE tablename='a';
most_common_vals | most_common_freqs 
   --+---
{4,3,2}  | {0.4,0.3,0.2}
   (1 row)

Can you show an example where it's not?

> 
> Maybe we could have the functions restricted to a role or roles:
> 
> 1. pg_write_all_stats (can modify stats on ANY table)
> 2. pg_write_own_stats (can modify stats on tables owned by user)

If we go that route, we are giving up on the ability for users to
restore stats on their own tables. Let's just be careful about
validating data to mitigate this risk.

Regards,
Jeff Davis

Re: Adding comments to help understand psql hidden queries

2024-03-21 Thread David Christensen

Hi Jim,

Thanks for the feedback.  Enclosed is a v2 of this series(?) rebased
and with that warning fixed; @Greg Sabino Mullane I just created a
commit on your behalf with the message to hackers.  I'm also creating
a commit-fest entry for this thread.

Best,

David


v2-0002-Add-output-of-the-command-that-got-us-here-to-the.patch
Description: Binary data


v2-0001-Include-SQL-comments-on-echo-hidden-output.patch
Description: Binary data

Re: An improved README experience for PostgreSQL

2024-03-21 Thread Andres Freund

On 2024-02-28 14:59:16 -0600, Nathan Bossart wrote:
> On Wed, Feb 28, 2024 at 02:21:49PM -0600, Nathan Bossart wrote:
> > On Wed, Feb 28, 2024 at 02:07:34PM -0600, Andrew Atkinson wrote:
> >> I agree that starting with the direct conversion is reasonable. Markdown
> >> "modernizes" the file using a popular plain text file format that's
> >> renderable.
> > 
> > Thanks.  I'll commit this shortly.
> 
> Committed.

I was out while this was proposed and committed. Just wanted to say: Thanks!
It was high time that we added this...

Re: Trying to build x86 version on windows using meson

2024-03-21 Thread Dave Cramer

Andres,


On Thu, 21 Mar 2024 at 12:51, Andres Freund  wrote:

> Hi,
>
> On 2024-03-21 07:11:23 -0400, Dave Cramer wrote:
> > It seems that attempting to cross-compile on an ARM machine might be
> asking
> > too much as the use cases are pretty limited.
>
> It for sure is if you don't even provide the precise commands and logs of a
> failed run...
>
>
> > So the impetus for this is that folks require 32bit versions of psqlODBC.
> > Unfortunately EDB is no longer distributing a 32 bit windows version.
> >
> > All I really need is a 32bit libpq. This seems like a much smaller lift.
> > Suggestions ?
>
> FWIW, I can cross compile postgres from linux to 32bit windows without an
> issue. If you really just need a 32bit libpq, that might actually be
> easier.
>
> cd /tmp/ && rm -rf /tmp/meson-w32 && m setup --buildtype debug
> -Dcassert=true -Db_pch=true --cross-file
> ~/src/meson/cross/linux-mingw-w64-32bit.txt /tmp/meson-w32 ~/src/postgresql
> && cd /tmp/meson-w32 && ninja
>
> file src/interfaces/libpq/libpq.dll
> src/interfaces/libpq/libpq.dll: PE32 executable (DLL) (console) Intel
> 80386, for MS Windows, 19 sections
>
> You'd need a windows openssl to actually have a useful libpq, but that
> should
> be fairly simple.
>
>
> There are two warnings that I think point to us doing something wrong, but
> they're not affecting libpq:
>
> [1585/1945 42  81%] Linking target src/bin/pgevent/pgevent.dll
> /usr/bin/i686-w64-mingw32-ld: warning: resolving _DllRegisterServer by
> linking to _DllRegisterServer@0
> Use --enable-stdcall-fixup to disable these warnings
> Use --disable-stdcall-fixup to disable these fixups
> /usr/bin/i686-w64-mingw32-ld: warning: resolving _DllUnregisterServer by
> linking to _DllUnregisterServer@0
>
>
>
Attached correct log file

Dave
Build started at 2024-03-21T13:07:08.707715
Main binary: C:\Program Files\Meson\meson.exe
Build Options: '-Dextra_include_dirs=c:\Program Files\OpenSSL-Win64\include' 
-Derrorlogs=True '-Dextra_lib_dirs=c:\Program Files\OpenSSL-win64' 
'-Dprefix=c:\postgres86'
Python system: Windows
The Meson build system
Version: 1.3.1
Source dir: C:\Users\davec\projects\postgresql
Build dir: C:\Users\davec\projects\postgresql\build
Build type: native build
Project name: postgresql
Project version: 17devel
---
Detecting compiler via: `icl ""` -> [WinError 2] The system cannot find the 
file specified
---
Detecting compiler via: `cl /?` -> 0
stdout:
C/C++ COMPILER OPTIONS


  -OPTIMIZATION-

/O1 maximum optimizations (favor space) /O2 maximum optimizations (favor speed)
/Ob inline expansion (default n=0)   /Od disable optimizations (default)
/Og enable global optimization  /Oi[-] enable intrinsic functions
/Os favor code space/Ot favor code speed
/Ox optimizations (favor speed) /Oy[-] enable frame pointer omission 
/favor: select processor to optimize for, one of:
blend - a combination of optimizations for several different x86 processors
ATOM - Intel(R) Atom(TM) processors 

 -CODE GENERATION-

/Gu[-] ensure distinct functions have distinct addresses
/Gw[-] separate global variables for linker
/GF enable read-only string pooling /Gm[-] enable minimal rebuild
/Gy[-] separate functions for linker/GS[-] enable security checks
/GR[-] enable C++ RTTI  /GX[-] enable C++ EH (same as /EHsc)
/guard:cf[-] enable CFG (control flow guard)
/guard:ehcont[-] enable EH continuation metadata (CET)
/EHs enable C++ EH (no SEH exceptions)  /EHa enable C++ EH (w/ SEH exceptions)
/EHc extern "C" defaults to nothrow 
/EHr always generate noexcept runtime termination checks
/fp: choose floating-point model:
contract - consider floating-point contractions when generating code
except[-] - consider floating-point exceptions when generating code
fast - "fast" floating-point model; results are less predictable
precise - "precise" floating-point model; results are predictable
strict - "strict" floating-point model (implies /fp:except)
/Qfast_transcendentals generate inline FP intrinsics even with /fp:except
/Qspectre[-] enable mitigations for CVE 2017-5753
/Qpar[-] enable parallel code generation
/Qpar-report:1 auto-parallelizer diagnostic; indicate parallelized loops
/Qpar-report:2 auto-parallelizer diagnostic; indicate loops not parallelized
/Qvec-report:1 auto-vectorizer diagnostic; indicate vectorized loops
/Qvec-report:2 auto-vectorizer diagnostic; indicate loops not vectorized
/GL[-] enable link-time code generation 
/volatile: choose volatile model:
iso - Acquire/release semantics not guaranteed on volatile accesses
ms  - Acquire/release semantics guaranteed on volatile accesses
/GA optimize for Windows Application/Ge force stack checking for all funcs
/Gs[num] control stack checking calls   /Gh enable _penter function call
/GH enable _pexit function call /GT generate fiber-safe TLS accesses
/RTC1 Enable fast checks

Re: add AVX2 support to simd.h

On Thu, Mar 21, 2024 at 12:09:44PM -0500, Nathan Bossart wrote:
> It does still eventually win, although not nearly to the same extent as
> before.  I extended the benchmark a bit to show this.  I wouldn't be
> devastated if we only got 0001 committed for v17, given these results.

(In case it isn't clear from the graph, after 128 elements, I only tested
at 200, 300, 400, etc. elements.)

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

Re: add AVX2 support to simd.h

On Thu, Mar 21, 2024 at 11:30:30AM +0700, John Naylor wrote:
> I'm much happier about v5-0001. With a small tweak it would match what
> I had in mind:
> 
> + if (nelem < nelem_per_iteration)
> + goto one_by_one;
> 
> If this were "<=" then the for long arrays we could assume there is
> always more than one block, and wouldn't need to check if any elements
> remain -- first block, then a single loop and it's done.
> 
> The loop could also then be a "do while" since it doesn't have to
> check the exit condition up front.

Good idea.  That causes us to re-check all of the tail elements when the
number of elements is evenly divisible by nelem_per_iteration, but that
might be worth the trade-off.

> Yes, that spike is weird, because it seems super-linear. However, the
> more interesting question for me is: AVX2 isn't really buying much for
> the numbers covered in this test. Between 32 and 48 elements, and
> between 64 and 80, it's indistinguishable from SSE2. The jumps to the
> next shelf are postponed, but the jumps are just as high. From earlier
> system benchmarks, I recall it eventually wins out with hundreds of
> elements, right? Is that still true?

It does still eventually win, although not nearly to the same extent as
before.  I extended the benchmark a bit to show this.  I wouldn't be
devastated if we only got 0001 committed for v17, given these results.

> Further, now that the algorithm is more SIMD-appropriate, I wonder
> what doing 4 registers at a time is actually buying us for either SSE2
> or AVX2. It might just be a matter of scale, but that would be good to
> understand.

I'll follow up with these numbers shortly.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
>From 5d4d91d169b973838c99e8d4fdadcb09df36a6ea Mon Sep 17 00:00:00 2001
From: Nathan Bossart 
Date: Wed, 20 Mar 2024 14:20:24 -0500
Subject: [PATCH v6 1/2] pg_lfind32(): add "overlap" code for remaining
 elements

---
 src/include/port/pg_lfind.h | 103 
 1 file changed, 70 insertions(+), 33 deletions(-)

diff --git a/src/include/port/pg_lfind.h b/src/include/port/pg_lfind.h
index b8dfa66eef..22a3711ab5 100644
--- a/src/include/port/pg_lfind.h
+++ b/src/include/port/pg_lfind.h
@@ -80,6 +80,49 @@ pg_lfind8_le(uint8 key, uint8 *base, uint32 nelem)
 	return false;
 }
 
+/*
+ * pg_lfind32_helper
+ *
+ * Searches one 4-register-block of integers.  The caller is responsible for
+ * ensuring that there are at least 4-registers-worth of integers remaining.
+ */
+static inline bool
+pg_lfind32_helper(const Vector32 keys, uint32 *base)
+{
+	const uint32 nelem_per_vector = sizeof(Vector32) / sizeof(uint32);
+	Vector32	vals1,
+vals2,
+vals3,
+vals4,
+result1,
+result2,
+result3,
+result4,
+tmp1,
+tmp2,
+result;
+
+	/* load the next block into 4 registers */
+	vector32_load(, base);
+	vector32_load(, [nelem_per_vector]);
+	vector32_load(, [nelem_per_vector * 2]);
+	vector32_load(, [nelem_per_vector * 3]);
+
+	/* compare each value to the key */
+	result1 = vector32_eq(keys, vals1);
+	result2 = vector32_eq(keys, vals2);
+	result3 = vector32_eq(keys, vals3);
+	result4 = vector32_eq(keys, vals4);
+
+	/* combine the results into a single variable */
+	tmp1 = vector32_or(result1, result2);
+	tmp2 = vector32_or(result3, result4);
+	result = vector32_or(tmp1, tmp2);
+
+	/* return whether there was a match */
+	return vector32_is_highbit_set(result);
+}
+
 /*
  * pg_lfind32
  *
@@ -119,46 +162,40 @@ pg_lfind32(uint32 key, uint32 *base, uint32 nelem)
 	}
 #endif
 
-	for (i = 0; i < tail_idx; i += nelem_per_iteration)
+	/*
+	 * If there aren't enough elements for the SIMD code, jump to the standard
+	 * one-by-one linear search code.
+	 */
+	if (nelem <= nelem_per_iteration)
+		goto one_by_one;
+
+	/*
+	 * Process as many elements as possible with a block of 4 registers.
+	 */
+	do
 	{
-		Vector32	vals1,
-	vals2,
-	vals3,
-	vals4,
-	result1,
-	result2,
-	result3,
-	result4,
-	tmp1,
-	tmp2,
-	result;
-
-		/* load the next block into 4 registers */
-		vector32_load(, [i]);
-		vector32_load(, [i + nelem_per_vector]);
-		vector32_load(, [i + nelem_per_vector * 2]);
-		vector32_load(, [i + nelem_per_vector * 3]);
-
-		/* compare each value to the key */
-		result1 = vector32_eq(keys, vals1);
-		result2 = vector32_eq(keys, vals2);
-		result3 = vector32_eq(keys, vals3);
-		result4 = vector32_eq(keys, vals4);
-
-		/* combine the results into a single variable */
-		tmp1 = vector32_or(result1, result2);
-		tmp2 = vector32_or(result3, result4);
-		result = vector32_or(tmp1, tmp2);
-
-		/* see if there was a match */
-		if (vector32_is_highbit_set(result))
+		if (pg_lfind32_helper(keys, [i]))
 		{
 			Assert(assert_result == true);
 			return true;
 		}
-	}
+
+		i += nelem_per_iteration;
+
+	} while (i < tail_idx);
+
+	/*
+	 * Process the last 'nelem_per_iteration' elements in the array with a
+	 * 4-register block.  This will cause us to

Re: [DOCS] HOT - correct claim about indexes not referencing old line pointers

On Thu, Mar 14, 2024 at 9:07 PM James Coleman  wrote:
> If the goal here is the most minimal patch possible, then please
> commit what you proposed. I am interested in improving the document
> further, but I don't know how to do that easily if the requirement is
> effectively "must only change one specific detail at a time".

So, yesterday I wrote a long email on how I saw the goals here.
Despite our disagreements, I believe we agree that the text I proposed
is better than what's there, so I've committed that change now. I've
also marked the CF entry as committed. Please propose the other
changes you want separately.

-- 
Robert Haas
EDB: http://www.enterprisedb.com

Re: Slow GRANT ROLE on PostgreSQL 16 with thousands of ROLEs

I wrote:
> I poked into this a bit.  It seems the problem is that as of v16, we
> try to search for the "best" role membership path from the current
> user to the target role, and that's done in a very brute-force way,
> as a side effect of computing the set of *all* role memberships the
> current role has.

Actually, roles_is_member_of sucks before v16 too; the new thing
is only that it's being invoked during GRANT ROLE.  Using the
roles created by the given test case, I see in v15:

$ psql
psql (15.6)
Type "help" for help.

regression=# drop table at;
DROP TABLE
regression=# set role a_0010308;
SET
regression=> create table at(f1 int);
CREATE TABLE
regression=> \timing
Timing is on.
regression=> set role acc;
SET
Time: 0.493 ms
regression=> insert into at values(1);
INSERT 0 1
Time: 3565.029 ms (00:03.565)
regression=> insert into at values(1);
INSERT 0 1
Time: 2.308 ms

So it takes ~3.5s to populate the roles_is_member_of cache for "acc"
given this membership set.  This is actually considerably worse than
in v16 or HEAD, where the same test takes about 1.6s for me.

Apparently the OP has designed their use-case so that they dodge these
implementation problems in v15 and earlier, but that's a far cry from
saying that there were no problems with lots-o-roles before v16.

regards, tom lane

Refactoring of pg_resetwal/t/001_basic.pl

2024-03-21 Thread Maxim Orlov

Hi!

In commit 7b5275eec more tests and test coverage were added into
pg_resetwal/t/001_basic.pl.
All the added stuff are pretty useful in my view.  Unfortunately, there
were some magic constants
been used.  In overall, this is not a problem.  But while working on 64 bit
XIDs I've noticed these
changes and spent some time to figure it out what this magic values are
stands fore.

And it turns out that I’m not the only one.

So, by Svetlana Derevyanko's suggestion, I made this patch.  I add
constants, just like we did
in verify_heapam tests.

Sidenote here: in defines in multixact.c TransactionId type used, but I'm
sure this is not correct,
since we're dealing here with MultiXactId and MultiXactOffset.  For now,
this is obviously not a
problem, since sizes of this particular types are equal.  But this will
manifest itself when we switch
to the 64 bits types for MultiXactOffset or MultiXactId.

As always, reviews and opinions are very welcome!

-- 
Best regards,
Maxim Orlov.


v1-0001-Refactor-pg_resetwal-t-001_basic.pl.patch
Description: Binary data

Re: Trying to build x86 version on windows using meson

2024-03-21 Thread Andres Freund

Hi,

On 2024-03-21 07:11:23 -0400, Dave Cramer wrote:
> It seems that attempting to cross-compile on an ARM machine might be asking
> too much as the use cases are pretty limited.

It for sure is if you don't even provide the precise commands and logs of a
failed run...


> So the impetus for this is that folks require 32bit versions of psqlODBC.
> Unfortunately EDB is no longer distributing a 32 bit windows version.
>
> All I really need is a 32bit libpq. This seems like a much smaller lift.
> Suggestions ?

FWIW, I can cross compile postgres from linux to 32bit windows without an
issue. If you really just need a 32bit libpq, that might actually be easier.

cd /tmp/ && rm -rf /tmp/meson-w32 && m setup --buildtype debug -Dcassert=true 
-Db_pch=true --cross-file ~/src/meson/cross/linux-mingw-w64-32bit.txt 
/tmp/meson-w32 ~/src/postgresql && cd /tmp/meson-w32 && ninja

file src/interfaces/libpq/libpq.dll
src/interfaces/libpq/libpq.dll: PE32 executable (DLL) (console) Intel 80386, 
for MS Windows, 19 sections

You'd need a windows openssl to actually have a useful libpq, but that should
be fairly simple.


There are two warnings that I think point to us doing something wrong, but 
they're not affecting libpq:

[1585/1945 42  81%] Linking target src/bin/pgevent/pgevent.dll
/usr/bin/i686-w64-mingw32-ld: warning: resolving _DllRegisterServer by linking 
to _DllRegisterServer@0
Use --enable-stdcall-fixup to disable these warnings
Use --disable-stdcall-fixup to disable these fixups
/usr/bin/i686-w64-mingw32-ld: warning: resolving _DllUnregisterServer by 
linking to _DllUnregisterServer@0


Greetings,

Andres Freund

Re: documentation structure

2024-03-21 Thread Alvaro Herrera

On 2024-Mar-21, Robert Haas wrote:

> On Thu, Mar 21, 2024 at 9:38 AM Tom Lane  wrote:
> > I'd follow the extend.sgml precedent: have a file corresponding to the
> > chapter and containing any top-level text we need, then that includes
> > a file per sect1.
> 
> OK, here's a new patch set. I've revised 0003 and 0004 to use this
> approach, 

Great, thanks.  Looking at the index in the PDF after (only) 0003, we
now have this structure

62. Table Access Method Interface Definition 
... 2475
63. Index Access Method Interface Definition 
... 2476
63.1. Basic API Structure for Indexes 
.. 2476
63.2. Index Access Method Functions 
.. 2479
63.3. Index Scanning 

 2485
63.4. Index Locking Considerations 
. 2486
63.5. Index Uniqueness Checks 
.. 2487
63.6. Index Cost Estimation Functions 
. 2489
64. Generic WAL Records 
.
 2492
65. Custom WAL Resource Managers 
. 2494
66. Built-in Index Access Methods 
.. 2496

which is a bit odd: why are the two WAL chapters in the middle of the
chapters 62 and 63 talking about AMs?  Maybe put 66 right after 63
instead.Also, is it really better to have 62/63 first and 66
later?  It sounds to me like 66 is more user-oriented and the other two
are developer-oriented, so I'm inclined to suggest putting them the
other way around, but I'm not really sure about this.  (Also, starting
chapter 66 straight with 66.1 BTree without any intro text looks a bit
odd; maybe one short introductory paragraph is sufficient?)

> and I've added a new 0005 that does essentially the same
> thing for the PL chapters.

I was looking at the PL chapters earlier today too, wondering whether
this would be valuable; but I worry that there are too many
sub-sub-sections there, so it could end up being a bit messy.  I didn't
look at the resulting output though.

> 0001 and 0002 are [un]changed. Should 0002 use the include-an-entity
> approach as well?

Shrug, I wouldn't, doesn't look worth it.

-- 
Álvaro Herrera   48°01'N 7°57'E  —  https://www.EnterpriseDB.com/
"No es bueno caminar con un hombre muerto"

Re: AIX support

2024-03-21 Thread Sriram RK

Thanks, Tom and Alvaro, for the info.
We shall look into to details and get back.

From: Tom Lane 
Date: Thursday, 21 March 2024 at 7:27 PM
To: Sriram RK 
Cc: pgsql-hack...@postgresql.org 
Subject: Re: AIX support
Sriram RK  writes:
> We are working on AIX systems and noticed that the thread on removing AIX 
> support in Postgres going forward.
> https://github.com/postgres/postgres/commit/0b16bb8776bb834eb1ef8204ca95dd7667ab948b
> We would be glad to understand any outstanding issues hindering the
> support on AIX.

Did you read the linked thread?  Starting say about here:

https://www.postgresql.org/message-id/flat/20240224172345.32%40rfd.leadboat.com#8b41e50c2190c82c861d91644eed9c30

> Also we would like to provide any feasible support from our end for enabling 
> the support on AIX.

Who is "we", and how much resources are you prepared to put into this?

> We would request the community to extend the support on AIX ..

The community, in the sense of the existing people doing significant
work on Postgres, are absolutely not going to do that.  If you can
bring a bunch of work to fix all the problems noted in the discussion
thread, and commit to providing ongoing development manpower and
hardware to keep it working, maybe something could happen.  But I
suspect you will find it cheaper to start thinking about migrating
off AIX.

regards, tom lane

Re: documentation structure

On Thu, Mar 21, 2024 at 10:31 AM Robert Haas  wrote:
> 0001 and 0002 are changed. Should 0002 use the include-an-entity
> approach as well?

Woops. I meant to say that 0001 and 0002 are *unchanged*.

-- 
Robert Haas
EDB: http://www.enterprisedb.com

Re: ALTER TABLE SET ACCESS METHOD on partitioned tables

2024-03-21 Thread Michael P

On Mar 21, 2024, at 13:07, Alvaro Herrera  wrote:
> Given that Michaël is temporarily gone, I propose to push the attached
> tomorrow.

Thanks for doing so. I’m wondering whether I should be an author of this patch 
at this stage, tbh. I wrote all the tests and rewrote most of the internals to 
adjust with the consensus reached ;)
--
Michael

Re: 13dev failed assert: comparetup_index_btree(): ItemPointer values should never be equal

2024-03-21 Thread Andrey M. Borodin




> On 21 Mar 2024, at 18:54, Peter Geoghegan  wrote:
>  Do the posting lists that are corrupt
> (reported on elsewhere) also have duplicate TIDs?

I do not have access now, but AFAIR yes.
Thanks for pointers!

BTW there were also some errors in logs like
ERROR:  index "tablename" contains unexpected zero page at block 1265985 HINT:
and even
MultiXactId 34043703 has not been created yet -- apparent wraparound
"right sibling's left-link doesn't match: right sibling 4 of target 2 with 
leafblkno 2 and scanblkno 2 spuriously links to non-target 1 on level 0 of 
index "indexname"

Multixact problem was fixed by vacuum freeze, other indexes were repacked.


> On 21 Mar 2024, at 20:21, Matthias van de Meent 
>  wrote:
> 
> Would you happen to have a PostgreSQL version number (or commit hash)
> to help debugging? Has it always had that PG version, or has the
> version been upgraded?

Vanilla 14.11 (14.10 when created).

> Considering the age of this thread, and thus potential for v14 pre-.4:
> Did this cluster use REINDEX (concurrently) for the relevant indexes?


As now I see, chances are my case is not related to the original thread.

Thanks!


Best regards, Andrey Borodin.

Re: Slow GRANT ROLE on PostgreSQL 16 with thousands of ROLEs

[ redirecting to -hackers ]

alex work  writes:
> We encounter slow `GRANT ROLES` only on PostgreSQL 16 instances up to 42 
> seconds
> in production, the client process at PostgresSQL would use 100% of the CPU.
> Which is a surprise compared to other instances running older PostgreSQL
> releases. On production we have a *LOT* of ROLEs, which unfortunately a case
> that we did not test before switching the new servers into production mode.

I poked into this a bit.  It seems the problem is that as of v16, we
try to search for the "best" role membership path from the current
user to the target role, and that's done in a very brute-force way,
as a side effect of computing the set of *all* role memberships the
current role has.  In the given case, we could have skipped all that
if we simply tested whether the current role is directly a member
of the target: it is, so there can't be any shorter path.  But in
any case roles_is_member_of has horrid performance when the current
role is a member of a lot of roles.

It looks like part of the blame might be ascribable to catcache.c,
as if you look at the problem microscopically you find that
roles_is_member_of is causing catcache to make a ton of AUTHMEMMEMROLE
catcache lists, and SearchSysCacheList is just iterating linearly
through the cache's list-of-lists, so that search is where the O(N^2)
time is actually getting taken.  Up to now that code has assumed that
any one catcache would not have very many catcache lists.  Maybe it's
time to make that smarter; but since we've gotten away with this
implementation for decades, I can't help feeling that the real issue
is with roles_is_member_of's usage pattern.

For self-containedness, attached is a directly usable shell script
to reproduce the problem.  The complaint is that the last GRANT
takes multiple seconds (about 5s on my machine), rather than
milliseconds.

regards, tom lane

#!/bin/bash

echo "CREATE ROLE acc WITH LOGIN NOSUPERUSER INHERIT CREATEDB CREATEROLE NOREPLICATION;" > create-roles.sql

#create a lot of `a_` roles and make sure `acc` is member of each one of them:
for idx1 in $(seq -w 1 100); do for idx2 in $(seq -w 1 12); do for idx3 in $(seq -w 1 10); do
 echo "CREATE ROLE a_${idx1}${idx2}${idx3} WITH NOSUPERUSER NOCREATEDB NOCREATEROLE INHERIT LOGIN;"
 echo "GRANT a_${idx1}${idx2}${idx3} TO acc WITH ADMIN OPTION;"
done; done; done >>create-roles.sql

#create a lot of `d_` roles and make sure `acc` is member of each one of them:
for idx1 in $(seq -w 1 100); do for idx2 in $(seq -w 1 12); do for idx3 in $(seq -w 1 10); do
 echo "CREATE ROLE d_${idx1}${idx2}${idx3} WITH NOSUPERUSER NOCREATEDB NOCREATEROLE INHERIT LOGIN;"
 echo "GRANT d_${idx1}${idx2}${idx3} TO acc WITH ADMIN OPTION;"
done; done; done >>create-roles.sql

#create a lot of `s_` roles:
for idx1 in $(seq -w 1 100); do for idx2 in $(seq -w 1 12); do for idx3 in $(seq -w 1 10); do
 echo "CREATE ROLE s_${idx1}${idx2}${idx3} WITH NOSUPERUSER NOCREATEDB NOCREATEROLE INHERIT LOGIN;"
done; done; done >>create-roles.sql

time psql -f create-roles.sql -q -d postgres

time psql -U acc postgres -c 'GRANT d_0010109 TO s_0010109;'

Re: Possibility to disable `ALTER SYSTEM`

2024-03-21 Thread Robert Treat

On Wed, Mar 20, 2024 at 10:31 PM Magnus Hagander  wrote:
>
> On Wed, Mar 20, 2024 at 8:52 PM Robert Haas  wrote:
>>
>> On Wed, Mar 20, 2024 at 3:17 PM Magnus Hagander  wrote:
>> > Right, what I meant is that making it a packaging decision is the better 
>> > place. Wherever it goes, allowing the administrator to choose what fits 
>> > them should be made possible.
>>
>> +1. Which is also the justification for this patch, when it comes
>> right down to it. The administrator gets to decide how the contents of
>> postgresql.conf are to be managed on their particular installation.
>
>
> Not really. The administrator can *already* do that. It's trivial.
>
> This patch is about doing it in a way that doesn't produce as ugly a 
> message.But if we're "delegating" it to packagers and "os administrators", 
> then the problem is already solved. This patch is about trying to solve it 
> *without* involving the packagers or OS administrators.
>
> Not saying we shouldn't do it, but I'd argue the exact opposite of yours 
> aboe, which is that it's very much not the justification of the patch :)
>
>
>>
>> They can decide that postgresql.conf should be writable by the same
>> user that runs PostgreSQL, or not. And they should also be able to
>> decide that ALTER SYSTEM is an OK way to change configuration, or that
>> it isn't. How we enable them to make that decision is a point for
>> discussion, and how exactly we phrase the documentation is a point for
>> discussion, but we have no business trying to impose conditions, as if
>> they're only allowed to make that decision if they conform to some
>> (IMHO ridiculous) requirements that we dictate from on high. It's
>> their system, not ours.
>
>
> Agreed on all those except they can already do this. It's just that the error 
> message is ugly. The path of least resistance would be to just specifically 
> detect a permissions error on the postgresql.auto.conf file when you try to 
> do ALTER SYSTEM, and throw at least an error hint about "you must allow 
> writing to this file for the feature to work".
>
> So this patch isn't at all about enabling this functionality. It's about 
> making it more user friendly.
>
>
>> I mean, for crying out loud, users can set enable_seqscan=off in
>> postgresql.conf and GLOBALLY DISABLE SEQUENTIAL SCANS. They can set
>
>
> This is actually a good example, because it's kind of like this patch. It 
> doesn't *actually* disable the ability to run sequential scans, it just 
> disables the "usual way". Just like this patch doesn't prevent the superuser 
> from editing the config, but it does prevent them droin doing it "the usual 
> way".
>
>
>>
>> zero_damaged_pages=on in postgresql.conf and silently remove vast
>> quantities of data without knowing that they're doing anything. We
>> don't even question that stuff ... although we probably should be
>
>
> I like how you got this far and didn't even mention fsync=off :)
>

And yet somehow query hints are more scary than ALL of these things. Go figure!

Robert Treat
https://xzilla.net

Re: 13dev failed assert: comparetup_index_btree(): ItemPointer values should never be equal

2024-03-21 Thread Matthias van de Meent

On Thu, 21 Mar 2024 at 07:17, Andrey M. Borodin  wrote:
>
> > On 29 Jun 2022, at 17:43, Robins Tharakan  wrote:
>
> Sorry to bump ancient thread, I have some observations that might or might 
> not be relevant.
> Recently we noticed a corruption on one of clusters. The corruption at hand 
> is not in system catalog, but in user indexes.
> The cluster was correctly configured: checksums, fsync, FPI etc.
> The cluster never was restored from a backup. It’s a single-node cluster, so 
> it was not ever promoted, pg_rewind-ed etc. VM had never been rebooted.
>
> But, the cluster had been experiencing 10 OOMs a day. There were no torn 
> pages, no checsum erros at log at all. Yet, B-tree indexes became corrupted.

Would you happen to have a PostgreSQL version number (or commit hash)
to help debugging? Has it always had that PG version, or has the
version been upgraded?

Considering the age of this thread, and thus potential for v14 pre-.4:
Did this cluster use REINDEX (concurrently) for the relevant indexes?


Kind regards,

Matthias van de Meent

Re: UUID v7

2024-03-21 Thread Jelte Fennema-Nio

On Wed, 20 Mar 2024 at 19:08, Andrey M. Borodin  wrote:
> Timer-based bits contribute to global sortability. But the real timers we 
> have are not even millisecond adjusted. We can hope for ~few ms variation in 
> one datacenter or in presence of atomic clocks.

I think the main benefit of using microseconds would not be
sortability between servers, but sortability between backends. With
the current counter approach between backends we only have sortability
at the millisecond level.

However, I don't really think it is incredibly important to get the
"perfect" approach to filling in rand_a/rand_b right now. As long as
we don't document what we do, we can choose to change the method
without breaking backwards compatibility. Because either approach
results in valid UUIDv7s.

Re: An improved README experience for PostgreSQL

On Thu, Mar 21, 2024 at 03:24:17PM +0100, Daniel Gustafsson wrote:
>> On 21 Mar 2024, at 15:11, Nathan Bossart  wrote:
>> I added that to maintain the line break that was in the non-Markdown
>> version.  I'd rather match the style of the following paragraph (patch
>> attached) than mess with .gitattributes.
> 
> +1, this looks better IMHO.

Committed.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

Re: SQL:2011 application time


On 20.03.24 17:21, Paul Jungwirth wrote:

On 3/20/24 03:55, jian he wrote:

hi.
minor cosmetic issues, other than that, looks good.

*pk_period = (indexStruct->indisexclusion);
to
*pk_period = indexStruct->indisexclusion;

... >
if (with_period && !fkconstraint->fk_with_period)
ereport(ERROR,
(errcode(ERRCODE_INVALID_FOREIGN_KEY),
errmsg("foreign key uses PERIOD on the referenced table but not the
referencing table")));


Both included in the new patches here.

Rebased to a0390f6ca6.


Two more questions:

1. In ri_triggers.c ri_KeysEqual, you swap the order of arguments to 
ri_AttributesEqual():


-   if (!ri_AttributesEqual(riinfo->ff_eq_oprs[i], 
RIAttType(rel, attnums[i]),

-   oldvalue, newvalue))
+   if (!ri_AttributesEqual(eq_opr, RIAttType(rel, attnums[i]),
+   newvalue, oldvalue))

But the declared arguments of ri_AttributesEqual() are oldvalue and 
newvalue, so passing them backwards is really confusing.  And the change 
does matter in the tests.


Can we organize this better?

2. There are some tests that error with

ERROR:  only b-tree indexes are supported for non-PERIOD foreign keys

But this is an elog() error, so should not normally be visible.  I 
suspect some other error should really show here, and the order of 
checks is a bit wrong or something?

Re: Support a wildcard in backtrace_functions

2024-03-21 Thread Jelte Fennema-Nio

On Wed, 13 Mar 2024 at 16:32, Jelte Fennema-Nio  wrote:
> How
> about the following aproach. It still uses 3 GUCs, but they now all
> work together. There's one entry point and two additional filters
> (level and function name)
>
> # Says what log entries to log backtraces for
> log_backtrace = {all|internal|none} (default: internal)
>
> # Excludes log entries from log_include_backtrace by level
> backtrace_min_level = {debug4|...|fatal} (default: error)
>
> # Excludes log entries from log_include_backtrace if function name
> # does not match list, but empty string disables this filter (thus
> # logging for all functions)
> backtrace_functions = {...} (default: '')

Attached is a patch that implements this. Since the more I think about
it, the more I like this approach.


v8-0001-Add-PGErrorVerbosity-to-typedefs.list.patch
Description: Binary data


v8-0002-Restructure-backtrace-related-GUCs.patch
Description: Binary data

Re: documentation structure

On Thu, Mar 21, 2024 at 9:38 AM Tom Lane  wrote:
> I'd follow the extend.sgml precedent: have a file corresponding to the
> chapter and containing any top-level text we need, then that includes
> a file per sect1.

OK, here's a new patch set. I've revised 0003 and 0004 to use this
approach, and I've added a new 0005 that does essentially the same
thing for the PL chapters.

0001 and 0002 are changed. Should 0002 use the include-an-entity
approach as well?

-- 
Robert Haas
EDB: http://www.enterprisedb.com


v2-0004-docs-Consolidate-into-new-WAL-for-Extensions-chap.patch
Description: Binary data


v2-0002-docs-Demote-Monitoring-Disk-Usage-from-chapter-to.patch
Description: Binary data


v2-0003-docs-Merge-separate-chapters-on-built-in-index-AM.patch
Description: Binary data


v2-0001-docs-Remove-the-Installation-from-Binaries-chapte.patch
Description: Binary data


v2-0005-docs-Merge-all-procedural-language-documentation-.patch
Description: Binary data

Re: speed up a logical replica setup

2024-03-21 Thread Euler Taveira

On Thu, Mar 21, 2024, at 10:33 AM, vignesh C wrote:
> If we commit this we might not be able to change the way the option
> behaves once the customers starts using it. How about removing these
> options in the first version and adding it in the next version after
> more discussion.

We don't need to redesign this one if we want to add a format string in a next
version. A long time ago, pg_dump started to accept pattern for tables without
breaking or deprecating the -t option. If you have 100 databases and you don't
want to specify the options or use a script to generate it for you, you also
have the option to let pg_createsubscriber generate the object names for you.
Per my experience, it will be a rare case.

> Currently dry-run will do the check and might fail on identifying a
> few failures like after checking subscriber configurations. Then the
> user will have to correct the configuration and re-run then fix the
> next set of failures. Whereas the suggest-config will display all the
> optimal configuration for both the primary and the standby in a single
> shot. This is not a must in the first version, it can be done as a
> subsequent enhancement.

Do you meant publisher, right? Per order, check_subscriber is done before
check_publisher and it checks all settings on the subscriber before exiting. In
v30, I changed the way it provides the required settings. In a previous version,
it fails when it found a wrong setting; the current version, check all settings
from that server before providing a suitable error.

pg_createsubscriber: checking settings on publisher
pg_createsubscriber: primary has replication slot "physical_slot"
pg_createsubscriber: error: publisher requires wal_level >= logical
pg_createsubscriber: error: publisher requires 2 replication slots, but only 0 
remain
pg_createsubscriber: hint: Consider increasing max_replication_slots to at 
least 3.
pg_createsubscriber: error: publisher requires 2 wal sender processes, but only 
0 remain
pg_createsubscriber: hint: Consider increasing max_wal_senders to at least 3.

If you have such an error, you will fix them all and rerun using dry run mode
again to verify everything is ok. I don't have a strong preference about it. It
can be changed easily (unifying the check functions or providing a return for
each of the check functions).

--
Euler Taveira
EDB   https://www.enterprisedb.com/

Re: Catalog domain not-null constraints

Peter Eisentraut  writes:
> 
> A quick reading of the SQL standard suggests to me that the way we are 
> doing null handling in domain constraints is all wrong.  The standard 
> says that domain constraints are only checked on values that are not 
> null.  So both the handling of constraints using the CHECK syntax is 
> nonstandard and the existence of explicit NOT NULL constraints is an 
> extension.  The CREATE DOMAIN reference page already explains why all of 
> this is a bad idea.  Do we want to document all of that further, or 
> maybe we just want to rip out domain not-null constraints, or at least 
> not add further syntax for it?
> 

Yeah.  The real problem with domain not null is: how can a column
that's propagated up through the nullable side of an outer join
still be considered to belong to such a domain?

The SQL spec's answer to that conundrum appears to be "NULL is
a valid value of every domain, and if you don't like it, tough".
I'm too lazy to search the archives, but we have had at least one
previous discussion about how we should adopt the spec's semantics.
It'd be an absolutely trivial fix in CoerceToDomain (succeed
immediately if input is NULL), but the question is what to do
with existing "DOMAIN NOT NULL" DDL.

Anyway, now that I recall all that, e5da0fe3c is throwing good work
after bad, and I wonder if we shouldn't revert it.

regards, tom lane

Re: An improved README experience for PostgreSQL

> On 21 Mar 2024, at 15:11, Nathan Bossart  wrote:
> 
> On Thu, Mar 21, 2024 at 02:42:27PM +0100, Peter Eisentraut wrote:
>> The committed README.md contains trailing whitespace on one line:
>> 
>> General documentation about this version of PostgreSQL can be found at:$
>> -https://www.postgresql.org/docs/devel/$
>> +  $
>> 
>> If this is intentional (it could be, since trailing whitespace is
>> potentially significant in Markdown), then please add something to
>> .gitattributes.  Otherwise, please fix that.
> 
> I added that to maintain the line break that was in the non-Markdown
> version.  I'd rather match the style of the following paragraph (patch
> attached) than mess with .gitattributes.

+1, this looks better IMHO.

--
Daniel Gustafsson

Re: An improved README experience for PostgreSQL

On Thu, Mar 21, 2024 at 02:42:27PM +0100, Peter Eisentraut wrote:
> The committed README.md contains trailing whitespace on one line:
> 
>  General documentation about this version of PostgreSQL can be found at:$
> -https://www.postgresql.org/docs/devel/$
> +  $
> 
> If this is intentional (it could be, since trailing whitespace is
> potentially significant in Markdown), then please add something to
> .gitattributes.  Otherwise, please fix that.

I added that to maintain the line break that was in the non-Markdown
version.  I'd rather match the style of the following paragraph (patch
attached) than mess with .gitattributes.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
diff --git a/README.md b/README.md
index 445eb2f2d0..f6104c038b 100644
--- a/README.md
+++ b/README.md
@@ -11,11 +11,10 @@ and functions.  This distribution also contains C language bindings.
 
 Copyright and license information can be found in the file COPYRIGHT.
 
-General documentation about this version of PostgreSQL can be found at:
-  
-In particular, information about building PostgreSQL from the source
-code can be found at:
-
+General documentation about this version of PostgreSQL can be found at
+.  In particular, information
+about building PostgreSQL from the source code can be found at
+.
 
 The latest version of this software, and related software, may be
 obtained at .  For more information

Re: AIX support

Sriram RK  writes:
> We are working on AIX systems and noticed that the thread on removing AIX 
> support in Postgres going forward.
> https://github.com/postgres/postgres/commit/0b16bb8776bb834eb1ef8204ca95dd7667ab948b
> We would be glad to understand any outstanding issues hindering the
> support on AIX.

Did you read the linked thread?  Starting say about here:

https://www.postgresql.org/message-id/flat/20240224172345.32%40rfd.leadboat.com#8b41e50c2190c82c861d91644eed9c30

> Also we would like to provide any feasible support from our end for enabling 
> the support on AIX.

Who is "we", and how much resources are you prepared to put into this?

> We would request the community to extend the support on AIX ..

The community, in the sense of the existing people doing significant
work on Postgres, are absolutely not going to do that.  If you can
bring a bunch of work to fix all the problems noted in the discussion
thread, and commit to providing ongoing development manpower and
hardware to keep it working, maybe something could happen.  But I
suspect you will find it cheaper to start thinking about migrating
off AIX.

regards, tom lane

Re: AIX support

2024-03-21 Thread Alvaro Herrera

On 2024-Mar-21, Sriram RK wrote:

> Hello Team,
> 
> We are working on AIX systems and noticed that the thread on removing AIX 
> support in Postgres going forward.
> 
> https://github.com/postgres/postgres/commit/0b16bb8776bb834eb1ef8204ca95dd7667ab948b”
> 
> We would be glad to understand any outstanding issues hindering the support 
> on AIX.

There's a Discussion link at the bottom of that commit message.  I
suggest you read that discussion complete, and consider how much effort
you or your company are willing to spend on doing the maintenance of the
port yourselves for the community.  Maybe ponder this question: would it
be less onerous to migrate your Postgres servers to Linux, like Phil
Florent described on the currently-last message of that thread?

-- 
Álvaro Herrera PostgreSQL Developer  —  https://www.EnterpriseDB.com/
"Para tener más hay que desear menos"

Re: 13dev failed assert: comparetup_index_btree(): ItemPointer values should never be equal

2024-03-21 Thread Peter Geoghegan

On Thu, Mar 21, 2024 at 2:16 AM Andrey M. Borodin  wrote:
> Most of these messages look similar, except last one: “cross page item order 
> invariant violated for index”. Indeed, index scans were hanging in a cycle.
> I could not locate problem in WAL yet, because a lot of other stuff is going 
> on. But I have no other ideas, but suspect that posting list redo is 
> corrupting index in case of a crash.

Some of these errors seem unrelated to posting lists. For example, this one:

2024-03-01 11:54:08,162 ERROR : Corrupted index: 96066
webhooks_webhookresponse_webhook_id_db49ebcd XX002 ERROR: item order
invariant violated for index
"webhooks_webhookresponse_webhook_id_db49ebcd" DETAIL: Lower index
tid=(522,24) (points to heap tid=(73981,1)) higher index tid=(522,25)
(points to heap tid=(73981,1)) page lsn=31B/E522B640.

Notice that there are duplicate heap TIDs here, but no posting list.
This is almost certainly a symptom of heap related corruption -- often
a problem with recovery. Do the posting lists that are corrupt
(reported on elsewhere) also have duplicate TIDs?

Such problems tend to first get noticed when inserts fail with
"posting list split failed" errors -- but that's just a symptom. It
just so happens that the hardening added to places like
_bt_swap_posting() and _bt_binsrch_insert() is much more likely to
visibly break than anything else, at least in practice.

-- 
Peter Geoghegan

Re: An improved README experience for PostgreSQL

2024-03-21 Thread Roberto Mello

On Wed, Feb 28, 2024 at 11:55 AM David E. Wheeler 
wrote:

>
> IME the keys to decent-looking Markdown are:
>
> 1. Wrapping lines to a legible width (76-80 chars)
> 2. Link references rather than inline links


+1 on Markdown including David's suggestions above. Agree that without
proper guidelines,
md files can become messy looking.

Roberto

Re: An improved README experience for PostgreSQL