Re: Document NULL

2024-05-01 Thread Tom Lane
David Rowley  writes:
> Let's bash it into shape a bit more before going any further on actual 
> wording.

FWIW, I want to push back on the idea of making it a tutorial section.
I too considered that, but in the end I think it's a better idea to
put it into the "main" docs, for two reasons:

1. I want this to be a fairly official/formal statement about how we
treat nulls; not that it has to be written in dry academic style or
whatever, but it has to be citable as The Reasons Why We Act Like That,
so the tutorial seems like the wrong place.

2. I think we'll soon be cross-referencing it from other places in the
docs, even if we don't actually move existing bits of text into it.
So again, cross-ref'ing the tutorial doesn't feel quite right.

Those arguments don't directly say where it should go, but after
surveying things a bit I think it could become section 5.2 in
ddl.sgml, between "Table Basics" and "Default Values".  Another
angle could be to put it after "Default Values" --- except that
that section already assumes you know what a null is.

I've not read any of David's text in detail yet, but that's my
two cents on where to place it.

regards, tom lane




Re: Document NULL

2024-05-01 Thread David Rowley
On Thu, 2 May 2024 at 03:12, David G. Johnston
 wrote:
> Attached is a very rough draft attempting this, based on my own thoughts and 
> those expressed by Tom in [1], which largely align with mine.

Thanks for picking this up. I agree that we should have something to
improve this.

It would be good to see some subtitles in this, e.g. "Three-valued
boolean logic", and to document that NULL means unknown and is therefore
not treated as true in places such as WHERE clauses.  Giving a few
examples would be good too, as it at least demonstrates a simple way of
testing these things using a simple FROM-less SELECT, e.g.
"SELECT NULL = NULL;".  You could link to this section from where we
document WHERE clauses.

Maybe another subtitle would be "GROUP BY / DISTINCT clauses with NULL
values", and then explain that behaviour, including some other examples
using "SELECT 1 IS NOT DISTINCT FROM NULL;" to allow the reader to
experiment and learn by running queries.
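
For example, something like this (just a sketch of the sort of examples
I mean, not proposed wording):

  SELECT NULL = NULL;                     -- NULL (unknown), not true
  SELECT 1 = NULL;                        -- NULL (unknown)
  SELECT 1 IS NOT DISTINCT FROM NULL;     -- false
  SELECT NULL IS NOT DISTINCT FROM NULL;  -- true

  -- GROUP BY and DISTINCT treat NULLs as belonging to one group
  SELECT x, count(*) FROM (VALUES (1), (NULL), (NULL)) AS t(x) GROUP BY x;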

You likely skipped them due to draft status, but if not, references to
material covered in other sections could do with links back to the
section in question, e.g. "amount of precipitation Hayward" is not on
that page.  Without that you're assuming the reader is reading the
documents linearly.

Another section might briefly explain about disallowing NULLs in
columns with NOT NULL constraints, then link to wherever we properly
document those.

typo:

+ Handling Unkowns (NULL)

Maybe inject "Values" after Unknown.

Let's bash it into shape a bit more before going any further on actual wording.

David




Re: Document NULL

2024-05-01 Thread Kashif Zeeshan
On Wed, May 1, 2024 at 8:12 PM David G. Johnston 
wrote:

> Hi,
>
> Over in [1] it was rediscovered that our documentation assumes the reader
> is familiar with NULL.  It seems worthwhile to provide both an introduction
> to the topic and an overview of how this special value gets handled
> throughout the system.
>
> Attached is a very rough draft attempting this, based on my own thoughts
> and those expressed by Tom in [1], which largely align with mine.
>
> I'll flesh this out some more once I get support for the goal, content,
> and placement.  On that point, NULL is a fundamental part of the SQL
> language and so having it be a section in a Chapter titled "SQL Language"
> seems to fit well, even if that falls into our tutorial.  Framing this up
> as tutorial content won't be that hard, though I've skipped on examples and
> such pending feedback.  It really doesn't fit as a top-level chapter under
> part II nor really under any of the other chapters there.  The main issue
> with the tutorial is the forward references to concepts not yet discussed
> but problem points there can be addressed.
>
> I do plan to remove the entity reference and place the content into
> query.sgml directly in the final version.  It is just much easier to write
> an entire new section in its own file.
>

I reviewed the documentation update and it's quite extensive, but I
think it would be better to include some examples as well.

Regards
Kashif Zeeshan

>
> David J.
>
> [1]
> https://www.postgresql.org/message-id/1859814.1714532025%40sss.pgh.pa.us
>
>


Re: [PATCH] json_lex_string: don't overread on bad UTF8

2024-05-01 Thread Michael Paquier
On Thu, May 02, 2024 at 11:23:13AM +0900, Michael Paquier wrote:
> About the fact that we may finish by printing unfinished UTF-8
> sequences, I'd be curious to hear your thoughts.  Now, the information
> provided about the partial byte sequences can be also useful for
> debugging on top of having the error code, no?

By the way, while I have that in mind..  I am not sure that it is
worth spending cycles on detecting the unfinished sequences and making
them printable.  Wouldn't it be enough for most cases to adjust
token_error() to truncate the byte sequences we cannot print?

Another thing that I think would be nice would be to calculate the
location of what we're parsing on a given line, and provide that in
the error context.  That would not be backpatchable as it requires a
change in JsonLexContext, unfortunately, but it would help in making
more sense with an error if the incomplete byte sequence is at the
beginning of a token or after an expected character.
--
Michael




Re: [PATCH] json_lex_string: don't overread on bad UTF8

2024-05-01 Thread Michael Paquier
On Wed, May 01, 2024 at 04:22:24PM -0700, Jacob Champion wrote:
> On Tue, Apr 30, 2024 at 11:09 PM Michael Paquier  wrote:
>> Not sure to like much the fact that this advances token_terminator
>> first.  Wouldn't it be better to calculate pg_encoding_mblen() first,
>> then save token_terminator?  I feel a bit uneasy about saving a value
>> in token_terminator past the end of the string.  That a nit in this
>> context, still..
> 
> v2 tries it that way; see what you think. Is the concern that someone
> might add code later that escapes that macro early?

Yeah, I am not sure if that's something that would really happen, but
it looks like a good practice to follow anyway so as to keep a clean
stack at any time.

>> Ah, that makes sense.  That looks OK here.  A comment around the test
>> would be adapted to document that, I guess.
> 
> Done.

That seems OK at a quick glance.  I don't have much room to do anything
about this patch this week because of Golden Week and the buildfarm
effect, but I should be able to get to it next week once the next round
of minor releases is tagged.

About the fact that we may finish by printing unfinished UTF-8
sequences, I'd be curious to hear your thoughts.  Now, the information
provided about the partial byte sequences can be also useful for
debugging on top of having the error code, no?
--
Michael




Re: New GUC autovacuum_max_threshold ?

2024-05-01 Thread David Rowley
On Sat, 27 Apr 2024 at 02:13, Robert Haas  wrote:
> Let's compare the current situation to the situation post-patch with a
> cap of 500k. Consider a table 1024 times larger than the one I
> mentioned above, so pgbench scale factor 25600, size on disk 320GB.
> Currently, that table will be vacuumed for bloat when the number of
> dead tuples exceeds 20% of the table size, because that's the default
> value of autovacuum_vacuum_scale_factor. The table has 2.56 billion
> tuples, so that means that we're going to vacuum it when there are
> more than 510 million dead tuples. Post-patch, we will vacuum when we
> have 500 thousand dead tuples. Suppose a uniform workload that slowly
> updates rows in the table. If we were previously autovacuuming the
> table once per day (1440 minutes) we're now going to try to vacuum it
> almost every minute (1440 minutes / 1024 = 84 seconds).

I've not checked your maths, but if that's true, that's not going to work.

I think there are fundamental problems with the parameters that drive
autovacuum that need to be addressed before we can consider a patch
like this one.

Here are some of the problems that I know about:

1. Autovacuum has exactly zero forward vision and operates reactively
rather than proactively.  This "blind operating" causes tables to
either not need vacuuming or to suddenly need vacuuming, without any
consideration of how busy autovacuum is at that moment.
2. There is no prioritisation for the order in which tables are autovacuumed.
3. With the default scale factor, the larger a table becomes, the more
infrequent the autovacuums.
4. Autovacuum is more likely to trigger when the system is busy
because more transaction IDs are being consumed and there is more DML
occurring. This results in autovacuum having less work to do during
quiet periods when there are more free resources to be doing the
vacuum work.

In my opinion, the main problem with Frédéric's proposed GUC/reloption
is that it increases the workload that autovacuum is responsible for
and, because of #2, it becomes more likely that autovacuum works on
some table that isn't the highest priority table to work on which can
result in autovacuum starvation of tables that are more important to
vacuum now.

I think we need to do a larger overhaul of autovacuum to improve
points 1-4 above.  I also think that there's some work coming up that
might force us into this sooner than we think.  As far as I understand
it, AIO will break vacuum_cost_page_miss because everything (providing
IO keeps up) will become vacuum_cost_page_hit. Maybe that's not that
important as that costing is quite terrible anyway.

Here's a sketch of an idea that's been in my head for a while:

Round 1:
1a) Give autovacuum forward vision (#1 above) and instead of vacuuming
a table when it (atomically) crosses some threshold, use the existing
scale_factors and autovacuum_freeze_max_age to give each table an
autovacuum "score", which could be a number from 0-100, where 0 means
do nothing and 100 means nuclear meltdown.  Let's say a table gets 10
points for the dead tuples meeting the current scale_factor and maybe
an additional point for each 10% of the database's size that the table
occupies (gives some weight to space recovery for larger tables).  For
relfrozenxid, make the score the maximum of the dead tuple score vs the
percentage of the way age(relfrozenxid) is towards 2 billion.  Use a
similar maximum score calc for age(relminmxid) vs 2 billion.
1b) Add a new GUC that defines the minimum score a table must reach
before autovacuum will consider it.
1c) Change autovacuum to vacuum the tables with the highest scores first.

Round 2:
2a) Have autovacuum monitor the score of the highest scoring table
over time with buckets for each power of 2 seconds in history from
now. Let's say 20 buckets, about 12 days of history. Migrate scores
into older buckets to track the score over time.
2b) Have autovacuum cost limits adapt according to the history so that
if the maximum score of any table is trending upwards, that autovacuum
speeds up until the score buckets trend downwards towards the present.
2c) Add another GUC to define the minimum score at which autovacuum will
be "proactive". This must be less than the minimum score to consider
autovacuum (or at least, be ignored unless it is).  This GUC would not
cause an autovacuum speedup due to 2b) as we'd only consider tables
which meet the GUC added in 1b) in the score history array in 2a).
This stops autovacuum running faster than autovacuum_cost_limit when
trying to be proactive.

The above isn't a well-baked idea, and the exact way to calculate the
scores certainly isn't well thought through.  However, I
do think it's an idea that we should consider and improve upon.  I
believe 2c) helps solve the problem of large tables becoming bloated
as autovacuum could get to these sooner when the workload is low
enough for it to run proactively.
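
To make 1a) slightly more concrete, here is a very rough SQL mock-up of
the kind of score I have in mind.  The constants and weightings are
placeholders for illustration only, not a worked-out design:

  SELECT c.oid::regclass AS table_name,
         GREATEST(
             -- 10 points for passing the current dead-tuple threshold,
             -- plus a point per 10% of the database this table occupies
             CASE WHEN s.n_dead_tup >
                       current_setting('autovacuum_vacuum_threshold')::bigint
                     + current_setting('autovacuum_vacuum_scale_factor')::float8
                       * c.reltuples
                  THEN 10 ELSE 0 END
             + 10 * pg_total_relation_size(c.oid)::float8
                  / pg_database_size(current_database()),
             -- wraparound components: how far towards 2 billion we are
             100.0 * age(c.relfrozenxid) / 2e9,
             100.0 * mxid_age(c.relminmxid) / 2e9
         ) AS score
  FROM pg_class c
  JOIN pg_stat_user_tables s ON s.relid = c.oid
  WHERE c.relkind = 'r'
  ORDER BY score DESC;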

I think we need at least 1a) before we can give 

Re: [PATCH] json_lex_string: don't overread on bad UTF8

2024-05-01 Thread Jacob Champion
On Tue, Apr 30, 2024 at 11:09 PM Michael Paquier  wrote:
> Not sure to like much the fact that this advances token_terminator
> first.  Wouldn't it be better to calculate pg_encoding_mblen() first,
> then save token_terminator?  I feel a bit uneasy about saving a value
> in token_terminator past the end of the string.  That a nit in this
> context, still..

v2 tries it that way; see what you think. Is the concern that someone
might add code later that escapes that macro early?

> Hmm.  Keeping it around as currently designed means that it could
> cause more harm than anything in the long term, even in the stable
> branches if new code uses it.  There is a risk of seeing this new code
> incorrectly using it again, even if its top comment tells that we rely
> on the string being nul-terminated.  A safer alternative would be to
> redesign it so as the length of the string is provided in input,
> removing the dependency of strlen in its internals, perhaps.  Anyway,
> without any callers of it, I'd be tempted to wipe it from HEAD and
> call it a day.

Removed in v2.

> > The new test needs to record two versions of the error message, one
> > for invalid token and one for invalid escape sequence. This is
> > because, for smaller chunk sizes, the partial-token logic in the
> > incremental JSON parser skips the affected code entirely when it can't
> > find an ending double-quote.
>
> Ah, that makes sense.  That looks OK here.  A comment around the test
> would be adapted to document that, I guess.

Done.

> Advancing the tracking pointer 's' before reporting an error related
> to the end of the string is a bad practice, IMO, and we should avoid
> that.  json_lex_string() does not offer a warm feeling regarding that
> with escape characters, at least :/

Yeah... I think some expansion of the json_errdetail test coverage is
probably in my future. :)

Thanks,
--Jacob


v2-0001-json_lex_string-don-t-overread-on-bad-UTF8.patch
Description: Binary data


Re: Weird test mixup

2024-05-01 Thread Noah Misch
While writing an injection point test, I encountered a variant of the race
condition that f4083c4 fixed.  It had three sessions and this sequence of
events:

s1: local-attach to POINT
s2: enter InjectionPointRun(POINT), yield CPU just before injection_callback()
s3: detach POINT, deleting the InjectionPointCondition record
s2: wake up and run POINT as though it had been non-local

On Sat, Mar 16, 2024 at 08:40:21AM +0900, Michael Paquier wrote:
> On Fri, Mar 15, 2024 at 11:23:31AM +0200, Heikki Linnakangas wrote:
> > Wrt. the spinlock and shared memory handling, I think this would be simpler
> > if you could pass some payload in the InjectionPointAttach() call, which
> > would be passed back to the callback function:
> > 
> > In this case, the payload would be the "slot index" in shared memory.
> > 
> > Or perhaps always allocate, say, 1024 bytes of working area for every
> > attached injection point that the test module can use any way it wants. Like
> > for storing extra conditions, or for the wakeup counter stuff in
> > injection_wait(). A fixed size working area is a little crude, but would be
> > very handy in practice.

That would be one good way to solve it.  (Storing a slot index has the same
race condition, but it fixes the race to store a struct containing the PID.)

The best alternative I see is to keep an InjectionPointCondition forever after
creating it.  Give it a "bool valid" field that we set on detach.  I don't see
a major reason to prefer one of these over the other.  One puts a negligible
amount of memory pressure on the main segment, but it simplifies the module
code.  I lean toward the "1024 bytes of working area" idea.  Other ideas or
opinions?


Separately, injection_points_cleanup() breaks the rules by calling
InjectionPointDetach() while holding a spinlock.  The latter has an
elog(ERROR), and reaching that elog(ERROR) leaves a stuck spinlock.  I haven't
given as much thought to solutions for this one.

Thanks,
nm




Re: Add SPLIT PARTITION/MERGE PARTITIONS commands

2024-05-01 Thread Dmitry Koval

Hi!

30.04.2024 23:15, Justin Pryzby пишет:

Is this issue already fixed ?
I wasn't able to reproduce it.  Maybe it only happened with earlier
patch versions applied ?


I think this was fixed in commit [1].

[1] 
https://github.com/postgres/postgres/commit/fcf80c5d5f0f3787e70fca8fd029d2e08a923f91


--
With best regards,
Dmitry Koval

Postgres Professional: http://postgrespro.com




Re: TLS certificate alternate trust paths issue in libpq - certificate chain validation failing

2024-05-01 Thread Jacob Champion
On Wed, May 1, 2024 at 11:57 AM Thomas Spear  wrote:
> It does fail to validate for case 4 after all. I must have had a copy/paste 
> error during past tests.

Okay, good. Glad it's behaving as expected!

> So then it sounds like putting the MS root in root.crt (as we have done to 
> fix this) is the correct thing to do, and there's no issue. It doesn't seem 
> libpq will use the trusted roots that are typically located in either 
> /etc/ssl or /etc/pki so we have to provide the root in the path where libpq 
> expects it to be to get verify-full to work properly.

Right. Versions 16 and later will let you use `sslrootcert=system` to
load those /etc locations more easily, but if the MS root isn't in the
system PKI stores and the server isn't sending the DigiCert chain then
that probably doesn't help you.

> Thanks for helping me to confirm this. I'll get a case open with MS regarding 
> the wrong root download from the portal in GovCloud.

Happy to help!

Have a good one,
--Jacob




Re: Query Discrepancy in Postgres HLL Test

2024-05-01 Thread Robert Haas
On Wed, May 1, 2024 at 1:10 PM Ayush Vatsa  wrote:
> I'm currently delving into Postgres HLL (HyperLogLog) functionality and have 
> encountered an unexpected behavior while executing queries from the 
> "cumulative_add_sparse_edge.sql" regress test. This particular test data file 
> involves three columns, with the last column representing an HLL 
> (HyperLogLog) value derived from the previous HLL value and the current raw 
> value.
>
> Upon manual inspection of the query responsible for deriving the last row's 
> HLL value, I noticed a discrepancy. When executing the query:
> """
> -- '\x148B481002' is second last rows hll value
> SELECT hll_add('\x148B481002.', hll_hashval(2561));
> """
> instead of obtaining the expected value (''\x148B481002''), I received a 
> different output which is ('\x138b48000200410061008100a1 ').

PostgreSQL has no function called hll_add or hll_hashval, and no
regression test file called cumulative_add_sparse_edge.sql. A quick
Google search suggests that these things are part of citusdata's fork
of PostgreSQL, so you might want to contact them.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: TLS certificate alternate trust paths issue in libpq - certificate chain validation failing

2024-05-01 Thread Thomas Spear
On Wed, May 1, 2024 at 12:31 PM Jacob Champion <
jacob.champ...@enterprisedb.com> wrote:

> On Wed, May 1, 2024 at 8:17 AM Thomas Spear  wrote:
> > Circling back to my original question, why is there a difference in
> behavior?
> >
> > What I believe should be happening isn't what's happening:
> > 1. If ~/.postgresql/root.crt contains the MS root, and I don't specify
> sslrootcert= -- successful validation
> > 2. If ~/.postgresql/root.crt contains the MS root, and I specify
> sslrootcert= -- successful validation
> > 3. If ~/.postgresql/root.crt contains the DigiCert root, and I don't
> specify sslrootcert= -- validation fails
> > 4. If ~/.postgresql/root.crt contains the DigiCert root, and I specify
> sslrootcert= -- successful validation
>
> Number 4 is the only one that seems off to me given what we know.


I see how that could be true.


> If you're saying that the server's chain never mentions DigiCert as an
> issuer, then I see no reason that the DigiCert root should ever
> successfully validate the chain. You mentioned on the other thread
> that
>
> > We eventually found the intermediate cert was missing from the system
> trust, so we tried adding that without success
>
> and that has me a little worried. Why would the intermediate need to
> be explicitly trusted?
>
>
Right, so just to be clear, all of the details from the other thread were
from testing done in a container running on Kubernetes, so when adding the
intermediate to the "system trust" it was the container's Java trust store.
When that didn't work, we removed it from the Dockerfile again so that
future builds didn't include the trust for that cert.


> I also notice from the other thread that sometimes you're testing on
> Linux and sometimes you're testing on Windows, and that you've mixed
> in a couple of different sslmodes during debugging. So I want to make
> absolutely sure: are you _certain_ that case number 4 is a true
> statement? In other words, there's nothing in your default root.crt
> except the DigiCert root, you've specified exactly the same path in
> sslrootcert as the one that's loaded by default, and your experiments
> are all using verify-full?
>

> The default can also be modified by a bunch of environmental factors,
> including $PGSSLROOTCERT, $HOME, the effective user ID, etc. (On
> Windows I don't know what the %APPDATA% conventions are.) If you empty
> out your root.crt file, you should get a clear message that libpq
> tried to load certificates from it and couldn't; otherwise, it's
> finding the default somewhere else.
>
>
I redid the command line tests to be sure, from Windows command prompt so
that I can't rely on my bash command history from AlmaLinux and instead had
to type everything out by hand.
It does fail to validate for case 4 after all. I must have had a copy/paste
error during past tests.

With no root.crt file present, the psql command complains that root.crt is
missing as well.

So then it sounds like putting the MS root in root.crt (as we have done to
fix this) is the correct thing to do, and there's no issue. It doesn't seem
libpq will use the trusted roots that are typically located in either
/etc/ssl or /etc/pki so we have to provide the root in the path where libpq
expects it to be to get verify-full to work properly.

Thanks for helping me to confirm this. I'll get a case open with MS
regarding the wrong root download from the portal in GovCloud.

--Thomas


Re: New GUC autovacuum_max_threshold ?

2024-05-01 Thread Robert Haas
On Wed, May 1, 2024 at 2:19 PM Imseih (AWS), Sami  wrote:
> > Unless I'm missing something major, that's completely bonkers. It
> > might be true that it would be a good idea to vacuum such a table more
> > often than we do at present, but there's no shot that we want to do it
> > that much more often.
>
> This is really an important point.
>
> Too small of a threshold and a/v will constantly be vacuuming a fairly large
> and busy table with many indexes.
>
> If the threshold is large, say 100 or 200 million, I question if you want
> autovacuum to be doing the work of cleanup here?  That long of a period
> without an autovacuum on a table means there may be something misconfigured
> in your autovacuum settings.
>
> At that point aren't you just better off performing a manual vacuum and
> taking advantage of parallel index scans?

As far as that last point goes, it would be good if we taught
autovacuum about several things it doesn't currently know about;
parallelism is one. IMHO, it's probably not the most important one,
but it's certainly on the list. I think, though, that we should
confine ourselves on this thread to talking about what the threshold
ought to be.

And as far as that goes, I'd like you - and others - to spell out more
precisely why you think 100 or 200 million tuples is too much. It
might be, or maybe it is in some cases but not in others. To me,
that's not a terribly large amount of data. Unless your tuples are
very wide, it's a few tens of gigabytes. That is big enough that I can
believe that you *might* want autovacuum to run when you hit that
threshold, but it's definitely not *obvious* to me that you want
autovacuum to run when you hit that threshold.
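
For scale: assuming roughly 100-byte tuples - an assumption for
illustration only, not a measurement - 200 million of them is on the
order of 20GB:

  SELECT pg_size_pretty(200000000::bigint * 100);   -- about 19 GB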

To make that concrete: If the table is 10TB, do you want to vacuum to
reclaim 20GB of bloat? You might be vacuuming 5TB of indexes to
reclaim 20GB of heap space - is that the right thing to do? If yes,
why?

I do think it's interesting that other people seem to think we should
be vacuuming more often on tables that are substantially smaller than
the ones that seem like a big problem to me. I'm happy to admit that
my knowledge of this topic is not comprehensive and I'd like to learn
from the experience of others. But I think it's clearly and obviously
unworkable to multiply the current frequency of vacuuming for large
tables by a three or four digit number. Possibly what we need here is
something other than a cap, where, say, we vacuum a 10GB table twice
as often as now, a 100GB table four times as often, and a 1TB table
eight times as often. Or whatever the right answer is. But we can't
just pull numbers out of the air like that: we need to be able to
justify our choices. I think we all agree that big tables need to be
vacuumed more often than the current formula does, but we seem to be
rather far apart on the values of "big" and "more".
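
Just to make the shape of that example concrete, those multipliers fall
on a doubling-per-size-decade curve; that's purely an illustration of
one possible curve, not a proposal:

  SELECT size_gb, power(2, log(10, size_gb)) AS frequency_multiplier
  FROM (VALUES (10::numeric), (100), (1000)) AS t(size_gb);
  -- 10 GB -> 2x, 100 GB -> 4x, 1000 GB -> 8x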

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: New GUC autovacuum_max_threshold ?

2024-05-01 Thread Imseih (AWS), Sami
I've been following this discussion and would like to add my
2 cents.

> Unless I'm missing something major, that's completely bonkers. It
> might be true that it would be a good idea to vacuum such a table more
> often than we do at present, but there's no shot that we want to do it
> that much more often. 

This is really an important point.

Too small of a threshold and a/v will constantly be vacuuming a fairly large 
and busy table with many indexes. 

If the threshold is large, say 100 or 200 million, I question if you want
autovacuum to be doing the work of cleanup here?  That long of a period
without an autovacuum on a table means there may be something misconfigured
in your autovacuum settings.

At that point aren't you just better off performing a manual vacuum and
taking advantage of parallel index scans?

Regards,

Sami Imseih
Amazon Web Services (AWS)










Re: cataloguing NOT NULL constraints

2024-05-01 Thread Alvaro Herrera
On 2024-Apr-25, Alvaro Herrera wrote:

> > Also, I've found a weird behaviour with a non-inherited NOT NULL
> > constraint for a partitioned table:
> > CREATE TABLE pt(a int NOT NULL NO INHERIT) PARTITION BY LIST (a);

> Ugh.  Maybe a way to handle this is to disallow NO INHERIT in
> constraints on partitioned tables altogether.  I mean, they are a
> completely useless gimmick, aren't they?

Here are two patches that I intend to push soon (hopefully tomorrow).

-- 
Álvaro Herrera   48°01'N 7°57'E  —  https://www.EnterpriseDB.com/
"No me acuerdo, pero no es cierto.  No es cierto, y si fuera cierto,
 no me acuerdo." (Augusto Pinochet a una corte de justicia)
From 97318ac81cfe82cf52629f141b26a5a497b0913c Mon Sep 17 00:00:00 2001
From: Alvaro Herrera 
Date: Wed, 1 May 2024 17:32:50 +0200
Subject: [PATCH 1/2] Disallow direct change of NO INHERIT of not-null
 constraints

We support changing NO INHERIT constraint to INHERIT for constraints in
child relations when adding a constraint to some ancestor relation, and
also during pg_upgrade's schema restore; but other than those special
cases, command ALTER TABLE ADD CONSTRAINT should not be allowed to
change an existing constraint from NO INHERIT to INHERIT, as that would
require to process child relations so that they also acquire an
appropriate constraint, which we may not be in a position to do.  (It'd
also be surprising behavior.)

It is conceivable that we want to allow ALTER TABLE SET NOT NULL to make
such a change; but in that case some more code is needed to implement it
correctly, so for now I've made that throw the same error message.

Also, during the prep phase of ALTER TABLE ADD CONSTRAINT, acquire locks
on all descendant tables; otherwise we might operate on child tables on
which no locks are held, particularly in the mode where a primary key
causes not-null constraints to be created on children.

Reported-by: Alexander Lakhin 
Discussion: https://postgr.es/m/7d923a66-55f0-3395-cd40-81c142b54...@gmail.com
---
 doc/src/sgml/ref/alter_table.sgml |  2 +-
 src/backend/catalog/heap.c| 20 +++-
 src/backend/catalog/pg_constraint.c   | 15 +++
 src/backend/commands/tablecmds.c  | 17 +++--
 src/include/catalog/pg_constraint.h   |  2 +-
 src/test/regress/expected/inherit.out | 24 +++-
 src/test/regress/sql/inherit.sql  | 15 +++
 7 files changed, 81 insertions(+), 14 deletions(-)

diff --git a/doc/src/sgml/ref/alter_table.sgml b/doc/src/sgml/ref/alter_table.sgml
index ebd8c62038..0bf11f6cb6 100644
--- a/doc/src/sgml/ref/alter_table.sgml
+++ b/doc/src/sgml/ref/alter_table.sgml
@@ -105,7 +105,7 @@ WITH ( MODULUS numeric_literal, REM
 and column_constraint is:
 
 [ CONSTRAINT constraint_name ]
-{ NOT NULL |
+{ NOT NULL [ NO INHERIT ] |
   NULL |
   CHECK ( expression ) [ NO INHERIT ] |
   DEFAULT default_expr |
diff --git a/src/backend/catalog/heap.c b/src/backend/catalog/heap.c
index f0278b9c01..136cc42a92 100644
--- a/src/backend/catalog/heap.c
+++ b/src/backend/catalog/heap.c
@@ -2549,13 +2549,23 @@ AddRelationNewConstraints(Relation rel,
 
 			/*
 			 * If the column already has an inheritable not-null constraint,
-			 * we need only adjust its inheritance status and we're done.  If
-			 * the constraint is there but marked NO INHERIT, then it is
-			 * updated in the same way, but we also recurse to the children
-			 * (if any) to add the constraint there as well.
+			 * we need only adjust its coninhcount and we're done.  In certain
+			 * cases (see below), if the constraint is there but marked NO
+			 * INHERIT, then we mark it as no longer such and coninhcount
+			 * updated, plus we must also recurse to the children (if any) to
+			 * add the constraint there.
+			 *
+			 * We only allow the inheritability status to change during binary
+			 * upgrade (where it's used to add the not-null constraints for
+			 * children of tables with primary keys), or when we're recursing
+			 * processing a table down an inheritance hierarchy; directly
+			 * allowing a constraint to change from NO INHERIT to INHERIT
+			 * during ALTER TABLE ADD CONSTRAINT would be far too surprising
+			 * behavior.
 			 */
 			existing = AdjustNotNullInheritance1(RelationGetRelid(rel), colnum,
- cdef->inhcount, cdef->is_no_inherit);
+ cdef->inhcount, cdef->is_no_inherit,
+ IsBinaryUpgrade || allow_merge);
 			if (existing == 1)
 continue;		/* all done! */
 			else if (existing == -1)
diff --git a/src/backend/catalog/pg_constraint.c b/src/backend/catalog/pg_constraint.c
index aaf3537d3f..6b8496e085 100644
--- a/src/backend/catalog/pg_constraint.c
+++ b/src/backend/catalog/pg_constraint.c
@@ -721,7 +721,7 @@ extractNotNullColumn(HeapTuple constrTup)
  */
 int
 AdjustNotNullInheritance1(Oid relid, AttrNumber attnum, int count,
-		  bool is_no_inherit)
+		  bool is_no_inherit, bool allow_noinherit_change)
 {
 	HeapTuple	

Re: TLS certificate alternate trust paths issue in libpq - certificate chain validation failing

2024-05-01 Thread Jacob Champion
On Wed, May 1, 2024 at 8:17 AM Thomas Spear  wrote:
> Circling back to my original question, why is there a difference in behavior?
>
> What I believe should be happening isn't what's happening:
> 1. If ~/.postgresql/root.crt contains the MS root, and I don't specify 
> sslrootcert= -- successful validation
> 2. If ~/.postgresql/root.crt contains the MS root, and I specify sslrootcert= 
> -- successful validation
> 3. If ~/.postgresql/root.crt contains the DigiCert root, and I don't specify 
> sslrootcert= -- validation fails
> 4. If ~/.postgresql/root.crt contains the DigiCert root, and I specify 
> sslrootcert= -- successful validation

Number 4 is the only one that seems off to me given what we know. If
you're saying that the server's chain never mentions DigiCert as an
issuer, then I see no reason that the DigiCert root should ever
successfully validate the chain. You mentioned on the other thread
that

> We eventually found the intermediate cert was missing from the system trust, 
> so we tried adding that without success

and that has me a little worried. Why would the intermediate need to
be explicitly trusted?

I also notice from the other thread that sometimes you're testing on
Linux and sometimes you're testing on Windows, and that you've mixed
in a couple of different sslmodes during debugging. So I want to make
absolutely sure: are you _certain_ that case number 4 is a true
statement? In other words, there's nothing in your default root.crt
except the DigiCert root, you've specified exactly the same path in
sslrootcert as the one that's loaded by default, and your experiments
are all using verify-full?

The default can also be modified by a bunch of environmental factors,
including $PGSSLROOTCERT, $HOME, the effective user ID, etc. (On
Windows I don't know what the %APPDATA% conventions are.) If you empty
out your root.crt file, you should get a clear message that libpq
tried to load certificates from it and couldn't; otherwise, it's
finding the default somewhere else.

--Jacob




Query Discrepancy in Postgres HLL Test

2024-05-01 Thread Ayush Vatsa
Hi PostgreSQL Community,
I'm currently delving into Postgres HLL (HyperLogLog) functionality and
have encountered an unexpected behavior while executing queries from the
"cumulative_add_sparse_edge.sql" regress test. This particular test data file
involves three columns, with the last column representing an HLL (HyperLogLog)
value derived from the previous HLL value and the current raw value.

Upon manual inspection of the query responsible for deriving the last row's
HLL value, I noticed a discrepancy. When executing the query:
"""
-- '\x148B481002' is second last rows hll value
SELECT hll_add('\x148B481002.', hll_hashval(2561));
"""
instead of obtaining the expected value (''\x148B481002''), I received
a different output which is ('\x138b48000200410061008100a1 ').

I am using
postgres=> select version();
                                   version
---------------------------------------------------------------------------
 PostgreSQL 16.1 on aarch64-unknown-linux-gnu, compiled by
 aarch64-unknown-linux-gnu-gcc (GCC) 9.5.0, 64-bit

My initial assumption is that this could potentially be attributed to a
precision error. However, I'm reaching out to seek clarity on why this
disparity is occurring and to explore potential strategies for mitigating
it (as I want the behaviour to be consistent to regress test file).

Regards
Ayush Vatsa


Re: Logging which interface was connected to in log_line_prefix

2024-05-01 Thread Greg Sabino Mullane
Thank you for taking the time to review this. I've attached a new rebased
version, which has no significant changes.


> There is a comment in the patch that states:
>
> /* We do not need clean_ipv6_addr here: just report verbatim */
>
> I am not quite sure what it means, but I am guessing it means that the
> patch does not need to format the IPv6 addresses in any specific way.


Yes, basically correct. There is a kluge (their word, not mine) in
utils/adt/network.c to strip the zone - see the comment for the
clean_ipv6_addr() function in that file. I added the patch comment in case
some future person wonders why we don't "clean up" the ipv6 address, like
other places in the code base do. We don't need to pass it back to anything
else, so we can simply output the correct version, zone and all.

Cheers,
Greg


add_local_interface_to_log_line_prefix.v2.patch
Description: Binary data


Re: Support tid range scan in parallel?

2024-05-01 Thread Cary Huang
 > This isn't a complete review. It's just that this seems enough to keep
 > you busy for a while. I can look a bit harder when the patch is
 > working correctly. I think you should have enough feedback to allow
 > that now.

Thanks for the test, review and feedback. They are greatly appreciated! 
I will polish the patch some more following your feedback and share new
results / patch when I have them. 

Thanks again!

Cary






Proposal for Updating CRC32C with AVX-512 Algorithm.

2024-05-01 Thread Amonson, Paul D
Hi,

Comparing the current SSE4.2 implementation of the CRC32C algorithm in
Postgres to an optimized AVX-512 algorithm [0], we observed significant gains.
The result was a ~6.6X average multiplier of increased performance measured on
3 different Intel products. Details below. The AVX-512 algorithm in C is a port
of the ISA-L library [1] assembler code.

Workload call size distribution details (write heavy):
   * Average was approximately around 1,010 bytes per call
   * ~80% of the calls were under 256 bytes
   * ~20% of the calls were greater than or equal to 256 bytes up to the max 
buffer size of 8192

The 256 bytes is important because if the buffer is smaller, it makes sense to
fall back to the existing implementation. This is because the AVX-512 algorithm
needs a minimum of 256 bytes to operate.

Using the above workload data distribution:
at 0%    calls < 256 bytes, an 841% improvement on average for crc32c
         functionality was observed.
at 50%   calls < 256 bytes, a 758% improvement on average for crc32c
         functionality was observed.
at 90%   calls < 256 bytes, a 44% improvement on average for crc32c
         functionality was observed.
at 97.6% calls < 256 bytes, the workload's crc32c performance breaks even.
at 100%  calls < 256 bytes, a 14% regression is seen when using the AVX-512
         implementation.

The results above are averages over 3 machines, and were measured on: Intel
Sapphire Rapids bare metal, and using EC2 on AWS cloud: Intel Sapphire Rapids
(m7i.2xlarge) and Intel Ice Lake (m6i.2xlarge).

Summary Data (Sapphire Rapids bare metal, AWS m7i-2xl, and AWS m6i-2xl):

+---------------------+---------+---------+---------+---------+---------+---------+------------+
| Rates in Bytes/us   |    Bare Metal     |    AWS m6i-2xl    |    AWS m7i-2xl    |  Overall   |
| (Larger is Better)  | SSE 4.2 | AVX-512 | SSE 4.2 | AVX-512 | SSE 4.2 | AVX-512 | Multiplier |
+---------------------+---------+---------+---------+---------+---------+---------+------------+
| Numbers 256-8192    |  12,046 |  83,196 |   7,471 |  39,965 |  11,867 |  84,589 |    6.62    |
| Numbers 64 - 255    |  16,865 |  15,909 |   9,209 |   7,363 |  12,496 |  10,046 |    0.86    |
| Weighted Multiplier [*]                                                         |    1.44    |
+---------------------+---------+---------+---------+---------+---------+---------+------------+
There was no evidence of AVX-512 frequency throttling from perf data, which 
stayed steady during the test.

Feedback on this proposed improvement is appreciated. Some questions: 
1) This AVX-512 ISA-L derived code uses BSD-3 license [2]. Is this compatible 
with the PostgreSQL License [3]? They both appear to be very permissive 
licenses, but I am not an expert on licenses. 
2) Is there a preferred benchmark I should run to test this change? 

If licensing is a non-issue, I can post the initial patch along with my 
Postgres benchmark function patch for further review.

Thanks,
Paul

[0] 
https://www.researchgate.net/publication/263424619_Fast_CRC_computation#full-text
[1] https://github.com/intel/isa-l
[2] https://opensource.org/license/bsd-3-clause
[3] https://opensource.org/license/postgresql
  
[*] Weights used were 90% of requests less than 256 bytes, 10% greater than or 
equal to 256 bytes.
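
For clarity, the weighted multiplier above is just the weighted average of
the two row multipliers using those weights:

  SELECT 0.9 * 0.86 + 0.1 * 6.62;   -- 1.436, i.e. ~1.44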




Re: Document NULL

2024-05-01 Thread Thom Brown
On Wed, May 1, 2024, 16:13 David G. Johnston 
wrote:

> Hi,
>
> Over in [1] it was rediscovered that our documentation assumes the reader
> is familiar with NULL.  It seems worthwhile to provide both an introduction
> to the topic and an overview of how this special value gets handled
> throughout the system.
>
> Attached is a very rough draft attempting this, based on my own thoughts
> and those expressed by Tom in [1], which largely align with mine.
>
> I'll flesh this out some more once I get support for the goal, content,
> and placement.  On that point, NULL is a fundamental part of the SQL
> language and so having it be a section in a Chapter titled "SQL Language"
> seems to fit well, even if that falls into our tutorial.  Framing this up
> as tutorial content won't be that hard, though I've skipped on examples and
> such pending feedback.  It really doesn't fit as a top-level chapter under
> part II nor really under any of the other chapters there.  The main issue
> with the tutorial is the forward references to concepts not yet discussed
> but problem points there can be addressed.
>
> I do plan to remove the entity reference and place the content into
> query.sgml directly in the final version.  It is just much easier to write
> an entire new section in its own file.
>
> David J.
>
> [1]
> https://www.postgresql.org/message-id/1859814.1714532025%40sss.pgh.pa.us
>

"The cardinal rule, NULL is never equal or unequal to any non-null value."

This implies that a NULL is generally equal or unequal to another NULL.
While this can be true (e.g. in aggregates), in general it is not. Perhaps
immediately follow it with something along the lines of "In most cases NULL
is also not considered equal or unequal to any other NULL (i.e. NULL = NULL
will return NULL), but there are occasional exceptions, which will be
explained further on."
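
For example (purely to illustrate the distinction, not proposed wording):

  SELECT NULL = NULL;                     -- NULL, not true
  SELECT NULL IS NOT DISTINCT FROM NULL;  -- true
  SELECT DISTINCT x
  FROM (VALUES (NULL::int), (NULL)) AS t(x);  -- one row: the two NULLs are
                                              -- not distinct from each other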

Regards

Thom


Re: TLS certificate alternate trust paths issue in libpq - certificate chain validation failing

2024-05-01 Thread Thomas Spear
On Wed, May 1, 2024 at 9:23 AM Jacob Champion <
jacob.champ...@enterprisedb.com> wrote:

> On Wed, May 1, 2024 at 6:48 AM Thomas Spear  wrote:
> > I dumped out the certificates presented by the server using openssl, and
> the chain that gets output includes "Microsoft Azure RSA TLS Issuing CA 08".
> > On https://www.microsoft.com/pkiops/docs/repository.htm the page says
> that that cert was cross-signed by the DigiCert RSA G2 root.
>
> It's been a while since I've looked at cross-signing, but that may not
> be enough information to prove that it's the "correct" version of the
> intermediate. You'd need to know the Issuer, not just the Subject, for
> all the intermediates that were given to the client. (It may not match
> the one they have linked on their support page.)
>
>
Fair enough. The server issuer is C=US,O=Microsoft Corporation,CN=Microsoft
Azure RSA TLS Issuing CA 08
The intermediate's issuer is C=US,O=Microsoft Corporation,CN=Microsoft RSA
Root Certificate Authority 2017 so I think that you're absolutely correct.
The intermediate on the support page reflects the DigiCert issuer, but the
one from the server reflects the Microsoft issuer.

Circling back to my original question, why is there a difference in
behavior?

What I believe should be happening isn't what's happening:
1. If ~/.postgresql/root.crt contains the MS root, and I don't specify
sslrootcert= -- successful validation
2. If ~/.postgresql/root.crt contains the MS root, and I specify
sslrootcert= -- successful validation
3. If ~/.postgresql/root.crt contains the DigiCert root, and I don't
specify sslrootcert= -- validation fails
4. If ~/.postgresql/root.crt contains the DigiCert root, and I specify
sslrootcert= -- successful validation

Case 3 should succeed IMHO since case 4 does.


> > The postgres server appears to send the Microsoft root certificate
> instead of the DigiCert one, which should be fine. The server sends the
> "Microsoft RSA Root Certificate Authority 2017" root.
> > As far as I understand, a server sending a root certificate along with
> the intermediate is a big no-no, but that's a topic for a different thread
> and audience most likely. :)
>
> To me, that only makes me more suspicious that the chain the server is
> sending you may not be the chain you're expecting. Especially since
> you mentioned on the other thread that the MS root is working and the
> DigiCert root is not.
>
>
Yeah, I agree. So then I need to talk to MS about why the portal is giving
us the wrong root -- and I'll open a support ticket with them for this. I
still don't understand why the above difference in behavior happens though.
Is that specifically because the server is sending the MS root? Still
doesn't seem to make a whole lot of sense. If the DigiCert root can
validate the chain when it's explicitly passed, it should be able to
validate the chain when it's implicitly the only root cert available to a
postgres client.


> > The openssl version in my Windows test system is 3.0.7. It's running
> Almalinux 9 in WSL2, so openssl is from the package manager. The container
> image I'm using has an old-as-dirt openssl 1.1.1k.
>
> I'm not aware of any validation issues with 1.1.1k, for what it's
> worth. If upgrading helps, great! -- but I wouldn't be surprised if it
> didn't.
>
>
I was thinking the same honestly. If it breaks for jdbc-postgres on 1.1.1k
and psql cli on 3.0.7 then it's likely not an issue there.

> I'll have to check one of our public cloud postgres instances to see if I
> can reproduce the issue there in order to get a chain that I can share
> because the system where I'm testing is a locked down jump host to our
> Azure GovCloud infrastructure, and I can't copy anything out from it.
>
> Yeah, if at all possible, that'd make it easier to point at any
> glaring problems.
>
>
I should be able to do this today.

Thanks again!

--Thomas


Document NULL

2024-05-01 Thread David G. Johnston
Hi,

Over in [1] it was rediscovered that our documentation assumes the reader
is familiar with NULL.  It seems worthwhile to provide both an introduction
to the topic and an overview of how this special value gets handled
throughout the system.

Attached is a very rough draft attempting this, based on my own thoughts
and those expressed by Tom in [1], which largely align with mine.

I'll flesh this out some more once I get support for the goal, content, and
placement.  On that point, NULL is a fundamental part of the SQL language
and so having it be a section in a Chapter titled "SQL Language" seems to
fit well, even if that falls into our tutorial.  Framing this up as
tutorial content won't be that hard, though I've skipped on examples and
such pending feedback.  It really doesn't fit as a top-level chapter under
part II nor really under any of the other chapters there.  The main issue
with the tutorial is the forward references to concepts not yet discussed
but problem points there can be addressed.

I do plan to remove the entity reference and place the content into
query.sgml directly in the final version.  It is just much easier to write
an entire new section in its own file.

David J.

[1] https://www.postgresql.org/message-id/1859814.1714532025%40sss.pgh.pa.us
From a068247e92e620455a925a0ae746adc225ae1339 Mon Sep 17 00:00:00 2001
From: "David G. Johnston" 
Date: Wed, 1 May 2024 07:45:48 -0700
Subject: [PATCH] Document NULL

---
 doc/src/sgml/filelist.sgml |  1 +
 doc/src/sgml/null.sgml | 79 ++
 doc/src/sgml/query.sgml|  2 +
 3 files changed, 82 insertions(+)
 create mode 100644 doc/src/sgml/null.sgml

diff --git a/doc/src/sgml/filelist.sgml b/doc/src/sgml/filelist.sgml
index 38ec362d8f..ac4fd52978 100644
--- a/doc/src/sgml/filelist.sgml
+++ b/doc/src/sgml/filelist.sgml
@@ -10,6 +10,7 @@
 
 
 
+
 
 
 
diff --git a/doc/src/sgml/null.sgml b/doc/src/sgml/null.sgml
new file mode 100644
index 00..5f95b2494e
--- /dev/null
+++ b/doc/src/sgml/null.sgml
@@ -0,0 +1,79 @@
+
+ Handling Unkowns (NULL)
+
+ 
+  NULL
+ 
+
+ 
+  Looking again at our example weather data you will note that we do not know
+  the amount of precipitation Hayward.  We communicated that implicitly by
+  not including the prcp column in the insert.  Explicitly, we can communicate
+  this fact by writing.  [example using null].  When a column is not specified
+  in an insert the default value for that column is recorded and the default
+  default value is NULL.
+ 
+
+ 
+  As a value NULL crops up all throughout the database and interacts with many
+  features.  The main portion of this book will detail the interactions specific
+  features have with NULL but it is helpful to have a reference page where one
+  can get an overview.
+ 
+
+ 
+  First, like all values, NULLs are typed.  But since any value can be unknown
+  NULL is a valid value for all data types.
+ 
+
+ 
+  Second, when speaking generally NULL is assumed to mean unknown.  However,
+  in practice meaning comes from context and so a model design may state that
+  NULL is to be used to represent "not applicable" - i.e., that a value is not
+  even possible.  SQL has only the single value NULL while there are multiple
+  concepts that people have chosen to apply it to.  In any case the behavior
+  of the system when dealing with NULL is the same regardless of the meaning
+  the given to it in the surrounding context.
+ 
+
+ 
+  The cardinal rule, NULL is never equal or unequal to any non-null
+  value; and when asked to be combined with a known value in an operation the
+  result of the operation becomes unknown.  e.g., both 1 = NULL and 1 + NULL
+  result in NULL.  Exceptions to this are documented.  See [chapter] for
+  details on how to test for null.  Specifically, note that concept of
+  distinctness is introduced to allow for true/false equality tests.
+ 
+
+ 
+  Extending from the previous point, function calls are truly a mixed bag.
+  Aggregate functions in particular will usually just ignore NULL inputs
+  instead of forcing the entire aggregate result to NULL.  Function
+  specifications has a "strictness" attribute that, when set to "strict"
+  (a.k.a. "null on null input") will tell the executor to return NULL for any
+  function call having at least one NULL input value, without executing the
+  function.
+ 
+
+ 
+  A WHERE clause that evaluates to NULL for a given row will exclude that row.
+  This was demonstrated in the tutorial query where cities with prcp > 0 were
+  requested and Hayward was not returned due to this and the cardinal rule.
+ 
+
+ 
+  While not yet discussed, it is possible to define validation expressions on
+  tables that ensure only values passing those expressions are inserted.  While
+  this seems like it would behave as a WHERE clause on a table, the choice here
+  when an expression evaulates to NULL is allow the row to be inserted.  See
+  [check constraints] in create table for 

Re: TLS certificate alternate trust paths issue in libpq - certificate chain validation failing

2024-05-01 Thread Jacob Champion
On Wed, May 1, 2024 at 6:48 AM Thomas Spear  wrote:
> I dumped out the certificates presented by the server using openssl, and the 
> chain that gets output includes "Microsoft Azure RSA TLS Issuing CA 08".
> On https://www.microsoft.com/pkiops/docs/repository.htm the page says that 
> that cert was cross-signed by the DigiCert RSA G2 root.

It's been a while since I've looked at cross-signing, but that may not
be enough information to prove that it's the "correct" version of the
intermediate. You'd need to know the Issuer, not just the Subject, for
all the intermediates that were given to the client. (It may not match
the one they have linked on their support page.)

> The postgres server appears to send the Microsoft root certificate instead of 
> the DigiCert one, which should be fine. The server sends the "Microsoft RSA 
> Root Certificate Authority 2017" root.
> As far as I understand, a server sending a root certificate along with the 
> intermediate is a big no-no, but that's a topic for a different thread and 
> audience most likely. :)

To me, that only makes me more suspicious that the chain the server is
sending you may not be the chain you're expecting. Especially since
you mentioned on the other thread that the MS root is working and the
DigiCert root is not.

> The openssl version in my Windows test system is 3.0.7. It's running 
> Almalinux 9 in WSL2, so openssl is from the package manager. The container 
> image I'm using has an old-as-dirt openssl 1.1.1k.

I'm not aware of any validation issues with 1.1.1k, for what it's
worth. If upgrading helps, great! -- but I wouldn't be surprised if it
didn't.

> I'll have to check one of our public cloud postgres instances to see if I can 
> reproduce the issue there in order to get a chain that I can share because 
> the system where I'm testing is a locked down jump host to our Azure GovCloud 
> infrastructure, and I can't copy anything out from it.

Yeah, if at all possible, that'd make it easier to point at any
glaring problems.

Thanks,
--Jacob




Re: TLS certificate alternate trust paths issue in libpq - certificate chain validation failing

2024-05-01 Thread Thomas Spear
On Tue, Apr 30, 2024 at 5:19 PM Jacob Champion <
jacob.champ...@enterprisedb.com> wrote:

On Tue, Apr 30, 2024 at 2:41 PM Thomas Spear  wrote:
> The full details can be found at github.com/pgjdbc/pgjdbc/discussions/3236 -
in summary, both jdbc-postgres and the psql cli seem to be affected by an
issue validating the certificate chain up to a publicly trusted root
certificate that has cross-signed an intermediate certificate coming from a
Postgres server in Azure, when using sslmode=verify-full and trying to rely
on the default path for sslrootcert.

Hopefully someone more familiar with the Azure cross-signing setup
sees something obvious and chimes in, but in the meantime there are a
couple things I can think to ask:

1. Are you sure that the server is actually putting the cross-signed
intermediate in the chain it's serving to the client?


Hello Jacob, thanks for your reply.

I can say I'm reasonably certain. I dumped out the certificates presented
by the server using openssl, and the chain that gets output includes
"Microsoft Azure RSA TLS Issuing CA 08".
On https://www.microsoft.com/pkiops/docs/repository.htm the page says that
that cert was cross-signed by the DigiCert RSA G2 root.
The postgres server appears to send the Microsoft root certificate instead
of the DigiCert one, which should be fine. The server sends the "Microsoft
RSA Root Certificate Authority 2017" root.
As far as I understand, a server sending a root certificate along with the
intermediate is a big no-no, but that's a topic for a different thread and
audience most likely. :)

2. What version of OpenSSL? There used to be validation bugs with
alternate trust paths; hopefully you're not using any of those (I
think they're old as dirt), but it doesn't hurt to know.


The openssl version in my Windows test system is 3.0.7. It's running
Almalinux 9 in WSL2, so openssl is from the package manager. The container
image I'm using has an old-as-dirt openssl 1.1.1k. It's built using a RHEL
UBI8 image as the base layer, so it doesn't surprise me that the package
manager-provided version of openssl here is old as dirt, so I'll have to
look at making a build of 3.x for this container or maybe switching out the
base layer to ubuntu temporarily to test if we need to.


3. Can you provide a sample public certificate chain that should
validate and doesn't?


I'll get back to you on this one. I'll have to check one of our public
cloud postgres instances to see if I can reproduce the issue there in order
to get a chain that I can share because the system where I'm testing is a
locked down jump host to our Azure GovCloud infrastructure, and I can't
copy anything out from it.

Thanks again

--Thomas


Re: Refactoring backend fork+exec code

2024-05-01 Thread Anton A. Melnikov



On 28.04.2024 22:36, Heikki Linnakangas wrote:

Peter E noticed and Michael fixed them in commit 768ceeeaa1 already.


I didn't check that it was already fixed in the current master. Sorry!
Thanks for pointing this out!

With the best wishes,

--
Anton A. Melnikov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




Re: Removing unneeded self joins

2024-05-01 Thread Alexander Korotkov
On Wed, May 1, 2024 at 2:00 PM Alexander Lakhin  wrote:
> 30.04.2024 13:20, Alexander Korotkov wrote:
> > On Tue, Apr 30, 2024 at 9:00 AM Alexander Lakhin  
> > wrote:
> >> I've discovered another failure, introduced by d3d55ce57.
> >> Please try the following:
> >> CREATE TABLE t (a int unique, b float);
> >> SELECT * FROM t NATURAL JOIN LATERAL
> >>(SELECT * FROM t t2 TABLESAMPLE SYSTEM (t.b)) t2;
> > I think we should just forbid SJE in case when relations to be merged
> > have cross-references with lateral vars.  The draft patch for this is
> > attached.  I'd like to ask Alexander to test it, Richard and Andrei to
> > review it.  Thank you!
>
> Beside LATERAL vars, it seems that SJR doesn't play well with TABLESAMPLE
> in general. For instance:
> CREATE TABLE t (a int unique);
> INSERT INTO t SELECT * FROM generate_series (1,100);
>
> SELECT COUNT(*) FROM (SELECT * FROM t TABLESAMPLE BERNOULLI(1)) t1
>   NATURAL JOIN (SELECT * FROM t TABLESAMPLE BERNOULLI(100)) t2;
> returned 100, 100, 100 for me, though with enable_self_join_removal = off,
> I got 4, 0, 1...

Right, thank you for reporting this.

BTW, I found another case where my previous fix doesn't work.

SELECT * FROM t NATURAL JOIN LATERAL (SELECT * FROM t t2 TABLESAMPLE
SYSTEM (t.b) NATURAL JOIN LATERAL(SELECT * FROM t t3 TABLESAMPLE
SYSTEM (t2.b)) t3) t2;

I think we could probably forbid SJE for tables with TABLESAMPLE
altogether.  Please check the attached patch.
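
With the patch applied, comparing plans with enable_self_join_removal
toggled makes it easy to confirm whether the self-join actually gets
removed, e.g.:

SET enable_self_join_removal = on;
EXPLAIN (COSTS OFF)
SELECT COUNT(*) FROM (SELECT * FROM t TABLESAMPLE BERNOULLI(1)) t1
  NATURAL JOIN (SELECT * FROM t TABLESAMPLE BERNOULLI(100)) t2;

SET enable_self_join_removal = off;
EXPLAIN (COSTS OFF)
SELECT COUNT(*) FROM (SELECT * FROM t TABLESAMPLE BERNOULLI(1)) t1
  NATURAL JOIN (SELECT * FROM t TABLESAMPLE BERNOULLI(100)) t2;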

--
Regards,
Alexander Korotkov


sje_tablesample.patch
Description: Binary data


Re: SQL:2011 application time

2024-05-01 Thread jian he
On Wed, May 1, 2024 at 12:39 AM Paul Jungwirth
 wrote:
>
> On 4/30/24 09:24, Robert Haas wrote:
> > Peter, could you have a look at
> > http://postgr.es/m/47550967-260b-4180-9791-b224859fe...@illuminatedcomputing.com
> > and express an opinion about whether each of those proposals are (a)
> > good or bad ideas and (b) whether they need to be fixed for the
> > current release?
>
> Here are the same patches but rebased. I've added a fourth which is my 
> progress on adding the CHECK
> constraint. I don't really consider it finished though, because it has these 
> problems:
>
> - The CHECK constraint should be marked as an internal dependency of the PK, 
> so that you can't drop
> it, and it gets dropped when you drop the PK. I don't see a good way to tie 
> the two together though,
> so I'd appreciate any advice there. They are separate AlterTableCmds, so how 
> do I get the
> ObjectAddress of both constraints at the same time? I wanted to store the 
> PK's ObjectAddress on the
> Constraint node, but since ObjectAddress isn't a Node it doesn't work.
>
> - The CHECK constraint should maybe be hidden when you say `\d foo`? Or maybe 
> not, but that's what
> we do with FK triggers.
>
> - When you create partitions you get a warning about the constraint already 
> existing, because it
> gets created via the PK and then also the partitioning code tries to copy it. 
> Solving the first
> issue here should solve this nicely though.
>
> Alternately we could just fix the GROUP BY functional dependency code to only 
> accept b-tree indexes.
> But I think the CHECK constraint approach is a better solution.
>

I will consider these issues later.
The following are general ideas after applying your patches.

CREATE TABLE temporal_rng1(
id int4range,
valid_at daterange,
CONSTRAINT temporal_rng1_pk unique (id, valid_at WITHOUT OVERLAPS)
);
insert into temporal_rng1(id, valid_at) values (int4range '[1,1]',
'empty'::daterange), ('[1,1]', 'empty');
table temporal_rng1;
  id   | valid_at
-------+----------
 [1,2) | empty
 [1,2) | empty
(2 rows)

I hope I didn't miss something: if we exclude the special value 'empty',
a WITHOUT OVERLAPS constraint will be unique and thus more restrictive?

If so, then adding a check constraint so that the WITHOUT OVERLAPS column
cannot contain the special value 'empty' would be better than writing
documentation explaining that, in some special cases, a unique constraint
is not actually meant to be unique?

in here
https://www.postgresql.org/docs/devel/ddl-constraints.html#DDL-CONSTRAINTS-UNIQUE-CONSTRAINTS
says:
<<
Unique constraints ensure that the data contained in a column, or a
group of columns, is unique among all the rows in the table.
<<
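
To illustrate, the kind of check constraint I have in mind is something
like this (just a sketch, using the isempty() range function):

CREATE TABLE temporal_rng1(
id int4range,
valid_at daterange CHECK (NOT isempty(valid_at)),
CONSTRAINT temporal_rng1_pk unique (id, valid_at WITHOUT OVERLAPS)
);
-- the duplicate rows from the example above are now rejected
insert into temporal_rng1(id, valid_at) values (int4range '[1,1]',
'empty'::daterange), ('[1,1]', 'empty');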

+ /*
+ * The WITHOUT OVERLAPS part (if any) must be
+ * a range or multirange type.
+ */
+ if (constraint->without_overlaps && lc == list_last_cell(constraint->keys))
+ {
+ Oid typid = InvalidOid;
+
+ if (!found && cxt->isalter)
+ {
+ /*
+ * Look up the column type on existing table.
+ * If we can't find it, let things fail in DefineIndex.
+ */
+ Relation rel = cxt->rel;
+ for (int i = 0; i < rel->rd_att->natts; i++)
+ {
+ Form_pg_attribute attr = TupleDescAttr(rel->rd_att, i);
+ const char *attname;
+
+ if (attr->attisdropped)
+ break;
+
+ attname = NameStr(attr->attname);
+ if (strcmp(attname, key) == 0)
+ {
+ typid = attr->atttypid;
+ break;
+ }
+ }
+ }
+ else
+ typid = typenameTypeId(NULL, column->typeName);
+
+ if (OidIsValid(typid) && !type_is_range(typid) && !type_is_multirange(typid))
+ ereport(ERROR,
+ (errcode(ERRCODE_DATATYPE_MISMATCH),
+ errmsg("column \"%s\" in WITHOUT OVERLAPS is not a range or
multirange type", key),
+ parser_errposition(cxt->pstate, constraint->location)));
+ }

+ if (attr->attisdropped)
+ break;
this will break out of the loop, but don't you want to continue to the next
attribute here?

+ if (OidIsValid(typid) && !type_is_range(typid) && !type_is_multirange(typid))
the case where typid is InvalidOid wasn't considered; maybe we can
simplify it to
+ if (!type_is_range(typid) && !type_is_multirange(typid))


+ notnullcmds = lappend(notnullcmds, notemptycmd);
seems weird. We could imitate the notnullcmds-related logic for notemptycmd
rather than attaching notemptycmd to notnullcmds.




Re: Removing unneeded self joins

2024-05-01 Thread Alexander Lakhin

30.04.2024 13:20, Alexander Korotkov wrote:

On Tue, Apr 30, 2024 at 9:00 AM Alexander Lakhin  wrote:

I've discovered another failure, introduced by d3d55ce57.
Please try the following:
CREATE TABLE t (a int unique, b float);
SELECT * FROM t NATURAL JOIN LATERAL
   (SELECT * FROM t t2 TABLESAMPLE SYSTEM (t.b)) t2;

I think we should just forbid SJE in cases where the relations to be merged
have cross-references with lateral vars.  The draft patch for this is
attached.  I'd like to ask Alexander to test it, Richard and Andrei to
review it.  Thank you!


Besides LATERAL vars, it seems that SJR doesn't play well with TABLESAMPLE
in general. For instance:
CREATE TABLE t (a int unique);
INSERT INTO t SELECT * FROM generate_series (1,100);

SELECT COUNT(*) FROM (SELECT * FROM t TABLESAMPLE BERNOULLI(1)) t1
 NATURAL JOIN (SELECT * FROM t TABLESAMPLE BERNOULLI(100)) t2;
returned 100, 100, 100 for me, though with enable_self_join_removal = off,
I got 4, 0, 1...

Best regards,
Alexander




Re: Control flow in logical replication walsender

2024-05-01 Thread Ashutosh Bapat
On Tue, Apr 30, 2024 at 11:28 PM Christophe Pettus  wrote:

>
> Hi,
>
> I wanted to check my understanding of how control flows in a walsender
> doing logical replication.  My understanding is that the (single) thread in
> each walsender process, in the simplest case, loops on:
>
> 1. Pull a record out of the WAL.
> 2. Pass it to the reorder buffer code, which,
> 3. Sorts it out into the appropriate in-memory structure for that
> transaction (spilling to disk as required), and then continues with #1, or,
> 4. If it's a commit record, it iteratively passes the transaction data one
> change at a time to,
> 5. The logical decoding plugin, which returns the output format of that
> plugin, and then,
> 6. The walsender sends the output from the plugin to the client. It cycles
> on passing the data to the plugin and sending it to the client until it
> runs out of changes in that transaction, and then resumes reading the WAL
> in #1.
>
>
This is correct barring some details on master.


> In particular, I wanted to confirm that while it is pulling the reordered
> transaction and sending it to the plugin (and thence to the client), that
> particular walsender is *not* reading new WAL records or putting them in
> the reorder buffer.
>
>
This is correct.


> The specific issue I'm trying to track down is an enormous pileup of spill
> files.  This is in a non-supported version of PostgreSQL (v11), so an
> upgrade may fix it, but at the moment, I'm trying to find a cause and a
> mitigation.
>
>
Is there a large transaction that repeatedly fails to be replicated, e.g. due
to timeouts or crashes on the upstream or downstream?
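
For example, on the upstream it may be worth checking for long-running
transactions and how far behind the slot's restart_lsn is (just a sketch;
these views are available in v11):

SELECT pid, xact_start, state, query
FROM pg_stat_activity
WHERE xact_start IS NOT NULL
ORDER BY xact_start;

SELECT slot_name, restart_lsn, confirmed_flush_lsn,
       pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn) AS bytes_behind
FROM pg_replication_slots;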

-- 
Best Wishes,
Ashutosh Bapat


Re: Support tid range scan in parallel?

2024-05-01 Thread David Rowley
On Wed, 1 May 2024 at 07:10, Cary Huang  wrote:
> Yes of course. These numbers were obtained earlier this year on master with
> the patch applied, most likely without the read stream code you mentioned. The
> patch attached here is rebased to commit 
> dd0183469bb779247c96e86c2272dca7ff4ec9e7 on master, which is quite recent and 
> should have the read stream code for v17 as I can immediately tell that the 
> serial scans run much faster now in my setup. I increased the records on the 
> test table from 40 to 100 million because serial scans are much faster now. 
> Below is the summary and details of my test. Note that I only include the 
> EXPLAIN ANALYZE details of round1 test. Round2 is the same except for 
> different execution times.

It would be good to see the EXPLAIN (ANALYZE, BUFFERS) with SET
track_io_timing = 1;
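
For example (substituting your benchmark query and ctid bounds):

SET track_io_timing = 1;
EXPLAIN (ANALYZE, BUFFERS)
SELECT count(*) FROM t WHERE ctid >= '(0,0)' AND ctid < '(10,0)';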

Here's a quick review

1. Does not produce correct results:

-- serial plan
postgres=# select count(*) from t where ctid >= '(0,0)' and ctid < '(10,0)';
 count
---
  2260
(1 row)

-- parallel plan
postgres=# set max_parallel_workers_per_gather=2;
SET
postgres=# select count(*) from t where ctid >= '(0,0)' and ctid < '(10,0)';
 count
---
 0
(1 row)

I've not really looked into why, but I see you're not calling
heap_setscanlimits() in parallel mode. You need to somehow restrict
the block range of the scan to the range specified in the quals. You
might need to do more work to make the scan limits work with parallel
scans.

If you look at heap_scan_stream_read_next_serial(), it's calling
heapgettup_advance_block(), where there's  "if (--scan->rs_numblocks
== 0)".  But no such equivalent code in
table_block_parallelscan_nextpage() called by
heap_scan_stream_read_next_parallel().  To make Parallel TID Range
work, you'll need heap_scan_stream_read_next_parallel() to abide by
the scan limits.

2. There's a 4 line comment you've added to cost_tidrangescan() which
is just a copy and paste from cost_seqscan().  If you look at the
seqscan costing, the comment is true in that scenario, but not true in
where you've pasted it.  The I/O cost is all tied in to run_cost.

+ /* The CPU cost is divided among all the workers. */
+ run_cost /= parallel_divisor;
+
+ /*
+ * It may be possible to amortize some of the I/O cost, but probably
+ * not very much, because most operating systems already do aggressive
+ * prefetching.  For now, we assume that the disk run cost can't be
+ * amortized at all.
+ */

3. Calling TidRangeQualFromRestrictInfoList() once for the serial path
and again for the partial path isn't great. It would be good to just
call that function once and use the result for both path types.

4. create_tidrangescan_subpaths() seems like a weird name for a
function.  That seems to imply that scans have subpaths. Scans are
always leaf paths and have no subpaths.

This isn't a complete review. It's just that this seems enough to keep
you busy for a while. I can look a bit harder when the patch is
working correctly. I think you should have enough feedback to allow
that now.

David




Re: [PoC] Improve dead tuple storage for lazy vacuum

2024-05-01 Thread John Naylor
On Thu, Apr 25, 2024 at 8:36 AM Masahiko Sawada  wrote:
>
> On Mon, Apr 15, 2024 at 6:12 PM John Naylor  wrote:

> > - RT_KEY_GET_SHIFT is not covered for key=0:
> >
> > https://anarazel.de/postgres/cov/16-vs-HEAD-2024-04-14/src/include/lib/radixtree.h.gcov.html#L803
> >
> > That should be fairly simple to add to the tests.
>
> There are two paths to call RT_KEY_GET_SHIFT():
>
> 1. RT_SET() -> RT_KEY_GET_SHIFT()
> 2. RT_SET() -> RT_EXTEND_UP() -> RT_KEY_GET_SHIFT()
>
> In both cases, it's called when key > tree->ctl->max_val. Since the
> minimum value of max_val is 255, RT_KEY_GET_SHIFT() is never called
> when key=0.

Ah, right, so it is dead code. Nothing to worry about, but it does
point the way to some simplifications, which I've put together in the
attached.

> > - RT_DELETE: "if (key > tree->ctl->max_val)" is not covered:
> >
> > https://anarazel.de/postgres/cov/16-vs-HEAD-2024-04-14/src/include/lib/radixtree.h.gcov.html#L2644
> >
> > That should be easy to add.
>
> Agreed. The patch is attached.

LGTM

> > - TidStoreCreate* has some memory clamps that are not covered:
> >
> > https://anarazel.de/postgres/cov/16-vs-HEAD-2024-04-14/src/backend/access/common/tidstore.c.gcov.html#L179
> > https://anarazel.de/postgres/cov/16-vs-HEAD-2024-04-14/src/backend/access/common/tidstore.c.gcov.html#L234
> >
> > Maybe we could experiment with using 1MB for shared, and something
> > smaller for local.
>
> I've confirmed that the local and shared tidstore with small max sizes
> such as 4kB and 1MB worked. Currently the max size is hard-coded in
> test_tidstore.c but if we use work_mem as the max size, we can pass
> different max sizes for local and shared in the test script.

Seems okay, do you want to try that and see how it looks?
diff --git a/src/include/lib/radixtree.h b/src/include/lib/radixtree.h
index 2896a6efc5..fdac103763 100644
--- a/src/include/lib/radixtree.h
+++ b/src/include/lib/radixtree.h
@@ -217,7 +217,6 @@
 #define RT_NODE_48_GET_CHILD RT_MAKE_NAME(node_48_get_child)
 #define RT_NODE_256_IS_CHUNK_USED RT_MAKE_NAME(node_256_is_chunk_used)
 #define RT_NODE_256_GET_CHILD RT_MAKE_NAME(node_256_get_child)
-#define RT_KEY_GET_SHIFT RT_MAKE_NAME(key_get_shift)
 #define RT_SHIFT_GET_MAX_VAL RT_MAKE_NAME(shift_get_max_val)
 #define RT_NODE_SEARCH RT_MAKE_NAME(node_search)
 #define RT_NODE_DELETE RT_MAKE_NAME(node_delete)
@@ -320,9 +319,6 @@ RT_SCOPE void RT_STATS(RT_RADIX_TREE * tree);
 /* Mask for extracting a chunk from a key */
 #define RT_CHUNK_MASK ((1 << RT_SPAN) - 1)
 
-/* Maximum shift needed to extract a chunk from a key */
-#define RT_MAX_SHIFT	RT_KEY_GET_SHIFT(UINT64_MAX)
-
 /* Maximum level a tree can reach for a key */
 #define RT_MAX_LEVEL	((sizeof(uint64) * BITS_PER_BYTE) / RT_SPAN)
 
@@ -797,28 +793,15 @@ RT_NODE_256_GET_CHILD(RT_NODE_256 * node, uint8 chunk)
	return &node->children[chunk];
 }
 
-/*
- * Return the smallest shift that will allowing storing the given key.
- */
-static inline int
-RT_KEY_GET_SHIFT(uint64 key)
-{
-	if (key == 0)
-		return 0;
-	else
-		return (pg_leftmost_one_pos64(key) / RT_SPAN) * RT_SPAN;
-}
-
 /*
  * Return the max value that can be stored in the tree with the given shift.
  */
 static uint64
 RT_SHIFT_GET_MAX_VAL(int shift)
 {
-	if (shift == RT_MAX_SHIFT)
-		return UINT64_MAX;
-	else
-		return (UINT64CONST(1) << (shift + RT_SPAN)) - 1;
+	int max_shift = (sizeof(uint64) - 1) * BITS_PER_BYTE;
+
+	return UINT64_MAX >> (max_shift - shift);
 }
 
 /*
@@ -1574,9 +1557,8 @@ RT_NODE_INSERT(RT_RADIX_TREE * tree, RT_PTR_ALLOC * parent_slot, RT_CHILD_PTR no
  * and move the old node below it.
  */
 static pg_noinline void
-RT_EXTEND_UP(RT_RADIX_TREE * tree, uint64 key)
+RT_EXTEND_UP(RT_RADIX_TREE * tree, uint64 key, int target_shift)
 {
-	int			target_shift = RT_KEY_GET_SHIFT(key);
 	int			shift = tree->ctl->start_shift;
 
 	Assert(shift < target_shift);
@@ -1713,11 +1695,15 @@ RT_SET(RT_RADIX_TREE * tree, uint64 key, RT_VALUE_TYPE * value_p)
 	/* Extend the tree if necessary */
 	if (unlikely(key > tree->ctl->max_val))
 	{
+		int			start_shift;
+
+		/* compute the smallest shift that will allowing storing the key */
+		start_shift = pg_leftmost_one_pos64(key) / RT_SPAN * RT_SPAN;
+
 		if (tree->ctl->num_keys == 0)
 		{
 			RT_CHILD_PTR node;
 			RT_NODE_4  *n4;
-			int			start_shift = RT_KEY_GET_SHIFT(key);
 
 			/*
 			 * With an empty root node, we don't extend the tree upwards,
@@ -1738,7 +1724,7 @@ RT_SET(RT_RADIX_TREE * tree, uint64 key, RT_VALUE_TYPE * value_p)
 			goto have_slot;
 		}
 		else
-			RT_EXTEND_UP(tree, key);
+			RT_EXTEND_UP(tree, key, start_shift);
 	}
 
	slot = RT_GET_SLOT_RECURSIVE(tree, &tree->ctl->root,
@@ -2937,7 +2923,6 @@ RT_DUMP_NODE(RT_NODE * node)
 #undef RT_SPAN
 #undef RT_NODE_MAX_SLOTS
 #undef RT_CHUNK_MASK
-#undef RT_MAX_SHIFT
 #undef RT_MAX_LEVEL
 #undef RT_GET_KEY_CHUNK
 #undef RT_BM_IDX
@@ -3032,7 +3017,6 @@ RT_DUMP_NODE(RT_NODE * node)
 #undef RT_NODE_48_GET_CHILD
 #undef RT_NODE_256_IS_CHUNK_USED
 #undef RT_NODE_256_GET_CHILD
-#undef RT_KEY_GET_SHIFT

Re: [PATCH] json_lex_string: don't overread on bad UTF8

2024-05-01 Thread Michael Paquier
On Tue, Apr 30, 2024 at 10:39:04AM -0700, Jacob Champion wrote:
> When json_lex_string() hits certain types of invalid input, it calls
> pg_encoding_mblen_bounded(), which assumes that its input is
> null-terminated and calls strnlen(). But the JSON lexer is constructed
> with an explicit string length, and we don't ensure that the string is
> null-terminated in all cases, so we can walk off the end of the
> buffer. This isn't really relevant on the server side, where you'd
> have to get a superuser to help you break string encodings, but for
> client-side usage on untrusted input (such as my OAuth patch) it would
> be more important.

I'm not sure I like the fact that this advances token_terminator
first.  Wouldn't it be better to calculate pg_encoding_mblen() first,
and then save token_terminator?  I feel a bit uneasy about saving a value
in token_terminator that points past the end of the string.  That's a nit
in this context, still..

> Attached is a draft patch that explicitly checks against the
> end-of-string pointer and clamps the token_terminator to it. Note that
> this removes the only caller of pg_encoding_mblen_bounded() and I'm
> not sure what we should do with that function. It seems like a
> reasonable API, just not here.

Hmm.  Keeping it around as currently designed means that it could
cause more harm than good in the long term, even in the stable
branches if new code uses it.  There is a risk of new code
incorrectly using it again, even though its top comment says that we rely
on the string being nul-terminated.  A safer alternative would perhaps be
to redesign it so that the length of the string is provided as input,
removing the dependency on strlen() in its internals.  Anyway,
without any callers of it, I'd be tempted to wipe it from HEAD and
call it a day.

> The new test needs to record two versions of the error message, one
> for invalid token and one for invalid escape sequence. This is
> because, for smaller chunk sizes, the partial-token logic in the
> incremental JSON parser skips the affected code entirely when it can't
> find an ending double-quote.

Ah, that makes sense.  That looks OK here.  A comment around the test
to document that would be appropriate, I guess.

> Tangentially: Should we maybe rethink pieces of the json_lex_string
> error handling? For example, do we really want to echo an incomplete
> multibyte sequence once we know it's bad? It also looks like there are
> places where the FAIL_AT_CHAR_END macro is called after the `s`
> pointer has already advanced past the code point of interest. I'm not
> sure if that's intentional.

Advancing the tracking pointer 's' before reporting an error related to
the end of the string is a bad practice, IMO, and we should avoid
that.  json_lex_string() does not offer a warm feeling regarding that
with escape characters, at least :/
--
Michael

