Re: DROP OWNED BY fails to clean out pg_init_privs grants

2024-04-26 Thread Daniel Gustafsson
> On 21 Apr 2024, at 23:08, Tom Lane  wrote:
> 
> Daniel Gustafsson  writes:
>>> On 6 Apr 2024, at 01:10, Tom Lane  wrote:
>>> (So now I'm wondering why *only* copperhead has shown this so far.
>>> Are our other cross-version-upgrade testing animals AWOL?)
> 
>> Clicking around searching for Xversion animals I didn't spot any, but without
>> access to the database it's nontrivial to know which animal does what.
> 
> I believe I see why this is (or isn't) happening.  The animals
> currently running xversion tests are copperhead, crake, drongo,
> and fairywren.  copperhead is using the makefiles while the others
> are using meson.  And I find this in
> src/test/modules/test_pg_dump/meson.build (from 3f0e786cc):
> 
># doesn't delete its user
>'runningcheck': false,
> 
> So the meson animals are not running the test that sets up the
> problematic data.

ugh =/

> I think we should remove the above, since (a) the reason to have
> it is gone, and (b) it seems really bad that the set of tests
> run by meson is different from that run by the makefiles.

Agreed.

> However, once we do that, those other three animals will presumably go
> red, greatly complicating detection of any Windows-specific problems.
> So I'm inclined to not do it till just before we intend to commit
> a fix for the underlying problem.  (Enough before that we can confirm
> that they do go red.)

Agreed, we definitely want that but compromising the ability to find Windows
issues at this point in the cycle seems bad.

> ... were you going to look at it?  I can take a whack if it's
> too far down your priority list.

I took a look at this, reading code and the linked thread.  My gut feeling is
that Stephen is right in that the underlying bug is these privileges ending up
in pg_init_privs to begin with.  That being said, I wasn't able to fix that in
a way that doesn't seem like a terrible hack.  The attached POC hack fixes it
for me but I'm not sure how to fix it properly. Your wisdom would be much 
appreciated.

Clusters which already has such entries aren't helped by a fix for this though,
fixing that would either require pg_dump to skip them, or pg_upgrade to have a
check along with instructions for fixing the issue.  Not sure what's the best
strategy here, the lack of complaints could indicate this isn't terribly common
so spending cycles on it for every pg_dump might be excessive compared to a
pg_upgrade check?

--
Daniel Gustafsson



init_privs.diff
Description: Binary data


Re: Improve the connection failure error messages

2024-04-26 Thread Daniel Gustafsson
> On 22 Mar 2024, at 11:42, Nisha Moond  wrote:

> Here is the v4 patch with changes required in slotfuncs.c and slotsync.c 
> files.

-   errmsg("could not connect to the primary server: %s", err));
+   errmsg("\"%s\" could not connect to the primary server: %s", 
app_name.data, err));

Messages like this should perhaps have translator comments to indicate what the
leading "%s" will contain?

--
Daniel Gustafsson





Re: some additional (small) problems with pg_combinebackup and tablespaces

2024-04-25 Thread Daniel Gustafsson
> On 25 Apr 2024, at 17:48, Robert Haas  wrote:
> 
> On Wed, Apr 24, 2024 at 3:20 PM Daniel Gustafsson  wrote:
>> Patch LGTM.
> 
> Thanks. Here's an updated version with fixes for the other issues you 
> mentioned.

LGTM, only one small error in the commitmessage: s/Gustaffson/Gustafsson/

--
Daniel Gustafsson





Re: Add notes to pg_combinebackup docs

2024-04-25 Thread Daniel Gustafsson
> On 25 Apr 2024, at 18:16, Robert Haas  wrote:
> 
> On Wed, Apr 24, 2024 at 3:08 PM Daniel Gustafsson  wrote:
>> LGTM; only one small comment which you can ignore if you feel it's not worth
>> the extra words.
>> 
>> +pg_combinebackup when the checksum status of the
>> +cluster has been changed; see
>> 
>> I would have preferred that this sentence included the problematic period for
>> the change, perhaps "..has been changed after the initial backup." or ideally
>> something even better.  In other words, clarifying that if checksums were
>> enabled before any backups were taken this limitation is not in play.  It's 
>> not
>> critical as the link aptly documents this, it just seems like the sentence is
>> cut short.
> 
> This was somewhat deliberate.

Given your reasoning below, +1 on the patch you proposed.

> I think the user will probably still get the
> point, but I also think they'll probably get the point without the
> extra verbiage. I think that it will be natural for people to imagine
> that what matters is not whether the checksum status has ever changed,
> but whether it has changed within the relevant time period, whatever
> that is exactly.

Fair enough.

> Again, this is not to say that what you're proposing is necessarily
> wrong; I'm just explaining my own thinking.

Gotcha, much appreciated.

--
Daniel Gustafsson





Re: Doc anchors for COPY formats

2024-04-25 Thread Daniel Gustafsson
> On 25 Apr 2024, at 13:23, Dagfinn Ilmari Mannsåker  wrote:

> An IRC conversation just now made me notice that it would be handy to
> have stable links for the descrpitions of the various COPY formats, per
> the attached patch.

No objections, that seems perfectly a reasonable idea.  Maybe we should set an
xreflabel while at it for completeness sake?

--
Daniel Gustafsson





Re: some additional (small) problems with pg_combinebackup and tablespaces

2024-04-24 Thread Daniel Gustafsson
> On 24 Apr 2024, at 19:59, Robert Haas  wrote:

> Here is a very small patch correcting these regrettable errors.

Patch LGTM.

In addition to those, unless I'm reading it wrong the current coding seems to
include a "-P" short option which is missing in the command parsing switch
statement (or in the help output):

while ((c = getopt_long(argc, argv, "dnNPo:T:",

Should that be removed?

A tiny nit-pick in the same area: while not wrong, it reads a bit odd to have
"don't" and "do not" on lines directly next to each other:

printf(_("  -n, --dry-run don't actually do anything\n"));
printf(_("  -N, --no-sync do not wait for changes to be written 
safely to disk\n"));

--
Daniel Gustafsson





Re: Add notes to pg_combinebackup docs

2024-04-24 Thread Daniel Gustafsson
> On 22 Apr 2024, at 20:52, Robert Haas  wrote:
> 
> On Thu, Apr 18, 2024 at 3:25 PM Daniel Gustafsson  wrote:
>>> On 18 Apr 2024, at 20:11, Robert Haas  wrote:
>>> 2. As (1), but make check_control_files() emit a warning message when
>>> the problem case is detected.
>> 
>> Being in the post-freeze part of the cycle, this seems like the best option.
> 
> Here is a patch for that.

LGTM; only one small comment which you can ignore if you feel it's not worth
the extra words.

+pg_combinebackup when the checksum status of the
+cluster has been changed; see

I would have preferred that this sentence included the problematic period for
the change, perhaps "..has been changed after the initial backup." or ideally
something even better.  In other words, clarifying that if checksums were
enabled before any backups were taken this limitation is not in play.  It's not
critical as the link aptly documents this, it just seems like the sentence is
cut short.

--
Daniel Gustafsson





Re: Cutting support for OpenSSL 1.0.1 and 1.0.2 in 17~?

2024-04-24 Thread Daniel Gustafsson
> On 24 Apr 2024, at 00:20, Michael Paquier  wrote:
> 
> On Tue, Apr 23, 2024 at 10:08:13PM +0200, Daniel Gustafsson wrote:
>> Hearing no objections to this plan (and the posted v10), I'll go ahead with
>> 0001, 0003 and 0004 into v17 tomorrow.
> 
> WFM, thanks.

Done.  Attached are the two remaining patches, rebased over HEAD, for removing
support for OpenSSL 1.0.2 in v18. Parking them in the commitfest for now.

--
Daniel Gustafsson



v11-0002-Remove-pg_strong_random-initialization.patch
Description: Binary data


v11-0001-Remove-support-for-OpenSSL-1.0.2.patch
Description: Binary data


Re: Use XLOG_CONTROL_FILE macro everywhere?

2024-04-24 Thread Daniel Gustafsson
> On 24 Apr 2024, at 11:13, Anton A. Melnikov  wrote:
> 
> On 24.04.2024 12:02, Peter Eisentraut wrote:
>> On 19.04.24 05:50, Anton A. Melnikov wrote:
>>> 
>>> May be better use this macro everywhere in C code?
>> I don't know.  I don't find XLOG_CONTROL_FILE to be a very intuitive proxy 
>> for "pg_control".

Maybe, but inconsistent usage is also unintuitive.

> Then maybe replace XLOG_CONTROL_FILE with PG_CONTROL_FILE?
> 
> The PG_CONTROL_FILE_SIZE macro is already in the code.
> With the best regards,

XLOG_CONTROL_FILE is close to two decades old so there might be extensions
using it (though the risk might be slim), perhaps using offering it as as well
as backwards-compatability is warranted should we introduce a new name?

--
Daniel Gustafsson





Re: Cutting support for OpenSSL 1.0.1 and 1.0.2 in 17~?

2024-04-23 Thread Daniel Gustafsson
> On 19 Apr 2024, at 10:06, Peter Eisentraut  wrote:
> 
> On 19.04.24 07:37, Michael Paquier wrote:
>> On Thu, Apr 18, 2024 at 12:53:43PM +0200, Peter Eisentraut wrote:
>>> If everything is addressed, I agree that 0001, 0003, and 0004 can go into
>>> PG17, the rest later.
>> About the PG17 bits, would you agree about a backpatch?  Or perhaps
>> you disagree?
> 
> I don't think any of these need to be backpatched, at least right now.
> 
> 0001 is just a cosmetic documentation tweak, has no reason to be backpatched.
> 
> 0003 adds new functionality for LibreSSL.  While the code looks 
> straightforward, we have little knowledge about how it works in practice.  
> How is the buildfarm coverage of LibreSSL (with SSL tests enabled!)?  If 
> people are keen on this, it might be better to get it into PG17 and at least 
> let to go through a few months of beta testing.
> 
> 0004 effectively just enhances an error message for LibreSSL; there is little 
> reason to backpatch this.

Hearing no objections to this plan (and the posted v10), I'll go ahead with
0001, 0003 and 0004 into v17 tomorrow.

--
Daniel Gustafsson





Re: slightly misleading Error message in guc.c

2024-04-23 Thread Daniel Gustafsson
> On 22 Apr 2024, at 18:04, Tom Lane  wrote:

> Seems like a useful change

Agreed.

> ... about like this?

Patch LGTM.

--
Daniel Gustafsson





Re: DROP OWNED BY fails to clean out pg_init_privs grants

2024-04-22 Thread Daniel Gustafsson
> On 21 Apr 2024, at 23:08, Tom Lane  wrote:

> ... were you going to look at it?  I can take a whack if it's too far down 
> your priority list.

Yeah, I’m working on a patchset right now.

  ./daniel



Re: Use XLOG_CONTROL_FILE macro everywhere?

2024-04-20 Thread Daniel Gustafsson
> On 20 Apr 2024, at 01:23, Michael Paquier  wrote:
> 
> On Fri, Apr 19, 2024 at 10:12:14AM +0200, Daniel Gustafsson wrote:
>> Off the cuff that seems to make sense, it does make it easier to grep for 
>> uses
>> of the control file.
> 
> +1 for switching to the macro where we can.  Still, I don't see a
> point in rushing and would just switch that once v18 opens up for
> business.

Absolutely, this is not fixing a defect so it's v18 material.

Anton: please register this patch in the Open commitfest to ensure it's not
forgotten about.

--
Daniel Gustafsson





Re: [PATCH] Avoid mixing custom and OpenSSL BIO functions

2024-04-19 Thread Daniel Gustafsson
> On 19 Apr 2024, at 15:13, David Benjamin  wrote:
> 
> Circling back here, anything else needed from my end on this patch?

Patience mostly, v17 very recently entered feature freeze so a lof of the
developers are busy finding and fixing bugs to stabilize the release.  When the
window for new features opens again I am sure this patch will get reviews (I
have it on my personal TODO and hope to get to it).

--
Daniel Gustafsson





Re: Okay to remove mention of mystery @ and ~ operators?

2024-04-19 Thread Daniel Gustafsson
> On 19 Apr 2024, at 13:49, Daniel Gustafsson  wrote:
>> On 19 Apr 2024, at 12:31, Aleksander Alekseev  
>> wrote:

>> Here is the patch.
> 
> Nice catch, and thanks for the patch.  I'll apply it with a backpatch to when
> they were removed.

Done, thanks for the report and the patch!

--
Daniel Gustafsson





Re: Okay to remove mention of mystery @ and ~ operators?

2024-04-19 Thread Daniel Gustafsson
> On 19 Apr 2024, at 12:31, Aleksander Alekseev  
> wrote:
> 
> Hi,
> 
>>> This page says that the `@` and `~` operators on various types can be 
>>> accelerated by a GiST index.
>>> 
>>> https://www.postgresql.org/docs/current/gist-builtin-opclasses.html
>>> 
>>> These operators have been listed in the file since it was created in 2014, 
>>> but if they exist then I don't know how to use them or what they do.
>>> 
>>> Code examples, for clarity:
>>> 
>>>> select box '(0,0),(1, 1)' ~ box '(2,2),(3,3)';
>>> operator does not exist: box ~ box
>>> 
>>>> select box '(0,0),(1, 1)' @ box '(2,2),(3,3)';
>>> operator does not exist: box @ box
>>> 
>>> If they're a typo or some removed thing then I'd like to remove them from 
>>> the page. This email is me asking to find out if I'm wrong about that 
>>> before I try to submit a patch (also very happy for someone with a 
>>> committer bit to just fix this).
>> 
>> Indeed, there is no @(box,box) or ~(box,box) in the \dAo output. These
>> operators were removed by 2f70fdb0644c back in 2020.
>> 
>> I will submit a patch for the documentation shortly. Thanks for reporting.
> 
> Here is the patch.

Nice catch, and thanks for the patch.  I'll apply it with a backpatch to when
they were removed.

--
Daniel Gustafsson





Re: Typos in the code and README

2024-04-19 Thread Daniel Gustafsson
> On 16 Apr 2024, at 15:37, Nazir Bilal Yavuz  wrote:

> I realized two small typos: 'sgmr' -> 'smgr'. You may want to include
> them in 0001.

Thanks, I incorporated these into 0001 before pushing.  All the commits in this
patchset are now applied.

--
Daniel Gustafsson





Re: Use XLOG_CONTROL_FILE macro everywhere?

2024-04-19 Thread Daniel Gustafsson
> On 19 Apr 2024, at 05:50, Anton A. Melnikov  wrote:

> May be better use this macro everywhere in C code?

Off the cuff that seems to make sense, it does make it easier to grep for uses
of the control file.

--
Daniel Gustafsson





Re: Cutting support for OpenSSL 1.0.1 and 1.0.2 in 17~?

2024-04-19 Thread Daniel Gustafsson
> On 19 Apr 2024, at 07:37, Michael Paquier  wrote:
> 
> On Thu, Apr 18, 2024 at 12:53:43PM +0200, Peter Eisentraut wrote:
>> If everything is addressed, I agree that 0001, 0003, and 0004 can go into
>> PG17, the rest later.
> 
> About the PG17 bits, would you agree about a backpatch?  Or perhaps
> you disagree?

If we want to 0001 can be baclpatched to v15, 0004 to v13 and 0003 all the way.
I don't have strong opinions either way.

--
Daniel Gustafsson





Re: improve performance of pg_dump --binary-upgrade

2024-04-18 Thread Daniel Gustafsson
> On 18 Apr 2024, at 22:28, Nathan Bossart  wrote:
> 
> On Thu, Apr 18, 2024 at 10:23:01AM -0500, Nathan Bossart wrote:
>> On Thu, Apr 18, 2024 at 09:24:53AM +0200, Daniel Gustafsson wrote:
>>> That does indeed seem like a saner approach.  Since we look up the relkind 
>>> we
>>> can also remove the is_index parameter to binary_upgrade_set_pg_class_oids
>>> since we already know that without the caller telling us?
>> 
>> Yeah.  It looks like that's been possible since commit 9a974cb, so I can
>> write a prerequisite patch for this.
> 
> Here's a new patch set with this change.

From a read-through they look good, a very nice performance improvement in an
important path.  I think it would be nice with some comments on the
BinaryUpgradeClassOids struct (since the code using it is thousands of lines
away), and a comment on the if (oids == NULL) block explaining the caching.

--
Daniel Gustafsson





Re: Cutting support for OpenSSL 1.0.1 and 1.0.2 in 17~?

2024-04-18 Thread Daniel Gustafsson
> On 18 Apr 2024, at 12:53, Peter Eisentraut  wrote:

> Review of the latest batch:

Thanks for reviewing!

> 8 v9-0002-Remove-support-for-OpenSSL-1.0.2.patch
> 
> Ok, but maybe make the punctuation consistent here:

Fixed.

> * v9-0004-Support-SSL_R_VERSION_TOO_LOW-on-LibreSSL.patch
> 
> Seems ok, but the reason isn't clear to me.  Are there LibreSSL versions that 
> have SSL_R_VERSION_TOO_LOW but not SSL_R_VERSION_TOO_HIGH?  Maybe this could 
> be explained better.

LibreSSL doesn't support SSL_R_VERSION_TOO_HIGH at all, they only support
_TOO_LOW starting with the OpenBSD 7.2 release.  I've expanded the commit
message to document this.

> Also, "OpenSSL 7.2" in the commit message probably meant "OpenBSD"?

Ah yes, fixed.

> * v9-0005-Remove-pg_strong_random-initialization.patch
> 
> I don't understand the reason for this phrase in the commit message: "1.1.1 
> is being increasingly phased out from production use".  Did you mean 1.1.0 
> there?

Correct, I got lost among the version numbers it seems. Fixed.

> Conditionally sticking the RAND_poll() into pg_strong_random(), does that 
> have the effect we want?  It wouldn't reinitialize after a fork, AFAICT.

No I think you're right, my previous version would have worked (but was ugly)
but this doesn't guarantee that.  Thinking more about it maybe it's best to
just keep the init function and have a version check for 1.1.0 there, making it
an empty no-op for all other cases.  When we move past 1.1.0 due to a new API
requirement we can blow it all away.

> If everything is addressed, I agree that 0001, 0003, and 0004 can go into 
> PG17, the rest later.

Agreed, 0002 and 0005 are clearly for the v18 cycle.

--
Daniel Gustafsson



v10-0001-Doc-Use-past-tense-for-things-which-happened-in-.patch
Description: Binary data


v10-0002-Remove-support-for-OpenSSL-1.0.2.patch
Description: Binary data


v10-0003-Support-disallowing-SSL-renegotiation-in-LibreSS.patch
Description: Binary data


v10-0004-Support-SSL_R_VERSION_TOO_LOW-on-LibreSSL.patch
Description: Binary data


v10-0005-Remove-pg_strong_random-initialization.patch
Description: Binary data


Re: Add notes to pg_combinebackup docs

2024-04-18 Thread Daniel Gustafsson
> On 18 Apr 2024, at 20:11, Robert Haas  wrote:

> 2. As (1), but make check_control_files() emit a warning message when
> the problem case is detected.

Being in the post-freeze part of the cycle, this seems like the best option.

> 3. As (2), but also add a command-line option to pg_combinebackup to
> flip the checksum flag to false in the control file.

I don't think this is the way to go, such an option will be plastered on to
helpful tutorials which users will copy/paste from even when they don't need
the option at all, only to disable checksums when there was no reason for doing
so.

> We could perhaps consider doing (2) for now and (5) for a future
> release, or something like that.

+1

--
Daniel Gustafsson





Re: improve performance of pg_dump --binary-upgrade

2024-04-18 Thread Daniel Gustafsson
> On 18 Apr 2024, at 06:17, Nathan Bossart  wrote:

> The attached work-in-progress patch speeds up 'pg_dump --binary-upgrade'
> for this case.  Instead of executing the query in every call to the
> function, we can execute it once during the first call and store all the
> required information in a sorted array that we can bsearch() in future
> calls.

That does indeed seem like a saner approach.  Since we look up the relkind we
can also remove the is_index parameter to binary_upgrade_set_pg_class_oids
since we already know that without the caller telling us?

> One downside of this approach is the memory usage.

I'm not too worried about the worst-case performance of this.

> This was more-or-less
> the first approach that crossed my mind, so I wouldn't be surprised if
> there's a better way.  I tried to keep the pg_dump output the same, but if
> that isn't important, maybe we could dump all the pg_class OIDs at once
> instead of calling binary_upgrade_set_pg_class_oids() for each one.

Without changing the backend handling of the Oid's we can't really do that
AFAICT, the backend stores the Oid for the next call so it needs to be per
relation like now?

For Greenplum we moved this to the backend by first dumping all Oids which were
read into backend cache, and during relation creation the Oid to use was looked
up in the backend.  (This wasn't a performance change, it was to allow multiple
shared-nothing clusters to have a unified view of Oids, so I never benchmarked
it all that well.) The upside of that is that the magic Oid variables in the
backend can be removed, but it obviously adds slight overhead in others.

--
Daniel Gustafsson





Re: Add bump memory context type and use it for tuplesorts

2024-04-16 Thread Daniel Gustafsson
> On 16 Apr 2024, at 07:12, Amul Sul  wrote:
> 
> Attached is a small patch adding the missing BumpContext description to the
> README.

Nice catch, we should add it to the README.

+  pfree'd or realloc'd.
I think it's best to avoid mixing API:s, "pfree'd or repalloc'd" keeps it to
functions in our API instead.

--
Daniel Gustafsson





Re: Cutting support for OpenSSL 1.0.1 and 1.0.2 in 17~?

2024-04-16 Thread Daniel Gustafsson


> On 16 Apr 2024, at 01:03, Michael Paquier  wrote:
> 
> On Mon, Apr 15, 2024 at 11:07:05AM +0200, Daniel Gustafsson wrote:
>> On 15 Apr 2024, at 07:04, Michael Paquier  wrote:
>>> On Fri, Apr 12, 2024 at 02:42:57PM +0200, Daniel Gustafsson wrote:
>>>> Is the attached split in line with how you were thinking about it?
>>> 
>>> If I may, 0001 looks sensible here.  The bits from 0003 and 0004 could
>>> be applied before 0002, as well.
>> 
>> Agreed, once we are in post-freeze I think those three are mostly
>> ready to go.
> 
> Is there a point to wait for 0001, 0003 and 0004, though, and
> shouldn't these three be backpatched?  0001 is certainly OK as a
> doc-only change to be consistent across the board without waiting 5
> more years.  0003 and 0004 are conditional and depend on if
> SSL_R_VERSION_TOO_LOW and SSL_OP_NO_CLIENT_RENEGOTIATION are defined
> at compile-time.  0003 is much more important, and note that
> 01e6f1a842f4 has been backpatched all the way down.  0004 is nice,
> still not strongly mandatory.

I forgot (and didn't check) that we backpatched 01e6f1a842f4, with that in mind
I agree that we should backpatch 0003 as well to put LibreSSL on par as much as
we can.  0004 is a fix for the LibreSSL support, not adding anything new, so
pushing that to master now makes sense.  Unless objections are raised I'll push
0001, 0003 and 0004 shortly.  0002 and 0005 can hopefully be addressed in the
July commitfest.

>>> Rather than calling always RAND_poll(), this
>>> could use a static flag to call it only once when pg_strong_random is
>>> called for the first time.
>> 
>> Agreed, we can good that. Fixed.
> 
> +#if (OPENSSL_VERSION_NUMBER <= 0x1010L)
> + static bool rand_initialized = false;
> 
> This does not look right.  At the top of upstream's branch
> OpenSSL_1_1_0-stable, OPENSSL_VERSION_NUMBER is 0x101000d0L, so the
> initialization would be missed for any version in the 1.1.0 series
> except the GA one without this code being compiled, no?

Meh, I was clearly not caffeinated as that's a thinko with a typo attached to
it.  The check should be "< 0x10101000" to catch any version prior to 1.1.1.
Fixed.

--
Daniel Gustafsson



v9-0005-Remove-pg_strong_random-initialization.patch
Description: Binary data


v9-0004-Support-SSL_R_VERSION_TOO_LOW-on-LibreSSL.patch
Description: Binary data


v9-0003-Support-disallowing-SSL-renegotiation-in-LibreSSL.patch
Description: Binary data


v9-0002-Remove-support-for-OpenSSL-1.0.2.patch
Description: Binary data


v9-0001-Doc-Use-past-tense-for-things-which-happened-in-t.patch
Description: Binary data


Re: Typos in the code and README

2024-04-15 Thread Daniel Gustafsson
> On 14 Apr 2024, at 13:19, David Rowley  wrote:
> 
> On Sat, 13 Apr 2024 at 09:17, Daniel Gustafsson  wrote:
>> 
>>> On 12 Apr 2024, at 23:15, Heikki Linnakangas  wrote:
>>> Here's a few more. I've accumulate these over the past couple of months, 
>>> keeping them stashed in a branch, adding to it whenever I've spotted a 
>>> minor typo while reading the code.
>> 
>> Nice, let's lot all these together.
> 
> Here are a few additional ones to add to that.

Thanks.  Collecting all the ones submitted here, as well as a few submitted
off-list by Alexander, the patch is now a 3-part patchset of cleanups:

0001 contains the typos and duplicate words fixups, 0002 fixes a parameter with
the wrong name in the prototype and 0003 removes a leftover prototype which was
accidentally left in a refactoring.

--
Daniel Gustafsson



v2-0003-Remove-unused-function-prototype.patch
Description: Binary data


v2-0002-Fix-incorrect-parameter-name-in-prototype.patch
Description: Binary data


v2-0001-Fix-typos-and-duplicate-words-in-comments.patch
Description: Binary data


Re: "backend process" confused with "server process"

2024-04-15 Thread Daniel Gustafsson
> On 15 Apr 2024, at 11:07, Andrey M. Borodin  wrote:
>> On 15 Apr 2024, at 14:01, jian he  wrote:

>> "is not a PostgreSQL server process" is the same thing as "not a
>> PostgreSQL backend process”?
> 
> As far as I understand, backend is something attached to frontend.
> There might be infrastructure processes like syslogger or stats collector, 
> client can signal it too.

I think that's a good summary, in line with the glossary in our docs where we
define "Auxiliary process" and "Backend (process)" (but not PostgreSQL server
process which maybe we should?).  "PostgreSQL server process" is used in copy
to/from error messaging as well in what seems to be a user-freindly way to say
"backend".  In the docs we mostly use "server process" to mean a process
running in the operating system on the server.  In a few places we use
"database server process" too when talking about backends.

In the case of pg_log_backend_memory_contexts(), the function comment states
the following:

"Signal a backend or an auxiliary process to log its memory contexts"

So in this case, "PostgreSQL server process" seems like the correct term.

--
Daniel Gustafsson





Re: Cutting support for OpenSSL 1.0.1 and 1.0.2 in 17~?

2024-04-15 Thread Daniel Gustafsson
> On 15 Apr 2024, at 07:04, Michael Paquier  wrote:
> On Fri, Apr 12, 2024 at 02:42:57PM +0200, Daniel Gustafsson wrote:

>> Is the attached split in line with how you were thinking about it?
> 
> If I may, 0001 looks sensible here.  The bits from 0003 and 0004 could
> be applied before 0002, as well.

Agreed, once we are in post-freeze I think those three are mostly ready to go.

> - /* do post-fork initialization for random number generation */
> - pg_strong_random_init();
> 
> Perhaps you intented this diff to be in 0005 rather than in 0002?
> With 0002 applied, only support for 1.0.2 is removed, not 1.1.0 yet.

Yes, nice catch, that was a mistake in splitting up the patch into multiple
pieces, it should be in the 0005 patch for strong random. Fixed.

> s/requred/required/.

Fixed.

> Rather than calling always RAND_poll(), this
> could use a static flag to call it only once when pg_strong_random is
> called for the first time.

Agreed, we can good that. Fixed.

> I would not mind seeing this part entirely
> gone with the removal of support for 1.1.0.

If we want to keep autoconf from checking versions and just check compatibility
(with our code) then we will remain at 1.1.0 compatibility.  The only 1.1.1 API
we use is not present in LibreSSL so we can't really place a hard restriction
on that.  It might be that keeping it for now, and removing it later during the
v18 cycle as we modernize our OpenSSL code (which I hope to find time to work
on) and make use of newer 1.1.1 API:s.  That way we can keep our autoconf/meson
checks consistent across library checks.  If we end up with no new API:s to
check for by the time the last commitfest of v18 rolls around, we can revisit
the decision then.

--
Daniel Gustafsson



v8-0005-Remove-pg_strong_random-initialization.patch
Description: Binary data


v8-0004-Support-SSL_R_VERSION_TOO_LOW-on-LibreSSL.patch
Description: Binary data


v8-0003-Support-disallowing-SSL-renegotiation-in-LibreSSL.patch
Description: Binary data


v8-0002-Remove-support-for-OpenSSL-1.0.2.patch
Description: Binary data


v8-0001-Doc-Use-past-tense-for-things-which-happened-in-t.patch
Description: Binary data


Re: Typos in the code and README

2024-04-12 Thread Daniel Gustafsson
> On 12 Apr 2024, at 23:15, Heikki Linnakangas  wrote:
> 
> On 11/04/2024 16:05, Daniel Gustafsson wrote:
>> Now that the tree has settled down a bit post-freeze I ran some tooling to
>> check spelling.  I was primarily interested in docs and README* which were
>> mostly free from simply typos, while the code had some in various comments 
>> and
>> one in code.  The attached fixes all that I came across (not cross-referenced
>> against ongoing reverts or any other fixup threads but will be before pushing
>> of course).
> 
> Here's a few more. I've accumulate these over the past couple of months, 
> keeping them stashed in a branch, adding to it whenever I've spotted a minor 
> typo while reading the code.

Nice, let's lot all these together.

--
Daniel Gustafsson





Re: Cutting support for OpenSSL 1.0.1 and 1.0.2 in 17~?

2024-04-12 Thread Daniel Gustafsson
> On 10 Apr 2024, at 09:31, Peter Eisentraut  wrote:

> I think it might be better to separate this into two steps:

Fair enough.

> 1. Move to 1.1.0.  This is an API update.  Change OPENSSL_API_COMPAT, and 
> remove a bunch of code that no longer needs to be conditional.  We could 
> check for a representative function like OPENSSL_init_ssl() in 
> configure/meson, or we could just let the compilation fail with older 
> versions.

The attached 0002 bumps the minimum required version to 1.1.0 with a hard error
in autoconf/meson, and removes all the 1.0.2 support code.

I think the documentation for PQinitOpenSSL should be reworded from "you have
to do this, unless you run 1.1.0 or later" to "If you run 1.1.0, you need to do
this).  As it is now the important bit of the paragrapg is at the end rather
than in the beginning.  Trying this I didn't find a wording which seemed like
an improvement though, suggestions are welcome.

> 2. Move to 1.1.1.  I understand this has to do with the fork-safety of 
> pg_strong_random(), and it's not an API change but a behavior change. Let's 
> make this association clearer in the code.  For example, add a version check 
> or assertion about this into pg_strong_random() itself.

0005 moves the fork safety init inline with calling pg_strong_random, and
removes it for everyone else.  This allows 1.1.0 to be supported as we
effectively are at the 1.1.0 API level, at the cost of calls for strong random
being slower on 1.1.0.  An unscientific guess based on packaged OpenSSL
versions and the EOL and ELS/LTS status of 1.1.0, is that the number of
production installs of postgres 17 using OpenSSL 1.1.0 is close to zero.

> I don't know how LibreSSL interacts with either of these two points. That's 
> something that could be clearer.

The oldest LibreSSL we have in the buildfarm is 3.2 (from OpenBSD 6.8), which
the attached version supports and passes tests with.  LibreSSL has been
providing fork safety since 2.0.2 which is well into the past.

> * doc/src/sgml/libpq.sgml
> 
> This small documentation patch could be committed forthwith.

Agreed, separated into 0001 in the attached and can be committed regardless of
the remaining ones.

> * src/backend/libpq/be-secure-openssl.c
> 
> +#include 
> 
> This patch doesn't appear to add anything, so why does it need a new include?

As mentioned downthread, an indirect inclusion was removed so we need to
explicitly include it.

> Could the additions of SSL_OP_NO_CLIENT_RENEGOTIATION and 
> SSL_R_VERSION_TOO_LOW be separate patches?

They can, done in the attached.

SSL_R_VERSION_TOO_LOW was introduced quite recently in LibreSSL 3.6.3 (OpenBSD
7.2), but splitting the check for TOO_LOW and TOO_HIGH into two seems pretty
uncontroversial and a good way to get ever so slightly better error handling on
recent LibreSSL.

SSL renegotiation has been supported for much longer in LibreSSL so adding that
to make OpenSSL and LibreSSL support slightly more on par seems seems like a
good idea regardless of version bump.

> * src/common/hmac_openssl.c
> 
> There appears to be some unrelated refactoring happening here?

I assume you mean changing back to FRONTEND from USE_RESOWNER_FOR_HMAC, the
latter was added recently in 38698dd38 and have never shipped, so it seemed
more backpatch-friendly to move back.  Given the amount of changes it probably
won't move the needle though so reverted.

> * src/include/common/openssl.h
> 
> Is the comment no longer applicable to OpenSSL, only to LibreSSL?

OpenSSL has since 0.9.8 defined TLS_MAX_VERSION which points highest version
TLS protocol supported, but AFAIK there is no such construction in LibreSSL.
Assuming I didn't totally misunderstand the comment of course.

> * src/port/pg_strong_random.c
> 
> I would prefer to remove pg_strong_random_init() if it's no longer useful.  I 
> mean, if we leave it as is, and we are not removing any callers, then we are 
> effectively continuing to support OpenSSL <1.1.1, right?

The attached removes pg_strong_random_init and instead calls it explicitly for
1.1.0 users by checking the OpenSSL version.

Is the attached split in line with how you were thinking about it?

--
Daniel Gustafsson



v7-0005-Remove-pg_strong_random-initialization.patch
Description: Binary data


v7-0004-Support-SSL_R_VERSION_TOO_LOW-on-LibreSSL.patch
Description: Binary data


v7-0003-Support-disallowing-SSL-renegotiation-in-LibreSSL.patch
Description: Binary data


v7-0002-Remove-support-for-OpenSSL-1.0.2.patch
Description: Binary data


v7-0001-Doc-Use-past-tense-for-things-which-happened-in-t.patch
Description: Binary data


Re: Typos in the code and README

2024-04-11 Thread Daniel Gustafsson
> On 11 Apr 2024, at 15:29, David Rowley  wrote:
> 
> On Fri, 12 Apr 2024, 1:05 am Daniel Gustafsson,  <mailto:dan...@yesql.se>> wrote:
>> Now that the tree has settled down a bit post-freeze I ran some tooling to
>> check spelling.  I was primarily interested in docs and README* which were
>> mostly free from simply typos, while the code had some in various comments 
>> and
>> one in code.  The attached fixes all that I came across (not cross-referenced
>> against ongoing reverts or any other fixup threads but will be before pushing
>> of course).
> 
> 
> I see you've corrected "iif" to "if". It should be "iff".

Gotcha, will fix.  I opted for "if" due to recent threads where the use of
"iff" was discouraged due to not being commonly known, but that was in doc/ and
not code comments.

--
Daniel Gustafsson



Typos in the code and README

2024-04-11 Thread Daniel Gustafsson
Now that the tree has settled down a bit post-freeze I ran some tooling to
check spelling.  I was primarily interested in docs and README* which were
mostly free from simply typos, while the code had some in various comments and
one in code.  The attached fixes all that I came across (not cross-referenced
against ongoing reverts or any other fixup threads but will be before pushing
of course).

--
Daniel Gustafsson



v1-0001-Fix-typos.patch
Description: Binary data


Re: type in basebackup_incremental.c ?

2024-04-11 Thread Daniel Gustafsson
> On 11 Apr 2024, at 13:16, Andrew Dunstan  wrote:

> Thanks, I will include that in the cleanups I'm intending to push today.

+1, thanks!

--
Daniel Gustafsson





Re: type in basebackup_incremental.c ?

2024-04-11 Thread Daniel Gustafsson
> On 11 Apr 2024, at 11:49, Daniel Westermann (DWE) 
>  wrote:
> 
> Hi,
> 
> /*
>  * we expect the find the last lines of the manifest, including the checksum,
>  * in the last MIN_CHUNK bytes of the manifest. We trigger an incremental
>  * parse step if we are about to overflow MAX_CHUNK bytes.
>  */
> 
> Shouldn't this be:
> /*
>  * we expect to find the last lines of the manifest,...
>  */

That sounds about right, and since it's a full sentence it should also start
with a capital 'W': "We expect to find the..".

--
Daniel Gustafsson





Re: Fix resource leak (src/backend/libpq/be-secure-common.c)

2024-04-10 Thread Daniel Gustafsson
> On 10 Apr 2024, at 20:31, Ranier Vilela  wrote:
> 
> Em ter., 2 de abr. de 2024 às 15:31, Daniel Gustafsson  <mailto:dan...@yesql.se>> escreveu:
>> > On 2 Apr 2024, at 20:13, Ranier Vilela > > <mailto:ranier...@gmail.com>> wrote:
>> 
>> > Fix by freeing the pointer, like pclose_check (src/common/exec.c) similar 
>> > case.
>> 
>> Off the cuff, seems reasonable when loglevel is LOG.
> 
> Per Coverity.
> 
> Another case of resource leak, when loglevel is LOG.
> In the function shell_archive_file (src/backend/archive/shell_archive.c)
> The pointer *xlogarchcmd*  is not freed.

Thanks, I'll have a look.  I've left this for post-freeze on purpose to not
cause unnecessary rebasing.  Will take a look over the next few days unless
beaten to it.

--
Daniel Gustafsson



Re: IPC::Run::time[r|out] vs our TAP tests

2024-04-09 Thread Daniel Gustafsson


> On 4 Apr 2024, at 23:46, Daniel Gustafsson  wrote:
> 
>> On 4 Apr 2024, at 23:24, Tom Lane  wrote:
> 
>> A minimum fix that seems to make this work better is as attached,
>> but I feel like somebody ought to examine all the IPC::Run::timer
>> and IPC::Run::timeout calls and see which ones are mistaken.
>> It's a little scary to convert a timeout to a timer because of
>> the hazard that someplace that would need to be checking for
>> is_expired isn't.
> 
> Skimming this and a few callsites it seems reasonable to use a timer instead 
> of
> a timeout, but careful study is needed to make sure we're not introducing
> anything subtly wrong in the other direction.

Sharing a few preliminary results from looking at this, the attached passes
check-world but need more study/testing.

It seems wrong to me that we die() in query and query_until rather than giving
the caller the power to decide how to proceed.  We have even documented that we
do just this:

"Dies on failure to invoke psql, or if psql fails to connect.  Errors
occurring later are the caller's problem"

Turning the timeout into a timer and returning undef along with logging a test
failure in case of expiration seems a bit saner (maybe Andrew can suggest an
API which has a better Perl feel to it).  Most callsites don't need any changes
to accommodate for this, the attached 0002 implements this timer change and
modify the few sites that need it, converting one to plain query() where the
added complexity of query_until isn't required.

A few other comments on related things that stood out while reviewing:

The tab completion test can use the API call for automatically restart the
timer to reduce the complexity of check_completion a hair.  Done in 0001 (but
really not necessary).

Commit Af279ddd1c2 added this sequence to 040_standby_failover_slots_sync.pl in
the recovery tests:

$back_q->query_until(
qr/logical_slot_get_changes/, q(
   \echo logical_slot_get_changes
   SELECT pg_logical_slot_get_changes('test_slot', NULL, NULL);
));

...  ...

# Since there are no slots in standby_slot_names, the function
# pg_logical_slot_get_changes should now return, and the session can be
# stopped.
$back_q->quit;

There is no guarantee that pg_logical_slot_get_changes has returned when
reaching this point.  This might still work as intended, but the comment is
slightly misleading IMO.


recovery/t/043_wal_replay_wait.pl calls pg_wal_replay_wait() since 06c418e163e
in a background session which it then skips terminating.  Calling ->quit is
mandated by the API, in turn required by IPC::Run.  Calling ->quit on the
process makes the test fail from the process having already exited, but we can
call ->finish directly like we do in test_misc/t/005_timeouts.pl.  0003 fixes
this.

--
Daniel Gustafsson



v1-0001-Configure-interactive-instance-to-restart-timer.patch
Description: Binary data


v1-0002-Report-test-failure-rather-than-aborting-in-case-.patch
Description: Binary data


v1-0003-Clean-up-BackgroundPsql-object-when-test-ends.patch
Description: Binary data



Re: Converting README documentation to Markdown

2024-04-08 Thread Daniel Gustafsson
> On 8 Apr 2024, at 22:30, Erik Wienhold  wrote:
> On 2024-04-08 21:29 +0200, Daniel Gustafsson wrote:

> I've only peeked at a couple of those READMEs, but they look alright so
> far (at least on GitHub).  Should we settle on a specific Markdown
> flavor[1]?  Because I'm never sure if some markups only work on
> specific code-hosting sites.

Probably, but if we strive for maintained textual readability with avoiding
most of the creative markup then we're probably close to the original version.
But I agree, it should be evaluated.

> Maybe also a guide on writing Markdown
> that renders properly, especially with regard to escaping that may be
> necessary (see below).

That's a good point, if we opt for an actual format there should be some form
of documentation about that format, especially if we settle for using a
fraction of the capabilities of the format.

>> * In the regex README there are two file references using * as a wildcard, 
>> but
>>  the combination of the two makes Markdown render the text between them in
>>  italics.  Wrapping these in backticks solves it, but I'm not a fan since we
>>  don't do that elsewhere.  A solution which avoids backticks would ne nice.
> 
> Escaping does the trick: regc_\*.c

Right, but that makes the plaintext version less readable than the backticks I
think.

> Can be escaped as well: \

..and same with this one. It's all very subjective though.

--
Daniel Gustafsson





Re: Add bump memory context type and use it for tuplesorts

2024-04-07 Thread Daniel Gustafsson
> On 8 Apr 2024, at 01:04, Andres Freund  wrote:

> What makes you think that's unrelated? To me that looks like the same issue?

Nvm, I misread the assert, ETOOLITTLESLEEP. Sorry for the noise.

--
Daniel Gustafsson





Re: Cutting support for OpenSSL 1.0.1 and 1.0.2 in 17~?

2024-04-07 Thread Daniel Gustafsson
> On 8 Apr 2024, at 00:46, Michael Paquier  wrote:
> 
> On Sat, Apr 06, 2024 at 07:47:43PM +0200, Daniel Gustafsson wrote:
>> My apologies, I thought I did but clearly failed.  My point was that this is 
>> a
>> special/corner case where we try to find one of two different libraries (with
>> different ideas about backwards compatability etc) for supporting a single
>> thing.  So instead I tested for the explicit versions like how we already 
>> test
>> for the exact Perl version in config/perl.m4 (albeit that a library and a
>> program are two different things of course).
> 
> +  # Function introduced in OpenSSL 1.1.1 and LibreSSL 3.3.2
> +  AC_CHECK_FUNCS([EVP_PKEY_new_CMAC_key], [], [AC_MSG_ERROR([OpenSSL 1.1.1 
> or later is required for SSL support])]) 
> 
> I can see why you want to do that, but we've always relied on
> compilation failures while documenting the versions supported.  I
> don't disagree to your point of detecting that earlier, but it sounds
> like this should be a separate patch separate from the one removing
> support for OpenSSL 1.0.2 and 1.1.0, at least, because you are solving
> two problems at once.
> 
>> In bumping we want to move to 1.1.1 since that's the first version with the
>> rewritten RNG which is fork-safe by design, something PostgreSQL clearly
>> benefits from.  There is no new API for this to gate on though.  For LibreSSL
>> we want 3.3.2 to a) ensure we have coverage in the BF and b) since it's the
>> first version where the tests pass due to error message alignment with 
>> OpenSSL.
>> The combination of these gets rid of lots of specialcased #ifdef soup.  I
>> wasn't however able to find a specific API call which is unique to the two
>> version which we rely on.
> 
> Based on the state of the patch, all the symbols cleaned up in
> pg_config.h.in would be removed only by dropping support for
> 1.0.2.  The routines of protocol_openssl.c exist in 1.1.0.  libpq
> internal locking can be removed by dropping support for 1.0.2.  So
> a bunch of ifdefs are removed with 1.0.2 support, but that's much,
> much less cleaned once 1.1.0 is removed.  

If we are settling for removing 1.0.2 we need to make sure there is 1.1.0
buildfarm animals that actually run the sslcheck.  1.1.0 was never an LTS
release so it's presence in distributions is far less widespread.  I would
prefer 1.1.1 but, either way, we have ample time to discuss that during the v18
cycle.

> And pg_strong_random.c stuff would still remain around.

It stays but as belts-and-suspenders safety against bugs, not as a requirement.

>> Testing for the presence of an API provided and introduced by both libraries 
>> in
>> the version we're interested in, but which we don't use, is the alternative 
>> but
>> I thought that would be more frowned upon.  EVP_PKEY_new_CMAC_key() was
>> introduced in 1.1.1 and LibreSSL 3.3.2, so an AC_CHECK_FUNCS for that, as in
>> the attached, achieves the version check but pollutes pg_config.h with a 
>> define
>> which will never be used which seemed a bit ugly.
> 
> Two lines in pg_config.h is a minimal cost.  That does not look like a
> big deal to me.
> 
> -  * Make sure processes do not share OpenSSL randomness state.  This is 
> no
> -  * longer required in OpenSSL 1.1.1 and later versions, but until we 
> drop
> -  * support for version < 1.1.1 we need to do this.
> +  * Make sure processes do not share OpenSSL randomness state.  This is 
> in
> +  * theory no longer be required in OpenSSL 1.1.1 and later versions, but
> +  * there is no harm in taking extra precautions.
> 
> I was wondering if this should also document what you've mentioned,
> aka that OpenSSL still found ways to break the randomness state and
> that this is a cheap insurance against future mistakes that could
> happen in this area.

I'm not quite sure how stable Github links are over time, but I guess we could
post the commit sha for the bug in OpenSSL (as well as the fix) which is a
stable documentation of how it can be subtly broken.

--
Daniel Gustafsson





Re: Add bump memory context type and use it for tuplesorts

2024-04-07 Thread Daniel Gustafsson
> On 8 Apr 2024, at 00:41, Tom Lane  wrote:
> 
> Tomas Vondra  writes:
>> Yup, changing it to this:
> 
>> #define KeeperBlock(set) ((BumpBlock *) ((char *) (set) +
>> MAXALIGN(sizeof(BumpContext
> 
>> fixes the issue for me.
> 
> Mamba is happy with that change, too.

Unrelated to that one, seems like turaco ran into another issue:

running bootstrap script ... TRAP: failed Assert("total_allocated == 
context->mem_allocated"), File: "bump.c", Line: 808, PID: 7809

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=turaco=2024-04-07%2022%3A42%3A54

--
Daniel Gustafsson





Re: Cluster::restart dumping logs when stop fails

2024-04-07 Thread Daniel Gustafsson
> On 7 Apr 2024, at 18:51, Daniel Gustafsson  wrote:
>> On 7 Apr 2024, at 18:28, Andres Freund  wrote:

>> I'm ok with printing path + some content or just the path.
> 
> I think printing the last 512 bytes or so would be a good approach, I'll take
> care of it later tonight. That would be a backpatchable change IMHO.

In the end I opted for just printing the path to avoid introducing logic (at
this point in the cycle) for ensuring that a negative offset doesn't exceed the
filesize.  Since it changes behavior I haven't backpatched it, but can do that
if we think it's more of a fix than a change (I think it can be argued either
way, I have no strong feelings).

--
Daniel Gustafsson





Re: Cluster::restart dumping logs when stop fails

2024-04-07 Thread Daniel Gustafsson
> On 7 Apr 2024, at 18:28, Andres Freund  wrote:
> 
> On 2024-04-07 16:52:05 +0200, Daniel Gustafsson wrote:
>>> On 7 Apr 2024, at 14:51, Andrew Dunstan  wrote:
>>> On 2024-04-06 Sa 20:49, Andres Freund wrote:
>> 
>>>> That's probably unnecessary optimization, but it seems a tad silly to read 
>>>> an
>>>> entire, potentially sizable, file to just use the last 1k. Not sure if the 
>>>> way
>>>> slurp_file() uses seek supports negative ofsets, the docs read to me like 
>>>> that
>>>> may only be supported with SEEK_END.
>>> 
>>> We should enhance slurp_file() so it uses SEEK_END if the offset is 
>>> negative.
>> 
>> Absolutely agree.  Reading the thread I think Andres argues for not printing
>> anything at all in this case but we should support negative offsets anyways, 
>> it
>> will fort sure come in handy.
> 
> I'm ok with printing path + some content or just the path.

I think printing the last 512 bytes or so would be a good approach, I'll take
care of it later tonight. That would be a backpatchable change IMHO.

--
Daniel Gustafsson





Re: Cluster::restart dumping logs when stop fails

2024-04-07 Thread Daniel Gustafsson
> On 7 Apr 2024, at 14:51, Andrew Dunstan  wrote:
> On 2024-04-06 Sa 20:49, Andres Freund wrote:

>> That's probably unnecessary optimization, but it seems a tad silly to read an
>> entire, potentially sizable, file to just use the last 1k. Not sure if the 
>> way
>> slurp_file() uses seek supports negative ofsets, the docs read to me like 
>> that
>> may only be supported with SEEK_END.
> 
> We should enhance slurp_file() so it uses SEEK_END if the offset is negative.

Absolutely agree.  Reading the thread I think Andres argues for not printing
anything at all in this case but we should support negative offsets anyways, it
will fort sure come in handy.

--
Daniel Gustafsson





Re: Cluster::restart dumping logs when stop fails

2024-04-07 Thread Daniel Gustafsson
> On 7 Apr 2024, at 02:49, Andres Freund  wrote:
> On 2024-04-07 00:19:35 +0200, Daniel Gustafsson wrote:
>>> On 6 Apr 2024, at 23:44, Andres Freund  wrote:

>> The non-context aware fix would be to just print the last 1024 (or something)
>> bytes from the logfile:
> 
> That'd be better, yes. I'd mainly log the path to the logfile though, that's
> probably at least as helpful for actually investigating the issue?

IIRC this was modelled around how it used to be earlier, in v14 in the
pre-refactored version we print the full logfile.

How about doing the below backpatched down to 15?

diff --git a/src/test/perl/PostgreSQL/Test/Cluster.pm 
b/src/test/perl/PostgreSQL/Test/Cluster.pm
index 54e1008ae5..a2f9409842 100644
--- a/src/test/perl/PostgreSQL/Test/Cluster.pm
+++ b/src/test/perl/PostgreSQL/Test/Cluster.pm
@@ -951,8 +951,7 @@ sub start
 
if ($ret != 0)
{
-   print "# pg_ctl start failed; logfile:\n";
-   print PostgreSQL::Test::Utils::slurp_file($self->logfile);
+   print "# pg_ctl start failed; see logfile for details: " . 
$self->logfile . "\n";
 
# pg_ctl could have timed out, so check to see if there's a pid 
file;
# otherwise our END block will fail to shut down the new 
postmaster.
@@ -1090,8 +1089,7 @@ sub restart
 
if ($ret != 0)
{
-   print "# pg_ctl restart failed; logfile:\n";
-   print PostgreSQL::Test::Utils::slurp_file($self->logfile);
+   print "# pg_ctl restart failed; see logfile for details: " . 
$self->logfile . "\n";
 
# pg_ctl could have timed out, so check to see if there's a pid 
file;
# otherwise our END block will fail to shut down the new 
postmaster.

--
Daniel Gustafsson





Re: Cluster::restart dumping logs when stop fails

2024-04-06 Thread Daniel Gustafsson
> On 6 Apr 2024, at 23:44, Andres Freund  wrote:

> It might be useful to print a few lines, but the whole log files can be
> several megabytes worth of output.

The non-context aware fix would be to just print the last 1024 (or something)
bytes from the logfile:

diff --git a/src/test/perl/PostgreSQL/Test/Cluster.pm 
b/src/test/perl/PostgreSQL/Test/Cluster.pm
index 54e1008ae5..53d4751ffc 100644
--- a/src/test/perl/PostgreSQL/Test/Cluster.pm
+++ b/src/test/perl/PostgreSQL/Test/Cluster.pm
@@ -951,8 +951,8 @@ sub start
 
if ($ret != 0)
{
-   print "# pg_ctl start failed; logfile:\n";
-   print PostgreSQL::Test::Utils::slurp_file($self->logfile);
+   print "# pg_ctl start failed; logfile excerpt:\n";
+   print substr 
PostgreSQL::Test::Utils::slurp_file($self->logfile), -1024;
 
# pg_ctl could have timed out, so check to see if there's a pid 
file;
# otherwise our END block will fail to shut down the new 
postmaster.


Would that be a reasonable fix?

--
Daniel Gustafsson





Re: Cutting support for OpenSSL 1.0.1 and 1.0.2 in 17~?

2024-04-06 Thread Daniel Gustafsson
> On 6 Apr 2024, at 16:04, Tom Lane  wrote:
> Daniel Gustafsson  writes:
>>> On 6 Apr 2024, at 08:02, Peter Eisentraut  wrote:

>>> Why do we need to check for the versions at all?  We should just check for 
>>> the functions we need.  At least that's always been the normal approach in 
>>> configure.
> 
>> We could, but finding a stable set of functions which identifies the version 
>> of
>> OpenSSL *and* LibreSSL that we want, and their successors, while not matching
>> any older versions seemed more opaque than testing two numeric values.
> 
> I don't think you responded to Peter's point at all.  The way autoconf
> is designed to work is explicitly NOT to try to identify the exact
> version of $whatever.  Rather, the idea is to probe for the API
> features that you want to rely on: functions, macros, struct fields,
> or the like.  If you can't point to an important API difference
> between 1.0.2 and 1.1.1, why drop support for 1.0.2?

My apologies, I thought I did but clearly failed.  My point was that this is a
special/corner case where we try to find one of two different libraries (with
different ideas about backwards compatability etc) for supporting a single
thing.  So instead I tested for the explicit versions like how we already test
for the exact Perl version in config/perl.m4 (albeit that a library and a
program are two different things of course).

In bumping we want to move to 1.1.1 since that's the first version with the
rewritten RNG which is fork-safe by design, something PostgreSQL clearly
benefits from.  There is no new API for this to gate on though.  For LibreSSL
we want 3.3.2 to a) ensure we have coverage in the BF and b) since it's the
first version where the tests pass due to error message alignment with OpenSSL.
The combination of these gets rid of lots of specialcased #ifdef soup.  I
wasn't however able to find a specific API call which is unique to the two
version which we rely on.

Testing for the presence of an API provided and introduced by both libraries in
the version we're interested in, but which we don't use, is the alternative but
I thought that would be more frowned upon.  EVP_PKEY_new_CMAC_key() was
introduced in 1.1.1 and LibreSSL 3.3.2, so an AC_CHECK_FUNCS for that, as in
the attached, achieves the version check but pollutes pg_config.h with a define
which will never be used which seemed a bit ugly.

--
Daniel Gustafsson



v6-0001-Remove-support-for-OpenSSL-1.0.2-and-1.1.0.patch
Description: Binary data


Re: DROP OWNED BY fails to clean out pg_init_privs grants

2024-04-06 Thread Daniel Gustafsson
> On 6 Apr 2024, at 01:10, Tom Lane  wrote:

> (So now I'm wondering why *only* copperhead has shown this so far.
> Are our other cross-version-upgrade testing animals AWOL?)

Clicking around searching for Xversion animals I didn't spot any, but without
access to the database it's nontrivial to know which animal does what.

> I doubt this is something we'll have fixed by Monday, so I will
> go add an open item for it.

+1. Having opened the can of worms I'll have a look at it next week.

--
Daniel Gustafsson





Re: Cutting support for OpenSSL 1.0.1 and 1.0.2 in 17~?

2024-04-06 Thread Daniel Gustafsson
> On 6 Apr 2024, at 08:02, Peter Eisentraut  wrote:
> 
> On 05.04.24 23:48, Daniel Gustafsson wrote:
>> The reason to expand the check is to ensure that we have the version we want
>> for both OpenSSL and LibreSSL, and deprecating OpenSSL versions isn't all 
>> that
>> commonly done so having to change the version in the check didn't seem that
>> invasive to me.
> 
> Why do we need to check for the versions at all?  We should just check for 
> the functions we need.  At least that's always been the normal approach in 
> configure.

We could, but finding a stable set of functions which identifies the version of
OpenSSL *and* LibreSSL that we want, and their successors, while not matching
any older versions seemed more opaque than testing two numeric values.  The
suggested check is modelled on the LDAP check which tests for an explicit
version in a header file (albeit not for erroring out).

--
Daniel Gustafsson





Re: Cutting support for OpenSSL 1.0.1 and 1.0.2 in 17~?

2024-04-05 Thread Daniel Gustafsson
> On 5 Apr 2024, at 22:55, Jacob Champion  
> wrote:
> 
> On Fri, Apr 5, 2024 at 9:59 AM Daniel Gustafsson  wrote:
>> Attached is a WIP patch to get more eyes on it, the Meson test for 1.1.1 
>> fails
>> on Windows in CI which I will investigate next.

The attached version fixes the Windows 1.1.1 check which was missing the
include directory.

> The changes for SSL_OP_NO_CLIENT_RENEGOTIATION and
> SSL_R_VERSION_TOO_LOW look good to me.

Thanks!

>> +++ b/src/include/pg_config.h.in
>> @@ -84,9 +84,6 @@
>> /* Define to 1 if you have the  header file. */
>> #undef HAVE_CRTDEFS_H
>> 
>> -/* Define to 1 if you have the `CRYPTO_lock' function. */
>> -#undef HAVE_CRYPTO_LOCK
> 
> An autoreconf run on my machine pulls in more changes (getting rid of
> the symbols we no longer check for).

Ah yes, missed updating before formatting the patch. Done in the attached.

--
Daniel Gustafsson



v5-0001-Remove-support-for-OpenSSL-1.0.2.patch
Description: Binary data


Re: Cutting support for OpenSSL 1.0.1 and 1.0.2 in 17~?

2024-04-05 Thread Daniel Gustafsson
> On 5 Apr 2024, at 23:26, Peter Eisentraut  wrote:
> 
> On 05.04.24 18:59, Daniel Gustafsson wrote:
>> Attached is a WIP patch to get more eyes on it, the Meson test for 1.1.1 
>> fails
>> on Windows in CI which I will investigate next.
> 
> I'm not a fan of the new PGAC_CHECK_OPENSSL.  It creates a second place where 
> the OpenSSL version number has to be updated.  We had this carefully 
> constructed so that there is only one place that OPENSSL_API_COMPAT is 
> defined and that is the only place that needs to be updated.  We put the 
> setting of OPENSSL_API_COMPAT into configure so that the subsequent OpenSSL 
> tests would use it, and if the API number higher than what the library 
> supports, the tests should fail.

But does that actually work?  If I change the API_COMPAT to the 1.1.1 version
number and run configure against 1.0.2 it passes just fine.  Am I missing some
clever trick here?

The reason to expand the check is to ensure that we have the version we want
for both OpenSSL and LibreSSL, and deprecating OpenSSL versions isn't all that
commonly done so having to change the version in the check didn't seem that
invasive to me.

--
Daniel Gustafsson





Re: Cutting support for OpenSSL 1.0.1 and 1.0.2 in 17~?

2024-04-05 Thread Daniel Gustafsson
> On 5 Apr 2024, at 18:41, Jacob Champion  
> wrote:
> On Thu, Apr 4, 2024 at 6:37 PM Michael Paquier  wrote:

>> I would be OK to draw a line to what we test in the buildfarm if it
>> comes to that, down to OpenBSD 6.9.
> 
> That would correspond to LibreSSL 3.3 if I'm not mistaken. Any
> particular reason for 6.9 as the dividing line, and not something
> later? And by "test in the buildfarm", do you mean across all
> versions, or just what we support for PG17? (For the record, I don't
> think there's any reason to drop older LibreSSL testing for earlier
> branches.)

We should draw the line on something we can reliably test, so 6.9 seems fine to
me (unless there is evidence of older versions being common in the wild).
OpenBSD themselves support 2 backbranches so 6.9 is still far beyond the EOL
mark upstream.

--
Daniel Gustafsson





Re: Cutting support for OpenSSL 1.0.1 and 1.0.2 in 17~?

2024-04-05 Thread Daniel Gustafsson
> On 5 Apr 2024, at 03:37, Michael Paquier  wrote:
> On Thu, Apr 04, 2024 at 11:03:35AM -0700, Jacob Champion wrote:

>> v3 does that by putting back checks for symbols that aren't part of
>> LibreSSL (tested back to 2.7, which is where the 1.1.x APIs started to
>> arrive).
> 
> From where did you pull the LibreSSL sources?  Directly from the
> OpenBSD tree?
> 
>> It also makes adjustments for the new OPENSSL_API_COMPAT
>> version, getting rid of OpenSSL_add_all_algorithms() and adding a
>> missing header.
> 
> Ah, right.  OpenSSL_add_all_algorithms() is documented as having no
> effect in 1.1.0.

This API was deprecated and made into a no-op in OpenBSD 6.4 which corresponds
to LibreSSL 2.8.3.

>> This patch has a deficiency where 1.1.0 itself isn't actually rejected
>> at configure time; Daniel's working on an explicit check for the
>> OPENSSL/LIBRESSL_VERSION_NUMBER that should fix that up. There's an
>> open question about which version we should pin for LibreSSL, which
>> should ultimately come down to which versions of OpenBSD we want PG17
>> to support.
> 
> I would be OK to draw a line to what we test in the buildfarm if it
> comes to that, down to OpenBSD 6.9.  This version is already not
> supported, and we had a number of issues with older versions and
> timestamps going backwards.

There is a member on 6.8 as well, and while 6.8 work fine the tests all fail
due to the error messages being different.  Rather than adding alternate output
for an EOL version of OpenBSD (which currently don't even run the ssl checks in
the BF) I think using 6.9 as the minimum makes sense.

> +  # Functions introduced in OpenSSL 1.1.0/LibreSSL 2.7.0.
> +  ['OPENSSL_init_ssl', {'required': true}],
> +  ['BIO_meth_new', {'required': true}],
> +  ['ASN1_STRING_get0_data', {'required': true}],
> +  ['HMAC_CTX_new', {'required': true}],
> +  ['HMAC_CTX_free', {'required': true}],
> 
> These should be removed to save cycles in ./configure and meson, no?

Correct, they are removed in favor of a compile test for OpenSSL version.

> -  cdata.set('OPENSSL_API_COMPAT', '0x10002000L',
> +  cdata.set('OPENSSL_API_COMPAT', '0x1010L',
> 
> Seems to me that this should also document in meson.build why 1.1.0 is
> chosen, same as ./configure.

Done.

> It seems to me that src/common/protocol_openssl.c could be cleaned up;
> I see SSL_CTX_set_min_proto_version and SSL_CTX_set_max_proto_version
> listed in LibreSSL (looking at some past version of
> https://github.com/libressl/libressl.git that I still have around).

Both SSL_CTX_set_min_proto_version and SSL_CTX_set_max_proto_version are
available in at least OpenBSD 6.3 which is LibreSSL 2.7.5.  With this we can
thus remove the whole file.

> There is an extra thing in pg_strong_random.c once we cut OpenSSL <
> 1.1.1..  Do we still need pg_strong_random_init() and its RAND_poll()
> when it comes to LibreSSL?  This is a sensitive area, so we should be
> careful.

Re-reading the thread which added this comment, and the OpenSSL docs and code,
I'm leaning towards leaving this in.  The overhead is marginal and fork safety
has been broken at least once in OpenSSL since 1.1.1:

https://github.com/openssl/openssl/issues/12377

That particular bug was thankfully caught before it shipped, but mitigating the
risk is this cheap enough that is seems reasonable to keep this in.

Attached is a WIP patch to get more eyes on it, the Meson test for 1.1.1 fails
on Windows in CI which I will investigate next.

--
Daniel Gustafsson



v4-0001-Remove-support-for-OpenSSL-1.0.2.patch
Description: Binary data



Re: IPC::Run::time[r|out] vs our TAP tests

2024-04-04 Thread Daniel Gustafsson
> On 4 Apr 2024, at 23:24, Tom Lane  wrote:

> A minimum fix that seems to make this work better is as attached,
> but I feel like somebody ought to examine all the IPC::Run::timer
> and IPC::Run::timeout calls and see which ones are mistaken.
> It's a little scary to convert a timeout to a timer because of
> the hazard that someplace that would need to be checking for
> is_expired isn't.

Skimming this and a few callsites it seems reasonable to use a timer instead of
a timeout, but careful study is needed to make sure we're not introducing
anything subtly wrong in the other direction.

> Also, the debug printout code at the bottom of check_completion
> is quite useless, because control can never reach it since
> BackgroundPsql::query_until will "die" on failure.  I think that
> that code worked when written, and I'm suspicious that 664d75753
> broke it, but I've not dug deeply into the history.

AFAICT, in the previous coding the interactive_psql object would use a timer or
timeout based on what the caller provided, and check_completion used a timer so
the debug logging probably worked as written.

--
Daniel Gustafsson





Re: Security lessons from liblzma

2024-04-04 Thread Daniel Gustafsson
> On 4 Apr 2024, at 23:02, Jelte Fennema-Nio  wrote:

> How about we make it meson/make targets, so they are simply cached
> just like any of our other build artefacts are cached. Then only clean
> builds are impacted, not every test run.

They already are (well, make not meson yet), they're just not hooked up to any
top-level commands.  Running "make ssfiles-clean ssfiles" in src/test/ssl
regenerates all the files from the base config files that define their
contents.
 
--
Daniel Gustafsson





Re: Security lessons from liblzma

2024-04-04 Thread Daniel Gustafsson
> On 4 Apr 2024, at 22:47, Tom Lane  wrote:
> 
> Robert Haas  writes:
>> On Thu, Apr 4, 2024 at 4:25 PM Daniel Gustafsson  wrote:
>>> I don't disagree, like I said that very email: it's non-trivial and I wish 
>>> we
>>> could make it better somehow, but I don't hav an abundance of good ideas.
> 
>> Is the basic issue that we can't rely on the necessary toolchain to be
>> present on every machine where someone might try to build PostgreSQL?
> 
> IIUC, it's not really that, but that regenerating these files is
> expensive; multiple seconds even on fast machines.  Putting that
> into tests that are run many times a day is unappetizing.

That's one aspect of it.  We could cache the results of course to amortize the
cost over multiple test-runs but at the end of the day it will add time to
test-runs regardless of what we do.

One thing to consider would be to try and rearrange/refactor the tests to
require a smaller set of keys and certificates.  I haven't looked into what
sort of savings that could yield (if any) but if we go the route of
regeneration at test-time we shouldn't leave potential savings on the table.

--
Daniel Gustafsson





Re: Security lessons from liblzma

2024-04-04 Thread Daniel Gustafsson
> On 4 Apr 2024, at 22:40, Robert Haas  wrote:
> 
> On Thu, Apr 4, 2024 at 4:25 PM Daniel Gustafsson  wrote:
>> I don't disagree, like I said that very email: it's non-trivial and I wish we
>> could make it better somehow, but I don't hav an abundance of good ideas.
> 
> Is the basic issue that we can't rely on the necessary toolchain to be
> present on every machine where someone might try to build PostgreSQL?

AFAIK we haven't historically enforced that installations have the openssl
binary in PATH, but it would be a pretty low bar to add.  The bigger issue is
likely to find someone to port this to Windows, it probably won't be too hard
but as with all things building on Windows, we need someone skilled in that
area to do it.

>> Removing the generated versions and creating them when running tests makes
>> sneaking in malicious content harder since it then has to be submitted in
>> clear-text *only*.  The emphasis added since it's like that today as well: 
>> *I*
>> fully trust our team of committers to not accept a binary file in a patch
>> without replacing with a regenerated version, but enforcing it might make it
>> easier for a wider community to share that level of trust?
> 
> To be honest, I'm not at all sure that I would have considered
> regenerating a binary file to be a must-do kind of a thing, so I guess
> that's a lesson learned for me. Trust is a really tricky thing in
> cases like this. It's not just about whether some committer is
> secretly a malicious actor; it's also about whether everyone
> understands the best practices and follows them consistently. In that
> regard, I don't even trust myself. I hope that it's unlikely that I
> would mistakenly commit something malicious, but I think it could
> happen, and I think it could happen to anyone else, too.

It absolutelty could.  Re-reading Ken Thompsons Turing Lecture "Reflections on
Trusting Trust" at periodic intervals is a good reminder to self just how
complicated this is.

--
Daniel Gustafsson





Re: Security lessons from liblzma

2024-04-04 Thread Daniel Gustafsson
> On 4 Apr 2024, at 21:38, Robert Haas  wrote:

> Essentially, your argument is the same as his, namely: hey, don't
> worry, you could totally verify these test files if you wanted to! But
> of course, nobody did, because it was hard, and everybody had better
> things to do with their time. And I think the same thing is probably
> true here: nobody really is going to verify much about these files.

I don't disagree, like I said that very email: it's non-trivial and I wish we
could make it better somehow, but I don't hav an abundance of good ideas.

Removing the generated versions and creating them when running tests makes
sneaking in malicious content harder since it then has to be submitted in
clear-text *only*.  The emphasis added since it's like that today as well: *I*
fully trust our team of committers to not accept a binary file in a patch
without replacing with a regenerated version, but enforcing it might make it
easier for a wider community to share that level of trust?

--
Daniel Gustafsson





Re: postgres_fdw fails because GMT != UTC

2024-04-04 Thread Daniel Gustafsson
> On 4 Apr 2024, at 08:19, Tom Lane  wrote:
> 
> Over at [1] we have a complaint of postgres_fdw failing with
> a remote-server error
> 
>> ERROR:  invalid value for parameter "TimeZone": "UTC"
> 
> I am not quite clear on how broken an installation needs to be to
> reject "UTC" as a time zone setting, except that the breakage cannot
> be subtle.  However, I notice that our code in pgtz.c and other
> places treats "GMT" as a hard-wired special case ... but not "UTC".
> I wonder if we ought to modify those places to force "UTC" down the
> same hard-wired paths.  If we acted like that, this would have worked
> no matter how misconfigured the installation was.

+1. It makes little sense to support GMT like that but not UTC.

> An alternative answer could be to change postgres_fdw to send "GMT"
> not "UTC".  That's ugly from a standards-compliance viewpoint, but
> it would fix this problem even with a non-updated remote server,
> and I think postgres_fdw is generally intended to work with even
> very old remote servers.

There is always a risk in accomodating broken installations that it might hide
other subtle bugs, but off the cuff that risk seems quite low in this case.

--
Daniel Gustafsson





Re: Cutting support for OpenSSL 1.0.1 and 1.0.2 in 17~?

2024-04-04 Thread Daniel Gustafsson
> On 4 Apr 2024, at 01:50, Thomas Munro  wrote:

> That's a shame.  But it sounds like the developer burden isn't so
> different from 1.1.1 to 3.x, so maybe it's not such a big deal from
> our point of view.

It isn't as of now since OpenSSL still supply the deprecated API's we use, but
there is no guarantee that they will do that for 5 more years.

--
Daniel Gustafsson





Re: Cutting support for OpenSSL 1.0.1 and 1.0.2 in 17~?

2024-04-04 Thread Daniel Gustafsson
> On 4 Apr 2024, at 01:24, Michael Paquier  wrote:
> 
> On Wed, Apr 03, 2024 at 01:38:50PM -0400, Tom Lane wrote:
>> The discussion we had last year concluded that we were OK with
>> dropping 1.0.1 support when RHEL6 goes out of extended support
>> (June 2024 per this thread, I didn't check it).  Seems like we
>> should have the same policy for RHEL7.  Also, calling Photon 3
>> dead because it went EOL three days ago seems over-hasty.
> 
> Yeah.  A bunch of users of Photon are VMware (or you could say
> Broadcom) product appliances, and I'd suspect that quite a lot of them
> rely on Photon 3 for their base OS image.  Upgrading that stuff is not
> easy work in my experience because they need to cope with a bunch of
> embedded services.

That's true, but Photon3 won't package new major versions of PostgreSQL (the
latest RPM is 13.14).  Anyone who builds v17 on Photon 3 on their own can just
as well be expected to build an updated OpenSSL no?  This is equivalent to
RHEL7 which was discussed elsewhere in this thread.

If we are going to pin version dependencies for postgres X to available OS
release packages then it, IMHO, is reasonable to be for OS's that realistically
will package X (either by the vendor or a trusted external packager like PGDG).

It's possible, but not guaranteed, that RHEL8 ships v17 packages in ther
Application Streams Life Cycle model, they have packaged v15 so far with
retirement in 2028 so it seems likely there will be another package to retire
in 2029 when RHEL8 finally goes away (whether that will be v16 or v17 is also
speculation).  Thus, pinning on 1.1.1 is grounded in packaging reality, even
though I sincerely hope that noone who isn't paying for support is running
1.1.1 now, let alone in 4 years from now.

--
Daniel Gustafsson





Re: Cutting support for OpenSSL 1.0.1 and 1.0.2 in 17~?

2024-04-04 Thread Daniel Gustafsson
> On 4 Apr 2024, at 00:51, Peter Eisentraut  wrote:
> 
> On 30.03.24 22:27, Thomas Munro wrote:
>> On Sun, Mar 31, 2024 at 9:59 AM Tom Lane  wrote:
>>> Thomas Munro  writes:
>>>> I was reminded of this thread by ambient security paranoia.  As it
>>>> stands, we require 1.0.2 (but we very much hope that package
>>>> maintainers and others in control of builds don't decide to use it).
>>>> Should we skip 1.1.1 and move to requiring 3 for v17?
>>> 
>>> I'd be kind of sad if I couldn't test SSL stuff anymore on my
>>> primary workstation, which has
>>> 
>>> $ rpm -q openssl
>>> openssl-1.1.1k-12.el8_9.x86_64
>>> 
>>> I think it's probably true that <=1.0.2 is not in any distro that
>>> we still need to pay attention to, but I reject the contention
>>> that RHEL8 is not in that set.
>> Hmm, OK so it doesn't have 3 available in parallel from base repos.
>> But it's also about to reach end of "full support" in 2 months[1], so
>> if we applied the policies we discussed in the LLVM-vacuuming thread
>> (to wit: build farm - EOL'd OSes), then...  One question I'm unclear
>> on is whether v17 will be packaged for RHEL8.
> 
> The rest of the thread talks about the end of support of RHEL 7, but you are 
> here talking about RHEL 8.   It is true that "full support" for RHEL 8 ended 
> in May 2024, but that is the not the one we are tracking.  We are tracking 
> the 10-year one, which I suppose is now called "maintenance support".
> 
> So if the above package list is correct, then we ought to keep supporting 
> openssl 1.1.* until 2029.

Not 1.1.* but 1.1.1+.  In the old OpenSSL version numbering scheme, releases
changing the last digit would contain new features and releases that updated
the appended letter only fixed bugs.
--
Daniel Gustafsson





Re: Cutting support for OpenSSL 1.0.1 and 1.0.2 in 17~?

2024-04-03 Thread Daniel Gustafsson
> On 4 Apr 2024, at 00:06, Tom Lane  wrote:
> 
> Daniel Gustafsson  writes:
>> On 3 Apr 2024, at 19:38, Tom Lane  wrote:
>>> Bottom line for me is that pulling 1.0.1 support now is OK,
>>> but I think pulling 1.0.2 is premature.
> 
>> Is Red Hat building and and shipping v17 packages for RHEL7 ELS customers?  
>> If
>> not then it seems mostly academical to tie our dependencies to RHEL ELS 
>> unless
>> I'm missing something.
> 
> True, they won't be doing that, and neither will Devrim.  So maybe
> we can leave RHEL7 out of the discussion, in which case there's
> not a lot of reason to keep 1.0.2 support.  We'll need to notify
> buildfarm owners to adjust their configurations.

The patch will also need to be adjusted to work with LibreSSL, but I know Jacob
was looking into that so ideally we should have something to review before
the weekend.

--
Daniel Gustafsson





Re: Cutting support for OpenSSL 1.0.1 and 1.0.2 in 17~?

2024-04-03 Thread Daniel Gustafsson
> On 3 Apr 2024, at 21:48, Andrew Dunstan  wrote:
> On 2024-04-03 We 15:12, Daniel Gustafsson wrote:
>>   The
>> fact that very few animals run the ssl tests is a pet peeve of mine, it would
>> be nice if we could get broader coverage there.
> 
> Well, the only reason for that is that the SSL tests need to be listed in 
> PG_TEST_EXTRA, and the only reason for that is that   there's a possible 
> hazard on multi-user servers. But I bet precious few buildfarm animals run in 
> such an environment. Mine don't - I'm the only user. 
> 
> Maybe we could send out an email to the buildfarm-owners list asking people 
> to consider allowing the ssl tests.

I think that sounds like a good idea.

--
Daniel Gustafsson





Re: Security lessons from liblzma

2024-04-03 Thread Daniel Gustafsson
> On 3 Apr 2024, at 20:09, Peter Eisentraut  wrote:
> 
> On 30.03.24 00:14, Daniel Gustafsson wrote:
>> One take-away for me is how important it is to ship recipes for regenerating
>> any testdata which is included in generated/compiled/binary format.  Kind of
>> how we in our tree ship the config for test TLS certificates and keys which 
>> can
>> be manually inspected, and used to rebuild the testdata (although the risk 
>> for
>> injections in this particular case seems low).  Bad things can still be
>> injected, but formats which allow manual review at least goes some way 
>> towards
>> lowering risk.
> 
> I actually find the situation with the test TLS files quite unsatisfactory.  
> While we have build rules, the output files are not reproducible, both 
> because of some inherent randomness in the generation, and because different 
> OpenSSL versions produce different details.

This testdata is by nature not reproducible, and twisting arms to make it so
will only result in testing against synthetic data which risk hiding bugs IMO.

> So you just have to "trust" that what's there now makes sense.

Not entirely, you can review the input files which are used to generate the
test data and verify that those make sense..

> Of course, you can use openssl tools to unpack these files, but that is 
> difficult and unreliable unless you know exactly what you are looking for.

..and check like you mention above, but it's as you say not entirely trivial.  
It
would be nice to improve this but I'm not sure how.  Document how to inspect
the data in src/test/ssl/README perhaps?

> It would be better if we created the required test files as part of the test 
> run.  (Why not?  Too slow?)

The make sslfiles step requires OpenSSL 1.1.1, which is higher than what we
require to be installed to build postgres.  The higher-than-base requirement is
due to it building test data only used when running 1.1.1 or higher, so
technically it could be made to work if anyone is interesting in investing time
in 1.0.2.

Time is one aspect, on my crusty old laptop it takes ~2 seconds to regenerate
the files.  That in itself isn't that much, but we've rejected test-time
additions far less than that.  We could however make CI and the Buildfarm run
the regeneration and leave it up to each developer to decide locally?  Or
remove them and regenerate on the first SSL test run and then use the cached
ones after that?

On top of time I have a feeling the regeneration won't run on Windows.  When
it's converted to use Meson then that might be fixed though.

--
Daniel Gustafsson





Re: Cutting support for OpenSSL 1.0.1 and 1.0.2 in 17~?

2024-04-03 Thread Daniel Gustafsson
> On 3 Apr 2024, at 17:29, Tom Lane  wrote:
> 
> Jacob Champion  writes:
>> As far as I can tell, no versions of LibreSSL so far provide
>> X509_get_signature_info(), so this patch is probably a bit too
>> aggressive.
> 
> Another problem with cutting support is how many buildfarm members
> will we lose.  I scraped recent configure logs and got the attached
> results.  I count 3 machines running 1.0.1,

Support for 1.0.1 was removed with 8e278b657664 in July 2023 so those are not
building with OpenSSL enabled already.

> 18 running some flavor of 1.0.2,

massasauga and snakefly run the ssl_passphrase_callback-check test but none of
these run the ssl-check tests AFAICT, so we have very low coverage as is.  The
fact that very few animals run the ssl tests is a pet peeve of mine, it would
be nice if we could get broader coverage there.

Worth noting is that the OpenSSL check in configure.ac only reports what the
version of the OpenSSL binary in $PATH is, not which version of the library
that we build against (using --with-libs/--with-includes etc).

--
Daniel Gustafsson





Re: Cutting support for OpenSSL 1.0.1 and 1.0.2 in 17~?

2024-04-03 Thread Daniel Gustafsson
> On 3 Apr 2024, at 19:38, Tom Lane  wrote:
> 
> Jacob Champion  writes:
>> The RHEL7-alikes are the biggest set, but that's already discussed
>> above. Looks like SUSE 12 goes EOL later this year (October 2024), and
>> it ships OpenSSL 1.1.1 as an option. Already-dead distros are Ubuntu
>> 16.04 (April 2021), Photon 2 (January 2023), and Photon 3 (March
>> 2024). That leaves AL2, OpenIndiana Hipster, and Solaris 11.4, all of
>> which appear to have newer versions of OpenSSL shipped and selectable.
> 
> The discussion we had last year concluded that we were OK with
> dropping 1.0.1 support when RHEL6 goes out of extended support
> (June 2024 per this thread, I didn't check it).  Seems like we
> should have the same policy for RHEL7.  Also, calling Photon 3
> dead because it went EOL three days ago seems over-hasty.
> 
> Bottom line for me is that pulling 1.0.1 support now is OK,
> but I think pulling 1.0.2 is premature.

Is Red Hat building and and shipping v17 packages for RHEL7 ELS customers?  If
not then it seems mostly academical to tie our dependencies to RHEL ELS unless
I'm missing something.

--
Daniel Gustafsson





Re: Add missing error codes to PANIC/FATAL error reports in xlog.c and relcache.c

2024-04-03 Thread Daniel Gustafsson
> On 6 Mar 2024, at 09:59, Daniel Gustafsson  wrote:

> Nothing sticks out from reading through these patches, they seem quite ready 
> to
> me.

Took another look at this today and committed it. Thanks!

--
Daniel Gustafsson





Re: RFC: Additional Directory for Extensions

2024-04-03 Thread Daniel Gustafsson
> On 3 Apr 2024, at 09:13, Alvaro Herrera  wrote:
> 
> On 2024-Apr-02, David E. Wheeler wrote:
> 
>> That quotation comes from this Debian patch[2] maintained by Christoph
>> Berg. I’d like to formally propose integrating this patch into the
>> core. And not only because it’s overhead for package maintainers like
>> Christoph, but because a number of use cases have emerged since we
>> originally discussed something like this back in 2013[3]:
> 
> I support the idea of there being a second location from where to load
> shared libraries

Agreed, the case made upthread that installing an extension breaks the app
signing seems like a compelling reason to do this.

The implementation of this need to make sure the directory is properly set up
however to avoid similar problems that CVE 2019-10211 showed.

--
Daniel Gustafsson





Re: [HACKERS] make async slave to wait for lsn to be replayed

2024-04-03 Thread Daniel Gustafsson
Buildfarm animal mamba (NetBSD 10.0 on macppc) started failing on this with
what seems like a bogus compiler warning:

waitlsn.c: In function 'WaitForLSN':
waitlsn.c:275:24: error: 'endtime' may be used uninitialized in this function 
[-Werror=maybe-uninitialized]
  275 |delay_ms = (endtime - GetCurrentTimestamp()) / 1000;
  |   ~^~~~
cc1: all warnings being treated as errors

endtime is indeed initialized further up, but initializing endtime at
declaration seems innocent enough and should make this compiler happy and the
buildfarm greener.

diff --git a/src/backend/commands/waitlsn.c b/src/backend/commands/waitlsn.c
index 6679378156..17ad0057ad 100644
--- a/src/backend/commands/waitlsn.c
+++ b/src/backend/commands/waitlsn.c
@@ -226,7 +226,7 @@ void
 WaitForLSN(XLogRecPtr targetLSN, int64 timeout)
 {
XLogRecPtr  currentLSN;
-   TimestampTz endtime;
+   TimestampTz endtime = 0;

/* Shouldn't be called when shmem isn't initialized */
Assert(waitLSN);

--
Daniel Gustafsson





Re: Cutting support for OpenSSL 1.0.1 and 1.0.2 in 17~?

2024-04-02 Thread Daniel Gustafsson
> On 2 Apr 2024, at 20:55, Daniel Gustafsson  wrote:

> The attached removes 1.0.2 support (meson build parts untested yet)

And a rebased version which applies over the hmac_openssl.c changes earlier
today that I hadn't pulled in.

--
Daniel Gustafsson



v2-0001-Remove-support-for-OpenSSL-1.0.2-and-1.1.0.patch
Description: Binary data


Re: Reports on obsolete Postgres versions

2024-04-02 Thread Daniel Gustafsson
> On 2 Apr 2024, at 22:46, Bruce Momjian  wrote:
> On Tue, Apr  2, 2024 at 11:34:46AM +0200, Magnus Hagander wrote:

>> I do like the term "current"  better. It conveys (at least a bit) that we
>> really consider all the older ones to be, well, obsolete. The difference
>> "current vs obsolete" is stronger than "latest vs not quite latest".
> 
> Okay, I changed "superseded" to "old", and changed "latest" to
> "current", patch attached.

+1, LGTM

--
Daniel Gustafsson





Re: Cutting support for OpenSSL 1.0.1 and 1.0.2 in 17~?

2024-04-02 Thread Daniel Gustafsson
> On 30 Mar 2024, at 22:27, Thomas Munro  wrote:
> On Sun, Mar 31, 2024 at 9:59 AM Tom Lane  wrote:

Thanks a lot for bringing this up again Thomas, 1.0.2 has bitten me so many
times and I'd be thrilled to get rid of it.

>> I think it's probably true that <=1.0.2 is not in any distro that
>> we still need to pay attention to, but I reject the contention
>> that RHEL8 is not in that set.
> 
> Hmm, OK so it doesn't have 3 available in parallel from base repos.
> But it's also about to reach end of "full support" in 2 months[1], so
> if we applied the policies we discussed in the LLVM-vacuuming thread
> (to wit: build farm - EOL'd OSes), then...  One question I'm unclear
> on is whether v17 will be packaged for RHEL8.

While 1.1.1 is EOL in upstream, it won't buy us much to deprecate past it since
we don't really make use of 3.x specific functionality.  I wouldn't mind not
being on the hook to support an EOL version of OpenSSL for another 5 years, but
it also won't shift the needle too much.  For v18 I'd like to work on modernize
our OpenSSL code to make more use of 3+ features/API's and that could be a good
point to cull 1.1.1 support.

Settling for removing support for 1.0.2, which is antiques roadshow material at
this point (no TLSv1.3 support for example), removes most of the compatibility
mess we have in libpq.  1.1.1 was not a deprecation point in OpenSSL but we can
define 1.1.0 as our compatibility level to build warning-free.

The attached removes 1.0.2 support (meson build parts untested yet) with a few
small touch ups of related documentation.  I haven't yet done the research on
where that leaves LibreSSL since we don't really define anywhere what we
support (so for we've gotten by assuming it's kind of sort 1.0.2 for the parts
we care about which is skating on fairly thin ice).

--
Daniel Gustafsson



v1-0001-Remove-support-for-OpenSSL-1.0.2-and-1.1.0.patch
Description: Binary data


Re: Fix resource leak (src/backend/libpq/be-secure-common.c)

2024-04-02 Thread Daniel Gustafsson
> On 2 Apr 2024, at 20:13, Ranier Vilela  wrote:

> Fix by freeing the pointer, like pclose_check (src/common/exec.c) similar 
> case.

Off the cuff, seems reasonable when loglevel is LOG.

--
Daniel Gustafsson





Re: Confusing #if nesting in hmac_openssl.c

2024-04-02 Thread Daniel Gustafsson
> On 2 Apr 2024, at 15:50, Tom Lane  wrote:

> I'll go ahead with what I have.

+1

--
Daniel Gustafsson





Re: Confusing #if nesting in hmac_openssl.c

2024-04-02 Thread Daniel Gustafsson
> On 2 Apr 2024, at 02:01, Tom Lane  wrote:

> hmac_openssl.c:90:1: warning: unused function 'ResourceOwnerRememberHMAC' 
> [-Wunused-function]
> hmac_openssl.c:95:1: warning: unused function 'ResourceOwnerForgetHMAC' 
> [-Wunused-function]
> 
> Looking at the code, this is all from commit e6bdfd970, and apparently
> batfish is our only animal that doesn't HAVE_HMAC_CTX_NEW.

Thanks for looking at this, it's been on my TODO for some time.  It's a warning
which only shows up when building against 1.0.2, the functions are present in
1.1.0 and onwards (while deprecated in 3.0).

> I don't think
> it is helpful to put the resource owner manipulations inside #ifdef
> HAVE_HMAC_CTX_NEW and HAVE_HMAC_CTX_FREE --- probably, it would never
> be the case that only one of those is defined,

Correct, no version of OpenSSL has only one of them defined.

> What do you think of rearranging it as attached?

+1 on this patch, it makes the #ifdef soup more readable.  We could go even
further and remove the HAVE_HMAC defines completely with USE_RESOWNER_FOR_HMAC
being set by autoconf/meson?  I've attached an untested sketch diff to
illustrate.

A related tangent.  If we assembled the data to calculate on ourselves rather
than rely on OpenSSL to do it with subsequent _update calls we could instead
use the simpler HMAC() API from OpenSSL.  That would remove the need for the
HMAC_CTX and resource owner tracking entirely and just have our pg_hmac_ctx.
Thats clearly not for this patch though, just thinking out loud that we set up
OpenSSL infrastructure that we don't really use.

--
Daniel Gustafsson



hmac_context.diff
Description: Binary data


Re: Reports on obsolete Postgres versions

2024-04-02 Thread Daniel Gustafsson
> On 2 Apr 2024, at 00:56, Bruce Momjian  wrote:

> I ended up writing the attached doc patch.  I found that some or our
> text was overly-wordy, causing the impact of what we were trying to say
> to be lessened.  We might want to go farther than this patch, but I
> think it is an improvement.

Agreed, this is an good incremental improvement over what we have.

> I also moved the  text to the bottom of the section

+1

A few small comments:

+considers performing minor upgrades to be less risky than continuing to
+run superseded minor versions.

I think "superseded minor versions" could be unnecessarily complicated for
non-native speakers, I consider myself fairly used to reading english but still
had to spend a few extra (brain)cycles parsing the meaning of it in this
context.

+ We recommend that users always run the latest minor release associated

Or perhaps "current minor release" which is the term we use in the table below
on the same page?

--
Daniel Gustafsson





Re: Remove excessive trailing semicolons

2024-03-29 Thread Daniel Gustafsson
> On 29 Mar 2024, at 10:14, Richard Guo  wrote:

> Noticed some newly introduced excessive trailing semicolons:
> 
> $ git grep -E ";;$" -- *.c *.h
> src/include/lib/radixtree.h:int deletepos = 
> slot - n4->children;;
> src/test/modules/test_tidstore/test_tidstore.c: BlockNumber prevblkno = 0;;
> 
> Here is a trivial patch to remove them.

Thanks, applied!

--
Daniel Gustafsson





Re: Security lessons from liblzma

2024-03-29 Thread Daniel Gustafsson
> On 29 Mar 2024, at 23:59, Andres Freund  wrote:
> On 2024-03-29 18:37:24 -0400, Bruce Momjian wrote:

>> Now, we don't take pull requests, and all our committers are known
>> individuals, but this might have cautionary lessons for us.
> 
> I am doubtful that every committer would find something sneaky hidden in
> e.g. one of the test changes in a large commit. It's not too hard to hide
> something sneaky.

One take-away for me is how important it is to ship recipes for regenerating
any testdata which is included in generated/compiled/binary format.  Kind of
how we in our tree ship the config for test TLS certificates and keys which can
be manually inspected, and used to rebuild the testdata (although the risk for
injections in this particular case seems low).  Bad things can still be
injected, but formats which allow manual review at least goes some way towards
lowering risk.

--
Daniel Gustafsson





Re: pgsql: Clean up role created in new subscription test.

2024-03-27 Thread Daniel Gustafsson
> On 27 Mar 2024, at 16:26, Tom Lane  wrote:
> 
> Daniel Gustafsson  writes:
>> The only option is to make the check opt-in via a command-line parameter for
>> use it in the main regress tests, and not use it for the contrib tests.  The
>> attached v7 does that and passes CI, but I wonder if it's worth it all with
>> that restriction?
> 
> Yeah, that seems hardly worth it --- and it's only an accident of
> implementation that we don't run the main tests in parallel with
> something else, anyway.  Too bad, but ...

Agreed.  The excercise did catch the leftovers in 0001 so I'll go ahead with
those before closing the CF entry.

--
Daniel Gustafsson





Re: pgsql: Clean up role created in new subscription test.

2024-03-27 Thread Daniel Gustafsson
> On 25 Mar 2024, at 19:48, Andres Freund  wrote:

> I don't think it's that, but that the freebsd task tests the installcheck
> equivalent in meson.  I haven't checked what your patch is doing, but perhaps
> the issue is that it's seeing global objects concurrently created by another
> test?

Sorry, I had a look when Peter replied a while back but forgot to update this
thread.  The reason for the failure is that when running multiple pg_regress in
parallel against a single instance it is impossible to avoid global object
pollution from other tests concurrently executing.  Since pg_regress has no
idea about the contents of the tests it also cannot apply any smarts in
filtering out such objects.  The CI failures comes from the contrib tests which
run in parallel.

The only option is to make the check opt-in via a command-line parameter for
use it in the main regress tests, and not use it for the contrib tests.  The
attached v7 does that and passes CI, but I wonder if it's worth it all with
that restriction?

The 0001 cleanup patch is still relevant (and was found by using this feature)
though but that might be all for this thread.

--
Daniel Gustafsson



v7-0002-pg_regress-Detect-global-objects-left-over-after-.patch
Description: Binary data


v7-0001-Drop-global-objects-after-completed-test.patch
Description: Binary data


Re: Possibility to disable `ALTER SYSTEM`

2024-03-26 Thread Daniel Gustafsson
> On 26 Mar 2024, at 13:11, Robert Haas  wrote:
> On Mon, Mar 25, 2024 at 5:04 PM Bruce Momjian  wrote:

>> To me, externally_managed_configuration is promising a lot more than it
>> delivers because there is still a lot of ocnfiguration it doesn't
>> control.  I am also confused why the purpose of the feature, external
>> management of configuation, is part of the variable name.  We usually
>> name parameters for what they control.
> 
> I actually agree with this. I wasn't going to quibble with it because
> other people seemed to like it. But I think something like
> allow_alter_system would be better, as it would describe the exact
> thing that the parameter does, rather than how we think the parameter
> ought to be used.

+Many.  Either allow_alter_system or enable_alter_system_command is IMO
preferrable, not least because someone might use this without using any
external configuration tool, making the name even more misleading.

--
Daniel Gustafsson





Re: Potential issue in ecpg-informix decimal converting functions

2024-03-25 Thread Daniel Gustafsson
> On 6 Mar 2024, at 20:12, a.ima...@postgrespro.ru wrote:

> I agree with the proposed changes in favor of backward compatibility.

I went ahead to pushed this after another look.  I'm a bit hesitant to
backpatch this since there are no reports against it, and I don't have good
sense for how ECPG code is tested and maintained across minor version upgrades.
If we want to I will of course do so, so please chime in in case there are
different and more informed opinions.

> Also, is it a big deal that the PGTYPESnumeric_to_long() function doesn't
> exactly match the documentation, compared to PGTYPESnumeric_to_int()? It
> handles underflow case separately and sets errno to PGTYPES_NUM_UNDERFLOW
> additionally.

Fixed as well.

--
Daniel Gustafsson





Re: sslinfo extension - add notbefore and notafter timestamps

2024-03-22 Thread Daniel Gustafsson
> On 20 Mar 2024, at 17:32, Jacob Champion  
> wrote:

> I can't find anything else to note; patch LGTM.

While staging this to commit I realized one silly thing about it warranting
another round here.  The ASN.1 timediff code can diff against *any* timestamp,
not just the UNIX epoch, so we could just pass in the postgres epoch and skip
the final subtraction since we're already correctly adjusted.  This removes the
non-overflow checked arithmetic with a simpler logic.

--
Daniel Gustafsson



v12-0001-Add-notBefore-and-notAfter-to-SSL-cert-info-disp.patch
Description: Binary data


Re: documentation structure

2024-03-21 Thread Daniel Gustafsson
> On 22 Mar 2024, at 00:33, Peter Eisentraut  wrote:
> 
> On 19.03.24 14:50, Tom Lane wrote:
>> Daniel Gustafsson  writes:
>>> It's actually not very odd, the reference section is using  
>>> elements
>>> and we had missed the arabic numerals setting on those.  The attached fixes
>>> that for me.  That being said, we've had roman numerals for the reference
>>> section since forever (all the way down to the 7.2 docs online has it) so 
>>> maybe
>>> it was intentional?
>> I'm quite sure it *was* intentional.  Maybe it was a bad idea, but
>> it's not that way simply because nobody thought about it.
> 
> Looks to me it was just that way because it's the default setting of the 
> stylesheets.

That's quite possible.  I don't have strong opinions on whether we should
change, or keep it the way it is.

--
Daniel Gustafsson





Re: doc issues in event-trigger-matrix.html

2024-03-21 Thread Daniel Gustafsson
> On 21 Mar 2024, at 22:47, Peter Eisentraut  wrote:
> 
> On 19.03.24 10:34, Daniel Gustafsson wrote:
>>>> "Only for local objects"
>>>> is there any reference explaining "local objects"?
>>>> I think local object means objects that only affect one single database?
>> That's a bigger problem than the table representation, we never define what
>> "local object" mean anywhere in the EVT docs.  EV's are global for a 
>> database,
>> but not a cluster, so I assume what this means is that EVs for non-DDL 
>> commands
>> like COMMENT can only fire for a specific relation they are attached to and 
>> not
>> database wide?
> 
> I think we could replace this whole table by a few definitions:

Simply extending the "Overview of Event Trigger Behavior" section slightly
might even be enough?

> If tomorrow someone changes ... will they remember to update this table?

Highly unlikely.

--
Daniel Gustafsson





Re: Refactor SASL exchange in preparation for OAuth Bearer

2024-03-21 Thread Daniel Gustafsson
> On 20 Mar 2024, at 15:28, Daniel Gustafsson  wrote:
> 
>> On 29 Feb 2024, at 20:58, Jacob Champion  
>> wrote:
>> 
>> On Wed, Feb 28, 2024 at 2:54 PM Daniel Gustafsson  wrote:
>>> I rank that as one of my better typos actually. Fixed though.
>> 
>> LGTM!
> 
> Thanks for review, and since Heikki marked it ready for committer I assume 
> that
> counting as a +1 as well.  Attached is a rebase on top of HEAD to get a fresh
> run from the CFBot before applying this.

And after another pass over it I ended up pushing it today.

--
Daniel Gustafsson





Re: An improved README experience for PostgreSQL

2024-03-21 Thread Daniel Gustafsson
> On 21 Mar 2024, at 15:11, Nathan Bossart  wrote:
> 
> On Thu, Mar 21, 2024 at 02:42:27PM +0100, Peter Eisentraut wrote:
>> The committed README.md contains trailing whitespace on one line:
>> 
>> General documentation about this version of PostgreSQL can be found at:$
>> -https://www.postgresql.org/docs/devel/$
>> +<https://www.postgresql.org/docs/devel/>  $
>> 
>> If this is intentional (it could be, since trailing whitespace is
>> potentially significant in Markdown), then please add something to
>> .gitattributes.  Otherwise, please fix that.
> 
> I added that to maintain the line break that was in the non-Markdown
> version.  I'd rather match the style of the following paragraph (patch
> attached) than mess with .gitattributes.

+1, this looks better IMHO.

--
Daniel Gustafsson





Re: [PATCH] Exponential backoff for auth_delay

2024-03-20 Thread Daniel Gustafsson
> On 20 Mar 2024, at 22:21, Jacob Champion  
> wrote:
> 
> On Wed, Mar 20, 2024 at 2:15 PM Jacob Champion
>  wrote:
>> I think solutions for case 1 and case 2 are necessarily at odds under
>> the current design, if auth_delay relies on slot exhaustion to do its
>> work effectively. Weakening that on purpose doesn't make much sense to
>> me; if a DBA is uncomfortable with the DoS implications then I'd argue
>> they need a different solution. (Which we could theoretically
>> implement, but it's not my intention to sign you up for that. :D )
> 
> The thread got quiet, and I'm nervous that I squashed it unintentionally. :/
> 
> Is there consensus on whether the backoff is useful, even without the
> host tracking? (Or, alternatively, is the host tracking helpful in a
> way I'm not seeing?) Failing those, is there a way forward that could
> make it useful in the future?

I actually wrote more or less the same patch with rudimentary attacker
fingerprinting, and after some off-list discussion decided to abandon it for
the reasons discussed in this thread.  It's unlikely to protect against the
attackers we wan't to protect the cluster against since they won't wait for the
delay anyways.

--
Daniel Gustafsson





Re: sslinfo extension - add notbefore and notafter timestamps

2024-03-20 Thread Daniel Gustafsson
> On 20 Mar 2024, at 15:28, Jacob Champion  
> wrote:

>> +   result -= ((POSTGRES_EPOCH_JDATE - UNIX_EPOCH_JDATE) * USECS_PER_DAY);
>> +   return TimestampTzGetDatum(result);
> 
> Is that final bare subtraction able to wrap around for dates far in the past?

We are subtracting 30 years from a calculation that we know didnt overflow, so
I guess if the certificate notBefore (the notAfter cannot be that early since
we wouldn't be able to connect with it) was set to early enough?  It didn't
strike me as anything above academical unless I'm thinking wrong here.

--
Daniel Gustafsson





Re: Refactor SASL exchange in preparation for OAuth Bearer

2024-03-20 Thread Daniel Gustafsson
> On 29 Feb 2024, at 20:58, Jacob Champion  
> wrote:
> 
> On Wed, Feb 28, 2024 at 2:54 PM Daniel Gustafsson  wrote:
>> I rank that as one of my better typos actually. Fixed though.
> 
> LGTM!

Thanks for review, and since Heikki marked it ready for committer I assume that
counting as a +1 as well.  Attached is a rebase on top of HEAD to get a fresh
run from the CFBot before applying this.

--
Daniel Gustafsson



v3-0002-Explicitly-require-password-for-SCRAM-exchange.patch
Description: Binary data


v3-0001-Refactor-SASL-exchange-to-return-tri-state-status.patch
Description: Binary data


Re: sslinfo extension - add notbefore and notafter timestamps

2024-03-20 Thread Daniel Gustafsson
> On 20 Mar 2024, at 00:24, Cary Huang  wrote:

> but it seems to me that many of the timestamp related functions still consider
> timestamp or timestampTz as "double values with units of seconds" though. 

The issue here is that postgres use a different epoch from the unix epoch, so
any dates calcuated based on the unix epoch need to be adjusted.  I've hacked
this up in the attached v11 using overflow-safe integer mul/add as proposed by
Jacob upthread (we really shouldn't risk overflowing an int64 here but there is
no harm in using belts and suspenders here as a defensive measure).

The attached v11 is what I propose we go ahead with unless there further
comments on this.

--
Daniel Gustafsson



v11-0001-Add-notBefore-and-notAfter-to-SSL-cert-info-disp.patch
Description: Binary data


Re: Possibility to disable `ALTER SYSTEM`

2024-03-19 Thread Daniel Gustafsson
> On 19 Mar 2024, at 15:51, Tom Lane  wrote:
> 
> Heikki Linnakangas  writes:
>> Perhaps we could make that even better with a GUC though. I propose a 
>> GUC called 'configuration_managed_externally = true / false". If you set 
>> it to true, we prevent ALTER SYSTEM and make the error message more 
>> definitive:
> 
>> postgres=# ALTER SYSTEM SET wal_level TO minimal;
>> ERROR:  configuration is managed externally
> 
>> As a bonus, if that GUC is set, we could even check at server startup 
>> that all the configuration files are not writable by the postgres user, 
>> and print a warning or refuse to start up if they are.
> 
> I like this idea.  The "bonus" is not optional though, because
> setting the files' ownership/permissions is the only way to be
> sure that the prohibition is even a little bit bulletproof.

Agreed, assuming we can solve the below..

> One small issue: how do we make that work on Windows?  Have recent
> versions grown anything that looks like real file permissions?

--
Daniel Gustafsson





Re: Possibility to disable `ALTER SYSTEM`

2024-03-19 Thread Daniel Gustafsson
> On 19 Mar 2024, at 17:53, Jelte Fennema-Nio  wrote:
> 
> On Tue, 19 Mar 2024 at 17:05, Tom Lane  wrote:
>> I've said this repeatedly: it's not enough.  The only reason we need
>> any feature whatsoever is that somebody doesn't trust their database
>> superusers to not try to modify the configuration.
> 
> And as everyone else on this thread has said: It is enough. Because
> the point is not security, the point is hinting to a superuser that a
> workflow they know from other systems (or an ALTER SYSTEM command they
> copied from the internet) is not the intended way to modify their
> server configuration on the system they are currently working on.

Well.  Protection against superusers randomly copying ALTER SYSTEM commands
from the internet actually does turn this into a security feature =)

--
Daniel Gustafsson





Re: Reducing connection overhead in pg_upgrade compat check phase

2024-03-19 Thread Daniel Gustafsson
> On 19 Mar 2024, at 08:07, Peter Eisentraut  wrote:
> 
> On 18.03.24 13:11, Daniel Gustafsson wrote:
>> Attached is a fresh rebase with only minor cosmetic touch-ups which I would
>> like to go ahead with during this CF.
>> Peter: does this address the comments you had on translation and code
>> duplication?
> 
> Yes, this looks good.

Thanks for review! I took another look at this and pushed it.

--
Daniel Gustafsson





Re: Proposal to include --exclude-extension Flag in pg_dump

2024-03-19 Thread Daniel Gustafsson
> On 19 Mar 2024, at 12:19, Dean Rasheed  wrote:

> I'm marking this ready-for-commit (which I'll probably do myself in a
> day or two, unless anyone else claims it first).

LGTM too from a read through.  I did notice a few mistakes in the --filter
documentation portion for other keywords but that's unrelated to this patch,
will fix them once this is in to avoid conflicts.

--
Daniel Gustafsson





Re: doc issues in event-trigger-matrix.html

2024-03-19 Thread Daniel Gustafsson
> On 19 Mar 2024, at 02:14, Michael Paquier  wrote:
> 
> On Tue, Mar 19, 2024 at 08:00:00AM +0800, jian he wrote:
>> I think the "X" and "-" mean in this matrix [0] is not very intuitive.
>> mainly because "X" tends to mean negative things in most cases.
>> we can write a sentence saying "X"  means this, "-" means that.
>> 
>> or maybe Check mark [1] and Cross mark [2]  are more universal.
>> and we can use these marks.
>> 
>> "Only for local objects"
>> is there any reference explaining "local objects"?
>> I think local object means objects that only affect one single database?

That's a bigger problem than the table representation, we never define what
"local object" mean anywhere in the EVT docs.  EV's are global for a database,
but not a cluster, so I assume what this means is that EVs for non-DDL commands
like COMMENT can only fire for a specific relation they are attached to and not
database wide?

> It is true that in Japan the cross mark refers to a negation, and
> that's the opposite in France: I would put a cross on a table in the
> case where something is supported.  I've never seen anybody complain
> about the format of these tables, FWIW, but if these were to be
> changed, the update should happen across the board for all the tables
> and not only one.

AFAICT we only have one other table with "X" denoting support, the "Conflicting
Lock Modes" table under Concurrency Control chapter, and there we simply leave
the "not supported" column empty instead of using a dash.  Maybe the simple fix
here is to make these tables consistent by removing the dash from the event
trigger firing matrix?

As a sidenote, the table should gain a sentence explaining why the login column
is missing to avoid confusion.

--
Daniel Gustafsson





Re: documentation structure

2024-03-19 Thread Daniel Gustafsson
> On 18 Mar 2024, at 22:40, Laurenz Albe  wrote:
> On Mon, 2024-03-18 at 10:11 -0400, Robert Haas wrote:

>> For reasons that I don't understand, all chapters except
>> for those in "VI. Reference" are numbered, but the chapters in that
>> section have Roman numerals instead.
> 
> That last fact is very odd indeed and could be easily fixed.

It's actually not very odd, the reference section is using  elements
and we had missed the arabic numerals setting on those.  The attached fixes
that for me.  That being said, we've had roman numerals for the reference
section since forever (all the way down to the 7.2 docs online has it) so maybe
it was intentional?  Or no one managed to see it until Robert did, I've
certainly never noticed it until now.

--
Daniel Gustafsson



reference-autolabel.diff
Description: Binary data


Re: Possibility to disable `ALTER SYSTEM`

2024-03-18 Thread Daniel Gustafsson
> On 18 Mar 2024, at 16:34, Magnus Hagander  wrote:
> 
> On Mon, Mar 18, 2024 at 2:09 PM Daniel Gustafsson  wrote:
>> 
>>> On 18 Mar 2024, at 13:57, Robert Haas  wrote:
>> 
>>> my proposal is something like this, taking a
>>> bunch of text from Jelte's patch and some inspiration from Magnus's
>>> earlier remarks:
>> 
>> I still think any wording should clearly mention that settings in the file 
>> are
>> still applied.  The proposed wording says to implicitly but to avoid 
>> confusion
>> I think it should be explicit.
> 
> I haven't kept up with the thread, but in general I'd prefer it to
> actually turn off parsing the file as well. I think just turning off
> the ability to change it -- including the ability to *revert* changes
> that were made to it before -- is going to be confusing.

Wouldn't that break pgBackrest which IIRC write to .auto.conf directly
without using ALTER SYSTEM?

> But, if we have decided it shouldn't do that, then IMHO we should
> consider naming it maybe enable_alter_system_command instead -- since
> we're only disabling the alter system command, not the actual feature
> in total.

Good point.

--
Daniel Gustafsson





Re: Add LSN <-> time conversion functionality

2024-03-18 Thread Daniel Gustafsson
> On 22 Feb 2024, at 03:45, Melanie Plageman  wrote:
> On Fri, Feb 16, 2024 at 3:41 PM Tomas Vondra
>  wrote:

>> - Not sure why we need 0001. Just so that the "estimate" functions in
>> 0002 have a convenient "start" point? Surely we could look at the
>> current LSNTimeline data and use the oldest value, or (if there's no
>> data) use the current timestamp/LSN?
> 
> When there are 0 or 1 entries in the timeline you'll get an answer
> that could be very off if you just return the current timestamp or
> LSN. I guess that is okay?

I don't think that's a huge problem at such a young "lsn-age", but I might be
missing something.

>> - I wonder what happens if we lose the data - we know that if people
>> reset statistics for whatever reason (or just lose them because of a
>> crash, or because they're on a replica), bad things happen to
>> autovacuum. What's the (expected) impact on pruning?
> 
> This is an important question. Because stats aren't crashsafe, we
> could return very inaccurate numbers for some time/LSN values if we
> crash. I don't actually know what we could do about that. When I use
> the LSNTimeline for the freeze heuristic it is less of an issue
> because the freeze heuristic has a fallback strategy when there aren't
> enough stats to do its calculations. But other users of this
> LSNTimeline will simply get highly inaccurate results (I think?). Is
> there anything we could do about this? It seems bad.

A complication with this over stats is that we can't recompute this in case of
a crash/corruption issue.  The simple solution is to consider this unlogged
data and start fresh at every unclean shutdown, but I have a feeling that won't
be good enough for basing heuristics on?

> Andres had brought up something at some point about, what if the
> database is simply turned off for awhile and then turned back on. Even
> if you cleanly shut down, will there be "gaps" in the timeline? I
> think that could be okay, but it is something to think about.

The gaps would represent reality, so there is nothing wrong per se with gaps,
but if they inflate the interval of a bucket which in turns impact the
precision of the results then that can be a problem.

> Just a note, all of my comments could use a lot of work, but I want to
> get consensus on the algorithm before I make sure and write about it
> in a perfect way.

I'm not sure "a lot of work" is accurate, they seem pretty much there to me,
but I think that an illustration of running through the algorithm in an
ascii-art array would be helpful.


Reading through this I think such a function has merits, not only for your
usecase but other heuristic based work and quite possibly systems debugging.

While the bucketing algorithm is a clever algorithm for degrading precision for
older entries without discarding them, I do fear that we'll risk ending up with
answers like "somewhere between in the past and even further in the past".
I've been playing around with various compression algorithms for packing the
buckets such that we can retain precision for longer.  Since you were aiming to
work on other patches leading up to the freeze, let's pick this up again once
things calm down.

When compiling I got this warning for lsntime_merge_target:

pgstat_wal.c:242:1: warning: non-void function does not return a value in all 
control paths [-Wreturn-type]
}
^
1 warning generated.

The issue seems to be that the can't-really-happen path is protected with an
Assertion, which is a no-op for production builds.  I think we should handle
the error rather than rely on testing catching it (since if it does happen even
though it can't, it's going to be when it's for sure not tested).  Returning an
invalid array subscript like -1 and testing for it in lsntime_insert, with an
elog(WARNING..), seems enough.


-last_snapshot_lsn <= GetLastImportantRecPtr())
+last_snapshot_lsn <= current_lsn)
I think we should delay extracting the LSN with GetLastImportantRecPtr until we
know that we need it, to avoid taking locks in this codepath unless needed.

I've attached a diff with the above suggestions which applies on op of your
patchset.

--
Daniel Gustafsson

commit 908f3bb511450f05980ba01d42c909cc9ef8007a
Author: Daniel Gustafsson 
Date:   Thu Mar 14 14:32:15 2024 +0100

Code review hackery

diff --git a/src/backend/postmaster/bgwriter.c 
b/src/backend/postmaster/bgwriter.c
index 7d9ad9046f..115204f708 100644
--- a/src/backend/postmaster/bgwriter.c
+++ b/src/backend/postmaster/bgwriter.c
@@ -267,7 +267,7 @@ BackgroundWriterMain(void)
{
TimestampTz timeout = 0;
TimestampTz now = GetCurrentTimestamp();
-   XLogRe

Re: Is there still password max length restrictions in PG?

2024-03-18 Thread Daniel Gustafsson
> On 18 Mar 2024, at 14:43, Sean  wrote:
> 
> Hi, 
> Thanks for your information.  Even using SCRAM,  when specified the content 
> of "password",  still there is a basic request about the length of it.  From 
> the source code, seems there is no restriction, right? 

SCRAM stores a hashed fixed-size representation of the password, so there is no
restriction in terms of length on the user supplied secret.

--
Daniel Gustafsson





  1   2   3   4   5   6   7   8   9   10   >