DISCARD ALL does not force re-planning of plpgsql functions/procedures
I got a report on the PgBouncer repo[1] that running DISCARD ALL was not sufficient between connection handoffs to force replanning of stored procedures. Turns out that while DISCARD ALL and DISCARD PLANS reset the plan cache, they do not reset the num_custom_plans fields of the existing PlanSources. So while the generic plan is re-planned after DISCARD ALL, the decision on whether to choose it or not won't be changed. See below for a minimal reproducing example:

create table test_mode (a int);
insert into test_mode select 1 from generate_series(1,100) union all select 2;
create index on test_mode (a);
analyze test_mode;

create function test_mode_func(int) returns integer as $$
declare
    v_count integer;
begin
    select into v_count count(*) from test_mode where a = $1;
    return v_count;
end;
$$ language plpgsql;

\timing on
-- trigger execution 5 times
SELECT test_mode_func(1);
SELECT test_mode_func(1);
SELECT test_mode_func(1);
SELECT test_mode_func(1);
SELECT test_mode_func(1);
DISCARD ALL;
-- slow because of bad plan, even after DISCARD ALL
SELECT test_mode_func(2);
\c
-- fast after re-connect, because of custom plan
SELECT test_mode_func(2);

[1]: https://github.com/pgbouncer/pgbouncer/issues/912#issuecomment-2131250109
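The behaviour described above can be sketched with a simplified model of the choose_custom_plan() heuristic. This is not the actual plancache.c code (the real comparison also includes planning-cost fuzz factors), just an illustration of why dropping the cached plan without resetting num_custom_plans leaves the generic-vs-custom decision unchanged:

```python
# Simplified model (assumption: not the real plancache.c logic) of
# PostgreSQL's custom-vs-generic plan choice for a cached PlanSource.

class PlanSource:
    def __init__(self):
        self.num_custom_plans = 0
        self.total_custom_cost = 0.0
        self.generic_plan = None  # what DISCARD ALL throws away

    def choose_custom_plan(self, generic_cost):
        # The first five executions always use custom plans.
        if self.num_custom_plans < 5:
            return True
        avg_custom = self.total_custom_cost / self.num_custom_plans
        # Stick with the generic plan if it isn't costlier on average.
        return generic_cost > avg_custom

    def execute(self, generic_cost, custom_cost):
        if self.choose_custom_plan(generic_cost):
            self.num_custom_plans += 1
            self.total_custom_cost += custom_cost
            return "custom"
        return "generic"

    def discard_plans(self):
        # DISCARD ALL / DISCARD PLANS: the cached plan is dropped and
        # will be rebuilt, but num_custom_plans is NOT reset.
        self.generic_plan = None

src = PlanSource()
# Five executions with a=1: custom plans, all with similar cost.
for _ in range(5):
    assert src.execute(generic_cost=10.0, custom_cost=10.0) == "custom"
# Sixth execution: the generic plan wins on average cost and is cached.
assert src.execute(generic_cost=10.0, custom_cost=10.0) == "generic"
src.discard_plans()
# After DISCARD ALL the generic plan is re-planned, but the stats still
# favor it, so test_mode_func(2) keeps getting the bad generic plan.
assert src.execute(generic_cost=10.0, custom_cost=10.0) == "generic"
```

Only a fresh connection (as in the `\c` step of the repro) starts with num_custom_plans back at zero.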
Re: Volatile write caches on macOS and Windows, redux
On Tue, 18 Jul 2023 at 05:29, Thomas Munro wrote:
> It's possible that fcntl(F_FULLFSYNC) might fail with ENOSUPP or other
> errors in obscure cases (eg unusual file systems). In that case, you
> could manually lower fsync to just "on" and do your own research on
> whether power loss can toast your database, but that doesn't seem like
> a reason for us not to ship good solid defaults for typical users.

Is this the only reason why you're suggesting adding fsync=full, instead of simply always setting F_FULLFSYNC when fsync=on on macOS? If so, I'm not sure we really gain anything from this tri-state. I think people either care about data loss on power loss, or they don't. I doubt many people want this third intermediate option, which afaict basically means losing data on power loss less often than with fsync=off, but still losing data most of the time.

If you're going to keep this tri-state for macOS, then it still seems nicer to me to "fix" fsync=on on macOS and introduce fsync=partial or something. Then the defaults are the same across platforms, and anyone currently setting fsync=on in their postgresql.conf would get the fixed behaviour on upgrade.
Re: Add new protocol message to change GUCs for usage with future protocol-only GUCs
(pressed send too early)

On Sat, 25 May 2024 at 12:39, Jelte Fennema-Nio wrote:
> But again if I'm alone in this, then I don't

... mind budging on this to move this decision along. Using _pq_.xxx parameters for all protocol changes would totally be acceptable to me.
Re: Add new protocol message to change GUCs for usage with future protocol-only GUCs
On Thu, 23 May 2024 at 20:12, Tom Lane wrote:
> Jelte Fennema-Nio writes:
> > On Fri, 17 May 2024 at 21:24, Robert Haas wrote:
> > > Perhaps these are all minority positions, but I can't tell you what
> > > everyone thinks, only what I think.
>
> > I'd love to hear some opinions from others on these design choices. So
> > far it seems like we're the only two that have opinions on this (which
> > seems hard to believe) and our opinions are clearly conflicting. And
> > above all I'd like to move forward with this, be it my way or yours
> > (although I'd prefer my way of course ;) )
>
> I got around to looking through this thread in preparation for next
> week's patch review session. I have a couple of opinions to offer:
>
> 1. Protocol versions suck. Bumping them is seldom a good answer,
> and should never be done if you have any finer-grained negotiation
> mechanism available. My aversion to this is over thirty years old:
> I learned that lesson from watching the GIF87-to-GIF89 transition mess.
> Authors of GIF-writing tools tended to take the easy way out and write
> "GIF89" in the header whether they were actually using any of the new
> version's features or not. This led to an awful lot of pictures that
> couldn't be read by available GIF-displaying tools, for no good reason
> whatsoever. The PNG committee, a couple years later, reacted to that
> mess by designing PNG to have no version number whatsoever, and yet
> be extensible in a fine-grained way. (Basically, a PNG file is made
> up of labeled chunks. If a reader doesn't recognize a particular
> chunk code, it can still tell whether the chunk is "critical" or not,
> and thereby decide if it must give up or can proceed while ignoring
> that chunk.)
>
> So overall, I have a strong preference for using the _pq_.xxx
> mechanism instead of a protocol version bump. I do not believe
> the latter has any advantage.

I'm not necessarily super opposed to only using the _pq_.xxx mechanism.
I mainly think it's silly to have a protocol version number and then never use it. And I feel some of the proposed changes don't really benefit from being able to be turned on-and-off by themselves.

My rule of thumb would be:
1. Things that a modern client/pooler would always request: version bump
2. Everything else: _pq_.xxx

Of the proposed changes so far on the mailing list, the only two that would fall under 1 imho are:
1. The ParameterSet message
2. A longer than 32-bit secret in BackendKeyData

I also don't think the GIF situation you describe translates fully to this discussion. We have active protocol version negotiation, so if a server doesn't support protocol 3.1 a client is expected to fall back to the 3.0 protocol when communicating. Of course you can argue that a badly behaved client will fail to connect when it gets a downgrade request from the server, but that same argument can be made about a server not reporting support for a _pq_.xxx parameter that every modern client/pooler requests. So I don't think there's a practical difference in the problem you're describing.

But again if I'm alone in this, then I don't
Re: Add new protocol message to change GUCs for usage with future protocol-only GUCs
On Thu, 23 May 2024 at 20:40, Tom Lane wrote:
> Jacob Champion writes:
> > Would it be good to expand on that idea of criticality? IIRC one of
> > Jelte's complaints earlier was that middleware has to know all the
> > extension types anyway, to be able to figure out whether it has to do
> > something about them or not. HTTP has the concept of hop-by-hop vs
> > end-to-end headers for related reasons.
>
> Yeah, perhaps. We'd need to figure out just which classes we need
> to divide protocol parameters into, and then think about a way for
> code to understand which class a parameter falls into even when
> it doesn't specifically know that parameter.

I think this class is so rare that it's not worth complicating the discussion on new protocol features even more. AFAICT there is only one proposed protocol change that does not need any pooler support (apart from syncing the feature value when re-assigning the connection): automatic binary encoding for a list of types. All others need some support from poolers; at the very least they need new message types to not error out, but in many cases more complex stuff is needed.
Re: Add new protocol message to change GUCs for usage with future protocol-only GUCs
On Fri, 24 May 2024 at 15:28, Robert Haas wrote:
> On Thu, May 23, 2024 at 4:00 PM Tom Lane wrote:
> > I don't recall exactly what I thought earlier, but now I think we'd
> > be better off with separate infrastructure. guc.c is unduly complex
> > already. Perhaps there are bits of it that could be factored out
> > and shared, but I bet not a lot.
>
> OK. That seems fine to me, but I bet Jelte is going to disagree.

I indeed disagree. I think the effort needed to make guc.c handle protocol parameters is extremely little. The 0011 patch contains all the changes that are needed to achieve that, and that patch only adds 65 additional lines. And only 15 of those 65 lines actually have to do anything somewhat weird, to be able to handle the transactionality discrepancy between protocol parameters and other GUCs. The other 50 lines are (imho) really clean and fit perfectly with the way guc.c is currently structured (i.e. they add PGC_PROTOCOL and PGC_SU_PROTOCOL in a really obvious way).

Separating it from the GUC infrastructure will mean we need to duplicate a lot of the infrastructure. Assuming we don't care about SHOW or pg_settings (which I agree are not super important), the things that we would want protocol parameters to have that guc.c gives us for free are:
1. Reporting the value of the parameter to the client (done using ParameterStatus)
2. Parsing and validating of the input: bool, int, enum, etc, but also check_hook and assign_hook
3. Logic in all connection poolers to change GUC values to the client's expected values whenever a server connection is handed off to a client
4. Permission checking, if we want some protocol extensions to only be configurable by a highly privileged user

All of those things would have to be duplicated/re-implemented if we make protocol parameters their own dedicated thing. Doing that work seems like a waste of time to me, and would imho add much more complexity than the proposed 65 lines of code in 0011.
Re: libpq compression (part 3)
On Mon, 20 May 2024 at 21:42, Jacob Champion wrote:
> As Andrey points out, there was prior work done that started to take
> this into account. I haven't reviewed it to see how good it is -- and
> I think there are probably many use cases in which queries and tables
> contain both private and attacker-controlled information -- but if we
> agree that they have to be separated, then the strategy can at least
> be improved upon.

To help get everyone on the same page, I wanted to list all the security concerns in one place:
1. Triggering excessive CPU usage before authentication, by asking for very high compression levels
2. Triggering excessive memory/CPU usage before authentication, by a client sending a zipbomb
3. Triggering excessive CPU usage after authentication, by asking for a very high compression level
4. Triggering excessive memory/CPU usage after authentication due to zipbombs (i.e. a small amount of data extracting to lots of data)
5. CRIME-style leakage of information about encrypted data

1 & 2 can easily be solved by not allowing any authentication packets to be compressed. This also has benefits for 5.

3 & 4 are less of a concern than 1 & 2 imho. Once authenticated, a client deserves some level of trust. But having knobs to limit impact definitely seems useful.

3 can be solved in two ways afaict:
a. Allow the server to choose the maximum compression level for each compression method (using some GUC), and downgrade the level transparently when a higher level is requested
b. Don't allow the client to choose the compression level that the server uses

I'd prefer option a.

4 would require some safety limits on the amount of data that a (small) compressed message can be decompressed to, stopping decompression of that message once that limit is hit. What that limit should be seems hard to choose though. A few ideas:
a. The size of the message reported by the uncompressed header. This would mean that at most 4GB will be uncompressed, since the maximum message length is 4GB (limited by the 32-bit message length field)
b. Allow servers to specify a maximum decompressed message length lower than this 4GB, e.g. messages of more than 100MB of uncompressed size should not be allowed

I think 5 is the most complicated to deal with, especially as it depends on the actual usage to know what is safe. I believe we should let users have the freedom to make their own security tradeoffs, but we should protect them against some of the most glaring issues (especially ones that benefit little from compression anyway). As already shown by Andrey, sending LDAP passwords in a compressed way seems extremely dangerous. So I think we should disallow compressing any authentication related packets.

To reduce similar risks further, we can choose to compress only the message types that we expect to benefit most from compression. IMHO those are the following (marked with (B)ackend or (F)rontend to show who sends them):
- Query (F)
- Parse (F)
- Describe (F)
- Bind (F)
- RowDescription (B)
- DataRow (B)
- CopyData (B/F)

Then I think we should let users choose how they want to compress and where they want their compression stream to restart. Something like this:
a. compression_restart=query: Restart the stream after every query. Recommended if queries across the same connection are triggered by different end-users. I think this would be a sane default
b. compression_restart=message: Restart the stream for every message. Recommended if the amount of correlation between rows of the same query is a security concern
c. compression_restart=manual: Don't restart the stream automatically, but only when the client user calls a specific function. Recommended only if the user can make trade-offs, or if no encryption is used anyway
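Idea 4b above (cap how far a single compressed message may inflate) can be sketched with zlib's streaming API, which lets the receiver stop producing output at a byte limit instead of decompressing a zipbomb in full. The 100MB default here is just the example number from this mail, not a real GUC:

```python
import zlib

# Example cap on decompressed message size; a real server would make
# this configurable (the 100MB figure is only the example from above).
MAX_DECOMPRESSED = 100 * 1024 * 1024

def decompress_capped(data: bytes, limit: int = MAX_DECOMPRESSED) -> bytes:
    d = zlib.decompressobj()
    # max_length makes zlib stop after producing `limit` bytes, so a
    # zipbomb never allocates more than the cap before being rejected.
    out = d.decompress(data, limit)
    if d.unconsumed_tail:
        raise ValueError("decompressed message exceeds limit")
    return out

# A 50KB run of zeros compresses to well under 200 bytes...
bomb = zlib.compress(b"\0" * 50_000)
assert len(bomb) < 200
# ...and is rejected as soon as the output cap is exceeded.
try:
    decompress_capped(bomb, limit=1_000)
    raise AssertionError("zipbomb should have been rejected")
except ValueError:
    pass
# Messages within the limit decompress normally.
assert decompress_capped(bomb, limit=60_000) == b"\0" * 50_000
```

The same shape works for idea 4a by passing the length from the uncompressed message header as the limit.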
Re: Multiple startup messages over the same connection
On Sat, 18 May 2024 at 23:10, Vladimir Churyukin wrote:
> I guess the process of passing control from child processes to the parent
> could be a bit tricky for that one, but doable?
> Is there anything I'm missing that can be a no-go for this?

One seriously difficult, possibly impossible, thing is passing SSL session state between processes using shared memory.
Re: Add new protocol message to change GUCs for usage with future protocol-only GUCs
On Fri, 17 May 2024 at 21:24, Robert Haas wrote:
> I think the
> fear that we're going to run into cases where such complex handshaking
> is necessary is a major reason why I'm afraid of Jelte's ideas about
> ParameterSet: it seems much more opinionated to me than he seems to
> think it is.

I think that fear is valid, and I agree that we might want to add a bespoke message for cases where ParameterSet is not enough. But as far as I can tell ParameterSet would at least cover all the protocol changes that have been suggested so far. Using an opinionated but limited message type for 90% of the cases and a bespoke message for the last 10% seems much better to me than having a bespoke one for each (especially if currently none of the protocol proposals fall into the last 10%).

> To the extent that I can wrap my head around what Jelte is proposing,
> and all signs point to that extent being quite limited, I suppose I
> was thinking of something like his option (2). That is, I assumed that
> a client would request all the optional features that said client
> might wish to use at connection startup time. But I can see how that
> assumption might be somewhat problematic in a connection-pooling
> environment.

To be clear, I'd also be totally fine with my option (2). I'm personally slightly leaning towards my option (1), due to the reasons listed before. But connection poolers could request all the protocol extensions at the start just fine (using the default "disabled" value) to check for support. So I think option (2) would probably be the most conservative, i.e. we could always decide that option (1) is fine in some future release.

> Still, at least to me, it seems better than trying to
> rely on GUC_REPORT. My opinion is (1) GUC_REPORT isn't a particularly
> well-designed mechanism so I dislike trying to double down on it

I agree that GUC_REPORT is not particularly well designed, currently. But even in its current form it's already a very effective mechanism for connection poolers to find out which value a specific GUC is set to, and if something similar to patch 0014 were merged, my main gripe with GUC_REPORT would be gone. Tracking GUC settings by using ParameterSet would actually be harder, because if ParameterSet errors at Postgres, then the connection pooler would have to roll back its cache of that setting. With the GUC_REPORT response from Postgres it can simply rely on Postgres telling the pooler that the GUC has changed; even rollbacks are handled correctly this way.

> and
> (2) trying to mix these protocol-level parameters and the
> transactional GUCs we have together in a single mechanism seems
> potentially problematic

I don't understand what potential problems you're worried about here. Could you clarify?

> and (3) I'm still not particularly happy about
> the idea of making protocol parameters into GUCs in the first place.

Similar to the above: could you clarify why you're not happy about that?

> Perhaps these are all minority positions, but I can't tell you what
> everyone thinks, only what I think.

I'd love to hear some opinions from others on these design choices. So far it seems like we're the only two that have opinions on this (which seems hard to believe) and our opinions are clearly conflicting. And above all I'd like to move forward with this, be it my way or yours (although I'd prefer my way of course ;) )
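The GUC_REPORT mechanism being described works because every reported GUC change arrives as a ParameterStatus message, whose wire format is documented: byte `'S'`, an Int32 self-inclusive length, then the name and value as NUL-terminated strings. A pooler only has to fold these into a per-connection map; this sketch (not PgBouncer's actual code) shows the idea:

```python
import struct

# Parse a ParameterStatus ('S') message: 'S', Int32 length (including
# itself but not the type byte), then name\0 value\0.
def parse_parameter_status(msg: bytes):
    assert msg[0:1] == b"S"
    (length,) = struct.unpack("!I", msg[1:5])
    body = msg[5:1 + length]
    name, value, _ = body.split(b"\0")
    return name.decode(), value.decode()

# Helper to build a ParameterStatus message, as the server would.
def make_parameter_status(name: str, value: str) -> bytes:
    body = name.encode() + b"\0" + value.encode() + b"\0"
    return b"S" + struct.pack("!I", 4 + len(body)) + body

# A pooler tracking server state: the last ParameterStatus wins, so
# SETs, RESETs, and transaction rollbacks all need no extra bookkeeping
# -- the server reports whatever the value actually is now.
server_state = {}
for raw in [make_parameter_status("search_path", "app"),
            make_parameter_status("search_path", "public")]:
    name, value = parse_parameter_status(raw)
    server_state[name] = value

assert server_state == {"search_path": "public"}
```

Contrast this with tracking via ParameterSet, where the pooler would have to undo its cached value itself whenever the message errors out.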
Re: libpq compression (part 3)
On Fri, 17 May 2024 at 23:10, Jacob Champion wrote:
> We're talking about a transport-level option, though -- I thought the
> proposal enabled compression before authentication completed? Or has
> that changed?

I think it would make sense to only compress messages after authentication has completed. The gain of compressing authentication related packets seems pretty limited.

> (I'm suspicious of arguments that begin "well you can already do bad
> things"

Once logged in, it's really easy to max out a core of the backend you're connected to. There are many trivial queries you can use to do that. An example would be:

SELECT sum(i) FROM generate_series(1, 10) i;

So I don't think it makes sense to worry about an attacker using a high compression level as a means to DoS the server. Sending a few of the above queries seems much easier.
Re: libpq compression (part 3)
On Fri, 17 May 2024 at 23:40, Robert Haas wrote:
> To be clear, I am not arguing that it should be the receiver's choice.
> I'm arguing it should be the client's choice, which means the client
> decides what it sends and also tells the server what to send to it.
> I'm open to counter-arguments, but as I've thought about this more,
> I've come to the conclusion that letting the client control the
> behavior is the most likely to be useful and the most consistent with
> existing facilities. I think we're on the same page based on the rest
> of your email: I'm just clarifying.

+1

> I have some quibbles with the syntax but I agree with the concept.
> What I'd probably do is separate the server side thing into two GUCs,
> each with a list of algorithms, comma-separated, like we do for other
> lists in postgresql.conf. Maybe make the default 'all' meaning
> "everything this build of the server supports". On the client side,
> I'd allow both things to be specified using a single option, because
> wanting to do the same thing in both directions will be common, and
> you actually have to type in connection strings sometimes, so
> verbosity matters more.
>
> As far as the format of the value for that keyword, what do you think
> about either compression=DO_THIS_BOTH_WAYS or
> compression=DO_THIS_WHEN_SENDING/DO_THIS_WHEN_RECEIVING, with each "do
> this" being a specification of the same form already accepted for
> server-side compression e.g. gzip or gzip:level=9? If you don't like
> that, why do you think the proposal you made above is better, and why
> is that one now punctuated with slashes instead of semicolons?

+1
Re: commitfest.postgresql.org is no longer fit for purpose
On Fri, 17 May 2024 at 17:59, Robert Haas wrote:
> If they
> get attention, it's much more likely to be because somebody saw their
> email and wrote back than it is to be because somebody went through
> the CommitFest and found their entry and was like "oh, I should review
> this".

I think this is an important insight. I used the commitfest app to find patches to review when I was just starting out in Postgres development, since I didn't subscribe to all of pgsql-hackers yet. I do subscribe nowadays, but now I rarely look at the commitfest app; instead I skim the titles of emails that go into my "Postgres" folder in my mailbox. Going from such an email to a commitfest entry is near impossible (at least I don't know how to do this efficiently). I guess I'm not the only one doing this.

> Honestly, if we get to a situation where a patch author is sad
> because they have to click a link every 2 months to say "yeah, I'm
> still here, please review my patch," we've already lost the game. That
> person isn't sad because we asked them to click a link. They're sad
> it's already been N * 2 months and nothing has happened.

Maybe it wouldn't be so bad for an author to click the 2-months button, if doing so would actually give their patch a higher chance of being reviewed. And given the previous insight that people don't look at the commitfest app that often, it might be good if pressing this button would also bump the item in people's mailboxes.
Re: commitfest.postgresql.org is no longer fit for purpose
On Fri, 17 May 2024 at 14:19, Joe Conway wrote:
> On 5/16/24 22:26, Robert Haas wrote:
> > For example, imagine that the CommitFest is FORCIBLY empty
> > until a week before it starts. You can still register patches in the
> > system generally, but that just means they get CI runs, not that
> > they're scheduled to be reviewed. A week before the CommitFest,
> > everyone who has a patch registered in the system that still applies
> > gets an email saying "click here if you think this patch should be
> > reviewed in the upcoming CommitFest -- if you don't care about the
> > patch any more or it needs more work before other people review it,
> > don't click here". Then, the CommitFest ends up containing only the
> > things where the patch author clicked there during that week.
>
> 100% agree. This is in line with what I suggested on an adjacent part of
> the thread.

Such a proposal would basically mean that no one who cares about their patches getting reviews can go on holiday and leave work behind during the week before a commitfest. That seems quite undesirable to me.
Re: commitfest.postgresql.org is no longer fit for purpose
On Fri, 17 May 2024 at 13:39, Robert Haas wrote:
> On Fri, May 17, 2024 at 7:11 AM Tomas Vondra wrote:
> > Yeah, I 100% agree with this. If a patch bitrots and no one cares enough
> > to rebase it once in a while, then sure - it's probably fine to mark it
> > RwF. But forcing all contributors to do a dance every 2 months just to
> > have a chance someone might take a look, seems ... not great.
>
> I don't think that clicking on a link that someone sends you that asks
> "hey, is this ready to be reviewed" qualifies as a dance.

If there's been any useful response to the patch since the last time you pressed this button, then it might be okay. But it's definitely not uncommon for items in the commitfest app to get no actual response at all for half a year, i.e. multiple commitfests (except for the odd request for a rebase, which an author handles within a week). I'd most certainly get very annoyed if, for those patches where it already seems as if I'm screaming into the void, I'd also be required to press a button every two months for them to even have a chance at receiving a response.

> So I think the right interpretation of his comment is that
> managing the CommitFest has become about an order of magnitude more
> difficult than what it needs to be for the task to be done well.

+1

> > Long time ago there was a "rule" that people submitting patches are
> > expected to do reviews. Perhaps we should be more strict about this.
>
> This sounds like it's just generating more manual work to add to a
> system that's already suffering from needing too much manual work. Who
> would keep track of how many reviews each person is doing, and how
> many patches they're submitting, and whether those reviews were
> actually any good, and what would they do about it?

+1
Re: commitfest.postgresql.org is no longer fit for purpose
On Fri, 17 May 2024 at 11:02, Jelte Fennema-Nio wrote:
> On Fri, 17 May 2024 at 10:47, Daniel Gustafsson wrote:
> > One way to ensure we capture detail could be if the system would send an
> > automated email to the thread summarizing the entry when it's marked as
> > "committed"?
>
> This sounds great! Especially if

(oops, pressed send too early)

**Especially if it contains the git commit hash**

> Going back from an archived thread
> to the commitfest entry or git commit is currently very hard. If I'll
> just be able to Ctrl+F on commitf...@postgresql.com that would be so
> helpful. I think it would even be useful to have an email be sent
> whenever a patch gets initially added to the commitfest, so that
> there's a back link to and it's easy to mark yourself as reviewer.
> Right now, I almost never take the time to do that because if I look
> at my inbox, I have no clue what the interesting email thread is
> called in the commitfest app.
Re: commitfest.postgresql.org is no longer fit for purpose
On Fri, 17 May 2024 at 10:47, Daniel Gustafsson wrote:
> One way to ensure we capture detail could be if the system would send an
> automated email to the thread summarizing the entry when it's marked as
> "committed"?

This sounds great! Especially if

Going back from an archived thread to the commitfest entry or git commit is currently very hard. If I'll just be able to Ctrl+F on commitf...@postgresql.com, that would be so helpful. I think it would even be useful to have an email be sent whenever a patch gets initially added to the commitfest, so that there's a back link to it and it's easy to mark yourself as reviewer. Right now, I almost never take the time to do that, because if I look at my inbox, I have no clue what the interesting email thread is called in the commitfest app.
Re: commitfest.postgresql.org is no longer fit for purpose
On Fri, 17 May 2024 at 06:58, Peter Eisentraut wrote:
> Yeah, some fine-tuning might be required. But I would be wary of
> over-designing too many new states at this point. I think the key idea
> is that there ought to be a state that communicates "needs attention
> from someone other than author, reviewer, or committer".

+1 on adding a new state like this. I think it would make sense for the author to request additional input, but I think it would need two states, something like "Request for new reviewer" and "Request for new committer". Just like authors sometimes disappear after a driveby patch submission, it's fairly common imho for reviewers or committers to disappear after a driveby review. Having a way for an author to mark a patch as such would be helpful, both to the author and to reviewers/committers looking to help some patch out.

On Fri, 17 May 2024 at 09:33, Heikki Linnakangas wrote:
> Things like "cfbot says
> this has bitrotted" or "Will review this after other patch this depends
> on". On the mailing list, notes like that are both noisy and easily lost
> in the threads. But as a little free-form text box on the commitfest, it
> would be handy.

+many on the free-form text box
Re: Add new protocol message to change GUCs for usage with future protocol-only GUCs
On Thu, 16 May 2024 at 18:57, Robert Haas wrote:
> Ugh, it's so hard to communicate clearly about this stuff. I didn't
> really have any thought that we'd ever try to handle something as
> complicated as compression using ParameterSet.

Okay, then it's definitely very hard to communicate clearly about this. Because being able to re-configure compression using ParameterSet is exactly the type of thing I want to be able to do in PgBouncer. Otherwise a connection from PgBouncer to Postgres cannot be handed off to any client, because its compression state cannot be changed on the fly to the state that a client expects: if the first client wants lz4 compression and a second one wants zstd compression, there's no way the same connection can be reused for both, unless you decompress and recompress in PgBouncer, which is expensive to do. Being able to reconfigure the stream to compress the messages in the expected form is much cheaper.

> Does it send a NegotiateCompressionType message after
> the NegotiateProtocolVersion, for example? That seems like it could
> lead to the client having to be prepared for a lot of NegotiateX
> messages somewhere down the road.

Like Jacob explains, you'd want to allow the client to provide a list of options in order of preference, and then have the server respond with a ParameterStatus message saying what it ended up being. So no new NegotiateXXX messages are needed, as long as we make sure any _pq_.xxx parameter falls back to reporting its default value on failure. This is exactly why I said I prefer option 1 of the options I listed, because we need _pq_.xxx parameters to report their current value to the client.

To be clear, this is not special to compression. This applies to ALL proposed protocol parameters. The server should fall back to some "least common denominator" if it doesn't understand (part of) the value for the protocol parameter that's provided by the client, possibly falling back to disabling the protocol extension completely.

> Changing compression algorithms in mid-stream is tricky, too. If I
> tell the server "hey, turn on server-to-client compression!"

Yes it is tricky, but it's something that it would need to support imho. And Jacob actually implemented it this way, so I feel like we're discussing a non-problem here.

> I don't know if we want to support changing compression algorithms in
> mid-stream. I don't think there's any reason we can't, but it might be
> a bunch of work for something that nobody really cares about.

Again, I guess I wasn't clear at all in my previous emails and/or commit messages: connection poolers care **very much** about this. Poolers need to be able to re-configure any protocol parameter to be able to pool the same server connection across clients with differently configured protocol parameters. Again: this is the primary reason for me wanting to introduce the ParameterSet message.
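The handoff problem above can be sketched at a high level. Note that ParameterSet is a *proposed* message with no settled wire format, so it is modeled here as a plain method call rather than bytes on the wire; the parameter names are illustrative only:

```python
# Sketch of a pooler handing one server connection off between clients
# with different protocol parameter settings. ParameterSet is stood in
# for by a method call (assumption: the proposed message has no fixed
# wire format yet).

class ServerConn:
    def __init__(self):
        self.params = {"compression": "none"}

    def parameter_set(self, name, value):
        # Stand-in for sending a ParameterSet message and waiting for
        # the resulting ParameterStatus report from the server.
        self.params[name] = value

def hand_off(server: ServerConn, client_params: dict):
    # Only sync parameters that actually differ, to avoid needless
    # round trips on every handoff.
    changed = []
    for name, value in client_params.items():
        if server.params.get(name) != value:
            server.parameter_set(name, value)
            changed.append(name)
    return changed

srv = ServerConn()
# First client negotiated lz4: one reconfiguration on handoff.
assert hand_off(srv, {"compression": "lz4"}) == ["compression"]
# A client with identical settings needs no messages at all.
assert hand_off(srv, {"compression": "lz4"}) == []
# Second client wants zstd: reconfigure the stream once, instead of
# decompressing and recompressing every message inside the pooler.
assert hand_off(srv, {"compression": "zstd"}) == ["compression"]
```

Without some reconfiguration message, the only alternatives are pinning server connections to clients or transcoding in the pooler.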
Re: Add new protocol message to change GUCs for usage with future protocol-only GUCs
On Thu, 16 May 2024 at 17:29, Robert Haas wrote: > You're probably not going to like this answer, but I feel like this is > another sign that you're trying to use the protocol extensibility > facilities in the wrong way. In my first reply to the thread, I > proposed having the client send _pq_.protocol_set=1 in that startup > message. If the server accepts that message, then you can send > whatever set of message types are associated with that option, which > could include messages to list known settings, as well as messages to > set them. Alternatively, if we used a wire protocol bump for this, you > could request version 3.1 and everything that I just said still > applies. In other words, I feel that if you had adopted the design > that I proposed back in March, or some variant of it, the problem > you're having now wouldn't exist. I don't really understand the benefit of your proposal over option 2 that I proposed. Afaict you're proposing that for e.g. compression we first set _pq_.supports_compression=1 in the StartupMessage and use that to do feature detection, and then after we get the response we send ParameterSet("compression", "gzip"). To me this is pretty much identical to option 2, except that it introduces an extra round trip for no benefit (as far as I can see). Why not go for option 2 and send _pq_.compression=gzip in the StartupMessage directly. > IMHO, we need to negotiate the language that we're going to use to > communicate before we start communicating. We should find out which > protocol version we're using, and what protocol options are accepted, > based on sending a StartupMessage and receiving a reply. Then, after > that, having established a common language, we can do whatever. I > think you're trying to meld those two steps into one, which is > understandable from the point of view of saving a round trip, but I > just don't see it working out well. 
I think not increasing the number of needed round trips in the startup of a connection is extremely important. I think it's so important that I honestly don't think we should merge a protocol change that introduces an extra round trip without a VERY good reason, and this round trip should only be needed when actually using the feature.

> What we can do in the startup
> message is extremely limited, because any startup messages we generate
> can't break existing servers, and also because of the concerns I
> raised earlier about leaving room for more extension in the future.
> Once we get past the startup message negotiation, the sky's the limit!

Sure, what we can do in the StartupMessage is extremely limited, but what it does allow is passing arbitrary key-value pairs to the server. But by only using _pq_.feature_name=1, we're effectively only using the key part of the key-value pair. Limiting ourselves even more, by throwing half of our communication channel away, seems like a bad idea to me. But maybe I'm just not understanding the problem you're seeing with using the value too.
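For reference, the StartupMessage already carries arbitrary key-value pairs, so a value-bearing parameter like _pq_.compression=gzip fits the existing v3.0 wire framing with no extra round trip. A minimal sketch of the packet layout (the framing is the documented protocol 3.0 format; the _pq_.compression parameter itself is the hypothetical part):

```python
import struct

def build_startup_message(params):
    """Build a protocol 3.0 StartupMessage: int32 length (including
    itself), int32 version (3 << 16), then NUL-terminated key/value
    pairs, closed by a single trailing NUL byte."""
    body = struct.pack("!i", 3 << 16)  # protocol version 3.0
    for key, value in params.items():
        body += key.encode() + b"\x00" + value.encode() + b"\x00"
    body += b"\x00"
    return struct.pack("!i", len(body) + 4) + body

# "user" is a normal startup parameter every client already sends;
# "_pq_.compression" is the hypothetical protocol parameter.
msg = build_startup_message({"user": "alice", "_pq_.compression": "gzip"})
```

The point of the sketch is that the value slot is already there on the wire; using only `_pq_.feature=1` flags leaves half of it unused.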
Re: PostgreSQL 17 Beta 1 release announcement draft
On Thu, 16 May 2024 at 03:45, Jonathan S. Katz wrote:
> Attached is a copy of the PostgreSQL 17 Beta 1 release announcement
> draft.

I think we can quickly mention c4ab7da6061 in the COPY paragraph, in some benchmarks it improved perf by close to 2x. Something like this: "has improved performance in PostgreSQL 17 when the source encoding matches the destination encoding *and when sending large rows from server to client*"

Also, I think it's a bit weird to put the current COPY paragraph under Developer Experience. I think if you want to keep it there instead of moving it to the perf section, we should put the line about IGNORE_ERROR first instead of the perf improvements. Now the IGNORE_ERROR addition seems more of an afterthought.

s/IGNORE_ERROR/ON_ERROR

I think it would be good to clarify if the following applies when upgrading from or to PostgreSQL 17: "Starting with PostgreSQL 17, you no longer need to drop logical replication slots when using pg_upgrade"

Finally, I personally would have included a lot more links for the new items in this document. Some that would benefit from being a link imho:
- pg_createsubscriber
- JSON_TABLE
- SQL/JSON constructor
- SQL/JSON query functions
- ON_ERROR
- sslnegotiation
- PQchangePassword
- pg_maintain
Re: First draft of PG 17 release notes
On Thu, 16 May 2024 at 05:48, Andres Freund wrote:
> We're having this debate every release. I think the ongoing reticence to note
> performance improvements in the release notes is hurting Postgres.
>
> For one, performance improvements are one of the prime reason users
> upgrade. Without them being noted anywhere more dense than the commit log,
> it's very hard to figure out what improved for users. A halfway widely
> applicable performance improvement is far more impactful than many of the
> feature changes we do list in the release notes.
>
> For another, it's also very frustrating for developers that focus on
> performance. The reticence to note their work, while noting other, far
> smaller, things in the release notes, pretty much tells us that our work isn't
> valued.

+1 to the general gist of listing every perf improvement **and memory usage reduction** in the release notes. Most of them are already grouped together in a dedicated "General performance" section anyway, and having that section be big would only be good imho, to show that we're committed to improving perf.

I think one thing that would make this a lot easier though is if commits that knowingly impact perf would clearly say so in the commit message, because now it's sometimes hard to spot as someone not deeply involved with the specific patch. e.g. c4ab7da606 doesn't mention performance at all, so I'm not surprised it wasn't listed initially. And while 667e65aac3 states that multiple rounds of heap scanning is now extremely rare, it doesn't explicitly state what kind of perf impact can be expected because of that.

Maybe something like introducing a common "Perf-Improvement: true" marker in the commit message, and when doing so adding a clear paragraph explaining the expected perf impact. Another option could be to add a "User Impact" section to the commit message, where an author could add their suggestion for a release note entry.
So basically this suggestion boils down to more clearly mentioning user impact in commit messages, instead of mostly/only including technical/implementation details.
Re: Add new protocol message to change GUCs for usage with future protocol-only GUCs
On Fri, 5 Apr 2024 at 18:30, Dave Cramer wrote:
> On Fri, 5 Apr 2024 at 12:09, Jelte Fennema-Nio wrote:
>> I'll take a look at redesigning the protocol parameter stuff. To work
>> with dedicated functions instead.
>
> +1

It's been a while, but I now actually took the time to look into this. And I ran into a problem that I'd like to get some feedback on before continuing the implementation: if we're not setting the protocol parameter in the StartupMessage, there's currently no way for us to know if the protocol parameter is supported by the server. If protocol parameters were unchangeable then that would be fine, but the whole point of introducing ParameterSet is to make it possible to change protocol parameters on an existing connection. Having the function SupportsProtocolCompression return false, even though you can enable compression just fine, only because we didn't ask for compression when connecting, seems quite silly and confusing.

I see five ways around this problem and would love some feedback on which you think is best (or if you can think of any other/better ones):

1. Have protocol parameters always be GUC_REPORT, so that the presence of a ParameterStatus message during connection startup can be used as a way of detecting support for the protocol parameter.
2. Make libpq always send each known protocol parameter in the StartupMessage to check for their support, even if the connection string does not contain the related parameters (set them to their default value then). Then the non-presence of the parameter in the NegotiateProtocolVersion message can be used reliably to determine support for the feature. We could even disallow changing a protocol parameter at the server side using ParameterSet if it was not requested in the StartupMessage.
3. Very similar to 1, but require explicit user input in the connection string to request the feature on connection startup by having the user explicitly provide its default value.
If it's not requested on connection startup, assume it's unsupported and disallow usage of the feature (even if the server might actually support it).
4. Make SupportsProtocolCompression return a tri-state, SUPPORTED, UNSUPPORTED, UNKNOWN. If it's UNKNOWN people could send a ParameterSet message themselves to check for feature support after connection startup. We could even recognize this and change the state so that the SupportsProtocolCompression function returns SUPPORTED/UNSUPPORTED on future calls according to the server response.
5. Basically the same as 4, but automatically send a ParameterSet message internally when calling SupportsProtocolCompression and the state is UNKNOWN, so we'd only ever return SUPPORTED or UNSUPPORTED.

The above options are listed in my order of preference, below some reasoning why:

1 and 2 would increase the bandwidth used during connection handshake slightly for each protocol parameter that we would add, but they have the best user experience IMHO. I slightly prefer 1 over 2 because there is another argument to be made for always having protocol parameters be GUC_REPORT: these parameters change what message types a client can send or receive. So it makes sense to me to have the server tell the client what the current value of such a parameter is. This might not be a strong argument though, because this value would only ever change due to user interaction. But still, one might imagine scenarios where the value that the client sent is not exactly what the server would set the parameter to on receiving that value from the client. e.g. for protocol compression, maybe the client sends a list of preferred compression methods and the server would send a ParameterStatus containing only the specific compression method that it will use when sending messages to the client.

3 seems also an acceptable option to me.
While having slightly worse user experience than 2, it allows the user of the client to make the decision whether the extra bandwidth during connection startup is worth it to be able to enable the feature later.

4 assumes that we want people to be able to trigger sending ParameterSet messages for every protocol parameter. I'm not sure we'd want to give that ability in all cases.

5 would require SupportsProtocolCompression to also have a non-blocking version, which bloats our API more than I'd like. Also as a user you wouldn't be able to know if SupportsProtocolCompression will do a network request or not.

PS. This is only a problem for feature detection for features relying on protocol parameters; feature support relying on protocol version bumps is easy to detect based on the NegotiateProtocolVersion message.
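For concreteness, the tri-state bookkeeping of options 4/5 could look roughly like the sketch below. All names here (Conn, its fields, on_parameter_set_response) are made up for illustration, not proposed libpq API:

```python
from enum import Enum

class Support(Enum):
    UNKNOWN = 0
    SUPPORTED = 1
    UNSUPPORTED = 2

class Conn:
    """Toy model of a connection tracking support for one hypothetical
    protocol parameter (compression)."""
    def __init__(self, requested_at_startup, server_accepted):
        # If the parameter wasn't in the StartupMessage, we simply don't
        # know whether the server supports it.
        if not requested_at_startup:
            self.compression = Support.UNKNOWN
        elif server_accepted:
            self.compression = Support.SUPPORTED
        else:
            self.compression = Support.UNSUPPORTED

    def on_parameter_set_response(self, ok):
        # Option 4: a later ParameterSet round trip resolves UNKNOWN,
        # and we remember the answer for future calls.
        self.compression = Support.SUPPORTED if ok else Support.UNSUPPORTED
```

Option 5 would just trigger that ParameterSet round trip implicitly inside the support-check function whenever the state is still UNKNOWN.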
Re: Postgres and --config-file option
On Wed, 15 May 2024 at 11:49, Peter Eisentraut wrote: > Yeah, some of this is becoming quite unwieldy, if we document and > mention each spelling variant of each option everywhere. > > Maybe if the original problem is that the option --config-file is not > explicitly in the --help output, let's add it to the --help output? I definitely think it would be useful to list this --config variant in more places, imho it's nicer than the -c variant. Especially in the PGOPTIONS docs it would be useful. People are already using it in the wild and I regressed on support for that in PgBouncer by accident: https://github.com/pgbouncer/pgbouncer/pull/1064
Re: First draft of PG 17 release notes
On Tue, 14 May 2024 at 02:56, Bruce Momjian wrote: > > On Sat, May 11, 2024 at 10:24:39AM -0400, Joe Conway wrote: > > On 5/11/24 09:57, Jelte Fennema-Nio wrote: > > > The buffering change improved performance up to ~40% in some of the > > > benchmarks. The case it improves mostly is COPY of large rows and > > > streaming a base backup. That sounds user-visible enough to me to > > > warrant an entry imho. > > > > +1 > > Attached patch applied. I think we shouldn't list this under the libpq changes and shouldn't mention libpq in the description, since this patch changes src/backend/libpq files instead of src/interfaces/libpq files. I think it should be in the "General performance" section and describe the change as something like the below: Improve performance when transferring large blocks of data to a client PS. I completely understand that this was not clear from the commit message.
Re: JIT compilation per plan node
On Tue, 14 May 2024 at 10:19, David Rowley wrote:
> Here's a plan where the total cost of a subnode is greater than the
> total cost of the top node:

Ah I didn't realize it was possible for that to happen. **reads up on plan costs** This actually makes me think that using the total_cost of the sub-nodes is not enough to determine if the node should be JITed. We wouldn't want to start jitting plans like this, i.e. introducing all the JIT setup overhead for just a single row:

set max_parallel_workers_per_gather=0;
create table t0 as select a from generate_Series(1,100)a;
analyze t0;
explain select a+a*a+a*a+a from t0 limit 1;

                          QUERY PLAN
--------------------------------------------------------------
 Limit  (cost=0.00..0.03 rows=1 width=4)
   ->  Seq Scan on t0  (cost=0.00..26980.00 rows=100 width=4)

An easy way to work around that issue I guess is by using the minimum total_cost of all the total_costs from the current sub-node up to the root node. The current minimum could be passed along as a part of the context I guess.
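That min-of-costs idea can be sketched as a top-down tree walk; this is an illustrative model using plain dicts, not PostgreSQL's planner code, and the threshold value is arbitrary:

```python
def assign_jit_flags(node, jit_above_cost, min_cost_above=float("inf")):
    """Walk a plan tree top-down, JITing a node only if the cheapest
    total_cost on the path from the root down to it exceeds the
    threshold. This avoids JITing an expensive Seq Scan that sits
    under a cheap Limit node."""
    effective = min(min_cost_above, node["total_cost"])
    node["jit"] = effective >= jit_above_cost
    for child in node.get("children", []):
        assign_jit_flags(child, jit_above_cost, effective)

# The plan from the EXPLAIN above: cheap Limit over an expensive scan.
plan = {
    "name": "Limit", "total_cost": 0.03,
    "children": [{"name": "Seq Scan", "total_cost": 26980.00}],
}
assign_jit_flags(plan, jit_above_cost=1000)
```

Comparing each node's own total_cost against the threshold would JIT the Seq Scan here; taking the running minimum down from the root does not, because the Limit caps the effective cost at 0.03.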
Re: JIT compilation per plan node
On Tue, 20 Feb 2024 at 06:38, David Rowley wrote: > > On Tue, 20 Feb 2024 at 18:31, Tom Lane wrote: > > FWIW, I seriously doubt that an extra walk of the plan tree is even > > measurable compared to the number of cycles JIT compilation will > > expend if it's called. So I don't buy your argument here. > > We would be better off to do this in a way that's clean and doesn't > > add overhead for non-JIT-enabled builds. > > The extra walk of the tree would need to be done for every plan, not > just the ones where we do JIT. I'd rather find a way to not add this > extra plan tree walk, especially since the vast majority of cases on > an average instance won't be doing any JIT. I'm not saying I'd prefer the extra walk, but I don't think you'd need to do this extra walk for all plans. Afaict you could skip the extra walk when top_plan->total_cost < jit_above_cost. i.e. only doing the extra walk to determine which exact nodes to JIT for cases where we currently JIT all nodes. That would limit the extra walk overhead to cases where we currently already spend significant resources on JITing stuff.
Re: Direct SSL connection with ALPN and HBA rules
On Mon, 13 May 2024 at 18:14, Robert Haas wrote:
> I disagree with Jacob's assertion that sslmode=require has no security
> benefits over sslmode=prefer. That seems like the kind of pessimism
> that makes people hate security professionals. There have got to be
> some attacks that are foreclosed by encrypting the connection, even if
> you don't stop MITM attacks or other things that are more
> sophisticated than running wireshark and seeing what goes by on the
> wire.

Like Jacob already said, I guess you meant me here. The main point I was trying to make is that sslmode=require is extremely insecure too, so if we're changing the default then I'd rather bite the bullet and actually make the default a secure one this time. No-one's browser trusts self-signed certs by default, but currently lots of people trust self-signed certs when connecting to their production database without realizing.

IMHO the only benefit that sslmode=require brings over sslmode=prefer is detecting incorrectly configured servers, i.e. servers that are supposed to support ssl but somehow don't due to a misconfigured GUC/pg_hba. Such "incorrectly configured server" detection avoids sending data to such a server, which an eavesdropper on the network could see. Which is definitely a security benefit, but it's an extremely small one. In all other cases sslmode=prefer brings exactly the same protection as sslmode=require, because sslmode=prefer encrypts the connection unless postgres actively tells the client to downgrade to plaintext (which never happens when the server is configured correctly).
Re: Direct SSL connection with ALPN and HBA rules
On Mon, 13 May 2024 at 13:07, Heikki Linnakangas wrote:
> "channel_binding=require sslmode=require" also protects from MITM attacks.

Cool, I didn't realize we had this connection option and that it could be used like this. But I think there are a few security downsides of channel_binding=require over sslmode=verify-full: if the client relies on channel_binding to validate server authenticity, a leaked server-side SCRAM hash is enough for an attacker to impersonate a server, while normally a leaked SCRAM hash isn't really much of a security concern (assuming long enough passwords). I also don't know of many people rotating their SCRAM hashes, even though many rotate TLS certs.

> I think these options should be designed from the user's point of view,
> so that the user can specify the risks they're willing to accept, and
> the details of how that's accomplished are handled by libpq. For
> example, I'm OK with (tick all that apply):
>
> [ ] passive eavesdroppers seeing all the traffic
> [ ] MITM being able to hijack the session
> [ ] connecting without verifying the server's identity
> [ ] divulging the plaintext password to the server
> [ ] ...

I think that sounds like a great idea, looking forward to the proposal.
Re: Direct SSL connection with ALPN and HBA rules
On Mon, 13 May 2024 at 15:38, Heikki Linnakangas wrote:
> Here's a patch to implement that.

+ if (conn->sslnegotiation[0] == 'd' &&
+     conn->sslmode[0] != 'r' && conn->sslmode[0] != 'v')

I think these checks should use strcmp instead of checking magic first characters. I see this same clever trick is used in the recently added init_allowed_encryption_methods, and I think that should be changed to use strcmp too for readability.
Re: Direct SSL connection with ALPN and HBA rules
On Sun, 12 May 2024 at 23:39, Heikki Linnakangas wrote:
> You might miss that by changing sslnegotiation to 'postgres', or by
> removing it altogether, you not only made it compatible with older
> server versions, but you also allowed falling back to a plaintext
> connection. Maybe you're fine with that, but maybe not. I'd like to
> nudge people to use sslmode=require, not rely on implicit stuff like
> this just to make connection strings a little shorter.

I understand your worry, but I'm not sure that this is actually much of a security issue in practice. sslmode=prefer and sslmode=require are the same amount of insecure imho (i.e. extremely insecure). The only reason sslmode=prefer would connect as non-ssl to a server that supports ssl is if an attacker has write access to the network in the middle (i.e. eavesdropping ability alone is not enough). Once an attacker has this level of network access, it's trivial for this attacker to read any data sent to Postgres by intercepting the TLS handshake and doing TLS termination with some arbitrary cert (any cert is trusted by sslmode=require).

So the only actual case where this is a security issue I can think of is when an attacker has only eavesdropping ability on the network, and somehow the Postgres server that the client tries to connect to is configured incorrectly, so that no ssl is set up at all. Then a client would drop to plaintext when connecting to this server, instead of erroring, and the attacker could now read the traffic. But I don't really see this scenario end up any differently when requiring people to enter sslmode=require. The only action a user can take to connect to a server that does not have ssl support is to remove sslmode=require from the connection string. Except if they are also the server operator, in which case they could enable ssl on the server. But if ssl is not set up, then it was probably never set up, and thus providing sslnegotiation=direct would be to test if ssl works.
But if you disagree, I'm fine with erroring on plain sslnegotiation=direct.

> In v18, I'd like to make sslmode=require the default. Or maybe introduce
> a new setting like "encryption=ssl|gss|none", defaulting to 'ssl'. If we
> want to encourage encryption, that's the right way to do it. (I'd still
> recommend everyone to use an explicit sslmode=require in their
> connection strings for many years, though, because you might be using an
> older client without realizing it.)

I'm definitely a huge proponent of making the connection defaults more secure. But as described above, sslmode=require is still extremely insecure and I don't think we gain anything substantial by making it the default. I think the only useful secure default would be to use sslmode=verify-full (with probably some automatic fallback to sslmode=prefer when connecting to hardcoded IPs or localhost). Which probably means that sslrootcert=system should also be made the default. Which would mean that ~/.postgresql/root.crt would not be the default anymore, which I personally think is fine but others likely disagree.
Re: Direct SSL connection with ALPN and HBA rules
On Fri, 10 May 2024 at 15:50, Heikki Linnakangas wrote:
> New proposal:
>
> - Remove the "try both" mode completely, and rename "requiredirect" to
> just "direct". So there would be just two modes: "postgres" and
> "direct". On reflection, the automatic fallback mode doesn't seem very
> useful. It would make sense as the default, because then you would get
> the benefits automatically in most cases but still be compatible with
> old servers. But if it's not the default, you have to fiddle with libpq
> settings anyway to enable it, and then you might as well use the
> "requiredirect" mode when you know the server supports it. There isn't
> anything wrong with it as such, but given how much confusion there's
> been on how this all works, I'd prefer to cut this back to the bare
> minimum now. We can add it back in the future, and perhaps make it the
> default at the same time. This addresses points 2. and 3. above.
>
> and:
>
> - Only allow sslnegotiation=direct with sslmode=require or higher. This
> is what you, Jacob, wanted to do all along, and addresses point 1.
>
> Thoughts?

Sounds mostly good to me. But I think we'd want to automatically increase sslmode to require if it is unset, but sslnegotiation is set to direct. Similar to how we bump sslmode to verify-full if sslrootcert is set to system, but sslmode is unset. i.e. it seems unnecessary/unwanted to throw an error if the connection string only contains sslnegotiation=direct.
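The suggested defaulting can be sketched as follows. This is an illustrative model of the proposed behavior, not libpq source; the sslrootcert=system → verify-full bump is existing libpq behavior, the sslnegotiation=direct → require bump is the suggestion above:

```python
def resolve_sslmode(options):
    """Pick an effective sslmode when the user left it unset:
    sslnegotiation=direct implies at least sslmode=require, mirroring
    how sslrootcert=system implies sslmode=verify-full."""
    sslmode = options.get("sslmode")
    if sslmode is not None:
        return sslmode  # an explicit user choice always wins
    if options.get("sslrootcert") == "system":
        return "verify-full"
    if options.get("sslnegotiation") == "direct":
        return "require"
    return "prefer"  # libpq's default
```

So a connection string containing only sslnegotiation=direct would silently get a compatible sslmode instead of an error.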
Re: First draft of PG 17 release notes
On Fri, 10 May 2024 at 23:31, Tom Lane wrote: > > Bruce Momjian writes: > > I looked at both of these. In both cases I didn't see why the user > > would need to know these changes were made: > > I agree that the buffering change is not likely interesting, but > the fact that you can now control-C out of a psql "\c" command > is user-visible. People might have internalized the fact that > it didn't work, or created complicated workarounds. The buffering change improved performance up to ~40% in some of the benchmarks. The case it improves mostly is COPY of large rows and streaming a base backup. That sounds user-visible enough to me to warrant an entry imho.
Re: First draft of PG 17 release notes
On Thu, 9 May 2024 at 06:04, Bruce Momjian wrote: > > I have committed the first draft of the PG 17 release notes; you can > see the results here: > > https://momjian.us/pgsql_docs/release-17.html Great work! There are two commits that I think would benefit from being listed (but maybe they are already listed and I somehow missed them, or they are left out on purpose for some reason): - c4ab7da60617f020e8d75b1584d0754005d71830 - cafe1056558fe07cdc52b95205588fcd80870362
Re: Add new protocol message to change GUCs for usage with future protocol-only GUCs
On Tue, 23 Apr 2024 at 19:39, Jacob Champion wrote: > > On Mon, Apr 22, 2024 at 2:20 PM Jelte Fennema-Nio wrote: > > 1. I strongly believe minor protocol version bumps after the initial > > 3.1 one can be made painless for clients/poolers (so the ones to > > 3.2/3.3/etc). Similar to how TLS 1.3 can be safely introduced, and not > > having to worry about breaking TLS 1.2 communication. > > Apologies for focusing on a single portion of your argument, but this > claim in particular stuck out to me. To my understanding, IETF worried > a _lot_ about breaking TLS 1.2 implementations with the TLS 1.3 > change, to the point that TLS 1.3 clients and servers advertise > themselves as TLS 1.2 and sneak the actual version used into a TLS > extension (roughly analogous to the _pq_ stuff). I vaguely recall that > the engineering work done for that update was pretty far from > painless. My bad... I guess TLS 1.3 was a bad example, due to it changing the handshake itself so significantly.
Re: Add new protocol message to change GUCs for usage with future protocol-only GUCs
On Tue, 23 Apr 2024 at 17:03, Robert Haas wrote: > If a client doesn't know about encrypted columns and sets that > bit at random, that will break things, and formally I think that's a > risk, because I don't believe we document anywhere that you shouldn't > set unused bits in the format mask. But practically, it's not likely. > (And also, maybe we should document that you shouldn't do that.) Currently Postgres errors out when you set anything other than 0 or 1 in the format field. And it's already documented that these are the only allowed values: "Each must presently be zero (text) or one (binary)." > Let's consider a hypothetical country much like Canada except that > there are three official languages rather than two: English, French, > and Robertish. I don't really understand the point you are trying to make with this analogy. Sure an almost equivalent unused language maybe shouldn't be put on the signs, but having two different names (aka protocol versions) for English and Robertish seems quite useful to avoid confusion. > And so here. If someone codes a connection pooler in the way you > suppose, then it will break. But, first of all, they probably won't do > that, both because it's not particularly likely that someone wants to > gather that particular set of statistics and also because erroring out > seems like an overreaction. Is it really that much of an overreaction? Postgres happily errors on any weirdness in the protocol including unknown format codes. Why wouldn't a pooler/proxy in the middle be allowed to do the same? The pooler has no clue what the new format means, maybe it requires some session state to be passed on to the server to be interpreted correctly. The pooler would be responsible for syncing that session state but might not have synced it because it doesn't know that the format code requires that. An example of this would be a compressed value of which the compression algorithm is configured using a GUC. 
Thus the pooler being strict in what it accepts would be a good way of notifying the client that the pooler needs to be updated (or the feature not used). But I do agree that it seems quite unlikely that a pooler would implement it like that. However, a pooler logging a warning seems not that crazy. And in that case the connection might not be closed, but the logs would be flooded.

> And secondly, let's imagine that we do
> bump the protocol version and think about whether and how that solves
> the problem. A client will request from the pooler a version 3.1
> connection and the pooler will say, sorry, no can do, I only
> understand 3.0. So the client will now say, oh ok, no problem, I'm
> going to refrain from setting that parameter format bit. Cool, right?
>
> Well, no, not really. First, now the client application is probably
> broken. If the client is varying its behavior based on the server's
> protocol version, that must mean that it cares about accessing
> encrypted columns, and that means that the bit in question is not an
> optional feature. So actually, the fact that the pooler can force the
> client to downgrade hasn't fixed anything at all.

It seems quite a lot nicer if the client can safely fall back to not writing data using the new format code, instead of getting an error from the pooler/flooding the pooler logs/having the pooler close the connection.

> Third, applications, drivers, and connection poolers now all need to
> worry about handling downgrades smoothly. If a connection pooler
> requests a v3.1 connection to the server and gets v3.0, it had better
> make sure that it only advertises 3.0 to the client.

This seems quite straightforward to solve from a pooler perspective:
1. Connect to the server first to find out what the maximum version is that it supports
2. If a client asks for a higher version, advertise the server version

> If the client
> requests v3.0, the pooler had better make sure to either request v3.0
> from the server.
> Or alternatively, the pooler can be prepared to
> translate between 3.0 and 3.1 wherever that's needed, in either
> direction. But it's not at all clear what that would look like for
> something like TCE. Will the pooler arrange to encrypt parameters
> destined for encrypted tables if the client doesn't do so? Will it
> arrange to decrypt values coming from encrypted tables if the client
> doesn't understand encryption?

This is a harder problem, and indeed one I hadn't considered much. From a pooler perspective I really would like to be able to have just one pool of connections to the server, all initially connected with the 3.1 protocol, and use those 3.1 server connections for clients that use the 3.1 protocol but also for clients that use the 3.0 protocol. Creating separate pools of server connections for different protocol versions is super annoying to manage for operators (e.g. how should these pools be sized). One of the reasons I'm proposing the ParameterSet message is to avoid this problem for protocol parameters.
Re: Reducing the log spam
On Thu, 2 May 2024 at 13:08, Jelte Fennema-Nio wrote:
> 2. Change the newly added check in errcode() to only set
> output_to_server to false when IsLogicalWorker() returns false.

Actually a third, and probably even better, solution would be to only apply this new GUC to non-backgroundworker processes. That seems quite reasonable, since the only way to access background worker errors is often through the logs.
Re: Reducing the log spam
On Thu, 2 May 2024 at 12:47, Laurenz Albe wrote:
> Yes. But I'd argue that that is a shortcoming of logical replication:
> there should be a way to get this information via SQL. Having to look into
> the log file is not a very useful option.

Definitely agreed that accessing the error details using SQL would be much better. But having no way at all (by default) to find the cause of the failure is clearly much worse.

> The feature will become much less useful if unique violations keep getting
> logged.

Agreed. How about changing the patch so that the GUC is not applied to logical replication apply workers (and document that accordingly). I can think of two ways of achieving that (but there might be other/better ones):
1. Set the GUC to empty string when an apply worker is started.
2. Change the newly added check in errcode() to only set output_to_server to false when IsLogicalWorker() returns false.
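The second option can be sketched as a toy model of the errcode() decision. All names here are illustrative stand-ins (the suppression GUC is modeled as a set of SQLSTATE codes), not the actual PostgreSQL C symbols:

```python
def should_log_to_server(sqlstate, suppressed_sqlstates, is_logical_worker):
    """Model of the proposed check: the log-suppression GUC only
    applies to regular backends, so errors raised by logical
    replication apply workers always still reach the server log."""
    if is_logical_worker:
        return True
    return sqlstate not in suppressed_sqlstates

# 23505 is unique_violation -- exactly the case discussed above:
# suppressed for normal backends, still logged for apply workers.
suppressed = {"23505"}
```

This keeps the spam reduction for interactive sessions while preserving the only (default) way to diagnose a failing subscription.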
Re: Add new protocol message to change GUCs for usage with future protocol-only GUCs
On Mon, 22 Apr 2024 at 16:26, Robert Haas wrote:
> That's a fair point, but I'm still not seeing much practical
> advantage. It's unlikely that a client is going to set a random bit in
> a format parameter for no reason.

I think you're missing an important point of mine here. The client wouldn't be "setting a random bit in a format parameter for no reason". The client would decide it is allowed to set this bit, because the PG version it connected to supports column encryption (e.g. PG18). But this completely breaks protocol and application layer separation. It doesn't seem completely outside of the realm of possibility for a pooler to gather some statistics on the number of Bind messages that use text vs binary query parameters. That's very easily doable now, while looking only at the protocol layer. If a client then sets the new format parameter bit, this pooler could then get confused and close the connection.

> Perhaps this is the root of our disagreement, or at least part of it.
> I completely agree that it is important for human beings to be able to
> understand whether, and how, the wire protocol has changed from one
> release to another.

I think this is partially the reason for our disagreement, but I think there are at least two other large reasons:

1. I strongly believe minor protocol version bumps after the initial 3.1 one can be made painless for clients/poolers (so the ones to 3.2/3.3/etc). Similar to how TLS 1.3 can be safely introduced, without having to worry about breaking TLS 1.2 communication. Once clients and poolers implement version negotiation support for 3.1, there's no reason for version negotiation support that works for 3.0 and 3.1 to then suddenly break on the 3.2 bump. To be clear, I'm talking about the act of bumping the version here, not the actual protocol changes.
So assuming zero/near-zero client implementation effort for the new features (like never setting the newly supported bit in a format parameter), then bumping the protocol version for these new features can never have negative consequences. 2. I very much want to keep a clear split between the protocol layer and the application layer of our communication. And these layers merge whenever (like you say) "the wire protocol has changed from one release to another", but no protocol version bump or protocol extension is used to indicate that. When that happens the only way for a client to know what valid wire protocol messages are according to the server, is by checking the server version. This completely breaks the separation between layers. So, while checking the server version indeed works for direct client to postgres communication, it starts to break down whenever you put a pooler inbetween (as explained in the example earlier in this email). And it breaks down even more when connecting to servers that implement the Postgres wire protocol, but are not postgres at all, like CockroachDB. Right now libpq and other postgres drivers can be used to talk to these other servers and poolers, but if we start mixing protocol and application layer stuff then eventually that will stop being the case. Afaict from your responses, you disagree with 1. However, it's not at all clear to me what exact problems you're worried about. It sounds like you don't know either, and it's more that you're worried about things breaking for not yet known reasons. I hoped to take away/reduce those worries using some arguments in a previous email (quoted below), but you didn't respond to those arguments, so I'm not sure if they were able to change your mind. On Thu, 18 Apr 2024 at 21:34, Jelte Fennema-Nio wrote: > When the server supports a lower version than the client, the client > should disable certain features because it gets the > ProtocolVersionNegotiation message. 
> This is also true if we don't bump
> the version. Negotiating a lower version actually makes it clearer for
> the client what features to disable. Using the reported postgres
> version for this might not, because a connection pooler in the middle
> might not support the features that the client wants and thus throw an
> error (e.g. due to the client sending unknown messages) even if the
> backing Postgres server would support these features. Not to mention
> non-postgresql servers that implement the PostgreSQL protocol (of
> which there are more and more).
>
> When the server supports a higher version, the client never even
> notices this because the server will silently accept and only enable
> the features of the lower version. So this could never cause breakage,
> as from the client's perspective the server didn't bump their protocol
> version.
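To make the pooler-statistics example concrete: counting text vs binary parameter format codes in a Bind ('B') message needs only protocol-layer parsing, and an unknown format code is exactly what would confuse such a pooler. A hedged Python sketch (not a real pooler implementation, just the message-body parsing per the documented Bind layout):

```python
import struct

def count_param_formats(body: bytes) -> dict[str, int]:
    """Count parameter format codes in a Bind message body.

    Bind layout: portal name (cstring), statement name (cstring),
    int16 number of format codes, then that many int16 codes.
    """
    # Skip the two null-terminated names (portal, then statement).
    pos = body.index(b"\x00", body.index(b"\x00") + 1) + 1
    (nformats,) = struct.unpack_from("!H", body, pos)
    pos += 2
    counts = {"text": 0, "binary": 0, "unknown": 0}
    for _ in range(nformats):
        (code,) = struct.unpack_from("!H", body, pos)
        pos += 2
        if code == 0:
            counts["text"] += 1
        elif code == 1:
            counts["binary"] += 1
        else:
            # e.g. a hypothetical new bit set by a newer client:
            # this is where a strict pooler might close the connection.
            counts["unknown"] += 1
    return counts
```

A pooler doing this today would classify every message correctly; a client that starts setting a new format bit without any version negotiation breaks that assumption.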
Re: Support a wildcard in backtrace_functions
On Sat, 20 Apr 2024 at 01:19, Michael Paquier wrote:
> Removing this GUC and making the backend react by default the same way
> as when this GUC was enabled sounds like a sensible route here. This
> stuff is useful.

I definitely agree it's useful. But I feel like changing the default of the GUC and removing the ability to disable it at the same time are pretty radical changes that we should not be making after a feature freeze. I think we should at least have a way to turn this feature off to be able to avoid log spam, especially given the fact that extensions use elog much more freely than core. Which, afaict from other threads[1], Tom also thinks we should normally be careful about.

Of the options to resolve the open item so far, I think there are only three that are somewhat reasonable to do after the feature freeze:
1. Rename the GUC to something else (but keep the behaviour the same)
2. Decide to keep the GUC as is
3. Revert a740b213d4

I hoped 1 was possible, but based on the discussion so far it doesn't seem like we'll be able to get a quick agreement on a name. IMHO 2 is just a version of 1, but with a GUC name that no-one thinks is a good one. So I think we are left with option 3.

[1]: https://www.postgresql.org/message-id/flat/524751.1707240550%40sss.pgh.pa.us#59710fd4f3f186e642b8e6b886b2fdff
Re: Support a wildcard in backtrace_functions
On Fri, 19 Apr 2024 at 10:58, Peter Eisentraut wrote: > This presupposes that there is consensus about how the future > functionality should look like. This topic has gone through half a > dozen designs over a few months, and I think it would be premature to > randomly freeze that discussion now and backport that design. While there maybe isn't consensus on what a new design exactly looks like, I do feel like there's consensus on this thread that the backtrace_on_internal_error GUC is almost certainly not the design that we want. I guess a more conservative approach to this problem would be to revert the backtrace_on_internal_error commit and agree on a better design for PG18. But I didn't think that would be necessary if we could agree on the name for a more flexibly named GUC, which seemed quite possible to me (after a little bit of bikeshedding). > If a better, more comprehensive design arises for PG18, I think it would > be pretty easy to either remove backtrace_on_internal_error or just > internally remap it. I think releasing a GUC (even if it's just meant for developers) in PG17 and then deprecating it for a newer version in PG18 wouldn't be a great look. And even if that's not a huge problem, it still seems better not to have the problem at all. Renaming the GUC now seems to have only upsides to me: worst case the new design turns out not to be what we want either, and we have to deprecate the GUC. But in the best case we don't need to deprecate anything.
Re: Add new protocol message to change GUCs for usage with future protocol-only GUCs
On Thu, 18 Apr 2024 at 23:36, Jelte Fennema-Nio wrote:
> To clarify: My proposed approach is to use a protocol extension
> parameter to enable the new messages that the server can send
> (like Peter is doing now). And **in addition to that** gate the new
> Bind format type behind a feature switch.

ugh, correction: gate the new Bind format type behind a **protocol bump**
Re: Add new protocol message to change GUCs for usage with future protocol-only GUCs
On Thu, 18 Apr 2024 at 22:17, Robert Haas wrote:
> If we go with Peter's approach, every driver that supports his feature
> will work perfectly, and every driver that doesn't will work exactly
> as it does today. The risk of breaking anything is as near to zero as
> human developers can reasonably hope to achieve. Nobody who doesn't
> care about the feature will have to lift a single finger, today,
> tomorrow, or ever. That's absolutely brilliant.
>
> If we instead go with your approach, then anyone who wants to support
> 3.2 when it materializes will have to also support 3.1, which means
> they have to support this feature.

To clarify: My proposed approach is to use a protocol extension parameter to enable the new messages that the server can send (like Peter is doing now). And **in addition to that** gate the new Bind format type behind a feature switch. There is literally nothing clients will have to do to "support" that feature (except for requesting a higher protocol version). Your suggestion of not bumping the version but still allowing the new format type on version 3.0 doesn't have any advantage afaict, except secretly hiding from any pooler in the middle that such a format type might be sent.

> Also, even just 3.1 is going to break
> something for somebody. There's just no way that we've left the
> protocol version unchanged for this long and the first change we make
> doesn't cause some collateral damage.

Sure, but the exact same argument holds for protocol extension parameters. We've never set them, so they are bound to break something the first time. My whole point is that once we bite that bullet, the next protocol parameters and protocol version bumps won't cause such breakage.

> Sure, those are minor downsides in the grand scheme of things. But
> AFAICS the only downside of Peter's approach that you've alleged is
> that doesn't involve bumping the version number.
> Of course, if bumping
> the version number is an intrinsic good, then no further justification
> is required, but I don't buy that. I do not believe that users or
> maintainers will throw us a pizza party when they find out that we've
> changed the version number. There's no reason for anyone to be happy
> about that for its own sake.

As a connection pooler maintainer I would definitely love it if every protocol change required either a protocol extension parameter or a protocol version bump. That way I can easily check every release whether the protocol changed by looking at two things, instead of diffing the protocol docs to find some tiny "supposedly irrelevant" change that was made.
Re: Add new protocol message to change GUCs for usage with future protocol-only GUCs
On Mon, 15 Apr 2024 at 21:47, Dave Cramer wrote:
>> On Mon, 15 Apr 2024 at 19:52, Robert Haas wrote:
>> > surely it can't be right to use protocol
>> > version 3.0 to refer to a bunch of different things. But at the same
>> > time, surely we don't want clients to start panicking and bailing out
>> > when everything would have been fine.
>>
>> I feel like the ProtocolVersionNegotiation should make sure people
>> don't panic and bail out. And if not, then feature flags won't help
>> with this either. Because clients would then still bail out if some
>> feature is not supported.
>
> I don't think a client should ever bail out. They may not support something
> but IMO bailing out is not an option.

On Thu, 18 Apr 2024 at 21:01, Robert Haas wrote:
> On Thu, Apr 18, 2024 at 1:49 PM Jelte Fennema-Nio wrote:
> > IMHO that means that we should also bump the protocol version for this
> > change, because it's changing the wire protocol by adding a new
> > parameter format code. And it does so in a way that does not depend on
> > the new protocol extension.
>
> I think we're more or less covering the same ground we did on the
> other thread here -- in theory I don't love the fact that we never
> bump the protocol version when we change stuff, but in practice if we
> start bumping it every time we do anything I think it's going to just
> break a bunch of stuff without any real benefit.

(the second quoted message comes from Peter's column encryption thread, but I'm responding here to keep this discussion in one place)

I really don't understand what exactly you're worried about. What do you expect will break when bumping the protocol version? As Dave said, clients should never bail out due to protocol version differences.

When the server supports a lower version than the client, the client should disable certain features because it gets the ProtocolVersionNegotiation message. This is also true if we don't bump the version.
Negotiating a lower version actually makes it clearer for the client what features to disable. Using the reported postgres version for this might not, because a connection pooler in the middle might not support the features that the client wants and thus throw an error (e.g. due to the client sending unknown messages) even if the backing Postgres server would support these features. Not to mention non-postgresql servers that implement the PostgreSQL protocol (of which there are more and more).

When the server supports a higher version, the client never even notices this, because the server will silently accept and only enable the features of the lower version. So this could never cause breakage, as from the client's perspective the server didn't bump their protocol version.

So, I don't understand why you seem to view bumping the protocol version with so much negativity. We're also bumping PG versions every year. Afaik people only like that, partially because it's immediately clear that certain features (e.g. MERGE) are not supported when connecting to older servers. To me the same is true for bumping the protocol version. There are no downsides to bumping it, only upsides.
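To illustrate why negotiation makes downgrading painless, here is a hedged Python sketch of the client-side logic described above (the version-to-feature mapping is invented for illustration; it is not any driver's actual API):

```python
# Hypothetical sketch: a client requests the newest protocol version it
# knows and simply disables features the server didn't agree to, instead
# of bailing out. Version/feature names are illustrative only.

REQUESTED = (3, 2)  # newest version this client implements
FEATURES_BY_VERSION = {
    (3, 1): {"protocol_parameters"},
    (3, 2): {"new_bind_format"},
}

def negotiate(server_supported: tuple[int, int]) -> set[str]:
    """Return the feature set the client may use after negotiation.

    The agreed version is the lower of what the client requested and
    what the server (or pooler in the middle) supports.
    """
    agreed = min(REQUESTED, server_supported)
    enabled: set[str] = set()
    for version, features in FEATURES_BY_VERSION.items():
        if version <= agreed:
            enabled |= features
    return enabled
```

A 3.0-only server or pooler leaves everything new disabled; a 3.2 server enables it all. Either way the connection proceeds, which is the point of the argument above.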
Re: Transparent column encryption
On Thu, 18 Apr 2024 at 18:46, Robert Haas wrote: > With regard to the Bind message, I suggest that we regard the protocol > change as reserving a currently-unused bit in the message to indicate > whether the value is pre-encrypted, without reference to the protocol > extension. It could be legal for a client that can't understand > encryption message from the server to supply an encrypted value to be > inserted into a column. And I don't think we would ever want the bit > that's being reserved here to be used by some other extension for some > other purpose, even when this extension isn't used. So I don't see a > need for this to be tied into the protocol extension. I think this is an interesting idea. I can indeed see use cases for e.g. inserting a new row based on another row (where the secret is the same). IMHO that means that we should also bump the protocol version for this change, because it's changing the wire protocol by adding a new parameter format code. And it does so in a way that does not depend on the new protocol extension.
Re: Transparent column encryption
On Thu, 18 Apr 2024 at 13:25, Peter Eisentraut wrote:
> Hopefully, the reason for key rotation is mainly that policies require
> key rotation, not that keys get compromised all the time.

These key rotation policies are generally in place to reduce the impact of a key compromise by limiting the time a compromised key is valid.

> This
> seems pretty standard to me. For example, I can change the password on
> my laptop's file system encryption, which somehow wraps a lower-level
> key, but I can't reencrypt the actual file system in place.

I think the threat model for this proposal and a laptop's file system encryption are different enough that the same choices/tradeoffs don't automatically translate. Specifically, in this proposal the unencrypted CEK is present on all servers that need to read/write those encrypted values. And a successful attacker would then be able to read the encrypted values forever with this key, because it effectively cannot be rotated. That is a much bigger attack surface and risk than a laptop's disk encryption. So, I feel quite strongly that shipping the proposed feature without being able to re-encrypt columns in an online fashion would be a mistake.

> That's the
> reason for having this two-tier key system in the first place.

If we allow for online rotation of the actual encryption key, then maybe we don't even need this two-tier system ;) Not having this two-tier system would have a few benefits in my opinion:
1. We wouldn't need to be sending encrypted key material from the server to every client. Which seems nice from a security, bandwidth and client implementation perspective.
2. Asymmetric encryption of columns is suddenly an option, allowing certain clients to enter encrypted data into the database but not read it.

> Two problems here: One, for deterministic encryption, everyone needs to
> agree on the representation, otherwise equality comparisons won't work.
> Two, if you give clients the option of storing text or binary, then
> clients also get back a mix of text or binary, and it will be a mess.
> Just giving the option of storing the payload in binary wouldn't be that
> hard, but it's not clear what you can sensibly do with that in the end.

How about defining at column creation time whether the underlying value should be binary or not? Something like:

CREATE TABLE t(
    mytime timestamp ENCRYPTED WITH (column_encryption_key = cek1, binary = true)
);

> Even if the identifiers
> somehow were global (but OIDs can also change and are not guaranteed
> unique forever),

OIDs of existing rows can't just change while a connection is active, right? (all I know is upgrades can change them, but that seems fine) Also, they are unique within a catalog table, right?

> the state of which keys have already been sent is
> session state.

I agree that this is the case. But it's state that can be tracked fairly easily by a transaction pooler, similar to how prepared statements can be tracked. And this is easier to do when the IDs of the same keys are the same across each session to the server, because if they differ then you need to do re-mapping of IDs.

> This is kind of like SASL or TLS can add new methods dynamically without
> requiring a new version. I mean, as we are learning, making new
> protocol versions is kind of hard, so the point was to avoid it.

Fair enough.

> I guess you could do that, but wouldn't that making the decoding of
> these messages much more complicated? You would first have to read the
> "short" variant, decode the format, and then decide to read the rest.
> Seems weird.

I see your point. But with the current approach, even for queries that don't return any encrypted columns, these useless fields would be part of the RowDescription. It seems quite annoying to add extra network and parsing overhead to all of your queries, even if only a small percentage use the encryption feature.
Maybe we should add a new message type instead, like EncryptedRowDescription, or add some flag field at the start of RowDescription that can be used to indicate that there is encryption info for some of the columns.

> Yes, that's what would happen, and that's the intention, so that for
> example you can use pg_dump to back up encrypted columns without having
> to decrypt them.

Okay, makes sense. But I think it would be good to document that.

> > A related question to this is that currently libpq throws an error if
> > e.g. a master key realm is not defined but another one is. Is that
> > really what we want? Is not having one of the realms really that
> > different from not providing any realms at all?
>
> Can you provide a more concrete example of what scenario you have a
> concern about?

A server has tables A and B. A is encrypted with master key realm X and B is encrypted with master key realm Y. If libpq is only given a key for realm X, and it then tries to read table B, an error is thrown. While if you don't provide any realm at all, you can read from table B just fine.
Re: Support a wildcard in backtrace_functions
On Thu, 18 Apr 2024 at 09:02, Michael Paquier wrote:
> > On Fri, Apr 12, 2024 at 09:36:36AM +0900, Michael Paquier wrote:
> > log_backtrace speaks a bit more to me as a name for this stuff because
> > it logs a backtrace. Now, there is consistency on HEAD as well
> > because these GUCs are all prefixed with "backtrace_". Would
> > something like a backtrace_mode where we have an enum rather than a
> > boolean be better?

I guess it depends what we want consistency with: naming consistency with all other LOGGING_WHAT GUCs, or naming consistency with the current backtrace_functions GUC. I personally like log_backtrace slightly better, but I don't have a super strong opinion on this either. Another option could be plain "backtrace".

> > One thing would be to redesign the existing GUC as
> > having two values on HEAD as of:
> > - "none", to log nothing.
> > - "internal", to log backtraces for internal errors.

If we go for backtrace_mode or backtrace, then I think I'd prefer "disabled"/"off" and "internal_error" for these values.

> The rest of the proposals had better happen as a v18 discussion, where
> extending this GUC is a benefit.

Agreed.
Re: Support a wildcard in backtrace_functions
On Thu, 18 Apr 2024 at 10:50, Peter Eisentraut wrote: > Why exactly is this an open item? Is there anything wrong with the > existing feature? The name of the GUC backtrace_on_internal_error is so specific that it's impossible to extend our backtrace behaviour in future releases without adding yet another backtrace GUC. You started the discussion on renaming it upthread: On Fri, 8 Mar 2024 at 15:51, Peter Eisentraut wrote: > What is the relationship of these changes with the recently added > backtrace_on_internal_error? We had similar discussions there, I feel > like we are doing similar things here but slightly differently. Like, > shouldn't backtrace_functions_min_level also affect > backtrace_on_internal_error? Don't you really just want > backtrace_on_any_error? You are sneaking that in through the backdoor > via backtrace_functions. Can we somehow combine all these use cases > more elegantly? backtrace_on_error = {all|internal|none}?
Re: Speed up clean meson builds by ~25%
On Wed, 17 Apr 2024 at 16:00, Tom Lane wrote:
> Break gram.y (say, misspell some token in a production) and
> see what happens. The behavior's likely to be timing sensitive
> though.

Thanks for clarifying. It took me a little while to break gram.y in such a way that I was able to reproduce the problem consistently, but I managed in the end using the attached small diff, and then running ninja in non-parallel mode:

ninja -C build all -j1

As I expected, this problem was indeed fairly easy to address by still building "backend/parser" before "interfaces". See attached patch.

v1-0001-Speed-up-clean-parallel-meson-builds-a-lot.patch
Description: Binary data

trigger-confusing-build-error.diff
Description: Binary data
Re: Speed up clean meson builds by ~25%
On Wed, 17 Apr 2024 at 13:41, Peter Eisentraut wrote: > > On 10.04.24 17:33, Tom Lane wrote: > > The immediate question then is do we want to take Jelte's patch > > as a way to ameliorate the pain meanwhile. I'm kind of down on > > it, because AFAICS what would happen if you break the core > > grammar is that (most likely) the failure would be reported > > against ecpg first. That seems pretty confusing. > > Yeah that would be confusing. How can I test if this actually happens? Because it sounds like if that indeed happens it should be fixable fairly easily.
Re: Add new protocol message to change GUCs for usage with future protocol-only GUCs
On Mon, 15 Apr 2024 at 19:43, Robert Haas wrote:
> On Sat, Apr 6, 2024 at 6:14 PM Jelte Fennema-Nio wrote:
> > I think for clients/drivers, the work would generally be pretty
> > minimal. For almost all proposed changes, clients can "support" the
> > protocol version update by simply not using the new features, ...
>
> I mean, I agree if a particular protocol version bump does nothing
> other than signal the presence of some optional, ignorable feature,
> then it doesn't cause a problem if we force clients to support it. But
> that seems a bit like saying that eating wild mushrooms is fine
> because some of them aren't poisonous. The point is that if we roll
> out two protocol changes, A and B, each of which requires the client
> to make some change in order to work with the newer protocol version,
> then using version numbers as the gating mechanism requires that the
> client can't support the newer of those two changes without also
> supporting the older one. Using feature flags doesn't impose that
> constraint, which I think is a plus.

I think we're in agreement here, i.e. it depends on the situation whether a feature flag or a version bump is more appropriate. I think the guidelines could be as follows:
1. For protocol changes that require "extremely minimal" work from clients & poolers: bump the protocol version.
2. For "niche" features that require some work from clients and/or poolers: use a protocol parameter feature flag.
3. For anything in between, let's discuss the tradeoffs on the thread for that specific protocol change.

On Mon, 15 Apr 2024 at 19:52, Robert Haas wrote:
> surely it can't be right to use protocol
> version 3.0 to refer to a bunch of different things. But at the same
> time, surely we don't want clients to start panicking and bailing out
> when everything would have been fine.

I feel like the ProtocolVersionNegotiation should make sure people don't panic and bail out. And if not, then feature flags won't help with this either.
Because clients would then still bail out if some feature is not supported.

> I'm unconvinced that we should let ParameterSet change
> non-PGC_PROTOCOL GUCs. The pooler can agree on a list of protocol GUCs
> with the end client that differs from what the server agreed with the
> pooler - e.g., it can add pgbouncer.pool_mode to the list. But for
> truly non-protocol GUCs, we have a lot of ways to set those already.

I feel like you're glossing over something fairly important here. How exactly would the client know about pgbouncer.pool_mode? Are you envisioning a list of GUCs which can be changed using ParameterSet, which the server then sends to the client during connection startup (using presumably some new protocol message)? If so, then I feel this same problem still exists: How would the client know which of those GUCs change wire-protocol behaviour and which don't? It would still need a hardcoded list (now including pgbouncer.pool_mode and maybe more) of things that a user is allowed to change using ParameterSet. So I think a well-known prefix would still be applicable.

To be clear, imho the well-known prefix discussion is separate from the discussion about whether Postgres should throw an ERROR when ParameterSet is used to change any non-PGC_PROTOCOL GUC. I'd be fine with disallowing that if that seems better/safer/clearer to you (although I'd love to hear your exact concerns about this). But I'd still want a well-known prefix for protocol parameters. Because that prefix is not for the benefit of the server, it's for the benefit of the client and pooler, so the client/pooler can error if any dangerous GUC is being changed, because the server would accept it and change the wire protocol accordingly.
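As a hedged sketch of how a pooler or client could use such a well-known prefix (`_pq_.` is the existing protocol extension prefix; the pooler-specific names and the classification labels are invented for illustration):

```python
# Illustrative sketch only: decide how a pooler should treat a
# ParameterSet for a given parameter name, using a well-known prefix
# instead of a hardcoded per-release list of GUCs.

PROTOCOL_PREFIX = "_pq_."                 # well-known protocol parameter prefix
POOLER_HANDLED = {"pgbouncer.pool_mode"}  # hypothetical pooler-specific additions

def classify_parameter(name: str) -> str:
    if name in POOLER_HANDLED:
        # The pooler consumes this itself and doesn't forward it.
        return "handled-by-pooler"
    if name.startswith(PROTOCOL_PREFIX):
        # May change wire-protocol behaviour: the pooler must understand
        # it before forwarding, or refuse the connection.
        return "protocol"
    # Plain GUC: reject via ParameterSet, there are other ways to SET it.
    return "reject"
```

The point is that the prefix lets a pooler make this decision without knowing every server release's GUC list in advance.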
Re: psql: Greatly speed up "\d tablename" when not using regexes
On Wed, 10 Apr 2024 at 23:51, Tom Lane wrote:
> Is it really necessary for Citus' filter to be a security qual rather
> than a plain ol' filter condition?

No, it's not. I think using security quals simply required the least amount of code (and it worked just fine if you didn't have lots of tables). I created a PR for Citus to address this issue[1] by changing to a normal filter condition. Thanks a lot for pointing me in the right direction to fix this.

> That is, as long as the derived condition is leakproof, there's no
> reason not to let it go before the security qual. We're probably
> failing to consider generating derived quals for anything that isn't
> qualified to become an indexqual, and this example shows that that's
> leaving money on the table.

Even though my immediate issue is fixed, I think this would be a good improvement anyway.

[1]: https://github.com/citusdata/citus/pull/7577
Re: psql: Greatly speed up "\d tablename" when not using regexes
On Wed, 10 Apr 2024 at 22:11, Tom Lane wrote:
> There may be an argument for psql to do what you suggest,
> but so far it seems like duplicative complication.
>
> If there's a case you can demonstrate where "\d foo" doesn't optimize
> into an indexscan, we should look into exactly why that's happening,
> because I think the cause must be more subtle than this.

Hmm, okay, so I took a closer look and you're completely right: It's quite a lot more subtle than I initially thought. The query from "\d foo" is fast as long as you don't have Citus installed. It turns out that Citus breaks this regex index search optimization by adding "NOT relation_is_a_known_shard(c.oid)" to the securityQuals of the rangeTableEntry for pg_class in its planner hook.

Citus does this to filter out the underlying shards of a table for every query on pg_class. The reason is that these underlying shards cluttered the output of \d and PgAdmin etc. Users also tended to get confused by them, sometimes badly enough to remove them (and thus requiring a restore from backup). We have a GUC to turn this filtering off for advanced users:

SET citus.show_shards_for_app_name_prefixes = '*';

If you set that, the index is used and the query is fast again, just like what is happening for you. Not using the regex search also worked as a way to trigger an index scan.

I'll think/research a bit tomorrow and try some stuff out to see if this is fixable in Citus. That would definitely be preferable to me, as it would solve this issue on all Postgres/psql versions that Citus supports. If I cannot think of a way to address this in Citus, would it be possible to still consider merging this patch (with comments added explaining that Citus is the reason)? Because this planner issue that Citus's behaviour introduces is also fixed by the change I proposed in my patch (I'm not yet sure how exactly).
Re: psql: Greatly speed up "\d tablename" when not using regexes
On Wed, 10 Apr 2024 at 20:21, Kirill Reshke wrote:
> Do we need to force Collaction here like in other branches?
> if (PQserverVersion(conn) >= 12)
>     appendPQExpBufferStr(buf, " COLLATE pg_catalog.default");

According to the commit and code comment that introduced the COLLATE, it was specifically added for correct regex matching (e.g. \w). So I don't think it's necessary, and I'm pretty sure adding it will cause the index scan not to be used anymore.
Re: psql: Greatly speed up "\d tablename" when not using regexes
On Wed, 10 Apr 2024 at 20:06, Tom Lane wrote: > Really? ISTM this argument is ignoring an optimization the backend > has understood for a long time. Interesting. I didn't know about that optimization. I can't check right now, but probably the COLLATE breaks that optimization.
psql: Greatly speed up "\d tablename" when not using regexes
Running "\d tablename" from psql could take multiple seconds when running on a system with 100k+ tables. The reason for this was that a sequential scan on pg_class takes place, due to regex matching being used. Regex matching is obviously unnecessary when we're looking for an exact match. This patch checks for this (common) case and uses plain equality instead.

v1-0001-psql-Greatly-speed-up-d-tablename-when-not-using-.patch
Description: Binary data
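The core idea can be sketched as follows (logic approximated in Python; this is not psql's actual code, and the generated SQL fragments are only illustrative):

```python
# Illustrative sketch: if a \d pattern contains no regex metacharacters,
# it can only ever match exactly itself, so plain equality (which can use
# an index on relname) is equivalent to the anchored regex match.

REGEX_SPECIALS = set('.+*?^$()[]{}|\\')

def relation_filter(pattern: str) -> str:
    """Return a WHERE-clause fragment for the given \\d pattern."""
    if any(ch in REGEX_SPECIALS for ch in pattern):
        # Genuine pattern: keep the anchored regex match.
        return f"c.relname ~ '^({pattern})$'"
    # Plain identifier: exact comparison, index-friendly.
    return f"c.relname = '{pattern}'"
```

With 100k+ tables the difference is a btree index scan on pg_class vs a sequential scan applying the regex to every row.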
Re: Transparent column encryption
On Wed, 10 Apr 2024 at 12:13, Peter Eisentraut wrote:
> To kick some things off for PG18, here is an updated version of the
> patch for automatic client-side column-level encryption.

I only read the docs and none of the code, but here is my feedback on the current design:

> (The CEK can't be rotated easily, since
> that would require reading out all the data from a table/column and
> reencrypting it. We could/should add some custom tooling for that,
> but it wouldn't be a routine operation.)

This seems like something that requires some more thought, because CEK rotation seems just as important as CMK rotation (often both would be compromised at the same time). As far as I can tell, the only way to rotate a CEK is by re-encrypting the column for all rows in a single go at the client side, thus taking a write lock on all rows of the table. That seems quite problematic, because it makes key rotation an operation that requires application downtime. Allowing online rotation is important, otherwise almost no-one will do it preventatively at a regular interval.

One way to allow online CEK rotation is by allowing a column to be encrypted by one of several keys and/or allowing a key to have multiple versions, and then storing for each row which key/version it was encrypted with. That way, for new insertions/updates clients would use the newest version, but clients would still be able to decrypt both old rows with the old key and new rows encrypted with the new key, because the server would give them both keys and tell them which row was encrypted with which. The old rows can then be rewritten by a client in small batches, so that writes to the table can keep working while this operation takes place. This could even be used to allow encrypting previously unencrypted columns using something like "ALTER COLUMN mycol ENCRYPTION KEY cek1". Unencrypted rows could then be indicated by e.g. returning something like NULL for the CEK.
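To make the versioned-key idea concrete, here is a toy Python model (XOR stands in for real authenticated encryption, and nothing here reflects the actual proposed wire format or any real API; it only demonstrates rows tagged with a key version plus incremental re-encryption):

```python
from itertools import cycle

# Toy stand-in cipher: XOR with a repeating key. A real implementation
# would use an AEAD scheme; this is only to show the bookkeeping.
def xor(data: bytes, key: bytes) -> bytes:
    return bytes(b ^ k for b, k in zip(data, cycle(key)))

keys = {1: b"old-key", 2: b"new-key"}  # CEK versions known to the client
current_version = 2

# Each stored value remembers which key version encrypted it.
rows = [(1, xor(b"alice", keys[1])), (1, xor(b"bob", keys[1]))]

def decrypt(row):
    version, ciphertext = row
    return xor(ciphertext, keys[version])

def rotate_batch(rows, batch_size=1):
    """Re-encrypt at most batch_size old rows with the current key.

    Calling this repeatedly in small batches is the 'online rotation'
    idea: reads and writes keep working between batches.
    """
    done = 0
    for i, (version, ct) in enumerate(rows):
        if version != current_version and done < batch_size:
            plaintext = xor(ct, keys[version])
            rows[i] = (current_version, xor(plaintext, keys[current_version]))
            done += 1
    return done
```

Old rows stay readable with key version 1 throughout; once every row carries the current version, key 1 can finally be destroyed.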
+The plaintext inside
+the ciphertext is always in text format, but this is invisible to the
+protocol.

It seems like it would be useful to have a way of storing the plaintext in binary form too. I'm not saying this should be part of the initial version, but it would be good to keep that in mind with the design.

+ The session-specific identifier of the key.

Is it necessary for this identifier to be session-specific? Why not use a global identifier like an oid? Anything session-specific makes the job of transaction poolers quite a bit harder. If this identifier were global, then the message could be forwarded as-is to the client, instead of re-mapping identifiers between clients and servers (like is needed for prepared statements).

+ Additional algorithms may be added to this protocol specification without a
+ change in the protocol version number.

What's the reason for not requiring a version bump for this?

+ If the protocol extension _pq_.column_encryption is
+ enabled (see ), then
+ there is also the following for each parameter:

It seems a little bit wasteful to include these for all columns, even the ones that don't require encryption. How about only adding these fields when the format code is 0x11?

Finally, I'm trying to figure out if this really needs to be a protocol extension or if a protocol version bump would work as well, without introducing a lot of work for clients/poolers that don't care about it (possibly with some modifications to the proposed protocol changes). What makes this a bit difficult for me is that there's not much written in the documentation on what is supposed to happen for encrypted columns when the protocol extension is not enabled. Is the content just returned/written like it would be with a bytea? Or is writing disallowed because the format code would never be set to 0x11?

A related question to this is that currently libpq throws an error if e.g. a master key realm is not defined but another one is. Is that really what we want?
Is not having one of the realms really that different from not providing any realms at all?

But no matter these behavioural details, I think it would be fairly easy to add minimal "non-support" for this feature while supporting the new protocol messages. All clients would need to do is understand what the new protocol messages/fields mean and either ignore them or throw a clear error.

For poolers it's a different story, however. For transaction pooling there's quite a bit of work to be done. I already mentioned the session-specific ID being a problem, but even assuming we change that to a global ID, there are still difficulties. Key information is only sent by the server if it wasn't sent before in the session[1], so a pooler would need to keep its own cache and send keys to clients that haven't received them yet.

So yeah, I think it would make sense to put this behind a protocol extension feature flag, since it's fairly niche and would require significant work at the pooler side to support.
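As an illustration of the pooler-side bookkeeping described above, here is a hedged Python sketch (class and method names are invented; this is not a real pooler, and "message" is just an opaque placeholder for whatever key message the server sends):

```python
# Illustrative sketch: a transaction pooler caching server key messages
# and replaying them to clients that haven't seen a given key yet,
# analogous to how poolers track prepared statements.

class KeyForwardingPooler:
    def __init__(self):
        self.key_cache = {}        # key_id -> key message captured from the server
        self.sent_to_client = {}   # client_id -> set of key_ids already delivered

    def on_server_key_message(self, key_id, message):
        """The server only sends key info once per session, so cache it."""
        self.key_cache[key_id] = message

    def messages_for_client(self, client_id, key_id):
        """Return key messages that must precede a result using key_id."""
        seen = self.sent_to_client.setdefault(client_id, set())
        if key_id in seen:
            return []              # this client already has the key
        seen.add(key_id)
        return [self.key_cache[key_id]]
```

Note this only works cleanly if `key_id` means the same thing on every server connection; with session-specific IDs the pooler would additionally have to re-map IDs per connection.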
Re: Support a wildcard in backtrace_functions
On Fri, 22 Mar 2024 at 11:09, Jelte Fennema-Nio wrote: > On Thu, 21 Mar 2024 at 15:41, Jelte Fennema-Nio wrote: > > Attached is a patch that implements this. Since the more I think about > > it, the more I like this approach. > > I now added a 3rd patch to this patchset which changes the > log_backtrace default to "internal", because it seems quite useful to > me if user error reports of internal errors included a backtrace. I think patch 0002 should be considered an Open Item for PG17, since it proposes changing the name of the newly introduced backtrace_on_internal_error GUC. If we want to change it in this way, we should do it before the release, and preferably before the beta. I added it to the Open Items list[1] so we don't forget to at least decide on this. [1]: https://wiki.postgresql.org/wiki/PostgreSQL_17_Open_Items
Re: Add trim_trailing_whitespace to editorconfig file
On Thu, 4 Apr 2024 at 16:58, Jelte Fennema-Nio wrote: > > ISTM that with a small shell script, .editorconfig could be generated > > from .gitattributes? > > Honestly, I don't think building such automation is worth the effort. Okay, I spent the time to add a script to generate the editorconfig based on .gitattributes after all. So attached is a patch that adds that. v5-0001-Add-script-to-keep-.editorconfig-in-sync-with-.gi.patch Description: Binary data
Re: PostgreSQL 17 Release Management Team & Feature Freeze
On Mon, 8 Apr 2024 at 20:15, Tomas Vondra wrote: > I 100% understand how frustrating the lack of progress can be, and I > agree we need to do better. I tried to move a number of stuck patches > this CF, and I hope (and plan) to do more of this in the future. > > But I don't quite see how would this rule modification change the > situation for non-committers. The problem that I feel I'm seeing is that committers mostly seem to materially review large patchsets in the last two commit fests. This might not be based in reality, but it is definitely how it feels to me. If someone has stats on this, feel free to share. I'll sketch a situation: There's a big patch that some non-committer submitted that has been sitting on the mailing list for 6 months or more, only being reviewed by other non-committers, whose feedback the submitter quickly addresses. Then in the final commit fest it is finally reviewed by a committer, and they say it requires significant changes. Right now, the submitter can decide to quickly address those changes, and hope to keep the attention of this committer, to hopefully get it merged before the deadline after probably a few more back and forths. But this new rule would mean that those significant changes would be a reason not to put it in the upcoming release, which I expect would first of all feel like a slap in the face to the submitter, because it's not their fault that those changes were only requested in the last commit fest. This would then likely result in the submitter not following up quickly (why make time right at this moment, if you're suddenly going to have another full year), which would then cause the committer to lose context of the patch and thus interest in the patch. And then the submitter ends up in the exact same situation next year in the final commit fest, when some other committer disagrees with the redesign requested by the first one and requests a new one, pushing it out another year. So yeah, I definitely agree with Matthias.
I definitely feel like this rule would seriously impact contributions made by non-committers. Maybe a better solution to this problem would be to spread impactful reviews by committers more evenly throughout the year. Then there wouldn't be such a rush to address them in the last commit fest.
Re: Flushing large data immediately in pqcomm
On Sun, 7 Apr 2024 at 11:34, David Rowley wrote: > That seems to require modifying the following function signatures: > secure_write(), be_tls_write(), be_gssapi_write(). That's not an area > I'm familiar with, however. Attached is a new patchset where 0003 does exactly that. The only place where we need to cast to non-const is for GSS, but that seems fine (commit message has more details). I also added patch 0002, which is a small addition to the function comment of internal_flush_buffer that seemed useful to me to differentiate it from internal_flush (feel free to ignore/rewrite). v9-0001-Make-a-few-variables-size_t-in-pqcomm.c.patch Description: Binary data v9-0002-Expand-comment-of-internal_flush_buffer.patch Description: Binary data v9-0003-Make-backend-libpq-write-functions-take-const-poi.patch Description: Binary data
Re: Flushing large data immediately in pqcomm
On Sun, 7 Apr 2024 at 14:41, David Rowley wrote: > Looking at the code in socket_putmessage_noblock(), I don't understand > why it's ok for PqSendBufferSize to be int but "required" must be > size_t. There's a line that does "PqSendBufferSize = required;". It > kinda looks like they both should be size_t. Am I missing something > that you've thought about? You and Ranier are totally right (I missed this assignment). Attached is v8. v8-0001-Make-a-few-variables-size_t-in-pqcomm.c.patch Description: Binary data
Re: Speed up clean meson builds by ~25%
On Mon, 8 Apr 2024 at 10:02, Peter Eisentraut wrote: > I have tested this with various compilers, and I can confirm that this > shaves off about 5 seconds from the build wall-clock time, which > represents about 10%-20% of the total time. I think this is a good patch. Great to hear. > Interestingly, if I apply the analogous changes to the makefiles, I > don't get any significant improvements. (Depends on local > circumstances, of course.) But I would still suggest to keep the > makefiles aligned. Attached is a patch that also updates the Makefiles, but indeed I don't get any perf gains there either. On Mon, 8 Apr 2024 at 07:23, Michael Paquier wrote: > Well, this is also painful with ./configure. So, even if we are going > to move away from it at this point, we still need to support it for a > couple of years. It looks to me that letting the clang folks know > about the situation is the best way forward. I reported the issue to the clang folks: https://github.com/llvm/llvm-project/issues/87973 But even if my patch doesn't help for ./configure, it still seems like a good idea to me to reduce compile times when using meson while we wait for the clang folks to address the issue. v2-0001-Speed-up-clean-parallel-meson-builds-a-lot.patch Description: Binary data
Re: Speed up clean meson builds by ~25%
On Mon, 8 Apr 2024 at 10:00, Alexander Lakhin wrote: > As I wrote in [1], I didn't observe the issue with clang-18, so maybe it > is fixed already. > Perhaps it's worth rechecking... Using the attached script I got these timings. Clang is significantly slower in all of them, but especially with -Og the difference is huge.

gcc 11.4.0: 7.276s
clang 18.1.3: 17.216s
gcc 11.4.0 --debug: 7.441s
clang 18.1.3 --debug: 18.164s
gcc 11.4.0 --debug -Og: 2.418s
clang 18.1.3 --debug -Og: 14.864s

I reported this same issue to the LLVM project here: https://github.com/llvm/llvm-project/issues/87973

#!/bin/bash
set -exo pipefail

compile() {
  ninja -C build src/interfaces/ecpg/preproc/ecpg.p/meson-generated_.._preproc.c.o
  rm build/src/interfaces/ecpg/preproc/ecpg.p/meson-generated_.._preproc.c.o
  time ninja -C build src/interfaces/ecpg/preproc/ecpg.p/meson-generated_.._preproc.c.o -v
}

CC=gcc CC_LD=lld meson setup --reconfigure build --wipe > /dev/null
compile
CC=clang-18 CC_LD=lld meson setup --reconfigure build --wipe > /dev/null
compile
CC=gcc CC_LD=lld meson setup --reconfigure --debug build --wipe > /dev/null
compile
CC=clang-18 CC_LD=lld meson setup --reconfigure --debug build --wipe > /dev/null
compile
CC=gcc CC_LD=lld meson setup --reconfigure --debug -Dc_args="-Og" build --wipe > /dev/null
compile
CC=clang-18 CC_LD=lld meson setup --reconfigure --debug -Dc_args="-Og" build --wipe > /dev/null
compile
Re: Flushing large data immediately in pqcomm
On Sun, 7 Apr 2024 at 03:39, Andres Freund wrote: > Changing the global vars to size_t seems mildly bogus to me. All it's > achieving is to use slightly more memory. It also just seems unrelated to the > change. I took a closer look at this. I agree that changing PqSendBufferSize to size_t is unnecessary: given the locations where it is used I see no risk of overflow anywhere. Changing the type of PqSendPointer and PqSendStart is needed though, because (as described by Heikki and David upthread) the argument type of internal_flush_buffer is size_t*. So if you actually pass an int* there, and the sizes are not the same, then you will start writing out of bounds. And because internal_flush_buffer is introduced in this patch, it is related to this change. This is what David just committed too. However, the "required" var actually should be of size_t to avoid overflow if len is larger than int even without this change. So attached is a tiny patch that does that. v7-0001-Avoid-possible-overflow-in-socket_putmessage_nonb.patch Description: Binary data
Re: Flushing large data immediately in pqcomm
On Sat, 6 Apr 2024 at 22:21, Andres Freund wrote: > The small regression for small results is still kinda visible, I haven't yet > tested the patch downthread. Thanks a lot for the faster test script, I'm also impatient. I still saw the small regression with David's patch. Here's a v6 where I think it is now gone. I added inline to internal_put_bytes too. I think that helped especially because for two calls to internal_put_bytes len is a constant (1 and 4) that is smaller than PqSendBufferSize. So for those calls the compiler can now statically eliminate the new codepath because "len >= PqSendBufferSize" is known to be false at compile time. Also I incorporated all of Ranier's comments. v6-0001-Faster-internal_putbytes.patch Description: Binary data
Re: Add new protocol message to change GUCs for usage with future protocol-only GUCs
On Fri, 5 Apr 2024 at 18:48, Robert Haas wrote: > Maybe we'd be better off adding a libpq connection > option that forces the use of a specific minor protocol version, but > then we'll need backward-compatibility code in libpq basically > forever. But maybe we need that anyway to avoid older and newer > servers being unable to communicate. I think this would be good because it makes testing easy and, like you said, I think we'll need this backward-compatibility code in libpq anyway to be able to connect to old servers. To have even better and more realistic test coverage though, I think we might also want to actually test new libpq against old postgres servers and vice-versa in a build farm animal. > Plus, you've got all of the consequences for non-core drivers, which > have to both add support for the new wire protocol - if they don't > want to seem outdated and eventually obsolete - and also test that > they're still compatible with all supported server versions. I think for clients/drivers, the work would generally be pretty minimal. For almost all proposed changes, clients can "support" the protocol version update by simply not using the new features, e.g. a client can "support" the ParameterSet feature by simply never sending the ParameterSet message. So binding it to a protocol version bump doesn't make it any harder for that client to support that protocol version. I'm not saying that is the case for all protocol changes, but based on what's being proposed so far that's definitely a very common theme. Overall, I think this is something to discuss for each protocol change in isolation: i.e. how to make supporting the new feature as painless as possible for clients/drivers. > Connection poolers have the same set of problems. For connection poolers this is indeed a bigger hassle, because they at least need to be able to handle all the new message types that a client can send and maybe do something special for them.
But if we're careful to keep connection poolers in mind when designing the features themselves, then I think this isn't necessarily a problem. And probably indeed for the features that we think are hard for connection poolers to implement, we should be using protocol extension parameter feature flags. But I think a lot of protocol changes would be fairly trivial for a connection pooler to support. > The whole thing is > almost a hole with no bottom. Keeping up with core changes in this > area could become a massive undertaking for lots and lots of people, > some of whom may be the sole maintainer of some important driver that > now needs a whole bunch of work. I agree with Dave here, if you want to benefit from new features there's some expectation to keep up with the changes. But to be clear, we'd still support old protocol versions too. So we wouldn't break connecting using those clients, they simply wouldn't benefit from some of the new features. I think that's acceptable. > I'm not sure how much it improves things if we imagine adding feature > flags to the existing protocol versions, rather than whole new > protocol versions, but at least it cuts down on the assumption that > adopting new features is mandatory, and that such features are > cumulative. If a driver wants to support TDE but not protocol > parameters or protocol parameters but not TDE, who are we to say no? > If in supporting those things we bump the protocol version to 3.2, and > then 3.3 fixes a huge performance problem, are drivers going to be > required to add support for features they don't care about to get the > performance fixes? I think there's an important trade-off here. On one side we don't want to make maintainers of clients/poolers do lots of work to support features they don't care about. And on the other side it seems quite useful to limit the number of feature combinations that are used in the wild (both for users and for us), e.g.
the combinations of backwards compatibility testing you were talking about would explode if every protocol change was a feature flag. I think this trade-off is something we should be deciding on based on the specific protocol change. But if the work needed to "support" the feature is "minimal" (to-be-defined exactly what we consider minimal), I think making it part of a protocol version bump is reasonable. > I see some benefit in bumping the protocol version > for major changes, or for changes that we have an important reason to > make mandatory, or to make previously-optional features for which > support has become in practical terms universal part of the base > feature set. But I'm very skeptical of the idea that we should just > handle as many things as possible via a protocol version bump. We've > been avoiding protocol version bumps like the plague since forever, > and swinging all the way to the other extreme doesn't sound like the > right idea to me. I think there are two parts to a protocol version bump: 1. The changes that cause us to consider a protocol
Re: Flushing large data immediately in pqcomm
On Sat, 6 Apr 2024 at 03:34, David Rowley wrote: > Does anyone else want to try the attached script on the v5 patch to > see if their numbers are better? On my machine (i9-10900X, in Ubuntu 22.04 on WSL on Windows) v5 consistently beats master by ~0.25 seconds:

master:
Run 100 100 500: 1.948975205
Run 1024 10240 20: 3.039986587
Run 1024 1048576 2000: 2.444176276
Run 1048576 1048576 1000: 2.475328596

v5:
Run 100 100 500: 1.997170909
Run 1024 10240 20: 3.057802598
Run 1024 1048576 2000: 2.199449857
Run 1048576 1048576 1000: 2.210328762

The first two runs are pretty much equal, and I ran your script a few more times and this seems like just random variance (sometimes v5 wins those, sometimes master does, always quite close to each other). But the last two runs v5 consistently wins. Weird that on your machines you don't see a difference. Are you sure you didn't make a silly mistake, like not restarting postgres or something?
Re: Add new protocol message to change GUCs for usage with future protocol-only GUCs
On Fri, 5 Apr 2024 at 18:43, Tom Lane wrote: > I don't buy that argument, actually. libpq, and pretty much every > other client AFAIK, has provisions to let higher code levels insert > random options into the startup packet. So to make this work libpq > would have to filter or at least inspect such options, which is > logic that doesn't exist and doesn't seem nice to need. libpq actually doesn't support doing this (only by putting them in the "options" parameter, but _pq_ parameters would not be allowed there), but indeed many other clients do allow this and likely don't have logic to filter/disallow _pq_ prefixed parameters. This seems very easy to address though: only parse _pq_ options when protocol version 3.1 is requested by the client, and otherwise always report them as "not supported". Then clients upgrading to 3.1 should filter/disallow _pq_ parameters from being arbitrarily set. I don't think that's hard or not nice to add, it's literally a prefix check for the "_pq_." string. > The other problem with adding these things in the startup packet > is that when you send that packet, you don't know what the server > version is and hence don't know if it will take these options. (imho) the whole point of the _pq_ options is that they don't trigger an error when they are requested by the client but not supported by the server. So I don't understand your problem here. > What's so bad about insisting that these options must be sent in a > separate message? To not require an additional roundtrip waiting for the server to respond.
Re: Add new protocol message to change GUCs for usage with future protocol-only GUCs
On Fri, 5 Apr 2024 at 18:30, Dave Cramer wrote: >> > I really intended the _pq_ prefix as a way of taking something out of >> > the GUC namespace, not as a part of the GUC namespace that users would >> > see. And I'm reluctant to go back on that. If we want to make >> > pg_protocol.${NAME} mean a wire protocol parameter, well maybe there's >> > something to that idea [insert caveats here]. But doesn't _pq_ look >> > like something that was intended to be internal? That's certainly how >> > I intended it. > > > Is this actually used in practice? If so, how ? No, it's not used for anything at the moment. This whole thread is basically about trying to agree on how we want to make protocol changes in the future in a somewhat standardized way, while using the tools we have available to not break connecting to old postgres servers: NegotiateProtocolVersion messages, minor version numbers, and _pq_ parameters in the startup message. All of those have so far been completely theoretical and have not appeared in any client-server communication.
Re: Speed up clean meson builds by ~25%
On Fri, 5 Apr 2024 at 17:24, Andres Freund wrote: > I recommend opening a bug report for clang, best with an already preprocessed > input file. > We're going to need to do something about this from our side as well, I > suspect. The times aren't great with gcc either, even if not as bad as with > clang. Agreed. While not a full solution, I think this patch is still good to address some of the pain: waiting 10 seconds at the end of my build with only 1 of my 10 cores doing anything. So while this doesn't decrease CPU time spent, it does decrease wall-clock time significantly in some cases, with afaict no downsides.
Re: Add new protocol message to change GUCs for usage with future protocol-only GUCs
On Fri, 5 Apr 2024 at 16:02, Robert Haas wrote: > Often? > > I kind of hope that the protocol starts to evolve a bit more than it > has, but I don't want a continuous stream of changes. That will be > very hard to test and verify correctness, and a hassle for drivers to > keep up with, and a mess for compatibility. I definitely think protocol changes require a lot more scrutiny than many other things, given their hard/impossible to change nature. But I do think that we shouldn't be at all averse to the act of bumping the protocol version itself. If we have a single small protocol change in one release, then imho it's no problem to bump the protocol version. Bumping the version should be painless. So we shouldn't be inclined to push an already agreed upon protocol change to the next release, because there are some more protocol changes in the pipeline that won't make it in the current one. I don't think this would be any harder for drivers to keep up with than if we'd bulk all changes together. If driver developers only want to support changes in bulk, they can simply choose not to support 3.1 at all and wait until 3.2 to support everything in bulk. > > So, what approach of checking feature support are you envisioning > > instead? A function for every feature? > > Something like SupportsSetProtocolParameter, that returns an error > > message if it's not supported and NULL otherwise. And then add such a > > support function for every feature? > > Yeah, something like that. > ... > > > I'm also not sure why you're saying a user is not entitled to this > > information. We already provide users of libpq a way to see the full > > Postgres version, and the major protocol version. I think allowing a > > user to access this information is only a good thing. But I agree that > > providing easy to use feature support functions is a better user > > experience in some cases. > > I guess that's a fair point.
> But I'm worried that if we expose too > much of the internals, we won't be able to change things later. I'll take a look at redesigning the protocol parameter stuff to work with dedicated functions instead. > I really intended the _pq_ prefix as a way of taking something out of > the GUC namespace, not as a part of the GUC namespace that users would > see. And I'm reluctant to go back on that. If we want to make > pg_protocol.${NAME} mean a wire protocol parameter, well maybe there's > something to that idea [insert caveats here]. But doesn't _pq_ look > like something that was intended to be internal? That's certainly how > I intended it. I agree that _pq_ does look internal and doesn't clearly indicate that it's a protocol related change. But sadly the _pq_ prefix is the only one that doesn't error in startup packets, and waiting another 5 years until pg_protocol is allowed in the startup packet doesn't seem like a reasonable solution either. How about naming the GUCs pg_protocol.${NAME}, but still requiring the _pq_ prefix in the StartupPacket? That way only client libraries would have to see this internal prefix and they could remap it somehow. I see two options for that: 1. At the server, replace the _pq_ prefix with pg_protocol, so _pq_.${NAME} would map to pg_protocol.${NAME}. 2. At the server, replace the _pq_.pg_protocol prefix with pg_protocol, so _pq_.pg_protocol.${NAME} would map to pg_protocol.${NAME}. I guess you prefer option 2, because that would still leave lots of space to do something with the rest of the _pq_ space, i.e. _pq_.magic_pixie_dust can still be used for something different than a GUC. Bikeshedding: I think I prefer protocol.${NAME} over pg_protocol.${NAME}; it's shorter and it seems obvious that protocol is the postgres protocol in this context. This should be a fairly simple change to make. > Wouldn't libpq already know what value it last set? Or is this needed > because it doesn't know what the other side's default is?
libpq could/should indeed know this, but for debugging/testing purposes it is quite useful to have a facility to read the server side value. I think defaults should always be whatever was happening if the parameter wasn't specified before, so knowing the server default is not something the client needs to worry about (i.e. the default is defined as part of the protocol spec). > Hmm, OK. I guess if the PGC_PROTOCOL flag makes it so that the GUC can > only be set using ParameterSet, and it also makes them > non-transactional, then it's fine. So to be clear, I can't set these > in postgresql.conf, or postgresql.auto.conf, or via ALTER $ANYTHING, > or via SET, or in any other way than by sending ParameterStatus > messages. And when I send a ParameterStatus message, it doesn't matter > if I'm in a good transaction, an aborted transaction, or no > transaction at all, and the setting change takes effect regardless of > that and regardless of any subsequent rollbacks. Is that right? > > I feel like maybe it's not, because you seem to be
Re: Add new protocol message to change GUCs for usage with future protocol-only GUCs
On Fri, 5 Apr 2024 at 16:04, Robert Haas wrote: > > On Thu, Apr 4, 2024 at 1:10 PM Jelte Fennema-Nio wrote: > > Attached is a rebased patchset > > We should keep talking about this, but I think we're far too close to > the wire at this point to think about committing anything for v17 at > this point. These are big changes, they haven't been thoroughly > reviewed by anyone AFAICT, and we don't have consensus on what we > ought to be doing. I know that's probably not what you want to hear, > but realistically, I think that's the only reasonable decision at this > point. Agreed on not considering this for PG17 nor this commitfest anymore. I changed the commit fest entry accordingly.
Re: Speed up clean meson builds by ~25%
On Fri, 5 Apr 2024 at 00:45, Jelte Fennema-Nio wrote: > It improved clean build times on my machine (10 cores/20 threads) from ~40 > seconds to ~30 seconds. After discussing this off-list with Bilal, I realized that this gain is only happening for clang builds on my system, because those take a long time, as was also recently discussed in [1]. My builds don't take nearly as long though: I tried with clang 15 through 18 and they all took 10-22 seconds to run (clang comes from apt.llvm.org on Ubuntu 22.04). [1]: https://www.postgresql.org/message-id/flat/CA%2BhUKGLvJ7-%3DfS-J9kN%3DaZWrpyqykwqCBbxXLEhUa9831dPFcg%40mail.gmail.com
Speed up clean meson builds by ~25%
Building the generated ecpg preproc file can take a long time. You can check how long using: ninja -C build src/interfaces/ecpg/preproc/ecpg.p/meson-generated_.._preproc.c.o This moves that file much closer to the top of our build order, so building it can be pipelined much better with other files. It improved clean build times on my machine (10 cores/20 threads) from ~40 seconds to ~30 seconds. You can check improvements for yourself with: ninja -C build clean && ninja -C build all v1-0001-Speed-up-clean-parallel-meson-builds-a-lot.patch Description: Binary data
Re: WIP Incremental JSON Parser
On Thu, 4 Apr 2024 at 23:52, Jelte Fennema-Nio wrote: > Fair enough, I guess I'll change my invocation to include the ninja > "test" target too: > ninja -C build test install-quiet && meson test -C build Actually that doesn't do what I want either because that actually runs the tests too... And I prefer to do that using meson directly, so I can add some filters to it. I'll figure something out when I'm more fresh tomorrow.
Re: WIP Incremental JSON Parser
On Thu, 4 Apr 2024 at 23:34, Jacob Champion wrote: > > On Thu, Apr 4, 2024 at 1:42 PM Jelte Fennema-Nio wrote: > > Maybe that's something worth addressing too. I expected that > > install/install-quiet was a strict superset of a plain ninja > > invocation. > > Maybe that's the intent, but I hope not, because I don't see any > reason for `ninja install` to worry about test-only binaries that > won't be installed. For the tests, there's a pretty clear rationale > for the dependency. Fair enough, I guess I'll change my invocation to include the ninja "test" target too: ninja -C build test install-quiet && meson test -C build
Re: Security lessons from liblzma
On Thu, 4 Apr 2024 at 23:06, Daniel Gustafsson wrote: > > > On 4 Apr 2024, at 23:02, Jelte Fennema-Nio wrote: > > > How about we make it meson/make targets, so they are simply cached > > just like any of our other build artefacts are cached. Then only clean > > builds are impacted, not every test run. > > They already are (well, make not meson yet), they're just not hooked up to any > top-level commands. Running "make sslfiles-clean sslfiles" in src/test/ssl > regenerates all the files from the base config files that define their > contents. Okay, turns out even generating them in parallel isn't any faster than running it sequentially. I guess that's because the strong random generation is the slow part. The command I used was the following and it took ~5 seconds on my machine: make -C src/test/ssl sslfiles-clean && make -C src/test/ssl sslfiles -j20 And I think that's actually a good thing, because it means total build time is pretty much not impacted if we'd include it as part of the regular clean build. Since building these certs is bottlenecked on randomness, not on CPU (as pretty much all of our other build artifacts are), they should pipeline very well with the other items, assuming build concurrency is set slightly higher than the number of cores. As a proof of concept I ran the above command constantly in a simple bash loop: while true; do make -C src/test/ssl sslfiles-clean && make -C src/test/ssl sslfiles -j20; done And then ran a clean (parallel) build in another shell: ninja -C build clean && ninja -C build And total build time went from 41 to 43 seconds. To be clear, that's while constantly running the ssl file creation. If I only run the creation once, there's no noticeable increase in build time at all.
Re: Security lessons from liblzma
On Thu, 4 Apr 2024 at 22:56, Daniel Gustafsson wrote: > > > On 4 Apr 2024, at 22:47, Tom Lane wrote: > > > > Robert Haas writes: > >> On Thu, Apr 4, 2024 at 4:25 PM Daniel Gustafsson wrote: > >>> I don't disagree, like I said in that very email: it's non-trivial and I > >>> wish we > >>> could make it better somehow, but I don't have an abundance of good ideas. > > > >> Is the basic issue that we can't rely on the necessary toolchain to be > >> present on every machine where someone might try to build PostgreSQL? > > > > IIUC, it's not really that, but that regenerating these files is > > expensive; multiple seconds even on fast machines. Putting that > > into tests that are run many times a day is unappetizing. > > That's one aspect of it. We could cache the results of course to amortize the > cost over multiple test-runs but at the end of the day it will add time to > test-runs regardless of what we do. How about we make it meson/make targets, so they are simply cached just like any of our other build artefacts are cached. Then only clean builds are impacted, not every test run.
Re: WIP Incremental JSON Parser
On Thu, 4 Apr 2024 at 20:48, Jacob Champion wrote: > > On Thu, Apr 4, 2024 at 11:12 AM Jacob Champion > wrote: > > What's in the `...`? I wouldn't expect to find the test binary in your > > tmp_install. > > Oh, I wonder if this is just a build dependency thing? I typically run > a bare `ninja` right before testing, because I think most of those > dependencies are missing for the tests at the moment. (For example, if > I manually remove the `libpq_pipeline` executable and then try to run > `meson test` without a rebuild, it fails.) > > We can certainly fix that (see attached patch as a first draft) but I > wonder if there was a reason we decided not to during the Meson port? Yeah, that patch fixed my problem. (as well as running plain ninja) To clarify: I did do a rebuild. But it turns out that install-quiet doesn't build everything... The full command I was having problems with was: ninja -C build install-quiet && meson test -C build Maybe that's something worth addressing too. I expected that install/install-quiet was a strict superset of a plain ninja invocation.
Re: WIP Incremental JSON Parser
On Thu, 4 Apr 2024 at 18:13, Andrew Dunstan wrote: > Argh. You get out of the habit when you're running with meson :-( I'm having some issues with meson too actually. "meson test -C build" is now failing with this for me: Command 'test_json_parser_incremental' not found in /home/jelte/opensource/postgres/build/tmp_install//home/jelte/.pgenv/pgsql-17beta9/bin, ...
Re: Add new protocol message to change GUCs for usage with future protocol-only GUCs
On Thu, 4 Apr 2024 at 14:50, Peter Eisentraut wrote: > It appears there are several different perspectives about this. My > intuition was that a protocol version change indicates something that we > eventually want all client libraries to support. Whereas protocol > extensions are truly opt-in. > > For example, if we didn't have the ParameterStatus message and someone > wanted to add it, then this ought to be a protocol version change, so > that we eventually get everyone to adopt it. > > But, for example, the column encryption feature is not something I'd > expect all client libraries to implement, so it can be opt-in. I think that is a good way of deciding between a version bump and a protocol extension parameter. But I'd like to make one clarification/addition to this logic: if the protocol feature already requires opt-in from the client because of how the feature works, then there's no need for a protocol extension parameter. e.g. if your column-encryption feature would require the client to send a ColumnDecrypt message before the server exhibits any behaviour relating to column encryption, then the client can simply "support" the feature by never sending the ColumnDecrypt message. Thus, in such cases a protocol extension parameter would not be necessary, even if the feature is considered opt-in. > (I believe we have added a number of new protocol messages since the > beginning of the 3.0 protocol, without any version change, so "new > protocol message" might not always be a good example to begin with.) Personally, I feel like this is something we should change. IMHO, we should get to a state where protocol minor version bumps are so low-risk that we can do them whenever we add message types. Right now there are basically multiple versions of the 3.0 protocol, which is very confusing to anyone implementing it. Different servers implementing the 3.0 protocol without the NegotiateVersion message is a good example of that.
> I fear that if we prefer protocol extensions over version increases, > then we'd get a very fragmented landscape of different client libraries > supporting different combinations of things. +1
Re: Add new protocol message to change GUCs for usage with future protocol-only GUCs
Ugh, I seem to have somehow missed this response completely. On Thu, 14 Mar 2024 at 21:33, Robert Haas wrote: > While I think that will be a common > pattern, I do not think it will be a universal one. I do agree, > however, that every possible variation of the protocol is either > Boolean-valued (i.e. are we doing X or not?) or something more > complicated (i.e. how exactly are we doing X?). Agreed. > This is definitely not how I was thinking about it. I was imagining > that we wanted to reserve bumping the protocol version for more > significant changes, and that we'd use _pq_ parameters for relatively > minor new functionality whether Boolean-valued or otherwise. Yeah, we definitely think differently here then. To me bumping the minor protocol version shouldn't be a thing that we would need to carefully consider. It should be easy to do, and probably done often. > The user is entitled to > know whether PQsetProtocolParameter() will work or not, and the user > is entitled to know whether it has a chance of working next time if it > didn't work this time, and when it fails, the user is entitled to a > good error message explaining the reason for the failure. But the user > is not entitled to know what negotiation took place over the wire to > figure that out. They shouldn't need to know that the _pq_ namespace > exists, and they shouldn't need to know whether we negotiated the > availability or unavailability of PQsetProtocolParameter() using [a] > the protocol minor version number, [b] the protocol major version > number, [c] a protocol option called parameter_set, or [d] a protocol > option called bananas_foster. Those are all things that might change > in the future. So, what approach to checking feature support are you envisioning instead? A function for every feature? Something like SupportsSetProtocolParameter, which returns an error message if it's not supported and NULL otherwise? And then add such a support function for every feature?
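As a rough illustration of what I mean, such a per-feature support function might look something like this. This is purely a hypothetical sketch: the function name follows the SupportsSetProtocolParameter idea above, and the version in which ParameterSet would appear (3.1) is an assumption for illustration, not anything decided.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical negotiated protocol version, as a client library might track it */
typedef struct
{
	int			major;
	int			minor;
}			ProtocolVersion;

/*
 * Hypothetical per-feature support check: returns NULL if the feature is
 * supported, otherwise an error message explaining why not. The 3.1 cutoff
 * is an assumed example version, not a real one.
 */
static const char *
supports_set_protocol_parameter(ProtocolVersion v)
{
	if (v.major > 3 || (v.major == 3 && v.minor >= 1))
		return NULL;			/* supported: a version check keeps working */
	return "server only speaks protocol 3.0 or older";
}
```

The point being that a check gated purely on the negotiated minor version stays valid forever, which is what makes frequent minor bumps low-risk for clients.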
I think I might agree with you that such functions would be nice for features that depend on a protocol extension parameter, because if we make those features required in the future, clients won't have to update their logic. But for features that only depend on the protocol version I honestly don't see the point: a protocol version check is always going to continue working. I'm also not sure why you're saying a user is not entitled to this information. We already provide users of libpq a way to see the full Postgres version, and the major protocol version. I think allowing a user to access this information is only a good thing. But I agree that providing easy-to-use feature support functions is a better user experience in some cases. > But this is another example of a problem that can *easily* be fixed > without using up the entirety of the _pq_ namespace. You can introduce > _pq_.dont_error_on_unknown_gucs=1 or > _pq_.dont_error_on_these_gucs='foo, bar, baz'. The distinction between > the startup packet containing whizzbang=frob and instead containing > _pq_.whizzbang=frob shouldn't be just whether an error is thrown if we > don't know anything about whizzbang. Nice idea, but that sadly won't work well in practice, because old PG versions don't know about _pq_.dont_error_on_unknown_gucs. That would mean we'd have to wait another 5 years (and probably more) until we could safely set non-_pq_-prefixed GUCs in the startup message using this approach. Two side-notes: 1. I realized there is a second benefit to using _pq_ for all GUCs that change behaviour at the protocol level, which I didn't mention in my previous email: It allows clients to assume that _pq_-prefixed GUCs will change the wire protocol and that they should not allow a user of the client to set them willy-nilly. I'll go into this benefit more in the rest of this email. 2. To clarify: I'm suggesting that a startup packet containing _pq_.whizzbang would actually set the _pq_.whizzbang GUC, and not the whizzbang GUC. i.e. 
the _pq_ prefix is not stripped-off when parsing the startup packet. > I guess I'm in the same position as you are -- your argument doesn't > really make any sense to me. That also has the unfortunate > disadvantage of making it difficult for me to explain why I don't > agree with you, but let me just tick off a few things that I'm > thinking about here: Thanks for listing these thoughts. That makes it much easier to discuss something concrete. > 1. Connection poolers. If I'm talking to pgpool and pgpool is talking > to the server, and pgpool and I agree to use compression, that's > completely separate from whether pgpool and the server are using > compression. If I have to interrogate the compression state by > executing "SHOW some_compression_guc", then I'm going to get the wrong > answer unless pgpool runs a full SQL parser on every command that I > execute and intercepts the ones that touch protocol parameters. That's > bound to be expensive an unreliable -- consider something like SELECT >
Re: Add trim_trailing_whitespace to editorconfig file
On Thu, 4 Apr 2024 at 17:23, Peter Eisentraut wrote: > git diff-tree --check $(git hash-object -t tree /dev/null) HEAD > > That's what I was hoping for for editorconfig-check, but as I said, the > experience wasn't good. Ah, I wasn't able to find that git incantation. I definitely think it would be good if there was an official cli tool like that for editorconfig, but the Javascript one was the closest I could find. The Go one I haven't tried. On Thu, 4 Apr 2024 at 17:23, Peter Eisentraut wrote: > > On 04.04.24 16:58, Jelte Fennema-Nio wrote: > > On Thu, 4 Apr 2024 at 15:25, Peter Eisentraut wrote: > >> Everybody has git. Everybody who edits .gitattributes can use git to > >> check what they did. > > What CLI command do you use to fix/ gitattributes on all existing > > files? Afaict there's no command to actually remove the trailing > > whitespace that git add complains about. If you don't have such a > > command, then afaict updating gitattributes is also essentially > > blind-updating. > > I don't have a command to fix files automatically, but I have a command > to check them: > > git diff-tree --check $(git hash-object -t tree /dev/null) HEAD > > That's what I was hoping for for editorconfig-check, but as I said, the > experience wasn't good. >
Re: Add trim_trailing_whitespace to editorconfig file
On Thu, 4 Apr 2024 at 15:25, Peter Eisentraut wrote: > Everybody has git. Everybody who edits .gitattributes can use git to > check what they did. What CLI command do you use to fix .gitattributes violations on all existing files? Afaict there's no command to actually remove the trailing whitespace that git add complains about. If you don't have such a command, then afaict updating gitattributes is also essentially blind-updating. > But I don't want users of a common tool to bear the > burden of blindly updating files for a much-less-common tool. It's used quite a bit. Many editors/IDEs have built-in support (Vim, Visual Studio, IntelliJ), and the ones that don't have an easy-to-install plugin. It's not meant to be used as a command line tool, but as the name suggests it's meant as editor integration. > ISTM that with a small shell script, .editorconfig could be generated > from .gitattributes? Honestly, I don't think building such automation is worth the effort. Changing the .editorconfig file to match is pretty trivial if you look at the existing examples; honestly the editorconfig syntax is much more straightforward to me than the gitattributes one. Also gitattributes is only changed very rarely, only 15 times in the 10 years since its creation in our repo, which makes any automation around it probably not worth the investment. This whole comment really seems to be only about 0004. We already have an outdated editorconfig file in the repo, and it's severely annoying me whenever I'm writing any docs for postgres because it doesn't trim my trailing spaces. If we didn't have this editorconfig file in the repo at all, it would actually be better for me, because I could maintain my own file locally myself. But now, because there's an incorrect file, I'd have to git stash/pop all the time. Is there any chance the other commits can at least be merged?
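For readers unfamiliar with the format, the kind of entry under discussion looks roughly like this. This is an illustrative fragment only; the file-glob and exact properties in the repository's actual .editorconfig may differ:

```ini
# Hypothetical .editorconfig fragment: trim trailing whitespace and
# require a final newline for C sources and SGML docs.
[*.{c,h,sgml}]
indent_style = tab
trim_trailing_whitespace = true
insert_final_newline = true
```

Editors with EditorConfig support apply these rules automatically on save, which is why an incorrect checked-in file is worse than none at all.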
Re: Flushing large data immediately in pqcomm
On Thu, 4 Apr 2024 at 13:08, Melih Mutlu wrote: > I changed internal_flush() to an inline function, results look better this > way. It seems you also changed internal_flush_buffer to be inline (but only in the actual function definition, not in the declaration at the top). I don't think inlining internal_flush_buffer should be necessary to avoid the perf regressions; i.e. internal_flush is adding extra indirection compared to master and is only a single line, so that one makes sense to inline. Other than that the code looks good to me. The new results look great. One thing that is quite interesting about these results is that increasing the buffer size results in even better performance (by quite a bit). I don't think we can easily choose a perfect number, as it seems to be a trade-off between memory usage and perf. But allowing people to configure it through a GUC like in your second patchset would be quite useful I think, especially because larger buffers could be configured for connections that would benefit most from it (e.g. replication connections or big COPYs). I think your add-pq_send_buffer_size-GUC.patch is essentially what we would need there but it would need some extra changes to actually be merge-able: 1. needs docs 2. rename PQ_SEND_BUFFER_SIZE (at least make it not UPPER_CASE, but maybe also remove the PQ_ prefix) 3. It's marked as PGC_USERSET, but there's no logic to grow/shrink it after initial allocation
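The inlining pattern being discussed can be sketched like this. This is a hedged, self-contained sketch and not the actual pqcomm.c code: the single-line internal_flush() wrapper is marked inline so it adds no indirection, while the length-taking internal_flush_buffer() helper stays an ordinary function; the "socket" here is just a memory sink for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SEND_BUFFER_SIZE 8192	/* stand-in for the proposed GUC value */

static char PqSendBuffer[SEND_BUFFER_SIZE];
static size_t PqSendPointer;	/* number of buffered, unsent bytes */

static char sink[65536];		/* stands in for the network socket */
static size_t sink_len;

/* Flush an arbitrary buffer; ordinary (non-inline) function. */
static int
internal_flush_buffer(const char *buf, size_t len)
{
	memcpy(sink + sink_len, buf, len);	/* "send" the bytes */
	sink_len += len;
	return 0;
}

/* One-line wrapper: inline, so it costs nothing over calling the helper. */
static inline int
internal_flush(void)
{
	int			res = internal_flush_buffer(PqSendBuffer, PqSendPointer);

	PqSendPointer = 0;
	return res;
}
```

The design point is that only the trivial wrapper needs inlining to match master's call overhead; the shared helper can stay out-of-line.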
Re: psql not responding to SIGINT upon db reconnection
On Wed, 3 Apr 2024 at 21:49, Robert Haas wrote: > It seems to me that 0001 should either remove the pg_unreachable() > call or change the break to a return, but not both. Updated the patch to only remove the pg_unreachable call (and keep the breaks). > but there's not much value in omitting > pg_unreachable() from unreachable places, either, so I'm not > convinced. But I don't agree with this. Having pg_unreachable in places where it brings no perf benefit has two main downsides: 1. It's extra lines of code. 2. If a programmer/reviewer is not careful about maintaining this unreachable invariant (now or in the future), undefined behaviour can be introduced quite easily. Also, I'd expect any optimizing compiler to know that the code after the loop is already unreachable if there are no break/goto statements in its body. > I agree with Tristan's analysis of 0002. Updated 0002 to only change the comment. But honestly I don't think using "default" really introduces many chances for future bugs in this case, since it seems very unlikely we'll ever add new variants to the PostgresPollingStatusType enum.

From b0ccfb6fca3e4ae9e1e62ac3b6a4c5f428b56e04 Mon Sep 17 00:00:00 2001
From: Jelte Fennema-Nio
Date: Wed, 3 Apr 2024 15:19:04 +0200
Subject: [PATCH v14 1/2] Fix actually reachable pg_unreachable call

In cafe1056558fe07cdc52b95205588fcd80870362 a call to pg_unreachable
was introduced that was actually reachable, because there were break
statements in the loop. This addresses that by removing the call to
pg_unreachable, which seemed of dubious necessity anyway.
---
 src/bin/psql/command.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/src/bin/psql/command.c b/src/bin/psql/command.c
index c005624e9c3..479f9f2be59 100644
--- a/src/bin/psql/command.c
+++ b/src/bin/psql/command.c
@@ -3815,8 +3815,6 @@ wait_until_connected(PGconn *conn)
 				pg_unreachable();
 		}
 	}
-
-	pg_unreachable();
 }
 
 void

base-commit: 936e3fa3787a51397280c1081587586e83c20399
--
2.34.1

From 799eb6989d481ab617c473b8acbbfc1a405c3c7d Mon Sep 17 00:00:00 2001
From: Jelte Fennema-Nio
Date: Wed, 3 Apr 2024 15:21:52 +0200
Subject: [PATCH v14 2/2] Change comment on PGRES_POLLING_ACTIVE

Updates the comment on PGRES_POLLING_ACTIVE, since we're stuck with it
forever due to ABI compatibility guarantees.
---
 src/interfaces/libpq/libpq-fe.h | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/src/interfaces/libpq/libpq-fe.h b/src/interfaces/libpq/libpq-fe.h
index 8d3c5c6f662..c184e853889 100644
--- a/src/interfaces/libpq/libpq-fe.h
+++ b/src/interfaces/libpq/libpq-fe.h
@@ -91,8 +91,7 @@ typedef enum
 	PGRES_POLLING_READING,		/* These two indicate that one may */
 	PGRES_POLLING_WRITING,		/* use select before polling again. */
 	PGRES_POLLING_OK,
-	PGRES_POLLING_ACTIVE		/* unused; keep for awhile for backwards
-								 * compatibility */
+	PGRES_POLLING_ACTIVE		/* unused; keep for backwards compatibility */
 } PostgresPollingStatusType;
 
 typedef enum
--
2.34.1
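The hazard discussed in this thread (a reachable unreachability marker) can be sketched in isolation. In this hedged example, unreachable() stands in for pg_unreachable(): it tells the compiler a point can never be hit, so if control *can* reach it, behaviour is undefined. The marker is only safe after something genuinely exhaustive, like the switch below; the enum and function names are invented for illustration.

```c
#include <assert.h>

/* Stand-in for pg_unreachable(): promises the compiler this point is dead. */
#if defined(__GNUC__) || defined(__clang__)
#define unreachable() __builtin_unreachable()
#else
#define unreachable() ((void) 0)
#endif

enum poll_state
{
	POLL_READING,
	POLL_WRITING,
	POLL_OK
};

static int
wants_read(enum poll_state s)
{
	switch (s)
	{
		case POLL_READING:
			return 1;
		case POLL_WRITING:
		case POLL_OK:
			return 0;
	}

	/*
	 * Every enum value returned above, so this really is unreachable. If a
	 * new enum value were ever added without a case, reaching this marker
	 * would be undefined behaviour -- which is exactly the maintenance
	 * hazard argued about above.
	 */
	unreachable();
}
```

This also shows why -Wswitch plus an exhaustive switch is often preferable: the compiler then warns at the point where the invariant breaks, instead of silently miscompiling.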
Re: [EXTERNAL] Re: Add non-blocking version of PQcancel
On Thu, 4 Apr 2024 at 10:45, Alvaro Herrera wrote: > Yeah, this seems to work and I no longer get that complaint from > headerscheck. patch LGTM
Re: psql not responding to SIGINT upon db reconnection
On Wed, 3 Apr 2024 at 16:55, Tristan Partin wrote: > Removing from the switch statement causes a warning: > > > [1/2] Compiling C object src/bin/psql/psql.p/command.c.o > > ../src/bin/psql/command.c: In function ‘wait_until_connected’: > > ../src/bin/psql/command.c:3803:17: warning: enumeration value > > ‘PGRES_POLLING_ACTIVE’ not handled in switch [-Wswitch] > > 3803 | switch (PQconnectPoll(conn)) > > | ^~ Of course... fixed now

From 29100578b5f0b4cec68ee1b8c572c4f280335da3 Mon Sep 17 00:00:00 2001
From: Jelte Fennema-Nio
Date: Wed, 3 Apr 2024 15:19:04 +0200
Subject: [PATCH v13 1/2] Fix actually reachable pg_unreachable call

In cafe1056558fe07cdc52b95205588fcd80870362 a call to pg_unreachable
was introduced that was actually reachable, because there were break
statements in the loop. This addresses that by both changing the break
statements to return and removing the call to pg_unreachable, which
seemed of dubious necessity anyway.
---
 src/bin/psql/command.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/src/bin/psql/command.c b/src/bin/psql/command.c
index c005624e9c3..dc3cc7c10b7 100644
--- a/src/bin/psql/command.c
+++ b/src/bin/psql/command.c
@@ -3771,7 +3771,7 @@ wait_until_connected(PGconn *conn)
 		 * user has requested a cancellation.
 		 */
 		if (cancel_pressed)
-			break;
+			return;
 
 		/*
 		 * Do not assume that the socket remains the same across
@@ -3779,7 +3779,7 @@ wait_until_connected(PGconn *conn)
 		 */
 		sock = PQsocket(conn);
 		if (sock == -1)
-			break;
+			return;
 
 		/*
 		 * If the user sends SIGINT between the cancel_pressed check, and
@@ -3815,8 +3815,6 @@ wait_until_connected(PGconn *conn)
 			pg_unreachable();
 		}
 	}
-
-	pg_unreachable();
 }
 
 void

base-commit: 936e3fa3787a51397280c1081587586e83c20399
--
2.34.1

From 76cef25162b44adc20172afee47836ca765d3b5c Mon Sep 17 00:00:00 2001
From: Jelte Fennema-Nio
Date: Wed, 3 Apr 2024 15:21:52 +0200
Subject: [PATCH v13 2/2] Remove PGRES_POLLING_ACTIVE case

This enum variant has been unused for at least 21 years
44aba280207740d0956160c0288e61f28f024a71. It's been there only for
backwards compatibility of the ABI, so no need to check for it.

Also update the comment on PGRES_POLLING_ACTIVE, since we're stuck with
it forever due to ABI compatibility guarantees.
---
 src/bin/psql/command.c          | 7 ++-----
 src/interfaces/libpq/libpq-fe.h | 3 +--
 2 files changed, 3 insertions(+), 7 deletions(-)

diff --git a/src/bin/psql/command.c b/src/bin/psql/command.c
index dc3cc7c10b7..5b31d08bf70 100644
--- a/src/bin/psql/command.c
+++ b/src/bin/psql/command.c
@@ -3802,17 +3802,14 @@ wait_until_connected(PGconn *conn)
 
 		switch (PQconnectPoll(conn))
 		{
-			case PGRES_POLLING_OK:
-			case PGRES_POLLING_FAILED:
-				return;
 			case PGRES_POLLING_READING:
 				forRead = true;
 				continue;
 			case PGRES_POLLING_WRITING:
 				forRead = false;
 				continue;
-			case PGRES_POLLING_ACTIVE:
-				pg_unreachable();
+			default:
+				return;
 		}
 	}
 }
diff --git a/src/interfaces/libpq/libpq-fe.h b/src/interfaces/libpq/libpq-fe.h
index 8d3c5c6f662..c184e853889 100644
--- a/src/interfaces/libpq/libpq-fe.h
+++ b/src/interfaces/libpq/libpq-fe.h
@@ -91,8 +91,7 @@ typedef enum
 	PGRES_POLLING_READING,		/* These two indicate that one may */
 	PGRES_POLLING_WRITING,		/* use select before polling again. */
 	PGRES_POLLING_OK,
-	PGRES_POLLING_ACTIVE		/* unused; keep for awhile for backwards
-								 * compatibility */
+	PGRES_POLLING_ACTIVE		/* unused; keep for backwards compatibility */
 } PostgresPollingStatusType;
 
 typedef enum
--
2.34.1
Re: psql not responding to SIGINT upon db reconnection
On Wed, 3 Apr 2024 at 16:31, Tom Lane wrote: > If we do the latter, we will almost certainly get pushback from > distros who check for library ABI breaks. I fear the comment > suggesting that we could remove it someday is too optimistic. Alright, changed patch 0002 to keep the variant, but removed it from the recently added switch/case. And also updated the comment to remove the "for awhile".

From 29100578b5f0b4cec68ee1b8c572c4f280335da3 Mon Sep 17 00:00:00 2001
From: Jelte Fennema-Nio
Date: Wed, 3 Apr 2024 15:19:04 +0200
Subject: [PATCH v12 1/2] Fix actually reachable pg_unreachable call

In cafe1056558fe07cdc52b95205588fcd80870362 a call to pg_unreachable
was introduced that was actually reachable, because there were break
statements in the loop. This addresses that by both changing the break
statements to return and removing the call to pg_unreachable, which
seemed of dubious necessity anyway.
---
 src/bin/psql/command.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/src/bin/psql/command.c b/src/bin/psql/command.c
index c005624e9c3..dc3cc7c10b7 100644
--- a/src/bin/psql/command.c
+++ b/src/bin/psql/command.c
@@ -3771,7 +3771,7 @@ wait_until_connected(PGconn *conn)
 		 * user has requested a cancellation.
 		 */
 		if (cancel_pressed)
-			break;
+			return;
 
 		/*
 		 * Do not assume that the socket remains the same across
@@ -3779,7 +3779,7 @@ wait_until_connected(PGconn *conn)
 		 */
 		sock = PQsocket(conn);
 		if (sock == -1)
-			break;
+			return;
 
 		/*
 		 * If the user sends SIGINT between the cancel_pressed check, and
@@ -3815,8 +3815,6 @@ wait_until_connected(PGconn *conn)
 			pg_unreachable();
 		}
 	}
-
-	pg_unreachable();
 }
 
 void

base-commit: 936e3fa3787a51397280c1081587586e83c20399
--
2.34.1

From f71634361a0c9ba3416c75fc3ff6d55536f134c4 Mon Sep 17 00:00:00 2001
From: Jelte Fennema-Nio
Date: Wed, 3 Apr 2024 15:21:52 +0200
Subject: [PATCH v12 2/2] Remove PGRES_POLLING_ACTIVE case

This enum variant has been unused for at least 21 years
44aba280207740d0956160c0288e61f28f024a71. It's been there only for
backwards compatibility of the ABI, so no need to check for it.

Also update the comment on PGRES_POLLING_ACTIVE, since we're stuck with
it forever due to ABI compatibility guarantees.
---
 src/bin/psql/command.c          | 2 --
 src/interfaces/libpq/libpq-fe.h | 3 +--
 2 files changed, 1 insertion(+), 4 deletions(-)

diff --git a/src/bin/psql/command.c b/src/bin/psql/command.c
index dc3cc7c10b7..9a163cf22cd 100644
--- a/src/bin/psql/command.c
+++ b/src/bin/psql/command.c
@@ -3811,8 +3811,6 @@ wait_until_connected(PGconn *conn)
 			case PGRES_POLLING_WRITING:
 				forRead = false;
 				continue;
-			case PGRES_POLLING_ACTIVE:
-				pg_unreachable();
 		}
 	}
 }
diff --git a/src/interfaces/libpq/libpq-fe.h b/src/interfaces/libpq/libpq-fe.h
index 8d3c5c6f662..c184e853889 100644
--- a/src/interfaces/libpq/libpq-fe.h
+++ b/src/interfaces/libpq/libpq-fe.h
@@ -91,8 +91,7 @@ typedef enum
 	PGRES_POLLING_READING,		/* These two indicate that one may */
 	PGRES_POLLING_WRITING,		/* use select before polling again. */
 	PGRES_POLLING_OK,
-	PGRES_POLLING_ACTIVE		/* unused; keep for awhile for backwards
-								 * compatibility */
+	PGRES_POLLING_ACTIVE		/* unused; keep for backwards compatibility */
 } PostgresPollingStatusType;
 
 typedef enum
--
2.34.1
Re: psql not responding to SIGINT upon db reconnection
On Tue, 2 Apr 2024 at 16:33, Robert Haas wrote: > Committed it, I did. My thanks for working on this issue, I extend. Looking at the committed version of this patch, the pg_unreachable calls seemed weird to me. One is actually incorrect, thus possibly resulting in undefined behaviour. And for the other call an imho better fix would be to remove the now 21-years-unused enum variant, instead of introducing its only reference in the whole codebase. Attached are two trivial patches that remove both of the pg_unreachable calls.

From 7f4279bb939e2d2707fdd0f471893d734959ab91 Mon Sep 17 00:00:00 2001
From: Jelte Fennema-Nio
Date: Wed, 3 Apr 2024 15:21:52 +0200
Subject: [PATCH v11 2/2] Remove PGRES_POLLING_ACTIVE

This enum variant has been unused for at least 21 years
44aba280207740d0956160c0288e61f28f024a71. It was left for backwards
compatibility for "awhile". I think that time has come.
---
 src/bin/psql/command.c          | 2 --
 src/interfaces/libpq/libpq-fe.h | 2 --
 2 files changed, 4 deletions(-)

diff --git a/src/bin/psql/command.c b/src/bin/psql/command.c
index dc3cc7c10b7..9a163cf22cd 100644
--- a/src/bin/psql/command.c
+++ b/src/bin/psql/command.c
@@ -3811,8 +3811,6 @@ wait_until_connected(PGconn *conn)
 			case PGRES_POLLING_WRITING:
 				forRead = false;
 				continue;
-			case PGRES_POLLING_ACTIVE:
-				pg_unreachable();
 		}
 	}
 }
diff --git a/src/interfaces/libpq/libpq-fe.h b/src/interfaces/libpq/libpq-fe.h
index 8d3c5c6f662..2459b0cf5e1 100644
--- a/src/interfaces/libpq/libpq-fe.h
+++ b/src/interfaces/libpq/libpq-fe.h
@@ -91,8 +91,6 @@ typedef enum
 	PGRES_POLLING_READING,		/* These two indicate that one may */
 	PGRES_POLLING_WRITING,		/* use select before polling again. */
 	PGRES_POLLING_OK,
-	PGRES_POLLING_ACTIVE		/* unused; keep for awhile for backwards
-								 * compatibility */
 } PostgresPollingStatusType;
 
 typedef enum
--
2.34.1

From 29100578b5f0b4cec68ee1b8c572c4f280335da3 Mon Sep 17 00:00:00 2001
From: Jelte Fennema-Nio
Date: Wed, 3 Apr 2024 15:19:04 +0200
Subject: [PATCH v11 1/2] Fix actually reachable pg_unreachable call

In cafe1056558fe07cdc52b95205588fcd80870362 a call to pg_unreachable
was introduced that was actually reachable, because there were break
statements in the loop. This addresses that by both changing the break
statements to return and removing the call to pg_unreachable, which
seemed of dubious necessity anyway.
---
 src/bin/psql/command.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/src/bin/psql/command.c b/src/bin/psql/command.c
index c005624e9c3..dc3cc7c10b7 100644
--- a/src/bin/psql/command.c
+++ b/src/bin/psql/command.c
@@ -3771,7 +3771,7 @@ wait_until_connected(PGconn *conn)
 		 * user has requested a cancellation.
 		 */
 		if (cancel_pressed)
-			break;
+			return;
 
 		/*
 		 * Do not assume that the socket remains the same across
@@ -3779,7 +3779,7 @@ wait_until_connected(PGconn *conn)
 		 */
 		sock = PQsocket(conn);
 		if (sock == -1)
-			break;
+			return;
 
 		/*
 		 * If the user sends SIGINT between the cancel_pressed check, and
@@ -3815,8 +3815,6 @@ wait_until_connected(PGconn *conn)
 			pg_unreachable();
 		}
 	}
-
-	pg_unreachable();
 }
 
 void

base-commit: 936e3fa3787a51397280c1081587586e83c20399
--
2.34.1
Re: Support prepared statement invalidation when result types change
On Sun, 7 Jan 2024 at 07:55, vignesh C wrote: > One of the tests has aborted in CFBot at [1] with: Rebased the patchset and removed patch 0003 since it was causing the CI issue reported by vignesh and it seems much less useful and more controversial to me anyway (due to the extra required roundtrip). On Sun, 7 Jan 2024 at 09:17, Shay Rojansky wrote: > Just to point out, FWIW, that the .NET Npgsql driver does indeed cache > RowDescriptions... The whole point of preparation is to optimize things as > much as possible for repeated execution of the query; I get that the value > there is much lower than e.g. doing another network roundtrip, but that's > still extra work that's better off being cut if it can be. Hmm, interesting. I totally agree that it's always good to do less when possible. The problem is that in the face of server-side prepared statement invalidations due to DDL changes to the table or search_path changes, the row types might change. Or the server needs to constantly throw an error, like it does now, but that seems worse imho. I'm wondering though if we can create a middle ground, where a client can still cache the RowDescription client-side when no DDL or search_path changes are happening, but is still told about a new RowDescription when they do happen. The only way of doing that I can think of is changing the Postgres protocol in a way similar to this: allow Execute to return a RowDescription too, but have the server only do so when the previously received RowDescription for this prepared statement is now invalid. This would definitely require some additional tracking at PgBouncer to make it work, i.e. per client and per server it should now keep track of the last RowDescription for each prepared statement. But that's definitely something we could do. This would make this patch much more involved though, as now it would suddenly involve an actual protocol change, and that basically depends on this patch moving forward[1]. 
[1]: https://www.postgresql.org/message-id/flat/cageczqtg2hcmb5gau53uuwcdc7gcnjfll6mnw0wnhwhgq9u...@mail.gmail.com v6-0002-Completely-remove-fixed_result-from-CachedPlanSou.patch Description: Binary data v6-0001-Support-prepared-statement-invalidation-when-resu.patch Description: Binary data
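The per-statement tracking a pooler would need for the middle-ground idea above can be sketched roughly as follows. This is a hedged illustration with invented names and a fixed-size table; row descriptions are modelled as plain strings rather than actual RowDescription wire messages:

```c
#include <assert.h>
#include <string.h>

#define MAX_STMTS 16

/* Last RowDescription forwarded to the client, per prepared statement. */
typedef struct
{
	const char *stmt;
	char		last_desc[256];
}			StmtDescCache;

static StmtDescCache cache[MAX_STMTS];
static int	ncached;			/* bounds checking omitted in this sketch */

/* Returns 1 if the RowDescription must be forwarded to the client. */
static int
must_forward_row_description(const char *stmt, const char *desc)
{
	int			i;

	for (i = 0; i < ncached; i++)
	{
		if (strcmp(cache[i].stmt, stmt) == 0)
		{
			if (strcmp(cache[i].last_desc, desc) == 0)
				return 0;		/* unchanged: client keeps its cached copy */
			strncpy(cache[i].last_desc, desc, sizeof(cache[i].last_desc) - 1);
			return 1;			/* changed (e.g. after DDL): forward it */
		}
	}
	cache[ncached].stmt = stmt;
	strncpy(cache[ncached].last_desc, desc, sizeof(cache[ncached].last_desc) - 1);
	ncached++;
	return 1;					/* first execution: always forward */
}
```

The same comparison would have to be done per client and per server connection in a real pooler; the sketch only shows the core "resend only on change" decision.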
Re: [EXTERNAL] Re: Add non-blocking version of PQcancel
On Thu, 28 Mar 2024 at 19:03, Tom Lane wrote: > > Alvaro Herrera writes: > > It doesn't fail when it's too fast -- it's just that it doesn't cover > > the case we want to cover. > > That's hardly better, because then you think you have test > coverage but maybe you don't. Honestly, that seems quite a lot better. Instead of having randomly failing builds, you have a test that creates coverage 80+% of the time. And that also seems a lot better than having no coverage at all (which is what we had for the last 7 years since introduction of cancellations to postgres_fdw). It would be good to expand the comment in the test though saying that the test might not always cover the intended code path, due to timing problems. > Could we make this test bulletproof by using an injection point? > If not, I remain of the opinion that we're better off without it. Possibly, and if so, I agree that would be better than the currently added test. But I honestly don't feel like spending the time on creating such a test. And given 7 years have passed without someone adding any test for this codepath at all, I don't expect anyone else will either. If you both feel we're better off without the test, feel free to remove it. This was just some small missing test coverage that I noticed while working on this patch, that I thought I'd quickly address. I don't particularly care a lot about the specific test.
Re: [EXTERNAL] Re: Add non-blocking version of PQcancel
On Thu, 28 Mar 2024 at 17:43, Alvaro Herrera wrote: > Hah, you're right, I can reproduce with a smaller timeout, and using SET > LOCAL works as a fix. If we're doing that, why not reduce the timeout > to 1ms? We don't need to wait extra 9ms ... I think we don't really want to make the timeout too short. Otherwise the query might get cancelled before we push any query down to the FDW. I guess that means that for some slow machines even 10ms is not enough to make the test do the intended purpose. I'd keep it at 10ms, which seems long enough for normal systems, while still being pretty short.
Re: [EXTERNAL] Re: Add non-blocking version of PQcancel
On Thu, 28 Mar 2024 at 17:34, Alvaro Herrera wrote: > > Eh, kestrel has also failed[1], apparently every query after the large > JOIN that this commit added as test fails with a statement timeout error. > > [1] > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=kestrel=2024-03-28%2016%3A01%3A14 Ugh that's annoying, the RESET is timing out too I guess. That can hopefully be easily fixed by changing the new test to: BEGIN; SET LOCAL statement_timeout = '10ms'; select count(*) from ft1 CROSS JOIN ft2 CROSS JOIN ft4 CROSS JOIN ft5; -- this takes very long ROLLBACK;
Re: Possibility to disable `ALTER SYSTEM`
On Thu, 28 Mar 2024 at 12:57, Robert Haas wrote: > I disagree with a lot of these changes. I think the old version was > mostly better. But I can live with a lot of it if it makes other > people happy. I'd have been fine with many of the previous versions of the docs too. (I'm not a native English speaker though, so that might be part of it) > However: Attached is a patch with your last bit of feedback addressed. v12-0001-Add-allow_alter_system-GUC.patch Description: Binary data
Re: Possibility to disable `ALTER SYSTEM`
On Thu, 28 Mar 2024 at 10:24, Jelte Fennema-Nio wrote: > I addressed them all I think. Mostly the small changes that were > suggested, but I rewrote the sentence with "might be discarded". And I > added references to the new GUC in both places suggested by David. Changed the error hint to use "external tool" too. And removed a duplicate header definition of AllowAlterSystem (I moved it to guc.h so it was together with other definitions a few patches ago, but apparently forgot to remove it from guc_tables.h) v11-0001-Add-allow_alter_system-GUC.patch Description: Binary data
Re: [EXTERNAL] Re: Add non-blocking version of PQcancel
On Wed, 27 Mar 2024 at 19:27, Alvaro Herrera wrote: > I ended up reducing the two PG_TRY blocks to a single one. I see no > reason to split them up, and this way it looks more legible. I definitely agree this looks better. Not sure why I hadn't done that, maybe it wasn't possible in one of the earlier iterations of the API.
Re: [EXTERNAL] Re: Add non-blocking version of PQcancel
On Wed, 27 Mar 2024 at 19:46, Alvaro Herrera wrote: > > On 2024-Mar-27, Alvaro Herrera wrote: > > > I changed it so that the error messages are returned as translated > > phrases, and was bothered by the fact that if errors happen repeatedly, > > the memory for them might be leaked. Maybe this is fine depending on > > the caller's memory context, but since it's only at most one string each > > time, it's quite easy to just keep track of it so that we can release it > > on the next. > > (Actually this sounds clever but fails pretty obviously if the caller > does free the string, such as in a memory context reset. So I guess we > have to just accept the potential leakage.) Your changes look good, apart from the prverror stuff indeed. If you remove the prverror stuff again I think this is ready to commit.
Re: Possibility to disable `ALTER SYSTEM`
On Thu, 28 Mar 2024 at 01:43, David G. Johnston wrote: > > On Wed, Mar 27, 2024 at 5:17 PM Bruce Momjian wrote: >> >> I addressed them all I think. Mostly the small changes that were suggested, but I rewrote the sentence with "might be discarded". And I added references to the new GUC in both places suggested by David. v10-0001-Add-allow_alter_system-GUC.patch Description: Binary data
Re: Possibility to disable `ALTER SYSTEM`
On Wed, 27 Mar 2024 at 23:06, Bruce Momjian wrote: > > > > > +some outside mechanism. In such environments, using > > > > > ALTER > > > > > +SYSTEM to make configuration changes might appear > > > > > to work, > > > > > +but then may be discarded at some point in the future when > > > > > that outside > > > > > > > > "might" > > > > > > This does not seem like a mistake to me. I'm not sure why you think it is. > > > > I also think the original sentence was correct, but I don't think it > > read very naturally. Changed it now in hopes to improve that. > > So, might means "possibility" while "may" means permission, so "might" > is clearer here. Aaah, I misunderstood your original feedback then. I thought you didn't like the use of "might" in "might appear to work". But I now realize you meant "may be discarded" should be changed to "might be discarded". I addressed that in my latest version of the patch.