Re: libpq compression (part 3)
On Mon, May 20, 2024 at 9:09 PM Andrey M. Borodin wrote: > > > > > On 20 May 2024, at 23:37, Robert Haas wrote: > > > > But, does this mean that we should just refuse to offer compression as > > a feature? > > No, absolutely, we need the feature. > > > I guess I don't understand why TLS removed > > support for compression entirely instead of disclaiming its use in some > > appropriate way. > > I think, the scope of TLS is too broad. HTTPS in turn has a compression. > But AFAIK it never compress headers. > IMO we should try to avoid compressing authentication information. > That used to be the case in HTTP/1. But header compression was one of the headline features of HTTP/2, which isn't exactly new anymore. But there's a special algorithm, HPACK, for it. And then HTTP/3 uses QPACK. Cloudflare has a pretty decent blog post explaining why and how: https://blog.cloudflare.com/hpack-the-silent-killer-feature-of-http-2/, or RFC 7541 for all the details. tl;dr: yes, let's be careful not to expose headers to a CRIME-style attack. And I doubt our connections have as much to gain by compressing "header-style" fields as HTTP does, so we are probably better off just not compressing those parts. -- Magnus Hagander Me: https://www.hagander.net/ Work: https://www.redpill-linpro.com/
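To make the CRIME-style concern above concrete, here is a minimal, hypothetical sketch (plain zlib in Python, not libpq code; all names invented) of the length side channel: when attacker-influenced bytes are compressed in the same stream as a secret, the compressed size alone tells the attacker whether a guess matched.

```python
import zlib

# Hypothetical illustration of a CRIME-style length oracle, not libpq code.
SECRET = b"authorization=supersecrettoken42"

def leaked_length(attacker_bytes: bytes) -> int:
    # What a passive network observer sees: the size of one compressed
    # record containing both a secret header and attacker-influenced data.
    return len(zlib.compress(SECRET + b"\n" + attacker_bytes))

# A guess matching the secret co-compresses into a cheap back-reference;
# a non-matching guess of the same length has to be emitted as literals.
matching = leaked_length(b"authorization=supersecrettoken42")
non_matching = leaked_length(b"x-filler-of-equal-length-a1b2c3d")
assert matching < non_matching  # the length alone confirms the guess
```

This is why HPACK/QPACK avoid a shared LZ77 window for headers, and why not compressing "header-style" protocol fields at all is the simpler defense.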
Re: commitfest.postgresql.org is no longer fit for purpose
On Thu, May 16, 2024 at 10:46 PM Melanie Plageman wrote: > On Thu, May 16, 2024 at 2:30 PM Robert Haas wrote: > > > > Hi, > > > > The original intent of CommitFests, and of commitfest.postgresql.org > > by extension, was to provide a place where patches could be registered > > to indicate that they needed to be reviewed, thus enabling patch > > authors and patch reviewers to find each other in a reasonably > > efficient way. I don't think it's working any more. I spent a good > > deal of time going through the CommitFest this week, and I didn't get > > through a very large percentage of it, and what I found is that the > > status of the patches registered there is often much messier than can > > be captured by a simple "Needs Review" or "Waiting on Author," and the > > number of patches that are actually in need of review is not all that > > large. For example, there are: > > > > - patches parked there by a committer who will almost certainly do > > something about them after we branch > > - patches parked there by a committer who probably won't do something > > about them after we branch, but maybe they will, or maybe somebody > > else will, and anyway this way we at least run CI > > -- snip -- > > > So, our CommitFest application has turned into a patch tracker. IMHO, > > patch trackers intrinsically tend to suck, because they fill up with > > garbage that nobody cares about, and nobody wants to do the colossal > > amount of work that it takes to maintain them. But our patch tracker > > sucks MORE, because it's not even intended to BE a general-purpose > > patch tracker. > > I was reflecting on why a general purpose patch tracker sounded > appealing to me, and I realized that, at least at this time of year, I > have a few patches that really count as "waiting on author" that I > know I need to do additional work on before they need more review but > which aren't currently my top priority. I should probably simply > withdraw and re-register them. 
My justification was that I'll lose > them if I don't keep them in the commitfest app. But, I could just, > you know, save them somewhere myself instead of polluting the > commitfest app with them. I don't know if others are in this > situation. Anyway, I'm definitely currently guilty of parking. > One thing I think we've talked about before (but not done) is to basically have a CF called "parking lot", where you can park patches that aren't active in a commitfest but that you don't want to declare dead. It would probably also be doable to have the cf bot run patches in that commitfest as well as the current one, if that's what people are using it for. -- Magnus Hagander Me: https://www.hagander.net/ Work: https://www.redpill-linpro.com/
Re: Security lessons from liblzma - libsystemd
On Thu, Apr 4, 2024 at 1:10 AM Peter Eisentraut wrote: > On 03.04.24 23:19, Magnus Hagander wrote: > > When the code is this simple, we should definitely consider carrying it > > ourselves. At least if we don't expect to need *other* functionality > > from the same library in the future, which I doubt we will from > libsystemd. > > Well, I've long had it on my list to do some integration to log directly > to the journal, so you can preserve metadata better. I'm not sure right > now whether this would use libsystemd, but it's not like there is > absolutely no other systemd-related functionality that could be added. > Ah, interesting. I hadn't thought of that use-case. > Personally, I think this proposed change is trying to close a barndoor > after a horse has bolted. There are many more interesting and scary > libraries in the dependency tree of "postgres", so just picking off one > right now doesn't really accomplish anything. The next release of > libsystemd will drop all the compression libraries as hard dependencies, > so the issue in that sense is gone anyway. Also, fun fact: liblzma is > also a dependency via libxml2. > To be clear, I didn't mean to single out this one, just saying that it's something we should keep in consideration in general when adding library dependencies. Every new dependency, no matter how small, increases the management burden and the risk. And we should just be aware of that and weigh them against each other. As in we should *consider* it, that doesn't mean we should necessarily *do* it. (And yes, there are many scary dependencies down the tree) -- Magnus Hagander Me: https://www.hagander.net/ Work: https://www.redpill-linpro.com/
Re: Add notes to pg_combinebackup docs
On Fri, Apr 12, 2024 at 3:01 PM Tomas Vondra wrote: > > > On 4/12/24 11:12, Magnus Hagander wrote: > > On Tue, Apr 9, 2024 at 11:46 AM Tomas Vondra < > tomas.von...@enterprisedb.com> > > wrote: > > > >> > >> > >> On 4/9/24 09:59, Martín Marqués wrote: > >>> Hello, > >>> > >>> While doing some work/research on the new incremental backup feature > >>> some limitations were not listed in the docs. Mainly the fact that > >>> pg_combienbackup works with plain format and not tar. > >>> > >> > >> Right. The docs mostly imply this by talking about output directory and > >> backup directories, but making it more explicit would not hurt. > >> > >> FWIW it'd be great if we could make incremental backups work with tar > >> format in the future too. People probably don't want to keep around the > >> expanded data directory or extract everything before combining the > >> backups is not very convenient. Reading and writing the tar would make > >> this simpler. > >> > >>> Around the same time, Tomas Vondra tested incremental backups with a > >>> cluster where he enabled checksums after taking the previous full > >>> backup. After combining the backups the synthetic backup had pages > >>> with checksums and other pages without checksums which ended in > >>> checksum errors. > >>> > >> > >> I'm not sure just documenting this limitation is sufficient. We can't > >> make the incremental backups work in this case (it's as if someone > >> messes with cluster without writing stuff into WAL), but I think we > >> should do better than silently producing (seemingly) corrupted backups > > > > > > +1. I think that should be an open item that needs to get sorted. > > > > > > I say seemingly, because the backup is actually fine, the only problem > >> is it has checksums enabled in the controlfile, but the pages from the > >> full backup (and the early incremental backups) have no checksums. 
> >> > >> What we could do is detect this in pg_combinebackup, and either just > >> disable checksums with a warning and hint to maybe enable them again. Or > >> maybe just print that the user needs to disable them. > >> > > > > I don't think either of these should be done automatically. Something > like > > pg_combinebackup simply failing and requiring you to say > > "--disable-checksums" to have it do that would be the way to go, IMO. > > (once we can reliably detect it of course) > > > > You mean pg_combinebackup would have "--disable-checksums" switch? Yeah, > that'd work, I think. It's probably better than producing a backup that > would seem broken when the user tries to start the instance. > > Also, I realized the user probably can't disable the checksums without > starting the instance to finish recovery. And if there are invalid > checksums, I'm not sure that would actually work. > > > > > I was thinking maybe we could detect this while taking the backups, and > >> force taking a full backup if checksums got enabled since the last > >> backup. But we can't do that because we only have the manifest from the > >> last backup, and the manifest does not include info about checksums. > >> > > > > Can we forcibly read and parse it out of pg_control? > > > > You mean when taking the backup, or during pg_combinebackup? > Yes. That way combining the backups into something that doesn't have proper checksums (either by actually turning them off or as today just breaking them and forcing you to turn it off yourself) can only happen intentionally. And if you weren't aware of the problem, it turns into a hard error, so you will notice before it's too late. During backup, it depends. For the instance we should be able to just > get that from the instance, no need to get it from pg_control. But for > the backup (used as a baseline for the increment) we can't read the > pg_control - the only thing we have is the manifest. 
> During pg_combinebackup we obviously can read pg_control for all the > backups to combine, but at that point it feels a bit too late - it does > not seem great to do backups, and then at recovery to tell the user the > backups are actually not valid. > I think it'd be better to detect this while taking the basebackup. > Agreed. In the end, we might want to do *both*, but the earlier the better. But to do that, what we'd need is to add a flag to the initial manifest that says "this cluster is supposed to have checksums enabled".
Re: Add notes to pg_combinebackup docs
On Tue, Apr 9, 2024 at 11:46 AM Tomas Vondra wrote: > > > On 4/9/24 09:59, Martín Marqués wrote: > > Hello, > > > > While doing some work/research on the new incremental backup feature > > some limitations were not listed in the docs. Mainly the fact that > > pg_combienbackup works with plain format and not tar. > > > > Right. The docs mostly imply this by talking about output directory and > backup directories, but making it more explicit would not hurt. > > FWIW it'd be great if we could make incremental backups work with tar > format in the future too. People probably don't want to keep around the > expanded data directory or extract everything before combining the > backups is not very convenient. Reading and writing the tar would make > this simpler. > > > Around the same time, Tomas Vondra tested incremental backups with a > > cluster where he enabled checksums after taking the previous full > > backup. After combining the backups the synthetic backup had pages > > with checksums and other pages without checksums which ended in > > checksum errors. > > > > I'm not sure just documenting this limitation is sufficient. We can't > make the incremental backups work in this case (it's as if someone > messes with cluster without writing stuff into WAL), but I think we > should do better than silently producing (seemingly) corrupted backups +1. I think that should be an open item that needs to get sorted. I say seemingly, because the backup is actually fine, the only problem > is it has checksums enabled in the controlfile, but the pages from the > full backup (and the early incremental backups) have no checksums. > > What we could do is detect this in pg_combinebackup, and either just > disable checksums with a warning and hint to maybe enable them again. Or > maybe just print that the user needs to disable them. > I don't think either of these should be done automatically. 
Something like pg_combinebackup simply failing and requiring you to say "--disable-checksums" to have it do that would be the way to go, IMO. (once we can reliably detect it of course) I was thinking maybe we could detect this while taking the backups, and > force taking a full backup if checksums got enabled since the last > backup. But we can't do that because we only have the manifest from the > last backup, and the manifest does not include info about checksums. > Can we forcibly read and parse it out of pg_control? It's a bit unfortunate we don't have a way to enable checksums online. > That'd not have this problem IIRC, because it writes proper WAL. Maybe > it's time to revive that idea ... I recall there were some concerns > about tracking progress to allow resuming stuff, but maybe not having > anything because in some (rare?) cases it'd need to do more work does > not seem like a great trade off. > > For that one I still think it would be perfectly acceptable to have no resume at all, but that's a whole different topic :) -- Magnus Hagander Me: https://www.hagander.net/ <http://www.hagander.net/> Work: https://www.redpill-linpro.com/ <http://www.redpill-linpro.com/>
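The fail-unless-opted-in behavior proposed above could look something like this sketch (all names hypothetical, Python rather than the actual pg_combinebackup C code): refuse to combine backups whose checksum state disagrees unless the user explicitly asks for checksums to be disabled.

```python
class ChecksumMismatchError(Exception):
    pass

def resolve_checksum_state(per_backup_checksums, disable_checksums_flag):
    """per_backup_checksums: one bool per backup in the chain, True if
    that backup was taken with data checksums enabled (e.g. as read from
    its pg_control). Returns the checksum state to record for the
    combined backup, or raises so the user must opt in explicitly."""
    states = set(per_backup_checksums)
    if len(states) == 1:
        return states.pop()   # consistent chain: keep the state as-is
    if disable_checksums_flag:
        return False          # user opted in: combine with checksums off
    raise ChecksumMismatchError(
        "backups disagree on data checksums; "
        "rerun with --disable-checksums to combine anyway")
```

The point of the hard error is that a synthetic backup with mixed checksum state can only be produced intentionally, never by accident.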
Re: Add notes to pg_combinebackup docs
compression that can be > done on the data. Of course, you'd want to store the compressed tars in > the object store, but that does mean storing an expanded copy somewhere > to do pg_combinebackup. > Object stores are definitely getting more common. I wish they were getting a lot more common than they actually are, because they simplify a lot. But they're in my experience still very far from being a majority. But if the argument is that all this can/will be fixed in the future, I > guess the smart thing for users to do is wait a few releases for > incremental backups to become a practical feature. > There's always going to be another set of goalposts further ahead. I think it can still be practical for quite a few people. I'm more worried about the issue you raised in the other thread about missing files, for example... -- Magnus Hagander Me: https://www.hagander.net/ <http://www.hagander.net/> Work: https://www.redpill-linpro.com/ <http://www.redpill-linpro.com/>
Re: Security lessons from liblzma - libsystemd
On Wed, Apr 3, 2024 at 7:57 PM Andres Freund wrote: > Hi, > > As most will know by now, the way xz debacle was able to make sshd > vulnerable > was through a dependency from sshd to libsystemd and then from libsystemd > to > liblzma. One lesson from this is that unnecessary dependencies can still > increase risk. > Yeah, I think that's something to consider for every dependency added. I think we're fairly often protected against "adding too many libraries" because many libraries simply don't exist for all the platforms we want to build on. But it's nevertheless something to think about each time. It's worth noting that we have an optional dependency on libsystemd as well. > > Openssh has now integrated [1] a patch to remove the dependency on > libsystemd > for triggering service manager readyness notifications, by inlining the > necessary function. That's not hard, the protocol is pretty simple. > > I suspect we should do the same. We're not even close to being a target as > attractive as openssh, but still, it seems unnecessary. > +1. When the code is this simple, we should definitely consider carrying it ourselves. At least if we don't expect to need *other* functionality from the same library in the future, which I doubt we will from libsystemd. An argument could be made to instead just remove support, but I think it's > quite valuable to have intra service dependencies that can rely on the > server > actually having started up. > > If we remove support we're basically just asking most of our linux packagers to add it back in, and they will add it back in the same way we did it. I think we do everybody a disservice if we do that. It's useful functionality. //Magnus
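For reference, the readiness protocol really is small. A hedged sketch of it (Python rather than C, and not the actual OpenSSH or PostgreSQL code): send one datagram containing the state string to the socket named in $NOTIFY_SOCKET, handling the Linux abstract-namespace case.

```python
import os
import socket

def sd_notify(state: str = "READY=1") -> bool:
    """Minimal stand-in for libsystemd's sd_notify(): a sketch of the
    protocol, not a drop-in replacement. Returns False when not running
    under a service manager."""
    addr = os.environ.get("NOTIFY_SOCKET")
    if not addr:
        return False  # no service manager listening
    if addr.startswith("@"):
        addr = "\0" + addr[1:]  # Linux abstract-namespace socket
    with socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) as s:
        s.sendto(state.encode(), addr)
    return True
```

That is essentially all the dependency on libsystemd buys for readiness notification, which is why inlining it is attractive.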
Re: Reports on obsolete Postgres versions
On Tue, Apr 2, 2024 at 9:24 AM Daniel Gustafsson wrote: > > On 2 Apr 2024, at 00:56, Bruce Momjian wrote: > > > I ended up writing the attached doc patch. I found that some or our > > text was overly-wordy, causing the impact of what we were trying to say > > to be lessened. We might want to go farther than this patch, but I > > think it is an improvement. > > Agreed, this is an good incremental improvement over what we have. > > > I also moved the text to the bottom of the section > > +1 > > A few small comments: > > +considers performing minor upgrades to be less risky than continuing to > +run superseded minor versions. > > I think "superseded minor versions" could be unnecessarily complicated for > non-native speakers, I consider myself fairly used to reading english but > still > had to spend a few extra (brain)cycles parsing the meaning of it in this > context. > > + We recommend that users always run the latest minor release associated > > Or perhaps "current minor release" which is the term we use in the table > below > on the same page? > I do like the term "current" better. It conveys (at least a bit) that we really consider all the older ones to be, well, obsolete. The difference "current vs obsolete" is stronger than "latest vs not quite latest". -- Magnus Hagander Me: https://www.hagander.net/ <http://www.hagander.net/> Work: https://www.redpill-linpro.com/ <http://www.redpill-linpro.com/>
Re: Reports on obsolete Postgres versions
On Wed, Mar 13, 2024 at 7:47 PM Jeremy Schneider wrote: > > > On Mar 13, 2024, at 11:39 AM, Tom Lane wrote: > > > > Jeremy Schneider writes: > >>> On 3/13/24 11:21 AM, Tom Lane wrote: > >>> Agreed, we would probably add confusion not reduce it if we were to > >>> change our longstanding nomenclature for this. > > > >> Before v10, the quarterly maintenance updates were unambiguously and > >> always called patch releases > > > > I think that's highly revisionist history. I've always called them > > minor releases, and I don't recall other people using different > > terminology. I believe the leadoff text on > > > > https://www.postgresql.org/support/versioning/ > > > > is much older than when we switched from two-part major version > > numbers to one-part major version numbers. > > Huh, that wasn’t what I expected. I only started (in depth) working with > PG around 9.6 and I definitely thought of “6” as the minor version. This is > an interesting mailing list thread. > That common misunderstanding was, in fact, one of the reasons to go to two-part version numbers instead of 3. Because people did not realize that the full 9.6 digit was the major version, and thus what was maintained and such. -- Magnus Hagander Me: https://www.hagander.net/ <http://www.hagander.net/> Work: https://www.redpill-linpro.com/ <http://www.redpill-linpro.com/>
Re: Statistics Import and Export
On Sat, Mar 30, 2024 at 1:26 AM Corey Huinker wrote: > > > On Fri, Mar 29, 2024 at 7:34 PM Jeff Davis wrote: > >> On Fri, 2024-03-29 at 18:02 -0400, Stephen Frost wrote: >> > I’d certainly think “with stats” would be the preferred default of >> > our users. >> >> I'm concerned there could still be paths that lead to an error. For >> pg_restore, or when loading a SQL file, a single error isn't fatal >> (unless -e is specified), but it still could be somewhat scary to see >> errors during a reload. >> > > To that end, I'm going to be modifying the "Optimizer statistics are not > transferred by pg_upgrade..." message when stats _were_ transferred, > with additional instructions that the user should treat any stats-ish > error messages encountered as a reason to manually analyze that table. We > should probably say something about extended stats as well. > > I'm coming late to this discussion and I apologize if I've missed this being discussed before. But. Please don't. That will make it *really* hard for any form of automation or drivers of this. The information needs to go somewhere where such tools can easily consume it, and an informational message during runtime (which is also likely to be translated in many environments) is the exact opposite of that. Surely we can come up with something better. Otherwise, I think all those tools are just going to have to end up assuming that it always failed and proceed based on that, and that would be a shame. -- Magnus Hagander Me: https://www.hagander.net/ Work: https://www.redpill-linpro.com/
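One shape the "somewhere tools can consume it" could take is a structured report file rather than a translated console message. A sketch only: the file name, keys, and helper below are invented for illustration, not an existing pg_upgrade feature.

```python
import json

def write_stats_report(path, tables_needing_analyze):
    # Hypothetical: upgrade tooling records the stats-transfer outcome in
    # a machine-readable file, so drivers and automation can act on it
    # without scraping (possibly translated) runtime messages.
    report = {
        "stats_transferred": len(tables_needing_analyze) == 0,
        "tables_needing_analyze": sorted(tables_needing_analyze),
    }
    with open(path, "w") as f:
        json.dump(report, f)
```

A driver tool could then rerun ANALYZE only on the listed tables, instead of assuming the transfer always failed.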
Re: Possibility to disable `ALTER SYSTEM`
On Mon, Mar 25, 2024 at 7:27 PM Tom Lane wrote: > Robert Haas writes: > > OK, great. The latest patch doesn't specifically talk about backing it > > up with filesystem-level controls, but it does clearly say that this > > feature is not going to stop a determined superuser from bypassing the > > feature, which I think is the appropriate level of detail. We don't > > actually know whether a user has filesystem-level controls available > > on their system that are equal to the task; certainly chmod isn't good > > enough, unless you can prevent the superuser from just running chmod > > again, which you probably can't. An FS-level immutable flag or some > > other kind of OS-level wizardry might well get the job done, but I > > don't think our documentation needs to speculate about that. > > True. For postgresql.conf, you can put it outside the data directory > and make it be owned by some other user, and the job is done. It's > harder for postgresql.auto.conf because that always lives in the data > directory which is necessarily postgres-writable, so even if you > did those two things to it the superuser could just rename or > remove it and then write postgresql.auto.conf of his choosing. > Just to add to that -- if you use chattr +i on it, the superuser in postgres won't be able to rename it -- only the actual root user. Just chowning it won't help of course, then the rename part works. -- Magnus Hagander Me: https://www.hagander.net/ <http://www.hagander.net/> Work: https://www.redpill-linpro.com/ <http://www.redpill-linpro.com/>
Re: Possibility to disable `ALTER SYSTEM`
On Wed, Mar 20, 2024 at 8:52 PM Robert Haas wrote: > On Wed, Mar 20, 2024 at 3:17 PM Magnus Hagander > wrote: > > Right, what I meant is that making it a packaging decision is the better > place. Wherever it goes, allowing the administrator to choose what fits > them should be made possible. > > +1. Which is also the justification for this patch, when it comes > right down to it. The administrator gets to decide how the contents of > postgresql.conf are to be managed on their particular installation. > Not really. The administrator can *already* do that. It's trivial. This patch is about doing it in a way that doesn't produce as ugly a message. But if we're "delegating" it to packagers and "OS administrators", then the problem is already solved. This patch is about trying to solve it *without* involving the packagers or OS administrators. Not saying we shouldn't do it, but I'd argue the exact opposite of yours above, which is that it's very much not the justification of the patch :) > They can decide that postgresql.conf should be writable by the same > user that runs PostgreSQL, or not. And they should also be able to > decide that ALTER SYSTEM is an OK way to change configuration, or that > it isn't. How we enable them to make that decision is a point for > discussion, and how exactly we phrase the documentation is a point for > discussion, but we have no business trying to impose conditions, as if > they're only allowed to make that decision if they conform to some > (IMHO ridiculous) requirements that we dictate from on high. It's > their system, not ours. > Agreed on all of those, except they can already do this. It's just that the error message is ugly. The path of least resistance would be to just specifically detect a permissions error on the postgresql.auto.conf file when you try to do ALTER SYSTEM, and throw at least an error hint about "you must allow writing to this file for the feature to work". So this patch isn't at all about enabling this functionality.
It's about making it more user-friendly. I mean, for crying out loud, users can set enable_seqscan=off in > postgresql.conf and GLOBALLY DISABLE SEQUENTIAL SCANS. They can set > This is actually a good example, because it's kind of like this patch. It doesn't *actually* disable the ability to run sequential scans, it just disables the "usual way". Just like this patch doesn't prevent the superuser from editing the config, but it does prevent them from doing it "the usual way". > zero_damaged_pages=on in postgresql.conf and silently remove vast > quantities of data without knowing that they're doing anything. We > don't even question that stuff ... although we probably should be > I like how you got this far and didn't even mention fsync=off :) -- Magnus Hagander Me: https://www.hagander.net/ Work: https://www.redpill-linpro.com/
Re: Possibility to disable `ALTER SYSTEM`
On Wed, Mar 20, 2024 at 8:14 PM Robert Haas wrote: > On Wed, Mar 20, 2024 at 3:11 PM Magnus Hagander > wrote: > > I would argue that having the default permissions not allow postgres to > edit it's own config files *except* for postgresql.auto.conf would be a > better default than what we have now, but that's completely independent of > the patch being discussed on this thread. (And FWIW also already solved on > debian-based platforms for example, which but the main config files in /etc > with postgres only having read permissions on them - and having the > *packagers* adapt such things for their platforms in general seems like a > better place). > > I don't think that I agree that it's categorically better, but it > might be better for some people or in some circumstances. I very much > do agree that it's a packaging question rather than our job to sort > out. > Right, what I meant is that making it a packaging decision is the better place. Wherever it goes, allowing the administrator to choose what fits them should be made possible. -- Magnus Hagander Me: https://www.hagander.net/ <http://www.hagander.net/> Work: https://www.redpill-linpro.com/ <http://www.redpill-linpro.com/>
Re: Possibility to disable `ALTER SYSTEM`
On Wed, Mar 20, 2024 at 8:04 PM Robert Haas wrote: > On Wed, Mar 20, 2024 at 11:07 AM Jelte Fennema-Nio > wrote: > > > Ugh, please let's not do this. This was bouncing around in my head > last night, and this is really a quite radical change - especially just to > handle the given ask, which is to prevent a specific command from running. > Not implement a brand new security system. There are so many ways this > could go wrong if we start having separate permissions for some of our > files. In addition to backups and other tools that need to write to the > conf files as the postgres user, what about systems that create a new > cluster automatically e.g. Patroni? It will now need elevated privs just to > create the conf files and assign the new ownership to them. Lots of moving > pieces there and ways things could go wrong. So a big -1 from me, as they > say/ :) > > > > Well put. I don't think the effort of making all tooling handle this > > correctly is worth the benefit that it brings. afaict everyone on this > > thread that actually wants to use this feature would be happy with the > > functionality that the current patch provides (i.e. having > > postgresql.auto.conf writable, but having ALTER SYSTEM error out). > > Yeah, I agree with this completely. I don't understand why people who > hate the feature and hope it dies in a fire get to decide how it has > to work. > > And also, if we verify that the configuration files are all read-only > at the OS level, that also prevents the external tool from managing > them. Well, it can: it can make them non-read-only after server start, > then modify them, then make them read-only again, and it can make sure > that if the system crashes, it again marks them read-only before > trying to start PG. But it seems quite obvious that this will be > inconvenient and difficult to get right. 
I find it quite easy to > understand the idea that someone wants the PostgreSQL configuration to > be managed by Kubernetes rather than via ALTER SYSTEM, but I can't > think of any scenario when you just don't want to be able to manage > the configuration at all. Who in the world would want that? > Yeah, I don't see why it's our responsibility to decide what permissions people should have on their config files. I would argue that having the default permissions not allow postgres to edit its own config files *except* for postgresql.auto.conf would be a better default than what we have now, but that's completely independent of the patch being discussed on this thread. (And FWIW also already solved on Debian-based platforms for example, which put the main config files in /etc with postgres only having read permissions on them - and having the *packagers* adapt such things for their platforms in general seems like a better place). -- Magnus Hagander Me: https://www.hagander.net/ Work: https://www.redpill-linpro.com/
Re: Possibility to disable `ALTER SYSTEM`
On Tue, Mar 19, 2024 at 3:52 PM Tom Lane wrote: > > Heikki Linnakangas writes: > > Perhaps we could make that even better with a GUC though. I propose a > > GUC called 'configuration_managed_externally = true / false". If you set > > it to true, we prevent ALTER SYSTEM and make the error message more > > definitive: > > > postgres=# ALTER SYSTEM SET wal_level TO minimal; > > ERROR: configuration is managed externally > > > As a bonus, if that GUC is set, we could even check at server startup > > that all the configuration files are not writable by the postgres user, > > and print a warning or refuse to start up if they are. > > I like this idea. The "bonus" is not optional though, because > setting the files' ownership/permissions is the only way to be > sure that the prohibition is even a little bit bulletproof. > > One small issue: how do we make that work on Windows? Have recent > versions grown anything that looks like real file permissions? Windows has had full ACL support since 1993. The easiest way to do what you're doing here is to just set a DENY permission on the postgres operating system user. //Magnus
Re: Possibility to disable `ALTER SYSTEM`
On Mon, Mar 18, 2024 at 4:44 PM Daniel Gustafsson wrote: > > > On 18 Mar 2024, at 16:34, Magnus Hagander wrote: > > > > On Mon, Mar 18, 2024 at 2:09 PM Daniel Gustafsson wrote: > >> > >>> On 18 Mar 2024, at 13:57, Robert Haas wrote: > >> > >>> my proposal is something like this, taking a > >>> bunch of text from Jelte's patch and some inspiration from Magnus's > >>> earlier remarks: > >> > >> I still think any wording should clearly mention that settings in the file > >> are > >> still applied. The proposed wording says to implicitly but to avoid > >> confusion > >> I think it should be explicit. > > > > I haven't kept up with the thread, but in general I'd prefer it to > > actually turn off parsing the file as well. I think just turning off > > the ability to change it -- including the ability to *revert* changes > > that were made to it before -- is going to be confusing. > > Wouldn't that break pgBackrest which IIRC write to .auto.conf directly > without using ALTER SYSTEM? Ugh of course. And not only that, it would also break pg_basebackup which does the same. So I guess that's not a good idea. I guess nobody anticipated this when that was done:) > > But, if we have decided it shouldn't do that, then IMHO we should > > consider naming it maybe enable_alter_system_command instead -- since > > we're only disabling the alter system command, not the actual feature > > in total. > > Good point. -- Magnus Hagander Me: https://www.hagander.net/ Work: https://www.redpill-linpro.com/
Re: Possibility to disable `ALTER SYSTEM`
On Mon, Mar 18, 2024 at 2:09 PM Daniel Gustafsson wrote: > > > On 18 Mar 2024, at 13:57, Robert Haas wrote: > > > my proposal is something like this, taking a > > bunch of text from Jelte's patch and some inspiration from Magnus's > > earlier remarks: > > I still think any wording should clearly mention that settings in the file are > still applied. The proposed wording says to implicitly but to avoid confusion > I think it should be explicit. I haven't kept up with the thread, but in general I'd prefer it to actually turn off parsing the file as well. I think just turning off the ability to change it -- including the ability to *revert* changes that were made to it before -- is going to be confusing. But, if we have decided it shouldn't do that, then IMHO we should consider naming it maybe enable_alter_system_command instead -- since we're only disabling the alter system command, not the actual feature in total. > > Perhaps figuring out how to > > document this is best left to a separate thread, and there's also the > > question of whether a new section that talks about this also ought to > > talk about anything else. But I feel like we're way overdue to do > > something about this. > > Seconded, both that it needs to be addressed and that it should be done on a > separate thread from this one. +1. -- Magnus Hagander Me: https://www.hagander.net/ Work: https://www.redpill-linpro.com/
Re: pg_dump include/exclude fails to error
On Sun, Mar 10, 2024 at 4:51 PM Pavel Stehule wrote:
>
> Hi
>
> ne 10. 3. 2024 v 15:23 odesílatel Magnus Hagander napsal:
>>
>> When including tables with the new pg_dump functionality, it fails to
>> error out if a table is missing, but only if more than one table is
>> specified.
>>
>> E.g., if table foo exists, but not bar:
>>
>> pg_dump --table bar
>> pg_dump: error: no matching tables were found
>>
>> with file "myfilter" containing just "table bar":
>> pg_dump --filter myfilter
>> pg_dump: error: no matching tables were found
>>
>> with the file "myfilter" containing both "table foo" and "table bar"
>> (order doesn't matter):
>>
>
> is not this expected behaviour (consistent with the -t option)?
>
> (2024-03-10 16:48:07) postgres=# \dt
>       List of relations
> ┌────────┬──────┬───────┬───────┐
> │ Schema │ Name │ Type  │ Owner │
> ╞════════╪══════╪═══════╪═══════╡
> │ public │ foo  │ table │ pavel │
> └────────┴──────┴───────┴───────┘
> (1 row)
>
> pavel@nemesis:~/src/orafce$ /usr/local/pgsql/master/bin/pg_dump -t foo -t boo > /dev/null
> pavel@nemesis:~/src/orafce$
>
> if you want to raise an error, you should use the option --strict-names.
>
> pavel@nemesis:~/src/orafce$ /usr/local/pgsql/master/bin/pg_dump -t foo -t boo --strict-names > /dev/null
> pg_dump: error: no matching tables were found for pattern "boo"

Hmpf, you're right. I guess I don't use multiple -t often enough :) So yeah, then I agree this is probably the right behaviour.

Maybe the docs for --filter deserve a specific mention of these rules though, as it's going to be a lot more common to specify multiples when using --filter?

--
Magnus Hagander
Me: https://www.hagander.net/
Work: https://www.redpill-linpro.com/
pg_dump include/exclude fails to error
When including tables with the new pg_dump functionality, it fails to error out if a table is missing, but only if more than one table is specified.

E.g., if table foo exists, but not bar:

pg_dump --table bar
pg_dump: error: no matching tables were found

with file "myfilter" containing just "table bar":

pg_dump --filter myfilter
pg_dump: error: no matching tables were found

with the file "myfilter" containing both "table foo" and "table bar" (order doesn't matter), no error is raised at all.

I haven't looked into the code, but it looks to me like some variable isn't properly reset, or perhaps there is a check for existence rather than count?

//Magnus
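To illustrate the difference between the two kinds of checks suspected above, here is a hypothetical sketch in Python (not the actual pg_dump code; names are made up for demonstration):

```python
# Hypothetical sketch of two matching strategies, not pg_dump's code.
patterns = ["foo", "bar"]   # entries from the filter file
existing = {"foo"}          # tables that actually exist

# Existence-style check: errors only if *no* pattern matched anything,
# so a single existing table masks all the missing ones.
if not any(p in existing for p in patterns):
    raise SystemExit("pg_dump: error: no matching tables were found")

# Per-pattern check: reports each unmatched pattern individually,
# which is the behaviour --strict-names gives for -t.
unmatched = [p for p in patterns if p not in existing]
for p in unmatched:
    print(f'pg_dump: error: no matching tables were found for pattern "{p}"')
```

With one existing table and one missing one, the first check passes silently while the second flags "bar", matching the behaviour reported in this thread.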
Re: May be BUG. Periodic burst growth of the checkpoint_req counter on replica.
On Fri, Dec 22, 2023 at 11:04 PM Alexander Korotkov wrote: > > Hi, Anton! > > On Mon, Dec 4, 2023 at 3:50 AM Anton A. Melnikov > wrote: >> >> Thanks for remarks! >> >> On 28.11.2023 21:34, Alexander Korotkov wrote: >> > After examining the second patch >> > ("v2-0001-Add-restartpoint-stats.patch"), it appears that adding >> > additional statistics as outlined in the patch is the most suitable >> > approach to address the concerns raised. This solution provides more >> > visibility into the system's behavior without altering its core >> > mechanics. >> >> Agreed. I left only this variant of the patch and rework it due to commit >> 96f05261. >> So the new counters is in the pg_stat_checkpointer view now. >> Please see the v3-0001-add-restartpoints-stats.patch attached. >> >> >> > However, it's essential that this additional functionality >> > is accompanied by comprehensive documentation to ensure clear >> > understanding and ease of use by the PostgreSQL community. >> > >> > Please consider expanding the documentation to include detailed >> > explanations of the new statistics and their implications in various >> > scenarios. >> >> In the separate v3-0002-doc-for-restartpoints-stats.patch i added the >> definitions >> of the new counters into the "28.2.15. pg_stat_checkpointer" section >> and explanation of them with examples into the "30.5.WAL Configuration" one. >> >> Would be glad for any comments and and concerns. > > > I made some grammar corrections to the docs and have written the commit > message. > > I think this patch now looks good. I'm going to push this if there are no > objections. Per the docs, the sync_time, write_time and buffers_written only apply to checkpoints, not restartpoints. Is this correct? AFAICT from a quick look at the code they include both checkpoints and restartpoints in which case I think the docs should be clarified to indicate that? (Or if I'm wrong, and it doesn't include them, then shouldn't we have separate counters for them?) 
//Magnus
Replication conflicts not processed in ClientWrite
When a backend is blocked on writing data (such as with a network error or a very slow client), indicated with wait event ClientWrite, it appears not to notice that it's overrunning max_standby_streaming_delay, and therefore does not cancel the transaction on the backend.

I've reproduced this repeatedly on Ubuntu 20.04 with PostgreSQL 15 out of the Debian packages. Curiously enough, if I install the debug symbols and restart in order to get a backtrace, it starts processing the cancellation again and I can no longer reproduce it. So it sounds like some timing issue around it.

My simple test was, with session 1 on the standby and session 2 on the primary:

Session 1: begin transaction isolation level repeatable read;
Session 1: select count(*) from testtable;
Session 2: alter table testtable rename to testtable2;
Session 1: select * from testtable t1 cross join testtable t2;
kill -STOP

At this point, replication lag starts growing on the standby and it never terminates the session. If I then SIGCONT it, it will get terminated by a replication conflict. If I kill the session hard, the replication lag recovers immediately.

AFAICT if the conflict happens at ClientRead, for example, it's picked up immediately, but there's something in ClientWrite that prevents it. My first thought would be OpenSSL, but this is reproducible both on tls-over-tcp and on unix sockets.

--
Magnus Hagander
Me: https://www.hagander.net/
Work: https://www.redpill-linpro.com/
Re: System username in pg_stat_activity
On Fri, Feb 16, 2024 at 9:45 PM Andres Freund wrote:
>
> Hi,
>
> On 2024-02-16 15:22:16 -0500, Tom Lane wrote:
> > Magnus Hagander writes:
> > > I mean, we could split it into more than one view. But adding a new
> > > view for every new thing we want to show is also not very good from
> > > either a usability or performance perspective. So where would we put
> > > it?
> >
> > It'd have to be a new view with a row per session, showing static
> > (or at least mostly static?) properties of the session.
>
> Yep.
>
> > Could we move some existing fields of pg_stat_activity into such a
> > view?
>
> I'd suspect that at least some of
> - leader_pid
> - datid
> - datname
> - usesysid
> - usename
> - backend_start
> - client_addr
> - client_hostname
> - client_port
> - backend_type
>
> could be moved. Whether it's worth breaking existing queries, I don't quite know.

I think that's the big question. I think if we move all of those we will break every single monitoring tool out there for postgres... That's a pretty hefty price.

> One option would be to not return (some) of them from pg_stat_get_activity(),
> but add them to the view in a way that the planner can elide the reference.

Without having any numbers, I would think that the join to pg_authid for example is likely more costly than returning all the other fields. But that one does get eliminated as long as one doesn't query that column. But if we make more things "joined in from the view", isn't that likely to just make it more expensive in most cases?

> > I'm not sure that this is worth the trouble TBH. If it can be shown
> > that pulling a few fields out of pg_stat_activity actually does make
> > for a useful speedup, then maybe OK ... but Andres hasn't provided
> > any evidence that there's a measurable issue.
>
> If I thought that the two columns proposed here were all that we wanted to
> add, I'd not be worried. But there have been quite a few other fields
> proposed, e.g.
tracking idle/active time on a per-connection granularity.
>
> We even already have a patch to add pg_stat_session
> https://commitfest.postgresql.org/47/3405/

In a way, that's yet another different type of values though -- it contains accumulated stats. So we really have 3 types -- "info" that's not really stats (username, etc), "current state" (query, wait events, state) and "accumulated stats" (counters since start). If we don't want to combine them all, we should perhaps not combine any and actually have 3 views?

--
Magnus Hagander
Me: https://www.hagander.net/
Work: https://www.redpill-linpro.com/
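To make the three-way distinction concrete, here is a toy grouping in Python (purely illustrative; the view and column names are examples, not a proposal):

```python
# The three kinds of per-backend data discussed above, as a toy grouping.
# All names here are illustrative examples only.
column_kinds = {
    "info":        ["usename", "datname", "auth_method", "backend_start"],
    "state":       ["state", "query", "wait_event_type", "wait_event"],
    "accumulated": ["active_time", "idle_in_transaction_time"],
}

# "Not combining any" would mean one view per kind, each keyed by pid
# (the way pg_stat_ssl and pg_stat_gssapi are keyed today):
views = {f"pg_stat_session_{kind}": ["pid"] + cols
         for kind, cols in column_kinds.items()}

for name, cols in views.items():
    print(name, cols)
```

The point of the sketch is only that each category changes at a different rate (never, constantly, monotonically), which is the argument for splitting them.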
Re: System username in pg_stat_activity
On Fri, Feb 16, 2024 at 9:31 PM Magnus Hagander wrote: > > On Fri, Feb 16, 2024 at 9:20 PM Andres Freund wrote: > > > > Hi, > > > > On 2024-02-16 20:57:59 +0100, Magnus Hagander wrote: > > > On Fri, Feb 16, 2024 at 8:41 PM Andres Freund wrote: > > > > On 2024-01-10 12:46:34 +0100, Magnus Hagander wrote: > > > > > The attached patch adds a column "authuser" to pg_stat_activity which > > > > > contains the username of the externally authenticated user, being the > > > > > same value as the SYSTEM_USER keyword returns in a backend. > > > > > > > > I continue to think that it's a bad idea to make pg_stat_activity ever > > > > wider > > > > with columns that do not actually describe properties that change > > > > across the > > > > course of a session. Yes, there's the argument that that ship has > > > > sailed, but > > > > I don't think that's a good reason to continue ever further down that > > > > road. > > > > > > > > It's not just a usability issue, it also makes it more expensive to > > > > query > > > > pg_stat_activity. This is of course more pronounced with textual > > > > columns than > > > > with integer ones. > > > > > > That's a fair point, but I do think that has in most ways already sailed, > > > yes. > > > > > > I mean, we could split it into more than one view. But adding a new > > > view for every new thing we want to show is also not very good from > > > either a usability or performance perspective. So where would we put > > > it? > > > > I think we should group new properties that don't change over the course of > > a > > session ([1]) in a new view (e.g. pg_stat_session). I don't think we need > > one > > view per property, but I do think it makes sense to split information that > > changes very frequently (like most pg_stat_activity contents) from > > information > > that doesn't (like auth_method, auth_identity). > > That would make sense in many ways, but ends up with "other level of > annoyances". E.g. 
the database name and oid don't change, but would we > want to move those out of pg_stat_activity? Same for username? Don't > we just end up in a grayzone about what belongs where? > > Also - were you envisioning just another view, or actually replacing > the pg_stat_get_activity() part? As in where do you think the cost > comes? Andres -- did you spot this question in the middle or did it get lost in the flurry of others? :) -- Magnus Hagander Me: https://www.hagander.net/ Work: https://www.redpill-linpro.com/
Re: System username in pg_stat_activity
On Fri, Feb 16, 2024 at 9:51 PM Andres Freund wrote: > > Hi, > > On 2024-02-16 21:41:41 +0100, Magnus Hagander wrote: > > > Maybe I am missing something, but why aren't we just getting the value > > > from > > > the leader's entry, instead of copying it? > > > > The answer to that would be "because I didn't think of it" :) > > :) > > > > Were you thinking of something like the attached? > > > @@ -435,6 +438,22 @@ pg_stat_get_activity(PG_FUNCTION_ARGS) > > { > > values[29] = > > Int32GetDatum(leader->pid); > > nulls[29] = false; > > + > > + /* > > + * The authenticated user in a > > parallel worker is the same as the one in > > + * the leader, so look it up there. > > + */ > > + if (leader->backendId) > > + { > > + LocalPgBackendStatus > > *leaderstat = pgstat_get_local_beentry_by_backend_id(leader->backendId); > > + > > + if > > (leaderstat->backendStatus.st_auth_method != uaReject && > > leaderstat->backendStatus.st_auth_method != uaImplicitReject) > > + { > > + nulls[31] = nulls[32] > > = false; > > + values[31] = > > CStringGetTextDatum(hba_authname(leaderstat->backendStatus.st_auth_method)); > > + values[32] = > > CStringGetTextDatum(leaderstat->backendStatus.st_auth_identity); > > + } > > + } > > Mostly, yes. > > I only skimmed the patch, but it sure looks to me that we could end up with > none of the branches setting 31,32, so I think you'd have to make sure to > handle that case. That case sets nulls[] for both of them to true I believe? And when that is set I don't believe we need to set the values themselves. -- Magnus Hagander Me: https://www.hagander.net/ Work: https://www.redpill-linpro.com/
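The pattern Magnus is describing can be sketched as a toy in Python (this is an illustration of the nulls-default-to-true reading, not the actual pgstatfuncs.c code; the column indexes just mirror the snippet quoted above):

```python
# Toy stand-in for the C pattern: nulls[] starts out all-true, and a
# branch flips entries to false only when it also supplies a value.
NUM_COLS = 33  # illustrative
nulls = [True] * NUM_COLS
values = [None] * NUM_COLS

leader_auth = None  # e.g. no leader entry, or auth was uaReject

if leader_auth is not None:
    nulls[31] = nulls[32] = False
    values[31], values[32] = leader_auth

# If no branch ran, columns 31/32 are still marked NULL -- the
# "none of the branches set 31,32" case falls out of the initialization.
assert nulls[31] and nulls[32]
```

Under that reading, the unset case needs no extra handling; whether the real initialization works that way is exactly the question being discussed.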
Re: System username in pg_stat_activity
On Fri, Feb 16, 2024 at 8:55 PM Andres Freund wrote: > > Hi, > > On 2024-01-12 17:16:53 +0100, Magnus Hagander wrote: > > On Thu, Jan 11, 2024 at 5:55 PM Bertrand Drouvot > > wrote: > > > On Thu, Jan 11, 2024 at 02:24:58PM +0100, Magnus Hagander wrote: > > > > On Wed, Jan 10, 2024 at 3:12 PM Bertrand Drouvot > > > > wrote: > > > > > > > > > > If we go the 2 fields way, then what about auth_identity and > > > > > auth_method then? > > > > > > > > > > > > Here is an updated patch based on this idea. > > > > > > Thanks! > > > > > > + > > > + > > > + auth_method text > > > + > > > + > > > + The authentication method used for authenticating the connection, > > > or > > > + NULL for background processes. > > > + > > > > > > I'm wondering if it would make sense to populate it for parallel workers > > > too. > > > I think it's doable thanks to d951052, but I'm not sure it's worth it > > > (one could > > > join based on the leader_pid though). OTOH that would be consistent with > > > how the SYSTEM_USER behaves with parallel workers (it's populated). > > > > I guess one could conceptually argue that "authentication happens int > > he leader". But we do populate it with the other user records, and > > it'd be weird if this one was excluded. > > > > The tricky thing is that pgstat_bestart() is called long before we > > deserialize the data. But from what I can tell it should be safe to > > change it per the attached? That should be AFAICT an extremely short > > window of time longer before we report it, not enough to matter. > > I don't like that one bit. The whole subsystem initialization dance already is > quite complicated, particularly for pgstat, we shouldn't make it more > complicated. Especially not when the initialization is moved quite a bit away > from all the other calls. > > Besides just that, I also don't think delaying visibility of the worker in > pg_stat_activity until parallel worker initialization has completed is good, > that's not all cheap work. 
> > > Maybe I am missing something, but why aren't we just getting the value from > the leader's entry, instead of copying it? The answer to that would be "because I didn't think of it" :) Were you thinking of something like the attached? -- Magnus Hagander Me: https://www.hagander.net/ Work: https://www.redpill-linpro.com/ diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml index cf3de80394..9403df5886 100644 --- a/doc/src/sgml/func.sgml +++ b/doc/src/sgml/func.sgml @@ -23891,7 +23891,7 @@ SELECT * FROM pg_ls_dir('.') WITH ORDINALITY AS t(ls,n); - + system_user system_user diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml index 5cf9363ac8..51ed656b17 100644 --- a/doc/src/sgml/monitoring.sgml +++ b/doc/src/sgml/monitoring.sgml @@ -784,6 +784,28 @@ postgres 27093 0.0 0.0 30096 2752 ?Ss 11:34 0:00 postgres: ser + + + auth_method text + + + The authentication method used for authenticating the connection, or + NULL for background processes. + + + + + + auth_identity text + + + The identity (if any) that the user presented during the authentication + cycle before they were assigned a database role. Contains the same + value as the identity part in , or NULL + for background processes. 
+ + + application_name text diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql index 04227a72d1..bb1060fc66 100644 --- a/src/backend/catalog/system_views.sql +++ b/src/backend/catalog/system_views.sql @@ -873,6 +873,8 @@ CREATE VIEW pg_stat_activity AS S.leader_pid, S.usesysid, U.rolname AS usename, +S.auth_method, +S.auth_identity, S.application_name, S.client_addr, S.client_hostname, diff --git a/src/backend/libpq/auth.c b/src/backend/libpq/auth.c index 9bbdc4beb0..6621969154 100644 --- a/src/backend/libpq/auth.c +++ b/src/backend/libpq/auth.c @@ -625,9 +625,20 @@ ClientAuthentication(Port *port) status = CheckRADIUSAuth(port); break; case uaCert: - /* uaCert will be treated as if clientcert=verify-full (uaTrust) */ + + /* + * uaCert will be treated as if clientcert=verify-full further + * down + */ + break; case uaTrust: status = STATUS_OK; + + /* + * Trust do
Re: System username in pg_stat_activity
On Fri, Feb 16, 2024 at 9:20 PM Andres Freund wrote: > > Hi, > > On 2024-02-16 20:57:59 +0100, Magnus Hagander wrote: > > On Fri, Feb 16, 2024 at 8:41 PM Andres Freund wrote: > > > On 2024-01-10 12:46:34 +0100, Magnus Hagander wrote: > > > > The attached patch adds a column "authuser" to pg_stat_activity which > > > > contains the username of the externally authenticated user, being the > > > > same value as the SYSTEM_USER keyword returns in a backend. > > > > > > I continue to think that it's a bad idea to make pg_stat_activity ever > > > wider > > > with columns that do not actually describe properties that change across > > > the > > > course of a session. Yes, there's the argument that that ship has > > > sailed, but > > > I don't think that's a good reason to continue ever further down that > > > road. > > > > > > It's not just a usability issue, it also makes it more expensive to query > > > pg_stat_activity. This is of course more pronounced with textual columns > > > than > > > with integer ones. > > > > That's a fair point, but I do think that has in most ways already sailed, > > yes. > > > > I mean, we could split it into more than one view. But adding a new > > view for every new thing we want to show is also not very good from > > either a usability or performance perspective. So where would we put > > it? > > I think we should group new properties that don't change over the course of a > session ([1]) in a new view (e.g. pg_stat_session). I don't think we need one > view per property, but I do think it makes sense to split information that > changes very frequently (like most pg_stat_activity contents) from information > that doesn't (like auth_method, auth_identity). That would make sense in many ways, but ends up with "other level of annoyances". E.g. the database name and oid don't change, but would we want to move those out of pg_stat_activity? Same for username? Don't we just end up in a grayzone about what belongs where? 
Also - were you envisioning just another view, or actually replacing the pg_stat_get_activity() part? As in, where do you think the cost comes from?

(And as to Tom's question about a key column - the pid column can surely be that? We already do that for pg_stat_ssl and pg_stat_gssapi, which are both driven from pg_stat_get_activity() but show a different set of columns.)

> Additionally I think something like pg_stat_session could also contain
> per-session cumulative counters like the session's contribution to
> pg_stat_database.{idle_in_transaction_time,active_time}

--
Magnus Hagander
Me: https://www.hagander.net/
Work: https://www.redpill-linpro.com/
Re: System username in pg_stat_activity
On Fri, Feb 16, 2024 at 8:41 PM Andres Freund wrote: > > Hi, > > On 2024-01-10 12:46:34 +0100, Magnus Hagander wrote: > > The attached patch adds a column "authuser" to pg_stat_activity which > > contains the username of the externally authenticated user, being the > > same value as the SYSTEM_USER keyword returns in a backend. > > I continue to think that it's a bad idea to make pg_stat_activity ever wider > with columns that do not actually describe properties that change across the > course of a session. Yes, there's the argument that that ship has sailed, but > I don't think that's a good reason to continue ever further down that road. > > It's not just a usability issue, it also makes it more expensive to query > pg_stat_activity. This is of course more pronounced with textual columns than > with integer ones. That's a fair point, but I do think that has in most ways already sailed, yes. I mean, we could split it into more than one view. But adding a new view for every new thing we want to show is also not very good from either a usability or performance perspective. So where would we put it? -- Magnus Hagander Me: https://www.hagander.net/ Work: https://www.redpill-linpro.com/
Re: System username in pg_stat_activity
On Fri, Jan 19, 2024 at 1:43 PM Julien Rouhaud wrote: > > Hi, > > On Thu, Jan 18, 2024 at 11:01 PM Magnus Hagander wrote: > > > > I did. Here it is, and also including that suggested docs fix as well > > as a rebase on current master. > > +if (MyClientConnectionInfo.authn_id) > +strlcpy(lbeentry.st_auth_identity, > MyClientConnectionInfo.authn_id, NAMEDATALEN); > +else > +MemSet(_auth_identity, 0, > sizeof(lbeentry.st_auth_identity)); > > Should we use pg_mbcliplen() here? I don't think there's any strong > guarantee that no multibyte character can be used. I also agree with > the nearby comment about MemSet being overkill. Hm. Good question. I don't think there is such a guarantee, no. So something like attached v5? Also, wouldn't that problem already exist a few lines down for the SSL parts? > + value as the identity part in , or NULL > I was looking at > https://www.postgresql.org/docs/current/auth-username-maps.html and > noticed that this page is switching between system-user and > system-username. Should we clean that up while at it? Seems like something we should clean up yes, but not as part of this patch. -- Magnus Hagander Me: https://www.hagander.net/ Work: https://www.redpill-linpro.com/ diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml index cf3de80394..9403df5886 100644 --- a/doc/src/sgml/func.sgml +++ b/doc/src/sgml/func.sgml @@ -23891,7 +23891,7 @@ SELECT * FROM pg_ls_dir('.') WITH ORDINALITY AS t(ls,n); - + system_user system_user diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml index 5cf9363ac8..51ed656b17 100644 --- a/doc/src/sgml/monitoring.sgml +++ b/doc/src/sgml/monitoring.sgml @@ -784,6 +784,28 @@ postgres 27093 0.0 0.0 30096 2752 ?Ss 11:34 0:00 postgres: ser + + + auth_method text + + + The authentication method used for authenticating the connection, or + NULL for background processes. 
+ + + + + + auth_identity text + + + The identity (if any) that the user presented during the authentication + cycle before they were assigned a database role. Contains the same + value as the identity part in , or NULL + for background processes. + + + application_name text diff --git a/src/backend/access/transam/parallel.c b/src/backend/access/transam/parallel.c index 849a03e4b6..269ab4e586 100644 --- a/src/backend/access/transam/parallel.c +++ b/src/backend/access/transam/parallel.c @@ -1529,6 +1529,9 @@ ParallelWorkerMain(Datum main_arg) /* Attach to the leader's serializable transaction, if SERIALIZABLE. */ AttachSerializableXact(fps->serializable_xact_handle); + /* Report to the stats system that we are started */ + pgstat_bestart(); + /* * We've initialized all of our state now; nothing should change * hereafter. diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql index 04227a72d1..bb1060fc66 100644 --- a/src/backend/catalog/system_views.sql +++ b/src/backend/catalog/system_views.sql @@ -873,6 +873,8 @@ CREATE VIEW pg_stat_activity AS S.leader_pid, S.usesysid, U.rolname AS usename, +S.auth_method, +S.auth_identity, S.application_name, S.client_addr, S.client_hostname, diff --git a/src/backend/libpq/auth.c b/src/backend/libpq/auth.c index 9bbdc4beb0..6621969154 100644 --- a/src/backend/libpq/auth.c +++ b/src/backend/libpq/auth.c @@ -625,9 +625,20 @@ ClientAuthentication(Port *port) status = CheckRADIUSAuth(port); break; case uaCert: - /* uaCert will be treated as if clientcert=verify-full (uaTrust) */ + + /* + * uaCert will be treated as if clientcert=verify-full further + * down + */ + break; case uaTrust: status = STATUS_OK; + + /* + * Trust doesn't set_authn_id(), but we still need to store the + * auth_method + */ + MyClientConnectionInfo.auth_method = uaTrust; break; } @@ -645,6 +656,12 @@ ClientAuthentication(Port *port) #endif } + /* + * All auth methods should have either called set_authn_id() or manually + * set 
the auth_method if they were successful. + */ + Assert(status != STATUS_OK || MyClientConnectionInfo.auth_method != 0); + if (Log_connections && status == STATUS_OK && !MyClientConnectionInfo.authn_id) { diff --git a/src/backend/utils/activity/backend_status.c b/src/backend/utils/activity/backend_status.c index 1a1050c8da..119319e85a 100644 --- a/src/backend/utils/activity/backend_status.c +++ b/src/backend/utils/activity/backend_status.c @@ -357,6 +357,17 @@ pgstat_bestart(void) else MemSet(_clientaddr, 0, sizeof(lbeentry.st_clientaddr)); + lb
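The multibyte concern behind the pg_mbcliplen() question earlier in this thread can be illustrated with a Python stand-in (the backend code is C; this only demonstrates the failure mode a byte-blind strlcpy into a NAMEDATALEN-sized buffer can hit):

```python
# Why blind byte-truncation of an identity string can split a multibyte
# character -- the situation pg_mbcliplen() exists to avoid.
identity = "usér"                 # 'é' is two bytes in UTF-8
raw = identity.encode("utf-8")    # b'us\xc3\xa9r', 5 bytes

budget = 3                        # pretend the buffer only holds 3 bytes
naive = raw[:budget]              # cuts 'é' in half: b'us\xc3'

try:
    naive.decode("utf-8")
    valid = True
except UnicodeDecodeError:
    valid = False                 # truncated mid-character

# Clipping at a character boundary instead keeps the string valid:
safe = raw[:budget].decode("utf-8", errors="ignore")
```

The naive cut produces an invalid byte sequence, while boundary-aware clipping simply drops the half-copied character.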
Re: System username in pg_stat_activity
On Fri, Jan 19, 2024 at 12:33 PM Aleksander Alekseev wrote: > > Hi, > > > > Did you forget to share the new revision (aka v4)? I can only see the > > > "reorder_parallel_worker_bestart.patch" attached. > > > > I did. Here it is, and also including that suggested docs fix as well > > as a rebase on current master. > > ``` > +lbeentry.st_auth_method = MyClientConnectionInfo.auth_method; > +if (MyClientConnectionInfo.authn_id) > +strlcpy(lbeentry.st_auth_identity, > MyClientConnectionInfo.authn_id, NAMEDATALEN); > +else > +MemSet(_auth_identity, 0, > sizeof(lbeentry.st_auth_identity)); > ``` > > I believe using sizeof(lbeentry.st_auth_identity) instead of > NAMEDATALEN is generally considered a better practice. We use the NAMEDATALEN method in the rest of the function, so I did it the same way for consistency. I think if we want to change that, we should change the whole function at once to keep it consistent. > Calling MemSet for a CString seems to be an overkill. I suggest > setting lbeentry.st_auth_identity[0] to zero. Fair enough. Will make that change. //Magnus
Re: System username in pg_stat_activity
On Fri, Jan 19, 2024 at 7:20 AM Bertrand Drouvot wrote: > > Hi, > > On Thu, Jan 18, 2024 at 04:01:33PM +0100, Magnus Hagander wrote: > > On Mon, Jan 15, 2024 at 11:17 AM Bertrand Drouvot > > > Did you forget to share the new revision (aka v4)? I can only see the > > > "reorder_parallel_worker_bestart.patch" attached. > > > > I did. Here it is, and also including that suggested docs fix as well > > as a rebase on current master. > > > > Thanks! > > Just a few comments: > > 1 === > > + The authentication method used for authenticating the connection, or > + NULL for background processes. > > What about? "NULL for background processes (except for parallel workers which > inherit this information from their leader process)" Ugh. That doesn't read very well at all to me. Maybe just "NULL for background processes without a user"? > 2 === > > + cycle before they were assigned a database role. Contains the same > + value as the identity part in , or NULL > + for background processes. > > Same comment about parallel workers. > > 3 === > > +# pg_stat_activity should contain trust and empty string for trust auth > +$res = $node->safe_psql( > + 'postgres', > + "SELECT auth_method, auth_identity='' FROM pg_stat_activity WHERE > pid=pg_backend_pid()", > + connstr => "user=scram_role"); > +is($res, 'trust|t', 'Users with trust authentication should show correctly > in pg_stat_activity'); > + > +# pg_stat_activity should contain NULL for auth of background processes > +# (test is a bit out of place here, but this is where the other > pg_stat_activity.auth* tests are) > +$res = $node->safe_psql( > + 'postgres', > + "SELECT auth_method IS NULL, auth_identity IS NULL FROM > pg_stat_activity WHERE backend_type='checkpointer'", > +); > +is($res, 't|t', 'Background processes should show NULL for auth in > pg_stat_activity'); > > What do you think about testing the parallel workers cases too? (I'm not 100% > sure it's worth the extra complexity though). 
I'm leaning towards not worth that. -- Magnus Hagander Me: https://www.hagander.net/ Work: https://www.redpill-linpro.com/
Re: Possibility to disable `ALTER SYSTEM`
On Tue, Jan 30, 2024 at 10:48 PM Tom Lane wrote: > > Robert Haas writes: > > There's nothing wrong with that exactly, but what does it gain us over > > my proposal of a sentinel file? > > I was imagining using selinux and/or sepgsql to directly prevent > writing postgresql.auto.conf from the Postgres account. Combine that > with a non-Postgres-owned postgresql.conf (already supported) and you > have something that seems actually bulletproof, rather than a hint. > Admittedly, using that approach requires knowing something about a > non-Postgres security mechanism. Wouldn't a simple "chattr +i postgresql.auto.conf" work? -- Magnus Hagander Me: https://www.hagander.net/ Work: https://www.redpill-linpro.com/
Re: System username in pg_stat_activity
On Mon, Jan 15, 2024 at 11:17 AM Bertrand Drouvot wrote: > > Hi, > > On Fri, Jan 12, 2024 at 05:16:53PM +0100, Magnus Hagander wrote: > > On Thu, Jan 11, 2024 at 5:55 PM Bertrand Drouvot > > wrote: > > > > > > I'm wondering if it would make sense to populate it for parallel workers > > > too. > > > I think it's doable thanks to d951052, but I'm not sure it's worth it > > > (one could > > > join based on the leader_pid though). OTOH that would be consistent with > > > how the SYSTEM_USER behaves with parallel workers (it's populated). > > > > I guess one could conceptually argue that "authentication happens int > > he leader". But we do populate it with the other user records, and > > it'd be weird if this one was excluded. > > > > The tricky thing is that pgstat_bestart() is called long before we > > deserialize the data. But from what I can tell it should be safe to > > change it per the attached? That should be AFAICT an extremely short > > window of time longer before we report it, not enough to matter. > > Thanks! Yeah, that seems reasonable to me. Also, I think we should remove the > "MyProcPort" test here then (looking at v3): > > + if (MyProcPort && MyClientConnectionInfo.authn_id) > + strlcpy(lbeentry.st_auth_identity, > MyClientConnectionInfo.authn_id, NAMEDATALEN); > + else > + MemSet(_auth_identity, 0, > sizeof(lbeentry.st_auth_identity)); > > to get the st_auth_identity propagated to the parallel workers. Yup, I had done that in v4 which as you noted further down, I forgot to post. > > > - what about "Contains the same value as the identity part in > > linkend="system-user" />"? > > Not sure, but looks like you missed this comment? I did. Agree with your comment, and updated now. > > > +# Users with md5 auth should show both auth method and name in > > > pg_stat_activity > > > > > > what about "show both auth method and identity"? > > > > Good spot, yeah, I changed it over to identity everywhere else so it > > should be here as well. 
> > Did you forget to share the new revision (aka v4)? I can only see the > "reorder_parallel_worker_bestart.patch" attached. I did. Here it is, and also including that suggested docs fix as well as a rebase on current master. -- Magnus Hagander Me: https://www.hagander.net/ Work: https://www.redpill-linpro.com/ diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml index 210c7c0b02..8fa0e5474e 100644 --- a/doc/src/sgml/func.sgml +++ b/doc/src/sgml/func.sgml @@ -23462,7 +23462,7 @@ SELECT * FROM pg_ls_dir('.') WITH ORDINALITY AS t(ls,n); - + system_user system_user diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml index 6e74138a69..25eb7a4bf4 100644 --- a/doc/src/sgml/monitoring.sgml +++ b/doc/src/sgml/monitoring.sgml @@ -784,6 +784,28 @@ postgres 27093 0.0 0.0 30096 2752 ?Ss 11:34 0:00 postgres: ser + + + auth_method text + + + The authentication method used for authenticating the connection, or + NULL for background processes. + + + + + + auth_identity text + + + The identity (if any) that the user presented during the authentication + cycle before they were assigned a database role. Contains the same + value as the identity part in , or NULL + for background processes. + + + application_name text diff --git a/src/backend/access/transam/parallel.c b/src/backend/access/transam/parallel.c index 849a03e4b6..269ab4e586 100644 --- a/src/backend/access/transam/parallel.c +++ b/src/backend/access/transam/parallel.c @@ -1529,6 +1529,9 @@ ParallelWorkerMain(Datum main_arg) /* Attach to the leader's serializable transaction, if SERIALIZABLE. */ AttachSerializableXact(fps->serializable_xact_handle); + /* Report to the stats system that we are started */ + pgstat_bestart(); + /* * We've initialized all of our state now; nothing should change * hereafter. 
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql index e43e36f5ac..b68c382de1 100644 --- a/src/backend/catalog/system_views.sql +++ b/src/backend/catalog/system_views.sql @@ -873,6 +873,8 @@ CREATE VIEW pg_stat_activity AS S.leader_pid, S.usesysid, U.rolname AS usename, +S.auth_method, +S.auth_identity, S.application_name, S.client_addr, S.client_hostname, diff --git a/src/backend/libpq/auth.c b/src/backend/lib
Re: System username in pg_stat_activity
On Thu, Jan 11, 2024 at 5:55 PM Bertrand Drouvot wrote: > > Hi, > > On Thu, Jan 11, 2024 at 02:24:58PM +0100, Magnus Hagander wrote: > > On Wed, Jan 10, 2024 at 3:12 PM Bertrand Drouvot > > wrote: > > > > > > If we go the 2 fields way, then what about auth_identity and auth_method > > > then? > > > > > > Here is an updated patch based on this idea. > > Thanks! > > + > + > + auth_method text > + > + > + The authentication method used for authenticating the connection, or > + NULL for background processes. > + > > I'm wondering if it would make sense to populate it for parallel workers too. > I think it's doable thanks to d951052, but I'm not sure it's worth it (one > could > join based on the leader_pid though). OTOH that would be consistent with > how the SYSTEM_USER behaves with parallel workers (it's populated). I guess one could conceptually argue that "authentication happens in the leader". But we do populate it with the other user records, and it'd be weird if this one was excluded. The tricky thing is that pgstat_bestart() is called long before we deserialize the data. But from what I can tell it should be safe to change it per the attached? AFAICT that adds only an extremely short window of time before we report it, not enough to matter. > + > + auth_identity text > + > + > + The identity (if any) that the user presented during the > authentication > + cycle before they were assigned a database role. Contains the same > + value as > > Same remark regarding the parallel workers case +: > > - Would it be better to use the `name` datatype for auth_identity? I've been going back and forth. And I think my conclusion is that it's not a postgres identifier, so it shouldn't be. See the earlier discussion, and, for example, that's what we do for cert names when SSL is used. > - what about "Contains the same value as the identity part in linkend="system-user" />"? 
> > + /* > +* Trust doesn't set_authn_id(), but we still need to > store the > +* auth_method > +*/ > + MyClientConnectionInfo.auth_method = uaTrust; > > +1, I think it is useful here to provide "trust" and not a NULL value in the > context of this patch. Yeah, that's probably "independently correct", but actually useful here. > +# pg_stat_activity shold contain trust and empty string for trust auth > > typo: s/shold/should/ Ops. > +# Users with md5 auth should show both auth method and name in > pg_stat_activity > > what about "show both auth method and identity"? Good spot, yeah, I changed it over to identity everywhere else so it should be here as well. -- Magnus Hagander Me: https://www.hagander.net/ Work: https://www.redpill-linpro.com/ diff --git a/src/backend/access/transam/parallel.c b/src/backend/access/transam/parallel.c index 849a03e4b6..269ab4e586 100644 --- a/src/backend/access/transam/parallel.c +++ b/src/backend/access/transam/parallel.c @@ -1529,6 +1529,9 @@ ParallelWorkerMain(Datum main_arg) /* Attach to the leader's serializable transaction, if SERIALIZABLE. */ AttachSerializableXact(fps->serializable_xact_handle); + /* Report to the stats system that we are started */ + pgstat_bestart(); + /* * We've initialized all of our state now; nothing should change * hereafter. 
diff --git a/src/backend/utils/init/postinit.c b/src/backend/utils/init/postinit.c
index 1ad3367159..ce090f6864 100644
--- a/src/backend/utils/init/postinit.c
+++ b/src/backend/utils/init/postinit.c
@@ -22,6 +22,7 @@
 #include "access/genam.h"
 #include "access/heapam.h"
 #include "access/htup_details.h"
+#include "access/parallel.h"
 #include "access/session.h"
 #include "access/sysattr.h"
 #include "access/tableam.h"
@@ -1235,7 +1236,7 @@ InitPostgres(const char *in_dbname, Oid dboid,
 	process_session_preload_libraries();
 
 	/* report this backend in the PgBackendStatus array */
-	if (!bootstrap)
+	if (!bootstrap && !IsParallelWorker())
 		pgstat_bestart();
 
 	/* close the transaction we started above */
Re: Compile warnings in dbcommands.c building with meson
On Wed, Jan 10, 2024 at 1:16 PM Aleksander Alekseev wrote: > > Hi, > > > When building current head on debian bullseye I get this compile warning: > > > > In file included from ../src/backend/commands/dbcommands.c:20: > > ../src/backend/commands/dbcommands.c: In function ‘createdb’: > > ../src/include/postgres.h:104:9: warning: ‘src_hasloginevt’ may be > > used uninitialized in this function [-Wmaybe-uninitialized] > > 104 | return (Datum) (X ? 1 : 0); > > | ^~~ > > ../src/backend/commands/dbcommands.c:683:8: note: ‘src_hasloginevt’ > > was declared here > > 683 | bool src_hasloginevt; > > |^~~ > > > > > > I only get this when building with meson, not when building with > > autotools. AFAICT, I have the same config: > > > > ./configure --enable-debug --enable-depend --with-python > > --enable-cassert --with-openssl --enable-tap-tests --with-icu > > > > vs > > > > meson setup build -Ddebug=true -Dpython=true -Dcassert=true > > -Dssl=openssl -Dtap-test=true -Dicu=enabled -Dnls=disabled > > > > > > in both cases the compiler is: > > gcc (Debian 10.2.1-6) 10.2.1 20210110 > > Seems to me that the compiler is not smart enough to process: > > ``` > if (!get_db_info(dbtemplate, ShareLock, > _dboid, _owner, _encoding, > _istemplate, _allowconn, _hasloginevt, > _frozenxid, _minmxid, _deftablespace, > _collate, _ctype, _iculocale, > _icurules, _locprovider, > _collversion)) > ereport(ERROR, ... > ``` > > Should we just silence the warning like this - see attachment? I don't > think createdb() is called that often to worry about slight > performance change, if there is any. Certainly looks that way, but I'm curious as to why nobody else has seen this.. That said, it appears to be gone in current master. Even though nothing changed in that file. Must've been some transient effect, that somehow didn't get blown away by doing a clean -- Magnus Hagander Me: https://www.hagander.net/ Work: https://www.redpill-linpro.com/
Re: Slightly improved meson error for docs tools
On Wed, Jan 10, 2024 at 1:05 PM Aleksander Alekseev wrote: > > Hi, > > > least lists what's expected -- I'm not sure if this is the way to go, > > or if we should somehow list the individual tools that are failing > > here? > > Personally I think the patch is OK. Thanks. I've pushed this one for now, we can always adjust further later if needed. -- Magnus Hagander Me: https://www.hagander.net/ Work: https://www.redpill-linpro.com/
Re: Postgres Partitions Limitations (5.11.2.3)
On Thu, Jan 11, 2024 at 11:24 AM Ashutosh Bapat wrote: > > On Wed, Jan 10, 2024 at 10:38 PM Laurenz Albe > wrote: > > > > On Wed, 2024-01-10 at 13:41 +0100, Magnus Hagander wrote: > > > It still reads a bit weird to me. How about the attached wording instead? > > > > Thanks! I am fine with your wording. > > Works for me too. Thanks, applied and backpatched all the way. -- Magnus Hagander Me: https://www.hagander.net/ Work: https://www.redpill-linpro.com/
Re: System username in pg_stat_activity
On Wed, Jan 10, 2024 at 3:12 PM Bertrand Drouvot wrote: > > Hi, > > On Wed, Jan 10, 2024 at 02:59:42PM +0100, Magnus Hagander wrote: > > On Wed, Jan 10, 2024 at 2:56 PM Bertrand Drouvot > > I definitely think it should be the same. If it's not exactly the > > same, then it should be *two* columns, one with auth method and one > > with the name. > > > > And thinking more about it maybe that's cleaner, because that makes it > > easier to do things like filter based on auth method? > > Yeah, that's sounds even better. > > > > > > > + > > > > + > > > > + authname name > > > > + > > > > + > > > > + The authentication method and identity (if any) that the user > > > > + used to log in. It contains the same value as > > > > +returns in the backend. > > > > + > > > > + > > > > > > I'm fine with auth_method:identity. > > > > > > > +S.authname, > > > > > > What about using system_user as the field name? (because if we keep > > > auth_method:identity it's not really the authname anyway). > > > > I was worried system_user or sysuser would both be confusing with the > > fact that we have usesysid -- which would reference a *different* > > sys... > > If we go the 2 fields way, then what about auth_identity and auth_method then? Here is an updated patch based on this idea. 
-- Magnus Hagander Me: https://www.hagander.net/ Work: https://www.redpill-linpro.com/ diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml index de78d58d4b..ec6a25495f 100644 --- a/doc/src/sgml/func.sgml +++ b/doc/src/sgml/func.sgml @@ -23462,7 +23462,7 @@ SELECT * FROM pg_ls_dir('.') WITH ORDINALITY AS t(ls,n); - + system_user system_user diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml index b804eb8b5e..9a3ab8d06f 100644 --- a/doc/src/sgml/monitoring.sgml +++ b/doc/src/sgml/monitoring.sgml @@ -784,6 +784,28 @@ postgres 27093 0.0 0.0 30096 2752 ?Ss 11:34 0:00 postgres: ser + + + auth_method text + + + The authentication method used for authenticating the connection, or + NULL for background processes. + + + + + + auth_identity text + + + The identity (if any) that the user presented during the authentication + cycle before they were assigned a database role. Contains the same + value as returns in the backend, or NULL + for background processes. + + + application_name text diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql index e43e36f5ac..b68c382de1 100644 --- a/src/backend/catalog/system_views.sql +++ b/src/backend/catalog/system_views.sql @@ -873,6 +873,8 @@ CREATE VIEW pg_stat_activity AS S.leader_pid, S.usesysid, U.rolname AS usename, +S.auth_method, +S.auth_identity, S.application_name, S.client_addr, S.client_hostname, diff --git a/src/backend/libpq/auth.c b/src/backend/libpq/auth.c index 9bbdc4beb0..6621969154 100644 --- a/src/backend/libpq/auth.c +++ b/src/backend/libpq/auth.c @@ -625,9 +625,20 @@ ClientAuthentication(Port *port) status = CheckRADIUSAuth(port); break; case uaCert: - /* uaCert will be treated as if clientcert=verify-full (uaTrust) */ + + /* + * uaCert will be treated as if clientcert=verify-full further + * down + */ + break; case uaTrust: status = STATUS_OK; + + /* + * Trust doesn't set_authn_id(), but we still need to store the + * auth_method + */ + 
MyClientConnectionInfo.auth_method = uaTrust; break; } @@ -645,6 +656,12 @@ ClientAuthentication(Port *port) #endif } + /* + * All auth methods should have either called set_authn_id() or manually + * set the auth_method if they were successful. + */ + Assert(status != STATUS_OK || MyClientConnectionInfo.auth_method != 0); + if (Log_connections && status == STATUS_OK && !MyClientConnectionInfo.authn_id) { diff --git a/src/backend/utils/activity/backend_status.c b/src/backend/utils/activity/backend_status.c index 3dac79c30e..4bb4435c81 100644 --- a/src/backend/utils/activity/backend_status.c +++ b/src/backend/utils/activity/backend_status.c @@ -357,6 +357,12 @@ pgstat_bestart(void) else MemSet(_clientaddr, 0, sizeof(lbeentry.st_clientaddr)); + lbeentry.st_auth_method = MyClientConnectionInfo.auth_method; + if (MyProcPort && MyClientConnectionInfo.authn_id) + strlcpy(lbeentry.st_auth_identity, MyClientConnectionInfo
Re: System username in pg_stat_activity
On Wed, Jan 10, 2024 at 2:56 PM Bertrand Drouvot wrote: > > Hi, > > On Wed, Jan 10, 2024 at 02:08:03PM +0100, Magnus Hagander wrote: > > On Wed, Jan 10, 2024 at 1:44 PM Aleksander Alekseev > > wrote: > > > > > > Hi, > > > > > > Thanks for the patch. > > +1 > > > > > This overlaps with for example the values in pg_stat_gss, but it will > > > > include values for authentication methods that don't have their own > > > > view such as peer/ident. gss/ssl info will of course still be shown, > > > > it is just in more than one place. > > Yeah, I think that's a good idea. > > > > It hurts my sense of beauty that usename and authname are of different > > > types. But if I'm the only one, maybe we can close our eyes on this. > > > Also I suspect that placing usename and authname in a close proximity > > > can be somewhat confusing. Perhaps adding authname as the last column > > > of the view will solve both nitpicks? > > > > But it should probably actually be name, given that's the underlying > > datatype. I kept changing it around and ended up half way in > > between... > > > > > > > ``` > > > +/* Information about the authenticated user */ > > > +charst_authuser[NAMEDATALEN]; > > > ``` > > > > > > Well, here it's called "authuser" and it looks like the intention was > > > to use `name` datatype... I suggest using "authname" everywhere for > > > consistency. > > I think it depends what we want the new field to reflect. If it is the exact > same thing as the SYSTEM_USER then I think it has to be text (as the > SYSTEM_USER > is made of "auth_method:identity"). Now if we want it to be "only" the > identity > part of it, then the `name` datatype would be fine. I'd vote for the exact > same > thing as the SYSTEM_USER (means auth_method:identity). I definitely think it should be the same. If it's not exactly the same, then it should be *two* columns, one with auth method and one with the name. 
And thinking more about it maybe that's cleaner, because that makes it easier to do things like filter based on auth method? > > + > > + > > + authname name > > + > > + > > + The authentication method and identity (if any) that the user > > + used to log in. It contains the same value as > > +returns in the backend. > > + > > + > > I'm fine with auth_method:identity. > > > +S.authname, > > What about using system_user as the field name? (because if we keep > auth_method:identity it's not really the authname anyway). I was worried system_user or sysuser would both be confusing with the fact that we have usesysid -- which would reference a *different* sys... > Also, what about adding a test in say 003_peer.pl to check that the value > matches > the SYSTEM_USER one? Yeah, that's a good idea I think. I quickly looked over for tests and couldn't really find any for pg_stat_activity, but we can definitely piggyback them in one or more of the auth tests. -- Magnus Hagander Me: https://www.hagander.net/ Work: https://www.redpill-linpro.com/
Re: System username in pg_stat_activity
On Wed, Jan 10, 2024 at 2:27 PM Dagfinn Ilmari Mannsåker wrote: > > Magnus Hagander writes: > > > On Wed, Jan 10, 2024 at 1:44 PM Aleksander Alekseev > > wrote: > >> > >> It hurts my sense of beauty that usename and authname are of different > >> types. But if I'm the only one, maybe we can close our eyes on this. > >> Also I suspect that placing usename and authname in a close proximity > >> can be somewhat confusing. Perhaps adding authname as the last column > >> of the view will solve both nitpicks? > > > > But it should probably actually be name, given that's the underlying > > datatype. I kept changing it around and ended up half way in > > between... > > https://www.postgresql.org/docs/current/functions-info.html#FUNCTIONS-INFO-SESSION-TABLE > (and pg_typeof(system_user)) says it's text. Which makes sense, since > it can easily be longer than 63 bytes, e.g. in the case of a TLS client > certificate DN. We already truncate all those to NAMEDATALEN in pg_stat_ssl for example, so I think the truncation part of those should be OK. We'll truncate "a little bit more" since we also have the 'cert:', but not significantly so I think. But yeah, conceptually it should probably be text, because name is supposedly a *postgres identifier*, which this is not. -- Magnus Hagander Me: https://www.hagander.net/ Work: https://www.redpill-linpro.com/
Re: System username in pg_stat_activity
On Wed, Jan 10, 2024 at 1:44 PM Aleksander Alekseev wrote: > > Hi, > > Thanks for the patch. > > > The attached patch adds a column "authuser" to pg_stat_activity which > > contains the username of the externally authenticated user, being the > > same value as the SYSTEM_USER keyword returns in a backend. > > I believe what was meant is "authname", not "authuser". > > > This overlaps with for example the values in pg_stat_gss, but it will > > include values for authentication methods that don't have their own > > view such as peer/ident. gss/ssl info will of course still be shown, > > it is just in more than one place. > > > > I was originally thinking this column should be "sysuser" to map to > > the keyword, but since we already have "usesysid" as a column name in > > pg_stat_activity I figured that could be confusing since it actually > > means something completely different. But happy to change that back if > > people think that's better. > > This part of the documentation is wrong: > > ``` > + > + > + authname name > + > ``` > > Actually the type is `text`: > > ``` > =# \d pg_stat_activity ; > View "pg_catalog.pg_stat_activity" > Column | Type | Collation | Nullable | Default > --+--+---+--+- > datid| oid | | | > datname | name | | | > pid | integer | | | > leader_pid | integer | | | > usesysid | oid | | | > usename | name | | | > authname | text | | | > ``` > > It hurts my sense of beauty that usename and authname are of different > types. But if I'm the only one, maybe we can close our eyes on this. > Also I suspect that placing usename and authname in a close proximity > can be somewhat confusing. Perhaps adding authname as the last column > of the view will solve both nitpicks? But it should probably actually be name, given that's the underlying datatype. I kept changing it around and ended up half way in between... 
> ``` > +/* Information about the authenticated user */ > +charst_authuser[NAMEDATALEN]; > ``` > > Well, here it's called "authuser" and it looks like the intention was > to use `name` datatype... I suggest using "authname" everywhere for > consistency. Yeah, I flipped back and forth a few times and clearly got stuck in the middle. They should absolutely be the same everywhere - whatever name is used it should be consistent. > Since the patch affects pg_proc.dat I believe the commit message > should remind bumping the catalog version. Yes. -- Magnus Hagander Me: https://www.hagander.net/ Work: https://www.redpill-linpro.com/ diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml index de78d58d4b..ec6a25495f 100644 --- a/doc/src/sgml/func.sgml +++ b/doc/src/sgml/func.sgml @@ -23462,7 +23462,7 @@ SELECT * FROM pg_ls_dir('.') WITH ORDINALITY AS t(ls,n); - + system_user system_user diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml index b804eb8b5e..dc146a4c9f 100644 --- a/doc/src/sgml/monitoring.sgml +++ b/doc/src/sgml/monitoring.sgml @@ -784,6 +784,17 @@ postgres 27093 0.0 0.0 30096 2752 ?Ss 11:34 0:00 postgres: ser + + + authname name + + + The authentication method and identity (if any) that the user + used to log in. It contains the same value as +returns in the backend. 
+ + + application_name text diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql index e43e36f5ac..fe81aefbbb 100644 --- a/src/backend/catalog/system_views.sql +++ b/src/backend/catalog/system_views.sql @@ -873,6 +873,7 @@ CREATE VIEW pg_stat_activity AS S.leader_pid, S.usesysid, U.rolname AS usename, +S.authname, S.application_name, S.client_addr, S.client_hostname, diff --git a/src/backend/utils/activity/backend_status.c b/src/backend/utils/activity/backend_status.c index 3dac79c30e..4e0e16a8fe 100644 --- a/src/backend/utils/activity/backend_status.c +++ b/src/backend/utils/activity/backend_status.c @@ -357,6 +357,11 @@ pgstat_bestart(void) else MemSet(_clientaddr, 0, sizeof(lb
Re: Postgres Partitions Limitations (5.11.2.3)
On Tue, Dec 5, 2023 at 3:57 PM Ashutosh Bapat wrote: > > On Tue, Dec 5, 2023 at 1:40 AM Laurenz Albe wrote: > > > > On Fri, 2023-12-01 at 18:49 +0530, Ashutosh Bapat wrote: > > > On Thu, Nov 30, 2023 at 10:29 PM Laurenz Albe > > > wrote: > > > > > > > > On Thu, 2023-11-30 at 19:22 +0530, Ashutosh Bapat wrote: > > > > > May be attach the patch to hackers thread (this) as well? > > > > > > > > If you want, sure. I thought it was good enough if the thread > > > > is accessible via the commitfest app. > > > > > > The addition is long enough that it deserved to be outside of parentheses. > > > > > > I think it's worth mentioning the exception but in a way that avoids > > > repeating what's mentioned in the last paragraph of just the previous > > > section. I don't have brilliant ideas about how to rephrase it. > > > > > > Maybe "Using ONLY to add or drop a constraint, other than PRIMARY and > > > UNIQUE, on only the partitioned table is supported as long as there > > > are no partitions. ...". > > > > I agree that the parenthesis is too long. I shortened it in the attached > > patch. Is that acceptable? > > It's still longer than the actual sentence :). I am fine with it if > somebody else finds it acceptable. It still reads a bit weird to me. How about the attached wording instead? -- Magnus Hagander Me: https://www.hagander.net/ Work: https://www.redpill-linpro.com/ diff --git a/doc/src/sgml/ddl.sgml b/doc/src/sgml/ddl.sgml index 22d04006ad..01b1d82b0d 100644 --- a/doc/src/sgml/ddl.sgml +++ b/doc/src/sgml/ddl.sgml @@ -4348,7 +4348,9 @@ ALTER INDEX measurement_city_id_logdate_key Using ONLY to add or drop a constraint on only the partitioned table is supported as long as there are no partitions. Once partitions exist, using ONLY -will result in an error. Instead, constraints on the partitions +will result in an error for any constraints other than +UNIQUE and PRIMARY KEY. 
+Instead, constraints on the partitions themselves can be added and (if they are not present in the parent table) dropped.
Re: [patch] pg_basebackup: mention that spread checkpoints are the default in --help
On Tue, Oct 31, 2023 at 8:01 PM Michael Banck wrote: > > Hi, > > On Tue, Oct 31, 2023 at 04:59:24PM +0530, Shlok Kyal wrote: > > I went through the Cfbot for this patch and found out that the build > > is failing with the following error (Link: > > https://cirrus-ci.com/task/4648506929971200?logs=build#L1217): > > Oops, sorry. Attached is a working third version of this patch. While I take Peter's point that one wording reads better than the other, it does also increase the "help message bloat" mentioned by Michael. So I think we're better off actually using the original version, and I'm going to go ahead and push that one (also to avoid endless bikeshedding). -- Magnus Hagander Me: https://www.hagander.net/ Work: https://www.redpill-linpro.com/
System username in pg_stat_activity
The attached patch adds a column "authuser" to pg_stat_activity which contains the username of the externally authenticated user, being the same value as the SYSTEM_USER keyword returns in a backend. This overlaps with for example the values in pg_stat_gss, but it will include values for authentication methods that don't have their own view such as peer/ident. gss/ssl info will of course still be shown, it is just in more than one place. I was originally thinking this column should be "sysuser" to map to the keyword, but since we already have "usesysid" as a column name in pg_stat_activity I figured that could be confusing since it actually means something completely different. But happy to change that back if people think that's better. -- Magnus Hagander Me: https://www.hagander.net/ Work: https://www.redpill-linpro.com/ diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml index de78d58d4b..ec6a25495f 100644 --- a/doc/src/sgml/func.sgml +++ b/doc/src/sgml/func.sgml @@ -23462,7 +23462,7 @@ SELECT * FROM pg_ls_dir('.') WITH ORDINALITY AS t(ls,n); - + system_user system_user diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml index b804eb8b5e..dc146a4c9f 100644 --- a/doc/src/sgml/monitoring.sgml +++ b/doc/src/sgml/monitoring.sgml @@ -784,6 +784,17 @@ postgres 27093 0.0 0.0 30096 2752 ?Ss 11:34 0:00 postgres: ser + + + authname name + + + The authentication method and identity (if any) that the user + used to log in. It contains the same value as +returns in the backend. 
+ + + application_name text diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql index e43e36f5ac..fe81aefbbb 100644 --- a/src/backend/catalog/system_views.sql +++ b/src/backend/catalog/system_views.sql @@ -873,6 +873,7 @@ CREATE VIEW pg_stat_activity AS S.leader_pid, S.usesysid, U.rolname AS usename, +S.authname, S.application_name, S.client_addr, S.client_hostname, diff --git a/src/backend/utils/activity/backend_status.c b/src/backend/utils/activity/backend_status.c index 3dac79c30e..ffface119e 100644 --- a/src/backend/utils/activity/backend_status.c +++ b/src/backend/utils/activity/backend_status.c @@ -357,6 +357,11 @@ pgstat_bestart(void) else MemSet(_clientaddr, 0, sizeof(lbeentry.st_clientaddr)); + if (MyProcPort && GetSystemUser()) + strlcpy(lbeentry.st_authuser, GetSystemUser(), NAMEDATALEN); + else + MemSet(_authuser, 0, sizeof(lbeentry.st_authuser)); + #ifdef USE_SSL if (MyProcPort && MyProcPort->ssl_in_use) { diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c index 30a2063505..cd9769c52c 100644 --- a/src/backend/utils/adt/pgstatfuncs.c +++ b/src/backend/utils/adt/pgstatfuncs.c @@ -304,7 +304,7 @@ pg_stat_get_progress_info(PG_FUNCTION_ARGS) Datum pg_stat_get_activity(PG_FUNCTION_ARGS) { -#define PG_STAT_GET_ACTIVITY_COLS 31 +#define PG_STAT_GET_ACTIVITY_COLS 32 int num_backends = pgstat_fetch_stat_numbackends(); int curr_backend; int pid = PG_ARGISNULL(0) ? 
-1 : PG_GETARG_INT32(0); @@ -617,6 +617,9 @@ pg_stat_get_activity(PG_FUNCTION_ARGS) nulls[30] = true; else values[30] = UInt64GetDatum(beentry->st_query_id); + + /* Authenticated user */ + values[31] = CStringGetTextDatum(beentry->st_authuser); } else { @@ -646,6 +649,7 @@ pg_stat_get_activity(PG_FUNCTION_ARGS) nulls[28] = true; nulls[29] = true; nulls[30] = true; + nulls[31] = true; } tuplestore_putvalues(rsinfo->setResult, rsinfo->setDesc, values, nulls); diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat index 7979392776..bca8ea8db8 100644 --- a/src/include/catalog/pg_proc.dat +++ b/src/include/catalog/pg_proc.dat @@ -5440,9 +5440,9 @@ proname => 'pg_stat_get_activity', prorows => '100', proisstrict => 'f', proretset => 't', provolatile => 's', proparallel => 'r', prorettype => 'record', proargtypes => 'int4', - proallargtypes => '{int4,oid,int4,oid,text,text,text,text,text,timestamptz,timestamptz,timestamptz,timestamptz,inet,text,int4,xid,xid,text,bool,text,text,int4,text,numeric,text,bool,text,bool,bool,int4,int8}', - proargmodes => '{i,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o}', - proargnames => '{pid,datid,pid,usesysid,application_name,state,query,wait_event_type,wait_event,xact_start,query_start,backend_start,state_change,client_addr,client_hostname,client_port,backend_xid,backend_xmin,backend_type,ssl,sslversion,sslcipher,sslbits,ssl_client_dn,ssl_client_serial,ssl_issuer_dn,gss_auth,gss_princ,gss_enc,gss_delegation,leader_pid,query_id}', + proallargtypes => '{int
Slightly improved meson error for docs tools
The meson build doesn't tell you what tool is missing when trying to build the docs (and you don't have it in the path.. sigh), it just tells you that something is missing. Attached is a small patch that at least lists what's expected -- I'm not sure if this is the way to go, or if we should somehow list the individual tools that are failing here? -- Magnus Hagander Me: https://www.hagander.net/ Work: https://www.redpill-linpro.com/

diff --git a/meson.build b/meson.build
index 57f9735feb..c317144b6b 100644
--- a/meson.build
+++ b/meson.build
@@ -587,7 +587,7 @@ if not docs_opt.disabled()
   if xmllint_bin.found() and xsltproc_bin.found()
     docs_dep = declare_dependency()
   elif docs_opt.enabled()
-    error('missing required tools for docs in HTML / man page format')
+    error('missing required tools (xmllint and xsltproc needed) for docs in HTML / man page format')
   endif
 endif
Compile warnings in dbcommands.c building with meson
When building current head on debian bullseye I get this compile warning: In file included from ../src/backend/commands/dbcommands.c:20: ../src/backend/commands/dbcommands.c: In function ‘createdb’: ../src/include/postgres.h:104:9: warning: ‘src_hasloginevt’ may be used uninitialized in this function [-Wmaybe-uninitialized] 104 | return (Datum) (X ? 1 : 0); | ^~~ ../src/backend/commands/dbcommands.c:683:8: note: ‘src_hasloginevt’ was declared here 683 | bool src_hasloginevt; |^~~ I only get this when building with meson, not when building with autotools. AFAICT, I have the same config: ./configure --enable-debug --enable-depend --with-python --enable-cassert --with-openssl --enable-tap-tests --with-icu vs meson setup build -Ddebug=true -Dpython=true -Dcassert=true -Dssl=openssl -Dtap-test=true -Dicu=enabled -Dnls=disabled in both cases the compiler is: gcc (Debian 10.2.1-6) 10.2.1 20210110 -- Magnus Hagander Me: https://www.hagander.net/ Work: https://www.redpill-linpro.com/
Re: Commitfest manager January 2024
On Tue, Jan 2, 2024 at 3:45 AM vignesh C wrote: > > On Mon, 1 Jan 2024 at 21:01, Magnus Hagander wrote: > > > > On Mon, Jan 1, 2024 at 4:35 AM vignesh C wrote: > > > > > > On Sun, 24 Dec 2023 at 18:40, vignesh C wrote: > > > > > > > > On Sun, 24 Dec 2023 at 07:16, Michael Paquier > > > > wrote: > > > > > > > > > > On Sat, Dec 23, 2023 at 08:52:38AM +0530, vignesh C wrote: > > > > > > I didn't see anyone volunteering for the January Commitfest, so I'll > > > > > > volunteer to be CF manager for January 2024 Commitfest. > > > > > > > > > > (Adding Magnus in CC.) > > > > > > > > > > That would be really helpful. Thanks for helping! Do you have the > > > > > admin rights on the CF app? You are going to require them in order to > > > > > mark the CF as in-process, and you would also need to switch the CF > > > > > after that from "Future" to "Open" so as people can still post > > > > > patches once January one begins. > > > > > > > > I don't have admin rights for the CF app. Please provide admin rights. > > > > > > I have not yet got the admin rights, Kindly provide admin rights for the > > > CF app. > > > > It's been christmas holidays... > > > > > > What's your community username? > > My username is vignesh.postgres CF admin permissions granted! //Magnus
Re: Commitfest manager January 2024
On Mon, Jan 1, 2024 at 4:35 AM vignesh C wrote: > > On Sun, 24 Dec 2023 at 18:40, vignesh C wrote: > > > > On Sun, 24 Dec 2023 at 07:16, Michael Paquier wrote: > > > > > > On Sat, Dec 23, 2023 at 08:52:38AM +0530, vignesh C wrote: > > > > I didn't see anyone volunteering for the January Commitfest, so I'll > > > > volunteer to be CF manager for January 2024 Commitfest. > > > > > > (Adding Magnus in CC.) > > > > > > That would be really helpful. Thanks for helping! Do you have the > > > admin rights on the CF app? You are going to require them in order to > > > mark the CF as in-process, and you would also need to switch the CF > > > after that from "Future" to "Open" so as people can still post > > > patches once January one begins. > > > > I don't have admin rights for the CF app. Please provide admin rights. > > I have not yet got the admin rights, Kindly provide admin rights for the CF > app. It's been christmas holidays... What's your community username? //Magnus
Re: Password leakage avoidance
On Wed, Dec 27, 2023 at 10:31 PM Jonathan S. Katz wrote: > > On 12/24/23 12:15 PM, Tom Lane wrote: > > >> Maybe we need a PQcreaterole that provides the mechanism to set passwords > >> safely. It'd likely need to take all the options needed for creating a > >> role, but that would at least give the underlying mechanism to ensure > >> we're always sending a hashed password to the server. > > > > I'm kind of down on that, because it seems like it'd be quite hard to > > design an easy-to-use C API that doesn't break the next time somebody > > adds another option to CREATE USER. What's so wrong with suggesting > > that the password be set in a separate step? (For comparison, typical > > Unix utilities like useradd(8) also tell you to set the password > > separately.) > > Modern development frameworks tend to reduce things down to one step, > even fairly complex operations. Additionally, a lot of these frameworks > don't even require a developer to build backend applications that > involve doing actual work on the backend (e.g. UNIX), so the approach > of useradd(8) et al. is not familiar. Adding the additional step would > lead to errors, e.g. the developer not calling the "change password" > function to create the obfuscated password. Granted, we can push the > problem down to driver authors to "be better" and simplify the process > for their end users, but that still can be error prone, having seen this > with driver authors implementing PostgreSQL SCRAM and having made > (correctable) mistakes that could have led to security issues. This seems to confuse "driver" with "framework". I would say the "two step" approach is perfectly valid for a driver, whereas as you say most people building, say, webapps or similar on top of a framework will expect it to handle things for them. But that's more of a framework thing than a driver thing, depending on terminology. E.g.
it would be up to the "Postgres support driver" in django/rails/whatnot to reduce it down to one step, not to a low level driver like libpq (or other low level drivers). None of those frameworks are likely to want to require direct driver access anyway, they *want* to take control of that process in my experience. -- Magnus Hagander Me: https://www.hagander.net/ Work: https://www.redpill-linpro.com/
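For illustration, the client-side hashing step in the "two step" flow discussed above can be sketched as follows. This is a rough, self-contained Python model (not libpq) of building a SCRAM-SHA-256 verifier so that only the hashed form ever crosses the wire; in libpq this is what PQencryptPasswordConn() provides. The function name is invented here, and SASLprep normalization of the password is omitted as a simplification.

```python
# Illustrative sketch: build a SCRAM-SHA-256 password verifier on the
# client so the cleartext never crosses the wire. Mirrors what libpq's
# PQencryptPasswordConn() does, minus SASLprep normalization (a
# simplification); build_scram_verifier is a name invented here.
import base64
import hashlib
import hmac
import os

def build_scram_verifier(password, salt=None, iterations=4096):
    if salt is None:
        salt = os.urandom(16)
    salted = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"),
                                 salt, iterations)
    client_key = hmac.new(salted, b"Client Key", hashlib.sha256).digest()
    stored_key = hashlib.sha256(client_key).digest()
    server_key = hmac.new(salted, b"Server Key", hashlib.sha256).digest()
    b64 = lambda raw: base64.b64encode(raw).decode("ascii")
    # The format the server stores in pg_authid.rolpassword:
    return "SCRAM-SHA-256$%d:%s$%s:%s" % (
        iterations, b64(salt), b64(stored_key), b64(server_key))

# The two steps a driver or framework could wrap in one create_user()
# call, with the verifier substituted for the cleartext password:
#   CREATE ROLE alice LOGIN;
#   ALTER ROLE alice PASSWORD 'SCRAM-SHA-256$4096:...';
```

A driver can hide both statements behind a single call while still never sending the cleartext, which is the point being argued above.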
Re: proposal: change behavior on collation version mismatch
On Mon, Nov 27, 2023 at 9:30 PM Jeff Davis wrote: > > On Mon, 2023-11-27 at 11:06 -0800, Jeremy Schneider wrote: > > If we want to have a GUC that > > allows warning behavior, I think that's OK but I think it should be > > superuser-only and documented as a "developer" setting similar to > > zero_damaged_pages. > > A GUC seems sensible to express the availability-vs-safety trade-off. I > suspect we can get a GUC that defaults to "warning" committed, but > anything else will be controversial. A GUC like this would bring a set of problems similar to what we have e.g. with fsync. That is, set it to "warnings only", insert a single row into the table with an "unlucky" key, set it back to "errors always" and you now have a corrupt database, but your setting suggests that it shouldn't be corrupt. Sure, people shouldn't do that - but people will, and it will make things harder to debug. There's been talk before about adding a "tainted" flag or similar to pg_control that gets set if you ever start the system with fsync=off. Similar things could be done here of course, but I'd worry a bit about adding another flag like this which can lead to a hard-to-determine state without resolving that. (The fact that we have "fsync" under WAL config and not developer options is an indication that we can't really use the classification of the config parameters as a good indicator of what's safe and not safe to set) I could get behind turning it into an error though :) //Magnus
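The "unlucky key" failure mode above can be made concrete with a toy model (Python, purely illustrative): a sorted structure built under one comparison rule and then searched under another silently loses rows, which is exactly the btree corruption a warn-and-continue GUC would permit.

```python
# Why "warn and continue" on a collation version mismatch can corrupt an
# index: a btree built under one sort order is navigated by binary
# search, which silently misses keys once the comparison rule changes.
import bisect

# Index built under the old rule (here, a case-insensitive "collation"):
old_order = sorted(["a", "B", "c"], key=str.lower)   # -> ["a", "B", "c"]

# Now search under the *new* rule (plain codepoint order), the moral
# equivalent of a collation library upgrade:
idx = bisect.bisect_left(old_order, "B")             # codepoint comparison
found = idx < len(old_order) and old_order[idx] == "B"

# "B" is present in the structure, but binary search under the new rule
# reports it missing -- no error, just a silently wrong answer.
```

The row is still physically there, which is what makes this so hard to debug after the fact: nothing looks broken until a lookup quietly misses.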
Re: [HACKERS] pg_upgrade vs vacuum_cost_delay
On Fri, Nov 24, 2023 at 5:34 PM Bruce Momjian wrote: > > On Fri, Nov 24, 2023 at 01:10:01PM +0100, Michael Banck wrote: > > Hi, > > > > On Fri, Nov 24, 2023 at 12:17:56PM +0100, Magnus Hagander wrote: > > > On Fri, Nov 24, 2023 at 11:21 AM Michael Banck wrote: > > > > On Wed, Nov 22, 2023 at 11:23:34PM -0500, Bruce Momjian wrote: > > > > > + Non-zero values of > > > > > + vacuum_cost_delay will delay statistics > > > > > generation. > > > > > > > > Now I wonder whether vacuumdb maybe should have an option to explicitly > > > > force vacuum_cost_delay to 0 (I don't think it has?)? > > > > > > That's exactly what I proposed, isn't it? :) > > > > You're right, I somehow only saw your mail after I had already sent > > mine. > > > > To make up for this, I created a patch that implements our proposals, see > > attached. > > This is already possible with PGOPTIONS, so I don't see the need for > a separate option: > > PGOPTIONS='-c vacuum_cost_delay=99' psql -c 'SHOW vacuum_cost_delay;' > test > vacuum_cost_delay > --- > 99ms > (1 row) > > Here is a patch which shows its usage. Given how common this would be I think that's a pretty user-unfriendly way to do it. I'd vote for still adding it. -- Magnus Hagander Me: https://www.hagander.net/ Work: https://www.redpill-linpro.com/
Re: [HACKERS] pg_upgrade vs vacuum_cost_delay
On Fri, Nov 24, 2023 at 11:21 AM Michael Banck wrote: > > Hi, > > On Wed, Nov 22, 2023 at 11:23:34PM -0500, Bruce Momjian wrote: > > + Non-zero values of > > + vacuum_cost_delay will delay statistics generation. > > Now I wonder whether vacuumdb maybe should have an option to explicitly > force vacuum_cost_delay to 0 (I don't think it has?)? That's exactly what I proposed, isn't it? :) -- Magnus Hagander Me: https://www.hagander.net/ Work: https://www.redpill-linpro.com/
Re: [HACKERS] pg_upgrade vs vacuum_cost_delay
On Thu, Nov 23, 2023 at 5:23 AM Bruce Momjian wrote: > > On Thu, Jun 16, 2016 at 04:45:14PM +0200, Magnus Hagander wrote: > > On Thu, Jun 16, 2016 at 4:35 PM, Euler Taveira wrote: > > > > On 16-06-2016 09:05, Magnus Hagander wrote: > > > Shouldn't pg_upgrade turn off vacuum cost delay when it vacuums the > > new > > > cluster? Not talking about the post-analyze script, but when it runs > > > vacuumdb to analyze and freeze before loading the new schema, in > > > prepare_new_cluster()? Those run during downtime, so it seems like > > you'd > > > want those to run as fast as possible. > > > > > Doesn't --new-options do the job? > > > > > > You could, but it seems like it should do it by default. > > Based on this seven year old post, I realized there are minimal > directions in pg_upgrade docs about how to generate statistics quickly, > so I created this patch to help. > > We do have docs on updating planner statistics: > > > https://www.postgresql.org/docs/current/routine-vacuuming.html#VACUUM-FOR-STATISTICS > > but that doesn't seem to cover cases where you are doing an upgrade or > pg_dump restore. Should I move this information into there instead? Wow, that's... A while :) I don't think that final sentence really helps much - for anybody who doesn't know that functionality well already, it will just be confusing. At the very least it should be a link that sends you to the documentation of how that functionality works? But beyond that, perhaps what we'd really want (now that vacuumdb has gained more functionality, and is used instead of the custom script all the way) to add is a parameter --no-cost-delay that would issue a SET to turn it off for the run? -- Magnus Hagander Me: https://www.hagander.net/ Work: https://www.redpill-linpro.com/
Re: run pgindent on a regular basis / scripted manner
On Tue, Oct 17, 2023 at 11:07 PM Tom Lane wrote: > > Magnus Hagander writes: > > If it doesn't know how to rebuild it, aren't we going to be stuck in a > > catch-22 if we need to change it in certain ways? Since an old version > > of pg_bsd_indent would reject the patch that might include updating > > it. (And when it does, one should expect the push to take quite a long > > time, but given the infrequency I agree that part is probably not an > > issue) > > Everyone who has proposed this has included a caveat that there must > be a way to override the check. Given that minimal expectation, it > shouldn't be too hard to deal with pg_bsd_indent-updating commits. I haven't managed to fully keep up with the thread, so I missed that. And I can't directly find it looking back either - but as long as there's an actual idea for how to do that, the problem goes away :) -- Magnus Hagander Me: https://www.hagander.net/ Work: https://www.redpill-linpro.com/
Re: run pgindent on a regular basis / scripted manner
On Tue, Oct 17, 2023 at 10:43 PM Jelte Fennema wrote: > > On Tue, 17 Oct 2023 at 18:53, Maciek Sakrejda wrote: > > Git push does have an --atomic flag to treat the entire push as a single > > operation. > > I decided to play around a bit with server hooks. Attached is a git > "update" hook that rejects pushes to the master branch when the new > HEAD of master does not pass pgindent. It tries to do the minimal > amount of work necessary. Together with the --atomic flag of git push > I think this would work quite well. > > Note: It does require that pg_bsd_indent is in PATH. While not perfect, > it seems like it would be acceptable in practice to me. Its version is > not updated very frequently. So manually updating it on the git server > when we do does not seem like a huge issue to me. If it doesn't know how to rebuild it, aren't we going to be stuck in a catch-22 if we need to change it in certain ways? Since an old version of pg_bsd_indent would reject the patch that might include updating it. (And when it does, one should expect the push to take quite a long time, but given the infrequency I agree that part is probably not an issue) And unless we're only enforcing it on master, we'd also need to make provisions for different versions of it on different branches, I think? Other than that, I agree it's fairly simple. It does need a lot more sandboxing than what's in there now, but that's not too hard of a problem to solve, *if* this is what we want. (And of course needs to be integrated with the existing script since AFAIK you can't chain git hooks unless you do it manually - but that's mostly mechanical) -- Magnus Hagander Me: https://www.hagander.net/ Work: https://www.redpill-linpro.com/
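The "minimal amount of work" decision such an update hook has to make can be sketched as below (Python rather than the shell hook attached to the mail; the function names and the pgindent invocation are assumptions of mine, not the actual hook): do a full pgindent run when typedefs.list itself changes, otherwise reindent only the C files the push touches.

```python
# Sketch of an update hook's planning step: decide which pgindent
# commands to run for a push of old..new. The git plumbing is injected
# as a callable so the decision logic is testable without a repository.
import subprocess

def plan_indent_commands(old, new, list_changed_files):
    """Return the pgindent command lines a hook would run for old..new."""
    changed = list_changed_files(old, new)
    if "src/tools/pgindent/typedefs.list" in changed:
        return [["pgindent", "--show-diff"]]           # full-tree run
    c_files = [f for f in changed if f.endswith((".c", ".h"))]
    if not c_files:
        return []                                      # nothing to indent
    return [["pgindent", "--show-diff"] + c_files]     # only touched files

def git_changed_files(old, new):
    """Real lister: ask git which files a push range touches."""
    out = subprocess.run(["git", "diff", "--name-only", f"{old}..{new}"],
                         capture_output=True, text=True, check=True)
    return out.stdout.split()
```

A real hook would run this once per ref from the update hook's arguments and reject the push when any planned command produces a diff; the header-file problem raised later in the thread (a .h change can reflow many .c files) is exactly what this per-file shortcut does not catch.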
Re: Possibility to disable `ALTER SYSTEM`
On Mon, Sep 11, 2023 at 1:56 PM Martín Marqués wrote: > > Hi, > > > I would like to propose a patch that allows administrators to disable > > `ALTER SYSTEM` via either a run-time option to pass to the Postgres server > > process at startup (e.g. `--disable-alter-system=true`, false by default) > > or a new GUC (or even both), without changing the current default method of > > the server. > > I'm actually going to put a strong +1 to Gabriele's proposal. It's an > undeniable problem (I'm only seeing arguments regarding other ways the > system would be insecure), and there might be real use cases for users > outside kubernetes for having this feature and using it (meaning > disabling the use of ALTER SYSTEM). If enough people are in favor of it *given the known issues with it*, I can drop my objection to a "meh, but I still don't think it's a good idea". But to do that, there would need to be a very in-your-face warning in the documentation about it like "note that this only disables the ALTER SYSTEM command. It does not prevent a superuser from changing the configuration remotely using other means". For example, in the simplest case, with the POC patch out there now, I can still run:

postgres=# CREATE TEMP TABLE x(t text);
CREATE TABLE
postgres=# INSERT INTO x VALUES ('work_mem=1TB');
INSERT 0 1
postgres=# COPY x TO '/home/mha/postgresql/inst/head/data/postgresql.auto.conf';
COPY 1
postgres=# SELECT pg_reload_conf();
 pg_reload_conf
----------------
 t
(1 row)

postgres=# show work_mem;
 work_mem
----------
 1TB
(1 row)

Or anything similar to that. Yes, this is marginally harder than saying ALTER SYSTEM SET work_mem='1TB', but only very very marginally so. And from a security perspective, there is zero difference. But we do also allow "trust" authentication which is another major footgun from a security perspective, against which we only defend with warnings, so that in itself is not a reason not to do the same here.
> In Patroni for example, the PostgreSQL service is controlled on all > nodes by Patroni, and these kinds of changes could end up breaking the > cluster if there was a failover. For this reason Patroni starts > postgres with some GUC options as CMD arguments so that values in > postgresql.conf or postgresql.auto.conf are ignored. The values in the > DCS are the ones that matter. Right. And patroni would need to continue to do that even with this patch, because it also needs to override if somebody puts something in postgresql.conf, no? Removing the defence against that seems like a bad idea... > (see more about how Patroni manages this here: > https://patroni.readthedocs.io/en/latest/patroni_configuration.html#postgresql-parameters-controlled-by-patroni > > But let's face it, that's a hack, not something to be proud of, even > if it does what is intended. And this is a product and we shouldn't be > advertising hacks to overcome limitations. It's in a way a hack. But it's not the fault of ALTER SYSTEM, as you'd also have to manage postgresql.conf. One slightly less hacky part might be to have patroni generate a config file of its own and include it with the highest priority -- at that point it *would* become a hack around ALTER SYSTEM, because ALTER SYSTEM has a higher priority than any manual user config file. But it is not today. Another idea to solve the problem could be to instead introduce a specific configuration file (either hardcoded or an include_final_parameter= parameter, in which case patroni or the k8s operator could set that parameter on the command line and that parameter only) that is parsed *after* postgresql.auto.conf and thereby would override the manual settings. This file would explicitly be documented as intended for this type of tooling, and when you have a tool - whether patroni or another declarative operator - it owns this file and can overwrite it with whatever it wants.
And this would also retain the ability to use ALTER SYSTEM SET for *other* parameters, if they want to. That's just a very quick idea and there may definitely be holes in it, but I'm not sure those holes are any worse than what's suggested here, and I do think it's cleaner. > I find the opposition to this lacking good reasons, while I find the > implementation to be useful in some cases. Stopping ALTER SYSTEM SET without actually preventing the superuser from doing the same thing as they were doing before would to me be at least as much of a hack as what patroni does today is. -- Magnus Hagander Me: https://www.hagander.net/ Work: https://www.redpill-linpro.com/
Re: Possibility to disable `ALTER SYSTEM`
On Sat, Sep 9, 2023 at 5:14 PM Alvaro Herrera wrote: > > On 2023-Sep-08, Magnus Hagander wrote: > > > Now, it might be that you don't care at all about the *security* side > > of the feature, and only care about the convenience side. But in that > > case, the original suggestion from Tom of using an event trigger seems > > like a fine enough solution? > > ALTER SYSTEM, like all system-wide commands, does not trigger event > triggers. These are per-database only. > > https://www.postgresql.org/docs/16/event-trigger-matrix.html Hah, didn't think of that. And yes, that's a very good point. But one way to fix that would be to actually make event triggers for system-wide commands, which would then be useful for other things as well... -- Magnus Hagander Me: https://www.hagander.net/ Work: https://www.redpill-linpro.com/
Re: Possibility to disable `ALTER SYSTEM`
On Fri, Sep 8, 2023 at 5:31 PM Tom Lane wrote: > > Alvaro Herrera writes: > > I don't understand Tom's resistance to this request. > > It's false security. If you think you are going to prevent a superuser > from messing with the system's configuration, you are going to need a > lot more restrictions than this, and we'll be forever getting security > reports that "hey, I found another way for a superuser to get filesystem > access". I think the correct answer to this class of problems is "don't > give superuser privileges to clients running inside the container". +1. And to make that happen, the appropriate thing is to identify *why* they are using superuser today, and focus efforts on finding ways for them to do that without being superuser. > > I did not like the mention of COPY PROGRAM, though, and in principle I > > do not support the idea of treating it the same way as ALTER SYSTEM. > > It's one of the easiest ways to modify postgresql.conf from SQL. If you > don't block that off, the feature is certainly not secure. (But of > course, there are more ways.) It's easier to just create a function in an untrusted language. Same principle. Once you're superuser, you can break through anything. We need an "allowlist" of things a user can do, rather than a blocklist of "they can do everything they can possibly think of and a computer is capable of doing, except for this one specific thing". Blocklisting individual permissions of a superuser will never be secure. Now, it might be that you don't care at all about the *security* side of the feature, and only care about the convenience side. But in that case, the original suggestion from Tom of using an event trigger seems like a fine enough solution? -- Magnus Hagander Me: https://www.hagander.net/ Work: https://www.redpill-linpro.com/
Release notes wording about logical replication as table owner
We have: "This improves security and now requires subscription owners to be either superusers or to have SET ROLE permissions on all tables in the replication set. The previous behavior of performing all operations as the subscription owner can be enabled with the subscription run_as_owner option." How does one have SET ROLE permissions on a table? I think that's supposed to be: "requires subscription owners to be either superusers or to have SET ROLE permissions on all roles owning tables in the replication set." Or something like that? Or can someone suggest a better wording? //Magnus
Re: Commitfest manager for September
On Wed, Aug 30, 2023 at 2:50 PM Peter Eisentraut wrote: > > On 28.08.23 15:46, Peter Eisentraut wrote: > > I would like to be the commitfest manager for CF 2023-09. > > I think my community account needs to have some privilege change to be > able to act as CF manager in the web interface. Could someone make that > happen? Done! -- Magnus Hagander Me: https://www.hagander.net/ Work: https://www.redpill-linpro.com/
Re: list of acknowledgments for PG16
On Tue, Aug 22, 2023 at 4:03 PM Joe Conway wrote: > > On 8/22/23 09:44, Tom Lane wrote: > > Alvaro Herrera writes: > >> On 2023-Aug-22, Peter Eisentraut wrote: > >>> The list of acknowledgments for the PG16 release notes has been committed. > >>> It should show up here sometime: > >>> <https://www.postgresql.org/docs/16/release-16.html#RELEASE-16-ACKNOWLEDGEMENTS>. > > > >> Hmm, I think these docs would only regenerate during the RC1 release, so > >> it'll be a couple of weeks, unless we manually poke the doc builder. > > > > Yeah. I could produce a new set of tarballs from the v16 branch tip, > > but I don't know the process (nor have the admin permissions) to > > extract the HTML docs and put them on the website. > > > These days the docs update is part of a scripted process for doing an > entire release. > > I'm sure we could figure out how to just release the updated docs, but > with RC1 a week away, is it really worthwhile? We've also been pretty strict to say that we don't *want* unreleased docs on the website for any of our stable branches before, so changing that would be a distinct policy change as well. And doing such an exception for just one commit seems like it's set up for problems -- you'd then have to do another one as soon as an adjustment is made. And in the end, that would mean changing the policy to say that the "release branches documentation tracks branch tip instead of releases". Which I generally speaking don't think is a good idea, because then they don't match what people are running anymore. I think it only really makes sense for this one part of the docs -- even other changes to the REL16 docs should be excluded until the next release is (this time, RC1). Bottom line is, definite -1 for doing a one-off change that violates the principle we're on. Now, if we want a *separate* location where we continuously load branch tip docs that's a different thing and certainly something we could consider. 
-- Magnus Hagander Me: https://www.hagander.net/ Work: https://www.redpill-linpro.com/
Re: Should we remove db_user_namespace?
On Sat, Jul 15, 2023 at 1:34 AM Nathan Bossart wrote: > > On Mon, Jul 10, 2023 at 03:43:07PM +0900, Michael Paquier wrote: > > On Wed, Jul 05, 2023 at 08:49:26PM -0700, Nathan Bossart wrote: > >> On Thu, Jul 06, 2023 at 08:21:18AM +0900, Michael Paquier wrote: > >>> Removing the GUC from this table is kind of annoying. Cannot this be > >>> handled like default_with_oids or ssl_renegotiation_limit to avoid any > >>> kind of issues with the reload of dump files and the kind? > >> > >> Ah, good catch. > > > > Thanks. Reading through the patch, this version should be able to > > handle the dump reloads. > > Hm. Do we actually need to worry about this? It's a PGC_SIGHUP GUC, so it > can only be set at postmaster start or via a configuration file. Any dump > files that are trying to set it or clients that are trying to add it to > startup packets are already broken. I guess keeping the GUC around would > avoid breaking any configuration files or startup scripts that happen to be > setting it to false, but I don't know if that's really worth worrying > about. I'd lean towards "no". A hard break, when it's a major release, is better than a "it stopped having effect but didn't tell you anything" break. Especially when it comes to things like startup scripts etc. -- Magnus Hagander Me: https://www.hagander.net/ Work: https://www.redpill-linpro.com/
Re: Should we remove db_user_namespace?
On Fri, Jun 30, 2023 at 11:43 PM Nathan Bossart wrote: > > On Fri, Jun 30, 2023 at 05:40:18PM -0400, Tom Lane wrote: > > Might be worth asking on pgsql-general whether anyone knows of > > active use of this feature. If not, I'm good with killing it. > > Will do. Strong +1 from here for removing it, assuming you don't find a bunch of users on -general who are using it. Having never come across one myself, I think it's unlikely, but I agree it's good to ask. -- Magnus Hagander Me: https://www.hagander.net/ Work: https://www.redpill-linpro.com/
Re: Add wait event for log emission?
On Tue, Jun 13, 2023 at 6:59 PM Andres Freund wrote: > > Hi, > > I just helped somebody debug a postgres performance problem that turned out to > not actually be postgres' fault. It turned out to be because postgres' > stdout/stderr were piped to a program, and that program was slow. Whenever the > pipe buffer filled up, postgres stopped making progress. > > That's not postgres' fault. But we make it too hard to debug such an > issue. There's no way to figure this out from within postgres, one pretty much > needs to look at stack traces. > > I think we should add a few wait events for log emission. I think it'd be good > to have one wait event for each log destination. > > That's not perfect - we'd e.g. still not be able to debug where the logger > process is stuck, due to it not being in pg_stat_activity. But other processes > reporting the wait event for writing to the logger process would be a pretty > good hint. +1. Would it make sense to at the same time create a separate one for syslog, or just use the same? -- Magnus Hagander Me: https://www.hagander.net/ Work: https://www.redpill-linpro.com/
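The stall described above is easy to reproduce outside postgres. A small Python sketch of the mechanism: fill a pipe whose reader never drains it, and the writer can make no further progress. (The write end is made non-blocking here only so the demo cannot hang; a blocking log write is exactly what stalls a backend, with no wait event to show for it.)

```python
# Demonstrate the pipe-backpressure failure mode: once the kernel pipe
# buffer is full and nothing reads the other end, writes stop succeeding.
import os

r, w = os.pipe()
os.set_blocking(w, False)       # non-blocking so the demo can't hang
written = 0
try:
    while True:                 # a "logger" whose reader is stuck
        written += os.write(w, b"x" * 4096)
except BlockingIOError:
    # Pipe buffer full. A blocking writer (like a backend emitting a
    # log line) would simply stop here, invisible from within postgres.
    pass
os.close(r)
os.close(w)
# 'written' now equals the kernel's pipe capacity (commonly 64 kB on Linux).
```

Seen from outside, the stuck process is just "not making progress", which is why a per-destination wait event would be such a useful hint.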
Re: win32ver data in meson-built postgres.exe
On Thu, Jun 8, 2023 at 3:45 AM Noah Misch wrote: > > On Wed, Jun 07, 2023 at 04:47:26PM -0700, Andres Freund wrote: > > On 2023-06-07 16:14:07 -0700, Noah Misch wrote: > > > A postgres.exe built with meson, ninja, and MSVC lacks the version > > > metadata > > > that postgres.exe gets under non-meson build systems. Patch attached. > > > > I dimly recall that we discussed that and basically decided that it doesn't > > really make sense to attach this information to postgres.exe. > > I looked for a discussion behind that, but I didn't find it. A key > user-visible consequence is whether the task manager "Name" column shows (1) > "PostgreSQL Server" (version data present) vs. (2) "postgres.exe" (no version > data). While (2) is not terrible, (1) is more typical on Windows. I don't > see cause to migrate to (2) after N years of sending (1). Certainly this part > of the user experience should not depend on one's choice of build system. +1, both on that it should be the same across build systems, and that the variant that we have in the msvc build system is the best one. And if we don't have the version structure in it, it will cause issues for installers (I think) and software inventory processes (definitely) that also use that. I don't recall a discussion about removing it, but it's not unlikely I missed it if it did take place... > > > This preserves two quirks of the older build systems. First, > > > postgres.exe is icon-free. > > > > We could also just change that. > > I would be +1 for that (only if done for all build systems). Showing the > elephant in task manager feels better than showing the generic-exe icon. I think this decision goes back all the way to the ancient times, and the argument was then "user should not use the postgres.exe file when clicking around" sort of. Back then, task manager didn't show the icon at all, regardless. It does now, so I'm +1 to add the icon (in all the build systems). 
-- Magnus Hagander Me: https://www.hagander.net/ Work: https://www.redpill-linpro.com/
Re: Upgrade of git.postgresql.org
On Mon, May 15, 2023 at 9:37 PM Magnus Hagander wrote: > > The pginfra team is about to begin an upgrade of git.postgresql.org to > a new version of debian. During the operation there may be some > intermittent outages -- we will let you know once the upgrade is > complete. This upgrade has now been completed. If you see any issues with it from now on, please let us know and we'll look into it! -- Magnus Hagander Me: https://www.hagander.net/ Work: https://www.redpill-linpro.com/
Upgrade of git.postgresql.org
The pginfra team is about to begin an upgrade of git.postgresql.org to a new version of debian. During the operation there may be some intermittent outages -- we will let you know once the upgrade is complete. -- Magnus Hagander Me: https://www.hagander.net/ Work: https://www.redpill-linpro.com/
Re: run pgindent on a regular basis / scripted manner
On Sat, Apr 22, 2023 at 9:59 PM Noah Misch wrote: > > (Given that another commentator is "absolutely against" a hook, this message > is mostly for readers considering this for other projects.) > > On Sat, Apr 22, 2023 at 03:23:59PM +0200, Magnus Hagander wrote: > > On Tue, Feb 7, 2023 at 5:43 AM Noah Misch wrote: > > > On Mon, Feb 06, 2023 at 06:17:02PM +0100, Peter Eisentraut wrote: > > > > Also, pgindent takes tens of seconds to run, so hooking that into the git > > > > push process would slow this down quite a bit. > > > > > > The pre-receive hook would do a full pgindent when you change typedefs.list. > > > Otherwise, it would reindent only the files being changed. The average push > > > need not take tens of seconds. > > > > It would probably not be tens of seconds, but it would be slow. It > > would need to do a clean git checkout into an isolated environment and > > spawn in there, and just that takes time. > > That would be slow, but I wouldn't do it that way. I'd make "pg_bsd_indent > --pre-receive --show-diff" that, instead of reading from the filesystem, gets > the bytes to check from the equivalent of this Perl-like pseudocode:
>
> while (<>) {
>     my($old_hash, $new_hash, $ref) = split;
>     foreach my $filename (split /\n/, `git diff --name-only $old_hash..$new_hash`) {
>         $file_content = `git show $new_hash $filename`;
>     }
> }
>
> I just ran pgindent on the file name lists of the last 1000 commits, and > runtime was less than 0.5s for each of 998/1000 commits. There's more a real > implementation might handle: > > - pg_bsd_indent changes > - typedefs.list changes > - skip if the break-glass "pgindent: no" appears in a commit message > - commits changing so many files that a clean "git checkout" would be faster Wouldn't there also be the case of a header file change that could potentially invalidate a whole lot of C files? There's also the whole potential problem of isolation.
We need to run the whole thing in an isolated environment (because any way in at this stage could lead to an exploit if a committer key is compromised at any point). And at least in the second case, it might not have access to view that data yet because it's not in... Could probably be worked around, but not trivially so. (But as mentioned above, I think the conclusion is we don't want this enforced in a receive hook anyway) > > And it would have to also > > know to rebuild pg_bsd_indent on demand, which would require a full > > ./configure run (or meson equivalent). etc. > > > > So while it might not be tens of seconds, it most definitely won't be fast. > > A project more concerned about elapsed time than detecting all defects might > even choose to take no synchronous action for pg_bsd_indent and typedefs.list > changes. When a commit changes either of those, the probability that the > committer already ran pgindent rises substantially. True, but it's far from 100% -- and if you got something in that didn't work, then the *next* committer would have to clean it up -- Magnus Hagander Me: https://www.hagander.net/ Work: https://www.redpill-linpro.com/
Re: run pgindent on a regular basis / scripted manner
On Sat, Apr 22, 2023 at 4:12 PM Andrew Dunstan wrote: > > > On 2023-04-22 Sa 08:47, Magnus Hagander wrote: > > On Sat, Apr 22, 2023 at 1:42 PM Andrew Dunstan wrote: > > On 2023-04-22 Sa 04:50, Michael Paquier wrote: > > On Fri, Apr 21, 2023 at 09:58:17AM +0200, Jelte Fennema wrote: > > For 2 the upstream thread listed two approaches: > a. Install a pre-receive git hook on the git server that rejects > pushes to master that are not indented > b. Add a test suite that checks if the code is correctly indented, so > the build farm would complain about it. (Suggested by Peter E) > > I think both a and b would work to achieve 2. But as Peter E said, b > indeed sounds like less of a divergence of the status quo. So my vote > would be for b. > > FWIW, I think that there is value for both of them. Anyway, isn't 'a' > exactly the same as 'b' in design? Both require a build of > pg_bsd_indent, meaning that 'a' would also need to run an equivalent > of the regression test suite, but it would be actually costly > especially if pg_bsd_indent itself is patched. I think that getting > more noisy on this matter with 'b' would be enough, but as an extra > PG_TEST_EXTRA for committers to set. > > Such a test suite would need a dependency to the 'git' command itself, > which is not something that could be safely run in a release tarball, > in any case. > > > Perhaps we should start with a buildfarm module, which would run pg_indent > --show-diff. That would only need to run on one animal, so a failure wouldn't > send the whole buildfarm red. This would be pretty easy to do. > > Just to be clear, you guys are aware we already have a git repo > that's supposed to track "head + pg_indent" at > https://git.postgresql.org/gitweb/?p=postgresql-pgindent.git;a=shortlog;h=refs/heads/master-pgindent > right? > > I see it is currently not working and this has not been noticed by > anyone, so I guess it kind of indicates nobody is using it today. 
The > reason appears to be that it uses pg_bsd_indent that's in our apt > repos and that's 2.1.1 and not 2.1.2 at this point. But if this is a > service that would actually be useful, this could certainly be ficked > pretty easy. > > But bottom line is that if pgindent is as predictable as it should be, > it might be easier to use that one central place that already does it > rather than have to build a buildfarm module? > > > Now that pg_bsd_indent is in the core code why not just use that? yeah, it just required building. And the lazy approach was to use the DEB :) For a quick fix I've built the current HEAD and have it just using that one -- right now it'll fail again when a change is made to it, but I'll get that cleaned up. It's back up and running, and results are at https://git.postgresql.org/gitweb/?p=postgresql-pgindent.git;a=shortlog;h=refs/heads/master-pgindent -- Magnus Hagander Me: https://www.hagander.net/ Work: https://www.redpill-linpro.com/
Re: run pgindent on a regular basis / scripted manner
On Tue, Feb 7, 2023 at 5:43 AM Noah Misch wrote: > > On Mon, Feb 06, 2023 at 06:17:02PM +0100, Peter Eisentraut wrote: > > Also, pgindent takes tens of seconds to run, so hooking that into the git > > push process would slow this down quite a bit. > > The pre-receive hook would do a full pgindent when you change typedefs.list. > Otherwise, it would reindent only the files being changed. The average push > need not take tens of seconds. It would probably not be tens of seconds, but it would be slow. It would need to do a clean git checkout into an isolated environment and run pgindent in there, and just that takes time. And it would have to also know to rebuild pg_bsd_indent on demand, which would require a full ./configure run (or meson equivalent), etc. So while it might not be tens of seconds, it most definitely won't be fast. -- Magnus Hagander Me: https://www.hagander.net/ Work: https://www.redpill-linpro.com/
Re: run pgindent on a regular basis / scripted manner
On Sat, Apr 22, 2023 at 1:42 PM Andrew Dunstan wrote: > > > On 2023-04-22 Sa 04:50, Michael Paquier wrote: > > On Fri, Apr 21, 2023 at 09:58:17AM +0200, Jelte Fennema wrote: > > For 2 the upstream thread listed two approaches: > a. Install a pre-receive git hook on the git server that rejects > pushes to master that are not indented > b. Add a test suite that checks if the code is correctly indented, so > the build farm would complain about it. (Suggested by Peter E) > > I think both a and b would work to achieve 2. But as Peter E said, b > indeed sounds like less of a divergence of the status quo. So my vote > would be for b. > > FWIW, I think that there is value for both of them. Anyway, isn't 'a' > exactly the same as 'b' in design? Both require a build of > pg_bsd_indent, meaning that 'a' would also need to run an equivalent > of the regression test suite, but it would be actually costly > especially if pg_bsd_indent itself is patched. I think that getting > more noisy on this matter with 'b' would be enough, but as an extra > PG_TEST_EXTRA for committers to set. > > Such a test suite would need a dependency to the 'git' command itself, > which is not something that could be safely run in a release tarball, > in any case. > > > Perhaps we should start with a buildfarm module, which would run pg_indent > --show-diff. That would only need to run on one animal, so a failure wouldn't > send the whole buildfarm red. This would be pretty easy to do. Just to be clear, you guys are aware we already have a git repo that's supposed to track "head + pg_indent" at https://git.postgresql.org/gitweb/?p=postgresql-pgindent.git;a=shortlog;h=refs/heads/master-pgindent right? I see it is currently not working and this has not been noticed by anyone, so I guess it kind of indicates nobody is using it today. The reason appears to be that it uses pg_bsd_indent that's in our apt repos and that's 2.1.1 and not 2.1.2 at this point. 
But if this is a service that would actually be useful, this could certainly be fixed pretty easily. But the bottom line is that if pgindent is as predictable as it should be, it might be easier to use that one central place that already does it rather than have to build a buildfarm module? //Magnus
Re: User functions for building SCRAM secrets
password policy or anything else that requires the server to see the cleartext or cleartext-equivalent of the password can then revoke this role from public, and force password changes to go through a security definer function, like SELECT pg_change_password_with_policy('joe', 'mysupersecretpassword'). This function can then be under the control of an extension or whatever you want. If one had the client under control one could for example do the policy validation on the client and pass it to the backend with some internal hash as well -- this would be entirely under the control of the application though, as a generic libpq connection could and should not be considered "client under control" in this case. -- Magnus Hagander Me: https://www.hagander.net/ Work: https://www.redpill-linpro.com/
Re: When to drop src/tools/msvc support
On Tue, Apr 11, 2023 at 12:27 AM Andres Freund wrote: > > Hi, > > On 2023-04-10 19:55:35 +0100, Dave Page wrote: > > Projects other than the EDB installers use the MSVC build system - e.g. > > pgAdmin uses it’s own builds of libpq and other tools (psql, pg_dump etc) > > that are pretty heavily baked into a fully automated build system (even the > > build servers and all their requirements are baked into Ansible). > > > > Changing that lot would be non-trivial, though certainly possible, and I > > suspect we’re not the only ones doing that sort of thing. > > Do you have a link to the code for that, if it's open? Just to get an > impression for how hard it'd be to switch over? The pgadmin docs/readme refers to https://github.com/pgadmin-org/pgadmin4/tree/master/pkg/win32 It clearly doesn't have the full automation stuff, but appears to have the parts about building the postgres dependency. -- Magnus Hagander Me: https://www.hagander.net/ Work: https://www.redpill-linpro.com/
Re: When to drop src/tools/msvc support
On Mon, Apr 10, 2023 at 6:56 PM Tom Lane wrote: > > Robert Haas writes: > > However, if this is the direction we're going, we probably need to > > give pgsql-packagers a heads up ASAP, because anybody who is still > > relying on the MSVC system to build Windows binaries is presumably > > going to need some time to adjust. If we rip out the build system > > somebody is using a couple of weeks before beta, that might make it > > difficult for that person to get the beta out promptly. And I think > > there's probably more than just EDB who would be in that situation. > > Oh ... that's a good point. Is there anyone besides EDB shipping > MSVC-built executables? Would it even be practical to switch to > meson with a month-or-so notice? Seems kind of tight, and it's > not like the packagers volunteered to make this switch. > > Maybe we have to bite the bullet and keep MSVC for v16. > If we drop it as soon as v17 opens, there's probably not that > much incremental work involved compared to dropping for v16. Not involved with any such build tasks anymore, but I think we can definitely assume there are others besides EDB who do that. It's also used by people who build add-on modules to be loaded in the EDB-installer-installed systems, I'm sure. It seems a bit aggressive to those to drop the entire build system just before beta. Thus, +1 on actually keeping it up and dropping it immediately as v17 opens, giving them a year of advance notice. And probably updating the docs (if anybody were to read them... but at least then we tried) stating that it's deprecated and will be removed in v17. -- Magnus Hagander Me: https://www.hagander.net/ Work: https://www.redpill-linpro.com/
Re: Should we remove vacuum_defer_cleanup_age?
On Sat, Mar 18, 2023 at 12:09 AM Andres Freund wrote: > Hi, > > As evidenced by the bug fixed in be504a3e974, vacuum_defer_cleanup_age is > not > heavily used - the bug was trivial to hit as soon as > vacuum_defer_cleanup_age > is set to a non-toy value. It complicates thinking about visibility > horizons > substantially, as vacuum_defer_cleanup_age can make them go backward > substantially. Obviously it's also severely undertested. > > I started writing a test for vacuum_defer_cleanup_age while working on the > fix > referenced above, but now I am wondering if said energy would be better > spent > removing vacuum_defer_cleanup_age alltogether. > > vacuum_defer_cleanup_age was added as part of hot standby. Back then we did > not yet have hot_standby_feedback. Now that that exists, > vacuum_defer_cleanup_age doesn't seem like a good idea anymore. It's > pessimisistic, i.e. always retains rows, even if none of the standbys has > an > old enough snapshot. > > The only benefit of vacuum_defer_cleanup_age over hot_standby_feedback is > that > it provides a limit of some sort. But transactionids aren't producing dead > rows in a uniform manner, so limiting via xid isn't particularly useful. > And > even if there are use cases, it seems those would be served better by > introducing a cap on how much hot_standby_feedback can hold the horizon > back. > > I don't think I have the cycles to push this through in the next weeks, > but if > we agree removing vacuum_defer_cleanup_age is a good idea, it seems like a > good idea to mark it as deprecated in 16? > +1. I haven't seen any (correct) use of this in many many years on my end at least. And yes, having a cap on hot_standby_feedback would also be great. -- Magnus Hagander Me: https://www.hagander.net/ <http://www.hagander.net/> Work: https://www.redpill-linpro.com/ <http://www.redpill-linpro.com/>
Re: use __builtin_clz to compute most significant bit set
On Sat, Feb 25, 2023 at 9:32 PM Joseph Yu wrote: > hi community > > This is the first time for me to submit a patch to Postgres community. > > instead of using for loop to find the most significant bit set. we could > use __builtin_clz function to first find the number of leading zeros for > the mask and then we can find the index by 32 - __builtin_clz(mask). > Hi! This file has already been removed, as of 4f1f5a7f85, which already uses __builtin_clz if it's available. Were you perhaps looking at an old version instead of the master branch? -- Magnus Hagander Me: https://www.hagander.net/ <http://www.hagander.net/> Work: https://www.redpill-linpro.com/ <http://www.redpill-linpro.com/>
Re: run pgindent on a regular basis / scripted manner
On Tue, Feb 7, 2023 at 1:56 PM Amit Kapila wrote: > On Tue, Feb 7, 2023 at 5:16 PM Andrew Dunstan wrote: > > > > On 2023-02-06 Mo 23:43, Noah Misch wrote: > > > > > > Well, we did talk about adding a pre-commit hook to the repository, with > > instructions for how to enable it. And I don't see a problem with adding > the > > pre-receive we're discussing here to src/tools/something. > > > > Yeah. I don't think we are seriously considering putting any > restrictions > > in place on gitmaster > > > > I could have sworn that was exactly what we were discussing, a > pre-receive > > hook on gitmaster. > > > > > > That's one idea that's been put forward, but it seems clear that some > people are nervous about it. > > > > Maybe a better course would be to continue improving the toolset and get > more people comfortable with using it locally and then talk about > integrating it upstream. > > > > Yeah, that sounds more reasonable to me as well. > If we wanted something "in between" we could perhaps also have an async CI job that runs after each commit and sends an email to the committer if the commit doesn't match up. Instead of rejecting it hard, we'd still get some relatively fast feedback. -- Magnus Hagander Me: https://www.hagander.net/ <http://www.hagander.net/> Work: https://www.redpill-linpro.com/ <http://www.redpill-linpro.com/>
Re: Remove source code display from \df+?
On Thu, Jan 12, 2023 at 6:23 AM Pavel Stehule wrote: > > > st 11. 1. 2023 v 22:11 odesílatel Tom Lane napsal: > >> Pavel Stehule writes: >> > st 11. 1. 2023 v 19:31 odesílatel Magnus Hagander >> > napsal: >> >> This is only about Internal and C, isn't it? Isn't the oid of these >> >> static, and identified by INTERNALlanguageId and ClanguageId >> respectively? >> >> So we could just have the query show the prosrc column if the language >> oid >> >> is one of those two, and otherwise show "Please use \sf to view the >> >> source"? >> >> > I think it can be acceptable - maybe we can rename the column "source >> code" >> > like "internal name" or some like that. >> >> Yeah, "source code" has always been kind of a misnomer for these >> languages. >> >> > again I don't think printing "Please use \sf to view the source"? " >> often >> > can be user friendly. \? is clear and \sf is easy to use >> >> We could shorten it to "See \sf" or something like that. But if we change >> the column header to "internal name" or the like, then the column just >> obviously doesn't apply for non-internal languages, so leaving it null >> should be fine. >> > > +1 > > Sure, that works for me as well. I agree the suggested text was way too long, I was more thinking of "something in this direction" rather than that exact text. But yes, with a change of names, we can leave it NULL as well. -- Magnus Hagander Me: https://www.hagander.net/ <http://www.hagander.net/> Work: https://www.redpill-linpro.com/ <http://www.redpill-linpro.com/>
Re: [PATCH] Add `verify-system` sslmode to use system CA pool for server cert
On Wed, Jan 11, 2023 at 8:06 PM Jacob Champion wrote: > On Wed, Jan 11, 2023 at 10:23 AM Magnus Hagander > wrote: > > Sorry to jump in (very) late in this game. So first, I like this general > approach :) > > Thanks! > > > It feels icky to have to add configure tests just to make a test work. > But I guess there isn't really a way around that if we want to test the > full thing. > > I agree... > > > However, shouldn't we be using X509_get_default_cert_file_env() to get > the name of the env? Granted it's unlikely to be anything else, but if > it's an API you're supposed to use. (In an ideal world that function would > not return anything in LibreSSL but I think it does include something, and > then just ignores it?) > > I think you're right, but before I do that, is the cure better than > the disease? It seems like that would further complicate a part of the > Perl tests that is already unnecessarily complicated. (The Postgres > code doesn't use the envvar at all.) Unless you already know of an > OpenSSL-alike that doesn't use that same envvar name? > Fair point. No, I have not run into one, I just recalled having seen the API :) And you're right -- I didn't consider that we were looking at that one in the *perl* code, not the C code. In the C code it would've been a trivial replacement. In the perl, I agree it's not worth it -- at least not until we run into a platform where it *would' matter. -- Magnus Hagander Me: https://www.hagander.net/ <http://www.hagander.net/> Work: https://www.redpill-linpro.com/ <http://www.redpill-linpro.com/>
Re: Remove source code display from \df+?
On Wed, Jan 11, 2023 at 7:24 PM Isaac Morland wrote: > On Wed, 11 Jan 2023 at 13:11, Pavel Stehule > wrote: > > please, don't send top post replies - >> https://en.wikipedia.org/wiki/Posting_style >> > > Sorry about that; I do know to do it properly and usually get it right. > GMail doesn’t seem to have an option (that I can find) to leave no space at > the top and put my cursor at the bottom; it nudges pretty firmly in the > direction of top-posting. Thanks for the reminder. > > >> I don't think printing a few first rows is a good idea - usually there is >> nothing interesting (same is PL/Perl, PL/Python, ...) >> >> If the proposed feature can be generic, then this information should be >> stored somewhere in pg_language. Or we can redesign usage of prosrc and >> probin columns - but this can be a much more massive change. >> >>> <http://www.redpill-linpro.com/> >>>>> >>>> > I’m looking for a quick win. So I think that means either drop the source > column entirely, or show single-line source values only and nothing or a > placeholder for anything that is more than one line, unless somebody comes > up with another suggestion. Originally I was thinking just to remove > entirely, but I’ve seen a couple of comments suggesting that people would > find it unfortunate if the source weren’t shown for internal/C functions, > so now I’m leaning towards showing single-line values only. > > I agree that showing the first line or couple of lines isn't likely to be > very useful. The way I format my functions, the first line is always blank > anyway: I write the bodies like this: > > $$ > BEGIN > … > END; > $$; > > Even if somebody uses a different style, the first line is probably just > "BEGIN" or something equally formulaic. > This is only about Internal and C, isn't it? Isn't the oid of these static, and identified by INTERNALlanguageId and ClanguageId respectively? 
So we could just have the query show the prosrc column if the language oid is one of those two, and otherwise show "Please use \sf to view the source"? -- Magnus Hagander Me: https://www.hagander.net/ <http://www.hagander.net/> Work: https://www.redpill-linpro.com/ <http://www.redpill-linpro.com/>
Re: [PATCH] Add `verify-system` sslmode to use system CA pool for server cert
On Wed, Jan 11, 2023 at 6:27 PM Jacob Champion wrote: > On Wed, Jan 11, 2023 at 6:37 AM Jelte Fennema wrote: > > > > LGTM. As far as I can tell this is ready for a committer. > > Thanks for the reviews! > Sorry to jump in (very) late in this game. So first, I like this general approach :) It feels icky to have to add configure tests just to make a test work. But I guess there isn't really a way around that if we want to test the full thing. However, shouldn't we be using X509_get_default_cert_file_env() to get the name of the env? Granted it's unlikely to be anything else, but if it's an API you're supposed to use. (In an ideal world that function would not return anything in LibreSSL but I think it does include something, and then just ignores it?) -- Magnus Hagander Me: https://www.hagander.net/ <http://www.hagander.net/> Work: https://www.redpill-linpro.com/ <http://www.redpill-linpro.com/>
Re: Remove source code display from \df+?
On Wed, Jan 11, 2023 at 6:19 PM Pavel Stehule wrote: > > > st 11. 1. 2023 v 17:50 odesílatel Isaac Morland > napsal: > >> I find \df+ much less useful than it should be because it tends to be >> cluttered up with source code. Now that we have \sf, would it be reasonable >> to remove the source code from the \df+ display? This would make it easier >> to see function permissions and comments. If somebody wants to see the full >> definition of a function they can always invoke \sf on it. >> >> If there is consensus on the idea in principle I will write up a patch. >> > > +1 > > +1 but maybe with a twist. For any functions in a procedural language like plpgsql, it makes it pretty useless today. But when viewing an internal or C language function, it's short enough and still actually useful. Maybe some combination where it would keep showing those for such language, but would show "use \sf to view source" for procedural languages? -- Magnus Hagander Me: https://www.hagander.net/ <http://www.hagander.net/> Work: https://www.redpill-linpro.com/ <http://www.redpill-linpro.com/>
Re: Add LSN along with offset to error messages reported for WAL file read/write/validate header failures
On Fri, Dec 23, 2022 at 2:06 AM Michael Paquier wrote: > On Thu, Dec 22, 2022 at 05:03:35PM +0530, Bharath Rupireddy wrote: > > On Thu, Dec 22, 2022 at 4:57 PM Michael Paquier > wrote: > >> As in using "sequence number" removing "file" from the docs and > >> changing the OUT parameter name to segment_number rather than segno? > >> Fine by me. > > > > +1. > > Okay, done this way. > > Thanks! //Magnus
Re: Add LSN along with offset to error messages reported for WAL file read/write/validate header failures
On Wed, Dec 21, 2022 at 1:09 AM Michael Paquier wrote: > On Tue, Dec 20, 2022 at 06:04:40PM +0530, Bharath Rupireddy wrote: > > On Tue, Dec 20, 2022 at 1:27 PM Magnus Hagander > wrote: > >> Caught this thread late. To me, pg_dissect_walfile_name() is a > >> really strange name for a function. Grepping our I code I see the > >> term dissect s used somewhere inside the regex code and exactly > >> zero instances elsewhere. Which is why I definitely didn't > >> recognize the term... > >> > >> Wouldn't something like pg_split_walfile_name() be a lot more > >> consistent with the rest of our names? > > Fine by me to change that if there is little support for the current > naming, though the current one does not sound that bad to me either. > > > Hm. FWIW, here's the patch. > > "split" is used a lot for the picksplit functions, but not in any of > the existing functions as a name. Some extra options: parse, read, > extract, calculate, deduce, get. "parse" would be something I would > be OK with. > Not sure what you mean? We certainly have a lot of functions called split that are not the picksplit ones. split_part(), regexp_split_to_array(), regexp_split_to_table()... And there's things like tuple_data_split() in pageinspect. There are many other examples outside of postgres as well, e.g. python has a split() of pathnames, "almost every language" has a split() on strings etc. I don't think I've ever seen dissect in a place like that either (though I'm sure it exists somewhere, it's hardly common). Basically, we take one thing and turn it into 3. That very naturally rings with "split" to me. Parse might work as well, certainly better than dissect. I'd still prefer split though. -- Magnus Hagander Me: https://www.hagander.net/ <http://www.hagander.net/> Work: https://www.redpill-linpro.com/ <http://www.redpill-linpro.com/>
Re: Add LSN along with offset to error messages reported for WAL file read/write/validate header failures
On Tue, Dec 20, 2022 at 5:40 AM Michael Paquier wrote: > On Tue, Dec 20, 2022 at 09:01:02AM +0900, Michael Paquier wrote: > > Yeah, my mind was considering as well yesterday the addition of a note > > in the docs about something among these lines, so fine by me. > > And applied that, after tweaking a few tiny things on a last lookup > with a catversion bump. Note that the example has been moved at the > bottom of the table for these functions, which is more consistent with > the surroundings. > > Hi! Caught this thread late. To me, pg_dissect_walfile_name() is a really strange name for a function. Grepping our code I see the term dissect is used somewhere inside the regex code and exactly zero instances elsewhere, which is why I definitely didn't recognize the term... Wouldn't something like pg_split_walfile_name() be a lot more consistent with the rest of our names? //Magnus
Re: Ordering behavior for aggregates
On Tue, Dec 13, 2022 at 1:51 PM Vik Fearing wrote: > The standard only defines an ORDER BY clause inside of an aggregate for > ARRAY_AGG(). As an extension to the standard, we allow it for all > aggregates, which is very convenient for non-standard things like > string_agg(). > > However, it is completely useless for things like AVG() or SUM(). If > you include it, the aggregate will do the sort even though it is neither > required nor desired. > > I am proposing something like pg_aggregate.aggordering which would be an > enum of behaviors such as f=Forbidden, a=Allowed, r=Required. Currently > all aggregates would have 'a' but I am thinking that a lot of them could > be switched to 'f'. In that case, if a user supplies an ordering, an > error is raised. > Should there perhaps also be an option for "ignored" where we'd allow the user to specify it, but not actually do the sort because we know it's pointless? Or maybe that should be the behaviour of "forbidden", which should then perhaps have a different name? My main motivation behind this is to be able to optimize aggregates that > could stop early such as ANY_VALUE(), but also to self-optimize queries > written in error (or ignorance). > > There is recurring demand for a first_agg() of some sort, and that one > (whether implemented in core or left to extensions) would use 'r' so > that an error is raised if the user does not supply an ordering. > > I have not started working on this because I envision quite a lot of > bikeshedding, but this is the approach I am aiming for. > > Thoughts? > For consistency, should we have a similar flag for DISTINCT? That could be interesting to forbid for something like first_agg(), wouldn't it? I'm not sure what the use case would be to require it, but maybe there is one? -- Magnus Hagander Me: https://www.hagander.net/ <http://www.hagander.net/> Work: https://www.redpill-linpro.com/ <http://www.redpill-linpro.com/>
Re: Checksum errors in pg_stat_database
On Mon, Dec 12, 2022 at 12:40 AM Michael Paquier wrote: > On Sun, Dec 11, 2022 at 09:18:42PM +0100, Magnus Hagander wrote: > > It would be less of a concern yes, but I think it still would be a > concern. > > If you have a large amount of corruption you could quickly get to > millions > > of rows to keep track of which would definitely be a problem in shared > > memory as well, wouldn't it? > > Yes. I have discussed this item with Bertrand off-list and I share > the same concern. This would lead to an lot of extra workload on a > large seqscan for a corrupted relation when the stats are written > (shutdown delay) while bloating shared memory with potentially > millions of items even if variable lists are handled through a dshash > and DSM. > > > But perhaps we could keep a list of "the last 100 checksum failures" or > > something like that? > > Applying a threshold is one solution. Now, a second thing I have seen > in the past is that some disk partitions were busted but not others, > and the current database-level counters are not enough to make a > difference when it comes to grab patterns in this area. A list of the > last N failures may be able to show some pattern, but that would be > like analyzing things with a lot of noise without a clear conclusion. Anyway, the workload caused by the threshold number had better be > measured before being decided (large set of relation files with a full > range of blocks corrupted, much better if these are in the OS cache > when scanned), which does not change the need of a benchmark. > > What about just adding a counter tracking the number of checksum > failures for relfilenodes in a new structure related to them (note > that I did not write PgStat_StatTabEntry)? > > If we do that, it is then possible to cross-check the failures with > tablespaces, which would point to disk areas that are more sensitive > to corruption. > If that's the concern, then perhaps the level we should be tracking things on is tablespace? 
We don't have any stats per tablespace today I believe, but that doesn't mean we couldn't create that. -- Magnus Hagander Me: https://www.hagander.net/ <http://www.hagander.net/> Work: https://www.redpill-linpro.com/ <http://www.redpill-linpro.com/>
Re: Checksum errors in pg_stat_database
On Thu, Dec 8, 2022 at 2:35 PM Drouvot, Bertrand < bertranddrouvot...@gmail.com> wrote: > > > On 4/2/19 7:06 PM, Magnus Hagander wrote: > > On Tue, Apr 2, 2019 at 8:47 AM Michael Paquier <mailto:mich...@paquier.xyz>> wrote: > > > > On Tue, Apr 02, 2019 at 07:43:12AM +0200, Julien Rouhaud wrote: > > > On Tue, Apr 2, 2019 at 6:56 AM Michael Paquier < > mich...@paquier.xyz <mailto:mich...@paquier.xyz>> wrote: > > >> One thing which is not > > >> proposed on this patch, and I am fine with it as a first draft, > is > > >> that we don't have any information about the broken block number > and > > >> the file involved. My gut tells me that we'd want a separate > view, > > >> like pg_stat_checksums_details with one tuple per (dboid, rel, > fork, > > >> blck) to be complete. But that's just for future work. > > > > > > That could indeed be nice. > > > > Actually, backpedaling on this one... pg_stat_checksums_details may > > be a bad idea as we could finish with one row per broken block. If > > a corruption is spreading quickly, pgstat would not be able to > sustain > > that amount of objects. Having pg_stat_checksums would allow us to > > plugin more data easily based on the last failure state: > > - last relid of failure > > - last fork type of failure > > - last block number of failure. > > Not saying to do that now, but having that in pg_stat_database does > > not seem very natural to me. And on top of that we would have an > > extra row full of NULLs for shared objects in pg_stat_database if we > > adopt the unique view approach... I find that rather ugly. > > > > > > I think that tracking each and every block is of course a non-starter, > as you've noticed. > > I think that's less of a concern now that the stats collector process has > gone and that the stats are now collected in shared memory, what do you > think? > It would be less of a concern yes, but I think it still would be a concern. 
If you have a large amount of corruption you could quickly get to millions of rows to keep track of, which would definitely be a problem in shared memory as well, wouldn't it? But perhaps we could keep a list of "the last 100 checksum failures" or something like that? -- Magnus Hagander Me: https://www.hagander.net/ <http://www.hagander.net/> Work: https://www.redpill-linpro.com/ <http://www.redpill-linpro.com/>
Re: More efficient build farm animal wakeup?
On Wed, Nov 23, 2022 at 9:15 AM Thomas Munro wrote: > On Wed, Nov 23, 2022 at 2:09 PM Andres Freund wrote: > > It's a huge improvement here. > > Same here. eelpout + elver looking good, just a fraction of a second > hitting that web server each minute. Long polling will be better and > shave off 30 seconds (+/- 30) on start time, but this avoids a lot of > useless churn without even needing a local mirror. Thanks Andrew! > Are you saying you still think it's worth pursuing longpoll or similar methods for it, or that this is good enough? -- Magnus Hagander Me: https://www.hagander.net/ <http://www.hagander.net/> Work: https://www.redpill-linpro.com/ <http://www.redpill-linpro.com/>
Re: More efficient build farm animal wakeup?
On Tue, Nov 22, 2022 at 12:10 AM Magnus Hagander wrote: > > > On Mon, Nov 21, 2022 at 11:42 PM Andrew Dunstan > wrote: > >> >> On 2022-11-21 Mo 16:20, Magnus Hagander wrote: >> > n Mon, Nov 21, 2022 at 9:58 PM Tom Lane wrote: >> > >> > Andrew Dunstan writes: >> b> > The buildfarm server now creates a companion to >> > branches_of_interest.txt >> > > called branches_of_interest.json which looks like this: >> > >> > ... okay ... >> > >> > >> > Yeah, it's not as efficient as something like long polling or web >> > sockets, but it is most definitely a lot simpler! >> > >> > If we're going to have a lot of animals do pulls of this file every >> > minute or more, it's certainly a lot better to pull this small file >> > than to make multiple git calls. >> > >> > It could trivially be made even more efficient by making the request >> > with either a If-None-Match or If-Modified-Since. While it's still >> > small, that cuts the size approximately in half, and would allow you >> > to skip even more processing if nothing has changed. >> >> >> I'll look at that. >> >> >> > >> > >> > > It updates this every time it does a git fetch, currently every >> > 5 minutes. >> > >> > That up-to-five-minute delay, on top of whatever cronjob delay one >> has >> > on one's animals, seems kind of sad. I've gotten kind of spoiled >> > maybe >> > by seeing first buildfarm results typically within 15 minutes of a >> > push. >> > But if we're trying to improve matters in this area, this doesn't >> seem >> > like quite the way to go. >> > >> > But it does seem like this eliminates one expense. Now that you >> have >> > that bit, maybe we could arrange a webhook or something that allows >> > branches_of_interest.json to get updated immediately after a push? >> > >> > >> > Webhooks are definitely a lot easier to implement in between our >> > servers yeah, so that shouldn't be too hard. 
We could use the same >> > hooks that we use for borka to build the docs, but have it just run >> > whatever script it is the buildfarm needs. I assume it's just >> > something trivial to run there, Andrew? >> >> >> Yes, I think much better between servers. Currently the cron job looks >> something like this: >> >> >> */5 * * * * cd $HOME/postgresql.git && git fetch -q && >> $HOME/website/bin/branches_of_interest.pl >> >> >> That script is what sets up the json files. >> >> >> I know nothing about git webhooks though, someone will have to point me >> in the right direction. >> > > I can set that up for you -- we have ready-made packages for 95% of what's > needed for that one as we use it elsewhere in the infra. So I'll just set > something up that will run that exact script (as the correct user of > course) and comment out the cronjob, and then send you the details of what > is set up where (I don't recall it offhand, but as it's the same we have > elsewhere I'll find it quickly once I look into it). > > Hi! This should now be set up, and Andrew has been sent the instructions for how to access that setup on the buildfarm server. So hopefully it will now be updating the buildfarm server side of things within a couple of seconds from a commit, and not do any speculative pulls. But we'll keep an extra eye on it for a bit of course, as it's entirely possible I got something wrong :) (This is only the part git -> bf server, of course; as that step doesn't need any client changes it was easier to do quickly) //Magnus
Re: More efficient build farm animal wakeup?
On Mon, Nov 21, 2022 at 11:42 PM Andrew Dunstan wrote: > > On 2022-11-21 Mo 16:20, Magnus Hagander wrote: > > On Mon, Nov 21, 2022 at 9:58 PM Tom Lane wrote: > > > > Andrew Dunstan writes: > > > The buildfarm server now creates a companion to > > branches_of_interest.txt > > > called branches_of_interest.json which looks like this: > > > > ... okay ... > > > > > > Yeah, it's not as efficient as something like long polling or web > > sockets, but it is most definitely a lot simpler! > > > > If we're going to have a lot of animals do pulls of this file every > > minute or more, it's certainly a lot better to pull this small file > > than to make multiple git calls. > > > > It could trivially be made even more efficient by making the request > > with either an If-None-Match or If-Modified-Since. While it's still > > small, that cuts the size approximately in half, and would allow you > > to skip even more processing if nothing has changed. > > > I'll look at that. > > > > > > > > > It updates this every time it does a git fetch, currently every > > 5 minutes. > > > > That up-to-five-minute delay, on top of whatever cronjob delay one > has > > on one's animals, seems kind of sad. I've gotten kind of spoiled > > maybe > > by seeing first buildfarm results typically within 15 minutes of a > > push. > > But if we're trying to improve matters in this area, this doesn't > seem > > like quite the way to go. > > > > But it does seem like this eliminates one expense. Now that you have > > that bit, maybe we could arrange a webhook or something that allows > > branches_of_interest.json to get updated immediately after a push? > > > > > > Webhooks are definitely a lot easier to implement in between our > > servers yeah, so that shouldn't be too hard. We could use the same > > hooks that we use for borka to build the docs, but have it just run > > whatever script it is the buildfarm needs. I assume it's just > > something trivial to run there, Andrew? 
> > > Yes, I think much better between servers. Currently the cron job looks > something like this: > > > */5 * * * * cd $HOME/postgresql.git && git fetch -q && > $HOME/website/bin/branches_of_interest.pl > > > That script is what sets up the json files. > > > I know nothing about git webhooks though, someone will have to point me > in the right direction. > I can set that up for you -- we have ready-made packages for 95% of what's needed for that one as we use it elsewhere in the infra. So I'll just set something up that will run that exact script (as the correct user of course) and comment out the cronjob, and then send you the details of what is set up where (I don't recall it offhand, but as it's the same we have elsewhere I'll find it quickly once I look into it). -- Magnus Hagander Me: https://www.hagander.net/ <http://www.hagander.net/> Work: https://www.redpill-linpro.com/ <http://www.redpill-linpro.com/>
Re: More efficient build farm animal wakeup?
On Mon, Nov 21, 2022 at 11:41 PM Tom Lane wrote: > Andrew Dunstan writes: > > On 2022-11-21 Mo 15:58, Tom Lane wrote: > >> But if we're trying to improve matters in this area, this doesn't seem > >> like quite the way to go. > > > Well, 5 minutes was originally chosen because it was sufficient for the > > purpose for which up to now the server used its mirror. Now we have > > added a new purpose we can certainly revisit that. Shall I try 2 minutes > > or go down to 1? > > Actually, if we implement a webhook to update this, the server could > stop doing speculative git pulls too, no? > That would be the main point, yes. Saves a few hundred (or thousand) wasteful git pulls *and* reacts quicker to actual pushes. As long as you have a clear line of communications between the machines, it's basically win/win I think. That's probably why, as Thomas noticed earlier, it's what "everybody" does. -- Magnus Hagander Me: https://www.hagander.net/ <http://www.hagander.net/> Work: https://www.redpill-linpro.com/ <http://www.redpill-linpro.com/>
Re: More efficient build farm animal wakeup?
On Mon, Nov 21, 2022 at 11:27 PM Andrew Dunstan wrote: > > On 2022-11-21 Mo 16:26, Magnus Hagander wrote: > > > > > Is there a reason this file is a list of hashes each hash with a > > single value in it? Would it make more sense if it was: > > { > > "REL_11_STABLE": "140c803723", > > "REL_12_STABLE": "4cbcb7ed85", > > "REL_13_STABLE": "c13667b518", > > "REL_14_STABLE": "5cda142bb9", > > "REL_15_STABLE": "ff9d27ee2b", > > "HEAD": "51b5834cd5" > > } > > > > > No. It's the way it is because the client relies on their being in the > right order. JSON hashes are conceptually unordered. > Ah yeah, if they need to be ordered that certainly makes more sense. -- Magnus Hagander Me: https://www.hagander.net/ <http://www.hagander.net/> Work: https://www.redpill-linpro.com/ <http://www.redpill-linpro.com/>
Re: More efficient build farm animal wakeup?
On Mon, Nov 21, 2022 at 9:51 PM Andrew Dunstan wrote: > > On 2022-11-20 Su 17:32, Thomas Munro wrote: > > On Sun, Nov 20, 2022 at 2:44 AM Andrew Dunstan > wrote: > >> It might not suit your use case, but one of the things I do to reduce > >> fetch load is to run a local mirror which runs > >> > >>git fetch -q --prune > >> > >> every 5 minutes. It also runs a git daemon, and several of my animals > >> point at that. > > Thanks. I understand now that my configuration without a local mirror > > is super inefficient (it spends the first ~25s of each minute running > > git commands). Still, even though that can be improved by me setting > > up more stuff, I'd like something event-driven rather than short > > polling-based for lower latency. > > > >> If there's a better git API I'll be happy to try to use it. > > Cool. Seems like we just have to invent something first... > > > > FWIW I'm also trying to chase the short polling out of cfbot. It > > regularly harasses the git servers at one end (could be fixed with > > this approach), and wastes a percentage of our allotted CPU slots on > > the other end by scheduling periodically (could be fixed with webhooks > > from Cirrus). > > > > I think I have solved most of the actual issues without getting too > complex. > > Here's how: > > The buildfarm server now creates a companion to branches_of_interest.txt > called branches_of_interest.json which looks like this: > > [ >{ > "REL_11_STABLE" : "140c803723" >}, >{ > "REL_12_STABLE" : "4cbcb7ed85" >}, >{ > "REL_13_STABLE" : "c13667b518" >}, >{ > "REL_14_STABLE" : "5cda142bb9" >}, >{ > "REL_15_STABLE" : "ff9d27ee2b" >}, >{ > "HEAD" : "51b5834cd5" >} > ] Is there a reason this file is a list of hashes each hash with a single value in it? 
Would it make more sense if it was: { "REL_11_STABLE": "140c803723", "REL_12_STABLE": "4cbcb7ed85", "REL_13_STABLE": "c13667b518", "REL_14_STABLE": "5cda142bb9", "REL_15_STABLE": "ff9d27ee2b", "HEAD": "51b5834cd5" } -- Magnus Hagander Me: https://www.hagander.net/ <http://www.hagander.net/> Work: https://www.redpill-linpro.com/ <http://www.redpill-linpro.com/>
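For illustration, a minimal Python sketch of consuming the existing array-of-single-key-objects layout as an ordered list of (branch, commit) pairs — the payload is the example quoted above; the helper name is mine:

```python
import json

# branches_of_interest.json uses one single-key object per branch so
# that the surrounding array preserves branch order; a plain JSON
# object's keys carry no ordering guarantee per the spec.
PAYLOAD = """
[
  {"REL_11_STABLE": "140c803723"},
  {"REL_12_STABLE": "4cbcb7ed85"},
  {"REL_13_STABLE": "c13667b518"},
  {"REL_14_STABLE": "5cda142bb9"},
  {"REL_15_STABLE": "ff9d27ee2b"},
  {"HEAD": "51b5834cd5"}
]
"""

def branch_tips(payload):
    """Flatten the list of single-key objects into ordered pairs."""
    pairs = []
    for obj in json.loads(payload):
        for branch, commit in obj.items():
            pairs.append((branch, commit))
    return pairs
```

The array wrapper is what carries the ordering here; the single-object form Magnus proposes would hold the same data but would rely on the consumer's JSON parser happening to preserve key order.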
Re: More efficient build farm animal wakeup?
On Mon, Nov 21, 2022 at 9:58 PM Tom Lane wrote: > Andrew Dunstan writes: > > The buildfarm server now creates a companion to branches_of_interest.txt > > called branches_of_interest.json which looks like this: > > ... okay ... > Yeah, it's not as efficient as something like long polling or web sockets, but it is most definitely a lot simpler! If we're going to have a lot of animals do pulls of this file every minute or more, it's certainly a lot better to pull this small file than to make multiple git calls. It could trivially be made even more efficient by making the request with either an If-None-Match or If-Modified-Since. While it's still small, that cuts the size approximately in half, and would allow you to skip even more processing if nothing has changed. > It updates this every time it does a git fetch, currently every 5 minutes. > > That up-to-five-minute delay, on top of whatever cronjob delay one has > on one's animals, seems kind of sad. I've gotten kind of spoiled maybe > by seeing first buildfarm results typically within 15 minutes of a push. > But if we're trying to improve matters in this area, this doesn't seem > like quite the way to go. > > But it does seem like this eliminates one expense. Now that you have > that bit, maybe we could arrange a webhook or something that allows > branches_of_interest.json to get updated immediately after a push? > Webhooks are definitely a lot easier to implement in between our servers yeah, so that shouldn't be too hard. We could use the same hooks that we use for borka to build the docs, but have it just run whatever script it is the buildfarm needs. I assume it's just something trivial to run there, Andrew? //Magnus
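For reference, a sketch of the conditional-request idea in Python: the client keeps the validators (ETag / Last-Modified) from its previous fetch and sends them back, so an unchanged file costs only a 304 with no body. The cache shape and function names are illustrative, not anything the buildfarm client actually has:

```python
def conditional_headers(cache):
    """Build request headers from the validators saved last time."""
    headers = {}
    if cache.get("etag"):
        headers["If-None-Match"] = cache["etag"]
    if cache.get("last_modified"):
        headers["If-Modified-Since"] = cache["last_modified"]
    return headers

def handle_response(status, body, cache):
    """304 means our cached copy is still current; 200 replaces it."""
    if status == 304:
        return cache["body"]
    if status == 200:
        cache["body"] = body
        return body
    raise RuntimeError(f"unexpected status {status}")
```

On the no-change path the client then skips all downstream processing, which is the "skip even more processing" part of the suggestion.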
Re: More efficient build farm animal wakeup?
On Sun, Nov 20, 2022 at 4:56 AM Thomas Munro wrote: > On Sun, Nov 20, 2022 at 1:35 AM Magnus Hagander > wrote: > > tl;dr: it's not there now, but yes if we can find a smart way for the bf > clients to consume it, it is something we could build and deploy fairly > easily. > > Cool -- it sounds a lot like you've thought about this already :-) > > About the client: currently run_branches.pl makes an HTTP request for > the "branches of interest" list. Seems like a candidate point for a > long poll? I don't think it'd have to be much smarter than it is > today, it'd just have to POST the commits it already has, I think. > Um, branches of interest will only pick up when it gets a new *branch*, not a new *commit*, so I think that would be a very different problem to solve. And I don't think we have new branches *that* often... Perhaps as a first step, the server could immediately report which > branches to bother fetching, considering the client's existing > commits. That'd almost always be none, but ~11.7 times per day a new > commit shows up, and once a year there's a new interesting branch. > That would avoid the need for the 6 git fetches that usually follow in > the common case, which admittedly might not be a change worth making > on its own. After all, the git fetches are probably quite similar > HTTP requests themselves, except that there are 6 of them, one per branch, > and they hit the public git server instead of some hypothetical > buildfarm endpoint. > As Andres mentioned downthread, that's not a lot more lightweight than what "git fetch" does. The thing we'd want to avoid is having to do that so much and often. And getting to that is going to require modification of the buildfarm client to make it more "smart" regardless. In particular, making it do this "right" in the face of multiple branches is probably going to be a big win. 
Then you could switch to long polling by letting the client say "if > currently none, I'm prepared to wait up to X seconds for a different > answer", assuming you know how to build the server side of that > (insert magic here). Of course, you can't make it too long or your > session might be dropped in the badlands between client and server, > but that's just a reason to make X configurable. I think RFC6202 says > that 120 seconds probably works fine across most kinds of links, which > means that you lower the total poll rate hitting the server, but--more > interestingly for me as a client--you minimise latency when something > finally happens. (With various keepalive tricks and/or heartbeat > streaming tricks you could possibly make it much higher, who knows... > but you'd have to set it very very low to do worse than what we're > doing today in total request count). Or maybe there is some existing > easy perl library that could be used for this (joke answer: cpan > install Twitter::API and follow @pg_commits). > I also honestly wonder how big a problem a much longer than 120 seconds timeout would be in practice. Since we own both the client and the server in this case, we'd only be at mercy of network equipment in between and I think we're much less exposed to weirdness there than "the average browser". Thus, as long as it's configurable, I think we could go for something much longer by default. I'd imagine something like a GET https://git.postgresql.org/buildfarm-branchtips X-branch-master: a4adc31f69 X-branch-REL_14_STABLE: b33283cbd3 X-longpoll: 120 For that one it would check branch master and rel 14, and if either branchtip doesn't match what was in the header, it'd return immediately with a textfile that's basically master: if master has changed and not REL_14. If nothing has changed, go into longpoll for 120 seconds based on the header, and if nothing at all has changed in that time, return a 304. 
We could also use something like a websocket to just stream the changes out over. In either case it would also need to change the buildfarm client to run as a daemon rather than a cronjob I think? (obviously optional, we don't have to remove the current abilities) However, when I started this thread I was half expecting such a thing > to exist already, somewhere, I just haven't been able to find it > myself... Don't other people have this problem? Maybe everybody who > has this problem uses webhooks (git server post commit hook opens > connection to client) as you mentioned, but as you also mentioned > that'd never fly for our topology. > Yeah, webhook seems to be what most people use. FWIW, an implementation for us would be a small daemon that receives such webhooks from our git server and redistributes them for the long polling. That's still the easiest way to get the data out of git itself... //Magnus
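A rough sketch of the long-poll client loop Magnus describes, with the transport stubbed out so the protocol shape is visible — the header names and 304-on-timeout convention follow the example in the mail; everything else (function names, the stand-in server) is assumption:

```python
def poll_for_changes(fetch, known_tips, timeout=120):
    """Ask the server which of our known branch tips have moved.

    fetch(headers) -> (status, changed_branches); in real life this
    would be an HTTP GET the server holds open for up to `timeout`
    seconds, answering early only when a watched tip changes.
    """
    headers = {f"X-branch-{branch}": tip for branch, tip in known_tips.items()}
    headers["X-longpoll"] = str(timeout)
    status, changed = fetch(headers)
    if status == 304:  # server gave up waiting: nothing moved
        return []
    return changed

def fake_server(headers):
    # Stand-in for the hypothetical buildfarm endpoint: pretend master
    # has moved on from the tip in the example, while REL_14_STABLE
    # stayed put.
    if headers["X-branch-master"] == "a4adc31f69":
        return 200, ["master"]
    return 304, []
```

The client would simply loop on this call; each 304 costs one request per `timeout` window rather than one per minute.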
Re: More efficient build farm animal wakeup?
On Sat, Nov 19, 2022 at 4:13 AM Thomas Munro wrote: > Hi, > > Is there a way to find out about new git commits that is more > efficient and timely than running N git fetches or whatever every > minute in a cron job? Maybe some kind of long polling where you send > an HTTP request that says "I think the tips of branches x, y, z are at > 111, 222, 333" and the server responds when that ceases to be true? > I'm not aware of any such thing standardized for git, but it wouldn't be hard to build one for that (I'm talking primarily about the server side here, not how to integrate that into the buildfarm side of things). We could also set something up whereby we could fire off webhooks when branches change (easy enough for registered servers in the buildfarm as we can easily avoid abuse there -- it would take more work to make something like that a public service, due to the risk of abuse). But that may actually be worse off, since I bet a lot of buildfarm animals (most even?) are probably sitting behind a NAT gateway of some kind, meaning consuming webhooks is hard. I did something similar for how we did things on borka (using some internal pginfra webhooks that are not available to the public at this time), but I had to revert that because of issues with concurrent buildfarm runs in the environment that we had set up. But we are using it for the snapshots docs builder, to make sure the website for that gets updated immediately after a commit on master. But the principle definitely works. Another thing to consider would be that something like this would cause all buildfarm clients to start git pull:ing down changes at more or less exactly the same time. Though in total that would probably still mean a lot less load than those that "git pull" very frequently today, it could potentially lead to some nets with lots of bf clients experiencing some level of bandwidth filling or something. 
Could probably be solved pretty easily with a random delay (which doesn't have to be long, as for most git pulls it will be a very quick operation), just something that's worth considering. tl;dr: it's not there now, but yes if we can find a smart way for the bf clients to consume it, it is something we could build and deploy fairly easily. -- Magnus Hagander Me: https://www.hagander.net/ <http://www.hagander.net/> Work: https://www.redpill-linpro.com/ <http://www.redpill-linpro.com/>
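The random-delay idea is tiny in code; a hedged sketch (the 30-second default, the function name, and the injected sleep/run hooks are mine, chosen so the jitter is testable — the fetch command mirrors Andrew's cron line):

```python
import random
import subprocess
import time

def jittered_pull(max_delay=30.0, sleep=time.sleep, run=subprocess.run):
    """Sleep a random amount before fetching, so a herd of animals all
    woken by the same push notification doesn't hit the git server in
    the same instant."""
    delay = random.uniform(0.0, max_delay)
    sleep(delay)
    run(["git", "fetch", "-q"], check=True)
    return delay
```

Since the fetch itself is quick when there is little to pull, even a small `max_delay` spreads the load without noticeably hurting wakeup latency.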
Re: Commit fest 2022-11
On Tue, Nov 1, 2022 at 11:59 AM Michael Paquier wrote: > On Tue, Nov 01, 2022 at 07:10:15PM +0900, Ian Lawrence Barwick wrote: > > Yup, Monday by the looks of it. > > > > My community login is "barwick", in case needed. > > Adding Magnus in CC, in case you need the admin permissions on the CF > app. (I have no idea how to do it, and I likely lack the permissions > to do that anyway.) > > Permissions added! -- Magnus Hagander Me: https://www.hagander.net/ <http://www.hagander.net/> Work: https://www.redpill-linpro.com/ <http://www.redpill-linpro.com/>
Re: future of serial and identity columns
On Fri, Oct 7, 2022 at 2:03 PM Vik Fearing wrote: > On 10/4/22 09:41, Peter Eisentraut wrote: > > In PostgreSQL 10, we added identity columns, as an alternative to serial > > columns (since 6.something). They mostly work the same. Identity > > columns are SQL-conforming, have some more features (e.g., overriding > > clause), and are a bit more robust in schema management. Some of that > > was described in [0]. AFAICT, there have been no complaints since that > > identity columns lack features or are somehow a regression over serial > > columns. > > > > But clearly, the syntax "serial" is more handy, and most casual examples > > use that syntax. So it seems like we are stuck with maintaining these > > two variants in parallel forever. I was thinking we could nudge this a > > little by remapping "serial" internally to create an identity column > > instead. At least then over time, the use of the older serial > > mechanisms would go away. > > > > Note that pg_dump dumps a serial column in pieces (CREATE SEQUENCE + > > ALTER SEQUENCE ... OWNED BY + ALTER TABLE ... SET DEFAULT). So if we > > did this, any existing databases would keep their old semantics, and > > those who really need it can manually create the old semantics as well. > > > > Attached is a demo patch how the implementation of this change would > > look like. This creates a bunch of regression test failures, but > > AFAICT, those are mainly display differences and some very peculiar test > > setups that are intentionally examining some edge cases. These would > > need to be investigated in more detail, of course. > > I haven't tested the patch yet, just read it. > > Is there any reason to use BY DEFAULT over ALWAYS? I tend to prefer the > latter. > I would assume to maintain backwards compatibility with the semantics of SERIAL today? I do also prefer ALWAYS, but that would make it a compatibility break. 
-- Magnus Hagander Me: https://www.hagander.net/ <http://www.hagander.net/> Work: https://www.redpill-linpro.com/ <http://www.redpill-linpro.com/>
Re: Ubuntu 16.04: Xenial: Why was it removed from the apt repo?
Hello! Because it has been removed and moved to the archives, as per the warning from early July. See https://www.postgresql.org/message-id/flat/YsV8fmomNNC%2BGpIR%40msg.credativ.de //Magnus On Mon, Sep 19, 2022 at 4:46 PM Larry Rosenman wrote: > > All of a sudden I'm getting repo not found for > Ubuntu 16.04 LTS on the APT repo. Why? > > -- > Larry Rosenman http://www.lerctr.org/~ler > Phone: +1 214-642-9640 E-Mail: l...@lerctr.org > US Mail: 5708 Sabbia Dr, Round Rock, TX 78665-2106 > >
Re: windows resource files, bugs and what do we actually want
On Fri, Sep 2, 2022 at 3:26 AM Andres Freund wrote: > Hi, > > On 2022-08-29 15:13:14 -0700, Andres Freund wrote: > > 1) For make based builds, all libraries that are built with MODULES > rather > >than MODULES_big have the wrong "FILETYPE", because Makefile.win32 > checks > >$(shlib), which is only set for MODULES_big. > > > >This used to be even more widely wrong until recently: > > > >commit 16a4a3d59cd5574fdc697ea16ef5692ce34c54d5 > >Author: Peter Eisentraut > >Date: 2020-01-15 10:15:06 +0100 > > > >Remove libpq.rc, use win32ver.rc for libpq > > > >Afaict before that we only set it correctly for pgevent. > > > > 2) For make based builds, we only set InternalName, OriginalFileName when > >$shlib is set, but InternalName, OriginalFilename are required. > > > > > https://docs.microsoft.com/en-us/windows/win32/menurc/versioninfo-resource > > > > These are harder to fix than was immediately obvious to me. We generate one > win32ver.rc per directory, even if a directory contains multiple build > products (think MODULES or src/bin/scripts). So we simply can't put a > correct > filename etc into the .rc file, unless we change the name of the .rc file. > Eeep. Yeah, that may be the reasoning behind some of how it was in the past. > > I looked into how hard it would be to fix this on the make side, and > decided > it's too hard. I'm inclined to leave this alone and fix it later in the > meson > port. > Agreed. -- Magnus Hagander Me: https://www.hagander.net/ <http://www.hagander.net/> Work: https://www.redpill-linpro.com/ <http://www.redpill-linpro.com/>