Re: pltcl crashes due to a syntax error
Erik Wienhold writes: > Tcl_GetVar returns null if errorInfo does not exist. Omitting econtext > from errcontext in that case looks like the proper fix to me. Yeah, that was the conclusion I came to last night while sitting in the airport, but I didn't have time to prepare a cleaned-up patch. The new bit of information that this bug report provides is that it's possible to get a TCL_ERROR result without Tcl having set errorInfo. That seems a tad odd, and it must happen only in weird corner cases, else we'd have heard of this decades ago. Not sure if it's worth trying to characterize those cases further, however. > Or just do away with throw_tcl_error and call ereport directly. I'd say this adds to the importance of having throw_tcl_error, because now it's even more complex than before, and there are multiple call sites. > Compare > that to pltcl_trigger_handler where the same case is handled like this: Hm, I wonder why that's not using throw_tcl_error. I guess because it wants to give its own primary message, but still ... regards, tom lane
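The shape of the fix under discussion can be sketched outside of PostgreSQL. Everything below is a stand-in (stub names, report format) except the one fact the thread establishes: Tcl_GetVar can return NULL when errorInfo was never set, and that NULL must not reach the errcontext formatter.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Stand-in for Tcl_GetVar(interp, "errorInfo", TCL_GLOBAL_ONLY), which --
 * as this bug report shows -- really can return NULL when a TCL_ERROR
 * result was produced without Tcl ever setting errorInfo. */
static const char *tcl_getvar_stub(int has_errorinfo)
{
    return has_errorinfo ? "while executing\n\"bogus command\"" : NULL;
}

/* Sketch of a guarded throw_tcl_error: attach a CONTEXT line only when
 * errorInfo exists, rather than passing NULL onward (the crash in the
 * original code).  Names and report format are illustrative, not the
 * actual pltcl code. */
static void throw_tcl_error_sketch(char *buf, size_t buflen, int has_errorinfo)
{
    const char *econtext = tcl_getvar_stub(has_errorinfo);

    if (econtext != NULL)
        snprintf(buf, buflen, "ERROR: syntax error\nCONTEXT: %s", econtext);
    else
        snprintf(buf, buflen, "ERROR: syntax error");
}
```

Keeping this in one helper, as Tom argues, means the NULL guard lives in exactly one place no matter how many call sites there are.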
Re: meson and check-tests
"Tristan Partin" writes: > On Fri May 31, 2024 at 12:02 PM CDT, Ashutosh Bapat wrote: >> We talked about this off-list at the conference. It seems we have to somehow >> avoid passing pg_regress --schedule argument and instead pass the list of >> tests. Any idea how to do that? > I think there are 2 solutions to this. > 1. Avoid passing --schedule by default, which doesn't sound like a great > solution. > 2. Teach pg_regress to ignore the --schedule option if specific tests > are passed instead. > 3. Add a --no-schedule option to pg_regress which would override the > previously added --schedule option. > I personally prefer 2 or 3. Just to refresh people's memory of what the Makefiles do: src/test/regress/GNUmakefile has check: all $(pg_regress_check) $(REGRESS_OPTS) --schedule=$(srcdir)/parallel_schedule $(MAXCONNOPT) $(EXTRA_TESTS) check-tests: all | temp-install $(pg_regress_check) $(REGRESS_OPTS) $(MAXCONNOPT) $(TESTS) $(EXTRA_TESTS) (and parallel cases for installcheck etc). AFAICS, meson.build has no equivalent to the EXTRA_TESTS add-on, nor does it have behavior equivalent to check-tests' substitution of $(TESTS) for --schedule. But I suggest that those behaviors have stood for a long time and so the appropriate thing to do is duplicate them as best we can, not invent something different. regards, tom lane
Re: pltcl crashes due to a syntax error
"a.kozhemyakin" writes: > When executing the following query on master branch: > CREATE EXTENSION pltcl; > CREATE or replace PROCEDURE test_proc5(INOUT a text) > LANGUAGE pltcl > AS $$ > set aa [concat $1 "+" $1] > return [list $aa $aa]) > $$; > CALL test_proc5('abc'); > CREATE EXTENSION > CREATE PROCEDURE > server closed the connection unexpectedly Replicated here. I'll look into it later if nobody beats me to it. Thanks for the report! regards, tom lane
Re: Explicit specification of index ensuring uniqueness of foreign columns
Kaiting Chen writes: > I'd like to resurrect a subset of my proposal in [1], specifically that: > The FOREIGN KEY constraint syntax gains a [ USING INDEX index_name ] clause > optionally following the referenced column list. > ... > While, in this minimal reproduction, the two indexes are interchangeable, > there > are situations that may reasonably occur in the course of ordinary use in > which > they aren't. For example, a redundant unique index with different storage > parameters may exist during the migration of an application schema. I agree that there's a hazard there, but I question if the case is sufficiently real-world to justify the costs of introducing a non-SQL-standard clause in foreign key constraints. One such cost is that pg_dump output would become less able to be loaded into other DBMSes, or even into older PG versions. I also wonder if this wouldn't just trade one fragility for another. Specifically, I am not sure that we guarantee that the names of indexes underlying constraints remain the same across dump/reload. If they don't, the USING INDEX clause might fail unnecessarily. As against that, I'm not sure I've ever seen a real-world case with intentionally-duplicate unique indexes. So on the whole I'm unconvinced that this is worth changing. regards, tom lane
Re: pgsql: Add more SQL/JSON constructor functions
Amit Langote writes: > On Mon, May 27, 2024 at 7:10 PM Alvaro Herrera > wrote: >> On 2024-May-27, Alvaro Herrera wrote: >> I just noticed this behavior, which looks like a bug to me: >> >> select json_serialize('{"a":1, "a":2}' returning varchar(5)); >> json_serialize >> >> {"a": >> >> I think this function should throw an error if the destination type >> doesn't have room for the output json. Otherwise, what good is the >> serialization function? > This behavior comes from using COERCE_EXPLICIT_CAST when creating the > coercion expression to convert json_*() functions' argument to the > RETURNING type. Yeah, I too think this is a cast, and truncation is the spec-defined behavior for casting to varchar with a specific length limit. I see little reason that this should work differently from select json_serialize('{"a":1, "a":2}' returning text)::varchar(5); json_serialize {"a": (1 row) regards, tom lane
Re: Need clarification on compilation errors in PG 16.2
Pradeep Kumar writes: > Yes it was defined in "pg_config.h" > /* Define to 1 if you have the `memset_s' function. */ > #define HAVE_MEMSET_S 1 That's correct for recent versions of macOS. I see you are building against a recent SDK: /Library/Developer/CommandLineTools/SDKs/MacOSX14.2.sdk/usr/include/string.h but I wonder if maybe the actual OS version is back-rev? regards, tom lane
Re: DROP OWNED BY fails to clean out pg_init_privs grants
Hannu Krosing writes: > Attached is a minimal patch to allow missing roles in REVOKE command FTR, I think this is a very bad idea. It might be OK if we added some kind of IF EXISTS option, but I'm not eager about that concept either. The right thing here is to fix the backend so that pg_dump doesn't see these bogus ACLs. regards, tom lane
Re: DISCARD ALL does not force re-planning of plpgsql functions/procedures
Jelte Fennema-Nio writes: > I got a report on the PgBouncer repo[1] that running DISCARD ALL was > not sufficient between connection handoffs to force replanning of > stored procedures. Turns out that while DISCARD ALL and DISCARD PLANS > reset the plan cache, they do not reset the num_custom_plans fields of > the existing PlanSources. So while the generic plan is re-planned > after DISCARD ALL, the decision on whether to choose it or not won't > be changed. Hm, should it be? That's hard-won knowledge, and I'm not sure there is a good reason to believe it's no longer applicable. Note that any change in behavior there would affect prepared statements in general, not only plpgsql. regards, tom lane
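For readers unfamiliar with the mechanism being discussed: the plan cache chooses between a generic and a custom plan using accumulated cost history. A simplified sketch follows (struct and function names abbreviated from plancache.c; the real code tracks more state and applies a planning-cost fudge factor):

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified model of plancache.c's choose_custom_plan() decision.  The
 * real CachedPlanSource has many more fields; this keeps only the part
 * relevant to the thread. */
typedef struct
{
    int     num_custom_plans;   /* history NOT reset by DISCARD ALL */
    double  total_custom_cost;
    double  generic_cost;
} PlanSourceSketch;

static bool choose_custom_plan_sketch(const PlanSourceSketch *ps)
{
    /* Try a few custom plans first, to learn their typical cost. */
    if (ps->num_custom_plans < 5)
        return true;

    /* Thereafter, prefer the generic plan unless custom plans have been
     * cheaper on average.  This decision is driven by accumulated history
     * that survives DISCARD ALL -- which is the behavior reported here. */
    double avg_custom_cost = ps->total_custom_cost / ps->num_custom_plans;

    return ps->generic_cost > avg_custom_cost;
}
```

Resetting only the cached plans, but not num_custom_plans or the cost totals, is why re-planning after DISCARD ALL does not revisit the generic-vs-custom choice.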
Re: Fix an incorrect assertion condition in mdwritev().
I wrote: > Hmm ... I agree that this is better normally. But there's an > edge case where it would fail to notice a problem that the > existing code does notice: if blocknum is close to UINT32_MAX > and adding nblocks causes it to wrap around to a small value. > Is there an inexpensive way to catch that? After a few minutes' thought, how about: Assert((uint64) blocknum + (uint64) nblocks <= (uint64) mdnblocks(reln, forknum)); This'd stop being helpful if we ever widen BlockNumber to 64 bits, but I think that's unlikely. (Partitioning seems like a better answer for giant tables.) regards, tom lane
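The hazard and the proposed cure can be demonstrated in isolation. BlockNumber really is a 32-bit type in the tree; the relation size and block values below are made up:

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t BlockNumber;   /* as in the PostgreSQL tree */

/* Naive form: the addition happens in 32 bits and can wrap around,
 * letting a bogus (blocknum, nblocks) pair slip past the assertion. */
static int check_naive(BlockNumber blocknum, BlockNumber nblocks,
                       BlockNumber relsize)
{
    return blocknum + nblocks <= relsize;
}

/* Proposed form: widen both operands to 64 bits before adding, so the
 * sum cannot wrap and the overflow case is caught. */
static int check_widened(BlockNumber blocknum, BlockNumber nblocks,
                         BlockNumber relsize)
{
    return (uint64_t) blocknum + (uint64_t) nblocks <= (uint64_t) relsize;
}
```

With blocknum near UINT32_MAX, the 32-bit sum wraps to a small value and the naive check passes when it shouldn't; the widened check rejects it.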
Re: Fix an incorrect assertion condition in mdwritev().
Michael Paquier writes: > On Sat, May 25, 2024 at 11:52:22PM +0800, Xing Guo wrote: >> #ifdef CHECK_WRITE_VS_EXTEND >> -Assert(blocknum < mdnblocks(reln, forknum)); >> +Assert(blocknum + nblocks <= mdnblocks(reln, forknum)); >> #endif > Yes, it looks like you're right that this can be made stricter, > computing the number of blocks we're adding in the number calculated > (aka adding one block to this number fails immediately at initdb). Hmm ... I agree that this is better normally. But there's an edge case where it would fail to notice a problem that the existing code does notice: if blocknum is close to UINT32_MAX and adding nblocks causes it to wrap around to a small value. Is there an inexpensive way to catch that? (If not, it's not a reason to block this patch; but let's think about it while we're here.) regards, tom lane
Re: DROP OWNED BY fails to clean out pg_init_privs grants
Hannu Krosing writes: > Having a pg_init_privs entry referencing a non-existing user is > certainly of no practical use. Sure, that's not up for debate. What I think we're discussing right now is 1. What other cases are badly handled by the pg_init_privs mechanisms. 2. How much of that is practical to fix in v17, seeing that it's all long-standing bugs and we're already past beta1. I kind of doubt that the answer to #2 is "all of it". But perhaps we can do better than "none of it". regards, tom lane
Re: Schema variables - new implementation for Postgres 15
Pavel Stehule writes: > we can introduce special safe mode started by > set enable_direct_variable_read to off; > and allowing access to variables only by usage dedicated function > (supported by parser) named like variable or pg_variable Didn't we learn twenty years ago that GUCs that change query semantics are an awful idea? Pick a single access method for these things and stick to it. regards, tom lane
Re: commitfest.postgresql.org is no longer fit for purpose
Tomas Vondra writes: > I personally don't think the FOSDEM triage is a very productive use of > our time - we go through patches top to bottom, often with little idea > what the current state of the patch is. We always ran out of time after > looking at maybe 1/10 of the list. > Having an in-person discussion about patches would be good, but I think > we should split the meeting into much smaller groups for that, each > looking at a different subset. And maybe it should be determined in > advance, so that people can look at those patches in advance ... Yeah, subgroups of 3 or 4 people sounds about right. And definitely some advance looking to see which patches need discussion. regards, tom lane
Re: commitfest.postgresql.org is no longer fit for purpose
Joe Conway writes: > On 5/24/24 15:45, Tom Lane wrote: >> I was *not* proposing doing a regular review, unless of course >> somebody really wants to do that. What I am thinking about is >> suggesting how to make progress on patches that are stuck, or in some >> cases delivering the bad news that this patch seems unlikely to ever >> get accepted and it's time to cut our losses. (Patches that seem to >> be moving along in good order probably don't need any attention in >> this process, beyond determining that that's the case.) That's why >> I think we need some senior people doing this, as their opinions are >> more likely to be taken seriously. > Maybe do a FOSDEM-style dev meeting with triage review at PG.EU would at > least move us forward? Granted it is less early and perhaps less often > than the thread seems to indicate, but has been tossed around before and > seems doable. Perhaps. The throughput of an N-person meeting is (at least) a factor of N less than the same N people looking at patches individually. On the other hand, the consensus of a meeting is more likely to be taken seriously than a single person's opinion, senior or not. So it could work, but I think we'd need some prefiltering so that the meeting only spends time on those patches already identified as needing help. regards, tom lane
Re: DROP OWNED BY fails to clean out pg_init_privs grants
Robert Haas writes: > On Fri, May 24, 2024 at 2:57 PM Tom Lane wrote: >> Doesn't seem right to me. That will give pg_dump the wrong idea >> of what the initial privileges actually were, and I don't see how >> it can construct correct delta GRANT/REVOKE on the basis of false >> information. During the dump reload, the extension will be >> recreated with the original owner (I think), causing its objects' >> privileges to go back to the original pg_init_privs values. > Oh! That does seem like it would make what I said wrong, but how would > it even know who the original owner was? Shouldn't we be recreating > the object with the owner it had at dump time? Keep in mind that the whole point here is for the pg_dump script to just say "CREATE EXTENSION foo", not to mess with the individual objects therein. So the objects are (probably) going to be owned by the user that issued CREATE EXTENSION. In the original conception, that was the end of it: what you got for the member objects was whatever state CREATE EXTENSION left behind. The idea of pg_init_privs is to support dump/reload of subsequent manual alterations of privileges for extension-created objects. I'm not, at this point, 100% certain that that's a fully realizable goal. But I definitely think it's insane to expect that to work without also tracking changes in the ownership of said objects. Maybe forbidding ALTER OWNER on extension-owned objects isn't such a bad idea? regards, tom lane
Re: commitfest.postgresql.org is no longer fit for purpose
Tomas Vondra writes: > On 5/24/24 19:09, Tom Lane wrote: >>>> ... Maybe we could divide and conquer: get a >>>> dozen-or-so senior contributors to split up the list of pending >>>> patches and then look at each one with an eye to what needs to >>>> happen to move it along (*not* to commit it right away, although >>>> in some cases maybe that's the thing to do). > I think meeting all these conditions - a week early in the cycle, but > not in the summer, when everyone can focus on this - will be difficult. True. Perhaps in the fall there'd be a better chance? > So maybe it'd be better to just set some deadline by which this needs to > be done, and make sure every pending patch has someone expected to look > at it? IMHO we're not in position to assign stuff to people, so I guess > people would just volunteer anyway - the CF app might track this. One problem with a time-extended process is that the set of CF entries is not static, so a predetermined division of labor will result in missing some newly-arrived entries. Maybe that's not a problem though; anything newly-arrived is by definition not "stuck". But we would definitely need some support for keeping track of what's been looked at and what remains, whereas if it happens over just a few days that's probably not so essential. > It's not entirely clear to me if this would effectively mean doing a > regular review of those patches, or something less time consuming. I was *not* proposing doing a regular review, unless of course somebody really wants to do that. What I am thinking about is suggesting how to make progress on patches that are stuck, or in some cases delivering the bad news that this patch seems unlikely to ever get accepted and it's time to cut our losses. (Patches that seem to be moving along in good order probably don't need any attention in this process, beyond determining that that's the case.) 
That's why I think we need some senior people doing this, as their opinions are more likely to be taken seriously. regards, tom lane
Re: Add SPLIT PARTITION/MERGE PARTITIONS commands
The partition_split test has unstable results, as shown at [1]. I suggest adding "ORDER BY conname" to the two queries shown to fail there. Better look at other queries in the test for possible similar problems, too. regards, tom lane [1] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=jackdaw&dt=2024-05-24%2015%3A58%3A17
Re: DROP OWNED BY fails to clean out pg_init_privs grants
Robert Haas writes: > On Fri, May 24, 2024 at 11:59 AM Tom Lane wrote: >> So this goal seems to >> mean that neither ALTER OWNER nor REASSIGN OWNED should touch >> pg_init_privs at all, as that would break its function of recording >> a historical state. Only DROP OWNED should get rid of pg_init_privs >> grants, and that only because there's no choice -- if the role is >> about to go away, we can't hang on to a reference to its OID. > But I would have thought that the right thing to do to pg_init_privs > here would be essentially s/$OLDOWNER/$NEWOWNER/g. Doesn't seem right to me. That will give pg_dump the wrong idea of what the initial privileges actually were, and I don't see how it can construct correct delta GRANT/REVOKE on the basis of false information. During the dump reload, the extension will be recreated with the original owner (I think), causing its objects' privileges to go back to the original pg_init_privs values. Applying a delta that starts from some other state seems pretty questionable in that case. It could be that if we expect pg_dump to issue an ALTER OWNER to move ownership of the altered extension object to its new owner, and only then apply its computed delta GRANT/REVOKEs, then indeed the right thing is for the original ALTER OWNER to apply s/$OLDOWNER/$NEWOWNER/g to pg_init_privs. I've not thought this through in complete detail, but it feels like that might work, because the reload-time ALTER OWNER would apply exactly that change to both the object's ACL and its pg_init_privs, and then the delta is starting from the right state. Of course, pg_dump can't do that right now because it lacks the information that such an ALTER is needed. Although ... this is tickling a recollection that pg_dump doesn't try very hard to run CREATE EXTENSION with the same owner that the extension had originally. That's a leftover from the times when basically all extensions required superuser to install, and of course one superuser is as good as the next. 
There might be some work we have to do on that side too if we want to up our game in this area. Another case that's likely not handled well is what if the extension really shouldn't have its original owner (e.g. you're using --no-owner). If it's restored under a new owner then the pg_init_privs data certainly doesn't apply, and it feels like it'd be mostly luck if the precomputed delta GRANT/REVOKEs lead to a state you like. regards, tom lane
Re: PG17beta1: Unable to test Postgres on Fedora due to fatal Error in psql: undefined symbol: PQsocketPoll
Hans Buschmann writes: > When I tried to connect to the restored database with psql \c I got: > ... > postgres=# \c cpsdb > pgbeta/bin/psql: symbol lookup error: pgbeta/bin/psql: undefined symbol: > PQsocketPoll > (To my understanding) the problem comes from incompatible libpq.so libraries > on the system. Right, you must have a v16-or-earlier libpq lying around somewhere, and psql has bound to that not to the beta-test version. PQsocketPoll is new in v17. > - Why doesn't psql use the just created lib64/libpq.so.5.17 from ninja > install? It's on you to ensure that happens, especially on Linux systems which have a strong bias towards pulling libraries from /usr/lib[64]. Normally our --enable-rpath option is sufficient; while that's default in an autoconf-based build, I'm not sure that it is in a meson build. Also, if your beta libpq is not where the rpath option expected it to get installed, the linker will silently fall back to /usr/lib[64]. > The loading of the locally available libpq.so should always have priority > over a system wide in /usr/lib64 Tell it to the Linux developers --- they think the opposite. Likewise, all of your other proposals need to be addressed to the various distros' packagers; this is not the place to complain. The main thing that is bothering me about the behavior you describe is that it didn't fail until psql actually tried to call PQsocketPoll. (AFAICT from a quick look, that occurs during \c but not during the startup connection.) I had thought that we select link options that result in early binding and hence startup-time failure for a case like this. I can confirm though that this acts as described on my RHEL8 box if I force current psql to link to v16 libpq, so either we've broken that or it never did apply to frontend programs. But it doesn't seem to me to be a great thing for it to behave like this. You could easily miss that you have a broken setup until after you deploy it. regards, tom lane
Re: commitfest.postgresql.org is no longer fit for purpose
Tomas Vondra writes: > On 5/20/24 16:16, Robert Haas wrote: >> On Sun, May 19, 2024 at 3:18 PM Tom Lane wrote: >>> * Before starting this thread, Robert did a lot of very valuable >>> review of some individual patches. I think what prompted him to >>> start the thread was the realization that he'd only made a small >>> dent in the problem. Maybe we could divide and conquer: get a >>> dozen-or-so senior contributors to split up the list of pending >>> patches and then look at each one with an eye to what needs to >>> happen to move it along (*not* to commit it right away, although >>> in some cases maybe that's the thing to do). It'd be great if >>> that could happen just before each commitfest, but that's probably >>> not practical with the current patch volume. What I'm thinking >>> for the moment is to try to make that happen once a year or so. >> I like this idea. > Me too. Do you think it'd happen throughout the whole year, or at some > particular moment? I was imagining a focused community effort spanning a few days to a week. It seems more likely to actually happen if we attack it that way than if it's spread out as something people will do when they get around to it. Of course the downside is finding a week when everybody is available. > We used to do a "triage" at the FOSDEM PGDay meeting, but that used to > be more of an ad hoc thing to use the remaining time, with only a small > subset of people. But that seems pretty late in the dev cycle, I guess > we'd want it to happen early, perhaps during the first CF? Yeah, early in the cycle seems more useful, although the summer might be the worst time for peoples' availability. regards, tom lane
Re: DROP OWNED BY fails to clean out pg_init_privs grants
Daniel Gustafsson writes: > On 24 May 2024, at 16:20, Tom Lane wrote: >> Another point: shdepReassignOwned explicitly does not touch grants >> or default ACLs. It feels like the same should be true of >> pg_init_privs entries, > Agreed, I can't see why pg_init_privs should be treated differently. Thinking about this some more: the point of pg_init_privs is to record an object's privileges as they stood at the end of CREATE EXTENSION (or extension update), with the goal that pg_dump should be able to compute the delta between that and the object's current privileges and emit GRANT/REVOKE commands to restore those current privileges after a fresh extension install. (We slide gently past the question of whether the fresh extension install is certain to create privileges matching the previous pg_init_privs entry.) So this goal seems to mean that neither ALTER OWNER nor REASSIGN OWNED should touch pg_init_privs at all, as that would break its function of recording a historical state. Only DROP OWNED should get rid of pg_init_privs grants, and that only because there's no choice -- if the role is about to go away, we can't hang on to a reference to its OID. However ... then what are the implications of doing ALTER OWNER on an extension-owned object? Is pg_dump supposed to recognize that that's happened and replay it too? If not, is it sane at all to try to restore the current privileges, which are surely dependent on the current owner? I kind of doubt that that's possible at all, and even if it is it might result in security issues. It seems like pg_init_privs has missed a critical thing, which is to record the original owner not only the original privileges. (Alternatively, maybe we should forbid ALTER OWNER on extension-owned objects? Or at least on those having pg_init_privs entries?) I'm wondering too about this scenario: 1. CREATE EXTENSION installs an object and sets some initial privileges. 2. DBA manually modifies the object's privileges. 3. 
ALTER EXTENSION UPDATE further modifies the object's privileges. I think what will happen is that at the end of ALTER EXTENSION, we'll store the object's current ACL verbatim in pg_init_privs, therefore including the effects of step 2. This seems undesirable, but I'm not sure how to get around it. Anyway, this is starting to look like the sort of can of worms best not opened post-beta1. v17 has made some things better in this area, and I don't think it's made anything worse; so maybe we should declare victory for the moment and hope to address these additional concerns later. I've added an open item though. regards, tom lane
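To make the ownership hazard concrete: pg_dump's delta only produces the intended end state if the restore actually starts from the recorded initial state. A toy model of the delta computation (real ACLs are aclitem arrays and the statement strings here are purely illustrative):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Toy model of the delta pg_dump derives from pg_init_privs versus the
 * current ACL: REVOKE whatever vanished, GRANT whatever was added.  If
 * the freshly created extension object does not start out with exactly
 * the recorded initial privileges -- e.g. because its owner differs --
 * applying this delta lands somewhere other than the intended state. */
static bool in_list(const char *item, const char *const *list, int n)
{
    for (int i = 0; i < n; i++)
        if (strcmp(item, list[i]) == 0)
            return true;
    return false;
}

static int compute_acl_delta(const char *const *init, int ninit,
                             const char *const *cur, int ncur,
                             char out[][64], int maxout)
{
    int n = 0;

    for (int i = 0; i < ninit && n < maxout; i++)
        if (!in_list(init[i], cur, ncur))
            snprintf(out[n++], 64, "REVOKE %s", init[i]);
    for (int i = 0; i < ncur && n < maxout; i++)
        if (!in_list(cur[i], init, ninit))
            snprintf(out[n++], 64, "GRANT %s", cur[i]);
    return n;
}
```

The computation itself is straightforward; the fragility discussed above is entirely about whether the "init" side still describes reality at restore time.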
Re: Convert node test compile-time settings into run-time parameters
Peter Eisentraut writes: > Ok, I have an improved plan. I'm wrapping all the code related to this > in #ifdef DEBUG_NODE_TESTS_ENABLED. This in turn is defined in > assert-enabled builds, or if you define it explicitly, or if you define > one of the legacy individual symbols. That way you get the run-time > settings in a normal development build, but there is no new run-time > overhead. This is the same scheme that we use for debug_discard_caches. +1; this addresses my concern about not adding effectively-dead code to production builds. Your point upthread about debug_print_plan and other legacy debug switches was not without merit; should we also fold those into this plan? (In that case we'd need a symbol named more generically than DEBUG_NODE_TESTS_ENABLED.) > (An argument could be made to enable this code if and only if assertions > are enabled, since these tests are themselves kind of assertions. But I > think having a separate symbol documents the purpose of the various code > sections better.) Agreed. >> Maybe not three settings, but a single setting, with multiple values, like >> debug_io_direct? > Yeah, good idea. Let's get some more feedback on this before I code up > a complicated list parser. Kinda doubt it's worth the trouble, either to code the GUC support or to use it. I don't object to having the booleans in a debug build, I was just concerned about whether they should exist in production. regards, tom lane
Re: DROP OWNED BY fails to clean out pg_init_privs grants
Daniel Gustafsson writes: > I had a look, but I didn't beat you to a fix since it's not immediately clear > to me how this should work for REASSIGN OWNED (DROP OWNED seems a simpler > case). Should REASSIGN OWNED alter the rows in pg_shdepend matching init > privs > from SHARED_DEPENDENCY_OWNER to SHARED_DEPENDENCY_INITACL, so that these can > be > mopped up with a later DROP OWNED? Trying this in a POC patch it fails with > RemoveRoleFromInitPriv not removing the rows, shortcircuiting that for a test > seems to make it work but is it the right approach? I've tentatively concluded that I shouldn't have modeled SHARED_DEPENDENCY_INITACL so closely on SHARED_DEPENDENCY_ACL, in particular the decision that we don't need such an entry if there's also SHARED_DEPENDENCY_OWNER. I think one reason we can get away with omitting a SHARED_DEPENDENCY_ACL entry for the owner is that the object's normal ACL is part of its primary catalog row, so it goes away automatically if the object is dropped. But obviously that's not true for a pg_init_privs entry. I can see two routes to a solution: 1. Create SHARED_DEPENDENCY_INITACL, if applicable, whether the role is the object's owner or not. Then, clearing out the pg_shdepend entry cues us to go delete the pg_init_privs entry. 2. Just always search pg_init_privs for relevant entries when dropping an object. I don't especially like #2 on performance grounds, but it has a lot fewer moving parts than #1. In particular, there's some handwaving in changeDependencyOnOwner() about why we should drop SHARED_DEPENDENCY_ACL when changing owner, and I've not wrapped my head around how those concerns map to INITACL if we treat it in this different way. Another point: shdepReassignOwned explicitly does not touch grants or default ACLs. It feels like the same should be true of pg_init_privs entries, or at least if not, why not? 
In that case there's nothing to be done in shdepReassignOwned (although maybe its comments should be adjusted to mention this explicitly). The bug is just that DROP OWNED isn't getting rid of the entries because there's no INITACL entry to cue it to do so. Another thing I'm wondering about right now is privileges on global objects (roles, databases, tablespaces). The fine manual says "Although an extension script is not prohibited from creating such objects, if it does so they will not be tracked as part of the extension". Presumably, that also means that no pg_init_privs entries are made; but do we do that correctly? Anyway, -ENOCAFFEINE for the moment. I'll look more later. regards, tom lane
Re: PG catalog
"Karki, Sanjay" writes: > I need to grant select on privilege in pg_catalog to user so I can connect > via Toad Data point , Why do you think you need to do that? Most catalogs have public select privilege already, and for the ones that don't, there are very good reasons why not. I don't know what "Toad Data point" is, but if it thinks it needs more privilege than is normally granted, you should be asking very pointed questions about why and why that shouldn't be considered a security breach. (Usually we get complaints that the default permissions on the catalogs are too loose, not too tight.) regards, tom lane
Re: struct RelOptInfo member relid comments
Ashutosh Bapat writes: > OJ is an outer join, AFAIU. OJ's have their own relids. If you are > wondering why not all joins - I think inner joins need not be tracked as > separated relations in parse tree, but OJ's need to be. An outer join is necessarily associated with explicit JOIN syntax in the FROM clause, and each such JOIN has its own rangetable entry and hence a relid. Inner joins might arise from comma syntax (that is, "SELECT ... FROM tab1, tab2"). For perhaps-historical reasons that syntax doesn't give rise to an explicit RTE_JOIN rangetable entry, so the implied join has no relid. regards, tom lane
Re: struct RelOptInfo member relid comments
jian he writes: > imho, the above comment is not very helpful. > we should say more about what kind of information relid says about a base rel? "Relid" is defined at the very top of the file: /* * Relids * Set of relation identifiers (indexes into the rangetable). */ typedef Bitmapset *Relids; Repeating that everyplace the term "relid" appears would not be tremendously helpful. regards, tom lane
Re: First draft of PG 17 release notes
Bruce Momjian writes: > On Thu, May 23, 2024 at 02:27:07PM +1200, David Rowley wrote: >> I also don't agree these should be left to "Source code" section. I >> feel that section is best suited for extension authors who might care >> about some internal API change. I'm talking of stuff that makes some >> user queries possibly N times (!) faster. Surely "Source Code" isn't >> the place to talk about that? > I said a new section "after our 'Source code' section," not in the > source code section. Surely, if we make this a separate section, it would come before 'Source code'? I am not sure, Bruce, that you realize that your disregard for performance improvements is shared by nobody. Arguably, performance is 90% of what we do these days, and it's also 90% of what users care about. regards, tom lane
Re: DROP OWNED BY fails to clean out pg_init_privs grants
Hannu Krosing writes: > While the 'DROP OWNED BY fails to clean out pg_init_privs grants' > issue is now fixed, we have a similar issue with REASSIGN OWNED BY that > is still there: Ugh, how embarrassing. I'll take a look tomorrow, if no one beats me to it. regards, tom lane
Re: Add new protocol message to change GUCs for usage with future protocol-only GUCs
Robert Haas writes: > I think part of the reason we ended up with the protocol parameters = > GUCs thing is because you seemed to be concurring with that approach > upthread. I think it was Jelte's idea originally, but I interpreted > some of your earlier remarks to be supporting it. I'm not sure whether > you've revised your opinion, or just refined it, or whether we > misinterpreted your earlier remarks. I don't recall exactly what I thought earlier, but now I think we'd be better off with separate infrastructure. guc.c is unduly complex already. Perhaps there are bits of it that could be factored out and shared, but I bet not a lot. regards, tom lane
Re: Add new protocol message to change GUCs for usage with future protocol-only GUCs
Jacob Champion writes: > Would it be good to expand on that idea of criticality? IIRC one of > Jelte's complaints earlier was that middleware has to know all the > extension types anyway, to be able to figure out whether it has to do > something about them or not. HTTP has the concept of hop-by-hop vs > end-to-end headers for related reasons. Yeah, perhaps. We'd need to figure out just which classes we need to divide protocol parameters into, and then think about a way for code to understand which class a parameter falls into even when it doesn't specifically know that parameter. That seems possible though. PNG did it with spelling rules for the chunk labels. Here, since we don't yet have any existing _pq_.xxx parameter names, we could maybe say that the names shall follow a pattern like "_pq_.class.param". (That works only if the classes are non-overlapping, an assumption not yet justified by evidence; but we could do something more complicated if we have to.) regards, tom lane
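Tom's suggested "_pq_.class.param" pattern could be implemented on the middleware side with something like the sketch below. This is purely illustrative: PostgreSQL defines no such naming convention today, and the helper name and example class names are assumptions, not anything from the wire protocol.

```python
def classify_protocol_param(name):
    """Split a hypothetical "_pq_.class.param" protocol-parameter name.

    Returns (param_class, param) for names that follow the sketched
    convention, or None otherwise. A proxy could then decide how to
    treat an unrecognized parameter purely from its class, without
    knowing the specific parameter.
    """
    parts = name.split(".")
    if len(parts) != 3 or parts[0] != "_pq_":
        return None
    return parts[1], parts[2]
```

A middleware that understood, say, a hypothetical "hop" class could strip those parameters while passing everything else through untouched.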
Re: Add new protocol message to change GUCs for usage with future protocol-only GUCs
Jelte Fennema-Nio writes: > On Fri, 17 May 2024 at 21:24, Robert Haas wrote: >> Perhaps these are all minority positions, but I can't tell you what >> everyone thinks, only what I think. > I'd love to hear some opinions from others on these design choices. So > far it seems like we're the only two that have opinions on this (which > seems hard to believe) and our opinions are clearly conflicting. And > above all I'd like to move forward with this, be it my way or yours > (although I'd prefer my way of course ;) ) I got around to looking through this thread in preparation for next week's patch review session. I have a couple of opinions to offer: 1. Protocol versions suck. Bumping them is seldom a good answer, and should never be done if you have any finer-grained negotiation mechanism available. My aversion to this is over thirty years old: I learned that lesson from watching the GIF87-to-GIF89 transition mess. Authors of GIF-writing tools tended to take the easy way out and write "GIF89" in the header whether they were actually using any of the new version's features or not. This led to an awful lot of pictures that couldn't be read by available GIF-displaying tools, for no good reason whatsoever. The PNG committee, a couple years later, reacted to that mess by designing PNG to have no version number whatsoever, and yet be extensible in a fine-grained way. (Basically, a PNG file is made up of labeled chunks. If a reader doesn't recognize a particular chunk code, it can still tell whether the chunk is "critical" or not, and thereby decide if it must give up or can proceed while ignoring that chunk.) So overall, I have a strong preference for using the _pq_.xxx mechanism instead of a protocol version bump. I do not believe the latter has any advantage. 2. I share Robert's suspicion of equating protocol parameters with GUCs. The GUC mechanism is quite opinionated and already serves multiple masters. 
In particular, the fact that GUC settings are normally transactional does not play nice with the way protocol parameters need to behave. Yeah, no doubt you could add another dollop of complexity to guc.c to make parameters work differently from other GUCs, but I think it's the wrong design direction. We should handle protocol parameters with a separate mechanism. It's not, for instance, clear to me that protocol parameters should be exposed at the SQL level at all; but if we don't feel they need to be available via SHOW and pg_settings, what benefit is guc.c really bringing to the table? regards, tom lane
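The PNG chunk rule mentioned above is worth spelling out, since it's the model being proposed: the case of the first letter of the four-character chunk type (bit 5 of the first byte) marks a chunk as critical or ancillary, so a decoder can make the skip-or-fail decision for a chunk it has never heard of, with no version number anywhere. A minimal sketch:

```python
def is_critical_chunk(chunk_type):
    """Per the PNG specification, an uppercase first letter means the
    chunk is critical: a decoder that doesn't recognize it must give
    up rather than silently ignore it. A lowercase first letter marks
    an ancillary chunk that can be safely skipped."""
    return chunk_type[0].isupper()

# IHDR (image header) is critical; tEXt (textual metadata) is ancillary.
```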
Re: about cross-compiling issue
"=?utf-8?B?6ZmI5Lqa5p2w?=" <1441147...@qq.com> writes: > Hello, I have a question about cross-compiling. I get an error when doing > initdb for postgresql for arm64 architecture devices. > The error information is Error relocating > /data/postgresql/postgresql-16.3-arm64-v8a-build/tmp_install/usr/postgresql/arm64-v8a/lib/dict_snowball.so: > palloc0: symbol not found. We don't really support cross-compiling, because there are too many things that the configure script can't check for if it can't run a test program. In this particular case I think what is biting you is that configure won't add -Wl,--export-dynamic to the backend link switches. You might think that that shouldn't require a test program to verify, but c-compiler.m4 says differently: # Given a string, check if the compiler supports the string as a # command-line option. If it does, add to the given variable. # For reasons you'd really rather not know about, this checks whether # you can link to a particular function, not just whether you can link. # In fact, we must actually check that the resulting program runs :-( This check dates to 2008, and maybe it's no longer necessary on any modern system, but I'm unexcited about trying to find out. regards, tom lane
Re: Sort operation displays more tuples than it contains its subnode
"a.rybakina" writes: > I faced the issue, when the sorting node in the actual information > shows a larger number of tuples than it actually is. And I can not > understand why? If I'm reading this correctly, the sort node you're worrying about feeds the inner side of a merge join. Merge join will rewind its inner side to the start of the current group of equal-keyed tuples whenever it sees that the next outer tuple must also be joined to that group. Since what EXPLAIN is counting is the number of tuples returned from the node, that causes it to double-count those tuples. The more duplicate-keyed tuples on the outer side, the bigger the effect. You can see the same thing happening at the Materialize a little further up, which is feeding the inside of the other merge join. regards, tom lane
Re: Schema variables - new implementation for Postgres 15
Peter Eisentraut writes: > On 18.05.24 13:29, Alvaro Herrera wrote: >> I want to note that when we discussed this patch series at the dev >> meeting in FOSDEM, a sort-of conclusion was reached that we didn't want >> schema variables at all because of the fact that creating a variable >> would potentially change the meaning of queries by shadowing table >> columns. But this turns out to be incorrect: it's _variables_ that are >> shadowed by table columns, not the other way around. > But that's still bad, because seemingly unrelated schema changes can > make variables appear and disappear. For example, if you have > SELECT a, b FROM table1 > and then you drop column b, maybe the above query continues to work > because there is also a variable b. Yeah, that seems pretty dangerous. Could we make it safe enough by requiring some qualification on variable names? That is, if you mean b to be a variable, then you must write something like SELECT a, pg_variables.b FROM table1 This is still ambiguous if you use "pg_variables" as a table alias in the query, but the alias would win so the query still means what it meant before. Also, table aliases (as opposed to actual table names) don't change readily, so I don't think there's much risk of the query suddenly meaning something different than it did yesterday. regards, tom lane
Re: Possible Bug in relation_open
Robert Haas writes: > On Tue, May 21, 2024 at 9:58 AM Pradeep Kumar > wrote: >> If the user tries to open the relation in RangeVar and NoLock mode calling >> table_openrv(relation, NoLock), it will internally call >> relation_openrv()-->relation_open(). In relation_open() we checking the >> Assert(lockmode >= NoLock && lockmode < MAX_LOCKMODES); , here we expecting >> the lockmode is NoLock or greater than that, but in same function again we >> checking this assert case Assert(lockmode != NoLock || >> IsBootstrapProcessingMode() || CheckRelationLockedByMe(r, AccessShareLock, >> true)); , here we are expecting (lockmode != NoLock) , so why are there two >> cases that contradict? and What if the user tries to open the relation in >> NoLock mode? and that will definitely cause the assert failure, Suppose the >> user who writes some extension and reads some relation oid that is constant, >> and wants to acquire NoLock?, need some clarification on this. > You need to acquire a lock. Otherwise, the relcache entry could change > underneath you while you're accessing it, which would result in > PostgreSQL crashing. To clarify: the rule is that it's only allowed to pass NoLock if you know for certain that some suitable lock on that relation is already held by the current query. That's why these conditions are complicated. regards, tom lane
Re: typedef extraction - back to the future
Andrew Dunstan writes: > Attached is an attempt to thread this needle. The core is a new perl > module that imports the current buildfarm client logic. The intention is > that once we have this, the buildfarm client will switch to using the > module (if found) rather than its own built-in logic. There is precedent > for this sort of arrangement (AdjustUpgrade.pm). Accompanying the new > module is a standalone perl script that uses the new module, and > replaces the current shell script (thus making it more portable). Haven't read the code in detail, but +1 for concept. A couple of minor quibbles: * Why not call the wrapper script "find_typedefs"? Without the "s" it seems rather confusing --- "which typedef is this supposed to find, exactly?" * The header comment for sub typedefs seems to have adequate detail about what the arguments are, but that all ought to be propagated into the --help output for the wrapper script. Right now you couldn't figure out how to use the wrapper without reading the underlying module. regards, tom lane
Re: Reading timestamp values from Datums gives garbage values
Tomas Vondra writes: > On 5/20/24 16:37, Sushrut Shivaswamy wrote: >> When trying to read the query response from the Datum, I get garbage values. >> I've tried various types and none of them read the correct value. > TimestampTz is int64, so using DatumGetInt64 is probably the simplest > solution. And it's the number of microseconds, so X/1e6 should give you > the epoch. Don't forget that TimestampTz uses an epoch (time zero) of 2000-01-01. If you want a Unix-convention value where the epoch is 1970-01-01, you'll need to add 30 years to the result. The reported values seem pretty substantially off, though --- 5293917674 would be barely an hour and a half later than the epoch, which seems unlikely to be the value Sushrut intended to test with. I suspect a mistake that's outside the fragment of code we were shown. regards, tom lane
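The epoch arithmetic here can be sketched as follows; the offset between the two epochs is 946,684,800 seconds (30 years including 7 leap days). The function name is ours for illustration, not a libpq or server API:

```python
from datetime import datetime, timezone

# Seconds from the Unix epoch (1970-01-01) to the PostgreSQL epoch
# (2000-01-01): 10957 days * 86400 seconds.
PG_EPOCH_OFFSET_SECS = 946_684_800

def pg_timestamptz_to_unix(pg_usecs):
    """Convert a TimestampTz Datum value (microseconds since
    2000-01-01 UTC) to Unix-epoch seconds."""
    return pg_usecs / 1_000_000 + PG_EPOCH_OFFSET_SECS

# Sanity-check the constant against datetime arithmetic.
delta = (datetime(2000, 1, 1, tzinfo=timezone.utc)
         - datetime(1970, 1, 1, tzinfo=timezone.utc))
assert delta.total_seconds() == PG_EPOCH_OFFSET_SECS

# The value from the report, 5293917674 microseconds, lands about
# 88 minutes after 2000-01-01 -- the "barely an hour and a half"
# mentioned above.
```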
Re: Convert node test compile-time settings into run-time parameters
Peter Eisentraut writes: > This patch converts the compile-time settings > COPY_PARSE_PLAN_TREES > WRITE_READ_PARSE_PLAN_TREES > RAW_EXPRESSION_COVERAGE_TEST > into run-time parameters > debug_copy_parse_plan_trees > debug_write_read_parse_plan_trees > debug_raw_expression_coverage_test I'm kind of down on this. It seems like forcing a bunch of useless-in-production debug support into the standard build. What of this would be of any use to any non-developer? regards, tom lane
Re: Underscore in positional parameters?
Erik Wienhold writes: > On 2024-05-20 03:26 +0200, jian he wrote: >> /* Check parameter number is in range */ >> if (paramno <= 0 || paramno > MaxAllocSize / sizeof(Oid)) >> ereport(ERROR, ... > Yes, it makes sense to show the upper bound. How about a hint such as > "Valid parameters range from $%d to $%d."? I kind of feel like this upper bound is ridiculous. In what scenario is parameter 25000 not a mistake, if not indeed somebody trying to break the system? The "Bind" protocol message only allows an int16 parameter count, so rejecting parameter numbers above 32K would make sense to me. regards, tom lane
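For a sense of scale, here are the two bounds under discussion. The constants are taken from PostgreSQL's source (MaxAllocSize is 0x3FFFFFFF and Oid is 4 bytes); the variable names are illustrative:

```python
MAX_ALLOC_SIZE = 0x3FFFFFFF   # PostgreSQL's MaxAllocSize: 1 GB minus 1 byte
SIZEOF_OID = 4                # Oid is a 32-bit unsigned integer
INT16_MAX = 32767             # Bind carries the parameter count as an int16

current_limit = MAX_ALLOC_SIZE // SIZEOF_OID   # what the check accepts today
proposed_limit = INT16_MAX                     # what Bind could ever reference

# The current check admits parameter numbers roughly four orders of
# magnitude beyond anything the protocol's Bind message can express.
```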
Re: commitfest.postgresql.org is no longer fit for purpose
Bruce Momjian writes: > On Sun, May 19, 2024 at 03:18:11PM -0400, Tom Lane wrote: >> * Another reason for things sitting a long time is that they're too >> big to review without an unreasonable amount of effort. We should >> encourage authors to break large patches into smaller stepwise >> refinements. It may seem like that will result in taking forever >> to reach the end goal, but dropping a huge patchset on the community >> isn't going to give speedy results either. > I think it is sometimes hard to incrementally apply patches if the > long-term goal isn't agreed or known to be achievable. True. The value of the earlier patches in the series can be unclear if you don't understand what the end goal is. But I think people could post a "road map" of how they intend a patch series to go. Another way of looking at this is that sometimes people do post large chunks of work in long many-patch sets, but we tend to look at the whole series as something to review and commit as one (or I do, at least). We should be more willing to bite off and push the earlier patches in such a series even when the later ones aren't entirely done. (The cfbot tends to discourage this, since as soon as one of the patches is committed it no longer knows how to apply the rest. Can we improve on that tooling somehow?) regards, tom lane
Re: commitfest.postgresql.org is no longer fit for purpose
Dmitry Dolgov <9erthali...@gmail.com> writes: > There are lots of good takes on this in the thread. It also makes clear what's > at stake -- as Melanie pointed out with the patch about EXPLAIN for parallel > bitmap heap scan, we're losing potential contributors for no reason. But I'm > a bit concerned about what are the next steps: if memory serves, every couple > of years there is a discussion about everything that goes wrong with the > review > process, commitfests, etc. Yet to my (admittedly limited) insight into the > community, not many things have changed due to those discussions. How do we > make sure this time it will be different? Things *have* changed, if you take the long view. We didn't have commitfests at all until around 2007, and we've changed the ground rules for them a couple of times since then. We didn't have the CF app at all until, well, I don't recall when, but the first few CFs were managed by keeping patch lists on a wiki page. It's not that people are unwilling to change this stuff, but that it's hard to identify what will make things better. IMV one really fundamental problem is the same as it's been for a couple of decades: too many patch submissions, too few committers. We can't fix that by just appointing a ton more committers, at least not if we want to keep the project's quality up. We have to grow qualified committers. IIRC, one of the main reasons for instituting the commitfests at all was the hope that if we got more people to spend time reading the code and reviewing patches, some of them would learn enough to become good committers. I think that's worked, again taking a long view. I just did some quick statistics on the commit history, and I see that we were hovering at somewhere around ten active committers from 1999 to 2012, but since then it's slowly crept up to about two dozen today. (I'm counting "active" as "at least 10 commits per year", which is an arbitrary cutoff --- feel free to slice the data for yourself.)
Meanwhile the number of submissions has also grown, so I'm not sure how much better the load ratio is. My point here is not that things are great, but just that we are indeed improving, and I hope we can continue to. Let's not be defeatist about it. I think this thread has already identified a few things we have consensus to improve in the CF app, and I hope somebody goes off and makes those happen (I lack the web skillz to help myself). However, the app itself is just a tool; what counts more is our process around it. I have a couple of thoughts about that: * Patches that sit in the queue a long time tend to be ones that lack consensus, either about the goal or the details of how to achieve it. Sometimes "lacks consensus" really means "nobody but the author thinks this is a good idea, but we haven't mustered the will to say no". But I think it's more usually the case that there are plausible competing opinions about what the patch should do or how it should do it. How can we resolve such differences and get something done? * Another reason for things sitting a long time is that they're too big to review without an unreasonable amount of effort. We should encourage authors to break large patches into smaller stepwise refinements. It may seem like that will result in taking forever to reach the end goal, but dropping a huge patchset on the community isn't going to give speedy results either. * Before starting this thread, Robert did a lot of very valuable review of some individual patches. I think what prompted him to start the thread was the realization that he'd only made a small dent in the problem. Maybe we could divide and conquer: get a dozen-or-so senior contributors to split up the list of pending patches and then look at each one with an eye to what needs to happen to move it along (*not* to commit it right away, although in some cases maybe that's the thing to do). 
It'd be great if that could happen just before each commitfest, but that's probably not practical with the current patch volume. What I'm thinking for the moment is to try to make that happen once a year or so. > What I think wasn't discussed yet in detail is the question of motivation. > Surely, it would be great to have a process that will introduce as little burden > as possible. But giving more motivation to follow the process / use the tool is > as important. What motivates folks to review patches, figure out status of a > complicated patch thread, maintain a list of open items, etc? Yeah, all this stuff ultimately gets done "for the good of the project", which isn't the most reliable motivation perhaps. I don't see a better one... regards, tom lane
State of pg_createsubscriber
I'm fairly disturbed about the readiness of pg_createsubscriber. The 040_pg_createsubscriber.pl TAP test is moderately unstable in the buildfarm [1], and there are various unaddressed complaints at the end of the patch thread (pretty much everything below [2]). I guess this is good enough to start beta with, but it's far from being good enough to ship, IMO. If there were active work going on to fix these things, I'd feel better, but neither the C code nor the test script have been touched since 1 April. regards, tom lane [1] https://buildfarm.postgresql.org/cgi-bin/show_failures.pl?max_days=30=HEAD==pg_basebackupCheck=Submit [2] https://www.postgresql.org/message-id/flat/3fa9ef0f-b277-4c13-850a-8ccc04de1406%40eisentraut.org#152dacecfc8f0cf08cbd8ecb79d6d38f
Re: [PATCH] Avoid mixing custom and OpenSSL BIO functions
Daniel Gustafsson writes: > On 19 Apr 2024, at 15:13, David Benjamin wrote: >> Circling back here, anything else needed from my end on this patch? > Patience mostly, v17 very recently entered feature freeze so a lot of the > developers are busy finding and fixing bugs to stabilize the > release. Per the cfbot [1], this patch needs a rebase over the ALPN-related commits. It still isn't likely to get human attention before the July commitfest, but you can save some time by making sure it's in a reviewable state before that. regards, tom lane [1] http://commitfest.cputube.org/david-benjamin.html
Re: generic plans and "initial" pruning
David Rowley writes: > With the caveat of not yet having looked at the latest patch, my > thoughts are that having the executor startup responsible for taking > locks is a bad idea and I don't think we should go down this path. OK, it's certainly still up for argument, but ... > 1. No ability to control the order that the locks are obtained. The > order in which the locks are taken will be at the mercy of the plan > the planner chooses. I do not think I buy this argument, because plancache.c doesn't provide any "ability to control the order" today, and never has. The order in which AcquireExecutorLocks re-gets relation locks is only weakly related to the order in which the parser/planner got them originally. The order in which AcquirePlannerLocks re-gets the locks is even less related to the original. This doesn't cause any big problems that I'm aware of, because these locks are fairly weak. I think we do have a guarantee that for partitioned tables, parents will be locked before children, and that's probably valuable. But an executor-driven lock order could preserve that property too. > 2. It introduces lots of complexity regarding how to cleanly clean up > after a failed executor startup which is likely to make exec startup > slower and the code more complex Perhaps true, I'm not sure. But the patch we'd been discussing before this proposal was darn complex as well. > 3. It puts us even further down the path of actually needing an > executor startup phase. Huh? We have such a thing already. > For #1, the locks taken for SELECT queries are less likely to conflict > with other locks obtained by PostgreSQL, but at least at the moment if > someone is getting deadlocks with a DDL type operation, they can > change their query or DDL script so that locks are taken in the same > order. If we allowed executor startup to do this then if someone > comes complaining that PG18 deadlocks when PG17 didn't we'd just have > to tell them to live with it. 
There's a comment at the bottom of > find_inheritance_children_extended() just above the qsort() which > explains about the deadlocking issue. The reason it's important there is that function is (sometimes) used for lock modes that *are* exclusive. > For #3, I've been thinking about what improvements we can do to make > the executor more efficient. In [1], Andres talks about some very > interesting things. In particular, in his email items 3) and 5) are > relevant here. If we did move lots of executor startup code into the > planner, I think it would be possible to one day get rid of executor > startup and have the plan record how much memory is needed for the > non-readonly part of the executor state and tag each plan node with > the offset in bytes they should use for their portion of the executor > working state. I'm fairly skeptical about that idea. The entire reason we have an issue here is that we want to do runtime partition pruning, which by definition can't be done at plan time. So I doubt it's going to play nice with what we are trying to accomplish in this thread. Moreover, while "replace a bunch of small pallocs with one big one" would save some palloc effort, what are you going to do to ensure that that memory has the right initial contents? I think this idea is likely to make the executor a great deal more notationally complex without actually buying all that much. Maybe Andres can make it work, but I don't want to contort other parts of the system design on the purely hypothetical basis that this might happen. > I think what Amit had before your objection was starting to turn into > something workable and we should switch back to working on that. The reason I posted this idea was that I didn't think the previously existing patch looked promising at all. regards, tom lane
Re: Requiring LLVM 14+ in PostgreSQL 18
Thomas Munro writes: > Oops, right I didn't know we had that documented. Thanks. Will hold > off doing anything until the thaw. FWIW, I don't think the release freeze precludes docs-only fixes. But if you prefer to sit on this, that's fine too. regards, tom lane
Re: Why is citext/regress failing on hamerkop?
Andrew Dunstan writes: > I've pushed a small change, that should just mark with an asterisk any > gitref that is more than 2 days older than the tip of the branch at the > time of reporting. Thanks! regards, tom lane
Re: Ignore Visual Studio's Temp Files While Working with PG on Windows
=?UTF-8?B?Sm9zZWYgxaBpbcOhbmVr?= writes: > But this is different. If I understand it well, just by following > https://www.postgresql.org/docs/16/install-windows-full.html you'll > get those files no matter what is your specific environment (or > specific set of tools). Hm? Visual Studio seems like quite a specific tool from here. I did some googling around the question of project .gitignore files ignoring .vs/, and was amused to come across this: https://github.com/github/gitignore/blob/main/VisualStudio.gitignore which seems like a mighty fine example of where we *don't* want to go. regards, tom lane
Re: Ignore Visual Studio's Temp Files While Working with PG on Windows
Yasir writes: > We can add it to "~/.config/git/ignore" as it will ignore globally on > windows which we don't want. Also we don't have ".git/info/exclude" in PG > project's so the best place left is projects's .gitignore. That's what was > patched. As Peter said, we're not going to do that. The intention with the project's .gitignore files is to ignore files that are intentionally built by our "make" targets (and, hopefully, will be removed by "make maintainer-clean"). Anything else that you want git to ignore should be in a personal ignore list; especially files that are platform-specific. The fact that it's reasonable to ignore ".vs" files when working with your toolset doesn't mean that it's reasonable to ignore them when working on some other platform. If we used some other policy, we'd have tons of debates about which files were reasonable to exclude. For myself, for example, I exclude "*~" (Emacs backup files) and "*.orig" (patch(1) backup files) but those choices are very much dependent on the set of tools I choose to use. Other developers have other personal exclusion lists. If we tried to make the project's files be the union of all those lists, we'd be at serious risk of ignoring stuff we absolutely shouldn't ignore in some contexts. regards, tom lane
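A personal ignore list of the kind described above lives outside the repository: by default git reads ~/.config/git/ignore (or any file named by the core.excludesFile setting), and entries there apply to every repository you work in without touching the project's .gitignore. The entries below are examples matching the tools mentioned, not a recommendation:

```
# ~/.config/git/ignore -- personal, toolchain-specific excludes,
# kept out of the project's .gitignore

# Visual Studio workspace metadata:
.vs/
# Emacs backup files:
*~
# patch(1) backup files:
*.orig
```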
Re: Speed up clean meson builds by ~25%
Alvaro Herrera writes: > On 2024-May-17, Robert Haas wrote: >> Anyone else want to vote? > I had pretty much the same thought as you. It seems a waste to leave > the code in existing branches be slow only because we have a much better > approach for a branch that doesn't even exist yet. I won't complain too loudly as long as we remember to revert the patch from HEAD once the ecpg fix goes in. regards, tom lane
Re: commitfest.postgresql.org is no longer fit for purpose
Robert Haas writes: > On Fri, May 17, 2024 at 3:51 PM Laurenz Albe wrote: >> I think it is a good rule. I don't think that it should lead to putting >> people on the pillory or kicking their patches, but I imagine that a >> committer >> looking for somebody else's patch to work on could prefer patches by people >> who are doing their share of reviews. > If you give me an automated way to find that out, I'll consider paying > some attention to it. Yeah, I can't imagine that any committer (or reviewer, really) is doing any such thing, because it would take far too much effort to figure out how much work anyone else is doing. I see CFMs reminding everybody that this rule exists, but I don't think they ever try to check it either. It's pretty much the honor system, and I'm sure some folk ignore it. regards, tom lane
Re: Speed up clean meson builds by ~25%
Robert Haas writes: > If this is the consensus opinion, then > https://commitfest.postgresql.org/48/4914/ should be marked Rejected. > However, while I think the improvements that Tom was able to make here > sound fantastic, I don't understand why that's an argument against > Jelte's patches. After all, Tom's work will only go into v18, but this > patch could be adopted in v17 and back-patched to all releases that > support meson builds, saving oodles of compile time for as long as > those releases are supported. The most obvious beneficiary of that > course of action would seem to be Tom himself, since he back-patches > more fixes than anybody, last I checked, but it'd be also be useful to > get slightly quicker results from the buildfarm and slightly quicker > results for anyone using CI on back-branches and for other hackers who > are looking to back-patch bug fixes. I don't quite understand why we > want to throw those potential benefits out the window just because we > have a better fix for the future. As I mentioned upthread, I'm more worried about confusing error reports than the machine time. It would save me personally exactly nada, since (a) I usually develop with gcc not clang, (b) when I do use clang it's not a heavily-affected version, and (c) since we *very* seldom change the grammar in stable branches, ccache will hide the problem pretty effectively anyway in the back branches. (If you're not using ccache, please don't complain about build time.) I grant that there are people who are more affected, but still, I'd just as soon not contort the build rules for a temporary problem. regards, tom lane
Re: Postgres and --config-file option
Robert Haas writes: > If the reason that somebody is upset is because it's not technically > true to say that you *must* do one of those things, we could fix that > with "You must" -> "You can" or with "You must specify" -> "Specify". > The patch you propose is also not terrible or anything, but it goes in > the direction of listing every alternative, which will become > unpalatable as soon as somebody adds one more way to do it, or maybe > it's unpalatable already. I agree that it's not necessary or particularly useful for this hint to be exhaustive. I could get behind your suggestion of s/You must specify/Specify/, but I also think it's fine to do nothing. regards, tom lane
Re: problems with "Shared Memory and Semaphores" section of docs
Nathan Bossart writes: > [ many, many problems in documented formulas ] > At a bare minimum, we should probably fix the obvious problems, but I > wonder if we could simplify this section a bit, too. Yup. "The definition of insanity is doing the same thing over and over and expecting different results." Time to give up on documenting these things in such detail. Anybody who really wants to know can look at the source code. > If the exact values > are important, maybe we could introduce more GUCs like > shared_memory_size_in_huge_pages that can be consulted (instead of > requiring users to break out their calculators). I don't especially like shared_memory_size_in_huge_pages, and I don't want to introduce more of those. GUCs are not the right way to expose values that you can't actually set. (Yeah, I'm guilty of some of the existing ones like that, but it's still not a good thing.) Maybe it's time to introduce a system view for such things? It could be really simple, with name and value, or we could try to steal some additional ideas such as units from pg_settings. regards, tom lane
Re: Why is citext/regress failing on hamerkop?
TAKATSUKA Haruka writes: > I'm a hamerkop maintainer. > Sorry I have missed the scm error for so long. > Today I switched scmrepo from git.postgresql.org/git/postgresql.git > to github.com/postgres/postgres.git and successfully modernized > the build target code. Thanks very much! I see hamerkop has gone green in HEAD. It looks like it succeeded in v13 too but failed in v12, which suggests that the isolationcheck problem is intermittent, which is not too surprising given our current theory about what's causing that. At this point I think we are too close to the 17beta1 release freeze to mess with it, but I'd support pushing Thomas' proposed patch after the freeze is over. regards, tom lane
Re: commitfest.postgresql.org is no longer fit for purpose
"David G. Johnston" writes: > On Friday, May 17, 2024, Joe Conway wrote: >> A solution to both of these issues (yours and mine) would be to allow >> things to be added *during* the CF month. What is the point of having a >> "freeze" before every CF anyway? Especially if they start out clean. If >> something is ready for review on day 8 of the CF, why not let it be added >> for review? > In conjunction with WIP removing this limitation on the bimonthlies makes > sense to me. I think that the original motivation for this was two-fold: 1. A notion of fairness, that you shouldn't get your patch reviewed ahead of others that have been waiting (much?) longer. I'm not sure how much this is really worth. In particular, even if we do delay an incoming patch by one CF, it's still going to compete with the older stuff in future CFs. There's already a sort feature in the CF dashboard whereby patches that have lingered for more CFs appear ahead of patches that are newer, so maybe just ensuring that late-arriving patches sort as "been around for 0 CFs" is sufficient. 2. As I mentioned a bit ago, the original idea was that we didn't exit a CF until it was empty of un-handled patches, so obviously allowing new patches to come in would mean we'd never get to empty. That idea didn't work and we don't think that way anymore. So yeah, I'm okay with abandoning the must-submit-to-a-future-CF restriction. regards, tom lane
Re: commitfest.postgresql.org is no longer fit for purpose
Melanie Plageman writes: > So, anyway, I'd argue that we need a parking lot for patches which we > all agree are important and have a path forward but need someone to do > the last 20-80% of the work. To avoid this being a dumping ground, > patches should _only_ be allowed in the parking lot if they have a > clear path forward. Patches which haven't gotten any interest don't go > there. Patches in which the author has clearly not addressed feedback > that is reasonable for them to address don't go there. These are > effectively community TODOs which we agree need to be done -- if only > someone had the time. Hmm. I was envisioning "parking lot" as meaning "this is on my personal TODO list, and I'd like CI support for it, but I'm not expecting anyone else to pay attention to it yet". I think what you are describing is valuable but different. Maybe call it "pending" or such? Or invent a different name for the other thing. regards, tom lane
Re: Why does pgindent's README say to download typedefs.list from the buildfarm?
Peter Eisentraut writes: > On 16.05.24 16:45, Tom Lane wrote: >> Yeah, that was bothering me too, but I went for the minimum delta. >> I did think that a couple of integer macros would be a better idea, >> so +1 for what you did here. > I committed this, and Michael took care of WaitEventExtension, so we > should be all clear here. Thanks. I just made the committed typedefs.list exactly match the current buildfarm output, so we're clean for now. regards, tom lane
Re: commitfest.postgresql.org is no longer fit for purpose
Robert Haas writes: > On Fri, May 17, 2024 at 10:31 AM Tom Lane wrote: >> If we go back to the old its-development-mode-all-the-time approach, >> what is likely to happen is that the commit rate for not-your-own- >> patches goes to zero, because it's always possible to rationalize >> your own stuff as being more important. > We already have gone back to that model. We just haven't admitted it > yet. And we're never going to get out of it until we find a way to get > the contents of the CommitFest application down to a more reasonable > size and level of complexity. There's just no way everyone's up for > that level of pain. I'm not sure I'm not up for that level of pain. Yeah, we clearly need to get the patch list to a point of manageability, but I don't agree that abandoning time-boxed CFs will improve anything. regards, tom lane
Re: commitfest.postgresql.org is no longer fit for purpose
Robert Haas writes: > On Fri, May 17, 2024 at 9:54 AM Joe Conway wrote: >> I agree with you. Before commitfests were a thing, we had no structure >> at all as I recall. The dates for the dev cycle were more fluid as I >> recall, and we had no CF app to track things. Maybe retaining the >> structure but going back to the continuous development would be just the >> thing, with the CF app tracking just the currently (as of this >> week/month/???) active stuff. > The main thing that we'd gain from that is to avoid all the work of > pushing lots of things forward to the next CommitFest at the end of > every CommitFest. To my mind, the point of the time-boxed commitfests is to provide a structure wherein people will (hopefully) pay some actual attention to other peoples' patches. Conversely, the fact that we don't have one running all the time gives committers some defined intervals where they can work on their own stuff without feeling guilty that they're not handling other people's patches. If we go back to the old its-development-mode-all-the-time approach, what is likely to happen is that the commit rate for not-your-own- patches goes to zero, because it's always possible to rationalize your own stuff as being more important. > Like, we could also just have a button that says "move everything > that's left to the next CommitFest". Clearly, CFMs would appreciate some more tooling to make that sort of thing easier. Perhaps we omitted it in the original CF app coding because we expected the end-of-CF backlog to be minimal, but it's now obvious that it generally isn't. BTW, I was reminded while trawling old email yesterday that we used to freeze the content of a CF at its start and then hold the CF open until the backlog actually went to zero, which resulted in multi-month death-march CFs and no clarity at all as to release timing. Let's not go back to that. But the CF app was probably built with that model in mind. regards, tom lane
Re: Why does pgindent's README say to download typedefs.list from the buildfarm?
Michael Paquier writes: > On Thu, May 16, 2024 at 10:45:18AM -0400, Tom Lane wrote: >> ... I think the enum should be nuked altogether, but >> it's a bit late to be redesigning that for v17 perhaps. > You're right, WaitEventExtension is better gone. The only thing that > matters is that we want to start computing the IDs assigned to the > custom wait events for extensions with a number higher than the > existing WAIT_EXTENSION to avoid interferences in pg_stat_activity, so > this could be cleaned up as the attached. WFM, and this is probably a place where we don't want to change the API in v17 and again in v18, so I agree with pushing now. Reminder though: beta1 release freeze begins Saturday. Not many hours left. regards, tom lane
Re: commitfest.postgresql.org is no longer fit for purpose
Peter Eisentraut writes: > On 16.05.24 23:46, Tom Lane wrote: >> Right, so what can we do about that? Does Needs Review state need to >> be subdivided, and if so how? > Maybe a new state "Unclear". ... > I think, if we consider the core mission of the commitfest app, we need > to be more protective of the Needs Review state. Yeah, makes sense. > So a third status that encompasses the various other situations like > maybe forgotten by author, disagreements between author and reviewer, > process difficulties, needs some senior developer intervention, etc. > could be helpful. Hmm, "forgotten by author" seems to generally turn into "this has been in WOA state a long time". Not sure we have a problem representing that, only with a process for eventually retiring such entries. Your other three examples all sound like "needs senior developer attention", which could be a helpful state that's distinct from "ready for committer". It's definitely not the same as "Unclear". regards, tom lane
Re: commitfest.postgresql.org is no longer fit for purpose
Daniel Gustafsson writes: > Pulling in the information from the CFBot regarding test failures and whether > the patch applies at all, and automatically altering the state to WOA for at > least the latter category would be nice. +1. There are enough intermittent test failures that I don't think changing state for that would be good, but patch apply failure is usually persistent. I gather that the CFBot is still a kluge that is web-scraping the CF app rather than being actually integrated with it, and that there are plans to make that better that haven't been moving fast. Probably that would have to happen first, but there would be a lot of benefit from having the info flow be two-way. regards, tom lane
Re: commitfest.postgresql.org is no longer fit for purpose
Peter Eisentraut writes: > The problem is if we have 180 patches in Needs Review, and only 20 are > really actually ready to be reviewed. And a second-order problem is > that if you already know that this will be the case, you give up before > even looking. Right, so what can we do about that? Does Needs Review state need to be subdivided, and if so how? If it's just that a patch should be in some other state altogether, we should simply encourage people to change the state as soon as they discover that. I think the problem is not so much "90% are in the wrong state" as "each potential reviewer has to rediscover that". At this point it seems like there's consensus to have a "parking" section of the CF app, separate from the time-boxed CFs, and I hope somebody will go make that happen. But I don't think that's our only issue, so we need to keep thinking about what should be improved. regards, tom lane
Re: Why is citext/regress failing on hamerkop?
Andrew Dunstan writes: > On 2024-05-16 Th 17:15, Tom Lane wrote: >> I'd rather have some visible status on the BF dashboard. Invariably, >> with a problem like this, the animal's owner is unaware there's a >> problem. If it's just silently not reporting, then no one else will >> notice either, and we effectively lose an animal (despite it still >> burning electricity to perform those rejected runs). > Fair enough. That will mean some database changes and other stuff, so it > will take a bit longer. Sure, I don't think it's urgent. regards, tom lane
Re: Why is citext/regress failing on hamerkop?
Andrew Dunstan writes: > On 2024-05-16 Th 16:18, Tom Lane wrote: >> Andrew: maybe the buildfarm server could be made to flag >> animals building exceedingly old commits? This is the second >> problem of this sort that I've noticed this month, and you >> really have to look closely to realize it's happening. > Yeah, that should be doable. Since we have the git ref these days we > should be able to mark it as old, or maybe just reject builds for very > old commits (the latter would be easier). I'd rather have some visible status on the BF dashboard. Invariably, with a problem like this, the animal's owner is unaware there's a problem. If it's just silently not reporting, then no one else will notice either, and we effectively lose an animal (despite it still burning electricity to perform those rejected runs). regards, tom lane
Re: commitfest.postgresql.org is no longer fit for purpose
Jacob Champion writes: > On Thu, May 16, 2024 at 1:31 PM Joe Conway wrote: >> Maybe we should just make it a policy that *nothing* gets moved forward >> from commitfest-to-commitfest and therefore the author needs to care >> enough to register for the next one? > I think that's going to severely disadvantage anyone who doesn't do > this as their day job. Maybe I'm bristling a bit too much at the > wording, but not having time to shepherd a patch is not the same as > not caring. Also, I doubt that there are all that many patches that have simply been abandoned by their authors. Our problem is the same as it's been for many years: not enough review person-power, rather than not enough patches. So I think authors would just jump through that hoop, or enough of them would that it wouldn't improve matters. regards, tom lane
Re: commitfest.postgresql.org is no longer fit for purpose
Jacob Champion writes: > ... Similar to how people currently use the > Reviewer field as a personal TODO list... it might be nice to > officially separate the ideas a bit. Oh, that's an independent pet peeve of mine. Usually, if I'm looking over the CF list for a patch to review, I skip over ones that already show an assigned reviewer, because I don't want to step on that person's toes. But it seems very common to put one's name down for review without any immediate intention of doing work. Or to do a review and wander off, leaving the patch apparently being tended to but not really. (And I confess I'm occasionally guilty of both things myself.) I think it'd be great if we could separate "I'm actively reviewing this" from "I'm interested in this". As a bonus, adding yourself to the "interested" list would be a fine proxy for the thumbs-up or star markers mentioned upthread. If those were separate columns, we could implement some sort of aging scheme whereby somebody who'd not commented for (say) a week or two would get quasi-automatically moved from the "active reviewer" column to the "interested" column, whereupon it wouldn't be impolite for someone else to sign up for active review. regards, tom lane
Re: commitfest.postgresql.org is no longer fit for purpose
Melanie Plageman writes: > I was reflecting on why a general purpose patch tracker sounded > appealing to me, and I realized that, at least at this time of year, I > have a few patches that really count as "waiting on author" that I > know I need to do additional work on before they need more review but > which aren't currently my top priority. I should probably simply > withdraw and re-register them. My justification was that I'll lose > them if I don't keep them in the commitfest app. But, I could just, > you know, save them somewhere myself instead of polluting the > commitfest app with them. I don't know if others are in this > situation. Anyway, I'm definitely currently guilty of parking. It's also nice that the CF app will run CI for you, so at least you can keep the patch building if you're so inclined. David J. had a suggestion for this too upthread, which was to create a separate slot for WIP patches that isn't one of the dated CF slots. It's hard to argue that such patches need to be in "the CF app" at all, if you're not actively looking for review. But the CI runs and the handy per-author patch status list make the app very tempting infrastructure for parked patches. Maybe we could have a not-the-CF app that offers those amenities? regards, tom lane
Re: GUC names in messages
Daniel Gustafsson writes: > On 16 May 2024, at 13:35, Peter Eisentraut wrote: >> - errmsg("WAL generated with full_page_writes=off was replayed " >> + errmsg("WAL generated with \"full_page_writes=off\" was replayed " > I'm not a fan of this syntax, but I at the same time can't offer a better idea > so this isn't an objection but a hope that it can be made even better during > the v18 cycle. Yeah ... formally correct would be something like errmsg("WAL generated with \"full_page_writes\"=\"off\" was replayed " but that's a bit much for my taste. regards, tom lane
Re: Why is citext/regress failing on hamerkop?
Thomas Munro writes: > For citext_utf8, I pushed cff4e5a3. Hamerkop runs infrequently, so > here's hoping for 100% green on master by Tuesday or so. Meanwhile, back at the ranch, it doesn't seem that changed anything: https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hamerkop&dt=2024-05-16%2011%3A00%3A32 ... and now that I look more closely, the reason why it didn't change anything is that hamerkop is still building 0294df2 on HEAD. All its other branches are equally stuck at the end of March. So this is a flat-out-broken animal, and I plan to just ignore it until its owner un-sticks it. (In particular, I think we shouldn't be in a hurry to push the patch discussed downthread.) Andrew: maybe the buildfarm server could be made to flag animals building exceedingly old commits? This is the second problem of this sort that I've noticed this month, and you really have to look closely to realize it's happening. regards, tom lane
Re: commitfest.postgresql.org is no longer fit for purpose
Daniel Gustafsson writes: >> On 16 May 2024, at 20:30, Robert Haas wrote: >> The original intent of CommitFests, and of commitfest.postgresql.org >> by extension, was to provide a place where patches could be registered >> to indicate that they needed to be reviewed, thus enabling patch >> authors and patch reviewers to find each other in a reasonably >> efficient way. I don't think it's working any more. > But which part is broken though, the app, our commitfest process and workflow > and its intent, or our assumption that we follow said process and workflow > which may or may not be backed by evidence? IMHO, from being CFM many times, > there is a fair bit of the latter, which exacerbates the problem. This is > harder to fix with more or better software though. Yeah. I think that Robert put his finger on a big part of the problem, which is that punting a patch to the next CF is a lot easier than rejecting it, particularly for less-senior CFMs who may not feel they have the authority to say no (or at least doubt that the patch author would accept it). It's hard even for senior people to get patch authors to take no for an answer --- I know I've had little luck at it --- so maybe that problem is inherent. But a CF app full of patches that are unlikely ever to go anywhere isn't helpful. It's also true that some of us are abusing the process a bit. I know I frequently stick things into the CF app even if I intend to commit them pretty darn soon, because it's a near-zero-friction way to run CI on them, and I'm too lazy to learn how to do that otherwise. I like David's suggestion of a "Pending Commit" status, or maybe I should just put such patches into RfC state immediately? However, short-lived entries like that don't seem like they're a big problem beyond possibly skewing the CF statistics a bit. It's the stuff that keeps hanging around that seems like the core of the issue. >> I spent a good deal of time going through the CommitFest this week > And you deserve a big Thank You for that. + many regards, tom lane
Re: race condition when writing pg_control
Andres Freund writes: > On 2024-05-16 14:50:50 -0400, Tom Lane wrote: >> The intention was certainly always that it be atomic. If it isn't >> we have got *big* trouble. > We unfortunately do *know* that on several systems e.g. basebackup can read a > partially written control file, while the control file is being > updated. Yeah, but can't we just retry that if we get a bad checksum? What had better be atomic is the write to disk. Systems that can't manage POSIX semantics for concurrent reads and writes are annoying, but not fatal ... regards, tom lane
Re: race condition when writing pg_control
Nathan Bossart writes: > I suspect it will be difficult to investigate this one too much further > unless we can track down a copy of the control file with the bad checksum. > Other than searching for any new code that isn't doing the appropriate > locking, maybe we could search the buildfarm for any other occurrences. I > also seem some threads concerning whether the way we are reading/writing > the control file is atomic. The intention was certainly always that it be atomic. If it isn't we have got *big* trouble. regards, tom lane
Re: [UNVERIFIED SENDER] pg_upgrade can result in early wraparound on databases with high transaction load
Daniel Gustafsson writes: >> On 5 Jul 2022, at 18:59, Tom Lane wrote: >> Given the lack of field complaints, it's probably not worth trying >> to do anything to restore that capability. But we really ought to >> update pg_upgrade's code and docs in pre-v15 branches to say that >> the minimum supported source version is 9.0. > (reviving an old thread from the TODO) > Since we never got around to doing this we still refer to 8.4 as a possible > upgrade path in v14 and older. Oh, yeah, that seems to have fallen through a crack. > The attached takes the conservative approach of raising the minimum supported > version to 9.0 while leaving the code to handle 8.4 in place. While it can be > removed, the risk/reward tradeoff of gutting code in backbranches doesn't seem > appealing since the code will be unreachable with this check anyways. Yeah, it's not worth working harder than this. I do see one typo in your comment: s/supported then/supported when/. LGTM otherwise. regards, tom lane
Re: Docs: Always use true|false instead of sometimes on|off for the subscription options
Peter Smith writes: > Yeah, I had a vested interest in this one place because I've been > reviewing the other thread [1] that was mentioned above. If that other > thread chooses "true|false" then putting "true|false" adjacent to > another "on|off" will look silly. But if that other thread adopts the > existing 'failover=on|off' values then it will diverge even further > from being consistent with the CREATE SUBSCRIPTION page. It's intentional that we allow more than one spelling of boolean values. I can see some value in being consistent within nearby examples, but I don't agree at all that it should be uniform across all our docs. regards, tom lane
Re: Why does pgindent's README say to download typedefs.list from the buildfarm?
Peter Eisentraut writes: > In these cases, I think for > NotificationHash > ResourceOwnerData > WalSyncMethod > we can just get rid of the typedef. I have no objection to dealing with NotificationHash as you have here. > ReadBuffersFlags shouldn't be an enum at all, because its values are > used as flag bits. Yeah, that was bothering me too, but I went for the minimum delta. I did think that a couple of integer macros would be a better idea, so +1 for what you did here. > WaitEventExtension, I'm not sure, it's like, an extensible enum? I > guess let's remove the typedef there, too. I am also quite confused by that. It seems to be kind of an enum that is supposed to be extended at runtime, meaning that neither of the existing enum member values ought to be used as such, although either autoprewarm.c didn't get the memo or I misunderstand the intended usage. NUM_BUILTIN_WAIT_EVENT_EXTENSION is possibly the most bizarre idea I've ever seen: what would a "built-in extension" event be exactly? I think the enum should be nuked altogether, but it's a bit late to be redesigning that for v17 perhaps. > Attached is a variant patch. I'm good with this, with a mental note to look again at WaitEventExtension later. regards, tom lane
Re: Why does pgindent's README say to download typedefs.list from the buildfarm?
I wrote: > This works for me. One point that could stand discussion while we're > here is whether the once-a-cycle run should use the verbatim buildfarm > results or it's okay to editorialize on that typedefs list. I did a > little of the latter in da256a4a7, and I feel like we should either > bless that practice in this document or decide that it was a bad idea. > For reference, what I added to the buildfarm's list was
> +InjectionPointCacheEntry
> +InjectionPointCondition
> +InjectionPointConditionType
> +InjectionPointEntry
> +InjectionPointSharedState
> +NotificationHash
> +ReadBuffersFlags
> +ResourceOwnerData
> +WaitEventExtension
> +WalSyncMethod

I realized that the reason the InjectionPoint typedefs were missing is that none of the buildfarm animals that contribute typedefs are building with --enable-injection-points. I rectified that on sifaka, and now those are in the list available from the buildfarm. As for the remainder, they aren't showing up because no variable or field is declared using them, which means no debug symbol table entry is made for them. This means we could just drop those typedefs and be little the worse off notationally. I experimented with a patch for that, as attached. (In the case of NotificationHash, I thought it better to arrange for there to be a suitable variable; but it could certainly be done the other way too.) Is this too anal? regards, tom lane

diff --git a/src/backend/commands/async.c b/src/backend/commands/async.c
index d0891e3f0e..6861f028d2 100644
--- a/src/backend/commands/async.c
+++ b/src/backend/commands/async.c
@@ -2253,10 +2253,13 @@ AsyncExistsPendingNotify(Notification *n)
 	if (pendingNotifies->hashtab != NULL)
 	{
 		/* Use the hash table to probe for a match */
-		if (hash_search(pendingNotifies->hashtab,
-						&n,
-						HASH_FIND,
-						NULL))
+		NotificationHash *ent;
+
+		ent = hash_search(pendingNotifies->hashtab,
+						  &n,
+						  HASH_FIND,
+						  NULL);
+		if (ent)
 			return true;
 	}
 	else
diff --git a/src/backend/utils/resowner/resowner.c b/src/backend/utils/resowner/resowner.c
index ab9343bc5c..3bde0eba4d 100644
--- a/src/backend/utils/resowner/resowner.c
+++ b/src/backend/utils/resowner/resowner.c
@@ -107,7 +107,7 @@ StaticAssertDecl(RESOWNER_HASH_MAX_ITEMS(RESOWNER_HASH_INIT_SIZE) >= RESOWNER_AR
 /*
  * ResourceOwner objects look like this
  */
-typedef struct ResourceOwnerData
+struct ResourceOwnerData
 {
 	ResourceOwner parent;		/* NULL if no parent (toplevel owner) */
 	ResourceOwner firstchild;	/* head of linked list of children */
@@ -155,7 +155,7 @@ typedef struct ResourceOwnerData
 	/* The local locks cache. */
 	LOCALLOCK  *locks[MAX_RESOWNER_LOCKS];	/* list of owned locks */
-} ResourceOwnerData;
+};
 
 /*
@@ -415,7 +415,7 @@ ResourceOwnerCreate(ResourceOwner parent, const char *name)
 	ResourceOwner owner;
 
 	owner = (ResourceOwner) MemoryContextAllocZero(TopMemoryContext,
-												   sizeof(ResourceOwnerData));
+												   sizeof(*owner));
 	owner->name = name;
 	if (parent)
diff --git a/src/include/access/xlog.h b/src/include/access/xlog.h
index 76787a8267..1a1f11a943 100644
--- a/src/include/access/xlog.h
+++ b/src/include/access/xlog.h
@@ -19,14 +19,14 @@
 /* Sync methods */
-typedef enum WalSyncMethod
+enum WalSyncMethod
 {
 	WAL_SYNC_METHOD_FSYNC = 0,
 	WAL_SYNC_METHOD_FDATASYNC,
 	WAL_SYNC_METHOD_OPEN,		/* for O_SYNC */
 	WAL_SYNC_METHOD_FSYNC_WRITETHROUGH,
 	WAL_SYNC_METHOD_OPEN_DSYNC	/* for O_DSYNC */
-} WalSyncMethod;
+};
 
 extern PGDLLIMPORT int wal_sync_method;
 extern PGDLLIMPORT XLogRecPtr ProcLastRecPtr;
diff --git a/src/include/storage/bufmgr.h b/src/include/storage/bufmgr.h
index 42211bfec4..edb7011743 100644
--- a/src/include/storage/bufmgr.h
+++ b/src/include/storage/bufmgr.h
@@ -107,14 +107,14 @@ typedef struct BufferManagerRelation
 #define BMR_REL(p_rel) ((BufferManagerRelation){.rel = p_rel})
 #define BMR_SMGR(p_smgr, p_relpersistence) ((BufferManagerRelation){.smgr = p_smgr, .relpersistence = p_relpersistence})
 
-typedef enum ReadBuffersFlags
+enum ReadBuffersFlags
 {
 	/* Zero out page if reading fails. */
 	READ_BUFFERS_ZERO_ON_ERROR = (1 << 0),
 
 	/* Call smgrprefetch() if I/O necessary. */
 	READ_BUFFERS_ISSUE_ADVICE = (1 << 1),
-} ReadBuffersFlags;
+};
 
 struct ReadBuffersOperation
 {
diff --git a/src/include/utils/wait_event.h b/src/include/utils/wait_event.h
index 080e92d1cf..72c4d60930 100644
--- a/src/include/utils/wait_event.h
+++ b/src/include/utils/wait_event.h
@@ -53,11 +53,11 @@ extern PGDLLIMPORT uint32 *my_wait_event_info;
  *
  * The ID retrieved can be used with pgstat_report_wait_start() or equivalent.
  */
-typedef enum
+enum WaitEventExtension
 {
 	WAIT_EVENT_EXTENSION = PG_WAIT_EXTENSION,
 	WAIT_EVENT_EXTENSION_FIRST_USER_DEFINED,
-} WaitEventEx
Re: Adding the extension name to EData / log_line_prefix
Andres Freund writes: > What portability issues do you forsee? We already look up the same symbol in > all the shared libraries ("Pg_magic_func"), so we know that we can deal with > duplicate function names. Are you thinking that somehow we'd end up with > symbol interposition or something? No, it's the dependence on the physical library file name that's bothering me. Maybe that won't be an issue, but I foresee requests like "would you please case-fold it" or "the extension-trimming rule isn't quite right", etc. regards, tom lane
Re: Why does pgindent's README say to download typedefs.list from the buildfarm?
Nathan Bossart writes: > On Wed, May 15, 2024 at 04:07:18PM -0400, Robert Haas wrote: >> How's this? > I compared this with my v1, and the only bit of information there that I > see missing in v3 is that validation step 4 only applies in the > once-per-cycle run (or if you forget to pgindent before committing a > patch). This might be why I was struggling to untangle the two types of > pgindent runs in my attempt. Perhaps it's worth adding a note to that step > about when it is required. Oh ... another problem is that the VALIDATION steps really apply to both kinds of indent runs, but it's not clear to me that that's obvious in v3. Maybe the steps should be rearranged to be (1) base case, (2) VALIDATION, (3) ONCE PER CYCLE. At this point my OCD got the better of me and I did a little additional wordsmithing. How about the attached? regards, tom lane

diff --git a/src/tools/pgindent/README b/src/tools/pgindent/README
index b649a21f59..369f120eb0 100644
--- a/src/tools/pgindent/README
+++ b/src/tools/pgindent/README
@@ -1,8 +1,9 @@
 pgindent'ing the PostgreSQL source tree
 ===
 
-We run this process at least once in each development cycle,
-to maintain uniform layout style in our C and Perl code.
+pgindent is used to maintain uniform layout style in our C and Perl code,
+and should be run for every commit.  There are additional code beautification
+tasks which should be performed at least once per release cycle.
 
 You might find this blog post interesting:
 http://adpgtech.blogspot.com/2015/05/running-pgindent-on-non-core-code-or.html
@@ -25,45 +26,31 @@ PREREQUISITES:
 	Or if you have cpanm installed, you can just use:
 	cpanm https://cpan.metacpan.org/authors/id/S/SH/SHANCOCK/Perl-Tidy-20230309.tar.gz
 
-DOING THE INDENT RUN:
-
-1) Change directory to the top of the source tree.
-
-2) Download the latest typedef file from the buildfarm:
-
-	wget -O src/tools/pgindent/typedefs.list https://buildfarm.postgresql.org/cgi-bin/typedefs.pl
+DOING THE INDENT RUN BEFORE A NORMAL COMMIT:
 
-	(See https://buildfarm.postgresql.org/cgi-bin/typedefs.pl?show_list for a full
-	list of typedef files, if you want to indent some back branch.)
+1) Change directory to the top of the source tree.
 
-3) Run pgindent on the C files:
+2) Run pgindent on the C files:
 
 	src/tools/pgindent/pgindent .
 
 	If any files generate errors, restore their original versions with
 	"git checkout", and see below for cleanup ideas.
 
-4) Indent the Perl code using perltidy:
-
-	src/tools/pgindent/pgperltidy .
-
-	If you want to use some perltidy version that's not in your PATH,
-	first set the PERLTIDY environment variable to point to it.
-
-5) Reformat the bootstrap catalog data files:
-
-	./configure # "make" will not work in an unconfigured tree
-	cd src/include/catalog
-	make reformat-dat-files
-	cd ../../..
-
-VALIDATION:
-
-1) Check for any newly-created files using "git status"; there shouldn't
+3) Check for any newly-created files using "git status"; there shouldn't
 	be any.  (pgindent leaves *.BAK files behind if it has trouble, while
 	perltidy leaves *.LOG files behind.)
 
-2) Do a full test build:
+4) If pgindent wants to change anything your commit wasn't touching,
+	stop and figure out why.  If it is making ugly whitespace changes
+	around typedefs your commit adds, you need to add those typedefs
+	to src/tools/pgindent/typedefs.list.
+
+5) If you have the patience, it's worth eyeballing the "git diff" output
+	for any egregiously ugly changes.  See below for cleanup ideas.
+
+6) Do a full test build:
 
 	make -s clean
 	make -s all	# look for unexpected warnings, and errors of course
@@ -75,14 +62,38 @@ VALIDATION:
 	header files that get copied into ecpg output; if so, adjust the
 	expected-files to match.
 
-3) If you have the patience, it's worth eyeballing the "git diff" output
-	for any egregiously ugly changes.  See below for cleanup ideas.
+AT LEAST ONCE PER RELEASE CYCLE:
+
+1) Download the latest typedef file from the buildfarm:
+
+	wget -O src/tools/pgindent/typedefs.list https://buildfarm.postgresql.org/cgi-bin/typedefs.pl
+
+	This step resolves any differences between the incrementally updated
+	version of the file and a clean, autogenerated one.
+	(See https://buildfarm.postgresql.org/cgi-bin/typedefs.pl?show_list for a full
+	list of typedef files, if you want to indent some back branch.)
+
+2) Run pgindent as above.
+
+3) Indent the Perl code using perltidy:
+
+	src/tools/pgindent/pgperltidy .
+
+	If you want to use some perltidy version that's not in your PATH,
+	first set the PERLTIDY environment variable to point to it.
+
+4) Reformat the bootstrap catalog data files:
+
+	./configure # "make" will not work in an unconfigured tree
+	cd src/include/catalog
+	make reformat-dat-files
+	cd ../
Re: Why does pgindent's README say to download typedefs.list from the buildfarm?
Robert Haas writes: > On Wed, May 15, 2024 at 3:30 PM Nathan Bossart > wrote: >> This is much cleaner, thanks. The only thing that stands out to me is that >> the "once per release cycle" section should probably say to do an indent >> run after downloading the typedef file. > How's this? This works for me. One point that could stand discussion while we're here is whether the once-a-cycle run should use the verbatim buildfarm results or it's okay to editorialize on that typedefs list. I did a little of the latter in da256a4a7, and I feel like we should either bless that practice in this document or decide that it was a bad idea. For reference, what I added to the buildfarm's list was

+InjectionPointCacheEntry
+InjectionPointCondition
+InjectionPointConditionType
+InjectionPointEntry
+InjectionPointSharedState
+NotificationHash
+ReadBuffersFlags
+ResourceOwnerData
+WaitEventExtension
+WalSyncMethod

I believe all of these must have been added manually during v17. If I took any one of them out there was some visible disimprovement in pgindent's results, so I kept them. Was that the right decision? If so we should explain it here. regards, tom lane
Re: [PATCH] Add --syntax to postgres for SQL syntax checking
"David G. Johnston" writes: > Now, in my ideal world something like this could be made as an extension so > that it can work on older versions and not have to be maintained by core. > And likely grow more features over time. Is there anything fundamental > about this that prevents it being implemented in an extension and, if so, > what can we add to core in v18 to alleviate that limitation? It'd be pretty trivial to create a function that takes a string and runs it through raw_parser --- I've got such things laying about for microbenchmarking purposes, in fact. But the API that'd present for tools such as editors is enormously different from the proposed patch: there would need to be a running server and they'd need to be able to log into it, plus there are more minor concerns like having to wrap the string in valid quoting. Now on the plus side, once you'd bought into that environment, it'd be equally trivial to offer alternatives like "run raw parsing and parse analysis, but don't run the query". I continue to maintain that that's the set of checks you'd really want in a lot of use-cases. regards, tom lane
Re: [PATCH] Add --syntax to postgres for SQL syntax checking
Robert Haas writes: > On Wed, May 15, 2024 at 2:39 PM Tom Lane wrote: >> The thing that was bothering me most about this is that I don't >> understand why that's a useful check. ... > But I don't understand the idea that the concept doesn't make sense. Sorry: "make sense" was a poorly chosen phrase there. What I was doubting, and continue to doubt, is that 100% checking of what you can check without catalog access and 0% checking of the rest is a behavior that end users will find especially useful. > I think it is perfectly reasonable to imagine a world in which the > initial parsing takes care of reporting everything that can be > determined by static analysis without knowing anything about catalog > contents, and parse analysis checks all the things that require > catalog access, and you can run the first set of checks and then > decide whether to proceed further. I think if I were designing a new > system from scratch, I'd definitely want to set it up that way, and I > think moving our existing system in that direction would probably let > us clean up a variety of warts along the way. Really, the only > argument I see against it is that getting from where we are to there > would be a gigantic amount of work for the value we'd derive. I'm less enthusiastic about this than you are. I think it would likely produce a slower and less maintainable system. Slower because we'd need more passes over the query: what parse analysis does today would have to be done in at least two separate steps. Less maintainable because knowledge about certain things would end up being more spread around the system. Taking your example of getting syntax checking to recognize invalid EXPLAIN keywords: right now there's just one piece of code that knows what those options are, but this'd require two. Also, "run the first set of checks and then decide whether to proceed further" seems like optimizing the system for broken queries over valid ones, which I don't think is an appropriate goal. 
Now, I don't say that there'd be *no* benefit to reorganizing the system that way. But it wouldn't be an unalloyed win, and so I share your bottom line that the costs would be out of proportion to the benefits. regards, tom lane
Re: add function argument names to regex* functions.
Robert Haas writes: > On Thu, Apr 4, 2024 at 9:55 AM jian he wrote: >> changing "N" to lower-case would be misleading for regexp_replace? >> so I choose "count". > I don't see why that would be confusing for regexp_replace > specifically, but I think N => count is a reasonable change to make. > However, I don't think this quite works: > + then the count'th match of the pattern I think the origin of the problem here is not wanting to use "N" as the actual name of the parameter, because then users would have to double-quote it to write "regexp_replace(..., "N" => 42, ...)". However ... is that really so awful? It's still fewer keystrokes than "count". It's certainly a potential gotcha for users who've not internalized when they need double quotes, but I think we could largely address that problem just by making sure to provide a documentation example that shows use of "N". > An English speaker is more likely to understand what is meant by > "N'th" than what is meant by "count'th". +1 ... none of the proposals make that bit read more clearly than it does now. regards, tom lane
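As an aside for readers following the thread: the semantics being named here — replacing only the N'th match of a pattern — can be sketched in a few lines of Python. This is a hypothetical illustrative helper (`replace_nth` is not PostgreSQL code), shown only to make the meaning of the N/"count" argument concrete:

```python
import re

def replace_nth(string: str, pattern: str, replacement: str, n: int) -> str:
    """Replace only the n'th match of pattern (1-based), mirroring the
    behavior under discussion for regexp_replace's N argument.
    Returns the string unchanged if there are fewer than n matches."""
    matches = list(re.finditer(pattern, string))
    if n < 1 or len(matches) < n:
        return string
    m = matches[n - 1]
    # splice the replacement over just that one match
    return string[:m.start()] + replacement + string[m.end():]

# replace only the 2nd run of digits
print(replace_nth("a1 b22 c333", r"\d+", "X", 2))  # → a1 bX c333
```

Whatever the parameter ends up being called, the behavior is "skip the first n-1 matches, rewrite the n'th, leave the rest alone."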
Re: [PATCH] Add --syntax to postgres for SQL syntax checking
walt...@technowledgy.de writes: > Tom Lane: >> BTW, if you do feel that a pure grammar check is worthwhile, you >> should look at the ecpg preprocessor, which does more or less that >> with the SQL portions of its input. > Would working with ecpg allow to get back a parse tree of the query to > do stuff with that? No, ecpg isn't interested in building a syntax tree. > This is really what is missing for the ecosystem. A libpqparser for > tools to use: Formatters, linters, query rewriters, simple syntax > checkers... they are all missing access to postgres' own parser. To get to that, you'd need some kind of agreement on what the syntax tree is. I doubt our existing implementation would be directly useful to very many tools, and even if it is, do they want to track constant version-to-version changes? regards, tom lane
Re: Fix PGresult leak in pg_dump during binary upgrade
Daniel Gustafsson writes: > While looking at pg_dump performance today I noticed that pg_dump fails to > clear query results in binary_upgrade_set_pg_class_oids during binary upgrade > mode. 9a974cbcba00 moved the query to the outer block, but left the PQclear > and query buffer destruction in the is_index conditional, making it not always > be executed. 353708e1fb2d fixed the leak of the query buffer but left the > PGresult leak. The attached fixes the PGresult leak which when upgrading > large > schemas can be non-trivial. +1 --- in 353708e1f I was just fixing what Coverity complained about. I wonder why it missed this; it does seem to understand that PGresult leaks are a thing. But anyway, I missed it too. regards, tom lane
Re: [PATCH] Add --syntax to postgres for SQL syntax checking
Josef Šimánek writes: > I'm not sure everyone in this thread understands the reason for this > patch, which is clearly my fault, since I have failed to explain. Main > idea is to make a tool to validate query can be parsed, that's all. > Similar to running "EXPLAIN query", but not caring about the result > and not caring about the DB structure (ignoring missing tables, ...), > just checking it was successfully executed. This definitely belongs to > the server side and not to the client side, it is just a tool to > validate that for this running PostgreSQL backend it is a "parseable" > query. The thing that was bothering me most about this is that I don't understand why that's a useful check. If I meant to type

	UPDATE mytab SET mycol = 42;

and instead I type

	UPDATEE mytab SET mycol = 42;

your proposed feature would catch that; great. But if I type

	UPDATE mytabb SET mycol = 42;

it won't. How does that make sense? I'm not entirely sure where to draw the line about what a "syntax check" should catch, but this seems a bit south of what I'd want in a syntax-checking editor. BTW, if you do feel that a pure grammar check is worthwhile, you should look at the ecpg preprocessor, which does more or less that with the SQL portions of its input. ecpg could be a better model to follow because it doesn't bring all the dependencies of the server and so is much more likely to appear in a client-side installation. It's kind of an ugly, unmaintained mess at the moment, sadly. The big knock on doing this client-side is that there might be version skew compared to the server you're using --- but if you are not doing anything beyond a grammar-level check then your results are pretty approximate anyway, ISTM. We've not heard anything suggesting that version skew is a huge problem for ecpg users. regards, tom lane
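The distinction being drawn — a grammar-only check knows the language's keywords but nothing about the catalog — can be made concrete with a deliberately toy sketch. This is illustrative Python, not PostgreSQL's parser, and `KEYWORDS` is an assumed miniature grammar:

```python
# Toy grammar-level check: it can reject an unknown keyword, but it has
# no catalog access, so a misspelled table name sails straight through.
KEYWORDS = {"SELECT", "INSERT", "UPDATE", "DELETE", "SET", "WHERE", "FROM"}

def grammar_check(sql: str) -> bool:
    """Accept iff the statement begins with a known keyword."""
    words = sql.split()
    return bool(words) and words[0].upper() in KEYWORDS

print(grammar_check("UPDATEE mytab SET mycol = 42;"))  # False: bad keyword caught
print(grammar_check("UPDATE mytabb SET mycol = 42;"))  # True: bad table name missed
```

The second result is exactly the gap being objected to: without catalog lookups, "mytabb" is indistinguishable from a valid table name.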
Re: Adding the extension name to EData / log_line_prefix
Andres Freund writes: > On 2024-05-15 12:54:45 -0400, Chapman Flack wrote: >> But I'd bet, within the fraction of the population that does use it, >> it is already a short string that looks a whole lot like the name >> of the extension. So maybe enhancing the documentation and making it >> easy to set up would achieve much of the objective here. > The likely outcome would IMO be that some extensions will have the data, > others not. Whereas inferring the information from our side will give you > something reliable. > But I also just don't think it's something that architecturally fits together > that well. If we either had TEXTDOMAIN reliably set across extensions or it'd > architecturally be pretty, I'd go for it, but imo it's neither. There is one advantage over my suggestion of changing PG_MODULE_MAGIC: if we tell people to write

	PG_MODULE_MAGIC;

	#undef TEXTDOMAIN
	#define TEXTDOMAIN PG_TEXTDOMAIN("hstore")

then that's 100% backwards compatible and they don't need any version-testing ifdef's. I still think that the kind of infrastructure Andres proposes is way overkill compared to the value, plus it's almost certainly going to have a bunch of platform-specific problems to solve. So I think Peter's thought is worth pursuing. regards, tom lane
Re: Adding the extension name to EData / log_line_prefix
Peter Eisentraut writes: > On 14.05.24 01:11, Tom Lane wrote: >> The mechanism that Andres describes for sourcing the name seems a bit >> overcomplex though. Why not just allow/require each extension to >> specify its name as a constant string? We could force the matter by >> redefining PG_MODULE_MAGIC as taking an argument: >> PG_MODULE_MAGIC("hstore"); > We kind of already have something like this, for NLS. If you look for > pg_bindtextdomain(TEXTDOMAIN) and ereport_domain(), this information > already trickles into the vicinity of the error data. Maybe the same > thing could just be used for this, by wiring up the macros a bit > differently. Hmm, cute idea, but it'd only help for extensions that are NLS-enabled. Which I bet is a tiny fraction of the population. So far as I can find, we don't even document how to set up TEXTDOMAIN for an extension --- you have to cargo-cult the macro definition from some in-core extension. regards, tom lane
Re: Fixup a few 2023 copyright years
David Rowley writes: > On Wed, 15 May 2024 at 17:32, Michael Paquier wrote: >> While running src/tools/copyright.pl, I have noticed that a >> newline was missing at the end of index_including.sql, as an effect of >> the test added by you in a63224be49b8. I've cleaned up that while on >> it, as it was getting added automatically, and we tend to clean these >> like in 3f1197191685 or more recently c2df2ed90a82. > Thanks for fixing that. I'm a little surprised that pgindent does not > fix that sort of thing. pgindent does not touch anything but .c and .h files. I do recommend running "git diff --check" (with --staged if you already git-added your changes) before you're ready to commit something. That does find generic whitespace issues, and I believe it would've found this one. regards, tom lane
Re: explain format json, unit for serialize and memory are different.
Michael Paquier writes: > Perhaps Alvaro and Tom would like to chime in, as committers of > respectively 5de890e3610d and 06286709ee06? No objection here. In a green field I might argue for round-to-nearest instead of round-up, but it looks like we have several precedents for round-up, so let's avoid changing that existing behavior. regards, tom lane
Re: An improved README experience for PostgreSQL
Nathan Bossart writes: > On Tue, May 14, 2024 at 10:05:01AM +0200, Peter Eisentraut wrote: >> My point is, in order to get that enhanced GitHub experience, you don't >> actually have to commit these files into the individual source code >> repository. You can add them to the organization and they will apply to all >> repositories under the organization. This is explained at the above link. > Oh, I apologize, my brain skipped over the word "organization" in your > message. FWIW, I'd vote against doing it that way, because then maintaining/updating those files would only be possible for whoever owns the github repo. I don't have a position on whether we want these additional files or not; but if we do, I think the best answer is to stick 'em under .github/ where they are out of the way but yet updatable by any committer. regards, tom lane
Re: elog/ereport VS misleading backtrace_function function address
Alvaro Herrera writes: > On 2024-May-14, Jakub Wartak wrote: >> Reality is apparently mixed, at least from what I have checked: >> - all RHEL 7.x/8.x (PGDG and our forks) do NOT come with >> --enable-debug according to pg_config. > Ooh, yeah, that's true according to > https://git.postgresql.org/gitweb/?p=pgrpms.git;a=blob;f=rpm/redhat/main/non-common/postgresql-16/main/postgresql-16.spec;h=ab2f6edc903f083e04b8f7a1d3bad8e1b7018791;hb=1a8b9fa7019d3f73322ca873b62dc0b33e73ed1d

> 507 %if %beta
> 508 --enable-debug \
> 509 --enable-cassert \
> 510 %endif

> Maybe a better approach for this whole thing is to change the specs so > that --enable-debug is always given, not just for %beta. My recollection from my time at Red Hat is that their standard policy is to build everything with debug symbols all the time; so this is violating that policy, and we should change it just on those grounds. However, I'm not sure how much the change will help Joe Average User with respect to the thread topic. RH actually has infrastructure that splits the debug symbol tables out into separate "debuginfo" RPMs, which you have to install manually if you want to debug a particular package. This is good for disk space consumption, but it means that most users are still only going to see the same backtrace they see currently. I don't know how much of that applies to, eg, Debian. regards, tom lane
Re: I have an exporting need...
Heikki Linnakangas writes: > On 13/05/2024 16:01, Juan Hernández wrote: >> Do you consider useful to add a parameter (for example, >> --separatetables) so when used the exporting file process can create a >> different tablename.sql file for each table in database automatically? > It'd be tricky to restore from, as you need to restore the tables in the > right order. I think you'd still need a "main" sql file that includes > all the other files in the right order. And using the table names as > filenames gets tricky if the table names contain any funny characters. It's a lot worse than that, as it's entirely possible to have circular FK dependencies, meaning there is no "right order" if you think of each table file as self-contained DDL plus data. Other sorts of circularities are possible too. pg_dump deals with that hazard by splitting things up: first create all the tables, then load all the data, then create all the indexes and foreign keys. You can tell it to just emit the parts relevant to a particular table, but it's on your head whether that's actually going to be useful in your context. I doubt that it's widely enough useful to justify creating a special mode beyond what we already have. regards, tom lane
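The "no right order" point is easy to see with a small sketch. This is hypothetical illustrative Python, not pg_dump code: it tries to topologically order per-table dump files by their FK dependencies, and returns nothing when the graph is cyclic — which is exactly why pg_dump emits all tables first, then all data, then indexes and foreign keys:

```python
def restore_order(deps):
    """deps maps table -> set of tables it references via foreign keys.
    Return a valid restore order for self-contained per-table dump files,
    or None if the FK graph is cyclic (no such order exists)."""
    order, loaded = [], set()
    remaining = dict(deps)
    while remaining:
        # tables whose referenced tables have all been loaded already
        ready = sorted(t for t, d in remaining.items() if d <= loaded)
        if not ready:
            return None  # circular FK dependency: per-table files can't work
        for t in ready:
            order.append(t)
            loaded.add(t)
            del remaining[t]
    return order

print(restore_order({"a": set(), "b": {"a"}}))  # → ['a', 'b']
print(restore_order({"x": {"y"}, "y": {"x"}}))  # → None
```

With a cycle, the only way out is to defer constraint creation until after all the data is loaded — the split pg_dump already performs.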
Re: Underscore in positional parameters?
Dean Rasheed writes: > On Tue, 14 May 2024 at 07:43, Michael Paquier wrote: >> On Tue, May 14, 2024 at 05:18:24AM +0200, Erik Wienhold wrote: >>> Parameter $1_2 is taken as $1 because in rule {param} in scan.l we get >>> the parameter number with atol which stops at the underscore. That's a >>> regression in faff8f8e47f. Before that commit, $1_2 resulted in >>> "ERROR: trailing junk after parameter". > I'm sure that this wasn't intentional -- I think we just failed to > notice that "param" also uses "decinteger" in the scanner. Taking a > quick look, there don't appear to be any other uses of "decinteger", > so at least it only affects params. > Unless the spec explicitly says otherwise, I agree that we should > reject this, as we used to do, and add a comment saying that it's > intentionally not supported. I can't believe it would ever be useful, > and the current behaviour is clearly broken. +1, let's put this back the way it was. regards, tom lane
Re: Why is citext/regress failing on hamerkop?
Alexander Lakhin writes: > 13.05.2024 23:17, Tom Lane wrote: >> 3. We don't know exactly why hamerkop suddenly started seeing these >> failures, but a plausible theory emerges after noting that its >> reported time for the successful "make check" step dropped pretty >> substantially right when this started. In the v13 branch, "make >> check" was taking 2:18 or more in the several runs right before the >> first isolationcheck failure, but 1:40 or less just after. So it >> looks like the animal was moved onto faster hardware. > Yes, and one thing I can't explain yet, is why REL_14_STABLE+ timings > substantially differ from REL_13_STABLE-, say, for the check stage: As I mentioned in our off-list discussion, I have a lingering feeling that this v14 commit could be affecting the results somehow:

Author: Tom Lane
Branch: master Release: REL_14_BR [d5a9a661f] 2020-10-18 12:56:43 -0400

    Update the Winsock API version requested by libpq.

    According to Microsoft's documentation, 2.2 has been the current version
    since Windows 98 or so.  Moreover, that's what the Postgres backend has
    been requesting since 2004 (cf commit 4cdf51e64).  So there seems no
    reason for libpq to keep asking for 1.1.

I didn't believe at the time that that'd have any noticeable effect, but maybe it somehow made Winsock play a bit nicer with the GSS support? regards, tom lane
Re: Why is parula failing?
David Rowley writes: > I've not seen any recent failures from Parula that relate to this > issue. The last one seems to have been about 4 weeks ago. > I'm now wondering if it's time to revert the debugging code added in > 1db689715. Does anyone think differently? +1. It seems like we wrote off the other issue as a compiler bug, so I don't have much trouble assuming that this one was too. regards, tom lane
Re: Adding the extension name to EData / log_line_prefix
Andres Freund writes: > On 2024-05-13 19:25:11 -0300, Fabrízio de Royes Mello wrote: >> Hmmm, depending on the extension it can extensively call/use postgres code >> so would be nice if we can differentiate if the code is called from >> Postgres itself or from an extension. > I think that's not realistically possible. It's also very fuzzy what that'd > mean. If there's a planner hook and then the query executes normally, what do > you report for an execution time error? And even the simpler case - should use > of pg_stat_statements cause everything within to be logged as a > pg_stat_statement message? Not to mention that there could be more than one extension on the call stack. I think tying this statically to the ereport call site is fine. The mechanism that Andres describes for sourcing the name seems a bit overcomplex though. Why not just allow/require each extension to specify its name as a constant string? We could force the matter by redefining PG_MODULE_MAGIC as taking an argument: PG_MODULE_MAGIC("hstore"); regards, tom lane
Re: Why is citext/regress failing on hamerkop?
Thomas Munro writes: > For citext_utf8, I pushed cff4e5a3. Hamerkop runs infrequently, so > here's hoping for 100% green on master by Tuesday or so. In the meantime, some off-list investigation by Alexander Lakhin has turned up a good deal of information about why we're seeing failures on hamerkop in the back branches. Summarizing, it appears that 1. In a GSS-enabled Windows build without any active Kerberos server, libpq's pg_GSS_have_cred_cache() succeeds, allowing libpq to try to open a GSSAPI connection, but then gss_init_sec_context() fails, leading to client-side reports like this:

+Connection 2 failed: could not initiate GSSAPI security context: Unspecified GSS failure.  Minor code may provide more information: Credential cache is empty
+FATAL:  sorry, too many clients already

(The first of these lines comes out during the attempted GSS connection, the second during the only-slightly-more-successful non-GSS connection.) So that's problem number 1: how is it that gss_acquire_cred() succeeds but then gss_init_sec_context() disclaims knowledge of any credentials? Can we find a way to get this failure to be detected during pg_GSS_have_cred_cache()? It is mighty expensive to launch a backend connection that is doomed to fail, especially when this happens during *every single libpq connection attempt*. 2. Once gss_init_sec_context() fails, libpq abandons the connection and starts over; since it has already initiated a GSS handshake on the connection, there's not much choice. Although libpq faithfully issues closesocket() on the abandoned connection, Alexander found that the connected backend doesn't reliably see that: it may just sit there until the AuthenticationTimeout elapses (1 minute by default). That backend is still consuming a postmaster child slot, so if this happens on any sizable fraction of test connection attempts, it's little surprise that we soon get "sorry, too many clients already" failures. 3. 
We don't know exactly why hamerkop suddenly started seeing these failures, but a plausible theory emerges after noting that its reported time for the successful "make check" step dropped pretty substantially right when this started. In the v13 branch, "make check" was taking 2:18 or more in the several runs right before the first isolationcheck failure, but 1:40 or less just after. So it looks like the animal was moved onto faster hardware. That feeds into this problem because, with a successful isolationcheck run taking more than a minute, there was enough time for some of the earlier stuck sessions to time out and exit before their postmaster-child slots were needed. Alexander confirmed this theory by demonstrating that the main regression tests in v15 would pass if he limited their speed enough (by reducing the CPU allowed to a VM) but not at full speed. So the buildfarm results suggesting this is only an issue in <= v13 must be just a timing artifact; the problem is still there. Of course, backends waiting till timeout is not a good behavior independently of what is triggering that, so we have two problems to solve here, not one. I have no ideas about the gss_init_sec_context() failure, but I see a plausible theory about the failure to detect socket closure in Microsoft's documentation about closesocket() [1]: If the l_onoff member of the LINGER structure is zero on a stream socket, the closesocket call will return immediately and does not receive WSAEWOULDBLOCK whether the socket is blocking or nonblocking. However, any data queued for transmission will be sent, if possible, before the underlying socket is closed. This is also called a graceful disconnect or close. In this case, the Windows Sockets provider cannot release the socket and other resources for an arbitrary period, thus affecting applications that expect to use all available sockets. This is the default behavior for a socket. 
I'm not sure whether we've got unsent data pending in the problematic condition, nor why it'd remain unsent if we do (shouldn't the backend consume it anyway?). But this has the right odor for an explanation. I'm pretty hesitant to touch this area myself, because it looks an awful lot like commits 6051857fc and ed52c3707, which eventually had to be reverted. I think we need a deeper understanding of exactly what Winsock is doing or not doing before we try to fix it. I wonder if there are any Microsoft employees around here with access to the relevant source code. In the short run, it might be a good idea to deprecate using --with-gssapi on Windows builds. A different stopgap idea could be to drastically reduce the default AuthenticationTimeout, perhaps only on Windows. regards, tom lane [1] https://learn.microsoft.com/en-us/windows/win32/api/winsock/nf-winsock-closesocket
Re: Is there any chance to get some kind of a result set sifting mechanism in Postgres?
aa writes: > If you call the action of "sifting" ordering, then yes. If you don't call > it ordering, then no. > In essence, is the output of a filtering mechanism, done in a single result > set pass. And this pass should be the same pass in charge of collecting the > result set in the first place. Sounds a lot like a WHERE clause to me. regards, tom lane
Re: Fix out-of-bounds in the function GetCommandTagName
David Rowley writes: > I've added a CF entry under your name for this: > https://commitfest.postgresql.org/48/4927/ > If it was code new to PG17 I'd be inclined to go ahead with it now, > but it does not seem to align with making the release mode stable. > I'd bet others will feel differently about that. Delaying seems a > better default choice at least. The security team's Coverity instance has started to show this complaint now too. So I'm going to go ahead and push this change in HEAD. It's probably unwise to change it in stable branches, since there's at least a small chance some external code is using COMMAND_TAG_NEXTTAG for the same purpose tag_behavior[] does. But we aren't anywhere near declaring v17's API stable, so I'd rather fix the issue than dismiss it in HEAD. regards, tom lane