Re: ltree_gist indexes broken after pg_upgrade from 12 to 13
On 3/4/22 23:09, Nikita Glukhov wrote:
> On 04.03.2022 23:28, Tom Lane wrote:
>> Tomas Vondra writes:
>>> On 3/4/22 20:29, Nikita Glukhov wrote:
>>>> So, we probably have corrupted indexes that were updated since such
>>>> "incomplete" upgrade of ltree.
>>>
>>> IIRC pg_upgrade is not expected to upgrade extensions - it keeps the
>>> installed version of the extension, and that's intentional.
>>
>> Yeah, exactly.  But this opens up an additional consideration we
>> have to account for: whatever we do needs to work with either 1.1
>> or 1.2 SQL-level versions of the extension.
>>
>> regards, tom lane
>
> It becomes clear that ltree upgrade 1.1 => 1.2 is broken, the problem
> is not so much related to PG12 => PG13+ upgrades.

Well, it's quite a mess :-( It very clearly is not just a 1.1 -> 1.2 upgrade issue, because you can get a crash with 1.1 after a PG12 -> PG13 upgrade, as demonstrated earlier. So just "fixing" the extension upgrade is not enough. But as you showed, the 1.1 -> 1.2 upgrade is broken too.

> You can see here another problem: installation of opclass options
> procedure does not invalidate relation's opclass options cache.

Seems like that.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [PoC] Let libpq reject unexpected authentication requests
On Fri, Mar 04, 2022 at 08:19:26PM -0500, Tom Lane wrote:
> Jacob Champion writes:
>> Here is my take on option 2, then: you get to choose exactly one method
>> that the client will accept. If you want to use client certificates,
>> use require_auth=cert. If you want to force SCRAM, use
>> require_auth=scram-sha-256. If the server asks for something different,
>> libpq will fail. If the server tries to get away without asking you for
>> authentication, libpq will fail. There is no negotiation.

Fine by me to put all the control on the client side; that makes the whole thing much simpler to reason about.

> Seems reasonable, but I bet that for very little more code you could
> accept a comma-separated list of allowed methods; libpq already allows
> comma-separated lists for some other connection options. That seems
> like it'd be a useful increment of flexibility.

Same impression here, so +1 for supporting a comma-separated list of values here. This is already handled in parse_comma_separated_list(), now used for multiple hosts and hostaddrs.

--
Michael
Re: Regression tests failures on Windows Server 2019 - on master at commit # d816f366b
On 2022-03-04 20:06:43 -0800, Andres Freund wrote:
> On 2022-03-05 16:39:21 +1300, Thomas Munro wrote:
>> I vote for committing that workaround into the tree temporarily,
>> because it's not just cfbot, it's also everyone's dev branches on
>> Github + the official mirror that are red.
>
> I'll do so after making dinner, unless you want to do so sooner. It did fix
> the problem (intermixed with a few irrelevant changes):
> https://cirrus-ci.com/task/4928987829895168

Pushed.
Re: Regression tests failures on Windows Server 2019 - on master at commit # d816f366b
Hi,

On 2022-03-05 16:39:21 +1300, Thomas Munro wrote:
> On Sat, Mar 5, 2022 at 4:19 PM Andres Freund wrote:
>> https://github.com/cirruslabs/cirrus-ci-agent/issues/218#issuecomment-1059657781
>
> Oof. Nice detective work.

Thanks.

>> As indicated in the message above it, there's a workaround. Not sure if worth
>> committing, if they were to fix it in a few days? cfbot could be repaired by
>> just adding a repo environment variable of CIRRUS_AGENT_VERSION 1.73.2...
>> OTOH, committing and reverting a single line + comment is cheap.
>
> I vote for committing that workaround into the tree temporarily,
> because it's not just cfbot, it's also everyone's dev branches on
> Github + the official mirror that are red.

I'll do so after making dinner, unless you want to do so sooner. It did fix the problem (intermixed with a few irrelevant changes): https://cirrus-ci.com/task/4928987829895168

- Andres
Re: Regression tests failures on Windows Server 2019 - on master at commit # d816f366b
On Sat, Mar 5, 2022 at 4:19 PM Andres Freund wrote:
> https://github.com/cirruslabs/cirrus-ci-agent/issues/218#issuecomment-1059657781

Oof. Nice detective work.

> As indicated in the message above it, there's a workaround. Not sure if worth
> committing, if they were to fix it in a few days? cfbot could be repaired by
> just adding a repo environment variable of CIRRUS_AGENT_VERSION 1.73.2...
> OTOH, committing and reverting a single line + comment is cheap.

I vote for committing that workaround into the tree temporarily, because it's not just cfbot, it's also everyone's dev branches on Github + the official mirror that are red.
Re: Regression tests failures on Windows Server 2019 - on master at commit # d816f366b
Hi,

On 2022-03-05 13:21:26 +1300, Thomas Munro wrote:
> On Sat, Mar 5, 2022 at 10:56 AM Andres Freund wrote:
>> I suspect one also needs the console detach thing.
>
> I tried adding DETACHED_PROCESS (which should be like calling
> FreeConsole() in child process) and then I tried CREATE_NEW_CONSOLE
> instead, on top of CREATE_NEW_PROCESS_GROUP. Neither helped, though I
> lost the postmaster's output.

I think the issue with the process group is real, but independent of the failing test. Locally, just specifying CREATE_NEW_PROCESS_GROUP fixes the problem of a pg_ctl start'ed database being stopped after ctrl-c.

I think CI failing is due to a cirrus bug, assuming a bit too much similarity between unix and windows: https://github.com/cirruslabs/cirrus-ci-agent/issues/218#issuecomment-1059657781

As indicated in the message above it, there's a workaround. Not sure if worth committing, if they were to fix it in a few days? cfbot could be repaired by just adding a repo environment variable of CIRRUS_AGENT_VERSION 1.73.2... OTOH, committing and reverting a single line + comment is cheap.

Greetings,

Andres Freund
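For reference, the repository-level workaround mentioned above would look roughly like this if expressed as an in-tree .cirrus.yml fragment rather than a repo environment variable. The exact key placement is an assumption based on the linked cirrus-ci-agent issue, not a tested configuration:

```yaml
# Hypothetical .cirrus.yml fragment: pin the Cirrus CI agent to the last
# known-good version until the upstream fix ships. The variable name
# CIRRUS_AGENT_VERSION comes from the linked cirrus-ci-agent issue.
env:
  CIRRUS_AGENT_VERSION: 1.73.2
```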
Re: Adding CI to our tree (ccache)
Hi,

On 2022-03-03 22:56:15 -0600, Justin Pryzby wrote:
> On Sun, Feb 20, 2022 at 12:47:31PM -0800, Andres Freund wrote:
>> On 2022-02-20 13:36:55 -0600, Justin Pryzby wrote:
>>> Have you tried to use the yet-to-be-released ccache with MSVC ?
>>
>> Yes, it doesn't work, because it requires cl.exe to be used in a specific way
>> (only a single input file, specific output file naming). Which would require a
>> decent amount of changes to src/tools/msvc. I think it's more realistic with
>> meson etc.
>
> Did you get to the point that that causes a problem, or did you just realize
> that it was a limitation that seems to preclude its use ?

I tried to use it, but saw that no caching was happening, and debugged it. Which yielded that it can't be used due to the way output files are specified (and due to multiple files, but that can be prevented with an msbuild parameter).

> If so, could you send the branch/commit you had ?

This was in a local VM, not cirrus. I ended up making it work with the meson build, after a bit of fiddling. Although bypassing msbuild (by building with ninja, using cl.exe) is a larger win...

> The error I'm getting when I try to use ccache involves .rst files, which don't
> exist (and which ccache doesn't know how to find or ignore).
> https://cirrus-ci.com/task/5441491957972992

ccache has code to deal with response files. I suspect the problem here is rather that ccache expects the compiler as an argument.

> I gather this is the difference between "compiling with MSVC" and compiling
> with a visual studio project.

I doubt it's related to that.

Greetings,

Andres Freund
Re: [PoC] Let libpq reject unexpected authentication requests
Jacob Champion writes:
> $subject keeps coming up in threads. I think my first introduction to
> it was after the TLS injection CVE, and then it came up again in the
> pluggable auth thread. It's hard for me to generalize based on "sound
> bites", but among the proposals I've seen are
> 1. reject plaintext passwords
> 2. reject a configurable list of unacceptable methods
> 3. allow client and server to negotiate a method
> All of them seem to have merit.

Agreed.

> Here is my take on option 2, then: you get to choose exactly one method
> that the client will accept. If you want to use client certificates,
> use require_auth=cert. If you want to force SCRAM, use
> require_auth=scram-sha-256. If the server asks for something different,
> libpq will fail. If the server tries to get away without asking you for
> authentication, libpq will fail. There is no negotiation.

Seems reasonable, but I bet that for very little more code you could accept a comma-separated list of allowed methods; libpq already allows comma-separated lists for some other connection options. That seems like it'd be a useful increment of flexibility.

regards, tom lane
Re: Add the replication origin name and commit-LSN to logical replication worker errcontext
On Fri, Mar 4, 2022 at 6:02 PM Euler Taveira wrote:
> On Fri, Mar 4, 2022, at 2:54 AM, Amit Kapila wrote:
>> The LSN of the transaction that contains the change violating the
>> constraint and the replication origin name can be found from the
>> server log (LSN 0/14C0378 and replication origin pg_16395 in the
>> above case). To skip the transaction, the subscription needs to be
>> disabled temporarily by ALTER SUBSCRIPTION ... DISABLE first. Then,
>> the transaction can be skipped by calling the
>> pg_replication_origin_advance() function with the node_name (i.e.,
>> pg_16395) and the next LSN of the commit LSN (i.e., LSN 0/14C0379).
>
> You could also add:
>
>   After that the replication can be resumed by
>   ALTER SUBSCRIPTION ... ENABLE.
>
> Let's provide complete instructions.

+1. That sounds reasonable to me.

--
With Regards,
Amit Kapila.
[PoC] Let libpq reject unexpected authentication requests
Hello,

TL;DR: this patch lets you specify exactly one authentication method in the connection string, and libpq will fail the connection if the server doesn't use that method.

(This is not intended for PG15. I'm generally anxious about posting experimental work during a commitfest, but there's been enough conversation about this topic recently that I felt like it'd be useful to have code to point to.)

== Proposal and Alternatives ==

$subject keeps coming up in threads. I think my first introduction to it was after the TLS injection CVE, and then it came up again in the pluggable auth thread. It's hard for me to generalize based on "sound bites", but among the proposals I've seen are

1. reject plaintext passwords
2. reject a configurable list of unacceptable methods
3. allow client and server to negotiate a method

All of them seem to have merit. I'm personally motivated by the case brought up by the CVE: if I'm expecting client certificate authentication, it's not acceptable for the server to extract _any_ information about passwords from my system, whether they're plaintext, hashed, or SCRAM-protected. So I chose not to implement option 1. And option 3 looked like a lot of work to take on in an experiment without a clear consensus.

Here is my take on option 2, then: you get to choose exactly one method that the client will accept. If you want to use client certificates, use require_auth=cert. If you want to force SCRAM, use require_auth=scram-sha-256. If the server asks for something different, libpq will fail. If the server tries to get away without asking you for authentication, libpq will fail. There is no negotiation.

== Why Force Authn? ==

I think my decision to fail if the server doesn't authenticate might be controversial. It doesn't provide additional protection against active attack unless you're using a mutual authentication method (SCRAM), because you can't prove that the server actually did anything with its side of the handshake.
But this approach grew on me for a few reasons:

- When using SCRAM, it allows the client to force a server to
  authenticate itself, even when channel bindings aren't being used. (I
  really think it's weird that we let the server get away with that
  today.)

- In cases where you want to ensure that your actions are logged for
  later audit, you can be reasonably sure that you're connecting to a
  database that hasn't been configured with a `trust` setting.

- For cert authentication, it ensures that the server asked for a cert
  and that you actually sent one. This is more forward-looking; today,
  we always ask for a certificate from the client even if we don't use
  it. But if implicit TLS takes off, I'd expect to see more middleware,
  with more potential for misconfiguration.

== General Thoughts ==

I like that this approach fits nicely into the existing code. The majority of the patch just beefs up check_expected_areq(). The new flag that tracks whether or not we've authenticated is scattered around more than I would like, but I'm hopeful that some of the pluggable auth conversations will lead to nice refactoring opportunities for those internal helpers.

There's currently no way to prohibit client certificates from being sent. If my use case is "servers shouldn't be able to extract password info if the client expects certificates", then someone else may very well say "servers shouldn't be able to extract my client certificate if I wanted to use SCRAM". Likewise, this feature won't prevent a GSS authenticated channel -- but we do have gssencmode=disable, so I'm less concerned there. I made the assumption that a GSS encrypted channel authenticates both parties to each other, but I don't actually know what guarantees are made there. I have not tested SSPI.

I'm not a fan of the multiple spellings of "password" ("ldap", "pam", et al). My initial thought was that it'd be nice to match the client setting to the HBA setting in the server.
But I don't think it's really all that helpful; worst-case, it pretends to do something it can't, since the client can't determine what the plaintext password is actually used for on the backend. And if someone devises (say) a SASL scheme for proxied LDAP authentication, that spelling becomes obsolete.

Speaking of obsolete, the current implementation assumes that any SASL exchange must be for SCRAM. That won't be anywhere close to future-proof.

Thanks,
--Jacob

From 545a89aafacd0f997cd3e14cd20b192335eafadc Mon Sep 17 00:00:00 2001
From: Jacob Champion
Date: Mon, 28 Feb 2022 09:40:43 -0800
Subject: [PATCH] libpq: let client reject unexpected auth methods

The require_auth connection option allows the client to choose one (and only one) authentication type for use with the server. There is no negotiation: if the server does not present the expected authentication request, the connection fails.

Internally, the patch expands the role of check_expected_areq() to ensure that the incoming request matches conn->require_auth. It also
Re: Regression tests failures on Windows Server 2019 - on master at commit # d816f366b
On Sat, Mar 5, 2022 at 10:56 AM Andres Freund wrote:
> I suspect one also needs the console detach thing.

I tried adding DETACHED_PROCESS (which should be like calling FreeConsole() in child process) and then I tried CREATE_NEW_CONSOLE instead, on top of CREATE_NEW_PROCESS_GROUP. Neither helped, though I lost the postmaster's output.

Things I tried and their output are here: https://github.com/macdice/postgres/commits/hack3
Re: Proposal: Support custom authentication methods using hooks
>> I still don't understand why using plaintext password authentication
>> over SSL connection is considered insecure. Actually we have been
>> stating the opposite in the manual:
>> https://www.postgresql.org/docs/14/auth-password.html
>>
>> "If the connection is protected by SSL encryption then password can be
>> used safely, though."
>
> If you aren't doing client verification (i.e., cert in pg_hba) and are
> not doing verify-full on the client side then a man-in-the-middle
> attack on TLS is trivial, and the plaintext password will be
> sniffable.

So the plaintext password is safe if used with hostssl + verify-full (server side) and sslmode = verify-full (client side), right?

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp
Re: Regression tests failures on Windows Server 2019 - on master at commit # d816f366b
On 2022-03-04 14:04:44 -0800, Andres Freund wrote:
> Trying to make it use the current user now - I don't know what
> permissions are needed to run a service as a user or such.

My first attempt of using %USERNAME% failed:

  [22:10:20.862] c:\cirrus>call tmp_install\bin\pg_ctl.exe register -D tmp_check/db "-UContainerAdministrator"
  [22:10:20.889] pg_ctl: could not register service "PostgreSQL": error code 1057

It seems to require a domain for some reason:

  [22:33:54.599] c:\cirrus>call tmp_install\bin\pg_ctl.exe register -Dtmp_check/db -U "User Manager\ContainerAdministrator"

but that then doesn't start:

  [22:33:54.660] c:\cirrus>call net start PostgreSQL
  [22:33:55.887] System error 1068 has occurred.

So it indeed seems hard to start a service as the current user, at least with my limited windows knowledge.
Re: role self-revocation
On Fri, Mar 4, 2022 at 1:50 PM Robert Haas wrote:
> On Mon, Feb 28, 2022 at 2:09 PM Stephen Frost wrote:
>> The ability of a role to revoke itself from some other role is just
>> something we need to accept as being a change that needs to be made, and
>> I do believe that such a change is supported by the standard, in that a
>> REVOKE will only work if you have the right to make it as the user who
>> performed the GRANT in the first place.
>
> First, a quick overview of the issue for those who have not followed the
> earlier threads in their grueling entirety:
>
> rhaas=# create user boss;
> CREATE ROLE
> rhaas=# create user peon;
> CREATE ROLE
> rhaas=# grant peon to boss;
> GRANT ROLE
> rhaas=# \c - peon
> You are now connected to database "rhaas" as user "peon".
> rhaas=> revoke peon from boss; -- i don't like being bossed around!
> REVOKE ROLE

The wording for this example is hurting my brain.

GRANT admin TO joe;
\c admin
REVOKE admin FROM joe;

> I argue (and Stephen seems to agree) that the peon shouldn't be able
> to undo the superuser's GRANT.

I think I disagree. Or, at least, the superuser has full control of dictating how role membership is modified, and that seems sufficient. The example above works because of:

"A role is not considered to hold WITH ADMIN OPTION on itself, but it may grant or revoke membership in itself from a database session where the session user matches the role."

If a superuser doesn't want "admin" to modify its own membership then they can prevent anyone but a superuser from being able to have a session_user of "admin". If that happens then the only way a non-superuser can modify group membership is by being added to said group WITH ADMIN OPTION.

Now, if two people and a superuser are all doing membership management on the same group, and we want to add permission checks and multiple grants as tools, instead of having them just communicate with each other, then by all means let us do so.
In that case, in answer to questions 2 and 3, we should indeed track which session_user made the grant and only allow the same session_user or the superuser to revoke it (we'd want to stop "ignoring" the GRANTED BY clause of REVOKE ROLE FROM so the superuser at least could remove grants made via WITH ADMIN OPTION).

> 4. Should we apply this rule to other types of grants, rather than
> just to role membership? Why or why not? Consider this:
>
> The fact that the accountant may not be in favor of the auditor seeing
> what the accountant is doing with the money is precisely the reason why
> we have auditors. [...] However, when the superuser performs the GRANT
> as in the above example, the grantor is recorded as the table owner,
> not the superuser! So if we really want role membership and other kinds
> of grants to behave in the same way, we have our work cut out for us
> here.

Yes, this particular choice seems unfortunate, but also not something that I think is necessarily mandatory for us to improve. If the accountant is the owner then yes, they get to decide permissions. In the presence of an auditor role either you trust the accountant role to keep the permissions in place or you define a superior authority to both the auditor and accountant to be the owner. Or let the superuser manage everything by withholding login and WITH ADMIN OPTION privileges from the ownership role.

If we do extend role membership tracking, I suppose the design question is whether the new role grantor dependency tracking will have a superuser be the recorded grantor instead of some owner. Given that roles don't presently have an owner concept, consistency with existing permissions in this manner would be trickier. Because of this, I would probably leave role grantor tracking at the session_user level while database objects continue to emanate from the object owner.
The global vs database differences seem like a sufficient theoretical justification for the difference in implementation.

David J.
Re: ltree_gist indexes broken after pg_upgrade from 12 to 13
On 04.03.2022 23:28, Tom Lane wrote:
> Tomas Vondra writes:
>> On 3/4/22 20:29, Nikita Glukhov wrote:
>>> So, we probably have corrupted indexes that were updated since such
>>> "incomplete" upgrade of ltree.
>>
>> IIRC pg_upgrade is not expected to upgrade extensions - it keeps the
>> installed version of the extension, and that's intentional.
>
> Yeah, exactly.  But this opens up an additional consideration we
> have to account for: whatever we do needs to work with either 1.1
> or 1.2 SQL-level versions of the extension.
>
> regards, tom lane

It becomes clear that the ltree upgrade 1.1 => 1.2 is broken; the problem is not so much related to PG12 => PG13+ upgrades. The following example crashes PG13+:

CREATE EXTENSION ltree VERSION "1.1";
CREATE TABLE test AS SELECT i::text::ltree data FROM generate_series(1, 10) i;
CREATE INDEX ON test USING gist (data);

ALTER EXTENSION ltree UPDATE TO "1.2";

-- works, cached bytea options is still NULL and siglen is 28
SELECT * FROM test WHERE data = '12345';

\connect

-- crash, siglen = 8
SELECT * FROM test WHERE data = '12345';

You can see here another problem: installation of the opclass options procedure does not invalidate the relation's opclass options cache.

--
Nikita Glukhov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Re: Regression tests failures on Windows Server 2019 - on master at commit # d816f366b
Andres Freund writes:
> I don't really understand why start_postmaster() bothers to wrap postmaster
> start through cmd.exe, particularly when it prevents us from getting
> postmaster's pid. Also the caveats around cmd.exe and sharing mode.

I think the basic idea is to avoid having to write our own code to do the I/O redirections --- that's certainly why we use a shell on the Unix side. But it might well be that biting the bullet and handling redirection ourselves would be easier than coping with all these other problems.

regards, tom lane
Re: Regression tests failures on Windows Server 2019 - on master at commit # d816f366b
Hi,

On 2022-03-04 16:51:32 -0500, Tom Lane wrote:
> Andres Freund writes:
>> That fixed the immediate problem, but dblink, postgres_fdw tests failed:
>> +ERROR: could not establish connection
>> +DETAIL: connection to server at "localhost" (::1), port 5432 failed: FATAL:
>> role "SYSTEM" does not exist
>
> [ scratches head... ] Where is libpq getting that username from?
> Why is it different from whatever we'd determined during initdb?
> (Maybe a case-folding issue??)

When running as a service (via pg_ctl register) the default username to run under appears to be SYSTEM. Which then differs from the user that vcregress.pl runs under. Trying to make it use the current user now - I don't know what permissions are needed to run a service as a user or such.

>> The dblink and fdw failures can presumably be fixed by passing current_user as
>> the username. That seems like a good idea anyway?
>
> I don't think it's a good idea to hack that without understanding
> why it's suddenly going wrong.

I think I understand why - seems more a question of whether we want to support running tests against a server running as a different user.

Greetings,

Andres Freund
Re: Avoiding smgrimmedsync() during nbtree index builds
On Wed, Mar 2, 2022 at 8:09 PM Justin Pryzby wrote:
> Rebased to appease cfbot.
>
> I ran these patches under a branch which shows code coverage in cirrus. It
> looks good to my eyes.
> https://api.cirrus-ci.com/v1/artifact/task/5212346552418304/coverage/coverage/00-index.html

Thanks for doing that and for the rebase! Since I made updates, the attached version 6 is also rebased.

To Dmitry's question:

On Sun, Feb 13, 2022 at 9:33 AM Dmitry Dolgov <9erthali...@gmail.com> wrote:
> On Wed, Feb 09, 2022 at 01:49:30PM -0500, Melanie Plageman wrote:
> I don't see it in the discussion, so naturally curious -- why directmgr
> is not used for bloom index, e.g. in blbuildempty?

Thanks for pointing this out. blbuildempty() is also included now. bloom doesn't seem to use smgr* anywhere except blbuildempty(), so that is the only place I made changes in bloom index build.

v6 has the following updates/changes:

- removed erroneous extra calls to unbuffered_prep() and
  unbuffered_finish() in GiST and btree index builds

- removed unnecessary includes

- some comments were updated to accurately reflect use of directmgr

- smgrwrite doesn't WAL-log anymore. This one I'm not sure about. I
  think it makes sense that unbuffered_extend() of non-empty pages of
  WAL-logged relations (or the init fork of unlogged relations) do
  log_newpage(), but I wasn't so sure about smgrwrite(). Currently all
  callers of smgrwrite() do log_newpage() and anyone using mdwrite()
  will end up writing the whole page. However, it seems possible that a
  caller of the directmgr API might wish to do a write to a particular
  offset and, either because of that, or, for some other reason, require
  different logging than that done in log_newpage(). I didn't want to
  make a separate wrapper function for WAL-logging in directmgr because
  it felt like one more step to forget.

- heapam_relation_set_new_filenode doesn't use directmgr API anymore for
  unlogged relations.
It doesn't extend or write, so it seemed like a special case better left alone. Note that the ambuildempty() functions which also write to the init fork of an unlogged relation still use the directmgr API. It is a bit confusing because they pass do_wal=true to unbuffered_prep() even though they are unlogged relations, because they must log and fsync.

- interface changes to unbuffered_prep()

I removed the parameters to unbuffered_prep() which required the user to specify if fsync should be added to pendingOps or done with smgrimmedsync(). Understanding all of the combinations of these parameters and when they were needed was confusing, and the interface felt like a foot gun. Special cases shouldn't use this interface. I prefer the idea that users of this API expect that

1) empty pages won't be checksummed or WAL logged

2) for relations that are WAL-logged, when the build is done, the relation will be fsync'd by the backend (unless the fsync optimization is being used)

3) the only case in which fsync requests are added to the pendingOps queue is when the fsync optimization is being used (which saves the redo pointer and checks it later to determine if it needs to fsync itself)

I also added the parameter "do_wal" to unbuffered_prep() and the UnBufferedWriteState struct. This is used when extending the file to determine whether or not to log_newpage(). unbuffered_extend() and unbuffered_write() no longer take do_wal as a parameter. Note that callers need to pass do_wal=true/false to unbuffered_prep() based on whether or not they want log_newpage() called during unbuffered_extend() -- not simply based on whether or not the relation in question is WAL-logged.

do_wal is the only member of the UnBufferedWriteState struct in the first patch in the set, but I think it makes sense to keep the struct around since I anticipate that the patch containing the other members needed for the fsync optimization will be committed at the same time.
One final note on unbuffered_prep() -- I am thinking of renaming "do_optimization" to "try_optimization" or maybe "request_fsync_optimization". The interface (of unbuffered_prep()) would be better if we always tried to do the optimization when relevant (when the relation is WAL-logged).

- freespace map, visimap, and hash index don't use directmgr API anymore

For most use cases writing/extending outside shared buffers, it isn't safe to rely on requesting fsync from checkpointer. visimap, fsm, and hash index all request fsync from checkpointer for the pages they write with smgrextend() and don't smgrimmedsync() when finished adding pages (even when the hash index is WAL-logged). Supporting these exceptions made the interface incoherent, so I cut them.

- added unbuffered_extend_range()

This one is a bit unfortunate. Because GiST index build uses log_newpages() to log a whole page range but calls smgrextend() for each of those pages individually, I
Re: Regression tests failures on Windows Server 2019 - on master at commit # d816f366b
Hi,

On 2022-03-05 09:29:01 +1300, Thomas Munro wrote:
> On Sat, Mar 5, 2022 at 7:10 AM Andres Freund wrote:
>> Or perhaps pg_ctl ought to pass CREATE_NEW_PROCESS_GROUP to CreateProcess()?
>> The lack of a process group would explain why we're getting signalled on
>> ctrl-c...
>
> I thought that sounded promising, given that the recent Cirrus agent
> commit you pointed to says "Always kill spawned shell's process group
> to avoid pipe FD hangs", and given that we do something conceptually
> similar on Unix. It doesn't seem to help, though...
>
> https://cirrus-ci.com/task/5572163880091648

I suspect one also needs the console detach thing.

I don't really understand why start_postmaster() bothers to wrap postmaster start through cmd.exe, particularly when it prevents us from getting postmaster's pid. Also the caveats around cmd.exe and sharing mode.

Greetings,

Andres Freund
Re: Regression tests failures on Windows Server 2019 - on master at commit # d816f366b
Andres Freund writes:
> That fixed the immediate problem, but dblink, postgres_fdw tests failed:
> +ERROR: could not establish connection
> +DETAIL: connection to server at "localhost" (::1), port 5432 failed: FATAL:
> role "SYSTEM" does not exist

[ scratches head... ] Where is libpq getting that username from? Why is it different from whatever we'd determined during initdb? (Maybe a case-folding issue??)

> The dblink and fdw failures can presumably be fixed by passing current_user as
> the username. That seems like a good idea anyway?

I don't think it's a good idea to hack that without understanding why it's suddenly going wrong.

regards, tom lane
Re: Regression tests failures on Windows Server 2019 - on master at commit # d816f366b
Hi,

On 2022-03-04 09:57:35 -0800, Andres Freund wrote:
> On 2022-03-04 09:46:44 -0800, Andres Freund wrote:
>> On 2022-03-04 09:30:37 -0800, Andres Freund wrote:
>>> I wonder if we're missing some steps, at least on windows, to make pg_ctl
>>> start independent of the starting shell?
>>
>> Sure looks that way. On windows, if I do pg_ctl start, then hit ctrl-c, the
>> server shuts down.
>
> Short term the easiest fix might be to start postgres for those tests as a
> service. But it seems we should fix whatever the cause of that
> terminal-connectedness behaviour is.
>
> I'm out for ~2-3h. I started a test run with using a service just now:
> https://cirrus-ci.com/task/5519573792325632 but I very well might have typoed
> something...

That fixed the immediate problem, but dblink, postgres_fdw tests failed:

https://api.cirrus-ci.com/v1/artifact/task/5519573792325632/log/contrib/dblink/regression.diffs

  FROM dblink(connection_parameters(),'SELECT * FROM foo') AS t(a int, b text, c text[]) WHERE t.a > 7;
 - a | b |     c
 - 8 | i | {a8,b8,c8}
 - 9 | j | {a9,b9,c9}
 -(2 rows)
 -
 +ERROR: could not establish connection
 +DETAIL: connection to server at "localhost" (::1), port 5432 failed: FATAL: role "SYSTEM" does not exist

https://api.cirrus-ci.com/v1/artifact/task/5519573792325632/log/contrib/postgres_fdw/regression.diffs

 +ERROR: could not connect to server "loopback"

and it also seems to redirect logging to the event log without further configuration...

The dblink and fdw failures can presumably be fixed by passing current_user as the username. That seems like a good idea anyway?

Greetings,

Andres Freund
Re: role self-revocation
Robert Haas writes:
> 1. What should be the exact rule for whether A can remove a grant made
> by B? Is it has_privs_of_role()? is_member_of_role()? Something else?

No strong opinion here, but I'd lean slightly to the more restrictive
option.

> 2. What happens if the same GRANT is enacted by multiple users? For
> example, suppose peon does "GRANT peon to boss" and then the superuser
> does the same thing afterwards, or vice versa? One design would be to
> try to track those as two separate grants, but I'm not sure if we want
> to add that much complexity, since that's not how we do it now and it
> would, for example, implicate the choice of PK on the pg_auth_members
> table.

As you note later, we *do* track such grants separately in ordinary
ACLs, and I believe this is clearly required by the SQL spec. It says
(for privileges on objects):

    Each privilege is represented by a privilege descriptor. A privilege
    descriptor contains:
    — The identification of the object on which the privilege is granted.
    — The authorization identifier of the grantor of the privilege.
                                           ^^^
    — The authorization identifier of the grantee of the privilege.
    — Identification of the action that the privilege allows.
    — An indication of whether or not the privilege is grantable.
    — An indication of whether or not the privilege has the WITH
      HIERARCHY OPTION specified.

Further down (4.42.3 in SQL:2021), the granting of roles is described,
and that says:

    Each role authorization is described by a role authorization
    descriptor. A role authorization descriptor includes:
    — The role name of the role.
    — The authorization identifier of the grantor.
    — The authorization identifier of the grantee.
    — An indication of whether or not the role authorization is grantable.

If we are not tracking the grantors of role authorizations, then we are
doing it wrong and we ought to fix that.

> 3. What happens if a user is dropped after being recorded as a
> grantor?

Should work the same as it does now for ordinary ACLs, ie, you gotta
drop the grant first.

> 4. Should we apply this rule to other types of grants, rather than
> just to role membership?

I am not sure about the reasoning behind the existing rule that
superuser-granted privileges are recorded as being granted by the object
owner. It does feel more like a wart than something we want. It might
have been a hack to deal with the lack of GRANTED BY options in
GRANT/REVOKE back in the day. Changing it could have some bad
compatibility consequences though. In particular, I believe it would
break existing pg_dump files, in that after restore all privileges would
be attributed to the restoring superuser, and there'd be no very easy
way to clean that up.

> Please note that it is not really my intention to try to shove
> anything into v15 here.

Agreed, this is not something to move on quickly. We might want to
think about adjusting pg_dump to use explicit GRANTED BY options in
GRANT/REVOKE a release or two before making incompatible changes.

regards, tom lane
role self-revocation
On Mon, Feb 28, 2022 at 2:09 PM Stephen Frost wrote: > The ability of a role to revoke itself from some other role is just > something we need to accept as being a change that needs to be made, and > I do believe that such a change is supported by the standard, in that a > REVOKE will only work if you have the right to make it as the user who > performed the GRANT in the first place. Moving this part of the discussion to a new thread to reduce confusion and hopefully get broader input on this topic. It seems like Stephen and I agree in principle that some change here is a good idea. If anyone else thinks it's a bad idea, then this would be a great time to mention that, ideally with reasons. If you agree that it's a good idea, then it would be great to have your views on the follow-up questions which I shall pose below. To the extent that it is reasonably possible to do so, I would like to try to keep focused on specific design questions rather than getting tangled up in general discussion of long-term direction. First, a quick overview of the issue for those who have not followed the earlier threads in their grueling entirety: rhaas=# create user boss; CREATE ROLE rhaas=# create user peon; CREATE ROLE rhaas=# grant peon to boss; GRANT ROLE rhaas=# \c - peon You are now connected to database "rhaas" as user "peon". rhaas=> revoke peon from boss; -- i don't like being bossed around! REVOKE ROLE I argue (and Stephen seems to agree) that the peon shouldn't be able to undo the superuser's GRANT. Furthermore, we also seem to agree that you don't necessarily have to be the exact user who performed the grant. For example, it would be shocking if one superuser couldn't remove a grant made by another superuser, or for that matter if a superuser couldn't remove a grant made by a non-superuser. But there are a few open questions in my mind: 1. What should be the exact rule for whether A can remove a grant made by B? Is it has_privs_of_role()? is_member_of_role()? 
Something else? 2. What happens if the same GRANT is enacted by multiple users? For example, suppose peon does "GRANT peon to boss" and then the superuser does the same thing afterwards, or vice versa? One design would be to try to track those as two separate grants, but I'm not sure if we want to add that much complexity, since that's not how we do it now and it would, for example, implicate the choice of PK on the pg_auth_members table. An idea that occurs to me is to say that the first GRANT works and becomes the grantor of record, and any duplicate GRANT that happens later issues a NOTICE without changing anything. If the user performing the later GRANT has sufficient privileges and wishes to do so, s/he can REVOKE first and then re-GRANT. On the other hand, for other types of grants, like table privileges, we do track multiple grants by different users, so maybe we should do the same thing here: rhaas=# create table example (a int, b int); CREATE TABLE rhaas=# grant select on table example to foo with grant option; GRANT rhaas=# grant select on table example to bar with grant option; GRANT rhaas=# \c - foo You are now connected to database "rhaas" as user "foo". rhaas=> grant select on table example to exemplar; GRANT rhaas=> \c - bar You are now connected to database "rhaas" as user "bar". rhaas=> grant select on table example to exemplar; GRANT rhaas=> select relacl from pg_class where relname = 'example'; relacl --- {rhaas=arwdDxt/rhaas,foo=r*/rhaas,bar=r*/rhaas,exemplar=r/foo,exemplar=r/bar} (1 row) 3. What happens if a user is dropped after being recorded as a grantor? We actually have a grantor column in pg_auth_members today, but it's not properly maintained. If the grantor is dropped the OID remains in the table, and could eventually end up pointing to some other user if the OID counter wraps around and a new role is created with the same OID. That's completely unacceptable for something we want to use for any serious purpose. 
I suggest that what ought to happen is the role should acquire a dependency on the grant, such that DROP fails and the GRANT is listed as something to be dropped, and DROP OWNED BY drops the GRANT. I think this would require adding an OID column to pg_auth_members so that a dependency can point to it, which sounds like a significant infrastructure change that would need to be carefully validated for adverse side effects, but not a huge crazy problem that we can't get past. 4. Should we apply this rule to other types of grants, rather than just to role membership? Why or why not? Consider this: rhaas=# create user accountant; CREATE ROLE rhaas=# create user auditor; CREATE ROLE rhaas=# create table money (a int, b text); CREATE TABLE rhaas=# alter table money owner to accountant; ALTER TABLE rhaas=# grant select on table money to auditor; GRANT rhaas=# \c - accountant You are now connected to database
Re: Regression tests failures on Windows Server 2019 - on master at commit # d816f366b
On Sat, Mar 5, 2022 at 7:10 AM Andres Freund wrote: > Or perhaps pg_ctl ought to pass CREATE_NEW_PROCESS_GROUP to CreateProcess()? > The lack of a process group would explain why we're getting signalled on > ctrl-c... I thought that sounded promising, given that the recent Cirrus agent commit you pointed to says "Always kill spawned shell's process group to avoid pipe FD hangs", and given that we do something conceptually similar on Unix. It doesn't seem to help, though... https://cirrus-ci.com/task/5572163880091648
Re: ltree_gist indexes broken after pg_upgrade from 12 to 13
Tomas Vondra writes: > On 3/4/22 20:29, Nikita Glukhov wrote: >> So, we probably have corrupted indexes that were updated since such >> "incomplete" upgrade of ltree. > IIRC pg_upgrade is not expected to upgrade extensions - it keeps the > installed version of the extension, and that's intentional. Yeah, exactly. But this opens up an additional consideration we have to account for: whatever we do needs to work with either 1.1 or 1.2 SQL-level versions of the extension. regards, tom lane
Re: SQL/JSON: JSON_TABLE
On 3/4/22 13:13, Erikjan Rijkers wrote: > Op 04-03-2022 om 17:33 schreef Andrew Dunstan: >> > >> This set of patches deals with items 1..7 above, but not yet the ERROR >> ON ERROR issue. It also makes some message cleanups, but there is more >> to come in that area. >> >> It is based on the latest SQL/JSON Functions patch set, which does not >> include the sql_json GUC patch. >> > > [0001-SQL-JSON-functions-without-sql_json-GUC-v56.patch] > > [0002-JSON_TABLE-v56.patch] > > [0003-JSON_TABLE-PLAN-DEFAULT-clause-v56.patch] > > [0004-JSON_TABLE-PLAN-clause-v56.patch] > > Hi, > > I quickly tried the tests I already had and there are two statements > that stopped working: > > testdb=# SELECT JSON('{"a": 123, "b": [true, "foo"], "a2": "bar"}' > RETURNING jsonb); > ERROR: syntax error at or near "RETURNING" > LINE 1: ...SON('{"a": 123, "b": [true, "foo"], "a2": "bar"}' RETURNING > ... > > testdb=# select JSON_SCALAR(123.45 returning jsonb); > ERROR: syntax error at or near "returning" > LINE 1: select JSON_SCALAR(123.45 returning jsonb) > > (the '^' pointer in both cases underneath 'RETURNING' > > > Yes, you're right, that was implemented as part of the GUC patch. I'll try to split that out and send new patchsets with the RETURNING clause but without the GUC (see upthread for reasons) cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com
Re: ltree_gist indexes broken after pg_upgrade from 12 to 13
On 3/4/22 20:29, Nikita Glukhov wrote: > On 25.02.2022 00:15, Tomas Vondra wrote: > >> Hi, >> >> Victor Yegorov reported a crash related to a GiST index on ltree [1], >> following a pg_upgrade from 12.x to 14.x, with a data set reproducing >> this. I spent some time investigating this, and it seems this is a silly >> bug in commit >> >> commit 911e70207703799605f5a0e8aad9f06cff067c63 (HEAD) >> Author: Alexander Korotkov >> Date: Mon Mar 30 19:17:11 2020 +0300 >> >> Implement operator class parameters >> ... >> >> in PG13, which modified ltree_gist so that siglen is opclass parameter >> (and not hard-coded). But the procedures use LTREE_GET_ASIGLEN macro to >> extract the value, which either extracts the value from the catalog (if >> the index has opclass parameter) or uses a default value - but it always >> uses LTREE_ASIGLEN_DEFAULT, which belongs to _ltree_gist opclass (i.e. >> for array of ltree). And that's 28 instead of the 8, as it should be. > > It seems that ltree extension simply was not updated to v1.2 after > pg_upgrade: > ALTER EXTENSION ltree UPDATE TO '1.2'; -- is missing > > Upgrade script ltree--1.1--1.2.sql creates ltree_gist_options() and > registers it in the opclass. ltree_gist_options() initializes bytea > options using the correct SIGLEN_DEFAULT=8. > > If ltree_gist_options() is absent, LTREE_GET_ASIGLEN() receives NULL > and wrong LTREE_ASIGLEN_DEFAULT is used. But if ltree_gist_options() > is registered, LTREE_GET_ASIGLEN() receives non-NULL bytea options > with the correct default value. > > > So, we probably have corrupted indexes that were updated since such > "incomplete" upgrade of ltree. > IIRC pg_upgrade is not expected to upgrade extensions - it keeps the installed version of the extension, and that's intentional. Moreover, it's perfectly legal to install older version, so you can do CREATE EXTENSION ltree VERSION '1.1'; So I don't think we can call this "incomplete upgrade". 
> > Also I found that contrib/pg_trgm/trgm_gist.c uses wrongly named macro > LTREE_GET_ASIGLEN(). Does it? When I grep for LTREE_GET_ASIGLEN, I don't get any matches in pg_trgm. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
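For readers following the thread: the siglen being discussed is the opclass parameter added in ltree 1.2, settable per index. A sketch of the two situations (the siglen value here is illustrative):

```sql
-- With ltree 1.2, the signature length is an explicit opclass parameter:
CREATE INDEX path_gist_idx ON test USING gist (path gist_ltree_ops (siglen = 8));

-- With ltree 1.1 (e.g. as kept in place by pg_upgrade), there is no options
-- procedure, so the GiST support functions fall back to a compiled-in
-- default -- which is where the wrong LTREE_ASIGLEN_DEFAULT (28 instead
-- of 8) comes into play:
CREATE INDEX path_gist_idx2 ON test USING gist (path);
```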
Re: Allow async standbys wait for sync replication
On Wed, Mar 02, 2022 at 09:47:09AM +0530, Bharath Rupireddy wrote: > On Wed, Mar 2, 2022 at 2:57 AM Nathan Bossart > wrote: >> I think there are a couple of advantages. For one, spinning is probably >> not the best from a resource perspective. > > Just to be on the same page - by spinning do you mean - the async > walsender waiting for the sync flushLSN in a for-loop with > WaitLatch()? Yes. >> Also, this approach might fit in better >> with the existing synchronous replication framework. When a WAL sender >> realizes that it can't send up to the current "flush" LSN because it's not >> synchronously replicated, it will request to be alerted when it is. > > I think you are referring to the way a backend calls SyncRepWaitForLSN > and waits until any one of the walsender sets syncRepState to > SYNC_REP_WAIT_COMPLETE in SyncRepWakeQueue. Firstly, SyncRepWaitForLSN > blocking i.e. the backend spins/waits in for (;;) loop until its > syncRepState becomes SYNC_REP_WAIT_COMPLETE. The backend doesn't do > any other work but waits. So, spinning isn't avoided completely. > > Unless, I'm missing something, the existing syc repl queue > (SyncRepQueue) mechanism doesn't avoid spinning in the requestors > (backends) SyncRepWaitForLSN or in the walsenders SyncRepWakeQueue. My point is that there are existing tools for alerting processes when an LSN is synchronously replicated and for waking up WAL senders. What I am proposing wouldn't involve spinning in XLogSendPhysical() waiting for synchronous replication. Like SyncRepWaitForLSN(), we'd register our LSN in the queue (SyncRepQueueInsert()), but we wouldn't sit in a separate loop waiting to be woken. Instead, SyncRepWakeQueue() would eventually wake up the WAL sender and trigger another iteration of WalSndLoop(). -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
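To make the control flow being proposed concrete, here is a much-simplified model in plain Python (purely illustrative, not the actual SyncRep code): waiters register an LSN in a queue, and the syncrep machinery later wakes exactly those waiters whose LSN has been synchronously flushed, instead of each waiter spinning on its own.

```python
import heapq

class SyncRepQueue:
    """Toy model of the register-then-wake pattern: park waiters on an
    LSN, wake them when that LSN is synchronously replicated."""

    def __init__(self):
        self._waiters = []   # min-heap of (lsn, seq, wake_callback)
        self._seq = 0        # tie-breaker so equal LSNs never compare callbacks

    def insert(self, lsn, wake_callback):
        # Analogue of SyncRepQueueInsert(): park a waiter on its LSN
        # instead of letting it spin in its own wait loop.
        heapq.heappush(self._waiters, (lsn, self._seq, wake_callback))
        self._seq += 1

    def wake_up_to(self, flush_lsn):
        # Analogue of SyncRepWakeQueue(): wake every waiter whose LSN is
        # now synchronously replicated; the rest stay parked.
        woken = 0
        while self._waiters and self._waiters[0][0] <= flush_lsn:
            _, _, wake = heapq.heappop(self._waiters)
            wake()           # e.g. set a WAL sender's latch; WalSndLoop iterates
            woken += 1
        return woken

q = SyncRepQueue()
notified = []
q.insert(100, lambda: notified.append(100))
q.insert(250, lambda: notified.append(250))
assert q.wake_up_to(200) == 1   # only LSN 100 is flushed so far
assert notified == [100]
```

The point of the model is that no waiter polls: progress is driven entirely by the wake call, which matches the "trigger another iteration of WalSndLoop()" behavior described above.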
Re: ltree_gist indexes broken after pg_upgrade from 12 to 13
On 25.02.2022 00:15, Tomas Vondra wrote:
> Hi,
>
> Victor Yegorov reported a crash related to a GiST index on ltree [1],
> following a pg_upgrade from 12.x to 14.x, with a data set reproducing
> this. I spent some time investigating this, and it seems this is a silly
> bug in commit
>
> commit 911e70207703799605f5a0e8aad9f06cff067c63 (HEAD)
> Author: Alexander Korotkov
> Date: Mon Mar 30 19:17:11 2020 +0300
>
> Implement operator class parameters
> ...
>
> in PG13, which modified ltree_gist so that siglen is opclass parameter
> (and not hard-coded). But the procedures use LTREE_GET_ASIGLEN macro to
> extract the value, which either extracts the value from the catalog (if
> the index has opclass parameter) or uses a default value - but it always
> uses LTREE_ASIGLEN_DEFAULT, which belongs to _ltree_gist opclass (i.e.
> for array of ltree). And that's 28 instead of the 8, as it should be.

It seems that ltree extension simply was not updated to v1.2 after
pg_upgrade:

    ALTER EXTENSION ltree UPDATE TO '1.2';  -- is missing

Upgrade script ltree--1.1--1.2.sql creates ltree_gist_options() and
registers it in the opclass. ltree_gist_options() initializes bytea
options using the correct SIGLEN_DEFAULT=8.

If ltree_gist_options() is absent, LTREE_GET_ASIGLEN() receives NULL
and wrong LTREE_ASIGLEN_DEFAULT is used. But if ltree_gist_options()
is registered, LTREE_GET_ASIGLEN() receives non-NULL bytea options
with the correct default value.

So, we probably have corrupted indexes that were updated since such
"incomplete" upgrade of ltree.

Also I found that contrib/pg_trgm/trgm_gist.c uses wrongly named macro
LTREE_GET_ASIGLEN().

--
Nikita Glukhov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Re: Proposal: Support custom authentication methods using hooks
On Thu, 2022-03-03 at 11:12 +0100, Peter Eisentraut wrote: > At the moment, it is not possible to judge whether the hook interface > you have chosen is appropriate. > > I suggest you actually implement the Azure provider, then make the hook > interface, and then show us both and we can see what to do with it. To add a data point here, I've rebased my OAUTHBEARER experiment [1] on top of this patchset. (That should work with Azure's OIDC provider, and if it doesn't, I'd like to know why.) After the port, here are the changes I still needed to carry in the backend to get the tests passing: - I needed to add custom HBA options to configure the provider. - I needed to declare usermap support so that my provider could actually use check_usermap(). - I had to modify the SASL mechanism registration to allow a custom maximum message length, but I think that's not the job of Samay's proposal to fix; it's just a needed improvement to CheckSASLAuth(). Obviously, the libpq frontend still needs to understand how to speak the new SASL mechanism. There are third-party SASL implementations that are plugin-based, which could potentially ease the pain here, at the expense of a major dependency and a very new distribution model. --Jacob [1] https://www.postgresql.org/message-id/flat/d1b467a78e0e36ed85a09adf979d04cf124a9d4b.camel%40vmware.com
Re: Proposal: Support custom authentication methods using hooks,Re: Proposal: Support custom authentication methods using hooks
On Thu, Mar 3, 2022 at 11:50 PM Tatsuo Ishii wrote: > > >> So, dropping plaintext password authentication support from libpq will > >> make it impossible for users to use the former method. > > > > Yes, just like dropping support for md5 would make it impossible for > > users to have their passwords be hashed with md5, which is an altogether > > good thing considering how easy it is to brute-force md5 these days. > > I still don't understand why using plaintext password authentication > over SSL connection is considered insecure. Actually we have been > stating opposite in the manual: > https://www.postgresql.org/docs/14/auth-password.html > > "If the connection is protected by SSL encryption then password can be > used safely, though." If you aren't doing client verification (i.e., cert in pg_hba) and are not doing verify-full on the client side then a man-in-the-middle attack on TLS is trivial, and the plaintext password will be sniffable. The documentation should be updated to say under no circumstances is this safe.
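To make the point concrete: password authentication over TLS is only trustworthy when the client actually verifies the server's identity. A minimal libpq connection-string sketch (host, user, and paths are placeholders):

```
# sslmode=verify-full makes libpq verify the server certificate chain AND
# that the hostname matches the certificate, which defeats the trivial
# man-in-the-middle attack described above.
host=db.example.com dbname=app user=alice sslmode=verify-full sslrootcert=/path/to/root.crt
```

With weaker settings (sslmode=require or below), the channel is encrypted but the peer is unauthenticated, so an interposed attacker can terminate TLS itself and capture the cleartext password.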
Re: SQL/JSON: JSON_TABLE
Op 04-03-2022 om 17:33 schreef Andrew Dunstan: This set of patches deals with items 1..7 above, but not yet the ERROR ON ERROR issue. It also makes some message cleanups, but there is more to come in that area. It is based on the latest SQL/JSON Functions patch set, which does not include the sql_json GUC patch. > [0001-SQL-JSON-functions-without-sql_json-GUC-v56.patch] > [0002-JSON_TABLE-v56.patch] > [0003-JSON_TABLE-PLAN-DEFAULT-clause-v56.patch] > [0004-JSON_TABLE-PLAN-clause-v56.patch] Hi, I quickly tried the tests I already had and there are two statements that stopped working: testdb=# SELECT JSON('{"a": 123, "b": [true, "foo"], "a2": "bar"}' RETURNING jsonb); ERROR: syntax error at or near "RETURNING" LINE 1: ...SON('{"a": 123, "b": [true, "foo"], "a2": "bar"}' RETURNING ... testdb=# select JSON_SCALAR(123.45 returning jsonb); ERROR: syntax error at or near "returning" LINE 1: select JSON_SCALAR(123.45 returning jsonb) (the '^' pointer in both cases underneath 'RETURNING' thanks, Erik Rijkers cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com
Re: Regression tests failures on Windows Server 2019 - on master at commit # d816f366b
Hi, On 2022-03-04 09:57:35 -0800, Andres Freund wrote: > On 2022-03-04 09:46:44 -0800, Andres Freund wrote: > > On 2022-03-04 09:30:37 -0800, Andres Freund wrote: > > > I wonder if we're missing some steps, at least on windows, to make pg_ctl > > > start independent of the starting shell? > > > > Sure looks that way. On windows, if I do pg_ctl start, then hit ctrl-c, the > > server shuts down. > > Short term the easiest fix might be to start postgres for those tests as a > service. But it seems we should fix whatever the cause of that > terminal-connectedness behaviour is. > > I'm out for ~2-3h. I started a test run with using a service just now: > https://cirrus-ci.com/task/5519573792325632 but I very well might have typoed > something... Seems to have worked for the first few tests at least. Unless somebody wants to clean up that commit and push it, I'll do so once I'm back. Perhaps pg_ctl needs to call FreeConsole() or such? https://docs.microsoft.com/en-us/windows/console/closing-a-console Or perhaps pg_ctl ought to pass CREATE_NEW_PROCESS_GROUP to CreateProcess()? The lack of a process group would explain why we're getting signalled on ctrl-c... Greetings, Andres Freund
Re: Regression tests failures on Windows Server 2019 - on master at commit # d816f366b
On 2022-03-04 09:46:44 -0800, Andres Freund wrote: > On 2022-03-04 09:30:37 -0800, Andres Freund wrote: > > I wonder if we're missing some steps, at least on windows, to make pg_ctl > > start independent of the starting shell? > > Sure looks that way. On windows, if I do pg_ctl start, then hit ctrl-c, the > server shuts down. Short term the easiest fix might be to start postgres for those tests as a service. But it seems we should fix whatever the cause of that terminal-connectedness behaviour is. I'm out for ~2-3h. I started a test run with using a service just now: https://cirrus-ci.com/task/5519573792325632 but I very well might have typoed something...
Re: Regression tests failures on Windows Server 2019 - on master at commit # d816f366b
Hi, On 2022-03-04 09:30:37 -0800, Andres Freund wrote: > I wonder if we're missing some steps, at least on windows, to make pg_ctl > start independent of the starting shell? Sure looks that way. On windows, if I do pg_ctl start, then hit ctrl-c, the server shuts down. Greetings, Andres Freund
Re: Regression tests failures on Windows Server 2019 - on master at commit # d816f366b
Andres Freund writes: > On 2022-03-04 12:23:43 -0500, Tom Lane wrote: >> pg_regress thinks the postmaster is listening to a Unix socket. > pg_regress outputs the above message when neither PGHOST, PGPORT or > --host/--port are set. On windows that nevertheless ends up with connecting to > localhost. Yeah, I just traced that down. pg_regress was not updated when libpq's behavior was changed for platform-varying DEFAULT_PGSOCKET_DIR. I'll go fix that, but you're right that it's cosmetic. regards, tom lane
Re: Regression tests failures on Windows Server 2019 - on master at commit # d816f366b
Hi, On 2022-03-04 12:23:43 -0500, Tom Lane wrote: > Actually ... looking closer at the failure log: > > [09:41:50.589] Checking plpgsql > [09:41:50.608] (using postmaster on Unix socket, default port) > ^^^ > [09:41:50.608] == dropping database "pl_regression" > == > [09:41:52.731] psql: error: connection to server at "localhost" (::1), port > 5432 failed: Connection refused (0x274D/10061) > [09:41:52.731]Is the server running on that host and accepting TCP/IP > connections? > [09:41:52.731] connection to server at "localhost" (127.0.0.1), port 5432 > failed: Connection refused (0x274D/10061) > [09:41:52.731]Is the server running on that host and accepting TCP/IP > connections? > > pg_regress thinks the postmaster is listening to a Unix socket. > Maybe it's *only* listening to a Unix socket. In any case, > psql has very clearly not been told to use a Unix socket. > Something is not wired up correctly there. I saw that too, but I don't think it's the primary problem. The server is listening on ::1 as we know from the log, and quite evidently psql is trying to connect to that too. This output also looked this way before the failures. pg_regress outputs the above message when neither PGHOST, PGPORT or --host/--port are set. On windows that nevertheless ends up with connecting to localhost. Greetings, Andres Freund
Re: Regression tests failures on Windows Server 2019 - on master at commit # d816f366b
Hi, On 2022-03-04 09:10:14 -0800, Andres Freund wrote: > > I find it a little suspicious that startcreate is merrily starting > > the postmaster on port 5432, without concern for the possibility > > that that port is in use or blocked by a packet filter. > > It's a container that's just created for that run, so there shouldn't be > anything else running. We also see that the server successfully starts: > > 2022-03-03 23:24:15.261 GMT [5504][postmaster] LOG: starting PostgreSQL > 15devel, compiled by Visual C++ build 1929, 64-bit > 2022-03-03 23:24:15.270 GMT [5504][postmaster] LOG: listening on IPv6 > address "::1", port 5432 > 2022-03-03 23:24:15.270 GMT [5504][postmaster] LOG: listening on IPv4 > address "127.0.0.1", port 5432 > 2022-03-03 23:24:15.321 GMT [5808][startup] LOG: database system was shut > down at 2022-03-03 23:24:15 GMT > 2022-03-03 23:24:15.341 GMT [5504][postmaster] LOG: database system is ready > to accept connections Oh, huh. I added a printout of the running tasks, and it sure looks like postgres is not running anymore when plcheck is run, without anything ending up in the server log... It looks like the CI agent had some changes about stdin / stdout of the shells used to start scripts. The version between the last successful run and the first failing run changed. https://github.com/cirruslabs/cirrus-ci-agent/commits/master I wonder if we're missing some steps, at least on windows, to make pg_ctl start independent of the starting shell? Greetings, Andres Freund
Re: Regression tests failures on Windows Server 2019 - on master at commit # d816f366b
Andres Freund writes: > On 2022-03-04 09:54:28 -0500, Tom Lane wrote: >> I find it a little suspicious that startcreate is merrily starting >> the postmaster on port 5432, without concern for the possibility >> that that port is in use or blocked by a packet filter. > It's a container that's just created for that run, so there shouldn't be > anything else running. We also see that the server successfully starts: True, so the port was free. The postmaster wouldn't know whether there's a relevant packet filter though. The reason I'm harping on this is that it's hard to see why this postmaster start would fail before anything was actually done, when several previous starts on other ports behaved just fine. Actually ... looking closer at the failure log: [09:41:50.589] Checking plpgsql [09:41:50.608] (using postmaster on Unix socket, default port) ^^^ [09:41:50.608] == dropping database "pl_regression" == [09:41:52.731] psql: error: connection to server at "localhost" (::1), port 5432 failed: Connection refused (0x274D/10061) [09:41:52.731] Is the server running on that host and accepting TCP/IP connections? [09:41:52.731] connection to server at "localhost" (127.0.0.1), port 5432 failed: Connection refused (0x274D/10061) [09:41:52.731] Is the server running on that host and accepting TCP/IP connections? pg_regress thinks the postmaster is listening to a Unix socket. Maybe it's *only* listening to a Unix socket. In any case, psql has very clearly not been told to use a Unix socket. Something is not wired up correctly there. regards, tom lane
Re: Regression tests failures on Windows Server 2019 - on master at commit # d816f366b
Hi, On 2022-03-04 09:54:28 -0500, Tom Lane wrote: > Thomas Munro writes: > > https://github.com/postgres/postgres/commits/master > > > I don't see what that patch has to do with the symptoms. > > The buildfarm is not unhappy, so I'd be looking at what changed > recently in the cfbot's setup. Very odd. > I find it a little suspicious that startcreate is merrily starting > the postmaster on port 5432, without concern for the possibility > that that port is in use or blocked by a packet filter. It's a container that's just created for that run, so there shouldn't be anything else running. We also see that the server successfully starts: 2022-03-03 23:24:15.261 GMT [5504][postmaster] LOG: starting PostgreSQL 15devel, compiled by Visual C++ build 1929, 64-bit 2022-03-03 23:24:15.270 GMT [5504][postmaster] LOG: listening on IPv6 address "::1", port 5432 2022-03-03 23:24:15.270 GMT [5504][postmaster] LOG: listening on IPv4 address "127.0.0.1", port 5432 2022-03-03 23:24:15.321 GMT [5808][startup] LOG: database system was shut down at 2022-03-03 23:24:15 GMT 2022-03-03 23:24:15.341 GMT [5504][postmaster] LOG: database system is ready to accept connections Greetings, Andres Freund
Re: Make unlogged table resets detectable
Is this patch targeting pg15? There's no discussion since June. Latest at 2021-06-08 21:29:25 by Jeff Davis

2022-02-02 16:37:58  Julien Rouhaud (rjuju)             Closed in commitfest 2022-01 with status: Moved to next CF
2021-12-03 06:18:05  Michael Paquier (michael-kun)      Closed in commitfest 2021-11 with status: Moved to next CF
2021-10-04 16:32:49  Jaime Casanova (jcasanov)          Closed in commitfest 2021-09 with status: Moved to next CF
2021-08-03 02:29:40  Masahiko Sawada (masahikosawada)   Closed in commitfest 2021-07 with status: Moved to next CF
Re: Add LZ4 compression in pg_dump
The patch is failing on cfbot/freebsd. http://cfbot.cputube.org/georgios-kokolatos.html Also, I wondered if you'd looked at the "high compression" interfaces in lz4hc.h ? Should pg_dump use that ? On Fri, Feb 25, 2022 at 08:03:40AM -0600, Justin Pryzby wrote: > Thanks for working on this. Your 0001 looks similar to what I did for zstd > 1-2 > years ago. > https://commitfest.postgresql.org/32/2888/ > > I rebased and attached the latest patches I had in case they're useful to you. > I'd like to see zstd included in pg_dump eventually, but it was too much work > to shepherd the patches. Now that seems reasonable for pg16. > > With the other compression patches I've worked on, we've used an extra patch > with changes the default to the new compression algorithm, to force cfbot to > exercize the new code. > > Do you know the process with commitfests and cfbot ? > There's also this, which allows running the tests on cirrus before mailing the > patch to the hackers list. > ./src/tools/ci/README
Re: Regression tests failures on Windows Server 2019 - on master at commit # d816f366b
Thomas Munro writes: > On Fri, Mar 4, 2022 at 10:37 PM Bharath Rupireddy > wrote: >> It looks like regression tests are failing on Windows Server 2019 [1]. >> Ignore if it's reported elsewhere. > It's broken since 46ab07ff "Clean up assorted failures under clang's > -fsanitize=undefined checks.": > https://github.com/postgres/postgres/commits/master > I don't see what that patch has to do with the symptoms. The buildfarm is not unhappy, so I'd be looking at what changed recently in the cfbot's setup. I find it a little suspicious that startcreate is merrily starting the postmaster on port 5432, without concern for the possibility that that port is in use or blocked by a packet filter. regards, tom lane
Re: New Table Access Methods for Multi and Single Inserts
On Mon, 19 Apr 2021 at 06:52, Bharath Rupireddy wrote: > > On Mon, Apr 5, 2021 at 9:49 AM Bharath Rupireddy > wrote: > > > > On Wed, Mar 10, 2021 at 10:21 AM Bharath Rupireddy > > wrote: > > > Attaching the v4 patch set. Please review it further. > > > > Attaching v5 patch set after rebasing onto the latest master. > > Another rebase due to conflicts in 0003. Attaching v6 for review. I recently touched the topic of multi_insert, and I remembered this patch. I had to dig a bit to find it, but as it's still open I've added some comments: > diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h > +#define MAX_BUFFERED_TUPLES 1000 > +#define MAX_BUFFERED_BYTES 65535 It looks like these values were copied over from copyfrom.c, but are now expressed in the context of the heapam. As these values are now heap-specific (as opposed to the TableAM-independent COPY infrastructure), shouldn't we instead optimize for heap page insertions? That is, I suggest using a multiple (1 or more) of MaxHeapTuplesPerPage for _TUPLES, and that same / a similar multiple of BLCKSZ for MAX_BUFFERED_BYTES. > TableInsertState->flushed > TableInsertState->mi_slots I don't quite like the current storage-and-feedback mechanism for flushed tuples. The current assumptions in this mechanism seem to be that 1.) access methods always want to flush all available tuples at once, 2.) access methods want to maintain the TupleTableSlots for all inserted tuples that have not yet had all triggers handled, and 3.) we need access to the not-yet-flushed TupleTableSlots. I think that that is not a correct set of assumptions; I think that only flushed tuples need to be visible to the tableam-agnostic code; and that tableams should be allowed to choose which tuples to flush at which point; as long as they're all flushed after a final multi_insert_flush. 
Examples: A heap-based access method might want to bin-pack tuples and write out full pages only; and thus keep some tuples in the buffers as they didn't fill a page; thus having flushed only a subset of the current buffered tuples. A columnstore-based access method might not want to maintain the TupleTableSlots of original tuples, but instead use temporary columnar storage: TupleTableSlots are quite large when working with huge amounts of tuples; and keeping lots of tuple data in memory is expensive. As such, I think that this should be replaced with a TableInsertState->mi_flushed_slots + TableInsertState->mi_flushed_len, managed by the tableAM, in which only the flushed tuples are visible to the AM-agnostic code. This is not much different from how the implementation currently works; except that ->mi_slots now does not expose unflushed tuples; and that ->flushed is replaced by an integer value of the number of flushed tuples. A further improvement (in my opinion) would be the change from a single multi_insert_flush() to a signalling-based multi_insert_flush: It is not unreasonable for e.g. a columnstore to buffer tens of thousands of inserts; but doing so in TupleTableSlots would introduce high memory usage. Allowing for batched retrieval of flushed tuples would help with memory usage; which is why multiple calls to multi_insert_flush() could be useful. To handle this gracefully, we'd probably add TIS->mi_flush_remaining, where each insert adds one to mi_flush_remaining; and each time mi_flushed_slots has been handled, mi_flush_remaining is decreased by mi_flushed_len by the handler code. Once we're done inserting into the table, we keep calling multi_insert_flush until no more tuples are being flushed (and error out if we're still waiting for flushes but no new flushed tuples are returned). - Matthias
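The signalling-based flush protocol sketched above could be modeled roughly as follows. The field names (mi_flushed_len, mi_flush_remaining) come from the mail; the types, batch size, and simulation around them are invented purely for illustration:

```c
#include <assert.h>
#include <stdio.h>

#define NTUPLES 10				/* tuples we will "insert" */
#define BATCH	4				/* the AM's chosen flush batch size */

/* Heavily simplified stand-in for the proposed TableInsertState fields. */
typedef struct TableInsertState
{
	int			buffered;			/* inserted but not yet flushed */
	int			mi_flushed_len;		/* tuples exposed by the last flush */
	int			mi_flush_remaining; /* inserts not yet handed back */
} TableInsertState;

static void
multi_insert(TableInsertState *s)
{
	s->buffered++;
	s->mi_flush_remaining++;	/* each insert adds one, per the proposal */
}

/* The AM decides how much to flush per call; here at most BATCH tuples. */
static void
multi_insert_flush(TableInsertState *s)
{
	s->mi_flushed_len = (s->buffered < BATCH) ? s->buffered : BATCH;
	s->buffered -= s->mi_flushed_len;
}

int
main(void)
{
	TableInsertState s = {0, 0, 0};
	int			handled = 0;

	for (int i = 0; i < NTUPLES; i++)
		multi_insert(&s);

	/* Keep calling flush until every buffered tuple has been returned;
	 * the handler (e.g. trigger code) sees only flushed tuples. */
	while (s.mi_flush_remaining > 0)
	{
		multi_insert_flush(&s);
		handled += s.mi_flushed_len;
		s.mi_flush_remaining -= s.mi_flushed_len;
	}

	assert(handled == NTUPLES);
	printf("handled %d tuples\n", handled);
	return 0;
}
```

The point of the loop is exactly Matthias' memory argument: the caller never needs more than one batch of flushed tuples in hand at a time, however many thousands the AM buffers internally.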
walmethods.c is kind of a mess (was Re: refactoring basebackup.c)
On Fri, Mar 4, 2022 at 3:32 AM Dipesh Pandit wrote: > GZIP manages to overcome this problem as it provides an option to turn on/off > compression on the fly while writing a compressed archive with the help of > zlib > library function deflateParams(). The current gzip implementation for > CreateWalTarMethod uses this library function to turn off compression just > before > step #1 and it writes the uncompressed header of size equal to TAR_BLOCK_SIZE. > It uses the same library function to turn on the compression for writing the > contents > of the WAL file as part of step #2. It again turns off the compression just > before step > #3 to overwrite the header. The header is overwritten at the same offset with > size > equal to TAR_BLOCK_SIZE. This is a real mess. To me, it seems like a pretty big hack to use deflateParams() to shut off compression in the middle of the compressed data stream so that we can go back and overwrite that part of the data later. It appears that the only reason we need that hack is because we don't know the file size starting out. Except we kind of do know the size, because pad_to_size specifies a minimum size for the file. It's true that the maximum file size is unbounded, but I'm not sure why that's important. I wonder if anyone else has an idea why we didn't just set the file size to pad_to_size exactly when we write the tar header the first time, instead of this IMHO kind of nutty approach where we back up. I'd try to figure it out from the comments, but there basically aren't any. I also had a look at the relevant commit messages and didn't see anything relevant there either. If I'm missing something, please point it out. While I'm complaining, I noticed while looking at this code that it is documented that "The caller must ensure that only one method is instantiated in any given program, and that it's only instantiated once!" 
As far as I can see, this is because somebody thought about putting all of the relevant data into a struct and then decided on an alternative strategy of storing some of it there, and the rest in a global variable. I can't quite imagine why anyone would think that was a good idea. There may be some reason that I can't see right now, but here again there appear to be no relevant code comments. I'm somewhat inclined to wonder whether we could just get rid of walmethods.c entirely and use the new bbstreamer stuff instead. That code also knows how to write plain files into a directory, and write tar archives, and compress stuff, but in my totally biased opinion as the author of most of that code, it's better code. It has no restriction on using at most one method per program, or of instantiating that method only once, and it already has LZ4 support, and there's a pending patch for ZSTD support that I intend to get committed soon as well. It also has, and I know I might be beating a dead horse here, comments. Now, admittedly, it does need to know the size of each archive member up front in order to work, so if we can't solve the problem then we can't go this route. But if we can't solve that problem, then we also can't add LZ4 and ZSTD support to walmethods.c, because random access to compressed data is not really a thing, even if we hacked it to work for gzip. Thoughts? -- Robert Haas EDB: http://www.enterprisedb.com
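For what it's worth, the back-up-and-overwrite dance being criticized only works because the header bytes stay uncompressed. Stripped of compression entirely, the three steps look roughly like this (simplified, hypothetical member layout, not the real walmethods.c code):

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define TAR_BLOCK_SIZE 512

int
main(void)
{
	FILE	   *f = tmpfile();
	char		header[TAR_BLOCK_SIZE] = {0};
	char		check[TAR_BLOCK_SIZE];
	const char *payload = "not really WAL data";
	long		payload_len = (long) strlen(payload);

	assert(f != NULL);

	/* step 1: placeholder header, final size not yet known */
	assert(fwrite(header, 1, TAR_BLOCK_SIZE, f) == TAR_BLOCK_SIZE);

	/* step 2: the member contents */
	assert(fwrite(payload, 1, payload_len, f) == (size_t) payload_len);

	/* step 3: seek back and overwrite the header at the same offset --
	 * this only works because those bytes were written uncompressed;
	 * in a compressed stream you cannot rewrite a fixed byte range */
	snprintf(header, sizeof(header), "size=%ld", payload_len);
	assert(fseek(f, 0L, SEEK_SET) == 0);
	assert(fwrite(header, 1, TAR_BLOCK_SIZE, f) == TAR_BLOCK_SIZE);

	/* verify the rewritten header is what we expect */
	assert(fseek(f, 0L, SEEK_SET) == 0);
	assert(fread(check, 1, TAR_BLOCK_SIZE, f) == TAR_BLOCK_SIZE);
	assert(strcmp(check, "size=19") == 0);
	printf("header rewritten: %s\n", check);
	fclose(f);
	return 0;
}
```

If the member size were known up front (e.g. pad_to_size, or the fixed WAL segment size), step 3 disappears entirely, which is what makes sequential-only compressed output (LZ4, ZSTD, bbstreamer-style) feasible.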
pg_rewind enhancements
Hi, While using pg_rewind, I found it a bit difficult to use because it copies even the configuration files and also removes some of the files created on the old primary which may not be present on the new primary. Similarly, it copies files under the data directory of the new primary which may not be needed or which could possibly be junk files. I would propose to have a couple of new command-line arguments to pg_rewind. One, a comma-separated list of files which should be preserved on the old primary, in other words which shouldn't be overwritten from the new primary. Second, a comma-separated list of files which should be excluded while copying files from the new primary onto the old primary. I would like to invite more thoughts from the hackers. Regards, RKN
Re: Commitfest 2022-03 Patch Triage Part 1b
Just FYI. Better to follow up to the thread for the patch that's already in the CF. Otherwise your patch will be missed by someone who looks at the CF entry to see the latest patch. Indeed. Done. -- Fabien.
Re: psql - add SHOW_ALL_RESULTS option
Are you planning to send a rebased patch for this commit fest? Argh, I did it in a reply in another thread :-( Attached v15. To help move things forward, I'd suggest that we should not care too much about corner-case repetition of some error messages which are due to libpq internals, so I could remove the ugly buffer reset from the patch and keep the repetition; if/when the issue is fixed later in libpq, then the repetition will be removed, fine! The issue is that we just expose the strange behavior of libpq, which is for libpq to solve, not psql. What do you think? -- Fabien.
diff --git a/contrib/pg_stat_statements/expected/pg_stat_statements.out b/contrib/pg_stat_statements/expected/pg_stat_statements.out index e0abe34bb6..8f7f93172a 100644 --- a/contrib/pg_stat_statements/expected/pg_stat_statements.out +++ b/contrib/pg_stat_statements/expected/pg_stat_statements.out @@ -50,8 +50,28 @@ BEGIN \; SELECT 2.0 AS "float" \; SELECT 'world' AS "text" \; COMMIT; + float +--- + 2.0 +(1 row) + + text +--- + world +(1 row) + -- compound with empty statements and spurious leading spacing \;\; SELECT 3 + 3 \;\;\; SELECT ' ' || ' !' \;\; SELECT 1 + 4 \;; + ?column? +-- +6 +(1 row) + + ?column? +-- + ! +(1 row) + ?column? -- 5 @@ -61,6 +81,11 @@ COMMIT; SELECT 1 + 1 + 1 AS "add" \gset SELECT :add + 1 + 1 AS "add" \; SELECT :add + 1 + 1 AS "add" \gset + add +- + 5 +(1 row) + -- set operator SELECT 1 AS i UNION SELECT 2 ORDER BY i; i diff --git a/doc/src/sgml/ref/psql-ref.sgml b/doc/src/sgml/ref/psql-ref.sgml index caabb06c53..f01adb1fd2 100644 --- a/doc/src/sgml/ref/psql-ref.sgml +++ b/doc/src/sgml/ref/psql-ref.sgml @@ -127,18 +127,11 @@ echo '\x \\ SELECT * FROM foo;' | psql commands included in the string to divide it into multiple transactions. (See for more details about how the server handles multi-query strings.) - Also, psql only prints the - result of the last SQL command in the string. 
- This is different from the behavior when the same string is read from - a file or fed to psql's standard input, - because then psql sends - each SQL command separately. - Because of this behavior, putting more than one SQL command in a - single -c string often has unexpected results. - It's better to use repeated -c commands or feed - multiple commands to psql's standard input, + If having several commands executed in one transaction is not desired, + use repeated -c commands or feed multiple commands to + psql's standard input, either using echo as illustrated above, or via a shell here-document, for example: @@ -3570,10 +3563,6 @@ select 1\; select 2\; select 3; commands included in the string to divide it into multiple transactions. (See for more details about how the server handles multi-query strings.) -psql prints only the last query result -it receives for each request; in this example, although all -three SELECTs are indeed executed, psql -only prints the 3. @@ -4160,6 +4149,18 @@ bar +SHOW_ALL_RESULTS + + +When this variable is set to off, only the last +result of a combined query (\;) is shown instead of +all of them. The default is on. The off behavior +is for compatibility with older versions of psql. + + + + + SHOW_CONTEXT diff --git a/src/bin/psql/common.c b/src/bin/psql/common.c index d65b9a124f..3b2f6305b4 100644 --- a/src/bin/psql/common.c +++ b/src/bin/psql/common.c @@ -34,6 +34,8 @@ static bool DescribeQuery(const char *query, double *elapsed_msec); static bool ExecQueryUsingCursor(const char *query, double *elapsed_msec); static bool command_no_begin(const char *query); static bool is_select_command(const char *query); +static int SendQueryAndProcessResults(const char *query, double *pelapsed_msec, + bool is_watch, const printQueryOpt *opt, FILE *printQueryFout, bool *tx_ended); /* @@ -354,7 +356,7 @@ CheckConnection(void) * Returns true for valid result, false for error state. 
*/ static bool -AcceptResult(const PGresult *result) +AcceptResult(const PGresult *result, bool show_error) { bool OK; @@ -385,7 +387,7 @@ AcceptResult(const PGresult *result) break; } - if (!OK) + if (!OK && show_error) { const char *error = PQerrorMessage(pset.db); @@ -473,6 +475,18 @@ ClearOrSaveResult(PGresult *result) } } +/* + * Consume all results + */ +static void +ClearOrSaveAllResults() +{ + PGresult *result; + + while ((result = PQgetResult(pset.db)) != NULL) + ClearOrSaveResult(result); +} + /* * Print microtiming output. Always print raw milliseconds; if the interval @@ -573,7 +587,7 @@ PSQLexec(const char
Re: wal_compression=zstd
On 2/22/22 18:19, Justin Pryzby wrote: > As writen, this patch uses zstd level=1 (whereas the ZSTD's default compress > level is 6). I think this choice needs to be supported by some benchmarks. > > 0001 adds support for zstd > 0002 makes more efficient our use of bits in the WAL header > 0003 makes it the default compression, to exercise on CI (windows will fail). > 0004 adds support for zstd-level > 0005-6 are needed to allow recovery tests to pass when wal compression is > enabled; > 0007 (not included) also adds support for zlib. I'm of the impression nobody > cares about this, otherwise it would've been included 5-10 years ago. > > Only 0001 should be reviewed for pg15 - the others are optional/future work. I don't see why patch 5 shouldn't be applied forthwith. cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com
Re: wal_compression=zstd
On Fri, Mar 4, 2022 at 6:44 AM Justin Pryzby wrote: > > Why? ZSTD using this default has its reasons, no? And it would be > > consistent to do the same for ZSTD as for the other two methods. > > In my 1-off test, it gets 610/633 = 96% of the benefit at 209/273 = 77% of the > cost. I agree with Michael. Your 1-off test is exactly that, and the results will have depended on the data you used for the test. I'm not saying we could never decide to default to a compression level other than the library's default, but I do not think we should do it casually or as the result of any small number of tests. There should be a strong presumption that the authors of the library have a good idea what is sensible in general unless we can explain compellingly why our use case is different from typical ones. There's an ease-of-use concern here too. It's not going to make things any easier for users to grok if zstd is available in different parts of the system but has different defaults in each place. It wouldn't be the end of the world if that happened, but neither would it be ideal. -- Robert Haas EDB: http://www.enterprisedb.com
Re: [Proposal] vacuumdb --schema only
On 04/03/2022 at 11:56, Justin Pryzby wrote: On Fri, Mar 04, 2022 at 10:11:28AM +0100, Gilles Darold wrote: The attached patch implements that. Option -n | --schema can be used multiple times and cannot be used together with options -a or -t. Yes, thanks. I suggest there should also be an --exclude-schema. Ok, I will add it too. I do not propose to extend the VACUUM and ANALYZE commands because their current syntax doesn't allow me to see an easy way to do that I think this would be easy with the parenthesized syntax. I'm not suggesting to do it there, though. Yes, this is what I've thought: something à la EXPLAIN, for example "VACUUM (ANALYZE, SCHEMA foo)", but this is a change in the VACUUM syntax that needs to keep compatibility with the current syntax. We would have two syntaxes, something like "VACUUM ANALYZE FULL dbname" and "VACUUM (ANALYZE, FULL) dbname". The other syntax "problem" is to be able to use multiple schema values in the VACUUM command, perhaps "VACUUM (ANALYZE, SCHEMA (foo,bar))". + /* +* When filtereing on schema name, filter by table is not allowed. +* The schema name can already be set in a fqdn table name. set *to* Thanks, will be fixed in next patch version. -- Gilles Darold
Re: Add the replication origin name and commit-LSN to logical replication worker errcontext
On Fri, Mar 4, 2022, at 2:54 AM, Amit Kapila wrote: > The LSN of the transaction that contains the change violating the > constraint and the replication origin name can be found from the > server log (LSN 0/14C0378 and replication origin > pg_16395 in the above case). To skip the > transaction, the subscription needs to be disabled temporarily by > ALTER SUBSCRIPTION ... DISABLE first. Then, the > transaction can be skipped by calling the linkend="pg-replication-origin-advance"> > pg_replication_origin_advance() function > with the node_name (i.e., > pg_16395) and the next LSN of the commit LSN (i.e., > LSN 0/14C0379). You could also add: After that the replication can be resumed by ALTER SUBSCRIPTION ... ENABLE. Let's provide complete instructions. -- Euler Taveira EDB https://www.enterprisedb.com/
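Putting the quoted instructions plus Euler's addition together, the complete skip sequence would look as follows. The subscription name is invented for the example; the origin name and LSNs are the ones from the text above:

```sql
-- stop the apply worker that keeps failing on the conflicting transaction
ALTER SUBSCRIPTION mysub DISABLE;

-- skip past the offending transaction: use the next LSN after the commit
-- LSN reported in the server log (here 0/14C0378 -> 0/14C0379), together
-- with the replication origin name from the same log line
SELECT pg_replication_origin_advance('pg_16395', '0/14C0379');

-- resume replication
ALTER SUBSCRIPTION mysub ENABLE;
```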
Re: pg_walinspect - a new extension to get raw WAL data and WAL stats
Thanks Bharath for working on all my review comments. I took a quick look at the new version of the patch (v7-pg_walinspect.patch) and this version looks a lot better. I'll do some detailed review later (maybe next week or so) and let you know my further comments, if any. -- With Regards, Ashutosh Sharma. On Fri, Mar 4, 2022 at 3:54 PM Bharath Rupireddy wrote: > > On Thu, Mar 3, 2022 at 10:05 PM Robert Haas wrote: > > > > On Fri, Feb 25, 2022 at 6:03 AM Bharath Rupireddy > > wrote: > > > Added a new function that returns the first and last valid WAL record > > > LSN of a given WAL file. > > > > Sounds like fuzzy thinking to me. WAL records can cross file > > boundaries, and forgetting about that leads to all sorts of problems. > > Just give people one function that decodes a range of LSNs and call it > > good. Why do you need anything else? If people want to get the first > > record that begins in a segment or the first record any portion of > > which is in a particular segment or the last record that begins in a > > segment or the last record that ends in a segment or any other such > > thing, they can use a WHERE clause for that... and if you think they > > can't, then that should be good cause to rethink the return value of > > the one-and-only SRF that I think you need here. > > Thanks Robert. > > Thanks to others for your review comments. > > Here's the v7 patch set. These patches are based on the motive that > "keep it simple and short yet effective and useful". With that in > mind, I have not implemented the wait mode for any of the functions > (as it doesn't look an effective use-case and requires adding a new > page_read callback, instead throw error if future LSN is specified), > also these functions will give WAL information on the current server's > timeline. Having said that, I'm open to adding new functions in future > once this initial version gets in, if there's a requirement and users > ask for the new functions. 
> > Please review the v7 patch set and provide your thoughts. > > Regards, > Bharath Rupireddy.
Re: Report checkpoint progress with pg_stat_progress_checkpoint (was: Report checkpoint progress in server logs)
Please don't mix comments from multiple reviewers into one thread. It's hard to understand which comments are mine or Julien's or from others. Can you please respond to the email from each of us separately with an inline response. That will be helpful to understand your thoughts on our review comments. -- With Regards, Ashutosh Sharma. On Fri, Mar 4, 2022 at 4:59 PM Nitin Jadhav wrote: > > Thanks for reviewing. > > > 6) How about shutdown and end-of-recovery checkpoint? Are you planning > > to have an ereport_startup_progress mechanism as 0002? > > I thought of including it earlier then I felt lets first make the > current patch stable. Once all the fields are properly decided and the > patch gets in then we can easily extend the functionality to shutdown > and end-of-recovery cases. I have also observed that the timer > functionality wont work properly in case of shutdown as we are doing > an immediate checkpoint. So this needs a lot of discussion and I would > like to handle this on a separate thread. > --- > > > 7) I think you don't need to call checkpoint_progress_start and > > pgstat_progress_update_param, any other progress reporting function > > for shutdown and end-of-recovery checkpoint right? > > I had included the guards earlier and then removed later based on the > discussion upthread. > --- > > > [local]:5432 ashu@postgres=# select * from pg_stat_progress_checkpoint; > > -[ RECORD 1 ]-+- > > pid | 22043 > > type | checkpoint > > kind | immediate force wait requested time > > start_lsn | 0/14C60F8 > > start_time| 2022-03-03 18:59:56.018662+05:30 > > phase | performing two phase checkpoint > > > > > > This is the output I see when the checkpointer process has come out of > > the two phase checkpoint and is currently writing checkpoint xlog > > records and doing other stuff like updating control files etc. Is this > > okay? > > The idea behind choosing the phases is based on the functionality > which takes longer time to execute. 
Since after two phase checkpoint > till post checkpoint cleanup won't take much time to execute, I have > not added any additional phase for that. But I also agree that this > gives wrong information to the user. How about mentioning the phase > information at the end of each phase like "Initializing", > "Initialization done", ..., "two phase checkpoint done", "post > checkpoint cleanup done", .., "finalizing". Except for the first phase > ("initializing") and last phase ("finalizing"), all the other phases > describe the end of a certain operation. I feel this gives correct > information even though the phase name/description does not represent > the entire code block between two phases. For example if the current > phase is ''two phase checkpoint done". Then the user can infer that > the checkpointer has done till two phase checkpoint and it is doing > other stuff that are after that. Thoughts? > > > The output of log_checkpoint shows the number of buffers written is 3 > > whereas the output of pg_stat_progress_checkpoint shows it as 0. See > > below: > > > > 2022-03-03 20:04:45.643 IST [22043] LOG: checkpoint complete: wrote 3 > > buffers (0.0%); 0 WAL file(s) added, 0 removed, 0 recycled; > > write=24.652 s, sync=104.256 s, total=3889.625 s; sync files=2, > > longest=0.011 s, average=0.008 s; distance=0 kB, estimate=0 kB > > > > -- > > > > [local]:5432 ashu@postgres=# select * from pg_stat_progress_checkpoint; > > -[ RECORD 1 ]-+- > > pid | 22043 > > type | checkpoint > > kind | immediate force wait requested time > > start_lsn | 0/14C60F8 > > start_time| 2022-03-03 18:59:56.018662+05:30 > > phase | finalizing > > buffers_total | 0 > > buffers_processed | 0 > > buffers_written | 0 > > > > Any idea why this mismatch? > > Good catch. In BufferSync() we have 'num_to_scan' (buffers_total) > which indicates the total number of buffers to be processed. Based on > that, the 'buffers_processed' and 'buffers_written' counter gets > incremented. 
I meant these values may reach upto 'buffers_total'. The > current pg_stat_progress_view support above information. There is > another place when 'ckpt_bufs_written' gets incremented (In > SlruInternalWritePage()). This increment is above the 'buffers_total' > value and it is included in the server log message (checkpoint end) > and not included in the view. I am a bit confused here. If we include > this increment in the view then we cannot calculate the exact > 'buffers_total' beforehand. Can we increment the 'buffers_toal' also > when 'ckpt_bufs_written' gets incremented so that we can match the > behaviour with checkpoint end message? Please share your thoughts. > --- > > > I think we can add a couple of more information to this view - > > start_time for buffer write operation and start_time for buffer sync >
Re: Add the replication origin name and commit-LSN to logical replication worker errcontext
On Fri, Mar 4, 2022 at 2:55 PM Amit Kapila wrote: > > On Fri, Mar 4, 2022 at 6:40 AM Masahiko Sawada wrote: > > > > Attached updated version patches. > > > > The patch looks mostly good to me. Few minor comments: Thank you for the comments! > 1. I think we can have an Assert for errarg->origin_name in > apply_error_callback after checking the command as this function > assumes that it will always be set. Added. > 2. I suggest minor changes in the documentation change: > When a conflict produces an error, the replication won't proceed, and > the apply worker will emit the following kind of message to the > subscriber's server log: > + > +ERROR: duplicate key value violates unique constraint "test_pkey" > +DETAIL: Key (c)=(1) already exists. > +CONTEXT: processing remote data during "INSERT" for replication > target relation "public.test" in transaction 725 committed at LSN > 0/14BFA88 from replication origin "pg_16395" > + > > The LSN of the transaction that contains the change violating the > constraint and the replication origin name can be found from the > server log (LSN 0/14C0378 and replication origin > pg_16395 in the above case). To skip the > transaction, the subscription needs to be disabled temporarily by > ALTER SUBSCRIPTION ... DISABLE first. Then, the > transaction can be skipped by calling the linkend="pg-replication-origin-advance"> > pg_replication_origin_advance() function > with the node_name (i.e., > pg_16395) and the next LSN of the commit LSN (i.e., > LSN 0/14C0379). Fixed. On Fri, Mar 4, 2022 at 3:21 PM Amit Kapila wrote: > > On Fri, Mar 4, 2022 at 11:45 AM osumi.takami...@fujitsu.com > wrote: > > > > On Friday, March 4, 2022 2:23 PM Masahiko Sawada > > wrote: > > > I've attached updated patches. > > Hi, thank you for updating the patch. > > > > One comment on v4. > > > > In v4-0002, we introduce 'commit_lsn' in the ApplyErrorCallbackArg. > > This member is set for prepare, rollback prepared and stream_abort as well. 
> > The new log message format is useful when we have a prepare transaction > > that keeps failing on the subscriber and want to know the target transaction > > for the pg_replication_origin_advance(), right ? > > If this is true, I wasn't sure if the name 'commit_lsn' is the > > most accurate name for this variable. Should we adjust the name a bit ? > > > > If we want to change this variable name, the other options could be > 'end_lsn', or 'finish_lsn'. > Agreed with 'finish_lsn'. I've attached updated patches. Please review them. Regards, -- Masahiko Sawada EDB: https://www.enterprisedb.com/ v4-0002-Add-the-origin-name-and-remote-commit-LSN-to-logi.patch Description: Binary data v4-0001-Use-complete-sentences-in-logical-replication-wor.patch Description: Binary data
Re: wal_compression=zstd
On Fri, Mar 04, 2022 at 04:19:32PM +0900, Michael Paquier wrote: > On Tue, Feb 22, 2022 at 05:19:48PM -0600, Justin Pryzby wrote: > > > As writen, this patch uses zstd level=1 (whereas the ZSTD's default compress > > level is 6). > > Why? ZSTD using this default has its reasons, no? And it would be > consistent to do the same for ZSTD as for the other two methods. In my 1-off test, it gets 610/633 = 96% of the benefit at 209/273 = 77% of the cost. postgres=# SET wal_compression= 'zstd-6'; postgres=# \set QUIET \\ \timing on \\ SET max_parallel_maintenance_workers=0; SELECT pg_stat_reset_shared('wal'); begin; CREATE INDEX ON t(a); rollback; SELECT * FROM pg_stat_wal; Duración: 2729,017 ms (00:02,729) wal_bytes| 6102403 postgres=# SET wal_compression= 'zstd-1'; postgres=# \set QUIET \\ \timing on \\ SET max_parallel_maintenance_workers=0; SELECT pg_stat_reset_shared('wal'); begin; CREATE INDEX ON t(a); rollback; SELECT * FROM pg_stat_wal; Duración: 2090,459 ms (00:02,090) wal_bytes| 6330269 It may be relevant that we're only compressing 8k [0]. It would probably pay off to preserve a compression "dictionary" to be re-used across FPIs. > -The supported methods are pglz and > -lz4 (if PostgreSQL was > -compiled with --with-lz4). The default value is > -off. Only superusers can change this setting. > +The supported methods are pglz, and > +(if configured when PostgreSQLwas built) > +lz4 and zstd. > +The default value is off. > +Only superusers can change this setting. > > This is making the docs less verbose. I think that you should keep > the mention to respectively --with-lz4 and --with-zstd after each > value. I changed this relative to your latest zstd patch in July since it reads better to me. YMMV. On Tue, Feb 22, 2022 at 05:19:48PM -0600, Justin Pryzby wrote: >> 0001 adds support for zstd >> 0002 makes more efficient our use of bits in the WAL header >> 0003 makes it the default compression, to exercise on CI (windows will fail). 
>> 0004 adds support for zstd-level >> 0005-6 are needed to allow recovery tests to pass when wal compression is >> enabled; >> 0007 (not included) also adds support for zlib. I'm of the impression nobody >> cares about this, otherwise it would've been included 5-10 years ago. > 0003, defaulting to zstd, would require a lot of benchmarking to be > justified. Maybe you didn't see that I wrote that it was for CI? (In any case, it's impossible to do that without first taking care of 005-6). > and 0004 to extend compression to support a level > I have argued against making the code more complicated for such things in the > past, with reloptions. I suppose that you're referring to reloptions for toast compression, which was about controlling LZ4 compression level. https://www.postgresql.org/message-id/flat/cafitn-vmhu_yakmgno90suih_6ltvmqz5hpqb2itgqtvyoj...@mail.gmail.com I think you're right that it's not interesting to control the compression level of LZ4 - but there's no reason to say the same thing of zstd. We're already debating which is the "right" level to use (1 or 6). I don't think there is a "right" level - some people will want to trade more CPU for disk writes and some people won't.
Re: Report checkpoint progress with pg_stat_progress_checkpoint (was: Report checkpoint progress in server logs)
Thanks for reviewing. > 6) How about shutdown and end-of-recovery checkpoint? Are you planning > to have an ereport_startup_progress mechanism as 0002? I thought of including it earlier, then I felt let's first make the current patch stable. Once all the fields are properly decided and the patch gets in, then we can easily extend the functionality to the shutdown and end-of-recovery cases. I have also observed that the timer functionality won't work properly in case of shutdown as we are doing an immediate checkpoint. So this needs a lot of discussion and I would like to handle this on a separate thread. --- > 7) I think you don't need to call checkpoint_progress_start and > pgstat_progress_update_param, any other progress reporting function > for shutdown and end-of-recovery checkpoint right? I had included the guards earlier and then removed them later based on the discussion upthread. --- > [local]:5432 ashu@postgres=# select * from pg_stat_progress_checkpoint; > -[ RECORD 1 ]-+- > pid | 22043 > type | checkpoint > kind | immediate force wait requested time > start_lsn | 0/14C60F8 > start_time| 2022-03-03 18:59:56.018662+05:30 > phase | performing two phase checkpoint > > > This is the output I see when the checkpointer process has come out of > the two phase checkpoint and is currently writing checkpoint xlog > records and doing other stuff like updating control files etc. Is this > okay? The idea behind choosing the phases is based on the functionality which takes longer to execute. Since the stretch from the two-phase checkpoint to the post-checkpoint cleanup won't take much time to execute, I have not added any additional phase for it. But I also agree that this gives wrong information to the user. How about mentioning the phase information at the end of each phase, like "Initializing", "Initialization done", ..., "two phase checkpoint done", "post checkpoint cleanup done", ..., "finalizing". 
Except for the first phase ("initializing") and last phase ("finalizing"), all the other phases describe the end of a certain operation. I feel this gives correct information even though the phase name/description does not represent the entire code block between two phases. For example, if the current phase is 'two phase checkpoint done', then the user can infer that the checkpointer has finished the two-phase checkpoint and is doing the other work that comes after it. Thoughts? > The output of log_checkpoint shows the number of buffers written is 3 > whereas the output of pg_stat_progress_checkpoint shows it as 0. See > below: > > 2022-03-03 20:04:45.643 IST [22043] LOG: checkpoint complete: wrote 3 > buffers (0.0%); 0 WAL file(s) added, 0 removed, 0 recycled; > write=24.652 s, sync=104.256 s, total=3889.625 s; sync files=2, > longest=0.011 s, average=0.008 s; distance=0 kB, estimate=0 kB > > -- > > [local]:5432 ashu@postgres=# select * from pg_stat_progress_checkpoint; > -[ RECORD 1 ]-+- > pid | 22043 > type | checkpoint > kind | immediate force wait requested time > start_lsn | 0/14C60F8 > start_time| 2022-03-03 18:59:56.018662+05:30 > phase | finalizing > buffers_total | 0 > buffers_processed | 0 > buffers_written | 0 > > Any idea why this mismatch? Good catch. In BufferSync() we have 'num_to_scan' (buffers_total) which indicates the total number of buffers to be processed. Based on that, the 'buffers_processed' and 'buffers_written' counters get incremented. I meant these values may reach up to 'buffers_total'. The current pg_stat_progress_checkpoint view supports the above information. There is another place where 'ckpt_bufs_written' gets incremented (in SlruInternalWritePage()). This increment is above the 'buffers_total' value, and it is included in the server log message (checkpoint end) but not included in the view. I am a bit confused here. If we include this increment in the view, then we cannot calculate the exact 'buffers_total' beforehand. 
Can we increment 'buffers_total' as well when 'ckpt_bufs_written' gets incremented, so that we match the behaviour of the checkpoint-end message? Please share your thoughts.

---

> I think we can add a couple of more information to this view -
> start_time for buffer write operation and start_time for buffer sync
> operation. These are two very time consuming tasks in a checkpoint and
> people would find it useful to know how much time is being taken by
> the checkpoint in I/O operation phase. thoughts?

I felt the detailed progress is already shown for these two phases of the checkpoint via 'buffers_processed', 'buffers_written' and 'files_synced', hence I did not think about adding start times. If they are really required, I can add them.

> Is it that hard or costly to do? Just sending a message to increment
> the stat counter in RequestCheckpoint() would be enough.
>
> Also,
Re: [Proposal] vacuumdb --schema only
On Fri, 4 Mar 2022 at 14:41, Gilles Darold wrote:
> Hi,
>
> When we want to vacuum and/or analyze all tables in a dedicated schema,
> let's say pg_catalog for example, there is no easy way to do that. The
> VACUUM command doesn't allow it, so we have to use \gexec or a SQL script
> to do that. We have an external command, vacuumdb, that could be used to
> simplify this task. For example, the following command can be used to
> clean all tables stored in the pg_catalog schema:
>
> vacuumdb --schema pg_catalog -d foo

+1. This gives much better flexibility to users.

> The attached patch implements that. Option -n | --schema can be used
> multiple times and cannot be used together with options -a or -t.
>
> Common use cases are an application that creates lots of temporary
> objects and then drops them, which can heavily bloat the catalog, or one
> that does heavy work in some schemas only. Of course, the good practice is
> to find the bloated tables and execute VACUUM on each table, but if most
> of the tables in the schema are regularly bloated, the use of vacuumdb
> --schema can save time.
>
> I do not propose to extend the VACUUM and ANALYZE commands because their
> current syntax doesn't allow me to see an easy way to do that, and also
> because I'm not really in favor of such a change. But if there is interest
> in improving these commands I will be pleased to do that, with the
> syntax suggested.
>
> Best regards,
>
> --
> Gilles Darold
Re: [Proposal] vacuumdb --schema only
On Fri, Mar 04, 2022 at 10:11:28AM +0100, Gilles Darold wrote:
> The attached patch implements that. Option -n | --schema can be used
> multiple times and cannot be used together with options -a or -t.

Yes, thanks. I suggest there should also be an --exclude-schema.

> I do not propose to extend the VACUUM and ANALYZE commands because their
> current syntax doesn't allow me to see an easy way to do that

I think this would be easy with the parenthesized syntax. I'm not suggesting to do it there, though.

> + /*
> + * When filtereing on schema name, filter by table is not allowed.
> + * The schema name can already be set in a fqdn table name.

set *to*

--
Justin
Re: Regression tests failures on Windows Server 2019 - on master at commit # d816f366b
On Fri, Mar 4, 2022 at 10:37 PM Bharath Rupireddy wrote:
> It looks like regression tests are failing on Windows Server 2019 [1].
> Ignore if it's reported elsewhere.

It's been broken since 46ab07ff "Clean up assorted failures under clang's -fsanitize=undefined checks.":

https://github.com/postgres/postgres/commits/master

I don't see what that patch has to do with the symptoms. It looks a bit like the cluster started by the "startcreate" step ("pg_ctl.exe start ...") is mysteriously no longer running by the time we get to the "test_pl" step ("Connection refused").
Re: Removing unneeded self joins
On 1/3/2022 03:03, Greg Stark wrote:
> On Thu, 1 Jul 2021 at 02:38, Ronan Dunklau wrote:
>> Well in some cases they can't, when the query is not emitting redundant
>> predicates by itself but they are added by something else like a view or
>> a RLS policy. Maybe it would be worth it to allow spending a bit more
>> time planning for those cases?
>
> Yeah, I'm generally in favour of doing more work in the optimizer to
> save query authors work writing queries.
>
> My question is whether it handles cases like:
>
> select b.x, c.y
> from t
> join t2 as b on (b.id = t.id)
> join t2 as c on (c.id = t.id)
>
> That is, if you join against the same table twice on the same qual.
> Does the EC mechanism turn this into a qual on b.id = c.id and then
> turn this into a self-join that can be removed?

Yes, the self-join removal machinery uses the EC mechanism as usual to get all join clauses, so this case works (see the demo in the attachment). Also, in the new version of the patch I fixed one stupid bug in the check of a self-join candidate's expression operator: we can remove only expressions like F(arg1) = G(arg2).

--
regards, Andrey Lepikhov
Postgres Professional

From 70398361a0a0d9c6c3c7ddd1fd305ac11138e7b1 Mon Sep 17 00:00:00 2001
From: "Andrey V. Lepikhov"
Date: Thu, 15 Jul 2021 15:26:13 +0300
Subject: [PATCH] Remove self-joins.

Remove inner joins of a relation to itself if we can prove that the join
can be replaced with a scan. We can prove the uniqueness using the
existing innerrel_is_unique machinery.

We can remove a self-join when for each outer row:
1. At most one inner row matches the join clauses.
2. If the join target list contains any inner vars, an inner row must be
   (physically) the same row as the outer one.

In this patch we use Rowley's [1] approach to identify a self-join:
1. Collect all mergejoinable join quals which look like a.x = b.x
2. Check innerrel_is_unique() for the qual list from (1). If it returns
   true, then the outer row matches only the same row from the inner
   relation.
This proves that the join is a self-join and can be replaced by a scan. Some regression tests changed due to the self-join removal logic.

[1] https://www.postgresql.org/message-id/raw/CAApHDvpggnFMC4yP-jUO7PKN%3DfXeErW5bOxisvJ0HvkHQEY%3DWw%40mail.gmail.com
---
 src/backend/optimizer/plan/analyzejoins.c | 888 +-
 src/backend/optimizer/plan/planmain.c     |   5 +
 src/backend/optimizer/util/joininfo.c     |   3 +
 src/backend/optimizer/util/relnode.c      |  26 +-
 src/backend/utils/misc/guc.c              |  10 +
 src/include/optimizer/pathnode.h          |   4 +
 src/include/optimizer/planmain.h          |   2 +
 src/test/regress/expected/equivclass.out  |  32 +
 src/test/regress/expected/join.out        | 426 +++
 src/test/regress/expected/sysviews.out    |   3 +-
 src/test/regress/sql/equivclass.sql       |  16 +
 src/test/regress/sql/join.sql             | 197 +
 12 files changed, 1583 insertions(+), 29 deletions(-)

diff --git a/src/backend/optimizer/plan/analyzejoins.c b/src/backend/optimizer/plan/analyzejoins.c
index 337f470d58..c5ac8e2bd4 100644
--- a/src/backend/optimizer/plan/analyzejoins.c
+++ b/src/backend/optimizer/plan/analyzejoins.c
@@ -22,6 +22,7 @@
  */
 #include "postgres.h"
+#include "catalog/pg_class.h"
 #include "nodes/nodeFuncs.h"
 #include "optimizer/clauses.h"
 #include "optimizer/joininfo.h"
@@ -32,10 +33,12 @@
 #include "optimizer/tlist.h"
 #include "utils/lsyscache.h"

+bool		enable_self_join_removal;
+
 /* local functions */
 static bool join_is_removable(PlannerInfo *root, SpecialJoinInfo *sjinfo);
 static void remove_rel_from_query(PlannerInfo *root, int relid,
-								  Relids joinrelids);
+								  Relids joinrelids, int subst_relid);
 static List *remove_rel_from_joinlist(List *joinlist, int relid, int *nremoved);
 static bool rel_supports_distinctness(PlannerInfo *root, RelOptInfo *rel);
 static bool rel_is_distinct_for(PlannerInfo *root, RelOptInfo *rel,
@@ -47,6 +50,9 @@ static bool is_innerrel_unique_for(PlannerInfo *root,
 								   RelOptInfo *innerrel,
 								   JoinType jointype,
 								   List *restrictlist);
+static void change_rinfo(RestrictInfo* rinfo, Index from, Index to);
+static
Bitmapset* change_relid(Relids relids, Index oldId, Index newId);
+static void change_varno(Expr *expr, Index oldRelid, Index newRelid);

 /*
@@ -86,7 +92,7 @@ restart:
 			remove_rel_from_query(root, innerrelid,
 								  bms_union(sjinfo->min_lefthand,
-											sjinfo->min_righthand));
+
Re: Column Filtering in Logical Replication
On Wed, Mar 2, 2022 at 5:43 PM Tomas Vondra wrote:
>
> Attached is an updated patch, addressing most of the issues reported so
> far. There are various minor tweaks, but the main changes are:
...
>
> 3) checks of column filter vs. publish_via_partition_root and replica
> identity, following the same logic as the row-filter patch (hopefully,
> it touches the same places, using the same logic, ...)
>
> That means - with "publish_via_partition_root=false" it's not allowed to
> specify column filters on partitioned tables, only for leaf partitions.
>
> And we check column filter vs. replica identity when adding tables to
> publications, or whenever we change the replica identity.

This handling is different from the row filter work, and I see problems with it. The column list validation w.r.t. the primary key (the default replica identity) is missing.

The handling of column list vs. partitions has multiple problems: (a) in ATTACH PARTITION, the patch is just checking ancestors for RI validation, but what if the table being attached has further subpartitions; (b) I think the current locking also seems to have problems, because it is quite possible that while it validates the ancestors here, someone concurrently changes the column list. I think it won't be enough to just change the locking mode, because with the current patch strategy during attach we will first take locks for child tables of the current partition and then for parent tables, which can pose deadlock hazards. The column list validation also needs to be done when we change the publication action. There could be more similar problems which I might have missed.

For some of these (except for the concurrency issues), my colleague Shi-San has done testing and the results are below [1]. I feel we should do RI vs.
column list handling similar to the row filter work (at one place) to avoid all such hazards, with possibly similar handling at various places; otherwise there is a good chance that we will miss some places or make mistakes that are not easy to catch. Do let me know if you think it makes sense for me, or one of the people who worked on the row filter patch, to try this (make the handling of RI checks similar to the row filter work), and then we can see if that turns out to be a simple way to deal with all these problems.

Some other miscellaneous comments:
=

* In get_rel_sync_entry(), the handling for partitioned tables doesn't seem to be correct. It can publish a different set of columns based on the order of publications specified in the subscription. For example:

create table parent (a int, b int, c int) partition by range (a);
create table test_part1 (like parent);
alter table parent attach partition test_part1 for values from (1) to (10);
create publication pub for table parent(a) with (PUBLISH_VIA_PARTITION_ROOT);
create publication pub2 for table test_part1(b);

Now, depending on the order of publications in the list while defining the subscription, the column list will change:

create subscription sub connection 'port=1 dbname=postgres' publication pub, pub2;

For the above, the column list will be: (a)

create subscription sub connection 'port=1 dbname=postgres' publication pub2, pub;

For this one, the column list will be: (a, b)

To avoid this, the column list should be computed based on the final publish_as_relid, as we are doing for the row filter.

* Fetching column filter info in tablesync.c is quite expensive. It seems to be using four round-trips to get the complete info, whereas for the row filter we use just one round trip. I think we should try to get both row filter and column filter info in just one round trip.

[1] -

Test-1: The patch doesn't check when the primary key changes. e.g.
-- publisher --
create table tbl(a int primary key, b int);
create publication pub for table tbl(a);
alter table tbl drop CONSTRAINT tbl_pkey;
alter table tbl add primary key (b);
insert into tbl values (1,1);

-- subscriber --
create table tbl(a int, b int);
create subscription sub connection 'port=5432 dbname=postgres' publication pub;
update tbl set b=1 where a=1;
alter table tbl add primary key (b);

-- publisher --
delete from tbl;

Column "b" is part of the replica identity, but it is filtered, which caused an error on the subscriber side:

ERROR: publisher did not send replica identity column expected by the logical replication target relation "public.tbl"
CONTEXT: processing remote data during "DELETE" for replication target relation "public.tbl" in transaction 724 at 2022-03-04 11:46:16.330892+08

Test-2: Partitioned table RI w.r.t. column list.

2.1 Using "create table ... partition of". e.g.

-- publisher --
create table parent (a int, b int) partition by range (a);
create publication pub for table parent(a) with(publish_via_partition_root=true);
create table child partition of parent (primary key (a,b)) default;
insert into parent values (1,1);

-- subscriber --
create table parent (a int, b int) partition by range (a);
create table child
Re: pg_tablespace_location() failure with allow_in_place_tablespaces
On Fri, Mar 04, 2022 at 06:04:00PM +0900, Kyotaro Horiguchi wrote:
> And I made a quick hack on do_pg_start_backup. And I found that
> pg_basebackup copies in-place tablespaces under the *current
> directory*, which is not ok at all :(

Yeah, I noticed that as well while testing such configurations a couple of hours ago, but I am not sure yet how much we need to care about it, as in-place tablespaces are included in the main data directory anyway, which would be fine for most test purposes we usually care about. Perhaps this has an impact on the patch posted on the thread that wants to improve the guarantees around tablespace directory structures, but I have not studied that thread enough to have an opinion. And it is Friday.
--
Michael
Re: pg_tablespace_location() failure with allow_in_place_tablespaces
On Fri, Mar 4, 2022 at 10:04 PM Kyotaro Horiguchi wrote:
> And I made a quick hack on do_pg_start_backup. And I found that
> pg_basebackup copies in-place tablespaces under the *current
> directory*, which is not ok at all :(

Hmm. Which OS are you on? Looks OK here -- the "in place" tablespace gets copied as a directory under pg_tblspc, no symlink:

postgres=# set allow_in_place_tablespaces = on;
SET
postgres=# create tablespace ts1 location '';
CREATE TABLESPACE
postgres=# create table t (i int) tablespace ts1;
CREATE TABLE
postgres=# insert into t values (1), (2);
INSERT 0 2
postgres=# create user replication replication;
CREATE ROLE

$ pg_basebackup --user replication -D pgdata2

$ ls -slaph pgdata/pg_tblspc/
total 4.0K
   0 drwx-- 3 tmunro tmunro 19 Mar 4 23:16 ./
4.0K drwx-- 19 tmunro tmunro 4.0K Mar 4 23:16 ../
   0 drwx-- 3 tmunro tmunro 29 Mar 4 23:16 16384/

$ ls -slaph pgdata2/pg_tblspc/
total 4.0K
   0 drwx-- 3 tmunro tmunro 19 Mar 4 23:16 ./
4.0K drwx-- 19 tmunro tmunro 4.0K Mar 4 23:16 ../
   0 drwx-- 3 tmunro tmunro 29 Mar 4 23:16 16384/

$ ls -slaph pgdata/pg_tblspc/16384/PG_15_202203031/5/
total 8.0K
   0 drwx-- 2 tmunro tmunro 19 Mar 4 23:16 ./
   0 drwx-- 3 tmunro tmunro 15 Mar 4 23:16 ../
8.0K -rw--- 1 tmunro tmunro 8.0K Mar 4 23:16 16385

$ ls -slaph pgdata2/pg_tblspc/16384/PG_15_202203031/5/
total 8.0K
   0 drwx-- 2 tmunro tmunro 19 Mar 4 23:16 ./
   0 drwx-- 3 tmunro tmunro 15 Mar 4 23:16 ../
8.0K -rw--- 1 tmunro tmunro 8.0K Mar 4 23:16 16385

The warning from readlink() while making the mapping file isn't ideal, and perhaps we should suppress it with something like the attached. Or does the missing map file entry break something on Windows?

From 6f001ec46c5e5f6ffa8e103f2b0d711e6904b763 Mon Sep 17 00:00:00 2001
From: Thomas Munro
Date: Fri, 4 Mar 2022 23:07:46 +1300
Subject: [PATCH] Fix warning in basebackup of in-place tablespaces.

While building the tablespace map for a base backup, don't call
readlink() on directories under pg_tblspc.
---
 src/backend/access/transam/xlog.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 0d2bd7a357..b92cc66afb 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -66,6 +66,7 @@
 #include "catalog/pg_control.h"
 #include "catalog/pg_database.h"
 #include "common/controldata_utils.h"
+#include "common/file_utils.h"
 #include "executor/instrument.h"
 #include "miscadmin.h"
 #include "pg_trace.h"
@@ -8292,6 +8293,10 @@ do_pg_start_backup(const char *backupidstr, bool fast, TimeLineID *starttli_p,
 			snprintf(fullpath, sizeof(fullpath), "pg_tblspc/%s", de->d_name);

+			/* Skip in-place tablespaces (testing use only) */
+			if (get_dirent_type(fullpath, de, false, ERROR) == PGFILETYPE_DIR)
+				continue;
+
 #if defined(HAVE_READLINK) || defined(WIN32)
 			rllen = readlink(fullpath, linkpath, sizeof(linkpath));
 			if (rllen < 0)
-- 
2.30.2
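The directory-versus-symlink distinction the patch applies on the server side can be illustrated outside PostgreSQL. This is a hypothetical Python sketch, not backend code: a normal tablespace appears under pg_tblspc as a symlink named after its OID, while an in-place tablespace is a plain subdirectory, so the map-building loop should call readlink() only on real symlinks:

```python
import os
import tempfile

# Simulate a pg_tblspc directory: a normal tablespace is a symlink,
# an in-place tablespace (allow_in_place_tablespaces) is a plain
# subdirectory named after the tablespace OID.
pg_tblspc = tempfile.mkdtemp()
os.mkdir(os.path.join(pg_tblspc, "16384"))                      # in-place
os.symlink("/tmp/somewhere", os.path.join(pg_tblspc, "16385"))  # normal

# Building the tablespace map: skip plain directories instead of
# calling readlink() on them, analogous to the get_dirent_type() check
# the patch adds.
mapped = []
for name in sorted(os.listdir(pg_tblspc)):
    path = os.path.join(pg_tblspc, name)
    if not os.path.islink(path):
        continue                      # in-place tablespace: no map entry
    mapped.append((name, os.readlink(path)))

assert mapped == [("16385", "/tmp/somewhere")]
```

Only the symlinked tablespace ends up in the map, which matches the patch's behaviour of producing no map entry (and no warning) for in-place tablespaces.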
Re: pg_walinspect - a new extension to get raw WAL data and WAL stats
On Thu, Mar 3, 2022 at 10:05 PM Robert Haas wrote:
>
> On Fri, Feb 25, 2022 at 6:03 AM Bharath Rupireddy wrote:
> > Added a new function that returns the first and last valid WAL record
> > LSN of a given WAL file.
>
> Sounds like fuzzy thinking to me. WAL records can cross file
> boundaries, and forgetting about that leads to all sorts of problems.
> Just give people one function that decodes a range of LSNs and call it
> good. Why do you need anything else? If people want to get the first
> record that begins in a segment or the first record any portion of
> which is in a particular segment or the last record that begins in a
> segment or the last record that ends in a segment or any other such
> thing, they can use a WHERE clause for that... and if you think they
> can't, then that should be good cause to rethink the return value of
> the one-and-only SRF that I think you need here.

Thanks Robert, and thanks to the others for your review comments. Here's the v7 patch set.

These patches are based on the motto "keep it simple and short, yet effective and useful". With that in mind, I have not implemented the wait mode for any of the functions (it doesn't look like an effective use-case and would require adding a new page_read callback; instead, an error is thrown if a future LSN is specified). Also, these functions give WAL information on the current server's timeline. Having said that, I'm open to adding new functions in the future once this initial version gets in, if there's a requirement and users ask for them.

Please review the v7 patch set and provide your thoughts.

Regards,
Bharath Rupireddy.
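The "no wait mode" behaviour described above (error out on a future LSN instead of blocking until WAL appears) can be sketched as a toy range check. This is hypothetical Python with LSNs modelled as plain integers, not the extension's C API:

```python
# Toy model of the no-wait decision described above: reject a start LSN
# beyond the current flush point instead of waiting for WAL, and clamp
# the end of the requested range to what already exists.
# check_lsn_range and the integer LSNs are hypothetical illustrations.
def check_lsn_range(start_lsn, end_lsn, current_flush_lsn):
    if start_lsn > current_flush_lsn:
        raise ValueError("WAL start LSN is in the future")
    return min(end_lsn, current_flush_lsn)

assert check_lsn_range(10, 50, 40) == 40   # clamp to the flushed WAL
assert check_lsn_range(10, 30, 40) == 30   # whole range is available
```

The alternative, a wait mode, would need a page_read callback that blocks until the requested WAL is flushed, which is exactly what the message above chose not to implement.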
From df9e5b5f9a76a2002d9ef8d9b0db88003a07a5b5 Mon Sep 17 00:00:00 2001 From: Bharath Rupireddy Date: Fri, 4 Mar 2022 08:12:20 + Subject: [PATCH v7] pg_walinspect --- contrib/Makefile | 1 + contrib/pg_walinspect/.gitignore | 4 + contrib/pg_walinspect/Makefile | 26 + contrib/pg_walinspect/pg_walinspect--1.0.sql | 91 ++ contrib/pg_walinspect/pg_walinspect.c| 949 +++ contrib/pg_walinspect/pg_walinspect.control | 5 + src/backend/access/transam/xlogreader.c | 14 +- src/bin/pg_waldump/pg_waldump.c | 5 + src/common/relpath.c | 18 + src/include/access/xlog.h| 2 +- src/include/access/xlog_internal.h | 2 +- src/include/access/xlogreader.h | 2 - src/include/common/relpath.h | 1 + 13 files changed, 1109 insertions(+), 11 deletions(-) create mode 100644 contrib/pg_walinspect/.gitignore create mode 100644 contrib/pg_walinspect/Makefile create mode 100644 contrib/pg_walinspect/pg_walinspect--1.0.sql create mode 100644 contrib/pg_walinspect/pg_walinspect.c create mode 100644 contrib/pg_walinspect/pg_walinspect.control diff --git a/contrib/Makefile b/contrib/Makefile index e3e221308b..705c6fc36b 100644 --- a/contrib/Makefile +++ b/contrib/Makefile @@ -40,6 +40,7 @@ SUBDIRS = \ pgrowlocks \ pgstattuple \ pg_visibility \ + pg_walinspect \ postgres_fdw \ seg \ spi \ diff --git a/contrib/pg_walinspect/.gitignore b/contrib/pg_walinspect/.gitignore new file mode 100644 index 00..5dcb3ff972 --- /dev/null +++ b/contrib/pg_walinspect/.gitignore @@ -0,0 +1,4 @@ +# Generated subdirectories +/log/ +/results/ +/tmp_check/ diff --git a/contrib/pg_walinspect/Makefile b/contrib/pg_walinspect/Makefile new file mode 100644 index 00..c92a97447f --- /dev/null +++ b/contrib/pg_walinspect/Makefile @@ -0,0 +1,26 @@ +# contrib/pg_walinspect/Makefile + +MODULE_big = pg_walinspect +OBJS = \ + $(WIN32RES) \ + pg_walinspect.o +PGFILEDESC = "pg_walinspect - functions to inspect contents of PostgreSQL Write-Ahead Log" + +PG_CPPFLAGS = -I$(libpq_srcdir) +SHLIB_LINK_INTERNAL = $(libpq) + +EXTENSION = pg_walinspect 
+DATA = pg_walinspect--1.0.sql + +REGRESS = pg_walinspect + +ifdef USE_PGXS +PG_CONFIG = pg_config +PGXS := $(shell $(PG_CONFIG) --pgxs) +include $(PGXS) +else +subdir = contrib/pg_walinspect +top_builddir = ../.. +include $(top_builddir)/src/Makefile.global +include $(top_srcdir)/contrib/contrib-global.mk +endif diff --git a/contrib/pg_walinspect/pg_walinspect--1.0.sql b/contrib/pg_walinspect/pg_walinspect--1.0.sql new file mode 100644 index 00..619b1d1d4a --- /dev/null +++ b/contrib/pg_walinspect/pg_walinspect--1.0.sql @@ -0,0 +1,91 @@ +/* contrib/pg_walinspect/pg_walinspect--1.0.sql */ + +-- complain if script is sourced in psql, rather than via CREATE EXTENSION +\echo Use "CREATE EXTENSION pg_walinspect" to load this file. \quit + +-- +-- pg_get_raw_wal_record() +-- +CREATE FUNCTION pg_get_raw_wal_record(IN in_lsn pg_lsn, +OUT lsn pg_lsn, +OUT record bytea +) +AS 'MODULE_PATHNAME', 'pg_get_raw_wal_record' +LANGUAGE C CALLED ON NULL INPUT PARALLEL SAFE; + +GRANT EXECUTE ON FUNCTION pg_get_raw_wal_record(pg_lsn) TO pg_monitor; + +-- +--
Re: Doc about how to set max_wal_senders when setting minimal wal_level
On Fri, 04 Mar 2022 at 14:05, Kyotaro Horiguchi wrote:
> At Fri, 04 Mar 2022 12:18:29 +0800, Japin Li wrote in
>>
>> On Thu, 03 Mar 2022 at 12:10, Japin Li wrote:
>>
>> Attach v3 patch to fix missing close varname tag.
>
> +A precondition for using minimal WAL is to disable WAL archiving and
> +streaming replication by setting <xref linkend="guc-max-wal-senders"/>
> +to 0, and archive_mode
> +to off.
>
> It is a bit odd that the features to stop and the corresponding GUCs
> are written irrespectively. It would be better if they're in the same
> order.

Thanks for your review. Modified.

> +servers. If setting max_wal_senders to
> +0 consider also reducing the amount of WAL produced
> +by changing wal_level to minimal.
>
> Those who naively follow this suggestion may bump into failure when
> archive_mode is on. Thus archive_mode is also worth referring to. But,
> anyway, IMHO, it is merely a performance tip that is not necessarily
> required in this section, or even in this documentation. In addition,
> if we write this for max_wal_senders, archive_mode will deserve a
> similar tip, but I think it is too verbose. In short, I would not add
> that description at all.

It already has a tip about wal_level for archive_mode [1], IIUC:
archive_mode cannot be enabled when wal_level is set to minimal.

[1] https://www.postgresql.org/docs/devel/runtime-config-wal.html#GUC-ARCHIVE-MODE

--
Regards,
Japin Li.
ChengDu WenWu Information Technology Co.,Ltd.

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 7ed8c82a9d..a70adb7030 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -2775,6 +2775,10 @@ include_dir 'conf.d'
 minimal makes any base backups taken before
 unavailable for archive recovery and standby server, which may
 lead to data loss.
+A precondition for using minimal WAL is to disable WAL archiving and
+streaming replication by setting archive_mode to
+off, and max_wal_senders to
+0.
 In logical level, the same information is logged as
@@ -3535,6 +3539,7 @@ include_dir 'conf.d'
 mode. In always mode, all files restored from
 the archive or streamed with streaming replication will be archived
 (again). See for details.
+The default value is off.
 This parameter can only be set at server start.
@@ -4096,7 +4101,9 @@ restore_command = 'copy "C:\\server\\archivedir\\%f" "%p"' # Windows
 reconnect. This parameter can only be set at server start. Also,
 wal_level must be set to
 replica or higher to allow connections from standby
-servers.
+servers. If setting max_wal_senders to
+0 consider also reducing the amount of WAL produced
+by changing wal_level to minimal.
Regression tests failures on Windows Server 2019 - on master at commit # d816f366b
Hi,

It looks like regression tests are failing on Windows Server 2019 [1]. Ignore if it's reported elsewhere.

[1] https://github.com/postgres/postgres/runs/5419324953

❌ 00:02 test_pl
[08:28:53.758]
[08:28:53.758] c:\cirrus>call "C:/Program Files/Git/usr/bin/timeout.exe" -v -k60s 15m perl src/tools/msvc/vcregress.pl plcheck
[08:28:53.953]
[08:28:53.953] Checking plpgsql
[08:28:53.977] (using postmaster on Unix socket, default port)
[08:28:53.977] == dropping database "pl_regression" ==
[08:28:56.044] psql: error: connection to server at "localhost" (::1), port 5432 failed: Connection refused (0x274D/10061)
[08:28:56.044] Is the server running on that host and accepting TCP/IP connections?
[08:28:56.044] connection to server at "localhost" (127.0.0.1), port 5432 failed: Connection refused (0x274D/10061)
[08:28:56.044] Is the server running on that host and accepting TCP/IP connections?
[08:28:56.050] command failed: "c:/cirrus/Debug/psql/psql" -X -c "SET client_min_messages = warning" -c "DROP DATABASE IF EXISTS \"pl_regression\"" "postgres"
[08:28:56.063]
[08:28:56.063] c:\cirrus>if 2 NEQ 0 exit /b 2
[08:28:56.084]
[08:28:56.084] Exit status: 2

Regards,
Bharath Rupireddy.
[Proposal] vacuumdb --schema only
Hi,

When we want to vacuum and/or analyze all tables in a dedicated schema, let's say pg_catalog for example, there is no easy way to do that. The VACUUM command doesn't allow it, so we have to use \gexec or a SQL script to do that. We have an external command, vacuumdb, that could be used to simplify this task. For example, the following command can be used to clean all tables stored in the pg_catalog schema:

vacuumdb --schema pg_catalog -d foo

The attached patch implements that. Option -n | --schema can be used multiple times and cannot be used together with options -a or -t.

Common use cases are an application that creates lots of temporary objects and then drops them, which can heavily bloat the catalog, or one that does heavy work in some schemas only. Of course, the good practice is to find the bloated tables and execute VACUUM on each table, but if most of the tables in the schema are regularly bloated, the use of vacuumdb --schema can save time.

I do not propose to extend the VACUUM and ANALYZE commands because their current syntax doesn't allow me to see an easy way to do that, and also because I'm not really in favor of such a change. But if there is interest in improving these commands I will be pleased to do that, with the syntax suggested.

Best regards,

--
Gilles Darold

diff --git a/doc/src/sgml/ref/vacuumdb.sgml b/doc/src/sgml/ref/vacuumdb.sgml
index 956c0f01cb..e4f6d32ba9 100644
--- a/doc/src/sgml/ref/vacuumdb.sgml
+++ b/doc/src/sgml/ref/vacuumdb.sgml
@@ -39,6 +39,24 @@ PostgreSQL documentation
 dbname
+
+ vacuumdb
+ connection-option
+ option
+
+
+
+ -n
+ --schema
+
+ schema
+
+
+
+ dbname
+
 vacuumdb
 connection-option
@@ -244,6 +262,17 @@ PostgreSQL documentation
+
+ -n schema
+ --schema=schema
+
+
+Clean or analyze all tables in schema only.
+Multiple schemas can be vacuumed by writing multiple -n switches.
+
+
+
 --no-index-cleanup
@@ -619,6 +648,14 @@ PostgreSQL documentation
 $ vacuumdb --analyze --verbose --table='foo(bar)' xyzzy
+
+To clean all tables in the Foo and bar schemas
+only in a database named xyzzy:
+
+$ vacuumdb --schema='"Foo"' --schema='bar' xyzzy
+
+
+
diff --git a/src/bin/scripts/t/100_vacuumdb.pl b/src/bin/scripts/t/100_vacuumdb.pl
index 96a818a3c1..4c4f47e32a 100644
--- a/src/bin/scripts/t/100_vacuumdb.pl
+++ b/src/bin/scripts/t/100_vacuumdb.pl
@@ -103,6 +103,8 @@ $node->safe_psql(
 	CREATE TABLE funcidx (x int);
 	INSERT INTO funcidx VALUES (0),(1),(2),(3);
 	CREATE INDEX i0 ON funcidx ((f1(x)));
+	CREATE SCHEMA "Foo";
+	CREATE TABLE "Foo".bar(id int);
 |);
 $node->command_ok([qw|vacuumdb -Z --table="need""q(uot"(")x") postgres|],
 	'column list');
@@ -146,5 +148,12 @@ $node->issues_sql_like(
 	[ 'vacuumdb', '--min-xid-age', '2147483001', 'postgres' ],
 	qr/GREATEST.*relfrozenxid.*2147483001/,
 	'vacuumdb --table --min-xid-age');
+$node->issues_sql_like(
+	[ 'vacuumdb', '--schema', '"Foo"', 'postgres' ],
+	qr/VACUUM "Foo".*/,
+	'vacuumdb --schema schema only');
+$node->command_fails(
+	[ 'vacuumdb', '-n', 'pg_catalog', '-t pg_class', 'postgres' ],
+	'cannot vacuum all tables in schema(s) and specific table(s) at the same time');
 done_testing();
diff --git a/src/bin/scripts/t/101_vacuumdb_all.pl b/src/bin/scripts/t/101_vacuumdb_all.pl
index 1dcf411767..b122c995b1 100644
--- a/src/bin/scripts/t/101_vacuumdb_all.pl
+++ b/src/bin/scripts/t/101_vacuumdb_all.pl
@@ -15,5 +15,8 @@ $node->issues_sql_like(
 	[ 'vacuumdb', '-a' ],
 	qr/statement: VACUUM.*statement: VACUUM/s,
 	'vacuum all databases');
+$node->command_fails(
+	[ 'vacuumdb', '-a', '-n', 'pg_catalog' ],
+	'cannot vacuum specific schema(s) in all databases');
 done_testing();
diff --git a/src/bin/scripts/vacuumdb.c b/src/bin/scripts/vacuumdb.c
index 4f6917fd39..69b470598f 100644
--- a/src/bin/scripts/vacuumdb.c
+++ b/src/bin/scripts/vacuumdb.c
@@ -46,10 +46,10 @@ typedef struct vacuumingOptions
 	bool		process_toast;
 } vacuumingOptions;

-
 static void vacuum_one_database(ConnParams *cparams, vacuumingOptions *vacopts,
 								int stage,
+								SimpleStringList *schemas,
 								SimpleStringList *tables,
 								int concurrentCons,
 								const char *progname, bool echo, bool quiet);
@@ -94,6 +94,7 @@ main(int argc, char *argv[])
 		{"verbose", no_argument, NULL, 'v'},
 		{"jobs", required_argument, NULL, 'j'},
 		{"parallel", required_argument, NULL, 'P'},
+		{"schema", required_argument, NULL, 'n'},
 		{"maintenance-db", required_argument, NULL, 2},
 		{"analyze-in-stages", no_argument, NULL, 3},
 		{"disable-page-skipping", no_argument, NULL, 4},
@@ -125,6 +126,7 @@ main(int argc, char *argv[])
 	SimpleStringList tables = {NULL, NULL};
 	int			concurrentCons = 1;
 	int			tbl_count = 0;
+	SimpleStringList schemas = {NULL, NULL};

 	/* initialize options */
Re: pg_tablespace_location() failure with allow_in_place_tablespaces
At Fri, 04 Mar 2022 17:28:45 +0900 (JST), Kyotaro Horiguchi wrote in
> By the way, regardless of the patch, I got an error from pg_basebackup
> for an in-place tablespace. pg_do_start_backup calls readlink
> believing pg_tblspc/* is always a symlink.
>
> # Running: pg_basebackup -D
> /home/horiguti/work/worktrees/tsp_replay_2/src/test/recovery/tmp_check/t_029_replay_tsp_drops_primary1_data/backup/my_backup
> -h /tmp/X8E4nbF4en -p 51584 --checkpoint fast --no-sync
> WARNING: could not read symbolic link "pg_tblspc/16384": Invalid argument
>
> So now we know that there are three places that need the same
> processing.
>
> pg_tablespace_location: this patch tries to fix it
> sendDir:                already supports in-place tsp
> do_pg_start_backup:     does not support in-place tsp yet

And I made a quick hack on do_pg_start_backup. And I found that pg_basebackup copies in-place tablespaces under the *current directory*, which is not ok at all :(

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center
Re: psql - add SHOW_ALL_RESULTS option
On 15.01.22 10:00, Fabien COELHO wrote:
>> The reason this test constantly fails on cfbot windows is a
>> use-after-free bug.
>
> Indeed! Thanks a lot for the catch and the debug!
>
> The ClearOrSaveResult function is quite annoying because it may or may
> not clear the result as a side effect. Attached v14 moves the status
> extraction before the possible clear. I've added a couple of
> results = NULL after such calls in the code.

Are you planning to send a rebased patch for this commit fest?
Re: refactoring basebackup.c
Hi,

>> It will be good if we can also fix
>> CreateWalTarMethod to support LZ4 and ZSTD.
>
> Ok, we will see; either Dipesh or I will take care of it.

I took a look at CreateWalTarMethod with an eye to supporting LZ4 compression for WAL files. The current implementation takes three steps to back up a WAL file into a tar archive. For each file:

1. It first writes the header in the function tar_open_for_write, flushes the contents of the tar to disk, and stores the header offset.
2. Next, the contents of the WAL file are written to the tar archive.
3. At the end, it recalculates the checksum in tar_close() and overwrites the header at the offset stored in step #1.

The need for overwriting the header in CreateWalTarMethod is mainly related to partial WAL files, where the size of the WAL file is less than WalSegSize. The file is padded and the checksum is recalculated after adding the pad bytes.

If we go ahead and implement LZ4 support for CreateWalTarMethod, then we have a problem at step #3. In order to achieve a better compression ratio, compressed LZ4 blocks are linked to each other, and these blocks are decoded sequentially. If we overwrite the header as part of step #3, it corrupts the link between compressed LZ4 blocks. LZ4 does provide an option to write each compressed block independently (with blockMode set to LZ4F_blockIndependent), but it is still a problem because we don't know whether overwriting the header after recalculating the checksum will avoid overlapping the boundary of the next block.

GZIP manages to overcome this problem because it provides an option to turn compression on and off on the fly while writing a compressed archive, via the zlib library function deflateParams(). The current gzip implementation for CreateWalTarMethod uses this function to turn off compression just before step #1, and it writes the uncompressed header of size equal to TAR_BLOCK_SIZE.
It uses the same library function to turn on the compression for writing the contents of the WAL file as part of step #2. It again turns off the compression just before step #3 to overwrite the header. The header is overwritten at the same offset with size equal to TAR_BLOCK_SIZE. Since GZIP provides this option to enable/disable compression, it is possible to control the size of data we are writing to a compressed archive. Even if we overwrite an already written block in a compressed archive there is no risk of it overlapping with the boundary of the next block. This mechanism is not available in LZ4 and ZSTD. In order to support LZ4 and ZSTD compression for CreateWalTarMethod we may need to refactor this code unless I am missing something. We need to somehow add the padding bytes in case of partial WAL before we send it to the compressed archive. This will make sure that all files which are being compressed does not require any padding as the size is always equal to WalSegSize. There is no need to recalculate the checksum and we can avoid overwriting the header as part of step #3. Thoughts? Thanks, Dipesh
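The proposed refactoring, padding the partial segment up front so the tar header is final on first write, can be sketched as plain C. The function name and interface are illustrative, not actual backend code:

```c
#include <stdlib.h>
#include <string.h>

/*
 * Sketch of the proposal: zero-pad a partial WAL segment up to
 * wal_seg_size BEFORE handing it to the compressed archive.  The tar
 * header (size and checksum) can then be computed once and never
 * rewritten, so nothing is overwritten inside the LZ4/ZSTD stream.
 * Returns a malloc'd buffer of exactly wal_seg_size bytes, or NULL.
 */
static char *
pad_wal_segment(const char *data, size_t len, size_t wal_seg_size)
{
	char	   *padded;

	if (len > wal_seg_size)
		return NULL;			/* caller error: segment too large */

	padded = malloc(wal_seg_size);
	if (padded == NULL)
		return NULL;

	memcpy(padded, data, len);	/* the real WAL contents */
	memset(padded + len, 0, wal_seg_size - len);	/* zero padding */
	return padded;
}
```

Because every buffer handed to the compressor is then exactly WalSegSize bytes, the header written in step #1 is already correct and step #3's rewrite disappears, which is what the LZ4/ZSTD sequential block layout requires.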
Re: pg_tablespace_location() failure with allow_in_place_tablespaces
At Fri, 04 Mar 2022 16:54:49 +0900 (JST), Kyotaro Horiguchi wrote in
> At Fri, 04 Mar 2022 16:41:03 +0900 (JST), Kyotaro Horiguchi wrote in
>> At Fri, 4 Mar 2022 15:44:22 +0900, Michael Paquier wrote in
>>> The use may be limited to any automated testing and
>>> allow_in_place_tablespaces is a developer GUC, still it seems to me
>>> that there is an argument to allow the case rather than tweak any
>>> tests to hardcode a path with the tablespace OID. And any other code
>>> paths are able to handle such tablespaces, be they in recovery or in
>>> tablespace create/drop.
>>
>> +1
>
> By the way, regardless of the patch, I got an error from pg_basebackup
> for an in-place tablespace. pg_do_start_backup calls readlink
> believing pg_tblspc/* is always a symlink.
>
> # Running: pg_basebackup -D /home/horiguti/work/worktrees/tsp_replay_2/src/test/recovery/tmp_check/t_029_replay_tsp_drops_primary1_data/backup/my_backup -h /tmp/X8E4nbF4en -p 51584 --checkpoint fast --no-sync
> WARNING: could not read symbolic link "pg_tblspc/16384": Invalid argument

So now we know that there are three places that need the same processing:

- pg_tablespace_location(): this patch tries to fix it
- sendDir(): already supports in-place tablespaces
- do_pg_start_backup(): does not support in-place tablespaces yet

regards. -- Kyotaro Horiguchi NTT Open Source Software Center
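The readlink() failure happens because an in-place tablespace is a plain directory under pg_tblspc/, not a symlink, and readlink() on a directory fails with EINVAL. The common fix all three places need is to check the entry type first, e.g. with lstat(). A minimal sketch (the helper name is illustrative, not backend code):

```c
#include <sys/stat.h>

/*
 * Return 1 if path is a symlink (a conventional tablespace link),
 * 0 if it is something else (e.g. an in-place tablespace directory),
 * and -1 on lstat() failure.  Callers should only readlink() when
 * this returns 1, avoiding the EINVAL seen in pg_basebackup.
 */
static int
is_symlink(const char *path)
{
	struct stat st;

	if (lstat(path, &st) != 0)
		return -1;				/* entry missing or not accessible */
	return S_ISLNK(st.st_mode) ? 1 : 0;
}
```

With a check like this, pg_tablespace_location(), sendDir(), and do_pg_start_backup() could all treat an in-place tablespace directory as "location is the path itself" instead of attempting readlink().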
RE: Logical replication timeout problem
Dear Wang,

> Some codes were added in ReorderBufferProcessTXN() for treating DDL,

My mailer went wrong, so I'll post my comments again. Sorry.

Some code was added in ReorderBufferProcessTXN() to handle DDL, but I doubt that updating accept_writes is needed. In my understanding, the parameter is read by OutputPluginPrepareWrite() in order to align messages: they should have a header, like 'w', before their body. But here only a keepalive message is sent, with no meaningful changes, so I think it might not be needed. I commented out the line and tested as you did [1], and found no timeouts or errors. Do you have any reason for keeping it?

[1] https://www.postgresql.org/message-id/OS3PR01MB6275A95FD44DC6C46058EA389E3B9%40OS3PR01MB6275.jpnprd01.prod.outlook.com

Best Regards,
Hayato Kuroda
FUJITSU LIMITED
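The framing distinction behind the question can be shown with a toy buffer model. This simulates only the shape of the protocol (a 'w' header before a plugin-written body versus a header-only keepalive); it is not the actual walsender or OutputPluginPrepareWrite() code, and the function names are invented for illustration:

```c
#include <string.h>

/*
 * Toy model: a decoded change needs the 'w' header (what
 * OutputPluginPrepareWrite() arranges) followed by a body, whereas a
 * keepalive is header-only, so no accept_writes update should be
 * required just to send one.  Returns bytes written into buf.
 */
static size_t
emit_change(char *buf, const char *body)
{
	size_t		blen = strlen(body);

	buf[0] = 'w';				/* header before the body */
	memcpy(buf + 1, body, blen);	/* body supplied by the plugin */
	return 1 + blen;
}

static size_t
emit_keepalive(char *buf)
{
	buf[0] = 'k';				/* keepalive: header only, no body */
	return 1;
}
```

In this model a keepalive never carries a plugin body, which is the intuition behind commenting out the accept_writes update and seeing no errors.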