Re: doc: Fix description of how the default user name is chosen
On 01.11.22 22:31, David G. Johnston wrote:
> This is the only sentence I claim is factually incorrect, with a suggested re-wording.

committed
Re: [BUG] FailedAssertion in SnapBuildPurgeOlderTxn
On Wed, Nov 23, 2022 at 12:00 PM Amit Kapila wrote:
>
> On Tue, Nov 22, 2022 at 10:33 PM Maxim Orlov wrote:
> >
> > > Regarding the tests, the patch includes a new scenario to
> > > reproduce this issue. However, since the issue can be reproduced also
> > > by the existing scenario (with low probability, though), I'm not sure
> > > it's worth adding the new scenario.
> >
> > AFAICS, the test added doesn't reproduce this issue 100% of the time, so
> > maybe it's not worth it. But I do not have a strong opinion here.
> > Let's add tests in a separate commit and let the actual committer decide
> > what to do, shall we?
>
> +1 to not have a test for this as the scenario can already be tested
> by the existing set of tests.

Agreed not to have a test case for this.

I've attached an updated patch. I've confirmed this patch works for all supported branches.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

0001-Reset-InitialRunningXacts-array-when-freeing-SnapBui.patch
Description: Binary data
Re: [PoC] Federated Authn/z with OAUTHBEARER
Hi Jacob,

I had validated Github by skipping the discovery mechanism and letting the provider extension pass on the endpoints. This was just for validation purposes. If it needs to be supported, then we need a way to send the discovery document from the extension.

Thanks,
Mahendrakar.

On Thu, 24 Nov 2022 at 09:16, Andrey Chudnovsky wrote:
>
> > How does this differ from the previous proposal? The OAUTHBEARER SASL
> > mechanism already relies on OIDC for discovery. (I think that decision
> > is confusing from an architectural and naming standpoint, but I don't
> > think they really had an alternative...)
> Mostly terminology questions here. OAUTHBEARER SASL appears to be the
> spec about using OAUTH2 tokens for authentication.
> While any OAUTH2 provider can generally work, we propose to specifically
> highlight that only OIDC providers can be supported, as we need the
> discovery document.
> And we won't be able to support Github under that requirement.
> Since the original patch used that too - no change on that, just
> confirmation that we need OIDC compliance.
>
> > 0) The original hook proposal upthread, I thought, was about allowing
> > libpq's flow implementation to be switched out by the application. I
> > don't see that approach taken here. It's fine if that turned out to be a
> > bad idea, of course, but this patch doesn't seem to match what we were
> > talking about.
> We still plan to allow the client to pass the token, which is a
> generic way for it to implement its own OAUTH flows.
>
> > 1) I'm really concerned about the sudden explosion of flows. We went
> > from one flow (Device Authorization) to six. It's going to be hard
> > enough to validate that *one* flow is useful and can be securely
> > deployed by end users; I don't think we're going to be able to maintain
> > six, especially in combination with my statement that iddawc is not an
> > appropriate dependency for us.
> > I'd much rather give applications the ability to use their own OAuth
> > code, and then maintain within libpq only the flows that are broadly
> > useful. This ties back to (0) above.
> We consider the following set of flows to be the minimum required:
> - Client Credentials - for service-to-service scenarios.
> - Authorization Code with PKCE - for rich clients, including pgAdmin.
> - Device Code - for psql (and possibly other non-GUI clients).
> - Refresh Token (separate discussion)
> Which is pretty much the list described here:
> https://oauth.net/2/grant-types/ and in the OAUTH2 specs.
> Client Credentials is very simple, and so is Refresh Token.
> If you prefer to pick one of the richer flows, Authorization Code for
> GUI scenarios is probably much more widely used.
> Plus it's easier to implement too, as interaction goes through a
> series of callbacks. No polling required.
>
> > 2) Breaking the refresh token into its own pseudoflow is, I think,
> > passing the buck onto the user for something that's incredibly security
> > sensitive. The refresh token is powerful; I don't really want it to be
> > printed anywhere, let alone copy-pasted by the user. Imagine the
> > phishing opportunities.
> >
> > If we want to support refresh tokens, I believe we should be developing
> > a plan to cache and secure them within the client. They should be used
> > as an accelerator for other flows, not as their own flow.
> It's considered a separate "grant_type" in the specs / APIs:
> https://openid.net/specs/openid-connect-core-1_0.html#RefreshTokens
>
> For the clients, it would mean storing the token and using it to authenticate.
> On the question of sensitivity, secure credential stores are
> different for each platform, with a lot of cloud offerings in this space.
> pgAdmin, for example, has its own way to secure credentials to avoid
> asking users for passwords every time the app is opened.
> I believe we should delegate refresh token management to the clients.
> > 3) I don't like the departure from the OAUTHBEARER mechanism that's
> > presented here. For one, since I can't see a sample plugin that makes
> > use of the "flow type" magic numbers that have been added, I don't
> > really understand why the extension to the mechanism is necessary.
> I don't think it's much of a departure, but rather a separation of
> responsibilities between libpq and upstream clients.
> As libpq can be used in different apps, the client would need
> different types of flows/grants.
> I.e. those need to be provided to libpq at connection initialization
> or some other point.
> We will change to "grant_type" though, and use a string to be closer to the spec.
> What do you think is the best way for the client to signal which OAUTH
> flow should be used?
>
> On Wed, Nov 23, 2022 at 12:05 PM Jacob Champion wrote:
> >
> > On 11/23/22 01:58, mahendrakar s wrote:
> > > We validated on libpq handling OAuth natively with different flows
> > > with different OIDC certified providers.
> > >
> > > Flows: Device Code, Client Credentials and Refresh Token.
> > > Providers: Microsoft, Google
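As a side note for readers following the grant-type discussion above, here is a minimal sketch in C of the request bodies that the Device Code and Client Credentials grants send to a token endpoint, per RFC 8628 and RFC 6749. The client_id, secret, and device_code values are purely illustrative placeholders — nothing here comes from the patch itself.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Build the body of an RFC 8628 device-access-token request.
 * The caller supplies the device_code obtained from the device
 * authorization endpoint.
 */
static int
build_device_token_request(char *buf, size_t buflen,
						   const char *client_id, const char *device_code)
{
	return snprintf(buf, buflen,
					"grant_type=urn:ietf:params:oauth:grant-type:device_code"
					"&device_code=%s&client_id=%s",
					device_code, client_id);
}

/*
 * Build an RFC 6749 Client Credentials grant body, the
 * service-to-service case mentioned in the thread.
 */
static int
build_client_credentials_request(char *buf, size_t buflen,
								 const char *client_id, const char *secret)
{
	return snprintf(buf, buflen,
					"grant_type=client_credentials"
					"&client_id=%s&client_secret=%s",
					client_id, secret);
}
```

A real flow would POST these bodies to the provider's token endpoint (discovered via the OIDC discovery document the thread discusses) and, for the device grant, poll until the user completes authorization.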
Re: Non-decimal integer literals
On 23.11.22 09:54, David Rowley wrote:
> On Wed, 23 Nov 2022 at 02:37, Peter Eisentraut wrote:
> > Here is a new patch.
>
> This looks like quite an inefficient way to convert a hex string into an int64:
>
> while (*ptr && isxdigit((unsigned char) *ptr))
> {
>     int8    digit = hexlookup[(unsigned char) *ptr];
>
>     if (unlikely(pg_mul_s64_overflow(tmp, 16, &tmp)) ||
>         unlikely(pg_sub_s64_overflow(tmp, digit, &tmp)))
>         goto out_of_range;
>     ptr++;
> }
>
> I wonder if you'd be better off with something like:
>
> while (*ptr && isxdigit((unsigned char) *ptr))
> {
>     if (unlikely(tmp & UINT64CONST(0xF000000000000000)))
>         goto out_of_range;
>     tmp = (tmp << 4) | hexlookup[(unsigned char) *ptr++];
> }
>
> Going by [1], clang will actually use multiplication by 16 to implement
> the former. gcc is better and shifts left by 4, so likely won't improve
> things for gcc. It seems worth doing it this way for anything that does
> not have HAVE__BUILTIN_OP_OVERFLOW anyway.

My code follows the style used for parsing the decimal integers. Keeping that consistent is valuable, I think. I think the proposed change makes the code significantly harder to understand. Also, what you are suggesting here would amount to an attempt to make parsing hexadecimal integers even faster than parsing decimal integers. Is that useful?
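For illustration of the trade-off being discussed, here is a standalone sketch of the shift-based approach: test the top nibble before shifting, so no per-digit overflow builtin is needed. This is a simplified model only — it parses into an unsigned value and omits PostgreSQL's trick of accumulating a negative value to handle INT64_MIN, and it uses a computed digit value rather than the real hexlookup table.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Parse a hex string into a uint64.  Returns false on overflow or if no
 * hex digits are present.  If any of the top four bits are already set,
 * shifting left by 4 would lose bits, so we bail out before the shift.
 */
static bool
parse_hex64(const char *ptr, uint64_t *result)
{
	uint64_t	tmp = 0;

	if (!isxdigit((unsigned char) *ptr))
		return false;			/* no digits at all */

	while (isxdigit((unsigned char) *ptr))
	{
		int			c = tolower((unsigned char) *ptr);
		int			digit = isdigit((unsigned char) c) ? c - '0' : c - 'a' + 10;

		if (tmp & UINT64_C(0xF000000000000000))
			return false;		/* next shift would overflow */
		tmp = (tmp << 4) | digit;
		ptr++;
	}
	*result = tmp;
	return true;
}
```

The overflow test is a single AND-and-branch per digit, which is what makes it competitive with (or faster than) a multiply-with-overflow-check on compilers that implement the latter with an actual multiplication.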
Re: Add 64-bit XIDs into PostgreSQL 15
Hi Chris,

> XID wraparound doesn't happen to healthy databases
> If you disagree, I would like to hear why.

Consider the case when you run a slow OLAP query that takes 12h to complete and 100K TPS of fast OLTP-type queries on the same system. The fast queries will consume all 32-bit XIDs in less than 12 hours, while the OLAP query started 12 hours ago hasn't finished yet, and thus its tuples can't be frozen.

> I agree that 64-bit xids are a good idea. I just don't think that existing
> safety measures should be ignored or reverted.

Fair enough.

> The problem isn't just the lack of disk space, but the difficulty that stuck
> autovacuum runs pose in resolving the issue. Keep in mind that everything
> you can do to reclaim disk space (vacuum full, cluster, pg_repack) will be
> significantly slowed down by an extremely bloated table/index comparison.
> The problem is that if you are running out of disk space, and your time to
> recovery is much longer than expected, then you have a major problem. It's not
> just one or the other, but the combination that poses the real risk here.
>
> Now that's fine if you want to run a bloatless table engine, but to my
> knowledge none of these are production-ready yet. ZHeap seems mostly
> stalled. Oriole is still experimental. But with the current PostgreSQL
> table structure.
>
> A DBA can monitor disk space, but if the DBA is not also monitoring xid lag,
> then by the time corrective action is taken it may be too late.

Good point, but I still don't think this is related to the XID wraparound problem.

--
Best regards,
Aleksander Alekseev
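A quick back-of-the-envelope check of the scenario described above — 100K TPS against the 2^31 usable XIDs of the 32-bit space — confirms the "less than 12 hours" claim. The function is purely illustrative arithmetic, not anything from PostgreSQL:

```c
#include <assert.h>

/*
 * Hours until 2^31 transaction IDs are consumed at a given sustained
 * transactions-per-second rate.
 */
static double
hours_to_wraparound(double tps)
{
	double		xid_space = 2147483648.0;	/* 2^31 */

	return xid_space / tps / 3600.0;
}
```

At 100,000 TPS this comes out to roughly six hours — comfortably inside the 12-hour OLAP query, which is exactly why the long-running query blocks freezing while the XID space runs out.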
Re: [PATCH] minor optimization for ineq_histogram_selectivity()
On 11/23/22 16:59, Tom Lane wrote:
> Frédéric Yhuel writes:
> > On 10/24/22 17:26, Frédéric Yhuel wrote:
> > > When studying the weird planner issue reported here [1], I came up
> > > with the attached patch. It reduces the probability of calling
> > > get_actual_variable_range().
> >
> > This isn't very useful anymore thanks to this patch:
> > https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=9c6ad5eaa957bdc2132b900a96e0d2ec9264d39c
>
> I hadn't looked at this patch before, but now that I have, I'm inclined
> to reject it anyway. It just moves the problem around: now, instead of
> possibly doing an unnecessary index probe at the right end, you're
> possibly doing an unnecessary index probe at the left end.

Indeed... it seemed to me that both versions would do an unnecessary index probe at the left end, but I wasn't careful enough :-/

> It also looks quite weird compared to the normal coding of binary search.

That's right.

> I wonder if there'd be something to be said for leaving the initial
> probe calculation alone and doing this:
>
>  			else if (probe == sslot.nvalues - 1 && sslot.nvalues > 2)
> +			{
> +				/* Don't probe the endpoint until we have to. */
> +				if (probe > lobound)
> +					probe--;
> +				else
>  					have_end = get_actual_variable_range(root,
>  														 vardata,
>  														 sslot.staop,
>  														 collation,
>  														 NULL,
>  														 &sslot.values[probe]);
> +			}
>
> On the whole though, it seems like a wart.

Yeah... it's probably wiser not to risk introducing a bug only to save an index probe in rare cases (and only 100 reads, thanks to 9c6ad5ea).

Thank you for having had a look at it.

Best regards,
Frédéric
Re: Allow file inclusion in pg_hba and pg_ident files
Hi,

On Thu, Nov 24, 2022 at 02:37:23PM +0800, Julien Rouhaud wrote:
> On Thu, Nov 24, 2022 at 02:07:21PM +0900, Michael Paquier wrote:
> > On Wed, Nov 23, 2022 at 03:56:50PM +0800, Julien Rouhaud wrote:
> > > The depth 0 is getting used quite a lot now, maybe we should have a define for
> > > it to make it easier to grep, like TOP_LEVEL_AUTH_FILE or something like that?
> > > And also add a define for the magical 10 for the max inclusion depth, for both
> > > auth files and GUC files while at it?
> >
> > Sounds like a good idea to me, and it seems to me that this had better
> > be unified between the GUCs (see ParseConfigFp() that hardcodes a
> > depth of 0) and hba.c. It looks like they could be added to
> > conffiles.h, as of CONF_FILE_START_{LEVEL,DEPTH} and
> > CONF_FILE_MAX_{LEVEL,DEPTH}. Would you like to send a patch?

So I went with CONF_FILE_START_DEPTH and CONF_FILE_MAX_DEPTH. Attached v22 that fixes it in all the places I found.

> > Now, to the tests..
> >
> > > Mmm, I haven't looked deeply so I'm not sure if the perl modules are aware of
> > > it or not, but maybe we could somehow detect the used delimiter at the
> > > beginning after normalizing the directory, and use a $DELIM rather than a plain
> > > "/"?
> >
> > I am not sure. Could you have a look and see if you can get the CI
> > back to green? The first thing I would test is to switch the error
> > patterns to be regexps based on the basenames rather than the full
> > paths (tweaking the queries on the system views to do that), so as we
> > avoid all this business with slash and backslash transformations.

Apparently just making sure that the $node->data_dir consistently uses forward slashes is enough to make the CI happy, for VS 2019 [1] and MinGW64 [2], so done this way with an extra normalization step.
[1] https://cirrus-ci.com/task/4944203946917888
[2] https://cirrus-ci.com/task/6070103853760512

>From 879cf469d00d9274b67b80eb5fe47dfccf03022d Mon Sep 17 00:00:00 2001
From: Julien Rouhaud
Date: Thu, 24 Nov 2022 16:57:53 +0800
Subject: [PATCH v22 1/2] Introduce macros for initial/maximum depth levels for
 configuration files

Author: Julien Rouhaud
Reviewed-by: FIXME
Discussion: https://postgr.es/m/20220223045959.35ipdsvbxcstrhya@jrouhaud
---
 src/backend/commands/extension.c  | 4 +++-
 src/backend/libpq/hba.c           | 8 ++++----
 src/backend/utils/misc/guc-file.l | 5 +++--
 src/backend/utils/misc/guc.c      | 8 +++++---
 src/include/utils/conffiles.h     | 3 +++
 5 files changed, 18 insertions(+), 10 deletions(-)

diff --git a/src/backend/commands/extension.c b/src/backend/commands/extension.c
index 806d6056ab..de01b792b9 100644
--- a/src/backend/commands/extension.c
+++ b/src/backend/commands/extension.c
@@ -60,6 +60,7 @@
 #include "tcop/utility.h"
 #include "utils/acl.h"
 #include "utils/builtins.h"
+#include "utils/conffiles.h"
 #include "utils/fmgroids.h"
 #include "utils/lsyscache.h"
 #include "utils/memutils.h"
@@ -515,7 +516,8 @@ parse_extension_control_file(ExtensionControlFile *control,
 	 * Parse the file content, using GUC's file parsing code.  We need not
 	 * check the return value since any errors will be thrown at ERROR level.
 	 */
-	(void) ParseConfigFp(file, filename, 0, ERROR, &head, &tail);
+	(void) ParseConfigFp(file, filename, CONF_FILE_START_DEPTH, ERROR, &head,
+						 &tail);

 	FreeFile(file);

diff --git a/src/backend/libpq/hba.c b/src/backend/libpq/hba.c
index 862ec18e91..8f1a0c4c73 100644
--- a/src/backend/libpq/hba.c
+++ b/src/backend/libpq/hba.c
@@ -620,7 +620,7 @@ free_auth_file(FILE *file, int depth)
 	FreeFile(file);

 	/* If this is the last cleanup, remove the tokenization context */
-	if (depth == 0)
+	if (depth == CONF_FILE_START_DEPTH)
 	{
 		MemoryContextDelete(tokenize_context);
 		tokenize_context = NULL;
@@ -650,7 +650,7 @@ open_auth_file(const char *filename, int elevel, int depth,
 	 * avoid dumping core due to stack overflow if an include file loops back
 	 * to itself.  The maximum nesting depth is pretty arbitrary.
 	 */
-	if (depth > 10)
+	if (depth > CONF_FILE_MAX_DEPTH)
 	{
 		ereport(elevel,
 				(errcode_for_file_access(),
@@ -684,7 +684,7 @@ open_auth_file(const char *filename, int elevel, int depth,
 	 * tokenization.  This will be closed with this file when coming back to
 	 * this level of cleanup.
 	 */
-	if (depth == 0)
+	if (depth == CONF_FILE_START_DEPTH)
 	{
 		/*
 		 * A context may be present, but assume that it has been eliminated
@@ -762,7 +762,7 @@ tokenize_auth_file(const char *filename, FILE *file, List **tok_lines,
 	initStringInfo(&buf);

-	if (depth == 0)
+	if (depth == CONF_FILE_START_DEPTH)
 		*tok_lines =
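The depth-guard pattern that the new macros name can be sketched in isolation like this. This is a toy model only — file handling and tokenization are elided, and the "config file" unconditionally includes one more file so the guard is guaranteed to trigger:

```c
#include <assert.h>

#define CONF_FILE_START_DEPTH	0
#define CONF_FILE_MAX_DEPTH		10

/*
 * Toy model of recursive config-file inclusion: each "file" includes one
 * more file.  Returns -1 once nesting exceeds CONF_FILE_MAX_DEPTH,
 * mirroring the "if (depth > CONF_FILE_MAX_DEPTH)" check in
 * open_auth_file() (where the real code ereport()s instead).
 */
static int
tokenize_toy_file(int depth, int *deepest)
{
	if (depth > CONF_FILE_MAX_DEPTH)
		return -1;				/* inclusion loop or too-deep nesting */
	if (depth > *deepest)
		*deepest = depth;
	/* every file includes one more file */
	return tokenize_toy_file(depth + 1, deepest);
}
```

Giving the two constants names, as the patch does, makes it obvious that the top-level call site, the per-level cleanup check, and the overflow guard are all talking about the same counter.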
Re: Non-decimal integer literals
On Thu, 24 Nov 2022 at 21:35, Peter Eisentraut wrote:
> My code follows the style used for parsing the decimal integers.
> Keeping that consistent is valuable I think. I think the proposed
> change makes the code significantly harder to understand. Also, what
> you are suggesting here would amount to an attempt to make parsing
> hexadecimal integers even faster than parsing decimal integers. Is that
> useful?

Isn't being faster one of the major use cases for this feature? I remember many years ago, and several jobs ago, when working with SQL Server, being able to speed up importing data using hexadecimal DATETIMEs. I can't think why else you might want to represent a DATETIME as a hex string, so I assumed this was a large part of the use case for INTs in PostgreSQL. Are you telling me that better performance is not something anyone will want out of this feature?

David
Re: Transparent column encryption
On Wed, 23 Nov 2022 19:45:06 +0100 Peter Eisentraut wrote:
> On 28.10.22 12:16, Jehan-Guillaume de Rorthais wrote:
> [...]
> > * I wonder if encryption related fields in ParameterDescription and
> >   RowDescription could be optional somehow? The former might be quite
> >   large when using a lot of parameters (like, imagine a large and ugly
> >   "IN($1...$N)"). On the other hand, these messages are not sent in high
> >   frequency anyway...
>
> They are only used if you turn on the column_encryption protocol option.
> Or did you mean make them optional even then?

I meant even when column_encryption is turned on.

Regards,
Re: Prefetch the next tuple's memory during seqscans
On Wed, 23 Nov 2022 at 10:58, David Rowley wrote:
> My current thoughts are that it might be best to go with 0005 to start
> with. I know Melanie is working on making some changes in this area,
> so perhaps it's best to leave 0002 until that work is complete.

I tried running TPC-H @ scale 5 with master (@d09dbeb9) vs master + 0001 + 0005 patch. The results look quite promising. Query 15 seems to run 15% faster and overall it's 4.23% faster. Full results are attached.

David

query   master    master + 0001 + 0005   compare
1       25999.5   25793.6                100.8%
2       1171.0    1152.0                 101.6%
3       6180.5    5456.5                 113.3%
4       1167.1    1107.0                 105.4%
5       4968.3    4604.8                 107.9%
6       3696.6    3306.4                 111.8%
7       5501.4    4905.6                 112.1%
8       1394.8    1345.2                 103.7%
9       10861.2   11159.8                97.3%
10      4354.3    4356.4                 100.0%
11      382.5     386.3                  99.0%
12      3888.6    3838.0                 101.3%
13      6905.0    6622.5                 104.3%
14      3886.1    3429.8                 113.3%
15      8009.7    6927.8                 115.6%
16      2406.2    2363.9                 101.8%
17      14.6      14.9                   98.1%
18      11735.6   11453.9                102.5%
19      44.9      44.7                   100.4%
20      262.8     246.6                  106.6%
21      3014.1    3027.6                 99.6%
22      176.4     179.3                  98.4%
Re: Fix for visibility check on 14.5 fails on tpcc with high concurrency
On 2022-Nov-23, Peter Geoghegan wrote:
> On Wed, Nov 23, 2022 at 2:54 AM Alvaro Herrera wrote:
> > Something like the attached. It would result in output like this:
> > WARNING: new multixact has more than one updating member: 0 2[17378 (keysh), 17381 (nokeyupd)]
> >
> > Then it should be possible to trace (in pg_waldump output) the
> > operations of each of the transactions that have any status in the
> > multixact that includes some form of "upd".
>
> That seems very useful.

Okay, pushed to all branches.

> Separately, I wonder if it would make sense to add additional
> defensive checks to FreezeMultiXactId() for this. There is an
> assertion that should catch the presence of multiple updaters in a
> single Multi when it looks like we have to generate a new Multi to
> carry the XID members forward (typically something we only need to do
> during a VACUUM FREEZE). We could at least make that
> "Assert(!TransactionIdIsValid(update_xid));" line into a defensive
> "can't happen" ereport(). It couldn't hurt, at least -- we already
> have a similar relfrozenxid check nearby, added after the "freeze the
> dead" bug was fixed.

Hmm, agreed. I'll see about that separately.

--
Álvaro Herrera        PostgreSQL Developer — https://www.EnterpriseDB.com/
"I'm always right, but sometimes I'm more right than other times."
                                                  (Linus Torvalds)
Re: [BUG] FailedAssertion in SnapBuildPurgeOlderTxn
> > Agreed not to have a test case for this. > > I've attached an updated patch. I've confirmed this patch works for > all supported branches. > > Regards, > > -- > Masahiko Sawada > Amazon Web Services: https://aws.amazon.com > It works for me as well. Thanks! I've created a commitfest entry for this patch, see https://commitfest.postgresql.org/41/4024/ Hope, it will be committed soon. -- Best regards, Maxim Orlov.
Re: [BUG] FailedAssertion in SnapBuildPurgeOlderTxn
On Thu, Nov 24, 2022 at 1:48 PM Masahiko Sawada wrote: > > On Wed, Nov 23, 2022 at 12:00 PM Amit Kapila wrote: > > Agreed not to have a test case for this. > > I've attached an updated patch. I've confirmed this patch works for > all supported branches. > I have slightly changed the checks used in the patch, otherwise looks good to me. I am planning to push (v11-v15) the attached tomorrow unless there are more comments. -- With Regards, Amit Kapila. v2-0001-Fix-uninitialized-access-to-InitialRunningXacts-d.patch Description: Binary data
Re: postgres_fdw: batch inserts vs. before row triggers
On Fri, Nov 18, 2022 at 8:46 PM Etsuro Fujita wrote: > Attached is a patch for fixing these issues. Here is an updated patch. In the attached, I added an assertion to ExecInsert(). Also, I tweaked comments and test cases a little bit, for consistency. Also, I noticed a copy-and-pasteo in a comment in ExecBatchInsert(), so I fixed it as well. Barring objections, I will commit the patch. Best regards, Etsuro Fujita fix-handling-of-pending-inserts-2.patch Description: Binary data
Report roles in pg_upgrade pg_ prefix check
Looking at a recent pg_upgrade thread I happened to notice that the check for roles with a pg_ prefix only reports the error, not the roles it found. Other similar checks where the user is expected to alter the old cluster typically reports the found objects in a textfile. The attached adds reporting to make that class of checks consistent (the check for prepared transactions which also isn't reporting is different IMO as it doesn't expect ALTER commands). As this check is only executed against the old cluster the patch removes the check when printing the error. -- Daniel Gustafsson https://vmware.com/ v1-0001-Report-incompatible-roles-in-pg_upgrade-checking.patch Description: Binary data
Re: Bug in row_number() optimization
On 24.11.2022 06:16, Richard Guo wrote:
> Regarding how to fix this problem, firstly I believe we need to
> evaluate window functions in the per-tuple memory context, as the HEAD
> does. When we decide we need to go into pass-through mode, I'm
> thinking that we can just copy out the results of the last evaluation
> to the per-query memory context, while still storing their pointers in
> ecxt_aggvalues. Does this idea work?

Although I'm not familiar with the code, this makes sense to me.

You proposed:

+#ifdef USE_FLOAT8_BYVAL
+		evalWfuncContext = winstate->ss.ps.ps_ExprContext->ecxt_per_tuple_memory;
+#else
+		evalWfuncContext = winstate->ss.ps.ps_ExprContext->ecxt_per_query_memory;
+#endif

Shouldn't we handle any pass-by-reference type the same? I suppose a user-defined window function can return some other type, not int8.

Best regards,

--
Sergey Shinderuk
https://postgrespro.com/
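To illustrate why the pass-by-reference case needs the copy Richard describes, here is a toy model: plain C buffers stand in for PostgreSQL's per-tuple and per-query MemoryContexts, and none of the names come from the actual executor code. A pointer into the per-tuple "context" becomes garbage when that context is reset, so the result must be copied into the longer-lived context first:

```c
#include <assert.h>
#include <string.h>

/* A crude stand-in for a MemoryContext: one fixed allocation slot. */
typedef struct
{
	char		buf[64];
} ToyContext;

/* "Allocate" a copy of a string in the given context. */
static char *
ctx_strdup(ToyContext *ctx, const char *s)
{
	strcpy(ctx->buf, s);
	return ctx->buf;
}

/* Resetting a context clobbers everything allocated in it. */
static void
ctx_reset(ToyContext *ctx)
{
	memset(ctx->buf, 0, sizeof(ctx->buf));
}
```

With a pass-by-value Datum (the USE_FLOAT8_BYVAL case for int8) the value lives in ecxt_aggvalues itself and survives the reset for free; for any pass-by-reference type only the pointer is stored there, which is exactly why the copy-out step — and Sergey's question about types other than int8 — matters.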
Re: postgres_fdw binary protocol support
> On 22 Nov 2022, at 17:10, Ashutosh Bapat wrote:
>
> Hi Illya,
>
> On Mon, Nov 21, 2022 at 8:50 PM Ilya Gladyshev wrote:
>>
>> Hi everyone,
>>
>> I have made a patch that introduces support for the libpq binary protocol
>> in postgres_fdw. The idea is simple: when a user knows that the foreign
>> server is binary compatible with the local one and his workload could
>> somehow benefit from using the binary protocol, it can be switched on for a
>> particular server or even a particular table.
>
> Why do we need this feature? If it's for performance then do we have
> performance numbers?

Yes, it is for performance, but I am yet to do the benchmarks. My initial idea was that the binary protocol must be more efficient than text, because as I understand that's the whole point of it. However, the minor tests that I have done do not prove this, and I couldn't find any benchmarks for it online, so I will do further tests to find a use case for it.

> About the patch itself, I see a lot of if (binary) {} else {} blocks
> which are repeated. It will be good if we can add functions/macros to
> avoid duplication.

Yea, that's true, I have some ideas about improving it
Re: Lockless queue of waiters in LWLock
Hi, hackers! Andres Freund recently committed his nice LWLock optimization a4adc31f6902f6f. So I've rebased the patch on top of the current master (PFA v5). Regards, Pavel Borisov, Supabase. v5-0001-Lockless-queue-of-LWLock-waiters.patch Description: Binary data
indentation in _hash_pgaddtup()
Hi,

I was looking at:

commit d09dbeb9bde6b9faabd30e887eff4493331d6424
Author: David Rowley
Date:   Thu Nov 24 17:21:44 2022 +1300

    Speedup hash index builds by skipping needless binary searches

In _hash_pgaddtup(), it seems the indentation is off for the assertion.

Please take a look at the patch.

Thanks

hash-pgaddtup-indent.patch
Description: Binary data
Avoid LWLockWaitForVar() for currently held WAL insertion lock in WaitXLogInsertionsToFinish()
Hi, While working on something else, I noticed that WaitXLogInsertionsToFinish() goes the LWLockWaitForVar() route even for a process that's holding a WAL insertion lock. Typically, a process holding WAL insertion lock reaches WaitXLogInsertionsToFinish() when it's in need of WAL buffer pages for its insertion and waits for other older in-progress WAL insertions to finish. This fact guarantees that the process holding a WAL insertion lock will never have its insertingAt less than 'upto'. With that said, here's a small improvement I can think of, that is, to avoid calling LWLockWaitForVar() for the WAL insertion lock the caller of WaitXLogInsertionsToFinish() currently holds. Since LWLockWaitForVar() does a bunch of things - holds interrupts, does atomic reads, acquires and releases wait list lock and so on, avoiding it may be a good idea IMO. I'm attaching a patch herewith. Here's the cirrus-ci tests - https://github.com/BRupireddy/postgres/tree/avoid_LWLockWaitForVar_for_currently_held_wal_ins_lock_v1. Thoughts? -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com v1-0001-Avoid-LWLockWaitForVar-for-currently-held-WAL-ins.patch Description: Binary data
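The proposed skip can be sketched abstractly like this. The names and the boolean result are purely illustrative — the real code deals in LWLocks, atomics, and LWLockWaitForVar(), none of which appear here; the point is only the "skip our own slot" shape of the loop:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NUM_XLOGINSERT_LOCKS 8

/*
 * Scan the per-lock insertingAt positions and report whether any OTHER
 * backend has an in-progress insertion older than 'upto'.  A backend
 * holding one of the locks can skip its own slot, because (as the mail
 * above argues) its own insertingAt can never be behind 'upto' -- so
 * there is no need to go through the LWLockWaitForVar() machinery for it.
 * Zero means "no insertion in progress" in this sketch.
 */
static bool
insertion_older_than(const uint64_t *inserting_at, int my_lock, uint64_t upto)
{
	for (int i = 0; i < NUM_XLOGINSERT_LOCKS; i++)
	{
		if (i == my_lock)
			continue;			/* our own lock: nothing to wait for */
		if (inserting_at[i] != 0 && inserting_at[i] < upto)
			return true;		/* would have to wait on this slot */
	}
	return false;
}
```

The saving is exactly the per-slot cost the mail enumerates — interrupt holdoff, atomic reads, wait-list lock acquire/release — for the one slot where the answer is known in advance.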
Re: indentation in _hash_pgaddtup()
> On 24 Nov 2022, at 13:42, Ted Yu wrote: > In _hash_pgaddtup(), it seems the indentation is off for the assertion. > > Please take a look at the patch. Indentation is handled by applying src/tools/pgindent to the code, and re-running it on this file yields no re-indentation so this is in fact correct according to the pgindent rules. -- Daniel Gustafsson https://vmware.com/
Re: TAP output format in pg_regress
> On 23 Nov 2022, at 00:56, Andres Freund wrote:
> On 2022-11-22 23:17:44 +0100, Daniel Gustafsson wrote:
>> The attached v10 attempts to address the points raised above. Notes and
>> diagnostics are printed to stdout/stderr respectively and the TAP emitter is
>> changed to move more of the syntax into it, making it less painful on the
>> translators.
>
> Thanks! I played a bit with it locally and it's nice.

Thanks for testing and reviewing!

> I think it might be worth adding a few more details to the output on stderr,
> e.g. a condensed list of failed tests, as that's then printed in the errorlogs
> summary in meson after running all tests. As we don't do that in the current
> output, that seems more like an optimization for later.

Since this patch already changes the output verbosity it seems within the goalposts to do this now rather than later. I've added a first stab at this in the attached patch.

> My compiler complains with:
>
> [6/80] Compiling C object src/test/regress/pg_regress.p/pg_regress.c.o
> ../../../../home/andres/src/postgresql/src/test/regress/pg_regress.c: In function ‘emit_tap_output_v’:
> ../../../../home/andres/src/postgresql/src/test/regress/pg_regress.c:354:9: warning: function ‘emit_tap_output_v’ might be a candidate for ‘gnu_printf’ format attribute [-Wsuggest-attribute=format]
>   354 |         vfprintf(fp, fmt, argp);
>       |         ^~~~
> ../../../../home/andres/src/postgresql/src/test/regress/pg_regress.c:356:17: warning: function ‘emit_tap_output_v’ might be a candidate for ‘gnu_printf’ format attribute [-Wsuggest-attribute=format]
>   356 |         vfprintf(logfile, fmt, argp_logfile);
>       |         ^~~~
>
> Which seems justified and easily remedied via pg_attribute_printf().

Fixed.

> I think there might be something slightly wrong with 'tags' - understandable,
> since there's basically no comment explaining what it's supposed to do.
> I added an intentional failure to 'show.pgc', which then yielded the following
> tap output:
>
> ok 51        sql/quote                 15 ms
> not ok 52    sql/show                   9 ms
> # tags: stdout source
> ok 53        sql/insupd                11 ms
> ok 54        sql/parser                13 ms
>
> which then subsequently leads meson to complain that
> TAP parsing error: out of order test numbers
> TAP parsing error: Too few tests run (expected 62, got 61)
>
> Looks like all that's needed is a \n after "tags: %s"

Ugh, yes, I forgot to run my ecpg tests before submitting. Fixed.

> I think the patch is also missing a \n in log_child_failure().

Fixed.

>> Subject: [PATCH v10 2/2] Experimental: meson: treat regress tests as TAP
>
> I'd just squash that in I think. Isn't really experimental either IMO ;)

Done.

>> +		if (type == DIAG || type == BAIL)
>> +			fp = stderr;
>
> Personally I'd move the initialization of fp to an else branch here, but
> that's a very minor taste thing.

I actually had it like that before, moved it and wasn't sure what I preferred. Moved back.

>> +	fprintf(stdout, "Bail Out!");
>> +	if (logfile)
>> +		fprintf(logfile, "Bail Out!");
>
> I think this needs a \n.

Fixed.

> I was wondering whether it's worth having a printf wrapper that does the
> vfprintf(); if (logfile) vfprintf() dance. But it's probably not.

Not sure if the saved lines of code justify the loss of readability.

>> +	/* Not using the normal bail() here as we want _exit */
>> +	_bail(_("could not exec \"%s\": %s\n"), shellprog, strerror(errno));
>
> Not in love with _bail, but I don't immediately have a better idea.

Agreed on both counts.

>> +	pg_log_error(_("# could not open file \"%s\" for reading: %s"),
>> +				 file, strerror(errno));
>
> Hm. Shouldn't this just use diag()?

It should. I've changed all uses of pg_log_error to be diag or bail for consistency in output, except for the one in getopt.

>> +	test_status_ok(tests[i], INSTR_TIME_GET_MILLISEC(stoptimes[i]), (num_tests > 1));
>>
>> 	if (statuses[i] != 0)
>> 		log_child_failure(statuses[i]);
>
> Hm.
> We probably shouldn't treat the test as a success if statuses[i] != 0?
> Although it sure looks like we did so far.

That's a good point, I admittedly hadn't thought about it. 55de145d1cf6f1d1b9 doesn't mention why it's done this way; maybe it's assumed that if the process died then the test would've failed anyway (which is likely true but not guaranteed to be so). As it's unrelated to the question of output format I'll open a new thread with that question to keep it from being buried here.

>> -	printf("%s ", tl->str);
>> +	appendStringInfo(&tagbuf, "%s ", tl->str);
>
> Why does tagbuf exist? Afaict it
Re: Patch: Global Unique Index
Pavel Stehule schrieb am 24.11.2022 um 07:03:
>> There are many Oracle users that find global indexes useful despite
>> their disadvantages.
>>
>> I have seen this mostly when the goal was to get the benefits of
>> partition pruning at runtime, which turned the full table scan (= Seq Scan)
>> on huge tables into partition scans on much smaller partitions.
>> Partition-wise joins were also helpful for query performance.
>> The substantially slower drop partition performance was accepted in those
>> cases.
>>
>> I think it would be nice to have the option in Postgres as well.
>>
>> I do agree however, that the global index should not be created
>> automatically.
>>
>> Something like CREATE GLOBAL [UNIQUE] INDEX ... would be a lot better.
>
> Is it necessary to use special marks like GLOBAL if this index will
> be partitioned, and uniqueness will be ensured by repeated
> evaluations?
>
> Or do you think there should really be a forced single-relation
> index?
>
> I can imagine a unique index on partitions without a special mark,
> that will be partitioned, and a second variant, a classic index created
> over a partitioned table, that will be marked as GLOBAL.

My personal opinion is that a global index should never be created automatically. The user should consciously decide on using a feature that might have a serious impact on performance in some areas.
Re: indentation in _hash_pgaddtup()
Daniel Gustafsson writes: >> On 24 Nov 2022, at 13:42, Ted Yu wrote: >> In _hash_pgaddtup(), it seems the indentation is off for the assertion. > Indentation is handled by applying src/tools/pgindent to the code, and > re-running it on this file yields no re-indentation so this is in fact correct > according to the pgindent rules. It is one messy bit of code though --- perhaps a little more thought about where to put line breaks would help? Alternatively, it could be split into multiple statements, along the lines of #ifdef USE_ASSERT_CHECKING if (PageGetMaxOffsetNumber(page) > 0) { IndexTuple lasttup = PageGetItem(page, PageGetItemId(page, PageGetMaxOffsetNumber(page))); Assert(_hash_get_indextuple_hashkey(lasttup) <= _hash_get_indextuple_hashkey(itup)); } #endif (details obviously tweakable) regards, tom lane
Re: Time delayed LR (WAS Re: logical replication restrictions)
On Wednesday, October 5, 2022 6:42 PM Peter Smith wrote: > Hi Euler, a long time ago you ask me a few questions about my previous review > [1]. > > Here are my replies, plus a few other review comments for patch v7-0001. Hi, thank you for your comments. > == > > 1. doc/src/sgml/catalogs.sgml > > + > + Application delay of changes by a specified amount of time. The > + unit is in milliseconds. > + > > The wording sounds a bit strange still. How about below > > SUGGESTION > The length of time (ms) to delay the application of changes. Fixed. > === > > 2. Other documentation? > > Maybe should say something on the Logical Replication Subscription page > about this? (31.2 Subscription) Added. > === > > 3. doc/src/sgml/ref/create_subscription.sgml > > + synchronized, this may lead to apply changes earlier than expected. > + This is not a major issue because a typical setting of this > parameter > + are much larger than typical time deviations between servers. > > Wording? > > SUGGESTION > ... than expected, but this is not a major issue because this parameter is > typically much larger than the time deviations between servers. Fixed. > ~~~ > > 4. Q/A > > From [2] you asked: > > > Should there also be a big warning box about the impact if using > > synchronous_commit (like the other streaming replication page has this > > warning)? > Impact? Could you elaborate? > > ~ > > I noticed the streaming replication docs for recovery_min_apply_delay has a > big > red warning box saying that setting this GUC may block the synchronous > commits. So I was saying won’t a similar big red warning be needed also for > this min_apply_delay parameter if the delay is used in conjunction with a > publisher wanting synchronous commit because it might block everything? I agree with you. Fixed. > ~~~ > > 4. 
Example > > + > +CREATE SUBSCRIPTION foo > + CONNECTION 'host=192.168.1.50 port=5432 user=foo > dbname=foodb' > +PUBLICATION baz > + WITH (copy_data = false, min_apply_delay = '4h'); > + > > If the example named the subscription/publication as ‘mysub’ and ‘mypub’ I > think it would be more consistent with the existing examples. Fixed. > == > > 5. src/backend/commands/subscriptioncmds.c - SubOpts > > @@ -89,6 +91,7 @@ typedef struct SubOpts > bool disableonerr; > char*origin; > XLogRecPtr lsn; > + int64 min_apply_delay; > } SubOpts; > > I feel it would be better to be explicit about the storage units. So call this > member ‘min_apply_delay_ms’. E.g. then other code in > parse_subscription_options will be more natural when you are converting using > and assigning them to this member. I don't think we use such names including units explicitly. Could you please tell me a similar example for this ? > ~~~ > > 6. - parse_subscription_options > > + /* > + * If there is no unit, interval_in takes second as unit. This > + * parameter expects millisecond as unit so add a unit (ms) if > + * there isn't one. > + */ > > The comment feels awkward. How about below > > SUGGESTION > If no unit was specified, then explicitly add 'ms' otherwise the interval_in > function would assume 'seconds' Fixed. > ~~~ > > 7. - parse_subscription_options > > (This is a repeat of [1] review comment #12) > > + if (opts->min_apply_delay < 0 && IsSet(supported_opts, > SUBOPT_MIN_APPLY_DELAY)) > + ereport(ERROR, > + errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE), > + errmsg("option \"%s\" must not be negative", "min_apply_delay")); > > Why is this code here instead of inside the previous code block where the > min_apply_delay was assigned in the first place? Changed. > == > > 8. 
src/backend/replication/logical/worker.c - apply_delay > > + * When min_apply_delay parameter is set on subscriber, we wait long > + enough to > + * make sure a transaction is applied at least that interval behind the > + * publisher. > > "on subscriber" -> "on the subscription" Fixed. > ~~~ > > 9. > > + * Apply delay only after all tablesync workers have reached READY > + state. A > + * tablesync worker are kept until it reaches READY state. If we allow > + the > > > Wording ?? > > "A tablesync worker are kept until it reaches READY state." ?? I removed the sentence. > ~~~ > > 10. > > 10a. > + /* nothing to do if no delay set */ > > Uppercase comment > /* Nothing to do if no delay set */ > > ~ > > 10b. > + /* set apply delay */ > > Uppercase comment > /* Set apply delay */ Both are fixed. > ~~~ > > 11. - apply_handle_stream_prepare / apply_handle_stream_commit > > The previous concern about incompatibility with the "Parallel Apply" > work (see [1] review comments #17, #18) is still a pending issue, isn't it? Yes, I think so. Kindly have a look at [1]. > == > > 12. src/backend/utils/adt/timestamp.c interval_to_ms > > +/* > + * Given an Interval returns the number of milliseconds. > + */ > +int64 > +
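The interval_to_ms() helper is cut off above. Purely as a sketch of what such a conversion involves (this is not the patch's actual code, and the struct below only mirrors PostgreSQL's Interval layout, where sub-day time is stored in microseconds and a month is translated to 30 days as discussed upthread):

```c
#include <stdint.h>

/*
 * Illustrative stand-in for PostgreSQL's Interval: time holds all units
 * below a day, in microseconds; days and months are kept separately.
 */
typedef struct
{
	int64_t		time;			/* microseconds */
	int32_t		day;
	int32_t		month;			/* treated as 30 days, per the thread */
} SketchInterval;

/* Given an interval, return the total number of milliseconds. */
static int64_t
sketch_interval_to_ms(const SketchInterval *iv)
{
	int64_t		days = (int64_t) iv->day + (int64_t) iv->month * 30;
	int64_t		usec = iv->time + days * 24 * 60 * 60 * INT64_C(1000000);

	return usec / 1000;			/* microseconds -> milliseconds */
}
```

The month-to-30-days translation is exactly the ambiguity Horiguchi-san raises later in this thread, which is why disallowing units above days was suggested.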
Re: Time delayed LR (WAS Re: logical replication restrictions)
On Wednesday, November 16, 2022 12:58 PM Ian Lawrence Barwick wrote: > On Mon, 14 Nov 2022 at 10:09, Takamichi Osumi (Fujitsu) > wrote: > > I've simply rebased the patch to make it applicable on top of HEAD and > > make the tests pass. Note there are still open pending comments and > > I'm going to start to address those. > Thanks for the updated patch. > > While reviewing the patch backlog, we have determined that this patch adds > one or more TAP tests but has not added the test to the "meson.build" file. > > To do this, locate the relevant "meson.build" file for each test and add it > in the > 'tests' dictionary, which will look something like this: > > 'tap': { > 'tests': [ > 't/001_basic.pl', > ], > }, > > For some additional details please see this Wiki article: > > https://wiki.postgresql.org/wiki/Meson_for_patch_authors > > For more information on the meson build system for PostgreSQL see: > > https://wiki.postgresql.org/wiki/Meson Hi, thanks for your notification. You are right. Modified. The updated patch can be seen in [1]. [1] - https://www.postgresql.org/message-id/TYCPR01MB8373775ECC6972289AF8CB30ED0F9%40TYCPR01MB8373.jpnprd01.prod.outlook.com Best Regards, Takamichi Osumi
RE: Time delayed LR (WAS Re: logical replication restrictions)
Hi, On Tuesday, November 22, 2022 6:15 PM vignesh C wrote: > On Mon, 14 Nov 2022 at 12:14, Amit Kapila wrote: > > > > Hi, > > > > The thread title doesn't really convey the topic under discussion, so > > changed it. IIRC, this has been mentioned by others as well in the > > thread. > > > > On Sat, Nov 12, 2022 at 7:21 PM vignesh C wrote: > > > > > > Few comments: > > > 1) I feel if the user has specified a long delay there is a chance > > > that replication may not continue if the replication slot falls > > > behind the current LSN by more than max_slot_wal_keep_size. I feel > > > we should add this reference in the documentation of min_apply_delay > > > as the replication will not continue in this case. > > > > > > > This makes sense to me. Modified accordingly. The updated patch is in [1]. > > > > > 2) I also noticed that if we have to shut down the publisher server > > > with a long min_apply_delay configuration, the publisher server > > > cannot be stopped as the walsender waits for the data to be > > > replicated. Is this behavior ok for the server to wait in this case? > > > If this behavior is ok, we could add a log message as it is not very > > > evident from the log files why the server could not be shut down. > > > > > > > I think for this case, the behavior should be the same as for physical > > replication. Can you please check what is behavior for the case you > > are worried about in physical replication? Note, we already have a > > similar parameter for recovery_min_apply_delay for physical > > replication. > > In the case of physical replication by setting recovery_min_apply_delay, I > noticed that both primary and standby nodes were getting stopped successfully > immediately after the stop server command. In case of logical replication, > stop > server fails: > pg_ctl -D publisher -l publisher.log stop -c waiting for server to shut > down... 
> failed > pg_ctl: server does not shut down > > In case of logical replication, the server does not get stopped because the > walsender process is not able to exit: > ps ux | grep walsender > vignesh 1950789 75.3 0.0 8695216 22284 ? Rs 11:51 1:08 > postgres: walsender vignesh [local] START_REPLICATION Thanks, I could reproduce this and I'll update this point in a subsequent version. [1] - https://www.postgresql.org/message-id/TYCPR01MB8373775ECC6972289AF8CB30ED0F9%40TYCPR01MB8373.jpnprd01.prod.outlook.com Best Regards, Takamichi Osumi
Re: Patch: Global Unique Index
On Fri, Nov 18, 2022 at 3:31 AM Cary Huang wrote: > > Patch: Global Unique Index > - Optimizer, query planning and vacuum - > Since no major modification is done on global unique index's structure and > storage, it works in the same way as a regular partitioned index. No major > change is required to be done on optimizer, planner and vacuum process as > they should work in the same way as regular index. It might not need changes in the vacuum to make it work. But this can not be really useful without modifying the vacuum the way it works. I mean currently, the indexes are also partitioned based on the table so whenever we do table vacuum it's fine to do index vacuum but now you will have one gigantic index and which will be vacuumed every time we vacuum any of the partitions. So for example, if you have many partitions then by the time you vacuum the whole table (all the partitions) the global index will be vacuumed once per partition. There was some effort in the past (though it was not concluded) about decoupling the index and heap vacuuming such that instead of doing the index vacuum for each partition we remember the dead tids and we only do the index vacuum when we think there are enough dead items so that the index vacuum makes sense[1]. [1] https://www.postgresql.org/message-id/CA%2BTgmoZgapzekbTqdBrcH8O8Yifi10_nB7uWLB8ajAhGL21M6A%40mail.gmail.com -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
Re: indentation in _hash_pgaddtup()
On Thu, Nov 24, 2022 at 7:11 AM Tom Lane wrote: > Daniel Gustafsson writes: > >> On 24 Nov 2022, at 13:42, Ted Yu wrote: > >> In _hash_pgaddtup(), it seems the indentation is off for the assertion. > > > Indentation is handled by applying src/tools/pgindent to the code, and > > re-running it on this file yields no re-indentation so this is in fact > correct > > according to the pgindent rules. > > It is one messy bit of code though --- perhaps a little more thought > about where to put line breaks would help? Alternatively, it could > be split into multiple statements, along the lines of > > #ifdef USE_ASSERT_CHECKING > if (PageGetMaxOffsetNumber(page) > 0) > { > IndexTuple lasttup = PageGetItem(page, > PageGetItemId(page, > > PageGetMaxOffsetNumber(page))); > > Assert(_hash_get_indextuple_hashkey(lasttup) <= >_hash_get_indextuple_hashkey(itup)); > } > #endif > > (details obviously tweakable) > > regards, tom lane > Thanks Tom for the suggestion. Here is patch v2. hash-pgaddtup-indent-v2.patch Description: Binary data
RE: Time delayed LR (WAS Re: logical replication restrictions)
Hi, On Monday, November 14, 2022 7:15 PM Amit Kapila wrote: > On Wed, Nov 9, 2022 at 12:11 PM Kyotaro Horiguchi > wrote: > > > > At Wed, 10 Aug 2022 17:33:00 -0300, "Euler Taveira" > > wrote in > > > On Wed, Aug 10, 2022, at 9:39 AM, osumi.takami...@fujitsu.com wrote: > > > > Minor review comments for v6. > > > Thanks for your review. I'm attaching v7. > > > > Using interval is not standard as this kind of parameters but it seems > > convenient. On the other hand, it's not great that the unit month > > introduces some subtle ambiguity. This patch translates a month to 30 > > days but I'm not sure it's the right thing to do. Perhaps we shouldn't > > allow the units upper than days. > > > > Agreed. Doesn't the same thing already apply to recovery_min_apply_delay, for > which the maximum unit seems to be in days? If so, there is no reason to do > something different here? The corresponding parameter for physical replication has an upper limit of INT_MAX (meaning 24 days is OK, but 25 days isn't). I added this test in the patch posted in [1]. > > > apply_delay() chokes the message-receiving path so that a not-so-long > > delay can cause a replication timeout to fire. I think we should > > process walsender pings even while delaying. Needing to make > > replication timeout longer than apply delay is not great, I think. > > > > Again, I think for this case also the behavior should be similar to how we handle > recovery_min_apply_delay. Yes, I agree with you. This feature makes it easier to trigger the publisher's timeout, which can't be observed in the physical replication. I'll do the investigation and modify this point in a subsequent version. [1] - https://www.postgresql.org/message-id/TYCPR01MB8373775ECC6972289AF8CB30ED0F9%40TYCPR01MB8373.jpnprd01.prod.outlook.com Best Regards, Takamichi Osumi
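To make the "24 days is OK, but 25 days isn't" claim concrete: with the delay stored in milliseconds under a 32-bit INT_MAX limit, the bound works out as in this standalone check (illustrative, not patch code):

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * INT32_MAX milliseconds is 2147483647 ms, roughly 24.8 days, so a
 * 24-day delay fits within the limit while a 25-day delay does not.
 */
static const int64_t MS_PER_DAY = INT64_C(24) * 60 * 60 * 1000;

static bool
delay_ms_fits_int32(int64_t ms)
{
	return ms >= 0 && ms <= INT32_MAX;
}
```

24 days is 2,073,600,000 ms (under the limit); 25 days is 2,160,000,000 ms (over it), matching the upper-bound test described above.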
RE: logical replication restrictions
Hi, On Thursday, August 11, 2022 7:33 PM Amit Kapila wrote: > On Tue, Aug 9, 2022 at 3:52 AM Euler Taveira wrote: > > > > On Wed, Aug 3, 2022, at 10:27 AM, Amit Kapila wrote: > > > > Your explanation makes sense to me. The other point to consider is > > that there can be cases where we may not apply operation for the > > transaction because of empty transactions (we don't yet skip empty > > xacts for prepared transactions). So, won't it be better to apply the > > delay just before we apply the first change for a transaction? Do we > > want to apply the delay during table sync as we sometimes do need to > > enter apply phase while doing table sync? > > > > I thought about the empty transactions but decided to not complicate > > the code mainly because skipping transactions is not a code path that > > will slow down this feature. As explained in the documentation, there > > is no harm in delaying a transaction for more than min_apply_delay; it > > cannot apply earlier. Having said that I decided to do nothing. I'm > > also not sure if it deserves a comment or if this email is a possible > > explanation > for this decision. > > > > I don't know what makes you think it will complicate the code. But anyway > thinking further about the way apply_delay is used at various places in the > patch, > as pointed out by Peter Smith it seems it won't work for the parallel apply > feature where we start applying the transaction immediately after start > stream. > I was wondering why don't we apply delay after each commit of the transaction > rather than at the begin command. We can remember if the transaction has > made any change and if so then after commit, apply the delay. If we can do > that > then it will alleviate the concern of empty and skipped xacts as well. I agree with this direction. I'll update this point in a subsequent patch. 
> Another thing I was wondering how to determine what is a good delay time for > tests and found that current tests in replay_delay.pl uses 3s, so should we > use > the same for apply delay tests in this patch as well? Fixed in the patch posted in [1]. [1] - https://www.postgresql.org/message-id/TYCPR01MB8373775ECC6972289AF8CB30ED0F9%40TYCPR01MB8373.jpnprd01.prod.outlook.com Best Regards, Takamichi Osumi
Re: Patch: Global Unique Index
On Thu, Nov 24, 2022 at 07:03:24AM +0100, Pavel Stehule wrote: > I can imagine a unique index on partitions without a special mark, that > will be partitioned, That exists since v11, as long as the index keys include the partition keys. > and a second variant classic index created over a partitioned table, > that will be marked as GLOBAL. That's not what this patch is about, though. On Thu, Nov 24, 2022 at 08:52:16PM +0530, Dilip Kumar wrote: > but now you will have one gigantic index and which will be vacuumed > every time we vacuum any of the partitions. This patch isn't implemented as "one gigantic index", though. -- Justin
Re: Damage control for planner's get_actual_variable_endpoint() runaway
On Tue, 22 Nov 2022 at 18:44, Tom Lane wrote: > > I wrote: > > Still wondering if there's really no CHECK_FOR_INTERRUPT anywhere > > else in this loop. > > I did some experimentation using the test case Jakub presented > to start with, and verified that that loop does respond promptly > to control-C even in HEAD. So there are CFI(s) in the loop as > I thought, and we don't need another. Thanks for checking. Sorry for not responding earlier. > What we do need is some more work on nearby comments. I'll > see about that and push it. Thanks; nicely streamlined. -- Simon Riggs    http://www.EnterpriseDB.com/
Re: TAP output format in pg_regress
You guys are really fast... I only start thinking about a problem, and it is already mentioned here... Most issues I've noticed are already fixed before I was able to say something. Nevertheless... /* * Bailing out is for unrecoverable errors which prevents further testing to * occur and after which the test run should be aborted. By passing non_rec * as true the process will exit with _exit(2) and skipping registered exit * handlers. */ static void bail_out(bool non_rec, const char *fmt,...) { In the original code, where _exit was called, there was a mention of recursion (whatever it is) as a reason for using _exit() instead of exit(). Now this mention is gone: - _exit(2); /* not exit(), that could be recursive */ + /* Not using the normal bail() as we want _exit */ + _bail(_("could not stop postmaster: exit code was %d\n"), r); The only remaining part of recursion is the _rec suffix. I guess we should keep the mention of recursion in comments, and for me _rec stands for "record", not "recursion"; I will not guess the original meaning from the _rec suffix. I would suggest changing it to _recurs or _recursive to keep things clear - +#define _bail(...) bail_out(true, __VA_ARGS__) +#define bail(...) bail_out(false, __VA_ARGS__) I will never guess what the difference is between bail, _bail and bail_out. We need a really good explanation here, or better to use self-explanatory names here... I would start bail_out with _ as it is "internal", not used in the code. And for _bail I would try to find some name that shows its nature. Like bail_in_recursion or something. - + if (ok || ignore) { - va_start(ap, fmt); - vfprintf(logfile, fmt, ap); - va_end(ap); + emit_tap_output(TEST_STATUS, "ok %-5i %s%-*s %8.0f ms%s\n", + testnumber, + (parallel ? PARALLEL_INDENT : ""), + padding, testname, + runtime, + (ignore ? " # SKIP (ignored)" : "")); + } + else + { + emit_tap_output(TEST_STATUS, "not ok %-5i %s%-*s %8.0f ms\n", + testnumber, + (parallel ? 
PARALLEL_INDENT : ""), + padding, testname, + runtime); } This magic spell "...%-5i %s%-*s %8.0f ms\n" is too dark to repeat it even two times. I understand problems with spaces... But maybe it would be better to somehow narrow it to one ugly print... Print "ok %-5i"|"not ok %-5i" to buffer first, and then have one "%s%-*s %8.0f ms%s\n" print or something like that... I am not strongly insisting on that, but I feel a strong urge to remove code duplication here... -- Nikolay Shaplov aka Nataraj Fuzzing Engineer at Postgres Professional Matrix IM: @dhyan:nataraj.su signature.asc Description: This is a digitally signed message part.
Re: O(n) tasks cause lengthy startups and checkpoints
On Thu, 24 Nov 2022 at 00:19, Nathan Bossart wrote: > > On Sun, Nov 06, 2022 at 02:38:42PM -0800, Nathan Bossart wrote: > > rebased > > another rebase for cfbot 0001 seems good to me * I like that it sleeps forever until requested * not sure I believe that everything it does can always be aborted out of and shut down - to achieve that you will need CHECK_FOR_INTERRUPTS() calls in the loops in patches 5 and 6 at least * not sure why you want immediate execution of custodian tasks - I feel supporting two modes will be a lot harder. For me, I would run locally when !IsUnderPostmaster and also in an Assert build, so we can test it works right - i.e. running in its own process is just a production optimization for performance (which is the stated reason for having this) 0005 seems good from what I know * There is no check to see if it worked in any sane time * It seems possible that "Old" might change meaning - will that make it break/fail? 0006 seems good also * same comments for 5 Rather than explicitly use DEBUG1 everywhere I would have a #define CUSTODIAN_LOG_LEVEL LOG so we can run with it in LOG mode and then set it to DEBUG1 with a one line change in a later phase of Beta I can't really comment with knowledge on sub-patches 0002 to 0004. Perhaps you should aim to get 1, 5, 6 committed first and then return to the others in a later CF/separate thread? -- Simon Riggs    http://www.EnterpriseDB.com/
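The CHECK_FOR_INTERRUPTS() point above boils down to checking for a pending shutdown between units of work in long loops. Sketched here with a stand-in flag (ShutdownRequested is hypothetical; in PostgreSQL the check is the CHECK_FOR_INTERRUPTS() macro, which can throw rather than return):

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a pending-interrupt indicator. */
static volatile bool ShutdownRequested = false;

/*
 * A long-running cleanup scan that can be aborted between items, in the
 * spirit of sprinkling CHECK_FOR_INTERRUPTS() through custodian loops.
 * Returns the number of items processed before shutdown (or all of them).
 */
static size_t
process_items_interruptibly(const int *items, size_t nitems)
{
	size_t		done = 0;

	for (size_t i = 0; i < nitems; i++)
	{
		if (ShutdownRequested)	/* CHECK_FOR_INTERRUPTS() stand-in */
			break;
		/* ... per-item cleanup work would go here ... */
		(void) items[i];
		done++;
	}
	return done;
}
```

The point is simply that the check sits inside the loop body, so a shutdown request is honored within one item's worth of work rather than after the whole scan.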
Re: Fix for visibility check on 14.5 fails on tpcc with high concurrency
On 2022-Nov-24, Dimos Stamatakis wrote: > Thanks for your feedback! > I applied the patch to print more information about the error. Here’s what I > got: > > 2022-11-23 20:33:03 UTC [638 test_database]: [5458] ERROR: new multixact has > more than one updating member: 0 2[248477 (nokeyupd), 248645 (nokeyupd)] > 2022-11-23 20:33:03 UTC [638 test_database]: [5459] STATEMENT: UPDATE > warehouse1 > SET w_ytd = w_ytd + 498 > WHERE w_id = 5 > > I then inspected the WAL and I found the log records for these 2 transactions: > > … > rmgr: MultiXact len (rec/tot): 54/54, tx: 248477, lsn: > 0/66DB82A8, prev 0/66DB8260, desc: CREATE_ID 133 offset 265 nmembers 2: > 248477 (nokeyupd) 248500 (keysh) > rmgr: Heap len (rec/tot): 70/70, tx: 248477, lsn: > 0/66DB82E0, prev 0/66DB82A8, desc: HOT_UPDATE off 20 xmax 133 flags 0x20 > IS_MULTI EXCL_LOCK ; new off 59 xmax 132, blkref #0: rel 1663/16384/16385 blk > 422 > rmgr: Transaction len (rec/tot): 34/34, tx: 248477, lsn: > 0/66DBA710, prev 0/66DBA6D0, desc: ABORT 2022-11-23 20:33:03.712298 UTC > … > rmgr: Transaction len (rec/tot): 34/34, tx: 248645, lsn: > 0/66DBB060, prev 0/66DBB020, desc: ABORT 2022-11-23 20:33:03.712388 UTC Ah, it seems clear enough: the transaction that aborted after having updated the tuple is still considered live when doing the second update. That sounds wrong. -- Álvaro Herrera    Breisgau, Deutschland — https://www.EnterpriseDB.com/ "Puedes vivir sólo una vez, pero si lo haces bien, una vez es suficiente"
Re: Add 64-bit XIDs into PostgreSQL 15
On Fri, 21 Oct 2022 at 17:09, Maxim Orlov wrote: > Reviews and opinions are very welcome! I'm wondering whether the safest way to handle this is by creating a new TAM called "heap64", so that all storage changes happen there. (Obviously there are still many other changes in core, but they are more easily fixed). That would reduce the code churn around "heap", allowing us to keep it stable while we move to the brave new world. Many current users see stability as one of the greatest strengths of Postgres, so while I very much support this move, I wonder if this gives us a way to have both stability and innovation at the same time? -- Simon Riggs    http://www.EnterpriseDB.com/
Re: Questions regarding distinct operation implementation
On 23/11/22 23:48, Ankit Kumar Pandey wrote: Hello, I have questions regarding distinct operation and would be glad if someone could help me out. Consider the following table (mytable): id, name 1, A 1, A 2, B 3, A 1, A If we do /select avg(id) over (partition by name) from mytable/, partition logic goes like this: for A: 1, 1, 3, 1 If we want to implement something like this /select avg(distinct id) over (partition by name) from mytable/ and remove duplicates by storing the last datum of the aggregate column (id) and comparing it with the current value, it fails here because the aggregate column is not sorted within the partition. Questions: 1. Is sorting a prerequisite for finding distinct values? 2. Is it okay to sort the aggregate column (within the partition) for distinct to work in case of a window function? 3. Does an alternative way exist to handle this scenario (because here sort is of no use in aggregation)? Thanks -- Regards, Ankit Kumar Pandey Hi, After a little more digging, I can see that aggregations on window functions are of the running type. It would be a bit more effective if a lookup hashtable is created where every value in the current aggregate column gets inserted. Whenever the frame moves ahead, a lookup is performed for the presence of a duplicate. From a performance standpoint, this might be a bad idea though. Please let me know any opinions on this. -- Regards, Ankit Kumar Pandey
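The lookup-hashtable idea can be sketched for the example partition above: compute avg(DISTINCT id) without sorting by remembering which ids were already aggregated. A toy open-addressing set stands in for a real hash table (dynahash/simplehash); none of this is executor code, and it assumes fewer distinct keys than slots:

```c
#include <stdbool.h>
#include <stddef.h>

#define SEEN_SLOTS 64			/* illustrative size only */

typedef struct
{
	bool		used[SEEN_SLOTS];
	int			key[SEEN_SLOTS];
} SeenSet;

/* Insert id; returns true only if it was not already present. */
static bool
seen_insert(SeenSet *s, int id)
{
	size_t		h = ((size_t) id * 2654435761u) % SEEN_SLOTS;

	while (s->used[h])
	{
		if (s->key[h] == id)
			return false;		/* duplicate: skip aggregation */
		h = (h + 1) % SEEN_SLOTS;	/* linear probing */
	}
	s->used[h] = true;
	s->key[h] = id;
	return true;
}

/* avg(DISTINCT id) over one partition, feeding each id at most once. */
static double
avg_distinct(const int *ids, size_t n)
{
	SeenSet		seen = {0};
	double		sum = 0.0;
	size_t		count = 0;

	for (size_t i = 0; i < n; i++)
	{
		if (seen_insert(&seen, ids[i]))
		{
			sum += ids[i];
			count++;
		}
	}
	return count ? sum / count : 0.0;
}
```

For partition A in the example (ids 1, 1, 3, 1) the distinct values are {1, 3}, so the average is 2.0. The performance concern raised above is real: the set must be kept per partition, and memory grows with the number of distinct values.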
Re: Patch: Global Unique Index
On Thu, 24 Nov 2022 08:00:59 -0700 Thomas Kellerer wrote --- > Pavel Stehule schrieb am 24.11.2022 um 07:03: > > There are many Oracle users that find global indexes useful despite > > their disadvantages. > > > > I have seen this mostly when the goal was to get the benefits of > > partition pruning at runtime which turned the full table scan (=Seq > > Scan) > > on huge tables to partition scans on much smaller partitions. > > Partition wise joins were also helpful for query performance. > > The substantially slower drop partition performance was accepted in > > thos cases > > > > > > I think it would be nice to have the option in Postgres as well. > > > > I do agree however, that the global index should not be created > > automatically. > > > > Something like CREATE GLOBAL [UNIQUE] INDEX ... would be a lot better > > > > > > Is it necessary to use special marks like GLOBAL if this index will > > be partitioned, and uniqueness will be ensured by repeated > > evaluations? > > > > Or you think so there should be really forced one relation based > > index? > > > > I can imagine a unique index on partitions without a special mark, > > that will be partitioned, and a second variant classic index created > > over a partitioned table, that will be marked as GLOBAL. > > > My personal opinion is, that a global index should never be created > automatically. > > The user should consciously decide on using a feature > that might have a serious impact on performance in some areas. Agreed, if a unique index is created on non-partition key columns without including the special mark (partition key columns), it may be a mistake from user. (At least I make this mistake all the time). Current PG will give you a warning to include the partition keys, which is good. If we were to automatically turn that into a global unique index, user may be using the feature without knowing and experiencing some performance impacts (to account for extra uniqueness check in all partitions).
Re: Fix for visibility check on 14.5 fails on tpcc with high concurrency
On 2022-Nov-24, Alvaro Herrera wrote: > On 2022-Nov-24, Dimos Stamatakis wrote: > > > rmgr: MultiXact len (rec/tot): 54/54, tx: 248477, lsn: > > 0/66DB82A8, prev 0/66DB8260, desc: CREATE_ID 133 offset 265 nmembers 2: > > 248477 (nokeyupd) 248500 (keysh) > > rmgr: Heap len (rec/tot): 70/70, tx: 248477, lsn: > > 0/66DB82E0, prev 0/66DB82A8, desc: HOT_UPDATE off 20 xmax 133 flags 0x20 > > IS_MULTI EXCL_LOCK ; new off 59 xmax 132, blkref #0: rel 1663/16384/16385 > > blk 422 > > rmgr: Transaction len (rec/tot): 34/34, tx: 248477, lsn: > > 0/66DBA710, prev 0/66DBA6D0, desc: ABORT 2022-11-23 20:33:03.712298 UTC > > … > > rmgr: Transaction len (rec/tot): 34/34, tx: 248645, lsn: > > 0/66DBB060, prev 0/66DBB020, desc: ABORT 2022-11-23 20:33:03.712388 UTC > > Ah, it seems clear enough: the transaction that aborted after having > updated the tuple, is still considered live when doing the second > update. That sounds wrong. Hmm, if a transaction is aborted but its Xid is after latestCompletedXid, it would be reported as still running. AFAICS that is only modified with ProcArrayLock held in exclusive mode, and only read with it held in share mode, so this should be safe. I can see nothing else ATM. -- Álvaro Herrera    Breisgau, Deutschland — https://www.EnterpriseDB.com/ "I must say, I am absolutely impressed with what pgsql's implementation of VALUES allows me to do. It's kind of ridiculous how much "work" goes away in my code. Too bad I can't do this at work (Oracle 8/9)." (Tom Allison) http://archives.postgresql.org/pgsql-general/2007-06/msg00016.php
Re: Avoid LWLockWaitForVar() for currently held WAL insertion lock in WaitXLogInsertionsToFinish()
Hi, On 2022-11-24 18:13:10 +0530, Bharath Rupireddy wrote: > With that said, here's a small improvement I can think of, that is, to > avoid calling LWLockWaitForVar() for the WAL insertion lock the caller > of WaitXLogInsertionsToFinish() currently holds. Since > LWLockWaitForVar() does a bunch of things - holds interrupts, does > atomic reads, acquires and releases wait list lock and so on, avoiding > it may be a good idea IMO. That doesn't seem like a big win. We're still going to call LWLockWaitForVar() for all the other locks. I think we could improve this code more significantly by avoiding the call to LWLockWaitForVar() for all locks that aren't acquired or don't have a conflicting insertingAt, that'd require just a bit more work to handle systems without tear-free 64bit writes/reads. The easiest way would probably be to just make insertingAt a 64bit atomic, that transparently does the required work to make even non-atomic read/writes tear free. Then we could trivially avoid the spinlock in LWLockConflictsWithVar(), LWLockReleaseClearVar() and with just a bit more work add a fastpath to LWLockUpdateVar(). We don't need to acquire the wait list lock if there aren't any waiters. I'd bet that'd start to have visible effects in a workload with many small records. Greetings, Andres Freund
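The fastpath Andres describes can be sketched with a C11 atomic standing in for the 64-bit variable: once insertingAt can be read tear-free without a spinlock, a waiter can skip the slow path whenever the lock is free or its position is already past the one being waited for. Field and function names below are illustrative, not lwlock.c's actual code:

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model of a WAL insertion lock with an atomically readable var. */
typedef struct
{
	_Atomic bool	held;
	_Atomic uint64_t insertingAt;	/* tear-free even on 32-bit targets */
} SketchInsertLock;

/*
 * Returns true if the caller must fall back to the slow path (queueing
 * on the wait list); false means the check can be satisfied with two
 * plain atomic loads and no spinlock or wait-list lock at all.
 */
static bool
lock_conflicts_with_pos(SketchInsertLock *lk, uint64_t waitpos)
{
	if (!atomic_load(&lk->held))
		return false;			/* lock not acquired: nothing to wait for */
	/* tear-free 64-bit read of the variable, no spinlock needed */
	return atomic_load(&lk->insertingAt) < waitpos;
}
```

On platforms without native 64-bit atomics, `_Atomic uint64_t` transparently falls back to a lock-based implementation, which is the "required work ... to make even non-atomic read/writes tear free" mentioned above.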
Re: TAP output format in pg_regress
> On 24 Nov 2022, at 18:07, Nikolay Shaplov wrote:
>
> You guys are really fast... I only think about problem, it is already
> mentioned here... Most issues I've noticed are already fixed before I was
> able to say something.

Thanks for looking and reviewing!

>> /*
>>  * Bailing out is for unrecoverable errors which prevents further testing to
>>  * occur and after which the test run should be aborted. By passing non_rec
>>  * as true the process will exit with _exit(2) and skipping registered exit
>>  * handlers.
>>  */
>> static void
>> bail_out(bool non_rec, const char *fmt,...)
>> {
>
> In original code, where _exit were called, there were mention about recursion
> (whatever it is) as a reason for using _exit() instead of exit(). Now this
> mention is gone:
>
> -			_exit(2);			/* not exit(), that could be recursive */
> +			/* Not using the normal bail() as we want _exit */
> +			_bail(_("could not stop postmaster: exit code was %d\n"), r);
>
> The only remaining part of recursion is _rec suffix.
>
> I guess we should keep mention of recursion in comments, and for me _rec
> stands for "record", not "recursion", I will not guess original meaning by
> _rec suffix. I would suggest to change it to _recurs or _recursive to keep
> things clear

The other original comment on _exit usage reads: /* not exit() here... */

I do think that the longer explanation I added in the comment you quoted above
is more explanatory than those. I can however add a small note on why skipping
registered exit handlers is useful (ie, not risk recursive calls to exit()).

> +#define _bail(...)	bail_out(true, __VA_ARGS__)
> +#define bail(...)	bail_out(false, __VA_ARGS__)
>
> I will never guess what the difference between bail, _bail and bail_out. We
> need really good explanation here, or better to use self-explanatory names
> here...
>
> I would start bail_out with _ as it is "internal", not used in the code.
> And for _bail I would try to find some name that shows it's nature. Like
> bail_in_recursion or something.

One option could be to redefine bail() to take the exit function as a parameter
and have the caller pass the preferred exit handler.

-bail_out(bool non_rec, const char *fmt,...)
+bail(void (*exit_func)(int), const char *fmt,...)

The callsites would then look like the below, which puts a reference to the
actual exit handler used in the code where it is called. I find it a bit
uglier, but also quite self-explanatory:

@@ -409,10 +403,7 @@ stop_postmaster(void)
 		fflush(NULL);
 		r = system(buf);
 		if (r != 0)
-		{
-			/* Not using the normal bail() as we want _exit */
-			_bail(_("could not stop postmaster: exit code was %d\n"), r);
-		}
+			bail(_exit, _("could not stop postmaster: exit code was %d\n"), r);
 		postmaster_running = false;
 	}

@@ -469,7 +460,7 @@ make_temp_sockdir(void)
 	temp_sockdir = mkdtemp(template);
 	if (temp_sockdir == NULL)
 	{
-		bail(_("could not create directory \"%s\": %s\n"),
+		bail(exit, _("could not create directory \"%s\": %s\n"),
 			 template, strerror(errno));
 	}

Not sure what is the best option, but I've been unable to think of a name which
is documenting the code well enough that a comment explaining why isn't
required.

> This magic spell "...%-5i %s%-*s %8.0f ms\n" is too dark to repeat it even
> two times. I understand problems with spaces... But may be it would be better
> somehow narrow it to one ugly print... Print "ok %-5i"|"not ok %-5i" to
> buffer first, and then have one "%s%-*s %8.0f ms%s\n" print or something like
> that...

I'm not convinced that this printf format is that hard to read (which may well
be attributed to Stockholm Syndrome), and I do think that breaking it up and
adding more code to print the line will make it less readable instead.

--
Daniel Gustafsson		https://vmware.com/
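For readers unfamiliar with the distinction being discussed: exit() runs any
handlers registered with atexit(), so calling it from code that can itself be
reached from an exit handler risks recursion, while _exit() terminates
immediately and skips the handlers. A minimal stand-alone sketch (not
pg_regress code; handler_bytes() is a made-up name for illustration) showing
the observable difference:

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* atexit() handler: writes one byte so the parent can observe whether it ran */
static void
cleanup(void)
{
	write(STDOUT_FILENO, "H", 1);
}

/*
 * Fork a child that registers cleanup() and then terminates via exit() or
 * _exit().  Returns the number of bytes the handler wrote (1 or 0).
 */
static int
handler_bytes(int use_underscore_exit)
{
	int			fd[2];
	pid_t		pid;
	char		buf[8];
	ssize_t		n;
	int			status;

	if (pipe(fd) != 0)
		return -1;
	pid = fork();
	if (pid == 0)
	{
		close(fd[0]);
		dup2(fd[1], STDOUT_FILENO);
		atexit(cleanup);
		if (use_underscore_exit)
			_exit(2);			/* skips atexit handlers */
		else
			exit(2);			/* runs atexit handlers */
	}
	close(fd[1]);
	n = read(fd[0], buf, sizeof(buf));	/* EOF (0 bytes) if handler skipped */
	close(fd[0]);
	waitpid(pid, &status, 0);
	return (int) n;
}
```

With exit() the handler runs and one byte arrives; with _exit() it does not,
which is exactly why a bail function that may be invoked from within an exit
handler wants _exit() rather than re-entering exit().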
Re: Add 64-bit XIDs into PostgreSQL 15
On Sun, Nov 20, 2022 at 11:58 PM Chris Travers wrote:
> I can start by saying I think it would be helpful (if the other issues are
> approached reasonably) to have 64-bit xids, but there is an important piece
> of context in preventing xid wraparounds that seems missing from this patch
> unless I missed something.
>
> XID wraparound is a symptom, not an underlying problem. It usually occurs
> when autovacuum or other vacuum strategies have unexpected stalls and
> therefore fail to work as expected. Shifting to 64-bit XIDs dramatically
> changes the sorts of problems that these stalls are likely to pose to
> operational teams -- you can find you are running out of storage rather
> than facing an imminent database shutdown. Worse, this patch delays the
> problem until some (possibly far later!) time, when vacuum will take far
> longer to finish, and options for resolving the problem are diminished. As
> a result I am concerned that merely changing xids from 32-bit to 64-bit
> will lead to a smaller number of far more serious outages.

This is exactly what I think (except perhaps for the part about having fewer
outages overall). The more transaction ID space you need, the more space
you're likely to need in the near future.

We can all agree that having more runway is better than having less runway, at
least in some abstract sense, but that in itself doesn't help the patch series
very much. The first time the system-level oldestXid (or database-level
datminfrozenxid) attains an age of 2 billion XIDs will usually *also* be the
first time it attains an age of (say) 300 million XIDs. Even 300 million is
usually a huge amount of XID space relative to (say) the number of XIDs used
every 24 hours. So I know exactly what you mean about just addressing a
symptom.

The whole project seems to just ignore basic, pertinent questions. Questions
like: why are we falling behind like this in the first place? And: if we don't
catch up soon, why should we be able to catch up later on?

Falling behind on freezing is still a huge problem with 64-bit XIDs. Part of
the problem with the current design is that table age has approximately zero
relationship with the true cost of catching up on freezing -- we are "using
the wrong units", in a very real sense. In general we may have to do zero
freezing to advance a table's relfrozenxid age by a billion XIDs, or we might
have to write terabytes of FREEZE_PAGE records to advance a similar looking
table's relfrozenxid by just one single XID (it could also swing wildly over
time for the same table). Which the system simply doesn't try to reason about
right now. There are no settings for freezing that use physical units, and
frame the problem as a problem of being behind by this many unfrozen pages
(they are all based on XID age).

And so the problem with letting table age get into the billions isn't even
that we'll never catch up -- we actually might catch up very easily! The real
problem is that we have no way of knowing ahead of time (at least not right
now). VACUUM should be managing the debt, *and* the uncertainty about how much
debt we're really in. VACUUM needs to have a dynamic, probabilistic
understanding of what's really going on -- something much more sophisticated
than looking at table age in autovacuum.c.

One reason why you might want to advance relfrozenxid proactively is to give
the system a better general sense of the true relationship between logical XID
space and physical freezing for a given table and workload -- it gives a
clearer picture about the conditions in the table. The relationship between
SLRU space and physical heap pages and the work of freezing is made somewhat
clearer by a more proactive approach to advancing relfrozenxid.
That's one way that VACUUM can lower the uncertainty I referred to.

--
Peter Geoghegan
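The point about 300 million XIDs being "a huge amount of runway relative to
the XIDs used every 24 hours" can be made concrete with toy numbers (entirely
illustrative, not measurements from any real system):

```c
#include <assert.h>

/*
 * Days of "runway" before a given amount of XID space is consumed at a
 * constant consumption rate.  Purely illustrative arithmetic; real systems
 * consume XIDs at wildly varying rates, which is part of the argument above
 * about uncertainty.
 */
static long
runway_days(long xid_space, long xids_per_day)
{
	return xid_space / xids_per_day;
}
```

At a (made-up) steady 10 million XIDs/day, 300 million XIDs of headroom is a
month of runway, and the full ~2 billion is about 200 days -- which is why
merely widening the counter changes when the problem surfaces, not whether
vacuuming is keeping up.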
Re: TAP output format in pg_regress
On November 24, 2022 11:07:43 AM PST, Daniel Gustafsson wrote:
>> On 24 Nov 2022, at 18:07, Nikolay Shaplov wrote:
>
> One option could be to redefine bail() to take the exit function as a
> parameter and have the caller pass the preferred exit handler.
>
> -bail_out(bool non_rec, const char *fmt,...)
> +bail(void (*exit_func)(int), const char *fmt,...)
>
> The callsites would then look like the below, which puts a reference to the
> actual exit handler used in the code where it is called.

I'd just rename _bail to bail_noatexit().

>>> This magic spell "...%-5i %s%-*s %8.0f ms\n" is too dark to repeat it even
>>> two times. I understand problems with spaces... But may be it would be
>>> better somehow narrow it to one ugly print... Print "ok %-5i"|"not ok %-5i"
>>> to buffer first, and then have one "%s%-*s %8.0f ms%s\n" print or something
>>> like that...
>>
>> I'm not convinced that this printf format is that hard to read (which may
>> well be attributed to Stockholm Syndrome), and I do think that breaking it
>> up and adding more code to print the line will make it less readable
>> instead.

I don't think it's terrible either. I do think it'd also be ok to switch
between ok / not ok within a single printf, making it easier to keep them in
sync.

Andres
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.
Re: indentation in _hash_pgaddtup()
On Fri, 25 Nov 2022 at 04:29, Ted Yu wrote:
> Here is patch v2.

After running pgindent on v2, I see it still pushes the lines out quite far.
If I add a new line after PageGetItemId(page, and put the variable assignment
away from the variable declaration then it looks a bit better. It's still 1
char over the limit.

David

diff --git a/src/backend/access/hash/hashinsert.c b/src/backend/access/hash/hashinsert.c
index 9db522051e..dfc7a90d68 100644
--- a/src/backend/access/hash/hashinsert.c
+++ b/src/backend/access/hash/hashinsert.c
@@ -290,12 +290,20 @@ _hash_pgaddtup(Relation rel, Buffer buf, Size itemsize, IndexTuple itup,
 	{
 		itup_off = PageGetMaxOffsetNumber(page) + 1;
 
+#ifdef USE_ASSERT_CHECKING
 		/* ensure this tuple's hashkey is >= the final existing tuple */
-		Assert(PageGetMaxOffsetNumber(page) == 0 ||
-			   _hash_get_indextuple_hashkey((IndexTuple)
-											PageGetItem(page, PageGetItemId(page,
-																			PageGetMaxOffsetNumber(page)))) <=
-			   _hash_get_indextuple_hashkey(itup));
+		if (PageGetMaxOffsetNumber(page) > 0)
+		{
+			IndexTuple	lasttup;
+
+			lasttup = PageGetItem(page,
+								  PageGetItemId(page,
+												PageGetMaxOffsetNumber(page)));
+
+			Assert(_hash_get_indextuple_hashkey(lasttup) <=
+				   _hash_get_indextuple_hashkey(itup));
+		}
+#endif
 	}
 	else
 	{
Re: indentation in _hash_pgaddtup()
David Rowley writes:
> After running pgindent on v2, I see it still pushes the lines out
> quite far. If I add a new line after PageGetItemId(page, and put the
> variable assignment away from the variable declaration then it looks a
> bit better. It's still 1 char over the limit.

If you wanted to be hard-nosed about 80 character width, you could pull out
the PageGetItemId call into a separate local variable. I wasn't going to be
quite that picky, but I won't object if that seems better to you.

			regards, tom lane
Re: indentation in _hash_pgaddtup()
On Thu, Nov 24, 2022 at 12:31 PM Tom Lane wrote:
> David Rowley writes:
>> After running pgindent on v2, I see it still pushes the lines out
>> quite far. If I add a new line after PageGetItemId(page, and put the
>> variable assignment away from the variable declaration then it looks a
>> bit better. It's still 1 char over the limit.
>
> If you wanted to be hard-nosed about 80 character width, you could
> pull out the PageGetItemId call into a separate local variable.
> I wasn't going to be quite that picky, but I won't object if that
> seems better to you.
>
> regards, tom lane

Patch v4 stores ItemId in a local variable.

hash-pgaddtup-indent-v4.patch
Re: Time delayed LR (WAS Re: logical replication restrictions)
On Fri, Nov 25, 2022 at 2:15 AM Takamichi Osumi (Fujitsu) wrote:
>
> On Wednesday, October 5, 2022 6:42 PM Peter Smith wrote:
...
> > ======
> >
> > 5. src/backend/commands/subscriptioncmds.c - SubOpts
> >
> > @@ -89,6 +91,7 @@ typedef struct SubOpts
> >  	bool		disableonerr;
> >  	char	   *origin;
> >  	XLogRecPtr	lsn;
> > +	int64		min_apply_delay;
> >  } SubOpts;
> >
> > I feel it would be better to be explicit about the storage units. So call
> > this member 'min_apply_delay_ms'. E.g. then other code in
> > parse_subscription_options will be more natural when you are converting
> > using and assigning them to this member.
>
> I don't think we use such names including units explicitly.
> Could you please tell me a similar example for this ?

Regex search "\..*_ms[e\s]" finds some members where the unit is in the member
name.

e.g. delay_ms (see EnableTimeoutParams in timeout.h)
e.g. interval_in_ms (see timeout_params in timeout.c)

Regex search ".*_ms[e\s]" finds many local variables where the unit is in the
variable name.

> > ======
> >
> > 16. src/include/catalog/pg_subscription.h
> >
> > +	int64		subapplydelay;	/* Replication apply delay */
> > +
> >
> > Consider renaming this as 'subapplydelayms' to make the units perfectly
> > clear.
>
> Similar to the 5th comments, I can't find any examples for this.
> I'd like to keep it general, which makes me feel it is more aligned with
> existing codes.

As above.

--
Kind Regards,
Peter Smith.
Fujitsu Australia
Re: Questions regarding distinct operation implementation
On Fri, 25 Nov 2022 at 06:57, Ankit Kumar Pandey wrote:
> Please let me know any opinions on this.

I think if you're planning on working on this then step 1 would have to be
checking the SQL standard to see which set of rows it asks implementations to
consider for duplicate checks when deciding if the transition should be
performed or not. Having not looked, I don't know if this is the entire
partition or just the rows in the current frame.

Depending on what you want, an alternative today would be to run a subquery to
uniquify the rows the way you want and then do the window function stuff.

David
Re: indentation in _hash_pgaddtup()
On Fri, 25 Nov 2022 at 09:40, Ted Yu wrote:
> Patch v4 stores ItemId in a local variable.

ok, I pushed that one. Thank you for working on this.

David
Re: indentation in _hash_pgaddtup()
On Fri, 25 Nov 2022 at 09:31, Tom Lane wrote:
> If you wanted to be hard-nosed about 80 character width, you could
> pull out the PageGetItemId call into a separate local variable.
> I wasn't going to be quite that picky, but I won't object if that
> seems better to you.

I wasn't too worried about the 1 char, but Ted wrote a patch and it looked a
little nicer.

David
Re: TAP output format in pg_regress
> On 24 Nov 2022, at 20:32, Andres Freund wrote:
>
> On November 24, 2022 11:07:43 AM PST, Daniel Gustafsson wrote:
>>> On 24 Nov 2022, at 18:07, Nikolay Shaplov wrote:
>> One option could be to redefine bail() to take the exit function as a
>> parameter and have the caller pass the preferred exit handler.
>>
>> -bail_out(bool non_rec, const char *fmt,...)
>> +bail(void (*exit_func)(int), const char *fmt,...)
>>
>> The callsites would then look like the below, which puts a reference to the
>> actual exit handler used in the code where it is called.
>
> I'd just rename _bail to bail_noatexit().

That's probably the best option, done in the attached along with the comment
fixup to mention the recursion issue.

>>> This magic spell "...%-5i %s%-*s %8.0f ms\n" is too dark to repeat it even
>>> two times. I understand problems with spaces... But may be it would be
>>> better somehow narrow it to one ugly print... Print "ok %-5i"|"not ok
>>> %-5i" to buffer first, and then have one "%s%-*s %8.0f ms%s\n" print or
>>> something like that...
>>
>> I'm not convinced that this printf format is that hard to read (which may
>> well be attributed to Stockholm Syndrome), and I do think that breaking it
>> up and adding more code to print the line will make it less readable
>> instead.
>
> I don't think it's terrible either. I do think it'd also be ok to switch
> between ok / not ok within a single printf, making it easier to keep them in
> sync.

I made it into a single printf to see what it would look like, with some
additional comments to make it more readable (I'm not a fan of where pgindent
moves those but..).

--
Daniel Gustafsson		https://vmware.com/

v12-0001-Change-pg_regress-output-format-to-be-TAP-compli.patch
Re: Allow pageinspect's bt_page_stats function to return a set of rows instead of a single row
On Mon, 21 Nov 2022 at 17:34, Naeem Akhter wrote:
> The following review has been posted through the commitfest application:
> make installcheck-world:  tested, passed
> Implements feature:       tested, passed
> Spec compliant:           not tested
> Documentation:            not tested
>
> i've tested and verified the documentation.

Rebasing the patch to the tip of the master branch.

pageinspect_btree_multipagestats-v6.patch
Re: Allow file inclusion in pg_hba and pg_ident files
On Thu, Nov 24, 2022 at 05:07:24PM +0800, Julien Rouhaud wrote:
> So I went with CONF_FILE_START_DEPTH and CONF_FILE_MAX_DEPTH. Attached v22
> that fixes it in all the places I found.

Sounds fine. Added one comment, fixed one comment, and applied.
Thanks!

--
Michael
Re: Bug in row_number() optimization
On Fri, 25 Nov 2022 at 00:52, Sergey Shinderuk wrote:
> Shouldn't we handle any pass-by-reference type the same? I suppose, a
> user-defined window function can return some other type, not int8.

Thanks for reporting this and to you and Richard for working on a fix.

I've just looked at it and it seems that valgrind is complaining because a
tuple formed by an upper-level WindowAgg contains a pointer to free'd memory
due to the byref type and eval_windowaggregates() not having been executed to
fill in ecxt_aggvalues and ecxt_aggnulls on the lower-level WindowAgg.

Since upper-level WindowAggs cannot reference values calculated in some
lower-level WindowAgg, why can't we just NULLify the pointers instead? See
attached.

It is possible to have a monotonic window function that does not return int8.
Technically something like MAX(text_col) OVER (PARTITION BY somecol ORDER BY
text_col) is monotonically increasing, it's just that I didn't add a support
function to tell the planner about that. Someone could come along in the
future and suggest we do that and show us some convincing use case. So
whatever the fix, it cannot assume the window function's return type is int8.

David

diff --git a/src/backend/executor/nodeWindowAgg.c b/src/backend/executor/nodeWindowAgg.c
index 81ba024bba..083a3b2386 100644
--- a/src/backend/executor/nodeWindowAgg.c
+++ b/src/backend/executor/nodeWindowAgg.c
@@ -2234,6 +2234,21 @@ ExecWindowAgg(PlanState *pstate)
 			if (winstate->numaggs > 0)
 				eval_windowaggregates(winstate);
 		}
+		else if (!winstate->top_window)
+		{
+			/*
+			 * Otherwise, if we're not the top-window, we'd better fill the
+			 * aggregate values in with something, else they might be pointing
+			 * to freed memory. These don't need to be valid since WindowAggs
+			 * above us cannot reference the result of another WindowAgg.
+			 */
+			numfuncs = winstate->numfuncs;
+			for (i = 0; i < numfuncs; i++)
+			{
+				econtext->ecxt_aggvalues[i] = (Datum) 0;
+				econtext->ecxt_aggnulls[i] = true;
+			}
+		}
 
 		/*
 		 * If we have created auxiliary read pointers for the frame or group
Re: Amcheck verification of GiST and GIN
Hello. I reviewed this patch and I would like to share some comments.

It compiled with those 2 warnings:

verify_gin.c: In function 'gin_check_parent_keys_consistency':
verify_gin.c:481:38: warning: declaration of 'maxoff' shadows a previous local [-Wshadow=compatible-local]
  481 |         OffsetNumber maxoff = PageGetMaxOffsetNumber(page);
      |                      ^~
verify_gin.c:453:41: note: shadowed declaration is here
  453 |         maxoff;
      |         ^~
verify_gin.c:423:25: warning: unused variable 'heapallindexed' [-Wunused-variable]
  423 |         bool    heapallindexed = *((bool *) callback_state);
      |                 ^~

Also, I'm not sure about postgres' headers conventions: inside amcheck.h there
is "miscadmin.h" included, and inside verify_gin.c, verify_gist.h and
verify_nbtree.c both amcheck.h and miscadmin.h are included.

About the documentation, bt_index_parent_check has comments about the
ShareLock and "SET client_min_messages = DEBUG1;", and both
gist_index_parent_check and gin_index_parent_check lack it. verify_gin uses
DEBUG3; I'm not sure if it is on purpose, but it would be nice to document it
or put DEBUG1 to be consistent.

I lack enough context to do a deep review on the code, so in this area this
patch needs more eyes.

I did the following test:

postgres=# create table teste (t text, tv tsvector);
CREATE TABLE
postgres=# insert into teste values ('hello', 'hello'::tsvector);
INSERT 0 1
postgres=# create index teste_tv on teste using gist(tv);
CREATE INDEX
postgres=# select pg_relation_filepath('teste_tv');
 pg_relation_filepath
----------------------
 base/5/16441
(1 row)

postgres=# \q
$ bin/pg_ctl -D data -l log stop
waiting for server to shut down.... done
server stopped
$ okteta base/5/16441   # I couldn't figure out the dd syntax to change the 1FE9 to '0'
$ bin/pg_ctl -D data -l log start
waiting for server to start.... done
server started
$ bin/psql -U ze postgres
psql (16devel)
Type "help" for help.

postgres=# SET client_min_messages = DEBUG3;
SET
postgres=# select gist_index_parent_check('teste_tv'::regclass, true);
DEBUG:  verifying that tuples from index "teste_tv" are present in "teste"
ERROR:  heap tuple (0,1) from table "teste" lacks matching index tuple within index "teste_tv"
postgres=#

A simple index corruption in gin:

postgres=# CREATE TABLE "gin_check"("Column1" int[]);
CREATE TABLE
postgres=# insert into gin_check values (ARRAY[1]),(ARRAY[2]);
INSERT 0 2
postgres=# CREATE INDEX gin_check_idx on "gin_check" USING GIN("Column1");
CREATE INDEX
postgres=# select pg_relation_filepath('gin_check_idx');
 pg_relation_filepath
----------------------
 base/5/16453
(1 row)

postgres=# \q
$ bin/pg_ctl -D data -l logfile stop
waiting for server to shut down.... done
server stopped
$ okteta data/base/5/16453   # edited some bits near 3FCC
$ bin/pg_ctl -D data -l logfile start
waiting for server to start.... done
server started
$ bin/psql -U ze postgres
psql (16devel)
Type "help" for help.

postgres=# SET client_min_messages = DEBUG3;
SET
postgres=# SELECT gin_index_parent_check('gin_check_idx', true);
ERROR:  number of items mismatch in GIN entry tuple, 49 in tuple header, 1 decoded
postgres=#

There are more code paths to follow to check the entire code, and I had a hard
time corrupting the indices. Is there any automated code to corrupt an index
to test such code?

--
Jose Arthur Benetasso Villanova
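On the question of automating the corruption step: the hex-editor session
above can be replaced with a tiny byte-patching helper. Here is a minimal
stand-alone sketch (flip_byte_at() is a made-up name, not anything from the
patch or from amcheck) that XORs one byte of a relation file in place; run it
only against a throwaway cluster, and only with the server stopped:

```c
#include <assert.h>
#include <stdio.h>

/*
 * Flip (XOR with 0xFF) one byte at the given offset of a file, in place.
 * Returns the original byte value, or -1 on failure.
 */
static int
flip_byte_at(const char *path, long offset)
{
	FILE	   *f = fopen(path, "r+b");
	int			c;

	if (f == NULL)
		return -1;
	if (fseek(f, offset, SEEK_SET) != 0 || (c = fgetc(f)) == EOF)
	{
		fclose(f);
		return -1;
	}
	/* C requires a seek between a read and a write on update streams */
	fseek(f, offset, SEEK_SET);
	fputc(c ^ 0xFF, f);
	fclose(f);
	return c;
}
```

The dd equivalent of "change the byte at 0x1FE9 to 0" would be something like
`dd if=/dev/zero of=base/5/16441 bs=1 seek=$((0x1FE9)) count=1 conv=notrunc`
(conv=notrunc keeps dd from truncating the file at the write position).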
Re: Allow file inclusion in pg_hba and pg_ident files
On Fri, Nov 25, 2022 at 07:41:59AM +0900, Michael Paquier wrote:
> On Thu, Nov 24, 2022 at 05:07:24PM +0800, Julien Rouhaud wrote:
>> So I went with CONF_FILE_START_DEPTH and CONF_FILE_MAX_DEPTH. Attached v22
>> that fixes it in all the places I found.
>
> Sounds fine. Added one comment, fixed one comment, and applied.
> Thanks!

Thanks a lot!
Re: Perform streaming logical transactions by background workers and parallel apply
Here are some review comments for v51-0001.

======
.../replication/logical/applyparallelworker.c

1. General - Error messages, get_worker_name()

I previously wrote a comment to ask if the get_worker_name() should be used in
more places but the reply [1, #2b] was:

> 2b.
> Consider if maybe all of these ought to be calling get_worker_name()
> which is currently static in worker.c. Doing this means any future
> changes to get_worker_name won't cause more inconsistencies.
>
> Most error messages in applyparallelxx.c can only use "xx parallel worker",
> so I think it's fine not to call get_worker_name

~

I thought the reply missed the point I was trying to make -- I meant if it was
arranged now so *every* message would go via get_worker_name() then if in
future somebody wanted to change the names (e.g. from "logical replication
parallel apply worker" to "LR PA worker") it would only need to be changed in
one central place instead of hunting down every hardwired error message.
Anyway, you can do it how you want -- I just was not sure you'd got my
original point.

~~~

2. HandleParallelApplyMessage

+		case 'X':				/* Terminate, indicating clean exit. */
+			shm_mq_detach(winfo->error_mq_handle);
+			winfo->error_mq_handle = NULL;
+			break;
+		default:
+			elog(ERROR, "unrecognized message type received from logical replication parallel apply worker: %c (message length %d bytes)",
+				 msgtype, msg->len);

The case 'X' code indentation is too much.

======
src/backend/replication/logical/origin.c

3. replorigin_session_setup(RepOriginId node, int acquired_by)

@@ -1075,12 +1075,20 @@ ReplicationOriginExitCleanup(int code, Datum arg)
  * array doesn't have to be searched when calling
  * replorigin_session_advance().
  *
- * Obviously only one such cached origin can exist per process and the current
+ * Normally only one such cached origin can exist per process and the current
  * cached value can only be set again after the previous value is torn down
  * with replorigin_session_reset().
+ *
+ * However, we do allow multiple processes to point to the same origin slot if
+ * requested by the caller by passing PID of the process that has already
+ * acquired it as acquired_by. This is to allow multiple parallel apply
+ * processes to use the same origin, provided they maintain commit order, for
+ * example, by allowing only one process to commit at a time. For the first
+ * process requesting this origin, the acquired_by parameter needs to be set
+ * to 0.
  */
 void
-replorigin_session_setup(RepOriginId node)
+replorigin_session_setup(RepOriginId node, int acquired_by)

I think the meaning of acquired_by=0 is not fully described here: "For the
first process requesting this origin, the acquired_by parameter needs to be
set to 0."

IMO that seems to be describing it only from the POV that you are always going
to want to allow multiple processes. But really this is an optional feature,
so you might pass acquired_by=0 not just because this is the first of
multiple, but also because you *never* want to allow multiple at all. The
comment does not convey this meaning. Maybe something worded like below is
better?

SUGGESTION
Normally only one such cached origin can exist per process so the cached value
can only be set again after the previous value is torn down with
replorigin_session_reset(). For this normal case pass acquired_by=0 (meaning
the slot is not allowed to be already acquired by another process).

However, sometimes multiple processes can safely re-use the same origin slot
(for example, multiple parallel apply processes can safely use the same
origin, provided they maintain commit order by allowing only one process to
commit at a time). For this case the first process must pass acquired_by=0,
and then the other processes sharing that same origin can pass acquired_by=PID
of the first process.

======
src/backend/replication/logical/worker.c

4. GENERAL - get_worker_name()

If you decide it is OK to hardwire some error messages instead of
unconditionally calling get_worker_name() -- see my #1 review comment in this
post -- then there are some other messages in this file that also seem like
they can be hardwired because the type of worker is already known. Here are
some examples:

4a.

+	else if (am_parallel_apply_worker())
+	{
+		if (rel->state != SUBREL_STATE_READY)
+			ereport(ERROR,
+					(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+			/* translator: first %s is the name of logical replication worker */
+					 errmsg("%s for subscription \"%s\" will stop",
+							get_worker_name(), MySubscription->name),
+					 errdetail("Cannot handle streamed replication transactions using parallel apply workers until all tables have been synchronized.")));
+
+		return true;
+	}

In the above code from should_apply_changes_for_rel we already know this is a
parallel apply worker.

~

4b.

+	if (am_parallel_apply_worker())
+		ereport(LOG,
+		/* translator: first %s is the name of logical replication worker */
+				(errmsg("%s for subscription \
Re: Bug in row_number() optimization
On Fri, Nov 25, 2022 at 7:34 AM David Rowley wrote:
> Since upper-level WindowAggs cannot reference values calculated in
> some lower-level WindowAgg, why can't we just NULLify the pointers
> instead? See attached.

Verified the problem is fixed with this patch. I'm not familiar with the
WindowAgg execution codes. As far as I understand, this patch works because we
set ecxt_aggnulls to true, making it a NULL value. And the top-level WindowAgg
node's "Filter" is strict so that it can filter out all the tuples that don't
match the intermediate WindowAgg node's run condition. So I find the comments
about "WindowAggs above us cannot reference the result of another WindowAgg"
confusing. But maybe I'm missing something.

Thanks
Richard
Re: Patch: Global Unique Index
On Thu, Nov 24, 2022 at 9:39 PM Justin Pryzby wrote:
> On Thu, Nov 24, 2022 at 08:52:16PM +0530, Dilip Kumar wrote:
>> but now you will have one gigantic index and which will be vacuumed
>> every time we vacuum any of the partitions.
>
> This patch isn't implemented as "one gigantic index", though.

If this patch is for supporting a global index then I expect that the global
index across all the partitions is going to be big. Anyway, my point was that
vacuuming the common index every time we vacuum any of the partitions of the
table is not the right way, and that will make global indexes less usable.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
Re: Patch: Global Unique Index
On Fri, Nov 25, 2022 at 8:49 AM Dilip Kumar wrote:
>
> On Thu, Nov 24, 2022 at 9:39 PM Justin Pryzby wrote:
>> On Thu, Nov 24, 2022 at 08:52:16PM +0530, Dilip Kumar wrote:
>>> but now you will have one gigantic index and which will be vacuumed
>>> every time we vacuum any of the partitions.
>>
>> This patch isn't implemented as "one gigantic index", though.
>
> If this patch is for supporting a global index then I expect that the
> global index across all the partitions is going to be big. Anyway, my
> point was about vacuuming the common index every time you vacuum any
> of the partitions of the table is not the right way and that will make
> global indexes less usable.

Okay, I got your point. After seeing the details it seems that instead of
supporting one common index, it is just allowing uniqueness checks across
multiple index partitions. Sorry for the noise.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
Re: Bug in row_number() optimization
On Fri, 25 Nov 2022 at 16:00, Richard Guo wrote:
> Verified the problem is fixed with this patch. I'm not familiar with
> the WindowAgg execution codes. As far as I understand, this patch works
> because we set ecxt_aggnulls to true, making it a NULL value. And the
> top-level WindowAgg node's "Filter" is strict so that it can filter out
> all the tuples that don't match the intermediate WindowAgg node's run
> condition. So I find the comments about "WindowAggs above us cannot
> reference the result of another WindowAgg" confusing. But maybe I'm
> missing something.

There are two different pass-through modes that the WindowAgg can move into
when it detects that the run condition is no longer true:
1) WINDOWAGG_PASSTHROUGH and 2) WINDOWAGG_PASSTHROUGH_STRICT

#2 is used when the WindowAgg is the top-level one in this query level.
Remember we'll need multiple WindowAgg nodes when there are multiple different
windows to evaluate.

The reason that we need #1 is that if there are multiple WindowAggs, then the
top-level one (or just any WindowAgg above it) might need all the rows from a
lower-level WindowAgg. For example:

SELECT * FROM (SELECT row_number() over (order by id) rn,
                      sum(qty) over (order by date) qty from t) t
WHERE rn <= 10;

If the "order by id" window is evaluated first, we can't stop outputting rows
when rn <= 10 is no longer true, as the "order by date" window might need
those. In this case, once rn <= 10 is no longer true, the WindowAgg for that
window would go into WINDOWAGG_PASSTHROUGH. This means we can stop window func
evaluation on any additional rows. The final query will never see rn == 11, so
we don't need to generate that.

The problem is that once the "order by id" window stops evaluating the window
funcs, if the window result is byref, then we leave junk in the aggregate
output columns. Since we continue to output rows from that WindowAgg for the
top-level "order by date" window, we don't want to form tuples with free'd
memory. Since nothing in the query will ever see rn == 11 and beyond, there's
no need to put anything in that part of the output tuple. We can just make it
an SQL NULL.

What I mean by "WindowAggs above us cannot reference the result of another
WindowAgg" is that the evaluation of sum(qty) over (order by date) can't see
the "rn" column. SQL does not allow it. If it did, that would have to look
something like:

SELECT * FROM (SELECT SUM(row_number() over (order by id))
                          over (order by date) qty from t); -- not valid SQL

WINDOWAGG_PASSTHROUGH_STRICT not only does not evaluate window funcs, it also
does not even bother to store tuples in the tuple store. In this case there's
no higher-level WindowAgg that will need these tuples, so we can just read our
sub-node until we find the next partition, or stop when there's no PARTITION
BY clause.

Just thinking of the patch a bit more, what I wrote ends up continually
zeroing the values and marking the columns as NULL. Likely we can just do this
once when we do:

winstate->status = WINDOWAGG_PASSTHROUGH;

I'll test that out and make sure it works.

David
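The difference between the two pass-through modes can be sketched as a toy
model (illustrative only -- the names mirror the executor's states but this is
not executor code, and rows_evaluated() is a made-up helper):

```c
#include <assert.h>

enum wa_status
{
	WA_RUN,						/* still evaluating window functions */
	WA_PASSTHROUGH,				/* stop evaluating, keep emitting rows */
	WA_PASSTHROUGH_STRICT		/* stop evaluating AND stop emitting rows */
};

/*
 * Toy model of a WindowAgg with a "rn <= limit" run condition over nrows
 * input rows.  Returns how many rows had window functions evaluated, and
 * reports via *rows_emitted how many rows flow on to upper plan nodes.
 */
static int
rows_evaluated(int nrows, int limit, int is_top_window, int *rows_emitted)
{
	int			evaluated = 0;
	int			emitted = 0;
	enum wa_status status = WA_RUN;

	for (int rn = 1; rn <= nrows; rn++)
	{
		/* run condition just became false: pick a pass-through mode */
		if (status == WA_RUN && rn > limit)
			status = is_top_window ? WA_PASSTHROUGH_STRICT : WA_PASSTHROUGH;

		if (status == WA_RUN)
			evaluated++;		/* window funcs computed for this row */
		if (status != WA_PASSTHROUGH_STRICT)
			emitted++;			/* row still flows to upper WindowAggs */
	}
	*rows_emitted = emitted;
	return evaluated;
}
```

In both modes evaluation stops after `limit` rows; the non-strict mode still
emits every input row (for the benefit of higher WindowAggs), which is exactly
why the never-evaluated byref columns must be nulled rather than left pointing
at freed memory.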
Re: Add sub-transaction overflow status in pg_stat_activity
On Thu, Nov 24, 2022 at 2:26 AM Andres Freund wrote: > > Indeed. This is why I was thinking that just alerting for overflowed xact > isn't particularly helpful. You really want to see how much they overflow and > how often. I think the way of monitoring the subtransaction count and overflow status is helpful at least for troubleshooting purposes. By regularly monitoring user will know which backend(pid) is particularly using more subtransactions and prone to overflow and which backends are actually frequently causing sub-overflow. > I think they're just not always avoidable, even in a very well operated > system. > > > I wonder if we could lower the impact of suboverflowed snapshots by improving > the representation in PGPROC and SnapshotData. What if we > > a) Recorded the min and max assigned subxid in PGPROC > > b) Instead of giving up in GetSnapshotData() once we see a suboverflowed >PGPROC, store the min/max subxid of the proc in SnapshotData. We could >reliably "steal" space for that from ->subxip, as we won't need to store >subxids for that proc. > > c) When determining visibility with a suboverflowed snapshot we use the >ranges from b) to check whether we need to do a subtrans lookup. I think >that'll often prevent subtrans lookups. > > d) If we encounter a subxid whose parent is in progress and not in ->subxid, >and subxcnt isn't the max, add that subxid to subxip. That's not free >because we'd basically need to do an insertion sort, but likely still a lot >cheaper than doing repeated subtrans lookups. > > I think we'd just need a one or two additional fields in SnapshotData. +1 I think this approach will be helpful in many cases, especially when only some of the backend is creating sub-overflow and impacting overall system performance. 
Now, most of the xids, especially the top xid, will not fall in that range (unless the sub-overflowing backend is constantly generating subxids and widening its range), and lookups for those xids can be done directly in the snapshot's xip array.

On another thought, in XidInMVCCSnapshot(), in case of sub-overflow, why don't we look into the snapshot's xip array first and see if the xid exists there? If not, we can look into the pg_subtrans SLRU, fetch the top xid, and look in the xip array again. It will be more costly in cases where we do not find the xid in the xip array, because then we will have to search the array twice, but I think looking into this array is much cheaper than directly accessing the SLRU.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
Re: [PATCH] Add peer authentication TAP test
Hello!

On Windows this test fails with the error:

# connection error: 'psql: error: connection to server at "127.0.0.1", port x failed:
# FATAL: no pg_hba.conf entry for host "127.0.0.1", user "buildfarm", database "postgres", no encryption'

Maybe disable this test for Windows, as in 001_password.pl and 002_saslprep.pl?

Best wishes,

--
Anton A. Melnikov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

commit e4491b91729b307d29ce0205b455936b3a538373
Author: Anton A. Melnikov
Date:   Fri Nov 25 07:40:11 2022 +0300

    Skip test 003_peer.pl for windows like 001_password.pl and 002_saslprep.pl

diff --git a/src/test/authentication/t/003_peer.pl b/src/test/authentication/t/003_peer.pl
index ce8408a4f8c..7454bf4a598 100644
--- a/src/test/authentication/t/003_peer.pl
+++ b/src/test/authentication/t/003_peer.pl
@@ -9,6 +9,11 @@ use warnings;
 use PostgreSQL::Test::Cluster;
 use PostgreSQL::Test::Utils;
 use Test::More;
+if (!$use_unix_sockets)
+{
+	plan skip_all =>
+	  "authentication tests cannot run without Unix-domain sockets";
+}
 
 # Delete pg_hba.conf from the given node, add a new entry to it
 # and then execute a reload to refresh it.
Re: [PATCH] Add peer authentication TAP test
On Fri, Nov 25, 2022 at 07:56:08AM +0300, Anton A. Melnikov wrote:
> On Windows this test fails with error:
> # connection error: 'psql: error: connection to server at "127.0.0.1", port
> x failed:
> # FATAL: no pg_hba.conf entry for host "127.0.0.1", user "buildfarm",
> database "postgres", no encryption'
>
> May be disable this test for windows like in 001_password.pl and
> 002_saslprep.pl?

You are not using MSVC but MinGW, are you? The buildfarm members with TAP tests enabled are drongo, fairywren, bowerbird and jacana. Even though none of them are running the tests from src/test/authentication/, this is running on a periodic basis in the CI, where we are able to skip the test in MSVC already:

postgresql:authentication / authentication/003_peer SKIP 9.73s

So yes, it is plausible that we are missing more safeguards here. Your suggestion to skip under !$use_unix_sockets makes sense, as not having unix sockets is not going to work for peer, and WIN32 needs SSPI to be secure with pg_regress. Where is your test failing? On the first $node->psql('postgres') at the beginning of the test? Just wondering..
--
Michael
Re: Questions regarding distinct operation implementation
On 25/11/22 02:14, David Rowley wrote:
> On Fri, 25 Nov 2022 at 06:57, Ankit Kumar Pandey wrote:
> > Please let me know any opinions on this.
>
> I think if you're planning on working on this then step 1 would have to be
> checking the SQL standard to see which set of rows it asks implementations
> to consider for duplicate checks when deciding if the transition should be
> performed or not. Having not looked, I don't know if this is the entire
> partition or just the rows in the current frame.
>
> Depending on what you want, an alternative today would be to run a subquery
> to uniquify the rows the way you want and then do the window function stuff.
>
> David

Thanks David, these are excellent pointers. I will look into the SQL standard first and go from there.

--
Regards,
Ankit Kumar Pandey
Re: Support logical replication of DDLs
On Sun, 20 Nov 2022 at 09:29, vignesh C wrote:
> On Fri, 11 Nov 2022 at 11:03, Peter Smith wrote:
> > On Fri, Nov 11, 2022 at 4:17 PM Peter Smith wrote:
> > > On Fri, Nov 11, 2022 at 4:09 PM Peter Smith wrote:
> > > > On Fri, Nov 11, 2022 at 3:47 PM Peter Smith wrote:
> > > > >
> > > > > Here are more review comments for the v32-0001 file ddl_deparse.c
> > > > >
> > > > > *** NOTE - my review post became too big, so I split it into smaller
> > > > > parts.
> > > >
> > > > THIS IS PART 4 OF 4.
> > > > ===
> > > > src/backend/commands/ddl_deparse.c
>
> Thanks for the comments, the attached v39 patch has the changes for the same.

One comment: while fixing the review comments, I found that the default syntax is not handled for CREATE DOMAIN:

+	/*
+	 * Verbose syntax
+	 *
+	 * CREATE DOMAIN %{identity}D AS %{type}T %{not_null}s %{constraints}s
+	 * %{collation}s
+	 */
+	createDomain = new_objtree("CREATE");
+
+	append_object_object(createDomain,
+						 "DOMAIN %{identity}D AS",
+						 new_objtree_for_qualname_id(TypeRelationId,
+													 objectId));
+	append_object_object(createDomain,
+						 "%{type}T",
+						 new_objtree_for_type(typForm->typbasetype,
+											  typForm->typtypmod));

Regards,
Vignesh
Remove a unused argument from qual_is_pushdown_safe
Hello,

I found that qual_is_pushdown_safe() has an argument "subquery" that is not used in the function. This argument has not been referenced since commit 964c0d0f80e485dd3a4073e073ddfd9bfdda90b2. I think we can remove it if there is no special reason to keep it.

Regards,
Yugo Nagata

--
Yugo NAGATA

diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 4ddaed31a4..5327ea2803 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -136,8 +136,7 @@ static void check_output_expressions(Query *subquery,
 static void compare_tlist_datatypes(List *tlist, List *colTypes,
 									pushdown_safety_info *safetyInfo);
 static bool targetIsInAllPartitionLists(TargetEntry *tle, Query *query);
-static bool qual_is_pushdown_safe(Query *subquery, Index rti,
-								  RestrictInfo *rinfo,
+static bool qual_is_pushdown_safe(Index rti, RestrictInfo *rinfo,
 								  pushdown_safety_info *safetyInfo);
 static void subquery_push_qual(Query *subquery, RangeTblEntry *rte,
 							   Index rti, Node *qual);
@@ -2527,7 +2526,7 @@ set_subquery_pathlist(PlannerInfo *root, RelOptInfo *rel,
 			Node	   *clause = (Node *) rinfo->clause;
 
 			if (!rinfo->pseudoconstant &&
-				qual_is_pushdown_safe(subquery, rti, rinfo, &safetyInfo))
+				qual_is_pushdown_safe(rti, rinfo, &safetyInfo))
 			{
 				/* Push it down */
 				subquery_push_qual(subquery, rte, rti, clause);
@@ -3813,7 +3812,7 @@ targetIsInAllPartitionLists(TargetEntry *tle, Query *query)
  * found to be unsafe to reference by subquery_is_pushdown_safe().
  */
 static bool
-qual_is_pushdown_safe(Query *subquery, Index rti, RestrictInfo *rinfo,
+qual_is_pushdown_safe(Index rti, RestrictInfo *rinfo,
 					  pushdown_safety_info *safetyInfo)
 {
 	bool		safe = true;
Re: [BUG] FailedAssertion in SnapBuildPurgeOlderTxn
On Thu, Nov 24, 2022 at 4:43 PM Amit Kapila wrote: > > On Thu, Nov 24, 2022 at 1:48 PM Masahiko Sawada wrote: > > > > On Wed, Nov 23, 2022 at 12:00 PM Amit Kapila > > wrote: > > > > Agreed not to have a test case for this. > > > > I've attached an updated patch. I've confirmed this patch works for > > all supported branches. > > > > I have slightly changed the checks used in the patch, otherwise looks > good to me. I am planning to push (v11-v15) the attached tomorrow > unless there are more comments. > Pushed. -- With Regards, Amit Kapila.
Re: [PATCH] Add peer authentication TAP test
Hello, thanks for the rapid answer!

On 25.11.2022 08:18, Michael Paquier wrote:
> You are not using MSVC but MinGW, are you? The buildfarm members with
> TAP tests enabled are drongo, fairywren, bowerbird and jacana. Even
> though none of them are running the tests from src/test/authentication/,
> this is running on a periodic basis in the CI, where we are able to skip
> the test in MSVC already:
> postgresql:authentication / authentication/003_peer SKIP 9.73s

There is MSVC on my PC. The project was built with Microsoft (R) C/C++ Optimizing Compiler Version 19.29.30136 for x64.

> So yes, it is plausible that we are missing more safeguards here. Your
> suggestion to skip under !$use_unix_sockets makes sense, as not having
> unix sockets is not going to work for peer and WIN32 needs SSPI to be
> secure with pg_regress. Where is your test failing? On the first
> $node->psql('postgres') at the beginning of the test? Just wondering..

The test fails almost at the beginning, in the reset_pg_hba call, after modifying pg_hba.conf and reloading the node:

#t/003_peer.pl .. Dubious, test returned 2 (wstat 512, 0x200)
#No subtests run

Logs regress_log_003_peer and 003_peer_node.log are attached.

With best regards,

--
Anton A.
Melnikov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

2022-11-25 09:55:49.731 MSK [7648] LOG: starting PostgreSQL 16devel, compiled by Visual C++ build 1929, 64-bit
2022-11-25 09:55:49.735 MSK [7648] LOG: listening on IPv4 address "127.0.0.1", port 60161
2022-11-25 09:55:49.773 MSK [2892] LOG: database system was shut down at 2022-11-25 09:55:49 MSK
2022-11-25 09:55:49.800 MSK [7648] LOG: database system is ready to accept connections
2022-11-25 09:55:49.890 MSK [7648] LOG: received SIGHUP, reloading configuration files
2022-11-25 09:55:50.003 MSK [84] [unknown] LOG: connection received: host=127.0.0.1 port=49822
2022-11-25 09:55:50.007 MSK [84] [unknown] FATAL: no pg_hba.conf entry for host "127.0.0.1", user "postgres", database "postgres", no encryption
2022-11-25 09:55:50.114 MSK [3356] [unknown] LOG: connection received: host=127.0.0.1 port=49823
2022-11-25 09:55:50.117 MSK [3356] [unknown] FATAL: no pg_hba.conf entry for host "127.0.0.1", user "postgres", database "postgres", no encryption
2022-11-25 09:55:50.201 MSK [7648] LOG: received immediate shutdown request
2022-11-25 09:55:50.212 MSK [7648] LOG: database system is shut down

# Checking port 60161
# Found port 60161
Name: node
Data directory: c:/projects/1c-master-7397/postgrespro/src\test\authentication/tmp_check/t_003_peer_node_data/pgdata
Backup directory: c:/projects/1c-master-7397/postgrespro/src\test\authentication/tmp_check/t_003_peer_node_data/backup
Archive directory: c:/projects/1c-master-7397/postgrespro/src\test\authentication/tmp_check/t_003_peer_node_data/archives
Connection string: port=60161 host=127.0.0.1
Log file: c:/projects/1c-master-7397/postgrespro/src\test\authentication/tmp_check/log/003_peer_node.log
# Running: initdb -D c:/projects/1c-master-7397/postgrespro/src\test\authentication/tmp_check/t_003_peer_node_data/pgdata -A trust -N
The files belonging to this database system will be owned by user "postgres".
This user must also own the server process.

The database cluster will be initialized with locale "English_United States.1252".
The default database encoding has accordingly been set to "WIN1252".
The default text search configuration will be set to "english".

Data page checksums are disabled.

creating directory c:/projects/1c-master-7397/postgrespro/src/test/authentication/tmp_check/t_003_peer_node_data/pgdata ... ok
creating subdirectories ... ok
selecting dynamic shared memory implementation ... windows
selecting default max_connections ... 100
selecting default shared_buffers ... 128MB
selecting default time zone ... Europe/Moscow
creating configuration files ... ok
running bootstrap script ... ok
performing post-bootstrap initialization ... ok
Sync to disk skipped. The data directory might become corrupt if the operating system crashes.

Success. You can now start the database server using:

    pg_ctl -D ^"c^:^/projects^/1c^-master^-7397^/postgrespro^/src^\test^\authentication^/tmp^_check^/t^_003^_peer^_node^_data^/pgdata^" -l logfile start

# Running: c:/projects/1c-master-7397/postgrespro/Release/pg_regress/pg_regress --config-auth c:/projects/1c-master-7397/postgrespro/src\test\authentication/tmp_check/t_003_peer_node_data/pgdata
### Starting node "node"
# Running: pg_ctl -w -D c:/projects/1c-master-7397/postgrespro/src\test\authentication/tmp_check/t_003_peer_node_data/pgdata -l c:/projects/1c-master-7397/postgrespro/src\test\authentication/tmp_check/log/003_peer_node.log -o --cluster-name=node start
waiting for server to start done
server started
# Postmaster PID for node "node" is 7648
### Reloading node "node"
# Running: pg_ctl -D c:/projects/1c-master-7397/postgrespro/src\test\authentication/tmp_check/t_003_peer_node_data/pgdata reload
serve
Re: A problem about join ordering
On Fri, Nov 11, 2022 at 11:24 PM Tom Lane wrote:
> Richard Guo writes:
> > I'm wondering whether we need to insist on being strict for the lower
> > OJ's min_righthand. What if we instead check strictness for its whole
> > syn_righthand?
>
> Surely not. What if the only point of strictness is for a rel that
> isn't part of the min_righthand? Then we could end up re-ordering
> based on a condition that isn't actually strict for what we've
> chosen as the join's RHS.
>
> It might be possible to change the other part of the equation and
> consider the A/B join's min_righthand instead of syn_righthand
> while checking if Pcd uses A/B's RHS; but I'm not real sure about
> that either.

The problem described upthread occurs in the case where the lower OJ is in our LHS. For the other case, where the lower OJ is in our RHS, it seems we also have a join ordering problem. As an example, consider

A leftjoin (B leftjoin (C leftjoin D on (Pcd)) on (Pbc)) on (Pac)

A leftjoin ((B leftjoin C on (Pbc)) leftjoin D on (Pcd)) on (Pac)

The two forms are equal if we assume Pcd is strict for C, according to outer join identity 3. In both forms we decide that we cannot interchange the ordering of the A/C join and the B/C join, because Pac uses the B/C join's RHS. So we add the B/C join's full syntactic relset to the A/C join's min_righthand to preserve the ordering.

However, in the first form B/C's full syntactic relset includes {B, C, D}, while in the second form it includes only {B, C}. As a result, the A/(BC) join is illegal in the first form and legal in the second form, and this determines whether we can get the third form below:

(A leftjoin (B leftjoin C on (Pbc)) on (Pac)) leftjoin D on (Pcd)

This makes me rethink whether we should use the lower OJ's full syntactic relset, or just its min_lefthand + min_righthand, to be added to min_righthand to preserve ordering for this case. But I'm not sure about this. Any thoughts?

Thanks
Richard
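[Editor's note: rendered as concrete SQL, with hypothetical tables a, b, c, d and join clauses standing in for Pac, Pbc, Pcd, the two equivalent forms above look like:]

```sql
-- Form 1: A leftjoin (B leftjoin (C leftjoin D on Pcd) on Pbc) on Pac
SELECT *
FROM a
LEFT JOIN (b
           LEFT JOIN (c
                      LEFT JOIN d ON c.x = d.x)  -- Pcd, strict for C
           ON b.y = c.y)                         -- Pbc
ON a.z = c.z;                                    -- Pac

-- Form 2: A leftjoin ((B leftjoin C on Pbc) leftjoin D on Pcd) on Pac
SELECT *
FROM a
LEFT JOIN ((b
            LEFT JOIN c ON b.y = c.y)            -- Pbc
           LEFT JOIN d ON c.x = d.x)             -- Pcd
ON a.z = c.z;                                    -- Pac
```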
Re: [PATCH] Add peer authentication TAP test
On Fri, Nov 25, 2022 at 10:13:29AM +0300, Anton A. Melnikov wrote:
> The test fails almost at the beginning in reset_pg_hba call after
> modification pg_hba.conf and node reloading:
> #t/003_peer.pl .. Dubious, test returned 2 (wstat 512, 0x200)
> #No subtests run
>
> Logs regress_log_003_peer and 003_peer_node.log are attached.

Yeah, that's failing exactly at the position I am pointing to. I'll go apply what you have..
--
Michael
Re: [PATCH] Add peer authentication TAP test
On 25.11.2022 10:34, Michael Paquier wrote: On Fri, Nov 25, 2022 at 10:13:29AM +0300, Anton A. Melnikov wrote: The test fails almost at the beginning in reset_pg_hba call after modification pg_hba.conf and node reloading: #t/003_peer.pl .. Dubious, test returned 2 (wstat 512, 0x200) #No subtests run Logs regress_log_003_peer and 003_peer_node.log are attached. Yeah, that's failing exactly at the position I am pointing to. I'll go apply what you have.. -- Michael Thanks! With the best wishes, -- Anton A. Melnikov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Does pg_rman support PG15?
Hi,

To the developers working on pg_rman: are there any plans to support PG15, and when might the pg_rman source supporting it be released? The latest version of pg_rman, V1.3.14, appears to be incompatible with PG15. (In PG15, pg_start_backup()/pg_stop_backup() have been renamed.)