Re: [PATCH][postgres_fdw] Add push down of CASE WHEN clauses
On 07/07/2021 at 06:59, David Rowley wrote:
> On Wed, 7 Jul 2021 at 10:18, Gilles Darold wrote:
>> I have noticed that postgres_fdw does not push down CASE WHEN clauses. In
>> the following case this is normal:
>
> This looks very similar to [1], which is in the current commitfest.
>
> Are you able to look over that patch and check that you're not doing
> anything extra that the other patch isn't? If so, then likely the best
> way to progress would be for you to test and review that patch.
>
> David
>
> [1] https://commitfest.postgresql.org/33/3171/

Strange, I searched the commitfest yesterday but without success; this is
clearly a duplicate. Anyway, thanks for the pointer, and yes, I will
review Alexander's patch as I know the subject now :-)

Best regards

--
Gilles Darold
MigOps Inc (https://migops.com/)
Re: [PATCH] Make jsonapi usable from libpq
Michael Paquier writes:
> It seems to me that this does not address yet the problems with the
> palloc/pstrdup in jsonapi.c, though, which would need to rely on
> malloc() if we end up using this code in libpq. I am not sure that we
> need to do that, as we may end up not using OAUTH as the SASL mechanism
> in core. So perhaps it would be better to just give up on this thread
> for now?

Yeah, I think there's nothing to do here unless we decide that we have
to have JSON-parsing ability inside libpq ... which is a situation I
think we should try hard to avoid.

			regards, tom lane
Re: [PATCH] Make jsonapi usable from libpq
On Tue, Jun 29, 2021 at 03:34:29PM -0400, Tom Lane wrote:
> Actually, I'd forgotten that the PQExpBuffer functions are already
> exported by libpq, and much of our frontend code already uses them
> from there. So we don't really need to move anything unless there's
> a call to use this code in clients that don't use libpq, which are
> a pretty small set.
>
> Also, having them be available both from libpq.so and from libpgcommon.a
> would be a tad problematic I think; it'd be hard to tell which way the
> linker would choose to resolve that.

Coming back to this thread. You were thinking about switching to
PQExpBuffer for the error strings generated depending on error code for
the frontend, right? Using a PQExpBuffer would be a problem if
attempting to get a more detailed error for pg_verifybackup, though I
guess that this tool can continue to live without this much detail in
the error strings.

It seems to me that this does not address yet the problems with the
palloc/pstrdup in jsonapi.c, though, which would need to rely on
malloc() if we end up using this code in libpq. I am not sure that we
need to do that, as we may end up not using OAUTH as the SASL mechanism
in core. So perhaps it would be better to just give up on this thread
for now?
--
Michael
Re: row filtering for logical replication
On Thu, Jul 1, 2021 at 10:43 AM Euler Taveira wrote:
>
> Amit, thanks for rebasing this patch. I already had a similar rebased
> patch in my local tree. A recent patch broke your version v15 so I
> rebased it.
>

Hi, I did some testing of the performance of the row filtering, in the
case of the publisher INSERTing 100,000 rows, using a similar test setup
and timing as previously used in the "commands_to_perf_test.sql" script
posted by Önder Kalacı.

I found that with the call to ExecInitExtraTupleSlot() in
pgoutput_row_filter(), the performance of pgoutput_row_filter() degrades
considerably over the 100,000 invocations, and on my system it took
about 43 seconds to filter the rows and send them to the subscriber.
However, by caching the tuple table slot in RelationSyncEntry, this
duration can be reduced by 38+ seconds. A further improvement can be
made by using this in combination with Peter's plan cache (v16-0004).

I've attached a patch for this, which relies on the latest v16-0001 and
v16-0004 patches posted by Peter Smith (noting that v16-0001 is
identical to your previously-posted 0001 patch). Also attached is a
graph (created by Peter Smith -- thanks!) detailing the performance
improvement.

Regards,
Greg Nancarrow
Fujitsu Australia

v16-0005-Improve-row-filtering-performance.patch
Description: Binary data
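The caching idea described above can be sketched in miniature: allocate a scratch slot once per cache entry and reuse it across invocations, instead of creating a fresh one on every call. The `Toy*` names here are hypothetical stand-ins; the real patch caches a `TupleTableSlot` inside `RelationSyncEntry`, not a plain buffer.

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Hypothetical per-relation cache entry.  Instead of allocating a fresh
 * scratch slot on every invocation of the row filter (the slow path
 * described above), the slot is created lazily on first use and reused
 * on all later calls.
 */
typedef struct ToySyncEntry
{
	double	   *filter_slot;	/* cached scratch slot, NULL until first use */
	size_t		slot_len;
} ToySyncEntry;

double *
toy_get_filter_slot(ToySyncEntry *entry, size_t len)
{
	if (entry->filter_slot == NULL)
	{
		/* first call for this entry: allocate once */
		entry->filter_slot = calloc(len, sizeof(double));
		entry->slot_len = len;
	}
	/* later calls: hand back the cached slot, no allocation */
	return entry->filter_slot;
}
```

With 100,000 invocations per transaction, turning a per-call allocation into a one-time allocation is exactly the kind of amortization the reported 38+ second saving comes from.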
Re: Keep notnullattrs in RelOptInfo (Was part of UniqueKey patch series)
> > For example: SELECT nullablecol FROM tab WHERE nullablecol = ;
>
> If the equality operator is strict then the nullablecol can be NULL in
> the WHERE clause but not in the SELECT list. Tom's idea should allow
> us to determine both of those things but your idea cannot tell them
> apart, so, in theory at least, Tom's idea seems better to me.
>
> David

That's really something I can't do, thanks for the explanation.

--
Best Regards
Andy Fan (https://www.aliyun.com/)
Re: [PATCH] Pull general SASL framework out of SCRAM
On Tue, Jul 06, 2021 at 06:20:49PM +, Jacob Champion wrote:
> On Mon, 2021-07-05 at 17:17 +0900, Michael Paquier wrote:
> Each name must be null-terminated, not just null-separated. That way
> the list of names ends with an empty string:
>
>     name-one\0   <- added by the mechanism
>     name-two\0   <- added by the mechanism
>     \0           <- added by the framework
>
> The way it's worded now, I could see some implementers failing to
> terminate the final name because the framework adds a trailing null
> already -- but the framework is terminating the list, not the final
> name.

Good point. I have used ending with '\0' bytes instead.

>> + * init()
>> + *
>> + * Initializes mechanism-specific state for a connection. This
>> + * callback must return a pointer to its allocated state, which will
>> + * be passed as-is as the first argument to the other callbacks.
>> + * free() is called to release any state resources.
>
> Maybe say "The free() callback is called" to differentiate it from
> standard free()?

Yes, that could be confusing. Switched to your wording instead.

> It's possible for conn->sasl to be NULL here, say if the client has
> channel_binding=require but connects as a user with an MD5 secret. The
> SCRAM TAP tests have one such case.

Indeed.

>> Hmm. Do the RFCs tell us anything about that?
>
> Just in general terms:
>
>>    Each authentication exchange consists of a message from the client to
>>    the server requesting authentication via a particular mechanism,
>>    followed by one or more pairs of challenges from the server and
>>    responses from the client, followed by a message from the server
>>    indicating the outcome of the authentication exchange. (Note:
>>    exchanges may also be aborted as discussed in Section 3.5.)
>
> So a challenge must be met with a response, or the exchange must be
> aborted. (And I don't think our protocol implementation provides a
> client abort message; if something goes wrong, we just tear down the
> connection.)

Thanks. At the same time, section 3.5 also says that the client may send
a message to abort. So one can interpret that the client also has the
choice to abort without sending a response back to the server? Or am I
just interpreting the use of "may" in this context incorrectly?
--
Michael
Re: [PATCH][postgres_fdw] Add push down of CASE WHEN clauses
On Wed, 7 Jul 2021 at 10:18, Gilles Darold wrote:
> I have noticed that postgres_fdw does not push down CASE WHEN clauses. In
> the following case this is normal:

This looks very similar to [1], which is in the current commitfest.

Are you able to look over that patch and check that you're not doing
anything extra that the other patch isn't? If so, then likely the best
way to progress would be for you to test and review that patch.

David

[1] https://commitfest.postgresql.org/33/3171/
Re: ExecRTCheckPerms() and many prunable partitions
On Fri, 2 Jul 2021 at 12:41, Amit Langote wrote:
> I'll mark the CF entry as WoA, unless you'd rather I just mark it RwF.

I've set it to waiting on author. It was still set to needs review.

If you think you'll not get time to write the patch during this CF, feel
free to bump it out.

David
Re: [PATCH] expand the units that pg_size_pretty supports on output
On Wed, 7 Jul 2021 at 02:46, David Christensen wrote:
> if we do decide to expand the units table there will be a
> few additional changes (most significantly, the return value of
> `pg_size_bytes()` will need to switch to `numeric`).

I wonder if it's worth changing pg_size_bytes() to return NUMERIC
regardless of whether we add any additional units or not.

Would you like to create two patches, one to change the return type and
another to add the new units, both based on top of the v2 patch I sent
earlier?

David
Re: Binary search in ScalarArrayOpExpr for OR'd constant arrays
On Tue, 6 Jul 2021 at 22:39, David Rowley wrote:
> If anyone feels differently, please let me know in the next couple of
> days. Otherwise, I plan on taking a final look and pushing it soon.

After doing some very minor adjustments, I pushed this (29f45e299).

Thanks to James and Zhihong for reviewing.

David
Re: [CLOBBER_CACHE]Server crashed with segfault 11 while executing clusterdb
On Tue, Jul 6, 2021 at 11:06 PM Tom Lane wrote:
>
> Amul Sul writes:
> > On Tue, Apr 20, 2021 at 6:59 AM Kyotaro Horiguchi wrote:
> >> I don't mind RelationGetSmgr(index)->smgr_rnode alone, or ->member
> >> alone when there's no previous call to RelationGetSmgr just above.
> >> How about using a temporary variable?
> >>
> >> SMgrRelation srel = RelationGetSmgr(index);
> >> smgrwrite(srel, ...);
> >> log_newpage(srel->..);
>
> > Understood. Used a temporary variable for the places where
> > RelationGetSmgr() calls are placed too close together or in a loop.
>
> [ squint... ] Doesn't this risk introducing exactly the sort of
> cache-clobber hazard we're trying to prevent? That is, the above is
> not safe unless you are *entirely* certain that there is not and never
> will be any possibility of a relcache flush before you are done using
> the temporary variable. Otherwise it can become a dangling pointer.

Yeah, there will be a hazard: even if we are sure it is safe right now,
we cannot guarantee that future changes in any subroutine called in
between won't cause a flush.

> The point of the static-inline function idea was to be cheap enough
> that it isn't worth worrying about this sort of risky optimization.
> Given that an smgr function is sure to involve some kernel calls,
> I doubt it's worth sweating over an extra test-and-branch beforehand.
> So where I was hoping to get to is that smgr objects are *only*
> referenced by RelationGetSmgr() calls and nobody ever keeps any
> other pointers to them across any non-smgr operations.

Ok, will revert the changes added in the previous version, thanks.

Regards,
Amul
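Tom's dangling-pointer concern can be illustrated with a toy model (all `Toy*` names are hypothetical stand-ins, not PostgreSQL code): a cached pointer is only safe while no flush can occur in between, whereas re-fetching through a cheap accessor on every use, as the static-inline `RelationGetSmgr()` idea does, stays safe across flushes.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy stand-ins for a relcache entry and its smgr object. */
typedef struct ToySmgr { int rnode; } ToySmgr;
typedef struct ToyRelation { ToySmgr *smgr; int rnode; } ToyRelation;

/*
 * Simulate a cache flush: the old smgr object is destroyed.  Any pointer
 * previously returned by toy_get_smgr() now dangles.
 */
void
toy_cache_flush(ToyRelation *rel)
{
	free(rel->smgr);
	rel->smgr = NULL;
}

/*
 * The accessor pattern: (re)open the smgr object on each call, so
 * callers never hold a stale pointer across a possible flush.
 */
ToySmgr *
toy_get_smgr(ToyRelation *rel)
{
	if (rel->smgr == NULL)
	{
		rel->smgr = malloc(sizeof(ToySmgr));
		rel->smgr->rnode = rel->rnode;
	}
	return rel->smgr;
}
```

The unsafe pattern is `ToySmgr *srel = toy_get_smgr(rel);` followed by `toy_cache_flush(rel);` and then a use of `srel`: that last use reads freed memory, which is exactly the hazard of keeping `SMgrRelation srel` in a temporary variable across code that might flush the relcache.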
Re: Refactor "mutually exclusive options" error reporting code in parse_subscription_options
On Wed, Jul 7, 2021 at 5:33 AM Peter Smith wrote:
> PSA my patch which includes all the fixes mentioned above.

I agree with Amit to start a separate thread to discuss these points.
IMO, we can close this thread. What do you think?

Regards,
Bharath Rupireddy.
Re: Refactor "mutually exclusive options" error reporting code in parse_subscription_options
On Wed, Jul 7, 2021 at 7:36 AM Alvaro Herrera wrote:
>
> On 2021-Jul-07, Peter Smith wrote:
>
> > 1. Zap 'opts' up-front
> >
> > + *
> > + * Caller is expected to have cleared 'opts'.
> >
> > This comment is putting the onus on the caller to "do the right thing".
> >
> > I think that hopeful expectations about input should be removed - the
> > function should just be responsible itself to zap the SubOpts
> > up-front. It makes the code more readable, and it removes any
> > potential risk of garbage unintentionally passed in that struct.
> >
> > /* Start out with cleared opts. */
> > memset(opts, 0, sizeof(SubOpts));
>
> Yeah, I gave the initialization aspect some thought too when I reviewed
> 0001. Having an explicit memset() just for a sanity check is a waste
> when you don't really need it; and we're initializing the struct
> already at declaration time by assigning {0} to it, so having to add
> memset feels like such a waste. Another point in the same area is that
> some of the struct members are initialized to some default value
> different from 0, so I wondered if it would have been better to remove
> the = {0} and instead have another function that would set everything
> up the way we want; but it seemed a bit excessive, so I ended up not
> suggesting that.
>
> We have many places in the source tree where the caller is expected to
> do the right thing, even when those right things are more complex than
> this one. This one place isn't terribly bad in that regard, given that
> it's a pretty well contained static struct anyway (we would certainly
> not export a struct of this name in any .h file.) I don't think it's
> all that bad.
>
> > Alternatively, at least there should be an assertion for some sanity
> > check.
> >
> > Assert(opt->specified_opts == 0);
>
> No opinion on this. It doesn't seem valuable enough, but maybe I'm in
> the minority on this.
>

I am also not sure if such an assertion adds much value.

> > 2. Remove redundant conditions
> >
> > /* Check for incompatible options from the user. */
> > - if (enabled && *enabled_given && *enabled)
> > + if (opts->enabled &&
> > +     IsSet(supported_opts, SUBOPT_ENABLED) &&
> > +     IsSet(opts->specified_opts, SUBOPT_ENABLED))
> > (etc)
>
> Yeah, I thought about this too when I saw the 0002 patch in this series.
> I agree that the extra rechecks are a bit pointless.
>

I don't think the committed patch has made anything worse here or added
any new condition. Now, even if we want to change these conditions, it
is better to discuss them separately. As per the current code these
checks look a bit redundant, but OTOH, in the future one might expect
that if a supported option is not passed by the caller and the user has
specified it, then it should be an error. For example, we might not want
to support some option via some ALTER variant even though it is
supported by the CREATE variant.

--
With Regards,
Amit Kapila.
Re: rand48 replacement
Fabien COELHO wrote, 2021-07-06 23:49:

> Hello Yura,
>
>>> However, I'm not enthusiastic about combining two methods depending
>>> on the range; the function looks complex enough without that, so I
>>> would suggest not to take this option. Also, the decision process
>>> adds to the average cost, which is undesirable.
>>
>> Given 99.99% of cases will be in the likely branch, the branch
>> predictor should eliminate the decision cost.
>
> Hmmm. ISTM that a branch predictor should predict that unknown < small
> should probably be false, so a hint should be given that it is really
> true.

Why? The branch predictor is history based: if it was usually true here,
then it will be predicted true this time as well. "unknown < small" is
usually true, therefore the branch predictor will assume it is true. I
put `likely` in for the compiler: the compiler then places the `likely`
path closer. And as Dean Rasheed correctly mentioned, the mask method
will have a really bad pattern for the branch predictor if the range is
not just below or equal to a power of two.

> On average the bitmask is the better unbiased method, if the online
> figures are to be trusted. Also, as already said, I do not really want
> to add code complexity, especially to get lower average performance,
> and especially with code like "threshold = - range % range", where
> both variables are unsigned, I have a headache just looking at it :-)

If you mean https://www.pcg-random.org/posts/bounded-rands.html then:

1. The first graphs are made with a non-exact version of Lemire's code.
The latest code from
https://lemire.me/blog/2016/06/30/fast-random-shuffling/ (which mine is
derived from) performs the modulo operation only if (leftover < range).
Even with rand_range(100) that happens just once in four thousand runs.

2. You can see the "Small-Constant Benchmark" on that page: Debiased Int
is 1.5 times faster. And even in the "Small-Shuffle" benchmark their
unoptimized version is on par with the mask method.

3. If you go to the "Making Improvements/Faster Threshold-Based
Discarding" section, you'll see the code my version is matched with. It
is twice as fast as the masked method in the Small-Shuffle benchmark,
and just a bit slower in Large-Shuffle.

And __builtin_clzl is not a free lunch either: it has a latency of 3-4
cycles on a modern processor. Well, % is not cheap either. It could
probably run in parallel with some part of xoroshiro, but that depends
on how the compiler optimizes this function.

>>> I would certainly select the unbiased multiply method if we want a
>>> u32 range variant.
>>
>> There could be two functions.
>
> Yep, but do we need them? Who is likely to want 32 bit pseudo random
> ints in a range? pgbench needs 64 bits.
>
> So I'm still inclined to just keep the faster-on-average bitmask
> method, despite that it may be slower for some ranges. The average
> cost for the worst case in PRNG calls is, if I'm not mistaken:
>
>   1 * 0.5 + 2 * 0.25 + 3 * 0.125 + ... ~ 2
>
> which does not look too bad.

You don't count the cost of branch misprediction:

https://stackoverflow.com/questions/11227809/why-is-processing-a-sorted-array-faster-than-processing-an-unsorted-array
https://lemire.me/blog/2019/10/15/mispredicted-branches-can-multiply-your-running-times/

Therefore the calculation should be at least:

  1 * 0.5 + 0.5 * (3 + 0.5 * (3 + ...)) = 6

By the way, we have a 64 bit random value. If we use 44 bits of it for a
range <= (1<<20), then the bias will be less than 1/(2**24). Could we
just ignore it (given it is not a cryptographically strong random)?

  uint64
  pg_prng_u64_range(pg_prng_state *state, uint64 range)
  {
      uint64 val = xoroshiro128ss(state);
      uint64 m;

      /* handle all power of 2 cases */
      if ((range & (range - 1)) == 0)
          return range != 0 ? val & (range - 1) : 0;

      if (likely(range < (1 << 20)))
      {
          /*
           * While the multiply method is biased, the bias will be smaller
           * than 1/(1<<24) for such small ranges. Let's ignore it.
           */
          return ((val >> 20) * range) >> 44;
      }

      /* Apple's mask method */
      m = mask_u64(range - 1);
      val &= m;
      while (val >= range)
          val = xoroshiro128ss(state) & m;

      return val;
  }

Anyway, excuse me for heating up this discussion over such a
non-essential issue. I'll try to control myself and not take it further.

regards,
Sokolov Yura.
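For reference, the bitmask (rejection) method being debated above can be sketched as a self-contained program. Here `splitmix64` stands in for `xoroshiro128ss` purely so the example runs on its own, and `toy_next`/`prng_u64_range` are illustrative names, not the proposed pgbench API:

```c
#include <assert.h>
#include <stdint.h>

/* Any 64-bit generator works here; splitmix64 stands in for xoroshiro128ss. */
uint64_t
toy_next(uint64_t *state)
{
	uint64_t	z = (*state += 0x9e3779b97f4a7c15ULL);

	z = (z ^ (z >> 30)) * 0xbf58476d1ce4e5b9ULL;
	z = (z ^ (z >> 27)) * 0x94d049bb133111ebULL;
	return z ^ (z >> 31);
}

/* Smallest mask of the form 2^k - 1 that covers v. */
uint64_t
mask_u64(uint64_t v)
{
	v |= v >> 1;
	v |= v >> 2;
	v |= v >> 4;
	v |= v >> 8;
	v |= v >> 16;
	v |= v >> 32;
	return v;
}

/*
 * Unbiased value in [0, range) by the bitmask (rejection) method: mask
 * the raw 64-bit value down to the next power of two covering the range,
 * and retry whenever the masked value still falls outside the range.
 * Since the mask at most doubles the range, each draw is accepted with
 * probability > 1/2, so fewer than two generator calls are needed on
 * average -- the "~ 2" worst-case figure from the discussion above.
 */
uint64_t
prng_u64_range(uint64_t *state, uint64_t range)
{
	uint64_t	m,
				val;

	if (range < 2)
		return 0;
	m = mask_u64(range - 1);
	do
		val = toy_next(state) & m;
	while (val >= range);
	return val;
}
```

The retry loop is exactly the branch whose predictability was questioned above: for ranges just over a power of two the rejection probability approaches 1/2, which is where mispredictions get expensive.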
Re: [PATCH] expand the units that pg_size_pretty supports on output
On Wed, 7 Jul 2021 at 02:54, Tom Lane wrote:
>
> Minor nit: use "const char *text" in the struct declaration, so
> that all of the static data can be placed in fixed storage.

Thanks for pointing that out.

> David Rowley writes:
> > (I'm not sure why pgindent removed the space between != and NULL, but
> > it did, so I left it.)
>
> It did that because "text" is a typedef name, so it's a bit confused
> about whether the statement is really a declaration. Personally I'd
> have used "name" or something like that for that field, anyway.

I should have thought of that. Thanks for highlighting it. I've renamed
the field.

Updated patch attached.

David

v2-0001-Use-lookup-table-for-units-in-pg_size_pretty-and-.patch
Description: Binary data
Re: [PATCH] expand the units that pg_size_pretty supports on output
On Wed, 7 Jul 2021 at 00:51, Dean Rasheed wrote:
> 10. half_round(21) == 11 :: 20 >> 1 = 10
>
> The correct result should be 10 (it would be very odd to claim that
> 10241 bytes should be displayed as 11kB), but the half-rounding keeps
> rounding up at each stage.
>
> That's a general property of rounding -- you need to be very careful
> when rounding more than once, since otherwise errors will propagate.
> C.f. 4083f445c0, which removed a double-round in numeric sqrt().

Thanks. I've adjusted the patch to re-add the round bool flag and get
rid of the rightShift field. I'm now calculating how many bits to shift
right by based on the difference between the unitbits of the current and
next unit, then taking 1 bit less if the next unit does half rounding
and the current one does not, or adding an extra bit in the opposite
case.

I'll post another patch shortly.

David
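Dean's double-rounding point can be demonstrated directly: half-rounding at every shift step overstates 10241 bytes as 11 kB, while shifting with a single rounding step gives the correct 10 kB. This is a standalone sketch of the arithmetic, not the patch's actual code:

```c
#include <assert.h>
#include <stdint.h>

/* Round-to-nearest halving, as in the example above: half_round(21) == 11. */
uint64_t
half_round(uint64_t x)
{
	return (x + 1) >> 1;
}

/* Shift right one bit at a time, rounding at every step (error propagates). */
uint64_t
shift_rounding_each_step(uint64_t x, int bits)
{
	while (bits-- > 0)
		x = half_round(x);
	return x;
}

/* Shift in one go, rounding exactly once (bits must be > 0). */
uint64_t
shift_rounding_once(uint64_t x, int bits)
{
	return (x + (UINT64_C(1) << (bits - 1))) >> bits;
}
```

Tracing 10241 through ten half-rounding steps gives 5121, 2561, 1281, 641, 321, 161, 81, 41, 21, 11: each step rounds up by at most one, and the errors accumulate. Rounding once, (10241 + 512) >> 10 is 10, since 10241 is only 10.0009... kB.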
Re: Refactor "mutually exclusive options" error reporting code in parse_subscription_options
On Tue, Jul 6, 2021 at 9:24 PM Alvaro Herrera wrote:
>
> On 2021-Jul-06, Bharath Rupireddy wrote:
>
> > Thanks, Amit. I'm posting the 0002 patch which removes extra ereport
> > calls using local variables. Please review it.
>
> I looked at this the other day and I'm not sure I like it very much.
> It's not making anything any simpler, it's barely saving two lines of
> code. I think we can do without this change.

Just for the record: I will withdraw the 0002 patch as no one has shown
interest. Thanks.

Regards,
Bharath Rupireddy.
Warn if initdb's --sync-only option is mixed with other options
When reading through the code for my previous patch [1], I realized that
initdb does *not* warn users that it ignores all other options (except
-D/--pgdata) if the --sync-only option is used.

I'm not able to come up with an exact situation to prove this, but this
behaviour seems potentially dangerous. The user might mix the
--sync-only option with other options, but would be extremely surprised
if those other options didn't take effect.

I _think_ we should throw an error if the user specifies any options
that are being ignored. But an error might break someone's automation
(perhaps for their own good), since the current behaviour has been in
place for a very long time, so I'm willing to settle for at least a
warning in such a case.

[1]: Slightly improve initdb --sync-only option's help message
https://www.postgresql.org/message-id/CABwTF4U6hbNNE1bv%3DLxQdJybmUdZ5NJQ9rKY9tN82NXM8QH%2BiQ%40mail.gmail.com

Best regards,
--
Gurjeet Singh
http://gurjeet.singh.im/

v1-0001-Warn-if-sync-only-is-used-with-other-options.patch
Description: Binary data
Re: postgres_fdw - should we tighten up batch_size, fetch_size options against non-numeric values?
On 2021/07/01 21:41, Bharath Rupireddy wrote:
> On Thu, Jul 1, 2021 at 6:07 PM Fujii Masao wrote:
>> On 2021/07/01 13:16, Bharath Rupireddy wrote:
>>> On Thu, Jul 1, 2021 at 8:23 AM Fujii Masao wrote:
>>>> The recent commit 61d599ede7 documented that the type of those
>>>> options is floating point. But the docs still use "is a numeric
>>>> value" in their descriptions. Probably it should be replaced with
>>>> "is a floating point value" there. If we do this, isn't it better
>>>> to use "floating point" even in the error message?
>>> Agreed. PSA v5 patch.
>> Thanks for updating the patch! LGTM. Barring any objection, I will
>> commit this patch. Thanks.
>>
>> One question is: should we back-patch this? This is not a bug fix, so
>> I'm not sure if it's worth back-patching to already-released versions.
>> But it may be better to do that to v14.
> IMO, it's a good-to-have fix in v14. But -1 for backpatching to v13 and
> lower branches.

Agreed. So I pushed the patch to master and v14. Thanks!

Regards,
--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
Re: Refactor "mutually exclusive options" error reporting code in parse_subscription_options
On 2021-Jul-07, Peter Smith wrote:

> 1. Zap 'opts' up-front
>
> + *
> + * Caller is expected to have cleared 'opts'.
>
> This comment is putting the onus on the caller to "do the right thing".
>
> I think that hopeful expectations about input should be removed - the
> function should just be responsible itself to zap the SubOpts
> up-front. It makes the code more readable, and it removes any
> potential risk of garbage unintentionally passed in that struct.
>
> /* Start out with cleared opts. */
> memset(opts, 0, sizeof(SubOpts));

Yeah, I gave the initialization aspect some thought too when I reviewed
0001. Having an explicit memset() just for a sanity check is a waste
when you don't really need it; and we're initializing the struct already
at declaration time by assigning {0} to it, so having to add memset
feels like such a waste. Another point in the same area is that some of
the struct members are initialized to some default value different from
0, so I wondered if it would have been better to remove the = {0} and
instead have another function that would set everything up the way we
want; but it seemed a bit excessive, so I ended up not suggesting that.

We have many places in the source tree where the caller is expected to
do the right thing, even when those right things are more complex than
this one. This one place isn't terribly bad in that regard, given that
it's a pretty well contained static struct anyway (we would certainly
not export a struct of this name in any .h file.) I don't think it's
all that bad.

> Alternatively, at least there should be an assertion for some sanity
> check.
>
> Assert(opt->specified_opts == 0);

No opinion on this. It doesn't seem valuable enough, but maybe I'm in
the minority on this.

> 2. Remove redundant conditions
>
> /* Check for incompatible options from the user. */
> - if (enabled && *enabled_given && *enabled)
> + if (opts->enabled &&
> +     IsSet(supported_opts, SUBOPT_ENABLED) &&
> +     IsSet(opts->specified_opts, SUBOPT_ENABLED))
> (etc)

Yeah, I thought about this too when I saw the 0002 patch in this series.
I agree that the extra rechecks are a bit pointless.

--
Álvaro Herrera               PostgreSQL Developer
https://www.EnterpriseDB.com/
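The invariant behind the "redundant conditions" point quoted above (an option can only land in `specified_opts` if it is also in `supported_opts`, so `specified_opts` is always a subset of `supported_opts` and a later recheck of `supported_opts` is pointless) can be sketched with hypothetical names; this is an illustration of the bit-flag pattern, not the actual PostgreSQL code:

```c
#include <assert.h>
#include <stdint.h>

#define SUBOPT_ENABLED     0x01
#define SUBOPT_CREATE_SLOT 0x02
#define IS_SET(bits, opt)  (((bits) & (opt)) != 0)

/* Toy counterpart of the SubOpts struct (illustrative only). */
typedef struct ToyOpts
{
	uint32_t	specified_opts; /* options the user actually gave */
} ToyOpts;

/*
 * Record an option only when the caller declared it supported.  Because
 * this is the sole place that sets bits in specified_opts, the invariant
 * (specified_opts & ~supported_opts) == 0 always holds afterwards, which
 * is why downstream checks need only consult specified_opts.
 */
int
toy_parse_option(ToyOpts *opts, uint32_t supported_opts, uint32_t opt)
{
	if (!IS_SET(supported_opts, opt))
		return 0;				/* unsupported option: rejected */
	opts->specified_opts |= opt;
	return 1;
}
```

Given that invariant, a condition like `IsSet(supported_opts, X) && IsSet(opts->specified_opts, X)` collapses to just the `specified_opts` test, which is the simplification proposed in the review.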
Slightly improve initdb --sync-only option's help message
When reading the output of `initdb --help` I could not clearly
understand what the purpose of the --sync-only option was, until I read
the documentation of initdb.

  -S, --sync-only       only sync data directory

Perhaps the confusion was caused by the fact that sync(hronization)
means different things in different contexts, and many of those contexts
apply to databases and to data directories: time sync, data sync,
replica sync, etc.

I think it would be helpful if the help message was slightly more
descriptive. Some options:

Used in the patch:

  only sync data directory; does not modify any data

To match the wording of the --sync-only option:

  write contents of data directory to disk; helpful after --no-sync option

Clearly specify the system operation used for the option:

  perform fsync on data directory; helpful after --no-sync option

Best regards,
--
Gurjeet Singh
http://gurjeet.singh.im/

v1-0001-Explicitly-state-that-sync-only-does-not-modify-d.patch
Description: Binary data
Re: visibility map corruption
On Tue, Jul 6, 2021 at 08:36:13PM -0400, Bruce Momjian wrote:
> On Tue, Jul 6, 2021 at 06:49:10PM -0400, Bruce Momjian wrote:
> > My point is that there are a lot of internals involved here that are
> > not part of pg_upgrade, though it probably only affects pg_upgrade.
> > Anyway, Bertrand's patch seems to have what I need.
>
> One question is how do we want to handle cases where -x next_xid is
> used but -u oldestXid is not used? Compute a value for oldestXid like
> we did previously? Throw an error? Leave oldestXid unchanged? I am
> thinking the last option.

Here is a modified version of Bertrand's patch, with docs, that does the
last option.

--
Bruce Momjian        https://momjian.us
EDB                  https://enterprisedb.com

If only the physical world exists, free will is an illusion.

oldestxid.diff.gz
Description: application/gzip
Re: Keep notnullattrs in RelOptInfo (Was part of UniqueKey patch series)
On Wed, 7 Jul 2021 at 13:04, Andy Fan wrote:
> Looking forward to watching this change closely, thank you both David
> and Tom! But I still don't understand what faults my way has; do you
> mind telling me the details?

The problem is that we don't need 6 different ways to determine if a Var
can be NULL or not. You're proposing to add a method using Bitmapsets,
and Tom has proposed some ideas around tracking nullability in Vars. We
don't need both.

It seems to me that having it in Var allows us to have a much finer
gradient about where exactly a Var can be NULL. For example:

  SELECT nullablecol FROM tab WHERE nullablecol = ;

If the equality operator is strict then nullablecol can be NULL in the
WHERE clause but not in the SELECT list. Tom's idea should allow us to
determine both of those things but your idea cannot tell them apart, so,
in theory at least, Tom's idea seems better to me.

David
RE: [HACKERS] logical decoding of two-phase transactions
On Tuesday, July 6, 2021 7:18 PM Ajin Cherian wrote:
>
> thanks for the test!
> I hadn't handled the case where sending the schema across was the
> first change of the transaction, as part of the decoding of the
> truncate command. In this test case, the schema was sent across
> without the stream start, hence the error on the apply worker.
> I have updated with a fix. Please do a test and confirm.

Thanks for your patch. I have tested and confirmed that the issue was
fixed.

Regards
Tang
Re: Keep notnullattrs in RelOptInfo (Was part of UniqueKey patch series)
On Tue, Jul 6, 2021 at 9:14 PM Tom Lane wrote:
> David Rowley writes:
> > Tom, I'm wondering if you might get a chance to draw up a design for
> > what you've got in mind with this? I assume adding a new field in
> > Var, but I'm drawing a few blanks on how things might work for
> > equal() when one Var has the field set and another does not.
>
> As I said before, it hasn't progressed much past the handwaving stage,
> but it does seem like it's time to get it done. I doubt I'll have any
> cycles for it during the commitfest, but maybe I can devote a block of
> time during August.
>
> 			regards, tom lane

Looking forward to watching this change closely, thank you both David
and Tom! But I still don't understand what faults my way has; do you
mind telling me the details?

--
Best Regards
Andy Fan (https://www.aliyun.com/)
Re: Re[3]: On login trigger: take three
On Sun, Jul 4, 2021 at 1:21 PM vignesh C wrote:
>
> CFBot shows the following failure:
> # poll_query_until timed out executing this query:
> # SELECT '0/3046250' <= replay_lsn AND state = 'streaming' FROM
> pg_catalog.pg_stat_replication WHERE application_name = 'standby_1';
> # expecting this output:
> # t
> # last actual query output:
> # t
> # with stderr:
> # NOTICE:  You are welcome!
> # Looks like your test exited with 29 before it could output anything.
> t/001_stream_rep.pl ..
> Dubious, test returned 29 (wstat 7424, 0x1d00)

Thanks. I found that the patch was broken by commit f452aaf7d (the part
"adjust poll_query_until to insist on empty stderr as well as a stdout
match"). So I had to remove a "RAISE NOTICE" (which was just an
informational message) from the login trigger function, to satisfy the
new poll_query_until expectations.

Also, I updated a PG14 version check (which must now check for the PG15
version).

Regards,
Greg Nancarrow
Fujitsu Australia

v16-0001-on_client_connect_event_trigger.patch
Description: Binary data
Re: visibility map corruption
On Tue, Jul 6, 2021 at 06:49:10PM -0400, Bruce Momjian wrote:
> On Tue, Jul 6, 2021 at 03:46:48PM -0700, Peter Geoghegan wrote:
> > On Tue, Jul 6, 2021 at 3:30 PM Bruce Momjian wrote:
> > > Yes, I can, though it seems like a much bigger issue than
> > > pg_upgrade. I will be glad to dig into it.
> >
> > I'm not sure what you mean by that. Technically this would be an
> > issue for any program that uses "pg_resetwal -x" in the way that
> > pg_upgrade does, with those same expectations. But isn't pg_upgrade
> > the only known program that behaves like that?
> >
> > I don't see any reason why this wouldn't be treated as a pg_upgrade
> > bug in the release notes, regardless of the exact nature or
> > provenance of the issue -- the pg_upgrade framing seems useful
> > because this is a practical problem for pg_upgrade users alone. Have
> > I missed something?
>
> My point is that there are a lot of internals involved here that are
> not part of pg_upgrade, though it probably only affects pg_upgrade.
> Anyway, Bertrand's patch seems to have what I need.

One question is how do we want to handle cases where -x next_xid is used
but -u oldestXid is not used? Compute a value for oldestXid like we did
previously? Throw an error? Leave oldestXid unchanged? I am thinking the
last option.

--
Bruce Momjian        https://momjian.us
EDB                  https://enterprisedb.com

If only the physical world exists, free will is an illusion.
Re: Refactor "mutually exclusive options" error reporting code in parse_subscription_options
On Tue, Jul 6, 2021 at 6:21 PM Amit Kapila wrote:
>
> On Fri, Jul 2, 2021 at 12:36 PM Amit Kapila wrote:
> >
> > On Fri, Jul 2, 2021 at 8:35 AM Alvaro Herrera wrote:
> > >
> > > > The latest patch sent by Bharath looks good to me. Would you like
> > > > to commit it or shall I take care of it?
> > >
> > > Please, go ahead.
> > >
> >
> > Okay, I'll push it early next week (by Tuesday) unless there are more
> > comments or suggestions. Thanks!
> >
>
> Pushed!

Yesterday, I needed to refactor a lot of code due to this push [1]. The
refactoring exercise caused me to study these v11 changes much more
deeply. IMO there are a few improvements that should be made:

//

1. Zap 'opts' up-front

+ *
+ * Caller is expected to have cleared 'opts'.

This comment is putting the onus on the caller to "do the right thing".

I think that hopeful expectations about input should be removed - the
function should just be responsible itself to zap the SubOpts up-front.
It makes the code more readable, and it removes any potential risk of
garbage unintentionally passed in that struct.

/* Start out with cleared opts. */
memset(opts, 0, sizeof(SubOpts));

Alternatively, at least there should be an assertion for some sanity
check.

Assert(opt->specified_opts == 0);

2. Remove redundant conditions

/* Check for incompatible options from the user. */
- if (enabled && *enabled_given && *enabled)
+ if (opts->enabled &&
+     IsSet(supported_opts, SUBOPT_ENABLED) &&
+     IsSet(opts->specified_opts, SUBOPT_ENABLED))
  ereport(ERROR,
          (errcode(ERRCODE_SYNTAX_ERROR),
  /*- translator: both %s are strings of the form "option = value" */
           errmsg("%s and %s are mutually exclusive options",
                  "connect = false", "enabled = true")));

- if (create_slot && create_slot_given && *create_slot)
+ if (opts->create_slot &&
+     IsSet(supported_opts, SUBOPT_CREATE_SLOT) &&
+     IsSet(opts->specified_opts, SUBOPT_CREATE_SLOT))
  ereport(ERROR,
          (errcode(ERRCODE_SYNTAX_ERROR),
           errmsg("%s and %s are mutually exclusive options",
                  "connect = false", "create_slot = true")));

- if (copy_data && copy_data_given && *copy_data)
+ if (opts->copy_data &&
+     IsSet(supported_opts, SUBOPT_COPY_DATA) &&
+     IsSet(opts->specified_opts, SUBOPT_COPY_DATA))
  ereport(ERROR,
          (errcode(ERRCODE_SYNTAX_ERROR),
           errmsg("%s and %s are mutually exclusive options",
                  "connect = false", "copy_data = true")));

By definition, this function only allows an option into
"specified_opts" if that option is also in "supported_opts". So there is
really no need in the above code to re-check that it is supported. It
can be simplified like this:

/* Check for incompatible options from the user. */
- if (enabled && *enabled_given && *enabled)
+ if (opts->enabled &&
+     IsSet(opts->specified_opts, SUBOPT_ENABLED))
  ereport(ERROR,
          (errcode(ERRCODE_SYNTAX_ERROR),
  /*- translator: both %s are strings of the form "option = value" */
           errmsg("%s and %s are mutually exclusive options",
                  "connect = false", "enabled = true")));

- if (create_slot && create_slot_given && *create_slot)
+ if (opts->create_slot &&
+     IsSet(opts->specified_opts, SUBOPT_CREATE_SLOT))
  ereport(ERROR,
          (errcode(ERRCODE_SYNTAX_ERROR),
           errmsg("%s and %s are mutually exclusive options",
                  "connect = false", "create_slot = true")));

- if (copy_data && copy_data_given && *copy_data)
+ if (opts->copy_data &&
+     IsSet(opts->specified_opts, SUBOPT_COPY_DATA))
  ereport(ERROR,
          (errcode(ERRCODE_SYNTAX_ERROR),
           errmsg("%s and %s are mutually exclusive options",
                  "connect = false", "copy_data = true")));

3. Remove redundant conditions

Same as 2. Here are more examples of conditions where the redundant
checking of "supported_opts" can be removed.

/*
 * Do additional checking for disallowed combination when slot_name = NONE
 * was used.
*/ - if (slot_name && *slot_name_given && !*slot_name) + if (!opts->slot_name && + IsSet(supported_opts, SUBOPT_SLOT_NAME) && + IsSet(opts->specified_opts, SUBOPT_SLOT_NAME)) { - if (enabled && *enabled_given && *enabled) + if (opts->enabled && + IsSet(supported_opts, SUBOPT_ENABLED) && + IsSet(opts->specified_opts, SUBOPT_ENABLED)) ereport(ERROR, (errcode(ERRCODE_SYNTAX_ERROR), /*- translator: both %s are strings of the form "option = value" */ errmsg("%s and %s are mutually exclusive options", "slot_name = NONE", "enabled = true"))); - if (create_slot && create_slot_given && *create_slot) + if (opts->create_slot && + IsSet(supported_opts, SUBOPT_CREATE_SLOT) && + IsSet(opts->specified_opts, SUBOPT_CREATE_SLOT)) ereport(ERROR, (errcode(ERRCODE_SYNTAX_ERROR), + /*- translator: both %s are strings of the form "option = value" */ errmsg("%s and %s are mutually exclusive options", "slot_name = NONE", "create_slot = true"))); It can be simplified like this: /* * Do additional checking for disallowed combination when slot_name = NONE * was used. */ - if (slot_name && *slot_name_given && !*slot_name) + if
Re: Column Filtering in Logical Replication
Hello, here are a few comments on this patch. The patch adds a function get_att_num_by_name; but we have a lsyscache.c function for that purpose, get_attnum. Maybe that one should be used instead? get_tuple_columns_map() returns a bitmapset of the attnos of the columns in the given list, so its name feels wrong. I propose get_table_columnset(). However, this function is invoked for every insert/update change, so it's going to be far too slow to be usable. I think you need to cache the bitmapset somewhere, so that the function is only called on first use. I didn't look very closely, but it seems that struct RelationSyncEntry may be a good place to cache it. The patch adds a new parse node PublicationTable, but doesn't add copyfuncs.c, equalfuncs.c, readfuncs.c, outfuncs.c support for it. Maybe try a compile with WRITE_READ_PARSE_PLAN_TREES and/or COPY_PARSE_PLAN_TREES enabled to make sure everything is covered. (I didn't verify that this actually catches anything ...) The new column in pg_publication_rel is prrel_attr. This name seems at odds with existing column names (we don't use underscores in catalog column names). Maybe prattrs is good enough? prrelattrs? We tend to use plurals for columns that are arrays. It's not super clear to me that strlist_to_textarray() and related processing will behave sanely when the column names contain weird characters such as commas or quotes, or just when used with uppercase column names. Maybe it's worth having tests that try to break such cases. You seem to have left a debugging "elog(LOG)" line in OpenTableList. I got warnings from "git am" about trailing whitespace being added by the patch in two places. Thanks! -- Álvaro Herrera Valdivia, Chile — https://www.EnterpriseDB.com/
Re: visibility map corruption
On Tue, Jul 6, 2021 at 3:49 PM Bruce Momjian wrote: > My point is that there are a lot of internals involved here that are not > part of pg_upgrade, though it probably only affects pg_upgrade. Anyway, > Bertrand's patch seems to have what I need. I was confused by your remarks because I am kind of looking at it from the opposite angle. At least now that I've thought about it a bit. Since the snippet of pg_resetwal code that sets oldestXid wasn't ever intended to be used by pg_upgrade, but was anyway, what we have is something that's clearly totally wrong (at least in the pg_upgrade case). It's not just wrong for pg_upgrade to do things that way -- it's also wildly unreasonable. We heard a complaint about this from Reddit only because it worked "as designed", and so made the cluster immediately have an anti-wraparound autovacuum. But why would anybody want that behavior, even if it was implemented correctly? It simply makes no sense. The consequences of this bug are indeed complicated and subtle and will probably never be fully understood. But at the same time fixing the bug now seems kind of simple. (Working backwards to arrive here was a bit tricky, mind you.) -- Peter Geoghegan
Re: visibility map corruption
On Tue, Jul 6, 2021 at 03:46:48PM -0700, Peter Geoghegan wrote: > On Tue, Jul 6, 2021 at 3:30 PM Bruce Momjian wrote: > > Yes, I can, though it seems like a much bigger issue than pg_upgrade. > > I will be glad to dig into it. > > I'm not sure what you mean by that. Technically this would be an issue > for any program that uses "pg_resetwal -x" in the way that pg_upgrade > does, with those same expectations. But isn't pg_upgrade the only > known program that behaves like that? > > I don't see any reason why this wouldn't be treated as a pg_upgrade > bug in the release notes, regardless of the exact nature or provenance > of the issue -- the pg_upgrade framing seems useful because this is a > practical problem for pg_upgrade users alone. Have I missed something? My point is that there are a lot of internals involved here that are not part of pg_upgrade, though it probably only affects pg_upgrade. Anyway, Bertrand's patch seems to have what I need. -- Bruce Momjian https://momjian.us EDB https://enterprisedb.com If only the physical world exists, free will is an illusion.
Re: visibility map corruption
On Tue, Jul 6, 2021 at 06:30:41PM -0400, Bruce Momjian wrote: > On Tue, Jul 6, 2021 at 02:27:34PM -0700, Peter Geoghegan wrote: > > > BTW, is it really necessary for copy_xact_xlog_xid to invoke pg_resetwal > > > so many times? Why can't we pass all of the update-this options in one > > > call? > > > > No opinion here. > > > > > Who's going to do the legwork on this? > > > > Can Bruce take care of committing the patch for this? Bruce? > > > > This should definitely be in the next point release IMV. > > Yes, I can, though it seems like a much bigger issue than pg_upgrade. > I will be glad to dig into it. Bertrand Drouvot provided a patch in the thread, so I will start from that; CC'ing him too. -- Bruce Momjian https://momjian.us EDB https://enterprisedb.com If only the physical world exists, free will is an illusion.
Re: visibility map corruption
On Tue, Jul 6, 2021 at 3:30 PM Bruce Momjian wrote: > Yes, I can, though it seems like a much bigger issue than pg_upgrade. > I will be glad to dig into it. I'm not sure what you mean by that. Technically this would be an issue for any program that uses "pg_resetwal -x" in the way that pg_upgrade does, with those same expectations. But isn't pg_upgrade the only known program that behaves like that? I don't see any reason why this wouldn't be treated as a pg_upgrade bug in the release notes, regardless of the exact nature or provenance of the issue -- the pg_upgrade framing seems useful because this is a practical problem for pg_upgrade users alone. Have I missed something? -- Peter Geoghegan
Re: visibility map corruption
On Tue, Jul 6, 2021 at 02:27:34PM -0700, Peter Geoghegan wrote: > > BTW, is it really necessary for copy_xact_xlog_xid to invoke pg_resetwal > > so many times? Why can't we pass all of the update-this options in one > > call? > > No opinion here. > > > Who's going to do the legwork on this? > > Can Bruce take care of committing the patch for this? Bruce? > > This should definitely be in the next point release IMV. Yes, I can, though it seems like a much bigger issue than pg_upgrade. I will be glad to dig into it. -- Bruce Momjian https://momjian.us EDB https://enterprisedb.com If only the physical world exists, free will is an illusion.
Re: visibility map corruption
On Tue, Jul 6, 2021 at 3:12 PM Mark Dilger wrote: > Thanks, Peter, for drawing my attention to this. I had already been > following this thread, but had not yet thought about the problem in terms of > amcheck. > > I will investigate possible solutions in verify_heapam(). Thanks! Great that we might be able to make a whole class of bugs detectable with the new amcheck stuff. Glad that I didn't forget about amcheck myself -- I almost forgot. When I was working on the btree amcheck code, I looked for interesting historical bugs and made sure that I could detect them. That seems even more important with heapam. I wouldn't be surprised if one or two important invariants were missed, in part because the heapam design didn't have invariants as a starting point. -- Peter Geoghegan
[PATCH][postgres_fdw] Add push down of CASE WHEN clauses
Hi, I have noticed that postgres_fdw does not push down CASE WHEN clauses. In the following case this is normal: contrib_regression=# EXPLAIN (ANALYZE, VERBOSE) SELECT (CASE WHEN mod(c1, 4) = 0 THEN 1 ELSE 2 END) FROM ft1; QUERY PLAN - Foreign Scan on public.ft1 (cost=100.00..146.00 rows=1000 width=4) (actual time=0.306..0.844 rows=822 loops=1) Output: CASE WHEN (mod(c1, 4) = 0) THEN 1 ELSE 2 END Remote SQL: SELECT "C 1" FROM "S 1"."T 1" Planning Time: 0.139 ms Execution Time: 1.057 ms (5 rows) but in these other cases this is a performance killer: all records are fetched contrib_regression=# EXPLAIN (ANALYZE, VERBOSE) SELECT sum(CASE WHEN mod(c1, 4) = 0 THEN 1 ELSE 2 END) FROM ft1; QUERY PLAN --- Aggregate (cost=148.50..148.51 rows=1 width=8) (actual time=1.421..1.422 rows=1 loops=1) Output: sum(CASE WHEN (mod(c1, 4) = 0) THEN 1 ELSE 2 END) -> Foreign Scan on public.ft1 (cost=100.00..141.00 rows=1000 width=4) (actual time=0.694..1.366 rows=822 loops=1) Output: c1 Remote SQL: SELECT "C 1" FROM "S 1"."T 1" Planning Time: 1.531 ms Execution Time: 3.901 ms (7 rows) contrib_regression=# EXPLAIN (ANALYZE, VERBOSE) SELECT * FROM ft1 WHERE c1 > (CASE WHEN mod(c1, 4) = 0 THEN 1 ELSE 100 END); QUERY PLAN - Foreign Scan on public.ft1 (cost=100.00..148.48 rows=333 width=47) (actual time=0.763..3.003 rows=762 loops=1) Output: c1, c2, c3, c4, c5, c6, c7, c8 Filter: (ft1.c1 > CASE WHEN (mod(ft1.c1, 4) = 0) THEN 1 ELSE 100 END) Rows Removed by Filter: 60 Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" Planning Time: 0.584 ms Execution Time: 3.392 ms (7 rows) The attached patch adds push down of CASE WHEN clauses. 
Queries above have the following plans when this patch is applied: contrib_regression=# EXPLAIN (ANALYZE, VERBOSE) SELECT sum(CASE WHEN mod(c1, 4) = 0 THEN 1 ELSE 2 END) FROM ft1; QUERY PLAN -- Foreign Scan (cost=107.50..128.53 rows=1 width=8) (actual time=2.022..2.024 rows=1 loops=1) Output: (sum(CASE WHEN (mod(c1, 4) = 0) THEN 1 ELSE 2 END)) Relations: Aggregate on (public.ft1) Remote SQL: SELECT sum(CASE WHEN (mod("C 1", 4) = 0) THEN 1 ELSE 2 END) FROM "S 1"."T 1" Planning Time: 0.252 ms Execution Time: 2.684 ms (6 rows) contrib_regression=# EXPLAIN (ANALYZE, VERBOSE) SELECT * FROM ft1 WHERE c1 > (CASE WHEN mod(c1, 4) = 0 THEN 1 ELSE 100 END); QUERY PLAN -- Foreign Scan on public.ft1 (cost=100.00..135.16 rows=333 width=47) (actual time=1.797..3.463 rows=762 loops=1) Output: c1, c2, c3, c4, c5, c6, c7, c8 Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE (("C 1" > CASE WHEN (mod("C 1", 4) = 0) THEN 1 ELSE 100 END)) Planning Time: 0.745 ms Execution Time: 3.860 ms (5 rows) I don't see a good reason to never push the CASE WHEN clause, but perhaps I'm missing something, any thoughts? 
Best regards, -- Gilles Darold MigOps Inc (http://migops.com) diff --git a/contrib/postgres_fdw/deparse.c b/contrib/postgres_fdw/deparse.c index 31919fda8c..8b8ca91e25 100644 --- a/contrib/postgres_fdw/deparse.c +++ b/contrib/postgres_fdw/deparse.c @@ -185,6 +185,7 @@ static void appendAggOrderBy(List *orderList, List *targetList, static void appendFunctionName(Oid funcid, deparse_expr_cxt *context); static Node *deparseSortGroupClause(Index ref, List *tlist, bool force_colno, deparse_expr_cxt *context); +static void deparseCaseExpr(CaseExpr *node, deparse_expr_cxt *context); /* * Helper functions @@ -796,6 +797,53 @@ foreign_expr_walker(Node *node, state = FDW_COLLATE_UNSAFE; } break; + case T_CaseTestExpr: + { +CaseTestExpr *c = (CaseTestExpr *) node; + +/* + * If the expression has nondefault collation, either it's of a + * non-builtin type, or it reflects folding of a CollateExpr. + * It's unsafe to send to the remote unless it's used in a + * non-collation-sensitive context. + */ +collation = c->collation; +if
Re: [PATCH] Allow CustomScan nodes to signal projection support
Aleksander Alekseev writes: >> I named the flag CUSTOMPATH_SUPPORT_PROJECTION similar to the other >> custom node flags, but this would revert the current logic > This seems to be a typical Kobayashi Maru situation, i.e any choice is > a bad one. I suggest keeping the patch as is and hoping that the > developers of existing extensions read the release notes. Yeah, I concur that's the least bad choice. I got annoyed by the fact that the existing checks of CustomPath flags had several randomly different styles for basically identical tests, and this patch wanted to introduce yet another. I cleaned that up and pushed this. regards, tom lane
Re: visibility map corruption
> On Jul 6, 2021, at 2:27 PM, Peter Geoghegan wrote: > > It looks like amcheck's verify_heapam.c functionality almost catches > bugs like this one. Something for Mark (CC'd) to consider. Does it > matter that we usually "ctx.oldest_xid = ctx.relfrozenxid", and so > usually use pg_class.relfrozenxid as our oldest_xid (and not > ShmemVariableCache->oldestXid)? In other words, could we be doing more > to sanitize ShmemVariableCache->oldestXid, especially when the > relation's pg_class.relfrozenxid happens to be set to a real XID? Thanks, Peter, for drawing my attention to this. I had already been following this thread, but had not yet thought about the problem in terms of amcheck. I will investigate possible solutions in verify_heapam(). — Mark Dilger EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: trivial improvement to system_or_bail
On 2021-Jun-30, Daniel Gustafsson wrote: > + BAIL_OUT("system $_[0] failed: $!\n"); > I wonder if we should take more inspiration from the Perl manual and change it > to "failed to execute" to make it clear that the failure was in executing the > program, not from the program itself? You're right, that's a good distinction to make. I've used this wording. Thanks. > +1 for adding the extra details, but another thing that I've always found > very confusing is just the phrasing of the message itself. It makes > no sense unless (a) you know that "system" is Perl's function for > executing a shell command, (b) you are familiar with Perl's generally > cavalier approach to parentheses, and (c) you are also unbothered by > whether the word "failed" is part of the message text or the command > being complained of. We really need to do something to set off the > shell command's text from the surrounding verbiage a little better. > > I'd prefer something like > > command "pg_ctl start" failed: details here Done that way, thanks for the suggestion. Failures now look like this, respectively: Bailout called. Further testing stopped: failed to execute command "finitdb -D /home/alvherre/Code/pgsql-build/master/src/test/recovery/tmp_check/t_019_replslot_limit_primary_data/pgdata -A trust -N --wal-segsize=1": No such file or directory Bailout called. Further testing stopped: command "initdb -0D /home/alvherre/Code/pgsql-build/master/src/test/recovery/tmp_check/t_019_replslot_limit_primary_data/pgdata -A trust -N --wal-segsize=1" exited with value 1 Bailout called. Further testing stopped: command "initdb -0D /home/alvherre/Code/pgsql-build/master/src/test/recovery/tmp_check/t_019_replslot_limit_primary_data/pgdata -A trust -N --wal-segsize=1" died with signal 11 Previously it was just Bailout called. 
Further testing stopped: system initdb failed -- Álvaro Herrera Valdivia, Chile — https://www.EnterpriseDB.com/ "I must say, I am absolutely impressed with what pgsql's implementation of VALUES allows me to do. It's kind of ridiculous how much "work" goes away in my code. Too bad I can't do this at work (Oracle 8/9)." (Tom Allison) http://archives.postgresql.org/pgsql-general/2007-06/msg00016.php
Re: visibility map corruption
On Tue, Jul 6, 2021 at 11:57 AM Tom Lane wrote: > Peter Geoghegan writes: > > ... We should just carry forward the original oldestXid. > > Yup. It's a bit silly that we recognized the need to do that > for oldestMultiXid yet not for oldestXid. True. But at the same time it somehow doesn't seem silly at all. IME some of the most devious bugs evade detection by hiding in plain sight. It looks like amcheck's verify_heapam.c functionality almost catches bugs like this one. Something for Mark (CC'd) to consider. Does it matter that we usually "ctx.oldest_xid = ctx.relfrozenxid", and so usually use pg_class.relfrozenxid as our oldest_xid (and not ShmemVariableCache->oldestXid)? In other words, could we be doing more to sanitize ShmemVariableCache->oldestXid, especially when the relation's pg_class.relfrozenxid happens to be set to a real XID? > BTW, is it really necessary for copy_xact_xlog_xid to invoke pg_resetwal > so many times? Why can't we pass all of the update-this options in one > call? No opinion here. > Who's going to do the legwork on this? Can Bruce take care of committing the patch for this? Bruce? This should definitely be in the next point release IMV. -- Peter Geoghegan
Re: [PATCH v3 1/1] Fix detection of preadv/pwritev support for OSX.
James Hilliard writes: > On Tue, Jul 6, 2021 at 2:34 PM Tom Lane wrote: >> As far as I can tell, the only way to really deal with #2 is to >> perform a runtime dlsym() probe to see whether pwritev exists, and >> then fall back to our src/port/ implementation if not. This does >> not look particularly hard (especially since the lookup code only >> needs to work on macOS), though I admit I've not tried to code it. > Maybe we just need to do something like this?: > https://lists.gnu.org/archive/html/qemu-devel/2021-01/msg06499.html Hm. I don't trust __builtin_available too much --- does it exist on really old macOS? dlsym() seems less likely to create problems in that respect. Of course, if there are scenarios where __builtin_available knows that the symbol is available but doesn't work, then we might have to use it. In any case, I don't like their superstructure around the probe. What I'd prefer is a function pointer that initially points to the probe code, and then we change it to point at either pwritev or the pg_pwritev emulation. Certainly failing with ENOSYS is exactly what *not* to do. regards, tom lane
Re: [PATCH v3 1/1] Fix detection of preadv/pwritev support for OSX.
On Tue, Jul 6, 2021 at 2:34 PM Tom Lane wrote: > > Peter Eisentraut writes: > > I think this change is perfectly appropriate (modulo some small cleanups). > > I think there are a couple of issues here. > > 1. People who are already using MACOSX_DEPLOYMENT_TARGET to control > their builds would like to keep on doing so, but the AC_CHECK_FUNCS > probe doesn't work with that. James' latest patch seems like a > reasonable fix for that (it's a lot less invasive than where we > started from). There is a worry about side-effects on other > platforms, but I don't see an answer to that that's better than > "throw it on the buildfarm and see if anything breaks". > > However ... > > 2. We'd really like to use preadv/pwritev where available. I > maintain that MACOSX_DEPLOYMENT_TARGET is not only not the right > approach to that, but it's actually counterproductive. It forces > you to build for the lowest common denominator, ie the oldest macOS > release you want to support. Even when running on a release that > has pwritev, your build will never use it. > > As far as I can tell, the only way to really deal with #2 is to > perform a runtime dlsym() probe to see whether pwritev exists, and > then fall back to our src/port/ implementation if not. This does > not look particularly hard (especially since the lookup code only > needs to work on macOS), though I admit I've not tried to code it. Maybe we just need to do something like this?: https://lists.gnu.org/archive/html/qemu-devel/2021-01/msg06499.html > > What's unclear to me at the moment is whether #1 and #2 interact, > ie is there still any point in changing configure's probes if > we put in a runtime check on Darwin? I think that we might want > to pay no attention to what the available header files say about > pwritev, as long as we can get the correct 'struct iovec' > declaration from them. > > regards, tom lane
Re: rand48 replacement
Hello Yura, However, I'm not enthusiastic about combining two methods depending on the range, the function looks complex enough without that, so I would suggest not to take this option. Also, the decision process adds to the average cost, which is undesirable. Given 99.99% cases will be in the likely case, branch predictor should eliminate decision cost. Hmmm. ISTM that a branch predictor should predict that unknown < small should probably be false, so a hint should be given that it is really true. And as Dean Rasheed correctly mentioned, mask method will have really bad pattern for branch predictor if range is not just below or equal to power of two. On average the bitmask is the better unbiased method, if the online figures are to be trusted. Also, as already said, I do not really want to add code complexity, especially to get lower average performance, and especially with code like "threshold = - range % range", where both variables are unsigned, I have a headache just looking at it:-) And __builtin_clzl is not a free lunch either, it has latency 3-4 cycles on modern processors. Well, % is not cheap either. Well, probably it could run in parallel with some part of xoroshiro, but it depends on how the compiler will optimize this function. I would certainly select the unbiased multiply method if we want a u32 range variant. There could be two functions. Yep, but do we need them? Who is likely to want 32-bit pseudo-random ints in a range? pgbench needs 64 bits. So I'm still inclined to just keep the faster-on-average bitmask method, despite that it may be slower for some ranges. The average cost for the worst case in PRNG calls is, if I'm not mistaken: 1 * 0.5 + 2 * 0.25 + 3 * 0.125 + ... ~ 2 which does not look too bad. -- Fabien.
Re: [PATCH v3 1/1] Fix detection of preadv/pwritev support for OSX.
Peter Eisentraut writes: > I think this change is perfectly appropriate (modulo some small cleanups). I think there are a couple of issues here. 1. People who are already using MACOSX_DEPLOYMENT_TARGET to control their builds would like to keep on doing so, but the AC_CHECK_FUNCS probe doesn't work with that. James' latest patch seems like a reasonable fix for that (it's a lot less invasive than where we started from). There is a worry about side-effects on other platforms, but I don't see an answer to that that's better than "throw it on the buildfarm and see if anything breaks". However ... 2. We'd really like to use preadv/pwritev where available. I maintain that MACOSX_DEPLOYMENT_TARGET is not only not the right approach to that, but it's actually counterproductive. It forces you to build for the lowest common denominator, ie the oldest macOS release you want to support. Even when running on a release that has pwritev, your build will never use it. As far as I can tell, the only way to really deal with #2 is to perform a runtime dlsym() probe to see whether pwritev exists, and then fall back to our src/port/ implementation if not. This does not look particularly hard (especially since the lookup code only needs to work on macOS), though I admit I've not tried to code it. What's unclear to me at the moment is whether #1 and #2 interact, ie is there still any point in changing configure's probes if we put in a runtime check on Darwin? I think that we might want to pay no attention to what the available header files say about pwritev, as long as we can get the correct 'struct iovec' declaration from them. regards, tom lane
Re: Using COPY FREEZE in pgbench
On Sun, 4 Jul 2021 at 09:32, Tatsuo Ishii wrote: > > >> So overall gain by the patch is around 15%, whereas the last test > >> before the commit was 14%. It seems the patch is still beneficial > >> after the commit. > > > > Yes, that's good! > > Yeah! > I tested this with -s100 and got similar results, but with -s1000 it was more like 1.5x faster: master: done in 111.33 s (drop tables 0.00 s, create tables 0.01 s, client-side generate 52.45 s, vacuum 32.30 s, primary keys 26.58 s) patch: done in 74.04 s (drop tables 0.46 s, create tables 0.04 s, client-side generate 51.81 s, vacuum 2.11 s, primary keys 19.63 s) Nice! Regards, Dean
Re: Hook for extensible parsing.
On Sat, Jun 12, 2021 at 4:29 AM Julien Rouhaud wrote: > I'd like to propose an alternative approach, which is to allow multiple > parsers > to coexist, and let third-party parsers optionally fall back on the core > parsers. I'm sending this now as a follow-up of [1] and to avoid duplicated > efforts, as multiple people are interested in that topic. The patches all build properly and pass all regression tests. > pg_parse_query() will instruct plugins to parse a query at a time. They're > free to ignore that mode if they want to implement the 3rd mode. If so, they > should either return multiple RawStmt, a single RawStmt with a 0 or > strlen(query_string) stmt_len, or error out. Otherwise, they will implement > either mode 1 or 2, and they should always return a List containing a single > RawStmt with properly set stmt_len, even if the underlying statement is NULL. > This is required to properly skip valid strings that don't contain a > statements, and pg_parse_query() will skip RawStmt that don't contain an > underlying statement. Wouldn't we want to only loop through the individual statements if parser_hook exists? The current patch seems to go through the new code path regardless of the hook being grabbed.
Re: proposal - log_full_scan
> On 6 Jul 2021, at 18:14, Pavel Stehule wrote: > I thought about it more, and sometimes bitmap index scans are problematic > too, index scans in nested loops can be a problem too. Right. Depending on the circumstances, pretty much anything in a plan can be something deemed problematic in some production setting. > Unfortunately, this idea is not well prepared. My patch was a proof of concept - > but I think it is not a good starting point. Maybe it needs some tracing API > on executor level and some tool like "perf top", but for the executor. Post- > execution analysis is not a good direction with big overhead, and mainly it > is not friendly in critical situations. I need some handy tool like perf, but > for executor nodes. I don't know how to do it effectively. These are hot codepaths, so adding tracing instrumentation which collects enough information to be useful, and which can be drained fast enough to not be a resource problem, is tricky. > Thank you for your review and for your time, but I think it is better to > remove this patch from the commitfest. I have no idea how to design this feature > well :-/ No worries, I hope we see an updated approach at some time. In the meantime I'm marking this patch Returned with Feedback. -- Daniel Gustafsson https://vmware.com/
Re: prion failed with ERROR: missing chunk number 0 for toast value 14334 in pg_toast_2619
On Wed, Jun 30, 2021 at 03:29:41PM +0900, Michael Paquier wrote: > On Tue, May 18, 2021 at 01:04:18PM +0200, Drouvot, Bertrand wrote: > > On 5/17/21 8:56 PM, Andres Freund wrote: > >> On 2021-05-17 20:14:40 +0200, Drouvot, Bertrand wrote: > >>> I was also wondering if: > >>> > >>> * We should keep the old behavior in case pg_resetwal -x is being used > >>> without -u? (The proposed patch does not set an arbitrary oldestXID > >>> anymore in case -x is used) > >> I don't think we should. I don't see anything in the old behaviour worth > >> maintaining. > > So, pg_resetwal logic with the oldest XID assignment is causing some > problem here. This open item is opened for some time now and it is > idle for a couple of weeks. It looks that we have some solution > drafted, to be able to move forward, with the following things (no > patches yet): > - More robustness safety checks in procarray.c. > - A rework of oldestXid in pg_resetwal. > > Is there somebody working on that? Bertrand sent a patch which is awaiting review. This allows both of Tom's reproducers to pass pg_upgrade (with assertions and delays). http://cfbot.cputube.org/bertrand-drouvot.html I re-arranged the pg_upgrade output of that patch: it was in the middle of the two halves: "Setting next transaction ID and epoch for new cluster" -- Justin >From 30204d85005d6288d2dc93bcd28ff4fb2454a703 Mon Sep 17 00:00:00 2001 From: Tom Lane Date: Sun, 16 May 2021 16:23:02 -0400 Subject: [PATCH 1/3] prion failed with ERROR: missing chunk number 0 for toast value 14334 in pg_toast_2619 It also seems like some assertions in procarray.c would be a good idea. With the attached patch, we get through core regression just fine, but the pg_upgrade test fails immediately after the "Resetting WAL archives" step. 
---
 src/backend/storage/ipc/procarray.c | 20
 1 file changed, 20 insertions(+)

diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
index 4c91e721d0..324e105c59 100644
--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -2486,6 +2486,15 @@ GetSnapshotData(Snapshot snapshot)
 											oldestfxid);
 		/* accurate value known */
 		GlobalVisTempRels.maybe_needed = GlobalVisTempRels.definitely_needed;
+
+		/* Do basic sanity check on these XIDs */
+		Assert(FullTransactionIdPrecedesOrEquals(GlobalVisSharedRels.maybe_needed,
+												 GlobalVisSharedRels.definitely_needed));
+		Assert(FullTransactionIdPrecedesOrEquals(GlobalVisCatalogRels.maybe_needed,
+												 GlobalVisCatalogRels.definitely_needed));
+		Assert(FullTransactionIdPrecedesOrEquals(GlobalVisDataRels.maybe_needed,
+												 GlobalVisDataRels.definitely_needed));
+		/* not much point in checking GlobalVisTempRels, given the above */
 	}

 	RecentXmin = xmin;
@@ -4020,6 +4029,8 @@ GlobalVisTestFor(Relation rel)
 	Assert(FullTransactionIdIsValid(state->definitely_needed) &&
 		   FullTransactionIdIsValid(state->maybe_needed));
+	Assert(FullTransactionIdPrecedesOrEquals(state->maybe_needed,
+											 state->definitely_needed));

 	return state;
 }
@@ -4085,6 +4096,15 @@ GlobalVisUpdateApply(ComputeXidHorizonsResult *horizons)
 											GlobalVisDataRels.definitely_needed);
 	GlobalVisTempRels.definitely_needed = GlobalVisTempRels.maybe_needed;

+	/* Do basic sanity check on these XIDs */
+	Assert(FullTransactionIdPrecedesOrEquals(GlobalVisSharedRels.maybe_needed,
+											 GlobalVisSharedRels.definitely_needed));
+	Assert(FullTransactionIdPrecedesOrEquals(GlobalVisCatalogRels.maybe_needed,
+											 GlobalVisCatalogRels.definitely_needed));
+	Assert(FullTransactionIdPrecedesOrEquals(GlobalVisDataRels.maybe_needed,
+											 GlobalVisDataRels.definitely_needed));
+	/* not much point in checking GlobalVisTempRels, given the above */
+
 	ComputeXidHorizonsResultLastXmin = RecentXmin;
 }
--
2.17.0

From 544a88b1651b0db177d5883219c7ae75f1100f7b Mon Sep 17 00:00:00 2001
From: "Drouvot, Bertrand"
Date: Tue, 18 May 2021 13:26:38 +0200
Subject: [PATCH 2/3] pg_upgrade can result in early wraparound on databases with high transaction load
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

On 5/4/21 10:17 AM, Drouvot, Bertrand wrote:

Copy/pasting Andres' feedback (thanks, Andres, for this feedback) on those questions from another thread [1].

> I was also wondering if:
>
> * We should keep the old behavior in case pg_resetwal -x is being used
>   without -u? (The proposed patch does not set an arbitrary oldestXID
>   anymore in case -x is used)

Andres: I don't think we should. I don't see anything in the old behaviour worth maintaining.

> * We should ensure that the xid provided with -x or -u is
>   >= FirstNormalTransactionId (currently the only check is that it is
>   != 0)?

Andres: Applying TransactionIdIsNormal() seems like a good idea.

=> I am attaching a new version that makes use of TransactionIdIsNormal() checks.
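The TransactionIdIsNormal() check agreed on above amounts to requiring the supplied XID to be at least FirstNormalTransactionId, since PostgreSQL reserves XIDs 0-2 (invalid, bootstrap, frozen). A minimal standalone sketch — the constants mirror transam.h, but this is not the real pg_resetwal code:

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t TransactionId;

/* Reserved XIDs, as in src/include/access/transam.h */
#define InvalidTransactionId		((TransactionId) 0)
#define BootstrapTransactionId		((TransactionId) 1)
#define FrozenTransactionId			((TransactionId) 2)
#define FirstNormalTransactionId	((TransactionId) 3)

/* A "normal" XID is one that can belong to an ordinary transaction. */
static int
transaction_id_is_normal(TransactionId xid)
{
	return xid >= FirstNormalTransactionId;
}
```

With this check, `pg_resetwal -x 1` or `-u 2` would be rejected instead of silently accepted as merely nonzero.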
Re: visibility map corruption
Peter Geoghegan writes: > ... We should just carry forward the original oldestXid. Yup. It's a bit silly that we recognized the need to do that for oldestMultiXid yet not for oldestXid. BTW, is it really necessary for copy_xact_xlog_xid to invoke pg_resetwal so many times? Why can't we pass all of the update-this options in one call? Who's going to do the legwork on this? regards, tom lane
Re: visibility map corruption
On Tue, Jul 6, 2021 at 10:58 AM Bruce Momjian wrote: > Well, pg_upgrade corruptions are rare, but so is modifying > autovacuum_freeze_max_age. If we have a corruption and we know > autovacuum_freeze_max_age was modified, odds are that is the cause. My point is that there isn't necessarily that much use in trying to determine what really happened here. It would be nice to know for sure, but it shouldn't affect what we do about the bug. In a way the situation with the bug is simple. Clearly Tom wasn't thinking of pg_upgrade when he wrote the relevant pg_resetwal code that sets oldestXid, because pg_upgrade didn't exist at the time. He was thinking of restoring the database to a relatively sane state in the event of some kind of corruption, with all of the uncertainty that goes with that. Nobody noticed that pg_upgrade gets this same behavior until much more recently. Now that we see the problem laid out, there isn't much to think about that will affect the response to the issue. At least as far as I can tell. We know that pg_upgrade uses pg_resetwal's -x flag in a context where there is no reason at all to think that the database is corrupt -- Tom can't have anticipated that all those years ago. It's easy to see that the behavior is wrong for pg_upgrade, and it's very hard to imagine any way in which it might have accidentally made some sense all along. We should just carry forward the original oldestXid. -- Peter Geoghegan
Re: [PATCH] Pull general SASL framework out of SCRAM
On Mon, 2021-07-05 at 17:17 +0900, Michael Paquier wrote:
> On Wed, Jun 30, 2021 at 10:30:12PM +, Jacob Champion wrote:
> > Done in v3, with a second patch for the code motion.
>
> I have gone through that, tweaking the documentation you have added as
> that's the meat of the patch, reworking a bit the declarations of the
> callbacks (no need for several typedefs here) and doing some small
> format changes to make the indentation happy. And that looks pretty
> good.

Looks very good, thanks! A few comments on the docs changes:

> + * Output parameters:
> + *
> + *	buf: A StringInfo buffer that the callback should populate with
> + *		supported mechanism names. The names are appended into this
> + *		StringInfo, separated by '\0' bytes.

Each name must be null-terminated, not just null-separated. That way the list of names ends with an empty string:

    name-one\0  <- added by the mechanism
    name-two\0  <- added by the mechanism
    \0          <- added by the framework

The way it's worded now, I could see some implementers failing to terminate the final name because the framework adds a trailing null already -- but the framework is terminating the list, not the final name.

> + * init()
> + *
> + *	Initializes mechanism-specific state for a connection. This
> + *	callback must return a pointer to its allocated state, which will
> + *	be passed as-is as the first argument to the other callbacks.
> + *	free() is called to release any state resources.

Maybe say "The free() callback is called" to differentiate it from standard free()?

> It is a bit sad that the SCRAM part cannot be completely
> unplugged from the auth part, because of the call to the free function
> and the HBA checks, but adding more wrappers to accommodate that
> is not really worth it.

Yeah. I think that additional improvements/refactoring here will come naturally if clients are ever allowed to negotiate SASL mechanisms in the future. Doesn't need to happen now.
> - if (!pg_fe_scram_channel_bound(conn->sasl_state)) > + if (!conn->sasl || > !conn->sasl->channel_bound(conn->sasl_state)) > conn->sasl should be set in this code path. This style is safer. It's possible for conn->sasl to be NULL here, say if the client has channel_binding=require but connects as a user with an MD5 secret. The SCRAM TAP tests have one such case. > The top comment of scram_init() still mentioned > pg_be_scram_get_mechanisms(), while it should be > scram_get_mechanisms(). > > PG_MAX_SASL_MESSAGE_LENGTH can stay within auth-sasl.c. Looks good to me. > > - I don't think it's legal for a client to refuse a challenge from the > > server without aborting the exchange, so we should probably check to > > make sure that client responses are non-NULL in the success case. > > Hmm. Does the RFCs tell us anything about that? Just in general terms: >Each authentication exchange consists of a message from the client to >the server requesting authentication via a particular mechanism, >followed by one or more pairs of challenges from the server and >responses from the client, followed by a message from the server >indicating the outcome of the authentication exchange. (Note: >exchanges may also be aborted as discussed in Section 3.5.) So a challenge must be met with a response, or the exchange must be aborted. (And I don't think our protocol implementation provides a client abort message; if something goes wrong, we just tear down the connection.) Thanks, --Jacob
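The termination convention discussed above — each mechanism name carries its own NUL, and the framework then appends one more NUL so the list ends with an empty string — can be sketched in plain C. This is a simplified illustration using a raw buffer; the real code uses PostgreSQL's StringInfo API:

```c
#include <assert.h>
#include <string.h>

/* Mechanism side: append one name, *including* its terminating NUL.
 * Returns the new list length. */
static size_t
append_mechanism(char *buf, size_t len, const char *name)
{
	size_t		n = strlen(name) + 1;	/* keep the name's own terminator */

	memcpy(buf + len, name, n);
	return len + n;
}

/* Framework side: terminate the whole list with an empty string. */
static size_t
terminate_list(char *buf, size_t len)
{
	buf[len] = '\0';
	return len + 1;
}

/* Walk NUL-terminated names until the empty string that ends the list. */
static int
count_mechanisms(const char *buf)
{
	int			count = 0;

	while (*buf != '\0')
	{
		count++;
		buf += strlen(buf) + 1;
	}
	return count;
}
```

A reader (like count_mechanisms here) stops at the empty string, which only exists if the framework's extra NUL terminates the list rather than the final name.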
Re: Pipeline mode and PQpipelineSync()
On 2021-Jul-06, Boris Kolpackov wrote: > Alvaro Herrera writes: > > > Ah, yes it does. I can reproduce this now. I thought PQconsumeInput > > was sufficient, but it's not: you have to have the PQgetResult in there > > too. Looking ... > > Any progress on fixing this? Yeah ... the problem as I understand it is that the state transition in libpq when the connection is in pipeline aborted state is bogus. I'll post a patch in a bit. -- Álvaro Herrera Valdivia, Chile — https://www.EnterpriseDB.com/ #error "Operator lives in the wrong universe" ("Use of cookies in real-time system development", M. Gleixner, M. Mc Guire)
Re: Pipeline mode and PQpipelineSync()
On 2021-Jul-06, Boris Kolpackov wrote:

> Alvaro Herrera writes:
>
> > Ah, yes it does. I can reproduce this now. I thought PQconsumeInput
> > was sufficient, but it's not: you have to have the PQgetResult in there
> > too. Looking ...
>
> Any progress on fixing this?

Can you please try with this patch?

Thanks

--
Álvaro Herrera 39°49'30"S 73°17'W — https://www.EnterpriseDB.com/

From c84b66821249631ba1654b22866ca54c49f238c4 Mon Sep 17 00:00:00 2001
From: Alvaro Herrera
Date: Tue, 6 Jul 2021 12:46:42 -0400
Subject: [PATCH] fix libpq state machine

---
 src/interfaces/libpq/fe-exec.c | 46 ++
 1 file changed, 35 insertions(+), 11 deletions(-)

diff --git a/src/interfaces/libpq/fe-exec.c b/src/interfaces/libpq/fe-exec.c
index b13ddab393..1295a417c1 100644
--- a/src/interfaces/libpq/fe-exec.c
+++ b/src/interfaces/libpq/fe-exec.c
@@ -1223,7 +1223,8 @@ pqAllocCmdQueueEntry(PGconn *conn)

 /*
  * pqAppendCmdQueueEntry
- *		Append a caller-allocated command queue entry to the queue.
+ *		Append a caller-allocated command queue entry to the queue, and update
+ *		conn->asyncStatus as needed to account for it.
  *
  * The query itself must already have been put in the output buffer by the
  * caller.
@@ -1239,6 +1240,33 @@ pqAppendCmdQueueEntry(PGconn *conn, PGcmdQueueEntry *entry)
 		conn->cmd_queue_tail->next = entry;

 	conn->cmd_queue_tail = entry;
+
+	switch (conn->pipelineStatus)
+	{
+		case PQ_PIPELINE_OFF:
+		case PQ_PIPELINE_ON:
+			/*
+			 * If there's a result ready to be consumed, let it be so (that
+			 * is, don't change away from READY or READY_MORE); otherwise we
+			 * wait for some result to arrive from the server.
+			 */
+			if (conn->asyncStatus == PGASYNC_IDLE)
+				conn->asyncStatus = PGASYNC_BUSY;
+			break;
+
+			/*
+			 * If in aborted pipeline state, we don't expect anything from the
+			 * server, so we have to let PQgetResult consume from the aborted
+			 * queue right away.
+			 */
+		case PQ_PIPELINE_ABORTED:
+			if (conn->asyncStatus == PGASYNC_IDLE)
+			{
+				resetPQExpBuffer(&conn->errorMessage);
+				pqPipelineProcessQueue(conn);
+			}
+			break;
+	}
 }

 /*
@@ -1375,7 +1403,6 @@ PQsendQueryInternal(PGconn *conn, const char *query, bool newQuery)

 	/* OK, it's launched! */
 	pqAppendCmdQueueEntry(conn, entry);
-	conn->asyncStatus = PGASYNC_BUSY;
 	return 1;

 sendFailed:
@@ -1510,10 +1537,6 @@ PQsendPrepare(PGconn *conn,
 	/* if insufficient memory, query just winds up NULL */
 	entry->query = strdup(query);

-	pqAppendCmdQueueEntry(conn, entry);
-
-	conn->asyncStatus = PGASYNC_BUSY;
-
 	/*
 	 * Give the data a push (in pipeline mode, only if we're past the size
 	 * threshold). In nonblock mode, don't complain if we're unable to send
@@ -1522,6 +1545,8 @@ PQsendPrepare(PGconn *conn,
 	if (pqPipelineFlush(conn) < 0)
 		goto sendFailed;

+	pqAppendCmdQueueEntry(conn, entry);
+
 	return 1;

 sendFailed:
@@ -1815,7 +1840,7 @@ PQsendQueryGuts(PGconn *conn,

 	/* OK, it's launched! */
 	pqAppendCmdQueueEntry(conn, entry);
-	conn->asyncStatus = PGASYNC_BUSY;
+
 	return 1;

 sendFailed:
@@ -2445,7 +2470,7 @@ PQsendDescribe(PGconn *conn, char desc_type, const char *desc_target)

 	/* OK, it's launched! */
 	pqAppendCmdQueueEntry(conn, entry);
-	conn->asyncStatus = PGASYNC_BUSY;
+
 	return 1;

 sendFailed:
@@ -3072,15 +3097,14 @@ PQpipelineSync(PGconn *conn)
 		pqPutMsgEnd(conn) < 0)
 		goto sendFailed;

-	pqAppendCmdQueueEntry(conn, entry);
-
 	/*
 	 * Give the data a push. In nonblock mode, don't complain if we're unable
 	 * to send it all; PQgetResult() will do any additional flushing needed.
 	 */
 	if (PQflush(conn) < 0)
 		goto sendFailed;
-	conn->asyncStatus = PGASYNC_BUSY;
+
+	pqAppendCmdQueueEntry(conn, entry);

 	return 1;
--
2.20.1
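The core of the state transition in the patch can be modeled outside libpq: appending a queue entry normally flips an idle connection to busy (waiting on the server), but in an aborted pipeline no reply will arrive, so the queue must be consumed immediately. A toy model with an invented Conn struct — not libpq's internals:

```c
#include <assert.h>

typedef enum { PIPELINE_OFF, PIPELINE_ON, PIPELINE_ABORTED } PipelineStatus;
typedef enum { ASYNC_IDLE, ASYNC_BUSY, ASYNC_READY } AsyncStatus;

typedef struct
{
	PipelineStatus pipeline;
	AsyncStatus async;
	int			queue_processed;	/* counts immediate queue processing */
} Conn;

/* Model of what pqAppendCmdQueueEntry does after linking the entry in. */
static void
append_cmd_queue_entry(Conn *conn)
{
	switch (conn->pipeline)
	{
		case PIPELINE_OFF:
		case PIPELINE_ON:
			/* Wait for a server reply, unless a result is already ready. */
			if (conn->async == ASYNC_IDLE)
				conn->async = ASYNC_BUSY;
			break;

		case PIPELINE_ABORTED:
			/* No reply is coming; consume the aborted queue right away. */
			if (conn->async == ASYNC_IDLE)
				conn->queue_processed++;
			break;
	}
}
```

The bug being fixed was exactly this missing third arm: in the aborted state, waiting for the server deadlocks the client, since nothing will ever arrive to wake it.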
Re: visibility map corruption
On Tue, Jul 6, 2021 at 10:32:24AM -0700, Peter Geoghegan wrote: > On Tue, Jul 6, 2021 at 10:27 AM Bruce Momjian wrote: > > OK, this is confirmation that the pg_resetwal bug, and its use by > > pg_upgrade, is a serious issue that needs to be addressed. I am > > prepared to work on it now. > > To be clear, I'm not 100% sure that this is related to the pg_upgrade > + "pg_resetwal sets oldestXid to an invented value" issue. I am sure > that that is a serious issue that needs to be addressed sooner rather > than later, though. Well, pg_upgrade corruptions are rare, but so is modifying autovacuum_freeze_max_age. If we have a corruption and we know autovacuum_freeze_max_age was modified, odds are that is the cause. -- Bruce Momjian https://momjian.us EDB https://enterprisedb.com If only the physical world exists, free will is an illusion.
Re: Planning time grows exponentially with levels of nested views
Dean Rasheed writes: > I took a look at this and wasn't able to find any way to break it, and > your argument that it can't really make such rewriter bugs any worse > makes sense. Thanks for looking! > Would it make sense to update the comment prior to copying the subquery? Yeah, I hadn't touched that yet because the question was exactly about whether it's correct or not. I think we can shorten it to * Need a modifiable copy of the subquery to hack on, so that the * RTE can be left unchanged in case we decide below that we can't * pull it up after all. > Out of curiosity, I also tested DML against these deeply nested views > to see how the pull-up code in the rewriter compares in terms of > performance, since it does a very similar job. As expected, it's > O(N^2) as well, but it's about an order of magnitude faster: Oh good. I hadn't thought to look at that angle of things. > ... for a one-line change, that's pretty impressive. Yeah. The problem might get less bad if we get anywhere with the idea I suggested at [1]. If we can reduce the number of RTEs in a view's query, then copying it would get cheaper. Still, not copying it at all is always going to be better. I'll go ahead and push the patch. regards, tom lane [1] https://www.postgresql.org/message-id/697679.1625154303%40sss.pgh.pa.us
Re: [CLOBBER_CACHE]Server crashed with segfault 11 while executing clusterdb
Amul Sul writes: > On Tue, Apr 20, 2021 at 6:59 AM Kyotaro Horiguchi > wrote: >> I don't mind RelationGetSmgr(index)->smgr_rnode alone or >> >member alone and there's not the previous call to >> RelationGetSmgr just above. How about using a temporary variable? >> >> SMgrRelation srel = RelationGetSmgr(index); >> smgrwrite(srel, ...); >> log_newpage(srel->..); > Understood. Used a temporary variable for the place where > RelationGetSmgr() calls are placed too close or in a loop. [ squint... ] Doesn't this risk introducing exactly the sort of cache-clobber hazard we're trying to prevent? That is, the above is not safe unless you are *entirely* certain that there is not and never will be any possibility of a relcache flush before you are done using the temporary variable. Otherwise it can become a dangling pointer. The point of the static-inline function idea was to be cheap enough that it isn't worth worrying about this sort of risky optimization. Given that an smgr function is sure to involve some kernel calls, I doubt it's worth sweating over an extra test-and-branch beforehand. So where I was hoping to get to is that smgr objects are *only* referenced by RelationGetSmgr() calls and nobody ever keeps any other pointers to them across any non-smgr operations. regards, tom lane
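The hazard Tom describes — a cached SMgrRelation pointer going stale across a relcache flush — can be illustrated with a toy model. The types here are invented stand-ins; the real RelationGetSmgr() is a static-inline wrapper around smgropen():

```c
#include <assert.h>
#include <stddef.h>

typedef struct SMgrRelationData { int valid; } SMgrRelationData;
typedef struct RelationData { SMgrRelationData *rd_smgr; } RelationData;

static SMgrRelationData storage = { 1 };

/* The safe pattern: re-resolve the smgr pointer on every access, so a
 * flush between two uses cannot leave us holding a dangling pointer. */
static inline SMgrRelationData *
relation_get_smgr(RelationData *rel)
{
	if (rel->rd_smgr == NULL)
		rel->rd_smgr = &storage;	/* stand-in for smgropen() */
	return rel->rd_smgr;
}

/* A relcache invalidation drops the cached pointer at an arbitrary time. */
static void
simulate_cache_flush(RelationData *rel)
{
	rel->rd_smgr = NULL;
}
```

A temporary `SMgrRelation srel = relation_get_smgr(rel);` held across simulate_cache_flush() would keep pointing at storage the relation no longer owns in the real system — which is why the accessor-per-use style is what the thread settles on.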
Re: visibility map corruption
On Tue, Jul 6, 2021 at 10:27 AM Bruce Momjian wrote: > OK, this is confirmation that the pg_resetwal bug, and its use by > pg_upgrade, is a serious issue that needs to be addressed. I am > prepared to work on it now. To be clear, I'm not 100% sure that this is related to the pg_upgrade + "pg_resetwal sets oldestXid to an invented value" issue. I am sure that that is a serious issue that needs to be addressed sooner rather than later, though. -- Peter Geoghegan
Re: Planning time grows exponentially with levels of nested views
On Sun, 18 Apr 2021 at 21:42, Tom Lane wrote:
>
> > If multiple references are actually possible then this'd break it.
>
> I think this patch doesn't make things any worse for such a case though.
> If we re-introduced such a bug, the result would be an immediate null
> pointer crash while trying to process the second reference to a
> flattenable subquery. That's probably better for debuggability than
> what happens now, where we just silently process the duplicate reference.

I took a look at this and wasn't able to find any way to break it, and your argument that it can't really make such rewriter bugs any worse makes sense. Would it make sense to update the comment prior to copying the subquery?

Out of curiosity, I also tested DML against these deeply nested views to see how the pull-up code in the rewriter compares in terms of performance, since it does a very similar job. As expected, it's O(N^2) as well, but it's about an order of magnitude faster (times to run a plain EXPLAIN in ms, with patch):

          | SELECT | INSERT | UPDATE | DELETE
     -----+--------+--------+--------+--------
     v64  |  1.259 |  0.189 |  0.250 |  0.187
     v128 |  5.035 |  0.506 |  0.547 |  0.509
     v256 | 20.393 |  1.633 |  1.607 |  1.638
     v512 | 81.101 |  6.649 |  6.517 |  6.699

Maybe that's not surprising, since there's less to do at that stage. Anyway, it's reassuring to know that it copes OK with this (I've seen some quite deeply nested views in practice, but never that deep).

For comparison, this is what SELECT performance looked like for me without the patch:

          |   SELECT
     -----+----------
     v64  |    9.589
     v128 |   73.292
     v256 |  826.964
     v512 | 7844.419

so, for a one-line change, that's pretty impressive.

Regards, Dean
Re: visibility map corruption
On Sun, Jul 4, 2021 at 10:28:25PM +, Floris Van Nee wrote: > > > > I wonder if it's related to this issue: > > > > https://www.postgresql.org/message- > > id/20210423234256.hwopuftipdmp3...@alap3.anarazel.de > > > > Have you increased autovacuum_freeze_max_age from its default? This > > already sounds like the kind of database where that would make > > sense. > > > > autovacuum_freeze_max_age is increased in our setup indeed (it is > set to 500M). However, we do regularly run manual VACUUM (FREEZE) > on individual tables in the database, including this one. A lot of > tables in the database follow an INSERT-only pattern and since it's > not running v13 yet, this meant that these tables would only rarely > be touched by autovacuum. Autovacuum would sometimes kick in on some > of these tables at the same time at unfortunate moments. Therefore we > have some regular job running that VACUUM (FREEZE)s tables with a xact > age higher than a (low, 10M) threshold ourselves. OK, this is confirmation that the pg_resetwal bug, and its use by pg_upgrade, is a serious issue that needs to be addressed. I am prepared to work on it now. -- Bruce Momjian https://momjian.us EDB https://enterprisedb.com If only the physical world exists, free will is an illusion.
Re: Grammar railroad diagram
Hello Bruce!

You can download the railroad generator to run offline with Java from https://www.bottlecaps.de/rr/download/rr-1.63-java8.zip (link on the Welcome tab of https://www.bottlecaps.de/rr/ui):

    java -jar rr.war -out:Dafny.atg.xhtml grammar.txt

Cheers!

On 6/7/21 18:51, Bruce Momjian wrote:
> On Sat, Jul 3, 2021 at 10:39:02AM +0200, Domingo Alvarez Duarte wrote:
> > I've done an experimental tool to convert bison grammars to a kind of EBNF understood by https://www.bottlecaps.de/rr/ui to generate railroad diagrams; see below the converted 'postgresql-13.3/src/backend/parser/gram.y', with some hand-made changes to allow viewing it at https://www.bottlecaps.de/rr/ui. The order of the rules could be changed for a better view of the railroad diagrams. Copy and paste the EBNF below on the https://www.bottlecaps.de/rr/ui "Edit Grammar" tab, then switch to the "View Diagram" tab.
>
> That is pretty cool. I had trouble figuring out how to get it working, so here are the steps I used:
>
> 1. save my attachment (created by Domingo)
> 2. go to https://www.bottlecaps.de/rr/ui
> 3. select "Edit Grammar"
> 4. choose "Browse" at the bottom
> 5. select the attachment you saved in #1
> 6. choose "Load" at the bottom
> 7. select "View Diagram"
>
> You can even click on the yellow boxes to see the sub-grammar. People have asked for railroad diagrams in the past, and this certainly produces them, and "Options" allows many customizations. I tried downloading as XHTML+SVG and HTML+PNG but got an error:
>
>     HTTP Status 500 – Internal Server Error
>     Type: Exception Report
>     Message: The multi-part request contained parameter data (excluding uploaded files) that exceeded the limit for maxPostSize set on the associated connector
>     Description: The server encountered an unexpected condition that prevented it from fulfilling the request.
>
> It might be nice to download this output and host it on the Postgres website at some point.
Re: Grammar railroad diagram
On Sat, Jul 3, 2021 at 10:39:02AM +0200, Domingo Alvarez Duarte wrote:
> I've done an experimental tool to convert bison grammars to a kind of EBNF
> understood by https://www.bottlecaps.de/rr/ui to generate railroad diagrams; see
> below the converted 'postgresql-13.3/src/backend/parser/gram.y', with some
> hand-made changes to allow viewing it at https://www.bottlecaps.de/rr/ui. The order
> of the rules could be changed for a better view of the railroad diagrams. Copy
> and paste the EBNF below on the https://www.bottlecaps.de/rr/ui "Edit Grammar" tab,
> then switch to the "View Diagram" tab.

That is pretty cool. I had trouble figuring out how to get it working, so here are the steps I used:

1. save my attachment (created by Domingo)
2. go to https://www.bottlecaps.de/rr/ui
3. select "Edit Grammar"
4. choose "Browse" at the bottom
5. select the attachment you saved in #1
6. choose "Load" at the bottom
7. select "View Diagram"

You can even click on the yellow boxes to see the sub-grammar. People have asked for railroad diagrams in the past, and this certainly produces them, and "Options" allows many customizations. I tried downloading as XHTML+SVG and HTML+PNG but got an error:

    HTTP Status 500 – Internal Server Error
    Type: Exception Report
    Message: The multi-part request contained parameter data (excluding uploaded files) that exceeded the limit for maxPostSize set on the associated connector
    Description: The server encountered an unexpected condition that prevented it from fulfilling the request.

It might be nice to download this output and host it on the Postgres website at some point.

--
Bruce Momjian  https://momjian.us
EDB  https://enterprisedb.com

If only the physical world exists, free will is an illusion.
stmtblock ::= stmtmulti

stmtmulti ::= stmtmulti ';' stmt | stmt

stmt ::= AlterEventTrigStmt | AlterCollationStmt | AlterDatabaseStmt | AlterDatabaseSetStmt | AlterDefaultPrivilegesStmt | AlterDomainStmt | AlterEnumStmt | AlterExtensionStmt | AlterExtensionContentsStmt | AlterFdwStmt | AlterForeignServerStmt | AlterForeignTableStmt | AlterFunctionStmt | AlterGroupStmt | AlterObjectDependsStmt | AlterObjectSchemaStmt | AlterOwnerStmt | AlterOperatorStmt | AlterTypeStmt | AlterPolicyStmt | AlterSeqStmt | AlterSystemStmt | AlterTableStmt | AlterTblSpcStmt | AlterCompositeTypeStmt | AlterPublicationStmt | AlterRoleSetStmt | AlterRoleStmt | AlterSubscriptionStmt | AlterStatsStmt | AlterTSConfigurationStmt | AlterTSDictionaryStmt | AlterUserMappingStmt | AnalyzeStmt | CallStmt | CheckPointStmt | ClosePortalStmt | ClusterStmt | CommentStmt | ConstraintsSetStmt | CopyStmt | CreateAmStmt | CreateAsStmt | CreateAssertionStmt | CreateCastStmt | CreateConversionStmt | CreateDomainStmt | CreateExtensionStmt | CreateFdwStmt | CreateForeignServerStmt | CreateForeignTableStmt | CreateFunctionStmt | CreateGroupStmt | CreateMatViewStmt | CreateOpClassStmt | CreateOpFamilyStmt | CreatePublicationStmt | AlterOpFamilyStmt | CreatePolicyStmt | CreatePLangStmt | CreateSchemaStmt | CreateSeqStmt | CreateStmt | CreateSubscriptionStmt | CreateStatsStmt | CreateTableSpaceStmt | CreateTransformStmt | CreateTrigStmt | CreateEventTrigStmt | CreateRoleStmt | CreateUserStmt | CreateUserMappingStmt | CreatedbStmt | DeallocateStmt | DeclareCursorStmt | DefineStmt | DeleteStmt | DiscardStmt | DoStmt | DropCastStmt | DropOpClassStmt | DropOpFamilyStmt | DropOwnedStmt | DropPLangStmt | DropStmt | DropSubscriptionStmt | DropTableSpaceStmt | DropTransformStmt | DropRoleStmt | DropUserMappingStmt | DropdbStmt | ExecuteStmt | ExplainStmt | FetchStmt | GrantStmt | GrantRoleStmt | ImportForeignSchemaStmt | IndexStmt | InsertStmt | ListenStmt | RefreshMatViewStmt | LoadStmt | LockStmt | NotifyStmt | PrepareStmt | ReassignOwnedStmt | ReindexStmt | RemoveAggrStmt | RemoveFuncStmt | RemoveOperStmt | RenameStmt | RevokeStmt | RevokeRoleStmt | RuleStmt | SecLabelStmt | SelectStmt | TransactionStmt | TruncateStmt | UnlistenStmt | UpdateStmt | VacuumStmt | VariableResetStmt | VariableSetStmt | VariableShowStmt | ViewStmt |

CallStmt ::= CALL func_application

CreateRoleStmt ::= CREATE ROLE RoleId opt_with OptRoleList

opt_with ::= WITH | WITH_LA |

OptRoleList ::= OptRoleList CreateOptRoleElem |

AlterOptRoleList ::= AlterOptRoleList AlterOptRoleElem |

AlterOptRoleElem ::= PASSWORD Sconst | PASSWORD NULL_P | ENCRYPTED PASSWORD Sconst | UNENCRYPTED PASSWORD Sconst | INHERIT | CONNECTION LIMIT SignedIconst | VALID UNTIL Sconst | USER role_list | IDENT

CreateOptRoleElem ::= AlterOptRoleElem | SYSID Iconst | ADMIN role_list | ROLE role_list | IN_P ROLE role_list | IN_P GROUP_P role_list

CreateUserStmt ::= CREATE USER RoleId opt_with OptRoleList

AlterRoleStmt ::= ALTER ROLE RoleSpec opt_with AlterOptRoleList | ALTER USER RoleSpec
Re: Delegating superuser tasks to new security roles (Was: Granting control of SUSET gucs to non-superusers)
> On Jul 5, 2021, at 1:50 AM, Andrey Borodin wrote: > > I'm not sure, but maybe we should allow replication role to change > session_replication_role? Thanks, Andrey, for taking a look. Yes, there is certainly some logic to that suggestion. The patch v4-0005 only delegates authority to perform ALTER SYSTEM SET to three roles: pg_database_security, pg_network_security, and pg_host_security. I don't mind expanding this list to include the replication attribute, but I am curious about opinions on the general design. There may be an advantage in keeping the list short. In particular, as the list gets longer, will it get harder to decide which role to associate with each new GUC that gets added? For third-party extensions, will it be harder for them to decide in any principled way which role to assign to each GUC that they add? There are multiple ways to cut up the set of all GUCs. database/host/network is not an entirely clean distinction, and perhaps database/host/network/replication is better, but I'm uncertain. — Mark Dilger EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: logical replication worker accesses catalogs in error context callback
Bharath Rupireddy writes: > How about making the below else if statement and the attname > assignment into a single line? They are falling below the 80 char > limit. > else if (colno > 0 && colno <= list_length(rte->eref->colnames)) > attname = strVal(list_nth(rte->eref->colnames, colno - 1)); Pushed that way. (I think possibly I'd had this code further indented in its first incarnation, thus the extra line breaks.) regards, tom lane
Re: proposal - log_full_scan
Hi

On Tue, Jul 6, 2021 at 16:07, Daniel Gustafsson wrote:
> Looking at this I like the idea in principle, but I'm not convinced that
> auto_explain is the right tool for this. auto_explain is for identifying slow
> queries, and what you are proposing is to identify queries with a certain
> "shape" (for lack of a better term) even if they aren't slow as per the
> log_min_duration setting. If log_min_duration is deemed too crude due to query
> volume then sample_rate is the tool. If sample_rate is also discarded, then
> pg_stat_statements seems a better option.

I don't think pg_stat_statements can be used: a) it doesn't check the execution plan, so this feature could have big overhead compared to the current pg_stat_statements, which works just with the AST; b) pg_stat_statements has one entry per AST — but this can be a problem at the execution plan level, and that is outside the perspective of pg_stat_statements.

> Also, why just sequential scans (apart from it being this specific usecase)?
> If the idea is to track aspects of execution which are deemed slow, then
> tracking for example spills etc would be just as valid. Do you have thoughts
> on that?

Yes, I have thought about it more, and sometimes bitmap index scans are problematic too, and index scans in nested loops can be a problem as well. For my last customer I had to detect queries with a large bitmap index scan. I can do it with a combination of pg_stat_statements and log checking, but this work is not very friendly. My current idea is to have some extension that can trace generally specified executor nodes. Sometimes I want to say: I need to know all queries that do a seq scan over tabx where tuples processed > N.
In other cases it can be interesting to know the queries that use index x for a bitmap index scan.

> That being said, a few comments on the patch:
>
> - (auto_explain_log_min_duration >= 0 && \
> + ((auto_explain_log_min_duration >= 0 || auto_explain_log_seqscan != -1) && \
> Is there a reason to not follow the existing code and check for >= 0?
>
> + DefineCustomIntVariable("auto_explain.log_seqscan",
> It's only a matter of time before another node is proposed for logging, and
> then we'll be stuck adding log_XXXnode GUCs. Is there a more future-proof way
> to do this?
>
> + "Sets the minimum tuples produced by sequantial scans which plans will be logged",
> s/sequantial/sequential/
>
> - es->analyze = (queryDesc->instrument_options && auto_explain_log_analyze);
> + es->analyze = (queryDesc->instrument_options && (auto_explain_log_analyze || auto_explain_log_seqscan != -1));
> Turning on ANALYZE when log_analyze isn't set to True is a no-no IMO.
>
> + * Colllect relations where log_seqscan limit was exceeded
> s/Colllect/Collect/
>
> + if (*relnames.data != '\0')
> + appendStringInfoString(&relnames, ",");
> This should use appendStringInfoChar instead.
>
> + (errmsg("duration: %.3f ms, over limit seqscans: %s, plan:\n%s",
> The "over limit" part is superfluous since it otherwise wouldn't be logged. If
> we're prefixing something wouldn't it be more helpful to include the limit,
> like: "seqscans >= %d tuples returned:". I'm not a fan of "seqscans" but
> spelling it out is also quite verbose and this is grep-able.
>
> Documentation and tests are also missing

Unfortunately, this idea is not well prepared. My patch was a proof of concept — but I think it is not a good starting point. Maybe it needs some tracing API at the executor level and some tool like "perf top", but for the executor. Post-execution analysis is not a good direction — it has big overhead, and mainly it is not friendly in critical situations. I need some handy tool like perf, but for executor nodes.
I don't know how to do it effectively. Thank you for your review and your time, but I think it is better to remove this patch from the commitfest. I have no idea how to design this feature well :-/ Regards Pavel > -- > Daniel Gustafsson https://vmware.com/
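For reference, the post-execution collection step that the patch sketched — walk the finished plan tree and gather the relations whose sequential scans returned more tuples than a configured limit — looks roughly like this. The structures are invented for illustration, not the PostgreSQL executor API:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef enum { NODE_SEQSCAN, NODE_OTHER } NodeType;

typedef struct PlanNode
{
	NodeType	type;
	const char *relname;		/* only meaningful for seq scans */
	long		ntuples;		/* instrumentation: tuples produced */
	struct PlanNode *left, *right;
} PlanNode;

/* Recursively collect names of seq scans over the limit into names[],
 * starting at index n; returns the new count. */
static int
collect_large_seqscans(const PlanNode *node, long limit,
					   const char **names, int n)
{
	if (node == NULL)
		return n;
	if (node->type == NODE_SEQSCAN && node->ntuples > limit)
		names[n++] = node->relname;
	n = collect_large_seqscans(node->left, limit, names, n);
	n = collect_large_seqscans(node->right, limit, names, n);
	return n;
}
```

The same walk generalizes to the broader idea discussed above — trigger on any node type and threshold, not just sequential scans — which is part of why a single log_seqscan GUC felt too narrow.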
Re: Refactor "mutually exclusive options" error reporting code in parse_subscription_options
On 2021-Jul-06, Bharath Rupireddy wrote: > Thanks, Amit. I'm posting the 0002 patch which removes extra ereport > calls using local variables. Please review it. I looked at this the other day and I'm not sure I like it very much. It's not making anything any simpler, it's barely saving two lines of code. I think we can do without this change. -- Álvaro Herrera Valdivia, Chile — https://www.EnterpriseDB.com/
Re: [PATCH] Use optimized single-datum tuplesort in ExecSort
On Tue, Jul 6, 2021 at 11:03 AM Ronan Dunklau wrote: > > Thank you for the review, I will address those shortly, but will answer some > questions in the meantime. > > > > First, the changes are lacking any explanatory comments. Probably we > > > should follow how nodeAgg does this and add both comments to the > > > ExecSort function header as well as specific comments above the "if" > > > around the new tuplesort_begin_datum explaining the specific > > > conditions that are required for the optimization to be useful and > > > safe. > > Done, since I lifted the restrictions following your questions, there isn't > much left to comment. (see below) > > > > > > > That leads to a question I had: I don't follow why bounded mode (when > > > using byval) needs to be excluded. Comments should be added if there's > > > a good reason (as noted above), but maybe it's a case we can handle > > > safely? > > I had test failures when trying to move the Datum around when performing a > bounded sort, but did not look into it at first. > > Now I've looked into it, and the switch to a heapsort when using bounded mode > just unconditionally tried to free a tuple that was never there to begin with. > So if the SortTuple does not contain an actual tuple, but only a single datum, > do not do that. > > I've updated the patch to fix this and enable the optimization in the case of > bounded sort. Awesome. > > > A second question: at first glance it's intuitively the case we might > > > not be able to handle byref values. But nodeAgg doesn't seem to have > > > that restriction. What's the difference here? > > > > I think tuplesort_begin_datum doesn't have any such limitation, it > > can handle any type of Datum so I think we don't need to consider > > only attbyval, we can consider any type of attribute for this > > optimization.
> > I've restricted the optimization to byval types because of the following > comment in nodeAgg.c: > > /* > * Note: if input type is pass-by-ref, the datums returned by the > sort are > * freshly palloc'd in the per-query context, so we must be careful > to > * pfree them when they are no longer needed. > */ > > As I was not sure how to handle that, I preferred the safety of not enabling > it. Since you both told me it should be safe, I've lifted that restriction > too. To be clear, I don't know for certain it's safe [without extra work], but even if it involves some extra manual pfree'ing (a la nodeAgg) it's probably worth it. Maybe someone else will weigh in on whether or not anything special is required here to ensure we don't leak memory (I haven't looked in detail yet). > > A few small code observations: > > - In my view the addition of unlikely() in ExecSort is unlikely to be > > of benefit because it's a single call for the entire node's execution > > (not in the tuple loop). > > Done. > > > - It seems clearer to change the "if (!node->is_single_val)" to flip > > the true/false cases so we don't need the negation. > > Agreed, done. Thanks > > - I assume there are tests that likely already cover this case, but > > it'd be worth verifying that. > > Yes many test cases cover that, but maybe it would be better to explicitly > check for it on some cases: do you think we could add a debug message that can > be checked for ? Mostly I think we should verify code coverage and _maybe_ add a specific test or two that we know exercises this path. I don't know that the debug message needs to be matched in the test (probably more pain than it's worth), but the debug message ("enabling datum sort optimization" or similar) might be good anyway. I wonder if we need to change costing of sorts for this case. I don't like having to do so, but it's a significant change in speed, so probably should impact what plan gets chosen. Hopefully others will weigh in on this also. 
> > Finally, I believe the same optimization likely ought to be added to > > nodeIncrementalSort. It's less likely the tests there are sufficient > > for both this and the original case, but we'll see. > > I will look into it, but isn't incrementalsort used to sort tuples on several > keys, when they are already sorted on the first ? In that case, I doubt we > would ever have a single-valued tuple here, except if there is a projection to > strip the tuple from extraneous attributes. Yes and no. When incremental sort has to do a full sort there will always be at least 2 attributes. But in prefix sort mode (see prefixsort_state) only non-presorted columns are sorted (i.e., if given a,b already sorted by a, then only b is sorted). So the prefixsort_state could use this optimization. James
Re: Enhanced error message to include hint messages for redundant options error
On Wed, Jun 30, 2021 at 7:48 PM vignesh C wrote: > > On Thu, May 13, 2021 at 8:09 PM vignesh C wrote: > > > > On Thu, May 13, 2021 at 4:58 AM Alvaro Herrera > > wrote: > > > > > > > Thanks for the comments, Attached patch has the changes for the same. > > > > The Patch was not applying on Head, the attached patch is rebased on > top of Head. The patch was not applying on the head because of the recent commit "8aafb02616753f5c6c90bbc567636b73c0cbb9d4", attached patch which is rebased on HEAD. Regards, Vignesh From 5ccc262d895d688fbca42d795ea01c7c16c09b54 Mon Sep 17 00:00:00 2001 From: vignesh Date: Tue, 6 Jul 2021 20:22:45 +0530 Subject: [PATCH v8] Enhance error message to include option name in case of duplicate option error. Enhanced error message to include option name in case of duplication option error, so that the user can easily identify the error. --- contrib/file_fdw/file_fdw.c | 13 +-- src/backend/catalog/aclchk.c| 14 +-- src/backend/commands/copy.c | 57 +++--- src/backend/commands/dbcommands.c | 76 +++-- src/backend/commands/extension.c| 23 +--- src/backend/commands/foreigncmds.c | 21 ++-- src/backend/commands/functioncmds.c | 59 -- src/backend/commands/publicationcmds.c | 30 ++--- src/backend/commands/sequence.c | 48 ++-- src/backend/commands/subscriptioncmds.c | 62 +- src/backend/commands/tablecmds.c| 4 +- src/backend/commands/typecmds.c | 34 ++ src/backend/commands/user.c | 118 +--- src/backend/parser/parse_utilcmd.c | 4 +- src/backend/replication/pgoutput/pgoutput.c | 23 ++-- src/backend/replication/walsender.c | 15 +-- src/backend/tcop/utility.c | 20 ++-- src/include/commands/defrem.h | 16 ++- src/include/commands/publicationcmds.h | 4 +- src/include/commands/subscriptioncmds.h | 4 +- src/include/commands/typecmds.h | 2 +- src/include/commands/user.h | 2 +- src/test/regress/expected/copy2.out | 24 ++-- src/test/regress/expected/foreign_data.out | 8 +- src/test/regress/expected/publication.out | 4 +- 25 files changed, 232 insertions(+), 453 
deletions(-) diff --git a/contrib/file_fdw/file_fdw.c b/contrib/file_fdw/file_fdw.c index 2c2f149fb0..f49dd47930 100644 --- a/contrib/file_fdw/file_fdw.c +++ b/contrib/file_fdw/file_fdw.c @@ -200,6 +200,7 @@ file_fdw_validator(PG_FUNCTION_ARGS) char *filename = NULL; DefElem*force_not_null = NULL; DefElem*force_null = NULL; + DefElem*def; List *other_options = NIL; ListCell *cell; @@ -209,7 +210,7 @@ file_fdw_validator(PG_FUNCTION_ARGS) */ foreach(cell, options_list) { - DefElem*def = (DefElem *) lfirst(cell); + def = (DefElem *) lfirst(cell); if (!is_valid_option(def->defname, catalog)) { @@ -290,10 +291,7 @@ file_fdw_validator(PG_FUNCTION_ARGS) else if (strcmp(def->defname, "force_not_null") == 0) { if (force_not_null) -ereport(ERROR, - (errcode(ERRCODE_SYNTAX_ERROR), - errmsg("conflicting or redundant options"), - errhint("Option \"force_not_null\" supplied more than once for a column."))); +ReportDuplicateOptionError(def, NULL); force_not_null = def; /* Don't care what the value is, as long as it's a legal boolean */ (void) defGetBoolean(def); @@ -302,10 +300,7 @@ file_fdw_validator(PG_FUNCTION_ARGS) else if (strcmp(def->defname, "force_null") == 0) { if (force_null) -ereport(ERROR, - (errcode(ERRCODE_SYNTAX_ERROR), - errmsg("conflicting or redundant options"), - errhint("Option \"force_null\" supplied more than once for a column."))); +ReportDuplicateOptionError(def, NULL); force_null = def; (void) defGetBoolean(def); } diff --git a/src/backend/catalog/aclchk.c b/src/backend/catalog/aclchk.c index 53392414f1..264cdc2730 100644 --- a/src/backend/catalog/aclchk.c +++ b/src/backend/catalog/aclchk.c @@ -59,6 +59,7 @@ #include "catalog/pg_ts_template.h" #include "catalog/pg_type.h" #include "commands/dbcommands.h" +#include "commands/defrem.h" #include "commands/event_trigger.h" #include "commands/extension.h" #include "commands/proclang.h" @@ -910,30 +911,25 @@ ExecAlterDefaultPrivilegesStmt(ParseState *pstate, AlterDefaultPrivilegesStmt *s List *nspnames = NIL; 
DefElem*drolespecs = NULL; DefElem*dnspnames = NULL; + DefElem*defel; AclMode all_privileges; const char *errormsg; /* Deconstruct the "options" part of the statement */ foreach(cell, stmt->options) { - DefElem*defel = (DefElem *) lfirst(cell); + defel = (DefElem *) lfirst(cell); if (strcmp(defel->defname, "schemas") == 0) { if (dnspnames) -ereport(ERROR, - (errcode(ERRCODE_SYNTAX_ERROR), - errmsg("conflicting or redundant
Re: [PATCH] Use optimized single-datum tuplesort in ExecSort
Thank you for the review, I will address those shortly, but will answer some questions in the meantime. > > First, the changes are lacking any explanatory comments. Probably we > > should follow how nodeAgg does this and add both comments to the > > ExecSort function header as well as specific comments above the "if" > > around the new tuplesort_begin_datum explaining the specific > > conditions that are required for the optimization to be useful and > > safe. Done, since I lifted the restrictions following your questions, there isn't much left to comment. (see below) > > > > That leads to a question I had: I don't follow why bounded mode (when > > using byval) needs to be excluded. Comments should be added if there's > > a good reason (as noted above), but maybe it's a case we can handle > > safely? I had test failures when trying to move the Datum around when performing a bounded sort, but did not look into it at first. Now I've looked into it, and the switch to a heapsort when using bounded mode just unconditionally tried to free a tuple that was never there to begin with. So if the SortTuple does not contain an actual tuple, but only a single datum, do not do that. I've updated the patch to fix this and enable the optimization in the case of bounded sort. > > > > A second question: at first glance it's intuitively the case we might > > not be able to handle byref values. But nodeAgg doesn't seem to have > > that restriction. What's the difference here? > > I think tuplesort_begin_datum, doesn't have any such limitation, it > can handle any type of Datum so I think we don't need to consider the > only attbyval, we can consider any type of attribute for this > optimization. I've restricted the optimization to byval types because of the following comment in nodeAgg.c: /* * Note: if input type is pass-by-ref, the datums returned by the sort are * freshly palloc'd in the per-query context, so we must be careful to * pfree them when they are no longer needed. 
*/ As I was not sure how to handle that, I preferred the safety of not enabling it. Since you both told me it should be safe, I've lifted that restriction too. > A few small code observations: > - In my view the addition of unlikely() in ExecSort is unlikely to be > of benefit because it's a single call for the entire node's execution > (not in the tuple loop). Done. > - It seems clearer to change the "if (!node->is_single_val)" to flip > the true/false cases so we don't need the negation. Agreed, done. > - I assume there are tests that likely already cover this case, but > it'd be worth verifying that. Yes many test cases cover that, but maybe it would be better to explicitly check for it on some cases: do you think we could add a debug message that can be checked for ? > Finally, I believe the same optimization likely ought to be added to > nodeIncrementalSort. It's less likely the tests there are sufficient > for both this and the original case, but we'll see. I will look into it, but isn't incrementalsort used to sort tuples on several keys, when they are already sorted on the first ? In that case, I doubt we would ever have a single-valued tuple here, except if there is a projection to strip the tuple from extraneous attributes. -- Ronan Dunklau
diff --git a/src/backend/executor/nodeSort.c b/src/backend/executor/nodeSort.c index b99027e0d7..ff443d15a9 100644 --- a/src/backend/executor/nodeSort.c +++ b/src/backend/executor/nodeSort.c @@ -29,6 +29,10 @@ * which saves the results in a temporary file or memory. After the * initial call, returns a tuple from the file with each call. * + * The tuplesort can either occur on the whole tuple (this is the nominal + * case) or, when the input / output tuple consists of only one attribute, + * we switch to the tuplesort_*_datum API, optimized for that specific case. + * * Conditions: * -- none. 
* @@ -85,33 +89,59 @@ ExecSort(PlanState *pstate) outerNode = outerPlanState(node); tupDesc = ExecGetResultType(outerNode); - - tuplesortstate = tuplesort_begin_heap(tupDesc, - plannode->numCols, - plannode->sortColIdx, - plannode->sortOperators, - plannode->collations, - plannode->nullsFirst, - work_mem, - NULL, - node->randomAccess); + /* + * Switch to the tuplesort_*_datum interface when we have only one key, + * as it is much more efficient especially when the type is pass-by-value. + */ + if (tupDesc->natts == 1) + { + node->is_single_val = true; + tuplesortstate = tuplesort_begin_datum(TupleDescAttr(tupDesc, 0)->atttypid, + plannode->sortOperators[0], + plannode->collations[0], + plannode->nullsFirst[0], + work_mem, + NULL, + node->randomAccess); + } else + tuplesortstate = tuplesort_begin_heap(tupDesc, +
Re: [PATCH] expand the units that pg_size_pretty supports on output
David Rowley writes: > Does anyone want to have a look over this? If not, I plan to push it > in the next day or so. Minor nit: use "const char *text" in the struct declaration, so that all of the static data can be placed in fixed storage. > (I'm not sure why pgindent removed the space between != and NULL, but > it did, so I left it.) It did that because "text" is a typedef name, so it's a bit confused about whether the statement is really a declaration. Personally I'd have used "name" or something like that for that field, anyway. regards, tom lane
Re: .ready and .done files considered harmful
> I have a few suggestions on the patch > 1. > + > + /* > + * Found the oldest WAL, reset timeline ID and log segment number to > generate > + * the next WAL file in the sequence. > + */ > + if (found && !historyFound) > + { > + XLogFromFileName(xlog, , , wal_segment_size); > + ereport(LOG, > + (errmsg("directory scan to archive write-ahead log file \"%s\"", > + xlog))); > + } > > If a history file is found we are not updating curFileTLI and > nextLogSegNo, so it will attempt the previously found segment. This > is fine because it will not find that segment and it will rescan the > directory. But I think we can do better, instead of searching the > same old segment in the previous timeline we can search that old > segment in the new TL so that if the TL switch happened within the > segment then we will find the segment and we will avoid the directory > search. > > > /* > + * Log segment number and timeline ID to get next WAL file in a sequence. > + */ > +static XLogSegNo nextLogSegNo = 0; > +static TimeLineID curFileTLI = 0; > + > > So every time the archiver will start with searching segno=0 in timeline=0. > Instead of doing this can't we first scan the directory and once we > get the first segment to archive then only we can start predicting the > next wal segment? I think there is nothing wrong even if we try to > look for seg 0 in timeline 0, every time we start the archiver but that > will be true only once in the history of the cluster so why not skip > this until we scan the directory once? > +1, I like Dilip's ideas here to optimize further. Also, one minor comment: + /* +* Log segment number already points to the next file in the sequence +* (as part of successful archival of the previous file). Generate the path +* for status file. +*/ This comment is a bit confusing with the name of the variable nextLogSegNo. 
I think the name of the variable is appropriate here, but maybe we can reword the comment something like: + /* +* We already have the next anticipated log segment number and the +* timeline, check if this WAL file is ready to be archived. If yes, skip +* the directory scan. +*/ Regards, Jeevan Ladhe
Re: [PATCH] expand the units that pg_size_pretty supports on output
David Rowley writes: > On Mon, 5 Jul 2021 at 20:00, David Rowley wrote: >> I don't really like the fact that I had to add the doHalfRound field >> to get the same rounding behaviour as the original functions. I'm >> wondering if it would just be too clever just to track how many bits >> we've shifted right by in pg_size_pretty* and compare that to the >> value of multishift for the current unit and do appropriate rounding >> to get us to the value of multishift. In theory, we could just keep >> calling the half_rounded macro until we make it to the multishift >> value. My current thoughts are that it's unlikely that anyone would >> twiddle with the size_pretty_units array in such a way for the extra >> code to be worth it. Maybe someone else feels differently. > > I made another pass over this and ended up removing the doHalfRound > field in favour of just doing rounding based on the previous > bitshifts. > > I did a few other tidy ups and I think it's a useful patch as it > reduces the amount of code a bit and makes it dead simple to add new > units in the future. Most importantly it'll help keep pg_size_pretty, > pg_size_pretty_numeric and pg_size_bytes all in sync in regards to > what units they support. > > Does anyone want to have a look over this? If not, I plan to push it > in the next day or so. > > (I'm not sure why pgindent removed the space between != and NULL, but > it did, so I left it.) > > David I like the approach you took here; much cleaner to have one table for all of the individual codepaths. Testing worked as expected; if we do decide to expand the units table there will be a few additional changes (most significantly, the return value of `pg_size_bytes()` will need to switch to `numeric`). Thanks, David
Re: Identify missing publications from publisher while create/alter subscription.
On Wed, Jun 30, 2021 at 8:23 PM vignesh C wrote: > > On Sun, Jun 6, 2021 at 11:55 AM vignesh C wrote: > > > > On Fri, May 7, 2021 at 6:44 PM vignesh C wrote: > > > > > > Thanks for the comments, the attached patch has the fix for the same. > > > > The patch was not applying on the head, attached patch which is rebased on > > HEAD. > > The patch was not applying on the head, attached patch which is rebased on > HEAD. The patch was not applying on the head, attached patch which is rebased on HEAD. Regards, Vignesh From 059130a6d393c8c020234747865368596a3c4702 Mon Sep 17 00:00:00 2001 From: vignesh Date: Tue, 6 Jul 2021 19:54:11 +0530 Subject: [PATCH v11] Identify missing publications from publisher while create/alter subscription. Creating/altering subscription is successful when we specify a publication which does not exist in the publisher. This patch checks if the specified publications are present in the publisher and throws an error if any of the publication is missing in the publisher. --- doc/src/sgml/ref/alter_subscription.sgml | 13 ++ doc/src/sgml/ref/create_subscription.sgml | 18 +- src/backend/commands/subscriptioncmds.c | 208 +++--- src/bin/psql/tab-complete.c | 6 +- src/test/subscription/t/007_ddl.pl| 68 ++- 5 files changed, 281 insertions(+), 32 deletions(-) diff --git a/doc/src/sgml/ref/alter_subscription.sgml b/doc/src/sgml/ref/alter_subscription.sgml index b3d173179f..205f82d0d0 100644 --- a/doc/src/sgml/ref/alter_subscription.sgml +++ b/doc/src/sgml/ref/alter_subscription.sgml @@ -161,6 +161,19 @@ ALTER SUBSCRIPTION name RENAME TO < + + +validate_publication (boolean) + + + When true, the command verifies if all the specified publications + that are being subscribed to are present in the publisher and throws + an error if any of the publication doesn't exist. The default is + false. 
+ + + + diff --git a/doc/src/sgml/ref/create_subscription.sgml b/doc/src/sgml/ref/create_subscription.sgml index e812beee37..cad9285c16 100644 --- a/doc/src/sgml/ref/create_subscription.sgml +++ b/doc/src/sgml/ref/create_subscription.sgml @@ -207,8 +207,9 @@ CREATE SUBSCRIPTION subscription_nameCREATE SUBSCRIPTION should connect to the publisher at all. Setting this to false will change default values of - enabled, create_slot and - copy_data to false. + enabled, create_slot, + copy_data and + validate_publication to false. @@ -239,6 +240,19 @@ CREATE SUBSCRIPTION subscription_name + + +validate_publication (boolean) + + + When true, the command verifies if all the specified publications + that are being subscribed to are present in the publisher and throws + an error if any of the publication doesn't exist. The default is + false. + + + + diff --git a/src/backend/commands/subscriptioncmds.c b/src/backend/commands/subscriptioncmds.c index eb88d877a5..e3f8438e5f 100644 --- a/src/backend/commands/subscriptioncmds.c +++ b/src/backend/commands/subscriptioncmds.c @@ -59,6 +59,7 @@ #define SUBOPT_REFRESH0x0040 #define SUBOPT_BINARY0x0080 #define SUBOPT_STREAMING 0x0100 +#define SUBOPT_VALIDATE_PUB 0x0200 /* check if the 'val' has 'bits' set */ #define IsSet(val, bits) (((val) & (bits)) == (bits)) @@ -79,6 +80,7 @@ typedef struct SubOpts bool refresh; bool binary; bool streaming; + bool validate_publication; } SubOpts; static List *fetch_table_list(WalReceiverConn *wrconn, List *publications); @@ -123,6 +125,8 @@ parse_subscription_options(List *stmt_options, bits32 supported_opts, SubOpts *o opts->binary = false; if (IsSet(supported_opts, SUBOPT_STREAMING)) opts->streaming = false; + if (IsSet(supported_opts, SUBOPT_VALIDATE_PUB)) + opts->validate_publication = false; /* Parse options */ foreach(lc, stmt_options) @@ -237,6 +241,17 @@ parse_subscription_options(List *stmt_options, bits32 supported_opts, SubOpts *o opts->specified_opts |= SUBOPT_STREAMING; opts->streaming 
= defGetBoolean(defel); } + else if (IsSet(supported_opts, SUBOPT_VALIDATE_PUB) && + strcmp(defel->defname, "validate_publication") == 0) + { + if (IsSet(opts->specified_opts, SUBOPT_VALIDATE_PUB)) +ereport(ERROR, + (errcode(ERRCODE_SYNTAX_ERROR), + errmsg("conflicting or redundant options"))); + + opts->specified_opts |= SUBOPT_VALIDATE_PUB; + opts->validate_publication = defGetBoolean(defel); + } else ereport(ERROR, (errcode(ERRCODE_SYNTAX_ERROR), @@ -275,10 +290,19 @@ parse_subscription_options(List *stmt_options, bits32 supported_opts, SubOpts *o errmsg("%s and %s are mutually exclusive options",
Re: .ready and .done files considered harmful
> specifically about history files being given higher priority for > archiving. If we go with this change then we'd at least want to rewrite > or remove those comments, but I don't actually agree that we should > remove that preference to archive history files ahead of WAL, for the > reasons brought up previously. > As was suggested on that subthread, it seems like it should be possible > to just track the current timeline and adjust what we're doing if the > timeline changes, and we should even know what the .history file is at > that point and likely don't even need to scan the directory for it, as > it'll be the old timeline ID. I agree, I missed this part. The .history file should be given higher preference. I will take care of it in the next patch. Thanks, Dipesh
Re: proposal - log_full_scan
Looking at this I like the idea in principle, but I'm not convinced that auto_explain is the right tool for this. auto_explain is for identifying slow queries, and what you are proposing is to identify queries with a certain "shape" (for lack of a better term) even if they aren't slow as per the log_min_duration setting. If log_min_duration is deemed too crude due to query volume then sample_rate is the tool. If sample_rate is also discarded, then pg_stat_statements seems a better option. Also, why just sequential scans (apart from it being this specific use case)? If the idea is to track aspects of execution which are deemed slow, then tracking for example spills etc would be just as valid. Do you have thoughts on that? That being said, a few comments on the patch: - (auto_explain_log_min_duration >= 0 && \ + ((auto_explain_log_min_duration >= 0 || auto_explain_log_seqscan != -1) && \ Is there a reason to not follow the existing code and check for >= 0? + DefineCustomIntVariable("auto_explain.log_seqscan", It's only a matter of time before another node is proposed for logging, and then we'll be stuck adding log_XXXnode GUCs. Is there a more future-proof way to do this? + "Sets the minimum tuples produced by sequantial scans which plans will be logged", s/sequantial/sequential/ - es->analyze = (queryDesc->instrument_options && auto_explain_log_analyze); + es->analyze = (queryDesc->instrument_options && (auto_explain_log_analyze || auto_explain_log_seqscan != -1)); Turning on ANALYZE when log_analyze isn't set to True is a no-no IMO. + * Colllect relations where log_seqscan limit was exceeded s/Colllect/Collect/ + if (*relnames.data != '\0') + appendStringInfoString(, ","); This should use appendStringInfoChar instead. + (errmsg("duration: %.3f ms, over limit seqscans: %s, plan:\n%s", The "over limit" part is superfluous since it otherwise wouldn't be logged. 
If we're prefixing something wouldn't it be more helpful to include the limit, like: "seqscans >= %d tuples returned:". I'm not a fan of "seqscans" but spelling it out is also quite verbose and this is grep-able. Documentation and tests are also missing -- Daniel Gustafsson https://vmware.com/
Re: [PATCH] Use optimized single-datum tuplesort in ExecSort
On Tue, Jul 6, 2021 at 6:49 PM James Coleman wrote: > > Adding David since this patch is likely a precondition for [1]. > > On Tue, Jul 6, 2021 at 2:15 AM Ronan Dunklau wrote: > > > > Hello, > > > > While testing the patch "Add proper planner support for ORDER BY / DISTINCT > > aggregates" [0] I discovered the performance penalty from adding a sort node > > essentially came from not using the single-datum tuplesort optimization in > > ExecSort (contrary to the sorting done in ExecAgg). > > > > I originally proposed this patch as a companion in the same thread [1], but > > following James suggestion I'm making a separate thread just for this as the > > optimization is worthwhile independently of David's patch: it looks like we > > can expect a 2x speedup on a "select a single ordered column" case. > > > > The patch aimed to be as simple as possible: we only turn this optimization > > on > > when the tuple being sorted has only one attribute, it is "byval" (so as not > > to incur copies which would be hard to track in the execution tree) and > > unbound (again, not having to deal with copying borrowed datum anywhere). > > Thanks again for finding this and working up a patch. > > I've taken a look, and while I haven't dug into testing it yet, I have > a few comments. > > First, the changes are lacking any explanatory comments. Probably we > should follow how nodeAgg does this and add both comments to the > ExecSort function header as well as specific comments above the "if" > around the new tuplesort_begin_datum explaining the specific > conditions that are required for the optimization to be useful and > safe. > > That leads to a question I had: I don't follow why bounded mode (when > using byval) needs to be excluded. Comments should be added if there's > a good reason (as noted above), but maybe it's a case we can handle > safely? > > A second question: at first glance it's intuitively the case we might > not be able to handle byref values. 
But nodeAgg doesn't seem to have > that restriction. What's the difference here? > I think tuplesort_begin_datum doesn't have any such limitation; it can handle any type of Datum, so I think we don't need to consider only attbyval: we can consider any type of attribute for this optimization. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
Re: [PATCH] Use optimized single-datum tuplesort in ExecSort
On Tue, Jul 6, 2021 at 10:19 AM, James Coleman wrote: > Adding David since this patch is likely a precondition for [1]. > > On Tue, Jul 6, 2021 at 2:15 AM Ronan Dunklau > wrote: > > > > Hello, > > > > While testing the patch "Add proper planner support for ORDER BY / > DISTINCT > > aggregates" [0] I discovered the performance penalty from adding a sort > node > > essentially came from not using the single-datum tuplesort optimization > in > > ExecSort (contrary to the sorting done in ExecAgg). > > > > I originally proposed this patch as a companion in the same thread [1], > but > > following James suggestion I'm making a separate thread just for this as > the > > optimization is worthwhile independently of David's patch: it looks like > we > > can expect a 2x speedup on a "select a single ordered column" case. > > > > The patch aimed to be as simple as possible: we only turn this > optimization on > > when the tuple being sorted has only one attribute, it is "byval" (so as > not > > to incur copies which would be hard to track in the execution tree) and > > unbound (again, not having to deal with copying borrowed datum anywhere). > > Thanks again for finding this and working up a patch. > > I've taken a look, and while I haven't dug into testing it yet, I have > a few comments. > > First, the changes are lacking any explanatory comments. Probably we > should follow how nodeAgg does this and add both comments to the > ExecSort function header as well as specific comments above the "if" > around the new tuplesort_begin_datum explaining the specific > conditions that are required for the optimization to be useful and > safe. > > That leads to a question I had: I don't follow why bounded mode (when > using byval) needs to be excluded. Comments should be added if there's > a good reason (as noted above), but maybe it's a case we can handle > safely? > > A second question: at first glance it's intuitively the case we might > not be able to handle byref values. 
But nodeAgg doesn't seem to have > that restriction. What's the difference here? > > A few small code observations: > - In my view the addition of unlikely() in ExecSort is unlikely to be > of benefit because it's a single call for the entire node's execution > (not in the tuple loop). > No objection. And I agree that testing is complex and needs to remain as it is. - It seems clearer to change the "if (!node->is_single_val)" to flip > the true/false cases so we don't need the negation. > I think yes, it can be better. regards, Ranier Vilela
Re: .ready and .done files considered harmful
Greetings, * Dipesh Pandit (dipesh.pan...@gmail.com) wrote: > We have addressed the O(n^2) problem which involves directory scan for > archiving individual WAL files by maintaining a WAL counter to identify > the next WAL file in a sequence. This seems to have missed the concerns raised in https://postgr.es/m/20210505170601.gf20...@tamriel.snowman.net ..? And also the comments immediately above the ones being added here: > @@ -596,29 +606,55 @@ pgarch_archiveXlog(char *xlog) > * larger ID; the net result being that past timelines are given higher > * priority for archiving. This seems okay, or at least not obviously worth > * changing. > + * > + * WAL files are generated in a specific order of log segment number. The > + * directory scan for each WAL file can be minimized by identifying the next > + * WAL file in the sequence. This can be achieved by maintaining log segment > + * number and timeline ID corresponding to WAL file currently being archived. > + * The log segment number of current WAL file can be incremented by '1' upon > + * successful archival to point to the next WAL file. specifically about history files being given higher priority for archiving. If we go with this change then we'd at least want to rewrite or remove those comments, but I don't actually agree that we should remove that preference to archive history files ahead of WAL, for the reasons brought up previously. As was suggested on that subthread, it seems like it should be possible to just track the current timeline and adjust what we're doing if the timeline changes, and we should even know what the .history file is at that point and likely don't even need to scan the directory for it, as it'll be the old timeline ID. Thanks, Stephen signature.asc Description: PGP signature
Re: "debug_invalidate_system_caches_always" is too long
Andrew Dunstan writes: > On 7/5/21 11:46 PM, Bharath Rupireddy wrote: >> On Tue, Jul 6, 2021 at 12:43 AM Tom Lane wrote: >>> I like "debug_flush_caches" --- it's short and accurate. >> Do we always flush the cache entries into the disk? Sometimes we just >> invalidate the cache entries in the registered invalidation callbacks, >> right? Since we already use the term "clobber" in the user visible >> config option --clobber-cache, isn't it consistent to use >> debug_clobber_caches? > I think 'flush' here means simply 'discard'. Maybe that would be a > better word to use. "Discard" could be misinterpreted too, no doubt. None of these words have one single exact meaning, so I have only limited patience for this sort of argumentation. (As for initdb's "--clobber-cache", I'm assuming we'd rename that to match whatever we come up with here. It is what it is now only because I was unwilling to call it "--use-debug-invalidate-system-caches-always".) regards, tom lane
Re: [PATCH] Use optimized single-datum tuplesort in ExecSort
Adding David since this patch is likely a precondition for [1]. On Tue, Jul 6, 2021 at 2:15 AM Ronan Dunklau wrote: > > Hello, > > While testing the patch "Add proper planner support for ORDER BY / DISTINCT > aggregates" [0] I discovered the performance penalty from adding a sort node > essentially came from not using the single-datum tuplesort optimization in > ExecSort (contrary to the sorting done in ExecAgg). > > I originally proposed this patch as a companion in the same thread [1], but > following James suggestion I'm making a separate thread just for this as the > optimization is worthwhile independently of David's patch: it looks like we > can expect a 2x speedup on a "select a single ordered column" case. > > The patch aimed to be as simple as possible: we only turn this optimization on > when the tuple being sorted has only one attribute, it is "byval" (so as not > to incur copies which would be hard to track in the execution tree) and > unbound (again, not having to deal with copying borrowed datum anywhere). Thanks again for finding this and working up a patch. I've taken a look, and while I haven't dug into testing it yet, I have a few comments. First, the changes are lacking any explanatory comments. Probably we should follow how nodeAgg does this and add both comments to the ExecSort function header as well as specific comments above the "if" around the new tuplesort_begin_datum explaining the specific conditions that are required for the optimization to be useful and safe. That leads to a question I had: I don't follow why bounded mode (when using byval) needs to be excluded. Comments should be added if there's a good reason (as noted above), but maybe it's a case we can handle safely? A second question: at first glance it's intuitively the case we might not be able to handle byref values. But nodeAgg doesn't seem to have that restriction. What's the difference here? 
A few small code observations: - In my view the addition of unlikely() in ExecSort is unlikely to be of benefit because it's a single call for the entire node's execution (not in the tuple loop). - It seems clearer to change the "if (!node->is_single_val)" to flip the true/false cases so we don't need the negation. - I assume there are tests that likely already cover this case, but it'd be worth verifying that. Finally, I believe the same optimization likely ought to be added to nodeIncrementalSort. It's less likely the tests there are sufficient for both this and the original case, but we'll see. Thanks, James 1: https://www.postgresql.org/message-id/CAApHDvpHzfo92%3DR4W0%2BxVua3BUYCKMckWAmo-2t_KiXN-wYH%3Dw%40mail.gmail.com
Re: Keep notnullattrs in RelOptInfo (Was part of UniqueKey patch series)
David Rowley writes: > Tom, I'm wondering if you might get a chance to draw up a design for > what you've got in mind with this? I assume adding a new field in > Var, but I'm drawing a few blanks on how things might work for equal() > when one Var has the field set and another does not. As I said before, it hasn't progressed much past the handwaving stage, but it does seem like it's time to get it done. I doubt I'll have any cycles for it during the commitfest, but maybe I can devote a block of time during August. regards, tom lane
Re: Fix pkg-config file for static linking
On 21.06.21 15:47, Filip Gospodinov wrote: -PKG_CONFIG_REQUIRES_PRIVATE = libssl libcrypto +PKG_CONFIG_REQUIRES_PRIVATE = libpgcommon libpgport libssl libcrypto This doesn't work. The patch adds libpgcommon and libpgport to Requires.private, but those are library names, not pkg-config module names, so they should be added to Libs.private instead.
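Concretely, the distinction looks like this in a .pc file. The fragment below is a hypothetical sketch of where each kind of name belongs, not the actual generated libpq.pc:

```
# pkg-config module names: resolved recursively via their own .pc files
Requires.private: libssl libcrypto
# plain linker flags used only for static linking: no .pc lookup here
Libs.private: -lpgcommon -lpgport
```

`pkg-config --static --libs libpq` would then emit the private libraries along with the public ones.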
Re: Asymmetric partition-wise JOIN
Andrey Lepikhov wrote on 2021-07-06 12:28: On 5/7/21 23:15, Zhihong Yu wrote: On Mon, Jul 5, 2021 at 2:57 AM Andrey Lepikhov <a.lepik...@postgrespro.ru> wrote: + * Can't imagine situation when join relation already exists. But in + * the 'partition_join' regression test it happens. + * It may be an indicator of possible problems. Should a log be added in the above case ? I made additional analysis of this branch of code. This situation can happen in the case of one child, or if we join two plain tables with a partitioned one. Both situations are legal and I think we don't need to add any log message here. Other mistakes were fixed. Hi. Small typo in a comment in src/backend/optimizer/plan/setrefs.c: 281 282 /* 283 * Adjust RT indexes of AppendRelInfos and add to final appendrels list. 284 * The AppendRelInfos are copied, because as a part of a subplan its could 285 * be visited many times in the case of asymmetric join. 286 */ 287 foreach(lc, root->append_rel_list) 288 { its -> it (or they) ? -- Best regards, Alexander Pyhalov, Postgres Professional
Re: [PATCH] expand the units that pg_size_pretty supports on output
On Tue, 6 Jul 2021 at 13:15, David Rowley wrote: > > Can you give an example where calling half_rounded too many times will > give the wrong value? Keeping in mind we call half_rounded the number > of times that the passed in value would need to be left-shifted by to > get the equivalent truncated value. > ./half_rounded 10241 10 1. half_round(10241) == 5121 :: 10241 >> 1 = 5120 2. half_round(5121) == 2561 :: 5120 >> 1 = 2560 3. half_round(2561) == 1281 :: 2560 >> 1 = 1280 4. half_round(1281) == 641 :: 1280 >> 1 = 640 5. half_round(641) == 321 :: 640 >> 1 = 320 6. half_round(321) == 161 :: 320 >> 1 = 160 7. half_round(161) == 81 :: 160 >> 1 = 80 8. half_round(81) == 41 :: 80 >> 1 = 40 9. half_round(41) == 21 :: 40 >> 1 = 20 10. half_round(21) == 11 :: 20 >> 1 = 10 The correct result should be 10 (it would be very odd to claim that 10241 bytes should be displayed as 11kb), but the half-rounding keeps rounding up at each stage. That's a general property of rounding -- you need to be very careful when rounding more than once, since otherwise errors will propagate. C.f. 4083f445c0, which removed a double-round in numeric sqrt(). To be clear, I'm not saying that the current code half-rounds more than once, just that it reads as if it does. Regards, Dean
Re: Why is XLOG_FPI_FOR_HINT always need backups?
Thank you for the reply. You are right: the PostgreSQL server writes the entire content of each disk page to WAL during the first modification of that page after a checkpoint while data checksums are on. But I wonder whether that is necessary when my file system can protect database blocks from being torn. I also read this comment in the function MarkBufferDirtyHint:

```
/*
 * If we need to protect hint bit updates from torn writes, WAL-log a
 * full page image of the page. This full page image is only necessary
 * if the hint bit update is the first change to the page since the
 * last checkpoint.
 *
 * We don't check full_page_writes here because that logic is included
 * when we call XLogInsert() since the value changes dynamically.
 */
```

However, the code tells me it has nothing to do with full_page_writes. I can't figure it out. -- Zhang Wenjie
Re: Removing redundant check for transaction in progress in check_safe_enum_use
On Sun, 4 Jul 2021, 03:40 Zhihong Yu, wrote: > > Hi, > I was looking at : > Relax transactional restrictions on ALTER TYPE ... ADD VALUE (redux). > > In check_safe_enum_use(): > > + if (!TransactionIdIsInProgress(xmin) && > + TransactionIdDidCommit(xmin)) > + return; > > Since the condition would be true only when TransactionIdDidCommit() returns > true, I think the call to TransactionIdIsInProgress is not needed. > If transaction for xmin is committed, the transaction cannot be in progress > at the same time. I'm not sure that removing the !TransactionIdIsInProgress-check is correct. The comment in heapam_visibility.c:13 explains that we need to check TransactionIdIsInProgress before TransactionIdDidCommit in non-MVCC snapshots, and I'm fairly certain that check_safe_enum_use() is not guaranteed to run only in MVCC snapshots (at least its documentation does not warn against non-MVCC snapshots). Kind regards, Matthias van de Meent
Re: [PATCH] expand the units that pg_size_pretty supports on output
On Tue, 6 Jul 2021 at 23:39, Dean Rasheed wrote: > When I first read this: > > +/* half-round until we get down to unitBits */ > +while (rightshifts++ < unit->unitBits) > +size = half_rounded(size); > > it looked to me like it would be invoking half_rounded() multiple > times, which raised alarm bells because that would risk rounding the > wrong way. Eventually I realised that by the time it reaches that, > rightshifts will always equal unit->unitBits or unit->unitBits - 1, so > it'll never do more than one half-round, which is important. It's true that based on how the units table is set up now, it'll only ever do it once for all but the first loop. I wrote the attached .c file just to try to see if it ever goes wrong and I didn't manage to find any inputs where it did. I always seem to get the half rounded value either the same as the shifted value or 1 higher towards positive infinity $ ./half_rounded -102 3 1. half_round(-102) == -51 :: -102 >> 1 = -51 2. half_round(-51) == -25 :: -51 >> 1 = -26 3. half_round(-25) == -12 :: -26 >> 1 = -13 $ ./half_rounded 6432 3 1. half_round(6432) == 3216 :: 6432 >> 1 = 3216 2. half_round(3216) == 1608 :: 3216 >> 1 = 1608 3. half_round(1608) == 804 :: 1608 >> 1 = 804 Can you give an example where calling half_rounded too many times will give the wrong value? Keeping in mind we call half_rounded the number of times that the passed in value would need to be left-shifted by to get the equivalent truncated value. David

```c
#include <stdio.h>
#include <stdlib.h>

#define half_rounded(x) (((x) + ((x) < 0 ? 0 : 1)) / 2)

int
main(int argc, char **argv)
{
	int			num, times;
	int			i;
	int			r, shift;

	if (argc < 3)
	{
		printf("Usage: %s <value> <times>\n", argv[0]);
		return -1;
	}

	num = atoi(argv[1]);
	times = atoi(argv[2]);

	r = num;
	shift = num;
	for (i = 0; i < times; i++)
	{
		int			rr = half_rounded(r);

		printf("%d. half_round(%d) == %d :: %d >> 1 = %d\n",
			   i + 1, r, rr, shift, shift >> 1);
		r = rr;
		shift >>= 1;
	}
	return 0;
}
```
Re: Remove useless int64 range checks on BIGINT sequence MINVALUE/MAXVALUE values
On Tue, Jul 6, 2021 at 8:43 PM David Rowley wrote: > > On Sun, 4 Jul 2021 at 20:53, David Rowley wrote: > > > > On Sat, 3 Jul 2021 at 22:44, Peter Eisentraut > > wrote: > > > I don't think this is a good change. > > > > > I think we should leave it as is. > > > > I'm inclined to agree. > > Does anyone object to marking this patch as rejected in the CF app? > I think if you're going to reject this patch, a brief comment should be added to that code to justify why that existing superfluous check is worthwhile. (After all, similar checks are not being done elsewhere in the Postgres code, AFAICS. e.g. "int" variables are not being checked to see whether they hold values greater than MAXINT). Regards, Greg Nancarrow Fujitsu Australia
Re: [PATCH] Use optimized single-datum tuplesort in ExecSort
On Tue, Jul 6, 2021 at 8:25 AM, Ranier Vilela wrote: > On Tue, Jul 6, 2021 at 3:15 AM, Ronan Dunklau > wrote: > >> Hello, >> >> While testing the patch "Add proper planner support for ORDER BY / >> DISTINCT >> aggregates" [0] I discovered the performance penalty from adding a sort >> node >> essentially came from not using the single-datum tuplesort optimization >> in >> ExecSort (contrary to the sorting done in ExecAgg). >> >> I originally proposed this patch as a companion in the same thread [1], >> but >> following James suggestion I'm making a separate thread just for this as >> the >> optimization is worthwhile independently of David's patch: it looks like >> we >> can expect a 2x speedup on a "select a single ordered column" case. >> >> The patch aimed to be as simple as possible: we only turn this >> optimization on >> when the tuple being sorted has only one attribute, it is "byval" (so as >> not >> to incur copies which would be hard to track in the execution tree) and >> unbound (again, not having to deal with copying borrowed datum anywhere). >> >> The attached patch is originally by me, with some cleanup by Ranier >> Vilela. >> I'm sending Ranier's version here. >> > Nice Ronan. > But I think there is some confusion here. > The author is not you? > > Just to clarify, at Commitfest, it was supposed to be the other way around. > You as an author and David as a reviewer. > I'll put myself as a reviewer too. > Sorry David, my mistake. I confused the numbers (id) of Commitfest. regards, Ranier Vilela
Re: [PATCH] expand the units that pg_size_pretty supports on output
On Tue, 6 Jul 2021 at 10:20, David Rowley wrote: > > I made another pass over this and ended up removing the doHalfRound > field in favour of just doing rounding based on the previous > bitshifts. > When I first read this: +/* half-round until we get down to unitBits */ +while (rightshifts++ < unit->unitBits) +size = half_rounded(size); it looked to me like it would be invoking half_rounded() multiple times, which raised alarm bells because that would risk rounding the wrong way. Eventually I realised that by the time it reaches that, rightshifts will always equal unit->unitBits or unit->unitBits - 1, so it'll never do more than one half-round, which is important. So perhaps using doHalfRound would be clearer, but it could just be a local variable tracking whether or not it's the first time through the loop. Regards, Dean
Re: Can a child process detect postmaster death when in pg_usleep?
On Tue, Jul 6, 2021 at 4:33 PM Michael Paquier wrote: > > On Tue, Jul 06, 2021 at 03:54:07PM +0530, Bharath Rupireddy wrote: > > Thanks. You are right. The issue is due to the MyLatch being set by > > SwitchToSharedLatch before WaitLatch. If we use (WL_TIMEOUT | > > WL_EXIT_ON_PM_DEATH), then the backends will honour the > > post_auth_delay as well as detect the postmaster death. Since we are > > not using WL_LATCH_SET, I removed ResetLatch. Also, added some > > comments around why we are not using WL_LATCH_SET. > > > > For PreAuthDelay, there's no problem to use WL_LATCH_SET as MyLatch > > still points to the local latch(which is not set) in > > BackendInitialize(). > > FWIW, I think that it could be a good idea to use the same set of > flags for all the pre/post_auth_delay paths for consistency. That's > useful when grepping for one. Please note that I don't plan to look > more at this patch set for this CF as I am not really excited by the > updates involving developer options, and I suspect more issues like > the one I found upthread so this needs a close lookup. > > If somebody else wishes to look at it, please feel free, of course. Thanks. Anyways, I removed WL_LATCH_SET for PreAuthDelay as well. PSA v4 patch. Regards, Bharath Rupireddy. v3-0001-Use-a-WaitLatch-for-Pre-and-Post-Auth-Delay.patch Description: Binary data
Re: Commit fest manager
On Tue, Jul 6, 2021 at 3:58 PM Michael Paquier wrote: > On Tue, Jul 06, 2021 at 03:38:23PM +0500, Ibrar Ahmed wrote: > > Any update and decision on this? so I can start working on this. > > Working on the CF does not strongly require the admin permissions. I > have already switched the current CF as in progress, so most of the > admin job is done for this month :) > > Another piece of interest are the reviewer/author stat reports, but > poking at patching does not need this information in most cases. > -- > Michael > Great, thanks Michael, I will start doing my work :) -- Ibrar Ahmed
Re: [PATCH] Use optimized single-datum tuplesort in ExecSort
On Tue, Jul 6, 2021 at 3:15 AM, Ronan Dunklau wrote: > Hello, > > While testing the patch "Add proper planner support for ORDER BY / > DISTINCT > aggregates" [0] I discovered the performance penalty from adding a sort > node > essentially came from not using the single-datum tuplesort optimization in > ExecSort (contrary to the sorting done in ExecAgg). > > I originally proposed this patch as a companion in the same thread [1], > but > following James suggestion I'm making a separate thread just for this as > the > optimization is worthwhile independently of David's patch: it looks like > we > can expect a 2x speedup on a "select a single ordered column" case. > > The patch aimed to be as simple as possible: we only turn this > optimization on > when the tuple being sorted has only one attribute, it is "byval" (so as > not > to incur copies which would be hard to track in the execution tree) and > unbound (again, not having to deal with copying borrowed datum anywhere). > > The attached patch is originally by me, with some cleanup by Ranier > Vilela. > I'm sending Ranier's version here. > Nice Ronan. But I think there is some confusion here. The author is not you? Just to clarify, at Commitfest, it was supposed to be the other way around. You as an author and David as a reviewer. I'll put myself as a reviewer too. regards, Ranier Vilela
Re: ECPG doesn't compile CREATE AS EXECUTE properly.
On Tue, Jul 06, 2021 at 05:47:34PM +0900, Kyotaro Horiguchi wrote: > More accurately, I didn't come up with the way to split out some of > the rule-components in a rule out as another rule using the existing > infrastructure. > > [...] > > Then add the following component to the rule "stmt". I see. That sounds interesting as solution, and consistent with the existing cursor queries. The ECPG parser is not that advanced, and we may not really need to make it more complicated with sub-clause handling like INEs. So I'd be rather in favor of what you are describing here. -- Michael signature.asc Description: PGP signature
Re: Why is XLOG_FPI_FOR_HINT always need backups?
On Tue, 06 Jul 2021 at 17:58, zwj <757634...@qq.com> wrote: > Hi all, > > When I read the source code file src/backend/access/transam/xloginsert.c, I > get something confused me. > In the function XLogSaveBufferForHint, the flags are always > REGBUF_FORCE_IMAGE which means it is always need backups. > Is it right? Why do not check the full_page_writes? The documentation [1] says: -- wal_log_hints (boolean) When this parameter is on, the PostgreSQL server writes the entire content of each disk page to WAL during the first modification of that page after a checkpoint, even for non-critical modifications of so-called hint bits. -- Does that mean that, whether full_page_writes is enabled or not, if wal_log_hints is enabled we always write the entire content of each disk page to WAL? If I'm right, should we mention this in wal_log_hints? [1] https://www.postgresql.org/docs/current/runtime-config-wal.html -- Regards, Japin Li. ChengDu WenWu Information Technology Co.,Ltd.
Re: Can a child process detect postmaster death when in pg_usleep?
On Tue, Jul 06, 2021 at 03:54:07PM +0530, Bharath Rupireddy wrote: > Thanks. You are right. The issue is due to the MyLatch being set by > SwitchToSharedLatch before WaitLatch. If we use (WL_TIMEOUT | > WL_EXIT_ON_PM_DEATH), then the backends will honour the > post_auth_delay as well as detect the postmaster death. Since we are > not using WL_LATCH_SET, I removed ResetLatch. Also, added some > comments around why we are not using WL_LATCH_SET. > > For PreAuthDelay, there's no problem to use WL_LATCH_SET as MyLatch > still points to the local latch(which is not set) in > BackendInitialize(). FWIW, I think that it could be a good idea to use the same set of flags for all the pre/post_auth_delay paths for consistency. That's useful when grepping for one. Please note that I don't plan to look more at this patch set for this CF as I am not really excited by the updates involving developer options, and I suspect more issues like the one I found upthread so this needs a close lookup. If somebody else wishes to look at it, please feel free, of course. -- Michael signature.asc Description: PGP signature
Re: Commit fest manager
On Tue, Jul 06, 2021 at 03:38:23PM +0500, Ibrar Ahmed wrote: > Any update and decision on this? so I can start working on this. Working on the CF does not strongly require admin permissions. I have already switched the current CF to in progress, so most of the admin job is done for this month :) Another piece of interest is the reviewer/author stat reports, but poking at patches does not need this information in most cases. -- Michael signature.asc Description: PGP signature
Re: Removing unneeded self joins
On Mon, Jul 5, 2021 at 2:20 PM Andrey Lepikhov wrote: > On 2/7/21 01:56, Hywel Carver wrote: > > On Wed, Jun 30, 2021 at 12:21 PM Andrey Lepikhov > > mailto:a.lepik...@postgrespro.ru>> wrote: > > I think, here we could ask more general question: do we want to > > remove a > > 'IS NOT NULL' clause from the clause list if the rest of the list > > implicitly implies it? > > > > > > My suggestion was not to remove it, but to avoid adding it in the first > > place. When your optimisation has found a join on a group of columns > > under a uniqueness constraint, you would do something like this (forgive > > the pseudo-code) > > > > foreach(column, join_clause) { > >if(column.nullable) { // This condition is what I'm suggesting is > added > > add_null_test(column, IS_NOT_NULL); > >} > > } > > > > But it may be that that's not possible or practical at this point in the > > code. > I think, such option will require to implement a new machinery to prove > that arbitrary column couldn't produce NULL value. > Got it, and it makes sense to me that this would be out of scope for this change. I remember in the previous conversation about this, Tomas acknowledged that while there are some silly queries that would benefit from this change, there are also some well-written ones (e.g. properly denormalised table structures, with decomposed views that need joining together in some queries). So the optimization needs to be essentially free to run to minimise impact on other queries. Looking through the email chain, a previous version of this patch added ~0.6% to planning time in the worst case tested - does that meet the "essentially free" requirement?
Re: Remove useless int64 range checks on BIGINT sequence MINVALUE/MAXVALUE values
On Sun, 4 Jul 2021 at 20:53, David Rowley wrote: > > On Sat, 3 Jul 2021 at 22:44, Peter Eisentraut > wrote: > > I don't think this is a good change. > > > I think we should leave it as is. > > I'm inclined to agree. Does anyone object to marking this patch as rejected in the CF app? David
Re: Binary search in ScalarArrayOpExpr for OR'd constant arrays
I've been looking at the NOT IN hashing patch again and after making a few minor tweaks I think it's pretty much ready to go. If anyone feels differently, please let me know in the next couple of days. Otherwise, I plan on taking a final look and pushing it soon. David v5-0001-Use-hash-table-to-speed-up-NOT-IN-with-a-set-of-C.patch Description: Binary data
Re: Commit fest manager
On Fri, Jul 2, 2021 at 7:15 PM Ibrar Ahmed wrote: > > > On Fri, 2 Jul 2021 at 7:06 PM, vignesh C wrote: > >> On Fri, Jul 2, 2021 at 6:05 PM Ibrar Ahmed wrote: >> > >> > >> > >> > On Fri, 2 Jul 2021 at 1:47 PM, David Rowley >> wrote: >> >> >> >> On Fri, 2 Jul 2021 at 15:04, vignesh C wrote: >> >> > I'm interested in assisting Ibrar Ahmed. >> >> >> >> It might be worth talking to Ibrar to see where you can lend a hand. I >> >> think in terms of the number of patches, this might be our biggest >> >> commitfest yet. >> >> >> >> 2020-07 246 >> >> 2020-09 235 >> >> 2020-11 244 >> >> 2021-01 260 >> >> 2021-03 295 >> >> 2021-07 342 >> >> >> >> It's possible Ibrar would welcome you helping to take care of some of >> >> the duties. I've never been brave enough to take on the CF manager >> >> role yet, but from what I can see, to do a good job takes a huge >> >> amount of effort. >> >> >> >> David >> > >> > >> > I am willing to take the responsibility; help from Vignesh is welcome >> >> Thanks. Can someone provide me permissions, as this will be my first >> time? My username is vignesh.postgres. >> >> Regards, >> Vignesh > > I need permission; my id is ibrar.ah...@gmail.com > >> >> -- > Ibrar Ahmed > Any update and decision on this, so I can start working on it? -- Ibrar Ahmed