Re: UCT (Re: pgsql: Update time zone data files to tzdata release 2019a.)
> "Tom" == Tom Lane writes:

 Tom> I'm dubious that relying on zone[1970].tab would improve matters
 Tom> substantially; it would fix some cases, but I don't think it would
 Tom> fix all of them. Resolving all ambiguous zone-name choices is not
 Tom> the charter of those files.

Allowing zone matching by _content_ (as we do) rather than by name does
not seem to be supported in any respect whatever by the upstream data;
we've always been basically on our own with that.

[tl;dr for what follows: my proposal reduces the number of discrepancies
from 91 (see previously posted list) to 16 or 7, none of which are new]

So here are the ambiguities that are not resolvable at all:

Africa/Abidjan -> GMT

This happens because the Africa/Abidjan zone is literally just GMT, even
down to the abbreviation, and we don't want to guess Africa/Abidjan for
all GMT installs.

America/Argentina/Rio_Gallegos -> America/Argentina/Ushuaia
Asia/Kuala_Lumpur -> Asia/Singapore

These are cases where zone1970.tab, despite its name, includes
distinctly-named zones which are distinct only for times in the far past
(before 1920 or 1905 respectively). They are otherwise identical by
content. We therefore end up choosing arbitrarily.

In addition, the following collection of random islands have timezones
which lack local abbreviation names, recent offset changes, or DST, and
are therefore indistinguishable by content from fixed-offset zones like
Etc/GMT+2:

Etc/GMT-4  == Indian/Mahe Indian/Reunion
Etc/GMT-7  == Indian/Christmas
Etc/GMT-9  == Pacific/Palau
Etc/GMT-10 == Pacific/Port_Moresby
Etc/GMT-11 == Pacific/Guadalcanal
Etc/GMT-12 == Pacific/Funafuti Pacific/Tarawa Pacific/Wake Pacific/Wallis
Etc/GMT+10 == Pacific/Tahiti
Etc/GMT+9  == Pacific/Gambier
Etc/GMT+2  == Atlantic/South_Georgia

We currently map all of these to the Etc/GMT+x names on the grounds of
length. If we chose to prefer zone.tab names over Etc/* names for all of
these, we'd be ambiguous only for a handful of relatively small islands.
-- Andrew (irc:RhodiumToad)
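The tie-break Andrew describes — content-identical zones resolved today by shortest name (which is how the Etc/GMT+x aliases win), or alternatively by preferring geographic zone.tab names over Etc/* aliases — can be sketched as follows. This is a hypothetical illustration in Python; `pick_zone_name` and its signature are invented for the example and are not PostgreSQL code.

```python
def pick_zone_name(candidates, prefer_geographic=False):
    """Choose one canonical name among content-identical zone names.

    By default the shortest name wins (ties broken alphabetically),
    mirroring the current rule that maps equivalent zones to Etc/GMT+x.
    With prefer_geographic=True, zone.tab-style geographic names beat
    the Etc/* aliases, as proposed in the message above.
    """
    geographic = [n for n in candidates if not n.startswith("Etc/")]
    pool = geographic if (prefer_geographic and geographic) else candidates
    # shortest name first; remaining ties broken alphabetically
    return min(pool, key=lambda n: (len(n), n))
```

For instance, among the content-identical trio Etc/GMT-4, Indian/Mahe, and Indian/Reunion, the default rule picks the Etc alias, while the geographic preference would pick Indian/Mahe — still arbitrary between the two islands, which is exactly the residual ambiguity the message points out.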
RE: Run-time pruning for ModifyTable
Hi, Amit > If I understand the details of [1] correctly, ModifyTable will no longer > have N subplans for N result relations as there are today. So, it doesn't > make sense for ModifyTable to contain PartitionedRelPruneInfos and for > ExecInitModifyTable/ExecModifyTable > to have to perform initial and execution-time pruning, respectively. Does this mean that the generic plan will not have N subplans for N result relations? I thought [1] would make creating generic plans faster, but is this correct? regards, kato sho > -Original Message- > From: Amit Langote [mailto:amitlangot...@gmail.com] > Sent: Wednesday, July 3, 2019 5:41 PM > To: David Rowley > Cc: PostgreSQL Hackers > Subject: Re: Run-time pruning for ModifyTable > > On Wed, Jul 3, 2019 at 4:34 PM David Rowley > wrote: > > On Wed, 3 Jul 2019 at 17:27, Amit Langote > wrote: > > > A cursory look at the patch suggests that most of its changes will > > > be for nothing if [1] materializes. What do you think about that? > > > > Yeah, I had this in mind when writing the patch, but kept going > > anyway. I think it's only really a small patch of this patch that > > would get wiped out with that change. Just the planner.c stuff. > > Everything else is still required, as far as I understand. > > If I understand the details of [1] correctly, ModifyTable will no longer > have N subplans for N result relations as there are today. So, it doesn't > make sense for ModifyTable to contain PartitionedRelPruneInfos and for > ExecInitModifyTable/ExecModifyTable > to have to perform initial and execution-time pruning, respectively. > As I said, bottom expansion of target inheritance will mean pruning (both > plan-time and run-time) will occur at the bottom too, so the run-time > pruning capabilities of nodes that already have it will be used for UPDATE > and DELETE too. > > Thanks, > Amit >
Re: Tid scan improvements
On Mon, 1 Jul 2019 at 23:29, Thomas Munro wrote: > The new CF is here. I'm going through poking threads for submissions > that don't apply, but it sounds like this needs more than a rebase? > Perhaps this belongs in the next CF? 0001 and 0004 of v7 got pushed in PG12. The CFbot will be trying to apply 0001 still, but on testing 0002, no joy there either. It would be good to see this back in PG13. For now, I'll mark it as waiting on author. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: TopoSort() fix
I'll try to figure out some scenarios to do the test. A parallel process group is needed for the test. Actually I was trying to do some testing against the locking mechanism. I happened to see this issue. On Wed, Jul 3, 2019 at 9:38 PM Robert Haas wrote: > On Tue, Jul 2, 2019 at 11:23 AM Tom Lane wrote: > > Rui Hai Jiang writes: > > > I think I found an issue in the TopoSort() function. > > > > This indeed seems like a live bug introduced by a1c1af2a. > > Robert? > > This is pretty thoroughly swapped out of my head, but it looks like > that analysis might be correct. > > Is it practical to come up with a test case that demonstrates the problem? > > -- > Robert Haas > EnterpriseDB: http://www.enterprisedb.com > The Enterprise PostgreSQL Company >
Re: Where is SSPI auth username determined for TAP tests?
On Wed, Jul 03, 2019 at 09:53:14AM -0400, Tom Lane wrote: > I haven't checked that this actually works, but it looks plausible, > and I agree it's simpler/easier. Thanks, committed. While testing on Windows, I have been trapped by the fact that IPC::Run mishandles double quotes, causing the tests to fail for the environment variable part because of a mismatching pg_hba.conf entry. The difference is that we run pg_regress --config-auth using IPC::Run::run on HEAD, but the patch switches to system(). So I have finished by removing the double-quote handling from the restore user name, which makes the whole test suite more consistent. The patch also has the advantage of removing from pg_ident.conf the entry related to the OS user running the scripts, which makes the environment more restricted by default. -- Michael signature.asc Description: PGP signature
Re: Memory-Bounded Hash Aggregation
On Wed, 2019-07-03 at 02:17 +0200, Tomas Vondra wrote:
> What does "partitioned hash strategy" do? It's probably explained in one
> of the historical discussions, but I'm not sure which one. I assume it
> simply hashes the group keys and uses that to partition the data, and then
> passing it to hash aggregate.

Yes. When spilling, it is cheap to partition on the hash value at the
same time, which dramatically reduces the need to spill multiple times.

Previous discussions:

> Unfortunately the second link does not work :-(

It's supposed to be:
https://www.postgresql.org/message-id/CAGTBQpa__-NP7%3DkKwze_enkqw18vodRxKkOmNhxAPzqkruc-8g%40mail.gmail.com

> I'm not going to block Approach 1, although I'd really like to see
> something that helps with array_agg.

I have a WIP patch that I just posted. It doesn't yet work with
ARRAY_AGG, but I think it can be made to work by evicting the entire
hash table, serializing the transition states, and then later combining
them.

> Aren't all three approaches a way to "fix" hash aggregate? In any case,
> it's certainly reasonable to make incremental changes. The question is
> whether "approach 1" is a sensible step towards some form of "approach 3"

Disk-based hashing certainly seems like a reasonable algorithm on paper
that has some potential advantages over sorting. It certainly seems
sensible to me that we explore the disk-based hashing strategy first,
and then we would at least know what we are missing (if anything) by
going with the hybrid approach later.

There's also a fair amount of design space to explore in the hybrid
strategy. That could take a while to converge, especially if we don't
have anything in place to compare against.

> > * It means we have a hash table and sort running concurrently, each
> > using memory. Andres said this might not be a problem[3], but I'm
> > not convinced that the problem is zero. If you use small work_mem
> > for the write phase of sorting, you'll end up with a lot of runs to
> > merge later and that has some kind of cost.
>
> Why would we need to do both concurrently? I thought we'd empty the hash
> table before doing the sort, no?

So you are saying we spill the tuples into a tuplestore, then feed the
tuplestore through a tuplesort? Seems inefficient, but I guess we can.

> Do we actually need to handle that case? How many such aggregates are
> there? I think it's OK to just ignore that case (and keep doing what we do
> now), and require serial/deserial functions for anything better.

Punting on a few cases is fine with me, if the user has a way to fix it.

Regards,
Jeff Davis
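The "partition on the hash value while spilling" idea discussed above can be sketched in miniature. This is a toy Python model, not the WIP patch: `hashagg_with_spill` and its parameters are invented names, `max_groups` stands in for work_mem, and a real implementation would use different hash bits at each recursion level.

```python
def hashagg_with_spill(rows, key, agg, init, max_groups, npartitions=4):
    """Toy memory-bounded hash aggregation.

    Groups live in an in-memory table until max_groups is reached (a
    stand-in for work_mem). Tuples whose group is not already in the
    table are spilled, partitioned by the group key's hash, so that each
    spill partition can later be aggregated independently -- which is
    what reduces the need to spill the same data multiple times.
    """
    table = {}
    spill = [[] for _ in range(npartitions)]
    for row in rows:
        k = key(row)
        if k in table:
            table[k] = agg(table[k], row)        # group already in memory
        elif len(table) < max_groups:
            table[k] = agg(init, row)            # room for a new group
        else:
            spill[hash(k) % npartitions].append(row)  # route by hash
    results = dict(table)
    for part in spill:
        if part:
            # each partition is aggregated on its own; recursion handles
            # partitions that are themselves too large
            results.update(hashagg_with_spill(part, key, agg, init,
                                              max_groups, npartitions))
    return results
```

Note that a group is never split between the in-memory table and the spill files: once a key is in the table, all its later tuples are aggregated in place, and a key that was spilled keeps spilling to the same partition.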
Re: Cleaning up and speeding up string functions
On Sun, 26 May 2019 at 04:50, Tom Lane wrote: > > David Rowley writes: > > Here's a small patch series aimed to both clean up a few misuses of > > string functions and also to optimise a few things along the way. > > > 0001: Converts various calls that use appendPQExpBuffer() that really > > should use appendPQExpBufferStr(). If there's no formatting then > > using the former function is a waste of effort. > > > 0002: Similar to 0001 but replaces various appendStringInfo calls with > > appendStringInfoString calls. > > Agreed on these; we've applied such transformations before. I've pushed 0001 and 0002. Instead of having 0004, how about the attached? Most of the calls won't improve much performance-wise since they're so cheap anyway, but there is xmlconcat(); I imagine that should see some speedup. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services use_binary_string_info_when_len_is_known.patch Description: Binary data
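The point of 0001/0002 — that routing a constant string through the printf-style append function wastes a trip through the format machinery — can be modeled in a few lines. This is a toy Python analogy, not libpq code: `append_fmt` stands in for appendStringInfo/appendPQExpBuffer and `append_str` for appendStringInfoString/appendPQExpBufferStr. It also shows a second hazard of the format path: a literal '%' in the text gets misinterpreted.

```python
def append_fmt(buf, fmt, *args):
    # stand-in for appendStringInfo/appendPQExpBuffer: the format string
    # is always scanned for % escapes, even when there are no arguments
    buf.append(fmt % args)

def append_str(buf, s):
    # stand-in for appendStringInfoString/appendPQExpBufferStr: a plain
    # append, with no format scan at all
    buf.append(s)
```

With no formatting needed, `append_str` does strictly less work, and text such as "100% done" passes through untouched, where `append_fmt` would trip over the stray '%'.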
Re: GiST VACUUM
On Thu, Apr 4, 2019 at 8:15 AM Heikki Linnakangas wrote: > I think we should fix this in a similar manner in B-tree, too, but that > can be done separately. For B-tree, we need to worry about > backwards-compatibility, but that seems simple enough; we just need to > continue to understand old deleted pages, where the deletion XID is > stored in the page opaque field. What Postgres versions will the B-Tree fix end up targeting? Sounds like you plan to backpatch all the way? -- Peter Geoghegan
Re: Memory-Bounded Hash Aggregation
On Mon, 2019-07-01 at 12:13 -0700, Jeff Davis wrote: > This is for design review. I have a patch (WIP) for Approach 1, and > if > this discussion starts to converge on that approach I will polish and > post it. WIP patch attached (based on 9a81c9fa); targeting September CF. Not intended for detailed review yet, but it seems to work in enough cases (including grouping sets and JIT) to be a good proof-of-concept for the algorithm and its complexity. Initial performance numbers put it at 2X slower than sort for grouping 10M distinct integers. There are quite a few optimizations I haven't tried yet and quite a few tunables I haven't tuned yet, so hopefully I can close the gap a bit for the small-groups case. I will offer more details soon when I have more confidence in the numbers. It does not attempt to spill ARRAY_AGG at all yet. Regards, Jeff Davis diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml index 84341a30e5..9f978e5a90 100644 --- a/doc/src/sgml/config.sgml +++ b/doc/src/sgml/config.sgml @@ -1702,6 +1702,23 @@ include_dir 'conf.d' + + hashagg_mem_overflow (boolean) + + hashagg_mem_overflow configuration parameter + + + + + If hash aggregation exceeds work_mem at query + execution time, and hashagg_mem_overflow is set + to on, continue consuming more memory rather than + performing disk-based hash aggregation. The default + is off. + + + + max_stack_depth (integer) @@ -4354,6 +4371,24 @@ ANY num_sync ( + enable_hashagg_spill (boolean) + + enable_hashagg_spill configuration parameter + + + + +Enables or disables the query planner's use of hashed aggregation plan +types when the memory usage is expected to +exceed work_mem. This only affects the planner +choice; actual behavior at execution time is dictated by +. The default +is on. 
+ + + + enable_hashjoin (boolean) diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c index 92969636b7..6d6481a75f 100644 --- a/src/backend/commands/explain.c +++ b/src/backend/commands/explain.c @@ -102,6 +102,7 @@ static void show_tablesample(TableSampleClause *tsc, PlanState *planstate, List *ancestors, ExplainState *es); static void show_sort_info(SortState *sortstate, ExplainState *es); static void show_hash_info(HashState *hashstate, ExplainState *es); +static void show_hashagg_info(AggState *hashstate, ExplainState *es); static void show_tidbitmap_info(BitmapHeapScanState *planstate, ExplainState *es); static void show_instrumentation_count(const char *qlabel, int which, @@ -1826,6 +1827,7 @@ ExplainNode(PlanState *planstate, List *ancestors, case T_Agg: show_agg_keys(castNode(AggState, planstate), ancestors, es); show_upper_qual(plan->qual, "Filter", planstate, ancestors, es); + show_hashagg_info((AggState *) planstate, es); if (plan->qual) show_instrumentation_count("Rows Removed by Filter", 1, planstate, es); @@ -2715,6 +2717,55 @@ show_hash_info(HashState *hashstate, ExplainState *es) } } +/* + * Show information on hash aggregate buckets and batches + */ +static void +show_hashagg_info(AggState *aggstate, ExplainState *es) +{ + Agg *agg = (Agg *)aggstate->ss.ps.plan; + long memPeakKb = (aggstate->hash_mem_peak + 1023) / 1024; + long diskKb= (aggstate->hash_disk_used + 1023) / 1024; + + + Assert(IsA(aggstate, AggState)); + + if (agg->aggstrategy != AGG_HASHED && + agg->aggstrategy != AGG_MIXED) + return; + + if (es->format == EXPLAIN_FORMAT_TEXT) + { + appendStringInfoSpaces(es->str, es->indent * 2); + appendStringInfo( + es->str, + "Memory Usage: %ldkB", + memPeakKb); + + if (aggstate->hash_batches_used > 0) + { + appendStringInfo( +es->str, +" Batches: %d Disk Usage:%ldkB", +aggstate->hash_batches_used, diskKb); + } + + appendStringInfo( + es->str, + "\n"); + } + else + { + ExplainPropertyInteger("Peak Memory Usage", "kB", 
memPeakKb, es); + if (aggstate->hash_batches_used > 0) + { + ExplainPropertyInteger("HashAgg Batches", NULL, + aggstate->hash_batches_used, es); + ExplainPropertyInteger("Disk Usage", "kB", diskKb, es); + } + } +} + /* * If it's EXPLAIN ANALYZE, show exact/lossy pages for a BitmapHeapScan node */ diff --git a/src/backend/executor/execExprInterp.c b/src/backend/executor/execExprInterp.c index 66a67c72b2..19e1127627 100644 --- a/src/backend/executor/execExprInterp.c +++ b/src/backend/executor/execExprInterp.c @@ -1570,7 +1570,7 @@ ExecInterpExpr(ExprState *state, ExprContext *econtext, bool *isnull) [op->d.agg_init_trans.transno]; /* If transValue has not yet been initialized, do so now. */ - if (pergroup->noTransValue) + if (pergroup != NULL && pergroup->noTransValue) {
Re: Replacing the EDH SKIP primes
On Wed, Jul 03, 2019 at 08:56:42PM +0200, Daniel Gustafsson wrote: > Agreed, I’ve updated the patch with a comment on this formulated such that it > should stand the test of time even as OpenSSL changes etc. I'd rather we mention the warning issue explicitly, so as people don't get surprised, like that for example: * This is the 2048-bit DH parameter from RFC 3526. The generation of the * prime is specified in RFC 2412, which also discusses the design choice * of the generator. Note that when loaded with OpenSSL this causes * DH_check() to fail with DH_NOT_SUITABLE_GENERATOR, where leaking * a bit is preferred. Now this makes an OpenSSL-specific issue pop up within a section of the code where we want to make things more generic with SSL, so your simpler version has good arguments as well. I have just rechecked the shape of the key, and we have an exact match. -- Michael diff --git a/src/include/libpq/libpq-be.h b/src/include/libpq/libpq-be.h index 96415a9c8b..93581acb26 100644 --- a/src/include/libpq/libpq-be.h +++ b/src/include/libpq/libpq-be.h @@ -206,19 +206,20 @@ typedef struct Port * Hardcoded DH parameters, used in ephemeral DH keying. (See also * README.SSL for more details on EDH.) * - * If you want to create your own hardcoded DH parameters - * for fun and profit, review "Assigned Number for SKIP - * Protocols" (http://www.skip-vpn.org/spec/numbers.html) - * for suggestions. + * This is the 2048-bit DH parameter from RFC 3526. The generation of the + * prime is specified in RFC 2412, which also discusses the design choice + * of the generator. Note that when loaded with OpenSSL this causes + * DH_check() to fail with DH_NOT_SUITABLE_GENERATOR, where leaking + * a bit is preferred.
*/ #define FILE_DH2048 \ "-BEGIN DH PARAMETERS-\n\ -MIIBCAKCAQEA9kJXtwh/CBdyorrWqULzBej5UxE5T7bxbrlLOCDaAadWoxTpj0BV\n\ -89AHxstDqZSt90xkhkn4DIO9ZekX1KHTUPj1WV/cdlJPPT2N286Z4VeSWc39uK50\n\ -T8X8dryDxUcwYc58yWb/Ffm7/ZFexwGq01uejaClcjrUGvC/RgBYK+X0iP1YTknb\n\ -zSC0neSRBzZrM2w4DUUdD3yIsxx8Wy2O9vPJI8BD8KVbGI2Ou1WMuF040zT9fBdX\n\ -Q6MdGGzeMyEstSr/POGxKUAYEY18hKcKctaGxAMZyAcpesqVDNmWn6vQClCbAkbT\n\ -CD1mpF1Bn5x8vYlLIhkmuquiXsNV6TILOwIBAg==\n\ +MIIBCAKCAQEA///JD9qiIWjCNMTGYouA3BzRKQJOCIpnzHQCC76mOxOb\n\ +IlFKCHmONATd75UZs806QxswKwpt8l8UN0/hNW1tUcJF5IW1dmJefsb0TELppjft\n\ +awv/XLb0Brft7jhr+1qJn6WunyQRfEsf5kkoZlHs5Fs9wgB8uKFjvwWY2kg2HFXT\n\ +mmkWP6j9JM9fg2VdI9yjrZYcYvNWIIVSu57VKQdwlpZtZww1Tkq8mATxdGwIyhgh\n\ +fDKQXkYuNs474553LBgOhgObJ4Oi7Aeij7XFXfBvTFLJ3ivL9pVYFxg5lUl86pVq\n\ +5RXSJhiY+gUQFXKOWoqsqmj//wIBAg==\n\ -END DH PARAMETERS-\n" /* signature.asc Description: PGP signature
Re: Index Skip Scan
On Thu, Jul 04, 2019 at 10:06:11AM +1200, David Rowley wrote: > On Thu, 4 Jul 2019 at 09:02, James Coleman wrote: > > I think that example is the opposite direction of what David (Rowley) > > is saying. Unique on {a, b} implies unique on {a, b, c} while you're > > correct that the inverse doesn't hold. > > > > Unique on {a, b} also implies unique on {b, a} as well as on {b, a, c} > > and {c, a, b} and {c, b, a} and {a, c, b}, which is what makes this > > different from pathkeys. > > Yeah, exactly. A superset of the unique columns is still unique. Thanks for clarifying! Best, David. -- David Fetter http://fetter.org/ Phone: +1 415 235 3778 Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
Weird intermittent ECPG test failures
Observe the following buildfarm failures: https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=conchuela=2019-07-03%2013%3A33%3A59 https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=dory=2019-04-30%2014%3A45%3A26 https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=dory=2019-06-02%2015%3A15%3A26 https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=dory=2019-06-05%2006%3A15%3A26 https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=dory=2019-06-26%2002%3A00%3A26 https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=dory=2019-07-03%2022%3A15%3A27 https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hamerkop=2019-04-10%2011%3A00%3A09 https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hamerkop=2019-04-14%2012%3A42%3A31 https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hamerkop=2019-05-01%2011%3A00%3A08 https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hamerkop=2019-06-29%2011%3A33%3A25 (These represent all but a couple of the ECPG-Check failures in the last 90 days, excluding a spate around 31-May that were caused by an ill-considered code change.) In every one of these cases, the visible symptom is that one of the ECPG test programs produced precisely nothing on stdout. stderr seems okay though. (Not all of these cases have nonempty expected stderr, but enough do that we can say it's not a case of the program just failing entirely.) We've been seeing this sort of thing for a *long* time, although my recollection is that in the past it's almost always been the thread-thread case that failed, so that I'd supposed that there was some problem with that particular test. It's now clear that that's not true though, and there's seemingly some generic issue with the tests or test infrastructure. The other thing that I thought was invariably true was that it was a Windows-only thing. But here we have conchuela failing in the exact same way, and it is, um, not Windows. Anyone have a theory what might be going on here? regards, tom lane
Re: POC: converting Lists into arrays
On Thu, 4 Jul 2019 at 06:15, Tom Lane wrote: > > David Rowley writes: > > I've now read over the entire patch and have noted down the following: > > Thanks for the review! Attached is a v8 responding to most of your > comments. Anything not quoted below I just accepted. Thanks for the speedy turnaround. I've looked at v8, as far as a diff between the two patches and I'm happy. I've marked as ready for committer. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: Index Skip Scan
On Thu, 4 Jul 2019 at 09:02, James Coleman wrote: > > On Wed, Jul 3, 2019 at 3:46 PM David Fetter wrote: > > > > On Wed, Jul 03, 2019 at 12:27:09AM +1200, David Rowley wrote: > > > On Tue, 2 Jul 2019 at 21:00, Thomas Munro wrote: > > > > > > The more I think about these UniqueKeys, the more I think they need to > > > be a separate concept to PathKeys. For example, UniqueKeys: { x, y } > > > should be equivalent to { y, x }, but with PathKeys, that's not the > > > case, since the order of each key matters. UniqueKeys equivalent > > > version of pathkeys_contained_in() would not care about the order of > > > individual keys, it would say things like, { a, b, c } is contained in > > > { b, a }, since if the path is unique on columns { b, a } then it must > > > also be unique on { a, b, c }. > > > > Is that actually true, though? I can see unique {a, b, c} => unique > > {a, b}, but for example: > > > > a | b | c > > --|---|-- > > 1 | 2 | 3 > > 1 | 2 | 4 > > 1 | 2 | 5 > > > > is unique on {a, b, c} but not on {a, b}, at least as I understand the > > way "unique" is used here, which is 3 distinct {a, b, c}, but only one > > {a, b}. > > > > Or I could be missing something obvious, and in that case, please > > ignore. > > I think that example is the opposite direction of what David (Rowley) > is saying. Unique on {a, b} implies unique on {a, b, c} while you're > correct that the inverse doesn't hold. > > Unique on {a, b} also implies unique on {b, a} as well as on {b, a, c} > and {c, a, b} and {c, b, a} and {a, c, b}, which is what makes this > different from pathkeys. Yeah, exactly. A superset of the unique columns is still unique. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
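The conclusion of this subthread — uniqueness on a set of columns implies uniqueness on any superset, in any order, unlike PathKeys where order matters — is just set containment. A minimal sketch (hypothetical Python, not the planner code; `uniquekeys_imply` is an invented name):

```python
def uniquekeys_imply(known_unique, other):
    """A path known to be unique on known_unique is also unique on any
    superset of those columns, regardless of column order -- the
    UniqueKeys analogue of pathkeys_contained_in, but order-insensitive."""
    return set(known_unique) <= set(other)
```

So unique on {a, b} implies unique on {b, a}, {b, a, c}, {c, b, a}, and so on, while the reverse implication (from the larger set down to the smaller) does not hold — which is David Fetter's counterexample.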
Re: Avoiding hash join batch explosions with extreme skew and weird stats
On Tue, Jun 18, 2019 at 3:24 PM Melanie Plageman wrote: > > These questions will probably make a lot more sense with corresponding > code, so I will follow up with the second version of the state machine > patch once I finish it. > > I have changed the state machine and resolved the questions I had raised in the previous email. This seems to work for the parallel and non-parallel cases. I have not yet rewritten the unmatched outer tuple status as a bitmap in a spill file (for ease of debugging). Before doing that, I wanted to ask what a desirable fallback condition would be. In this patch, fallback to hashloop join happens only when inserting tuples into the hashtable after batch 0 when inserting another tuple from the batch file would exceed work_mem. This means you can't increase nbatches, which, I would think is undesirable. I thought a bit about when fallback should happen. So, let's say that we would like to fallback to hashloop join when we have increased nbatches X times. At that point, since we do not want to fall back to hashloop join for all batches, we have to make a decision. After increasing nbatches the Xth time, do we then fall back for all batches for which inserting inner tuples exceeds work_mem? Do we use this strategy but work_mem + some fudge factor? Or, do we instead try to determine if data skew led us to increase nbatches both times and then determine which batch, given new nbatches, contains that data, set fallback to true only for that batch, and let all other batches use the existing logic (with no fallback option) unless they contain a value which leads to increasing nbatches X number of times? -- Melanie Plageman From 2d6fec7d2bac90a41d4ec88ad5ac2011562a14a1 Mon Sep 17 00:00:00 2001 From: Melanie Plageman Date: Mon, 10 Jun 2019 10:54:42 -0700 Subject: [PATCH v2] hashloop fallback First part is to "chunk" the inner file into arbitrary partitions of work_mem size This chunks inner file and makes it so that the offset is along tuple bounds. 
Note that this makes it impossible to increase nbatches during the loading of batches after initial hashtable creation In preparation for doing this chunking, separate advance batch and load batch. advance batch only if page offset is reset to 0, then load that part of the batch Second part was to: implement outer tuple batch rewinding per chunk of inner batch Would be a simple rewind and replay of outer side for each chunk of inner if it weren't for LOJ. Because we need to wait to emit NULL-extended tuples for LOJ until after all chunks of inner have been processed. To do this, make a list with an entry for each outer tuple and keep track of its match status. Also, keep track of its offset so that we can access the file at that offset in case the tuples are not processed in order (like in parallel case) For non-hashloop fallback scenario, this list should not be constructed and unmatched outer tuples should be emitted as they are encountered. --- src/backend/executor/nodeHashjoin.c | 379 src/include/executor/hashjoin.h | 10 + src/include/nodes/execnodes.h | 12 + src/test/regress/expected/adaptive_hj.out | 402 ++ src/test/regress/parallel_schedule| 2 +- src/test/regress/serial_schedule | 1 + src/test/regress/sql/adaptive_hj.sql | 39 +++ 7 files changed, 770 insertions(+), 75 deletions(-) create mode 100644 src/test/regress/expected/adaptive_hj.out create mode 100644 src/test/regress/sql/adaptive_hj.sql diff --git a/src/backend/executor/nodeHashjoin.c b/src/backend/executor/nodeHashjoin.c index 8484a287e7..e46b453a9b 100644 --- a/src/backend/executor/nodeHashjoin.c +++ b/src/backend/executor/nodeHashjoin.c @@ -124,9 +124,11 @@ #define HJ_BUILD_HASHTABLE 1 #define HJ_NEED_NEW_OUTER 2 #define HJ_SCAN_BUCKET 3 -#define HJ_FILL_OUTER_TUPLE 4 -#define HJ_FILL_INNER_TUPLES 5 -#define HJ_NEED_NEW_BATCH 6 +#define HJ_FILL_INNER_TUPLES4 +#define HJ_NEED_NEW_BATCH 5 +#define HJ_NEED_NEW_INNER_CHUNK 6 +#define HJ_ADAPTIVE_EMIT_UNMATCHED_OUTER_INIT 7 +#define 
HJ_ADAPTIVE_EMIT_UNMATCHED_OUTER 8 /* Returns true if doing null-fill on outer relation */ #define HJ_FILL_OUTER(hjstate) ((hjstate)->hj_NullInnerTupleSlot != NULL) @@ -143,10 +145,15 @@ static TupleTableSlot *ExecHashJoinGetSavedTuple(HashJoinState *hjstate, BufFile *file, uint32 *hashvalue, TupleTableSlot *tupleSlot); -static bool ExecHashJoinNewBatch(HashJoinState *hjstate); +static bool ExecHashJoinAdvanceBatch(HashJoinState *hjstate); static bool ExecParallelHashJoinNewBatch(HashJoinState *hjstate); static void ExecParallelHashJoinPartitionOuter(HashJoinState *node); +static bool LoadInner(HashJoinState *hjstate); +static TupleTableSlot *ExecHashJoinGetOuterTupleAtOffset(HashJoinState *hjstate, off_t offset); +static void rewindOuter(BufFile *bufFile); +static TupleTableSlot * +emitUnmatchedOuterTuple(ExprState
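The core of the hashloop fallback Melanie describes — process the inner side one work_mem-sized chunk at a time, rescan the outer side per chunk, and track outer-tuple match status so NULL-extended rows for a left outer join are emitted only after the last chunk — can be sketched in miniature. This is a toy Python model with invented names, not the patch's C code; the `matched` bytearray plays the role of the proposed per-outer-tuple match bitmap.

```python
def hashloop_left_join(outer, inner, key, chunk_size):
    """Toy chunked ("hashloop") left outer join.

    The inner relation is loaded one chunk at a time (chunk_size stands
    in for work_mem), the outer side is rescanned for each chunk, and a
    bitmap records which outer tuples found a match in *any* chunk.
    Only after all chunks are done can unmatched outer tuples safely be
    NULL-extended -- emitting them per chunk would produce duplicates.
    """
    matched = bytearray((len(outer) + 7) // 8)  # one bit per outer tuple
    out = []
    for start in range(0, len(inner), chunk_size):
        chunk = {}                               # hash table over this chunk
        for row in inner[start:start + chunk_size]:
            chunk.setdefault(key(row), []).append(row)
        for i, orow in enumerate(outer):         # rescan outer per chunk
            for irow in chunk.get(key(orow), []):
                matched[i // 8] |= 1 << (i % 8)  # remember the match
                out.append((orow, irow))
    for i, orow in enumerate(outer):             # emit NULL-extended rows
        if not matched[i // 8] & (1 << (i % 8)):
            out.append((orow, None))
    return out
```

In the non-fallback case this bookkeeping is unnecessary, since the whole inner batch fits in one hash table and unmatched outer tuples can be emitted as they are encountered — which matches the patch's note about constructing the match-status list only for the fallback path.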
Re: [PATCH][PROPOSAL] Add enum releation option type
On 2019-Jul-03, Nikolay Shaplov wrote: > In a message of Monday, 1 July 2019, 14:06:28 MSK, Alvaro Herrera wrote: > > + GistBufferingValidMsg = gettext_noop("Valid values are \"on\", \"off\", > > and \"auto\"."); > > > > I think that's the most contentious point on this patch at this point > > (though I may be misremembering). > > I actually removed "and" from the list and let it be a simple comma-separated > list > > ERROR: invalid value for "check_option" option > > DETAIL: Valid values are: "local", "cascaded". > > Now we can translate the left part and substitute the list into the right part Yes, I saw that, and you know what? Nobody said they liked this idea. > You do not see it that way? I think this is easier to sell if you change that. -- Álvaro Herrerahttps://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Index Skip Scan
On Wed, Jul 3, 2019 at 3:46 PM David Fetter wrote: > > On Wed, Jul 03, 2019 at 12:27:09AM +1200, David Rowley wrote: > > On Tue, 2 Jul 2019 at 21:00, Thomas Munro wrote: > > > > The more I think about these UniqueKeys, the more I think they need to > > be a separate concept to PathKeys. For example, UniqueKeys: { x, y } > > should be equivalent to { y, x }, but with PathKeys, that's not the > > case, since the order of each key matters. UniqueKeys equivalent > > version of pathkeys_contained_in() would not care about the order of > > individual keys, it would say things like, { a, b, c } is contained in > > { b, a }, since if the path is unique on columns { b, a } then it must > > also be unique on { a, b, c }. > > Is that actually true, though? I can see unique {a, b, c} => unique > {a, b}, but for example: > > a | b | c > --|---|-- > 1 | 2 | 3 > 1 | 2 | 4 > 1 | 2 | 5 > > is unique on {a, b, c} but not on {a, b}, at least as I understand the > way "unique" is used here, which is 3 distinct {a, b, c}, but only one > {a, b}. > > Or I could be missing something obvious, and in that case, please > ignore. I think that example is the opposite direction of what David (Rowley) is saying. Unique on {a, b} implies unique on {a, b, c} while you're correct that the inverse doesn't hold. Unique on {a, b} also implies unique on {b, a} as well as on {b, a, c} and {c, a, b} and {c, b, a} and {a, c, b}, which is what makes this different from pathkeys. James Coleman
Re: SQL/JSON path issues/questions
Hi! On Wed, Jul 3, 2019 at 5:27 PM Liudmila Mantrova wrote: > > I have rechecked the standard and I agree that we should use "filter > expression" whenever possible. > "A filter expression must be enclosed in parentheses..." looks like an > oversight, so I fixed it. As for what's actually enclosed, I believe we > can still use the word "condition" here as it's easy to understand and > is already used in our docs, e.g. in description of the WHERE clause > that serves a similar purpose. > The new version of the patch fixes the terminology, tweaks the examples, > and provides some grammar and style fixes in the jsonpath-related chapters. It looks good to me. But this sentence looks a bit too complicated. "It can be followed by one or more accessor operators to define the JSON element on a lower nesting level by which to filter the result." Could we phrase this as following? "In order to filter the result by values lying on lower nesting level, @ operator can be followed by one or more accessor operators." -- Alexander Korotkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: Custom Scan coverage.
On 2019-Jul-03, didier wrote: > Currently there's 0 coverage of CustomScan code path in core. Yeah :-( > What about adding a no-op custom_scan test in src/test/modules/ ? Or > is it out of pg perimeter and each extension using it should take care > of themselves? > > If there's an interest I'm willing to write and propose such a test suite. We'd certainly like it if there was coverage of that infrastructure. -- Álvaro Herrerahttps://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Custom Scan coverage.
Hi, Currently there's 0 coverage of the CustomScan code path in core. What about adding a no-op custom_scan test in src/test/modules/ ? Or is it out of pg's perimeter and each extension using it should take care of themselves? If there's an interest I'm willing to write and propose such a test suite. Regards Didier.
Re: Index Skip Scan
On Wed, Jul 03, 2019 at 12:27:09AM +1200, David Rowley wrote:
> On Tue, 2 Jul 2019 at 21:00, Thomas Munro wrote:
>
> The more I think about these UniqueKeys, the more I think they need to
> be a separate concept to PathKeys. For example, UniqueKeys: { x, y }
> should be equivalent to { y, x }, but with PathKeys, that's not the
> case, since the order of each key matters. UniqueKeys equivalent
> version of pathkeys_contained_in() would not care about the order of
> individual keys, it would say things like, { a, b, c } is contained in
> { b, a }, since if the path is unique on columns { b, a } then it must
> also be unique on { a, b, c }.

Is that actually true, though? I can see unique {a, b, c} => unique
{a, b}, but for example:

a | b | c
--|---|--
1 | 2 | 3
1 | 2 | 4
1 | 2 | 5

is unique on {a, b, c} but not on {a, b}, at least as I understand the
way "unique" is used here, which is 3 distinct {a, b, c}, but only one
{a, b}. Or I could be missing something obvious, and in that case,
please ignore.

Best,
David.

--
David Fetter http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate
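[Editorial aside, not part of the thread: the direction of the implication above is easy to check mechanically. The sketch below, in Python rather than PostgreSQL's C, counts distinct projections of the example rows; `is_unique` and the row literals are illustrative names, not anything from the patch.]

```python
# A quick way to check "uniqueness on a column set": a relation is unique on
# a set of columns iff projecting each row onto those columns yields no
# duplicates. The rows are the ones from the example in the mail above.

def is_unique(rows, cols):
    """True if no two rows share the same values for the given columns."""
    projected = [tuple(row[c] for c in cols) for row in rows]
    return len(set(projected)) == len(projected)

rows = [
    {"a": 1, "b": 2, "c": 3},
    {"a": 1, "b": 2, "c": 4},
    {"a": 1, "b": 2, "c": 5},
]

# Unique on {a, b, c}, but NOT on the subset {a, b}: uniqueness on a column
# set carries over to supersets of columns, not to subsets.
print(is_unique(rows, ["a", "b", "c"]))  # True
print(is_unique(rows, ["a", "b"]))       # False
```

This matches the table in the mail: three distinct {a, b, c} tuples, but only one distinct {a, b} pair.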
Re: POC: converting Lists into arrays
David Rowley writes:
> I also did some more benchmarking of the patch. ...
> Which makes the patched version 2.2% faster than master on that run.

BTW, further on the subject of performance --- I'm aware of at least
these topics for follow-on patches:

* Fix places that are maintaining arrays parallel to Lists for
access-speed reasons (at least simple_rte_array, append_rel_array,
es_range_table_array).

* Look at places using lcons/list_delete_first to maintain FIFO lists.
The patch makes these O(N^2) for long lists. If we can reverse the list
order and use lappend/list_truncate instead, it'd be better. Possibly in
some places the list ordering is critical enough to make this
impractical, but I suspect it's an easy win in most.

* Rationalize places that are using combinations of list_copy and
list_concat, probably by inventing an additional list-concatenation
primitive that modifies neither input.

* Adjust API for list_qsort(), as discussed, to save indirections and
avoid constructing an intermediate pointer array. I also seem to recall
one place in the planner that's avoiding using list_qsort by manually
flattening a list into an array, qsort'ing, and rebuilding the list :-(

I don't think that any one of these fixes would move the needle very
much on "typical" simple workloads, but it's reasonable to hope that in
aggregate they'd make for a noticeable improvement. In the meantime, I'm
gratified that the initial patch at least doesn't seem to have lost any
ground.

regards, tom lane
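[Editorial aside, not part of the thread: the second bullet's "reverse the list order" trick can be sketched in a few lines. This is Python for brevity, not the C List code; both function names are made up for illustration.]

```python
# Once Lists are array-backed, prepending (lcons) shifts every existing
# element, so building an N-element FIFO by prepending costs O(N^2). The
# trick discussed above: keep the list in reversed order and append instead
# (O(1) amortized); consumers then take elements from the tail, analogous to
# lappend/list_truncate.

def build_with_prepend(items):
    result = []
    for x in items:
        result.insert(0, x)       # O(n) per call: shifts everything right
    return result

def build_reversed_with_append(items):
    result = []
    for x in items:
        result.append(x)          # O(1) amortized per call
    return result                 # logically the reverse of the above

items = [1, 2, 3, 4]
assert build_with_prepend(items) == list(reversed(build_reversed_with_append(items)))
```

The catch Tom notes applies here too: this only works when callers can tolerate (or compensate for) the reversed physical order.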
Re: Replacing the EDH SKIP primes
> On 3 Jul 2019, at 12:11, Michael Paquier wrote: > It would be nice to add a comment on that though, perhaps in > libpq-be.h where the key is defined. Agreed, I’ve updated the patch with a comment on this formulated such that it should stand the test of time even as OpenSSL changes etc. cheers ./daniel skip_primes-v2.patch Description: Binary data
Re: POC: converting Lists into arrays
On 2019-Jul-03, Tom Lane wrote:

> David Rowley writes:
> > 10. I wonder if we can reduce a bit of pain for extension authors by
> > back patching a macro that wraps up a lnext() call adding a dummy
> > argument for the List.
>
> I was wondering about a variant of that yesterday; specifically,
> I thought of naming the new 2-argument function list_next() not lnext().
> Then we could add "#define list_next(l,c) lnext(c)" in the back branches.
> This would simplify back-patching code that used the new definition, and
> it might save some effort for extension authors who are trying to maintain
> cross-branch code. On the other hand, it's more keystrokes forevermore,
> and I'm not entirely convinced that code that's using lnext() isn't likely
> to need other adjustments anyway. So I didn't pull the trigger on that,
> but if people like the idea I'd be okay with doing it like that.

I was thinking about this issue too earlier today, and my conclusion is
that the way you have it in v7 is fine, because lnext() callsites are
not that numerous, so the cost to third-party code authors is not that
high; the other arguments trump this consideration IMO. I say this as
someone who curses every time he has to backpatch things across the
heap_open / table_open change -- but there are a lot more calls of that.

> Indeed. I don't want to expend a lot of effort keeping it in sync
> with master over a long period, either. Opinions?

Yeah, let's get it done soon.

--
Álvaro Herrera    https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: pg_waldump and PREPARE
On Wed, Jul 3, 2019 at 5:21 PM Fujii Masao wrote: > > On Tue, Jul 2, 2019 at 7:16 PM Julien Rouhaud wrote: > > > > On Fri, Apr 26, 2019 at 5:38 AM Michael Paquier wrote: > > > > > > On Thu, Apr 25, 2019 at 03:08:36PM -0400, Alvaro Herrera wrote: > > > > On 2019-Apr-26, Fujii Masao wrote: > > > >> I did that arrangement because the format of PREPARE TRANSACTION > > > >> record, > > > >> i.e., that struct, also needs to be accessed in backend and frontend. > > > >> But, of course, if there is smarter way, I'm happy to adopt that! > > > > > > > > I don't know. I spent some time staring at the code involved, and it > > > > seems it'd be possible to improve just a little bit on cleanliness > > > > grounds, with a lot of effort, but not enough practical value. > > > > > > Describing those records is something we should do. There are other > > > parsing routines in xactdesc.c for commit and abort records, so having > > > that extra routine for prepare at the same place does not sound > > > strange to me. > > > > > > +typedef xl_xact_prepare TwoPhaseFileHeader; > > > I find this mapping implementation a bit lazy, and your > > > newly-introduced xl_xact_prepare does not count for all the contents > > > of the actual WAL record for PREPARE TRANSACTION. Wouldn't it be > > > better to put all the contents of the record in the same structure, > > > and not only the 2PC header information? > > > > This patch doesn't apply anymore, could you send a rebase? > > Yes, attached is the updated version of the patch. Thanks! So the patch compiles and works as intended. I don't have much to say about it, it all looks good to me, since the concerns about xactdesc.c aren't worth the trouble. I'm not sure that I understand Michael's objection though, as xl_xact_prepare is not a new definition and AFAICS it couldn't contain the records anyway. So I'll let him say if he has further objections or if it's ready for committer!
Re: Index Skip Scan
Hi Thomas and David, Thanks for the feedback ! On 7/2/19 8:27 AM, David Rowley wrote: On Tue, 2 Jul 2019 at 21:00, Thomas Munro wrote: I took this for a quick spin today. The DISTINCT ON support is nice and I think it will be very useful. I've signed up to review it and will have more to say later. But today I had a couple of thoughts after looking into how src/backend/optimizer/plan/planagg.c works and wondering how to do some more skipping tricks with the existing machinery. 1. SELECT COUNT(DISTINCT i) FROM t could benefit from this. (Or AVG(DISTINCT ...) or any other aggregate). Right now you get a seq scan, with the sort/unique logic inside the Aggregate node. If you write SELECT COUNT(*) FROM (SELECT DISTINCT i FROM t) ss then you get a skip scan that is much faster in good cases. I suppose you could have a process_distinct_aggregates() in planagg.c that recognises queries of the right form and generates extra paths a bit like build_minmax_path() does. I think it's probably better to consider that in the grouping planner proper instead. I'm not sure. I think to make that happen we'd need to do a bit of an overhaul in nodeAgg.c to allow it to make use of presorted results instead of having the code blindly sort rows for each aggregate that has a DISTINCT or ORDER BY. The planner would also then need to start requesting paths with pathkeys that suit the aggregate and also probably dictate the order the AggRefs should be evaluated to allow all AggRefs to be evaluated that can be for each sort order. Once that part is done then the aggregates could then also request paths with certain "UniqueKeys" (a feature I mentioned in [1]), however we'd need to be pretty careful with that one since there may be other parts of the query that require that all rows are fed in, not just 1 row per value of "i", e.g SELECT COUNT(DISTINCT i) FROM t WHERE z > 0; can't just feed through 1 row for each "i" value, since we need only the ones that have "z > 0". 
Getting the first part of this solved is much more important than making skip scans work here, I'd say. I think we need to be able to walk before we can run with DISTINCT / ORDER BY aggs. I agree that the above is outside of scope for the first patch -- I think the goal should be the simple use-cases for IndexScan and IndexOnlyScan. Maybe we should expand [1] with possible cases, so we don't lose track of the ideas. 2. SELECT i, MIN(j) FROM t GROUP BY i could benefit from this if you're allowed to go forwards. Same for SELECT i, MAX(j) FROM t GROUP BY i if you're allowed to go backwards. Those queries are equivalent to SELECT DISTINCT ON (i) i, j FROM t ORDER BY i [DESC], j [DESC] (though as Floris noted, the backwards version gives the wrong answers with v20). That does seem like a much more specific thing applicable only to MIN and MAX, and I think preprocess_minmax_aggregates() could be taught to handle that sort of query, building an index only scan path with skip scan in build_minmax_path(). For the MIN query you just need a path with Pathkeys: { i ASC, j ASC }, UniqueKeys: { i, j }, doing the MAX query you just need j DESC. Ok. The more I think about these UniqueKeys, the more I think they need to be a separate concept to PathKeys. For example, UniqueKeys: { x, y } should be equivalent to { y, x }, but with PathKeys, that's not the case, since the order of each key matters. UniqueKeys equivalent version of pathkeys_contained_in() would not care about the order of individual keys, it would say things like, { a, b, c } is contained in { b, a }, since if the path is unique on columns { b, a } then it must also be unique on { a, b, c }. I'm looking at this, and will keep this in mind. Thanks ! [1] https://wiki.postgresql.org/wiki/Loose_indexscan Best regards, Jesper
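[Editorial aside, not part of the thread: for readers new to the feature, the "skip" in index skip scan can be illustrated with a toy model. The sketch below uses Python's bisect on a sorted list as a stand-in for a btree; the real patch re-descends the btree inside the index AM, so this is only the idea, not the implementation.]

```python
# Rough sketch of the loose-index-scan idea: instead of walking every index
# entry, jump past the whole run of duplicates of the current leading key to
# reach the next distinct value. On a btree the "jump" is a fresh descent;
# here it is a binary search over the sorted list.

from bisect import bisect_right

def distinct_leading_keys(sorted_keys):
    """Return the distinct values of a sorted list, skipping duplicate runs."""
    result = []
    i = 0
    while i < len(sorted_keys):
        value = sorted_keys[i]
        result.append(value)
        # Skip to the first entry greater than the current value.
        i = bisect_right(sorted_keys, value, i)
    return result

print(distinct_leading_keys([1, 1, 1, 2, 2, 5, 5, 5, 5, 9]))  # [1, 2, 5, 9]
```

With few distinct values over many rows, this does O(distinct * log n) work instead of O(n), which is why DISTINCT and DISTINCT ON benefit.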
Re: proposal: pg_restore --convert-to-text
Alvaro Herrera wrote:

> So you suggest that it should be
>
> pg_restore: error: one of -d/--dbname, -f/--file and -l/--list must be
> specified
> ?

I'd suggest this minimal fix:

 (int argc, char **argv)
 	/* Complain if neither -f nor -d was specified (except if dumping TOC) */
 	if (!opts->dbname && !opts->filename && !opts->tocSummary)
 	{
-		pg_log_error("one of -d/--dbname and -f/--file must be specified");
+		pg_log_error("-d/--dbname or -f/--file or -l/--list must be specified");
+		fprintf(stderr, _("Try \"%s --help\" for more information.\n"),
+				progname);
 		exit_nicely(1);
 	}

> So you suggest that it should be
> pg_restore [connection-option...] { -d | -f | -l } [option...] [filename]
> ?

Looking at the other commands, it doesn't seem that we use this form for
any of those that require at least one argument, for instance:

===
$ ./pg_basebackup
pg_basebackup: error: no target directory specified
Try "pg_basebackup --help" for more information.

$ ./pg_basebackup --help
pg_basebackup takes a base backup of a running PostgreSQL server.

Usage:
  pg_basebackup [OPTION]...

$ ./pg_checksums
pg_checksums: error: no data directory specified
Try "pg_checksums --help" for more information.

$ ./pg_checksums --help
pg_checksums enables, disables, or verifies data checksums in a PostgreSQL
database cluster.

Usage:
  pg_checksums [OPTION]... [DATADIR]

$ ./pg_rewind
pg_rewind: error: no source specified (--source-pgdata or --source-server)
Try "pg_rewind --help" for more information.

$ ./pg_rewind --help
pg_rewind resynchronizes a PostgreSQL cluster with another copy of the cluster.

Usage:
  pg_rewind [OPTION]...
===

"pg_restore [OPTION]... [FILE]" appears to be consistent with those, even
with the new condition that no option is an error, so it's probably okay.

> Output target options:
>   -l, --list             print summarized TOC of the archive
>   -d, --dbname=NAME      connect to database name
>   -f, --file=FILENAME    output file name (- for stdout)

That makes sense.
I would put this section first, before "General options". Best regards, -- Daniel Vérité PostgreSQL-powered mailer: http://www.manitou-mail.org Twitter: @DanielVerite
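[Editorial aside, not part of the thread: the validation rule under discussion is simply "require at least one of three output targets, else print the error plus the --help hint". A minimal Python stand-in for the C snippet above, with illustrative names:]

```python
# Sketch of the "at least one of -d, -f, -l" check being discussed. The
# function name and parameters are hypothetical; the real check lives in
# pg_restore's C main() and calls exit_nicely(1) instead of returning.

import sys

def check_output_target(dbname=None, filename=None, toc_summary=False):
    """Return True if some output target was given; print the error otherwise."""
    if not (dbname or filename or toc_summary):
        print("pg_restore: error: -d/--dbname or -f/--file or -l/--list "
              "must be specified", file=sys.stderr)
        print('Try "pg_restore --help" for more information.', file=sys.stderr)
        return False
    return True
```

The point of the thread: the error text should name all three accepted options, and follow the `Try "--help"` convention the other tools use.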
Re: Optimize partial TOAST decompression
Paul Ramsey wrote on Tue, Jul 2, 2019 at 10:46 PM:
> This looks good to me. A little commentary around why
> pglz_maximum_compressed_size() returns a universally correct answer
> (there's no way the compressed size can ever be larger than this
> because...) would be nice for peasants like myself.
>
> If you're looking to continue down this code line in your next patch,
> the next TODO item is a little more involved: a user-land (ala
> PG_DETOAST_DATUM) iterator API for access of TOAST datums would allow
> the optimization of searching of large objects like JSONB types, and
> so on, where the thing you are looking for is not at a known location
> in the object. So, things like looking for a particular substring in a
> string, or looking for a particular key in a JSONB. "Iterate until you
> find the thing." would allow optimization of some code lines that
> currently require full decompression of the objects.
>
> P.

Thanks for your comment. I've updated the patch. As for the iterator
API, I have actually implemented a de-TOAST iterator[0], and I'm looking
for more application scenarios for it while continuing to refine it. Any
comments would be much appreciated.

Best Regards, Binguo Bao.
[0] https://www.postgresql.org/message-id/flat/cal-ogks_onzpc9m9bxpcztmofwulcfkyecekiagxzwrl8kx...@mail.gmail.com

From 2e4e2838937ec6fa1404fe529e7ed303e391d1b2 Mon Sep 17 00:00:00 2001
From: BBG
Date: Sun, 2 Jun 2019 19:18:46 +0800
Subject: [PATCH] Optimize partial TOAST decompression

---
 src/backend/access/heap/tuptoaster.c | 24 +---
 src/common/pg_lzcompress.c           | 26 ++
 src/include/common/pg_lzcompress.h   |  1 +
 3 files changed, 44 insertions(+), 7 deletions(-)

diff --git a/src/backend/access/heap/tuptoaster.c b/src/backend/access/heap/tuptoaster.c
index 55d6e91..684f1b2 100644
--- a/src/backend/access/heap/tuptoaster.c
+++ b/src/backend/access/heap/tuptoaster.c
@@ -266,6 +266,7 @@ heap_tuple_untoast_attr_slice(struct varlena *attr,
 	if (VARATT_IS_EXTERNAL_ONDISK(attr))
 	{
 		struct varatt_external toast_pointer;
+		int32		max_size;

 		VARATT_EXTERNAL_GET_POINTER(toast_pointer, attr);

@@ -273,8 +274,13 @@ heap_tuple_untoast_attr_slice(struct varlena *attr,
 		if (!VARATT_EXTERNAL_IS_COMPRESSED(toast_pointer))
 			return toast_fetch_datum_slice(attr, sliceoffset, slicelength);

-		/* fetch it back (compressed marker will get set automatically) */
-		preslice = toast_fetch_datum(attr);
+		max_size = pglz_maximum_compressed_size(sliceoffset + slicelength,
+												toast_pointer.va_rawsize);
+		/*
+		 * Be sure to get enough compressed slice
+		 * and compressed marker will get set automatically
+		 */
+		preslice = toast_fetch_datum_slice(attr, 0, max_size);
 	}
 	else if (VARATT_IS_EXTERNAL_INDIRECT(attr))
 	{
@@ -2031,7 +2037,8 @@ toast_fetch_datum(struct varlena *attr)
  * Reconstruct a segment of a Datum from the chunks saved
  * in the toast relation
  *
- * Note that this function only supports non-compressed external datums.
+ * Note that this function supports non-compressed external datums
+ * and compressed external datum slices at the start of the object.
  * --
  */
 static struct varlena *
@@ -2072,10 +2079,9 @@ toast_fetch_datum_slice(struct varlena *attr, int32 sliceoffset, int32 length)
 	VARATT_EXTERNAL_GET_POINTER(toast_pointer, attr);

 	/*
-	 * It's nonsense to fetch slices of a compressed datum -- this isn't lo_*
-	 * we can't return a compressed datum which is meaningful to toast later
+	 * It's meaningful to fetch slices at the start of a compressed datum.
 	 */
-	Assert(!VARATT_EXTERNAL_IS_COMPRESSED(toast_pointer));
+	Assert(!VARATT_EXTERNAL_IS_COMPRESSED(toast_pointer) || 0 == sliceoffset);

 	attrsize = toast_pointer.va_extsize;
 	totalchunks = ((attrsize - 1) / TOAST_MAX_CHUNK_SIZE) + 1;
@@ -2091,7 +2097,11 @@ toast_fetch_datum_slice(struct varlena *attr, int32 sliceoffset, int32 length)

 	result = (struct varlena *) palloc(length + VARHDRSZ);

-	SET_VARSIZE(result, length + VARHDRSZ);
+	if (VARATT_EXTERNAL_IS_COMPRESSED(toast_pointer)) {
+		SET_VARSIZE_COMPRESSED(result, length + VARHDRSZ);
+	} else {
+		SET_VARSIZE(result, length + VARHDRSZ);
+	}

 	if (length == 0)
 		return result;			/* Can save a lot of work at this point! */
diff --git a/src/common/pg_lzcompress.c b/src/common/pg_lzcompress.c
index 988b398..80ed17a 100644
--- a/src/common/pg_lzcompress.c
+++ b/src/common/pg_lzcompress.c
@@ -771,3 +771,29 @@ pglz_decompress(const char *source, int32 slen, char *dest,
 	 */
 	return (char *) dp - dest;
 }
+
+
+
+/* --
+ * pglz_max_compressed_size -
+ *
+ *	Calculate the maximum size of the compressed slice corresponding to the
+ *	raw slice. Return the maximum size, or raw size if maximum size is larger
+ *	than raw size.
+ * --
+ */
+int32
+pglz_maximum_compressed_size(int32 raw_slice_size, int32 raw_size)
+{
+	int32		result;
+
+	/*
+	 * Use int64 to prevent overflow during calculation.
+	 */
+	result =
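[Editorial aside, not part of the thread: the patch above is cut off mid-function, so here is a hedged sketch of the bound it is computing, based on my reading of the thread rather than the actual C code. The assumption used: pglz emits one control bit per item, i.e. one control byte per 8 bytes of output, so in the worst case (pure literals, no back-references) producing N raw bytes consumes about N * 9/8 compressed bytes.]

```python
# Illustrative Python model of pglz_maximum_compressed_size: the smallest
# prefix of the compressed data guaranteed to contain at least
# raw_slice_size bytes of decompressed output. In the worst case every byte
# is a literal, costing 1 data byte plus 1/8 of a control byte. The answer
# never needs to exceed the size of the whole compressed datum.

def pglz_maximum_compressed_size(raw_slice_size: int, total_size: int) -> int:
    # Python ints don't overflow; the C patch computes this in int64 for the
    # same reason its comment mentions overflow.
    worst_case = (raw_slice_size * 9 + 8) // 8
    return min(worst_case, total_size)

# Fetching the first 100 raw bytes needs at most 113 compressed bytes:
print(pglz_maximum_compressed_size(100, 1000))  # 113
```

This is why fetching only a prefix of the compressed chunks (as the patch does in toast_fetch_datum_slice) is always safe for slices starting at offset 0.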
Re: [PATCH v5] Show detailed table persistence in \dt+
Daniel Gustafsson writes: >> On 2 Jul 2019, at 22:35, Alvaro Herrera wrote: >> Anyway I'm not objecting to the patch -- I agree that we're already not >> testing translatability and that this patch shouldn't be forced to start >> doing it. > I forgot to add that to my previous email, the patch as it stands in v8 looks > good to me. I’ve missed having this on many occasions. OK, pushed. For the record, I did verify that the translatability logic worked by adding some bogus entries to psql/po/es.po and seeing that the display changed to match. regards, tom lane
Re: Increasing default value for effective_io_concurrency?
On Wed, Jul 3, 2019 at 11:24 AM Tomas Vondra wrote: > Maybe. And it would probably work for the systems I used for benchmarks. > > It however assumes two things: (a) the storage system actually has > spindles and (b) you know how many spindles there are. Which is becoming > less and less safe these days - flash storage becomes pretty common, and > even when there are spindles they are often hidden behind the veil of > virtualization in a SAN, or something. Yeah, that's true. > I wonder if we might provide something like pg_test_prefetch which would > measure performance with different values, similarly to pg_test_fsync. That's not a bad idea, but I'm not sure if the results that we got in a synthetic test - presumably unloaded - would be a good guide to what to use in a production situation. Maybe it would; I'm just not sure. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Increasing default value for effective_io_concurrency?
On Wed, Jul 03, 2019 at 11:04:59AM -0400, Robert Haas wrote: On Mon, Jul 1, 2019 at 7:32 PM Andres Freund wrote: On 2019-06-29 22:15:19 +0200, Tomas Vondra wrote: > I think we should consider changing the effective_io_concurrency default > value, i.e. the guc that determines how many pages we try to prefetch in > a couple of places (the most important being Bitmap Heap Scan). Maybe we need improve the way it's used / implemented instead - it seems just too hard to determine the correct setting as currently implemented. Perhaps the translation from effective_io_concurrency to a prefetch distance, which is found in the slightly-misnamed ComputeIoConcurrency function, should be changed. The comments therein say: * Experimental results show that both of these formulas aren't aggressive * enough, but we don't really have any better proposals. Perhaps we could test experimentally what works well with N spindles and then fit a formula to that curve and stick it in here, so that our tuning is based on practice rather than theory. I'm not sure if that approach is adequate or not. It just seems like something to try. Maybe. And it would probably work for the systems I used for benchmarks. It however assumes two things: (a) the storage system actually has spindles and (b) you know how many spindles there are. Which is becoming less and less safe these days - flash storage becomes pretty common, and even when there are spindles they are often hidden behind the veil of virtualization in a SAN, or something. I wonder if we might provide something like pg_test_prefetch which would measure performance with different values, similarly to pg_test_fsync. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: pg_waldump and PREPARE
On Tue, Jul 2, 2019 at 7:16 PM Julien Rouhaud wrote: > > On Fri, Apr 26, 2019 at 5:38 AM Michael Paquier wrote: > > > > On Thu, Apr 25, 2019 at 03:08:36PM -0400, Alvaro Herrera wrote: > > > On 2019-Apr-26, Fujii Masao wrote: > > >> I did that arrangement because the format of PREPARE TRANSACTION record, > > >> i.e., that struct, also needs to be accessed in backend and frontend. > > >> But, of course, if there is smarter way, I'm happy to adopt that! > > > > > > I don't know. I spent some time staring at the code involved, and it > > > seems it'd be possible to improve just a little bit on cleanliness > > > grounds, with a lot of effort, but not enough practical value. > > > > Describing those records is something we should do. There are other > > parsing routines in xactdesc.c for commit and abort records, so having > > that extra routine for prepare at the same place does not sound > > strange to me. > > > > +typedef xl_xact_prepare TwoPhaseFileHeader; > > I find this mapping implementation a bit lazy, and your > > newly-introduced xl_xact_prepare does not count for all the contents > > of the actual WAL record for PREPARE TRANSACTION. Wouldn't it be > > better to put all the contents of the record in the same structure, > > and not only the 2PC header information? > > This patch doesn't apply anymore, could you send a rebase? Yes, attached is the updated version of the patch. Regards, -- Fujii Masao prepare-xact-desc_v2.patch Description: Binary data
Re: Increasing default value for effective_io_concurrency?
On Mon, Jul 1, 2019 at 7:32 PM Andres Freund wrote: > On 2019-06-29 22:15:19 +0200, Tomas Vondra wrote: > > I think we should consider changing the effective_io_concurrency default > > value, i.e. the guc that determines how many pages we try to prefetch in > > a couple of places (the most important being Bitmap Heap Scan). > > Maybe we need improve the way it's used / implemented instead - it seems > just too hard to determine the correct setting as currently implemented. Perhaps the translation from effective_io_concurrency to a prefetch distance, which is found in the slightly-misnamed ComputeIoConcurrency function, should be changed. The comments therein say: * Experimental results show that both of these formulas aren't aggressive * enough, but we don't really have any better proposals. Perhaps we could test experimentally what works well with N spindles and then fit a formula to that curve and stick it in here, so that our tuning is based on practice rather than theory. I'm not sure if that approach is adequate or not. It just seems like something to try. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
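[Editorial aside, not part of the thread: for readers without the source handy, the heuristic in ComputeIoConcurrency can be sketched as follows. This is a paraphrase of my reading of the code comments, not the exact C code: with n spindles, the target prefetch distance is roughly n times the n-th harmonic number, aiming to keep each drive busy.]

```python
# Sketch of the spindle-count heuristic discussed above (assumed formula:
# distance ~= n * H(n), where H(n) is the n-th harmonic number). This is the
# "theory" side that the experimental comment says is not aggressive enough.

def prefetch_distance(num_spindles: int) -> float:
    harmonic = sum(1.0 / i for i in range(1, num_spindles + 1))
    return num_spindles * harmonic

# 1 spindle -> 1.0; the distance grows a bit faster than linearly:
print(round(prefetch_distance(1), 2))   # 1.0
print(round(prefetch_distance(4), 2))   # 8.33
```

Curve-fitting against measurements, as Robert suggests, would replace this closed form with an empirically tuned one.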
Re: [PATCH v4] Add \warn to psql
Hello Tom,

>> The point is that there would be at least *one* TAP tests so that many
>> other features of psql, although not all, can be tested. [...]

> Yeah, but the point I was trying to make is that that's mostly down to
> laziness.

Not always. I agree that using a TAP test when another simpler option is
available is not a good move. However, in the current state, as soon as
there is some variation a test is removed and coverage is lost, but they
could be kept if the check could be against a regexp.

> I see no reason that we couldn't be covering a lot of these features in
> src/test/regress/sql/psql.sql, with far less overhead. The interactive
> aspects of psql can't be tested that way ... but since this patch
> doesn't actually provide any way to test those, it's not much of a
> proof-of-concept.

The PoC is checking against a set of regexps instead of expecting an
exact output. Ok, it does not solve all possible test scenarios, that is
life.

> IOW, the blocking factor here is not "does src/bin/psql/t/ exist", it's
> "has somebody written a test that moves the coverage needle
> meaningfully". I'm not big on adding a bunch of overhead first and just
> hoping somebody will do something to make it worthwhile later.

I do intend to add coverage once a psql TAP test is available, as I have
done with pgbench. Ok, some of the changes are still in the long CF
queue, but at least pgbench coverage is around 90%. I also intend to
direct submitted patches to use the TAP infra when appropriate, instead
of "no tests, too bad".

--
Fabien.
Re: SQL/JSON path issues/questions
On 6/28/19 6:47 AM, Alexander Korotkov wrote: On Tue, Jun 25, 2019 at 6:38 PM Liudmila Mantrova wrote: Thank you for the catch! Please see the modified version of patch 0004 attached. I tried to review and revise the part related to filters, but I failed because I don't understand the notions used in the documentation. What is the difference between filter expression and filter condition? I can guess that filter expression contains question mark, parentheses and filter condition inside. But this sentence is in contradiction with my guess: "A filter expression must be enclosed in parentheses and preceded by a question mark". So, filter expression is inside the parentheses. Then what is filter condition? The same? Each filter expression can provide one or more filters that are applied to the result of the path evaluation. So additionally to filter condition and filter expression we introduce the notion of just filter. What is it? Could we make it without introduction of new notion? -- Alexander Korotkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company Hi, I have rechecked the standard and I agree that we should use "filter expression" whenever possible. "A filter expression must be enclosed in parentheses..." looks like an oversight, so I fixed it. As for what's actually enclosed, I believe we can still use the word "condition" here as it's easy to understand and is already used in our docs, e.g. in description of the WHERE clause that serves a similar purpose. The new version of the patch fixes the terminology, tweaks the examples, and provides some grammar and style fixes in the jsonpath-related chapters. 
--
Liudmila Mantrova
Technical writer at Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 3a8581d..b0de624 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -11538,7 +11538,8 @@ table2-mapping
    from the JSON data, similar to XPath expressions used for SQL access
    to XML. In PostgreSQL, path expressions are implemented as the jsonpath
-   data type, described in .
+   data type and can use any elements described in
+   .

    JSON query functions and operators

@@ -11585,7 +11586,7 @@
 }, {
   "location": [ 47.706, 13.2635 ],
   "start time": "2018-10-14 10:39:21",
-  "HR": 130
+  "HR": 135
 } ] } }

@@ -11637,23 +11638,33 @@ table2-mapping
    When defining the path, you can also use one or more
-   filter expressions, which work similar to
-   the WHERE clause in SQL. Each filter expression
-   can provide one or more filtering conditions that are applied
-   to the result of the path evaluation. Each filter expression must
-   be enclosed in parentheses and preceded by a question mark.
-   Filter expressions are evaluated from left to right and can be nested.
-   The @ variable denotes the current path evaluation
-   result to be filtered, and can be followed by one or more accessor
-   operators to define the JSON element by which to filter the result.
-   Functions and operators that can be used in the filtering condition
-   are listed in .
-   SQL/JSON defines three-valued logic, so the result of the filter
-   expression may be true, false,
+   filter expressions that work similar to the
+   WHERE clause in SQL. A filter expression begins with
+   a question mark and provides a condition in parentheses:
+
+? (condition)
+
+   Filter expressions must be specified right after the path evaluation step
+   to which they are applied. The result of this step is filtered to include
+   only those items that satisfy the provided condition. SQL/JSON defines
+   three-valued logic, so the condition can be true, false,
    or unknown. The unknown value
-   plays the same role as SQL NULL. Further path
+   plays the same role as SQL NULL and can be tested
+   for with the is unknown predicate. Further path
    evaluation steps use only those items for which filter expressions
-   return true.
+   return true.
+
+   Functions and operators that can be used in filter expressions are listed
+   in . The path
+   evaluation result to be filtered is denoted by the @
+   variable. It can be followed by one or more accessor operators to define
+   the JSON element on a lower nesting level by which to filter the result.

@@ -11667,8 +11678,8 @@ table2-mapping
    To get the start time of segments with such values instead, you have to
    filter out irrelevant segments before returning the start time, so the
-   filter is applied to the previous step and the path in the filtering
-   condition is different:
+   filter expression is applied to the previous step, and the path used
+   in the condition is different:

'$.track.segments[*] ? (@.HR > 130)."start time"'

@@ -11693,9 +11704,9 @@ table2-mapping
-
Re: [PATCH v4] Add \warn to psql
Fabien COELHO writes: > The point is that there would be at least *one* TAP tests so that many > other features of psql, although not all, can be tested. I have been > reviewing quite a few patches without tests because of this lack of > infrastructure, and no one patch is ever going to justify a TAP test on > its own. It has to start somewhere. Currently psql coverage is abysmal, > around 40% of lines & functions are called by the whole non regression > tests, despite the hundreds of psql-relying tests. Yeah, but the point I was trying to make is that that's mostly down to laziness. I see no reason that we couldn't be covering a lot of these features in src/test/regress/sql/psql.sql, with far less overhead. The interactive aspects of psql can't be tested that way ... but since this patch doesn't actually provide any way to test those, it's not much of a proof-of-concept. IOW, the blocking factor here is not "does src/bin/psql/t/ exist", it's "has somebody written a test that moves the coverage needle meaningfully". I'm not big on adding a bunch of overhead first and just hoping somebody will do something to make it worthwhile later. regards, tom lane
Re: Where is SSPI auth username determined for TAP tests?
Michael Paquier writes: > I have been reviewing that part, and the part to split the bootstrap > user from the set of extra roles created looks fine to me. Now, it > seems to me that you can simplify 010_dump_connstr.pl as per the > attached because PostgresNode::Init can take care of --auth-config > part with the correct options using auth_extra. What do you think > about the cleanup attached? I haven't checked that this actually works, but it looks plausible, and I agree it's simpler/easier. regards, tom lane
Re: Another way to fix inherited UPDATE/DELETE
Amit Langote writes: > On Wed, Feb 20, 2019 at 6:49 AM Tom Lane wrote: >> Obviously this'd be a major rewrite with no chance of making it into v12, >> but it doesn't sound too big to get done during v13. > Are you planning to work on this? It's on my list, but so are a lot of other things. If you'd like to work on it, feel free. regards, tom lane
Re: TopoSort() fix
On Tue, Jul 2, 2019 at 11:23 AM Tom Lane wrote: > Rui Hai Jiang writes: > > I think I found an issue in the TopoSort() function. > > This indeed seems like a live bug introduced by a1c1af2a. > Robert? This is pretty thoroughly swapped out of my head, but it looks like that analysis might be correct. Is it practical to come up with a test case that demonstrates the problem? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Conflict handling for COPY FROM
Hi, I'm very interested in this patch and would like to give a review within a week. On the feature side, how about simply using the less verbose "ERRORS" instead of "ERROR LIMIT" ? On Wed, Jul 3, 2019 at 1:42 PM Surafel Temesgen wrote: > Hi Alexey, > Thank you for looking at it > > On Tue, Jul 2, 2019 at 7:57 PM Alexey Kondratov < > a.kondra...@postgrespro.ru> wrote: > >> On 28.06.2019 16:12, Alvaro Herrera wrote: >> >> On Wed, Feb 20, 2019 at 7:04 PM Andres Freund >> wrote: >> >>> Or even just return it as a row. CopyBoth is relatively widely >> supported >> >>> these days. >> >> i think generating warning about it also sufficiently meet its propose >> of >> >> notifying user about skipped record with existing logging facility >> >> and we use it for similar propose in other place too. The different >> >> i see is the number of warning that can be generated >> > Warnings seem useless for this purpose. I'm with Andres: returning rows >> > would make this a fine feature. If the user wants the rows in a table >> > as Andrew suggests, she can use wrap the whole thing in an insert. >> >> I agree with previous commentators that returning rows will make this >> feature more versatile. > > > I agree. am looking at the options > > Also, I would prefer having an option to ignore all errors, e.g. with >> option ERROR_LIMIT set to -1. Because it is rather difficult to estimate >> a number of future errors if you are playing with some badly structured >> data, while always setting it to 100500k looks ugly. >> >> > Good idea > > I also +1 having an option to ignore all errors. Other RDBMS might use a large number, but "-1" seems cleaner so far. > > >> 1) Calculation of processed rows isn't correct (I've checked). You do it >> in two places, and >> >> -processed++; >> +if (!cstate->error_limit) >> +processed++; >> >> is never incremented if ERROR_LIMIT is specified and no errors >> occurred/no constraints exist, so the result will always be 0. 
However, >> if primary column with constraints exists, then processed is calculated >> correctly, since another code path is used: >> >> > Correct. i will fix > > +if (specConflict) >> +{ >> +... >> +} >> +else >> +processed++; >> >> I would prefer this calculation in a single place (as it was before >> patch) for simplicity and in order to avoid such problems. >> >> > ok > > >> 2) This ExecInsertIndexTuples call is only executed now if ERROR_LIMIT >> is specified and was exceeded, which doesn't seem to be correct, does it? >> >> -if (resultRelInfo->ri_NumIndices > 0) >> +if (resultRelInfo->ri_NumIndices > 0 && >> cstate->error_limit == 0) >> recheckIndexes = >> ExecInsertIndexTuples(myslot, >> >> > No it alwase executed . I did it this way to avoid > inserting index tuple twice but i see its unlikely > > >> 3) Trailing whitespaces added to error messages and tests for some reason: >> >> +ereport(WARNING, >> +(errcode(ERRCODE_BAD_COPY_FILE_FORMAT), >> + errmsg("skipping \"%s\" --- missing data >> for column \"%s\" ", >> >> +ereport(ERROR, >> +(errcode(ERRCODE_BAD_COPY_FILE_FORMAT), >> + errmsg("missing data for column \"%s\" ", >> >> -ERROR: missing data for column "e" >> +ERROR: missing data for column "e" >> CONTEXT: COPY x, line 1: "20002302323" >> >> -ERROR: missing data for column "e" >> +ERROR: missing data for column "e" >> CONTEXT: COPY x, line 1: "2001231\N\N" >> >> > > regards > Surafel > Thanks, Anthony
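For what it's worth, the error-limit semantics being discussed (skip and warn on up to N bad rows, fail beyond that, with -1 meaning "ignore all errors") can be modeled outside the server. A minimal Python sketch; the function name and shape are mine, not the patch's:

```python
def copy_rows(rows, error_limit):
    """Toy model of COPY with an error limit: each input is (ok, row).

    Good rows are loaded; bad rows are skipped (the patch warns about
    them) until more than error_limit rows have failed, at which point
    the whole command errors out.  error_limit = -1 means "ignore all
    errors", the option suggested in the review above.
    """
    loaded, skipped = [], []
    for lineno, (ok, row) in enumerate(rows, start=1):
        if ok:
            loaded.append(row)
        elif error_limit != -1 and len(skipped) >= error_limit:
            raise ValueError(f"error limit exceeded at line {lineno}")
        else:
            skipped.append(lineno)
    return loaded, skipped

rows = [(True, "a"), (False, None), (True, "b")]
print(copy_rows(rows, 1))    # (['a', 'b'], [2])
print(copy_rows(rows, -1))   # (['a', 'b'], [2])
```

Returning the skipped rows to the client, as Andres suggests, would correspond to returning the contents of `skipped` (or the offending rows themselves) instead of merely warning.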
pguint Installation error in PostgreSQL server version 11.2
Hi Everyone, I have installed PostgreSQL 11.2 on CentOS 7 and tried to build *pguint* from source to get the *TINYINT* datatype, but the build could not resolve its dependency packages. I followed the steps below; please help me resolve this issue.

1. yum install centos-release-scl-rh
2. yum install llvm-toolset-7-clang-tools-extra
3. yum install devtoolset-7
4. yum install llvm-toolset-5
5. make PG_CONFIG=/usr/pgsql-11/bin/pg_config
6. make PG_CONFIG=/usr/pgsql-11/bin/pg_config install

The llvm 5.0 packages installed successfully, but when I try to create the extension I get the error below:

[postgres] # CREATE EXTENSION uint;
ERROR: XX000: could not load library "/usr/pgsql-11/lib/uint.so": /usr/pgsql-11/lib/uint.so: undefined symbol: GET_1_BYTE
LOCATION: internal_load_library, dfmgr.c:240
Time: 17.247 ms

Regards, Suresh Seema -- Thanks & Regards Suresh S
Re: benchmarking Flex practices
On Wed, Jul 3, 2019 at 5:35 AM Tom Lane wrote: > > John Naylor writes: > > 0001 is a small patch to remove some unneeded generality from the > > current rules. This lowers the number of elements in the yy_transition > > array from 37045 to 36201. > > I don't particularly like 0001. The two bits like this > > -whitespace ({space}+|{comment}) > +whitespace ({space}|{comment}) > > seem likely to create performance problems for runs of whitespace, in that > the lexer will now have to execute the associated action once per space > character not just once for the whole run. Okay. > There are a bunch of higher-order productions that use "{whitespace}*", > which is surely a bit redundant given the contents of {whitespace}. > But maybe we could address that by replacing "{whitespace}*" with > "{opt_whitespace}" defined as > > opt_whitespace ({space}*|{comment}) > > Not sure what impact if any that'd have on table size, but I'm quite sure > that {whitespace} was defined with an eye to avoiding unnecessary > lexer action cycles. It turns out that {opt_whitespace} as defined above is not equivalent to {whitespace}* , since the former is either a single comment or a single run of 0 or more whitespace chars (if I understand correctly). Using {opt_whitespace} for the UESCAPE rules on top of v3-0002, the regression tests pass, but queries like this fail with a syntax error: # select U&'d!0061t!+61' uescape --comment '!'; There was in fact a substantial size reduction, though, so for curiosity's sake I tried just replacing {whitespace}* with {space}* in the UESCAPE rules, and the table shrank from 30367 (that's with 0002 only) to 24661. > As for the other two bits that are like > > -. { > - /* This is only needed for \ just > before EOF */ > +\\ { > > my recollection is that those productions are defined that way to avoid a > flex warning about not all possible input characters being accounted for > in the (resp. ) state. 
Maybe that warning is > flex-version-dependent, or maybe this was just a worry and not something > that actually produced a warning ... but I'm hesitant to change it. > If we ever did get to flex's default action, that action is to echo the > current input character to stdout, which would be Very Bad. FWIW, I tried Flex 2.5.35 and 2.6.4 with no warnings, and I did get a warning when I deleted any of those two rules. I'll leave them out for now, since this change was only good for ~500 fewer elements in the transition array. > As far as I can see, the point of 0002 is to have just one set of > flex rules for the various variants of quotecontinue processing. > That sounds OK, though I'm a bit surprised it makes this much difference > in the table size. I would suggest that "state_before" needs a less > generic name (maybe "state_before_xqs"?) and more than no comment. > Possibly more to the point, it's not okay to have static state variables > in the core scanner, so that variable needs to be kept in yyextra. > (Don't remember offhand whether it's any more acceptable in the other > scanners.) Ah yes, I got this idea from the ECPG scanner, which is not reentrant. Will fix. -- John Naylorhttps://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
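The inequivalence John describes, ({space}*|{comment}) used once versus ({space}+|{comment}) repeated, can be demonstrated with ordinary regexes. This Python sketch is a simplification of the real flex rules (only -- comments, only a small whitespace class), but it shows exactly why whitespace followed by a comment stops matching:

```python
import re

# {whitespace}* : (one-or-more spaces OR a comment), repeated any number of times
whitespace_star = re.compile(r"(?:[ \t\n]+|--[^\n]*)*")

# {opt_whitespace}: a SINGLE run of zero-or-more spaces OR a SINGLE comment
opt_whitespace = re.compile(r"(?:[ \t\n]*|--[^\n]*)")

text = " --comment "  # whitespace, then a comment

# The repeated form consumes the whole string in two iterations...
print(whitespace_star.fullmatch(text) is not None)  # True
# ...but the single-alternative form cannot mix spaces and a comment,
# which is why "uescape --comment '!'" raises a syntax error.
print(opt_whitespace.fullmatch(text) is not None)   # False
```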
Re: Conflict handling for COPY FROM
Hi Alexey, Thank you for looking at it On Tue, Jul 2, 2019 at 7:57 PM Alexey Kondratov wrote: > On 28.06.2019 16:12, Alvaro Herrera wrote: > >> On Wed, Feb 20, 2019 at 7:04 PM Andres Freund > wrote: > >>> Or even just return it as a row. CopyBoth is relatively widely > supported > >>> these days. > >> i think generating warning about it also sufficiently meet its propose > of > >> notifying user about skipped record with existing logging facility > >> and we use it for similar propose in other place too. The different > >> i see is the number of warning that can be generated > > Warnings seem useless for this purpose. I'm with Andres: returning rows > > would make this a fine feature. If the user wants the rows in a table > > as Andrew suggests, she can use wrap the whole thing in an insert. > > I agree with previous commentators that returning rows will make this > feature more versatile. I agree. am looking at the options Also, I would prefer having an option to ignore all errors, e.g. with > option ERROR_LIMIT set to -1. Because it is rather difficult to estimate > a number of future errors if you are playing with some badly structured > data, while always setting it to 100500k looks ugly. > > Good idea > 1) Calculation of processed rows isn't correct (I've checked). You do it > in two places, and > > -processed++; > +if (!cstate->error_limit) > +processed++; > > is never incremented if ERROR_LIMIT is specified and no errors > occurred/no constraints exist, so the result will always be 0. However, > if primary column with constraints exists, then processed is calculated > correctly, since another code path is used: > > Correct. i will fix +if (specConflict) > +{ > +... > +} > +else > +processed++; > > I would prefer this calculation in a single place (as it was before > patch) for simplicity and in order to avoid such problems. 
> > ok > 2) This ExecInsertIndexTuples call is only executed now if ERROR_LIMIT > is specified and was exceeded, which doesn't seem to be correct, does it? > > -if (resultRelInfo->ri_NumIndices > 0) > +if (resultRelInfo->ri_NumIndices > 0 && > cstate->error_limit == 0) > recheckIndexes = > ExecInsertIndexTuples(myslot, > > No, it is always executed. I did it this way to avoid inserting the index tuple twice, but I see that it's unlikely. > 3) Trailing whitespaces added to error messages and tests for some reason: > > +ereport(WARNING, > +(errcode(ERRCODE_BAD_COPY_FILE_FORMAT), > + errmsg("skipping \"%s\" --- missing data > for column \"%s\" ", > > +ereport(ERROR, > +(errcode(ERRCODE_BAD_COPY_FILE_FORMAT), > + errmsg("missing data for column \"%s\" ", > > -ERROR: missing data for column "e" > +ERROR: missing data for column "e" > CONTEXT: COPY x, line 1: "20002302323" > > -ERROR: missing data for column "e" > +ERROR: missing data for column "e" > CONTEXT: COPY x, line 1: "2001231\N\N" > > regards Surafel
Re: FETCH FIRST clause WITH TIES option
Hi Erik, Thank you for looking at it. Can you have a look? > > It is because the patch didn't handle that edge case. The attached patch contains a fix for it. regards Surafel diff --git a/doc/src/sgml/ref/select.sgml b/doc/src/sgml/ref/select.sgml index 06d611b64c..e83d309c5b 100644 --- a/doc/src/sgml/ref/select.sgml +++ b/doc/src/sgml/ref/select.sgml @@ -44,7 +44,7 @@ SELECT [ ALL | DISTINCT [ ON ( expressionexpression [ ASC | DESC | USING operator ] [ NULLS { FIRST | LAST } ] [, ...] ] [ LIMIT { count | ALL } ] [ OFFSET start [ ROW | ROWS ] ] -[ FETCH { FIRST | NEXT } [ count ] { ROW | ROWS } ONLY ] +[ FETCH { FIRST | NEXT } [ count ] { ROW | ROWS } { ONLY | WITH TIES } ] [ FOR { UPDATE | NO KEY UPDATE | SHARE | KEY SHARE } [ OF table_name [, ...] ] [ NOWAIT | SKIP LOCKED ] [...] ] where from_item can be one of: @@ -1430,7 +1430,7 @@ OFFSET start which PostgreSQL also supports. It is: OFFSET start { ROW | ROWS } -FETCH { FIRST | NEXT } [ count ] { ROW | ROWS } ONLY +FETCH { FIRST | NEXT } [ count ] { ROW | ROWS } { ONLY | WITH TIES } In this syntax, the start or count value is required by @@ -1440,7 +1440,10 @@ FETCH { FIRST | NEXT } [ count ] { ambiguity. If count is omitted in a FETCH clause, it defaults to 1. -ROW +ROW . +WITH TIES option is used to return two or more rows +that tie for the last place in the result set according to ORDER BY +clause (ORDER BY clause must be specified in this case). and ROWS as well as FIRST and NEXT are noise words that don't influence the effects of these clauses.
diff --git a/src/backend/executor/nodeLimit.c b/src/backend/executor/nodeLimit.c index baa669abe8..cf557ecb1c 100644 --- a/src/backend/executor/nodeLimit.c +++ b/src/backend/executor/nodeLimit.c @@ -41,6 +41,7 @@ static TupleTableSlot * /* return: a tuple or NULL */ ExecLimit(PlanState *pstate) { LimitState *node = castNode(LimitState, pstate); + ExprContext *econtext = node->ps.ps_ExprContext; ScanDirection direction; TupleTableSlot *slot; PlanState *outerPlan; @@ -102,6 +103,16 @@ ExecLimit(PlanState *pstate) node->lstate = LIMIT_EMPTY; return NULL; } + +/* + * Tuple at limit is needed for comparation in subsequent execution + * to detect ties. + */ +if (node->limitOption == WITH_TIES && + node->position - node->offset == node->count - 1) +{ + ExecCopySlot(node->last_slot, slot); +} node->subSlot = slot; if (++node->position > node->offset) break; @@ -126,12 +137,16 @@ ExecLimit(PlanState *pstate) { /* * Forwards scan, so check for stepping off end of window. If - * we are at the end of the window, return NULL without - * advancing the subplan or the position variable; but change - * the state machine state to record having done so. + * we are at the end of the window, the behavior depends whether + * ONLY or WITH TIES was specified. In case of ONLY, we return + * NULL without advancing the subplan or the position variable; + * but change the state machine state to record having done so. + * In the WITH TIES mode, we need to advance the subplan until + * we find the first row with different ORDER BY pathkeys. */ if (!node->noCount && - node->position - node->offset >= node->count) + node->position - node->offset >= node->count && + node->limitOption == EXACT_NUMBER) { node->lstate = LIMIT_WINDOWEND; @@ -144,18 +159,69 @@ ExecLimit(PlanState *pstate) return NULL; } +else if (!node->noCount && + node->position - node->offset >= node->count && + node->limitOption == WITH_TIES) +{ + /* + * Get next tuple from subplan, if any. 
+ */ + slot = ExecProcNode(outerPlan); + if (TupIsNull(slot)) + { + node->lstate = LIMIT_SUBPLANEOF; + return NULL; + } + /* + * Test if the new tuple and the last tuple match. + * If so we return the tuple. + */ + econtext->ecxt_innertuple = slot; + econtext->ecxt_outertuple = node->last_slot; + if (ExecQualAndReset(node->eqfunction, econtext)) + { + node->subSlot = slot; + node->position++; + } + else + { + node->lstate = LIMIT_WINDOWEND; + + /* + * If we know we won't need to back up, we can release + * resources at this point. + */ + if (!(node->ps.state->es_top_eflags & EXEC_FLAG_BACKWARD)) + (void) ExecShutdownNode(outerPlan); + + return NULL; + } -/* - * Get next tuple from subplan, if any. - */ -slot = ExecProcNode(outerPlan); -if (TupIsNull(slot)) +} +else { - node->lstate = LIMIT_SUBPLANEOF; - return NULL; + /* + * Get next tuple from subplan, if any. + */ + slot = ExecProcNode(outerPlan); + if (TupIsNull(slot)) + { + node->lstate = LIMIT_SUBPLANEOF; + return NULL;
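The WITH TIES behavior that the nodeLimit.c changes above implement can be modeled outside the executor: after emitting count rows, keep emitting rows while they compare equal to the last in-window row on the ORDER BY key. A minimal Python sketch of just the semantics (my own model, not the patch code):

```python
def fetch_first_with_ties(rows, count, key):
    """rows must already be sorted on `key` (hence the ORDER BY requirement).

    Returns the first `count` rows plus any following rows that tie with
    the last in-window row on the sort key, mirroring
    FETCH FIRST count ROWS WITH TIES.
    """
    if count <= 0 or not rows:
        return []
    out = list(rows[:count])
    boundary = key(out[-1])
    # keep consuming while the sort key still matches the boundary row
    for row in rows[count:]:
        if key(row) != boundary:
            break
        out.append(row)
    return out

# FETCH FIRST 2 ROWS WITH TIES over values in ascending order:
print(fetch_first_with_ties([1, 2, 2, 3], 2, key=lambda r: r))  # [1, 2, 2]
```

This corresponds to the new WITH_TIES branch in ExecLimit: the stored `last_slot` plays the role of `boundary`, and `eqfunction` plays the role of the key comparison.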
Re: Replacing the EDH SKIP primes
On Wed, Jul 03, 2019 at 10:56:41AM +0200, Daniel Gustafsson wrote: > OpenSSL provides DH_check() which we use in load_dh_file() to ensure that the > user is passing a valid prime in the DH file. Adding this to the hardcoded > blob seems overkill though, once the validity has been verified before it > being > committed. Agreed, and I didn't notice this check... There could be an argument for having DH_check within an assertion block but as this changes once per decade there is little value. > A DH param in PEM (or DER) format can be checked with the openssl dhparam > tool. > Assuming the PEM is extracted from the patch into a file, one can do: > > openssl dhparam -inform PEM -in /tmp/dh.pem -check -text > > The prime is returned and can be diffed against the one in the RFC. If you > modify the blob you will see that the check complains about it not being > prime. Ah, thanks. I can see that the new key matches the RFC. > There is an expected warning in the output however: "the g value is not a > generator” (this is also present when subjecting the PEM for the 2048 MODP in > OpenSSL). Indeed, I saw that also from OpenSSL. That looks to come from dh.c (there are two other code paths, haven't checked in details). Thanks for the pointer. > I’m far from a cryptographer, but AFAICT from reading it essentially means > that > the RFC authors chose to limit the search space of the shared secret rather > than leaking a bit of it, and OpenSSL has chosen in DH_check() that leaking a > bit is preferrable. (This makes me wonder if we should downgrade the check in > load_dh_file() "codes & DH_NOT_SUITABLE_GENERATOR” to WARNING, but the lack of > reports of it being a problem either shows that most people are just using > openssl dhparam generated parameters which can leak a bit, or aren’t using DH > files at all.) Yeah, no objections per the arguments from the RFC. I am not sure if we actually need to change that part though. 
We'd still get a complaint for a key which is not a prime, and for the default one this is not something we care much about, because we know its properties, no? It would be nice to add a comment on that though, perhaps in libpq-be.h where the key is defined. -- Michael
Re: Attached partition not considering altered column properties of root partition.
Thanks Amit for the fix patch, I have applied the patch and verified the issue. The attached partition with altered column properties shows error as below: postgres=# alter table p attach partition p2 for values in (2); psql: ERROR: child table "p2" has different storage option for column "b" than parent DETAIL: EXTENDED versus MAIN Thanks, Prabhat Sahu On Wed, Jul 3, 2019 at 7:23 AM Amit Langote wrote: > Hi Prabhat, > > On Tue, Jul 2, 2019 at 5:12 PM Prabhat Sahu > wrote: > > > > Hi, > > > > In below testcase when I changed the staorage option for root partition, > newly attached partition not including the changed staorage option. > > Is this an expected behavior? > > Thanks for the report. This seems like a bug. Documentation claims > that the child tables inherit column storage options from the parent > table. That's actually enforced in only some cases. > > 1. If you create the child table as a child to begin with (that is, > not attach it as child after the fact): > > create table parent (a text); > create table child () inherits (parent); > select attrelid::regclass, attname, attstorage from pg_attribute where > attrelid in ('parent'::regclass, 'child'::regclass) and attname = 'a'; > attrelid │ attname │ attstorage > ──┼─┼ > parent │ a │ x > child│ a │ x > (2 rows) > > > 2. If you change the parent's column's storage option, child's column > is recursively changed. 
> > alter table parent alter a set storage main; > select attrelid::regclass, attname, attstorage from pg_attribute where > attrelid in ('parent'::regclass, 'child'::regclass) and attname = 'a'; > attrelid │ attname │ attstorage > ──┼─┼ > parent │ a │ m > child│ a │ m > (2 rows) > > However, we fail to enforce the rule when the child is attached after the > fact: > > create table child2 (a text); > alter table child2 inherit parent; > select attrelid::regclass, attname, attstorage from pg_attribute where > attrelid in ('parent'::regclass, 'child'::regclass, > 'child2'::regclass) and attname = 'a'; > attrelid │ attname │ attstorage > ──┼─┼ > parent │ a │ m > child│ a │ m > child2 │ a │ x > (3 rows) > > To fix this, MergeAttributesIntoExisting() should check that the > attribute options of a child don't conflict with the parent, which the > attached patch implements. Note that partitioning uses the same code > as inheritance, so the fix applies to it too. After the patch: > > create table p (a int, b text) partition by list (a); > create table p1 partition of p for values in (1); > select attrelid::regclass, attname, attstorage from pg_attribute where > attrelid in ('p'::regclass, 'p1'::regclass) and attname = 'b'; > attrelid │ attname │ attstorage > ──┼─┼ > p│ b │ x > p1 │ b │ x > (2 rows) > > alter table p alter b set storage main; > select attrelid::regclass, attname, attstorage from pg_attribute where > attrelid in ('p'::regclass, 'p1'::regclass) and attname = 'b'; > attrelid │ attname │ attstorage > ──┼─┼ > p│ b │ m > p1 │ b │ m > (2 rows) > > create table p2 (like p); > select attrelid::regclass, attname, attstorage from pg_attribute where > attrelid in ('p'::regclass, 'p1'::regclass, 'p2'::regclass) and > attname = 'b'; > attrelid │ attname │ attstorage > ──┼─┼ > p│ b │ m > p1 │ b │ m > p2 │ b │ x > (3 rows) > > alter table p attach partition p2 for values in (2); > ERROR: child table "p2" has different storage option for column "b" than > parent > DETAIL: 
EXTENDED versus MAIN > > -- ok after changing p2 to match > alter table p2 alter b set storage main; > alter table p attach partition p2 for values in (2); > > Thanks, > Amit
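The rule the patch adds to MergeAttributesIntoExisting() amounts to: at ATTACH/INHERIT time, each shared column's storage option in the child must match the parent's. A hypothetical Python model of just that check (the dict shapes and function name are mine, for illustration only):

```python
# pg_attribute.attstorage codes and their display names
STORAGE_NAMES = {"p": "PLAIN", "e": "EXTERNAL", "m": "MAIN", "x": "EXTENDED"}

def check_storage_match(parent_cols, child_cols):
    """parent_cols / child_cols map column name -> attstorage code.

    Raise if a column present in both has differing storage, mirroring
    the error the patch produces at ATTACH PARTITION / INHERIT time.
    """
    for col, parent_storage in parent_cols.items():
        child_storage = child_cols.get(col)
        if child_storage is not None and child_storage != parent_storage:
            raise ValueError(
                f'child table has different storage option for column "{col}" '
                f"than parent: {STORAGE_NAMES[child_storage]} versus "
                f"{STORAGE_NAMES[parent_storage]}"
            )

# p2 was created with default EXTENDED storage while p's column b is MAIN,
# so attaching it should fail, as in the example above.
check_storage_match({"b": "m"}, {"b": "m"})  # ok after ALTER ... SET STORAGE
```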
Re: [PATCH] Speedup truncates of relation forks
On 7/1/19 12:55 PM, Jamison, Kirk wrote: > On Wednesday, June 26, 2019 6:10 PM(GMT+9), Adrien Nayrat wrote: >> As far as I remember, you should see "relation" wait events (type lock) on >> standby server. This is due to startup process acquiring AccessExclusiveLock >> for the truncation and other backend waiting to acquire a lock to read the >> table. > > Hi Adrien, thank you for taking time to reply. > > I understand that RelationTruncate() can block read-only queries on > standby during redo. However, it's difficult for me to reproduce the > test case where I need to catch that wait for relation lock, because > one has to execute SELECT within the few milliseconds of redoing the > truncation of one table. Yes, that why your test by measuring vacuum execution time is better as it is more reproductible. > > Instead, I just measured the whole recovery time, smgr_redo(), > to show the recovery improvement compared to head. Please refer below. > > [Recovery Test] > I used the same stored functions and configurations in the previous email > & created "test" db. > > $ createdb test > $ psql -d test > > 1. [Primary] Create 10,000 relations. > test=# SELECT create_tables(1); > > 2. [P] Insert one row in each table. > test=# SELECT insert_tables(1); > > 3. [P] Delete row of each table. > test=# SELECT delfrom_tables(1); > > 4. [Standby] WAL application is stopped at Standby server. > test=# SELECT pg_wal_replay_pause(); > > 5. [P] VACUUM is executed at Primary side, and measure its execution time. > > test=# \timing on > test=# VACUUM; > > Alternatively, you may use: > $ time psql -d test -c 'VACUUM;' > (Note: WAL has not replayed on standby because it's been paused.) > > 6. [P] Wait until VACUUM has finished execution. Then, stop primary server. > test=# pg_ctl stop -w > > 7. [S] Resume WAL replay, then promote standby (failover). > I used a shell script to execute recovery & promote standby server > because it's kinda difficult to measure recovery time. 
Please refer to the
> script below.
> - "SELECT pg_wal_replay_resume();" is executed and the WAL application is resumed.
> - "pg_ctl promote" to promote standby.
> - The time difference of "select pg_is_in_recovery();" from "t" to "f" is measured.
>
> shell script:
>
> PGDT=/path_to_storage_directory/
>
> if [ "$1" = "resume" ]; then
>   psql -c "SELECT pg_wal_replay_resume();" test
>   date +%Y/%m/%d_%H:%M:%S.%3N
>   pg_ctl promote -D ${PGDT}
>   set +x
>   date +%Y/%m/%d_%H:%M:%S.%3N
>   while [ 1 ]
>   do
>     RS=`psql -Atc "select pg_is_in_recovery();" test`
>     if [ ${RS} = "f" ]; then
>       break
>     fi
>   done
>   date +%Y/%m/%d_%H:%M:%S.%3N
>   set -x
>   exit 0
> fi
>
> [Test Results]
> shared_buffers = 24GB
>
> 1. HEAD
> (wal replay resumed)
> 2019/07/01_08:48:50.326
> server promoted
> 2019/07/01_08:49:50.482
> 2019/07/01_09:02:41.051
>
> Recovery Time:
> 13 min 50.725 s -> Time difference from WAL replay to complete recovery
> 12 min 50.569 s -> Time difference of "select pg_is_in_recovery();" from "t" to "f"
>
> 2. PATCH
> (wal replay resumed)
> 2019/07/01_07:34:26.766
> server promoted
> 2019/07/01_07:34:57.790
> 2019/07/01_07:34:57.809
>
> Recovery Time:
> 31.043 s -> Time difference from WAL replay to complete recovery
> 00.019 s -> Time difference of "select pg_is_in_recovery();" from "t" to "f"
>
> [Conclusion]
> The recovery time significantly improved compared to head, from 13 minutes to 30 seconds.
>
> Any thoughts?
> I'd really appreciate your comments/feedback about the patch and/or test.

Thanks for the time you spent on this test; it is a huge win! Although creating 10k tables and deleting tuples is not a common use case, it is still good to know how your patch performs. I will try to look deeper into your patch, but my knowledge of postgres internals is limited :) -- Adrien
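As a sanity check, the reported recovery times can be re-derived from the logged timestamps (the format is what the script's date +%Y/%m/%d_%H:%M:%S.%3N prints):

```python
from datetime import datetime

def delta_seconds(start, end):
    """Difference in seconds between two timestamps from the test script."""
    fmt = "%Y/%m/%d_%H:%M:%S.%f"
    return (datetime.strptime(end, fmt) - datetime.strptime(start, fmt)).total_seconds()

# HEAD: WAL replay resumed -> recovery complete
head = delta_seconds("2019/07/01_08:48:50.326", "2019/07/01_09:02:41.051")
# PATCH: same interval
patched = delta_seconds("2019/07/01_07:34:26.766", "2019/07/01_07:34:57.809")

print(round(head, 3))     # 830.725 seconds, i.e. 13 min 50.725 s as reported
print(round(patched, 3))  # 31.043 seconds as reported
```

Both values match the "Recovery Time" figures in the test results, so the quoted ~28x improvement is internally consistent.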
Re: [PATCH][PROPOSAL] Add enum relation option type
In a message of Monday, 1 July 2019 14:06:28 MSK, Alvaro Herrera wrote: > It strikes me that the way to avoid sentence construction is to have > each enum reloption declare a string that it uses to list the values it > accepts. So for example we would have > > +#define GIST_OPTION_BUFFERING_ENUM_DEF { \ > + { "on", GIST_OPTION_BUFFERING_ON }, \ > + { "off",GIST_OPTION_BUFFERING_OFF },\ > + { "auto", GIST_OPTION_BUFFERING_AUTO }, \ > + { (const char *) NULL, 0 } \ > +} > + > + GistBufferingValidMsg = gettext_noop("Valid values are \"on\", \"off\", > and \"auto\"."); > > I think that's the most contentious point on this patch at this point > (though I may be misremembering). I actually removed "and" from the list and made it a simple comma-separated list: ERROR: invalid value for "check_option" option DETAIL: Valid values are: "local", "cascaded". Now we can translate the left part and substitute the list into the right part: errdetail("Valid values are: %s.", buf.data))); It is not as nice as before, but quite acceptable, as I see it. Do you not see it that way?
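For reference, building the translatable detail string as a plain comma-separated list, as described above, is straightforward; a Python sketch of the formatting that the C patch does with a StringInfo:

```python
def valid_values_detail(values):
    """Render the payload for: errdetail("Valid values are: %s.", buf.data).

    The sentence frame is translatable on its own, and the quoted list is
    substituted in without needing a language-specific "and".
    """
    return "Valid values are: " + ", ".join(f'"{v}"' for v in values) + "."

print(valid_values_detail(["local", "cascaded"]))
# Valid values are: "local", "cascaded".
```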
Re: [PATCH][PROPOSAL] Add enum relation option type
In a message of Monday, 1 July 2019 21:25:30 MSK, Thomas Munro wrote: > > It's not clear to me that all of Michael's and Álvaro's issues have been > > addressed, so I've marked this Waiting on Author. > To help reviewers for the Commitfest that is starting, could you > please rebase this patch? Oh, yes, sure, thank you for the reminder. diff --git a/src/backend/access/common/reloptions.c b/src/backend/access/common/reloptions.c index 5773021..fffab3a 100644 --- a/src/backend/access/common/reloptions.c +++ b/src/backend/access/common/reloptions.c @@ -433,32 +433,41 @@ static relopt_real realRelOpts[] = {{NULL}} }; -static relopt_string stringRelOpts[] = +static relopt_enum_elt_def gist_buffering_enum_def[] = +GIST_OPTION_BUFFERING_ENUM_DEF; +static relopt_enum_elt_def view_check_option_enum_def[] = +VIEW_OPTION_CHECK_OPTION_ENUM_DEF; +static relopt_enum enumRelOpts[] = { { { "buffering", - "Enables buffering build for this GiST index", - RELOPT_KIND_GIST, - AccessExclusiveLock +"Enables buffering build for this GiST index", +RELOPT_KIND_GIST, +AccessExclusiveLock }, - 4, - false, - gistValidateBufferingOption, - "auto" + gist_buffering_enum_def, + GIST_OPTION_BUFFERING_AUTO }, { { "check_option", - "View has WITH CHECK OPTION defined (local or cascaded).", - RELOPT_KIND_VIEW, - AccessExclusiveLock +"View has WITH CHECK OPTION defined (local or cascaded).", +RELOPT_KIND_VIEW, +AccessExclusiveLock }, - 0, - true, - validateWithCheckOption, - NULL + view_check_option_enum_def, + VIEW_OPTION_CHECK_OPTION_NOT_SET }, + { + { + NULL + } + } +}; + +static relopt_string stringRelOpts[] = +{ /* list terminator */ {{NULL}} }; @@ -505,6 +514,12 @@ initialize_reloptions(void) realRelOpts[i].gen.lockmode)); j++; } + for (i = 0; enumRelOpts[i].gen.name; i++) + { + Assert(DoLockModesConflict(enumRelOpts[i].gen.lockmode, + enumRelOpts[i].gen.lockmode)); + j++; + } for (i = 0; stringRelOpts[i].gen.name; i++) { Assert(DoLockModesConflict(stringRelOpts[i].gen.lockmode,
@@ -543,6 +558,14 @@ initialize_reloptions(void) j++; } + for (i = 0; enumRelOpts[i].gen.name; i++) + { + relOpts[j] = [i].gen; + relOpts[j]->type = RELOPT_TYPE_ENUM; + relOpts[j]->namelen = strlen(relOpts[j]->name); + j++; + } + for (i = 0; stringRelOpts[i].gen.name; i++) { relOpts[j] = [i].gen; @@ -640,6 +663,9 @@ allocate_reloption(bits32 kinds, int type, const char *name, const char *desc) case RELOPT_TYPE_REAL: size = sizeof(relopt_real); break; + case RELOPT_TYPE_ENUM: + size = sizeof(relopt_enum); + break; case RELOPT_TYPE_STRING: size = sizeof(relopt_string); break; @@ -719,6 +745,24 @@ add_real_reloption(bits32 kinds, const char *name, const char *desc, double defa } /* + * add_enuum_reloption + * Add a new enum reloption + */ +void +add_enum_reloption(bits32 kinds, const char *name, const char *desc, + relopt_enum_elt_def *members, int default_val) +{ + relopt_enum *newoption; + + newoption = (relopt_enum *) allocate_reloption(kinds, RELOPT_TYPE_ENUM, + name, desc); + newoption->members = members; + newoption->default_val = default_val; + + add_reloption((relopt_gen *) newoption); +} + +/* * add_string_reloption * Add a new string reloption * @@ -1234,6 +1278,51 @@ parse_one_reloption(relopt_value *option, char *text_str, int text_len, optreal->min, optreal->max))); } break; + case RELOPT_TYPE_ENUM: + { +relopt_enum *optenum = (relopt_enum *) option->gen; +relopt_enum_elt_def *elt; + +parsed = false; +for (elt = optenum->members; elt->string_val; elt++) +{ + if (pg_strcasecmp(value, elt->string_val) == 0) + { + option->values.enum_val = elt->symbol_val; + parsed = true; + break; + } +} +if (!parsed) +{ + /* + * If value is not among the allowed string values, but we + * are not asked to validate, just use default numeric + * value. Otherwise raise an error. 
+ */ + if (!validate) + option->values.enum_val = optenum->default_val; + else + { + StringInfoData buf; + + initStringInfo(); + for (elt = optenum->members; elt->string_val; elt++) + { + appendStringInfo(, "\"%s\"", elt->string_val); + if (elt[1].string_val) +appendStringInfo(, ", "); + } + ereport(ERROR, +(errcode(ERRCODE_INVALID_PARAMETER_VALUE), + errmsg("invalid value for \"%s\" option", + option->gen->name), + errdetail("Valid values are: %s.", buf.data))); + pfree(buf.data); + } +} + } + break; case RELOPT_TYPE_STRING: { relopt_string *optstring = (relopt_string *) option->gen; @@ -1327,6 +1416,11 @@ fillRelOptions(void *rdopts, Size basesize, options[i].values.real_val : ((relopt_real *) options[i].gen)->default_val;
Re: postgres_fdw: Minor improvement to postgresAcquireSampleRowsFunc
On Fri, Jun 28, 2019 at 7:15 PM Etsuro Fujita wrote: > On Fri, Jun 28, 2019 at 6:54 PM Julien Rouhaud wrote: > > On Fri, Jun 28, 2019 at 11:39 AM Etsuro Fujita > > wrote: > > > In postgresAcquireSampleRowsFunc, we 1) determine the fetch size and > > > then 2) construct the fetch command in each iteration of fetching some > > > rows from the remote, but that would be totally redundant. > > > > Indeed. > > > > > Attached > > > is a patch for removing that redundancy. > > > > It all looks good to me! I marked it as ready for committer. > > Cool! I'll commit the patch if there are no objections. Thanks for > reviewing! Pushed. Best regards, Etsuro Fujita
Re: Replacing the EDH SKIP primes
> On 2 Jul 2019, at 09:49, Michael Paquier wrote: > On Tue, Jul 02, 2019 at 08:14:25AM +0100, Peter Eisentraut wrote: >> >> I was wondering whether the provided binary blob contained any checksums >> or other internal checks. How would we know whether it contains >> transposed characters or replaces a 1 by a I or a l? If I just randomly >> edit the blob, the ssl tests still pass. (The relevant load_dh_buffer() >> call does get called by the tests.) How can we make sure we actually >> got a good copy? >> > > PEM_read_bio_DHparams() has some checks on the Diffie-Hellman key, but > it is up to the caller to make sure that it is normally providing a > prime number in this case to make the cracking harder, no? OpenSSL provides DH_check() which we use in load_dh_file() to ensure that the user is passing a valid prime in the DH file. Adding this to the hardcoded blob seems overkill though, once the validity has been verified before it being committed. > RFC 3526 > has a small formula in this case, which we can use to double-check the > patch. A DH param in PEM (or DER) format can be checked with the openssl dhparam tool. Assuming the PEM is extracted from the patch into a file, one can do: openssl dhparam -inform PEM -in /tmp/dh.pem -check -text The prime is returned and can be diffed against the one in the RFC. If you modify the blob you will see that the check complains about it not being prime. There is an expected warning in the output however: "the g value is not a generator” (this is also present when subjecting the PEM for the 2048 MODP in OpenSSL). From reading RFC2412 which outlines how the primes are generated, this is by design. In Appendix E: "Because these two primes are congruent to 7 (mod 8), 2 is a quadratic residue of each prime. All powers of 2 will also be quadratic residues. This prevents an opponent from learning the low order bit of the Diffie-Hellman exponent (AKA the subgroup confinement problem). 
Using 2 as a generator is efficient for some modular exponentiation algorithms. [Note that 2 is technically not a generator in the number theory sense, because it omits half of the possible residues mod P. From a cryptographic viewpoint, this is a virtue.]"

I'm far from a cryptographer, but AFAICT this essentially means that the RFC authors chose to limit the search space of the shared secret rather than leak a bit of it, while OpenSSL has chosen in DH_check() that leaking a bit is preferable. (This makes me wonder if we should downgrade the "codes & DH_NOT_SUITABLE_GENERATOR" check in load_dh_file() to a WARNING, but the lack of reports of it being a problem shows that most people are either just using openssl dhparam-generated parameters, which can leak a bit, or aren't using DH files at all.)

cheers ./daniel
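The quadratic-residue property quoted above from RFC 2412 is easy to check numerically. The following is a small illustrative sketch (not part of any patch, and not the OpenSSL code): by Euler's criterion, 2 is a quadratic residue mod an odd prime p exactly when 2^((p-1)/2) ≡ 1 (mod p), which holds for every prime p ≡ 7 (mod 8).

```python
# Illustrative sketch only: for odd primes p with p % 8 == 7, the value 2
# is a quadratic residue mod p (Euler's criterion: 2**((p-1)//2) % p == 1).
# This mirrors the RFC 2412 Appendix E remark quoted above.

def is_prime(n: int) -> bool:
    """Naive trial-division primality test (fine for small demo values)."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True

def two_is_quadratic_residue(p: int) -> bool:
    """Euler's criterion for the value 2 modulo an odd prime p."""
    return pow(2, (p - 1) // 2, p) == 1

# Every prime congruent to 7 mod 8 in a small range has 2 as a residue.
primes_7_mod_8 = [p for p in range(3, 2000) if p % 8 == 7 and is_prime(p)]
assert all(two_is_quadratic_residue(p) for p in primes_7_mod_8)

# By contrast, 2 is never a residue for primes congruent to 3 or 5 mod 8.
primes_3_5 = [p for p in range(3, 2000) if p % 8 in (3, 5) and is_prime(p)]
assert not any(two_is_quadratic_residue(p) for p in primes_3_5)
```

The actual 2048-bit MODP prime is far too large for trial division, of course; this only demonstrates the number-theoretic property the RFC relies on.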
Re: MSVC Build support with visual studio 2019
On Wed, 3 Jul 2019 at 10:01, Michael Paquier wrote: > On Tue, Jul 02, 2019 at 02:10:11PM +0900, Michael Paquier wrote: > > OK, committed to HEAD for now after perltidy'ing the patch. Let's see > > what the buildfarm has to say about it first. Once we are sure that > > the thing is stable, I'll try to backpatch it. This works on my own > > dev machines with VS 2015 and 2019, but who knows what hides in the > > shadows... > > The buildfarm did not have much to say, so backpatched down to 9.4, > adjusting things on the way. Thanks Michael. Regards, Haribabu Kommi
Re: Run-time pruning for ModifyTable
On Wed, Jul 3, 2019 at 4:34 PM David Rowley wrote: > On Wed, 3 Jul 2019 at 17:27, Amit Langote wrote: > > A cursory look at the patch suggests that most of its changes will be > > for nothing if [1] materializes. What do you think about that? > > Yeah, I had this in mind when writing the patch, but kept going > anyway. I think it's only really a small part of this patch that > would get wiped out with that change. Just the planner.c stuff. > Everything else is still required, as far as I understand. If I understand the details of [1] correctly, ModifyTable will no longer have N subplans for N result relations as it does today. So, it doesn't make sense for ModifyTable to contain PartitionedRelPruneInfos and for ExecInitModifyTable/ExecModifyTable to have to perform initial and execution-time pruning, respectively. As I said, bottom expansion of target inheritance will mean pruning (both plan-time and run-time) will occur at the bottom too, so the run-time pruning capabilities of nodes that already have it will be used for UPDATE and DELETE as well. Thanks, Amit
RE: extension patch of CREATE OR REPLACE TRIGGER
Dear Mr. Thomas

> The July Commitfest is now beginning. To give the patch the best chance of
> attracting reviewers, could you please post a rebased version? The last version
> doesn't apply.

I really appreciate your comments. I have been very busy with work recently, so I'll fix this within a couple of weeks.

Regards, Osumi Takamichi
Re: FETCH FIRST clause WITH TIES option
On 2019-07-01 19:38, Surafel Temesgen wrote:

> Thank you for informing. Attached is a rebased patch against current master
[...]
[fetch_first_with_ties_v10.patch]

Hi Surafel,

The patch applies OK, make check is OK, compiles OK. But I get:

TRAP: FailedAssertion("!(!(((slot)->tts_flags & (1 << 1)) != 0))", File: "execTuples.c", Line: 491)

when running a variant ('fetch 1' instead of 'fetch 2') of the test SQL against src/test/regress/data/onek.data (in the script below the location of the file 'onek.data' will have to be changed):

- 8< -
#!/bin/bash
echo "
drop table if exists onek;
create table onek (
  unique1     int4,
  unique2     int4,
  two         int4,
  four        int4,
  ten         int4,
  twenty      int4,
  hundred     int4,
  thousand    int4,
  twothousand int4,
  fivethous   int4,
  tenthous    int4,
  odd         int4,
  even        int4,
  stringu1    name,
  stringu2    name,
  string4     name
);
copy onek from '/home/aardvark/pg_stuff/pg_sandbox/pgsql.fetch_first_with_ties/src/test/regress/data/onek.data';
create index onek_unique1 on onek using btree(unique1 int4_ops);
create index onek_unique2 on onek using btree(unique2 int4_ops);
create index onek_hundred on onek using btree(hundred int4_ops);
create index onek_stringu1 on onek using btree(stringu1 name_ops);

-- OK:
select * from onek where thousand < 5 order by thousand fetch first 1 rows only;

-- crashes:
select * from onek where thousand < 5 order by thousand fetch first 1 rows with ties;
" | psql -qXa
- 8< -

Can you have a look?

thanks, Erik Rijkers
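For readers unfamiliar with the feature under test: FETCH FIRST n ROWS WITH TIES returns the first n rows of the ordered result plus any further rows that tie with the n-th row on the ORDER BY key. A rough illustrative model in Python (this is only a sketch of the semantics, not the patch's executor implementation):

```python
# Toy model of FETCH FIRST n ROWS WITH TIES over pre-sorted rows:
# emit the first n rows, then keep emitting while the sort key still
# equals the sort key of the n-th row.

def fetch_first_with_ties(sorted_rows, key, n):
    if n <= 0 or not sorted_rows:
        return []
    result = list(sorted_rows[:n])
    last_key = key(result[-1])
    for row in sorted_rows[n:]:
        if key(row) != last_key:
            break                  # first non-tying row ends the scan
        result.append(row)
    return result

rows = sorted([(0, 'a'), (0, 'b'), (1, 'c'), (2, 'd')])
# FETCH FIRST 1 ROWS WITH TIES: both rows with the lowest key come back.
assert fetch_first_with_ties(rows, key=lambda r: r[0], n=1) == [(0, 'a'), (0, 'b')]
# FETCH FIRST 3 ROWS WITH TIES: no tie extends past the third row here.
assert fetch_first_with_ties(rows, key=lambda r: r[0], n=3) == [(0, 'a'), (0, 'b'), (1, 'c')]
```

The crash reported above happens precisely in the "plus ties" continuation path, which is why 'fetch 1' (where a tie exists) trips it while the ROWS ONLY form does not.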
Re: Problem with default partition pruning
Hosoya-san,

Thanks for updating the patches. I have no comment in particular about v4_default_partition_pruning.patch, but let me reiterate my position about v5_ignore_contradictory_where_clauses_at_partprune_step.patch, which I first stated in the following email a few months ago:

https://www.postgresql.org/message-id/d2c38e4e-ade4-74de-f686-af37e4a5f1c1%40lab.ntt.co.jp

This patch proposes to apply constraint exclusion to check whether it would be wasteful to generate pruning steps from a given clause against a given sub-partitioned table, because the clause contradicts its partition constraint. Actually, the patch started out as a generalization of the existing usage of constraint exclusion in partprune.c that's used to skip processing useless arguments of an OR clause.

The problem with steps generated from such contradictory clauses is that they fail to prune the default partition of a sub-partitioned table, because the value extracted from such a clause appears to the pruning logic to fall in the default partition, given that the pruning logic proper is unaware of the partition constraint of the partitioned table that pruning is being applied to. Here is an example similar to one that Hosoya-san shared earlier on this thread that shows the problem.

create table p (a int) partition by range (a);
create table p1 partition of p for values from (0) to (20) partition by range (a);
create table p11 partition of p1 for values from (0) to (10);
create table p1_def partition of p1 default;

-- p11 correctly pruned, but p1_def not
explain select * from p1 where a = 25;
                         QUERY PLAN
─────────────────────────────────────────────────────────────
 Append  (cost=0.00..41.94 rows=13 width=4)
   ->  Seq Scan on p1_def  (cost=0.00..41.88 rows=13 width=4)
         Filter: (a = 25)
(3 rows)

Here, without the knowledge that p1's range is restricted to 0 <= a < 20 by way of its partition constraint, the pruning logic, when handed the value 25, concludes that p1_def must be scanned.
With the patch, partprune.c concludes without performing pruning that scanning any of p1's partitions is unnecessary.

explain select * from p1 where a = 25;
               QUERY PLAN
─────────────────────────────────────────
 Result  (cost=0.00..0.00 rows=0 width=0)
   One-Time Filter: false
(2 rows)

Actually, as of 11.4, setting constraint_exclusion = on, by way of relation_excluded_by_constraints(), will give you the same result even without the patch. My argument earlier was that we shouldn't have two places that will do essentially the same processing -- partprune.c with the patch applied and relation_excluded_by_constraints(). That is, we should only keep the latter, with the trade-off that users have to live with the default partition of sub-partitioned tables not being pruned in some corner cases like this one.

Note that there's still a problem with the existing usage of constraint exclusion in partprune.c, which Thibaut first reported on this thread [1].

explain select * from p1 where a = 25 or a = 5;
                       QUERY PLAN
─────────────────────────────────────────────────────────
 Append  (cost=0.00..96.75 rows=50 width=4)
   ->  Seq Scan on p11  (cost=0.00..48.25 rows=25 width=4)
         Filter: ((a = 25) OR (a = 5))
   ->  Seq Scan on p1_def  (cost=0.00..48.25 rows=25 width=4)
         Filter: ((a = 25) OR (a = 5))
(5 rows)

Here only one of the OR's arguments can be true for p1's partitions, but partprune.c's current usage of constraint exclusion fails to notice that. I had posted a patch [2] to solve this specific problem. Hosoya-san's patch is a generalization of my patch.

Thanks,
Amit

[1] https://www.postgresql.org/message-id/bd03f475-30d4-c4d0-3d7f-d2fbde755971%40dalibo.com
[2] https://www.postgresql.org/message-id/9bb31dfe-b0d0-53f3-3ea6-e64b811424cf%40lab.ntt.co.jp
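The failure mode described above can be sketched abstractly: a pruner that only looks at each sub-partition's own bounds routes any value outside those bounds to the default partition, unless it also accounts for the parent's partition constraint. A toy Python model of the example (hypothetical names and structure; not PostgreSQL's actual pruning code):

```python
# Toy model of range partition pruning for the example above:
# p1 accepts 0 <= a < 20, with sub-partition p11 covering [0, 10)
# and p1_def as the default. Illustrative only.

P1_CONSTRAINT = (0, 20)        # p1's own partition constraint
SUBPARTS = {'p11': (0, 10)}    # explicit sub-partitions of p1
DEFAULT = 'p1_def'

def prune(value, check_parent_constraint):
    """Return the partitions of p1 that must be scanned for a = value."""
    lo, hi = P1_CONSTRAINT
    if check_parent_constraint and not (lo <= value < hi):
        return []              # contradicts p1's constraint: scan nothing
    for name, (plo, phi) in SUBPARTS.items():
        if plo <= value < phi:
            return [name]
    return [DEFAULT]           # no explicit range matched

# Without the parent-constraint check, a = 25 wrongly lands in the default.
assert prune(25, check_parent_constraint=False) == ['p1_def']
# With it (what the v5 patch effectively achieves), nothing is scanned.
assert prune(25, check_parent_constraint=True) == []
# In-range values behave the same either way.
assert prune(5, check_parent_constraint=False) == ['p11']
assert prune(5, check_parent_constraint=True) == ['p11']
```

The model makes the trade-off concrete: the check only matters for values that contradict the parent's constraint, which is exactly the corner case where p1_def would otherwise be scanned.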
Re: Custom table AMs need to include heapam.h because of BulkInsertState
On Wed, 3 Jul 2019 at 19:35, Michael Paquier wrote:
>
> On Tue, Jul 02, 2019 at 01:26:26AM +1200, David Rowley wrote:
> > I've pushed the original patch plus a small change to only call
> > table_finish_bulk_insert() for the target of the copy when we're using
> > bulk inserts.
>
> Yes, sorry for coming late at the party here. What I meant previously
> is that I did not find the version published at [1] to be natural with
> its structure to define an executor callback which then calls a
> callback for table AMs, still I get your point that it would be better
> to try to avoid unnecessary fsync calls on partitions where no tuples
> have been redirected with a COPY. The version 1 of the patch attached
> at [2] felt much more straight-forward and cleaner by keeping all the
> table AM callbacks within copy.c.
>
> This has been reverted as of f5db56f, still it seems to me that this
> was moving in the right direction.

I think the only objection to doing it the way [2] did was that, if there are more than MAX_PARTITION_BUFFERS partitions, then we may end up evicting the CopyMultiInsertBuffer out of the CopyMultiInsertInfo and thus cause a call to table_finish_bulk_insert() before we're done with the copy. It's not impossible that this could happen many times for a given partition. I agree that a working version of [2] is cleaner than [1], but it's just the thought of those needless calls. For [1], I wasn't very happy with the way it turned out, which is why I ended up suggesting a few other ideas. I just don't really like either of them any better than [1], so I didn't chase those up, and that's why I ended up going for [2]. If you think any of the other ideas I suggested are better (apart from [2]) then let me know and I can see about writing a patch. Otherwise, I plan to just fix [2] and push.
> [1]: > https://postgr.es/m/CAKJS1f95sB21LBF=1mcsev+xlta_jc3mtxx5kgduhdsogow...@mail.gmail.com > [2]: > https://postgr.es/m/cakjs1f_0t-k0_3xe+erxpq-jgaob6trzayercxf2rpgduvm...@mail.gmail.com -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
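The eviction concern raised above can be sketched as a toy model: with a bounded set of per-partition multi-insert buffers, round-robin inserts across more partitions than the cap can trigger the finish callback for the same partition repeatedly within a single COPY. Names here are illustrative, not PostgreSQL's actual structures:

```python
# Toy model of the concern above: a bounded buffer set evicts the oldest
# partition buffer when full, and eviction triggers an early
# "finish_bulk_insert" for that partition even though the COPY continues.

MAX_PARTITION_BUFFERS = 2

def run_copy(rows):
    """rows is a sequence of target partition names; returns the log of
    finish_bulk_insert-style calls, in order."""
    buffers = []          # open buffers, oldest first
    finish_calls = []
    for part in rows:
        if part not in buffers:
            if len(buffers) >= MAX_PARTITION_BUFFERS:
                evicted = buffers.pop(0)
                finish_calls.append(evicted)   # early finish on eviction
            buffers.append(part)
    finish_calls.extend(buffers)               # flush at end of COPY
    return finish_calls

# Round-robin across three partitions with room for only two buffers:
# p1 gets "finished" twice even though the COPY is a single command.
calls = run_copy(['p1', 'p2', 'p3', 'p1', 'p2', 'p3'])
assert calls.count('p1') == 2
```

This is the "needless calls" worry: each early finish may translate into an extra fsync-like cost per evicted partition.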
Re: Custom table AMs need to include heapam.h because of BulkInsertState
On Tue, Jul 02, 2019 at 01:26:26AM +1200, David Rowley wrote: > I've pushed the original patch plus a small change to only call > table_finish_bulk_insert() for the target of the copy when we're using > bulk inserts. Yes, sorry for coming late at the party here. What I meant previously is that I did not find the version published at [1] to be natural with its structure to define an executor callback which then calls a callback for table AMs, still I get your point that it would be better to try to avoid unnecessary fsync calls on partitions where no tuples have been redirected with a COPY. The version 1 of the patch attached at [2] felt much more straight-forward and cleaner by keeping all the table AM callbacks within copy.c. This has been reverted as of f5db56f, still it seems to me that this was moving in the right direction. [1]: https://postgr.es/m/CAKJS1f95sB21LBF=1mcsev+xlta_jc3mtxx5kgduhdsogow...@mail.gmail.com [2]: https://postgr.es/m/cakjs1f_0t-k0_3xe+erxpq-jgaob6trzayercxf2rpgduvm...@mail.gmail.com -- Michael signature.asc Description: PGP signature
Re: Run-time pruning for ModifyTable
On Wed, 3 Jul 2019 at 17:27, Amit Langote wrote: > A cursory look at the patch suggests that most of its changes will be > for nothing if [1] materializes. What do you think about that? Yeah, I had this in mind when writing the patch, but kept going anyway. I think it's only really a small part of this patch that would get wiped out with that change. Just the planner.c stuff. Everything else is still required, as far as I understand. > [1] https://www.postgresql.org/message-id/357.1550612935%40sss.pgh.pa.us -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: POC: converting Lists into arrays
On Tue, Jul 2, 2019 at 5:12 PM Tom Lane wrote: > Oleksandr Shulgin writes: > > Not related to the diff v6..v7, but shouldn't we throw additionally a > > memset() with '\0' before calling pfree(): > > I don't see the point of that. In debug builds CLOBBER_FREED_MEMORY will > take care of it, and in non-debug builds I don't see why we'd expend > the cycles. > This is what I was wondering about, thanks for providing a reference. -- Alex
Re: TopoSort() fix
On Wed, Jul 03, 2019 at 10:41:59AM +0800, Rui Hai Jiang wrote:
> Could the attached patch fix this issue? Or does anyone else plan to fix it?
>
> If people are busy and have no time, I can go ahead to fix it. To fix
> this issue, do we need a patch for each official branch?

Only a committer could merge any fix you produce. What you have sent looks fine to me, so let's wait for Robert, who has visibly broken this part, to comment. Back-patched versions are usually taken care of by the committer merging the fix, and from experience it is better to agree on the shape of a patch on HEAD before working on other branches. Depending on the review done, the patch's shape may change slightly...
-- Michael

signature.asc Description: PGP signature
Re: [PATCH v4] Add \warn to psql
Hello Tom,

> I took a look at this. I have no quibble with the proposed feature, and
> the implementation is certainly simple enough. But I'm unconvinced about
> the proposed test scaffolding. Spinning up a new PG instance is a *hell*
> of a lot of overhead to pay for testing something that could be tested
> as per attached. Admittedly, the attached doesn't positively prove which
> pipe each output string went down, but that does not strike me as a
> concern large enough to justify adding a TAP test for.

Sure. The point is that there would be at least *one* TAP test so that many other features of psql, although not all, can be tested. I have been reviewing quite a few patches without tests because of this lack of infrastructure, and no one patch is ever going to justify a TAP test on its own. It has to start somewhere. Currently psql coverage is abysmal: around 40% of lines & functions are called by the whole non-regression tests, despite the hundreds of psql-relying tests. Pg is around 80% coverage overall. Basically, I really think that one psql-dedicated TAP test should be added, not for \warn per se, but for other features.

> I'd be happier about adding TAP infrastructure if it looked like it'd be
> usable to test some of the psql areas that are unreachable by the
> existing test methodology, particularly tab-complete.c and prompt.c. But
> I don't see anything here that looks like it'll work for that.

Tab completion and the prompt are special interactive cases and probably require special infrastructure to make a test believe it is running against a tty while it is not. The point of this proposal is not to address these special needs, but to lay a basic infra.

> I don't like what you did to command_checks_all,

Yeah, probably my fault, not David's.

> either --- it could hardly say "bolted on after the fact" more clearly
> if you'd written that in text. If we need an input-stream argument,
> let's just add it in a rational place and adjust the callers. There
> aren't that many of 'em, nor has the subroutine been around all that
> long.

I wanted to avoid breaking the function signature in case it is used by some external packages. Not caring is an option.

-- Fabien.
Re: POC: converting Lists into arrays
On Tue, 2 Jul 2019 at 11:27, Tom Lane wrote:
> My previous patch would have had you replace this with a loop using
> an integer list-position index. You can still do that if you like,
> but it's less change to convert the loop to a foreach(), drop the
> prev/next variables, and replace the list_delete_cell call with
> foreach_delete_current:
>
> foreach(cell, state->enterKeys)
> {
>     TrgmStateKey *existingKey = (TrgmStateKey *) lfirst(cell);
>
>     if (need to delete)
>         state->enterKeys = foreach_delete_current(state->enterKeys, cell);
> }
>
> So I think this is a win, and attached is v7.

It's pretty nice to get rid of those. I also like how you've handled the changes in SRFs.

I've now read over the entire patch and have noted down the following:

1. MergeAttributes() could do with a comment mentioning why you're not using foreach() on the outer loop. I almost missed the list_delete_nth_cell() call that's a few branches deep in the outer loop.

2. In expandTupleDesc(), couldn't you just change the following:

    int i;

    for (i = 0; i < offset; i++)
    {
        if (aliascell)
            aliascell = lnext(eref->colnames, aliascell);
    }

to:

    aliascell = offset < list_length(eref->colnames) ?
        list_nth_cell(eref->colnames, offset) : NULL;

3. Worth Assert(list != NIL); in new_head_cell() and new_tail_cell()?

4. Do you think it would be a good idea to document that the 'pos' arg in list_insert_nth and co must be <= list_length()? I know you've mentioned that in insert_new_cell, but that's static and list_insert_nth is not. I think it would be better not to have to chase down comments of static functions to find out how to use an external function.

5. Why does enlarge_list() return List *? Can't it just return void? I noticed this after looking at add_new_cell_after() and reading your cautionary comment and then looking at lappend_cell().
At first, it seemed that lappend_cell() could end up reallocating List to make way for the new cell, but from looking at enlarge_list() it seems we always maintain the original allocation of the header. So why bother returning List * in that function? 6. Is there a reason to use memmove() in list_concat() rather than just memcpy()? I don't quite believe the comment you've written. As far as I can see you're not overwriting any useful memory so the order of the copy should not matter. 7. The last part of the following comment might not age well. /* * Note: unlike the individual-list-cell deletion functions, we never make * any effort to move the surviving list cells to new storage. This is * because none of them can move in this operation, so it's the same as * the old implementation in terms of what callers may assume. */ The old comment about extending the list seems more fitting. 9. I see your XXX comment in list_qsort(), but wouldn't it be better to just invent list_qsort_internal() as a static function and just have it qsort the list in-place, then make list_qsort just return list_qsort_internal(list_copy(list)); and keep the XXX comment so that the fixup would just remove the list_copy()? That way, when it comes to fixing that inefficiency we can just cut and paste the internal implementation into list_qsort(). It'll be much less churn, especially if you put the internal version directly below the external one. 10. I wonder if we can reduce a bit of pain for extension authors by back patching a macro that wraps up a lnext() call adding a dummy argument for the List. That way they don't have to deal with their own pre-processor version dependent code. Downsides are we'd need to keep the macro into the future, however, it's just 1 line of code... I also did some more benchmarking of the patch. Basically, I patched with the attached file (converted to .txt not to upset the CFbot) then ran make installcheck. This was done on an AWS m5d.large instance. 
The patch runs the planner 10 times then LOGs the average time of those 10 runs. Taking the sum of those averages I see:

Master:  0.482479 seconds
Patched: 0.471949 seconds

which makes the patched version 2.2% faster than master on that run. I've resisted attaching the spreadsheet since there are almost 22k data points per run.

Apart from the 10 points above, I think the patch is good to go. I also agree with keeping the further improvements, like getting rid of the needless list_copy() in list_concat() calls, as a followup patch. I also agree with Tom about moving quickly with this one. Reviewing it in detail took me a long time; I'd really rather not do it again after leaving it to rot for a while.

-- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services

diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 1bcfdd67e0..750320cb3b 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@
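The foreach_delete_current() pattern discussed in this review -- deleting the current element of an array-backed list without breaking the ongoing loop -- can be sketched in Python (illustrative only; the real implementation is PostgreSQL's C list code):

```python
# Toy model of deleting elements from an array-backed list while
# iterating it by position, the way foreach_delete_current() lets a
# foreach() loop do: after deleting index i, stay at index i so the
# element shifted into that slot is not skipped.

def delete_matching(items, predicate):
    """Remove all elements satisfying predicate, in a single pass."""
    i = 0
    while i < len(items):
        if predicate(items[i]):
            del items[i]   # tail shifts left; do not advance i
        else:
            i += 1
    return items

keys = [1, 2, 2, 3, 2, 4]
assert delete_matching(keys, lambda k: k == 2) == [1, 3, 4]

# The naive version that advances unconditionally skips shifted elements:
def delete_matching_buggy(items, predicate):
    i = 0
    while i < len(items):
        if predicate(items[i]):
            del items[i]
        i += 1             # bug: skips the element moved into slot i
    return items

assert delete_matching_buggy([1, 2, 2, 3, 2, 4], lambda k: k == 2) == [1, 2, 3, 4]
```

This is the same hazard the prev/next-variable dance in the old cell-based API was guarding against, now handled by the macro instead.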
Re: [HACKERS] advanced partition matching algorithm for partition-wise join
On Tue, Jul 2, 2019 at 1:47 PM amul sul wrote: > Attached version is rebase atop of the latest master head(c74d49d41c), thanks. Thanks! Will review. Best regards, Etsuro Fujita
Re: [HACKERS] WIP: Aggregation push-down
On Fri, Mar 1, 2019 at 12:01 AM Antonin Houska wrote: > Tom Lane wrote: > > > Antonin Houska writes: > > > Michael Paquier wrote: > > >> Latest patch set fails to apply, so moved to next CF, waiting on > > >> author. > > > > > Rebased. > > > > This is in need of rebasing again :-(. I went ahead and pushed the 001 > > part, since that seemed fairly uncontroversial. > > ok, thanks. > Another rebase is needed for the patches. Thanks, Richard
Re: Where is SSPI auth username determined for TAP tests?
On Sun, Jun 30, 2019 at 12:09:18PM -0400, Tom Lane wrote:
> We could apply the same hack on the source node, but I wonder if it
> wouldn't be better to make this less of a kluge. I'm tempted to
> propose that "pg_regress --config-auth --user XXX" should understand
> XXX as the bootstrap superuser name, and then we could clean up
> 010_dump_connstr.pl by using that.

I have been reviewing that part, and the part to split the bootstrap user from the set of extra roles created looks fine to me. Now, it seems to me that you can simplify 010_dump_connstr.pl as per the attached, because PostgresNode::Init can take care of the --config-auth part with the correct options using auth_extra. What do you think about the cleanup attached?
-- Michael

diff --git a/src/bin/pg_dump/t/010_dump_connstr.pl b/src/bin/pg_dump/t/010_dump_connstr.pl
index 048ecf24eb..e9f0ff8211 100644
--- a/src/bin/pg_dump/t/010_dump_connstr.pl
+++ b/src/bin/pg_dump/t/010_dump_connstr.pl
@@ -177,15 +177,11 @@ my $restore_super = qq{regress_a'b\\c=d\\ne"f};
 # dbname/user connection parameters
 my $envar_node = get_new_node('destination_envar');
-$envar_node->init(extra =>
-	  [ '-U', $dst_bootstrap_super, '--locale=C', '--encoding=LATIN1' ]);
-$envar_node->run_log(
-	[
-		$ENV{PG_REGRESS}, '--config-auth',
-		$envar_node->data_dir, '--user',
-		$dst_bootstrap_super, '--create-role',
-		$restore_super
-	]);
+$envar_node->init(
+	extra =>
+	  [ '-U', $dst_bootstrap_super, '--locale=C', '--encoding=LATIN1' ],
+	auth_extra =>
+	  [ '--user', $dst_bootstrap_super, '--create-role', $restore_super ]);
 $envar_node->start;
 # make superuser for restore
@@ -210,15 +206,11 @@ is($stderr, '', 'no dump errors');
 $restore_super =~ s/"//g
   if $TestLib::windows_os;    # IPC::Run mishandles '"' on Windows
 my $cmdline_node = get_new_node('destination_cmdline');
-$cmdline_node->init(extra =>
-	  [ '-U', $dst_bootstrap_super, '--locale=C', '--encoding=LATIN1' ]);
-$cmdline_node->run_log(
-	[
-		$ENV{PG_REGRESS}, '--config-auth',
-		$cmdline_node->data_dir, '--user',
-		$dst_bootstrap_super, '--create-role',
-		$restore_super
-	]);
+$cmdline_node->init(
+	extra =>
+	  [ '-U', $dst_bootstrap_super, '--locale=C', '--encoding=LATIN1' ],
+	auth_extra =>
+	  [ '--user', $src_bootstrap_super, '--create-role', $restore_super ]);
 $cmdline_node->start;
 $cmdline_node->run_log(
 	[ 'createuser', '-U', $dst_bootstrap_super, '-s', $restore_super ]);

signature.asc Description: PGP signature