Re: [HACKERS] some review comments on logical rep code
On Fri, Apr 28, 2017 at 02:13:48PM -0400, Peter Eisentraut wrote: > On 4/28/17 01:01, Noah Misch wrote: > > On Fri, Apr 28, 2017 at 01:55:48PM +0900, Masahiko Sawada wrote: > >> On Fri, Apr 28, 2017 at 1:42 PM, Noah Misch wrote: > >>> On Fri, Apr 28, 2017 at 06:37:09AM +0900, Fujii Masao wrote: > Pushed. Thanks! > >>> > >>> Does this close the open item, or is there more to do? > >> > >> There is only one item remaining, and the patch is attached on > >> here[1]. I guess reviewing that patch is almost done. > >> > >> [1] > >> https://www.postgresql.org/message-id/CAHGQGwELzJrA4SDA4TsJGds4X-ykTOP%2By5hecsoQmQqzZf8T7A%40mail.gmail.com > > > > Thanks. Peter, This PostgreSQL 10 open item is past due for your status > > update. Kindly send a status update within 24 hours, and include a date for > > your subsequent status update. Refer to the policy on open item ownership: > > https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com > > I think the patch that Fujii Masao has proposed has found general > agreement. I would recommend that he commits it as he sees fit. This is not a conforming status update, because it does not specify a date for your next update. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Transition tables for triggers on foreign tables and views
Kevin Grittner writes: > Well, I was sort of hoping that the triggers that can now be defined > but can never fire *did* fire at some point. They will fire if you have an INSTEAD OF row-level trigger; the existence of that trigger is what determines whether we implement DML on a view through the view's own triggers or through translation to an action on the underlying table. I do not think it'd be reasonable to throw an error for creation of a statement-level view trigger when there's no row-level trigger, because that just imposes a hard-to-deal-with DDL ordering dependency. You could make a case for having the updatable-view translation code print a WARNING if it notices that there are statement-level triggers that cannot be fired due to the translation. regards, tom lane
Re: [HACKERS] convert EXSITS to inner join gotcha and bug
I wrote: > David Rowley writes: >> (On 29 April 2017 at 02:26, Tom Lane wrote: >> Seems related to the unconditional setting of extra.inner_unique to >> true for JOIN_UNIQUE_INNER jointypes in add_paths_to_joinrel() >> Setting this based on the return value of innerrel_is_unique() as done >> with the other join types seems to fix the issue. > Yes, I think that's correct. Well, "make check-world" disabused me of that notion: there are several test cases in postgres_fdw that lost perfectly-valid inner_unique markings. The reason is that create_unique_path will create uniqueness even when you couldn't prove it from the underlying rel itself. So my previous thought about comparing the outerrel to sjinfo->min_lefthand is really necessary to avoid regressions from what we had before. However, while that seems to be enough to generate correct plans, it doesn't address Teodor's performance complaint: he's wishing the planner would notice that the semijoin inner rel is effectively unique, even when the best plan involves initially joining the semijoin inner rel to just a subset of the semijoin outer --- against which that inner rel is *not* unique. Applying innerrel_is_unique() helps for some simpler cases, but not this one. Really, the way to fix Teodor's complaint is to recognize that the semijoin inner rel is effectively unique against the whole outer rel, and then strength-reduce the semijoin to a plain join. The infrastructure we built for unique joins is capable of proving that, we just weren't applying it in the right way. Attached are two alternative patches. The first just does the minimum necessary to fix the bug; the second adds some code to perform strength-reduction of semijoins. The second patch is capable of finding the plan Teodor wanted for his test case --- in fact, left to its own devices, it finds a *better* plan, both by cost and actual runtime. 
I'm kind of strongly tempted to apply the second patch; but it would be fair to complain that reduce_unique_semijoins() is new development and should wait for v11. Opinions? regards, tom lane diff --git a/src/backend/optimizer/path/joinpath.c b/src/backend/optimizer/path/joinpath.c index 5aedcd1..39e2ddd 100644 *** a/src/backend/optimizer/path/joinpath.c --- b/src/backend/optimizer/path/joinpath.c *** add_paths_to_joinrel(PlannerInfo *root, *** 126,133 * * We have some special cases: for JOIN_SEMI and JOIN_ANTI, it doesn't * matter since the executor can make the equivalent optimization anyway; ! * we need not expend planner cycles on proofs. For JOIN_UNIQUE_INNER, we ! * know we're going to force uniqueness of the innerrel below. For * JOIN_UNIQUE_OUTER, pass JOIN_INNER to avoid letting that value escape * this module. */ --- 126,136 * * We have some special cases: for JOIN_SEMI and JOIN_ANTI, it doesn't * matter since the executor can make the equivalent optimization anyway; ! * we need not expend planner cycles on proofs. For JOIN_UNIQUE_INNER, if ! * the LHS covers all of the associated semijoin's min_lefthand, then it's ! * appropriate to set inner_unique because the path produced by ! * create_unique_path will be unique relative to the LHS. (If we have an ! * LHS that's only part of the min_lefthand, that is *not* true.) For * JOIN_UNIQUE_OUTER, pass JOIN_INNER to avoid letting that value escape * this module. */ *** add_paths_to_joinrel(PlannerInfo *root, *** 138,144 extra.inner_unique = false; /* well, unproven */ break; case JOIN_UNIQUE_INNER: ! extra.inner_unique = true; break; case JOIN_UNIQUE_OUTER: extra.inner_unique = innerrel_is_unique(root, outerrel, innerrel, --- 141,148 extra.inner_unique = false; /* well, unproven */ break; case JOIN_UNIQUE_INNER: ! extra.inner_unique = bms_is_subset(sjinfo->min_lefthand, ! 
outerrel->relids); break; case JOIN_UNIQUE_OUTER: extra.inner_unique = innerrel_is_unique(root, outerrel, innerrel, diff --git a/src/test/regress/expected/join.out b/src/test/regress/expected/join.out index 69ce7aa..87ff365 100644 *** a/src/test/regress/expected/join.out --- b/src/test/regress/expected/join.out *** reset enable_sort; *** 5634,5636 --- 5634,5665 drop table j1; drop table j2; drop table j3; + -- check that semijoin inner is not seen as unique for a portion of the outerrel + explain (verbose, costs off) + select t1.unique1, t2.hundred + from onek t1, tenk1 t2 + where exists (select 1 from tenk1 t3 + where t3.thousand = t1.unique1 and t3.tenthous = t2.hundred) + and t1.unique1 < 1; +QUERY PLAN + - + Nested Loop +Output: t1.unique1, t2.hundred +
Re: [HACKERS] [PROPOSAL] Use SnapshotAny in get_actual_variable_range
On Fri, Apr 28, 2017 at 3:00 PM, Tom Lane wrote: > You are confusing number of tuples in the index, which we estimate from > independent measurements such as the file size, with endpoint value, > which is used for purposes like guessing whether a mergejoin will be > able to stop early. For purposes like that, we do NOT want to include > dead tuples, because the merge join is never gonna see 'em. I spent several hours today thinking about this and, more than once, thought I'd come up with an example demonstrating why my idea was better than yours (despite the fact that, as you point out, the merge join is never gonna see 'em). However, in each instance, I eventually realized that I was wrong, so I guess I'll have to concede this point. > Or put another way: the observed problem is planning time, not that the > resulting estimates are bad. You argue that we should cut planning > time by changing the definition of the estimate, and claim that the > new definition is better, but I think you have nothing but wishful > thinking behind that. I'm willing to try to cut planning time, but > I don't want the definition to change any further than it has to. OK, I guess that makes sense. There can be scads of dead tuples at the end of the index, and there's no surety that the query actually requires touching that portion of the index at all apart from planning, so it seems a bit unfortunate to burden planning with the overhead of cleaning them up. But I guess with your new proposed definition at least that can only happen once. After that there may be a bunch of pages to skip at the end of the index before we actually find a tuple, but they should at least be empty. Right now, you can end up skipping many tuples repeatedly for every query. 
I tested a two-table join with a million committed deleted but not dead tuples at the end of one index; that increased planning time from ~0.25ms to ~90ms; obviously, a two-and-a-half order of magnitude increase in CPU time spent planning is not a welcome development on a production system. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
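[Editor's note: a hypothetical way to reproduce the effect Robert describes. The table name, row counts, and session setup are invented for illustration; actual timings will vary, and the planner only walks the dead index entries while an old snapshot keeps them from being treated as fully dead.]

```sql
-- A pile of committed-deleted tuples at the high end of an index:
CREATE TABLE big (id int PRIMARY KEY);
INSERT INTO big SELECT generate_series(1, 2000000);
DELETE FROM big WHERE id > 1000000;

-- In a second session, hold an old snapshot so the deleted tuples are
-- committed but not yet removable:
--   BEGIN ISOLATION LEVEL REPEATABLE READ; SELECT count(*) FROM big;

-- Back in the first session, planning a join on the indexed column makes
-- get_actual_variable_range() walk past the dead entries at the index end:
\timing on
EXPLAIN SELECT * FROM big b1 JOIN big b2 USING (id);
```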
[HACKERS] A design for amcheck heapam verification
It seems like a good next step for amcheck would be to add functionality that verifies that heap tuples have matching index tuples, and that heap pages are generally sane. I've been thinking about a design for this for a while now, and would like to present some tentative ideas before I start writing code. Using a bloom filter built with hashed index tuples, and then matching that against the heap, is an approach to index/heap consistency checking that has lots of upsides, but at least one downside. The downside is pretty obvious: we introduce a low though quantifiable risk of false negatives (failure to detect corruption due to the actual absence of an appropriate index tuple). That's not great, especially because it occurs non-deterministically, but the upsides to this approach are considerable. Perhaps it's worth it. Here is the general approach that I have in mind right now:

* Size the bloom filter based on the pg_class.reltuples of the index to be verified, weighing a practical tolerance for false negatives, capped by work_mem.
  - We might throw an error immediately if it's impossible to get a reasonably low probability of false negatives -- for some value of "reasonable".
* Perform existing verification checks for a B-Tree. While scanning the index, hash index tuples, including their heap TID pointer. Build the bloom filter with hashed values as we go.

As we scan the heap, we:

* Verify that HOT safety was correctly assessed in respect of the index (or indexes) being verified.
* Test the visibility map, and sanity check MultiXacts [1].
* Probe index/heap match check (uses bloom filter): If a heap tuple meets the following conditions:
  - Is not a HOT update tuple.
  - Is committed, and committed before RecentGlobalXmin.
  - Satisfies any predicate (for partial index case).
  Then:
  - Build a would-be index tuple value, perhaps reusing CREATE INDEX code.
  - Hash that in the same manner as in the original index pass.
  - Raise an error if the bloom filter indicates it is not present in the index.

Seems like we should definitely go to the index first, because the heap is where we'll find visibility information. If I wanted to go about implementing the same index/heap checks in a way that does not have the notable downside around false negatives, I suppose I'd have to invent a new, internal mechanism that performed a kind of special merge join of an index against the heap. That would be complicated, and require a lot of code; in other words, it would be bug-prone. I wouldn't say that amcheck is simple today, but at least the complexity owes everything to how B-Trees already work, as opposed to how a bunch of custom infrastructure we had to invent works. The amount of code in amcheck's verify_nbtree.c file is pretty low, and that's a good goal to stick with. The very detailed comment that argues for the correctness of amcheck's bt_right_page_check_scankey() function is, to a large degree, also arguing for the correctness of a bunch of code within nbtree. amcheck verification should be comprehensive, but also simple and minimal, which IMV is a good idea for about the same reason that that's a good idea when writing unit tests. The merge join style approach would also make verification quite expensive, particularly when one table has multiple indexes. A tool whose overhead can only really be justified when we're pretty sure that there is already a problem is *significantly* less useful. And, it ties verification to the executor, which could become a problem if we make the functionality into a library that is usable by backup tools that don't want to go through the buffer manager (or any SQL interface). Apart from the low memory overhead of using a bloom filter, resource management is itself made a lot easier. 
We won't be sensitive to misestimations, because we only need an estimate of the number of tuples in the index, which will tend to be accurate enough in the vast majority of cases. reltuples is needed to size the bloom filter bitmap up-front. It doesn't matter how wide individual index tuples turn out to be, because we simply hash everything, including even the heap TID contained within the index. Using a bloom filter makes verification "stackable" in a way that might become important later. For example, we can later improve amcheck to verify a table in parallel, by having parallel workers verify one index each, with bloom filters built in fixed size shared memory buffers. A parallel heap scan then has workers concurrently verify heap pages, and concurrently probe each final bloom filter. Similarly, everything works the same if we add the option of scanning a B-Tree in physical order (at the expense of not doing cross-page verification). And, while I'd start with nbtree, you can still pretty easily generalize the approach to building the bloom filter across AMs. All index AMs other than BRIN have index tuples that are essentially some values that are either the original heap values
Re: [HACKERS] Declarative partitioning - another take
On Fri, Apr 28, 2017 at 06:29:48PM +0900, Amit Langote wrote: > On 2017/04/28 7:36, David Fetter wrote: > > On Thu, Apr 27, 2017 at 10:30:54AM +0900, Amit Langote wrote: > >> On 2017/04/27 1:52, Robert Haas wrote: > >>> On Tue, Apr 25, 2017 at 10:34 PM, Amit Langote > >>> wrote: > FWIW, I too prefer the latter, that is, fire only the parent's triggers. > In that case, applying only the patch 0001 will do. > >>> > >>> Do we need to update the documentation? > >> > >> Yes, I think we should. How about as in the attached? > >> > >> By the way, code changes I made in the attached are such that a subsequent > >> patch could implement firing statement-level triggers of all the tables in > >> a partition hierarchy, which it seems we don't want to do. Should then > >> the code be changed to not create ResultRelInfos of all the tables but > >> only the root table (the one mentioned in the command)? You will see that > >> the patch adds fields named es_nonleaf_result_relations and > >> es_num_nonleaf_result_relations, whereas just es_root_result_relation > >> would perhaps do, for example. > > > > Did I notice correctly that there's no way to handle transition tables > > for partitions in AFTER STATEMENT triggers? > > Did you mean to ask about AFTER STATEMENT triggers defined on > "partitioned" tables? Specifying a transition table for them is disallowed > altogether. > > ERROR: "p" is a partitioned table > DETAIL: Triggers on partitioned tables cannot have transition tables. OK, I suppose. It wasn't clear from the documentation. > Triggers created on (leaf) partitions *do* allow specifying a transition table. That includes the upcoming "default" tables, I presume. > Or are you asking something else altogether? I was just fuzzy on the interactions among these features. > > If not, I'm not suggesting that this be added at this late date, but > > we might want to document that. > > I don't see mentioned in the documentation that such triggers cannot be > defined on partitioned tables. 
Is that what you are saying should be documented? Yes, but I bias toward documenting a lot, and this restriction could go away in some future version, which would make things more confusing in the long run. I'm picturing a conversation in 2020 that goes something like this: "On 10, you could have AFTER STATEMENT triggers on tables, foreign tables, and leaf partition tables which referenced transition tables, but not on DEFAULT partitions. On 11, you could also have them on DEFAULT partition tables. From 12 onward, you can have transition tables on any relation." Kevin? Thomas? Best, David. -- David Fetter http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david(dot)fetter(at)gmail(dot)com Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
Re: [HACKERS] convert EXSITS to inner join gotcha and bug
David Rowley writes: > (On 29 April 2017 at 02:26, Tom Lane wrote: >> It looks like in the case that's giving wrong answers, the mergejoin >> is wrongly getting marked as "Inner Unique". Something's a bit too >> cheesy about that planner logic --- not sure what, yet. > Seems related to the unconditional setting of extra.inner_unique to > true for JOIN_UNIQUE_INNER jointypes in add_paths_to_joinrel() > Setting this based on the return value of innerrel_is_unique() as done > with the other join types seems to fix the issue. > I don't know yet if that's the correct fix. It's pretty late 'round > this side to be thinking too hard about it. Yes, I think that's correct. I'd jumped to the conclusion that we could skip making the test in this case, but this example shows that that's wrong. The problem is that, in an example like this, create_unique_path will create a path that's unique-ified for all the join keys of the semijoin --- but we're considering joining against just a subset of the semijoin's outer rels, so the inner path is NOT unique for that subset. We could possibly skip making the test if the outerrel contains sjinfo->min_lefthand, but I'm not sufficiently excited about shaving cycles here to take any new risks. Let's just call innerrel_is_unique() and be done. Will fix in a bit, once I've managed to create a smaller test case for the regression tests. regards, tom lane
Re: [HACKERS] convert EXSITS to inner join gotcha and bug
On Fri, Apr 28, 2017 at 6:59 PM, Tom Lane wrote: > David Rowley writes: > > Did you mean to attach this? > > See the link in Teodor's original message (it's actually a .bz2 file > not a .gz) Yes, I didn't notice Teodor had renamed it. -- Alexander Korotkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: [HACKERS] [PATCH] Push limit to sort through a subquery
> If you add this to the commitfest app, more people might look at it when the next commitfest opens. I have added it. https://commitfest.postgresql.org/14/1119/ > Also, it might help if you can provide a query/ies with numbers where this optimization shows improvement. I can't provide the real queries where we encountered the problem because they are internal. However I showed a simplified version of the queries in my first post. On our queries, the change made quite a difference - execution time dropped from 31.4 seconds to 7.2 seconds. Explain analyze also shows that memory use dropped significantly and we didn't have to spill the sort to disk.

From:

-> Sort (cost=989.95..1013.27 rows=9326 width=30) (node_startup_time/loop=31328.891, node_total_time/loop: 31329.756 rows=2001 loops=1)
   Buffers: temp read=772 written=11201 lsm_bufmgr hits=3392
   Sort Key: ***
   Sort Method: external merge  Sort Space Used: 89592  Sort Space Type: Disk

To:

-> Sort (cost=989.95..1013.27 rows=9326 width=30) (node_startup_time/loop=7123.275, node_total_time/loop: 7123.504 rows=2001 loops=1)
   Buffers: lsm_bufmgr hits=3387
   Sort Key: ***
   Sort Method: top-N heapsort  Sort Space Used: 3256  Sort Space Type: Memory
Re: [HACKERS] Transition tables for triggers on foreign tables and views
On Fri, Apr 28, 2017 at 3:54 PM, Kevin Grittner wrote: > Not firing a table's DML because it was fired off a view would be crazy, should read: Not firing a table's trigger because the trigger is on DML run from a view's INSTEAD OF trigger would be crazy, -- Kevin Grittner VMware vCenter Server https://www.vmware.com/
Re: [HACKERS] Transition tables for triggers on foreign tables and views
On Fri, Apr 28, 2017 at 2:42 PM, Tom Lane wrote: > Kevin Grittner writes: >> On Tue, Apr 25, 2017 at 6:17 PM, Thomas Munro >> wrote: >>> For views, aside from the question of transition tables, I noticed >>> that statement triggers don't seem to fire at all with updatable >>> views. Surely they should -- isn't that a separate bug? > >> I checked out 25dc142a (before any of my commits for $subject), >> built it, and tried the above -- with no warning generated. I then >> used an UPDATE and DELETE against the view, also with no trigger >> fired (nor any error during trigger creation or DML). Does anyone >> know whether such trigger ever fired at any point in the commit >> history? > > [ experiments... ] They did, and do, fire if you do it the old-style > way with an INSTEAD OF row trigger. Here is the table from near the start of CREATE TRIGGER docs, "folded" such that I hope it remains intelligible in a fixed-width font after email systems get done with it:

When        Event                  Row-level                  Statement-level
BEFORE      INSERT/UPDATE/DELETE   Tables and foreign tables  Tables, views, and foreign tables
            TRUNCATE               —                          Tables
AFTER       INSERT/UPDATE/DELETE   Tables and foreign tables  Tables, views, and foreign tables
            TRUNCATE               —                          Tables
INSTEAD OF  INSERT/UPDATE/DELETE   Views                      —
            TRUNCATE               —                          —

The claim seems to be that you can create a { BEFORE | AFTER } { event [ OR ... ] } ... FOR EACH STATEMENT trigger where event is { INSERT | UPDATE | DELETE } on an updateable view. Well, you can *create* it, but it will never fire. > They don't fire if you're relying > on an updatable view. It seems we fire the table's statement triggers > instead, ie, the update is fully translated into an update on the > underlying table. Well, certainly that should *also* happen. Not firing a table's DML because it was fired off a view would be crazy, or so it seems to me. 
> I'm not sure how intentional that was, but it's not a completely > unreasonable definition on its face, and given the lack of field > complaints since 9.3, I think we should probably stick to it. Are you talking about being able to create INSERT, UPDATE, and DELETE triggers on the view (which never fire), or about firing triggers on the table when an INSTEAD OF trigger is fired? > However, if you didn't understand that from the documentation, > then we have a documentation problem. I totally get that there are INSTEAD OF triggers and have no quibble with how they behave. I can't even argue that the above chart is wrong in terms of what CREATE TRIGGER allows. However, creating triggers that can never fire seems entirely wrong. >> If we do get these working, don't they deserve at least >> one regression test? > > Are you sure there isn't one? Well, I was sort of hoping that the triggers that can now be defined but can never fire *did* fire at some point. But if that were true, and they subsequently were broken, it would mean we lacked regression tests for that case. -- Kevin Grittner VMware vCenter Server https://www.vmware.com/
Re: [HACKERS] pgbench tap tests & minor fixes
Here is a v3, with less files. I cannot say I find it better, but it still works. The "command_likes" function has been renamed "command_checks". -- Fabien.diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c index ae36247..749b16d 100644 --- a/src/bin/pgbench/pgbench.c +++ b/src/bin/pgbench/pgbench.c @@ -229,7 +229,7 @@ typedef struct SimpleStats typedef struct StatsData { time_t start_time; /* interval start time, for aggregates */ - int64 cnt; /* number of transactions */ + int64 cnt; /* number of transactions, including skipped */ int64 skipped; /* number of transactions skipped under --rate * and --latency-limit */ SimpleStats latency; @@ -329,7 +329,7 @@ typedef struct bool prepared[MAX_SCRIPTS]; /* whether client prepared the script */ /* per client collected stats */ - int64 cnt; /* transaction count */ + int64 cnt; /* client transaction count, for -t */ int ecnt; /* error count */ } CState; @@ -2045,7 +2045,9 @@ doCustom(TState *thread, CState *st, StatsData *agg) if (INSTR_TIME_IS_ZERO(now)) INSTR_TIME_SET_CURRENT(now); now_us = INSTR_TIME_GET_MICROSEC(now); - while (thread->throttle_trigger < now_us - latency_limit) + while (thread->throttle_trigger < now_us - latency_limit && + /* with -t, do not overshoot */ + (nxacts <= 0 || st->cnt < nxacts)) { processXactStats(thread, st, , true, agg); /* next rendez-vous */ @@ -2053,6 +2055,12 @@ doCustom(TState *thread, CState *st, StatsData *agg) thread->throttle_trigger += wait; st->txn_scheduled = thread->throttle_trigger; } + + if (nxacts > 0 && st->cnt >= nxacts) + { + st->state = CSTATE_FINISHED; + break; + } } st->state = CSTATE_THROTTLE; @@ -2372,7 +2380,7 @@ doCustom(TState *thread, CState *st, StatsData *agg) per_script_stats || use_log) processXactStats(thread, st, , false, agg); else - thread->stats.cnt++; + thread->stats.cnt++, st->cnt++; if (is_connect) { @@ -2381,7 +2389,6 @@ doCustom(TState *thread, CState *st, StatsData *agg) INSTR_TIME_SET_ZERO(now); } -++st->cnt; if ((st->cnt >= 
nxacts && duration <= 0) || timer_exceeded) { /* exit success */ @@ -2530,6 +2537,7 @@ processXactStats(TState *thread, CState *st, instr_time *now, lag = INSTR_TIME_GET_MICROSEC(st->txn_begin) - st->txn_scheduled; } + /* thread stats */ if (progress || throttle_delay || latency_limit) { accumStats(>stats, skipped, latency, lag); @@ -2539,8 +2547,12 @@ processXactStats(TState *thread, CState *st, instr_time *now, thread->latency_late++; } else + /* just count */ thread->stats.cnt++; + /* client stat is just counting */ + st->cnt ++; + if (use_log) doLog(thread, st, agg, skipped, latency, lag); @@ -3118,7 +3130,7 @@ process_backslash_command(PsqlScanState sstate, const char *source) /* Save line */ my_command->line = expr_scanner_get_substring(sstate, start_offset, - end_offset); + end_offset + 1); if (pg_strcasecmp(my_command->argv[0], "sleep") == 0) { @@ -3509,7 +3521,7 @@ printResults(TState *threads, StatsData *total, instr_time total_time, { printf("number of transactions per client: %d\n", nxacts); printf("number of transactions actually processed: " INT64_FORMAT "/%d\n", - total->cnt, nxacts * nclients); + total->cnt - total->skipped, nxacts * nclients); } else { @@ -3525,12 +3537,12 @@ printResults(TState *threads, StatsData *total, instr_time total_time, if (throttle_delay && latency_limit) printf("number of transactions skipped: " INT64_FORMAT " (%.3f %%)\n", total->skipped, - 100.0 * total->skipped / (total->skipped + total->cnt)); + 100.0 * total->skipped / total->cnt); if (latency_limit) printf("number of transactions above the %.1f ms latency limit: %d (%.3f %%)\n", latency_limit / 1000.0, latency_late, - 100.0 * latency_late / (total->skipped + total->cnt)); + 100.0 * latency_late / total->cnt); if (throttle_delay || progress || latency_limit) printSimpleStats("latency", >latency); @@ -3580,7 +3592,7 @@ printResults(TState *threads, StatsData *total, instr_time total_time, printf(" - number of transactions skipped: " INT64_FORMAT " (%.3f%%)\n", 
sql_script[i].stats.skipped, 100.0 * sql_script[i].stats.skipped / - (sql_script[i].stats.skipped + sql_script[i].stats.cnt)); + sql_script[i].stats.cnt); if (num_scripts > 1) printSimpleStats(" - latency", _script[i].stats.latency); @@ -4106,6 +4118,12 @@ main(int argc, char **argv) exit(1); } + if (progress_timestamp && progress <= 0) + { + fprintf(stderr, "--progress-timestamp is allowed only under --progress\n"); + exit(1); + } + /* * save main process id in the global variable because process id will be * changed after fork. diff
Re: [HACKERS] Transition tables for triggers on foreign tables and views
Kevin Grittner writes: > On Tue, Apr 25, 2017 at 6:17 PM, Thomas Munro > wrote: >> For views, aside from the question of transition tables, I noticed >> that statement triggers don't seem to fire at all with updatable >> views. Surely they should -- isn't that a separate bug? > I checked out 25dc142a (before any of my commits for $subject), > built it, and tried the above -- with no warning generated. I then > used an UPDATE and DELETE against the view, also with no trigger > fired (nor any error during trigger creation or DML). Does anyone > know whether such trigger ever fired at any point in the commit > history? [ experiments... ] They did, and do, fire if you do it the old-style way with an INSTEAD OF row trigger. They don't fire if you're relying on an updatable view. It seems we fire the table's statement triggers instead, ie, the update is fully translated into an update on the underlying table. I'm not sure how intentional that was, but it's not a completely unreasonable definition on its face, and given the lack of field complaints since 9.3, I think we should probably stick to it. However, if you didn't understand that from the documentation, then we have a documentation problem. > If we do get these working, don't they deserve at least > one regression test? Are you sure there isn't one? regards, tom lane
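[Editor's note: a minimal sketch of the behavior Tom describes, with invented object names. The view's own statement-level trigger stays dormant under the auto-updatable-view translation, and starts firing once an INSTEAD OF row trigger makes the view the DML target.]

```sql
CREATE TABLE t (a int);
CREATE VIEW v AS SELECT a FROM t;   -- auto-updatable view

CREATE FUNCTION log_stmt() RETURNS trigger LANGUAGE plpgsql AS
$$ BEGIN RAISE NOTICE 'statement trigger on %', TG_TABLE_NAME; RETURN NULL; END $$;

CREATE TRIGGER v_stmt AFTER INSERT ON v
    FOR EACH STATEMENT EXECUTE PROCEDURE log_stmt();

-- Translated into an insert on t; t's statement triggers (if any) fire,
-- but v_stmt does not:
INSERT INTO v VALUES (1);

CREATE FUNCTION ins_row() RETURNS trigger LANGUAGE plpgsql AS
$$ BEGIN INSERT INTO t VALUES (NEW.a); RETURN NEW; END $$;

CREATE TRIGGER v_instead INSTEAD OF INSERT ON v
    FOR EACH ROW EXECUTE PROCEDURE ins_row();

-- Now the DML is implemented through v's own triggers, and v_stmt fires:
INSERT INTO v VALUES (2);
```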
Re: [HACKERS] Partition-wise join for join between (declaratively) partitioned tables
On Fri, Apr 28, 2017 at 1:18 AM, Ashutosh Bapat wrote: > For two-way join this works and is fairly straight-forward. I am > assuming that A and B are base relations and not joins. But making it > work for N-way join is the challenge. I don't think it's much different, is it? Anyway, I'm going to protest if your algorithm for merging bounds takes any more than linear time, regardless of what else we decide. >> Having said that I think we could make this work, I'm starting to >> agree with you that it will add more complexity than it's worth. >> Needing to keep track of the type of every partition bound >> individually seems like a real nuisance, and it's not likely to win >> very often because, realistically, people should and generally will >> use the same type for the partitioning column in all of the relevant >> tables. So I'm going to revise my position and say it's fine to just >> give up on partitionwise join unless the types match exactly, but I >> still think we should try to cover the cases where the bounds don't >> match exactly but only 1:1 or 1:0 or 0:1 mappings are needed (iow, >> optimizations 1 and 2 from your list of 4). I agree that ganging >> partitions (optimization 4 from your list) is not something to tackle >> right now. > > Good. I will have a more enjoyable vacation now. Phew, what a relief. :-) > Do you still want the partition key type to be out of partition scheme? > Keeping it there means we match it only once and save it only at a > single place. Otherwise, it will have to be stored in RelOptInfo of > the partitioned table and match it for every pair of joining > relations. The only reason for removing things from the PartitionScheme was if they didn't need to be consistent across all tables. Deciding that the type is one of the things that has to match means deciding it should be in the PartitionScheme, not the RelOptInfo. 
-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] [PROPOSAL] Use SnapshotAny in get_actual_variable_range
Robert Haas writes:
> On Fri, Apr 28, 2017 at 12:12 PM, Tom Lane wrote:
>> Maybe we need another type of snapshot that would accept any
>> non-vacuumable tuple.  I really don't want SnapshotAny semantics here,

> I don't, in general, share your intuition that using SnapshotAny is
> the wrong thing.

I think you are mistaken.

> We're looking up the last value in the index for
> planning purposes.  It seems to me that, as far as making index scans
> more or less expensive to scan, a dead tuple is almost as good as a
> live one.

You are confusing the number of tuples in the index, which we estimate from independent measurements such as the file size, with the endpoint value, which is used for purposes like guessing whether a mergejoin will be able to stop early.  For purposes like that, we do NOT want to include dead tuples, because the merge join is never gonna see 'em.  In short, arguing about the cost of an indexscan per se is quite irrelevant, because this statistic is not used when estimating the cost of an indexscan.

Or put another way: the observed problem is planning time, not that the resulting estimates are bad.  You argue that we should cut planning time by changing the definition of the estimate, and claim that the new definition is better, but I think you have nothing but wishful thinking behind that.  I'm willing to try to cut planning time, but I don't want the definition to change any further than it has to.

regards, tom lane
Re: [HACKERS] identity columns
On Fri, Apr 28, 2017 at 2:49 PM, Peter Eisentraut wrote:
> On 4/27/17 16:10, Robert Haas wrote:
>> I still think you should consider improving the psql output, though.
>> Vitaly's examples upthread indicate that for a serial sequence,
>> there's psql output showing the linkage between the table and sequence
>> in both directions, but not when GENERATED is used.  Can we get
>> something morally equivalent for the GENERATED case?
>
> Done.  Good catch.

Cool.  Thanks.

-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Transition tables for triggers on foreign tables and views
On Tue, Apr 25, 2017 at 6:17 PM, Thomas Munro wrote:
> My colleague Prabhat Sahu reported off list that transition tables
> don't work for views.  I probably should have thought about that when
> I fixed something similar for partitioned tables, and after some
> experimentation I see that this is also broken for foreign tables.

Good spot.  Don't know why that didn't occur to me to look at.

> For foreign tables using postgres_fdw, I see that transition tables
> capture new rows for INSERT but capture nothing for DELETE and UPDATE.

Will dig into that in a bit, but first...

> For views, aside from the question of transition tables, I noticed
> that statement triggers don't seem to fire at all with updatable
> views.  Surely they should -- isn't that a separate bug?
>
> Example:
>
> create table my_table (i int);
> create view my_view as select * from my_table;
> create function my_trigger_function() returns trigger language plpgsql as
> $$ begin raise warning 'hello world'; return null; end; $$;
> create trigger my_trigger after insert or update or delete on my_view
> for each statement execute procedure my_trigger_function();
> insert into my_view values (42);
>
> ... and the world remains ungreeted.

I checked out 25dc142a (before any of my commits for $subject), built it, and tried the above -- with no warning generated.  I then used an UPDATE and DELETE against the view, also with no trigger fired (nor any error during trigger creation or DML).  Does anyone know whether such a trigger ever fired at any point in the commit history?  Should we remove the documentation saying that anything except INSTEAD OF triggers works on a view, and generate errors for attempts to do otherwise, or is there something to salvage that has worked at some point?

If we do get these working, don't they deserve at least one regression test?

Will post again after I have a chance to review the FDW issue.
-- Kevin Grittner VMware vCenter Server https://www.vmware.com/
Re: [HACKERS] identity columns
On 4/27/17 16:10, Robert Haas wrote:
> I still think you should consider improving the psql output, though.
> Vitaly's examples upthread indicate that for a serial sequence,
> there's psql output showing the linkage between the table and sequence
> in both directions, but not when GENERATED is used.  Can we get
> something morally equivalent for the GENERATED case?

Done.  Good catch.

-- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] some review comments on logical rep code
On 4/28/17 01:01, Noah Misch wrote:
> On Fri, Apr 28, 2017 at 01:55:48PM +0900, Masahiko Sawada wrote:
>> On Fri, Apr 28, 2017 at 1:42 PM, Noah Misch wrote:
>>> On Fri, Apr 28, 2017 at 06:37:09AM +0900, Fujii Masao wrote:
>>>> Pushed. Thanks!
>>>
>>> Does this close the open item, or is there more to do?
>>
>> There is only one item remaining, and the patch is attached
>> here[1]. I guess reviewing that patch is almost done.
>>
>> [1] https://www.postgresql.org/message-id/CAHGQGwELzJrA4SDA4TsJGds4X-ykTOP%2By5hecsoQmQqzZf8T7A%40mail.gmail.com
>
> Thanks.  Peter, this PostgreSQL 10 open item is past due for your status
> update.  Kindly send a status update within 24 hours, and include a date for
> your subsequent status update.  Refer to the policy on open item ownership:
> https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com

I think the patch that Fujii Masao has proposed has found general agreement.  I would recommend that he commit it as he sees fit.

-- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] Dropping a partitioned table takes too long
On Fri, Apr 28, 2017 at 6:12 AM, 高增琦 wrote:
> It seems that in 'load_relcache_init_file()', we forget to initialize
> 'rd_pdcxt' of the relcache.

Fixed.  Thanks.

-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Dropping a partitioned table takes too long
On Wed, Apr 26, 2017 at 12:21 PM, Peter Eisentraut wrote:
> On 4/26/17 12:15, Robert Haas wrote:
>> On Tue, Apr 25, 2017 at 10:05 PM, Amit Langote wrote:
>>>> The attached patch tries to replace 'heap_open' with 'LockRelationOid'
>>>> when locking the parent table. It improved dropping a table with 7000
>>>> partitions.
>>>
>>> Your patch seems to be a much better solution to the problem, thanks.
>>
>> Does anyone wish to object to this patch as untimely?
>>
>> If not, I'll commit it.
>
> Seems quite reasonable.

OK, done.

-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Crash when partition column specified twice
On Fri, Apr 28, 2017 at 7:23 AM, Beena Emerson wrote:
> Hello Amit,
>
> The extra n->is_from_type = false; seems to be added by mistake?
>
> @@ -11888,6 +11891,8 @@ TableFuncElement: ColId Typename opt_collate_clause
>     n->is_local = true;
>     n->is_not_null = false;
>     n->is_from_type = false;
> +   n->is_from_type = false;
> +   n->is_from_parent = false;
>     n->storage = 0;
>     n->raw_default = NULL;
>     n->cooked_default = NULL;

Good catch.  Committed after fixing that issue.

-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Interval for launching the table sync worker
On 4/27/17 21:20, Masahiko Sawada wrote:
> Isn't it better to use != NIL instead, as follows?
>
>     else if (table_state != NIL && last_start_times)

I'm not a fan of that in general, and it doesn't really add any clarity here.

-- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] Interval for launching the table sync worker
On 4/27/17 15:33, Petr Jelinek wrote:
> On 27/04/17 21:00, Peter Eisentraut wrote:
>> On 4/27/17 06:47, Petr Jelinek wrote:
>>> One thing I am missing in your patch however is cleanup of entries for
>>> relations that finished sync. I wonder if it would be enough to just
>>> destroy the hash when we get to an empty list.
>>
>> I had omitted that because the amount of memory "leaked" is not much,
>> but I guess it wouldn't hurt to clean it up.
>>
>> How about the attached?
>
> Yes, that's more or less what I meant. All good.

Committed.

-- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] [PROPOSAL] Use SnapshotAny in get_actual_variable_range
On Fri, Apr 28, 2017 at 12:12 PM, Tom Lane wrote:
> Robert Haas writes:
>> On Thu, Apr 27, 2017 at 5:22 PM, Tom Lane wrote:
>>> How so?  Shouldn't the indexscan go back and mark such tuples dead in
>>> the index, such that they'd be visited this way only once?  If that's
>>> not happening, maybe we should try to fix it.
>
>> Hmm.  Actually, I think the scenario I saw was where there was a large
>> number of tuples at the end of the index that weren't dead yet due to
>> an old snapshot held open.  That index was being scanned by lots of
>> short-running queries.  Those queries executed just fine, but they
>> took a long time to plan because they had to step over all of the dead
>> tuples in the index one by one.
>
> But that was the scenario that we intended to fix by changing to
> SnapshotDirty, no?  Or I guess not quite, because
> dead-but-still-visible-to-somebody tuples are rejected by SnapshotDirty.

Yup.

> Maybe we need another type of snapshot that would accept any
> non-vacuumable tuple.  I really don't want SnapshotAny semantics here,
> but a tuple that was live more recently than the xmin horizon seems
> like it's acceptable enough.  HeapTupleSatisfiesVacuum already
> implements the right behavior, but we don't have a Snapshot-style
> interface for it.

Maybe.  What I know is that several people have found SnapshotDirty to be problematic, and in the case with which I am acquainted, using SnapshotAny fixed it.  I do not know whether, if everybody in the world were using SnapshotAny, somebody would have the problem you're talking about, or some other one.  And if they did, I don't know whether using the new kind of snapshot you are proposing would fix it.  I do know that giving SnapshotAny to people seems to have only improved things according to the information currently available to me.

I don't, in general, share your intuition that using SnapshotAny is the wrong thing.  We're looking up the last value in the index for planning purposes.
It seems to me that, as far as making index scans more or less expensive, a dead tuple is almost as good as a live one.  Until that tuple is not only marked dead but removed from the index page, it contributes to the cost of an index scan.

To put that another way, suppose the range of index keys is 0 to 2 million, but the heap tuples for values 1 million and up are committed deleted.  All the index tuples remain (and may or may not be removable depending on what other snapshots exist in the system).  Now, consider the following three cases:

(a) index scan from 0 to 10,000
(b) index scan from 1,000,000 to 1,010,000
(c) index scan from 3,000,000 to 3,010,000

I contend that the cost of index scan (b) is a lot closer to the cost of (a) than to the cost of (c).  (b) finds a whole bunch of index tuples; (c) gives up immediately and goes home.  So I actually think that using the actual last value in the index - 2,000,000 - is conceptually *correct* regardless of whether it's marked dead and regardless of whether the corresponding heap tuple is dead.  The cost depends mostly on which tuples are present in the index, not which table rows the user can see.

-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
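Robert's three cases can be set up concretely with a small SQL sketch (table and index names are invented here; note that the endpoint the planner actually consults comes from get_actual_variable_range internally, not from these user queries):

```sql
-- Invented setup for the 0..2,000,000 example discussed above.
create table t (k int);
insert into t select generate_series(0, 1999999);
create index t_k_idx on t (k);

-- Delete the upper half; the heap rows become dead, but their index
-- entries stay on the index pages until killed and/or vacuumed away.
delete from t where k >= 1000000;

-- (a) scans live index entries only
explain analyze select count(*) from t where k between 0 and 10000;
-- (b) must still step over the dead-but-present index entries
explain analyze select count(*) from t where k between 1000000 and 1010000;
-- (c) finds no index entries at all and returns almost immediately
explain analyze select count(*) from t where k between 3000000 and 3010000;
```

Comparing the actual times of (a), (b), and (c) before and after a VACUUM illustrates why index-entry presence, rather than tuple visibility, dominates the scan cost.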
Re: [HACKERS] convert EXSITS to inner join gotcha and bug
On 29 April 2017 at 02:26, Tom Lane wrote:
> Alexander Korotkov writes:
>> I've reproduced this bug on d981074c.
>> On default config, after loading example.sql.bz2 and VACUUM ANALYZE, query
>> result is OK.
>> But with seqscan and hashjoin disabled, query returns 0 rows.
>
> Ah, thanks for the clue about enable_hashjoin, because it wasn't
> reproducing for me as stated.
>
> It looks like in the case that's giving wrong answers, the mergejoin
> is wrongly getting marked as "Inner Unique".  Something's a bit too
> cheesy about that planner logic --- not sure what, yet.

This seems related to the unconditional setting of extra.inner_unique to true for JOIN_UNIQUE_INNER jointypes in add_paths_to_joinrel().  Setting this based on the return value of innerrel_is_unique(), as done for the other join types, seems to fix the issue.  I don't know yet if that's the correct fix.  It's pretty late 'round this side to be thinking too hard about it.

-- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Logical replication in the same cluster
On Thu, Apr 27, 2017 at 4:08 AM, Petr Jelinek wrote:
> Back when writing the original patch set, I was also playing with the
> idea of having CREATE SUBSCRIPTION do multiple committed steps in
> similar fashion to CREATE INDEX CONCURRENTLY, but that leaves a mess behind
> on failure, which also wasn't a very popular outcome. I wonder how bad it
> would be if we created all the stuff for the subscription but in disabled
> form, then committed, then created the slot outside of a tx (slot creation is
> not transactional anyway), and then switched the subscription to enabled
> (if needed) in the next tx. It would still leave the subscription behind on
> failure, but a) the user would see the failure, b) the subscription would be
> inactive, so no active harm from it. We also already prevent running
> CREATE SUBSCRIPTION inside a transaction block when automatic slot
> creation is chosen, so there is no difference from that perspective.

Sounds like a solid approach.  There's no way to end up with a remote object without also ending up with a local object, which seems like it greatly reduces the chances of confusion and error.

-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
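A rough user-level analogue of the sequence Petr describes (his proposal is for CREATE SUBSCRIPTION to perform these steps internally; the connection string and names below are invented, syntax as of PostgreSQL 10):

```sql
-- Step 1: create the subscription disabled and without a slot, then commit.
CREATE SUBSCRIPTION mysub
    CONNECTION 'host=provider dbname=src'
    PUBLICATION mypub
    WITH (enabled = false, create_slot = false);

-- Step 2: create the slot outside any transaction, on the provider side
-- (slot creation is not transactional anyway):
--   SELECT pg_create_logical_replication_slot('mysub', 'pgoutput');

-- Step 3: enable the subscription in a separate transaction.
ALTER SUBSCRIPTION mysub ENABLE;
```

If step 2 or 3 fails, the disabled subscription from step 1 remains behind, but it is visible and inactive, matching the failure mode discussed above.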
Re: [HACKERS] Declarative partitioning - another take
On Fri, Apr 28, 2017 at 2:13 AM, Amit Langote wrote:
> It seems to me that there is no difference in behavior between
> inheritance-based and declarative partitioning as far as statement-level
> triggers are concerned (at least currently). In both cases, we fire
> statement-level triggers only for the table specified in the command.

OK.

>>> By the way, the code changes I made in the attached are such that a subsequent
>>> patch could implement firing statement-level triggers of all the tables in
>>> a partition hierarchy, which it seems we don't want to do. Should then
>>> the code be changed to not create ResultRelInfos of all the tables but
>>> only the root table (the one mentioned in the command)? You will see that
>>> the patch adds fields named es_nonleaf_result_relations and
>>> es_num_nonleaf_result_relations, whereas just es_root_result_relation
>>> would perhaps do, for example.
>>
>> It seems better not to create any ResultRelInfos that we don't
>> actually need, so +1 for such a revision to the patch.
>
> OK, done. It took a bit more work than I thought.

So, this seems weird, because rootResultRelIndex is initialized even when splan->partitioned_rels == NIL, but isn't actually valid in that case.  ExecInitModifyTable seems to think it's always valid, though.

I think the way that you've refactored fireBSTriggers and fireASTriggers is a bit confusing.  Instead of splitting out a separate function, how about just having the existing function begin with:

    if (node->rootResultRelInfo)
        resultRelInfo = node->rootResultRelInfo;
    else
        resultRelInfo = node->resultRelInfo;

?  I think the way you've coded it is a holdover from the earlier design where you were going to call it multiple times, but now that's not needed.

Looks OK otherwise.
-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] [PROPOSAL] Use SnapshotAny in get_actual_variable_range
> What I'm thinking of is the regular indexscan that's done internally
> by get_actual_variable_range, not whatever ends up getting chosen as
> the plan for the user query.  I had supposed that that would kill
> dead index entries as it went, but maybe that's not happening for
> some reason.

This really does happen as you said: the index entries are marked as dead.  But after that, backends spend CPU time skipping the killed entries in _bt_checkkeys:

    if (scan->ignore_killed_tuples && ItemIdIsDead(iid))
    {
        /* return immediately if there are more tuples on the page */
        if (ScanDirectionIsForward(dir))
        {
            if (offnum < PageGetMaxOffsetNumber(page))
                return NULL;
        }
        else
        {
            BTPageOpaque opaque = (BTPageOpaque) PageGetSpecialPointer(page);

            if (offnum > P_FIRSTDATAKEY(opaque))
                return NULL;
        }
    }

This is confirmed by the perf records and the backtrace reported by Vladimir earlier:

    root@pgload01e ~ # perf report | grep -v '^#' | head
        56.67%  postgres  postgres  [.] _bt_checkkeys
        19.27%  postgres  postgres  [.] _bt_readpage
         2.09%  postgres  postgres  [.] pglz_decompress
         2.03%  postgres  postgres  [.] LWLockAttemptLock
         1.61%  postgres  postgres  [.] PinBuffer.isra.3
         1.14%  postgres  postgres  [.] hash_search_with_hash_value
         0.68%  postgres  postgres  [.] LWLockRelease
         0.42%  postgres  postgres  [.] AllocSetAlloc
         0.40%  postgres  postgres  [.] SearchCatCache
         0.40%  postgres  postgres  [.] ReadBuffer_common
    root@pgload01e ~ #

It seems that killing dead tuples does not solve this problem.
Re: [HACKERS] [PROPOSAL] Use SnapshotAny in get_actual_variable_range
Robert Haas writes:
> On Thu, Apr 27, 2017 at 5:22 PM, Tom Lane wrote:
>> How so?  Shouldn't the indexscan go back and mark such tuples dead in
>> the index, such that they'd be visited this way only once?  If that's
>> not happening, maybe we should try to fix it.

> Hmm.  Actually, I think the scenario I saw was where there was a large
> number of tuples at the end of the index that weren't dead yet due to
> an old snapshot held open.  That index was being scanned by lots of
> short-running queries.  Those queries executed just fine, but they
> took a long time to plan because they had to step over all of the dead
> tuples in the index one by one.

But that was the scenario that we intended to fix by changing to SnapshotDirty, no?  Or I guess not quite, because dead-but-still-visible-to-somebody tuples are rejected by SnapshotDirty.

Maybe we need another type of snapshot that would accept any non-vacuumable tuple.  I really don't want SnapshotAny semantics here, but a tuple that was live more recently than the xmin horizon seems like it's acceptable enough.  HeapTupleSatisfiesVacuum already implements the right behavior, but we don't have a Snapshot-style interface for it.

regards, tom lane
Re: [HACKERS] convert EXSITS to inner join gotcha and bug
David Rowley writes:
> Did you mean to attach this?

See the link in Teodor's original message (it's actually a .bz2 file, not a .gz).

regards, tom lane
Re: [HACKERS] convert EXSITS to inner join gotcha and bug
On 29 April 2017 at 00:45, Alexander Korotkov wrote:
> On default config, after loading example.sql.bz2 and VACUUM ANALYZE, query
> result is OK.

Hi,

Did you mean to attach this?

-- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] [PROPOSAL] Use SnapshotAny in get_actual_variable_range
On Thu, Apr 27, 2017 at 5:22 PM, Tom Lane wrote:
>>> But if we delete many rows from the beginning or end of the index, it would be
>>> very expensive too, because we will fetch each dead row and reject it.
>
>> Yep, and I've seen that turn into a serious problem in production.
>
> How so?  Shouldn't the indexscan go back and mark such tuples dead in
> the index, such that they'd be visited this way only once?  If that's
> not happening, maybe we should try to fix it.

Hmm.  Actually, I think the scenario I saw was where there was a large number of tuples at the end of the index that weren't dead yet due to an old snapshot held open.  That index was being scanned by lots of short-running queries.  Those queries executed just fine, but they took a long time to plan because they had to step over all of the dead tuples in the index one by one.  That increased planning time - multiplied by the number of times it was incurred - was sufficient to cripple the system.

-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] convert EXSITS to inner join gotcha and bug
> Ah, thanks for the clue about enable_hashjoin, because it wasn't
> reproducing for me as stated.

I missed the tweaked config, sorry.

-- Teodor Sigaev  E-mail: teo...@sigaev.ru  WWW: http://www.sigaev.ru/
Re: [HACKERS] convert EXSITS to inner join gotcha and bug
Alexander Korotkov writes:
> I've reproduced this bug on d981074c.
> On default config, after loading example.sql.bz2 and VACUUM ANALYZE, query
> result is OK.
> But with seqscan and hashjoin disabled, query returns 0 rows.

Ah, thanks for the clue about enable_hashjoin, because it wasn't reproducing for me as stated.

It looks like in the case that's giving wrong answers, the mergejoin is wrongly getting marked as "Inner Unique".  Something's a bit too cheesy about that planner logic --- not sure what, yet.

regards, tom lane
Re: [HACKERS] vcregress support for single TAP tests
On 04/26/2017 10:32 PM, Peter Eisentraut wrote:
> On 4/23/17 17:09, Andrew Dunstan wrote:
>> Here's a patch that will allow calling vcregress.pl to run a single TAP
>> test set. It would work like this:
>>
>> vcregress.pl src/test/recover true
>>
>> The second argument, if true (in the perl sense, of course), would trigger
>> a temp install before running the tests. It defaults to off, in an
>> attempt to minimize the unnecessary running of installs.
>
> Seems kind of weird to have that functionality only for the tap tests,
> though.

Yeah, you're right.  I think we'd be much better off mimicking what the makefiles do and honouring the NO_TEMP_INSTALL environment variable.  A revised patch that works that way is attached.  So to run an arbitrary TAP test set, you would do (for example):

    vcregress.pl taptest src/test/recover

cheers

andrew

-- Andrew Dunstan  https://www.2ndQuadrant.com  PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

diff --git a/src/tools/msvc/vcregress.pl b/src/tools/msvc/vcregress.pl
index 8933920..d387274 100644
--- a/src/tools/msvc/vcregress.pl
+++ b/src/tools/msvc/vcregress.pl
@@ -34,7 +34,7 @@ if (-e "src/tools/msvc/buildenv.pl")
 my $what = shift || "";
 if ($what =~
-/^(check|installcheck|plcheck|contribcheck|modulescheck|ecpgcheck|isolationcheck|upgradecheck|bincheck|recoverycheck)$/i
+/^(check|installcheck|plcheck|contribcheck|modulescheck|ecpgcheck|isolationcheck|upgradecheck|bincheck|recoverycheck|taptest)$/i
   )
 {
 	$what = uc $what;
@@ -54,13 +54,6 @@ copy("$Config/dummy_seclabel/dummy_seclabel.dll", "src/test/regress");
 $ENV{PATH} = "$topdir/$Config/libpq;$ENV{PATH}";
-my $schedule = shift;
-unless ($schedule)
-{
-	$schedule = "serial";
-	$schedule = "parallel" if ($what eq 'CHECK' || $what =~ /PARALLEL/);
-}
-
 if ($ENV{PERL5LIB})
 {
 	$ENV{PERL5LIB} = "$topdir/src/tools/msvc;$ENV{PERL5LIB}";
@@ -90,13 +83,14 @@ my %command = (
 	ISOLATIONCHECK => \&isolationcheck,
 	BINCHECK       => \&bincheck,
 	RECOVERYCHECK  => \&recoverycheck,
-	UPGRADECHECK   => \&upgradecheck,);
+	UPGRADECHECK   => \&upgradecheck,
+	TAPTEST        => \&taptest,);
 my $proc = $command{$what};
 exit 3 unless $proc;
-&$proc();
+&$proc(@_);
 exit 0;
@@ -104,6 +98,7 @@ exit 0;
 sub installcheck
 {
+	my $schedule = shift || 'serial';
 	my @args = (
 		"../../../$Config/pg_regress/pg_regress",
 		"--dlpath=.",
@@ -119,6 +114,7 @@ sub installcheck
 sub check
 {
+	my $schedule = shift || 'parallel';
 	InstallTemp();
 	chdir "${topdir}/src/test/regress";
 	my @args = (
@@ -145,7 +141,7 @@ sub ecpgcheck
 	exit $status if $status;
 	InstallTemp();
 	chdir "$topdir/src/interfaces/ecpg/test";
-	$schedule = "ecpg";
+	my $schedule = "ecpg";
 	my @args = (
 		"../../../../$Config/pg_regress_ecpg/pg_regress_ecpg",
 		"--bindir=",
@@ -219,6 +215,17 @@ sub bincheck
 	exit $mstat if $mstat;
 }
+sub taptest
+{
+	my $dir = shift;
+
+	die "no tests found!" unless -d "$topdir/$dir/t";
+
+	InstallTemp();
+	my $status = tap_check("$topdir/$dir");
+	exit $status if $status;
+}
+
 sub plcheck
 {
 	chdir "../../pl";
@@ -516,7 +523,6 @@ sub fetchRegressOpts
 	$m =~ s{\\\r?\n}{}g;
 	if ($m =~ /^\s*REGRESS_OPTS\s*\+?=(.*)/m)
 	{
-
 		# Substitute known Makefile variables, then ignore options that retain
 		# an unhandled variable reference.  Ignore anything that isn't an
 		# option starting with "--".
@@ -588,17 +594,21 @@ sub GetTests
 sub InstallTemp
 {
-	print "Setting up temp install\n\n";
-	Install("$tmp_installdir", "all", $config);
+	unless ($ENV{NO_TEMP_INSTALL})
+	{
+		print "Setting up temp install\n\n";
+		Install("$tmp_installdir", "all", $config);
+	}
 	$ENV{PATH} = "$tmp_installdir/bin;$ENV{PATH}";
 }
 sub usage
 {
 	print STDERR
-	  "Usage: vcregress.pl [ ]\n\n",
+	  "Usage: vcregress.pl [ arg [ , arg] ... ]\n\n",
	  "Options for :\n",
	  "  bincheck      run tests of utilities in src/bin/\n",
+	  "  taptest       run an arbitrary TAP test\n",
	  "  check         deploy instance and run regression tests on it\n",
	  "  contribcheck  run tests of modules in contrib/\n",
	  "  ecpgcheck     run regression tests of ECPG\n",
@@ -608,8 +618,10 @@ sub usage
	  "  plcheck       run tests of PL languages\n",
	  "  recoverycheck run recovery test suite\n",
	  "  upgradecheck  run tests of pg_upgrade\n",
-	  "\nOptions for :\n",
+	  "\nOptions for arg: (used by check and installcheck)\n",
	  "  serial        serial mode\n",
-	  "  parallel      parallel mode\n";
+	  "  parallel      parallel mode\n",
+	  "\nOptions for args for taptest\n",
+	  "  test-dir      (required) directory where tests reside\n",
 	exit(1);
 }
Re: [HACKERS] convert EXSITS to inner join gotcha and bug
On Fri, Apr 28, 2017 at 12:48 PM, Teodor Sigaev wrote:
>> Both 9.6 and 10devel are affected by the dependence of the query result on
>> the seqscan variable.
>
> Oops, I was too nervous; 9.6 is not affected by the enable_seqscan setting.
> But it doesn't push the condition down either.

I've reproduced this bug on d981074c.

On default config, after loading example.sql.bz2 and VACUUM ANALYZE, the query result is OK.

# explain analyze SELECT * FROM t1 INNER JOIN t2 ON ( EXISTS ( SELECT true FROM t3 WHERE t3.id1 = t1.id AND t3.id2 = t2.id ) ) WHERE t1.name = '5c5fec6a41b8809972870abc154b3ecd';
                                                        QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=6.42..1924.71 rows=1 width=99) (actual time=14.044..34.957 rows=*162* loops=1)
   Join Filter: (t3.id1 = t1.id)
   Rows Removed by Join Filter: 70368
   ->  Index Only Scan using t1i2 on t1  (cost=0.28..4.30 rows=1 width=66) (actual time=0.026..0.028 rows=1 loops=1)
         Index Cond: (name = '5c5fec6a41b8809972870abc154b3ecd'::text)
         Heap Fetches: 0
   ->  Hash Join  (cost=6.14..1918.37 rows=163 width=66) (actual time=0.077..28.310 rows=70530 loops=1)
         Hash Cond: (t3.id2 = t2.id)
         ->  Seq Scan on t3  (cost=0.00..1576.30 rows=70530 width=66) (actual time=0.005..6.433 rows=70530 loops=1)
         ->  Hash  (cost=3.84..3.84 rows=184 width=33) (actual time=0.065..0.065 rows=184 loops=1)
               Buckets: 1024  Batches: 1  Memory Usage: 20kB
               ->  Seq Scan on t2  (cost=0.00..3.84 rows=184 width=33) (actual time=0.003..0.025 rows=184 loops=1)
 Planning time: 2.542 ms
 Execution time: 35.008 ms
(14 rows)

But with seqscan and hashjoin disabled, the query returns 0 rows.
# set enable_seqscan = off;
# set enable_hashjoin = off;
# explain analyze SELECT * FROM t1 INNER JOIN t2 ON ( EXISTS ( SELECT true FROM t3 WHERE t3.id1 = t1.id AND t3.id2 = t2.id ) ) WHERE t1.name = '5c5fec6a41b8809972870abc154b3ecd';
                                                        QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=0.97..5265.82 rows=1 width=99) (actual time=18.718..18.718 rows=*0* loops=1)
   Join Filter: (t3.id1 = t1.id)
   Rows Removed by Join Filter: 163
   ->  Index Only Scan using t1i2 on t1  (cost=0.28..4.30 rows=1 width=66) (actual time=0.024..0.024 rows=1 loops=1)
         Index Cond: (name = '5c5fec6a41b8809972870abc154b3ecd'::text)
         Heap Fetches: 0
   ->  Merge Join  (cost=0.69..5259.48 rows=163 width=66) (actual time=0.033..18.670 rows=163 loops=1)
         Merge Cond: (t2.id = t3.id2)
         ->  Index Only Scan using t2i1 on t2  (cost=0.27..19.03 rows=184 width=33) (actual time=0.015..0.038 rows=184 loops=1)
               Heap Fetches: 0
         ->  Index Only Scan using t3i2 on t3  (cost=0.42..4358.37 rows=70530 width=66) (actual time=0.015..10.484 rows=70094 loops=1)
               Heap Fetches: 0
 Planning time: 2.571 ms
 Execution time: 18.778 ms
(14 rows)

-- Alexander Korotkov  Postgres Professional: http://www.postgrespro.com  The Russian Postgres Company
Re: [HACKERS] scram and \password
On 04/28/2017 07:49 AM, Noah Misch wrote:
> On Fri, Apr 21, 2017 at 11:04:14PM +0300, Heikki Linnakangas wrote:
>> I'll continue reviewing the rest of the patch on Monday, but [...]
>
> This PostgreSQL 10 open item is past due for your status update. Kindly send
> a status update within 24 hours, and include a date for your subsequent
> status update. Moreover, this open item has been in progress for 17 days,
> materially longer than the 1-2 week RMT non-intervention period. Refer to
> the policy on open item ownership:
> https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com

I just pushed some little cleanups that caught my eye while working on this. I need to rebase the patch set I sent earlier, and fix the little things that Michael pointed out, but that shouldn't take long. I plan to push a fix for this on Tuesday (Monday is a national holiday here).

- Heikki

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Assertion failure in REL9_5_STABLE
On Wed, Apr 19, 2017 at 11:19 AM, Andrew Gierth wrote:
>>>>> "Pavan" == Pavan Deolasee writes:
>
>  Pavan> I am attaching a patch that throws a similar ERROR during
>  Pavan> planning even for 9.5. AFAICS in presence of grouping sets, we
>  Pavan> always decide to use sort-based implementation for grouping, but
>  Pavan> do not check if the columns support ordering or not.
>
> Hmmm. This is obviously my fault somewhere... the patch looks good, I'll
> deal with it shortly.

JFR this got fixed via 7be3678a8cfb55dcfca90fa586485f835ab912a5

Thanks,
Pavan

-- 
Pavan Deolasee
http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] On How To Shorten the Steep Learning Curve Towards PG Hacking...
On 28 Apr. 2017 17:04, "Kang Yuzhe" wrote:
> Hello Simon,
>
> The journey that caused and is causing me a lot of pain is finding my way
> in PG development. Complex code reading like PG's. Fully understanding the
> science of DBMS engines: query processing, storage, transaction management
> and so on... Anyway, as you said, the rough estimation towards any
> expertise seems to be in accordance with The 10,000 Hour Rule. I will
> strive based on this rule.

Start with not top-posting on the mailing list ;)

> For now, would you please tell me how to know the exact PG version for
> which a specific patch was developed? Given patch x, how do I know the
> specific PG version it was developed for?

If it was created by git format-patch then the base git revision will be shown. This may be a commit from the postgres public tree that you can find with 'git branch --contains'. Otherwise look at the proposed commit message if any, in the patch header. Or the email it was attached to. If all else fails, guess based on the date.
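The `git format-patch --base` convention mentioned above can be checked mechanically. A small illustrative sketch (in Python, with a made-up patch header) of extracting the `base-commit:` trailer that `git format-patch --base` records:

```python
import re

def patch_base_commit(patch_text):
    """Return the base revision recorded by `git format-patch --base`,
    or None if the patch carries no base-commit trailer."""
    m = re.search(r"^base-commit: ([0-9a-f]{7,40})$", patch_text, re.M)
    return m.group(1) if m else None

# Hypothetical patch header, for illustration only:
sample = """\
From 1234abc Mon Sep 17 00:00:00 2001
Subject: [PATCH] fix something

---
base-commit: 7be3678a8cfb55dcfca90fa586485f835ab912a5
"""
print(patch_base_commit(sample))
# 7be3678a8cfb55dcfca90fa586485f835ab912a5
```

The hash it finds can then be fed to `git branch -a --contains <hash>` to locate the base branch.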
Re: [HACKERS] Crash when partition column specified twice
Hello Amit,

The extra "n->is_from_type = false;" seems to be added by mistake?

@@ -11888,6 +11891,8 @@ TableFuncElement: ColId Typename opt_collate_clause
 				n->is_local = true;
 				n->is_not_null = false;
 				n->is_from_type = false;
+				n->is_from_type = false;
+				n->is_from_parent = false;
 				n->storage = 0;
 				n->raw_default = NULL;
 				n->cooked_default = NULL;

On Fri, Apr 28, 2017 at 6:08 AM, Amit Langote wrote:
> On 2017/04/27 12:36, Amit Langote wrote:
>> Noticed that a crash occurs if a column is specified twice when creating a
>> partition:
>>
>> create table p (a int) partition by list (a);
>>
>> -- crashes
>> create table p1 partition of parent (
>>   a not null,
>>   a default 1
>> ) for values in (1);
>>
>> The logic in MergeAttributes() that merged partition column options with
>> those of the parent didn't properly check for the column being specified
>> twice and instead tried to delete the same ColumnDef from a list twice,
>> causing the crash.
>>
>> Attached fixes that.
>
> Patch rebased, because of a conflict with b9a3ef55b2.
>
> Thanks,
> Amit
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers

-- 
Beena Emerson
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] Dropping a partitioned table takes too long
It seems that in 'load_relcache_init_file()', we forget to initialize 'rd_pdcxt' of relcache. 2017-04-27 0:33 GMT+08:00 Robert Haas: > On Wed, Apr 26, 2017 at 12:22 PM, Tom Lane wrote: > > Robert Haas writes: > >> On Tue, Apr 25, 2017 at 10:05 PM, Amit Langote > >> wrote: > >>> Your patch seems to be a much better solution to the problem, thanks. > > > >> Does anyone wish to object to this patch as untimely? > > > >> If not, I'll commit it. > > > > It's certainly not untimely to address such problems. What I'm wondering > > is if we should commit both patches. Avoiding an unnecessary heap_open > > is certainly a good thing, but it seems like the memory leak addressed > > by the first patch might still be of concern in other scenarios. > > I will defer to you on that. If you think that patch is a good idea, > please have at it. > > -- > Robert Haas > EnterpriseDB: http://www.enterprisedb.com > The Enterprise PostgreSQL Company > -- GaoZengqi pgf...@gmail.com zengqi...@gmail.com
Re: [HACKERS] Cached plans and statement generalization
On 26.04.2017 13:46, Pavel Stehule wrote:
>> I attach a new patch which allows limiting the number of autoprepared
>> statements (the autoprepare_limit GUC variable). Also I did more
>> measurements, now with several concurrent connections and read-only
>> statements. Results of pgbench with 10 connections, scale 10 and
>> read-only statements are below:
>>
>>   Protocol            TPS
>>   extended             87k
>>   prepared            209k
>>   simple+autoprepare  206k
>>
>> As you can see, autoprepare provides more than a 2x speed improvement.
>>
>> Also I tried to measure the overhead of parsing (to be able to substitute
>> all literals, not only string literals). I just added an extra call of
>> pg_parse_query. Speed is reduced to 181k. So the overhead is noticeable,
>> but still makes such optimization useful.
>>
>> This is why I want to ask the question: is it better to implement a
>> slower but safer and more universal solution?
>
> An unsafe solution doesn't make any sense, and it is dangerous (80% of
> database users don't have the necessary knowledge). If somebody needs the
> maximum possible performance, then he uses explicit prepared statements.

I attached a new patch to this mail. I completely reimplemented my original approach and now use parse tree transformation. New pgbench (-S -c 10) results are the following:

  Protocol            TPS
  extended             87k
  prepared            209k
  simple+autoprepare  185k

So there is some slowdown compared with my original implementation and with explicitly prepared statements, but it still provides more than a 2x speed-up compared with unprepared queries. And it doesn't require changing existing applications. Since most real production applications work with the DBMS through some connection pool (pgbouncer, ...), I think that such optimization will be useful. Isn't it interesting that we can increase system throughput almost two times just by setting one parameter in the configuration file?

I also tried to enable autoprepare by default and run the regression tests. 7 tests fail for the following reasons:

1. Slightly different error reporting (for example, the error location is not always identically specified).

2. Differences in query behavior caused by changed local settings (Andres gives an example with search_path, and the date test fails because of changing datestyle).

3. Problems with indirect dependencies (when a table is altered, only cached plans directly depending on this relation are invalidated, but not plans with indirect dependencies).

4. Domain checks are not performed for null values.

I do not think these issues can cause problems for a real application. Also it is possible to limit the number of autoprepared statements using the autoprepare_limit parameter, avoiding possible backend memory overflow in case of a large number of unique queries sent by an application. LRU discipline is used to drop the least recently used plans.

Any comments and suggestions for future improvement of this patch are welcome.

-- 
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

diff --git a/src/backend/nodes/nodeFuncs.c b/src/backend/nodes/nodeFuncs.c
index cd39167..4fbc8b7 100644
--- a/src/backend/nodes/nodeFuncs.c
+++ b/src/backend/nodes/nodeFuncs.c
@@ -3610,6 +3610,454 @@ raw_expression_tree_walker(Node *node,
 }
 
 /*
+ * raw_expression_tree_mutator --- transform raw parse tree.
+ *
+ * This function implements a slightly different approach to tree updates than
+ * expression_tree_mutator(): the callback is given a pointer to the pointer
+ * to the current node and can update this field instead of returning a
+ * reference to a new node. This makes it possible to remember changes and
+ * easily revert them without an extra traversal of the tree.
+ *
+ * This function does not need the QTW_DONT_COPY_QUERY flag: it never
+ * implicitly copies tree nodes, doing in-place updates instead.
+ *
+ * Like raw_expression_tree_walker, there is no special rule about query
+ * boundaries: we descend to everything that's possibly interesting.
+ *
+ * Currently, the node type coverage here extends only to DML statements
+ * (SELECT/INSERT/UPDATE/DELETE) and nodes that can appear in them, because
+ * this is used mainly during analysis of CTEs, and only DML statements can
+ * appear in CTEs. If some other node is visited, iteration is immediately
+ * stopped and true is returned.
+ */
+bool
+raw_expression_tree_mutator(Node *node,
+							bool (*mutator) (),
+							void *context)
+{
+	ListCell   *temp;
+
+	/*
+	 * The walker has already visited the current node, and so we need only
+	 * recurse into any sub-nodes it has.
+	 */
+	if (node == NULL)
+		return false;
+
+	/* Guard against stack overflow due to overly complex expressions */
+	check_stack_depth();
+
+	switch (nodeTag(node))
+	{
+		case T_SetToDefault:
+		case T_CurrentOfExpr:
+		case T_Integer:
+		case T_Float:
+		case T_String:
+		case T_BitString:
+		case T_Null:
+		case T_ParamRef:
+		case T_A_Const:
+		case
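The autoprepare_limit behaviour described earlier in this message — an LRU-bounded cache of generalized statements — can be sketched abstractly. This is a toy Python model for illustration, not the patch's actual C data structures:

```python
from collections import OrderedDict

class AutoprepareCache:
    """Toy LRU cache of auto-prepared statements, bounded like the
    proposed autoprepare_limit GUC. Illustration only."""
    def __init__(self, limit):
        self.limit = limit
        self.plans = OrderedDict()  # generalized query text -> cached plan

    def lookup(self, query):
        plan = self.plans.get(query)
        if plan is not None:
            self.plans.move_to_end(query)  # mark as most recently used
        return plan

    def store(self, query, plan):
        self.plans[query] = plan
        self.plans.move_to_end(query)
        if len(self.plans) > self.limit:
            self.plans.popitem(last=False)  # evict least recently used

cache = AutoprepareCache(limit=2)
cache.store("SELECT * FROM t WHERE id = $1", "plan1")
cache.store("SELECT * FROM u WHERE id = $1", "plan2")
cache.lookup("SELECT * FROM t WHERE id = $1")          # touch plan1
cache.store("SELECT * FROM v WHERE id = $1", "plan3")  # evicts plan2
print(list(cache.plans))
# ['SELECT * FROM t WHERE id = $1', 'SELECT * FROM v WHERE id = $1']
```

The point of the LRU discipline is exactly what the eviction above shows: with a bounded cache, the statement that has gone longest without reuse is dropped first.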
Re: [HACKERS] convert EXSITS to inner join gotcha and bug
> Both 9.6 and 10devel are affected by the dependence of query results on the
> enable_seqscan variable.

Oops, I was too nervous, 9.6 is not affected by the enable_seqscan setting. But it doesn't push the condition down either.

-- 
Teodor Sigaev
E-mail: teo...@sigaev.ru
WWW: http://www.sigaev.ru/

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Interval for launching the table sync worker
On Fri, Apr 28, 2017 at 5:26 PM, Kyotaro HORIGUCHI wrote:
> At Fri, 28 Apr 2017 10:20:48 +0900, Masahiko Sawada wrote in
Re: [HACKERS] Declarative partitioning - another take
On 2017/04/28 7:36, David Fetter wrote:
> On Thu, Apr 27, 2017 at 10:30:54AM +0900, Amit Langote wrote:
>> On 2017/04/27 1:52, Robert Haas wrote:
>>> On Tue, Apr 25, 2017 at 10:34 PM, Amit Langote wrote:
>>>> FWIW, I too prefer the latter, that is, fire only the parent's
>>>> triggers. In that case, applying only the patch 0001 will do.
>>>
>>> Do we need to update the documentation?
>>
>> Yes, I think we should. How about as in the attached?
>>
>> By the way, the code changes I made in the attached are such that a
>> subsequent patch could implement firing statement-level triggers of all
>> the tables in a partition hierarchy, which it seems we don't want to do.
>> Should the code then be changed to not create ResultRelInfos of all the
>> tables but only the root table (the one mentioned in the command)? You
>> will see that the patch adds fields named es_nonleaf_result_relations and
>> es_num_nonleaf_result_relations, whereas just es_root_result_relation
>> would perhaps do, for example.
>
> Did I notice correctly that there's no way to handle transition tables
> for partitions in AFTER STATEMENT triggers?

Did you mean to ask about AFTER STATEMENT triggers defined on "partitioned" tables? Specifying a transition table for them is disallowed altogether.

ERROR: "p" is a partitioned table
DETAIL: Triggers on partitioned tables cannot have transition tables.

Triggers created on (leaf) partitions *do* allow specifying a transition table. Or are you asking something else altogether?

> If not, I'm not suggesting that this be added at this late date, but
> we might want to document that.

I don't see it mentioned in the documentation that such triggers cannot be defined on partitioned tables. Is that what you are saying should be documented?

Thanks,
Amit

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] convert EXSITS to inner join gotcha and bug
Hi!

Seems there are two issues:
1) Sometimes conditions which at first glance could be pushed down to a scan are left as join quals. And it can be a ~30x performance loss.
2) The number of query result rows depends on the enable_seqscan variable.

The query

explain analyze
SELECT * FROM t1 INNER JOIN t2 ON (
    EXISTS ( SELECT true FROM t3 WHERE t3.id1 = t1.id AND t3.id2 = t2.id )
) WHERE t1.name = '5c5fec6a41b8809972870abc154b3ecd';

produces the following plan:

 Nested Loop  (cost=6.42..1928.71 rows=1 width=99) (actual time=71.415..148.922 rows=162 loops=1)
   Join Filter: (t3.id1 = t1.id)
   Rows Removed by Join Filter: 70368
   ->  Index Only Scan using t1i2 on t1  (cost=0.28..8.30 rows=1 width=66) (actual time=0.100..0.103 rows=1 loops=1)
         Index Cond: (name = '5c5fec6a41b8809972870abc154b3ecd'::text)
         Heap Fetches: 1
   ->  Hash Join  (cost=6.14..1918.37 rows=163 width=66) (actual time=0.370..120.971 rows=70530 loops=1)
         Hash Cond: (t3.id2 = t2.id)                                                              (1)
         ->  Seq Scan on t3  (cost=0.00..1576.30 rows=70530 width=66) (actual time=0.017..27.424 rows=70530 loops=1)   (2)
         ->  Hash  (cost=3.84..3.84 rows=184 width=33) (actual time=0.273..0.273 rows=184 loops=1)
               Buckets: 1024  Batches: 1  Memory Usage: 20kB
               ->  Seq Scan on t2  (cost=0.00..3.84 rows=184 width=33) (actual time=0.017..0.105 rows=184 loops=1)
 Planning time: 7.326 ms
 Execution time: 149.115 ms

Condition (1) is not pushed down to scan (2), although it seemingly could be safely moved. With seqscan = off the condition is not pushed down either, but the query returns only one row instead of 162. Scan on t3 returns ~7 rows but only ~150 rows are really needed. I didn't find a combination of enable_* GUCs to push that down, and it seems to me there is a reason for that which I don't see, or support is somehow missing.
If the pair (t3.id1, t3.id2) is unique (see the dump, there is a unique index on them), the query could be directly rewritten to an inner join, and its plan is:

 Nested Loop  (cost=9.70..299.96 rows=25 width=66) (actual time=0.376..5.232 rows=162 loops=1)
   ->  Nested Loop  (cost=9.43..292.77 rows=25 width=99) (actual time=0.316..0.645 rows=162 loops=1)
         ->  Index Only Scan using t1i2 on t1  (cost=0.28..8.30 rows=1 width=66) (actual time=0.047..0.050 rows=1 loops=1)
               Index Cond: (name = '5c5fec6a41b8809972870abc154b3ecd'::text)
               Heap Fetches: 1
         ->  Bitmap Heap Scan on t3  (cost=9.15..283.53 rows=94 width=66) (actual time=0.257..0.426 rows=162 loops=1)
               Recheck Cond: (id1 = t1.id)
               Heap Blocks: exact=3
               ->  Bitmap Index Scan on t3i1  (cost=0.00..9.12 rows=94 width=0) (actual time=0.186..0.186 rows=162 loops=1)
                     Index Cond: (id1 = t1.id)
   ->  Index Only Scan using t2i1 on t2  (cost=0.27..0.29 rows=1 width=33) (actual time=0.024..0.024 rows=1 loops=162)
         Index Cond: (id = t3.id2)
         Heap Fetches: 162
 Planning time: 5.532 ms
 Execution time: 5.457 ms

The second plan is ~30 times faster. But with sequential scan turned off, the first query does not work correctly, which points to some bug in the planner, I suppose. Both 9.6 and 10devel are affected by the dependence of query results on the enable_seqscan variable.

Dump to reproduce (a subset of real data, but obfuscated); queries are in the attachment:
http://sigaev.ru/misc/exists_to_nested.sql.gz

-- 
Teodor Sigaev
E-mail: teo...@sigaev.ru
WWW: http://www.sigaev.ru/

-- query returns 162 rows
explain analyze
SELECT * FROM t1 INNER JOIN t2 ON (
    EXISTS ( SELECT true FROM t3 WHERE t3.id1 = t1.id AND t3.id2 = t2.id )
) WHERE t1.name = '5c5fec6a41b8809972870abc154b3ecd';

set enable_seqscan=off;

-- the same query returns only one row!
explain analyze
SELECT * FROM t1 INNER JOIN t2 ON (
    EXISTS ( SELECT true FROM t3 WHERE t3.id1 = t1.id AND t3.id2 = t2.id )
) WHERE t1.name = '5c5fec6a41b8809972870abc154b3ecd';

explain analyze
SELECT t1.* FROM t1, t2, t3
WHERE t1.name = '5c5fec6a41b8809972870abc154b3ecd'
  AND t3.id1 = t1.id AND t3.id2 = t2.id;
Re: [HACKERS] On How To Shorten the Steep Learning Curve Towards PG Hacking...
Hello Simon,

The journey that caused and is causing me a lot of pain is finding my way in PG development. Complex code reading like PG's. Fully understanding the science of DBMS engines: query processing, storage, transaction management and so on... Anyway, as you said, the rough estimation towards any expertise seems to be in accordance with The 10,000 Hour Rule. I will strive based on this rule.

For now, would you please tell me how to know the exact PG version for which a specific patch was developed? Given patch x, how do I know the specific PG version it was developed for?

Regards,
Zeray

On Mon, Apr 17, 2017 at 7:33 PM, Simon Riggs wrote:
> On 27 March 2017 at 13:00, Kang Yuzhe wrote:
>
>> I have found PG source code reading and hacking to be one of the most
>> frustrating experiences in my life. I believe that PG hacking should not
>> be a painful journey but an enjoyable one!
>>
>> It is my strong belief that out of my PG hacking frustrations, there may
>> come insights for the PG experts on ways to devise hands-on material for
>> PG internals so that newcomers will become great coders as quickly as
>> possible.
>
> I'm here now because PostgreSQL has clear, well designed and
> maintained code with accurate docs, great comments and a helpful team.
>
> I'd love to see detailed cases where another project is better in a
> measurable way; I am willing to learn from that.
>
> Any journey to expertise takes 10,000 hours. There is no way to shorten
> that.
>
> What aspect of your journey caused you pain?
>
> --
> Simon Riggs    http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] Declarative partitioning - another take
Hi Rajkumar,

On 2017/04/28 17:11, Rajkumar Raghuwanshi wrote:
> On Fri, Apr 28, 2017 at 11:43 AM, Amit Langote wrote:
>> Updated patch attached.
>
> I have applied the given patch, and could see the below behaviour with a
> statement trigger.
>
> When trying to delete a value within the partition range, triggers got
> fired (with zero rows affected) as expected, but when trying to delete a
> value which is outside of the partition range (with zero rows affected),
> no trigger is fired. Is this expected?

Good catch. I'm afraid though that this is not a defect of this patch, but some unrelated (maybe) bug, which affects not only partitioned tables but inheritance in general. The problem is that the plan created is such that the executor has no opportunity to fire the trigger in question, because the plan contains no information about which table is affected by the statement. You can see that with inheritance. See below:

create table foo ();
create table bar () inherits (foo);

create or replace function trig_notice() returns trigger as $$
begin
  raise notice 'trigger fired';
  return null;
end; $$ language plpgsql;

create trigger foo_del_before before delete on foo
  for each statement execute procedure trig_notice();

explain delete from foo where false;
                QUERY PLAN
------------------------------------------
 Result  (cost=0.00..0.00 rows=0 width=0)
   One-Time Filter: false
(2 rows)

-- no trigger fired
delete from foo where false;
DELETE 0

The trigger *is* fired though, if inheritance is not used.

explain delete from only foo where false;
                    QUERY PLAN
--------------------------------------------------
 Delete on foo  (cost=0.00..0.00 rows=0 width=0)
   ->  Result  (cost=0.00..0.00 rows=0 width=6)
         One-Time Filter: false
(3 rows)

delete from only foo where false;
NOTICE: trigger fired
DELETE 0

I'm not sure how to go about fixing this, if at all. Or whether to fix it at least for partitioned tables. I would like to hear what others think.

Thanks,
Amit

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Interval for launching the table sync worker
At Fri, 28 Apr 2017 10:20:48 +0900, Masahiko Sawada wrote in
Re: [HACKERS] Declarative partitioning - another take
On Fri, Apr 28, 2017 at 11:43 AM, Amit Langote < langote_amit...@lab.ntt.co.jp> wrote:
> Updated patch attached.

Hi Amit,

I have applied given patch, could see below behaviour with statement trigger.

When trying to delete value within partition range, triggers got fired (with zero row affected) as expected, but if trying to delete value which is outside of partition range (with zero row affected), No trigger fired. is this expected??

Below are steps to reproduce.

CREATE TABLE trigger_test_table (a INT, b INT) PARTITION BY RANGE(a);
CREATE TABLE trigger_test_table1 PARTITION OF trigger_test_table FOR VALUES FROM (0) to (6);
INSERT INTO trigger_test_table (a,b) SELECT i,i FROM generate_series(1,3)i;

CREATE TABLE trigger_test_statatics(TG_NAME varchar, TG_TABLE_NAME varchar, TG_LEVEL varchar, TG_WHEN varchar, TG_OP varchar);

CREATE FUNCTION trigger_test_procedure() RETURNS TRIGGER AS $ttp$
BEGIN
  INSERT INTO trigger_test_statatics SELECT TG_NAME, TG_TABLE_NAME, TG_LEVEL, TG_WHEN, TG_OP;
  RETURN OLD;
END;
$ttp$ LANGUAGE plpgsql;

CREATE TRIGGER trigger_test11 AFTER DELETE ON trigger_test_table FOR EACH STATEMENT EXECUTE PROCEDURE trigger_test_procedure();
CREATE TRIGGER trigger_test12 AFTER DELETE ON trigger_test_table1 FOR EACH STATEMENT EXECUTE PROCEDURE trigger_test_procedure();

postgres=# DELETE FROM trigger_test_table WHERE a = 5;
DELETE 0
postgres=# SELECT * FROM trigger_test_statatics;
    tg_name     |   tg_table_name    | tg_level  | tg_when | tg_op
----------------+--------------------+-----------+---------+--------
 trigger_test11 | trigger_test_table | STATEMENT | AFTER   | DELETE
(1 row)

TRUNCATE TABLE trigger_test_statatics;

-- statement trigger NOT fired, when trying to delete data outside partition range.
postgres=# DELETE FROM trigger_test_table WHERE a = 10;
DELETE 0
postgres=# SELECT * FROM trigger_test_statatics;
 tg_name | tg_table_name | tg_level | tg_when | tg_op
---------+---------------+----------+---------+-------
(0 rows)

Thanks & Regards,
Rajkumar Raghuwanshi
QMG, EnterpriseDB Corporation
Re: [HACKERS] subscription worker doesn't start immediately on eabled
At Fri, 28 Apr 2017 06:43:19 +0900, Fujii Masao wrote in
Re: [HACKERS] .pgpass's behavior has changed
On Fri, Apr 28, 2017 at 4:54 PM, Kyotaro HORIGUCHI wrote:
> I noticed that the precedence between host and hostaddr in a
> connection string is reversed in regard to .pgpass lookup in
> devel.
>
> For example the following connection string uses a .pgpass entry
> with "127.0.0.1", not "hoge".
>
> "host=hoge hostaddr=127.0.0.1 port=5432 dbname=postgres"
>
> This change was introduced by the commit
> 274bb2b3857cc987cfa21d14775cae9b0dababa5 and the current behavior
> contradicts the documentation.
>
> https://www.postgresql.org/docs/devel/static/libpq-connect.html
>
>> hostaddr
>> ...
>> Note that authentication is likely to fail if host is not the
>> name of the server at network address hostaddr. Also, note that
>> host rather than hostaddr is used to identify the connection in
>> a password file (see Section 33.15, “The Password File”).
>
> I think this should be fixed for the same reason as the
> following commit.

I am planning to look at your patch more closely, but considering how painful it is going to be to check it, I think that it would be a good idea to have a TAP test for pgpass in src/test/authentication/ to be sure that this never breaks again in the future. We cannot test all combinations, as the servers there only listen on Unix domain sockets, but surely we can get some coverage with libpq.
-- 
Michael

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] .pgpass's behavior has changed
Hello.

I noticed that the precedence between host and hostaddr in a connection string is reversed in regard to .pgpass lookup in devel.

For example the following connection string uses a .pgpass entry with "127.0.0.1", not "hoge".

"host=hoge hostaddr=127.0.0.1 port=5432 dbname=postgres"

This change was introduced by the commit 274bb2b3857cc987cfa21d14775cae9b0dababa5 and the current behavior contradicts the documentation.

https://www.postgresql.org/docs/devel/static/libpq-connect.html

> hostaddr
> ...
> Note that authentication is likely to fail if host is not the
> name of the server at network address hostaddr. Also, note that
> host rather than hostaddr is used to identify the connection in
> a password file (see Section 33.15, “The Password File”).

I think this should be fixed for the same reason as the following commit.

> commit 11003eb55658df0caf183eef69c7a97d56a4f2d7
> Author: Robert Haas
> Date: Thu Dec 1 14:36:39 2016 -0500
>
>     libpq: Fix inadvertent change in PQhost() behavior.

But the above also leaves a bug, so I sent another patch to fix it. The attached patch restores 9.6's behavior of looking up the .pgpass file, in the same manner as the other patch.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

*** a/src/interfaces/libpq/fe-connect.c
--- b/src/interfaces/libpq/fe-connect.c
***************
*** 978,986 ****
  	for (i = 0; i < conn->nconnhost; i++)
  	{
! 		/* Try to get a password for this host from pgpassfile */
  		conn->connhost[i].password =
! 			passwordFromFile(conn->connhost[i].host,
  							 conn->connhost[i].port,
  							 conn->dbName,
  							 conn->pguser,
--- 978,995 ----
  	for (i = 0; i < conn->nconnhost; i++)
  	{
! 		/*
! 		 * Try to get a password for this host from pgpassfile. We use host
! 		 * name rather than host address in the same manner to PQhost().
! 		 */
! 		char	   *pwhost = conn->connhost[i].host;
! 
! 		if (conn->connhost[i].type == CHT_HOST_ADDRESS &&
! 			conn->pghost != NULL && conn->pghost[0] != '\0')
! 			pwhost = conn->pghost;
! 
  		conn->connhost[i].password =
! 			passwordFromFile(pwhost,
  							 conn->connhost[i].port,
  							 conn->dbName,
  							 conn->pguser,

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
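The documented precedence — host, not hostaddr, identifies the connection in the password file — amounts to a one-line rule. A hypothetical sketch of the lookup-key choice the patch restores (an illustration, not libpq's actual code):

```python
def pgpass_lookup_host(host, hostaddr):
    """Pick the name matched against ~/.pgpass entries.
    Per the libpq docs, host takes precedence over hostaddr
    for password-file lookup."""
    if host:
        return host
    if hostaddr:
        return hostaddr
    return "localhost"  # assumed fallback, for illustration only

# "host=hoge hostaddr=127.0.0.1 ..." should match a .pgpass entry for "hoge":
print(pgpass_lookup_host("hoge", "127.0.0.1"))
# hoge
```

The bug being fixed is exactly the reversal of the first two branches: devel was matching "127.0.0.1" instead of "hoge" when both were given.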
[HACKERS] PQhost may return socket dir for network connection
Hello.

As the subject says, PQhost() seems to forget about the case where only hostaddr is specified in a connection string. PQhost() returns /var/run/postgresql or such for a connection made from the connection string "hostaddr=127.0.0.1 port=5432 dbname=postgres". I don't believe this behavior is useful.

The first attached file is a fix for master. connhost[]->pghost is properly set by connectOptions2(), so we don't need to handle the default case in PQhost().

The second is a fix for 9.6. connectOptions2 was setting pgunixsocket and then clearing pghost regardless of the value of pghostaddr. This behavior inhibited PQhost() from getting the right name to return.

The behavior for several connection strings after this fix is as follows. The first line is the case fixed by this patch.

1. PQhost = 127.0.0.1 <= "hostaddr=127.0.0.1 port=5432 dbname=postgres"
2. PQhost = /tmp      <= "port=5432 dbname=postgres"
3. PQhost = /tmp      <= "host=/tmp port=5432 dbname=postgres"
4. PQhost = /tmp      <= "host=/tmp hostaddr=127.0.0.1 port=5432 dbname=postgres"
5. PQhost = localhost <= "host=localhost port=5432 dbname=postgres"
6. PQhost = localhost <= "host=localhost hostaddr=127.0.0.1 port=5432 dbname=postgres"
7. PQhost = hoge      <= "host=hoge hostaddr=127.0.0.1 port=5432 dbname=postgres"

The third case is somewhat confusing but it would be the user's fault.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

*** a/src/interfaces/libpq/fe-connect.c
--- b/src/interfaces/libpq/fe-connect.c
***************
*** 5836,5854 ****
  PQhost(const PGconn *conn)
  {
  	if (!conn)
  		return NULL;
! 	if (conn->connhost != NULL &&
! 		conn->connhost[conn->whichhost].type != CHT_HOST_ADDRESS)
! 		return conn->connhost[conn->whichhost].host;
! 	else if (conn->pghost != NULL && conn->pghost[0] != '\0')
  		return conn->pghost;
! 	else
! 	{
! #ifdef HAVE_UNIX_SOCKETS
! 		return DEFAULT_PGSOCKET_DIR;
! #else
! 		return DefaultHost;
! #endif
! 	}
  }
  
  char *
--- 5836,5857 ----
  {
  	if (!conn)
  		return NULL;
! 
! 	/*
! 	 * We should try to avoid returning ip address. connhost[]->host stores IP
! 	 * address in the case of CHT_HOST_ADDRESS so try to use conn->pghost
! 	 * instead.
! 	 */
! 	if (conn->connhost == NULL ||
! 		(conn->connhost[conn->whichhost].type == CHT_HOST_ADDRESS &&
! 		 conn->pghost != NULL && conn->pghost[0] != '\0'))
  		return conn->pghost;
! 
! 	/*
! 	 * Otherwise we use this as host name even though it might be an IP
! 	 * address.
! 	 */
! 	return conn->connhost[conn->whichhost].host;
  }
  
  char *

*** a/src/interfaces/libpq/fe-connect.c
--- b/src/interfaces/libpq/fe-connect.c
***************
*** 819,827 ****
  	}
  
  	/*
! 	 * Allow unix socket specification in the host name
  	 */
! 	if (conn->pghost && is_absolute_path(conn->pghost))
  	{
  		if (conn->pgunixsocket)
  			free(conn->pgunixsocket);
--- 819,828 ----
  	}
  
  	/*
! 	 * Allow unix socket specification in the host name if hostaddr is not set
  	 */
! 	if ((conn->pghostaddr == NULL || conn->pghostaddr[0] == '\0') &&
! 		conn->pghost && is_absolute_path(conn->pghost))
  	{
  		if (conn->pgunixsocket)
  			free(conn->pgunixsocket);
***************
*** 5354,5370 ****
  PQhost(const PGconn *conn)
  		return NULL;
  	if (conn->pghost != NULL && conn->pghost[0] != '\0')
  		return conn->pghost;
! 	else
! 	{
  #ifdef HAVE_UNIX_SOCKETS
! 		if (conn->pgunixsocket != NULL && conn->pgunixsocket[0] != '\0')
! 			return conn->pgunixsocket;
! 		else
! 			return DEFAULT_PGSOCKET_DIR;
  #else
! 		return DefaultHost;
  #endif
- 	}
  }
  
  char *
--- 5355,5369 ----
  		return NULL;
  	if (conn->pghost != NULL && conn->pghost[0] != '\0')
  		return conn->pghost;
! 	if (conn->pghostaddr != NULL && conn->pghostaddr[0] != '\0')
! 		return conn->pghostaddr;
  #ifdef HAVE_UNIX_SOCKETS
! 	if (conn->pgunixsocket != NULL && conn->pgunixsocket[0] != '\0')
! 		return conn->pgunixsocket;
! 	return DEFAULT_PGSOCKET_DIR;
  #else
! 	return DefaultHost;
  #endif
  }
  
  char *

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
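The seven cases listed above all follow one precedence rule. A toy model (not libpq itself) of the fixed PQhost() ordering, assuming a build-time socket default of /tmp:

```python
DEFAULT_PGSOCKET_DIR = "/tmp"  # assumed unix-socket default for this build

def pqhost(pghost=None, pghostaddr=None, pgunixsocket=None):
    """Toy model of the patched PQhost() precedence:
    host, then hostaddr, then socket directory, then the default."""
    if pghost:
        return pghost
    if pghostaddr:
        return pghostaddr
    if pgunixsocket:
        return pgunixsocket
    return DEFAULT_PGSOCKET_DIR

# Case 1 from the list above: only hostaddr given
print(pqhost(pghostaddr="127.0.0.1"))            # 127.0.0.1
# Case 7: both host and hostaddr given
print(pqhost(pghost="hoge", pghostaddr="127.0.0.1"))  # hoge
```

Running the remaining cases through this rule reproduces the table in the message: case 2 falls through to the socket default, and cases 3-6 all return pghost as given.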
Re: [HACKERS] Partition-wise aggregation/grouping
Jeevan Chalke wrote:
> On Thu, Apr 27, 2017 at 4:53 PM, Antonin Houska wrote:
>> Robert Haas wrote:
>>> Well, I'm confused. I see that there's a relationship between what
>>> Antonin is trying to do and what Jeevan is trying to do, but I can't
>>> figure out whether one is a subset of the other, whether they're both
>>> orthogonal, or something else. This plan looks similar to what I
>>> would expect Jeevan's patch to produce,
>>
>> The point is that the patch Jeevan wanted to work on is actually a subset
>> of [1] combined with [2].
>
> Seems like, as you are targeting every relation whether or not it is
> partitioned.

Yes.

> With my patch, I am getting the following plan, where we push the entire
> aggregation below the Append.
>
>             QUERY PLAN
> --------------------------------------
>  Append
>    ->  HashAggregate
>          Group Key: b_1.j
>          ->  Hash Join
>                Hash Cond: (b_1.j = c_1.k)
>                ->  Seq Scan on b_1
>                ->  Hash
>                      ->  Seq Scan on c_1
>    ->  HashAggregate
>          Group Key: b_2.j
>          ->  Hash Join
>                Hash Cond: (b_2.j = c_2.k)
>                ->  Seq Scan on b_2
>                ->  Hash
>                      ->  Seq Scan on c_2
> (15 rows)

I think this is not generic enough because the result of the Append plan can be joined to another relation. As such a join can duplicate the already-aggregated values, the aggregates should not be finalized below the top-level plan.

> Antonin, I have tried applying your patch on master but it doesn't apply.
> Can you please provide the HEAD and any other changes required to be
> applied first?

I've lost that information. I'll post a new version to the [1] thread asap.

> How does the plan look when the GROUP BY key does not match the
> partitioning key, i.e. GROUP BY b.v?
EXPLAIN (COSTS false) SELECT b.v, avg(b.v + c.v) FROM b JOIN c ON b.j = c.k GROUP BY b.v;

                  QUERY PLAN
----------------------------------------------
 Finalize HashAggregate
   Group Key: b_1.v
   ->  Append
         ->  Partial HashAggregate
               Group Key: b_1.v
               ->  Hash Join
                     Hash Cond: (b_1.j = c_1.k)
                     ->  Seq Scan on b_1
                     ->  Hash
                           ->  Seq Scan on c_1
         ->  Partial HashAggregate
               Group Key: b_2.v
               ->  Hash Join
                     Hash Cond: (b_2.j = c_2.k)
                     ->  Seq Scan on b_2
                     ->  Hash
                           ->  Seq Scan on c_2

> [1] https://www.postgresql.org/message-id/9666.1491295317%40localhost
> [2] https://commitfest.postgresql.org/14/994/

-- 
Antonin Houska
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de, http://www.cybertec.at

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
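The Finalize/Partial split in the plan above is ordinary two-phase aggregation: each partition emits transition state, and only the top node finalizes — which is why aggregates must not be finalized below a join that could duplicate groups. A schematic sketch of that split for avg(), using (sum, count) as the transition state:

```python
from collections import defaultdict

def partial_agg(rows):
    """Per-partition phase: accumulate (sum, count) transition state by key."""
    state = defaultdict(lambda: [0, 0])
    for key, value in rows:
        state[key][0] += value
        state[key][1] += 1
    return state

def finalize_agg(partials):
    """Top-level phase: combine partition states, then finalize to avg."""
    combined = defaultdict(lambda: [0, 0])
    for state in partials:
        for key, (s, n) in state.items():
            combined[key][0] += s
            combined[key][1] += n
    return {key: s / n for key, (s, n) in combined.items()}

# Two partitions feed the same group "a"; finalizing per partition
# would produce two averages of 3.0 and 6.0 that cannot be recombined.
part1 = partial_agg([("a", 2), ("a", 4)])
part2 = partial_agg([("a", 6), ("b", 10)])
print(finalize_agg([part1, part2]))
# {'a': 4.0, 'b': 10.0}
```

The example also shows why a join above the Append is dangerous for fully-finalized aggregates: the (sum, count) states can still be combined after rows are duplicated, but finalized averages cannot.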
[HACKERS] Bug in prepared statement cache invalidation?
Hi hackers,

I've found that Postgres correctly invalidates prepared plans which
directly depend on an altered relation, but doesn't invalidate plans
having transitive (indirect) dependencies. Is it a bug or a feature?

postgres=# create table foo(x integer);
CREATE TABLE
postgres=# select * from foo;
 x
---
(0 rows)

postgres=# create function returnqueryf() returns setof foo as $$
begin return query select * from foo; end; $$ language plpgsql;
CREATE FUNCTION
postgres=# select * from returnqueryf();
 x
---
(0 rows)

postgres=# create function returnqueryff() returns setof foo as $$
begin return query select * from returnqueryf(); end; $$ language plpgsql;
CREATE FUNCTION
postgres=# select * from returnqueryff();
 x
---
(0 rows)

postgres=# alter table foo add column y integer;
ALTER TABLE
postgres=# select * from foo;
 x | y
---+---
(0 rows)

postgres=# select * from returnqueryf();
 x | y
---+---
(0 rows)

postgres=# select * from returnqueryff();
ERROR:  structure of query does not match function result type
DETAIL:  Number of returned columns (1) does not match expected column count (2).
CONTEXT:  PL/pgSQL function returnqueryff() line 1 at RETURN QUERY

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
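[Editorial note: a common workaround, not part of the original report, is to force the intermediate function to replan on each call using dynamic SQL, since RETURN QUERY EXECUTE plans its query afresh instead of reusing a cached PL/pgSQL plan. A sketch, using the function names from the transcript above:]

```sql
-- Rewrite of the reporter's returnqueryff(): the dynamic form avoids
-- the stale cached plan for the inner call.
create or replace function returnqueryff() returns setof foo as $$
begin
    return query execute 'select * from returnqueryf()';
end;
$$ language plpgsql;

-- Alternatively, dropping all cached plans in the current session may
-- clear the stale plan:
DISCARD PLANS;
```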
Re: [HACKERS] Merge join for GiST
Hi, hackers!

I've adapted the crossmatch join from pgSphere to cube for performance
tests. The spatial join code is here:
https://github.com/Octonica/postgres/blob/spatialjoin/contrib/cube/spatialjoin.c
and the node code here:
https://github.com/Octonica/postgres/blob/spatialjoin/contrib/cube/joinnode.c

If you have ideas for improving the performance of this code, please do
not hesitate to share them. One of the performance bottlenecks is the
code near heap_getattr() in fetch_next_pair().

== Performance Tests ==

I've tested performance on queries which aggregate the result of the
spatial join. See cube_test.sql attached.

On 3D data, the spatial join performs 3x faster than Nested Loop Join +
GiST Scan.

Nested Loop + Index Scan plan:

 HashAggregate  (cost=36841568.00..36841570.00 rows=200 width=40) (actual time=206565.869..206738.307 rows=298731 loops=1)
   Group Key: r.nx
   ->  Nested Loop  (cost=0.41..25591568.00 rows=9 width=40) (actual time=0.357..200838.416 rows=8464147 loops=1)
         ->  Seq Scan on regions r  (cost=0.00..6410.00 rows=30 width=40) (actual time=0.015..324.436 rows=30 loops=1)
         ->  Index Scan using idx on datatable a  (cost=0.41..55.28 rows=3000 width=64) (actual time=0.174..0.648 rows=28 loops=30)
               Index Cond: (r.c @> c)
 Planning time: 17.175 ms
 Execution time: 206806.926 ms
Time: 206852,635 ms (03:26,853)

Spatial Join plan:

 HashAggregate  (cost=56250001.00..56250003.00 rows=200 width=40) (actual time=67373.644..67553.118 rows=298731 loops=1)
   Group Key: r.nx
   ->  Custom Scan (SpatialJoin)  (cost=0.00..1.00 rows=45 width=40) (actual time=0.151..61718.804 rows=8464147 loops=1)
         Outer index: idx
         Inner index: idx1
 Planning time: 0.182 ms
 Execution time: 67630.742 ms
Time: 67631,557 ms (01:07,632)

But on more realistic 7D data, with queries emulating an OLAP system,
the spatial join performs 2x worse than Nested Loop Join + GiST Scan,
which comes as a complete surpriseise to me. I do not see any algorithmic
reason for the spatial join to be slower. Thus I strongly suspect that my
implementation is not efficient, but as of now I have no ideas how to
improve it.

Here are the plans for 7D.

Nested Loop + Index Scan:

 HashAggregate  (cost=3425143.00..3425743.00 rows=6 width=72) (actual time=122794.715..122822.893 rows=6 loops=1)
   Group Key: r.nx
   ->  Nested Loop  (cost=0.41..2075143.00 rows=6000 width=72) (actual time=0.311..100478.710 rows=39817008 loops=1)
         ->  Seq Scan on r1 r  (cost=0.00..2419.00 rows=6 width=128) (actual time=0.043..60.579 rows=6 loops=1)
         ->  Index Scan using ix_a1_cube on a1 a  (cost=0.41..24.55 rows=1000 width=128) (actual time=0.110..1.266 rows=664 loops=6)
               Index Cond: (c <@ r.c)
 Planning time: 0.349 ms
 Execution time: 122831.353 ms
(8 rows)

Spatial Join:

 HashAggregate  (cost=6750001.00..6750601.00 rows=6 width=72) (actual time=241832.855..241889.360 rows=6 loops=1)
   Group Key: r.nx
   ->  Custom Scan (SpatialJoin)  (cost=0.00..1.00 rows=3 width=72) (actual time=0.140..216187.111 rows=39817008 loops=1)
         Outer index: ix_r1_cube
         Inner index: ix_a1_cube
 Planning time: 0.533 ms
 Execution time: 241907.569 ms
(7 rows)
Time: 241910,440 ms (04:01,910)

Any ideas will be highly appreciated.

Best regards, Andrey Borodin.
create extension if not exists cube;
begin transaction;
SELECT setseed(.43);
\timing

create table dataTable as
    select cube(array[random(),random(),random()]) c,
           1::float as x1, 1::float as x2, 1::float as x3, 1::float as x4
    from generate_series(1,3e6,1) s;
create index idx on dataTable using gist(c);

create table regions as
    select cube(array[x,y,z], array[x+0.1,y+0.01,z+0.01]) c,
           row_number() over() as nx
    from (select random() x, random() y, random() z
          from generate_series(1,3e5,1) s) q;
create index idx1 on regions using gist(c);

set max_parallel_workers_per_gather = 0;

explain analyze
    select sum(a.x1) as x1, sum(a.x2) as x2, sum(a.x3) as x3, sum(a.x4) as x4
    from dataTable a join regions r on r.c @> a.c
    group by r.nx;

explain analyze
    select sum(a.x1) as x1, sum(a.x2) as x2, sum(a.x3) as x3, sum(a.x4) as x4
    from dataTable a join regions r on r.c && a.c
    group by r.nx;

commit;
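[Editorial note: the attached script covers only the 3D case, while the 7D plans above reference tables a1 and r1 whose setup was not posted. The following is a speculative sketch of an analogous 7D setup, guessed from the table and index names and row counts in the plans; the author's actual test data may differ.]

```sql
-- Hypothetical 7D analog of the attached 3D script.
create table a1 as
    select cube(array[random(),random(),random(),random(),
                      random(),random(),random()]) c,
           1::float as x1, 1::float as x2, 1::float as x3, 1::float as x4
    from generate_series(1,3e6,1) s;
create index ix_a1_cube on a1 using gist(c);

-- The 7D plan shows only 6 rows in r1, consistent with a few large
-- OLAP-style regions rather than many small ones.
create table r1 as
    select cube(array[x1,x2,x3,x4,x5,x6,x7],
                array[x1+0.5,x2+0.5,x3+0.5,x4+0.5,x5+0.5,x6+0.5,x7+0.5]) c,
           row_number() over() as nx
    from (select random() x1, random() x2, random() x3, random() x4,
                 random() x5, random() x6, random() x7
          from generate_series(1,6,1) s) q;
create index ix_r1_cube on r1 using gist(c);
```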
Re: [HACKERS] Declarative partitioning - another take
Thanks for taking a look.

On 2017/04/28 5:24, Robert Haas wrote:
> On Wed, Apr 26, 2017 at 9:30 PM, Amit Langote wrote:
>>> Do we need to update the documentation?
>>
>> Yes, I think we should. How about as in the attached?
>
> Looks reasonable, but I was thinking you might also update the section
> which contrasts inheritance-based partitioning with declarative
> partitioning.

It seems to me that there is no difference in behavior between
inheritance-based and declarative partitioning as far as statement-level
triggers are concerned (at least currently). In both cases, we fire
statement-level triggers only for the table specified in the command.
Maybe we will fix things someday so that statement-level triggers are
fired for all the tables in an inheritance hierarchy when the root
parent is updated or deleted, but that's currently not true. We may
never implement that behavior for declarative partitioned tables though,
so there will be a difference if and when we implement the former. Am I
missing something?

>> By the way, the code changes I made in the attached are such that a
>> subsequent patch could implement firing statement-level triggers of all
>> the tables in a partition hierarchy, which it seems we don't want to do.
>> Should the code then be changed to not create ResultRelInfos for all
>> the tables but only the root table (the one mentioned in the command)?
>> You will see that the patch adds fields named
>> es_nonleaf_result_relations and es_num_nonleaf_result_relations,
>> whereas just es_root_result_relation would perhaps do, for example.
>
> It seems better not to create any ResultRelInfos that we don't
> actually need, so +1 for such a revision to the patch.

OK, done. It took a bit more work than I thought. Updated patch attached.
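[Editorial note: the behavior the attached patch fixes can be illustrated with a short sketch (hypothetical names; the patch's own regression tests in triggers.sql cover this more thoroughly). A statement-level trigger defined on a partitioned parent should fire when the parent is the target of a DML statement:]

```sql
-- Requires PostgreSQL 10; without the patch below, the statement-level
-- trigger on the partitioned parent was silently skipped.
CREATE TABLE parted (a int) PARTITION BY LIST (a);
CREATE TABLE parted_1 PARTITION OF parted FOR VALUES IN (1);

CREATE FUNCTION trig_notice() RETURNS trigger AS $$
BEGIN
    RAISE NOTICE 'statement trigger fired on %', TG_TABLE_NAME;
    RETURN NULL;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER parted_stmt_trig
    BEFORE INSERT ON parted
    FOR EACH STATEMENT EXECUTE PROCEDURE trig_notice();

-- With the patch, this raises the NOTICE once for the statement.
INSERT INTO parted VALUES (1);
```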
Thanks,
Amit

From a8b584f09bc02b6c16c33e816fc12c7e443dd65c Mon Sep 17 00:00:00 2001
From: amit
Date: Mon, 24 Apr 2017 14:55:08 +0900
Subject: [PATCH] Fire per-statement triggers of partitioned tables

Current implementation completely misses them, because it assumes that
no ResultRelInfos need to be created for partitioned tables, whereas
they *are* needed to fire the per-statement triggers, which we *do*
allow to be defined on the partitioned tables.

Reported By: Rajkumar Raghuwanshi
Patch by: Amit Langote
Report: https://www.postgresql.org/message-id/CAKcux6%3DwYospCRY2J4XEFuVy0L41S%3Dfic7rmkbsU-GXhhSbmBg%40mail.gmail.com
---
 doc/src/sgml/trigger.sgml               | 26 +++
 src/backend/executor/execMain.c         | 55 --
 src/backend/executor/nodeModifyTable.c  | 64 ++
 src/backend/nodes/copyfuncs.c           |  2 +
 src/backend/nodes/outfuncs.c            |  3 ++
 src/backend/nodes/readfuncs.c           |  2 +
 src/backend/optimizer/plan/createplan.c |  1 +
 src/backend/optimizer/plan/planner.c    |  3 ++
 src/backend/optimizer/plan/setrefs.c    | 19 ++--
 src/include/nodes/execnodes.h           | 12 +
 src/include/nodes/plannodes.h           | 13 +-
 src/include/nodes/relation.h            |  1 +
 src/test/regress/expected/triggers.out  | 81 +
 src/test/regress/sql/triggers.sql       | 70
 14 files changed, 323 insertions(+), 29 deletions(-)

diff --git a/doc/src/sgml/trigger.sgml b/doc/src/sgml/trigger.sgml
index 2a718d7f47..5771bd5fdc 100644
--- a/doc/src/sgml/trigger.sgml
+++ b/doc/src/sgml/trigger.sgml
@@ -33,7 +33,8 @@
    A trigger is a specification that the database should automatically
    execute a particular function whenever a certain type of operation is
-   performed. Triggers can be attached to tables, views, and foreign tables.
+   performed. Triggers can be attached to tables (partitioned or not),
+   views, and foreign tables.
@@ -111,14 +112,21 @@
    Statement-level BEFORE triggers naturally fire before the
    statement starts to do anything, while statement-level AFTER
    triggers fire at the very end of the statement. These types of
-   triggers may be defined on tables or views. Row-level BEFORE
-   triggers fire immediately before a particular row is operated on,
-   while row-level AFTER triggers fire at the end of the
-   statement (but before any statement-level AFTER triggers).
-   These types of triggers may only be defined on tables and foreign tables.
-   Row-level INSTEAD OF triggers may only be defined on views,
-   and fire immediately as each row in the view is identified as needing to
-   be operated on.
+   triggers may be defined on tables, views, or foreign tables. Row-level
+   BEFORE triggers fire immediately before a particular row is
+   operated on, while row-level AFTER triggers fire at the end of
+   the statement (but before any statement-level AFTER triggers).
+   These types of triggers may only be defined on non-partitioned tables and
+   foreign tables.