Re: [HACKERS] UPDATE of partition key
ure = mtstate->mt_transition_capture;

24. I know from reading the thread this name has changed before, but I think delete_skipped seems like the wrong name for this variable in:

    if (delete_skipped)
        *delete_skipped = true;

"Skipped" is the wrong word here, as it indicates that we had some sort of choice and decided not to. However, that's not the case when the tuple was concurrently deleted. Would it not be better to call it "tuple_deleted", or even "success" and reverse the logic? It's just a bit confusing that you're setting this to skipped before anything happens. It would be nicer if there was a better way to do this whole thing as it's a bit of a wart in the code. I understand why the code exists though.

Also, I wonder if it's better to always pass a boolean here to save having to test for NULL before setting it. That way you might consider putting the success = false just before the return NULL, then do success = true after the tuple is gone. Failing that, putting something like:

    success = false; /* not yet! */

where you're doing the if (delete_skipped) test might also be better.

25. Comment "we should" should be "we must".

    /*
     * For some reason if DELETE didn't happen (for e.g. trigger
     * prevented it, or it was already deleted by self, or it was
     * concurrently deleted by another transaction), then we should
     * skip INSERT as well, otherwise, there will be effectively one
     * new row inserted.

Maybe just:

    /* If the DELETE operation was unsuccessful, then we must not
     * perform the INSERT into the new partition.

"for e.g." is not really correct in English. "For example, ..." or just "e.g. ..." is correct. If you de-abbreviate the e.g. then you've written "For exempli gratia", which translates to "For for example".

26. You're not really explaining what's going on here:

    if (mtstate->mt_transition_capture)
        saved_tcs_map = mtstate->mt_transition_capture->tcs_map;

You have a comment later to say you're about to "Revert back to the transition capture map", but I missed the part that explained about modifying it in the first place.

27. Comment does not explain how we're skipping the partition constraint check in:

     * We have already checked partition constraints above, so skip
     * checking them here.

Maybe something like:

     * We've already checked the partition constraint above, however, we
     * must still ensure the tuple passes all other constraints, so we'll
     * call ExecConstraints() and have it validate all remaining checks.

28. For a table WITH OIDS, the OID should probably follow the new tuple for partition-key UPDATEs.

    CREATE TABLE p (a BOOL NOT NULL, b INT NOT NULL) PARTITION BY LIST (a) WITH OIDS;
    CREATE TABLE P_true PARTITION OF p FOR VALUES IN('t');
    CREATE TABLE P_false PARTITION OF p FOR VALUES IN('f');
    INSERT INTO p VALUES('t', 10);

    SELECT tableoid::regclass,oid,a FROM p;
     tableoid |  oid  | a
    ----------+-------+---
     p_true   | 16792 | t
    (1 row)

    UPDATE p SET a = 'f'; -- partition-key UPDATE (oid has changed; it probably shouldn't have)

    SELECT tableoid::regclass,oid,a FROM p;
     tableoid |  oid  | a
    ----------+-------+---
     p_false  | 16793 | f
    (1 row)

    UPDATE p SET b = 20; -- non-partition-key UPDATE (oid remains the same)

    SELECT tableoid::regclass,oid,a FROM p;
     tableoid |  oid  | a
    ----------+-------+---
     p_false  | 16793 | f
    (1 row)

I'll try to continue with the review tomorrow, but I think some other reviews are also looming.
--
David Rowley   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
[HACKERS] No mention of CREATE STATISTICS in event trigger docs
A patch to fix this is attached.

--
David Rowley   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

event_trigger_statistics_doc.patch
Re: [HACKERS] path toward faster partition pruning
On 10 November 2017 at 16:30, Kyotaro HORIGUCHI <horiguchi.kyot...@lab.ntt.co.jp> wrote:
> In 0002, bms_add_range has a bit naive-looking loop
>
> +       while (wordnum <= uwordnum)
> +       {
> +               bitmapword      mask = (bitmapword) ~0;
> +
> +               /* If working on the lower word, zero out bits below 'lower'. */
> +               if (wordnum == lwordnum)
> +               {
> +                       int             lbitnum = BITNUM(lower);
> +
> +                       mask >>= lbitnum;
> +                       mask <<= lbitnum;
> +               }
> +
> +               /* Likewise, if working on the upper word, zero bits above 'upper' */
> +               if (wordnum == uwordnum)
> +               {
> +                       int             ushiftbits = BITS_PER_BITMAPWORD - (BITNUM(upper) + 1);
> +
> +                       mask <<= ushiftbits;
> +                       mask >>= ushiftbits;
> +               }
> +
> +               a->words[wordnum++] |= mask;
> +       }
>
> Without some aggressive optimization, the loop takes most of the
> time to check-and-jump for nothing especially with many
> partitions and somewhat unintuitive.
>
> The following uses a bit tricky bitmap operation but
> is straightforward as a whole.
>
> =
> /* fill the bits upper from BITNUM(lower) (0-based) of the first word */
> a->workds[wordnum++] += ~(bitmapword)((1 << BITNUM(lower)) - 1);
>
> /* fill up intermediate words */
> while (wordnum < uwordnum)
>     a->words[wordnum++] = ~(bitmapword) 0;
>
> /* fill up to BITNUM(upper) bit (0-based) of the last word */
> a->workds[wordnum++] |=
>     (~(bitmapword) 0) >> (BITS_PER_BITMAPWORD - (BITNUM(upper) - 1));
> =

No objections here for making bms_add_range() perform better, but this is not going to work when lwordnum == uwordnum. You'd need to special case that. I didn't think it was worth the trouble, but maybe it is...

I assume the += should be |=.

--
David Rowley   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
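For illustration only, here is a rough sketch (not the code from either patch version) of how the masked-fill approach could special-case the lwordnum == uwordnum situation mentioned above. It assumes the Bitmapset has already been enlarged to hold 'upper'; WORDNUM/BITNUM here mirror the private macros in bitmapset.c.

#include "postgres.h"
#include "nodes/bitmapset.h"

#define WORDNUM(x)  ((x) / BITS_PER_BITMAPWORD)
#define BITNUM(x)   ((x) % BITS_PER_BITMAPWORD)

/*
 * Set bits 'lower' through 'upper' of 'a'.  When the whole range falls
 * within a single bitmapword, both the lower and upper masks must be
 * applied to that one word; otherwise the first and last words get their
 * partial masks and the words in between are filled completely.
 */
static void
fill_range_words(Bitmapset *a, int lower, int upper)
{
    int         lwordnum = WORDNUM(lower);
    int         uwordnum = WORDNUM(upper);
    bitmapword  lomask = (~(bitmapword) 0) << BITNUM(lower);
    bitmapword  himask = (~(bitmapword) 0) >>
                         (BITS_PER_BITMAPWORD - (BITNUM(upper) + 1));

    if (lwordnum == uwordnum)
    {
        /* Single-word range: intersect the two masks. */
        a->words[lwordnum] |= (lomask & himask);
        return;
    }

    /* First word: set bits from BITNUM(lower) upwards. */
    a->words[lwordnum++] |= lomask;

    /* Intermediate words are set entirely. */
    while (lwordnum < uwordnum)
        a->words[lwordnum++] = ~(bitmapword) 0;

    /* Last word: set bits up to and including BITNUM(upper). */
    a->words[uwordnum] |= himask;
}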
Re: [HACKERS] path toward faster partition pruning
On 7 November 2017 at 01:52, David Rowley <david.row...@2ndquadrant.com> wrote:
> Thanks. I'll look over it all again starting my Tuesday morning. (UTC+13)

Hi Amit,

I had another look over this today. Apologies if any of the review seems petty.

Here goes:

1. The if test seems to be testing for a child that's a partitioned table, but is actually testing for a non-NULL part_scheme.

/*
 * If childrel is itself partitioned, add it and its partitioned
 * children to the list being propagated up to the root rel.
 */
if (childrel->part_scheme && rel->part_scheme)

Should this code use IS_PARTITIONED_REL() instead? It seems a bit strange to test for a NULL part_scheme.

2. There's a couple of mistakes in my bms_add_range() code. I've attached bms_add_range_fix.patch. Can you apply this to your tree?

3. This assert seems to be Asserting the same thing twice:

Assert(rel->live_partitioned_rels != NIL &&
       list_length(rel->live_partitioned_rels) > 0);

A List with length == 0 is always NIL.

4. In get_partitions_from_clauses(), can you comment on why you perform the list_concat() there? I believe this is there so that the partition bound from the parent is passed down to the child so that we can properly eliminate all child partitions when the 2nd level of partitioning is using the same partition key as the 1st level. I think this deserves a paragraph of comment to explain it.

5. Please add a comment to explain what's going on here in classify_partition_bounding_keys():

if (partattno == 0)
{
    partexpr = lfirst(partexprs_item);
    partexprs_item = lnext(partexprs_item);
}

It looks like, similar to index expressions, partition expressions use attno 0 to mark that the next expression should be consumed from the list. Does this need validation that there are enough partexprs_item items, like what is done in get_range_key_properties()? Or is this validated somewhere else?

6. Comment claims the if test will test something which it does not seem to test for:

/*
 * Redundant key elimination using btree-semantics based tricks.
 *
 * Only list and range partitioning use btree operator semantics, so
 * skip otherwise. Also, if there are expressions whose value is yet
 * unknown, skip this step, because we need to compare actual values
 * below.
 */
memset(keyclauses, 0, PARTITION_MAX_KEYS * sizeof(List *));
if (partkey->strategy == PARTITION_STRATEGY_LIST ||
    partkey->strategy == PARTITION_STRATEGY_RANGE)

I was expecting this to be skipped when the clauses contained a non-Const, but it does not seem to be.

7. Should be "compare them":

/*
 * If the leftarg and rightarg clauses' constants are both of the type
 * expected by "op" clause's operator, then compare then using the
 * latter's comparison function.
 */

But if I look at the code, "compare then using the latter's comparison function" is not true either; it seems to use op's comparison function, not rightarg's. With most of the calls op and rightarg are the same, but not all of them. The function shouldn't make that assumption even if the args' op was always the same as rightarg's.

8. remove_redundant_clauses() needs an overview comment of what the function does.

9. The comment should explain what we would do in the case of key < 3 AND key <= 2, using some examples.

/* try to keep only one of <, <= */

10. Wondering why this loop runs backwards?

for (s = BTMaxStrategyNumber; --s >= 0;)

Why not just:

for (s = 0; s < BTMaxStrategyNumber; s++)

I can't see a special reason for it to run backwards. It seems unusual, but if there's a good reason that I've failed to realise then it's maybe worth a comment.

11. Please comment on why *constfalse = true is set here:

if (!chk || s == (BTEqualStrategyNumber - 1))
    continue;

if (partition_cmp_args(partopfamily, partopcintype,
                       chk, eq, chk, &test_result))
{
    if (!test_result)
    {
        *constfalse = true;
        return;
    }
    /* discard the redundant key. */
    xform[s] = NULL;
}

Looks like we'd hit this in a case such as: WHERE key = 1 AND key > 1.

Also, please add a comment when discarding the redundant key; maybe explain that equality is more useful than the other strategies when there's an overlap.

--
David Rowley   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

bms_add_range_fix.patch
Re: [HACKERS] path toward faster partition pruning
On 7 November 2017 at 01:52, David Rowley <david.row...@2ndquadrant.com> wrote:
> Thanks. I'll look over it all again starting my Tuesday morning. (UTC+13)

I have a little more review to share:

1. Missing "in" in comment. Should be "mentioned in":

 * get_append_rel_partitions
 *		Return the list of partitions of rel that pass the clauses mentioned
 *		rel->baserestrictinfo

2. Variable should be declared in the inner scope with the following fragment:

void
set_basic_child_rel_properties(PlannerInfo *root, RelOptInfo *rel,
							   RelOptInfo *childrel, AppendRelInfo *appinfo)
{
	AttrNumber	attno;

	if (rel->part_scheme)
	{

which makes the code the same as where you moved it from.

3. Normally lfirst() is assigned to a variable at the start of a foreach() loop. You have code which does not follow this.

foreach(lc, clauses)
{
	Expr	   *clause;
	int			i;

	if (IsA(lfirst(lc), RestrictInfo))
	{
		RestrictInfo *rinfo = lfirst(lc);

You could assign this to a Node * since the type is unknown to you at the start of the loop.

4.

	/*
	 * Useless if what we're thinking of as a constant is actually
	 * a Var coming from this relation.
	 */
	if (bms_is_member(rel->relid, constrelids))
		continue;

Should this be moved to just above the op_strict() test? This one seems cheaper.

5. Typo "paritions":

/* No clauses to prune paritions, so scan all partitions. */

But thinking about it more, the comment should say something more along the lines of:

/* No useful clauses for partition pruning. Scan all partitions. */

The key difference is that there might be clauses, just without Consts.

Actually, the more I look at get_append_rel_partitions(), I think it would be better if you re-shaped that if/else if test so that it only performs the loop over the partindexes if it's been set. I ended up with the attached version of the function after moving things around a little bit.

I'm still reviewing but thought I'd share this part so far.

--
David Rowley   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

/*
 * get_append_rel_partitions
 *		Return the list of partitions of rel that pass the clauses mentioned
 *		in rel->baserestrictinfo. An empty list is returned if no matching
 *		partitions were found.
 *
 * Returned list contains the AppendRelInfos of chosen partitions.
 */
static List *
get_append_rel_partitions(PlannerInfo *root,
						  RelOptInfo *rel,
						  RangeTblEntry *rte)
{
	List	   *partclauses;
	bool		contains_const,
				constfalse;

	/*
	 * Get the clauses that match the partition key, including information
	 * about any nullness tests against partition keys.  Set keynullness to
	 * an invalid value of NullTestType, which 0 is not.
	 */
	partclauses = match_clauses_to_partkey(rel,
										   list_copy(rel->baserestrictinfo),
										   &contains_const,
										   &constfalse);

	if (!constfalse)
	{
		Relation	parent = heap_open(rte->relid, NoLock);
		PartitionDesc partdesc = RelationGetPartitionDesc(parent);
		Bitmapset  *partindexes;
		List	   *result = NIL;
		int			i;

		/*
		 * If we have matched clauses that contain at least one constant
		 * operand, then use these to prune partitions.
		 */
		if (partclauses != NIL && contains_const)
			partindexes = get_partitions_from_clauses(parent, partclauses);

		/*
		 * Else there are no clauses that are useful to prune any partitions,
		 * so we must scan all partitions.
		 */
		else
			partindexes = bms_add_range(NULL, 0, partdesc->nparts - 1);

		/* Fetch the partition appinfos. */
		i = -1;
		while ((i = bms_next_member(partindexes, i)) >= 0)
		{
			AppendRelInfo *appinfo = rel->part_appinfos[i];

#ifdef USE_ASSERT_CHECKING
			RangeTblEntry *rte = planner_rt_fetch(appinfo->child_relid, root);

			/*
			 * Must be the intended child's RTE here, because appinfos are
			 * ordered the same way as partitions in the partition
			 * descriptor.
			 */
			Assert(partdesc->oids[i] == rte->relid);
#endif

			result = lappend(result, appinfo);
		}

		/* Record which partitions must be scanned. */
		rel->live_part_appinfos = result;

		heap_close(parent, NoLock);

		return result;
	}

	return NIL;
}
Re: [HACKERS] path toward faster partition pruning
On 6 November 2017 at 23:01, Amit Langote <langote_amit...@lab.ntt.co.jp> wrote:
> OK, I have gotten rid of the min/max partition index interface and instead
> adopted the bms_add_range() approach by including your patch to add the
> same in the patch set (which is now 0002 in the whole set). I have to
> admit that it's simpler to understand the new code with just Bitmapsets to
> look at, but I'm still a bit concerned about materializing the whole set
> right within partition.c, although we can perhaps optimize it later.

Thanks for making that change. The code looks much more simple now.

For performance, if you're worried about a very large number of partitions, then I think you're better off using bms_next_member() rather than bms_first_member() (likely this applies globally, but you don't need to worry about those). The problem with bms_first_member() is that it must always loop over the 0 words before it finds any bits set for each call, whereas bms_next_member() will start on the word it was last called for. There will likely be a pretty big performance difference between the two when processing a large Bitmapset.

> Attached updated set of patches, including the fix to make the new pruning
> code handle Boolean partitioning.

Thanks. I'll look over it all again starting my Tuesday morning. (UTC+13)

--
David Rowley   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
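A minimal sketch of the two iteration idioms being compared above; process_partition() is a hypothetical stand-in for whatever the caller does with each index.

#include "postgres.h"
#include "nodes/bitmapset.h"

static void process_partition(int partidx);		/* hypothetical consumer */

static void
scan_partitions(const Bitmapset *partindexes)
{
	int			i;

	/*
	 * bms_next_member() scan: non-destructive, and each call resumes from
	 * the word that held the previous member, so a large set is walked in
	 * a single pass.
	 */
	i = -1;
	while ((i = bms_next_member(partindexes, i)) >= 0)
		process_partition(i);
}

static void
scan_partitions_destructively(Bitmapset *partindexes)
{
	int			i;

	/*
	 * bms_first_member() scan: clears each member as it is returned, and
	 * every call rescans from word 0 before it finds the next set bit.
	 */
	while ((i = bms_first_member(partindexes)) >= 0)
		process_partition(i);
}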
[HACKERS] Removing useless DISTINCT clauses
In [1] we made a change to process the GROUP BY clause to remove any group by items that are functionally dependent on some other GROUP BY items. This really just checks if a table's PK columns are entirely present in the GROUP BY clause and removes anything else belonging to that table.

All this seems to work well, but I totally failed to consider that the exact same thing applies to DISTINCT too. Over in [2], Rui Liu mentions that the planner could do a better job for his case.

Using Rui Liu's example:

CREATE TABLE test_tbl ( k INT PRIMARY KEY, col text);
INSERT into test_tbl select generate_series(1,1000), 'test';

Master:

postgres=# explain analyze verbose select distinct col, k from test_tbl order by k limit 1000;
                                                                  QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=1658556.19..1658563.69 rows=1000 width=9) (actual time=8934.962..8935.495 rows=1000 loops=1)
   Output: col, k
   ->  Unique  (cost=1658556.19..1733557.50 rows=1175 width=9) (actual time=8934.961..8935.460 rows=1000 loops=1)
         Output: col, k
         ->  Sort  (cost=1658556.19..1683556.63 rows=1175 width=9) (actual time=8934.959..8935.149 rows=1000 loops=1)
               Output: col, k
               Sort Key: test_tbl.k, test_tbl.col
               Sort Method: external merge  Disk: 215128kB
               ->  Seq Scan on public.test_tbl  (cost=0.00..154056.75 rows=1175 width=9) (actual time=0.062..1901.728 rows=1000 loops=1)
                     Output: col, k
 Planning time: 0.092 ms
 Execution time: 8958.687 ms
(12 rows)

Patched:

postgres=# explain analyze verbose select distinct col, k from test_tbl order by k limit 1000;
                                                                  QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.44..34.31 rows=1000 width=9) (actual time=0.030..0.895 rows=1000 loops=1)
   Output: col, k
   ->  Unique  (cost=0.44..338745.50 rows=1175 width=9) (actual time=0.029..0.814 rows=1000 loops=1)
         Output: col, k
         ->  Index Scan using test_tbl_pkey on public.test_tbl  (cost=0.44..313745.06 rows=1175 width=9) (actual time=0.026..0.452 rows=1000 loops=1)
               Output: col, k
 Planning time: 0.152 ms
 Execution time: 0.985 ms
(8 rows)

A patch to implement this is attached. I'll add it to the Jan commitfest. (I don't expect anyone to look at this before then.)

[1] https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=d4c3a156cb46dcd1f9f97a8011bd94c544079bb5
[2] https://www.postgresql.org/message-id/flat/CAKJS1f9q0j3BgMUsDbtf9%3DecfVLnqvkYB44MXj0gpVuamcN8Xw%40mail.gmail.com#CAKJS1f9q0j3BgMUsDbtf9=ecfvlnqvkyb44mxj0gpvuamcn...@mail.gmail.com

--
David Rowley   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

remove_useless_distinct_clauses.patch
Re: [Sender Address Forgery]Re: [HACKERS] path toward faster partition pruning
On 6 November 2017 at 17:30, Amit Langote <langote_amit...@lab.ntt.co.jp> wrote:
> On 2017/11/03 13:32, David Rowley wrote:
>> On 31 October 2017 at 21:43, Amit Langote <langote_amit...@lab.ntt.co.jp> wrote:
>> 1. This comment seem wrong.
>>
>> /*
>>  * Since the clauses in rel->baserestrictinfo should all contain Const
>>  * operands, it should be possible to prune partitions right away.
>>  */
>
> Yes. I used to think it was true, then realized it isn't and updated the
> code to get rid of that assumption, but I forgot updating this comment.
> Fixed.
>
>> How about PARTITION BY RANGE (a) and SELECT * FROM parttable WHERE a > b; ?
>> baserestrictinfo in this case will contain a single RestrictInfo with
>> an OpExpr containing two Var args and it'll come right through that
>> function too.
...
> We won't be able to use such a clause for pruning at all; neither
> planning-time pruning nor execution-time pruning. Am I missing something?

I just meant the comment was wrong.

> The design with min/max partition index interface to the partition.c's new
> partition-pruning facility is intentional. You can find hints about how
> such a design came about in the following Robert's email:
>
> https://www.postgresql.org/message-id/CA%2BTgmoYcv_MghvhV8pL33D95G8KVLdZOxFGX5dNASVkXO8QuPw%40mail.gmail.com

Yeah, I remember reading that before I had looked at the code. I disagree with Robert on this. The fact that the min/max range gets turned into a list of everything in that range in get_append_rel_partitions means all the advantages of storing the partitions as a range are voided. If you could have kept it a range the entire time, then that might be different, but it seems you need to materialize the entire range in order to sort the partitions into order.

I've included Robert just in case he wants to take a look at the code that resulted from that design. Maybe something is not following what he had in mind, or maybe he'll change his mind based on the code that resulted.

> For range queries, it is desirable for the partitioning module to return
> the set of qualifying partitions that are contiguous in a compact (O(1))
> min/max representation than having to enumerate all those indexes in the
> set. It's nice to avoid iterating over that set twice -- once when
> constructing the set in the partitioning module and then again in the
> caller (in this case, planner) to perform the planning-related tasks per
> selected partition.

The idea is that you still get the min and max from the bsearch, but then use bms_add_range() to populate a bitmapset of the matching partitions. The big-O notation of the search shouldn't change.

> We need the other_parts Bitmapset too, because selected partitions may not
> always be contiguous, even in the case of range partitioning, if there are
> OR clauses and the possibility that the default partition is also
> selected. While computing the selected partition set from a given set of
> clauses, partitioning code tries to keep the min/max representation as
> long as it makes sense to and once the selected partitions no longer
> appear to be contiguous it switches to the Bitmapset mode.

Yip. I understand that. I just think there's no benefit to having min/max since it needs to be materialized into a list of the entire range at some point, so it might as well be done as soon as possible using a Bitmapset, which would save having all the partset_union, partset_intersect, partset_range_empty, partset_range_overlap, partset_range_adjacent code. You'd end up just using bms_union and bms_intersect, then bms_add_range to handle the min/max bound you get from the bsearch.

--
David Rowley   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
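A hypothetical sketch of the Bitmapset-only shape being argued for here: each simple range clause has already been bsearched to a contiguous min/max pair of partition indexes, the pair is materialized immediately with bms_add_range() (as in the reshaped function shown earlier in this digest), and OR/AND combinations fall out of the existing set operations. The clause shapes and index bounds are invented for the example.

#include "postgres.h"
#include "nodes/bitmapset.h"

/*
 * Partitions satisfying, say,
 *   (key BETWEEN a AND b OR key BETWEEN c AND d) AND key BETWEEN e AND f
 * where each BETWEEN clause has been resolved to a contiguous range of
 * partition indexes.
 */
static Bitmapset *
example_combine_ranges(int a_min, int a_max,
					   int b_min, int b_max,
					   int c_min, int c_max)
{
	Bitmapset  *result;

	/* OR arms: materialize each contiguous range and union them. */
	result = bms_add_range(NULL, a_min, a_max);
	result = bms_union(result, bms_add_range(NULL, b_min, b_max));

	/* AND with the remaining range: intersect the sets. */
	result = bms_int_members(result, bms_add_range(NULL, c_min, c_max));

	return result;
}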
Re: [HACKERS] path toward faster partition pruning
On 31 October 2017 at 21:43, Amit Langote <langote_amit...@lab.ntt.co.jp> wrote:
> Attached updated version of the patches

match_clauses_to_partkey() needs to allow for the way quals on Bool columns are represented.

create table pt (a bool not null) partition by list (a);
create table pt_true partition of pt for values in('t');
create table pt_false partition of pt for values in('f');

explain select * from pt where a = true;
                           QUERY PLAN
-----------------------------------------------------------------
 Append  (cost=0.00..76.20 rows=2810 width=1)
   ->  Seq Scan on pt_false  (cost=0.00..38.10 rows=1405 width=1)
         Filter: a
   ->  Seq Scan on pt_true  (cost=0.00..38.10 rows=1405 width=1)
         Filter: a
(5 rows)

match_clause_to_indexcol() shows an example of how to handle this.

explain select * from pt where a = false; will need to be allowed too. This works slightly differently.

--
David Rowley   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
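A hypothetical sketch of the kind of handling being suggested, modelled on what match_clause_to_indexcol() does for boolean index columns rather than taken from the patch: a bare Var on the partition key behaves like key = true, and a NOT on it like key = false.

#include "postgres.h"
#include "nodes/primnodes.h"
#include "optimizer/clauses.h"

/*
 * If "clause" is a bare boolean partition key reference, or the negation
 * of one, report the constant it implies so the caller can prune as if it
 * had seen "partkey = true" or "partkey = false".
 */
static bool
match_boolean_partition_clause(Expr *clause, Index partrelid,
							   AttrNumber partattno, bool *keyval)
{
	if (clause == NULL)
		return false;

	if (IsA(clause, Var))
	{
		Var		   *var = (Var *) clause;

		if (var->varno == partrelid && var->varattno == partattno)
		{
			*keyval = true;		/* WHERE partkey */
			return true;
		}
	}
	else if (not_clause((Node *) clause))
	{
		Expr	   *arg = (Expr *) get_notclausearg(clause);

		if (arg && IsA(arg, Var) &&
			((Var *) arg)->varno == partrelid &&
			((Var *) arg)->varattno == partattno)
		{
			*keyval = false;	/* WHERE NOT partkey */
			return true;
		}
	}

	return false;
}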
Re: [HACKERS] path toward faster partition pruning
On 3 November 2017 at 17:32, David Rowley <david.row...@2ndquadrant.com> wrote:
> 2. This code is way more complex than it needs to be.
>
> if (num_parts > 0)
> {
>     int j;
>
>     all_indexes = (int *) palloc(num_parts * sizeof(int));
>     j = 0;
>     if (min_part_idx >= 0 && max_part_idx >= 0)
>     {
>         for (i = min_part_idx; i <= max_part_idx; i++)
>             all_indexes[j++] = i;
>     }
>     if (!bms_is_empty(other_parts))
>         while ((i = bms_first_member(other_parts)) >= 0)
>             all_indexes[j++] = i;
>     if (j > 1)
>         qsort((void *) all_indexes, j, sizeof(int), intcmp);
> }
>
> It looks like the min/max partition stuff is just complicating things
> here. If you need to build this array of all_indexes[] anyway, I don't
> quite understand the point of the min/max. It seems like min/max would
> probably work a bit nicer if you didn't need the other_parts
> BitmapSet, so I recommend just getting rid of min/max completely and
> just have a BitmapSet with a bit set for each partition's index you
> need. You'd not need to go to the trouble of performing a qsort on an
> array and you could get rid of quite a chunk of code too.
>
> The entire function would then not be much more complex than:
>
> partindexes = get_partitions_from_clauses(parent, partclauses);
>
> while ((i = bms_first_member(partindexes)) >= 0)
> {
>     AppendRelInfo *appinfo = rel->part_appinfos[i];
>
>     result = lappend(result, appinfo);
> }
>
> Then you can also get rid of your intcmp() function too.

I've read a bit more of the patch and I think even more now that the min/max stuff should be removed.

I understand that you'll be bsearching for a lower and upper bound for cases like:

SELECT * FROM pt WHERE key BETWEEN 10 and 20;

but it looks like the min and max range stuff is thrown away if the query is written as:

SELECT * FROM pt WHERE key BETWEEN 10 and 20 OR key BETWEEN 30 AND 40;

From reading the code, it seems like partset_union() would be called in this case, and if the min/max of each were consecutive then the min/max range would get merged, but there's really a lot of code to support this. I think it would be much better to invent bms_add_range() and just use a Bitmapset to store the partition indexes to scan. You could simply use bms_union() for OR cases and bms_intersect() for AND cases. It seems this would allow removal of this complex code. It looks like this would allow you to remove all the partset_range_* macros too.

I've attached a patch which implements bms_add_range(), which will save you from having to write the tight loops to call bms_add_member() such as the ones in partset_union(). Those would not be so great if there was a huge number of partitions, as the Bitmapset->words[] array could be expanded many more times than required. bms_add_range() will handle that much more efficiently with a maximum of 1 repalloc() for the whole operation. It would also likely be faster since it's working at the bitmapword level rather than the bit level.

--
David Rowley   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

0001-Add-bms_add_range-to-add-members-within-the-specifie.patch
Re: [HACKERS] ArrayLists instead of List (for some things)
On 3 November 2017 at 03:26, Craig Ringer <cr...@2ndquadrant.com> wrote:
> On 2 November 2017 at 22:22, David Rowley <david.row...@2ndquadrant.com> wrote:
>> Maybe, but the new implementation is not going to do well with places
>> where we perform lcons(). Probably many of those places could be
>> changed to lappend(), but I bet there's plenty that need prepend.
>
> Yeah, and it's IMO significant that pretty much every nontrivial
> system with ADTs offers multiple implementations of list data
> structures, often wrapped with a common API.
>
> Java's Collections, the STL, you name it.

I've never really looked at much Java, but I've done quite a bit of dotnet stuff in my time, and they have a common interface ICollection which is implemented by a bunch of data structure classes, something like:

ConcurrentDictionary (hash table)
Dictionary (hash table)
HashSet (hash table)
LinkedList (some kinda linked list)
List (arrays, probably)
SortedDictionary (bst, I think)
SortedList (no idea, perhaps a btree)
SortedSet
Bag (no idea, but does not care about any order)

Probably there are more, but the point is that they obviously don't believe there's a one-size-fits-all type. I don't think we should do that either. However, I've proved nothing on the performance front yet, so that should probably be my next step.

--
David Rowley   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] path toward faster partition pruning
On 31 October 2017 at 21:43, Amit Langote <langote_amit...@lab.ntt.co.jp> wrote:
> Attached updated version of the patches addressing some of your comments

I've spent a bit of time looking at these but I'm out of time for now. So far I have noted down the following:

1. This comment seems wrong.

/*
 * Since the clauses in rel->baserestrictinfo should all contain Const
 * operands, it should be possible to prune partitions right away.
 */

How about PARTITION BY RANGE (a) and SELECT * FROM parttable WHERE a > b; ? baserestrictinfo in this case will contain a single RestrictInfo with an OpExpr containing two Var args and it'll come right through that function too.

2. This code is way more complex than it needs to be.

if (num_parts > 0)
{
	int			j;

	all_indexes = (int *) palloc(num_parts * sizeof(int));
	j = 0;
	if (min_part_idx >= 0 && max_part_idx >= 0)
	{
		for (i = min_part_idx; i <= max_part_idx; i++)
			all_indexes[j++] = i;
	}
	if (!bms_is_empty(other_parts))
		while ((i = bms_first_member(other_parts)) >= 0)
			all_indexes[j++] = i;
	if (j > 1)
		qsort((void *) all_indexes, j, sizeof(int), intcmp);
}

It looks like the min/max partition stuff is just complicating things here. If you need to build this array of all_indexes[] anyway, I don't quite understand the point of the min/max. It seems like min/max would probably work a bit nicer if you didn't need the other_parts BitmapSet, so I recommend just getting rid of min/max completely and just have a BitmapSet with a bit set for each partition's index you need. You'd not need to go to the trouble of performing a qsort on an array and you could get rid of quite a chunk of code too.

The entire function would then not be much more complex than:

partindexes = get_partitions_from_clauses(parent, partclauses);

while ((i = bms_first_member(partindexes)) >= 0)
{
	AppendRelInfo *appinfo = rel->part_appinfos[i];

	result = lappend(result, appinfo);
}

Then you can also get rid of your intcmp() function too.

3. The following code has the wrong size calculation:

memset(keynullness, -1, PARTITION_MAX_KEYS * sizeof(NullTestType *));

should be PARTITION_MAX_KEYS * sizeof(NullTestType). It might have worked on your machine if you're compiling as 32 bit.

I'll continue on with the review in the next few days.

--
David Rowley   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] path toward faster partition pruning
On 31 October 2017 at 21:43, Amit Langote <langote_amit...@lab.ntt.co.jp> wrote:
> Attached updated version of the patches addressing some of your comments
> above and fixing a bug that Rajkumar reported [1]. As mentioned there,
> I'm including here a patch (the 0005 of the attached) to tweak the default
> range partition constraint to be explicit about null values that it might
> contain. So, there are 6 patches now and what used to be patch 0005 in
> the previous set is patch 0006 in this version of the set.

Hi Amit,

I've been looking over this. I see the latest patches conflict with cf7ab13bf. Can you send patches rebased on current master?

Thanks

--
David Rowley   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] ArrayLists instead of List (for some things)
On 3 November 2017 at 03:27, Stephen Frost <sfr...@snowman.net> wrote:
> * David Rowley (david.row...@2ndquadrant.com) wrote:
>> We could get around a few of these problems if Lists were implemented
>> internally as arrays, however, arrays are pretty bad if we want to
>> delete an item out the middle as we'd have to shuffle everything over
>> one spot. Linked lists are much better here... at least they are when
>> you already have the cell you want to remove... so we can't expect
>> that we can just rewrite List to use arrays instead.
>
> This actually depends on just how you delete the item out of the array,
> though there's a trade-off there. If you "delete" the item by marking
> it as "dead" then you don't need to re-shuffle everything, but, of
> course, then you have to make sure to 'skip' over those by running down
> the array for each list_nth() call- but here's the thing, with a small
> list that's all in a tighly packed array, that might not be too bad.
> Certainly far better than having to grab random memory on each step and
> I'm guessing you'd only actually store the "data" item for a given list
> member in the array if it's a integer/oid/etc list, not when it's
> actually a pointer being stored in the list, so you're always going to
> be working with pretty tightly packed arrays where scanning the list on
> the list_nth call really might be darn close to as fast as using an
> offset to the actual entry in the array, unless it's a pretty big list,
> but I don't think we've actually got lots of multi-hundred-deep lists.

I was hoping to have list_nth() always be O(1) so that we'd never be tempted again to use arrays directly like we are now for things like simple_rel_array[].

Probably it could be made to work quickly in most cases by tracking whether there are any deleted items and just doing the direct array lookup if nothing is deleted; if something is deleted, then adjust the index by the number of bits set in some "deleted" bitmapset for positions earlier than the nth element. bms_num_members() could be made faster with 64-bit bitmap words and __builtin_popcountl(), falling back on the number_of_ones[] array when not available. It would also require a bms_num_members_before() function.

In the implementation that I attached, I showed an example of how to delete items out of an array list, which I think is quite a bit cleaner than how we generally do it today with List (e.g. in reconsider_outer_join_clauses()). However, it'll still perform badly when just a single known item is being removed.

--
David Rowley   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
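A sketch of what the suggested bms_num_members_before() might look like, counting the members set below a given position. It uses __builtin_popcountl() as mentioned above, so a portable version would need the number_of_ones[] fallback; WORDNUM/BITNUM mirror the private macros in bitmapset.c.

#include "postgres.h"
#include "nodes/bitmapset.h"

#define WORDNUM(x)  ((x) / BITS_PER_BITMAPWORD)
#define BITNUM(x)   ((x) % BITS_PER_BITMAPWORD)

/*
 * Count the members of 'a' whose index is strictly less than 'n'.  A List
 * tracking dead entries in a "deleted" Bitmapset could use this to map a
 * logical element position onto a physical array slot.
 */
static int
bms_num_members_before(const Bitmapset *a, int n)
{
	int			count = 0;
	int			wordnum = WORDNUM(n);
	int			i;

	if (a == NULL)
		return 0;

	for (i = 0; i < wordnum && i < a->nwords; i++)
		count += __builtin_popcountl(a->words[i]);

	if (wordnum < a->nwords)
	{
		/* Only count bits below position n within n's own word. */
		bitmapword	w = a->words[wordnum];

		w &= ((bitmapword) 1 << BITNUM(n)) - 1;
		count += __builtin_popcountl(w);
	}

	return count;
}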
Re: [HACKERS] ArrayLists instead of List (for some things)
On 3 November 2017 at 03:38, Tom Lane <t...@sss.pgh.pa.us> wrote:
> David Rowley <david.row...@2ndquadrant.com> writes:
>> On 3 November 2017 at 03:17, Tom Lane <t...@sss.pgh.pa.us> wrote:
>>> We've jacked up the List API and driven a new implementation underneath
>>> once before. Maybe it's time to do that again.
>
>> Maybe, but the new implementation is not going to do well with places
>> where we perform lcons(). Probably many of those places could be
>> changed to lappend(), but I bet there's plenty that need prepend.
>
> [ shrug... ] To me, that means this implementation isn't necessarily
> the right solution.

True. Of course, it could be worked around, probably with some bit flags to record if lcons was called, and then consuming the elements in reverse. We might have to shuffle things along a bit for Lists that get both lcons and lappend calls.

The API break that I can't see a way around is:

/* does the list have 0 elements? */
if (list == NIL) ...

That would need to be rewritten as:

if (list_length(list) == 0) ...

> It seems to me that a whole lot of the complaints about this could be
> resolved simply by improving the List infrastructure to allocate ListCells
> in batches. That would address the question of "too much palloc traffic"
> and greatly improve the it-accesses-all-over-memory situation too.
>
> Possibly there are more aggressive changes that could be implemented
> without breaking too many places, but I think it would be useful to
> start there and see what it buys us.

Yeah, that certainly would be a good way of determining the worth of changing.

--
David Rowley   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] ArrayLists instead of List (for some things)
On 3 November 2017 at 03:17, Tom Lane <t...@sss.pgh.pa.us> wrote:
> We've jacked up the List API and driven a new implementation underneath
> once before. Maybe it's time to do that again.

Maybe, but the new implementation is not going to do well with places where we perform lcons(). Probably many of those places could be changed to lappend(), but I bet there's plenty that need prepend.

--
David Rowley   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
[HACKERS] ArrayLists instead of List (for some things)
Hackers,

Our List implementation internally uses linked lists, which are certainly good for some things, but pretty bad at others. Linked lists are pretty bad when you want to fetch the Nth element, as it means looping over each previous element until you get to the Nth. They're also pretty bad for a modern CPU due to their memory access pattern.

We could get around a few of these problems if Lists were implemented internally as arrays, however, arrays are pretty bad if we want to delete an item out of the middle as we'd have to shuffle everything over one spot. Linked lists are much better here... at least they are when you already have the cell you want to remove... so we can't expect that we can just rewrite List to use arrays instead.

I've attached a first cut implementation of "AList". I was originally going to call it "ArrayList", but my fingers were getting tired, so I changed it. This first cut version replaces nothing in the code base to use ALists. This email (and the timing of it) is more of a marker that this is being worked on. I know in particular that Andres has complained at least once about List usage in the executor, which I agree is not great. Looking at the usage of list_nth() is quite scary!

My plans for this include:

* Make targetlists from createplan onwards use ALists. Every time I look at build_path_tlist() I gawk at the number of palloc calls that are going on inside those lappend() calls. We already know how many items will be in the list, so with AList we could just alist_premake(list_length(path->pathtarget->exprs)) and we'd never have to palloc() another element for that list again. We do the same again in various mutator functions in setrefs.c too!

* list_nth() usage in adjust_appendrel_attrs_mutator() to speed up translation between parent and child Vars.

* And also, umm, simple_rte_array and simple_rel_array. Well, we'd still need somewhere to store all the RelOptInfos, but the idea is that it would be rather nice if we could avoid expanding the entire inheritance hierarchy of a relation, and instead just add the partitions that we need, rather than adding them all and setting dummy paths for the ones we're not interested in.

Anyway, please don't debate the usages of the new type here. As for all the above plans, I admit to not having a full handle on them yet. I wrote the Array List implementation so I could get a better idea of how possible each of those would be, so, for now, to prevent any duplicate work, here it is... (Including Andres because I know he's mentioned his dislike for List in the executor.)

Comments on the design are welcome, but I was too late to the commitfest, so there are other priorities. However, if you have a strong opinion, feel free to voice it.

--
David Rowley   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

0001-Basic-implementation-of-array-lists-AList.patch
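To give a flavour of the idea, here is a minimal array-backed list with a premake step. The struct and function names are illustrative only, not the definitions from the attached patch.

#include "postgres.h"

/* Illustrative array-backed list; not the definition from the patch. */
typedef struct AList
{
	int			length;			/* number of elements in use */
	int			max_length;		/* allocated capacity of elements[] */
	void	  **elements;		/* the element pointers */
} AList;

/* Allocate capacity up front so later appends never need to repalloc. */
static AList *
alist_premake(int nelems)
{
	AList	   *alist = (AList *) palloc(sizeof(AList));

	alist->length = 0;
	alist->max_length = Max(nelems, 1);
	alist->elements = (void **) palloc(alist->max_length * sizeof(void *));
	return alist;
}

static AList *
alist_append(AList *alist, void *datum)
{
	if (alist->length >= alist->max_length)
	{
		/* Grow geometrically: one repalloc covers many future appends. */
		alist->max_length *= 2;
		alist->elements = (void **) repalloc(alist->elements,
											 alist->max_length * sizeof(void *));
	}
	alist->elements[alist->length++] = datum;
	return alist;
}

/* O(1), unlike list_nth() on the linked-list List implementation. */
static void *
alist_nth(const AList *alist, int n)
{
	Assert(n >= 0 && n < alist->length);
	return alist->elements[n];
}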
[HACKERS] Removing LEFT JOINs in more cases
Hackers,

Normally we'll only ever remove a LEFT JOIN relation if it's unused and there's no possibility that the join would cause row duplication. To check that the join wouldn't cause row duplication we make use of proofs, such as unique indexes, or for sub-queries, we make use of DISTINCT and GROUP BY clauses.

There's another case that we don't handle, and it's VERY simple to test for. Quite simply, it seems we could remove the join in cases such as:

create table t1 (id int primary key);
create table t2 (id int primary key, b int not null);
insert into t2 values(1,1),(2,1);
insert into t1 values(1);

select distinct t1.* from t1 left join t2 on t1.id=t2.b;

and

select t1.id from t1 left join t2 on t1.id=t2.b GROUP BY t1.id;

but not:

select t1.id,count(*) from t1 left join t2 on t1.id=t2.b GROUP BY t1.id;

In the first two cases, the join *can* cause row duplicates, but the DISTINCT or GROUP BY would filter these out again anyway, so in these cases we'd not only get the benefit of not joining but also the benefit of not having to remove the duplicate rows caused by the join.

Given how simple the code is to support this, it seems to me to be worth handling. A patch to do this is attached.

--
David Rowley   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

0001-Support-removing-LEFT-JOINs-with-DISTINCT-GROUP-BY.patch
Re: [HACKERS] An unlikely() experiment
On 30 October 2017 at 22:44, Andres Freund <and...@anarazel.de> wrote:
> On 2017-10-30 22:39:01 +1300, David Rowley wrote:
>> Today I was thinking, to get around that issue, we might be able to
>> generate another thin wrapper around elog_start() and mark that as
>> __attribute__((cold)) and fudge the macro a bit to call that function
>> instead if it can detect a compile time const level >= ERROR. I've not
>> looked at the code again to remind myself if that would be possible.
>
> Yes, that's what I was thinking too. Add a elog_fatal() wrapping
> elog_finish(), and move the if (__builtin_constant_p(elevel)) branch a
> bit earlier, and that should work. Similar with errstart_fatal() for
> ereport().

This may have been too good to be true. I can't seem to get it to work, and I think it's because the function call is inside the do{}while(0) and the if (__builtin_constant_p(elevel) && (elevel) >= ERROR) branch, which appears to mean that:

"The paths leading to call of cold functions within code are marked as unlikely by the branch prediction mechanism" [1]

is not the path that the macro is in in the calling function, like we might have hoped.

I can get the assembly to change if I put an unlikely() around the condition, or if I go and vandalize the macro to become:

#define elog(elevel, ...) \
	elog_start_error(__FILE__, __LINE__, PG_FUNCNAME_MACRO)

[1] https://gcc.gnu.org/onlinedocs/gcc-4.8.5/gcc/Function-Attributes.html

--
David Rowley   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] An unlikely() experiment
On 30 October 2017 at 22:34, Andres Freund <and...@anarazel.de> wrote:
> Hi,
>
> On 2015-12-20 02:49:13 +1300, David Rowley wrote:
>> Alternatively, if there was some way to mark the path as cold from within
>> the path, rather than from the if() condition before the path, then we
>> could perhaps do something in the elog() macro instead. I just couldn't
>> figure out a way to do this.
>
> I just was thinking about this, and it turns out that
> __attribute__((cold)) does what we need here. We could just mark
> elog_start() and errstart() as cold, and afaict that should do the
> trick.

I had actually been thinking about this again today. I had previously not thought __attribute__((cold)) would be a good idea, since if we mark elog_start() with that, then code paths with elog(NOTICE) and elog(LOG), not to mention DEBUG, would be marked as cold too.

Today I was thinking, to get around that issue, we might be able to generate another thin wrapper around elog_start() and mark that as __attribute__((cold)), and fudge the macro a bit to call that function instead if it can detect a compile-time constant level >= ERROR. I've not looked at the code again to remind myself if that would be possible.

--
David Rowley   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
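Roughly what the arrangement described here could look like in elog.h: a sketch only, assuming GCC and the existing elog_start()/elog_finish() interface, not a tested patch. (The reply earlier in this digest reports that, written this way, the generated assembly did not in fact change, so treat it purely as the shape of the idea.)

/*
 * Sketch: assumes the PostgreSQL 10 elog.h context, where the elog() macro
 * expands to elog_start() followed by elog_finish(), and where ERROR and
 * PG_FUNCNAME_MACRO are already defined.
 */
extern void elog_start(const char *filename, int lineno, const char *funcname);
extern void elog_finish(int elevel, const char *fmt,...);

/*
 * Hypothetical cold wrapper: GCC marks paths that reach a cold function as
 * unlikely and moves them out of the hot code layout.
 */
extern void elog_start_error(const char *filename, int lineno,
							 const char *funcname) __attribute__((cold));

#define elog(elevel, ...) \
	do { \
		if (__builtin_constant_p(elevel) && (elevel) >= ERROR) \
			elog_start_error(__FILE__, __LINE__, PG_FUNCNAME_MACRO); \
		else \
			elog_start(__FILE__, __LINE__, PG_FUNCNAME_MACRO); \
		elog_finish(elevel, __VA_ARGS__); \
	} while(0)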
Re: [HACKERS] Removing [Merge]Append nodes which contain a single subpath
On 27 October 2017 at 01:44, Ashutosh Bapat <ashutosh.ba...@enterprisedb.com> wrote:
> I think Antonin has a point. The join processing may deem some base
> relations dummy (See populate_joinrel_with_paths()). So an Append
> relation which had multiple child alive at the end of
> set_append_rel_size() might ultimately have only one child after
> partition-wise join has worked on it.

hmm, I see. Partition-wise join is able to eliminate partitions on the other side of the join that can't be matched to:

# set enable_partition_wise_join=on;
SET
# explain select count(*) from ab ab1 inner join ab ab2 on ab1.a=ab2.a where ab1.a between 1 and 2 and ab2.a between 1 and 10001;
                      QUERY PLAN
------------------------------------------------------
 Aggregate  (cost=0.00..0.01 rows=1 width=8)
   ->  Result  (cost=0.00..0.00 rows=0 width=0)
         One-Time Filter: false
(3 rows)

# \d+ ab
                                    Table "public.ab"
 Column |  Type   | Collation | Nullable | Default | Storage | Stats target | Description
--------+---------+-----------+----------+---------+---------+--------------+-------------
 a      | integer |           |          |         | plain   |              |
 b      | integer |           |          |         | plain   |              |
Partition key: RANGE (a)
Partitions: ab_p1 FOR VALUES FROM (1) TO (1),
            ab_p2 FOR VALUES FROM (10000) TO (2)

--
David Rowley   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Removing [Merge]Append nodes which contain a single subpath
On 26 October 2017 at 23:42, Antonin Houska <a...@cybertec.at> wrote:
> David Rowley <david.row...@2ndquadrant.com> wrote:
>
>> Method 1:
>>
>> In set_append_rel_size() detect when just a single subpath would be
>> added to the Append path.
>
> I spent some time reviewing the partition-wise join patch during the last CF
> and have an impression that set_append_rel_size() is called too early to find
> out how many child paths will eventually exist if Append represents a join of
> a partitioned table. I think the partition matching takes place at later stage
> and that determines the actual number of partitions, and therefore the number
> of Append subpaths.

I understand that we must wait until set_append_rel_size() is being called on the RELOPT_BASEREL, since partitions may themselves be partitioned tables and we'll only know what's left after we've recursively checked all partitions of the baserel.

I've not studied the partition-wise join code yet, but if we've eliminated all but one partition in set_append_rel_size() then I imagine we'd just need to ensure partition-wise join is not attempted, since we'd be joining directly to the only partition anyway.

--
David Rowley   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Removing [Merge]Append nodes which contain a single subpath
On 26 October 2017 at 23:30, Robert Haas <robertmh...@gmail.com> wrote:
> On Wed, Oct 25, 2017 at 11:59 PM, David Rowley
> <david.row...@2ndquadrant.com> wrote:
>> As of today, because we include this needless [Merge]Append node, we
>> cannot parallelise scans below the Append.
>
> Without disputing your general notion that singe-child Append or
> MergeAppend nodes are a pointless waste, how does a such a needless
> node prevent parallelizing scans beneath it?

hmm, I think I was wrong about that now. I had been looking over the regression test diffs after having made tenk1 a partitioned table with a single partition containing all the rows, but it looks like I read the diff backwards. The planner actually parallelized the Append version, not the non-Append version, like I had previously thought.

--
David Rowley   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
[HACKERS] Removing [Merge]Append nodes which contain a single subpath
It seems like a good idea for the planner not to generate Append or MergeAppend paths when the node contains just a single subpath. If we were able to remove these then the planner would have more flexibility to build a better plan.

As of today, because we include this needless [Merge]Append node, we cannot parallelise scans below the Append. We must also include a Materialize node in Merge Joins since both MergeAppend and Append don't support mark and restore. Also, as shown in [1], there's an overhead to pulling tuples through nodes.

I've been looking into resolving this issue, but the ways I've explored so far seem to bend the current planner a bit out of shape.

Method 1:

In set_append_rel_size() detect when just a single subpath would be added to the Append path. Set a promotion_child RelOptInfo field in the base rel's RelOptInfo, and during make_one_rel, after set_base_rel_sizes(), modify the joinlist to get rid of the parent and use the child instead.

This pretty much breaks the assumption we have that the finalrel will have all the relids in root->all_baserels. We have an Assert in make_one_rel() which checks this is true. There are also complications around set_rel_size() generating paths for subqueries which may be parameterized paths with the Append parent required for the parameterization to work.

Method 2:

Invent a ProxyPath concept that allows us to add Paths belonging to one relation to another relation. The ProxyPaths can have translation_vars to translate targetlists to reference the correct Vars. These ProxyPaths could exist up as far as createplan, where we could perform the translation and build the ProxyPath's subnode instead.

This method is not exactly very clean either, as there are various places around the planner where we'd need to give special treatment to these ProxyPaths; for example, is_projection_capable_path() would need to return the projection capability of the subpath within the ProxyPath since we never actually create a "Proxy" node.

Probably either of these two methods could be made to work. I prefer method 2, since that infrastructure could one day be put towards scanning a Materialized View instead of the relation in order to satisfy some query. However, method 2 appears to also require some Var translation in Path targetlists which contain this Proxy path; either that, or some global Var replacement would need to be done during setrefs.

I'm posting here in the hope that it will steer my development of this in a direction that's acceptable to the community. Perhaps there is also a "Method 3" which I've not thought about which would be much cleaner.

[1] https://www.postgresql.org/message-id/CAKJS1f9UXdk6ZYyqbJnjFO9a9hyHKGW7B%3DZRh-rxy9qxfPA5Gw%40mail.gmail.com

--
David Rowley   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
[HACKERS] A handful of typos in allpaths.c
A small patch to fix these is attached.

--
David Rowley   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

allpath_typos_fix.patch
Re: [HACKERS] Extended statistics is not working on Vars hidden under a RelabelType
On 15 October 2017 at 06:49, Robert Haas <robertmh...@gmail.com> wrote:
> On Fri, Oct 13, 2017 at 4:49 PM, David Rowley
> <david.row...@2ndquadrant.com> wrote:
>> tps = 8282.481310 (including connections establishing)
>> tps = 8282.750821 (excluding connections establishing)
>
> vs.
>
>> tps = 8520.822410 (including connections establishing)
>> tps = 8521.132784 (excluding connections establishing)
>>
>> With the patch we are making use of the extended statistics, which we
>> do expect to be more work for the planner. Although, we didn't add
>> extended statistics to speed up the planner.
>
> Sure, I understand. That's actually a pretty substantial regression -
> I guess that means that it's pretty important to avoid creating
> extended statistics that are not needed, at least for short-running
> queries.

To be honest, I ran that on a VM on my laptop. I was getting quite a bit of noise. I just posted that to show that the 12x slowdown didn't exist. I don't know what the actual slowdown is. I just know extended stats are not free and that nobody expected that they ever would be.

The good news is that they're off by default, and if the bad ever outweighs the good then the fix for that starts with "DROP STATISTICS".

I personally think it's great we're starting to see a useful feature materialise that can help with poor row estimates from the planner.

--
David Rowley   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Discussion on missing optimizations
On 12 October 2017 at 04:50, Robert Haas <robertmh...@gmail.com> wrote:
> We haven't really done a very good job figuring out what to do about
> optimizations that some people (mostly you) think are marginal. It's
> obviously true that we can't just burn infinite planner cycles on
> things that don't usually help, but at the same time, we can't just
> keep ignoring requests by users for the same features and saying
> "you'll be sad if we give this to you". Those people don't give up on
> wanting the optimization; they just don't use PostgreSQL. I think we
> need to stop just saying "no" and start thinking about what we could
> do that would let us say "yes".

I'm with Robert on this.

Planning time *is* important, but if we were to do a quick tally of the number of patches we've seen in the last few years that improve execution time, either by improving the executor code or by adding some smarts to the planner to reduce executor work, versus patches aimed at reducing planning time, I think we'd find the general focus is on the executor. Personally, I don't recall a single patch aimed at just speeding up the planner. Perhaps I missed some?

It looks like there's plenty we could do in there, just nobody seems interested enough to go and do it; everyone who cares about performance is too busy trying to make execution run faster.

--
David Rowley   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Extended statistics is not working on Vars hidden under a RelabelType
On 14 October 2017 at 09:04, Robert Haas <robertmh...@gmail.com> wrote:
> On Mon, Oct 9, 2017 at 11:03 PM, David Rowley
> <david.row...@2ndquadrant.com> wrote:
>> -- Unpatched
>> Planning time: 0.184 ms
>> Execution time: 105.878 ms
>>
>> -- Patched
>> Planning time: 2.175 ms
>> Execution time: 106.326 ms
>
> This might not be the best example to show the advantages of the
> patch, honestly.

The focus was on the row estimate. I try to highlight that by mentioning "Note rows=1 vs rows=98 in the Gather node.".

I can't imagine the test I added would have made the planner about 12 times slower, but just for the record:

create table ab (a varchar, b varchar);
insert into ab select (x%1000)::varchar, (x%1)::Varchar from generate_Series(1,100)x;
create statistics ab_a_b_stats (dependencies) on a,b from ab;
vacuum analyze ab;

$ cat a.sql
explain select * from ab where a = '1' and b = '1';

e9ef11ac8bb2acc2d2462fc17ec3291a959589e7 (Patched)

$ pgbench -f a.sql -T 60 -n
transaction type: a.sql
scaling factor: 1
query mode: simple
number of clients: 1
number of threads: 1
duration: 60 s
number of transactions actually processed: 496950
latency average = 0.121 ms
tps = 8282.481310 (including connections establishing)
tps = 8282.750821 (excluding connections establishing)

e9ef11ac8bb2acc2d2462fc17ec3291a959589e7~1 (Unpatched)

$ pgbench -f a.sql -T 60 -n
transaction type: a.sql
scaling factor: 1
query mode: simple
number of clients: 1
number of threads: 1
duration: 60 s
number of transactions actually processed: 511250
latency average = 0.117 ms
tps = 8520.822410 (including connections establishing)
tps = 8521.132784 (excluding connections establishing)

With the patch we are making use of the extended statistics, which we do expect to be more work for the planner. Although, we didn't add extended statistics to speed up the planner.

--
David Rowley   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Partition-wise aggregation/grouping
On 13 October 2017 at 19:36, Jeevan Chalke <jeevan.cha...@enterprisedb.com> wrote: > I have tried exactly same tests to get to this factor on my local developer > machine. And with parallelism enabled I got this number as 7.9. However, if > I disable the parallelism (and I believe David too disabled that), I get > this number as 1.8. Whereas for 1 rows, I get this number to 1.7 > > -- With Gather > # select current_Setting('cpu_tuple_cost')::float8 / ((10633.56 * (81.035 / > 72.450) - 10633.56) / 100); > 7.9 > > -- Without Gather > # select current_Setting('cpu_tuple_cost')::float8 / ((16925.01 * (172.838 / > 131.400) - 16925.01) / 100); > 1.8 > > -- With 1 rows (so no Gather too) > # select current_Setting('cpu_tuple_cost')::float8 / ((170.01 * (1.919 / > 1.424) - 170.01) / 1); > 1.7 > > So it is not so straight forward to come up the correct heuristic here. Thus > using 50% of cpu_tuple_cost look good to me here. > > As suggested by Ashutosh and Robert, attached separate small WIP patch for > it. Good to see it stays fairly consistent at different tuple counts, and is not too far away from what I got on this machine. I looked over the patch and saw this: @@ -1800,6 +1827,9 @@ cost_merge_append(Path *path, PlannerInfo *root, */ run_cost += cpu_operator_cost * tuples; + /* Add MergeAppend node overhead like we do it for the Append node */ + run_cost += cpu_tuple_cost * DEFAULT_APPEND_COST_FACTOR * tuples; + path->startup_cost = startup_cost + input_startup_cost; path->total_cost = startup_cost + run_cost + input_total_cost; } You're doing that right after a comment that says we don't do that. It also does look like the "run_cost += cpu_operator_cost * tuples" is trying to do the same thing, so perhaps it's worth just replacing that, which by default will double that additional cost, although doing so would have the planner slightly prefer a MergeAppend to an Append than previously. +#define DEFAULT_APPEND_COST_FACTOR 0.5 I don't really think the DEFAULT_APPEND_COST_FACTOR adds much. it means very little by itself. It also seems that most of the other cost functions just use the magic number. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
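For anyone wanting to see the two node types whose costing is being discussed, a minimal sketch (table names, partition bounds and row counts here are invented for illustration, not taken from the thread):

create table pt (a int, b int) partition by range (a);
create table pt1 partition of pt for values from (0) to (1000);
create table pt2 partition of pt for values from (1000) to (2000);
insert into pt select x % 2000, x from generate_series(1, 100000) x;
create index on pt1 (a);
create index on pt2 (a);
analyze pt;

-- an unordered scan of all partitions goes through an Append node
explain select * from pt;

-- an ordered scan with a small LIMIT is likely to use a MergeAppend over
-- per-partition index scans; its per-tuple overhead is what the patch hunk
-- above adds a cost for
explain select * from pt order by a limit 10;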
Re: [HACKERS] Aggregate transition state merging vs. hypothetical set functions
On 13 October 2017 at 12:41, Tom Lane <t...@sss.pgh.pa.us> wrote: > David Rowley <david.row...@2ndquadrant.com> writes: >> If the user defines their normal aggregate as not safe for merging, >> then surely it'll not be suitable to be used as a window function >> either, since the final function will also be called there multiple >> times per state. > > Yeah, we would probably also want to check the flag in nodeWindowAgg. > Not sure exactly how that should play out --- maybe we end up with > a tri-valued property "works as normal agg without merging, works > as normal agg with merging, works as window agg". But this would > arguably be an improvement over the current situation. Right now > I'm sure there are user-written aggs out there that will just crash > if used as a window agg, and the authors don't have much choice because > the performance costs of not modifying the transition state in the > finalfn are higher than they're willing to bear. At least with a > flag they could ensure that the case will fail cleanly. Hmm, maybe I'm lacking imagination here, but surely the final function is either destructive or it's not? I can't see what the difference is between nodeAgg.c calling the finalfn multiple times on the same state and nodeWindowAgg.c doing it. Maybe there's something I'm not accounting for that you are? -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
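A small, runnable illustration of the nodeWindowAgg.c case being discussed: when an aggregate is used as a window function, its final function is invoked once per output row against the same accumulating transition state, so a finalfn that scribbles on that state would break the later rows.

-- array_agg's final function runs for every output row here, each time over
-- the transition state accumulated so far
select x, array_agg(x) over (order by x) from generate_series(1, 3) x;

--  x | array_agg
-- ---+-----------
--  1 | {1}
--  2 | {1,2}
--  3 | {1,2,3}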
Re: [HACKERS] Aggregate transition state merging vs. hypothetical set functions
On 13 October 2017 at 12:08, Tom Lane <t...@sss.pgh.pa.us> wrote: > Therefore, I think we need to bite the bullet and provide an aggregate > property (CREATE AGGREGATE argument / pg_aggregate column) that tells > whether the aggregate supports transition state merging. Likely this > should have been in the state-merging patch to begin with, but better > late than never. > > The main thing that probably has to be hashed out before we can write > that patch is what the default should be for user-created aggregates. > I am inclined to think that we should err on the side of safety and > default it to false (no merge support). You could argue that the > lack of complaints since 9.6 came out is sufficient evidence that > defaulting to true would be all right, but I'm not sure. Are you considering that this is an option only for ordered-set aggregates or for all? If the user defines their normal aggregate as not safe for merging, then surely it'll not be suitable to be used as a window function either, since the final function will also be called there multiple times per state. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Extended statistics is not working on Vars hidden under a RelabelType
On 13 October 2017 at 04:56, Alvaro Herrera <alvhe...@alvh.no-ip.org> wrote: > I pushed your original fix. Thanks for committing -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Discussion on missing optimizations
On 13 October 2017 at 03:00, Tom Lane <t...@sss.pgh.pa.us> wrote: > Laurenz Albe <laurenz.a...@cybertec.at> writes: >> Robert Haas wrote: >>> One trick that some system use is avoid replanning as much as we do >>> by, for example, saving plans in a shared cache and reusing them even >>> in other sessions. That's hard to do in our architecture because the >>> controlling GUCs can be different in every session and there's not >>> even any explicit labeling of which GUCs control planner behavior. But >>> if you had it, then extra planning cycles would be, perhaps, more >>> tolerable. > >> From my experience with Oracle I would say that that is a can of worms. > > Yeah, I'm pretty suspicious of the idea too. We've had an awful lot of > bad experience with local plan caching, to the point where people wonder > why we don't just auto-replan every time. How would a shared cache > make that better? (Even assuming it was otherwise free, which it > surely won't be.) One idea I had when working on unique joins was that we could use it to determine if a plan could only return 0 or 1 row, by additionally testing the baserestrictinfo of each rel to see if an equality OpExpr against a Const matches a unique constraint, which would serve as a guarantee that only 0 or 1 row would match. I thought that these single-row plans could be useful as a start for some pro-active plan cache. Such plans should be stable, as they're not prone to any data skew that could cause a plan change. You might think that very few plans would fit the bill, but you'd likely find that the majority of UPDATE and DELETE queries run in an OLTP application would be eligible, and likely some decent percentage of SELECTs too. I had imagined this would be some backend-local cache that saved MRU plans up to some size of memory defined by a GUC, where 0 would disable the feature. I never got any further than those thoughts. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
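As a sketch of the kind of statement that would qualify under that idea (table and column names are invented): the equality against the primary key proves that at most one row can match, regardless of data skew, so the chosen plan is stable.

create table account (id int primary key, balance numeric not null);

-- "id = <const>" matches the unique (primary key) constraint exactly, so the
-- planner can prove each of these plans touches at most one row; such plans
-- would be candidates for a pro-active plan cache
select balance from account where id = 42;
update account set balance = balance - 100.00 where id = 42;
delete from account where id = 42;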
Re: [HACKERS] Extended statistics is not working on Vars hidden under a RelabelType
On 13 October 2017 at 02:17, Alvaro Herrera <alvhe...@alvh.no-ip.org> wrote: > I propose this slightly larger change. hmm, this is not right. You're not checking that there's a Var below the RelabelType. I tried with: explain select * from ab where (a||a)::varchar = '' and b = ''; and your code assumed the OpExpr was a Var. The reason Tomas coded it the way it was coded is due to the fact that there's already code that works exactly the same way in clauselist_selectivity(). Personally, I don't particularly like that code, but I'd rather not invent a new way to do the same thing. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
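To make the distinction concrete, a small sketch (the table mirrors the one used elsewhere in this thread): comparing a varchar column with a text literal wraps the Var in a RelabelType, which the fix needs to look through, whereas in the second query the RelabelType sits above an OpExpr, so there is no Var to match against the extended statistics.

create table ab (a varchar, b varchar);
create statistics ab_a_b_stats (dependencies) on a, b from ab;

-- RelabelType over a plain Var: the clauses can be matched to the extended
-- statistics on (a, b)
explain select * from ab where a = '1' and b = '1';

-- RelabelType over an OpExpr (a || a), not a Var: there is no Var to match,
-- so the stats must not be applied here
explain select * from ab where (a || a)::varchar = '1' and b = '1';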
Re: [HACKERS] Partition-wise aggregation/grouping
On 10 October 2017 at 17:57, Ashutosh Bapat <ashutosh.ba...@enterprisedb.com> wrote: > Append node just returns the result of ExecProcNode(). Charging > cpu_tuple_cost may make it too expensive. In other places where we > charge cpu_tuple_cost there's some processing done to the tuple like > ExecStoreTuple() in SeqNext(). May be we need some other measure for > Append's processing of the tuple. I don't think there's any need to invent any new GUC. You could just divide cpu_tuple_cost by something. I did a quick benchmark on my laptop to see how much Append really costs, and with the standard costs the actual cost seems to be about cpu_tuple_cost / 2.4. So probably cpu_tuple_cost / 2 might be realistic. create_set_projection_path() does something similar and brincostestimate() does some similar magic and applies 0.1 * cpu_operator_cost to the total cost. # create table p (a int, b int); # create table p1 () inherits (p); # insert into p1 select generate_series(1,100); # vacuum analyze p1; # \q $ echo "select count(*) from p1;" > p1.sql $ echo "select count(*) from p;" > p.sql $ pgbench -T 60 -f p1.sql -n latency average = 58.567 ms $ pgbench -T 60 -f p.sql -n latency average = 72.984 ms $ psql psql (11devel) Type "help" for help. # -- check the cost of the plan. # explain select count(*) from p1; QUERY PLAN -- Aggregate (cost=16925.00..16925.01 rows=1 width=8) -> Seq Scan on p1 (cost=0.00..14425.00 rows=100 width=0) (2 rows) # -- selecting from the parent is the same due to zero Append cost. # explain select count(*) from p; QUERY PLAN Aggregate (cost=16925.00..16925.01 rows=1 width=8) -> Append (cost=0.00..14425.00 rows=101 width=0) -> Seq Scan on p (cost=0.00..0.00 rows=1 width=0) -> Seq Scan on p1 (cost=0.00..14425.00 rows=100 width=0) (4 rows) # -- extrapolate the additional time taken for the Append scan and work out what the planner # -- should add to the plan's cost, then divide by the number of rows in p1 to work out the # -- tuple cost of pulling a row through the append. # select (16925.01 * (72.984 / 58.567) - 16925.01) / 100; ?column? 0.00416630302337493743 (1 row) # show cpu_tuple_cost; cpu_tuple_cost 0.01 (1 row) # -- How does that compare to the cpu_tuple_cost? # select current_Setting('cpu_tuple_cost')::float8 / 0.00416630302337493743; ?column? 2.400209476818 (1 row) Maybe it's worth trying with different row counts to see if the additional cost is consistent, but it's probably not worth being too critical here. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Extended statistics is not working on Vars hidden under a RelabelType
Basically, $subject is causing us not to properly find matching extended stats in this case. The attached patch fixes it. The following test cases is an example of the misbehaviour. Note rows=1 vs rows=98 in the Gather node. create table ab (a varchar, b varchar); insert into ab select (x%1000)::varchar, (x%1)::Varchar from generate_Series(1,100)x; create statistics ab_a_b_stats (dependencies) on a,b from ab; analyze ab; -- Unpatched explain analyze select * from ab where a = '1' and b = '1'; QUERY PLAN - Gather (cost=1000.00..12466.10 rows=1 width=7) (actual time=0.441..90.515 rows=100 loops=1) Workers Planned: 2 Workers Launched: 2 -> Parallel Seq Scan on ab (cost=0.00..11466.00 rows=1 width=7) (actual time=1.081..74.944 rows=33 loops=3) Filter: (((a)::text = '1'::text) AND ((b)::text = '1'::text)) Rows Removed by Filter: 00 Planning time: 0.184 ms Execution time: 105.878 ms (8 rows) -- Patched explain analyze select * from ab where a = '1' and b = '1'; QUERY PLAN -- Gather (cost=1000.00..12475.80 rows=98 width=7) (actual time=1.076..92.595 rows=100 loops=1) Workers Planned: 2 Workers Launched: 2 -> Parallel Seq Scan on ab (cost=0.00..11466.00 rows=41 width=7) (actual time=0.491..77.833 rows=33 loops=3) Filter: (((a)::text = '1'::text) AND ((b)::text = '1'::text)) Rows Removed by Filter: 00 Planning time: 2.175 ms Execution time: 106.326 ms (8 rows) -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services allow_relabelled_vars_in_dependency_stats.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Partition-wise aggregation/grouping
On 10 October 2017 at 01:10, Jeevan Chalke <jeevan.cha...@enterprisedb.com> wrote: > Attached new patch set having HEAD at 84ad4b0 with all these review points > fixed. Let me know if I missed any thanks. I've only really skimmed over this thread and only opened the code enough to extract the following: + /* Multiply the costs by partition_wise_agg_cost_factor. */ + apath->startup_cost *= partition_wise_agg_cost_factor; + apath->total_cost *= partition_wise_agg_cost_factor; I've not studied how all the path plumbing is done, but I think instead of doing this costing magic we should really stop pretending that Append/MergeAppend nodes are cost-free. I think something like charging cpu_tuple_cost per row expected through Append/MergeAppend would be a better approach to this. If you perform grouping or partial grouping before the Append, then in most cases the Append will receive less rows, so come out cheaper than if you perform the grouping after it. I've not learned the partition-wise join code enough to know if this is going to affect that too, but for everything else, there should be no plan change, since there's normally no alternative paths. I see there's even a comment in create_append_path() which claims the zero cost is a bit optimistic. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
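A sketch of why charging a per-tuple cost on Append naturally favours partition-wise grouping (all object names here are invented): if grouping or partial grouping happens below the Append, the Append only has to pass up one row per group per partition rather than every input row.

create table sales (region text, amount numeric) partition by list (region);
create table sales_north partition of sales for values in ('north');
create table sales_south partition of sales for values in ('south');

-- grouped above the Append: every sales row must flow through the Append node;
-- grouped per partition first: the Append only sees one row per group from
-- each partition, so a per-tuple Append cost makes that plan look cheaper
select region, sum(amount) from sales group by region;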
Re: [HACKERS] Discussion on missing optimizations
On 9 October 2017 at 17:41, David Rowley <david.row...@2ndquadrant.com> wrote: > Thoughts? Actually, I was a little inconsistent with my List NULL/NIL checks in that last one. I've attached an updated patch. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services remove_left_join_distinct_v2.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Discussion on missing optimizations
On 7 October 2017 at 14:48, Andres Freund <and...@anarazel.de> wrote: > 3. JOIN Elimination > > There's been a lot of discussion and several patches. There's a bunch of > problems here, one being that there's cases (during trigger firing, > before the constraint checks) where foreign keys don't hold true, so we > can't quite generally make these optimization. Another concern is > whether the added plan time is actually worthwhile. I looked over this and it seems there's some pretty low hanging fruit in there that we can add with just a handful of lines of new code. This is the case for LEFT JOINs with a DISTINCT clause. Normally we can only remove an unused LEFT JOINed relation if there are some unique properties that ensure that the join does not duplicate any outer row. We don't need to worry about this when there's a DISTINCT clause, as the DISTINCT would remove any duplicates anyway. If I'm not mistaken, we also don't need to bother looking at the actual distinct clause's exprs since we'll already know that nothing is in there regarding the relation being removed. The benefit to this could be two-fold, as 1) we don't need to join to the unused relation and 2) we don't need to remove any row duplication caused by the join. While someone might argue that this is not going to be that common a case to hit, if we consider how cheap this is to implement, it does seem to be worth doing a couple of NULL checks in the planner for it. The only slight downside I can see to this is that we're letting a few more cases through rel_supports_distinctness() which is also used for unique joins, and these proofs are not valid in those. However, it may not be worth the trouble doing anything about that as relations without unique indexes are pretty uncommon (at least in my world). Thoughts? -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services remove_left_join_distinct.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
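A concrete sketch of the case described above (names invented): without a unique index on the referenced column the LEFT JOIN cannot normally be removed, because it might duplicate outer rows, but with DISTINCT in the query those duplicates would be removed anyway, so the join contributes nothing.

create table orders (id int primary key, customer_id int not null);
create table order_notes (order_id int, note text);   -- no unique index

-- order_notes is otherwise unused in the query, and the DISTINCT would remove
-- any row duplication the join could introduce, so the LEFT JOIN can be dropped
select distinct o.id, o.customer_id
from orders o
left join order_notes n on n.order_id = o.id;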
Re: [HACKERS] Discussion on missing optimizations
On 7 October 2017 at 15:19, Tom Lane <t...@sss.pgh.pa.us> wrote: >> 9. Unneeded Self JOIN > >> Can't remember discussions of this. > > I can't get very excited about that one either. > > In the end, what the article fails to consider is that all of these are > tradeoffs, not unalloyed goods. If you spend planner cycles on every > query to look for cases that only the most unabashedly brain-dead ORMs > ever generate, you're not really doing your users a favor on balance. I think that final sentence lacks imagination. I've seen plenty of views being called where some column is unavailable, but the caller joins the very same table again on the primary key to add the column. There was no brain-dead ORM involved, just perhaps questionable design. This was very common in my last job where we had some rats nest of views several layers deep, the core of which often had some complex logic that nobody dared to try and replicate. It would be fairly cheap to check if any of the rtekind==RTE_RELATION joinlist items have above 1 RangeTblEntry with the same relid. The joinlist is never that big anyway, and if it was the join search would be slow. The more expensive part would be to build the join clauses, check if the expressions on either side of all OpExpr matches and that nothing else will cause a non-match, then perform the uniqueness check on those OpExpr operands. We do have some infrastructure to do the unique checks. Likely the slowdown in planning would be just for cases with a legitimately useful self-join, I doubt checking for a duplicate RangeTblEntry->relid would cause much of a concern. Anyway, this is giving me some feeling of Deja vu.. Perhaps we need some pg_stats view that shows us planning time and execution time so that we can get a better idea on how much these things matter in the average case. We tend to never fare so well in these sorts of comparisons with commercial databases. It's not hard to imagine someone with migration plans loading some rats nest of views into Postgres and taking it for a spin and finding performance is not quite what they need. It's a bit sad that often the people with the loudest voices are always so fast to stomp on the ideas for improvements. It would be much nicer if you'd at least wait for benchmarks before shooting. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
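An example of the view pattern being described, with invented names: the caller needs a column the view doesn't expose, so it joins the same table back in on its primary key; spotting the duplicate relid in the joinlist is what would let the planner collapse the self-join.

create table item (id int primary key, name text, price numeric);
create view item_basic as select id, name from item;

-- "price" isn't exposed by the view, so the caller re-joins item on its
-- primary key; the join is provably redundant
select b.id, b.name, i.price
from item_basic b
join item i on i.id = b.id;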
Re: [HACKERS] JIT compiling - v4.0
On 5 October 2017 at 19:57, Andres Freund <and...@anarazel.de> wrote: > Here's some numbers for a a TPC-H scale 5 run. Obviously the Q01 numbers > are pretty nice in partcular. But it's also visible that the shorter > query can loose, which is largely due to the JIT overhead - that can be > ameliorated to some degree, but JITing obviously isn't always going to > be a win. It's pretty exciting to see this being worked on. I've not looked at the code, but I'm thinking: could you not just JIT if the total cost of the plan is estimated to be greater than X, where X is some JIT threshold GUC? -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] path toward faster partition pruning
On 27 September 2017 at 14:22, Amit Langote <langote_amit...@lab.ntt.co.jp> wrote: > - 0001 includes refactoring that Dilip proposed upthread [1] (added him as > an author). I slightly tweaked his patch -- renamed the function > get_matching_clause to match_clauses_to_partkey, similar to > match_clauses_to_index. Hi Amit, I've made a pass over the 0001 patch while trying to get myself up to speed with the various developments that are going on in partitioning right now. These are just my thoughts from reading over the patch. I understand that there's quite a bit up in the air right now about how all this is going to work, but I have some thoughts about that too, which I wouldn't mind some feedback on as my line of thought may be off. Anyway, I will start with some small typos that I noticed while reading: partition.c: + * telling what kind of NullTest has been applies to the corresponding "applies" should be "applied" plancat.c: * might've occurred to satisfy the query. Rest of the fields are set in "Rest of the" should be "The remaining" Any onto more serious stuff: allpath.c: get_rel_partitions() I wonder if this function does not belong in partition.c. I'd have expected a function to exist per partition type, RANGE and LIST, I'd imagine should have their own function in partition.c to eliminate partitions which cannot possibly match, and return the remainder. I see people speaking of HASH partitioning, but we might one day also want something like RANDOM or ROUNDROBIN (I'm not really kidding, imagine routing records to be processed into foreign tables where you never need to query them again). It would be good if we could easily expand this list and not have to touch files all over the optimizer to do that. Of course, there would be other work to do in the executor to implement any new partitioning method too. I know there's mention of it somewhere about get_rel_partitions() not being so smart about WHERE partkey > 20 AND partkey > 10, the code does not understand what's more restrictive. I think you could probably follow the same rules here that are done in eval_const_expressions(). Over there I see that evaluate_function() will call anything that's not marked as volatile. I'd imagine, for each partition key, you'd want to store a Datum with the minimum and maximum possible value based on the quals processed. If either the minimum or maximum is still set to NULL, then it's unbounded at that end. If you encounter partkey = Const, then minimum and maximum can be set to the value of that Const. From there you can likely ignore any other quals for that partition key, as if the query did contain another qual with partkey = SomeOtherConst, then that would have become a gating qual during the constant folding process. This way if the user had written WHERE partkey >= 1 AND partkey <= 1 the evaluation would end up the same as if they'd written WHERE partkey = 1 as the minimum and maximum would be the same value in both cases, and when those two values are the same then it would mean just one theoretical binary search on a partition range to find the correct partition instead of two. I see in get_partitions_for_keys you've crafted the function to return something to identify which partitions need to be scanned. I think it would be nice to see a special element in the partition array for the NULL and DEFAULT partition. I imagine 0 and 1, but obviously, these would be defined constants somewhere. The signature of that function is not so pretty and that would likely tidy it up a bit. 
The matching partition indexes could then be returned as a Bitmapset. However, I don't really see any code which handles adding the NULL and DEFAULT partitions in get_rel_partitions() either; maybe I've just not looked hard enough yet... -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
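A sketch of the redundant-qual case mentioned in the review (partition names and bounds are invented): both quals constrain the same partition key, so folding them into a single minimum/maximum pair means only one binary search of the partition bounds is needed, and an equal minimum and maximum collapses to a plain equality lookup.

create table events (ts date, payload text) partition by range (ts);
create table events_2017q3 partition of events for values from ('2017-07-01') to ('2017-10-01');
create table events_2017q4 partition of events for values from ('2017-10-01') to ('2018-01-01');

-- "ts > '2017-10-15'" is the more restrictive qual; the other adds nothing
select * from events where ts > '2017-10-15' and ts > '2017-08-01';

-- a range whose minimum and maximum are the same value behaves like ts = '2017-11-01'
select * from events where ts >= '2017-11-01' and ts <= '2017-11-01';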
Re: [HACKERS] path toward faster partition pruning
On 25 September 2017 at 23:04, Amit Langote <langote_amit...@lab.ntt.co.jp> wrote: > By the way, I'm now rebasing these patches on top of [1] and will try to > merge your refactoring patch in some appropriate way. Will post more > tomorrow. > > [1] https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=9140cf826 Yeah, I see 0001 conflicts with that. I'm going to set this to waiting on author while you're busy rebasing this. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [GENERAL] Remove useless joins (VARCHAR vs TEXT)
On 17 September 2017 at 08:07, Kim Rose Carlsen <k...@hiper.dk> wrote: > It seems there are some difference in VARCHAR vs TEXT when postgres tries to > decide if a LEFT JOIN is useful or not. I can't figure out if this is > intentional because there are some difference between TEXT and VARCHAR that > I dont know about or if it's a bug. > > > I would expect both examples to produce same query plan > > > a) > > create table a (id varchar primary key); > create table b (id varchar primary key); > > explain select a.* > from a > left join (select distinct id from b) as b >on a.id = b.id; > > > QUERY PLAN > -- > Hash Right Join (cost=67.60..113.50 rows=1360 width=32) >Hash Cond: ((b.id)::text = (a.id)::text) >-> HashAggregate (cost=27.00..40.60 rows=1360 width=32) > Group Key: b.id > -> Seq Scan on b (cost=0.00..23.60 rows=1360 width=32) >-> Hash (cost=23.60..23.60 rows=1360 width=32) > -> Seq Scan on a (cost=0.00..23.60 rows=1360 width=32) > (7 rows) > > b) > > create table a (id text primary key); > > create table b (id text primary key); > > explain select a.* > from a > left join (select distinct id from b) as b >on a.id = b.id; > > QUERY PLAN > -- > Seq Scan on a (cost=0.00..23.60 rows=1360 width=32) Yeah, it looks like the code to check for distinctness in the subquery fails to consider that the join condition may contain RelabelTypes instead of plain Vars. The join would be removed if you'd written: explain select a.* from a left join b on a.id = b.id; so really the subquery version should be too. I'm undecided if this should be classed as a bug or just a missed optimisation. Certainly, the original code should have done this, so I'm leaning slightly towards this being a bug. The attached fixes. (CC'd -hackers since we're starting to discuss code changes. Further discussion which includes -hackers should drop the general list) -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services join_removal_subquery_fix.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] path toward faster partition pruning
On 21 August 2017 at 18:37, Amit Langote <langote_amit...@lab.ntt.co.jp> wrote: > I've been working on implementing a way to perform plan-time > partition-pruning that is hopefully faster than the current method of > using constraint exclusion to prune each of the potentially many > partitions one-by-one. It's not fully cooked yet though. I'm interested in seeing improvements in this area, so I've put my name down to review this. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Making clausesel.c Smarter
On 6 September 2017 at 00:43, Daniel Gustafsson <dan...@yesql.se> wrote: > This patch was moved to the currently open Commitfest. Given the above > comment, is the last patch in this thread still up for review, or are you > working on a new version? Just to avoid anyone reviewing an outdated version. Hi Daniel, I've not had time to work on a new version yet and I can't imagine that I will for quite some time. The idea I had for removing the quadratic search on the list was to extend or modify equal() in equalfuncs.c to have it return -1, 0 or +1, and use that as a comparison function in a binary search tree. The Btree would complement List as a way of storing Nodes in no particular order, but in a way that a Node can be found quickly. There are likely more than a handful of places in the planner that would benefit from this already. It's not a 5-minute patch and it probably would see some resistance, so I won't have time to look at this soon. If the possibility of this increasing planning time in corner cases is going to be a problem, then it might be best to return this with feedback for now and I'll resubmit if I get time later in the cycle. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] CONNECTION LIMIT and Parallel Query don't play well together
On 24 August 2017 at 11:15, Peter Eisentraut <peter.eisentr...@2ndquadrant.com> wrote: > Here is a slightly updated patch for consideration in the upcoming > commit fest. Hi Peter, I just had a quick glance over this and wondered about 2 things. 1. Why a GUC and not a new per user option so it can be configured differently for different users? Something like ALTER USER ... WORKER LIMIT ; perhaps. I mentioned about this up-thread a bit. 2. + if (count > max_worker_processes_per_user) + { + ereport(LOG, + (errcode(ERRCODE_CONFIGURATION_LIMIT_EXCEEDED), + errmsg("too many worker processes for role \"%s\"", + GetUserNameFromId(GetUserId(), false; + LWLockRelease(BackgroundWorkerLock); + return false; Unless I've misunderstood something, it seems that this is going to give random errors to users which might only occur when they run queries against larger tables. Part of why it made sense not to count workers towards the CONNECTION LIMIT was the fact that we didn't want to throw these random errors when workers could not be obtained when we take precautions in other places to just silently have fewer workers. There's lots of discussions earlier in this thread about this and I don't think anyone was in favour of queries randomly working sometimes. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Atomics for heap_parallelscan_nextpage()
On 17 August 2017 at 01:20, Heikki Linnakangas <hlinn...@iki.fi> wrote: > Looks reasonable. I edited the comments and the variable names a bit, to my > liking, and committed. Thanks! Thanks for committing. I've just been catching up on all that went on while I was sleeping. Thanks for handling the cleanup too. I'll feel better once pademelon goes green again. From looking at the latest failure on it, it appears that your swapping of pg_atomic_write_u64 for pg_atomic_init_u64 should fix this. My apologies for that mistake. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Fixup some misusage of appendStringInfo and friends
On 16 August 2017 at 15:38, Peter Eisentraut <peter.eisentr...@2ndquadrant.com> wrote: > committed Thanks! -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Regression in join selectivity estimations when using foreign keys
On 20 June 2017 at 07:49, Tom Lane <t...@sss.pgh.pa.us> wrote: > I'm not totally satisfied that there isn't any case where the smallest > selectivity hack is appropriate. In the example you're showing here, > the FK columns are independent so that we get more or less the right > answer with or without the FK. But in some quick testing I could not > produce a counterexample proving that that heuristic is helpful; > so for now let's can it. > > Thanks, and sorry again for the delay. Many thanks for taking the time on this. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Perfomance bug in v10
On 2 June 2017 at 03:46, Teodor Sigaev <teo...@sigaev.ru> wrote: > I miss here why could the presence of index influence on that? removing > index causes a good plan although it isn't used in both plans . Unique indexes are used as proofs when deciding if a join to the relation is "inner_unique". A nested loop unique join is costed more cheaply than a non-unique one since we can skip to the next outer tuple once we've matched the current outer tuple to an inner tuple. In theory that's half as many comparisons for a non-parameterised nested loop. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
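A minimal sketch of the proof being described (names invented): the unique index behind the primary key lets the planner mark the join as inner_unique, so a nested loop may stop probing the inner side as soon as the current outer row finds a match.

create table parent (id int primary key, info text);
create table child (parent_id int not null, val int);

-- parent.id is unique, so each child row can join to at most one parent row;
-- EXPLAIN (VERBOSE) should show the join flagged with "Inner Unique: true"
explain (verbose) select * from child c join parent p on p.id = c.parent_id;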
Re: [HACKERS] Perfomance bug in v10
On 1 June 2017 at 04:16, Teodor Sigaev <teo...@postgrespro.ru> wrote: > I found an example where v10 chooses extremely non-optimal plan: > select > i::int as a, > i::int + 1 as b, > 0 as c > into t > from > generate_series(1,32) as i; > > create unique index i on t (c, a); > > explain analyze > SELECT > t1.a, t1.b, > t2.a, t2.b, > t3.a, t3.b, > t4.a, t4.b, > t5.a, t5.b, > t6.a, t6.b > /* > , > t7.a, t7.b, > t8.a, t8.b, > t9.a, t9.b, > t10.a, t10.b > */ > FROM t T1 > LEFT OUTER JOIN t T2 > ON T1.b = T2.a AND T2.c = 0 > LEFT OUTER JOIN t T3 > ON T2.b = T3.a AND T3.c = 0 > LEFT OUTER JOIN t T4 > ON T3.b = T4.a AND T4.c = 0 > LEFT OUTER JOIN t T5 > ON T4.b = T5.a AND T5.c = 0 > LEFT OUTER JOIN t T6 > ON T5.b = T6.a AND T6.c = 0 > LEFT OUTER JOIN t T7 > ON T6.b = T7.a AND T7.c = 0 > LEFT OUTER JOIN t T8 > ON T7.b = T8.a AND T8.c = 0 > LEFT OUTER JOIN t T9 > ON T8.b = T9.a AND T9.c = 0 > LEFT OUTER JOIN t T10 > ON T9.b = T10.a AND T10.c = 0 > WHERE T1.c = 0 AND T1.a = 5 > ; That's pretty unfortunate. It only chooses this plan due to lack of any useful stats on the table. The planner thinks that a seqscan on t6 with Filter (c = 0) will return 1 row, which is not correct. In the good plan t1 is the outer rel of the inner most loop. Here the filter is c = 0 and a = 5, which *does* actually return only 1 row, which means that all of the other nested loops only execute once, as predicted. This is all caused by get_variable_numdistinct() deciding that all values are distinct because ntuples < DEFAULT_NUM_DISTINCT. I see that if the example is increased to use 300 tuples instead of 32, then that's enough for the planner to estimate 2 rows instead of clamping to 1, and the bad plan does not look so good anymore since the planner predicts that those nested loops need to be executed more than once. I really think the planner is too inclined to take risks by nesting Nested loops like this, but I'm not all that sure the best solution to fix this, and certainly not for beta1. So, I'm a bit unsure exactly how best to deal with this. It seems like we'd better make some effort, as perhaps this could be a case that might occur when temp tables are used and not ANALYZED, but the only way I can think to deal with it is not to favour unique inner nested loops in the costing model. The unfortunate thing about not doing this is that the planner will no longer swap the join order of a 2-way join to put the unique rel on the inner side. This is evident by the regression test failures caused by patching with the attached, which changes the cost model for nested loops back to what it was before unique joins. My other line of thought is just not to bother doing anything about this. There's plenty more queries you could handcraft to trick the planner into generating a plan that'll blow up like this. Is this a realistic enough one to bother accounting for? Did it come from a real world case? else, how did you stumble upon it? -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services dont_reduce_cost_of_inner_unique_nested_loops.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
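To illustrate the DEFAULT_NUM_DISTINCT effect described above, a variant of the same setup with 300 rows instead of 32 (as mentioned, roughly the point at which the clamping stops producing a one-row estimate); names with the 300 suffix are invented to keep it separate from the original example:

select i::int as a, i::int + 1 as b, 0 as c
into t300
from generate_series(1, 300) as i;

create unique index i300 on t300 (c, a);

-- with only 32 rows the planner assumed every value of c was distinct and
-- estimated "c = 0" as matching a single row; with 300 rows the estimate
-- comes out above one row, which is what stops the planner from nesting the
-- cheap-looking unique nested loops in the query above
explain select * from t300 where c = 0;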
Re: [HACKERS] Regression in join selectivity estimations when using foreign keys
On 22 May 2017 at 16:10, David Rowley <david.row...@2ndquadrant.com> wrote: > I also just noticed that I don't think I've got ANTI join cases > correct in the patch I sent. I'll look at that now. I've attached an updated patch. This one is much less invasive than my original attempt. There are two fundamental changes here: 1) OUTER joins now use the foreign key as proof that the join condition must match. 2) selectivity of nullfrac for null valued columns for OUTER joins is no longer taken into account. This is now consistent with INNER joins, which might not be perfect, but it's less surprising. If this is a problem then we can consider applying something like my 0002 patch above, however that can mean that nulls are double counted if there are any other strict clauses which are not part of the foreign key constraint, so that idea is not perfect either. In addition to those two things, the poor selectivity estimation in my original email is also fixed. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services fk_join_est_fix_2017-05-23.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Regression in join selectivity estimations when using foreign keys
On 21 May 2017 at 07:56, Tom Lane <t...@sss.pgh.pa.us> wrote: > I'm entirely unconvinced by this patch --- it seems to simply be throwing > away a lot of logic. Notably it lobotomizes the FK code altogether for > semi/antijoin cases, but you've not shown any example that even involves > such joins, so what's the argument for doing that? Also, the reason > we had the use_smallest_selectivity code in the first place was that we > didn't believe the FK-based selectivity could be applied safely to > outer-join cases, so simply deciding that it's OK to apply it after all > seems insufficiently justified. Originally I looked at just multiplying the selectivities in use_smallest_selectivity, but on further thought, I didn't manage to see why using foreign keys to assist with selectivity estimations when the ref_is_outer is true. We still have a very high probability that the outer rel contains a tuple to match each inner rel's tuples (because inner references outer). The only difference between OUTER and INNER typed joins is that OUTER will produce a bunch of NULL rows in place of non-matched inner rows. That part seems to be handled in calc_joinrel_size_estimate() by code such as: if (nrows < outer_rows) nrows = outer_rows; We could do much better than what we have today by also adding outer_rows - (n_distinct inner rows on referencing Vars), to also account for those unmatched rows. However, I'd say that's not for back-patching, although it may be especially good now that we have multi-variate ndistinct in PG10. > Or in short, exhibiting one counterexample to the existing code is not > a sufficient argument for changing things this much. You need to give > an argument why this is the right thing to do instead. > > Stepping back a bit, it seems like the core thing the FK selectivity code > was meant to do was to prevent us from underestimating selectivity of > multiple-clause join conditions through a false assumption of clause > independence. The problem with the use_smallest_selectivity code is that > it's overestimating selectivity, but that doesn't mean that we want to go > all the way back to the old way of doing things. I wonder if we could get > any traction in these dubious cases by computing the product, instead of > minimum, of the clause selectivities (thus matching the old estimate) and > then taking the greater of that and the FK-based selectivity. My concern with going back to selectivity multiplication was exactly because I didn't want to go back to the original undesired behaviour of underestimation when conditions are co-related. I'm unsure why taking the greater is any better than just using the foreign key selectivity. It seems senseless to use clause based selectivities at all when we've proved the foreign key exists -- there will be no unmatched inner tuples. The reason I dropped the non-singleton join for SEMI and ANTI is that I couldn't see how we could make any improvements over the original join estimation code for this case. Perhaps I'm assuming too much about how get_foreign_key_join_selectivity() is used by doing this? I'm assuming that leaving the clause list intact and referring the whole case to calc_joinrel_size_estimate() is okay to do. The reason I added the nullfrac code in 0002 was because I was concerned with regressing OUTER join cases where the nulfrac works due to using the clause_selectivity(). Although this case regressed between 9.5 and 9.6 for INNER joins with a non-zero nullfrac on the join condition. 
In short; if we replace the use_smallest_selectivity code with fkselec *= clauselist_selectivity(root, removedlist, 0, jointype, sjinfo); then I'm worried about regressing back to the old underestimations. The extended stats code in PG10's version of clauselist_selectivity() will ignore these join conditions, so nothing can be done to assist the underestmations by adding extended stats yet. I also just noticed that I don't think I've got ANTI join cases correct in the patch I sent. I'll look at that now. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Regression in join selectivity estimations when using foreign keys
On 18 May 2017 at 20:28, David Rowley <david.row...@2ndquadrant.com> wrote: > A vastly simplified example case is: > > create table fkest (a int, b int, c int unique, primary key(a,b)); > create table fkest1 (a int, b int, primary key(a,b)); > > insert into fkest select x/10,x%10, x from generate_Series(1,400) x; > insert into fkest1 select x/10,x%10 from generate_Series(1,400) x; > > alter table fkest1 add constraint fkest1_a_b_fkey foreign key (a,b) > references fkest; > > analyze fkest; > analyze fkest1; > > explain (costs on) select * from fkest f > left join fkest1 f1 on f.a = f1.a and f.b = f1.b > left join fkest1 f2 on f.a = f2.a and f.b = f2.b > left join fkest1 f3 on f.a = f3.a and f.b = f3.b > where f.c = 1; I should have shown the EXPLAIN ANALYZE of this instead. QUERY PLAN -- Hash Left Join (cost=24.15..41.89 rows=996 width=36) (actual time=0.430..0.463 rows=1 loops=1) Hash Cond: ((f.a = f3.a) AND (f.b = f3.b)) -> Hash Left Join (cost=12.15..28.36 rows=100 width=28) (actual time=0.255..0.288 rows=1 loops=1) Hash Cond: ((f.a = f2.a) AND (f.b = f2.b)) -> Nested Loop Left Join (cost=0.15..16.21 rows=10 width=20) (actual time=0.046..0.079 rows=1 loops=1) -> Seq Scan on fkest f (cost=0.00..8.00 rows=1 width=12) (actual time=0.013..0.045 rows=1 loops=1) Filter: (c = 1) Rows Removed by Filter: 399 -> Index Only Scan using fkest1_pkey on fkest1 f1 (cost=0.15..8.17 rows=1 width=8) (actual time=0.031..0.031 rows=1 loops=1) Index Cond: ((a = f.a) AND (b = f.b)) Heap Fetches: 1 -> Hash (cost=6.00..6.00 rows=400 width=8) (actual time=0.180..0.180 rows=400 loops=1) Buckets: 1024 Batches: 1 Memory Usage: 24kB -> Seq Scan on fkest1 f2 (cost=0.00..6.00 rows=400 width=8) (actual time=0.006..0.041 rows=400 loops=1) -> Hash (cost=6.00..6.00 rows=400 width=8) (actual time=0.162..0.162 rows=400 loops=1) Buckets: 1024 Batches: 1 Memory Usage: 24kB -> Seq Scan on fkest1 f3 (cost=0.00..6.00 rows=400 width=8) (actual time=0.010..0.040 rows=400 loops=1) Planning time: 0.409 ms Execution time: 0.513 ms (19 rows) which you can obviously see the poor estimate propagating to up the plan tree. 
If we add another left join the final estimate is even worse: explain analyze select * from fkest f left join fkest1 f1 on f.a = f1.a and f.b = f1.b left join fkest1 f2 on f.a = f2.a and f.b = f2.b left join fkest1 f3 on f.a = f3.a and f.b = f3.b left join fkest1 f4 on f.a = f4.a and f.b = f4.b where f.c = 1; QUERY PLAN Hash Left Join (cost=36.15..69.06 rows=9915 width=44) (actual time=0.535..0.569 rows=1 loops=1) Hash Cond: ((f.a = f4.a) AND (f.b = f4.b)) -> Hash Left Join (cost=24.15..41.89 rows=996 width=36) (actual time=0.371..0.404 rows=1 loops=1) Hash Cond: ((f.a = f3.a) AND (f.b = f3.b)) -> Hash Left Join (cost=12.15..28.36 rows=100 width=28) (actual time=0.208..0.241 rows=1 loops=1) Hash Cond: ((f.a = f2.a) AND (f.b = f2.b)) -> Nested Loop Left Join (cost=0.15..16.21 rows=10 width=20) (actual time=0.029..0.062 rows=1 loops=1) -> Seq Scan on fkest f (cost=0.00..8.00 rows=1 width=12) (actual time=0.014..0.047 rows=1 loops=1) Filter: (c = 1) Rows Removed by Filter: 399 -> Index Only Scan using fkest1_pkey on fkest1 f1 (cost=0.15..8.17 rows=1 width=8) (actual time=0.012..0.012 rows=1 loops=1) Index Cond: ((a = f.a) AND (b = f.b)) Heap Fetches: 1 -> Hash (cost=6.00..6.00 rows=400 width=8) (actual time=0.168..0.168 rows=400 loops=1) Buckets: 1024 Batches: 1 Memory Usage: 24kB -> Seq Scan on fkest1 f2 (cost=0.00..6.00 rows=400 width=8) (actual time=0.008..0.043 rows=400 loops=1) -> Hash (cost=6.00..6.00 rows=400 width=8) (actual time=0.156..0.156 rows=400 loops=1) Buckets: 1024 Batches: 1 Memory Usage: 24kB -> Seq Scan on fkest1 f3 (cost=0.00..6.00 rows=400 width=8) (actual time=0.006..0.035 rows=400 loops=1) -> Hash (cost=6.00..6.00 rows=400 width=8) (actual time=0.155..0.155 rows=400 loops=1) Buckets: 1024 Batches: 1 Memory Usage: 24kB -> Seq Scan on fkest1 f4 (cost=0.00..6.00 rows=400 width=8) (actual time=0.004..0.034 rows=400 loops=1) Planning time: 0.864 ms
[HACKERS] Regression in join selectivity estimations when using foreign keys
000 width=4) (actual time=0.336..0.336 rows=1000 loops=1) Buckets: 1024 Batches: 1 Memory Usage: 44kB -> Seq Scan on part p (cost=0.00..15.00 rows=1000 width=4) (actual time=0.014..0.132 rows=1000 loops=1) Planning time: 0.468 ms Execution time: 0.787 ms (8 rows) alter table sale drop constraint sale_partid_fkey; -- simulate pre-9.6 behaviour by dropping the fkey explain analyze select * from part p inner join sale s on p.partid = s.partid; -- normal join estimations. QUERY PLAN --- Hash Join (cost=27.50..55.12 rows=1 width=12) (actual time=0.546..0.546 rows=0 loops=1) Hash Cond: (s.partid = p.partid) -> Seq Scan on sale s (cost=0.00..15.00 rows=1000 width=8) (actual time=0.014..0.102 rows=1000 loops=1) -> Hash (cost=15.00..15.00 rows=1000 width=4) (actual time=0.387..0.387 rows=1000 loops=1) Buckets: 1024 Batches: 1 Memory Usage: 44kB -> Seq Scan on part p (cost=0.00..15.00 rows=1000 width=4) (actual time=0.009..0.110 rows=1000 loops=1) Planning time: 0.278 ms Execution time: 0.572 ms I've attached fixes, based on master, for both of these cases. Patches: 0001: Is a minimal fix for the repored regression, but may cause further regression as it does nothing to account for NULLs valued columns in outer joins, but at least it is consistent with how INNER joins are estimated regarding NULLs. 0002: Adds further code to apply the nullfrac from pg_statistics. I've given this a round of testing and I can't find any case where it gives worse estimates than the standard join estimation as seen when you drop the foreign key, but I'd like to challenge someone else to give it a go and see if they can Not because I'm super confident, more just because I want this to be correct. Thoughts? -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services 0001-Minimal-fix-for-foreign-key-join-estimations.patch Description: Binary data 0002-Apply-nullfrac-during-foreign-key-join-estimations.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Pulling up more complicated subqueries
On 18 May 2017 at 04:30, Robert Haas <robertmh...@gmail.com> wrote: > On Wed, May 17, 2017 at 11:08 AM, Heikki Linnakangas <hlinn...@iki.fi> wrote: >> That's not a straight semi-join, but we could still turn it into a new kind >> of LEFT-SEMI join. A left-semi join is like a left join, in that it returns >> all rows from the left side, and NULLs for any non-matches. And like a >> semi-join, it returns only one matching row from the right-side, if there >> are duplicates. In the qual, replace the SubLink with an IS NOT NULL test. > ... >> This can be implemented using yet another new join type, a LEFT-UNIQUE join. >> It's like a LEFT JOIN, but it must check that there are no duplicates in the >> right-hand-side, and throw an error if there are (ERROR: more than one row >> returned by a subquery used as an expression). > > It seems like we might want to split what is currently called JoinType > into two separate things -- one that is INNER/LEFT/RIGHT/FULL and the > other that says what to do about multiple matches, which could be that > they are expected, they are to be ignored (as in your LEFT-SEMI case), > or they should error out (as in your LEFT-UNIQUE case). I just wanted to mention that I almost got sucked down that hole with unique joins. Instead, I'd recommend following the pattern of passing bool flags down to the executor from the planner just like what is done for inner_unique in, say make_hashjoin() I did change unique joins at one point to overload the JoinType, but it was a much more scary thing to do as there's lots of code in the planner that does special things based on the join type, and the footprint of your patch and risk factor start to grow pretty rapidly once you do that. I mention something around this in [1]. [1] https://www.postgresql.org/message-id/CAKJS1f_jRki1PQ4X-9UGKa-wnBhECQLnrxCX5haQzu4SDR_r2Q%40mail.gmail.com -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] CTE inlining
On 13 May 2017 at 08:39, Bruce Momjian <br...@momjian.us> wrote: > To summarize, it seems we have two options if we want to add fence > control to CTEs: > > 1. add INLINE to disable the CTE fence > 2. add MATERIALIZE to enable the CTE fence I think #1 is out of the question. What would we do in cases like: WITH INLINE cte AS (SELECT random() a) SELECT * FROM cte UNION SELECT * FROM cte; I assume we won't want to inline when the CTE query contains a volatile function, and we certainly won't in cases like: WITH INLINE cte AS (DELETE FROM a RETURNING *) INSERT INTO b SELECT * from cte WHERE cte.value > 5; We'd be certain to receive complaints from disgruntled users about "Why is this not inlined when I specified INLINE?" #2 does not suffer from that. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Compiler warning in costsize.c
On 11 April 2017 at 12:53, Michael Paquier <michael.paqu...@gmail.com> wrote: > On Tue, Apr 11, 2017 at 4:02 AM, Robert Haas <robertmh...@gmail.com> wrote: >> On Mon, Apr 10, 2017 at 2:09 PM, Tom Lane <t...@sss.pgh.pa.us> wrote: >>> ilm...@ilmari.org (Dagfinn Ilmari =?utf-8?Q?Manns=C3=A5ker?=) writes: >>>> Why bother with the 'rte' variable at all if it's only used for the >>>> Assert()ing the rtekind? >>> >>> That was proposed a few messages back. I don't like it because it makes >>> these functions look different from the other scan-cost-estimation >>> functions, and we'd just have to undo the "optimization" if they ever >>> grow a need to reference the rte for another purpose. >> >> I think that's sort of silly, though. It's a trivial difference, >> neither likely to confuse anyone nor difficult to undo. > > +1. I would just do that and call it a day. There is no point to do a > mandatory list lookup as that's just for an assertion, and fixing this > warning does not seem worth the addition of fancier facilities. If the > function declarations were doubly-nested in the code, I would > personally consider the use of a variable, but not here. Any more thoughts on what is acceptable for fixing this? beta1 is looming and it seems a bit messy to be shipping that with these warnings, however harmless they are. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] modeling parallel contention (was: Parallel Append implementation)
On 6 May 2017 at 13:44, Thomas Munro <thomas.mu...@enterprisedb.com> wrote: > In Linux, each process that opens a file gets its own 'file' > object[1][5]. Each of those has it's own 'file_ra_state' > object[2][3], used by ondemand_readahead[4] for sequential read > detection. So I speculate that page-at-a-time parallel seq scan must > look like random access to Linux. > > In FreeBSD the situation looks similar. Each process that opens a > file gets a 'file' object[8] which has members 'f_seqcount' and > 'f_nextoff'[6]. These are used by the 'sequential_heuristics' > function[7] which affects the ioflag which UFS/FFS uses to control > read ahead (see ffs_read). So I speculate that page-at-a-time > parallel seq scan must look like random access to FreeBSD too. > > In both cases I suspect that if you'd inherited (or sent the file > descriptor to the other process via obscure tricks), it would actually > work because they'd have the same 'file' entry, but that's clearly not > workable for md.c. > Interesting! > Experimentation required... Indeed. I do remember long discussions on this before Parallel seq scan went in, but I don't recall if anyone checked any OS kernels to see what they did. We really need a machine with good IO concurrency, and not too much RAM to test these things out. It could well be that for a suitability large enough table we'd want to scan a whole 1GB extent per worker. I did post a patch to have heap_parallelscan_nextpage() use atomics instead of locking over in [1], but I think doing atomics there does not rule out also adding batching later. In fact, I think it structures things so batching would be easier than it is today. [1] https://www.postgresql.org/message-id/CAKJS1f9tgsPhqBcoPjv9_KUPZvTLCZ4jy=B=bhqgakn7cyz...@mail.gmail.com -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] modeling parallel contention (was: Parallel Append implementation)
On 5 May 2017 at 14:54, Thomas Munro <thomas.mu...@enterprisedb.com> wrote: > Just for fun, check out pages 42 and 43 of Wei Hong's thesis. He > worked on Berkeley POSTGRES parallel query and a spin-off called XPRS, > and they got linear seq scan scaling up to number of spindles: > > http://db.cs.berkeley.edu/papers/ERL-M93-28.pdf That's interesting. I'd no idea that work was done. Actually, I didn't really know that anyone had thought to have more than one processor back then :-) And I also now know the origins of the tenk1 table in the regression database. Those 10,000 rows were once used for benchmarking! I'm glad we're all using a couple more rows these days. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Atomics for heap_parallelscan_nextpage()
Hi,

A while back I did some benchmarking on a big 4 socket machine to explore a bit around the outer limits of parallel aggregates. I discovered along the way that, given enough workers and a simple enough task, seq-scan workers were held up waiting for the lock to be released in heap_parallelscan_nextpage(). I've since done a little work in this area to try to improve things. I ended up posting about it yesterday in [1].

My original patch used batching to solve the issue; instead of allocating 1 block at a time, the batching patch allocated a range of 10 blocks for the worker to process. However, the implementation still needed a bit of work around reporting sync-scan locations.

Andres mentioned in [2] that it might be worth exploring using atomics to do the same job. So I went ahead and did that, and came up with the attached, which is a slight variation on what he mentioned in the thread.

To keep things a bit simpler and more streamlined, I ended up pulling out the logic for setting the startblock into another function, which we only call once before the first call to heap_parallelscan_nextpage(). I also ended up changing phs_cblock and replacing it with a counter that always starts at zero. The actual block is calculated based on that counter plus the startblock, modulo nblocks. This makes things a good bit simpler for detecting when we've allocated all the blocks to the workers, and also works nicely when wrapping back to the start of a relation when we started somewhere in the middle due to piggybacking with a synchronous scan.

Performance: With parallel_workers=71, it looks something like:

Query 1: 881 GB, ~6 billion row TPC-H lineitem table.

tpch=# select count(*) from lineitem;
  count
 589709
(1 row)

-- Master
Time: 123421.283 ms (02:03.421)
Time: 118895.846 ms (01:58.896)
Time: 118632.546 ms (01:58.633)

-- Atomics patch
Time: 74038.813 ms (01:14.039)
Time: 73166.200 ms (01:13.166)
Time: 72492.338 ms (01:12.492)

-- Batching Patch: Batching 10 pages at a time in heap_parallelscan_nextpage()
Time: 76364.215 ms (01:16.364)
Time: 75808.900 ms (01:15.809)
Time: 74927.756 ms (01:14.928)

Query 2: Single int column table with 2 billion rows.

tpch=# select count(*) from a;
 count
 20
(1 row)

-- Master
Time: 5853.918 ms (00:05.854)
Time: 5925.633 ms (00:05.926)
Time: 5859.223 ms (00:05.859)

-- Atomics patch
Time: 5825.745 ms (00:05.826)
Time: 5849.139 ms (00:05.849)
Time: 5815.818 ms (00:05.816)

-- Batching Patch: Batching 10 pages at a time in heap_parallelscan_nextpage()
Time: 5789.237 ms (00:05.789)
Time: 5837.395 ms (00:05.837)
Time: 5821.492 ms (00:05.821)

I've also attached a text file with the perf report for the lineitem query. You'll notice that heap_parallelscan_nextpage() is very visible in master, but not in either of the two patches.

With the 2nd query, heap_parallelscan_nextpage() is fairly insignificant in master's profile; it's only showing up as 0.48%. Likely this must be due to more tuples being read from the page, and more aggregation work getting done before the next page is needed. I'm uncertain why I previously saw a speed up in this case in [1].

I've also noticed that both the atomics patch and unpatched master do something that looks a bit weird with synchronous seq-scans. If the parallel seq-scan piggybacked on another scan, then subsequent parallel scans will start at the same non-zero block location, even when no other concurrent scans exist.
I'd have expected this should go back to block 0 again, but maybe I'm just failing to understand the reason for reporting the startblock to ss_report_location() at the end of the scan.

I'll now add this to the first commitfest of pg11. I just wanted to note that I've done this, so that it's less likely someone else goes and repeats the same work.

[1] https://www.postgresql.org/message-id/CAKJS1f-XhfQ2-%3D85wgYo5b3WtEs%3Dys%3D2Rsq%3DNuvnmaV4ZsM1XQ%40mail.gmail.com
[2] https://www.postgresql.org/message-id/20170505023646.3uhnmf2hbwtm63lc%40alap3.anarazel.de

-- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services

-- Attached perf report: Unpatched, select count(*) from lineitem; 71 workers

Children  Self   Command   Shared Object       Symbol
+ 99.83%  0.00%  postgres  libpthread-2.17.so  [.] __restore_rt
+ 99.83%  0.00%  postgres  postgres            [.] sigusr1_handler
+ 99.83%  0.00%  postgres  postgres            [.] maybe_start_bgworkers
+ 99.83%  0.00%  postgres  postgres            [.] do_start_bgworker
+ 99.83%  0.93%  postgres  postgres            [.] ExecProcNode
+ 99.83%  0.00%  postgres  postgres            [.] StartBackgroundWorker
+ 99.83%  0.00%  postgres  postgres            [.] ParallelWorkerMain
+ 99.83%  0.00%  postgres  postgres            [.] ParallelQueryMain
+ 99.83%  0.00%  postgres  postgres            [.] ExecutorRun
+ 99.83%
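Purely to illustrate the allocation scheme described above (this is not the attached patch; it is a minimal sketch only, and it uses C11 atomics rather than PostgreSQL's pg_atomic_* API):

    #include <stdatomic.h>
    #include <stdint.h>

    typedef struct ParallelScanSketch
    {
        uint32_t    nblocks;        /* relation size in blocks */
        uint32_t    startblock;     /* picked once, e.g. from the sync-scan location */
        atomic_uint_fast64_t nallocated;    /* how many blocks have been handed out */
    } ParallelScanSketch;

    #define InvalidBlock UINT32_MAX

    static uint32_t
    scan_sketch_nextpage(ParallelScanSketch *pscan)
    {
        /* claim the next allocation index with a single atomic fetch-and-add */
        uint64_t    allocated = atomic_fetch_add(&pscan->nallocated, 1);

        if (allocated >= pscan->nblocks)
            return InvalidBlock;    /* every block has already been claimed */

        /* map the counter onto a physical block, wrapping past the end */
        return (uint32_t) ((pscan->startblock + allocated) % pscan->nblocks);
    }

The point of counting from zero rather than tracking the current block in shared memory is that a plain fetch-and-add never has to wrap the shared value itself, so no compare-and-swap loop is needed, and "have we handed out everything?" becomes a simple comparison against nblocks.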
Re: [HACKERS] CTE inlining
On 5 May 2017 at 14:04, Tom Lane <t...@sss.pgh.pa.us> wrote: > Craig Ringer <craig.rin...@2ndquadrant.com> writes: >> We're carefully maintaining this bizarre cognitive dissonance where we >> justify the need for using this as a planner hint at the same time as >> denying that we have a hint. That makes it hard to make progress here. >> I think there's fear that we're setting some kind of precedent by >> admitting what we already have. > > I think you're overstating the case. It's clear that there's a > significant subset of CTE functionality where there has to be an > optimization fence. The initial implementation basically took the > easy way out by deeming *all* CTEs to be optimization fences. Maybe > we shouldn't have documented that behavior, but we did. Now we're > arguing about how much of a compatibility break it'd be to change that > planner behavior. I don't see any particular cognitive dissonance here, > just disagreements about the extent to which backwards compatibility is > more important than better query optimization. How about we get the ball rolling on this in v10 and pull that part out of the docs. If anything that'll buy us a bit more wiggle room to change this in v11. I've attached a proposed patch. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services doc_caution_about_cte_changes_in_the_future.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] modeling parallel contention (was: Parallel Append implementation)
On 5 May 2017 at 14:36, Andres Freund <and...@anarazel.de> wrote:
> I wonder how much doing the atomic ops approach alone can help, that
> doesn't have the issue that the work might be unevenly distributed
> between pages.

I wondered that too, since I thought the barrier for making this change would be lower by doing it that way. I didn't manage to think of a way to get around wrapping the position back to 0 when synch-scans are involved. i.e.

parallel_scan->phs_cblock++;
if (parallel_scan->phs_cblock >= scan->rs_nblocks)
    parallel_scan->phs_cblock = 0;

-- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] modeling parallel contention (was: Parallel Append implementation)
On 3 May 2017 at 07:13, Robert Haas <robertmh...@gmail.com> wrote: > Multiple people (including David Rowley > as well as folks here at EnterpriseDB) have demonstrated that for > certain queries, we can actually use a lot more workers and everything > works great. The problem is that for other queries, using a lot of > workers works terribly. The planner doesn't know how to figure out > which it'll be - and honestly, I don't either. For me, it seems pretty much related to the number of tuples processed by a worker vs how many it returns. As a general rule, I'd say the higher this ratio, the higher the efficiency of the worker will be, although that's not taking into account contention points where workers must wait for fellow workers to complete some operation. I think parallel_tuple_cost is a good GUC to have; perhaps we can be smarter about the use of it when deciding on how many workers should be used. By efficiency, I mean that if a query takes 10 seconds in a normal serial plan, and with 1 worker added it takes 5 seconds, then it was 100% efficient to use that extra worker (a worked example is below). I charted this in [1]. It would have been interesting to chart the same for a query that returned a larger number of groups, but I ran out of time. I think it pretty much goes, without testing, that more groups == less efficiency, which will be due to more overhead in parallel tuple communication, and more work to do in the serial portion of the plan. [1] https://blog.2ndquadrant.com/parallel-monster-benchmark -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
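To spell out that efficiency calculation (the exact formula is implied rather than stated above, so treat this as one reasonable reading of the metric):

    efficiency = serial_time / (parallel_time * number_of_processes)
               = 10 s / (5 s * 2)        -- leader + 1 worker = 2 processes
               = 1.0, i.e. 100% efficient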
Re: [HACKERS] modeling parallel contention (was: Parallel Append implementation)
On 5 May 2017 at 13:37, Andres Freund <and...@anarazel.de> wrote: > On 2017-05-02 15:13:58 -0400, Robert Haas wrote: >> Multiple people (including David Rowley >> as well as folks here at EnterpriseDB) have demonstrated that for >> certain queries, we can actually use a lot more workers and everything >> works great. The problem is that for other queries, using a lot of >> workers works terribly. The planner doesn't know how to figure out >> which it'll be - and honestly, I don't either. > > Have those benchmarks, even in a very informal form, been shared / > collected / referenced centrally? I'd be very interested to know where > the different contention points are. Possibilities: I posted mine on [1], although the post does not go into much detail about the contention points. I only really briefly mention it at the end. [1] https://blog.2ndquadrant.com/parallel-monster-benchmark/#comment-248273 -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] modeling parallel contention (was: Parallel Append implementation)
On 3 May 2017 at 07:13, Robert Haas <robertmh...@gmail.com> wrote: > It is of course possible that the Parallel Seq Scan could run into > contention problems if the number of workers is large, but in my > experience there are bigger problems here. The non-parallel Seq Scan > can also contend -- not of course over the shared mutex because there > isn't one, but over access to the blocks themselves. Every one of > those blocks has a content lock and a buffer header and so on, and > having multiple processes accessing those things at the same time > scales well, but not perfectly. The Hash node can also contend: if > the hash join spills to disk, you've got multiple processes reading > and writing to the temp directory at the same time and, of course, > that can be worse than just one process doing it -- sometimes much > worse. It can also be better, depending on how much I/O gets > generated and how much I/O bandwidth you have.

Yeah, I did get some time to look over the contention in Parallel Seq Scan a while back, and I discovered that, on the machine I was testing on, the lock obtained in heap_parallelscan_nextpage() was causing workers to have to wait for other workers to fetch their next task to work on.

I ended up writing the attached (which I'd not intended to post until some time closer to when the doors open for PG11). At the moment it's basically just a test patch to see how it affects things when we give workers a bit more to do before they come back to look for more work. In this case, I've just given them 10 pages to work on, instead of the 1 that's allocated in 9.6 and v10.

A quick test on a pretty large table on a large machine shows:

Unpatched:
postgres=# select count(*) from a;
  count
 187400
(1 row)
Time: 5211.485 ms (00:05.211)

Patched:
postgres=# select count(*) from a;
  count
 187400
(1 row)
Time: 2523.983 ms (00:02.524)

So it seems worth looking into. "a" was just a table with a single int column. I'm unsure as yet if there are more gains to be had for tables with wider tuples. I do suspect the patch will be a bigger win in those cases, since there's less work to do for each page, e.g. fewer advance aggregate calls, so likely they'll be looking for their next page a bit sooner.

Now I'm not going to pretend that this patch is ready for prime time. I've not yet worked out how to properly report sync-scan locations without risking reporting later pages after reporting the end of the scan. What I have at the moment could cause a report to be missed if SYNC_SCAN_REPORT_INTERVAL is not divisible by the batch size. I'm also not sure how batching like this affects read-ahead, but at least the numbers above speak for something, although none of the pages in this case came from disk.

I'd had thoughts that the 10 pages wouldn't be constant; the batching size would depend on the size of the relation to be scanned. I had rough ideas to just try to make about 1 million batches. Something like:

batch_pages = Max(parallel_scan->phs_nblocks / 100, 1);

so that we only take more than 1 page if there's some decent amount to process. We don't want to make the batches too big as we might end up having to wait on slow workers at the end of a scan.

Anyway, I don't want to hijack this thread with discussions on this. I just wanted to mark that I plan to work on this in order to avoid any repeat developments or analysis. I'll probably start a new thread for this sometime nearer PG11's dev cycle. The patch is attached if in the meantime someone wants to run this on some big hardware.
-- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services parallel_nextpage_batching_v2.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
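To make the batching idea above a touch more concrete, here is a minimal sketch of handing out pages in batches. This is not the attached patch, only an illustration: it reuses the ParallelScanSketch struct and InvalidBlock marker from the earlier sketch in this archive, uses a plain atomic fetch-and-add for the claim, and ignores the sync-scan reporting problem mentioned above.

    #define BATCH_PAGES 10              /* fixed at 10, as in the test patch */

    typedef struct WorkerBatchSketch
    {
        uint64_t    next;               /* next allocation index to use locally */
        uint64_t    end;                /* one past the last index in our batch */
    } WorkerBatchSketch;                /* initialise both fields to 0 */

    static uint32_t
    batch_sketch_nextpage(ParallelScanSketch *pscan, WorkerBatchSketch *batch)
    {
        if (batch->next >= batch->end)
        {
            /* local batch used up, so claim another BATCH_PAGES blocks at once */
            uint64_t    first = atomic_fetch_add(&pscan->nallocated, BATCH_PAGES);

            if (first >= pscan->nblocks)
                return InvalidBlock;    /* all blocks have been claimed */
            batch->next = first;
            batch->end = first + BATCH_PAGES;
            if (batch->end > pscan->nblocks)
                batch->end = pscan->nblocks;    /* don't run off the end */
        }
        /* map the allocation index onto an actual block number */
        return (uint32_t) ((pscan->startblock + batch->next++) % pscan->nblocks);
    }

Each worker only touches the shared counter once per BATCH_PAGES blocks, which is the whole point: less traffic on the contended shared state, at the cost of possibly uneven work near the end of the scan.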
Re: [HACKERS] Should pg_current_wal_location() become pg_current_wal_lsn()
On 2 May 2017 at 00:10, David Rowley <david.row...@2ndquadrant.com> wrote: > On 20 April 2017 at 07:29, Euler Taveira <eu...@timbira.com.br> wrote: >> 2017-04-19 1:32 GMT-03:00 Michael Paquier <michael.paqu...@gmail.com>: >>> >>> I vote for "location" -> "lsn". I would expect complains about the >>> current inconsistency at some point, and the function names have been >>> already changed for this release.. > > OK, so I've created a draft patch which does this. I ended up adding this to the open items list. I feel it's best to be on there so that we don't forget about this. If we decide not to do it then we can just remove it from the list, but it would be a shame to release the beta having forgotten to make this change. Do any of the committers who voted for this change feel inclined to pick this patch up? -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Should pg_current_wal_location() become pg_current_wal_lsn()
On 20 April 2017 at 07:29, Euler Taveira <eu...@timbira.com.br> wrote: > 2017-04-19 1:32 GMT-03:00 Michael Paquier <michael.paqu...@gmail.com>: >> >> I vote for "location" -> "lsn". I would expect complains about the >> current inconsistency at some point, and the function names have been >> already changed for this release..

OK, so I've created a draft patch which does this. In summary of what it does:

1. Renames all wal_location functions to wal_lsn.
2. Renames all system view columns to use "lsn" instead of "location".
3. Renames function parameters to use "lsn" instead of "location".
4. Renames function parameters "wal_position" to "lsn". (Not mentioned before, but it seems consistency was high on the list from earlier comments on the thread.)
5. Changes documentation to reflect the above changes.
6. Fixes a bug where the docs claimed the return type of pg_logical_slot_peek_changes.location was text, when it was pg_lsn (maybe apply separately?).
7. Changes some places in func.sgml where it was referring to the lsn as a "position" rather than "location". (We want consistency here.)

Is this touching all places which were mentioned by everyone?

-- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services location2lsn_rename.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
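For anyone skimming, the flavour of the renaming is along these lines (illustrative only; the patch itself is the authoritative list, and lsn1/lsn2 are just placeholder arguments):

    -- before the patch                        -- after the patch
    SELECT pg_current_wal_location();          SELECT pg_current_wal_lsn();
    SELECT pg_wal_location_diff(lsn1, lsn2);   SELECT pg_wal_lsn_diff(lsn1, lsn2);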
Re: [HACKERS] convert EXSITS to inner join gotcha and bug
On 29 April 2017 at 15:39, Tom Lane <t...@sss.pgh.pa.us> wrote: > I'm kind of strongly tempted to apply the second patch; but it would > be fair to complain that reduce_unique_semijoins() is new development > and should wait for v11. Opinions? My vote is for the non-minimal patch. Of course, I'd be voting for the minimal patch if this were a minor version release fix, but we're not even in beta yet for v10. The original patch was intended to fix cases like this, although I'd failed to realise this particular case. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] convert EXSITS to inner join gotcha and bug
On 29 April 2017 at 02:26, Tom Lane <t...@sss.pgh.pa.us> wrote: > Alexander Korotkov <a.korot...@postgrespro.ru> writes: >> I've reproduced this bug on d981074c. >> On default config, after loading example.sql.bz2 and VACUUM ANALYZE, query >> result is OK. >> But with seqscan and hashjoin disabled, query returns 0 rows. > > Ah, thanks for the clue about enable_hashjoin, because it wasn't > reproducing for me as stated. > > It looks like in the case that's giving wrong answers, the mergejoin > is wrongly getting marked as "Inner Unique". Something's a bit too > cheesy about that planner logic --- not sure what, yet. Seems related to the unconditional setting of extra.inner_unique to true for JOIN_UNIQUE_INNER jointypes in add_paths_to_joinrel(). Setting this based on the return value of innerrel_is_unique(), as done with the other join types, seems to fix the issue. I don't know yet if that's the correct fix. It's pretty late 'round this side to be thinking too hard about it. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] convert EXSITS to inner join gotcha and bug
On 29 April 2017 at 00:45, Alexander Korotkov <a.korot...@postgrespro.ru> wrote: > On default config, after loading example.sql.bz2 and VACUUM ANALYZE, query > result is OK. Hi, Did you mean to attach this? -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Fixup some misusage of appendStringInfo and friends
On 27 April 2017 at 06:41, Peter Eisentraut <peter.eisentr...@2ndquadrant.com> wrote: > On 4/19/17 08:42, Ashutosh Bapat wrote: >> I reviewed the patch. It compiles clean, make check-world passes. I do >> not see any issue with it. > > Looks reasonable. Let's keep it for the next commit fest. Thank you to both of you for looking. I'd thought that maybe the new stuff in PG10 should be fixed before the release; if we wait and fix it in PG11, then backpatching becomes more of a pain. However, I wasn't careful in the patch to touch only new-to-PG10 code, so I'll defer to your better judgment and add it to the next 'fest. Thanks -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Foreign Join pushdowns not working properly for outer joins
On 27 April 2017 at 01:31, Peter Eisentraut <peter.eisentr...@2ndquadrant.com> wrote: > committed Great. Thanks! -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Foreign Join pushdowns not working properly for outer joins
On 26 April 2017 at 02:12, Peter Eisentraut <peter.eisentr...@2ndquadrant.com> wrote: > On 4/24/17 22:50, Peter Eisentraut wrote: >> On 4/14/17 00:24, Ashutosh Bapat wrote: >>> This looks better. Here are patches for master and 9.6. >>> Since join pushdown was supported in 9.6 the patch should be >>> backported to 9.6 as well. Attached is the patch (_96) for 9.6, >>> created by rebasing on 9.6 branch and removing conflict. _v6 is >>> applicable on master. >> >> Committed to PG10. I'll work on backpatching next. > > For backpatching to 9.6, I came up with the attached reduced version. > Since we don't have add_foreign_grouping_paths() in 9.6, we can omit the > refactoring and keep the changes much simpler. Does that make sense? That makes sense to me. It fixes the reported issue and is less invasive than the pg10 patch. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Incorrect use of ERRCODE_UNDEFINED_COLUMN in extended stats
On 25 April 2017 at 02:15, Tom Lane <t...@sss.pgh.pa.us> wrote: > David Rowley <david.row...@2ndquadrant.com> writes: >> The attached small patched fixes an incorrect usage of an error code >> in the extended stats code. > > Hmm, looks like all of that could do with an editorial pass (e.g., > "duplicity" does not mean what the comments author seems to think). Was about to look at this, but see you've beaten me to it. Thanks for committing and for fixing the other stuff too. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PG 10 release notes
On 25 April 2017 at 13:31, Bruce Momjian <br...@momjian.us> wrote: > The only unusual thing is that this release has ~180 items while most > recent release have had ~220. The pattern I see that there are more > large features in this release than previous ones.

Thanks for drafting this up. I understand that it may have been filtered out, but I'd say that 7e534adcdc70 is worth a mention. Users creating BRIN indexes previously would have had to know beforehand that the table would be sufficiently correlated on the indexed columns for the index to be worthwhile, whereas now there's a lesser need for the user to know this beforehand.

Also:

New commands are CREATE, ALTER, and DROP STATISTICS. This is helpful in estimating query memory usage and ... HOW IS RATIO USED?

HOW IS RATIO USED? There are two types of new stats: ndistinct, which can improve row estimates in the query planner when estimating the number of distinct value groups over multiple columns; and functionaldeps, which provides the planner with a better understanding of the functional dependencies between the columns on which the statistics are defined. The planner takes the functional dependency degree into account when multiplying selectivity estimates for multiple columns. Unsure how best to trim that down to something short enough for the release notes.

-- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
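If a concrete example helps for the notes, the sort of thing these statistics do is along these lines (illustrative only; the table and columns are made up, and the exact CREATE STATISTICS syntax and kind names were still being discussed on other threads at the time, so don't treat this as the final form):

    CREATE TABLE orders (region int, zip int);

    -- "ndistinct" improves the estimate of how many distinct (region, zip)
    -- groups exist, e.g. for GROUP BY region, zip.
    -- The functional-dependency kind records that zip effectively determines
    -- region, so WHERE region = ? AND zip = ? is not costed as two
    -- independent filters multiplied together.
    CREATE STATISTICS orders_stats (ndistinct, dependencies) ON region, zip FROM orders;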
[HACKERS] Incorrect use of ERRCODE_UNDEFINED_COLUMN in extended stats
The attached small patched fixes an incorrect usage of an error code in the extended stats code. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services ext_stats_duplicate_column_errorcode_fix.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WITH clause in CREATE STATISTICS
On 22 April 2017 at 21:30, Simon Riggs <si...@2ndquadrant.com> wrote: > CREATE STATISTICS s1 USING (dependencies, ndistinct) ON (a, b) FROM t1 > WHERE partial-stuff; +1 Seems much more CREATE INDEX-like, and that's pretty good given that most of the complaints so far were about it not bearing enough resemblance to CREATE INDEX. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WITH clause in CREATE STATISTICS
On 21 April 2017 at 22:30, Tels <nospam-pg-ab...@bloodgate.com> wrote: > I'd rather see: > > CREATE STATISTICS stats_name ON table(col); > > as this both mirrors CREATE INDEX and foreign keys with REFERENCES. It > could also be extended to both more columns, expressions or other tables > like so: > > CREATE STATISTICS stats ON t1(col1, col2 / 2), t2 (a,b); > > and even: > > CREATE STATISTICS stats ON t1(col1, col2 / 2), t2 (a,b) WITH (options) > WHERE t2.a > 4; How would you express a join condition with that syntax? > This looks easy to remember, since it compares to: > > CREATE INDEX idx_name ON t2(a,b) WITH (options) WHERE t2.a > 4; > > Or am I'm missing something? Sadly yes, you are, and it's not the first time. I seem to remember mentioning this to you already in [1]. Please, can you read over [2], it mentions exactly what you're proposing and why it's not any good. [1] https://www.postgresql.org/message-id/cakjs1f9hmet+7adiceau8heompob5pkkcvyzliezje3dvut...@mail.gmail.com [2] https://www.postgresql.org/message-id/CAEZATCUtGR+U5+QTwjHhe9rLG2nguEysHQ5NaqcK=vbj78v...@mail.gmail.com -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Why is get_cheapest_parallel_safe_total_inner() in pathkeys.c?
This probably ended up here because there's a bunch of other functions named get_cheapest* in that file, but all of those relate to getting a path for specific PathKeys. get_cheapest_parallel_safe_total_inner() does not do that. Maybe allpaths.c is a better home for it? It seems to have been added in [1] [1] https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=a71f10189dc10a2fe422158a2c9409e0f77c6b9e -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Fixup some misusage of appendStringInfo and friends
The attached cleans up a few small misusages of appendStringInfo and related functions. Some similar work was done in f92d6a540ac443f85f0929b284edff67da14687a, d02f16470f117db3038dbfd87662d5f0eb5a2a9b and cacbdd78106526d7c4f11f90b538f96ba8696fb0, so the majority of these should all be new misusages. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services appendStringInfo_fixes.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Should pg_current_wal_location() become pg_current_wal_lsn()
On 19 April 2017 at 15:31, Tom Lane <t...@sss.pgh.pa.us> wrote: > David Rowley <david.row...@2ndquadrant.com> writes: >> OK, so I've read over this thread again and I think it's time to >> summarise the votes: >> ... >> In favour of "location" -> "lsn": Stephen, David Steele >> In favour of "lsn" -> "location": Peter, Tom, Kyotaro > > FWIW, I was not voting in favor of "location"; I was just saying that > I wanted consistency. If we're voting which way to move, please count > me as a vote for "lsn".

Updated votes:

In favour of "location" -> "lsn": Tom, Stephen, David Steele
In favour of "lsn" -> "location": Peter, Kyotaro

-- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Should pg_current_wal_location() become pg_current_wal_lsn()
On 19 April 2017 at 03:33, David Steele <da...@pgmasters.net> wrote: > +1, and I'm in favor of using "lsn" wherever applicable. It seems to me > that if a user calls a function (or queries a table) that returns an lsn > then they should be aware of what an lsn is.

OK, so I've read over this thread again and I think it's time to summarise the votes. It seems that:

Robert mentions he'd be inclined not to tinker with this, but mentions nothing of inconsistency.
Bruce mentions he'd like consistency but does not mention which he'd prefer.
Stephen wants consistency and votes to change "location" to "lsn" in regards to a pg_lsn.
Peter would rather see use of "location", and mentions changing the new-in-v10 stuff, but not the existing 9.6 usages of lsn in column names.
Tom would not object to use of "location" over "lsn".
Kyotaro would rather see the use of "location" for all apart from the pg_lsn operator functions. Unsure how we handle pg_wal_location_diff().
David Steele would rather see this changed to use "lsn" instead of "location".

So in summary, nobody apart from Robert appeared to have any concerns over breaking backward compatibility.

In favour of "location" -> "lsn": Stephen, David Steele
In favour of "lsn" -> "location": Peter, Tom, Kyotaro

I left Bruce out since he just voted for consistency. So "lsn" -> "location" seems to be winning. Is that enough to proceed? Anyone else?

The patch to do this should likely also include a regression test to ensure nothing new creeps in which breaks the new standard.

-- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] extended stats not friendly towards ANALYZE with subset of columns
On 18 April 2017 at 05:12, Alvaro Herrera <alvhe...@2ndquadrant.com> wrote: > Pushed, after tweaking so that a WARNING is still emitted, because it's > likely to be useful in pointing out a procedural mistake while extended > stats are being tested. Thanks for pushing. Seems you maintained most of my original patch and added to it a bit, but credits don't seem to reflect that. It's not the end of the world but just wanted to note that. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Allowing extended stats on foreign and partitioned tables
On 18 April 2017 at 09:01, Alvaro Herrera <alvhe...@2ndquadrant.com> wrote: > David Rowley wrote: >> While reviewing extended stats I noticed that it was possible to >> create extended stats on many object types, including sequences. I >> mentioned that this should be disallowed. Statistics were then changed >> to be only allowed on plain tables and materialized views. >> >> This should be relaxed again to allow anything ANALYZE is allowed on. >> >> The attached does this. > > The error message was inconsistent in the case of indexes, because of > using heap_open instead of relation_open. That seemed gratuitous, so I > flipped it, added test for the whole thing and pushed. Thanks for changing that and pushing this. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Foreign Join pushdowns not working properly for outer joins
On 13 April 2017 at 11:22, Peter Eisentraut <peter.eisentr...@2ndquadrant.com> wrote: > Is this patch considered ready for review as a backpatch candidate? Yes, however, the v5 patch is based on master. The v4 patch should apply to 9.6. Diffing the two patches I see another tiny change to a comment, of which I think needs re-worded anyway. + * This function should usually set FDW options in fpinfo after the join is + * deemed safe to push down to save some CPU cycles. But We need server + * specific options like extensions to decide push-down safety. For + * checking extension shippability, we need foreign server as well. + */ This might be better written as: Ordinarily, we might be tempted into delaying the merging of the FDW options until we've deemed the foreign join to be ok. However, we must do this before performing this test so that we know which quals can be evaluated on the foreign server. This may depend on the shippable_extensions. Apart from that, it all looks fine to me. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] pg_statistic_ext.staenabled might not be the best column name
I'd been thinking that staenabled is not the most suitable column name for storing the types of statistics that are defined for the extended statistics. For me, this indicates that something can be disabled, but there's no syntax for that, and even if there was, if we were to enable/disable the kinds, we'd likely want to keep tabs on which kinds were originally defined, otherwise it's more of an ADD and DROP than an ENABLE/DISABLE. "stakind" seems like a better name. I'd have personally gone with "statype" but pg_statistic already thinks stakind is better. A patch which changes this is attached -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services ext_stats_rename_staenabled.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Foreign Join pushdowns not working properly for outer joins
On 12 April 2017 at 21:45, Ashutosh Bapat <ashutosh.ba...@enterprisedb.com> wrote: > On Wed, Apr 12, 2017 at 12:18 PM, David Rowley > <david.row...@2ndquadrant.com> wrote: >> On 10 March 2017 at 17:33, Ashutosh Bapat >> <ashutosh.ba...@enterprisedb.com> wrote: >>> Yes and I also forgot to update the function prologue to refer to the >>> fpinfo_o/i instead of inner and outer relations. Attached patch >>> corrects it. >> >> Hi Ashutosh, >> >> This seems to have conflicted with 28b04787. Do you want to rebase, or >> should I? > > Here you go. Looks like the wrong patch. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Foreign Join pushdowns not working properly for outer joins
On 10 March 2017 at 17:33, Ashutosh Bapat <ashutosh.ba...@enterprisedb.com> wrote: > Yes and I also forgot to update the function prologue to refer to the > fpinfo_o/i instead of inner and outer relations. Attached patch > corrects it. Hi Ashutosh, This seems to have conflicted with 28b04787. Do you want to rebase, or should I? -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] pg_stats_ext view does not seem all that useful
During my review and time spent working on the functional dependencies part of extended statistics I wondered what was the use for the pg_stats_ext view. I was unsure why the length of the serialised dependencies was useful. Perhaps we could improve the view, but I'm not all that sure what value it would add. Maybe we need to discuss this, but in the meantime, I've attached a patch which just removes the view completely -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services drop_pg_stats_ext_view.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Allowing extended stats on foreign and partitioned tables
While reviewing extended stats I noticed that it was possible to create extended stats on many object types, including sequences. I mentioned that this should be disallowed. Statistics were then changed to be only allowed on plain tables and materialized views. This should be relaxed again to allow anything ANALYZE is allowed on. The attached does this. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services ext_stats_on_analyzable_objects.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Should pg_current_wal_location() become pg_current_wal_lsn()
... and of course the other functions matching *wal*location*

My thoughts here are that we're already breaking backward compatibility of these functions for PG10, so I thought we might want to use this as an opportunity to fix the naming a bit more. I feel that the "location" word is not the best choice. We also have a function called pg_tablespace_location() to give us the path that a tablespace is stored in, so I could understand anyone who's confused about what pg_current_wal_location() might do if they're looking at the function name only, and not the docs. For me, "lsn" suits these function names much better, so I'd like to see what others think. It would be sad to miss this opportunity without at least discussing this first.

The functions in question are:

postgres=# \dfS *wal*location*
                                 List of functions
   Schema   |              Name              | Result data type | Argument data types |  Type
------------+--------------------------------+------------------+---------------------+--------
 pg_catalog | pg_current_wal_flush_location  | pg_lsn           |                     | normal
 pg_catalog | pg_current_wal_insert_location | pg_lsn           |                     | normal
 pg_catalog | pg_current_wal_location        | pg_lsn           |                     | normal
 pg_catalog | pg_last_wal_receive_location   | pg_lsn           |                     | normal
 pg_catalog | pg_last_wal_replay_location    | pg_lsn           |                     | normal
 pg_catalog | pg_wal_location_diff           | numeric          | pg_lsn, pg_lsn      | normal
(6 rows)

Opinions?

-- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Compiler warning in costsize.c
On 8 April 2017 at 04:42, Tom Lane <t...@sss.pgh.pa.us> wrote:
> I'd be happier with something along the line of
>
>     RangeTblEntry *rte;
>     ListCell   *lc;
>
>     /* Should only be applied to base relations that are subqueries */
>     Assert(rel->relid > 0);
>     rte = planner_rt_fetch(rel->relid, root);
> #ifdef USE_ASSERT_CHECKING
>     Assert(rte->rtekind == RTE_SUBQUERY);
> #else
>     (void) rte;     /* silence unreferenced-variable complaints */
> #endif

the (void) rte; would not be required to silence MSVC here. Of course, PG_USED_FOR_ASSERTS_ONLY would be required to stop some smarter compiler from complaining.

> assuming that that actually does silence the warning on MSVC.
>
> BTW, is it really true that only these two places produce such warnings
> on MSVC? I see about three dozen uses of PG_USED_FOR_ASSERTS_ONLY in our
> tree, and I'd have expected all of those places to be causing warnings on
> a compiler that doesn't have a way to understand that annotation.

Seems that MSVC is happy once the variable is assigned, and does not bother checking whether the variable is used after being assigned, whereas some other compilers might see the variable as uselessly assigned. At the moment there are no other warnings from MSVC, since in all the other places the variable gets assigned a value in some code path.

-- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
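For anyone unfamiliar with the annotation being discussed, a minimal sketch of how PG_USED_FOR_ASSERTS_ONLY is normally applied (variable names are borrowed from the snippet above, purely for illustration):

    RangeTblEntry *rte PG_USED_FOR_ASSERTS_ONLY;

    /* Should only be applied to base relations that are subqueries */
    Assert(rel->relid > 0);
    rte = planner_rt_fetch(rel->relid, root);
    Assert(rte->rtekind == RTE_SUBQUERY);

On compilers which understand the unused attribute, the macro stops the variable being reported as set-but-not-used in non-assert builds, where the Assert() expands to nothing; MSVC doesn't understand the annotation, but as noted above it also doesn't warn once the variable has been assigned.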
Re: [HACKERS] Making clausesel.c Smarter
On 8 April 2017 at 09:33, Claudio Freire <klaussfre...@gmail.com> wrote: > Otherwise, the patch LGTM, but I'd like to solve the quadratic > behavior too... are you going to try? Otherwise I could take a stab at > it myself. It doesn't seem very difficult. I have some ideas for solving this in a fairly generic way, which I'd like to take care of in PG11. Many thanks for your reviews and suggestions on this patch; it's much appreciated. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Performance improvement for joins where outer side is unique
On 8 April 2017 at 14:23, Tom Lane <t...@sss.pgh.pa.us> wrote: > David Rowley <david.row...@2ndquadrant.com> writes: > [ unique_joins_2017-04-07b.patch ] > > It turned out that this patch wasn't as close to committable as I'd > thought, but after a full day of whacking at it, I got to a place > where I thought it was OK. So, pushed. Many thanks for taking the time to do this, and committing too! -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Making clausesel.c Smarter
On 4 April 2017 at 13:35, Claudio Freire <klaussfre...@gmail.com> wrote: > On Mon, Apr 3, 2017 at 9:19 PM, Claudio Freire <klaussfre...@gmail.com> wrote: > If you ask me, I'd just leave: > > + if (rqlist->hibound == DEFAULT_INEQ_SEL || rqlist->lobound == > DEFAULT_INEQ_SEL) > + { > + /* No point in checking null selectivity, DEFAULT_INEQ_SEL > implies we have no stats */ > + s2 = DEFAULT_RANGE_INEQ_SEL; > + } > + else > + { > ... > + s2 = rqlist->hibound + rqlist->lobound - 1.0; > + nullsel = nulltestsel(root, IS_NULL, rqlist->var, varRelid); > + notnullsel = 1.0 - nullsel; > + > + /* Adjust for double-exclusion of NULLs */ > + s2 += nullsel; > + if (s2 <= 0.0) { > + if (s2 <= (-1.0e-4 * notnullsel - 1.0e-6)) > + { > + /* Most likely contradictory clauses found */ > + s2 = (1.0e-10 < notnullsel) ? 1.0e-10 : notnullsel; > + } > + else > + { > + /* Could be a rounding error */ > + s2 = DEFAULT_RANGE_INEQ_SEL * notnullsel; > + } > + } > + } > > Where (-1.0e-4 * notnullsel - 1.0e-6) is just a very rough (and overly > cautious) estimation of the amount of rounding error that could be > there with 32-bit floats. > > The above assumes a non-DEFAULT_INEQ_SEL value in lobound/hibound is > an estimation based on stats, maybe approximate, but one that makes > sense (ie: preserves the monotonicity of the CPF), and as such > negative values are either sign of a contradiction or rounding error. > The previous code left the possibility of negatives coming out of > default selectivities creeping up on non-DEFAULT_INEQ_SEL estimates, > but that doesn't seem possible coming out of scalarineqsel. I notice this does change the estimates for contradictory clauses such as: create table a (value int); insert into a select x/100 from generate_Series(1,1) x; analyze a; explain analyze select * from a where value between 101 and -1; We now get: QUERY PLAN - Seq Scan on a (cost=0.00..195.00 rows=1 width=4) (actual time=1.904..1.904 rows=0 loops=1) Filter: ((value >= 101) AND (value <= '-1'::integer)) Rows Removed by Filter: 1 Planning time: 0.671 ms Execution time: 1.950 ms (5 rows) where before we'd get: QUERY PLAN -- Seq Scan on a (cost=0.00..195.00 rows=50 width=4) (actual time=0.903..0.903 rows=0 loops=1) Filter: ((value >= 101) AND (value <= '-1'::integer)) Rows Removed by Filter: 1 Planning time: 0.162 ms Execution time: 0.925 ms (5 rows) Which comes from the 1 * 0.005 selectivity estimate from tuples * DEFAULT_RANGE_INEQ_SEL I've attached a patch to this effect. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services smarter_clausesel_2017-04-07a.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
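To spell out the arithmetic behind the rows=1 versus rows=50 difference shown above (this is just my reading of the logic in the quoted patch hunk, using the constants mentioned in the message):

    For "value BETWEEN 101 AND -1", with statistics available:
        lobound = selectivity of (value >= 101)  ~= 0.0   (no sampled rows qualify)
        hibound = selectivity of (value <= -1)   ~= 0.0   (no sampled rows qualify)
        s2      = hibound + lobound - 1.0        ~= -1.0

    -1.0 is far below the small negative rounding-error allowance, so the pair
    of clauses is treated as contradictory and s2 is clamped to a tiny value
    (1.0e-10, capped by the not-null fraction), which is where the new rows=1
    estimate comes from. Previously the negative result fell back to
    DEFAULT_RANGE_INEQ_SEL (0.005), and 0.005 times the table's row count is
    where the old rows=50 estimate came from.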
Re: [HACKERS] Time to change pg_regress diffs to unified by default?
On 7 April 2017 at 10:31, Andres Freund <and...@anarazel.de> wrote: > Hi, > > I personally, and I know of a bunch of other regular contributors, find > context diffs very hard to read. Besides general dislike, for things > like regression test output context diffs are just not well suited. > E.g. in > https://buildfarm.postgresql.org/cgi-bin/show_stage_log.pl?nm=prairiedog=2017-04-06%2021%3A10%3A56=check > the salient point (ERROR: 50 is outside the valid range for parameter > "effective_io_concurrency" (0 .. 0)) > is 130 lines into the diff, whereas it's right at the start in a unified diff. > Issues with one error that causes a lot of followup output changes are > really common in our regression suite. > > I personally have PG_REGRESS_DIFF_OPTS set to -dU10, but that doesn't > help much analyzing buildfarm output. > > Therefore I propose changing the defaults in pg_regress.c. You mean people actually use those diffs? I've never done anything apart from using . That way I can reject and accept the changes as I wish, just by kicking the results over to the expected results, or not, if there's a genuine mistake. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Performance improvement for joins where outer side is unique
On 7 April 2017 at 13:41, Tom Lane <t...@sss.pgh.pa.us> wrote: > David Rowley <david.row...@2ndquadrant.com> writes: >> On 7 April 2017 at 11:47, Tom Lane <t...@sss.pgh.pa.us> wrote: >>> What I'm on about is that you can't do the early advance to the >>> next outer tuple unless you're sure that all the quals that were >>> relevant to the uniqueness proof have been checked for the current >>> inner tuple. That affects all three join types not only merge. > >> Well, look at the join code and you'll see this only happens after the >> joinqual is evaulated. I didn't make a special effort here. I just >> borrowed the location that JOIN_SEMI was already using. > > Right, and that's exactly the point: some of the conditions you're > depending on might have ended up in the otherqual not the joinqual. > > We'd discussed rearranging the executor logic enough to deal with > such situations and agreed that it seemed too messy; but that means > that the optimization needs to take care not to use otherqual > (ie pushed-down) conditions in the uniqueness proofs. > >> Oh yeah. I get it, but that's why we ignore !can_join clauses > > can_join seems to me to be not particularly relevant ... there's > nothing that prevents that from getting set for pushed-down clauses. > > It's possible that the case I'm worried about is unreachable in > practice because all the conditions that could be of interest would? > be strict and therefore would have forced join strength reduction. > But I'm not comfortable with assuming that. Okay, well how about we protect against that by not using such quals as unique proofs? We'd just need to ignore anything that's outerjoin_delayed? If we're struggling to think of a case that this will affect, then we shouldn't be too worried about any missed optimisations. A patch which does this is attached. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services unique_joins_2017-04-07b.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Performance improvement for joins where outer side is unique
On 7 April 2017 at 11:47, Tom Lane <t...@sss.pgh.pa.us> wrote: > David Rowley <david.row...@2ndquadrant.com> writes: >> On 7 April 2017 at 07:26, Tom Lane <t...@sss.pgh.pa.us> wrote: >>> I'm looking through this, and I'm failing to see where it deals with >>> the problem we discussed last time, namely that you can't apply the >>> optimization unless all clauses that were used in the uniqueness >>> proof are included in the join's merge/hash conditions + joinquals. > >> The code in question is: >> mergestate->mj_SkipMarkRestore = !mergestate->js.joinqual && >> mergestate->js.first_inner_tuple_only; > > Uh, AFAICS that only protects the skip-mark-and-restore logic. > What I'm on about is that you can't do the early advance to the > next outer tuple unless you're sure that all the quals that were > relevant to the uniqueness proof have been checked for the current > inner tuple. That affects all three join types not only merge.

Well, look at the join code and you'll see this only happens after the joinqual is evaluated. I didn't make a special effort here. I just borrowed the location that JOIN_SEMI was already using.

For example, from hash join:

if (joinqual == NULL || ExecQual(joinqual, econtext))
{
    node->hj_MatchedOuter = true;
    HeapTupleHeaderSetMatch(HJTUPLE_MINTUPLE(node->hj_CurTuple));

    /* In an antijoin, we never return a matched tuple */
    if (node->js.jointype == JOIN_ANTI)
    {
        node->hj_JoinState = HJ_NEED_NEW_OUTER;
        continue;
    }

    /*
     * Skip to the next outer tuple if we only need to join to
     * the first inner matching tuple.
     */
    if (node->js.first_inner_tuple_only)
        node->hj_JoinState = HJ_NEED_NEW_OUTER;

Note the first line and the final two lines.

Here's the one from nested loop:

if (ExecQual(joinqual, econtext))
{
    node->nl_MatchedOuter = true;

    /* In an antijoin, we never return a matched tuple */
    if (node->js.jointype == JOIN_ANTI)
    {
        node->nl_NeedNewOuter = true;
        continue;       /* return to top of loop */
    }

    /*
     * Skip to the next outer tuple if we only need to join to the
     * first inner matching tuple.
     */
    if (node->js.first_inner_tuple_only)
        node->nl_NeedNewOuter = true;

Again, note the first line and final 2 lines.

> The case that would be relevant to this is, eg,
>
> create table t1 (f1 int, f2 int, primary key(f1,f2));
>
> select * from t_outer left join t1 on (t_outer.f1 = t1.f1)
> where t_outer.f2 = t2.f2;

Hmm, that query is not valid, unless you have created some table named t_outer. I don't know how you've defined that.
So I guess you must have meant:

postgres=# explain verbose select * from t1 t_outer left join t1 on (t_outer.f1 = t1.f1) where t_outer.f2 = t1.f2;
                                 QUERY PLAN
----------------------------------------------------------------------------
 Hash Join  (cost=66.50..133.57 rows=128 width=16)
   Output: t_outer.f1, t_outer.f2, t1.f1, t1.f2
   Inner Unique: Yes
   Hash Cond: ((t_outer.f1 = t1.f1) AND (t_outer.f2 = t1.f2))
   ->  Seq Scan on public.t1 t_outer  (cost=0.00..32.60 rows=2260 width=8)
         Output: t_outer.f1, t_outer.f2
   ->  Hash  (cost=32.60..32.60 rows=2260 width=8)
         Output: t1.f1, t1.f2
         ->  Seq Scan on public.t1  (cost=0.00..32.60 rows=2260 width=8)
               Output: t1.f1, t1.f2
(10 rows)

Which did become an INNER JOIN due to the strict WHERE clause.

If you'd done:

postgres=# explain verbose select * from t1 t_outer left join t1 on (t_outer.f1 = t1.f1) where t_outer.f2 = t1.f2 or t1.f1 is null;
                                            QUERY PLAN
--------------------------------------------------------------------------------------------------
 Merge Left Join  (cost=0.31..608.67 rows=255 width=16)
   Output: t_outer.f1, t_outer.f2, t1.f1, t1.f2
   Inner Unique: No
   Merge Cond: (t_outer.f1 = t1.f1)
   Filter: ((t_outer.f2 = t1.f2) OR (t1.f1 IS NULL))
   ->  Index Only Scan using t1_pkey on public.t1 t_outer  (cost=0.16..78.06 rows=2260 width=8)
         Output: t_outer.f1, t_outer.f2
   ->  Materialize  (cost=0.16..83.71 rows=2260 width=8)
         Output: t1.f1, t1.f2
         ->  Index Only Scan using t1_pkey on public.t1  (cost=0.16..78.06 rows=2260 width=8)
               Output: t1.f1, t1.f2
(11 rows)

You'll notice that it shows "Inner Unique: No".

> Your existing patch would think t1 is unique-inner, but the qual pushed
> down from WHERE would not be a joinqual so the wrong thing would happen
> at runtime.
>
> (Hm ... actually, this example wouldn't fail as written because
> the WHERE qual is probably strict, so the left join would get
> reduced to an inner join and then pushed-down-ness no longer
> matters. But hopefully you get my drift.)

Oh yeah. I get it, but that's why we ignore !can_join clauses:

/* Ignore if it's not a mergejoinable clause */
if (!restrictinfo->can_join