Re: Generating partitioning tuple conversion maps faster
On Mon, Jun 17, 2019 at 12:03:03PM +0200, didier wrote: > cherry-pick apply cleanly and with a 100 columns table it improves > performance nicely (20%). 42f70cd is a performance improvement, and not a bug fix. -- Michael signature.asc Description: PGP signature
Re: Generating partitioning tuple conversion maps faster
Hi, Any chance for a backport to PG 11? cherry-pick apply cleanly and with a 100 columns table it improves performance nicely (20%). Regards Didier On Sat, Jul 14, 2018 at 1:25 AM David Rowley wrote: > > On 14 July 2018 at 04:57, Heikki Linnakangas wrote: > > Pushed, thanks! > > Thanks for pushing, and thanks again for reviewing it, Alexander. > > -- > David Rowley http://www.2ndQuadrant.com/ > PostgreSQL Development, 24x7 Support, Training & Services >
Re: Generating partitioning tuple conversion maps faster
On 14 July 2018 at 04:57, Heikki Linnakangas wrote: > Pushed, thanks! Thanks for pushing, and thanks again for reviewing it, Alexander. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: Generating partitioning tuple conversion maps faster
On 09/07/18 22:58, David Rowley wrote: On 9 July 2018 at 23:28, Alexander Kuzmenkov wrote: On 07/09/2018 10:13 AM, David Rowley wrote: I've attached v5. v5 looks good to me, I've changed the status to ready. Many thanks for reviewing this. Pushed, thanks! - Heikki
Re: Generating partitioning tuple conversion maps faster
On 07/09/2018 10:13 AM, David Rowley wrote: I've attached v5. v5 looks good to me, I've changed the status to ready. Please feel free to add yourself as an author of this patch in the commitfest app. You've probably contributed about as much as I have to this. Thanks, I'm fine with being credited as a reviewer. -- Alexander Kuzmenkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: Generating partitioning tuple conversion maps faster
On 7 July 2018 at 01:08, Alexander Kuzmenkov wrote: > Great. I think we can use the same approach for make_inh_translation_list, > as in the attached patch. It show some improvement on test 6. I got the > following tps, median of 11 runs (forgot to turn off fsync though): > > test masterv3 v4 > 1 414 416 408 > 2 259 409 404 > 3 263 400 405 > 4 417 416 404 > 5 118 311 305 > 6 85 280 303 Nice. I think that's a good idea. Although, I didn't really like the use of the comma operator you added. I think since TupleDescAttr can't be NULL we can just do: (att = TupleDescAttr(new_tupdesc, new_attno))->attisdropped There's no shortage of other places that do TupleDescAttr(...)-> so I think that's perfectly fine. I also did a slight rewording of the comment above that new code. I've attached v5. Please feel free to add yourself as an author of this patch in the commitfest app. You've probably contributed about as much as I have to this. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services v5-0001-Improve-performance-of-tuple-conversion-map-gener.patch Description: Binary data
Re: Generating partitioning tuple conversion maps faster
On 07/05/2018 02:52 PM, David Rowley wrote: On 30 June 2018 at 05:40, David Rowley wrote: I think your idea to reduce the loops in test 6 from 2000 down to 1001 should be worth it. I'll try the idea out next week. The attached changes things to use your way of picking up the search for the next column at the column after the last match was found. Great. I think we can use the same approach for make_inh_translation_list, as in the attached patch. It show some improvement on test 6. I got the following tps, median of 11 runs (forgot to turn off fsync though): test master v3 v4 1 414 416 408 2 259 409 404 3 263 400 405 4 417 416 404 5 118 311 305 6 85 280 303 -- Alexander Kuzmenkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company >From a21da8b6e875977eff7b313e37975b39b390a03e Mon Sep 17 00:00:00 2001 From: "dgrow...@gmail.com" Date: Mon, 25 Jun 2018 16:31:47 +1200 Subject: [PATCH] Improve performance of tuple conversion map generation Previously convert_tuples_by_name_map naively performed a search of each outdesc column starting at the first column in indesc and searched each indesc column until a match was found. When partitioned tables had many columns this could result in slow generation of the tuple conversion maps. For INSERT and UPDATE statements that touched few rows, this could mean a very large overhead indeed. We can do a bit better with this loop. It's quite likely that the columns in partitioned tables and their partitions are in the same order, so it makes sense to start searching for each column outer column at the inner column position 1 after where the previous match was found (per idea from Alexander Kuzmenkov). This makes the best case search O(N) instead of O(N^2). The worst case is still O(N^2), but it seems unlikely that would happen. --- src/backend/access/common/tupconvert.c | 37 ++--- src/backend/optimizer/prep/prepunion.c | 38 -- 2 files changed, 47 insertions(+), 28 deletions(-) diff --git a/src/backend/access/common/tupconvert.c b/src/backend/access/common/tupconvert.c index 2d0d2f4..11038c6 100644 --- a/src/backend/access/common/tupconvert.c +++ b/src/backend/access/common/tupconvert.c @@ -295,12 +295,16 @@ convert_tuples_by_name_map(TupleDesc indesc, const char *msg) { AttrNumber *attrMap; - int n; + int outnatts; + int innatts; int i; + int nextindesc = -1; - n = outdesc->natts; - attrMap = (AttrNumber *) palloc0(n * sizeof(AttrNumber)); - for (i = 0; i < n; i++) + outnatts = outdesc->natts; + innatts = indesc->natts; + + attrMap = (AttrNumber *) palloc0(outnatts * sizeof(AttrNumber)); + for (i = 0; i < outnatts; i++) { Form_pg_attribute outatt = TupleDescAttr(outdesc, i); char *attname; @@ -313,10 +317,28 @@ convert_tuples_by_name_map(TupleDesc indesc, attname = NameStr(outatt->attname); atttypid = outatt->atttypid; atttypmod = outatt->atttypmod; - for (j = 0; j < indesc->natts; j++) + + /* + * Now search for an attribute with the same name in the indesc. It + * seems likely that a partitioned table will have the attributes in + * the same order as the partition, so the search below is optimized + * for that case. It is possible that columns are dropped in one of + * the relations, but not the other, so we use the 'nextindesc' + * counter to track the starting point of the search. If the inner + * loop encounters dropped columns then it will have to skip over + * them, but it should leave 'nextindesc' at the correct position for + * the next outer loop. + */ + for (j = 0; j < innatts; j++) { - Form_pg_attribute inatt = TupleDescAttr(indesc, j); + Form_pg_attribute inatt; + + nextindesc++; + if (nextindesc >= innatts) +nextindesc = 0; + + inatt = TupleDescAttr(indesc, nextindesc); if (inatt->attisdropped) continue; if (strcmp(attname, NameStr(inatt->attname)) == 0) @@ -330,7 +352,7 @@ convert_tuples_by_name_map(TupleDesc indesc, attname, format_type_be(outdesc->tdtypeid), format_type_be(indesc->tdtypeid; -attrMap[i] = (AttrNumber) (j + 1); +attrMap[i] = inatt->attnum; break; } } @@ -343,7 +365,6 @@ convert_tuples_by_name_map(TupleDesc indesc, format_type_be(outdesc->tdtypeid), format_type_be(indesc->tdtypeid; } - return attrMap; } diff --git a/src/backend/optimizer/prep/prepunion.c b/src/backend/optimizer/prep/prepunion.c index 2d47024..7d7517b 100644 --- a/src/backend/optimizer/prep/prepunion.c +++ b/src/backend/optimizer/prep/prepunion.c @@ -51,6 +51,7 @@ #include "utils/lsyscache.h" #include "utils/rel.h" #include "utils/selfuncs.h" +#include "utils/syscache.h" typedef struct @@ -1896,9 +1897,11 @@ make_inh_translation_list(Relation oldrelation, Relation newrelation, List
Re: Generating partitioning tuple conversion maps faster
On 30 June 2018 at 05:40, David Rowley wrote: > I think your idea > to reduce the loops in test 6 from 2000 down to 1001 should be worth > it. I'll try the idea out next week. The attached changes things to use your way of picking up the search for the next column at the column after the last match was found. I don't think performance will have changed much from the last version, but here are the performance results (in tps) from my laptop this time. fsync=off. Test Unpatched Patched Gain Test 1 873.837 912.981 104.48% Test 2 213.004 895.268 420.31% Test 3 224.810 887.099 394.60% Test 4 1177.533 1107.767 94.08% Test 5 97.707 402.258 411.70% Test 6 72.025 360.702 500.80% Tests are all the same as the tests done in the initial email on this thread. Tests 1 and 4 are non-partitioned tests. Any variance there should just be noise. No code was changed in either of those tests. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services v3-0001-Improve-performance-of-tuple-conversion-map-gener.patch Description: Binary data
Re: Generating partitioning tuple conversion maps faster
On 30 June 2018 at 00:22, Alexander Kuzmenkov wrote: > On 06/29/2018 03:25 AM, David Rowley wrote: >> I've attached a patch that uses SearchSysCacheAttName to speed up >> these translations in the planner. > > Good idea. On my desktop, this gives 270 tps dropped vs 610 tps plain (for > updates). If you combine it with persistent inner loop index, it's probably > going to be even faster, because it will only require one catalog access for > each index shift. Now it looks like it goes to catalog for every column > after the dropped one. Thanks for testing. > What about convert_tuples_by_name_map, do you plan to switch it to catalog > lookups as well? Syscache? I didn't really see an obvious way to get the relids down to the function. e.g the call through ExecEvalConvertRowtype() -> convert_tuples_by_name() does not have a Relation to work with, just a TupleDesc. I think further work might be diminishing returns. I think your idea to reduce the loops in test 6 from 2000 down to 1001 should be worth it. I'll try the idea out next week. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: Generating partitioning tuple conversion maps faster
On 06/29/2018 03:25 AM, David Rowley wrote: I've attached a patch that uses SearchSysCacheAttName to speed up these translations in the planner. Good idea. On my desktop, this gives 270 tps dropped vs 610 tps plain (for updates). If you combine it with persistent inner loop index, it's probably going to be even faster, because it will only require one catalog access for each index shift. Now it looks like it goes to catalog for every column after the dropped one. What about convert_tuples_by_name_map, do you plan to switch it to catalog lookups as well? -- Alexander Kuzmenkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: Generating partitioning tuple conversion maps faster
On 29 June 2018 at 11:10, David Rowley wrote: > On further inspection, the slowdown is coming from the planner in > make_inh_translation_list(). The INSERT path does not hit that since > the planner's job is pretty simple for simple INSERTs. I've attached a patch that uses SearchSysCacheAttName to speed up these translations in the planner. This puts test 6 more at the level I'd have expected. Here are fresh benchmarks results taken with the attached, again on AWS m5d instance, though probably not the same one as before (fsync=off). Unpatched: Test 1 tps = 922.479156 (excluding connections establishing) Test 2 tps = 334.701555 (excluding connections establishing) Test 3 tps = 327.547316 (excluding connections establishing) Test 4 tps = 1198.924131 (excluding connections establishing) Test 5 tps = 125.130723 (excluding connections establishing) Test 6 tps = 81.511072 (excluding connections establishing) Patched Test 1 tps = 918.105382 (excluding connections establishing) Test 2 tps = 913.315387 (excluding connections establishing) Test 3 tps = 893.578988 (excluding connections establishing) Test 4 tps = 1213.238177 (excluding connections establishing) Test 5 tps = 459.022550 (excluding connections establishing) Test 6 tps = 416.835747 (excluding connections establishing) -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services v2_speedup_building_tuple_conversion_maps.patch Description: Binary data
Re: Generating partitioning tuple conversion maps faster
On 29 June 2018 at 02:23, Alexander Kuzmenkov wrote: > I think I found one possible reason for this slowness. Your patch behaves as > expected when there is a dropped output column, but does one extra > comparison when there is a dropped input column. This backwards conversion > is called from ExecInitRoutingInfo. To fix this, I'd just keep a persistent > inner loop counter (see the attached patch). It's just 2000 comparisons vs 1000. > Still, fixing this doesn't improve the performance. According to perf > report, updatepd.sql spends 25% of time in strcmp, and updatep.sql about > 0.5%, but they are comparing the same column names, even with the same > alignment and relative offsets. I'm completely puzzled by this. On further inspection, the slowdown is coming from the planner in make_inh_translation_list(). The INSERT path does not hit that since the planner's job is pretty simple for simple INSERTs. > As a side thought, we wouldn't have to go through this if we had a hash > table that is easy to use, or perhaps string interning in catcache. Maybe it's better to try the direct lookup and fall back on SearchSysCacheAttName() if the same attnum does not have the same name. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: Generating partitioning tuple conversion maps faster
On 06/25/2018 08:00 AM, David Rowley wrote: I'd have expected Test 6 to do about 480-500tps. Adding debug to check why it's not revealed that the search is doing what's expected. I'm unsure why it has not improved more. I think I found one possible reason for this slowness. Your patch behaves as expected when there is a dropped output column, but does one extra comparison when there is a dropped input column. This backwards conversion is called from ExecInitRoutingInfo. To fix this, I'd just keep a persistent inner loop counter (see the attached patch). Still, fixing this doesn't improve the performance. According to perf report, updatepd.sql spends 25% of time in strcmp, and updatep.sql about 0.5%, but they are comparing the same column names, even with the same alignment and relative offsets. I'm completely puzzled by this. As a side thought, we wouldn't have to go through this if we had a hash table that is easy to use, or perhaps string interning in catcache. -- Alexander Kuzmenkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company diff --git a/src/backend/access/common/tupconvert.c b/src/backend/access/common/tupconvert.c index 2d0d2f4..573ddd82 100644 --- a/src/backend/access/common/tupconvert.c +++ b/src/backend/access/common/tupconvert.c @@ -297,6 +297,7 @@ convert_tuples_by_name_map(TupleDesc indesc, AttrNumber *attrMap; int n; int i; + int nextInput = -1; n = outdesc->natts; attrMap = (AttrNumber *) palloc0(n * sizeof(AttrNumber)); @@ -315,7 +316,13 @@ convert_tuples_by_name_map(TupleDesc indesc, atttypmod = outatt->atttypmod; for (j = 0; j < indesc->natts; j++) { - Form_pg_attribute inatt = TupleDescAttr(indesc, j); + Form_pg_attribute inatt; + + nextInput++; + if (nextInput >= indesc->natts) +nextInput = 0; + + inatt = TupleDescAttr(indesc, nextInput); if (inatt->attisdropped) continue; @@ -330,7 +337,7 @@ convert_tuples_by_name_map(TupleDesc indesc, attname, format_type_be(outdesc->tdtypeid), format_type_be(indesc->tdtypeid; -attrMap[i] = (AttrNumber) (j + 1); +attrMap[i] = (AttrNumber) (nextInput + 1); break; } }
Re: Generating partitioning tuple conversion maps faster
On Mon, Jun 25, 2018 at 10:57 AM Amit Langote wrote: > > On 2018/06/25 14:12, amul sul wrote: > > On Mon, Jun 25, 2018 at 10:30 AM David Rowley > > wrote: > >> > > [...] > >> Given the small size of this patch. I think it's a good candidate for > >> the July 'fest. > >> > > > > Just a different thought, how about having flag array something like > > needed_tuple_conv? > > > > While loading partdesc, we could calculate that the columns of partition > > table > > are aligned with the parent or not? > > Yeah maybe, but ISTM, that could be implemented *in addition to* the > tweaks to convert_tuples_by_name_map that David's proposing, which seem > helpful in their own right. > Yes, I agree. Regards, Amul
Re: Generating partitioning tuple conversion maps faster
On 2018/06/25 14:12, amul sul wrote: > On Mon, Jun 25, 2018 at 10:30 AM David Rowley > wrote: >> > [...] >> Given the small size of this patch. I think it's a good candidate for >> the July 'fest. >> > > Just a different thought, how about having flag array something like > needed_tuple_conv? > > While loading partdesc, we could calculate that the columns of partition table > are aligned with the parent or not? Yeah maybe, but ISTM, that could be implemented *in addition to* the tweaks to convert_tuples_by_name_map that David's proposing, which seem helpful in their own right. Thanks, Amit
Re: Generating partitioning tuple conversion maps faster
On Mon, Jun 25, 2018 at 10:30 AM David Rowley wrote: > [...] > Given the small size of this patch. I think it's a good candidate for > the July 'fest. > Just a different thought, how about having flag array something like needed_tuple_conv? While loading partdesc, we could calculate that the columns of partition table are aligned with the parent or not? Regards, Amul