Re: Generating partitioning tuple conversion maps faster

2019-06-18 Thread Michael Paquier
On Mon, Jun 17, 2019 at 12:03:03PM +0200, didier wrote:
> cherry-pick apply cleanly and with a 100 columns table it improves
> performance nicely (20%).

42f70cd is a performance improvement, and not a bug fix.
--
Michael


signature.asc
Description: PGP signature


Re: Generating partitioning tuple conversion maps faster

2019-06-17 Thread didier
Hi,
Any chance for a backport to PG 11?

cherry-pick apply cleanly and with a 100 columns table it improves
performance nicely (20%).

Regards
Didier


On Sat, Jul 14, 2018 at 1:25 AM David Rowley
 wrote:
>
> On 14 July 2018 at 04:57, Heikki Linnakangas  wrote:
> > Pushed, thanks!
>
> Thanks for pushing, and thanks again for reviewing it, Alexander.
>
> --
>  David Rowley   http://www.2ndQuadrant.com/
>  PostgreSQL Development, 24x7 Support, Training & Services
>




Re: Generating partitioning tuple conversion maps faster

2018-07-13 Thread David Rowley
On 14 July 2018 at 04:57, Heikki Linnakangas  wrote:
> Pushed, thanks!

Thanks for pushing, and thanks again for reviewing it, Alexander.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: Generating partitioning tuple conversion maps faster

2018-07-13 Thread Heikki Linnakangas

On 09/07/18 22:58, David Rowley wrote:

On 9 July 2018 at 23:28, Alexander Kuzmenkov  wrote:

On 07/09/2018 10:13 AM, David Rowley wrote:

I've attached v5.


v5 looks good to me, I've changed the status to ready.


Many thanks for reviewing this.


Pushed, thanks!

- Heikki



Re: Generating partitioning tuple conversion maps faster

2018-07-09 Thread Alexander Kuzmenkov

On 07/09/2018 10:13 AM, David Rowley wrote:

I've attached v5.


v5 looks good to me, I've changed the status to ready.


Please feel free to add yourself as an author of this patch in the
commitfest app. You've probably contributed about as much as I have to
this.


Thanks, I'm fine with being credited as a reviewer.

--
Alexander Kuzmenkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




Re: Generating partitioning tuple conversion maps faster

2018-07-09 Thread David Rowley
On 7 July 2018 at 01:08, Alexander Kuzmenkov  wrote:
> Great. I think we can use the same approach for make_inh_translation_list,
> as in the attached patch. It show some improvement on test 6. I got the
> following tps, median of 11 runs (forgot to turn off fsync though):
>
> test  masterv3 v4
> 1  414 416 408
> 2  259 409 404
> 3  263 400 405
> 4  417 416 404
> 5  118 311 305
> 6  85  280 303

Nice. I think that's a good idea.  Although, I didn't really like the
use of the comma operator you added. I think since TupleDescAttr can't
be NULL we can just do:

(att = TupleDescAttr(new_tupdesc, new_attno))->attisdropped

There's no shortage of other places that do TupleDescAttr(...)-> so I
think that's perfectly fine.

I also did a slight rewording of the comment above that new code. I've
attached v5.

Please feel free to add yourself as an author of this patch in the
commitfest app. You've probably contributed about as much as I have to
this.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


v5-0001-Improve-performance-of-tuple-conversion-map-gener.patch
Description: Binary data


Re: Generating partitioning tuple conversion maps faster

2018-07-06 Thread Alexander Kuzmenkov

On 07/05/2018 02:52 PM, David Rowley wrote:

On 30 June 2018 at 05:40, David Rowley  wrote:

I think your idea
to reduce the loops in test 6 from 2000 down to 1001 should be worth
it. I'll try the idea out next week.

The attached changes things to use your way of picking up the search
for the next column at the column after the last match was found.


Great. I think we can use the same approach for 
make_inh_translation_list, as in the attached patch. It show some 
improvement on test 6. I got the following tps, median of 11 runs 
(forgot to turn off fsync though):


test  master    v3 v4
1  414 416 408
2  259 409 404
3  263 400 405
4  417 416 404
5  118 311 305
6  85  280 303

--
Alexander Kuzmenkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

>From a21da8b6e875977eff7b313e37975b39b390a03e Mon Sep 17 00:00:00 2001
From: "dgrow...@gmail.com" 
Date: Mon, 25 Jun 2018 16:31:47 +1200
Subject: [PATCH] Improve performance of tuple conversion map generation

Previously convert_tuples_by_name_map naively performed a search of each
outdesc column starting at the first column in indesc and searched each
indesc column until a match was found.  When partitioned tables had many
columns this could result in slow generation of the tuple conversion maps.
For INSERT and UPDATE statements that touched few rows, this could mean a
very large overhead indeed.

We can do a bit better with this loop.  It's quite likely that the columns
in partitioned tables and their partitions are in the same order, so it
makes sense to start searching for each column outer column at the inner
column position 1 after where the previous match was found (per idea from
Alexander Kuzmenkov). This makes the best case search O(N) instead of
O(N^2).  The worst case is still O(N^2), but it seems unlikely that would
happen.
---
 src/backend/access/common/tupconvert.c | 37 ++---
 src/backend/optimizer/prep/prepunion.c | 38 --
 2 files changed, 47 insertions(+), 28 deletions(-)

diff --git a/src/backend/access/common/tupconvert.c b/src/backend/access/common/tupconvert.c
index 2d0d2f4..11038c6 100644
--- a/src/backend/access/common/tupconvert.c
+++ b/src/backend/access/common/tupconvert.c
@@ -295,12 +295,16 @@ convert_tuples_by_name_map(TupleDesc indesc,
 		   const char *msg)
 {
 	AttrNumber *attrMap;
-	int			n;
+	int			outnatts;
+	int			innatts;
 	int			i;
+	int			nextindesc = -1;
 
-	n = outdesc->natts;
-	attrMap = (AttrNumber *) palloc0(n * sizeof(AttrNumber));
-	for (i = 0; i < n; i++)
+	outnatts = outdesc->natts;
+	innatts = indesc->natts;
+
+	attrMap = (AttrNumber *) palloc0(outnatts * sizeof(AttrNumber));
+	for (i = 0; i < outnatts; i++)
 	{
 		Form_pg_attribute outatt = TupleDescAttr(outdesc, i);
 		char	   *attname;
@@ -313,10 +317,28 @@ convert_tuples_by_name_map(TupleDesc indesc,
 		attname = NameStr(outatt->attname);
 		atttypid = outatt->atttypid;
 		atttypmod = outatt->atttypmod;
-		for (j = 0; j < indesc->natts; j++)
+
+		/*
+		 * Now search for an attribute with the same name in the indesc. It
+		 * seems likely that a partitioned table will have the attributes in
+		 * the same order as the partition, so the search below is optimized
+		 * for that case.  It is possible that columns are dropped in one of
+		 * the relations, but not the other, so we use the 'nextindesc'
+		 * counter to track the starting point of the search.  If the inner
+		 * loop encounters dropped columns then it will have to skip over
+		 * them, but it should leave 'nextindesc' at the correct position for
+		 * the next outer loop.
+		 */
+		for (j = 0; j < innatts; j++)
 		{
-			Form_pg_attribute inatt = TupleDescAttr(indesc, j);
+			Form_pg_attribute inatt;
+
+			nextindesc++;
 
+			if (nextindesc >= innatts)
+nextindesc = 0;
+
+			inatt = TupleDescAttr(indesc, nextindesc);
 			if (inatt->attisdropped)
 continue;
 			if (strcmp(attname, NameStr(inatt->attname)) == 0)
@@ -330,7 +352,7 @@ convert_tuples_by_name_map(TupleDesc indesc,
 	   attname,
 	   format_type_be(outdesc->tdtypeid),
 	   format_type_be(indesc->tdtypeid;
-attrMap[i] = (AttrNumber) (j + 1);
+attrMap[i] = inatt->attnum;
 break;
 			}
 		}
@@ -343,7 +365,6 @@ convert_tuples_by_name_map(TupleDesc indesc,
 			   format_type_be(outdesc->tdtypeid),
 			   format_type_be(indesc->tdtypeid;
 	}
-
 	return attrMap;
 }
 
diff --git a/src/backend/optimizer/prep/prepunion.c b/src/backend/optimizer/prep/prepunion.c
index 2d47024..7d7517b 100644
--- a/src/backend/optimizer/prep/prepunion.c
+++ b/src/backend/optimizer/prep/prepunion.c
@@ -51,6 +51,7 @@
 #include "utils/lsyscache.h"
 #include "utils/rel.h"
 #include "utils/selfuncs.h"
+#include "utils/syscache.h"
 
 
 typedef struct
@@ -1896,9 +1897,11 @@ make_inh_translation_list(Relation oldrelation, Relation newrelation,
 	List	

Re: Generating partitioning tuple conversion maps faster

2018-07-05 Thread David Rowley
On 30 June 2018 at 05:40, David Rowley  wrote:
> I think your idea
> to reduce the loops in test 6 from 2000 down to 1001 should be worth
> it. I'll try the idea out next week.

The attached changes things to use your way of picking up the search
for the next column at the column after the last match was found.

I don't think performance will have changed much from the last
version, but here are the performance results (in tps) from my laptop
this time. fsync=off.

Test Unpatched Patched Gain
Test 1 873.837 912.981 104.48%
Test 2 213.004 895.268 420.31%
Test 3 224.810 887.099 394.60%
Test 4 1177.533 1107.767 94.08%
Test 5 97.707 402.258 411.70%
Test 6 72.025 360.702 500.80%

Tests are all the same as the tests done in the initial email on this thread.

Tests 1 and 4 are non-partitioned tests. Any variance there should
just be noise. No code was changed in either of those tests.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


v3-0001-Improve-performance-of-tuple-conversion-map-gener.patch
Description: Binary data


Re: Generating partitioning tuple conversion maps faster

2018-06-29 Thread David Rowley
On 30 June 2018 at 00:22, Alexander Kuzmenkov
 wrote:
> On 06/29/2018 03:25 AM, David Rowley wrote:
>> I've attached a patch that uses SearchSysCacheAttName to speed up
>> these translations in the planner.
>
> Good idea. On my desktop, this gives 270 tps dropped vs 610 tps plain (for
> updates). If you combine it with persistent inner loop index, it's probably
> going to be even faster, because it will only require one catalog access for
> each index shift. Now it looks like it goes to catalog for every column
> after the dropped one.

Thanks for testing.

> What about convert_tuples_by_name_map, do you plan to switch it to catalog
> lookups as well?

Syscache? I didn't really see an obvious way to get the relids down to
the function. e.g the call through ExecEvalConvertRowtype() ->
convert_tuples_by_name() does not have a Relation to work with, just a
TupleDesc.

I think further work might be diminishing returns. I think your idea
to reduce the loops in test 6 from 2000 down to 1001 should be worth
it. I'll try the idea out next week.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: Generating partitioning tuple conversion maps faster

2018-06-29 Thread Alexander Kuzmenkov

On 06/29/2018 03:25 AM, David Rowley wrote:

I've attached a patch that uses SearchSysCacheAttName to speed up
these translations in the planner.


Good idea. On my desktop, this gives 270 tps dropped vs 610 tps plain 
(for updates). If you combine it with persistent inner loop index, it's 
probably going to be even faster, because it will only require one 
catalog access for each index shift. Now it looks like it goes to 
catalog for every column after the dropped one.


What about convert_tuples_by_name_map, do you plan to switch it to 
catalog lookups as well?


--
Alexander Kuzmenkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




Re: Generating partitioning tuple conversion maps faster

2018-06-28 Thread David Rowley
On 29 June 2018 at 11:10, David Rowley  wrote:
> On further inspection, the slowdown is coming from the planner in
> make_inh_translation_list(). The INSERT path does not hit that since
> the planner's job is pretty simple for simple INSERTs.

I've attached a patch that uses SearchSysCacheAttName to speed up
these translations in the planner.

This puts test 6 more at the level I'd have expected.

Here are fresh benchmarks results taken with the attached, again on
AWS m5d instance, though probably not the same one as before
(fsync=off).


Unpatched:

Test 1
tps = 922.479156 (excluding connections establishing)

Test 2
tps = 334.701555 (excluding connections establishing)

Test 3
tps = 327.547316 (excluding connections establishing)

Test 4
tps = 1198.924131 (excluding connections establishing)

Test 5
tps = 125.130723 (excluding connections establishing)

Test 6
tps = 81.511072 (excluding connections establishing)

Patched

Test 1
tps = 918.105382 (excluding connections establishing)

Test 2
tps = 913.315387 (excluding connections establishing)

Test 3
tps = 893.578988 (excluding connections establishing)

Test 4
tps = 1213.238177 (excluding connections establishing)

Test 5
tps = 459.022550 (excluding connections establishing)

Test 6
tps = 416.835747 (excluding connections establishing)


-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


v2_speedup_building_tuple_conversion_maps.patch
Description: Binary data


Re: Generating partitioning tuple conversion maps faster

2018-06-28 Thread David Rowley
On 29 June 2018 at 02:23, Alexander Kuzmenkov
 wrote:
> I think I found one possible reason for this slowness. Your patch behaves as
> expected when there is a dropped output column, but does one extra
> comparison when there is a dropped input column. This backwards conversion
> is called from ExecInitRoutingInfo. To fix this, I'd just keep a persistent
> inner loop counter (see the attached patch).

It's just 2000 comparisons vs 1000.

> Still, fixing this doesn't improve the performance. According to perf
> report, updatepd.sql spends 25% of time in strcmp, and updatep.sql about
> 0.5%, but they are comparing the same column names, even with the same
> alignment and relative offsets. I'm completely puzzled by this.

On further inspection, the slowdown is coming from the planner in
make_inh_translation_list(). The INSERT path does not hit that since
the planner's job is pretty simple for simple INSERTs.

> As a side thought, we wouldn't have to go through this if we had a hash
> table that is easy to use, or perhaps string interning in catcache.

Maybe it's better to try the direct lookup and fall back on
SearchSysCacheAttName() if the same attnum does not have the same
name.


-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: Generating partitioning tuple conversion maps faster

2018-06-28 Thread Alexander Kuzmenkov

On 06/25/2018 08:00 AM, David Rowley wrote:

I'd have expected Test 6 to do about 480-500tps. Adding debug to check
why it's not revealed that the search is doing what's expected. I'm
unsure why it has not improved more.


I think I found one possible reason for this slowness. Your patch 
behaves as expected when there is a dropped output column, but does one 
extra comparison when there is a dropped input column. This backwards 
conversion is called from ExecInitRoutingInfo. To fix this, I'd just 
keep a persistent inner loop counter (see the attached patch).


Still, fixing this doesn't improve the performance. According to perf 
report, updatepd.sql spends 25% of time in strcmp, and updatep.sql about 
0.5%, but they are comparing the same column names, even with the same 
alignment and relative offsets. I'm completely puzzled by this.


As a side thought, we wouldn't have to go through this if we had a hash 
table that is easy to use, or perhaps string interning in catcache.


--
Alexander Kuzmenkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

diff --git a/src/backend/access/common/tupconvert.c b/src/backend/access/common/tupconvert.c
index 2d0d2f4..573ddd82 100644
--- a/src/backend/access/common/tupconvert.c
+++ b/src/backend/access/common/tupconvert.c
@@ -297,6 +297,7 @@ convert_tuples_by_name_map(TupleDesc indesc,
 	AttrNumber *attrMap;
 	int			n;
 	int			i;
+	int nextInput = -1;
 
 	n = outdesc->natts;
 	attrMap = (AttrNumber *) palloc0(n * sizeof(AttrNumber));
@@ -315,7 +316,13 @@ convert_tuples_by_name_map(TupleDesc indesc,
 		atttypmod = outatt->atttypmod;
 		for (j = 0; j < indesc->natts; j++)
 		{
-			Form_pg_attribute inatt = TupleDescAttr(indesc, j);
+			Form_pg_attribute inatt;
+
+			nextInput++;
+			if (nextInput >= indesc->natts)
+nextInput = 0;
+
+			inatt = TupleDescAttr(indesc, nextInput);
 
 			if (inatt->attisdropped)
 continue;
@@ -330,7 +337,7 @@ convert_tuples_by_name_map(TupleDesc indesc,
 	   attname,
 	   format_type_be(outdesc->tdtypeid),
 	   format_type_be(indesc->tdtypeid;
-attrMap[i] = (AttrNumber) (j + 1);
+attrMap[i] = (AttrNumber) (nextInput + 1);
 break;
 			}
 		}


Re: Generating partitioning tuple conversion maps faster

2018-06-24 Thread amul sul
On Mon, Jun 25, 2018 at 10:57 AM Amit Langote
 wrote:
>
> On 2018/06/25 14:12, amul sul wrote:
> > On Mon, Jun 25, 2018 at 10:30 AM David Rowley
> >  wrote:
> >>
> > [...]
> >> Given the small size of this patch. I think it's a good candidate for
> >> the July 'fest.
> >>
> >
> > Just a different thought, how about having flag array something like
> > needed_tuple_conv?
> >
> > While loading partdesc, we could calculate that the columns of partition 
> > table
> > are aligned with the parent or not?
>
> Yeah maybe, but ISTM, that could be implemented *in addition to* the
> tweaks to convert_tuples_by_name_map that David's proposing, which seem
> helpful in their own right.
>
Yes, I agree.

Regards,
Amul



Re: Generating partitioning tuple conversion maps faster

2018-06-24 Thread Amit Langote
On 2018/06/25 14:12, amul sul wrote:
> On Mon, Jun 25, 2018 at 10:30 AM David Rowley
>  wrote:
>>
> [...]
>> Given the small size of this patch. I think it's a good candidate for
>> the July 'fest.
>>
> 
> Just a different thought, how about having flag array something like
> needed_tuple_conv?
> 
> While loading partdesc, we could calculate that the columns of partition table
> are aligned with the parent or not?

Yeah maybe, but ISTM, that could be implemented *in addition to* the
tweaks to convert_tuples_by_name_map that David's proposing, which seem
helpful in their own right.

Thanks,
Amit




Re: Generating partitioning tuple conversion maps faster

2018-06-24 Thread amul sul
On Mon, Jun 25, 2018 at 10:30 AM David Rowley
 wrote:
>
[...]
> Given the small size of this patch. I think it's a good candidate for
> the July 'fest.
>

Just a different thought, how about having flag array something like
needed_tuple_conv?

While loading partdesc, we could calculate that the columns of partition table
are aligned with the parent or not?

Regards,
Amul