Re: Making "COPY partitioned_table FROM" faster

2018-08-02 Thread Peter Eisentraut
On 02/08/2018 01:36, David Rowley wrote:
> On 31 July 2018 at 11:51, David Rowley  wrote:
>> The attached v6 delta replaces the v5 delta and should be applied on
>> top of the full v4 patch.
> 
> (now committed)
> 
> Many thanks for committing this Peter and many thanks to Melanie and
> Karen for reviewing it!

I had problems with my email as I was committing this, which is why I
didn't let you know. ;-)

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Making "COPY partitioned_table FROM" faster

2018-08-01 Thread David Rowley
On 31 July 2018 at 11:51, David Rowley  wrote:
> The attached v6 delta replaces the v5 delta and should be applied on
> top of the full v4 patch.

(now committed)

Many thanks for committing this Peter and many thanks to Melanie and
Karen for reviewing it!

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: Making "COPY partitioned_table FROM" faster

2018-07-30 Thread David Rowley
On 31 July 2018 at 06:21, Peter Eisentraut
 wrote:
> On 30/07/2018 15:26, David Rowley wrote:
>>> - Add some tests.  The if (nBufferedTuples > 0) that flushes the tuples
>>> when the partition changes is not currently exercised.
>>
>> That seems like a good idea. In fact, it uncovered a bug around
>> ConvertPartitionTupleSlot() freeing the previously stored tuple out of
>> the slot, which resulted in a crash. I didn't notice before because my
>> test had previously not required any tuple conversions.
>
> I think we need to think of a better place to put that temporary file,
> and clean it up properly afterwards.  I'm not sure whether we have
> existing uses like that.

Ideally, I could have written the test in copy2.sql, but since I had
to insert over 1000 rows to trigger the multi-insert code, copy from
stdin was not practical in the .sql file. Instead, I just followed the
lead in copy.source and used the same temp file style as the previous
test. It also leaves that file lying around; it just happens to be
smaller.  I've updated the test to truncate the table again and
perform another COPY TO to empty the file.  I didn't see any way to
remove the file due to the lack of a standard way of removing files
across operating systems; \! rm ... is not going to work on Windows,
for example.

I also hadn't previously realised how the .source files work. I see
there are input and output directories; I'd missed updating the output
one in the v5 delta patch.

> Also, maybe the test should check afterwards that the right count of
> rows ended up in each partition?

Agreed. I've added that.

The attached v6 delta replaces the v5 delta and should be applied on
top of the full v4 patch. I'm using deltas in the hope that reviewing
the few changes is easier than reviewing the entire patch again.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


v6_multiinsert_copy_delta.patch
Description: Binary data


Re: Making "COPY partitioned_table FROM" faster

2018-07-30 Thread Melanie Plageman
On Mon, Jul 30, 2018 at 11:21 AM, Peter Eisentraut <
peter.eisentr...@2ndquadrant.com> wrote:

> On 30/07/2018 15:26, David Rowley wrote:
> >> - Add some tests.  The if (nBufferedTuples > 0) that flushes the tuples
> >> when the partition changes is not currently exercised.
> >
> > That seems like a good idea. In fact, it uncovered a bug around
> > ConvertPartitionTupleSlot() freeing the previously stored tuple out of
> > the slot, which resulted in a crash. I didn't notice before because my
> > test had previously not required any tuple conversions.
>
> I think we need to think of a better place to put that temporary file,
> and clean it up properly afterwards.  I'm not sure whether we have
> existing uses like that.
>
> Also, maybe the test should check afterwards that the right count of
> rows ended up in each partition?
>
Yeah, I'd actually suggest changing the data inserted in the third
insert statement to have 'Three' in the third column:

insert into parted_copytest select x,1,'One' from
generate_series(1011,1020) x;

And then adding this check:

select count(*) from parted_copytest group by a, b, c;


Re: Making "COPY partitioned_table FROM" faster

2018-07-30 Thread Peter Eisentraut
On 30/07/2018 15:26, David Rowley wrote:
>> - Add some tests.  The if (nBufferedTuples > 0) that flushes the tuples
>> when the partition changes is not currently exercised.
> 
> That seems like a good idea. In fact, it uncovered a bug around
> ConvertPartitionTupleSlot() freeing the previously stored tuple out of
> the slot, which resulted in a crash. I didn't notice before because my
> test had previously not required any tuple conversions.

I think we need to think of a better place to put that temporary file,
and clean it up properly afterwards.  I'm not sure whether we have
existing uses like that.

Also, maybe the test should check afterwards that the right count of
rows ended up in each partition?

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Making "COPY partitioned_table FROM" faster

2018-07-30 Thread David Rowley
On 30 July 2018 at 20:33, Peter Eisentraut
 wrote:
> Two more thoughts:
>
> - Add some tests.  The if (nBufferedTuples > 0) that flushes the tuples
> when the partition changes is not currently exercised.

That seems like a good idea. In fact, it uncovered a bug around
ConvertPartitionTupleSlot() freeing the previously stored tuple out of
the slot, which resulted in a crash. I didn't notice before because my
test had previously not required any tuple conversions.

While ensuring my test was working correctly, I noticed that I had a
thinko in the logic that decided whether another avgTuplesPerPartChange
calculation was due.  The problem was that (cstate->cur_lineno -
RECHECK_MULTI_INSERT_THRESHOLD) is unsigned, and when cur_lineno is
below RECHECK_MULTI_INSERT_THRESHOLD it underflows, which
makes the if test always true until cur_lineno gets up to 1000. I
considered just making all those counters signed, but chickened out
when it came to changing "processed": it seemed quite strange
to have "processed" unsigned while the other counters that do similar
things are signed.  Of course, signed would have enough range, but it
would mean changing the return type of CopyFrom(), which I wasn't
willing to do.

In the end, I just protected the if test so that it only calculates
the average again once cur_lineno is at least
RECHECK_MULTI_INSERT_THRESHOLD. This slightly changes when
avgTuplesPerPartChange is first set, but it'll still be at least 1000
tuples before we start doing multi-inserts. Another option would be to
cast the expression to int64... I'm a bit undecided about what's best here.

> - With proute becoming a function-level variable,
> cstate->partition_tuple_routing is obsolete and could be removed.  (No
> point in saving this in cstate if it's only used in one function anyway.)

Good idea. I've removed that.

I've attached a delta patch based on the v4 patch.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


v5_multiinsert_copy_delta.patch
Description: Binary data


Re: Making "COPY partitioned_table FROM" faster

2018-07-30 Thread Amit Langote
On 2018/07/30 17:33, Peter Eisentraut wrote:
> - With proute becoming a function-level variable,
> cstate->partition_tuple_routing is obsolete and could be removed.  (No
> point in saving this in cstate if it's only used in one function anyway.)

+1.  Also seems to apply to transition_capture, which afaict, was added in
cstate only because partition_tuple_routing is there.

Thanks,
Amit




Re: Making "COPY partitioned_table FROM" faster

2018-07-30 Thread Peter Eisentraut
Two more thoughts:

- Add some tests.  The if (nBufferedTuples > 0) that flushes the tuples
when the partition changes is not currently exercised.

- With proute becoming a function-level variable,
cstate->partition_tuple_routing is obsolete and could be removed.  (No
point in saving this in cstate if it's only used in one function anyway.)

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Making "COPY partitioned_table FROM" faster

2018-07-29 Thread Melanie Plageman
On Sat, Jul 28, 2018 at 6:00 PM, David Rowley 
wrote:

> Oops.  Must've fallen off in transit :) Hopefully, it's more firmly
> attached this time.
>

LGTM. Status changed to "ready for committer"


Re: Making "COPY partitioned_table FROM" faster

2018-07-28 Thread David Rowley
On 29 July 2018 at 05:24, Melanie Plageman  wrote:
> On Thu, Jul 26, 2018 at 7:26 PM, David Rowley 
> wrote:
>>
>> Fixed in the attached v4. That's the only change.
>
>
> I don't see an attachment.

Oops.  Must've fallen off in transit :) Hopefully, it's more firmly
attached this time.

> We are probably okay without doing that in this case. Assuming v4 fixed
> that typo, I feel ready to change the status of the patch to "ready for
> committer".

Great. Many thanks to you and Karen for reviewing this and for pushing
me into the adaptive code. I think the patch is far better for it.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


v4-0001-Allow-multi-inserts-during-COPY-into-a-partitione.patch
Description: Binary data


Re: Making "COPY partitioned_table FROM" faster

2018-07-28 Thread Melanie Plageman
On Thu, Jul 26, 2018 at 7:26 PM, David Rowley 
wrote:

> Fixed in the attached v4. That's the only change.
>

I don't see an attachment.

>> So, I'm thinking I'm missing something. Which of the changes would cause
>> the performance improvement from patch v2 to v3? My understanding is:
>
> You could probably narrow it down by reverting those changes
> one-by-one and seeing if it was just one in particular that caused the
> performance improvement or if it was some combination of the changes.
> Sounds like that would take quite a bit of time and it would only be
> for a learning experience, but an interesting one!


We are probably okay without doing that in this case. Assuming v4 fixed
that typo, I feel ready to change the status of the patch to "ready for
committer".


Re: Making "COPY partitioned_table FROM" faster

2018-07-27 Thread Robert Haas
On Wed, Jul 25, 2018 at 10:30 PM, David Rowley
 wrote:
> I agree RANGE partition is probably the most likely case to benefit
> from this optimisation, but I just don't think that HASH could never
> benefit and LIST probably sits somewhere in the middle.
>
> HASH partitioning might be useful in cases like partitioning by
> "sensor_id".  It does not seem that unreasonable that someone might
> want to load all the data for an entire sensor at once.

Another case might be restoring a dump with --load-via-partition-root.
Granted, you probably shouldn't specify that option unless you expect
rows to end up in different partition than they were in the original
cluster, but it's possible somebody might do it out of an abundance of
caution if, say, they are doing an automated dump restore on a machine
that may or may not have different endian-ness than the original.

I think making it adaptive, as you've done, is definitely better than
a heuristic based on the partitioning type.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Making "COPY partitioned_table FROM" faster

2018-07-26 Thread David Rowley
On 27 July 2018 at 05:30, Melanie Plageman  wrote:
> On patched code line 2564, there is a missing parenthesis. It might be
> better to remove the parentheses entirely and, while you're at it, there is
> a missing comma.

Thanks for noticing that.

Fixed in the attached v4. That's the only change.

> Regarding the performance improvement and code diff from v2 to v3:
> The rearranging of the code in v3 makes sense and improves the flow of the
> code; however, I stared at it for a while and couldn't quite work out in my
> head which part caused the significant improvement from v2.
> I would have thought that a speedup from patch v2 to v3 would have come from
> doing multi-inserts less often when the target partition is switching a lot;
> however, looking at the v2 to v3 diff, I can't see how any of the changes
> would have caused a decrease in the number of multi-inserts given the same
> data and target table DDL.
> So, I'm thinking I'm missing something. Which of the changes would cause the
> performance improvement from patch v2 to v3? My understanding is:

My guess is that the compiler rearranged the code to move some of the
unlikely code out of line, perhaps to the end of the function, making
the commonly executed code fit in the instruction cache where maybe
before there were some cache misses. I've no evidence of that, as I
didn't look at the assembly code or do any profiling.

You could probably narrow it down by reverting those changes
one-by-one and seeing if it was just one in particular that caused the
performance improvement or if it was some combination of the changes.
Sounds like that would take quite a bit of time and it would only be
for a learning experience, but an interesting one!

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: Making "COPY partitioned_table FROM" faster

2018-07-26 Thread Melanie Plageman
On Wed, Jul 25, 2018 at 7:24 PM, David Rowley 
wrote:

On patched code line 2564, there is a missing parenthesis. It might be
better to remove the parentheses entirely and, while you're at it, there is
a missing comma.

/*
 * For partitioned tables we can't support multi-inserts when there
 * are any statement level insert triggers.  (It might be possible to
 * allow partitioned tables with such triggers in the future, but for
 * now CopyFromInsertBatch expects that any before row insert and
 * statement level insert triggers are on the same relation.
 */

Should be

/*
 * For partitioned tables we can't support multi-inserts when there
 * are any statement level insert triggers. It might be possible to
 * allow partitioned tables with such triggers in the future, but for
 * now, CopyFromInsertBatch expects that any before row insert and
 * statement level insert triggers are on the same relation.
 */

Regarding the performance improvement and code diff from v2 to v3:
The rearranging of the code in v3 makes sense and improves the flow of the
code; however, I stared at it for a while and couldn't quite work out in my
head which part caused the significant improvement from v2.
I would have thought that a speedup from patch v2 to v3 would have come
from doing multi-inserts less often when the target partition is switching
a lot; however, looking at the v2 to v3 diff, I can't see how any of the
changes would have caused a decrease in the number of multi-inserts given
the same data and target table DDL.
So, I'm thinking I'm missing something. Which of the changes would cause the
performance improvement from patch v2 to v3? My understanding is:

> a) Made the avgTuplesPerPartChange >= 1.3 the first condition in the
> test that sets leafpart_use_multi_insert.

This would short-circuit checking the other conditions when deciding how to
set leafpart_use_multi_insert.


> b) Added an unlikely() when testing of the partition's resultRelInfo
> has set to be initialised. This is only true the first time a
> partition is inserted into during the COPY.
>

 The addition of "unlikely" here seems like a good idea because there is a
function call (ExecInitPartitionInfo) inside the if statement. However,
would that cause such a difference in performance from v2 to v3? What would
be the reason? Cache misses? Some kind of pre-evaluation of the expression?

> c) Added unlikely() around lastPartitionSampleLineNo test.  This will
> only be true 1 in 1000, so pretty unlikely.

Even though this makes sense based on the frequency with which this
condition will evaluate to true, I don't understand what the performance
benefit would be of adding a compiler hint here around such a trivial
calculation.

> d) Moved the nBufferedTuples > 0 test under the insertMethod ==
> CIM_MULTI_CONDITIONAL test. It's never going to be > 0 when we're
> doing CIM_SINGLE.

This is a good catch. It makes this part of the code more clear, as well.
However, it doesn't seem like it would affect performance at all.

Let me know what I'm missing.


Re: Making "COPY partitioned_table FROM" faster

2018-07-26 Thread Simon Riggs
On 26 July 2018 at 03:30, David Rowley  wrote:

> The v3 version of the patch also fixes the very small performance
> regression for the (probably quite likely) worst-case situation.  New
> performance is about 3.5% faster instead of 0.5-1% slower. So likely
> there's no longer any need to consider this.

Works for me!

-- 
Simon Riggs   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Making "COPY partitioned_table FROM" faster

2018-07-25 Thread David Rowley
On 25 July 2018 at 04:37, Simon Riggs  wrote:
> I don't see any need here for another GUC, nor even a command option.
> The user has already indicated their use case to us:

I agree.

> We know that the common case for RANGE partitioned tables is to load
> into the one current partition. We also know that the partition might
> change as we load data, when the data loaded crosses the partition
> boundary, so we need this optimization to be adaptive. Not all data
> loads follow that rule, so we also need the adaptive algorithm to shut
> off for those cases.
>
> We also know that the common case for HASH partitions is to load into
> all partitions at once, since hash is specifically designed to spread
> data out across partitions. It is almost never true that we would want
> to load one partition at a time, so it seems easy to turn the
> optimization off if we use this type of partitioning. Or better, we
> need work done to improve that case also, but that is outside the
> current scope of this patch.
>
> If we have multi-level partitions, if any of the levels includes a
> Hash, then turn it off.
>
> LIST partitions are less likely to have a clear pattern, so I would
> treat them like HASH and assume the data is not sorted by partition.
>
> So for this patch, just add an "if (RANGE)" test.

I agree RANGE partition is probably the most likely case to benefit
from this optimisation, but I just don't think that HASH could never
benefit and LIST probably sits somewhere in the middle.

HASH partitioning might be useful in cases like partitioning by
"sensor_id".  It does not seem that unreasonable that someone might
want to load all the data for an entire sensor at once.

The v3 version of the patch also fixes the very small performance
regression for the (probably quite likely) worst-case situation.  New
performance is about 3.5% faster instead of 0.5-1% slower. So likely
there's no longer any need to consider this.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: Making "COPY partitioned_table FROM" faster

2018-07-25 Thread David Rowley
Hi Melanie,

Many thanks for looking over this again.

On 25 July 2018 at 03:32, Melanie Plageman  wrote:
> One small additional typo I noticed:
>
> In the patched code on line 2555, there appears to be a typo:
>  /* ...
>  * inserting into and act differently if the tuples that have already
>  * been processed any prepared for insertion are not there.
>  */
> The word "any" here is probably wrong, though I am actually not sure what
> was meant.

It was meant to be "and"

> Though it is from the existing comments, it would be nice to spell out once
> what is meant by "BR trigger". I'm sure it is mentioned elsewhere in the
> code, but, since it is not easily googled, it might be helpful to specify.
>
>> Many thanks to both of you for the thorough review. I hope the
>> attached addresses all the concerns.
>
>
> I feel that the code itself is clear and feel our concerns are addressed.

Great!

>> One final note:  I'm not entirely convinced we need this adaptive
>> code, but it seems easy enough to rip it back out if it's more trouble
>> than it's worth. But if the other option is a GUC, then I'd rather
>> stick with the adaptive code, it's likely going to do much better than
>> a GUC since it can change itself during the copy which will be useful
>> when the stream contains a mix of small and large sets of consecutive
>> tuples which belong to the same partition.
>>
> Though the v2 numbers do look better, it doesn't completely alleviate the
> slow-down in the worst case. Perhaps the GUC and the adaptive code are not
> alternatives and could instead be used together. You could even make the
> actual RECHECK_MULTI_INSERT_THRESHOLD the GUC.
> However, I think that the decision as to whether or not it makes sense to do
> the adaptive code without a GUC is really based on the average use case, to
> which I can't really speak.
> The counter-argument I see, however, is that it is not acceptable to have
> completely un-optimized insertion of data into partition tables and that an
> overwhelming flood of GUCs is undesirable.

I'm pretty much against a GUC, because:

1) Most people won't use it or know about it, and;
2) Copy streams might contain a mix of long runs of tuples belonging
to the same partition and periods where the target partition changes
frequently. A GUC cannot be set to account for that, and;
3) Because of the following:

I looked at this patch again and I saw that there are a few more
things we can do to improve the performance further.

I've made the following changes:

a) Made the avgTuplesPerPartChange >= 1.3 the first condition in the
test that sets leafpart_use_multi_insert.
b) Added an unlikely() when testing of the partition's resultRelInfo
has set to be initialised. This is only true the first time a
partition is inserted into during the COPY.
c) Added unlikely() around lastPartitionSampleLineNo test.  This will
only be true 1 in 1000, so pretty unlikely.
d) Moved the nBufferedTuples > 0 test under the insertMethod ==
CIM_MULTI_CONDITIONAL test. It's never going to be > 0 when we're
doing CIM_SINGLE.

I spun up another AWS m5d.large instance to test this. This time I
only tested the case where the partition changes on every tuple.  The
new patch takes about 96.5% of the time that master takes. I performed
10 runs of each and tested both with fsync=on and fsync=off.  I've
attached the results in spreadsheet form.

I think given the new performance results there's no need for any GUC
or conditionally enabling this optimisation based on partitioning
strategy.

v3 patch is attached.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


copy_speedup_v3_results.ods
Description: application/vnd.oasis.opendocument.spreadsheet


v3-0001-Allow-multi-inserts-during-COPY-into-a-partitione.patch
Description: Binary data


Re: Making "COPY partitioned_table FROM" faster

2018-07-24 Thread Simon Riggs
On 24 July 2018 at 16:32, Melanie Plageman  wrote:
>
> On Fri, Jul 20, 2018 at 7:57 AM, David Rowley 
> wrote:

>> One final note:  I'm not entirely convinced we need this adaptive
>> code, but it seems easy enough to rip it back out if it's more trouble
>> than it's worth. But if the other option is a GUC, then I'd rather
>> stick with the adaptive code, it's likely going to do much better than
>> a GUC since it can change itself during the copy which will be useful
>> when the stream contains a mix of small and large sets of consecutive
>> tuples which belong to the same partition.
>>
> Though the v2 numbers do look better, it doesn't completely alleviate the
> slow-down in the worst case. Perhaps the GUC and the adaptive code are not
> alternatives and could instead be used together. You could even make the
> actual RECHECK_MULTI_INSERT_THRESHOLD the GUC.
> However, I think that the decision as to whether or not it makes sense to do
> the adaptive code without a GUC is really based on the average use case, to
> which I can't really speak.
> The counter-argument I see, however, is that it is not acceptable to have
> completely un-optimized insertion of data into partition tables and that an
> overwhelming flood of GUCs is undesirable.

I don't see any need here for another GUC, nor even a command option.
The user has already indicated their use case to us:

We know that the common case for RANGE partitioned tables is to load
into the one current partition. We also know that the partition might
change as we load data, when the data loaded crosses the partition
boundary, so we need this optimization to be adaptive. Not all data
loads follow that rule, so we also need the adaptive algorithm to shut
off for those cases.

We also know that the common case for HASH partitions is to load into
all partitions at once, since hash is specifically designed to spread
data out across partitions. It is almost never true that we would want
to load one partition at a time, so it seems easy to turn the
optimization off if we use this type of partitioning. Or better, we
need work done to improve that case also, but that is outside the
current scope of this patch.

If we have multi-level partitions, if any of the levels includes a
Hash, then turn it off.

LIST partitions are less likely to have a clear pattern, so I would
treat them like HASH and assume the data is not sorted by partition.

So for this patch, just add an "if (RANGE)" test.

-- 
Simon Riggs   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Making "COPY partitioned_table FROM" faster

2018-07-24 Thread Melanie Plageman
On Fri, Jul 20, 2018 at 7:57 AM, David Rowley 
wrote:

>> So, overall, we feel that the code from lines 2689 until 2691 and from
>> 2740 to 2766 could use further clarification with regard to switching from
>> parent to child partition and sibling to sibling partition as well as
>> regarding saving relinfo for partitions that have not been seen before in
>> the proute->partitions[]
>
> I hope it's more clear now that I've got rid of saved_resultRelInfo.

I think all of the refactoring and clarifications look good to me and are
much more clear.

One small additional typo I noticed:

In the patched code on line 2555, there appears to be a typo:
 /* ...
 * inserting into and act differently if the tuples that have already
 * been processed any prepared for insertion are not there.
 */
The word "any" here is probably wrong, though I am actually not sure what
was meant.

Though it is from the existing comments, it would be nice to spell out once
what is meant by "BR trigger". I'm sure it is mentioned elsewhere in the
code, but, since it is not easily googled, it might be helpful to specify.

> Many thanks to both of you for the thorough review. I hope the
> attached addresses all the concerns.

I feel that the code itself is clear and feel our concerns are addressed.

>
> One final note:  I'm not entirely convinced we need this adaptive
> code, but it seems easy enough to rip it back out if it's more trouble
> than it's worth. But if the other option is a GUC, then I'd rather
> stick with the adaptive code, it's likely going to do much better than
> a GUC since it can change itself during the copy which will be useful
> when the stream contains a mix of small and large sets of consecutive
> tuples which belong to the same partition.
>
Though the v2 numbers do look better, it doesn't completely alleviate the
slow-down in the worst case. Perhaps the GUC and the adaptive code are not
alternatives and could instead be used together. You could even make the
actual RECHECK_MULTI_INSERT_THRESHOLD the GUC.
However, I think that the decision as to whether or not it makes sense to
do the adaptive code without a GUC is really based on the average use case,
to which I can't really speak.
The counter-argument I see, however, is that it is not acceptable to have
completely un-optimized insertion of data into partition tables and that an
overwhelming flood of GUCs is undesirable.


Re: Making "COPY partitioned_table FROM" faster

2018-07-23 Thread Peter Eisentraut
On 20.07.18 16:57, David Rowley wrote:
> One final note:  I'm not entirely convinced we need this adaptive
> code, but it seems easy enough to rip it back out if it's more trouble
> than it's worth. But if the other option is a GUC, then I'd rather
> stick with the adaptive code, it's likely going to do much better than
> a GUC since it can change itself during the copy which will be useful
> when the stream contains a mix of small and large sets of consecutive
> tuples which belong to the same partition.

I think some kind of way to switch between the two code paths would be
desirable.  For example, with hash partitioning, it's likely that in
many cases you won't find any adjacent candidates in batches of
significant size.  So then you've just made everything 5% slower.
Unless we can make the multi-insert path itself faster.

The particular heuristic you have chosen seems sensible to me, but we'll
have to see how it holds up in practice.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Making "COPY partitioned_table FROM" faster

2018-07-20 Thread Karen Huddleston
The following review has been posted through the commitfest application:
make installcheck-world:  tested, passed
Implements feature:   tested, passed
Spec compliant:   not tested
Documentation:not tested

This patch applied cleanly and worked as expected.

Patch description:
This patch implements the use of heap_multi_insert() for partition tables when 
using the COPY FROM command, benefiting the performance of COPY FROM in cases 
in which multiple tuples in the file are destined for the same partition.

Tests:
All tests passed make check-world and TAP tests

Functionality:
Specifically, we tried the cases in the bottom section "Exercising the code" 
and found all behavior as described and expected.
Note that we did conduct performance testing for our test cases enumerated in 
this section using COPY FROM PROGRAM comparing the patched and unpatched code 
using one example with tuples all destined for the same partition and one 
example with tuples destined for alternating partitions. We saw a similar 
performance improvement to the one recorded by David in his email, including a 
decrease in performance for copying data with tuples alternating between 
destination partitions as compared to the unpatched code. 
However, in our cases below, we focus on exercising the code added as opposed 
to on performance.

Feature feedback from a user perspective:
There will be two differences in performance that may be perceptible to the 
user after the inclusion of this patch:
1) the relatively small decrease in performance (as compared to master) for 
copying data from a file or program in which the destination partition changes 
frequently
2) the stark contrast between the performance of copying data destined for the 
same partition and data destined for alternating partitions
Based on the numbers in David's email, the performance difference is 
27721.054 ms (patched) for copying data destined for the same partition vs 
46151.844 ms (patched) for copying data destined for two different partitions 
alternating each row.
Because both differences could impact users' data loading, it is worth 
considering making this transparent to the user in some way. Because this will 
be noticeable to the user, and because it is potentially within the user's 
power to change their data copying strategy given this information, it might 
be prudent either to add a GUC allowing the user to disable heap_multi_insert 
for COPY FROM (if the user knows the data being copied is highly varied), or 
to note in the docs that when using COPY FROM with a partitioned table, it is 
advisable to sort the data in a manner that maximizes the number of 
consecutive tuples destined for the same partition.
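
As a minimal sketch of that sorting advice (the file name and the position of 
the partition-key column are hypothetical), data can be pre-ordered 
client-side so that consecutive rows target the same partition:

```python
import csv

# Hypothetical unsorted input rows; column 0 is assumed to be the
# partition key of the COPY target table.
rows = [
    ["2", "beta"],
    ["1", "alpha"],
    ["2", "gamma"],
    ["1", "delta"],
]

# A stable sort on the partition key groups rows per partition, turning
# alternating destinations into long same-partition runs that COPY can
# batch via heap_multi_insert.
rows.sort(key=lambda r: r[0])

# Write out for a subsequent COPY target_table FROM 'sorted.csv' (FORMAT csv)
with open("sorted.csv", "w", newline="") as f:
    csv.writer(f).writerows(rows)
```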

Code Review:
Naming of newly introduced enum:
The CopyInsertMethod enum value CIM_MULTI_PARTITION is potentially confusing, 
since, for a partitioned table with BEFORE or INSTEAD OF INSERT triggers on 
child partitions, only heap_insert is valid, requiring the additional variable 
part_can_multi_insert to determine whether heap_multi_insert can be used.
It might be clearer to name it something that indicates it is a candidate 
for multi-insertion, pending further review. This decouples the state of 
potentially applicable multi-insertion from the table being partitioned. 
In the future, some other feature might make a table conditionally 
eligible for multi-insertion of tuples, so it may be desirable to use this 
intermediate state for other kinds of tables as well.
Alternatively, it might be clearer if the variable part_can_multi_insert 
were explicitly tied to a leaf partition, since this can change from leaf 
to leaf; perhaps name it leaf_part_can_multi_insert.

Code comment potential typo corrections:
In the following comment, in the patched code line 2550
     /* ... 
     * Note: It does not matter if any partitions have any volatile default
     * expression as we use the defaults from the target of the COPY command.
     */
It seems like "volatile default expression" should be "volatile default 
expressions"

In the following comment, in the patched code starting on line 2582

        /* ...
         * the same partition as the previous tuple, and since we flush the
         * previous partitions buffer once the new tuple has already been
         * built, we're unable to reset the estate since we'd free the memory
         * which the new tuple is stored.  To work around this we maintain a
         * secondary expression context and alternate between these when the
         ... */
"previous partitions" should probably be "previous partition's" and 
"since we'd free the memory which the new tuple is stored" should be "since 
we'd free the memory in which the new tuple is stored"

In the following comment, in the patched code starting on line 2769

                /*
                 * We'd better make the bulk insert mechanism gets a new
                 * 

Re: Making "COPY partitioned_table FROM" faster

2018-07-20 Thread David Rowley
 (This email seemed to only come to me and somehow missed the mailing
list. Including the message here in its entirety)

On 20 July 2018 at 13:26, Karen Huddleston  wrote:
> The following review has been posted through the commitfest application:
> make installcheck-world:  tested, passed
> Implements feature:   tested, passed
> Spec compliant:   not tested
> Documentation:not tested
>
> This patch applied cleanly and worked as expected.
>
> Patch description:
> This patch implements the use of heap_multi_insert() for partition tables 
> when using the COPY FROM command, benefiting the performance of COPY FROM in 
> cases in which multiple tuples in the file are destined for the same 
> partition.
>
> Tests:
> All tests passed make check-world and TAP tests
>
> Functionality:
> Specifically, we tried the cases in the bottom section "Exercising the code" 
> and found all behavior as described and expected.
> Note that we did conduct performance testing for our test cases enumerated in 
> this section using COPY FROM PROGRAM comparing the patched and unpatched code 
> using one example with tuples all destined for the same partition and one 
> example with tuples destined for alternating partitions. We saw a similar 
> performance improvement to the one recorded by David in his email, including 
> a decrease in performance for copying data with tuples alternating between 
> destination partitions as compared to the unpatched code.
> However, in our cases below, we focus on exercising the code added as opposed 
> to on performance.
>
> Feature feedback from a user perspective:
> There will be two differences in performance that may be perceptible to the 
> user after the inclusion of this patch:
> 1) the relatively small decrease in performance (as compared to master) for 
> copying data from a file or program in which the destination partition 
> changes frequently
> 2) the stark contrast between the performance of copying data destined for 
> the same partition and data destined for alternating partitions
> Based on the numbers in David's email the performance difference
> 27721.054 ms patched for copying data destined for the same partition vs 
> 46151.844 ms patched for copying data destined for two different partitions 
> alternating each row
> Because both differences could impact users in data loading, it is worth 
> considering making this transparent to the user in some way. Because this 
> will be noticeable to the user, and, furthermore, because it is potentially 
> within the power of the user to make changes to their data copying strategy 
> given this information, it might be prudent to either create a GUC allowing 
> the user to disable heap_multi_insert for COPY FROM (if the user knows the 
> data he/she is copying over is very varied) or to add a comment to the docs 
> that when using COPY FROM for a partition table, it is advisable to sort the 
> data in some manner which would optimize the number of consecutive tuples 
> destined for the same partition.

I'd rather steer clear of any new GUCs. Instead, in the patch I've
attached, I've included some adaptive code which chooses to use or not
use multi-inserts based on the average number of tuples that have been
in the batches. I also did some tests and it does appear that the
benefits of multi-inserts come pretty early. So I ended up coding it
to enable multi-inserts when the average tuples per batch is at
least 1.3.  I originally guessed 1.5, but one of my tests was faster
with multi-inserts and the average batch size there was 1.33.
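
The adaptive behaviour described above can be modelled roughly as follows (a 
simplified Python sketch, not the committed C code; the 1.3 threshold is the 
one mentioned above, and the function names are invented for illustration):

```python
MULTI_INSERT_MIN_AVG = 1.3  # threshold discussed above


def use_multi_inserts(total_tuples_flushed, batches_flushed):
    """Decide whether multi-inserts have been paying off so far."""
    if batches_flushed == 0:
        return True  # no evidence yet; start out using multi-inserts
    return total_tuples_flushed / batches_flushed >= MULTI_INSERT_MIN_AVG


def average_batch_sizes(partition_stream):
    """Count tuples and batches for a stream of destination partitions.
    A batch is flushed each time the destination partition changes,
    plus one final flush at the end of the stream."""
    tuples = batches = 0
    prev = None
    for part in partition_stream:
        if prev is not None and part != prev:
            batches += 1  # destination changed: previous batch flushed
        tuples += 1
        prev = part
    if prev is not None:
        batches += 1  # final flush
    return tuples, batches
```

With strictly alternating partitions the average batch size is 1.0, so this 
model falls back to single-row inserts; with a single destination partition 
the average equals the tuple count and multi-inserts stay enabled.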

> Code Review:
> Naming of newly introduced enum:
> The CopyInsertMethod enum value CIM_MULTI_PARTITION is potentially confusing, 
> since, for a partition table with BEFORE or INSTEAD INSERT triggers on child 
> partitions only heap_insert is valid, requiring the additional variable 
> part_can_multi_insert to determine if heap_multi_insert can be used or not.
> It might be more clear to name it something that indicates it is a candidate 
> for multi-insertion, pending further review. This decouples the state of 
> potentially applicable multi-insertion from the table being a partition 
> table. In the future, perhaps, some other feature might make a table 
> conditionally eligible for multi-insertion of tuples, so, it may be desirable 
> to use this intermediate state for other kinds of tables as well.

I've renamed this to CIM_MULTI_CONDITIONAL.

> Alternatively, it might make it more clear if the variable 
> part_can_multi_insert was clearly for a leaf partition table, since this can 
> change from leaf to leaf, maybe named leaf_part_can_multi_insert

Renamed to leafpart_use_multi_insert. I changed from "can" to "use"
because the adaptive code might just choose not to, even if it actually
"can" use multi-inserts.

> Code comment potential typo corrections:
> In the following comment, in the patched code line 2550
>  /* ...
>  * Note: It does not matter if any partitions