Wow! That is good to know!

Sent from my iPad
> On Jun 7, 2020, at 5:23 PM, David Rowley <dgrowle...@gmail.com> wrote:
>
>> On Sun, 7 Jun 2020 at 23:41, MichaelDBA <michael...@sqlexec.com> wrote:
>> The article referenced below assumes a worst-case scenario for
>> bulk-loading with hash-partitioned tables. It assumes that the values
>> being inserted are in strict ascending or descending order with no
>> gaps (like a sequence number incrementing by 1), thereby ensuring
>> every partition is hit in order before the process repeats. If the
>> values being inserted are not strictly sequential with no gaps, then
>> performance is much better. Obviously, which parts of the tables and
>> indexes are in memory has a lot to do with it as well.
>
> In PostgreSQL 12, COPY was modified to support bulk inserts for
> partitioned tables. This sped up many scenarios. Internally, this
> works by maintaining a series of multi-insert buffers, one per
> partition. We generally only flush a buffer to the table when the
> buffer for that partition fills. However, there is a sort of sanity
> limit [1] on the number of multi-insert buffers we maintain at once,
> and currently that is 32. Technically we could increase that limit,
> but there would still need to be a limit. Unfortunately, for this
> particular case, since we're most likely touching between 199 and 799
> other partitions before hitting the first one again, we really don't
> get any multi-inserts, which is likely the reason why performance is
> worse for hash partitioning.
>
> With PG12 and for this particular case, you're likely to see COPY
> performance drop quite drastically when going from 32 to 33
> partitions. The code was designed more for hitting partitions randomly
> rather than in the sort-of round-robin way that we're likely to get
> from hash partitioning on a serial column.
>
> David
>
> [1] https://github.com/postgres/postgres/blob/master/src/backend/commands/copy.c#L2569
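The cliff David describes at 33 partitions can be illustrated with a toy model. This is not PostgreSQL's actual copy.c code: the buffer cap and per-buffer capacity mirror the spirit of MAX_PARTITION_BUFFERS and the tuple-buffer limit in PG12, but the "evict the oldest buffer" policy here is a simplification I've assumed for illustration; PostgreSQL's real flushing logic differs in detail. The point it demonstrates is the same: with a strictly sequential key hashed round-robin over more partitions than there are buffers, every flush holds roughly one row, so the multi-insert optimization is defeated.

```python
from collections import OrderedDict

MAX_BUFFERS = 32        # in the spirit of MAX_PARTITION_BUFFERS in PG12's copy.c
BUFFER_CAPACITY = 1000  # per-partition buffer size (illustrative value)

def simulate(num_partitions, num_rows):
    """Round-robin rows over partitions; return average rows per flush."""
    buffers = OrderedDict()  # partition -> count of buffered rows
    flush_sizes = []

    def flush(part):
        flush_sizes.append(buffers.pop(part))

    for i in range(num_rows):
        part = i % num_partitions  # sequential key -> round-robin over partitions
        if part not in buffers and len(buffers) >= MAX_BUFFERS:
            # Need a buffer for a new partition but we're at the cap:
            # flush the oldest buffer (a simplification of PG's policy).
            flush(next(iter(buffers)))
        buffers[part] = buffers.get(part, 0) + 1
        if buffers[part] >= BUFFER_CAPACITY:
            flush(part)  # buffer full: a genuine multi-row insert

    for part in list(buffers):  # end of COPY: flush whatever remains
        flush(part)
    return sum(flush_sizes) / len(flush_sizes)

print(simulate(32, 100_000))  # buffers fill up: hundreds of rows per flush
print(simulate(33, 100_000))  # every flush holds ~1 row: no multi-inserts
```

With 32 partitions, each buffer fills to capacity before flushing; with 33, each new row targets a partition whose buffer was just evicted with a single row in it, which is the drastic drop David predicts when going from 32 to 33 hash partitions.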