Re: [HACKERS] Declarative partitioning vs. BulkInsertState

2017-01-24 Thread Robert Haas
On Mon, Jan 23, 2017 at 5:25 AM, Amit Langote
 wrote:
> I tried implementing the second idea in the attached patch.  It fixes the
> bug (multiple reports as mentioned in the commit message) that tuples may
> be inserted into the wrong partition.

Looks good to me, thanks.  Committed with a few tweaks.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Declarative partitioning vs. BulkInsertState

2017-01-23 Thread Amit Langote
On 2017/01/19 5:25, Robert Haas wrote:
> On Wed, Jan 11, 2017 at 10:53 PM, Amit Langote wrote:
>> On 2017/01/06 20:23, Amit Langote wrote:
>>>
>>> If a single BulkInsertState object is passed to
>>> heap_insert()/heap_multi_insert() for different heaps corresponding to
>>> different partitions (from one input tuple to next), tuples might end up
>>> going into wrong heaps (like demonstrated in one of the reports [1]).  A
>>> simple solution is to disable bulk-insert in case of partitioned tables.
>>>
>>> But my patch (or its motivations) was slightly wrongheaded, wherein I
>>> conflated multi-insert stuff and bulk-insert considerations.  I revised
>>> 0002 to not do that.
>>
>> Ragnar Ouchterlony pointed out [1] on pgsql-bugs that 0002 wasn't correct.
>> Attaching updated 0002 along with rebased 0001 and 0003.
> 
> The BulkInsertState is not there only to improve performance.  It's
> also there to make sure we use a BufferAccessStrategy, so that we
> don't trash the whole buffer arena.  See commit
> 85e2cedf985bfecaf43a18ca17433070f439fb0e.  If a partitioned table uses
> a separate BulkInsertState for each partition, I believe it will also
> end up using a separate ring of buffers for every partition.  That may
> well be faster than copying into an unpartitioned table in some cases,
> because dirtying everything in the buffer arena without actually
> writing any of those buffers is a lot faster than actually doing the
> writes.  But it is also anti-social behavior; we have
> BufferAccessStrategy objects for a reason.

Thanks for the explanation.  I agree that creating thousands of
BufferAccessStrategy objects would be a bad idea.

> One idea would be to have each partition use a separate
> BulkInsertState but have them point to the same underlying
> BufferAccessStrategy, but even that's problematic, because it could
> result in us holding a gigantic number of pins (one per partition). I
> think maybe a better idea would be to add an additional function
> ReleaseBulkInsertStatePin() which gets called whenever we switch
> relations, and then just use the same BulkInsertState throughout.

I tried implementing the second idea in the attached patch.  It fixes the
bug (multiple reports as mentioned in the commit message) that tuples may
be inserted into the wrong partition.

Thanks,
Amit
>From 2e31e97389b0911f749fcf59c0a24c0208a5e2b3 Mon Sep 17 00:00:00 2001
From: amit 
Date: Mon, 23 Jan 2017 17:00:19 +0900
Subject: [PATCH] Make partitioned tables play nicely with bulk-insert
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

When COPY'ing (or bulk-inserting) into a partitioned table, the target
heap may change from one tuple to next.  We better ask ReadBufferBI()
to get a new buffer every time such change occurs.  We do that by
having CopyFrom() call the new ReleaseBulkInsertStatePin() which unpins
the BulkInsertState.cur_buffer and sets it to InvalidBuffer, which is
enough a signal for ReadBufferBI() to do the right thing.

This also fixes the bug that tuples ended up being inserted into the
wrong partition, which occurred exactly because the wrong buffer was
used.

Reported by: 高增琦, Venkata B Nagothi, Ragnar Ouchterlony
Patch by: Amit Langote (idea by Robert Haas)
Reports: https://www.postgresql.org/message-id/CAFmBtr32FDOqofo8yG-4mjzL1HnYHxXK5S9OGFJ%3D%3DcJpgEW4vA%40mail.gmail.com
 https://www.postgresql.org/message-id/CAEyp7J9WiX0L3DoiNcRrY-9iyw%3DqP%2Bj%3DDLsAnNFF1xT2J1ggfQ%40mail.gmail.com
 https://www.postgresql.org/message-id/16d73804-c9cd-14c5-463e-5caad563ff77%40agama.tv
---
 src/backend/access/heap/heapam.c | 11 +++
 src/backend/commands/copy.c  | 12 
 src/include/access/heapam.h  |  1 +
 3 files changed, 24 insertions(+)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 1ce42ea970..5fd7f1e1a2 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2324,6 +2324,17 @@ FreeBulkInsertState(BulkInsertState bistate)
 	pfree(bistate);
 }
 
+/*
+ * ReleaseBulkInsertStatePin - release a buffer currently held in bistate
+ */
+void
+ReleaseBulkInsertStatePin(BulkInsertState bistate)
+{
+	if (bistate->current_buf != InvalidBuffer)
+		ReleaseBuffer(bistate->current_buf);
+	bistate->current_buf = InvalidBuffer;
+}
+
 
 /*
  *	heap_insert		- insert tuple into a heap
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index c05e14e26f..86bd53c179 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -2307,6 +2307,7 @@ CopyFrom(CopyState cstate)
 	uint64		processed = 0;
 	bool		useHeapMultiInsert;
 	int			nBufferedTuples = 0;
+	int			prev_leaf_part_index = -1;
 
 #define MAX_BUFFERED_TUPLES 1000
 	HeapTuple  *bufferedTuples = NULL;	/* initialize to silence warning */
@@ -2562,6 +2563,17 @@ CopyFrom(CopyState cstate)
    leaf_part_index < cstate->num_partitions);
 
 			/*
+			 * If the 

Re: [HACKERS] Declarative partitioning vs. BulkInsertState

2017-01-18 Thread Robert Haas
On Wed, Jan 11, 2017 at 10:53 PM, Amit Langote
 wrote:
> On 2017/01/06 20:23, Amit Langote wrote:
>> On 2017/01/05 3:26, Robert Haas wrote:
>>> It's unclear to me why we need to do 0002.  It doesn't seem like it
>>> should be necessary, it doesn't seem like a good idea, and the commit
>>> message you proposed is uninformative.
>>
>> If a single BulkInsertState object is passed to
>> heap_insert()/heap_multi_insert() for different heaps corresponding to
>> different partitions (from one input tuple to next), tuples might end up
>> going into wrong heaps (like demonstrated in one of the reports [1]).  A
>> simple solution is to disable bulk-insert in case of partitioned tables.
>>
>> But my patch (or its motivations) was slightly wrongheaded, wherein I
>> conflated multi-insert stuff and bulk-insert considerations.  I revised
>> 0002 to not do that.
>
> Ragnar Ouchterlony pointed out [1] on pgsql-bugs that 0002 wasn't correct.
> Attaching updated 0002 along with rebased 0001 and 0003.

The BulkInsertState is not there only to improve performance.  It's
also there to make sure we use a BufferAccessStrategy, so that we
don't trash the whole buffer arena.  See commit
85e2cedf985bfecaf43a18ca17433070f439fb0e.  If a partitioned table uses
a separate BulkInsertState for each partition, I believe it will also
end up using a separate ring of buffers for every partition.  That may
well be faster than copying into an unpartitioned table in some cases,
because dirtying everything in the buffer arena without actually
writing any of those buffers is a lot faster than actually doing the
writes.  But it is also anti-social behavior; we have
BufferAccessStrategy objects for a reason.

One idea would be to have each partition use a separate
BulkInsertState but have them point to the same underlying
BufferAccessStrategy, but even that's problematic, because it could
result in us holding a gigantic number of pins (one per partition). I
think maybe a better idea would be to add an additional function
ReleaseBulkInsertStatePin() which gets called whenever we switch
relations, and then just use the same BulkInsertState throughout.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers