Re: Parallel Inserts in CREATE TABLE AS

2021-05-31 Thread Bharath Rupireddy
On Sat, May 29, 2021 at 9:46 AM Amit Kapila  wrote:
> While looking at results, I have observed one more thing that we are
> trying to parallelize I/O due to which we might not be seeing benefit
> in such cases. I think even for non-write queries there won't be any
> (much) benefit if we can't parallelize CPU usage. Basically, the test
> you are doing is for statement: explain analyze verbose create table
> test as select * from tenk1;. Now, in this statement, there is no
> qualification and still, the Gather node is generated for it, this
> won't be the case if we check "select * from tenk1". Is it due to the
> reason that the patch completely ignores the parallel_tuple_cost? But
> still, it should prefer a serial plan due parallel_setup_cost, why is
> that not happening? Anyway, I think we should not parallelize such
> queries where we can't parallelize CPU usage. Have you tried the cases
> without changing any of the costings for parallelism?

Hi,

I measured the execution timings for parallel inserts in CTAS in cases
where the planner chooses parallelism for the select naturally. This
means I used only the 0001 patch from the v23 patch set at [1]; I did
not use the 0002 patch that makes parallel_tuple_cost 0.

The query used for all these tests is below. The table-creation SQL is
attached in the file "test_cases".
EXPLAIN (ANALYZE, VERBOSE) create table test1 as select * from tenk1
t1, tenk2 t2 where t1.c1 = t2.d2;
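(The exact DDL is in the attached "test_cases" file; purely as an
illustration, the case-1 setup presumably looks something like the
following, with hypothetical column names matching the query above.)

-- hypothetical sketch of the case-1 tables (1 integer column, 100mn tuples)
CREATE UNLOGGED TABLE tenk1(c1 int);
CREATE UNLOGGED TABLE tenk2(d2 int);
INSERT INTO tenk1 SELECT generate_series(1, 100000000);
INSERT INTO tenk2 SELECT generate_series(1, 100000000);
ANALYZE tenk1;
ANALYZE tenk2;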

All the results are of the form (number of workers, execution time in milliseconds).

Test case 1: tenk1 and tenk2 each have 1 integer (4-byte) column, tuple
size 28 bytes, 100mn tuples
master: (0, 277886.951 ms), (2, 171183.221 ms), (4, 159703.496 ms)
with parallel inserts CTAS patch: (0, 264709.186 ms), (2, 128354.448 ms), (4, 111533.731 ms)

Test case 2: tenk1 and tenk2 each have 2 integer (4-byte) columns and 3
varchar(8) columns, tuple size 59 bytes, 100mn tuples
master: (0, 453505.228 ms), (2, 236762.759 ms), (4, 219038.126 ms)
with parallel inserts CTAS patch: (0, 470483.818 ms), (2, 219374.198 ms), (4, 203543.681 ms)

Test case 3: tenk1 and tenk2 each have 2 bigint (8-byte) columns, 3 name
(64-byte) columns and 1 varchar(8) column, tuple size 241 bytes, 100mn tuples
master: (0, 1052725.928 ms), (2, 592968.486 ms), (4, 562137.891 ms)
with parallel inserts CTAS patch: (0, 1019086.805 ms), (2, 634448.322 ms), (4, 680793.305 ms)

Test case 4: tenk1 and tenk2 each have 2 bigint (8-byte) columns and 16
name (64-byte) columns, tuple size 1064 bytes, 10mn tuples
master: (0, 371931.497 ms), (2, 247206.841 ms), (4, 241959.839 ms)
with parallel inserts CTAS patch: (0, 396342.329 ms), (2, 333860.472 ms), (4, 317895.558 ms)

Observation: parallel insert + parallel select gives a good benefit with
smaller tuple sizes (cases 1 and 2). If the tuple size is bigger,
serial insert + parallel select fares better (cases 3 and 4).

In the coming days, I will try to work on more performance analysis
and clarify some of the points raised upthread.

[1] - 
https://www.postgresql.org/message-id/CALj2ACXVWr1o%2BFZrkQt-2GvYfuMQeJjWohajmp62Wr6BU8Y4VA%40mail.gmail.com
[2] - postgresql.conf changes I made:
shared_buffers = 40GB
max_worker_processes = 32
max_parallel_maintenance_workers = 24
max_parallel_workers = 32
synchronous_commit = on
checkpoint_timeout = 1d
max_wal_size = 24GB
min_wal_size = 15GB
autovacuum = off
wal_level = replica

With Regards,
Bharath Rupireddy.


test_cases
Description: Binary data


Re: Parallel Inserts in CREATE TABLE AS

2021-05-28 Thread Amit Kapila
On Fri, May 28, 2021 at 8:53 AM Amit Kapila  wrote:
>
> On Thu, May 27, 2021 at 7:37 PM Bharath Rupireddy
> >
> > I captured below information with the attached patch
> > 0001-test-times-and-block-counts.patch applied on top of CTAS v23
> > patch set. Testing details are attached in the file named "test".
> > Total time spent in LockRelationForExtension
> > Total time spent in GetPageWithFreeSpace
> > Total time spent in RelationAddExtraBlocks
> > Total number of times extended the relation in bulk
> > Total number of times extended the relation by one block
> > Total number of blocks added in bulk extension
> > Total number of times getting the page from FSM
> >
>
> In your results, the number of pages each process is getting from FSM
> is not matching with the number of blocks added. I think we need to
> increment 'fsm_hit_count' in RecordAndGetPageWithFreeSpace as well
> because that is also called and the process can get a free page via
> the same. The other thing to check via debugger is when one worker
> adds the blocks in bulk does another parallel worker gets all those
> blocks. You can achieve that by allowing one worker (say worker-1) to
> extend the relation in bulk and then let it wait and allow another
> worker (say worker-2) to proceed and see if it gets all the pages
> added by worker-1 from FSM. You need to keep the leader also waiting
> or not perform any operation.
>

While looking at the results, I have observed one more thing: we are
trying to parallelize I/O, due to which we might not be seeing a benefit
in such cases. I think even for non-write queries there won't be any
(much) benefit if we can't parallelize CPU usage. Basically, the test
you are doing is for the statement: explain analyze verbose create table
test as select * from tenk1;. Now, in this statement there is no
qualification and still a Gather node is generated for it; this
won't be the case if we check "select * from tenk1". Is it because the
patch completely ignores parallel_tuple_cost? But it should still prefer
a serial plan due to parallel_setup_cost, so why is that not happening?
Anyway, I think we should not parallelize such queries where we can't
parallelize CPU usage. Have you tried the cases without changing any of
the parallelism cost settings?
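For reference, the relevant cost defaults and a quick way to compare the
two plans (a sketch, assuming the tenk1 table used in these tests):

SHOW parallel_setup_cost;   -- default 1000
SHOW parallel_tuple_cost;   -- default 0.1

-- with the 0001 patch applied, the bare SELECT typically stays serial,
-- while the CTAS wrapper gets a Gather node (plain EXPLAIN does not
-- actually create the table)
EXPLAIN SELECT * FROM tenk1;
EXPLAIN CREATE TABLE test AS SELECT * FROM tenk1;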

-- 
With Regards,
Amit Kapila.




RE: Parallel Inserts in CREATE TABLE AS

2021-05-28 Thread houzj.f...@fujitsu.com
From: Bharath Rupireddy 
Sent: Thursday, May 27, 2021 10:07 PM
> On Thu, May 27, 2021 at 9:53 AM Bharath Rupireddy
>  wrote:
> > > One idea to find this out could be that we have three counters for
> > > each worker which counts the number of times each worker extended
> > > the relation in bulk, the number of times each worker extended the
> > > relation by one block, the number of times each worker gets the page
> > > from FSM. It might be possible that with this we will be able to
> > > figure out why there is a difference between your and Hou-San's
> > > results.
> >
> > Yeah, that helps. And also, the time spent in
> > LockRelationForExtension, ConditionalLockRelationForExtension,
> > GetPageWithFreeSpace and RelationAddExtraBlocks too can give some
> > insight.
> >
> > My plan is to have a patch with above info added in (which I will
> > share it here so that others can test and see the results too) and run
> > the "case 4" where there's a regression seen on my system.
> 
> I captured below information with the attached patch
> 0001-test-times-and-block-counts.patch applied on top of CTAS v23 patch set.
> Testing details are attached in the file named "test".
> Total time spent in LockRelationForExtension
> Total time spent in GetPageWithFreeSpace
> Total time spent in RelationAddExtraBlocks
> Total number of times extended the relation in bulk
> Total number of times extended the relation by one block
> Total number of blocks added in bulk extension
> Total number of times getting the page from FSM
> 
> Here is a summary of what I observed:
> 1) The execution time with 2 workers, without TABLE_INSERT_SKIP_FSM
> (140 sec) is more than with 0 workers (112 sec)
> 2) The execution time with 2 workers, with TABLE_INSERT_SKIP_FSM (225
> sec) is more than with 2 workers, without TABLE_INSERT_SKIP_FSM (140
> sec)
> 3) Majority of the time is going into waiting for relation extension lock
> in LockRelationForExtension. With 2 workers, without TABLE_INSERT_SKIP_FSM,
> out of total execution time 140 sec, the time spent in
> LockRelationForExtension is ~40 sec and the time spent in
> RelationAddExtraBlocks is ~20 sec. So, ~60 sec are being spent in these two
> functions. With 2 workers, with TABLE_INSERT_SKIP_FSM, out of total
> execution time 225 sec, the time spent in LockRelationForExtension is
> ~135 sec and the time spent in RelationAddExtraBlocks is 0 sec (because we
> skip FSM, no bulk extend logic applies). So, most of the time is being
> spent in LockRelationForExtension.
> 
> I'm still not sure why the execution time with 0 workers (or serial
> execution or no parallelism involved) on my testing system is 112 sec
> compared to 58 sec on Hou-San's system for the same use case. Maybe the
> testing system I'm using is not of the latest configuration compared to
> others.
> 
> Having said that, I request others to try and see if the same observations
> (as above) are made on their testing systems for the same use case. If
> others don't see regression (with just 2 workers) or they observe not much
> difference with and without TABLE_INSERT_SKIP_FSM.

Thanks for the patch!
I attached my test results. Note that I did not change wal_level to minimal.

I only changed the following configuration:

shared_buffers = 40GB
max_worker_processes = 32
max_parallel_maintenance_workers = 24
max_parallel_workers = 32
synchronous_commit = off
checkpoint_timeout = 1d
max_wal_size = 24GB
min_wal_size = 15GB
autovacuum = off

Best regards,
houzj


test_results
Description: test_results


Re: Parallel Inserts in CREATE TABLE AS

2021-05-27 Thread Bharath Rupireddy
On Fri, May 28, 2021 at 6:24 AM tsunakawa.ta...@fujitsu.com
 wrote:
>
> From: Bharath Rupireddy 
> > I'm still not sure why the execution time with 0 workers (or serial 
> > execution or
> > no parallelism involved) on my testing system is 112 sec compared to 58 sec 
> > on
> > Hou-San's system for the same use case. Maybe the testing system I'm using
> > is not of the latest configuration compared to others.
>
> What's the setting of wal_level on your two's systems?  I thought it could be 
> that you set it to > minimal, while Hou-san set it to minimal.  (I forgot the 
> results of 2 and 4 workers, though.)

Thanks. I was earlier running with default wal_level = replica.
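(For reference, a sketch of how the switch can be made; note that on
recent releases wal_level = minimal also requires max_wal_senders = 0,
and both need a server restart.)

ALTER SYSTEM SET wal_level = 'minimal';
ALTER SYSTEM SET max_wal_senders = 0;
-- restart the server, then verify:
SHOW wal_level;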

Results on my system, with wal_level = minimal, PSA file
"test_results2" for more details:
Without TABLE_INSERT_SKIP_FSM:
0 workers/serial execution - Time: 61875.255 ms (01:01.875)
2 workers - Time: 89227.379 ms (01:29.227)
4 workers - Time: 81484.876 ms (01:21.485)
With TABLE_INSERT_SKIP_FSM:
0 workers/serial execution - Time: 61279.764 ms (01:01.280)
2 workers - Time: 208620.453 ms (03:28.620)
4 workers - Time: 223737.081 ms (03:43.737)

Results on my system, with wal_level = replica, PSA file
"test_results1" for more details:
Without TABLE_INSERT_SKIP_FSM:
0 workers/serial execution - Time: 112175.273 ms (01:52.175)
2 workers - Time: 140441.158 ms (02:20.441)
4 workers - Time: 141750.577 ms (02:21.751)

With TABLE_INSERT_SKIP_FSM:
0 workers/serial execution - Time: 112637.906 ms (01:52.638)
2 workers - Time: 225358.287 ms (03:45.358)
4 workers - Time: 242172.600 ms (04:02.173)

Results on Hou-san's system:
SERIAL: 58759.213 ms
PARALLEL 2 WORKER [NOT SKIP FSM]: 68390.221 ms  [SKIP FSM]: 58633.924 ms
PARALLEL 4 WORKER [NOT SKIP FSM]: 67448.142 ms   [SKIP FSM]: 66,960.305 ms

The majority of the time is being spent in LockRelationForExtension and
RelationAddExtraBlocks without TABLE_INSERT_SKIP_FSM, and in
LockRelationForExtension with TABLE_INSERT_SKIP_FSM. The observations
made at [1] still hold true with wal_level = minimal.

I request Hou-san to capture the same info with the add-on patch
shared earlier. This would help us to be on the same page. We can
further think on:
1) Why is so much time being spent in LockRelationForExtension? (See the
monitoring sketch just below this list.)
2) Whether to use TABLE_INSERT_SKIP_FSM or not, in other words,
whether to take advantage of bulk relation extension or not.
3) If bulk relation extension is to be used i.e. without
TABLE_INSERT_SKIP_FSM flag, then whether the blocks being added by one
worker are immediately visible to other workers or not after it
finishes adding all the blocks.
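For point 1, a sketch of how the extension-lock waits can be observed
from another session while the CTAS is running (relation extension lock
waits appear in pg_locks with locktype = 'extend'):

SELECT a.pid, a.wait_event_type, a.wait_event, l.relation::regclass
FROM pg_stat_activity a
JOIN pg_locks l ON l.pid = a.pid
WHERE l.locktype = 'extend' AND NOT l.granted;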

[1] - 
https://www.postgresql.org/message-id/CALj2ACV-VToW65BE6ndDEB7S_3qhzQ_BUWtw2q6V88iwTwwPSg%40mail.gmail.com

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com


test_results1
Description: Binary data


test_results2
Description: Binary data


Re: Parallel Inserts in CREATE TABLE AS

2021-05-27 Thread Amit Kapila
On Thu, May 27, 2021 at 7:37 PM Bharath Rupireddy
 wrote:
>
> On Thu, May 27, 2021 at 9:53 AM Bharath Rupireddy
>  wrote:
> > > One idea to find this out could be that we have three counters for
> > > each worker which counts the number of times each worker extended the
> > > relation in bulk, the number of times each worker extended the
> > > relation by one block, the number of times each worker gets the page
> > > from FSM. It might be possible that with this we will be able to
> > > figure out why there is a difference between your and Hou-San's
> > > results.
> >
> > Yeah, that helps. And also, the time spent in
> > LockRelationForExtension, ConditionalLockRelationForExtension,
> > GetPageWithFreeSpace and RelationAddExtraBlocks too can give some
> > insight.
> >
> > My plan is to have a patch with above info added in (which I will
> > share it here so that others can test and see the results too) and run
> > the "case 4" where there's a regression seen on my system.
>
> I captured below information with the attached patch
> 0001-test-times-and-block-counts.patch applied on top of CTAS v23
> patch set. Testing details are attached in the file named "test".
> Total time spent in LockRelationForExtension
> Total time spent in GetPageWithFreeSpace
> Total time spent in RelationAddExtraBlocks
> Total number of times extended the relation in bulk
> Total number of times extended the relation by one block
> Total number of blocks added in bulk extension
> Total number of times getting the page from FSM
>

In your results, the number of pages each process is getting from the FSM
does not match the number of blocks added. I think we need to
increment 'fsm_hit_count' in RecordAndGetPageWithFreeSpace as well,
because that is also called and the process can get a free page via
it. The other thing to check via a debugger is whether, when one worker
adds blocks in bulk, another parallel worker gets all those
blocks. You can achieve that by allowing one worker (say worker-1) to
extend the relation in bulk, then let it wait and allow another
worker (say worker-2) to proceed, and see if it gets all the pages
added by worker-1 from the FSM. You need to keep the leader waiting as
well, or have it not perform any operation.

-- 
With Regards,
Amit Kapila.




RE: Parallel Inserts in CREATE TABLE AS

2021-05-27 Thread houzj.f...@fujitsu.com
From: Tsunakawa, Takayuki/綱川 貴之 
Sent: Friday, May 28, 2021 8:55 AM
> 
> From: Bharath Rupireddy 
> > I'm still not sure why the execution time with 0 workers (or serial
> > execution or no parallelism involved) on my testing system is 112 sec
> > compared to 58 sec on Hou-San's system for the same use case. Maybe
> > the testing system I'm using is not of the latest configuration compared to
> others.
> 
> What's the setting of wal_level on your two's systems?  I thought it could be
> that you set it to > minimal, while Hou-san set it to minimal.  (I forgot the
> results of 2 and 4 workers, though.)

I think I followed the configuration that Bharath-san mentioned.
It could be a hardware difference, because I am not using an SSD.
I will try to test on an SSD to see if there is any difference.

I only changed the following configuration:

shared_buffers = 40GB
max_worker_processes = 32
max_parallel_maintenance_workers = 24
max_parallel_workers = 32
synchronous_commit = off
checkpoint_timeout = 1d
max_wal_size = 24GB
min_wal_size = 15GB
autovacuum = off

Best regards,
houzj


RE: Parallel Inserts in CREATE TABLE AS

2021-05-27 Thread tsunakawa.ta...@fujitsu.com
From: Bharath Rupireddy 
> I'm still not sure why the execution time with 0 workers (or serial execution 
> or
> no parallelism involved) on my testing system is 112 sec compared to 58 sec on
> Hou-San's system for the same use case. Maybe the testing system I'm using
> is not of the latest configuration compared to others.

What's the setting of wal_level on your two systems?  I thought it could be
that you set it to something higher than minimal, while Hou-san set it to
minimal.  (I forgot the results of 2 and 4 workers, though.)


Regards
Takayuki Tsunakawa



Re: Parallel Inserts in CREATE TABLE AS

2021-05-27 Thread Bharath Rupireddy
On Thu, May 27, 2021 at 9:53 AM Bharath Rupireddy
 wrote:
> > One idea to find this out could be that we have three counters for
> > each worker which counts the number of times each worker extended the
> > relation in bulk, the number of times each worker extended the
> > relation by one block, the number of times each worker gets the page
> > from FSM. It might be possible that with this we will be able to
> > figure out why there is a difference between your and Hou-San's
> > results.
>
> Yeah, that helps. And also, the time spent in
> LockRelationForExtension, ConditionalLockRelationForExtension,
> GetPageWithFreeSpace and RelationAddExtraBlocks too can give some
> insight.
>
> My plan is to have a patch with above info added in (which I will
> share it here so that others can test and see the results too) and run
> the "case 4" where there's a regression seen on my system.

I captured the below information with the attached patch
0001-test-times-and-block-counts.patch applied on top of the CTAS v23
patch set. Testing details are attached in the file named "test".
Total time spent in LockRelationForExtension
Total time spent in GetPageWithFreeSpace
Total time spent in RelationAddExtraBlocks
Total number of times extended the relation in bulk
Total number of times extended the relation by one block
Total number of blocks added in bulk extension
Total number of times getting the page from FSM

Here is a summary of what I observed:
1) The execution time with 2 workers, without TABLE_INSERT_SKIP_FSM
(140 sec) is more than with 0 workers (112 sec)
2) The execution time with 2 workers, with TABLE_INSERT_SKIP_FSM (225
sec) is more than with 2 workers, without TABLE_INSERT_SKIP_FSM (140
sec)
3) The majority of the time is going into waiting for the relation
extension lock in LockRelationForExtension. With 2 workers, without
TABLE_INSERT_SKIP_FSM, out of total execution time 140 sec, the time
spent in LockRelationForExtension is ~40 sec and the time spent in
RelationAddExtraBlocks is ~20 sec. So, ~60 sec are being spent in
these two functions. With 2 workers, with TABLE_INSERT_SKIP_FSM, out
of total execution time 225 sec, the time spent in
LockRelationForExtension is ~135 sec and the time spent in
RelationAddExtraBlocks is 0 sec (because we skip FSM, no bulk extend
logic applies). So, most of the time is being spent in
LockRelationForExtension.

I'm still not sure why the execution time with 0 workers (or serial
execution, i.e. no parallelism involved) on my testing system is 112 sec
compared to 58 sec on Hou-San's system for the same use case. Maybe the
testing system I'm using has an older configuration than the others.

Having said that, I request others to try and see if the same
observations (as above) are made on their testing systems for the same
use case. If others don't see a regression (with just 2 workers), or they
observe not much difference with and without TABLE_INSERT_SKIP_FSM,
I'm open to changing the parallel inserts in CTAS code to use
TABLE_INSERT_SKIP_FSM. In any case, if the observation is that a good
amount of time is being spent in LockRelationForExtension, I'm not
sure (at this point) whether we can do something about it or just live
with it.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com
From 105495d5aa6b25f61c812dd5cb9ff692abe38373 Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy 
Date: Thu, 27 May 2021 13:37:32 +0530
Subject: [PATCH] test times and block counts

---
 src/backend/access/heap/hio.c | 21 
 src/backend/commands/createas.c   | 41 +++
 src/backend/storage/freespace/freespace.c | 22 +++-
 src/backend/storage/lmgr/lmgr.c   | 13 +++
 src/include/commands/createas.h   | 17 ++
 5 files changed, 113 insertions(+), 1 deletion(-)

diff --git a/src/backend/access/heap/hio.c b/src/backend/access/heap/hio.c
index d34edb4190..d737c2d763 100644
--- a/src/backend/access/heap/hio.c
+++ b/src/backend/access/heap/hio.c
@@ -19,6 +19,8 @@
 #include "access/hio.h"
 #include "access/htup_details.h"
 #include "access/visibilitymap.h"
+#include "commands/createas.h"
+#include "portability/instr_time.h"
 #include "storage/bufmgr.h"
 #include "storage/freespace.h"
 #include "storage/lmgr.h"
@@ -198,12 +200,17 @@ RelationAddExtraBlocks(Relation relation, BulkInsertState bistate)
 	BlockNumber firstBlock = InvalidBlockNumber;
 	int			extraBlocks;
 	int			lockWaiters;
+	instr_time	start;
+	instr_time	end;
 
 	/* Use the length of the lock wait queue to judge how much to extend. */
 	lockWaiters = RelationExtensionLockWaiterCount(relation);
 	if (lockWaiters <= 0)
 		return;
 
+	if (is_ctas)
+		INSTR_TIME_SET_CURRENT(start);
+
 	/*
 	 * It might seem like multiplying the number of lock waiters by as much as
 	 * 20 is too aggressive, but benchmarking revealed that smaller numbers
@@ -212,6 +219,9 @@ RelationAddExtraBlocks(Relation relation, BulkInsertState bistate)
 	 */
 	extraBlocks = Min(512, 

RE: Parallel Inserts in CREATE TABLE AS

2021-05-27 Thread houzj.f...@fujitsu.com
From: Bharath Rupireddy 
Sent: Thursday, May 27, 2021 3:41 PM
> 
> On Thu, May 27, 2021 at 1:03 PM tsunakawa.ta...@fujitsu.com
>  wrote:
> >
> > From: houzj.f...@fujitsu.com 
> > > Although, the 4 workers case still has performance degradation
> > > compared to serial case.
> > >
> > > SERIAL: 58759.213 ms
> > > PARALLEL 2 WORKER [NOT SKIP FSM]: 68390.221 ms  [SKIP FSM]:
> > > 58633.924 ms
> > > PARALLEL 4 WORKER [NOT SKIP FSM]: 67448.142 ms   [SKIP FSM]:
> > > 66,960.305 ms
> >
> > Can you see any difference in table sizes?
> 
> Also, the number of pages the table occupies in each case along with table 
> size
> would give more insights.
> 
> I do as follows to get the number of pages a relation occupies:
> CREATE EXTENSION pgstattuple;
> SELECT pg_relpages('test');

It seems the difference between SKIP FSM and NOT SKIP FSM is not big.
I tried several times and the average result is almost the same.

pg_relpages
-----------
 1428575

pg_relation_size
----------------
 11702976512 (11GB)

Best regards,
houzj



Re: Parallel Inserts in CREATE TABLE AS

2021-05-27 Thread Bharath Rupireddy
On Thu, May 27, 2021 at 2:26 PM Amit Kapila  wrote:
> > I think some other cause of contention on relation extension locks are
> > 1. CTAS is using a buffer strategy and due to that, it might need to
> > evict out the buffer frequently for getting the new block in.  Maybe
> > we can identify by turning off the buffer strategy for CTAS and
> > increasing the shared buffer so that data fits in memory.
> >
>
> One more thing to ensure is whether all the workers are using the same
> access strategy?

In the Parallel Inserts in CTAS patches, the leader and each worker use
their own 16MB ring buffer, i.e. each does myState->bistate =
GetBulkInsertState(); separately.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com




Re: Parallel Inserts in CREATE TABLE AS

2021-05-27 Thread Amit Kapila
On Thu, May 27, 2021 at 10:27 AM Dilip Kumar  wrote:
>
> On Thu, May 27, 2021 at 10:16 AM Bharath Rupireddy
>  wrote:
> >
> > On Thu, May 27, 2021 at 7:12 AM houzj.f...@fujitsu.com
> >  wrote:
> > > I am afraid that the using the FSM seems not get a stable performance 
> > > gain(at least on my machine),
> > > I will take a deep look into this to figure out the difference. A naive 
> > > idea it that the benefit that bulk extension
> > > bring is not much greater than the cost in FSM.
> > > Do you have some ideas on it ?
> >
> > I think, if we try what Amit and I said in [1], we should get some
> > insights on whether the bulk relation extension is taking more time or
> > the FSM lookup. I plan to share the testing patch adding the timings
> > and the counters so that you can also test from your end. I hope
> > that's fine with you.
>
> I think some other cause of contention on relation extension locks are
> 1. CTAS is using a buffer strategy and due to that, it might need to
> evict out the buffer frequently for getting the new block in.  Maybe
> we can identify by turning off the buffer strategy for CTAS and
> increasing the shared buffer so that data fits in memory.
>

One more thing to ensure is whether all the workers are using the same
access strategy?

-- 
With Regards,
Amit Kapila.




RE: Parallel Inserts in CREATE TABLE AS

2021-05-27 Thread tsunakawa.ta...@fujitsu.com
From: Bharath Rupireddy 
> I think we can discuss this in a separate thread and see what other
> hackers think.

OK, as long as we don't get stuck in the current direction.  (Our goal is
not just to avoid degrading performance, but to outperform serial
execution, isn't it?)


> If the idea is to give the user control of whether or not to use the
> separate RING BUFFER for bulk inserts/writes, then how about giving it
> as a rel option? Currently BAS_BULKWRITE (GetBulkInsertState), is
> being used by CTAS, Refresh Mat View, Table Rewrites (ATRewriteTable)
> and COPY. Furthermore, we could make the rel option an integer and
> allow users to provide the size of the ring buffer they want to choose
> for a particular bulk insert operation (of course with a max limit
> which is not exceeding the shared buffers or some reasonable amount
> not exceeding the RAM of the system).

I think it's not a table property but an execution property.  So, it'd be 
appropriate to control it with the SET command, just like the DBA sets work_mem 
and maintenance_work_mem for specific maintenance operations.

I'll stop on this here...


Regards
Takayuki Tsunakawa



Re: Parallel Inserts in CREATE TABLE AS

2021-05-27 Thread Bharath Rupireddy
On Thu, May 27, 2021 at 1:03 PM tsunakawa.ta...@fujitsu.com
 wrote:
>
> From: houzj.f...@fujitsu.com 
> > Although, the 4 workers case still has performance degradation compared to
> > serial case.
> >
> > SERIAL: 58759.213 ms
> > PARALLEL 2 WORKER [NOT SKIP FSM]: 68390.221 ms  [SKIP FSM]:
> > 58633.924 ms
> > PARALLEL 4 WORKER [NOT SKIP FSM]: 67448.142 ms   [SKIP FSM]:
> > 66,960.305 ms
>
> Can you see any difference in table sizes?

Also, the number of pages the table occupies in each case along with
table size would give more insights.

I do as follows to get the number of pages a relation occupies:
CREATE EXTENSION pgstattuple;
SELECT pg_relpages('test');
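For the table size itself, the following companion check works (assuming
the CTAS target is named "test"):

SELECT pg_size_pretty(pg_relation_size('test'));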

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com




RE: Parallel Inserts in CREATE TABLE AS

2021-05-27 Thread tsunakawa.ta...@fujitsu.com
From: houzj.f...@fujitsu.com 
> Although, the 4 workers case still has performance degradation compared to
> serial case.
> 
> SERIAL: 58759.213 ms
> PARALLEL 2 WORKER [NOT SKIP FSM]: 68390.221 ms  [SKIP FSM]:
> 58633.924 ms
> PARALLEL 4 WORKER [NOT SKIP FSM]: 67448.142 ms   [SKIP FSM]:
> 66,960.305 ms

Can you see any difference in table sizes?


Regards
Takayuki Tsunakawa



Re: Parallel Inserts in CREATE TABLE AS

2021-05-27 Thread Bharath Rupireddy
On Thu, May 27, 2021 at 12:46 PM tsunakawa.ta...@fujitsu.com
 wrote:
>
> From: Dilip Kumar 
> Basically you are creating a new table and loading data to it and that means 
> you will be less likely to access those data soon so for such thing spoiling 
> buffer cache may not be a good idea.
> --
>
> Some people, including me, would say that the table will be accessed soon and 
> that's why the data is loaded quickly during minimal maintenance hours.
>
>
> --
> I was just suggesting only for experiments for identifying the root cause.
> --
>
> I thought this is a good chance to possibly change things better (^^).
> I guess the user would simply think like this: "I just want to finish CTAS as 
> quickly as possible, so I configured to take advantage of parallelism.  I 
> want CTAS to make most use of our resources.  Why doesn't Postgres try to 
> limit resource usage (by using the ring buffer) against my will?"

If the idea is to give the user control of whether or not to use the
separate ring buffer for bulk inserts/writes, then how about exposing it
as a rel option? Currently BAS_BULKWRITE (GetBulkInsertState) is
being used by CTAS, REFRESH MATERIALIZED VIEW, table rewrites (ATRewriteTable)
and COPY. Furthermore, we could make the rel option an integer and
allow users to provide the size of the ring buffer they want to use
for a particular bulk insert operation (of course with a max limit
not exceeding shared buffers, or some reasonable amount not exceeding
the RAM of the system).
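Purely as an illustration of the idea (the option name below is
hypothetical; no such reloption exists today):

-- hypothetical reloption: bulk-insert ring buffer size, e.g. in 8kB blocks
CREATE TABLE test WITH (bulk_insert_ring_size = 2048) AS
SELECT * FROM tenk1;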

I think we can discuss this in a separate thread and see what other
hackers think.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com




RE: Parallel Inserts in CREATE TABLE AS

2021-05-27 Thread tsunakawa.ta...@fujitsu.com
From: Dilip Kumar  
Basically you are creating a new table and loading data to it and that means 
you will be less likely to access those data soon so for such thing spoiling 
buffer cache may not be a good idea.
--

Some people, including me, would say that the table will be accessed soon and 
that's why the data is loaded quickly during minimal maintenance hours.


--
I was just suggesting only for experiments for identifying the root cause.
--

I thought this is a good chance to possibly change things for the better (^^).
I guess the user would simply think like this: "I just want to finish CTAS as 
quickly as possible, so I configured to take advantage of parallelism.  I want 
CTAS to make most use of our resources.  Why doesn't Postgres try to limit 
resource usage (by using the ring buffer) against my will?"


Regards
Takayuki Tsunakawa



RE: Parallel Inserts in CREATE TABLE AS

2021-05-27 Thread houzj.f...@fujitsu.com
From: Bharath Rupireddy 
Sent: Thursday, May 27, 2021 2:59 PM
> On Thu, May 27, 2021 at 12:19 PM houzj.f...@fujitsu.com
>  wrote:
> > BTW, I checked my test results, I was testing INSERT INTO unlogged table.
> 
> What do you mean by "testing INSERT INTO"? Is it that you are testing the
> timings for parallel inserts in INSERT INTO ... SELECT command? If so, why
> should we test parallel inserts in the INSERT INTO ... SELECT command here?

Oops, sorry, it's a typo, I actually meant CREATE TABLE AS SELECT.

Best regards,
houzj


Re: Parallel Inserts in CREATE TABLE AS

2021-05-27 Thread Bharath Rupireddy
On Thu, May 27, 2021 at 12:19 PM houzj.f...@fujitsu.com
 wrote:
> BTW, I checked my test results, I was testing INSERT INTO unlogged table.

What do you mean by "testing INSERT INTO"? Is it that you are testing
the timings for parallel inserts in INSERT INTO ... SELECT command? If
so, why should we test parallel inserts in the INSERT INTO ... SELECT
command here?

The way I test parallel inserts in CTAS is: Apply the latest v23 patch
set available at [1]. Run the data preparation sqls from [2]. Enable
timing and run the CTAS query from [3]. Run with 0, 2 and 4 workers
with leader participation on.

[1] - 
https://www.postgresql.org/message-id/CALj2ACXVWr1o%2BFZrkQt-2GvYfuMQeJjWohajmp62Wr6BU8Y4VA%40mail.gmail.com

[2]
DROP TABLE tenk1;
CREATE UNLOGGED TABLE tenk1(c1 bigint, c2 bigint, c3 name, c4 name, c5
name, c6 name, c7 name, c8 name, c9 name, c10 name, c11 name, c12
name, c13 name, c14 name, c15 name, c16 name, c17 name, c18 name);
INSERT INTO tenk1 values(generate_series(1,10),
generate_series(1,1000),
upper(substring(md5(random()::varchar),2,8)),
upper(substring(md5(random()::varchar),2,8)),
upper(substring(md5(random()::varchar),2,8)),
upper(substring(md5(random()::varchar),2,8)),
upper(substring(md5(random()::varchar),2,8)),
upper(substring(md5(random()::varchar),2,8)),
upper(substring(md5(random()::varchar),2,8)),
upper(substring(md5(random()::varchar),2,8)),
upper(substring(md5(random()::varchar),2,8)),
upper(substring(md5(random()::varchar),2,8)),
upper(substring(md5(random()::varchar),2,8)),
upper(substring(md5(random()::varchar),2,8)),
upper(substring(md5(random()::varchar),2,8)),
upper(substring(md5(random()::varchar),2,8)),
upper(substring(md5(random()::varchar),2,8)),
upper(substring(md5(random()::varchar),2,8)));

[3]
EXPLAIN ANALYZE VERBOSE CREATE TABLE test AS SELECT * FROM tenk1;
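(The worker count per run is presumably capped with the standard GUC; the
exact settings are not spelled out here, so the following is a sketch.)

SET max_parallel_workers_per_gather = 2;  -- 0 for serial, 2 or 4 for the parallel runs
\timing on
EXPLAIN ANALYZE VERBOSE CREATE TABLE test AS SELECT * FROM tenk1;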

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com




RE: Parallel Inserts in CREATE TABLE AS

2021-05-27 Thread houzj.f...@fujitsu.com
From: Bharath Rupireddy 
Sent: Thursday, May 27, 2021 12:46 PM
> On Thu, May 27, 2021 at 7:12 AM houzj.f...@fujitsu.com
>  wrote:
> > I am afraid that the using the FSM seems not get a stable performance
> > gain(at least on my machine), I will take a deep look into this to
> > figure out the difference. A naive idea it that the benefit that bulk 
> > extension
> bring is not much greater than the cost in FSM.
> > Do you have some ideas on it ?
> 
> I think, if we try what Amit and I said in [1], we should get some insights on
> whether the bulk relation extension is taking more time or the FSM lookup. I
> plan to share the testing patch adding the timings and the counters so that 
> you
> can also test from your end. I hope that's fine with you.

Sure, it will be nice if we can calculate the exact time. Thanks in advance.

BTW, I checked my test results: I was testing INSERT INTO an unlogged table.
I re-tested INSERT INTO a normal (logged) table again, and it seems [SKIP FSM]
still looks slightly better.
Although, the 4-worker case still has performance degradation compared to
the serial case.

SERIAL: 58759.213 ms
PARALLEL 2 WORKER [NOT SKIP FSM]: 68390.221 ms  [SKIP FSM]: 58633.924 ms
PARALLEL 4 WORKER [NOT SKIP FSM]: 67448.142 ms   [SKIP FSM]: 66,960.305 ms

Best regards,
houzj




Re: Parallel Inserts in CREATE TABLE AS

2021-05-27 Thread Dilip Kumar
On Thu, 27 May 2021 at 11:32 AM, tsunakawa.ta...@fujitsu.com <
tsunakawa.ta...@fujitsu.com> wrote:

> From: Dilip Kumar 
> > I think some other cause of contention on relation extension locks are
> > 1. CTAS is using a buffer strategy and due to that, it might need to
> > evict out the buffer frequently for getting the new block in.  Maybe
> > we can identify by turning off the buffer strategy for CTAS and
> > increasing the shared buffer so that data fits in memory.
>
> Yes, both Bhrath-san (on a rich-man's machine) and I (on a poor-man's VM)
> saw that it's effective.  I think we should remove this shackle from CTAS.
>
> The question is why CTAS chose to use BULKWRITE strategy in the past.  We
> need to know that to make a better decision.


Basically, you are creating a new table and loading data into it, which
means you are less likely to access that data soon, so for such a case
spoiling the buffer cache may not be a good idea.   I was suggesting it
only as an experiment for identifying the root cause.

>
-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com


RE: Parallel Inserts in CREATE TABLE AS

2021-05-27 Thread tsunakawa.ta...@fujitsu.com
From: Dilip Kumar 
> I think some other cause of contention on relation extension locks are
> 1. CTAS is using a buffer strategy and due to that, it might need to
> evict out the buffer frequently for getting the new block in.  Maybe
> we can identify by turning off the buffer strategy for CTAS and
> increasing the shared buffer so that data fits in memory.

Yes, both Bharath-san (on a rich man's machine) and I (on a poor man's VM) saw 
that it's effective.  I think we should remove this shackle from CTAS.

The question is why CTAS chose to use the BULKWRITE strategy in the past.  We need 
to know that to make a better decision.  I can understand why VACUUM uses a 
ring buffer, because it should want to act humbly as a background maintenance 
task so as not to cause trouble for frontend tasks.  But why does CTAS have to be 
humble?  If CTAS needs to be modest, why doesn't it use the BULKREAD strategy 
for its SELECT?


Regards
Takayuki Tsunakawa



Re: Parallel Inserts in CREATE TABLE AS

2021-05-26 Thread Dilip Kumar
On Thu, May 27, 2021 at 10:16 AM Bharath Rupireddy
 wrote:
>
> On Thu, May 27, 2021 at 7:12 AM houzj.f...@fujitsu.com
>  wrote:
> > I am afraid that the using the FSM seems not get a stable performance 
> > gain(at least on my machine),
> > I will take a deep look into this to figure out the difference. A naive 
> > idea it that the benefit that bulk extension
> > bring is not much greater than the cost in FSM.
> > Do you have some ideas on it ?
>
> I think, if we try what Amit and I said in [1], we should get some
> insights on whether the bulk relation extension is taking more time or
> the FSM lookup. I plan to share the testing patch adding the timings
> and the counters so that you can also test from your end. I hope
> that's fine with you.

I think some other causes of contention on relation extension locks are:
1. CTAS is using a buffer strategy, and due to that it might need to
evict buffers frequently to bring new blocks in.  Maybe we can identify
this by turning off the buffer strategy for CTAS and increasing shared
buffers so that the data fits in memory.

2. I think the parallel workers are scanning and producing a lot of
tuples in a short time, so the demand for new blocks is very high
compared to what the extra-block addition (RelationAddExtraBlocks) is
able to produce, so maybe you can try adding more blocks by increasing
the multiplier and see what the impact is.

3. Also try cases where the underlying select query has some complex
condition and selects fewer records, say 50%, 40% ... 10%, and see
what the numbers are (a sketch of such a variant follows below).
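For point 3, a hypothetical selective variant of the CTAS used in this
thread (the 10% predicate is just an example):

EXPLAIN ANALYZE VERBOSE
CREATE TABLE test_sel AS SELECT * FROM tenk1 WHERE c1 % 10 = 0;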

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: Parallel Inserts in CREATE TABLE AS

2021-05-26 Thread Bharath Rupireddy
On Thu, May 27, 2021 at 7:12 AM houzj.f...@fujitsu.com
 wrote:
> I am afraid that the using the FSM seems not get a stable performance gain(at 
> least on my machine),
> I will take a deep look into this to figure out the difference. A naive idea 
> it that the benefit that bulk extension
> bring is not much greater than the cost in FSM.
> Do you have some ideas on it ?

I think, if we try what Amit and I said in [1], we should get some
insights on whether the bulk relation extension is taking more time or
the FSM lookup. I plan to share the testing patch adding the timings
and the counters so that you can also test from your end. I hope
that's fine with you.

[1] - 
https://www.postgresql.org/message-id/CALj2ACXskhY58%3DFh8TioKLL1DXYkKdyEyWFYykf-6aLJgJ2qmQ%40mail.gmail.com

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com




Re: Parallel Inserts in CREATE TABLE AS

2021-05-26 Thread Bharath Rupireddy
On Thu, May 27, 2021 at 9:43 AM Amit Kapila  wrote:
> > I have read it but I think we should try to ensure practically what is
> > happening because it is possible that first time worker checked in FSM
> > without taking relation extension lock, it didn't find any free page,
> > and then when it tried to acquire the conditional lock, it got the
> > same and just extended the relation by one block. So, in such a case
> > it won't be able to use the newly added pages by another worker. I am
> > not sure any such thing is happening here but I think it is better to
> > verify it in some way. Also, I am not sure if just getting the info
> > about the relation extension lock is sufficient?
> >
>
> One idea to find this out could be that we have three counters for
> each worker which counts the number of times each worker extended the
> relation in bulk, the number of times each worker extended the
> relation by one block, the number of times each worker gets the page
> from FSM. It might be possible that with this we will be able to
> figure out why there is a difference between your and Hou-San's
> results.

Yeah, that helps. And also, the time spent in
LockRelationForExtension, ConditionalLockRelationForExtension,
GetPageWithFreeSpace and RelationAddExtraBlocks too can give some
insight.

My plan is to have a patch with the above info added (which I will
share here so that others can test and see the results too) and run
"case 4", where a regression is seen on my system.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com




Re: Parallel Inserts in CREATE TABLE AS

2021-05-26 Thread Amit Kapila
On Wed, May 26, 2021 at 5:51 PM Amit Kapila  wrote:
>
> On Wed, May 26, 2021 at 5:28 PM Bharath Rupireddy
>  wrote:
> >
> > On Fri, May 21, 2021 at 3:46 PM Amit Kapila  wrote:
> > >
> > > On Fri, Mar 19, 2021 at 11:02 AM Bharath Rupireddy
> > >  wrote:
> > > >
> >
> > > The other possibility could
> > > be that the free pages added to FSM by one worker are not being used
> > > by another worker due to some reason. Can we debug and check if the
> > > pages added by one worker are being used by another worker?
> >
> > I tried to explain it at [1]. Please have a look.
> >
>
> I have read it but I think we should try to ensure practically what is
> happening because it is possible that first time worker checked in FSM
> without taking relation extension lock, it didn't find any free page,
> and then when it tried to acquire the conditional lock, it got the
> same and just extended the relation by one block. So, in such a case
> it won't be able to use the newly added pages by another worker. I am
> not sure any such thing is happening here but I think it is better to
> verify it in some way. Also, I am not sure if just getting the info
> about the relation extension lock is sufficient?
>

One idea to find this out could be to have three counters for
each worker which count the number of times each worker extended the
relation in bulk, the number of times each worker extended the
relation by one block, and the number of times each worker got a page
from the FSM. It might be possible that with this we will be able to
figure out why there is a difference between your and Hou-San's
results.

-- 
With Regards,
Amit Kapila.




Re: Parallel Inserts in CREATE TABLE AS

2021-05-26 Thread Bharath Rupireddy
On Thu, May 27, 2021 at 7:12 AM houzj.f...@fujitsu.com
 wrote:
> I followed your above test steps and the below configuration, but my test 
> results are a little different from yours.
> I am not sure the exact reason, maybe because of the hardware..
>
> Test INSERT 1000 rows((2 bigint(of 8 bytes) 16 name(of 64 bytes each) 
> columns):
> SERIAL: 22023.631 ms
> PARALLEL 2 WORKER [NOT SKIP FSM]: 21824.934 ms  [SKIP FSM]: 19381.474 ms
> PARALLEL 4 WORKER [NOT SKIP FSM]: 20481.117 ms   [SKIP FSM]: 18381.305 ms

I'm not sure why there's a huge difference in the execution time: on
your system it takes just ~20 sec, whereas on my system (with an SSD) it
takes ~115 sec. I hope you didn't create the table as unlogged in the
CTAS itself, right? Just for reference, the exact use case I tried is at
[1]. The configure command I used to build the postgres source code is at
[2]. I don't know whether I'm missing something here.

[1] case 4 - 2 bigint(of 8 bytes each) columns, 16 name(of 64 bytes
each) columns, tuple size 1064 bytes, 10mn tuples
DROP TABLE tenk1;
CREATE UNLOGGED TABLE tenk1(c1 bigint, c2 bigint, c3 name, c4 name, c5
name, c6 name, c7 name, c8 name, c9 name, c10 name, c11 name, c12
name, c13 name, c14 name, c15 name, c16 name, c17 name, c18 name);
INSERT INTO tenk1 values(generate_series(1,1000),
generate_series(1,1000),
upper(substring(md5(random()::varchar),2,8)),
upper(substring(md5(random()::varchar),2,8)),
upper(substring(md5(random()::varchar),2,8)),
upper(substring(md5(random()::varchar),2,8)),
upper(substring(md5(random()::varchar),2,8)),
upper(substring(md5(random()::varchar),2,8)),
upper(substring(md5(random()::varchar),2,8)),
upper(substring(md5(random()::varchar),2,8)),
upper(substring(md5(random()::varchar),2,8)),
upper(substring(md5(random()::varchar),2,8)),
upper(substring(md5(random()::varchar),2,8)),
upper(substring(md5(random()::varchar),2,8)),
upper(substring(md5(random()::varchar),2,8)),
upper(substring(md5(random()::varchar),2,8)),
upper(substring(md5(random()::varchar),2,8)),
upper(substring(md5(random()::varchar),2,8)));
explain analyze verbose create table test as select * from tenk1;

[2] ./configure --with-zlib --prefix=$PWD/inst/ --with-openssl
--with-readline  --with-libxml  > war.log && make -j 8 install >
war.log 2>&1 &

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com




RE: Parallel Inserts in CREATE TABLE AS

2021-05-26 Thread tsunakawa.ta...@fujitsu.com
Thank you for the detailed analysis, I'll look into it too.  (The times have 
changed...)

From: Bharath Rupireddy 
> Well, one might think to add more blocks at a time, say
> Min(1024, lockWaiters * 128/256/512) than currently extraBlocks =
> Min(512, lockWaiters * 20);. This will work (i.e. we don't see any
> regression with parallel inserts in CTAS patches), but it can't be a
> practical solution. Because the total pages for the relation will be
> more with many pages having more free space. Furthermore, the future
> sequential scans on that relation might take a lot of time.

> Otherwise, the similar speed up can be observed when the BAS_BULKWRITE
> is increased a bit from the current 16MB to some other reasonable
> value. I earlier tried these experiments.
> 
> Otherwise, as I said in [1], we can also increase the number of extra
> blocks added at a time, say Min(1024, lockWaiters * 128/256/512) than
> currently extraBlocks = Min(512, lockWaiters * 20);. This will also
> give some speedup and we don't see any regression with parallel
> inserts in CTAS patches.
> 
> But, I'm not so sure that the hackers will agree any of the above as a
> practical solution to the "relation extension" problem.

I think I understand your concern about resource consumption and impact on 
other concurrently running jobs (OLTP, data analysis).

OTOH, what's the situation like when the user wants to run CTAS, and further, 
wants to speed it up by using parallelism?  Isn't it okay to let the (parallel) 
CTAS use as much as it wants?  At least, I think we can provide another mode 
for it, like Oracle provides conventional-path and direct-path modes for 
INSERT and data loading.

What do we want to do to maximize parallel CTAS speedup if we were a bit 
unshackled from the current constraints (alignment with existing code, impact 
on other concurrent workloads)?

* Use as many shared buffers as possible to decrease WAL flush.
Otherwise, INSERT SELECT may be faster?

* Minimize relation extension (= increase the block count per extension)
posix_fallocate() would help too.

* Allocate added pages among parallel workers, and each worker fills pages to 
their full capacity.
The worker that extended the relation stores the page numbers of added pages in 
shared memory for parallel execution.  Each worker gets a page from there after 
waiting for the relation extension lock, instead of using FSM.
The last pages that the workers used will be filled halfway, but the amount of 
unused space should be low compared to the total table size.



Regards
Takayuki Tsunakawa



RE: Parallel Inserts in CREATE TABLE AS

2021-05-26 Thread houzj.f...@fujitsu.com
From: Bharath Rupireddy 
Sent: Wednesday, May 26, 2021 7:22 PM
> Thanks for trying that out.
> 
> Please see the code around the use_fsm flag in RelationGetBufferForTuple for
> more understanding of the points below.
> 
> What happens if FSM is skipped i.e. myState->ti_options =
> TABLE_INSERT_SKIP_FSM;?
> 1) The flag use_fsm will be false in heap_insert->RelationGetBufferForTuple.
> 2) Each worker initially gets a block and keeps inserting into it until it is 
> full.
> When the block is full, the worker doesn't look in FSM GetPageWithFreeSpace
> as use_fsm is false. It directly goes for relation extension and tries to 
> acquire
> relation extension lock with LockRelationForExtension. Note that the bulk
> extension of blocks with RelationAddExtraBlocks is not reached as use_fsm is
> false.
> 3) After acquiring the relation extension lock, it adds an extra new block 
> with
> ReadBufferBI(relation, P_NEW, ...), see the comment "In addition to whatever
> extension we performed above, we always add at least one block to satisfy our
> own request." The tuple is inserted into this new block.
> 
> Basically, the workers can't look for the empty pages from the pages added by
> other workers, they keep doing the above steps in silos.
> 
> What happens if FSM is not skipped i.e. myState->ti_options = 0;?
> 1) The flag use_fsm will be true in heap_insert->RelationGetBufferForTuple.
> 2) Each worker initially gets a block and keeps inserting into it until it is 
> full.
> When the block is full, the worker looks for the page with free space in FSM
> GetPageWithFreeSpace as use_fsm is true.
> If it can't find any page with the required amount of free space, it goes for 
> bulk
> relation extension(RelationAddExtraBlocks) after acquiring relation extension
> lock with ConditionalLockRelationForExtension. Then the worker adds
> extraBlocks = Min(512, lockWaiters * 20); new blocks in
> RelationAddExtraBlocks and immediately updates the bottom level of FSM for
> each block (see the comment around RecordPageWithFreeSpace for why only
> the bottom level, not the entire FSM tree). After all the blocks are added, 
> then
> it updates the entire FSM tree FreeSpaceMapVacuumRange.
> 4) After the bulk extension, then the worker adds another block see the
> comment "In addition to whatever extension we performed above, we always
> add at least one block to satisfy our own request." and inserts tuple into 
> this
> new block.
> 
> Basically, the workers can benefit from the bulk extension of the relation and
> they always can look for the empty pages from the pages added by other
> workers. There are high chances that the blocks will be available after bulk
> extension. Having said that, if the added extra blocks are consumed by the
> workers so fast i.e. if the tuple sizes are big i.e very less tuples per 
> page, then,
> the bulk extension too can't help much and there will be more contention on
> the relation extension lock. Well, one might think to add more blocks at a 
> time,
> say Min(1024, lockWaiters * 128/256/512) than currently extraBlocks = Min(512,
> lockWaiters * 20);. This will work (i.e. we don't see any regression with 
> parallel
> inserts in CTAS patches), but it can't be a practical solution. Because the 
> total
> pages for the relation will be more with many pages having more free space.
> Furthermore, the future sequential scans on that relation might take a lot of
> time.
> 
> If myState->ti_options = TABLE_INSERT_SKIP_FSM; in only the place(within if
> (myState->is_parallel)), then it will be effective for leader i.e. leader 
> will not
> look for FSM, but all the workers will, because within if
> (myState->is_parallel_worker) in intorel_startup,
> myState->ti_options = 0; for workers.
> 
> I ran tests with configuration shown at [1] for the case 4 (2 bigint(of 8 
> bytes
> each) columns, 16 name(of 64 bytes each) columns, tuple size 1064 bytes, 10mn
> tuples) with leader participation where I'm seeing regression:
> 
> 1) when myState->ti_options = TABLE_INSERT_SKIP_FSM; for both leader and
> workers, then my results are as follows:
> 0 workers - 116934.137, 2 workers - 209802.060, 4 workers - 248580.275
> 2) when myState->ti_options = 0; for both leader and workers, then my results
> are as follows:
> 0 workers - 1116184.718, 2 workers - 139798.055, 4 workers - 143022.409
> I hope the above explanation and the test results should clarify the fact that
> skipping FSM doesn't solve the problem. Let me know if anything is not clear 
> or
> I'm missing something.

Thanks for the explanation.
I followed your test steps above and the configuration below, but my test 
results are a little different from yours.
I am not sure of the exact reason; maybe it is because of the hardware...

Test INSERT 1000 rows (2 bigint (of 8 bytes each), 16 name (of 64 bytes each)
columns):
SERIAL: 22023.631 ms
PARALLEL 2 WORKER [NOT SKIP FSM]: 21824.934 ms  [SKIP FSM]: 19381.474 ms
PARALLEL 4 WORKER [NOT SKIP FSM]: 20481.117 ms   [SKIP FSM]: 

Re: Parallel Inserts in CREATE TABLE AS

2021-05-26 Thread Amit Kapila
On Wed, May 26, 2021 at 5:28 PM Bharath Rupireddy
 wrote:
>
> On Fri, May 21, 2021 at 3:46 PM Amit Kapila  wrote:
> >
> > On Fri, Mar 19, 2021 at 11:02 AM Bharath Rupireddy
> >  wrote:
> > >
>
> > The other possibility could
> > be that the free pages added to FSM by one worker are not being used
> > by another worker due to some reason. Can we debug and check if the
> > pages added by one worker are being used by another worker?
>
> I tried to explain it at [1]. Please have a look.
>

I have read it, but I think we should try to verify practically what is
happening, because it is possible that the first time a worker checked the
FSM without taking the relation extension lock, it didn't find any free
page, and then when it tried to acquire the conditional lock, it got it
and just extended the relation by one block. So, in such a case
it won't be able to use the pages newly added by another worker. I am
not sure any such thing is happening here, but I think it is better to
verify it in some way. Also, I am not sure if just getting the info
about the relation extension lock is sufficient.

--
With Regards,
Amit Kapila.




Re: Parallel Inserts in CREATE TABLE AS

2021-05-26 Thread Bharath Rupireddy
On Fri, May 21, 2021 at 3:46 PM Amit Kapila  wrote:
>
> On Fri, Mar 19, 2021 at 11:02 AM Bharath Rupireddy
>  wrote:
> >
> > On Wed, Jan 27, 2021 at 1:47 PM Bharath Rupireddy
> >  wrote:
> > >
> >
> > I analyzed performance of parallel inserts in CTAS for different cases
> > with tuple size 32bytes, 59bytes, 241bytes and 1064bytes. We could
> > gain if the tuple sizes are lower. But if the tuple size is larger
> > i..e 1064bytes, there's a regression with parallel inserts. Upon
> > further analysis, it turned out that the parallel workers are
> > requiring frequent extra blocks addition while concurrently extending
> > the relation(in RelationAddExtraBlocks) and the majority of the time
> > spent is going into flushing those new empty pages/blocks onto the
> > disk.
> >
>
> How you have ensured that the cost is due to the flushing of pages?

I think I was wrong to just say the problem is with the flushing of
empty pages when bulk-extending the relation. I should have said the
problem is with the "relation extension lock", but I will hold off on
that claim for a moment until I capture the relation extension lock wait
events for the regression-causing cases. I will share the information
soon.

> AFAICS, we don't flush the pages rather just write them and then
> register those to be flushed by checkpointer, now it is possible that
> the checkpointer sync queue gets full and the backend has to write by
> itself but have we checked that? I think we can check via wait events,
> if it is due to flush then we should see a lot of file sync
> (WAIT_EVENT_DATA_FILE_SYNC) wait events.

I will also capture the data file sync events along with relation
extension lock wait events.
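A sketch of how those can be sampled from another session while the CTAS
runs (relation extension lock waits show up as Lock/extend and data file
syncs as IO/DataFileSync):

SELECT wait_event_type, wait_event, count(*)
FROM pg_stat_activity
WHERE wait_event IS NOT NULL
GROUP BY 1, 2
ORDER BY count(*) DESC;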

> The other possibility could
> be that the free pages added to FSM by one worker are not being used
> by another worker due to some reason. Can we debug and check if the
> pages added by one worker are being used by another worker?

I tried to explain it at [1]. Please have a look. It looks like the
burden is more on the "relation extension lock" and the way the extra
new blocks are getting added.

[1] 
https://www.postgresql.org/message-id/CALj2ACVdcrjwHXwvJqT-Fa32vnJEOjteep_3L24X8MK50E7M8w%40mail.gmail.com

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com




Re: Parallel Inserts in CREATE TABLE AS

2021-05-26 Thread Bharath Rupireddy
On Tue, May 25, 2021 at 1:50 PM tsunakawa.ta...@fujitsu.com
 wrote:
>
> From: houzj.f...@fujitsu.com 
> > + /*
> > +  * We don't need to skip contacting FSM while inserting tuples
> > for
> > +  * parallel mode, while extending the relations, workers
> > instead of
> > +  * blocking on a page while another worker is inserting, can
> > check the
> > +  * FSM for another page that can accommodate the tuples.
> > This results
> > +  * in major benefit for parallel inserts.
> > +  */
> > + myState->ti_options = 0;
> >
> > I am not quite sure that disabling the " SKIP FSM " in parallel worker will 
> > bring
> > performance gain.
> > In my test environment, if I change this code to use option "
> > TABLE_INSERT_SKIP_FSM ", then there
> > seems no performance degradation.
>
> +1, probably.

I tried to explain it at [1]. Please have a look.

> Does the code comment represent the situation like this?
>
> 1. Worker 1 is inserting into page 1.
>
> 2. Worker 2 tries to insert into page 1, but cannot acquire the buffer 
> content lock of page 1 because worker 1 holds it.
>
> 3. Worker 2 looks up FSM to find a page with enough free space.

I tried to explain it at [1]. Please have a look.

> But isn't FSM still empty during CTAS?

No, the FSM is built on the fly if we don't skip it, i.e.
myState->ti_options = 0; see RelationGetBufferForTuple with use_fsm =
true -> GetPageWithFreeSpace -> fsm_search -> fsm_set_and_search ->
fsm_readbuf with extend = true.

[1] 
https://www.postgresql.org/message-id/CALj2ACVdcrjwHXwvJqT-Fa32vnJEOjteep_3L24X8MK50E7M8w%40mail.gmail.com

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com




Re: Parallel Inserts in CREATE TABLE AS

2021-05-26 Thread Bharath Rupireddy
On Tue, May 25, 2021 at 1:10 PM tsunakawa.ta...@fujitsu.com
 wrote:
>
> Although this should be a controversial and may be crazy idea, the following 
> change brought 4-11% speedup.  This is because I thought parallel workers 
> might contend for WAL flush as a result of them using the limited ring buffer 
> and flushing dirty buffers when the ring buffer is filled.  Can we take 
> advantage of this?
>
> [GetBulkInsertState]
> /*  bistate->strategy = GetAccessStrategy(BAS_BULKWRITE);*/
> bistate->strategy = NULL;

You are right. If the ring buffer (16MB) is not used and shared
buffers (1GB in your case) are used instead, then since the table size
is 335MB and fits in shared buffers, there will be no (or very minimal)
dirty buffer flushing, so there will be some more speedup.

Otherwise, a similar speedup can be observed when the BAS_BULKWRITE
ring size is increased a bit from the current 16MB to some other
reasonable value. I tried these experiments earlier.
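
For context, the 16MB figure is the bulk-write ring size set up in
GetAccessStrategy() (freelist.c); roughly:

    case BAS_BULKWRITE:
        ring_size = 16 * 1024 * 1024 / BLCKSZ;   /* the 16MB ring */
        break;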

Otherwise, as I said in [1], we can also increase the number of extra
blocks added at a time, say to Min(1024, lockWaiters * 128/256/512)
instead of the current extraBlocks = Min(512, lockWaiters * 20);. This
also gives some speedup, and then we don't see any regression with the
parallel inserts in CTAS patches.
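
For illustration, the relevant part of RelationAddExtraBlocks() (hio.c)
with the current formula and the experimented variants would look
roughly like this (the variants are only what was tried above, not a
proposal):

    lockWaiters = RelationExtensionLockWaiterCount(relation);
    if (lockWaiters <= 0)
        return;

    extraBlocks = Min(512, lockWaiters * 20);          /* current */
    /* extraBlocks = Min(1024, lockWaiters * 512); */  /* experimented; also *128, *256 */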

But I'm not so sure that the hackers will accept any of the above as a
practical solution to the "relation extension" problem.

[1] 
https://www.postgresql.org/message-id/CALj2ACVdcrjwHXwvJqT-Fa32vnJEOjteep_3L24X8MK50E7M8w%40mail.gmail.com

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com




Re: Parallel Inserts in CREATE TABLE AS

2021-05-26 Thread Bharath Rupireddy
On Tue, May 25, 2021 at 12:05 PM houzj.f...@fujitsu.com
 wrote:
>
> I noticed one place which could be one of the reasons that cause the 
> performance degradation.
>
> +   /*
> +* We don't need to skip contacting FSM while inserting 
> tuples for
> +* parallel mode, while extending the relations, workers 
> instead of
> +* blocking on a page while another worker is inserting, can 
> check the
> +* FSM for another page that can accommodate the tuples. This 
> results
> +* in major benefit for parallel inserts.
> +*/
> +   myState->ti_options = 0;
>
> I am not quite sure that disabling the " SKIP FSM " in parallel worker will 
> bring performance gain.
> In my test environment, if I change this code to use option " 
> TABLE_INSERT_SKIP_FSM ", then there
> seems no performance degradation . Could you please have a try on it ?
> (I test with the SQL you provided earlier[1])

Thanks for trying that out.

Please see the code around the use_fsm flag in
RelationGetBufferForTuple for more understanding of the points below.

What happens if FSM is skipped i.e. myState->ti_options =
TABLE_INSERT_SKIP_FSM;?
1) The flag use_fsm will be false in heap_insert->RelationGetBufferForTuple.
2) Each worker initially gets a block and keeps inserting into it
until it is full. When the block is full, the worker doesn't look in
the FSM (GetPageWithFreeSpace) since use_fsm is false. It directly goes
for relation extension and acquires the relation extension lock with
LockRelationForExtension. Note that the bulk extension of blocks with
RelationAddExtraBlocks is not reached, as use_fsm is false.
3) After acquiring the relation extension lock, it adds one extra new
block with ReadBufferBI(relation, P_NEW, ...); see the comment "In
addition to whatever extension we performed above, we always add at
least one block to satisfy our own request." The tuple is inserted
into this new block.

Basically, the workers can't find the empty pages added by other
workers; they keep doing the above steps in silos.

What happens if FSM is not skipped, i.e. myState->ti_options = 0;?
1) The flag use_fsm will be true in heap_insert->RelationGetBufferForTuple.
2) Each worker initially gets a block and keeps inserting into it
until it is full. When the block is full, the worker looks for a page
with free space in the FSM (GetPageWithFreeSpace) since use_fsm is
true. If it can't find any page with the required amount of free
space, it goes for bulk relation extension (RelationAddExtraBlocks)
after acquiring the relation extension lock with
ConditionalLockRelationForExtension. The worker then adds extraBlocks
= Min(512, lockWaiters * 20); new blocks in RelationAddExtraBlocks and
immediately updates the bottom level of the FSM for each block (see
the comment around RecordPageWithFreeSpace for why only the bottom
level, not the entire FSM tree). After all the blocks are added, it
updates the entire FSM tree with FreeSpaceMapVacuumRange.
3) After the bulk extension, the worker adds another block (see the
comment "In addition to whatever extension we performed above, we
always add at least one block to satisfy our own request.") and
inserts the tuple into this new block.

Basically, the workers benefit from the bulk extension of the
relation, and they can always look for empty pages among the pages
added by other workers. There is a high chance that blocks will be
available after a bulk extension. Having said that, if the added extra
blocks are consumed by the workers very fast, i.e. if the tuple sizes
are big and there are very few tuples per page, then the bulk
extension can't help much either and there will be more contention on
the relation extension lock. One might think of adding more blocks at
a time, say Min(1024, lockWaiters * 128/256/512) instead of the
current extraBlocks = Min(512, lockWaiters * 20);. This works (i.e. we
don't see any regression with the parallel inserts in CTAS patches),
but it can't be a practical solution, because the relation ends up
with more total pages, many of them having a lot of free space, and
future sequential scans on that relation might take a lot of time.
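
For reference, the two behaviours described above map onto the
following condensed sketch of RelationGetBufferForTuple() (hio.c). It
is simplified and paraphrased from the current sources, so treat it as
an approximation of the logic, not an exact excerpt:

    /* start from this backend's own current target block, if any */
    targetBlock = RelationGetTargetBlock(relation);
    if (targetBlock == InvalidBlockNumber && use_fsm)
        targetBlock = GetPageWithFreeSpace(relation, len + saveFreeSpace);

loop:
    while (targetBlock != InvalidBlockNumber)
    {
        buffer = ReadBufferBI(relation, targetBlock, RBM_NORMAL, bistate);
        LockBuffer(buffer, BUFFER_LOCK_EXCLUSIVE);
        pageFreeSpace = PageGetHeapFreeSpace(BufferGetPage(buffer));
        if (len + saveFreeSpace <= pageFreeSpace)
            return buffer;          /* enough room on this page, use it */
        UnlockReleaseBuffer(buffer);
        if (!use_fsm)
            break;                  /* TABLE_INSERT_SKIP_FSM: never retry via the FSM */
        targetBlock = RecordAndGetPageWithFreeSpace(relation, targetBlock,
                                                    pageFreeSpace,
                                                    len + saveFreeSpace);
    }

    /* No usable page found: extend the relation. */
    if (!use_fsm)
        LockRelationForExtension(relation, ExclusiveLock);
    else if (!ConditionalLockRelationForExtension(relation, ExclusiveLock))
    {
        /* Someone else is extending; wait for the lock ... */
        LockRelationForExtension(relation, ExclusiveLock);

        /* ... then re-check the FSM for pages another backend just added. */
        targetBlock = GetPageWithFreeSpace(relation, len + saveFreeSpace);
        if (targetBlock != InvalidBlockNumber)
        {
            UnlockRelationForExtension(relation, ExclusiveLock);
            goto loop;              /* reuse a page added by another worker */
        }

        /* bulk extension: extraBlocks = Min(512, lockWaiters * 20) pages */
        RelationAddExtraBlocks(relation, bistate);
    }

    /*
     * In addition to whatever extension we performed above, we always add
     * at least one block to satisfy our own request.
     */
    buffer = ReadBufferBI(relation, P_NEW, RBM_ZERO_AND_LOCK, bistate);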

If myState->ti_options = TABLE_INSERT_SKIP_FSM; is set in only that
one place (within if (myState->is_parallel)), then it is effective
only for the leader, i.e. the leader will not look at the FSM but all
the workers will, because within if (myState->is_parallel_worker) in
intorel_startup, myState->ti_options = 0; is set for the workers.

I ran tests with configuration shown at [1] for the case 4 (2
bigint(of 8 bytes each) columns, 16 name(of 64 bytes each) columns,
tuple size 1064 bytes, 10mn tuples) with leader participation where
I'm seeing regression:

1) when myState->ti_options = TABLE_INSERT_SKIP_FSM; for both leader
and workers, then my results are as follows:
0 workers - 116934.137, 2 workers - 209802.060, 4 workers - 248580.275
2) when myState->ti_options = 0; for both 

RE: Parallel Inserts in CREATE TABLE AS

2021-05-25 Thread tsunakawa.ta...@fujitsu.com
From: houzj.f...@fujitsu.com 
> + /*
> +  * We don't need to skip contacting FSM while inserting tuples
> for
> +  * parallel mode, while extending the relations, workers
> instead of
> +  * blocking on a page while another worker is inserting, can
> check the
> +  * FSM for another page that can accommodate the tuples.
> This results
> +  * in major benefit for parallel inserts.
> +  */
> + myState->ti_options = 0;
> 
> I am not quite sure that disabling the " SKIP FSM " in parallel worker will 
> bring
> performance gain.
> In my test environment, if I change this code to use option "
> TABLE_INSERT_SKIP_FSM ", then there
> seems no performance degradation.

+1, probably.

Does the code comment describe a situation like this?

1. Worker 1 is inserting into page 1.

2. Worker 2 tries to insert into page 1, but cannot acquire the buffer content 
lock of page 1 because worker 1 holds it.

3. Worker 2 looks up FSM to find a page with enough free space.

But isn't FSM still empty during CTAS?



Regards
Takayuki Tsunakawa



RE: Parallel Inserts in CREATE TABLE AS

2021-05-25 Thread tsunakawa.ta...@fujitsu.com
Bharath-san, all,


Hmm, I didn't experience performance degradation on my poor-man's Linux VM (4 
CPU, 4 GB RAM, HDD)...

[benchmark preparation]
autovacuum = off
shared_buffers = 1GB
checkpoint_timeout = 1h
max_wal_size = 8GB
min_wal_size = 8GB
(other settings to enable parallelism)
CREATE UNLOGGED TABLE a (c char(1100));
INSERT INTO a SELECT i FROM generate_series(1, 30) i;
(the table size is 335 MB)

[benchmark]
CREATE TABLE b AS SELECT * FROM a;
DROP TABLE a;
CHECKPOINT;
(measure only CTAS)


[results]
parallel_leader_participation = off
  workers  time(ms)
  0  3921
  2  3290
  4  3132
parallel_leader_participation = on
  workers  time(ms)
  2  3266
  4  3247


Although this may be a controversial and perhaps crazy idea, the following 
change brought a 4-11% speedup.  This is because I suspected that parallel 
workers might contend for WAL flushes as a result of them using the limited 
ring buffer and flushing dirty buffers whenever the ring buffer fills up.  Can 
we take advantage of this?

[GetBulkInsertState]
/*  bistate->strategy = GetAccessStrategy(BAS_BULKWRITE);*/
bistate->strategy = NULL;


[results]
parallel_leader_participation = off
  workers  time(ms)
  0  3695  (5% reduction)
  2  3135  (4% reduction)
  4  2767  (11% reduction)


Regards
Takayuki Tsunakawa



RE: Parallel Inserts in CREATE TABLE AS

2021-05-25 Thread houzj.f...@fujitsu.com
Hi Bharath-san,

From: Bharath Rupireddy 
Sent: Friday, May 21, 2021 6:49 PM
> 
> On Fri, May 21, 2021 at 3:46 PM Amit Kapila  wrote:
> >
> > On Fri, Mar 19, 2021 at 11:02 AM Bharath Rupireddy
> >  wrote:
> > >
> > > On Wed, Jan 27, 2021 at 1:47 PM Bharath Rupireddy
> > >  wrote:
> > > >
> > >
> > > I analyzed performance of parallel inserts in CTAS for different
> > > cases with tuple size 32bytes, 59bytes, 241bytes and 1064bytes. We
> > > could gain if the tuple sizes are lower. But if the tuple size is
> > > larger i..e 1064bytes, there's a regression with parallel inserts.
> > > Upon further analysis, it turned out that the parallel workers are
> > > requiring frequent extra blocks addition while concurrently
> > > extending the relation(in RelationAddExtraBlocks) and the majority
> > > of the time spent is going into flushing those new empty
> > > pages/blocks onto the disk.
> > >
> >
> > How you have ensured that the cost is due to the flushing of pages?
> > AFAICS, we don't flush the pages rather just write them and then
> > register those to be flushed by checkpointer, now it is possible that
> > the checkpointer sync queue gets full and the backend has to write by
> > itself but have we checked that? I think we can check via wait events,
> > if it is due to flush then we should see a lot of file sync
> > (WAIT_EVENT_DATA_FILE_SYNC) wait events.  The other possibility could
> > be that the free pages added to FSM by one worker are not being used
> > by another worker due to some reason. Can we debug and check if the
> > pages added by one worker are being used by another worker?
> 
> Thanks! I will work on the above points sometime later.

I noticed one place which could be one of the reasons for the 
performance degradation.

+   /*
+* We don't need to skip contacting FSM while inserting tuples 
for
+* parallel mode, while extending the relations, workers 
instead of
+* blocking on a page while another worker is inserting, can 
check the
+* FSM for another page that can accommodate the tuples. This 
results
+* in major benefit for parallel inserts.
+*/
+   myState->ti_options = 0;

I am not quite sure that disabling "SKIP FSM" in the parallel workers will 
bring a performance gain.
In my test environment, if I change this code to use the option 
TABLE_INSERT_SKIP_FSM, then there seems to be no performance degradation.
Could you please give it a try?
(I tested with the SQL you provided earlier [1])

[1] 
https://www.postgresql.org/message-id/CALj2ACWFvNm4d_uqT2iECPqaXZjEd-O%2By8xbghvqXeMLj0pxGw%40mail.gmail.com

Best regards,
houzj


Re: Parallel Inserts in CREATE TABLE AS

2021-05-21 Thread Bharath Rupireddy
On Fri, May 21, 2021 at 3:46 PM Amit Kapila  wrote:
>
> On Fri, Mar 19, 2021 at 11:02 AM Bharath Rupireddy
>  wrote:
> >
> > On Wed, Jan 27, 2021 at 1:47 PM Bharath Rupireddy
> >  wrote:
> > >
> >
> > I analyzed performance of parallel inserts in CTAS for different cases
> > with tuple size 32bytes, 59bytes, 241bytes and 1064bytes. We could
> > gain if the tuple sizes are lower. But if the tuple size is larger
> > i..e 1064bytes, there's a regression with parallel inserts. Upon
> > further analysis, it turned out that the parallel workers are
> > requiring frequent extra blocks addition while concurrently extending
> > the relation(in RelationAddExtraBlocks) and the majority of the time
> > spent is going into flushing those new empty pages/blocks onto the
> > disk.
> >
>
> How you have ensured that the cost is due to the flushing of pages?
> AFAICS, we don't flush the pages rather just write them and then
> register those to be flushed by checkpointer, now it is possible that
> the checkpointer sync queue gets full and the backend has to write by
> itself but have we checked that? I think we can check via wait events,
> if it is due to flush then we should see a lot of file sync
> (WAIT_EVENT_DATA_FILE_SYNC) wait events.  The other possibility could
> be that the free pages added to FSM by one worker are not being used
> by another worker due to some reason. Can we debug and check if the
> pages added by one worker are being used by another worker?

Thanks! I will work on the above points sometime later.

BTW, I forgot to mention one point earlier: we see a benefit even
without parallelism if multi inserts are used for CTAS instead of
single inserts. See [2] for more test results. I used the "New Table
Access Methods for Multi and Single Inserts" patches from [1] for this
testing. I think it's a good idea to revisit that work.

[1] - 
https://www.postgresql.org/message-id/CALj2ACXdrOmB6Na9amHWZHKvRT3Z0nwTRsCwoMT-npOBtmXLXg%40mail.gmail.com
[2]
case 1 - 2 integer(of 4 bytes each) columns, tuple size 32 bytes, 100mn tuples
on master - 130sec
on master with multi inserts - 105sec, gain - 1.23X
on parallel CTAS patch without multi inserts - (2 workers,  82sec,
1.58X), (4 workers, 83sec, 1.56X)
on parallel CTAS patch with multi inserts - (2 workers,  45sec, 2.33X,
overall gain if seen from master 2.88X), (4 workers, 33sec, 3.18X,
overall gain if seen from master 3.9X)

case 2 - 2 integer(of 4 bytes each) columns, 3 varchar(8), tuple size
59 bytes, 100mn tuples
on master - 185sec
on master with multi inserts - 121sec, gain - 1.52X
on parallel CTAS patch without multi inserts - (2 workers,  120sec,
1.54X), (4 workers, 123sec, 1.5X)
on parallel CTAS patch with multi inserts - (2 workers,  68sec, 1.77X,
overall gain if seen from master  2.72X), (4 workers, 61sec, 1.98X,
overall gain if seen from master 3.03X)

Above two cases are the best cases with tuple size a few bytes where
parallel CTAS + multi inserts would give up to 3.9X and 3.03X
benefits.

case 3 - 2 bigint(of 8 bytes each) columns, 3 name(of 64 bytes each)
columns, 1 varchar(8), tuple size 241 bytes, 100mn tuples
on master - 367sec
on master with multi inserts - 291sec, gain - 1.26X
on parallel CTAS patch without multi inserts - (2 workers,  334sec,
1.09X), (4 workers, 336sec, 1.09X)
on parallel CTAS patch with multi inserts - (2 workers,  284sec,
1.02X, overall gain if seen from master 1.29X), (4 workers, 278sec,
1.04X, overall gain if seen from master 1.32X)

Above case where tuple size is 241 bytes, we don't gain much.

case 4 - 2 bigint(of 8 bytes each) columns, 16 name(of 64 bytes each)
columns, tuple size 1064 bytes, 10mn tuples
on master - 120sec
on master with multi inserts - 115sec, gain - 1.04X
on parallel CTAS patch without multi inserts - (2 workers,  140sec,
0.85X), (4 workers, 142sec, 0.84X)
on parallel CTAS patch with multi inserts - (2 workers,  133sec,
0.86X, overall loss if seen from master  0.9X), (4 workers, 134sec,
0.85X, overall loss if seen from master 0.89X)

Above case where tuple size is 1064 bytes, we gain very little with
multi inserts and with parallel inserts we cause regression.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com




Re: Parallel Inserts in CREATE TABLE AS

2021-05-21 Thread Amit Kapila
On Fri, Mar 19, 2021 at 11:02 AM Bharath Rupireddy
 wrote:
>
> On Wed, Jan 27, 2021 at 1:47 PM Bharath Rupireddy
>  wrote:
> >
>
> I analyzed performance of parallel inserts in CTAS for different cases
> with tuple size 32bytes, 59bytes, 241bytes and 1064bytes. We could
> gain if the tuple sizes are lower. But if the tuple size is larger
> i..e 1064bytes, there's a regression with parallel inserts. Upon
> further analysis, it turned out that the parallel workers are
> requiring frequent extra blocks addition while concurrently extending
> the relation(in RelationAddExtraBlocks) and the majority of the time
> spent is going into flushing those new empty pages/blocks onto the
> disk.
>

How you have ensured that the cost is due to the flushing of pages?
AFAICS, we don't flush the pages rather just write them and then
register those to be flushed by checkpointer, now it is possible that
the checkpointer sync queue gets full and the backend has to write by
itself but have we checked that? I think we can check via wait events,
if it is due to flush then we should see a lot of file sync
(WAIT_EVENT_DATA_FILE_SYNC) wait events.  The other possibility could
be that the free pages added to FSM by one worker are not being used
by another worker due to some reason. Can we debug and check if the
pages added by one worker are being used by another worker?

-- 
With Regards,
Amit Kapila.




Re: Parallel Inserts in CREATE TABLE AS

2021-03-19 Thread Greg Nancarrow
On Fri, Mar 19, 2021 at 4:33 PM Bharath Rupireddy
 wrote:
>
> In an offlist discussion with Robert and Dilip, using fallocate to
> extend the relation may help to extend the relation faster. In regards
> to this, it looks like the AIO/DIO patch set of Andres [1]  which
> involves using fallocate() to extend files will surely be helpful.
> Until then, we honestly feel that the parallel inserts in CTAS patch
> set be put on hold and revive it later.
>

Hi,

I had partially reviewed some of the patches (first scan) when I was
alerted to your post and intention to put the patch on hold.
I thought I'd just post the comments I have so far, and you can look
at them at a later time when/if you revive the patch.


Patch 0001

1) Patch comment

Leader inserts its share of tuples if instructed to do, and so are workers

should be:

Leader inserts its share of tuples if instructed to, and so do the workers.


2)

void
SetParallelInsertState(ParallelInsertCmdKind ins_cmd, QueryDesc *queryDesc)
{
GatherState *gstate;
DestReceiver *dest;

Assert(queryDesc && (ins_cmd == PARALLEL_INSERT_CMD_CREATE_TABLE_AS));

gstate = (GatherState *) queryDesc->planstate;
dest = queryDesc->dest;

/*
* Parallel insertions are not possible either if the upper node is not
* Gather or it's a Gather but it have some projections to perform.
*/
if (!IsA(gstate, GatherState) || gstate->ps.ps_ProjInfo)
return;


I think it would look better for code to be:


dest = queryDesc->dest;

/*
* Parallel insertions are not possible either if the upper node is not
* Gather or it's a Gather but it have some projections to perform.
*/
if (!IsA(queryDesc->planstate, GatherState) ||
queryDesc->planstate->ps_ProjInfo)
return;
gstate = (GatherState *) queryDesc->planstate;


3) src/backend/executor/execParallel.c

+ pg_atomic_uint64 processed;

I am wondering, when there is contention from multiple workers in
writing back their processed count, how well does this work? Any
performance issues?
For the Parallel INSERT patch (which has not yet been committed) it
currently uses an array of processed counts for the workers (since #
of workers is capped) so there is never any contention related to
this.
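
A minimal sketch of the two approaches being contrasted (the names
shared_info, worker_processed and tuples_done are made up for
illustration; they are not from either patch):

    /* (a) one shared counter: every worker does an atomic read-modify-write
     *     on the same location, so they can contend on that cache line */
    pg_atomic_add_fetch_u64(&shared_info->processed, tuples_done);

    /* (b) per-worker slots: each worker writes only its own slot and the
     *     leader sums the slots afterwards, so there is no write contention */
    shared_info->worker_processed[ParallelWorkerNumber] += tuples_done;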


4) src/backend/executor/execParallel.c

You shouldn't use intermingled declarations and code.
https://www.postgresql.org/docs/13/source-conventions.html

Best to move the uninitialized variable declaration to the top of the block:


ParallelInsertCTASInfo *info = NULL;
char *intoclause_str = NULL;
int intoclause_len;
char *intoclause_space = NULL;

should be:

int intoclause_len;
ParallelInsertCTASInfo *info = NULL;
char *intoclause_str = NULL;
char *intoclause_space = NULL;


5) ExecParallelGetInsReceiver

Would look better to have:

DR_intorel *receiver;

receiver = (DR_intorel *)CreateIntoRelDestReceiver(intoclause);

receiver->is_parallel_worker = true;
receiver->object_id = fpes->objectid;


6) GetParallelInsertCmdType

I think the following would be better:

ParallelInsertCmdKind
GetParallelInsertCmdType(DestReceiver *dest)
{
if (dest &&
dest->mydest == DestIntoRel &&
((DR_intorel *) dest)->is_parallel)
return PARALLEL_INSERT_CMD_CREATE_TABLE_AS;

return PARALLEL_INSERT_CMD_UNDEF;
}


7) IsParallelInsertAllowed

In the following code:

/* Below check may hit in case this function is called from explain.c. */
if (!(into && IsA(into, IntoClause)))
return false;

If "into" is non-NULL, isn't it guaranteed to point at an IntoClause?

I think the code can just be:

/* Below check may hit in case this function is called from explain.c. */
if (!into)
return false;

8) ExecGather

The comments and variable name are likely to cause confusion when the
parallel INSERT statement is implemented. Suggest minor change:

change:

   bool perform_parallel_ins = false;

to:

   bool perform_parallel_ins_no_readers = false;


change:

/*
* Do not create tuple queue readers for commands with parallel
* insertion. Because the gather node will not receive any
* tuples, the workers will insert the tuples into the target
* relation.
*/

to:

/*
* Do not create tuple queue readers for commands with parallel
* insertion that don't additionally return tuples. In this case,
* the workers will only insert the tuples into the target
* relation and the gather node will not receive any tuples.
*/


I think some changes in other areas are needed for the same reasons.



Patch 0002

1) I noticed that "rows" is not zero (and so is not displayed as 0 in
the EXPLAIN output for Gather) for the Gather node when parallel
inserts will be used. This doesn't seem to be right. I think that if
PARALLEL_INSERT_CAN_IGN_TUP_COST is set, path->rows should be set to
0, and just let existing "run_cost" be evaluated as normal (which will
be 0 as path->rows is 0).

2) Is PARALLEL_INSERT_TUP_COST_IGNORED actually needed? Couldn't only
PARALLEL_INSERT_CAN_IGN_TUP_COST be used for the 

RE: Parallel Inserts in CREATE TABLE AS

2021-03-19 Thread houzj.f...@fujitsu.com
> > They are pretty simple though. I think someone can also check if the
> > same regression exists for parallel inserts in "INSERT INTO SELECT"
> > patch set as well for larger tuple sizes.
> 
> Thanks for reminding.
> I did some performance tests for parallel inserts in " INSERT INTO SELECT " 
> with
> the testcase you provided, the regression seems does not exists in "INSERT
> INTO SELECT".

I forgot to share the test results with Parallel CTAS.

I tested with the SQL: explain analyze verbose create table test as select * 
from tenk1;

> CREATE UNLOGGED TABLE tenk1(c1 int, c2 int);
> CREATE UNLOGGED TABLE tenk1(c1 int, c2 int, c3 varchar(8), c4 varchar(8), c5 
> varchar(8));
> CREATE UNLOGGED TABLE tenk1(c1 bigint, c2 bigint, c3 name, c4 name, c5 name, 
> c6 varchar(8));
I did not see a regression in these cases (small tuple sizes).

> CREATE UNLOGGED TABLE tenk1(c1 bigint, c2 bigint, c3 name, c4 name, c5 name, 
> c6 name, c7 name, c8 name, c9 name, c10 name, c11 name, c12 name, c13 name, 
> c14 name, 
> c15 name, c16 name, c17 name, c18 name);

I can see the degradation in this case.
The average test results of CTAS are:
Serial  CTAS -Execution Time: 80892.240 ms
Parallel CTAS -Execution Time: 85725.591 ms
About 6% degradation.

I also tested with the Parallel INSERT patch in this case.
(Note: to keep things consistent, I created a new target table (test) before inserting.)
The average test results of Parallel INSERT are:
Serial Parallel INSERT -- Execution Time: 90075.501 ms
Parallel Parallel INSERT- Execution Time: 85812.202 ms
No degradation.

Best regards,
houzj



RE: Parallel Inserts in CREATE TABLE AS

2021-03-19 Thread houzj.f...@fujitsu.com
> They are pretty simple though. I think someone can also check if the same
> regression exists for parallel inserts in "INSERT INTO SELECT"
> patch set as well for larger tuple sizes.

Thanks for the reminder.
I did some performance tests for parallel inserts in "INSERT INTO SELECT" 
with the test case you provided; the regression does not seem to exist in 
"INSERT INTO SELECT".

I will try to test with larger tuple sizes later.

Best regards,
houzj


Re: Parallel Inserts in CREATE TABLE AS

2021-03-19 Thread Bharath Rupireddy
On Fri, Mar 19, 2021 at 12:45 PM tanghy.f...@fujitsu.com
 wrote:
>
> From: Bharath Rupireddy 
> >I analyzed performance of parallel inserts in CTAS for different cases
> >with tuple size 32bytes, 59bytes, 241bytes and 1064bytes. We could
> >gain if the tuple sizes are lower. But if the tuple size is larger
> >i..e 1064bytes, there's a regression with parallel inserts.
>
> Thanks for the update.
> BTW, May be you have some more testcases that can reproduce this regression 
> easily.
> Can you please share some of the testcase (with big tuple size) with me.

They are pretty simple though. I think someone can also check if the
same regression exists for parallel inserts in "INSERT INTO SELECT"
patch set as well for larger tuple sizes.

[1]
DROP TABLE tenk1;
CREATE UNLOGGED TABLE tenk1(c1 int, c2 int);
INSERT INTO tenk1 values(generate_series(1,1),
generate_series(1,1));
explain analyze verbose create table test as select * from tenk1;

DROP TABLE tenk1;
CREATE UNLOGGED TABLE tenk1(c1 int, c2 int, c3 varchar(8), c4
varchar(8), c5 varchar(8));
INSERT INTO tenk1 values(generate_series(1,1),
generate_series(1,1),
upper(substring(md5(random()::varchar),2,8)),
upper(substring(md5(random()::varchar),2,8)),
upper(substring(md5(random()::varchar),2,8)));
explain analyze verbose create table test as select * from tenk1;

DROP TABLE tenk1;
CREATE UNLOGGED TABLE tenk1(c1 bigint, c2 bigint, c3 name, c4 name, c5
name, c6 varchar(8));
INSERT INTO tenk1 values(generate_series(1,1),
generate_series(1,1),
upper(substring(md5(random()::varchar),2,8)),
upper(substring(md5(random()::varchar),2,8)),
upper(substring(md5(random()::varchar),2,8)),
upper(substring(md5(random()::varchar),2,8)));
explain analyze verbose create table test as select * from tenk1;

DROP TABLE tenk1;
CREATE UNLOGGED TABLE tenk1(c1 bigint, c2 bigint, c3 name, c4 name, c5
name, c6 name, c7 name, c8 name, c9 name, c10 name, c11 name, c12
name, c13 name, c14 name, c15 name, c16 name, c17 name, c18 name);
INSERT INTO tenk1 values(generate_series(1,1000),
generate_series(1,1000),
upper(substring(md5(random()::varchar),2,8)),
upper(substring(md5(random()::varchar),2,8)),
upper(substring(md5(random()::varchar),2,8)),
upper(substring(md5(random()::varchar),2,8)),
upper(substring(md5(random()::varchar),2,8)),
upper(substring(md5(random()::varchar),2,8)),
upper(substring(md5(random()::varchar),2,8)),
upper(substring(md5(random()::varchar),2,8)),
upper(substring(md5(random()::varchar),2,8)),
upper(substring(md5(random()::varchar),2,8)),
upper(substring(md5(random()::varchar),2,8)),
upper(substring(md5(random()::varchar),2,8)),
upper(substring(md5(random()::varchar),2,8)),
upper(substring(md5(random()::varchar),2,8)),
upper(substring(md5(random()::varchar),2,8)),
upper(substring(md5(random()::varchar),2,8)));
explain analyze verbose create unlogged table test as select * from tenk1;

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com




RE: Parallel Inserts in CREATE TABLE AS

2021-03-19 Thread tanghy.f...@fujitsu.com
From: Bharath Rupireddy 
>I analyzed performance of parallel inserts in CTAS for different cases
>with tuple size 32bytes, 59bytes, 241bytes and 1064bytes. We could
>gain if the tuple sizes are lower. But if the tuple size is larger
>i..e 1064bytes, there's a regression with parallel inserts.

Thanks for the update.
BTW, maybe you have some more test cases that can reproduce this regression 
easily.
Can you please share some of those test cases (with big tuple sizes) with me?

Regards,
Tang




Re: Parallel Inserts in CREATE TABLE AS

2021-03-18 Thread Bharath Rupireddy
On Wed, Jan 27, 2021 at 1:47 PM Bharath Rupireddy
 wrote:
>
> On Wed, Jan 27, 2021 at 1:25 PM Tang, Haiying
>  wrote:
> > I choose 5 cases which pick parallel insert plan in CTAS to measure the 
> > patched performance. Each case run 30 times.
> >
> > Most of the tests execution become faster with this patch.
> >
> > However, Test NO 4(create table xxx as table xxx.) appears performance 
> > degradation. I tested various table size(2/10/20 millions), they all have a 
> > 6%-10% declines. I think it may need some check at this problem.
> >
> >
> >
> > Below are my test results. 'Test NO' is corresponded to 'Test NO' in 
> > attached test_ctas.sql file.
> >
> > reg%=(patched-master)/master
> >
> > Test NO |  Test Case |reg%  | patched(ms)  | master(ms)
> >
> > ||--|--|-
> >
> > 1   |  CTAS select from table| -9%  | 16709.50477  | 18370.76660
> >
> > 2   |  Append plan   | -14% | 16542.97807  | 19305.86600
> >
> > 3   |  initial plan under Gather node| -5%  | 13374.27187  | 14120.02633
> >
> > 4   |  CTAS table| 10%  | 20835.48800  | 18986.40350
> >
> > 5   |  CTAS select from execute  | -6%  | 16973.73890  | 18008.59789
> >
> >
> >
> > About Test NO 4:
> >
> > In master(HEAD), this test case picks serial seq scan.
> >
> > query plan likes:
> >
> > --
> >
> > Seq Scan on public.tenk1  (cost=0.00..444828.12 rows=1012 width=244) 
> > (actual time=0.005..1675.268 rows=1000 loops=1)
> >
> >Output: unique1, unique2, two, four, ten, twenty, hundred, thousand, 
> > twothousand, fivethous, tenthous, odd, even, stringu1, stringu2, string4  
> > Planning Time: 0.053 ms  Execution Time: 20165.023 ms
> >
> >
> >
> > With this patch, it will choose parallel seq scan and parallel insert.
> >
> > query plan likes:
> >
> > --
> >
> > Gather  (cost=1000.00..370828.03 rows=1012 width=244) (actual 
> > time=20428.823..20437.143 rows=0 loops=1)
> >
> >Output: unique1, unique2, two, four, ten, twenty, hundred, thousand, 
> > twothousand, fivethous, tenthous, odd, even, stringu1, stringu2, string4
> >
> >Workers Planned: 4
> >
> >Workers Launched: 4
> >
> > ->  Create test
> >
> >->  Parallel Seq Scan on public.tenk1  (cost=0.00..369828.03 
> > rows=253 width=244) (actual time=0.021..411.094 rows=200 loops=5)
> >
> >  Output: unique1, unique2, two, four, ten, twenty, hundred, 
> > thousand, twothousand, fivethous, tenthous, odd, even, stringu1, stringu2, 
> > string4
> >
> >  Worker 0:  actual time=0.023..390.856 rows=1858407 loops=1
> >
> >  Worker 1:  actual time=0.024..468.587 rows=2264494 loops=1
> >
> >  Worker 2:  actual time=0.023..473.170 rows=2286580 loops=1
> >
> >  Worker 3:  actual time=0.027..373.727 rows=1853216 loops=1  
> > Planning Time: 0.053 ms  Execution Time: 20437.643 ms
> >
> >
> >
> > test machine spec:
> >
> > CentOS 8.2, 128G RAM, 40 processors, disk SAS
>
> Thanks a lot for the performance tests and test cases. I will analyze
> why the performance is degrading one case and respond soon.

I analyzed the performance of parallel inserts in CTAS for different cases
with tuple sizes of 32 bytes, 59 bytes, 241 bytes and 1064 bytes. We gain
if the tuple sizes are smaller. But if the tuple size is larger,
i.e. 1064 bytes, there's a regression with parallel inserts. Upon
further analysis, it turned out that the parallel workers require
frequent extra block additions while concurrently extending the
relation (in RelationAddExtraBlocks), and the majority of the time is
spent flushing those new empty pages/blocks onto the disk. I saw no
regression when I increased (for testing purposes) the rate at which
the extra blocks are added in RelationAddExtraBlocks to
extraBlocks = Min(1024, lockWaiters * 512); (currently it is
extraBlocks = Min(512, lockWaiters * 20);). Increasing the extra
blocks addition rate is not a practical solution to this problem
though.

In an offlist discussion with Robert and Dilip, it came up that using
fallocate to extend the relation may help to extend it faster. In this
regard, it looks like the AIO/DIO patch set from Andres [1], which
involves using fallocate() to extend files, will surely be helpful.
Until then, we honestly feel that the parallel inserts in CTAS patch
set should be put on hold and revived later.

[1] - 
https://www.postgresql.org/message-id/flat/20210223100344.llw5an2aklengrmn%40alap3.anarazel.de

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com




RE: Parallel Inserts in CREATE TABLE AS

2021-03-17 Thread houzj.f...@fujitsu.com

> > Seems like v22 patch was failing in cfbot for one of the unstable test 
> > cases.
> > Attaching v23 patch set with modification in 0003 and 0004 patches. No
> > changes to 0001 and 0002 patches. Hopefully cfbot will be happy with v23.
> >
> > Please consider v23 for further review.
> Hi,
> 
> I was looking into the latest patch, here are some comments:

Few comments.

1)
Executing the following SQL will cause assertion failure.
---sql---
create table data(a int);
insert into data select 1 from generate_series(1,100,1) t;
explain (verbose) create table tt as select a,2 from data;
--

The stack message:
---stack---
TRAP: FailedAssertion("!allow && gather_exists && tuple_cost_opts && 
!(*tuple_cost_opts & PARALLEL_INSERT_TUP_COST_IGNORED)", File: 
"execParallel.c", Line: 1872, PID: 1618247)
postgres: houzj postgres [local] EXPLAIN(ExceptionalCondition+0x8b)[0x940f0b]
postgres: houzj postgres [local] EXPLAIN[0x67ba1c]
postgres: houzj postgres [local] EXPLAIN(ExplainOnePlan+0x1c2)[0x605997]
postgres: houzj postgres [local] EXPLAIN[0x605d11]
postgres: houzj postgres [local] EXPLAIN(ExplainOneUtility+0x162)[0x605eb0]
--
In this case, the Gather node has a projection, in which case parallel CTAS is 
not supported, but we still ignore the cost in the planner.
If we plan to detect the projection, we may need to check tlist_same_exprs.

+       if (tlist_same_exprs)
+       {
                ignore_parallel_tuple_cost(root);
+       }
        generate_useful_gather_paths(root, rel, false);

2)
+        * Parallelize inserts only when the upper Gather node has no projections.
         */
-   gstate->dest = dest;
+   if (!gstate->ps.ps_ProjInfo)

IMO, it's better to add some comments about why we do not support projections 
for now.
Since not all projections are parallel unsafe (such as the case in 1)), it 
would be desirable to support these later.

3)

+   if (IsParallelInsertionAllowed(PARALLEL_INSERT_CMD_CREATE_TABLE_AS,
+                                  &parallel_ins_info))
...
    /* plan the query */
    plan = pg_plan_query(query, pstate->p_sourcetext,
                         CURSOR_OPT_PARALLEL_OK, params);
...
+   if (IsParallelInsertionAllowed(PARALLEL_INSERT_CMD_CREATE_TABLE_AS,
+                                  &parallel_ins_info))
Currently, the patch calls IsParallelInsertionAllowed() before and after 
pg_plan_query().
This might lead to a misunderstanding that parallel_ins_info gets changed 
during pg_plan_query().
Since parallel_ins_info does not get changed in pg_plan_query(), is it 
possible to add a bool flag (allowed) in parallel_ins_info to avoid the 
second call of IsParallelInsertionAllowed()?
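
As a rough sketch of that idea, based on the existing
ParallelInsertCTASInfo fields (the allowed field is hypothetical, not
taken from the patch):

typedef struct ParallelInsertCTASInfo
{
    IntoClause *intoclause;
    Oid         objectid;
    bool        allowed;    /* hypothetical: result of
                             * IsParallelInsertionAllowed(), computed once
                             * before pg_plan_query() and reused after it */
} ParallelInsertCTASInfo;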

Best regards,
houzj


RE: Parallel Inserts in CREATE TABLE AS

2021-03-10 Thread houzj.f...@fujitsu.com
> Seems like v22 patch was failing in cfbot for one of the unstable test cases.
> Attaching v23 patch set with modification in 0003 and 0004 patches. No
> changes to 0001 and 0002 patches. Hopefully cfbot will be happy with v23.
> 
> Please consider v23 for further review.
Hi,

I was looking into the latest patch, here are some comments:

1)
 * (Note that we do allow CREATE TABLE AS, SELECT INTO, and CREATE
 * MATERIALIZED VIEW to use parallel plans, but as of now, only the 
leader
 * backend writes into a completely new table.  In the future, we can

Since we will support parallel inserts with CTAS, these existing comments need 
to be changed.

2)
In parallel.sgml

The query writes any data or locks any database rows. If a query
contains a data-modifying operation either at the top level or within
a CTE, no parallel plans for that query will be generated.  As an
exception, the commands CREATE TABLE ... AS,

Same as 1): we'd better document that we now support parallel inserts with CTAS.

3)
ExecInitParallelPlan(PlanState *planstate, EState *estate,
 Bitmapset *sendParams, int nworkers,
-int64 tuples_needed)
+int64 tuples_needed,
+ParallelInsertCmdKind parallel_ins_cmd,
+void *parallel_ins_info)

Would it be better to place ParallelInsertCmdKind in struct ParallelInsertCTASInfo?
That way we could pass fewer parameters in some places.
Like:
typedef struct ParallelInsertCTASInfo
{
+   ParallelInsertCmdKind parallel_ins_cmd;
IntoClause *intoclause;
Oid objectid;

} ParallelInsertCTASInfo;

Best regards,
houzj


Re: Parallel Inserts in CREATE TABLE AS

2021-01-27 Thread Bharath Rupireddy
On Wed, Jan 27, 2021 at 1:25 PM Tang, Haiying
 wrote:
> I choose 5 cases which pick parallel insert plan in CTAS to measure the 
> patched performance. Each case run 30 times.
>
> Most of the tests execution become faster with this patch.
>
> However, Test NO 4(create table xxx as table xxx.) appears performance 
> degradation. I tested various table size(2/10/20 millions), they all have a 
> 6%-10% declines. I think it may need some check at this problem.
>
>
>
> Below are my test results. 'Test NO' is corresponded to 'Test NO' in attached 
> test_ctas.sql file.
>
> reg%=(patched-master)/master
>
> Test NO |  Test Case |reg%  | patched(ms)  | master(ms)
>
> ||--|--|-
>
> 1   |  CTAS select from table| -9%  | 16709.50477  | 18370.76660
>
> 2   |  Append plan   | -14% | 16542.97807  | 19305.86600
>
> 3   |  initial plan under Gather node| -5%  | 13374.27187  | 14120.02633
>
> 4   |  CTAS table| 10%  | 20835.48800  | 18986.40350
>
> 5   |  CTAS select from execute  | -6%  | 16973.73890  | 18008.59789
>
>
>
> About Test NO 4:
>
> In master(HEAD), this test case picks serial seq scan.
>
> query plan likes:
>
> --
>
> Seq Scan on public.tenk1  (cost=0.00..444828.12 rows=1012 width=244) 
> (actual time=0.005..1675.268 rows=1000 loops=1)
>
>Output: unique1, unique2, two, four, ten, twenty, hundred, thousand, 
> twothousand, fivethous, tenthous, odd, even, stringu1, stringu2, string4  
> Planning Time: 0.053 ms  Execution Time: 20165.023 ms
>
>
>
> With this patch, it will choose parallel seq scan and parallel insert.
>
> query plan likes:
>
> --
>
> Gather  (cost=1000.00..370828.03 rows=1012 width=244) (actual 
> time=20428.823..20437.143 rows=0 loops=1)
>
>Output: unique1, unique2, two, four, ten, twenty, hundred, thousand, 
> twothousand, fivethous, tenthous, odd, even, stringu1, stringu2, string4
>
>Workers Planned: 4
>
>Workers Launched: 4
>
> ->  Create test
>
>->  Parallel Seq Scan on public.tenk1  (cost=0.00..369828.03 rows=253 
> width=244) (actual time=0.021..411.094 rows=200 loops=5)
>
>  Output: unique1, unique2, two, four, ten, twenty, hundred, thousand, 
> twothousand, fivethous, tenthous, odd, even, stringu1, stringu2, string4
>
>  Worker 0:  actual time=0.023..390.856 rows=1858407 loops=1
>
>  Worker 1:  actual time=0.024..468.587 rows=2264494 loops=1
>
>  Worker 2:  actual time=0.023..473.170 rows=2286580 loops=1
>
>  Worker 3:  actual time=0.027..373.727 rows=1853216 loops=1  Planning 
> Time: 0.053 ms  Execution Time: 20437.643 ms
>
>
>
> test machine spec:
>
> CentOS 8.2, 128G RAM, 40 processors, disk SAS

Thanks a lot for the performance tests and test cases. I will analyze
why the performance is degrading one case and respond soon.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com




RE: Parallel Inserts in CREATE TABLE AS

2021-01-26 Thread Tang, Haiying
Hi Bharath,



I chose 5 cases which pick a parallel insert plan in CTAS to measure the 
patched performance. Each case was run 30 times.

Most of the test executions become faster with this patch.

However, Test NO 4 (create table xxx as table xxx) shows a performance 
degradation. I tested various table sizes (2/10/20 million rows); they all 
show 6%-10% declines. I think this problem needs some investigation.



Below are my test results. 'Test NO' corresponds to 'Test NO' in the attached 
test_ctas.sql file.

reg%=(patched-master)/master

Test NO |  Test Case |reg%  | patched(ms)  | master(ms)

||--|--|-

1   |  CTAS select from table| -9%  | 16709.50477  | 18370.76660

2   |  Append plan   | -14% | 16542.97807  | 19305.86600

3   |  initial plan under Gather node| -5%  | 13374.27187  | 14120.02633

4   |  CTAS table| 10%  | 20835.48800  | 18986.40350

5   |  CTAS select from execute  | -6%  | 16973.73890  | 18008.59789



About Test NO 4:

In master (HEAD), this test case picks a serial seq scan.

The query plan looks like:

--

Seq Scan on public.tenk1  (cost=0.00..444828.12 rows=1012 width=244) 
(actual time=0.005..1675.268 rows=1000 loops=1)

   Output: unique1, unique2, two, four, ten, twenty, hundred, thousand, 
twothousand, fivethous, tenthous, odd, even, stringu1, stringu2, string4  
Planning Time: 0.053 ms  Execution Time: 20165.023 ms



With this patch, it chooses a parallel seq scan and parallel insert.

The query plan looks like:

--

Gather  (cost=1000.00..370828.03 rows=1012 width=244) (actual 
time=20428.823..20437.143 rows=0 loops=1)

   Output: unique1, unique2, two, four, ten, twenty, hundred, thousand, 
twothousand, fivethous, tenthous, odd, even, stringu1, stringu2, string4

   Workers Planned: 4

   Workers Launched: 4

->  Create test

   ->  Parallel Seq Scan on public.tenk1  (cost=0.00..369828.03 rows=253 
width=244) (actual time=0.021..411.094 rows=200 loops=5)

 Output: unique1, unique2, two, four, ten, twenty, hundred, thousand, 
twothousand, fivethous, tenthous, odd, even, stringu1, stringu2, string4

 Worker 0:  actual time=0.023..390.856 rows=1858407 loops=1

 Worker 1:  actual time=0.024..468.587 rows=2264494 loops=1

 Worker 2:  actual time=0.023..473.170 rows=2286580 loops=1

 Worker 3:  actual time=0.027..373.727 rows=1853216 loops=1  Planning 
Time: 0.053 ms  Execution Time: 20437.643 ms



test machine spec:

CentOS 8.2, 128G RAM, 40 processors, disk SAS



Regards,

Tang




test_ctas.sql
Description: test_ctas.sql


RE: Parallel Inserts in CREATE TABLE AS

2021-01-24 Thread Tang, Haiying
>Thanks a lot for the tests. In your test case, parallel insertions are not 
>being picked because the Gather node has
> some projections(floor(((random() * '1'::double precision) + >'1'::double 
> precision)) to perform. That's expected.
>Whenever parallel insertions are chosen for CTAS, we should see "Create 
>target_table '' under Gather node [1] and
>also the actual >row count for Gather node 0 (but in your test it is 
>rows=16268) in the explain analyze output.
>Coming to your test case, if it's modified to something like [1], where the 
>Gather >node has no projections,
>then parallel insertions will be chosen.

Thanks for your explanation and test.
Actually, I deliberately made my test case (with projection) pick a serial 
insert so that the unbalanced tuple distribution (99% of tuples read by one 
worker) would happen.
This issue leads to the performance regression.

But it's not introduced by your patch; it's happening in master (HEAD).
Do you have any thoughts about this?

>[1] - I did this test on my development system, I will run on some performance 
>system and post my observations.

Thank you, it would be very kind of you to do this.
To reproduce the above issue, you need to use my case (with projection), 
because it won't occur with "parallel insert".

Regards,
Tang




Re: Parallel Inserts in CREATE TABLE AS

2021-01-22 Thread Bharath Rupireddy
On Fri, Jan 22, 2021 at 5:16 PM Tang, Haiying 
wrote:
>
> Hi Bharath,
>
> I'm trying to take some performance measurements on you patch v23.
> But when I started, I found an issue about the tuples unbalance
distribution among workers(99% tuples read by one worker) under specified
case which lead the "parallel select" part makes no performance gain.
> Then I find it's not introduced by your patch, because it's also
happening in master(HEAD). But I don't know how to deal with it , so I put
it here to see if anybody know what's going wrong with this or have good
ideas to deal this issue.
>
> Here are the conditions to produce the issue:
> 1. high CPU spec environment(say above 20 processors). In smaller CPU, it
also happen but not so obvious(40% tuples on one worker in my tests).
> 2. query plan is "serial insert + parallel select", I have reproduce this
behavior in (CTAS, Select into, insert into select).
> 3. select part needs to query large data size(e.g. query 100 million from
200 million).
>
> According to above, IMHO, I guess it may be caused by the leader write
rate can't catch the worker read rate, then the tuples of one worker
blocked in the queue, become more and more.
>
> Below is my test info:
> 1. test spec environment
>   CentOS 8.2, 128G RAM, 40 processors, disk SAS
>
> 2. test data prepare
>   create table x(a int, b int, c int);
>   create index on x(a);
>   insert into x select
generate_series(1,2),floor(random()*(10001-1)+1),floor(random()*(10001-1)+1);
>
> 3. test execute results
>   *Patched CTAS*: please look at worker 2, 99% tuples read by it.
>   explain analyze verbose create table test(a,b,c) as select
a,floor(random()*(10001-1)+1),c from x where b%2=0;
> QUERY PLAN
>
---
>  Gather  (cost=1000.00..1942082.77 rows=101 width=16) (actual
time=0.203..24023.686 rows=16268 loops=1)
>Output: a, floor(((random() * '1'::double precision) + '1'::double
precision)), c
>Workers Planned: 4
>Workers Launched: 4
>->  Parallel Seq Scan on public.x  (cost=0.00..1831082.66 rows=25
width=8) (actual time=0.016..4367.035 rows=20001254 loops=5)
>  Output: a, c
>  Filter: ((x.b % 2) = 0)
>  Rows Removed by Filter: 19998746
>  Worker 0:  actual time=0.016..19.265 rows=94592 loops=1
>  Worker 1:  actual time=0.027..31.422 rows=94574 loops=1
>  Worker 2:  actual time=0.014..21744.549 rows=99627749 loops=1
>  Worker 3:  actual time=0.015..19.347 rows=94586 loops=1
 Planning Time: 0.098 ms  Execution Time: 91054.828 ms
>
>   *Non-patched CTAS*: please look at worker 0, also 99% tuples read by it.
>   explain analyze verbose create table test(a,b,c) as select
a,floor(random()*(10001-1)+1),c from x where b%2=0;
> QUERY PLAN
>
---
>Gather  (cost=1000.00..1942082.77 rows=101 width=16) (actual
time=0.283..19216.157 rows=13148 loops=1)
>Output: a, floor(((random() * '1'::double precision) + '1'::double
precision)), c
>Workers Planned: 4
>Workers Launched: 4
>->  Parallel Seq Scan on public.x  (cost=0.00..1831082.66 rows=25
width=8) (actual time=0.020..4380.360 rows=2630 loops=5)
>  Output: a, c
>  Filter: ((x.b % 2) = 0)
>  Rows Removed by Filter: 1370
>  Worker 0:  actual time=0.013..21805.647 rows=99624833 loops=1
>  Worker 1:  actual time=0.016..19.790 rows=94398 loops=1
>  Worker 2:  actual time=0.013..35.340 rows=94423 loops=1
>  Worker 3:  actual time=0.035..19.849 rows=94679 loops=1
 Planning Time: 0.083 ms  Execution Time: 91151.097 ms
>
> I'm still working on the performance tests on your patch, if I make some
progress, I will post my results here.

Thanks a lot for the tests. In your test case, parallel insertions are not
being picked because the Gather node has some projections (floor(((random()
* '1'::double precision) + '1'::double precision))) to perform. That's
expected. Whenever parallel insertions are chosen for CTAS, we should see
"Create target_table" under the Gather node [1] and also the actual row count
for the Gather node as 0 (but in your test it is rows=16268) in the explain
analyze output. Coming to your test case, if it's modified to something like
[1], where the Gather node has no projections, then parallel insertions will
be chosen.

[1] - I did this test on my development system, I will run on some
performance system and post my observations.
postgres=# explain (analyze, verbose) create table test(a,b,c) as select
a,b,c from x where b%2=0;
 QUERY PLAN


RE: Parallel Inserts in CREATE TABLE AS

2021-01-22 Thread Tang, Haiying
Hi Bharath,

I'm trying to take some performance measurements on your patch v23.
But when I started, I found an issue with unbalanced tuple distribution among 
workers (99% of tuples read by one worker) in a specific case, which makes the 
"parallel select" part give no performance gain.
Then I found it's not introduced by your patch, because it also happens in 
master (HEAD). But I don't know how to deal with it, so I'm putting it here to 
see if anybody knows what's going wrong or has good ideas to deal with this issue.

Here are the conditions to produce the issue:
1. A high-spec CPU environment (say above 20 processors). On smaller CPUs it 
also happens, but is not so obvious (40% of tuples on one worker in my tests).
2. The query plan is "serial insert + parallel select"; I have reproduced this 
behavior with CTAS, SELECT INTO, and INSERT INTO SELECT.
3. The select part needs to read a large amount of data (e.g. query 100 
million rows out of 200 million).

Given the above, IMHO, I guess it may be caused by the leader's write rate not 
keeping up with the workers' read rate, so the tuples of one worker get 
blocked in its queue and accumulate more and more.
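
One rough way to check this while the query runs (a sketch; assuming the
documented IPC wait event names) is to sample pg_stat_activity. If the
leader can't drain the tuple queues fast enough, the workers should mostly
show wait_event 'MessageQueueSend':

    SELECT pid, wait_event_type, wait_event
    FROM pg_stat_activity
    WHERE backend_type = 'parallel worker';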

Below is my test info:
1. test spec environment
  CentOS 8.2, 128G RAM, 40 processors, disk SAS 

2. test data prepare
  create table x(a int, b int, c int);
  create index on x(a);
  insert into x select 
generate_series(1,2),floor(random()*(10001-1)+1),floor(random()*(10001-1)+1);

3. test execute results
  *Patched CTAS*: please look at worker 2, 99% tuples read by it.
  explain analyze verbose create table test(a,b,c) as select 
a,floor(random()*(10001-1)+1),c from x where b%2=0;
QUERY PLAN
  
---
 Gather  (cost=1000.00..1942082.77 rows=101 width=16) (actual 
time=0.203..24023.686 rows=16268 loops=1)
   Output: a, floor(((random() * '1'::double precision) + '1'::double 
precision)), c
   Workers Planned: 4
   Workers Launched: 4
   ->  Parallel Seq Scan on public.x  (cost=0.00..1831082.66 rows=25 
width=8) (actual time=0.016..4367.035 rows=20001254 loops=5)
 Output: a, c
 Filter: ((x.b % 2) = 0)
 Rows Removed by Filter: 19998746
 Worker 0:  actual time=0.016..19.265 rows=94592 loops=1
 Worker 1:  actual time=0.027..31.422 rows=94574 loops=1
 Worker 2:  actual time=0.014..21744.549 rows=99627749 loops=1
 Worker 3:  actual time=0.015..19.347 rows=94586 loops=1  Planning 
Time: 0.098 ms  Execution Time: 91054.828 ms

  *Non-patched CTAS*: please look at worker 0, also 99% tuples read by it.
  explain analyze verbose create table test(a,b,c) as select 
a,floor(random()*(10001-1)+1),c from x where b%2=0;
QUERY PLAN
  
---
   Gather  (cost=1000.00..1942082.77 rows=101 width=16) (actual 
time=0.283..19216.157 rows=13148 loops=1)
   Output: a, floor(((random() * '1'::double precision) + '1'::double 
precision)), c
   Workers Planned: 4
   Workers Launched: 4
   ->  Parallel Seq Scan on public.x  (cost=0.00..1831082.66 rows=25 
width=8) (actual time=0.020..4380.360 rows=2630 loops=5)
 Output: a, c
 Filter: ((x.b % 2) = 0)
 Rows Removed by Filter: 1370
 Worker 0:  actual time=0.013..21805.647 rows=99624833 loops=1
 Worker 1:  actual time=0.016..19.790 rows=94398 loops=1
 Worker 2:  actual time=0.013..35.340 rows=94423 loops=1
 Worker 3:  actual time=0.035..19.849 rows=94679 loops=1  Planning 
Time: 0.083 ms  Execution Time: 91151.097 ms

I'm still working on the performance tests on your patch, if I make some 
progress, I will post my results here.

Regards,
Tang




Re: Parallel Inserts in CREATE TABLE AS

2021-01-12 Thread Luc Vlaming

On 06-01-2021 09:32, Bharath Rupireddy wrote:

On Tue, Jan 5, 2021 at 1:25 PM Luc Vlaming  wrote:

wrt the v18-0002 patch:

It looks like this introduces a state machine that goes like:
- starts at CTAS_PARALLEL_INS_UNDEF
- possibly moves to CTAS_PARALLEL_INS_SELECT
- CTAS_PARALLEL_INS_TUP_COST_CAN_IGN can be added
- if both were added at some stage, we can go to
CTAS_PARALLEL_INS_TUP_COST_IGNORED and ignore the costs

What I'm wondering is why you opted to put the logic around
generate_useful_gather_paths and in cost_gather, when to me it seems more
logical to put it in create_gather_path? I'm probably missing something
there?


IMO, the reason is we want to make sure we only ignore the cost when Gather is 
the top node.
And it seems the generate_useful_gather_paths call in 
apply_scanjoin_target_to_paths is the right place, since it can only create a 
top-node Gather.
So we set the flag in apply_scanjoin_target_to_paths around 
generate_useful_gather_paths to identify the top node.


Right. We wanted to ignore parallel tuple cost for only the upper Gather path.


I was actually wondering if we need the state machine. The reason is that
AFAICS the code could be placed in create_gather_path, where you can also
check if it is a top gather node, whether the dest receiver is the right
type, etc. To me that seems like a nicer solution, as it puts all the logic
that decides whether or not a parallel CTAS is valid in a single place
instead of distributing it over various places.


IMO, we can't determine in create_gather_path whether we are going to
generate the top Gather path. To decide on the top Gather path
generation, I think it's not enough to check root->query_level == 1;
we also need to rely on where generate_useful_gather_paths gets called
from. For instance, for query_level 1, generate_useful_gather_paths
gets called from 2 places in apply_scanjoin_target_to_paths. Likewise,
create_gather_path also gets called from many places. IMO, the current
way, i.e. setting the flag in apply_scanjoin_target_to_paths and
ignoring the cost based on that in cost_gather, seems safe.

I may be wrong. Thoughts?


So the way I understand it the requirements are:
- it needs to be the top-most gather
- it should not do anything with the rows after the gather node as this
would make the parallel inserts conceptually invalid.


Right.


Right now we're trying to judge what might be added on top that could
change the rows by inspecting all parts of the root object that would
cause anything to be added, and adding a little state machine to track
the state of that knowledge. To me this has the downside that the list
in HAS_PARENT_PATH_GENERATING_CLAUSE has to be exhaustive, and we need
to make sure it stays up to date, which could result in regressions if
not tracked carefully.


Right. Any new clause that generates an upper path in grouping_planner
after apply_scanjoin_target_to_paths also needs to be added to
HAS_PARENT_PATH_GENERATING_CLAUSE. Otherwise, we might ignore the
parallel tuple cost, because of which a parallel plan may be chosen,
yet we go for parallel inserts only when the top node is a Gather. I
don't think any newly added clause would generate a new upper Gather
node in grouping_planner after apply_scanjoin_target_to_paths.


Personally I would therefore go for a design which is safe in the sense
that regressions are not as easily introduced. IMHO that could be done
by inspecting the planned query afterwards, and then judging whether or
not the parallel inserts are actually the right thing to do.


The 0001 patch does that. It doesn't have any influence on the planner
for parallel tuple cost calculation, it just looks at the generated
plan and decides on parallel inserts. Having said that, we might miss
parallel plans even though we know that there will not be tuples
transferred from workers to Gather. So, 0002 patch adds the code for
influencing the planner for parallel tuple cost.



Ok. Thanks for the explanation and sorry for the confusion.


Another way to create more safety against regressions would be to add an
assert upon execution of the query that if we do parallel inserts that
only a subset of allowed nodes exists above the gather node.


Yes, we already do this. Please have a look at
SetParallelInsertState() in the 0002 patch. The idea is that if the
planner ignored the tuple cost but we later do not allow parallel
inserts, either because the upper node is not Gather or because it is a
Gather with projections, the assertion fails. So, if any new parent
path generating clause is added (apart from the ones already in
HAS_PARENT_PATH_GENERATING_CLAUSE) and we ignore the tuple cost, this
Assert will catch it. Currently, I don't see any assertion failures in
my debug build with make check and make check-world.



Ok. Seems I missed that assert when reviewing.


+else
+{
+/*
+ * Upper Gather node has projections, so parallel insertions are not
+ * allowed.
+ */

RE: Parallel Inserts in CREATE TABLE AS

2021-01-10 Thread Hou, Zhijie
> Attaching v21 patch set, which has following changes:
> 1) 0001 - changed fpes->ins_cmd_type ==
> PARALLEL_INSERT_CMD_CREATE_TABLE_AS to fpes->ins_cmd_type !=
> PARALLEL_INSERT_CMD_UNDEF
> 2) 0002 - reworded the commit message.
> 3) 0003 - added cmin, xmin test case to one of the parallel insert cases
> to ensure leader and worker insert the tuples in the same xact and replaced
> memory usage output in numbers like 25kB to NkB to make the tests stable.
> 4) 0004 - updated one of the test output to be in NkB and made the assertion
> in SetParallelInsertState to be not under an if condition.
> 
> There's one open point [1] on selective skipping of error "cannot insert
> tuples in a parallel worker" in heap_prepare_insert(), thoughts are
> welcome.
> 
> Please consider the v21 patch set for further review.

Hi,

I took a look into the new patch and have some comments.

1.
+   /*
+* Do not consider tuple cost in case of we intend to perform parallel
+* inserts by workers. We would have turned on the ignore flag in
+* apply_scanjoin_target_to_paths before generating Gather path for the
+* upper level SELECT part of the query.
+*/
+   if ((root->parse->parallelInsCmdTupleCostOpt &
+PARALLEL_INSERT_SELECT_QUERY) &&
+   (root->parse->parallelInsCmdTupleCostOpt &
+PARALLEL_INSERT_CAN_IGN_TUP_COST))

Can we just check PARALLEL_INSERT_CAN_IGN_TUP_COST here?
IMO, PARALLEL_INSERT_CAN_IGN_TUP_COST will be set only when 
PARALLEL_INSERT_SELECT_QUERY is set.
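
That is, the check could be reduced to something like (sketch):

+   if (root->parse->parallelInsCmdTupleCostOpt &
+PARALLEL_INSERT_CAN_IGN_TUP_COST)

assuming the invariant that PARALLEL_INSERT_CAN_IGN_TUP_COST is never
set without PARALLEL_INSERT_SELECT_QUERY also being set.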


2.
+static void
+ParallelInsCmdEstimate(ParallelContext *pcxt, ParallelInsertCmdKind ins_cmd,
+  void *ins_info)
...
+   info = (ParallelInsertCTASInfo *) ins_info;
+   intoclause_str = nodeToString(info->intoclause);
+   intoclause_len = strlen(intoclause_str) + 1;

+static void
+SaveParallelInsCmdInfo(ParallelContext *pcxt, ParallelInsertCmdKind ins_cmd,
+  void *ins_info)
...
+   info = (ParallelInsertCTASInfo *)ins_info;
+   intoclause_str = nodeToString(info->intoclause);
+   intoclause_len = strlen(intoclause_str) + 1;
+   intoclause_space = shm_toc_allocate(pcxt->toc, intoclause_len);

I noticed the above code calls nodeToString and strlen twice, which
seems unnecessary.
Do you think it's better to store the results of nodeToString and
strlen once and pass them where they are used?
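
Something like the following sketch (parameter names are only
illustrative) is what I have in mind, so the serialization happens
once in the caller:

char   *intoclause_str = nodeToString(info->intoclause);
intintoclause_len = strlen(intoclause_str) + 1;

/* then pass intoclause_str and intoclause_len to both
 * ParallelInsCmdEstimate() and SaveParallelInsCmdInfo() instead of
 * recomputing them in each function. */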

3.
+   if (node->need_to_scan_locally || node->nworkers_launched == 0)
+   {
+   EState *estate = node->ps.state;
+   TupleTableSlot *outerTupleSlot;
+
+   for(;;)
+   {
+   /* Install our DSA area while executing the plan. */
+   estate->es_query_dsa =
+   node->pei ? node->pei->area : NULL;
...
+   node->ps.state->es_processed++;
+   }

How about using the local variable estate, i.e. 'estate->es_processed++;',
instead of node->ps.state->es_processed++;?
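
That is, something like (sketch):

EState *estate = node->ps.state;
...
/* estate is already fetched above, so reuse it */
estate->es_processed++;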

Best regards,
houzj







Re: Parallel Inserts in CREATE TABLE AS

2021-01-06 Thread Zhihong Yu
Thanks for the clarification.

w.r.t. the commit message, maybe I was momentarily sidetracked by the
phrase: With this change.
It seems the scenarios you listed are known parallel safety constraints.

Probably rephrase that sentence a little bit to make this clearer.

Cheers

On Wed, Jan 6, 2021 at 8:01 PM Bharath Rupireddy <
bharath.rupireddyforpostg...@gmail.com> wrote:

> On Thu, Jan 7, 2021 at 5:12 AM Zhihong Yu  wrote:
> >
> > Hi,
> > For v20-0002-Tuple-Cost-Adjustment-for-Parallel-Inserts-in-CTAS.patch :
> >
> > workers to Gather node to 0. With this change, there are chances
> > that the planner may choose the parallel plan.
> >
> > It would be nice if the scenarios where a parallel plan is not chosen
> are listed.
>
> There are many reasons, the planner may not choose a parallel plan for
> the select part, for instance if there are temporary tables, parallel
> unsafe functions, or the parallelism GUCs are not set properly,
> foreign tables and so on. see
> https://www.postgresql.org/docs/devel/parallel-safety.html. I don't
> think so, we will add all the scenarios into the commit message.
>
> Having said that, we have extensive comments in the code(especially in
> the function SetParallelInsertState) about when parallel inserts are
> chosen.
>
> + * Parallel insertions are possible only if the upper node is Gather.
>   */
> +if (!IsA(gstate, GatherState))
>  return;
>
> + * Parallelize inserts only when the upper Gather node has no
> projections.
>   */
> +if (!gstate->ps.ps_ProjInfo)
> +{
> +/* Okay to parallelize inserts, so mark it. */
> +if (ins_cmd == PARALLEL_INSERT_CMD_CREATE_TABLE_AS)
> +((DR_intorel *) dest)->is_parallel = true;
> +
> +/*
> + * For parallelizing inserts, we must send some information so
> that the
> + * workers can build their own dest receivers. For CTAS, this
> info is
> + * into clause, object id (to open the created table).
> + *
> + * Since the required information is available in the dest
> receiver,
> + * store a reference to it in the Gather state so that it will be
> used
> + * in ExecInitParallelPlan to pick the information.
> + */
> +gstate->dest = dest;
> +}
> +else
> +{
> +/*
> + * Upper Gather node has projections, so parallel insertions are
> not
> + * allowed.
> + */
>
> > +   if ((root->parse->parallelInsCmdTupleCostOpt &
> > +PARALLEL_INSERT_SELECT_QUERY) &&
> > +   (root->parse->parallelInsCmdTupleCostOpt &
> > +PARALLEL_INSERT_CAN_IGN_TUP_COST))
> > +   {
> > +   /* We are ignoring the parallel tuple cost, so mark it. */
> > +   root->parse->parallelInsCmdTupleCostOpt |=
> > +
>  PARALLEL_INSERT_TUP_COST_IGNORED;
> >
> > If I read the code correctly, when both PARALLEL_INSERT_SELECT_QUERY and
> PARALLEL_INSERT_CAN_IGN_TUP_COST are set, PARALLEL_INSERT_TUP_COST_IGNORED
> is implied.
> >
> > Maybe we don't need the PARALLEL_INSERT_TUP_COST_IGNORED enum - the
> setting (1) of the first two bits should suffice.
>
> The way these flags work is as follows: before planning in CTAS, we
> set PARALLEL_INSERT_SELECT_QUERY, before we go for generating upper
> gather path we set PARALLEL_INSERT_CAN_IGN_TUP_COST, and when we
> actually ignored the tuple cost in cost_gather we set
> PARALLEL_INSERT_TUP_COST_IGNORED. There are chances that we set
> PARALLEL_INSERT_CAN_IGN_TUP_COST before calling
> generate_useful_gather_paths, and the function
> generate_useful_gather_paths can return before reaching cost_gather,
> see below snippets. So, we need the 3 flags.
>
> void
> generate_useful_gather_paths(PlannerInfo *root, RelOptInfo *rel, bool
> override_rows)
> {
> ListCell   *lc;
> doublerows;
> double   *rowsp = NULL;
> List   *useful_pathkeys_list = NIL;
> Path   *cheapest_partial_path = NULL;
>
> /* If there are no partial paths, there's nothing to do here. */
> if (rel->partial_pathlist == NIL)
> return;
>
> /* Should we override the rel's rowcount estimate? */
> if (override_rows)
> rowsp = &rows;
>
> /* generate the regular gather (merge) paths */
> generate_gather_paths(root, rel, override_rows);
>
> void
> generate_gather_paths(PlannerInfo *root, RelOptInfo *rel, bool
> override_rows)
> {
> Path   *cheapest_partial_path;
> Path   *simple_gather_path;
> ListCell   *lc;
> doublerows;
> double   *rowsp = NULL;
>
> /* If there are no partial paths, there's nothing to do here. */
> if (rel->partial_pathlist == NIL)
> return;
>
>
> With Regards,
> Bharath Rupireddy.
> EnterpriseDB: http://www.enterprisedb.com
>


Re: Parallel Inserts in CREATE TABLE AS

2021-01-06 Thread Bharath Rupireddy
On Thu, Jan 7, 2021 at 5:12 AM Zhihong Yu  wrote:
>
> Hi,
> For v20-0002-Tuple-Cost-Adjustment-for-Parallel-Inserts-in-CTAS.patch :
>
> workers to Gather node to 0. With this change, there are chances
> that the planner may choose the parallel plan.
>
> It would be nice if the scenarios where a parallel plan is not chosen are 
> listed.

There are many reasons the planner may not choose a parallel plan for
the select part: for instance, if there are temporary tables,
parallel-unsafe functions, or foreign tables, or if the parallelism
GUCs are not set properly, and so on; see
https://www.postgresql.org/docs/devel/parallel-safety.html. I don't
think we will add all of those scenarios to the commit message.

Having said that, we have extensive comments in the code(especially in
the function SetParallelInsertState) about when parallel inserts are
chosen.

+ * Parallel insertions are possible only if the upper node is Gather.
  */
+if (!IsA(gstate, GatherState))
 return;

+ * Parallelize inserts only when the upper Gather node has no projections.
  */
+if (!gstate->ps.ps_ProjInfo)
+{
+/* Okay to parallelize inserts, so mark it. */
+if (ins_cmd == PARALLEL_INSERT_CMD_CREATE_TABLE_AS)
+((DR_intorel *) dest)->is_parallel = true;
+
+/*
+ * For parallelizing inserts, we must send some information so that the
+ * workers can build their own dest receivers. For CTAS, this info is
+ * into clause, object id (to open the created table).
+ *
+ * Since the required information is available in the dest receiver,
+ * store a reference to it in the Gather state so that it will be used
+ * in ExecInitParallelPlan to pick the information.
+ */
+gstate->dest = dest;
+}
+else
+{
+/*
+ * Upper Gather node has projections, so parallel insertions are not
+ * allowed.
+ */

> +   if ((root->parse->parallelInsCmdTupleCostOpt &
> +PARALLEL_INSERT_SELECT_QUERY) &&
> +   (root->parse->parallelInsCmdTupleCostOpt &
> +PARALLEL_INSERT_CAN_IGN_TUP_COST))
> +   {
> +   /* We are ignoring the parallel tuple cost, so mark it. */
> +   root->parse->parallelInsCmdTupleCostOpt |=
> +   PARALLEL_INSERT_TUP_COST_IGNORED;
>
> If I read the code correctly, when both PARALLEL_INSERT_SELECT_QUERY and 
> PARALLEL_INSERT_CAN_IGN_TUP_COST are set, PARALLEL_INSERT_TUP_COST_IGNORED is 
> implied.
>
> Maybe we don't need the PARALLEL_INSERT_TUP_COST_IGNORED enum - the setting 
> (1) of the first two bits should suffice.

The way these flags work is as follows: before planning in CTAS, we
set PARALLEL_INSERT_SELECT_QUERY, before we go for generating upper
gather path we set PARALLEL_INSERT_CAN_IGN_TUP_COST, and when we
actually ignored the tuple cost in cost_gather we set
PARALLEL_INSERT_TUP_COST_IGNORED. There are chances that we set
PARALLEL_INSERT_CAN_IGN_TUP_COST before calling
generate_useful_gather_paths, and the function
generate_useful_gather_paths can return before reaching cost_gather,
see below snippets. So, we need the 3 flags.

void
generate_useful_gather_paths(PlannerInfo *root, RelOptInfo *rel, bool
override_rows)
{
ListCell   *lc;
doublerows;
double   *rowsp = NULL;
List   *useful_pathkeys_list = NIL;
Path   *cheapest_partial_path = NULL;

/* If there are no partial paths, there's nothing to do here. */
if (rel->partial_pathlist == NIL)
return;

/* Should we override the rel's rowcount estimate? */
if (override_rows)
rowsp = &rows;

/* generate the regular gather (merge) paths */
generate_gather_paths(root, rel, override_rows);

void
generate_gather_paths(PlannerInfo *root, RelOptInfo *rel, bool override_rows)
{
Path   *cheapest_partial_path;
Path   *simple_gather_path;
ListCell   *lc;
doublerows;
double   *rowsp = NULL;

/* If there are no partial paths, there's nothing to do here. */
if (rel->partial_pathlist == NIL)
return;
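
To summarize, the three bits are roughly the following (the values
below are only illustrative, not necessarily what the patch defines):

/* set before planning the SELECT part of the insert command */
#define PARALLEL_INSERT_SELECT_QUERY  0x01
/* set around generating the top-level Gather path */
#define PARALLEL_INSERT_CAN_IGN_TUP_COST  0x02
/* set in cost_gather once the tuple cost was actually ignored */
#define PARALLEL_INSERT_TUP_COST_IGNORED  0x04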


With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com




Re: Parallel Inserts in CREATE TABLE AS

2021-01-06 Thread Zhihong Yu
Hi,
For v20-0002-Tuple-Cost-Adjustment-for-Parallel-Inserts-in-CTAS.patch :

workers to Gather node to 0. With this change, there are chances
that the planner may choose the parallel plan.

It would be nice if the scenarios where parallel plan is not chosen are
listed.

+   if ((root->parse->parallelInsCmdTupleCostOpt &
+PARALLEL_INSERT_SELECT_QUERY) &&
+   (root->parse->parallelInsCmdTupleCostOpt &
+PARALLEL_INSERT_CAN_IGN_TUP_COST))
+   {
+   /* We are ignoring the parallel tuple cost, so mark it. */
+   root->parse->parallelInsCmdTupleCostOpt |=
+
PARALLEL_INSERT_TUP_COST_IGNORED;

If I read the code correctly, when both PARALLEL_INSERT_SELECT_QUERY
and PARALLEL_INSERT_CAN_IGN_TUP_COST are
set, PARALLEL_INSERT_TUP_COST_IGNORED is implied.

Maybe we don't need the PARALLEL_INSERT_TUP_COST_IGNORED enum - the setting
(1) of the first two bits should suffice.

Cheers

On Mon, Jan 4, 2021 at 7:59 PM Bharath Rupireddy <
bharath.rupireddyforpostg...@gmail.com> wrote:

> On Mon, Jan 4, 2021 at 7:02 PM Bharath Rupireddy
>  wrote:
> >
> > > +   if
> (IS_PARALLEL_CTAS_DEST(gstate->dest) &&
> > > +   ((DR_intorel *)
> gstate->dest)->into->rel &&
> > > +   ((DR_intorel *)
> gstate->dest)->into->rel->relname)
> > > why would rel and relname not be there? if no rows have been inserted?
> > > because it seems from the intorel_startup function that that would be
> > > set as soon as startup was done, which i assume (wrongly?) is always
> done?
> >
> > Actually, that into clause rel variable is always being set in the
> gram.y for CTAS, Create Materialized View and SELECT INTO (because
> qualified_name non-terminal is not optional). My bad. I just added it as a
> sanity check. Actually, it's not required.
> >
> > create_as_target:
> > qualified_name opt_column_list table_access_method_clause
> > OptWith OnCommitOption OptTableSpace
> > {
> > $$ = makeNode(IntoClause);
> > $$->rel = $1;
> > create_mv_target:
> > qualified_name opt_column_list table_access_method_clause
> opt_reloptions OptTableSpace
> > {
> > $$ = makeNode(IntoClause);
> > $$->rel = $1;
> > into_clause:
> > INTO OptTempTableName
> > {
> > $$ = makeNode(IntoClause);
> >$$->rel = $2;
> >
> > I will change the below code:
> > +if (GetParallelInsertCmdType(gstate->dest) ==
> > +PARALLEL_INSERT_CMD_CREATE_TABLE_AS &&
> > +((DR_intorel *) gstate->dest)->into &&
> > +((DR_intorel *) gstate->dest)->into->rel &&
> > +((DR_intorel *)
> gstate->dest)->into->rel->relname)
> > +{
> >
> > to:
> > +if (GetParallelInsertCmdType(gstate->dest) ==
> > +PARALLEL_INSERT_CMD_CREATE_TABLE_AS)
> > +{
> >
> > I will update this in the next version of the patch set.
>
> Attaching v20 patch set that has above change in 0001 patch, note that
> 0002 to 0004 patches have no changes from v19. Please consider the v20
> patch set for further review.
>
>
> With Regards,
> Bharath Rupireddy.
> EnterpriseDB: http://www.enterprisedb.com
>


Re: Parallel Inserts in CREATE TABLE AS

2021-01-06 Thread Bharath Rupireddy
On Tue, Jan 5, 2021 at 1:25 PM Luc Vlaming  wrote:
>  wrt v18-0002patch:
> 
>  It looks like this introduces a state machine that goes like:
>  - starts at CTAS_PARALLEL_INS_UNDEF
>  - possibly moves to CTAS_PARALLEL_INS_SELECT
>  - CTAS_PARALLEL_INS_TUP_COST_CAN_IGN can be added
>  - if both were added at some stage, we can go to
>  CTAS_PARALLEL_INS_TUP_COST_IGNORED and ignore the costs
> 
>  what i'm wondering is why you opted to put logic around
>  generate_useful_gather_paths and in cost_gather when to me it seems more
>  logical to put it in create_gather_path? i'm probably missing something
>  there?
> >>>
> >>> IMO, The reason is we want to make sure we only ignore the cost when 
> >>> Gather is the top node.
> >>> And it seems the generate_useful_gather_paths called in 
> >>> apply_scanjoin_target_to_paths is the right place which can only create 
> >>> top node Gather.
> >>> So we change the flag in apply_scanjoin_target_to_paths around 
> >>> generate_useful_gather_paths to identify the top node.
> >
> > Right. We wanted to ignore parallel tuple cost for only the upper Gather 
> > path.
> >
> >> I was wondering actually if we need the state machine. Reason is that as
> >> AFAICS the code could be placed in create_gather_path, where you can
> >> also check if it is a top gather node, whether the dest receiver is the
> >> right type, etc? To me that seems like a nicer solution as its makes
> >> that all logic that decides whether or not a parallel CTAS is valid is
> >> in a single place instead of distributed over various places.
> >
> > IMO, we can't determine the fact that we are going to generate the top
> > Gather path in create_gather_path. To decide on whether or not the top
> > Gather path generation, I think it's not only required to check the
> > root->query_level == 1 but we also need to rely on from where
> > generate_useful_gather_paths gets called. For instance, for
> > query_level 1, generate_useful_gather_paths gets called from 2 places
> > in apply_scanjoin_target_to_paths. Likewise, create_gather_path also
> > gets called from many places. IMO, the current way i.e. setting flag
> > it in apply_scanjoin_target_to_paths and ignoring based on that in
> > cost_gather seems safe.
> >
> > I may be wrong. Thoughts?
>
> So the way I understand it the requirements are:
> - it needs to be the top-most gather
> - it should not do anything with the rows after the gather node as this
> would make the parallel inserts conceptually invalid.

Right.

> Right now we're trying to judge what might be added on-top that could
> change the rows by inspecting all parts of the root object that would
> cause anything to be added, and add a little statemachine to track the
> state of that knowledge. To me this has the downside that the list in
> HAS_PARENT_PATH_GENERATING_CLAUSE has to be exhaustive, and we need to
> make sure it stays up-to-date, which could result in regressions if not
> tracked carefully.

Right. Any new clause added later that generates an upper path in
grouping_planner after apply_scanjoin_target_to_paths also needs to be
added to HAS_PARENT_PATH_GENERATING_CLAUSE. Otherwise, we might ignore
the parallel tuple cost, because of which a parallel plan may be chosen
even though we go for parallel inserts only when the top node is
Gather. That said, I don't think any newly added clause would generate
a new upper Gather node in grouping_planner after
apply_scanjoin_target_to_paths.
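
For reference, such a macro essentially has to cover every clause that
makes grouping_planner add a path above the scan/join result. A sketch
of what that list looks like (not necessarily the exact contents of
the patch) is:

#define HAS_PARENT_PATH_GENERATING_CLAUSE(root) \
((root)->parse->hasAggs || \
 (root)->parse->groupClause != NIL || \
 (root)->parse->groupingSets != NIL || \
 (root)->parse->hasWindowFuncs || \
 (root)->parse->distinctClause != NIL || \
 (root)->parse->sortClause != NIL || \
 (root)->parse->rowMarks != NIL || \
 (root)->parse->limitCount != NULL || \
 (root)->parse->limitOffset != NULL)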

> Personally I would therefore go for a design which is safe in the sense
> that regressions are not as easily introduced. IMHO that could be done
> by inspecting the planned query afterwards, and then judging whether or
> not the parallel inserts are actually the right thing to do.

The 0001 patch does that. It doesn't have any influence on the planner
for parallel tuple cost calculation, it just looks at the generated
plan and decides on parallel inserts. Having said that, we might miss
parallel plans even though we know that there will not be tuples
transferred from workers to Gather. So, 0002 patch adds the code for
influencing the planner for parallel tuple cost.

> Another way to create more safety against regressions would be to add an
> assert upon execution of the query that if we do parallel inserts that
> only a subset of allowed nodes exists above the gather node.

Yes, we already do this. Please have a look at
SetParallelInsertState() in the 0002 patch. The idea is that if the
planner ignored the tuple cost but we later do not allow parallel
inserts, either because the upper node is not Gather or because it is a
Gather with projections, the assertion fails. So, if any new parent
path generating clause is added (apart from the ones already in
HAS_PARENT_PATH_GENERATING_CLAUSE) and we ignore the tuple cost, this
Assert will catch it. Currently, I don't see any assertion failures in
my debug build with make check and make check-world.


RE: Parallel Inserts in CREATE TABLE AS

2021-01-05 Thread Hou, Zhijie
> >
> > +   /* Okay to parallelize inserts, so mark it. */
> > +   if (ins_cmd == PARALLEL_INSERT_CMD_CREATE_TABLE_AS)
> > +   ((DR_intorel *) dest)->is_parallel = true;
> >
> > +   if (ins_cmd == PARALLEL_INSERT_CMD_CREATE_TABLE_AS)
> > +   ((DR_intorel *) dest)->is_parallel = false;
> 
> We need to know exactly what is the command in above place, to dereference
> and mark is_parallel to true, because is_parallel is being added to the
> respective structures, not to the generic _DestReceiver structure. So, in
> future the above code becomes something like below:
> 
> +/* Okay to parallelize inserts, so mark it. */
> +if (ins_cmd == PARALLEL_INSERT_CMD_CREATE_TABLE_AS)
> +((DR_intorel *) dest)->is_parallel = true;
> +else if (ins_cmd == PARALLEL_INSERT_CMD_REFRESH_MAT_VIEW)
> +((DR_transientrel *) dest)->is_parallel = true;
> +else if (ins_cmd == PARALLEL_INSERT_CMD_COPY_TO)
> +((DR_copy *) dest)->is_parallel = true;
> 
> In the below place, instead of new function, I think we can just have
> something like if (fpes->ins_cmd_type != PARALLEL_INSERT_CMD_UNDEF)
> 
> > Or
> >
> > +   if (fpes->ins_cmd_type == PARALLEL_INSERT_CMD_CREATE_TABLE_AS)
> > +   pg_atomic_add_fetch_u64(&fpes->processed,
> > + queryDesc->estate->es_processed);
> >
> > If you think the above code will extend the ins_cmd type check in the
> future, the generic function may make sense.
> 
> We can also change below to fpes->ins_cmd_type !=
> PARALLEL_INSERT_CMD_UNDEF.
> 
> +if (fpes->ins_cmd_type == PARALLEL_INSERT_CMD_CREATE_TABLE_AS)
> +receiver = ExecParallelGetInsReceiver(toc, fpes);
> 
> If okay, I will modify it in the next version of the patch.

Yes, that looks good to me.

Best regards,
houzj




Re: Parallel Inserts in CREATE TABLE AS

2021-01-05 Thread Luc Vlaming

On 05-01-2021 13:57, Bharath Rupireddy wrote:

On Tue, Jan 5, 2021 at 1:00 PM Luc Vlaming  wrote:

Reviewing further v20-0001:

I would still opt for moving the code for the parallel worker into a
separate function, and then setting rStartup of the dest receiver to
that function in ExecParallelGetInsReceiver, as its completely
independent code. Just a matter of style I guess.


If we were to have a intorel_startup_worker and assign it to
self->pub.rStartup, 1) we can do it in the CreateIntoRelDestReceiver,
we have to pass a parameter to CreateIntoRelDestReceiver as an
indication of parallel worker, which requires code changes in places
wherever CreateIntoRelDestReceiver is used. 2) we can also assign
intorel_startup_worker after CreateIntoRelDestReceiver in
ExecParallelGetInsReceiver, but that doesn't look good to me. 3) we
can duplicate CreateIntoRelDestReceiver and have a
CreateIntoRelParallelDestReceiver with the only change being that
self->pub.rStartup = intorel_startup_worker;

IMHO, the way it is currently, looks good. Anyways, I'm open to
changing that if we agree on any of the above 3 ways.


The current way is good enough, it was a suggestion as personally I find 
it hard to read to have two completely separate code paths in the same 
function. If any I would opt for something like 3) where there's a 
CreateIntoRelParallelDestReceiver which calls CreateIntoRelDestReceiver 
and then overrides rStartup to intorel_startup_worker. Then no callsites 
have to change except the ones that are for parallel workers.
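
Something along these lines is what I mean (just a sketch; the worker
startup routine name is assumed from the earlier discussion):

DestReceiver *
CreateIntoRelParallelDestReceiver(IntoClause *intoClause)
{
DestReceiver *self = CreateIntoRelDestReceiver(intoClause);

/* workers only need the trimmed-down startup behaviour */
self->rStartup = intorel_startup_worker;
return self;
}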


If we were to do any of the above, then we might have to do the same
thing for other commands Refresh Materialized View or Copy To where we
can parallelize.

Thoughts?


Maybe I'm not completely following why but afaics we want parallel
inserts in various scenarios, not just CTAS? I'm asking because code like
+   if (fpes->ins_cmd_type == PARALLEL_INSERT_CMD_CREATE_TABLE_AS)
+   pg_atomic_add_fetch_u64(&fpes->processed,
queryDesc->estate->es_processed);
seems very specific to CTAS. For now that seems fine but I suppose that
would be generalized soon after? Basically I would have expected the if
to compare against PARALLEL_INSERT_CMD_UNDEF.


After this patch is reviewed and goes for commit, then the next thing
I plan to do is to allow parallel inserts in Refresh Materialized View
and it can be used for that. I think the processed variable can also
be used for parallel inserts in INSERT INTO SELECT [1] as well.
Currently, I'm keeping it for CTAS, maybe later (after this is
committed) it can be generalized.

Thoughts?


Sounds good



[1] - 
https://www.postgresql.org/message-id/CAA4eK1LMmz58ej5BgVLJ8VsUGd%3D%2BKcaA8X%3DkStORhxpfpODOxg%40mail.gmail.com


Apart from these small things v20-0001 looks (very) good to me.
v20-0003 and v20-0004:
looks good to me.


Thanks.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com



Kind regards,
Luc




Re: Parallel Inserts in CREATE TABLE AS

2021-01-05 Thread Bharath Rupireddy
On Wed, Jan 6, 2021 at 11:30 AM Hou, Zhijie  wrote:
>
> > > I think it makes sense.
> > >
> > > And if the check about ' ins_cmd == xxx1 || ins_cmd == xxx2' may be
> > > used in some places, How about define a generic function with some comment
> > to mention the purpose.
> > >
> > > An example in INSERT INTO SELECT patch:
> > > +/*
> > > + * IsModifySupportedInParallelMode
> > > + *
> > > + * Indicates whether execution of the specified table-modification
> > > +command
> > > + * (INSERT/UPDATE/DELETE) in parallel-mode is supported, subject to
> > > +certain
> > > + * parallel-safety conditions.
> > > + */
> > > +static inline bool
> > > +IsModifySupportedInParallelMode(CmdType commandType) {
> > > +   /* Currently only INSERT is supported */
> > > +   return (commandType == CMD_INSERT); }
> >
> > The intention of assert is to verify that those functions are called for
> > appropriate commands such as CTAS, Refresh Mat View and so on with correct
> > parameters. I really don't think so we can replace the assert with a 
> > function
> > like above, in the release mode assertion will always be true. In a way,
> > that assertion is for only debugging purposes. And I also think that when
> > we as the callers know when to call those new functions, we can even remove
> > the assertions, if they are really a problem here. Thoughts?
> Hi
>
> Thanks for the explanation.
>
> If the check about command type is only used in assert, I think you are right.
> I suggested a new function because I guess the check can be used in some 
> other places.
> Such as:
>
> +   /* Okay to parallelize inserts, so mark it. */
> +   if (ins_cmd == PARALLEL_INSERT_CMD_CREATE_TABLE_AS)
> +   ((DR_intorel *) dest)->is_parallel = true;
>
> +   if (ins_cmd == PARALLEL_INSERT_CMD_CREATE_TABLE_AS)
> +   ((DR_intorel *) dest)->is_parallel = false;

We need to know exactly which command it is in the above place, to
dereference the right structure and mark is_parallel true, because
is_parallel is being added to the respective structures, not to the
generic _DestReceiver structure. So, in future the above code becomes
something like below:

+/* Okay to parallelize inserts, so mark it. */
+if (ins_cmd == PARALLEL_INSERT_CMD_CREATE_TABLE_AS)
+((DR_intorel *) dest)->is_parallel = true;
+else if (ins_cmd == PARALLEL_INSERT_CMD_REFRESH_MAT_VIEW)
+((DR_transientrel *) dest)->is_parallel = true;
+else if (ins_cmd == PARALLEL_INSERT_CMD_COPY_TO)
+((DR_copy *) dest)->is_parallel = true;

In the below place, instead of new function, I think we can just have
something like if (fpes->ins_cmd_type != PARALLEL_INSERT_CMD_UNDEF)

> Or
>
> +   if (fpes->ins_cmd_type == PARALLEL_INSERT_CMD_CREATE_TABLE_AS)
> +   pg_atomic_add_fetch_u64(&fpes->processed,
> queryDesc->estate->es_processed);
>
> If you think the above code will extend the ins_cmd type check in the future, 
> the generic function may make sense.

We can also change below to fpes->ins_cmd_type != PARALLEL_INSERT_CMD_UNDEF.

+if (fpes->ins_cmd_type == PARALLEL_INSERT_CMD_CREATE_TABLE_AS)
+receiver = ExecParallelGetInsReceiver(toc, fpes);

If okay, I will modify it in the next version of the patch.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com




Re: Parallel Inserts in CREATE TABLE AS

2021-01-05 Thread Dilip Kumar
On Wed, Jan 6, 2021 at 11:26 AM Bharath Rupireddy
 wrote:
>
> On Wed, Jan 6, 2021 at 10:17 AM Dilip Kumar  wrote:
> >
> > On Wed, Jan 6, 2021 at 9:23 AM Bharath Rupireddy
> >  wrote:
> > >
> >
> > +/*
> > + * List the commands here for which parallel insertions are possible.
> > + */
> > +typedef enum ParallelInsertCmdKind
> > +{
> > + PARALLEL_INSERT_CMD_UNDEF = 0,
> > + PARALLEL_INSERT_CMD_CREATE_TABLE_AS
> > +} ParallelInsertCmdKind;
> >
> > I see there is some code that is generic for CTAS and INSERT INTO
> > SELECT *,  So is it
> > possible to take out that common code to a separate base patch?  Later
> > both CTAS and INSERT INTO SELECT * can expand
> > that for their usage.
>
> I currently see the common code for parallel inserts i.e. insert into
> selects, copy, ctas/create mat view/refresh mat view is the code in -
> heapam.c, xact.c and xact.h. I can make a separate patch if required
> for these changes alone. Thoughts?

I just saw this structure (ParallelInsertCmdKind) where it is defining
the ParallelInsertCmdKind and also usage is different based on the
command type.  So I think the code which is defining the generic code
e.g. this structure and other similar code can go to the first patch
and we can build the remaining patch atop that patch.  But if you
think this is just this structure and not much code is common then we
can let it be.

> IIRC parallel inserts in insert into select and copy don't use the
> design idea of pushing the dest receiver down to Gather. Whereas
> ctas/create mat view, refresh mat view, copy to can use the idea of
> pushing the dest receiver to Gather and can easily extend on the
> patches I made here.
>
> Is there anything else do you feel that we can have in common?

Nothing specific.


-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




RE: Parallel Inserts in CREATE TABLE AS

2021-01-05 Thread Hou, Zhijie
> > I think it makes sense.
> >
> > And if the check about ' ins_cmd == xxx1 || ins_cmd == xxx2' may be
> > used in some places, How about define a generic function with some comment
> to mention the purpose.
> >
> > An example in INSERT INTO SELECT patch:
> > +/*
> > + * IsModifySupportedInParallelMode
> > + *
> > + * Indicates whether execution of the specified table-modification
> > +command
> > + * (INSERT/UPDATE/DELETE) in parallel-mode is supported, subject to
> > +certain
> > + * parallel-safety conditions.
> > + */
> > +static inline bool
> > +IsModifySupportedInParallelMode(CmdType commandType) {
> > +   /* Currently only INSERT is supported */
> > +   return (commandType == CMD_INSERT); }
> 
> The intention of assert is to verify that those functions are called for
> appropriate commands such as CTAS, Refresh Mat View and so on with correct
> parameters. I really don't think so we can replace the assert with a function
> like above, in the release mode assertion will always be true. In a way,
> that assertion is for only debugging purposes. And I also think that when
> we as the callers know when to call those new functions, we can even remove
> the assertions, if they are really a problem here. Thoughts?
Hi

Thanks for the explanation.

If the check about command type is only used in assert, I think you are right.
I suggested a new function because I guess the check can be used in some other 
places.
Such as:

+   /* Okay to parallelize inserts, so mark it. */
+   if (ins_cmd == PARALLEL_INSERT_CMD_CREATE_TABLE_AS)
+   ((DR_intorel *) dest)->is_parallel = true;

+   if (ins_cmd == PARALLEL_INSERT_CMD_CREATE_TABLE_AS)
+   ((DR_intorel *) dest)->is_parallel = false;

Or

+   if (fpes->ins_cmd_type == PARALLEL_INSERT_CMD_CREATE_TABLE_AS)
+   pg_atomic_add_fetch_u64(&fpes->processed,
queryDesc->estate->es_processed);

If you think the above code will extend the ins_cmd type check in the future, 
the generic function may make sense.

Best regards,
houzj 





Re: Parallel Inserts in CREATE TABLE AS

2021-01-05 Thread Bharath Rupireddy
On Wed, Jan 6, 2021 at 10:17 AM Dilip Kumar  wrote:
>
> On Wed, Jan 6, 2021 at 9:23 AM Bharath Rupireddy
>  wrote:
> >
>
> +/*
> + * List the commands here for which parallel insertions are possible.
> + */
> +typedef enum ParallelInsertCmdKind
> +{
> + PARALLEL_INSERT_CMD_UNDEF = 0,
> + PARALLEL_INSERT_CMD_CREATE_TABLE_AS
> +} ParallelInsertCmdKind;
>
> I see there is some code that is generic for CTAS and INSERT INTO
> SELECT *,  So is it
> possible to take out that common code to a separate base patch?  Later
> both CTAS and INSERT INTO SELECT * can expand
> that for their usage.

The common code I currently see for parallel inserts (i.e. INSERT INTO
SELECT, COPY, and CTAS/CREATE MATERIALIZED VIEW/REFRESH MATERIALIZED
VIEW) is the code in heapam.c, xact.c and xact.h. I can make a separate
patch for these changes alone if required. Thoughts?

IIRC parallel inserts in insert into select and copy don't use the
design idea of pushing the dest receiver down to Gather. Whereas
ctas/create mat view, refresh mat view, copy to can use the idea of
pushing the dest receiver to Gather and can easily extend on the
patches I made here.

Is there anything else you feel that we can have in common?

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com




Re: Parallel Inserts in CREATE TABLE AS

2021-01-05 Thread Bharath Rupireddy
On Wed, Jan 6, 2021 at 11:06 AM Hou, Zhijie  wrote:
>
> > > For v20-0001-Parallel-Inserts-in-CREATE-TABLE-AS.patch :
> > >
> > > ParallelInsCmdEstimate :
> > >
> > > +   Assert(pcxt && ins_info &&
> > > +  (ins_cmd == PARALLEL_INSERT_CMD_CREATE_TABLE_AS));
> > > +
> > > +   if (ins_cmd == PARALLEL_INSERT_CMD_CREATE_TABLE_AS)
> > >
> > > Since the if condition is covered by the assertion, I wonder why the if
> > check is still needed.
> > >
> > > Similar comment for SaveParallelInsCmdFixedInfo and
> > > SaveParallelInsCmdInfo
> >
> > Thanks.
> >
> > The idea is to have assertion with all the expected ins_cmd types, and then
> > later to have selective handling for different ins_cmds. For example, if
> > we add (in future) parallel insertion in Refresh Materialized View, then
> > the code in those functions will be something
> > like:
> >
> > +static void
> > +ParallelInsCmdEstimate(ParallelContext *pcxt, ParallelInsertCmdKind
> > ins_cmd,
> > +   void *ins_info)
> > +{
> > +Assert(pcxt && ins_info &&
> > +   (ins_cmd == PARALLEL_INSERT_CMD_CREATE_TABLE_AS ||
> > +ins_cmd == PARALLEL_INSERT_CMD_REFRESH_MAT_VIEW));
> > +
> > +if (ins_cmd == PARALLEL_INSERT_CMD_CREATE_TABLE_AS)
> > +{
> > +
> > +}
> > +   else if (ins_cmd == PARALLEL_INSERT_CMD_REFRESH_MAT_VIEW)
> > +   {
> > +
> > +   }
> >
> > Similarly for other functions as well.
>
> I think it makes sense.
>
> And if the check about ' ins_cmd == xxx1 || ins_cmd == xxx2' may be used in 
> some places,
> How about define a generic function with some comment to mention the purpose.
>
> An example in INSERT INTO SELECT patch:
> +/*
> + * IsModifySupportedInParallelMode
> + *
> + * Indicates whether execution of the specified table-modification command
> + * (INSERT/UPDATE/DELETE) in parallel-mode is supported, subject to certain
> + * parallel-safety conditions.
> + */
> +static inline bool
> +IsModifySupportedInParallelMode(CmdType commandType)
> +{
> +   /* Currently only INSERT is supported */
> +   return (commandType == CMD_INSERT);
> +}

The intention of the assert is to verify that those functions are
called for appropriate commands such as CTAS, Refresh Mat View and so
on with correct parameters. I really don't think we can replace the
assert with a function like the above; in release mode the assertion is
effectively always true. In a way, that assertion is only for debugging
purposes. And I also think that since we, as the callers, know when to
call those new functions, we can even remove the assertions if they are
really a problem here. Thoughts?

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com




Re: Parallel Inserts in CREATE TABLE AS

2021-01-05 Thread Bharath Rupireddy
On Wed, Jan 6, 2021 at 10:05 AM Zhihong Yu  wrote:
>
> The plan sounds good.
>
> Before the second command type is added, can you leave out the 'if (ins_cmd 
> == PARALLEL_INSERT_CMD_CREATE_TABLE_AS)' and keep the pair of curlies ?
>
> You can add the if condition back when the second command type is added.

Thanks.

IMO, an empty pair of curlies is not a good idea. Having if (ins_cmd
== PARALLEL_INSERT_CMD_CREATE_TABLE_AS) doesn't harm anything.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com




RE: Parallel Inserts in CREATE TABLE AS

2021-01-05 Thread Hou, Zhijie
> > For v20-0001-Parallel-Inserts-in-CREATE-TABLE-AS.patch :
> >
> > ParallelInsCmdEstimate :
> >
> > +   Assert(pcxt && ins_info &&
> > +  (ins_cmd == PARALLEL_INSERT_CMD_CREATE_TABLE_AS));
> > +
> > +   if (ins_cmd == PARALLEL_INSERT_CMD_CREATE_TABLE_AS)
> >
> > Since the if condition is covered by the assertion, I wonder why the if
> check is still needed.
> >
> > Similar comment for SaveParallelInsCmdFixedInfo and
> > SaveParallelInsCmdInfo
> 
> Thanks.
> 
> The idea is to have assertion with all the expected ins_cmd types, and then
> later to have selective handling for different ins_cmds. For example, if
> we add (in future) parallel insertion in Refresh Materialized View, then
> the code in those functions will be something
> like:
> 
> +static void
> +ParallelInsCmdEstimate(ParallelContext *pcxt, ParallelInsertCmdKind
> ins_cmd,
> +   void *ins_info)
> +{
> +Assert(pcxt && ins_info &&
> +   (ins_cmd == PARALLEL_INSERT_CMD_CREATE_TABLE_AS ||
> +ins_cmd == PARALLEL_INSERT_CMD_REFRESH_MAT_VIEW));
> +
> +if (ins_cmd == PARALLEL_INSERT_CMD_CREATE_TABLE_AS)
> +{
> +
> +}
> +   else if (ins_cmd == PARALLEL_INSERT_CMD_REFRESH_MAT_VIEW)
> +   {
> +
> +   }
> 
> Similarly for other functions as well.

I think it makes sense.

And if the check about ' ins_cmd == xxx1 || ins_cmd == xxx2' may be used in 
some places,
How about define a generic function with some comment to mention the purpose.

An example in INSERT INTO SELECT patch:
+/*
+ * IsModifySupportedInParallelMode
+ *
+ * Indicates whether execution of the specified table-modification command
+ * (INSERT/UPDATE/DELETE) in parallel-mode is supported, subject to certain
+ * parallel-safety conditions.
+ */
+static inline bool
+IsModifySupportedInParallelMode(CmdType commandType)
+{
+   /* Currently only INSERT is supported */
+   return (commandType == CMD_INSERT);
+}
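
For the parallel-insert commands here, an equivalent helper could look
like this (sketch, the name is hypothetical):

+/*
+ * IsSupportedParallelInsertCmd
+ *
+ * Indicates whether the given parallel insert command kind is currently
+ * supported.
+ */
+static inline bool
+IsSupportedParallelInsertCmd(ParallelInsertCmdKind ins_cmd)
+{
+   /* Currently only CTAS is supported */
+   return (ins_cmd == PARALLEL_INSERT_CMD_CREATE_TABLE_AS);
+}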

Best regards,
houzj




Re: Parallel Inserts in CREATE TABLE AS

2021-01-05 Thread Dilip Kumar
On Wed, Jan 6, 2021 at 9:23 AM Bharath Rupireddy
 wrote:
>

+/*
+ * List the commands here for which parallel insertions are possible.
+ */
+typedef enum ParallelInsertCmdKind
+{
+ PARALLEL_INSERT_CMD_UNDEF = 0,
+ PARALLEL_INSERT_CMD_CREATE_TABLE_AS
+} ParallelInsertCmdKind;

I see there is some code that is generic for CTAS and INSERT INTO
SELECT *,  So is it
possible to take out that common code to a separate base patch?  Later
both CTAS and INSERT INTO SELECT * can expand
that for their usage.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: Parallel Inserts in CREATE TABLE AS

2021-01-05 Thread Zhihong Yu
The plan sounds good.

Before the second command type is added, can you leave out the 'if (ins_cmd
== PARALLEL_INSERT_CMD_CREATE_TABLE_AS)' and keep the pair of curlies ?

You can add the if condition back when the second command type is added.

Cheers

On Tue, Jan 5, 2021 at 7:53 PM Bharath Rupireddy <
bharath.rupireddyforpostg...@gmail.com> wrote:

> On Wed, Jan 6, 2021 at 8:19 AM Zhihong Yu  wrote:
> > For v20-0001-Parallel-Inserts-in-CREATE-TABLE-AS.patch :
> >
> > ParallelInsCmdEstimate :
> >
> > +   Assert(pcxt && ins_info &&
> > +  (ins_cmd == PARALLEL_INSERT_CMD_CREATE_TABLE_AS));
> > +
> > +   if (ins_cmd == PARALLEL_INSERT_CMD_CREATE_TABLE_AS)
> >
> > Since the if condition is covered by the assertion, I wonder why the if
> check is still needed.
> >
> > Similar comment for SaveParallelInsCmdFixedInfo and
> SaveParallelInsCmdInfo
>
> Thanks.
>
> The idea is to have assertion with all the expected ins_cmd types, and
> then later to have selective handling for different ins_cmds. For
> example, if we add (in future) parallel insertion in Refresh
> Materialized View, then the code in those functions will be something
> like:
>
> +static void
> +ParallelInsCmdEstimate(ParallelContext *pcxt, ParallelInsertCmdKind
> ins_cmd,
> +   void *ins_info)
> +{
> +Assert(pcxt && ins_info &&
> +   (ins_cmd == PARALLEL_INSERT_CMD_CREATE_TABLE_AS ||
> +ins_cmd == PARALLEL_INSERT_CMD_REFRESH_MAT_VIEW));
> +
> +if (ins_cmd == PARALLEL_INSERT_CMD_CREATE_TABLE_AS)
> +{
> +
> +}
> +   else if (ins_cmd == PARALLEL_INSERT_CMD_REFRESH_MAT_VIEW)
> +   {
> +
> +   }
>
> Similarly for other functions as well.
>
> With Regards,
> Bharath Rupireddy.
> EnterpriseDB: http://www.enterprisedb.com
>


Re: Parallel Inserts in CREATE TABLE AS

2021-01-05 Thread Bharath Rupireddy
On Wed, Jan 6, 2021 at 8:19 AM Zhihong Yu  wrote:
> For v20-0001-Parallel-Inserts-in-CREATE-TABLE-AS.patch :
>
> ParallelInsCmdEstimate :
>
> +   Assert(pcxt && ins_info &&
> +  (ins_cmd == PARALLEL_INSERT_CMD_CREATE_TABLE_AS));
> +
> +   if (ins_cmd == PARALLEL_INSERT_CMD_CREATE_TABLE_AS)
>
> Since the if condition is covered by the assertion, I wonder why the if check
> is still needed.
>
> Similar comment for SaveParallelInsCmdFixedInfo and SaveParallelInsCmdInfo

Thanks.

The idea is to have assertion with all the expected ins_cmd types, and
then later to have selective handling for different ins_cmds. For
example, if we add (in future) parallel insertion in Refresh
Materialized View, then the code in those functions will be something
like:

+static void
+ParallelInsCmdEstimate(ParallelContext *pcxt, ParallelInsertCmdKind ins_cmd,
+   void *ins_info)
+{
+Assert(pcxt && ins_info &&
+   (ins_cmd == PARALLEL_INSERT_CMD_CREATE_TABLE_AS ||
+ins_cmd == PARALLEL_INSERT_CMD_REFRESH_MAT_VIEW));
+
+if (ins_cmd == PARALLEL_INSERT_CMD_CREATE_TABLE_AS)
+{
+
+}
+   else if (ins_cmd == PARALLEL_INSERT_CMD_REFRESH_MAT_VIEW)
+   {
+
+   }

Similarly for other functions as well.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com




Re: Parallel Inserts in CREATE TABLE AS

2021-01-05 Thread Zhihong Yu
For v20-0001-Parallel-Inserts-in-CREATE-TABLE-AS.patch :

ParallelInsCmdEstimate :

+   Assert(pcxt && ins_info &&
+  (ins_cmd == PARALLEL_INSERT_CMD_CREATE_TABLE_AS));
+
+   if (ins_cmd == PARALLEL_INSERT_CMD_CREATE_TABLE_AS)

Since the if condition is covered by the assertion, I wonder why the if
check is still needed.

Similar comment for SaveParallelInsCmdFixedInfo and SaveParallelInsCmdInfo

Cheers

On Mon, Jan 4, 2021 at 7:59 PM Bharath Rupireddy <
bharath.rupireddyforpostg...@gmail.com> wrote:

> On Mon, Jan 4, 2021 at 7:02 PM Bharath Rupireddy
>  wrote:
> >
> > > +   if
> (IS_PARALLEL_CTAS_DEST(gstate->dest) &&
> > > +   ((DR_intorel *)
> gstate->dest)->into->rel &&
> > > +   ((DR_intorel *)
> gstate->dest)->into->rel->relname)
> > > why would rel and relname not be there? if no rows have been inserted?
> > > because it seems from the intorel_startup function that that would be
> > > set as soon as startup was done, which i assume (wrongly?) is always
> done?
> >
> > Actually, that into clause rel variable is always being set in the
> gram.y for CTAS, Create Materialized View and SELECT INTO (because
> qualified_name non-terminal is not optional). My bad. I just added it as a
> sanity check. Actually, it's not required.
> >
> > create_as_target:
> > qualified_name opt_column_list table_access_method_clause
> > OptWith OnCommitOption OptTableSpace
> > {
> > $$ = makeNode(IntoClause);
> > $$->rel = $1;
> > create_mv_target:
> > qualified_name opt_column_list table_access_method_clause
> opt_reloptions OptTableSpace
> > {
> > $$ = makeNode(IntoClause);
> > $$->rel = $1;
> > into_clause:
> > INTO OptTempTableName
> > {
> > $$ = makeNode(IntoClause);
> >$$->rel = $2;
> >
> > I will change the below code:
> > +if (GetParallelInsertCmdType(gstate->dest) ==
> > +PARALLEL_INSERT_CMD_CREATE_TABLE_AS &&
> > +((DR_intorel *) gstate->dest)->into &&
> > +((DR_intorel *) gstate->dest)->into->rel &&
> > +((DR_intorel *)
> gstate->dest)->into->rel->relname)
> > +{
> >
> > to:
> > +if (GetParallelInsertCmdType(gstate->dest) ==
> > +PARALLEL_INSERT_CMD_CREATE_TABLE_AS)
> > +{
> >
> > I will update this in the next version of the patch set.
>
> Attaching v20 patch set that has above change in 0001 patch, note that
> 0002 to 0004 patches have no changes from v19. Please consider the v20
> patch set for further review.
>
>
> With Regards,
> Bharath Rupireddy.
> EnterpriseDB: http://www.enterprisedb.com
>


Re: Parallel Inserts in CREATE TABLE AS

2021-01-05 Thread Bharath Rupireddy
On Tue, Jan 5, 2021 at 1:00 PM Luc Vlaming  wrote:
> Reviewing further v20-0001:
>
> I would still opt for moving the code for the parallel worker into a
> separate function, and then setting rStartup of the dest receiver to
> that function in ExecParallelGetInsReceiver, as its completely
> independent code. Just a matter of style I guess.

If we were to have an intorel_startup_worker and assign it to
self->pub.rStartup: 1) we could do it in CreateIntoRelDestReceiver, but
then we have to pass a parameter to CreateIntoRelDestReceiver as an
indication of a parallel worker, which requires code changes wherever
CreateIntoRelDestReceiver is used; 2) we could also assign
intorel_startup_worker after CreateIntoRelDestReceiver in
ExecParallelGetInsReceiver, but that doesn't look good to me; 3) we
could duplicate CreateIntoRelDestReceiver and have a
CreateIntoRelParallelDestReceiver with the only change being that
self->pub.rStartup = intorel_startup_worker;

IMHO, the way it is currently, looks good. Anyways, I'm open to
changing that if we agree on any of the above 3 ways.

If we were to do any of the above, then we might have to do the same
thing for other commands Refresh Materialized View or Copy To where we
can parallelize.

Thoughts?

> Maybe I'm not completely following why but afaics we want parallel
> inserts in various scenarios, not just CTAS? I'm asking because code like
> +   if (fpes->ins_cmd_type == PARALLEL_INSERT_CMD_CREATE_TABLE_AS)
> +   pg_atomic_add_fetch_u64(&fpes->processed,
> queryDesc->estate->es_processed);
> seems very specific to CTAS. For now that seems fine but I suppose that
> would be generalized soon after? Basically I would have expected the if
> to compare against PARALLEL_INSERT_CMD_UNDEF.

After this patch is reviewed and goes for commit, then the next thing
I plan to do is to allow parallel inserts in Refresh Materialized View
and it can be used for that. I think the processed variable can also
be used for parallel inserts in INSERT INTO SELECT [1] as well.
Currently, I'm keeping it for CTAS, maybe later (after this is
committed) it can be generalized.

Thoughts?

[1] - 
https://www.postgresql.org/message-id/CAA4eK1LMmz58ej5BgVLJ8VsUGd%3D%2BKcaA8X%3DkStORhxpfpODOxg%40mail.gmail.com

> Apart from these small things v20-0001 looks (very) good to me.
> v20-0003 and v20-0004:
> looks good to me.

Thanks.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com




Re: Parallel Inserts in CREATE TABLE AS

2021-01-05 Thread Luc Vlaming

On 05-01-2021 11:32, Dilip Kumar wrote:

On Tue, Jan 5, 2021 at 12:43 PM Luc Vlaming  wrote:


On 04-01-2021 14:32, Bharath Rupireddy wrote:

On Mon, Jan 4, 2021 at 4:22 PM Luc Vlaming  wrote:
  > Sorry it took so long to get back to reviewing this.

Thanks for the comments.

  > wrt v18-0001patch:
  >
  > +   /*
  > +* If the worker is for parallel insert in CTAS, then
use the proper
  > +* dest receiver.
  > +*/
  > +   intoclause = (IntoClause *) stringToNode(intoclausestr);
  > +   receiver = CreateIntoRelDestReceiver(intoclause);
  > +   ((DR_intorel *)receiver)->is_parallel_worker = true;
  > +   ((DR_intorel *)receiver)->object_id = fpes->objectid;
  > I would move this into a function called e.g.
  > GetCTASParallelWorkerReceiver so that the details wrt CTAS can be put in
  > createas.c.
  > I would then also split up intorel_startup into intorel_leader_startup
  > and intorel_worker_startup, and in GetCTASParallelWorkerReceiver set
  > self->pub.rStartup to intorel_worker_startup.

My intention was to not add any new APIs to the dest receiver. I simply
made the changes in intorel_startup, in which for workers it just does
the minimalistic work and exit from it. In the leader most of the table
creation and sanity check is kept untouched. Please have a look at the
v19 patch posted upthread [1].



Looks much better, really nicely abstracted away in the v20 patch.


  > +   volatile pg_atomic_uint64   *processed;
  > why is it volatile?

The intention is to always read from the actual memory location. I took
it from the way pg_atomic_fetch_add_u64_impl,
pg_atomic_compare_exchange_u64_impl, pg_atomic_init_u64_impl and their
u32 counterparts pass the parameter as volatile pg_atomic_uint64 *ptr.


But in your case, I do not understand the intention: where do you
think the compiler could optimize it and read the old value?



It cannot and should not. I had so far only seen C++ atomic variables
and not a (postgres-specific?) C atomic variable which apparently
requires the volatile keyword. My stupidity ;)
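
If I read port/atomics.h correctly, the pg_atomic_*_u64 functions
declare the pointer parameter volatile themselves, so a caller-side
qualifier shouldn't be needed; typical usage is simply something like
(names taken from the patch, sketch only):

pg_atomic_init_u64(&fpes->processed, 0);  /* leader, at setup */
pg_atomic_add_fetch_u64(&fpes->processed, /* each worker */
queryDesc->estate->es_processed);
uint64  total = pg_atomic_read_u64(&fpes->processed);  /* leader */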


Cheers,
Luc




Re: Parallel Inserts in CREATE TABLE AS

2021-01-05 Thread Dilip Kumar
On Tue, Jan 5, 2021 at 12:43 PM Luc Vlaming  wrote:
>
> On 04-01-2021 14:32, Bharath Rupireddy wrote:
> > On Mon, Jan 4, 2021 at 4:22 PM Luc Vlaming  wrote:
> >  > Sorry it took so long to get back to reviewing this.
> >
> > Thanks for the comments.
> >
> >  > wrt v18-0001patch:
> >  >
> >  > +   /*
> >  > +* If the worker is for parallel insert in CTAS, then
> > use the proper
> >  > +* dest receiver.
> >  > +*/
> >  > +   intoclause = (IntoClause *) stringToNode(intoclausestr);
> >  > +   receiver = CreateIntoRelDestReceiver(intoclause);
> >  > +   ((DR_intorel *)receiver)->is_parallel_worker = true;
> >  > +   ((DR_intorel *)receiver)->object_id = fpes->objectid;
> >  > I would move this into a function called e.g.
> >  > GetCTASParallelWorkerReceiver so that the details wrt CTAS can be put in
> >  > createas.c.
> >  > I would then also split up intorel_startup into intorel_leader_startup
> >  > and intorel_worker_startup, and in GetCTASParallelWorkerReceiver set
> >  > self->pub.rStartup to intorel_worker_startup.
> >
> > My intention was to not add any new APIs to the dest receiver. I simply
> > made the changes in intorel_startup, in which for workers it just does
> > the minimalistic work and exit from it. In the leader most of the table
> > creation and sanity check is kept untouched. Please have a look at the
> > v19 patch posted upthread [1].
> >
>
> Looks much better, really nicely abstracted away in the v20 patch.
>
> >  > +   volatile pg_atomic_uint64   *processed;
> >  > why is it volatile?
> >
> > Intention is to always read from the actual memory location. I referred
> > it from the way pg_atomic_fetch_add_u64_impl,
> > pg_atomic_compare_exchange_u64_impl, pg_atomic_init_u64_impl and their
> > u32 counterparts use pass the parameter as volatile pg_atomic_uint64 *ptr.

But in your case, I do not understand the intention: where do you
think the compiler could optimize it and read the old value?

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: Parallel Inserts in CREATE TABLE AS

2021-01-05 Thread Bharath Rupireddy
On Tue, Jan 5, 2021 at 10:08 AM vignesh C  wrote:
> On Mon, Jan 4, 2021 at 3:07 PM Bharath Rupireddy
>  wrote:
> >
> > On Wed, Dec 30, 2020 at 5:28 PM vignesh C  wrote:
> > > Few comments:
> > > -   /*
> > > -* To allow parallel inserts, we need to ensure that they are 
> > > safe to be
> > > -* performed in workers. We have the infrastructure to allow 
> > > parallel
> > > -* inserts in general except for the cases where inserts generate 
> > > a new
> > > -* CommandId (eg. inserts into a table having a foreign key 
> > > column).
> > > -*/
> > > -   if (IsParallelWorker())
> > > -   ereport(ERROR,
> > > -   
> > > (errcode(ERRCODE_INVALID_TRANSACTION_STATE),
> > > -errmsg("cannot insert tuples in a
> > > parallel worker")));
> > >
> > > Is it possible to add a check if it is a CTAS insert here as we do not
> > > support insert in parallel workers from others as of now.
> >
> > Currently, there's no global variable in which we can selectively skip
> > this in case of parallel insertion in CTAS. How about having a
> > variable in any of the worker global contexts, set that when parallel
> > insertion is chosen for CTAS and use that in heap_prepare_insert() to
> > skip the above error? Eventually, we can remove this restriction
> > entirely in case we fully allow parallelism for INSERT INTO SELECT,
> > CTAS, and COPY.
> >
> > Thoughts?
>
> Yes, I felt that the leader can store the command as CTAS and the
> leader/worker can use it to check and throw an error. The similar
> change can be used for the parallel insert patches and once all the
> patches are committed, we can remove it eventually.

We can skip the error "cannot insert tuples in a parallel worker" in
heap_prepare_insert() selectively for each parallel insertion and
eventually we can remove that error after all the parallel insertion
related patches are committed. The main problem is that
heap_prepare_insert() needs to know that we are coming from a parallel
insertion for CTAS (or some other command), while at the same time we
don't want to alter the table_tuple_insert()/heap_prepare_insert() API,
because this change will be removed eventually.

We can achieve this in below ways:
1) Add a backend global variable, set it before each
table_tuple_insert() in intorel_receive() and use that in
heap_prepare_insert() to skip the error.
2) Add a variable to MyBgworkerEntry structure, set it before each
table_tuple_insert() in intorel_receive() or in ParallelQueryMain() if
we are for CTAS parallel insertion and use that in
heap_prepare_insert() to skip the error.
3) Currently, we pass table insert options to
table_tuple_insert()/heap_prepare_insert(), which is a bitmap of below
values. We could also add something like #define
PARALLEL_INSERTION_CMD_CTAS 0x000F, set it before each
table_tuple_insert() in intorel_receive() and use that in
heap_prepare_insert() to skip the error, then unset it.

/* "options" flag bits for table_tuple_insert */
/* TABLE_INSERT_SKIP_WAL was 0x0001; RelationNeedsWAL() now governs */
#define TABLE_INSERT_SKIP_FSM0x0002
#define TABLE_INSERT_FROZEN0x0004
#define TABLE_INSERT_NO_LOGICAL0x0008

IMO either 2 or 3 would be fine. Thoughts?
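
A rough sketch of option 3 (the flag name and the bit value are only
illustrative; note it would need a bit that does not collide with the
existing TABLE_INSERT_* flags, e.g. 0x0010 rather than 0x000F):

/* hypothetical new bit for the "options" of table_tuple_insert */
#define TABLE_INSERT_PARALLEL_CTAS 0x0010

/* in intorel_receive(), around the insertion */
myState->ti_options |= TABLE_INSERT_PARALLEL_CTAS;
table_tuple_insert(myState->rel, slot, myState->output_cid,
   myState->ti_options, myState->bistate);
myState->ti_options &= ~TABLE_INSERT_PARALLEL_CTAS;

/* in heap_prepare_insert(), skip the error for this case */
if (IsParallelWorker() && !(options & TABLE_INSERT_PARALLEL_CTAS))
ereport(ERROR,
(errcode(ERRCODE_INVALID_TRANSACTION_STATE),
 errmsg("cannot insert tuples in a parallel worker")));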

> > > +   Oid objectid;   /* workers to
> > > open relation/table.  */
> > > +   /* Number of tuples inserted by all the workers. */
> > > +   pg_atomic_uint64    processed;
> > >
> > > We can just mention relation instead of relation/table.
> >
> > I will modify it in the next patch set.
> >
> > > +select explain_pictas(
> > > +'create table parallel_write as select length(stringu1) from tenk1;');
> > > +  explain_pictas
> > > +--
> > > + Gather (actual rows=N loops=N)
> > > +   Workers Planned: 4
> > > +   Workers Launched: N
> > > + ->  Create parallel_write
> > > +   ->  Parallel Seq Scan on tenk1 (actual rows=N loops=N)
> > > +(5 rows)
> > > +
> > > +select count(*) from parallel_write;
> > >
> > > Can we include selection of cmin, xmin for one of the test to verify
> > > that it uses the same transaction id  in the parallel workers
> > > something like:
> > > select distinct(cmin,xmin) from parallel_write;
> >
> > This is not possible since cmin and xmin are dynamic, we can not use
> > them in test cases. I think it's not necessary to check whether the
> > leader and workers are in the same txn or not, since we are not
> > creating a new txn. All the txn state from the leader is serialized in
> > SerializeTransactionState and restored in
> > StartParallelWorkerTransaction.
> >
>
> I had seen in your patch that you serialize and use the same
> transaction, but it will be good if you can have at least one test
> case to validate that the leader and worker both use the same
> transaction. To solve the problem that you are facing 

Re: Parallel Inserts in CREATE TABLE AS

2021-01-04 Thread Luc Vlaming

On 04-01-2021 14:53, Bharath Rupireddy wrote:

On Mon, Jan 4, 2021 at 5:44 PM Luc Vlaming  wrote:

On 04-01-2021 12:16, Hou, Zhijie wrote:


wrt v18-0002patch:

It looks like this introduces a state machine that goes like:
- starts at CTAS_PARALLEL_INS_UNDEF
- possibly moves to CTAS_PARALLEL_INS_SELECT
- CTAS_PARALLEL_INS_TUP_COST_CAN_IGN can be added
- if both were added at some stage, we can go to
CTAS_PARALLEL_INS_TUP_COST_IGNORED and ignore the costs

what i'm wondering is why you opted to put logic around
generate_useful_gather_paths and in cost_gather when to me it seems more
logical to put it in create_gather_path? i'm probably missing something
there?


IMO, The reason is we want to make sure we only ignore the cost when Gather is 
the top node.
And it seems the generate_useful_gather_paths called in 
apply_scanjoin_target_to_paths is the right place which can only create top 
node Gather.
So we change the flag in apply_scanjoin_target_to_paths around 
generate_useful_gather_paths to identify the top node.


Right. We wanted to ignore parallel tuple cost for only the upper Gather path.


I was wondering actually if we need the state machine. Reason is that,
AFAICS, the code could be placed in create_gather_path, where you can
also check if it is a top gather node, whether the dest receiver is the
right type, etc? To me that seems like a nicer solution, as it makes
all the logic that decides whether or not a parallel CTAS is valid live
in a single place instead of being distributed over various places.


IMO, we can't determine in create_gather_path that we are going to
generate the top Gather path. To decide whether or not we are generating
the top Gather path, it's not enough to check root->query_level == 1; we
also need to rely on where generate_useful_gather_paths gets called
from. For instance, for query_level 1, generate_useful_gather_paths gets
called from 2 places in apply_scanjoin_target_to_paths. Likewise,
create_gather_path also gets called from many places. IMO, the current
way, i.e. setting the flag in apply_scanjoin_target_to_paths and
ignoring the cost based on that in cost_gather, seems safe.

I may be wrong. Thoughts?

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com



So the way I understand it, the requirements are:
- it needs to be the top-most Gather;
- nothing should be done with the rows after the Gather node, as that
would make the parallel inserts conceptually invalid.


Right now we're trying to judge what might be added on top that could
change the rows by inspecting all parts of the root object that would
cause anything to be added, and adding a little state machine to track
the state of that knowledge. To me this has the downside that the list
in HAS_PARENT_PATH_GENERATING_CLAUSE has to be exhaustive, and we need
to make sure it stays up to date, which could result in regressions if
not tracked carefully.


Personally I would therefore go for a design which is safe in the sense 
that regressions are not as easily introduced. IMHO that could be done 
by inspecting the planned query afterwards, and then judging whether or 
not the parallel inserts are actually the right thing to do.


Another way to create more safety against regressions would be to add
an assert, upon execution of the query, that if we do parallel inserts
only a subset of allowed nodes exists above the gather node.
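
As a rough illustration, such an assert could walk the plan tree with a
hypothetical helper like the one below (not part of the patch; here only
Append is assumed to be allowed between the top of the plan and the
Gather):

#include "nodes/plannodes.h"	/* Plan, Append, Gather */
#include "nodes/pg_list.h"		/* ListCell, foreach, lfirst */

static bool
plan_above_gather_is_insert_safe(Plan *plan)
{
	if (IsA(plan, Gather))
		return true;		/* reached the Gather that drives the inserts */

	if (IsA(plan, Append))
	{
		ListCell   *lc;

		/* every child of the Append must itself lead to such a Gather */
		foreach(lc, ((Append *) plan)->appendplans)
		{
			if (!plan_above_gather_is_insert_safe((Plan *) lfirst(lc)))
				return false;
		}
		return true;
	}

	/* any other node could consume or modify the rows above the Gather */
	return false;
}

/* e.g. at executor startup, when parallel inserts were chosen */
Assert(plan_above_gather_is_insert_safe(queryDesc->plannedstmt->planTree));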


Some (not extremely fact checked) approaches as food for thought:
1. Plan the query as normal, and then afterwards look at the resulting 
plan to see if there are only nodes that are ok between the gather node 
and the top node, which afaics would only be things like append nodes.

Which would mean two things:
- at the end of subquery_planner, before the final_rel is fetched, we add
another pass like the grouping_planner, called e.g.
parallel_modify_planner or so, which traverses the query plan, checks
whether the inserts would indeed be executed in parallel, and if so sets
the cost of the gather to 0.
- we always keep around the best gathered partial path, or the partial 
path itself.


2. Generate both gather paths: one with zero cost for the inserts and
one with costs. The one with zero costs would however be kept separately
and added as the prime candidate for the final rel. Then we can check in
the subquery_planner if the final candidate is different and choose.


Kind regards,
Luc




Re: Parallel Inserts in CREATE TABLE AS

2021-01-04 Thread Luc Vlaming

On 05-01-2021 04:59, Bharath Rupireddy wrote:

On Mon, Jan 4, 2021 at 7:02 PM Bharath Rupireddy
 wrote:



+   if (IS_PARALLEL_CTAS_DEST(gstate->dest) &&
+       ((DR_intorel *) gstate->dest)->into->rel &&
+       ((DR_intorel *) gstate->dest)->into->rel->relname)
why would rel and relname not be there? if no rows have been inserted?
because it seems from the intorel_startup function that that would be
set as soon as startup was done, which i assume (wrongly?) is always done?


Actually, that into clause rel variable is always being set in the gram.y for 
CTAS, Create Materialized View and SELECT INTO (because qualified_name 
non-terminal is not optional). My bad. I just added it as a sanity check. 
Actually, it's not required.

create_as_target:
 qualified_name opt_column_list table_access_method_clause
 OptWith OnCommitOption OptTableSpace
 {
 $$ = makeNode(IntoClause);
 $$->rel = $1;
create_mv_target:
 qualified_name opt_column_list table_access_method_clause 
opt_reloptions OptTableSpace
 {
 $$ = makeNode(IntoClause);
 $$->rel = $1;
into_clause:
 INTO OptTempTableName
 {
 $$ = makeNode(IntoClause);
$$->rel = $2;

I will change the below code:
+if (GetParallelInsertCmdType(gstate->dest) ==
+PARALLEL_INSERT_CMD_CREATE_TABLE_AS &&
+((DR_intorel *) gstate->dest)->into &&
+((DR_intorel *) gstate->dest)->into->rel &&
+((DR_intorel *) gstate->dest)->into->rel->relname)
+{

to:
+if (GetParallelInsertCmdType(gstate->dest) ==
+PARALLEL_INSERT_CMD_CREATE_TABLE_AS)
+{

I will update this in the next version of the patch set.


Attaching v20 patch set that has above change in 0001 patch, note that
0002 to 0004 patches have no changes from v19. Please consider the v20
patch set for further review.


With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com



Hi,

Reviewing further v20-0001:

I would still opt for moving the code for the parallel worker into a
separate function, and then setting rStartup of the dest receiver to
that function in ExecParallelGetInsReceiver, as it's completely
independent code. Just a matter of style I guess.


Maybe I'm not completely following why but afaics we want parallel 
inserts in various scenarios, not just CTAS? I'm asking because code like

+   if (fpes->ins_cmd_type == PARALLEL_INSERT_CMD_CREATE_TABLE_AS)
+		pg_atomic_add_fetch_u64(&fpes->processed, queryDesc->estate->es_processed);
seems very specific to CTAS. For now that seems fine but I suppose that 
would be generalized soon after? Basically I would have expected the if 
to compare against PARALLEL_INSERT_CMD_UNDEF.


Apart from these small things v20-0001 looks (very) good to me.

v20-0002:
will reply on the specific mail-thread about the state machine

v20-0003 and v20-0004:
looks good to me.

Kind regards,
Luc




Re: Parallel Inserts in CREATE TABLE AS

2021-01-04 Thread Luc Vlaming

On 04-01-2021 14:32, Bharath Rupireddy wrote:
On Mon, Jan 4, 2021 at 4:22 PM Luc Vlaming  wrote:

 > Sorry it took so long to get back to reviewing this.

Thanks for the comments.

 > wrt v18-0001patch:
 >
 > +               /*
 > +                * If the worker is for parallel insert in CTAS, then 
use the proper

 > +                * dest receiver.
 > +                */
 > +               intoclause = (IntoClause *) stringToNode(intoclausestr);
 > +               receiver = CreateIntoRelDestReceiver(intoclause);
 > +               ((DR_intorel *)receiver)->is_parallel_worker = true;
 > +               ((DR_intorel *)receiver)->object_id = fpes->objectid;
 > I would move this into a function called e.g.
 > GetCTASParallelWorkerReceiver so that the details wrt CTAS can be put in
 > createas.c.
 > I would then also split up intorel_startup into intorel_leader_startup
 > and intorel_worker_startup, and in GetCTASParallelWorkerReceiver set
 > self->pub.rStartup to intorel_worker_startup.

My intention was to not add any new APIs to the dest receiver. I simply
made the changes in intorel_startup, in which workers just do the
minimal work and exit. In the leader, most of the table creation and
sanity checks are kept untouched. Please have a look at the v19 patch
posted upthread [1].




Looks much better, really nicely abstracted away in the v20 patch.


 > +       volatile pg_atomic_uint64       *processed;
 > why is it volatile?

The intention is to always read from the actual memory location. I took
it from the way pg_atomic_fetch_add_u64_impl,
pg_atomic_compare_exchange_u64_impl, pg_atomic_init_u64_impl and their
u32 counterparts take the parameter as volatile pg_atomic_uint64 *ptr.




Okay, I had not seen this syntax before for atomics with the volatile
keyword, but it's apparently how the atomics abstraction works in
PostgreSQL.
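
For reference, the shared counter is only ever touched through the
pg_atomic_* API; the usage amounts to roughly this (a sketch, with
variable names taken from the hunks quoted in this thread):

/* leader, while setting up the shared state in the DSM segment */
pg_atomic_init_u64(&fpes->processed, 0);

/* each worker, after its part of the query finishes */
pg_atomic_add_fetch_u64(&fpes->processed, queryDesc->estate->es_processed);

/* leader, when building the completion tag */
uint64		total = pg_atomic_read_u64(&fpes->processed);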



 > +                       if (isctas)
 > +                       {
 > +                               intoclause = ((DR_intorel *) 
node->dest)->into;
 > +                               objectid = ((DR_intorel *) 
node->dest)->object_id;

 > +                       }
 > Given that you extract them each once and then pass them directly into
 > the parallel-worker, can't you instead pass in the destreceiver and
 > leave that logic to ExecInitParallelPlan?

That's changed entirely in the v19 patch set posted upthread [1]. Please 
have a look. I didn't pass the dest receiver, to keep the API generic, I 
passed parallel insert command type and a void * ptr which points to 
insertion command because the information we pass to workers depends on 
the insertion command (for instance, the information needed by workers 
is for CTAS into clause and object id and for Refresh Mat View object id).


 >
 > +                                       if 
(IS_PARALLEL_CTAS_DEST(gstate->dest) &&
 > +                                               ((DR_intorel *) 
gstate->dest)->into->rel &&
 > +                                               ((DR_intorel *) 
gstate->dest)->into->rel->relname)

 > why would rel and relname not be there? if no rows have been inserted?
 > because it seems from the intorel_startup function that that would be
 > set as soon as startup was done, which i assume (wrongly?) is always 
done?


Actually, that into clause rel variable is always being set in the 
gram.y for CTAS, Create Materialized View and SELECT INTO (because 
qualified_name non-terminal is not optional). My bad. I just added it as 
a sanity check. Actually, it's not required.


create_as_target:
*qualified_name* opt_column_list table_access_method_clause
             OptWith OnCommitOption OptTableSpace
                 {
                     $$ = makeNode(IntoClause);
*                    $$->rel = $1;*
create_mv_target:
*qualified_name* opt_column_list table_access_method_clause 
opt_reloptions OptTableSpace

     {
     $$ = makeNode(IntoClause);
*    $$->rel = $1;*
into_clause:
     INTO OptTempTableName
     {
     $$ = makeNode(IntoClause);
*   $$->rel = $2;*

I will change the below code:
+                    if (GetParallelInsertCmdType(gstate->dest) ==
+                        PARALLEL_INSERT_CMD_CREATE_TABLE_AS &&
+                        ((DR_intorel *) gstate->dest)->into &&
+                        ((DR_intorel *) gstate->dest)->into->rel &&
+                        ((DR_intorel *) gstate->dest)->into->rel->relname)
+                    {

to:
+                    if (GetParallelInsertCmdType(gstate->dest) ==
+                        PARALLEL_INSERT_CMD_CREATE_TABLE_AS)
+                    {

I will update this in the next version of the patch set.



Thanks

 >
 > +        * In case if no workers were launched, allow the leader to 
insert entire

 > +        * tuples.
 > what does "entire tuples" mean? should it maybe be "all tuples"?

Yeah, noticed that while 

Re: Parallel Inserts in CREATE TABLE AS

2021-01-04 Thread vignesh C
On Mon, Jan 4, 2021 at 3:07 PM Bharath Rupireddy
 wrote:
>
> On Wed, Dec 30, 2020 at 5:28 PM vignesh C  wrote:
> > Few comments:
> > -   /*
> > -* To allow parallel inserts, we need to ensure that they are safe 
> > to be
> > -* performed in workers. We have the infrastructure to allow 
> > parallel
> > -* inserts in general except for the cases where inserts generate a 
> > new
> > -* CommandId (eg. inserts into a table having a foreign key column).
> > -*/
> > -   if (IsParallelWorker())
> > -   ereport(ERROR,
> > -   (errcode(ERRCODE_INVALID_TRANSACTION_STATE),
> > -errmsg("cannot insert tuples in a
> > parallel worker")));
> >
> > Is it possible to add a check if it is a CTAS insert here as we do not
> > support insert in parallel workers from others as of now.
>
> Currently, there's no global variable in which we can selectively skip
> this in case of parallel insertion in CTAS. How about having a
> variable in any of the worker global contexts, set that when parallel
> insertion is chosen for CTAS and use that in heap_prepare_insert() to
> skip the above error? Eventually, we can remove this restriction
> entirely in case we fully allow parallelism for INSERT INTO SELECT,
> CTAS, and COPY.
>
> Thoughts?

Yes, I felt that the leader can store the command as CTAS and the
leader/worker can use it to check and throw an error. The similar
change can be used for the parallel insert patches and once all the
patches are committed, we can remove it eventually.

>
> > +   Oid objectid;   /* workers to
> > open relation/table.  */
> > +   /* Number of tuples inserted by all the workers. */
> > +   pg_atomic_uint64processed;
> >
> > We can just mention relation instead of relation/table.
>
> I will modify it in the next patch set.
>
> > +select explain_pictas(
> > +'create table parallel_write as select length(stringu1) from tenk1;');
> > +  explain_pictas
> > +--
> > + Gather (actual rows=N loops=N)
> > +   Workers Planned: 4
> > +   Workers Launched: N
> > + ->  Create parallel_write
> > +   ->  Parallel Seq Scan on tenk1 (actual rows=N loops=N)
> > +(5 rows)
> > +
> > +select count(*) from parallel_write;
> >
> > Can we include selection of cmin, xmin for one of the test to verify
> > that it uses the same transaction id  in the parallel workers
> > something like:
> > select distinct(cmin,xmin) from parallel_write;
>
> This is not possible since cmin and xmin are dynamic, we can not use
> them in test cases. I think it's not necessary to check whether the
> leader and workers are in the same txn or not, since we are not
> creating a new txn. All the txn state from the leader is serialized in
> SerializeTransactionState and restored in
> StartParallelWorkerTransaction.
>

I had seen in your patch that you serialize and use the same
transaction, but it will be good if you can have at least one test
case to validate that the leader and worker both use the same
transaction. To solve the problem that you are facing where cmin and
xmin are dynamic, you can check the distinct count by using something
like below:
SELECT COUNT(*) FROM (SELECT DISTINCT cmin,xmin FROM  t1) as dt;
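
If the leader and workers really do share the same transaction and
command id, that query is expected to return 1, e.g. (a sketch against
the parallel_write table used in the proposed tests):

select count(*) from (select distinct cmin, xmin from parallel_write) as dt;
 count
-------
     1
(1 row)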

Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com




Re: Parallel Inserts in CREATE TABLE AS

2021-01-04 Thread Bharath Rupireddy
On Mon, Jan 4, 2021 at 5:44 PM Luc Vlaming  wrote:
> On 04-01-2021 12:16, Hou, Zhijie wrote:
> >> 
> >> wrt v18-0002patch:
> >>
> >> It looks like this introduces a state machine that goes like:
> >> - starts at CTAS_PARALLEL_INS_UNDEF
> >> - possibly moves to CTAS_PARALLEL_INS_SELECT
> >> - CTAS_PARALLEL_INS_TUP_COST_CAN_IGN can be added
> >> - if both were added at some stage, we can go to
> >> CTAS_PARALLEL_INS_TUP_COST_IGNORED and ignore the costs
> >>
> >> what i'm wondering is why you opted to put logic around
> >> generate_useful_gather_paths and in cost_gather when to me it seems more
> >> logical to put it in create_gather_path? i'm probably missing something
> >> there?
> >
> > IMO, The reason is we want to make sure we only ignore the cost when Gather 
> > is the top node.
> > And it seems the generate_useful_gather_paths called in 
> > apply_scanjoin_target_to_paths is the right place which can only create top 
> > node Gather.
> > So we change the flag in apply_scanjoin_target_to_paths around 
> > generate_useful_gather_paths to identify the top node.

Right. We wanted to ignore parallel tuple cost for only the upper Gather path.

> I was wondering actually if we need the state machine. Reason is that as
> AFAICS the code could be placed in create_gather_path, where you can
> also check if it is a top gather node, whether the dest receiver is the
> right type, etc? To me that seems like a nicer solution as its makes
> that all logic that decides whether or not a parallel CTAS is valid is
> in a single place instead of distributed over various places.

IMO, we can't determine in create_gather_path that we are going to
generate the top Gather path. To decide whether or not we are generating
the top Gather path, it's not enough to check root->query_level == 1; we
also need to rely on where generate_useful_gather_paths gets called
from. For instance, for query_level 1, generate_useful_gather_paths gets
called from 2 places in apply_scanjoin_target_to_paths. Likewise,
create_gather_path also gets called from many places. IMO, the current
way, i.e. setting the flag in apply_scanjoin_target_to_paths and
ignoring the cost based on that in cost_gather, seems safe.

I may be wrong. Thoughts?
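
For reference, the flow in the v18-0002 patch amounts to the following
(a simplified sketch condensed from the hunks quoted elsewhere in this
thread, with the if/else form suggested later applied in cost_gather):

/* apply_scanjoin_target_to_paths(), when planning the SELECT of a CTAS */
bool		ignore = ignore_parallel_tuple_cost(root);	/* may turn on ..._CAN_IGN */

generate_useful_gather_paths(root, rel, false);

/* reset in case cost_gather was never reached */
if (ignore &&
	(root->parse->CTASParallelInsInfo & CTAS_PARALLEL_INS_TUP_COST_CAN_IGN))
	root->parse->CTASParallelInsInfo &= ~CTAS_PARALLEL_INS_TUP_COST_CAN_IGN;

/* cost_gather() */
if ((root->parse->CTASParallelInsInfo & CTAS_PARALLEL_INS_SELECT) &&
	(root->parse->CTASParallelInsInfo & CTAS_PARALLEL_INS_TUP_COST_CAN_IGN))
{
	/* top-level Gather feeding the CTAS inserts: skip the tuple transfer cost */
	root->parse->CTASParallelInsInfo &= ~CTAS_PARALLEL_INS_TUP_COST_CAN_IGN;
	root->parse->CTASParallelInsInfo |= CTAS_PARALLEL_INS_TUP_COST_IGNORED;
}
else
	run_cost += parallel_tuple_cost * path->path.rows;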

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com




Re: Parallel Inserts in CREATE TABLE AS

2021-01-04 Thread Bharath Rupireddy
On Mon, Jan 4, 2021 at 4:22 PM Luc Vlaming  wrote:
> Sorry it took so long to get back to reviewing this.

Thanks for the comments.

> wrt v18-0001patch:
>
> +   /*
> +* If the worker is for parallel insert in CTAS, then use
the proper
> +* dest receiver.
> +*/
> +   intoclause = (IntoClause *) stringToNode(intoclausestr);
> +   receiver = CreateIntoRelDestReceiver(intoclause);
> +   ((DR_intorel *)receiver)->is_parallel_worker = true;
> +   ((DR_intorel *)receiver)->object_id = fpes->objectid;
> I would move this into a function called e.g.
> GetCTASParallelWorkerReceiver so that the details wrt CTAS can be put in
> createas.c.
> I would then also split up intorel_startup into intorel_leader_startup
> and intorel_worker_startup, and in GetCTASParallelWorkerReceiver set
> self->pub.rStartup to intorel_worker_startup.

My intention was to not add any new APIs to the dest receiver. I simply
made the changes in intorel_startup, in which workers just do the
minimal work and exit. In the leader, most of the table creation and
sanity checks are kept untouched. Please have a look at the v19 patch
posted upthread [1].

> +   volatile pg_atomic_uint64   *processed;
> why is it volatile?

The intention is to always read from the actual memory location. I took
it from the way pg_atomic_fetch_add_u64_impl,
pg_atomic_compare_exchange_u64_impl, pg_atomic_init_u64_impl and their
u32 counterparts take the parameter as volatile pg_atomic_uint64 *ptr.

> +   if (isctas)
> +   {
> +   intoclause = ((DR_intorel *)
node->dest)->into;
> +   objectid = ((DR_intorel *)
node->dest)->object_id;
> +   }
> Given that you extract them each once and then pass them directly into
> the parallel-worker, can't you instead pass in the destreceiver and
> leave that logic to ExecInitParallelPlan?

That's changed entirely in the v19 patch set posted upthread [1]. Please
have a look. I didn't pass the dest receiver; to keep the API generic, I
passed the parallel insert command type and a void * pointer to the
insertion command info, because the information we pass to workers
depends on the insertion command (for instance, for CTAS the workers
need the into clause and the object id, and for Refresh Mat View just
the object id).
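
Put together with the generic structures proposed elsewhere in the
thread (ParallelInsertKind, ParallelInsertCTASInfo) and the
ExecInitParallelPlan() signature change, the CTAS caller side would look
roughly like this (a sketch; names follow that proposal and may differ
from the actual patch):

/* in the Gather node's parallel setup, when the dest receiver is the CTAS one */
ParallelInsertCTASInfo ctasinfo;

ctasinfo.intoclause = ((DR_intorel *) node->dest)->into;
ctasinfo.objectid = ((DR_intorel *) node->dest)->object_id;

pei = ExecInitParallelPlan(outerPlanState(node), estate, sendParams,
						   nworkers, tuples_needed,
						   PINS_CREATE_TABLE_AS, (void *) &ctasinfo);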

>
> +   if
(IS_PARALLEL_CTAS_DEST(gstate->dest) &&
> +   ((DR_intorel *)
gstate->dest)->into->rel &&
> +   ((DR_intorel *)
gstate->dest)->into->rel->relname)
> why would rel and relname not be there? if no rows have been inserted?
> because it seems from the intorel_startup function that that would be
> set as soon as startup was done, which i assume (wrongly?) is always done?

Actually, that into clause rel variable is always being set in the gram.y
for CTAS, Create Materialized View and SELECT INTO (because qualified_name
non-terminal is not optional). My bad. I just added it as a sanity check.
Actually, it's not required.

create_as_target:
*qualified_name* opt_column_list table_access_method_clause
OptWith OnCommitOption OptTableSpace
{
$$ = makeNode(IntoClause);
*$$->rel = $1;*
create_mv_target:
*qualified_name* opt_column_list table_access_method_clause
opt_reloptions OptTableSpace
{
$$ = makeNode(IntoClause);
*$$->rel = $1;*
into_clause:
INTO OptTempTableName
{
$$ = makeNode(IntoClause);
*   $$->rel = $2;*

I will change the below code:
+if (GetParallelInsertCmdType(gstate->dest) ==
+PARALLEL_INSERT_CMD_CREATE_TABLE_AS &&
+((DR_intorel *) gstate->dest)->into &&
+((DR_intorel *) gstate->dest)->into->rel &&
+((DR_intorel *) gstate->dest)->into->rel->relname)
+{

to:
+if (GetParallelInsertCmdType(gstate->dest) ==
+PARALLEL_INSERT_CMD_CREATE_TABLE_AS)
+{

I will update this in the next version of the patch set.

>
> +* In case if no workers were launched, allow the leader to
insert entire
> +* tuples.
> what does "entire tuples" mean? should it maybe be "all tuples"?

Yeah, noticed that while working on the v19 patch set. Please have a look
at the v19 patch posted upthread [1].

> 
> wrt v18-0003patch:
>
> not sure if it is needed, but i was wondering if we would want more
> tests with multiple gather nodes existing? caused e.g. by using CTE's,
> valid subquery's (like the one test you have, but without the group
> by/having)?


Re: Parallel Inserts in CREATE TABLE AS

2021-01-04 Thread Luc Vlaming

On 04-01-2021 12:16, Hou, Zhijie wrote:

Hi



wrt v18-0002 patch:

It looks like this introduces a state machine that goes like:
- starts at CTAS_PARALLEL_INS_UNDEF
- possibly moves to CTAS_PARALLEL_INS_SELECT
- CTAS_PARALLEL_INS_TUP_COST_CAN_IGN can be added
- if both were added at some stage, we can go to
CTAS_PARALLEL_INS_TUP_COST_IGNORED and ignore the costs

what i'm wondering is why you opted to put logic around
generate_useful_gather_paths and in cost_gather when to me it seems more
logical to put it in create_gather_path? i'm probably missing something
there?


IMO, The reason is we want to make sure we only ignore the cost when Gather is 
the top node.
And it seems the generate_useful_gather_paths called in 
apply_scanjoin_target_to_paths is the right place which can only create top 
node Gather.
So we change the flag in apply_scanjoin_target_to_paths around 
generate_useful_gather_paths to identify the top node.


Best regards,
houzj




Hi,

I was wondering actually if we need the state machine. Reason is that,
AFAICS, the code could be placed in create_gather_path, where you can
also check if it is a top gather node, whether the dest receiver is the
right type, etc? To me that seems like a nicer solution, as it makes
all the logic that decides whether or not a parallel CTAS is valid live
in a single place instead of being distributed over various places.


Kind regards,
Luc




RE: Parallel Inserts in CREATE TABLE AS

2021-01-04 Thread Hou, Zhijie
Hi

> 
> wrt v18-0002patch:
> 
> It looks like this introduces a state machine that goes like:
> - starts at CTAS_PARALLEL_INS_UNDEF
> - possibly moves to CTAS_PARALLEL_INS_SELECT
> - CTAS_PARALLEL_INS_TUP_COST_CAN_IGN can be added
> - if both were added at some stage, we can go to
> CTAS_PARALLEL_INS_TUP_COST_IGNORED and ignore the costs
> 
> what i'm wondering is why you opted to put logic around
> generate_useful_gather_paths and in cost_gather when to me it seems more
> logical to put it in create_gather_path? i'm probably missing something
> there?

IMO, the reason is we want to make sure we only ignore the cost when
Gather is the top node. And it seems that generate_useful_gather_paths
called in apply_scanjoin_target_to_paths is the right place, which can
only create a top-node Gather. So we change the flag in
apply_scanjoin_target_to_paths around generate_useful_gather_paths to
identify the top node.


Best regards,
houzj




Re: Parallel Inserts in CREATE TABLE AS

2021-01-04 Thread Luc Vlaming

On 30-12-2020 04:55, Bharath Rupireddy wrote:

On Wed, Dec 30, 2020 at 5:22 AM Zhihong Yu  wrote:

w.r.t. v17-0004-Enable-CTAS-Parallel-Inserts-For-Append.patch

+ * Push the dest receiver to Gather node when it is either at the top of the
+ * plan or under top Append node unless it does not have any projections to do.

I think the 'unless' should be 'if'. As can be seen from the body of the method:

+   if (!ps->ps_ProjInfo)
+   {
+   GatherState *gstate = (GatherState *) ps;
+
+   parallel = true;


Thanks. Modified it in the 0004 patch. Attaching v18 patch set. Note
that no change in 0001 to 0003 patches from v17.

Please consider v18 patch set for further review.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com



Hi,

Sorry it took so long to get back to reviewing this.

wrt v18-0001 patch:

+   /*
+* If the worker is for parallel insert in CTAS, then use the 
proper
+* dest receiver.
+*/
+   intoclause = (IntoClause *) stringToNode(intoclausestr);
+   receiver = CreateIntoRelDestReceiver(intoclause);
+   ((DR_intorel *)receiver)->is_parallel_worker = true;
+   ((DR_intorel *)receiver)->object_id = fpes->objectid;
I would move this into a function called e.g. 
GetCTASParallelWorkerReceiver so that the details wrt CTAS can be put in 
createas.c.

I would then also split up intorel_startup into intorel_leader_startup
and intorel_worker_startup, and in GetCTASParallelWorkerReceiver set 
self->pub.rStartup to intorel_worker_startup.



+   volatile pg_atomic_uint64   *processed;
why is it volatile?


+   if (isctas)
+   {
+   intoclause = ((DR_intorel *) node->dest)->into;
+   objectid = ((DR_intorel *) 
node->dest)->object_id;
+   }
Given that you extract them each once and then pass them directly into 
the parallel-worker, can't you instead pass in the destreceiver and 
leave that logic to ExecInitParallelPlan?



+   if (IS_PARALLEL_CTAS_DEST(gstate->dest) 
&&
+   ((DR_intorel *) 
gstate->dest)->into->rel &&
+   ((DR_intorel *) 
gstate->dest)->into->rel->relname)
why would rel and relname not be there? if no rows have been inserted? 
because it seems from the intorel_startup function that that would be 
set as soon as startup was done, which i assume (wrongly?) is always done?



+* In case if no workers were launched, allow the leader to insert 
entire
+* tuples.
what does "entire tuples" mean? should it maybe be "all tuples"?



wrt v18-0002 patch:

It looks like this introduces a state machine that goes like:
- starts at CTAS_PARALLEL_INS_UNDEF
- possibly moves to CTAS_PARALLEL_INS_SELECT
- CTAS_PARALLEL_INS_TUP_COST_CAN_IGN can be added
- if both were added at some stage, we can go to 
CTAS_PARALLEL_INS_TUP_COST_IGNORED and ignore the costs


what i'm wondering is why you opted to put logic around 
generate_useful_gather_paths and in cost_gather when to me it seems more 
logical to put it in create_gather_path? i'm probably missing something 
there?




wrt v18-0003 patch:

not sure if it is needed, but i was wondering if we would want more 
tests with multiple gather nodes existing? caused e.g. by using CTE's, 
valid subquery's (like the one test you have, but without the group 
by/having)?



Kind regards,
Luc




Re: Parallel Inserts in CREATE TABLE AS

2021-01-04 Thread Bharath Rupireddy
On Wed, Dec 30, 2020 at 5:28 PM vignesh C  wrote:
> Few comments:
> -   /*
> -* To allow parallel inserts, we need to ensure that they are safe to 
> be
> -* performed in workers. We have the infrastructure to allow parallel
> -* inserts in general except for the cases where inserts generate a 
> new
> -* CommandId (eg. inserts into a table having a foreign key column).
> -*/
> -   if (IsParallelWorker())
> -   ereport(ERROR,
> -   (errcode(ERRCODE_INVALID_TRANSACTION_STATE),
> -errmsg("cannot insert tuples in a
> parallel worker")));
>
> Is it possible to add a check if it is a CTAS insert here as we do not
> support insert in parallel workers from others as of now.

Currently, there's no global variable in which we can selectively skip
this in case of parallel insertion in CTAS. How about having a
variable in any of the worker global contexts, set that when parallel
insertion is chosen for CTAS and use that in heap_prepare_insert() to
skip the above error? Eventually, we can remove this restriction
entirely in case we fully allow parallelism for INSERT INTO SELECT,
CTAS, and COPY.

Thoughts?

> +   Oid objectid;   /* workers to
> open relation/table.  */
> +   /* Number of tuples inserted by all the workers. */
> +   pg_atomic_uint64processed;
>
> We can just mention relation instead of relation/table.

I will modify it in the next patch set.

> +select explain_pictas(
> +'create table parallel_write as select length(stringu1) from tenk1;');
> +  explain_pictas
> +--
> + Gather (actual rows=N loops=N)
> +   Workers Planned: 4
> +   Workers Launched: N
> + ->  Create parallel_write
> +   ->  Parallel Seq Scan on tenk1 (actual rows=N loops=N)
> +(5 rows)
> +
> +select count(*) from parallel_write;
>
> Can we include selection of cmin, xmin for one of the test to verify
> that it uses the same transaction id  in the parallel workers
> something like:
> select distinct(cmin,xmin) from parallel_write;

This is not possible since cmin and xmin are dynamic, we can not use
them in test cases. I think it's not necessary to check whether the
leader and workers are in the same txn or not, since we are not
creating a new txn. All the txn state from the leader is serialized in
SerializeTransactionState and restored in
StartParallelWorkerTransaction.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com




Re: Parallel Inserts in CREATE TABLE AS

2021-01-04 Thread Bharath Rupireddy
On Wed, Dec 30, 2020 at 5:26 PM vignesh C  wrote:
>
> On Wed, Dec 30, 2020 at 10:47 AM Bharath Rupireddy
>  wrote:
> >
> > On Wed, Dec 30, 2020 at 10:32 AM Dilip Kumar  wrote:
> > > I have completed reviewing 0001, I don't have more comments, just one
> > > question.  Soon I will review the remaining patches.
> >
> > Thanks.
> >
> > > +/* If parallel inserts are to be allowed, set a few extra 
> > > information. */
> > > +if (myState->is_parallel)
> > > +{
> > > +myState->object_id = intoRelationAddr.objectId;
> > > +
> > > +/*
> > > + * We don't need to skip contacting FSM while inserting tuples 
> > > for
> > > + * parallel mode, while extending the relations, workers instead 
> > > of
> > > + * blocking on a page while another worker is inserting, can 
> > > check the
> > > + * FSM for another page that can accommodate the tuples. This 
> > > results
> > > + * in major benefit for parallel inserts.
> > > + */
> > > +myState->ti_options = 0;
> > >
> > > Is there any performance data for this or just theoretical analysis?
> >
> > I have seen that we don't get much performance with the skip fsm
> > option, though I don't have the data to back it up. I'm planning to
> > run performance tests after the patches 0001, 0002 and 0003 get
> > reviewed. I will capture the data at that time. Hope that's fine.
> >
>
> When you run the performance tests, you can try to capture and publish
> relation size & the number of pages that are getting created for base
> table and the CTAS table, you can use something like SELECT relpages
> FROM pg_class WHERE relname = 'tablename' &  SELECT
> pg_total_relation_size('tablename'). Just to make sure that there is
> no significant difference between the base table and CTAS table.

I can do that. I'm sure the number of pages will be equal or a little
more, since I observed this for parallel copy.
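
For what it's worth, a compact form of that check could be (a sketch;
table names assumed):

select relname, relpages, pg_total_relation_size(oid) as total_bytes
from pg_class
where relname in ('tenk1', 'parallel_write');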

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com




Re: Parallel Inserts in CREATE TABLE AS

2020-12-30 Thread Dilip Kumar
On Wed, Dec 30, 2020 at 7:47 PM Bharath Rupireddy
 wrote:
>
> Thanks for the comments.
>
> How about naming like below more generically and placing them in
> parallel.h so that it will also be used for refresh materialized view?
>
> +typedef enum ParallelInsertTupleCostOpt
> +{
> + PINS_SELECT_QUERY = 1 << 0, /* turn on this before planning */
> + /*
> + * Turn on this while planning for upper Gather path to ignore parallel
> + * tuple cost in cost_gather.
> + */
> + PINS_CAN_IGN_TUP_COST = 1 << 1,
> + /* Turn on this after the cost is ignored. */
> + PINS_TUP_COST_IGNORED = 1 << 2
>
> My plan was to get the main design idea of pushing the dest receiver
> to gather reviewed and once agreed, then I thought of making few
> functions common and place them in parallel.h and parallel.c so that
> they can be used for Parallel Inserts in REFRESH MATERIALIZED VIEW
> because the same design idea can be applied there as well.


I think instead of PINS_* we can name them PARALLEL_INSERT_*; other
than that I am fine with the naming.

> For instance my thoughts are: add the below structures, functions and
> other macros to parallel.h and parallel.c:
> typedef enum ParallelInsertKind
> {
> PINS_UNDEF = 0,
> PINS_CREATE_TABLE_AS,
> PINS_REFRESH_MAT_VIEW
> } ParallelInsertKind;
>
> typedef struct ParallelInsertCTASInfo
> {
> IntoClause *intoclause;
> Oid objectid;
> } ParallelInsertCTASInfo;
>
> typedef struct ParallelInsertRMVInfo
> {
> Oid objectid;
> } ParallelInsertRMVInfo;
>
> ExecInitParallelPlan(PlanState *planstate, EState *estate,
>   Bitmapset *sendParams, int nworkers,
> - int64 tuples_needed)
> + int64 tuples_needed, ParallelInsertKind pinskind,
> + void *pinsinfo)
>
> Change ExecParallelInsertInCTAS to
>
> +static void
> +ExecParallelInsert(GatherState *node)
> +{
>
> Change SetCTASParallelInsertState to
> +void
> +SetParallelInsertState(QueryDesc *queryDesc)
>
> Change IsParallelInsertionAllowedInCTAS to
>
> +bool
> +IsParallelInsertionAllowed(ParallelInsertKind pinskind, IntoClause *into)
> +{
>
> Thoughts?
>

I haven’t thought about these structures yet but yeah making them
generic will be good.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: Parallel Inserts in CREATE TABLE AS

2020-12-30 Thread Bharath Rupireddy
Thanks for the comments.

How about naming like below more generically and placing them in
parallel.h so that it will also be used for refresh materialized view?

+typedef enum ParallelInsertTupleCostOpt
+{
+ PINS_SELECT_QUERY = 1 << 0, /* turn on this before planning */
+ /*
+ * Turn on this while planning for upper Gather path to ignore parallel
+ * tuple cost in cost_gather.
+ */
+ PINS_CAN_IGN_TUP_COST = 1 << 1,
+ /* Turn on this after the cost is ignored. */
+ PINS_TUP_COST_IGNORED = 1 << 2

My plan was to get the main design idea of pushing the dest receiver
to gather reviewed and once agreed, then I thought of making few
functions common and place them in parallel.h and parallel.c so that
they can be used for Parallel Inserts in REFRESH MATERIALIZED VIEW
because the same design idea can be applied there as well.

For instance my thoughts are: add the below structures, functions and
other macros to parallel.h and parallel.c:
typedef enum ParallelInsertKind
{
PINS_UNDEF = 0,
PINS_CREATE_TABLE_AS,
PINS_REFRESH_MAT_VIEW
} ParallelInsertKind;

typedef struct ParallelInsertCTASInfo
{
IntoClause *intoclause;
Oid objectid;
} ParallelInsertCTASInfo;

typedef struct ParallelInsertRMVInfo
{
Oid objectid;
} ParallelInsertRMVInfo;

ExecInitParallelPlan(PlanState *planstate, EState *estate,
  Bitmapset *sendParams, int nworkers,
- int64 tuples_needed)
+ int64 tuples_needed, ParallelInsertKind pinskind,
+ void *pinsinfo)

Change ExecParallelInsertInCTAS to

+static void
+ExecParallelInsert(GatherState *node)
+{

Change SetCTASParallelInsertState to
+void
+SetParallelInsertState(QueryDesc *queryDesc)

Change IsParallelInsertionAllowedInCTAS to

+bool
+IsParallelInsertionAllowed(ParallelInsertKind pinskind, IntoClause *into)
+{

Thoughts?

If okay, I can work on these points and add a new patch into the patch
set that will have changes for parallel inserts in REFRESH
MATERIALIZED VIEW.

On Wed, Dec 30, 2020 at 3:04 PM Dilip Kumar  wrote:
> Some comments in 0002
>
> 1.
> +/*
> + * Information sent to the planner from CTAS to account for the cost
> + * calculations in cost_gather. We need to do this because, no tuples will be
> + * received by the Gather node if the workers insert the tuples in parallel.
> + */
> +typedef enum CTASParallelInsertOpt
> +{
> + CTAS_PARALLEL_INS_UNDEF = 0, /* undefined */
> + CTAS_PARALLEL_INS_SELECT = 1 << 0, /* turn on this before planning */
> + /*
> + * Turn on this while planning for upper Gather path to ignore parallel
> + * tuple cost in cost_gather.
> + */
> + CTAS_PARALLEL_INS_TUP_COST_CAN_IGN = 1 << 1,
> + /* Turn on this after the cost is ignored. */
> + CTAS_PARALLEL_INS_TUP_COST_IGNORED = 1 << 2
> +} CTASParallelInsertOpt;
>
>
> I don't like the naming of these flags.  Especially no need to define
> CTAS_PARALLEL_INS_UNDEF, we can directl use 0
> for that purpose instead of giving some weird name.  So I suggest
> first, just get rid of CTAS_PARALLEL_INS_UNDEF.

+1. I will change it in the next version of the patch.

> 2.
> + /*
> + * Turn on a flag to ignore parallel tuple cost by the Gather path in
> + * cost_gather if the SELECT is for CTAS and we are generating an upper
> + * level Gather path.
> + */
> + bool ignore = ignore_parallel_tuple_cost(root);
> +
>   generate_useful_gather_paths(root, rel, false);
>
> + /*
> + * Reset the ignore flag, in case we turned it on but
> + * generate_useful_gather_paths returned without reaching cost_gather.
> + * If we reached cost_gather, we would have been reset it there.
> + */
> + if (ignore && (root->parse->CTASParallelInsInfo &
> + CTAS_PARALLEL_INS_TUP_COST_CAN_IGN))
> + {
> + root->parse->CTASParallelInsInfo &=
> + ~CTAS_PARALLEL_INS_TUP_COST_CAN_IGN;
> + }
>
> I think th way we are using these cost ignoring flag, doesn't look clean.
>
> I mean first, CTAS_PARALLEL_INS_SELECT is set if it is coming from
> CTAS and then ignore_parallel_tuple_cost will
> set the CTAS_PARALLEL_INS_TUP_COST_CAN_IGN if it satisfies certain
> condition which is fine.  Now, internally cost
> gather will add CTAS_PARALLEL_INS_TUP_COST_IGNORED and remove
> CTAS_PARALLEL_INS_TUP_COST_CAN_IGN and if
> CTAS_PARALLEL_INS_TUP_COST_CAN_IGN is not removed then we will remove
> it outside.  Why do we need to remove
> CTAS_PARALLEL_INS_TUP_COST_CAN_IGN flag at all?

Yes we don't need to remove the CTAS_PARALLEL_INS_TUP_COST_CAN_IGN
flag. I will change it in the next version.

> 3.
> + if (tuple_cost_flags && gstate->ps.ps_ProjInfo)
> + Assert(!(*tuple_cost_flags & CTAS_PARALLEL_INS_TUP_COST_IGNORED));
>
> Instead of adding Assert inside an IF statement, you can convert whole
> statement as an assert.  Lets not add unnecessary
> if in the release mode.

+1. I will change it in the next version.
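
Concretely, the converted form would be along these lines (sketch):

Assert(!(tuple_cost_flags && gstate->ps.ps_ProjInfo &&
		 (*tuple_cost_flags & CTAS_PARALLEL_INS_TUP_COST_IGNORED)));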

> 4.
> + if ((root->parse->CTASParallelInsInfo & CTAS_PARALLEL_INS_SELECT) &&
> + (root->parse->CTASParallelInsInfo &
> + CTAS_PARALLEL_INS_TUP_COST_CAN_IGN))
> + {
> + 

Re: Parallel Inserts in CREATE TABLE AS

2020-12-30 Thread vignesh C
On Wed, Dec 30, 2020 at 9:25 AM Bharath Rupireddy
 wrote:
>
> On Wed, Dec 30, 2020 at 5:22 AM Zhihong Yu  wrote:
> > w.r.t. v17-0004-Enable-CTAS-Parallel-Inserts-For-Append.patch
> >
> > + * Push the dest receiver to Gather node when it is either at the top of 
> > the
> > + * plan or under top Append node unless it does not have any projections 
> > to do.
> >
> > I think the 'unless' should be 'if'. As can be seen from the body of the 
> > method:
> >
> > +   if (!ps->ps_ProjInfo)
> > +   {
> > +   GatherState *gstate = (GatherState *) ps;
> > +
> > +   parallel = true;
>
> Thanks. Modified it in the 0004 patch. Attaching v18 patch set. Note
> that no change in 0001 to 0003 patches from v17.
>
> Please consider v18 patch set for further review.
>

Few comments:
-   /*
-* To allow parallel inserts, we need to ensure that they are safe to be
-* performed in workers. We have the infrastructure to allow parallel
-* inserts in general except for the cases where inserts generate a new
-* CommandId (eg. inserts into a table having a foreign key column).
-*/
-   if (IsParallelWorker())
-   ereport(ERROR,
-   (errcode(ERRCODE_INVALID_TRANSACTION_STATE),
-errmsg("cannot insert tuples in a
parallel worker")));

Is it possible to add a check here for whether it is a CTAS insert, as
we do not support inserts in parallel workers from other commands as of
now?

+   Oid objectid;   /* workers to
open relation/table.  */
+   /* Number of tuples inserted by all the workers. */
+   pg_atomic_uint64processed;

We can just mention relation instead of relation/table.

+select explain_pictas(
+'create table parallel_write as select length(stringu1) from tenk1;');
+  explain_pictas
+--
+ Gather (actual rows=N loops=N)
+   Workers Planned: 4
+   Workers Launched: N
+ ->  Create parallel_write
+   ->  Parallel Seq Scan on tenk1 (actual rows=N loops=N)
+(5 rows)
+
+select count(*) from parallel_write;

Can we include selection of cmin, xmin for one of the test to verify
that it uses the same transaction id  in the parallel workers
something like:
select distinct(cmin,xmin) from parallel_write;

Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com




Re: Parallel Inserts in CREATE TABLE AS

2020-12-30 Thread vignesh C
On Wed, Dec 30, 2020 at 10:47 AM Bharath Rupireddy
 wrote:
>
> On Wed, Dec 30, 2020 at 10:32 AM Dilip Kumar  wrote:
> > I have completed reviewing 0001, I don't have more comments, just one
> > question.  Soon I will review the remaining patches.
>
> Thanks.
>
> > +/* If parallel inserts are to be allowed, set a few extra information. 
> > */
> > +if (myState->is_parallel)
> > +{
> > +myState->object_id = intoRelationAddr.objectId;
> > +
> > +/*
> > + * We don't need to skip contacting FSM while inserting tuples for
> > + * parallel mode, while extending the relations, workers instead of
> > + * blocking on a page while another worker is inserting, can check 
> > the
> > + * FSM for another page that can accommodate the tuples. This 
> > results
> > + * in major benefit for parallel inserts.
> > + */
> > +myState->ti_options = 0;
> >
> > Is there any performance data for this or just theoretical analysis?
>
> I have seen that we don't get much performance with the skip fsm
> option, though I don't have the data to back it up. I'm planning to
> run performance tests after the patches 0001, 0002 and 0003 get
> reviewed. I will capture the data at that time. Hope that's fine.
>

When you run the performance tests, you can try to capture and publish
relation size & the number of pages that are getting created for base
table and the CTAS table, you can use something like SELECT relpages
FROM pg_class WHERE relname = 'tablename' &  SELECT
pg_total_relation_size('tablename'). Just to make sure that there is
no significant difference between the base table and CTAS table.

Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com




Re: Parallel Inserts in CREATE TABLE AS

2020-12-30 Thread Dilip Kumar
On Wed, Dec 30, 2020 at 10:49 AM Dilip Kumar  wrote:
>
> On Wed, 30 Dec 2020 at 10:47 AM, Bharath Rupireddy 
>  wrote:
>>
>> On Wed, Dec 30, 2020 at 10:32 AM Dilip Kumar  wrote:
>> > I have completed reviewing 0001, I don't have more comments, just one
>> > question.  Soon I will review the remaining patches.
>>
>> Thanks.
>>
>> > +/* If parallel inserts are to be allowed, set a few extra 
>> > information. */
>> > +if (myState->is_parallel)
>> > +{
>> > +myState->object_id = intoRelationAddr.objectId;
>> > +
>> > +/*
>> > + * We don't need to skip contacting FSM while inserting tuples for
>> > + * parallel mode, while extending the relations, workers instead 
>> > of
>> > + * blocking on a page while another worker is inserting, can 
>> > check the
>> > + * FSM for another page that can accommodate the tuples. This 
>> > results
>> > + * in major benefit for parallel inserts.
>> > + */
>> > +myState->ti_options = 0;
>> >
>> > Is there any performance data for this or just theoretical analysis?
>>
>> I have seen that we don't get much performance with the skip fsm
>> option, though I don't have the data to back it up. I'm planning to
>> run performance tests after the patches 0001, 0002 and 0003 get
>> reviewed. I will capture the data at that time. Hope that's fine.
>
>
> Yeah that’s fine
>

Some comments in 0002

1.
+/*
+ * Information sent to the planner from CTAS to account for the cost
+ * calculations in cost_gather. We need to do this because, no tuples will be
+ * received by the Gather node if the workers insert the tuples in parallel.
+ */
+typedef enum CTASParallelInsertOpt
+{
+ CTAS_PARALLEL_INS_UNDEF = 0, /* undefined */
+ CTAS_PARALLEL_INS_SELECT = 1 << 0, /* turn on this before planning */
+ /*
+ * Turn on this while planning for upper Gather path to ignore parallel
+ * tuple cost in cost_gather.
+ */
+ CTAS_PARALLEL_INS_TUP_COST_CAN_IGN = 1 << 1,
+ /* Turn on this after the cost is ignored. */
+ CTAS_PARALLEL_INS_TUP_COST_IGNORED = 1 << 2
+} CTASParallelInsertOpt;


I don't like the naming of these flags.  Especially, there is no need to
define CTAS_PARALLEL_INS_UNDEF; we can directly use 0 for that purpose
instead of giving it some weird name.  So I suggest, first, just get rid
of CTAS_PARALLEL_INS_UNDEF.

2.
+ /*
+ * Turn on a flag to ignore parallel tuple cost by the Gather path in
+ * cost_gather if the SELECT is for CTAS and we are generating an upper
+ * level Gather path.
+ */
+ bool ignore = ignore_parallel_tuple_cost(root);
+
  generate_useful_gather_paths(root, rel, false);

+ /*
+ * Reset the ignore flag, in case we turned it on but
+ * generate_useful_gather_paths returned without reaching cost_gather.
+ * If we reached cost_gather, we would have been reset it there.
+ */
+ if (ignore && (root->parse->CTASParallelInsInfo &
+ CTAS_PARALLEL_INS_TUP_COST_CAN_IGN))
+ {
+ root->parse->CTASParallelInsInfo &=
+ ~CTAS_PARALLEL_INS_TUP_COST_CAN_IGN;
+ }

I think the way we are using these cost-ignoring flags doesn't look clean.

I mean, first CTAS_PARALLEL_INS_SELECT is set if it is coming from
CTAS, and then ignore_parallel_tuple_cost will set
CTAS_PARALLEL_INS_TUP_COST_CAN_IGN if it satisfies a certain condition,
which is fine.  Now, internally cost_gather will add
CTAS_PARALLEL_INS_TUP_COST_IGNORED and remove
CTAS_PARALLEL_INS_TUP_COST_CAN_IGN, and if
CTAS_PARALLEL_INS_TUP_COST_CAN_IGN is not removed then we remove it
outside.  Why do we need to remove the
CTAS_PARALLEL_INS_TUP_COST_CAN_IGN flag at all?

3.
+ if (tuple_cost_flags && gstate->ps.ps_ProjInfo)
+ Assert(!(*tuple_cost_flags & CTAS_PARALLEL_INS_TUP_COST_IGNORED));

Instead of adding the Assert inside an IF statement, you can convert the
whole statement into an assert.  Let's not add an unnecessary if in
release mode.

4.
+ if ((root->parse->CTASParallelInsInfo & CTAS_PARALLEL_INS_SELECT) &&
+ (root->parse->CTASParallelInsInfo &
+ CTAS_PARALLEL_INS_TUP_COST_CAN_IGN))
+ {
+ ignore_tuple_cost = true;
+ root->parse->CTASParallelInsInfo &=
+ ~CTAS_PARALLEL_INS_TUP_COST_CAN_IGN;
+ root->parse->CTASParallelInsInfo |= CTAS_PARALLEL_INS_TUP_COST_IGNORED;
+ }
+
+ if (!ignore_tuple_cost)
+ run_cost += parallel_tuple_cost * path->path.rows;

Change this to (if, else) as shown below, because if it goes into the
IF part then ignore_tuple_cost will always be true, so there is no need
for an extra if check.

if ((root->parse->CTASParallelInsInfo & CTAS_PARALLEL_INS_SELECT) &&
(root->parse->CTASParallelInsInfo &
CTAS_PARALLEL_INS_TUP_COST_CAN_IGN))
{
ignore_tuple_cost = true;
root->parse->CTASParallelInsInfo &=
~CTAS_PARALLEL_INS_TUP_COST_CAN_IGN;
root->parse->CTASParallelInsInfo |= CTAS_PARALLEL_INS_TUP_COST_IGNORED;
}
else
run_cost += parallel_tuple_cost * path->path.rows;

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: Parallel Inserts in CREATE TABLE AS

2020-12-29 Thread Dilip Kumar
On Wed, 30 Dec 2020 at 10:47 AM, Bharath Rupireddy <
bharath.rupireddyforpostg...@gmail.com> wrote:

> On Wed, Dec 30, 2020 at 10:32 AM Dilip Kumar 
> wrote:
> > I have completed reviewing 0001, I don't have more comments, just one
> > question.  Soon I will review the remaining patches.
>
> Thanks.
>
> > +/* If parallel inserts are to be allowed, set a few extra
> information. */
> > +if (myState->is_parallel)
> > +{
> > +myState->object_id = intoRelationAddr.objectId;
> > +
> > +/*
> > + * We don't need to skip contacting FSM while inserting tuples
> for
> > + * parallel mode, while extending the relations, workers
> instead of
> > + * blocking on a page while another worker is inserting, can
> check the
> > + * FSM for another page that can accommodate the tuples. This
> results
> > + * in major benefit for parallel inserts.
> > + */
> > +myState->ti_options = 0;
> >
> > Is there any performance data for this or just theoretical analysis?
>
> I have seen that we don't get much performance with the skip fsm
> option, though I don't have the data to back it up. I'm planning to
> run performance tests after the patches 0001, 0002 and 0003 get
> reviewed. I will capture the data at that time. Hope that's fine.


Yeah that’s fine

> --
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com


Re: Parallel Inserts in CREATE TABLE AS

2020-12-29 Thread Bharath Rupireddy
On Wed, Dec 30, 2020 at 10:32 AM Dilip Kumar  wrote:
> I have completed reviewing 0001, I don't have more comments, just one
> question.  Soon I will review the remaining patches.

Thanks.

> +/* If parallel inserts are to be allowed, set a few extra information. */
> +if (myState->is_parallel)
> +{
> +myState->object_id = intoRelationAddr.objectId;
> +
> +/*
> + * We don't need to skip contacting FSM while inserting tuples for
> + * parallel mode, while extending the relations, workers instead of
> + * blocking on a page while another worker is inserting, can check 
> the
> + * FSM for another page that can accommodate the tuples. This results
> + * in major benefit for parallel inserts.
> + */
> +myState->ti_options = 0;
>
> Is there any performance data for this or just theoretical analysis?

I have seen that we don't get much performance with the skip fsm
option, though I don't have the data to back it up. I'm planning to
run performance tests after the patches 0001, 0002 and 0003 get
reviewed. I will capture the data at that time. Hope that's fine.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com




Re: Parallel Inserts in CREATE TABLE AS

2020-12-29 Thread Dilip Kumar
On Mon, Dec 28, 2020 at 10:45 AM Dilip Kumar  wrote:
>
> On Sun, Dec 27, 2020 at 2:20 PM Bharath Rupireddy
>  wrote:
> >
> > On Sat, Dec 26, 2020 at 11:11 AM Dilip Kumar  wrote:
> > > I have reviewed part of v15-0001 patch, I have a few comments, I will
> > > continue to review this.
> >
> > Thanks a lot.
> >
> > > 1.
> > > Why is this temporary hack? and what is the plan for removing this hack?
> >
> > The changes in xact.c, xact.h and heapam.c are common to all the
> > parallel insert patches - COPY, INSERT INTO SELECT. That was the
> > initial comment, I forgot to keep it in sync with the other patches.
> > Now, I used the comment from INSERT INTO SELECT patch. IIRC, the plan
> > was to have these code in all the parallel inserts patch, whichever
> > gets to review and commit first, others will update their patches
> > accordingly.
> >
> > > 2.
> > > +/*
> > > + * ChooseParallelInsertsInCTAS --- determine whether or not parallel
> > > + * insertion is possible, if yes set the parallel insert state i.e. push 
> > > down
> > > + * the dest receiver to the Gather nodes.
> > > + */
> > > +void ChooseParallelInsertsInCTAS(IntoClause *into, QueryDesc *queryDesc)
> > > +{
> > > +if (!IS_CTAS(into))
> > > +return;
> > >
> > > When will this hit?  The functtion name suggest that it is from CTAS
> > > but now you have a check that if it is
> > > not for CTAS then return,  can you add the comment that when do you
> > > expect this case?
> >
> > Yes it will hit for explain cases, but I choose to remove this and
> > check outside in the explain something like:
> > if (into)
> > ChooseParallelInsertsInCTAS()
> >
> > > Also the function name should start in a new line
> > > i.e
> > > void
> > > ChooseParallelInsertsInCTAS(IntoClause *into, QueryDesc *queryDesc)
> >
> > Ah, missed that. Modified now.
> >
> > > 3.
> > > +/*
> > > + * ChooseParallelInsertsInCTAS --- determine whether or not parallel
> > > + * insertion is possible, if yes set the parallel insert state i.e. push 
> > > down
> > > + * the dest receiver to the Gather nodes.
> > > + */
> > >
> > > Push down to the Gather nodes?  I think the right statement will be
> > > push down below the Gather node.
> >
> > Modified.
> >
> > > 4.
> > > intorel_startup(DestReceiver *self, int operation, TupleDesc typeinfo)
> > > {
> > >  DR_intorel *myState = (DR_intorel *) self;
> > >
> > > -- Comment ->in parallel worker we don't need to crease dest recv 
> > > blah blah
> > > +if (myState->is_parallel_worker)
> > > {
> > > --parallel worker handling--
> > > return;
> > > }
> > >
> > > --non-parallel worker code stay right there, instead of moving to else
> >
> > Done.
> >
> > > 5.
> > > +/*
> > > + * ChooseParallelInsertsInCTAS --- determine whether or not parallel
> > > + * insertion is possible, if yes set the parallel insert state i.e. push 
> > > down
> > > + * the dest receiver to the Gather nodes.
> > > + */
> > > +void ChooseParallelInsertsInCTAS(IntoClause *into, QueryDesc *queryDesc)
> > > +{
> > >
> > > From function name and comments it appeared that this function will
> > > return boolean saying whether
> > > Parallel insert should be selected or not.  I think name/comment
> > > should be better for this
> >
> > Yeah that function can still return void because no point in returning
> > bool there, since the intention is to see if parallel inserts can be
> > performed, if yes, set the state otherwise exit. I changed the
> > function name to TryParallelizingInsertsInCTAS(). Let me know your
> > suggestions if that doesn't work out.
> >
> > > 6.
> > > /*
> > > + * For parallelizing inserts in CTAS i.e. making each parallel 
> > > worker
> > > + * insert the tuples, we must send information such as into 
> > > clause (for
> > > + * each worker to build separate dest receiver), object id (for 
> > > each
> > > + * worker to open the created table).
> > >
> > > Comment is saying we need to pass object id but the code under this
> > > comment is not doing so.
> >
> > Improved the comment.
> >
> > > 7.
> > > +/*
> > > + * Since there are no rows that are transferred from workers to 
> > > Gather
> > > + * node, so we set it to 0 to be visible in estimated row count 
> > > of
> > > + * explain plans.
> > > + */
> > > +queryDesc->planstate->plan->plan_rows = 0;
> > >
> > > This seems a bit hackish. Why is it done after the planning?  I mean
> > > the plan must know that it is returning 0 rows?
> >
> > This exists to show up the estimated row count(in case of EXPLAIN CTAS
> > without ANALYZE) in the output. For EXPLAIN ANALYZE CTAS actual tuples
> > are shown correctly as 0 because Gather doesn't receive any tuples.
> > if (es->costs)
> > {
> > if (es->format == EXPLAIN_FORMAT_TEXT)
> > {
> > appendStringInfo(es->str, "  (cost=%.2f..%.2f rows=%.0f 
> > width=%d)",
> >

Re: Parallel Inserts in CREATE TABLE AS

2020-12-29 Thread Zhihong Yu
w.r.t. v17-0004-Enable-CTAS-Parallel-Inserts-For-Append.patch

+ * Push the dest receiver to Gather node when it is either at the top of
the
+ * plan or under top Append node unless it does not have any projections
to do.

I think the 'unless' should be 'if'. As can be seen from the body of the
method:

+   if (!ps->ps_ProjInfo)
+   {
+   GatherState *gstate = (GatherState *) ps;
+
+   parallel = true;

Cheers

On Mon, Dec 28, 2020 at 4:12 AM Bharath Rupireddy <
bharath.rupireddyforpostg...@gmail.com> wrote:

> On Mon, Dec 28, 2020 at 10:46 AM Dilip Kumar 
> wrote:
> > Thanks for working on this, I will have a look at the updated patches
> soon.
>
> Attaching v17 patch set after addressing comments raised in other
> threads. Please consider this patch set for further review.
>
>
> With Regards,
> Bharath Rupireddy.
> EnterpriseDB: http://www.enterprisedb.com
>


Re: Parallel Inserts in CREATE TABLE AS

2020-12-28 Thread Bharath Rupireddy
On Mon, Dec 28, 2020 at 11:24 AM vignesh C  wrote:
> Test comments are detailed in a few cases but not in others, even for
> similar kinds of parallelism-selected tests. I felt we could make the
> test comments consistent across the file.

Modified the test case description in the v17 patch set posted
upthread. Please have a look.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com




Re: Parallel Inserts in CREATE TABLE AS

2020-12-28 Thread Bharath Rupireddy
On Mon, Dec 28, 2020 at 1:16 AM Zhihong Yu  wrote:
> For v16-0002-Tuple-Cost-Adjustment-for-Parallel-Inserts-in-CTAS.patch:
>
> +   if (ignore &&
> +   (root->parse->CTASParallelInsInfo &
> +CTAS_PARALLEL_INS_TUP_COST_CAN_IGN))
>
> I wonder why CTAS_PARALLEL_INS_TUP_COST_CAN_IGN is checked again in the above 
> if since when ignore_parallel_tuple_cost returns true, 
> CTAS_PARALLEL_INS_TUP_COST_CAN_IGN is set already.

Sometimes we may set the flag CTAS_PARALLEL_INS_TUP_COST_CAN_IGN before
calling generate_useful_gather_paths, but generate_useful_gather_paths
can return without ever reaching cost_gather, which is where the flag is
normally reset. That happens in the following case:

if (rel->partial_pathlist == NIL)
return;

So, for such cases, I'm resetting it here.
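
Roughly, the shape is like below (just a sketch of the pattern, not the
literal patch hunk):

/* let the planner ignore the parallel tuple cost for this CTAS */
root->parse->CTASParallelInsInfo |= CTAS_PARALLEL_INS_TUP_COST_CAN_IGN;

generate_useful_gather_paths(root, rel, false);

/*
 * generate_useful_gather_paths() can return before cost_gather() is ever
 * reached (e.g. when rel->partial_pathlist == NIL), in which case the flag
 * was not reset there, so clear it here unconditionally.
 */
root->parse->CTASParallelInsInfo &= ~CTAS_PARALLEL_INS_TUP_COST_CAN_IGN;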

> + * In this function we only care Append and Gather nodes.
>
> 'care' -> 'care about'

Done.

> +   for (int i = 0; i < aps->as_nplans; i++)
> +   {
> +   parallel |= PushDownCTASParallelInsertState(dest,
> +   aps->appendplans[i],
> +   gather_exists);
>
> It seems the loop termination condition can include parallel since we can 
> come out of the loop once parallel is true.

No, we cannot come out of the for loop as soon as parallel is true, because
the intention there is to look at all the child/sub plans under the Append
and push the inserts down to the Gather nodes wherever possible.
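
In other words, the loop intentionally has no early exit; a sketch based on
the quoted snippet:

bool        parallel = false;

for (int i = 0; i < aps->as_nplans; i++)
{
    /*
     * Keep iterating even after parallel becomes true: each child plan
     * under the Append may have its own Gather node into which the
     * insert state has to be pushed.
     */
    parallel |= PushDownCTASParallelInsertState(dest,
                                                aps->appendplans[i],
                                                gather_exists);
}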

> +   if (!allow && tuple_cost_flags && gather_exists)
>
> As the above code shows, gather_exists is only checked when allow is false.

Yes. If at least one Gather node exists under the Append for which the
planner would have ignored the tuple cost, and we then end up not allowing
parallel inserts, we should assert that parallelism was not picked merely
because of the wrongly ignored parallel tuple cost.

> +* We set the flag for two cases when there is no parent path will
> +* be created(such as : limit,sort,distinct...):
>
> Please correct the grammar : there are two verbs following 'when'

Done.

> For set_append_rel_size:
>
> +   {
> +   root->parse->CTASParallelInsInfo |=
> +   CTAS_PARALLEL_INS_IGN_TUP_COST_APPEND;
> +   }
> +   }
> +
> +   if (root->parse->CTASParallelInsInfo &
> +   CTAS_PARALLEL_INS_IGN_TUP_COST_APPEND)
> +   {
> +   root->parse->CTASParallelInsInfo &=
> +   
> ~CTAS_PARALLEL_INS_IGN_TUP_COST_APPEND;
>
> > In the if block for childrel->rtekind == RTE_SUBQUERY,
> > CTAS_PARALLEL_INS_IGN_TUP_COST_APPEND may be set. Why is it cleared
> > immediately after?

Thanks for pointing that out. It's a miss; the intention is to reset it
after set_rel_size(). Corrected in the v17 patch.
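
To be concrete, the intended placement looks roughly like this inside
set_append_rel_size() (a sketch only; variable names follow allpaths.c and
the actual v17 hunk may differ):

if (childrel->rtekind == RTE_SUBQUERY)
    root->parse->CTASParallelInsInfo |=
        CTAS_PARALLEL_INS_IGN_TUP_COST_APPEND;

set_rel_size(root, childrel, childRTindex, childRTE);

root->parse->CTASParallelInsInfo &=
    ~CTAS_PARALLEL_INS_IGN_TUP_COST_APPEND;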

> +   /* Set to this in case tuple cost needs to be ignored for Append cases. */
> +   CTAS_PARALLEL_INS_IGN_TUP_COST_APPEND = 1 << 3
>
> Since each CTAS_PARALLEL_INS_ flag is a bit, maybe it's better to use 'turn 
> on' or similar term in the comment. Because 'set to' normally means 
> assignment.

Done.

All the above comments are addressed in the v17 patch set posted
upthread. Please have a look.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com




Re: Parallel Inserts in CREATE TABLE AS

2020-12-27 Thread vignesh C
On Sun, Dec 27, 2020 at 2:28 PM Bharath Rupireddy <
bharath.rupireddyforpostg...@gmail.com> wrote:
>
> On Sat, Dec 26, 2020 at 9:20 PM vignesh C  wrote:
> > +-- parallel inserts must occur
> > +select explain_pictas(
> > +'create table parallel_write as select length(stringu1) from tenk1;');
> > +select count(*) from parallel_write;
> > +drop table parallel_write;
> >
> > We can change comment  "parallel inserts must occur" like "parallel
> > insert must be selected for CTAS on normal table"
> >
> > +-- parallel inserts must occur
> > +select explain_pictas(
> > +'create unlogged table parallel_write as select length(stringu1) from
tenk1;');
> > +select count(*) from parallel_write;
> > +drop table parallel_write;
> >
> > We can change comment "parallel inserts must occur" like "parallel
> > insert must be selected for CTAS on unlogged table"
> > Similar comment need to be handled in other places also.
>
> I think the existing comments look fine. Info such as the table type and
> whether the query is CTAS or CMV is visible by looking at the test case
> itself. What I wanted the comments to convey is whether we support
> parallel inserts or not, and if not, why, so that they are easy to read.
> I tried to keep them as succinct as possible.
>

I saw a few inconsistencies in the patch:

+-- parallel inserts must occur
+select explain_pictas(
+'create table parallel_write as select length(stringu1) from tenk1;');
+  explain_pictas

+-- parallel inserts must not occur as the table is temporary
+select explain_pictas(
+'create temporary table parallel_write as select length(stringu1) from tenk1;');
+  explain_pictas

+-- parallel inserts must occur, as there is init plan that gets executed by
+-- each parallel worker
+select explain_pictas(
+'create table parallel_write as select two col1,
+(select two from (select * from tenk2) as tt limit 1) col2
+from tenk1  where tenk1.four = 3;');
+ explain_pictas

+-- the top node is Gather under which merge join happens, so parallel inserts
+-- must occur
+set enable_nestloop to off;
+set enable_mergejoin to on;

+-- parallel hash join happens under Gather node, so parallel inserts must occur
+set enable_mergejoin to off;
+set enable_hashjoin to on;
+select explain_pictas(

Test comments are detailed in a few cases but not in others, even for
similar kinds of parallelism-selected tests. I felt we could make the
test comments consistent across the file.

Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com


Re: Parallel Inserts in CREATE TABLE AS

2020-12-27 Thread Dilip Kumar
On Sun, Dec 27, 2020 at 2:20 PM Bharath Rupireddy
 wrote:
>
> On Sat, Dec 26, 2020 at 11:11 AM Dilip Kumar  wrote:
> > I have reviewed part of v15-0001 patch, I have a few comments, I will
> > continue to review this.
>
> Thanks a lot.
>
> > 1.
> > Why is this temporary hack? and what is the plan for removing this hack?
>
> The changes in xact.c, xact.h and heapam.c are common to all the
> parallel insert patches - COPY, INSERT INTO SELECT. That was the
> initial comment, I forgot to keep it in sync with the other patches.
> Now, I used the comment from INSERT INTO SELECT patch. IIRC, the plan
> was to have these code in all the parallel inserts patch, whichever
> gets to review and commit first, others will update their patches
> accordingly.
>
> > 2.
> > +/*
> > + * ChooseParallelInsertsInCTAS --- determine whether or not parallel
> > + * insertion is possible, if yes set the parallel insert state i.e. push 
> > down
> > + * the dest receiver to the Gather nodes.
> > + */
> > +void ChooseParallelInsertsInCTAS(IntoClause *into, QueryDesc *queryDesc)
> > +{
> > +if (!IS_CTAS(into))
> > +return;
> >
> > When will this hit?  The function name suggests that it is for CTAS,
> > but now you have a check that returns if it is not for CTAS.  Can you
> > add a comment about when you expect this case?
>
> Yes it will hit for explain cases, but I choose to remove this and
> check outside in the explain something like:
> if (into)
> ChooseParallelInsertsInCTAS()
>
> > Also the function name should start in a new line
> > i.e
> > void
> > ChooseParallelInsertsInCTAS(IntoClause *into, QueryDesc *queryDesc)
>
> Ah, missed that. Modified now.
>
> > 3.
> > +/*
> > + * ChooseParallelInsertsInCTAS --- determine whether or not parallel
> > + * insertion is possible, if yes set the parallel insert state i.e. push 
> > down
> > + * the dest receiver to the Gather nodes.
> > + */
> >
> > Push down to the Gather nodes?  I think the right statement will be
> > push down below the Gather node.
>
> Modified.
>
> > 4.
> > intorel_startup(DestReceiver *self, int operation, TupleDesc typeinfo)
> > {
> >  DR_intorel *myState = (DR_intorel *) self;
> >
> > -- Comment ->in parallel worker we don't need to crease dest recv blah 
> > blah
> > +if (myState->is_parallel_worker)
> > {
> > --parallel worker handling--
> > return;
> > }
> >
> > --non-parallel worker code stay right there, instead of moving to else
>
> Done.
>
> > 5.
> > +/*
> > + * ChooseParallelInsertsInCTAS --- determine whether or not parallel
> > + * insertion is possible, if yes set the parallel insert state i.e. push 
> > down
> > + * the dest receiver to the Gather nodes.
> > + */
> > +void ChooseParallelInsertsInCTAS(IntoClause *into, QueryDesc *queryDesc)
> > +{
> >
> > From function name and comments it appeared that this function will
> > return boolean saying whether
> > Parallel insert should be selected or not.  I think name/comment
> > should be better for this
>
> Yeah that function can still return void because no point in returning
> bool there, since the intention is to see if parallel inserts can be
> performed, if yes, set the state otherwise exit. I changed the
> function name to TryParallelizingInsertsInCTAS(). Let me know your
> suggestions if that doesn't work out.
>
> > 6.
> > /*
> > + * For parallelizing inserts in CTAS i.e. making each parallel 
> > worker
> > + * insert the tuples, we must send information such as into clause 
> > (for
> > + * each worker to build separate dest receiver), object id (for 
> > each
> > + * worker to open the created table).
> >
> > Comment is saying we need to pass object id but the code under this
> > comment is not doing so.
>
> Improved the comment.
>
> > 7.
> > +/*
> > + * Since there are no rows that are transferred from workers to 
> > Gather
> > + * node, so we set it to 0 to be visible in estimated row count of
> > + * explain plans.
> > + */
> > +queryDesc->planstate->plan->plan_rows = 0;
> >
> > This seems a bit hackish. Why is it done after the planning?  I mean
> > the plan must know that it is returning 0 rows?
>
> This exists to show up the estimated row count(in case of EXPLAIN CTAS
> without ANALYZE) in the output. For EXPLAIN ANALYZE CTAS actual tuples
> are shown correctly as 0 because Gather doesn't receive any tuples.
> if (es->costs)
> {
> if (es->format == EXPLAIN_FORMAT_TEXT)
> {
> appendStringInfo(es->str, "  (cost=%.2f..%.2f rows=%.0f 
> width=%d)",
>  plan->startup_cost, plan->total_cost,
>  plan->plan_rows, plan->plan_width);
>
> Since it's an estimated row count(which may not be always correct), we
> will let the EXPLAIN plan show that and I think we can remove that
> part. Thoughts?
>
> I removed it in v6 patch set.
>
> > 8.
> > +char 

Re: Parallel Inserts in CREATE TABLE AS

2020-12-27 Thread Zhihong Yu
For v16-0002-Tuple-Cost-Adjustment-for-Parallel-Inserts-in-CTAS.patch:

+   if (ignore &&
+   (root->parse->CTASParallelInsInfo &
+CTAS_PARALLEL_INS_TUP_COST_CAN_IGN))

I wonder why CTAS_PARALLEL_INS_TUP_COST_CAN_IGN is checked again in the
above if since when ignore_parallel_tuple_cost returns
true, CTAS_PARALLEL_INS_TUP_COST_CAN_IGN is set already.

+ * In this function we only care Append and Gather nodes.

'care' -> 'care about'

+   for (int i = 0; i < aps->as_nplans; i++)
+   {
+   parallel |= PushDownCTASParallelInsertState(dest,
+   aps->appendplans[i],
+   gather_exists);

It seems the loop termination condition can include parallel since we can
come out of the loop once parallel is true.

+   if (!allow && tuple_cost_flags && gather_exists)

As the above code shows, gather_exists is only checked when allow is false.

+* We set the flag for two cases when there is no parent path
will
+* be created(such as : limit,sort,distinct...):

Please correct the grammar : there are two verbs following 'when'

For set_append_rel_size:

+   {
+   root->parse->CTASParallelInsInfo |=
+
CTAS_PARALLEL_INS_IGN_TUP_COST_APPEND;
+   }
+   }
+
+   if (root->parse->CTASParallelInsInfo &
+   CTAS_PARALLEL_INS_IGN_TUP_COST_APPEND)
+   {
+   root->parse->CTASParallelInsInfo &=
+
~CTAS_PARALLEL_INS_IGN_TUP_COST_APPEND;

In the if block for childrel->rtekind == RTE_SUBQUERY,
CTAS_PARALLEL_INS_IGN_TUP_COST_APPEND may be set. Why is it cleared
immediately after?

+   /* Set to this in case tuple cost needs to be ignored for Append cases.
*/
+   CTAS_PARALLEL_INS_IGN_TUP_COST_APPEND = 1 << 3

Since each CTAS_PARALLEL_INS_ flag is a bit, maybe it's better to use 'turn
on' or similar term in the comment. Because 'set to' normally means
assignment.

Cheers

On Sun, Dec 27, 2020 at 12:50 AM Bharath Rupireddy <
bharath.rupireddyforpostg...@gmail.com> wrote:

> On Sat, Dec 26, 2020 at 11:11 AM Dilip Kumar 
> wrote:
> > I have reviewed part of v15-0001 patch, I have a few comments, I will
> > continue to review this.
>
> Thanks a lot.
>
> > 1.
> > Why is this temporary hack? and what is the plan for removing this hack?
>
> The changes in xact.c, xact.h and heapam.c are common to all the
> parallel insert patches - COPY, INSERT INTO SELECT. That was the
> initial comment, I forgot to keep it in sync with the other patches.
> Now, I used the comment from INSERT INTO SELECT patch. IIRC, the plan
> was to have these code in all the parallel inserts patch, whichever
> gets to review and commit first, others will update their patches
> accordingly.
>
> > 2.
> > +/*
> > + * ChooseParallelInsertsInCTAS --- determine whether or not parallel
> > + * insertion is possible, if yes set the parallel insert state i.e.
> push down
> > + * the dest receiver to the Gather nodes.
> > + */
> > +void ChooseParallelInsertsInCTAS(IntoClause *into, QueryDesc *queryDesc)
> > +{
> > +if (!IS_CTAS(into))
> > +return;
> >
> > When will this hit?  The function name suggests that it is for CTAS,
> > but now you have a check that returns if it is not for CTAS.  Can you
> > add a comment about when you expect this case?
>
> Yes it will hit for explain cases, but I choose to remove this and
> check outside in the explain something like:
> if (into)
> ChooseParallelInsertsInCTAS()
>
> > Also the function name should start in a new line
> > i.e
> > void
> > ChooseParallelInsertsInCTAS(IntoClause *into, QueryDesc *queryDesc)
>
> Ah, missed that. Modified now.
>
> > 3.
> > +/*
> > + * ChooseParallelInsertsInCTAS --- determine whether or not parallel
> > + * insertion is possible, if yes set the parallel insert state i.e.
> push down
> > + * the dest receiver to the Gather nodes.
> > + */
> >
> > Push down to the Gather nodes?  I think the right statement will be
> > push down below the Gather node.
>
> Modified.
>
> > 4.
> > intorel_startup(DestReceiver *self, int operation, TupleDesc typeinfo)
> > {
> >  DR_intorel *myState = (DR_intorel *) self;
> >
> > -- Comment ->in parallel worker we don't need to crease dest recv
> blah blah
> > +if (myState->is_parallel_worker)
> > {
> > --parallel worker handling--
> > return;
> > }
> >
> > --non-parallel worker code stay right there, instead of moving to
> else
>
> Done.
>
> > 5.
> > +/*
> > + * ChooseParallelInsertsInCTAS --- determine whether or not parallel
> > + * insertion is possible, if yes set the parallel insert state i.e.
> push down
> > + * the dest receiver to the Gather nodes.
> > + */
> > +void ChooseParallelInsertsInCTAS(IntoClause *into, QueryDesc *queryDesc)
> > +{
> >
> > From function name and comments it appeared that this function will
> > return boolean saying whether
> > Parallel insert should be selected or not.  I 

Re: Parallel Inserts in CREATE TABLE AS

2020-12-27 Thread Bharath Rupireddy
On Sat, Dec 26, 2020 at 9:20 PM vignesh C  wrote:
> +-- parallel inserts must occur
> +select explain_pictas(
> +'create table parallel_write as select length(stringu1) from tenk1;');
> +select count(*) from parallel_write;
> +drop table parallel_write;
>
> We can change comment  "parallel inserts must occur" like "parallel
> insert must be selected for CTAS on normal table"
>
> +-- parallel inserts must occur
> +select explain_pictas(
> +'create unlogged table parallel_write as select length(stringu1) from 
> tenk1;');
> +select count(*) from parallel_write;
> +drop table parallel_write;
>
> We can change comment "parallel inserts must occur" like "parallel
> insert must be selected for CTAS on unlogged table"
> Similar comment need to be handled in other places also.

I think the existing comments look fine. Info such as the table type and
whether the query is CTAS or CMV is visible by looking at the test case
itself. What I wanted the comments to convey is whether we support
parallel inserts or not, and if not, why, so that they are easy to read.
I tried to keep them as succinct as possible.

> If possible try to make a common function for both and use.

Yes, you are right. The function explain_pictas is the same as
explain_parallel_append from partition_prune.sql. It's a test
function, and I also see that we have serial_schedule and
parallel_schedule, which means these sql files can run in any order.
I'm not quite sure whether we can put it in a common test sql file and
use it across other test sql files. AFAICS, I didn't find any function
being used in such a manner. Thoughts?

> +   if (intoclausestr && OidIsValid(objectid))
> +   fpes->objectid = objectid;
> +   else
> +   fpes->objectid = InvalidOid;
> Here the OidIsValid(objectid) check is not required; intoclausestr will be
> set only if objectid is valid.

Removed the OidIsValid check in the latest v16 patch set posted
upthread. Please have a look.
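
With the check gone, the assignment reduces to something like the below
(just a sketch of the simplification, using the names from the quoted
snippet):

fpes->objectid = intoclausestr ? objectid : InvalidOid;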

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com



