RE: Failed transaction statistics to measure the logical replication progress
On Tue, Sep 28, 2021 6:05 PM Amit Kapila wrote:
> On Tue, Sep 28, 2021 at 11:35 AM Masahiko Sawada wrote:
> > On Tue, Sep 28, 2021 at 1:54 PM Amit Kapila wrote:
> > > > Then, if we proceed in this direction, the place to implement those
> > > > stats would be on the LogicalRepWorker struct, instead?
> > >
> > > Or, we can make existing stats persistent and then add these stats on
> > > top of it. Sawada-San, do you have any thoughts on this matter?
> >
> > I think that making existing stats, including received_lsn and
> > last_msg_receipt_time, persistent by using the stats collector could cause
> > massive reporting messages. We can report these messages at a certain
> > interval to reduce the amount of messages, but we will end up seeing old
> > stats on the view.
>
> Can't we keep the current and new stats both in-memory and persist on
> disk? So, the persistent stats data will be used to fill the in-memory
> counters after restarting of workers; otherwise, we will always refer
> to in-memory values.

I think this approach works, but I have another concern about it. The current pg_stat_subscription view is listed under "Dynamic Statistics Views" in the documentation; the data in it is about the worker process, and the view shows what the current worker did. But if we make the new xact stats persistent, then they are no longer what the current worker did; they look more like historic data for the subscription.

Adding a new view seems reasonable, but it would bring yet another subscription-related view, which might be too much. OTOH, I can see there are already some different views[1] including xact stats, so maybe adding another one is acceptable?

[1]
pg_stat_xact_all_tables
pg_stat_xact_sys_tables
pg_stat_xact_user_tables
pg_stat_xact_user_functions

Best regards,
Hou zj
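As a rough illustration of the approach suggested above (live in-memory counters for reads, with the persisted copy used only to re-seed the counters after a worker restart), here is a toy Python sketch. This is not the stats collector API; the class and field names are invented for illustration only.

```python
# Toy model: keep live counters in memory, persist them at some interval,
# and re-fill the in-memory counters from the persisted snapshot when the
# worker restarts. Readers always see the live in-memory values.
import json
import os


class SubscriptionXactStats:
    def __init__(self, path):
        self.path = path
        # Re-seed in-memory counters from the persisted snapshot, if any.
        if os.path.exists(path):
            with open(path) as f:
                self.counters = json.load(f)
        else:
            self.counters = {"commit_count": 0, "abort_count": 0}

    def record_commit(self):
        self.counters["commit_count"] += 1

    def record_abort(self):
        self.counters["abort_count"] += 1

    def persist(self):
        # Called periodically, not on every update, to limit message traffic.
        with open(self.path, "w") as f:
            json.dump(self.counters, f)
```

A "restarted worker" constructing a new SubscriptionXactStats from the same path picks up where the persisted snapshot left off, while ordinary reads never touch the disk copy.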
RE: A reloption for partitioned tables - parallel_workers
> hi,
>
> Here we go, my first patch... solves
> https://www.postgresql.org/message-id/7d6fdc20-857c-4cbe-ae2e-c0ff9520
> e...@www.fastmail.com

Hi,

partitioned_table_reloptions(Datum reloptions, bool validate)
{
+	static const relopt_parse_elt tab[] = {
+		{"parallel_workers", RELOPT_TYPE_INT,
+		offsetof(StdRdOptions, parallel_workers)},
+	};
+
 	return (bytea *) build_reloptions(reloptions, validate,
 									  RELOPT_KIND_PARTITIONED,
-									  0, NULL, 0);
+									  sizeof(StdRdOptions),
+									  tab, lengthof(tab));
}

I noticed that you plan to store parallel_workers in the same struct (StdRdOptions) as a normal heap relation. It seems better to store it in a separate struct. As commit 1bbd608 said:

> Splitting things has the advantage to
> make the information stored in rd_options include only the necessary
> information, reducing the amount of memory used for a relcache entry
> with partitioned tables if new reloptions are introduced at this level.

What do you think?

Best regards,
Houzj
RE: Determine parallel-safety of partition relations for Inserts
Hi,

Attaching v6 patches with the following changes:
* rebased the code on Greg's latest parallel insert patch.
* fixed some code comments.
* added some tests to cover partitioned tables.

Please consider it for further review.

Best regards,
Houzj

Attachments:
v6_0001-guc-option-enable_parallel_dml-src.patch
v6_0002-guc-option-enable_parallel_dml-doc-and-test.patch
v6_0003-reloption-parallel_dml-src.patch.patch
v6_0004-reloption-parallel_dml-test-and-doc.patch
RE: Parallel INSERT (INTO ... SELECT ...)
> > > It did have a performance gain, but I think it's not huge enough to
> > > ignore the extra index cost.
> > > What do you think?
> >
> > Yes... as you suspect, I'm afraid the benefit from parallel bitmap
> > scan may not compensate for the loss of the parallel insert operation.
> >
> > The loss is probably due to 1) more index page splits, 2) more buffer
> > writes (table and index), and 3) internal locks for things such as
> > relation extension and page content protection. To investigate 3), we
> > would want something like [1], which tells us the wait event
> > statistics (wait count and time for each wait event) per session or
> > across the instance, like Oracle, MySQL and EDB provide. I want to
> > continue this in the near future.
>
> What would the result look like if you turn off
> parallel_leader_participation? If the leader is freed from
> reading/writing the table and index, the index page splits and internal
> lock contention may decrease enough to recover part of the loss.
>
> https://www.postgresql.org/docs/devel/parallel-plans.html
>
> "In a parallel bitmap heap scan, one process is chosen as the leader. That
> process performs a scan of one or more indexes and builds a bitmap indicating
> which table blocks need to be visited. These blocks are then divided among
> the cooperating processes as in a parallel sequential scan. In other words,
> the heap scan is performed in parallel, but the underlying index scan is
> not."

I tried disabling parallel_leader_participation:

For max_parallel_workers_per_gather = 4, there is still a performance degradation.

For max_parallel_workers_per_gather = 2, the degradation does not happen in most cases. There is occasionally some noise (a degradation), but most of the results (about 80%) are good.
Best regards,
houzj

postgres=# explain (ANALYZE, BUFFERS, VERBOSE, WAL) insert into testscan select a from x where a<8 or (a%2=0 and a>19990);
                                  QUERY PLAN
-----------------------------------------------------------------------------
 Gather (cost=4272.20..1269111.15 rows=79918 width=0) (actual time=381.764..382.715 rows=0 loops=1)
   Workers Planned: 4
   Workers Launched: 4
   Buffers: shared hit=407094 read=4 dirtied=1085 written=1158
   WAL: records=260498 bytes=17019359
   -> Insert on public.testscan (cost=3272.20..1260119.35 rows=0 width=0) (actual time=378.227..378.229 rows=0 loops=5)
      Buffers: shared hit=407094 read=4 dirtied=1085 written=1158
      WAL: records=260498 bytes=17019359
      Worker 0: actual time=376.638..376.640 rows=0 loops=1
        Buffers: shared hit=88281 dirtied=236 written=337
        WAL: records=56167 bytes=3674994
      Worker 1: actual time=377.889..377.892 rows=0 loops=1
        Buffers: shared hit=81231 dirtied=213 written=99
        WAL: records=52175 bytes=3386885
      Worker 2: actual time=377.388..377.389 rows=0 loops=1
        Buffers: shared hit=82544 dirtied=232 written=279
        WAL: records=52469 bytes=3443843
      Worker 3: actual time=377.733..377.734 rows=0 loops=1
        Buffers: shared hit=87728 dirtied=225 written=182
        WAL: records=56307 bytes=3660758
      -> Parallel Bitmap Heap Scan on public.x (cost=3272.20..1260119.35 rows=19980 width=8) (actual time=5.832..14.787 rows=26000 loops=5)
         Output: x.a, NULL::integer
         Recheck Cond: ((x.a < 8) OR (x.a > 19990))
         Filter: ((x.a < 8) OR (((x.a % 2) = 0) AND (x.a > 19990)))
         Rows Removed by Filter: 1
         Heap Blocks: exact=167
         Buffers: shared hit=1475
         Worker 0: actual time=5.203..14.921 rows=28028 loops=1
           Buffers: shared hit=209
         Worker 1: actual time=5.165..14.403 rows=26039 loops=1
           Buffers: shared hit=196
         Worker 2: actual time=5.188..14.284 rows=26177 loops=1
           Buffers: shared hit=195
         Worker 3: actual time=5.151..14.760 rows=28104 loops=1
           Buffers: shared hit=208
         -> BitmapOr (cost=3272.20..3272.20 rows=174813 width=0) (actual time=8.288..8.290 rows=0 loops=1)
            Buffers: shared hit=500
            -> Bitmap Index Scan on x_a_idx (cost=0.00..1468.38 rows=79441 width=0) (actual time=3.681..3.681 rows=7 loops=1)
               Index Cond: (x.a < 8)
               Buffers: shared hit=222
            -> Bitmap Index Scan on x_a_idx (cost=0.00..1763.86 rows=95372 width=0) (actual time=4.605..4.605 rows=10 loops=1)
               Index Cond: (x.a > 19990)
               Buffers: shared hit=278
 Planning:
   Buffers: shared hit=19
 Planning Time: 0.173 ms
 Execution Time: 382.776 ms
(47 rows)
RE: Parallel INSERT (INTO ... SELECT ...)
> What are the results if we disable the bitmap heap scan (set enable_bitmapscan
> = off)? If that happens to be true, then we might also want to consider
> if in some way we can teach parallel insert to cost more in such cases.
> Another thing we can try is to integrate a parallel-insert patch with the
> patch on another thread [1] and see if that makes any difference, but not
> sure if we want to go there at this stage unless it is simple to try that
> out?

If we disable bitmapscan, the performance degradation does not seem to happen.

[Parallel]
postgres=# explain (ANALYZE, BUFFERS, VERBOSE, WAL) insert into testscan select a from x where a<8 or (a%2=0 and a>19990);
                                  QUERY PLAN
-----------------------------------------------------------------------------
 Gather (cost=1000.00..2090216.68 rows=81338 width=0) (actual time=0.226..5488.455 rows=0 loops=1)
   Workers Planned: 4
   Workers Launched: 4
   Buffers: shared hit=393364 read=1079535 dirtied=984 written=1027
   WAL: records=260400 bytes=16549513
   -> Insert on public.testscan (cost=0.00..2081082.88 rows=0 width=0) (actual time=5483.113..5483.114 rows=0 loops=4)
      Buffers: shared hit=393364 read=1079535 dirtied=984 written=1027
      WAL: records=260400 bytes=16549513
      Worker 0: actual time=5483.116..5483.117 rows=0 loops=1
        Buffers: shared hit=36306 read=264288 dirtied=100 written=49
        WAL: records=23895 bytes=1575860
      Worker 1: actual time=5483.220..5483.222 rows=0 loops=1
        Buffers: shared hit=39750 read=280476 dirtied=101 written=106
        WAL: records=26141 bytes=1685083
      Worker 2: actual time=5482.844..5482.845 rows=0 loops=1
        Buffers: shared hit=38660 read=263713 dirtied=105 written=250
        WAL: records=25318 bytes=1657396
      Worker 3: actual time=5483.272..5483.274 rows=0 loops=1
        Buffers: shared hit=278648 read=271058 dirtied=678 written=622
        WAL: records=185046 bytes=11631174
      -> Parallel Seq Scan on public.x (cost=0.00..2081082.88 rows=20334 width=8) (actual time=4001.641..5287.248 rows=32500 loops=4)
         Output: x.a, NULL::integer
         Filter: ((x.a < 8) OR (((x.a % 2) = 0) AND (x.a > 19990)))
         Rows Removed by Filter: 49967500
         Buffers: shared hit=1551 read=1079531
         Worker 0: actual time=5335.456..5340.757 rows=11924 loops=1
           Buffers: shared hit=281 read=264288
         Worker 1: actual time=5335.559..5341.766 rows=13049 loops=1
           Buffers: shared hit=281 read=280476
         Worker 2: actual time=5335.534..5340.964 rows=12636 loops=1
           Buffers: shared hit=278 read=263712
         Worker 3: actual time=0.015..5125.503 rows=92390 loops=1
           Buffers: shared hit=711 read=271055
 Planning:
   Buffers: shared hit=19
 Planning Time: 0.175 ms
 Execution Time: 5488.493 ms

[Serial]
postgres=# explain (ANALYZE, BUFFERS, VERBOSE, WAL) insert into testscan select a from x where a<8 or (a%2=0 and a>19990);
                                  QUERY PLAN
-----------------------------------------------------------------------------
 Insert on public.testscan (cost=0.00..5081085.52 rows=0 width=0) (actual time=19311.642..19311.643 rows=0 loops=1)
   Buffers: shared hit=392485 read=1079694 dirtied=934 written=933
   WAL: records=260354 bytes=16259841
   -> Seq Scan on public.x (cost=0.00..5081085.52 rows=81338 width=8) (actual time=0.010..18997.317 rows=12 loops=1)
      Output: x.a, NULL::integer
      Filter: ((x.a < 8) OR (((x.a % 2) = 0) AND (x.a > 19990)))
      Rows Removed by Filter: 199870001
      Buffers: shared hit=1391 read=1079691
 Planning:
   Buffers: shared hit=10
 Planning Time: 0.125 ms
 Execution Time: 19311.700 ms

Best regards,
houzj
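For reference, a quick back-of-the-envelope check on the two execution times quoted above (plain Python; the numbers are taken directly from the EXPLAIN ANALYZE output, nothing else is assumed):

```python
# Execution times from the EXPLAIN ANALYZE output above (milliseconds).
serial_ms = 19311.700    # serial seq scan + insert
parallel_ms = 5488.493   # parallel seq scan + insert, 4 workers


def speedup(serial, parallel):
    """Speedup factor of the parallel run over the serial one."""
    return serial / parallel


print(f"{speedup(serial_ms, parallel_ms):.1f}x")  # prints "3.5x"
```

So with the bitmap scan disabled, the parallel insert is about 3.5x faster than serial, which matches the observation that the degradation only shows up in the bitmap-scan case.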
RE: Parallel INSERT (INTO ... SELECT ...)
> > > else if (IsA(node, Query))
> > > {
> > > 	Query *query = (Query *) node;
> > >
> > > 	/* SELECT FOR UPDATE/SHARE must be treated as unsafe */
> > > 	if (query->rowMarks != NULL)
> > > 	{
> > > 		context->max_hazard = PROPARALLEL_UNSAFE;
> > > 		return true;
> > > 	}
> >
> > In my understanding, the max_parallel_hazard() query tree walk is to
> > find constructs that are parallel unsafe in that they call code that
> > can't run in parallel mode. For example, FOR UPDATE/SHARE on
> > traditional heap AM tuples calls AssignTransactionId() which doesn't
> > support being called in parallel mode. Likewise ON CONFLICT ... DO
> > UPDATE calls heap_update() which doesn't support parallelism. I'm not
> > aware of any such hazards downstream of ExecValuesScan().
> >
> > > > You're trying to
> > > > add something that bails based on second-guessing that a parallel
> > > > path would not be chosen, which I find somewhat objectionable.
> > > >
> > > > If the main goal of bailing out is to avoid doing the potentially
> > > > expensive modification safety check on the target relation, maybe
> > > > we should try to somehow make the check less expensive. I
> > > > remember reading somewhere in the thread about caching the result
> > > > of this check in relcache, but haven't closely studied the
> > > > feasibility of doing so.
> > >
> > > There's no "second-guessing" involved here.
> > > There is no underlying way of dividing up the VALUES data of
> > > "INSERT...VALUES" amongst the parallel workers, even if the planner
> > > was updated to produce a parallel plan for the "INSERT...VALUES"
> > > case (apart from the fact that spawning off parallel workers to
> > > insert that data would almost always result in worse performance
> > > than a non-parallel plan...). The division of work for parallel
> > > workers is part of the table AM
> > > (scan) implementation, which is not invoked for "INSERT...VALUES".
> > I don't disagree that the planner would not normally assign a parallel
> > path simply to pull values out of a VALUES list mentioned in the
> > INSERT command, but deciding something based on the certainty of it in
> > an earlier planning phase seems odd to me. Maybe that's just me
> > though.
>
> I think it is more of a case where there is neither a need for parallelism
> nor do we want to support parallelism for it. The other possibility for such
> a check could be at some earlier phase, say in standard_planner [1], where we
> are doing checks for other constructs where we don't want parallelism (I think
> the check for 'parse->hasModifyingCTE' is quite similar). If you see in
> that check as well, we just assume other operations to be in the category
> of parallel-unsafe. I think we should rule out such cases earlier rather
> than later. Do you have better ideas than what Greg has done to avoid
> parallelism for such cases?
>
> [1] -
> standard_planner()
> {
> ..
> 	if ((cursorOptions & CURSOR_OPT_PARALLEL_OK) != 0 &&
> 		IsUnderPostmaster &&
> 		parse->commandType == CMD_SELECT &&
> 		!parse->hasModifyingCTE &&
> 		max_parallel_workers_per_gather > 0 &&
> 		!IsParallelWorker())
> 	{
> 		/* all the cheap tests pass, so scan the query tree */
> 		glob->maxParallelHazard = max_parallel_hazard(parse);
> 		glob->parallelModeOK = (glob->maxParallelHazard != PROPARALLEL_UNSAFE);
> 	}
> 	else
> 	{
> 		/* skip the query tree scan, just assume it's unsafe */
> 		glob->maxParallelHazard = PROPARALLEL_UNSAFE;
> 		glob->parallelModeOK = false;
> 	}

+1. In the current parallel_dml option patch, I put this check and some other high-level checks in a separate function called is_parallel_possible_for_modify.

- * PROPARALLEL_UNSAFE, PROPARALLEL_RESTRICTED, PROPARALLEL_SAFE
+ * Check at a high level if parallel mode is able to be used for the specified
+ * table-modification statement.
+ *
+ * It's not possible in the following cases:
+ *
+ *  1) enable_parallel_dml is off
+ *  2) UPDATE or DELETE command
+ *  3) INSERT...ON CONFLICT...DO UPDATE
+ *  4) INSERT without SELECT on a relation
+ *  5) the reloption parallel_dml_enabled is not set for the target table
+ *
+ * (Note: we don't do in-depth parallel-safety checks here, we do only the
+ * cheaper tests that can quickly exclude obvious cases for which
+ * parallelism isn't supported, to avoid having to do further parallel-safety
+ * checks for these)
 */
+bool
+is_parallel_possible_for_modify(Query *parse)

And I put the function call at an earlier place, like the following:

	if ((cursorOptions & CURSOR_OPT_PARALLEL_OK) != 0 &&
		IsUnderPostmaster &&
		(parse->commandType == CMD_SELECT ||
-		 (enable_parallel_dml &&
-		  IsModifySupportedInParallelMode(parse->commandType))) &&
+		 is_parallel_possible_for_modify(parse)) &&
		!parse->hasModifyingCTE &&
		max_parallel_workers_per_gather > 0 &&
		!IsParallelWorker())

If this looks good,
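The five cheap tests in the comment above can be sketched as a toy decision function. This is plain Python for illustration only, not the actual C code from the patch; the dictionary keys and parameter names are invented stand-ins for the real Query fields, GUC, and reloption.

```python
# Hypothetical model of the cheap, high-level checks described in the
# comment above. Field names loosely mirror PostgreSQL's, but this is
# an illustration, not the patch's implementation.

def is_parallel_possible_for_modify(query, enable_parallel_dml, parallel_dml_enabled):
    """Return False as soon as any cheap test rules parallel DML out."""
    if not enable_parallel_dml:                        # 1) GUC is off
        return False
    if query["commandType"] in ("UPDATE", "DELETE"):   # 2) unsupported commands
        return False
    if query.get("onConflictDoUpdate"):                # 3) INSERT ... ON CONFLICT DO UPDATE
        return False
    if query["commandType"] == "INSERT" and not query.get("hasSelect"):
        return False                                   # 4) INSERT without SELECT
    if not parallel_dml_enabled:                       # 5) table-level reloption off
        return False
    # Cheap tests pass; the expensive in-depth parallel-safety
    # checks would only run after this point.
    return True
```

The point of structuring it this way is exactly the note in the comment: the expensive max_parallel_hazard() walk is skipped whenever any of these constant-time tests fails.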
RE: Parallel INSERT (INTO ... SELECT ...)
> > --
> > postgres=# select pg_size_pretty(pg_indexes_size('testscan_index'));
> >  pg_size_pretty
> > ----------------
> >  4048 kB
> > (1 row)
> >
> > postgres=# select pg_size_pretty(pg_relation_size('testscan_index'));
> >  pg_size_pretty
> > ----------------
> >  4768 kB
> > (1 row)
>
> Which of the above shows the table size? What does pg_indexes_size()
> against an index (testscan_index) return?

Sorry, maybe the table name is a little confusing, but 'testscan_index' is actually a table's name. pg_indexes_size returns the size of the indexes attached to the table, and pg_relation_size returns the table's own size. Did I miss something?

> > IMO, due to the difference of inserts with parallel execution, the
> > btree insert's cost is more than serial.
> >
> > At the same time, the parallel does not have a huge performance gain
> > with bitmapscan, so the extra cost of the btree index will result in
> > performance degradation.
>
> How did you know that the parallelism didn't have a huge performance gain
> with bitmap scan?
>
> [serial]
> -> Bitmap Heap Scan on public.x (cost=3272.20..3652841.26 rows=79918 width=8) (actual time=8.096..41.005 rows=12 loops=1)
>
> [parallel]
> -> Parallel Bitmap Heap Scan on public.x (cost=3272.20..1260119.35 rows=19980 width=8) (actual time=5.832..14.787 rows=26000 loops=5)

I tested the case without the insert (just the query using bitmapscan):

[serial]
postgres=# explain analyze select a from x where a<8 or (a%2=0 and a>19990);
                                  QUERY PLAN
-----------------------------------------------------------------------------
 Bitmap Heap Scan on x (cost=3258.59..3647578.53 rows=81338 width=4) (actual time=8.091..34.222 rows=12 loops=1)
   Recheck Cond: ((a < 8) OR (a > 19990))
   Filter: ((a < 8) OR (((a % 2) = 0) AND (a > 19990)))
   Rows Removed by Filter: 5
   Heap Blocks: exact=975
   -> BitmapOr (cost=3258.59..3258.59 rows=173971 width=0) (actual time=7.964..7.965 rows=0 loops=1)
      -> Bitmap Index Scan on x_a_idx (cost=0.00..1495.11 rows=80872 width=0) (actual time=3.451..3.451 rows=7 loops=1)
         Index Cond: (a < 8)
      -> Bitmap Index Scan on x_a_idx (cost=0.00..1722.81 rows=93099 width=0) (actual time=4.513..4.513 rows=10 loops=1)
         Index Cond: (a > 19990)
 Planning Time: 0.108 ms
 Execution Time: 38.136 ms

[parallel]
postgres=# explain analyze select a from x where a<8 or (a%2=0 and a>19990);
                                  QUERY PLAN
-----------------------------------------------------------------------------
 Gather (cost=4258.59..1266704.42 rows=81338 width=4) (actual time=9.177..22.592 rows=12 loops=1)
   Workers Planned: 4
   Workers Launched: 4
   -> Parallel Bitmap Heap Scan on x (cost=3258.59..1257570.62 rows=20334 width=4) (actual time=6.402..12.882 rows=26000 loops=5)
      Recheck Cond: ((a < 8) OR (a > 19990))
      Filter: ((a < 8) OR (((a % 2) = 0) AND (a > 19990)))
      Rows Removed by Filter: 1
      Heap Blocks: exact=1
      -> BitmapOr (cost=3258.59..3258.59 rows=173971 width=0) (actual time=8.785..8.786 rows=0 loops=1)
         -> Bitmap Index Scan on x_a_idx (cost=0.00..1495.11 rows=80872 width=0) (actual time=3.871..3.871 rows=7 loops=1)
            Index Cond: (a < 8)
         -> Bitmap Index Scan on x_a_idx (cost=0.00..1722.81 rows=93099 width=0) (actual time=4.914..4.914 rows=10 loops=1)
            Index Cond: (a > 19990)
 Planning Time: 0.158 ms
 Execution Time: 26.951 ms
(15 rows)

It did have a performance gain, but I think it's not huge enough to ignore the extra index cost. What do you think?

Best regards,
houzj
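To put a number on "not huge enough": comparing the two execution times above (plain Python, numbers taken directly from the EXPLAIN output, nothing else assumed):

```python
# Execution times of the SELECT-only runs above (milliseconds).
serial_ms = 38.136    # serial bitmap heap scan
parallel_ms = 26.951  # parallel bitmap heap scan, 4 workers


def speedup(serial, parallel):
    """Speedup factor of the parallel run over the serial one."""
    return serial / parallel


print(f"{speedup(serial_ms, parallel_ms):.1f}x")  # prints "1.4x"
```

Only about a 1.4x gain from 4 extra workers on the scan side, which supports the argument that it cannot absorb the extra btree insert cost on the write side.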
RE: Parallel INSERT (INTO ... SELECT ...)
> > Till now, what I found is that:
> > With tang's conf, when doing parallel insert, the walrecord is more
> > than serial insert (IMO, this is the main reason why it has
> > performance degradation). See the attachment for the plan info.
> >
> > I have tried altering the target table to unlogged and then the
> > performance degradation does not happen any more.
> >
> > And the additional walrecord seems related to the index on the target
> > table.
> > If the target table does not have any index, the wal record is the
> > same between the parallel plan and the serial plan.
> > Also, it does not have performance degradation without the index.
>
> [serial]
> Insert on public.testscan (cost=3272.20..3652841.26 rows=0 width=0) (actual time=360.474..360.476 rows=0 loops=1)
>   Buffers: shared hit=392569 read=3 dirtied=934 written=933
>   WAL: records=260354 bytes=16259841
>
> [parallel]
> -> Insert on public.testscan (cost=3272.20..1260119.35 rows=0 width=0) (actual time=378.227..378.229 rows=0 loops=5)
>    Buffers: shared hit=407094 read=4 dirtied=1085 written=1158
>    WAL: records=260498 bytes=17019359
>
> More pages are dirtied and written in the parallel execution. Aren't the
> index and possibly the target table bigger with parallel execution than
> with serial execution? That may be due to the difference of inserts of
> index keys.

Yes, the table size and index size are bigger with parallel execution.

-- table and index's size after parallel insert --
postgres=# select pg_size_pretty(pg_indexes_size('testscan_index'));
 pg_size_pretty
----------------
 4048 kB
(1 row)

postgres=# select pg_size_pretty(pg_relation_size('testscan_index'));
 pg_size_pretty
----------------
 4768 kB
(1 row)

-- table and index's size after serial insert --
postgres=# select pg_size_pretty(pg_indexes_size('testscan_index'));
 pg_size_pretty
----------------
 2864 kB
(1 row)

postgres=# select pg_size_pretty(pg_relation_size('testscan_index'));
 pg_size_pretty
----------------
 4608 kB
(1 row)

To Amit:
> I think you might want to see which exact WAL records are extra by using
> pg_waldump?

Yes, thanks for the hint; I was doing that, and the result is as follows.

The heap WAL records are the same between parallel and serial (12, which is the row count of the query result).

parallel btree walrecords (130500 records):
--
INSERT_LEAF: 129500
INSERT_UPPER: 497
SPLIT_L: 172
SPLIT_R: 328
INSERT_POST: 0
DEDUP: 0
VACUUM: 0
DELETE: 0
MARK_PAGE_HALFDEAD: 0
UNLINK_PAGE: 0
UNLINK_PAGE_META: 0
NEWROOT: 3
REUSE_PAGE: 0
META_CLEANUP: 0
--

serial btree walrecords (130355 records):
--
INSERT_LEAF: 129644
INSERT_UPPER: 354
SPLIT_L: 0
SPLIT_R: 355
INSERT_POST: 0
DEDUP: 0
VACUUM: 0
DELETE: 0
MARK_PAGE_HALFDEAD: 0
UNLINK_PAGE: 0
UNLINK_PAGE_META: 0
NEWROOT: 2
REUSE_PAGE: 0
META_CLEANUP: 0
--

IMO, due to the difference of the inserts with parallel execution, the btree inserts cost more than serial execution. At the same time, parallelism does not have a huge performance gain with bitmapscan, so the extra cost of the btree index results in performance degradation.

Does it make sense?

Best regards,
Houzj
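Tallying the pg_waldump counts above shows that the extra WAL records in the parallel case are exactly accounted for by the additional page splits, and the size figures show how much larger the index ends up. A quick Python check on the numbers quoted above (nothing else assumed):

```python
# B-tree WAL record counts from the pg_waldump summaries above
# (only the non-zero record types matter for the totals).
parallel = {"INSERT_LEAF": 129500, "INSERT_UPPER": 497,
            "SPLIT_L": 172, "SPLIT_R": 328, "NEWROOT": 3}
serial = {"INSERT_LEAF": 129644, "INSERT_UPPER": 354,
          "SPLIT_L": 0, "SPLIT_R": 355, "NEWROOT": 2}


def total(counts):
    return sum(counts.values())


def splits(counts):
    return counts["SPLIT_L"] + counts["SPLIT_R"]


extra_records = total(parallel) - total(serial)   # 130500 - 130355 = 145
extra_splits = splits(parallel) - splits(serial)  # 500 - 355 = 145
print(extra_records, extra_splits)  # prints "145 145": extra records == extra splits

# Index size afterwards: 4048 kB (parallel) vs 2864 kB (serial),
# i.e. the index is roughly 41% larger after the parallel insert.
bloat = 4048 / 2864 - 1
print(f"{bloat:.0%}")  # prints "41%"
```

So the extra WAL is fully explained by the additional page splits, which also explains the index bloat: concurrent workers inserting into the same key ranges force splits that a single serial inserter avoids.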
RE: Parallel INSERT (INTO ... SELECT ...)
> > > postgres=# explain verbose insert into testscan select a from x where
> > > a<8 or (a%2=0 and a>19990);
> > >                          QUERY PLAN
> > > ---------------------------------------------------------------
> > >  Gather (cost=4346.89..1281204.64 rows=81372 width=0)
> > >    Workers Planned: 4
> > >    -> Insert on public.testscan (cost=3346.89..1272067.44 rows=0 width=0)
> > >       -> Parallel Bitmap Heap Scan on public.x1 (cost=3346.89..1272067.44 rows=20343 width=8)
> > >          Output: x1.a, NULL::integer
> > >          Recheck Cond: ((x1.a < 8) OR (x1.a > 19990))
> > >          Filter: ((x1.a < 8) OR (((x1.a % 2) = 0) AND (x1.a > 19990)))
> > >          -> BitmapOr (cost=3346.89..3346.89 rows=178808 width=0)
> > >             -> Bitmap Index Scan on x1_a_idx (cost=0.00..1495.19 rows=80883 width=0)
> > >                Index Cond: (x1.a < 8)
> > >             -> Bitmap Index Scan on x1_a_idx (cost=0.00..1811.01 rows=97925 width=0)
> > >                Index Cond: (x1.a > 19990)
> > >
> > > PSA is my postgresql.conf file, maybe you can have a look. Besides,
> > > I didn't do any parameters tuning in my test session.
> >
> > I reproduced this on my machine.
> >
> > I think we'd better do "analyze" before the insert, which makes this
> > easier to reproduce. Like:
> >
> > -
> > analyze;
> > explain analyze verbose insert into testscan select a from x where
> > a<8 or (a%2=0 and a>19990);
> > -
>
> OK then.
> Can you check if just the underlying SELECTs are run (without INSERT), is
> there any performance degradation when compared to a non-parallel scan?

It seems there is no performance degradation without the insert.

Till now, what I have found is that with tang's conf, when doing a parallel insert, there are more WAL records than with a serial insert (IMO, this is the main reason why it has a performance degradation). See the attachment for the plan info.

I have tried altering the target table to unlogged, and then the performance degradation does not happen any more.

And the additional WAL records seem related to the index on the target table. If the target table does not have any index, the WAL records are the same between the parallel plan and the serial plan. Also, there is no performance degradation without the index.

I am still looking at this problem; if anyone has thoughts about it, I'd be grateful if you could share them.

Best regards,
houzj

postgres=# explain (ANALYZE, BUFFERS, VERBOSE, WAL) insert into testscan select a from x where a<8 or (a%2=0 and a>19990);
                                  QUERY PLAN
-----------------------------------------------------------------------------
 Insert on public.testscan (cost=3272.20..3652841.26 rows=0 width=0) (actual time=360.474..360.476 rows=0 loops=1)
   Buffers: shared hit=392569 read=3 dirtied=934 written=933
   WAL: records=260354 bytes=16259841
   -> Bitmap Heap Scan on public.x (cost=3272.20..3652841.26 rows=79918 width=8) (actual time=8.096..41.005 rows=12 loops=1)
      Output: x.a, NULL::integer
      Recheck Cond: ((x.a < 8) OR (x.a > 19990))
      Filter: ((x.a < 8) OR (((x.a % 2) = 0) AND (x.a > 19990)))
      Rows Removed by Filter: 5
      Heap Blocks: exact=975
      Buffers: shared hit=1475
      -> BitmapOr (cost=3272.20..3272.20 rows=174813 width=0) (actual time=7.975..7.976 rows=0 loops=1)
         Buffers: shared hit=500
         -> Bitmap Index Scan on x_a_idx (cost=0.00..1468.38 rows=79441 width=0) (actual time=3.469..3.470 rows=7 loops=1)
            Index Cond: (x.a < 8)
            Buffers: shared hit=222
         -> Bitmap Index Scan on x_a_idx (cost=0.00..1763.86 rows=95372 width=0) (actual time=4.499..4.499 rows=10 loops=1)
            Index Cond: (x.a > 19990)
            Buffers: shared hit=278
 Planning:
   Buffers: shared hit=10
 Planning Time: 0.126 ms
 Execution Time: 360.547 ms
(22 rows)

postgres=# explain (ANALYZE, BUFFERS, VERBOSE, WAL) insert into testscan select a from x where a<8 or (a%2=0 and a>19990);
                                  QUERY PLAN
-----------------------------------------------------------------------------
 Gather (cost=4272.20..1269111.15 rows=79918 width=0) (actual time=381.764..382.715 rows=0 loops=1)
   Workers Planned: 4
   Workers Launched: 4
   Buffers: shared hit=407094 read=4 dirtied=1085 written=1158
   WAL: records=260498 bytes=17019359
   -> Insert on public.testscan (cost=3272.20..1260119.35 rows=0
RE: Parallel INSERT (INTO ... SELECT ...)
> > Did it actually use a parallel plan in your testing?
> > When I ran these tests with the Parallel INSERT patch applied, it did
> > not naturally choose a parallel plan for any of these cases.
>
> Yes, these cases pick a parallel plan naturally in my test environment.
>
> postgres=# explain verbose insert into testscan select a from x where
> a<8 or (a%2=0 and a>19990);
>                          QUERY PLAN
> ---------------------------------------------------------------
>  Gather (cost=4346.89..1281204.64 rows=81372 width=0)
>    Workers Planned: 4
>    -> Insert on public.testscan (cost=3346.89..1272067.44 rows=0 width=0)
>       -> Parallel Bitmap Heap Scan on public.x1 (cost=3346.89..1272067.44 rows=20343 width=8)
>          Output: x1.a, NULL::integer
>          Recheck Cond: ((x1.a < 8) OR (x1.a > 19990))
>          Filter: ((x1.a < 8) OR (((x1.a % 2) = 0) AND (x1.a > 19990)))
>          -> BitmapOr (cost=3346.89..3346.89 rows=178808 width=0)
>             -> Bitmap Index Scan on x1_a_idx (cost=0.00..1495.19 rows=80883 width=0)
>                Index Cond: (x1.a < 8)
>             -> Bitmap Index Scan on x1_a_idx (cost=0.00..1811.01 rows=97925 width=0)
>                Index Cond: (x1.a > 19990)
>
> PSA is my postgresql.conf file, maybe you can have a look. Besides, I didn't
> do any parameters tuning in my test session.

I reproduced this on my machine.

I think we'd better do "analyze" before the insert, which makes this easier to reproduce. Like:

-
analyze;
explain analyze verbose insert into testscan select a from x where a<8 or (a%2=0 and a>19990);
-

Best regards,
houzj
RE: Parallel INSERT (INTO ... SELECT ...)
> Posting an updated set of patches.

A minor comment about the doc.

+
+Where the above target table features are determined to be, at worst,
+parallel-restricted, rather than parallel-unsafe, at least a parallel table
+scan may be used in the query plan for the INSERT
+statement. For more information about Parallel Safety, see
+.
+

It seems it does not mention that if the target table is a foreign or temporary table, a parallel table scan may be used. So how about:

+
+Where the target table is a foreign/temporary table or the above target table features
+are determined to be, at worst, parallel-restricted, rather than parallel-unsafe,
+at least a parallel table scan may be used in the query plan for the
+INSERT statement. For more information about Parallel Safety,
+see .
+

Best regards,
houzj
RE: Parallel INSERT (INTO ... SELECT ...)
> > > static Query *
> > > rewriteRuleAction(Query *parsetree,
> > > ...
> > > 	if (sub_action_ptr)
> > > +	{
> > > 		*sub_action_ptr = sub_action;
> > > +		rule_action->hasModifyingCTE |= parsetree->hasModifyingCTE;
> > > +	}
> > >
> > > And the basic test passed.
> > > What do you think?
> >
> > That is very close to what I was going to suggest, which is this:
> >
> > diff --git a/src/backend/rewrite/rewriteHandler.c
> > b/src/backend/rewrite/rewriteHandler.c
> > index 0672f497c6..3c4417af98 100644
> > --- a/src/backend/rewrite/rewriteHandler.c
> > +++ b/src/backend/rewrite/rewriteHandler.c
> > @@ -631,6 +631,8 @@ rewriteRuleAction(Query *parsetree,
> > 		checkExprHasSubLink((Node *) rule_action->returningList);
> > 	}
> >
> > +	rule_action->hasModifyingCTE |= parsetree->hasModifyingCTE;
> > +
> > 	return rule_action;
> > }
>
> if (parsetree->cteList != NIL && sub_action->commandType != CMD_UTILITY)
> {
> ...
> 	sub_action->cteList = list_concat(sub_action->cteList,
> }
>
> Is it possible for sub_action to be CMD_UTILITY here?
> In this case the CTE will be copied to the new one; should we set the flag
> in this case?

Sorry, a typo in my words. I meant:

In this case the CTE will not be copied to the new one; should we still set the flag in this case?

Best regards,
houzj
RE: Parallel INSERT (INTO ... SELECT ...)
> > static Query *
> > rewriteRuleAction(Query *parsetree,
> > ...
> > 	if (sub_action_ptr)
> > +	{
> > 		*sub_action_ptr = sub_action;
> > +		rule_action->hasModifyingCTE |= parsetree->hasModifyingCTE;
> > +	}
> >
> > And the basic test passed.
> > What do you think?
>
> That is very close to what I was going to suggest, which is this:
>
> diff --git a/src/backend/rewrite/rewriteHandler.c
> b/src/backend/rewrite/rewriteHandler.c
> index 0672f497c6..3c4417af98 100644
> --- a/src/backend/rewrite/rewriteHandler.c
> +++ b/src/backend/rewrite/rewriteHandler.c
> @@ -631,6 +631,8 @@ rewriteRuleAction(Query *parsetree,
> 		checkExprHasSubLink((Node *) rule_action->returningList);
> 	}
>
> +	rule_action->hasModifyingCTE |= parsetree->hasModifyingCTE;
> +
> 	return rule_action;
> }

if (parsetree->cteList != NIL && sub_action->commandType != CMD_UTILITY)
{
...
	sub_action->cteList = list_concat(sub_action->cteList,
}

Is it possible for sub_action to be CMD_UTILITY here?
In this case the CTE will be copied to the new one; should we set the flag in this case too?

Best regards,
houzj
RE: Parallel INSERT (INTO ... SELECT ...)
> > I took a look into the hasModifyingCTE bugfix recently, and found a > > possible bug case without the parallel insert patch. > > > > - > > drop table if exists test_data1; > > create table test_data1(a int, b int) ; insert into test_data1 select > > generate_series(1,1000), generate_series(1,1000); set > > force_parallel_mode=on; > > > > CREATE TEMP TABLE bug6051 AS > > select i from generate_series(1,3) as i; > > > > SELECT * FROM bug6051; > > CREATE RULE bug6051_ins AS ON INSERT TO bug6051 DO INSTEAD select a as > > i from test_data1; > > > > WITH t1 AS ( DELETE FROM bug6051 RETURNING * ) INSERT INTO bug6051 > > SELECT * FROM t1; > > > > *** > > ***ERROR: cannot assign XIDs during a parallel operation > > *** > > - > > > > I debugged it and it did have modifycte in the parsetree after rewrite. > > I think if we can properly set the hasModifyingCTE, we can avoid this > error by not consider parallel for this. > > > > Thanks. You've identified that the bug exists for SELECT too. I've verified > that the issue is fixed by the bugfix included in the Parallel INSERT patch. > Are you able to review my bugfix? > Since the problem exists for SELECT in the current Postgres code, I'd like > to pull that bugfix out and provide it as a separate fix. > My concern is that there may well be a better way to fix the issue - for > example, during the re-writing, rather than after the query has been > re-written. Hi, I took a look at the fix and have some thoughts on it. (Please correct me if you have tried this idea and found something is wrong) IMO, the main reason for the hasModifyingCTE=false is that: the Rewriter did not update the hasModifyingCTE when copying the existsing 'cteList' to the rewrited one. It seems there is only one place where ctelist will be copied separately. --- static Query * rewriteRuleAction(Query *parsetree, ... 
/* OK, it's safe to combine the CTE lists */ sub_action->cteList = list_concat(sub_action->cteList, copyObject(parsetree->cteList)); + sub_action->hasModifyingCTE |= parsetree->hasModifyingCTE; Based on the above, if we update hasModifyingCTE here, we may solve this problem. And there is another point: the sub_action may not be the final parsetree. If the rule is defined like "DO INSTEAD insert into xx select xx from xx", the rewriter will put the ctelist into a subquery in the parsetree's rtable. In that case, we may also need to update the final parsetree. (I think you know this case; I found the same logic in the latest patch.) static Query * rewriteRuleAction(Query *parsetree, ... if (sub_action_ptr) + { *sub_action_ptr = sub_action; + rule_action->hasModifyingCTE |= parsetree->hasModifyingCTE; + } And the basic test passed. What do you think? Best regards, houzj
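The fix discussed in the message above can be modeled outside the server as a toy sketch (hypothetical names, not PostgreSQL code): when one query's CTE list is merged into another, the hasModifyingCTE flag has to be carried along, otherwise the merged query under-reports its modifying CTEs.

```c
#include <stdbool.h>

/* Toy model of the proposed rewriter fix: merging a CTE list must also
 * propagate hasModifyingCTE, mirroring the suggested
 * "sub_action->hasModifyingCTE |= parsetree->hasModifyingCTE" line. */
typedef struct ToyQuery
{
    int  n_ctes;            /* stand-in for list_length(cteList) */
    bool hasModifyingCTE;
} ToyQuery;

static ToyQuery
merge_cte_lists(ToyQuery dst, ToyQuery src)
{
    dst.n_ctes += src.n_ctes;                   /* list_concat() analogue */
    dst.hasModifyingCTE |= src.hasModifyingCTE; /* the step that was missing */
    return dst;
}
```

Without the `|=` step, merging a query whose CTE list contains a DML statement would leave the flag false, which matches the symptom described in the thread.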
RE: Parallel INSERT (INTO ... SELECT ...)
Hi, I took a look into the hasModifyingCTE bugfix recently, and found a possible bug case without the parallel insert patch. - drop table if exists test_data1; create table test_data1(a int, b int) ; insert into test_data1 select generate_series(1,1000), generate_series(1,1000); set force_parallel_mode=on; CREATE TEMP TABLE bug6051 AS select i from generate_series(1,3) as i; SELECT * FROM bug6051; CREATE RULE bug6051_ins AS ON INSERT TO bug6051 DO INSTEAD select a as i from test_data1; WITH t1 AS ( DELETE FROM bug6051 RETURNING * ) INSERT INTO bug6051 SELECT * FROM t1; *** ***ERROR: cannot assign XIDs during a parallel operation *** - I debugged it, and the parsetree did contain a modifying CTE after rewrite. I think if we set hasModifyingCTE properly, we can avoid this error by not considering a parallel plan in this case. Thoughts? Best regards, houzj
RE: Determine parallel-safety of partition relations for Inserts
> For v3_0003-reloption-parallel_dml-src.patch : > + table_close(rel, NoLock); > Since the rel would always be closed, it seems the return value from > RelationGetParallelDML() can be assigned to a variable, followed by call to > table_close(), then the return statement. Thanks for the comment. Fixed in the latest patch. Best regards, houzj
RE: Determine parallel-safety of partition relations for Inserts
Hi, Attaching v5 patches with the changes: * rebase the code on Greg's latest parallel insert patch * fix some code style issues. Please consider it for further review. Best regards, Houzj v5_0004-reloption-parallel_dml-test-and-doc.patch Description: v5_0004-reloption-parallel_dml-test-and-doc.patch v5_0001-guc-option-enable_parallel_dml-src.patch Description: v5_0001-guc-option-enable_parallel_dml-src.patch v5_0002-guc-option-enable_parallel_dml-doc-and-test.patch Description: v5_0002-guc-option-enable_parallel_dml-doc-and-test.patch v5_0003-reloption-parallel_dml-src.patch.patch Description: v5_0003-reloption-parallel_dml-src.patch.patch
RE: Parallel INSERT (INTO ... SELECT ...)
> > I've had a further look at this, and this walker function is doing a lot > of work recursing the parse tree, and I'm not sure that it reliably retrieves > the information that we're looking for, for all cases of different SQL > queries. Unless it can be made much more efficient and specific to our needs, > I think we should not try to do this optimization, because there's too much > overhead. Also, keep in mind that for the current parallel SELECT > functionality in Postgres, I don't see any similar optimization being > attempted (and such optimization should be attempted at the SELECT level). > So I don't think we should be attempting such optimization in this patch > (but could be attempted in a separate patch, just related to current parallel > SELECT functionality). Yes, agreed. I was worried about the overhead it may bring too; we can remove this from the current patch. Best regards, houzj
RE: Determine parallel-safety of partition relations for Inserts
> > IMO, It seems more readable to extract all the check that we can do > > before the safety-check and put them in the new function. > > > > Please consider it for further review. > > > > I've updated your v2 patches and altered some comments and documentation > changes (but made no code changes) - please compare against your v2 patches, > and see whether you agree with the changes to the wording. > In the documentation, you will also notice that in your V2 patch, it says > that the "parallel_dml_enabled" table option defaults to false. > As it actually defaults to true, I changed that in the documentation too. Hi Greg, Thanks a lot for the documentation update, LGTM. Attaching v4 patches with the changes: * fix some typos in the code. * store the partitioned-table reloptions in a separate struct, PartitionedOptions, making it more standard and easier to extend in the future. Please consider it for further review. Best regards, Houzj v4_0001-guc-option-enable_parallel_dml-src.patch Description: v4_0001-guc-option-enable_parallel_dml-src.patch v4_0002-guc-option-enable_parallel_dml-doc-and-test.patch Description: v4_0002-guc-option-enable_parallel_dml-doc-and-test.patch v4_0003-reloption-parallel_dml-src.patch.patch Description: v4_0003-reloption-parallel_dml-src.patch.patch v4_0004-reloption-parallel_dml-test-and-doc.patch Description: v4_0004-reloption-parallel_dml-test-and-doc.patch
RE: Parallel INSERT (INTO ... SELECT ...)
Hi, When developing the reloption patch, I noticed some issues in the patch. 1). > - Reduce Insert parallel-safety checks required for some SQL, by noting > that the subquery must operate on a relation (check for RTE_RELATION in > subquery range-table) + foreach(lcSub, rte->subquery->rtable) + { + rteSub = lfirst_node(RangeTblEntry, lcSub); + if (rteSub->rtekind == RTE_RELATION) + { + hasSubQueryOnRelation = true; + break; + } + } It seems we cannot just search for RTE_RELATION in rtable, because an RTE_RELATION may exist in other places, like: --- --** explain insert into target select (select * from test); The subplan's subplan. --** with cte as (select * from test) insert into target select * from cte; In the query's ctelist. --- Maybe we should use a walker function [1] to search the subquery and ctelist. 2). +-- +-- Test INSERT into temporary table with underlying query. +-- (should not use a parallel plan) +-- Maybe the comment here needs some changes, since we currently support parallel plans for temp tables. 3) Do you think we can add a testcase for foreign tables, to test a parallel query with serial insert on a foreign table? [1] static bool relation_walker(Node *node) { if (node == NULL) return false; else if (IsA(node, RangeTblEntry)) { RangeTblEntry *rte = (RangeTblEntry *) node; if (rte->rtekind == RTE_RELATION) return true; return false; } else if (IsA(node, Query)) { Query *query = (Query *) node; /* Recurse into subselects */ return query_tree_walker(query, relation_walker, NULL, QTW_EXAMINE_RTES_BEFORE); } /* Recurse to check arguments */ return expression_tree_walker(node, relation_walker, NULL); } Best regards, houzj
RE: Determine parallel-safety of partition relations for Inserts
Hi Greg, Thanks for the review! > Personally, I think a table's "parallel_dml" option should be ON by default. > It's annoying to have to separately enable it for each and every table being > used, when I think the need to turn it selectively OFF should be fairly > rare. Yes, agreed. Changed. > And I'm not sure that "parallel_dml" is the best name for the table option > - because it sort of implies parallel dml WILL be used - but that isn't > true, it depends on other factors too. > So I think (to be consistent with other table option naming) it would have > to be something like "parallel_dml_enabled". Agreed. Changed to parallel_dml_enabled. Attaching v2 patches which address the comments above. A further refactoring: introduce a new function is_parallel_possible_for_modify() which decides whether to do the safety check. IMO, it seems more readable to extract all the checks that we can do before the safety check and put them in the new function. Please consider it for further review. Best regards, houzj v2_0003-reloption-parallel_dml-src.patch Description: v2_0003-reloption-parallel_dml-src.patch v2_0004-reloption-parallel_dml-test-and-doc.patch Description: v2_0004-reloption-parallel_dml-test-and-doc.patch v2_0001-guc-option-enable_parallel_dml-src.patch Description: v2_0001-guc-option-enable_parallel_dml-src.patch v2_0002-guc-option-enable_parallel_dml-doc-and-test.patch Description: v2_0002-guc-option-enable_parallel_dml-doc-and-test.patch
RE: Determine parallel-safety of partition relations for Inserts
> Hi, > > Attaching v1 patches, introducing options which let users manually control > whether to use parallel dml. > > About the patch: > 1. add a new guc option: enable_parallel_dml (boolean) 2. add a new > tableoption: parallel_dml (boolean) > >The default of both is off (false). > > Users can set enable_parallel_dml in postgresql.conf or a session to enable > parallel dml. > If users want some specific tables to use parallel insert, they can set > table.parallel_dml to on. > > Some notes: > (1) > Currently, if the guc option enable_parallel_dml is set to on but a table's > parallel_dml is off, the planner still does not consider parallel for dml. > > In this way, if users want to use parallel dml, they have to set > enable_parallel_dml=on and set parallel_dml=on. > If someone dislikes this, I think we can also let the tableoption parallel_dml's > default value be on, so users only need to set enable_parallel_dml=on > > (2) > For the parallel_dml: > If the target table is partitioned, and its parallel_dml is set to on, the planner > will ignore the option value of its children. > This is because we cannot divide the dml plan across separate tables, so we just > check the target table itself. > > > Thoughts and comments are welcome. The patch is based on the v13 patch (parallel INSERT SELECT) in [1] [1] https://www.postgresql.org/message-id/CAJcOf-ejV8iU%2BYpuV4qbYEY-2vCG1QF2g3Gxn%3DZ%2BPyUH_5f84A%40mail.gmail.com Best regards, houzj
RE: Determine parallel-safety of partition relations for Inserts
Hi, Attaching v1 patches, introducing options which let users manually control whether to use parallel dml. About the patch: 1. add a new guc option: enable_parallel_dml (boolean) 2. add a new tableoption: parallel_dml (boolean) The default of both is off (false). Users can set enable_parallel_dml in postgresql.conf or a session to enable parallel dml. If users want some specific tables to use parallel insert, they can set table.parallel_dml to on. Some notes: (1) Currently, if the guc option enable_parallel_dml is set to on but a table's parallel_dml is off, the planner still does not consider parallel for dml. In this way, if users want to use parallel dml, they have to set enable_parallel_dml=on and set parallel_dml=on. If someone dislikes this, I think we can also let the tableoption parallel_dml's default value be on, so users only need to set enable_parallel_dml=on. (2) For the parallel_dml: if the target table is partitioned, and its parallel_dml is set to on, the planner will ignore the option value of its children. This is because we cannot divide the dml plan across separate tables, so we just check the target table itself. Thoughts and comments are welcome. Best regards, houzj v1_0004-reloption-parallel_dml-test-and-doc.patch Description: v1_0004-reloption-parallel_dml-test-and-doc.patch v1_0001-guc-option-enable_parallel_dml-src.patch Description: v1_0001-guc-option-enable_parallel_dml-src.patch v1_0002-guc-option-enable_parallel_dml-doc-and-test.patch Description: v1_0002-guc-option-enable_parallel_dml-doc-and-test.patch v1_0003-reloption-parallel_dml-src.patch Description: v1_0003-reloption-parallel_dml-src.patch
RE: Determine parallel-safety of partition relations for Inserts
> > Hi > > > > I have an issue about the parameter for DML. > > > > If we define the parameter as a tableoption. > > > > For a partitioned table, if we set the parent table's parallel_dml=on, > > and set one of its partition parallel_dml=off, it seems we can not divide > the plan for the separate table. > > > > For this case, should we just check the parent's parameter ? > > > > I think so. IIUC, this means the Inserts where target table is parent table, > those will just check the option of the parent table and then ignore the > option value for child tables whereas we will consider childrel's option > for Inserts where target table is one of the child table, right? > Yes, I think we can just check the target table itself. Best regards, houzj
RE: Determine parallel-safety of partition relations for Inserts
> From: Amit Kapila > > Good question. I think if we choose to have a separate parameter for > > DML, it can probably a boolean to just indicate whether to enable > > parallel DML for a specified table and use the parallel_workers > > specified in the table used in SELECT. > > Agreed. Hi, I have a question about the parameter for DML, if we define the parameter as a table option. For a partitioned table, if we set the parent table's parallel_dml=on and set one of its partitions' parallel_dml=off, it seems we cannot divide the plan across the separate tables. For this case, should we just check the parent's parameter? Best regards, houzj
RE: fix typo in reorderbuffer.c
> >> Combocid's ==>> Combocids > > > > Ping again, just in case it’s missed. > > There are in the comments "ComboCIDs", "combocids" and "ComboCids" on top > of the existing Combocid's. Your patch introduces a fourth way of writing > that. It may be better to just have one way to govern them all. Hi Michael, Thanks for the review. Actually, "Combocid's" in the patch is the first word of the sentence, so the first character is uppercase (maybe not a new style). I agree that it's better to have one common spelling, but some of the different styles of "combocid" follow variable or function names[1], so we may need to change the variable or function names too. Personally, I prefer "combocids". But do you think we can change that in a separate patch, apart from this typo patch? [1]: void SerializeComboCIDState(Size maxsize, char *start_address) static int usedComboCids = 0; /* number of elements in comboCids */ ... Best regards, houzj
RE: fix typo in reorderbuffer.c
> I found a possible typo in reorderbuffer.c > > * has got a combocid. Combocid's are only valid for the duration > of a > > Combocid's ==>> Combocids > > Attaching a small patch to correct it. > Ping again, just in case it’s missed.
RE: Parallel INSERT (INTO ... SELECT ...)
Hi, When testing the patch with the following kinds of SQL: --- Insert into part_table select 1; Insert into part_table select generate_series(1,1,1); Insert into part_table select * from testfunc(); --- We usually use these statements to initialize a table or for testing purposes. Personally, I think we do not need to do the parallel safety check for these cases, because there seems to be no chance for the SELECT part to be considered for parallelism. I thought we aim to not check the safety unless parallelism is possible, so I was wondering whether it is possible to avoid the check in these cases. I did a quick check of the code; an immature idea is to check whether there is an RTE_RELATION in the query, and if not, skip the safety check. I am not sure whether it is worth doing that. Any thoughts? Best regards, Houzj
RE: Parallel INSERT (INTO ... SELECT ...)
> I think we can allow parallel insert in this case, because the column value > is determined according to the DEFAULT definition of the target table > specified in the INSERT statement. This is described here: > > https://www.postgresql.org/docs/devel/sql-createtable.html > > "Defaults may be specified separately for each partition. But note that > a partition's default value is not applied when inserting a tuple through > a partitioned table." > > So the parallel-unsafe function should not be called. Thanks for the explanation. I think you are right, I did miss it. Best regards, houzj
RE: Parallel INSERT (INTO ... SELECT ...)
Hi, I have a question about the check of column default expressions. + if (command_type == CMD_INSERT) + { + /* +* Column default expressions for columns in the target-list are +* already being checked for parallel-safety in the +* max_parallel_hazard() scan of the query tree in standard_planner(). +*/ + + tupdesc = RelationGetDescr(rel); + for (attnum = 0; attnum < tupdesc->natts; attnum++) IMO, max_parallel_hazard() only checks the parent table's default expressions. But if the table has partitions and a partition has its own default expressions, max_parallel_hazard() does not seem to check those, and we do not seem to check them either. I am not sure whether we should allow parallel insert in this case. Example: - set parallel_setup_cost=0; set parallel_tuple_cost=0; set min_parallel_table_scan_size=0; set max_parallel_workers_per_gather=4; create table origin(a int); insert into origin values(generate_series(1,5000)); create or replace function bdefault_unsafe () returns int language plpgsql parallel unsafe as $$ begin RETURN 5; end $$; create table parttable1 (a int, b name) partition by range (a); create table parttable1_1 partition of parttable1 for values from (0) to (5000); create table parttable1_2 partition of parttable1 for values from (5000) to (1); alter table parttable1_1 ALTER COLUMN b SET DEFAULT bdefault_unsafe(); postgres=# explain insert into parttable1 select * from origin ; QUERY PLAN Gather (cost=0.00..41.92 rows=5865 width=0) Workers Planned: 3 -> Insert on parttable1 (cost=0.00..41.92 rows=0 width=0) -> Parallel Seq Scan on origin (cost=0.00..41.92 rows=1892 width=68) (4 rows) postgres=# explain insert into parttable1_1 select * from origin ; QUERY PLAN --- Insert on parttable1_1 (cost=0.00..1348.00 rows=0 width=0) -> Seq Scan on origin (cost=0.00..1348.00 rows=5000 width=68) (2 rows) - Best regards, houzj
RE: Parallel INSERT (INTO ... SELECT ...)
Hi, When reading the code of rel_max_parallel_hazard_for_modify in 0001, I noticed there are many places that call table_close(), which I personally find a little confusing. Do you think it would be better to do the table_open/close outside of rel_max_parallel_hazard_for_modify? Like: static bool rel_max_parallel_hazard_for_modify(Relation rel, CmdType command_type, max_parallel_hazard_context *context); ... Relation relation = table_open(rte->relid, NoLock); (void) rel_max_parallel_hazard_for_modify(relation, parse->commandType, ); table_close(relation, NoLock); And with the above definition we do not seem to need the lockmode parameter. Best regards, houzj
RE: Parallel INSERT (INTO ... SELECT ...)
Hi, After doing some tests to cover the code paths in patch 0001, I have some suggestions for the 0002 testcases. (1) + /* Check parallel-safety of any expressions in the partition key */ + if (get_partition_col_attnum(pkey, i) == 0) + { + Node *check_expr = (Node *) lfirst(partexprs_item); + + if (max_parallel_hazard_walker(check_expr, context)) + { + table_close(rel, lockmode); + return true; + } The testcases do not seem to cover the above code (testing a table that has a parallel-unsafe expression in the partition key). Personally, I use the following SQL to cover this: - create table partkey_unsafe_key_expr_t (a int4, b name) partition by range ((fullname_parallel_unsafe('',a::varchar))); explain (costs off) insert into partkey_unsafe_key_expr_t select unique1, stringu1 from tenk1; - (2) I noticed that most testcases test all of (parallel safe/unsafe/restricted), but the index expression tests do not seem to cover parallel restricted. How about adding a testcase like: - create or replace function fullname_parallel_restricted(f text, l text) returns text as $$ begin return f || l; end; $$ language plpgsql immutable parallel restricted; create table names4(index int, first_name text, last_name text); create index names4_fullname_idx on names4 (fullname_parallel_restricted(first_name, last_name)); -- -- Test INSERT with parallel-restricted index expression -- (should create a parallel plan) -- explain (costs off) insert into names4 select * from names; - (3) + /* Recursively check each partition ... */ + pdesc = RelationGetPartitionDesc(rel); + for (i = 0; i < pdesc->nparts; i++) + { + if (rel_max_parallel_hazard_for_modify(pdesc->oids[i], + command_type, + context, + AccessShareLock)) + { + table_close(rel, lockmode); + return true; + } + } It seems we do not have a testcase that tests a partition containing a parallel-unsafe expression. How about adding one testcase to test a parallel-unsafe partition? Best regards, houzj
RE: Parallel INSERT (INTO ... SELECT ...)
> > And there seems another solution for this: > > > > In the patch, We only use the { ii_Expressions , ii_NumIndexAttrs , > > ii_IndexAttrNumbers } from the IndexInfo, which seems can get from > "Relation-> rd_index". > > > > Based on above, May be we do not need to call BuildIndexInfo to build > the IndexInfo. > > It can avoid some unnecessary cost. > > And in this solution we do not need to remove expression_planner. > > > > Hmmm, when I debugged my simple test case, I found rel->rd_index was NULL, > so it seems that the call to BuildIndexInfo is needed. > (have I understood your suggestion correctly?) > Hi Greg, Thanks for debugging this. Maybe I missed something, but I am not sure about the case where rel->rd_index is NULL, because the function BuildIndexInfo does not seem to have a NULL check for index->rd_index, like the following: BuildIndexInfo(Relation index) { IndexInfo *ii; Form_pg_index indexStruct = index->rd_index; int i; int numAtts; /* check the number of keys, and copy attr numbers into the IndexInfo */ numAtts = indexStruct->indnatts; And the patch does not have a NULL check for index->rd_index either. So I thought we could assume index->rd_index is not NULL, but it seems I may have missed something. Can you please share the test case with me? I used the following code to replace the call to BuildIndexInfo, and installcheck passed. Example: +Form_pg_index indexStruct = index_rel->rd_index; +List *ii_Expressions = RelationGetIndexExpressions(index_rel); +int ii_NumIndexAttrs = indexStruct->indnatts; +AttrNumber ii_IndexAttrNumbers[INDEX_MAX_KEYS]; +for (int i = 0; i < ii_NumIndexAttrs; i++) +ii_IndexAttrNumbers[i] = indexStruct->indkey.values[i]; Best regards, houzj
RE: Parallel INSERT (INTO ... SELECT ...)
Hi, I took a look at the v12-0001 patch; here are some comments: 1. + /* +* Setup the context used in finding the max parallel-mode hazard. +*/ + Assert(initial_max_parallel_hazard == 0 || + initial_max_parallel_hazard == PROPARALLEL_SAFE || + initial_max_parallel_hazard == PROPARALLEL_RESTRICTED); + context.max_hazard = initial_max_parallel_hazard == 0 ? + PROPARALLEL_SAFE : initial_max_parallel_hazard; I am not quite sure when "max_parallel_hazard == 0" can happen. Does it mean the case where the max_parallel_hazard_context is not initialized? 2. Some tiny code style suggestions: + if (con->contype == CONSTRAINT_CHECK) + { + char *conbin; + Datum val; + bool isnull; + Expr *check_expr; How about: if (con->contype != CONSTRAINT_CHECK) continue; 3. + if (keycol == 0) + { + /* Found an index expression */ + + Node *index_expr; Like 2, how about: if (keycol != 0) continue; 4. + ListCell *index_expr_item = list_head(index_info->ii_Expressions); ... + index_expr = (Node *) lfirst(index_expr_item); + index_expr = (Node *) expression_planner((Expr *) index_expr); It seems BuildIndexInfo has already called eval_const_expressions for ii_Expressions, like the flow: BuildIndexInfo --> RelationGetIndexExpressions --> eval_const_expressions So, IMO, we do not need to call expression_planner for the expression again. And there seems to be another solution for this: in the patch, we only use { ii_Expressions, ii_NumIndexAttrs, ii_IndexAttrNumbers } from the IndexInfo, which it seems we can get from "Relation->rd_index". Based on the above, maybe we do not need to call BuildIndexInfo to build the IndexInfo; that would avoid some unnecessary cost, and with this solution we would not need to remove expression_planner. What do you think? Best regards, houzj
RE: Parallel INSERT (INTO ... SELECT ...)
> > > > + > > + index_oid_list = RelationGetIndexList(rel); > > ... > > > > As mentioned in the comments of RelationGetIndexList: > > * we return a copy of the list palloc'd in the caller's context. The > > caller > > * may list_free() the returned list after scanning it. > > > > Shall we list_free(index_oid_list) at the end of function ? > > Just to avoid potential memory leak. > > > > I think that's a good idea, so I'll make that update in the next version > of the patch. > I do notice, however, that there seems to be quite a few places in the > Postgres > code where RelationGetIndexList() is being called without a corresponding > list_free() being called - obviously instead relying on it being deallocated > when the caller's memory-context is destroyed. > Yes, it will be deallocated when the caller's memory context is destroyed. Currently, the parallel safety check checks each partition, and I am just a little worried that if there are lots of partitions, it may cause high memory use. And there is another place like this: 1. + conbin = TextDatumGetCString(val); + check_expr = stringToNode(conbin); It seems we can free the conbin when it is no longer used (for the same reason as above). What do you think? Best regards, houzj
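To make the ownership contract concrete, here is a standalone sketch in plain C (not PostgreSQL code; all names are made up) of the pattern discussed above: a callee returns a freshly allocated copy, and a caller that loops over many partitions frees each copy promptly instead of letting them accumulate until the surrounding memory context is torn down.

```c
#include <stdlib.h>
#include <string.h>

/* Callee returns a heap-allocated copy that the caller owns, like the
 * palloc'd copy returned by RelationGetIndexList(). */
static int *
get_index_list_copy(const int *src, size_t n)
{
    int *copy = malloc(n * sizeof(int));
    memcpy(copy, src, n * sizeof(int));
    return copy;
}

/* Caller that requests one copy per "partition": freeing inside the loop
 * (the list_free() analogue) keeps peak memory flat even when there are
 * many partitions. */
static long
sum_over_partitions(const int *oids, size_t n, int nparts)
{
    long total = 0;
    for (int p = 0; p < nparts; p++)
    {
        int *list = get_index_list_copy(oids, n);
        for (size_t i = 0; i < n; i++)
            total += list[i];
        free(list);   /* released per iteration, not at context teardown */
    }
    return total;
}
```

The worry raised in the message is exactly the case where the `free` inside the loop is missing: with many partitions, every per-partition copy lives until the memory context is destroyed.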
RE: Parallel INSERT (INTO ... SELECT ...)
> > So I think we're saying that if the target table is a foreign table or > > temporary table, it can be regarded as PARALLEL_RESTRICTED, right? > > > > Yes > > IMO, PARALLEL_RESTRICTED currently enable parallel select but disable > parallel insert. > So, the INSERT only happen in leader worker which seems safe to insert into > temporary/foreign table. > > In addition, there are some other restriction about parallel select which > seems can be removed: > > 1.- Target table has a parallel-unsafe trigger, index expression, column > default > expression or check constraint > 2.- Target table is a partitioned table with a parallel-unsafe partition > key > expression or support function > > If the Insert's target table is the type listed above, Is there some reasons > why we can not support parallel select ? > It seems only leader worker will execute the trigger and key-expression > which seems safe. > (If I miss something about it, please let me know) So sorry, please ignore the above; I was thinking of something wrong. Best regards, houzj
RE: Parallel INSERT (INTO ... SELECT ...)
> > > Hi > > > > > > > > I may be wrong, and if I miss sth in previous mails, please give > > > > > me some > > > > hints. > > > > > IMO, serial insertion with underlying parallel SELECT can be > > > > > considered for foreign table or temporary table, as the > > > > > insertions only > > > > happened in the leader process. > > > > > > > > > > > > > I don't think we support parallel scan for temporary tables. Can > > > > you please try once both of these operations without Insert being > > > > involved? If you are able to produce a parallel plan without > > > > Insert then we can see why it is not supported with Insert. > > > > > > Sorry, may be I did not express it clearly, I actually means the case > when insert's target(not in select part) table is temporary. > > > And you are right that parallel select is not enabled when temporary > table is in select part. > > > > > > > I think Select can be parallel for this case and we should support this > case. > > > > So I think we're saying that if the target table is a foreign table or > temporary table, it can be regarded as PARALLEL_RESTRICTED, right? > Yes. IMO, PARALLEL_RESTRICTED currently enables parallel select but disables parallel insert, so the INSERT only happens in the leader worker, which seems safe for inserting into a temporary/foreign table. In addition, there are some other restrictions on parallel select which it seems could be removed: 1.- Target table has a parallel-unsafe trigger, index expression, column default expression or check constraint 2.- Target table is a partitioned table with a parallel-unsafe partition key expression or support function If the INSERT's target table is one of the types listed above, is there some reason why we cannot support parallel select? It seems only the leader worker will execute the trigger and key expression, which seems safe. (If I am missing something, please let me know.) Best regards, houzj
RE: [PATCH] postgres_fdw connection caching - cause remote sessions linger till the local session exit
> > > Attaching v15 patch set. Please consider it for further review. > > > > Hi > > > > I have some comments for the 0001 patch > > > > In v15-0001-postgres_fdw-function-to-discard-cached-connecti > > > > 1. > > + If there is no open connection to the given foreign server, > false > > + is returned. If no foreign server with the given name is found, > > + an error > > > > Do you think it's better to add some testcases about: > > calling postgres_fdw_disconnect and postgres_fdw_disconnect_all > > when there is no open connection to the given foreign server > > Do you mean a test case where foreign server exists but > postgres_fdw_disconnect() returns false because there's no connection for > that server? Yes, I read this from the doc, so I think it's better to test this. > > 2. > > + /* > > +* For the given server, if we closed connection > or it is still in > > +* use, then no need of scanning the cache > further. > > +*/ > > + if (entry->server_hashvalue == hashvalue && > > + (entry->xact_depth > 0 || result)) > > + { > > + hash_seq_term(); > > + break; > > + } > > > > If I am not wrong, is the following condition always true ? > > (entry->xact_depth > 0 || result) > > It's not always true. But it seems like it's too confusing, please have > a look at the upthread suggestion to change this with can_terminate_scan > boolean. Thanks for the reminder, I will look at that. Best regards, houzj
RE: [PATCH] postgres_fdw connection caching - cause remote sessions linger till the local session exit
> Attaching v15 patch set. Please consider it for further review. Hi, I have some comments for the 0001 patch. In v15-0001-postgres_fdw-function-to-discard-cached-connecti 1. + If there is no open connection to the given foreign server, false + is returned. If no foreign server with the given name is found, an error Do you think it's better to add some testcases about: calling postgres_fdw_disconnect and postgres_fdw_disconnect_all when there is no open connection to the given foreign server 2. + /* +* For the given server, if we closed connection or it is still in +* use, then no need of scanning the cache further. +*/ + if (entry->server_hashvalue == hashvalue && + (entry->xact_depth > 0 || result)) + { + hash_seq_term(); + break; + } If I am not wrong, is the following condition always true ? (entry->xact_depth > 0 || result) Best regards, houzj
RE: Parallel INSERT (INTO ... SELECT ...)
> > Thanks for the feedback. > Posting an updated set of patches. Changes are based on feedback, as detailed > below: Hi, It seems there are some previous comments[1][2] not addressed in the current patch. Just to make sure they are not missed. [1] https://www.postgresql.org/message-id/77e1c06ffb2240838e5fc94ec8dcb7d3%40G08CNEXMBPEKD05.g08.fujitsu.local [2] https://www.postgresql.org/message-id/CAA4eK1LMmz58ej5BgVLJ8VsUGd%3D%2BKcaA8X%3DkStORhxpfpODOxg%40mail.gmail.com Best regards, houzj
RE: Add Nullif case for eval_const_expressions_mutator
Hi, Thanks for the review. > It's a bit unfortunate now that between OpExpr, DistinctExpr, NullIfExpr, > and to a lesser extent ScalarArrayOpExpr we will now have several different > implementations of nearly the same thing, without any explanation why one > approach was chosen here and another there. We should at least document > this. I am not quite sure where to document the difference. For now, I tried to add some comments for the NullIf case to explain why this one is different. + /* +* Since NullIf is not common enough to deserve specially +* optimized code, use ece_generic_processing and +* ece_evaluate_expr to simplify the code as much as possible. +*/ Any suggestions? > Some inconsistencies I found: The code for DistinctExpr calls > expression_tree_mutator() directly, but your code for NullIfExpr calls > ece_generic_processing(), even though the explanation in the comment for > DistinctExpr would apply there as well. > > Your code for NullIfExpr doesn't appear to call set_opfuncid() anywhere. IMO, set_opfuncid will be called in the function ece_evaluate_expr, via the following flow: ece_evaluate_expr --> evaluate_expr --> fix_opfuncids --> fix_opfuncids_walker --> set_opfuncid And we do not need the opfuncid until we call ece_evaluate_expr. So, to simplify the code as much as possible, I did not call set_opfuncid in eval_const_expressions_mutator again. > I would move your new block for NullIfExpr after the block for DistinctExpr. > That's the order in which these blocks appear elsewhere for generic node > processing. > Changed. > Check your whitespace usage: > > if(!has_nonconst_input) > > should have a space after the "if". (It's easy to fix of course, but I > figure I'd point it out here since you have submitted several patches with > this style, so it's perhaps a habit to break.) Changed.
> Perhaps add a comment to the tests like this so it's clear what they are > for: > > diff --git a/src/test/regress/sql/case.sql > b/src/test/regress/sql/case.sql index 4742e1d0e0..98e3fb8de5 100644 > --- a/src/test/regress/sql/case.sql > +++ b/src/test/regress/sql/case.sql > @@ -137,6 +137,7 @@ CREATE TABLE CASE2_TBL ( > FROM CASE_TBL a, CASE2_TBL b > WHERE COALESCE(f,b.i) = 2; > > +-- Tests for constant subexpression simplification > explain (costs off) > SELECT * FROM CASE_TBL WHERE NULLIF(1, 2) = 2; Added. Attaching the v3 patch; please consider it for further review. Best regards, houzj v3_0001-add-nullif-case-for-eval_const_expressions.patch Description: v3_0001-add-nullif-case-for-eval_const_expressions.patch
RE: [PATCH] postgres_fdw connection caching - cause remote sessions linger till the local session exit
> >>> +1 to add it after "dropped (Note )", how about as follows > >>> with slight changes? > >>> > >>> dropped (Note that server name of an invalid connection can be NULL > >>> if the server is dropped), and then such . > >> > >> Yes, I like this one. One question is; "should" or "is" is better > >> than "can" in this case because the server name of invalid connection > >> is always NULL when its server is dropped? > > > > I think "dropped (Note that server name of an invalid connection will > > be NULL if the server is dropped), and then such ." > > Sounds good to me. So patch attached. +1 Best regards, houzj
RE: [PATCH] postgres_fdw connection caching - cause remote sessions linger till the local session exit
> We need to create the loopback3 with user mapping public, otherwise the > test might become unstable as shown below. Note that loopback and > loopback2 are not dropped in the test, so no problem with them. > > ALTER SERVER loopback OPTIONS (ADD use_remote_estimate 'off'); DROP > SERVER loopback3 CASCADE; > NOTICE: drop cascades to 2 other objects > -DETAIL: drop cascades to user mapping for postgres on server loopback3 > +DETAIL: drop cascades to user mapping for bharath on server loopback3 > > Attaching v2 patch for postgres_fdw_get_connections. Please have a look. Hi I have a comment on the doc for postgres_fdw_get_connections. +postgres_fdw_get_connections(OUT server_name text, OUT valid boolean) returns setof record + + + This function returns the foreign server names of all the open + connections that postgres_fdw established from + the local session to the foreign servers. It also returns whether + each connection is valid or not. false is returned + if the foreign server connection is used in the current local + transaction but its foreign server or user mapping is changed or + dropped, and then such invalid connection will be closed at + the end of that transaction. true is returned + otherwise. If there are no open connections, no record is returned. + Example usage of the function: The doc does not seem to mention the case where the function returns NULL in server_name. Users may be a little confused about why NULL was returned. Best regards, houzj
RE: Logical Replication - behavior of ALTER PUBLICATION .. DROP TABLE and ALTER SUBSCRIPTION .. REFRESH PUBLICATION
> On Thu, Jan 14, 2021 at 5:36 PM Li Japin wrote > > Do we really need to access PUBLICATIONRELMAP in this patch? What if > > we just set it to false in the else condition of (if (publish && > > (relkind != RELKIND_PARTITIONED_TABLE || pub->pubviaroot))) > > > > Thank for you review. I agree with you, it doesn’t need to access > > PUBLICATIONRELMAP, since We already get the publication oid in > > GetRelationPublications(relid) [1], which also access > > PUBLICATIONRELMAP. If the current pub->oid does not in pubids, the > publish is false, so we do not need publish the table. > > +1. This is enough. > > > I have another question, the data->publications is a list, when did it > has more then one items? > > IIUC, when the single table is associated with multiple publications, then > data->publications will have multiple entries. Though I have not tried, > we can try having two or three publications for the same table and verify > that. > > > 0001 - change as you suggest. > > 0002 - does not change. Hi In the 0001 patch, the comment says " Relation is not associated with the publication anymore ". That comment was accurate when is_relation_part_of_publication() was checked in the previous version of the patch. But now that the code is in the else branch, it covers not only the case in the comment (the relation was dropped) but also other cases, such as partitioned tables. Do you think we need to fix the comment a little to make it accurate? It also seems we could add a "continue;" in the else branch, though of course that is not strictly necessary. Best regards, houzj
fix typo in reorderbuffer.c
Hi I found a possible typo in reorderbuffer.c * has got a combocid. Combocid's are only valid for the duration of a Combocid's ==>> Combocids Attaching a small patch to correct it. Best regards, houzj 0001-fix-typo.patch Description: 0001-fix-typo.patch
RE: remove unneeded pstrdup in fetch_table_list
> >>> Your observation seems correct to me, though I have not tried to > >>> test your patch. > >> > >> +1 to apply, this looks correct and passes tests. Scanning around I > >> +didn't see > >> any other occurrences of the same pattern. > > > > Thanks. I am thinking to backpatch this even though there is no > > problem reported from any production system. What do you think? > > No objections from me. +1 Best regards, houzj
RE: Single transaction in the tablesync worker?
> I am not exactly sure of the concern. (If the extra info below does not > help can you please describe your concern with more details). > > This [v14] patch code/feature is only referring to the immediate stopping > of only the *** "tablesync" *** worker (if any) for any/each table being > removed from the subscription. It has nothing to say about the "apply" worker > of the subscription, which continues replicating as before. > > OTOH, I think the other mail problem is not really related to the "tablesync" > workers. As you can see (e.g. steps 7,8,9,10 of [2]), that problem is > described as continuing over multiple transactions to replicate unexpected > rows - I think this could only be done by the subscription "apply" worker, > and is after the "tablesync" worker has gone away. > > So AFAIK these are 2 quite unrelated problems, and would be solved > independently. > > It just happens that they are both exposed using ALTER SUBSCRIPTION ... > REFRESH PUBLICATION; So sorry for the confusion, you are right that these are 2 quite unrelated problems. I misunderstood the 'stop the worker' here. + /* Immediately stop the worker. */ + logicalrep_worker_stop_at_commit(subid, relid); /* prevent re-launching */ + logicalrep_worker_stop(subid, relid); /* stop immediately */ Do you think we could add some comments describing which type of worker (the sync worker) is stopped here? And should we add some more comments explaining the reason for the "immediate stop"? That might make the code easier to understand. Best regards, Houzj
remove unneeded pstrdup in fetch_table_list
Hi In the function fetch_table_list, we get the table names from the publisher and return a list of table names. When appending a name to the list, the following code is used: ** nspname = TextDatumGetCString(slot_getattr(slot, 1, )); Assert(!isnull); relname = TextDatumGetCString(slot_getattr(slot, 2, )); rv = makeRangeVar(pstrdup(nspname), pstrdup(relname), -1); tablelist = lappend(tablelist, rv); ** Here nspname and relname are copied again, which seems unnecessary, because they were obtained from TextDatumGetCString, which already returns a newly palloc'd string: ** result = (char *) palloc(len + 1); memcpy(result, VARDATA_ANY(tunpacked), len); result[len] = '\0'; if (tunpacked != t) pfree(tunpacked); return result; ** The extra copies may hurt when a lot of tables are replicated, so I tried to fix this. Best regards, houzj 0001-remove-unneeded-pstrdup.patch Description: 0001-remove-unneeded-pstrdup.patch
RE: Single transaction in the tablesync worker?
> Also PSA some detailed logging evidence of some test scenarios involving > Drop/AlterSubscription: > + Test-20210112-AlterSubscriptionRefresh-ok.txt = > AlterSubscription_refresh which successfully drops a tablesync slot > + Test-20210112-AlterSubscriptionRefresh-warning.txt = > AlterSubscription_refresh gives WARNING that it cannot drop the tablesync > slot (which no longer exists) > + Test-20210112-DropSubscription-warning.txt = DropSubscription with a > disassociated slot_name gives a WARNING that it cannot drop the tablesync > slot (due to broken connection) Hi > * The AlterSubscription_refresh (v14+) is now more similar to > DropSubscription w.r.t to stopping workers for any "removed" tables. I have a question about the above feature. With the patch, it seems the worker is not stopped in the case of [1]. I probably missed something; should we stop the worker in such a case? [1] https://www.postgresql.org/message-id/CALj2ACV%2B0UFpcZs5czYgBpujM9p0Hg1qdOZai_43OU7bqHU_xw%40mail.gmail.com Best regards, houzj
RE: Add Nullif case for eval_const_expressions_mutator
> > I think this patch should be about a tenth the size. Try modeling it > > on the T_SubscriptingRef-etc case, ie, use ece_generic_processing and > > then ece_evaluate_expr to cover the generic cases. OpExpr is common > > enough to deserve specially optimized code, but NullIf isn't, so shorter > is better. > > Thanks for the review. > > Attaching v2 patch , which followed the suggestion to use > ece_generic_processing and ece_evaluate_expr to simplify the code. > > Please have a check. Sorry, I found the code could be simplified further. Attaching a new patch. Best regards, houzj v2_1-0001-add-nullif-case-for-eval_const_expressions.patch Description: v2_1-0001-add-nullif-case-for-eval_const_expressions.patch
RE: Add Nullif case for eval_const_expressions_mutator
> > I notice that there are no Nullif case in eval_const_expression. > > Since Nullif is similar to Opexpr and is easy to implement, I try add > > this case in eval_const_expressions_mutator. > > I think this patch should be about a tenth the size. Try modeling it on > the T_SubscriptingRef-etc case, ie, use ece_generic_processing and then > ece_evaluate_expr to cover the generic cases. OpExpr is common enough to > deserve specially optimized code, but NullIf isn't, so shorter is better. Thanks for the review. Attaching the v2 patch, which follows the suggestion to use ece_generic_processing and ece_evaluate_expr to simplify the code. Please take a look. Best regards, houzj v2-0001-add-nullif-case-for-eval_const_expressions.patch Description: v2-0001-add-nullif-case-for-eval_const_expressions.patch
RE: Parallel Inserts in CREATE TABLE AS
> Attaching v21 patch set, which has following changes: > 1) 0001 - changed fpes->ins_cmd_type == > PARALLEL_INSERT_CMD_CREATE_TABLE_AS to fpes->ins_cmd_type != > PARALLEL_INSERT_CMD_UNDEF > 2) 0002 - reworded the commit message. > 3) 0003 - added cmin, xmin test case to one of the parallel insert cases > to ensure leader and worker insert the tuples in the same xact and replaced > memory usage output in numbers like 25kB to NkB to make the tests stable. > 4) 0004 - updated one of the test output to be in NkB and made the assertion > in SetParallelInsertState to be not under an if condition. > > There's one open point [1] on selective skipping of error "cannot insert > tuples in a parallel worker" in heap_prepare_insert(), thoughts are > welcome. > > Please consider the v21 patch set for further review. Hi, I took a look into the new patch and have some comments. 1. + /* +* Do not consider tuple cost in case of we intend to perform parallel +* inserts by workers. We would have turned on the ignore flag in +* apply_scanjoin_target_to_paths before generating Gather path for the +* upper level SELECT part of the query. +*/ + if ((root->parse->parallelInsCmdTupleCostOpt & +PARALLEL_INSERT_SELECT_QUERY) && + (root->parse->parallelInsCmdTupleCostOpt & +PARALLEL_INSERT_CAN_IGN_TUP_COST)) Can we just check PARALLEL_INSERT_CAN_IGN_TUP_COST here ? IMO, PARALLEL_INSERT_CAN_IGN_TUP_COST will be set only when PARALLEL_INSERT_SELECT_QUERY is set. 2. +static void +ParallelInsCmdEstimate(ParallelContext *pcxt, ParallelInsertCmdKind ins_cmd, + void *ins_info) ... + info = (ParallelInsertCTASInfo *) ins_info; + intoclause_str = nodeToString(info->intoclause); + intoclause_len = strlen(intoclause_str) + 1; +static void +SaveParallelInsCmdInfo(ParallelContext *pcxt, ParallelInsertCmdKind ins_cmd, + void *ins_info) ... 
+ info = (ParallelInsertCTASInfo *)ins_info; + intoclause_str = nodeToString(info->intoclause); + intoclause_len = strlen(intoclause_str) + 1; + intoclause_space = shm_toc_allocate(pcxt->toc, intoclause_len); I noticed the above code calls nodeToString and strlen twice, which seems unnecessary. Do you think it's better to compute the results of nodeToString and strlen once and pass them where they are used? 3. + if (node->need_to_scan_locally || node->nworkers_launched == 0) + { + EState *estate = node->ps.state; + TupleTableSlot *outerTupleSlot; + + for(;;) + { + /* Install our DSA area while executing the plan. */ + estate->es_query_dsa = + node->pei ? node->pei->area : NULL; ... + node->ps.state->es_processed++; + } How about using the local variable estate, i.e. 'estate->es_processed++;', instead of node->ps.state->es_processed++; Best regards, houzj
RE: Single transaction in the tablesync worker?
> PSA the v12 patch for the Tablesync Solution1. > > Differences from v11: > + Added PG docs to mention the tablesync slot > + Refactored tablesync slot drop (done by > DropSubscription/process_syncing_tables_for_sync) > + Fixed PG docs mentioning wrong state code > + Fixed wrong code comment describing TCOPYDONE state > Hi I looked into the new patch and have some comments. 1. + /* Setup replication origin tracking. */ ①+ originid = replorigin_by_name(originname, true); + if (!OidIsValid(originid)) + { ②+ originid = replorigin_by_name(originname, true); + if (originid != InvalidRepOriginId) + { These are two different code styles for checking whether originid is valid. Both are fine, but do you think it's better to use the same style in both places? 2. * state to SYNCDONE. There might be zero changes applied between * CATCHUP and SYNCDONE, because the sync worker might be ahead of the * apply worker. + *- The sync worker has a intermediary state TCOPYDONE which comes after + * DATASYNC and before SYNCWAIT. This state indicates that the initial This comment about TCOPYDONE would be better placed at [1]*, which is between DATASYNC and SYNCWAIT. * - Tablesync worker starts; changes table state from INIT to DATASYNC while * copying. [1]* * - Tablesync worker finishes the copy and sets table state to SYNCWAIT; * waits for state change. 3. + /* +* To build a slot name for the sync work, we are limited to NAMEDATALEN - +* 1 characters. +* +* The name is calculated as pg_%u_sync_%u (3 + 10 + 6 + 10 + '\0'). (It's +* actually the NAMEDATALEN on the remote that matters, but this scheme +* will also work reasonably if that is different.) +*/ + StaticAssertStmt(NAMEDATALEN >= 32, "NAMEDATALEN too small"); /* for sanity */ + + syncslotname = psprintf("pg_%u_sync_%u", suboid, relid); The comment says syncslotname is limited to NAMEDATALEN - 1 characters, but its actual size is (3 + 10 + 6 + 10 + '\0') = 30, which is not NAMEDATALEN - 1. Should we change the comment here? 
Best regards, houzj
RE: Single transaction in the tablesync worker?
> PSA the v11 patch for the Tablesync Solution1. > > Difference from v10: > - Addresses several recent review comments. > - pg_indent has been run > Hi I took a look at the patch and have some comments. 1. * So the state progression is always: INIT -> DATASYNC -> SYNCWAIT -> - * CATCHUP -> SYNCDONE -> READY. + * CATCHUP -> (sync worker TCOPYDONE) -> SYNCDONE -> READY. I noticed the new state TCOPYDONE is documented between CATCHUP and SYNCDONE, but it seems SUBREL_STATE_TCOPYDONE is actually set before SUBREL_STATE_SYNCWAIT [1]. Did I miss something here? [1]- + UpdateSubscriptionRelState(MyLogicalRepWorker->subid, + MyLogicalRepWorker->relid, + SUBREL_STATE_TCOPYDONE, + MyLogicalRepWorker->relstate_lsn); ... /* * We are done with the initial data synchronization, update the state. */ SpinLockAcquire(>relmutex); MyLogicalRepWorker->relstate = SUBREL_STATE_SYNCWAIT; -- 2. i = initialize, d = data is being copied, + C = table data has been copied, s = synchronized, r = ready (normal replication) +#define SUBREL_STATE_TCOPYDONE 't' /* tablesync copy phase is completed +* (sublsn NULL) */ The character representing 'data has been copied' in the catalog seems different from the macro definition. Best regards, houzj
RE: Parallel Inserts in CREATE TABLE AS
> > > > + /* Okay to parallelize inserts, so mark it. */ > > + if (ins_cmd == PARALLEL_INSERT_CMD_CREATE_TABLE_AS) > > + ((DR_intorel *) dest)->is_parallel = true; > > > > + if (ins_cmd == PARALLEL_INSERT_CMD_CREATE_TABLE_AS) > > + ((DR_intorel *) dest)->is_parallel = false; > > We need to know exactly what is the command in above place, to dereference > and mark is_parallel to true, because is_parallel is being added to the > respective structures, not to the generic _DestReceiver structure. So, in > future the above code becomes something like below: > > +/* Okay to parallelize inserts, so mark it. */ > +if (ins_cmd == PARALLEL_INSERT_CMD_CREATE_TABLE_AS) > +((DR_intorel *) dest)->is_parallel = true; > +else if (ins_cmd == PARALLEL_INSERT_CMD_REFRESH_MAT_VIEW) > +((DR_transientrel *) dest)->is_parallel = true; > +else if (ins_cmd == PARALLEL_INSERT_CMD_COPY_TO) > +((DR_copy *) dest)->is_parallel = true; > > In the below place, instead of new function, I think we can just have > something like if (fpes->ins_cmd_type != PARALLEL_INSERT_CMD_UNDEF) > > > Or > > > > + if (fpes->ins_cmd_type == PARALLEL_INSERT_CMD_CREATE_TABLE_AS) > > + pg_atomic_add_fetch_u64(>processed, > > + queryDesc->estate->es_processed); > > > > If you think the above code will extend the ins_cmd type check in the > future, the generic function may make sense. > > We can also change below to fpes->ins_cmd_type != > PARALLEL_INSERT_CMD_UNDEF. > > +if (fpes->ins_cmd_type == PARALLEL_INSERT_CMD_CREATE_TABLE_AS) > +receiver = ExecParallelGetInsReceiver(toc, fpes); > > If okay, I will modify it in the next version of the patch. Yes, that looks good to me. Best regards, houzj
RE: Parallel Inserts in CREATE TABLE AS
> > I think it makes sense. > > > > And if the check about ' ins_cmd == xxx1 || ins_cmd == xxx2' may be > > used in some places, How about define a generic function with some comment > to mention the purpose. > > > > An example in INSERT INTO SELECT patch: > > +/* > > + * IsModifySupportedInParallelMode > > + * > > + * Indicates whether execution of the specified table-modification > > +command > > + * (INSERT/UPDATE/DELETE) in parallel-mode is supported, subject to > > +certain > > + * parallel-safety conditions. > > + */ > > +static inline bool > > +IsModifySupportedInParallelMode(CmdType commandType) { > > + /* Currently only INSERT is supported */ > > + return (commandType == CMD_INSERT); } > > The intention of assert is to verify that those functions are called for > appropriate commands such as CTAS, Refresh Mat View and so on with correct > parameters. I really don't think so we can replace the assert with a function > like above, in the release mode assertion will always be true. In a way, > that assertion is for only debugging purposes. And I also think that when > we as the callers know when to call those new functions, we can even remove > the assertions, if they are really a problem here. Thoughts? Hi Thanks for the explanation. If the check on the command type is only used in assertions, I think you are right. I suggested a new function because I guessed the check could be used in some other places, such as: + /* Okay to parallelize inserts, so mark it. */ + if (ins_cmd == PARALLEL_INSERT_CMD_CREATE_TABLE_AS) + ((DR_intorel *) dest)->is_parallel = true; + if (ins_cmd == PARALLEL_INSERT_CMD_CREATE_TABLE_AS) + ((DR_intorel *) dest)->is_parallel = false; Or + if (fpes->ins_cmd_type == PARALLEL_INSERT_CMD_CREATE_TABLE_AS) + pg_atomic_add_fetch_u64(>processed, queryDesc->estate->es_processed); If you think the above code will extend the ins_cmd type checks in the future, the generic function may make sense. Best regards, houzj
RE: Parallel Inserts in CREATE TABLE AS
> > For v20-0001-Parallel-Inserts-in-CREATE-TABLE-AS.patch : > > > > ParallelInsCmdEstimate : > > > > + Assert(pcxt && ins_info && > > + (ins_cmd == PARALLEL_INSERT_CMD_CREATE_TABLE_AS)); > > + > > + if (ins_cmd == PARALLEL_INSERT_CMD_CREATE_TABLE_AS) > > > > Sinc the if condition is covered by the assertion, I wonder why the if > check is still needed. > > > > Similar comment for SaveParallelInsCmdFixedInfo and > > SaveParallelInsCmdInfo > > Thanks. > > The idea is to have assertion with all the expected ins_cmd types, and then > later to have selective handling for different ins_cmds. For example, if > we add (in future) parallel insertion in Refresh Materialized View, then > the code in those functions will be something > like: > > +static void > +ParallelInsCmdEstimate(ParallelContext *pcxt, ParallelInsertCmdKind > ins_cmd, > + void *ins_info) > +{ > +Assert(pcxt && ins_info && > + (ins_cmd == PARALLEL_INSERT_CMD_CREATE_TABLE_AS || > +(ins_cmd == PARALLEL_INSERT_CMD_REFRESH_MAT_VIEW)); > + > +if (ins_cmd == PARALLEL_INSERT_CMD_CREATE_TABLE_AS) > +{ > + > +} > + else if (ns_cmd == PARALLEL_INSERT_CMD_REFRESH_MAT_VIEW) > + { > + > + } > > Similarly for other functions as well. I think it makes sense. And if the check 'ins_cmd == xxx1 || ins_cmd == xxx2' may be used in several places, how about defining a generic function with a comment explaining its purpose? An example from the INSERT INTO SELECT patch: +/* + * IsModifySupportedInParallelMode + * + * Indicates whether execution of the specified table-modification command + * (INSERT/UPDATE/DELETE) in parallel-mode is supported, subject to certain + * parallel-safety conditions. + */ +static inline bool +IsModifySupportedInParallelMode(CmdType commandType) +{ + /* Currently only INSERT is supported */ + return (commandType == CMD_INSERT); +} Best regards, houzj
fix typo in ReorderBufferProcessTXN
Hi I found a possible typo in the comment of ReorderBufferProcessTXN(). * relmapper has no "historic" view, in contrast to normal * the normal catalog during decoding. Thus repeated Is there an extra ‘normal’ in the comment ? in contrast to normal the normal catalog during decoding == >> in contrast to the normal catalog during decoding Best regards, houzj 0001-fix-typo.patch Description: 0001-fix-typo.patch
RE: Parallel Inserts in CREATE TABLE AS
Hi > > wrt v18-0002patch: > > It looks like this introduces a state machine that goes like: > - starts at CTAS_PARALLEL_INS_UNDEF > - possibly moves to CTAS_PARALLEL_INS_SELECT > - CTAS_PARALLEL_INS_TUP_COST_CAN_IGN can be added > - if both were added at some stage, we can go to > CTAS_PARALLEL_INS_TUP_COST_IGNORED and ignore the costs > > what i'm wondering is why you opted to put logic around > generate_useful_gather_paths and in cost_gather when to me it seems more > logical to put it in create_gather_path? i'm probably missing something > there? IMO, the reason is that we want to make sure we only ignore the cost when Gather is the top node. The generate_useful_gather_paths call in apply_scanjoin_target_to_paths seems to be the right place, since it is the only one that can create a top-node Gather. So we set the flag in apply_scanjoin_target_to_paths around generate_useful_gather_paths to identify the top node. Best regards, houzj
RE: Consider Parallelism While Planning For REFRESH MATERIALIZED VIEW
> Yeah without explain analyze we can not show whether the parallelism is > picked in the test cases. What we could do is that we can add a plain RMV > test case in write_parallel.sql after CMV so that at least we can be ensured > that the parallelism will be picked because of the enforcement there. We > can always see the parallelism for the select part of explain analyze CMV > in write_parallel.sql and the same select query gets planned even in RMV > cases. > > IMO, the patch in this thread can go with test case addition to > write_parallel.sql. since it is very small. > > Thoughts? Yes, agreed. Best regards, houzj
RE: [POC] Fast COPY FROM command for the table with foreign partitions
Hi > see new version in attachment. I took a look at the patch and have some comments. 1. + PG_FINALLY(); + { + copy_fmstate = NULL; /* Detect problems */ I don't quite understand this comment; does it mean we want to detect something like a NULL reference? 2. + PG_FINALLY(); + { ... + if (!OK) + PG_RE_THROW(); + } Is this PG_RE_THROW() necessary? IMO, PG_FINALLY will re-throw the error anyway if we reached the code block because an error was thrown. 3. + ereport(ERROR, + (errmsg("unexpected extra results during COPY of table: %s", + PQerrorMessage(conn; I found a similar existing message: pg_log_warning("unexpected extra results during COPY of table \"%s\"", tocEntryTag); How about following the existing message style? 4. I noticed some non-standard code comments[1]. I think it's better to format them like: /* * line 1 * line 2 */ [1]--- + /* Finish COPY IN protocol. It is needed to do after successful copy or +* after an error. +*/ +/* + * + * postgresExecForeignCopy +/* + * + * postgresBeginForeignCopy --- Best regards, Houzj
RE: Parallel copy
Hi > Yes this optimization can be done, I will handle this in the next patch > set. > I have a suggestion for the parallel safety-check. As designed, the leader does not participate in the insertion of data. If the user specifies (PARALLEL 1), there is only one worker process, which will do all the insertion. IMO, we can skip some of the safety checks in this case, because those checks exist to restrict parallel inserts (except temporary table or ...). So, how about checking (PARALLEL 1) separately? Although it makes the code a bit more complicated, (PARALLEL 1) does give a good performance improvement. Best regards, houzj
RE: Consider Parallelism While Planning For REFRESH MATERIALIZED VIEW
> Thanks for taking a look at the patch. > > The intention of the patch is to just enable the parallel mode while planning > the select part of the materialized view, but the insertions do happen in > the leader backend itself. That way even if there's temporary tablespace > gets created, we have no problem. > > This patch can be thought as a precursor to parallelizing inserts in refresh > matview. I'm thinking to follow the design of parallel inserts in ctas [1] > i.e. pushing the dest receiver down to the workers once that gets reviewed > and finalized. At that time, we should take care of not pushing inserts > down to workers if temporary tablespace gets created. > > In summary, as far as this patch is considered we don't have any problem > with temporary tablespace getting created with CONCURRENTLY option. > > I'm planning to add a few test cases to cover this patch in matview.sql > and post a v2 patch soon. Thanks for your explanation! You are right that the temporary tablespace does not affect the current patch. As for the test case: I noticed that you posted a mail about adding EXPLAIN support for REFRESH MATERIALIZED VIEW. Do you think we could combine these two features in one thread? Personally, I think test cases with EXPLAIN output would be better. Best regards, houzj
RE: Parallel INSERT (INTO ... SELECT ...)
Hi > > I may be wrong, and if I miss sth in previous mails, please give me some > hints. > > IMO, serial insertion with underlying parallel SELECT can be > > considered for foreign table or temporary table, as the insertions only > happened in the leader process. > > > > I don't think we support parallel scan for temporary tables. Can you please > try once both of these operations without Insert being involved? If you > are able to produce a parallel plan without Insert then we can see why it > is not supported with Insert. Sorry, maybe I did not express it clearly; I actually meant the case where the INSERT's target table (not the table in the SELECT part) is temporary. And you are right that parallel SELECT is not enabled when a temporary table is in the SELECT part. I tested the cases where the INSERT's target table is temporary and where it is not. --insert into not temporary table--- postgres=# explain (costs off) insert into notemp select * from test where i < 600; QUERY PLAN --- Gather Workers Planned: 4 -> Insert on notemp -> Parallel Seq Scan on test Filter: (i < 600) --insert into temporary table--- postgres=# explain (costs off) insert into temp select * from test where i < 600; QUERY PLAN --- Insert on temp -> Seq Scan on test Filter: (i < 600) ---without insert part--- postgres=# explain (costs off) select * from test where i < 600; QUERY PLAN - Gather Workers Planned: 4 -> Parallel Seq Scan on test Filter: (i < 600) Best regards, houzj
RE: Parallel INSERT (INTO ... SELECT ...)
Hi I have an issue with the parallel-safety checks. If the target table is a foreign table or a temporary table, rel_max_parallel_hazard_for_modify will return PROPARALLEL_UNSAFE, which not only disables parallel insert but also disables the underlying parallel SELECT. +create temporary table temp_names (like names); +explain (costs off) insert into temp_names select * from names; + QUERY PLAN +- + Insert on temp_names + -> Seq Scan on names +(2 rows) I may be wrong, and if I missed something in previous mails, please give me some hints. IMO, serial insertion with an underlying parallel SELECT could be considered for foreign or temporary tables, as the insertions happen only in the leader process. Are there any special considerations for this case? Best regards, houzj
RE: Consider Parallelism While Planning For REFRESH MATERIALIZED VIEW
Hi > Added this to commitfest, in case it is useful - > https://commitfest.postgresql.org/31/2856/ I have a concern about the safety of enabling parallel SELECT. I checked the [parallel insert into select] patch. https://commitfest.postgresql.org/31/2844/ It seems parallel SELECT is not allowed there when the target table is a temporary table. + /* +* We can't support table modification in parallel-mode if it's a foreign +* table/partition (no FDW API for supporting parallel access) or a +* temporary table. +*/ + if (rel->rd_rel->relkind == RELKIND_FOREIGN_TABLE || + RelationUsesLocalBuffers(rel)) + { + table_close(rel, lockmode); + context->max_hazard = PROPARALLEL_UNSAFE; + return context->max_hazard; + } For REFRESH MATERIALIZED VIEW, if CONCURRENTLY is specified, it builds the new data in a temp tablespace: if (concurrent) { tableSpace = GetDefaultTablespace(RELPERSISTENCE_TEMP, false); relpersistence = RELPERSISTENCE_TEMP; } For the above case, should we still consider parallelism? Best regards, houzj
RE: Parallel INSERT (INTO ... SELECT ...)
Hi > Posting an updated set of patches to address recent feedback: > > - Removed conditional-locking code used in parallel-safety checking code > (Tsunakawa-san feedback). It turns out that for the problem test case, no > parallel-safety checking should be occurring that locks relations because > those inserts are specifying VALUES, not an underlying SELECT, so the > parallel-safety checking code was updated to bail out early if no underlying > SELECT is specified for the INSERT. No change to the test code was required. > - Added comment to better explain the reason for treating "INSERT ... > ON CONFLICT ... DO UPDATE" as parallel-unsafe (Dilip) > - Added assertion to heap_prepare_insert() (Amit) > - Updated error handling for NULL index_expr_item case (Vignesh) + + index_oid_list = RelationGetIndexList(rel); ... As mentioned in the comments of RelationGetIndexList: * we return a copy of the list palloc'd in the caller's context. The caller * may list_free() the returned list after scanning it. Shall we list_free(index_oid_list) at the end of the function, just to avoid a potential memory leak? Best regards, houzj
RE: Parallel Inserts in CREATE TABLE AS
Hi

The cfbot seems to complain about the testcase:

Command exited with code 1
perl dumpregr.pl
=== $path ===
diff -w -U3 C:/projects/postgresql/src/test/regress/expected/write_parallel.out C:/projects/postgresql/src/test/regress/results/write_parallel.out
--- C:/projects/postgresql/src/test/regress/expected/write_parallel.out	2020-12-21 01:41:17.745091500 +
+++ C:/projects/postgresql/src/test/regress/results/write_parallel.out	2020-12-21 01:47:20.375514800 +
@@ -1204,7 +1204,7 @@
    ->  Gather (actual rows=2 loops=1)
          Workers Planned: 3
          Workers Launched: 3
-         ->  Parallel Seq Scan on temp2 (actual rows=0 loops=4)
+         ->  Parallel Seq Scan on temp2 (actual rows=1 loops=4)
               Filter: (col2 < 3)
               Rows Removed by Filter: 1
 (14 rows)
@@ -1233,7 +1233,7 @@
    ->  Gather (actual rows=2 loops=1)
          Workers Planned: 3
          Workers Launched: 3
-         ->  Parallel Seq Scan on temp2 (actual rows=0 loops=4)
+         ->  Parallel Seq Scan on temp2 (actual rows=1 loops=4)
               Filter: (col2 < 3)
               Rows Removed by Filter: 1
 (14 rows)

Best regards, houzj
RE: postgres_fdw - cached connection leaks if the associated user mapping/foreign server is dropped
Hi

I have a question about the existing testcase.

"""
-- Test that alteration of server options causes reconnection
SELECT c3, c4 FROM ft1 ORDER BY c3, c1 LIMIT 1;  -- should work
ALTER SERVER loopback OPTIONS (SET dbname 'no such database');
SELECT c3, c4 FROM ft1 ORDER BY c3, c1 LIMIT 1;  -- should fail
DO $d$
    BEGIN
        EXECUTE $$ALTER SERVER loopback
            OPTIONS (SET dbname '$$||current_database()||$$')$$;
    END;
$d$;
SELECT c3, c4 FROM ft1 ORDER BY c3, c1 LIMIT 1;  -- should work again
"""

IMO, the above case is designed to test the following code[1]. With the patch, it seems the code[1] will no longer be exercised by this case, right? (It seems the connection will be disconnected in pgfdw_xact_callback.) I do not know whether it matters, or should we add a testcase to cover that?

[1]
	/*
	 * If the connection needs to be remade due to invalidation, disconnect as
	 * soon as we're out of all transactions.
	 */
	if (entry->conn != NULL && entry->invalidated && entry->xact_depth == 0)
	{
		elog(DEBUG3, "closing connection %p for option changes to take effect",
			 entry->conn);
		disconnect_pg_server(entry);
	}

Best regards, houzj
RE: [PATCH] postgres_fdw connection caching - cause remote sessions linger till the local session exit
Hi

> Discussion here is on the point - whether to show up the invalidated
> connections in the output of the new postgres_fdw_get_connections()
> function? If we were to show, then because of the solution we proposed for
> the connection leak problem in [1], will the invalidated entries be shown
> every time?

IMO, we introduced the function postgres_fdw_get_connections so that the user can see whether too many connections exist and decide whether to disconnect them. If the user decides to disconnect, we have two cases:

1. The user decides to disconnect one of them. I think it's OK for the user to disconnect an invalidated connection, so we'd better list the invalidated connections.

2. The user decides to disconnect all of them. In this case, it seems postgres_fdw_disconnect will disconnect both invalidated and valid connections, and we should let the user realize which connections they are disconnecting, so we should list the invalidated connections.

Based on the above two cases, personally, I think we can list the invalidated connections.

I took a look at the patch and have a small question:

+bool disconnect_cached_connections(uint32 hashvalue, bool all)

+	if (all)
+	{
+		hash_destroy(ConnectionHash);
+		ConnectionHash = NULL;
+		result = true;
+	}

If disconnect_cached_connections is called to disconnect all the connections, should we reset the 'xact_got_connection' flag?

> [1] -
> https://www.postgresql.org/message-id/flat/CALj2ACVNcGH_6qLY-4_tXz8JLv
> A%2B4yeBThRfxMz7Oxbk1aHcpQ%40mail.gmail.com

The patch about the connection leak looks good to me. And I have the same question about the new 'have_invalid_connections' flag: if we disconnect all the connections, should we reset the flag?

Best regards, houzj
RE: Parallel Inserts in CREATE TABLE AS
> From: Hou, Zhijie [mailto:houzj.f...@cn.fujitsu.com] > Sent: Tuesday, December 15, 2020 7:30 PM > To: Bharath Rupireddy > Cc: Amit Kapila ; Luc Vlaming ; > PostgreSQL-development ; Zhihong Yu > ; Dilip Kumar > Subject: RE: Parallel Inserts in CREATE TABLE AS > > > Thanks for the append use case. > > > > Here's my analysis on pushing parallel inserts down even in case the > > top node is Append. > > > > For union cases which need to remove duplicate tuples, we can't push > > the inserts or CTAS dest receiver down. If I'm not wrong, Append node > > is not doing duplicate removal(??), I saw that it's the HashAggregate > > node (which is the top node that removes the duplicate tuples). And > > also for except/except all/intersect/intersect all cases we receive > > HashSetOp nodes on top of Append. So for both cases, our check for > > Gather or Append at the top node is enough to detect this to not allow > parallel inserts. > > > > For union all: > > case 1: We can push the CTAS dest receiver to each Gather node Append > > ->Gather > > ->Parallel Seq Scan > > ->Gather > > ->Parallel Seq Scan > > ->Gather > > ->Parallel Seq Scan > > > > case 2: We can still push the CTAS dest receiver to each Gather node. > > Non-Gather nodes will do inserts as they do now i.e. by sending tuples > > to Append and from there to CTAS dest receiver. > > Append > > ->Gather > > ->Parallel Seq Scan > > ->Seq Scan / Join / any other non-Gather node > > ->Gather > > ->Parallel Seq Scan > > ->Seq Scan / Join / any other non-Gather node > > > > case 3: We can push the CTAS dest receiver to Gather Gather > > ->Parallel Append > > ->Parallel Seq Scan > > ->Parallel Seq Scan > > > > case 4: We can push the CTAS dest receiver to Gather Gather > > ->Parallel Append > > ->Parallel Seq Scan > > ->Parallel Seq Scan > > ->Seq Scan / Join / any other non-Gather node > > > > Please let me know if I'm missing any other possible use case. > > > > Thoughts? > > > Yes, The analysis looks right to me. 
> > > > As suggested by Amit earlier, I kept the 0001 patch(so far) such that > > it doesn't have the code to influence the planner to consider parallel > > tuple cost as 0. It works on the plan whatever gets generated and > > decides to allow parallel inserts or not. And in the 0002 patch, I > > added the code for influencing the planner to consider parallel tuple > > cost as 0. Maybe we can have a 0003 patch for tests alone. > > > > Once we are okay with the above analysis and use cases, we can > > incorporate the Append changes to respective patches. > > > > Hope that's okay. > > A little explanation about how to push down the ctas info in append. > > 1. about how to ignore tuple cost in this case. > IMO, it create gather path under append like the following: > query_planner > -make_one_rel > --set_base_rel_sizes > ---set_rel_size > set_append_rel_size (*) > -set_rel_size > --set_subquery_pathlist > ---subquery_planner > grouping_planner > -apply_scanjoin_target_to_paths > --generate_useful_gather_paths > > set_append_rel_size seems the right place where we can check and set a flag > to ignore tuple cost later. > We can set the flag for two cases when there is no parent path will be > created(such as : limit,sort,distinct...): > i) query_level is 1 > ii) query_level > 1 and we have set the flag in the parent_root. > > The case ii) is to check append under append: > Append >->Append >->Gather >->Other plan > > 2.about how to push ctas info down. > > We traversing the whole plans tree, and we only care Append and Gather type. > Gather: It set the ctas dest info and returned true at once if the gathernode > does not have projection. > Append: It will recursively traversing the subplan of Appendnode and will > reture true if one of the subplan can be parallel. 
> > +PushDownCTASParallelInsertState(DestReceiver *dest, PlanState *ps) { > + bool parallel = false; > + > + if(ps == NULL) > + return parallel; > + > + if(IsA(ps, AppendState)) > + { > + AppendState *aps = (AppendState *) ps; > + for(int i = 0; i < aps->as_nplans; i++) > + { > + parallel |= > PushDownC
RE: Parallel Inserts in CREATE TABLE AS
> Thanks for the append use case. > > Here's my analysis on pushing parallel inserts down even in case the top > node is Append. > > For union cases which need to remove duplicate tuples, we can't push the > inserts or CTAS dest receiver down. If I'm not wrong, Append node is not > doing duplicate removal(??), I saw that it's the HashAggregate node (which > is the top node that removes the duplicate tuples). And also for > except/except all/intersect/intersect all cases we receive HashSetOp nodes > on top of Append. So for both cases, our check for Gather or Append at the > top node is enough to detect this to not allow parallel inserts. > > For union all: > case 1: We can push the CTAS dest receiver to each Gather node Append > ->Gather > ->Parallel Seq Scan > ->Gather > ->Parallel Seq Scan > ->Gather > ->Parallel Seq Scan > > case 2: We can still push the CTAS dest receiver to each Gather node. > Non-Gather nodes will do inserts as they do now i.e. by sending tuples to > Append and from there to CTAS dest receiver. > Append > ->Gather > ->Parallel Seq Scan > ->Seq Scan / Join / any other non-Gather node > ->Gather > ->Parallel Seq Scan > ->Seq Scan / Join / any other non-Gather node > > case 3: We can push the CTAS dest receiver to Gather Gather > ->Parallel Append > ->Parallel Seq Scan > ->Parallel Seq Scan > > case 4: We can push the CTAS dest receiver to Gather Gather > ->Parallel Append > ->Parallel Seq Scan > ->Parallel Seq Scan > ->Seq Scan / Join / any other non-Gather node > > Please let me know if I'm missing any other possible use case. > > Thoughts? Yes, The analysis looks right to me. > As suggested by Amit earlier, I kept the 0001 patch(so far) such that it > doesn't have the code to influence the planner to consider parallel tuple > cost as 0. It works on the plan whatever gets generated and decides to allow > parallel inserts or not. And in the 0002 patch, I added the code for > influencing the planner to consider parallel tuple cost as 0. 
> Maybe we can
> have a 0003 patch for tests alone.
>
> Once we are okay with the above analysis and use cases, we can incorporate
> the Append changes to respective patches.
>
> Hope that's okay.

A little explanation about how to push down the ctas info in Append.

1. About how to ignore tuple cost in this case.

IMO, it creates gather paths under Append like the following:

query_planner
-make_one_rel
--set_base_rel_sizes
---set_rel_size
set_append_rel_size (*)
-set_rel_size
--set_subquery_pathlist
---subquery_planner
grouping_planner
-apply_scanjoin_target_to_paths
--generate_useful_gather_paths

set_append_rel_size seems the right place where we can check and set a flag to ignore tuple cost later. We can set the flag for two cases, when no parent path will be created (such as: limit, sort, distinct...):
i) query_level is 1
ii) query_level > 1 and we have set the flag in the parent_root.

Case ii) is to handle Append under Append:

Append
->Append
->Gather
->Other plan

2. About how to push the ctas info down.

We traverse the whole plan tree, and we only care about the Append and Gather node types.
Gather: it sets the ctas dest info and returns true at once if the Gather node does not have a projection.
Append: it recursively traverses the subplans of the Append node and returns true if one of the subplans can be parallel.
+PushDownCTASParallelInsertState(DestReceiver *dest, PlanState *ps)
+{
+	bool parallel = false;
+
+	if (ps == NULL)
+		return parallel;
+
+	if (IsA(ps, AppendState))
+	{
+		AppendState *aps = (AppendState *) ps;
+		for (int i = 0; i < aps->as_nplans; i++)
+		{
+			parallel |= PushDownCTASParallelInsertState(dest, aps->appendplans[i]);
+		}
+	}
+	else if (IsA(ps, GatherState) && !ps->ps_ProjInfo)
+	{
+		GatherState *gstate = (GatherState *) ps;
+		parallel = true;
+
+		((DR_intorel *) dest)->is_parallel = true;
+		gstate->dest = dest;
+		ps->plan->plan_rows = 0;
+	}
+
+	return parallel;
+}

Best regards, houzj

0001-support-pctas-in-append-parallel-inserts.patch
Description: 0001-support-pctas-in-append-parallel-inserts.patch

0002-support-pctas-in-append-tuple-cost-adjustment.patch
Description: 0002-support-pctas-in-append-tuple-cost-adjustment.patch
RE: Parallel Inserts in CREATE TABLE AS
Hi

> Attaching v11 patch set. Please review it further.

Currently with the patch, we can allow parallel CTAS when the top node is Gather. When the top node is Append and Gather is a sub-node of Append, I think we can still enable parallel CTAS by pushing parallel CTAS down to the sub-node Gather, such as:

Append
-->Gather
->Create table
->Seqscan
-->Gather
->Create table
->Seqscan

And the use case seems common to me, such as:

select * from A where xxx union all select * from B where xxx;

I attach a WIP patch which just shows the possibility of this feature. The patch is based on the latest v11 patch set. What do you think?

Best regards, houzj

0001-patch-for-pctas-in-append.patch
Description: 0001-patch-for-pctas-in-append.patch
RE: Fail Fast In CTAS/CMV If Relation Already Exists To Avoid Unnecessary Rewrite, Planning Costs
> IMO, let's not change the 1) behaviour to 3) with the patch. If agreed,
> I can do the following way in ExplainOneUtility and will add a comment on
> why we are doing this.
>
> if (es->analyze)
> (void) CheckRelExistenceInCTAS(ctas, true);
>
> Thoughts?

Agreed. Just in case, I took a look at Oracle 12's behavior for EXPLAIN CTAS: Oracle 12 outputs the plan without throwing any message in this case.

Best regards, houzj
RE: Fail Fast In CTAS/CMV If Relation Already Exists To Avoid Unnecessary Rewrite, Planning Costs
> Currently, for CTAS or CREATE MATERIALIZED VIEW(CMV) without if-not-exists
> clause, the existence of the relation gets checked during the execution
> of the select part and an error is thrown there.
> All the unnecessary rewrite and planning for the select part would have
> happened just to fail later. However, if if-not-exists clause is present,
> then a notice is issued and returned immediately without any further rewrite
> or planning for . This seems somewhat inconsistent to me.
>
> I propose to check the relation existence early in ExecCreateTableAs() as
> well as in ExplainOneUtility() and throw an error in case it exists already
> to avoid unnecessary rewrite, planning and execution of the select part.
>
> Attaching a patch. Note that I have not added any test cases as the existing
> test cases in create_table.sql and matview.sql would cover the code.
>
> Thoughts?

Personally, I think it makes sense, as other commands (such as CREATE EXTENSION/INDEX ...) throw that error before any further operation too.

I am just a little worried about the behavior change for EXPLAIN CTAS. Maybe someone will complain about the change from normal EXPLAIN output to an error.

And I took a look into the patch.

+	StringInfoData emsg;
+
+	initStringInfo(&emsg);
+
+	if (level == NOTICE)
+		appendStringInfo(&emsg,

Using the variable emsg and level seems a little complicated to me. How about just:

if (!is_explain && ctas->if_not_exists)
	ereport(NOTICE, xxx
else
	ereport(ERROR, xxx

Best regards, houzj
RE: Parallel Inserts in CREATE TABLE AS
Hi

+	allow = ps && IsA(ps, GatherState) && !ps->ps_ProjInfo &&
+		plannedstmt->parallelModeNeeded &&
+		plannedstmt->planTree &&
+		IsA(plannedstmt->planTree, Gather) &&
+		plannedstmt->planTree->lefttree &&
+		plannedstmt->planTree->lefttree->parallel_aware &&
+		plannedstmt->planTree->lefttree->parallel_safe;

I noticed it checks both IsA(ps, GatherState) and IsA(plannedstmt->planTree, Gather). Does that mean it is possible that IsA(ps, GatherState) is true but IsA(plannedstmt->planTree, Gather) is false? I did some tests but did not find such a case.

Best regards, houzj
Fix typo about generate_gather_paths
Hi

Since ba3e76c, the optimizer calls generate_useful_gather_paths instead of generate_gather_paths() from outside. But I noticed that some comments still refer to generate_gather_paths, not generate_useful_gather_paths. I think we should fix these comments, so I tried to replace "generate_gather_paths" with "generate_useful_gather_paths" in them.

Best regards, houzj

0001-fix-typo-about-generate_gather_paths.patch
Description: 0001-fix-typo-about-generate_gather_paths.patch
RE: Parallel Inserts in CREATE TABLE AS
> > I'm not quite sure how to address this. Can we not allow the planner
> > to consider that the select is for CTAS and check only after the
> > planning is done for the Gather node and other checks?
> >
>
> IIUC, you are saying that we should not influence the cost of gather node
> even when the insertion would be done by workers? I think that should be
> our fallback option anyway but that might miss some paths to be considered
> parallel where the cost becomes more due to parallel_tuple_cost (aka tuple
> transfer cost). I think the idea is we can avoid the tuple transfer cost
> only when Gather is the top node because only at that time we can push
> insertion down, right? How about if we have some way to detect the same
> before calling generate_useful_gather_paths()? I think when we are calling
> apply_scanjoin_target_to_paths() in grouping_planner(), if the
> query_level is 1, it is for CTAS, and it doesn't have a chance to create
> UPPER_REL (doesn't have grouping, order, limit, etc clause) then we can
> probably assume that the Gather will be top_node. I am not sure about this
> but I think it is worth exploring.
>

I took a look at the parallel insert patch and have the same idea.
https://commitfest.postgresql.org/31/2844/

	 * Consider generating Gather or Gather Merge paths. We must only do this
	 * if the relation is parallel safe, and we don't do it for child rels to
	 * avoid creating multiple Gather nodes within the same plan. We must do
	 * this after all paths have been generated and before set_cheapest, since
	 * one of the generated paths may turn out to be the cheapest one.
	 */
	if (rel->consider_parallel && !IS_OTHER_REL(rel))
		generate_useful_gather_paths(root, rel, false);

IMO the Gather path created here seems to be the right one whose parallel tuple cost we could ignore for CTAS. But we need to check the following parse options, which would create paths above the Gather path here:

if (root->parse->rowMarks)
if (limit_needed(root->parse))
if (root->parse->sortClause)
if (root->parse->distinctClause)
if (root->parse->hasWindowFuncs)
if (root->parse->groupClause || root->parse->groupingSets || root->parse->hasAggs || root->hasHavingQual)

Best regards, houzj
RE: Parallel copy
> > 4.
> > A suggestion for CacheLineInfo.
> >
> > It use appendBinaryStringXXX to store the line in memory.
> > appendBinaryStringXXX will double the str memory when there is no enough
> > spaces.
> >
> > How about call enlargeStringInfo in advance, if we already know the whole
> > line size?
> > It can avoid some memory waste and may impove a little performance.
> >
>
> Here we will not know the size beforehand, in some cases we will start
> processing the data when current block is populated and keep processing
> block by block, we will come to know of the size at the end. We cannot use
> enlargeStringInfo because of this.
>
> Attached v11 patch has the fix for this, it also includes the changes to
> rebase on top of head.

Thanks for the explanation. I think there are still chances we can know the size.

+	 * line_size will be set. Read the line_size again to be sure if it is
+	 * completed or partial block.
+	 */
+	dataSize = pg_atomic_read_u32(>line_size);
+	if (dataSize != -1)
+	{

If I am not wrong, this seems to be the branch that processes the populated block. I think we can check copiedSize here: if copiedSize == 0, that means dataSize is the size of the whole line, and in this case we can do the enlarge.

Best regards, houzj
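As a toy illustration of this suggestion (plain C, not PostgreSQL's actual StringInfo code; all names here are made up for the sketch): pre-enlarging a growable buffer to a known total size grows it exactly once, while append-driven doubling growth both reallocates repeatedly and overshoots the needed capacity.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Minimal stand-in for a StringInfo-like growable byte buffer. */
typedef struct { char *data; size_t len; size_t cap; } Buf;

static void buf_init(Buf *b)
{
    b->cap = 16;
    b->len = 0;
    b->data = malloc(b->cap);
}

/* Doubling growth, like appendBinaryStringInfo's policy: may overshoot. */
static void buf_reserve(Buf *b, size_t need)
{
    if (b->len + need <= b->cap)
        return;
    while (b->cap < b->len + need)
        b->cap *= 2;
    b->data = realloc(b->data, b->cap);
}

/* enlargeStringInfo-style: grow once to the exact known total. */
static void buf_enlarge(Buf *b, size_t total)
{
    if (total <= b->cap)
        return;
    b->cap = total;
    b->data = realloc(b->data, b->cap);
}

static void buf_append(Buf *b, const char *s, size_t n)
{
    buf_reserve(b, n);
    memcpy(b->data + b->len, s, n);
    b->len += n;
}
```

With a known 1000-byte line appended in 100-byte chunks, pure doubling ends at a 1024-byte allocation after several reallocs, while a single up-front buf_enlarge(&b, 1000) ends at exactly 1000 bytes with one realloc.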
RE: Parallel Inserts in CREATE TABLE AS
Hi

+	/*
+	 * Flag to let the planner know that the SELECT query is for CTAS. This is
+	 * used to calculate the tuple transfer cost from workers to gather node(in
+	 * case parallelism kicks in for the SELECT part of the CTAS), to zero as
+	 * each worker will insert its share of tuples in parallel.
+	 */
+	if (IsParallelInsertInCTASAllowed(into, NULL))
+		query->isForCTAS = true;

+	/*
+	 * We do not compute the parallel_tuple_cost for CTAS because the number of
+	 * tuples that are transferred from workers to the gather node is zero as
+	 * each worker, in parallel, inserts the tuples that are resulted from its
+	 * chunk of plan execution. This change may make the parallel plan cheap
+	 * among all other plans, and influence the planner to consider this
+	 * parallel plan.
+	 */
+	if (!(root->parse->isForCTAS &&
+		root->query_level == 1))
+		run_cost += parallel_tuple_cost * path->path.rows;

I noticed that the parallel_tuple_cost will still be ignored when Gather is not the top node. Example:

Create table test(i int);
insert into test values(generate_series(1,1000,1));
explain create table ntest3 as select * from test where i < 200 limit 1;

                                  QUERY PLAN
---------------------------------------------------------------------------------
 Limit  (cost=1000.00..97331.33 rows=1000 width=4)
   ->  Gather  (cost=1000.00..97331.33 rows=1000 width=4)
         Workers Planned: 2
         ->  Parallel Seq Scan on test  (cost=0.00..96331.33 rows=417 width=4)
               Filter: (i < 200)

The isForCTAS will be true because of CREATE TABLE AS, and the query_level is always 1 because there is no subquery. So even if Gather is not the top node, the parallel tuple cost will still be ignored. Does that work as expected?

Best regards, houzj
Re: A new function to wait for the backend exit after termination
The following review has been posted through the commitfest application:
make installcheck-world: tested, passed
Implements feature: tested, passed
Spec compliant: tested, passed
Documentation: not tested

Thanks for the new patch. The patch LGTM and works as expected.

The new status of this patch is: Ready for Committer
RE: A new function to wait for the backend exit after termination
Hi,

When testing pg_terminate_backend_and_wait with parallel queries, I noticed that the function is not defined as parallel safe.

I am not very familiar with the standard for whether a function should be parallel safe, but I found the following functions are all defined as parallel safe:

pg_promote
pg_terminate_backend(integer)
pg_sleep*

Is there a reason why pg_terminate_backend_and_wait is not parallel safe? (I'm sorry if I missed something in previous mails.)

Best regards, houzj
RE: A new function to wait for the backend exit after termination
Hi,

-however only superusers can terminate superuser backends.
+however only superusers can terminate superuser backends. When no
+wait and timeout are
+provided, only SIGTERM is sent to the backend with the given process
+ID and false is returned immediately.

But when I tested the case where no wait and timeout are provided, true is returned as follows, which seems different from the doc:

postgres=# select pg_terminate_backend(pid);
 pg_terminate_backend
----------------------
 t
(1 row)

Best regards, houzj
RE: A new function to wait for the backend exit after termination
> > I changed the status to 'waiting on author'.
> The others of the patch LGTM,
> I think it can be changed to Ready for Committer again, when this comment
> is confirmed.
>

Sorry, I forgot one possible typo comment.

+{ oid => '16386', descr => 'terminate a backend process and wait for it\'s exit or until timeout occurs'

Does the following change look better?

wait for it\'s exit => wait for its exit

Best regards, houzj
RE: A new function to wait for the backend exit after termination
> > + ereport(WARNING,
> > + (errmsg("could not wait for the termination of the backend with PID %d within %ld milliseconds",
> > + pid, timeout)));
> >
> > The code use %ld to print int64 type.
> > How about use INT64_FORMAT, which looks more appropriate.
>
> This is a translatable message, so INT64_FORMAT is no good -- we need
> something that is the same across platforms. The current project standard
> for this problem is to use "%lld" and explicitly cast the argument to long
> long int to match that.

Thank you for pointing that out, and sorry for not thinking of it. Yes, we can use %lld with (long long int) timeout.

Best regards, houzj
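A minimal standalone sketch of the agreed approach (the function name and message text here are made up for illustration, not taken from the patch): casting the int64 argument to long long int and printing it with %lld keeps the format string byte-identical on every platform, which is what a translatable message catalog needs; INT64_FORMAT, by contrast, expands to "%ld" on some platforms and "%lld" on others.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Format a timeout value into a message buffer.  "%lld" plus an explicit
 * cast to long long int is portable: int64_t may be long or long long
 * depending on the platform, but long long int is always wide enough.
 */
static int format_timeout_msg(char *buf, size_t buflen, int64_t timeout)
{
    return snprintf(buf, buflen,
                    "could not wait for backend termination within %lld milliseconds",
                    (long long int) timeout);
}
```

The same pattern works for any printf-family call where the format string must not vary across builds.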
RE: A new function to wait for the backend exit after termination
Hi

I took a look into the patch, and here are some comments.

1.
+		ereport(WARNING,
+				(errmsg("could not wait for the termination of the backend with PID %d within %ld milliseconds",
+						pid, timeout)));

The code uses %ld to print an int64 value. How about using INT64_FORMAT, which looks more appropriate?

2.
+	if (timeout <= 0)
+	{
+		ereport(WARNING,
+				(errmsg("timeout cannot be negative or zero: %ld", timeout)));
+		PG_RETURN_BOOL(r);
+	}

The same as 1.

3.
+pg_terminate_backend_and_wait(PG_FUNCTION_ARGS)
+{
+	int		pid = PG_GETARG_DATUM(0);

+pg_wait_backend(PG_FUNCTION_ARGS)
+{
+	int		pid = PG_GETARG_INT32(0);

The code uses different macros to get the pid. How about using PG_GETARG_INT32(0) for each one?

I changed the status to 'waiting on author'. The rest of the patch LGTM; I think it can be changed to Ready for Committer again once this comment is confirmed.

Best regards, houzj
Avoid using lcons and list_delete_first in plan_union_children()
Hi,

In function plan_union_children(), I found that the lcons and list_delete_first here can easily be replaced by lappend and list_delete_last. A previous commit did a similar thing, so I tried to make the same improvement here.

Previous commit: d97b714a219959a50f9e7b37ded674f5132f93f3

Best regards, houzj

0001-Avoid-using-lcons-and-list_delete_first.patch
Description: 0001-Avoid-using-lcons-and-list_delete_first.patch
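As a toy illustration (plain C, not PostgreSQL's actual List code) of why this substitution helps with an array-based list representation: prepend and delete-first must shift every remaining element on each call, while append and delete-last only touch the tail — and the two pairs implement exactly the same stack discipline, so the change is behavior-preserving.

```c
#include <assert.h>
#include <string.h>

/* Toy array-backed list, loosely modeled on an array-based List. */
#define MAXELEMS 8
typedef struct { int elems[MAXELEMS]; int len; } IntList;

/* lcons-style: prepend shifts every element right, O(n) per call. */
static void list_prepend(IntList *l, int v)
{
    memmove(&l->elems[1], &l->elems[0], l->len * sizeof(int));
    l->elems[0] = v;
    l->len++;
}

/* list_delete_first-style: shifts everything left, O(n) per call. */
static int list_pop_first(IntList *l)
{
    int v = l->elems[0];
    memmove(&l->elems[0], &l->elems[1], (l->len - 1) * sizeof(int));
    l->len--;
    return v;
}

/* lappend / list_delete_last-style: touch only the tail, O(1). */
static void list_append(IntList *l, int v) { l->elems[l->len++] = v; }
static int  list_pop_last(IntList *l)      { return l->elems[--l->len]; }
```

Pushing with prepend and popping from the front yields the same LIFO order as pushing with append and popping from the back, but the tail-based pair moves no memory.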
RE: [PATCH] Keeps tracking the uniqueness with UniqueKey
Hi

I looked into the patch again and have some comments.

1.
+	Size oid_cmp_len = sizeof(Oid) * ind1->ncolumns;
+
+	return ind1->ncolumns == ind2->ncolumns &&
+		ind1->unique == ind2->unique &&
+		memcmp(ind1->indexkeys, ind2->indexkeys, sizeof(int) * ind1->ncolumns) == 0 &&
+		memcmp(ind1->opfamily, ind2->opfamily, oid_cmp_len) == 0 &&
+		memcmp(ind1->opcintype, ind2->opcintype, oid_cmp_len) == 0 &&
+		memcmp(ind1->sortopfamily, ind2->sortopfamily, oid_cmp_len) == 0 &&
+		equal(get_tlist_exprs(ind1->indextlist, true),
+			  get_tlist_exprs(ind2->indextlist, true));

The length of sortopfamily, opfamily and opcintype seems to be ->nkeycolumns, not ->ncolumns. I checked function get_relation_info, where the IndexOptInfo is initialized. (If there are other places that determine the length, please correct me.)

2.
+	COPY_SCALAR_FIELD(ncolumns);
+	COPY_SCALAR_FIELD(nkeycolumns);
+	COPY_SCALAR_FIELD(unique);
+	COPY_SCALAR_FIELD(immediate);
+	/* We just need to know if it is NIL or not */
+	COPY_SCALAR_FIELD(indpred);
+	COPY_SCALAR_FIELD(predOK);
+	COPY_POINTER_FIELD(indexkeys, from->ncolumns * sizeof(int));
+	COPY_POINTER_FIELD(indexcollations, from->ncolumns * sizeof(Oid));
+	COPY_POINTER_FIELD(opfamily, from->ncolumns * sizeof(Oid));
+	COPY_POINTER_FIELD(opcintype, from->ncolumns * sizeof(Oid));
+	COPY_POINTER_FIELD(sortopfamily, from->ncolumns * sizeof(Oid));
+	COPY_NODE_FIELD(indextlist);

The same as 1: it should use nkeycolumns if I am right.

3.
+	foreach(lc, newnode->indextlist)
+	{
+		TargetEntry *tle = lfirst_node(TargetEntry, lc);
+		/* Index on expression is ignored */
+		Assert(IsA(tle->expr, Var));
+		tle->expr = (Expr *) find_parent_var(appinfo, (Var *) tle->expr);
+		newnode->indexkeys[idx] = castNode(Var, tle->expr)->varattno;
+		idx++;
+	}

The count variable 'idx' can be replaced by foreach_current_index().

Best regards, houzj
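A small standalone sketch of the point in comment 1 (toy types, not the real IndexOptInfo): each array must be compared with the element count it was actually allocated with — indexkeys with ncolumns, but the per-key-column operator arrays with nkeycolumns — otherwise the memcmp reads past the valid entries.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef unsigned int Oid;       /* stand-in for PostgreSQL's Oid */

/* Toy index metadata: the operator arrays only cover key columns. */
typedef struct
{
    int  ncolumns;              /* key + included columns */
    int  nkeycolumns;           /* key columns only */
    const int *indexkeys;       /* allocated length: ncolumns */
    const Oid *opfamily;        /* allocated length: nkeycolumns */
    const Oid *opcintype;       /* allocated length: nkeycolumns */
} IndexInfoToy;

/*
 * Compare each array with its own allocated length: indexkeys with
 * ncolumns, the per-key-column arrays with nkeycolumns.
 */
static int index_info_equal(const IndexInfoToy *a, const IndexInfoToy *b)
{
    size_t key_len = sizeof(Oid) * a->nkeycolumns;

    return a->ncolumns == b->ncolumns &&
        a->nkeycolumns == b->nkeycolumns &&
        memcmp(a->indexkeys, b->indexkeys, sizeof(int) * a->ncolumns) == 0 &&
        memcmp(a->opfamily, b->opfamily, key_len) == 0 &&
        memcmp(a->opcintype, b->opcintype, key_len) == 0;
}
```

For an index with one key column and one included column (ncolumns = 2, nkeycolumns = 1), comparing opfamily with ncolumns entries would touch one Oid beyond what was allocated.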
support IncrementalSortPath type in outNode()
Hi,

I found that the IncrementalSortPath type is not handled in outNode(). It does not matter at runtime, but for the convenience of debugging with gdb, we can use "call pprint(incresortpath)" to list the node info. So I tried to add support for IncrementalSortPath in outNode().

Best regards, houzj

0001-support-incremental-sort-in-outnode.patch
Description: 0001-support-incremental-sort-in-outnode.patch
RE: Parallel Inserts in CREATE TABLE AS
Hi, > > I took a deep look at the projection logic. > > In most cases, you are right that Gather node does not need projection. > > > > In some rare cases, such as Subplan (or initplan I guess). > > The projection will happen in Gather node. > > > > The example: > > > > Create table test(i int); > > Create table test2(a int, b int); > > insert into test values(generate_series(1,1000,1)); > > insert into test2 values(generate_series(1,1000,1), > > generate_series(1,1000,1)); > > > > postgres=# explain(verbose, costs off) select test.i,(select i from > (select * from test2) as tt limit 1) from test where test.i < 2000; > >QUERY PLAN > > > > Gather > >Output: test.i, (SubPlan 1) > >Workers Planned: 2 > >-> Parallel Seq Scan on public.test > > Output: test.i > > Filter: (test.i < 2000) > >SubPlan 1 > > -> Limit > >Output: (test.i) > >-> Seq Scan on public.test2 > > Output: test.i > > > > In this case, projection is necessary, because the subplan will be > > executed in projection. > > > > If skipped, the table created will loss some data. > > > > Thanks a lot for the use case. Yes with the current patch table will lose > data related to the subplan. On analyzing further, I think we can not allow > parallel inserts in the cases when the Gather node has some projections > to do. Because the workers can not perform that projection. So, having > ps_ProjInfo in the Gather node is an indication for us to disable parallel > inserts and only the leader can do the insertions after the Gather node > does the required projections. > > Thoughts? > Agreed. 2. @@ -166,6 +228,16 @@ ExecGather(PlanState *pstate) { ParallelContext *pcxt; + /* +* Take the necessary information to be passed to workers for +* parallel inserts in CTAS. +*/ + if (ISCTAS(node->ps.intoclause)) + { + node->ps.lefttree->intoclause = node->ps.intoclause; + node->ps.lefttree->objectid = node->ps.objectid; + } + /* Initialize, or re-initialize, shared state needed by workers. 
 */
	if (!node->pei)
		node->pei = ExecInitParallelPlan(node->ps.lefttree,

I found the code passes intoclause and objectid to the Gather node's lefttree. Is that necessary? It seems only the Gather node will use the information.

Best regards, houzj
RE: Parallel Inserts in CREATE TABLE AS
Hi,

> On Thu, Nov 26, 2020 at 7:47 AM Hou, Zhijie 
> wrote:
> >
> > Hi,
> >
> > I have an issue about the following code:
> >
> > econtext = node->ps.ps_ExprContext;
> > ResetExprContext(econtext);
> >
> > + if (ISCTAS(node->ps.intoclause))
> > + {
> > + ExecParallelInsertInCTAS(node);
> > + return NULL;
> > + }
> >
> > /* If no projection is required, we're done. */
> > if (node->ps.ps_ProjInfo == NULL)
> > return slot;
> >
> > /*
> > * Form the result tuple using ExecProject(), and return it.
> > */
> > econtext->ecxt_outertuple = slot;
> > return ExecProject(node->ps.ps_ProjInfo);
> >
> > It seems the projection will be skipped.
> > Is this because projection is not required in this case ?
> > (I'm not very familiar with where the projection will be.)
> >
>
> For parallel inserts in CTAS, I don't think we need to project the tuples
> being returned from the underlying plan nodes, and also we have nothing
> to project from the Gather node further up. The required projection will
> happen while the tuples are being returned from the underlying nodes and
> the projected tuples are being directly fed to CTAS's dest receiver
> intorel_receive(), from there into the created table. We don't need
> ExecProject again in ExecParallelInsertInCTAS().
>
> For instance, projection will always be done when the tuple is being returned
> from an underlying sequential scan node(see ExecScan() -->
> ExecProject() and this is true for both leader and workers. In both leader
> and workers, we are just calling CTAS's dest receiver intorel_receive().
>
> Thoughts?

I took a deep look at the projection logic. In most cases, you are right that the Gather node does not need projection. But in some rare cases, such as a SubPlan (or an InitPlan, I guess), the projection will happen in the Gather node.
The example:

Create table test(i int);
Create table test2(a int, b int);
insert into test values(generate_series(1,1000,1));
insert into test2 values(generate_series(1,1000,1), generate_series(1,1000,1));

postgres=# explain(verbose, costs off)
           select test.i,
                  (select i from (select * from test2) as tt limit 1)
           from test where test.i < 2000;
                 QUERY PLAN
---------------------------------------------
 Gather
   Output: test.i, (SubPlan 1)
   Workers Planned: 2
   ->  Parallel Seq Scan on public.test
         Output: test.i
         Filter: (test.i < 2000)
   SubPlan 1
     ->  Limit
           Output: (test.i)
           ->  Seq Scan on public.test2
                 Output: test.i

In this case the projection is necessary, because the SubPlan is executed during projection. If the projection is skipped, the created table will lose some data.

Best regards,
houzj
RE: Parallel Inserts in CREATE TABLE AS
Hi,

I have an issue about the following code:

econtext = node->ps.ps_ExprContext;
ResetExprContext(econtext);

+ if (ISCTAS(node->ps.intoclause))
+ {
+     ExecParallelInsertInCTAS(node);
+     return NULL;
+ }

/* If no projection is required, we're done. */
if (node->ps.ps_ProjInfo == NULL)
    return slot;

/*
 * Form the result tuple using ExecProject(), and return it.
 */
econtext->ecxt_outertuple = slot;
return ExecProject(node->ps.ps_ProjInfo);

It seems the projection will be skipped. Is this because projection is not required in this case? (I'm not very familiar with where the projection happens.) If projection is indeed not required here, shall we add some comments saying so?

Best regards,
houzj
RE: Parallel Inserts in CREATE TABLE AS
Hi,

I'm very interested in this feature, and I'm looking at the patch. Here are some comments.

1.
+ if (!TupIsNull(outerTupleSlot))
+ {
+     (void) node->ps.dest->receiveSlot(outerTupleSlot, node->ps.dest);
+     node->ps.state->es_processed++;
+ }
+
+ if (TupIsNull(outerTupleSlot))
+     break;

How about the following style:

if (TupIsNull(outerTupleSlot))
    break;

(void) node->ps.dest->receiveSlot(outerTupleSlot, node->ps.dest);
node->ps.state->es_processed++;

which looks cleaner.

2.
+ if (into != NULL &&
+     IsA(into, IntoClause))
+ {

The check can be replaced by ISCTAS(into).

3.
+ /*
+  * For parallelizing inserts in CTAS i.e. making each
+  * parallel worker inerst it's tuples, we must send
+  * information such as intoclause (for each worker

'inerst' looks like a typo (insert).

4.
+ /* Estimate space for into clause for CTAS. */
+ if (ISCTAS(planstate->intoclause))
+ {
+     intoclausestr = nodeToString(planstate->intoclause);
+     shm_toc_estimate_chunk(&pcxt->estimator, strlen(intoclausestr) + 1);
+     shm_toc_estimate_keys(&pcxt->estimator, 1);
+ }
...
+ if (intoclausestr != NULL)
+ {
+     char *shmptr = (char *) shm_toc_allocate(pcxt->toc,
+                                              strlen(intoclausestr) + 1);
+     strcpy(shmptr, intoclausestr);
+     shm_toc_insert(pcxt->toc, PARALLEL_KEY_INTO_CLAUSE, shmptr);
+ }

The code here calls strlen(intoclausestr) twice. The existing code in ExecInitParallelPlan stores the strlen in a variable, so how about the following style:

intoclause_len = strlen(intoclausestr);
...
/* Store serialized intoclause. */
intoclause_space = shm_toc_allocate(pcxt->toc, intoclause_len + 1);
memcpy(intoclause_space, intoclausestr, intoclause_len + 1);
shm_toc_insert(pcxt->toc, PARALLEL_KEY_INTO_CLAUSE, intoclause_space);

5.
+ if (intoclausestr != NULL)
+ {
+     char *shmptr = (char *) shm_toc_allocate(pcxt->toc,
+                                              strlen(intoclausestr) + 1);
+     strcpy(shmptr, intoclausestr);
+     shm_toc_insert(pcxt->toc, PARALLEL_KEY_INTO_CLAUSE, shmptr);
+ }
+
  /* Set up the tuple queues that the workers will write into. */
- pei->tqueue = ExecParallelSetupTupleQueues(pcxt, false);
+ if (intoclausestr == NULL)
+     pei->tqueue = ExecParallelSetupTupleQueues(pcxt, false);

The two checks on intoclausestr could be combined, like:

if (intoclausestr != NULL)
{
    ...
}
else
{
    ...
}

Best regards,
houzj
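Reading comments 4 and 5 together, the combined shape might look like the standalone sketch below. The shm_toc_* calls are replaced with plain malloc/memcpy so the sketch compiles on its own; the variable names follow the patch, but the function itself is hypothetical:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hedged sketch of comments 4 and 5 combined: cache strlen() once and
 * fold the two intoclausestr checks into a single if/else branch.
 * The real code uses shm_toc_allocate()/shm_toc_insert(); malloc()
 * stands in here so the sketch is self-contained. */
static char *
serialize_or_setup(const char *intoclausestr, int *setup_queues)
{
    if (intoclausestr != NULL)
    {
        size_t  intoclause_len = strlen(intoclausestr); /* single strlen() */
        char   *intoclause_space = malloc(intoclause_len + 1);

        memcpy(intoclause_space, intoclausestr, intoclause_len + 1);
        *setup_queues = 0;      /* parallel CTAS: no tuple queues needed */
        return intoclause_space;
    }
    else
    {
        *setup_queues = 1;      /* normal Gather: set up tuple queues */
        return NULL;
    }
}
```

This keeps the serialization and the tuple-queue decision in one branch, so a reader sees at a glance that the two are mutually exclusive.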
RE: Parallel copy
Hi Vignesh,

I took a look at the v10 patch set. Here are some comments:

1.
+/*
+ * CheckExprParallelSafety
+ *
+ * Determine if where cluase and default expressions are parallel safe & do not
+ * have volatile expressions, return true if condition satisfies else return
+ * false.
+ */

'cluase' seems a typo.

2.
+ /*
+  * Make sure that no worker has consumed this element, if this
+  * line is spread across multiple data blocks, worker would have
+  * started processing, no need to change the state to
+  * LINE_LEADER_POPULATING in this case.
+  */
+ (void) pg_atomic_compare_exchange_u32(>line_state,
+                                       _line_state,
+                                       LINE_LEADER_POPULATED);

About the comment:

+  * started processing, no need to change the state to
+  * LINE_LEADER_POPULATING in this case.

Does it mean "no need to change the state to LINE_LEADER_POPULATED" here?

3.
+ * 3) only one worker should choose one line for processing, this is handled by
+ *    using pg_atomic_compare_exchange_u32, worker will change the state to
+ *    LINE_WORKER_PROCESSING only if line_state is LINE_LEADER_POPULATED.

In the latest patch, it will set the state to LINE_WORKER_PROCESSING if line_state is LINE_LEADER_POPULATED or LINE_LEADER_POPULATING, so the comment here seems wrong.

4.
A suggestion for CacheLineInfo.

It uses appendBinaryStringXXX to store the line in memory, and appendBinaryStringXXX doubles the string memory whenever there is not enough space. How about calling enlargeStringInfo in advance, if we already know the whole line size? That can avoid some memory waste and may improve performance a little.

Best regards,
houzj
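As a rough illustration of comment 4's point, here is a self-contained mini-model of the doubling growth that appendBinaryStringInfo-style appends exhibit, versus pre-sizing once with an enlargeStringInfo-style call. The struct and function names are invented for this sketch; only the doubling policy mirrors the real stringinfo.c behavior:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy model of StringInfo: doubling growth, with a realloc counter.
 * Names are illustrative, not PostgreSQL's actual implementation. */
typedef struct
{
    char   *data;
    int     len;
    int     maxlen;
    int     nreallocs;          /* how many times the buffer regrew */
} MiniStringInfo;

static void
mini_init(MiniStringInfo *s)
{
    s->maxlen = 1024;
    s->data = malloc(s->maxlen);
    s->len = 0;
    s->nreallocs = 0;
}

/* Like enlargeStringInfo: make room for "needed" more bytes. */
static void
mini_enlarge(MiniStringInfo *s, int needed)
{
    if (s->len + needed + 1 <= s->maxlen)
        return;
    while (s->len + needed + 1 > s->maxlen)
        s->maxlen *= 2;         /* double until it fits */
    s->data = realloc(s->data, s->maxlen);
    s->nreallocs++;
}

/* Like appendBinaryStringInfo: grows on demand, then copies. */
static void
mini_append(MiniStringInfo *s, const char *d, int n)
{
    mini_enlarge(s, n);
    memcpy(s->data + s->len, d, n);
    s->len += n;
    s->data[s->len] = '\0';
}
```

Appending a 100,000-byte line in 100-byte pieces regrows the buffer seven times from the initial 1 KB, while calling mini_enlarge once with the full size up front leaves a single regrow. If the whole line size is already known, the one-shot enlarge avoids both the repeated reallocs and the transient over-allocation.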
Add Nullif case for eval_const_expressions_mutator
Hi,

I noticed that there is no NullIf case in eval_const_expressions. Since NullIf is similar to OpExpr and is easy to implement, I tried adding this case to eval_const_expressions_mutator.

Best regards,
houzj

Attachment: 0001-patch-eval-NULLIF.patch
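For reference, the semantics the new case has to preserve: NULLIF(a, b) returns NULL when a = b is true, and the first argument otherwise. The sketch below is not the patch, just a standalone restatement of that rule for int arguments (the Datum_ struct and names are invented for illustration); constant folding would only apply it when both arguments are already Consts:

```c
#include <assert.h>
#include <stdbool.h>

/* Hedged sketch: NULLIF(a, b) semantics that a constant-folding case
 * could evaluate at plan time when both arguments are constants.
 * Types and names here are illustrative, not PostgreSQL's. */
typedef struct
{
    bool    isnull;
    int     value;
} Datum_;

static Datum_
eval_nullif(Datum_ a, Datum_ b)
{
    Datum_  result = a;

    /* NULLIF yields NULL only when the comparison a = b is true;
     * if either side is NULL the comparison is unknown, so the
     * first argument is returned unchanged. */
    if (!a.isnull && !b.isnull && a.value == b.value)
        result.isnull = true;
    return result;
}
```

So NULLIF(1, 1) folds to NULL, NULLIF(1, 2) folds to 1, and NULLIF(NULL, 1) stays NULL, matching what the executor would compute at run time.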
RE: Parallel copy
Hi,

> my $bytes = $ARGV[0];
> for(my $i = 0; $i < $bytes; $i+=8){
>     print "longdata";
> }
> print "\n";
>
> postgres=# copy longdata from program 'perl /tmp/longdata.pl 1'
> with (parallel 2);
>
> This gets stuck forever (or at least I didn't have the patience to wait
> for it to finish). Both worker processes are consuming 100% of CPU.

I had a look at this problem. ParallelCopyDataBlock has a size limit:

    uint8    skip_bytes;
    char     data[DATA_BLOCK_SIZE];    /* data read from file */

It seems the input line is so long that the leader process runs out of the shared memory shared among the parallel copy workers, and the leader keeps waiting for a free block. The worker process, in turn, waits until line_state becomes LINE_LEADER_POPULATED, but the leader won't set the line_state until it has read the whole line. So they are stuck forever. Maybe we should reconsider this situation.

The stacks are as follows:

Leader stack:
#3  0x0075f7a1 in WaitLatch (latch=, wakeEvents=wakeEvents@entry=41, timeout=timeout@entry=1, wait_event_info=wait_event_info@entry=150994945) at latch.c:411
#4  0x005a9245 in WaitGetFreeCopyBlock (pcshared_info=pcshared_info@entry=0x7f26d2ed3580) at copyparallel.c:1546
#5  0x005a98ce in SetRawBufForLoad (cstate=cstate@entry=0x2978a88, line_size=67108864, copy_buf_len=copy_buf_len@entry=65536, raw_buf_ptr=raw_buf_ptr@entry=65536, copy_raw_buf=copy_raw_buf@entry=0x7fff4cdc0e18) at copyparallel.c:1572
#6  0x005a1963 in CopyReadLineText (cstate=cstate@entry=0x2978a88) at copy.c:4058
#7  0x005a4e76 in CopyReadLine (cstate=cstate@entry=0x2978a88) at copy.c:3863

Worker stack:
#0  GetLinePosition (cstate=cstate@entry=0x29e1f28) at copyparallel.c:1474
#1  0x005a8aa4 in CacheLineInfo (cstate=cstate@entry=0x29e1f28, buff_count=buff_count@entry=0) at copyparallel.c:711
#2  0x005a8e46 in GetWorkerLine (cstate=cstate@entry=0x29e1f28) at copyparallel.c:885
#3  0x005a4f2e in NextCopyFromRawFields (cstate=cstate@entry=0x29e1f28, fields=fields@entry=0x7fff4cdc0b48, nfields=nfields@entry=0x7fff4cdc0b44) at copy.c:3615
#4  0x005a50af in NextCopyFrom (cstate=cstate@entry=0x29e1f28, econtext=econtext@entry=0x2a358d8, values=0x2a42068, nulls=0x2a42070) at copy.c:3696
#5  0x005a5b90 in CopyFrom (cstate=cstate@entry=0x29e1f28) at copy.c:2985

Best regards,
houzj
RE: psql \df choose functions by their arguments
Hi,

(Sorry, I forgot to cc the hackers list.)

> Improve psql \df to choose functions by their arguments

I think this is useful. I have some comments on the patch.

1.
> * Abbreviations of common types is permitted (because who really likes
> to type out "character varying"?), so the above could also be written as:

Some abbreviations of common types are not added to type_abbreviations[], such as:

    int8                   => bigint
    int2                   => smallint
    int4, int              => integer
    float4                 => real
    float8, float, double  => double precision

(and the same for their array types). It seems difficult to handle this with a single flat array; maybe we can use a two-dimensional array or a struct.

2.
I think it's also better to update the '\?' output about '\df[+]' in the function slashUsage(unsigned short int pager).

Best regards,
houzj
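One way to express the "struct" idea from comment 1 is a pair-wise lookup table. The aliases below are the standard PostgreSQL type-name abbreviations listed above; the struct and function names themselves are hypothetical, not from the patch:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hedged sketch: map each abbreviation to its canonical type name,
 * instead of keeping everything in a single flat array. */
typedef struct
{
    const char *abbrev;
    const char *canonical;
} TypeAbbrev;

static const TypeAbbrev extra_abbrevs[] = {
    {"int8",   "bigint"},
    {"int2",   "smallint"},
    {"int4",   "integer"},
    {"int",    "integer"},
    {"float4", "real"},
    {"float8", "double precision"},
    {"float",  "double precision"},
    {"double", "double precision"},
};

/* Return the canonical name for an abbreviation, or the input
 * unchanged when it is not an abbreviation. */
static const char *
expand_type_abbrev(const char *name)
{
    for (size_t i = 0; i < sizeof(extra_abbrevs) / sizeof(extra_abbrevs[0]); i++)
    {
        if (strcmp(name, extra_abbrevs[i].abbrev) == 0)
            return extra_abbrevs[i].canonical;
    }
    return name;
}
```

A table of pairs also makes the array-type variants easy to add later: each new alias is one new row rather than a change to the parsing logic.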
Fix typo in xlogreader.h and procarray.c
Hi,

I found some possible typos in procarray.c and xlogreader.h:

- * For VACUUM separate horizons (used to to decide which deleted tuples must
+ * For VACUUM separate horizons (used to decide which deleted tuples must

- * Callers supply a page_read callback if they want to to call
+ * Callers supply a page_read callback if they want to call

Best regards,
houzj

Attachment: 0001-fix-typo.patch
RE: Parallel copy
Hi,

I found some issues in v9-0002:

1.
+ elog(DEBUG1, "[Worker] Processing - line position:%d, block:%d, unprocessed lines:%d, offset:%d, line size:%d",
+      write_pos, lineInfo->first_block,
+      pg_atomic_read_u32(_blk_ptr->unprocessed_line_parts),
+      offset, pg_atomic_read_u32(>line_size));

write_pos and the other variables printed here are of type uint32; I think it's better to use '%u' in the elog message.

2.
+ * line_size will be set. Read the line_size again to be sure if it is
+ * completed or partial block.
+ */
+ dataSize = pg_atomic_read_u32(>line_size);
+ if (dataSize)

This uses dataSize (of type int) to hold a uint32 value, which seems a little dangerous. Would it be better to declare dataSize as uint32 here?

3.
Since functions with 'Cstate' in their names have been renamed to 'CState', I think we can rename the function PopulateCommonCstateInfo as well.

4.
+ if (pcdata->worker_line_buf_count)

I think checks like the above can be written as 'if (xxx > 0)', which seems easier to understand.

Best regards,
houzj
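A small standalone illustration of the concern in comment 2: a uint32 value above INT_MAX turns negative once stored in a plain int. Strictly speaking the conversion is implementation-defined, but it wraps on the usual two's-complement platforms:

```c
#include <assert.h>
#include <stdint.h>

/* Store a uint32 line size into a signed 32-bit int, as the quoted
 * code effectively does with its "int dataSize" variable. On common
 * platforms the conversion wraps, so large sizes come out negative. */
static int32_t
as_signed(uint32_t v)
{
    return (int32_t) v;
}
```

A line size of 3000000000 bytes would read back negative: a check like "if (dataSize)" still fires because the value is nonzero, but any comparison such as "dataSize > 0" or arithmetic on the size would misbehave. Declaring dataSize as uint32 sidesteps the whole class of problem.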
Fix typo in src/backend/utils/cache/lsyscache.c
Hi,

The comment of the function get_attgenerated(Oid relid, AttrNumber attnum) seems wrong: the function is given the attribute number, not the attribute name.

 /*
  * get_attgenerated
  *
- *      Given the relation id and the attribute name,
+ *      Given the relation id and the attribute number,
  *      return the "attgenerated" field from the attribute relation.
  *
  *      Errors if not found.

Best regards,
houzj

Attachment: 0001-fix-typo.patch