Re: Support tid range scan in parallel?

2024-05-14 Thread Cary Huang
> You should add a test that creates a table with a very low fillfactor,
> low enough so only 1 tuple can fit on each page and insert a few dozen
> tuples. The test would do SELECT COUNT(*) to ensure you find the
> correct subset of tuples. You'd maybe want MIN(ctid) and MAX(ctid) in
> there too for extra coverage to ensure that the correct tuples are
> being counted.  Just make sure and EXPLAIN (COSTS OFF) the query first
> in the test to ensure that it's always testing the plan you're
> expecting to test.

thank you for the test suggestion. I moved the regression tests from
select_parallel to tidrangescan instead. Following the existing test table
creation in tidrangescan with the lowest fillfactor of 10, I get a consistent
5 tuples per page instead of 1. That should be okay: as long as it is
consistently 5 tuples per page, the tuple counts from the parallel tests will
be in multiples of 5.

The attached v4 patch includes the improved regression tests.

Thank you very much!

Cary Huang
-
HighGo Software Inc. (Canada)
cary.hu...@highgo.ca
www.highgo.ca

v4-0001-add-parallel-tid-rangescan.patch
Description: Binary data


Re: Support tid range scan in parallel?

2024-05-10 Thread David Rowley
On Fri, 10 May 2024 at 05:16, Cary Huang  wrote:
> I understand that the regression tests for parallel ctid range scan are a
> bit lacking now. It only has a few EXPLAIN clauses to ensure parallel
> workers are used when tid ranges are specified. They are added as
> part of select_parallel.sql test. I am not sure if it is more appropriate
> to have them as part of tidrangescan.sql test instead. So basically
> re-run the same test cases in tidrangescan.sql but in parallel?

I think tidrangescan.sql is a more suitable location than
select_parallel.sql. I don't think you need to repeat all the tests, as
many of them are testing the tid qual processing, which is the same
code in the parallel version.

You should add a test that creates a table with a very low fillfactor,
low enough so only 1 tuple can fit on each page and insert a few dozen
tuples. The test would do SELECT COUNT(*) to ensure you find the
correct subset of tuples. You'd maybe want MIN(ctid) and MAX(ctid) in
there too for extra coverage to ensure that the correct tuples are
being counted.  Just make sure and EXPLAIN (COSTS OFF) the query first
in the test to ensure that it's always testing the plan you're
expecting to test.

David




Re: Support tid range scan in parallel?

2024-05-09 Thread Cary Huang
 > I've not looked at the patch, but please add it to the July CF.  I'll
 > try and look in more detail then.

Thanks David, I have added this patch on July commitfest under the
server feature category. 

I understand that the regression tests for parallel ctid range scan are a
bit lacking now. It only has a few EXPLAIN clauses to ensure parallel 
workers are used when tid ranges are specified. They are added as
part of select_parallel.sql test. I am not sure if it is more appropriate
to have them as part of tidrangescan.sql test instead. So basically
re-run the same test cases in tidrangescan.sql but in parallel? 

thank you

Cary


Re: Support tid range scan in parallel?

2024-05-08 Thread David Rowley
On Thu, 9 May 2024 at 10:23, Cary Huang  wrote:
> The v3 patch is attached.

I've not looked at the patch, but please add it to the July CF.  I'll
try and look in more detail then.

David




Re: Support tid range scan in parallel?

2024-05-08 Thread Cary Huang
Thank you very much for the test and review. Greatly appreciated!
 
> This now calls heap_setscanlimits() for the parallel version, it's
> just that table_block_parallelscan_nextpage() does nothing to obey
> those limits.

yes, you are absolutely right. Though heap_setscanlimits() is now called by
parallel tid range scan,  table_block_parallelscan_nextpage() does nothing
to obey these limits, resulting in more blocks being inefficiently filtered out
by the optimization code you mentioned in heap_getnextslot_tidrange().

> I've not studied exactly how you'd get the rs_numblock information
> down to the parallel scan descriptor.  But when you figure that out,
> just remember that you can't do the --scan->rs_numblocks from
> table_block_parallelscan_nextpage() as that's not parallel safe. You
> might be able to add an or condition to: "if (nallocated >=
> pbscan->phs_nblocks)" to make it "if (nallocated >=
> pbscan->phs_nblocks || nallocated >= pbscan->phs_numblocks)",
> although the field names don't seem very intuitive there. It would be
> nicer if the HeapScanDesc field was called rs_blocklimit rather than
> rs_numblocks.  It's not for this patch to go messing with that,
> however.

rs_numblocks was not passed down to the parallel scan descriptor, and
table_block_parallelscan_nextpage() has no logic to limit the block scan
range set by heap_setscanlimits() in a parallel scan. Also, I noticed that
rs_startblock was not passed to the parallel scan descriptor either, which
causes a parallel scan to always start from block 0 even when a lower ctid
bound is specified.

so I added logic in heap_set_tidrange() to pass these 2 values to the
parallel scan descriptor as "phs_startblock" and "phs_numblock". These are
then available in table_block_parallelscan_nextpage() during a parallel scan.

I followed your recommendation and modified the condition to:

if (nallocated >= pbscan->phs_nblocks ||
    (pbscan->phs_numblock != 0 && nallocated >= pbscan->phs_numblock))

so that the parallel tid range scan will stop when the upper scan limit is
reached. With these changes, I see that the number of buffer reads is
consistent between single and parallel ctid range scans. The v3 patch is
attached.


postgres=# explain (analyze, buffers) select count(*) from test where ctid >= '(0,1)' and ctid < '(11,0)';
                                                    QUERY PLAN
------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=39.43..39.44 rows=1 width=8) (actual time=1.007..1.008 rows=1 loops=1)
   Buffers: shared read=11
   ->  Tid Range Scan on test  (cost=0.01..34.35 rows=2034 width=0) (actual time=0.076..0.639 rows=2035 loops=1)
         TID Cond: ((ctid >= '(0,1)'::tid) AND (ctid < '(11,0)'::tid))
         Buffers: shared read=11

postgres=# set max_parallel_workers_per_gather=2;
SET
postgres=# explain (analyze, buffers) select count(*) from test where ctid >= '(0,1)' and ctid < '(11,0)';
                                                          QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------
 Finalize Aggregate  (cost=16.45..16.46 rows=1 width=8) (actual time=14.329..16.840 rows=1 loops=1)
   Buffers: shared hit=11
   ->  Gather  (cost=16.43..16.44 rows=2 width=8) (actual time=3.197..16.814 rows=3 loops=1)
         Workers Planned: 2
         Workers Launched: 2
         Buffers: shared hit=11
         ->  Partial Aggregate  (cost=16.43..16.44 rows=1 width=8) (actual time=0.705..0.706 rows=1 loops=3)
               Buffers: shared hit=11
               ->  Parallel Tid Range Scan on test  (cost=0.01..14.31 rows=848 width=0) (actual time=0.022..0.423 rows=678 loops=3)
                     TID Cond: ((ctid >= '(0,1)'::tid) AND (ctid < '(11,0)'::tid))
                     Buffers: shared hit=11

postgres=# set max_parallel_workers_per_gather=3;
SET
postgres=# explain (analyze, buffers) select count(*) from test where ctid >= '(0,1)' and ctid < '(11,0)';
                                                          QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------
 Finalize Aggregate  (cost=12.74..12.75 rows=1 width=8) (actual time=16.793..19.053 rows=1 loops=1)
   Buffers: shared hit=11
   ->  Gather  (cost=12.72..12.73 rows=3 width=8) (actual time=2.827..19.012 rows=4 loops=1)
         Workers Planned: 3
         Workers Launched: 3
         Buffers: shared hit=11
         ->  Partial Aggregate  (cost=12.72..12.73 rows=1 width=8) (actual time=0.563..0.565 rows=1 loops=4)
               Buffers: shared hit=11
               ->  Parallel Tid Range Scan on test  (cost=0.01..11.08 rows=656 width=0) (actual time=0.018..0.338 rows=509 loops=4)
                     TID Cond: ((ctid >= '(0,1)'::tid) AND (ctid < '(11,0)'::tid))
                     Buffers:

Re: Support tid range scan in parallel?

2024-05-03 Thread David Rowley
On Sat, 4 May 2024 at 06:55, Cary Huang  wrote:
> With syncscan enabled, the "table_block_parallelscan_nextpage()" would
> return the next block since the end of the first tid rangescan instead of the
> correct start block that should be scanned. I see that single tid rangescan
> does not have SO_ALLOW_SYNC set, so I figure syncscan should also be
> disabled in parallel case. With this change, then it would be okay to call
> heap_setscanlimits() in parallel case, so I added this call back to
> heap_set_tidrange() in both serial and parallel cases.

This now calls heap_setscanlimits() for the parallel version, it's
just that table_block_parallelscan_nextpage() does nothing to obey
those limits.

The only reason the code isn't reading the entire table is due to the
optimisation in heap_getnextslot_tidrange() which returns false when
the ctid goes out of range. i.e, this code:

/*
 * When scanning forward, the TIDs will be in ascending order.
 * Future tuples in this direction will be higher still, so we can
 * just return false to indicate there will be no more tuples.
 */
if (ScanDirectionIsForward(direction))
    return false;

If I comment out that line, I see all pages are accessed:

postgres=# explain (analyze, buffers) select count(*) from a where ctid >= '(0,1)' and ctid < '(11,0)';
                                                       QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------
 Finalize Aggregate  (cost=18.80..18.81 rows=1 width=8) (actual time=33.530..36.118 rows=1 loops=1)
   Buffers: shared read=4425
   ->  Gather  (cost=18.78..18.79 rows=2 width=8) (actual time=33.456..36.102 rows=3 loops=1)
         Workers Planned: 2
         Workers Launched: 2
         Buffers: shared read=4425
         ->  Partial Aggregate  (cost=18.78..18.79 rows=1 width=8) (actual time=20.389..20.390 rows=1 loops=3)
               Buffers: shared read=4425
               ->  Parallel Tid Range Scan on a  (cost=0.01..16.19 rows=1035 width=0) (actual time=9.375..20.349 rows=829 loops=3)
                     TID Cond: ((ctid >= '(0,1)'::tid) AND (ctid < '(11,0)'::tid))
                     Buffers: shared read=4425 <  this is all pages in the table instead of 11 pages.

With that code still commented out, the non-parallel version still
won't read all pages due to the setscanlimits being obeyed.

postgres=# set max_parallel_workers_per_gather=0;
SET
postgres=# explain (analyze, buffers) select count(*) from a where ctid >= '(0,1)' and ctid < '(11,0)';
                                                  QUERY PLAN
--------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=45.07..45.08 rows=1 width=8) (actual time=0.302..0.302 rows=1 loops=1)
   Buffers: shared hit=11
   ->  Tid Range Scan on a  (cost=0.01..38.86 rows=2485 width=0) (actual time=0.019..0.188 rows=2486 loops=1)
         TID Cond: ((ctid >= '(0,1)'::tid) AND (ctid < '(11,0)'::tid))
         Buffers: shared hit=11


If I put that code back in, how many pages are read depends on the
number of parallel workers as workers will keep running with higher
page numbers and heap_getnextslot_tidrange() will just (inefficiently)
filter those out.

max_parallel_workers_per_gather=2;
   ->  Parallel Tid Range Scan on a  (cost=0.01..16.19 rows=1035 width=0) (actual time=0.191..0.310 rows=829 loops=3)
         TID Cond: ((ctid >= '(0,1)'::tid) AND (ctid < '(11,0)'::tid))
         Buffers: shared read=17

max_parallel_workers_per_gather=3;
   ->  Parallel Tid Range Scan on a  (cost=0.01..12.54 rows=802 width=0) (actual time=0.012..0.114 rows=622 loops=4)
         TID Cond: ((ctid >= '(0,1)'::tid) AND (ctid < '(11,0)'::tid))
         Buffers: shared hit=19

max_parallel_workers_per_gather=4;
   ->  Parallel Tid Range Scan on a  (cost=0.01..9.72 rows=621 width=0) (actual time=0.014..0.135 rows=497 loops=5)
         TID Cond: ((ctid >= '(0,1)'::tid) AND (ctid < '(11,0)'::tid))
         Buffers: shared hit=21

To fix this you need to make table_block_parallelscan_nextpage obey
the limits imposed by heap_setscanlimits().

The equivalent code in the non-parallel version is in
heapgettup_advance_block().

/* check if the limit imposed by heap_setscanlimits() is met */
if (scan->rs_numblocks != InvalidBlockNumber)
{
    if (--scan->rs_numblocks == 0)
        return InvalidBlockNumber;
}

I've not studied exactly how you'd get the rs_numblock information
down to the parallel scan descriptor.  But when you figure that out,
just remember that you can't do the --scan->rs_numblocks from
table_block_parallelscan_nextpage() as that's not parallel safe. You
might be able to add an or condition to: "if (nallocated >=
pbscan->phs_nblocks)" to make it "if (nallocated >=
pbscan->phs_nblocks || nallocated >= pbscan->phs_numblocks)",
although the field names don't seem very intuitive there. It would be
nicer if the HeapScanDesc field was called rs_blocklimit rather than
rs_numblocks.  It's not for this patch to go messing with that,
however.

Re: Support tid range scan in parallel?

2024-05-03 Thread Cary Huang
Hello

> -- parallel plan
> postgres=# set max_parallel_workers_per_gather=2;
> SET
> postgres=# select count(*) from t where ctid >= '(0,0)' and ctid < '(10,0)';
>  count
> ---
>  0
> (1 row)
> 
> I've not really looked into why, but I see you're not calling
> heap_setscanlimits() in parallel mode. You need to somehow restrict
> the block range of the scan to the range specified in the quals. You
> might need to do more work to make the scan limits work with parallel
> scans.

I found that the very first select count(*) using a parallel tid range scan
returns the correct result, but the same subsequent queries return 0, as you
stated. This is due to "pscan->phs_syncscan" being set to true in
ExecTidRangeScanInitializeDSM(), inherited from the parallel seq scan case.
With syncscan enabled, "table_block_parallelscan_nextpage()" returns the next
block after the end of the first tid range scan instead of the correct start
block that should be scanned. I see that a single tid range scan does not have
SO_ALLOW_SYNC set, so I figure syncscan should also be disabled in the
parallel case. With this change, it is then okay to call heap_setscanlimits()
in the parallel case, so I added that call back to heap_set_tidrange() in both
the serial and parallel cases.


> 2. There's a 4 line comment you've added to cost_tidrangescan() which
> is just a copy and paste from cost_seqscan().  If you look at the
> seqscan costing, the comment is true in that scenario, but not true
> where you've pasted it.  The I/O cost is all tied in to run_cost.

thanks for pointing that out; I have removed these incorrect comments

> 3. Calling TidRangeQualFromRestrictInfoList() once for the serial path
> and again for the partial path isn't great. It would be good to just
> call that function once and use the result for both path types.

good point. I moved the adding of the tid range scan partial path inside
create_tidscan_paths(), where TidRangeQualFromRestrictInfoList() is already
called for the serial path, so I can just reuse tidrangequals when it is
appropriate to consider a parallel tid range scan.

> 4. create_tidrangescan_subpaths() seems like a weird name for a
> function.  That seems to imply that scans have subpaths. Scans are
> always leaf paths and have no subpaths.

I removed this oddly named function; it is not needed because its logic has
moved into create_tidscan_paths(), where it can reuse tidrangequals.

> It would be good to see the EXPLAIN (ANALYZE, BUFFERS) with SET
> track_io_timing = 1;

the v2 patch is attached; it should address the issues above. Below are the
EXPLAIN outputs with track_io_timing = 1 in my environment. Generally, the
parallel tid range scan shows higher I/O timings but shorter execution times.


SET track_io_timing = 1;

===serial tid rangescan===

EXPLAIN (ANALYZE, BUFFERS) select a from test where ctid >= '(0,0)' and ctid < '(216216,40)';
                                                      QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------
 Tid Range Scan on test  (cost=0.01..490815.59 rows=27459559 width=4) (actual time=0.072..10143.770 rows=3999 loops=1)
   TID Cond: ((ctid >= '(0,0)'::tid) AND (ctid < '(216216,40)'::tid))
   Buffers: shared hit=298 read=215919 written=12972
   I/O Timings: shared read=440.277 write=58.525
 Planning:
   Buffers: shared hit=2
 Planning Time: 0.289 ms
 Execution Time: 12497.081 ms
(8 rows)

set parallel_setup_cost=0;
set parallel_tuple_cost=0;
set min_parallel_table_scan_size=0;
set max_parallel_workers_per_gather=2;

===parallel tid rangescan===

EXPLAIN (ANALYZE, BUFFERS) select a from test where ctid >= '(0,0)' and ctid < '(216216,40)';
                                                        QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------
 Gather  (cost=0.01..256758.88 rows=4130 width=4) (actual time=0.878..7083.705 rows=3999 loops=1)
   Workers Planned: 2
   Workers Launched: 2
   Buffers: shared read=216217
   I/O Timings: shared read=1224.153
   ->  Parallel Tid Range Scan on test  (cost=0.01..256758.88 rows=1721 width=4) (actual time=0.256..3980.770 rows=1333 loops=3)
         TID Cond: ((ctid >= '(0,0)'::tid) AND (ctid < '(216216,40)'::tid))
         Buffers: shared read=216217
         I/O Timings: shared read=1224.153
 Planning Time: 0.258 ms
 Execution Time: 9731.800 ms
(11 rows)

===serial tid rangescan with aggregate===

set max_parallel_workers_per_gather=0;

EXPLAIN (ANALYZE, BUFFERS) select count(a) from test where ctid >= '(0,0)' and ctid < '(216216,40)';
                                  QUERY PLAN
------------------------------------------------------------------------------
 Aggregate  (cost=716221.63..716221.64 rows=1 width=8) (actual

Re: Support tid range scan in parallel?

2024-05-01 Thread Cary Huang
 > This isn't a complete review. It's just that this seems enough to keep
 > you busy for a while. I can look a bit harder when the patch is
 > working correctly. I think you should have enough feedback to allow
 > that now.

Thanks for the test, review and feedback. They are greatly appreciated! 
I will polish the patch some more following your feedback and share new
results / patch when I have them. 

Thanks again!

Cary






Re: Support tid range scan in parallel?

2024-05-01 Thread David Rowley
On Wed, 1 May 2024 at 07:10, Cary Huang  wrote:
> Yes of course. These numbers were obtained earlier this year on master with 
> the patch applied most likely without the read stream code you mentioned. The 
> patch attached here is rebased to commit 
> dd0183469bb779247c96e86c2272dca7ff4ec9e7 on master, which is quite recent and 
> should have the read stream code for v17 as I can immediately tell that the 
> serial scans run much faster now in my setup. I increased the records on the 
> test table from 40 to 100 million because serial scans are much faster now. 
> Below is the summary and details of my test. Note that I only include the 
> EXPLAIN ANALYZE details of round1 test. Round2 is the same except for 
> different execution times.

It would be good to see the EXPLAIN (ANALYZE, BUFFERS) with SET
track_io_timing = 1;

Here's a quick review

1. Does not produce correct results:

-- serial plan
postgres=# select count(*) from t where ctid >= '(0,0)' and ctid < '(10,0)';
 count
---
  2260
(1 row)

-- parallel plan
postgres=# set max_parallel_workers_per_gather=2;
SET
postgres=# select count(*) from t where ctid >= '(0,0)' and ctid < '(10,0)';
 count
---
 0
(1 row)

I've not really looked into why, but I see you're not calling
heap_setscanlimits() in parallel mode. You need to somehow restrict
the block range of the scan to the range specified in the quals. You
might need to do more work to make the scan limits work with parallel
scans.

If you look at heap_scan_stream_read_next_serial(), it's calling
heapgettup_advance_block(), where there's  "if (--scan->rs_numblocks
== 0)".  But no such equivalent code in
table_block_parallelscan_nextpage() called by
heap_scan_stream_read_next_parallel().  To make Parallel TID Range
work, you'll need heap_scan_stream_read_next_parallel() to abide by
the scan limits.

2. There's a 4 line comment you've added to cost_tidrangescan() which
is just a copy and paste from cost_seqscan().  If you look at the
seqscan costing, the comment is true in that scenario, but not true
where you've pasted it.  The I/O cost is all tied in to run_cost.

+ /* The CPU cost is divided among all the workers. */
+ run_cost /= parallel_divisor;
+
+ /*
+ * It may be possible to amortize some of the I/O cost, but probably
+ * not very much, because most operating systems already do aggressive
+ * prefetching.  For now, we assume that the disk run cost can't be
+ * amortized at all.
+ */

3. Calling TidRangeQualFromRestrictInfoList() once for the serial path
and again for the partial path isn't great. It would be good to just
call that function once and use the result for both path types.

4. create_tidrangescan_subpaths() seems like a weird name for a
function.  That seems to imply that scans have subpaths. Scans are
always leaf paths and have no subpaths.

This isn't a complete review. It's just that this seems enough to keep
you busy for a while. I can look a bit harder when the patch is
working correctly. I think you should have enough feedback to allow
that now.

David




Re: Support tid range scan in parallel?

2024-04-30 Thread Cary Huang
Hi David

Thank you for your reply.

> From a CPU point of view, I'd find it hard to imagine that a SELECT * query
> without any other items in the WHERE clause other than the TID range
> quals would run faster with multiple workers than with 1.  The problem
> is the overhead of pushing tuples to the main process often outweighs
> the benefits of the parallelism.  However, from an I/O point of view
> on a server with slow enough disks, I can imagine there'd be a
> speedup.

yeah, this is generally true. With everything set to default, the planner would
not choose a parallel sequential scan if the scan range covers mostly all the
tuples of a table (to reduce the overhead of pushing tuples to the main
process, as you mentioned). Parallelism is preferred when the target data is
small but the table is huge. My case is the same: the planner by default uses
a normal tid range scan, so I had to alter cost parameters to influence the
planner's decision. This is where I found that a query whose WHERE clause
contains only TID ranges covering the entire table would still run faster with
parallel workers, at least in my environment.

> Of course, it may be beneficial to have parallel TID Range for other
> cases when more row filtering or aggregation is being done as that
> requires pushing fewer tuples over from the parallel worker to the
> main process. It just would be good to get to the bottom of if there's
> still any advantage to parallelism when no filtering other than the
> ctid quals is being done now that we've less chance of having to wait
> for I/O coming from disk with the read streams code.

I believe so too. I shared my test procedure below with ctid being the only
qual.

>> below is the timing to complete a select query covering all the records in a 
>> simple 2-column table with 40 million records,
>>
>> - tid range scan takes 10216ms
>> - tid range scan with 2 workers takes 7109ms
>> - sequential scan with 2 workers takes 8499ms
>
> Can you share more details about this test? i.e. the query, what the
> times are that you've measured (EXPLAIN ANALYZE, or SELECT, COPY?).
> Also, which version/commit did you patch against? I was wondering if
> the read stream code added in v17 would result in the serial case
> running faster because the parallelism just resulted in more I/O
> concurrency.

Yes of course. These numbers were obtained earlier this year on master with the 
patch applied most likely without the read stream code you mentioned. The patch 
attached here is rebased to commit dd0183469bb779247c96e86c2272dca7ff4ec9e7 on 
master, which is quite recent and should have the read stream code for v17 as I 
can immediately tell that the serial scans run much faster now in my setup. I 
increased the records on the test table from 40 to 100 million because serial 
scans are much faster now. Below is the summary and details of my test. Note 
that I only include the EXPLAIN ANALYZE details of round1 test. Round2 is the 
same except for different execution times. 

[env]
- OS: Ubuntu 18.04
- CPU: 4 cores @ 3.40 GHz
- MEM: 16 GB

[test table setup]
initdb with all default values
CREATE TABLE test (a INT, b TEXT);
INSERT INTO test VALUES(generate_series(1,100000000), 'testing');
SELECT min(ctid), max(ctid) from test;
  min  |     max
-------+--------------
 (0,1) | (540540,100)
(1 row)

[summary]
round 1:
tid range scan: 14915ms
tid range scan 2 workers: 12265ms
seq scan with 2 workers: 12675ms

round2:
tid range scan: 12112ms
tid range scan 2 workers: 10649ms
seq scan with 2 workers: 11206ms

[details of EXPLAIN ANALYZE below]

[default tid range scan]
EXPLAIN ANALYZE SELECT a FROM test WHERE ctid >= '(1,0)' AND ctid <= '(540540,100)';
                                                       QUERY PLAN
------------------------------------------------------------------------------------------------------------------------
 Tid Range Scan on test  (cost=0.01..1227029.81 rows=68648581 width=4) (actual time=0.188..12280.791 rows=9815 loops=1)
   TID Cond: ((ctid >= '(1,0)'::tid) AND (ctid <= '(540540,100)'::tid))
 Planning Time: 0.817 ms
 Execution Time: 14915.035 ms
(4 rows)

[parallel tid range scan with 2 workers]
set parallel_setup_cost=0;
set parallel_tuple_cost=0;
set min_parallel_table_scan_size=0;
set max_parallel_workers_per_gather=2;

EXPLAIN ANALYZE SELECT a FROM test WHERE ctid >= '(1,0)' AND ctid <= '(540540,100)';
                                                          QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------
 Gather  (cost=0.01..511262.43 rows=68648581 width=4) (actual time=1.322..9249.197 rows=9815 loops=1)
   Workers Planned: 2
   Workers Launched: 2
   ->  Parallel Tid Range Scan on test  (cost=0.01..511262.43 rows=28603575 width=4) (actual time=0.332..4906.262 rows=3272 loops=3)
         TID Cond: ((ctid >= '(1,0)'::tid) AND (ctid <= '(540540,100)'::tid))
 Planning Time: 0.213 ms

Re: Support tid range scan in parallel?

2024-04-29 Thread David Rowley
On Tue, 30 Apr 2024 at 10:36, Cary Huang  wrote:
> In one of our migration scenarios, we rely on tid range scan to migrate huge 
> table from one database to another once the lower and upper ctid bound is 
> determined. With the support of parallel ctid range scan, this process could 
> be done much quicker.

I would have thought that the best way to migrate would be to further
divide the TID range into N segments and run N queries, one per
segment to get the data out.

From a CPU point of view, I'd find it hard to imagine that a SELECT * query
without any other items in the WHERE clause other than the TID range
quals would run faster with multiple workers than with 1.  The problem
is the overhead of pushing tuples to the main process often outweighs
the benefits of the parallelism.  However, from an I/O point of view
on a server with slow enough disks, I can imagine there'd be a
speedup.

> The attached patch is my approach to add parallel ctid range scan to 
> PostgreSQL's planner and executor. In my tests, I do see an increase in 
> performance using parallel tid range scan over the single worker tid range 
> scan and it is also faster than parallel sequential scan covering similar 
> ranges. Of course, the table needs to be large enough to reflect the 
> performance increase.
>
> below is the timing to complete a select query covering all the records in a 
> simple 2-column table with 40 million records,
>
>  - tid range scan takes 10216ms
>  - tid range scan with 2 workers takes 7109ms
>  - sequential scan with 2 workers takes 8499ms

Can you share more details about this test? i.e. the query, what the
times are that you've measured (EXPLAIN ANALYZE, or SELECT, COPY?).
Also, which version/commit did you patch against?  I was wondering if
the read stream code added in v17 would result in the serial case
running faster because the parallelism just resulted in more I/O
concurrency.

Of course, it may be beneficial to have parallel TID Range for other
cases when more row filtering or aggregation is being done as that
requires pushing fewer tuples over from the parallel worker to the
main process. It just would be good to get to the bottom of if there's
still any advantage to parallelism when no filtering other than the
ctid quals is being done now that we've less chance of having to wait
for I/O coming from disk with the read streams code.

David




Support tid range scan in parallel?

2024-04-29 Thread Cary Huang
Hello

When using ctid as a restriction clause with lower and upper bounds,
PostgreSQL's planner will use a TID range scan plan to handle such a query.
This works and is generally fine. However, if the ctid range covers a huge
amount of data, the planner will not use parallel workers to perform the ctid
range scan because it is not supported. It could, however, still choose to use
a parallel sequential scan to complete the scan if it costs less.

In one of our migration scenarios, we rely on tid range scan to migrate a huge
table from one database to another once the lower and upper ctid bounds are
determined. With support for parallel ctid range scan, this process could be
done much more quickly.

The attached patch is my approach to adding parallel ctid range scan to
PostgreSQL's planner and executor. In my tests, I do see an increase in
performance using parallel tid range scan over the single-worker tid range
scan, and it is also faster than a parallel sequential scan covering similar
ranges. Of course, the table needs to be large enough to reflect the
performance increase.

below is the timing to complete a select query covering all the records in a
simple 2-column table with 40 million records,

 - tid range scan takes 10216ms
 - tid range scan with 2 workers takes 7109ms
 - sequential scan with 2 workers takes 8499ms

Having support for parallel ctid range scan is definitely helpful in our
migration case, and I am sure it could be useful in other cases as well. I am
sharing the patch here, and if someone could provide quick feedback or a
review that would be greatly appreciated.

Thank you!

Cary Huang
-
HighGo Software Inc. (Canada)
cary.hu...@highgo.ca
www.highgo.ca

v1-0001-add-parallel-tid-rangescan.patch
Description: Binary data