On Tuesday, July 21, 2020 7:33 PM, Amit Kapila wrote:
> On Tue, Jul 21, 2020 at 3:08 PM k.jami...@fujitsu.com <k.jami...@fujitsu.com>
> wrote:
> >
> > On Tuesday, July 21, 2020 12:18 PM, Amit Kapila wrote:
> > > On Tue, Jul 21, 2020 at 8:06 AM k.jami...@fujitsu.com
> > > <k.jami...@fujitsu.com>
> > > wrote:
> > > >
> > > > I am definitely missing something. Perhaps I think I could not
> > > > understand why there's no I/O difference between the Master and
> > > > Patched (V2). Or has it been already improved even without this patch?
> > > >
> > >
> > > I don't think it is strange that you are not seeing much difference
> > > because as per the initial email by Thomas this patch is not
> > > supposed to give benefits on all systems.  I think we wanted to
> > > check that the patch should not regress performance in cases where
> > > it doesn't give benefits.  I think it might be okay to run with a
> > > higher number of workers than you have CPUs in the system as we
> > > wanted to check if such cases regress as shown by Soumyadeep above
> > > [1].  Can you once try with
> > > 8 and or 10 workers as well?
> > >
> >
> > You are right. Kindly excuse me on that part, which only means it may
> > or may not have any benefits on the filesystem I am using. But for
> > other fs, as we can see from David's benchmarks significant 
> > results/benefits.
> >
> > Following your advice on regression test case, I increased the number
> > of workers, but the query planner still capped the workers at 6, so
> > the results from 6 workers onwards are almost the same.
> >
> 
> I am slightly confused if the number of workers are capped at 6, then what 
> exactly
> the data at 32 worker count means?  If you want query planner to choose more
> number of workers, then I think either you need to increase the data or use 
> Alter
> Table <tbl_name> Set (parallel_workers = <num_workers_you_want>);

Oops, I'm sorry, the "workers" labelled in those tables actually mean 
max_parallel_workers_per_gather and not parallel_workers. I thought that the 
_per_gather setting is what controls the planned/launched worker counts in the 
query planner, and those are the numbers I used in the tables.
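
To be explicit about which knob is which (using the same t_heap table as in the 
examples below), the two settings I am talking about can be inspected like this:

SHOW max_parallel_workers_per_gather;     -- per-query limit; this is what the
                                          -- "workers" column in my tables meant
SELECT reloptions FROM pg_class
 WHERE relname = 't_heap';                -- shows parallel_workers only if it has
                                          -- been set with ALTER TABLE ... SET (...)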

I used the defaults for max_parallel_workers & max_worker_processes, which are 
both 8 in postgresql.conf. IOW, I ran all those tests with a maximum of 8 worker 
processes. But the query planner capped both Workers Planned and Launched at 6 
for some reason, even when I increased max_parallel_workers_per_gather.
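
If I understand the planner code correctly, that cap comes from its table-size 
heuristic (compute_parallel_worker): starting from min_parallel_table_scan_size 
(1024 blocks = 8MB by default), one worker is added each time the table triples 
in size, so a table of ~442478 blocks works out to 6 workers unless 
parallel_workers is set on the table. Just as an illustration, roughly the same 
number can be reproduced from SQL:

SELECT 1 + floor(log(3, relpages / 1024.0))::int AS estimated_workers
  FROM pg_class WHERE relname = 't_heap';   -- ~442478 blocks => 6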

However, when I used ALTER TABLE ... SET (parallel_workers = N) based on your 
suggestion, the query planner used that value only for "Workers Planned", not 
for "Workers Launched". The behavior also differs when I additionally set 
max_worker_processes and max_parallel_workers to parallel_workers + 1. My guess 
is that the planned count comes from the parallel_workers reloption, while 
actually launching workers still depends on free slots under max_parallel_workers 
/ max_worker_processes, and one slot may already be in use by another background 
worker (e.g. the logical replication launcher); that would explain why 
parallel_workers + 1 lets all planned workers launch.

For example (run on Master):
1. max_worker_processes and max_parallel_workers set to the same value as 
parallel_workers: "Workers Launched" and "Workers Planned" do not match.
max_worker_processes = 8
max_parallel_workers = 8
ALTER TABLE t_heap SET (parallel_workers = 8);
ALTER TABLE
SET max_parallel_workers_per_gather = 8;
SET
test=# EXPLAIN (ANALYZE, BUFFERS) SELECT count(*) from t_heap;
                                                                    QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------
 Finalize Aggregate  (cost=619778.66..619778.67 rows=1 width=8) (actual time=16316.295..16316.295 rows=1 loops=1)
   Buffers: shared read=442478 dirtied=442478 written=442222
   ->  Gather  (cost=619777.83..619778.64 rows=8 width=8) (actual time=16315.528..16316.668 rows=8 loops=1)
         Workers Planned: 8
         Workers Launched: 7
         Buffers: shared read=442478 dirtied=442478 written=442222
         ->  Partial Aggregate  (cost=618777.83..618777.84 rows=1 width=8) (actual time=16305.092..16305.092 rows=1 loops=8)
               Buffers: shared read=442478 dirtied=442478 written=442222
               ->  Parallel Seq Scan on t_heap  (cost=0.00..583517.86 rows=14103986 width=0) (actual time=0.725..14290.117 rows=12500000 loops=8)
                     Buffers: shared read=442478 dirtied=442478 written=442222
 Planning Time: 5.327 ms
   Buffers: shared hit=17 read=10
 Execution Time: 16316.915 ms
(13 rows)

2. max_worker_processes and max_parallel_workers set to parallel_workers + 1: 
"Workers Launched" matches "Workers Planned".
max_worker_processes = 9
max_parallel_workers = 9

ALTER TABLE t_heap SET (parallel_workers = 8);
ALTER TABLE
SET max_parallel_workers_per_gather = 8;
SET

test=# EXPLAIN (ANALYZE, BUFFERS) SELECT count(*) from t_heap;
                                                                    QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------
 Finalize Aggregate  (cost=619778.66..619778.67 rows=1 width=8) (actual time=16783.944..16783.944 rows=1 loops=1)
   Buffers: shared read=442478 dirtied=442478 written=442190
   ->  Gather  (cost=619777.83..619778.64 rows=8 width=8) (actual time=16783.796..16785.474 rows=9 loops=1)
         Workers Planned: 8
         Workers Launched: 8
         Buffers: shared read=442478 dirtied=442478 written=442190
         ->  Partial Aggregate  (cost=618777.83..618777.84 rows=1 width=8) (actual time=16770.218..16770.218 rows=1 loops=9)
               Buffers: shared read=442478 dirtied=442478 written=442190
               ->  Parallel Seq Scan on t_heap  (cost=0.00..583517.86 rows=14103986 width=0) (actual time=6.004..14967.329 rows=11111111 loops=9)
                     Buffers: shared read=442478 dirtied=442478 written=442190
 Planning Time: 4.755 ms
   Buffers: shared hit=17 read=10
 Execution Time: 16785.719 ms
(13 rows)



Kind regards,
Kirk Jamison
