Thomas Munro <[email protected]> writes:
> I'll address the instability of the regression test output separately.
If you're still looking for data on that --- prairiedog is able to
reproduce the "multibatch = f" variant about one time in thirty.
I modified the test case to print out the full EXPLAIN ANALYZE output
rather than a heavily filtered version. Here's a typical successful run:
! explain analyze
select length(max(s.t))
from wide left join (select id, coalesce(t, '') || '' as t from wide) s
using (id);
! QUERY PLAN
! ------------------------------------------------------------------------------------------------------------------------------------------------
!  Finalize Aggregate  (cost=97.49..97.51 rows=1 width=4) (actual time=409.330..409.331 rows=1 loops=1)
!    ->  Gather  (cost=97.28..97.49 rows=2 width=32) (actual time=376.957..395.841 rows=3 loops=1)
!          Workers Planned: 2
!          Workers Launched: 2
!          ->  Partial Aggregate  (cost=97.28..97.29 rows=1 width=32) (actual time=254.370..254.373 rows=1 loops=3)
!                ->  Parallel Hash Left Join  (cost=23.23..88.88 rows=3360 width=32) (actual time=240.812..241.297 rows=1 loops=3)
!                      Hash Cond: (wide.id = wide_1.id)
!                      ->  Parallel Seq Scan on wide  (cost=0.00..15.29 rows=529 width=4) (actual time=0.066..0.075 rows=1 loops=3)
!                      ->  Parallel Hash  (cost=16.61..16.61 rows=529 width=36) (actual time=109.565..109.565 rows=1 loops=3)
!                            Buckets: 1024 (originally 2048)  Batches: 8 (originally 1)  Memory Usage: 321kB
!                            ->  Parallel Seq Scan on wide wide_1  (cost=0.00..16.61 rows=529 width=36) (actual time=2.989..7.218 rows=1 loops=3)
!  Planning time: 1.317 ms
!  Execution time: 424.395 ms
! (13 rows)
and here's a failing run:
! explain analyze
select length(max(s.t))
from wide left join (select id, coalesce(t, '') || '' as t from wide) s
using (id);
! QUERY PLAN
! -------------------------------------------------------------------------------------------------------------------------------------------------
!  Finalize Aggregate  (cost=97.49..97.51 rows=1 width=4) (actual time=232.440..232.441 rows=1 loops=1)
!    ->  Gather  (cost=97.28..97.49 rows=2 width=32) (actual time=225.738..226.744 rows=3 loops=1)
!          Workers Planned: 2
!          Workers Launched: 2
!          ->  Partial Aggregate  (cost=97.28..97.29 rows=1 width=32) (actual time=29.377..29.379 rows=1 loops=3)
!                ->  Parallel Hash Left Join  (cost=23.23..88.88 rows=3360 width=32) (actual time=22.747..25.340 rows=1 loops=3)
!                      Hash Cond: (wide.id = wide_1.id)
!                      ->  Parallel Seq Scan on wide  (cost=0.00..15.29 rows=529 width=4) (actual time=0.086..0.113 rows=2 loops=1)
!                      ->  Parallel Hash  (cost=16.61..16.61 rows=529 width=36) (actual time=16.382..16.382 rows=1 loops=3)
!                            Buckets: 1024 (originally 2048)  Batches: 1 (originally 1)  Memory Usage: 0kB
!                            ->  Parallel Seq Scan on wide wide_1  (cost=0.00..16.61 rows=529 width=36) (actual time=9.167..21.301 rows=2 loops=1)
!  Planning time: 1.289 ms
!  Execution time: 243.120 ms
! (13 rows)
I don't have enough insight to be totally sure what this means, but the
"Memory Usage: 0kB" bit is obviously bogus, so I'd venture that at least
part of the issue is failure to return stats from a worker. I also find
it most curious that the "success" case is a lot slower than the "not
success" case. Perhaps this is related to your livelock issue? Doing
another run, I get something even slower:
! explain analyze
select length(max(s.t))
from wide left join (select id, coalesce(t, '') || '' as t from wide) s
using (id);
! QUERY PLAN
! -------------------------------------------------------------------------------------------------------------------------------------------------
!  Finalize Aggregate  (cost=97.49..97.51 rows=1 width=4) (actual time=487.245..487.246 rows=1 loops=1)
!    ->  Gather  (cost=97.28..97.49 rows=2 width=32) (actual time=444.650..475.390 rows=3 loops=1)
!          Workers Planned: 2
!          Workers Launched: 2
!          ->  Partial Aggregate  (cost=97.28..97.29 rows=1 width=32) (actual time=345.816..345.819 rows=1 loops=3)
!                ->  Parallel Hash Left Join  (cost=23.23..88.88 rows=3360 width=32) (actual time=334.229..338.098 rows=1 loops=3)
!                      Hash Cond: (wide.id = wide_1.id)
!                      ->  Parallel Seq Scan on wide  (cost=0.00..15.29 rows=529 width=4) (actual time=0.065..0.074 rows=1 loops=3)
!                      ->  Parallel Hash  (cost=16.61..16.61 rows=529 width=36) (actual time=140.210..140.210 rows=1 loops=3)
!                            Buckets: 1024 (originally 2048)  Batches: 8 (originally 1)  Memory Usage: 321kB
!                            ->  Parallel Seq Scan on wide wide_1  (cost=0.00..16.61 rows=529 width=36) (actual time=4.233..15.117 rows=1 loops=3)
!  Planning time: 1.380 ms
!  Execution time: 509.607 ms
! (13 rows)
Aside from the instability problems, I'm pretty unhappy about how much
the PHJ patch has added to the runtime of "make check". I do not think
any one feature can justify adding 20% to that. Can't you cut down the
amount of data processed by these new test cases?
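For what it's worth, here is a sketch of the sort of reduction I have in
mind.  Everything in it is a guess on my part, not the actual test: the
table name, row count, value width, and work_mem setting would all need
tuning to still force the multibatch case with fewer rows.

```sql
-- Hypothetical sketch only: shrink the test data and lower work_mem so
-- the Parallel Hash still goes multibatch despite far fewer rows.
set work_mem = '64kB';

-- Much smaller substitute for the existing "wide" table.
create table wide_small as
  select g as id, repeat('x', 512) as t
  from generate_series(1, 100) g;
analyze wide_small;

-- Same query shape as the test, against the reduced table.
explain analyze
select length(max(s.t))
from wide_small left join
     (select id, coalesce(t, '') || '' as t from wide_small) s
  using (id);
```

If a low work_mem can keep the Batches count above 1, the test should
exercise the same code path at a fraction of the runtime.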
regards, tom lane