[HACKERS] using custom scan nodes to prototype parallel sequential scan

Robert Haas Mon, 10 Nov 2014 07:58:34 -0800

On Wed, Oct 15, 2014 at 2:55 PM, Simon Riggs <si...@2ndquadrant.com> wrote:
> Something usable, with severe restrictions, is actually better than we
> have now. I understand the journey this work represents, so don't be
> embarrassed by submitting things with heuristics and good-enoughs in
> it. Our mentor, Mr. Lane, achieved much by spreading work over many
> releases, leaving others to join in the task.


It occurs to me that, now that the custom-scan stuff is committed, it
wouldn't be that hard to use that, plus the other infrastructure we
already have, to write a prototype of parallel sequential scan.  Given
where we are with the infrastructure, there would be a number of
unhandled problems, such as deadlock detection (needs group locking or
similar), assessment of quals as to parallel-safety (needs
proisparallel or similar), general waterproofing to make sure that
pushing down a qual we shouldn't doesn't do anything really dastardly
like crash the server (another written but yet-to-be-published patch
adds a bunch of relevant guards), and snapshot sharing (likewise).
But if you don't do anything weird, it should basically work.

I think this would be useful for a couple of reasons.  First, it would
be a demonstrable show of progress, illustrating how close we are to
actually having something you can really deploy.  Second, we could use
it to demonstrate how the remaining infrastructure patches close up
gaps in the initial prototype.  Third, it would let us start doing
real performance testing.  It seems pretty clear that a parallel
sequential scan of data that's in memory (whether the page cache or
the OS cache) can be accelerated by having multiple processes scan it
in parallel.  But it's much less clear what will happen when the data
is being read in from disk.  Does parallelism help at all?  What
degree of parallelism helps?  Do we break OS readahead so badly that
performance actually regresses?  These are things that are likely to
need a fair amount of tuning before this is ready for prime time, so
being able to start experimenting with them in advance of all of the
infrastructure being completely ready seems like it might help.
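
As a starting point for that kind of experiment, here is a minimal
sketch (not taken from any posted patch) of one way workers could
divide up the blocks of a relation: each worker claims the next
fixed-size chunk of block numbers from a shared atomic counter.  The
names ScanShared, scan_shared_init, scan_claim_chunk, and chunk_size
are hypothetical, and a real prototype would presumably keep this
state in dynamic shared memory rather than in a plain struct.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical shared scan state.  Assumption: in a real prototype this
 * would live in shared memory visible to every worker process.
 */
typedef struct
{
    atomic_uint_fast64_t next_block;  /* first block not yet claimed */
    uint64_t nblocks;                 /* total blocks in the relation */
    uint64_t chunk_size;              /* blocks handed out per claim; must be > 0 */
} ScanShared;

static void
scan_shared_init(ScanShared *s, uint64_t nblocks, uint64_t chunk_size)
{
    atomic_init(&s->next_block, 0);
    s->nblocks = nblocks;
    s->chunk_size = chunk_size;
}

/*
 * Claim the next chunk of blocks as the half-open range [*start, *end).
 * Returns false once every block has been handed out.
 */
static bool
scan_claim_chunk(ScanShared *s, uint64_t *start, uint64_t *end)
{
    /* fetch-and-add gives each caller a distinct starting block */
    uint64_t first = atomic_fetch_add(&s->next_block, s->chunk_size);

    if (first >= s->nblocks)
        return false;

    *start = first;
    *end = first + s->chunk_size;
    if (*end > s->nblocks)
        *end = s->nblocks;      /* last chunk may be short */
    return true;
}
```

Each worker would loop on scan_claim_chunk() and scan the blocks in
the range it gets back.  The chunk size is the knob that bears on the
readahead question: larger chunks keep each worker's reads sequential
for longer, presumably at the cost of less even load balancing.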

Thoughts?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

