On Wed, Jun 26, 2013 at 8:55 AM, Markus Wanner <mar...@bluegap.ch> wrote:
> On 06/26/2013 05:46 PM, Heikki Linnakangas wrote:
> > We could also allow a large query to search a single table in
> > parallel. A seqscan would be easy to divide into N equally-sized
> > parts that can be scanned in parallel. It's more difficult for index
> > scans, but even then it might be possible at least in some limited
> > cases.
>
> So far reading sequentially is still faster than hopping between
> different locations. Purely from the I/O perspective, that is.

Wouldn't any I/O system used on a high-end machine be fairly good at making this work through interleaved read-ahead algorithms?

Also, hopefully the planner would be able to predict when parallelization has nothing to add and avoid using it, although that is surely easier said than done.

> For queries where the single CPU core turns into a bottleneck and
> which we want to parallelize, we should ideally still do a normal,
> fully sequential scan and only fan out after the scan, distributing
> the incoming pages (or even tuples) to the multiple cores to process.

That sounds like it would be much more susceptible to lock contention, and harder to get bug-free, than dividing the work into bigger chunks, like whole 1 GB segments.

Fanning out line by line (according to line_number % number_processes) was my favorite parallelization method in Perl, but those files were read-only and so had no concurrency issues.

Cheers,

Jeff
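For what it's worth, the modulo fan-out idea looks roughly like this — a minimal Python sketch rather than the original Perl, with the worker's per-line work (here just uppercasing) standing in for whatever processing each core would do. Because each worker selects a disjoint subset of a read-only input by line_number % number_processes, no locking is needed:

```python
import multiprocessing


def process_chunk(args):
    """Process only the lines assigned to one worker by line_number % number_processes."""
    lines, worker_id, number_processes = args
    # Read-only input and disjoint line subsets: no concurrency issues.
    return [line.upper()
            for i, line in enumerate(lines)
            if i % number_processes == worker_id]


def fan_out(lines, number_processes):
    """Fan the lines out across number_processes workers, modulo-style."""
    with multiprocessing.Pool(number_processes) as pool:
        return pool.map(process_chunk,
                        [(lines, w, number_processes)
                         for w in range(number_processes)])


if __name__ == "__main__":
    lines = ["alpha", "beta", "gamma", "delta"]
    # Worker 0 gets lines 0 and 2; worker 1 gets lines 1 and 3.
    print(fan_out(lines, 2))  # → [['ALPHA', 'GAMMA'], ['BETA', 'DELTA']]
```

The contrast with the page- or tuple-level fan-out discussed above is that here the partitioning is decided up front from the line number alone, so workers never contend on a shared queue.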