On Wed, Jun 26, 2013 at 8:55 AM, Markus Wanner <mar...@bluegap.ch> wrote:
> On 06/26/2013 05:46 PM, Heikki Linnakangas wrote:
> > We could also allow a large query to search a single table in
> > parallel. A seqscan would be easy to divide into N equally-sized
> > parts that can be scanned in parallel. It's more difficult for index
> > scans, but even then it might be possible at least in some limited
> > cases.
>
> So far reading sequentially is still faster than hopping between
> different locations. Purely from the I/O perspective, that is.

Wouldn't any I/O system used on a high-end machine be fairly good at making this work through interleaved read-ahead algorithms?

Also, hopefully the planner would be able to predict when parallelization has nothing to add and avoid using it, although that is surely easier said than done.

> For queries where the single CPU core turns into a bottleneck and
> which we want to parallelize, we should ideally still do a normal,
> fully sequential scan and only fan out after the scan, distributing
> the incoming pages (or even tuples) to the multiple cores to process.

That sounds like it would be much more susceptible to lock contention, and harder to get bug-free, than dividing the work into bigger chunks, like whole 1 GB segments.

Fanning out line by line (according to line_number % number_processes) was my favorite parallelization method in Perl, but those files were read-only and so had no concurrency issues.

Cheers,

Jeff
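For what it's worth, the modulo fan-out idea looks roughly like this — a minimal Python sketch rather than the original Perl, with the worker's per-line work (here just uppercasing) standing in for whatever processing each core would do. Because each worker selects a disjoint subset of a read-only input by line_number % number_processes, no locking is needed:

```python
import multiprocessing


def process_chunk(args):
    """Process only the lines assigned to one worker by line_number % number_processes."""
    lines, worker_id, number_processes = args
    # Read-only input and disjoint line subsets: no concurrency issues.
    return [line.upper()
            for i, line in enumerate(lines)
            if i % number_processes == worker_id]


def fan_out(lines, number_processes):
    """Fan the lines out across number_processes workers, modulo-style."""
    with multiprocessing.Pool(number_processes) as pool:
        return pool.map(process_chunk,
                        [(lines, w, number_processes)
                         for w in range(number_processes)])


if __name__ == "__main__":
    lines = ["alpha", "beta", "gamma", "delta"]
    # Worker 0 gets lines 0 and 2; worker 1 gets lines 1 and 3.
    print(fan_out(lines, 2))  # → [['ALPHA', 'GAMMA'], ['BETA', 'DELTA']]
```

The contrast with the page- or tuple-level fan-out discussed above is that here the partitioning is decided up front from the line number alone, so workers never contend on a shared queue.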