On Sat, Feb 21, 2015 at 12:57 AM, Amit Kapila <amit.kapil...@gmail.com> wrote:
> On Wed, Feb 18, 2015 at 6:44 PM, Andres Freund <and...@2ndquadrant.com>
> wrote:
>> On 2015-02-18 16:59:26 +0530, Amit Kapila wrote:
>>
>> > There could be some cases where it could be beneficial for worker
>> > to process a sub-tree, but I think there will be more cases where
>> > it will just work on a part of node and send the result back to either
>> > master backend or another worker for further processing.
>>
>> I think many parallelism projects start out that way, and then notice
>> that it doesn't parallelize very efficiently.
>>
>> The most extreme example, but common, is aggregation over large amounts
>> of data - unless you want to ship huge amounts of data between processes
>> to parallelize it you have to do the sequential scan and the
>> pre-aggregate step (that e.g. selects count() and sum() to implement an
>> avg over all the workers) inside one worker.
>>
>
> OTOH if someone wants to parallelize scan (including expensive qual) and
> sort then it will be better to perform scan (or part of scan by one worker)
> and sort by other worker.

There is a performance problem if we perform the SCAN in one worker and the
SORT in another worker: every tuple has to be transferred twice, first from
the scan worker to the sort worker and then from the sort worker to the
backend. Tuple transfer between processes is a costly operation, so it is
better to combine the SCAN and SORT operations into a single worker's job.
This can be targeted once the parallel scan code is stable.

Regards,
Hari Babu
Fujitsu Australia
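
To put rough numbers on the tuple-transfer argument, here is a minimal sketch that only counts how many times tuples cross a process boundary under the two arrangements. It is not PostgreSQL code: the function names and the "hops" model are made up for illustration, and it ignores everything else that affects cost (the CPU cost of the qual, sort memory, queue sizes).

```python
# Hypothetical cost model: count inter-process tuple transfers only.
# Nothing here corresponds to an actual PostgreSQL function or API.

def tuple_transfers(matching_rows: int, hops: int) -> int:
    """Each tuple that passes the qual crosses `hops` process boundaries."""
    return matching_rows * hops


def compare_plans(matching_rows: int) -> dict:
    """Compare SCAN and SORT in separate workers against one combined worker."""
    return {
        # scan worker -> sort worker -> backend: two transfers per tuple
        "split": tuple_transfers(matching_rows, hops=2),
        # (scan + sort) worker -> backend: one transfer per tuple
        "combined": tuple_transfers(matching_rows, hops=1),
    }


if __name__ == "__main__":
    print(compare_plans(100_000))
```

Under this simplified model the split arrangement always moves twice as many tuples as the combined one, which is the overhead described above. Whether that overhead outweighs the gain from running an expensive qual and the sort concurrently would have to be measured.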