Re: [HACKERS] Parallel query execution with SPI
On 31.03.2017 13:48, Robert Haas wrote:
> On Fri, Mar 31, 2017 at 3:33 AM, Konstantin Knizhnik wrote:
>> Is it possible to execute a query in parallel using SPI? If so, how can it be enforced?
>> I tried to open a cursor with the CURSOR_OPT_PARALLEL_OK flag, but it doesn't help:
>> the query is executed by a single backend, while the same query launched at top level
>> uses a parallel plan:
>>
>> fsstate->portal = SPI_cursor_open_with_args(NULL, fsstate->query,
>> fsstate->numParams, argtypes, values, nulls, true, CURSOR_OPT_PARALLEL_OK);
>> ...
>> SPI_cursor_fetch(fsstate->portal, true, 1);
>
> Parallel execution isn't possible if you are using a cursor-type interface, because a
> parallel query can't be suspended and resumed like a non-parallel query. If you use a
> function that executes the query to completion in one go, like SPI_execute_plan, then
> it's cool. See also commit 61c2e1a95f94bb904953a6281ce17a18ac38ee6d.

Thank you very much for the explanation. With SPI_execute, the query is indeed executed in parallel. But this means that when I execute a query using SPI, I need to somehow predict the number of returned tuples. If it is small, then it is better to use SPI_execute to allow parallel execution of the query. But if it is large, SPI_execute without a limit can cause memory overflow. Certainly I can specify some reasonable limit and, if it is reached, switch to a cursor instead. But that is neither convenient nor efficient. I wonder if somebody can suggest a better solution?

-- Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
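[Editor's note: as a reference for later readers, here is a minimal sketch of the approach Robert describes above -- prepare the statement with CURSOR_OPT_PARALLEL_OK and run it to completion with SPI_execute_plan, instead of fetching through a cursor. This is an illustrative fragment, not tested code: it assumes it runs inside extension code that has already called SPI_connect(), and the fsstate fields simply mirror the snippet quoted above.]

```c
/* Sketch only: run the query to completion so a parallel plan is usable. */
SPIPlanPtr plan = SPI_prepare_cursor(fsstate->query,
                                     fsstate->numParams,
                                     argtypes,
                                     CURSOR_OPT_PARALLEL_OK);

/*
 * count = 0 means "fetch all rows".  A nonzero count (like a cursor
 * fetch) would let execution stop early, which at the time of this
 * thread also disabled parallelism in ExecutePlan.
 */
int ret = SPI_execute_plan(plan, values, nulls,
                           true /* read_only */ , 0 /* count */ );

if (ret == SPI_OK_SELECT)
{
    /*
     * SPI_processed rows are now materialized in SPI_tuptable.  A large
     * result set is held fully in memory -- the trade-off Konstantin
     * raises in his follow-up.
     */
}
```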
Re: [HACKERS] Parallel query execution with SPI
On Fri, Mar 31, 2017 at 4:18 PM, Robert Haas wrote:
> On Fri, Mar 31, 2017 at 3:33 AM, Konstantin Knizhnik wrote:
>> It is possible to execute query concurrently using SPI? If so, how it can be enforced?
>> I tried to open cursor with CURSOR_OPT_PARALLEL_OK flag but it doesn't help:
>> query is executed by single backend while the same query been launched at
>> top level uses parallel plan:
>>
>> fsstate->portal = SPI_cursor_open_with_args(NULL, fsstate->query,
>> fsstate->numParams, argtypes, values, nulls, true, CURSOR_OPT_PARALLEL_OK);
>> ...
>> SPI_cursor_fetch(fsstate->portal, true, 1);
>
> Parallel execution isn't possible if you are using a cursor-type
> interface, because a parallel query can't be suspended and resumed
> like a non-parallel query. If you use a function that executes the
> query to completion in one go, like SPI_execute_plan, then it's cool.
> See also commit 61c2e1a95f94bb904953a6281ce17a18ac38ee6d.

Adding to that, for your case, passing CURSOR_OPT_PARALLEL_OK is not enough, because PortalRun for the cursor would have portal->run_once set to false, which restricts parallelism in ExecutePlan:

    if (!execute_once || dest->mydest == DestIntoRel)
        use_parallel_mode = false;

You may check [1] for the discussion on this.

[1] https://www.postgresql.org/message-id/flat/CAFiTN-vxhvvi-rMJFOxkGzNaQpf%2BKS76%2Bsu7-sG_NQZGRPJkQg%40mail.gmail.com

-- Regards, Rafia Sabih
EnterpriseDB: http://www.enterprisedb.com/
Re: [HACKERS] Parallel query execution with SPI
On Fri, Mar 31, 2017 at 3:33 AM, Konstantin Knizhnik wrote:
> It is possible to execute query concurrently using SPI? If so, how it can be enforced?
> I tried to open cursor with CURSOR_OPT_PARALLEL_OK flag but it doesn't help:
> query is executed by single backend while the same query been launched at
> top level uses parallel plan:
>
> fsstate->portal = SPI_cursor_open_with_args(NULL, fsstate->query,
> fsstate->numParams, argtypes, values, nulls, true, CURSOR_OPT_PARALLEL_OK);
> ...
> SPI_cursor_fetch(fsstate->portal, true, 1);

Parallel execution isn't possible if you are using a cursor-type interface, because a parallel query can't be suspended and resumed like a non-parallel query. If you use a function that executes the query to completion in one go, like SPI_execute_plan, then it's cool. See also commit 61c2e1a95f94bb904953a6281ce17a18ac38ee6d.

-- Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] [parallel query] random server crash while running tpc-h query on power2
On Tue, Aug 16, 2016 at 1:05 AM, Rushabh Lathia wrote:
> I agree, this makes sense.
>
> Here is the patch to allocate worker instrumentation in the same context
> as the regular instrumentation, which is the per-query context.

Looks good, committed. I am not sure it was a very good idea for af33039317ddc4a0e38a02e2255c2bf453115fd2 by Tom Lane to change the current memory context for the entire execution of gather_readnext(); this might not be the only or the last bug that results from that decision. However, I don't really want to get into an argument about that right now, and this at least fixes the problem we know about. Thanks for the report and patch.

-- Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] [parallel query] random server crash while running tpc-h query on power2
On Mon, Aug 15, 2016 at 6:02 PM, Robert Haas wrote:
> On Sat, Aug 13, 2016 at 4:36 AM, Amit Kapila wrote:
>> AFAICS, your patch seems to be the right fix for this issue, unless we
>> need the instrumentation information during execution (other than for
>> explain) for some purpose.
>
> Hmm, I disagree. It should be the job of
> ExecParallelRetrieveInstrumentation to allocate its data in the
> correct context, not the responsibility of nodeGather.c to work around
> the fact that it doesn't. The worker instrumentation should be
> allocated in the same context as the regular instrumentation
> information, which I assume is probably the per-query context.

I agree, this makes sense.

Here is the patch to allocate the worker instrumentation in the same context as the regular instrumentation, which is the per-query context. PFA patch.

-- Rushabh Lathia
www.EnterpriseDB.com

diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index 380d743..5aa6f02 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -500,6 +500,7 @@ ExecParallelRetrieveInstrumentation(PlanState *planstate,
 	int			n;
 	int			ibytes;
 	int			plan_node_id = planstate->plan->plan_node_id;
+	MemoryContext oldcontext;
 
 	/* Find the instumentation for this node. */
 	for (i = 0; i < instrumentation->num_plan_nodes; ++i)
@@ -514,10 +515,19 @@ ExecParallelRetrieveInstrumentation(PlanState *planstate,
 	for (n = 0; n < instrumentation->num_workers; ++n)
 		InstrAggNode(planstate->instrument, &instrument[n]);
 
-	/* Also store the per-worker detail. */
+	/*
+	 * Also store the per-worker detail.
+	 *
+	 * Worker instrumentation should be allocated in the same context as
+	 * the regular instrumentation information, which is the per-query
+	 * context. Switch into per-query memory context.
+	 */
+	oldcontext = MemoryContextSwitchTo(planstate->state->es_query_cxt);
 	ibytes = mul_size(instrumentation->num_workers, sizeof(Instrumentation));
 	planstate->worker_instrument =
 		palloc(ibytes + offsetof(WorkerInstrumentation, instrument));
+	MemoryContextSwitchTo(oldcontext);
+
 	planstate->worker_instrument->num_workers = instrumentation->num_workers;
 	memcpy(&planstate->worker_instrument->instrument, instrument, ibytes);
Re: [HACKERS] [parallel query] random server crash while running tpc-h query on power2
On Sat, Aug 13, 2016 at 4:36 AM, Amit Kapila wrote:
> AFAICS, your patch seems to be the right fix for this issue, unless we
> need the instrumentation information during execution (other than for
> explain) for some purpose.

Hmm, I disagree. It should be the job of ExecParallelRetrieveInstrumentation to allocate its data in the correct context, not the responsibility of nodeGather.c to work around the fact that it doesn't. The worker instrumentation should be allocated in the same context as the regular instrumentation information, which I assume is probably the per-query context.

-- Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] [parallel query] random server crash while running tpc-h query on power2
On Sat, Aug 13, 2016 at 11:10 AM, Rushabh Lathia wrote:
> Hi All,
>
> Recently, while running tpc-h queries on the postgresql master branch, I
> noticed random server crashes. Most of the time the crash comes while
> running tpc-h query number 3 (but it's very random).
>
> Here it is clear that worker_instrument is either corrupted or
> uninitialized, and that is the reason it ends up with a server crash.
>
> With a bit more debugging and a look at the git history, I found that the
> issue started with commit af33039317ddc4a0e38a02e2255c2bf453115fd2.
> gather_readnext() calls ExecShutdownGatherWorkers() when nreaders == 0.
> ExecShutdownGatherWorkers() calls ExecParallelFinish(), which collects the
> instrumentation before marking ParallelExecutorInfo as finished.
> ExecParallelRetrieveInstrumentation() does the allocation of
> planstate->worker_instrument.
>
> With commit af33039317 we now call gather_readnext() with the per-tuple
> context, but with nreaders == 0, through ExecShutdownGatherWorkers() we
> end up allocating planstate->worker_instrument in the per-tuple context --
> which is wrong.
>
> Now the fix can be:
>
> 1) Avoid calling ExecShutdownGatherWorkers() from gather_readnext() and
> let ExecEndGather() do those things.

I don't think we can wait till ExecEndGather() to collect statistics, as we need it before that for the explain path. However, we do call ExecShutdownNode() from ExecutePlan() when there are no more tuples, which can take care of ensuring the shutdown of the Gather node. I think the advantage of calling it in gather_readnext() is that it allows resources to be released early and populates the instrumentation/statistics as early as possible.

> But with this change, gather_readnext() and gather_getnext() depend on
> the planstate->reader structure to continue reading tuples. Now either we
> can change those conditions to depend on planstate->nreaders, or just
> pfree(planstate->reader) in gather_readnext() instead of calling
> ExecShutdownGatherWorkers().
>
> Attaching patch, which fixes the issue with approach 1).

AFAICS, your patch seems to be the right fix for this issue, unless we need the instrumentation information during execution (other than for explain) for some purpose.

-- With Regards, Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] Parallel query and temp_file_limit
On Tue, Jul 5, 2016 at 3:59 PM, Peter Geoghegan wrote:
> On Tue, Jul 5, 2016 at 12:58 PM, Tom Lane wrote:
>> Perhaps we could change the wording of temp_file_limit's description
>> from "space that a session can use" to "space that a process can use"
>> to help clarify this?
>
> That's all that I was looking for, really.

OK, done that way.

-- Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] Parallel query and temp_file_limit
Peter Geoghegan writes:
> On Tue, Jul 5, 2016 at 12:00 PM, Robert Haas wrote:
>> I think that it is not worth mentioning specifically for
>> temp_file_limit; to me that seems to be a hole with no bottom. We'll
>> end up arguing about which GUCs should mention it specifically and
>> there will be no end to it.
>
> I don't think that you need it for any other GUC, so I really don't
> know why you're concerned about a slippery slope.

FWIW, I agree with Robert on this. It seems just weird to call out temp_file_limit specifically. Also, I don't agree that that's the only interesting per-process resource consumption; max_files_per_process seems much more likely to cause trouble in practice.

Perhaps we could change the wording of temp_file_limit's description from "space that a session can use" to "space that a process can use" to help clarify this?

regards, tom lane
Re: [HACKERS] Parallel query and temp_file_limit
On Tue, Jul 5, 2016 at 12:58 PM, Tom Lane wrote:
> Perhaps we could change the wording of temp_file_limit's description
> from "space that a session can use" to "space that a process can use"
> to help clarify this?

That's all that I was looking for, really.

-- Peter Geoghegan
Re: [HACKERS] Parallel query and temp_file_limit
On Tue, Jul 5, 2016 at 12:00 PM, Robert Haas wrote:
> I think that it is not worth mentioning specifically for
> temp_file_limit; to me that seems to be a hole with no bottom. We'll
> end up arguing about which GUCs should mention it specifically and
> there will be no end to it.

I don't think that you need it for any other GUC, so I really don't know why you're concerned about a slippery slope. The only other resource GUC that is scoped per session that I can see is temp_buffers, but that doesn't need to change, since parallel workers cannot use temp_buffers directly in practice. max_files_per_process is already clearly per process, so no change is needed there either. I don't see a case other than temp_file_limit that appears to be even marginally in need of a specific note.

-- Peter Geoghegan
Re: [HACKERS] Parallel query and temp_file_limit
On Tue, Jul 5, 2016 at 1:58 PM, Peter Geoghegan wrote:
> On Tue, Jul 5, 2016 at 7:45 AM, Robert Haas wrote:
>> Since Peter doesn't seem in a hurry to produce a patch for this issue,
>> I wrote one. It is attached. I'll commit this in a day or two if
>> nobody objects.
>
> Sorry about the delay.
>
> Your patch seems reasonable, but I thought we'd also want to change
> "per session" to "per session (with an additional temp_file_limit
> allowance within each parallel worker)" for temp_file_limit.
>
> I think it's worthwhile noting this for temp_file_limit specifically,
> since it's explicitly a per-session limit, whereas users are quite
> used to the idea that work_mem might be doled out multiple times for
> multiple executor nodes.

I think that it is not worth mentioning specifically for temp_file_limit; to me that seems to be a hole with no bottom. We'll end up arguing about which GUCs should mention it specifically and there will be no end to it. We can see what other people think, but that's my position.

-- Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] Parallel query and temp_file_limit
On Tue, Jul 5, 2016 at 7:45 AM, Robert Haas wrote:
> Since Peter doesn't seem in a hurry to produce a patch for this issue,
> I wrote one. It is attached. I'll commit this in a day or two if
> nobody objects.

Sorry about the delay.

Your patch seems reasonable, but I thought we'd also want to change "per session" to "per session (with an additional temp_file_limit allowance within each parallel worker)" for temp_file_limit.

I think it's worthwhile noting this for temp_file_limit specifically, since it's explicitly a per-session limit, whereas users are quite used to the idea that work_mem might be doled out multiple times for multiple executor nodes.

-- Peter Geoghegan
Re: [HACKERS] Parallel query and temp_file_limit
On Tue, Jun 21, 2016 at 8:15 AM, Robert Haas wrote:
> On Mon, Jun 20, 2016 at 11:01 PM, Tom Lane wrote:
>> Peter Geoghegan writes:
>>> On Wed, May 18, 2016 at 3:40 AM, Robert Haas wrote:
>>>> What I'm tempted to do is to try to document that, as a point of
>>>> policy, parallel query in 9.6 uses up to (workers + 1) times the
>>>> resources that a single session might use. That includes not only CPU
>>>> but also things like work_mem and temp file space. This obviously
>>>> isn't ideal, but it's what could be done by the ship date.
>>>
>>> Where would that be documented, though? Would it need to be noted in
>>> the case of each such GUC?
>>
>> Why can't we just note this in the number-of-workers GUCs? It's not like
>> there even *is* a GUC for many of our per-process resource consumption
>> behaviors.
>
> +1.

Since Peter doesn't seem in a hurry to produce a patch for this issue, I wrote one. It is attached. I'll commit this in a day or two if nobody objects.

-- Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachment: parallel-workers-guc-doc.patch
Re: [HACKERS] Parallel query and temp_file_limit
On Mon, Jun 20, 2016 at 11:01 PM, Tom Lane wrote:
> Peter Geoghegan writes:
>> On Wed, May 18, 2016 at 3:40 AM, Robert Haas wrote:
>>> What I'm tempted to do is to try to document that, as a point of
>>> policy, parallel query in 9.6 uses up to (workers + 1) times the
>>> resources that a single session might use. That includes not only CPU
>>> but also things like work_mem and temp file space. This obviously
>>> isn't ideal, but it's what could be done by the ship date.
>>
>> Where would that be documented, though? Would it need to be noted in
>> the case of each such GUC?
>
> Why can't we just note this in the number-of-workers GUCs? It's not like
> there even *is* a GUC for many of our per-process resource consumption
> behaviors.

+1.

-- Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] Parallel query and temp_file_limit
Peter Geoghegan writes:
> On Wed, May 18, 2016 at 3:40 AM, Robert Haas wrote:
>> What I'm tempted to do is to try to document that, as a point of
>> policy, parallel query in 9.6 uses up to (workers + 1) times the
>> resources that a single session might use. That includes not only CPU
>> but also things like work_mem and temp file space. This obviously
>> isn't ideal, but it's what could be done by the ship date.
>
> Where would that be documented, though? Would it need to be noted in
> the case of each such GUC?

Why can't we just note this in the number-of-workers GUCs? It's not like there even *is* a GUC for many of our per-process resource consumption behaviors.

regards, tom lane
Re: [HACKERS] Parallel query and temp_file_limit
On Wed, May 18, 2016 at 3:40 AM, Robert Haas wrote:
> What I'm tempted to do is to try to document that, as a point of
> policy, parallel query in 9.6 uses up to (workers + 1) times the
> resources that a single session might use. That includes not only CPU
> but also things like work_mem and temp file space. This obviously
> isn't ideal, but it's what could be done by the ship date.

Where would that be documented, though? Would it need to be noted in the case of each such GUC?

-- Peter Geoghegan
Re: [HACKERS] Parallel query and temp_file_limit
On Tue, Jun 7, 2016 at 8:32 AM, Robert Haas wrote:
> You previously offered to write a patch for this. Are you still
> planning to do that?

OK, I'll get to that in the next few days. I'm slightly concerned that I might have missed a real problem in the code. I'll need to examine the issue more closely.

-- Peter Geoghegan
Re: [HACKERS] Parallel query and temp_file_limit
On Sun, Jun 5, 2016 at 4:32 PM, Peter Geoghegan wrote:
> On Wed, May 18, 2016 at 12:01 PM, Peter Geoghegan wrote:
>>> I think for 9.6 we just have to document this issue. In the next
>>> release, we could (and might well want to) try to do something more
>>> clever.
>>
>> Works for me. You may wish to update comments within fd.c at the same time.
>
> I've created a 9.6 open issue for this.

You previously offered to write a patch for this. Are you still planning to do that?

-- Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] Parallel query and temp_file_limit
On Wed, May 18, 2016 at 12:01 PM, Peter Geoghegan wrote:
>> I think for 9.6 we just have to document this issue. In the next
>> release, we could (and might well want to) try to do something more
>> clever.
>
> Works for me. You may wish to update comments within fd.c at the same time.

I've created a 9.6 open issue for this.

-- Peter Geoghegan
Re: [HACKERS] Parallel query
On Sun, May 22, 2016 at 10:36 AM, Tatsuo Ishii wrote:
>> The brief introduction of MPI (Message Passing Interface) is at the
>> following URLs. It is a message protocol used for parallel computing,
>> just like DSM does in parallel query. The DSM plays a message-passing
>> role (in fact, by passing the query plan/raw node tree to another
>> worker) in parallel query. I think parallel query resembles MPI, so I
>> mentioned that we can refer to the MPI benchmark and use the ideas
>> that are used to test parallel computing systems. If parallel query is
>> to be a feature in the future, I think we must have another benchmark
>> for this feature, just like tpcc does. So, I mentioned MPI.
>>
>> https://www.open-mpi.org/
>>
>> https://en.wikipedia.org/wiki/Message_Passing_Interface
>
> Thank you for the info.

Ishii-san is doing so... Please be sure to press "reply-all" when answering an email on the community mailing lists. It is hard to follow this discussion.

-- Michael
Re: [HACKERS] Parallel query
> The brief introduction of MPI (Message Passing Interface) is at the
> following URLs. It is a message protocol used for parallel computing,
> just like DSM does in parallel query. The DSM plays a message-passing
> role (in fact, by passing the query plan/raw node tree to another
> worker) in parallel query. I think parallel query resembles MPI, so I
> mentioned that we can refer to the MPI benchmark and use the ideas
> that are used to test parallel computing systems. If parallel query is
> to be a feature in the future, I think we must have another benchmark
> for this feature, just like tpcc does. So, I mentioned MPI.
>
> https://www.open-mpi.org/
>
> https://en.wikipedia.org/wiki/Message_Passing_Interface

Thank you for the info.

Best regards,
-- Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp
Re: [HACKERS] Parallel query
What's MPI?

Best regards,
-- Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp

> Maybe we can refer to the MPI test cases.
>
> On Sun, May 22, 2016 at 3:19 PM, Hao Lee wrote:
>> What kind of cases do you want to run? Besides the multiple cores, I
>> think the working memory and its access rate are also main criteria.
>> As you know, parallel query uses DSM as its IPC tool, which means it
>> will hit the memory access barrier; the memory bus has its access-rate
>> limitation. Different system architectures, such as different CPU
>> architectures, will also be considered when we do the performance
>> test. Do we need to consider what I mentioned above?
>>
>> Best Regards,
>> Hao LEE.
>>
>> On Thu, May 19, 2016 at 11:07 PM, Tatsuo Ishii wrote:
>>> Robert,
>>> (and others who are involved in parallel query of PostgreSQL)
>>>
>>> PostgreSQL Enterprise Consortium (one of the PostgreSQL communities in
>>> Japan, in short "PGECons") is planning to test the parallel query
>>> performance of PostgreSQL 9.6. Besides TPC-H (I know you have already
>>> tested on an IBM box), what kind of tests would you like performed?
>>>
>>> We are planning to use a big Intel box (with more than 60 cores).
>>> Any suggestions are welcome.
>>>
>>> Best regards,
>>> -- Tatsuo Ishii
>>> SRA OSS, Inc. Japan
Re: [HACKERS] Parallel query
Thank you for the suggestion. Currently no particular test cases are in my mind; that's the reason why I need input from the community. Regarding the test schedule, PGECons starts the planning from next month or so, so I guess testing starts no earlier than July.

Best regards,
-- Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp

> What kind of cases do you want to run? Besides the multiple cores, I
> think the working memory and its access rate are also main criteria.
> As you know, parallel query uses DSM as its IPC tool, which means it
> will hit the memory access barrier; the memory bus has its access-rate
> limitation. Different system architectures, such as different CPU
> architectures, will also be considered when we do the performance
> test. Do we need to consider what I mentioned above?
>
> Best Regards,
> Hao LEE.
>
> On Thu, May 19, 2016 at 11:07 PM, Tatsuo Ishii wrote:
>> Robert,
>> (and others who are involved in parallel query of PostgreSQL)
>>
>> PostgreSQL Enterprise Consortium (one of the PostgreSQL communities in
>> Japan, in short "PGECons") is planning to test the parallel query
>> performance of PostgreSQL 9.6. Besides TPC-H (I know you have already
>> tested on an IBM box), what kind of tests would you like performed?
>>
>> We are planning to use a big Intel box (with more than 60 cores).
>> Any suggestions are welcome.
>>
>> Best regards,
>> -- Tatsuo Ishii
>> SRA OSS, Inc. Japan
Re: [HACKERS] Parallel query and temp_file_limit
On Wed, May 18, 2016 at 3:40 AM, Robert Haas wrote:
>> I'll write a patch to fix the issue, if there is a consensus on a solution.
>
> I think for 9.6 we just have to document this issue. In the next
> release, we could (and might well want to) try to do something more
> clever.

Works for me. You may wish to update comments within fd.c at the same time.

-- Peter Geoghegan
Re: [HACKERS] Parallel query and temp_file_limit
On 18 May 2016 at 22:40, Robert Haas wrote:
> On Tue, May 17, 2016 at 6:40 PM, Peter Geoghegan wrote:
>> On Tue, May 17, 2016 at 3:33 PM, Peter Geoghegan wrote:
>>> Fundamentally, since temporary_files_size enforcement simply
>>> piggy-backs on low-level fd.c file management, without any
>>> consideration of what the temp files contain, it'll be hard to be sure
>>> that parallel workers will not have issues. I think it'll be far
>>> easier to fix the problem than it would be to figure out if it's
>>> possible to get away with it.
>>
>> I'll write a patch to fix the issue, if there is a consensus on a solution.
>
> I think for 9.6 we just have to document this issue. In the next
> release, we could (and might well want to) try to do something more
> clever.
>
> What I'm tempted to do is to try to document that, as a point of
> policy, parallel query in 9.6 uses up to (workers + 1) times the
> resources that a single session might use. That includes not only CPU
> but also things like work_mem and temp file space. This obviously
> isn't ideal, but it's what could be done by the ship date.

I was asked (internally, I believe) about abuse of work_mem during my work on parallel aggregates; at the time I didn't really feel I was abusing it any more than parallel hash join was. My thought was that one day it would be nice if work_mem could be granted to a query, and we had some query-marshalling system which ensured that the total grants did not exceed the server-wide memory dedicated to work_mem. Of course that's lots of work, as there's at least one node (HashAgg) which can still blow out work_mem on bad estimates.

For this release, I assumed it wouldn't be too big an issue if we're shipping with max_parallel_degree = 0, as we could just decorate the docs with warnings that work_mem is per node / per worker, to caution users against setting this setting any higher. That might be enough to give us wriggle room for the future where we can make improvements, so I agree with Robert: the docs seem like the best solution for 9.6.

-- David Rowley
http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Parallel query and temp_file_limit
On Tue, May 17, 2016 at 6:40 PM, Peter Geoghegan wrote:
> On Tue, May 17, 2016 at 3:33 PM, Peter Geoghegan wrote:
>> Fundamentally, since temporary_files_size enforcement simply
>> piggy-backs on low-level fd.c file management, without any
>> consideration of what the temp files contain, it'll be hard to be sure
>> that parallel workers will not have issues. I think it'll be far
>> easier to fix the problem than it would be to figure out if it's
>> possible to get away with it.
>
> I'll write a patch to fix the issue, if there is a consensus on a solution.

I think for 9.6 we just have to document this issue. In the next release, we could (and might well want to) try to do something more clever.

What I'm tempted to do is to try to document that, as a point of policy, parallel query in 9.6 uses up to (workers + 1) times the resources that a single session might use. That includes not only CPU but also things like work_mem and temp file space. This obviously isn't ideal, but it's what could be done by the ship date.

-- Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] Parallel query and temp_file_limit
On Tue, May 17, 2016 at 3:33 PM, Peter Geoghegan wrote: > Fundamentally, since temporary_files_size enforcement simply > piggy-backs on low-level fd.c file management, without any > consideration of what the temp files contain, it'll be hard to be sure > that parallel workers will not have issues. I think it'll be far > easier to fix the problem than it would be to figure out if it's > possible to get away with it. I'll write a patch to fix the issue, if there is a consensus on a solution. -- Peter Geoghegan
Re: [HACKERS] Parallel query and temp_file_limit
On Tue, May 17, 2016 at 1:53 PM, Amit Kapila wrote: > What kind of special treatment are you expecting for temporary_files_size, > and why do you think it is required? Currently we neither build hashes in > parallel nor is there any form of parallel sort work. I expect only that temporary_files_size be described accurately, and have new behavior for parallel query that is not surprising. There are probably several solutions that would meet that standard, and I am not attached to any particular one of them. I wrote a parallel sort patch already (CREATE INDEX for the B-Tree AM), and will post it at an opportune time. So, I think we can expect your observations about there not being parallel sort work to no longer apply in a future release, which we should get ahead of now. Also, won't parallel workers that build their own copy of the hash table (for a hash join) also use their own temp files, if there is a need for temp files? I think parallel query will end up sharing temp files fairly often, and not just out of convenience to implementers (that is, not just to avoid using shared memory extensively). Fundamentally, since temporary_files_size enforcement simply piggy-backs on low-level fd.c file management, without any consideration of what the temp files contain, it'll be hard to be sure that parallel workers will not have issues. I think it'll be far easier to fix the problem than it would be to figure out if it's possible to get away with it. -- Peter Geoghegan
Re: [HACKERS] Parallel query and temp_file_limit
On Wed, May 18, 2016 at 12:55 AM, Peter Geoghegan wrote: > > temp_file_limit "specifies the maximum amount of disk space that a > session can use for temporary files, such as sort and hash temporary > files", according to the documentation. That's not true when parallel > query is in use, since the global variable temporary_files_size > receives no special treatment for parallel query. > What kind of special treatment are you expecting for temporary_files_size, and why do you think it is required? Currently we neither build hashes in parallel nor is there any form of parallel sort work. With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] parallel query vs extensions
On Mon, Apr 18, 2016 at 09:56:28AM -0400, Robert Haas wrote: > On Fri, Apr 15, 2016 at 12:45 AM, Jeff Janes wrote: > > Should every relevant contrib extension get a version bump with a > > transition file which is nothing but a list of "alter function blah > > blah blah parallel safe" ? > > Yes, I think that's what we would need to do. It's a lot of work, > albeit mostly mechanical. This is in the open items list, but I think it is too late to include such a change in 9.6. This is an opportunity for further optimization, not a defect.
Re: [HACKERS] parallel query vs extensions
On Fri, Apr 15, 2016 at 12:45 AM, Jeff Janes wrote: > Should every relevant contrib extension get a version bump with a > transition file which is nothing but a list of "alter function blah > blah blah parallel safe" ? Yes, I think that's what we would need to do. It's a lot of work, albeit mostly mechanical. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
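Such a transition file would be almost purely mechanical. For the xml2 example in the thread it might look like the following (hypothetical version numbers; only functions whose bodies are actually known to be parallel safe should be listed):

```sql
-- xml2--1.0--1.1.sql: an upgrade script whose only job is to mark
-- existing functions parallel safe (PARALLEL SAFE requires 9.6+).
ALTER FUNCTION xml_valid(text) PARALLEL SAFE;
ALTER FUNCTION xpath_string(text, text) PARALLEL SAFE;
ALTER FUNCTION xpath_number(text, text) PARALLEL SAFE;
-- ...one line per function, for each function in the extension.
```

The control file's default_version would then be bumped to match, so a plain ALTER EXTENSION ... UPDATE picks up the new markings.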
Re: [HACKERS] parallel query vs extensions
On 15 April 2016 at 12:45, Jeff Janes wrote: > I think there are a lot of extensions which create functions which > could benefit from being declared parallel safe. But how does one go > about doing that? > > create extension xml2; > select xml_valid(filler),count(*) from pgbench_accounts group by 1; > Time: 3205.830 ms > > alter function xml_valid (text) parallel safe; > > select xml_valid(filler),count(*) from pgbench_accounts group by 1; > Time: 1803.936 ms > > (Note that I have no particular interest in the xml2 extension, it > just provides a convenient demonstration of the general principle) > > Should every relevant contrib extension get a version bump with a > transition file which is nothing but a list of "alter function blah > blah blah parallel safe" ? > > And what of non-contrib extensions? Is there some clever alternative > to having a bunch of pseudo versions, like "1.0", "1.0_96", "1.1", > "1.1_9.6", "1.2", "1.2_96", etc.? > What I've done in the past for similar problems is preprocess the extension--x.y.sql files in the Makefile to conditionally remove unsupported syntax, functions, etc. It's rather less than perfect, because if the user pg_upgrades they won't get the now-supported options. They'll have the old-version extension on the new version and would have to drop & re-create to get the new version contents. You could create variant pseudo-extensions to make this clearer - myext95--1.0.sql, myext96--1.0.sql, etc - but there's still no way to ALTER EXTENSION to upgrade. Pseudo-versions like you suggest are probably going to work, but the extension machinery doesn't understand them and you can only specify one of them as the default in the control file. -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Parallel query fails on standby server
On Wed, Mar 9, 2016 at 12:34 AM, Robert Haas wrote: > On Tue, Mar 8, 2016 at 8:23 AM, Michael Paquier > wrote: >> On Tue, Mar 8, 2016 at 9:51 PM, Craig Ringer wrote: >>> On 8 March 2016 at 20:30, Ashutosh Sharma wrote: While testing a parallel scan feature on standby server, it is found that the parallel query fails with an error "ERROR: failed to initialize transaction_read_only to 0". >>> >>> Looks like it might be a good idea to add some tests to src/test/recovery >>> for parallel query on standby servers... >> >> An even better thing would be a set of read-only tests based on the >> database "regression" generated by make check, itself run with >> pg_regress. > > I'm not sure anything in the main regression suite actually goes > parallel right now, which is probably the first thing to fix. > > Unless, of course, you use force_parallel_mode=regress, max_parallel_degree>0. I was thinking about a test in src/test/recovery that runs a master and a standby. pg_regress with the main recovery test suite is run on the master, then a second pg_regress run happens with a set of read-only queries, with sql/ and expected/ located directly in src/test/recovery, for example. Do we actually have a buildfarm animal using those parameters in extra_config? -- Michael
Re: [HACKERS] Parallel query fails on standby server
On Tue, Mar 8, 2016 at 8:23 AM, Michael Paquier wrote: > On Tue, Mar 8, 2016 at 9:51 PM, Craig Ringer wrote: >> On 8 March 2016 at 20:30, Ashutosh Sharma wrote: >>> >>> While testing a parallel scan feature on standby server, it is found that >>> the parallel query fails with an error "ERROR: failed to initialize >>> transaction_read_only to 0". >>> >> >> Looks like it might be a good idea to add some tests to src/test/recovery >> for parallel query on standby servers... > > An even better thing would be a set of read-only tests based on the > database "regression" generated by make check, itself run with > pg_regress. I'm not sure anything in the main regression suite actually goes parallel right now, which is probably the first thing to fix. Unless, of course, you use force_parallel_mode=regress, max_parallel_degree>0. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
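For reference, the settings Robert mentions would look like this in a test session (9.6-era GUC names; max_parallel_degree was later renamed max_parallel_workers_per_gather):

```sql
-- force_parallel_mode=regress pushes a Gather node on top of any
-- parallel-safe plan while suppressing it from EXPLAIN output, so the
-- regression suite's expected files still match.
SET force_parallel_mode = regress;
SET max_parallel_degree = 2;  -- allow up to two workers per query
```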
Re: [HACKERS] Parallel query fails on standby server
On Tue, Mar 8, 2016 at 9:51 PM, Craig Ringer wrote: > On 8 March 2016 at 20:30, Ashutosh Sharma wrote: >> >> >> While testing a parallel scan feature on standby server, it is found that >> the parallel query fails with an error "ERROR: failed to initialize >> transaction_read_only to 0". >> > > Looks like it might be a good idea to add some tests to src/test/recovery > for parallel query on standby servers... An even better thing would be a set of read-only tests based on the database "regression" generated by make check, itself run with pg_regress. -- Michael
Re: [HACKERS] Parallel query fails on standby server
On 8 March 2016 at 20:30, Ashutosh Sharma wrote: > > While testing a parallel scan feature on standby server, it is found that > the parallel query fails with an error "ERROR: failed to initialize > transaction_read_only to 0". > > Looks like it might be a good idea to add some tests to src/test/recovery for parallel query on standby servers... -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Parallel query execution
On Tuesday, January 15, 2013 at 2:14 PM, Bruce Momjian wrote: I mentioned last year that I wanted to start working on parallelism: https://wiki.postgresql.org/wiki/Parallel_Query_Execution I believe it is time to start adding parallel execution to the backend. We already have some parallelism in the backend: effective_io_concurrency and helper processes. I think it is time we start to consider additional options. Parallelism isn't going to help all queries; in fact it might be just a small subset, but it will be the larger queries. The pg_upgrade parallelism only helps clusters with multiple databases or tablespaces, but the improvements are significant. I just got out of a meeting that included Oracle Spatial folks, who were boasting of big performance increases from enabling parallel query on their spatial queries. Basically the workloads on things like big spatial joins are entirely CPU bound, so they are seeing that adding 15 processors makes things 15x faster. Spatial folks would love love love to see parallel query execution. -- Paul Ramsey http://cleverelephant.ca http://postgis.net
Re: [HACKERS] Parallel query execution
On Thu, Jan 24, 2013 at 02:34:49PM -0800, Paul Ramsey wrote: On Tuesday, January 15, 2013 at 2:14 PM, Bruce Momjian wrote: I mentioned last year that I wanted to start working on parallelism: https://wiki.postgresql.org/wiki/Parallel_Query_Execution I believe it is time to start adding parallel execution to the backend. We already have some parallelism in the backend: effective_io_concurrency and helper processes. I think it is time we start to consider additional options. Parallelism isn't going to help all queries; in fact it might be just a small subset, but it will be the larger queries. The pg_upgrade parallelism only helps clusters with multiple databases or tablespaces, but the improvements are significant. I just got out of a meeting that included Oracle Spatial folks, who were boasting of big performance increases from enabling parallel query on their spatial queries. Basically the workloads on things like big spatial joins are entirely CPU bound, so they are seeing that adding 15 processors makes things 15x faster. Spatial folks would love love love to see parallel query execution. I added PostGIS under the Expensive Functions opportunity: https://wiki.postgresql.org/wiki/Parallel_Query_Execution#Specific_Opportunities -- Bruce Momjian br...@momjian.us http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. +
Re: [HACKERS] Parallel query execution
On Tue, Jan 15, 2013 at 11:07 PM, Tom Lane t...@sss.pgh.pa.us wrote: Alvaro Herrera alvhe...@2ndquadrant.com writes: There are still 34 items needing attention in CF3. I suggest that, if you have some spare time, your help would be very much appreciated there. The commitfest that started on Jan 15th has 65 extra items. Anything currently listed in CF3 can rightfully be considered to be part of CF4, too. In case you hadn't noticed, we've totally lost control of the CF process. Quite aside from the lack of progress on closing CF3, major hackers who should know better are submitting significant new feature patches now, despite our agreement in Ottawa that nothing big would be accepted after CF3. At this point I'd bet against releasing 9.3 during 2013. I have been skimming the commitfest application, and unlike some of the previous commitfests a huge number of patches have had review at some point in time, but probably need more... So looking for the red "Nobody" in the 'reviewers' column probably understates the shortage of review. I'm curious what the qualitative feelings are on patches or clusters thereof, and what kind of review would be helpful in clearing the field. -- fdr
Re: [HACKERS] Parallel query execution
On Wed, Jan 16, 2013 at 12:03 AM, Bruce Momjian br...@momjian.us wrote: On Tue, Jan 15, 2013 at 11:01:04PM +, Simon Riggs wrote: On 15 January 2013 22:55, Bruce Momjian br...@momjian.us wrote: Why is this being discussed now? It is for 9.4 and will take months. I didn't think there was a better time. We don't usually discuss features during beta testing. Bruce, there are many, many patches on the queue. How will we ever get to beta testing if we begin open ended discussions on the next release? If we can't finish what we've started for 9.3, why talk about 9.4? Yes, it's a great topic for discussion, but there are better times. Like when? I don't remember a policy of not discussing things now. Does anyone else remember this? Are you saying feature discussion is only between commit-fests? Is this written down anywhere? I only remember beta-time as a time not to discuss features. We kind of do - when in a CF we should do reviewing of existing patches, when outside a CF we should do discussions and work on new features. It's on http://wiki.postgresql.org/wiki/CommitFest. It doesn't specifically say do this and don't do that, but it says focus on review, and discussing things that will happen that far ahead is definitely not focusing on review. -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/
Re: [HACKERS] Parallel query execution
On Wed, Jan 16, 2013 at 2:07 AM, Tom Lane t...@sss.pgh.pa.us wrote: Alvaro Herrera alvhe...@2ndquadrant.com writes: There are still 34 items needing attention in CF3. I suggest that, if you have some spare time, your help would be very much appreciated there. The commitfest that started on Jan 15th has 65 extra items. Anything currently listed in CF3 can rightfully be considered to be part of CF4, too. In case you hadn't noticed, we've totally lost control of the CF process. Quite aside from the lack of progress on closing CF3, major hackers who should know better are submitting significant new feature patches now, despite our agreement in Ottawa that nothing big would be accepted after CF3. At this point I'd bet against releasing 9.3 during 2013. Or we could reject all of those patches. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Parallel query execution
On Wed, Jan 16, 2013 at 6:52 AM, Magnus Hagander mag...@hagander.net wrote: On Wed, Jan 16, 2013 at 12:03 AM, Bruce Momjian br...@momjian.us wrote: On Tue, Jan 15, 2013 at 11:01:04PM +, Simon Riggs wrote: On 15 January 2013 22:55, Bruce Momjian br...@momjian.us wrote: Why is this being discussed now? It is for 9.4 and will take months. I didn't think there was a better time. We don't usually discuss features during beta testing. Bruce, there are many, many patches on the queue. How will we ever get to beta testing if we begin open ended discussions on the next release? If we can't finish what we've started for 9.3, why talk about 9.4? Yes, it's a great topic for discussion, but there are better times. Like when? I don't remember a policy of not discussing things now. Does anyone else remember this? Are you saying feature discussion is only between commit-fests? Is this written down anywhere? I only remember beta-time as a time not to discuss features. We kind of do - when in a CF we should do reviewing of existing patches, when outside a CF we should do discussions and work on new features. It's on http://wiki.postgresql.org/wiki/CommitFest. It doesn't specifically say do this and don't do that, but it says focus on review, and discussing things that will happen that far ahead is definitely not focusing on review. Bruce is evidently under the impression that he's no longer under any obligation to review or commit other people's patches, or participate in the CommitFest process in any way. I believe that he has not committed a significant patch written by someone else in several years. If the committers on the core team aren't committed to the process, it doesn't stand much chance of working. The fact that I have been completely buried for the last six months is perhaps not helping, either, but even at the very low level of engagement I've been at recently, I've still done more reviews (a few) than patch submissions (none). 
I view it as everyone's responsibility to maintain a similar balance in their own work. And some people do, but not enough, especially among the committers. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Parallel query execution
* Claudio Freire (klaussfre...@gmail.com) wrote: Well, there's the fault in your logic. It won't be as linear. I really don't see how this has become so difficult to communicate. It doesn't have to be linear. We're currently doing massive amounts of parallel processing by hand using partitioning, tablespaces, and client-side logic to split up the jobs. It's certainly *much* faster than doing it in a single thread. It's also faster with 10 processes going than 5 (we've checked). With 10 going, we've hit the FC fabric limit (and these are spinning disks in the SAN, not SSDs). I'm also sure it'd be much slower if all 10 processes were trying to read data through a single process that's reading from the I/O system. We've got some processes which essentially end up doing that, and we don't come anywhere near the total FC fabric bandwidth when just scanning through the system because, at that point, you do hit the limits of how fast the individual drive sets can provide data. To be clear: I'm not suggesting that we would parallelize a SeqScan node and have the nodes above it be single-threaded. As I said upthread, we want to parallelize reading and processing the data coming in. Perhaps at some level that works out to not change how we actually *do* seqscans at all, and instead something higher in the plan tree just creates multiple of them on independent threads, but it's still going to end up being parallel I/O in the end. I'm done with this thread for now; as brought up, we need to focus on getting 9.3 out the door. Thanks, Stephen
Re: [HACKERS] Parallel query execution
* Tom Lane (t...@sss.pgh.pa.us) wrote: In case you hadn't noticed, we've totally lost control of the CF process. I concur. Quite aside from the lack of progress on closing CF3, major hackers who should know better are submitting significant new feature patches now, despite our agreement in Ottawa that nothing big would be accepted after CF3. For my small part, it wasn't my intent to drop a contentious patch at the end. I had felt it was pretty minor and relatively simple. My arguments regarding the popen patch were simply that it didn't address one of the use-cases that I was hoping to. I'll hold off on working on the compressed transport for now in favor of doing reviews and trying to help get 9.3 wrapped up. Thanks, Stephen
Re: [HACKERS] Parallel query execution
* Daniel Farina (dan...@heroku.com) wrote: I have been skimming the commitfest application, and unlike some of the previous commitfests a huge number of patches have had review at some point in time, but probably need more... so looking for the red Nobody in the 'reviewers' column probably understates the shortage of review. I've been frustrated by that myself. I realize we don't want to duplicate work, but I'm really starting to think that having the Reviewers column has turned out to actually work against us. I'm curious what the qualitative feelings are on patches or clusters thereof and what kind of review would be helpful in clearing the field. I haven't been thrilled with the patches that I've looked at, but they've also been ones that hadn't been reviewed before, so perhaps that's what should be expected. It'd be neat if we had some idea of what committers were actively working on and kept off of *those*, but kept working on the ones which aren't being worked on by a committer currently. Thanks, Stephen
Re: [HACKERS] Parallel query execution
On 01/15/2013 11:32 PM, Bruce Momjian wrote: On Wed, Jan 16, 2013 at 01:28:18PM +0900, Michael Paquier wrote: On Wed, Jan 16, 2013 at 1:22 PM, Josh Berkus j...@agliodbs.com wrote: Claudio, Stephen, It really seems like the areas where we could get the most bang for the buck in parallelism would be: 1. Parallel sort 2. Parallel aggregation (for commutative aggregates) 3. Parallel nested loop join (especially for expression joins, like GIS) parallel data load? :/ We have that in pg_restore, and I think we are getting parallel dump in 9.3, right? Unfortunately, I don't see it in the last 9.3 commit-fest. Is it still being worked on? I am about half way through reviewing it. Unfortunately paid work takes precedence over unpaid work. cheers andrew
Re: [HACKERS] Parallel query execution
On Wed, Jan 16, 2013 at 10:33 AM, Stephen Frost sfr...@snowman.net wrote: * Claudio Freire (klaussfre...@gmail.com) wrote: Well, there's the fault in your logic. It won't be as linear. I really don't see how this has become so difficult to communicate. It doesn't have to be linear. We're currently doing massive amounts of parallel processing by hand using partitioning, tablespaces, and client-side logic to split up the jobs. It's certainly *much* faster than doing it in a single thread. It's also faster with 10 processes going than 5 (we've checked). With 10 going, we've hit the FC fabric limit (and these are spinning disks in the SAN, not SSDs). I'm also sure it'd be much slower if all 10 processes were trying to read data through a single process that's reading from the I/O system. We've got some processes which essentially end up doing that and we don't come anywhere near the total FC fabric bandwidth when just scanning through the system because, at that point, you do hit the limits of how fast the individual drive sets can provide data. Well... just closing then (to let people focus on 9.3's CF): that's a level of hardware I haven't had experience with, but it seems to behave much differently than regular (big and small) RAID arrays. In any case, perhaps tablespaces are a hint here: if nodes are working on different tablespaces, there's an indication that they *can* be parallelized efficiently. That could be fleshed out in a parallel execution node, but for that to work the whole execution engine needs to be thread-safe (or it has to fork). It won't be easy. It's best to concentrate on lower-hanging fruit, like sorting and aggregates. Now back to the CF.
Re: [HACKERS] Parallel query execution
On Wed, Jan 16, 2013 at 08:42:29AM -0500, Stephen Frost wrote: * Daniel Farina (dan...@heroku.com) wrote: I have been skimming the commitfest application, and unlike some of the previous commitfests a huge number of patches have had review at some point in time, but probably need more... so looking for the red Nobody in the 'reviewers' column probably understates the shortage of review. I've been frustrated by that myself. I realize we don't want to duplicate work but I'm really starting to think that having the Reviewers column has turned out to actually work against us. That column tells the CF manager whom to browbeat. Without a CF manager, a stale entry can indeed make a patch look under control when it isn't.
Re: [HACKERS] Parallel query execution
On Wed, Jan 16, 2013 at 01:37:28PM +0900, Michael Paquier wrote: On Wed, Jan 16, 2013 at 1:32 PM, Bruce Momjian br...@momjian.us wrote: On Wed, Jan 16, 2013 at 01:28:18PM +0900, Michael Paquier wrote: On Wed, Jan 16, 2013 at 1:22 PM, Josh Berkus j...@agliodbs.com wrote: Claudio, Stephen, It really seems like the areas where we could get the most bang for the buck in parallelism would be: 1. Parallel sort 2. Parallel aggregation (for commutative aggregates) 3. Parallel nested loop join (especially for expression joins, like GIS) parallel data load? :/ We have that in pg_restore, and I think we are getting parallel dump in 9.3, right? Unfortunately, I don't see it in the last 9.3 commit-fest. Is it still being worked on? Not exactly, I meant something like being able to use parallel processing when doing INSERT or COPY directly in core. If there is a parallel processing infrastructure, it could also be used for such write operations. I agree that the cases mentioned by Josh are far more appealing though... I am not sure how a COPY could be easily parallelized, but I suppose it could be done as part of the 1GB segment feature. People have complained that COPY is CPU-bound, so it might be very interesting to see if we could offload some of that parsing overhead to a child. -- Bruce Momjian br...@momjian.us http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. +
Re: [HACKERS] Parallel query execution
On Wed, Jan 16, 2013 at 01:48:29AM -0300, Alvaro Herrera wrote: Bruce Momjian escribió: On Wed, Jan 16, 2013 at 01:28:18PM +0900, Michael Paquier wrote: On Wed, Jan 16, 2013 at 1:22 PM, Josh Berkus j...@agliodbs.com wrote: Claudio, Stephen, It really seems like the areas where we could get the most bang for the buck in parallelism would be: 1. Parallel sort 2. Parallel aggregation (for commutative aggregates) 3. Parallel nested loop join (especially for expression joins, like GIS) parallel data load? :/ We have that in pg_restore, and I think we are getting parallel dump in 9.3, right? Unfortunately, I don't see it in the last 9.3 commit-fest. Is it still being worked on? It's in the previous-to-last commitfest. IIRC that patch required review and testing from people with some Windows background. There are still 34 items needing attention in CF3. I suggest that, if you have some spare time, your help would be very much appreciated there. The commitfest that started on Jan 15th has 65 extra items. Anything currently listed in CF3 can rightfully be considered to be part of CF4, too. Wow, I had no idea we were that far behind. I have avoided commit-fest work because I often travel and so might leave items abandoned, and I try to do cleanup of items that never make the commit-fest --- I thought that was something that needed doing too, and I rarely can complete that task. -- Bruce Momjian br...@momjian.us http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. +
Re: [HACKERS] Parallel query execution
On Wed, Jan 16, 2013 at 08:11:06AM -0500, Robert Haas wrote: We kind of do - when in a CF we should do reviewing of existing patches, when outside a CF we should do discussions and work on new features. It's on http://wiki.postgresql.org/wiki/CommitFest. It doesn't specifically say do this and don't do that, but it says focus on review, and discussing things that will happen that far ahead is definitely not focusing on review. Bruce is evidently under the impression that he's no longer under any obligation to review or commit other people's patches, or participate in the CommitFest process in any way. I believe that he has not committed a significant patch written by someone else in several years. If the committers on the core team aren't committed to the process, it doesn't stand much chance of working. I assume you know I was the most frequent committer of other people's patches for years before the commit-fests started, so I thought I would move on to other things. -- Bruce Momjian br...@momjian.us http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. +
Re: [HACKERS] Parallel query execution
On Wed, Jan 16, 2013 at 09:05:39AM -0500, Andrew Dunstan wrote: On 01/15/2013 11:32 PM, Bruce Momjian wrote: On Wed, Jan 16, 2013 at 01:28:18PM +0900, Michael Paquier wrote: On Wed, Jan 16, 2013 at 1:22 PM, Josh Berkus j...@agliodbs.com wrote: Claudio, Stephen, It really seems like the areas where we could get the most bang for the buck in parallelism would be: 1. Parallel sort 2. Parallel aggregation (for commutative aggregates) 3. Parallel nested loop join (especially for expression joins, like GIS) parallel data load? :/ We have that in pg_restore, and I think we are getting parallel dump in 9.3, right? Unfortunately, I don't see it in the last 9.3 commit-fest. Is it still being worked on? I am about half way through reviewing it. Unfortunately paid work takes precedence over unpaid work. Do you think it will make it into 9.3? -- Bruce Momjian br...@momjian.us http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. +
Re: [HACKERS] Parallel query execution
On 01/16/2013 12:20 PM, Bruce Momjian wrote: On Wed, Jan 16, 2013 at 09:05:39AM -0500, Andrew Dunstan wrote: On 01/15/2013 11:32 PM, Bruce Momjian wrote: On Wed, Jan 16, 2013 at 01:28:18PM +0900, Michael Paquier wrote: On Wed, Jan 16, 2013 at 1:22 PM, Josh Berkus j...@agliodbs.com wrote: Claudio, Stephen, It really seems like the areas where we could get the most bang for the buck in parallelism would be: 1. Parallel sort 2. Parallel aggregation (for commutative aggregates) 3. Parallel nested loop join (especially for expression joins, like GIS) parallel data load? :/ We have that in pg_restore, and I think we are getting parallel dump in 9.3, right? Unfortunately, I don't see it in the last 9.3 commit-fest. Is it still being worked on? I am about half way through reviewing it. Unfortunately paid work takes precedence over unpaid work. Do you think it will make it into 9.3? Yes, I hope it will. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Parallel query execution
* Bruce Momjian (br...@momjian.us) wrote: I am not sure how a COPY could be easily parallelized, but I suppose it could be done as part of the 1GB segment feature. People have complained that COPY is CPU-bound, so it might be very interesting to see if we could offload some of that parsing overhead to a child. COPY can certainly be CPU bound but before we can parallelize that usefully we need to solve the problem around extent locking when trying to do multiple COPY's to the same table. Thanks, Stephen
Re: [HACKERS] Parallel query execution
2013/1/16 Stephen Frost sfr...@snowman.net: * Bruce Momjian (br...@momjian.us) wrote: I am not sure how a COPY could be easily parallelized, but I suppose it could be done as part of the 1GB segment feature. People have complained that COPY is CPU-bound, so it might be very interesting to see if we could offload some of that parsing overhead to a child. COPY can certainly be CPU bound but before we can parallelize that usefully we need to solve the problem around extent locking when trying to do multiple COPY's to the same table. Probably updating any related indexes and constraint checking should be parallelized as well. Regards Pavel Thanks, Stephen -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Parallel query execution
On Wed, Jan 16, 2013 at 10:06:51PM +0100, Pavel Stehule wrote: 2013/1/16 Stephen Frost sfr...@snowman.net: * Bruce Momjian (br...@momjian.us) wrote: I am not sure how a COPY could be easily parallelized, but I suppose it could be done as part of the 1GB segment feature. People have complained that COPY is CPU-bound, so it might be very interesting to see if we could offload some of that parsing overhead to a child. COPY can certainly be CPU bound but before we can parallelize that usefully we need to solve the problem around extent locking when trying to do multiple COPY's to the same table. Probably updating any related indexes and constraint checking should be parallelized as well. Wiki updated: https://wiki.postgresql.org/wiki/Parallel_Query_Execution -- Bruce Momjian br...@momjian.ushttp://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Parallel query execution
2013/1/16 Bruce Momjian br...@momjian.us: Wiki updated: https://wiki.postgresql.org/wiki/Parallel_Query_Execution Could we add CTEs to that opportunities list? I think that some kinds of CTE queries could be easily parallelized. []s -- Dickson S. Guedes mail/xmpp: gue...@guedesoft.net - skype: guediz http://guedesoft.net - http://www.postgresql.org.br -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Parallel query execution
On Wed, Jan 16, 2013 at 07:57:01PM -0200, Dickson S. Guedes wrote: 2013/1/16 Bruce Momjian br...@momjian.us: Wiki updated: https://wiki.postgresql.org/wiki/Parallel_Query_Execution Could we add CTEs to that opportunities list? I think that some kinds of CTE queries could be easily parallelized. I added CTEs with joins. -- Bruce Momjian br...@momjian.ushttp://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Parallel query execution
On Tuesday, January 15, 2013, Stephen Frost wrote: * Gavin Flower (gavinflo...@archidevsys.co.nz) wrote: How about being aware of multiple spindles - so if the requested data covers multiple spindles, then data could be extracted in parallel. This may, or may not, involve multiple I/O channels? Yes, this should dovetail with partitioning and tablespaces to pick up on exactly that. I'd rather not have the benefits of parallelism be tied to partitioning if we can help it. Hopefully implementing parallelism in core would result in something more transparent than that. Cheers, Jeff
Re: [HACKERS] Parallel query execution
On Tuesday, January 15, 2013, Gavin Flower wrote: On 16/01/13 11:14, Bruce Momjian wrote: I mentioned last year that I wanted to start working on parallelism: https://wiki.postgresql.org/wiki/Parallel_Query_Execution Years ago I added thread-safety to libpq. Recently I added two parallel execution paths to pg_upgrade. The first parallel path allows execution of external binaries pg_dump and psql (to restore). The second parallel path does copy/link by calling fork/thread-safe C functions. I was able to do each in 2-3 days. I believe it is time to start adding parallel execution to the backend. We already have some parallelism in the backend: effective_io_concurrency and helper processes. I think it is time we start to consider additional options. Parallelism isn't going to help all queries, in fact it might be just a small subset, but it will be the larger queries. The pg_upgrade parallelism only helps clusters with multiple databases or tablespaces, but the improvements are significant. I have summarized my ideas by updating our Parallel Query Execution wiki page: https://wiki.postgresql.org/wiki/Parallel_Query_Execution Please consider updating the page yourself or posting your ideas to this thread. Thanks. Hmm... How about being aware of multiple spindles - so if the requested data covers multiple spindles, then data could be extracted in parallel. This may, or may not, involve multiple I/O channels? effective_io_concurrency does this for bitmap scans. I thought there was a patch in the commitfest to extend this to ordinary index scans, but now I can't find it. But it still doesn't give you CPU parallelism. The nice thing about CPU parallelism is that it usually brings some amount of IO parallelism for free, while the reverse is less likely to be so. Cheers, Jeff
Re: [HACKERS] Parallel query execution
On Wed, Jan 16, 2013 at 10:04 PM, Jeff Janes jeff.ja...@gmail.com wrote: Hmm... How about being aware of multiple spindles - so if the requested data covers multiple spindles, then data could be extracted in parallel. This may, or may not, involve multiple I/O channels? effective_io_concurrency does this for bitmap scans. I thought there was a patch in the commitfest to extend this to ordinary index scans, but now I can't find it. I never pushed it to the CF since it interacts so badly with the kernel. I was thinking about pushing the small part that is a net win in all cases, the back-sequential patch, but that's independent of any spindle count. It's more related to rotating media and read request merges than it is to multiple spindles or parallelization. The kernel guys basically are waiting for me to patch the kernel. I think I convinced our IT guy at the office to lend me a machine for tests... so it might happen soon. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Parallel query execution
On Wed, Jan 16, 2013 at 05:04:05PM -0800, Jeff Janes wrote: On Tuesday, January 15, 2013, Stephen Frost wrote: * Gavin Flower (gavinflo...@archidevsys.co.nz) wrote: How about being aware of multiple spindles - so if the requested data covers multiple spindles, then data could be extracted in parallel. This may, or may not, involve multiple I/O channels? Yes, this should dovetail with partitioning and tablespaces to pick up on exactly that. I'd rather not have the benefits of parallelism be tied to partitioning if we can help it. Hopefully implementing parallelism in core would result in something more transparent than that. We will need a way to know we are not saturating the I/O channel with random I/O that could have been sequential if it was single-threaded. Tablespaces give us that info; not sure what else does. -- Bruce Momjian br...@momjian.ushttp://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Parallel query execution
On Wed, Jan 16, 2013 at 11:44 PM, Bruce Momjian br...@momjian.us wrote: On Wed, Jan 16, 2013 at 05:04:05PM -0800, Jeff Janes wrote: On Tuesday, January 15, 2013, Stephen Frost wrote: * Gavin Flower (gavinflo...@archidevsys.co.nz) wrote: How about being aware of multiple spindles - so if the requested data covers multiple spindles, then data could be extracted in parallel. This may, or may not, involve multiple I/O channels? Yes, this should dovetail with partitioning and tablespaces to pick up on exactly that. I'd rather not have the benefits of parallelism be tied to partitioning if we can help it. Hopefully implementing parallelism in core would result in something more transparent than that. We will need a way to know we are not saturating the I/O channel with random I/O that could have been sequential if it was single-threaded. Tablespaces give us that info; not sure what else does. I do also think tablespaces are a safe bet. But it wouldn't help for parallelizing sorts or other operations with tempfiles (tempfiles reside on the same tablespace), or even over a single table (same tablespace again). And when the query is CPU-bound, it could be parallelized by simply making a multithreaded memory sort. Well, not so simply, but I do think it's an important building block. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
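The "multithreaded memory sort" building block Claudio mentions amounts to sorting chunks in parallel workers and then doing a k-way merge of the sorted runs. A toy model of that idea (illustrative Python, with OS processes standing in for backend workers; the function name and chunking scheme are invented for this sketch, not a proposed API):

```python
from heapq import merge
from multiprocessing import Pool

def parallel_sort(data, n_workers=4):
    """Sort slices of the input in worker processes, then k-way merge
    the sorted runs back into one ordered list."""
    step = max(1, len(data) // n_workers)
    chunks = [data[i:i + step] for i in range(0, len(data), step)]
    with Pool(n_workers) as pool:
        sorted_chunks = pool.map(sorted, chunks)  # one sort per worker
    return list(merge(*sorted_chunks))            # cheap sequential merge
```

The merge step is sequential, so the speedup is bounded by the chunk-sort phase; a real backend implementation would also have to decide when the data set is large enough to amortize worker startup.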
Re: [HACKERS] Parallel query execution
On Wed, Jan 16, 2013 at 11:56:21PM -0300, Claudio Freire wrote: On Wed, Jan 16, 2013 at 11:44 PM, Bruce Momjian br...@momjian.us wrote: On Wed, Jan 16, 2013 at 05:04:05PM -0800, Jeff Janes wrote: On Tuesday, January 15, 2013, Stephen Frost wrote: * Gavin Flower (gavinflo...@archidevsys.co.nz) wrote: How about being aware of multiple spindles - so if the requested data covers multiple spindles, then data could be extracted in parallel. This may, or may not, involve multiple I/O channels? Yes, this should dovetail with partitioning and tablespaces to pick up on exactly that. I'd rather not have the benefits of parallelism be tied to partitioning if we can help it. Hopefully implementing parallelism in core would result in something more transparent than that. We will need a way to know we are not saturating the I/O channel with random I/O that could have been sequential if it was single-threaded. Tablespaces give us that info; not sure what else does. I do also think tablespaces are a safe bet. But it wouldn't help for parallelizing sorts or other operations with tempfiles (tempfiles reside on the same tablespace), or even over a single table (same We can round-robin temp tablespace usage if you list multiple entries. tablespace again). And when the query is CPU-bound, it could be parallelized by simply making a multithreaded memory sort. Well, not so simply, but I do think it's an important building block. Yes, and detecting when to use these parallel features will be hard. -- Bruce Momjian br...@momjian.ushttp://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Parallel query execution
On Wednesday, January 16, 2013, Stephen Frost wrote: * Bruce Momjian (br...@momjian.us) wrote: I am not sure how a COPY could be easily parallelized, but I suppose it could be done as part of the 1GB segment feature. People have complained that COPY is CPU-bound, so it might be very interesting to see if we could offload some of that parsing overhead to a child. COPY can certainly be CPU bound but before we can parallelize that usefully we need to solve the problem around extent locking when trying to do multiple COPY's to the same table. I think that is rather over-stating it. Even with unindexed untriggered tables, I can get some benefit from doing hand-rolled parallel COPY before the extension lock becomes an issue, at least on some machines. And with triggered or indexed tables, all the more so. Cheers, Jeff
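The hand-rolled parallel COPY Jeff describes hinges on splitting the input file at line boundaries so each session loads a disjoint slice. A minimal sketch of that splitting step (illustrative Python, outside the backend; `chunk_offsets` is a hypothetical helper, and the per-slice COPY sessions themselves are not shown):

```python
import os

def chunk_offsets(path, n_workers):
    """Split a file into contiguous byte ranges aligned to line
    boundaries, so each worker/session could COPY its own slice."""
    size = os.path.getsize(path)
    step = max(1, size // n_workers)
    offsets = [0]
    with open(path, "rb") as f:
        for i in range(1, n_workers):
            f.seek(i * step)
            f.readline()              # advance to the next line boundary
            pos = f.tell()
            if pos >= size:
                break
            if pos > offsets[-1]:     # skip degenerate (empty) slices
                offsets.append(pos)
    offsets.append(size)
    return list(zip(offsets, offsets[1:]))   # (start, end) pairs
```

Each (start, end) range could then be streamed to a separate connection; as the thread notes, the extension lock on the target table still limits how far this scales.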
Re: [HACKERS] Parallel query execution
* Bruce Momjian (br...@momjian.us) wrote: Parallelism isn't going to help all queries, in fact it might be just a small subset, but it will be the larger queries. The pg_upgrade parallelism only helps clusters with multiple databases or tablespaces, but the improvements are significant. This would be fantastic and I'd like to help. Parallel query and real partitioning are two of our biggest holes for OLAP and data warehouse users. Please consider updating the page yourself or posting your ideas to this thread. Thanks. Will do. Thanks, Stephen
Re: [HACKERS] Parallel query execution
On 15 January 2013 22:14, Bruce Momjian br...@momjian.us wrote: I believe it is time to start adding parallel execution to the backend. We already have some parallelism in the backend: effective_io_concurrency and helper processes. I think it is time we start to consider additional options. A few months back, I remarked [1] that speeding up sorting using pipelining and asynchronous I/O was probably parallelism low-hanging fruit. That hasn't changed, though I personally still don't have the bandwidth to look into it in a serious way. [1] http://www.postgresql.org/message-id/caeylb_vezpkdx54vex3x30oy_uoth89xoejjw6aucjjiujs...@mail.gmail.com -- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Parallel query execution
On Tue, Jan 15, 2013 at 10:39:10PM +, Peter Geoghegan wrote: On 15 January 2013 22:14, Bruce Momjian br...@momjian.us wrote: I believe it is time to start adding parallel execution to the backend. We already have some parallelism in the backend: effective_io_concurrency and helper processes. I think it is time we start to consider additional options. A few months back, I remarked [1] that speeding up sorting using pipelining and asynchronous I/O was probably parallelism low-hanging fruit. That hasn't changed, though I personally still don't have the bandwidth to look into it in a serious way. [1] http://www.postgresql.org/message-id/caeylb_vezpkdx54vex3x30oy_uoth89xoejjw6aucjjiujs...@mail.gmail.com OK, I added the link to the wiki. -- Bruce Momjian br...@momjian.ushttp://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Parallel query execution
On Tue, Jan 15, 2013 at 10:53:29PM +, Simon Riggs wrote: On 15 January 2013 22:14, Bruce Momjian br...@momjian.us wrote: I mentioned last year that I wanted to start working on parallelism: We don't normally begin discussing topics for next release just as a CF is starting. Why is this being discussed now? It is for 9.4 and will take months. I didn't think there was a better time. We don't usually discuss features during beta testing. -- Bruce Momjian br...@momjian.ushttp://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Parallel query execution
On 15 January 2013 22:14, Bruce Momjian br...@momjian.us wrote: I mentioned last year that I wanted to start working on parallelism: We don't normally begin discussing topics for next release just as a CF is starting. Why is this being discussed now? -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Parallel query execution
On 15 January 2013 22:55, Bruce Momjian br...@momjian.us wrote: Why is this being discussed now? It is for 9.4 and will take months. I didn't think there was a better time. We don't usually discuss features during beta testing. Bruce, there are many, many patches on the queue. How will we ever get to beta testing if we begin open-ended discussions on the next release? If we can't finish what we've started for 9.3, why talk about 9.4? Yes, it's a great topic for discussion, but there are better times. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Parallel query execution
On 16/01/13 11:14, Bruce Momjian wrote: I mentioned last year that I wanted to start working on parallelism: https://wiki.postgresql.org/wiki/Parallel_Query_Execution Years ago I added thread-safety to libpq. Recently I added two parallel execution paths to pg_upgrade. The first parallel path allows execution of external binaries pg_dump and psql (to restore). The second parallel path does copy/link by calling fork/thread-safe C functions. I was able to do each in 2-3 days. I believe it is time to start adding parallel execution to the backend. We already have some parallelism in the backend: effective_io_concurrency and helper processes. I think it is time we start to consider additional options. Parallelism isn't going to help all queries, in fact it might be just a small subset, but it will be the larger queries. The pg_upgrade parallelism only helps clusters with multiple databases or tablespaces, but the improvements are significant. I have summarized my ideas by updating our Parallel Query Execution wiki page: https://wiki.postgresql.org/wiki/Parallel_Query_Execution Please consider updating the page yourself or posting your ideas to this thread. Thanks. Hmm... How about being aware of multiple spindles - so if the requested data covers multiple spindles, then data could be extracted in parallel. This may, or may not, involve multiple I/O channels? On large multiple processor machines, there are different blocks of memory that might be accessed at different speeds depending on the processor. Possibly a mechanism could be used to split a transaction over multiple processors to ensure the fastest memory is used? Once a selection of rows has been made, then if there is a lot of reformatting going on, then could this be done in parallel? I can think of 2 very simplistic strategies: (A) use a different processor core for each column, or (B) farm out sets of rows to different cores.
I am sure that in reality there are more subtleties, and aspects of both strategies will be used in a hybrid fashion along with other approaches. I expect that before any parallel algorithm is invoked, some sort of threshold needs to be exceeded to make it worthwhile. Different aspects of the parallel algorithm may have their own thresholds. It may not be worth applying a parallel algorithm for 10 rows from a simple table, but selecting 10,000 records from multiple tables each over 10 million rows using joins may benefit from more extreme parallelism. I expect that UNIONs, as well as the processing of partitioned tables, may be amenable to parallel processing. Cheers, Gavin
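Gavin's strategy (B) — farming sets of rows out to different cores — can be modeled as chunking the row set, "reformatting" each chunk in a worker, and reassembling the results in order. A hedged sketch (illustrative Python with invented helper names; real backend code would face the fork/thread-safety issues raised in the thread):

```python
from concurrent.futures import ProcessPoolExecutor

def format_row(row):
    """Per-row 'reformatting' work; a stand-in for whatever per-tuple
    output conversion a real executor would perform."""
    return ",".join(str(v) for v in row)

def _format_chunk(chunk):
    return [format_row(r) for r in chunk]

def format_rows_parallel(rows, n_workers=4):
    """Strategy B: farm contiguous sets of rows out to worker
    processes, preserving the original row order in the result."""
    step = max(1, len(rows) // n_workers)
    chunks = [rows[i:i + step] for i in range(0, len(rows), step)]
    with ProcessPoolExecutor(n_workers) as ex:
        results = list(ex.map(_format_chunk, chunks))  # order-preserving
    return [line for chunk in results for line in chunk]
```

Row-set chunking keeps each worker's memory access local, which is also why strategy (B) fits a row-based storage layout better than per-column strategy (A).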
Re: [HACKERS] Parallel query execution
On Tue, Jan 15, 2013 at 11:01:04PM +, Simon Riggs wrote: On 15 January 2013 22:55, Bruce Momjian br...@momjian.us wrote: Why is this being discussed now? It is for 9.4 and will take months. I didn't think there was a better time. We don't usually discuss features during beta testing. Bruce, there are many, many patches on the queue. How will we ever get to beta testing if we begin open ended discussions on next release? If we can't finish what we've started for 9.3, why talk about 9.4? Yes, its a great topic for discussion, but there are better times. Like when? I don't remember a policy of not discussing things now. Does anyone else remember this? Are you saying feature discussion is only between commit-fests? Is this written down anywhere? I only remember beta-time as a time not to discuss features. -- Bruce Momjian br...@momjian.ushttp://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Parallel query execution
On Wed, Jan 16, 2013 at 12:03:50PM +1300, Gavin Flower wrote: On 16/01/13 11:14, Bruce Momjian wrote: I mentioned last year that I wanted to start working on parallelism: https://wiki.postgresql.org/wiki/Parallel_Query_Execution Years ago I added thread-safety to libpq. Recently I added two parallel execution paths to pg_upgrade. The first parallel path allows execution of external binaries pg_dump and psql (to restore). The second parallel path does copy/link by calling fork/thread-safe C functions. I was able to do each in 2-3 days. I believe it is time to start adding parallel execution to the backend. We already have some parallelism in the backend: effective_io_concurrency and helper processes. I think it is time we start to consider additional options. Parallelism isn't going to help all queries, in fact it might be just a small subset, but it will be the larger queries. The pg_upgrade parallelism only helps clusters with multiple databases or tablespaces, but the improvements are significant. I have summarized my ideas by updating our Parallel Query Execution wiki page: https://wiki.postgresql.org/wiki/Parallel_Query_Execution Please consider updating the page yourself or posting your ideas to this thread. Thanks. Hmm... How about being aware of multiple spindles - so if the requested data covers multiple spindles, then data could be extracted in parallel. This may, or may not, involve multiple I/O channels? Well, we usually label these as tablespaces. I don't know if spindle-level is a reasonable level to add. On large multiple processor machines, there are different blocks of memory that might be accessed at different speeds depending on the processor. Possibly a mechanism could be used to split a transaction over multiple processors to ensure the fastest memory is used? That seems too far-out for an initial approach. Once a selection of rows has been made, then if there is a lot of reformatting going on, then could this be done in parallel? 
I can think of 2 very simplistic strategies: (A) use a different processor core for each column, or (B) farm out sets of rows to different cores. I am sure in reality, there are more subtleties and aspects of both the strategies will be used in a hybrid fashion along with other approaches. Probably #2, but that is going to require having some of the modules thread/fork-safe, and that is going to be tricky. I expect that before any parallel algorithm is invoked, then some sort of threshold needs to be exceeded to make it worth while. Different aspects of the parallel algorithm may have their own thresholds. It may not be worth applying a parallel algorithm for 10 rows from a simple table, but selecting 10,000 records from multiple tables each over 10 million rows using joins may benefit for more extreme parallelism. Right, I bet we will need some way to control when the overhead of parallel execution is worth it. I expect that UNIONs, as well as the processing of partitioned tables, may be amenable to parallel processing. Interesting idea on UNION. -- Bruce Momjian br...@momjian.ushttp://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Parallel query execution
* Gavin Flower (gavinflo...@archidevsys.co.nz) wrote: How about being aware of multiple spindles - so if the requested data covers multiple spindles, then data could be extracted in parallel. This may, or may not, involve multiple I/O channels? Yes, this should dovetail with partitioning and tablespaces to pick up on exactly that. We're implementing our own poor-man's parallelism using exactly this to use as much of the CPU and I/O bandwidth as we can. I have every confidence that it could be done better and be simpler for us if it was handled in the backend. On large multiple processor machines, there are different blocks of memory that might be accessed at different speeds depending on the processor. Possibly a mechanism could be used to split a transaction over multiple processors to ensure the fastest memory is used? Let's work on getting it working on the h/w that PG is most commonly deployed on first.. I agree that we don't want to paint ourselves into a corner with this, but I don't think massive NUMA systems are what we should focus on first (are you familiar with any that run PG today..?). I don't expect we're going to be trying to fight with the Linux (or whatever) kernel over what threads run on what processors with access to what memory on small-NUMA systems (x86-based). Once a selection of rows has been made, then if there is a lot of reformatting going on, then could this be done in parallel? I can of think of 2 very simplistic strategies: (A) use a different processor core for each column, or (B) farm out sets of rows to different cores. I am sure in reality, there are more subtleties and aspects of both the strategies will be used in a hybrid fashion along with other approaches. Given our row-based storage architecture, I can't imagine we'd do anything other than take a row-based approach to this.. 
I would think we'd do two things: parallelize based on partitioning, and parallelize seqscans across the individual heap files which are split on a per-1G boundary already. Perhaps we can generalize that and scale it based on the number of available processors and the size of the relation but I could see advantages in matching up with what the kernel thinks are independent files. I expect that before any parallel algorithm is invoked, then some sort of threshold needs to be exceeded to make it worth while. Certainly. That needs to be included in the optimization model to support this. Thanks, Stephen
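PostgreSQL already splits large heaps into 1GB segment files named after the relfilenode ("16384", "16384.1", "16384.2", ...), so one natural unit of work is a segment per scan worker. A small sketch of enumerating those per-segment work units (illustrative Python; `segment_files` is a hypothetical helper, not backend code):

```python
SEGMENT_SIZE = 1 << 30   # heap relations are split on 1GB boundaries

def segment_files(relfilenode, relation_bytes):
    """List the per-1GB segment files a relation occupies, in scan
    order; each file could be handed to one parallel seqscan worker."""
    n_segments = max(1, -(-relation_bytes // SEGMENT_SIZE))  # ceil div
    names = [str(relfilenode)]
    names += ["%d.%d" % (relfilenode, i) for i in range(1, n_segments)]
    return names
```

As Stephen notes, matching workers to these files lines up with what the kernel already treats as independent files, which may help readahead behave sensibly.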
Re: [HACKERS] Parallel query execution
On Tue, Jan 15, 2013 at 06:15:57PM -0500, Stephen Frost wrote: * Gavin Flower (gavinflo...@archidevsys.co.nz) wrote: How about being aware of multiple spindles - so if the requested data covers multiple spindles, then data could be extracted in parallel. This may, or may not, involve multiple I/O channels? Yes, this should dovetail with partitioning and tablespaces to pick up on exactly that. We're implementing our own poor-man's parallelism using exactly this to use as much of the CPU and I/O bandwidth as we can. I have every confidence that it could be done better and be simpler for us if it was handled in the backend. Yes, I have listed tablespaces and partitions as possible parallel options on the wiki. On large multiple processor machines, there are different blocks of memory that might be accessed at different speeds depending on the processor. Possibly a mechanism could be used to split a transaction over multiple processors to ensure the fastest memory is used? Let's work on getting it working on the h/w that PG is most commonly deployed on first.. I agree that we don't want to paint ourselves into a corner with this, but I don't think massive NUMA systems are what we should focus on first (are you familiar with any that run PG today..?). I don't expect we're going to be trying to fight with the Linux (or whatever) kernel over what threads run on what processors with access to what memory on small-NUMA systems (x86-based). Agreed. Once a selection of rows has been made, then if there is a lot of reformatting going on, then could this be done in parallel? I can of think of 2 very simplistic strategies: (A) use a different processor core for each column, or (B) farm out sets of rows to different cores. I am sure in reality, there are more subtleties and aspects of both the strategies will be used in a hybrid fashion along with other approaches. 
Given our row-based storage architecture, I can't imagine we'd do anything other than take a row-based approach to this.. I would think we'd do two things: parallelize based on partitioning, and parallelize seqscan's across the individual heap files which are split on a per-1G boundary already. Perhaps we can generalize that and scale it based on the number of available processors and the size of the relation but I could see advantages in matching up with what the kernel thinks are independent files. The 1GB idea is interesting. I found in pg_upgrade that file copy would just overwhelm the I/O channel, and that doing multiple copies on the same device had no win, but those were pure I/O operations --- a sequential scan might be enough of a mix of I/O and CPU that parallelism might help. I expect that before any parallel algorithm is invoked, then some sort of threshold needs to be exceeded to make it worth while. Certainly. That's need to be included in the optimization model to support this. I have updated the wiki to reflect the ideas mentioned above. -- Bruce Momjian br...@momjian.ushttp://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Parallel query execution
On Wed, Jan 16, 2013 at 7:14 AM, Bruce Momjian br...@momjian.us wrote: I mentioned last year that I wanted to start working on parallelism: https://wiki.postgresql.org/wiki/Parallel_Query_Execution Years ago I added thread-safety to libpq. Recently I added two parallel execution paths to pg_upgrade. The first parallel path allows execution of external binaries pg_dump and psql (to restore). The second parallel path does copy/link by calling fork/thread-safe C functions. I was able to do each in 2-3 days. I believe it is time to start adding parallel execution to the backend. We already have some parallelism in the backend: effective_io_concurrency and helper processes. I think it is time we start to consider additional options. Parallelism isn't going to help all queries, in fact it might be just a small subset, but it will be the larger queries. The pg_upgrade parallelism only helps clusters with multiple databases or tablespaces, but the improvements are significant. I have summarized my ideas by updating our Parallel Query Execution wiki page: https://wiki.postgresql.org/wiki/Parallel_Query_Execution Please consider updating the page yourself or posting your ideas to this thread. Thanks. Honestly that would be a great feature, and I would be happy to help work on it. Taking advantage of parallelism in a server with multiple cores, especially for things like large sorting operations, would be great. Just thinking out loud, but wouldn't it be the role of the planner to determine whether a given query is worth using parallelism? The executor would then be in charge of actually firing the tasks in parallel that the planner has determined are necessary. -- Michael Paquier http://michael.otacoo.com
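Michael's division of labor — the planner decides, the executor fires — implies a cost comparison: parallelism pays only when the per-worker saving exceeds the fixed startup and communication overhead. A toy decision function (all constants and names here are invented for illustration, not actual planner knobs):

```python
# Hypothetical knobs: workers add a fixed startup cost that must be
# amortized over the tuples they process.
PARALLEL_SETUP_COST = 1000.0   # assumed per-worker startup overhead
PER_TUPLE_COST = 0.01          # assumed cost to process one tuple

def choose_workers(est_rows, max_workers=4):
    """Pick a worker count only when the estimated parallel cost beats
    the serial cost; 0 means run the plan serially."""
    serial_cost = est_rows * PER_TUPLE_COST
    best, best_cost = 0, serial_cost
    for w in range(1, max_workers + 1):
        # leader participates too, hence division by (w + 1)
        cost = serial_cost / (w + 1) + w * PARALLEL_SETUP_COST
        if cost < best_cost:
            best, best_cost = w, cost
    return best
```

With these assumed constants, small row estimates stay serial while large scans justify the maximum worker count, which matches the thread's point that only the larger queries benefit.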
Re: [HACKERS] Parallel query execution
On Wed, Jan 16, 2013 at 09:11:20AM +0900, Michael Paquier wrote: Just thinking out loud, but wouldn't it be the planner's role to determine whether a given query would benefit from parallelism? The executor would then be in charge of actually firing in parallel the tasks the planner has determined necessary. Yes, it would probably be driven off of the optimizer statistics. -- Bruce Momjian br...@momjian.us http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. 
+
Re: [HACKERS] Parallel query execution
On Tue, Jan 15, 2013 at 8:19 PM, Bruce Momjian br...@momjian.us wrote: The 1GB idea is interesting. I found in pg_upgrade that file copy would just overwhelm the I/O channel, and that doing multiple copies on the same device had no win, but those were pure I/O operations --- a sequential scan might be enough of a mix of I/O and CPU that parallelism might help. AFAIR, synchroscans were introduced because multiple large sequential scans were counterproductive (big time).
Re: [HACKERS] Parallel query execution
* Claudio Freire (klaussfre...@gmail.com) wrote: On Tue, Jan 15, 2013 at 8:19 PM, Bruce Momjian br...@momjian.us wrote: The 1GB idea is interesting. I found in pg_upgrade that file copy would just overwhelm the I/O channel, and that doing multiple copies on the same device had no win, but those were pure I/O operations --- a sequential scan might be enough of a mix of I/O and CPU that parallelism might help. AFAIR, synchroscans were introduced because multiple large sequential scans were counterproductive (big time). Sequentially scanning the *same* data over and over is certainly counterproductive. Synchroscans fixed that, yes. That's not what we're talking about though --- we're talking about scanning and processing independent sets of data using multiple processes. It's certainly possible that in some cases that won't be as good, but there will be quite a few cases where it's much, much better. Consider a very complicated function running against each row which makes the CPU the bottleneck instead of the I/O system. That type of query will never run faster than a single CPU in a single-process environment, regardless of whether you have synch-scans, while in a multi-process environment you'll take advantage of the extra CPUs which are available and use more of the I/O bandwidth that isn't yet exhausted. Thanks, Stephen
Re: [HACKERS] Parallel query execution
On Wed, Jan 16, 2013 at 12:13 AM, Stephen Frost sfr...@snowman.net wrote: * Claudio Freire (klaussfre...@gmail.com) wrote: On Tue, Jan 15, 2013 at 8:19 PM, Bruce Momjian br...@momjian.us wrote: The 1GB idea is interesting. I found in pg_upgrade that file copy would just overwhelm the I/O channel, and that doing multiple copies on the same device had no win, but those were pure I/O operations --- a sequential scan might be enough of a mix of I/O and CPU that parallelism might help. AFAIR, synchroscans were introduced because multiple large sequential scans were counterproductive (big time). Sequentially scanning the *same* data over and over is certainly counterproductive. Synchroscans fixed that, yes. That's not what we're talking about though --- we're talking about scanning and processing independent sets of data using multiple processes. I don't see the difference. Blocks are blocks (unless they're cached). It's certainly possible that in some cases that won't be as good If memory serves me correctly (and it does, I suffered it a lot), the performance hit is quite considerable. Enough to make it a lot worse rather than not as good. but there will be quite a few cases where it's much, much better. Just cached segments.
Re: [HACKERS] Parallel query execution
* Claudio Freire (klaussfre...@gmail.com) wrote: On Wed, Jan 16, 2013 at 12:13 AM, Stephen Frost sfr...@snowman.net wrote: Sequentially scanning the *same* data over and over is certainly counterproductive. Synchroscans fixed that, yes. That's not what we're talking about though --- we're talking about scanning and processing independent sets of data using multiple processes. I don't see the difference. Blocks are blocks (unless they're cached). Not quite. Having to go out to the kernel isn't free. Additionally, the seq scans used to pollute our shared buffers prior to synch-scanning, which didn't help things. It's certainly possible that in some cases that won't be as good If memory serves me correctly (and it does, I suffered it a lot), the performance hit is quite considerable. Enough to make it a lot worse rather than not as good. I feel like we must not be communicating very well. If the CPU is pegged at 100% and the I/O system is at 20%, adding another CPU at 100% will bring the I/O load up to 40% and you're now processing data twice as fast overall. If you're running a single CPU at 20% and your I/O system is at 100%, then adding another CPU isn't going to help and may even degrade performance by causing problems for the I/O system. The goal of the optimizer will be to model the plan to account for exactly that, as best it can. but there will be quite a few cases where it's much, much better. Just cached segments. No, certainly not just cached segments. Any situation where the CPU is the bottleneck. Thanks, Stephen
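Stephen's 100%/20% arithmetic can be put into a small model. This is a toy of my own (the numbers and the linear-until-saturation assumption are mine, not from the thread): throughput scales with the number of workers until their aggregate I/O demand hits the I/O system's capacity, after which extra CPUs buy nothing.

```python
# Toy model of the CPU-bound vs I/O-bound trade-off discussed above.
# Each worker consumes `io_per_worker` of the I/O system's capacity
# (as a fraction); the I/O system saturates at `io_capacity`.

def throughput(n_workers, io_per_worker, io_capacity=1.0):
    """Relative throughput: linear in workers while CPU-bound,
    capped by I/O bandwidth once the disks saturate."""
    io_demand = n_workers * io_per_worker
    if io_demand <= io_capacity:
        return float(n_workers)          # CPU-bound: linear speedup
    return io_capacity / io_per_worker   # I/O-bound: bandwidth cap

# Each worker needs 20% of the I/O system (Stephen's example):
print(throughput(1, 0.2))  # 1.0
print(throughput(2, 0.2))  # 2.0  -- twice as fast, I/O now at 40%
print(throughput(8, 0.2))  # 5.0  -- capped; extra CPUs add nothing
```

Claudio's objection, addressed below, is that on spinning disks `io_capacity` is not a constant: concurrent sequential scans interfere, so the cap itself shrinks as workers are added.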
Re: [HACKERS] Parallel query execution
but there will be quite a few cases where it's much, much better. Just cached segments. Actually, thanks to much faster storage (think SSD, SAN), it's easily possible for PostgreSQL to become CPU-limited on a seq scan query, even when reading from disk. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
Re: [HACKERS] Parallel query execution
* Josh Berkus (j...@agliodbs.com) wrote: Actually, thanks to much faster storage (think SSD, SAN), it's easily possible for PostgreSQL to become CPU-limited on a seq scan query, even when reading from disk. Particularly with a complex filter being applied, or if it's feeding into something above that's expensive. Thanks, Stephen
Re: [HACKERS] Parallel query execution
Claudio, Stephen, It really seems like the areas where we could get the most bang for the buck in parallelism would be: 1. Parallel sort 2. Parallel aggregation (for commutative aggregates) 3. Parallel nested loop join (especially for expression joins, like GIS) -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
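Item 2 deserves a word on why commutativity (really, commutativity plus associativity) is the enabling property: each worker can aggregate its own chunk of rows into a partial state, and the partial states can then be combined in any order. A sketch of my own (not PostgreSQL code, and the chunking scheme is arbitrary):

```python
# Why commutative/associative aggregates parallelize: per-worker
# partial aggregation followed by an order-independent combine step.

def parallel_sum(rows, n_workers=4):
    """Split rows into chunks, compute per-worker partial sums, then
    combine.  Because '+' commutes and associates, combine order is
    irrelevant -- which is what lets workers run independently."""
    chunks = [rows[i::n_workers] for i in range(n_workers)]
    partials = [sum(chunk) for chunk in chunks]  # one per worker
    return sum(partials)                         # final combine

rows = list(range(1, 101))
print(parallel_sum(rows))  # 5050, same as serial sum(rows)

# AVG needs a decomposable state (sum, count) rather than an average
# of averages, which would be wrong for unequal chunk sizes:
def parallel_avg(rows, n_workers=4):
    chunks = [rows[i::n_workers] for i in range(n_workers)]
    partials = [(sum(c), len(c)) for c in chunks]
    total, count = map(sum, zip(*partials))
    return total / count

print(parallel_avg(rows))  # 50.5
```

Non-commutative aggregates (e.g. string_agg with a meaningful order) can't be split this way without extra machinery to preserve ordering.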
Re: [HACKERS] Parallel query execution
On Wed, Jan 16, 2013 at 1:22 PM, Josh Berkus j...@agliodbs.com wrote: Claudio, Stephen, It really seems like the areas where we could get the most bang for the buck in parallelism would be: 1. Parallel sort 2. Parallel aggregation (for commutative aggregates) 3. Parallel nested loop join (especially for expression joins, like GIS) parallel data load? :/ -- Michael Paquier http://michael.otacoo.com
Re: [HACKERS] Parallel query execution
On Wed, Jan 16, 2013 at 01:28:18PM +0900, Michael Paquier wrote: On Wed, Jan 16, 2013 at 1:22 PM, Josh Berkus j...@agliodbs.com wrote: Claudio, Stephen, It really seems like the areas where we could get the most bang for the buck in parallelism would be: 1. Parallel sort 2. Parallel aggregation (for commutative aggregates) 3. Parallel nested loop join (especially for expression joins, like GIS) parallel data load? :/ We have that in pg_restore, and I think we are getting parallel dump in 9.3, right? Unfortunately, I don't see it in the last 9.3 commit-fest. Is it still being worked on? -- Bruce Momjian br...@momjian.us http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. +
Re: [HACKERS] Parallel query execution
On Wed, Jan 16, 2013 at 1:32 PM, Bruce Momjian br...@momjian.us wrote: On Wed, Jan 16, 2013 at 01:28:18PM +0900, Michael Paquier wrote: On Wed, Jan 16, 2013 at 1:22 PM, Josh Berkus j...@agliodbs.com wrote: Claudio, Stephen, It really seems like the areas where we could get the most bang for the buck in parallelism would be: 1. Parallel sort 2. Parallel aggregation (for commutative aggregates) 3. Parallel nested loop join (especially for expression joins, like GIS) parallel data load? :/ We have that in pg_restore, and I thinnk we are getting parallel dump in 9.3, right? Unfortunately, I don't see it in the last 9.3 commit-fest. Is it still being worked on? Not exactly, I meant something like being able to use parallel processing when doing INSERT or COPY directly in core. If there is a parallel processing infrastructure, it could also be used for such write operations. I agree that the cases mentioned by Josh are far more appealing though... -- Michael Paquier http://michael.otacoo.com
Re: [HACKERS] Parallel query execution
On Wed, Jan 16, 2013 at 12:55 AM, Stephen Frost sfr...@snowman.net wrote: If memory serves me correctly (and it does, I suffered it a lot), the performance hit is quite considerable. Enough to make it a lot worse rather than not as good. I feel like we must not be communicating very well. If the CPU is pegged at 100% and the I/O system is at 20%, adding another CPU at 100% will bring the I/O load up to 40% and you're now processing data twice as fast overall. Well, there's the fault in your logic. It won't be as linear. Adding another sequential scan will decrease bandwidth; if the I/O system was doing say 10MB/s at 20% load, now it will be doing 20MB/s at 80% load (maybe even worse). Quite suddenly you'll meet diminishing returns, and the I/O subsystem which wasn't the bottleneck will become it, bandwidth being the key. You might end up with less bandwidth than you started with, if you go far enough past that knee. Add some concurrent operations (connections) to the mix and it just gets worse. Figuring out where the knee is may be the hardest problem you'll face. I don't think it'll be predictable enough to make I/O parallelization in that case worth the effort. If you instead think of parallelizing random I/O (say index scans within nested loops), that might work (or it might not). Again it depends a helluva lot on what else is contending with the I/O resources and how far ahead of optimum you push it. I've faced this problem when trying to prefetch on index scans. If you try to prefetch too much, you induce extra delays and it's a bad tradeoff. Feel free to do your own testing.
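The nonlinearity Claudio describes can be illustrated with a toy seek-penalty model (entirely my own numbers, not measurements): on spinning disks, interleaving sequential streams forces seeks between them, so aggregate bandwidth not only fails to scale but can fall below the single-scan figure.

```python
# Toy model of interleaved sequential scans on a spinning disk: each
# extra concurrent stream forces seeks, shrinking effective bandwidth.
# seq_mb_s and seek_penalty are made-up illustrative values.

def aggregate_bandwidth(n_scans, seq_mb_s=100.0, seek_penalty=0.35):
    """Total MB/s across all scans: each additional stream multiplies
    effective bandwidth by (1 - seek_penalty)."""
    return seq_mb_s * ((1 - seek_penalty) ** (n_scans - 1))

for n in range(1, 6):
    total = aggregate_bandwidth(n)
    print(f"{n} scans: {total:6.1f} MB/s total, {total / n:5.1f} MB/s each")
```

Past the knee, both total and per-scan bandwidth decline, which is Claudio's point; on SSDs or when the data is cached, seek_penalty is near zero and Stephen's linear picture holds much further out.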
Re: [HACKERS] Parallel query execution
Bruce Momjian wrote: On Wed, Jan 16, 2013 at 01:28:18PM +0900, Michael Paquier wrote: On Wed, Jan 16, 2013 at 1:22 PM, Josh Berkus j...@agliodbs.com wrote: Claudio, Stephen, It really seems like the areas where we could get the most bang for the buck in parallelism would be: 1. Parallel sort 2. Parallel aggregation (for commutative aggregates) 3. Parallel nested loop join (especially for expression joins, like GIS) parallel data load? :/ We have that in pg_restore, and I think we are getting parallel dump in 9.3, right? Unfortunately, I don't see it in the last 9.3 commit-fest. Is it still being worked on? It's in the previous-to-last commitfest. IIRC that patch required review and testing from people with some Windows background. There are still 34 items needing attention in CF3. I suggest that, if you have some spare time, your help would be very much appreciated there. The commitfest that started on Jan 15th has 65 extra items. Anything currently listed in CF3 can rightfully be considered to be part of CF4, too. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Parallel query execution
On Tuesday, January 15, 2013, Simon Riggs wrote: On 15 January 2013 22:55, Bruce Momjian br...@momjian.us wrote: Why is this being discussed now? It is for 9.4 and will take months. I didn't think there was a better time. We don't usually discuss features during beta testing. Bruce, there are many, many patches on the queue. How will we ever get to beta testing if we begin open-ended discussions on the next release? If we can't finish what we've started for 9.3, why talk about 9.4? Yes, it's a great topic for discussion, but there are better times. Possibly so. But unless we are to introduce a thinkfest, how do we know when such a better time would be? Lately commit-fests have been basically a continuous thing, except during beta, which would be an even worse time to discuss it. I think that parallel execution is huge and probably more likely for 9.5 (10.0?) than 9.4 for the general case (maybe some special cases for 9.4, like index builds). Yet the single biggest risk I see to the future of the project is the lack of parallel execution. Cheers, Jeff
Re: [HACKERS] Parallel query execution
Alvaro Herrera alvhe...@2ndquadrant.com writes: There are still 34 items needing attention in CF3. I suggest that, if you have some spare time, your help would be very much appreciated there. The commitfest that started on Jan 15th has 65 extra items. Anything currently listed in CF3 can rightfully be considered to be part of CF4, too. In case you hadn't noticed, we've totally lost control of the CF process. Quite aside from the lack of progress on closing CF3, major hackers who should know better are submitting significant new feature patches now, despite our agreement in Ottawa that nothing big would be accepted after CF3. At this point I'd bet against releasing 9.3 during 2013. regards, tom lane
Re: [HACKERS] Parallel Query Execution Project
Hi, On 09/28/2010 07:24 AM, Li Jie wrote: I'm interested in this parallel project, http://wiki.postgresql.org/wiki/Parallel_Query_Execution But I can't find any discussion or current progress on the website; it seems to have stopped for nearly a year? Yeah, I don't know of anybody really working on it ATM. If you are interested in a process-based design, please have a look at the bgworker infrastructure stuff. It could be of help for a process-based implementation. Regards Markus Wanner
Re: [HACKERS] Parallel Query Execution Project
On Sep 28, 2010, at 10:15 AM, Markus Wanner wrote: Hi, On 09/28/2010 07:24 AM, Li Jie wrote: I'm interested in this parallel project, http://wiki.postgresql.org/wiki/Parallel_Query_Execution But I can't find any discussion or current progress on the website; it seems to have stopped for nearly a year? Yeah, I don't know of anybody really working on it ATM. If you are interested in a process-based design, please have a look at the bgworker infrastructure stuff. It could be of help for a process-based implementation. Regards Markus Wanner yes, i don't know of anybody either. in addition to that it is more than a giant task. it means working on more than just one isolated part. practically i cannot think of any stage of query execution which would not need some changes. i don't see a feature like that within a realistic timeframe. regards, hans -- Cybertec Schönig & Schönig GmbH Gröhrmühlgasse 26 A-2700 Wiener Neustadt Web: http://www.postgresql-support.de