Re: [HACKERS] Parallel query execution with SPI
On 31.03.2017 13:48, Robert Haas wrote:
> On Fri, Mar 31, 2017 at 3:33 AM, Konstantin Knizhnik wrote:
>> It is possible to execute query concurrently using SPI?
>> If so, how it can be enforced?
>> I tried to open cursor with CURSOR_OPT_PARALLEL_OK flag but it doesn't help:
>> query is executed by single backend while the same query been launched at
>> top level uses parallel plan:
>>
>> fsstate->portal = SPI_cursor_open_with_args(NULL, fsstate->query,
>> fsstate->numParams, argtypes, values, nulls, true, CURSOR_OPT_PARALLEL_OK);
>> ...
>> SPI_cursor_fetch(fsstate->portal, true, 1);
>
> Parallel execution isn't possible if you are using a cursor-type
> interface, because a parallel query can't be suspended and resumed
> like a non-parallel query. If you use a function that executes the
> query to completion in one go, like SPI_execute_plan, then it's cool.
> See also commit 61c2e1a95f94bb904953a6281ce17a18ac38ee6d.

Thank you very much for the explanation.
When SPI_execute is used, the query really is executed concurrently.
But it means that when I execute a query using SPI, I need to somehow
predict the number of returned tuples. If there are not many, it is
better to use SPI_execute to allow concurrent execution of the query.
But if the result is large, SPI_execute without a limit can cause
memory overflow. Certainly I can specify some reasonable limit and, if
it is reached, use a cursor instead. But that is neither convenient nor
efficient.

I wonder if somebody can suggest a better solution?

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
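The strategy Konstantin describes (run the query to completion with a row cap, and fall back to a cursor if the cap is hit) comes down to one check on the number of rows returned. What follows is a minimal, self-contained sketch of that check, not SPI code: `classify_bounded_run` and `needs_cursor_fallback` are hypothetical names invented for illustration. The only convention borrowed from SPI is that a limit of 0 means "no limit", as with the count argument of SPI_execute.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical outcome of one capped run-to-completion execution. */
typedef enum { RESULT_COMPLETE, RESULT_MAY_BE_TRUNCATED } BoundedRunOutcome;

/*
 * Classify a capped run. row_limit == 0 means "no limit" (the same
 * convention as SPI_execute's count argument); everything else here is
 * an illustrative assumption.
 */
BoundedRunOutcome classify_bounded_run(uint64_t rows_returned, uint64_t row_limit)
{
    /* A capped run can never return more rows than its cap. */
    assert(row_limit == 0 || rows_returned <= row_limit);

    if (row_limit == 0)
        return RESULT_COMPLETE;          /* unbounded run: all rows were fetched */

    return rows_returned < row_limit
        ? RESULT_COMPLETE                /* the query ended before the cap */
        : RESULT_MAY_BE_TRUNCATED;       /* cap reached: more rows may exist */
}

/* True when the caller should rerun the query through a cursor. */
bool needs_cursor_fallback(uint64_t rows_returned, uint64_t row_limit)
{
    return classify_bounded_run(rows_returned, row_limit) == RESULT_MAY_BE_TRUNCATED;
}
```

Reaching the cap exactly cannot distinguish "exactly that many rows" from "more rows remain", so the fallback has to run the query a second time through the cursor. That repeated work is the inefficiency the message complains about.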
Re: [HACKERS] Parallel query execution with SPI
On Fri, Mar 31, 2017 at 4:18 PM, Robert Haas wrote:
> On Fri, Mar 31, 2017 at 3:33 AM, Konstantin Knizhnik wrote:
>> It is possible to execute query concurrently using SPI?
>> If so, how it can be enforced?
>> I tried to open cursor with CURSOR_OPT_PARALLEL_OK flag but it doesn't help:
>> query is executed by single backend while the same query been launched at
>> top level uses parallel plan:
>>
>> fsstate->portal = SPI_cursor_open_with_args(NULL, fsstate->query,
>> fsstate->numParams, argtypes, values, nulls, true, CURSOR_OPT_PARALLEL_OK);
>> ...
>> SPI_cursor_fetch(fsstate->portal, true, 1);
>
> Parallel execution isn't possible if you are using a cursor-type
> interface, because a parallel query can't be suspended and resumed
> like a non-parallel query. If you use a function that executes the
> query to completion in one go, like SPI_execute_plan, then it's cool.
> See also commit 61c2e1a95f94bb904953a6281ce17a18ac38ee6d.

Adding to that, for your case passing CURSOR_OPT_PARALLEL_OK is not
enough, because PortalRun for the cursor has portal->run_once set to
false, which restricts parallelism in ExecutePlan:

    if (!execute_once || dest->mydest == DestIntoRel)
        use_parallel_mode = false;

You may check [1] for the discussion on this.

[1] https://www.postgresql.org/message-id/flat/CAFiTN-vxhvvi-rMJFOxkGzNaQpf%2BKS76%2Bsu7-sG_NQZGRPJkQg%40mail.gmail.com

--
Regards,
Rafia Sabih
EnterpriseDB: http://www.enterprisedb.com/
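The condition quoted from ExecutePlan can be modeled outside the server to see why a cursor fetch loses parallelism while a run-to-completion call keeps it. This is a minimal, self-contained sketch and not PostgreSQL source: `DestKind`, `ExecRequest` and `can_use_parallel_mode` are hypothetical names, and the `plan_is_parallel` field is an assumed stand-in for "the planner produced a parallel plan". Only the condition itself comes from the message above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the executor's destination kinds. */
typedef enum { DEST_SPI, DEST_INTO_REL } DestKind;

/* Hypothetical summary of one executor invocation. */
typedef struct {
    bool     plan_is_parallel; /* assumed: the planner chose a parallel plan */
    bool     execute_once;     /* false when the portal may be suspended and resumed */
    DestKind dest;             /* where the result tuples are sent */
} ExecRequest;

/*
 * Models the quoted check: parallel mode is dropped when the plan may be
 * executed more than once (cursor-style fetching) or when the output goes
 * into a relation.
 */
bool can_use_parallel_mode(const ExecRequest *req)
{
    assert(req != NULL);

    bool use_parallel_mode = req->plan_is_parallel;

    if (!req->execute_once || req->dest == DEST_INTO_REL)
        use_parallel_mode = false;

    return use_parallel_mode;
}
```

In this model a cursor-style request (`execute_once` false) always comes back false even when the plan is parallel. A request that runs once to completion with an SPI-like destination keeps the planner's choice.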
Re: [HACKERS] Parallel query execution with SPI
On Fri, Mar 31, 2017 at 3:33 AM, Konstantin Knizhnik wrote:
> It is possible to execute query concurrently using SPI?
> If so, how it can be enforced?
> I tried to open cursor with CURSOR_OPT_PARALLEL_OK flag but it doesn't help:
> query is executed by single backend while the same query been launched at
> top level uses parallel plan:
>
> fsstate->portal = SPI_cursor_open_with_args(NULL, fsstate->query,
> fsstate->numParams, argtypes, values, nulls, true, CURSOR_OPT_PARALLEL_OK);
> ...
> SPI_cursor_fetch(fsstate->portal, true, 1);

Parallel execution isn't possible if you are using a cursor-type
interface, because a parallel query can't be suspended and resumed
like a non-parallel query. If you use a function that executes the
query to completion in one go, like SPI_execute_plan, then it's cool.
See also commit 61c2e1a95f94bb904953a6281ce17a18ac38ee6d.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
[HACKERS] Parallel query execution with SPI
Hi hackers,

It is possible to execute query concurrently using SPI?
If so, how it can be enforced?
I tried to open cursor with CURSOR_OPT_PARALLEL_OK flag but it doesn't help:
query is executed by single backend while the same query been launched at
top level uses parallel plan:

    fsstate->portal = SPI_cursor_open_with_args(NULL, fsstate->query,
        fsstate->numParams, argtypes, values, nulls, true, CURSOR_OPT_PARALLEL_OK);
    ...
    SPI_cursor_fetch(fsstate->portal, true, 1);

Thanks in advance,

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Re: [HACKERS] Parallel query execution
On Tuesday, January 15, 2013 at 2:14 PM, Bruce Momjian wrote:
> I mentioned last year that I wanted to start working on parallelism:
>
> https://wiki.postgresql.org/wiki/Parallel_Query_Execution
>
> I believe it is time to start adding parallel execution to the
> backend. We already have some parallelism in the backend:
> effective_io_concurrency and helper processes. I think it is time we
> start to consider additional options.
>
> Parallelism isn't going to help all queries, in fact it might be just
> a small subset, but it will be the larger queries. The pg_upgrade
> parallelism only helps clusters with multiple databases or
> tablespaces, but the improvements are significant.

I just got out of a meeting that included Oracle Spatial folks, who
were boasting of big performance increases in enabling parallel query
on their spatial queries. Basically the workloads on things like big
spatial joins are entirely CPU bound, so they are seeing that adding 15
processors makes things 15x faster. Spatial folks would love love love
to see parallel query execution.

--
Paul Ramsey
http://cleverelephant.ca
http://postgis.net
Re: [HACKERS] Parallel query execution
On Thu, Jan 24, 2013 at 02:34:49PM -0800, Paul Ramsey wrote:
> On Tuesday, January 15, 2013 at 2:14 PM, Bruce Momjian wrote:
>> I mentioned last year that I wanted to start working on parallelism:
>>
>> https://wiki.postgresql.org/wiki/Parallel_Query_Execution
>>
>> I believe it is time to start adding parallel execution to the
>> backend. We already have some parallelism in the backend:
>> effective_io_concurrency and helper processes. I think it is time we
>> start to consider additional options.
>>
>> Parallelism isn't going to help all queries, in fact it might be just
>> a small subset, but it will be the larger queries. The pg_upgrade
>> parallelism only helps clusters with multiple databases or
>> tablespaces, but the improvements are significant.
>
> I just got out of a meeting that included Oracle Spatial folks, who
> were boasting of big performance increases in enabling parallel query
> on their spatial queries. Basically the workloads on things like big
> spatial joins are entirely CPU bound, so they are seeing that adding 15
> processors makes things 15x faster. Spatial folks would love love love
> to see parallel query execution.

I added PostGIS under the Expensive Functions opportunity:

https://wiki.postgresql.org/wiki/Parallel_Query_Execution#Specific_Opportunities

--
Bruce Momjian br...@momjian.us http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +
Re: [HACKERS] Parallel query execution
On Tue, Jan 15, 2013 at 11:07 PM, Tom Lane t...@sss.pgh.pa.us wrote:
> Alvaro Herrera alvhe...@2ndquadrant.com writes:
>> There are still 34 items needing attention in CF3. I suggest that, if
>> you have some spare time, your help would be very much appreciated
>> there. The commitfest that started on Jan 15th has 65 extra items.
>> Anything currently listed in CF3 can rightfully be considered to be
>> part of CF4, too.
>
> In case you hadn't noticed, we've totally lost control of the CF
> process. Quite aside from the lack of progress on closing CF3, major
> hackers who should know better are submitting significant new feature
> patches now, despite our agreement in Ottawa that nothing big would be
> accepted after CF3. At this point I'd bet against releasing 9.3 during
> 2013.

I have been skimming the commitfest application, and unlike some of the
previous commitfests a huge number of patches have had review at some
point in time, but probably need more...so looking for the red Nobody
in the 'reviewers' column probably understates the shortage of review.

I'm curious what the qualitative feelings are on patches or clusters
thereof and what kind of review would be helpful in clearing the field.

--
fdr
Re: [HACKERS] Parallel query execution
On Wed, Jan 16, 2013 at 12:03 AM, Bruce Momjian br...@momjian.us wrote:
> On Tue, Jan 15, 2013 at 11:01:04PM +, Simon Riggs wrote:
>> On 15 January 2013 22:55, Bruce Momjian br...@momjian.us wrote:
>>>> Why is this being discussed now?
>>>
>>> It is for 9.4 and will take months. I didn't think there was a
>>> better time. We don't usually discuss features during beta testing.
>>
>> Bruce, there are many, many patches on the queue. How will we ever
>> get to beta testing if we begin open ended discussions on next
>> release?
>>
>> If we can't finish what we've started for 9.3, why talk about 9.4?
>>
>> Yes, its a great topic for discussion, but there are better times.
>
> Like when? I don't remember a policy of not discussing things now.
> Does anyone else remember this? Are you saying feature discussion is
> only between commit-fests? Is this written down anywhere? I only
> remember beta-time as a time not to discuss features.

We kind of do - when in a CF we should do reviewing of existing
patches, when outside a CF we should do discussions and work on new
features. It's on http://wiki.postgresql.org/wiki/CommitFest. It
doesn't specifically say do this and don't do that, but it says focus
on review, and discussing things that will happen that far ahead is
definitely not focusing on review.

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/
Re: [HACKERS] Parallel query execution
On Wed, Jan 16, 2013 at 2:07 AM, Tom Lane t...@sss.pgh.pa.us wrote:
> Alvaro Herrera alvhe...@2ndquadrant.com writes:
>> There are still 34 items needing attention in CF3. I suggest that, if
>> you have some spare time, your help would be very much appreciated
>> there. The commitfest that started on Jan 15th has 65 extra items.
>> Anything currently listed in CF3 can rightfully be considered to be
>> part of CF4, too.
>
> In case you hadn't noticed, we've totally lost control of the CF
> process. Quite aside from the lack of progress on closing CF3, major
> hackers who should know better are submitting significant new feature
> patches now, despite our agreement in Ottawa that nothing big would be
> accepted after CF3. At this point I'd bet against releasing 9.3 during
> 2013.

Or we could reject all of those patches.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] Parallel query execution
On Wed, Jan 16, 2013 at 6:52 AM, Magnus Hagander mag...@hagander.net wrote:
> On Wed, Jan 16, 2013 at 12:03 AM, Bruce Momjian br...@momjian.us wrote:
>> On Tue, Jan 15, 2013 at 11:01:04PM +, Simon Riggs wrote:
>>> On 15 January 2013 22:55, Bruce Momjian br...@momjian.us wrote:
>>>>> Why is this being discussed now?
>>>>
>>>> It is for 9.4 and will take months. I didn't think there was a
>>>> better time. We don't usually discuss features during beta testing.
>>>
>>> Bruce, there are many, many patches on the queue. How will we ever
>>> get to beta testing if we begin open ended discussions on next
>>> release?
>>>
>>> If we can't finish what we've started for 9.3, why talk about 9.4?
>>>
>>> Yes, its a great topic for discussion, but there are better times.
>>
>> Like when? I don't remember a policy of not discussing things now.
>> Does anyone else remember this? Are you saying feature discussion is
>> only between commit-fests? Is this written down anywhere? I only
>> remember beta-time as a time not to discuss features.
>
> We kind of do - when in a CF we should do reviewing of existing
> patches, when outside a CF we should do discussions and work on new
> features. It's on http://wiki.postgresql.org/wiki/CommitFest. It
> doesn't specifically say do this and don't do that, but it says focus
> on review, and discussing things that will happen that far ahead is
> definitely not focusing on review.

Bruce is evidently under the impression that he's no longer under any
obligation to review or commit other people's patches, or participate
in the CommitFest process in any way. I believe that he has not
committed a significant patch written by someone else in several years.
If the committers on the core team aren't committed to the process, it
doesn't stand much chance of working.

The fact that I have been completely buried for the last six months is
perhaps not helping, either, but even at the very low level of
engagement I've been at recently, I've still done more reviews (a few)
than patch submissions (none). I view it as everyone's responsibility
to maintain a similar balance in their own work. And some people are,
but not enough, especially among the committers.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] Parallel query execution
* Claudio Freire (klaussfre...@gmail.com) wrote:
> Well, there's the fault in your logic. It won't be as linear.

I really don't see how this has become so difficult to communicate.

It doesn't have to be linear.

We're currently doing massive amounts of parallel processing by hand
using partitioning, tablespaces, and client-side logic to split up the
jobs. It's certainly *much* faster than doing it in a single thread.
It's also faster with 10 processes going than 5 (we've checked). With
10 going, we've hit the FC fabric limit (and these are spinning disks
in the SAN, not SSDs). I'm also sure it'd be much slower if all 10
processes were trying to read data through a single process that's
reading from the I/O system. We've got some processes which essentially
end up doing that and we don't come anywhere near the total FC fabric
bandwidth when just scanning through the system because, at that point,
you do hit the limits of how fast the individual drive sets can provide
data.

To be clear- I'm not suggesting that we would parallelize a SeqScan
node and have the nodes above it be single-threaded. As I said
upthread- we want to parallelize reading and processing the data coming
in. Perhaps at some level that works out to not change how we actually
*do* seqscans at all and instead something higher in the plan tree just
creates multiple of them on independent threads, but it's still going
to end up being parallel I/O in the end.

I'm done with this thread for now- as brought up, we need to focus on
getting 9.3 out the door.

Thanks,

Stephen
Re: [HACKERS] Parallel query execution
* Tom Lane (t...@sss.pgh.pa.us) wrote:
> In case you hadn't noticed, we've totally lost control of the CF
> process.

I concur.

> Quite aside from the lack of progress on closing CF3, major hackers
> who should know better are submitting significant new feature patches
> now, despite our agreement in Ottawa that nothing big would be
> accepted after CF3.

For my small part, it wasn't my intent to drop a contentious patch at
the end. I had felt it was pretty minor and relatively simple. My
arguments regarding the popen patch were simply that it didn't address
one of the use-cases that I was hoping to.

I'll hold off on working on the compressed transport for now in favor
of doing reviews and trying to help get 9.3 wrapped up.

Thanks,

Stephen
Re: [HACKERS] Parallel query execution
* Daniel Farina (dan...@heroku.com) wrote:
> I have been skimming the commitfest application, and unlike some of
> the previous commitfests a huge number of patches have had review at
> some point in time, but probably need more...so looking for the red
> Nobody in the 'reviewers' column probably understates the shortage of
> review.

I've been frustrated by that myself. I realize we don't want to
duplicate work but I'm really starting to think that having the
Reviewers column has turned out to actually work against us.

> I'm curious what the qualitative feelings are on patches or clusters
> thereof and what kind of review would be helpful in clearing the
> field.

I haven't been thrilled with the patches that I've looked at but
they've also been ones that hadn't been reviewed before, so perhaps
that's what should be expected. It'd be neat if we had some idea of
what committers were actively working on and keep off of *those*, but
keep working on the ones which aren't being worked by a committer
currently.

Thanks,

Stephen
Re: [HACKERS] Parallel query execution
On 01/15/2013 11:32 PM, Bruce Momjian wrote:
> On Wed, Jan 16, 2013 at 01:28:18PM +0900, Michael Paquier wrote:
>> On Wed, Jan 16, 2013 at 1:22 PM, Josh Berkus j...@agliodbs.com wrote:
>>> Claudio, Stephen,
>>>
>>> It really seems like the areas where we could get the most bang for
>>> the buck in parallelism would be:
>>>
>>> 1. Parallel sort
>>> 2. Parallel aggregation (for commutative aggregates)
>>> 3. Parallel nested loop join (especially for expression joins, like GIS)
>>
>> parallel data load? :/
>
> We have that in pg_restore, and I think we are getting parallel dump
> in 9.3, right? Unfortunately, I don't see it in the last 9.3
> commit-fest. Is it still being worked on?

I am about half way through reviewing it. Unfortunately paid work takes
precedence over unpaid work.

cheers

andrew
Re: [HACKERS] Parallel query execution
On Wed, Jan 16, 2013 at 10:33 AM, Stephen Frost sfr...@snowman.net wrote:
> * Claudio Freire (klaussfre...@gmail.com) wrote:
>> Well, there's the fault in your logic. It won't be as linear.
>
> I really don't see how this has become so difficult to communicate.
>
> It doesn't have to be linear.
>
> We're currently doing massive amounts of parallel processing by hand
> using partitioning, tablespaces, and client-side logic to split up the
> jobs. It's certainly *much* faster than doing it in a single thread.
> It's also faster with 10 processes going than 5 (we've checked). With
> 10 going, we've hit the FC fabric limit (and these are spinning disks
> in the SAN, not SSDs). I'm also sure it'd be much slower if all 10
> processes were trying to read data through a single process that's
> reading from the I/O system. We've got some processes which
> essentially end up doing that and we don't come anywhere near the
> total FC fabric bandwidth when just scanning through the system
> because, at that point, you do hit the limits of how fast the
> individual drive sets can provide data.

Well... just closing then (to let people focus on 9.3's CF), that's a
level of hardware I haven't had experience with, but it seems to behave
much differently from regular (big and small) RAID arrays.

In any case, perhaps tablespaces are a hint here: if nodes are working
on different tablespaces, there's an indication that they *can* be
parallelized efficiently. That could be fleshed out on a parallel
execution node, but for that to work the whole execution engine needs
to be thread-safe (or it has to fork). It won't be easy.

It's best to concentrate on lower-hanging fruit, like sorting and
aggregates.

Now back to the CF.
Re: [HACKERS] Parallel query execution
On Wed, Jan 16, 2013 at 08:42:29AM -0500, Stephen Frost wrote:
> * Daniel Farina (dan...@heroku.com) wrote:
>> I have been skimming the commitfest application, and unlike some of
>> the previous commitfests a huge number of patches have had review at
>> some point in time, but probably need more...so looking for the red
>> Nobody in the 'reviewers' column probably understates the shortage of
>> review.
>
> I've been frustrated by that myself. I realize we don't want to
> duplicate work but I'm really starting to think that having the
> Reviewers column has turned out to actually work against us.

That column tells the CF manager whom to browbeat. Without a CF
manager, a stale entry can indeed make a patch look under-control when
it isn't.
Re: [HACKERS] Parallel query execution
On Wed, Jan 16, 2013 at 01:37:28PM +0900, Michael Paquier wrote:
> On Wed, Jan 16, 2013 at 1:32 PM, Bruce Momjian br...@momjian.us wrote:
>> On Wed, Jan 16, 2013 at 01:28:18PM +0900, Michael Paquier wrote:
>>> On Wed, Jan 16, 2013 at 1:22 PM, Josh Berkus j...@agliodbs.com wrote:
>>>> Claudio, Stephen,
>>>>
>>>> It really seems like the areas where we could get the most bang for
>>>> the buck in parallelism would be:
>>>>
>>>> 1. Parallel sort
>>>> 2. Parallel aggregation (for commutative aggregates)
>>>> 3. Parallel nested loop join (especially for expression joins, like GIS)
>>>
>>> parallel data load? :/
>>
>> We have that in pg_restore, and I think we are getting parallel dump
>> in 9.3, right? Unfortunately, I don't see it in the last 9.3
>> commit-fest. Is it still being worked on?
>
> Not exactly, I meant something like being able to use parallel
> processing when doing INSERT or COPY directly in core. If there is a
> parallel processing infrastructure, it could also be used for such
> write operations. I agree that the cases mentioned by Josh are far
> more appealing though...

I am not sure how a COPY could be easily parallelized, but I supposed
it could be done as part of the 1GB segment feature. People have
complained that COPY is CPU-bound, so it might be very interesting to
see if we could offload some of that parsing overhead to a child.

--
Bruce Momjian br...@momjian.us http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +
Re: [HACKERS] Parallel query execution
On Wed, Jan 16, 2013 at 01:48:29AM -0300, Alvaro Herrera wrote:
> Bruce Momjian wrote:
>> On Wed, Jan 16, 2013 at 01:28:18PM +0900, Michael Paquier wrote:
>>> On Wed, Jan 16, 2013 at 1:22 PM, Josh Berkus j...@agliodbs.com wrote:
>>>> Claudio, Stephen,
>>>>
>>>> It really seems like the areas where we could get the most bang for
>>>> the buck in parallelism would be:
>>>>
>>>> 1. Parallel sort
>>>> 2. Parallel aggregation (for commutative aggregates)
>>>> 3. Parallel nested loop join (especially for expression joins, like GIS)
>>>
>>> parallel data load? :/
>>
>> We have that in pg_restore, and I think we are getting parallel dump
>> in 9.3, right? Unfortunately, I don't see it in the last 9.3
>> commit-fest. Is it still being worked on?
>
> It's in the previous-to-last commitfest. IIRC that patch required
> review and testing from people with some Windows background.
>
> There are still 34 items needing attention in CF3. I suggest that, if
> you have some spare time, your help would be very much appreciated
> there. The commitfest that started on Jan 15th has 65 extra items.
> Anything currently listed in CF3 can rightfully be considered to be
> part of CF4, too.

Wow, I had no idea we were that far behind. I have avoided commit-fest
work because I often travel so might leave the items abandoned, and I
try to do cleanup of items that never make the commit-fest --- I
thought that was something that needed doing too, and I rarely can
complete that task.

--
Bruce Momjian br...@momjian.us http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +
Re: [HACKERS] Parallel query execution
On Wed, Jan 16, 2013 at 08:11:06AM -0500, Robert Haas wrote:
>> We kind of do - when in a CF we should do reviewing of existing
>> patches, when outside a CF we should do discussions and work on new
>> features. It's on http://wiki.postgresql.org/wiki/CommitFest. It
>> doesn't specifically say do this and don't do that, but it says focus
>> on review, and discussing things that will happen that far ahead is
>> definitely not focusing on review.
>
> Bruce is evidently under the impression that he's no longer under any
> obligation to review or commit other people's patches, or participate
> in the CommitFest process in any way. I believe that he has not
> committed a significant patch written by someone else in several
> years. If the committers on the core team aren't committed to the
> process, it doesn't stand much chance of working.

I assume you know I was the most frequent committer of other people's
patches for years before the commit-fests started, so I thought I would
move on to other things.

--
Bruce Momjian br...@momjian.us http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +
Re: [HACKERS] Parallel query execution
On Wed, Jan 16, 2013 at 09:05:39AM -0500, Andrew Dunstan wrote:
> On 01/15/2013 11:32 PM, Bruce Momjian wrote:
>> On Wed, Jan 16, 2013 at 01:28:18PM +0900, Michael Paquier wrote:
>>> On Wed, Jan 16, 2013 at 1:22 PM, Josh Berkus j...@agliodbs.com wrote:
>>>> Claudio, Stephen,
>>>>
>>>> It really seems like the areas where we could get the most bang for
>>>> the buck in parallelism would be:
>>>>
>>>> 1. Parallel sort
>>>> 2. Parallel aggregation (for commutative aggregates)
>>>> 3. Parallel nested loop join (especially for expression joins, like GIS)
>>>
>>> parallel data load? :/
>>
>> We have that in pg_restore, and I think we are getting parallel dump
>> in 9.3, right? Unfortunately, I don't see it in the last 9.3
>> commit-fest. Is it still being worked on?
>
> I am about half way through reviewing it. Unfortunately paid work
> takes precedence over unpaid work.

Do you think it will make it into 9.3?

--
Bruce Momjian br...@momjian.us http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +
Re: [HACKERS] Parallel query execution
On 01/16/2013 12:20 PM, Bruce Momjian wrote:
> On Wed, Jan 16, 2013 at 09:05:39AM -0500, Andrew Dunstan wrote:
>> On 01/15/2013 11:32 PM, Bruce Momjian wrote:
>>> On Wed, Jan 16, 2013 at 01:28:18PM +0900, Michael Paquier wrote:
>>>> On Wed, Jan 16, 2013 at 1:22 PM, Josh Berkus j...@agliodbs.com wrote:
>>>>> Claudio, Stephen,
>>>>>
>>>>> It really seems like the areas where we could get the most bang
>>>>> for the buck in parallelism would be:
>>>>>
>>>>> 1. Parallel sort
>>>>> 2. Parallel aggregation (for commutative aggregates)
>>>>> 3. Parallel nested loop join (especially for expression joins, like GIS)
>>>>
>>>> parallel data load? :/
>>>
>>> We have that in pg_restore, and I think we are getting parallel dump
>>> in 9.3, right? Unfortunately, I don't see it in the last 9.3
>>> commit-fest. Is it still being worked on?
>>
>> I am about half way through reviewing it. Unfortunately paid work
>> takes precedence over unpaid work.
>
> Do you think it will make it into 9.3?

Yes, I hope it will.

cheers

andrew
Re: [HACKERS] Parallel query execution
* Bruce Momjian (br...@momjian.us) wrote:
> I am not sure how a COPY could be easily parallelized, but I supposed
> it could be done as part of the 1GB segment feature. People have
> complained that COPY is CPU-bound, so it might be very interesting to
> see if we could offload some of that parsing overhead to a child.

COPY can certainly be CPU bound but before we can parallelize that
usefully we need to solve the problem around extent locking when trying
to do multiple COPY's to the same table.

Thanks,

Stephen
Re: [HACKERS] Parallel query execution
2013/1/16 Stephen Frost sfr...@snowman.net:
> * Bruce Momjian (br...@momjian.us) wrote:
>> I am not sure how a COPY could be easily parallelized, but I supposed
>> it could be done as part of the 1GB segment feature. People have
>> complained that COPY is CPU-bound, so it might be very interesting to
>> see if we could offload some of that parsing overhead to a child.
>
> COPY can certainly be CPU bound but before we can parallelize that
> usefully we need to solve the problem around extent locking when
> trying to do multiple COPY's to the same table.

Probably updating any related indexes and constraint checking should be
parallelized.

Regards

Pavel

> Thanks,
>
> Stephen
Re: [HACKERS] Parallel query execution
On Wed, Jan 16, 2013 at 10:06:51PM +0100, Pavel Stehule wrote:
> 2013/1/16 Stephen Frost sfr...@snowman.net:
>> * Bruce Momjian (br...@momjian.us) wrote:
>>> I am not sure how a COPY could be easily parallelized, but I
>>> supposed it could be done as part of the 1GB segment feature. People
>>> have complained that COPY is CPU-bound, so it might be very
>>> interesting to see if we could offload some of that parsing overhead
>>> to a child.
>>
>> COPY can certainly be CPU bound but before we can parallelize that
>> usefully we need to solve the problem around extent locking when
>> trying to do multiple COPY's to the same table.
>
> Probably updating any related indexes and constraint checking should
> be parallelized.

Wiki updated:

https://wiki.postgresql.org/wiki/Parallel_Query_Execution

--
Bruce Momjian br...@momjian.us http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +
Re: [HACKERS] Parallel query execution
2013/1/16 Bruce Momjian br...@momjian.us:
> Wiki updated:
>
> https://wiki.postgresql.org/wiki/Parallel_Query_Execution

Could we add CTEs to that opportunities list? I think that some kinds
of CTE queries could be easily parallelized.

[]s
--
Dickson S. Guedes
mail/xmpp: gue...@guedesoft.net - skype: guediz
http://guedesoft.net - http://www.postgresql.org.br
Re: [HACKERS] Parallel query execution
On Wed, Jan 16, 2013 at 07:57:01PM -0200, Dickson S. Guedes wrote:
> 2013/1/16 Bruce Momjian br...@momjian.us:
>> Wiki updated:
>>
>> https://wiki.postgresql.org/wiki/Parallel_Query_Execution
>
> Could we add CTEs to that opportunities list? I think that some kinds
> of CTE queries could be easily parallelized.

I added CTEs with joins.

--
Bruce Momjian br...@momjian.us http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +
Re: [HACKERS] Parallel query execution
On Tuesday, January 15, 2013, Stephen Frost wrote: * Gavin Flower (gavinflo...@archidevsys.co.nz) wrote: How about being aware of multiple spindles - so if the requested data covers multiple spindles, then data could be extracted in parallel. This may, or may not, involve multiple I/O channels? Yes, this should dovetail with partitioning and tablespaces to pick up on exactly that. I'd rather not have the benefits of parallelism be tied to partitioning if we can help it. Hopefully implementing parallelism in core would result in something more transparent than that. Cheers, Jeff
Re: [HACKERS] Parallel query execution
On Tuesday, January 15, 2013, Gavin Flower wrote: On 16/01/13 11:14, Bruce Momjian wrote: I mentioned last year that I wanted to start working on parallelism: https://wiki.postgresql.org/wiki/Parallel_Query_Execution Years ago I added thread-safety to libpq. Recently I added two parallel execution paths to pg_upgrade. The first parallel path allows execution of external binaries pg_dump and psql (to restore). The second parallel path does copy/link by calling fork/thread-safe C functions. I was able to do each in 2-3 days. I believe it is time to start adding parallel execution to the backend. We already have some parallelism in the backend: effective_io_concurrency and helper processes. I think it is time we start to consider additional options. Parallelism isn't going to help all queries, in fact it might be just a small subset, but it will be the larger queries. The pg_upgrade parallelism only helps clusters with multiple databases or tablespaces, but the improvements are significant. I have summarized my ideas by updating our Parallel Query Execution wiki page: https://wiki.postgresql.org/wiki/Parallel_Query_Execution Please consider updating the page yourself or posting your ideas to this thread. Thanks. Hmm... How about being aware of multiple spindles - so if the requested data covers multiple spindles, then data could be extracted in parallel. This may, or may not, involve multiple I/O channels? effective_io_concurrency does this for bitmap scans. I thought there was a patch in the commitfest to extend this to ordinary index scans, but now I can't find it. But it still doesn't give you CPU parallelism. The nice thing about CPU parallelism is that it usually brings some amount of IO parallelism for free, while the reverse less likely to be so. Cheers, Jeff
Re: [HACKERS] Parallel query execution
On Wed, Jan 16, 2013 at 10:04 PM, Jeff Janes jeff.ja...@gmail.com wrote: Hmm... How about being aware of multiple spindles - so if the requested data covers multiple spindles, then data could be extracted in parallel. This may, or may not, involve multiple I/O channels? effective_io_concurrency does this for bitmap scans. I thought there was a patch in the commitfest to extend this to ordinary index scans, but now I can't find it. I never pushed it to the CF since it interacts so badly with the kernel. I was thinking about pushing the small part that is a net win in all cases, the back-sequential patch, but that's independent of any spindle count. It's more related to rotating media and read request merges than it is to multiple spindles or parallelization. The kernel guys basically are waiting for me to patch the kernel. I think I convinced our IT guy at the office to lend me a machine for tests... so it might happen soon. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Parallel query execution
On Wed, Jan 16, 2013 at 05:04:05PM -0800, Jeff Janes wrote: On Tuesday, January 15, 2013, Stephen Frost wrote: * Gavin Flower (gavinflo...@archidevsys.co.nz) wrote: How about being aware of multiple spindles - so if the requested data covers multiple spindles, then data could be extracted in parallel. This may, or may not, involve multiple I/O channels? Yes, this should dovetail with partitioning and tablespaces to pick up on exactly that. I'd rather not have the benefits of parallelism be tied to partitioning if we can help it. Hopefully implementing parallelism in core would result in something more transparent than that. We will need a way to know we are not saturating the I/O channel with random I/O that could have been sequential if it was single-threaded. Tablespaces give us that info; not sure what else does. -- Bruce Momjian br...@momjian.ushttp://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Parallel query execution
On Wed, Jan 16, 2013 at 11:44 PM, Bruce Momjian br...@momjian.us wrote: On Wed, Jan 16, 2013 at 05:04:05PM -0800, Jeff Janes wrote: On Tuesday, January 15, 2013, Stephen Frost wrote: * Gavin Flower (gavinflo...@archidevsys.co.nz) wrote: How about being aware of multiple spindles - so if the requested data covers multiple spindles, then data could be extracted in parallel. This may, or may not, involve multiple I/O channels? Yes, this should dovetail with partitioning and tablespaces to pick up on exactly that. I'd rather not have the benefits of parallelism be tied to partitioning if we can help it. Hopefully implementing parallelism in core would result in something more transparent than that. We will need a way to know we are not saturating the I/O channel with random I/O that could have been sequential if it was single-threaded. Tablespaces give us that info; not sure what else does. I do also think tablespaces are a safe bet. But it wouldn't help for parallelizing sorts or other operations with tempfiles (tempfiles reside on the same tablespace), or even over a single table (same tablespace again). And when the query is CPU-bound, it could be parallelized by simply making a multithreaded memory sort. Well, not so simply, but I do think it's an important building block. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Parallel query execution
On Wed, Jan 16, 2013 at 11:56:21PM -0300, Claudio Freire wrote: On Wed, Jan 16, 2013 at 11:44 PM, Bruce Momjian br...@momjian.us wrote: On Wed, Jan 16, 2013 at 05:04:05PM -0800, Jeff Janes wrote: On Tuesday, January 15, 2013, Stephen Frost wrote: * Gavin Flower (gavinflo...@archidevsys.co.nz) wrote: How about being aware of multiple spindles - so if the requested data covers multiple spindles, then data could be extracted in parallel. This may, or may not, involve multiple I/O channels? Yes, this should dovetail with partitioning and tablespaces to pick up on exactly that. I'd rather not have the benefits of parallelism be tied to partitioning if we can help it. Hopefully implementing parallelism in core would result in something more transparent than that. We will need a way to know we are not saturating the I/O channel with random I/O that could have been sequential if it was single-threaded. Tablespaces give us that info; not sure what else does. I do also think tablespaces are a safe bet. But it wouldn't help for parallelizing sorts or other operations with tempfiles (tempfiles reside on the same tablespace), or even over a single table (same tablespace again). We can round-robin temp tablespace usage if you list multiple entries. And when the query is CPU-bound, it could be parallelized by simply making a multithreaded memory sort. Well, not so simply, but I do think it's an important building block. Yes, and detecting when to use these parallel features will be hard. -- Bruce Momjian br...@momjian.us http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. +
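As a rough illustration of the round-robin remark above, here is a minimal sketch of rotating temporary files across several tablespaces. It is not PostgreSQL code: the function name, the temp file names, and the tablespace names are all made up for the example, and it only assumes that more than one temp tablespace has been listed.

```python
from itertools import cycle


def assign_temp_tablespaces(temp_files, tablespaces):
    """Assign each temp file to a tablespace in round-robin order.

    Both arguments are plain lists of names; nothing here talks to a
    real server. All names are hypothetical.
    """
    if not tablespaces:
        raise ValueError("need at least one tablespace")
    rotation = cycle(tablespaces)  # endlessly repeats the listed entries
    return {name: next(rotation) for name in temp_files}


# Five hypothetical sort spill files spread over two tablespaces.
placement = assign_temp_tablespaces(
    ["sort_0", "sort_1", "sort_2", "sort_3", "sort_4"],
    ["ts_a", "ts_b"],
)
```

With two entries listed, consecutive temp files alternate between them, so a sort that spills several files would not put them all on the same device.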
Re: [HACKERS] Parallel query execution
On Wednesday, January 16, 2013, Stephen Frost wrote: * Bruce Momjian (br...@momjian.us) wrote: I am not sure how a COPY could be easily parallelized, but I supposed it could be done as part of the 1GB segment feature. People have complained that COPY is CPU-bound, so it might be very interesting to see if we could offload some of that parsing overhead to a child. COPY can certainly be CPU bound but before we can parallelize that usefully we need to solve the problem around extent locking when trying to do multiple COPY's to the same table. I think that is rather over-stating it. Even with unindexed untriggered tables, I can get some benefit from doing hand-rolled parallel COPY before the extension lock becomes an issue, at least on some machines. And with triggered or indexed tables, all the more so. Cheers, Jeff
[HACKERS] Parallel query execution
I mentioned last year that I wanted to start working on parallelism: https://wiki.postgresql.org/wiki/Parallel_Query_Execution Years ago I added thread-safety to libpq. Recently I added two parallel execution paths to pg_upgrade. The first parallel path allows execution of external binaries pg_dump and psql (to restore). The second parallel path does copy/link by calling fork/thread-safe C functions. I was able to do each in 2-3 days. I believe it is time to start adding parallel execution to the backend. We already have some parallelism in the backend: effective_io_concurrency and helper processes. I think it is time we start to consider additional options. Parallelism isn't going to help all queries, in fact it might be just a small subset, but it will be the larger queries. The pg_upgrade parallelism only helps clusters with multiple databases or tablespaces, but the improvements are significant. I have summarized my ideas by updating our Parallel Query Execution wiki page: https://wiki.postgresql.org/wiki/Parallel_Query_Execution Please consider updating the page yourself or posting your ideas to this thread. Thanks. -- Bruce Momjian br...@momjian.ushttp://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Parallel query execution
* Bruce Momjian (br...@momjian.us) wrote: Parallelism isn't going to help all queries, in fact it might be just a small subset, but it will be the larger queries. The pg_upgrade parallelism only helps clusters with multiple databases or tablespaces, but the improvements are significant. This would be fantastic and I'd like to help. Parallel query and real partitioning are two of our biggest holes for OLAP and data warehouse users. Please consider updating the page yourself or posting your ideas to this thread. Thanks. Will do. Thanks, Stephen
Re: [HACKERS] Parallel query execution
On 15 January 2013 22:14, Bruce Momjian br...@momjian.us wrote: I believe it is time to start adding parallel execution to the backend. We already have some parallelism in the backend: effective_io_concurrency and helper processes. I think it is time we start to consider additional options. A few months back, I remarked [1] that speeding up sorting using pipelining and asynchronous I/O was probably parallelism low-hanging fruit. That hasn't changed, though I personally still don't have the bandwidth to look into it in a serious way. [1] http://www.postgresql.org/message-id/caeylb_vezpkdx54vex3x30oy_uoth89xoejjw6aucjjiujs...@mail.gmail.com -- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Parallel query execution
On Tue, Jan 15, 2013 at 10:39:10PM +, Peter Geoghegan wrote: On 15 January 2013 22:14, Bruce Momjian br...@momjian.us wrote: I believe it is time to start adding parallel execution to the backend. We already have some parallelism in the backend: effective_io_concurrency and helper processes. I think it is time we start to consider additional options. A few months back, I remarked [1] that speeding up sorting using pipelining and asynchronous I/O was probably parallelism low-hanging fruit. That hasn't changed, though I personally still don't have the bandwidth to look into it in a serious way. [1] http://www.postgresql.org/message-id/caeylb_vezpkdx54vex3x30oy_uoth89xoejjw6aucjjiujs...@mail.gmail.com OK, I added the link to the wiki. -- Bruce Momjian br...@momjian.ushttp://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Parallel query execution
On Tue, Jan 15, 2013 at 10:53:29PM +, Simon Riggs wrote: On 15 January 2013 22:14, Bruce Momjian br...@momjian.us wrote: I mentioned last year that I wanted to start working on parallelism: We don't normally begin discussing topics for next release just as a CF is starting. Why is this being discussed now? It is for 9.4 and will take months. I didn't think there was a better time. We don't usually discuss features during beta testing. -- Bruce Momjian br...@momjian.ushttp://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Parallel query execution
On 15 January 2013 22:14, Bruce Momjian br...@momjian.us wrote: I mentioned last year that I wanted to start working on parallelism: We don't normally begin discussing topics for the next release just as a CF is starting. Why is this being discussed now? -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services
Re: [HACKERS] Parallel query execution
On 15 January 2013 22:55, Bruce Momjian br...@momjian.us wrote: Why is this being discussed now? It is for 9.4 and will take months. I didn't think there was a better time. We don't usually discuss features during beta testing. Bruce, there are many, many patches on the queue. How will we ever get to beta testing if we begin open-ended discussions on the next release? If we can't finish what we've started for 9.3, why talk about 9.4? Yes, it's a great topic for discussion, but there are better times. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services
Re: [HACKERS] Parallel query execution
On 16/01/13 11:14, Bruce Momjian wrote: I mentioned last year that I wanted to start working on parallelism: https://wiki.postgresql.org/wiki/Parallel_Query_Execution Years ago I added thread-safety to libpq. Recently I added two parallel execution paths to pg_upgrade. The first parallel path allows execution of external binaries pg_dump and psql (to restore). The second parallel path does copy/link by calling fork/thread-safe C functions. I was able to do each in 2-3 days. I believe it is time to start adding parallel execution to the backend. We already have some parallelism in the backend: effective_io_concurrency and helper processes. I think it is time we start to consider additional options. Parallelism isn't going to help all queries, in fact it might be just a small subset, but it will be the larger queries. The pg_upgrade parallelism only helps clusters with multiple databases or tablespaces, but the improvements are significant. I have summarized my ideas by updating our Parallel Query Execution wiki page: https://wiki.postgresql.org/wiki/Parallel_Query_Execution Please consider updating the page yourself or posting your ideas to this thread. Thanks. Hmm... How about being aware of multiple spindles - so if the requested data covers multiple spindles, then data could be extracted in parallel. This may, or may not, involve multiple I/O channels? On large multiple processor machines, there are different blocks of memory that might be accessed at different speeds depending on the processor. Possibly a mechanism could be used to split a transaction over multiple processors to ensure the fastest memory is used? Once a selection of rows has been made, then if there is a lot of reformatting going on, then could this be done in parallel? I can of think of 2 very simplistic strategies: (A) use a different processor core for each column, or (B) farm out sets of rows to different cores. 
I am sure that in reality there are more subtleties, and aspects of both strategies will be used in a hybrid fashion along with other approaches. I expect that before any parallel algorithm is invoked, some sort of threshold needs to be exceeded to make it worthwhile. Different aspects of the parallel algorithm may have their own thresholds. It may not be worth applying a parallel algorithm for 10 rows from a simple table, but selecting 10,000 records from multiple tables each over 10 million rows using joins may benefit from more extreme parallelism. I expect that UNIONs, as well as the processing of partitioned tables, may be amenable to parallel processing. Cheers, Gavin
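The threshold idea can be made concrete with a toy cost comparison. This is only a sketch under invented assumptions: the linear cost model, the parameter names, and the numbers are illustrative and do not come from the thread or from PostgreSQL's planner.

```python
def should_parallelize(rows, cost_per_row, workers, startup_cost_per_worker):
    """Return True when the estimated parallel cost beats the serial cost.

    Toy model (an assumption for illustration only):
      serial   = rows * cost_per_row
      parallel = workers * startup_cost_per_worker + serial / workers
    """
    serial = rows * cost_per_row
    parallel = workers * startup_cost_per_worker + serial / workers
    return parallel < serial


# Hypothetical numbers: starting a worker costs as much as 1000 rows.
small_query = should_parallelize(10, 1.0, 4, 1000.0)
large_query = should_parallelize(10_000_000, 1.0, 4, 1000.0)
```

Under these made-up costs the 10-row query stays serial, because worker startup dwarfs the work, while the 10-million-row query clears the threshold easily.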
Re: [HACKERS] Parallel query execution
On Tue, Jan 15, 2013 at 11:01:04PM +, Simon Riggs wrote: On 15 January 2013 22:55, Bruce Momjian br...@momjian.us wrote: Why is this being discussed now? It is for 9.4 and will take months. I didn't think there was a better time. We don't usually discuss features during beta testing. Bruce, there are many, many patches on the queue. How will we ever get to beta testing if we begin open ended discussions on next release? If we can't finish what we've started for 9.3, why talk about 9.4? Yes, its a great topic for discussion, but there are better times. Like when? I don't remember a policy of not discussing things now. Does anyone else remember this? Are you saying feature discussion is only between commit-fests? Is this written down anywhere? I only remember beta-time as a time not to discuss features. -- Bruce Momjian br...@momjian.ushttp://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Parallel query execution
On Wed, Jan 16, 2013 at 12:03:50PM +1300, Gavin Flower wrote: On 16/01/13 11:14, Bruce Momjian wrote: I mentioned last year that I wanted to start working on parallelism: https://wiki.postgresql.org/wiki/Parallel_Query_Execution Years ago I added thread-safety to libpq. Recently I added two parallel execution paths to pg_upgrade. The first parallel path allows execution of external binaries pg_dump and psql (to restore). The second parallel path does copy/link by calling fork/thread-safe C functions. I was able to do each in 2-3 days. I believe it is time to start adding parallel execution to the backend. We already have some parallelism in the backend: effective_io_concurrency and helper processes. I think it is time we start to consider additional options. Parallelism isn't going to help all queries, in fact it might be just a small subset, but it will be the larger queries. The pg_upgrade parallelism only helps clusters with multiple databases or tablespaces, but the improvements are significant. I have summarized my ideas by updating our Parallel Query Execution wiki page: https://wiki.postgresql.org/wiki/Parallel_Query_Execution Please consider updating the page yourself or posting your ideas to this thread. Thanks. Hmm... How about being aware of multiple spindles - so if the requested data covers multiple spindles, then data could be extracted in parallel. This may, or may not, involve multiple I/O channels? Well, we usually label these as tablespaces. I don't know if spindle-level is a reasonable level to add. On large multiple processor machines, there are different blocks of memory that might be accessed at different speeds depending on the processor. Possibly a mechanism could be used to split a transaction over multiple processors to ensure the fastest memory is used? That seems too far-out for an initial approach. Once a selection of rows has been made, then if there is a lot of reformatting going on, then could this be done in parallel? 
I can of think of 2 very simplistic strategies: (A) use a different processor core for each column, or (B) farm out sets of rows to different cores. I am sure in reality, there are more subtleties and aspects of both the strategies will be used in a hybrid fashion along with other approaches. Probably #2, but that is going to require having some of modules thread/fork-safe, and that is going to be tricky. I expect that before any parallel algorithm is invoked, then some sort of threshold needs to be exceeded to make it worth while. Different aspects of the parallel algorithm may have their own thresholds. It may not be worth applying a parallel algorithm for 10 rows from a simple table, but selecting 10,000 records from multiple tables each over 10 million rows using joins may benefit for more extreme parallelism. Right, I bet we will need some way to control when the overhead of parallel execution is worth it. I expect that UNIONs, as well as the processing of partitioned tables, may be amenable to parallel processing. Interesting idea on UNION. -- Bruce Momjian br...@momjian.ushttp://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Parallel query execution
* Gavin Flower (gavinflo...@archidevsys.co.nz) wrote: How about being aware of multiple spindles - so if the requested data covers multiple spindles, then data could be extracted in parallel. This may, or may not, involve multiple I/O channels? Yes, this should dovetail with partitioning and tablespaces to pick up on exactly that. We're implementing our own poor-man's parallelism using exactly this to use as much of the CPU and I/O bandwidth as we can. I have every confidence that it could be done better and be simpler for us if it was handled in the backend. On large multiple processor machines, there are different blocks of memory that might be accessed at different speeds depending on the processor. Possibly a mechanism could be used to split a transaction over multiple processors to ensure the fastest memory is used? Let's work on getting it working on the h/w that PG is most commonly deployed on first.. I agree that we don't want to paint ourselves into a corner with this, but I don't think massive NUMA systems are what we should focus on first (are you familiar with any that run PG today..?). I don't expect we're going to be trying to fight with the Linux (or whatever) kernel over what threads run on what processors with access to what memory on small-NUMA systems (x86-based). Once a selection of rows has been made, then if there is a lot of reformatting going on, then could this be done in parallel? I can of think of 2 very simplistic strategies: (A) use a different processor core for each column, or (B) farm out sets of rows to different cores. I am sure in reality, there are more subtleties and aspects of both the strategies will be used in a hybrid fashion along with other approaches. Given our row-based storage architecture, I can't imagine we'd do anything other than take a row-based approach to this.. 
I would think we'd do two things: parallelize based on partitioning, and parallelize seqscans across the individual heap files which are split on a per-1G boundary already. Perhaps we can generalize that and scale it based on the number of available processors and the size of the relation but I could see advantages in matching up with what the kernel thinks are independent files. I expect that before any parallel algorithm is invoked, then some sort of threshold needs to be exceeded to make it worth while. Certainly. That will need to be included in the optimization model to support this. Thanks, Stephen
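To make the per-1GB-segment idea concrete, here is a small sketch that splits a relation of a given size into 1GB segments and deals them out to a bounded number of workers. It is an illustration only: the function names and the simple modulo assignment are assumptions, not anything proposed in the thread or implemented in PostgreSQL.

```python
SEGMENT_BYTES = 1 << 30  # heap files are split on a 1GB boundary


def segment_count(relation_bytes):
    """Number of 1GB segment files needed for a relation (at least one)."""
    return max(1, -(-relation_bytes // SEGMENT_BYTES))  # ceiling division


def assign_segments(relation_bytes, workers):
    """Deal segment numbers out to workers; never use more workers than segments."""
    if workers < 1:
        raise ValueError("need at least one worker")
    segments = segment_count(relation_bytes)
    workers = min(workers, segments)
    plan = [[] for _ in range(workers)]
    for seg in range(segments):
        plan[seg % workers].append(seg)
    return plan


# A hypothetical relation slightly over 5GB, scanned by up to 4 workers.
plan = assign_segments(5 * SEGMENT_BYTES + 1, 4)
```

Capping the worker count at the number of segments is one way to scale with both the relation size and the available processors: a table smaller than 1GB would simply be scanned by a single worker.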
Re: [HACKERS] Parallel query execution
On Tue, Jan 15, 2013 at 06:15:57PM -0500, Stephen Frost wrote: * Gavin Flower (gavinflo...@archidevsys.co.nz) wrote: How about being aware of multiple spindles - so if the requested data covers multiple spindles, then data could be extracted in parallel. This may, or may not, involve multiple I/O channels? Yes, this should dovetail with partitioning and tablespaces to pick up on exactly that. We're implementing our own poor-man's parallelism using exactly this to use as much of the CPU and I/O bandwidth as we can. I have every confidence that it could be done better and be simpler for us if it was handled in the backend. Yes, I have listed tablespaces and partitions as possible parallel options on the wiki. On large multiple processor machines, there are different blocks of memory that might be accessed at different speeds depending on the processor. Possibly a mechanism could be used to split a transaction over multiple processors to ensure the fastest memory is used? Let's work on getting it working on the h/w that PG is most commonly deployed on first.. I agree that we don't want to paint ourselves into a corner with this, but I don't think massive NUMA systems are what we should focus on first (are you familiar with any that run PG today..?). I don't expect we're going to be trying to fight with the Linux (or whatever) kernel over what threads run on what processors with access to what memory on small-NUMA systems (x86-based). Agreed. Once a selection of rows has been made, then if there is a lot of reformatting going on, then could this be done in parallel? I can of think of 2 very simplistic strategies: (A) use a different processor core for each column, or (B) farm out sets of rows to different cores. I am sure in reality, there are more subtleties and aspects of both the strategies will be used in a hybrid fashion along with other approaches. 
Given our row-based storage architecture, I can't imagine we'd do anything other than take a row-based approach to this.. I would think we'd do two things: parallelize based on partitioning, and parallelize seqscan's across the individual heap files which are split on a per-1G boundary already. Perhaps we can generalize that and scale it based on the number of available processors and the size of the relation but I could see advantages in matching up with what the kernel thinks are independent files. The 1GB idea is interesting. I found in pg_upgrade that file copy would just overwhelm the I/O channel, and that doing multiple copies on the same device had no win, but those were pure I/O operations --- a sequential scan might be enough of a mix of I/O and CPU that parallelism might help. I expect that before any parallel algorithm is invoked, then some sort of threshold needs to be exceeded to make it worth while. Certainly. That's need to be included in the optimization model to support this. I have updated the wiki to reflect the ideas mentioned above. -- Bruce Momjian br...@momjian.ushttp://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Parallel query execution
On Wed, Jan 16, 2013 at 7:14 AM, Bruce Momjian br...@momjian.us wrote: I mentioned last year that I wanted to start working on parallelism: https://wiki.postgresql.org/wiki/Parallel_Query_Execution Years ago I added thread-safety to libpq. Recently I added two parallel execution paths to pg_upgrade. The first parallel path allows execution of external binaries pg_dump and psql (to restore). The second parallel path does copy/link by calling fork/thread-safe C functions. I was able to do each in 2-3 days. I believe it is time to start adding parallel execution to the backend. We already have some parallelism in the backend: effective_io_concurrency and helper processes. I think it is time we start to consider additional options. Parallelism isn't going to help all queries, in fact it might be just a small subset, but it will be the larger queries. The pg_upgrade parallelism only helps clusters with multiple databases or tablespaces, but the improvements are significant. I have summarized my ideas by updating our Parallel Query Execution wiki page: https://wiki.postgresql.org/wiki/Parallel_Query_Execution Please consider updating the page yourself or posting your ideas to this thread. Thanks. Honestly that would be a great feature, and I would be happy to help work on it. Taking advantage of parallelism in a server with multiple cores, especially for things like large sorting operations, would be great. Just thinking out loud, but wouldn't it be the role of the planner to determine whether a given query is worth running in parallel? The executor would then be in charge of actually firing, in parallel, the tasks that the planner has determined are necessary. -- Michael Paquier http://michael.otacoo.com
Re: [HACKERS] Parallel query execution
On Wed, Jan 16, 2013 at 09:11:20AM +0900, Michael Paquier wrote: On Wed, Jan 16, 2013 at 7:14 AM, Bruce Momjian br...@momjian.us wrote: I mentioned last year that I wanted to start working on parallelism: https://wiki.postgresql.org/wiki/Parallel_Query_Execution Years ago I added thread-safety to libpq. Recently I added two parallel execution paths to pg_upgrade. The first parallel path allows execution of external binaries pg_dump and psql (to restore). The second parallel path does copy/link by calling fork/thread-safe C functions. I was able to do each in 2-3 days. I believe it is time to start adding parallel execution to the backend. We already have some parallelism in the backend: effective_io_concurrency and helper processes. I think it is time we start to consider additional options. Parallelism isn't going to help all queries, in fact it might be just a small subset, but it will be the larger queries. The pg_upgrade parallelism only helps clusters with multiple databases or tablespaces, but the improvements are significant. I have summarized my ideas by updating our Parallel Query Execution wiki page: https://wiki.postgresql.org/wiki/Parallel_Query_Execution Please consider updating the page yourself or posting your ideas to this thread. Thanks. Honestly that would be a great feature, and I would be happy helping working on it. Taking advantage of parallelism in a server with multiple core, especially for things like large sorting operations would be great. Just thinking loudly, but wouldn't it be the role of the planner to determine if such or such query is worth using parallelism? The executor would then be in charge of actually firing the tasks in parallel that planner has determined necessary to do. Yes, it would probably be driven off of the optimizer statistics. -- Bruce Momjian br...@momjian.ushttp://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. 
+
Re: [HACKERS] Parallel query execution
On Tue, Jan 15, 2013 at 8:19 PM, Bruce Momjian br...@momjian.us wrote:
>> Given our row-based storage architecture, I can't imagine we'd do anything other than take a row-based approach to this. I would think we'd do two things: parallelize based on partitioning, and parallelize seqscans across the individual heap files which are split on a per-1G boundary already. Perhaps we can generalize that and scale it based on the number of available processors and the size of the relation but I could see advantages in matching up with what the kernel thinks are independent files.
>
> The 1GB idea is interesting. I found in pg_upgrade that file copy would just overwhelm the I/O channel, and that doing multiple copies on the same device had no win, but those were pure I/O operations --- a sequential scan might be enough of a mix of I/O and CPU that parallelism might help.

AFAIR, synchroscans were introduced because multiple large sequential scans were counterproductive (big time).
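[Editorial sketch, not part of the original message.] The idea of splitting a sequential scan along the existing per-1GB heap files can be illustrated roughly as follows. This is only a sketch under stated assumptions: the 1 GiB segment size matches the PostgreSQL default, but `segments_for`, `assign_segments`, and the round-robin assignment are hypothetical names and choices made for illustration, not anything proposed in the thread or present in the PostgreSQL source.

```python
from typing import List

# Assumption: segment files are 1 GiB, the PostgreSQL default.
SEGMENT_BYTES = 1 << 30


def segments_for(relation_bytes: int) -> int:
    """Number of segment files needed to hold a relation of this size."""
    # Integer ceiling division; an empty relation still has one segment.
    return max(1, (relation_bytes + SEGMENT_BYTES - 1) // SEGMENT_BYTES)


def assign_segments(relation_bytes: int, workers: int) -> List[List[int]]:
    """Deal segment numbers out to workers round-robin (hypothetical scheme)."""
    if workers < 1:
        raise ValueError("need at least one worker")
    plan: List[List[int]] = [[] for _ in range(workers)]
    for seg in range(segments_for(relation_bytes)):
        plan[seg % workers].append(seg)
    return plan


# A relation slightly over 5 GiB spans six segment files (0..5).
plan = assign_segments(5 * SEGMENT_BYTES + 1, workers=4)
```

Each worker would then scan only the segment files in its own list. Whether that helps depends on the I/O concerns raised in the replies that follow.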
Re: [HACKERS] Parallel query execution
* Claudio Freire (klaussfre...@gmail.com) wrote:
> On Tue, Jan 15, 2013 at 8:19 PM, Bruce Momjian br...@momjian.us wrote:
>> The 1GB idea is interesting. I found in pg_upgrade that file copy would just overwhelm the I/O channel, and that doing multiple copies on the same device had no win, but those were pure I/O operations --- a sequential scan might be enough of a mix of I/O and CPU that parallelism might help.
>
> AFAIR, synchroscans were introduced because multiple large sequential scans were counterproductive (big time).

Sequentially scanning the *same* data over and over is certainly counterproductive. Synchroscans fixed that, yes. That's not what we're talking about though- we're talking about scanning and processing independent sets of data using multiple processes. It's certainly possible that in some cases that won't be as good, but there will be quite a few cases where it's much, much better.

Consider a very complicated function running against each row which makes the CPU the bottleneck instead of the I/O system. That type of query will never run faster than a single CPU in a single-process environment, regardless of whether you have synch-scans or not, while in a multi-process environment you'll take advantage of the extra CPUs which are available and use more of the I/O bandwidth that isn't yet exhausted.

Thanks, Stephen
Re: [HACKERS] Parallel query execution
On Wed, Jan 16, 2013 at 12:13 AM, Stephen Frost sfr...@snowman.net wrote:
> * Claudio Freire (klaussfre...@gmail.com) wrote:
>> On Tue, Jan 15, 2013 at 8:19 PM, Bruce Momjian br...@momjian.us wrote:
>>> The 1GB idea is interesting. I found in pg_upgrade that file copy would just overwhelm the I/O channel, and that doing multiple copies on the same device had no win, but those were pure I/O operations --- a sequential scan might be enough of a mix of I/O and CPU that parallelism might help.
>>
>> AFAIR, synchroscans were introduced because multiple large sequential scans were counterproductive (big time).
>
> Sequentially scanning the *same* data over and over is certainly counterproductive. Synchroscans fixed that, yes. That's not what we're talking about though- we're talking about scanning and processing independent sets of data using multiple processes.

I don't see the difference. Blocks are blocks (unless they're cached).

> It's certainly possible that in some cases that won't be as good

If memory serves me correctly (and it does, I suffered it a lot), the performance hit is quite considerable. Enough to make it a lot worse rather than not as good.

> but there will be quite a few cases where it's much, much better.

Just cached segments.
Re: [HACKERS] Parallel query execution
* Claudio Freire (klaussfre...@gmail.com) wrote:
> On Wed, Jan 16, 2013 at 12:13 AM, Stephen Frost sfr...@snowman.net wrote:
>> Sequentially scanning the *same* data over and over is certainly counterproductive. Synchroscans fixed that, yes. That's not what we're talking about though- we're talking about scanning and processing independent sets of data using multiple processes.
>
> I don't see the difference. Blocks are blocks (unless they're cached).

Not quite. Having to go out to the kernel isn't free. Additionally, the seq scans used to pollute our shared buffers prior to synch-scanning, which didn't help things.

>> It's certainly possible that in some cases that won't be as good
>
> If memory serves me correctly (and it does, I suffered it a lot), the performance hit is quite considerable. Enough to make it a lot worse rather than not as good.

I feel like we must not be communicating very well. If the CPU is pegged at 100% and the I/O system is at 20%, adding another CPU at 100% will bring the I/O load up to 40% and you're now processing data twice as fast overall. If you're running a single CPU at 20% and your I/O system is at 100%, then adding another CPU isn't going to help and may even degrade performance by causing problems for the I/O system. The goal of the optimizer will be to model the plan to account for exactly that, as best it can.

>> but there will be quite a few cases where it's much, much better.
>
> Just cached segments.

No, certainly not just cached segments. Any situation where the CPU is the bottleneck.

Thanks, Stephen
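[Editorial sketch, not part of the original message.] The arithmetic in the message above amounts to a simple "whichever limit is hit first" model. The sketch below assumes each worker can process data at a fixed rate and the I/O system has a fixed capacity; the function names and the concrete MB/s figures are hypothetical, chosen only to reproduce the 20% and 40% utilization figures from the text. It deliberately ignores the contention effects raised elsewhere in the thread.

```python
def scan_rate(workers: int, cpu_rate_mb_s: float, io_capacity_mb_s: float) -> float:
    """Overall processing rate: CPU-bound until the I/O capacity is reached."""
    return min(workers * cpu_rate_mb_s, io_capacity_mb_s)


def io_utilization(workers: int, cpu_rate_mb_s: float, io_capacity_mb_s: float) -> float:
    """Fraction of the I/O capacity that the scan actually consumes."""
    return scan_rate(workers, cpu_rate_mb_s, io_capacity_mb_s) / io_capacity_mb_s


# CPU-bound case: one worker handles 10 MB/s against a 50 MB/s I/O system.
one_worker = scan_rate(1, 10.0, 50.0)       # 10 MB/s at 20% I/O utilization
two_workers = scan_rate(2, 10.0, 50.0)      # 20 MB/s at 40% I/O utilization

# I/O-bound case: a worker could handle 250 MB/s but the disk delivers 50.
io_bound_one = scan_rate(1, 250.0, 50.0)
io_bound_two = scan_rate(2, 250.0, 50.0)    # no gain from the second worker
```

In this linear model a second worker doubles throughput only while the I/O system has headroom, which is the case the optimizer would need to recognize.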
Re: [HACKERS] Parallel query execution
>> but there will be quite a few cases where it's much, much better.
>
> Just cached segments.

Actually, thanks to much faster storage (think SSD, SAN), it's easily possible for PostgreSQL to become CPU-limited on a seq scan query, even when reading from disk.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
Re: [HACKERS] Parallel query execution
* Josh Berkus (j...@agliodbs.com) wrote:
> Actually, thanks to much faster storage (think SSD, SAN), it's easily possible for PostgreSQL to become CPU-limited on a seq scan query, even when reading from disk.

Particularly with a complex filter being applied or if it's feeding into something above that's expensive.

Thanks, Stephen
Re: [HACKERS] Parallel query execution
Claudio, Stephen,

It really seems like the areas where we could get the most bang for the buck in parallelism would be:

1. Parallel sort
2. Parallel aggregation (for commutative aggregates)
3. Parallel nested loop join (especially for expression joins, like GIS)

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
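[Editorial sketch, not part of the original message.] Item 2 in the list above rests on the observation that some aggregates can be computed as small partial states that are merged afterwards in any order. The sketch below shows that structure for an average, using a (sum, count) state. All names here are hypothetical, and the thread pool only illustrates the shape of the computation: CPython threads do not provide real CPU parallelism for this kind of work, and a database backend would use separate processes.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def partial_avg(chunk):
    """Per-worker step: reduce one chunk to a small (sum, count) state."""
    return (sum(chunk), len(chunk))


def combine(a, b):
    """Merge two partial states; the result does not depend on the order."""
    return (a[0] + b[0], a[1] + b[1])


def parallel_avg(values, workers=4):
    """Average computed from per-chunk partial states merged at the end."""
    chunks = [values[i::workers] for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        states = list(pool.map(partial_avg, chunks))
    total, count = reduce(combine, states, (0, 0))
    return total / count if count else None


result = parallel_avg(list(range(1, 101)))
```

An aggregate whose result depends on input order, such as string concatenation without an explicit ordering, does not fit this pattern, which is presumably why the list restricts itself to commutative aggregates.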
Re: [HACKERS] Parallel query execution
On Wed, Jan 16, 2013 at 1:22 PM, Josh Berkus j...@agliodbs.com wrote:
> Claudio, Stephen,
>
> It really seems like the areas where we could get the most bang for the buck in parallelism would be:
>
> 1. Parallel sort
> 2. Parallel aggregation (for commutative aggregates)
> 3. Parallel nested loop join (especially for expression joins, like GIS)

parallel data load? :/

--
Michael Paquier
http://michael.otacoo.com
Re: [HACKERS] Parallel query execution
On Wed, Jan 16, 2013 at 01:28:18PM +0900, Michael Paquier wrote:
> On Wed, Jan 16, 2013 at 1:22 PM, Josh Berkus j...@agliodbs.com wrote:
>> Claudio, Stephen,
>>
>> It really seems like the areas where we could get the most bang for the buck in parallelism would be:
>>
>> 1. Parallel sort
>> 2. Parallel aggregation (for commutative aggregates)
>> 3. Parallel nested loop join (especially for expression joins, like GIS)
>
> parallel data load? :/

We have that in pg_restore, and I think we are getting parallel dump in 9.3, right? Unfortunately, I don't see it in the last 9.3 commit-fest. Is it still being worked on?

--
Bruce Momjian br...@momjian.us http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +
Re: [HACKERS] Parallel query execution
On Wed, Jan 16, 2013 at 1:32 PM, Bruce Momjian br...@momjian.us wrote:
> On Wed, Jan 16, 2013 at 01:28:18PM +0900, Michael Paquier wrote:
>> On Wed, Jan 16, 2013 at 1:22 PM, Josh Berkus j...@agliodbs.com wrote:
>>> Claudio, Stephen,
>>>
>>> It really seems like the areas where we could get the most bang for the buck in parallelism would be:
>>>
>>> 1. Parallel sort
>>> 2. Parallel aggregation (for commutative aggregates)
>>> 3. Parallel nested loop join (especially for expression joins, like GIS)
>>
>> parallel data load? :/
>
> We have that in pg_restore, and I think we are getting parallel dump in 9.3, right? Unfortunately, I don't see it in the last 9.3 commit-fest. Is it still being worked on?

Not exactly, I meant something like being able to use parallel processing when doing INSERT or COPY directly in core. If there is a parallel processing infrastructure, it could also be used for such write operations. I agree that the cases mentioned by Josh are far more appealing though...

--
Michael Paquier
http://michael.otacoo.com
Re: [HACKERS] Parallel query execution
On Wed, Jan 16, 2013 at 12:55 AM, Stephen Frost sfr...@snowman.net wrote:
>> If memory serves me correctly (and it does, I suffered it a lot), the performance hit is quite considerable. Enough to make it a lot worse rather than not as good.
>
> I feel like we must not be communicating very well. If the CPU is pegged at 100% and the I/O system is at 20%, adding another CPU at 100% will bring the I/O load up to 40% and you're now processing data twice as fast overall

Well, there's the fault in your logic. It won't be as linear. Adding another sequential scan will decrease bandwidth: if the I/O system was doing say 10MB/s at 20% load, now it will be doing 20MB/s at 80% load (maybe even worse). Quite suddenly you'll meet diminishing returns, and the I/O subsystem which wasn't the bottleneck will become it, bandwidth being the key. You might end up with less bandwidth than you started with, if you go far enough past that knee.

Add some concurrent operations (connections) to the mix and it just gets worse. Figuring out where the knee is may be the hardest problem you'll face. I don't think it'll be predictable enough to make I/O parallelization in that case worth the effort.

If you instead think of parallelizing random I/O (say index scans within nested loops), that might work (or it might not). Again it depends a helluva lot on what else is contending with the I/O resources and how far ahead of optimum you push it. I've faced this problem when trying to prefetch on index scans. If you try to prefetch too much, you induce extra delays and it's a bad tradeoff.

Feel free to do your own testing.
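[Editorial sketch, not part of the original message.] The "knee" described above can be illustrated with a toy model in which the aggregate bandwidth of the I/O system shrinks as more concurrent sequential streams compete for it. The formula, the `seek_penalty` parameter, and every number below are assumptions invented for illustration; they are not measurements and not a claim about how any real storage device behaves.

```python
def disk_bandwidth(streams: int, peak_mb_s: float, seek_penalty: float) -> float:
    """Toy model: each extra concurrent stream adds seek overhead."""
    return peak_mb_s / (1.0 + seek_penalty * (streams - 1))


def scan_rate_with_contention(streams: int, cpu_rate_mb_s: float,
                              peak_mb_s: float, seek_penalty: float) -> float:
    """Throughput is capped by CPU or by the degraded disk bandwidth."""
    return min(streams * cpu_rate_mb_s,
               disk_bandwidth(streams, peak_mb_s, seek_penalty))


def best_stream_count(max_streams: int, cpu_rate_mb_s: float,
                      peak_mb_s: float, seek_penalty: float) -> int:
    """Stream count with the highest modeled throughput (the knee)."""
    return max(range(1, max_streams + 1),
               key=lambda n: scan_rate_with_contention(
                   n, cpu_rate_mb_s, peak_mb_s, seek_penalty))


# Hypothetical figures: 10 MB/s per worker, 50 MB/s peak, penalty 0.5.
knee = best_stream_count(10, 10.0, 50.0, 0.5)
single = scan_rate_with_contention(1, 10.0, 50.0, 0.5)
overloaded = scan_rate_with_contention(10, 10.0, 50.0, 0.5)
```

With these made-up parameters, throughput rises up to three streams and then falls, and ten streams end up slower than one. The practical difficulty raised in the message is that the real penalty is unknown and changes with whatever else is using the I/O system.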
Re: [HACKERS] Parallel query execution
Bruce Momjian wrote:
> On Wed, Jan 16, 2013 at 01:28:18PM +0900, Michael Paquier wrote:
>> On Wed, Jan 16, 2013 at 1:22 PM, Josh Berkus j...@agliodbs.com wrote:
>>> Claudio, Stephen,
>>>
>>> It really seems like the areas where we could get the most bang for the buck in parallelism would be:
>>>
>>> 1. Parallel sort
>>> 2. Parallel aggregation (for commutative aggregates)
>>> 3. Parallel nested loop join (especially for expression joins, like GIS)
>>
>> parallel data load? :/
>
> We have that in pg_restore, and I think we are getting parallel dump in 9.3, right? Unfortunately, I don't see it in the last 9.3 commit-fest. Is it still being worked on?

It's in the previous-to-last commitfest. IIRC that patch required review and testing from people with some Windows background.

There are still 34 items needing attention in CF3. I suggest that, if you have some spare time, your help would be very much appreciated there. The commitfest that started on Jan 15th has 65 extra items. Anything currently listed in CF3 can rightfully be considered to be part of CF4, too.

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] Parallel query execution
On Tuesday, January 15, 2013, Simon Riggs wrote:
> On 15 January 2013 22:55, Bruce Momjian br...@momjian.us wrote:
>>> Why is this being discussed now?
>>
>> It is for 9.4 and will take months. I didn't think there was a better time. We don't usually discuss features during beta testing.
>
> Bruce, there are many, many patches on the queue. How will we ever get to beta testing if we begin open ended discussions on next release? If we can't finish what we've started for 9.3, why talk about 9.4? Yes, it's a great topic for discussion, but there are better times.

Possibly so. But unless we are to introduce a thinkfest, how do we know when such a better time would be? Lately commit-fests have been basically a continuous thing, except during beta which would be an even worse time to discuss it.

I think that parallel execution is huge and probably more likely for 9.5 (10.0?) than 9.4 for the general case (maybe some special cases for 9.4, like index builds). Yet the single biggest risk I see to the future of the project is the lack of parallel execution.

Cheers, Jeff
Re: [HACKERS] Parallel query execution
Alvaro Herrera alvhe...@2ndquadrant.com writes:
> There are still 34 items needing attention in CF3. I suggest that, if you have some spare time, your help would be very much appreciated there. The commitfest that started on Jan 15th has 65 extra items. Anything currently listed in CF3 can rightfully be considered to be part of CF4, too.

In case you hadn't noticed, we've totally lost control of the CF process. Quite aside from the lack of progress on closing CF3, major hackers who should know better are submitting significant new feature patches now, despite our agreement in Ottawa that nothing big would be accepted after CF3. At this point I'd bet against releasing 9.3 during 2013.

regards, tom lane
[HACKERS] Parallel Query Execution Project
Hi all,

I'm interested in this parallel project: http://wiki.postgresql.org/wiki/Parallel_Query_Execution

But I can't find any discussion or current progress on the website. It seems to have been stalled for nearly a year?

Thanks,
Li Jie
Re: [HACKERS] Parallel Query Execution Project
Hi,

On 09/28/2010 07:24 AM, Li Jie wrote:
> I'm interested in this parallel project, http://wiki.postgresql.org/wiki/Parallel_Query_Execution But I can't find any discussion and current progress in the website, it seems to stop for nearly a year?

Yeah, I don't know of anybody really working on it ATM. If you are interested in a process based design, please have a look at the bgworker infrastructure stuff. It could be of help for a process-based implementation.

Regards
Markus Wanner
Re: [HACKERS] Parallel Query Execution Project
On Sep 28, 2010, at 10:15 AM, Markus Wanner wrote:
> Hi,
>
> On 09/28/2010 07:24 AM, Li Jie wrote:
>> I'm interested in this parallel project, http://wiki.postgresql.org/wiki/Parallel_Query_Execution But I can't find any discussion and current progress in the website, it seems to stop for nearly a year?
>
> Yeah, I don't know of anybody really working on it ATM. If you are interested in a process based design, please have a look at the bgworker infrastructure stuff. It could be of help for a process-based implementation.
>
> Regards
> Markus Wanner

yes, i don't know of anybody either. in addition to that it is more than a giant task. it means working on more than just one isolated part. practically i cannot think of any stage of query execution which would not need some changes. i don't see a feature like that within a realistic timeframe.

regards, hans

--
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de