Re: [HACKERS] CREATE TABLE with parallel workers, 10.0?
On Wed, Feb 15, 2017 at 8:13 PM, Andres Freund wrote:
> I don't think general INSERTs are safe, if you consider unique indexes and foreign keys (both setting xmax in the simple cases and multixacts are likely to be problematic).

There's no real problem with setting xmax or creating multixacts - I think - but there's definitely a problem if an INSERT can lead to the creation of a new combo CID, because we have no way at present of transmitting the new combo CID mapping to the workers, and if a worker sees a combo CID for which it doesn't have a mapping, the world blows up. Do you think an insert can trigger combo CID creation?

(Of course, now that we've got DSA, it wouldn't be nearly as hard to fix the combo CID synchronization problem.)

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
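[For readers following along: a combo CID arises when the same tuple is both created and deleted (or updated) within a single transaction, because the tuple header has only one field to hold either cmin or cmax. A minimal sketch of the classic trigger case — table and column names here are hypothetical, not from the thread:]

```sql
BEGIN;
INSERT INTO t VALUES (1);    -- tuple header records cmin of this command
DELETE FROM t WHERE a = 1;   -- same transaction must now also record cmax;
                             -- since there is only one header field, it is
                             -- replaced by a combo CID that maps to the
                             -- (cmin, cmax) pair in backend-local memory
COMMIT;
```

The question in the message above is whether a plain INSERT (as opposed to an insert-then-update/delete pattern like this one) can ever force creation of such a mapping, since parallel workers currently have no way to receive it.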
Re: [HACKERS] CREATE TABLE with parallel workers, 10.0?
On 2017-02-15 20:35:16 -0500, Stephen Frost wrote:
> Perhaps, but until we've got a system worked out for having the workers do the writes, we aren't getting anything. Being able to have the leader do the writes based on the tuples fed back from the workers is clearly better than nothing.

Never argued differently.
Re: [HACKERS] CREATE TABLE with parallel workers, 10.0?
Andres,

* Andres Freund (and...@anarazel.de) wrote:
> On 2017-02-15 20:28:43 -0500, Stephen Frost wrote:
> > * Andres Freund (and...@anarazel.de) wrote:
> > > On February 15, 2017 5:20:20 PM PST, Stephen Frost wrote:
> > > > In many cases, I expect this would work just as well, if not better, than trying to actually do writes in parallel.
> > >
> > > Why? IPCing tuples around is quite expensive. Or do you mean because it'll be more suitable because of the possible plans?
> >
> > Because I've seen some serious problems when trying to have multiple processes writing into the same relation due to the relation extension lock, cases where it was much faster to have each process write into its own table. Admittedly, we've improved things there, so perhaps this isn't an issue any longer, but we also don't yet really know what the locking implementation looks like for having multiple parallel workers writing into the same relation, so it may be that sending a few records back to the leader is cheaper than working out the locking to allow parallel workers to write to the same relation, or at least not any more expensive.
>
> I quite seriously doubt that you will get enough rows to the master via tuplequeues that it'll be faster than inserting on the workers, even with such scalability problems present. Even before the 9.6 improvements, and even more so after.

Perhaps, but until we've got a system worked out for having the workers do the writes, we aren't getting anything. Being able to have the leader do the writes based on the tuples fed back from the workers is clearly better than nothing.

Thanks!

Stephen
Re: [HACKERS] CREATE TABLE with parallel workers, 10.0?
On 2017-02-15 20:28:43 -0500, Stephen Frost wrote:
> Andres,
>
> * Andres Freund (and...@anarazel.de) wrote:
> > On February 15, 2017 5:20:20 PM PST, Stephen Frost wrote:
> > > In many cases, I expect this would work just as well, if not better, than trying to actually do writes in parallel.
> >
> > Why? IPCing tuples around is quite expensive. Or do you mean because it'll be more suitable because of the possible plans?
>
> Because I've seen some serious problems when trying to have multiple processes writing into the same relation due to the relation extension lock, cases where it was much faster to have each process write into its own table. Admittedly, we've improved things there, so perhaps this isn't an issue any longer, but we also don't yet really know what the locking implementation looks like for having multiple parallel workers writing into the same relation, so it may be that sending a few records back to the leader is cheaper than working out the locking to allow parallel workers to write to the same relation, or at least not any more expensive.

I quite seriously doubt that you will get enough rows to the master via tuplequeues that it'll be faster than inserting on the workers, even with such scalability problems present. Even before the 9.6 improvements, and even more so after.

Greetings,

Andres Freund
Re: [HACKERS] CREATE TABLE with parallel workers, 10.0?
Andres,

* Andres Freund (and...@anarazel.de) wrote:
> On February 15, 2017 5:20:20 PM PST, Stephen Frost wrote:
> > In many cases, I expect this would work just as well, if not better, than trying to actually do writes in parallel.
>
> Why? IPCing tuples around is quite expensive. Or do you mean because it'll be more suitable because of the possible plans?

Because I've seen some serious problems when trying to have multiple processes writing into the same relation due to the relation extension lock, cases where it was much faster to have each process write into its own table. Admittedly, we've improved things there, so perhaps this isn't an issue any longer, but we also don't yet really know what the locking implementation looks like for having multiple parallel workers writing into the same relation, so it may be that sending a few records back to the leader is cheaper than working out the locking to allow parallel workers to write to the same relation, or at least not any more expensive.

Thanks!

Stephen
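[The per-process-table pattern Stephen describes can be sketched roughly as follows; the table names and the split into two writers are illustrative only, not from the thread:]

```sql
-- Each writing process targets its own table, so no two writers ever
-- contend on the same relation extension lock.
CREATE TABLE results_w1 (LIKE results INCLUDING ALL);
CREATE TABLE results_w2 (LIKE results INCLUDING ALL);

-- process 1 runs:  INSERT INTO results_w1 SELECT ... ;
-- process 2 runs:  INSERT INTO results_w2 SELECT ... ;

-- Afterwards, combine the pieces (or simply query their union):
INSERT INTO results SELECT * FROM results_w1;
INSERT INTO results SELECT * FROM results_w2;
```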
Re: [HACKERS] CREATE TABLE with parallel workers, 10.0?
On February 15, 2017 5:20:20 PM PST, Stephen Frost wrote:
> In many cases, I expect this would work just as well, if not better, than trying to actually do writes in parallel.

Why? IPCing tuples around is quite expensive. Or do you mean because it'll be more suitable because of the possible plans?

Andres

--
Sent from my Android device with K-9 Mail. Please excuse my brevity.
Re: [HACKERS] CREATE TABLE with parallel workers, 10.0?
All,

* Andres Freund (and...@anarazel.de) wrote:
> On 2017-02-15 08:48:44 -0500, Robert Haas wrote:
> > The other way of fixing this problem is to have each worker generate a subset of the tuples and funnel them all back to the leader through a Gather node; the leader then does all the inserts. That avoids having to solve the problems mentioned above, but it probably doesn't perform nearly as well.
>
> I think it'd already be tremendously useful however. I think it'd not be an unreasonable first step. It'd be a good fallback that'd be useful for !insert and such anyway.

Absolutely. I had always figured this would be what we would do first, before coming up with something more clever down the road. In particular, this allows filters to be pushed down and performed in parallel, which may significantly reduce the result which is passed back up to the leader. In many cases, I expect this would work just as well, if not better, than trying to actually do writes in parallel.

Thanks!

Stephen
Re: [HACKERS] CREATE TABLE with parallel workers, 10.0?
Hi,

On 2017-02-15 08:48:44 -0500, Robert Haas wrote:
> We got rid of the major existing use of page locks in 6d46f4783efe457f74816a75173eb23ed8930020, which extirpated them from hash indexes, and I was kind of hoping they could go away altogether, but we can't do that as long as GIN is using them.

Learned a new word today.

> Anyway, if we solve those problems, we can allow inserts (not updates or deletes, those have other problems, principally relating to combo CIDs) in parallel mode, which would make it possible to allow the kinds of things you are asking about here.

I don't think general INSERTs are safe, if you consider unique indexes and foreign keys (both setting xmax in the simple cases and multixacts are likely to be problematic).

> The other way of fixing this problem is to have each worker generate a subset of the tuples and funnel them all back to the leader through a Gather node; the leader then does all the inserts. That avoids having to solve the problems mentioned above, but it probably doesn't perform nearly as well.

I think it'd already be tremendously useful however. I think it'd not be an unreasonable first step. It'd be a good fallback that'd be useful for !insert and such anyway.

Regards,

Andres
Re: [HACKERS] CREATE TABLE with parallel workers, 10.0?
On Thu, Feb 16, 2017 at 12:48 AM, Robert Haas wrote:
> On Wed, Feb 15, 2017 at 12:24 AM, Joshua Chamberlain wrote:
> > Hello,
> >
> > (I'm posting to hackers since I got no response on the general list.)
> >
> > I use Postgres + PostGIS quite heavily, and recently have been taking full advantage of the new parallelism in 9.6. I'm now running queries in a few hours that would otherwise take more than a day.
> >
> > However, parallelism is disabled for all queries that perform writes (as documented). I would normally run "CREATE TABLE AS [some super-expensive query]", but since that can't use parallelism I'm using the \o option in psql, creating the table separately, and then \copy-ing in the results. That works, but "CREATE TABLE AS" would be more convenient.
> >
> > Are there plans in 10.0 to allow parallelism in queries that write, or at least in "CREATE TABLE AS" queries? (Support in materialized views would be great, too!)
>
> Somebody else asked me about this in private email. Nobody at EnterpriseDB is working on this right now, and I don't think anybody else is working on it either. There are several ways to do it, but none of them are entirely easy and the ones likely to perform better are less easy.
>
> I think that what we need to do to make this work is come up with a way to fix the interaction between group locking (which allows multiple processes to hold mutually conflicting locks at the same time if they are in a parallel group) and relation extension (which uses heavyweight locking to prevent multiple processes from trying to extend the same relation at the same time). I think that perhaps the right way to solve that problem is to come up with a way of providing mutual exclusion around relation extension that doesn't need the heavyweight lock manager. Relation extension locks don't need multiple lock modes or deadlock detection or any of the other frammishes that the heavyweight lock manager provides, but they do need to be fast, which the heavyweight lock manager isn't especially good at. So moving this out of the heavyweight lock manager might be a way to solve two problems at once.
>
> There's also a hazard related to GIN indexes since e2c79e14d998cd31f860854bc9210b37b457bb01, which introduced a new use of Page locks, which have a similar kind of problem. We got rid of the major existing use of page locks in 6d46f4783efe457f74816a75173eb23ed8930020, which extirpated them from hash indexes, and I was kind of hoping they could go away altogether, but we can't do that as long as GIN is using them.
>
> Anyway, if we solve those problems, we can allow inserts (not updates or deletes, those have other problems, principally relating to combo CIDs) in parallel mode, which would make it possible to allow the kinds of things you are asking about here. Then you could fix things so that each worker generates a subset of the tuples and inserts the ones it generates. You'd end up with a parallel plan but no Gather node! The workers could feed tuples directly to a suitable DestReceiver, which would be really spiffy.
>
> The other way of fixing this problem is to have each worker generate a subset of the tuples and funnel them all back to the leader through a Gather node; the leader then does all the inserts. That avoids having to solve the problems mentioned above, but it probably doesn't perform nearly as well.

How about supporting something like this: the backend does the write operations, while the workers produce the results. This may not perform as well for all cases as having every parallel worker do the writes, but it could be useful for some scenarios, such as CREATE MATERIALIZED VIEW and so on.

Following are the explain plans with minimal changes in the code to allow write operations. I haven't verified all the scenarios. How about supporting writer operations as below, and then later enhancing it so that the parallel workers also do the write operations? POC patch is attached.

postgres=# explain create materialized view mat_view as select * from tbl where f1 = 10;
                               QUERY PLAN
------------------------------------------------------------------------
 Gather  (cost=1000.00..37458.43 rows=1 width=214)
   Workers Planned: 2
   ->  Parallel Seq Scan on tbl  (cost=0.00..36458.33 rows=1 width=214)
         Filter: (f1 = 10)
(4 rows)

postgres=# explain insert into tbl select * from tbl where f1 = 10;
                                     QUERY PLAN
------------------------------------------------------------------------------------
 Insert on tbl  (cost=1000.00..37458.43 rows=1 width=214)
   ->  Gather  (cost=1000.00..37458.43 rows=1 width=214)
         Workers Planned: 2
         ->  Parallel Seq Scan on tbl tbl_1  (cost=0.00..36458.33 rows=1 width=214)
               Filter: (f1 = 10)
(5 rows)

postgres=# explain update tbl s
Re: [HACKERS] CREATE TABLE with parallel workers, 10.0?
On Wed, Feb 15, 2017 at 12:24 AM, Joshua Chamberlain wrote:
> Hello,
>
> (I'm posting to hackers since I got no response on the general list.)
>
> I use Postgres + PostGIS quite heavily, and recently have been taking full advantage of the new parallelism in 9.6. I'm now running queries in a few hours that would otherwise take more than a day.
>
> However, parallelism is disabled for all queries that perform writes (as documented). I would normally run "CREATE TABLE AS [some super-expensive query]", but since that can't use parallelism I'm using the \o option in psql, creating the table separately, and then \copy-ing in the results. That works, but "CREATE TABLE AS" would be more convenient.
>
> Are there plans in 10.0 to allow parallelism in queries that write, or at least in "CREATE TABLE AS" queries? (Support in materialized views would be great, too!)

Somebody else asked me about this in private email. Nobody at EnterpriseDB is working on this right now, and I don't think anybody else is working on it either. There are several ways to do it, but none of them are entirely easy and the ones likely to perform better are less easy.

I think that what we need to do to make this work is come up with a way to fix the interaction between group locking (which allows multiple processes to hold mutually conflicting locks at the same time if they are in a parallel group) and relation extension (which uses heavyweight locking to prevent multiple processes from trying to extend the same relation at the same time). I think that perhaps the right way to solve that problem is to come up with a way of providing mutual exclusion around relation extension that doesn't need the heavyweight lock manager. Relation extension locks don't need multiple lock modes or deadlock detection or any of the other frammishes that the heavyweight lock manager provides, but they do need to be fast, which the heavyweight lock manager isn't especially good at. So moving this out of the heavyweight lock manager might be a way to solve two problems at once.

There's also a hazard related to GIN indexes since e2c79e14d998cd31f860854bc9210b37b457bb01, which introduced a new use of Page locks, which have a similar kind of problem. We got rid of the major existing use of page locks in 6d46f4783efe457f74816a75173eb23ed8930020, which extirpated them from hash indexes, and I was kind of hoping they could go away altogether, but we can't do that as long as GIN is using them.

Anyway, if we solve those problems, we can allow inserts (not updates or deletes, those have other problems, principally relating to combo CIDs) in parallel mode, which would make it possible to allow the kinds of things you are asking about here. Then you could fix things so that each worker generates a subset of the tuples and inserts the ones it generates. You'd end up with a parallel plan but no Gather node! The workers could feed tuples directly to a suitable DestReceiver, which would be really spiffy.

The other way of fixing this problem is to have each worker generate a subset of the tuples and funnel them all back to the leader through a Gather node; the leader then does all the inserts. That avoids having to solve the problems mentioned above, but it probably doesn't perform nearly as well.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] CREATE TABLE with parallel workers, 10.0?
On Tue, Feb 14, 2017 at 09:24:47PM -0800, Joshua Chamberlain wrote:
> Hello,
>
> (I'm posting to hackers since I got no response on the general list.)
>
> I use Postgres + PostGIS quite heavily, and recently have been taking full advantage of the new parallelism in 9.6. I'm now running queries in a few hours that would otherwise take more than a day.
>
> However, parallelism is disabled for all queries that perform writes (as documented). I would normally run "CREATE TABLE AS [some super-expensive query]", but since that can't use parallelism I'm using the \o option in psql, creating the table separately, and then \copy-ing in the results. That works, but "CREATE TABLE AS" would be more convenient.

How about creating a temp view?

    CREATE TEMPORARY VIEW foo_tv AS [your gigantic query goes here];
    CREATE TABLE foo (LIKE foo_tv);
    INSERT INTO foo SELECT * FROM foo_tv;

> Are there plans in 10.0 to allow parallelism in queries that write, or at least in "CREATE TABLE AS" queries? (Support in materialized views would be great, too!)

Patches are always welcome, and there's one more commitfest to go before 10.

Best,

David.

--
David Fetter http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter  Skype: davidfetter
XMPP: david(dot)fetter(at)gmail(dot)com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate
Re: [HACKERS] CREATE TABLE with parallel workers, 10.0?
On Wed, Feb 15, 2017 at 2:24 PM, Joshua Chamberlain wrote:
> Are there plans in 10.0 to allow parallelism in queries that write, or at least in "CREATE TABLE AS" queries? (Support in materialized views would be great, too!)

Nope. There are no patches for now.

--
Michael
[HACKERS] CREATE TABLE with parallel workers, 10.0?
Hello,

(I'm posting to hackers since I got no response on the general list.)

I use Postgres + PostGIS quite heavily, and recently have been taking full advantage of the new parallelism in 9.6. I'm now running queries in a few hours that would otherwise take more than a day.

However, parallelism is disabled for all queries that perform writes (as documented). I would normally run "CREATE TABLE AS [some super-expensive query]", but since that can't use parallelism I'm using the \o option in psql, creating the table separately, and then \copy-ing in the results. That works, but "CREATE TABLE AS" would be more convenient.

Are there plans in 10.0 to allow parallelism in queries that write, or at least in "CREATE TABLE AS" queries? (Support in materialized views would be great, too!)

Thanks,

Joshua Chamberlain
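[The \o-plus-\copy workaround described above might look roughly like this in psql; the file name, table definition, and output-format settings are illustrative guesses, not from the post:]

```sql
-- Make psql's file output loadable (unaligned, tab-separated, no footer):
\pset format unaligned
\pset fieldsep '\t'
\pset tuples_only on

-- Run the expensive query (eligible for parallelism, since it is read-only)
-- and capture its output to a file:
\o /tmp/results.out
SELECT ... ;   -- the super-expensive query
\o

-- Create the target table separately, then load the captured results:
CREATE TABLE results ( ... );
\copy results FROM '/tmp/results.out'
```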