Re: [HACKERS] CREATE TABLE with parallel workers, 10.0?

2017-02-16 Thread Robert Haas
On Wed, Feb 15, 2017 at 8:13 PM, Andres Freund  wrote:
> I don't think general INSERTs are safe, if you consider unique indexes
> and foreign keys (both setting xmax in the simple cases and multixacts
> are likely to be problematic).

There's no real problem with setting xmax or creating multixacts - I
think - but there's definitely a problem if an INSERT can lead to the
creation of a new combo CID, because we have no way at present of
transmitting the new combo CID mapping to the workers, and if a worker
sees a combo CID for which it doesn't have a mapping, the world blows
up.  Do you think an insert can trigger combo CID creation?

(Of course, now that we've got DSA, it wouldn't be nearly as hard to
fix the combo CID synchronization problem.)
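For reference, the classic way a combo CID arises is an insert followed by an update (or delete) of the same tuple within a single transaction: the tuple then needs both a cmin and a cmax, which get combined into one combo command id. A minimal sketch, assuming a table t with an integer column x:

```sql
-- Hedged illustration only: assumes CREATE TABLE t (x int) already ran.
BEGIN;
INSERT INTO t VALUES (1);           -- new tuple gets cmin from this command
UPDATE t SET x = 2 WHERE x = 1;     -- same xact sets cmax -> combo CID needed
COMMIT;
```

Whether a bare INSERT can reach this state indirectly (e.g. via triggers or constraint enforcement) is exactly the open question above.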

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] CREATE TABLE with parallel workers, 10.0?

2017-02-15 Thread Andres Freund
On 2017-02-15 20:35:16 -0500, Stephen Frost wrote:
> Perhaps, but until we've got a system worked out for having the workers
> do the writes, we aren't getting anything.  Being able to have the
> leader do the writes based on the tuples fed back from the workers is
> clearly better than nothing.

Never argued differently.




Re: [HACKERS] CREATE TABLE with parallel workers, 10.0?

2017-02-15 Thread Stephen Frost
Andres,

* Andres Freund (and...@anarazel.de) wrote:
> On 2017-02-15 20:28:43 -0500, Stephen Frost wrote:
> > * Andres Freund (and...@anarazel.de) wrote:
> > > On February 15, 2017 5:20:20 PM PST, Stephen Frost  
> > > wrote:
> > > >In many cases, I expect this would work just as well, if not better,
> > > >than trying to actually do writes in parallel.
> > > 
> > > Why? IPCing tuples around is quite expensive.  Or do you mean because 
> > > it'll be more suitable because of the possible plans?
> > 
> > Because I've seen some serious problems when trying to have multiple
> > processes writing into the same relation due to the relation extension
> > lock, cases where it was much faster to have each process write into its
> > own table.  Admittedly, we've improved things there, so perhaps this isn't
> > an issue any longer, but we also don't yet really know what the locking
> > implementation looks like for having multiple parallel workers
> > writing into the same relation.  It may be that sending a few records
> > back to the leader is cheaper than working out the locking to allow
> > parallel workers to write to the same relation, or at least not any more
> > expensive.
> 
> I quite seriously doubt that you will get enough rows to the master via
> tuplequeues that it'll be faster than inserting on the workers, even
> with such scalability problems present. Even before the 9.6
> improvements, and even more so after.

Perhaps, but until we've got a system worked out for having the workers
do the writes, we aren't getting anything.  Being able to have the
leader do the writes based on the tuples fed back from the workers is
clearly better than nothing.

Thanks!

Stephen




Re: [HACKERS] CREATE TABLE with parallel workers, 10.0?

2017-02-15 Thread Andres Freund
On 2017-02-15 20:28:43 -0500, Stephen Frost wrote:
> Andres,
> 
> * Andres Freund (and...@anarazel.de) wrote:
> > On February 15, 2017 5:20:20 PM PST, Stephen Frost  
> > wrote:
> > >In many cases, I expect this would work just as well, if not better,
> > >than trying to actually do writes in parallel.
> > 
> > Why? IPCing tuples around is quite expensive.  Or do you mean because it'll 
> > be more suitable because of the possible plans?
> 
> Because I've seen some serious problems when trying to have multiple
> processes writing into the same relation due to the relation extension
> lock, cases where it was much faster to have each process write into its
> own table.  Admittedly, we've improved things there, so perhaps this isn't
> an issue any longer, but we also don't yet really know what the locking
> implementation looks like for having multiple parallel workers
> writing into the same relation.  It may be that sending a few records
> back to the leader is cheaper than working out the locking to allow
> parallel workers to write to the same relation, or at least not any more
> expensive.

I quite seriously doubt that you will get enough rows to the master via
tuplequeues that it'll be faster than inserting on the workers, even
with such scalability problems present. Even before the 9.6
improvements, and even more so after.

Greetings,

Andres Freund




Re: [HACKERS] CREATE TABLE with parallel workers, 10.0?

2017-02-15 Thread Stephen Frost
Andres,

* Andres Freund (and...@anarazel.de) wrote:
> On February 15, 2017 5:20:20 PM PST, Stephen Frost  wrote:
> >In many cases, I expect this would work just as well, if not better,
> >than trying to actually do writes in parallel.
> 
> Why? IPCing tuples around is quite expensive.  Or do you mean because it'll 
> be more suitable because of the possible plans?

Because I've seen some serious problems when trying to have multiple
processes writing into the same relation due to the relation extension
lock, cases where it was much faster to have each process write into its
own table.  Admittedly, we've improved things there, so perhaps this isn't
an issue any longer, but we also don't yet really know what the locking
implementation looks like for having multiple parallel workers
writing into the same relation.  It may be that sending a few records
back to the leader is cheaper than working out the locking to allow
parallel workers to write to the same relation, or at least not any more
expensive.
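The each-process-writes-its-own-table pattern described above could be sketched roughly as follows (the table names target, stage_1, and stage_2 are hypothetical; each stage table is filled by a different backend, so none of them contend on a shared relation's extension lock until the final combine step):

```sql
-- Hypothetical sketch, assuming target already exists.
CREATE TABLE stage_1 (LIKE target);   -- run in process 1
CREATE TABLE stage_2 (LIKE target);   -- run in process 2
-- each process then runs: INSERT INTO stage_N SELECT ... (its slice of work)
-- finally, one process combines the staging tables:
INSERT INTO target SELECT * FROM stage_1;
INSERT INTO target SELECT * FROM stage_2;
DROP TABLE stage_1, stage_2;
```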

Thanks!

Stephen




Re: [HACKERS] CREATE TABLE with parallel workers, 10.0?

2017-02-15 Thread Andres Freund


On February 15, 2017 5:20:20 PM PST, Stephen Frost  wrote:
>In many cases, I expect this would work just as well, if not better,
>than trying to actually do writes in parallel.

Why? IPCing tuples around is quite expensive.  Or do you mean because it'll be 
more suitable because of the possible plans?

Andres

-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.




Re: [HACKERS] CREATE TABLE with parallel workers, 10.0?

2017-02-15 Thread Stephen Frost
All,

* Andres Freund (and...@anarazel.de) wrote:
> On 2017-02-15 08:48:44 -0500, Robert Haas wrote:
> > The other way of fixing this problem is to have each worker generate a
> > subset of the tuples and funnel them all back to the leader through a
> > Gather node; the leader then does all the inserts.  That avoids having
> > to solve the problems mentioned above, but it probably doesn't perform
> > nearly as well.
> 
> I think it'd already be tremendously useful however.  I think it'd not
> be an unreasonable first step. It'd be a good fallback that'd be useful
> for !insert and such anyway.

Absolutely.  I had always figured this would be what we would do first,
before coming up with something more clever down the road.  In
particular, this allows filters to be pushed down and performed in
parallel, which may significantly reduce the result which is passed back
up to the leader.

In many cases, I expect this would work just as well, if not better,
than trying to actually do writes in parallel.

Thanks!

Stephen




Re: [HACKERS] CREATE TABLE with parallel workers, 10.0?

2017-02-15 Thread Andres Freund
Hi,

On 2017-02-15 08:48:44 -0500, Robert Haas wrote:
> We got rid of the major existing use of page locks in
> 6d46f4783efe457f74816a75173eb23ed8930020, which extirpated them from
> hash indexes, and I was kind of hoping they could go away altogether,
> but we can't do that as long as GIN is using them.

Learned a new word today.


> Anyway, if we solve those problems, we can allow inserts (not updates
> or deletes, those have other problems, principally relating to combo
> CIDs) in parallel mode, which would make it possible to allow the
> kinds of things you are asking about here.

I don't think general INSERTs are safe, if you consider unique indexes
and foreign keys (both setting xmax in the simple cases and multixacts
are likely to be problematic).


> The other way of fixing this problem is to have each worker generate a
> subset of the tuples and funnel them all back to the leader through a
> Gather node; the leader then does all the inserts.  That avoids having
> to solve the problems mentioned above, but it probably doesn't perform
> nearly as well.

I think it'd already be tremendously useful however.  I think it'd not
be an unreasonable first step. It'd be a good fallback that'd be useful
for !insert and such anyway.


Regards,

Andres




Re: [HACKERS] CREATE TABLE with parallel workers, 10.0?

2017-02-15 Thread Haribabu Kommi
On Thu, Feb 16, 2017 at 12:48 AM, Robert Haas  wrote:

> On Wed, Feb 15, 2017 at 12:24 AM, Joshua Chamberlain 
> wrote:
> > Hello,
> >
> > (I'm posting to hackers since I got no response on the general list.)
> >
> > I use Postgres + PostGIS quite heavily, and recently have been taking
> full
> > advantage of the new parallelism in 9.6. I'm now running queries in a few
> > hours that would otherwise take more than a day.
> >
> > However, parallelism is disabled for all queries that perform writes (as
> > documented). I would normally run "CREATE TABLE AS [some super-expensive
> > query]", but since that can't use parallelism I'm using the \o option in
> > psql, creating the table separately, and then \copy-ing in the results.
> That
> > works, but "CREATE TABLE AS" would be more convenient.
> >
> > Are there plans in 10.0 to allow parallelism in queries that write, or at
> > least in "CREATE TABLE AS" queries? (Support in materialized views would
> be
> > great, too!)
>
> Somebody else asked me about this in private email.  Nobody at
> EnterpriseDB is working on this right now, and I don't think anybody
> else is working on it either.  There are several ways to do it, but
> none of them are entirely easy and the ones likely to perform better
> are less easy.
>
> I think that what we need to do to make this work is come up with a
> way to fix the interaction between group locking (which allows
> multiple processes to hold mutually conflicting locks at the same time
> if they are in a parallel group) and relation extension (which uses
> heavyweight locking to prevent multiple processes from trying to
> extend the same relation at the same time).  I think that perhaps the
> right way to solve that problem is to come up with a way of providing
> mutual exclusion around relation extension that doesn't need the
> heavyweight lock manager.  Relation extension locks don't need
> multiple lock modes or deadlock detection or any of the other
> frammishes that the heavyweight lock manager provides, but they do
> need to be fast, which the heavyweight lock manager isn't especially
> good at.  So moving this out of the heavyweight lock manager might be
> a way to solve two problems at once.
>
> There's also a hazard related to GIN indexes since
> e2c79e14d998cd31f860854bc9210b37b457bb01, which introduced a new use
> of Page locks, which have a similar kind of problem.  We got rid of
> the major existing use of page locks in
> 6d46f4783efe457f74816a75173eb23ed8930020, which extirpated them from
> hash indexes, and I was kind of hoping they could go away altogether,
> but we can't do that as long as GIN is using them.
>
> Anyway, if we solve those problems, we can allow inserts (not updates
> or deletes, those have other problems, principally relating to combo
> CIDs) in parallel mode, which would make it possible to allow the
> kinds of things you are asking about here.  Then you could fix things
> so that each worker generates a subset of the tuples and inserts the
> ones it generates.  You'd end up with a parallel plan but no Gather
> node!   The workers could feed tuples directly to a suitable
> DestReceiver, which would be really spiffy.
>
> The other way of fixing this problem is to have each worker generate a
> subset of the tuples and funnel them all back to the leader through a
> Gather node; the leader then does all the inserts.  That avoids having
> to solve the problems mentioned above, but it probably doesn't perform
> nearly as well.
>

How about supporting something like this: the backend does the write
operations while the workers produce the results. This may not perform
as well in all cases as having all of the parallel workers do the write
operations, but it could be useful for some scenarios, such as
CREATE MATERIALIZED VIEW, etc.

Following are the explain plans with minimal changes in the code to
allow write operations. I haven't verified all the scenarios. How about
supporting writer operations as below, and then later enhancing it so
that the parallel workers also do the write operations?

POC patch is attached.

postgres=# explain create materialized view mat_view as select * from tbl
where f1 = 10;
                              QUERY PLAN
-----------------------------------------------------------------------
 Gather  (cost=1000.00..37458.43 rows=1 width=214)
   Workers Planned: 2
   ->  Parallel Seq Scan on tbl  (cost=0.00..36458.33 rows=1 width=214)
         Filter: (f1 = 10)
(4 rows)

postgres=# explain insert into tbl select * from tbl where f1 = 10;
                                 QUERY PLAN
----------------------------------------------------------------------------
 Insert on tbl  (cost=1000.00..37458.43 rows=1 width=214)
   ->  Gather  (cost=1000.00..37458.43 rows=1 width=214)
         Workers Planned: 2
         ->  Parallel Seq Scan on tbl tbl_1  (cost=0.00..36458.33 rows=1 width=214)
               Filter: (f1 = 10)
(5 rows)

Re: [HACKERS] CREATE TABLE with parallel workers, 10.0?

2017-02-15 Thread Robert Haas
On Wed, Feb 15, 2017 at 12:24 AM, Joshua Chamberlain  wrote:
> Hello,
>
> (I'm posting to hackers since I got no response on the general list.)
>
> I use Postgres + PostGIS quite heavily, and recently have been taking full
> advantage of the new parallelism in 9.6. I'm now running queries in a few
> hours that would otherwise take more than a day.
>
> However, parallelism is disabled for all queries that perform writes (as
> documented). I would normally run "CREATE TABLE AS [some super-expensive
> query]", but since that can't use parallelism I'm using the \o option in
> psql, creating the table separately, and then \copy-ing in the results. That
> works, but "CREATE TABLE AS" would be more convenient.
>
> Are there plans in 10.0 to allow parallelism in queries that write, or at
> least in "CREATE TABLE AS" queries? (Support in materialized views would be
> great, too!)

Somebody else asked me about this in private email.  Nobody at
EnterpriseDB is working on this right now, and I don't think anybody
else is working on it either.  There are several ways to do it, but
none of them are entirely easy and the ones likely to perform better
are less easy.

I think that what we need to do to make this work is come up with a
way to fix the interaction between group locking (which allows
multiple processes to hold mutually conflicting locks at the same time
if they are in a parallel group) and relation extension (which uses
heavyweight locking to prevent multiple processes from trying to
extend the same relation at the same time).  I think that perhaps the
right way to solve that problem is to come up with a way of providing
mutual exclusion around relation extension that doesn't need the
heavyweight lock manager.  Relation extension locks don't need
multiple lock modes or deadlock detection or any of the other
frammishes that the heavyweight lock manager provides, but they do
need to be fast, which the heavyweight lock manager isn't especially
good at.  So moving this out of the heavyweight lock manager might be
a way to solve two problems at once.
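As a point of reference, relation extension locks currently surface in the pg_locks view with locktype 'extend', so the contention described above can be observed directly. A hedged sketch, assuming other sessions are concurrently bulk-inserting into some table:

```sql
-- Show which backends hold or await a relation extension lock right now.
-- Only returns rows while extension-lock traffic is actually happening.
SELECT locktype, relation::regclass AS rel, pid, mode, granted
FROM pg_locks
WHERE locktype = 'extend';
```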

There's also a hazard related to GIN indexes since
e2c79e14d998cd31f860854bc9210b37b457bb01, which introduced a new use
of Page locks, which have a similar kind of problem.  We got rid of
the major existing use of page locks in
6d46f4783efe457f74816a75173eb23ed8930020, which extirpated them from
hash indexes, and I was kind of hoping they could go away altogether,
but we can't do that as long as GIN is using them.

Anyway, if we solve those problems, we can allow inserts (not updates
or deletes, those have other problems, principally relating to combo
CIDs) in parallel mode, which would make it possible to allow the
kinds of things you are asking about here.  Then you could fix things
so that each worker generates a subset of the tuples and inserts the
ones it generates.  You'd end up with a parallel plan but no Gather
node!   The workers could feed tuples directly to a suitable
DestReceiver, which would be really spiffy.

The other way of fixing this problem is to have each worker generate a
subset of the tuples and funnel them all back to the leader through a
Gather node; the leader then does all the inserts.  That avoids having
to solve the problems mentioned above, but it probably doesn't perform
nearly as well.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] CREATE TABLE with parallel workers, 10.0?

2017-02-14 Thread David Fetter
On Tue, Feb 14, 2017 at 09:24:47PM -0800, Joshua Chamberlain wrote:
> Hello,
> 
> (I'm posting to hackers since I got no response on the general
> list.)
> 
> I use Postgres + PostGIS quite heavily, and recently have been
> taking full advantage of the new parallelism in 9.6. I'm now running
> queries in a few hours that would otherwise take more than a day.
> 
> However, parallelism is disabled for all queries that perform writes
> (as documented). I would normally run "CREATE TABLE AS [some
> super-expensive query]", but since that can't use parallelism I'm
> using the \o option in psql, creating the table separately, and then
> \copy-ing in the results.  That works, but "CREATE TABLE AS" would
> be more convenient.

How about creating a temp view?

CREATE TEMPORARY VIEW foo_tv AS [your gigantic query goes here];
CREATE TABLE foo (LIKE foo_tv);
INSERT INTO foo SELECT * FROM foo_tv;

> Are there plans in 10.0 to allow parallelism in queries that write,
> or at least in "CREATE TABLE AS" queries? (Support in materialized
> views would be great, too!)

Patches are always welcome, and there's one more commitfest to go
before 10.

Best,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david(dot)fetter(at)gmail(dot)com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate




Re: [HACKERS] CREATE TABLE with parallel workers, 10.0?

2017-02-14 Thread Michael Paquier
On Wed, Feb 15, 2017 at 2:24 PM, Joshua Chamberlain  wrote:
> Are there plans in 10.0 to allow parallelism in queries that write, or at
> least in "CREATE TABLE AS" queries? (Support in materialized views would be
> great, too!)

Nope. There are no patches for now.
-- 
Michael

