Re: [HACKERS] Parallel query execution with SPI

2017-03-31 Thread Konstantin Knizhnik



On 31.03.2017 13:48, Robert Haas wrote:

On Fri, Mar 31, 2017 at 3:33 AM, Konstantin Knizhnik
 wrote:

Is it possible to execute a query concurrently using SPI?
If so, how can it be enforced?
I tried to open a cursor with the CURSOR_OPT_PARALLEL_OK flag, but it doesn't help:
the query is executed by a single backend, while the same query launched at
top level uses a parallel plan:

 fsstate->portal = SPI_cursor_open_with_args(NULL, fsstate->query,
fsstate->numParams, argtypes, values, nulls, true, CURSOR_OPT_PARALLEL_OK);
 ...
 SPI_cursor_fetch(fsstate->portal, true, 1);

Parallel execution isn't possible if you are using a cursor-type
interface, because a parallel query can't be suspended and resumed
like a non-parallel query.  If you use a function that executes the
query to completion in one go, like SPI_execute_plan, then it's cool.
See also commit 61c2e1a95f94bb904953a6281ce17a18ac38ee6d.


Thank you very much for the explanation.
With SPI_execute the query is indeed executed concurrently.
But it means that when I am executing some query using SPI, I need to
somehow predict the number of returned tuples.
If it is small, then it is better to use SPI_execute to allow
concurrent execution of the query.
But if it is large enough, then SPI_execute without a limit can
exhaust memory.
Certainly I can specify some reasonable limit and, if it is reached,
use a cursor instead.

But it is neither convenient nor efficient.

I wonder if somebody can suggest a better solution?
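The limit-then-cursor fallback described above can be sketched as a small self-contained model. It is not extension code: run_one_shot() and run_with_cursor() are hypothetical stand-ins, the first for a one-shot call such as SPI_execute_plan(plan, values, nulls, true, limit + 1) followed by a check of SPI_processed, the second for SPI_cursor_open() plus repeated SPI_cursor_fetch() calls. Requesting limit + 1 rows is what lets the one-shot path tell "exactly limit rows" apart from "more than limit rows".

```c
#include <assert.h>
#include <stddef.h>

/* Which execution path the hypothetical wrapper ended up using. */
typedef enum
{
    PATH_ONE_SHOT,              /* run to completion; may use a parallel plan */
    PATH_CURSOR                 /* suspended/resumed cursor; serial only */
} ExecPath;

/* Hypothetical model of a query: only its result size matters here. */
typedef struct
{
    size_t      total_rows;     /* rows the query would return */
    int         executions;     /* how many times the query was started */
} ModelQuery;

/* Stand-in for a one-shot execution capped at max_rows rows, in the way
 * SPI_execute_plan()'s count argument caps the result. */
static size_t
run_one_shot(ModelQuery *q, size_t max_rows)
{
    q->executions++;
    return q->total_rows < max_rows ? q->total_rows : max_rows;
}

/* Stand-in for opening a cursor and fetching all rows in batches. */
static size_t
run_with_cursor(ModelQuery *q, size_t batch)
{
    size_t      fetched = 0;

    assert(batch > 0);          /* a zero batch would never make progress */
    q->executions++;
    while (fetched < q->total_rows)
    {
        size_t      left = q->total_rows - fetched;

        fetched += left < batch ? left : batch;
    }
    return fetched;
}

/* Try the one-shot path first; fall back to a cursor if the result
 * turns out to be larger than limit rows. */
static ExecPath
execute_with_fallback(ModelQuery *q, size_t limit, size_t *rows)
{
    size_t      got = run_one_shot(q, limit + 1);

    if (got <= limit)
    {
        *rows = got;            /* whole result fit under the limit */
        return PATH_ONE_SHOT;
    }
    /* Too big: discard the partial result and start again with a cursor,
     * so the query is executed a second time. */
    *rows = run_with_cursor(q, 1000);
    return PATH_CURSOR;
}
```

In the model, a result that exceeds the limit is executed twice, once partially and once through the cursor, which is the inefficiency described above.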

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Parallel query execution with SPI

2017-03-31 Thread Rafia Sabih
On Fri, Mar 31, 2017 at 4:18 PM, Robert Haas  wrote:
> On Fri, Mar 31, 2017 at 3:33 AM, Konstantin Knizhnik
>  wrote:
>> Is it possible to execute a query concurrently using SPI?
>> If so, how can it be enforced?
>> I tried to open a cursor with the CURSOR_OPT_PARALLEL_OK flag, but it doesn't help:
>> the query is executed by a single backend, while the same query launched at
>> top level uses a parallel plan:
>>
>> fsstate->portal = SPI_cursor_open_with_args(NULL, fsstate->query,
>> fsstate->numParams, argtypes, values, nulls, true, CURSOR_OPT_PARALLEL_OK);
>> ...
>> SPI_cursor_fetch(fsstate->portal, true, 1);
>
> Parallel execution isn't possible if you are using a cursor-type
> interface, because a parallel query can't be suspended and resumed
> like a non-parallel query.  If you use a function that executes the
> query to completion in one go, like SPI_execute_plan, then it's cool.
> See also commit 61c2e1a95f94bb904953a6281ce17a18ac38ee6d.
>
> --

Adding to that, in your case passing CURSOR_OPT_PARALLEL_OK is not
enough, because PortalRun for the cursor has portal->run_once set to
false, which disables parallelism in ExecutePlan:

  if (!execute_once || dest->mydest == DestIntoRel)
    use_parallel_mode = false;

You may check [1] for the discussion on this.

[1] 
https://www.postgresql.org/message-id/flat/CAFiTN-vxhvvi-rMJFOxkGzNaQpf%2BKS76%2Bsu7-sG_NQZGRPJkQg%40mail.gmail.com#cafitn-vxhvvi-rmjfoxkgznaqpf+ks76+su7-sg_nqzgrpj...@mail.gmail.com
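The check quoted above can be modeled in isolation to show why the cursor path never runs a parallel plan in parallel. This is a simplified, hypothetical model: ModelDest stands in for PostgreSQL's CommandDest, and parallel_mode_needed roughly corresponds to the plan's parallelModeNeeded flag; only the quoted condition is reproduced, not the rest of ExecutePlan.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for PostgreSQL's CommandDest (DestSPI, DestRemote,
 * DestIntoRel, ...); only the distinction the check cares about is kept. */
typedef enum
{
    MODEL_DEST_SPI,
    MODEL_DEST_REMOTE,
    MODEL_DEST_INTO_REL
} ModelDest;

/*
 * Model of the quoted gate: a plan that wants parallel mode only gets it
 * when the executor will run it exactly once (it cannot be suspended and
 * resumed) and the results are not going into a new relation.  In the
 * scenario described above, a cursor fetched with SPI_cursor_fetch()
 * arrives here with execute_once = false.
 */
static bool
model_use_parallel_mode(bool parallel_mode_needed, bool execute_once,
                        ModelDest dest)
{
    bool        use_parallel_mode = parallel_mode_needed;

    if (!execute_once || dest == MODEL_DEST_INTO_REL)
        use_parallel_mode = false;
    return use_parallel_mode;
}
```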

-- 
Regards,
Rafia Sabih
EnterpriseDB: http://www.enterprisedb.com/




Re: [HACKERS] Parallel query execution with SPI

2017-03-31 Thread Robert Haas
On Fri, Mar 31, 2017 at 3:33 AM, Konstantin Knizhnik
 wrote:
> Is it possible to execute a query concurrently using SPI?
> If so, how can it be enforced?
> I tried to open a cursor with the CURSOR_OPT_PARALLEL_OK flag, but it doesn't help:
> the query is executed by a single backend, while the same query launched at
> top level uses a parallel plan:
>
> fsstate->portal = SPI_cursor_open_with_args(NULL, fsstate->query,
> fsstate->numParams, argtypes, values, nulls, true, CURSOR_OPT_PARALLEL_OK);
> ...
> SPI_cursor_fetch(fsstate->portal, true, 1);

Parallel execution isn't possible if you are using a cursor-type
interface, because a parallel query can't be suspended and resumed
like a non-parallel query.  If you use a function that executes the
query to completion in one go, like SPI_execute_plan, then it's cool.
See also commit 61c2e1a95f94bb904953a6281ce17a18ac38ee6d.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




[HACKERS] Parallel query execution with SPI

2017-03-31 Thread Konstantin Knizhnik

Hi hackers,

Is it possible to execute a query concurrently using SPI?
If so, how can it be enforced?
I tried to open a cursor with the CURSOR_OPT_PARALLEL_OK flag, but it doesn't
help: the query is executed by a single backend, while the same query
launched at top level uses a parallel plan:


 fsstate->portal = SPI_cursor_open_with_args(NULL, fsstate->query,
     fsstate->numParams, argtypes, values, nulls, true, CURSOR_OPT_PARALLEL_OK);
 ...
 SPI_cursor_fetch(fsstate->portal, true, 1);

Thanks in advance,

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company





Re: [HACKERS] Parallel query execution

2013-01-24 Thread Paul Ramsey
On Tuesday, January 15, 2013 at 2:14 PM, Bruce Momjian wrote:
 I mentioned last year that I wanted to start working on parallelism:
 
 https://wiki.postgresql.org/wiki/Parallel_Query_Execution
 
 I believe it is time to start adding parallel execution to the backend. 
 We already have some parallelism in the backend:
 effective_io_concurrency and helper processes. I think it is time we
 start to consider additional options
 
 

 Parallelism isn't going to help all queries, in fact it might be just a
 small subset, but it will be the larger queries. The pg_upgrade
 parallelism only helps clusters with multiple databases or tablespaces,
 but the improvements are significant.
 
I just got out of a meeting that included Oracle Spatial folks, who
were boasting of big performance increases in enabling parallel query
on their spatial queries. Basically the workloads on things like big
spatial joins are entirely CPU bound, so they are seeing that adding
15 processors makes things 15x faster. Spatial folks would love love
love to see parallel query execution.
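For reference, Amdahl's law relates the quoted 15x-on-15-processors figure to how much of the work parallelizes. With p the fraction of the runtime that can be spread across N processors, the speedup is:

```latex
S(N) = \frac{1}{(1 - p) + p/N}
```

A linear speedup, S(15) = 15, requires p = 1, which fits workloads that are entirely CPU bound and split cleanly; with p = 0.95 the same 15 processors give S(15) = 1 / (0.05 + 0.95/15), or roughly 8.8.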

-- 
Paul Ramsey
http://cleverelephant.ca
http://postgis.net

 
 
 
 




Re: [HACKERS] Parallel query execution

2013-01-24 Thread Bruce Momjian
On Thu, Jan 24, 2013 at 02:34:49PM -0800, Paul Ramsey wrote:
 On Tuesday, January 15, 2013 at 2:14 PM, Bruce Momjian wrote:
 
 I mentioned last year that I wanted to start working on parallelism:
 
 https://wiki.postgresql.org/wiki/Parallel_Query_Execution
 
 I believe it is time to start adding parallel execution to the backend.
 We already have some parallelism in the backend:
 effective_io_concurrency and helper processes. I think it is time we
 start to consider additional options
 
 Parallelism isn't going to help all queries, in fact it might be just a
 small subset, but it will be the larger queries. The pg_upgrade
 parallelism only helps clusters with multiple databases or tablespaces,
 but the improvements are significant.
 
 
 I just got out of a meeting that included Oracle Spatial folks, who
 were boasting of big performance increases in enabling parallel query
 on their spatial queries. Basically the workloads on things like big
 spatial joins are entirely CPU bound, so they are seeing that adding
 15 processors makes things 15x faster. Spatial folks would love love
 love to see parallel query execution.

I added PostGIS under the Expensive Functions opportunity:


https://wiki.postgresql.org/wiki/Parallel_Query_Execution#Specific_Opportunities

-- 
  Bruce Momjian  br...@momjian.us  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +




Re: [HACKERS] Parallel query execution

2013-01-16 Thread Daniel Farina
On Tue, Jan 15, 2013 at 11:07 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Alvaro Herrera alvhe...@2ndquadrant.com writes:
 There are still 34 items needing attention in CF3.  I suggest that, if
 you have some spare time, your help would be very much appreciated
 there.  The commitfest that started on Jan 15th has 65 extra items.
 Anything currently listed in CF3 can rightfully be considered to be part
 of CF4, too.

 In case you hadn't noticed, we've totally lost control of the CF
 process.  Quite aside from the lack of progress on closing CF3, major
 hackers who should know better are submitting significant new feature
 patches now, despite our agreement in Ottawa that nothing big would be
 accepted after CF3.  At this point I'd bet against releasing 9.3 during
 2013.

I have been skimming the commitfest application, and unlike some of
the previous commitfests, a huge number of patches have had review at
some point in time but probably need more, so looking for the red
"Nobody" in the 'reviewers' column probably understates the shortage
of review.

I'm curious what the qualitative feelings are on patches or clusters
thereof and what kind of review would be helpful in clearing the
field.

--
fdr




Re: [HACKERS] Parallel query execution

2013-01-16 Thread Magnus Hagander
On Wed, Jan 16, 2013 at 12:03 AM, Bruce Momjian br...@momjian.us wrote:
 On Tue, Jan 15, 2013 at 11:01:04PM +, Simon Riggs wrote:
 On 15 January 2013 22:55, Bruce Momjian br...@momjian.us wrote:

  Why is this being discussed now?
 
  It is for 9.4 and will take months.  I didn't think there was a better
  time.  We don't usually discuss features during beta testing.

 Bruce, there are many, many patches on the queue. How will we ever get
 to beta testing if we begin open ended discussions on next release?

 If we can't finish what we've started for 9.3, why talk about 9.4?

 Yes, it's a great topic for discussion, but there are better times.

 Like when?  I don't remember a policy of not discussing things now.
 Does anyone else remember this?  Are you saying feature discussion is
 only between commit-fests?  Is this written down anywhere?  I only
 remember beta-time as a time not to discuss features.

We kind of do: during a CF we should be reviewing existing
patches, and outside a CF we should be discussing and working on new
features. It's on http://wiki.postgresql.org/wiki/CommitFest. It
doesn't specifically say "do this and don't do that", but it says to focus
on review, and discussing things that will happen that far ahead is
definitely not focusing on review.


--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/




Re: [HACKERS] Parallel query execution

2013-01-16 Thread Robert Haas
On Wed, Jan 16, 2013 at 2:07 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Alvaro Herrera alvhe...@2ndquadrant.com writes:
 There are still 34 items needing attention in CF3.  I suggest that, if
 you have some spare time, your help would be very much appreciated
 there.  The commitfest that started on Jan 15th has 65 extra items.
 Anything currently listed in CF3 can rightfully be considered to be part
 of CF4, too.

 In case you hadn't noticed, we've totally lost control of the CF
 process.  Quite aside from the lack of progress on closing CF3, major
 hackers who should know better are submitting significant new feature
 patches now, despite our agreement in Ottawa that nothing big would be
 accepted after CF3.  At this point I'd bet against releasing 9.3 during
 2013.

Or we could reject all of those patches.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Parallel query execution

2013-01-16 Thread Robert Haas
On Wed, Jan 16, 2013 at 6:52 AM, Magnus Hagander mag...@hagander.net wrote:
 On Wed, Jan 16, 2013 at 12:03 AM, Bruce Momjian br...@momjian.us wrote:
 On Tue, Jan 15, 2013 at 11:01:04PM +, Simon Riggs wrote:
 On 15 January 2013 22:55, Bruce Momjian br...@momjian.us wrote:

  Why is this being discussed now?
 
  It is for 9.4 and will take months.  I didn't think there was a better
  time.  We don't usually discuss features during beta testing.

 Bruce, there are many, many patches on the queue. How will we ever get
 to beta testing if we begin open ended discussions on next release?

 If we can't finish what we've started for 9.3, why talk about 9.4?

 Yes, it's a great topic for discussion, but there are better times.

 Like when?  I don't remember a policy of not discussing things now.
 Does anyone else remember this?  Are you saying feature discussion is
 only between commit-fests?  Is this written down anywhere?  I only
 remember beta-time as a time not to discuss features.

 We kind of do - when in a CF we should do reviewing of existing
 patches, when outside a CF we should do discussions and work on new
 features. It's on http://wiki.postgresql.org/wiki/CommitFest. It
 doesn't specifically say do this and don't do that, but it says focus
 on review and discussing things that will happen that far ahead is
 definitely not focusing on review.

Bruce is evidently under the impression that he's no longer under any
obligation to review or commit other people's patches, or participate
in the CommitFest process in any way.  I believe that he has not
committed a significant patch written by someone else in several
years.  If the committers on the core team aren't committed to the
process, it doesn't stand much chance of working.

The fact that I have been completely buried for the last six months is
perhaps not helping, either, but even at the very low level of
engagement I've been at recently, I've still done more reviews (a few)
than patch submissions (none).  I view it as everyone's responsibility
to maintain a similar balance in their own work.  And some people are,
but not enough, especially among the committers.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Parallel query execution

2013-01-16 Thread Stephen Frost
* Claudio Freire (klaussfre...@gmail.com) wrote:
 Well, there's the fault in your logic. It won't be as linear.

I really don't see how this has become so difficult to communicate.

It doesn't have to be linear.

We're currently doing massive amounts of parallel processing by hand
using partitioning, tablespaces, and client-side logic to split up the
jobs.  It's certainly *much* faster than doing it in a single thread.
It's also faster with 10 processes going than 5 (we've checked).  With
10 going, we've hit the FC fabric limit (and these are spinning disks in
the SAN, not SSDs).  I'm also sure it'd be much slower if all 10
processes were trying to read data through a single process that's
reading from the I/O system.  We've got some processes which essentially
end up doing that and we don't come anywhere near the total FC fabric
bandwidth when just scanning through the system because, at that point,
you do hit the limits of how fast the individual drive sets can provide
data.
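These measurements fit a simple bottleneck model, given here only as a back-of-envelope sketch. Assume each of N scanning processes can pull at most b bytes per second from its own drive set and the shared FC fabric tops out at B; both symbols are introduced here for illustration. Aggregate throughput is then roughly:

```latex
T(N) \approx \min\left(N\,b,\; B\right)
```

So adding processes helps roughly linearly until N b reaches B and then flattens, while funnelling every consumer through a single reading process caps throughput near b regardless of N.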

To be clear, I'm not suggesting that we would parallelize a SeqScan node
and have the nodes above it be single-threaded.  As I said upthread, we
want to parallelize reading and processing the data coming in.  Perhaps
at some level that works out to not changing how we actually *do* seqscans
at all, and instead something higher in the plan tree just creates
multiple of them on independent threads, but it's still going to end up
being parallel I/O in the end.
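The idea of a node higher in the plan tree starting several scans that each cover part of the table can be sketched as block-range partitioning: each worker scans a disjoint, contiguous range of blocks and a leader combines the partial results. This is a hypothetical, self-contained model using POSIX threads over an in-memory array standing in for a relation's blocks; PostgreSQL backends are processes rather than threads (a point Claudio raises in this thread), so the names and the threading model are assumptions for illustration only.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdlib.h>

#define NWORKERS 4

/* One worker's share of the scan: a contiguous range of "blocks". */
typedef struct
{
    const long *blocks;         /* the whole hypothetical relation */
    size_t      first;          /* first block to scan (inclusive) */
    size_t      last;           /* last block to scan (exclusive) */
    long        partial_sum;    /* this worker's result */
} ScanRange;

static void *
scan_worker(void *arg)
{
    ScanRange  *r = arg;
    long        sum = 0;

    for (size_t i = r->first; i < r->last; i++)
        sum += r->blocks[i];    /* "process" each block: here, just add it */
    r->partial_sum = sum;
    return NULL;
}

/* Split the relation into NWORKERS disjoint ranges, scan them concurrently,
 * then combine the partial results in the leader. */
static long
parallel_scan_sum(const long *blocks, size_t nblocks)
{
    pthread_t   tids[NWORKERS];
    ScanRange   ranges[NWORKERS];
    size_t      per = nblocks / NWORKERS;
    size_t      extra = nblocks % NWORKERS;
    size_t      start = 0;
    long        total = 0;

    for (int w = 0; w < NWORKERS; w++)
    {
        size_t      len = per + ((size_t) w < extra ? 1 : 0);

        ranges[w] = (ScanRange) {blocks, start, start + len, 0};
        start += len;
        if (pthread_create(&tids[w], NULL, scan_worker, &ranges[w]) != 0)
            abort();
    }
    assert(start == nblocks);   /* every block is assigned exactly once */

    for (int w = 0; w < NWORKERS; w++)
    {
        pthread_join(tids[w], NULL);
        total += ranges[w].partial_sum;
    }
    return total;
}
```

Build with -pthread. Summing is only a placeholder for whatever per-block processing the plan would do; the shape that matters is disjoint ranges plus a combine step in the leader.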

I'm done with this thread for now; as was brought up, we need to focus on
getting 9.3 out the door.

Thanks,

Stephen




Re: [HACKERS] Parallel query execution

2013-01-16 Thread Stephen Frost
* Tom Lane (t...@sss.pgh.pa.us) wrote:
 In case you hadn't noticed, we've totally lost control of the CF
 process.  

I concur.

 Quite aside from the lack of progress on closing CF3, major
 hackers who should know better are submitting significant new feature
 patches now, despite our agreement in Ottawa that nothing big would be
 accepted after CF3.  

For my small part, it wasn't my intent to drop a contentious patch at
the end.  I had felt it was pretty minor and relatively simple.  My
arguments regarding the popen patch were simply that it didn't address
one of the use-cases that I was hoping it would.

I'll hold off on working on the compressed transport for now in favor of
doing reviews and trying to help get 9.3 wrapped up.

Thanks,

Stephen




Re: [HACKERS] Parallel query execution

2013-01-16 Thread Stephen Frost
* Daniel Farina (dan...@heroku.com) wrote:
 I have been skimming the commitfest application, and unlike some of
 the previous commitfests a huge number of patches have had review at
 some point in time, but probably need more...so looking for the red
 Nobody in the 'reviewers' column probably understates the shortage
 of review.

I've been frustrated by that myself.  I realize we don't want to
duplicate work but I'm really starting to think that having the
Reviewers column has turned out to actually work against us.

 I'm curious what the qualitative feelings are on patches or clusters
 thereof and what kind of review would be helpful in clearing the
 field.

I haven't been thrilled with the patches that I've looked at, but they've
also been ones that hadn't been reviewed before, so perhaps that's what
should be expected.  It'd be neat if we had some idea of what committers
were actively working on, so that reviewers could keep off of *those* and
keep working on the ones which aren't currently being worked on by a committer.

Thanks,

Stephen




Re: [HACKERS] Parallel query execution

2013-01-16 Thread Andrew Dunstan


On 01/15/2013 11:32 PM, Bruce Momjian wrote:

On Wed, Jan 16, 2013 at 01:28:18PM +0900, Michael Paquier wrote:


On Wed, Jan 16, 2013 at 1:22 PM, Josh Berkus j...@agliodbs.com wrote:

 Claudio, Stephen,

 It really seems like the areas where we could get the most bang for the
 buck in parallelism would be:

 1. Parallel sort
 2. Parallel aggregation (for commutative aggregates)
 3. Parallel nested loop join (especially for expression joins, like GIS)
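Item 2 in the list above, parallel aggregation for commutative aggregates, can be sketched with avg() split into a per-worker partial step, a combine step, and a final step. This is a hypothetical, self-contained model in C rather than PostgreSQL code; it assumes only what the list assumes, namely that partial states can be merged in any order. The transition/final split loosely mirrors how PostgreSQL aggregates already pair a state transition function with a final function.

```c
#include <assert.h>
#include <stddef.h>

/* Partial state for avg(): a running sum and a row count. */
typedef struct
{
    long        sum;
    long        count;
} AvgState;

/* Per-worker step: fold one partition of the input into a partial state. */
static AvgState
avg_partial(const long *rows, size_t n)
{
    AvgState    s = {0, 0};

    for (size_t i = 0; i < n; i++)
    {
        s.sum += rows[i];
        s.count++;
    }
    return s;
}

/* Combine step: merge two partial states.  Integer addition is commutative
 * and associative, so the order in which workers finish does not matter. */
static AvgState
avg_combine(AvgState a, AvgState b)
{
    AvgState    s = {a.sum + b.sum, a.count + b.count};

    return s;
}

/* Final step, run once by the leader after all partial states are merged. */
static double
avg_final(AvgState s)
{
    assert(s.count > 0);
    return (double) s.sum / (double) s.count;
}
```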

parallel data load? :/

We have that in pg_restore, and I think we are getting parallel dump in
9.3, right?  Unfortunately, I don't see it in the last 9.3 commit-fest.
Is it still being worked on?




I am about halfway through reviewing it. Unfortunately, paid work takes
precedence over unpaid work.


cheers

andrew




Re: [HACKERS] Parallel query execution

2013-01-16 Thread Claudio Freire
On Wed, Jan 16, 2013 at 10:33 AM, Stephen Frost sfr...@snowman.net wrote:
 * Claudio Freire (klaussfre...@gmail.com) wrote:
 Well, there's the fault in your logic. It won't be as linear.

 I really don't see how this has become so difficult to communicate.

 It doesn't have to be linear.

 We're currently doing massive amounts of parallel processing by hand
 using partitioning, tablespaces, and client-side logic to split up the
 jobs.  It's certainly *much* faster than doing it in a single thread.
 It's also faster with 10 processes going than 5 (we've checked).  With
 10 going, we've hit the FC fabric limit (and these are spinning disks in
 the SAN, not SSDs).  I'm also sure it'd be much slower if all 10
 processes were trying to read data through a single process that's
 reading from the I/O system.  We've got some processes which essentially
 end up doing that and we don't come anywhere near the total FC fabric
 bandwidth when just scanning through the system because, at that point,
 you do hit the limits of how fast the individual drive sets can provide
 data.

Well... just to close this out (to let people focus on 9.3's CF): that's a
level of hardware I haven't had experience with, but it seems to behave
very differently from regular (big and small) RAID arrays.

In any case, perhaps tablespaces are a hint here: if nodes are working
on different tablespaces, there's an indication that they *can* be
parallelized efficiently. That could be fleshed out on a parallel
execution node, but for that to work the whole execution engine needs
to be thread-safe (or it has to fork). It won't be easy.

It's best to concentrate on lower-hanging fruit, like sorting and aggregates.

Now back to the CF.




Re: [HACKERS] Parallel query execution

2013-01-16 Thread Noah Misch
On Wed, Jan 16, 2013 at 08:42:29AM -0500, Stephen Frost wrote:
 * Daniel Farina (dan...@heroku.com) wrote:
  I have been skimming the commitfest application, and unlike some of
  the previous commitfests a huge number of patches have had review at
  some point in time, but probably need more...so looking for the red
  Nobody in the 'reviewers' column probably understates the shortage
  of review.
 
 I've been frustrated by that myself.  I realize we don't want to
 duplicate work but I'm really starting to think that having the
 Reviewers column has turned out to actually work against us.

That column tells the CF manager whom to browbeat.  Without a CF manager, a
stale entry can indeed make a patch look under-control when it isn't.




Re: [HACKERS] Parallel query execution

2013-01-16 Thread Bruce Momjian
On Wed, Jan 16, 2013 at 01:37:28PM +0900, Michael Paquier wrote:
 
 
 On Wed, Jan 16, 2013 at 1:32 PM, Bruce Momjian br...@momjian.us wrote:
 
 On Wed, Jan 16, 2013 at 01:28:18PM +0900, Michael Paquier wrote:
 
 
  On Wed, Jan 16, 2013 at 1:22 PM, Josh Berkus j...@agliodbs.com wrote:
 
  Claudio, Stephen,
 
  It really seems like the areas where we could get the most bang for the
  buck in parallelism would be:
 
  1. Parallel sort
  2. Parallel aggregation (for commutative aggregates)
  3. Parallel nested loop join (especially for expression joins, like GIS)
 
  parallel data load? :/
 
 We have that in pg_restore, and I think we are getting parallel dump in
 9.3, right?  Unfortunately, I don't see it in the last 9.3 commit-fest.
 Is it still being worked on?
 
 Not exactly, I meant something like being able to use parallel processing when
 doing INSERT or COPY directly in core. If there is a parallel processing
 infrastructure, it could also be used for such write operations. I agree that
 the cases mentioned by Josh are far more appealing though...

I am not sure how a COPY could be easily parallelized, but I suppose it
could be done as part of the 1GB segment feature.  People have
complained that COPY is CPU-bound, so it might be very interesting to
see if we could offload some of that parsing overhead to a child.
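One way to offload COPY parsing, assumed here purely for illustration, is to cut the input into roughly equal byte ranges and then move each cut forward to the next line start, so that every worker parses whole lines. The hypothetical helper below only computes those boundaries; it assumes no field contains an embedded newline, which quoted CSV fields can, and that is one reason COPY is not trivial to parallelize.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: choose nchunks + 1 offsets into buf[0..len) so that
 * chunk i is buf[bounds[i] .. bounds[i+1]) and every interior boundary sits
 * at the start of a line (or at len).  Each chunk could then be parsed by a
 * separate worker.  Chunks may be empty when a line is longer than a chunk.
 * Assumes no field contains an embedded newline.
 */
static void
split_on_lines(const char *buf, size_t len, size_t nchunks, size_t *bounds)
{
    assert(nchunks > 0);
    bounds[0] = 0;
    for (size_t i = 1; i < nchunks; i++)
    {
        size_t      cut = len * i / nchunks;   /* evenly spaced first guess */

        if (cut < bounds[i - 1])
            cut = bounds[i - 1];              /* keep boundaries in order */
        while (cut > 0 && cut < len && buf[cut - 1] != '\n')
            cut++;                            /* move forward to a line start */
        bounds[i] = cut;
    }
    bounds[nchunks] = len;
}
```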

-- 
  Bruce Momjian  br...@momjian.us  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +




Re: [HACKERS] Parallel query execution

2013-01-16 Thread Bruce Momjian
On Wed, Jan 16, 2013 at 01:48:29AM -0300, Alvaro Herrera wrote:
 Bruce Momjian escribió:
  On Wed, Jan 16, 2013 at 01:28:18PM +0900, Michael Paquier wrote:
   
   
   On Wed, Jan 16, 2013 at 1:22 PM, Josh Berkus j...@agliodbs.com wrote:
   
   Claudio, Stephen,
   
   It really seems like the areas where we could get the most bang for the
   buck in parallelism would be:
   
   1. Parallel sort
   2. Parallel aggregation (for commutative aggregates)
   3. Parallel nested loop join (especially for expression joins, like GIS)
   
   parallel data load? :/
  
  We have that in pg_restore, and I think we are getting parallel dump in
  9.3, right?  Unfortunately, I don't see it in the last 9.3 commit-fest. 
  Is it still being worked on?
 
 It's in the previous-to-last commitfest.  IIRC that patch required
 review and testing from people with some Windows background.
 
 There are still 34 items needing attention in CF3.  I suggest that, if
 you have some spare time, your help would be very much appreciated
 there.  The commitfest that started on Jan 15th has 65 extra items.
 Anything currently listed in CF3 can rightfully be considered to be part
 of CF4, too.

Wow, I had no idea we were that far behind.  I have avoided commit-fest
work because I often travel and so might leave items abandoned, and I
try to clean up items that never make it into the commit-fest.  I thought
that was something that needed doing too, and I rarely manage to complete
that task.

-- 
  Bruce Momjian  br...@momjian.us  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +




Re: [HACKERS] Parallel query execution

2013-01-16 Thread Bruce Momjian
On Wed, Jan 16, 2013 at 08:11:06AM -0500, Robert Haas wrote:
  We kind of do - when in a CF we should do reviewing of existing
  patches, when outside a CF we should do discussions and work on new
  features. It's on http://wiki.postgresql.org/wiki/CommitFest. It
  doesn't specifically say do this and don't do that, but it says focus
  on review and discussing things that will happen that far ahead is
  definitely not focusing on review.
 
 Bruce is evidently under the impression that he's no longer under any
 obligation to review or commit other people's patches, or participate
 in the CommitFest process in any way.  I believe that he has not
 committed a significant patch written by someone else in several
 years.  If the committers on the core team aren't committed to the
 process, it doesn't stand much chance of working.

I assume you know I was the most frequent committer of other people's
patches for years before the commit-fests started, so I thought I would
move on to other things.

-- 
  Bruce Momjian  br...@momjian.us  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +




Re: [HACKERS] Parallel query execution

2013-01-16 Thread Bruce Momjian
On Wed, Jan 16, 2013 at 09:05:39AM -0500, Andrew Dunstan wrote:
 
 On 01/15/2013 11:32 PM, Bruce Momjian wrote:
 On Wed, Jan 16, 2013 at 01:28:18PM +0900, Michael Paquier wrote:
 
 On Wed, Jan 16, 2013 at 1:22 PM, Josh Berkus j...@agliodbs.com wrote:
 
  Claudio, Stephen,
 
  It really seems like the areas where we could get the most bang for the
  buck in parallelism would be:
 
  1. Parallel sort
  2. Parallel aggregation (for commutative aggregates)
  3. Parallel nested loop join (especially for expression joins, like GIS)
 
 parallel data load? :/
 We have that in pg_restore, and I think we are getting parallel dump in
 9.3, right?  Unfortunately, I don't see it in the last 9.3 commit-fest.
 Is it still being worked on?
 
 
 
 I am about halfway through reviewing it. Unfortunately, paid work
 takes precedence over unpaid work.

Do you think it will make it into 9.3?

-- 
  Bruce Momjian  br...@momjian.us  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +




Re: [HACKERS] Parallel query execution

2013-01-16 Thread Andrew Dunstan


On 01/16/2013 12:20 PM, Bruce Momjian wrote:

On Wed, Jan 16, 2013 at 09:05:39AM -0500, Andrew Dunstan wrote:

On 01/15/2013 11:32 PM, Bruce Momjian wrote:

On Wed, Jan 16, 2013 at 01:28:18PM +0900, Michael Paquier wrote:

On Wed, Jan 16, 2013 at 1:22 PM, Josh Berkus j...@agliodbs.com wrote:

 Claudio, Stephen,

 It really seems like the areas where we could get the most bang for the
 buck in parallelism would be:

 1. Parallel sort
 2. Parallel aggregation (for commutative aggregates)
 3. Parallel nested loop join (especially for expression joins, like GIS)

parallel data load? :/

We have that in pg_restore, and I think we are getting parallel dump in
9.3, right?  Unfortunately, I don't see it in the last 9.3 commit-fest.
Is it still being worked on?



I am about halfway through reviewing it. Unfortunately, paid work
takes precedence over unpaid work.

Do you think it will make it into 9.3?


Yes, I hope it will.

cheers

andrew





Re: [HACKERS] Parallel query execution

2013-01-16 Thread Stephen Frost
* Bruce Momjian (br...@momjian.us) wrote:
 I am not sure how a COPY could be easily parallelized, but I suppose it
 could be done as part of the 1GB segment feature.  People have
 complained that COPY is CPU-bound, so it might be very interesting to
 see if we could offload some of that parsing overhead to a child.

COPY can certainly be CPU bound, but before we can usefully parallelize it
we need to solve the problem around extent locking when trying to do
multiple COPYs into the same table.

Thanks,

Stephen




Re: [HACKERS] Parallel query execution

2013-01-16 Thread Pavel Stehule
2013/1/16 Stephen Frost sfr...@snowman.net:
 * Bruce Momjian (br...@momjian.us) wrote:
 I am not sure how a COPY could be easily parallelized, but I suppose it
 could be done as part of the 1GB segment feature.  People have
 complained that COPY is CPU-bound, so it might be very interesting to
 see if we could offload some of that parsing overhead to a child.

 COPY can certainly be CPU bound but before we can parallelize that
 usefully we need to solve the problem around extent locking when trying
 to do multiple COPY's to the same table.

Probably updating any related indexes and constraint checking should be
parallelized too.

Regards

Pavel


 Thanks,

 Stephen




Re: [HACKERS] Parallel query execution

2013-01-16 Thread Bruce Momjian
On Wed, Jan 16, 2013 at 10:06:51PM +0100, Pavel Stehule wrote:
 2013/1/16 Stephen Frost sfr...@snowman.net:
  * Bruce Momjian (br...@momjian.us) wrote:
  I am not sure how a COPY could be easily parallelized, but I suppose it
  could be done as part of the 1GB segment feature.  People have
  complained that COPY is CPU-bound, so it might be very interesting to
  see if we could offload some of that parsing overhead to a child.
 
  COPY can certainly be CPU-bound, but before we can parallelize that
  usefully we need to solve the problem around extent locking when trying
  to do multiple COPYs to the same table.
 
 Probably updating any related indexes and constraint checking should be
 parallelized.

Wiki updated:

https://wiki.postgresql.org/wiki/Parallel_Query_Execution

-- 
  Bruce Momjian  br...@momjian.us  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +




Re: [HACKERS] Parallel query execution

2013-01-16 Thread Dickson S. Guedes
2013/1/16 Bruce Momjian br...@momjian.us:
 Wiki updated:

 https://wiki.postgresql.org/wiki/Parallel_Query_Execution

Could we add CTEs to that list of opportunities? I think that some kinds of
queries in CTEs could be easily parallelized.

[]s
-- 
Dickson S. Guedes
mail/xmpp: gue...@guedesoft.net - skype: guediz
http://guedesoft.net - http://www.postgresql.org.br




Re: [HACKERS] Parallel query execution

2013-01-16 Thread Bruce Momjian
On Wed, Jan 16, 2013 at 07:57:01PM -0200, Dickson S. Guedes wrote:
 2013/1/16 Bruce Momjian br...@momjian.us:
  Wiki updated:
 
  https://wiki.postgresql.org/wiki/Parallel_Query_Execution
 
 Could we add CTEs to that list of opportunities? I think that some kinds of
 queries in CTEs could be easily parallelized.

I added CTEs with joins.

-- 
  Bruce Momjian  br...@momjian.us  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +




Re: [HACKERS] Parallel query execution

2013-01-16 Thread Jeff Janes
On Tuesday, January 15, 2013, Stephen Frost wrote:

 * Gavin Flower (gavinflo...@archidevsys.co.nz) wrote:
  How about being aware of multiple spindles - so if the requested
  data covers multiple spindles, then data could be extracted in
  parallel. This may, or may not, involve multiple I/O channels?

 Yes, this should dovetail with partitioning and tablespaces to pick up
 on exactly that.


I'd rather not have the benefits of parallelism be tied to partitioning if
we can help it.  Hopefully implementing parallelism in core would result in
something more transparent than that.

Cheers,

Jeff


Re: [HACKERS] Parallel query execution

2013-01-16 Thread Jeff Janes
On Tuesday, January 15, 2013, Gavin Flower wrote:

  On 16/01/13 11:14, Bruce Momjian wrote:

 I mentioned last year that I wanted to start working on parallelism:

   https://wiki.postgresql.org/wiki/Parallel_Query_Execution

 Years ago I added thread-safety to libpq.  Recently I added two parallel
 execution paths to pg_upgrade.  The first parallel path allows execution
 of external binaries pg_dump and psql (to restore).  The second parallel
 path does copy/link by calling fork/thread-safe C functions.  I was able
 to do each in 2-3 days.

 I believe it is time to start adding parallel execution to the backend.
 We already have some parallelism in the backend:
 effective_io_concurrency and helper processes.  I think it is time we
 start to consider additional options.

 Parallelism isn't going to help all queries, in fact it might be just a
 small subset, but it will be the larger queries.  The pg_upgrade
 parallelism only helps clusters with multiple databases or tablespaces,
 but the improvements are significant.

 I have summarized my ideas by updating our Parallel Query Execution wiki
 page:

   https://wiki.postgresql.org/wiki/Parallel_Query_Execution

 Please consider updating the page yourself or posting your ideas to this
 thread.  Thanks.


  Hmm...

 How about being aware of multiple spindles - so if the requested data
 covers multiple spindles, then data could be extracted in parallel.  This
 may, or may not, involve multiple I/O channels?



effective_io_concurrency does this for bitmap scans.  I thought there was a
patch in the commitfest to extend this to ordinary index scans, but now I
can't find it.  But it still doesn't give you CPU parallelism.  The nice
thing about CPU parallelism is that it usually brings some amount of I/O
parallelism for free, while the reverse is less likely to be so.
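To make the prefetching idea concrete, here is a minimal sketch of keeping a fixed number of read-ahead hints in flight ahead of a scan. It is not PostgreSQL's bitmap heap scan code: `plan_prefetch`, `struct io_event` and the `distance` parameter are hypothetical names, and the function only records the order in which a real scan would issue an OS hint (for example posix_fadvise with POSIX_FADV_WILLNEED) and the blocking read, rather than doing any I/O itself.

```c
#include <assert.h>
#include <stddef.h>

enum io_kind { IO_HINT, IO_READ };

/* One step of the (hypothetical) scan: either a read-ahead hint or a read. */
struct io_event {
    enum io_kind kind;
    unsigned     block;
};

/*
 * Record the order in which a scan over blocks[0..n) would issue hints and
 * reads, keeping blocks i+1 .. i+distance hinted while block i is read.
 * `out` must have room for 2 * n events.  Returns the number of events.
 */
size_t plan_prefetch(const unsigned *blocks, size_t n, size_t distance,
                     struct io_event *out)
{
    size_t nev = 0;
    size_t next_hint = 0;           /* index of the next block to hint */

    assert(out != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        /* The block about to be read gains nothing from a hint. */
        if (next_hint <= i)
            next_hint = i + 1;
        /* Top up the window of outstanding hints. */
        while (next_hint < n && next_hint <= i + distance) {
            out[nev].kind = IO_HINT;
            out[nev].block = blocks[next_hint++];
            nev++;
        }
        out[nev].kind = IO_READ;
        out[nev].block = blocks[i];
        nev++;
    }
    return nev;
}
```

With `distance` 0 this degenerates to plain synchronous reads; a larger distance lets more requests overlap, which is roughly the kind of knob effective_io_concurrency exposes, although the backend's exact mapping is not modeled here.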

Cheers,

Jeff




Re: [HACKERS] Parallel query execution

2013-01-16 Thread Claudio Freire
On Wed, Jan 16, 2013 at 10:04 PM, Jeff Janes jeff.ja...@gmail.com wrote:
 Hmm...

 How about being aware of multiple spindles - so if the requested data
 covers multiple spindles, then data could be extracted in parallel.  This
 may, or may not, involve multiple I/O channels?



 effective_io_concurrency does this for bitmap scans.  I thought there was a
 patch in the commitfest to extend this to ordinary index scans, but now I
 can't find it.

I never pushed it to the CF since it interacts so badly with the
kernel. I was thinking about pushing the small part that is a net win
in all cases, the back-sequential patch, but that's independent of any
spindle count. It's more related to rotating media and read request
merges than it is to multiple spindles or parallelization.

The kernel guys basically are waiting for me to patch the kernel. I
think I convinced our IT guy at the office to lend me a machine for
tests... so it might happen soon.




Re: [HACKERS] Parallel query execution

2013-01-16 Thread Bruce Momjian
On Wed, Jan 16, 2013 at 05:04:05PM -0800, Jeff Janes wrote:
 On Tuesday, January 15, 2013, Stephen Frost wrote:
 
 * Gavin Flower (gavinflo...@archidevsys.co.nz) wrote:
  How about being aware of multiple spindles - so if the requested
  data covers multiple spindles, then data could be extracted in
  parallel. This may, or may not, involve multiple I/O channels?
 
 Yes, this should dovetail with partitioning and tablespaces to pick up
 on exactly that.  
 
 
 I'd rather not have the benefits of parallelism be tied to partitioning if we
 can help it.  Hopefully implementing parallelism in core would result in
 something more transparent than that.

We will need a way to know we are not saturating the I/O channel with
random I/O that could have been sequential if it was single-threaded. 
Tablespaces give us that info;  not sure what else does.

-- 
  Bruce Momjian  br...@momjian.us  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +




Re: [HACKERS] Parallel query execution

2013-01-16 Thread Claudio Freire
On Wed, Jan 16, 2013 at 11:44 PM, Bruce Momjian br...@momjian.us wrote:
 On Wed, Jan 16, 2013 at 05:04:05PM -0800, Jeff Janes wrote:
 On Tuesday, January 15, 2013, Stephen Frost wrote:

 * Gavin Flower (gavinflo...@archidevsys.co.nz) wrote:
  How about being aware of multiple spindles - so if the requested
  data covers multiple spindles, then data could be extracted in
  parallel. This may, or may not, involve multiple I/O channels?

 Yes, this should dovetail with partitioning and tablespaces to pick up
 on exactly that.


 I'd rather not have the benefits of parallelism be tied to partitioning if we
 can help it.  Hopefully implementing parallelism in core would result in
 something more transparent than that.

 We will need a way to know we are not saturating the I/O channel with
 random I/O that could have been sequential if it was single-threaded.
 Tablespaces give us that info;  not sure what else does.

I do also think tablespaces are a safe bet. But it wouldn't help for
parallelizing sorts or other operations with tempfiles (tempfiles
reside on the same tablespace), or even over a single table (same
tablespace again). And when the query is CPU-bound, it could be
parallelized by simply making a multithreaded memory sort. Well, not
so simply, but I do think it's an important building block.
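A minimal sketch of the "multithreaded memory sort" building block, assuming POSIX threads: the two halves of an int array are sorted concurrently with qsort and then merged. The name `parallel_sort_ints` is hypothetical, and a backend sort would also have to deal with arbitrary tuples, comparators and memory accounting, none of which is shown.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

struct sort_part {
    int    *base;
    size_t  len;
};

static int cmp_int(const void *a, const void *b)
{
    int x = *(const int *) a;
    int y = *(const int *) b;

    return (x > y) - (x < y);        /* avoids overflow of x - y */
}

/* Thread entry point: sort one part in place. */
static void *sort_part_worker(void *arg)
{
    struct sort_part *p = arg;

    qsort(p->base, p->len, sizeof(int), cmp_int);
    return NULL;
}

/*
 * Sort a[0..n) by sorting the two halves concurrently, then merging them.
 * Falls back to serial work if a thread or the scratch buffer cannot be
 * created, so the array always ends up sorted.
 */
void parallel_sort_ints(int *a, size_t n)
{
    size_t half = n / 2;
    struct sort_part left = { a, half };
    struct sort_part right = { a + half, n - half };
    pthread_t tid;
    int *tmp;
    size_t i = 0, j = half, k = 0;

    assert(a != NULL || n == 0);
    if (n < 2)
        return;

    if (pthread_create(&tid, NULL, sort_part_worker, &left) == 0) {
        sort_part_worker(&right);    /* this thread takes the right half */
        pthread_join(tid, NULL);
    } else {
        sort_part_worker(&left);     /* no thread: sort both halves here */
        sort_part_worker(&right);
    }

    tmp = malloc(n * sizeof(int));
    if (tmp == NULL) {
        qsort(a, n, sizeof(int), cmp_int);
        return;
    }
    /* Two-way merge of the sorted halves through the scratch buffer. */
    while (i < half && j < n)
        tmp[k++] = (a[j] < a[i]) ? a[j++] : a[i++];
    while (i < half)
        tmp[k++] = a[i++];
    while (j < n)
        tmp[k++] = a[j++];
    memcpy(a, tmp, n * sizeof(int));
    free(tmp);
}
```

Build with `-pthread`. Splitting into more than two parts would need a k-way merge, which is where most of the remaining design work lies.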




Re: [HACKERS] Parallel query execution

2013-01-16 Thread Bruce Momjian
On Wed, Jan 16, 2013 at 11:56:21PM -0300, Claudio Freire wrote:
 On Wed, Jan 16, 2013 at 11:44 PM, Bruce Momjian br...@momjian.us wrote:
  On Wed, Jan 16, 2013 at 05:04:05PM -0800, Jeff Janes wrote:
  On Tuesday, January 15, 2013, Stephen Frost wrote:
 
  * Gavin Flower (gavinflo...@archidevsys.co.nz) wrote:
   How about being aware of multiple spindles - so if the requested
   data covers multiple spindles, then data could be extracted in
   parallel. This may, or may not, involve multiple I/O channels?
 
  Yes, this should dovetail with partitioning and tablespaces to pick up
  on exactly that.
 
 
  I'd rather not have the benefits of parallelism be tied to partitioning if we
  can help it.  Hopefully implementing parallelism in core would result in
  something more transparent than that.
 
  We will need a way to know we are not saturating the I/O channel with
  random I/O that could have been sequential if it was single-threaded.
  Tablespaces give us that info;  not sure what else does.
 
 I do also think tablespaces are a safe bet. But it wouldn't help for
 parallelizing sorts or other operations with tempfiles (tempfiles
 reside on the same tablespace), or even over a single table (same

We can round-robin temp tablespace usage if you list multiple entries.

 tablespace again). And when the query is CPU-bound, it could be
 parallelized by simply making a multithreaded memory sort. Well, not
 so simply, but I do think it's an important building block.

Yes, and detecting when to use these parallel features will be hard.
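The round-robin use of several temp tablespaces can be sketched as a cursor over the configured names. This is only an illustration under assumptions: the `temp_ts_*` names and the caller-supplied starting index are hypothetical, and PostgreSQL's `temp_tablespaces` setting already spreads temporary files over the listed tablespaces in its own way.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical cursor handing out temp tablespace names in turn. */
struct temp_ts_cursor {
    const char *const *names;   /* configured tablespace names */
    size_t             count;   /* number of names, must be > 0 */
    size_t             next;    /* index returned by the next call */
};

/* `start` lets each session begin at a different (e.g. random) entry. */
void temp_ts_init(struct temp_ts_cursor *c, const char *const *names,
                  size_t count, size_t start)
{
    assert(count > 0);
    c->names = names;
    c->count = count;
    c->next = start % count;
}

/* Return the tablespace for the next temp file and advance round-robin. */
const char *temp_ts_next(struct temp_ts_cursor *c)
{
    const char *name = c->names[c->next];

    c->next = (c->next + 1) % c->count;
    return name;
}
```

Starting each session at a different index keeps concurrent sorts from all landing on the first listed tablespace at once.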

-- 
  Bruce Momjian  br...@momjian.us  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +




Re: [HACKERS] Parallel query execution

2013-01-16 Thread Jeff Janes
On Wednesday, January 16, 2013, Stephen Frost wrote:

 * Bruce Momjian (br...@momjian.us) wrote:
  I am not sure how a COPY could be easily parallelized, but I suppose it
  could be done as part of the 1GB segment feature.  People have
  complained that COPY is CPU-bound, so it might be very interesting to
  see if we could offload some of that parsing overhead to a child.

 COPY can certainly be CPU-bound, but before we can parallelize that
 usefully we need to solve the problem around extent locking when trying
 to do multiple COPYs to the same table.


I think that is rather overstating it.  Even with unindexed untriggered
tables, I can get some benefit from doing hand-rolled parallel COPY before
the extension lock becomes an issue, at least on some machines.  And with
triggered or indexed tables, all the more so.
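For a hand-rolled parallel COPY like the one described above, the main preparation step is cutting the input into pieces that each contain whole rows, so every piece can be sent through its own connection with `COPY ... FROM STDIN`. The sketch below uses a hypothetical helper, `split_on_lines`, and assumes every raw newline ends a row (true for text-format data whose embedded newlines are written as `\n`, not for CSV with quoted multi-line fields).

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: split buf[0..len) into `nparts` chunks whose
 * boundaries fall just after a '\n', so each chunk holds whole rows.
 * Chunk k is buf[bounds[k] .. bounds[k + 1]); `bounds` needs nparts + 1
 * slots.  Chunks may be empty when rows are long relative to len / nparts.
 */
void split_on_lines(const char *buf, size_t len, size_t nparts, size_t *bounds)
{
    assert(nparts > 0);
    bounds[0] = 0;
    for (size_t k = 1; k < nparts; k++) {
        size_t pos = len * k / nparts;   /* naive, evenly spaced cut */

        if (pos < bounds[k - 1])
            pos = bounds[k - 1];         /* keep boundaries non-decreasing */
        /* Move forward until pos starts a line (follows '\n') or hits the end. */
        while (pos > 0 && pos < len && buf[pos - 1] != '\n')
            pos++;
        bounds[k] = pos;
    }
    bounds[nparts] = len;
}
```

Each chunk is then loaded independently; the relation extension lock discussed above can still become the bottleneck once enough sessions write to the same table.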

Cheers,

Jeff


[HACKERS] Parallel query execution

2013-01-15 Thread Bruce Momjian
I mentioned last year that I wanted to start working on parallelism:

https://wiki.postgresql.org/wiki/Parallel_Query_Execution

Years ago I added thread-safety to libpq.  Recently I added two parallel
execution paths to pg_upgrade.  The first parallel path allows execution
of external binaries pg_dump and psql (to restore).  The second parallel
path does copy/link by calling fork/thread-safe C functions.  I was able
to do each in 2-3 days.
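The fork-based path can be sketched as a loop that keeps at most `max_children` child processes running and reaps them with wait(). This is not pg_upgrade's actual code: `run_jobs_forked`, the `job_fn` callback and `example_job` are hypothetical, and the sketch assumes the caller has no other child processes that wait() could collect.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical job body: returns 0 on success, nonzero on failure. */
typedef int (*job_fn)(size_t job_index);

/*
 * Run jobs 0..njobs-1, each in its own child process, with at most
 * max_children running at once.  Returns how many jobs succeeded.
 */
size_t run_jobs_forked(job_fn job, size_t njobs, size_t max_children)
{
    size_t started = 0, running = 0, succeeded = 0;

    assert(max_children > 0);
    while (started < njobs || running > 0) {
        if (started < njobs && running < max_children) {
            pid_t pid = fork();

            if (pid == 0)                      /* child: run one job, exit */
                _exit(job(started) == 0 ? 0 : 1);
            if (pid > 0)
                running++;
            else if (job(started) == 0)        /* fork failed: run it here */
                succeeded++;
            started++;
            continue;
        }

        /* At the concurrency limit or out of new jobs: reap one child. */
        int status;
        pid_t done = wait(&status);

        if (done > 0) {
            running--;
            if (WIFEXITED(status) && WEXITSTATUS(status) == 0)
                succeeded++;
        } else if (errno != EINTR) {
            break;                             /* no children left to wait for */
        }
    }
    return succeeded;
}

/* Example job for illustration: every fourth job fails. */
int example_job(size_t job_index)
{
    return (job_index % 4 == 0) ? 1 : 0;
}
```

For instance, `run_jobs_forked(example_job, 10, 3)` runs ten jobs three at a time and reports 7 successes, since jobs 0, 4 and 8 fail by construction. Using processes rather than threads sidesteps thread-safety questions for code that was never written to be reentrant.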

I believe it is time to start adding parallel execution to the backend. 
We already have some parallelism in the backend:
effective_io_concurrency and helper processes.  I think it is time we
start to consider additional options.

Parallelism isn't going to help all queries, in fact it might be just a
small subset, but it will be the larger queries.  The pg_upgrade
parallelism only helps clusters with multiple databases or tablespaces,
but the improvements are significant.

I have summarized my ideas by updating our Parallel Query Execution wiki
page: 

https://wiki.postgresql.org/wiki/Parallel_Query_Execution

Please consider updating the page yourself or posting your ideas to this
thread.  Thanks.

-- 
  Bruce Momjian  br...@momjian.us  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +




Re: [HACKERS] Parallel query execution

2013-01-15 Thread Stephen Frost
* Bruce Momjian (br...@momjian.us) wrote:
 Parallelism isn't going to help all queries, in fact it might be just a
 small subset, but it will be the larger queries.  The pg_upgrade
 parallelism only helps clusters with multiple databases or tablespaces,
 but the improvements are significant.

This would be fantastic and I'd like to help.  Parallel query and real
partitioning are two of our biggest holes for OLAP and data warehouse
users.

 Please consider updating the page yourself or posting your ideas to this
 thread.  Thanks.

Will do.

Thanks,

Stephen




Re: [HACKERS] Parallel query execution

2013-01-15 Thread Peter Geoghegan
On 15 January 2013 22:14, Bruce Momjian br...@momjian.us wrote:
 I believe it is time to start adding parallel execution to the backend.
 We already have some parallelism in the backend:
 effective_io_concurrency and helper processes.  I think it is time we
 start to consider additional options.

A few months back, I remarked [1] that speeding up sorting using
pipelining and asynchronous I/O was probably parallelism low-hanging
fruit. That hasn't changed, though I personally still don't have the
bandwidth to look into it in a serious way.

[1] 
http://www.postgresql.org/message-id/caeylb_vezpkdx54vex3x30oy_uoth89xoejjw6aucjjiujs...@mail.gmail.com

-- 
Peter Geoghegan   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services




Re: [HACKERS] Parallel query execution

2013-01-15 Thread Bruce Momjian
On Tue, Jan 15, 2013 at 10:39:10PM +, Peter Geoghegan wrote:
 On 15 January 2013 22:14, Bruce Momjian br...@momjian.us wrote:
  I believe it is time to start adding parallel execution to the backend.
  We already have some parallelism in the backend:
  effective_io_concurrency and helper processes.  I think it is time we
  start to consider additional options.
 
 A few months back, I remarked [1] that speeding up sorting using
 pipelining and asynchronous I/O was probably parallelism low-hanging
 fruit. That hasn't changed, though I personally still don't have the
 bandwidth to look into it in a serious way.
 
 [1] 
 http://www.postgresql.org/message-id/caeylb_vezpkdx54vex3x30oy_uoth89xoejjw6aucjjiujs...@mail.gmail.com

OK, I added the link to the wiki.

-- 
  Bruce Momjian  br...@momjian.us  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +




Re: [HACKERS] Parallel query execution

2013-01-15 Thread Bruce Momjian
On Tue, Jan 15, 2013 at 10:53:29PM +, Simon Riggs wrote:
 On 15 January 2013 22:14, Bruce Momjian br...@momjian.us wrote:
 
  I mentioned last year that I wanted to start working on parallelism:
 
 We don't normally begin discussing topics for the next release just as a
 CF is starting.
 
 Why is this being discussed now?

It is for 9.4 and will take months.  I didn't think there was a better
time.  We don't usually discuss features during beta testing.

-- 
  Bruce Momjian  br...@momjian.us  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +




Re: [HACKERS] Parallel query execution

2013-01-15 Thread Simon Riggs
On 15 January 2013 22:14, Bruce Momjian br...@momjian.us wrote:

 I mentioned last year that I wanted to start working on parallelism:

We don't normally begin discussing topics for the next release just as a
CF is starting.

Why is this being discussed now?

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Parallel query execution

2013-01-15 Thread Simon Riggs
On 15 January 2013 22:55, Bruce Momjian br...@momjian.us wrote:

 Why is this being discussed now?

 It is for 9.4 and will take months.  I didn't think there was a better
 time.  We don't usually discuss features during beta testing.

Bruce, there are many, many patches on the queue. How will we ever get
to beta testing if we begin open-ended discussions on the next release?

If we can't finish what we've started for 9.3, why talk about 9.4?

Yes, it's a great topic for discussion, but there are better times.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Parallel query execution

2013-01-15 Thread Gavin Flower

On 16/01/13 11:14, Bruce Momjian wrote:

I mentioned last year that I wanted to start working on parallelism:

https://wiki.postgresql.org/wiki/Parallel_Query_Execution

Years ago I added thread-safety to libpq.  Recently I added two parallel
execution paths to pg_upgrade.  The first parallel path allows execution
of external binaries pg_dump and psql (to restore).  The second parallel
path does copy/link by calling fork/thread-safe C functions.  I was able
to do each in 2-3 days.

I believe it is time to start adding parallel execution to the backend.
We already have some parallelism in the backend:
effective_io_concurrency and helper processes.  I think it is time we
start to consider additional options.

Parallelism isn't going to help all queries, in fact it might be just a
small subset, but it will be the larger queries.  The pg_upgrade
parallelism only helps clusters with multiple databases or tablespaces,
but the improvements are significant.

I have summarized my ideas by updating our Parallel Query Execution wiki
page:

https://wiki.postgresql.org/wiki/Parallel_Query_Execution

Please consider updating the page yourself or posting your ideas to this
thread.  Thanks.


Hmm...

How about being aware of multiple spindles - so if the requested data 
covers multiple spindles, then data could be extracted in parallel. This 
may, or may not, involve multiple I/O channels?


On large multiple processor machines, there are different blocks of 
memory that might be accessed at different speeds depending on the 
processor. Possibly a mechanism could be used to split a transaction 
over multiple processors to ensure the fastest memory is used?


Once a selection of rows has been made, if there is a lot of
reformatting going on, could this be done in parallel?  I can think of
2 very simplistic strategies: (A) use a different processor core for
each column, or (B) farm out sets of rows to different cores.  I am sure
that in reality there are more subtleties, and aspects of both strategies
will be used in a hybrid fashion along with other approaches.


I expect that before any parallel algorithm is invoked, some sort
of threshold needs to be exceeded to make it worthwhile. Different
aspects of the parallel algorithm may have their own thresholds. It may
not be worth applying a parallel algorithm for 10 rows from a simple
table, but selecting 10,000 records from multiple tables, each over 10
million rows, using joins may benefit from more extreme parallelism.


I expect that UNIONs, as well as the processing of partitioned tables, 
may be amenable to parallel processing.



Cheers,
Gavin



Re: [HACKERS] Parallel query execution

2013-01-15 Thread Bruce Momjian
On Tue, Jan 15, 2013 at 11:01:04PM +, Simon Riggs wrote:
 On 15 January 2013 22:55, Bruce Momjian br...@momjian.us wrote:
 
  Why is this being discussed now?
 
  It is for 9.4 and will take months.  I didn't think there was a better
  time.  We don't usually discuss features during beta testing.
 
 Bruce, there are many, many patches on the queue. How will we ever get
 to beta testing if we begin open-ended discussions on the next release?
 
 If we can't finish what we've started for 9.3, why talk about 9.4?
 
 Yes, it's a great topic for discussion, but there are better times.

Like when?  I don't remember a policy of not discussing things now. 
Does anyone else remember this?  Are you saying feature discussion is
only between commit-fests?  Is this written down anywhere?  I only
remember beta-time as a time not to discuss features.

-- 
  Bruce Momjian  br...@momjian.us  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +




Re: [HACKERS] Parallel query execution

2013-01-15 Thread Bruce Momjian
On Wed, Jan 16, 2013 at 12:03:50PM +1300, Gavin Flower wrote:
 On 16/01/13 11:14, Bruce Momjian wrote:
 
 I mentioned last year that I wanted to start working on parallelism:
 
 https://wiki.postgresql.org/wiki/Parallel_Query_Execution
 
 Years ago I added thread-safety to libpq.  Recently I added two parallel
 execution paths to pg_upgrade.  The first parallel path allows execution
 of external binaries pg_dump and psql (to restore).  The second parallel
 path does copy/link by calling fork/thread-safe C functions.  I was able
 to do each in 2-3 days.
 
 I believe it is time to start adding parallel execution to the backend.
 We already have some parallelism in the backend:
 effective_io_concurrency and helper processes.  I think it is time we
 start to consider additional options.
 
 Parallelism isn't going to help all queries, in fact it might be just a
 small subset, but it will be the larger queries.  The pg_upgrade
 parallelism only helps clusters with multiple databases or tablespaces,
 but the improvements are significant.
 
 I have summarized my ideas by updating our Parallel Query Execution wiki
 page:
 
 https://wiki.postgresql.org/wiki/Parallel_Query_Execution
 
 Please consider updating the page yourself or posting your ideas to this
 thread.  Thanks.
 
 
 Hmm...
 
 How about being aware of multiple spindles - so if the requested data covers
 multiple spindles, then data could be extracted in parallel.  This may, or may
 not, involve multiple I/O channels?

Well, we usually label these as tablespaces.  I don't know if
spindle-level is a reasonable level to add.

 On large multiple processor machines, there are different blocks of memory that
 might be accessed at different speeds depending on the processor.  Possibly a
 mechanism could be used to split a transaction over multiple processors to
 ensure the fastest memory is used?

That seems too far-out for an initial approach.

 Once a selection of rows has been made, if there is a lot of reformatting
 going on, could this be done in parallel?  I can think of 2 very
 simplistic strategies: (A) use a different processor core for each column, or
 (B) farm out sets of rows to different cores.  I am sure that in reality there are
 more subtleties, and aspects of both strategies will be used in a hybrid
 fashion along with other approaches.

Probably (B), but that is going to require making some of our modules
thread/fork-safe, and that is going to be tricky.
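Strategy (B) amounts to handing each worker a contiguous slice of the rows. A minimal sketch of a balanced split, using the hypothetical name `row_range`; it deliberately ignores the hard part mentioned above, making the per-row code safe to run in several threads or processes.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: give worker w (0-based) of nworkers a contiguous,
 * half-open slice [*start, *end) of rows 0..nrows-1.  Slice sizes differ
 * by at most one row, and the slices cover every row exactly once.
 */
void row_range(size_t nrows, size_t nworkers, size_t w,
               size_t *start, size_t *end)
{
    size_t base, extra;

    assert(nworkers > 0 && w < nworkers);
    base = nrows / nworkers;         /* rows every worker gets */
    extra = nrows % nworkers;        /* the first `extra` workers get one more */
    *start = w * base + (w < extra ? w : extra);
    *end = *start + base + (w < extra ? 1 : 0);
}
```

For 10 rows and 3 workers this yields [0,4), [4,7) and [7,10).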

 I expect that before any parallel algorithm is invoked, some sort of
 threshold needs to be exceeded to make it worthwhile.  Different aspects of
 the parallel algorithm may have their own thresholds.  It may not be worth
 applying a parallel algorithm for 10 rows from a simple table, but selecting
 10,000 records from multiple tables each over 10 million rows using joins may
 benefit from more extreme parallelism.

Right, I bet we will need some way to control when the overhead of
parallel execution is worth it.
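One way to express such a threshold is to compare a serial cost estimate with a parallel one that charges for worker startup and for passing rows back to the leader. The sketch below is a toy model: `struct par_costs`, its fields and the unit costs are hypothetical, not values from PostgreSQL's planner.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-query cost inputs, in arbitrary planner-like units. */
struct par_costs {
    double startup;        /* one-time cost of launching the workers */
    double cpu_per_row;    /* work needed for each row */
    double xfer_per_row;   /* extra cost of handing a row back to the leader */
};

double serial_cost(const struct par_costs *c, double rows)
{
    return rows * c->cpu_per_row;
}

/* Row work is divided among workers; startup and transfer are not. */
double parallel_cost(const struct par_costs *c, double rows, int workers)
{
    assert(workers > 0);
    return c->startup + rows * c->cpu_per_row / workers + rows * c->xfer_per_row;
}

/* Go parallel only when it is estimated to be strictly cheaper. */
bool use_parallel(const struct par_costs *c, double rows, int workers)
{
    return workers > 1 && parallel_cost(c, rows, workers) < serial_cost(c, rows);
}
```

With a startup cost of 1000, 1.0 per row of work and 0.1 per transferred row, four workers only win once 0.65 * rows exceeds 1000, i.e. above roughly 1,540 rows, so a 10-row query stays serial while a multi-million-row join does not.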

 I expect that UNIONs, as well as the processing of partitioned tables, may be
 amenable to parallel processing.

Interesting idea on UNION.

-- 
  Bruce Momjian  br...@momjian.us  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +




Re: [HACKERS] Parallel query execution

2013-01-15 Thread Stephen Frost
* Gavin Flower (gavinflo...@archidevsys.co.nz) wrote:
 How about being aware of multiple spindles - so if the requested
 data covers multiple spindles, then data could be extracted in
 parallel. This may, or may not, involve multiple I/O channels?

Yes, this should dovetail with partitioning and tablespaces to pick up
on exactly that.  We're implementing our own poor-man's parallelism
using exactly this to use as much of the CPU and I/O bandwidth as we
can.  I have every confidence that it could be done better and be
simpler for us if it was handled in the backend.

 On large multiple processor machines, there are different blocks of
 memory that might be accessed at different speeds depending on the
 processor. Possibly a mechanism could be used to split a transaction
 over multiple processors to ensure the fastest memory is used?

Let's work on getting it working on the h/w that PG is most commonly
deployed on first..  I agree that we don't want to paint ourselves into
a corner with this, but I don't think massive NUMA systems are what we
should focus on first (are you familiar with any that run PG today..?).
I don't expect we're going to be trying to fight with the Linux (or
whatever) kernel over what threads run on what processors with access to
what memory on small-NUMA systems (x86-based).

 Once a selection of rows has been made, if there is a lot of
 reformatting going on, could this be done in parallel?  I can
 think of 2 very simplistic strategies: (A) use a different
 processor core for each column, or (B) farm out sets of rows to
 different cores.  I am sure that in reality there are more subtleties,
 and aspects of both strategies will be used in a hybrid fashion
 along with other approaches.

Given our row-based storage architecture, I can't imagine we'd do
anything other than take a row-based approach to this..  I would think
we'd do two things: parallelize based on partitioning, and parallelize
seqscans across the individual heap files which are split on a per-1G
boundary already.  Perhaps we can generalize that and scale it based on
the number of available processors and the size of the relation but I
could see advantages in matching up with what the kernel thinks are
independent files.
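With the default build settings, a heap is stored as 1 GB segment files of 131072 blocks of 8 kB each (named relfilenode, relfilenode.1, and so on), so a per-segment split is easy to compute. The sketch assumes those defaults, which are compile-time options, and the round-robin `worker_scans_segment` assignment is a hypothetical policy rather than anything the backend does.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed defaults: 8 kB blocks and 131072 blocks per segment (1 GB).
 * Both are compile-time options, so real code should not hard-code them. */
#define EXAMPLE_BLCKSZ       8192u
#define EXAMPLE_RELSEG_SIZE  131072u

/* Number of segment files needed for a relation of nblocks blocks. */
uint32_t segment_count(uint32_t nblocks)
{
    /* 64-bit arithmetic so the rounding-up cannot overflow */
    return (uint32_t) (((uint64_t) nblocks + EXAMPLE_RELSEG_SIZE - 1)
                       / EXAMPLE_RELSEG_SIZE);
}

/* Segment holding block blkno: 0 is the base file, 1 is ".1", ... */
uint32_t block_segment(uint32_t blkno)
{
    return blkno / EXAMPLE_RELSEG_SIZE;
}

/* Hypothetical policy: worker w scans segments w, w + nworkers, ... */
int worker_scans_segment(uint32_t segno, uint32_t worker, uint32_t nworkers)
{
    assert(nworkers > 0 && worker < nworkers);
    return segno % nworkers == worker;
}
```

Giving each worker whole segment files lines up with what the kernel sees as independent files; splitting by contiguous block ranges instead would balance a partially filled last segment better.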

 I expect that before any parallel algorithm is invoked, some
 sort of threshold needs to be exceeded to make it worthwhile.

Certainly.  That needs to be included in the optimization model to
support this.

Thanks,

Stephen




Re: [HACKERS] Parallel query execution

2013-01-15 Thread Bruce Momjian
On Tue, Jan 15, 2013 at 06:15:57PM -0500, Stephen Frost wrote:
 * Gavin Flower (gavinflo...@archidevsys.co.nz) wrote:
  How about being aware of multiple spindles - so if the requested
  data covers multiple spindles, then data could be extracted in
  parallel. This may, or may not, involve multiple I/O channels?
 
 Yes, this should dovetail with partitioning and tablespaces to pick up
 on exactly that.  We're implementing our own poor-man's parallelism
 using exactly this to use as much of the CPU and I/O bandwidth as we
 can.  I have every confidence that it could be done better and be
 simpler for us if it was handled in the backend.

Yes, I have listed tablespaces and partitions as possible parallel
options on the wiki.

  On large multiple processor machines, there are different blocks of
  memory that might be accessed at different speeds depending on the
  processor. Possibly a mechanism could be used to split a transaction
  over multiple processors to ensure the fastest memory is used?
 
 Let's work on getting it working on the h/w that PG is most commonly
 deployed on first..  I agree that we don't want to paint ourselves into
 a corner with this, but I don't think massive NUMA systems are what we
 should focus on first (are you familiar with any that run PG today..?).
 I don't expect we're going to be trying to fight with the Linux (or
 whatever) kernel over what threads run on what processors with access to
 what memory on small-NUMA systems (x86-based).

Agreed.

  Once a selection of rows has been made, if there is a lot of
  reformatting going on, could this be done in parallel?  I can
  think of 2 very simplistic strategies: (A) use a different
  processor core for each column, or (B) farm out sets of rows to
  different cores.  I am sure that in reality there are more subtleties,
  and aspects of both strategies will be used in a hybrid fashion
  along with other approaches.
 
 Given our row-based storage architecture, I can't imagine we'd do
 anything other than take a row-based approach to this..  I would think
 we'd do two things: parallelize based on partitioning, and parallelize
 seqscans across the individual heap files which are split on a per-1G
 boundary already.  Perhaps we can generalize that and scale it based on
 the number of available processors and the size of the relation but I
 could see advantages in matching up with what the kernel thinks are
 independent files.

The 1GB idea is interesting.  I found in pg_upgrade that file copy would
just overwhelm the I/O channel, and that doing multiple copies on the
same device had no win, but those were pure I/O operations --- a
sequential scan might be enough of a mix of I/O and CPU that parallelism
might help.

  I expect that before any parallel algorithm is invoked, some
  sort of threshold needs to be exceeded to make it worthwhile.
 
 Certainly.  That needs to be included in the optimization model to
 support this.

I have updated the wiki to reflect the ideas mentioned above.

-- 
  Bruce Momjian  br...@momjian.us  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +




Re: [HACKERS] Parallel query execution

2013-01-15 Thread Michael Paquier
On Wed, Jan 16, 2013 at 7:14 AM, Bruce Momjian br...@momjian.us wrote:

 I mentioned last year that I wanted to start working on parallelism:

 https://wiki.postgresql.org/wiki/Parallel_Query_Execution

 Years ago I added thread-safety to libpq.  Recently I added two parallel
 execution paths to pg_upgrade.  The first parallel path allows execution
 of external binaries pg_dump and psql (to restore).  The second parallel
 path does copy/link by calling fork/thread-safe C functions.  I was able
 to do each in 2-3 days.

 I believe it is time to start adding parallel execution to the backend.
 We already have some parallelism in the backend:
 effective_io_concurrency and helper processes.  I think it is time we
 start to consider additional options.

 Parallelism isn't going to help all queries, in fact it might be just a
 small subset, but it will be the larger queries.  The pg_upgrade
 parallelism only helps clusters with multiple databases or tablespaces,
 but the improvements are significant.

 I have summarized my ideas by updating our Parallel Query Execution wiki
 page:

 https://wiki.postgresql.org/wiki/Parallel_Query_Execution

 Please consider updating the page yourself or posting your ideas to this
 thread.  Thanks.

Honestly that would be a great feature, and I would be happy to help
work on it.
Taking advantage of parallelism in a server with multiple cores, especially
for things like large sorting operations, would be great.
Just thinking aloud, but wouldn't it be the role of the planner to
determine whether a given query is worth parallelizing? The executor
would then be in charge of actually firing off, in parallel, the tasks
that the planner has determined necessary.
-- 
Michael Paquier
http://michael.otacoo.com


Re: [HACKERS] Parallel query execution

2013-01-15 Thread Bruce Momjian
On Wed, Jan 16, 2013 at 09:11:20AM +0900, Michael Paquier wrote:
 
 
 Honestly that would be a great feature, and I would be happy to help
 work on it.
 Taking advantage of parallelism in a server with multiple cores, especially
 for things like large sorting operations, would be great.
 Just thinking aloud, but wouldn't it be the role of the planner to
 determine whether a given query is worth parallelizing? The executor
 would then be in charge of actually firing off, in parallel, the tasks
 that the planner has determined necessary.

Yes, it would probably be driven off of the optimizer statistics.

-- 
  Bruce Momjian  br...@momjian.us  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +




Re: [HACKERS] Parallel query execution

2013-01-15 Thread Claudio Freire
On Tue, Jan 15, 2013 at 8:19 PM, Bruce Momjian br...@momjian.us wrote:
 Given our row-based storage architecture, I can't imagine we'd do
 anything other than take a row-based approach to this.  I would think
 we'd do two things: parallelize based on partitioning, and parallelize
 seqscans across the individual heap files which are split on a per-1G
 boundary already.  Perhaps we can generalize that and scale it based on
 the number of available processors and the size of the relation but I
 could see advantages in matching up with what the kernel thinks are
 independent files.

 The 1GB idea is interesting.  I found in pg_upgrade that file copy would
 just overwhelm the I/O channel, and that doing multiple copies on the
 same device had no win, but those were pure I/O operations --- a
 sequential scan might be enough of a mix of I/O and CPU that parallelism
 might help.

AFAIR, synchroscans were introduced because multiple large sequential
scans were counterproductive (big time).




Re: [HACKERS] Parallel query execution

2013-01-15 Thread Stephen Frost
* Claudio Freire (klaussfre...@gmail.com) wrote:
 On Tue, Jan 15, 2013 at 8:19 PM, Bruce Momjian br...@momjian.us wrote:
  The 1GB idea is interesting.  I found in pg_upgrade that file copy would
  just overwhelm the I/O channel, and that doing multiple copies on the
  same device had no win, but those were pure I/O operations --- a
  sequential scan might be enough of a mix of I/O and CPU that parallelism
  might help.
 
 AFAIR, synchroscans were introduced because multiple large sequential
 scans were counterproductive (big time).

Sequentially scanning the *same* data over and over is certainly
counterproductive.  Synchroscans fixed that, yes.  That's not what we're
talking about, though: we're talking about scanning and processing
independent sets of data using multiple processes.  It's certainly
possible that in some cases that won't be as good, but there will be
quite a few cases where it's much, much better.

Consider a very complicated function running against each row which
makes the CPU the bottleneck instead of the I/O system.  That type of
query will never run faster than a single CPU in a single-process
environment, regardless of whether you have synchronized scans or not, while in a
multi-process environment you'll take advantage of the extra CPUs which
are available and use more of the I/O bandwidth that isn't yet
exhausted.

Thanks,

Stephen




Re: [HACKERS] Parallel query execution

2013-01-15 Thread Claudio Freire
On Wed, Jan 16, 2013 at 12:13 AM, Stephen Frost sfr...@snowman.net wrote:
 * Claudio Freire (klaussfre...@gmail.com) wrote:
 On Tue, Jan 15, 2013 at 8:19 PM, Bruce Momjian br...@momjian.us wrote:
  The 1GB idea is interesting.  I found in pg_upgrade that file copy would
  just overwhelm the I/O channel, and that doing multiple copies on the
  same device had no win, but those were pure I/O operations --- a
  sequential scan might be enough of a mix of I/O and CPU that parallelism
  might help.

 AFAIR, synchroscans were introduced because multiple large sequential
 scans were counterproductive (big time).

 Sequentially scanning the *same* data over and over is certainly
 counterproductive.  Synchroscans fixed that, yes.  That's not what we're
 talking about, though: we're talking about scanning and processing
 independent sets of data using multiple processes.

I don't see the difference. Blocks are blocks (unless they're cached).

  It's certainly
 possible that in some cases that won't be as good

If memory serves me correctly (and it does, I suffered it a lot), the
performance hit is quite considerable. Enough to make it a lot worse
rather than not as good.

 but there will be
 quite a few cases where it's much, much better.

Just cached segments.




Re: [HACKERS] Parallel query execution

2013-01-15 Thread Stephen Frost
* Claudio Freire (klaussfre...@gmail.com) wrote:
 On Wed, Jan 16, 2013 at 12:13 AM, Stephen Frost sfr...@snowman.net wrote:
  Sequentially scanning the *same* data over and over is certainly
  counterproductive.  Synchroscans fixed that, yes.  That's not what we're
  talking about, though: we're talking about scanning and processing
  independent sets of data using multiple processes.
 
 I don't see the difference. Blocks are blocks (unless they're cached).

Not quite.  Having to go out to the kernel isn't free.  Additionally,
the seq scans used to pollute our shared buffers prior to
synch-scanning, which didn't help things.

   It's certainly
  possible that in some cases that won't be as good
 
 If memory serves me correctly (and it does, I suffered it a lot), the
 performance hit is quite considerable. Enough to make it a lot worse
 rather than not as good.

I feel like we must not be communicating very well.

If the CPU is pegged at 100% and the I/O system is at 20%, adding
another CPU at 100% will bring the I/O load up to 40% and you're now
processing data twice as fast overall.  If you're running a single CPU
at 20% and your I/O system is at 100%, then adding another CPU isn't
going to help and may even degrade performance by causing problems for
the I/O system.  The goal of the optimizer will be to model the plan to
account for exactly that, as best it can.
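
The arithmetic above is the idealized linear model: throughput is capped either by the number of CPU-bound processes times the rate each one can consume, or by the total I/O capacity, whichever is lower. A small C sketch of that assumption (no contention effects at all; units are arbitrary and the function names are hypothetical):

```c
/* Idealized linear model (assumption: no I/O contention effects).
 * cpu_rate:    MB/s a single process can consume when CPU-bound.
 * io_capacity: MB/s the I/O system can deliver in total.
 * Returns the overall scan throughput for nworkers processes. */
double
scan_throughput(int nworkers, double cpu_rate, double io_capacity)
{
    double cpu_limit = nworkers * cpu_rate;

    return cpu_limit < io_capacity ? cpu_limit : io_capacity;
}

/* Share of the I/O capacity in use, as a percentage. */
double
io_utilization_pct(int nworkers, double cpu_rate, double io_capacity)
{
    return 100.0 * scan_throughput(nworkers, cpu_rate, io_capacity)
        / io_capacity;
}
```

With cpu_rate = 20 and io_capacity = 100, one process uses 20% of the I/O capacity and two use 40% at twice the throughput. With cpu_rate = 500 (the I/O-bound case), a second process leaves throughput stuck at 100.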

  but there will be
  quite a few cases where it's much, much better.
 
 Just cached segments.

No, certainly not just cached segments.  Any situation where the CPU is
the bottleneck.

Thanks,

Stephen




Re: [HACKERS] Parallel query execution

2013-01-15 Thread Josh Berkus

 but there will be
 quite a few cases where it's much, much better.
 
 Just cached segments.

Actually, thanks to much faster storage (think SSD, SAN), it's easily
possible for PostgreSQL to become CPU-limited on a seq scan query, even
when reading from disk.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] Parallel query execution

2013-01-15 Thread Stephen Frost
* Josh Berkus (j...@agliodbs.com) wrote:
 Actually, thanks to much faster storage (think SSD, SAN), it's easily
 possible for PostgreSQL to become CPU-limited on a seq scan query, even
 when reading from disk.

Particularly with a complex filter being applied or if it's feeding into
something above it that's expensive.

Thanks,

Stephen




Re: [HACKERS] Parallel query execution

2013-01-15 Thread Josh Berkus
Claudio, Stephen,

It really seems like the areas where we could get the most bang for the
buck in parallelism would be:

1. Parallel sort
2. Parallel aggregation (for commutative aggregates)
3. Parallel nested loop join (especially for expression joins, like GIS)
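
Item 2 relies on the aggregate's state-merge step being commutative and associative: each worker builds a partial state over its own rows, and the partial states can then be combined in any order. A minimal sketch for avg() over integers, with hypothetical names; the slices are processed one after another here, standing in for separate workers:

```c
#include <stddef.h>
#include <stdint.h>

/* Partial state for avg(): merging two states is commutative and
 * associative, so partitions can be aggregated independently. */
typedef struct
{
    int64_t sum;
    int64_t count;
} AvgState;

/* Phase 1: what one worker would compute over its own slice of rows. */
AvgState
avg_partial(const int32_t *rows, size_t nrows)
{
    AvgState s = {0, 0};

    for (size_t i = 0; i < nrows; i++)
    {
        s.sum += rows[i];
        s.count++;
    }
    return s;
}

/* Phase 2: merge two partial states; argument order does not matter. */
AvgState
avg_combine(AvgState a, AvgState b)
{
    AvgState r = {a.sum + b.sum, a.count + b.count};

    return r;
}

/* Final step: turn the merged state into the result (0.0 if no rows). */
double
avg_final(AvgState s)
{
    return s.count > 0 ? (double) s.sum / (double) s.count : 0.0;
}

/* Split rows into nparts contiguous slices, aggregate each slice
 * separately, then combine the partial states. */
double
avg_two_phase(const int32_t *rows, size_t nrows, size_t nparts)
{
    AvgState total = {0, 0};

    if (nparts == 0)
        nparts = 1;
    size_t chunk = (nrows + nparts - 1) / nparts;

    for (size_t start = 0; start < nrows; start += chunk)
    {
        size_t n = nrows - start < chunk ? nrows - start : chunk;

        total = avg_combine(total, avg_partial(rows + start, n));
    }
    return avg_final(total);
}
```

For rows 1..7 split into 3 slices, the combined state is sum = 28, count = 7, giving the same 4.0 as a single-pass avg. An aggregate whose merge step is order-sensitive would not split this way.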


-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] Parallel query execution

2013-01-15 Thread Michael Paquier
On Wed, Jan 16, 2013 at 1:22 PM, Josh Berkus j...@agliodbs.com wrote:

 Claudio, Stephen,

 It really seems like the areas where we could get the most bang for the
 buck in parallelism would be:

 1. Parallel sort
 2. Parallel aggregation (for commutative aggregates)
 3. Parallel nested loop join (especially for expression joins, like GIS)

parallel data load? :/
-- 
Michael Paquier
http://michael.otacoo.com


Re: [HACKERS] Parallel query execution

2013-01-15 Thread Bruce Momjian
On Wed, Jan 16, 2013 at 01:28:18PM +0900, Michael Paquier wrote:
 
 
 On Wed, Jan 16, 2013 at 1:22 PM, Josh Berkus j...@agliodbs.com wrote:
 
 Claudio, Stephen,
 
 It really seems like the areas where we could get the most bang for the
 buck in parallelism would be:
 
 1. Parallel sort
 2. Parallel aggregation (for commutative aggregates)
 3. Parallel nested loop join (especially for expression joins, like GIS)
 
 parallel data load? :/

We have that in pg_restore, and I think we are getting parallel dump in
9.3, right?  Unfortunately, I don't see it in the last 9.3 commit-fest. 
Is it still being worked on?

-- 
  Bruce Momjian  br...@momjian.us  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +




Re: [HACKERS] Parallel query execution

2013-01-15 Thread Michael Paquier
On Wed, Jan 16, 2013 at 1:32 PM, Bruce Momjian br...@momjian.us wrote:

 On Wed, Jan 16, 2013 at 01:28:18PM +0900, Michael Paquier wrote:
 
 
  On Wed, Jan 16, 2013 at 1:22 PM, Josh Berkus j...@agliodbs.com wrote:
 
  Claudio, Stephen,
 
  It really seems like the areas where we could get the most bang for the
  buck in parallelism would be:
 
  1. Parallel sort
  2. Parallel aggregation (for commutative aggregates)
  3. Parallel nested loop join (especially for expression joins, like GIS)
 
  parallel data load? :/

 We have that in pg_restore, and I think we are getting parallel dump in
 9.3, right?  Unfortunately, I don't see it in the last 9.3 commit-fest.
 Is it still being worked on?

Not exactly, I meant something like being able to use parallel processing
when doing INSERT or COPY directly in core. If there is a parallel
processing infrastructure, it could also be used for such write operations.
I agree that the cases mentioned by Josh are far more appealing though...
-- 
Michael Paquier
http://michael.otacoo.com


Re: [HACKERS] Parallel query execution

2013-01-15 Thread Claudio Freire
On Wed, Jan 16, 2013 at 12:55 AM, Stephen Frost sfr...@snowman.net wrote:
 If memory serves me correctly (and it does, I suffered it a lot), the
 performance hit is quite considerable. Enough to make it a lot worse
 rather than not as good.

 I feel like we must not be communicating very well.

 If the CPU is pegged at 100% and the I/O system is at 20%, adding
 another CPU at 100% will bring the I/O load up to 40% and you're now
 processing data twice as fast overall

Well, there's the fault in your logic: it won't be that linear. Adding
another sequential scan will reduce the available bandwidth. If the I/O
system was doing, say, 10MB/s at 20% load, it will now be doing 20MB/s
at 80% load (maybe even worse). Quite suddenly you'll hit diminishing
returns, and the I/O subsystem, which wasn't the bottleneck, will become
it, with bandwidth as the limiting factor. If you go far enough past that
knee, you might end up with less bandwidth than you started with.

Add some concurrent operations (connections) to the mix and it just gets worse.

Figuring out where the knee is may be the hardest problem you'll face.
I don't think it'll be predictable enough to make I/O parallelization
in that case worth the effort.
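
The objection can be expressed by letting I/O capacity shrink as concurrent sequential streams are added. The sketch below uses an invented contention curve, peak / (1 + alpha * (n - 1)), chosen only because with peak = 50 MB/s and alpha = 1 it reproduces the illustrative numbers above (10MB/s at 20% load with one stream, 20MB/s at 80% load with two). It is not a measurement of any real device, and the names are hypothetical.

```c
/* Toy contention model (assumption, not a measurement): each extra
 * concurrent sequential stream on the same device adds seek overhead,
 * so deliverable bandwidth shrinks as peak / (1 + alpha * (n - 1)). */
double
io_capacity(int nstreams, double peak, double alpha)
{
    return peak / (1.0 + alpha * (nstreams - 1));
}

/* Throughput is capped by CPU (n * cpu_rate) or by the shrinking I/O
 * capacity, whichever is lower. */
double
contended_throughput(int nstreams, double cpu_rate, double peak, double alpha)
{
    double cpu_limit = nstreams * cpu_rate;
    double io_limit = io_capacity(nstreams, peak, alpha);

    return cpu_limit < io_limit ? cpu_limit : io_limit;
}

/* Stream count in [1, max_streams] with the best throughput: the "knee".
 * Ties keep the smaller count, since extra streams buy nothing. */
int
best_stream_count(int max_streams, double cpu_rate, double peak, double alpha)
{
    int best = 1;
    double best_tp = contended_throughput(1, cpu_rate, peak, alpha);

    for (int n = 2; n <= max_streams; n++)
    {
        double tp = contended_throughput(n, cpu_rate, peak, alpha);

        if (tp > best_tp)
        {
            best = n;
            best_tp = tp;
        }
    }
    return best;
}
```

With cpu_rate = 10 MB/s, best_stream_count(8, 10, 50, 1) picks 2 streams: a third stream raises the CPU ceiling to 30 MB/s but drops I/O capacity to about 16.7 MB/s. With alpha = 0 the model reduces to the linear case and the knee moves out to 5 streams, where CPU demand meets the 50 MB/s peak. The hard part, as noted, is that the real alpha depends on the device and on whatever else is competing for it.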

If you instead think of parallelizing random I/O (say index scans
within nested loops), that might work (or it might not). Again it
depends a helluva lot on what else is contending with the I/O
resources and how far ahead of optimum you push it. I've faced this
problem when trying to prefetch on index scans. If you try to prefetch
too much, you induce extra delays and it's a bad tradeoff.

Feel free to do your own testing.




Re: [HACKERS] Parallel query execution

2013-01-15 Thread Alvaro Herrera
Bruce Momjian wrote:
 On Wed, Jan 16, 2013 at 01:28:18PM +0900, Michael Paquier wrote:
  
  
  On Wed, Jan 16, 2013 at 1:22 PM, Josh Berkus j...@agliodbs.com wrote:
  
  Claudio, Stephen,
  
  It really seems like the areas where we could get the most bang for the
  buck in parallelism would be:
  
  1. Parallel sort
  2. Parallel aggregation (for commutative aggregates)
  3. Parallel nested loop join (especially for expression joins, like GIS)
  
  parallel data load? :/
 
  We have that in pg_restore, and I think we are getting parallel dump in
 9.3, right?  Unfortunately, I don't see it in the last 9.3 commit-fest. 
 Is it still being worked on?

It's in the previous-to-last commitfest.  IIRC that patch required
review and testing from people with some Windows background.

There are still 34 items needing attention in CF3.  I suggest that, if
you have some spare time, your help would be very much appreciated
there.  The commitfest that started on Jan 15th has 65 extra items.
Anything currently listed in CF3 can rightfully be considered to be part
of CF4, too.

-- 
Álvaro Herrera  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Parallel query execution

2013-01-15 Thread Jeff Janes
On Tuesday, January 15, 2013, Simon Riggs wrote:

 On 15 January 2013 22:55, Bruce Momjian br...@momjian.us
 wrote:

  Why is this being discussed now?
 
  It is for 9.4 and will take months.  I didn't think there was a better
  time.  We don't usually discuss features during beta testing.

 Bruce, there are many, many patches on the queue. How will we ever get
 to beta testing if we begin open ended discussions on next release?

 If we can't finish what we've started for 9.3, why talk about 9.4?

 Yes, it's a great topic for discussion, but there are better times.


Possibly so.  But unless we are to introduce a thinkfest, how do we know
when such a better time would be?

Lately commit-fests have been basically a continuous thing, except during
beta, which would be an even worse time to discuss it.  I think that
parallel execution is huge and probably more likely for 9.5 (10.0?) than
9.4 for the general case (maybe some special cases for 9.4, like index
builds).  Yet the single biggest risk I see to the future of the project is
the lack of parallel execution.

Cheers,

Jeff


Re: [HACKERS] Parallel query execution

2013-01-15 Thread Tom Lane
Alvaro Herrera alvhe...@2ndquadrant.com writes:
 There are still 34 items needing attention in CF3.  I suggest that, if
 you have some spare time, your help would be very much appreciated
 there.  The commitfest that started on Jan 15th has 65 extra items.
 Anything currently listed in CF3 can rightfully be considered to be part
 of CF4, too.

In case you hadn't noticed, we've totally lost control of the CF
process.  Quite aside from the lack of progress on closing CF3, major
hackers who should know better are submitting significant new feature
patches now, despite our agreement in Ottawa that nothing big would be
accepted after CF3.  At this point I'd bet against releasing 9.3 during
2013.

regards, tom lane




[HACKERS] Parallel Query Execution Project

2010-09-28 Thread Li Jie

Hi all,

I'm interested in this parallel project:
http://wiki.postgresql.org/wiki/Parallel_Query_Execution

But I can't find any discussion of current progress on the website. It
seems to have stalled for nearly a year?

Thanks,
Li Jie



Re: [HACKERS] Parallel Query Execution Project

2010-09-28 Thread Markus Wanner
Hi,

On 09/28/2010 07:24 AM, Li Jie wrote:
 I'm interested in this parallel project,
 http://wiki.postgresql.org/wiki/Parallel_Query_Execution
 
 But I can't find any discussion and current progress in the website, it
 seems to stop for nearly a year?

Yeah, I don't know of anybody really working on it ATM.

If you are interested in a process based design, please have a look at
the bgworker infrastructure stuff. It could be of help for a
process-based implementation.

Regards

Markus Wanner



Re: [HACKERS] Parallel Query Execution Project

2010-09-28 Thread Hans-Jürgen Schönig
On Sep 28, 2010, at 10:15 AM, Markus Wanner wrote:

 Hi,
 
 On 09/28/2010 07:24 AM, Li Jie wrote:
 I'm interested in this parallel project,
 http://wiki.postgresql.org/wiki/Parallel_Query_Execution
 
 But I can't find any discussion and current progress in the website, it
 seems to stop for nearly a year?
 
 Yeah, I don't know of anybody really working on it ATM.
 
 If you are interested in a process based design, please have a look at
 the bgworker infrastructure stuff. It could be of help for a
 process-based implementation.
 
 Regards
 
 Markus Wanner



Yes, I don't know of anybody either.
In addition to that, it is more than a giant task: it means working on
more than just one isolated part.
Practically, I cannot think of any stage of query execution that would
not need some changes.
I don't see a feature like that within a realistic timeframe.

regards,

hans

--
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de

