Re: [HACKERS] Parallel Sort

2013-05-26 Thread Stephen Frost
* Noah Misch (n...@leadboat.com) wrote: > In particular, implementing a GPU-based strcoll() for bttextcmp > sounds like quite a project in its own right. It also wouldn't likely be helpful... To be able to use a GPU effectively, last I looked, you need to be able to move a large chunk of data to

Re: [HACKERS] Parallel Sort

2013-05-26 Thread Noah Misch
On Fri, May 24, 2013 at 01:13:21PM -0500, Jim Nasby wrote: > On 5/13/13 9:28 AM, Noah Misch wrote: >> It would be great if one client session could take advantage of multiple CPU >> cores. EnterpriseDB wishes to start the trek into this problem space for 9.4 >> by implementing parallel internal (i

Re: [HACKERS] Parallel Sort

2013-05-24 Thread Kohei KaiGai
Let me introduce one thing we discussed in the developer meeting at Ottawa. We got a consensus that pluggable exec-node may be useful to replace a part of exec-node tree with an alternative one being implemented by extensions; which will allow to run something like "GpuSort" instead of existing Sor

Re: [HACKERS] Parallel Sort

2013-05-24 Thread james
> Have you considered GPU-based sorting? I know there's been discussion in the past. If you use OpenCL, then you can use a CPU driver if there is no GPU, and that can allow you to leverage all the CPU cores without having to do the multi-thread stuff in the backend. While the compilation of

Re: [HACKERS] Parallel Sort

2013-05-24 Thread Jim Nasby
On 5/13/13 9:28 AM, Noah Misch wrote: It would be great if one client session could take advantage of multiple CPU cores. EnterpriseDB wishes to start the trek into this problem space for 9.4 by implementing parallel internal (i.e. not spilling to disk) sort. This touches on a notable subset of

Re: [HACKERS] Parallel Sort

2013-05-19 Thread Hitoshi Harada
On Wed, May 15, 2013 at 11:11 AM, Noah Misch wrote: > On Wed, May 15, 2013 at 08:12:34AM +0900, Michael Paquier wrote: > > The concept of clause parallelism for backend worker is close to the > > concept of clause shippability introduced in Postgres-XC. In the case of > > XC, the equivalent of th

Re: [HACKERS] Parallel Sort

2013-05-16 Thread Noah Misch
On Wed, May 15, 2013 at 08:49:00PM +0200, Andres Freund wrote: > On 2013-05-13 10:28:59 -0400, Noah Misch wrote: > > Each worker needs to make SnapshotNow visibility decisions coherent with the > > master. For sorting, this allows us to look up comparison functions, even > > when the current trans

Re: [HACKERS] Parallel Sort

2013-05-15 Thread Claudio Freire
On Wed, May 15, 2013 at 3:04 PM, Noah Misch wrote: > On Tue, May 14, 2013 at 12:15:24PM -0300, Claudio Freire wrote: >> You know what would be a low-hanging fruit that I've been thinking >> would benefit many of my own queries? >> >> "Parallel" sequential scan nodes. Even if there's no real parall

Re: [HACKERS] Parallel Sort

2013-05-15 Thread Peter Geoghegan
On Wed, May 15, 2013 at 11:32 AM, Peter Geoghegan wrote: > I think that this effort could justify itself independently of any > attempt to introduce parallelism to in-memory sorting. I abandoned a > patch to introduce timsort to Postgres, because I knew that there was > no principled way to reap t

Re: [HACKERS] Parallel Sort

2013-05-15 Thread Andres Freund
On 2013-05-13 10:28:59 -0400, Noah Misch wrote: > Each worker needs to make SnapshotNow visibility decisions coherent with the > master. For sorting, this allows us to look up comparison functions, even > when the current transaction created or modified those functions. This will > also be an ess

Re: [HACKERS] Parallel Sort

2013-05-15 Thread Peter Geoghegan
On Mon, May 13, 2013 at 7:28 AM, Noah Misch wrote: > We should decide whether to actually sort in parallel based on the comparator > cost and the data size. The system currently has no information on comparator > cost: bt*cmp (and indeed almost all built-in functions) all have procost=1, > but bt

Re: [HACKERS] Parallel Sort

2013-05-15 Thread Noah Misch
On Wed, May 15, 2013 at 12:26:52PM +0530, Amit Kapila wrote: > On Monday, May 13, 2013 7:59 PM Noah Misch wrote: > > We can allocate a small amount of permanent shared memory for > > coordination > > among a group of processes, but sorting will benefit from a region as > > large as > > maintenance_

Re: [HACKERS] Parallel Sort

2013-05-15 Thread Noah Misch
On Wed, May 15, 2013 at 08:12:34AM +0900, Michael Paquier wrote: > The concept of clause parallelism for backend worker is close to the > concept of clause shippability introduced in Postgres-XC. In the case of > XC, the equivalent of the master backend is a backend located on a node > called Coord

Re: [HACKERS] Parallel Sort

2013-05-15 Thread Noah Misch
On Tue, May 14, 2013 at 12:15:24PM -0300, Claudio Freire wrote: > You know what would be a low-hanging fruit that I've been thinking > would benefit many of my own queries? > > "Parallel" sequential scan nodes. Even if there's no real parallelism > involved, when a query has to scan the same table

Re: [HACKERS] Parallel Sort

2013-05-14 Thread Amit Kapila
On Monday, May 13, 2013 7:59 PM Noah Misch wrote: > It would be great if one client session could take advantage of > multiple CPU > cores. EnterpriseDB wishes to start the trek into this problem space > for 9.4 > by implementing parallel internal (i.e. not spilling to disk) sort. > This > touches

Re: [HACKERS] Parallel Sort

2013-05-14 Thread Hannu Krosing
On 05/15/2013 07:30 AM, Tom Lane wrote: > Hannu Krosing writes: >> Has anybody looked into making syscache MVCC compliant ? > This is the wrong statement of the question. > > The right statement is "how would you like backend A to be updating > table T in compliance with a view of table T's schema

Re: [HACKERS] Parallel Sort

2013-05-14 Thread Tom Lane
Hannu Krosing writes: > Has anybody looked into making syscache MVCC compliant ? This is the wrong statement of the question. The right statement is "how would you like backend A to be updating table T in compliance with a view of table T's schema that is obsolete because of backend B's already-

Re: [HACKERS] Parallel Sort

2013-05-14 Thread Hannu Krosing
On 05/13/2013 07:10 PM, Robert Haas wrote: > On Mon, May 13, 2013 at 10:57 AM, Tom Lane wrote: >> This approach seems to me to be likely to guarantee that the startup >> overhead for any parallel sort is so large that only fantastically >> enormous sorts will come out ahead. >> >> I think you need

Re: [HACKERS] Parallel Sort

2013-05-14 Thread Michael Paquier
On Tue, May 14, 2013 at 11:59 PM, Noah Misch wrote: > On Tue, May 14, 2013 at 01:51:42PM +0900, Michael Paquier wrote: > > On Mon, May 13, 2013 at 11:28 PM, Noah Misch wrote: > > > > > * Identifying Parallel-Compatible Functions > > > > > > Not all functions can reasonably run on a worker backen

Re: [HACKERS] Parallel Sort

2013-05-14 Thread Claudio Freire
On Tue, May 14, 2013 at 11:50 AM, Noah Misch wrote: > On Mon, May 13, 2013 at 09:52:43PM +0200, Kohei KaiGai wrote: >> 2013/5/13 Noah Misch >> > The choice of whether to parallelize can probably be made a manner similar >> > to >> > the choice to do an external sort: the planner guesses the outco

Re: [HACKERS] Parallel Sort

2013-05-14 Thread Noah Misch
On Tue, May 14, 2013 at 01:51:42PM +0900, Michael Paquier wrote: > On Mon, May 13, 2013 at 11:28 PM, Noah Misch wrote: > > > * Identifying Parallel-Compatible Functions > > > > Not all functions can reasonably run on a worker backend. We should not > > presume that a VOLATILE function can tolera

Re: [HACKERS] Parallel Sort

2013-05-14 Thread Noah Misch
On Mon, May 13, 2013 at 09:52:43PM +0200, Kohei KaiGai wrote: > 2013/5/13 Noah Misch > > The choice of whether to parallelize can probably be made a manner similar > > to > > the choice to do an external sort: the planner guesses the outcome for > > costing > > purposes, but the actual decision is

Re: [HACKERS] Parallel Sort

2013-05-14 Thread Noah Misch
On Mon, May 13, 2013 at 07:55:16PM +0100, Simon Riggs wrote: > On 13 May 2013 15:28, Noah Misch wrote: > > > The heavyweight locking mechanism will need to be aware of the association > > between the master and its workers. > > Not sure I can see why that would be true. > > ISTM that the worker

Re: [HACKERS] Parallel Sort

2013-05-14 Thread Robert Haas
On Tue, May 14, 2013 at 12:51 AM, Michael Paquier wrote: >> I'm not sure what the specific answer here should look like. Simply >> having a >> CREATE FUNCTION ... PARALLEL_IS_FINE flag is not entirely satisfying, >> because >> the rules are liable to loosen over time. > > Having a flag would be e

Re: [HACKERS] Parallel Sort

2013-05-13 Thread Michael Paquier
On Mon, May 13, 2013 at 11:28 PM, Noah Misch wrote: > * Identifying Parallel-Compatible Functions > > Not all functions can reasonably run on a worker backend. We should not > presume that a VOLATILE function can tolerate the unstable execution order > imposed by parallelism, though a function l

Re: [HACKERS] Parallel Sort

2013-05-13 Thread Simon Riggs
On 13 May 2013 15:57, Tom Lane wrote: > I think you need to think in terms of restricting the problem space +1 > One obvious suggestion is to forbid the workers from > doing any database access of their own at all --- the parent would > have to do any required catalog lookups for sort functions

Re: [HACKERS] Parallel Sort

2013-05-13 Thread Kohei KaiGai
2013/5/13 Noah Misch > * Planner & Similar Issues > > We should decide whether to actually sort in parallel based on the > comparator > cost and the data size. The system currently has no information on > comparator > cost: bt*cmp (and indeed almost all built-in functions) all have procost=1, >

Re: [HACKERS] Parallel Sort

2013-05-13 Thread Simon Riggs
On 13 May 2013 15:28, Noah Misch wrote: > The heavyweight locking mechanism will need to be aware of the association > between the master and its workers. Not sure I can see why that would be true. ISTM that the workers need to be restricted in various ways from a full-functioned master. If the

Re: [HACKERS] Parallel Sort

2013-05-13 Thread Noah Misch
On Mon, May 13, 2013 at 04:39:01PM +0200, Andres Freund wrote: > On 2013-05-13 10:28:59 -0400, Noah Misch wrote: > > Each worker needs to make SnapshotNow visibility decisions coherent with the > > master. For sorting, this allows us to look up comparison functions, even > > when the current trans

Re: [HACKERS] Parallel Sort

2013-05-13 Thread Robert Haas
On Mon, May 13, 2013 at 10:57 AM, Tom Lane wrote: > This approach seems to me to be likely to guarantee that the startup > overhead for any parallel sort is so large that only fantastically > enormous sorts will come out ahead. > > I think you need to think in terms of restricting the problem spac

Re: [HACKERS] Parallel Sort

2013-05-13 Thread Andres Freund
On 2013-05-13 10:57:39 -0400, Tom Lane wrote: > Noah Misch writes: > > Each worker needs to make SnapshotNow visibility decisions coherent with the > > master. For sorting, this allows us to look up comparison functions, even > > when the current transaction created or modified those functions.

Re: [HACKERS] Parallel Sort

2013-05-13 Thread Tom Lane
Noah Misch writes: > Each worker needs to make SnapshotNow visibility decisions coherent with the > master. For sorting, this allows us to look up comparison functions, even > when the current transaction created or modified those functions. This will > also be an essential building block for an

Re: [HACKERS] Parallel Sort

2013-05-13 Thread Andres Freund
Hi, Interesting! Need to think about most, but one piece immediately came to mind: On 2013-05-13 10:28:59 -0400, Noah Misch wrote: > Each worker needs to make SnapshotNow visibility decisions coherent with the > master. For sorting, this allows us to look up comparison functions, even > when the

[HACKERS] Parallel Sort

2013-05-13 Thread Noah Misch
It would be great if one client session could take advantage of multiple CPU cores. EnterpriseDB wishes to start the trek into this problem space for 9.4 by implementing parallel internal (i.e. not spilling to disk) sort. This touches on a notable subset of the infrastructure components we'll nee