Re: [HACKERS] a funnel by any other name

2015-09-23 Thread Simon Riggs
On 22 September 2015 at 20:34, Robert Haas wrote:

> On Tue, Sep 22, 2015 at 10:34 AM, Simon Riggs wrote:
> > Robert, thanks for asking. We'll be stuck with these words for some time,
> > user visible via EXPLAIN so this is important.
>
> I agree, thanks for taking an interest.
>
> > The main operations are the 3 mentioned by Nicolas:
> > 1. Send data from many to one - which has subtypes for Unsorted, Sorted and
> > Evenly balanced (but unsorted)
> > 2. Send data from one process to many
> > 3. Send data from many to many
> >
> > My preferences for this would be
> > 1. Gather (but not Gather Motion) e.g. Gather, Gather Sorted
> > 2. Scatter (since Broadcast only makes sense in the context of a distributed
> > query, it sounds weird for intra-node query)
> > 3. Redistribution - which implies the description of how we spread data
> > across nodes is "Distribution" (or DISTRIBUTED BY)
>
> "Scatter" isn't one of the things that I mentioned in my original
> email.  Not sure where we'd use that, although there might be
> somewhere.


Understood. Thought it best to cover all the phrases we'll use in the
future now in one discussion.


> > For 3 we should definitely use Redistribute, since this is what Teradata has
> > been calling it for 30 years, which is where Greenplum got it from.
>
> That's a reasonable option.  We can bikeshed it some more when we get that
> far.


Sure


> > For 1, Gather makes most sense.
>
> Yeah, I'm leaning that way myself.  Amit argued for "Parallel Gather"
> but I think that's overkill.  There can't be a non-parallel gather,
> and long names are a pain.


Agreed

-- 
Simon Riggs                http://www.2ndQuadrant.com/

PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [HACKERS] a funnel by any other name

2015-09-23 Thread Simon Riggs
On 22 September 2015 at 21:14, Alvaro Herrera wrote:

> Robert Haas wrote:
> > On Tue, Sep 22, 2015 at 10:34 AM, Simon Riggs wrote:
>
> > > For 1, Gather makes most sense.
> >
> > Yeah, I'm leaning that way myself.  Amit argued for "Parallel Gather"
> > but I think that's overkill.  There can't be a non-parallel gather,
> > and long names are a pain.
>
> "Gather" seems a pretty decent choice to me too, even if we only have a
> single worker (your "1").  I don't think there's much need to
> distinguish 1 from 2, is there?
>

I think so. 1 is Many->1 and the other is 1->Many.

You may wish to do an operation like a parallel merge join.

Parallel Sort -> Scatter -> Parallel Merge

-- 
Simon Riggs                http://www.2ndQuadrant.com/

PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [HACKERS] a funnel by any other name

2015-09-22 Thread Robert Haas
On Tue, Sep 22, 2015 at 10:34 AM, Simon Riggs wrote:
> Robert, thanks for asking. We'll be stuck with these words for some time,
> user visible via EXPLAIN so this is important.

I agree, thanks for taking an interest.

> The main operations are the 3 mentioned by Nicolas:
> 1. Send data from many to one - which has subtypes for Unsorted, Sorted and
> Evenly balanced (but unsorted)
> 2. Send data from one process to many
> 3. Send data from many to many
>
> My preferences for this would be
> 1. Gather (but not Gather Motion) e.g. Gather, Gather Sorted
> 2. Scatter (since Broadcast only makes sense in the context of a distributed
> query, it sounds weird for intra-node query)
> 3. Redistribution - which implies the description of how we spread data
> across nodes is "Distribution" (or DISTRIBUTED BY)

"Scatter" isn't one of the things that I mentioned in my original
email.  Not sure where we'd use that, although there might be
somewhere.

> For 3 we should definitely use Redistribute, since this is what Teradata has
> been calling it for 30 years, which is where Greenplum got it from.

That's a reasonable option.  We can bikeshed it some more when we get that far.

> For 1, Gather makes most sense.

Yeah, I'm leaning that way myself.  Amit argued for "Parallel Gather"
but I think that's overkill.  There can't be a non-parallel gather,
and long names are a pain.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] a funnel by any other name

2015-09-22 Thread Alvaro Herrera
Robert Haas wrote:
> On Tue, Sep 22, 2015 at 10:34 AM, Simon Riggs wrote:

> > For 1, Gather makes most sense.
> 
> Yeah, I'm leaning that way myself.  Amit argued for "Parallel Gather"
> but I think that's overkill.  There can't be a non-parallel gather,
> and long names are a pain.

"Gather" seems a pretty decent choice to me too, even if we only have a
single worker (your "1").  I don't think there's much need to
distinguish 1 from 2, is there?

We can bikeshed the other names when the time comes; the insight in the
thread is good to have.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] a funnel by any other name

2015-09-22 Thread Simon Riggs
On 17 September 2015 at 05:07, Nicolas Barbier wrote:

> 2015-09-17 Robert Haas:
>
> > 1. Exchange Bushy
> > 2. Exchange Inter-Operator (this is what's currently implemented)
> > 3. Exchange Replicate
> > 4. Exchange Merge
> > 5. Interchange
>
> > 1. ?
> > 2. Gather
> > 3. Broadcast (sorta)
> > 4. Gather Merge
> > 5. Redistribute
>
> > 1. Parallel Child
> > 2. Parallel Gather
> > 3. Parallel Replicate
> > 4. Parallel Merge
> > 5. Parallel Redistribute
>
> FYI, SQL Server has these in its execution plans:
>
> * Distribute Streams: read from one thread, write to multiple threads
> * Repartition Streams: both read and write from/to multiple threads
> * Gather Streams: read from multiple threads, write to one thread
>

Robert, thanks for asking. We'll be stuck with these words for some time,
and they are user-visible via EXPLAIN, so this is important.

In general we should stick to words already used in other similar
situations, which could include DBMS and parallel ETL tools, of which there
are many more than mentioned here.

I would be against using any of these words: Funnel, Motion, Bushy because
I don't find them very descriptive (I think of spiders, bowels and shrubs
respectively, sorry).

These words are liable to confusion with other concepts: Replicate,
Duplicate, Distribute, Partition, Repartition, MERGE.

I've seen this concept called Fan-In/Fan-Out and Scatter/Gather.

The main operations are the 3 mentioned by Nicolas:
1. Send data from many to one - which has subtypes for Unsorted, Sorted and
Evenly balanced (but unsorted)
2. Send data from one process to many
3. Send data from many to many

My preferences for this would be
1. Gather (but not Gather Motion) e.g. Gather, Gather Sorted
2. Scatter (since Broadcast only makes sense in the context of a
distributed query, it sounds weird for intra-node query)
3. Redistribution - which implies the description of how we spread data
across nodes is "Distribution" (or DISTRIBUTED BY)

For 3 we should definitely use Redistribute, since this is what Teradata
has been calling it for 30 years, which is where Greenplum got it from.
For 1, Gather makes most sense.

For 2, it could be either Scatter or Distribute. The former works well with
Gather, the latter works well with Redistribute.
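
To make the three data-movement patterns concrete, here is a minimal Python
sketch. It is only an illustration of the semantics under discussion, not
PostgreSQL executor code: the function names follow the proposed node names,
while the list-of-lists representation of worker streams, the round-robin
policy in scatter() and the hash routing in redistribute() are all
assumptions made for the example.

```python
import heapq

def gather(streams):
    """Many -> one, unsorted: concatenate the worker outputs as they come."""
    return [row for stream in streams for row in stream]

def gather_sorted(streams):
    """Many -> one, sorted: every input is already sorted, so a k-way
    merge yields a single sorted output."""
    return list(heapq.merge(*streams))

def scatter(rows, n_workers):
    """One -> many: deal rows out round-robin (an assumed policy)."""
    out = [[] for _ in range(n_workers)]
    for i, row in enumerate(rows):
        out[i % n_workers].append(row)
    return out

def redistribute(streams, n_workers, key):
    """Many -> many: route each row by the hash of its distribution key
    (hash routing is an assumption; other policies are possible)."""
    out = [[] for _ in range(n_workers)]
    for stream in streams:
        for row in stream:
            out[hash(key(row)) % n_workers].append(row)
    return out

if __name__ == "__main__":
    workers = [[1, 4], [2, 3]]
    print(gather(workers))
    print(gather_sorted(workers))
    print(scatter([1, 2, 3, 4, 5], 2))
    print(redistribute(workers, 2, key=lambda row: row))
```

Seen this way, Scatter reads as the mirror image of Gather, while
Redistribute is the only one of the three that needs a distribution key.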

Sorry for my absence from further review of the parallel ops.

-- 
Simon Riggs                http://www.2ndQuadrant.com/

PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [HACKERS] a funnel by any other name

2015-09-17 Thread Nicolas Barbier
2015-09-17 Robert Haas:

> 1. Exchange Bushy
> 2. Exchange Inter-Operator (this is what's currently implemented)
> 3. Exchange Replicate
> 4. Exchange Merge
> 5. Interchange

> 1. ?
> 2. Gather
> 3. Broadcast (sorta)
> 4. Gather Merge
> 5. Redistribute

> 1. Parallel Child
> 2. Parallel Gather
> 3. Parallel Replicate
> 4. Parallel Merge
> 5. Parallel Redistribute

FYI, SQL Server has these in its execution plans:

* Distribute Streams: read from one thread, write to multiple threads
* Repartition Streams: both read and write from/to multiple threads
* Gather Streams: read from multiple threads, write to one thread

Nicolas

-- 
A. Because it breaks the logical sequence of discussion.
Q. Why is top posting bad?




Re: [HACKERS] a funnel by any other name

2015-09-17 Thread Amit Kapila
On Thu, Sep 17, 2015 at 8:09 AM, Robert Haas wrote:
>
> Or, yet another option, we could combine the similar operators under
> one umbrella while keeping the things that are more different as
> separate nodes:
>
> 1, 2. Exchange (or Gather or Funnel)
> 3, 5. Distribute (or Redistribute or Interchange or Exchange)
> 4. Exchange Merge (or Gather Merge or Funnel Merge)
>

+1 for combining, but it seems better to call 1 and 2 "Parallel Gather",
and similarly for the others.  Adding "Parallel" to "Gather" makes it
self-explanatory.


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] a funnel by any other name

2015-09-16 Thread Petr Jelinek

On 2015-09-17 04:39, Robert Haas wrote:


> 1. Exchange Bushy
> 2. Exchange Inter-Operator (this is what's currently implemented)
> 3. Exchange Replicate
> 4. Exchange Merge
> 5. Interchange
>
> Or taking inspiration from Greenplum, we could go with:
>
> 1. ?
> 2. Gather
> 3. Broadcast (sorta)
> 4. Gather Merge
> 5. Redistribute
>
> Or maybe something like this:
>
> 1. Parallel Child
> 2. Parallel Gather
> 3. Parallel Replicate
> 4. Parallel Merge
> 5. Parallel Redistribute
>
> Or, yet another option, we could combine the similar operators under
> one umbrella while keeping the things that are more different as
> separate nodes:
>
> 1, 2. Exchange (or Gather or Funnel)
> 3, 5. Distribute (or Redistribute or Interchange or Exchange)
> 4. Exchange Merge (or Gather Merge or Funnel Merge)
>
> Thoughts?



Interesting read.

I think 1 and 2 are similar enough to be the same node (Exchange sounds
good to me).


Exchange Merge for 4 also sounds good.

About 3 and 5, if I understand correctly those are similar, with the main
difference being that in 3 all parents get a copy of every tuple while in
5 the tuples are partitioned between the parents. It sounds reasonable to
have Redistribute/Interchange or something like that for both, with some
additional info saying whether tuples are being partitioned or duplicated.
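
As a rough illustration of one node covering both cases, the Python sketch
below uses a single redistribute() function with a mode flag. Everything in
it is hypothetical: the Mode enum, the hash-based partitioning and the
describe() label are assumptions for the example and do not reflect any
existing executor node or EXPLAIN output.

```python
from enum import Enum

class Mode(Enum):
    PARTITIONED = "partitioned"   # case 5: each tuple goes to one parent
    DUPLICATED = "duplicated"     # case 3: each tuple goes to every parent

def redistribute(rows, n_parents, mode, key=lambda row: row):
    """Send rows to n_parents; the mode decides whether a row is routed
    to a single parent or copied to all of them."""
    out = [[] for _ in range(n_parents)]
    for row in rows:
        if mode is Mode.DUPLICATED:
            for parent in out:
                parent.append(row)
        else:
            # Hash routing is an assumed partitioning policy.
            out[hash(key(row)) % n_parents].append(row)
    return out

def describe(mode):
    """Hypothetical label carrying the 'additional info' about the mode."""
    return f"Redistribute ({mode.value})"

if __name__ == "__main__":
    for mode in Mode:
        print(describe(mode), redistribute([1, 2, 3], 2, mode))
```

The point of the flag is that the node name stays the same and only the
annotation changes between the two behaviours.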


In any case, let's not name any of the nodes as "Replicate".

--
 Petr Jelinek  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

