Re: [HACKERS] On partitioning

On 12/12/2014 23:09, "Alvaro Herrera" wrote:
>
> Claudio Freire wrote:
>
> > Fair enough, but that's not the same as not requiring easy proofs. The
> > planner might not be the one doing the proofs, but you still need proofs.
> >
> > Even if the proving method is hardcoded into the partitioning method,
> > as in the case of list or range partitioning, it's still a proof. With
> > arbitrary functions (which is what prompted me to mention proofs) you
> > can't do that. A function works very well for inserting, but not for
> > selecting.
> >
> > I could be wrong though. Maybe there's a way to turn SQL functions
> > into analyzable things? But it would still be very easy to shoot
> > yourself in the foot by writing one that is too complex.
>
> Arbitrary SQL expressions (including functions) are not the thing to use
> for partitioning -- at least that's how I understand this whole
> discussion.  I don't think you want to do "proofs" as such -- they are
> expensive.
>
> To make this discussion a bit clearer, there are two things to
> distinguish: one is routing tuples, when an INSERT or COPY command
> references the partitioned table, into the individual partitions
> (ingress); the other is deciding which partitions to read when a SELECT
> query wants to read tuples from the partitioned table (egress).
>
> On ingress, what you want is something like being able to do something
> on the tuple that tells you which partition it belongs into.  Ideally
> this is something much lighter than running an expression; if you can
> just apply an operator to the partitioning column values, that should be
> plenty fast.  This requires no proof.
>
> On egress you need some direct way to compare the scan quals with the
> partitioning values.  I would imagine this to be similar to how scan
> quals are compared to the values stored in a BRIN index: each scan qual
> has a corresponding operator strategy and a scan key, and you can say
> "aye" or "nay" based on a small set of operations that can be run
> cheaply, again without any proof or running arbitrary expressions.

Interesting that you mention BRIN. It does seem that it could be made to
work with BRIN's operator classes.

In fact, a partition-wide BRIN tuple could be stored per partition, and that
in itself could be the definition for the partition, either preinitialized
or dynamically updated. That would work even for arbitrary routing
functions, especially if the operator class to use is customizable.

I stand corrected.
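
As a rough illustration of the per-partition summary idea, here is a minimal Python sketch. It is not BRIN's actual on-disk format or API: the PartitionSummary class, the digit-count routing function and the summaries dictionary are all hypothetical. Each partition keeps only the minimum and maximum key it has received, updated on insert, and a range qual is checked against that summary.

```python
class PartitionSummary:
    """Min/max of the keys stored in one partition (a BRIN-like summary)."""

    def __init__(self):
        self.lo = None
        self.hi = None

    def add(self, key):
        # Widen the summary so that it always covers every stored key.
        self.lo = key if self.lo is None else min(self.lo, key)
        self.hi = key if self.hi is None else max(self.hi, key)

    def may_contain(self, lo, hi):
        """False only when no stored key can satisfy lo <= key <= hi."""
        return self.lo is not None and self.lo <= hi and lo <= self.hi


def route(key):
    # Hypothetical opaque routing function: partition by number of digits.
    # Nothing analyzes this function; only its results get summarized.
    return len(str(key)) - 1


summaries = {}  # partition id -> PartitionSummary, updated dynamically


def insert(key):
    part = route(key)
    summaries.setdefault(part, PartitionSummary()).add(key)
    return part


def partitions_to_scan(lo, hi):
    """Partitions whose summary overlaps the qual lo <= key <= hi."""
    return [p for p, s in sorted(summaries.items()) if s.may_contain(lo, hi)]
```

The summary only helps when the routing function happens to keep nearby keys together; a hash-like function would give every partition a wide min/max range and nothing would be excluded.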

Re: [HACKERS] On partitioning

2014-12-12 Thread Alvaro Herrera

Claudio Freire wrote:

> Fair enough, but that's not the same as not requiring easy proofs. The
> planner might not be the one doing the proofs, but you still need proofs.
> 
> Even if the proving method is hardcoded into the partitioning method,
> as in the case of list or range partitioning, it's still a proof. With
> arbitrary functions (which is what prompted me to mention proofs) you
> can't do that. A function works very well for inserting, but not for
> selecting.
> 
> I could be wrong though. Maybe there's a way to turn SQL functions
> into analyzable things? But it would still be very easy to shoot
> yourself in the foot by writing one that is too complex.

Arbitrary SQL expressions (including functions) are not the thing to use
for partitioning -- at least that's how I understand this whole
discussion.  I don't think you want to do "proofs" as such -- they are
expensive.

To make this discussion a bit clearer, there are two things to
distinguish: one is routing tuples, when an INSERT or COPY command
references the partitioned table, into the individual partitions
(ingress); the other is deciding which partitions to read when a SELECT
query wants to read tuples from the partitioned table (egress).

On ingress, what you want is something like being able to do something
on the tuple that tells you which partition it belongs into.  Ideally
this is something much lighter than running an expression; if you can
just apply an operator to the partitioning column values, that should be
plenty fast.  This requires no proof.

On egress you need some direct way to compare the scan quals with the
partitioning values.  I would imagine this to be similar to how scan
quals are compared to the values stored in a BRIN index: each scan qual
has a corresponding operator strategy and a scan key, and you can say
"aye" or "nay" based on a small set of operations that can be run
cheaply, again without any proof or running arbitrary expressions.
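
To make the ingress/egress distinction concrete, here is a minimal Python sketch under stated assumptions: the BOUNDS and PARTITIONS lists are hypothetical, ranges are taken as lower-inclusive and upper-exclusive, and none of this is actual PostgreSQL code. Ingress is a binary search over the sorted bounds; egress is a cheap overlap test between the scan qual and each partition's bounds.

```python
from bisect import bisect_right

# Hypothetical layout: partition i holds keys in [BOUNDS[i], BOUNDS[i + 1]).
BOUNDS = [0, 100, 200, 300]
PARTITIONS = ["p0", "p1", "p2"]


def route(key):
    """Ingress: map one key to its partition with a binary search."""
    i = bisect_right(BOUNDS, key) - 1
    if i < 0 or i >= len(PARTITIONS):
        raise ValueError(f"no partition accepts key {key!r}")
    return PARTITIONS[i]


def prune(lo, hi):
    """Egress: keep partitions whose range overlaps the qual lo <= key <= hi."""
    return [
        name
        for i, name in enumerate(PARTITIONS)
        if BOUNDS[i] <= hi and lo < BOUNDS[i + 1]
    ]
```

Both functions only compare values, which is the sort of operation an operator class can supply; neither evaluates an arbitrary expression.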

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] On partitioning

On Fri, Dec 12, 2014 at 7:40 PM, Josh Berkus  wrote:
> On 12/12/2014 02:10 PM, Tom Lane wrote:
>> Actually, I'm not sure that's what we want.  I thought what we really
>> wanted here was to postpone partition-routing decisions to runtime,
>> so that the behavior would be efficient whether or not the decision
>> could be predetermined at plan time.
>>
>> This still leads to the same point Robert is making: the routing
>> decisions have to be cheap and fast.  But it's wrong to think of it
>> in terms of planner proofs.
>
> The other reason I'd really like to have the new partitioning taken out
> of the planner: expressions.
>
> Currently, if you have partitions with constraints on, say,
> "event_date", the following WHERE clause will NOT use CE and will scan
> all partitions:
>
> WHERE event_date BETWEEN ( '2014-12-11' - interval '1 month' ) and
> '2014-12-11'.
>
> This is despite the fact that the expression above gets rewritten to a
> constant by the time the query is executed; by then it's too late.  To
> say nothing of functions like to_timestamp(), now(), etc.
>
> As long as partitions need to be chosen at plan time, I don't see a good
> way to fix the expression problem.

Fair enough, but that's not the same as not requiring easy proofs. The
planner might not be the one doing the proofs, but you still need proofs.

Even if the proving method is hardcoded into the partitioning method,
as in the case of list or range partitioning, it's still a proof. With
arbitrary functions (which is what prompted me to mention proofs) you
can't do that. A function works very well for inserting, but not for
selecting.

I could be wrong though. Maybe there's a way to turn SQL functions
into analyzable things? But it would still be very easy to shoot
yourself in the foot by writing one that is too complex.


Re: [HACKERS] On partitioning

2014-12-12 Thread Josh Berkus

On 12/12/2014 02:10 PM, Tom Lane wrote:
> Actually, I'm not sure that's what we want.  I thought what we really
> wanted here was to postpone partition-routing decisions to runtime,
> so that the behavior would be efficient whether or not the decision
> could be predetermined at plan time.
> 
> This still leads to the same point Robert is making: the routing
> decisions have to be cheap and fast.  But it's wrong to think of it
> in terms of planner proofs.

The other reason I'd really like to have the new partitioning taken out
of the planner: expressions.

Currently, if you have partitions with constraints on, say,
"event_date", the following WHERE clause will NOT use CE and will scan
all partitions:

WHERE event_date BETWEEN ( '2014-12-11' - interval '1 month' ) and
'2014-12-11'.

This is despite the fact that the expression above gets rewritten to a
constant by the time the query is executed; by then it's too late.  To
say nothing of functions like to_timestamp(), now(), etc.

As long as partitions need to be chosen at plan time, I don't see a good
way to fix the expression problem.
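
A minimal Python sketch of what choosing partitions at run time could look like, assuming hypothetical monthly partitions and a helper, one_month_before, that only approximates `- interval '1 month'`. The bound expressions are passed as callables and evaluated first; exclusion happens only once their values are known.

```python
import calendar
from datetime import date

# Hypothetical monthly partitions: name -> (first day, first day of next month).
PARTITIONS = {
    "events_2014_10": (date(2014, 10, 1), date(2014, 11, 1)),
    "events_2014_11": (date(2014, 11, 1), date(2014, 12, 1)),
    "events_2014_12": (date(2014, 12, 1), date(2015, 1, 1)),
}


def one_month_before(d):
    """Rough stand-in for `d - interval '1 month'`, clamping the day."""
    year, month = (d.year, d.month - 1) if d.month > 1 else (d.year - 1, 12)
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def prune_at_runtime(lower_expr, upper_expr):
    """Evaluate the qual's bound expressions first, then exclude partitions."""
    lo, hi = lower_expr(), upper_expr()
    return [
        name
        for name, (start, end) in PARTITIONS.items()
        if start <= hi and lo < end
    ]
```

Called with `lambda: one_month_before(date(2014, 12, 11))` and `lambda: date(2014, 12, 11)`, the October partition is skipped even though the lower bound was never a literal constant.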

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


Re: [HACKERS] On partitioning

On Fri, Dec 12, 2014 at 7:10 PM, Tom Lane  wrote:
> Claudio Freire  writes:
>> On Fri, Dec 12, 2014 at 6:48 PM, Robert Haas  wrote:
>>> I have very little idea what the API you're imagining would actually
>>> look like from this description, but it sounds like a terrible idea.
>>> We don't want to make this infinitely general.  We need a *fast* way
>>> to go from a value (or list of values, one per partitioning column) to
>>> a partition OID, and the way to get there is not to call arbitrary
>>> user code.
>
>> I think this was mentioned upthread, but I'll repeat it anyway since
>> it seems to need repeating.
>
>> More than fast, you want it analyzable (by the planner). Ie: it has to
>> be easy to prove partition exclusion against a where clause.
>
> Actually, I'm not sure that's what we want.  I thought what we really
> wanted here was to postpone partition-routing decisions to runtime,
> so that the behavior would be efficient whether or not the decision
> could be predetermined at plan time.
>
> This still leads to the same point Robert is making: the routing
> decisions have to be cheap and fast.  But it's wrong to think of it
> in terms of planner proofs.

You'll need proofs whether at the planner or at the execution engine.

A sequential scan over a partition with a query like

select * from foo where date between X and Y

would be ripe for that, but at some point you need to prove that the
where clause excludes whole partitions, be it at runtime (while
executing the sequential scan node) or at planning time.



Re: [HACKERS] On partitioning

2014-12-12 Thread Tom Lane

Claudio Freire  writes:
> On Fri, Dec 12, 2014 at 6:48 PM, Robert Haas  wrote:
>> I have very little idea what the API you're imagining would actually
>> look like from this description, but it sounds like a terrible idea.
>> We don't want to make this infinitely general.  We need a *fast* way
>> to go from a value (or list of values, one per partitioning column) to
>> a partition OID, and the way to get there is not to call arbitrary
>> user code.

> I think this was mentioned upthread, but I'll repeat it anyway since
> it seems to need repeating.

> More than fast, you want it analyzable (by the planner). Ie: it has to
> be easy to prove partition exclusion against a where clause.

Actually, I'm not sure that's what we want.  I thought what we really
wanted here was to postpone partition-routing decisions to runtime,
so that the behavior would be efficient whether or not the decision
could be predetermined at plan time.

This still leads to the same point Robert is making: the routing
decisions have to be cheap and fast.  But it's wrong to think of it
in terms of planner proofs.

regards, tom lane


Re: [HACKERS] On partitioning

On Fri, Dec 12, 2014 at 6:48 PM, Robert Haas  wrote:
> On Fri, Dec 12, 2014 at 4:28 PM, Jim Nasby  wrote:
>>> Sure.  Mind you, I'm not proposing that the syntax I just mooted is
>>> actually for the best.  What I'm saying is that we need to talk about
>>> it.
>>
>> Frankly, if we're going to require users to explicitly define each partition
>> then I think the most appropriate API would be a function. Users will be
>> writing code to create new partitions as needed, and it's generally easier
>> to write code that calls a function as opposed to glomming a text string
>> together and passing that to EXECUTE.
>
> I have very little idea what the API you're imagining would actually
> look like from this description, but it sounds like a terrible idea.
> We don't want to make this infinitely general.  We need a *fast* way
> to go from a value (or list of values, one per partitioning column) to
> a partition OID, and the way to get there is not to call arbitrary
> user code.

I think this was mentioned upthread, but I'll repeat it anyway since
it seems to need repeating.

More than fast, you want it analyzable (by the planner). Ie: it has to
be easy to prove partition exclusion against a where clause.



Re: [HACKERS] On partitioning

2014-12-12 Thread Robert Haas

On Fri, Dec 12, 2014 at 4:28 PM, Jim Nasby  wrote:
>> Sure.  Mind you, I'm not proposing that the syntax I just mooted is
>> actually for the best.  What I'm saying is that we need to talk about
>> it.
>
> Frankly, if we're going to require users to explicitly define each partition
> then I think the most appropriate API would be a function. Users will be
> writing code to create new partitions as needed, and it's generally easier
> to write code that calls a function as opposed to glomming a text string
> together and passing that to EXECUTE.

I have very little idea what the API you're imagining would actually
look like from this description, but it sounds like a terrible idea.
We don't want to make this infinitely general.  We need a *fast* way
to go from a value (or list of values, one per partitioning column) to
a partition OID, and the way to get there is not to call arbitrary
user code.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: [HACKERS] On partitioning

2014-12-12 Thread Jim Nasby


On 12/12/14, 8:03 AM, Robert Haas wrote:
> On Thu, Dec 11, 2014 at 11:43 PM, Amit Langote wrote:
>> In case of what we would have called a 'LIST' partition, this could look like
>>
>> ... FOR VALUES (val1, val2, val3, ...)
>>
>> Assuming we only support partition key to contain only one column in such a
>> case.
>>
>> In case of what we would have called a 'RANGE' partition, this could look like
>>
>> ... FOR VALUES (val1min, val2min, ...) TO (val1max, val2max, ...)
>>
>> How about BETWEEN ... AND ... ?
>
> Sure.  Mind you, I'm not proposing that the syntax I just mooted is
> actually for the best.  What I'm saying is that we need to talk about
> it.


Frankly, if we're going to require users to explicitly define each partition 
then I think the most appropriate API would be a function. Users will be 
writing code to create new partitions as needed, and it's generally easier to 
write code that calls a function as opposed to glomming a text string together 
and passing that to EXECUTE.
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com



Re: [HACKERS] On partitioning

2014-12-12 Thread Robert Haas

On Thu, Dec 11, 2014 at 11:43 PM, Amit Langote
 wrote:
> In case of what we would have called a 'LIST' partition, this could look like
>
> ... FOR VALUES (val1, val2, val3, ...)
>
> Assuming we only support partition key to contain only one column in such a 
> case.
>
> In case of what we would have called a 'RANGE' partition, this could look like
>
> ... FOR VALUES (val1min, val2min, ...) TO (val1max, val2max, ...)
>
> How about BETWEEN ... AND ... ?

Sure.  Mind you, I'm not proposing that the syntax I just mooted is
actually for the best.  What I'm saying is that we need to talk about
it.

> I am not sure but perhaps RANGE and LIST as partitioning kinds may as well 
> just be noise keywords. We can parse those values into a parse node such that 
> we don’t have to care about whether they describe partition as being one kind 
> or the other. Say a List of something like,
>
> typedef struct PartitionColumnValue
> {
>     NodeTag     type;
>     Oid        *partitionid;
>     char       *partcolname;
>     Node       *partrangelower;
>     Node       *partrangeupper;
>     List       *partlistvalues;
> } PartitionColumnValue;
>
> Or we could still add a (char) partkind just to say which of the fields 
> matter.
>
> We don't need any defining values here for hash partitions if and when we add 
> support for the same. We would either be using a system-wide common hash 
> function or we could add something with partitioning key definition.

Yeah, range and list partition definitions are very similar, but hash
partition definitions are a different kettle of fish.  I don't think
we really need hash partitioning for anything right away - it's pretty
useless unless you've got, say, a way for the partitions to be foreign
tables living on remote servers - but we shouldn't pick a design that
will make it really hard to add later.
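
A minimal Python sketch of why hash partition definitions differ, under the assumption of a single stable hash function shared by all partitions (zlib.crc32 is only a stand-in here, and hash_route is a hypothetical name). Routing needs the partition count and nothing else; there are no per-partition values to store.

```python
import zlib


def hash_route(key, n_partitions):
    """Pick a partition from the key's hash alone; no per-partition values."""
    # crc32 is only a stand-in for a stable, system-wide hash function.
    digest = zlib.crc32(repr(key).encode("utf-8"))
    return digest % n_partitions
```

The same property limits read-side exclusion: an equality qual can be routed like an insert, but a range qual cannot be compared against anything.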

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] On partitioning

2014-12-11 Thread Amit Kapila

On Thu, Dec 11, 2014 at 8:42 PM, Robert Haas  wrote:
>
> On Thu, Dec 11, 2014 at 12:00 AM, Amit Kapila wrote:
> > Yeah either this way or what Josh has suggested upthread, the main
> > point was that if at all we want to support multi-column list partitioning
> > then we need to have slightly different syntax, however I feel that we
> > can leave multi-column list partitioning for first version.
>
> Yeah, possibly.
>
> I think we could stand to have a lot more discussion about the syntax
> here.  So far the idea seems to be to copy what Oracle has, but it's
> not clear if we're going to have exactly what Oracle has or something
> subtly different.  I personally don't find the Oracle syntax very
> PostgreSQL-ish.

I share your concern w.r.t. the difficulties it can create if we don't
do it carefully (one of the issues you mentioned upthread was about
pg_dump; other such things could cause problems if not thought
through carefully from the beginning).  One more thing: on a quick check
it seems to me that even DB2 uses something similar to Oracle for
defining partitions:

CREATE TABLE orders(id INT, shipdate DATE, …)
PARTITION BY RANGE(shipdate)
( PARTITION q4_05 STARTING MINVALUE,
  PARTITION q1_06 STARTING '1/1/2006',
  PARTITION q2_06 STARTING '4/1/2006',
  PARTITION q3_06 STARTING '7/1/2006',
  PARTITION q4_06 STARTING '10/1/2006'
  ENDING '12/31/2006' )

I don't think there is any pressing need for PostgreSQL to use
syntax similar to what some of the other databases use, however
it has an advantage for ease of migration and ease of use (as
people are already familiar with using such syntax).

> Stuff like "VALUES LESS THAN 500" doesn't sit
> especially well with me - less than according to which opclass?  Are
> we going to insist that partitioning must use the default btree
> opclass so that we can use that syntax?  That seems kind of lame.
>

Can't we simply specify the opclass along with the column name in the
partition clause?  I feel that is similar to what we already do in the
CREATE INDEX syntax.

CREATE TABLE sales
 ( invoice_no NUMBER,
   sale_year  INT NOT NULL,
   sale_month INT NOT NULL,
   sale_day   INT NOT NULL )
   PARTITION BY RANGE ( sale_year )
 ( PARTITION sales_q1 VALUES LESS THAN (1999) );

Wouldn't the default operator class for a partition column fit the
bill for this particular case, as the operators required in this syntax
will be quite simple?

> There are lots of interesting things we could do here, e.g.:
>
> CREATE TABLE parent_name PARTITION ON (column [ USING opclass ] [, ... ]);
> CREATE TABLE child_name PARTITION OF parent_name
>    FOR { (value, ...) [ TO (value, ...) ] } [, ...];
>

The only thing which slightly bothers me about this syntax is that
it makes apparent that partitions are separate tables, and it would
be inconvenient if we choose to disallow some operations on
partitions.  I think it might be better to treat partitions as a way
to divide a large amount of data, with users only given the option
to specify the boundaries that divide this data; the storage mechanism
of partitions should be an internal detail (something like what we do
in the TOAST table case).  I am not sure which syntax users will be more
comfortable with; I have been seeing and using Oracle-type syntax for a
long time, so my opinion could be biased in this case.  It would be really
helpful if others who need or use partitioning can share their
inputs.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Re: [HACKERS] On partitioning

2014-12-11 Thread Amit Langote

> -----Original Message-----
> From: Robert Haas [mailto:robertmh...@gmail.com]
> On Thu, Dec 11, 2014 at 12:00 AM, Amit Kapila 
> wrote:
> > Yeah either this way or what Josh has suggested upthread, the main
> > point was that if at all we want to support multi-column list partitioning
> > then we need to have slightly different syntax, however I feel that we
> > can leave multi-column list partitioning for first version.
> 
> Yeah, possibly.
> 
> I think we could stand to have a lot more discussion about the syntax
> here.  So far the idea seems to be to copy what Oracle has, but it's
> not clear if we're going to have exactly what Oracle has or something
> subtly different.  I personally don't find the Oracle syntax very
> PostgreSQL-ish.  Stuff like "VALUES LESS THAN 500" doesn't sit
> especially well with me - less than according to which opclass?  Are
> we going to insist that partitioning must use the default btree
> opclass so that we can use that syntax?  That seems kind of lame.
> 

Syntax like VALUES LESS THAN 500 also means that we have to work out
each partition's lower bound from the upper bound of the previous one,
and it leaves no way to express holes in the range, if they matter. I
expressed that concern elsewhere in favour of having both a range's lower
and upper bounds available.
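
A minimal Python sketch of the derivation described above, assuming a hypothetical mapping from partition name to its VALUES LESS THAN bound. Each lower bound has to be inferred from the previous partition's upper bound, so the ranges are always contiguous.

```python
def ranges_from_upper_bounds(upper_bounds):
    """Derive (name, lower, upper) triples from VALUES LESS THAN bounds.

    The first partition has no lower bound (None), and each later one
    starts where its predecessor ends, so a hole cannot be expressed.
    """
    ranges = []
    lower = None
    for name, upper in sorted(upper_bounds.items(), key=lambda kv: kv[1]):
        ranges.append((name, lower, upper))
        lower = upper
    return ranges
```

With explicit lower and upper bounds per partition, no such pass over the sibling partitions is needed, and a gap between two ranges is simply a gap.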

> There are lots of interesting things we could do here, e.g.:
> 
> CREATE TABLE parent_name PARTITION ON (column [ USING opclass ] [, ... ]);

So, no PARTITION BY [RANGE | LIST] clause huh?

What we are calling pg_partitioned_rel would obtain following bits of 
information from such a definition of a partitioned relation:

 * column(s) to partition on and respective opclass(es)
 * the level this partitioned relation lies in the partitioning hierarchy
 (determining its relkind and storage qualification)

By the way, I am not sure how we define a partitioning key on a partition (in 
other words, a subpartitioning key on the corresponding partitioned relation). 
Perhaps (only) via ALTER TABLE on a partition relation?

> CREATE TABLE child_name PARTITION OF parent_name
>    FOR { (value, ...) [ TO (value, ...) ] } [, ...];
> 

So it's still a CREATE "TABLE" but the part 'PARTITION OF' turns this "table" 
into something having characteristics of a partition relation getting all kinds 
of new treatments at various places. It appears there is a redistribution of 
table-characteristics between a partitioned relation and its partition. We take 
away storage from the former and instead give it to the latter. On the other 
hand, the latter's data is only accessible through the former perhaps with 
escape routes for direct access via some special syntax attached to various 
access commands. We also stand to lose certain abilities with a partitioned
relation, such as being able to define a unique constraint (other than what
the partition key could potentially help ensure) or to use it as the target
of a foreign key constraint (just reiterating).

What we call pg_partition_def obtains following bits of information from such a 
definition of a partition relation:

 * parent relation (partitioned relation this is partition of)
 * partition kind (do we even want to keep carrying this 
 around as a separate field in catalog?)
 * values this partition holds

The last part being the most important.

In case of what we would have called a 'LIST' partition, this could look like

... FOR VALUES (val1, val2, val3, ...)

Assuming we only support partition key to contain only one column in such a 
case.

In case of what we would have called a 'RANGE' partition, this could look like

... FOR VALUES (val1min, val2min, ...) TO (val1max, val2max, ...)

How about BETWEEN ... AND ... ?

Here we allow a partition key to contain more than one column.

> So instead of making a hard distinction between range and list
> partitioning, you can say:
> 
> CREATE TABLE child_name PARTITION OF parent_name FOR (3), (5), (7);
> CREATE TABLE child2_name PARTITION OF parent_name FOR (8) TO (12);
> CREATE TABLE child2_name PARTITION OF parent_name FOR (20) TO (30),
> (120) TO (130);
> 

I would include the noise keyword VALUES just for readability if anything.

> Now that might be a crappy idea for various reasons, but the point is
> there are a lot of details to be hammered out with the syntax, and
> there are several ways we can go wrong.  If we choose an
> overly-limiting syntax, we're needlessly restricting what can be done.
> If we choose an overly-permissive syntax, we'll restrict the
> optimization opportunities.
> 

I am not sure but perhaps RANGE and LIST as partitioning kinds may as well just 
be noise keywords. We can parse those values into a parse node such that we 
don’t have to care about whether they describe partition as being one kind or 
the other. Say a List of something like,

typedef struct PartitionColumnValue
{
    NodeTag     type;
    Oid        *partitionid;
    char       *partcolname;
    Node       *partrangelower;
    Node       *partrangeup

Re: [HACKERS] On partitioning

2014-12-11 Thread Robert Haas

On Thu, Dec 11, 2014 at 12:00 AM, Amit Kapila  wrote:
> Yeah either this way or what Josh has suggested upthread, the main
> point was that if at all we want to support multi-column list partitioning
> then we need to have slightly different syntax, however I feel that we
> can leave multi-column list partitioning for first version.

Yeah, possibly.

I think we could stand to have a lot more discussion about the syntax
here.  So far the idea seems to be to copy what Oracle has, but it's
not clear if we're going to have exactly what Oracle has or something
subtly different.  I personally don't find the Oracle syntax very
PostgreSQL-ish.  Stuff like "VALUES LESS THAN 500" doesn't sit
especially well with me - less than according to which opclass?  Are
we going to insist that partitioning must use the default btree
opclass so that we can use that syntax?  That seems kind of lame.

There are lots of interesting things we could do here, e.g.:

CREATE TABLE parent_name PARTITION ON (column [ USING opclass ] [, ... ]);
CREATE TABLE child_name PARTITION OF parent_name
   FOR { (value, ...) [ TO (value, ...) ] } [, ...];

So instead of making a hard distinction between range and list
partitioning, you can say:

CREATE TABLE child_name PARTITION OF parent_name FOR (3), (5), (7);
CREATE TABLE child2_name PARTITION OF parent_name FOR (8) TO (12);
CREATE TABLE child2_name PARTITION OF parent_name FOR (20) TO (30),
(120) TO (130);

Now that might be a crappy idea for various reasons, but the point is
there are a lot of details to be hammered out with the syntax, and
there are several ways we can go wrong.  If we choose an
overly-limiting syntax, we're needlessly restricting what can be done.
If we choose an overly-permissive syntax, we'll restrict the
optimization opportunities.
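
A minimal Python sketch of a lookup over such mixed definitions. Several things here are assumptions rather than part of the proposal: the DEFINITIONS table, the name child3_name for the third partition, and treating the upper end of a range as exclusive, which the syntax above does not specify.

```python
# Each partition owns a list of items. An item is either a single value,
# written (value, None) for `FOR (3)`, or a range (lo, hi) for `FOR (8) TO (12)`.
# Assumption: the upper end of a range is exclusive.
DEFINITIONS = {
    "child_name": [(3, None), (5, None), (7, None)],
    "child2_name": [(8, 12)],
    "child3_name": [(20, 30), (120, 130)],  # hypothetical name
}


def matches(item, key):
    """True when key equals a single value or falls inside a range."""
    lo, hi = item
    return key == lo if hi is None else lo <= key < hi


def find_partition(key):
    """Return the name of the partition accepting key, or None."""
    for name, items in DEFINITIONS.items():
        if any(matches(item, key) for item in items):
            return name
    return None
```

The linear scan is only for clarity; a real implementation would presumably need a sorted structure to meet the "fast" requirement raised elsewhere in the thread.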

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: [HACKERS] On partitioning

2014-12-10 Thread Amit Kapila

On Wed, Dec 10, 2014 at 11:51 PM, Robert Haas  wrote:
>
> On Mon, Dec 8, 2014 at 10:59 PM, Amit Kapila wrote:
> > Yeah and also how would user specify the values, as an example
> > assume that table is partitioned on monthly_salary, so partition
> > definition would look:
> >
> > PARTITION BY LIST(monthly_salary)
> > (
> > PARTITION salary_less_than_thousand VALUES(300, 900),
> > PARTITION salary_less_than_two_thousand VALUES (500,1000,1500),
> > ...
> > )
> >
> > Now if user wants to define multi-column Partition based on
> > monthly_salary and annual_salary, how do we want him to
> > specify the values.  Basically how to distinguish which values
> > belong to first column key and which one's belong to second
> > column key.
>
> I assume you just add some parentheses.
>
> PARTITION BY LIST (colA, colB) (PARTITION VALUES ((valA1, valB1),
> (valA2, valB2), (valA3, valB3)))
>
> Multi-column list partitioning may or may not be worth implementing,
> but the syntax is not a real problem.
>

Yeah either this way or what Josh has suggested upthread, the main
point was that if at all we want to support multi-column list partitioning
then we need to have slightly different syntax, however I feel that we
can leave multi-column list partitioning for first version.


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Re: [HACKERS] On partitioning

2014-12-10 Thread Amit Kapila

On Wed, Dec 10, 2014 at 7:52 PM, Alvaro Herrera wrote:
>
> Amit Langote wrote:
>
> > On Wed, Dec 10, 2014 at 12:46 PM, Amit Kapila wrote:
> > > On Tue, Dec 9, 2014 at 7:21 PM, Alvaro Herrera <alvhe...@2ndquadrant.com>
> > > wrote:
>
> > >> FWIW in my original proposal I was rejecting some things that after
> > >> further consideration turn out to be possible to allow; for instance
> > >> directly referencing individual partitions in COPY.  We could allow
> > >> something like
> > >>
> > >> COPY lineitems PARTITION FOR VALUE '2000-01-01' TO STDOUT
> > >> or maybe
> > >> COPY PARTITION FOR VALUE '2000-01-01' ON TABLE lineitems TO STDOUT
> > >>
> > > or
> > > COPY [TABLE] lineitems PARTITION FOR VALUE '2000-01-01'  TO STDOUT
> > > COPY [TABLE] lineitems PARTITION   TO STDOUT
> > >
> > > I think we should try to support operations on partitions via main
> > > table wherever it is required.
>
> Um, I think the only difference is that you added the noise word TABLE
> which we currently don't allow in COPY,

Yeah, we could eliminate the TABLE keyword from this syntax.  The reason
I kept it was to make the syntax easier to understand: currently we have
no concept of PARTITION in the COPY syntax, so if we want to introduce
such a concept, it might be better to have the TABLE keyword for the
sake of syntax clarity.

> and that you added the
> possibility of using named partitions, about which see below.
>
> > We can also allow to explicitly name a partition
> >
> > COPY [TABLE ] lineitems PARTITION lineitems_2001 TO STDOUT;
>
> The problem with naming partitions is that the user has to pick names
> for every partition, which is tedious and doesn't provide any
> significant benefit.  The input I had from users of other partitioning
> systems was that they very much preferred not to name the partitions at
> all,

It seems to me that both Oracle and DB2 support named partitions, so even
though there are users who don't prefer named partitions, I suspect an
equal or greater number of users will prefer them, for the sake of
migration and because they are already used to such a syntax.


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Re: [HACKERS] On partitioning

On Wed, Dec 10, 2014 at 7:25 PM, Amit Langote
 wrote:
> In heap_create(), do we create storage for a top level partitioned table 
> (say, RELKIND_PARTITIONED_TABLE)? How about a partition that is further 
> sub-partitioned? We might allocate storage for a partition at some point and 
> then later choose to sub-partition it. In such a case, perhaps, we would have 
> to move existing data to the storage of subpartitions and deallocate the 
> partition's storage. In other words only leaf relations in a partition 
> hierarchy would have storage. Is there such a notion within code for some 
> other purpose or we'd have to invent it for partitioning scheme?

I think it would be advantageous to have storage only for the leaf
partitions, because then you don't need to waste time doing a
zero-block sequential scan of the root as part of the append-plan, an
annoyance of the current system.

We have no concept for this right now; in fact, right now, the relkind
fully determines whether a given relation has storage.  One idea is to
make the leaves relkind = 'r' and the interior nodes some new relkind.
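
To make the leaf-only-storage idea concrete, here is a minimal sketch in
Python, purely for illustration.  The Rel class, has_storage(),
leaves_to_scan() and subpartition() are hypothetical names, not PostgreSQL
code, and the "rows" list merely stands in for on-disk storage.

```python
from dataclasses import dataclass, field


@dataclass
class Rel:
    """A relation in a partition hierarchy (hypothetical model)."""
    name: str
    children: list = field(default_factory=list)
    rows: list = field(default_factory=list)  # stands in for on-disk storage

    def has_storage(self):
        # Only leaves own storage; interior nodes are pure routing nodes.
        return not self.children


def leaves_to_scan(rel):
    """Collect the relations a scan must visit: leaves only, never the root."""
    if rel.has_storage():
        return [rel.name]
    found = []
    for child in rel.children:
        found.extend(leaves_to_scan(child))
    return found


def subpartition(rel, new_children, route):
    """Turn a leaf into an interior node, moving its rows into the new leaves.

    route is a caller-supplied function mapping a row to one of new_children.
    """
    rel.children = list(new_children)
    for row in rel.rows:
        route(row).rows.append(row)
    rel.rows = []  # the former leaf gives up its storage
```

With this shape the append-plan never contains the zero-block scan of the
root, and sub-partitioning a populated leaf is exactly the "move the data
and deallocate" step described in the question.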

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] On partitioning

2014-12-10 Thread Amit Langote

> From: Robert Haas [mailto:robertmh...@gmail.com]
> On Mon, Dec 8, 2014 at 2:56 PM, Andres Freund
>  wrote:
> >> I don't think that's mutually exclusive with the idea of
> >> partitions-as-tables.  I mean, you can add code to the ALTER TABLE
> >> path that says if (i_am_not_the_partitioning_root) ereport(ERROR, ...)
> >> wherever you want.
> >
> > That'll be a lot of places you'll need to touch. More fundamentally: Why
> > should we name something a table that's not one?
> 
> Well, I'm not convinced that it isn't one.  And adding a new relkind
> will involve a bunch of code churn, too.  But I don't much care to
> pre-litigate this: when someone has got a patch, we can either agree
> that the approach is OK or argue that it is problematic because X.  I
> think we need to hammer down the design in broad strokes first, and
> I'm not sure we're totally there yet.
> 

In heap_create(), do we create storage for a top level partitioned table (say, 
RELKIND_PARTITIONED_TABLE)? How about a partition that is further 
sub-partitioned? We might allocate storage for a partition at some point and 
then later choose to sub-partition it. In such a case, perhaps, we would have 
to move existing data to the storage of the subpartitions and deallocate the 
partition's storage. In other words, only leaf relations in a partition 
hierarchy would have storage. Does such a notion already exist in the code for 
some other purpose, or would we have to invent it for the partitioning scheme?

Thanks,
Amit


Re: [HACKERS] On partitioning

On Mon, Dec 8, 2014 at 10:59 PM, Amit Kapila  wrote:
> Yeah and also how would user specify the values, as an example
> assume that table is partitioned on monthly_salary, so partition
> definition would look:
>
> PARTITION BY LIST(monthly_salary)
> (
> PARTITION salary_less_than_thousand VALUES(300, 900),
> PARTITION salary_less_than_two_thousand VALUES (500,1000,1500),
> ...
> )
>
> Now if user wants to define multi-column Partition based on
> monthly_salary and annual_salary, how do we want him to
> specify the values.  Basically how to distinguish which values
> belong to first column key and which one's belong to second
> column key.

I assume you just add some parentheses.

PARTITION BY LIST (colA, colB) (PARTITION VALUES ((valA1, valB1),
(valA2, valB2), (valA3, valB3)))

Multi-column list partitioning may or may not be worth implementing,
but the syntax is not a real problem.
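
As a rough illustration of what the parenthesized form would mean for
tuple routing, here is a minimal Python sketch.  It assumes each partition
lists exact (colA, colB) tuples; build_list_router() and route() are
hypothetical helpers, not anything proposed in this thread.

```python
def build_list_router(partitions):
    """Map every listed key tuple to its partition name.

    partitions: dict of partition name -> list of key tuples, one tuple
    element per partition-key column, in key order.
    """
    router = {}
    for name, keys in partitions.items():
        for key in keys:
            if key in router:
                raise ValueError(
                    f"{key} is listed in both {router[key]} and {name}")
            router[key] = name
    return router


def route(router, row, key_columns):
    """Return the partition accepting this row, or None if none does."""
    key = tuple(row[col] for col in key_columns)
    return router.get(key)
```

Because each value list is a list of whole tuples, there is no ambiguity
about which value belongs to which key column; routing is a single lookup
on the tuple of key-column values.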

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] On partitioning

On Mon, Dec 8, 2014 at 5:05 PM, Jim Nasby  wrote:
> Agreed, but it's possible to keep a block/CTID interface while doing
> something different on the disk.

Objection: hand-waving.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] On partitioning

On Wed, Dec 10, 2014 at 9:22 AM, Alvaro Herrera
 wrote:
> The problem with naming partitions is that the user has to pick names
> for every partition, which is tedious and doesn't provide any
> significant benefit.  The input I had from users of other partitioning
> systems was that they very much preferred not to name the partitions at
> all, which is why I chose the PARTITION FOR VALUE syntax (not sure if
> this syntax is exactly what other systems use; it just seemed the
> natural choice.)

FWIW, Oracle does name partitions.  It generates the names
automatically if you don't care to specify them, and the partition
names for a given table live in their own namespace that is separate
from the toplevel object namespace.  For example:

CREATE TABLE sales
 ( invoice_no NUMBER,
   sale_year  INT NOT NULL,
   sale_month INT NOT NULL,
   sale_day   INT NOT NULL )
   STORAGE (INITIAL 100K NEXT 50K) LOGGING
   PARTITION BY RANGE ( sale_year, sale_month, sale_day)
 ( PARTITION sales_q1 VALUES LESS THAN ( 1999, 04, 01 )
     TABLESPACE tsa STORAGE (INITIAL 20K, NEXT 10K),
   PARTITION sales_q2 VALUES LESS THAN ( 1999, 07, 01 )
     TABLESPACE tsb,
   PARTITION sales_q3 VALUES LESS THAN ( 1999, 10, 01 )
     TABLESPACE tsc,
   PARTITION sales_q4 VALUES LESS THAN ( 2000, 01, 01 )
     TABLESPACE tsd)
   ENABLE ROW MOVEMENT;

I don't think this practice has much to recommend it.  We're going to
need a way to refer to individual partitions by name, and I don't see
much benefit in making that name something other than what is stored
in pg_class.relname.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] On partitioning

2014-12-10 Thread Alvaro Herrera

Amit Langote wrote:

> On Wed, Dec 10, 2014 at 12:46 PM, Amit Kapila  wrote:
> > On Tue, Dec 9, 2014 at 7:21 PM, Alvaro Herrera 
> > wrote:

> >> FWIW in my original proposal I was rejecting some things that after
> >> further consideration turn out to be possible to allow; for instance
> >> directly referencing individual partitions in COPY.  We could allow
> >> something like
> >>
> >> COPY lineitems PARTITION FOR VALUE '2000-01-01' TO STDOUT
> >> or maybe
> >> COPY PARTITION FOR VALUE '2000-01-01' ON TABLE lineitems TO STDOUT
> >>
> > or
> > COPY [TABLE] lineitems PARTITION FOR VALUE '2000-01-01'  TO STDOUT
> > COPY [TABLE] lineitems PARTITION   TO STDOUT
> >
> > I think we should try to support operations on partitions via main
> > table wherever it is required.

Um, I think the only difference is that you added the noise word TABLE
which we currently don't allow in COPY, and that you added the
possibility of using named partitions, about which see below.

> We can also allow to explicitly name a partition
> 
> COPY [TABLE ] lineitems PARTITION lineitems_2001 TO STDOUT;

The problem with naming partitions is that the user has to pick names
for every partition, which is tedious and doesn't provide any
significant benefit.  The input I had from users of other partitioning
systems was that they very much preferred not to name the partitions at
all, which is why I chose the PARTITION FOR VALUE syntax (not sure if
this syntax is exactly what other systems use; it just seemed the
natural choice.)

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [HACKERS] On partitioning

2014-12-09 Thread Amit Langote



On Wed, Dec 10, 2014 at 12:46 PM, Amit Kapila  wrote:
> On Tue, Dec 9, 2014 at 7:21 PM, Alvaro Herrera 
> wrote:
>>
>> Amit Kapila wrote:
>> > On Tue, Dec 9, 2014 at 1:42 AM, Robert Haas 
>> > wrote:
>> > > On Mon, Dec 8, 2014 at 2:56 PM, Andres Freund 
>> > wrote:
>> > > >> I don't think that's mutually exclusive with the idea of
>> > > >> partitions-as-tables.  I mean, you can add code to the ALTER TABLE
>> > > >> path that says if (i_am_not_the_partitioning_root) ereport(ERROR,
>> > > >> ...)
>> > > >> wherever you want.
>> > > >
>> > > > That'll be a lot of places you'll need to touch. More fundamentally:
>> > > > Why
>> > > > should we name something a table that's not one?
>> > >
>> > > Well, I'm not convinced that it isn't one.  And adding a new relkind
>> > > will involve a bunch of code churn, too.  But I don't much care to
>> > > pre-litigate this: when someone has got a patch, we can either agree
>> > > that the approach is OK or argue that it is problematic because X.  I
>> > > think we need to hammer down the design in broad strokes first, and
>> > > I'm not sure we're totally there yet.
>> >
>> > That's right, I think at this point defining the top level
>> > behaviour/design
>> > is very important to proceed, we can decide about the better
>> > implementation approach afterwards (may be once initial patch is ready,
>> > because it might not be a major work to do it either way).  So here's
>> > where
>> > we are on this point till now as per my understanding, I think that
>> > direct
>> > operations should be prohibited on partitions, you think that they
>> > should be
>> > allowed and Andres think that it might be better to allow direct
>> > operations
>> > on partitions for Read.
>>
>> FWIW in my original proposal I was rejecting some things that after
>> further consideration turn out to be possible to allow; for instance
>> directly referencing individual partitions in COPY.  We could allow
>> something like
>>
>> COPY lineitems PARTITION FOR VALUE '2000-01-01' TO STDOUT
>> or maybe
>> COPY PARTITION FOR VALUE '2000-01-01' ON TABLE lineitems TO STDOUT
>>
> or
> COPY [TABLE] lineitems PARTITION FOR VALUE '2000-01-01'  TO STDOUT
> COPY [TABLE] lineitems PARTITION   TO STDOUT
>
> I think we should try to support operations on partitions via main
> table wherever it is required.
>

We could also allow naming a partition explicitly:

COPY [TABLE ] lineitems PARTITION lineitems_2001 TO STDOUT;

Thanks,
Amit





Re: [HACKERS] On partitioning

2014-12-09 Thread Amit Langote



On Wed, Dec 10, 2014 at 12:33 PM, Amit Kapila  wrote:
> On Tue, Dec 9, 2014 at 11:44 PM, Josh Berkus  wrote:
>> On 12/09/2014 12:17 AM, Amit Langote wrote:
>> >> Now if user wants to define multi-column Partition based on
>> >> > monthly_salary and annual_salary, how do we want him to
>> >> > specify the values.  Basically how to distinguish which values
>> >> > belong to first column key and which one's belong to second
>> >> > column key.
>> >> >
>> > Perhaps you are talking about "syntactic" difficulties that I totally
>> > missed in my other reply to this mail?
>> >
>> > Can we represent the same data by rather using a subpartitioning scheme?
>> > ISTM, semantics would remain the same.
>> >
>> > ... PARTITION BY (monthly_salary) SUBPARTITION BY (annual_salary)?
>>
>
> Using SUBPARTITION is not the answer for multi-column partition,
> I think if we have to support it for List partitioning then something
> on lines what Josh has mentioned below could workout, but I don't
> think it is important to support multi-column partition for List at this
> stage.  
>

Yeah, I realize multicolumn list partitioning and list-list composite 
partitioning are different things in many respects. And given how awkward 
multicolumn list partitioning looks to implement, I also think we should 
allow only a single column in a list partition key.

>> ... or just use arrays.
>>
>> PARTITION BY LIST ( monthly_salary, annual_salary )
>> PARTITION salary_small VALUES ({[300,400],[5000,6000]})
>> ) 
>>
>> ... but that begs the question of how partition by list over two columns
>> (or more) would even work?  You'd need an a*b number of partitions, and
>> the user would be pretty much certain to miss a few value combinations.
>>  Maybe we should just restrict list partitioning to a single column for
>> a first release, and wait and see if people ask for more?
>>
>
> I also think we should not support multi-column list partition in first
> release.
>

Yes.

Thanks,
Amit





Re: [HACKERS] On partitioning

2014-12-09 Thread Amit Kapila

On Tue, Dec 9, 2014 at 7:21 PM, Alvaro Herrera 
wrote:
>
> Amit Kapila wrote:
> > On Tue, Dec 9, 2014 at 1:42 AM, Robert Haas wrote:
> > > On Mon, Dec 8, 2014 at 2:56 PM, Andres Freund wrote:
> > > >> I don't think that's mutually exclusive with the idea of
> > > >> partitions-as-tables.  I mean, you can add code to the ALTER TABLE
> > > >> path that says if (i_am_not_the_partitioning_root) ereport(ERROR, ...)
> > > >> wherever you want.
> > > >
> > > > That'll be a lot of places you'll need to touch. More fundamentally: Why
> > > > should we name something a table that's not one?
> > >
> > > Well, I'm not convinced that it isn't one.  And adding a new relkind
> > > will involve a bunch of code churn, too.  But I don't much care to
> > > pre-litigate this: when someone has got a patch, we can either agree
> > > that the approach is OK or argue that it is problematic because X.  I
> > > think we need to hammer down the design in broad strokes first, and
> > > I'm not sure we're totally there yet.
> >
> > That's right, I think at this point defining the top level behaviour/design
> > is very important to proceed, we can decide about the better
> > implementation approach afterwards (may be once initial patch is ready,
> > because it might not be a major work to do it either way).  So here's where
> > we are on this point till now as per my understanding, I think that direct
> > operations should be prohibited on partitions, you think that they should be
> > allowed and Andres think that it might be better to allow direct operations
> > on partitions for Read.
>
> FWIW in my original proposal I was rejecting some things that after
> further consideration turn out to be possible to allow; for instance
> directly referencing individual partitions in COPY.  We could allow
> something like
>
> COPY lineitems PARTITION FOR VALUE '2000-01-01' TO STDOUT
> or maybe
> COPY PARTITION FOR VALUE '2000-01-01' ON TABLE lineitems TO STDOUT
>
or
COPY [TABLE] lineitems PARTITION FOR VALUE '2000-01-01'  TO STDOUT
COPY [TABLE] lineitems PARTITION   TO STDOUT

I think we should try to support operations on partitions via the main
table wherever it is required.


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Re: [HACKERS] On partitioning

2014-12-09 Thread Amit Kapila

On Tue, Dec 9, 2014 at 11:44 PM, Josh Berkus  wrote:
> On 12/09/2014 12:17 AM, Amit Langote wrote:
> >> Now if user wants to define multi-column Partition based on
> >> > monthly_salary and annual_salary, how do we want him to
> >> > specify the values.  Basically how to distinguish which values
> >> > belong to first column key and which one's belong to second
> >> > column key.
> >> >
> > Perhaps you are talking about "syntactic" difficulties that I totally
> > missed in my other reply to this mail?
> >
> > Can we represent the same data by rather using a subpartitioning
> > scheme? ISTM, semantics would remain the same.
> >
> > ... PARTITION BY (monthly_salary) SUBPARTITION BY (annual_salary)?
>

Using SUBPARTITION is not the answer for multi-column partitioning.
If we do have to support it for List partitioning, then something
along the lines of what Josh mentions below could work out, but I don't
think it is important to support multi-column partitioning for List at
this stage.

> ... or just use arrays.
>
> PARTITION BY LIST ( monthly_salary, annual_salary )
> PARTITION salary_small VALUES ({[300,400],[5000,6000]})
> ) 
>
> ... but that begs the question of how partition by list over two columns
> (or more) would even work?  You'd need an a*b number of partitions, and
> the user would be pretty much certain to miss a few value combinations.
>  Maybe we should just restrict list partitioning to a single column for
> a first release, and wait and see if people ask for more?
>

I also think we should not support multi-column list partitioning in the
first release.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Re: [HACKERS] On partitioning

2014-12-09 Thread Jim Nasby


On 12/8/14, 5:19 PM, Josh Berkus wrote:
> On 12/08/2014 02:12 PM, Jim Nasby wrote:
>> On 12/8/14, 12:26 PM, Josh Berkus wrote:
>>> 4. Creation Locking Problem
>>> high probability of lock pile-ups whenever a new partition is created on
>>> demand due to multiple backends trying to create the partition at the
>>> same time.
>>> Not Addressed?
>>
>> Do users actually try and create new partitions during DML? That sounds
>> doomed to failure in pretty much any system...
>
> There is no question that it would be easier for users to create
> partitions on demand automatically.  Particularly if you're partitioning
> by something other than time.  For a particular case, consider users on
> RDS, which has no cron jobs for creating new partitions; it's on demand
> or manually.
>
> It's quite possible that there is no good way to work out the locking
> for on-demand partitions though, but *if* we're going to have a 2nd
> partition system, I think it's important to at least discuss the
> problems with on-demand creation.

Yeah, we should discuss it. Perhaps the right answer here may be our own job 
scheduler, something a lot of folks want anyway.


>>> 11. Hash Partitioning
>>> Some users would prefer to partition into a fixed number of
>>> hash-allocated partitions.
>>> Not Addressed.
>>
>> Though, you should be able to do that in either system if you bother to
>> define your own hash in a BEFORE trigger...
>
> That doesn't do you any good with the SELECT query, unless you change
> your middleware to add a hash(column) to every query.  Which would be
> really hard to do for joins.
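
The asymmetry between routing and pruning can be sketched as follows, in
Python purely for illustration.  The premise is an assumption: if the
system itself knew the hash function (rather than it being hidden in a
BEFORE trigger), an equality qual could be mapped to one partition without
any middleware changes.  hash_partition() and partitions_to_scan() are
hypothetical names, and crc32 is only a stand-in for a real hash.

```python
import zlib

NPARTS = 4  # fixed number of hash-allocated partitions (arbitrary choice)


def hash_partition(value, nparts=NPARTS):
    """Ingress: pick the partition for a key value.

    crc32 is used only because it is deterministic across runs.
    """
    return zlib.crc32(str(value).encode()) % nparts


def partitions_to_scan(qual, nparts=NPARTS):
    """Egress: decide which partitions a simple qual can touch.

    qual is a pair (operator, argument), e.g. ("=", 42) or ("in", [1, 2]).
    """
    op, arg = qual
    if op == "=":
        return {hash_partition(arg, nparts)}
    if op == "in":
        return {hash_partition(v, nparts) for v in arg}
    # Hashing destroys ordering, so range quals cannot exclude anything.
    return set(range(nparts))
```

With a trigger-only scheme the second function cannot exist inside the
system, because nothing records which hash the trigger applies; that is
the gap the quoted objection points at.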


>>> A. COPY/ETL then attach
>>> In inheritance partitioning, you can easily build a partition outside
>>> the master and then "attach" it, allowing for minimal disturbance of
>>> concurrent users.  Could be addressed in the future.
>>
>> How much of the desire for this is because our current "row routing"
>> solutions are very slow? I suspect that's the biggest reason, and
>> hopefully Alvaro's proposal mostly eliminates it.
>
> That doesn't always work, though.  In some cases the partition is being
> built using some fairly complex logic (think of partitions which are
> based on matviews) and there's no fast way to create the new data.
> Again, this is an acceptable casualty of an improved design, but if it
> will be so, we should consciously decide that.

Is there an example you can give here? If the scheme is that complicated I'm 
failing to see how you're supposed to do things like partition elimination.
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com



Re: [HACKERS] On partitioning

2014-12-09 Thread Josh Berkus

On 12/09/2014 12:17 AM, Amit Langote wrote:
>> Now if user wants to define multi-column Partition based on
>> > monthly_salary and annual_salary, how do we want him to
>> > specify the values.  Basically how to distinguish which values
>> > belong to first column key and which one's belong to second
>> > column key.
>> >
> Perhaps you are talking about "syntactic" difficulties that I totally missed 
> in my other reply to this mail?
> 
> Can we represent the same data by rather using a subpartitioning scheme? 
> ISTM, semantics would remain the same.
> 
> ... PARTITION BY (monthly_salary) SUBPARTITION BY (annual_salary)?

... or just use arrays.

PARTITION BY LIST ( monthly_salary, annual_salary )
(
PARTITION salary_small VALUES ({[300,400],[5000,6000]})
)

... but that begs the question of how partition by list over two columns
(or more) would even work?  You'd need an a*b number of partitions, and
the user would be pretty much certain to miss a few value combinations.
 Maybe we should just restrict list partitioning to a single column for
a first release, and wait and see if people ask for more?
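
The a*b concern is easy to demonstrate with a small Python sketch.  It
assumes each partition enumerates explicit (monthly_salary, annual_salary)
pairs, which is only one possible reading of the array syntax above;
missing_combinations() is a hypothetical helper.

```python
from itertools import product


def missing_combinations(domain_a, domain_b, partitions):
    """List the (a, b) pairs that no partition accepts.

    partitions: dict of partition name -> list of (a, b) tuples.
    """
    covered = set()
    for keys in partitions.values():
        covered.update(keys)
    return sorted(set(product(domain_a, domain_b)) - covered)
```

With two values per column there are already four combinations to cover,
and a partition listing only (300, 5000) and (400, 6000) leaves the other
two with nowhere to go.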

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com



Re: [HACKERS] On partitioning

2014-12-09 Thread Alvaro Herrera

Amit Kapila wrote:
> On Tue, Dec 9, 2014 at 1:42 AM, Robert Haas  wrote:
> > On Mon, Dec 8, 2014 at 2:56 PM, Andres Freund 
> wrote:
> > >> I don't think that's mutually exclusive with the idea of
> > >> partitions-as-tables.  I mean, you can add code to the ALTER TABLE
> > >> path that says if (i_am_not_the_partitioning_root) ereport(ERROR, ...)
> > >> wherever you want.
> > >
> > > That'll be a lot of places you'll need to touch. More fundamentally: Why
> > > should we name something a table that's not one?
> >
> > Well, I'm not convinced that it isn't one.  And adding a new relkind
> > will involve a bunch of code churn, too.  But I don't much care to
> > pre-litigate this: when someone has got a patch, we can either agree
> > that the approach is OK or argue that it is problematic because X.  I
> > think we need to hammer down the design in broad strokes first, and
> > I'm not sure we're totally there yet.
> 
> That's right, I think at this point defining the top level behaviour/design
> is very important to proceed, we can decide about the better
> implementation approach afterwards (may be once initial patch is ready,
> because it might not be a major work to do it either way).  So here's where
> we are on this point till now as per my understanding, I think that direct
> operations should be prohibited on partitions, you think that they should be
> allowed and Andres think that it might be better to allow direct operations
> on partitions for Read.

FWIW in my original proposal I was rejecting some things that after
further consideration turn out to be possible to allow; for instance
directly referencing individual partitions in COPY.  We could allow
something like

COPY lineitems PARTITION FOR VALUE '2000-01-01' TO STDOUT
or maybe
COPY PARTITION FOR VALUE '2000-01-01' ON TABLE lineitems TO STDOUT

and this would emit the whole partition for year 2000 of table
lineitems, and only that (the value is just computed on the fly to fit
the partitioning constraints for that individual partition).  Then
pg_dump would be able to dump each and every partition separately.

In a similar way we could have COPY FROM allow input into individual
partitions so that such a dump can be restored in parallel for each
partition.
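
A minimal Python sketch of how PARTITION FOR VALUE might resolve a value
to a single partition, assuming yearly range partitions with an inclusive
lower and exclusive upper bound.  The bounds, the partition names and both
functions are hypothetical; they only illustrate the lookup, not an actual
COPY implementation.

```python
import bisect
from datetime import date

# Hypothetical layout: NAMES[i] holds keys with BOUNDS[i] <= key < BOUNDS[i + 1].
BOUNDS = [date(1999, 1, 1), date(2000, 1, 1), date(2001, 1, 1), date(2002, 1, 1)]
NAMES = ["lineitems_1999", "lineitems_2000", "lineitems_2001"]


def partition_for_value(value):
    """Find the one partition whose range contains value."""
    i = bisect.bisect_right(BOUNDS, value) - 1
    if i < 0 or i >= len(NAMES):
        raise LookupError(f"no partition accepts {value}")
    return NAMES[i]


def copy_partition_for_value(table, value):
    """Emit the rows of exactly one partition.

    table: dict of partition name -> list of rows.
    """
    return list(table[partition_for_value(value)])
```

Any value inside the partition's range selects the same partition, so
'2000-01-01' and '2000-12-31' would both dump the year-2000 partition.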

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] On partitioning

2014-12-09 Thread Alvaro Herrera

Josh Berkus wrote:

Hi,

> Pardon me for jumping into this late.  In general, I like Alvaro's
> approach.

Please don't call this "Alvaro's approach" as I'm not involved in this
anymore.  Amit Langote has taken ownership of it now.  While some
resemblance to what I originally proposed might remain, I haven't kept
track of how this has evolved and this might be a totally different
thing now.  Or not.

Anyway I just wanted to comment on a single point:

> 6. Unique Index Problem
> Cannot create a unique index across multiple partitions, which prevents
> the partitioned table from being FK'd.
> Not Addressed
> (but could be addressed in the future)

I think it's unlikely that we will ever create a unique index that spans
all the partitions, actually.  Even if there are some wild ideas on how
to implement such a thing, the number of difficult issues that no one
knows how to attack seems too large.  I would perhaps think about
allowing foreign keys to be defined on column sets that are prefixed by
partition keys; unique indexes must exist on all partitions on the same
columns including the partition keys.  (Perhaps make an extra exception
that if a partition allows a single value for the partition column, that
column need not be part of the unique index.)
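
Why per-partition unique indexes that include the partition key are enough
can be sketched in Python, purely as an illustration.  Rows with equal
partition-key values always land in the same partition, so a local
uniqueness check (and a foreign-key probe) only ever needs to look at one
partition.  PartitionedTable and its methods are hypothetical names.

```python
class PartitionedTable:
    """Toy model: one local unique index per partition, no global index."""

    def __init__(self, route):
        self.route = route        # partition-key value -> partition name
        self.local_unique = {}    # partition name -> set of (partkey, other)

    def insert(self, partkey, other):
        part = self.route(partkey)
        index = self.local_unique.setdefault(part, set())
        if (partkey, other) in index:
            raise ValueError(f"duplicate key ({partkey}, {other}) in {part}")
        index.add((partkey, other))

    def exists(self, partkey, other):
        """Foreign-key style probe: touches only the partition for partkey."""
        index = self.local_unique.get(self.route(partkey), set())
        return (partkey, other) in index
```

The same "other" value may appear under different partition keys; what is
unique across the whole table is the combination prefixed by the
partition key, which is the restriction suggested above.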

> 10. Scaling Problem
> Inheritance partitioning becomes prohibitively slow for the planner at
> somewhere between 100 and 500 partitions depending on various factors.
> No idea?

At least it was my intention to make the system scale to a huge number of
partitions, but this requires some forward thinking (such as avoiding
loading the index list of all of them or even opening all of them at
the planner stage) and I think would be defeated if we want to keep
all the generality of the inheritance-based approach.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [HACKERS] On partitioning

2014-12-09 Thread Amit Langote



On Tue, Dec 9, 2014 at 12:59 PM, Amit Kapila  wrote:
> On Tue, Dec 9, 2014 at 8:08 AM, Amit Langote 
> wrote:
>> > From: Robert Haas [mailto:robertmh...@gmail.com]
>> > On Sat, Dec 6, 2014 at 2:59 AM, Amit Kapila 
>> > wrote:
>> > >> I guess you could list or hash partition on multiple columns, too.
>> > >
>> > > How would you distinguish values in list partition for multiple
>> > > columns? I mean for range partition, we are sure there will
>> > > be either one value for each column, but for list it could
>> > > be multiple and not fixed for each partition, so I think it will not
>> > > be easy to support the multicolumn partition key for list
>> > > partitions.
>> >
>> > I don't understand.  If you want to range partition on columns (a, b),
>> > you say that, say, tuples with (a, b) values less than (100, 200) go
>> > here and the rest go elsewhere.  For list partitioning, you say that,
>> > say, tuples with (a, b) values of EXACTLY (100, 200) go here and the
>> > rest go elsewhere.  I'm not sure how useful that is but it's not
>> > illogical.
>> >
>>
>> In case of list partitioning, 100 and 200 would respectively be one of the
>> values in lists of allowed values for a and b. I thought his concern is
>> whether this "list of values for each column in partkey" is as convenient to
>> store and manipulate as range partvalues.
>>
>
> Yeah and also how would user specify the values, as an example
> assume that table is partitioned on monthly_salary, so partition
> definition would look:
>
> PARTITION BY LIST(monthly_salary)
> (
> PARTITION salary_less_than_thousand VALUES(300, 900),
> PARTITION salary_less_than_two_thousand VALUES (500,1000,1500),
> ...
> )
>
> Now if user wants to define multi-column Partition based on
> monthly_salary and annual_salary, how do we want him to
> specify the values.  Basically how to distinguish which values
> belong to first column key and which one's belong to second
> column key.
>

Perhaps you are talking about "syntactic" difficulties that I totally missed in 
my other reply to this mail?

Can we represent the same data by rather using a subpartitioning scheme? ISTM, 
semantics would remain the same.

... PARTITION BY (monthly_salary) SUBPARTITION BY (annual_salary)?

Thanks,
Amit





Re: [HACKERS] On partitioning

2014-12-08 Thread Amit Langote


On Tue, Dec 9, 2014 at 12:59 PM, Amit Kapila  wrote:
> On Tue, Dec 9, 2014 at 8:08 AM, Amit Langote 
> wrote:
>> > From: Robert Haas [mailto:robertmh...@gmail.com]
>> > I don't understand.  If you want to range partition on columns (a, b),
>> > you say that, say, tuples with (a, b) values less than (100, 200) go
>> > here and the rest go elsewhere.  For list partitioning, you say that,
>> > say, tuples with (a, b) values of EXACTLY (100, 200) go here and the
>> > rest go elsewhere.  I'm not sure how useful that is but it's not
>> > illogical.
>> >
>>
>> In case of list partitioning, 100 and 200 would respectively be one of the
>> values in lists of allowed values for a and b. I thought his concern is
>> whether this "list of values for each column in partkey" is as convenient to
>> store and manipulate as range partvalues.
>>
>
> Yeah and also how would user specify the values, as an example
> assume that table is partitioned on monthly_salary, so partition
> definition would look:
>
> PARTITION BY LIST(monthly_salary)
> (
> PARTITION salary_less_than_thousand VALUES(300, 900),
> PARTITION salary_less_than_two_thousand VALUES (500,1000,1500),
> ...
> )
>
> Now if user wants to define multi-column Partition based on
> monthly_salary and annual_salary, how do we want him to
> specify the values.  Basically how to distinguish which values
> belong to first column key and which one's belong to second
> column key.
>

Amit, in one of my earlier replies to your question of why we may not want to 
implement multi-column list partitioning (lack of user interest in the feature 
or possible complexity of the code), I tried to explain how that may work if we 
do choose to go that way. Basically, something we may call PartitionColumnValue 
should be such that above issue can be suitably sorted out.

For example, a partition defining/bounding value would be a pg_node_tree 
representation of List of one of the (say) following parse nodes as appropriate 
- 

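typedef struct PartitionColumnValue
{
    NodeTag     type;
    Oid        *partitionid;
    char       *partcolname;
    char        partkind;           /* kind of partitioning (range or list) */
    Node       *partrangelower;     /* used for range partitions */
    Node       *partrangeupper;     /* used for range partitions */
    List       *partlistvalues;     /* used for list partitions */
} PartitionColumnValue;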

OR separately,

typedef struct RangePartitionColumnValue
{
    NodeTag     type;
    Oid        *partitionid;
    char       *partcolname;
    Node       *partrangelower;
    Node       *partrangeupper;
} RangePartitionColumnValue;

& 

typedef struct ListPartitionColumnValue
{
    NodeTag     type;
    Oid        *partitionid;
    char       *partcolname;
    List       *partlistvalues;
} ListPartitionColumnValue;

Where a partition definition would look like

typedef struct PartitionDef
{
    NodeTag     type;
    RangeVar    partition;
    RangeVar    parentrel;
    char       *kind;
    Node       *values;             /* List of PartitionColumnValue */
    List       *options;
    char       *tablespacename;
} PartitionDef;

PartitionDef.values is an (ordered) List of PartitionColumnValue each of which 
corresponds to one column in the partition key in that order.

We should be able to devise a way to load the pg_node_tree representation of  
PartitionDef.values (on-disk pg_partition_def.partvalues) into relcache using a 
"suitable data structure" so that it becomes readily usable in variety of 
contexts that we are interested in using this information. 
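
As one possible shape for that "suitable data structure", here is a
minimal Python sketch of an in-memory mirror of PartitionDef.values.  The
ColumnValue class, the 'r'/'l' kind codes and partition_accepts() are all
assumptions made for illustration.  Note also that it checks each key
column independently, which matches the per-column lower/upper fields
above but is a simplification compared with comparing multi-column range
bounds as whole tuples.

```python
from dataclasses import dataclass
from typing import Any, Optional, Sequence


@dataclass
class ColumnValue:
    """In-memory counterpart of one PartitionColumnValue (hypothetical)."""
    colname: str
    kind: str                      # 'r' = range, 'l' = list (assumed codes)
    lower: Optional[Any] = None    # inclusive lower bound, None = unbounded
    upper: Optional[Any] = None    # exclusive upper bound, None = unbounded
    listvalues: Sequence[Any] = ()

    def accepts(self, value):
        if self.kind == "l":
            return value in self.listvalues
        above = self.lower is None or value >= self.lower
        below = self.upper is None or value < self.upper
        return above and below


def partition_accepts(values, row):
    """values: ordered ColumnValue entries, one per partition-key column."""
    return all(cv.accepts(row[cv.colname]) for cv in values)
```

The same structure could serve both tuple routing (test a new row) and
partition selection (test a scan qual's constant), which is the kind of
reuse hinted at above.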

Regards,
Amit





Re: [HACKERS] On partitioning

2014-12-08 Thread Amit Kapila

On Tue, Dec 9, 2014 at 1:42 AM, Robert Haas  wrote:
> On Mon, Dec 8, 2014 at 2:56 PM, Andres Freund 
wrote:
> >> I don't think that's mutually exclusive with the idea of
> >> partitions-as-tables.  I mean, you can add code to the ALTER TABLE
> >> path that says if (i_am_not_the_partitioning_root) ereport(ERROR, ...)
> >> wherever you want.
> >
> > That'll be a lot of places you'll need to touch. More fundamentally: Why
> > should we name something a table that's not one?
>
> Well, I'm not convinced that it isn't one.  And adding a new relkind
> will involve a bunch of code churn, too.  But I don't much care to
> pre-litigate this: when someone has got a patch, we can either agree
> that the approach is OK or argue that it is problematic because X.  I
> think we need to hammer down the design in broad strokes first, and
> I'm not sure we're totally there yet.

That's right. I think that at this point defining the top-level
behaviour/design is what matters most; we can decide on the better
implementation approach afterwards (maybe once an initial patch is ready,
because it might not be major work to do it either way).  So here is where
we are on this point, as per my understanding: I think that direct
operations should be prohibited on partitions, you think that they should be
allowed, and Andres thinks that it might be better to allow direct operations
on partitions only for reads.

>
> >> - Direct access to individual partitions to bypass
> >> tuple-routing/query-planning overhead.
> >
> > I think that might be ok in some cases, but in general I'd be very wary
> > of allowing that. I think it might be ok to allow direct read access, but
> > I'd be opposed to everything else. I'd much rather go the route of allowing
> > too few things and then gradually opening up if required than the other
> > way round (as that pretty much will never happen because it'll break
> > deployed systems).
>
> Why?
>

Because I think it will be difficult for users to write and maintain more
of such code, which is one of the complaints about the previous system, where
the user needs to write triggers to route each tuple to the appropriate
partition.  I think that as a first step we should try to improve the tuple
routing algorithm so that it is not a pain for users, or at least is on par
with some of the other competing database systems.  If we are not able
to come up with such an implementation, then maybe we can think of
providing direct access as a special way for users to improve performance.

Another reason is that, fundamentally, partitions are managed internally
to divide the user data so that access can be cheaper.  We take the
specifications for defining the partitions from users, and allowing
operations on internally managed objects could lead to users writing quite
some code to do what the database actually does internally.  Consider that
TOAST tables are used internally to manage large tuples, yet we
don't want users to perform DML directly on those tables.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Re: [HACKERS] On partitioning

2014-12-08 Thread Amit Kapila

On Tue, Dec 9, 2014 at 8:08 AM, Amit Langote wrote:
> > From: Robert Haas [mailto:robertmh...@gmail.com]
> > On Sat, Dec 6, 2014 at 2:59 AM, Amit Kapila wrote:
> > >> I guess you could list or hash partition on multiple columns, too.
> > >
> > > How would you distinguish values in list partition for multiple
> > > columns? I mean for range partition, we are sure there will
> > > be either one value for each column, but for list it could
> > > be multiple and not fixed for each partition, so I think it will not
> > > be easy to support the multicolumn partition key for list
> > > partitions.
> >
> > I don't understand.  If you want to range partition on columns (a, b),
> > you say that, say, tuples with (a, b) values less than (100, 200) go
> > here and the rest go elsewhere.  For list partitioning, you say that,
> > say, tuples with (a, b) values of EXACTLY (100, 200) go here and the
> > rest go elsewhere.  I'm not sure how useful that is but it's not
> > illogical.
> >
>
> In case of list partitioning, 100 and 200 would respectively be one of
> the values in lists of allowed values for a and b. I thought his concern is
> whether this "list of values for each column in partkey" is as convenient
> to store and manipulate as range partvalues.
>

Yeah, and also how would the user specify the values?  As an example,
assume that the table is partitioned on monthly_salary, so the partition
definition would look like:

PARTITION BY LIST(monthly_salary)
(
PARTITION salary_less_than_thousand VALUES(300, 900),
PARTITION salary_less_than_two_thousand VALUES (500,1000,1500),
...
)

Now if the user wants to define a multi-column partition key based on
monthly_salary and annual_salary, how do we want them to
specify the values?  Basically, how do we distinguish which values
belong to the first key column and which ones belong to the second
key column?
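
One possible answer, sketched below only to make the question concrete:
treat every entry in a partition's value list as a complete tuple with one
value per key column, so there is never any doubt about which value belongs
to which column.  The type names ListTuple and ListPartition, the helper
list_partition_accepts and the sample figures are invented for the example
and are not proposed syntax or catalog layout.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NKEYS 2                 /* monthly_salary, annual_salary */

/* One accepted combination of key values, one entry per key column. */
typedef struct ListTuple
{
    int         vals[NKEYS];
} ListTuple;

/* A list partition is then just a set of such tuples. */
typedef struct ListPartition
{
    const char *name;
    size_t      ntuples;
    const ListTuple *tuples;
} ListPartition;

/* True if "key" matches one of the partition's tuples column by column. */
static bool
list_partition_accepts(const ListPartition *part, const int key[NKEYS])
{
    assert(part != NULL);

    for (size_t i = 0; i < part->ntuples; i++)
    {
        bool        match = true;

        for (size_t k = 0; k < NKEYS; k++)
        {
            if (part->tuples[i].vals[k] != key[k])
            {
                match = false;
                break;
            }
        }
        if (match)
            return true;
    }
    return false;
}
```

Under this reading a partition listing (300, 3600) and (900, 10800) accepts
exactly those two combinations and rejects, say, (300, 10800).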

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Re: [HACKERS] On partitioning

2014-12-08 Thread Amit Langote


> From: Robert Haas [mailto:robertmh...@gmail.com]
> On Sat, Dec 6, 2014 at 2:59 AM, Amit Kapila wrote:
> >> I guess you could list or hash partition on multiple columns, too.
> >
> > How would you distinguish values in list partition for multiple
> > columns? I mean for range partition, we are sure there will
> > be either one value for each column, but for list it could
> > be multiple and not fixed for each partition, so I think it will not
> > be easy to support the multicolumn partition key for list
> > partitions.
> 
> I don't understand.  If you want to range partition on columns (a, b),
> you say that, say, tuples with (a, b) values less than (100, 200) go
> here and the rest go elsewhere.  For list partitioning, you say that,
> say, tuples with (a, b) values of EXACTLY (100, 200) go here and the
> rest go elsewhere.  I'm not sure how useful that is but it's not
> illogical.
> 

In the case of list partitioning, 100 and 200 would each be one of the values 
in the lists of allowed values for a and b, respectively. I thought his concern 
was whether this "list of values for each column in partkey" is as convenient 
to store and manipulate as range partvalues. 
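
To make that reading concrete, here is a small sketch in which each key 
column carries its own list of allowed values and a tuple is accepted when 
every column value appears in the corresponding list. ColumnValueList and 
per_column_lists_accept are hypothetical names used only for this 
illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Allowed values for one partition key column. */
typedef struct ColumnValueList
{
    size_t      nvals;
    const int  *vals;
} ColumnValueList;

/* True if v appears in the column's list of allowed values. */
static bool
value_in_list(const ColumnValueList *list, int v)
{
    for (size_t i = 0; i < list->nvals; i++)
    {
        if (list->vals[i] == v)
            return true;
    }
    return false;
}

/* Accept the key only if every column value is allowed for its column. */
static bool
per_column_lists_accept(const ColumnValueList *cols, size_t ncols,
                        const int *key)
{
    assert(cols != NULL && key != NULL);

    for (size_t c = 0; c < ncols; c++)
    {
        if (!value_in_list(&cols[c], key[c]))
            return false;
    }
    return true;
}
```

Note that with lists {100, 150} for a and {200, 250} for b this accepts 
(150, 200) as well as (100, 200); that is, it admits the cross product of the 
lists, which is different from matching EXACTLY one (a, b) pair.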

Thanks,
Amit





Re: [HACKERS] On partitioning

On 12/08/2014 02:12 PM, Jim Nasby wrote:
> On 12/8/14, 12:26 PM, Josh Berkus wrote:
>> 4. Creation Locking Problem
>> high probability of lock pile-ups whenever a new partition is created on
>> demand due to multiple backends trying to create the partition at the
>> same time.
>> Not Addressed?
> 
> Do users actually try and create new partitions during DML? That sounds
> doomed to failure in pretty much any system...

There is no question that it would be easier for users if partitions were
created on demand automatically, particularly if you're partitioning
by something other than time.  As a concrete case, consider users on
RDS, which has no cron jobs for creating new partitions; it's either on
demand or manual.

It's quite possible that there is no good way to work out the locking
for on-demand partitions, but *if* we're going to have a 2nd
partition system, I think it's important to at least discuss the
problems with on-demand creation.

>> 11. Hash Partitioning
>> Some users would prefer to partition into a fixed number of
>> hash-allocated partitions.
>> Not Addressed.
> 
> Though, you should be able to do that in either system if you bother to
> define your own hash in a BEFORE trigger...

That doesn't do you any good with SELECT queries, unless you change
your middleware to add a hash(column) condition to every query, which
would be really hard to do for joins.
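
The point is easier to see with a sketch: if the system itself knows the
hash function, the same computation that routes a row on INSERT can also
pick the single partition to scan for a "WHERE col = const" qual, with no
change to the query text.  The hash below is a simple multiplicative one
chosen for the example; hash_key, route_insert and prune_equality are
hypothetical names, not PostgreSQL functions.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative hash only; any fixed, system-known function would do. */
static uint32_t
hash_key(uint32_t key)
{
    uint32_t    h = key * 2654435761u;

    return h ^ (h >> 16);
}

/* Ingress: choose the partition that stores a row with this key. */
static unsigned
route_insert(uint32_t key, unsigned nparts)
{
    assert(nparts > 0);
    return hash_key(key) % nparts;
}

/*
 * Egress: for an equality qual against a constant, the only partition
 * that can hold matching rows is the one the constant would route to.
 */
static unsigned
prune_equality(uint32_t constant, unsigned nparts)
{
    return route_insert(constant, nparts);
}
```

A hash computed inside a user-written BEFORE trigger gives the planner
nothing comparable to work with, which is why it only helps on the insert
side.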

>> A. COPY/ETL then attach
>> In inheritance partitioning, you can easily build a partition outside
>> the master and then "attach" it, allowing for minimal disturbance of
>> concurrent users.  Could be addressed in the future.
> 
> How much of the desire for this is because our current "row routing"
> solutions are very slow? I suspect that's the biggest reason, and
> hopefully Alvaro's proposal mostly eliminates it.

That doesn't always work, though.  In some cases the partition is being
built using some fairly complex logic (think of partitions which are
based on matviews) and there's no fast way to create the new data.
Again, this is an acceptable casualty of an improved design, but if it
will be so, we should consciously decide that.

>> B. Catchall Partition
>> Many partitioning schemes currently contain a "catchall" partition which
>> accepts rows outside of the range of the partitioning scheme, due to bad
>> input data.  Probably not handled on purpose; Alvaro is proposing that
>> we reject these instead, or create the partitions on demand, which is a
>> legitimate approach.
>>
>> C. Asymmetric Partitioning / NULLs in partition column
>> This is the classic Active/Inactive By Month setup for partitions.
>> Could be addressed via special handling for NULL/infinity in the
>> partitioned column.
> 
> If we allowed for a "catchall partition" and supported normal
> inheritance/triggers on that partition then users could continue to do
> whatever they needed with data that didn't fit the "normal" partitioning
> pattern.

That sounds to me like it would fall under the heading of "impossible
levels of backwards-compatibility".

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


Re: [HACKERS] On partitioning

2014-12-08 Thread Jim Nasby

On 12/8/14, 12:26 PM, Josh Berkus wrote:

> 4. Creation Locking Problem
> high probability of lock pile-ups whenever a new partition is created on
> demand due to multiple backends trying to create the partition at the
> same time.
> Not Addressed?

Do users actually try and create new partitions during DML? That sounds doomed
to failure in pretty much any system...

> 6. Unique Index Problem
> Cannot create a unique index across multiple partitions, which prevents
> the partitioned table from being FK'd.
> Not Addressed
> (but could be addressed in the future)

And would be extremely useful even with simple inheritance, let alone
partitioning...

> 9. Hibernate Problem
> When using the trigger method, inserts into the master partition return
> 0, which Hibernate and some other ORMs regard as an insert failure.
> Addressed.

It would be really nice to address this with regular inheritance too...

> 11. Hash Partitioning
> Some users would prefer to partition into a fixed number of
> hash-allocated partitions.
> Not Addressed.

Though, you should be able to do that in either system if you bother to define
your own hash in a BEFORE trigger...

> A. COPY/ETL then attach
> In inheritance partitioning, you can easily build a partition outside
> the master and then "attach" it, allowing for minimal disturbance of
> concurrent users. Could be addressed in the future.

How much of the desire for this is because our current "row routing" solutions
are very slow? I suspect that's the biggest reason, and hopefully Alvaro's
proposal mostly eliminates it.

> B. Catchall Partition
> Many partitioning schemes currently contain a "catchall" partition which
> accepts rows outside of the range of the partitioning scheme, due to bad
> input data. Probably not handled on purpose; Alvaro is proposing that
> we reject these instead, or create the partitions on demand, which is a
> legitimate approach.
>
> C. Asymmetric Partitioning / NULLs in partition column
> This is the classic Active/Inactive By Month setup for partitions.
> Could be addressed via special handling for NULL/infinity in the
> partitioned column.

If we allowed for a "catchall partition" and supported normal
inheritance/triggers on that partition then users could continue to do
whatever they needed with data that didn't fit the "normal" partitioning
pattern.
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com


Re: [HACKERS] On partitioning

2014-12-08 Thread Jim Nasby


On 12/8/14, 1:05 PM, Robert Haas wrote:

> Besides, I haven't really seen anyone propose something that sounds
> like a credible alternative.  If we could make partition objects
> things that the storage layer needs to know about but the query
> planner doesn't need to understand, that'd be maybe worth considering.
> But I don't see any way that that's remotely feasible.  There are lots
> of places that we assume that a heap consists of blocks number 0 up
> through N: CTID pointers, index-to-heap pointers, nodeSeqScan, bits
> and pieces of the way index vacuuming is handled, which in turn bleeds
> into Hot Standby.  You can't just decide that now block numbers are
> going to be replaced by some more complex structure, or even that
> they're now going to be nonlinear, without breaking a huge amount of
> stuff.


Agreed, but it's possible to keep a block/CTID interface while doing something 
different on the disk.
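
A minimal sketch of that idea, under the assumption that the storage is 
split into segments of known length: callers keep using one linear block 
number, and a lookup translates it to a (segment, local block) pair. 
SegmentMap and resolve_block are made-up names, and this is not a claim about 
how PostgreSQL's storage manager actually works.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical on-disk layout: several segments, each nblocks[i] long. */
typedef struct SegmentMap
{
    size_t      nsegs;
    const uint32_t *nblocks;
} SegmentMap;

/*
 * Translate a linear block number into a segment index and a block
 * number local to that segment.  Returns false if blkno is past the end.
 */
static bool
resolve_block(const SegmentMap *map, uint32_t blkno,
              size_t *seg, uint32_t *local)
{
    assert(map != NULL && seg != NULL && local != NULL);

    for (size_t i = 0; i < map->nsegs; i++)
    {
        if (blkno < map->nblocks[i])
        {
            *seg = i;
            *local = blkno;
            return true;
        }
        /* Skip this segment and keep counting from its end. */
        blkno -= map->nblocks[i];
    }
    return false;
}
```

With segments of 10, 5 and 20 blocks, linear block 12 resolves to segment 1, 
local block 2; everything above this function still sees blocks 0 through 34.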

If you think about it, partitioning is really a hack anyway. It clutters up 
your logical set implementation with a bunch of physical details. What most 
people really want when they implement partitioning is simply data locality.
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com



Re: [HACKERS] On partitioning

On Mon, Dec 8, 2014 at 2:58 PM, Josh Berkus  wrote:
>> I think any new partitioning system should keep the good things about
>> the existing system, of which there are some, and not try to reinvent
>> the wheel.  The yard stick for a new system shouldn't be "is this
>> different enough?" but "does this solve the problems without creating
>> new ones?".
>
> It's unrealistic to assume that a new system would support all of the
> features of the existing inheritance partitioning without restriction.
>  In fact, I'd say that such a requirement amounts to saying "don't
> bother trying".
>
> For example, inheritance allows us to have different indexes,
> constraints, and even columns on partitions.  We can have overlapping
> partitions, and heterogenous multilevel partitioning (partition this
> customer by month but partition that customer by week).  We can even add
> triggers on individual partitions to reroute data away from a specific
> partition.   A requirement to support all of these peculiar uses of
> inheritance partitioning would doom any new partitioning project.

I don't think it has to be possible to support every use case that we
can support today; clearly, a part of the goal here is to be LESS
general so that we can be more performant.  But I think the urge to
change too many things at once had better be tempered by a clear-eyed
vision of what can reasonably be accomplished in one patch.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] On partitioning

On Mon, Dec 8, 2014 at 2:56 PM, Andres Freund  wrote:
>> I don't think that's mutually exclusive with the idea of
>> partitions-as-tables.  I mean, you can add code to the ALTER TABLE
>> path that says if (i_am_not_the_partitioning_root) ereport(ERROR, ...)
>> wherever you want.
>
> That'll be a lot of places you'll need to touch. More fundamentally: Why
> should we name something a table that's not one?

Well, I'm not convinced that it isn't one.  And adding a new relkind
will involve a bunch of code churn, too.  But I don't much care to
pre-litigate this: when someone has got a patch, we can either agree
that the approach is OK or argue that it is problematic because X.  I
think we need to hammer down the design in broad strokes first, and
I'm not sure we're totally there yet.

>> - Direct access to individual partitions to bypass
>> tuple-routing/query-planning overhead.
>
> I think that might be ok in some cases, but in general I'd be very wary
> of allowing that. I think it might be ok to allow direct read access, but
> I'd be opposed to everything else. I'd much rather go the route of allowing
> too few things and then gradually opening up if required than the other
> way round (as that pretty much will never happen because it'll break
> deployed systems).

Why?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] On partitioning

On 12/08/2014 11:40 AM, Robert Haas wrote:
>> I don't think it's feasible to drop inheritance partitioning at this
>> point; too many users exploit a lot of peculiarities of that system which
>> wouldn't be supported by any other system.  So any new partitioning
>> system we're talking about would be *in addition* to the existing
>> system.  Hence my prior email trying to make sure that a new proposed
>> system is sufficiently different from the existing one to be worthwhile.
> 
> I think any new partitioning system should keep the good things about
> the existing system, of which there are some, and not try to reinvent
> the wheel.  The yard stick for a new system shouldn't be "is this
> different enough?" but "does this solve the problems without creating
> new ones?".

It's unrealistic to assume that a new system would support all of the
features of the existing inheritance partitioning without restriction.
 In fact, I'd say that such a requirement amounts to saying "don't
bother trying".

For example, inheritance allows us to have different indexes,
constraints, and even columns on partitions.  We can have overlapping
partitions, and heterogenous multilevel partitioning (partition this
customer by month but partition that customer by week).  We can even add
triggers on individual partitions to reroute data away from a specific
partition.   A requirement to support all of these peculiar uses of
inheritance partitioning would doom any new partitioning project.

>>> Besides, I haven't really seen anyone propose something that sounds
>>> like a credible alternative.  If we could make partition objects
>>> things that the storage layer needs to know about but the query
>>> planner doesn't need to understand, that'd be maybe worth considering.
>>> But I don't see any way that that's remotely feasible.
>>
>> On the other hand, as long as partitions exist exclusively at the
>> planner layer, we can't fix the existing major shortcomings of
>> inheritance partitioning, such as its inability to handle expressions.
>> Again, see previous.
> 
> Huh?

Explained in the other email I posted on this thread.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


Re: [HACKERS] On partitioning

2014-12-08 Thread Andres Freund

On 2014-12-08 14:48:50 -0500, Robert Haas wrote:
> On Mon, Dec 8, 2014 at 2:39 PM, Andres Freund  wrote:
> >> I guess I'm in disagreement with you - and, perhaps - the majority on
> >> this point.  I think that ship has already sailed: partitions ARE
> >> tables.  We can try to make it less necessary for users to ever look
> >> at those tables as separate objects, and I think that's a good idea.
> >> But trying to go from a system where partitions are tables, which is
> >> what we have today, to a system where they are not seems like a bad
> >> idea to me.  If we make a major break from how things work today,
> >> we're going to end up having to reimplement stuff that already works.
> >
> > I don't think this makes much sense. That'd severely restrict our
> > ability to do stuff for a long time. Unless we can absolutely rely on
> > the fact that partitions have the same schema and such we'll rob
> > ourselves of significant optimization opportunities.
> 
> I don't think that's mutually exclusive with the idea of
> partitions-as-tables.  I mean, you can add code to the ALTER TABLE
> path that says if (i_am_not_the_partitioning_root) ereport(ERROR, ...)
> wherever you want.

That'll be a lot of places you'll need to touch. More fundamentally: Why
should we name something a table that's not one?

> >> Besides, I haven't really seen anyone propose something that sounds
> >> like a credible alternative.  If we could make partition objects
> >> things that the storage layer needs to know about but the query
> >> planner doesn't need to understand, that'd be maybe worth considering.
> >> But I don't see any way that that's remotely feasible.  There are lots
> >> of places that we assume that a heap consists of blocks number 0 up
> >> through N: CTID pointers, index-to-heap pointers, nodeSeqScan, bits
> >> and pieces of the way index vacuuming is handled, which in turn bleeds
> >> into Hot Standby.  You can't just decide that now block numbers are
> >> going to be replaced by some more complex structure, or even that
> >> they're now going to be nonlinear, without breaking a huge amount of
> >> stuff.
> >
> > I think you're making a wrong fundamental assumption here. Just because
> > we define partitions to not be full relations doesn't mean we have to
> > treat them entirely separately. I don't see why a pg_class.relkind = 'p'
> > entry would be something actually problematic. That'd easily allow us to
> > treat them differently in all the relevant places (all of ALTER TABLE,
> > DML et al) and still allow all of the current planner/executor
> > infrastructure. We can even allow direct SELECTs from individual
> > partitions if we want to - that's trivial to achieve.
> 
> We may just be using different words to talk about more-or-less the
> same thing, then.

That might be

> What I'm saying is that I want these things to keep working:

> - Indexes.

Nobody argued against that I think.

> - Merge append and any other inheritance-aware query planning
> techniques.

Same here.

> - Direct access to individual partitions to bypass
> tuple-routing/query-planning overhead.

I think that might be ok in some cases, but in general I'd be very wary
of allowing that. I think it might be ok to allow direct read access, but
I'd be opposed to everything else. I'd much rather go the route of allowing
too few things and then gradually opening up if required than the other
way round (as that pretty much will never happen because it'll break
deployed systems).

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] On partitioning

On Mon, Dec 8, 2014 at 2:39 PM, Andres Freund  wrote:
>> I guess I'm in disagreement with you - and, perhaps - the majority on
>> this point.  I think that ship has already sailed: partitions ARE
>> tables.  We can try to make it less necessary for users to ever look
>> at those tables as separate objects, and I think that's a good idea.
>> But trying to go from a system where partitions are tables, which is
>> what we have today, to a system where they are not seems like a bad
>> idea to me.  If we make a major break from how things work today,
>> we're going to end up having to reimplement stuff that already works.
>
> I don't think this makes much sense. That'd severely restrict our
> ability to do stuff for a long time. Unless we can absolutely rely on
> the fact that partitions have the same schema and such we'll rob
> ourselves of significant optimization opportunities.

I don't think that's mutually exclusive with the idea of
partitions-as-tables.  I mean, you can add code to the ALTER TABLE
path that says if (i_am_not_the_partitioning_root) ereport(ERROR, ...)
wherever you want.
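
As a toy illustration of that kind of check (self-contained, with no
PostgreSQL headers): the sketch below models a relation with an optional
partitioning parent and rejects an ALTER-style operation unless it is
issued on the root.  The types Rel and AlterResult and the function
alter_table_check are assumptions for the example; in the backend the
rejection would be an ereport(ERROR, ...) as written above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical relation descriptor: parent is NULL for the root. */
typedef struct Rel
{
    const char *name;
    const struct Rel *parent;
} Rel;

typedef enum AlterResult
{
    ALTER_OK,
    ALTER_REJECTED
} AlterResult;

/*
 * Guard placed at the top of an ALTER TABLE code path: only the
 * partitioning root may be altered directly.
 */
static AlterResult
alter_table_check(const Rel *rel)
{
    assert(rel != NULL);

    if (rel->parent != NULL)
        return ALTER_REJECTED;  /* stands in for ereport(ERROR, ...) */
    return ALTER_OK;
}
```

The check itself is one line; the cost being argued about in this thread
is how many separate code paths would each need their own copy of it.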

>> Besides, I haven't really seen anyone propose something that sounds
>> like a credible alternative.  If we could make partition objects
>> things that the storage layer needs to know about but the query
>> planner doesn't need to understand, that'd be maybe worth considering.
>> But I don't see any way that that's remotely feasible.  There are lots
>> of places that we assume that a heap consists of blocks number 0 up
>> through N: CTID pointers, index-to-heap pointers, nodeSeqScan, bits
>> and pieces of the way index vacuuming is handled, which in turn bleeds
>> into Hot Standby.  You can't just decide that now block numbers are
>> going to be replaced by some more complex structure, or even that
>> they're now going to be nonlinear, without breaking a huge amount of
>> stuff.
>
> I think you're making a wrong fundamental assumption here. Just because
> we define partitions to not be full relations doesn't mean we have to
> treat them entirely separately. I don't see why a pg_class.relkind = 'p'
> entry would be something actually problematic. That'd easily allow us to
> treat them differently in all the relevant places (all of ALTER TABLE,
> DML et al) and still allow all of the current planner/executor
> infrastructure. We can even allow direct SELECTs from individual
> partitions if we want to - that's trivial to achieve.

We may just be using different words to talk about more-or-less the
same thing, then.  What I'm saying is that I want these things to keep
working:

- Indexes.
- Merge append and any other inheritance-aware query planning techniques.
- Direct access to individual partitions to bypass
tuple-routing/query-planning overhead.

I am not necessarily saying that I have a problem with putting other
restrictions on partitions, like requiring them to have the same tuple
descriptor or the same ACLs as their parents.  Those kinds of details
bear discussion, but I'm not intrinsically opposed.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] On partitioning

On Mon, Dec 8, 2014 at 2:30 PM, Josh Berkus  wrote:
> On 12/08/2014 11:05 AM, Robert Haas wrote:
>> I guess I'm in disagreement with you - and, perhaps - the majority on
>> this point.  I think that ship has already sailed: partitions ARE
>> tables.  We can try to make it less necessary for users to ever look
>> at those tables as separate objects, and I think that's a good idea.
>> But trying to go from a system where partitions are tables, which is
>> what we have today, to a system where they are not seems like a bad
>> idea to me.  If we make a major break from how things work today,
>> we're going to end up having to reimplement stuff that already works.
>
> I don't think it's feasible to drop inheritance partitioning at this
> point; too many users exploit a lot of peculiarities of that system which
> wouldn't be supported by any other system.  So any new partitioning
> system we're talking about would be *in addition* to the existing
> system.  Hence my prior email trying to make sure that a new proposed
> system is sufficiently different from the existing one to be worthwhile.

I think any new partitioning system should keep the good things about
the existing system, of which there are some, and not try to reinvent
the wheel.  The yard stick for a new system shouldn't be "is this
different enough?" but "does this solve the problems without creating
new ones?".

>> Besides, I haven't really seen anyone propose something that sounds
>> like a credible alternative.  If we could make partition objects
>> things that the storage layer needs to know about but the query
>> planner doesn't need to understand, that'd be maybe worth considering.
>> But I don't see any way that that's remotely feasible.
>
> On the other hand, as long as partitions exist exclusively at the
> planner layer, we can't fix the existing major shortcomings of
> inheritance partitioning, such as its inability to handle expressions.
> Again, see previous.

Huh?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] On partitioning

2014-12-08 Thread Andres Freund

On 2014-12-08 14:05:52 -0500, Robert Haas wrote:
> On Sat, Dec 6, 2014 at 3:06 AM, Amit Kapila  wrote:
> > Sure, I don't feel we should not provide any way to take a dump
> > of an individual partition, just not at the level of an independent table.
> > May be something like --table 
> > --partition .
> >
> > In general, I think we should try to avoid exposing that partitions are
> > individual tables as that might hinder any future enhancement in that
> > area (for example, if someone finds a different and better way to
> > arrange the partition data, then due to the currently exposed syntax,
> > we might feel blocked).
> 
> I guess I'm in disagreement with you - and, perhaps - the majority on
> this point.  I think that ship has already sailed: partitions ARE
> tables.  We can try to make it less necessary for users to ever look
> at those tables as separate objects, and I think that's a good idea.
> But trying to go from a system where partitions are tables, which is
> what we have today, to a system where they are not seems like a bad
> idea to me.  If we make a major break from how things work today,
> we're going to end up having to reimplement stuff that already works.

I don't think this makes much sense. That'd severely restrict our
ability to do stuff for a long time. Unless we can absolutely rely on
the fact that partitions have the same schema and such we'll rob
ourselves of significant optimization opportunities.

> Besides, I haven't really seen anyone propose something that sounds
> like a credible alternative.  If we could make partition objects
> things that the storage layer needs to know about but the query
> planner doesn't need to understand, that'd be maybe worth considering.
> But I don't see any way that that's remotely feasible.  There are lots
> of places that we assume that a heap consists of blocks number 0 up
> through N: CTID pointers, index-to-heap pointers, nodeSeqScan, bits
> and pieces of the way index vacuuming is handled, which in turn bleeds
> into Hot Standby.  You can't just decide that now block numbers are
> going to be replaced by some more complex structure, or even that
> they're now going to be nonlinear, without breaking a huge amount of
> stuff.

I think you're making a wrong fundamental assumption here. Just because
we define partitions to not be full relations doesn't mean we have to
treat them entirely separately. I don't see why a pg_class.relkind = 'p'
entry would be something actually problematic. That'd easily allow us to
treat them differently in all the relevant places (all of ALTER TABLE,
DML et al) and still allow all of the current planner/executor
infrastructure. We can even allow direct SELECTs from individual
partitions if we want to - that's trivial to achieve.
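
A rough sketch of how a separate relkind could drive that kind of
differentiated treatment: one switch decides which operations may target a
relation directly.  The 'p' value is the hypothetical partition relkind
from this mail, 'r' is the existing relkind for ordinary tables, and the
Operation enum and operation_allowed function are invented for the
illustration.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum Operation
{
    OP_SELECT,
    OP_INSERT,
    OP_UPDATE,
    OP_DELETE,
    OP_ALTER
} Operation;

/*
 * Decide whether "op" may be applied directly to a relation of the
 * given relkind.  Partitions ('p') allow only direct reads here.
 */
static bool
operation_allowed(char relkind, Operation op)
{
    assert(op >= OP_SELECT && op <= OP_ALTER);

    switch (relkind)
    {
        case 'r':               /* ordinary table: everything allowed */
            return true;
        case 'p':               /* hypothetical partition relkind */
            return op == OP_SELECT;
        default:                /* other relkinds are out of scope here */
            return false;
    }
}
```

Opening up more operations later would mean relaxing the 'p' case, which
fits the "start restrictive, loosen if required" approach.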

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] On partitioning

On 12/08/2014 11:05 AM, Robert Haas wrote:
> I guess I'm in disagreement with you - and, perhaps - the majority on
> this point.  I think that ship has already sailed: partitions ARE
> tables.  We can try to make it less necessary for users to ever look
> at those tables as separate objects, and I think that's a good idea.
> But trying to go from a system where partitions are tables, which is
> what we have today, to a system where they are not seems like a bad
> idea to me.  If we make a major break from how things work today,
> we're going to end up having to reimplement stuff that already works.

I don't think it's feasible to drop inheritance partitioning at this
point; too many users exploit a lot of peculiarities of that system which
wouldn't be supported by any other system.  So any new partitioning
system we're talking about would be *in addition* to the existing
system.  Hence my prior email trying to make sure that a new proposed
system is sufficiently different from the existing one to be worthwhile.

> Besides, I haven't really seen anyone propose something that sounds
> like a credible alternative.  If we could make partition objects
> things that the storage layer needs to know about but the query
> planner doesn't need to understand, that'd be maybe worth considering.
> But I don't see any way that that's remotely feasible. 

On the other hand, as long as partitions exist exclusively at the
planner layer, we can't fix the existing major shortcomings of
inheritance partitioning, such as its inability to handle expressions.
Again, see previous.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


Re: [HACKERS] On partitioning

On Sat, Dec 6, 2014 at 3:06 AM, Amit Kapila  wrote:
> Sure, I don't feel we should not provide anyway to take dump
> for individual partition but not at level of independent table.
> May be something like --table 
> --partition .
>
> In general, I think we should try to avoid exposing that partitions are
> individual tables as that might hinder any future enhancement in that
> area (example if we someone finds a different and better way to
> arrange the partition data, then due to the currently exposed syntax,
> we might feel blocked).

I guess I'm in disagreement with you - and, perhaps - the majority on
this point.  I think that ship has already sailed: partitions ARE
tables.  We can try to make it less necessary for users to ever look
at those tables as separate objects, and I think that's a good idea.
But trying to go from a system where partitions are tables, which is
what we have today, to a system where they are not seems like a bad
idea to me.  If we make a major break from how things work today,
we're going to end up having to reimplement stuff that already works.

Besides, I haven't really seen anyone propose something that sounds
like a credible alternative.  If we could make partition objects
things that the storage layer needs to know about but the query
planner doesn't need to understand, that'd be maybe worth considering.
But I don't see any way that that's remotely feasible.  There are lots
of places that we assume that a heap consists of blocks number 0 up
through N: CTID pointers, index-to-heap pointers, nodeSeqScan, bits
and pieces of the way index vacuuming is handled, which in turn bleeds
into Hot Standby.  You can't just decide that now block numbers are
going to be replaced by some more complex structure, or even that
they're now going to be nonlinear, without breaking a huge amount of
stuff.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: [HACKERS] On partitioning

On Mon, Dec 8, 2014 at 12:13 AM, Amit Langote
 wrote:
> So just to clarify, first and last destinations are considered "defined" if 
> you have something like:
>
> ...
> PARTITION p1 VALUES LESS THAN 10
> PARTITION p2 VALUES BETWEEN 10 AND 20
> PARTITION p3 VALUES GREATER THAN 20
> ...
>
> And "not defined" if:
>
> ...
> PARTITION p1 VALUES BETWEEN 10 AND 20
> ...

Yes.
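
To make the "defined" versus "not defined" distinction concrete, here is a
minimal sketch in Python. It is only an illustration: the half-open interval
model, the treatment of the boundary value 20, and the names route,
PARTITIONS_DEFINED and PARTITIONS_UNDEFINED are assumptions, not anything the
proposed syntax settles.

```python
import bisect
import math

# Hypothetical model: each partition covers a half-open interval [lo, hi).
# Which side owns a boundary value is an assumption made for this sketch.
PARTITIONS_DEFINED = [
    ("p1", -math.inf, 10),   # VALUES LESS THAN 10
    ("p2", 10, 20),          # VALUES BETWEEN 10 AND 20
    ("p3", 20, math.inf),    # VALUES GREATER THAN 20
]
PARTITIONS_UNDEFINED = [
    ("p1", 10, 20),          # only a middle range: both ends are undefined
]


def route(partitions, value):
    """Return the name of the partition accepting value, or None."""
    lows = [lo for _, lo, _ in partitions]
    idx = bisect.bisect_right(lows, value) - 1
    if idx < 0:
        return None              # below the first lower bound
    name, lo, hi = partitions[idx]
    return name if lo <= value < hi else None
```

With the first list every value finds a destination; with the second, anything
outside the single range has nowhere to go and route returns None.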

>> For pg_dump --binary-upgrade, you need a statement like SELECT
>> binary_upgrade.set_next_toast_pg_class_oid('%d'::pg_catalog.oid) for
>> each pg_class entry.  So you can't easily have a single SQL statement
>> creating multiple such entries.
>
> Hmm, do you mean "pg_dump cannot emit" such a SQL or there shouldn't be one 
> in the first place?

I mean that the binary upgrade script needs to set the OID for every
pg_class object being restored, and it does that by stashing away up
to one (1) pg_class OID before each CREATE statement.  If a single
CREATE statement generates multiple pg_class entries, this method
doesn't work.
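
As a rough illustration of why that breaks, here is a toy model in Python. It
is not how pg_dump or the server is implemented; the class
BinaryUpgradeSession and its methods are hypothetical names, and the only thing
modelled is the "one stashed OID per CREATE" rule described above.

```python
class BinaryUpgradeSession:
    """Toy model: a single stashed OID slot, consumed by the next new entry."""

    def __init__(self):
        self.next_oid = None   # at most one pending OID, as described above
        self.catalog = {}      # object name -> assigned OID

    def set_next_pg_class_oid(self, oid):
        # Hypothetical stand-in for the binary_upgrade.set_next_* calls.
        self.next_oid = oid

    def create(self, names):
        """Simulate one CREATE statement producing one entry per name."""
        for name in names:
            if self.next_oid is None:
                raise RuntimeError("no preassigned OID for " + name)
            self.catalog[name] = self.next_oid
            self.next_oid = None   # the slot is used up by this entry
```

A CREATE that makes one entry works; a CREATE that makes a parent and a
partition in one go fails on the second entry because the slot is already
empty.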

> Makes sense. This would double as a way to create subpartitions too? And that 
> would have to play well with any choice we end up making about how we treat 
> subpartitioning key (one of the points discussed above)

Yes, I think so.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] On partitioning

On Sat, Dec 6, 2014 at 2:59 AM, Amit Kapila  wrote:
>> I guess you could list or hash partition on multiple columns, too.
>
> How would you distinguish values in list partition for multiple
> columns? I mean for range partition, we are sure there will
> be either one value for each column, but for list it could
> be multiple and not fixed for each partition, so I think it will not
> be easy to support the multicolumn partition key for list
> partitions.

I don't understand.  If you want to range partition on columns (a, b),
you say that, say, tuples with (a, b) values less than (100, 200) go
here and the rest go elsewhere.  For list partitioning, you say that,
say, tuples with (a, b) values of EXACTLY (100, 200) go here and the
rest go elsewhere.  I'm not sure how useful that is but it's not
illogical.
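
A small Python sketch of the two cases, purely for illustration. The function
names and the "here"/"elsewhere" labels are made up; the only real assumption
is that multicolumn bounds compare lexicographically, the way SQL row
comparison does.

```python
def range_route(row, bound=(100, 200)):
    """Range case: rows whose (a, b) sorts below the bound go 'here'."""
    # Python compares tuples lexicographically, column by column.
    return "here" if row < bound else "elsewhere"


def list_route(row, accepted=frozenset({(100, 200)})):
    """List case: only rows whose (a, b) is exactly a listed pair go 'here'."""
    return "here" if row in accepted else "elsewhere"
```

So (99, 500) lands "here" in the range case because the first column already
decides it, while the list case accepts nothing but the exact pair.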

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: [HACKERS] On partitioning

All,

Pardon me for jumping into this late.  In general, I like Alvaro's
approach.  However, I wanted to list the major shortcomings of the
existing partitioning system (based on complaints by PGX's users and on
IRC) and compare them to Alvaro's proposed implementation to make sure
that enough of them are addressed, and that the ones which aren't
addressed are left out as a clear decision.  We can't address
*all* of the limitations of the current system, but let's make sure that
we're addressing enough of them to make implementing a 2nd partitioning
system worthwhile.

Where I have ? is because I'm not clear from Alvaro's proposal whether
they're addressed or not.

1. The Trigger Problem
the need to write triggers for INSERT/UPDATE/DELETE.
Addressed.

2. The Clutter Problem
cluttering up system views and dumps with hundreds of partitioned tables
Addressed.

3. Creation Problem
the need to write triggers and/or cron jobs to create new partitions
Addressed.

4. Creation Locking Problem
high probability of lock pile-ups whenever a new partition is created on
demand due to multiple backends trying to create the partition at the
same time.
Not Addressed?

5. Constant Problem
Since current partitioned query planning happens before the rewrite
phase, SELECTs do not use partition logic to evaluate even simple
expressions, let alone IMMUTABLE or STABLE functions.
Addressed??

6. Unique Index Problem
Cannot create a unique index across multiple partitions, which prevents
the partitioned table from being FK'd.
Not Addressed
(but could be addressed in the future)

7. JOIN Problem
Two partitioned tables being JOINed need to append and materialize
before the join, causing a very slow join under some circumstances, even
if both tables are partitioned on the same ranges.
Not Addressed?
(but could be addressed in the future)

8. COPY Problem
Cannot bulk-load into the Master, just into individual partitions.
Addressed.

9. Hibernate Problem
When using the trigger method, inserts into the master partition return
0, which Hibernate and some other ORMs regard as an insert failure.
Addressed.

10. Scaling Problem
Inheritance partitioning becomes prohibitively slow for the planner at
somewhere between 100 and 500 partitions depending on various factors.
No idea?

11. Hash Partitioning
Some users would prefer to partition into a fixed number of
hash-allocated partitions.
Not Addressed.

12. Extra Constraint Evaluation
Inheritance partitioning evaluates *all* constraints on the partitions,
whether they are part of the partitioning scheme or not.  This is way
expensive if those are, say, polygon comparisons.
Addressed.
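
For item 11, this is roughly what users mean by a fixed number of
hash-allocated partitions, sketched in Python. It is an illustration only:
hash_partition is a hypothetical name, and CRC32 is just a convenient stable
stand-in for whatever hash function a real implementation would use.

```python
import zlib


def hash_partition(key, num_partitions):
    """Map a key to one of num_partitions buckets, numbered from 0."""
    # repr() gives a stable byte form for simple keys in this sketch;
    # CRC32 is a placeholder hash, not a proposal.
    digest = zlib.crc32(repr(key).encode("utf-8"))
    return digest % num_partitions
```

The same key always lands in the same bucket, and no ranges or lists have to
be maintained as data grows.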


Additionally, I believe that Alvaro's proposal will make the following
activities which are supported by partition-by-inheritance more
difficult or impossible.  Again, these are probably acceptable because
inheritance partitioning isn't going away.  However, we should
consciously decide that:

A. COPY/ETL then attach
In inheritance partitioning, you can easily build a partition outside
the master and then "attach" it, allowing for minimal disturbance of
concurrent users.  Could be addressed in the future.

B. Catchall Partition
Many partitioning schemes currently contain a "catchall" partition which
accepts rows outside of the range of the partitioning scheme, due to bad
input data.  Probably not handled on purpose; Alvaro is proposing that
we reject these instead, or create the partitions on demand, which is a
legitimate approach.

C. Asymmetric Partitioning / NULLs in partition column
This is the classic Active/Inactive By Month setup for partitions.
Could be addressed via special handling for NULL/infinity in the
partitioned column.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com



Re: [HACKERS] On partitioning


From: Amit Kapila [mailto:amit.kapil...@gmail.com] 
> > > How would you distinguish values in list partition for multiple
> > > columns? I mean for range partition, we are sure there will
> > > be either one value for each column, but for list it could
> > > be multiple and not fixed for each partition, so I think it will not
> > > be easy to support the multicolumn partition key for list
> > > partitions.
>
> >Irrespective of difficulties of representing it using pg_node_tree, it seems 
> >to me that multicolumn list partitioning is not widely used.
> 
> So I think it is better to be clear why we are not planning to
> support it, is it that because it is not required by users or
> is it due to the reason that code seems to be tricky or is it due
> to both of the reasons.  It might help us if anyone raises this
> during the development of this patch or in general if someone
> requests such a feature.

Coming back to how the pg_node_tree representation would look for list partitions:

For each column in a multicolumn list partition key, a value would look like a 
dumped Node for List of Consts (all allowed values in a given list partition). 
And the whole key would then be a List of such Nodes (a dump thereof). That's 
perhaps pretty verbose but I guess that's supposed to be only a catalog 
representation. During relcache building, we turn this back into a collection 
of structs to efficiently locate the partition of interest whatever the method 
of doing that ends up being (based on partition type). The relcache step 
ensures that we have decoupled the concern of quickly locating an interesting 
partition from its catalog representation.
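
A rough Python sketch of that two-step idea, as an illustration only. The
partition names, the sample values, and the functions build_lookup and
find_partition are all hypothetical; nested Python lists merely stand in for
the dumped List-of-Consts nodes.

```python
# Catalog-style form: for each partition, one list of allowed constants
# per key column (verbose, but only meant for storage).
CATALOG = {
    "p_east": [["NY", "NJ"], [1, 2]],
    "p_west": [["CA"], [1, 2, 3]],
}


def build_lookup(catalog):
    """Relcache-style step: convert the stored lists into per-column sets."""
    return {
        name: [frozenset(values) for values in columns]
        for name, columns in catalog.items()
    }


def find_partition(lookup, row):
    """Return the partition whose per-column sets accept every key value."""
    for name, column_sets in lookup.items():
        if all(value in allowed for value, allowed in zip(row, column_sets)):
            return name
    return None
```

The lookup structure can change freely (sets here, something smarter later)
without touching what is stored in the catalog.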

Of course, there may be flaws in this picture that would only reveal themselves 
when actually trying to implement it, or they can be pointed out in advance.

Thanks,
Amit





Re: [HACKERS] On partitioning

2014-12-07 Thread Amit Kapila

On Mon, Dec 8, 2014 at 11:01 AM, Amit Langote wrote:
> From: Amit Kapila [mailto:amit.kapil...@gmail.com]
> Sent: Saturday, December 06, 2014 5:00 PM
> To: Robert Haas
> Cc: Amit Langote; Andres Freund; Alvaro Herrera; Bruce Momjian; Pg Hackers
> Subject: Re: [HACKERS] On partitioning
>
> On Fri, Dec 5, 2014 at 10:03 PM, Robert Haas wrote:
> > On Tue, Dec 2, 2014 at 10:43 PM, Amit Langote
> >  wrote:
> >
> > > I wonder if your suggestion of pg_node_tree plays well here. This then
> > > could be a list of CONSTs or some such... And I am thinking it's a concern
> > > only for range partitions, no? (that is, a multicolumn partition key)
> >
> > I guess you could list or hash partition on multiple columns, too.
> >
> > How would you distinguish values in list partition for multiple
> > columns? I mean for range partition, we are sure there will
> > be either one value for each column, but for list it could
> > be multiple and not fixed for each partition, so I think it will not
> > be easy to support the multicolumn partition key for list
> > partitions.
>
> Irrespective of difficulties of representing it using pg_node_tree, it seems
> to me that multicolumn list partitioning is not widely used.

So I think it is better to be clear why we are not planning to
support it, is it that because it is not required by users or
is it due to the reason that code seems to be tricky or is it due
to both of the reasons.  It might help us if anyone raises this
during the development of this patch or in general if someone
requests such a feature.


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Re: [HACKERS] On partitioning



From: pgsql-hackers-ow...@postgresql.org 
[mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Amit Kapila
Sent: Saturday, December 06, 2014 5:06 PM
To: Robert Haas
Cc: Amit Langote; Andres Freund; Alvaro Herrera; Bruce Momjian; Pg Hackers
Subject: Re: [HACKERS] On partitioning

On Fri, Dec 5, 2014 at 10:12 PM, Robert Haas  wrote:
> On Fri, Dec 5, 2014 at 2:18 AM, Amit Kapila  wrote:
> > Do we really need to support dml or pg_dump for individual partitions?
>
> I think we do.  It's quite reasonable for a DBA (or developer or
> whatever) to want to dump all the data that's in a single partition;
> for example, maybe they have the table partitioned, but also spread
> across several servers.  When the data on one machine grows too big,
> they want to dump that partition, move it to a new machine, and drop
> the partition from the old machine.  That needs to be easy and
> efficient.
>
> More generally, with inheritance, I've seen the ability to reference
> individual inheritance children be a real life-saver on any number of
> occasions.  Now, a new partitioning system that is not as clunky as
> constraint exclusion will hopefully be fast enough that people don't
> need to do it very often any more.  But I would be really cautious
> about removing the option.  That is the equivalent of installing a new
> fire suppression system and then boarding up the emergency exit.
> Yeah, you *hope* the new fire suppression system is good enough that
> nobody will ever need to go out that way any more.  But if you're
> wrong, people will die, so getting rid of it isn't prudent.  The
> stakes are not quite so high here, but the principle is the same.
>
> 
> Sure, I don't feel we should not provide anyway to take dump
> for individual partition but not at level of independent table.
> May be something like --table 
> --partition .
> 

This does sound cleaner.

> In general, I think we should try to avoid exposing that partitions are
> individual tables as that might hinder any future enhancement in that
> area (example if we someone finds a different and better way to
> arrange the partition data, then due to the currently exposed syntax,
> we might feel blocked). 

Sounds like a concern. I guess you are referring to whether we allow a 
partition relation to be included in the range table and then some other cases. 
In the former case we could allow referring to individual partitions by some 
additional syntax if it doesn’t end up looking too ugly or invite a bunch of 
other issues.

This seems to have been discussed a little bit upthread (for example, see "Open 
Questions" in Alvaro's original proposal and Hannu Krosing's reply). 

Regards,
Amit





Re: [HACKERS] On partitioning



From: Amit Kapila [mailto:amit.kapil...@gmail.com] 
Sent: Saturday, December 06, 2014 5:00 PM
To: Robert Haas
Cc: Amit Langote; Andres Freund; Alvaro Herrera; Bruce Momjian; Pg Hackers
Subject: Re: [HACKERS] On partitioning

On Fri, Dec 5, 2014 at 10:03 PM, Robert Haas  wrote:
> On Tue, Dec 2, 2014 at 10:43 PM, Amit Langote
>  wrote:
>
> > I wonder if your suggestion of pg_node_tree plays well here. This then 
> > could be a list of CONSTs or some such... And I am thinking it's a concern 
> > only for range partitions, no? (that is, a multicolumn partition key)
>
> I guess you could list or hash partition on multiple columns, too.
>
> How would you distinguish values in list partition for multiple
> columns? I mean for range partition, we are sure there will
> be either one value for each column, but for list it could
> be multiple and not fixed for each partition, so I think it will not
> be easy to support the multicolumn partition key for list
> partitions.

Irrespective of difficulties of representing it using pg_node_tree, it seems to 
me that multicolumn list partitioning is not widely used. It is used in 
combination with range or hash partitioning as composite partitioning. So, 
perhaps we need not worry about that.

Regards,
Amit





Re: [HACKERS] On partitioning