Re: [HACKERS] Declarative partitioning - another take

2016-11-09 Thread Robert Haas
In this latest patch set:

src/backend/parser/parse_utilcmd.c:3194: indent with spaces.
+*rdatum;

With all patches applied, "make check" fails with a bunch of diffs
that look like this:

  Check constraints:
- "pt1chk2" CHECK (c2 <> ''::text)
  "pt1chk3" CHECK (c2 <> ''::text)


On Wed, Nov 9, 2016 at 6:14 AM, Amit Langote
 wrote:
> Actually the sentence there is: The parenthesized list of columns or
> expressions forms the partitioning key for the table

OK.

>> +/*
>> + * Strip any top-level COLLATE clause.  This ensures that we 
>> treat
>> + * "x COLLATE y" and "(x COLLATE y)" alike.
>> + */
>>
>> But you don't, right?  Unless I am confused, which is possible, the
>> latter COLLATE will be ignored, while the former one will set the
>> collation to be used in the context of partitioning comparisons.
>
> The code immediately following the comment does in fact strip the
> top-level COLLATE clause.
>
> while (IsA(expr, CollateExpr))
> expr = (Node *) ((CollateExpr *) expr)->arg;
>
> So that the following two specifications are equivalent which is the intent:
>
> create table p (a text) partition by range (a collate "en_US");
> vs.
> create table p (a text) partition by range ((a collate "en_US"));

I see.  You're right.

>> Re-reviewing 0002:
>>
>> +   if (fout->remoteVersion >= 10)
>> +   {
>> +   PQExpBuffer acl_subquery = createPQExpBuffer();
>> +   PQExpBuffer racl_subquery = createPQExpBuffer();
>> +   PQExpBuffer initacl_subquery = createPQExpBuffer();
>> +   PQExpBuffer initracl_subquery = createPQExpBuffer();
>> +
>> +   PQExpBuffer attacl_subquery = createPQExpBuffer();
>> +   PQExpBuffer attracl_subquery = createPQExpBuffer();
>> +   PQExpBuffer attinitacl_subquery = createPQExpBuffer();
>> +   PQExpBuffer attinitracl_subquery = createPQExpBuffer();
>>
>> It seems unnecessary to repeat all of this.  The only differences
>> between the 10 version and the 9600 version are:
>>
>> 60,61c60
>> <   "AS changed_acl, "
>> <   "CASE WHEN c.relkind = 'P' THEN
>> pg_catalog.pg_get_partkeydef(c.oid) ELSE NULL END AS partkeydef "
>> ---
>>>   "AS changed_acl "
>> 73c72
>> <"WHERE c.relkind in ('%c', '%c', '%c', '%c',
>> '%c', '%c', '%c') "
>> ---
>>>"WHERE c.relkind in ('%c', '%c', '%c', '%c', '%c', '%c') 
>>> "
>> 87,88c86
>> <   RELKIND_MATVIEW, RELKIND_FOREIGN_TABLE,
>> <   RELKIND_PARTITIONED_TABLE);
>> ---
>>>   RELKIND_MATVIEW, RELKIND_FOREIGN_TABLE);
>>
>> But none of that is really a problem.  Sure, the 'P' case will never
>> arise in 9.6, but so what?  I'd really like to not keep duplicating
>> these increasingly-complex hunks of code if we can find some way to
>> avoid that.
>
> We cannot reference pg_catalog.pg_get_partkeydef() in the SQL query that
> getTables() sends to pre-10 servers, right?  But I suppose we need not
> call it in that particular SQL query in the first place.

Oh, yeah, that's a problem; the query will error out against older
servers.  You could do something like:

char *partkeydef;
if (version <= 90600)
    partkeydef = "NULL";
else
    partkeydef = "CASE WHEN c.relkind = 'P' THEN "
                 "pg_catalog.pg_get_partkeydef(c.oid) ELSE NULL END";

...and then use %s to interpolate that into the query string.

> How about we do it in the following manner in getSchemaData():
>
> if (g_verbose)
> write_msg(NULL, "reading partition key information for interesting tables\n");
> getTablePartitionKeyInfo(fout, tblinfo, numTables);
>
> I have implemented the same.

That might be OK too; I'll have to read it through carefully.

>> Re-reviewing 0003:
>>
>> + 
>> +  If this table is a partition, one cannot perform DROP
>> NOT NULL
>> +  on a column if it is marked not null in the parent table.
>> + 
>>
>> This would, presumably, also be true for inheritance.  I think we
>> could just leave this out.
>
> Actually, it isn't true for the regular inheritance situation.
>
> create table parent (a int not null);
> create table child () inherits (parent);
> alter table child alter a drop not null;  -- this works (bug?)

Hrm, OK.  That doesn't satisfy MY idea of the principle of least
astonishment, but hey...

> But in the meantime, we can proceed with enforcing inheritance of the NOT NULL
> constraint for *partitions*, because they only ever have one parent and
> hence do not need an elaborate coninhcount-based scheme, I think.  Thoughts?

I agree.
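
For concreteness, a sketch of the behavior being agreed to here, using the
syntax proposed in this thread (the names and the exact error wording are
hypothetical, not taken from the patch):

create table p (a int not null, b text) partition by list (b);
create table p1 partition of p for values in ('b');

alter table p1 alter a drop not null;
-- would be rejected with something like:
-- ERROR:  column "a" is marked NOT NULL in parent table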

>> +  correspond to the partitioning method and partitioning key of the 
>> target
>>
>> I think that in most places we are referring to the "partitioning
>> method" (with ing) but the "partition 

Re: [HACKERS] Declarative partitioning - another take

2016-11-07 Thread Amit Langote

Hi Jaime,

On 2016/11/08 2:15, Jaime Casanova wrote:
> On 28 October 2016 at 02:53, Amit Langote  
> wrote:
> 
> I started to review the functionality of this patch, so i applied all
> 9 patches. After that i found this warning, which i guess is because
> it needs a cast.

Thanks a ton for reviewing!

> After that, i tried a case that i think is important: to partition an
> already existing table. Because there is no ALTER TABLE SET PARTITION
> or something similar (which i think makes sense because such a command
> would need to create the partitions and move the rows to follow the
> rule that there are no rows in a parent table).
> 
> So, what i tried was:
> 
> 1) rename original table
> 2) create a new partitioned table with old name
> 3) attach old table as a partition with bounds outside normal bounds
> and no validate
> 
> the idea is to start attaching valid partitions and delete and insert
> rows from the invalid one (is there a better way of doing that?); that
> will allow partitioning a table easily.

So, step 3 creates a partition that is basically unbounded.  From there,
it seems you want to create partitions with proper bounds, moving data
into them as they are created.  It seems like a job for some redistribution
command (split partition?), which is currently unsupported.
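
As a sketch of that workflow (hypothetical table and column names, using the
FOR VALUES START/END and NO VALIDATE syntax proposed in this thread; whether
both bounds may be left unbounded is not shown here, so a far-off end bound
is used):

alter table measurements rename to measurements_old;
create table measurements (logdate date, value int) partition by range (logdate);
alter table measurements attach partition measurements_old
  for values start (unbounded) end ('3000-01-01') no validate;
-- then create properly bounded partitions and move rows out of measurements_old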

> So far so good, until i decided points 1 to 3 should happen inside a
> transaction to make things transparent to the user.
> 
> Attached is script that shows the failure when trying it:
> 
> script 1 (failing_test_1.sql) fails the assert
> "Assert(RelationGetPartitionKey(parentRel) != NULL);" in
> transformAttachPartition() at src/backend/parser/parse_utilcmd.c:3164

Thanks for the test.  This test uncovered a bug which I have fixed in my
local repository.  Given the fix, the above will work, although I see that
it's not the best way to do what you want to do.

> After that i tried the same but with an already partitioned (via
> inheritance) table and got this (i did this first without a
> transaction, doing it with a transaction will show the same failure as
> before):
> 
> script 2 (failing_test_2.sql) fails the assert
> "Assert(childrte->relkind == RELKIND_PARTITIONED_TABLE);" in
> expand_inherited_rte_internal() at
> src/backend/optimizer/prep/prepunion.c:1551

This again was an oversight/bug in the patch.  Combining old-style
inheritance partitioning with the new partitioned tables is not supported.
In fact, ATTACH PARTITION prevents adding a regular inheritance parent as a
partition.  After fixing the bug, you would instead get this error:

alter table prueba attach partition prueba_old for values start
(unbounded, unbounded) end (2008,1) no validate;
ERROR:  cannot attach regular inheritance parent as partition

In this case, you should instead create the prueba_old partition hierarchy
using the new partitioning commands and then attach it.

> PS: i don't like the START END syntax but i don't have any ideas
> there... except, there was a reason not to use expressions (like a
> CHECK constraint?)

Expression syntax is too unrestricted.  What's to prevent users from using
completely different check constraint expressions from partition to
partition (not that any users deliberately try to do that)?  With the new
partitioning syntax, you specify the partitioning columns once when
creating the partitioned parent and then specify, using partitioning-method
specific syntax, the bounds for every partition of the parent.  That allows
the partition metadata to be captured in a form that is more suitable for
implementing other features related to partitioning.
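
For example (a sketch using the syntax proposed in this thread; names are
hypothetical), the partitioning column is named once on the parent and each
partition supplies only its bounds:

create table events (id bigint, created date) partition by range (created);
create table events_2015 partition of events
  for values start ('2015-01-01') end ('2016-01-01');
create table events_2016 partition of events
  for values start ('2016-01-01') end ('2017-01-01');
-- each partition's implicit constraint is derived from these bounds,
-- rather than from hand-written CHECK expressions that could diverge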

That said, I think it's always a challenge to come up with a syntax that
is universally acceptable.  Consider, for example, the recent discussion
about whether to allow inclusive/exclusive to be specified for the START
and END bounds of range partitions or to always assume an inclusive start
and an exclusive end [1].

Thanks,
Amit

[1]
https://www.postgresql.org/message-id/CA%2BTgmoaKOycHcVoed%2BF3fk-z6xUOeysQFG6HT%3Doucw76bSMHCQ%40mail.gmail.com






Re: [HACKERS] Declarative partitioning - another take

2016-11-07 Thread Amit Langote

Hi Jaime,

On 2016/11/08 2:24, Jaime Casanova wrote:
> On 7 November 2016 at 12:15, Jaime Casanova
>  wrote:
>> On 28 October 2016 at 02:53, Amit Langote  
>> wrote:
>>>
>>> Please find attached the latest version of the patches
>>
>> Hi,
>>
>> I started to review the functionality of this patch, so i applied all
>> 9 patches. After that i found this warning, which i guess is because
>> it needs a cast.
>>
> 
> oh! i forgot the warning
> """
> partition.c: In function ‘get_qual_for_list’:
> partition.c:1159:6: warning: assignment from incompatible pointer type
>or = makeBoolExpr(OR_EXPR, list_make2(nulltest2, opexpr), -1);
>   ^
> """

This one I noticed too and have fixed.

> 
> attached a list of the warnings that my compiler gives me (basically
> most are just variables that could be used uninitialized)

Thanks a lot for spotting and reporting these.  Will fix as appropriate.

Regards,
Amit






Re: [HACKERS] Declarative partitioning - another take

2016-11-07 Thread Jaime Casanova
On 7 November 2016 at 12:15, Jaime Casanova
 wrote:
> On 28 October 2016 at 02:53, Amit Langote  
> wrote:
>>
>> Please find attached the latest version of the patches
>
> Hi,
>
> I started to review the functionality of this patch, so i applied all
> 9 patches. After that i found this warning, which i guess is because
> it needs a cast.
>

oh! i forgot the warning
"""
partition.c: In function ‘get_qual_for_list’:
partition.c:1159:6: warning: assignment from incompatible pointer type
   or = makeBoolExpr(OR_EXPR, list_make2(nulltest2, opexpr), -1);
  ^
"""

attached a list of the warnings that my compiler gives me (basically
most are just variables that could be used uninitialized)

-- 
Jaime Casanova  www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
partition.c: In function ‘get_qual_for_list’:
partition.c:1159:6: warning: assignment from incompatible pointer type
   or = makeBoolExpr(OR_EXPR, list_make2(nulltest2, opexpr), -1);
  ^
partition.c: In function ‘RelationBuildPartitionDesc’:
partition.c:250:8: warning: ‘num_distinct_bounds’ may be used uninitialized in 
this function [-Wmaybe-uninitialized]
  int   num_distinct_bounds;
^
partition.c:587:9: warning: ‘distinct_bounds’ may be used uninitialized in this 
function [-Wmaybe-uninitialized]
 copy_range_bound(key, distinct_bounds[i]);
 ^
partition.c:536:5: warning: ‘all_values_count’ may be used uninitialized in 
this function [-Wmaybe-uninitialized]
 for (i = 0; i < all_values_count; i++)
 ^
partition.c:243:23: warning: ‘all_values’ may be used uninitialized in this 
function [-Wmaybe-uninitialized]
  PartitionListValue **all_values;
   ^
partition.c:619:35: warning: ‘oids’ may be used uninitialized in this function 
[-Wmaybe-uninitialized]
result->oids[mapping[i]] = oids[i];
   ^
partition.c: In function ‘get_qual_from_partbound’:
partition.c:973:10: warning: ‘my_qual’ may be used uninitialized in this 
function [-Wmaybe-uninitialized]
  my_qual = (List *) map_variable_attnos((Node *) my_qual,
  ^
partition.c: In function ‘get_partition_for_tuple’:
partition.c:1734:8: warning: ‘index’ may be used uninitialized in this function 
[-Wmaybe-uninitialized]
 if (node->index > index)
^



Re: [HACKERS] Declarative partitioning - another take

2016-11-07 Thread Jaime Casanova
On 28 October 2016 at 02:53, Amit Langote  wrote:
>
> Please find attached the latest version of the patches

Hi,

I started to review the functionality of this patch, so i applied all
9 patches. After that i found this warning, which i guess is because
it needs a cast.

After that, i tried a case that i think is important: to partition an
already existing table. Because there is no ALTER TABLE SET PARTITION
or something similar (which i think makes sense because such a command
would need to create the partitions and move the rows to follow the
rule that there are no rows in a parent table).

So, what i tried was:

1) rename original table
2) create a new partitioned table with old name
3) attach old table as a partition with bounds outside normal bounds
and no validate

the idea is to start attaching valid partitions and delete and insert
rows from the invalid one (is there a better way of doing that?); that
will allow partitioning a table easily.
So far so good, until i decided points 1 to 3 should happen inside a
transaction to make things transparent to the user.

Attached is script that shows the failure when trying it:

script 1 (failing_test_1.sql) fails the assert
"Assert(RelationGetPartitionKey(parentRel) != NULL);" in
transformAttachPartition() at src/backend/parser/parse_utilcmd.c:3164

After that i tried the same but with an already partitioned (via
inheritance) table and got this (i did this first without a
transaction, doing it with a transaction will show the same failure as
before):

script 2 (failing_test_2.sql) fails the assert
"Assert(childrte->relkind == RELKIND_PARTITIONED_TABLE);" in
expand_inherited_rte_internal() at
src/backend/optimizer/prep/prepunion.c:1551

PS: i don't like the START END syntax but i don't have any ideas
there... except, there was a reason not to use expressions (like a
CHECK constraint?)

-- 
Jaime Casanova  www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


failing_test_1.sql
Description: application/sql


failing_test_2.sql
Description: application/sql



Re: [HACKERS] Declarative partitioning - another take

2016-11-03 Thread Robert Haas
Apologies if I've made some of these comments before and/or missed
comments you've made on these topics.  The size of this patch set is
so large that it's hard to keep track of everything.

Re-reviewing 0001:

+   indicate which table columns are used as partition key.  For example,

s/are used as/are part of the/

+   third table columns make up the partition key.  A zero in this array
+   indicates that the corresponding partition key column is an expression
+   over the table columns, rather than a simple column reference.

I think you can leave out "over the table columns".

+  columns or expressions forms the partitioning key

s/forms/form/

+  The table itself is empty.  A data row inserted into the table is routed

s/The table/The partitioned table/

+ * Anything mentioned in the expressions.  We must ignore the column
+ * references which will count as self-dependency items; in this case,
+ * the depender is the table itself (there is no such thing as partition
+ * key object).

"depender" is not really a word, and the parenthetical note isn't very
clear.  Maybe: We must ignore the column references, which will depend
on the table itself; there is no separate partition key object.

+heap_close(pg_partitioned_table, RowExclusiveLock);

It seems like it would be safe to do this right after
CatalogUpdateIndexes(pg_partitioned_table, tuple), and I'd recommend
that you do.  Not for performance or anything, but just to keep
related code together.

 /*
  * Resolve possibly-defaulted operator class specification
  */
-static Oid
+Oid
 GetIndexOpClass(List *opclass, Oid attrType,

Perhaps we should rename this function to ResolveOpClass, since it's
now going to be used for both indexes and partitioning keys.

+ * Sets *is_expr if attnum is found to be referenced in some partition key
+ * expression.

is_expr doesn't seem quite as clear as, say, used_by_expr or used_in_expr.

Also, the specification for this function doesn't seem to be very
clear about what this is supposed to do if the same column is both an
explicit partitioning column and also used in an expression, and the
code looks like it'll return with *is_expr set based on whichever use
it encounters first.  If that's intended behavior, maybe add a comment
like: It's possible for a column to be used both directly and as part
of a partition key expression; if that happens, *is_expr may end up as
either true or false.  That's OK for current uses of this function,
because *is_expr is only used to tailor the error message text.

+if (is_expr)
+*is_expr = false;
+if (attnum == partattno)
+return true;

I think you should adjust this (and the other bit in the same
function) so that you don't set *is_expr until you're committed to
returning.

+index = -1;
+while ((index = bms_next_member(expr_attrs, index)) > 0)
+{
+AttrNumber attno = index + FirstLowInvalidHeapAttributeNumber;
+
+if (attno == attnum)
+return true;
+}

How about bms_is_member(expr_attrs, attnum -
FirstLowInvalidHeapAttributeNumber), instead of looping?

+ errmsg("cannot reference relation \"%s\"",
RelationGetRelationName(pkrel)),
+ errdetail("Referencing partitioned tables in foreign
key constraints is not supported.")));

I think you could get rid of the errdetail and just have the error
message be "cannot reference partitioned table \"%s\"".

+ errmsg("column \"%s\" appears twice in
partition key", pelem->name),

It could be there three times!  How about column \"%s\" appears more
than once in partition key?  (I see that you seem to have adapted this
from some code in parse_utilcmd.c, which perhaps should also be
adjusted, but that's a job for another patch.)

+/*
+ * Strip any top-level COLLATE clause.  This ensures that we treat
+ * "x COLLATE y" and "(x COLLATE y)" alike.
+ */

But you don't, right?  Unless I am confused, which is possible, the
latter COLLATE will be ignored, while the former one will set the
collation to be used in the context of partitioning comparisons.

Re-reviewing 0002:

+   if (fout->remoteVersion >= 10)
+   {
+   PQExpBuffer acl_subquery = createPQExpBuffer();
+   PQExpBuffer racl_subquery = createPQExpBuffer();
+   PQExpBuffer initacl_subquery = createPQExpBuffer();
+   PQExpBuffer initracl_subquery = createPQExpBuffer();
+
+   PQExpBuffer attacl_subquery = createPQExpBuffer();
+   PQExpBuffer attracl_subquery = createPQExpBuffer();
+   PQExpBuffer attinitacl_subquery = createPQExpBuffer();
+   PQExpBuffer attinitracl_subquery = createPQExpBuffer();

It seems unnecessary to repeat all of this.  The only differences
>> between the 10 version and the 9600 

Re: [HACKERS] Declarative partitioning - another take

2016-11-03 Thread Robert Haas
On Wed, Nov 2, 2016 at 3:41 AM, Amit Langote
 wrote:
> As for which parts of the system need to know about these implicit
> partition constraints to *enforce* them for data integrity, we could say
> that it's really just one site - ExecConstraints() called from
> ExecInsert()/ExecUpdate().

Well, that's a slightly optimistic view of the situation, because the
issue is whether ExecConstraints() is going to get called in the first
place.  But now that I look at it, there are only a handful of
callers, so maybe it's not so bad.

> Admittedly, the current error message style as in this patch exposes the
> implicit constraint approach to a certain criticism: "ERROR:  new row
> violates the partition boundary specification of \"%s\"".  It would say
> the following if it were a named constraint: "ERROR: new row for relation
> \"%s\" violates check constraint \"%s\""
>
> For constraint exclusion optimization, we teach get_relation_constraints()
> to look at these constraints.  Although greatly useful, that is not
> absolutely critical.
>
> Besides the above two cases, there is a bunch of code (relcache, DDL) that
> looks at regular constraints, but IMHO, we need not let any of that code
> concern itself with the implicit partition constraints.  Especially, I
> wonder why the client tools should want the implicit partitioning
> constraint to be shown as a CHECK constraint?  As the proposed patch 0004
> (psql) currently does, isn't it better to instead show the partition
> bounds as follows?
>
> +CREATE TABLE part_b PARTITION OF parted (
> +   b WITH OPTIONS NOT NULL DEFAULT 1 CHECK (b >= 0),
> +   CONSTRAINT check_a CHECK (length(a) > 0)
> +) FOR VALUES IN ('b');

Good point.

> +\d part_b
> + Table "public.part_b"
> + Column |  Type   | Modifiers
> ++-+
> + a  | text|
> + b  | integer | not null default 1
> +Partition of: parted FOR VALUES IN ('b')
> +Check constraints:
> +"check_a" CHECK (length(a) > 0)
> +"part_b_b_check" CHECK (b >= 0)
>
> Needless to say, that could save a lot of trouble thinking about
> generating collision-free names of these constraints, their dependency
> handling, unintended altering of these constraints, pg_dump, etc.

Well, that's certainly true, but those problems don't strike me as so
serious that they couldn't be solved with a reasonable amount of
labor.

>> In fact, as far as I can see, the only advantage of this approach is
>> that when the insert arrives through the parent and is routed to the
>> child by whatever tuple-routing code we end up with (I guess that's
>> what 0008 does), we get to skip checking the constraint, saving CPU
>> cycles.  That's probably an important optimization, but I don't think
>> that putting the partitioning constraint in the catalog in any way
>> rules out the possibility of performing that optimization.  It's just
>> that instead of having the partitioning excluded-by-default and then
>> sometimes choosing to include it, you'll have it included-by-default
>> and then sometimes choose to exclude it.
>
> Hmm, doesn't it seem like we would be making *more* modifications to the
> existing code (backend or otherwise) to teach it about excluding the
> implicit partition constraints than the other way around?  The other way
> around being to modify the existing code to include the implicit
> constraints which is what this patch is about.

I'm not sure which way is actually going to be more code modification,
but I guess you've persuaded me that the way you have it is
reasonable, so let's stick with that.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Declarative partitioning - another take

2016-11-03 Thread Robert Haas
On Wed, Nov 2, 2016 at 6:47 AM, Amit Langote
 wrote:
> OK, I changed things so that DROP TABLE a_partition no longer complains
> that the partition must be detached first.  Much like how index_drop() locks
> the parent table ('parent' in a different sense, of course) and later
> invalidates its relcache, heap_drop_with_catalog(), if the passed-in relid
> is a partition, locks the parent table using AccessExclusiveLock, performs
> its usual business, and finally invalidates the parent's relcache before
> closing it without relinquishing the lock.  Does that sound sane?

Yes.

> One
> downside is that if the specified command is DROP TABLE parent CASCADE,
> the above described invalidation is a waste of cycles because the parent
> will be dropped anyway after all the partitions are dropped.

I don't think that's even slightly worth worrying about.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Declarative partitioning - another take

2016-11-03 Thread Robert Haas
On Tue, Nov 1, 2016 at 2:36 PM, Corey Huinker  wrote:
> On Tue, Nov 1, 2016 at 2:24 PM, Robert Haas  wrote:
>> Well, I'm not sure we've exactly reached consensus here, and you're
>> making me feel like I just kicked a puppy.
>
> It was hyperbole. I hope you found it as funny to read as I did to write.
> This is a great feature and I'm not going to make "perfect" the enemy of
> "good".

Oh, OK.  Sorry, the tone did not transfer to my brain in the way you
intended it - I thought you were actually upset.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Declarative partitioning - another take

2016-11-03 Thread Robert Haas
On Thu, Nov 3, 2016 at 7:46 AM,   wrote:
> El 2016-10-28 07:53, Amit Langote escribió:
>> @@ -6267,6 +6416,12 @@ ATAddForeignKeyConstraint(AlteredTableInfo *tab,
>> Relation rel,
>>  * Validity checks (permission checks wait till we have the column
>>  * numbers)
>>  */
>> +   if (pkrel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
>> +   ereport(ERROR,
>> +   (errcode(ERRCODE_WRONG_OBJECT_TYPE),
>> +errmsg("cannot reference relation
>> \"%s\"", RelationGetRelationName(pkrel)),
>> +errdetail("Referencing partitioned tables
>> in foreign key constraints is not supported.")));
>
> Is there a plan for fixing this particular limitation?  It's a pretty
> serious problem for users,
> and the suggested workaround (to create a separate non-partitioned table
> which carries only the PK
> columns which is updated by triggers, and direct the FKs to it instead of to
> the partitioned table)
> is not only a very ugly one, but also very slow.

If you have two compatibly partitioned tables, and the foreign key
matches the partitioning keys, you could implement a foreign key
between the two tables as a foreign key between each pair of matching
partitions.  Otherwise, isn't the only way to handle this a global
index?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Declarative partitioning - another take

2016-11-03 Thread alvherre

El 2016-10-28 07:53, Amit Langote escribió:

@@ -6267,6 +6416,12 @@ ATAddForeignKeyConstraint(AlteredTableInfo *tab, 
Relation rel,

 * Validity checks (permission checks wait till we have the column
 * numbers)
 */
+   if (pkrel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
+   ereport(ERROR,
+   (errcode(ERRCODE_WRONG_OBJECT_TYPE),
+ errmsg("cannot reference relation \"%s\"", 
RelationGetRelationName(pkrel)),
+ errdetail("Referencing partitioned tables in foreign key 
constraints is not supported.")));


Is there a plan for fixing this particular limitation?  It's a pretty
serious problem for users, and the suggested workaround (to create a
separate non-partitioned table which carries only the PK columns, which is
updated by triggers, and direct the FKs to it instead of to the partitioned
table) is not only a very ugly one, but also very slow.




Re: [HACKERS] Declarative partitioning - another take

2016-11-02 Thread Amit Langote
On 2016/11/02 2:44, Robert Haas wrote:
> On Fri, Oct 28, 2016 at 3:53 AM, Amit Langote
>  wrote:
>>> Insisting that you can't drop a child without detaching it first seems
>>> wrong to me.  If I already made this comment and you responded to it,
>>> please point me back to whatever you said.  However, my feeling is
>>> this is flat wrong and absolutely must be changed.
>>
>> I said the following [1]:
>>
>> | Hmm, I don't think I like this.  Why should it be necessary to detach
>> | a partition before dropping it?  That seems like an unnecessary step.
>>
>> I thought we had better lock the parent table when removing one of its
>> partitions and it seemed a bit odd to lock the parent table when dropping
>> a partition using DROP TABLE?  OTOH, with ALTER TABLE parent DETACH
>> PARTITION, the parent table is locked anyway.
> 
> That "OTOH" part seems like a pretty relevant point.
> 
> Basically, I think people expect to be able to say "DROP THINGTYPE
> thingname" or at most "DROP THINGTYPE thingname CASCADE" and have that
> thing go away.  I'm opposed to anything which requires some other
> series of steps without a very good reason, and possible surprise
> about the precise locks that the command requires isn't a good enough
> reason from my point of view.

OK, I changed things so that DROP TABLE a_partition no longer complains
that the partition must be detached first.  Much like how index_drop() locks
the parent table ('parent' in a different sense, of course) and later
invalidates its relcache, heap_drop_with_catalog(), if the passed-in relid
is a partition, locks the parent table using AccessExclusiveLock, performs
its usual business, and finally invalidates the parent's relcache before
closing it without relinquishing the lock.  Does that sound sane?  One
downside is that if the specified command is DROP TABLE parent CASCADE,
the above described invalidation is a waste of cycles because the parent
will be dropped anyway after all the partitions are dropped.

Thanks,
Amit






Re: [HACKERS] Declarative partitioning - another take

2016-11-02 Thread Amit Langote
On 2016/11/02 16:41, Amit Langote wrote:
> Having said all that, I am open to switching to the catalogued partition
> constraints if the arguments I make above in favor of this patch are not
> all that sound.

One problem I didn't immediately see a solution for if we go with the
catalogued partition constraints is how do we retrieve a relation's
partition constraint?  There are a couple of cases where that becomes
necessary.  Consider that we are adding a partition to a partitioned table
that is itself a partition.  In this case, the new partition's constraint
consists of whatever we generate from its FOR VALUES specification *ANDed*
with the parent's constraint.  We must somehow get hold of the latter.
Which of the parent's named check constraints in the pg_constraint catalog
is supposed to be the partition constraint?  With the uncatalogued
partition constraints approach, we simply generate it from the parent's
pg_class.relpartbound (or we might get the relcache copy of the same
stored in rd_partcheck).  Hmm.
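
To illustrate the case in question (a sketch in this thread's proposed
syntax, hypothetical names, default inclusive-start/exclusive-end bounds):

create table root (a int, b int) partition by range (a);
create table mid partition of root
  for values start (1) end (100) partition by range (b);
create table leaf partition of mid
  for values start (1) end (10);

-- leaf's implicit partition constraint is roughly
--   (a >= 1 and a < 100) and (b >= 1 and b < 10),
-- i.e. mid's constraint ANDed with what leaf's FOR VALUES specifies, so
-- creating or attaching leaf requires getting hold of mid's constraint.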

Thanks,
Amit






Re: [HACKERS] Declarative partitioning - another take

2016-11-02 Thread Amit Langote
On 2016/11/02 2:34, Robert Haas wrote:
> On Fri, Oct 28, 2016 at 3:53 AM, Amit Langote
>  wrote:
>> [ new patches ]
> 
> Reviewing 0006:

Thanks for the review!

> This patch seems scary.  I sort of assumed from the title -- "Teach a
> few places to use partition check quals." -- that this was an optional
> thing, some kind of optimization from which we could reap further
> advantage once the basic infrastructure was in place.  But it's not
> that at all.  It's absolutely necessary that we do this, or data
> integrity is fundamentally compromised.  How do we know that we've
> found all of the places that need to be taught about these new,
> uncatalogued constraints?

Making this a separate commit from 0003 was essentially to avoid this
getting lost among all of its other changes.  In fact, the point was to
bring up for closer scrutiny whether all the sites in the backend code that
are critical for data integrity in the face of the implicit partition
constraints are being informed about those constraints.

> I'm feeling fairly strongly like you should rewind and make the
> partitioning constraints normal catalogued constraints.  That's got a
> number of advantages, most notably that we can be sure they will be
> properly enforced by the entire system (modulo existing bugs, of
> course).  Also, they'll show up automatically in tools like psql's \d
> output, pgAdmin, and anything else that is accustomed to being able to
> find constraints in the catalog.  We do need to make sure that those
> constraints can't be dropped (or altered?) inappropriately, but that's
> a relatively small problem.  If we stick with the design you've got
> here, every client tool in the world needs to be updated, and I'm not
> seeing nearly enough advantage in this system to justify that kind of
> upheaval.

As for which parts of the system need to know about these implicit
partition constraints to *enforce* them for data integrity, we could say
that it's really just one site - ExecConstraints() called from
ExecInsert()/ExecUpdate().

Admittedly, the current error message style as in this patch exposes the
implicit constraint approach to a certain criticism: "ERROR:  new row
violates the partition boundary specification of \"%s\"".  It would say
the following if it were a named constraint: "ERROR: new row for relation
\"%s\" violates check constraint \"%s\""

For constraint exclusion optimization, we teach get_relation_constraints()
to look at these constraints.  Although greatly useful, that is not
absolutely critical.

Besides the above two cases, there is a bunch of code (relcache, DDL) that
looks at regular constraints, but IMHO, we need not let any of that code
concern itself with the implicit partition constraints.  Especially, I
wonder why the client tools should want the implicit partitioning
constraint to be shown as a CHECK constraint?  As the proposed patch 0004
(psql) currently does, isn't it better to instead show the partition
bounds as follows?

+CREATE TABLE part_b PARTITION OF parted (
+   b WITH OPTIONS NOT NULL DEFAULT 1 CHECK (b >= 0),
+   CONSTRAINT check_a CHECK (length(a) > 0)
+) FOR VALUES IN ('b');

+\d part_b
+ Table "public.part_b"
+ Column |  Type   | Modifiers
++-+
+ a  | text|
+ b  | integer | not null default 1
+Partition of: parted FOR VALUES IN ('b')
+Check constraints:
+"check_a" CHECK (length(a) > 0)
+"part_b_b_check" CHECK (b >= 0)

Needless to say, that could save a lot of trouble thinking about
generating collision-free names of these constraints, their dependency
handling, unintended altering of these constraints, pg_dump, etc.

> In fact, as far as I can see, the only advantage of this approach is
> that when the insert arrives through the parent and is routed to the
> child by whatever tuple-routing code we end up with (I guess that's
> what 0008 does), we get to skip checking the constraint, saving CPU
> cycles.  That's probably an important optimization, but I don't think
> that putting the partitioning constraint in the catalog in any way
> rules out the possibility of performing that optimization.  It's just
> that instead of having the partitioning excluded-by-default and then
> sometimes choosing to include it, you'll have it included-by-default
> and then sometimes choose to exclude it.

Hmm, doesn't it seem like we would be making *more* modifications to the
existing code (backend or otherwise) to teach it about excluding the
implicit partition constraints than the other way around?  The other way
around being to modify the existing code to include the implicit
constraints which is what this patch is about.

Having said all that, I am open to switching to the catalogued partition
constraints if the arguments I make above in favor of this patch are not
all that sound.

Thanks,
Amit





Re: [HACKERS] Declarative partitioning - another take

2016-11-01 Thread Amit Langote
On 2016/11/02 2:53, Robert Haas wrote:
> On Fri, Oct 28, 2016 at 3:53 AM, Amit Langote
>  wrote:
>>  [ new patches ]
> 
> Reviewing 0005:
> 
> Your proposed commit messages says this:
> 
>> If relation is the target table (UPDATE and DELETE), flattening is
>> done regardless (scared to modify inheritance_planner() yet).

I should have explained my thinking behind why I ended up not handling the
result relation case:

While set_append_rel_size() can be safely invoked recursively on the roots
of partition sub-trees, I was not quite sure if inheritance_planner() is
amenable to such recursive invocation.  While the former is relatively
straightforward baserel processing, the latter is a not-so-straightforward
transformation of the whole query for every target child relation of the
target parent.

> In the immortal words of Frank Herbert: “I must not fear. Fear is the
> mind-killer. Fear is the little-death that brings total obliteration.
> I will face my fear. I will permit it to pass over me and through me.
> And when it has gone past I will turn the inner eye to see its path.
> Where the fear has gone there will be nothing. Only I will remain.”
> 
> In other words, I'm not going to accept fear as a justification for
> randomly-excluding the target-table case from this code.  If there's
> an actual reason not to do this in that case or some other case, then
> let's document that reason.  But weird warts in the code that are
> justified only by nameless anxiety are not good.

Perhaps the above comment expands a little on the actual reason.  Though I
guess it still amounts to giving up on doing a full analysis of whether
recursive processing within inheritance_planner() is feasible.

I think we could just skip this patch as long as a full investigation into
inheritance_planner() issues is not done.  It's possible that there might
be other places in the planner code which are not amenable to the proposed
multi-level AppendRelInfos.  Ashutosh had reported one [1], wherein
lateral joins wouldn't work with a multi-level partitioned table owing to
the multi-level AppendRelInfos (the patch contains a hunk to address that).

> Of course, the prior question is whether we should EVER be doing this.
> I realize that something like this MAY be needed for partition-wise
> join, but the mission of this patch is not to implement partition-wise
> join.  Does anything in this patch series really require this?  If so,
> what?  If not, how about we leave it out and refactor it when that
> change is really needed for something?

Nothing *requires* it as such.  One benefit I see is that exclusion of the
root of some partition sub-tree means the whole sub-tree is excluded in
one go.  Currently, because of the flattening, every relation in that
sub-tree would be excluded separately, needlessly repeating the expensive
constraint exclusion proof. But I guess that's just an optimization.

Thanks,
Amit

[1]
https://www.postgresql.org/message-id/CAFjFpReZF34MDbY95xoATi0xVj2mAry4-LHBWVBayOc8gj%3Diqg%40mail.gmail.com






Re: [HACKERS] Declarative partitioning - another take

2016-11-01 Thread Corey Huinker
On Tue, Nov 1, 2016 at 2:24 PM, Robert Haas  wrote:

> Well, I'm not sure we've exactly reached consensus here, and you're
> making me feel like I just kicked a puppy.
>

It was hyperbole. I hope you found it as funny to read as I did to write.
This is a great feature and I'm not going to make "perfect" the enemy of
"good".


Re: [HACKERS] Declarative partitioning - another take

2016-11-01 Thread Robert Haas
On Tue, Nov 1, 2016 at 2:11 PM, Corey Huinker  wrote:
> On Tue, Nov 1, 2016 at 2:01 PM, Robert Haas  wrote:
>> Yeah.  That syntax has some big advantages, though.  If we define that
>> partition as START ('2014-01-01') INCLUSIVE END ('2014-12-31')
>> INCLUSIVE, there's no way for the system to tell that the there's no
>> gap between the that ending bound and the starting bound of the 2015
>> partition, because the system has no domain-specific knowledge that
>> there is no daylight between 2014-12-31 and 2015-01-01.  So if we
>> allow things to be specified that way, then people will use that
>> syntax and then complain when it doesn't perform quite as well as
>> START ('2014-01-01') END ('2015-01-01').  Maybe the difference isn't
>> material and maybe we don't care; what do you think?
>
> It was a fight I didn't expect to win, and even if we don't get
> [x,x]-expressible partitions, at least we're not in the Oracle
> context-waterfall, where the lower bound of your partition is determined by
> the upper bound of the NEXT partition.

I certainly agree that was a horrible design decision on Oracle's
part.  It's really messy.  If you drop a partition, it changes the
partition constraint for the adjacent partition.  Then you want to add
the partition back, say, but you have to first check whether the
adjacent partition, whose legal range has been expanded, has any
values that are out of bounds.  Which it can't, but you don't know
that, so you have to scan the whole partition.  Aargh!  Maybe this
somehow works out in their system - I'm not an Oracle expert - but I
think it's absolutely vital that we don't replicate it into
PostgreSQL.  (I have some, ahem, first-hand knowledge of how that
works out.)

>> (I really don't want to get tied up adding a system for adding and
>> subtracting one to and from arbitrary data types.  Life is too short.
>> If that requires that users cope with a bit of cognitive dissonance,
>> well, it's not the first time something like that will have happened.
>> I have some cognitive dissonance about the fact that creat(2) has no
>> trailing "e" but truncate(2) does, and moreover the latter can be used
>> to make a file longer rather than shorter.  But, hey, that's what you
>> get for choosing a career in computer science.)
>
> That noise you heard was the sound of my dream dying.

Well, I'm not sure we've exactly reached consensus here, and you're
making me feel like I just kicked a puppy.  However, my goal here is
not to kill your dream but to converge on a committable patch as
quickly as possible.  Adding increment/decrement operators to every
discrete(-ish) type we have is not the way to accomplish that.  To the
contrary, that's just about guaranteed to make this patch take an
extra release cycle to finish.  Now, that does not necessarily mean we
can't keep the INCLUSIVE/EXCLUSIVE stuff -- after all, Amit has
already written it -- but then we have to live with the fact that
+1/-1 based optimizations and matching are not going to be there.
Whether it's still worth having fully-open and fully-closed ranges on
general principle -- and whether the lack of such optimizations
changes the calculus -- is what we are now debating.

More votes welcome.  Right now I count 2 votes for keeping the
inclusive-exclusive stuff (Amit, Corey) and two for nuking it from
orbit (Robert, Francisco).  I'm not going to put my foot down on this
point against a contrary consensus, so please chime in.  Thanks.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Declarative partitioning - another take

2016-11-01 Thread Francisco Olarte
Robert:

On Tue, Nov 1, 2016 at 7:09 PM, Robert Haas  wrote:
> In defense of Corey's position, that's not so easy.  First, \0 doesn't
> work; our strings can't include null bytes.  Second, the minimum legal
> character depends on the collation in use.  It's not so easy to figure
> out what the "next" string is, even though there necessarily must be
> one.

I'm aware of that, just wanted to point out that it can be done on strings.

> I think we're all in agreement that half-open intervals should not
> only be allowed, but the default.  The question is whether it's a good
> idea to also allow other possibilities.

In my experience, people continuously misuse them. I would especially
like to have them disallowed on timestamp columns ( and other
real-like data, including numeric ). But knowing they cannot do a few
things, and some others are easier with them, is enough for allowing
them as an explicit non-default for me.

Francisco Olarte.




Re: [HACKERS] Declarative partitioning - another take

2016-11-01 Thread Corey Huinker
>
> OTOH I've seen a lot of people bitten by [2014-01-01,2014-12-31] on
> TIMESTAMP intervals.
>

No argument there.


> Everybody remembers December has 31 days, but when we have to do
> MONTHLY partitions, if you use closed intervals someone always miskeys
> the number of days, or forgets whether a particular year is leap or
> not, and when doing it automatically I always have to code it as start
> + 1 month - 1 day. In my experience having the non-significant part of
> the dates ( days in monthly case, months too in yearly cases ) both 1
> and equal in start and end makes it easier to check and identify, and
> less error prone.
>

Being able to define partitions by expressions based on the values I want
would be a good thing.


> You just do the classical ( I've had to do it ) closed end || minimum
> char ( "XYZ","XYZ\0" in this case ). It is not that difficult as
> strings have a global order, the next string to any one is always that
> plus the \0, or whatever your minimum is.
>

I've thought about doing that in the past, but wasn't sure it would
actually work correctly. If you have experience with it doing so, that
would be encouraging. How does that *look* when you print your partition
layout though?


Re: [HACKERS] Declarative partitioning - another take

2016-11-01 Thread Francisco Olarte
On Tue, Nov 1, 2016 at 7:01 PM, Robert Haas  wrote:
> In the end, keywords are not the defining issue here; the issue is
> whether all of this complexity around inclusive and exclusive bounds
> carries its weight, and whether we want to be committed to that.
>
> Any other opinions out there?

If it were up to me I would opt for just half-open intervals. The only
problem I've ever had with them is when working with FINITE ranges,
i.e., there is no way of expressing the range of an 8-bit integer with
half-open intervals of 8-bit integers, but I would happily pay that
cost for the benefits of not having people unintentionally make
non-contiguous date/timestamp intervals, which I periodically suffer.
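
An analogous illustration in PostgreSQL terms (a sketch with smallint
standing in for an 8-bit type, using this thread's proposed syntax):

create table t (k smallint) partition by range (k);
create table t_all partition of t
  for values start (-32768) end (32767);  -- can never admit k = 32767
-- including 32767 would need end (32768), which doesn't fit in smallint
-- (an unbounded end, if specified, sidesteps this)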

Francisco Olarte.




Re: [HACKERS] Declarative partitioning - another take

2016-11-01 Thread Corey Huinker
On Tue, Nov 1, 2016 at 2:01 PM, Robert Haas  wrote:

> Yeah.  That syntax has some big advantages, though.  If we define that
> partition as START ('2014-01-01') INCLUSIVE END ('2014-12-31')
> INCLUSIVE, there's no way for the system to tell that there's no
> gap between that ending bound and the starting bound of the 2015
> partition, because the system has no domain-specific knowledge that
> there is no daylight between 2014-12-31 and 2015-01-01.  So if we
> allow things to be specified that way, then people will use that
> syntax and then complain when it doesn't perform quite as well as
> START ('2014-01-01') END ('2015-01-01').  Maybe the difference isn't
> material and maybe we don't care; what do you think?
>

It was a fight I didn't expect to win, and even if we don't get
[x,x]-expressible partitions, at least we're not in the Oracle
context-waterfall, where the lower bound of your partition is determined by
the upper bound of the NEXT partition.

(I really don't want to get tied up adding a system for adding and
> subtracting one to and from arbitrary data types.  Life is too short.
> If that requires that users cope with a bit of cognitive dissonance,
> well, it's not the first time something like that will have happened.
> I have some cognitive dissonance about the fact that creat(2) has no
> trailing "e" but truncate(2) does, and moreover the latter can be used
> to make a file longer rather than shorter.  But, hey, that's what you
> get for choosing a career in computer science.)
>

That noise you heard was the sound of my dream dying.


Re: [HACKERS] Declarative partitioning - another take

2016-11-01 Thread Robert Haas
On Tue, Nov 1, 2016 at 2:05 PM, Francisco Olarte  wrote:
>> /me raises hand.  We have tables with a taxonomy in them where the even data
>> splits don't fall on single letter boundaries, and often the single string
>> values have more rows than entire letters. In those situations, being able
>> to express ['XYZ','XYZ'] is important.  ['XYZ,'XZ') would let 'XYZ1' bleed
>> into the partition and ['XYZ','XYZ1') lets in other values, and so I go
>> chasing down the non-discrete set rabbit hole.
>
> You just do the classical ( I've had to do it ) closed end || minimum
> char ( "XYZ","XYZ\0" in this case ). It is not that difficult as
> strings have a global order, the next string to any one is always that
> plus the \0, or whatever your minimum is.

In defense of Corey's position, that's not so easy.  First, \0 doesn't
work; our strings can't include null bytes.  Second, the minimum legal
character depends on the collation in use.  It's not so easy to figure
out what the "next" string is, even though there necessarily must be
one.

> The problem is with anything similar to a real number, but then there
> I've always opted for half-open intervals, as they can cover the line
> without overlapping, unlike closed ones.
>
> Anyway, as long as anyone makes sure HALF-OPEN intervals are allowed,
> I'm fine ( I do not remember the name, but once had to work with a
> system that only allowed closed or open and it was a real PITA ).

I think we're all in agreement that half-open intervals should not
only be allowed, but the default.  The question is whether it's a good
idea to also allow other possibilities.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Declarative partitioning - another take

2016-11-01 Thread Francisco Olarte
On Tue, Nov 1, 2016 at 6:49 PM, Corey Huinker  wrote:
>
> On Tue, Nov 1, 2016 at 12:57 PM, Robert Haas  wrote:
>> For strings and numeric types that are not integers, there is in
>> theory a loss of power.  If you want a partition that allows every
>> value starting with 'a' plus the string 'b' but not anything after
>> that, you are out of luck.  START ('a') END ('b') INCLUSIVE would have
>> done exactly what you want, but now you need to store the first string
>> that you *don't* want to include in that partition, and what's that?
>> Dunno.  Or similarly if you want to store everything from 1.0 up to
>> and including 2.0 but nothing higher, you can't, really.
> Exactly. This is especially true for date ranges. There's a lot of cognitive
> dissonance in defining the "2014" partition as < '2015-01-01', as was the
> case in Oracle waterfall-style partitioning. That was my reasoning for
> pushing for range-ish syntax as well as form.

OTOH I've seen a lot of people bitten by [2014-01-01,2014-12-31] on
TIMESTAMP intervals.

Everybody remembers December has 31 days, but when we have to do
MONTHLY partitions, if you use closed intervals someone always miskeys
the number of days, or forgets whether a particular year is leap or
not, and when doing it automatically I always have to code it as start
+ 1 month - 1 day. In my experience having the non-significant part of
the dates ( days in monthly case, months too in yearly cases ) both 1
and equal in start and end makes it easier to check and identify, and
less error prone.
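
A sketch of the half-open form being advocated (this thread's proposed
syntax, hypothetical names):

create table logs_2016_02 partition of logs
  for values start ('2016-02-01') end ('2016-03-01');
-- the end bound is simply the first day of the next month; no need to
-- remember month lengths or leap years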

>> But who wants that?  People who are doing prefix-based partitioning of
>> their text keys are going to want all of the 'a' things together, and
>> all of the 'b' things in another category.  Same for ranges of
>> floating-point numbers, which are also probably an unlikely candidate
>> for a partitioning key anyway.
> /me raises hand.  We have tables with a taxonomy in them where the even data
> splits don't fall on single letter boundaries, and often the single string
> values have more rows than entire letters. In those situations, being able
> to express ['XYZ','XYZ'] is important.  ['XYZ','XZ') would let 'XYZ1' bleed
> into the partition and ['XYZ','XYZ1') lets in other values, and so I go
> chasing down the non-discrete set rabbit hole.

You just do the classical ( I've had to do it ) closed end || minimum
char ( "XYZ","XYZ\0" in this case ). It is not that difficult as
strings have a global order, the next string to any one is always that
plus the \0, or whatever your minimum is.

The problem is with anything similar to a real number, but then there
I've always opted for half-open intervals, as they can cover the line
without overlapping, unlike closed ones.

Anyway, as long as anyone makes sure HALF-OPEN intervals are allowed,
I'm fine ( I do not remember the name, but once had to work with a
system that only allowed closed or open and it was a real PITA ).

Francisco Olarte.




Re: [HACKERS] Declarative partitioning - another take

2016-11-01 Thread Robert Haas
On Tue, Nov 1, 2016 at 1:49 PM, Corey Huinker  wrote:
> Exactly. This is especially true for date ranges. There's a lot of cognitive
> dissonance in defining the "2014" partition as < '2015-01-01', as was the
> case in Oracle waterfall-style partitioning. That was my reasoning for
> pushing for range-ish syntax as well as form.

Yeah.  That syntax has some big advantages, though.  If we define that
partition as START ('2014-01-01') INCLUSIVE END ('2014-12-31')
INCLUSIVE, there's no way for the system to tell that there's no
gap between that ending bound and the starting bound of the 2015
partition, because the system has no domain-specific knowledge that
there is no daylight between 2014-12-31 and 2015-01-01.  So if we
allow things to be specified that way, then people will use that
syntax and then complain when it doesn't perform quite as well as
START ('2014-01-01') END ('2015-01-01').  Maybe the difference isn't
material and maybe we don't care; what do you think?
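
For reference, the two alternative definitions of the same partition being
compared (a sketch; INCLUSIVE/EXCLUSIVE is the syntax under discussion here,
not committed syntax):

-- closed upper bound: the system cannot prove this abuts the 2015 partition
create table data_2014 partition of data
  for values start ('2014-01-01') inclusive end ('2014-12-31') inclusive;

-- half-open upper bound: adjacency with start ('2015-01-01') is provable
create table data_2014 partition of data
  for values start ('2014-01-01') end ('2015-01-01');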

(I really don't want to get tied up adding a system for adding and
subtracting one to and from arbitrary data types.  Life is too short.
If that requires that users cope with a bit of cognitive dissonance,
well, it's not the first time something like that will have happened.
I have some cognitive dissonance about the fact that creat(2) has no
trailing "e" but truncate(2) does, and moreover the latter can be used
to make a file longer rather than shorter.  But, hey, that's what you
get for choosing a career in computer science.)

>> But who wants that?  People who are doing prefix-based partitioning of
>> their text keys are going to want all of the 'a' things together, and
>> all of the 'b' things in another category.  Same for ranges of
>> floating-point numbers, which are also probably an unlikely candidate
>> for a partitioning key anyway.
>
> /me raises hand.  We have tables with a taxonomy in them where the even data
> splits don't fall on single letter boundaries, and often the single string
> values have more rows than entire letters. In those situations, being able
> to express ['XYZ','XYZ'] is important.  ['XYZ','XZ') would let 'XYZ1' bleed
> into the partition and ['XYZ','XYZ1') lets in other values, and so I go
> chasing down the non-discrete set rabbit hole.

Hmm.  I have to admit that I hadn't considered the case where you have
a range partitioning scheme but one of the ranges includes only a
single string.  If that's an important use case, that might be a fatal
problem with my proposal.  :-(

> If we're worried about keywords, maybe a BOUNDED '[]' clause?

In the end, keywords are not the defining issue here; the issue is
whether all of this complexity around inclusive and exclusive bounds
carries its weight, and whether we want to be committed to that.

Any other opinions out there?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Declarative partitioning - another take

2016-11-01 Thread Robert Haas
On Fri, Oct 28, 2016 at 3:53 AM, Amit Langote
 wrote:
>  [ new patches ]

Reviewing 0005:

Your proposed commit messages says this:

> If relation is the target table (UPDATE and DELETE), flattening is
> done regardless (scared to modify inheritance_planner() yet).

In the immortal words of Frank Herbert: “I must not fear. Fear is the
mind-killer. Fear is the little-death that brings total obliteration.
I will face my fear. I will permit it to pass over me and through me.
And when it has gone past I will turn the inner eye to see its path.
Where the fear has gone there will be nothing. Only I will remain.”

In other words, I'm not going to accept fear as a justification for
randomly excluding the target-table case from this code.  If there's
an actual reason not to do this in that case or some other case, then
let's document that reason.  But weird warts in the code that are
justified only by nameless anxiety are not good.

Of course, the prior question is whether we should EVER be doing this.
I realize that something like this MAY be needed for partition-wise
join, but the mission of this patch is not to implement partition-wise
join.  Does anything in this patch series really require this?  If so,
what?  If not, how about we leave it out and refactor it when that
change is really needed for something?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Declarative partitioning - another take

2016-11-01 Thread Corey Huinker
On Tue, Nov 1, 2016 at 12:57 PM, Robert Haas  wrote:

> For strings and numeric types that are not integers, there is in
> theory a loss of power.  If you want a partition that allows every
> value starting with 'a' plus the string 'b' but not anything after
> that, you are out of luck.  START ('a') END ('b') INCLUSIVE would have
> done exactly what you want, but now you need to store the first string
> that you *don't* want to include in that partition, and what's that?
> Dunno.  Or similarly if you want to store everything from 1.0 up to
> and including 2.0 but nothing higher, you can't, really.
>
>
Exactly. This is especially true for date ranges. There's a lot of
cognitive dissonance in defining the "2014" partition as < '2015-01-01', as
was the case in Oracle waterfall-style partitioning. That was my reasoning
for pushing for range-ish syntax as well as form.


> But who wants that?  People who are doing prefix-based partitioning of
> their text keys are going to want all of the 'a' things together, and
> all of the 'b' things in another category.  Same for ranges of
> floating-point numbers, which are also probably an unlikely candidate
> for a partitioning key anyway.
>

/me raises hand.  We have tables with a taxonomy in them where the even
data splits don't fall on single letter boundaries, and often the single
string values have more rows than entire letters. In those situations,
being able to express ['XYZ','XYZ'] is important.  ['XYZ','XZ') would let
'XYZ1' bleed into the partition and ['XYZ','XYZ1') lets in other values,
and so I go chasing down the non-discrete set rabbit hole.

If we're worried about keywords, maybe a BOUNDED '[]' clause?


Re: [HACKERS] Declarative partitioning - another take

2016-11-01 Thread Robert Haas
On Fri, Oct 28, 2016 at 3:53 AM, Amit Langote
 wrote:
>> Insisting that you can't drop a child without detaching it first seems
>> wrong to me.  If I already made this comment and you responded to it,
>> please point me back to whatever you said.  However, my feeling is
>> this is flat wrong and absolutely must be changed.
>
> I said the following [1]:
>
> | Hmm, I don't think I like this.  Why should it be necessary to detach
> | a partition before dropping it?  That seems like an unnecessary step.
>
> I thought we had better lock the parent table when removing one of its
> partitions and it seemed a bit odd to lock the parent table when dropping
> a partition using DROP TABLE?  OTOH, with ALTER TABLE parent DETACH
> PARTITION, the parent table is locked anyway.

That "OTOH" part seems like a pretty relevant point.

Basically, I think people expect to be able to say "DROP THINGTYPE
thingname" or at most "DROP THINGTYPE thingname CASCADE" and have that
thing go away.  I'm opposed to anything which requires some other
series of steps without a very good reason, and possible surprise
about the precise locks that the command requires isn't a good enough
reason from my point of view.

>> I wonder if it's really a good idea for the partition constraints to
>> be implicit; what is the benefit of leaving those uncatalogued?
>
> I did start out that way - ie, catalogued implicit constraints, but later
> thought it might not be good to end up with multiple copies of essentially
> the same information.  Cataloguing them would bring along dependencies
> and touch all the places that know about pg_constraint.
>
> In the long term, I think we're only going to need them because we want to
> enforce them when directly inserting data into partitions.

See my other email on this topic.  I agree there are some complexities
here, including making sure that pg_dump does the right thing.  But I
think it's the right way to go.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Declarative partitioning - another take

2016-11-01 Thread Robert Haas
On Fri, Oct 28, 2016 at 3:53 AM, Amit Langote
 wrote:
> [ new patches ]

Reviewing 0006:

This patch seems scary.  I sort of assumed from the title -- "Teach a
few places to use partition check quals." -- that this was an optional
thing, some kind of optimization from which we could reap further
advantage once the basic infrastructure was in place.  But it's not
that at all.  It's absolutely necessary that we do this, or data
integrity is fundamentally compromised.  How do we know that we've
found all of the places that need to be taught about these new,
uncatalogued constraints?

I'm feeling fairly strongly like you should rewind and make the
partitioning constraints normal catalogued constraints.  That's got a
number of advantages, most notably that we can be sure they will be
properly enforced by the entire system (modulo existing bugs, of
course).  Also, they'll show up automatically in tools like psql's \d
output, pgAdmin, and anything else that is accustomed to being able to
find constraints in the catalog.  We do need to make sure that those
constraints can't be dropped (or altered?) inappropriately, but that's
a relatively small problem.  If we stick with the design you've got
here, every client tool in the world needs to be updated, and I'm not
seeing nearly enough advantage in this system to justify that kind of
upheaval.

In fact, as far as I can see, the only advantage of this approach is
that when the insert arrives through the parent and is routed to the
child by whatever tuple-routing code we end up with (I guess that's
what 0008 does), we get to skip checking the constraint, saving CPU
cycles.  That's probably an important optimization, but I don't think
that putting the partitioning constraint in the catalog in any way
rules out the possibility of performing that optimization.  It's just
that instead of having the partitioning excluded-by-default and then
sometimes choosing to include it, you'll have it included-by-default
and then sometimes choose to exclude it.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Declarative partitioning - another take

2016-11-01 Thread Robert Haas
On Fri, Oct 28, 2016 at 3:53 AM, Amit Langote
 wrote:
>> 4. I'm somewhat wondering if we ought to just legislate that the lower
>> bound is always inclusive and the upper bound is always exclusive.
>> The decision to support both inclusive and exclusive partition bounds
>> is responsible for an enormous truckload of complexity in this patch,
>> and I have a feeling it is also going to be a not-infrequent cause of
>> user error.
>
> I thought we decided at some point to go with range type like notation to
> specify range partition bound because of its flexibility.  I agree though
> that with that flexibility, there will be more input combinations that will
> cause errors.  As for the internal complexity, it's not clear to me whether
> it will be reduced by always-inclusive lower and always-exclusive upper
> bounds.  We would still need to store the inclusive flag with individual
> PartitionRangeBound and consider it when comparing them with each other
> and with partition key of tuples.

We did.  But I now think that was kind of silly.  I mean, we also
talked about not having a hard distinction between list partitioning
and range partitioning, but you didn't implement it that way and I
think that's probably a good thing.  The question is - what's the
benefit of allowing this to be configurable?

For integers, there's absolutely no difference in expressive power.
If you want to allow 1000-2000 with both bounds inclusive, you can
just say START (1000) END (2001) instead of START (1000) END (2000)
INCLUSIVE.  This is also true for any other datatype where it makes
sense to talk about "the next value" and "the previous value".
Instead of making the upper bound inclusive, you can just end at the
next value instead.  If you were tempted to make the lower bound
exclusive, same thing.
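
For example, in the syntax under discussion (hypothetical names; the two
forms below are meant to admit exactly the same integer keys, and only
the second would remain if INCLUSIVE were dropped):

  CREATE TABLE m_1k PARTITION OF m
      FOR VALUES START (1000) END (2000) INCLUSIVE;
  -- equivalent half-open formulation:
  CREATE TABLE m_1k PARTITION OF m
      FOR VALUES START (1000) END (2001);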

For strings and numeric types that are not integers, there is in
theory a loss of power.  If you want a partition that allows every
value starting with 'a' plus the string 'b' but not anything after
that, you are out of luck.  START ('a') END ('b') INCLUSIVE would have
done exactly what you want, but now you need to store the first string
that you *don't* want to include in that partition, and what's that?
Dunno.  Or similarly if you want to store everything from 1.0 up to
and including 2.0 but nothing higher, you can't, really.

But who wants that?  People who are doing prefix-based partitioning of
their text keys are going to want all of the 'a' things together, and
all of the 'b' things in another category.  Same for ranges of
floating-point numbers, which are also probably an unlikely candidate
for a partitioning key anyway.

So let's look at the other side.  What do we gain by excluding this
functionality?  Well, we save a parser keyword: INCLUSIVE no longer
needs to be a keyword.  We also can save some code in
make_one_range_bound(), RelationBuildPartitionDesc(),
copy_range_bound(), partition_rbound_cmp(), and partition_rbound_eq().

Also, if we exclude this now as I'm proposing, we can always add it
back later if it turns out that people need it.  On the other hand, if
we include it in the first version, it's going to be very hard to get
rid of it if it turns out we don't want it.  Once we release support
for a syntax, we're kinda stuck with it.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Declarative partitioning - another take

2016-10-26 Thread Amit Langote
On 2016/10/27 3:13, Robert Haas wrote:
> On Tue, Oct 25, 2016 at 9:06 PM, Amit Langote
>  wrote:
>> I will include these changes in the next version of patches I will post
>> soon in reply to [1].
>> [1]
>> https://www.postgresql.org/message-id/CA%2BTgmoYJcUTcN7vVgg54GHtffH11JJWYZnfF4KiRxjV-iaACQg%40mail.gmail.com
> 
> How soon?  Tempus fugit, and tempus in this release cycle fugit
> particularly quickly.  It looks like the last revision of these
> patches was on October 7th, and that's now more than 2 weeks ago.  We
> need to get moving here if this is going to happen this cycle.

Sorry about the delay, I'll post no later than tomorrow.

Thanks,
Amit






Re: [HACKERS] Declarative partitioning - another take

2016-10-26 Thread Robert Haas
On Tue, Oct 25, 2016 at 10:00 PM, Amit Langote
 wrote:
> I am not (or no longer) sure how that argument affects INSERT on
> partitioned tables with tuple-routing though.  Are partitions at all
> levels *implicitly specified to be affected* when we say INSERT INTO
> root_partitioned_table?

I'd say yes.

>> Some of the other things that we might want to consider disallowing on
>> parent table could be:
>> a. Policy on table_name
>
> Perhaps.  Since there are no rows in the parent table(s) itself of a
> partition hierarchy, it might not make sense to continue to allow creating
> row-level security policies on them.

No, per my previous email.  Those policies are emphatically not without effect.

>> b. Alter table has many clauses, are all of those allowed and will it
>> make sense to allow them?
>
> Currently, we only disallow the following with partitioned parent tables
> as far as alter table is concerned.
>
> - cannot change inheritance by ALTER TABLE partitioned_table INHERIT ...
>
> - cannot let them be regular inheritance parents either - that is, the
>   following is disallowed: ALTER TABLE some_table INHERIT partitioned_table
>
> - cannot create UNIQUE, PRIMARY KEY, FOREIGN KEY, EXCLUDE constraints
>
> - cannot drop column involved in the partitioning key
>
> Most other forms that affect attributes and constraints follow the regular
> inheritance behavior (recursion) with certain exceptions such as:
>
> - cannot add/drop an attribute or check constraint to/from *only*
>   the parent
>
> - cannot add/drop NOT NULL constraint to/from *only* the parent
>
> Thoughts?

Seems sensible to me.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Declarative partitioning - another take

2016-10-26 Thread Robert Haas
On Tue, Oct 25, 2016 at 9:06 PM, Amit Langote
 wrote:
> I will include these changes in the next version of patches I will post
> soon in reply to [1].
> [1]
> https://www.postgresql.org/message-id/CA%2BTgmoYJcUTcN7vVgg54GHtffH11JJWYZnfF4KiRxjV-iaACQg%40mail.gmail.com

How soon?  Tempus fugit, and tempus in this release cycle fugit
particularly quickly.  It looks like the last revision of these
patches was on October 7th, and that's now more than 2 weeks ago.  We
need to get moving here if this is going to happen this cycle.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Declarative partitioning - another take

2016-10-26 Thread Robert Haas
On Tue, Oct 25, 2016 at 2:58 AM, Amit Kapila  wrote:
> a. Policy on table_name

No, because queries against the parent will apply the policy to
children.  See today's commit
162477a63d3c0fd1c31197717140a88077a8d0aa.
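
A minimal sketch of why such a policy is not without effect (hypothetical
names, and assuming the usual rule that the policies of the table named in
the query are the ones applied):

  ALTER TABLE parent ENABLE ROW LEVEL SECURITY;
  CREATE POLICY per_owner ON parent USING (owner = current_user);
  -- SELECT * FROM parent; now filters the rows coming back from every
  -- child, even though the rows are physically stored in the children.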

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Declarative partitioning - another take

2016-10-26 Thread Robert Haas
On Wed, Oct 26, 2016 at 6:12 AM, Amit Kapila  wrote:
> On Wed, Oct 26, 2016 at 3:04 PM, Amit Langote
>  wrote:
>> On 2016/10/26 17:57, Amit Kapila wrote:
>>> @@ -123,6 +123,9 @@ typedef struct RelationData
>>> {
>>> ..
>>>   MemoryContext rd_partkeycxt; /* private memory cxt for the below */
>>>   struct PartitionKeyData *rd_partkey; /* partition key, or NULL */
>>> + MemoryContext rd_pdcxt; /* private context for partdesc */
>>> + struct PartitionDescData *rd_partdesc; /* partitions, or NULL */
>>> + List   *rd_partcheck; /* partition CHECK quals */
>>> ..
>>> }
>>>
>>> I think one thing to consider here is the increase in size of relcache
>>> due to PartitionDescData.  I think it will be quite useful in some of
>>> the cases like tuple routing.  Isn't it feasible to get it in some
>>> other way, maybe by using relpartbound from the pg_class tuple?
>>
>> Whereas pg_class.relpartbound stores partition bound of the *individual
>> partitions* in Node form, the above relcache struct is associated with
>> parent tables; it contains some efficient to use (and fairly compact)
>> representation of bounds of *all* the partitions of the parent.
>>
>
> Okay, but it will still be proportional to the number of partitions and
> partition keys.  Is it feasible to store ranges only for partitions
> that are actively accessed?  For example, consider a table with 100
> partitions where the first access to the table needs the 5th
> partition; we could then store ranges for just the first five
> partitions, or something like that.  This could be helpful if active
> partitions are far fewer than the total partitions of a table.

I have serious doubt about whether it's a good idea to do that EVER,
but it certainly doesn't need to be in the first version of this
patch.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Declarative partitioning - another take

2016-10-26 Thread Amit Kapila
On Wed, Oct 26, 2016 at 3:04 PM, Amit Langote
 wrote:
> On 2016/10/26 17:57, Amit Kapila wrote:
>> @@ -123,6 +123,9 @@ typedef struct RelationData
>> {
>> ..
>>   MemoryContext rd_partkeycxt; /* private memory cxt for the below */
>>   struct PartitionKeyData *rd_partkey; /* partition key, or NULL */
>> + MemoryContext rd_pdcxt; /* private context for partdesc */
>> + struct PartitionDescData *rd_partdesc; /* partitions, or NULL */
>> + List   *rd_partcheck; /* partition CHECK quals */
>> ..
>> }
>>
>> I think one thing to consider here is the increase in size of relcache
>> due to PartitionDescData.  I think it will be quite useful in some of
>> the cases like tuple routing.  Isn't it feasible to get it in some
>> other way, maybe by using relpartbound from the pg_class tuple?
>
> Whereas pg_class.relpartbound stores partition bound of the *individual
> partitions* in Node form, the above relcache struct is associated with
> parent tables; it contains some efficient to use (and fairly compact)
> representation of bounds of *all* the partitions of the parent.
>

Okay, but it will still be proportional to the number of partitions and
partition keys.  Is it feasible to store ranges only for partitions
that are actively accessed?  For example, consider a table with 100
partitions where the first access to the table needs the 5th
partition; we could then store ranges for just the first five
partitions, or something like that.  This could be helpful if active
partitions are far fewer than the total partitions of a table.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] Declarative partitioning - another take

2016-10-26 Thread Amit Langote
On 2016/10/26 17:57, Amit Kapila wrote:
> @@ -123,6 +123,9 @@ typedef struct RelationData
> {
> ..
>   MemoryContext rd_partkeycxt; /* private memory cxt for the below */
>   struct PartitionKeyData *rd_partkey; /* partition key, or NULL */
> + MemoryContext rd_pdcxt; /* private context for partdesc */
> + struct PartitionDescData *rd_partdesc; /* partitions, or NULL */
> + List   *rd_partcheck; /* partition CHECK quals */
> ..
> }
> 
> I think one thing to consider here is the increase in size of relcache
> due to PartitionDescData.  I think it will be quite useful in some of
> the cases like tuple routing.  Isn't it feasible to get it in some
> other way, maybe by using relpartbound from the pg_class tuple?

Whereas pg_class.relpartbound stores the partition bound of *individual
partitions* in Node form, the above relcache struct is associated with
parent tables; it contains an efficient-to-use (and fairly compact)
representation of the bounds of *all* the partitions of the parent.
Consider, for example, an array of sorted range bounds for a
range-partitioned table.

Thanks,
Amit






Re: [HACKERS] Declarative partitioning - another take

2016-10-26 Thread Amit Kapila
On Wed, Oct 5, 2016 at 7:20 AM, Robert Haas  wrote:
> On Tue, Oct 4, 2016 at 4:02 AM, Amit Langote
>  wrote:
>> [ latest patch set ]
>
> Reviewing 0003:
>
>
> 5. I wonder how well this handles multi-column partition keys.  You've
> just got one Datum flag and one is-finite flag per partition, but I
> wonder if you don't need to track is-finite on a per-column basis, so
> that you could partition on (a, b) and have the first partition go up
> to (10, 10), the second to (10, infinity), the third to (20, 10), the
> fourth to (20, infinity), and the last to (infinity, infinity).  FWIW,
> Oracle supports this sort of thing, so perhaps we should, too.  On a
> related note, I'm not sure it's going to work to treat a composite
> partition key as a record type.  The user may want to specify a
> separate opfamily and collation for each column, not just inherit
> whatever the record behavior is.  I'm not sure if that's what you are
> doing, but the relcache structures don't seem adapted to storing one
> Datum per partitioning column per partition, but rather just one Datum
> per partition, and I'm not sure that's going to work very well.
>

@@ -123,6 +123,9 @@ typedef struct RelationData
{
..
  MemoryContext rd_partkeycxt; /* private memory cxt for the below */
  struct PartitionKeyData *rd_partkey; /* partition key, or NULL */
+ MemoryContext rd_pdcxt; /* private context for partdesc */
+ struct PartitionDescData *rd_partdesc; /* partitions, or NULL */
+ List   *rd_partcheck; /* partition CHECK quals */
..
}

I think one thing to consider here is the increase in size of relcache
due to PartitionDescData.  I think it will be quite useful in some of
the cases like tuple routing.  Isn't it feasible to get it in some
other way, maybe by using relpartbound from the pg_class tuple?
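
For reference, a sketch of what reading that back from the catalog could
look like under the proposed patches (assuming the relpartbound column
they add to pg_class; names are hypothetical):

  SELECT c.relname, pg_get_expr(c.relpartbound, c.oid) AS bound
  FROM pg_class c
  WHERE c.relpartbound IS NOT NULL;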


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] Declarative partitioning - another take

2016-10-25 Thread Amit Langote
On 2016/10/26 12:09, Amit Kapila wrote:
> On Wed, Oct 26, 2016 at 8:27 AM, Amit Langote
>  wrote:
>> On 2016/10/26 11:41, Amit Kapila wrote:
>>> On Wed, Oct 26, 2016 at 6:36 AM, Amit Langote
>>>  wrote:

 Sure, CopyTo() can be taught to scan leaf partitions when a partitioned
 table is specified, but I thought this may be fine initially.
>>>
>>> Okay, I don't want to add anything to your existing work unless it is
>>> important.  However, I think there should be some agreement on which
>>> of the restrictions are okay for first version of patch.  This can
>>> avoid such questions in future from other reviewers.
>>
>> OK, so I assume you don't find this particular restriction problematic in
>> the long term.
> 
> I think you can keep it as you have in patch.  After posting your
> updated patches, please do send a list of restrictions which this
> patch is imposing based on the argument that for first version they
> are not essential.

OK, agreed that it will be better to have all such restrictions and
limitations of the first version listed in one place, rather than being
scattered across different emails where they might have been mentioned and
discussed.

I will try to include such a list when posting the latest set of patches.

Thanks,
Amit






Re: [HACKERS] Declarative partitioning - another take

2016-10-25 Thread Amit Kapila
On Wed, Oct 26, 2016 at 8:27 AM, Amit Langote
 wrote:
> On 2016/10/26 11:41, Amit Kapila wrote:
>> On Wed, Oct 26, 2016 at 6:36 AM, Amit Langote
>>  wrote:
 1.
 @@ -1775,6 +1775,12 @@ BeginCopyTo(ParseState *pstate,
 {
 ..
 + else if (rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
 + ereport(ERROR,
 + (errcode(ERRCODE_WRONG_OBJECT_TYPE),
 + errmsg("cannot copy from partitioned table \"%s\"",
 + RelationGetRelationName(rel)),
 + errhint("Try the COPY (SELECT ...) TO variant.")));
 ..
 }

 Why is this restriction?  Won't it be useful to allow it for the cases
 when user wants to copy the data of all the partitions?
>>>
>>> Sure, CopyTo() can be taught to scan leaf partitions when a partitioned
>>> table is specified, but I thought this may be fine initially.
>>>
>>
>> Okay, I don't want to add anything to your existing work unless it is
>> important.  However, I think there should be some agreement on which
>> of the restrictions are okay for first version of patch.  This can
>> avoid such questions in future from other reviewers.
>
> OK, so I assume you don't find this particular restriction problematic in
> the long term.
>

I think you can keep it as you have in patch.  After posting your
updated patches, please do send a list of restrictions which this
patch is imposing based on the argument that for first version they
are not essential.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] Declarative partitioning - another take

2016-10-25 Thread Amit Langote
On 2016/10/26 11:41, Amit Kapila wrote:
> On Wed, Oct 26, 2016 at 6:36 AM, Amit Langote
>  wrote:
>>> 1.
>>> @@ -1775,6 +1775,12 @@ BeginCopyTo(ParseState *pstate,
>>> {
>>> ..
>>> + else if (rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
>>> + ereport(ERROR,
>>> + (errcode(ERRCODE_WRONG_OBJECT_TYPE),
>>> + errmsg("cannot copy from partitioned table \"%s\"",
>>> + RelationGetRelationName(rel)),
>>> + errhint("Try the COPY (SELECT ...) TO variant.")));
>>> ..
>>> }
>>>
>>> Why is this restriction?  Won't it be useful to allow it for the cases
>>> when user wants to copy the data of all the partitions?
>>
>> Sure, CopyTo() can be taught to scan leaf partitions when a partitioned
>> table is specified, but I thought this may be fine initially.
>>
> 
> Okay, I don't want to add anything to your existing work unless it is
> important.  However, I think there should be some agreement on which
> of the restrictions are okay for first version of patch.  This can
> avoid such questions in future from other reviewers.

OK, so I assume you don't find this particular restriction problematic in
the long term.

>>> 2.
>>> + if (!pg_strcasecmp(stmt->partspec->strategy, "list") &&
>>> + partnatts > 1)
>>> + ereport(ERROR,
>>> + (errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
>>> + errmsg("cannot list partition using more than one column")));
>>>
>>> /cannot list/cannot use list
>>
>> Actually "list partition" works here as a verb, as in "to list partition".
>>
> 
> I am not an expert on this matter, so probably someone having a better
> grip can comment.  Are we using something similar in any other error
> message?

In fact, I changed to the current text after Robert suggested the same [1].

Thanks,
Amit

[1]
https://www.postgresql.org/message-id/CA%2BTgmoaPxXJ14eDVia514UiuQAXyZGqfbz8Qg3G4a8Rz2gKF7w%40mail.gmail.com






Re: [HACKERS] Declarative partitioning - another take

2016-10-25 Thread Amit Kapila
On Wed, Oct 26, 2016 at 6:36 AM, Amit Langote
 wrote:
>> 1.
>> @@ -1775,6 +1775,12 @@ BeginCopyTo(ParseState *pstate,
>> {
>> ..
>> + else if (rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
>> + ereport(ERROR,
>> + (errcode(ERRCODE_WRONG_OBJECT_TYPE),
>> + errmsg("cannot copy from partitioned table \"%s\"",
>> + RelationGetRelationName(rel)),
>> + errhint("Try the COPY (SELECT ...) TO variant.")));
>> ..
>> }
>>
>> Why is this restriction?  Won't it be useful to allow it for the cases
>> when user wants to copy the data of all the partitions?
>
> Sure, CopyTo() can be taught to scan leaf partitions when a partitioned
> table is specified, but I thought this may be fine initially.
>

Okay, I don't want to add anything to your existing work unless it is
important.  However, I think there should be some agreement on which
of the restrictions are okay for the first version of the patch.  This can
avoid such questions from other reviewers in the future.


>> 2.
>> + if (!pg_strcasecmp(stmt->partspec->strategy, "list") &&
>> + partnatts > 1)
>> + ereport(ERROR,
>> + (errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
>> + errmsg("cannot list partition using more than one column")));
>>
>> /cannot list/cannot use list
>
> Actually "list partition" works here as a verb, as in "to list partition".
>

I am not an expert on this matter, so probably someone having a better
grip can comment.  Are we using something similar in any other error
message?


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] Declarative partitioning - another take

2016-10-25 Thread Amit Langote
On 2016/10/25 15:58, Amit Kapila wrote:
> On Thu, Oct 6, 2016 at 12:44 PM, Amit Langote
>  wrote:
>> On 2016/10/05 2:12, Robert Haas wrote:
>>> Hmm, do we ever fire triggers on the parent for operations on a child
>>> table?  Note this thread, which seems possibly relevant:
>>>
>>> https://www.postgresql.org/message-id/flat/cd282adde5b70b20c57f53bb9ab75e27%40biglumber.com
>>
>> The answer to your question is no.
>>
>> The thread you quoted discusses statement-level triggers and the
>> conclusion is that they don't work as desired for UPDATE and DELETE on
>> inheritance tables.  As things stand, only UPDATE or DELETE on the parent
>> affects the child tables and it's proposed there that the statement-level
>> triggers on the parent and also on any child tables affected should be
>> fired in that case.
>>
> 
> Doesn't that imply that the statement level triggers should be fired
> for all the tables that get changed for statement?  If so, then in
> your case it should never fire for parent table, which means we could
> disallow statement level triggers as well on parent tables?

I may have misunderstood statement-level triggers, but don't they apply to
tables *specified* as the target table in the statement, instead of those
*changed* by resulting actions?

Now in case of inheritance, unless ONLY is specified, all tables in the
hierarchy including the parent are *implicitly* specified to be affected
by an UPDATE or DELETE operation.  So, if some or all of those tables have
any statement-level triggers defined, they should get fired.  That was the
conclusion of that thread, but that TODO item still remains [1].

I am not (or no longer) sure how that argument affects INSERT on
partitioned tables with tuple-routing though.  Are partitions at all
levels *implicitly specified to be affected* when we say INSERT INTO
root_partitioned_table?

> Some of the other things that we might want to consider disallowing on
> parent table could be:
> a. Policy on table_name

Perhaps.  Since the parent table(s) of a partition hierarchy contain no
rows themselves, it might not make sense to continue to allow creating
row-level security policies on them.

> b. Alter table has many clauses, are all of those allowed and will it
> make sense to allow them?

Currently, we only disallow the following with partitioned parent tables
as far as alter table is concerned.

- cannot change inheritance by ALTER TABLE partitioned_table INHERIT ...

- cannot let them be regular inheritance parents either - that is, the
  following is disallowed: ALTER TABLE some_table INHERIT partitioned_table

- cannot create UNIQUE, PRIMARY KEY, FOREIGN KEY, EXCLUDE constraints

- cannot drop column involved in the partitioning key

Most other forms that affect attributes and constraints follow the regular
inheritance behavior (recursion) with certain exceptions such as:

- cannot add/drop an attribute or check constraint to/from *only*
  the parent

- cannot add/drop NOT NULL constraint to/from *only* the parent
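
In command form, a sketch of a few of these restrictions (hypothetical
names; the exact error messages depend on the patches):

  CREATE TABLE p (a int, b int) PARTITION BY RANGE (a);
  ALTER TABLE p ADD PRIMARY KEY (a);        -- rejected
  CREATE TABLE q (a int, b int);
  ALTER TABLE q INHERIT p;                  -- rejected
  ALTER TABLE p DROP COLUMN a;              -- rejected, partition key column
  ALTER TABLE ONLY p ADD CHECK (b > 0);     -- rejected, cannot target only the parent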

Thoughts?

Thanks,
Amit






Re: [HACKERS] Declarative partitioning - another take

2016-10-25 Thread Amit Langote

Thanks for the review!

On 2016/10/25 20:32, Amit Kapila wrote:
> On Thu, Oct 6, 2016 at 12:44 PM, Amit Langote
>  wrote:
>> On 2016/10/05 2:12, Robert Haas wrote:
> 
>> Attached revised patches.
> 
> Few assorted review comments for 0001-Catalog*:
> 
> 
> 1.
> @@ -1775,6 +1775,12 @@ BeginCopyTo(ParseState *pstate,
> {
> ..
> + else if (rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
> + ereport(ERROR,
> + (errcode(ERRCODE_WRONG_OBJECT_TYPE),
> + errmsg("cannot copy from partitioned table \"%s\"",
> + RelationGetRelationName(rel)),
> + errhint("Try the COPY (SELECT ...) TO variant.")));
> ..
> }
> 
> Why is this restriction?  Won't it be useful to allow it for the cases
> when user wants to copy the data of all the partitions?

Sure, CopyTo() can be taught to scan leaf partitions when a partitioned
table is specified, but I thought this may be fine initially.
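
In the meantime the hint's workaround covers that use case; a sketch, with
a hypothetical table name:

  COPY (SELECT * FROM partitioned_tbl) TO '/tmp/all_partitions.csv'
      WITH (FORMAT csv);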

> 2.
> + if (!pg_strcasecmp(stmt->partspec->strategy, "list") &&
> + partnatts > 1)
> + ereport(ERROR,
> + (errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
> + errmsg("cannot list partition using more than one column")));
> 
> /cannot list/cannot use list

Actually "list partition" works here as a verb, as in "to list partition".

> 3.
> @@ -77,7 +77,7 @@ typedef enum DependencyType
>   DEPENDENCY_INTERNAL = 'i',
>   DEPENDENCY_EXTENSION = 'e',
>   DEPENDENCY_AUTO_EXTENSION = 'x',
> - DEPENDENCY_PIN = 'p'
> + DEPENDENCY_PIN = 'p',
>  } DependencyType;
> Why is this change required?

Looks like a leftover hunk from previous revision.  Will fix.

> 4.
> @@ -0,0 +1,69 @@
> +/*-
> + *
> + * pg_partitioned_table.h
> + *  definition of the system "partitioned table" relation
> + *  along with the relation's initial contents.
> + *
> + *
> + * Portions Copyright (c) 1996-2015, PostgreSQL Global Development Group
> 
> Copyright year should be 2016.

Oops, will fix.

> 5.
> +/*
> + * PartitionSpec - partition key definition including the strategy
> + *
> + * 'strategy' partition strategy name ('list', 'range', etc.)
> 
> etc. in above comment seems to be unnecessary.

Will fix, although I thought that the list is as yet incomplete.

> 6.
> + {PartitionedRelationId, /* PARTEDRELID */
> 
> Here PARTEDRELID sounds inconvenient, how about PARTRELID?

Agreed.  There used to be another catalog which had used up PARTRELID, but
that's no longer an issue.

I will include these changes in the next version of patches I will post
soon in reply to [1].

Thanks,
Amit

[1]
https://www.postgresql.org/message-id/CA%2BTgmoYJcUTcN7vVgg54GHtffH11JJWYZnfF4KiRxjV-iaACQg%40mail.gmail.com






Re: [HACKERS] Declarative partitioning - another take

2016-10-25 Thread Amit Kapila
On Thu, Oct 6, 2016 at 12:44 PM, Amit Langote
 wrote:
> On 2016/10/05 2:12, Robert Haas wrote:

> Attached revised patches.

Few assorted review comments for 0001-Catalog*:


1.
@@ -1775,6 +1775,12 @@ BeginCopyTo(ParseState *pstate,
{
..
+ else if (rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
+ ereport(ERROR,
+ (errcode(ERRCODE_WRONG_OBJECT_TYPE),
+ errmsg("cannot copy from partitioned table \"%s\"",
+ RelationGetRelationName(rel)),
+ errhint("Try the COPY (SELECT ...) TO variant.")));
..
}

Why is this restriction?  Won't it be useful to allow it for the cases
when user wants to copy the data of all the partitions?


2.
+ if (!pg_strcasecmp(stmt->partspec->strategy, "list") &&
+ partnatts > 1)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
+ errmsg("cannot list partition using more than one column")));

/cannot list/cannot use list

3.
@@ -77,7 +77,7 @@ typedef enum DependencyType
  DEPENDENCY_INTERNAL = 'i',
  DEPENDENCY_EXTENSION = 'e',
  DEPENDENCY_AUTO_EXTENSION = 'x',
- DEPENDENCY_PIN = 'p'
+ DEPENDENCY_PIN = 'p',
 } DependencyType;

Why is this change required?

4.
@@ -0,0 +1,69 @@
+/*-
+ *
+ * pg_partitioned_table.h
+ *  definition of the system "partitioned table" relation
+ *  along with the relation's initial contents.
+ *
+ *
+ * Portions Copyright (c) 1996-2015, PostgreSQL Global Development Group

Copyright year should be 2016.

5.
+/*
+ * PartitionSpec - partition key definition including the strategy
+ *
+ * 'strategy' partition strategy name ('list', 'range', etc.)

etc. in above comment seems to be unnecessary.

6.
+ {PartitionedRelationId, /* PARTEDRELID */

Here PARTEDRELID sounds inconvenient, how about PARTRELID?



-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] Declarative partitioning - another take

2016-10-25 Thread Amit Kapila
On Thu, Oct 6, 2016 at 12:44 PM, Amit Langote
 wrote:
> On 2016/10/05 2:12, Robert Haas wrote:
>> On Tue, Oct 4, 2016 at 4:02 AM, Amit Langote
>>  wrote:
 Even if we leave the empty relfilenode around for now -- in the long
 run I think it should die -- I think we should prohibit the creation
 of subsidiary object on the parent which is only sensible if it has
 rows - e.g. indexes.  It makes no sense to disallow non-inheritable
 constraints while allowing indexes, and it could box us into a corner
 later.
>>>
>>> I agree.  So we must prevent from the get-go the creation of following
>>> objects on parent tables (aka RELKIND_PARTITIONED_TABLE relations):
>>>
>>> * Indexes
>>> * Row triggers (?)
>>
>> Hmm, do we ever fire triggers on the parent for operations on a child
>> table?  Note this thread, which seems possibly relevant:
>>
>> https://www.postgresql.org/message-id/flat/cd282adde5b70b20c57f53bb9ab75e27%40biglumber.com
>
> The answer to your question is no.
>
> The thread you quoted discusses statement-level triggers and the
> conclusion is that they don't work as desired for UPDATE and DELETE on
> inheritance tables.  As things stand, only UPDATE or DELETE on the parent
> affects the child tables and it's proposed there that the statement-level
> triggers on the parent and also on any child tables affected should be
> fired in that case.
>

Doesn't that imply that the statement-level triggers should be fired
for all the tables that get changed by the statement?  If so, then in
your case they should never fire for the parent table, which means we
could disallow statement-level triggers on parent tables as well?

Some of the other things that we might want to consider disallowing on
parent table could be:
a. Policy on table_name
b. Alter table has many clauses, are all of those allowed and will it
make sense to allow them?

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] Declarative partitioning - another take

2016-10-07 Thread Amit Langote
On 2016/10/07 18:27, Ashutosh Bapat wrote:
> It's allowed to specify a non-default opclass in the partition by clause,
> but I do not see any testcase testing the same.  Can you please add
> one?

OK, I will add some tests related to that.
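
Such a test might look like the following sketch, assuming the proposed
grammar accepts an opclass after a key column the way CREATE INDEX does
(names are hypothetical):

  CREATE TABLE p_text (a text) PARTITION BY RANGE (a text_pattern_ops);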

Thanks,
Amit






Re: [HACKERS] Declarative partitioning - another take

2016-10-07 Thread Ashutosh Bapat
It's allowed to specify a non-default opclass in the partition by clause,
but I do not see any testcase testing the same.  Can you please add
one?

On Fri, Oct 7, 2016 at 2:50 PM, Amit Langote
 wrote:
> On 2016/10/07 17:33, Rajkumar Raghuwanshi wrote:
>> On Thu, Oct 6, 2016 at 12:44 PM, Amit Langote wrote:
>> I have applied latest patches, getting some error and crash, please check
>> if you are also able to reproduce the same.
>>
>> Observation1 : Not able to create index on partition table.
>> --
>> CREATE TABLE rp (c1 int, c2 int) PARTITION BY RANGE(c1);
>> CREATE TABLE rp_p1 PARTITION OF rp FOR VALUES START (1) END (10);
>> CREATE TABLE rp_p2 PARTITION OF rp FOR VALUES START (10) END (20);
>>
>> CREATE INDEX idx_rp_c1 on rp(c1);
>> ERROR:  cannot create index on partitioned table "rp"
>
> This one is a new behavior as I mentioned earlier in my previous email.
>
>> Observation2 : Getting cache lookup failed error for multiple column range
>> partition
>> --
>> CREATE TABLE rp1_m (c1 int, c2 int) PARTITION BY RANGE(c1, ((c1 + c2)/2));
>>
>> CREATE TABLE rp1_m_p1 PARTITION OF rp1_m FOR VALUES START (1, 1) END (10,
>> 10);
>> ERROR:  cache lookup failed for attribute 0 of relation 16429
>>
>> Observation3 : Getting server crash with multiple column range partition
>> --
>> CREATE TABLE rp2_m (c1 int, c2 int) PARTITION BY RANGE(((c2 + c1)/2), c2);
>> CREATE TABLE rp2_m_p1 PARTITION OF rp2_m FOR VALUES START (1, 1) END (10,
>> 10);
>> server closed the connection unexpectedly
>> This probably means the server terminated abnormally
>> before or while processing the request.
>> The connection to the server was lost. Attempting reset: Failed.
>> !>
>
> Fixed.  Should have been caught when running the regression tests after I
> made changes to 0001 to address some review comments (apparently there was
> no test in 0003 that would've caught this, so added a new one, thanks!).
>
>
> Attached updated patches.
>
> Thanks,
> Amit



-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company




Re: [HACKERS] Declarative partitioning - another take

2016-10-07 Thread Rajkumar Raghuwanshi
On Thu, Oct 6, 2016 at 12:44 PM, Amit Langote  wrote:

>
> Attached revised patches.  Also, includes a fix for an issue reported by
> Rajkumar Raghuwanshi [1] which turned out to be a bug in one of the later
> patches.  I will now move on to addressing the comments on patch 0003.
>
> Thanks a lot for the review!
>
> Thanks,
> Amit
>
> [1]
> https://www.postgresql.org/message-id/5dded2f1-c7f6-e7fc-56b
> 5-23ab59495...@lab.ntt.co.jp


Hi,

I have applied the latest patches and am getting some errors and a crash;
please check whether you are also able to reproduce them.

Observation1 : Not able to create index on partition table.
--
CREATE TABLE rp (c1 int, c2 int) PARTITION BY RANGE(c1);
CREATE TABLE rp_p1 PARTITION OF rp FOR VALUES START (1) END (10);
CREATE TABLE rp_p2 PARTITION OF rp FOR VALUES START (10) END (20);

CREATE INDEX idx_rp_c1 on rp(c1);
ERROR:  cannot create index on partitioned table "rp"

Observation2 : Getting cache lookup failed error for multiple column range
partition
--
CREATE TABLE rp1_m (c1 int, c2 int) PARTITION BY RANGE(c1, ((c1 + c2)/2));

CREATE TABLE rp1_m_p1 PARTITION OF rp1_m FOR VALUES START (1, 1) END (10,
10);
ERROR:  cache lookup failed for attribute 0 of relation 16429

Observation3 : Getting server crash with multiple column range partition
--
CREATE TABLE rp2_m (c1 int, c2 int) PARTITION BY RANGE(((c2 + c1)/2), c2);
CREATE TABLE rp2_m_p1 PARTITION OF rp2_m FOR VALUES START (1, 1) END (10,
10);
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.
!>

Thanks & Regards,
Rajkumar Raghuwanshi
QMG, EnterpriseDB Corporation


Re: [HACKERS] Declarative partitioning - another take

2016-10-05 Thread Amit Langote

Hi,

On 2016/10/05 16:57, Rajkumar Raghuwanshi wrote:
> I observed, when creating foreign table with range partition, data is not
> inserting into specified partition range. below are steps to reproduce.
> 
> [ ... ]
> 
> postgres=# INSERT INTO test_range (a) values (5),(25),(15);
> INSERT 0 3
> 
> postgres=# select tableoid::regclass, * from test_range;
>      tableoid     | a
> ------------------+----
>  ft_test_range_p1 |  5
>  ft_test_range_p2 | 15
>  ft_test_range_p3 | 25
> (3 rows)
> 
> --Here ft_test_range_p2 is created for range 20-30 having value 15.

Thanks a lot for testing.

That's a bug.  I found that it is caused by the FDW plans getting paired
with the wrong result relations during ExecInitModifyTable() initialization.

I will include a fix for the same in the patch set that I will be sending
soon in reply to Robert's review comments on patch 0002 [1].

Thanks,
Amit

[1]
https://www.postgresql.org/message-id/CA%2BTgmoY1aQ5iPz0S2GBJw4YUR1Z2Qg5iKUf8YJSo2Ctya4ZmNg%40mail.gmail.com






Re: [HACKERS] Declarative partitioning - another take

2016-10-05 Thread Rajkumar Raghuwanshi
On Tue, Oct 4, 2016 at 1:32 PM, Amit Langote 
wrote:

> Attached updated patches.
>
> Thanks,
> Amit
>

Hi,

I observed that when creating a foreign table as a range partition, data is
not inserted into the specified partition range.  Below are steps to reproduce.

CREATE EXTENSION postgres_fdw;
CREATE SERVER pwj_server FOREIGN DATA WRAPPER postgres_fdw OPTIONS (dbname
'postgres', port '5432',use_remote_estimate 'true');
CREATE USER MAPPING FOR PUBLIC SERVER pwj_server;

CREATE TABLE test_range (a int) PARTITION BY RANGE(a);

CREATE TABLE test_range_p1 (a int);
CREATE FOREIGN TABLE ft_test_range_p1 PARTITION OF test_range FOR VALUES
START (1) END (10) SERVER pwj_server OPTIONS (TABLE_NAME 'test_range_p1');

CREATE TABLE test_range_p2 (a int);
CREATE FOREIGN TABLE ft_test_range_p2 PARTITION OF test_range FOR VALUES
START (20) END (30) SERVER pwj_server OPTIONS (TABLE_NAME 'test_range_p2');

CREATE TABLE test_range_p3 (a int);
CREATE FOREIGN TABLE ft_test_range_p3 PARTITION OF test_range FOR VALUES
START (10) END (20) SERVER pwj_server OPTIONS (TABLE_NAME 'test_range_p3');

postgres=# INSERT INTO test_range (a) values (5),(25),(15);
INSERT 0 3

postgres=# select tableoid::regclass, * from test_range;
     tableoid     | a
------------------+----
 ft_test_range_p1 |  5
 ft_test_range_p2 | 15
 ft_test_range_p3 | 25
(3 rows)

-- Here ft_test_range_p2, which was created for the range 20-30, ended up with the value 15.

Thanks & Regards,
Rajkumar Raghuwanshi
QMG, EnterpriseDB Corporation


Re: [HACKERS] Declarative partitioning - another take

2016-10-04 Thread Petr Jelinek
On 05/10/16 03:50, Robert Haas wrote:
> On Tue, Oct 4, 2016 at 4:02 AM, Amit Langote
>  wrote:
>> [ latest patch set ]
> 
> Reviewing 0003:
> 
> +  with matching types and no more. Also, it must have all the matching
> +  constraints as the target table.  That includes both NOT 
> NULL
> +  and CHECK constraints.  If some CHECK
> +  constraint of the table being attached is marked NO
> INHERIT,
> +  the command will fail; such constraints must either be dropped or
> +  recreated without the NO INHERIT mark.
> 
> Why all of these requirements?  We could instead perform a scan to
> validate that the constraints are met.  I think the way this should
> work is:

I think it's a survivable limitation in the initial version; it does not
seem to me like there is anything that prevents improving this in some
follow-up patch.

> 
> +  Currently UNIQUE, PRIMARY KEY, 
> and
> +  FOREIGN KEY constraints are not considered, but that
> +  might change in the future.
> 
> Really?  Changing that sounds impractical to me.
> 

Which part of that statement?

> 
> +  Note that if a partition being detached is itself a partitioned table,
> +  it continues to exist as such.
> 
> You don't really need to say this, I think.  All of the properties of
> the detached table are retained, not only its partitioning status.
> You wouldn't like it if I told you to document "note that if a
> partition being detached is unlogged, it will still be unlogged".
> 

I think this is a bit different though; it basically means you are
detaching a whole branch from the rest of the partitioning tree, not just
that one partition.  To me that's worth mentioning to avoid potential
confusion.

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Declarative partitioning - another take

2016-10-04 Thread Robert Haas
On Tue, Oct 4, 2016 at 4:02 AM, Amit Langote
 wrote:
> [ latest patch set ]

Reviewing 0003:

+  This form attaches an existing table (partitioned or otherwise) as

(which might itself be partitioned)

+  partition of the target table.  Partition bound specification must
+  correspond with the partition method and the key of the target table.

The partition bound specification must correspond to the partitioning
method and partitioning key of the target table.

+  The table being attached must have all the columns of the target table
+  with matching types and no more. Also, it must have all the matching

The table to be attached must have all of the same columns as the
target table and no more; moreover, the column types must also match.

+  with matching types and no more. Also, it must have all the matching
+  constraints as the target table.  That includes both NOT NULL
+  and CHECK constraints.  If some CHECK
+  constraint of the table being attached is marked NO
INHERIT,
+  the command will fail; such constraints must either be dropped or
+  recreated without the NO INHERIT mark.

Why all of these requirements?  We could instead perform a scan to
validate that the constraints are met.  I think the way this should
work is:

1. ATTACH PARTITION works whether matching NOT NULL and CHECK
constraints are present or not.

2. If all of the constraints are present, and a validated constraint
matching the implicit partitioning constraint is also present, then
ATTACH PARTITION does not scan the table to validate constraints;
otherwise, it does.

3. NO VALIDATE is not an option.
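
Under that scheme a user can still avoid the validation scan by adding a
matching constraint before attaching; a sketch, with hypothetical names
and the bound syntax proposed in this thread:

  ALTER TABLE m_2016 ADD CHECK (logdate >= '2016-01-01' AND logdate < '2017-01-01');
  ALTER TABLE m ATTACH PARTITION m_2016
      FOR VALUES START ('2016-01-01') END ('2017-01-01');  -- no scan needed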

+  Currently UNIQUE, PRIMARY KEY, and
+  FOREIGN KEY constraints are not considered, but that
+  might change in the future.

Really?  Changing that sounds impractical to me.

+  This form detaches specified partition of the target table.  The
+  detached partition continues to exist as a standalone table with no ties
+  remaining with the target table.

continues to exist as a standalone table, but no longer has any ties
to the table from which it was detached.

+  Note that if a partition being detached is itself a partitioned table,
+  it continues to exist as such.

You don't really need to say this, I think.  All of the properties of
the detached table are retained, not only its partitioning status.
You wouldn't like it if I told you to document "note that if a
partition being detached is unlogged, it will still be unlogged".

To add the table as a new child of a parent table, you must own the
-   parent table as well.
+   parent table as well.  That applies to both adding the table as a
+   inheritance child of a parent table and attaching a table as partition to
+   the table.

To add the table as a new child of a parent table, or as a new
partition of a partitioned table, you must own the parent table as
well.

+The name of the table to attach as a new partition to or
detach from this table.

s/to or/or to/

+NO VALIDATE option is spcified.

Typo, but see comments above about nuking this altogether.

 A recursive DROP COLUMN operation will remove a
 descendant table's column only if the descendant does not inherit
 that column from any other parents and never had an independent
-definition of the column.  A nonrecursive DROP
+definition of the column (which always holds if the descendant table
+is a partition).  A nonrecursive DROP
 COLUMN (i.e., ALTER TABLE ONLY ... DROP
 COLUMN) never removes any descendant columns, but
-instead marks them as independently defined rather than inherited.
+instead marks them as independently defined rather than inherited,
+unless the descendant table is a partition.

This is a hairy explanation.  I suggest that the documentation say
this (and that the code implement it): A nonrecursive DROP TABLE
command will fail for a partitioned table, because all partitions of a
table must have the same columns as the partitioning root.

-that are not marked NO INHERIT.
+that are not marked NO INHERIT which are unsupported if
+the table is a partitioned table.

I think you can omit this hunk.

+   If PARTITION OF clause is specified then the table is
+   created as a partition of parent_table with specified
+   bounds.  However, unlike regular tables, one cannot specify
+   PARTITION BY clause which means foreign tables can
+   only be created as leaf partitions.

I'd delete the sentence beginning with "However".

+   Create foreign table measurement_y2016m07, which will be
+   accessed through the server server_07, that is partition
+   of the range partitioned table measurement:

s/, that is/ as a/

+and partition_bound_spec is:
+
+FOR VALUES { list_spec |
range_spec }

I think you can inline the definitions of list_spec and range_spec
here instead of making them separate productions, and I think that
would be preferable.

FOR 

Re: [HACKERS] Declarative partitioning - another take

2016-10-04 Thread Robert Haas
On Tue, Oct 4, 2016 at 4:02 AM, Amit Langote
 wrote:
>> Even if we leave the empty relfilenode around for now -- in the long
>> run I think it should die -- I think we should prohibit the creation
>> of subsidiary object on the parent which is only sensible if it has
>> rows - e.g. indexes.  It makes no sense to disallow non-inheritable
>> constraints while allowing indexes, and it could box us into a corner
>> later.
>
> I agree.  So we must prevent from the get-go the creation of following
> objects on parent tables (aka RELKIND_PARTITIONED_TABLE relations):
>
> * Indexes
> * Row triggers (?)

Hmm, do we ever fire triggers on the parent for operations on a child
table?  Note this thread, which seems possibly relevant:

https://www.postgresql.org/message-id/flat/cd282adde5b70b20c57f53bb9ab75e27%40biglumber.com

We certainly won't fire the parent's per-row triggers ever, so those
should be prohibited.  But I'm not sure about per-statement triggers.

> In addition to preventing creation of these objects, we must also teach
> commands that directly invoke heapam.c to skip such relations.

I don't think that's required if we're not actually getting rid of
the relfilenode.

> In this case, relkind refers to the command viz. DROP 
> .  It can never be RELKIND_PARTITIONED_TABLE, so the above will
> be equivalent to leaving things unchanged.  Anyway I agree the suggested
> style is better, but I assume you meant:
>
> if (classform->relkind == RELKIND_PARTITIONED_TABLE)
> expected_relkind = RELKIND_RELATION;
> else
> expected_relkind = classform->relkind;
>
> if (relkind != expected_relkind)
> DropErrorMsgWrongType(rel->relname, classform->relkind, relkind);

Uh, yeah, probably that's what I meant. :-)

> Thanks for the explanation. I added a new field to the catalog called
> partcollation.
>
> That means a couple of things - when comparing input key values (consider
> a new tuple being routed) to partition bounds using the btree compare
> function, we use the corresponding key column's partcollation.  The same
> is also used as inputcollid of the OpExpr (or ScalarArrayOpExpr) in the
> implicitly generated CHECK constraints.

Check.

> However, varcollid of any Var nodes in the above expressions is still the
> corresponding attributes' attcollation.

I think that's OK.

> Needless to say, a query-specified qualification clause must now include
> the same collate clause as used for individual partition columns (or
> expressions) for the constraint exclusion to work as intended.

Hopefully this is only required if the default choice of collation
wouldn't be correct anyway.

>> +  partexprbin
>>
>> Is there any good reason not to do partexprbin -> partexpr throughout the 
>> patch?
>
> Agreed. Though I used partexprs (like indexprs).

Oh, good idea.

>>  case EXPR_KIND_TRIGGER_WHEN:
>>  return "WHEN";
>> +case EXPR_KIND_PARTITION_KEY:
>> +return "partition key expression";
>>
>> I think you should say "PARTITION BY" here.  See the function header
>> comment for ParseExprKindName.
>
> Used "PARTITION BY".  I was trying to mimic EXPR_KIND_INDEX_EXPRESSION,
> but this seems to work better.  Also, I renamed EXPR_KIND_PARTITION_KEY to
> EXPR_KIND_PARTITION_EXPRESSION.
>
> By the way, a bunch of error messages say "partition key expression".  I'm
> assuming it's better to leave them alone, that is, not rewrite them as
> "PARTITION BY expressions".

I think that is fine.  ParseExprKindName is something of a special case.

Reviewing 0002:

+prettyFlags = PRETTYFLAG_INDENT;
+PG_RETURN_TEXT_P(string_to_text(pg_get_partkeydef_worker(relid,
+prettyFlags)));

Why bother with the variable?

+/* Must get partclass, and partexprs the hard way */
+datum = SysCacheGetAttr(PARTEDRELID, tuple,
+Anum_pg_partitioned_table_partclass, &isnull);
+Assert(!isnull);
+partclass = (oidvector *) DatumGetPointer(datum);

Comment mentions getting two things, but code only gets one.

+partexprs = (List *) stringToNode(exprsString);

if (!IsA(partexprs, List)) elog(ERROR, ...); so as to guard against
corrupt catalog contents.

+switch (form->partstrat)
+{
+case 'l':
+appendStringInfo(&buf, "LIST");
+break;
+case 'r':
+appendStringInfo(&buf, "RANGE");
+break;
+}

default: elog(ERROR, "unexpected partition strategy: %d", (int)
form->partstrat);

Also, why not case PARTITION_STRATEGY_LIST: instead of case 'l': and
similarly for 'r'?

@@ -5296,7 +5301,8 @@ getTables(Archive *fout, int *numTables)
   "OR %s IS NOT NULL "
   "OR %s IS NOT NULL"
   "))"
-  "AS changed_acl "
+  "AS changed_acl, "
+  "CASE WHEN c.relkind = 'P' THEN

Re: [HACKERS] Declarative partitioning - another take

2016-10-02 Thread Amit Langote
On 2016/10/03 13:26, Michael Paquier wrote:
> On Fri, Sep 30, 2016 at 9:10 AM, Robert Haas  wrote:
>> On Thu, Sep 29, 2016 at 8:09 AM, Amit Langote
>>  wrote:
>>> I removed DEPENDENCY_IGNORE.  Does the following look good or am I still
>>> missing something?
>>
>> You missed your commit message, but otherwise looks fine.
> 
> I have moved this patch to the next CF because that's fresh, but switched
> the patch to "waiting on author". Be careful, the patch was in "needs
> review".

Thanks Michael.  I was going to post the patch addressing Robert's
comments today, which I will anyway.  Then I will switch the status back
to "Needs Review", albeit as part of the next CF.

Thanks,
Amit






Re: [HACKERS] Declarative partitioning - another take

2016-10-02 Thread Michael Paquier
On Fri, Sep 30, 2016 at 9:10 AM, Robert Haas  wrote:
> On Thu, Sep 29, 2016 at 8:09 AM, Amit Langote
>  wrote:
>> I removed DEPENDENCY_IGNORE.  Does the following look good or am I still
>> missing something?
>
> You missed your commit message, but otherwise looks fine.

I have moved this patch to the next CF because that's fresh, but switched
the patch to "waiting on author". Be careful, the patch was in "needs
review".
-- 
Michael




Re: [HACKERS] Declarative partitioning - another take

2016-09-29 Thread Robert Haas
On Thu, Sep 29, 2016 at 8:09 AM, Amit Langote
 wrote:
> I removed DEPENDENCY_IGNORE.  Does the following look good or am I still
> missing something?

You missed your commit message, but otherwise looks fine.

>> Also, I think this should be rephrased a bit to be more clear about
>> how the partitioning key works, like this: The optional
>> PARTITION BY clause specifies a method of
>> partitioning the table.  The table thus created is called a
>> partitioned table.  The parenthesized list of
>> expressions forms the partitioning key for the
>> table.  When using range partitioning, the partioning key can include
>> multiple columns or expressions, but for list partitioning, the
>> partitioning key must consist of a single column or expression.  If no
>> btree operator class is specified when creating a partitioned table,
>> the default btree operator class for the datatype will be used.  If
>> there is none, an error will be reported.
>
> Revised text along these lines.

It seems you copied my typo: my text has "partioning" for
"partitioning" and your patch now has that, too.

> Things were that way initially, that is, the parent relations had no
> relfilenode.  I abandoned that project however.  The storage-less parent
> thing seemed pretty daunting to me to handle right away.  For example,
> optimizer and the executor code needed to be taught about the parent rel
> appendrel member that used to be included as part of the result of
> scanning the inheritance set but was no longer.

Even if we leave the empty relfilenode around for now -- in the long
run I think it should die -- I think we should prohibit the creation
of subsidiary objects on the parent which are only sensible if it has
rows - e.g. indexes.  It makes no sense to disallow non-inheritable
constraints while allowing indexes, and it could box us into a corner
later.

>> +/*
>> + * Run the expressions through eval_const_expressions. This is
>> + * not just an optimization, but is necessary, because eventually
>> + * the planner will be comparing them to similarly-processed qual
>> + * clauses, and may fail to detect valid matches without this.
>> + * We don't bother with canonicalize_qual, however.
>> + */
>>
>> I'm a bit confused by this, because I would think this processing
>> ought to have been done before storing anything in the system
>> catalogs.  I don't see why it should be necessary to do it again after
>> pulling data back out of the system catalogs.
>
> The pattern matches what's done for other expressions that the optimizer deals
> with, such as CHECK, index key, and index predicate expressions.

That's kind of a non-answer answer, but OK.  Still, I think you
shouldn't just copy somebody else's comment blindly into a new place.
Reference the original comment, or write your own.

>> How can it be valid to have no partitioning expressions?
>
> Keys that are simply column names are resolved to attnums and stored
> likewise.  If some key is an expression, then corresponding attnum is 0
> and the expression itself is added to the list that gets stored into
> partexprbin.  It is doing the same thing as index expressions.

Oh, right.  Oops.

>> +if (classform->relkind != relkind &&
>> +(relkind == RELKIND_RELATION &&
>> +classform->relkind != RELKIND_PARTITIONED_TABLE))
>>
>> That's broken.  Note that all of the conditions are joined using &&,
>> so if any one of them fails then we won't throw an error.  In
>> particular, it's no longer possible to throw an error when relkind is
>> not RELKIND_RELATION.
>
> You are right.  I guess it would have to be the following:
>
> +if ((classform->relkind != relkind &&
> + classform->relkind != RELKIND_PARTITIONED_TABLE) ||
> +(classform->relkind == RELKIND_PARTITIONED_TABLE &&
> + relkind != RELKIND_RELATION))
>
> Such hackishness could not be helped because we have a separate DROP
> command for every distinct relkind, except we overload DROP TABLE for both
> regular and partitioned tables.

Maybe this would be better:

if (relkind == RELKIND_PARTITIONED_TABLE)
syntax_relkind = RELKIND_RELATION;
else
syntax_relkind = relkind;
if (classform->relkind != syntax_relkind)
DropErrorMsgWrongType(rel->relname, classform->relkind, relkind);

>> Why isn't this logic being invoked from transformCreateStmt()?  Then
>> we could use the actual parseState for the query instead of a fake
>> one.
>
> Because we need an open relation for it to work, which in this case we
> won't have until after we have performed heap_create_with_catalog() in
> DefineRelation().  Mainly because we need to perform transformExpr() on
> expressions.  That's similar to how cookConstraint() on the new CHECK
> constraints cannot be performed earlier.  Am I missing something?

Hmm, yeah, I guess that's the same thing.  I guess I got on this line
of thinking because the function 

Re: [HACKERS] Declarative partitioning - another take

2016-09-27 Thread Amit Langote
On 2016/09/27 18:09, Ashutosh Bapat wrote:
>>> I tried to debug the problem somewhat. In set_append_rel_pathlist(),
>>> it finds that at least one child has a parameterized path as the
>>> cheapest path, so it doesn't create an unparameterized path for append
>>> rel. At the same time there is a parameterization common to all the
>>> children, so it doesn't create any path. There seem to be two problems
>>> here
>>> 1. The children from second level onwards may not be getting
>>> parameterized for lateral references. That seems unlikely but
>>> possible.
> 
> Did you check this? We may be missing out on creating index scan paths
> with parameterization. If we fix this, we don't need to
> re-parameterize Append.

You're right.  How about the attached patch that fixes the problem along
these lines?  The problem seems to be that multi-level inheritance sets
(partitioned tables) are not handled in create_lateral_join_info(), which
means that lateral_relids and lateral_referencers of the root relation are
not being propagated to the partitions below level 1.

I'm getting concerned about one thing though - for a given *regular*
inheritance set, the root->append_rel_list would be scanned only once; but
for a *partitioned table* inheritance set, it would be scanned for every
partitioned table in the set (ie, the root table and internal partitions).

Thanks,
Amit
commit d69aeabcfcb58f349602a1b7392c611045c37465
Author: amit 
Date:   Tue Sep 27 20:01:44 2016 +0900

Consider multi-level partitioned tables in create_lateral_join_info().

diff --git a/src/backend/optimizer/plan/initsplan.c b/src/backend/optimizer/plan/initsplan.c
index 84ce6b3..74734f2 100644
--- a/src/backend/optimizer/plan/initsplan.c
+++ b/src/backend/optimizer/plan/initsplan.c
@@ -14,6 +14,7 @@
  */
 #include "postgres.h"
 
+#include "catalog/pg_class.h"
 #include "catalog/pg_type.h"
 #include "nodes/nodeFuncs.h"
 #include "optimizer/clauses.h"
@@ -623,8 +624,19 @@ create_lateral_join_info(PlannerInfo *root)
 	for (rti = 1; rti < root->simple_rel_array_size; rti++)
 	{
 		RelOptInfo *brel = root->simple_rel_array[rti];
+		RangeTblEntry *rte = root->simple_rte_array[rti];
 
-		if (brel == NULL || brel->reloptkind != RELOPT_BASEREL)
+		if (brel == NULL)
+			continue;
+
+		/*
+		 * If an "other rel" RTE is a "partitioned table", we must propagate
+		 * the lateral info inherited from the parent to its children. That's
+		 * because they are not linked directly with the parent via
+		 * AppendRelInfo's (see expand_inherited_rte_internal()).
+		 */
+		if (brel->reloptkind != RELOPT_BASEREL &&
+			rte->relkind != RELKIND_PARTITIONED_TABLE)
 			continue;
 
 		if (root->simple_rte_array[rti]->inh)



Re: [HACKERS] Declarative partitioning - another take

2016-09-27 Thread Ashutosh Bapat
On Tue, Sep 27, 2016 at 2:46 PM, Amit Langote
 wrote:
> On 2016/09/27 15:44, Ashutosh Bapat wrote:
>>> By the way, I fixed one thinko in your patch as follows:
>>>
>>> -result->oids[i] = oids[mapping[i]];
>>> +result->oids[mapping[i]] = oids[i];
>>
>> While I can not spot any problem with this logic, when I make that
>> change and run partition_join testcase in my patch, it fails because
>> wrong partitions are matched for partition-wise join of list
>> partitions. In that patch, RelOptInfo of partitions are saved in
>> RelOptInfo of the parent by matching their OIDs. They are saved in the
>> same order as corresponding OIDs. Partition-wise join simply joins the
>> RelOptInfos at the same positions from both the parent RelOptInfos. I
>> can not spot an error in this logic too.
>
> OTOH, using the original logic makes tuple routing put tuples into the
> wrong partitions.  When debugging why that was happening I discovered this
> and hence the proposed change.
>
> You mean that partition RelOptInfo's are placed using the canonical
> ordering of OIDs instead of catalog-scan-driven order, right?  If that's
> true, then there is no possibility of wrong pairing happening, even with
> the new ordering of OIDs in the partition descriptor (ie, the ordering
> that would be produced by my proposed method above).

right! I don't know what's wrong, will debug my changes.


-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company




Re: [HACKERS] Declarative partitioning - another take

2016-09-27 Thread Amit Langote
On 2016/09/27 15:44, Ashutosh Bapat wrote:
>> By the way, I fixed one thinko in your patch as follows:
>>
>> -result->oids[i] = oids[mapping[i]];
>> +result->oids[mapping[i]] = oids[i];
> 
> While I can not spot any problem with this logic, when I make that
> change and run partition_join testcase in my patch, it fails because
> wrong partitions are matched for partition-wise join of list
> partitions. In that patch, RelOptInfo of partitions are saved in
> RelOptInfo of the parent by matching their OIDs. They are saved in the
> same order as corresponding OIDs. Partition-wise join simply joins the
> RelOptInfos at the same positions from both the parent RelOptInfos. I
> can not spot an error in this logic too.

OTOH, using the original logic makes tuple routing put tuples into the
wrong partitions.  When debugging why that was happening I discovered this
and hence the proposed change.

You mean that partition RelOptInfo's are placed using the canonical
ordering of OIDs instead of catalog-scan-driven order, right?  If that's
true, then there is no possibility of wrong pairing happening, even with
the new ordering of OIDs in the partition descriptor (ie, the ordering
that would be produced by my proposed method above).

Thanks,
Amit






Re: [HACKERS] Declarative partitioning - another take

2016-09-27 Thread Ashutosh Bapat
>>
>> Please check if you are able to reproduce these errors in your
>> repository. I made sure that I cleaned up all partition-wise join code
>> before testing this, but ... .
>
> Thanks for the test case.  I can reproduce the same.
>
>> I tried to debug the problem somewhat. In set_append_rel_pathlist(),
>> it finds that at least one child has a parameterized path as the
>> cheapest path, so it doesn't create an unparameterized path for append
>> rel. At the same time there is a parameterization common to all the
>> children, so it doesn't create any path. There seem to be two problems
>> here
>> 1. The children from second level onwards may not be getting
>> parameterized for lateral references. That seems unlikely but
>> possible.

Did you check this? We may be missing out on creating index scan paths
with parameterization. If we fix this, we don't need to
re-parameterize Append.

>> 2. Reparameterization should have corrected this, but
>> reparameterize_path() does not support AppendPaths.
>
> Hmm, 0005-Refactor-optimizer-s-inheritance-set-expansion-code-5.patch is
> certainly to be blamed here; if I revert the patch, the problem goes away.
>
> Based on 2 above, I attempted to add logic for AppendPath in
> reparameterize_path() as in the attached.  It fixes the reported problem
> and does not break any regression tests. If it's sane to do things this
> way, I will incorporate the attached into patch 0005 mentioned above.
> Thoughts?


-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company




Re: [HACKERS] Declarative partitioning - another take

2016-09-27 Thread Amit Langote

Sorry about the delay in replying.

On 2016/09/15 21:58, Ashutosh Bapat wrote:
> Hi Amit,
> 
> It looks like there is some problem while creating paramterized paths
> for multi-level partitioned tables. Here's a longish testcase
> 
> [ ... ]
> 
> Please check if you are able to reproduce these errors in your
> repository. I made sure that I cleaned up all partition-wise join code
> before testing this, but ... .

Thanks for the test case.  I can reproduce the same.

> I tried to debug the problem somewhat. In set_append_rel_pathlist(),
> it finds that at least one child has a parameterized path as the
> cheapest path, so it doesn't create an unparameterized path for append
> rel. At the same time there is a parameterization common to all the
> children, so it doesn't create any path. There seem to be two problems
> here
> 1. The children from second level onwards may not be getting
> parameterized for lateral references. That seems unlikely but
> possible.
> 2. Reparameterization should have corrected this, but
> reparameterize_path() does not support AppendPaths.

Hmm, 0005-Refactor-optimizer-s-inheritance-set-expansion-code-5.patch is
certainly to be blamed here; if I revert the patch, the problem goes away.

Based on 2 above, I attempted to add logic for AppendPath in
reparameterize_path() as in the attached.  It fixes the reported problem
and does not break any regression tests. If it's sane to do things this
way, I will incorporate the attached into patch 0005 mentioned above.
Thoughts?

Thanks,
Amit
commit 86c8fc2c268d17fb598bf6879cfef2b2a2f42417
Author: amit 
Date:   Tue Sep 27 16:02:54 2016 +0900

Handle AppendPath in reparameterize_path().

diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index 2a49639..750f0ea 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -1570,6 +1570,40 @@ cost_sort(Path *path, PlannerInfo *root,
 }
 
 /*
+ * cost_append
+ *	  Determines and returns the cost of an Append node
+ *
+ * Compute rows and costs as sums of subplan rows and costs.  We charge
+ * nothing extra for the Append itself, which perhaps is too optimistic,
+ * but since it doesn't do any selection or projection, it is a pretty
+ * cheap node.
+ */
+void
+cost_append(AppendPath *apath, Relids required_outer)
+{
+	ListCell   *l;
+
+	apath->path.rows = 0;
+	apath->path.startup_cost = 0;
+	apath->path.total_cost = 0;
+	foreach(l, apath->subpaths)
+	{
+		Path	   *subpath = (Path *) lfirst(l);
+
+		apath->path.rows += subpath->rows;
+
+		if (l == list_head(apath->subpaths))	/* first node? */
+			apath->path.startup_cost = subpath->startup_cost;
+		apath->path.total_cost += subpath->total_cost;
+		apath->path.parallel_safe = apath->path.parallel_safe &&
+			subpath->parallel_safe;
+
+		/* All child paths must have same parameterization */
+		Assert(bms_equal(PATH_REQ_OUTER(subpath), required_outer));
+	}
+}
+
+/*
  * cost_merge_append
  *	  Determines and returns the cost of a MergeAppend node.
  *
diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index abb7507..bf8fb55 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -1206,7 +1206,6 @@ create_append_path(RelOptInfo *rel, List *subpaths, Relids required_outer,
    int parallel_workers)
 {
 	AppendPath *pathnode = makeNode(AppendPath);
-	ListCell   *l;
 
 	pathnode->path.pathtype = T_Append;
 	pathnode->path.parent = rel;
@@ -1220,32 +1219,7 @@ create_append_path(RelOptInfo *rel, List *subpaths, Relids required_outer,
 		 * unsorted */
 	pathnode->subpaths = subpaths;
 
-	/*
-	 * We don't bother with inventing a cost_append(), but just do it here.
-	 *
-	 * Compute rows and costs as sums of subplan rows and costs.  We charge
-	 * nothing extra for the Append itself, which perhaps is too optimistic,
-	 * but since it doesn't do any selection or projection, it is a pretty
-	 * cheap node.
-	 */
-	pathnode->path.rows = 0;
-	pathnode->path.startup_cost = 0;
-	pathnode->path.total_cost = 0;
-	foreach(l, subpaths)
-	{
-		Path	   *subpath = (Path *) lfirst(l);
-
-		pathnode->path.rows += subpath->rows;
-
-		if (l == list_head(subpaths))	/* first node? */
-			pathnode->path.startup_cost = subpath->startup_cost;
-		pathnode->path.total_cost += subpath->total_cost;
-		pathnode->path.parallel_safe = pathnode->path.parallel_safe &&
-			subpath->parallel_safe;
-
-		/* All child paths must have same parameterization */
-		Assert(bms_equal(PATH_REQ_OUTER(subpath), required_outer));
-	}
+	cost_append(pathnode, required_outer);
 
 	return pathnode;
 }
@@ -3204,6 +3178,41 @@ reparameterize_path(PlannerInfo *root, Path *path,
 		 spath->path.pathkeys,
 		 required_outer);
 			}
+		case T_Append:
+			{
+AppendPath	*apath = (AppendPath *) path;
+AppendPath	*newpath;
+List		*new_subpaths = NIL;
+ListCell	*lc;
+
+foreach(lc, 

Re: [HACKERS] Declarative partitioning - another take

2016-09-27 Thread Ashutosh Bapat
>
> With this patch, the mapping is created *only once* during
> RelationBuildPartitionDesc() to assign canonical indexes to individual
> list values.  The partition OID array will also be rearranged such that
> using the new (canonical) index instead of the old
> catalog-scan-order-based index will retrieve the correct partition for
> that value.
>
> By the way, I fixed one thinko in your patch as follows:
>
> -result->oids[i] = oids[mapping[i]];
> +result->oids[mapping[i]] = oids[i];

While I can not spot any problem with this logic, when I make that
change and run partition_join testcase in my patch, it fails because
wrong partitions are matched for partition-wise join of list
partitions. In that patch, RelOptInfo of partitions are saved in
RelOptInfo of the parent by matching their OIDs. They are saved in the
same order as corresponding OIDs. Partition-wise join simply joins the
RelOptInfos at the same positions from both the parent RelOptInfos. I
can not spot an error in this logic too.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company




Re: [HACKERS] Declarative partitioning - another take

2016-09-26 Thread Amit Langote
On 2016/09/22 19:10, Ashutosh Bapat wrote:
> On Thu, Sep 22, 2016 at 1:02 PM, Ashutosh Bapat
>  wrote:
>> For list partitions, the ListInfo stores the index maps for values
>> i.e. the index of the partition to which the value belongs. Those
>> indexes are same as the indexes in partition OIDs array and come from
>> the catalogs. In case a user creates two partitioned tables with
>> exactly same lists for partitions but specifies them in a different
>> order, the OIDs are stored in the order specified. This means that
>> index array for these tables come out different. equal_list_info()
>> works around that by creating an array of mappings and checks whether
>> that mapping is consistent for all values. This means we will create
>> the mapping as many times as equal_list_info() is called, which is
>> expected to be more than the number of times
>> RelationBuildPartitionDescriptor() is called. Instead, if we
>> "canonicalise" the indexes so that they come out exactly same for
>> similarly partitioned tables, we build the mapping only once and
>> arrange OIDs accordingly.
>>
>> Here's a patch to do that. I have run make check with this and it didn't
>> show any failure. Please consider this to be included in your next set
>> of patches.

Thanks.  It seems like this will save quite a few cycles for future users
of equal_list_info().  I will incorporate it into my patch.

With this patch, the mapping is created *only once* during
RelationBuildPartitionDesc() to assign canonical indexes to individual
list values.  The partition OID array will also be rearranged such that
using the new (canonical) index instead of the old
catalog-scan-order-based index will retrieve the correct partition for
that value.
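
To make the scenario concrete, here is a sketch of two parents whose partitions
cover the same lists but are declared in a different order (using the list
syntax proposed in this thread; the table names are made up):

CREATE TABLE t1 (a int) PARTITION BY LIST (a);
CREATE TABLE t1_odd  PARTITION OF t1 FOR VALUES IN (1, 3);
CREATE TABLE t1_even PARTITION OF t1 FOR VALUES IN (2, 4);

CREATE TABLE t2 (a int) PARTITION BY LIST (a);
CREATE TABLE t2_even PARTITION OF t2 FOR VALUES IN (2, 4);
CREATE TABLE t2_odd  PARTITION OF t2 FOR VALUES IN (1, 3);

-- With canonical indexes assigned once in RelationBuildPartitionDesc(), both
-- parents end up with identically ordered value-to-partition index arrays,
-- even though their partitions were found in a different order by the
-- catalog scan.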

By the way, I fixed one thinko in your patch as follows:

-result->oids[i] = oids[mapping[i]];
+result->oids[mapping[i]] = oids[i];

That is, copy an OID at a given position in the original array to a
*mapped position* in the result array (instead of the other way around).

> The patch has an if condition as statement by itself
> +if (l1->null_index != l2->null_index);
>  return false;
> 
> There shouldn't be ';' at the end. It looks like in the tests you have
> added the function always bails out before it reaches this statement.

There is no user of equal_list_info() such that the above bug would have
caused a regression test failure.  It could perhaps cause a segfault (due to a
dangling pointer into the partition descriptor's listinfo after the above
function unintentionally returns false to equalPartitionDescs(), which is in
turn called by RelationClearRelation()).

Thanks,
Amit






Re: [HACKERS] Declarative partitioning - another take

2016-09-26 Thread Amit Langote

Hi Ashutosh,

On 2016/09/22 14:42, Ashutosh Bapat wrote:
> Hi Amit,
> The following sequence of DDLs gets an error
> --
> -- multi-leveled partitions
> --
> CREATE TABLE prt1_l (a int, b int, c varchar) PARTITION BY RANGE(a);
> CREATE TABLE prt1_l_p1 PARTITION OF prt1_l FOR VALUES START (0) END
> (250) PARTITION BY RANGE (b);
> CREATE TABLE prt1_l_p1_p1 PARTITION OF prt1_l_p1 FOR VALUES START (0) END 
> (100);
> CREATE TABLE prt1_l_p1_p2 PARTITION OF prt1_l_p1 FOR VALUES START
> (100) END (250);
> CREATE TABLE prt1_l_p2 PARTITION OF prt1_l FOR VALUES START (250) END
> (500) PARTITION BY RANGE (c);
> CREATE TABLE prt1_l_p2_p1 PARTITION OF prt1_l_p2 FOR VALUES START
> ('0250') END ('0400');
> CREATE TABLE prt1_l_p2_p2 PARTITION OF prt1_l_p2 FOR VALUES START
> ('0400') END ('0500');
> CREATE TABLE prt1_l_p3 PARTITION OF prt1_l FOR VALUES START (500) END
> (600) PARTITION BY RANGE ((b + a));
> ERROR:  cannot use column or expression from ancestor partition key
> 
> The last statement is trying to create subpartitions by range (b + a),
> which contains a partition key from ancestor partition key but is not
> exactly same as that. In fact it contains some extra columns other
> than the ancestor partition key columns. Why do we want to prohibit
> such cases?

Per discussion [1], I am going to remove this ill-considered restriction.

Thanks,
Amit

[1]
https://www.postgresql.org/message-id/CA%2BHiwqEXAU_m%2BV%3Db-VGmsDNjoqc-Z_9KQdyPuOGbiQGzNObmVg%40mail.gmail.com






Re: [HACKERS] Declarative partitioning - another take

2016-09-22 Thread Ashutosh Bapat
On Thu, Sep 22, 2016 at 1:02 PM, Ashutosh Bapat
 wrote:
> For list partitions, the ListInfo stores the index maps for values
> i.e. the index of the partition to which the value belongs. Those
> indexes are same as the indexes in partition OIDs array and come from
> the catalogs. In case a user creates two partitioned tables with
> exactly same lists for partitions but specifies them in a different
> order, the OIDs are stored in the order specified. This means that
> index array for these tables come out different. equal_list_info()
> works around that by creating an array of mappings and checks whether
> that mapping is consistent for all values. This means we will create
> the mapping as many times as equal_list_info() is called, which is
> expected to be more than the number of times
> RelationBuildPartitionDescriptor() is called. Instead, if we
> "canonicalise" the indexes so that they come out exactly same for
> similarly partitioned tables, we build the mapping only once and
> arrange OIDs accordingly.
>
> Here's a patch to do that. I have run make check with this and it didn't
> show any failure. Please consider this to be included in your next set
> of patches.

The patch has an if condition as statement by itself
+if (l1->null_index != l2->null_index);
 return false;

There shouldn't be ';' at the end. It looks like in the tests you have
added the function always bails out before it reaches this statement.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company




Re: [HACKERS] Declarative partitioning - another take

2016-09-22 Thread Ashutosh Bapat
For list partitions, the ListInfo stores the index maps for values
i.e. the index of the partition to which the value belongs. Those
indexes are same as the indexes in partition OIDs array and come from
the catalogs. In case a user creates two partitioned tables with
exactly same lists for partitions but specifies them in a different
order, the OIDs are stored in the order specified. This means that
index array for these tables come out different. equal_list_info()
works around that by creating an array of mappings and checks whether
that mapping is consistent for all values. This means we will create
the mapping as many times as equal_list_info() is called, which is
expected to be more than the number of times
RelationBuildPartitionDescriptor() is called. Instead, if we
"canonicalise" the indexes so that they come out exactly same for
similarly partitioned tables, we build the mapping only once and
arrange OIDs accordingly.

Here's a patch to do that. I have run make check with this and it didn't
show any failure. Please consider this to be included in your next set
of patches.

That helps partition-wise join as well. For partition-wise join (and
further optimizations for partitioned tables), we create a list of
canonical partition schemes. In this list two similarly partitioned
tables share partition scheme pointer. A join between relations with
same partition scheme pointer can be joined partition-wise. It's
important that the indexes in the partition scheme match the OIDs array
to find matching RelOptInfos for partition-wise join.

On Thu, Sep 22, 2016 at 11:12 AM, Ashutosh Bapat
 wrote:
> Hi Amit,
> The following sequence of DDLs gets an error
> --
> -- multi-leveled partitions
> --
> CREATE TABLE prt1_l (a int, b int, c varchar) PARTITION BY RANGE(a);
> CREATE TABLE prt1_l_p1 PARTITION OF prt1_l FOR VALUES START (0) END
> (250) PARTITION BY RANGE (b);
> CREATE TABLE prt1_l_p1_p1 PARTITION OF prt1_l_p1 FOR VALUES START (0) END 
> (100);
> CREATE TABLE prt1_l_p1_p2 PARTITION OF prt1_l_p1 FOR VALUES START
> (100) END (250);
> CREATE TABLE prt1_l_p2 PARTITION OF prt1_l FOR VALUES START (250) END
> (500) PARTITION BY RANGE (c);
> CREATE TABLE prt1_l_p2_p1 PARTITION OF prt1_l_p2 FOR VALUES START
> ('0250') END ('0400');
> CREATE TABLE prt1_l_p2_p2 PARTITION OF prt1_l_p2 FOR VALUES START
> ('0400') END ('0500');
> CREATE TABLE prt1_l_p3 PARTITION OF prt1_l FOR VALUES START (500) END
> (600) PARTITION BY RANGE ((b + a));
> ERROR:  cannot use column or expression from ancestor partition key
>
> The last statement is trying to create subpartitions by range (b + a),
> which contains a partition key from ancestor partition key but is not
> exactly same as that. In fact it contains some extra columns other
> than the ancestor partition key columns. Why do we want to prohibit
> such cases?
>
> On Thu, Sep 15, 2016 at 2:23 PM, Amit Langote
>  wrote:
>>
>> Hi
>>
>> On 2016/09/09 17:55, Amit Langote wrote:
>>> On 2016/09/06 22:04, Amit Langote wrote:
 Will fix.
>>>
>>> Here is an updated set of patches.
>>
>> An email from Rajkumar somehow managed to break out of this thread.
>> Quoting his message below so that I don't end up replying with patches on
>> two different threads.
>>
>> On 2016/09/14 16:58, Rajkumar Raghuwanshi wrote:
>>> I have Continued with testing declarative partitioning with the latest
>>> patch. Got some more observation, given below
>>
>> Thanks a lot for testing.
>>
>>> -- Observation 1 : Getting overlap error with START with EXCLUSIVE in range
>>> partition.
>>>
>>> create table test_range_bound ( a int) partition by range(a);
>>> --creating a partition to contain records {1,2,3,4}, by default 1 is
>>> inclusive and 5 is exclusive
>>> create table test_range_bound_p1 partition of test_range_bound for values
>>> start (1) end (5);
>>> --now trying to create a partition by explicitly mentioning start is
>>> exclusive to contain records {5,6,7}, here trying to create with START with
>>> 4 as exclusive so range should be 5 to 8, but getting partition overlap
>>> error.
>>> create table test_range_bound_p2 partition of test_range_bound for values
>>> start (4) EXCLUSIVE end (8);
>>> ERROR:  partition "test_range_bound_p2" would overlap partition
>>> "test_range_bound_p1"
>>
>> Wow, this is bad.  What is needed in this case is "canonicalization" of
>> the range partition bounds specified in the command.  Range types do this
>> and hence an equivalent test done with range type values would disagree
>> with the result given by the patch for range partition bounds.
>>
>> select '[1,5)'::int4range && '(4,8]'::int4range as cmp;
>>  cmp
>> -
>>  f
>> (1 row)
>>
>> In this case, the second range is converted into its equivalent canonical
>> form viz. '[5, 9)'.  Then comparison of bounds 5) and [5 can tell that the
>> ranges do not overlap after all.  Range type operators can do this because
>> their code can rely on the availability of a 

Re: [HACKERS] Declarative partitioning - another take

2016-09-21 Thread Ashutosh Bapat
Hi Amit,
The following sequence of DDLs gets an error
--
-- multi-leveled partitions
--
CREATE TABLE prt1_l (a int, b int, c varchar) PARTITION BY RANGE(a);
CREATE TABLE prt1_l_p1 PARTITION OF prt1_l FOR VALUES START (0) END
(250) PARTITION BY RANGE (b);
CREATE TABLE prt1_l_p1_p1 PARTITION OF prt1_l_p1 FOR VALUES START (0) END (100);
CREATE TABLE prt1_l_p1_p2 PARTITION OF prt1_l_p1 FOR VALUES START
(100) END (250);
CREATE TABLE prt1_l_p2 PARTITION OF prt1_l FOR VALUES START (250) END
(500) PARTITION BY RANGE (c);
CREATE TABLE prt1_l_p2_p1 PARTITION OF prt1_l_p2 FOR VALUES START
('0250') END ('0400');
CREATE TABLE prt1_l_p2_p2 PARTITION OF prt1_l_p2 FOR VALUES START
('0400') END ('0500');
CREATE TABLE prt1_l_p3 PARTITION OF prt1_l FOR VALUES START (500) END
(600) PARTITION BY RANGE ((b + a));
ERROR:  cannot use column or expression from ancestor partition key

The last statement is trying to create subpartitions by range (b + a),
which contains a partition key from ancestor partition key but is not
exactly same as that. In fact it contains some extra columns other
than the ancestor partition key columns. Why do we want to prohibit
such cases?

On Thu, Sep 15, 2016 at 2:23 PM, Amit Langote
 wrote:
>
> Hi
>
> On 2016/09/09 17:55, Amit Langote wrote:
>> On 2016/09/06 22:04, Amit Langote wrote:
>>> Will fix.
>>
>> Here is an updated set of patches.
>
> An email from Rajkumar somehow managed to break out of this thread.
> Quoting his message below so that I don't end up replying with patches on
> two different threads.
>
> On 2016/09/14 16:58, Rajkumar Raghuwanshi wrote:
>> I have Continued with testing declarative partitioning with the latest
>> patch. Got some more observation, given below
>
> Thanks a lot for testing.
>
>> -- Observation 1 : Getting overlap error with START with EXCLUSIVE in range
>> partition.
>>
>> create table test_range_bound ( a int) partition by range(a);
>> --creating a partition to contain records {1,2,3,4}, by default 1 is
>> inclusive and 5 is exclusive
>> create table test_range_bound_p1 partition of test_range_bound for values
>> start (1) end (5);
>> --now trying to create a partition by explicitly mentioning start is
>> exclusive to contain records {5,6,7}, here trying to create with START with
>> 4 as exclusive so range should be 5 to 8, but getting partition overlap
>> error.
>> create table test_range_bound_p2 partition of test_range_bound for values
>> start (4) EXCLUSIVE end (8);
>> ERROR:  partition "test_range_bound_p2" would overlap partition
>> "test_range_bound_p1"
>
> Wow, this is bad.  What is needed in this case is "canonicalization" of
> the range partition bounds specified in the command.  Range types do this
> and hence an equivalent test done with range type values would disagree
> with the result given by the patch for range partition bounds.
>
> select '[1,5)'::int4range && '(4,8]'::int4range as cmp;
>  cmp
> -
>  f
> (1 row)
>
> In this case, the second range is converted into its equivalent canonical
> form viz. '[5, 9)'.  Then comparison of bounds 5) and [5 can tell that the
> ranges do not overlap after all.  Range type operators can do this because
> their code can rely on the availability of a canonicalization function for
> a given range type.  From the range types documentation:
>
> """
> A discrete range type should have a canonicalization function that is
> aware of the desired step size for the element type. The canonicalization
> function is charged with converting equivalent values of the range type to
> have identical representations, in particular consistently inclusive or
> exclusive bounds. If a canonicalization function is not specified, then
> ranges with different formatting will always be treated as unequal, even
> though they might represent the same set of values in reality.
> """
>
> to extend the last sentence:
>
> "... or consider two ranges overlapping when in reality they are not
> (maybe they are really just adjacent)."
>
> Within the code handling range partition bounds, no such canonicalization
> happens, so comparing the bounds 5) and (4 ends up concluding that
> upper1 > lower2, hence the ranges overlap.
>
> To mitigate this, how about we restrict range partition key to contain
> columns of only those types for which we know we can safely canonicalize a
> range bound (ie, discrete range types)?  I don't think we can use, say,
> existing int4range_canonical but will have to write a version of it for
> partitioning usage (range bounds of partitions are different from what
> int4range_canonical is ready to handle).  This approach will be very
> limiting as then range partitions will be limited to columns of int,
> bigint and date type only.
>
> One more option is we let the user specify the canonicalize function next
> to the column name when defining the partition key.  If not specified, we
> hard-code one for the types for which we will be implementing a
> canonicalize function (ie, above mentioned types).  In 

Re: [HACKERS] Declarative partitioning - another take

2016-09-20 Thread Robert Haas
On Thu, Sep 15, 2016 at 4:53 AM, Amit Langote
 wrote:
> [ new patches ]

Re-reviewing 0001.

+  partexprs
+  pg_node_tree

This documentation doesn't match pg_partitioned_table.h, which has
partexprsrc and partexprbin.  I don't understand why it's a good idea
to have both, and there seem to be no comments or documentation
supporting that choice anywhere.

+  The optional PARTITION BY clause specifies a method of
+  partitioning the table and the corresponding partition key.  Table
+  thus created is called partitioned table.  Key
+  consists of an ordered list of column names and/or expressions when
+  using the RANGE method, whereas only a single column or
+  expression can be specified when using the LIST method.
+  The type of a key column or an expression must have an associated
+  btree operator class or one must be specified along with the column
+  or the expression.

Both of the sentences in this paragraph that do not begin with "the"
need to begin with "the".  (In my experience, it's generally a feature
of English as spoken in India that connecting words like "the" and "a"
are sometimes left out where non-Indian speakers of English would
include them, so it would be good to watch out for this issue in
general.)

Also, I think this should be rephrased a bit to be more clear about
how the partitioning key works, like this: The optional
PARTITION BY clause specifies a method of
partitioning the table.  The table thus created is called a
partitioned table.  The parenthesized list of
expressions forms the partitioning key for the
table.  When using range partitioning, the partioning key can include
multiple columns or expressions, but for list partitioning, the
partitioning key must consist of a single column or expression.  If no
btree operator class is specified when creating a partitioned table,
the default btree operator class for the datatype will be used.  If
there is none, an error will be reported.

+case RELKIND_PARTITIONED_TABLE:
 options = heap_reloptions(classForm->relkind, datum, false);

Why?  None of the reloptions that pertain to heap seem relevant to a
relkind without storage.

But, ah, do partitioned tables have storage?  I mean, do we end up
with an empty file, or no relfilenode at all?  Can I CLUSTER, VACUUM,
etc. a partitioned table?  It would seem cleaner for the parent to
have no relfilenode at all, but I'm guessing we might need some more
changes for that to work out.

+pg_collation.h pg_range.h pg_transform.h pg_partitioned_table.h\

Whitespace.  Also, here and elsewhere, how about using alphabetical
order, or anyway preserving it insofar as the existing list is
alphabetized?

+/* Remove NO INHERIT flag if rel is a partitioned table */
+if (is_no_inherit &&
+rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
+ereport(ERROR,
+(errcode(ERRCODE_INVALID_TABLE_DEFINITION),
+ errmsg("cannot add NO INHERIT constraint to
partitioned table \"%s\"",
+ RelationGetRelationName(rel))));

The code and the comment disagree.  I think the code is right and the
comment should be adjusted to say something like /* Partitioned tables
do not have storage, so a NO INHERIT constraint makes no sense. */

+ * IDENTIFICATION
+ *src/backend/utils/misc/partition.c

Wrong.

+} KeyTypeCollInfo;

I don't like this type name very much.  Can we get "Part" in there someplace?

It doesn't seem to be very well-designed, either.  The number of
entries in each array is determined by the partnatts flag in
PartitionKeyData, which has also got various other arrays whose
lengths are determined by partnatts.  Why do we have some arrays in
one structure and some arrays in another structure?  Would it hurt
anything to merge everything into one structure?  Or could
PartitionKeyData include a field of type KeyTypeCollInfo rather than
KeyTypeCollInfo *, saving one pointer reference every place we access
this data?

+/* Allocate in the supposedly short-lived working context */

Why supposedly?

+datum = fastgetattr(tuple, Anum_pg_partitioned_table_partattrs,
+RelationGetDescr(catalog),
+&isnull);

Isn't the point of putting the fixed-length fields first that we can
use GETSTRUCT() here?  And even for partattrs as the first
variable-length thing?

+/*
+ * Run the expressions through eval_const_expressions. This is
+ * not just an optimization, but is necessary, because eventually
+ * the planner will be comparing them to similarly-processed qual
+ * clauses, and may fail to detect valid matches without this.
+ * We don't bother with canonicalize_qual, however.
+ */

I'm a bit confused by this, because I would think this processing
ought to have been done before storing anything in the system
catalogs.  I don't see why it should be necessary 

Re: [HACKERS] Declarative partitioning - another take

2016-09-19 Thread Robert Haas
On Mon, Aug 15, 2016 at 7:21 AM, Amit Langote
 wrote:
>> +if (partexprs)
>> +recordDependencyOnSingleRelExpr(&myself,
>> +(Node *) partexprs,
>> +RelationGetRelid(rel),
>> +DEPENDENCY_NORMAL,
>> +DEPENDENCY_IGNORE);
>>
>> I don't think introducing a new DEPENDENCY_IGNORE type is a good idea
>> here.  Instead, you could just add an additional Boolean argument to
>> recordDependencyOnSingleRelExpr.  That seems less likely to create
>> bugs in unrelated portions of the code.
>
> I did consider a Boolean argument instead of a new DependencyType value,
> however it felt a bit strange to pass a valid value for the fifth argument
> (self_behavior) and then ask using a separate parameter that it (a
> self-dependency) is to be ignored.  By the way, no pg_depend entry is
> created on such a call, so the effect of the new type's usage seems
> localized to me. Thoughts?

I think that's not a very plausible argument.  If you add a fifth
argument to that function, then only that function needs to know about
the possibility of ignoring self-dependencies.  If you add a new
dependency type, then everything that knows about DependencyType needs
to know about them.  That's got to be a much larger surface area for
bugs.  Also, if you look around a bit, I believe you will find other
examples of cases where one argument is used only for certain values
of some other argument.  That's not a novel design pattern.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Declarative partitioning - another take

2016-09-18 Thread Amit Langote
On Thu, Sep 15, 2016 at 10:07 PM, Robert Haas  wrote:
> On Thu, Sep 15, 2016 at 4:53 AM, Amit Langote
>  wrote:
>> Wow, this is bad.  What is needed in this case is "canonicalization" of
>> the range partition bounds specified in the command.
>
> I think we shouldn't worry about this.  It seems like unnecessary
> scope creep.  All human beings capable of using PostgreSQL will
> understand that there are no integers between 4 and 5, so any
> practical impact on this would be on someone creating partitions
> automatically.  But if someone is creating partitions automatically
> they are highly likely to be using the same EXCLUSIVE/INCLUSIVE
> settings for all of the partitions, in which case this won't arise.
> And if they aren't, then I think we should just make them deal with
> this limitation in their code instead of dealing with it in our code.
> This patch is plenty complicated enough already; introducing a whole
> new canonicalization concept that will help practically nobody seems
> to me to be going in the wrong direction.  If somebody really cares
> enough to want to try to fix this, they can submit a followup patch
> someday.
>
>> To mitigate this, how about we restrict range partition key to contain
>> columns of only those types for which we know we can safely canonicalize a
>> range bound (ie, discrete range types)?  I don't think we can use, say,
>> existing int4range_canonical but will have to write a version of it for
>> partitioning usage (range bounds of partitions are different from what
>> int4range_canonical is ready to handle).  This approach will be very
>> limiting as then range partitions will be limited to columns of int,
>> bigint and date type only.
>
> -1.  That is letting the tail wag the dog.  Let's leave it the way you
> had it and be happy.

Alright, let's leave this as something to work out later.  We will
have to document the fact that such a limitation exists though, I'd
think.
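
For reference, the canonicalization being deferred here is what the built-in
range types already perform; a quick illustration with plain SQL (independent
of the patch):

select int4range(4, 8, '(]');                          -- canonical form: [5,9)
select '[1,5)'::int4range && int4range(4, 8, '(]');    -- f, i.e. no overlap

Without an equivalent step for partition bounds, the new bound START (4)
EXCLUSIVE is compared as-is against the existing partition's END (5), and
since 5 > 4 the partitions are reported as overlapping.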

>>> -- Observation 2 : able to create sub-partition out of the range set for
>>> main table, causing not able to insert data satisfying any of the partition.
>>>
>>> create table test_subpart (c1 int) partition by range (c1);
>>> create table test_subpart_p1 partition of test_subpart for values start (1)
>>> end (100) inclusive partition by range (c1);
>>> create table test_subpart_p1_sub1 partition of test_subpart_p1 for values
>>> start (101) end (200);
>>
>> It seems that DDL should prevent the same column being used in partition
>> key of lower level partitions.  I don't know how much sense it would make,
>> but being able to use the same column as partition key of lower level
>> partitions may be a feature useful to some users if they know what they
>> are doing.  But this last part doesn't sound like a good thing.  I
>> modified the patch such that lower level partitions cannot use columns
>> used by ancestor tables.
>
> I again disagree.  If somebody defines partition bounds that make it
> impossible to insert the data that they care about, that's operator
> error.  The fact that, across multiple levels, you can manage to make
> it impossible to insert any data at all does not make it anything
> other than operator error.  If we take the contrary position that it's
> the system's job to prevent this sort of thing, we may disallow some
> useful cases, like partitioning by the year portion of a date and then
> subpartitioning by the month portion of that same date.
>
> I think we'll probably also find that we're making the code
> complicated to no purpose.  For example, now you have to check when
> attaching a partition that it doesn't violate the rule; otherwise you
> end up with a table that can't be created directly (and thus can't
> survive dump-and-restore) but can be created indirectly by exploiting
> loopholes in the checks.  It's tempting to think that we can check
> simple cases - e.g. if the parent and the child are partitioning on
> the same exact column, the child's range should be contained within
> the parent's range - but more complicated cases are tricky.  Suppose
> the table is range-partitioned on (a, b) and range-subpartitioned on
> b.  It's not trivial to figure out whether the set of values that the
> user can insert into that partition is non-empty.  If we allow
> partitioning on expressions, then it quickly becomes altogether
> impossible to deduce anything useful - unless you can solve the
> halting problem.
>
> And, really, why do we care?  If the user creates a partitioning
> scheme that permits no rows in some or all of the partitions, then
> they will have an empty table that can be correctly dumped and
> restored but which cannot be used for anything useful unless it is
> modified first.  They probably don't want that, but it's not any more
> broken than a table inheritance setup with mutually exclusive CHECK
> constraints, or for that matter a plain table with mutually exclusive
> CHECK constraints - and we don't 

Re: [HACKERS] Declarative partitioning - another take

2016-09-15 Thread Robert Haas
On Thu, Sep 15, 2016 at 4:53 AM, Amit Langote
 wrote:
> Wow, this is bad.  What is needed in this case is "canonicalization" of
> the range partition bounds specified in the command.

I think we shouldn't worry about this.  It seems like unnecessary
scope creep.  All human beings capable of using PostgreSQL will
understand that there are no integers between 4 and 5, so any
practical impact on this would be on someone creating partitions
automatically.  But if someone is creating partitions automatically
they are highly likely to be using the same EXCLUSIVE/INCLUSIVE
settings for all of the partitions, in which case this won't arise.
And if they aren't, then I think we should just make them deal with
this limitation in their code instead of dealing with it in our code.
This patch is plenty complicated enough already; introducing a whole
new canonicalization concept that will help practically nobody seems
to me to be going in the wrong direction.  If somebody really cares
enough to want to try to fix this, they can submit a followup patch
someday.
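
For illustration, with the same (default) INCLUSIVE/EXCLUSIVE settings on every
partition the problem does not arise (a sketch using the syntax proposed in
this thread; the names are made up):

create table tr (a int) partition by range (a);
-- default bounds: START inclusive, END exclusive
create table tr_p1 partition of tr for values start (1) end (5);
create table tr_p2 partition of tr for values start (5) end (9);
-- adjacent and non-overlapping, with no canonicalization needed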

> To mitigate this, how about we restrict range partition key to contain
> columns of only those types for which we know we can safely canonicalize a
> range bound (ie, discrete range types)?  I don't think we can use, say,
> existing int4range_canonical but will have to write a version of it for
> partitioning usage (range bounds of partitions are different from what
> int4range_canonical is ready to handle).  This approach will be very
> limiting as then range partitions will be limited to columns of int,
> bigint and date type only.

-1.  That is letting the tail wag the dog.  Let's leave it the way you
had it and be happy.

>> -- Observation 2 : able to create sub-partition out of the range set for
>> main table, causing not able to insert data satisfying any of the partition.
>>
>> create table test_subpart (c1 int) partition by range (c1);
>> create table test_subpart_p1 partition of test_subpart for values start (1)
>> end (100) inclusive partition by range (c1);
>> create table test_subpart_p1_sub1 partition of test_subpart_p1 for values
>> start (101) end (200);
>
> It seems that DDL should prevent the same column being used in partition
> key of lower level partitions.  I don't know how much sense it would make,
> but being able to use the same column as partition key of lower level
> partitions may be a feature useful to some users if they know what they
> are doing.  But this last part doesn't sound like a good thing.  I
> modified the patch such that lower level partitions cannot use columns
> used by ancestor tables.

I again disagree.  If somebody defines partition bounds that make it
impossible to insert the data that they care about, that's operator
error.  The fact that, across multiple levels, you can manage to make
it impossible to insert any data at all does not make it anything
other than operator error.  If we take the contrary position that it's
the system's job to prevent this sort of thing, we may disallow some
useful cases, like partitioning by the year portion of a date and then
subpartitioning by the month portion of that same date.
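
For illustration, that useful case might look like this (a sketch using the
syntax proposed in this thread; the names, bounds, and use of date_part() are
made up):

CREATE TABLE logs (d date, payload text)
    PARTITION BY RANGE (date_part('year', d));
CREATE TABLE logs_2016 PARTITION OF logs
    FOR VALUES START (2016) END (2017)
    PARTITION BY RANGE (date_part('month', d));
CREATE TABLE logs_2016_01 PARTITION OF logs_2016
    FOR VALUES START (1) END (2);

-- Under the restriction being discussed, the second statement would be
-- rejected, because d already figures in the ancestor's partition key.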

I think we'll probably also find that we're making the code
complicated to no purpose.  For example, now you have to check when
attaching a partition that it doesn't violate the rule; otherwise you
end up with a table that can't be created directly (and thus can't
survive dump-and-restore) but can be created indirectly by exploiting
loopholes in the checks.  It's tempting to think that we can check
simple cases - e.g. if the parent and the child are partitioning on
the same exact column, the child's range should be contained within
the parent's range - but more complicated cases are tricky.  Suppose
the table is range-partitioned on (a, b) and range-subpartitioned on
b.  It's not trivial to figure out whether the set of values that the
user can insert into that partition is non-empty.  If we allow
partitioning on expressions, then it quickly becomes altogether
impossible to deduce anything useful - unless you can solve the
halting problem.

And, really, why do we care?  If the user creates a partitioning
scheme that permits no rows in some or all of the partitions, then
they will have an empty table that can be correctly dumped and
restored but which cannot be used for anything useful unless it is
modified first.  They probably don't want that, but it's not any more
broken than a table inheritance setup with mutually exclusive CHECK
constraints, or for that matter a plain table with mutually exclusive
CHECK constraints - and we don't try to prohibit those things.  This
patch is supposed to be implementing partitioning, not artificial
intelligence.

>> -- Observation 3 : Getting cache lookup failed, when selecting list
>> partition table containing array.
>>
>> postgres=# SELECT tableoid::regclass,* FROM test_array;
>> ERROR:  cache lookup failed for type 0
>
> That's a bug.  

Re: [HACKERS] Declarative partitioning - another take

2016-09-15 Thread Ashutosh Bapat
Hi Amit,

It looks like there is some problem while creating parameterized paths
for multi-level partitioned tables. Here's a longish test case:

CREATE TABLE prt1_l (a int, b int, c varchar) PARTITION BY RANGE(a);
CREATE TABLE prt1_l_p1 PARTITION OF prt1_l FOR VALUES START (0) END
(250) PARTITION BY RANGE (b);
CREATE TABLE prt1_l_p1_p1 PARTITION OF prt1_l_p1 FOR VALUES START (0) END (100);
CREATE TABLE prt1_l_p1_p2 PARTITION OF prt1_l_p1 FOR VALUES START
(100) END (250);
CREATE TABLE prt1_l_p2 PARTITION OF prt1_l FOR VALUES START (250) END
(500) PARTITION BY RANGE (c);
CREATE TABLE prt1_l_p2_p1 PARTITION OF prt1_l_p2 FOR VALUES START
('0250') END ('0400');
CREATE TABLE prt1_l_p2_p2 PARTITION OF prt1_l_p2 FOR VALUES START
('0400') END ('0500');
CREATE TABLE prt1_l_p3 PARTITION OF prt1_l FOR VALUES START (500) END
(600) PARTITION BY RANGE ((b + a));
CREATE TABLE prt1_l_p3_p1 PARTITION OF prt1_l_p3 FOR VALUES START
(1000) END (1100);
CREATE TABLE prt1_l_p3_p2 PARTITION OF prt1_l_p3 FOR VALUES START
(1100) END (1200);
INSERT INTO prt1_l SELECT i, i, to_char(i, 'FM') FROM
generate_series(0, 599, 2) i;
CREATE TABLE uprt1_l AS SELECT * FROM prt1_l;

CREATE TABLE prt2_l (a int, b int, c varchar) PARTITION BY RANGE(b);
CREATE TABLE prt2_l_p1 PARTITION OF prt2_l FOR VALUES START (0) END
(250) PARTITION BY RANGE (a);
CREATE TABLE prt2_l_p1_p1 PARTITION OF prt2_l_p1 FOR VALUES START (0) END (100);
CREATE TABLE prt2_l_p1_p2 PARTITION OF prt2_l_p1 FOR VALUES START
(100) END (250);
CREATE TABLE prt2_l_p2 PARTITION OF prt2_l FOR VALUES START (250) END
(500) PARTITION BY RANGE (c);
CREATE TABLE prt2_l_p2_p1 PARTITION OF prt2_l_p2 FOR VALUES START
('0250') END ('0400');
CREATE TABLE prt2_l_p2_p2 PARTITION OF prt2_l_p2 FOR VALUES START
('0400') END ('0500');
CREATE TABLE prt2_l_p3 PARTITION OF prt2_l FOR VALUES START (500) END
(600) PARTITION BY RANGE ((a + b));
CREATE TABLE prt2_l_p3_p1 PARTITION OF prt2_l_p3 FOR VALUES START
(1000) END (1100);
CREATE TABLE prt2_l_p3_p2 PARTITION OF prt2_l_p3 FOR VALUES START
(1100) END (1200);
INSERT INTO prt2_l SELECT i, i, to_char(i, 'FM') FROM
generate_series(0, 599, 3) i;
CREATE TABLE uprt2_l AS SELECT * FROM prt2_l;

EXPLAIN (VERBOSE, COSTS OFF)
SELECT * FROM prt1_l t1 LEFT JOIN LATERAL
  (SELECT t2.a AS t2a, t2.c AS t2c, t2.b AS t2b, t3.a AS
t3a, least(t1.a,t2.a,t3.a) FROM prt1_l t2 JOIN prt2_l t3 ON (t2.a =
t3.b AND t2.b = t3.a AND t2.c = t3.c AND t2.b + t2.a = t3.a + t3.b))
ss
  ON t1.a = ss.t2a AND t1.b = ss.t2a AND t1.c = ss.t2c AND
t1.b + t1.a = ss.t2a + ss.t2b WHERE t1.a % 25 = 0 ORDER BY t1.a;
ERROR:  could not devise a query plan for the given query

Let's replace the laterally referenced relation by an unpartitioned table.

EXPLAIN (VERBOSE, COSTS OFF)
SELECT * FROM uprt1_l t1 LEFT JOIN LATERAL
  (SELECT t2.a AS t2a, t2.c AS t2c, t2.b AS t2b, t3.a AS
t3a, least(t1.a,t2.a,t3.a) FROM prt1_l t2 JOIN prt2_l t3 ON (t2.a =
t3.b AND t2.b = t3.a AND t2.c = t3.c AND t2.b + t2.a = t3.a + t3.b))
ss
  ON t1.a = ss.t2a AND t1.b = ss.t2a AND t1.c = ss.t2c AND
t1.b + t1.a = ss.t2a + ss.t2b WHERE t1.a % 25 = 0 ORDER BY t1.a;
ERROR:  could not devise a query plan for the given query

Let's replace another partitioned table in the inner query with an
unpartitioned table.

EXPLAIN (VERBOSE, COSTS OFF)
SELECT * FROM uprt1_l t1 LEFT JOIN LATERAL
  (SELECT t2.a AS t2a, t2.c AS t2c, t2.b AS t2b, t3.a AS
t3a, least(t1.a,t2.a,t3.a) FROM prt1_l t2 JOIN uprt2_l t3 ON (t2.a =
t3.b AND t2.b = t3.a AND t2.c = t3.c AND t2.b + t2.a = t3.a + t3.b))
ss
  ON t1.a = ss.t2a AND t1.b = ss.t2a AND t1.c = ss.t2c AND
t1.b + t1.a = ss.t2a + ss.t2b WHERE t1.a % 25 = 0 ORDER BY t1.a;
ERROR:  could not devise a query plan for the given query

Please check if you are able to reproduce these errors in your
repository. I made sure that I cleaned up all partition-wise join code
before testing this, but ... .

I tried to debug the problem somewhat. In set_append_rel_pathlist(),
it finds that at least one child has a parameterized path as the
cheapest path, so it doesn't create an unparameterized path for the
append rel. At the same time there is no parameterization common to all
the children, so it doesn't create any parameterized path either. There
seem to be two problems here:
1. The children from second level onwards may not be getting
parameterized for lateral references. That seems unlikely but
possible.
2. Reparameterization should have corrected this, but
reparameterize_path() does not support AppendPaths.
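
For what it's worth, a rough sketch of what teaching reparameterize_path()
about AppendPaths might look like -- this is not a posted patch, and the
create_append_path() signature is assumed from the 9.6 sources, so treat it
as illustrative only.  It would be one more case in that function's switch:

        case T_Append:
            {
                AppendPath *apath = (AppendPath *) path;
                List       *childpaths = NIL;
                ListCell   *lc;

                /* Reparameterize every child; give up if any child cannot be. */
                foreach(lc, apath->subpaths)
                {
                    Path       *rchild = reparameterize_path(root,
                                                             (Path *) lfirst(lc),
                                                             required_outer,
                                                             loop_count);

                    if (rchild == NULL)
                        return NULL;
                    childpaths = lappend(childpaths, rchild);
                }
                return (Path *) create_append_path(rel, childpaths,
                                                   required_outer, 0);
            }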

On Thu, Sep 15, 2016 at 2:23 PM, Amit Langote
 wrote:
>
> Hi
>
> On 2016/09/09 17:55, Amit Langote wrote:
>> On 2016/09/06 22:04, Amit Langote wrote:
>>> Will fix.
>>
>> Here is an updated set of patches.
>
> An email from Rajkumar somehow managed to break out of this thread.
> Quoting his message below so that I don't end up replying with patches on
> two different threads.
>
> On 2016/09/14 16:58, Rajkumar Raghuwanshi wrote:
>> I 

Re: [HACKERS] Declarative partitioning - another take

2016-09-14 Thread Rajkumar Raghuwanshi
I have continued testing declarative partitioning with the latest
patch. Some more observations are given below.

-- Observation 1 : Getting overlap error with START with EXCLUSIVE in range
partition.

create table test_range_bound ( a int) partition by range(a);
--creating a partition to contain records {1,2,3,4}, by default 1 is
inclusive and 5 is exclusive
create table test_range_bound_p1 partition of test_range_bound for values
start (1) end (5);
--now trying to create a partition to contain records {5,6,7} by explicitly
making the start exclusive: with START (4) EXCLUSIVE and END (8) the range
should cover 5 through 7, but getting a partition overlap error.
create table test_range_bound_p2 partition of test_range_bound for values
start (4) EXCLUSIVE end (8);
ERROR:  partition "test_range_bound_p2" would overlap partition
"test_range_bound_p1"

-- Observation 2 : able to create sub-partition out of the range set for
main table, causing not able to insert data satisfying any of the partition.

create table test_subpart (c1 int) partition by range (c1);
create table test_subpart_p1 partition of test_subpart for values start (1)
end (100) inclusive partition by range (c1);
create table test_subpart_p1_sub1 partition of test_subpart_p1 for values
start (101) end (200);

\d+ test_subpart
 Table "public.test_subpart"
 Column |  Type   | Modifiers | Storage | Stats target | Description
+-+---+-+--+-
 c1 | integer |   | plain   |  |
Partition Key: RANGE (c1)
Partitions: test_subpart_p1 FOR VALUES START (1) END (100) INCLUSIVE

\d+ test_subpart_p1
   Table "public.test_subpart_p1"
 Column |  Type   | Modifiers | Storage | Stats target | Description
+-+---+-+--+-
 c1 | integer |   | plain   |  |
Partition Of: test_subpart FOR VALUES START (1) END (100) INCLUSIVE
Partition Key: RANGE (c1)
Partitions: test_subpart_p1_sub1 FOR VALUES START (101) END (200)

insert into test_subpart values (50);
ERROR:  no partition of relation "test_subpart_p1" found for row
DETAIL:  Failing row contains (50).
insert into test_subpart values (150);
ERROR:  no partition of relation "test_subpart" found for row
DETAIL:  Failing row contains (150).

-- Observation 3 : Getting cache lookup failed, when selecting list
partition table containing array.

CREATE TABLE test_array ( i int,j int[],k text[]) PARTITION BY LIST (j);
CREATE TABLE test_array_p1 PARTITION OF test_array FOR VALUES IN ('{1}');
CREATE TABLE test_array_p2 PARTITION OF test_array FOR VALUES IN ('{2,2}');

INSERT INTO test_array (i,j[1],k[1]) VALUES (1,1,1);
INSERT INTO test_array (i,j[1],j[2],k[1]) VALUES (2,2,2,2);

postgres=# SELECT tableoid::regclass,* FROM test_array_p1;
   tableoid    | i |  j  |  k  
---------------+---+-----+-----
 test_array_p1 | 1 | {1} | {1}
(1 row)

postgres=# SELECT tableoid::regclass,* FROM test_array_p2;
   tableoid    | i |   j   |  k  
---------------+---+-------+-----
 test_array_p2 | 2 | {2,2} | {2}
(1 row)

postgres=# SELECT tableoid::regclass,* FROM test_array;
ERROR:  cache lookup failed for type 0

Thanks & Regards,
Rajkumar Raghuwanshi
QMG, EnterpriseDB Corporation

On Fri, Sep 9, 2016 at 2:25 PM, Amit Langote 
wrote:

> On 2016/09/06 22:04, Amit Langote wrote:
> > Will fix.
>
> Here is an updated set of patches.
>
> In addition to fixing a couple of bugs reported by Ashutosh and Rajkumar,
> there are a few of major changes:
>
> * change the way individual partition bounds are represented internally
>   and the way a collection of partition bounds associated with a
>   partitioned table is exposed to other modules.  Especially list
>   partition bounds which are manipulated more efficiently as discussed
>   at [1].
>
> * \d partitioned_table now shows partition count and \d+ lists partition
>   names and their bounds as follows:
>
> \d t6
>Table "public.t6"
>  Column |       Type        | Modifiers 
> --------+-------------------+-----------
>  a      | integer           | 
>  b      | character varying | 
> Partition Key: LIST (a)
> Number of partitions: 3 (Use \d+ to list them.)
>
> \d+ t6
>Table "public.t6"
>  Column |       Type        | Modifiers | Storage  | Stats target | Description 
> --------+-------------------+-----------+----------+--------------+-------------
>  a      | integer           |           | plain    |              | 
>  b      | character varying |           | extended |              | 
> Partition Key: LIST (a)
> Partitions: t6_p1 FOR VALUES IN (1, 2, NULL),
> t6_p2 FOR VALUES IN (4, 5),
> t6_p3 FOR VALUES IN (3, 6)
>
> \d+ p
>  Table "public.p"
>  Column | Type | Modifiers | Storage  | Stats target | Description
> 

Re: [HACKERS] Declarative partitioning - another take

2016-09-08 Thread Amit Langote
On 2016/09/08 21:38, Rajkumar Raghuwanshi wrote:
> On Wed, Sep 7, 2016 at 3:58 PM, Amit Langote wrote:
>> On 2016/09/07 17:56, Rajkumar Raghuwanshi wrote:
>>>
>>> In this case not sure how to create partition table. Do we have something
>>> like we have UNBOUNDED for range partition or oracle have "DEFAULT" for
>>> list partition.
>>>
>>> create table employee (empid int, dept varchar) partition by list(dept);
>>> create table emp_p1 partition of employee for values in ('IT');
>>> create table emp_p2 partition of employee for values in ('HR');
>>> create table emp_p3 partition of employee for values in (??);
>>
>> Sorry, no such feature is currently offered.  It might be possible to
>> offer something like a "default" list partition which accepts values other
>> than those specified for other existing partitions.  However, that means
>> if we add a non-default list partition after a default one has been
>> created, the implementation must make sure that it moves any values from
>> the default partition that now belong to the newly created partition.
> 
> Thanks for clarifying.  But I could see the same problem of moving data when
> adding a new non-default partition with an unbounded range partition.
> 
> For the example given here, initially I have created a partitioned table test
> with test_p3 having an unbounded end.  Later I tried to change test_p3 to
> contain values 7-9 only, and to add a new partition test_p4 containing 10 to
> unbounded.
> 
> --create partition table and some leafs
> CREATE TABLE test (a int, b int) PARTITION BY RANGE(a);
> CREATE TABLE test_p1 PARTITION OF test FOR VALUES START (1) END (4);
> CREATE TABLE test_p2 PARTITION OF test FOR VALUES START (4) END (7);
> CREATE TABLE test_p3 PARTITION OF test FOR VALUES START (7) END UNBOUNDED;
> 
> --insert some data
> INSERT INTO test SELECT i, i*10 FROM generate_series(1,3) i;
> INSERT INTO test SELECT i, i*10 FROM generate_series(4,6) i;
> INSERT INTO test SELECT i, i*10 FROM generate_series(7,13) i;
> 
> --directly not able to attach test_p4 because of overlap error, hence
> detached test_p3 and then attached test_p4
> SELECT tableoid::regclass,* FROM test;
>  tableoid | a  |  b
> --++-
>  test_p1  |  1 |  10
>  test_p1  |  2 |  20
>  test_p1  |  3 |  30
>  test_p2  |  4 |  40
>  test_p2  |  5 |  50
>  test_p2  |  6 |  60
>  test_p3  |  7 |  70
>  test_p3  |  8 |  80
>  test_p3  |  9 |  90
>  test_p3  | 10 | 100
>  test_p3  | 11 | 110
>  test_p3  | 12 | 120
>  test_p3  | 13 | 130
> (13 rows)
> 
> ALTER TABLE test DETACH PARTITION test_p3;
> CREATE TABLE test_p4 (like test);
> ALTER TABLE test ATTACH PARTITION test_p4 FOR VALUES start (10) end
> UNBOUNDED;
> 
> --now can not attach test_p3 because of overlap with test_p4, causing data
> loss from main test table.
> ALTER TABLE test ATTACH PARTITION test_p3 FOR VALUES start (7) end (10);
> ERROR:  source table contains a row violating partition bound specification
> ALTER TABLE test ATTACH PARTITION test_p3 FOR VALUES start (7) end (13);
> ERROR:  partition "test_p3" would overlap partition "test_p4"

In this particular case, you will have to move any rows in test_p3 with
key >= 10 into the new partition test_p4 using DML (to not lose any
data).  Then attach test_p3 as a partition for values start (7) end (10);
you won't get either of the above errors.
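
Something along these lines, as a minimal sketch of the manual row movement
(test_p3 is still detached at this point, and column a is the partition key):

INSERT INTO test_p4 SELECT * FROM test_p3 WHERE a >= 10;
DELETE FROM test_p3 WHERE a >= 10;
ALTER TABLE test ATTACH PARTITION test_p3 FOR VALUES START (7) END (10);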

Looking forward, what we need, I think, is a split partition command.
Adding a new partition that overlaps the default list partition or an
unbounded range partition could be done by splitting the latter.  Perhaps
something like:

alter table test
  split partition test_p3 at (10) [inclusive | exclusive] with test_p4;

The above command would make test_p3 into 2 partitions:

  test_p3 start (7) end (10) [inclusive | exclusive]
  test_p4 start (10) [exclusive | inclusive] end unbounded

Any rows in test_p3 with key > or >= 10 will be moved into the newly
created test_p4 as part of the execution of this command.

For your list partitioning example:

create table employee (empid int, dept varchar) partition by list(dept);
create table emp_p1 partition of employee for values in ('IT');
create table emp_p2 partition of employee for values in ('HR');
create table emp_p3 partition of employee for values in (default);

alter table employee
  split partition emp_p3 with emp_p3 ('ACCT') emp_p4 (default);

Any rows in emp_p3 with key != 'ACCT' will be moved into the newly created
default partition emp_p4.


But for the time being, I think we could provide the syntax and mechanism for
a default list partition, seeing as we have the same for range-partitioned
tables (namely a range partition with an unbounded start or end), although
with the limitations discussed above.

Thoughts?

Thanks,
Amit






Re: [HACKERS] Declarative partitioning - another take

2016-09-08 Thread Rajkumar Raghuwanshi
On Wed, Sep 7, 2016 at 3:58 PM, Amit Langote 
wrote:

>
> Hi,
>
> On 2016/09/07 17:56, Rajkumar Raghuwanshi wrote:
> > Hi,
> >
> > I have a query regarding list partitioning,
> >
> > For example if I want to store employee data in a table, with "IT" dept
> > employee in emp_p1 partition, "HR" dept employee in emp_p2 partition and
> if
> > employee belongs to other than these two, should come in emp_p3
> partition.
> >
> > In this case not sure how to create partition table. Do we have something
> > like we have UNBOUNDED for range partition or oracle have "DEFAULT" for
> > list partition.
> >
> > create table employee (empid int, dept varchar) partition by list(dept);
> > create table emp_p1 partition of employee for values in ('IT');
> > create table emp_p2 partition of employee for values in ('HR');
> > create table emp_p3 partition of employee for values in (??);
>
> Sorry, no such feature is currently offered.  It might be possible to
> offer something like a "default" list partition which accepts values other
> than those specified for other existing partitions.  However, that means
> if we add a non-default list partition after a default one has been
> created, the implementation must make sure that it moves any values from
> the default partition that now belong to the newly created partition.
>
> Thanks,
> Amit
>

Thanks for clarifying.  But I could see the same problem of moving data when
adding a new non-default partition with an unbounded range partition.

For the example given here, initially I have created a partitioned table test
with test_p3 having an unbounded end.  Later I tried to change test_p3 to
contain values 7-9 only, and to add a new partition test_p4 containing 10 to
unbounded.

--create partition table and some leafs
CREATE TABLE test (a int, b int) PARTITION BY RANGE(a);
CREATE TABLE test_p1 PARTITION OF test FOR VALUES START (1) END (4);
CREATE TABLE test_p2 PARTITION OF test FOR VALUES START (4) END (7);
CREATE TABLE test_p3 PARTITION OF test FOR VALUES START (7) END UNBOUNDED;

--insert some data
INSERT INTO test SELECT i, i*10 FROM generate_series(1,3) i;
INSERT INTO test SELECT i, i*10 FROM generate_series(4,6) i;
INSERT INTO test SELECT i, i*10 FROM generate_series(7,13) i;

--directly not able to attach test_p4 because of overlap error, hence
detached test_p3 and then attached test_p4
SELECT tableoid::regclass,* FROM test;
 tableoid | a  |  b  
----------+----+-----
 test_p1  |  1 |  10
 test_p1  |  2 |  20
 test_p1  |  3 |  30
 test_p2  |  4 |  40
 test_p2  |  5 |  50
 test_p2  |  6 |  60
 test_p3  |  7 |  70
 test_p3  |  8 |  80
 test_p3  |  9 |  90
 test_p3  | 10 | 100
 test_p3  | 11 | 110
 test_p3  | 12 | 120
 test_p3  | 13 | 130
(13 rows)

ALTER TABLE test DETACH PARTITION test_p3;
CREATE TABLE test_p4 (like test);
ALTER TABLE test ATTACH PARTITION test_p4 FOR VALUES start (10) end
UNBOUNDED;

--now cannot attach test_p3 because of overlap with test_p4, causing data
loss from the main test table.
ALTER TABLE test ATTACH PARTITION test_p3 FOR VALUES start (7) end (10);
ERROR:  source table contains a row violating partition bound specification
ALTER TABLE test ATTACH PARTITION test_p3 FOR VALUES start (7) end (13);
ERROR:  partition "test_p3" would overlap partition "test_p4"

SELECT tableoid::regclass,* FROM test;
 tableoid | a | b  
----------+---+----
 test_p1  | 1 | 10
 test_p1  | 2 | 20
 test_p1  | 3 | 30
 test_p2  | 4 | 40
 test_p2  | 5 | 50
 test_p2  | 6 | 60
(6 rows)


Re: [HACKERS] Declarative partitioning - another take

2016-09-07 Thread Amit Langote

Hi,

On 2016/09/07 17:56, Rajkumar Raghuwanshi wrote:
> Hi,
> 
> I have a query regarding list partitioning,
> 
> For example if I want to store employee data in a table, with "IT" dept
> employee in emp_p1 partition, "HR" dept employee in emp_p2 partition and if
> employee belongs to other than these two, should come in emp_p3 partition.
> 
> In this case not sure how to create partition table. Do we have something
> like we have UNBOUNDED for range partition or oracle have "DEFAULT" for
> list partition.
> 
> create table employee (empid int, dept varchar) partition by list(dept);
> create table emp_p1 partition of employee for values in ('IT');
> create table emp_p2 partition of employee for values in ('HR');
> create table emp_p3 partition of employee for values in (??);

Sorry, no such feature is currently offered.  It might be possible to
offer something like a "default" list partition which accepts values other
than those specified for other existing partitions.  However, that means
if we add a non-default list partition after a default one has been
created, the implementation must make sure that it moves any values from
the default partition that now belong to the newly created partition.

Thanks,
Amit






Re: [HACKERS] Declarative partitioning - another take

2016-09-07 Thread Rajkumar Raghuwanshi
Hi,

I have a query regarding list partitioning,

For example if I want to store employee data in a table, with "IT" dept
employee in emp_p1 partition, "HR" dept employee in emp_p2 partition and if
employee belongs to other than these two, should come in emp_p3 partition.

In this case not sure how to create partition table. Do we have something
like we have UNBOUNDED for range partition or oracle have "DEFAULT" for
list partition.

create table employee (empid int, dept varchar) partition by list(dept);
create table emp_p1 partition of employee for values in ('IT');
create table emp_p2 partition of employee for values in ('HR');
create table emp_p3 partition of employee for values in (??);

Thanks & Regards,
Rajkumar Raghuwanshi
QMG, EnterpriseDB Corporation

On Tue, Sep 6, 2016 at 6:37 PM, Amit Langote 
wrote:

> On Tue, Sep 6, 2016 at 9:19 PM, Robert Haas  wrote:
> > On Wed, Aug 31, 2016 at 1:05 PM, Amit Langote
> >  wrote:
> >>> However, it seems a lot better to make it a property of the parent
> >>> from a performance point of view.  Suppose there are 1000 partitions.
> >>> Reading one toasted value for pg_class and running stringToNode() on
> >>> it is probably a lot faster than scanning pg_inherits to find all of
> >>> the child partitions and then doing an index scan to find the pg_class
> >>> tuple for each and then decoding all of those tuples and assembling
> >>> them into some data structure.
> >>
> >> Seems worth trying.  One point that bothers me a bit is how do we
> enforce
> >> partition bound condition on individual partition basis.  For example
> when
> >> a row is inserted into a partition directly, we better check that it
> does
> >> not fall outside the bounds and issue an error otherwise.  With current
> >> approach, we just look up a partition's bound from the catalog and gin
> up
> >> a check constraint expression (and cache in relcache) to be enforced in
> >> ExecConstraints().  With the new approach, I guess we would need to look
> >> up the parent's partition descriptor.  Note that the checking in
> >> ExecConstraints() is turned off when routing a tuple from the parent.
> >
> > [ Sorry for the slow response. ]
> >
> > Yeah, that's a problem.  Maybe it's best to associate this data with
> > the childrels after all - or halfway in between, e.g. augment
> > pg_inherits with this information.  After all, the performance problem
> > I was worried about above isn't really much of an issue: each backend
> > will build a relcache entry for the parent just once and then use it
> > for the lifetime of the session unless some invalidation occurs.  So
> > if that takes a small amount of extra time, it's probably not really a
> > big deal.  On the other hand, if we can't build the implicit
> > constraint for the child table without opening the parent, that's
> > probably going to cause us some serious inconvenience.
>
> Agreed.  So I will stick with the existing approach.
>
> Thanks,
> Amit
>
>
>


Re: [HACKERS] Declarative partitioning - another take

2016-09-06 Thread Amit Langote
On Tue, Sep 6, 2016 at 9:19 PM, Robert Haas  wrote:
> On Wed, Aug 31, 2016 at 1:05 PM, Amit Langote
>  wrote:
>>> However, it seems a lot better to make it a property of the parent
>>> from a performance point of view.  Suppose there are 1000 partitions.
>>> Reading one toasted value for pg_class and running stringToNode() on
>>> it is probably a lot faster than scanning pg_inherits to find all of
>>> the child partitions and then doing an index scan to find the pg_class
>>> tuple for each and then decoding all of those tuples and assembling
>>> them into some data structure.
>>
>> Seems worth trying.  One point that bothers me a bit is how do we enforce
>> partition bound condition on individual partition basis.  For example when
>> a row is inserted into a partition directly, we better check that it does
>> not fall outside the bounds and issue an error otherwise.  With current
>> approach, we just look up a partition's bound from the catalog and gin up
>> a check constraint expression (and cache in relcache) to be enforced in
>> ExecConstraints().  With the new approach, I guess we would need to look
>> up the parent's partition descriptor.  Note that the checking in
>> ExecConstraints() is turned off when routing a tuple from the parent.
>
> [ Sorry for the slow response. ]
>
> Yeah, that's a problem.  Maybe it's best to associate this data with
> the childrels after all - or halfway in between, e.g. augment
> pg_inherits with this information.  After all, the performance problem
> I was worried about above isn't really much of an issue: each backend
> will build a relcache entry for the parent just once and then use it
> for the lifetime of the session unless some invalidation occurs.  So
> if that takes a small amount of extra time, it's probably not really a
> big deal.  On the other hand, if we can't build the implicit
> constraint for the child table without opening the parent, that's
> probably going to cause us some serious inconvenience.

Agreed.  So I will stick with the existing approach.

Thanks,
Amit




Re: [HACKERS] Declarative partitioning - another take

2016-09-06 Thread Amit Langote
Hi,

On Tue, Sep 6, 2016 at 8:15 PM, Rajkumar Raghuwanshi
 wrote:
> Hi,
>
> I have applied updated patches given by you, and observe below.
>
> here in the given example, t6_p3 partition is not allowed to have null, but
> I am able to insert it, causing two nulls in the table.
>
> --create a partition table
> create table t6 (a int, b varchar) partition by list(a);
> create table t6_p1 partition of t6 for values in (1,2,null);
> create table t6_p2 partition of t6 for values in (4,5);
> create table t6_p3 partition of t6 for values in (3,6);
>
> --insert some values
> insert into t6 select i,i::varchar from generate_series(1,6) i;
> insert into t6 values (null,'A');
>
> --try inserting null to t6_p3 partition table
> insert into t6_p3 values (null,'A');
>
> select tableoid::regclass,* from t6;
>  tableoid | a | b
> --+---+---
>  t6_p1| 1 | 1
>  t6_p1| 2 | 2
>  t6_p1|   | A
>  t6_p2| 4 | 4
>  t6_p2| 5 | 5
>  t6_p3| 3 | 3
>  t6_p3| 6 | 6
>  t6_p3|   | A
> (8 rows)

Thanks for testing!

That looks like a bug.  The same won't occur if you insert through the
parent (the row would be correctly routed to the partition that has been
defined to accept nulls, that is, t6_p1 in your example).  The bug seems
to have to do with the handling of nulls when generating implicit
constraints from the list of values of a given list partition.  The
direct insert into t6_p3 should have failed because the partition key
has a value (null) violating the partition boundary condition.  Will fix.
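
For illustration only (the exact expressions below are hypothetical, not
taken from the patch), the implicit constraint for a list partition that
does not accept NULL should contain a NOT NULL test, roughly:

-- for t6_p3 (values 3, 6):       CHECK (a IS NOT NULL AND a IN (3, 6))
-- for t6_p1 (values 1, 2, NULL): CHECK (a IS NULL OR a IN (1, 2))

The direct insert above suggests the NOT NULL part is what goes missing for
partitions that do not accept nulls.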

Thanks,
Amit




Re: [HACKERS] Declarative partitioning - another take

2016-09-06 Thread Ashutosh Bapat
This patch uses RangeBound structure. There's also a structure defined with
the same name in rangetypes.h with some slight differences. Should we
rename the one in partition.c as PartRangeBound or something like that to
avoid the confusion?

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company


Re: [HACKERS] Declarative partitioning - another take

2016-09-06 Thread Robert Haas
On Wed, Aug 31, 2016 at 1:05 PM, Amit Langote
 wrote:
>> However, it seems a lot better to make it a property of the parent
>> from a performance point of view.  Suppose there are 1000 partitions.
>> Reading one toasted value for pg_class and running stringToNode() on
>> it is probably a lot faster than scanning pg_inherits to find all of
>> the child partitions and then doing an index scan to find the pg_class
>> tuple for each and then decoding all of those tuples and assembling
>> them into some data structure.
>
> Seems worth trying.  One point that bothers me a bit is how do we enforce
> partition bound condition on individual partition basis.  For example when
> a row is inserted into a partition directly, we better check that it does
> not fall outside the bounds and issue an error otherwise.  With current
> approach, we just look up a partition's bound from the catalog and gin up
> a check constraint expression (and cache in relcache) to be enforced in
> ExecConstraints().  With the new approach, I guess we would need to look
> up the parent's partition descriptor.  Note that the checking in
> ExecConstraints() is turned off when routing a tuple from the parent.

[ Sorry for the slow response. ]

Yeah, that's a problem.  Maybe it's best to associate this data with
the childrels after all - or halfway in between, e.g. augment
pg_inherits with this information.  After all, the performance problem
I was worried about above isn't really much of an issue: each backend
will build a relcache entry for the parent just once and then use it
for the lifetime of the session unless some invalidation occurs.  So
if that takes a small amount of extra time, it's probably not really a
big deal.  On the other hand, if we can't build the implicit
constraint for the child table without opening the parent, that's
probably going to cause us some serious inconvenience.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Declarative partitioning - another take

2016-09-06 Thread Rajkumar Raghuwanshi
Hi,

I have applied updated patches given by you, and observe below.

Here in the given example, the t6_p3 partition is not allowed to have null,
but I am able to insert it directly, resulting in two nulls in the table.

--create a partition table
create table t6 (a int, b varchar) partition by list(a);
create table t6_p1 partition of t6 for values in (1,2,null);
create table t6_p2 partition of t6 for values in (4,5);
create table t6_p3 partition of t6 for values in (3,6);

--insert some values
insert into t6 select i,i::varchar from generate_series(1,6) i;
insert into t6 values (null,'A');

--try inserting null to t6_p3 partition table
insert into t6_p3 values (null,'A');

select tableoid::regclass,* from t6;
 tableoid | a | b 
----------+---+---
 t6_p1    | 1 | 1
 t6_p1    | 2 | 2
 t6_p1    |   | A
 t6_p2    | 4 | 4
 t6_p2    | 5 | 5
 t6_p3    | 3 | 3
 t6_p3    | 6 | 6
 t6_p3    |   | A
(8 rows)

Thanks & Regards,
Rajkumar Raghuwanshi
QMG, EnterpriseDB Corporation

On Tue, Sep 6, 2016 at 1:37 PM, Amit Langote 
wrote:

>
> Hi,
>
> On 2016/09/06 16:13, Ashutosh Bapat wrote:
> > I found a server crash while running make check in regress folder. with
> > this set of patches. Problem is RelationBuildPartitionKey() partexprsrc
> may
> > be used uninitialized. Initializing it with NIL fixes the crash. Here's
> > patch to fix it. Came up with the fix after discussion with Amit.
>
> Thanks for the report.  Here is a rebased version of the patches including
> you fix (no significant changes from those posted on Aug 26).
>
> Thanks,
> Amit
>
>
>
>


Re: [HACKERS] Declarative partitioning - another take

2016-09-06 Thread Ashutosh Bapat
I found a server crash while running make check in the regress folder with
this set of patches.  The problem is that partexprsrc in
RelationBuildPartitionKey() may be used uninitialized.  Initializing it
with NIL fixes the crash.  Here's a patch to fix it; I came up with the
fix after discussion with Amit.
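
(For reference, the fix boils down to initializing that local variable,
roughly as below; the actual one-liner is in the attached patch.)

    List       *partexprsrc = NIL;  /* ensure it is never read uninitialized */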

On Fri, Aug 26, 2016 at 1:45 PM, Amit Langote  wrote:

>
> Sorry it took me a while to reply.  Attached updated patches including the
> review comments on 0001 at [1].
>
> On 2016/08/17 3:54, Robert Haas wrote:
> > On Wed, Aug 10, 2016 at 7:09 AM, Amit Langote
> >  wrote:
> >> 0002-psql-and-pg_dump-support-for-partitioned-tables.patch
> >
> > +if (pset.sversion >= 90600 && tableinfo.relkind == 'P')
> >
> > Version check is redundant, right?
>
> Yep, fixed.
>
> > +) PARTITION BY RANGE ((a+b));
> > +\d describe_range_key
> > +Partitioned table "public.describe_range_key"
> > + Column |  Type   | Modifiers
> > ++-+---
> > + a  | integer |
> > + b  | integer |
> > +Partition Key: PARTITION BY RANGE (((a + b)))
> >
> > I understand that it's probably difficult not to end up with two sets
> > of parentheses here, but can we avoid ending up with three sets?
>
> Fixed.  This code was copy-pasted from the index code which has other
> considerations for adding surrounding parentheses which don't apply to the
> partition key code.
>
> > Also, I wonder if pg_get_partkeydef() should omit "PARTITION BY" and
> > pg_dump can add that part back.  Then this could say:
> >
> > Partition Key: RANGE ((a + b))
> >
> > ...which seems a good deal more natural than what you have now.
>
> Agreed, so done that way.
>
> >> 0003-Catalog-and-DDL-for-partition-bounds.patch
> >>
> >> Partition DDL includes both a way to create new partition and "attach"
> an
> >> existing table as a partition of parent partitioned table.  Attempt to
> >> drop a partition using DROP TABLE causes an error. Instead a partition
> >> needs first to be "detached" from parent partitioned table.  On the
> other
> >> hand, dropping the parent drops all the partitions if CASCADE is
> specified.
> >
> > Hmm, I don't think I like this.  Why should it be necessary to detach
> > a partition before dropping it?  That seems like an unnecessary step.
>
> I thought we had better lock the parent table when removing one of its
> partitions and it seemed a bit odd to lock the parent table when dropping
> a partition using DROP TABLE?  OTOH, with ALTER TABLE parent DETACH
> PARTITION, the parent table is locked anyway.
>
> >  [ DEFERRABLE | NOT DEFERRABLE ] [ INITIALLY DEFERRED | INITIALLY
> > IMMEDIATE ]
> > +
> >  
> >
> > Unnecessary hunk.
> >
> > + 
> > +  If this table is a partition, one cannot perform DROP
> > NOT NULL
> > +  on a column if it is marked not null in the parent table.
> > +  not null.
> > + 
> >
> > Sentence fragment.
>
> Fixed.
>
> >
> > + 
> > +  Note that unlike the ATTACH PARTITION command, a
> partition
> > +  being detached can be itself partitioned.  In that case, it
> continues
> > +  to exist as such.
> > + 
> >
> > This is another restriction I don't understand.  Why can't I attach a
> > partitioned table?
>
> I removed this restriction.
>
> ATExecAttachPartition() adds a AT work queue entry for the table being
> attached to perform a heap scan in the rewrite phase.  The scan is done
> for checking that no row violates the partition boundary condition.  When
> attaching a partitioned table as partition, multiple AT work queue entries
> are now added - one for each leaf partition of the table being attached.
>
> > +indicate that descendant tables are included.  Note that whether
> > +ONLY or * is specified has no effect in
> case
> > +of a partitioned table; descendant tables (in this case,
> partitions)
> > +are always included.
> >
> > Ugh, why?  I think this should work exactly the same way for
> > partitioned tables that it does for any other inheritance hierarchy.
> > Sure, you'll get no rows, but who cares?
>
> Agreed, done that way.
>
> > +CREATE FOREIGN TABLE measurement_y2016m07
> > +PARTITION OF measurement FOR VALUES START ('2016-07-01') END
> > ('2016-08-01');
> > +SERVER server_07;
> >
> > Extra semicolon?
>
> Fixed.
>
> >
> > +  A partition cannot have columns other than those inherited from
> the
> > +  parent.  That includes the oid column, which can
> be
> >
> > I think experience suggests that this is a good restriction, but then
> > why does the syntax synopsis indicate that PARTITION BY can be
> > specified along with column definitions?  Similarly for CREATE FOREIGN
> > TABLE.
>
> The syntax synopsis of CREATE TABLE ... PARTITION OF indicates that a list
> of column WITH OPTION and/or table_constraint can be specified.  It does
> not allow column definitions.
>
> In this case, inherited columns will be listed in the PARTITION BY clause.
>
> Do you mean 

Re: [HACKERS] Declarative partitioning - another take

2016-09-02 Thread Amit Langote
On 2016/09/02 15:57, Ashutosh Bapat wrote:
> On Fri, Sep 2, 2016 at 12:23 PM, Amit Langote wrote:
>> Getting rid of the parent table in the append list by other means may be a
>> way to go.  We know that the table is empty and safe to just drop.
>>
> Ok. Does a constraint (1 = 2) or something like that which is obviously
> false, help?

No, anything like that would get reduced to a constant false clause by
eval_const_expressions() processing.

Thanks,
Amit







Re: [HACKERS] Declarative partitioning - another take

2016-09-02 Thread Ashutosh Bapat
On Fri, Sep 2, 2016 at 12:23 PM, Amit Langote  wrote:

> On 2016/09/02 15:22, Ashutosh Bapat wrote:
> >>
> >>
> >>> 2. A combination of constraints on the partitions should be applicable
> to
> >>> the parent. We aren't doing that.
> >>
> >> How about on seeing that a RELOPT_OTHER_MEMBER_REL is partitioned parent
> >> table, we can have get_relation_constraints() include a constant false
> >> clause in the list of constraints returned for
> >> relation_excluded_by_constraints() to process so that it is not
> included
> >> in the append result by way of constraint exclusion.  One more option is
> >> to mark such rels dummy in set_rel_size().
> >>
> >>
> > I am not complaining about having parent relation there. For the people
> who
> > are used to seeing the parent relation in the list of append relations,
> it
> > may be awkward. But +1 if we can do that. If we can't do that, we should
> at
> > least mark with an OR of all constraints on the partitions, so that
> > constraint exclusion can exclude it if there are conditions incompatible
> > with constraints. This is what would happen in inheritance case as well,
> if
> > there are constraints on the parent. In the above example, the parent
> table
> > would have constraints CHECK ((a >= 0 AND a < 250) OR (a >= 250 and a <
> > 500) OR (a >= 500 or a < 600)). It will probably get excluded, if
> > constraint exclusion is smart enough to understand ORing.
>
> I guess constraint exclusion would be (is) smart enough to handle that
> correctly but why make it unnecessarily spend a *significant* amount of
> time on doing the proof (when we *know* we can just skip it).  Imagine how
> long the OR list could get.  By the way, my suggestion of just returning a
> constant false clause also does not work - neither in case of a query with
> restrictions (predicate has to be an OpExpr to go ahead with the proof
> which constant false clause is not) nor in case of a query without
> restrictions (proof is not performed at all).  So, that one is useless.
>

Huh!


>
> Getting rid of the parent table in the append list by other means may be a
> way to go.  We know that the table is empty and safe to just drop.
>
>
Ok. Does a constraint (1 = 2) or something like that which is obviously
false, help?

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company


Re: [HACKERS] Declarative partitioning - another take

2016-09-02 Thread Amit Langote
On 2016/09/02 15:22, Ashutosh Bapat wrote:
>>
>>
>>> 2. A combination of constraints on the partitions should be applicable to
>>> the parent. We aren't doing that.
>>
>> How about on seeing that a RELOPT_OTHER_MEMBER_REL is partitioned parent
>> table, we can have get_relation_constraints() include a constant false
>> clause in the list of constraints returned for
>> relation_excluded_by_constraints() to process so that it is not included
>> in the append result by way of constraint exclusion.  One more option is
>> to mark such rels dummy in set_rel_size().
>>
>>
> I am not complaining about having parent relation there. For the people who
> are used to seeing the parent relation in the list of append relations, it
> may be awkward. But +1 if we can do that. If we can't do that, we should at
> least mark with an OR of all constraints on the partitions, so that
> constraint exclusion can exclude it if there are conditions incompatible
> with constraints. This is what would happen in inheritance case as well, if
> there are constraints on the parent. In the above example, the parent table
> would have constraints CHECK ((a >= 0 AND a < 250) OR (a >= 250 and a <
> 500) OR (a >= 500 or a < 600)). It will probably get excluded, if
> constraint exclusion is smart enough to understand ORing.

I guess constraint exclusion would be (is) smart enough to handle that
correctly but why make it unnecessarily spend a *significant* amount of
time on doing the proof (when we *know* we can just skip it).  Imagine how
long the OR list could get.  By the way, my suggestion of just returning a
constant false clause also does not work - neither in case of a query with
restrictions (predicate has to be an OpExpr to go ahead with the proof
which constant false clause is not) nor in case of a query without
restrictions (proof is not performed at all).  So, that one is useless.

Getting rid of the parent table in the append list by other means may be a
way to go.  We know that the table is empty and safe to just drop.

Thanks,
Amit






Re: [HACKERS] Declarative partitioning - another take

2016-09-02 Thread Ashutosh Bapat
>
>
> > 2. A combination of constraints on the partitions should be applicable to
> > the parent. We aren't doing that.
>
> How about on seeing that a RELOPT_OTHER_MEMBER_REL is partitioned parent
> table, we can have get_relation_constraints() include a constant false
> clause in the list of constraints returned for
> relation_excluded_by_constraints() to process so that it is not included
> in the append result by way of constraint exclusion.  One more option is
> to mark such rels dummy in set_rel_size().
>
>
I am not complaining about having parent relation there. For the people who
are used to seeing the parent relation in the list of append relations, it
may be awkward. But +1 if we can do that. If we can't do that, we should at
least mark with an OR of all constraints on the partitions, so that
constraint exclusion can exclude it if there are conditions incompatible
with constraints. This is what would happen in inheritance case as well, if
there are constraints on the parent. In the above example, the parent table
would have the constraint CHECK ((a >= 0 AND a < 250) OR (a >= 250 AND a <
500) OR (a >= 500 AND a < 600)). It will probably get excluded, if
constraint exclusion is smart enough to understand ORing.
-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company


Re: [HACKERS] Declarative partitioning - another take

2016-09-02 Thread Amit Langote
On 2016/09/02 14:38, Ashutosh Bapat wrote:
> Here's something I observed with your set of patches posted in June. I have
> not checked the latest set of patches. So, if it's something fixed, please
> ignore the mail and sorry for me being lazy.
> 
> prt1 is partitioned table and it shows following information with \d+
> 
> regression=# \d+ prt1
> Partitioned table "public.prt1"
>  Column |   Type| Modifiers | Storage  | Stats target |
> Description
> +---+---+--+--+-
>  a  | integer   |   | plain|  |
>  b  | integer   |   | plain|  |
>  c  | character varying |   | extended |  |
> Partition Key: PARTITION BY RANGE (a)
> Indexes:
> "iprt1_a" btree (a)
> 
> Shouldn't we show all the partitions of this table and may be their ranges
> of lists?

Something I thought about as well.  I will implement that.

> I found the partitions from EXPLAIN plan
> 
> regression=# explain verbose select * from prt1;
>   QUERY PLAN
> ---
>  Append  (cost=0.00..6.00 rows=301 width=13)
>->  Seq Scan on public.prt1  (cost=0.00..0.00 rows=1 width=40)
>  Output: prt1.a, prt1.b, prt1.c
>->  Seq Scan on public.prt1_p1  (cost=0.00..2.25 rows=125 width=13)
>  Output: prt1_p1.a, prt1_p1.b, prt1_p1.c
>->  Seq Scan on public.prt1_p3  (cost=0.00..1.50 rows=50 width=13)
>  Output: prt1_p3.a, prt1_p3.b, prt1_p3.c
>->  Seq Scan on public.prt1_p2  (cost=0.00..2.25 rows=125 width=13)
>  Output: prt1_p2.a, prt1_p2.b, prt1_p2.c
> (9 rows)
> 
> Then did \d+ on each of those to find their ranges

[ ... ]

> 
> As you will observe that the table prt1 can not have any row with a < 0 and
> a > 600. But when I execute
> 
> regression=# explain verbose select * from prt1 where a > 100;
> QUERY PLAN
> --
>  Append  (cost=0.00..0.00 rows=1 width=40)
>->  Seq Scan on public.prt1  (cost=0.00..0.00 rows=1 width=40)
>  Output: prt1.a, prt1.b, prt1.c
>  Filter: (prt1.a > 100)
> (4 rows)
> 
> it correctly excluded all the partitions, but did not exclude the parent
> relation. I guess, we have enough information to exclude it. Probably, we
> should add a check constraint on the parent which is OR of the check
> constraints on all the partitions. So there are two problems here
> 
> 1. \d+ doesn't show partitions - this is probably reported earlier, I don't
> remember.

You just did, :)

As I said I will implement that on lines of how inheritance children are
listed (with additional information ie, range or list).

> 2. A combination of constraints on the partitions should be applicable to
> the parent. We aren't doing that.

How about on seeing that a RELOPT_OTHER_MEMBER_REL is partitioned parent
table, we can have get_relation_constraints() include a constant false
clause in the list of constraints returned for
relation_excluded_by_constraints() to process so that it is not included
in the append result by way of constraint exclusion.  One more option is
to mark such rels dummy in set_rel_size().
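
(A rough sketch of that second option, as it might look inside
set_rel_size()'s handling of child rels; rel_is_empty_partitioned_parent()
is a hypothetical helper, not an existing function:)

    if (rel->reloptkind == RELOPT_OTHER_MEMBER_REL &&
        rel_is_empty_partitioned_parent(root, rte))   /* hypothetical check */
        set_dummy_rel_pathlist(rel);  /* keep the empty parent out of the Append */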

Thanks,
Amit






Re: [HACKERS] Declarative partitioning - another take

2016-09-01 Thread Ashutosh Bapat
Here's something I observed with your set of patches posted in June. I have
not checked the latest set of patches. So, if it's something fixed, please
ignore the mail and sorry for me being lazy.

prt1 is partitioned table and it shows following information with \d+

regression=# \d+ prt1
Partitioned table "public.prt1"
 Column |   Type| Modifiers | Storage  | Stats target |
Description
+---+---+--+--+-
 a  | integer   |   | plain|  |
 b  | integer   |   | plain|  |
 c  | character varying |   | extended |  |
Partition Key: PARTITION BY RANGE (a)
Indexes:
"iprt1_a" btree (a)

Shouldn't we show all the partitions of this table and maybe their ranges
or lists?

I found the partitions from EXPLAIN plan

regression=# explain verbose select * from prt1;
  QUERY PLAN
---
 Append  (cost=0.00..6.00 rows=301 width=13)
   ->  Seq Scan on public.prt1  (cost=0.00..0.00 rows=1 width=40)
 Output: prt1.a, prt1.b, prt1.c
   ->  Seq Scan on public.prt1_p1  (cost=0.00..2.25 rows=125 width=13)
 Output: prt1_p1.a, prt1_p1.b, prt1_p1.c
   ->  Seq Scan on public.prt1_p3  (cost=0.00..1.50 rows=50 width=13)
 Output: prt1_p3.a, prt1_p3.b, prt1_p3.c
   ->  Seq Scan on public.prt1_p2  (cost=0.00..2.25 rows=125 width=13)
 Output: prt1_p2.a, prt1_p2.b, prt1_p2.c
(9 rows)

Then did \d+ on each of those to find their ranges

regression=# \d+ prt1_p1
 Table "public.prt1_p1"
 Column |   Type| Modifiers | Storage  | Stats target |
Description
+---+---+--+--+-
 a  | integer   |   | plain|  |
 b  | integer   |   | plain|  |
 c  | character varying |   | extended |  |
Partition Of: prt1 FOR VALUES START (0) END (250)
Indexes:
"iprt1_p1_a" btree (a)

regression=# \d+ prt1_p2
 Table "public.prt1_p2"
 Column |   Type| Modifiers | Storage  | Stats target |
Description
+---+---+--+--+-
 a  | integer   |   | plain|  |
 b  | integer   |   | plain|  |
 c  | character varying |   | extended |  |
Partition Of: prt1 FOR VALUES START (250) END (500)
Indexes:
"iprt1_p2_a" btree (a)

regression=# \d+ prt1_p3
 Table "public.prt1_p3"
 Column |   Type| Modifiers | Storage  | Stats target |
Description
+---+---+--+--+-
 a  | integer   |   | plain|  |
 b  | integer   |   | plain|  |
 c  | character varying |   | extended |  |
Partition Of: prt1 FOR VALUES START (500) END (600)
Indexes:
"iprt1_p3_a" btree (a)

As you will observe that the table prt1 can not have any row with a < 0 and
a > 600. But when I execute

regression=# explain verbose select * from prt1 where a > 100;
QUERY PLAN
--
 Append  (cost=0.00..0.00 rows=1 width=40)
   ->  Seq Scan on public.prt1  (cost=0.00..0.00 rows=1 width=40)
 Output: prt1.a, prt1.b, prt1.c
 Filter: (prt1.a > 100)
(4 rows)

it correctly excluded all the partitions, but did not exclude the parent
relation. I guess, we have enough information to exclude it. Probably, we
should add a check constraint on the parent which is OR of the check
constraints on all the partitions. So there are two problems here

1. \d+ doesn't show partitions - this is probably reported earlier, I don't
remember.
2. A combination of constraints on the partitions should be applicable to
the parent. We aren't doing that.


On Wed, Aug 10, 2016 at 4:39 PM, Amit Langote  wrote:

> Hi,
>
> Attached is the latest set of patches to implement declarative
> partitioning.  There is already a commitfest entry for the same:
> https://commitfest.postgresql.org/10/611/
>
> The old discussion is here:
> http://www.postgresql.org/message-id/55d3093c.5010...@lab.ntt.co.jp/
>
> Attached patches are described below:
>
> 0001-Catalog-and-DDL-for-partition-key.patch
> 0002-psql-and-pg_dump-support-for-partitioned-tables.patch
>
> These patches create the infrastructure and DDL for partitioned
> tables.
>
> In addition to a catalog for storing the partition key information, this
> adds a new relkind to pg_class.h. PARTITION BY clause is added to CREATE
> TABLE. Tables so created are RELKIND_PARTITIONED_REL 

Re: [HACKERS] Declarative partitioning - another take

2016-09-01 Thread Ashutosh Bapat
>
> > I don't think you need to do anything in the path creation code for this.
> > As is it flattens all AppendPath hierarchies whether for partitioning or
> > inheritance or subqueries. We should leave it as it is.
>
> I thought it would be convenient for pairwise join code to work with the
> hierarchy intact even within the AppendPath tree.  If it turns out to be
> so, maybe that patch can take care of it.
>

Partition-wise join works with RelOptInfos, so it's fine if the AppendPath
hierarchy is flattened out. We need the RelOptInfo hierarchy, though.


>
> >> I think I can manage to squeeze in (a) in the next version patch and
> will
> >> also start working on (b), mainly the part about RelOptInfo getting some
> >> partitioning info.
> >
> > I am fine with b, where you would include some partitioning information
> in
> > RelOptInfo. But you don't need to do what you said in (b) above.
> >
> > In a private conversation Robert Haas suggested a way slightly different
> > than what my patch for partition-wise join does. He suggested that the
> > partitioning schemes, i.e. strategy, number of partitions and bounds of the
> > partitioned relations involved in the query, should be stored in PlannerInfo
> > in the form of a list. Each partitioning scheme is annotated with the
> > relids of the partitioned relations. RelOptInfo of the partitioned
> relation
> > will point to the partitioning scheme in PlannerInfo. Along-with that
> each
> > RelOptInfo will need to store partition keys for corresponding relation.
> > This simplifies matching the partitioning schemes of the joining
> relations.
> > Also it reduces the number of copies of partition bounds floating around
> as
> > we expect that a query will involve multiple partitioned tables following
> > similar partitioning schemes. May be you want to consider this idea while
> > working on (b).
>
> So IIUC, a partitioned relation's (baserel or joinrel) RelOptInfo has only
> the information about partition keys.  They will be matched with query
> restriction quals pruning away any unneeded partitions which happens
> individually for each such parent baserel (within set_append_rel_size() I
> suppose).  Further, two joining relations are eligible to be considered
> for pairwise joining if they have identical partition keys and query
> equi-join quals match the same.  The resulting joinrel will have the same
> partition key (as either joining relation) and will have as many
> partitions as there are in the intersection of sets of partitions of
> joining rels (intersection proceeds by matching partition bounds).
>
> "Partition scheme" structs go into a PlannerInfo list member, one
> corresponding to each partitioned relation - baserel or joinrel, right?
>

Multiple relations (base or join) can share a partition scheme if they are
partitioned the same way. Each partition scheme also stores the relids of
the base relations partitioned by that scheme.
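
(To make the idea concrete, a hedged sketch of what such a node might look
like; every field name below is hypothetical, not taken from any posted patch:)

typedef struct PartitionSchemeData
{
    Relids      rels;           /* base rels partitioned according to this scheme */
    char        strategy;       /* partitioning strategy: list or range */
    int16       partnatts;      /* number of partition key columns */
    Oid        *partopfamily;   /* operator families of the key columns */
    int         nparts;         /* number of partitions */
    List       *bounds;         /* canonical partition bounds shared by all rels */
} PartitionSchemeData;

typedef PartitionSchemeData *PartitionScheme;

Each partitioned RelOptInfo would then carry a PartitionScheme pointer plus
its own partition key expressions, which is what makes it cheap to match the
partitioning of two joining relations.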


> As you say, each such struct has the following pieces of information:
> strategy, num_partitions, bounds (and other auxiliary info).  Before
> make_one_rel() starts, the list has one for each partitioned baserel.
>
> After make_one_rel() has formed baserel pathlists and before
> make_rel_from_joinlist() is called, are the partition scheme structs of
> processed baserels marked with some information about the pruning activity
> that occurred so far?


Right now pruned partitions are labelled as dummy rels (empty append
paths). That's enough to detect a pruned partition. I haven't found a need
to label the partitioning scheme with pruned partitions for partition-wise join.


> Then as we build successively higher levels of
> joinrels, new entries will be made for those joinrels for which we added
> pairwise join paths, with relids matching the corresponding joinrels.
> Does that make sense?
>
>
I don't think we will make any new partition scheme entries in PlannerInfo
after all the base relations have been considered. Partition-wise join
will pick the one suitable for the given join. But in case partition-wise
join needs to make new entries, I will take care of that in my patch.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company


Re: [HACKERS] Declarative partitioning - another take

2016-08-31 Thread Amit Langote
On 2016/08/25 16:15, Ashutosh Bapat wrote:
> On Thu, Aug 25, 2016 at 12:22 PM, Amit Langote wrote:
>> b)
>> when accumulating append subpaths, do not flatten a subpath that is itself
>> an append when ((AppendPath *) subpath)->path.parent is a RelOptInfo with
>> non-NULL partitioning info.Is the latter somehow necessary for
>> pairwise-join considerations?
> 
> I don't think you need to do anything in the path creation code for this.
> As is it flattens all AppendPath hierarchies whether for partitioning or
> inheritance or subqueries. We should leave it as it is.

I thought it would be convenient for pairwise join code to work with the
hierarchy intact even within the AppendPath tree.  If it turns out to be
so, maybe that patch can take care of it.

>> I think I can manage to squeeze in (a) in the next version patch and will
>> also start working on (b), mainly the part about RelOptInfo getting some
>> partitioning info.
> 
> I am fine with b, where you would include some partitioning information in
> RelOptInfo. But you don't need to do what you said in (b) above.
> 
> In a private conversation Robert Haas suggested a way slightly different
> than what my patch for partition-wise join does. He suggested that the
> partitioning schemes, i.e. strategy, number of partitions and bounds of the
> partitioned relations involved in the query, should be stored in PlannerInfo
> in the form of a list. Each partitioning scheme is annotated with the
> relids of the partitioned relations. RelOptInfo of the partitioned relation
> will point to the partitioning scheme in PlannerInfo. Along-with that each
> RelOptInfo will need to store partition keys for corresponding relation.
> This simplifies matching the partitioning schemes of the joining relations.
> Also it reduces the number of copies of partition bounds floating around as
> we expect that a query will involve multiple partitioned tables following
> similar partitioning schemes. May be you want to consider this idea while
> working on (b).

So IIUC, a partitioned relation's (baserel or joinrel) RelOptInfo has only
the information about partition keys.  They will be matched against query
restriction quals, pruning away any unneeded partitions; that happens
individually for each such parent baserel (within set_append_rel_size(), I
suppose).  Further, two joining relations are eligible to be considered
for pairwise joining if they have identical partition keys and query
equi-join quals match the same.  The resulting joinrel will have the same
partition key (as either joining relation) and will have as many
partitions as there are in the intersection of sets of partitions of
joining rels (intersection proceeds by matching partition bounds).

"Partition scheme" structs go into a PlannerInfo list member, one
corresponding to each partitioned relation - baserel or joinrel, right?
As you say, each such struct has the following pieces of information:
strategy, num_partitions, bounds (and other auxiliary info).  Before
make_one_rel() starts, the list has one for each partitioned baserel.
After make_one_rel() has formed baserel pathlists and before
make_rel_from_joinlist() is called, are the partition scheme structs of
processed baserels marked with some information about the pruning activity
that occurred so far?  Then as we build successively higher levels of
joinrels, new entries will be made for those joinrels for which we added
pairwise join paths, with relids matching the corresponding joinrels.
Does that make sense?
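
To make the picture concrete, here is a rough sketch of the kind of node I
imagine (all names below are illustrative only, not actual patch code):

/*
 * Illustrative sketch of a "partition scheme" entry kept in a list in
 * PlannerInfo; every name here is an assumption for discussion.
 */
typedef struct PartitionSchemeData
{
    Relids      relids;             /* relids of rels using this scheme */
    char        strategy;           /* partitioning strategy (range/list) */
    int         num_partitions;     /* number of partitions */
    Node       *bounds;             /* partition bound information */
} PartitionSchemeData;

typedef PartitionSchemeData *PartitionScheme;

PlannerInfo would then hold a List of these, and each partitioned RelOptInfo
would point at the entry it uses and carry its own partition key expressions,
so comparing the schemes of two joining relations becomes mostly a pointer
and key comparison.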

Thanks,
Amit




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Declarative partitioning - another take

2016-08-31 Thread Amit Langote
On 2016/08/31 16:17, Robert Haas wrote:
> On Wed, Aug 31, 2016 at 12:37 PM, Amit Langote wrote:
>> What I was trying to understand is why this would not be possible
>> with a design where partition bound is stored in the catalog as a property
>> of individual partitions instead of a design where we store collection of
>> partition bounds as a property of the parent.
> 
> From the point of view of feasibility, I don't think it matters very
> much where the property is stored; it's the locking that is the key
> thing.  In other words, I think this *would* be possible if the
> partition bound is stored as a property of individual partitions, as
> long as it can't change without a lock on the parent.
> 
> However, it seems a lot better to make it a property of the parent
> from a performance point of view.  Suppose there are 1000 partitions.
> Reading one toasted value for pg_class and running stringToNode() on
> it is probably a lot faster than scanning pg_inherits to find all of
> the child partitions and then doing an index scan to find the pg_class
> tuple for each and then decoding all of those tuples and assembling
> them into some data structure.

Seems worth trying.  One point that bothers me a bit is how we enforce the
partition bound condition on an individual-partition basis.  For example,
when a row is inserted into a partition directly, we had better check that it
does not fall outside the bounds and issue an error otherwise.  With the current
approach, we just look up a partition's bound from the catalog and gin up
a check constraint expression (and cache in relcache) to be enforced in
ExecConstraints().  With the new approach, I guess we would need to look
up the parent's partition descriptor.  Note that the checking in
ExecConstraints() is turned off when routing a tuple from the parent.
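
To put the current approach in rough code terms (the field and helper names
below are made up for illustration):

/*
 * Illustrative sketch: derive the implied CHECK expression from the stored
 * partition bound once, cache it in the relcache, and let ExecConstraints()
 * evaluate it like any other check constraint on a direct insert into the
 * partition.  rd_partcheck and generate_check_quals_from_bound() are
 * assumed names, not existing code.
 */
static List *
get_partition_check_quals(Relation partrel)
{
    if (partrel->rd_partcheck == NIL)
        partrel->rd_partcheck = generate_check_quals_from_bound(partrel);

    return partrel->rd_partcheck;
}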

Thanks,
Amit




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Declarative partitioning - another take

2016-08-31 Thread Robert Haas
On Wed, Aug 31, 2016 at 12:37 PM, Amit Langote
 wrote:
>>> If we need an AccessExclusiveLock on parent to add/remove a partition
>>> (IOW, changing that child table's partitioning information), then do we
>>> need to lock the individual partitions when reading partition's
>>> information?  I mean to ask why the simple syscache look-ups to get each
>>> partition's bound wouldn't do.
>>
>> Well, if X can't be changed without having an AccessExclusiveLock on
>> the parent, then an AccessShareLock on the parent is sufficient to
>> read X, right?  Because those lock modes conflict.
>
> Yes.  And hence we can proceed with performing partition elimination
> before locking any of the children.  A lock on the parent (AccessShareLock)
> will prevent any existing partition from being removed and any new partition
> from being added, because those operations require AccessExclusiveLock on the
> parent.

Agreed.

> What I was trying to understand is why this would not be possible
> with a design where partition bound is stored in the catalog as a property
> of individual partitions instead of a design where we store collection of
> partition bounds as a property of the parent.

From the point of view of feasibility, I don't think it matters very
much where the property is stored; it's the locking that is the key
thing.  In other words, I think this *would* be possible if the
partition bound is stored as a property of individual partitions, as
long as it can't change without a lock on the parent.

However, it seems a lot better to make it a property of the parent
from a performance point of view.  Suppose there are 1000 partitions.
Reading one toasted value for pg_class and running stringToNode() on
it is probably a lot faster than scanning pg_inherits to find all of
the child partitions and then doing an index scan to find the pg_class
tuple for each and then decoding all of those tuples and assembling
them into some data structure.
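
In code terms, the single-lookup side of that is roughly the following
sketch (Anum_pg_class_relpartinfo is a hypothetical column name; the
syscache lookup and stringToNode() are the existing primitives):

/*
 * Sketch: fetch the parent's pg_class tuple from the syscache, detoast the
 * hypothetical column holding the partitioning node tree, and deserialize
 * it with stringToNode().  No pg_inherits scan and no per-child lookups.
 */
static Node *
get_parent_partinfo(Oid parent_relid)
{
    HeapTuple   tuple;
    Datum       datum;
    bool        isnull;
    Node       *partinfo = NULL;

    tuple = SearchSysCache1(RELOID, ObjectIdGetDatum(parent_relid));
    if (!HeapTupleIsValid(tuple))
        elog(ERROR, "cache lookup failed for relation %u", parent_relid);

    datum = SysCacheGetAttr(RELOID, tuple, Anum_pg_class_relpartinfo, &isnull);
    if (!isnull)
        partinfo = stringToNode(TextDatumGetCString(datum));

    ReleaseSysCache(tuple);
    return partinfo;
}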

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Declarative partitioning - another take

2016-08-31 Thread Amit Langote
On 2016/08/29 20:53, Robert Haas wrote:
> On Fri, Aug 26, 2016 at 1:33 PM, Amit Langote
>  wrote:
>> We do take a lock on the parent because we would be changing its partition
>> descriptor (relcache).  I changed MergeAttributes() such that an
>> AccessExclusiveLock instead of ShareUpdateExclusiveLock is taken if the
>> parent is a partitioned table.
> 
> Hmm, that seems both good and bad.  On the good side, as mentioned,
> being able to rely on the partition descriptor not changing under us
> makes this sort of thing much easier to reason about.  On the bad
> side, it isn't good for this feature to have worse concurrency than
> regular inheritance.  Not sure what to do about this.
>
>> If we need an AccessExclusiveLock on parent to add/remove a partition
>> (IOW, changing that child table's partitioning information), then do we
>> need to lock the individual partitions when reading partition's
>> information?  I mean to ask why the simple syscache look-ups to get each
>> partition's bound wouldn't do.
> 
> Well, if X can't be changed without having an AccessExclusiveLock on
> the parent, then an AccessShareLock on the parent is sufficient to
> read X, right?  Because those lock modes conflict.

Yes.  And hence we can proceed with performing partition elimination
before locking any of the children.  A lock on the parent (AccessShareLock)
will prevent any existing partition from being removed and any new partition
from being added, because those operations require AccessExclusiveLock on the
parent.  What I was trying to understand is why this would not be possible
with a design where partition bound is stored in the catalog as a property
of individual partitions instead of a design where we store collection of
partition bounds as a property of the parent.

Thanks,
Amit




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Declarative partitioning - another take

2016-08-29 Thread Robert Haas
On Fri, Aug 26, 2016 at 1:33 PM, Amit Langote
 wrote:
> We do take a lock on the parent because we would be changing its partition
> descriptor (relcache).  I changed MergeAttributes() such that an
> AccessExclusiveLock instead of ShareUpdateExclusiveLock is taken if the
> parent is a partitioned table.

Hmm, that seems both good and bad.  On the good side, as mentioned,
being able to rely on the partition descriptor not changing under us
makes this sort of thing much easier to reason about.  On the bad
side, it isn't good for this feature to have worse concurrency than
regular inheritance.  Not sure what to do about this.

> If we need an AccessExclusiveLock on parent to add/remove a partition
> (IOW, changing that child table's partitioning information), then do we
> need to lock the individual partitions when reading partition's
> information?  I mean to ask why the simple syscache look-ups to get each
> partition's bound wouldn't do.

Well, if X can't be changed without having an AccessExclusiveLock on
the parent, then an AccessShareLock on the parent is sufficient to
read X, right?  Because those lock modes conflict.
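
In other words, something as simple as this sketch is already safe (the
actual reading step is elided):

/*
 * AccessShareLock on the parent conflicts with the AccessExclusiveLock
 * needed to change X, so X cannot change while we hold the weaker lock;
 * no locks on the children are needed just to read X.
 */
Relation    parent = heap_open(parent_relid, AccessShareLock);

/* ... read the partitioning information X from the parent here ... */

heap_close(parent, AccessShareLock);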

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Declarative partitioning - another take

2016-08-26 Thread Amit Langote
On 2016/08/18 5:23, Robert Haas wrote:
> On Wed, Aug 17, 2016 at 2:21 AM, Amit Langote
>  wrote:
>> I am slightly tempted to eliminate the pg_partition catalog and associated
>> syscache altogether and add a column to pg_class as Robert suggested.
>> That way, all relid_is_partition() calls will be replaced by
>> rel->rd_partbound != NULL check.  But one potential problem with that
>> approach is that now whenever a parent relation is opened, all the
>> partition relations must be opened to get the partbound value (to form the
>> PartitionDesc to be stored in parent relation's rd_partdesc).  Whereas
>> currently, we just look up the pg_partition catalog (or the associated
>> cache) for every partition and that gets us the partbound.
> 
> Well, you could just look up the pg_class row without opening the
> relation, too.  There is a system cache on pg_class.oid, after all.  I

Yes, I somehow didn't think of that.

> think the issue is whether it's safe to read either one of those
> things without a lock on the child relation.  If altering the
> partitioning information for a relation requires holding only
> AccessExclusiveLock on that relation, and no lock on the parent, then
> you really can't read the information for any child relation without
> taking at least AccessShareLock.  Otherwise, it might change under
> you, and that would be bad.

I'd imagine this won't be a problem because we take an AccessExclusiveLock
on the parent when adding/removing a partition.

> I'm inclined to think that changing the partitioning information for a
> child is going to require AccessExclusiveLock on both the child and
> the parent.  That seems unfortunate from a concurrency point of view,
> but we may be stuck with it: suppose you require only
> ShareUpdateExclusiveLock on the parent.  Well, then a concurrent read
> transaction might see the partition boundaries change when it does a
> relcache rebuild, which would cause it to suddenly start expecting the
> data to be in a different place in mid-transaction, perhaps even in
> mid-scan.  Maybe that's survivable with really careful coding, but it
> seems like it's probably a bad thing.  For example, it would mean that
> the executor would be unable to rely on the partitioning information
> in the relcache remaining stable underneath it.  Moreover, the
> relcache is always going to be scanned with the most recent possible
> MVCC snapshot, but the transaction snapshot may be older, so such a
> system creates all sorts of nasty possibilities for there to be skew
> between the snapshot being used to view the data and the snapshot being
> used to read the metadata that says where the data is.

We do take a lock on the parent because we would be changing its partition
descriptor (relcache).  I changed MergeAttributes() such that an
AccessExclusiveLock instead of ShareUpdateExclusiveLock is taken if the
parent is a partitioned table.
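
Roughly like this sketch (the condition is spelled abstractly; the actual
test in the patch may differ):

/*
 * Sketch of the MergeAttributes() change described above: escalate the lock
 * on the parent when it is a partitioned table, because its partition
 * descriptor in the relcache will change.  parent_is_partitioned_table and
 * parent_rangevar are placeholders.
 */
LOCKMODE    lockmode;
Relation    parent;

lockmode = parent_is_partitioned_table
    ? AccessExclusiveLock
    : ShareUpdateExclusiveLock;

parent = heap_openrv(parent_rangevar, lockmode);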

> This may need some more thought, but if we go with that approach of
> requiring an AccessExclusiveLock on both parent and child, then it
> seems to me that maybe we should consider the partitioning information
> to be a property of the parent rather than the child.  Just take all
> the partitioning information for all children and put it in one big
> node tree and store it in the pg_class or pg_partition_root entry for
> the parent as one big ol' varlena.  Now you can open the parent and
> get all of the partitioning information for all of the children
> without needing any lock on any child, and that's *really* good,
> because it means that some day we might be able to do partition
> elimination before locking any of the children!  That would be
> excellent.

If we need an AccessExclusiveLock on parent to add/remove a partition
(IOW, changing that child table's partitioning information), then do we
need to lock the individual partitions when reading partition's
information?  I mean to ask why the simple syscache look-ups to get each
partition's bound wouldn't do.

Thanks,
Amit




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Declarative partitioning - another take

2016-08-25 Thread Ashutosh Bapat
On Thu, Aug 25, 2016 at 12:22 PM, Amit Langote <
langote_amit...@lab.ntt.co.jp> wrote:

> On 2016/08/22 13:51, Ashutosh Bapat wrote:
> > The parent-child relationship of multi-level partitioned tables is not
> > retained when creating the AppendRelInfo nodes. We create RelOptInfo nodes
> > for all the leaf and intermediate tables. The AppendRelInfo nodes created
> > for these RelOptInfos set the topmost table as the parent of all the leaf
> > and child tables. Since partitioning scheme/s at each level is/are
> > associated with the parent/s at that level, we lose information about the
> > immediate parents and thus it becomes difficult to identify which leaf node
> > falls where in the partition hierarchy. This stops us from doing any
> > lump-sum partition pruning where we can eliminate all the partitions under
> > a given parent-partition if that parent-partition gets pruned. It also
> > restricts partition-wise join technique from being applied to partial
> > partition hierarchy when the whole partitioning scheme of joining tables
> > does not match. Maintaining a RelOptInfo hierarchy should not create
> > corresponding Append (all kinds) plan hierarchy since
> > accumulate_append_subpath() flattens any such hierarchy while creating
> > paths. Can you please consider this point in your upcoming patch?
>
> I agree.  So there seem to be two things here:  a) when expanding a
> partitioned table inheritance set, do it recursively such that resulting
> AppendRelInfos preserve *immediate* parent-child relationship info.


Right.


> b)
> when accumulating append subpaths, do not flatten a subpath that is itself
> an append when ((AppendPath *) subpath)->path.parent is a RelOptInfo with
> non-NULL partitioning info.  Is the latter somehow necessary for
> pairwise-join considerations?
>

I don't think you need to do anything in the path creation code for this.
As it is, it flattens all AppendPath hierarchies, whether for partitioning or
inheritance or subqueries. We should leave it as it is.



> I think I can manage to squeeze in (a) in the next version patch and will
> also start working on (b), mainly the part about RelOptInfo getting some
> partitioning info.
>

I am fine with b, where you would include some partitioning information in
RelOptInfo. But you don't need to do what you said in (b) above.

In a private conversation Robert Haas suggested a way slightly different
than what my patch for partition-wise join does. He suggested that the
partitioning schemes, i.e. strategy, number of partitions and bounds of the
partitioned relations involved in the query, should be stored in PlannerInfo
in the form of a list. Each partitioning scheme is annotated with the
relids of the partitioned relations. RelOptInfo of the partitioned relation
will point to the partitioning scheme in PlannerInfo. Along with that, each
RelOptInfo will need to store partition keys for corresponding relation.
This simplifies matching the partitioning schemes of the joining relations.
Also it reduces the number of copies of partition bounds floating around as
we expect that a query will involve multiple partitioned tables following
similar partitioning schemes. Maybe you want to consider this idea while
working on (b).


> Thanks,
> Amit
>
>
>


-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company


Re: [HACKERS] Declarative partitioning - another take

2016-08-25 Thread Amit Langote
On 2016/08/22 13:51, Ashutosh Bapat wrote:
> The parent-child relationship of multi-level partitioned tables is not
> retained when creating the AppendRelInfo nodes. We create RelOptInfo nodes
> for all the leaf and intermediate tables. The AppendRelInfo nodes created
> for these RelOptInfos set the topmost table as the parent of all the leaf
> and child tables. Since partitioning scheme/s at each level is/are
> associated with the parent/s at that level, we lose information about the
> immediate parents and thus it becomes difficult to identify which leaf node
> falls where in the partition hierarchy. This stops us from doing any
> lump-sum partition pruning where we can eliminate all the partitions under
> a given parent-partition if that parent-partition gets pruned. It also
> restricts partition-wise join technique from being applied to partial
> partition hierarchy when the whole partitioning scheme of joining tables
> does not match. Maintaining a RelOptInfo hierarchy should not create
> corresponding Append (all kinds) plan hierarchy since
> accumulate_append_subpath() flattens any such hierarchy while creating
> paths. Can you please consider this point in your upcoming patch?

I agree.  So there seem to be two things here:  a) when expanding a
partitioned table inheritance set, do it recursively such that resulting
AppendRelInfos preserve *immediate* parent-child relationship info.  b)
when accumulating append subpaths, do not flatten a subpath that is itself
an append when ((AppendPath *) subpath)->path.parent is a RelOptInfo with
non-NULL partitioning info.  Is the latter somehow necessary for
pairwise-join considerations?
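
For (a), the shape I have in mind is roughly the following sketch
(add_child_rte() and rel_is_partitioned() are placeholders, not existing
functions):

/*
 * Expand a partitioned table's inheritance set recursively, so that each
 * child's AppendRelInfo records its *immediate* parent rather than the
 * topmost root.
 */
static void
expand_partitioned_rel(PlannerInfo *root, Oid parent_oid, Index parent_rti)
{
    List       *children = find_inheritance_children(parent_oid, AccessShareLock);
    ListCell   *lc;

    foreach(lc, children)
    {
        Oid         child_oid = lfirst_oid(lc);
        Index       child_rti = add_child_rte(root, child_oid);    /* placeholder */
        AppendRelInfo *appinfo = makeNode(AppendRelInfo);

        appinfo->parent_relid = parent_rti;     /* immediate parent, not the root */
        appinfo->child_relid = child_rti;
        root->append_rel_list = lappend(root->append_rel_list, appinfo);

        /* Recurse so lower levels also keep their immediate parent. */
        if (rel_is_partitioned(child_oid))      /* placeholder */
            expand_partitioned_rel(root, child_oid, child_rti);
    }
}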

I think I can manage to squeeze in (a) in the next version patch and will
also start working on (b), mainly the part about RelOptInfo getting some
partitioning info.

Thanks,
Amit




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Declarative partitioning - another take

2016-08-21 Thread Ashutosh Bapat
The parent-child relationship of multi-level partitioned tables is not
retained when creating the AppendRelInfo nodes. We create RelOptInfo nodes
for all the leaf and intermediate tables. The AppendRelInfo nodes created
for these RelOptInfos set the topmost table as the parent of all the leaf
and child tables. Since partitioning scheme/s at each level is/are
associated with the parent/s at that level, we lose information about the
immediate parents and thus it becomes difficult to identify which leaf node
falls where in the partition hierarchy. This stops us from doing any
lump-sum partition pruning where we can eliminate all the partitions under
a given parent-partition if that parent-partition gets pruned. It also
restricts partition-wise join technique from being applied to partial
partition hierarchy when the whole partitioning scheme of joining tables
does not match. Maintaining a RelOptInfo hierarchy should not create
corresponding Append (all kinds) plan hierarchy since
accumulate_append_subpath() flattens any such hierarchy while creating
paths. Can you please consider this point in your upcoming patch?


On Wed, Aug 10, 2016 at 4:39 PM, Amit Langote  wrote:

> Hi,
>
> Attached is the latest set of patches to implement declarative
> partitioning.  There is already a commitfest entry for the same:
> https://commitfest.postgresql.org/10/611/
>
> The old discussion is here:
> http://www.postgresql.org/message-id/55d3093c.5010...@lab.ntt.co.jp/
>
> Attached patches are described below:
>
> 0001-Catalog-and-DDL-for-partition-key.patch
> 0002-psql-and-pg_dump-support-for-partitioned-tables.patch
>
> These patches create the infrastructure and DDL for partitioned
> tables.
>
> In addition to a catalog for storing the partition key information, this
> adds a new relkind to pg_class.h. PARTITION BY clause is added to CREATE
> TABLE. Tables so created are RELKIND_PARTITIONED_REL relations which are
> to be special in a number of ways, especially with regard to their
> interactions with regular table inheritance features.
>
> PARTITION BY RANGE ({ column_name | ( expression ) } [ opclass ] [, ...])
> PARTITION BY LIST ({ column_name | ( expression ) } [ opclass ])
>
>
> 0003-Catalog-and-DDL-for-partition-bounds.patch
> 0004-psql-and-pg_dump-support-for-partitions.patch
>
> These patches create the infrastructure and DDL for partitions.
>
> Parent-child relationships of a partitioned table and its partitions are
> managed behind-the-scenes with inheritance.  That means there is a
> pg_inherits entry and attributes, constraints, etc. are marked with
> inheritance related information appropriately.  However this case differs
> from a regular inheritance relationship in a number of ways.  While the
> regular inheritance imposes certain restrictions on what elements a
> child's schema is allowed to contain (both at creation time and
> after-the-fact), the partitioning related code imposes further
> restrictions.  For example, while regular inheritance allows a child to
> contain its own columns, the partitioning code disallows that.  Stuff like
> NO INHERIT markings on check constraints and ONLY are ignored by the
> partitioning code.
>
> Partition DDL includes both a way to create new partition and "attach" an
> existing table as a partition of parent partitioned table.  Attempt to
> drop a partition using DROP TABLE causes an error. Instead a partition
> needs first to be "detached" from parent partitioned table.  On the other
> hand, dropping the parent drops all the partitions if CASCADE is specified.
>
> CREATE TABLE partition_name
> PARTITION OF parent_table [ (
>   { column_name WITH OPTIONS [ column_constraint [ ... ] ]
> | table_constraint }
> [, ... ]
> ) ] partition_bound_spec
> [ PARTITION BY {RANGE | LIST} ( { column_name | ( expression ) } [ opclass
> ] [, ...] )
>
> CREATE FOREIGN TABLE [ IF NOT EXISTS ] table_name
>   PARTITION OF parent_table [ (
>   { column_name WITH OPTIONS [ column_constraint [ ... ] ]
> | table_constraint }
> [, ... ]
> ) ] partition_bound_spec
>   SERVER server_name
> [ OPTIONS ( option 'value' [, ... ] ) ]
>
> ALTER TABLE parent ATTACH PARTITION partition_name partition_bound_spec [
> VALIDATE | NO VALIDATE ]
>
> ALTER TABLE parent DETACH PARTITION partition_name
>
> partition_bound_spec is:
>
> FOR VALUES { list_spec | range_spec }
>
> list_spec in FOR VALUES is:
>
> IN ( expression [, ...] )
>
> range_spec in FOR VALUES is:
>
> START lower-bound [ INCLUSIVE | EXCLUSIVE ] END upper-bound [ INCLUSIVE |
> EXCLUSIVE ]
>
> where lower-bound and upper-bound are:
>
> { ( expression [, ...] ) | UNBOUNDED }
>
> expression can be a string literal, a numeric literal or NULL.
>
> Note that one can specify PARTITION BY when creating a partition
> itself. That is to allow creating multi-level partitioned tables.
>
>
> 0005-Teach-a-few-places-to-use-partition-check-constraint.patch
>
> A partition's bound implicitly constrains 

Re: [HACKERS] Declarative partitioning - another take

2016-08-17 Thread Robert Haas
On Wed, Aug 17, 2016 at 2:21 AM, Amit Langote
 wrote:
> I am slightly tempted to eliminate the pg_partition catalog and associated
> syscache altogether and add a column to pg_class as Robert suggested.
> That way, all relid_is_partition() calls will be replaced by
> rel->rd_partbound != NULL check.  But one potential problem with that
> approach is that now whenever a parent relation is opened, all the
> partition relations must be opened to get the partbound value (to form the
> PartitionDesc to be stored in parent relation's rd_partdesc).  Whereas
> currently, we just look up the pg_partition catalog (or the associated
> cache) for every partition and that gets us the partbound.

Well, you could just look up the pg_class row without opening the
relation, too.  There is a system cache on pg_class.oid, after all.  I
think the issue is whether it's safe to read either one of those
things without a lock on the child relation.  If altering the
partitioning information for a relation requires holding only
AccessExclusiveLock on that relation, and no lock on the parent, then
you really can't read the information for any child relation without
taking at least AccessShareLock.  Otherwise, it might change under
you, and that would be bad.

I'm inclined to think that changing the partitioning information for a
child is going to require AccessExclusiveLock on both the child and
the parent.  That seems unfortunate from a concurrency point of view,
but we may be stuck with it: suppose you require only
ShareUpdateExclusiveLock on the parent.  Well, then a concurrent read
transaction might see the partition boundaries change when it does a
relcache rebuild, which would cause it to suddenly start expecting the
data to be in a different place in mid-transaction, perhaps even in
mid-scan.  Maybe that's survivable with really careful coding, but it
seems like it's probably a bad thing.  For example, it would mean that
the executor would be unable to rely on the partitioning information
in the relcache remaining stable underneath it.  Moreover, the
relcache is always going to be scanned with the most recent possible
MVCC snapshot, but the transaction snapshot may be older, so such a
system creates all sorts of nasty possibilities for there to be skew
between the snapshot being used to view the data and the snapshot being
used to read the metadata that says where the data is.

This may need some more thought, but if we go with that approach of
requiring an AccessExclusiveLock on both parent and child, then it
seems to me that maybe we should consider the partitioning information
to be a property of the parent rather than the child.  Just take all
the partitioning information for all children and put it in one big
node tree and store it in the pg_class or pg_partition_root entry for
the parent as one big ol' varlena.  Now you can open the parent and
get all of the partitioning information for all of the children
without needing any lock on any child, and that's *really* good,
because it means that some day we might be able to do partition
elimination before locking any of the children!  That would be
excellent.
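
The storage side of that could be as small as this sketch (the list of
bound nodes and the destination column are hypothetical; nodeToString() is
the existing serializer):

/*
 * Flatten the per-child bound nodes into one node tree and serialize it,
 * so the whole thing fits in a single toastable text column on the
 * parent's catalog row; reading it back is one detoast plus stringToNode().
 */
char   *flat = nodeToString(all_partition_bounds);      /* assumed List of bounds */
Datum   value = CStringGetTextDatum(flat);

/* ... store 'value' in the parent's pg_class / pg_partition_root tuple ... */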

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

