Re: Boolean partitions syntax

Amit Langote Fri, 20 Apr 2018 03:10:29 -0700

On 2018/04/19 20:35, Kyotaro HORIGUCHI wrote:
> At Wed, 18 Apr 2018 19:27:16 +0900, Amit Langote wrote:
>> On 2018/04/16 16:17, Kyotaro HORIGUCHI wrote:
>>> It was a bother that some rules used c_expr directly but I
>>> managed to replace all of them with a_expr by lowering precedence
>>> of some ordinary keywords (PASSING, BY, COLUMNS and ROW). c_expr
>>> is no loger used elsewhere so we can just remove columnref from
>>> c_expr. Finally [abc]0_expr was eliminated and we have only
>>> a_expr, b_expr, u_expr and c_expr. This seems simple enough.
>>>
>>> The relationship among the rules after this change is as follows.
>>>
>>> a_expr --+-- columnref
>>>          +-- u_expr ---+-- c_expr -- (all old c_expr stuff except columnref)
>>>                        +-- (all old a_expr stuff)
>>>                            
>>> b_expr --+-- columnref
>>>          +-- c_expr -- (ditto)
>>>          +-- (all old b_expr stuff)
>>>
>>> On the way fixing this, I noticed that the precedence of some
>>> keywords (PRESERVE, STRIP_P) that was more than necessary and
>>> corrected it.
>>
>> Thank you for simplifying gram.y changes.  I think I was able to
>> understand them.  Having said that, I think your proposed patch to gram.y
>> could be rewritten such that they do not *appear* to impact the syntax for
>> other features like XML, etc.  For example, allowing a_expr in places
>> where previously only c_expr was allowed seems to me a dangerous, although
>> I don't have any examples to show for it.
> 
> It cannot be dangerous, since "(a_expr)" is a c_expr. The
> difference is just the way resolve conflicts. The words of which
> I changed precedence are used only in the the problematic
> contexts.


Oh, okay.  But the point I was trying to make is that if a change that the
patch makes is not necessary to solve the problem at hand, we should try
to pursue it separately.  At first, I thought *all* the changes to gram.y
in your proposed patch were necessary, but after your clarifications, it
became clear to me that the only change needed was to refactor a_expr,
c_expr such that the we can use a_expr for partition bound expressions
without shift/reduce conflicts.

>> It seems that we would like to use a_expr syntax but without columnref for
>> partition_bound expressions.  No columnrefs because they cause conflicts
>> with unreserved keywords MINVALUE and MAXVALUE that are used in range
> 
> Right.
> 
>> bound productions.  I think that can be implemented with minimal changes
>> to expression syntax productions, as shown in the attached patch.
> 
> Agreed that the refactor stuff can be separated, but I'm a bit
> uneasy with the increased number of new syntax. The purpose of
> the change of c_expr in v6 was to minimize the *structure* change
> of *_expr syntax. I agree that v8 style is preferable to
> backpatch to PG10, but I'd still like to use v6 style for PG11.

I think if this bug/open item can be resolved by adopting the minimal
patch, then we should use it for that.  Maybe, we can discuss the rest of
the changes independently.  If they make things better overall, we should
definitely try to adopt them.

>> About the changes in transformPartitionBoundValue() to check for collation
>> conflict, I think that seems unnecessary.  We're not going to use the
>> partition bound expression's collation anywhere, so trying to validate it
>> seems unnecessary.  I wondered if we should issue a WARNING to warn the
>> user that COLLATE specified in a partition bound expression is ignored,
> 
> That's definitely wrong.

Sorry, there may have been a misunderstanding.

> Partition key expression is inheriting the collation of the base
> column and any collation can be specifiable on the expression.
> Similary we can specify collation on partition bound values with
> this patch. If a key expression and a bound value have
> contradicting collations, we cannot determine the ordering of the
> values and should error out.

There is infrastructure in place to store the collation of the partition
key (partcollation column of pg_partitioned_table catalog), which is used
when routing tuples, generating partition constraint expression, etc.  So,
we do remember the collation, either the default one derived from the
underlying column's definition or a user-specified one for partitioning.
And it *is* used when routing data (a partitioning action) and when
applying partition constraints.  I will borrow an example from an email I
had sent on the pruning thread:

create table p (a text) partition by range (a collate "en_US");
create table p1 partition of p for values from ('a') to ('m');
create table p2 partition of p for values from ('m') to ('z ');

create table q (a text) partition by range (a collate "C");
create table q1 partition of q for values from ('a') to ('m');
create table q2 partition of q for values from ('m') to ('z ');

-- per "en_US", 'a' <= 'A' < 'm'
insert into p values ('A');
INSERT 0 1

-- per "C", that's not true
insert into q values ('A');
ERROR:  no partition of relation "q" found for row
DETAIL:  Partition key of the failing row contains (a) = (A).

So, user-specified collation is definitely considered for partitioning,
but you seem to be already aware of that per rest of your email.

> Collation is requried for those languages where the letter
> ordering is different from ordinary order. For example,
> 
> =# create table p (a text collate "sv_SE") partition by range (a);
> =# create table c1 partiton of p for values from ('x') to ('ö');
> 
> The above is accepted in V10 but,

Right and it's because 'x' <= 'ö' per "sv_SE" which is the collation of
the partition key.

> =# create table p (a text collate "en_US") partition by range (a);
> postgres=# create table c1 partiton of p for values from ('x') to ('ö');
> ERROR:  syntax error at or near "partiton"
> LINE 1: create table c1 partiton of p for values from ('x') to ('ö')...

I guess you meant to show this:

create table p (a text collate "en_US") partition by range (a);
create table c1 partition of p for values from ('x') to ('ö');
ERROR:  empty range bound specified for partition "c1"
DETAIL:  Specified lower bound ('x') is greater than or equal to upper
bound ('ö').

Because 'x' is not <= 'ö', per "en_US".

> This is because the collation of the key column is actually
> *used* to check the range bounds.

Yes.

> With this fix partition bound
> values can have collate specification and if it differs from key
> collation, it is just a mess.

I was trying to point out that there is no infrastructure to keep track of
the collations of the individual bound values.  Neither in the system
catalog, nor in the PartitionBoundInfoData internal structure.  So, even
if a user may specify collation for the individual bound value expressions
because the syntax allows, there is no place to put it or no occasion to
use it.

In the particular example that you provided, an informed user might be
able to deduce from the below DETAIL message that the problem might have
to do with collation.

DETAIL:  Specified lower bound ('x') is greater than or equal to upper
bound ('ö').

Maybe, we should add a HINT to ask user to check the collation that's been
defined for the partition key.  But checking the collation of the
partition bound expression and causing error when it doesn't match the
partition key's collation does not seem to improve the situation for users
in this case.

>> but not sure after seeing that we issue none when a user specifies
>> COLLATION in the expression for a column's DEFAULT value.  We do still
>> store the COLLATION expression in pg_attrdef, but it doesn't seem to be
>> used anywhere.
> 
> DEFAULT value is not compared with anything, partition key
> expression *is* compared with column value and the collation is
> applied on the comparison.
> 
> In the following case, the comparison between 'zirkoniumet' and
> 'z', 'å' must be performed with "sv_SE" since the range is empty
> if the collation were en_US and it is not the intention of the
> user.
> 
> =# create table p (a text) partition by range (a collate "sv_SE");
> =# create table c1 partition of p for values from ('z') to ('å');
> =# insert into p values ('zirkoniumet');

It *will* be performed with "sv_SE", because that's the partition key's
collation and so it will be routed into c1.  That doesn't have anything to
do with what collation was used for partition bounds.

Regarding why I brought up the DEFAULT into this discussion, see the
following example:

create table foo (a text collate "en_US" check (a >= 'x'));
alter table foo alter a set default 'ö' collate "sv_SE";
\d foo
                         Table "public.foo"
 Column | Type | Collation | Nullable |           Default
--------+------+-----------+----------+-----------------------------
 a      | text | en_US     |          | ('ö'::text COLLATE "sv_SE")
Check constraints:
    "foo_a_check" CHECK (a >= 'x'::text)


insert into foo values (DEFAULT);
ERROR:  new row for relation "foo" violates check constraint "foo_a_check"
DETAIL:  Failing row contains (ö).

Here because the collation of the underlying column is "en_US" check
constraint expression a >= 'x' is evaluated using that collation and so
trying to insert the default value 'ö' fails.  Specifying collation for
the default expression didn't change anything, but the user wasn't warned
about that.

Am I missing something?

Thanks,
Amit

Re: Boolean partitions syntax

Reply via email to