[jira] [Commented] (FLINK-14539) Unique key metadata should be kept when using concat or concat_ws in some cases

2019-10-30 Thread Jark Wu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-14539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16962767#comment-16962767
 ] 

Jark Wu commented on FLINK-14539:
-

[~KevinZwx], I created FLINK-14567.

> Unique key metadata should be kept when using concat or concat_ws in some 
> cases
> ---
>
> Key: FLINK-14539
> URL: https://issues.apache.org/jira/browse/FLINK-14539
> Project: Flink
>  Issue Type: Bug
>  Components: Table SQL / Planner
>Affects Versions: 1.9.0, 1.9.1
>Reporter: Kevin Zhang
>Assignee: Kevin Zhang
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Currently, the unique key metadata of a Project RelNode is only kept in the 
> following three situations:
> # projecting the child unique keys without changing them
> # casting a child unique key when nulls are ignored and the original type of 
> the field is the same as the cast type
> # renaming the child unique keys
> Besides these situations, concat and concat_ws should also keep the metadata 
> if they do not break the uniqueness of the child unique keys, i.e. each 
> operand falls into one of the above situations and the operands include all 
> the child unique keys.
> Say f0 and f1 are the unique key fields of the child node; the following 
> SQL statements should keep the unique key metadata:
> {code:sql}
> select concat(f0, f1)
> -- f0 and f1 are both originally of type varchar, and nulls are ignored
> select concat(cast(f0 as varchar), f1)
> select cast(concat(f0, f1) as varchar)
> {code}
> while the following SQL statements should discard the unique key metadata:
> {code:sql}
> -- f0 and f1 are both originally of type varchar
> select concat(cast(f0 as bigint), f1)
> select cast(concat(f0, f1) as bigint)
> {code}
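A minimal Python sketch of the rule as proposed in the description, evaluated over a toy expression tree. {{Ref}}, {{Cast}}, {{Concat}}, {{covered_keys}} and {{keeps_unique_key}} are hypothetical stand-ins, not Flink or Calcite APIs; the sketch assumes nulls are ignored and that concat returns varchar. It only encodes the proposal: Jark Wu's comment of 2019-10-29 in this thread gives a counterexample showing that concat does not preserve uniqueness in general.

{code:python}
from dataclasses import dataclass
from typing import Optional, Set, Tuple, Union


@dataclass(frozen=True)
class Ref:
    """Hypothetical reference to a child field (stand-in for a RexInputRef)."""
    name: str
    type: str


@dataclass(frozen=True)
class Cast:
    operand: "Expr"
    type: str


@dataclass(frozen=True)
class Concat:
    operands: Tuple["Expr", ...]


Expr = Union[Ref, Cast, Concat]


def type_of(e: Expr) -> str:
    # Assumption: concat always returns varchar.
    return "varchar" if isinstance(e, Concat) else e.type


def covered_keys(e: Expr, unique_keys: Set[str]) -> Optional[Set[str]]:
    """Child unique-key fields that e carries through unchanged, or None if e
    breaks uniqueness according to the proposed rule."""
    if isinstance(e, Ref):
        # situations 1 and 3: plain projection (or rename) of a key field
        return {e.name} if e.name in unique_keys else None
    if isinstance(e, Cast):
        # situation 2: a cast is kept only if the type does not change
        inner = covered_keys(e.operand, unique_keys)
        return inner if inner is not None and type_of(e.operand) == e.type else None
    if isinstance(e, Concat):
        # proposed extension: every operand must itself preserve uniqueness
        keys: Set[str] = set()
        for op in e.operands:
            inner = covered_keys(op, unique_keys)
            if inner is None:
                return None
            keys |= inner
        return keys
    return None


def keeps_unique_key(e: Expr, unique_keys: Set[str]) -> bool:
    # the operands together must include all of the child unique keys
    keys = covered_keys(e, unique_keys)
    return keys is not None and keys == set(unique_keys)


keys = {"f0", "f1"}
f0, f1 = Ref("f0", "varchar"), Ref("f1", "varchar")
print(keeps_unique_key(Concat((f0, f1)), keys))                   # True
print(keeps_unique_key(Cast(Concat((f0, f1)), "varchar"), keys))  # True
print(keeps_unique_key(Concat((Cast(f0, "bigint"), f1)), keys))   # False
{code}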



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (FLINK-14539) Unique key metadata should be kept when using concat or concat_ws in some cases

2019-10-30 Thread Kevin Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-14539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16962759#comment-16962759
 ] 

Kevin Zhang commented on FLINK-14539:
-

[~jark] That's good; I'd like to join the discussion then, thanks.



[jira] [Commented] (FLINK-14539) Unique key metadata should be kept when using concat or concat_ws in some cases

2019-10-30 Thread Jark Wu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-14539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16962754#comment-16962754
 ] 

Jark Wu commented on FLINK-14539:
-

[~KevinZwx], yes, this is a very important use case, and several users have 
been bothered by it when trying 1.9.
I will create another issue to discuss this. 



[jira] [Commented] (FLINK-14539) Unique key metadata should be kept when using concat or concat_ws in some cases

2019-10-30 Thread Kevin Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-14539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16962753#comment-16962753
 ] 

Kevin Zhang commented on FLINK-14539:
-

[~jark] Thanks, great catch, I missed this kind of situation. I'll close the 
PR later.

But there are some scenarios where we need to preserve the unique keys. For 
example, we have an HBase table sink with a rowkey of type varchar (also the 
primary key) and a column of type bigint, and we want to write the result of the 
following query into the sink using upsert mode. Currently the SQL fails the 
primary key check. Do you have any suggestions about how to do this?
{code:sql}
select f0, f1, sum(f2)
from t1
group by f0, f1
{code}





[jira] [Commented] (FLINK-14539) Unique key metadata should be kept when using concat or concat_ws in some cases

2019-10-29 Thread Jark Wu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-14539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16962681#comment-16962681
 ] 

Jark Wu commented on FLINK-14539:
-

After rethinking this issue, I think we may not be able to support deriving the 
primary key from concat/concat_ws. For example, suppose we have a primary key 
(f0, f1, f2) whose fields are all of type varchar, and two distinct records 
('a', 'b', 'c') and ('ab', '', 'c'). The results of concat(f0, f1, f2) are the 
same ('abc') for both, which means the concatenated value is no longer a 
primary key. 



[jira] [Commented] (FLINK-14539) Unique key metadata should be kept when using concat or concat_ws in some cases

2019-10-28 Thread Kevin Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-14539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16960875#comment-16960875
 ] 

Kevin Zhang commented on FLINK-14539:
-

Thanks for your opinions, [~jark] [~danny0405]. Actually, I've already 
implemented this using a RexBiVisitor, which works like a RexVisitor but takes 
an additional argument to pass the outIndex; otherwise it's hard to determine 
which inIndex and outIndex pair should be put into mapInToOutPos. I'll open a 
PR and would appreciate it if you could help review it there.
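A rough Python sketch of the two-argument approach described here, assuming toy tuple expressions rather than Calcite's {{RexNode}} classes. {{MappingBiVisitor}} and {{map_in_to_out}} are hypothetical names, and {{map_in_to_out_pos}} only mirrors mapInToOutPos from the comment. Passing out_index down lets each input reference record its own (inIndex, outIndex) pair. The sketch models the mapping mechanics only and treats concat as preserving uniqueness, as the original proposal did.

{code:python}
from typing import List, Tuple

# Toy projection expressions (hypothetical, not Calcite classes):
#   ("ref", in_index, type) | ("cast", operand, type) | ("concat", [operands])


def type_of(expr) -> str:
    # Assumption: concat returns varchar; refs and casts carry their own type.
    return "varchar" if expr[0] == "concat" else expr[2]


class MappingBiVisitor:
    """Two-argument visitor: visit(expr, out_index) records an
    (in_index, out_index) pair for every input field that reaches output
    position out_index without breaking uniqueness."""

    def __init__(self) -> None:
        self.map_in_to_out_pos: List[Tuple[int, int]] = []

    def visit(self, expr, out_index: int) -> bool:
        start = len(self.map_in_to_out_pos)
        ok = self._visit_node(expr, out_index)
        if not ok:
            # This subtree breaks uniqueness: undo the pairs it recorded.
            del self.map_in_to_out_pos[start:]
        return ok

    def _visit_node(self, expr, out_index: int) -> bool:
        kind = expr[0]
        if kind == "ref":
            # The extra argument tells the leaf which output position it feeds.
            self.map_in_to_out_pos.append((expr[1], out_index))
            return True
        if kind == "cast":
            return type_of(expr[1]) == expr[2] and self.visit(expr[1], out_index)
        if kind == "concat":
            return all(self.visit(op, out_index) for op in expr[1])
        return False  # unknown calls are treated as breaking uniqueness


def map_in_to_out(projects) -> List[Tuple[int, int]]:
    visitor = MappingBiVisitor()
    for out_index, expr in enumerate(projects):
        visitor.visit(expr, out_index)
    return visitor.map_in_to_out_pos


projects = [
    ("ref", 0, "varchar"),
    ("concat", [("ref", 1, "varchar"), ("ref", 2, "varchar")]),
    ("cast", ("ref", 3, "varchar"), "bigint"),
]
print(map_in_to_out(projects))  # [(0, 0), (1, 1), (2, 1)]
{code}

The rollback in {{visit}} is what keeps a failing cast nested inside a concat from leaving stale pairs in the mapping.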



[jira] [Commented] (FLINK-14539) Unique key metadata should be kept when using concat or concat_ws in some cases

2019-10-28 Thread Jark Wu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-14539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16960873#comment-16960873
 ] 

Jark Wu commented on FLINK-14539:
-

Hi [~danny0405], I think the reason to use {{RexBiVisitor}} instead of 
{{RexVisitor}} is that we need the operand information, i.e. the uniqueness of 
the operands. We could pass it to the constructor of a RexVisitor, but I think 
using RexBiVisitor is much cleaner. 
However, I didn't find any evidence in the javadoc of RexBiVisitor that it is 
used for three-valued boolean logic. 



[jira] [Commented] (FLINK-14539) Unique key metadata should be kept when using concat or concat_ws in some cases

2019-10-28 Thread Danny Chen (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-14539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16960862#comment-16960862
 ] 

Danny Chen commented on FLINK-14539:


Nice catch [~KevinZwx]. You should use RexVisitor instead of RexBiVisitor; 
RexBiVisitor is used to fetch policies for handling two- and three-valued 
boolean logic. You can use a RexVisitor directly here and keep a stack of the 
calls. You just need to update the uniqueness when traversing the calls 
recursively.
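For comparison, a Python sketch of the single-argument alternative suggested here, using hypothetical toy tuple expressions instead of Calcite classes. Rather than keeping an explicit stack of calls, this simplification lets the recursion track the enclosing calls: {{visit}} returns the set of input indexes whose uniqueness survives, and the caller pairs them with the output position, so no outIndex argument is needed. {{UniquenessVisitor}} and {{map_in_to_out}} are illustrative names, not Calcite APIs, and concat is treated as preserving uniqueness only because the original proposal did.

{code:python}
from typing import List, Optional, Set, Tuple

# Toy projection expressions (hypothetical, not Calcite classes):
#   ("ref", in_index, type) | ("cast", operand, type) | ("concat", [operands])


def type_of(expr) -> str:
    # Assumption: concat returns varchar; refs and casts carry their own type.
    return "varchar" if expr[0] == "concat" else expr[2]


class UniquenessVisitor:
    """Single-argument visitor: visit(expr) returns the input indexes whose
    uniqueness survives expr, or None if expr breaks it."""

    def visit(self, expr) -> Optional[Set[int]]:
        kind = expr[0]
        if kind == "ref":
            return {expr[1]}
        if kind == "cast":
            inner = self.visit(expr[1])
            if inner is not None and type_of(expr[1]) == expr[2]:
                return inner
            return None
        if kind == "concat":
            result: Set[int] = set()
            for op in expr[1]:
                inner = self.visit(op)
                if inner is None:
                    return None
                result |= inner
            return result
        return None  # unknown calls are treated as breaking uniqueness


def map_in_to_out(projects) -> List[Tuple[int, int]]:
    # The caller, not the visitor, pairs each result with its output position.
    visitor = UniquenessVisitor()
    pairs: List[Tuple[int, int]] = []
    for out_index, expr in enumerate(projects):
        preserved = visitor.visit(expr)
        if preserved is not None:
            pairs.extend((in_index, out_index) for in_index in sorted(preserved))
    return pairs


projects = [
    ("ref", 0, "varchar"),
    ("concat", [("ref", 1, "varchar"), ("ref", 2, "varchar")]),
    ("cast", ("ref", 3, "varchar"), "bigint"),
]
print(map_in_to_out(projects))  # [(0, 0), (1, 1), (2, 1)]
{code}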



[jira] [Commented] (FLINK-14539) Unique key metadata should be kept when using concat or concat_ws in some cases

2019-10-28 Thread Jark Wu (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-14539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16960843#comment-16960843
 ] 

Jark Wu commented on FLINK-14539:
-

Hi [~KevinZwx], I think RexBiVisitor is fine. 

Some thoughts from my side: I think what we need is something like 
{{SqlOperator#getMonotonicity}}. Using RexBiVisitor, maybe we need something like:

{{class UniquenessPreserveVisitor implements RexBiVisitor {...} }}

where the UniquenessCallContext contains {{isArgumentUniquenessPreserved(int 
idx)}} and {{int getArgumentCount()}}. Maybe we can make {{UniquenessCallContext}} 
extend {{CallContext}}.
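A Python sketch of the per-operator idea, in the spirit of {{SqlOperator#getMonotonicity}}: each operator gets a rule that decides from a call context whether its result keeps uniqueness. The two methods of the hypothetical {{UniquenessCallContext}} below mirror the two methods named in the comment; the {{arg_types}} and {{return_type}} fields, the {{RULES}} registry and the tuple expressions are assumptions added for illustration. CONCAT is left unregistered because of the counterexample from 2019-10-29 in this thread.

{code:python}
from dataclasses import dataclass
from typing import Callable, Dict, List

# Toy expressions (hypothetical):
#   ("ref", in_index, type) | ("call", operator_name, [operands], return_type)


@dataclass
class UniquenessCallContext:
    """Python stand-in for the proposed UniquenessCallContext."""
    arg_preserved: List[bool]
    arg_types: List[str]  # assumption: not part of the proposal
    return_type: str      # assumption: not part of the proposal

    def is_argument_uniqueness_preserved(self, idx: int) -> bool:
        return self.arg_preserved[idx]

    def get_argument_count(self) -> int:
        return len(self.arg_preserved)


def cast_rule(ctx: UniquenessCallContext) -> bool:
    # CAST keeps uniqueness only if its operand does and the type is unchanged.
    return (ctx.get_argument_count() == 1
            and ctx.is_argument_uniqueness_preserved(0)
            and ctx.arg_types[0] == ctx.return_type)


# Per-operator rules; CONCAT is deliberately absent.
RULES: Dict[str, Callable[[UniquenessCallContext], bool]] = {"CAST": cast_rule}


def type_of(expr) -> str:
    return expr[2] if expr[0] == "ref" else expr[3]


def preserves_uniqueness(expr) -> bool:
    if expr[0] == "ref":
        return True
    _, op_name, operands, return_type = expr
    ctx = UniquenessCallContext(
        arg_preserved=[preserves_uniqueness(o) for o in operands],
        arg_types=[type_of(o) for o in operands],
        return_type=return_type,
    )
    rule = RULES.get(op_name)
    return rule is not None and rule(ctx)  # unknown operators break uniqueness


f0 = ("ref", 0, "varchar")
print(preserves_uniqueness(("call", "CAST", [f0], "varchar")))  # True
print(preserves_uniqueness(("call", "CAST", [f0], "bigint")))   # False
{code}

Adding support for another operator then means registering one more rule rather than touching the visitor itself.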





[jira] [Commented] (FLINK-14539) Unique key metadata should be kept when using concat or concat_ws in some cases

2019-10-28 Thread Kevin Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-14539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16960781#comment-16960781
 ] 

Kevin Zhang commented on FLINK-14539:
-

I intend to implement this using a RexBiVisitor, because there are cases where 
concat and cast are nested inside each other, and it's more convenient to extend 
when we find more cases in which we can keep the unique key metadata. If that's 
OK and doesn't cause other issues, I'd like to open a PR for further review.
