[jira] [Commented] (FLINK-14539) Unique key metadata should be kept when using concat or concat_ws in some cases
[ https://issues.apache.org/jira/browse/FLINK-14539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16962767#comment-16962767 ]

Jark Wu commented on FLINK-14539:
---------------------------------

[~KevinZwx], I created FLINK-14567.

> Unique key metadata should be kept when using concat or concat_ws in some cases
> -------------------------------------------------------------------------------
>
> Key: FLINK-14539
> URL: https://issues.apache.org/jira/browse/FLINK-14539
> Project: Flink
> Issue Type: Bug
> Components: Table SQL / Planner
> Affects Versions: 1.9.0, 1.9.1
> Reporter: Kevin Zhang
> Assignee: Kevin Zhang
> Priority: Major
> Labels: pull-request-available
> Time Spent: 20m
> Remaining Estimate: 0h
>
> Currently the unique key metadata of a project relnode is only kept in the
> following three situations:
> # project the child unique keys while not changing them
> # cast the child unique key when ignoring nulls and the original type of the
> field and the cast type are the same
> # rename the child unique keys
> Besides these situations, concat and concat_ws should also keep the metadata
> if they won't break the uniqueness of the child unique keys, i.e. each
> operand is in one of the above situations, and the operands include all the
> child unique keys.
> Say f0 and f1 are the unique key fields of the child node; the following
> SQL statements should keep the unique key metadata
> {code:sql}
> select concat(f0, f1)
> -- the type of f0 and f1 are both varchar originally and ignore nulls
> select concat(cast(f0 as varchar), f1)
> select cast(concat(f0, f1) as varchar)
> {code}
> while the following SQL statements should discard the unique key metadata
> {code:sql}
> -- the type of f0 and f1 are both varchar originally
> select concat(cast(f0 as bigint), f1)
> select cast(concat(f0, f1) as bigint)
> {code}

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Commented] (FLINK-14539) Unique key metadata should be kept when using concat or concat_ws in some cases
[ https://issues.apache.org/jira/browse/FLINK-14539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16962759#comment-16962759 ]

Kevin Zhang commented on FLINK-14539:
-------------------------------------

[~jark] That's good. I'd like to join the discussion then, thanks.
[jira] [Commented] (FLINK-14539) Unique key metadata should be kept when using concat or concat_ws in some cases
[ https://issues.apache.org/jira/browse/FLINK-14539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16962754#comment-16962754 ]

Jark Wu commented on FLINK-14539:
---------------------------------

[~KevinZwx], Yes, this is a very important use case, and several users have been bothered by it when trying 1.9. I will create another issue to discuss this.
[jira] [Commented] (FLINK-14539) Unique key metadata should be kept when using concat or concat_ws in some cases
[ https://issues.apache.org/jira/browse/FLINK-14539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16962753#comment-16962753 ]

Kevin Zhang commented on FLINK-14539:
-------------------------------------

[~jark] Thanks, great catch. I overlooked this kind of situation. I'll close the PR later.

But there are some scenarios where we need to preserve the unique keys. For example, we have an HBase table sink with a varchar rowkey (also the primary key) and a bigint column, and we want to write the result of the following query into the sink in upsert mode. Currently the SQL fails the primary key check. Do you have any suggestions about how to do this?
{code:sql}
select f0, f1, sum(f2) from t1 group by f0, f1
{code}
[jira] [Commented] (FLINK-14539) Unique key metadata should be kept when using concat or concat_ws in some cases
[ https://issues.apache.org/jira/browse/FLINK-14539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16962681#comment-16962681 ]

Jark Wu commented on FLINK-14539:
---------------------------------

After rethinking this issue, I think we may not be able to support deriving a primary key from concat/concat_ws.

For example, suppose we have a primary key (f0, f1, f2) whose fields are all of varchar type, and two distinct records ('a', 'b', 'c') and ('ab', '', 'c'). The results of concat(f0, f1, f2) are the same for both records ('abc'), which means the concat value is no longer a primary key.
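The counterexample in the comment above can be reproduced outside of SQL. The following is a minimal plain-Java sketch, not Flink code: the class and method names are hypothetical, `concat` and `concatWs` only imitate the SQL functions on non-null strings, and the `concat_ws` pair of rows is an additional illustration that does not come from the thread.

```java
// Plain-Java illustration only; no Flink or Calcite classes are involved.
final class ConcatCollision {

    // Imitates SQL CONCAT on non-null strings: joins the operands without a separator.
    static String concat(String... operands) {
        return String.join("", operands);
    }

    // Imitates SQL CONCAT_WS on non-null strings: joins the operands with a separator.
    static String concatWs(String separator, String... operands) {
        return String.join(separator, operands);
    }

    public static void main(String[] args) {
        // Two distinct composite keys (f0, f1, f2) from the comment above.
        System.out.println(concat("a", "b", "c"));   // abc
        System.out.println(concat("ab", "", "c"));   // abc

        // A separator does not help when the data itself may contain it.
        System.out.println(concatWs("-", "a-", "b")); // a--b
        System.out.println(concatWs("-", "a", "-b")); // a--b
    }
}
```

Both pairs of rows differ as tuples but map to the same string, so the concatenated column cannot be assumed unique without further knowledge about the data.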
[jira] [Commented] (FLINK-14539) Unique key metadata should be kept when using concat or concat_ws in some cases
[ https://issues.apache.org/jira/browse/FLINK-14539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16960875#comment-16960875 ]

Kevin Zhang commented on FLINK-14539:
-------------------------------------

Thanks for your opinions, [~jark] and [~danny0405]. Actually, I've already implemented this with a RexBiVisitor, used just like a RexVisitor but with the additional argument passing the outIndex. Otherwise it's hard to determine which inIndex and outIndex pair should be put into mapInToOutPos. I'll open a PR and would appreciate it if you could help review it there.
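The idea of threading the output position through the visitor's second argument can be sketched with a toy expression model. Everything below is a hypothetical stand-in written for illustration: `Expr`, `InputRef`, `Call` and `BiVisitor` are not Calcite's `RexNode`, `RexInputRef`, `RexCall` or `RexBiVisitor`, and the rule "only descend through CAST" is a simplification that ignores the type check described in the issue.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model only; these types are hypothetical stand-ins, not Calcite classes.
final class OutIndexVisitorSketch {

    // A visitor whose methods take an extra argument of type P.
    interface BiVisitor<R, P> {
        R visitInputRef(InputRef ref, P arg);
        R visitCall(Call call, P arg);
    }

    interface Expr {
        <R, P> R accept(BiVisitor<R, P> visitor, P arg);
    }

    // Reference to the input field at position `index`.
    static final class InputRef implements Expr {
        final int index;
        InputRef(int index) { this.index = index; }
        @Override public <R, P> R accept(BiVisitor<R, P> v, P arg) { return v.visitInputRef(this, arg); }
    }

    // Function call with an operator name and operands.
    static final class Call implements Expr {
        final String op;
        final List<Expr> operands;
        Call(String op, Expr... operands) { this.op = op; this.operands = Arrays.asList(operands); }
        @Override public <R, P> R accept(BiVisitor<R, P> v, P arg) { return v.visitCall(this, arg); }
    }

    // Collects input index -> output positions; the second argument carries the output position.
    static final class InToOutCollector implements BiVisitor<Void, Integer> {
        final Map<Integer, List<Integer>> mapInToOutPos = new LinkedHashMap<>();

        @Override public Void visitInputRef(InputRef ref, Integer outIndex) {
            mapInToOutPos.computeIfAbsent(ref.index, k -> new ArrayList<>()).add(outIndex);
            return null;
        }

        @Override public Void visitCall(Call call, Integer outIndex) {
            // Simplification: only CAST is treated as transparent; other calls are ignored.
            if (call.op.equals("CAST")) {
                for (Expr operand : call.operands) {
                    operand.accept(this, outIndex);
                }
            }
            return null;
        }
    }

    // Visits each projection with its own output position.
    static Map<Integer, List<Integer>> collect(List<Expr> projects) {
        InToOutCollector collector = new InToOutCollector();
        for (int outIndex = 0; outIndex < projects.size(); outIndex++) {
            projects.get(outIndex).accept(collector, Integer.valueOf(outIndex));
        }
        return collector.mapInToOutPos;
    }

    // Models: SELECT f1, CAST(f0), UPPER(f0)
    static String demo() {
        List<Expr> projects = Arrays.asList(
            new InputRef(1),
            new Call("CAST", new InputRef(0)),
            new Call("UPPER", new InputRef(0)));
        return collect(projects).toString();
    }

    public static void main(String[] args) {
        System.out.println(demo()); // {1=[0], 0=[1]}
    }
}
```

Because the output position travels with each `accept` call, a nested expression such as a cast inside another call does not need any mutable "current output index" field on the visitor.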
[jira] [Commented] (FLINK-14539) Unique key metadata should be kept when using concat or concat_ws in some cases
[ https://issues.apache.org/jira/browse/FLINK-14539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16960873#comment-16960873 ]

Jark Wu commented on FLINK-14539:
---------------------------------

Hi [~danny0405], I think the reason to use {{RexBiVisitor}} instead of {{RexVisitor}} is that we need information about the operands, i.e. their uniqueness. We could pass it to the constructor of a RexVisitor, but I think using RexBiVisitor is much cleaner.

However, I didn't find any evidence in the javadoc of RexBiVisitor that it is meant for three-valued boolean logic.
[jira] [Commented] (FLINK-14539) Unique key metadata should be kept when using concat or concat_ws in some cases
[ https://issues.apache.org/jira/browse/FLINK-14539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16960862#comment-16960862 ]

Danny Chen commented on FLINK-14539:
------------------------------------

Nice catch [~KevinZwx]. You should use RexVisitor instead of RexBiVisitor; RexBiVisitor is used to fetch policies for handling two- and three-valued boolean logic. You can use a RexVisitor directly here to keep a stack of the calls. You just need to update the uniqueness when traversing the calls recursively.
[jira] [Commented] (FLINK-14539) Unique key metadata should be kept when using concat or concat_ws in some cases
[ https://issues.apache.org/jira/browse/FLINK-14539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16960843#comment-16960843 ]

Jark Wu commented on FLINK-14539:
---------------------------------

Hi [~KevinZwx], I think RexBiVisitor is fine.

Some thoughts from my side: I think what we need is something like {{SqlOperator#getMonotonicity}}. Using RexBiVisitor, maybe we need something like:

{{class UniquenessPreserveVisitor implements RexBiVisitor {...} }}

where the UniquenessCallContext contains {{isArgumentUniquenessPreserved(int idx)}} and {{int getArgumentCount()}}. Maybe we can make {{UniquenessCallContext}} extend {{CallContext}}.
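One possible reading of the suggestion above is a per-operator rule that consults a small context object describing its operands. The sketch below is purely hypothetical: `UniquenessCallContext` takes only the two method names mentioned in the comment, while `UniquenessRule`, `CAST`, `UNKNOWN` and `contextOf` are invented for illustration and do not exist in Flink or Calcite.

```java
// Hypothetical sketch: none of these types exist in Flink or Calcite under these names.
final class UniquenessContextSketch {

    // Shaped after the two methods named in the comment above.
    interface UniquenessCallContext {
        int getArgumentCount();
        boolean isArgumentUniquenessPreserved(int idx);
    }

    // Per-operator rule: given what is known about the operands, does the result stay unique?
    interface UniquenessRule {
        boolean preservesUniqueness(UniquenessCallContext ctx);
    }

    // CAST keeps uniqueness only if its single operand does (type compatibility check omitted).
    static final UniquenessRule CAST =
        ctx -> ctx.getArgumentCount() == 1 && ctx.isArgumentUniquenessPreserved(0);

    // Conservative default for operators without a known rule.
    static final UniquenessRule UNKNOWN = ctx -> false;

    // Builds a context from per-operand flags.
    static UniquenessCallContext contextOf(boolean... preserved) {
        return new UniquenessCallContext() {
            @Override public int getArgumentCount() { return preserved.length; }
            @Override public boolean isArgumentUniquenessPreserved(int idx) { return preserved[idx]; }
        };
    }

    public static void main(String[] args) {
        System.out.println(CAST.preservesUniqueness(contextOf(true)));    // true
        System.out.println(CAST.preservesUniqueness(contextOf(false)));   // false
        System.out.println(UNKNOWN.preservesUniqueness(contextOf(true))); // false
    }
}
```

Defaulting unknown operators to `false` keeps the derivation conservative: a unique key is only reported when every step is known to preserve it.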
[jira] [Commented] (FLINK-14539) Unique key metadata should be kept when using concat or concat_ws in some cases
[ https://issues.apache.org/jira/browse/FLINK-14539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16960781#comment-16960781 ]

Kevin Zhang commented on FLINK-14539:
-------------------------------------

I intend to implement this using a RexBiVisitor, because there are cases where concat and cast can be nested inside each other, and it is easier to extend when we find more cases in which the unique key metadata can be kept. If that's OK and doesn't break anything else, I'd like to open a PR for further review.