[jira] [Commented] (CALCITE-6357) Calcite enforces select arguments count to be same as row schema fields which causes aliases to be ignored

2024-04-11 Thread Julian Hyde (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-6357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17836338#comment-17836338
 ] 

Julian Hyde commented on CALCITE-6357:
--

We added {{RelRoot}} way back in CALCITE-819 to express the fact that you want 
columns to have particular names, and that you want the relation to be sorted 
by columns that are not returned in the result.

If you need that, use {{RelRoot}}. But it is not a {{RelNode}} because adding 
those capabilities to {{RelNode}} would hamper the operations of the planner.

> Calcite enforces select arguments count to be same as row schema fields which 
> causes aliases to be ignored
> --
>
> Key: CALCITE-6357
> URL: https://issues.apache.org/jira/browse/CALCITE-6357
> Project: Calcite
>  Issue Type: Bug
>Reporter: Brachi Packter
>Priority: Major
>
> Calcite RelBuilder.ProjectNamed checks if row size in the select is identical 
> to schema fields, if no, it creates a project with fields as they appear in 
> the select , meaning if they have aliases, they are returning with their 
> aliases.
> Here, it checks if they are identical:
> https://github.com/apache/calcite/blob/f14cf4c32b9079984a988bbad40230aa6a59b127/core/src/main/java/org/apache/calcite/tools/RelBuilder.java#L2063
> using RexUtil.isIdentity method:
> ```
>  public static boolean isIdentity(List exps,
>   RelDataType inputRowType) {
> return inputRowType.getFieldCount() == exps.size()
> && containIdentity(exps, inputRowType, Litmus.IGNORE);
>   }
> ```
> This is the problematic part `inputRowType.getFieldCount() == exps.size()`
> If they are identical, and return with their aliases, it is ignored in the 
> "rename" method later on
> https://github.com/apache/calcite/blob/f14cf4c32b9079984a988bbad40230aa6a59b127/core/src/main/java/org/apache/calcite/tools/RelBuilder.java#L2125
> and alias is skipped
> https://github.com/apache/calcite/blob/f14cf4c32b9079984a988bbad40230aa6a59b127/core/src/main/java/org/apache/calcite/tools/RelBuilder.java#L2137
> This doesn't impact calcite queries, but in Apache Beam they are doing some 
> optimization on top of it, 
> https://github.com/apache/beam/blob/master/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rule/BeamAggregateProjectMergeRule.java
> which causes aliases to be ignored, and data is returning suddenly without 
> correct column field.
> I believe the isIdentity check can causes more issues if not fixed, we need 
> to understand why is it enforced? isn't it valid to have different size of 
> fields in select from what we have in the schema?
> In our case we have a one big row and we run on it different queries, each 
> with different fields in the select.
> Beam issue 
> https://github.com/apache/beam/issues/30498 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (CALCITE-6357) Calcite enforces select arguments count to be same as row schema fields which causes aliases to be ignored

2024-04-11 Thread Kenneth Knowles (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-6357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17836303#comment-17836303
 ] 

Kenneth Knowles commented on CALCITE-6357:
--

I'm still stuck on the idea that if I write "select x as col1, y as col2 from 
aTableWithAHundredColumns" it could be valid to return a table with columns x 
and y. I would be shocked if there exists an RDMS in the world that fails to 
give output with columns col1 and col2. So I just mean to ask how we (Beam) can 
invoke calcite to achieve that. I'm sure it works, since of course all the 
other users of Calcite haven't hit this.

> Calcite enforces select arguments count to be same as row schema fields which 
> causes aliases to be ignored
> --
>
> Key: CALCITE-6357
> URL: https://issues.apache.org/jira/browse/CALCITE-6357
> Project: Calcite
>  Issue Type: Bug
>Reporter: Brachi Packter
>Priority: Major
>
> Calcite RelBuilder.ProjectNamed checks if row size in the select is identical 
> to schema fields, if no, it creates a project with fields as they appear in 
> the select , meaning if they have aliases, they are returning with their 
> aliases.
> Here, it checks if they are identical:
> https://github.com/apache/calcite/blob/f14cf4c32b9079984a988bbad40230aa6a59b127/core/src/main/java/org/apache/calcite/tools/RelBuilder.java#L2063
> using RexUtil.isIdentity method:
> ```
>  public static boolean isIdentity(List exps,
>   RelDataType inputRowType) {
> return inputRowType.getFieldCount() == exps.size()
> && containIdentity(exps, inputRowType, Litmus.IGNORE);
>   }
> ```
> This is the problematic part `inputRowType.getFieldCount() == exps.size()`
> If they are identical, and return with their aliases, it is ignored in the 
> "rename" method later on
> https://github.com/apache/calcite/blob/f14cf4c32b9079984a988bbad40230aa6a59b127/core/src/main/java/org/apache/calcite/tools/RelBuilder.java#L2125
> and alias is skipped
> https://github.com/apache/calcite/blob/f14cf4c32b9079984a988bbad40230aa6a59b127/core/src/main/java/org/apache/calcite/tools/RelBuilder.java#L2137
> This doesn't impact calcite queries, but in Apache Beam they are doing some 
> optimization on top of it, 
> https://github.com/apache/beam/blob/master/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rule/BeamAggregateProjectMergeRule.java
> which causes aliases to be ignored, and data is returning suddenly without 
> correct column field.
> I believe the isIdentity check can causes more issues if not fixed, we need 
> to understand why is it enforced? isn't it valid to have different size of 
> fields in select from what we have in the schema?
> In our case we have a one big row and we run on it different queries, each 
> with different fields in the select.
> Beam issue 
> https://github.com/apache/beam/issues/30498 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (CALCITE-6357) Calcite enforces select arguments count to be same as row schema fields which causes aliases to be ignored

2024-04-10 Thread Julian Hyde (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-6357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17835917#comment-17835917
 ] 

Julian Hyde commented on CALCITE-6357:
--

[~brachi_packter], It might be useful to reduce this to a test case in 
{{RelBuilderTest}}. Can you do that?

> Calcite enforces select arguments count to be same as row schema fields which 
> causes aliases to be ignored
> --
>
> Key: CALCITE-6357
> URL: https://issues.apache.org/jira/browse/CALCITE-6357
> Project: Calcite
>  Issue Type: Bug
>Reporter: Brachi Packter
>Priority: Major
>
> Calcite RelBuilder.ProjectNamed checks if row size in the select is identical 
> to schema fields, if no, it creates a project with fields as they appear in 
> the select , meaning if they have aliases, they are returning with their 
> aliases.
> Here, it checks if they are identical:
> https://github.com/apache/calcite/blob/f14cf4c32b9079984a988bbad40230aa6a59b127/core/src/main/java/org/apache/calcite/tools/RelBuilder.java#L2063
> using RexUtil.isIdentity method:
> ```
>  public static boolean isIdentity(List exps,
>   RelDataType inputRowType) {
> return inputRowType.getFieldCount() == exps.size()
> && containIdentity(exps, inputRowType, Litmus.IGNORE);
>   }
> ```
> This is the problematic part `inputRowType.getFieldCount() == exps.size()`
> If they are identical, and return with their aliases, it is ignored in the 
> "rename" method later on
> https://github.com/apache/calcite/blob/f14cf4c32b9079984a988bbad40230aa6a59b127/core/src/main/java/org/apache/calcite/tools/RelBuilder.java#L2125
> and alias is skipped
> https://github.com/apache/calcite/blob/f14cf4c32b9079984a988bbad40230aa6a59b127/core/src/main/java/org/apache/calcite/tools/RelBuilder.java#L2137
> This doesn't impact calcite queries, but in Apache Beam they are doing some 
> optimization on top of it, 
> https://github.com/apache/beam/blob/master/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rule/BeamAggregateProjectMergeRule.java
> which causes aliases to be ignored, and data is returning suddenly without 
> correct column field.
> I believe the isIdentity check can causes more issues if not fixed, we need 
> to understand why is it enforced? isn't it valid to have different size of 
> fields in select from what we have in the schema?
> In our case we have a one big row and we run on it different queries, each 
> with different fields in the select.
> Beam issue 
> https://github.com/apache/beam/issues/30498 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (CALCITE-6357) Calcite enforces select arguments count to be same as row schema fields which causes aliases to be ignored

2024-04-10 Thread Julian Hyde (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-6357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17835893#comment-17835893
 ] 

Julian Hyde commented on CALCITE-6357:
--

*Caveat*: I haven't read every word you've written above; I only scanned the 
Beam case. I'm correcting what seem to be mistaken assumptions, in the hope 
that it will allow you to diagnose your problem faster. I hope that I am not 
dissuading other Calcite community members who may have more time from jumping 
in to help.

{quote}I presume you would agree that names of output columns is as much part 
of data integrity as the values{quote}

No, I would not. Calcite does not commit to preserving column names, only their 
types and ordering. It recognizes duplicate relational expressions (via 
memoization), forms equivalence sets of relational expressions, and after 
optimization will return one of the relational expressions in that subset.

{quote}isn't it a valid case to have a row with wider schema from what you 
actually need to select?{quote}

Sure, you can write "select x, y from aTableWithAHundredColumns". That's a 
Project (with two expressions) on a Scan (returning 100 columns). My point is 
that the Project knows that its input has 100 columns.

> Calcite enforces select arguments count to be same as row schema fields which 
> causes aliases to be ignored
> --
>
> Key: CALCITE-6357
> URL: https://issues.apache.org/jira/browse/CALCITE-6357
> Project: Calcite
>  Issue Type: Bug
>Reporter: Brachi Packter
>Priority: Major
>
> Calcite RelBuilder.ProjectNamed checks if row size in the select is identical 
> to schema fields, if no, it creates a project with fields as they appear in 
> the select , meaning if they have aliases, they are returning with their 
> aliases.
> Here, it checks if they are identical:
> https://github.com/apache/calcite/blob/f14cf4c32b9079984a988bbad40230aa6a59b127/core/src/main/java/org/apache/calcite/tools/RelBuilder.java#L2063
> using RexUtil.isIdentity method:
> ```
>  public static boolean isIdentity(List exps,
>   RelDataType inputRowType) {
> return inputRowType.getFieldCount() == exps.size()
> && containIdentity(exps, inputRowType, Litmus.IGNORE);
>   }
> ```
> This is the problematic part `inputRowType.getFieldCount() == exps.size()`
> If they are identical, and return with their aliases, it is ignored in the 
> "rename" method later on
> https://github.com/apache/calcite/blob/f14cf4c32b9079984a988bbad40230aa6a59b127/core/src/main/java/org/apache/calcite/tools/RelBuilder.java#L2125
> and alias is skipped
> https://github.com/apache/calcite/blob/f14cf4c32b9079984a988bbad40230aa6a59b127/core/src/main/java/org/apache/calcite/tools/RelBuilder.java#L2137
> This doesn't impact calcite queries, but in Apache Beam they are doing some 
> optimization on top of it, 
> https://github.com/apache/beam/blob/master/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rule/BeamAggregateProjectMergeRule.java
> which causes aliases to be ignored, and data is returning suddenly without 
> correct column field.
> I believe the isIdentity check can causes more issues if not fixed, we need 
> to understand why is it enforced? isn't it valid to have different size of 
> fields in select from what we have in the schema?
> In our case we have a one big row and we run on it different queries, each 
> with different fields in the select.
> Beam issue 
> https://github.com/apache/beam/issues/30498 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (CALCITE-6357) Calcite enforces select arguments count to be same as row schema fields which causes aliases to be ignored

2024-04-10 Thread Kenneth Knowles (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-6357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17835868#comment-17835868
 ] 

Kenneth Knowles commented on CALCITE-6357:
--

[~julianhyde] - are we "holding it wrong"? Can you help us understand how?

I presume you would agree that names of output columns is as much part of data 
integrity as the values. So in BeamAggregateProjectMergeRule are we somehow 
making Calcite think this is not the output column?

It does seem that our rule is the delta that causes the problem. Calcite 
without that rule seems to do the right thing. The rule does one thing: runs 
the normal AggregateProjectMergeRule only when the underlying connector does 
*not* support pushdown.

In the queries that hit this bug, we know that the connector does *not* support 
pushdown, so the original behavior should be maintained.

> Calcite enforces select arguments count to be same as row schema fields which 
> causes aliases to be ignored
> --
>
> Key: CALCITE-6357
> URL: https://issues.apache.org/jira/browse/CALCITE-6357
> Project: Calcite
>  Issue Type: Bug
>Reporter: Brachi Packter
>Priority: Major
>
> Calcite RelBuilder.ProjectNamed checks if row size in the select is identical 
> to schema fields, if no, it creates a project with fields as they appear in 
> the select , meaning if they have aliases, they are returning with their 
> aliases.
> Here, it checks if they are identical:
> https://github.com/apache/calcite/blob/f14cf4c32b9079984a988bbad40230aa6a59b127/core/src/main/java/org/apache/calcite/tools/RelBuilder.java#L2063
> using RexUtil.isIdentity method:
> ```
>  public static boolean isIdentity(List exps,
>   RelDataType inputRowType) {
> return inputRowType.getFieldCount() == exps.size()
> && containIdentity(exps, inputRowType, Litmus.IGNORE);
>   }
> ```
> This is the problematic part `inputRowType.getFieldCount() == exps.size()`
> If they are identical, and return with their aliases, it is ignored in the 
> "rename" method later on
> https://github.com/apache/calcite/blob/f14cf4c32b9079984a988bbad40230aa6a59b127/core/src/main/java/org/apache/calcite/tools/RelBuilder.java#L2125
> and alias is skipped
> https://github.com/apache/calcite/blob/f14cf4c32b9079984a988bbad40230aa6a59b127/core/src/main/java/org/apache/calcite/tools/RelBuilder.java#L2137
> This doesn't impact calcite queries, but in Apache Beam they are doing some 
> optimization on top of it, 
> https://github.com/apache/beam/blob/master/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rule/BeamAggregateProjectMergeRule.java
> which causes aliases to be ignored, and data is returning suddenly without 
> correct column field.
> I believe the isIdentity check can causes more issues if not fixed, we need 
> to understand why is it enforced? isn't it valid to have different size of 
> fields in select from what we have in the schema?
> In our case we have a one big row and we run on it different queries, each 
> with different fields in the select.
> Beam issue 
> https://github.com/apache/beam/issues/30498 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (CALCITE-6357) Calcite enforces select arguments count to be same as row schema fields which causes aliases to be ignored

2024-04-10 Thread Brachi Packter (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-6357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17835858#comment-17835858
 ] 

Brachi Packter commented on CALCITE-6357:
-

> If the number of fields does not match, that's probably a problem on your 
> end. RelBuilder almost always requires number of fields to match.

why? isn't it a valid case to have a row with wider schema from you actually 
need to select? (e.g group by queries, select one dimension from the row and 
make some count/sum on it)

> Calcite enforces select arguments count to be same as row schema fields which 
> causes aliases to be ignored
> --
>
> Key: CALCITE-6357
> URL: https://issues.apache.org/jira/browse/CALCITE-6357
> Project: Calcite
>  Issue Type: Bug
>Reporter: Brachi Packter
>Priority: Major
>
> Calcite RelBuilder.ProjectNamed checks if row size in the select is identical 
> to schema fields, if no, it creates a project with fields as they appear in 
> the select , meaning if they have aliases, they are returning with their 
> aliases.
> Here, it checks if they are identical:
> https://github.com/apache/calcite/blob/f14cf4c32b9079984a988bbad40230aa6a59b127/core/src/main/java/org/apache/calcite/tools/RelBuilder.java#L2063
> using RexUtil.isIdentity method:
> ```
>  public static boolean isIdentity(List exps,
>   RelDataType inputRowType) {
> return inputRowType.getFieldCount() == exps.size()
> && containIdentity(exps, inputRowType, Litmus.IGNORE);
>   }
> ```
> This is the problematic part `inputRowType.getFieldCount() == exps.size()`
> If they are identical, and return with their aliases, it is ignored in the 
> "rename" method later on
> https://github.com/apache/calcite/blob/f14cf4c32b9079984a988bbad40230aa6a59b127/core/src/main/java/org/apache/calcite/tools/RelBuilder.java#L2125
> and alias is skipped
> https://github.com/apache/calcite/blob/f14cf4c32b9079984a988bbad40230aa6a59b127/core/src/main/java/org/apache/calcite/tools/RelBuilder.java#L2137
> This doesn't impact calcite queries, but in Apache Beam they are doing some 
> optimization on top of it, 
> https://github.com/apache/beam/blob/master/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rule/BeamAggregateProjectMergeRule.java
> which causes aliases to be ignored, and data is returning suddenly without 
> correct column field.
> I believe the isIdentity check can causes more issues if not fixed, we need 
> to understand why is it enforced? isn't it valid to have different size of 
> fields in select from what we have in the schema?
> In our case we have a one big row and we run on it different queries, each 
> with different fields in the select.
> Beam issue 
> https://github.com/apache/beam/issues/30498 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (CALCITE-6357) Calcite enforces select arguments count to be same as row schema fields which causes aliases to be ignored

2024-04-10 Thread Julian Hyde (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-6357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17835833#comment-17835833
 ] 

Julian Hyde commented on CALCITE-6357:
--

If the number of fields does not match, that's probably a problem on your end. 
{{RelBuilder}} almost always requires number of fields to match.

Regarding column aliases. Calcite generally doesn't promise to preserve 
aliases, but in some cases you can force a rename. At 
[RelBuilder#2125|https://github.com/apache/calcite/blob/f14cf4c32b9079984a988bbad40230aa6a59b127/core/src/main/java/org/apache/calcite/tools/RelBuilder.java#L2125]
 it seems that {{force}} is false. For the behavior you want, {{force}} would 
need to be true.

> Calcite enforces select arguments count to be same as row schema fields which 
> causes aliases to be ignored
> --
>
> Key: CALCITE-6357
> URL: https://issues.apache.org/jira/browse/CALCITE-6357
> Project: Calcite
>  Issue Type: Bug
>Reporter: Brachi Packter
>Priority: Major
>
> Calcite RelBuilder.ProjectNamed checks if row size in the select is identical 
> to schema fields, if no, it creates a project with fields as they appear in 
> the select , meaning if they have aliases, they are returning with their 
> aliases.
> Here, it checks if they are identical:
> https://github.com/apache/calcite/blob/f14cf4c32b9079984a988bbad40230aa6a59b127/core/src/main/java/org/apache/calcite/tools/RelBuilder.java#L2063
> using RexUtil.isIdentity method:
> ```
>  public static boolean isIdentity(List exps,
>   RelDataType inputRowType) {
> return inputRowType.getFieldCount() == exps.size()
> && containIdentity(exps, inputRowType, Litmus.IGNORE);
>   }
> ```
> This is the problematic part `inputRowType.getFieldCount() == exps.size()`
> If they are identical, and return with their aliases, it is ignored in the 
> "rename" method later on
> https://github.com/apache/calcite/blob/f14cf4c32b9079984a988bbad40230aa6a59b127/core/src/main/java/org/apache/calcite/tools/RelBuilder.java#L2125
> and alias is skipped
> https://github.com/apache/calcite/blob/f14cf4c32b9079984a988bbad40230aa6a59b127/core/src/main/java/org/apache/calcite/tools/RelBuilder.java#L2137
> This doesn't impact calcite queries, but in Apache Beam they are doing some 
> optimization on top of it, 
> https://github.com/apache/beam/blob/master/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rule/BeamAggregateProjectMergeRule.java
> which causes aliases to be ignored, and data is returning suddenly without 
> correct column field.
> I believe the isIdentity check can causes more issues if not fixed, we need 
> to understand why is it enforced? isn't it valid to have different size of 
> fields in select from what we have in the schema?
> In our case we have a one big row and we run on it different queries, each 
> with different fields in the select.
> Beam issue 
> https://github.com/apache/beam/issues/30498 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)